---
title: Slash
emoji: π
colorFrom: purple
colorTo: gray
sdk: streamlit
sdk_version: 1.25.0
pinned: false
license: mit
short_description: 'An AI-powered book summarizer'
---
# Book Summarizer AI
An intelligent web application that extracts text from PDF books and generates comprehensive summaries using state-of-the-art AI models.
## Features

- **PDF Text Extraction**: Robust PDF processing with multiple extraction methods
- **AI-Powered Summarization**: Uses transformer models (BART, T5) for high-quality summaries
- **Web Interface**: Modern UI built with Streamlit
- **FastAPI Backend**: Scalable, fast API for processing
- **Configurable Settings**: Adjust summary length, chunk size, and AI model
- **Text Analysis**: Detailed statistics about book content
- **Downloadable Summaries**: Save summaries as text files
## Quick Start

### Option 1: Automated Setup (Recommended)

**Windows:**

```bat
:: Double-click start.bat or run:
start.bat
```

**Unix/Linux/Mac:**

```bash
# Make the script executable and run it:
chmod +x start.sh
./start.sh
```
### Option 2: Manual Setup

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Download NLTK data:

   ```bash
   python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
   ```

3. Start the FastAPI backend:

   ```bash
   uvicorn api.main:app --reload --port 8000
   ```

4. Start the Streamlit frontend:

   ```bash
   streamlit run app.py
   ```

5. Open your browser:
   - Frontend: http://localhost:8501
   - API docs: http://localhost:8000/docs
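Once both services are running, you can verify the backend from Python. This is a minimal sketch using only the standard library; it assumes the backend was started on port 8000 as shown above.

```python
# Quick sanity check that the FastAPI backend is reachable.
import urllib.error
import urllib.request


def backend_is_up(url: str = "http://localhost:8000/health", timeout: float = 2.0) -> bool:
    """Return True if the backend answers the health check with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    print("backend up:", backend_is_up())
```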
## Usage

1. **Upload PDF**: Select a PDF book file (max 50 MB)
2. **Configure Settings**: Choose an AI model and summary parameters
3. **Generate Summary**: Click "Generate Summary" and wait for processing
4. **Download Result**: Save your AI-generated summary
## Technology Stack

### Frontend

- **Streamlit**: Modern web interface
- **Custom CSS**: Styling and responsive design

### Backend

- **FastAPI**: High-performance API framework
- **Uvicorn**: ASGI server for FastAPI

### AI & ML

- **Hugging Face Transformers**: State-of-the-art NLP models
- **PyTorch**: Deep learning framework
- **BART/T5**: Pre-trained summarization models
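As a rough sketch of how the summarization layer can wrap a Hugging Face pipeline: the function names below (`load_summarizer`, `summarize_chunks`) are illustrative, not necessarily the actual API of `api/summarizer.py`.

```python
# Sketch of wrapping a Hugging Face summarization pipeline.
from typing import Callable, List


def load_summarizer(model_name: str = "facebook/bart-large-cnn") -> Callable[[str], str]:
    """Build a summarization callable (downloads the model on first use)."""
    from transformers import pipeline  # heavy import kept local

    hf = pipeline("summarization", model=model_name)

    def run(text: str, max_length: int = 150, min_length: int = 50) -> str:
        out = hf(text, max_length=max_length, min_length=min_length, truncation=True)
        return out[0]["summary_text"]

    return run


def summarize_chunks(summarize: Callable[[str], str], chunks: List[str]) -> str:
    """Summarize each non-empty chunk independently and join the partial summaries."""
    return "\n\n".join(summarize(c) for c in chunks if c.strip())


# Usage (downloads the model on first run):
#   summarize = load_summarizer("t5-small")
#   print(summarize_chunks(summarize, chunks))
```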
### PDF Processing

- **PyPDF2**: PDF text extraction
- **pdfplumber**: Advanced PDF processing
- **NLTK**: Natural language processing
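A common pattern with this pairing is to try pdfplumber's layout-aware parser first and fall back to PyPDF2. The sketch below illustrates that idea; `extract_text` and `clean_text` are hypothetical names, not necessarily the API of `api/pdf_processor.py`.

```python
# Two-stage PDF extraction: prefer pdfplumber, fall back to PyPDF2.
import re


def clean_text(raw: str) -> str:
    """Collapse runs of whitespace left over from PDF layout."""
    return re.sub(r"\s+", " ", raw).strip()


def extract_text(path: str) -> str:
    """Extract text from a PDF, preferring pdfplumber's layout-aware parser."""
    try:
        import pdfplumber

        with pdfplumber.open(path) as pdf:
            raw = "\n".join(page.extract_text() or "" for page in pdf.pages)
    except Exception:
        from PyPDF2 import PdfReader  # simpler fallback parser

        raw = "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    return clean_text(raw)
```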
## Project Structure

```
book-summarizer/
├── app.py               # Streamlit frontend
├── start.py             # Automated startup script
├── start.bat            # Windows startup script
├── start.sh             # Unix/Linux/Mac startup script
├── api/
│   ├── __init__.py      # API package
│   ├── main.py          # FastAPI backend
│   ├── pdf_processor.py # PDF text extraction
│   ├── summarizer.py    # AI summarization logic
│   └── utils.py         # Utility functions
├── requirements.txt     # Python dependencies
└── README.md            # Project documentation
```
## Configuration

### AI Models

- **facebook/bart-large-cnn**: Best quality, slower processing
- **t5-small**: Faster processing, good quality
- **facebook/bart-base**: Balanced performance

### Summary Settings

- **Max Length**: 50-500 words (default: 150)
- **Min Length**: 10-200 words (default: 50)
- **Chunk Size**: 500-2000 characters (default: 1000)
- **Overlap**: 50-200 characters (default: 100)
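The chunk size and overlap settings describe a sliding window over the extracted text. A minimal sketch of that logic, using the defaults above (`chunk_text` is an illustrative name, not necessarily the one used in the codebase):

```python
# Character-based chunking with overlap (defaults: size 1000, overlap 100).
from typing import List


def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> List[str]:
    """Split `text` into windows of `size` characters, each sharing `overlap`
    characters with the previous window, so sentences cut at a boundary
    still appear whole in at least one chunk."""
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Larger overlaps preserve more cross-boundary context at the cost of re-summarizing more text.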
## API Endpoints

- `GET /` - API information
- `GET /health` - Health check
- `POST /upload-pdf` - Validate a PDF file
- `POST /extract-text` - Extract text from a PDF
- `POST /summarize` - Generate a book summary
- `GET /models` - List available AI models
- `POST /change-model` - Switch the active AI model
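A sketch of calling the summarize endpoint from a client script. The payload field names (`model`, `max_length`, `min_length`) and the `summary` response key are assumptions based on the settings above; check http://localhost:8000/docs for the actual schema.

```python
# Client-side sketch for POST /summarize.
API_URL = "http://localhost:8000"


def build_payload(model: str = "facebook/bart-large-cnn",
                  max_length: int = 150, min_length: int = 50) -> dict:
    """Assemble request parameters using the documented defaults."""
    if min_length >= max_length:
        raise ValueError("min_length must be below max_length")
    return {"model": model, "max_length": max_length, "min_length": min_length}


def summarize_pdf(path: str, **params) -> str:
    """Upload a PDF and return the generated summary."""
    import requests  # imported here so build_payload works without requests installed

    with open(path, "rb") as f:
        resp = requests.post(f"{API_URL}/summarize",
                             files={"file": f}, data=build_payload(**params))
    resp.raise_for_status()
    return resp.json()["summary"]
```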
## Requirements

- **Python**: 3.8 or higher
- **Memory**: At least 4 GB RAM (8 GB recommended)
- **Storage**: 2 GB free space for models
- **Internet**: Required for the first-time model download
## Troubleshooting

### Common Issues

**"Module not found" errors:**

```bash
pip install -r requirements.txt
```

**NLTK data missing:**

```bash
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
```

**API connection failed:**

- Ensure FastAPI is running on port 8000
- Check firewall settings
- Verify no other service is using the port

**Slow processing of large PDFs:**

- Reduce the chunk size in the advanced settings
- Use a faster model (t5-small)
- Ensure sufficient RAM

**Model download issues:**

- Check your internet connection
- Clear the Hugging Face cache:

```bash
rm -rf ~/.cache/huggingface
```
### Performance Tips

- **GPU Acceleration**: Install CUDA-enabled PyTorch for faster processing
- **Model Selection**: Use smaller models for faster results
- **Chunk Size**: Smaller chunks process faster but may lose context
- **Memory**: Close other applications to free up RAM
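For the GPU tip, a small helper (illustrative, not part of the codebase) can pick the `device` index that `transformers.pipeline` expects, falling back to CPU when CUDA or even PyTorch is unavailable:

```python
# Pick a transformers pipeline `device` index: 0 = first CUDA GPU, -1 = CPU.
def pick_device() -> int:
    """Return 0 for the first CUDA GPU, -1 for CPU (the convention used by
    transformers.pipeline's `device` argument)."""
    try:
        import torch
        return 0 if torch.cuda.is_available() else -1
    except ImportError:
        return -1


# Usage: pipeline("summarization", model=..., device=pick_device())
```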
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## License
This project is open source and available under the MIT License.
## Acknowledgments

- Hugging Face for transformer models
- Streamlit for the web framework
- FastAPI for the backend framework
- The open-source community for the many supporting libraries
## Support

For issues, questions, or feature requests:

- Check the troubleshooting section above
- Open an issue on GitHub

Happy summarizing!
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference