---
title: Slash
emoji: π
colorFrom: purple
colorTo: gray
sdk: streamlit
sdk_version: 1.25.0
pinned: false
license: mit
short_description: 'An AI-powered book summarizer'
---
# Book Summarizer AI
A web application that extracts text from PDF books and summarizes it with pre-trained transformer models such as BART and T5.
## Features
- **PDF Text Extraction**: Multiple extraction methods (PyPDF2 and pdfplumber)
- **AI-Powered Summarization**: Uses transformer models (BART, T5) to generate summaries
- **Web Interface**: Modern UI built with Streamlit
- **FastAPI Backend**: Scalable, fast API for processing
- **Configurable Settings**: Adjust summary length, chunk size, and AI model
- **Text Analysis**: Detailed statistics about the book's content
- **Download Summaries**: Save summaries as text files
## Quick Start
### Option 1: Automated Setup (Recommended)
**Windows:**
```bash
# Double-click start.bat or run:
start.bat
```
**Unix/Linux/Mac:**
```bash
# Make script executable and run:
chmod +x start.sh
./start.sh
```
### Option 2: Manual Setup
1. **Install dependencies:**
```bash
pip install -r requirements.txt
```
2. **Download NLTK data:**
```bash
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
```
3. **Start the FastAPI backend:**
```bash
uvicorn api.main:app --reload --port 8000
```
4. **Start the Streamlit frontend:**
```bash
streamlit run app.py
```
5. **Open your browser:**
- Frontend: http://localhost:8501
- API Docs: http://localhost:8000/docs
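The automated scripts start both services for you. The contents of `start.py` are not shown here, so the following is only a hypothetical sketch of a launcher that runs the commands from steps 3 and 4 side by side. `build_commands` and `launch` are illustrative names, not the project's code.

```python
import subprocess
import sys
from typing import List


def build_commands(api_port: int = 8000) -> List[List[str]]:
    """Return the backend and frontend commands from the manual setup steps.

    `python -m` is used so both tools run from the current interpreter's environment.
    """
    backend = [sys.executable, "-m", "uvicorn", "api.main:app",
               "--reload", "--port", str(api_port)]
    frontend = [sys.executable, "-m", "streamlit", "run", "app.py"]
    return [backend, frontend]


def launch(api_port: int = 8000) -> None:
    """Start both processes and stop them when the user presses Ctrl+C."""
    processes = [subprocess.Popen(cmd) for cmd in build_commands(api_port)]
    try:
        for proc in processes:
            proc.wait()
    except KeyboardInterrupt:
        pass
    finally:
        for proc in processes:
            proc.terminate()  # no-op for processes that have already exited
```

Calling `launch()` from the project root is equivalent to running both manual commands in two terminals.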
## Usage
1. **Upload PDF**: Select a PDF book file (max 50MB)
2. **Configure Settings**: Choose AI model and summary parameters
3. **Generate Summary**: Click "Generate Summary" and wait for processing
4. **Download Result**: Save your AI-generated summary
## Technology Stack
### Frontend
- **Streamlit**: Modern web interface
- **Custom CSS**: Beautiful styling and responsive design
### Backend
- **FastAPI**: High-performance API framework
- **Uvicorn**: ASGI server for FastAPI
### AI & ML
- **Hugging Face Transformers**: State-of-the-art NLP models
- **PyTorch**: Deep learning framework
- **BART/T5 Models**: Pre-trained summarization models
### PDF Processing
- **PyPDF2**: PDF text extraction
- **pdfplumber**: Advanced PDF processing
- **NLTK**: Text tokenization and stopword lists
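The stack lists two PDF libraries. One plausible way to combine them is a fallback chain: try one extractor, and if it raises or returns only whitespace, try the next. This is a hypothetical sketch. The real `api/pdf_processor.py` may order or merge the methods differently, and the function names below are illustrative. It assumes PyPDF2 2.x or later, which provides `PdfReader`.

```python
from typing import Callable, Sequence


def extract_with_pdfplumber(path: str) -> str:
    """Extract text page by page with pdfplumber (imported lazily)."""
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_with_pypdf2(path: str) -> str:
    """Extract text page by page with PyPDF2's PdfReader (imported lazily)."""
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_text(
    path: str,
    extractors: Sequence[Callable[[str], str]] = (extract_with_pdfplumber, extract_with_pypdf2),
) -> str:
    """Try each extractor in order and return the first non-blank result."""
    for extractor in extractors:
        try:
            text = extractor(path)
        except Exception:  # missing library, unreadable or encrypted PDF, etc.
            continue
        if text.strip():
            return text
    # Every method failed or found no text (for example, a scanned, image-only PDF).
    return ""
```

Lazy imports keep the fallback working even when only one of the two libraries is installed.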
## Project Structure
```
book-summarizer/
├── app.py                 # Streamlit frontend
├── start.py               # Automated startup script
├── start.bat              # Windows startup script
├── start.sh               # Unix/Linux/Mac startup script
├── api/
│   ├── __init__.py        # API package
│   ├── main.py            # FastAPI backend
│   ├── pdf_processor.py   # PDF text extraction
│   ├── summarizer.py      # AI summarization logic
│   └── utils.py           # Utility functions
├── requirements.txt       # Python dependencies
└── README.md              # Project documentation
```
## Configuration
### AI Models
- **facebook/bart-large-cnn**: Best quality, slower processing
- **t5-small**: Faster processing, good quality
- **facebook/bart-base**: Balanced performance
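Each model name above is a Hugging Face Hub identifier. A minimal sketch of how one could be loaded, assuming the app uses the Transformers `pipeline("summarization", ...)` API. `AVAILABLE_MODELS` and `load_summarizer` are hypothetical names, not taken from `api/summarizer.py`.

```python
from typing import Any

# Model names and descriptions copied from the list above.
AVAILABLE_MODELS = {
    "facebook/bart-large-cnn": "Best quality, slower processing",
    "t5-small": "Faster processing, good quality",
    "facebook/bart-base": "Balanced performance",
}


def load_summarizer(model_name: str = "facebook/bart-large-cnn", device: int = -1) -> Any:
    """Build a Hugging Face summarization pipeline for one of the listed models.

    device=-1 runs on the CPU; device=0 selects the first CUDA GPU.
    """
    if model_name not in AVAILABLE_MODELS:
        raise ValueError(
            f"Unknown model {model_name!r}; choose one of {sorted(AVAILABLE_MODELS)}"
        )
    # Imported lazily so the name check works even without transformers installed.
    from transformers import pipeline

    return pipeline("summarization", model=model_name, device=device)
```

Usage: `load_summarizer("t5-small")(chunk, max_length=150, min_length=50, do_sample=False)[0]["summary_text"]`. The first call downloads the model. In the pipeline, `max_length` and `min_length` count tokens, not words.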
### Summary Settings
- **Max Length**: 50-500 words (default: 150)
- **Min Length**: 10-200 words (default: 50)
- **Chunk Size**: 500-2000 characters (default: 1000)
- **Overlap**: 50-200 characters (default: 100)
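Chunk Size and Overlap control how a book too long for the model is split before summarization. A minimal sketch, assuming plain character windows. The app may split on sentence boundaries instead. `chunk_text` and `summarize_long_text` are hypothetical names, and `summarize_chunk` stands in for any model call.

```python
from typing import Callable, List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into character windows of chunk_size that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


def summarize_long_text(
    text: str,
    summarize_chunk: Callable[[str], str],
    chunk_size: int = 1000,
    overlap: int = 100,
    joiner: str = " ",
) -> str:
    """Summarize each chunk independently, then join the partial summaries."""
    return joiner.join(summarize_chunk(c) for c in chunk_text(text, chunk_size, overlap))
```

The overlap repeats a little text at each boundary, so a sentence cut by one window is usually still whole in the next.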
## API Endpoints
- `GET /` - API information
- `GET /health` - Health check
- `POST /upload-pdf` - Validate PDF file
- `POST /extract-text` - Extract text from PDF
- `POST /summarize` - Generate book summary
- `GET /models` - List available AI models
- `POST /change-model` - Switch AI model
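Before sending a PDF, a client can check whether the backend is reachable. A minimal standard-library sketch, assuming `/health` returns HTTP 200 when the API is healthy (the response body is not documented here). `api_is_up` is a hypothetical helper name.

```python
import urllib.request


def api_is_up(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, refused connections and timeouts
        return False
```

If this returns `False`, work through the "API connection failed" checks under Troubleshooting.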
## Requirements
- **Python**: 3.8 or higher
- **Memory**: At least 4GB RAM (8GB recommended)
- **Storage**: 2GB free space for models
- **Internet**: Required for first-time model download
## Troubleshooting
### Common Issues
1. **"Module not found" errors:**
```bash
pip install -r requirements.txt
```
2. **NLTK data missing:**
```bash
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
```
3. **API connection failed:**
- Ensure FastAPI is running on port 8000
- Check firewall settings
- Verify no other service is using the port
4. **Processing large PDFs is slow:**
- Reduce chunk size in advanced settings
- Use a faster model (t5-small)
- Ensure sufficient RAM
5. **Model download issues:**
- Check internet connection
- Clear Hugging Face cache: `rm -rf ~/.cache/huggingface`
### Performance Tips
- **GPU Acceleration**: Use an NVIDIA GPU with a CUDA-enabled PyTorch build for faster processing
- **Model Selection**: Use smaller models for faster results
- **Chunk Size**: Smaller chunks are processed faster but may lose context across chunk boundaries
- **Memory**: Close other applications to free up RAM
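A GPU only helps if the model is placed on it. A small sketch, assuming the app builds its models with the Transformers `pipeline()`, whose `device` argument takes `-1` for the CPU and `0` for the first CUDA GPU. `pick_device` is a hypothetical name.

```python
def pick_device() -> int:
    """Return 0 (first CUDA GPU) if PyTorch can see one, else -1 (CPU)."""
    try:
        import torch
    except ImportError:  # PyTorch not installed: fall back to the CPU
        return -1
    return 0 if torch.cuda.is_available() else -1
```

Pass the result as `pipeline("summarization", model=..., device=pick_device())`. If it returns `-1` on a machine with an NVIDIA GPU, the installed PyTorch build likely lacks CUDA support.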
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
## License
This project is open source and available under the MIT License.
## Acknowledgments
- Hugging Face for transformer models
- Streamlit for the web framework
- FastAPI for the backend framework
- The open-source community for various libraries
## Support
For issues, questions, or feature requests:
1. Check the troubleshooting section
2. Open an issue on GitHub
---
**Happy summarizing!**
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference