---
title: Image Caption Generator
emoji: 🏢
colorFrom: yellow
colorTo: yellow
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
short_description: image_caption_generator
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Image Captioning with LSTM

This application generates descriptive captions for images using a deep learning model trained on the Flickr8K dataset.

## Model Architecture

- **Image Encoder**: ResNet50 (pre-trained on ImageNet) extracts visual features
- **Caption Decoder**: LSTM-based sequence generator with an embedding layer
- **Vocabulary Size**: ~8,000 unique words
- **Max Caption Length**: 40 tokens

## Features

- **Greedy Search**: Fast caption generation that selects the most probable word at each step
- **Beam Search**: Higher-quality captions obtained by exploring multiple candidate sequences (k=3)

## Training Dataset

The model was trained on the Flickr8K dataset, which contains:

- 8,000 images
- 5 captions per image (40,000 captions in total)

## How to Use

1. Upload an image
2. Choose a caption generation method (Greedy or Beam Search)
3. Click Submit to generate a caption

## Model Performance

The model achieves competitive BLEU scores on the test set, producing natural and descriptive captions for a variety of scenes.

## Citation

Dataset: M. Hodosh, P. Young and J. Hockenmaier (2013), "Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics", *Journal of Artificial Intelligence Research*, Volume 47, pages 853-899.
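
## Decoding Strategies Illustrated

As a rough illustration of the two decoding strategies above, here is a minimal pure-Python sketch. The `step` function is a hypothetical stand-in for the real LSTM decoder: it maps a partial caption to a distribution over next words (the actual model conditions on the ResNet50 image features as well).

```python
import math

# Hypothetical stand-in for the LSTM decoder: given the partial caption
# (a tuple of tokens), return a dict of next-word probabilities.
# The real model would also condition on the image features.
def step(prefix):
    table = {
        ("<start>",): {"a": 0.6, "the": 0.4},
        ("<start>", "a"): {"dog": 0.5, "cat": 0.5},
        ("<start>", "the"): {"dog": 0.9, "cat": 0.1},
        ("<start>", "a", "dog"): {"<end>": 1.0},
        ("<start>", "a", "cat"): {"<end>": 1.0},
        ("<start>", "the", "dog"): {"<end>": 1.0},
        ("<start>", "the", "cat"): {"<end>": 1.0},
    }
    return table[prefix]

def greedy_decode(max_len=40):
    caption = ("<start>",)
    for _ in range(max_len):
        probs = step(caption)
        word = max(probs, key=probs.get)  # commit to the single most likely word
        caption += (word,)
        if word == "<end>":
            break
    return caption

def beam_decode(k=3, max_len=40):
    # Each beam entry is (cumulative log-probability, caption-so-far).
    beams = [(0.0, ("<start>",))]
    for _ in range(max_len):
        candidates = []
        for logp, caption in beams:
            if caption[-1] == "<end>":  # finished beams carry over unchanged
                candidates.append((logp, caption))
                continue
            for word, p in step(caption).items():
                candidates.append((logp + math.log(p), caption + (word,)))
        beams = sorted(candidates, reverse=True)[:k]  # keep the k best sequences
        if all(c[-1] == "<end>" for _, c in beams):
            break
    return beams[0][1]  # highest-scoring finished caption
```

With this toy distribution, greedy search commits to "a" (p=0.6) and ends up with "a dog" (total p=0.3), while beam search keeps "the" alive and finds "the dog" (total p=0.36), showing how exploring multiple candidates can recover a globally better sequence.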