---
title: Image Caption Generator
emoji: 🏢
colorFrom: yellow
colorTo: yellow
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
short_description: image_caption_generator
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Image Captioning with LSTM

This application generates descriptive captions for images using a deep learning model trained on the Flickr8K dataset.

## Model Architecture

- **Image Encoder**: ResNet50 (pre-trained on ImageNet) extracts visual features
- **Caption Decoder**: LSTM-based sequence generator with an embedding layer
- **Vocabulary Size**: ~8,000 unique words
- **Max Caption Length**: 40 tokens

## Features

- **Greedy Search**: Fast caption generation that selects the most probable word at each step
- **Beam Search**: Higher-quality captions obtained by exploring multiple candidate sequences (k=3)

## Training Dataset

The model was trained on the Flickr8K dataset, which contains:

- 8,000 images
- 5 captions per image (40,000 captions in total)

## How to Use

1. Upload an image
2. Choose a caption generation method (Greedy or Beam Search)
3. Click Submit to generate a caption

## Model Performance

The model achieves competitive BLEU scores on the test set, producing natural and descriptive captions for a variety of scenes.

## Citation

Dataset: M. Hodosh, P. Young and J. Hockenmaier (2013), "Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics", *Journal of Artificial Intelligence Research*, Volume 47, pages 853-899.
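
## Decoding Strategies Illustrated

As a rough illustration of the two decoding strategies above, here is a minimal pure-Python sketch. The `step` function is a hypothetical stand-in for the real LSTM decoder: it maps a partial caption to a distribution over next words (the actual model conditions on the ResNet50 image features as well).

```python
import math

# Hypothetical stand-in for the LSTM decoder: given the partial caption
# (a tuple of tokens), return a dict of next-word probabilities.
# The real model would also condition on the image features.
def step(prefix):
    table = {
        ("<start>",): {"a": 0.6, "the": 0.4},
        ("<start>", "a"): {"dog": 0.5, "cat": 0.5},
        ("<start>", "the"): {"dog": 0.9, "cat": 0.1},
        ("<start>", "a", "dog"): {"<end>": 1.0},
        ("<start>", "a", "cat"): {"<end>": 1.0},
        ("<start>", "the", "dog"): {"<end>": 1.0},
        ("<start>", "the", "cat"): {"<end>": 1.0},
    }
    return table[prefix]

def greedy_decode(max_len=40):
    caption = ("<start>",)
    for _ in range(max_len):
        probs = step(caption)
        word = max(probs, key=probs.get)  # commit to the single most likely word
        caption += (word,)
        if word == "<end>":
            break
    return caption

def beam_decode(k=3, max_len=40):
    # Each beam entry is (cumulative log-probability, caption-so-far).
    beams = [(0.0, ("<start>",))]
    for _ in range(max_len):
        candidates = []
        for logp, caption in beams:
            if caption[-1] == "<end>":  # finished beams carry over unchanged
                candidates.append((logp, caption))
                continue
            for word, p in step(caption).items():
                candidates.append((logp + math.log(p), caption + (word,)))
        beams = sorted(candidates, reverse=True)[:k]  # keep the k best sequences
        if all(c[-1] == "<end>" for _, c in beams):
            break
    return beams[0][1]  # highest-scoring finished caption
```

With this toy distribution, greedy search commits to "a" (p=0.6) and ends up with "a dog" (total p=0.3), while beam search keeps "the" alive and finds "the dog" (total p=0.36), showing how exploring multiple candidates can recover a globally better sequence.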