---
pipeline_tag: image-to-image
library_name: diffusers
license: mit
---

# F-ViTA: Foundation Model Guided Visible to Thermal Translation

This repository contains the model described in the paper [F-ViTA: Foundation Model Guided Visible to Thermal Translation](https://huggingface.co/papers/2504.02801).

F-ViTA uses foundation models (SAM and Grounded DINO) to guide visible-to-thermal image translation with an InstructPix2Pix diffusion model. According to the paper, this guidance improves translation accuracy and generalizes well to out-of-distribution scenarios.

Code: https://github.com/jay-jnp/F-ViTA

Pre-trained checkpoints are available for several datasets:

*   **KAIST:** [huggingface.co/jay-jnp/F-ViTA\_KAIST](https://huggingface.co/jay-jnp/F-ViTA_KAIST)
*   **FLIR:** [huggingface.co/jay-jnp/F-VITA\_FLIR](https://huggingface.co/jay-jnp/F-VITA_FLIR)
*   **NIRSCENE:** [huggingface.co/jay-jnp/F-VITA\_NIRSCENE](https://huggingface.co/jay-jnp/F-VITA_NIRSCENE)
*   **OSU:** [huggingface.co/jay-jnp/F-VITA\_OSU](https://huggingface.co/jay-jnp/F-VITA_OSU)
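
As a minimal sketch, one way to fetch one of the checkpoints listed above is with `snapshot_download` from the `huggingface_hub` package. The repository IDs are taken from the links above; the helper names `checkpoint_repo` and `download_checkpoint` are hypothetical and not part of the F-ViTA code. The file layout inside each repository is not described here, so see the code repository for how to load the downloaded weights and run inference.

```python
# Repository IDs copied from the checkpoint links listed above.
CHECKPOINT_REPOS = {
    "KAIST": "jay-jnp/F-ViTA_KAIST",
    "FLIR": "jay-jnp/F-VITA_FLIR",
    "NIRSCENE": "jay-jnp/F-VITA_NIRSCENE",
    "OSU": "jay-jnp/F-VITA_OSU",
}


def checkpoint_repo(dataset: str) -> str:
    """Return the Hub repository ID for a dataset name (case-insensitive).

    Hypothetical helper for illustration only.
    """
    key = dataset.strip().upper()
    if key not in CHECKPOINT_REPOS:
        raise ValueError(
            f"Unknown dataset {dataset!r}; expected one of {sorted(CHECKPOINT_REPOS)}"
        )
    return CHECKPOINT_REPOS[key]


def download_checkpoint(dataset: str) -> str:
    """Download all files of the chosen checkpoint repo and return the local path.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so the lookup helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=checkpoint_repo(dataset))


if __name__ == "__main__":
    # Requires `pip install huggingface_hub` and network access.
    local_dir = download_checkpoint("KAIST")
    print(f"Checkpoint files downloaded to: {local_dir}")
```

The returned path points to the local Hub cache, so repeated calls reuse files that were already downloaded.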