microsoft/xclip-base-patch16

@@ -21,7 +21,7 @@ model-index:
 # X-CLIP (base-sized model)
-X-CLIP model (base-sized, patch resolution of 32) trained fully-supervised on [Kinetics-400](https://www.deepmind.com/open-source/kinetics). It was introduced in the paper [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) by Ni et al. and first released in [this repository](https://github.com/microsoft/VideoX/tree/master/X-CLIP).
+X-CLIP model (base-sized, patch resolution of 16) trained fully-supervised on [Kinetics-400](https://www.deepmind.com/open-source/kinetics). It was introduced in the paper [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) by Ni et al. and first released in [this repository](https://github.com/microsoft/VideoX/tree/master/X-CLIP).
 This model was trained using 8 frames per video, at a resolution of 224x224.
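
Review note: the patch-resolution fix matters because it changes how many patches each frame is split into. The sketch below works through that arithmetic for the values in the card (224x224 frames, 8 frames per video). It assumes ViT-style non-overlapping square patches and does not count any class or prompt tokens. The helper name `patches_per_frame` is hypothetical and not part of any library.

```python
# Minimal sketch, assuming non-overlapping square patches (ViT-style).
# `patches_per_frame` is a hypothetical helper, not a library function.

def patches_per_frame(resolution: int, patch_size: int) -> int:
    """Number of patches a square frame is split into."""
    if resolution % patch_size != 0:
        raise ValueError("resolution must be divisible by patch_size")
    side = resolution // patch_size  # patches along one edge
    return side * side


RESOLUTION = 224  # frame resolution stated in the model card
NUM_FRAMES = 8    # frames per video stated in the model card

for patch_size in (16, 32):
    per_frame = patches_per_frame(RESOLUTION, patch_size)
    per_clip = per_frame * NUM_FRAMES
    print(f"patch {patch_size}: {per_frame} per frame, {per_clip} per clip")

# patch 16: 196 per frame, 1568 per clip
# patch 32: 49 per frame, 392 per clip
```

Under these assumptions, a patch resolution of 16 yields four times as many patches per frame as 32, so the two descriptions are not interchangeable.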