Update README.md
README.md
@@ -23,8 +23,8 @@ For GUI grounding tasks, you can use:
 - [OS-Atlas-Base-4B](https://huggingface.co/OS-Copilot/OS-Atlas-Base-4B)
 
 For generating single-step actions in GUI agent tasks, you can use:
-- [OS-Atlas-
-- [OS-Atlas-
+- [OS-Atlas-Pro-7B](https://huggingface.co/OS-Copilot/OS-Atlas-Pro-7B)
+- [OS-Atlas-Pro-4B](https://huggingface.co/OS-Copilot/OS-Atlas-Pro-4B)
 
 ## OS-Atlas-Action-4B
 
@@ -39,6 +39,9 @@ pip install transformers
 For additional dependencies, please refer to the [InternVL2 documentation](https://internvl.readthedocs.io/en/latest/get_started/installation.html)
 
 ### Example Inference Code
+First download the [example image](https://github.com/OS-Copilot/OS-Atlas/blob/main/examples/images/action_example_1.jpg) and save it to the current directory.
+
+Inference code:
 ```python
 import torch
 import torchvision.transforms as T
@@ -123,7 +126,7 @@ def load_image(image_file, input_size=448, max_num=6):
     return pixel_values
 
 # If you want to load a model using multiple GPUs, please refer to the `Multiple GPUs` section.
-path = '
+path = './action_example_1.jpg' # change to your example image path
 model = AutoModel.from_pretrained(
     path,
     torch_dtype=torch.bfloat16,
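The hunks above show only the head and tail of the README's example, so for orientation here is a minimal, self-contained sketch of the single-image inference loop that code builds up to. It assumes the InternVL2-style `model.chat` interface the 4B models are built on; the repo id `OS-Copilot/OS-Atlas-Pro-4B`, the single-tile `preprocess` helper (standing in for the README's dynamic-tiling `load_image(image_file, input_size=448, max_num=6)`), and the prompt text are illustrative assumptions, not the exact model-card code.

```python
# Minimal sketch, assuming the InternVL2-style `model.chat` API that the 4B models inherit.
import torch
import torchvision.transforms as T
from PIL import Image
from torchvision.transforms.functional import InterpolationMode
from transformers import AutoModel, AutoTokenizer

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def preprocess(image_file, input_size=448):
    # Single 448x448 tile with ImageNet normalization; the README's `load_image`
    # helper additionally performs dynamic tiling (max_num=6) before this transform.
    transform = T.Compose([
        T.Lambda(lambda img: img.convert('RGB')),
        T.Resize((input_size, input_size), interpolation=InterpolationMode.BICUBIC),
        T.ToTensor(),
        T.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
    ])
    return transform(Image.open(image_file)).unsqueeze(0)

model_path = 'OS-Copilot/OS-Atlas-Pro-4B'  # assumed repo id, taken from the links above
image_path = './action_example_1.jpg'      # the example image downloaded earlier

model = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True, use_fast=False)

pixel_values = preprocess(image_path).to(torch.bfloat16).cuda()
generation_config = dict(max_new_tokens=1024, do_sample=False)

# '<image>' marks where the image is injected into the prompt (InternVL2 convention);
# the prompt text itself is illustrative, not the official OS-Atlas action prompt.
question = '<image>\nDescribe the next single-step action to perform in this GUI.'
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(response)
```

Swapping `preprocess` for the README's own `load_image` helper yields multi-tile inputs closer to the full example in the model card.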