OpenVLA: An Open-Source Vision-Language-Action Model
Paper: [arXiv:2406.09246](https://arxiv.org/abs/2406.09246)
This repository is a complete collection of OpenVLA checkpoints trained on the LIBERO-Spatial dataset, covering multiple epochs saved over the course of training.

It contains 10 training checkpoints:
| Checkpoint Directory | Epoch | Step | Download |
|---|---|---|---|
| epoch-05-step-000685/ | 5 | 685 | Download |
| epoch-10-step-001370/ | 10 | 1370 | Download |
| epoch-15-step-002055/ | 15 | 2055 | Download |
| epoch-18-step-002500/ | 18 | 2500 | Download |
| epoch-20-step-002740/ | 20 | 2740 | Download |
| epoch-25-step-003425/ | 25 | 3425 | Download |
| epoch-30-step-004110/ | 30 | 4110 | Download |
| epoch-35-step-004795/ | 35 | 4795 | Download |
| epoch-36-step-005000/ | 36 | 5000 | Download |
| epoch-40-step-005480/ | 40 | 5480 | Download |
```python
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

# Select the checkpoint to use (e.g. epoch 40)
checkpoint_path = "{{repo_id}}/epoch-40-step-005480"

# Load the model
model = AutoModelForVision2Seq.from_pretrained(
    checkpoint_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Load the processor
processor = AutoProcessor.from_pretrained(
    checkpoint_path,
    trust_remote_code=True,
)

# Predict an action
image = Image.open("observation.jpg")
prompt = "In: What action should the robot take to pick up the object?\nOut:"
inputs = processor(prompt, image).to("cuda", dtype=torch.bfloat16)
action = model.predict_action(
    **inputs,
    unnorm_key="libero_spatial_no_noops",
    do_sample=False,
)
print(action)  # 7-DoF action vector
```
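The `unnorm_key` passed to `predict_action` selects which entry of the checkpoint's `dataset_statistics.json` is used to un-normalize the predicted action. If you are unsure which keys a checkpoint provides, a small sketch like the one below just prints the top-level keys of that file (it assumes the checkpoint has been downloaded locally, e.g. with the commands further down this card, and that the file is keyed by dataset name as in standard OpenVLA checkpoints):

```python
import json
from pathlib import Path

# Hypothetical local path from the download commands below.
stats_path = Path("./local-checkpoints/epoch-40-step-005480/dataset_statistics.json")
with stats_path.open() as f:
    stats = json.load(f)

# Each top-level key is a candidate value for unnorm_key,
# e.g. "libero_spatial_no_noops".
print(list(stats.keys()))
```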
```bash
# Evaluate the epoch-40 checkpoint on the LIBERO-Spatial suite
python experiments/robot/libero/run_libero_eval.py \
  --model_family openvla \
  --pretrained_checkpoint {{repo_id}}/epoch-40-step-005480 \
  --task_suite_name libero_spatial_no_noops \
  --center_crop False \
  --num_trials_per_task 50
```
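To compare success rates across training, the same evaluation command can be repeated for every checkpoint in the table above. A minimal sketch (assuming the eval script and flags shown above; the repo id placeholder must be replaced with this repository's actual id):

```python
import subprocess

repo_id = "{{repo_id}}"  # replace with this repository's id

# Checkpoint directories listed in the table above.
checkpoints = [
    "epoch-05-step-000685", "epoch-10-step-001370", "epoch-15-step-002055",
    "epoch-18-step-002500", "epoch-20-step-002740", "epoch-25-step-003425",
    "epoch-30-step-004110", "epoch-35-step-004795", "epoch-36-step-005000",
    "epoch-40-step-005480",
]

# Repeat the evaluation command above once per checkpoint.
for ckpt in checkpoints:
    subprocess.run(
        [
            "python", "experiments/robot/libero/run_libero_eval.py",
            "--model_family", "openvla",
            "--pretrained_checkpoint", f"{repo_id}/{ckpt}",
            "--task_suite_name", "libero_spatial_no_noops",
            "--center_crop", "False",
            "--num_trials_per_task", "50",
        ],
        check=True,
    )
```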
```bash
# Download a specific checkpoint
huggingface-cli download {{repo_id}} --include "epoch-40-step-005480/*" --local-dir ./local-checkpoints

# Or download all checkpoints
huggingface-cli download {{repo_id}} --local-dir ./local-checkpoints
```
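The same downloads can also be done from Python; a sketch using `huggingface_hub.snapshot_download`, with the same repo id placeholder and local directory as the CLI commands above:

```python
from huggingface_hub import snapshot_download

# Download only the epoch-40 checkpoint (drop allow_patterns to fetch everything).
snapshot_download(
    repo_id="{{repo_id}}",
    allow_patterns=["epoch-40-step-005480/*"],
    local_dir="./local-checkpoints",
)
```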
```
libero-spatial/
├── epoch-05-step-000685/
│   ├── config.json
│   ├── model-*.safetensors
│   ├── preprocessor_config.json
│   ├── tokenizer_config.json
│   └── dataset_statistics.json
├── epoch-10-step-001370/
│   └── ...
├── epoch-15-step-002055/
│   └── ...
... (other checkpoints)
```
```bash
pip install transformers torch pillow huggingface_hub
```
```bibtex
@article{kim2024openvla,
  title={OpenVLA: An Open-Source Vision-Language-Action Model},
  author={Kim, Moo Jin and Pertsch, Karl and Karamcheti, Siddharth and Xiao, Ted and Balakrishna, Ashwin and Nair, Suraj and Rafailov, Rafael and Foster, Ethan and Lam, Grace and Sanketi, Pannag and Nasiriany, Soroush and Liang, Zheyuan and Sadigh, Dorsa and Levine, Sergey and Liang, Percy},
  journal={arXiv preprint arXiv:2406.09246},
  year={2024}
}
```
MIT License