---
title: Benchmark Environment Server
emoji: 🎸
colorFrom: red
colorTo: red
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
---
# Benchmark Environment
A simple test environment that echoes messages back to the client. Perfect for testing the environment APIs as well as demonstrating common environment usage patterns.
## Quick Start
The simplest way to use the Benchmark environment is through the `BenchmarkEnv` class:
```python
from benchmark import BenchmarkAction, BenchmarkEnv

try:
    # Create environment from Docker image
    benchmarkenv = BenchmarkEnv.from_docker_image("benchmark-env:latest")

    # Reset
    result = benchmarkenv.reset()
    print(f"Reset: {result.observation.echoed_message}")

    # Send multiple messages
    messages = ["Hello, World!", "Testing echo", "Final message"]
    for msg in messages:
        result = benchmarkenv.step(BenchmarkAction(message=msg))
        print(f"Sent: '{msg}'")
        print(f"  → Echoed: '{result.observation.echoed_message}'")
        print(f"  → Length: {result.observation.message_length}")
        print(f"  → Reward: {result.reward}")
finally:
    # Always clean up
    benchmarkenv.close()
```
That's it! The `BenchmarkEnv.from_docker_image()` method handles:
- Starting the Docker container
- Waiting for the server to be ready
- Connecting to the environment
- Container cleanup when you call `close()`
## Building the Docker Image
Before using the environment, you need to build the Docker image:
```bash
# From project root
docker build -t benchmark-env:latest -f server/Dockerfile .
```
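To try the image outside of `BenchmarkEnv.from_docker_image()`, you can also run the container by hand. A quick sketch, assuming the server listens on port 8000 inside the container (the port used throughout the examples below):

```bash
# Start the freshly built image and publish the server port
docker run -d --name benchmark-env -p 8000:8000 benchmark-env:latest

# Probe the health endpoint (see "Deploying to Hugging Face Spaces" below)
curl http://localhost:8000/health

# Remove the container when finished
docker rm -f benchmark-env
```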
## Deploying to Hugging Face Spaces
You can easily deploy your OpenEnv environment to Hugging Face Spaces using the `openenv push` command:
```bash
# From the environment directory (where openenv.yaml is located)
openenv push
# Or specify options
openenv push --namespace my-org --private
```
The `openenv push` command will:
1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
2. Prepare a custom build for Hugging Face Docker space (enables web interface)
3. Upload to Hugging Face (ensuring you're logged in)
### Prerequisites
- Authenticate with Hugging Face: The command will prompt for login if not already authenticated
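If you prefer to log in ahead of time, the standard Hugging Face CLI login works (assuming the `huggingface_hub` CLI is installed):

```bash
huggingface-cli login
```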
### Options
- `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory)
- `--repo-id`, `-r`: Repository ID in format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
- `--base-image`, `-b`: Base Docker image to use (overrides Dockerfile FROM)
- `--private`: Deploy the space as private (default: public)
### Examples
```bash
# Push to your personal namespace (defaults to username/env-name from openenv.yaml)
openenv push
# Push to a specific repository
openenv push --repo-id my-org/my-env
# Push with a custom base image
openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest
# Push as a private space
openenv push --private
# Combine options
openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
```
After deployment, your space will be available at:
`https://huggingface.co/spaces/<repo-id>`
The deployed space includes:
- **Web Interface** at `/web` - Interactive UI for exploring the environment
- **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
- **Health Check** at `/health` - Container health monitoring
- **WebSocket** at `/ws` - Persistent session endpoint for low-latency interactions
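Once the space is live you can point the HTTP client at it directly. A minimal sketch; the hostname below follows the usual Spaces direct-URL pattern (`https://<namespace>-<name>.hf.space`) and is an assumption, so copy the real URL from your space's settings:

```python
from benchmark import BenchmarkAction, BenchmarkEnv

# Hypothetical direct URL for a space deployed as my-org/benchmark-env
env = BenchmarkEnv(base_url="https://my-org-benchmark-env.hf.space")

result = env.reset()
result = env.step(BenchmarkAction(message="Hello from Spaces!"))
print(result.observation.echoed_message)
```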
## Environment Details
### Action
**BenchmarkAction**: Contains a single field
- `message` (str) - The message to echo back
### Observation
**BenchmarkObservation**: Contains the echo response and metadata
- `echoed_message` (str) - The message echoed back
- `message_length` (int) - Length of the message
- `reward` (float) - Reward based on message length (length × 0.1)
- `done` (bool) - Always False for echo environment
- `metadata` (dict) - Additional info like step count
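Put together, a single step looks like this (a sketch; the exact `metadata` keys are whatever the server reports):

```python
from benchmark import BenchmarkAction, BenchmarkEnv

env = BenchmarkEnv.from_docker_image("benchmark-env:latest")
try:
    env.reset()
    result = env.step(BenchmarkAction(message="Hi"))
    obs = result.observation
    print(obs.echoed_message)  # "Hi"
    print(obs.message_length)  # 2
    print(obs.done)            # False (the echo environment never terminates)
    print(obs.metadata)        # e.g. a step counter
finally:
    env.close()
```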
### Reward
The reward is calculated as: `message_length × 0.1`
- "Hi" → reward: 0.2
- "Hello, World!" → reward: 1.3
- Empty message → reward: 0.0
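The rule is trivial to mirror client-side, which is handy for writing assertions in tests. A minimal sketch:

```python
def expected_reward(message: str) -> float:
    # Mirrors the documented rule: message_length × 0.1
    return len(message) * 0.1

assert abs(expected_reward("Hi") - 0.2) < 1e-9
assert abs(expected_reward("Hello, World!") - 1.3) < 1e-9
assert expected_reward("") == 0.0
```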
## Advanced Usage
### Connecting to an Existing Server
If you already have a Benchmark environment server running, you can connect directly:
```python
from benchmark import BenchmarkAction, BenchmarkEnv

# Connect to existing server
benchmarkenv = BenchmarkEnv(base_url="<ENV_HTTP_URL_HERE>")

# Use as normal
result = benchmarkenv.reset()
result = benchmarkenv.step(BenchmarkAction(message="Hello!"))
```
Note: When connecting to an existing server, `benchmarkenv.close()` will NOT stop the server.
### WebSocket Client for Persistent Sessions
For long-running episodes or when you need lower latency, use the WebSocket client:
```python
from benchmark import BenchmarkAction, BenchmarkEnvWS

# Connect via WebSocket (maintains persistent connection)
with BenchmarkEnvWS(base_url="http://localhost:8000") as env:
    result = env.reset()
    print(f"Reset: {result.observation.echoed_message}")

    # Multiple steps with low latency
    for msg in ["Hello", "World", "!"]:
        result = env.step(BenchmarkAction(message=msg))
        print(f"Echoed: {result.observation.echoed_message}")
```
WebSocket advantages:
- **Lower latency**: No HTTP connection overhead per request
- **Persistent session**: Server maintains your environment state
- **Efficient for episodes**: Better for many sequential steps
### Concurrent WebSocket Sessions
The server supports multiple concurrent WebSocket connections. To enable this,
modify `server/app.py` to use factory mode:
```python
# In server/app.py - use factory mode for concurrent sessions
app = create_app(
    BenchmarkEnvironment,  # Pass class, not instance
    BenchmarkAction,
    BenchmarkObservation,
    max_concurrent_envs=4,  # Allow 4 concurrent sessions
)
```
Then multiple clients can connect simultaneously:
```python
from concurrent.futures import ThreadPoolExecutor

from benchmark import BenchmarkAction, BenchmarkEnvWS

def run_episode(client_id: int):
    with BenchmarkEnvWS(base_url="http://localhost:8000") as env:
        result = env.reset()
        for i in range(10):
            result = env.step(BenchmarkAction(message=f"Client {client_id}, step {i}"))
        return client_id, result.observation.message_length

# Run 4 episodes concurrently
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(run_episode, range(4)))
```
## Development & Testing
### Direct Environment Testing
Test the environment logic directly without starting the HTTP server:
```bash
# From the environment root (the directory containing server/)
python3 server/benchmark_environment.py
```
This verifies that:
- Environment resets correctly
- Step executes actions properly
- State tracking works
- Rewards are calculated correctly
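You can also exercise the logic from your own test file. A sketch only, assuming `BenchmarkEnvironment` exposes the same `reset()`/`step()` interface used in `server/app.py` above and returns observations with the fields documented earlier (import paths may need adjusting to your layout):

```python
from models import BenchmarkAction                             # assumed import path
from server.benchmark_environment import BenchmarkEnvironment  # assumed import path

env = BenchmarkEnvironment()
env.reset()

obs = env.step(BenchmarkAction(message="Hi"))
assert obs.echoed_message == "Hi"    # echo behavior
assert abs(obs.reward - 0.2) < 1e-9  # message_length × 0.1
assert obs.done is False             # echo environment never terminates
```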
### Running Locally
Run the server locally for development:
```bash
uvicorn server.app:app --reload
```
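With the dev server running, the same endpoints listed in the deployment section are available locally, so you can sanity-check it over plain HTTP (assuming uvicorn's default port 8000):

```bash
# Liveness probe (same /health endpoint the deployed space exposes)
curl http://localhost:8000/health

# OpenAPI/Swagger docs are served at http://localhost:8000/docs
```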
## Project Structure
```
benchmark/
├── .dockerignore                  # Docker build exclusions
├── __init__.py                    # Module exports
├── README.md                      # This file
├── openenv.yaml                   # OpenEnv manifest
├── pyproject.toml                 # Project metadata and dependencies
├── uv.lock                        # Locked dependencies (generated)
├── client.py                      # BenchmarkEnv (HTTP) and BenchmarkEnvWS (WebSocket) clients
├── models.py                      # Action and Observation models
└── server/
    ├── __init__.py                # Server module exports
    ├── benchmark_environment.py   # Core environment logic
    ├── app.py                     # FastAPI application (HTTP + WebSocket endpoints)
    └── Dockerfile                 # Container image definition
```