albertvillanova posted an update 5 months ago
smolagents v1.21.0 is here!
Now with improved safety in the local Python executor: dunder calls are blocked!
⚠️ Still not fully isolated: for untrusted code, use a remote executor instead (Docker, E2B, Wasm).
✨ Many bug fixes for more reliable code.
Release notes: https://github.com/huggingface/smolagents/releases/tag/v1.21.0
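For untrusted code, a remote executor keeps agent-generated Python off the host. A minimal sketch, assuming a recent smolagents with Docker running locally; the model choice and task are illustrative:

from smolagents import CodeAgent, InferenceClientModel, WebSearchTool

agent = CodeAgent(
    tools=[WebSearchTool()],
    model=InferenceClientModel(),  # hosted model via HF inference providers
    executor_type="docker",        # run generated code in a container, not locally
)

agent.run("How many seconds does light take to travel from the Sun to Earth?")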
albertvillanova posted an update 6 months ago
New in smolagents v1.20.0: Remote Python Execution via WebAssembly (Wasm)
We've just merged a major new capability into the smolagents framework: the CodeAgent can now execute Python code remotely in a secure, sandboxed WebAssembly environment!
Powered by Pyodide and Deno, the new WasmExecutor lets your agent-generated Python code run safely, without relying on Docker or local execution.
Why this matters:
✅ Isolated execution = no host access
✅ No need for Python on the user's machine
✅ Safer evaluation of arbitrary code
✅ Compatible with serverless / edge agent workloads
✅ Ideal for constrained or untrusted environments
This is just the beginning: a focused initial implementation with known limitations, and a solid MVP designed for secure, sandboxed use cases.
We're inviting the open-source community to help evolve this executor:
• Tackle more advanced Python features
• Expand compatibility
• Add test coverage
• Shape the next-gen secure agent runtime
Check out the PR: https://github.com/huggingface/smolagents/pull/1261
Let's reimagine what agent-driven Python execution can look like: remote-first, Wasm-secure, and community-built.
This feature is live in smolagents v1.20.0!
Try it out. Break things. Extend it. Give us feedback.
Let's build safer, smarter agents, together.
Release notes: https://github.com/huggingface/smolagents/releases/tag/v1.20.0
#smolagents #WebAssembly #Python #AIagents #Pyodide #Deno #OpenSource #HuggingFace #AgenticAI
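A minimal sketch of trying the new executor, assuming smolagents >= 1.20 with Deno installed and that the WasmExecutor is selected via the executor_type argument:

from smolagents import CodeAgent, InferenceClientModel

agent = CodeAgent(
    tools=[],
    model=InferenceClientModel(),
    executor_type="wasm",  # Pyodide/Deno sandbox: no Docker, no host access
)

agent.run("Compute the 20th Fibonacci number.")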
albertvillanova posted an update 6 months ago
SmolAgents v1.19.0 is live!
This release brings major improvements to agent flexibility, UI usability, streaming architecture, and developer experience, making it easier than ever to build smart, interactive AI agents. Here's what's new:
Agent Upgrades
- Support for managed agents in ToolCallingAgent
- Context manager support for cleaner agent lifecycle handling (see the sketch below)
- Output formatting now uses XML tags for consistency
UI Enhancements
- GradioUI now supports reset_agent_memory: perfect for fresh starts in dev & demos
Streaming Refactor
- Streaming event aggregation moved off the Model class, for better architecture & maintainability
Output Tracking
- CodeAgent outputs are now stored in ActionStep, giving more visibility and structure to agent decisions
Bug Fixes
- Smarter planning logic
- Cleaner Docker logs
- Better prompt formatting for additional_args
- Safer internal functions and final answer matching
Docs Improvements
- Added quickstart examples with tool usage
- One-click Colab launch buttons
- Expanded reference docs (AgentMemory, GradioUI docstrings)
- Fixed broken links and migrated to .md format
Full release notes: https://github.com/huggingface/smolagents/releases/tag/v1.19.0
Try it out, explore the new features, and let us know what you build!
#smolagents #opensource #AIagents #LLM #HuggingFace
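A minimal sketch of the new context-manager support, assuming smolagents >= 1.19; the idea is that executor resources (e.g., a Docker container) are released when the block exits:

from smolagents import CodeAgent, InferenceClientModel

with CodeAgent(tools=[], model=InferenceClientModel(), executor_type="docker") as agent:
    answer = agent.run("What is 2**16?")
# the sandboxed executor is cleaned up automatically on exit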
albertvillanova posted an update 7 months ago
New in smolagents v1.17.0:
- Structured generation in CodeAgent
- Streamable HTTP MCP support
- Agent.run() returns a rich RunResult
Smarter agents, smoother workflows.
Try it now: https://github.com/huggingface/smolagents/releases/tag/v1.17.0
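A minimal sketch of the richer run output, assuming smolagents >= 1.17; the return_full_result flag and the attribute names below are assumptions drawn from the release summary, so check the release notes:

from smolagents import CodeAgent, InferenceClientModel

agent = CodeAgent(tools=[], model=InferenceClientModel())
result = agent.run("What is the capital of France?", return_full_result=True)

print(result.output)       # the final answer
print(result.token_usage)  # aggregated usage metadata for the run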
albertvillanova posted an update 8 months ago
New in smolagents v1.16.0:
- Bing support in WebSearchTool
- Custom functions & executor_kwargs in LocalPythonExecutor
- Streaming GradioUI fixes
- Local web agents via api_base & api_key
- Better docs
Release notes: https://github.com/huggingface/smolagents/releases/tag/v1.16.0
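A minimal sketch of the two executor-side additions, assuming smolagents >= 1.16; the engine parameter name and the additional_functions keyword are assumptions based on the release summary:

from smolagents import WebSearchTool
from smolagents.local_python_executor import LocalPythonExecutor

search = WebSearchTool(engine="bing")  # Bing instead of the default engine (assumed parameter)

def greet(name: str) -> str:
    return f"Hello, {name}!"

# expose a custom Python function to agent-generated code
executor = LocalPythonExecutor(
    additional_authorized_imports=[],
    additional_functions={"greet": greet},  # assumed keyword, per release notes
)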
albertvillanova posted an update 9 months ago
smolagents v1.14.0 is out!
- MCPClient: a sleek new client for connecting to remote MCP servers, making integrations more flexible and scalable.
- Amazon Bedrock: native support for Bedrock-hosted models.
SmolAgents is now more powerful, flexible, and enterprise-ready.
Full release notes: https://github.com/huggingface/smolagents/releases/tag/v1.14.0
#smolagents #LLM #AgenticAI
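A minimal sketch of connecting to a remote MCP server, assuming a recent smolagents; the SSE URL is a placeholder:

from smolagents import CodeAgent, InferenceClientModel, MCPClient

with MCPClient({"url": "http://127.0.0.1:8000/sse"}) as tools:
    agent = CodeAgent(tools=tools, model=InferenceClientModel())
    agent.run("Answer the task using the MCP server's tools.")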
albertvillanova posted an update 10 months ago
New smolagents update: Safer Local Python Execution!
With the latest release, we've added security checks to the local Python interpreter: every evaluation is now analyzed for dangerous builtins, modules, and functions.
Here's why this matters and what you need to know.
1️⃣ Why is local execution risky? ⚠️
AI agents that run arbitrary Python code can unintentionally (or maliciously) access system files, run unsafe commands, or exfiltrate data.
2️⃣ New Safety Layer in smolagents
We now inspect every return value during execution:
✅ Allowed: safe built-in types (e.g., numbers, strings, lists)
❌ Blocked: dangerous functions/modules (e.g., os.system, subprocess, exec, shutil)
3️⃣ Immediate Benefits
- Prevent agents from accessing unsafe builtins
- Block unauthorized file or network access
- Reduce accidental security vulnerabilities
4️⃣ Security Disclaimer ⚠️
Despite these improvements, local Python execution is NEVER 100% safe.
If you need true isolation, use a remote sandboxed executor like Docker or E2B.
5️⃣ The Best Practice: Use Sandboxed Execution
For production-grade AI agents, we strongly recommend running code in a Docker or E2B sandbox to ensure complete isolation.
6️⃣ Upgrade Now & Stay Safe!
Check out the latest smolagents release and start building safer AI agents today.
https://github.com/huggingface/smolagents
What security measures do you take when running AI-generated code? Let's discuss!
#AI #smolagents #Python #Security
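A minimal sketch of the safety layer in action, assuming a recent smolagents; the hardened executor raises an interpreter error when generated code reaches for a dangerous module:

from smolagents.local_python_executor import LocalPythonExecutor

executor = LocalPythonExecutor(additional_authorized_imports=[])
executor.send_tools({})  # no tools; just the safety-checked interpreter

try:
    executor("import os; os.system('echo pwned')")
except Exception as err:
    print(f"Blocked: {err}")  # os is not an authorized import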
albertvillanova posted an update 10 months ago
Big news for AI agents! With the latest release of smolagents, you can now securely execute Python code in sandboxed Docker or E2B environments.
Here's why this is a game-changer for agent-based systems:
1️⃣ Security First
Running AI agents in unrestricted Python environments is risky! With sandboxing, your agents are isolated, preventing unintended file access, network abuse, or system modifications.
2️⃣ Deterministic & Reproducible Runs
By running agents in containerized environments, you ensure that every execution happens in a controlled and predictable setting: no more environment mismatches or dependency issues!
3️⃣ Resource Control & Limits
Docker and E2B allow you to enforce CPU, memory, and execution time limits, so rogue or inefficient agents don't spiral out of control.
4️⃣ Safer Code Execution in Production
Deploy AI agents confidently, knowing that any generated code runs in an ephemeral, isolated environment, protecting your host machine and infrastructure.
5️⃣ Easy to Integrate
With smolagents, you can simply configure your agent to use Docker or E2B as its execution backend: no need for complex security setups!
6️⃣ Perfect for Autonomous AI Agents
If your AI agents generate and execute code dynamically, this is a must-have to avoid security pitfalls while enabling advanced automation.
Get started now: https://github.com/huggingface/smolagents
What will you build with smolagents? Let us know!
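A minimal sketch of switching the execution backend, assuming a recent smolagents and an E2B account; set E2B_API_KEY in your environment before running:

from smolagents import CodeAgent, InferenceClientModel

agent = CodeAgent(
    tools=[],
    model=InferenceClientModel(),
    executor_type="e2b",  # ephemeral cloud sandbox; use "docker" for a local container
)

agent.run("Sum the integers from 1 to 1000.")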
albertvillanova posted an update 11 months ago
Introducing @huggingface Open Deep-Research!
In just 24 hours, we built an open-source agent that can:
✅ Autonomously browse the web
✅ Search, scroll & extract info
✅ Download & manipulate files
✅ Run calculations on data
It scores 55% on the GAIA validation set! Help us improve it!
https://huggingface.co/blog/open-deep-research
albertvillanova posted an update 12 months ago
Discover all the improvements in the new version of Lighteval: https://huggingface.co/docs/lighteval/
albertvillanova posted an update about 1 year ago
How green is your model? Introducing a new feature in the Comparator tool: Environmental Impact, for responsible #LLM research!
open-llm-leaderboard/comparator
Now you can compare models not only by performance, but also by their environmental footprint!
The Comparator calculates CO₂ emissions during evaluation and shows key model characteristics: evaluation score, number of parameters, architecture, precision, type...
Make informed decisions about your model's impact on the planet and join the movement towards greener AI!
albertvillanova posted an update about 1 year ago
New feature of the 🤗 Open LLM Leaderboard Comparator: now compare models with their base versions & derivatives (finetunes, adapters, etc.). Perfect for tracking how adjustments affect performance and seeing innovations in action. Dive deeper into the leaderboard!
Here's how to use it:
1. Select your model from the leaderboard.
2. Load its model tree.
3. Choose any base & derived models (adapters, finetunes, merges, quantizations) for comparison.
4. Press Load.
See side-by-side performance metrics instantly!
Ready to dive in? Try the 🤗 Open LLM Leaderboard Comparator now! See how models stack up against their base versions and derivatives to understand fine-tuning and other adjustments. Easier model analysis for better insights! Check it out here: open-llm-leaderboard/comparator
albertvillanova posted an update about 1 year ago
Exciting update! You can now compare multiple models side-by-side with the Hugging Face Open LLM Comparator!
open-llm-leaderboard/comparator
Dive into multi-model evaluations, pinpoint the best model for your needs, and explore insights across top open LLMs all in one place. Ready to level up your model comparison game?
albertvillanova posted an update about 1 year ago
Instruct-tuning impacts models differently across families! Qwen2.5-72B-Instruct excels on IFEval but struggles with MATH-Hard, while Llama-3.1-70B-Instruct avoids the MATH performance loss. Why? Can they follow the format in the examples? Compare the models:
open-llm-leaderboard/comparator
albertvillanova posted an update about 1 year ago
Finding the Best SmolLM for Your Project
Need an LLM assistant but unsure which #smolLM to run locally? With so many models available, how can you decide which one suits your needs best?
If the model you're interested in is evaluated on the Hugging Face Open LLM Leaderboard, there's an easy way to compare them: use the model Comparator tool: open-llm-leaderboard/comparator
Let's walk through an example.
Let's compare two solid options:
- Qwen2.5-1.5B-Instruct from Alibaba Cloud Qwen (1.5B params)
- gemma-2-2b-it from Google (2.5B params)
For an assistant, you want a model that's great at instruction following. So, how do these two models stack up on the IFEval task?
What about other evaluations?
Both models are close in performance on many other tasks, showing minimal differences. Surprisingly, the 1.5B Qwen model performs just as well as the 2.5B Gemma in many areas, even though it's smaller in size!
This is a great example of how parameter size isn't everything. With efficient design and training, a smaller model like Qwen2.5-1.5B can match or even surpass larger models in certain tasks.
Looking for other comparisons? Drop your model suggestions below!
albertvillanova posted an update about 1 year ago
We've just released a new tool to compare the performance of models in the 🤗 Open LLM Leaderboard: the Comparator
open-llm-leaderboard/comparator
Want to see how two different versions of LLaMA stack up? Let's walk through a step-by-step comparison of LLaMA-3.1 and LLaMA-3.2.
1/ Load the Models' Results
- Go to the 🤗 Open LLM Leaderboard Comparator: open-llm-leaderboard/comparator
- Search for "LLaMA-3.1" and "LLaMA-3.2" in the model dropdowns.
- Press the Load button. Ready to dive into the results!
2/ Compare Metric Results in the Results Tab
- Head over to the Results tab.
- Here, you'll see the performance metrics for each model, color-coded with a gradient to highlight performance differences: greener is better!
- Want to focus on a specific task? Use the Task filter to hone in on comparisons for tasks like BBH or MMLU-Pro.
3/ Check Config Alignment in the Configs Tab
- To ensure you're comparing apples to apples, head to the Configs tab.
- Review both models' evaluation configurations, such as metrics, datasets, prompts, few-shot configs...
- If something looks off, it's good to know before drawing conclusions!
4/ Compare Predictions by Sample in the Details Tab
- Curious about how each model responds to specific inputs? The Details tab is your go-to!
- Select a Task (e.g., MuSR), then a Subtask (e.g., Murder Mystery), and press the Load Details button.
- Check out the side-by-side predictions and dive into the nuances of each model's outputs.
5/ With this tool, it's never been easier to explore how small changes between model versions affect performance on a wide range of tasks. Whether you're a researcher or an enthusiast, you can instantly visualize improvements and dive into detailed comparisons.
Try the 🤗 Open LLM Leaderboard Comparator now and take your model evaluations to the next level!
albertvillanova posted an update over 1 year ago
Check out the new Structured #Wikipedia dataset by Wikimedia Enterprise: abstract, infobox, structured sections, main image,...
Currently in early beta (English & French). Explore it and give feedback: wikimedia/structured-wikipedia
More info: https://enterprise.wikimedia.com/blog/hugging-face-dataset/
@sdelbecque @resquito-wmf
albertvillanova updated 2 datasets over 1 year ago