This project looks very cool - I've been trying to build something similar in a few different ways (https://github.com/simonw/denobox is my most recent attempt) but this is way ahead of where I've got, especially given its support for shell scripting.
I'm sad about this bit though:
> Python code is MIT. The WASM binary is proprietary—you can use it with this package but can't extract or redistribute it separately.
Thanks Simon! Denobox looks very cool: Deno's permissions model is a natural fit for this.
On the licensing: totally fair point. Our intention is to open source the WASM too. The binary is closed for now only because we need to clean up the source code before releasing it as open-source. The Python SDK and capability layer are MIT. We wanted to ship something usable now rather than wait. Since the wasm binary runs in wasmtime within an open source harness, it is possible to audit everything going in and out of the wasm blob for security.
Genuinely open to feedback on this. If the split license is a blocker for your use cases, that's useful signal for us.
That's great to hear. The split license is a blocker for me because I build open source tools for other people to use, so I need to be sure that all of my dependencies are things I can freely redistribute to others.
Makes total sense. We'll prioritize getting the WASM source out. This is good signal that it matters. Will ping you when it's up!
Small suggestion: push an alpha to PyPI ASAP mainly to preserve your name there but also to make it more convenient for people to try out with `uv`.
I posted this elsewhere in the thread, and don't want to spam it everywhere (or take away from Amla!), but you might be interested in eryx [1] - the Python bindings [2] get you a similar Python-in-Python sandbox based on a WASI build of CPython (props to the componentize-py [3] people)!
[1]: https://github.com/sd2k/eryx/
[2]: https://pypi.org/project/pyeryx/
[3]: https://github.com/bytecodealliance/componentize-py/
That's really cool.
% uv run --with pyeryx python
Installed 1 package in 1ms
Python 3.14.0 (main, Oct 7 2025, 16:07:00) [Clang 20.1.4 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import eryx
>>> sandbox = eryx.Sandbox()
>>> result = sandbox.execute('''
... print("Hello from the sandbox!")
... x = 2 + 2
... print(f"2 + 2 = {x}")
... ''')
>>> result
ExecuteResult(stdout="Hello from the sandbox!\n2 + 2 = 4", duration_ms=6.83, callback_invocations=0, peak_memory_bytes=Some(16384000))
>>> sandbox.execute('''
... import sqlite3
... print(sqlite3.connect(":memory:").execute("select sqlite_version()").fetchall())
... ''').stdout
Traceback (most recent call last):
  File "<python-input-6>", line 1, in <module>
    sandbox.execute('''
    ~~~~~~~~~~~~~~~^^^^
    import sqlite3
    ^^^^^^^^^^^^^^
    print(sqlite3.connect(":memory:").execute("select sqlite_version()").fetchall())
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ''').stdout
    ^^^^
eryx.ExecutionError: Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<string>", line 125, in _eryx_exec
  File "<user>", line 2, in <module>
  File "/python-stdlib/sqlite3/__init__.py", line 57, in <module>
    from sqlite3.dbapi2 import *
  File "/python-stdlib/sqlite3/dbapi2.py", line 27, in <module>
    from _sqlite3 import *
ModuleNotFoundError: No module named '_sqlite3'
Any chance you could add SQLite?
Filed a feature request here: https://github.com/eryx-org/eryx/issues/28
It looks like there's no mechanism yet in the Python bindings for exposing callback functions to the sandboxed code - it exists in the Rust library and Python has an ExecuteResult.callback_invocations counter, so presumably this is coming soon?
Good call, yes, I'll get that added soon!
Simon - would love if you could take a look at Localsandbox (https://github.com/coplane/localsandbox) - it was partly inspired by your Pyodide post!
I tried it (really like the API design) but ran into a blocker:
uv run --with localsandbox python -c '
from localsandbox import LocalSandbox
with LocalSandbox() as sandbox:
    result = sandbox.bash("echo hi")
    print(result.stdout)
'
Gave me:
Traceback (most recent call last):
  File "<string>", line 5, in <module>
    result = sandbox.bash("echo hi")
  File "/Users/simon/.cache/uv/archive-v0/spFCEHagkq3VTpTyStT-Z/lib/python3.14/site-packages/localsandbox/core.py", line 492, in bash
    raise SubprocessCrashed(
    ...<2 lines>...
    )
localsandbox.exceptions.SubprocessCrashed: Node subprocess crashed: error: Failed reading lockfile at '/Users/simon/.cache/uv/archive-v0/spFCEHagkq3VTpTyStT-Z/lib/python3.14/site-packages/localsandbox/shim/deno.lock'
Caused by:
  Unsupported lockfile version '5'. Try upgrading Deno or recreating the lockfile
Actually that was with Deno 2.2.10 - I ran "brew upgrade deno" and got Deno 2.6.7 and now it works!
It looks like it currently defaults to allowing networking so it can load Pyodide from npm. My preference is a sandbox with no network access at all and access only to specific files that I can configure.
Thanks for taking a look and the feedback! We run the shim with internet access (https://github.com/coplane/localsandbox/blob/main/localsandb...) but the pyodide sandbox itself doesn't run with internet access: https://github.com/coplane/localsandbox/blob/main/localsandb...
Oh neat, thanks - I'd missed that.
https://github.com/bytecodealliance/ComponentizeJS is a Bytecode Alliance project which can run JS in a SpiderMonkey-based runtime as a Wasm component
I really like the capability enforcement model, it's a great concept. One thing this discussion is missing though is the ecosystem layer. Sandboxing solves execution safety, but there's a parallel problem: how do agents discover and compose tools portably across frameworks? Right now every framework has its own tool format and registry (or none at all). WASM's component model actually solves this — you get typed interfaces (WIT), language interop, and composability for free. I've been building a registry and runtime (also based on wasmtime!) for this: components written in any language, published to a shared registry, runnable locally or in the cloud. Sandboxes like amla-sandbox could be a consumer of these components. https://asterai.io/why
The ecosystem layer is a hard but very important problem to solve. Right now we define tools in Python on the host side, but I see a clear path to WIT-defined components. The registry of portable tools is very compelling.
Will check out asterai, thanks for sharing!
Exposing tools to the AI as shell commands works pretty well? There are many standards to choose from for the actual network API.
Shell commands work for individual tools, but you lose composability. If you want to chain components that share a sandboxed environment - say, adding a tracing component alongside an OTP confirmation layer that gates sensitive actions - you need a shared runtime and typed interfaces. That's the layer I'm building with asterai: a standard substrate so components compose without glue code. Plus, having a central ecosystem lets you add features like traceability with almost one-click simplicity. Of course, this only wins long term if WASM wins.
How does the AI compose tools? Asking it to write a script in some language that both you and the AI know seems like a pretty natural approach. It helps if there's an ecosystem of common libraries available, and that's not so easy to build.
I'm pretty happy with Typescript.
In my example above I wasn't referring to AI composing the tools, but to you as the agent builder composing the tool-call workflow. So I suppose we can call it AI-time composition vs build-time composition.
For example, say you have a shell script to make a bank transfer. This just makes an API call to your bank.
You can't trust the AI to reliably make a call to your traceability tool, and then to your OTP confirmation gate, and only then to proceed with the bank transfer. This will eventually fail and be compromised.
If you're running your agent on a "composable tool runtime", rather than raw shell for tool calls, you can easily make it so the "transfer $500 to Alice" call always goes through the route trace -> confirm OTP -> validate action. This is configured at build time.
Your alternative with raw shell would be to program the tool itself to follow this workflow, but then you'd end up with a lot of duplicate source code if you have the same workflow for different tool calls.
Of course, any AI agent SDK will let you configure these workflows. But they are locked to their own ecosystems, it's not a global ecosystem like you can achieve with WASM, allowing for interop between components written in any language.
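To make the build-time composition concrete, here's a minimal Python sketch of the kind of wiring I mean - the names (trace, confirm_otp, compose, transfer_tool) are made up for illustration, not asterai's actual API:

# Hypothetical sketch: the "transfer" tool the agent sees is really a pipeline
# fixed by the agent builder, so the model cannot skip tracing or OTP confirmation.
ToolCall = dict  # e.g. {"tool": "bank_transfer", "amount": 500, "to": "Alice"}

def trace(call: ToolCall) -> ToolCall:
    print(f"[trace] {call}")          # audit record emitted before anything runs
    return call

def confirm_otp(call: ToolCall) -> ToolCall:
    answer = input(f"Approve {call}? Enter OTP: ")   # stands in for a real OTP check
    if not answer.strip():
        raise PermissionError("OTP confirmation failed")
    return call

def bank_transfer(call: ToolCall) -> str:
    return f"transferred ${call['amount']} to {call['to']}"   # the real bank API call goes here

def compose(*stages):
    def pipeline(call: ToolCall):
        for stage in stages[:-1]:
            call = stage(call)
        return stages[-1](call)
    return pipeline

# The agent is only ever handed the composed pipeline, never bank_transfer directly,
# so the route trace -> confirm OTP -> transfer is decided at build time.
transfer_tool = compose(trace, confirm_otp, bank_transfer)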
Sure, but every tool that you provide access to is a potential escape hatch from the sandbox. It's safer to run everything inside the sandbox, including the called tools.
That's definitely true. Our model assumes tools run outside the sandbox on a trusted host—the sandbox constrains which tools can be called and with what parameters. The reason for this is most "useful" tools are actually just some API call over the network (MCP, REST API, etc.). Then you need to get credentials and network access into the sandbox, which opens its own attack surface. We chose to keep credentials on the host and let the sandbox act as a policy enforcement layer: agents can only invoke what you've explicitly exposed, with the constraints you define.
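To make "policy enforcement layer" concrete, here's a rough sketch of the idea - illustrative names only (ToolPolicy, handle_tool_call), not our actual SDK API:

# Illustrative host-side capability check: the agent's code can request tool calls,
# but the host only executes ones that match an explicitly granted policy.
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class ToolPolicy:
    func: Callable[..., Any]   # runs on the trusted host, with host-held credentials
    allowed_params: dict[str, Callable[[Any], bool]] = field(default_factory=dict)

    def validate(self, params: dict[str, Any]) -> None:
        for name, value in params.items():
            check = self.allowed_params.get(name)
            if check is None or not check(value):
                raise PermissionError(f"parameter {name}={value!r} not permitted")

# Only what you register is callable, and only within the constraints you define.
POLICIES = {
    "send_email": ToolPolicy(
        func=lambda to, body: f"sent to {to}",   # stand-in for the real API call
        allowed_params={
            "to": lambda v: isinstance(v, str) and v.endswith("@example.com"),
            "body": lambda v: isinstance(v, str) and len(v) < 10_000,
        },
    ),
}

def handle_tool_call(name: str, params: dict[str, Any]) -> Any:
    policy = POLICIES.get(name)
    if policy is None:
        raise PermissionError(f"tool {name!r} is not exposed to the sandbox")
    policy.validate(params)
    return policy.func(**params)

Since func runs on the host, credentials never enter the sandbox at all.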
This is great!
While I think their current choice of runtime will hit some limitations (i.e. not really full Python support, partial JS support), I strongly believe using Wasm for sandboxing is the way forward for containers.
At Wasmer we are working hard to make this model work. I'm incredibly happy to see more people joining on the quest!
Hi, if you like the idea of Wasm sandboxing you might be interested in what we are working on: BrowserPod :-)
https://labs.leaningtech.com/blog/browserpod-beta-announceme...
https://browserpod.io
BrowserPod is great, I've been following it for a bit. Keep up the good work!
The main issue that I see with BrowserPod is very similar to Emscripten's: it's designed to work mainly in the browser, and not outside.
In my view, where Wasm really shines is in enabling containers that work seamlessly in any of these environments: browsers, servers, or even embedded in apps :)
It is true that BrowserPod is currently focused on browser environments, but there is nothing preventing the technology from running natively as well. It would require some work, but nothing truly challenging :-)
Appreciate your support! We deliberately chose a limited runtime (quickjs + some shell applets). The tool parameter constraint enforcement was more important to us than language completeness. For agent tool calling, you don't really need NumPy and Pandas.
Wasmer is doing great work—we're using wasmtime on the host side currently but have been following your progress. Excited to see WASM sandboxing become more mainstream for this use case.
> For agent tool calling, you don't really need NumPy and Pandas.
That's true, but you'll likely need sockets, pydantic, or SQLAlchemy (all of them require heavy support on the Wasm layer!)
Fair point. We get around this by "yielding" back from the Wasm runtime (in a coroutine style) so that the "host" can do network calls or other IO on behalf of the Wasm runtime. But it would be great to do this natively within Wasm!
Might be worth taking a look at WASIX [1]
We implemented all the system calls necessary to make networking work (within Wasm), and dynamic linking (so you could import and run pydantic, numpy, gevent and more!)
[1] https://wasix.org/
We will take a look! Thanks for sharing. Dynamic linking to run pydantic/numpy/etc. would be huge!
Sharing our version of this built on just-bash, AgentFS, and Pyodide: https://github.com/coplane/localsandbox
One nice thing about using AgentFS as the VFS is that it's backed by sqlite so it's very portable - making it easy to fork and resume agent workflows across machines / time.
I really like Amla Sandbox addition of injecting tool calls into the sandbox, which lets the agent generated code interact with the harness provided tools. Very interesting!
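To illustrate why a sqlite-backed VFS makes fork/resume cheap, here's a rough sketch using just the stdlib sqlite3 backup API (AgentFS's real interface is different; this only shows the one-file-snapshot property):

# Illustration only: "forking" a sandbox whose entire filesystem/state lives in
# one sqlite database is just taking a consistent copy of that database.
import sqlite3

def fork_state(src_path: str, dst_path: str) -> None:
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    src.backup(dst)                 # consistent snapshot even while src is in use
    dst.close()
    src.close()

# Write some "agent state", fork it, and resume from the copy.
conn = sqlite3.connect("agent.db")
conn.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, data BLOB)")
conn.execute("INSERT OR REPLACE INTO files VALUES ('notes.txt', ?)", (b"hello",))
conn.commit()
conn.close()

fork_state("agent.db", "agent-fork.db")        # copy the state...

fork = sqlite3.connect("agent-fork.db")        # ...and resume from it on any machine
print(fork.execute("SELECT path FROM files").fetchall())   # [('notes.txt',)]
fork.close()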
Thanks for sharing localsandbox! sqlite-backed VFS for fork and resume workflows is very interesting.
Cool to see more projects in this space! I think Wasm is a great way to do secure sandboxing here. How does Amla handle commands like grep/jq/curl etc which make AI agents so effective at bash but require recompilation to WASI (which is kinda impractical for so many projects)?
I've been working on a couple of things which take a very similar approach, with what seem to be some different tradeoffs:
- eryx [1], which uses a WASI build of CPython to provide a true Python sandbox (similar to componentize-py but supports some form of 'dynamic linking' with either pure Python packages or WASI-compiled native wheels)
- conch [2], which embeds the `brush` Rust reimplementation of Bash to provide a similar bash sandbox. This is where I've been struggling with figuring out the best way to do subcommands; right now they just have to be rewritten and compiled in but I'd like to find a way to dynamically link them in similar to the Python package approach...
One other note, WASI's VFS support has been great, I just wish there was more progress on `wasi-tls`, it's tricky to get network access working otherwise...
[1] https://github.com/eryx-org/eryx [2] https://github.com/sd2k/conch
Great question. We cheated a bit; we didn't compile the GNU coreutils to wasm. Instead, we have Rust reimplementations of common shell commands. It allows us to focus on the use cases agents actually care about instead of reimplementing all of the corner cases exactly.
For `jq` specifically we use the excellent `jaq_interpret` crate: https://crates.io/crates/jaq-interpret
curl is interesting. We don't include it currently but we could do it without too much additional effort.
Networking isn't done within the wasm sandbox; we "yield" back to the caller using what we call "host operations" in order to perform any IO. This keeps the Wasm sandbox minimal and as close to "pure compute" as possible. In fact, the only capabilities we give the WASI runtime are a way to get the current time and to generate random numbers. Since we intercept all external IO, random number generation, and time, the Wasm runtime is just pure computation, so we also get perfect reproducibility: we can replay anything within the sandbox exactly.
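A toy sketch of the yield-back pattern, with a Python generator standing in for the Wasm guest - the operation names here are invented for illustration, not our real host-operation protocol:

# Toy model of "host operations": the guest never does IO itself, it yields a
# request and the host performs it. Recording the responses is enough to replay
# the run deterministically later.
def guest_program():
    # Stand-in for code running inside the Wasm sandbox.
    body = yield ("http_get", "https://example.com/health")
    now = yield ("clock_now", None)
    return f"fetched {len(body)} bytes at {now}"

def run(host_handlers, replay=None):
    gen = guest_program()
    op = next(gen)                        # first host operation requested by the guest
    log = []
    try:
        while True:
            kind, arg = op
            if replay is not None:
                result = replay.pop(0)    # replay mode: reuse recorded responses, do no IO
            else:
                result = host_handlers[kind](arg)
            log.append(result)
            op = gen.send(result)
    except StopIteration as done:
        return done.value, log

handlers = {
    "http_get": lambda url: b"ok",                # the real host would do the network call here
    "clock_now": lambda _: "2026-01-01T00:00:00Z",
}

value, recorded = run(handlers)
replayed, _ = run({}, replay=list(recorded))      # same result, zero IO performed
print(value == replayed)                          # True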
Your approach with brush is interesting. Having actual bash semantics rather than "bash-like" is a real advantage for complex scripts. The dynamic linking problem for subcommands is a tough one; have you looked at WASI components for this? Feels like that's where it'll eventually land but the tooling isn't there yet.
Will check out eryx and conch. Thanks for sharing!
Hah, that is exactly the same approach I landed on. Fortunately the most common tools either seem to have Rust ports or it's fairly easy to port 80% of the functionality! Conch's Wasm file is around 3.5 MB and only has a few tools though, so I can see it growing. I think for the places where size really matters (e.g. the web) it should be possible to split it using the component model and `jco` (which I think splits Wasm components into modules along interface boundaries, and could defer loading of unused modules), but I haven't got that far yet.
I did something very similar to you for networking in eryx too (no networking in conch yet); defined an `eryx:net` interface in WIT and reimplemented the `urllib` module using host networking, which most downstream packages (httpx, requests, etc) use far enough down the stack. It's a tradeoff but I think it's pretty much good enough for most use cases like this, and gives the host full control which is great.
Oh full transparency, the vast majority of conch and eryx were written by Opus 4.5. Being backed by wasmtime and the rather strict Rust compiler is definitely a boon here!
The Opus 4.5 confession is great, haha. We have found Claude Code + Opus 4.5 + Rust with miri/cargo-deny/cargo-check/cargo-fmt + Python with strict type checking/pedantic lint rules/comprehensive test suites to be a winning combination. It makes AI-assisted development surprisingly viable for systems work.
Good to see that you chose a similar path for networking in eryx!
This is cool, but I had imagined something like a pure Typescript library that can run in a browser.
Sounds like just-bash: https://github.com/vercel-labs/just-bash
I had the same idea, forcing the agent to execute code inside a WASM instance, and I've developed a few proof of concepts over the past few weeks. The latest solution I adopted was to provide a WASM instance as a sandbox and use MCP to supply the tool calls to the agent. However, it hasn't seemed flexible enough for all use cases to me. On top of that, there's also the issue of supporting the various possible runtimes.
Interesting! What use cases felt too constrained? We've been mostly focused on "agent calls tools with parameters". Curious where you hit flexibility limits.
Would love to see your MCP approach if you've published it anywhere.
Really appreciate the pragmatic approach here. The 11MB vs 173MB difference with agentvm highlights an important tradeoff: sometimes you don't need full Linux compatibility if you can constrain the problem space well enough. The tool-calling validation layer seems like the sweet spot between safety and practical deployment.
Is a WASM sandbox as secure as a container or VM?
If I had to rank these, in order of least to most secure, it would be container < VM < WASM.
WASM has:
- Bounds checked linear memory
- No system calls except what you explicitly grant via WASI
- Much smaller attack surface
VMs have:
- Hardware isolation, separate kernel
- May have hypervisor bugs leading to VM escape (rare in practice though)
Some problems with containers:
- Shared host kernel (kernel exploit = escape)
- Seccomp/AppArmor/namespaces reduce attack surface but don't eliminate it
- Larger attack surface (full syscall interface)
- Container escapes are a known class of vulnerability
In theory it's more secure. Containers and VMs run on real hardware, containers usually even on the real kernel (unless you use something like Kata). WASM doesn't have any system interface by default, you have full control over what it accesses. So it's similar to JVM for example.
Docker and VMs are not the only options though... you can use bubblewrap and other equivalents for Mac
True. bubblewrap and similar (Landlock, sandbox-exec on Mac) are solid lightweight options. The main difference is they still expose a syscall interface that you then restrict, vs WASM where capabilities are opt-in from zero. Different starting points, similar goals.
One advantage of building the sandbox in wasm, aside from the security benefits, is complete execution reproducibility. amla-sandbox controls all external side effects, leaving the wasm core as just "pure computation", which makes recording traces and replaying them very easy. It's great for debugging complex workflows.
The readme exaggerates the threat of agents shelling out and glosses over a serious drawback of its own. On the shelling-out side, it says "One prompt injection and you're done." Well, you can run a lot of these agents in a container, and I do. So maybe you're not "done". It's also rare enough that this warning feels exaggerated - Claude Code has a yolo mode, and outside of that it has a pretty good permission system. On glossing over the drawback: "The WASM binary is proprietary—you can use it with this package but can't extract or redistribute it separately." And who is Amla Labs? FWIW the first commit is in 2026 and the license is in 2025.
Fair points.
On containers: yes, running in Docker/Firecracker works. The "one prompt injection and you’re done" framing is hyperbolic for containerized setups. The pitch is more relevant for people running agents in their local environment without isolation, or who want something lighter than spinning up containers per execution.
On the licensing: completely valid concern. We are a new company (just two cofounders right now) and the binary is closed for now only because we need to clean up the source code before releasing it as open-source. The Python SDK and capability layer are MIT.
I get that "trust us" isn’t compelling for a security product from an unknown entity, but since the Wasm binary runs within wasmtime (one of the most popular Wasm runtimes) and you can audit everything going in and out of it, the security story should hopefully be more palatable while we work on open sourcing the Wasm core.
The 2025/2026 date discrepancy is just me being sloppy with the license
This is really awesome. I want to give my agent access to basic coding tools to do text manipulation, add up numbers, etc, but I want to keep a tight lid on it. This seems like a great way to add that functionality!
Thanks! That’s exactly the use case we built this for
> What you don't get: ...GPU access...
So no local models are supported.
The sandbox doesn't run models; it runs agent-generated code and constrains tool calls. The model runs wherever you want (OpenAI, Anthropic, local Ollama, whatever).
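Roughly, the loop looks like this, with the model entirely outside the sandbox (illustrative names, not our actual SDK):

# Illustrative agent loop: the model runs remotely (or locally via Ollama);
# only the code it produces runs inside the sandbox.
def agent_loop(llm_generate, sandbox_execute, task: str, max_turns: int = 5):
    transcript = [f"Task: {task}"]
    for _ in range(max_turns):
        code = llm_generate("\n".join(transcript))   # OpenAI/Anthropic/Ollama call, on the host
        result = sandbox_execute(code)               # WASM sandbox, no ambient authority
        transcript.append(f"Ran:\n{code}\nOutput:\n{result}")
        if "DONE" in result:                         # toy stop condition for the sketch
            return result
    return transcript[-1]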
That's great - I am definitely using this.
From the README:
> Security model
> The sandbox runs inside WebAssembly with WASI for a minimal syscall interface. WASM provides memory isolation by design—linear memory is bounds-checked, and there's no way to escape to the host address space. The wasmtime runtime we use is built with defense-in-depth and has been formally verified for memory safety.
> On top of WASM isolation, every tool call goes through capability validation: [...]
> The design draws from capability-based security as implemented in systems like seL4—access is explicitly granted, not implicitly available. Agents don't get ambient authority just because they're running in your process.
From "Show HN: NPM install a WASM based Linux VM for your agents" re: https://github.com/deepclause/agentvm .. https://news.ycombinator.com/item?id=46686346 :
>> How to run vscode-container-wasm-gcc-example with c2w, with joelseverin/linux-wasm?
> linux-wasm is apparently faster than c2w.
container2wasm issue #550: https://github.com/container2wasm/container2wasm/issues/550#...
vscode-container-wasm-gcc-example : https://github.com/ktock/vscode-container-wasm-gcc-example
Cloudflare Workers also run WASM, with workerd:
cloudflare/workerd : https://github.com/cloudflare/workerd
...
"Cage" implements ARM64 MTE Memory Tagging Extensions support for WASM with LLVM emscripten iirc:
- "Cage: Hardware-Accelerated Safe WebAssembly" (2024) https://news.ycombinator.com/item?id=46151170 :
> [ llvm-memsafe-wasm , wasmtime-mte , ]
agentvm looks very cool! They are taking a different approach - full Linux VM emulated in WASM. It's very impressive technically.
We differentiate from agentvm by being lightweight (~11 MB Wasm binary, compared to 173 MB for agentvm). Though there is still a lot we can learn from agentvm, thank you for sharing their project.
Thank you! When I started working on agentvm my original goal was similar to yours: build a kind of MinGW or Cygwin for WASM. However, I quickly learned that this wouldn't really be feasible with reasonable amounts of time/token spend, mostly due to issues like having to find a way to make fork work, etc. I am no expert in WASM or Linux system programming, but it's been a lot of fun working on this stuff. I hope that the WASI standard and runtimes become more mature, as I feel that WASM sandboxes make a lot of sense in environments where containers are not an option.
Thanks for sharing the context! The fork problem is gnarly. Makes sense that full Linux emulation was the path forward for your use case.
Agreed on WASI maturity. We're hoping the component model lands in a stable form soon. Would love to see the ecosystem converge so these approaches can interoperate.
Nice! Fork is actually already working on Wasmer thanks to WASIX :) (and sockets, subprocesses, ...).
Let me know if you need any help using it!
"Rethinking Code Refinement: Learning to Judge Code Efficiency" https://news.ycombinator.com/item?id=42097656
eWASM has costed opcodes, though the EVM has not implemented eWASM.
Costed opcodes in WASM for agents could incentivize efficiency.
re: wasm-bpf and eWASM and the BPF verifier: https://news.ycombinator.com/item?id=42092120
ewasm docs > Gas Costs > "Gas costs of individual instructions" https://ewasm.readthedocs.io/en/mkdocs/determining_wasm_gas_...
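Outside of eWASM, engine-level metering already approximates costed opcodes; e.g. wasmtime's fuel mechanism charges instructions against a budget and traps when it is exhausted. A rough sketch with the wasmtime Python package (the fuel API has been renamed across versions, so treat the exact calls as approximate):

# Rough sketch: meter guest execution with wasmtime "fuel" so a runaway loop
# stops after a fixed instruction budget instead of pegging a CPU forever.
# Exact method names vary by wasmtime-py version (set_fuel vs add_fuel).
from wasmtime import Config, Engine, Store, Module, Instance, Trap

config = Config()
config.consume_fuel = True
engine = Engine(config)
store = Store(engine)
store.set_fuel(1_000_000)          # instruction budget for this store

# A guest that loops forever; with fuel enabled it traps instead of hanging.
module = Module(engine, '(module (func (export "spin") (loop br 0)))')
instance = Instance(store, module, [])

try:
    instance.exports(store)["spin"](store)
except Trap as trap:
    print("guest stopped after exhausting its fuel budget:", trap)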
Browser tabs could show CPU, RAM, GPU utilization;
From "The Risks of WebAssembly" (2022) https://news.ycombinator.com/item?id=32765865 :
> Don't there need to be per- CPU/RAM/GPU quotas per WASM scope/tab? Or is preventing DOS with WASM out of scope for browsers?
> IIRC, it's possible to check resource utilization in e.g. a browser Task Manager, but there's no way to do `nice` or `docker --cpu-quota` or `systemd-nspawn --cpu-affinity` to prevent one or more WASM tabs from DOS'ing a workstation with non-costed operations.
Presumably workerd supports resource quotas somehow?
From 2024 re: Process isolation in browsers : https://news.ycombinator.com/item?id=40861851 :
> From "WebGPU is now available on Android" [...] (2022) :
>> What are some ideas for UI Visual Affordances to solve for bad UX due to slow browser tabs and extensions?
>> UBY: Browsers: Strobe the tab or extension button when it's beyond (configurable) resource usage thresholds
>> UBY: Browsers: Vary the {color, size, fill} of the tabs according to their relative resource utilization