This is a sandbox for running untrusted programs completely isolated, primarily for competitive programming competitions. This sandbox is used by sunwalker, but may just as well be used for your projects.
Licensed under Apache 2.0.
TL;DR: sunwalker-box is more secure than some sandboxes and faster than others, provided that the invoked tasks are homogenous, that is, the same program is invoked in the same environment with slightly different configuration, e.g. input.
Most general-purpose sandboxes prioritize protection from permanent damage and don't care if the sandboxed process causes denial of service, e.g. by allocating lots of memory, using all cores of the CPU, or running a fork bomb.
They are also usually optimized for long-running tasks, when the time the sandbox takes to start does not really matter. For instance, Docker and Podman take about half a second to run hello-world.
In contrast, competitive-programming-themed sandboxes typically protect against DoS attacks and common vulnerabilities, but little else: there are often ways to circumvent some of protections using slightly non-trivial OS features. They are also seldom extensible.
Finally, sandboxes seldom prioritize efficiency. We can do much better than recreating a sandbox and restarting the same program from the filesystem a hundred times. And while tricks such as preforking might not increase the performance by a lot, optimizations do accumulate.
sunwalker-box supports x86-64 (AMD64) and aarch64 (ARM64) architectures and requires Linux 5.19+ on x86-64 and Linux 6.2+ on aarch64.
Both build methods generate a sunwalker_box
executable in the current directory. The executable is statically linked and relatively small (around a megabyte, and smaller in compressed form), so it can be copied to any Linux machine and run without requiring any additional libraries or dependencies.
For x86-64, you can use the Dockerfile
provided in this repository:
$ id="$(docker create "$(docker build -q .)" a)" && docker cp "$id:/sunwalker_box" sunwalker_box && docker rm "$id"
This generates sunwalker_box
binary.
For x86-64, you can use the flake.nix
provided in this repository:
$ nix build
You can also build sunwalker-box without flakes, and it's just as simple:
$ nix-build
Both commands will produce ./result/bin/sunwalker-box
binary.
For aarch64, or if you don't want to use Docker, you will need to install the following dependencies:
- Rust, with:
- Toolchain
nightly-<architecture>-unknown-linux-gnu
, provided by e.g.rustup toolchain install nightly-x86_64-unknown-linux-gnu
, - Target
<target>-unknown-linux-musl
, provided by e.g.rustup +nightly target add x86_64-unknown-linux-musl
, - Component
rust-src
, provided byrustup component add rust-src
- Toolchain
- GNU make
- GNU C++ compiler, provided on Ubuntu by
g++
- Linux userspace headers, provided on Ubuntu as
libc6-dev
, on Arch Linux aslinux-api-headers
, and on Alpine aslinux-headers
- binutils
- Python 3
ruby
, provided on Ubuntu byruby
gem
, provided on Ubuntu byruby-rubygems
- Ruby headers, provided on Ubuntu by
ruby-dev
seccomp-tools
, provided bygem install seccomp-tools
- (aarch64 only) Linux kernel headers, provided on Ubuntu by
linux-headers-$(uname -r)
, - (sandbox-tests only)
musl-gcc
, provided on Ubuntu bymusl-tools
, - (sandbox-tests only)
pyyaml
python package, provided on Alpine bypy3-yaml
To build sunwalker-box, use:
$ make
This generates a sunwalker_box
executable in the current directory.
Cross-compilation for x86-64 is supported by passing CC=x86_64-linux-gnu-gcc
(or the corresponding cross-compiler for your Linux distribution) to make
. For aarch64, sunwalker-box should be built natively on the target machine, as it includes a kernel module which is dependent on the exact kernel version.
TL;DR: Isolate the cores you want sunwalker-box to run user processes on with
sunwalker_box isolate --core {CORE}
, then start the sandbox withsunwalker_box start --core {CORE}
(but you should add a few more options for security).
Sunwalker manipulates the system configuration quite a bit, so all commands have to be run under root. If, for some reason, you need to run sunwalker-box in Docker (and that's very, very inefficient and makes absolutely no sense if you're spawning a container per user submission because Docker's just duplicating what sunwalker-box does, just worse), --privileged --cgroupns=host
suffices.
Firstly, reserve the cores you want sunwalker to use exclusively for running untrusted code. The preferred way to do this is to start the Linux kernel with the isolcpus
option. If you can't or won't use isolcpus
, sunwalker will work fine anyway, but then Linux might use the cores for kernel tasks, so the testing results might be somewhat unstable.
Regardless of whether you used isolcpus
, enable sunwalker to use the core via
# sunwalker_box isolate --core {CORE}
Note that the cores are indexed from zero.
If you use multiple cores (you probably should use all but one or two cores for sunwalker if you are running it in production), repeat the command for each core. This should only be run once (until reboot).
To steal the core back from sunwalker, use
# sunwalker_box free --core {CORE}
Again, you most likely won't need this in production.
After registering the cores, you can finally start a sunwalker box instance using a command as simple as:
# sunwalker_box start --core {CORE}
You will most likely need to pass more options to keep the sandbox secured, though. Most importantly, you will need to setup a chroot environment and pass a path to it using --root {PATH}
. You might also want to adjust the amount of disk space the box is allowed to use using --quota-inodes {INODES} --quota-space {BYTES}
. The defaults are 1024 inodes and 30 MiB respectively; you might want to increase or decrease those, depending on your usecase. Note that unless you use commit
, the limit is only enforced approximately.
If, after running the start
command, sunwalker quietly awaits input, you're doing it right and sunwalker has created an empty sandbox. To actually do anything with the box, you issue commands to sunwalker via stdin, as if you used, say, memcached. To stop the sandbox, just ^C it--all resources will be cleaned up automatically.
The commands typically look like {COMMAND_NAME} {JSON_ENCODED_ARGUMENT}
and are terminated with a newline. The box responses with a single line containing ok
, ok {ADDITIONAL_JSON_DATA}
, or error {ERROR_DESCRIPTION_AS_A_JSON_STRING}
.
The command run
starts a process inside the sandbox. It takes a JSON object with the following options as an argument:
argv
(required) -- a list of arguments, including the path/name of the program as the first argument, e.g."argv": ["program_name", "arg1", ...]
.stdin
,stdout
,stderr
(optional) -- to which files standard streams are to be redirected. If missing, uses/dev/null
(must be present inside the chroot environment). Example:"stdin": "/space/input.txt"
.real_time_limit
(optional) -- how much wall time the program may use, in seconds, e.g."real_time_limit": 1.5
.cpu_time_limit
(optional) -- how much CPU time the program may use, in seconds, e.g."cpu_time_limit": 1.0
.idleness_time_limit
(optional) -- how much time the program may spend in iowait in total, in seconds, e.g."idleness_time_limit": 1.0
.memory_limit
(optional) -- how much RAM the program may use, in bytes, e.g."memory_limit": 128000000
.processes_limit
(optional) -- how many processes the program may start at once (including itself), e.g."processes_limit": 64
. Must be positive.env
(optional) -- the new environment of the process as a string-to-string dictionary. If missing, environment variables are inherited. If passed, all old environment variables are deleted.
To prevent DOS, cpu_time_limit
and processes_limit
must necessarily be set. Setting real_time_limit
and/or idleness_time_limit
is also recommended, but not strictly necessary, e.g. if you kill the box on timeout manually.
The program is always executed with working directory /space
.
This command is blocking. When the program exits or a limit expires, an ok
status is returned (even if a limit expired) with a JSON-object value with the following properties:
limit_verdict
-- either of the following:"OK"
-- the program exitted without exceeding limits."Signaled"
-- the program was terminated by a signal."RealTimeLimitExceeded" / "CPUTimeLimitExceeded" / "IdlenessTimeLimitExceeded" / "MemoryLimitExceeded"
-- the program used more wall-clock time/CPU time/iowait time/memory than allowed.
exit_code
-- either the exit code of the program from0
to255
(0
typically indicates success), or, iflimit_verdict
is"Signaled"
, the negated number of the signal, e.g.-9
forSIGKILL
.real_time / cpu_time / idleness_time / memory
-- approximately how much wall-clock time/CPU time/iowait time/memory the program used, in the same units as the corresponding limits (i.e. seconds or bytes). Note the word "approximately" -- even when the limit is exceeded, i.e.limit_verdict
is not"OK"
, the corresponding metric might be slightly less than the limit. How to handle this discrepancy is your choice, but do not use metrics to check if a limit has been exceeded.
After the process finishes, you can run another program in the same box in the same way. And if you want to run another program (or the same program with different input, you get the gist), but without the leftovers of the previous processes (PIDs, temporary files, network data, etc.), don't restart the sandbox! Instead, use reset
, which efficiently restores the box to the original state as if sunwalker-box was just invoked, and proceed without restarting sunwalker-box. This is much more efficient. And if you need to revert to a more mature state, commit
is available.
Filesystem-related commands modify the virtual overlay filesystem rather than the chroot environment, so all modifications are temporary and are not propagated to disk (that is, unless you are low on memory and use swap). They are also rolled back when the reset
command is issued, either to the clean state at the start of sunwalker-box
, or to the state at the moment when commit
is invoked.
Sunwalker creates a user-writable /space
directory to put user files to.
Unless specified otherwise, the paths are relative to the box chroot environment.
mkdir "/path/on/filesystem"
-- create a directory at the given path. Returns nothing.ls "/path/to/a/directory"
-- list the contents of the directory. Returns a JSON object with filenames as keys and objects satisfying{file_type: "dir" | "file" | "symlink" | "block" | "char" | "fifo" | "socket" | "unknown", len: integer, mode: integer}
as values.cat "/path/to/a/file"
orcat {"path": "/path/to/a/file", "at": seek_to_offset, "len": count_of_bytes_to_read}
-- returns the contents of the whole file or its part as an array of byte values. Seeking further than EOF is considered an error, reaching EOF beforelen
is exhausted is not. A length limit of0
means unlimited. Only regular files can be read this way.extpath "/path/to/a/file"
-- returns a path by which a file inside the sandbox can be accessed from outside.mkfile {"path": "/path/to/a/file", "content": [...byte_values]}
-- creates a regular file with the given bytes content.mksymlink {"link": "/where/to/put/the/link", "target": "/where/the/link/points/to"}
-- creates a symlink with the given target. The target does not have to exist or be a path.bind {"internal": "/path/inside/the/box", "external": "/path/outside/the/box", "ro": false/true}
-- creates a read-write or a read-only mirror of an external directory or file. The file/directory must already exist inside the sandbox; if they don't, usemkfile
/mkdir
before.