Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

PVF: Move landlock out of thread into process; add landlock exceptions #7580

Draft
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

mrcnski
Copy link
Contributor

@mrcnski mrcnski commented Aug 4, 2023

Currently we apply sandboxing per-thread, when it should be per-process. This shouldn't be a big change, we just need sandboxing exceptions for the artifacts/cache directories.

Without this, the sandboxing we have with landlock is not really secure.

TODO

  • We can just pass around a cloned config instead of all the values separately. Link
  • We can only throw an error if landlock is enabled and expected to work. Link
  • Use assert! instead of io::Error. Link
    • NOTE: Maybe can instead remove the artifact cache exception, see here.
  • The --artifact-dir should be passed through the handshake instead of through a CLI param. Link
    • NOTE: Maybe can instead remove the artifact cache exception, see here.

Related

Closes paritytech/polkadot-sdk#600

- [X] Add check for whether --socket-path or --cache-path are `None` (both are
      required)
- [X] Fix some compile errors
- [X] Update some docs
- [X] Update landlock tests
@mrcnski mrcnski added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit. labels Aug 4, 2023
node/core/pvf/src/host.rs Show resolved Hide resolved
node/core/pvf/execute-worker/src/lib.rs Outdated Show resolved Hide resolved
Comment on lines +188 to +190
if !artifact_path.starts_with(cache_path) {
return Err(io::Error::new(io::ErrorKind::Other, format!("received an artifact path {artifact_path:?} that does not belong to expected artifact dir {cache_path:?}")))
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: I wonder if an io::Error is the most appropriate error here. To change it we would have to change the error type of this function (would need a new error kind) which would be a bit annoying. But since this would be a logic bug from our side it can be an assert! instead.

@mrcnski
Copy link
Contributor Author

mrcnski commented Aug 6, 2023

This PR adds a landlock exception for the PVF artifacts cache directory. We have to do it for the whole directory because, at the time of process startup, we of course can't know yet which artifacts are going to come in. Everything outside this directory is still totally restricted for the process.

This has one issue which is that execution jobs, which need read permissions on the cache directory, can now read other artifacts in the directory. As different validators may have different dir contents this can be a source of randomness for attackers. This makes it possible to attack the chain by crafting a PVF+candidate which results in arbitrary code execution which: reads the contents of the cache directory, uses it to seed some randomness, and causes the execution job to vote for or against the candidate with 50% chance, thus stalling the chain which assumes that at least 66% of validators are trustworthy and not compromised

We could spwn a brand new process for each job, but that seems like a lot of overhead and I'm not sure it's feasible. cc @s0me0ne-unkn0wn

But, there are so many sources of randomness, that I have been thinking to abandon this goal to avoid it. In the end we would need virtualization and that is not a free win but comes with challenges. Perhaps it is better to rely on governance to deal with attacks on the chain. cc @eskimor I forget if we discussed this already or not

At any rate, securing validators themselves is a priority right now so we need this landlock fix.

@eskimor
Copy link
Member

eskimor commented Aug 6, 2023

We could spwn a brand new process for each job, but that seems like a lot of overhead and I'm not sure it's feasible. cc @s0me0ne-unkn0wn

Spawning a new process is not that much overhead. Spawning a new process for each and every http request as some webservers do is quite bad, but if the useful work is in the hundreds of milliseconds to seconds, it should be negligible.

@mrcnski
Copy link
Contributor Author

mrcnski commented Aug 7, 2023

@eskimor according to our metrics, execution jobs are on the order of 10ms - so spawning a process may not be so negligible. I also remember that when @s0me0ne-unkn0wn was redesigning the execution job queue, one of the goals was to reuse the same worker process as much as possible for jobs. I don't remember the details at this point, but @s0me0ne-unkn0wn can surely provide more insight to how feasible is a new process for each job.

@s0me0ne-unkn0wn
Copy link
Contributor

Spawning a new process on Unix is a fork(), which creates a new process duplicating the whole virtual space of the calling thread, and may be expensive enough. I believe that's why we relied on only spawning workers when absolutely needed (no worker available or a previous one died or ended up in some strange state), as the polkadot memory footprint is huge. The worker then spawns jobs in threads, which are effectively CoW forks and thus less expensive.

But there are two variables here, one new and one old. The old one is the parachain logic. If it's purely computational, it's likely that spawning a new process overhead will take more time than the execution itself. But if the parachain does some i/o ops, calling host functions reading and writing storage, it may turn the other way.

The new variable is separate binary workers. Their memory footprint is small and that may reduce that overhead significantly and we might see it's negligible now.

But again, those are empirical estimations. It's easy to check, just build a version with the execution worker gracefully exiting after every execution, burn it in on Versi, and compare the results.

@s0me0ne-unkn0wn
Copy link
Contributor

Well, I've probably screwed up the explanation somewhat. We still need to fork the main process (not the whole process indeed, only a calling thread) to spawn a new process from an external binary. But prior to worker separation, we needed that forked process to execute the whole polkadot binary with its cli and everything, and now it's a lightweight polkadot-execute-worker which is not so much effort, that's what I meant.

@mrcnski
Copy link
Contributor Author

mrcnski commented Aug 7, 2023

Cool, I've raised an issue here: paritytech/polkadot-sdk#584. It's a follow-up because it doesn't block this PR, and it would likely come with changes to the execution queue logic, so it should be a separate PR. Thanks @s0me0ne-unkn0wn!

@mrcnski
Copy link
Contributor Author

mrcnski commented Aug 8, 2023

TODO: The --artifact-dir should actually be passed through the handshake instead of through a CLI param, this would simplify the worker interface. Only the socket-path and node/worker versions (to do the version check before the handshake) should be passed by params.

Allow an exception for reading from the artifact cache, but disallow listing
the directory contents. Since we prepend artifact names with a random hash,
this means attackers can't discover artifacts apart from the current job.
We already checked whether landlock is enabled in the host. We can therefore
only throw an error here if landlock is enabled and expected to work. Otherwise
we shouldn't even log here, as errors are already logged in the host, and is
just noise here.
@mrcnski mrcnski self-assigned this Aug 21, 2023
@mrcnski mrcnski marked this pull request as draft August 23, 2023 14:43
This is an attempt at an improved chroot jail that doesn't require root, but
still allows us to use sockets and artifacts from the host.
@paritytech-cicd-pr
Copy link

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: test-linux-stable
Logs: https://gitlab.parity.io/parity/mirrors/polkadot/-/jobs/3431179

Comment on lines +207 to +212
if libc::unshare(libc::CLONE_NEWUSER) < 0 {
return Err("unshare user namespace")
}
if libc::unshare(libc::CLONE_NEWNS) < 0 {
return Err("unshare mount namespace")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. These are flags so this can be done in a single unshare call.
  2. Since this is also switching to a new user namespace this function should be renamed as it's not strictly only changing the root.
  3. On some distros this could actually fail (due to user namespaces being disabled). In this case we'd probably want to abort execution unless a flag is passed on the command-line?

let cache_path_c = CString::new(cache_path.as_os_str().as_bytes()).unwrap();
let root_absolute_c = CString::new("/").unwrap();
// Append a random string to prevent races and to avoid dealing with the dir already existing.
let oldroot_relative_c = CString::new(format!("{}/oldroot-{}", cache_path_str, s)).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use join or push instead of manually merging these with /.

node/core/pvf/common/src/worker/mod.rs Show resolved Hide resolved
Comment on lines +189 to +190
let mut buf = Vec::with_capacity(RANDOM_LEN);
buf.extend(rand::thread_rng().sample_iter(&Alphanumeric).take(RANDOM_LEN));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to do a .collect here directly into the buffer.

Comment on lines +231 to +241
if libc::mkdir(oldroot_relative_c.as_ptr(), 0755) < 0 {
return Err("mkdir oldroot")
}
if libc::syscall(
libc::SYS_pivot_root,
cache_path_c.as_ptr(),
oldroot_relative_c.as_ptr(),
) < 0
{
return Err("pivot_root")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you call the pivot root syscall with both paths being "." then this mkdir should be unnecessary.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit.
Projects
No open projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

PVF worker: apply sandboxing per-process
5 participants