* [Virtio-fs] [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
@ 2020-07-22 13:02 ` Stefan Hajnoczi
0 siblings, 0 replies; 43+ messages in thread
From: Stefan Hajnoczi @ 2020-07-22 13:02 UTC (permalink / raw)
To: qemu-devel; +Cc: virtio-fs, rmohr, vromanso
virtiofsd cannot run in an unprivileged container because CAP_SYS_ADMIN
is required to create namespaces.
Introduce a weaker sandbox that is sufficient in container environments
because the container runtime already sets up namespaces. Use chroot to
restrict path traversal to the shared directory.
virtiofsd loses the following:
1. Mount namespace. The process chroots to the shared directory but
leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
syscalls.
2. Pid namespace. This should be fine because virtiofsd is the only
process running in the container.
3. Network namespace. This should be fine because seccomp already
rejects the connect(2) syscall, but an additional layer of security
is lost. Container runtime-specific network security policies can be
used drop network traffic (except for the vhost-user UNIX domain
socket).
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
tools/virtiofsd/helper.c | 3 +++
tools/virtiofsd/passthrough_ll.c | 44 ++++++++++++++++++++++++++++++--
2 files changed, 45 insertions(+), 2 deletions(-)
diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
index 3105b6c23a..7421c9ca1a 100644
--- a/tools/virtiofsd/helper.c
+++ b/tools/virtiofsd/helper.c
@@ -151,6 +151,9 @@ void fuse_cmdline_help(void)
" -o cache=<mode> cache mode. could be one of \"auto, "
"always, none\"\n"
" default: auto\n"
+ " -o chroot|no_chroot use container-friendly chroot instead\n"
+ " of stronger mount namespace sandbox\n"
+ " default: false\n"
" -o flock|no_flock enable/disable flock\n"
" default: no_flock\n"
" -o log_level=<level> log level, default to \"info\"\n"
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 50a164a599..990c0a8a70 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -139,6 +139,7 @@ enum {
struct lo_data {
pthread_mutex_t mutex;
+ int chroot; /* 1 - use chroot, 0 - use mount namespace */
int debug;
int writeback;
int flock;
@@ -162,6 +163,8 @@ struct lo_data {
};
static const struct fuse_opt lo_opts[] = {
+ { "chroot", offsetof(struct lo_data, chroot), 1 },
+ { "no_chroot", offsetof(struct lo_data, chroot), 0 },
{ "writeback", offsetof(struct lo_data, writeback), 1 },
{ "no_writeback", offsetof(struct lo_data, writeback), 0 },
{ "source=%s", offsetof(struct lo_data, source), 0 },
@@ -2665,6 +2668,37 @@ static void setup_capabilities(char *modcaps_in)
pthread_mutex_unlock(&cap.mutex);
}
+/*
+ * Use chroot as a weaker sandbox for environment where the process is launched
+ * without CAP_SYS_ADMIN.
+ */
+static void setup_chroot(struct lo_data *lo)
+{
+ lo->proc_self_fd = open("/proc/self/fd", O_PATH);
+ if (lo->proc_self_fd == -1) {
+ fuse_log(FUSE_LOG_ERR, "open(\"/proc/self/fd\", O_PATH): %m\n");
+ exit(1);
+ }
+
+ /*
+ * Make the shared directory the file system root so that FUSE_OPEN
+ * (lo_open()) cannot escape the shared directory by opening a symlink.
+ *
+ * It's still possible to escape the chroot via lo->proc_self_fd but that
+ * requires gaining control of the process first.
+ */
+ if (chroot(lo->source) != 0) {
+ fuse_log(FUSE_LOG_ERR, "chroot(\"%s\"): %m\n", lo->source);
+ exit(1);
+ }
+
+ /* Move into the chroot */
+ if (chdir("/") != 0) {
+ fuse_log(FUSE_LOG_ERR, "chdir(\"/\"): %m\n");
+ exit(1);
+ }
+}
+
/*
* Lock down this process to prevent access to other processes or files outside
* source directory. This reduces the impact of arbitrary code execution bugs.
@@ -2672,8 +2706,13 @@ static void setup_capabilities(char *modcaps_in)
static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
bool enable_syslog)
{
- setup_namespaces(lo, se);
- setup_mounts(lo->source);
+ if (lo->chroot) {
+ setup_chroot(lo);
+ } else {
+ setup_namespaces(lo, se);
+ setup_mounts(lo->source);
+ }
+
setup_seccomp(enable_syslog);
setup_capabilities(g_strdup(lo->modcaps));
}
@@ -2820,6 +2859,7 @@ int main(int argc, char *argv[])
struct fuse_session *se;
struct fuse_cmdline_opts opts;
struct lo_data lo = {
+ .chroot = 0,
.debug = 0,
.writeback = 0,
.posix_lock = 1,
--
2.26.2
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
2020-07-22 13:02 ` [Virtio-fs] " Stefan Hajnoczi
@ 2020-07-22 16:58 ` Daniel P. Berrangé
-1 siblings, 0 replies; 43+ messages in thread
From: Daniel P. Berrangé @ 2020-07-22 16:58 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: virtio-fs, vromanso, qemu-devel, Dr. David Alan Gilbert, rmohr
On Wed, Jul 22, 2020 at 02:02:05PM +0100, Stefan Hajnoczi wrote:
> virtiofsd cannot run in an unprivileged container because CAP_SYS_ADMIN
> is required to create namespaces.
>
> Introduce a weaker sandbox that is sufficient in container environments
> because the container runtime already sets up namespaces. Use chroot to
> restrict path traversal to the shared directory.
>
> virtiofsd loses the following:
>
> 1. Mount namespace. The process chroots to the shared directory but
> leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
> syscalls.
>
> 2. Pid namespace. This should be fine because virtiofsd is the only
> process running in the container.
>
> 3. Network namespace. This should be fine because seccomp already
> rejects the connect(2) syscall, but an additional layer of security
> is lost. Container runtime-specific network security policies can be
> used drop network traffic (except for the vhost-user UNIX domain
> socket).
IIUC this relies on the fact that the container will still have
CAP_SYS_CHROOT IOW, we still don't have a solution for running
virtiofsd as an unprivileged user.
Bearing in mind possibly need for future improvements to handle
unprivileged users....
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> tools/virtiofsd/helper.c | 3 +++
> tools/virtiofsd/passthrough_ll.c | 44 ++++++++++++++++++++++++++++++--
> 2 files changed, 45 insertions(+), 2 deletions(-)
>
> diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
> index 3105b6c23a..7421c9ca1a 100644
> --- a/tools/virtiofsd/helper.c
> +++ b/tools/virtiofsd/helper.c
> @@ -151,6 +151,9 @@ void fuse_cmdline_help(void)
> " -o cache=<mode> cache mode. could be one of \"auto, "
> "always, none\"\n"
> " default: auto\n"
> + " -o chroot|no_chroot use container-friendly chroot instead\n"
> + " of stronger mount namespace sandbox\n"
> + " default: false\n"
A simple boolean like this feels like it lacks future-proofing.
Should we have an option that takes an enumerated arg string:
-o sandbox=[namespace|chroot|....]
so we can add more options later with the same syntax.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Virtio-fs] [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
@ 2020-07-22 16:58 ` Daniel P. Berrangé
0 siblings, 0 replies; 43+ messages in thread
From: Daniel P. Berrangé @ 2020-07-22 16:58 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: virtio-fs, vromanso, qemu-devel, rmohr
On Wed, Jul 22, 2020 at 02:02:05PM +0100, Stefan Hajnoczi wrote:
> virtiofsd cannot run in an unprivileged container because CAP_SYS_ADMIN
> is required to create namespaces.
>
> Introduce a weaker sandbox that is sufficient in container environments
> because the container runtime already sets up namespaces. Use chroot to
> restrict path traversal to the shared directory.
>
> virtiofsd loses the following:
>
> 1. Mount namespace. The process chroots to the shared directory but
> leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
> syscalls.
>
> 2. Pid namespace. This should be fine because virtiofsd is the only
> process running in the container.
>
> 3. Network namespace. This should be fine because seccomp already
> rejects the connect(2) syscall, but an additional layer of security
> is lost. Container runtime-specific network security policies can be
> used drop network traffic (except for the vhost-user UNIX domain
> socket).
IIUC this relies on the fact that the container will still have
CAP_SYS_CHROOT IOW, we still don't have a solution for running
virtiofsd as an unprivileged user.
Bearing in mind possibly need for future improvements to handle
unprivileged users....
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> tools/virtiofsd/helper.c | 3 +++
> tools/virtiofsd/passthrough_ll.c | 44 ++++++++++++++++++++++++++++++--
> 2 files changed, 45 insertions(+), 2 deletions(-)
>
> diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
> index 3105b6c23a..7421c9ca1a 100644
> --- a/tools/virtiofsd/helper.c
> +++ b/tools/virtiofsd/helper.c
> @@ -151,6 +151,9 @@ void fuse_cmdline_help(void)
> " -o cache=<mode> cache mode. could be one of \"auto, "
> "always, none\"\n"
> " default: auto\n"
> + " -o chroot|no_chroot use container-friendly chroot instead\n"
> + " of stronger mount namespace sandbox\n"
> + " default: false\n"
A simple boolean like this feels like it lacks future-proofing.
Should we have an option that takes an enumerated arg string:
-o sandbox=[namespace|chroot|....]
so we can add more options later with the same syntax.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
2020-07-22 16:58 ` [Virtio-fs] " Daniel P. Berrangé
@ 2020-07-23 12:17 ` Stefan Hajnoczi
-1 siblings, 0 replies; 43+ messages in thread
From: Stefan Hajnoczi @ 2020-07-23 12:17 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: virtio-fs, vromanso, qemu-devel, Dr. David Alan Gilbert, rmohr
[-- Attachment #1: Type: text/plain, Size: 1355 bytes --]
On Wed, Jul 22, 2020 at 05:58:11PM +0100, Daniel P. Berrangé wrote:
> On Wed, Jul 22, 2020 at 02:02:05PM +0100, Stefan Hajnoczi wrote:
> > virtiofsd cannot run in an unprivileged container because CAP_SYS_ADMIN
> > is required to create namespaces.
> >
> > Introduce a weaker sandbox that is sufficient in container environments
> > because the container runtime already sets up namespaces. Use chroot to
> > restrict path traversal to the shared directory.
> >
> > virtiofsd loses the following:
> >
> > 1. Mount namespace. The process chroots to the shared directory but
> > leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
> > syscalls.
> >
> > 2. Pid namespace. This should be fine because virtiofsd is the only
> > process running in the container.
> >
> > 3. Network namespace. This should be fine because seccomp already
> > rejects the connect(2) syscall, but an additional layer of security
> > is lost. Container runtime-specific network security policies can be
> > used drop network traffic (except for the vhost-user UNIX domain
> > socket).
>
> IIUC this relies on the fact that the container will still have
> CAP_SYS_CHROOT IOW, we still don't have a solution for running
> virtiofsd as an unprivileged user.
Yes, this still requires root in the container.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Virtio-fs] [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
@ 2020-07-23 12:17 ` Stefan Hajnoczi
0 siblings, 0 replies; 43+ messages in thread
From: Stefan Hajnoczi @ 2020-07-23 12:17 UTC (permalink / raw)
To: Daniel P. Berrangé; +Cc: virtio-fs, vromanso, qemu-devel, rmohr
[-- Attachment #1: Type: text/plain, Size: 1355 bytes --]
On Wed, Jul 22, 2020 at 05:58:11PM +0100, Daniel P. Berrangé wrote:
> On Wed, Jul 22, 2020 at 02:02:05PM +0100, Stefan Hajnoczi wrote:
> > virtiofsd cannot run in an unprivileged container because CAP_SYS_ADMIN
> > is required to create namespaces.
> >
> > Introduce a weaker sandbox that is sufficient in container environments
> > because the container runtime already sets up namespaces. Use chroot to
> > restrict path traversal to the shared directory.
> >
> > virtiofsd loses the following:
> >
> > 1. Mount namespace. The process chroots to the shared directory but
> > leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
> > syscalls.
> >
> > 2. Pid namespace. This should be fine because virtiofsd is the only
> > process running in the container.
> >
> > 3. Network namespace. This should be fine because seccomp already
> > rejects the connect(2) syscall, but an additional layer of security
> > is lost. Container runtime-specific network security policies can be
> > used drop network traffic (except for the vhost-user UNIX domain
> > socket).
>
> IIUC this relies on the fact that the container will still have
> CAP_SYS_CHROOT IOW, we still don't have a solution for running
> virtiofsd as an unprivileged user.
Yes, this still requires root in the container.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
2020-07-22 13:02 ` [Virtio-fs] " Stefan Hajnoczi
@ 2020-07-22 17:58 ` Dr. David Alan Gilbert
-1 siblings, 0 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2020-07-22 17:58 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: virtio-fs, rmohr, qemu-devel, vromanso
* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> virtiofsd cannot run in an unprivileged container because CAP_SYS_ADMIN
> is required to create namespaces.
>
> Introduce a weaker sandbox that is sufficient in container environments
> because the container runtime already sets up namespaces. Use chroot to
> restrict path traversal to the shared directory.
>
> virtiofsd loses the following:
>
> 1. Mount namespace. The process chroots to the shared directory but
> leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
> syscalls.
OK, I'm guessing the behaviour of what happens if the host adds another
mount afterwards might be different?
> 2. Pid namespace. This should be fine because virtiofsd is the only
> process running in the container.
Is it ? Isn't the qemu and any other vhost-user processes also in the
same container?
> 3. Network namespace. This should be fine because seccomp already
> rejects the connect(2) syscall, but an additional layer of security
> is lost. Container runtime-specific network security policies can be
> used drop network traffic (except for the vhost-user UNIX domain
> socket).
Should this be tied to the same flag - this feels different from the
chroot specific problem.
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> tools/virtiofsd/helper.c | 3 +++
> tools/virtiofsd/passthrough_ll.c | 44 ++++++++++++++++++++++++++++++--
> 2 files changed, 45 insertions(+), 2 deletions(-)
>
> diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
> index 3105b6c23a..7421c9ca1a 100644
> --- a/tools/virtiofsd/helper.c
> +++ b/tools/virtiofsd/helper.c
> @@ -151,6 +151,9 @@ void fuse_cmdline_help(void)
> " -o cache=<mode> cache mode. could be one of \"auto, "
> "always, none\"\n"
> " default: auto\n"
> + " -o chroot|no_chroot use container-friendly chroot instead\n"
> + " of stronger mount namespace sandbox\n"
> + " default: false\n"
I agree with Dan that something more enum like feels right
> " -o flock|no_flock enable/disable flock\n"
> " default: no_flock\n"
> " -o log_level=<level> log level, default to \"info\"\n"
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 50a164a599..990c0a8a70 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -139,6 +139,7 @@ enum {
>
> struct lo_data {
> pthread_mutex_t mutex;
> + int chroot; /* 1 - use chroot, 0 - use mount namespace */
> int debug;
> int writeback;
> int flock;
> @@ -162,6 +163,8 @@ struct lo_data {
> };
>
> static const struct fuse_opt lo_opts[] = {
> + { "chroot", offsetof(struct lo_data, chroot), 1 },
> + { "no_chroot", offsetof(struct lo_data, chroot), 0 },
> { "writeback", offsetof(struct lo_data, writeback), 1 },
> { "no_writeback", offsetof(struct lo_data, writeback), 0 },
> { "source=%s", offsetof(struct lo_data, source), 0 },
> @@ -2665,6 +2668,37 @@ static void setup_capabilities(char *modcaps_in)
> pthread_mutex_unlock(&cap.mutex);
> }
>
> +/*
> + * Use chroot as a weaker sandbox for environment where the process is launched
> + * without CAP_SYS_ADMIN.
> + */
> +static void setup_chroot(struct lo_data *lo)
> +{
> + lo->proc_self_fd = open("/proc/self/fd", O_PATH);
> + if (lo->proc_self_fd == -1) {
> + fuse_log(FUSE_LOG_ERR, "open(\"/proc/self/fd\", O_PATH): %m\n");
> + exit(1);
> + }
> +
> + /*
> + * Make the shared directory the file system root so that FUSE_OPEN
> + * (lo_open()) cannot escape the shared directory by opening a symlink.
> + *
> + * It's still possible to escape the chroot via lo->proc_self_fd but that
> + * requires gaining control of the process first.
> + */
> + if (chroot(lo->source) != 0) {
> + fuse_log(FUSE_LOG_ERR, "chroot(\"%s\"): %m\n", lo->source);
> + exit(1);
> + }
> +
> + /* Move into the chroot */
> + if (chdir("/") != 0) {
> + fuse_log(FUSE_LOG_ERR, "chdir(\"/\"): %m\n");
> + exit(1);
> + }
> +}
> +
This looks OK to me, but I would prefer a check from someone who has the
experience of why the mount based sandboxing is so much more common than
old chroot.
Dave
> /*
> * Lock down this process to prevent access to other processes or files outside
> * source directory. This reduces the impact of arbitrary code execution bugs.
> @@ -2672,8 +2706,13 @@ static void setup_capabilities(char *modcaps_in)
> static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
> bool enable_syslog)
> {
> - setup_namespaces(lo, se);
> - setup_mounts(lo->source);
> + if (lo->chroot) {
> + setup_chroot(lo);
> + } else {
> + setup_namespaces(lo, se);
> + setup_mounts(lo->source);
> + }
> +
> setup_seccomp(enable_syslog);
> setup_capabilities(g_strdup(lo->modcaps));
> }
> @@ -2820,6 +2859,7 @@ int main(int argc, char *argv[])
> struct fuse_session *se;
> struct fuse_cmdline_opts opts;
> struct lo_data lo = {
> + .chroot = 0,
> .debug = 0,
> .writeback = 0,
> .posix_lock = 1,
> --
> 2.26.2
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Virtio-fs] [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
@ 2020-07-22 17:58 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2020-07-22 17:58 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: virtio-fs, rmohr, qemu-devel, vromanso
* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> virtiofsd cannot run in an unprivileged container because CAP_SYS_ADMIN
> is required to create namespaces.
>
> Introduce a weaker sandbox that is sufficient in container environments
> because the container runtime already sets up namespaces. Use chroot to
> restrict path traversal to the shared directory.
>
> virtiofsd loses the following:
>
> 1. Mount namespace. The process chroots to the shared directory but
> leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
> syscalls.
OK, I'm guessing the behaviour of what happens if the host adds another
mount afterwards might be different?
> 2. Pid namespace. This should be fine because virtiofsd is the only
> process running in the container.
Is it ? Isn't the qemu and any other vhost-user processes also in the
same container?
> 3. Network namespace. This should be fine because seccomp already
> rejects the connect(2) syscall, but an additional layer of security
> is lost. Container runtime-specific network security policies can be
> used drop network traffic (except for the vhost-user UNIX domain
> socket).
Should this be tied to the same flag - this feels different from the
chroot specific problem.
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> tools/virtiofsd/helper.c | 3 +++
> tools/virtiofsd/passthrough_ll.c | 44 ++++++++++++++++++++++++++++++--
> 2 files changed, 45 insertions(+), 2 deletions(-)
>
> diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
> index 3105b6c23a..7421c9ca1a 100644
> --- a/tools/virtiofsd/helper.c
> +++ b/tools/virtiofsd/helper.c
> @@ -151,6 +151,9 @@ void fuse_cmdline_help(void)
> " -o cache=<mode> cache mode. could be one of \"auto, "
> "always, none\"\n"
> " default: auto\n"
> + " -o chroot|no_chroot use container-friendly chroot instead\n"
> + " of stronger mount namespace sandbox\n"
> + " default: false\n"
I agree with Dan that something more enum like feels right
> " -o flock|no_flock enable/disable flock\n"
> " default: no_flock\n"
> " -o log_level=<level> log level, default to \"info\"\n"
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 50a164a599..990c0a8a70 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -139,6 +139,7 @@ enum {
>
> struct lo_data {
> pthread_mutex_t mutex;
> + int chroot; /* 1 - use chroot, 0 - use mount namespace */
> int debug;
> int writeback;
> int flock;
> @@ -162,6 +163,8 @@ struct lo_data {
> };
>
> static const struct fuse_opt lo_opts[] = {
> + { "chroot", offsetof(struct lo_data, chroot), 1 },
> + { "no_chroot", offsetof(struct lo_data, chroot), 0 },
> { "writeback", offsetof(struct lo_data, writeback), 1 },
> { "no_writeback", offsetof(struct lo_data, writeback), 0 },
> { "source=%s", offsetof(struct lo_data, source), 0 },
> @@ -2665,6 +2668,37 @@ static void setup_capabilities(char *modcaps_in)
> pthread_mutex_unlock(&cap.mutex);
> }
>
> +/*
> + * Use chroot as a weaker sandbox for environment where the process is launched
> + * without CAP_SYS_ADMIN.
> + */
> +static void setup_chroot(struct lo_data *lo)
> +{
> + lo->proc_self_fd = open("/proc/self/fd", O_PATH);
> + if (lo->proc_self_fd == -1) {
> + fuse_log(FUSE_LOG_ERR, "open(\"/proc/self/fd\", O_PATH): %m\n");
> + exit(1);
> + }
> +
> + /*
> + * Make the shared directory the file system root so that FUSE_OPEN
> + * (lo_open()) cannot escape the shared directory by opening a symlink.
> + *
> + * It's still possible to escape the chroot via lo->proc_self_fd but that
> + * requires gaining control of the process first.
> + */
> + if (chroot(lo->source) != 0) {
> + fuse_log(FUSE_LOG_ERR, "chroot(\"%s\"): %m\n", lo->source);
> + exit(1);
> + }
> +
> + /* Move into the chroot */
> + if (chdir("/") != 0) {
> + fuse_log(FUSE_LOG_ERR, "chdir(\"/\"): %m\n");
> + exit(1);
> + }
> +}
> +
This looks OK to me, but I would prefer a check from someone who has the
experience of why the mount based sandboxing is so much more common than
old chroot.
Dave
> /*
> * Lock down this process to prevent access to other processes or files outside
> * source directory. This reduces the impact of arbitrary code execution bugs.
> @@ -2672,8 +2706,13 @@ static void setup_capabilities(char *modcaps_in)
> static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
> bool enable_syslog)
> {
> - setup_namespaces(lo, se);
> - setup_mounts(lo->source);
> + if (lo->chroot) {
> + setup_chroot(lo);
> + } else {
> + setup_namespaces(lo, se);
> + setup_mounts(lo->source);
> + }
> +
> setup_seccomp(enable_syslog);
> setup_capabilities(g_strdup(lo->modcaps));
> }
> @@ -2820,6 +2859,7 @@ int main(int argc, char *argv[])
> struct fuse_session *se;
> struct fuse_cmdline_opts opts;
> struct lo_data lo = {
> + .chroot = 0,
> .debug = 0,
> .writeback = 0,
> .posix_lock = 1,
> --
> 2.26.2
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
2020-07-22 17:58 ` [Virtio-fs] " Dr. David Alan Gilbert
@ 2020-07-23 12:28 ` Stefan Hajnoczi
-1 siblings, 0 replies; 43+ messages in thread
From: Stefan Hajnoczi @ 2020-07-23 12:28 UTC (permalink / raw)
To: Dr. David Alan Gilbert; +Cc: virtio-fs, rmohr, qemu-devel, vromanso
[-- Attachment #1: Type: text/plain, Size: 2484 bytes --]
On Wed, Jul 22, 2020 at 06:58:20PM +0100, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > virtiofsd cannot run in an unprivileged container because CAP_SYS_ADMIN
> > is required to create namespaces.
> >
> > Introduce a weaker sandbox that is sufficient in container environments
> > because the container runtime already sets up namespaces. Use chroot to
> > restrict path traversal to the shared directory.
> >
> > virtiofsd loses the following:
> >
> > 1. Mount namespace. The process chroots to the shared directory but
> > leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
> > syscalls.
>
> OK, I'm guessing the behaviour of what happens if the host adds another
> mount afterwards might be different?
Running inside a container with -o chroot:
- The container has its own mount namespace and it is therefore not
affected, modulo shared subtrees (see mount(8)).
Running outside a container with -o chroot:
- Path traversal can only reach mounts that are made within the shared
directory tree. Technically other mounts are still there but it is
not possible to reach them via file system paths.
> > 2. Pid namespace. This should be fine because virtiofsd is the only
> > process running in the container.
>
> Is it ? Isn't the qemu and any other vhost-user processes also in the
> same container?
No. QEMU, virtiofsd, and other vhost-user processes should run in their
own containers. Application container images are typically designed to
run a single program per container. It's technically possible to launch
multiple programs but that is considered bad practice for application
containers.
Kubernetes:
Containers in a pod do not share a single pid namespace by default.
Pods do share a single network namespace so they can communicate via
UNIX domain sockets.
> > 3. Network namespace. This should be fine because seccomp already
> > rejects the connect(2) syscall, but an additional layer of security
> > is lost. Container runtime-specific network security policies can be
> > used drop network traffic (except for the vhost-user UNIX domain
> > socket).
>
> Should this be tied to the same flag - this feels different from the
> chroot specific problem.
Good point. Daniel Berrange has suggested another command-line syntax
that makes sandbox configuration more modular. I'll try to implement
something like that.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Virtio-fs] [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
@ 2020-07-23 12:28 ` Stefan Hajnoczi
0 siblings, 0 replies; 43+ messages in thread
From: Stefan Hajnoczi @ 2020-07-23 12:28 UTC (permalink / raw)
To: Dr. David Alan Gilbert; +Cc: virtio-fs, rmohr, qemu-devel, vromanso
[-- Attachment #1: Type: text/plain, Size: 2484 bytes --]
On Wed, Jul 22, 2020 at 06:58:20PM +0100, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > virtiofsd cannot run in an unprivileged container because CAP_SYS_ADMIN
> > is required to create namespaces.
> >
> > Introduce a weaker sandbox that is sufficient in container environments
> > because the container runtime already sets up namespaces. Use chroot to
> > restrict path traversal to the shared directory.
> >
> > virtiofsd loses the following:
> >
> > 1. Mount namespace. The process chroots to the shared directory but
> > leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
> > syscalls.
>
> OK, I'm guessing the behaviour of what happens if the host adds another
> mount afterwards might be different?
Running inside a container with -o chroot:
- The container has its own mount namespace and it is therefore not
affected, modulo shared subtrees (see mount(8)).
Running outside a container with -o chroot:
- Path traversal can only reach mounts that are made within the shared
directory tree. Technically other mounts are still there but it is
not possible to reach them via file system paths.
> > 2. Pid namespace. This should be fine because virtiofsd is the only
> > process running in the container.
>
> Is it ? Isn't the qemu and any other vhost-user processes also in the
> same container?
No. QEMU, virtiofsd, and other vhost-user processes should run in their
own containers. Application container images are typically designed to
run a single program per container. It's technically possible to launch
multiple programs but that is considered bad practice for application
containers.
Kubernetes:
Containers in a pod do not share a single pid namespace by default.
Pods do share a single network namespace so they can communicate via
UNIX domain sockets.
> > 3. Network namespace. This should be fine because seccomp already
> > rejects the connect(2) syscall, but an additional layer of security
> > is lost. Container runtime-specific network security policies can be
> > used drop network traffic (except for the vhost-user UNIX domain
> > socket).
>
> Should this be tied to the same flag - this feels different from the
> chroot specific problem.
Good point. Daniel Berrange has suggested another command-line syntax
that makes sandbox configuration more modular. I'll try to implement
something like that.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Virtio-fs] [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
2020-07-23 12:28 ` [Virtio-fs] " Stefan Hajnoczi
@ 2020-07-23 13:47 ` Vivek Goyal
-1 siblings, 0 replies; 43+ messages in thread
From: Vivek Goyal @ 2020-07-23 13:47 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: virtio-fs, qemu-devel, vromanso, Dr. David Alan Gilbert, rmohr
On Thu, Jul 23, 2020 at 01:28:50PM +0100, Stefan Hajnoczi wrote:
> On Wed, Jul 22, 2020 at 06:58:20PM +0100, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > virtiofsd cannot run in an unprivileged container because CAP_SYS_ADMIN
> > > is required to create namespaces.
> > >
> > > Introduce a weaker sandbox that is sufficient in container environments
> > > because the container runtime already sets up namespaces. Use chroot to
> > > restrict path traversal to the shared directory.
> > >
> > > virtiofsd loses the following:
> > >
> > > 1. Mount namespace. The process chroots to the shared directory but
> > > leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
> > > syscalls.
> >
> > OK, I'm guessing the behaviour of what happens if the host adds another
> > mount afterwards might be different?
>
> Running inside a container with -o chroot:
> - The container has its own mount namespace and it is therefore not
> affected, modulo shared subtrees (see mount(8)).
How does virtiofsd inside container gets the directory it wants to
export to guest? I am assuming that its probably a volume mount
inside container. If yes, volume mounts can set propagation
properties say "slave" and that means mounts done later will
propagate inside container in that volume (and hopefully be
visible inside guest once submount patches are upstream).
Thanks
Vivek
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Virtio-fs] [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
@ 2020-07-23 13:47 ` Vivek Goyal
0 siblings, 0 replies; 43+ messages in thread
From: Vivek Goyal @ 2020-07-23 13:47 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: virtio-fs, qemu-devel, vromanso, rmohr
On Thu, Jul 23, 2020 at 01:28:50PM +0100, Stefan Hajnoczi wrote:
> On Wed, Jul 22, 2020 at 06:58:20PM +0100, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > virtiofsd cannot run in an unprivileged container because CAP_SYS_ADMIN
> > > is required to create namespaces.
> > >
> > > Introduce a weaker sandbox that is sufficient in container environments
> > > because the container runtime already sets up namespaces. Use chroot to
> > > restrict path traversal to the shared directory.
> > >
> > > virtiofsd loses the following:
> > >
> > > 1. Mount namespace. The process chroots to the shared directory but
> > > leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
> > > syscalls.
> >
> > OK, I'm guessing the behaviour of what happens if the host adds another
> > mount afterwards might be different?
>
> Running inside a container with -o chroot:
> - The container has its own mount namespace and it is therefore not
> affected, modulo shared subtrees (see mount(8)).
How does virtiofsd inside container gets the directory it wants to
export to guest? I am assuming that its probably a volume mount
inside container. If yes, volume mounts can set propagation
properties say "slave" and that means mounts done later will
propagate inside container in that volume (and hopefully be
visible inside guest once submount patches are upstream).
Thanks
Vivek
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Virtio-fs] [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
2020-07-23 13:47 ` Vivek Goyal
@ 2020-07-23 15:36 ` Stefan Hajnoczi
-1 siblings, 0 replies; 43+ messages in thread
From: Stefan Hajnoczi @ 2020-07-23 15:36 UTC (permalink / raw)
To: Vivek Goyal
Cc: virtio-fs, qemu-devel, vromanso, Dr. David Alan Gilbert, rmohr
[-- Attachment #1: Type: text/plain, Size: 2453 bytes --]
On Thu, Jul 23, 2020 at 09:47:33AM -0400, Vivek Goyal wrote:
> On Thu, Jul 23, 2020 at 01:28:50PM +0100, Stefan Hajnoczi wrote:
> > On Wed, Jul 22, 2020 at 06:58:20PM +0100, Dr. David Alan Gilbert wrote:
> > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > > virtiofsd cannot run in an unprivileged container because CAP_SYS_ADMIN
> > > > is required to create namespaces.
> > > >
> > > > Introduce a weaker sandbox that is sufficient in container environments
> > > > because the container runtime already sets up namespaces. Use chroot to
> > > > restrict path traversal to the shared directory.
> > > >
> > > > virtiofsd loses the following:
> > > >
> > > > 1. Mount namespace. The process chroots to the shared directory but
> > > > leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
> > > > syscalls.
> > >
> > > OK, I'm guessing the behaviour of what happens if the host adds another
> > > mount afterwards might be different?
> >
> > Running inside a container with -o chroot:
> > - The container has its own mount namespace and it is therefore not
> > affected, modulo shared subtrees (see mount(8)).
>
> How does virtiofsd inside container gets the directory it wants to
> export to guest? I am assuming that its probably a volume mount
> inside container. If yes, volume mounts can set propagation
> properties say "slave" and that means mounts done later will
> propagate inside container in that volume (and hopefully be
> visible inside guest once submount patches are upstream).
Yes, the shared directory is a volume mount and propagation works as
expected:
$ docker run -v path/to/shared-dir:/shared-dir:slave \
-v /tmp/socket-dir:/var/run \
virtiofsd
When I mount things in path/to/shared-dir on the host they appear in
/shared-dir inside the guest. When I omit the "slave" option (the
default is "private") then the mount does not propagate.
The following Dockerfile was used:
$ cp ~/src/qemu/virtiofsd . # copy the executable
$ cat >Dockerfile
FROM fedora:32
COPY --chown=0:0 virtiofsd /usr/bin/virtiofsd
RUN dnf update -qy && dnf install -qy libseccomp liburing zlib glib2 nettle gnutls libcap-ng
VOLUME ["/shared-dir", "/var/run"]
CMD ["/usr/bin/virtiofsd", "--socket-path=/var/run/virtiofsd.sock", "-o", "source=/shared-dir", "-o", "cache=auto", "-o", "no_posix_lock", "-o", "xattr", "-o", "chroot"]
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Virtio-fs] [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
@ 2020-07-23 15:36 ` Stefan Hajnoczi
0 siblings, 0 replies; 43+ messages in thread
From: Stefan Hajnoczi @ 2020-07-23 15:36 UTC (permalink / raw)
To: Vivek Goyal; +Cc: virtio-fs, qemu-devel, vromanso, rmohr
[-- Attachment #1: Type: text/plain, Size: 2453 bytes --]
On Thu, Jul 23, 2020 at 09:47:33AM -0400, Vivek Goyal wrote:
> On Thu, Jul 23, 2020 at 01:28:50PM +0100, Stefan Hajnoczi wrote:
> > On Wed, Jul 22, 2020 at 06:58:20PM +0100, Dr. David Alan Gilbert wrote:
> > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > > virtiofsd cannot run in an unprivileged container because CAP_SYS_ADMIN
> > > > is required to create namespaces.
> > > >
> > > > Introduce a weaker sandbox that is sufficient in container environments
> > > > because the container runtime already sets up namespaces. Use chroot to
> > > > restrict path traversal to the shared directory.
> > > >
> > > > virtiofsd loses the following:
> > > >
> > > > 1. Mount namespace. The process chroots to the shared directory but
> > > > leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
> > > > syscalls.
> > >
> > > OK, I'm guessing the behaviour of what happens if the host adds another
> > > mount afterwards might be different?
> >
> > Running inside a container with -o chroot:
> > - The container has its own mount namespace and it is therefore not
> > affected, modulo shared subtrees (see mount(8)).
>
> How does virtiofsd inside container gets the directory it wants to
> export to guest? I am assuming that its probably a volume mount
> inside container. If yes, volume mounts can set propagation
> properties say "slave" and that means mounts done later will
> propagate inside container in that volume (and hopefully be
> visible inside guest once submount patches are upstream).
Yes, the shared directory is a volume mount and propagation works as
expected:
$ docker run -v path/to/shared-dir:/shared-dir:slave \
-v /tmp/socket-dir:/var/run \
virtiofsd
When I mount things in path/to/shared-dir on the host they appear in
/shared-dir inside the guest. When I omit the "slave" option (the
default is "private") then the mount does not propagate.
The following Dockerfile was used:
$ cp ~/src/qemu/virtiofsd . # copy the executable
$ cat >Dockerfile
FROM fedora:32
COPY --chown=0:0 virtiofsd /usr/bin/virtiofsd
RUN dnf update -qy && dnf install -qy libseccomp liburing zlib glib2 nettle gnutls libcap-ng
VOLUME ["/shared-dir", "/var/run"]
CMD ["/usr/bin/virtiofsd", "--socket-path=/var/run/virtiofsd.sock", "-o", "source=/shared-dir", "-o", "cache=auto", "-o", "no_posix_lock", "-o", "xattr", "-o", "chroot"]
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Virtio-fs] [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
2020-07-22 13:02 ` [Virtio-fs] " Stefan Hajnoczi
` (2 preceding siblings ...)
(?)
@ 2020-07-22 18:17 ` Vivek Goyal
2020-07-23 12:29 ` Stefan Hajnoczi
-1 siblings, 1 reply; 43+ messages in thread
From: Vivek Goyal @ 2020-07-22 18:17 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: virtio-fs, vromanso, qemu-devel, rmohr
On Wed, Jul 22, 2020 at 02:02:05PM +0100, Stefan Hajnoczi wrote:
> virtiofsd cannot run in an unprivileged container because CAP_SYS_ADMIN
> is required to create namespaces.
>
> Introduce a weaker sandbox that is sufficient in container environments
> because the container runtime already sets up namespaces. Use chroot to
> restrict path traversal to the shared directory.
>
> virtiofsd loses the following:
>
> 1. Mount namespace. The process chroots to the shared directory but
> leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
> syscalls.
>
> 2. Pid namespace. This should be fine because virtiofsd is the only
> process running in the container.
>
> 3. Network namespace. This should be fine because seccomp already
> rejects the connect(2) syscall, but an additional layer of security
> is lost. Container runtime-specific network security policies can be
> used drop network traffic (except for the vhost-user UNIX domain
> socket).
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> tools/virtiofsd/helper.c | 3 +++
> tools/virtiofsd/passthrough_ll.c | 44 ++++++++++++++++++++++++++++++--
> 2 files changed, 45 insertions(+), 2 deletions(-)
>
> diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
> index 3105b6c23a..7421c9ca1a 100644
> --- a/tools/virtiofsd/helper.c
> +++ b/tools/virtiofsd/helper.c
> @@ -151,6 +151,9 @@ void fuse_cmdline_help(void)
> " -o cache=<mode> cache mode. could be one of \"auto, "
> "always, none\"\n"
> " default: auto\n"
> + " -o chroot|no_chroot use container-friendly chroot instead\n"
> + " of stronger mount namespace sandbox\n"
> + " default: false\n"
This option name disabling namespace setup is little confusing to me.
Will it make sense to provide another option to disable/enable
namespaces. "-o no-namespaces" and that disables setting up
namespaces.
Thanks
Vivek
> " -o flock|no_flock enable/disable flock\n"
> " default: no_flock\n"
> " -o log_level=<level> log level, default to \"info\"\n"
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 50a164a599..990c0a8a70 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -139,6 +139,7 @@ enum {
>
> struct lo_data {
> pthread_mutex_t mutex;
> + int chroot; /* 1 - use chroot, 0 - use mount namespace */
> int debug;
> int writeback;
> int flock;
> @@ -162,6 +163,8 @@ struct lo_data {
> };
>
> static const struct fuse_opt lo_opts[] = {
> + { "chroot", offsetof(struct lo_data, chroot), 1 },
> + { "no_chroot", offsetof(struct lo_data, chroot), 0 },
> { "writeback", offsetof(struct lo_data, writeback), 1 },
> { "no_writeback", offsetof(struct lo_data, writeback), 0 },
> { "source=%s", offsetof(struct lo_data, source), 0 },
> @@ -2665,6 +2668,37 @@ static void setup_capabilities(char *modcaps_in)
> pthread_mutex_unlock(&cap.mutex);
> }
>
> +/*
> + * Use chroot as a weaker sandbox for environment where the process is launched
> + * without CAP_SYS_ADMIN.
> + */
> +static void setup_chroot(struct lo_data *lo)
> +{
> + lo->proc_self_fd = open("/proc/self/fd", O_PATH);
> + if (lo->proc_self_fd == -1) {
> + fuse_log(FUSE_LOG_ERR, "open(\"/proc/self/fd\", O_PATH): %m\n");
> + exit(1);
> + }
> +
> + /*
> + * Make the shared directory the file system root so that FUSE_OPEN
> + * (lo_open()) cannot escape the shared directory by opening a symlink.
> + *
> + * It's still possible to escape the chroot via lo->proc_self_fd but that
> + * requires gaining control of the process first.
> + */
> + if (chroot(lo->source) != 0) {
> + fuse_log(FUSE_LOG_ERR, "chroot(\"%s\"): %m\n", lo->source);
> + exit(1);
> + }
> +
> + /* Move into the chroot */
> + if (chdir("/") != 0) {
> + fuse_log(FUSE_LOG_ERR, "chdir(\"/\"): %m\n");
> + exit(1);
> + }
> +}
> +
> /*
> * Lock down this process to prevent access to other processes or files outside
> * source directory. This reduces the impact of arbitrary code execution bugs.
> @@ -2672,8 +2706,13 @@ static void setup_capabilities(char *modcaps_in)
> static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
> bool enable_syslog)
> {
> - setup_namespaces(lo, se);
> - setup_mounts(lo->source);
> + if (lo->chroot) {
> + setup_chroot(lo);
> + } else {
> + setup_namespaces(lo, se);
> + setup_mounts(lo->source);
> + }
> +
> setup_seccomp(enable_syslog);
> setup_capabilities(g_strdup(lo->modcaps));
> }
> @@ -2820,6 +2859,7 @@ int main(int argc, char *argv[])
> struct fuse_session *se;
> struct fuse_cmdline_opts opts;
> struct lo_data lo = {
> + .chroot = 0,
> .debug = 0,
> .writeback = 0,
> .posix_lock = 1,
> --
> 2.26.2
>
> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs@redhat.com
> https://www.redhat.com/mailman/listinfo/virtio-fs
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Virtio-fs] [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
2020-07-22 18:17 ` Vivek Goyal
@ 2020-07-23 12:29 ` Stefan Hajnoczi
0 siblings, 0 replies; 43+ messages in thread
From: Stefan Hajnoczi @ 2020-07-23 12:29 UTC (permalink / raw)
To: Vivek Goyal; +Cc: virtio-fs, vromanso, qemu-devel, rmohr
[-- Attachment #1: Type: text/plain, Size: 2315 bytes --]
On Wed, Jul 22, 2020 at 02:17:10PM -0400, Vivek Goyal wrote:
> On Wed, Jul 22, 2020 at 02:02:05PM +0100, Stefan Hajnoczi wrote:
> > virtiofsd cannot run in an unprivileged container because CAP_SYS_ADMIN
> > is required to create namespaces.
> >
> > Introduce a weaker sandbox that is sufficient in container environments
> > because the container runtime already sets up namespaces. Use chroot to
> > restrict path traversal to the shared directory.
> >
> > virtiofsd loses the following:
> >
> > 1. Mount namespace. The process chroots to the shared directory but
> > leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
> > syscalls.
> >
> > 2. Pid namespace. This should be fine because virtiofsd is the only
> > process running in the container.
> >
> > 3. Network namespace. This should be fine because seccomp already
> > rejects the connect(2) syscall, but an additional layer of security
> > is lost. Container runtime-specific network security policies can be
> > used drop network traffic (except for the vhost-user UNIX domain
> > socket).
> >
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > tools/virtiofsd/helper.c | 3 +++
> > tools/virtiofsd/passthrough_ll.c | 44 ++++++++++++++++++++++++++++++--
> > 2 files changed, 45 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
> > index 3105b6c23a..7421c9ca1a 100644
> > --- a/tools/virtiofsd/helper.c
> > +++ b/tools/virtiofsd/helper.c
> > @@ -151,6 +151,9 @@ void fuse_cmdline_help(void)
> > " -o cache=<mode> cache mode. could be one of \"auto, "
> > "always, none\"\n"
> > " default: auto\n"
> > + " -o chroot|no_chroot use container-friendly chroot instead\n"
> > + " of stronger mount namespace sandbox\n"
> > + " default: false\n"
>
> This option name disabling namespace setup is little confusing to me.
>
> Will it make sense to provide another option to disable/enable
> namespaces. "-o no-namespaces" and that disables setting up
> namespaces.
Thanks, I'll propose a new syntax.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
2020-07-22 13:02 ` [Virtio-fs] " Stefan Hajnoczi
@ 2020-07-22 19:03 ` Dr. David Alan Gilbert
-1 siblings, 0 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2020-07-22 19:03 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: virtio-fs, rmohr, qemu-devel, vromanso
* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> virtiofsd cannot run in an unprivileged container because CAP_SYS_ADMIN
> is required to create namespaces.
>
> Introduce a weaker sandbox that is sufficient in container environments
> because the container runtime already sets up namespaces. Use chroot to
> restrict path traversal to the shared directory.
>
> virtiofsd loses the following:
>
> 1. Mount namespace. The process chroots to the shared directory but
> leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
> syscalls.
>
> 2. Pid namespace. This should be fine because virtiofsd is the only
> process running in the container.
>
> 3. Network namespace. This should be fine because seccomp already
> rejects the connect(2) syscall, but an additional layer of security
> is lost. Container runtime-specific network security policies can be
> used drop network traffic (except for the vhost-user UNIX domain
> socket).
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> tools/virtiofsd/helper.c | 3 +++
> tools/virtiofsd/passthrough_ll.c | 44 ++++++++++++++++++++++++++++++--
> 2 files changed, 45 insertions(+), 2 deletions(-)
>
> diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
> index 3105b6c23a..7421c9ca1a 100644
> --- a/tools/virtiofsd/helper.c
> +++ b/tools/virtiofsd/helper.c
> @@ -151,6 +151,9 @@ void fuse_cmdline_help(void)
> " -o cache=<mode> cache mode. could be one of \"auto, "
> "always, none\"\n"
> " default: auto\n"
> + " -o chroot|no_chroot use container-friendly chroot instead\n"
> + " of stronger mount namespace sandbox\n"
> + " default: false\n"
> " -o flock|no_flock enable/disable flock\n"
> " default: no_flock\n"
> " -o log_level=<level> log level, default to \"info\"\n"
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 50a164a599..990c0a8a70 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -139,6 +139,7 @@ enum {
>
> struct lo_data {
> pthread_mutex_t mutex;
> + int chroot; /* 1 - use chroot, 0 - use mount namespace */
> int debug;
> int writeback;
> int flock;
> @@ -162,6 +163,8 @@ struct lo_data {
> };
>
> static const struct fuse_opt lo_opts[] = {
> + { "chroot", offsetof(struct lo_data, chroot), 1 },
> + { "no_chroot", offsetof(struct lo_data, chroot), 0 },
> { "writeback", offsetof(struct lo_data, writeback), 1 },
> { "no_writeback", offsetof(struct lo_data, writeback), 0 },
> { "source=%s", offsetof(struct lo_data, source), 0 },
> @@ -2665,6 +2668,37 @@ static void setup_capabilities(char *modcaps_in)
> pthread_mutex_unlock(&cap.mutex);
> }
>
> +/*
> + * Use chroot as a weaker sandbox for environment where the process is launched
> + * without CAP_SYS_ADMIN.
> + */
> +static void setup_chroot(struct lo_data *lo)
> +{
> + lo->proc_self_fd = open("/proc/self/fd", O_PATH);
> + if (lo->proc_self_fd == -1) {
> + fuse_log(FUSE_LOG_ERR, "open(\"/proc/self/fd\", O_PATH): %m\n");
> + exit(1);
> + }
> +
> + /*
> + * Make the shared directory the file system root so that FUSE_OPEN
> + * (lo_open()) cannot escape the shared directory by opening a symlink.
> + *
> + * It's still possible to escape the chroot via lo->proc_self_fd but that
> + * requires gaining control of the process first.
> + */
> + if (chroot(lo->source) != 0) {
> + fuse_log(FUSE_LOG_ERR, "chroot(\"%s\"): %m\n", lo->source);
> + exit(1);
> + }
I'm seeing suggestions that you should drop CAP_SYS_CHROOT after
chroot'ing to stop an old escape (where you create another jail inside
the jail and the kernel then lets you walk outside of the old one).
Dave
> + /* Move into the chroot */
> + if (chdir("/") != 0) {
> + fuse_log(FUSE_LOG_ERR, "chdir(\"/\"): %m\n");
> + exit(1);
> + }
> +}
> +
> /*
> * Lock down this process to prevent access to other processes or files outside
> * source directory. This reduces the impact of arbitrary code execution bugs.
> @@ -2672,8 +2706,13 @@ static void setup_capabilities(char *modcaps_in)
> static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
> bool enable_syslog)
> {
> - setup_namespaces(lo, se);
> - setup_mounts(lo->source);
> + if (lo->chroot) {
> + setup_chroot(lo);
> + } else {
> + setup_namespaces(lo, se);
> + setup_mounts(lo->source);
> + }
> +
> setup_seccomp(enable_syslog);
> setup_capabilities(g_strdup(lo->modcaps));
> }
> @@ -2820,6 +2859,7 @@ int main(int argc, char *argv[])
> struct fuse_session *se;
> struct fuse_cmdline_opts opts;
> struct lo_data lo = {
> + .chroot = 0,
> .debug = 0,
> .writeback = 0,
> .posix_lock = 1,
> --
> 2.26.2
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Virtio-fs] [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
@ 2020-07-22 19:03 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2020-07-22 19:03 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: virtio-fs, rmohr, qemu-devel, vromanso
* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> virtiofsd cannot run in an unprivileged container because CAP_SYS_ADMIN
> is required to create namespaces.
>
> Introduce a weaker sandbox that is sufficient in container environments
> because the container runtime already sets up namespaces. Use chroot to
> restrict path traversal to the shared directory.
>
> virtiofsd loses the following:
>
> 1. Mount namespace. The process chroots to the shared directory but
> leaves the mounts in place. Seccomp rejects mount(2)/umount(2)
> syscalls.
>
> 2. Pid namespace. This should be fine because virtiofsd is the only
> process running in the container.
>
> 3. Network namespace. This should be fine because seccomp already
> rejects the connect(2) syscall, but an additional layer of security
> is lost. Container runtime-specific network security policies can be
> used drop network traffic (except for the vhost-user UNIX domain
> socket).
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> tools/virtiofsd/helper.c | 3 +++
> tools/virtiofsd/passthrough_ll.c | 44 ++++++++++++++++++++++++++++++--
> 2 files changed, 45 insertions(+), 2 deletions(-)
>
> diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
> index 3105b6c23a..7421c9ca1a 100644
> --- a/tools/virtiofsd/helper.c
> +++ b/tools/virtiofsd/helper.c
> @@ -151,6 +151,9 @@ void fuse_cmdline_help(void)
> " -o cache=<mode> cache mode. could be one of \"auto, "
> "always, none\"\n"
> " default: auto\n"
> + " -o chroot|no_chroot use container-friendly chroot instead\n"
> + " of stronger mount namespace sandbox\n"
> + " default: false\n"
> " -o flock|no_flock enable/disable flock\n"
> " default: no_flock\n"
> " -o log_level=<level> log level, default to \"info\"\n"
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 50a164a599..990c0a8a70 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -139,6 +139,7 @@ enum {
>
> struct lo_data {
> pthread_mutex_t mutex;
> + int chroot; /* 1 - use chroot, 0 - use mount namespace */
> int debug;
> int writeback;
> int flock;
> @@ -162,6 +163,8 @@ struct lo_data {
> };
>
> static const struct fuse_opt lo_opts[] = {
> + { "chroot", offsetof(struct lo_data, chroot), 1 },
> + { "no_chroot", offsetof(struct lo_data, chroot), 0 },
> { "writeback", offsetof(struct lo_data, writeback), 1 },
> { "no_writeback", offsetof(struct lo_data, writeback), 0 },
> { "source=%s", offsetof(struct lo_data, source), 0 },
> @@ -2665,6 +2668,37 @@ static void setup_capabilities(char *modcaps_in)
> pthread_mutex_unlock(&cap.mutex);
> }
>
> +/*
> + * Use chroot as a weaker sandbox for environment where the process is launched
> + * without CAP_SYS_ADMIN.
> + */
> +static void setup_chroot(struct lo_data *lo)
> +{
> + lo->proc_self_fd = open("/proc/self/fd", O_PATH);
> + if (lo->proc_self_fd == -1) {
> + fuse_log(FUSE_LOG_ERR, "open(\"/proc/self/fd\", O_PATH): %m\n");
> + exit(1);
> + }
> +
> + /*
> + * Make the shared directory the file system root so that FUSE_OPEN
> + * (lo_open()) cannot escape the shared directory by opening a symlink.
> + *
> + * It's still possible to escape the chroot via lo->proc_self_fd but that
> + * requires gaining control of the process first.
> + */
> + if (chroot(lo->source) != 0) {
> + fuse_log(FUSE_LOG_ERR, "chroot(\"%s\"): %m\n", lo->source);
> + exit(1);
> + }
I'm seeing suggestions that you should drop CAP_SYS_CHROOT after
chroot'ing to stop an old escape (where you create another jail inside
the jail and the kernel then lets you walk outside of the old one).
Dave
> + /* Move into the chroot */
> + if (chdir("/") != 0) {
> + fuse_log(FUSE_LOG_ERR, "chdir(\"/\"): %m\n");
> + exit(1);
> + }
> +}
> +
> /*
> * Lock down this process to prevent access to other processes or files outside
> * source directory. This reduces the impact of arbitrary code execution bugs.
> @@ -2672,8 +2706,13 @@ static void setup_capabilities(char *modcaps_in)
> static void setup_sandbox(struct lo_data *lo, struct fuse_session *se,
> bool enable_syslog)
> {
> - setup_namespaces(lo, se);
> - setup_mounts(lo->source);
> + if (lo->chroot) {
> + setup_chroot(lo);
> + } else {
> + setup_namespaces(lo, se);
> + setup_mounts(lo->source);
> + }
> +
> setup_seccomp(enable_syslog);
> setup_capabilities(g_strdup(lo->modcaps));
> }
> @@ -2820,6 +2859,7 @@ int main(int argc, char *argv[])
> struct fuse_session *se;
> struct fuse_cmdline_opts opts;
> struct lo_data lo = {
> + .chroot = 0,
> .debug = 0,
> .writeback = 0,
> .posix_lock = 1,
> --
> 2.26.2
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
2020-07-22 19:03 ` [Virtio-fs] " Dr. David Alan Gilbert
@ 2020-07-23 12:32 ` Stefan Hajnoczi
-1 siblings, 0 replies; 43+ messages in thread
From: Stefan Hajnoczi @ 2020-07-23 12:32 UTC (permalink / raw)
To: Dr. David Alan Gilbert; +Cc: virtio-fs, rmohr, qemu-devel, vromanso
[-- Attachment #1: Type: text/plain, Size: 973 bytes --]
On Wed, Jul 22, 2020 at 08:03:18PM +0100, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > + /*
> > + * Make the shared directory the file system root so that FUSE_OPEN
> > + * (lo_open()) cannot escape the shared directory by opening a symlink.
> > + *
> > + * It's still possible to escape the chroot via lo->proc_self_fd but that
> > + * requires gaining control of the process first.
> > + */
> > + if (chroot(lo->source) != 0) {
> > + fuse_log(FUSE_LOG_ERR, "chroot(\"%s\"): %m\n", lo->source);
> > + exit(1);
> > + }
>
> I'm seeing suggestions that you should drop CAP_SYS_CHROOT after
> chroot'ing to stop an old escape (where you create another jail inside
> the jail and the kernel then lets you walk outside of the old one).
That's already the case:
1. setup_seccomp() blocks further chroot(2) calls.
2. setup_capabilities() drops CAP_SYS_CHROOT.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Virtio-fs] [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
@ 2020-07-23 12:32 ` Stefan Hajnoczi
0 siblings, 0 replies; 43+ messages in thread
From: Stefan Hajnoczi @ 2020-07-23 12:32 UTC (permalink / raw)
To: Dr. David Alan Gilbert; +Cc: virtio-fs, rmohr, qemu-devel, vromanso
[-- Attachment #1: Type: text/plain, Size: 973 bytes --]
On Wed, Jul 22, 2020 at 08:03:18PM +0100, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > + /*
> > + * Make the shared directory the file system root so that FUSE_OPEN
> > + * (lo_open()) cannot escape the shared directory by opening a symlink.
> > + *
> > + * It's still possible to escape the chroot via lo->proc_self_fd but that
> > + * requires gaining control of the process first.
> > + */
> > + if (chroot(lo->source) != 0) {
> > + fuse_log(FUSE_LOG_ERR, "chroot(\"%s\"): %m\n", lo->source);
> > + exit(1);
> > + }
>
> I'm seeing suggestions that you should drop CAP_SYS_CHROOT after
> chroot'ing to stop an old escape (where you create another jail inside
> the jail and the kernel then lets you walk outside of the old one).
That's already the case:
1. setup_seccomp() blocks further chroot(2) calls.
2. setup_capabilities() drops CAP_SYS_CHROOT.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
2020-07-23 12:32 ` [Virtio-fs] " Stefan Hajnoczi
@ 2020-07-23 17:55 ` Dr. David Alan Gilbert
-1 siblings, 0 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2020-07-23 17:55 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: virtio-fs, rmohr, qemu-devel, vromanso
* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Wed, Jul 22, 2020 at 08:03:18PM +0100, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > + /*
> > > + * Make the shared directory the file system root so that FUSE_OPEN
> > > + * (lo_open()) cannot escape the shared directory by opening a symlink.
> > > + *
> > > + * It's still possible to escape the chroot via lo->proc_self_fd but that
> > > + * requires gaining control of the process first.
> > > + */
> > > + if (chroot(lo->source) != 0) {
> > > + fuse_log(FUSE_LOG_ERR, "chroot(\"%s\"): %m\n", lo->source);
> > > + exit(1);
> > > + }
> >
> > I'm seeing suggestions that you should drop CAP_SYS_CHROOT after
> > chroot'ing to stop an old escape (where you create another jail inside
> > the jail and the kernel then lets you walk outside of the old one).
>
> That's already the case:
>
> 1. setup_seccomp() blocks further chroot(2) calls.
> 2. setup_capabilities() drops CAP_SYS_CHROOT.
Ah yes; could you add a comment if you respin; it's not obvious that
the capability to chroot allows you to break out of an existing chroot
you're in.
Dave
> Stefan
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Virtio-fs] [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
@ 2020-07-23 17:55 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 43+ messages in thread
From: Dr. David Alan Gilbert @ 2020-07-23 17:55 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: virtio-fs, rmohr, qemu-devel, vromanso
* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Wed, Jul 22, 2020 at 08:03:18PM +0100, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > + /*
> > > + * Make the shared directory the file system root so that FUSE_OPEN
> > > + * (lo_open()) cannot escape the shared directory by opening a symlink.
> > > + *
> > > + * It's still possible to escape the chroot via lo->proc_self_fd but that
> > > + * requires gaining control of the process first.
> > > + */
> > > + if (chroot(lo->source) != 0) {
> > > + fuse_log(FUSE_LOG_ERR, "chroot(\"%s\"): %m\n", lo->source);
> > > + exit(1);
> > > + }
> >
> > I'm seeing suggestions that you should drop CAP_SYS_CHROOT after
> > chroot'ing to stop an old escape (where you create another jail inside
> > the jail and the kernel then lets you walk outside of the old one).
>
> That's already the case:
>
> 1. setup_seccomp() blocks further chroot(2) calls.
> 2. setup_capabilities() drops CAP_SYS_CHROOT.
Ah yes; could you add a comment if you respin; it's not obvious that
the capability to chroot allows you to break out of an existing chroot
you're in.
Dave
> Stefan
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
2020-07-23 17:55 ` [Virtio-fs] " Dr. David Alan Gilbert
@ 2020-07-24 12:22 ` Stefan Hajnoczi
-1 siblings, 0 replies; 43+ messages in thread
From: Stefan Hajnoczi @ 2020-07-24 12:22 UTC (permalink / raw)
To: Dr. David Alan Gilbert; +Cc: virtio-fs, rmohr, qemu-devel, vromanso
[-- Attachment #1: Type: text/plain, Size: 1357 bytes --]
On Thu, Jul 23, 2020 at 06:55:38PM +0100, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > On Wed, Jul 22, 2020 at 08:03:18PM +0100, Dr. David Alan Gilbert wrote:
> > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > > + /*
> > > > + * Make the shared directory the file system root so that FUSE_OPEN
> > > > + * (lo_open()) cannot escape the shared directory by opening a symlink.
> > > > + *
> > > > + * It's still possible to escape the chroot via lo->proc_self_fd but that
> > > > + * requires gaining control of the process first.
> > > > + */
> > > > + if (chroot(lo->source) != 0) {
> > > > + fuse_log(FUSE_LOG_ERR, "chroot(\"%s\"): %m\n", lo->source);
> > > > + exit(1);
> > > > + }
> > >
> > > I'm seeing suggestions that you should drop CAP_SYS_CHROOT after
> > > chroot'ing to stop an old escape (where you create another jail inside
> > > the jail and the kernel then lets you walk outside of the old one).
> >
> > That's already the case:
> >
> > 1. setup_seccomp() blocks further chroot(2) calls.
> > 2. setup_capabilities() drops CAP_SYS_CHROOT.
>
> Ah yes; could you add a comment if you respin; it's not obvious that
> the capability to chroot allows you to break out of an existing chroot
> you're in.
Sure.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Virtio-fs] [PATCH for-5.1 2/3] virtiofsd: add container-friendly -o chroot sandboxing option
@ 2020-07-24 12:22 ` Stefan Hajnoczi
0 siblings, 0 replies; 43+ messages in thread
From: Stefan Hajnoczi @ 2020-07-24 12:22 UTC (permalink / raw)
To: Dr. David Alan Gilbert; +Cc: virtio-fs, rmohr, qemu-devel, vromanso
[-- Attachment #1: Type: text/plain, Size: 1357 bytes --]
On Thu, Jul 23, 2020 at 06:55:38PM +0100, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > On Wed, Jul 22, 2020 at 08:03:18PM +0100, Dr. David Alan Gilbert wrote:
> > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > > + /*
> > > > + * Make the shared directory the file system root so that FUSE_OPEN
> > > > + * (lo_open()) cannot escape the shared directory by opening a symlink.
> > > > + *
> > > > + * It's still possible to escape the chroot via lo->proc_self_fd but that
> > > > + * requires gaining control of the process first.
> > > > + */
> > > > + if (chroot(lo->source) != 0) {
> > > > + fuse_log(FUSE_LOG_ERR, "chroot(\"%s\"): %m\n", lo->source);
> > > > + exit(1);
> > > > + }
> > >
> > > I'm seeing suggestions that you should drop CAP_SYS_CHROOT after
> > > chroot'ing to stop an old escape (where you create another jail inside
> > > the jail and the kernel then lets you walk outside of the old one).
> >
> > That's already the case:
> >
> > 1. setup_seccomp() blocks further chroot(2) calls.
> > 2. setup_capabilities() drops CAP_SYS_CHROOT.
>
> Ah yes; could you add a comment if you respin; it's not obvious that
> the capability to chroot allows you to break out of an existing chroot
> you're in.
Sure.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 43+ messages in thread