All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp
@ 2012-05-04 19:08 Eduardo Otubo
  2012-05-04 19:08 ` [Qemu-devel] [RFC] [PATCH 1/2] Adding support for libseccomp in configure Eduardo Otubo
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Eduardo Otubo @ 2012-05-04 19:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Eduardo Otubo

Hello all,

This is the first effort to sandboxing Qemu guests using Libseccomp[0]. The
patches that follows are pretty simple and straightforward. I added the correct
options and checks to the configure script and the basic calls to libseccomp in
the main loop at vl.c. Details of each one are in the emails of the patch set.

This support limits the system call footprint of the entire QEMU process to a
limited set of syscalls, those that we know QEMU uses.  The idea is to limit
the allowable syscalls, therefore limiting the impact that an attacked guest
could have on the host system.

It's important to note that the libseccomp itself needs the seccomp mode 2
feature in the kernel, which is pretty close to get to the mainline since it's
already been accepted to the linux-next branch[1].

As always, comments are more than welcome.

Regards,

[0] - http://sourceforge.net/projects/libseccomp/
[1] - http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=e2cfabdfd075648216f99c2c03821cf3f47c1727

Eduardo Otubo (2):
  Adding support for libseccomp in configure
  Adding basic calls to libseccomp in vl.c

 configure |   23 ++++++++++++++++++
 vl.c      |   81 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 104 insertions(+)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] [RFC] [PATCH 1/2] Adding support for libseccomp in configure
  2012-05-04 19:08 [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp Eduardo Otubo
@ 2012-05-04 19:08 ` Eduardo Otubo
  2012-05-04 19:08 ` [Qemu-devel] [RFC] [PATCH 2/2] Adding basic calls to libseccomp in vl.c Eduardo Otubo
  2012-05-08  9:15 ` [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp Daniel P. Berrange
  2 siblings, 0 replies; 13+ messages in thread
From: Eduardo Otubo @ 2012-05-04 19:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Eduardo Otubo

Adding basic options to the configure script to use libseccomp or not.
The default is set to 'no'. If the flag --enable-libseccomp is used, the
script will check for its existence using pkg-config.

Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
---
 configure |   23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/configure b/configure
index 0111774..a496db3 100755
--- a/configure
+++ b/configure
@@ -194,6 +194,7 @@ zlib="yes"
 guest_agent="yes"
 libiscsi=""
 coroutine=""
+libseccomp="no"
 
 # parse CC options first
 for opt do
@@ -824,6 +825,10 @@ for opt do
   ;;
   --disable-guest-agent) guest_agent="no"
   ;;
+  --enable-libseccomp) libseccomp="yes"
+  ;;
+  --disable-libseccomp) libseccomp="no"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1108,6 +1113,8 @@ echo "  --disable-usb-redir      disable usb network redirection support"
 echo "  --enable-usb-redir       enable usb network redirection support"
 echo "  --disable-guest-agent    disable building of the QEMU Guest Agent"
 echo "  --enable-guest-agent     enable building of the QEMU Guest Agent"
+echo "  --disable-libseccomp     disable libseccomp support"
+echo "  --enable-libseccomp      enable libseccomp support"
 echo "  --with-coroutine=BACKEND coroutine backend. Supported options:"
 echo "                           gthread, ucontext, sigaltstack, windows"
 echo ""
@@ -1353,6 +1360,17 @@ EOF
 fi
 
 ##########################################
+# libseccomp check
+
+if test "$libseccomp" = "yes" ; then
+    if $pkg_config libseccomp --modversion >/dev/null 2>&1; then
+        LIBS=`$pkg_config --libs libseccomp`
+    else
+        feature_not_found "libseccomp"
+    fi
+fi
+
+##########################################
 # xen probe
 
 if test "$xen" != "no" ; then
@@ -3008,6 +3026,7 @@ echo "usb net redir     $usb_redir"
 echo "OpenGL support    $opengl"
 echo "libiscsi support  $libiscsi"
 echo "build guest agent $guest_agent"
+echo "seccomp support   $libseccomp"
 echo "coroutine backend $coroutine_backend"
 
 if test "$sdl_too_old" = "yes"; then
@@ -3309,6 +3328,10 @@ if test "$libiscsi" = "yes" ; then
   echo "CONFIG_LIBISCSI=y" >> $config_host_mak
 fi
 
+if test "$libseccomp" = "yes" ; then
+  echo "CONFIG_LIBSECCOMP=y" >> $config_host_mak
+fi
+
 # XXX: suppress that
 if [ "$bsd" = "yes" ] ; then
   echo "CONFIG_BSD=y" >> $config_host_mak
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [RFC] [PATCH 2/2] Adding basic calls to libseccomp in vl.c
  2012-05-04 19:08 [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp Eduardo Otubo
  2012-05-04 19:08 ` [Qemu-devel] [RFC] [PATCH 1/2] Adding support for libseccomp in configure Eduardo Otubo
@ 2012-05-04 19:08 ` Eduardo Otubo
  2012-05-04 21:59   ` Andreas Färber
  2012-05-08  9:15 ` [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp Daniel P. Berrange
  2 siblings, 1 reply; 13+ messages in thread
From: Eduardo Otubo @ 2012-05-04 19:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Eduardo Otubo

I added a syscall struct using priority levels as described in the libseccomp
man page. The priority numbers are based to the frequency they appear in a
sample strace from a regular qemu guest run under libvirt.

Libseccomp generates linear BPF code to filter system calls, those rules are
read one after another. The priority system places the most common rules first
in order to reduce the overhead when processing them.

Also, since this is just a first RFC, the whitelist is a little raw. We might
need your help to improve, test and fine tune the set of system calls.

Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
---
 vl.c |   81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/vl.c b/vl.c
index ae91a8a..e23838b 100644
--- a/vl.c
+++ b/vl.c
@@ -63,6 +63,9 @@
 
 #include <linux/ppdev.h>
 #include <linux/parport.h>
+#ifdef CONFIG_LIBSECCOMP
+#include <seccomp.h>
+#endif
 #endif
 #ifdef __sun__
 #include <sys/stat.h>
@@ -175,6 +178,8 @@ int main(int argc, char **argv)
 
 #define MAX_VIRTIO_CONSOLES 1
 
+static int seccomp_start(void);
+
 static const char *data_dir;
 const char *bios_name = NULL;
 enum vga_retrace_method vga_retrace_method = VGA_RETRACE_DUMB;
@@ -313,6 +318,75 @@ static int default_driver_check(QemuOpts *opts, void *opaque)
     return 0;
 }
 
+#ifdef CONFIG_LIBSECCOMP
+struct qemu_seccomp_syscall {
+    int32_t num;
+    uint8_t priority;
+};
+
+static struct qemu_seccomp_syscall seccomp_whitelist[] = {
+    {SCMP_SYS(timer_settime), 255},
+    {SCMP_SYS(timer_gettime), 254},
+    {SCMP_SYS(futex), 253},
+    {SCMP_SYS(select), 252},
+    {SCMP_SYS(recvfrom), 251},
+    {SCMP_SYS(sendto), 250},
+    {SCMP_SYS(read), 249},
+    {SCMP_SYS(brk), 248},
+    {SCMP_SYS(clone), 247},
+    {SCMP_SYS(mmap), 247},
+    {SCMP_SYS(mprotect), 246},
+    {SCMP_SYS(rt_sigprocmask), 245},
+    {SCMP_SYS(write), 244},
+    {SCMP_SYS(fcntl), 243},
+    {SCMP_SYS(tgkill), 242},
+    {SCMP_SYS(rt_sigaction), 242},
+    {SCMP_SYS(pipe2), 242},
+    {SCMP_SYS(munmap), 242},
+    {SCMP_SYS(mremap), 242},
+    {SCMP_SYS(getsockname), 242},
+    {SCMP_SYS(getpeername), 242},
+    {SCMP_SYS(fdatasync), 242},
+    {SCMP_SYS(close), 242}
+};
+
+#define seccomp_whitelist_count \
+            (sizeof(seccomp_whitelist)/sizeof(seccomp_whitelist[0]))
+
+int seccomp_start(void)
+{
+    int rc = 0;
+    unsigned int i = 0;
+
+    rc = seccomp_init(SCMP_ACT_KILL);
+    if (rc < 0) {
+        goto seccomp_return;
+    }
+
+    for (i = 0; i < seccomp_whitelist_count; i++) {
+        rc = seccomp_rule_add(SCMP_ACT_ALLOW, seccomp_whitelist[i].num, 0);
+        if (rc < 0) {
+            goto seccomp_return;
+        }
+        rc = seccomp_syscall_priority(seccomp_whitelist[i].num,
+                                      seccomp_whitelist[i].priority);
+        if (rc < 0) {
+            goto seccomp_return;
+        }
+    }
+
+    rc = seccomp_load();
+
+  seccomp_return:
+    seccomp_release();
+    if (rc < 0) {
+        fprintf(stderr,
+                "ERROR: failed to configure the seccomp syscall filter in the kernel");
+    }
+    return rc;
+}
+#endif
+
 /***********************************************************/
 /* QEMU state */
 
@@ -2257,6 +2331,13 @@ int qemu_init_main_loop(void)
 
 int main(int argc, char **argv, char **envp)
 {
+
+#ifdef CONFIG_LIBSECCOMP
+    if (seccomp_start() < 0) {
+        exit(1);
+    }
+#endif
+
     int i;
     int snapshot, linux_boot;
     const char *icount_option = NULL;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] [PATCH 2/2] Adding basic calls to libseccomp in vl.c
  2012-05-04 19:08 ` [Qemu-devel] [RFC] [PATCH 2/2] Adding basic calls to libseccomp in vl.c Eduardo Otubo
@ 2012-05-04 21:59   ` Andreas Färber
  2012-05-07 11:01     ` Paolo Bonzini
  2012-05-07 12:16     ` Eduardo Otubo
  0 siblings, 2 replies; 13+ messages in thread
From: Andreas Färber @ 2012-05-04 21:59 UTC (permalink / raw)
  To: Eduardo Otubo; +Cc: qemu-devel

Am 04.05.2012 21:08, schrieb Eduardo Otubo:
> I added a syscall struct using priority levels as described in the libseccomp
> man page. The priority numbers are based to the frequency they appear in a
> sample strace from a regular qemu guest run under libvirt.
> 
> Libseccomp generates linear BPF code to filter system calls, those rules are
> read one after another. The priority system places the most common rules first
> in order to reduce the overhead when processing them.
> 
> Also, since this is just a first RFC, the whitelist is a little raw. We might
> need your help to improve, test and fine tune the set of system calls.
> 
> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
> ---
>  vl.c |   81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 81 insertions(+)
> 
> diff --git a/vl.c b/vl.c
> index ae91a8a..e23838b 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -63,6 +63,9 @@
>  
>  #include <linux/ppdev.h>
>  #include <linux/parport.h>
> +#ifdef CONFIG_LIBSECCOMP
> +#include <seccomp.h>
> +#endif
>  #endif
>  #ifdef __sun__
>  #include <sys/stat.h>
> @@ -175,6 +178,8 @@ int main(int argc, char **argv)
>  
>  #define MAX_VIRTIO_CONSOLES 1
>  
> +static int seccomp_start(void);

You haven't tested this without libseccomp apparently?

Other than that mostly some Coding Style issues...

> +
>  static const char *data_dir;
>  const char *bios_name = NULL;
>  enum vga_retrace_method vga_retrace_method = VGA_RETRACE_DUMB;
> @@ -313,6 +318,75 @@ static int default_driver_check(QemuOpts *opts, void *opaque)
>      return 0;
>  }
>  
> +#ifdef CONFIG_LIBSECCOMP
> +struct qemu_seccomp_syscall {

Struct names are expected to be CamelCase.

> +    int32_t num;
> +    uint8_t priority;
> +};
> +
> +static struct qemu_seccomp_syscall seccomp_whitelist[] = {
> +    {SCMP_SYS(timer_settime), 255},

Spaces inside braces please.

> +    {SCMP_SYS(timer_gettime), 254},
> +    {SCMP_SYS(futex), 253},
> +    {SCMP_SYS(select), 252},
> +    {SCMP_SYS(recvfrom), 251},
> +    {SCMP_SYS(sendto), 250},
> +    {SCMP_SYS(read), 249},
> +    {SCMP_SYS(brk), 248},
> +    {SCMP_SYS(clone), 247},
> +    {SCMP_SYS(mmap), 247},
> +    {SCMP_SYS(mprotect), 246},
> +    {SCMP_SYS(rt_sigprocmask), 245},
> +    {SCMP_SYS(write), 244},
> +    {SCMP_SYS(fcntl), 243},
> +    {SCMP_SYS(tgkill), 242},
> +    {SCMP_SYS(rt_sigaction), 242},
> +    {SCMP_SYS(pipe2), 242},
> +    {SCMP_SYS(munmap), 242},
> +    {SCMP_SYS(mremap), 242},
> +    {SCMP_SYS(getsockname), 242},
> +    {SCMP_SYS(getpeername), 242},
> +    {SCMP_SYS(fdatasync), 242},
> +    {SCMP_SYS(close), 242}
> +};
> +
> +#define seccomp_whitelist_count \
> +            (sizeof(seccomp_whitelist)/sizeof(seccomp_whitelist[0]))

Please just use the ARRAY_SIZE() macro inline.

> +
> +int seccomp_start(void)
> +{
> +    int rc = 0;
> +    unsigned int i = 0;
> +
> +    rc = seccomp_init(SCMP_ACT_KILL);
> +    if (rc < 0) {
> +        goto seccomp_return;
> +    }
> +
> +    for (i = 0; i < seccomp_whitelist_count; i++) {
> +        rc = seccomp_rule_add(SCMP_ACT_ALLOW, seccomp_whitelist[i].num, 0);
> +        if (rc < 0) {
> +            goto seccomp_return;
> +        }
> +        rc = seccomp_syscall_priority(seccomp_whitelist[i].num,
> +                                      seccomp_whitelist[i].priority);
> +        if (rc < 0) {
> +            goto seccomp_return;
> +        }
> +    }
> +
> +    rc = seccomp_load();
> +
> +  seccomp_return:
> +    seccomp_release();
> +    if (rc < 0) {
> +        fprintf(stderr,
> +                "ERROR: failed to configure the seccomp syscall filter in the kernel");

\n missing.

> +    }
> +    return rc;
> +}
> +#endif
> +
>  /***********************************************************/
>  /* QEMU state */
>  
> @@ -2257,6 +2331,13 @@ int qemu_init_main_loop(void)
>  
>  int main(int argc, char **argv, char **envp)
>  {
> +
> +#ifdef CONFIG_LIBSECCOMP
> +    if (seccomp_start() < 0) {
> +        exit(1);

This is inconsistent: Either exit() within seccomp_start() where you
print the error message, or move the fprintf() here.

> +    }
> +#endif

This is adding code before the variable declaration block, please move
to after.

> +
>      int i;
>      int snapshot, linux_boot;
>      const char *icount_option = NULL;

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] [PATCH 2/2] Adding basic calls to libseccomp in vl.c
  2012-05-04 21:59   ` Andreas Färber
@ 2012-05-07 11:01     ` Paolo Bonzini
  2012-05-07 12:28       ` Eduardo Otubo
  2012-05-07 12:16     ` Eduardo Otubo
  1 sibling, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2012-05-07 11:01 UTC (permalink / raw)
  To: qemu-devel, otubo

Il 04/05/2012 23:59, Andreas Färber ha scritto:
>> > +static struct qemu_seccomp_syscall seccomp_whitelist[] = {
>> > +    {SCMP_SYS(timer_settime), 255},
> Spaces inside braces please.
> 
>> > +    {SCMP_SYS(timer_gettime), 254},
>> > +    {SCMP_SYS(futex), 253},
>> > +    {SCMP_SYS(select), 252},
>> > +    {SCMP_SYS(recvfrom), 251},
>> > +    {SCMP_SYS(sendto), 250},
>> > +    {SCMP_SYS(read), 249},
>> > +    {SCMP_SYS(brk), 248},
>> > +    {SCMP_SYS(clone), 247},
>> > +    {SCMP_SYS(mmap), 247},
>> > +    {SCMP_SYS(mprotect), 246},
>> > +    {SCMP_SYS(rt_sigprocmask), 245},
>> > +    {SCMP_SYS(write), 244},
>> > +    {SCMP_SYS(fcntl), 243},
>> > +    {SCMP_SYS(tgkill), 242},
>> > +    {SCMP_SYS(rt_sigaction), 242},
>> > +    {SCMP_SYS(pipe2), 242},
>> > +    {SCMP_SYS(munmap), 242},
>> > +    {SCMP_SYS(mremap), 242},
>> > +    {SCMP_SYS(getsockname), 242},
>> > +    {SCMP_SYS(getpeername), 242},
>> > +    {SCMP_SYS(fdatasync), 242},
>> > +    {SCMP_SYS(close), 242}
>> > +};
>> > +

At least the following are also used: recvmsg, sendmsg, accept, connect,
bind, listen, ioctl, fallocate, eventfd.  I don't know if all of them
have to be included in the list.  Other syscalls are not used but
probably should be allowed for simplicity, for example poll.

For ioctl, we may want to refine the white-list depending on the
argument, and perhaps even filter by file descriptor (the KVM ioctls are
in relatively fast paths, so it would be nice if they were passed with
fewer BPF ops).

BTW, please keep this out of vl.c, so that all hairiness can be added as
appropriate.

Paolo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] [PATCH 2/2] Adding basic calls to libseccomp in vl.c
  2012-05-04 21:59   ` Andreas Färber
  2012-05-07 11:01     ` Paolo Bonzini
@ 2012-05-07 12:16     ` Eduardo Otubo
  1 sibling, 0 replies; 13+ messages in thread
From: Eduardo Otubo @ 2012-05-07 12:16 UTC (permalink / raw)
  To: Andreas Färber; +Cc: qemu-devel

On Fri, May 04, 2012 at 11:59:00PM +0200, Andreas Färber wrote:
> Am 04.05.2012 21:08, schrieb Eduardo Otubo:
> > I added a syscall struct using priority levels as described in the libseccomp
> > man page. The priority numbers are based to the frequency they appear in a
> > sample strace from a regular qemu guest run under libvirt.
> > 
> > Libseccomp generates linear BPF code to filter system calls, those rules are
> > read one after another. The priority system places the most common rules first
> > in order to reduce the overhead when processing them.
> > 
> > Also, since this is just a first RFC, the whitelist is a little raw. We might
> > need your help to improve, test and fine tune the set of system calls.
> > 
> > Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
> > ---
> >  vl.c |   81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 81 insertions(+)
> > 
> > diff --git a/vl.c b/vl.c
> > index ae91a8a..e23838b 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -63,6 +63,9 @@
> >  
> >  #include <linux/ppdev.h>
> >  #include <linux/parport.h>
> > +#ifdef CONFIG_LIBSECCOMP
> > +#include <seccomp.h>
> > +#endif
> >  #endif
> >  #ifdef __sun__
> >  #include <sys/stat.h>
> > @@ -175,6 +178,8 @@ int main(int argc, char **argv)
> >  
> >  #define MAX_VIRTIO_CONSOLES 1
> >  
> > +static int seccomp_start(void);
> 
> You haven't tested this without libseccomp apparently?
> 
> Other than that mostly some Coding Style issues...

Actually I did run the scripts/checkpatch.pl, and there were no errors. But
I'll fix those issues for my next version. Thanks for pointing them out :)

> 
> > +
> >  static const char *data_dir;
> >  const char *bios_name = NULL;
> >  enum vga_retrace_method vga_retrace_method = VGA_RETRACE_DUMB;
> > @@ -313,6 +318,75 @@ static int default_driver_check(QemuOpts *opts, void *opaque)
> >      return 0;
> >  }
> >  
> > +#ifdef CONFIG_LIBSECCOMP
> > +struct qemu_seccomp_syscall {
> 
> Struct names are expected to be CamelCase.
> 
> > +    int32_t num;
> > +    uint8_t priority;
> > +};
> > +
> > +static struct qemu_seccomp_syscall seccomp_whitelist[] = {
> > +    {SCMP_SYS(timer_settime), 255},
> 
> Spaces inside braces please.
> 
> > +    {SCMP_SYS(timer_gettime), 254},
> > +    {SCMP_SYS(futex), 253},
> > +    {SCMP_SYS(select), 252},
> > +    {SCMP_SYS(recvfrom), 251},
> > +    {SCMP_SYS(sendto), 250},
> > +    {SCMP_SYS(read), 249},
> > +    {SCMP_SYS(brk), 248},
> > +    {SCMP_SYS(clone), 247},
> > +    {SCMP_SYS(mmap), 247},
> > +    {SCMP_SYS(mprotect), 246},
> > +    {SCMP_SYS(rt_sigprocmask), 245},
> > +    {SCMP_SYS(write), 244},
> > +    {SCMP_SYS(fcntl), 243},
> > +    {SCMP_SYS(tgkill), 242},
> > +    {SCMP_SYS(rt_sigaction), 242},
> > +    {SCMP_SYS(pipe2), 242},
> > +    {SCMP_SYS(munmap), 242},
> > +    {SCMP_SYS(mremap), 242},
> > +    {SCMP_SYS(getsockname), 242},
> > +    {SCMP_SYS(getpeername), 242},
> > +    {SCMP_SYS(fdatasync), 242},
> > +    {SCMP_SYS(close), 242}
> > +};
> > +
> > +#define seccomp_whitelist_count \
> > +            (sizeof(seccomp_whitelist)/sizeof(seccomp_whitelist[0]))
> 
> Please just use the ARRAY_SIZE() macro inline.
> 
> > +
> > +int seccomp_start(void)
> > +{
> > +    int rc = 0;
> > +    unsigned int i = 0;
> > +
> > +    rc = seccomp_init(SCMP_ACT_KILL);
> > +    if (rc < 0) {
> > +        goto seccomp_return;
> > +    }
> > +
> > +    for (i = 0; i < seccomp_whitelist_count; i++) {
> > +        rc = seccomp_rule_add(SCMP_ACT_ALLOW, seccomp_whitelist[i].num, 0);
> > +        if (rc < 0) {
> > +            goto seccomp_return;
> > +        }
> > +        rc = seccomp_syscall_priority(seccomp_whitelist[i].num,
> > +                                      seccomp_whitelist[i].priority);
> > +        if (rc < 0) {
> > +            goto seccomp_return;
> > +        }
> > +    }
> > +
> > +    rc = seccomp_load();
> > +
> > +  seccomp_return:
> > +    seccomp_release();
> > +    if (rc < 0) {
> > +        fprintf(stderr,
> > +                "ERROR: failed to configure the seccomp syscall filter in the kernel");
> 
> \n missing.
> 
> > +    }
> > +    return rc;
> > +}
> > +#endif
> > +
> >  /***********************************************************/
> >  /* QEMU state */
> >  
> > @@ -2257,6 +2331,13 @@ int qemu_init_main_loop(void)
> >  
> >  int main(int argc, char **argv, char **envp)
> >  {
> > +
> > +#ifdef CONFIG_LIBSECCOMP
> > +    if (seccomp_start() < 0) {
> > +        exit(1);
> 
> This is inconsistent: Either exit() within seccomp_start() where you
> print the error message, or move the fprintf() here.
> 

Agreed.

> > +    }
> > +#endif
> 
> This is adding code before the variable declaration block, please move
> to after.

Agreed.

Thanks for the comments.

-- 
Eduardo Otubo
Software Engineer
Linux Technology Center
IBM Systems & Technology Group
Mobile: +55 19 8135 0885 
eotubo@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] [PATCH 2/2] Adding basic calls to libseccomp in vl.c
  2012-05-07 11:01     ` Paolo Bonzini
@ 2012-05-07 12:28       ` Eduardo Otubo
  2012-05-07 12:34         ` Paolo Bonzini
  0 siblings, 1 reply; 13+ messages in thread
From: Eduardo Otubo @ 2012-05-07 12:28 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Mon, May 07, 2012 at 01:01:01PM +0200, Paolo Bonzini wrote:
> Il 04/05/2012 23:59, Andreas Färber ha scritto:
> >> > +static struct qemu_seccomp_syscall seccomp_whitelist[] = {
> >> > +    {SCMP_SYS(timer_settime), 255},
> > Spaces inside braces please.
> > 
> >> > +    {SCMP_SYS(timer_gettime), 254},
> >> > +    {SCMP_SYS(futex), 253},
> >> > +    {SCMP_SYS(select), 252},
> >> > +    {SCMP_SYS(recvfrom), 251},
> >> > +    {SCMP_SYS(sendto), 250},
> >> > +    {SCMP_SYS(read), 249},
> >> > +    {SCMP_SYS(brk), 248},
> >> > +    {SCMP_SYS(clone), 247},
> >> > +    {SCMP_SYS(mmap), 247},
> >> > +    {SCMP_SYS(mprotect), 246},
> >> > +    {SCMP_SYS(rt_sigprocmask), 245},
> >> > +    {SCMP_SYS(write), 244},
> >> > +    {SCMP_SYS(fcntl), 243},
> >> > +    {SCMP_SYS(tgkill), 242},
> >> > +    {SCMP_SYS(rt_sigaction), 242},
> >> > +    {SCMP_SYS(pipe2), 242},
> >> > +    {SCMP_SYS(munmap), 242},
> >> > +    {SCMP_SYS(mremap), 242},
> >> > +    {SCMP_SYS(getsockname), 242},
> >> > +    {SCMP_SYS(getpeername), 242},
> >> > +    {SCMP_SYS(fdatasync), 242},
> >> > +    {SCMP_SYS(close), 242}
> >> > +};
> >> > +
> 
> At least the following are also used: recvmsg, sendmsg, accept, connect,
> bind, listen, ioctl, fallocate, eventfd.  I don't know if all of them
> have to be included in the list.  Other syscalls are not used but
> probably should be allowed for simplicity, for example poll.

You straced those syscalls from what kind of guest? Can you provide the
frequency they appear on a strace of you example so we can set the
priority? Don't need any fancy report, just some grep's and wc's on a
strace output should be just fine.

> 
> For ioctl, we may want to refine the white-list depending on the
> argument, and perhaps even filter by file descriptor (the KVM ioctls are
> in relatively fast paths, so it would be nice if they were passed with
> fewer BPF ops).
> 
> BTW, please keep this out of vl.c, so that all hairiness can be added as
> appropriate.

I thought it would be overkill the create a new seccomp.[c|h] just for this
purpose. But yes, we can start thinking about that since the features might
grow in the future.

Thanks for the comments,
Regards

-- 
Eduardo Otubo
Software Engineer
Linux Technology Center
IBM Systems & Technology Group
Mobile: +55 19 8135 0885 
eotubo@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] [PATCH 2/2] Adding basic calls to libseccomp in vl.c
  2012-05-07 12:28       ` Eduardo Otubo
@ 2012-05-07 12:34         ` Paolo Bonzini
  0 siblings, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2012-05-07 12:34 UTC (permalink / raw)
  To: Eduardo Otubo; +Cc: qemu-devel

> > At least the following are also used: recvmsg, sendmsg, accept, connect,
> > bind, listen, ioctl, fallocate, eventfd.  I don't know if all of them
> > have to be included in the list.  Other syscalls are not used but
> > probably should be allowed for simplicity, for example poll.
> 
> You straced those syscalls from what kind of guest? Can you provide
> the frequency they appear on a strace of you example so we can set the
> priority? Don't need any fancy report, just some grep's and wc's on a
> strace output should be just fine.

No, just looking at the code.  (Uhm, fallocate is not used in master yet).

ioctl is the only one with pretty high priority.

Paolo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp
  2012-05-04 19:08 [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp Eduardo Otubo
  2012-05-04 19:08 ` [Qemu-devel] [RFC] [PATCH 1/2] Adding support for libseccomp in configure Eduardo Otubo
  2012-05-04 19:08 ` [Qemu-devel] [RFC] [PATCH 2/2] Adding basic calls to libseccomp in vl.c Eduardo Otubo
@ 2012-05-08  9:15 ` Daniel P. Berrange
  2012-05-08 11:32   ` Stefano Stabellini
  2 siblings, 1 reply; 13+ messages in thread
From: Daniel P. Berrange @ 2012-05-08  9:15 UTC (permalink / raw)
  To: Eduardo Otubo; +Cc: qemu-devel

On Fri, May 04, 2012 at 04:08:36PM -0300, Eduardo Otubo wrote:
> Hello all,
> 
> This is the first effort to sandboxing Qemu guests using Libseccomp[0]. The
> patches that follows are pretty simple and straightforward. I added the correct
> options and checks to the configure script and the basic calls to libseccomp in
> the main loop at vl.c. Details of each one are in the emails of the patch set.
> 
> This support limits the system call footprint of the entire QEMU process to a
> limited set of syscalls, those that we know QEMU uses.  The idea is to limit
> the allowable syscalls, therefore limiting the impact that an attacked guest
> could have on the host system.

What functionality has been lost by applying this seccomp filter ? I've not
looked closely at the code, but it appears as if this blocks pretty much
any kind of runtime device changes. ie no hotplug of any kind will work ?

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp
  2012-05-08  9:15 ` [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp Daniel P. Berrange
@ 2012-05-08 11:32   ` Stefano Stabellini
  2012-05-08 14:10     ` Corey Bryant
  0 siblings, 1 reply; 13+ messages in thread
From: Stefano Stabellini @ 2012-05-08 11:32 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: qemu-devel, Eduardo Otubo

On Tue, 8 May 2012, Daniel P. Berrange wrote:
> On Fri, May 04, 2012 at 04:08:36PM -0300, Eduardo Otubo wrote:
> > Hello all,
> > 
> > This is the first effort to sandboxing Qemu guests using Libseccomp[0]. The
> > patches that follows are pretty simple and straightforward. I added the correct
> > options and checks to the configure script and the basic calls to libseccomp in
> > the main loop at vl.c. Details of each one are in the emails of the patch set.
> > 
> > This support limits the system call footprint of the entire QEMU process to a
> > limited set of syscalls, those that we know QEMU uses.  The idea is to limit
> > the allowable syscalls, therefore limiting the impact that an attacked guest
> > could have on the host system.
> 
> What functionality has been lost by applying this seccomp filter ? I've not
> looked closely at the code, but it appears as if this blocks pretty much
> any kind of runtime device changes. ie no hotplug of any kind will work ?

Right, I was wondering the same thing: open is not on the list so adding
a new disk shouldn't be possible.

Regarding Xen, most of the hypercalls go through xc_* calls that are
ioctls on the privcmd device. Is it possible to add ioctl to the list?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp
  2012-05-08 11:32   ` Stefano Stabellini
@ 2012-05-08 14:10     ` Corey Bryant
  2012-05-08 14:27       ` Daniel P. Berrange
  0 siblings, 1 reply; 13+ messages in thread
From: Corey Bryant @ 2012-05-08 14:10 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: qemu-devel, Eduardo Otubo



On 05/08/2012 07:32 AM, Stefano Stabellini wrote:
> On Tue, 8 May 2012, Daniel P. Berrange wrote:
>> On Fri, May 04, 2012 at 04:08:36PM -0300, Eduardo Otubo wrote:
>>> Hello all,
>>>
>>> This is the first effort to sandboxing Qemu guests using Libseccomp[0]. The
>>> patches that follows are pretty simple and straightforward. I added the correct
>>> options and checks to the configure script and the basic calls to libseccomp in
>>> the main loop at vl.c. Details of each one are in the emails of the patch set.
>>>
>>> This support limits the system call footprint of the entire QEMU process to a
>>> limited set of syscalls, those that we know QEMU uses.  The idea is to limit
>>> the allowable syscalls, therefore limiting the impact that an attacked guest
>>> could have on the host system.
>>
>> What functionality has been lost by applying this seccomp filter ? I've not
>> looked closely at the code, but it appears as if this blocks pretty much
>> any kind of runtime device changes. ie no hotplug of any kind will work ?
>
> Right, I was wondering the same thing: open is not on the list so adding
> a new disk shouldn't be possible.
>
> Regarding Xen, most of the hypercalls go through xc_* calls that are
> ioctls on the privcmd device. Is it possible to add ioctl to the list?
>

If the whitelist is complete there should be no functionality lost when 
using seccomp with QEMU.  The idea (at least at this point) is to 
disallow the system calls that QEMU doesn't use.  open and ioctl should 
be added to the whitelist.

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp
  2012-05-08 14:10     ` Corey Bryant
@ 2012-05-08 14:27       ` Daniel P. Berrange
  2012-05-08 15:19         ` Corey Bryant
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel P. Berrange @ 2012-05-08 14:27 UTC (permalink / raw)
  To: Corey Bryant; +Cc: Eduardo Otubo, qemu-devel, Stefano Stabellini

On Tue, May 08, 2012 at 10:10:25AM -0400, Corey Bryant wrote:
> 
> 
> On 05/08/2012 07:32 AM, Stefano Stabellini wrote:
> >On Tue, 8 May 2012, Daniel P. Berrange wrote:
> >>On Fri, May 04, 2012 at 04:08:36PM -0300, Eduardo Otubo wrote:
> >>>Hello all,
> >>>
> >>>This is the first effort to sandboxing Qemu guests using Libseccomp[0]. The
> >>>patches that follows are pretty simple and straightforward. I added the correct
> >>>options and checks to the configure script and the basic calls to libseccomp in
> >>>the main loop at vl.c. Details of each one are in the emails of the patch set.
> >>>
> >>>This support limits the system call footprint of the entire QEMU process to a
> >>>limited set of syscalls, those that we know QEMU uses.  The idea is to limit
> >>>the allowable syscalls, therefore limiting the impact that an attacked guest
> >>>could have on the host system.
> >>
> >>What functionality has been lost by applying this seccomp filter ? I've not
> >>looked closely at the code, but it appears as if this blocks pretty much
> >>any kind of runtime device changes. ie no hotplug of any kind will work ?
> >
> >Right, I was wondering the same thing: open is not on the list so adding
> >a new disk shouldn't be possible.
> >
> >Regarding Xen, most of the hypercalls go through xc_* calls that are
> >ioctls on the privcmd device. Is it possible to add ioctl to the list?
> >
> 
> If the whitelist is complete there should be no functionality lost
> when using seccomp with QEMU.  The idea (at least at this point) is
> to disallow the system calls that QEMU doesn't use.  open and ioctl
> should be added to the whitelist.

Ok. So my next question is what is the benchmark for evaluating
whether this seccomp code provides any kind of meaningful security
improvement ? AFAICT, if you were allow open(), or indeed every
syscall any QEMU feature could possibly use, then there would be
little-to-no security benefit.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp
  2012-05-08 14:27       ` Daniel P. Berrange
@ 2012-05-08 15:19         ` Corey Bryant
  0 siblings, 0 replies; 13+ messages in thread
From: Corey Bryant @ 2012-05-08 15:19 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: Stefano Stabellini, qemu-devel, Eduardo Otubo



On 05/08/2012 10:27 AM, Daniel P. Berrange wrote:
> On Tue, May 08, 2012 at 10:10:25AM -0400, Corey Bryant wrote:
>>
>>
>> On 05/08/2012 07:32 AM, Stefano Stabellini wrote:
>>> On Tue, 8 May 2012, Daniel P. Berrange wrote:
>>>> On Fri, May 04, 2012 at 04:08:36PM -0300, Eduardo Otubo wrote:
>>>>> Hello all,
>>>>>
>>>>> This is the first effort to sandboxing Qemu guests using Libseccomp[0]. The
>>>>> patches that follows are pretty simple and straightforward. I added the correct
>>>>> options and checks to the configure script and the basic calls to libseccomp in
>>>>> the main loop at vl.c. Details of each one are in the emails of the patch set.
>>>>>
>>>>> This support limits the system call footprint of the entire QEMU process to a
>>>>> limited set of syscalls, those that we know QEMU uses.  The idea is to limit
>>>>> the allowable syscalls, therefore limiting the impact that an attacked guest
>>>>> could have on the host system.
>>>>
>>>> What functionality has been lost by applying this seccomp filter ? I've not
>>>> looked closely at the code, but it appears as if this blocks pretty much
>>>> any kind of runtime device changes. ie no hotplug of any kind will work ?
>>>
>>> Right, I was wondering the same thing: open is not on the list so adding
>>> a new disk shouldn't be possible.
>>>
>>> Regarding Xen, most of the hypercalls go through xc_* calls that are
>>> ioctls on the privcmd device. Is it possible to add ioctl to the list?
>>>
>>
>> If the whitelist is complete there should be no functionality lost
>> when using seccomp with QEMU.  The idea (at least at this point) is
>> to disallow the system calls that QEMU doesn't use.  open and ioctl
>> should be added to the whitelist.
>
> Ok. So my next question is what is the benchmark for evaluating
> whether this seccomp code provides any kind of meaningful security
> improvement ? AFAICT, if you were allow open(), or indeed every
> syscall any QEMU feature could possibly use, then there would be
> little-to-no security benefit.

Well let's say we have a seccomp whitelist of 50 syscalls.  That reduces 
the syscall footprint from ~350 (on x86) syscalls to 50, limiting what 
the attacker could execute from an exploited guest.

Eventually it would be nice to fine-tune the syscall parameters that are 
whitelisted.  For example, we could only allow a designated subset of 
allowable ioctls.  Or we could allow I/O operations only on a designated 
set of file descriptors that the guest needs to access.

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2012-05-08 15:19 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-04 19:08 [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp Eduardo Otubo
2012-05-04 19:08 ` [Qemu-devel] [RFC] [PATCH 1/2] Adding support for libseccomp in configure Eduardo Otubo
2012-05-04 19:08 ` [Qemu-devel] [RFC] [PATCH 2/2] Adding basic calls to libseccomp in vl.c Eduardo Otubo
2012-05-04 21:59   ` Andreas Färber
2012-05-07 11:01     ` Paolo Bonzini
2012-05-07 12:28       ` Eduardo Otubo
2012-05-07 12:34         ` Paolo Bonzini
2012-05-07 12:16     ` Eduardo Otubo
2012-05-08  9:15 ` [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp Daniel P. Berrange
2012-05-08 11:32   ` Stefano Stabellini
2012-05-08 14:10     ` Corey Bryant
2012-05-08 14:27       ` Daniel P. Berrange
2012-05-08 15:19         ` Corey Bryant

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.