* [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162)
@ 2012-10-23  5:55 Eduardo Otubo
  2012-10-23  5:55 ` [Qemu-devel] [PATCHv2 2/4] Setting "-sandbox on" as default Eduardo Otubo
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Eduardo Otubo @ 2012-10-23  5:55 UTC
  To: qemu-devel; +Cc: pmoore, aliguori, coreyb, Eduardo Otubo

According to bug 855162 [0], new syscalls need to be added to the
whitelist when using QEMU with libvirt.

[0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162

v2: Adding new syscalls to the list: readlink, rt_sigpending, and 
    rt_sigtimedwait

Reported-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
---
 qemu-seccomp.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/qemu-seccomp.c b/qemu-seccomp.c
index 64329a3..a7b33e2 100644
--- a/qemu-seccomp.c
+++ b/qemu-seccomp.c
@@ -45,6 +45,13 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = {
     { SCMP_SYS(access), 245 },
     { SCMP_SYS(prctl), 245 },
     { SCMP_SYS(signalfd), 245 },
+    { SCMP_SYS(getrlimit), 245 },
+    { SCMP_SYS(set_tid_address), 245 },
+    { SCMP_SYS(socketpair), 245 },
+    { SCMP_SYS(statfs), 245 },
+    { SCMP_SYS(unlink), 245 },
+    { SCMP_SYS(wait4), 245 },
+    { SCMP_SYS(getuid), 245 },
 #if defined(__i386__)
     { SCMP_SYS(fcntl64), 245 },
     { SCMP_SYS(fstat64), 245 },
@@ -107,7 +114,11 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = {
     { SCMP_SYS(getsockname), 242 },
     { SCMP_SYS(getpeername), 242 },
     { SCMP_SYS(fdatasync), 242 },
-    { SCMP_SYS(close), 242 }
+    { SCMP_SYS(close), 242 },
+    { SCMP_SYS(accept4), 242 },
+    { SCMP_SYS(readlink), 242 },
+    { SCMP_SYS(rt_sigpending), 242 },
+    { SCMP_SYS(rt_sigtimedwait), 242 }
 };
 
 int seccomp_start(void)
-- 
1.7.12

* [Qemu-devel] [PATCHv2 2/4] Setting "-sandbox on" as default
  2012-10-23  5:55 [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162) Eduardo Otubo
@ 2012-10-23  5:55 ` Eduardo Otubo
  2012-10-23  5:55 ` [Qemu-devel] [PATCHv2 3/4] Support for "double whitelist" filters Eduardo Otubo
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 25+ messages in thread
From: Eduardo Otubo @ 2012-10-23  5:55 UTC
  To: qemu-devel; +Cc: pmoore, aliguori, coreyb, Eduardo Otubo

The seccomp filter is now enabled by default, even when no "-sandbox"
argument is given.

v2: nothing new

Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
---
 configure |  2 +-
 vl.c      | 38 +++++++++++++++++++++++++++-----------
 2 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/configure b/configure
index 353d788..c613a51 100755
--- a/configure
+++ b/configure
@@ -220,7 +220,7 @@ guest_agent="yes"
 want_tools="yes"
 libiscsi=""
 coroutine=""
-seccomp=""
+seccomp="yes"
 glusterfs=""
 
 # parse CC options first
diff --git a/vl.c b/vl.c
index 5b357a3..bec68cd 100644
--- a/vl.c
+++ b/vl.c
@@ -276,6 +276,10 @@ static int default_cdrom = 1;
 static int default_sdcard = 1;
 static int default_vga = 1;
 
+#ifdef CONFIG_SECCOMP
+bool seccomp_on = true;
+#endif
+
 static struct {
     const char *driver;
     int *flag;
@@ -770,23 +774,28 @@ static int bt_parse(const char *opt)
     return 1;
 }
 
-static int parse_sandbox(QemuOpts *opts, void *opaque)
+static int install_seccomp_filters(void)
 {
-    /* FIXME: change this to true for 1.3 */
-    if (qemu_opt_get_bool(opts, "enable", false)) {
 #ifdef CONFIG_SECCOMP
-        if (seccomp_start() < 0) {
-            qerror_report(ERROR_CLASS_GENERIC_ERROR,
-                          "failed to install seccomp syscall filter in the kernel");
-            return -1;
-        }
-#else
+    if (seccomp_start() < 0) {
         qerror_report(ERROR_CLASS_GENERIC_ERROR,
-                      "sandboxing request but seccomp is not compiled into this build");
+                "failed to install seccomp syscall filter in the kernel");
         return -1;
-#endif
     }
+#else
+    qerror_report(ERROR_CLASS_GENERIC_ERROR,
+            "sandboxing requested but seccomp is not compiled into this build");
+    return -1;
+#endif
+    return 0;
+}
+
 
+static int parse_sandbox(QemuOpts *opts, void *opaque)
+{
+    if (!qemu_opt_get_bool(opts, "enable", true)) {
+        seccomp_on = false;
+    }
     return 0;
 }
 
@@ -3320,6 +3329,13 @@ int main(int argc, char **argv, char **envp)
         exit(1);
     }
 
+    /* We should install seccomp filters even if -sandbox on is not used. */
+    if (seccomp_on) {
+        if (install_seccomp_filters() < 0) {
+            exit(1);
+        }
+    }
+
     if (machine == NULL) {
         fprintf(stderr, "No machine found.\n");
         exit(1);
-- 
1.7.12

* [Qemu-devel] [PATCHv2 3/4] Support for "double whitelist" filters
  2012-10-23  5:55 [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162) Eduardo Otubo
  2012-10-23  5:55 ` [Qemu-devel] [PATCHv2 2/4] Setting "-sandbox on" as default Eduardo Otubo
@ 2012-10-23  5:55 ` Eduardo Otubo
  2012-10-23 15:10   ` Corey Bryant
  2012-11-02 21:29   ` Paul Moore
  2012-10-23  5:55 ` [Qemu-devel] [PATCHv2 4/4] Warning messages on net devices hotplug Eduardo Otubo
  2012-11-01 21:43 ` [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162) Paul Moore
  3 siblings, 2 replies; 25+ messages in thread
From: Eduardo Otubo @ 2012-10-23  5:55 UTC
  To: qemu-devel; +Cc: pmoore, aliguori, coreyb, Eduardo Otubo

This patch includes a second whitelist right before the main loop. It's
a smaller and more restricted whitelist, excluding execve() among many
others.
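
A rough model of how the two whitelists combine, assuming the kernel
behaviour that every loaded seccomp filter stays in force and the most
restrictive result wins, so a later whitelist can only narrow the first
one. The syscall names listed and the helpers (in_list,
allowed_after_both) are illustrative only and are not taken from
qemu-seccomp.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subsets only; the real tables live in qemu-seccomp.c. */
static const char *const init_list[] = {
    "read", "write", "open", "prctl", "execve"
};
static const char *const main_loop_list[] = {
    "read", "write", "open"
};

#define N_ELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* Return true if name appears in list[0..n). */
static bool in_list(const char *const *list, size_t n, const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* Once both filters are loaded, a syscall survives only if both allow it. */
static bool allowed_after_both(const char *name)
{
    return in_list(init_list, N_ELEMS(init_list), name) &&
           in_list(main_loop_list, N_ELEMS(main_loop_list), name);
}
```

In this model execve() passes the first list but not the second, so it
is rejected once the main loop filter is in place.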

v2: * ctx changed to main_loop_ctx
    * seccomp_on now inside ifdef
    * open syscall added to the main_loop whitelist

Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
---
 qemu-seccomp.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++------
 qemu-seccomp.h |  7 ++++-
 vl.c           | 21 +++++++++++--
 3 files changed, 114 insertions(+), 13 deletions(-)

diff --git a/qemu-seccomp.c b/qemu-seccomp.c
index a7b33e2..033cfad 100644
--- a/qemu-seccomp.c
+++ b/qemu-seccomp.c
@@ -13,6 +13,7 @@
  * GNU GPL, version 2 or (at your option) any later version.
  */
 #include <stdio.h>
+#include <stdlib.h>
 #include <seccomp.h>
 #include "qemu-seccomp.h"
 
@@ -21,7 +22,7 @@ struct QemuSeccompSyscall {
     uint8_t priority;
 };
 
-static const struct QemuSeccompSyscall seccomp_whitelist[] = {
+static const struct QemuSeccompSyscall seccomp_whitelist_init[] = {
     { SCMP_SYS(timer_settime), 255 },
     { SCMP_SYS(timer_gettime), 254 },
     { SCMP_SYS(futex), 253 },
@@ -121,27 +122,107 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = {
     { SCMP_SYS(rt_sigtimedwait), 242 }
 };
 
-int seccomp_start(void)
+static const struct QemuSeccompSyscall seccomp_whitelist_main_loop[] = {
+    { SCMP_SYS(timer_settime), 255 },
+    { SCMP_SYS(timer_gettime), 254 },
+    { SCMP_SYS(futex), 253 },
+    { SCMP_SYS(select), 252 },
+    { SCMP_SYS(recvfrom), 251 },
+    { SCMP_SYS(sendto), 250 },
+    { SCMP_SYS(read), 249 },
+    { SCMP_SYS(brk), 248 },
+    { SCMP_SYS(mmap), 247 },
+    { SCMP_SYS(open), 247 },
+#if defined(__i386__)
+    { SCMP_SYS(fcntl64), 245 },
+    { SCMP_SYS(fstat64), 245 },
+    { SCMP_SYS(stat64), 245 },
+    { SCMP_SYS(getgid32), 245 },
+    { SCMP_SYS(getegid32), 245 },
+    { SCMP_SYS(getuid32), 245 },
+    { SCMP_SYS(geteuid32), 245 },
+    { SCMP_SYS(sigreturn), 245 },
+    { SCMP_SYS(_newselect), 245 },
+    { SCMP_SYS(_llseek), 245 },
+    { SCMP_SYS(mmap2), 245},
+    { SCMP_SYS(sigprocmask), 245 },
+#endif
+    { SCMP_SYS(exit), 245 },
+    { SCMP_SYS(timer_delete), 245 },
+    { SCMP_SYS(exit_group), 245 },
+    { SCMP_SYS(rt_sigreturn), 245 },
+    { SCMP_SYS(madvise), 245 },
+    { SCMP_SYS(write), 244 },
+    { SCMP_SYS(fcntl), 243 },
+    { SCMP_SYS(tgkill), 242 },
+    { SCMP_SYS(rt_sigaction), 242 },
+    { SCMP_SYS(pipe2), 242 },
+    { SCMP_SYS(munmap), 242 },
+    { SCMP_SYS(mremap), 242 },
+    { SCMP_SYS(getsockname), 242 },
+    { SCMP_SYS(getpeername), 242 },
+    { SCMP_SYS(close), 242 },
+    { SCMP_SYS(accept4), 242 },
+    { SCMP_SYS(eventfd2), 242 },
+    { SCMP_SYS(recvmsg), 242 },
+    { SCMP_SYS(ioctl), 242 },
+    { SCMP_SYS(rt_sigprocmask), 242 }
+};
+
+static int
+process_whitelist(const struct QemuSeccompSyscall *whitelist,
+                  unsigned int size, scmp_filter_ctx *ctx)
 {
     int rc = 0;
+
     unsigned int i = 0;
-    scmp_filter_ctx ctx;
+
+    for (i = 0; i < size; i++) {
+        rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, whitelist[i].num, 0);
+        if (rc < 0) {
+            return -1;
+        }
+
+        rc = seccomp_syscall_priority(ctx, whitelist[i].num,
+                                      whitelist[i].priority);
+        if (rc < 0) {
+            return -1;
+        }
+    }
+    return 0;
+}
+
+int
+seccomp_start(enum whitelist_mode mode, scmp_filter_ctx *ctx)
+{
+    int rc = 0;
 
     ctx = seccomp_init(SCMP_ACT_KILL);
     if (ctx == NULL) {
+        rc = -1;
         goto seccomp_return;
     }
 
-    for (i = 0; i < ARRAY_SIZE(seccomp_whitelist); i++) {
-        rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, seccomp_whitelist[i].num, 0);
-        if (rc < 0) {
+    switch (mode) {
+    case INIT:
+        if (process_whitelist
+            (seccomp_whitelist_init,
+             ARRAY_SIZE(seccomp_whitelist_init), ctx) < 0) {
+            rc = -1;
             goto seccomp_return;
         }
-        rc = seccomp_syscall_priority(ctx, seccomp_whitelist[i].num,
-                                      seccomp_whitelist[i].priority);
-        if (rc < 0) {
+        break;
+    case MAIN_LOOP:
+        if (process_whitelist
+            (seccomp_whitelist_main_loop,
+             ARRAY_SIZE(seccomp_whitelist_main_loop), ctx) < 0) {
+            rc = -1;
             goto seccomp_return;
         }
+        break;
+    default:
+        rc = -1;
+        goto seccomp_return;
     }
 
     rc = seccomp_load(ctx);
diff --git a/qemu-seccomp.h b/qemu-seccomp.h
index b2fc3f8..1c97978 100644
--- a/qemu-seccomp.h
+++ b/qemu-seccomp.h
@@ -18,5 +18,10 @@
 #include <seccomp.h>
 #include "osdep.h"
 
-int seccomp_start(void);
+enum whitelist_mode {
+    INIT = 0,
+    MAIN_LOOP = 1,
+};
+
+int seccomp_start(enum whitelist_mode mode, scmp_filter_ctx *ctx);
 #endif
diff --git a/vl.c b/vl.c
index bec68cd..d50018f 100644
--- a/vl.c
+++ b/vl.c
@@ -774,10 +774,11 @@ static int bt_parse(const char *opt)
     return 1;
 }
 
-static int install_seccomp_filters(void)
+static int
+install_seccomp_filters(enum whitelist_mode mode, scmp_filter_ctx *ctx)
 {
 #ifdef CONFIG_SECCOMP
-    if (seccomp_start() < 0) {
+    if (seccomp_start(mode, ctx) < 0) {
         qerror_report(ERROR_CLASS_GENERIC_ERROR,
                 "failed to install seccomp syscall filter in the kernel");
         return -1;
@@ -2407,6 +2408,10 @@ int main(int argc, char **argv, char **envp)
     const char *trace_events = NULL;
     const char *trace_file = NULL;
 
+#ifdef CONFIG_SECCOMP
+    scmp_filter_ctx main_loop_ctx;
+#endif
+
     atexit(qemu_run_exit_notifiers);
     error_set_progname(argv[0]);
 
@@ -3330,11 +3335,13 @@ int main(int argc, char **argv, char **envp)
     }
 
     /* We should install seccomp filters even if -sandbox on is not used. */
+#ifdef CONFIG_SECCOMP
     if (seccomp_on) {
-        if (install_seccomp_filters() < 0) {
+        if (install_seccomp_filters(INIT, &main_loop_ctx) < 0) {
             exit(1);
         }
     }
+#endif
 
     if (machine == NULL) {
         fprintf(stderr, "No machine found.\n");
@@ -3794,6 +3801,14 @@ int main(int argc, char **argv, char **envp)
 
     os_setup_post();
 
+#ifdef CONFIG_SECCOMP
+    if (seccomp_on) {
+        if (install_seccomp_filters(MAIN_LOOP, &main_loop_ctx) < 0) {
+            exit(1);
+        }
+    }
+#endif
+
     resume_all_vcpus();
     main_loop();
     bdrv_close_all();
-- 
1.7.12

* [Qemu-devel] [PATCHv2 4/4] Warning messages on net devices hotplug
  2012-10-23  5:55 [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162) Eduardo Otubo
  2012-10-23  5:55 ` [Qemu-devel] [PATCHv2 2/4] Setting "-sandbox on" as default Eduardo Otubo
  2012-10-23  5:55 ` [Qemu-devel] [PATCHv2 3/4] Support for "double whitelist" filters Eduardo Otubo
@ 2012-10-23  5:55 ` Eduardo Otubo
  2012-10-23 15:59   ` Corey Bryant
  2012-11-01 21:43 ` [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162) Paul Moore
  3 siblings, 1 reply; 25+ messages in thread
From: Eduardo Otubo @ 2012-10-23  5:55 UTC
  To: qemu-devel; +Cc: pmoore, aliguori, coreyb, Eduardo Otubo

With the new "double whitelist" seccomp filter, QEMU can no longer call
execve() at runtime, so hotplugging network devices is not allowed.

v2: * Error messages moved to the backend function, net_init_tap(), recommended
      by Paolo Bonzini
    * Documentation added to QMP and HMP commands, and also to the Qemu options.

Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
---
 hmp-commands.hx | 4 ++--
 net.c           | 1 +
 net/tap.c       | 5 +++++
 qemu-options.hx | 3 ++-
 qmp-commands.hx | 3 ++-
 5 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index e0b537d..3e28501 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1068,7 +1068,7 @@ ETEXI
         .name       = "host_net_add",
         .args_type  = "device:s,opts:s?",
         .params     = "tap|user|socket|vde|dump [options]",
-        .help       = "add host VLAN client",
+        .help       = "add host VLAN client, feature disabled when -sandbox is in use",
         .mhandler.cmd = net_host_device_add,
     },
 
@@ -1096,7 +1096,7 @@ ETEXI
         .name       = "netdev_add",
         .args_type  = "netdev:O",
         .params     = "[user|tap|socket],id=str[,prop=value][,...]",
-        .help       = "add host network device",
+        .help       = "add host network device, feature disabled when -sandbox is in use",
         .mhandler.cmd = hmp_netdev_add,
     },
 
diff --git a/net.c b/net.c
index ae4bc0d..02188f0 100644
--- a/net.c
+++ b/net.c
@@ -765,6 +765,7 @@ void net_host_device_add(Monitor *mon, const QDict *qdict)
     qemu_opt_set(opts, "type", device);
 
     net_client_init(opts, 0, &local_err);
+
     if (error_is_set(&local_err)) {
         qerror_report_err(local_err);
         error_free(local_err);
diff --git a/net/tap.c b/net/tap.c
index df89caa..dd8c79b 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -590,6 +590,11 @@ static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
 int net_init_tap(const NetClientOptions *opts, const char *name,
                  NetClientState *peer)
 {
+#ifdef CONFIG_SECCOMP
+    error_report("Cannot hotplug TAP device when -sandbox is in effect");
+    return -1;
+#endif
+
     const NetdevTapOptions *tap;
 
     int fd, vnet_hdr = 0;
diff --git a/qemu-options.hx b/qemu-options.hx
index 7d97f96..02afba3 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2767,7 +2767,8 @@ STEXI
 @item -sandbox
 @findex -sandbox
 Enable Seccomp mode 2 system call filter. 'on' will enable syscall filtering and 'off' will
-disable it.  The default is 'off'.
+disable it.  The default is 'on'. Note that when using the '-sandbox on' option the hot plug
+of new devices will be disabled.
 ETEXI
 
 DEF("readconfig", HAS_ARG, QEMU_OPTION_readconfig,
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 2f8477e..cccb8f1 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -757,7 +757,8 @@ Example:
 
 Note: The supported device options are the same ones supported by the '-net'
       command-line argument, which are listed in the '-help' output or QEMU's
-      manual
+      manual. Also note that the hot plug is disabled when -sandbox is in
+      effect
 
 EQMP
 
-- 
1.7.12

* Re: [Qemu-devel] [PATCHv2 3/4] Support for "double whitelist" filters
  2012-10-23  5:55 ` [Qemu-devel] [PATCHv2 3/4] Support for "double whitelist" filters Eduardo Otubo
@ 2012-10-23 15:10   ` Corey Bryant
  2012-10-24 20:06     ` Eduardo Otubo
  2012-10-25 20:16     ` Eduardo Otubo
  2012-11-02 21:29   ` Paul Moore
  1 sibling, 2 replies; 25+ messages in thread
From: Corey Bryant @ 2012-10-23 15:10 UTC
  To: Eduardo Otubo; +Cc: pmoore, aliguori, qemu-devel



On 10/23/2012 01:55 AM, Eduardo Otubo wrote:
> This patch includes a second whitelist right before the main loop. It's
> a smaller and more restricted whitelist, excluding execve() among many
> others.
>
> v2: * ctx changed to main_loop_ctx
>      * seccomp_on now inside ifdef
>      * open syscall added to the main_loop whitelist
>
> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
> ---
>   qemu-seccomp.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++------
>   qemu-seccomp.h |  7 ++++-
>   vl.c           | 21 +++++++++++--
>   3 files changed, 114 insertions(+), 13 deletions(-)
>
> diff --git a/qemu-seccomp.c b/qemu-seccomp.c
> index a7b33e2..033cfad 100644
> --- a/qemu-seccomp.c
> +++ b/qemu-seccomp.c
> @@ -13,6 +13,7 @@
>    * GNU GPL, version 2 or (at your option) any later version.
>    */
>   #include <stdio.h>
> +#include <stdlib.h>
>   #include <seccomp.h>
>   #include "qemu-seccomp.h"
>
> @@ -21,7 +22,7 @@ struct QemuSeccompSyscall {
>       uint8_t priority;
>   };
>
> -static const struct QemuSeccompSyscall seccomp_whitelist[] = {
> +static const struct QemuSeccompSyscall seccomp_whitelist_init[] = {
>       { SCMP_SYS(timer_settime), 255 },
>       { SCMP_SYS(timer_gettime), 254 },
>       { SCMP_SYS(futex), 253 },
> @@ -121,27 +122,107 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = {
>       { SCMP_SYS(rt_sigtimedwait), 242 }
>   };
>
> -int seccomp_start(void)
> +static const struct QemuSeccompSyscall seccomp_whitelist_main_loop[] = {
> +    { SCMP_SYS(timer_settime), 255 },
> +    { SCMP_SYS(timer_gettime), 254 },
> +    { SCMP_SYS(futex), 253 },
> +    { SCMP_SYS(select), 252 },
> +    { SCMP_SYS(recvfrom), 251 },
> +    { SCMP_SYS(sendto), 250 },
> +    { SCMP_SYS(read), 249 },
> +    { SCMP_SYS(brk), 248 },
> +    { SCMP_SYS(mmap), 247 },
> +    { SCMP_SYS(open), 247 },
> +#if defined(__i386__)
> +    { SCMP_SYS(fcntl64), 245 },
> +    { SCMP_SYS(fstat64), 245 },
> +    { SCMP_SYS(stat64), 245 },
> +    { SCMP_SYS(getgid32), 245 },
> +    { SCMP_SYS(getegid32), 245 },
> +    { SCMP_SYS(getuid32), 245 },
> +    { SCMP_SYS(geteuid32), 245 },
> +    { SCMP_SYS(sigreturn), 245 },
> +    { SCMP_SYS(_newselect), 245 },
> +    { SCMP_SYS(_llseek), 245 },
> +    { SCMP_SYS(mmap2), 245},
> +    { SCMP_SYS(sigprocmask), 245 },
> +#endif
> +    { SCMP_SYS(exit), 245 },
> +    { SCMP_SYS(timer_delete), 245 },
> +    { SCMP_SYS(exit_group), 245 },
> +    { SCMP_SYS(rt_sigreturn), 245 },
> +    { SCMP_SYS(madvise), 245 },
> +    { SCMP_SYS(write), 244 },
> +    { SCMP_SYS(fcntl), 243 },
> +    { SCMP_SYS(tgkill), 242 },
> +    { SCMP_SYS(rt_sigaction), 242 },
> +    { SCMP_SYS(pipe2), 242 },
> +    { SCMP_SYS(munmap), 242 },
> +    { SCMP_SYS(mremap), 242 },
> +    { SCMP_SYS(getsockname), 242 },
> +    { SCMP_SYS(getpeername), 242 },
> +    { SCMP_SYS(close), 242 },
> +    { SCMP_SYS(accept4), 242 },
> +    { SCMP_SYS(eventfd2), 242 },
> +    { SCMP_SYS(recvmsg), 242 },
> +    { SCMP_SYS(ioctl), 242 },
> +    { SCMP_SYS(rt_sigprocmask), 242 }
> +};
> +
> +static int
> +process_whitelist(const struct QemuSeccompSyscall *whitelist,
> +                  unsigned int size, scmp_filter_ctx *ctx)
>   {
>       int rc = 0;
> +
>       unsigned int i = 0;
> -    scmp_filter_ctx ctx;
> +
> +    for (i = 0; i < size; i++) {
> +        rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, whitelist[i].num, 0);
> +        if (rc < 0) {
> +            return -1;
> +        }
> +
> +        rc = seccomp_syscall_priority(ctx, whitelist[i].num,
> +                                      whitelist[i].priority);
> +        if (rc < 0) {
> +            return -1;
> +        }
> +    }
> +    return 0;
> +}
> +
> +int
> +seccomp_start(enum whitelist_mode mode, scmp_filter_ctx *ctx)
> +{
> +    int rc = 0;
>
>       ctx = seccomp_init(SCMP_ACT_KILL);

Is there any reason why ctx can't be a local variable in this function?
It is allocated and freed on each entry and exit in this function.
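
A minimal sketch of what a local context could look like. To stay
self-contained it uses a hypothetical stand-in (struct fake_ctx,
fake_ctx_new, fake_ctx_free) in place of libseccomp's context and calls,
and start_filter is an illustrative name, not the function in the patch:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a libseccomp filter context. */
struct fake_ctx {
    int rules;
};

static int live_contexts;   /* contexts created but not yet released */

static struct fake_ctx *fake_ctx_new(void)
{
    struct fake_ctx *c = calloc(1, sizeof(*c));
    if (c != NULL) {
        live_contexts++;
    }
    return c;
}

static void fake_ctx_free(struct fake_ctx *c)
{
    if (c != NULL) {
        assert(live_contexts > 0);
        free(c);
        live_contexts--;
    }
}

enum whitelist_mode { INIT = 0, MAIN_LOOP = 1 };

/* The context never leaves this function; callers pass only the mode. */
static int start_filter(enum whitelist_mode mode)
{
    int rc;
    struct fake_ctx *ctx = fake_ctx_new();

    if (ctx == NULL) {
        return -1;
    }

    switch (mode) {
    case INIT:
        ctx->rules = 2;     /* pretend to add the init whitelist */
        rc = 0;
        break;
    case MAIN_LOOP:
        ctx->rules = 1;     /* pretend to add the main loop whitelist */
        rc = 0;
        break;
    default:
        rc = -1;
        break;
    }

    fake_ctx_free(ctx);     /* released on every path */
    return rc;
}
```

With this shape nothing in vl.c would need to declare or pass a context.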

>       if (ctx == NULL) {
> +        rc = -1;
>           goto seccomp_return;
>       }
>
> -    for (i = 0; i < ARRAY_SIZE(seccomp_whitelist); i++) {
> -        rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, seccomp_whitelist[i].num, 0);
> -        if (rc < 0) {
> +    switch (mode) {
> +    case INIT:
> +        if (process_whitelist
> +            (seccomp_whitelist_init,
> +             ARRAY_SIZE(seccomp_whitelist_init), ctx) < 0) {
> +            rc = -1;
>               goto seccomp_return;
>           }
> -        rc = seccomp_syscall_priority(ctx, seccomp_whitelist[i].num,
> -                                      seccomp_whitelist[i].priority);
> -        if (rc < 0) {
> +        break;
> +    case MAIN_LOOP:
> +        if (process_whitelist
> +            (seccomp_whitelist_main_loop,
> +             ARRAY_SIZE(seccomp_whitelist_main_loop), ctx) < 0) {
> +            rc = -1;
>               goto seccomp_return;
>           }
> +        break;
> +    default:
> +        rc = -1;
> +        goto seccomp_return;
>       }
>
>       rc = seccomp_load(ctx);
> diff --git a/qemu-seccomp.h b/qemu-seccomp.h
> index b2fc3f8..1c97978 100644
> --- a/qemu-seccomp.h
> +++ b/qemu-seccomp.h
> @@ -18,5 +18,10 @@
>   #include <seccomp.h>
>   #include "osdep.h"
>
> -int seccomp_start(void);
> +enum whitelist_mode {
> +    INIT = 0,
> +    MAIN_LOOP = 1,
> +};
> +
> +int seccomp_start(enum whitelist_mode mode, scmp_filter_ctx *ctx);
>   #endif
> diff --git a/vl.c b/vl.c
> index bec68cd..d50018f 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -774,10 +774,11 @@ static int bt_parse(const char *opt)
>       return 1;
>   }
>
> -static int install_seccomp_filters(void)
> +static int
> +install_seccomp_filters(enum whitelist_mode mode, scmp_filter_ctx *ctx)
>   {
>   #ifdef CONFIG_SECCOMP
> -    if (seccomp_start() < 0) {
> +    if (seccomp_start(mode, ctx) < 0) {
>           qerror_report(ERROR_CLASS_GENERIC_ERROR,
>                   "failed to install seccomp syscall filter in the kernel");

I heard from Luiz Capitulino on one of my patches that qerror_report() 
is deprecated.  So you'll want to update this.

>           return -1;
> @@ -2407,6 +2408,10 @@ int main(int argc, char **argv, char **envp)
>       const char *trace_events = NULL;
>       const char *trace_file = NULL;
>
> +#ifdef CONFIG_SECCOMP
> +    scmp_filter_ctx main_loop_ctx;
> +#endif
> +
>       atexit(qemu_run_exit_notifiers);
>       error_set_progname(argv[0]);
>
> @@ -3330,11 +3335,13 @@ int main(int argc, char **argv, char **envp)
>       }
>
>       /* We should install seccomp filters even if -sandbox on is not used. */
> +#ifdef CONFIG_SECCOMP
>       if (seccomp_on) {
> -        if (install_seccomp_filters() < 0) {
> +        if (install_seccomp_filters(INIT, &main_loop_ctx) < 0) {

I don't think the variable name "main_loop_ctx" makes sense here. 
Should the name be more generic since it's used wherever a seccomp 
filter is installed?

>               exit(1);
>           }
>       }
> +#endif
>
>       if (machine == NULL) {
>           fprintf(stderr, "No machine found.\n");
> @@ -3794,6 +3801,14 @@ int main(int argc, char **argv, char **envp)
>
>       os_setup_post();
>
> +#ifdef CONFIG_SECCOMP
> +    if (seccomp_on) {
> +        if (install_seccomp_filters(MAIN_LOOP, &main_loop_ctx) < 0) {
> +            exit(1);
> +        }
> +    }
> +#endif
> +
>       resume_all_vcpus();
>       main_loop();
>       bdrv_close_all();
>

-- 
Regards,
Corey Bryant

* Re: [Qemu-devel] [PATCHv2 4/4] Warning messages on net devices hotplug
  2012-10-23  5:55 ` [Qemu-devel] [PATCHv2 4/4] Warning messages on net devices hotplug Eduardo Otubo
@ 2012-10-23 15:59   ` Corey Bryant
  2012-10-23 16:39     ` Eric Blake
  0 siblings, 1 reply; 25+ messages in thread
From: Corey Bryant @ 2012-10-23 15:59 UTC
  To: Eduardo Otubo; +Cc: pmoore, aliguori, qemu-devel



On 10/23/2012 01:55 AM, Eduardo Otubo wrote:
> With the new "double whitelist" seccomp filter, QEMU can no longer call
> execve() at runtime, so hotplugging network devices is not allowed.
>
> v2: * Error messages moved to the backend function, net_init_tap(), recommended
>        by Paolo Bonzini
>      * Documentation added to QMP and HMP commands, and also to the Qemu options.
>
> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
> ---
>   hmp-commands.hx | 4 ++--
>   net.c           | 1 +
>   net/tap.c       | 5 +++++
>   qemu-options.hx | 3 ++-
>   qmp-commands.hx | 3 ++-
>   5 files changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index e0b537d..3e28501 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -1068,7 +1068,7 @@ ETEXI
>           .name       = "host_net_add",
>           .args_type  = "device:s,opts:s?",
>           .params     = "tap|user|socket|vde|dump [options]",
> -        .help       = "add host VLAN client",
> +        .help       = "add host VLAN client, feature disabled when -sandbox is in use",

Maybe the "feature disabled.." part should be in parentheses?  There's
also another section in hmp-commands.hx a few lines below this code
where you can add the same note.

>           .mhandler.cmd = net_host_device_add,
>       },
>
> @@ -1096,7 +1096,7 @@ ETEXI
>           .name       = "netdev_add",
>           .args_type  = "netdev:O",
>           .params     = "[user|tap|socket],id=str[,prop=value][,...]",
> -        .help       = "add host network device",
> +        .help       = "add host network device, feature disabled when -sandbox is in use",

Same comments as above.

>           .mhandler.cmd = hmp_netdev_add,
>       },
>
> diff --git a/net.c b/net.c
> index ae4bc0d..02188f0 100644
> --- a/net.c
> +++ b/net.c
> @@ -765,6 +765,7 @@ void net_host_device_add(Monitor *mon, const QDict *qdict)
>       qemu_opt_set(opts, "type", device);
>
>       net_client_init(opts, 0, &local_err);
> +

You can get rid of this change.

>       if (error_is_set(&local_err)) {
>           qerror_report_err(local_err);
>           error_free(local_err);
> diff --git a/net/tap.c b/net/tap.c
> index df89caa..dd8c79b 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -590,6 +590,11 @@ static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
>   int net_init_tap(const NetClientOptions *opts, const char *name,
>                    NetClientState *peer)
>   {
> +#ifdef CONFIG_SECCOMP
> +    error_report("Cannot hotplug TAP device when -sandbox is in effect");
> +    return -1;
> +#endif
> +

It seems like it would make more sense to put this code after the local 
variables are defined.  But more importantly, does this prevent adding a 
tap device on the command line?  If so, that's not good.  Also I don't 
think you are preventing a "netdev_add bridge" any more in v2, which 
also calls execve().
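
One possible shape for a check that leaves command-line TAP devices
alone, assuming a run-time flag that is set only after the second (main
loop) filter has been loaded. The flag and both function names are
hypothetical and do not exist in the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag: true once the restrictive main loop filter is loaded. */
static bool main_loop_filter_loaded;

static void mark_main_loop_filter_loaded(void)
{
    /* Loading the second filter twice would be a caller bug. */
    assert(!main_loop_filter_loaded);
    main_loop_filter_loaded = true;
}

/*
 * Return 0 if a TAP backend may be set up now, -1 otherwise.
 * Devices given on the command line are created before the second
 * filter is loaded, so they pass; only later requests are refused.
 */
static int tap_setup_allowed(bool sandbox_enabled)
{
    if (sandbox_enabled && main_loop_filter_loaded) {
        return -1;
    }
    return 0;
}
```

Unlike an unconditional return under #ifdef CONFIG_SECCOMP, a check of
this kind depends on whether the sandbox is actually on and on when the
request arrives, not merely on how the binary was built.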

>       const NetdevTapOptions *tap;
>
>       int fd, vnet_hdr = 0;
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 7d97f96..02afba3 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2767,7 +2767,8 @@ STEXI
>   @item -sandbox
>   @findex -sandbox
>   Enable Seccomp mode 2 system call filter. 'on' will enable syscall filtering and 'off' will
> -disable it.  The default is 'off'.
> +disable it.  The default is 'on'. Note that when using the '-sandbox on' option the hot plug
> +of new devices will be disabled.

Only network devices are prevented, right?

Also, as I mentioned before, can you limit this to the subset of options
that cause execve() to be issued?  For example, can we allow libvirt to
pass an fd for hotplugging a network device (e.g. netdev_add tap,fd=23)?
I don't know for sure but I'm guessing libvirt does that.

Also I think it would be worthwhile to update the qemu-kvm --help
output with a similar note too.

>   ETEXI
>
>   DEF("readconfig", HAS_ARG, QEMU_OPTION_readconfig,
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 2f8477e..cccb8f1 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -757,7 +757,8 @@ Example:
>
>   Note: The supported device options are the same ones supported by the '-net'
>         command-line argument, which are listed in the '-help' output or QEMU's
> -      manual
> +      manual. Also note that the hot plug is disabled when -sandbox is in
> +      effect

Not all hotplug abilities are disabled.  Just network devices.  This is 
missing a period too.

>
>   EQMP
>

-- 
Regards,
Corey Bryant

* Re: [Qemu-devel] [PATCHv2 4/4] Warning messages on net devices hotplug
  2012-10-23 15:59   ` Corey Bryant
@ 2012-10-23 16:39     ` Eric Blake
  0 siblings, 0 replies; 25+ messages in thread
From: Eric Blake @ 2012-10-23 16:39 UTC
  To: Corey Bryant; +Cc: pmoore, aliguori, qemu-devel, Eduardo Otubo

On 10/23/2012 09:59 AM, Corey Bryant wrote:

> Only network devices are prevented, right?
> 
> Also, as I mentioned before, can you limit this to the subset of options
> that cause execve() to be issued?  For example, can we allow libvirt to
> pass an fd for hotplugging a network device (e.g. netdev_add tap,fd=23)?
>  I don't know for sure but I'm guessing libvirt does that.

Correct, libvirt prefers passing network devices pre-opened via fds,
rather than having qemu exec scripts.

>> +      manual. Also note that the hot plug is disabled when -sandbox
>> is in
>> +      effect
> 
> Not all hotplug abilities are disabled.  Just network devices.  This is
> missing a period too.

And not all network hotplug, just hotplug that requires use of exec
(again, fd passing bypasses the need for exec).
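
The distinction could be expressed roughly as below. struct tap_request
and its fields are a hypothetical, simplified stand-in for the real TAP
options, and the function names are illustrative only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down view of a TAP hotplug request. */
struct tap_request {
    bool has_fd;        /* caller passed an already-open descriptor */
    bool runs_script;   /* setup would run an ifup/ifdown script */
    bool runs_helper;   /* setup would run a bridge helper */
};

/* Return true if serving the request would require exec. */
static bool needs_exec(const struct tap_request *req)
{
    assert(req != NULL);
    if (req->has_fd) {
        return false;   /* descriptor was opened by the management layer */
    }
    return req->runs_script || req->runs_helper;
}

/* Return 0 if the hotplug may proceed, -1 if the sandbox forbids it. */
static int hotplug_allowed(const struct tap_request *req, bool sandbox_active)
{
    return (sandbox_active && needs_exec(req)) ? -1 : 0;
}
```

Under these assumptions a request such as netdev_add tap,fd=23 would
still be accepted with the sandbox on, while script-based setup would be
refused.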

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [PATCHv2 3/4] Support for "double whitelist" filters
  2012-10-23 15:10   ` Corey Bryant
@ 2012-10-24 20:06     ` Eduardo Otubo
  2012-10-25 20:16     ` Eduardo Otubo
  1 sibling, 0 replies; 25+ messages in thread
From: Eduardo Otubo @ 2012-10-24 20:06 UTC
  To: Corey Bryant; +Cc: pmoore, aliguori, qemu-devel

On Tue, Oct 23, 2012 at 11:10:58AM -0400, Corey Bryant wrote:
> 
> 
> On 10/23/2012 01:55 AM, Eduardo Otubo wrote:
> >This patch includes a second whitelist right before the main loop. It's
> >a smaller and more restricted whitelist, excluding execve() among many
> >others.
> >
> >v2: * ctx changed to main_loop_ctx
> >     * seccomp_on now inside ifdef
> >     * open syscall added to the main_loop whitelist
> >
> >Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
> >---
> >  qemu-seccomp.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++------
> >  qemu-seccomp.h |  7 ++++-
> >  vl.c           | 21 +++++++++++--
> >  3 files changed, 114 insertions(+), 13 deletions(-)
> >
> >diff --git a/qemu-seccomp.c b/qemu-seccomp.c
> >index a7b33e2..033cfad 100644
> >--- a/qemu-seccomp.c
> >+++ b/qemu-seccomp.c
> >@@ -13,6 +13,7 @@
> >   * GNU GPL, version 2 or (at your option) any later version.
> >   */
> >  #include <stdio.h>
> >+#include <stdlib.h>
> >  #include <seccomp.h>
> >  #include "qemu-seccomp.h"
> >
> >@@ -21,7 +22,7 @@ struct QemuSeccompSyscall {
> >      uint8_t priority;
> >  };
> >
> >-static const struct QemuSeccompSyscall seccomp_whitelist[] = {
> >+static const struct QemuSeccompSyscall seccomp_whitelist_init[] = {
> >      { SCMP_SYS(timer_settime), 255 },
> >      { SCMP_SYS(timer_gettime), 254 },
> >      { SCMP_SYS(futex), 253 },
> >@@ -121,27 +122,107 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = {
> >      { SCMP_SYS(rt_sigtimedwait), 242 }
> >  };
> >
> >-int seccomp_start(void)
> >+static const struct QemuSeccompSyscall seccomp_whitelist_main_loop[] = {
> >+    { SCMP_SYS(timer_settime), 255 },
> >+    { SCMP_SYS(timer_gettime), 254 },
> >+    { SCMP_SYS(futex), 253 },
> >+    { SCMP_SYS(select), 252 },
> >+    { SCMP_SYS(recvfrom), 251 },
> >+    { SCMP_SYS(sendto), 250 },
> >+    { SCMP_SYS(read), 249 },
> >+    { SCMP_SYS(brk), 248 },
> >+    { SCMP_SYS(mmap), 247 },
> >+    { SCMP_SYS(open), 247 },
> >+#if defined(__i386__)
> >+    { SCMP_SYS(fcntl64), 245 },
> >+    { SCMP_SYS(fstat64), 245 },
> >+    { SCMP_SYS(stat64), 245 },
> >+    { SCMP_SYS(getgid32), 245 },
> >+    { SCMP_SYS(getegid32), 245 },
> >+    { SCMP_SYS(getuid32), 245 },
> >+    { SCMP_SYS(geteuid32), 245 },
> >+    { SCMP_SYS(sigreturn), 245 },
> >+    { SCMP_SYS(_newselect), 245 },
> >+    { SCMP_SYS(_llseek), 245 },
> >+    { SCMP_SYS(mmap2), 245},
> >+    { SCMP_SYS(sigprocmask), 245 },
> >+#endif
> >+    { SCMP_SYS(exit), 245 },
> >+    { SCMP_SYS(timer_delete), 245 },
> >+    { SCMP_SYS(exit_group), 245 },
> >+    { SCMP_SYS(rt_sigreturn), 245 },
> >+    { SCMP_SYS(madvise), 245 },
> >+    { SCMP_SYS(write), 244 },
> >+    { SCMP_SYS(fcntl), 243 },
> >+    { SCMP_SYS(tgkill), 242 },
> >+    { SCMP_SYS(rt_sigaction), 242 },
> >+    { SCMP_SYS(pipe2), 242 },
> >+    { SCMP_SYS(munmap), 242 },
> >+    { SCMP_SYS(mremap), 242 },
> >+    { SCMP_SYS(getsockname), 242 },
> >+    { SCMP_SYS(getpeername), 242 },
> >+    { SCMP_SYS(close), 242 },
> >+    { SCMP_SYS(accept4), 242 },
> >+    { SCMP_SYS(eventfd2), 242 },
> >+    { SCMP_SYS(recvmsg), 242 },
> >+    { SCMP_SYS(ioctl), 242 },
> >+    { SCMP_SYS(rt_sigprocmask), 242 }
> >+};
> >+
> >+static int
> >+process_whitelist(const struct QemuSeccompSyscall *whitelist,
> >+                  unsigned int size, scmp_filter_ctx *ctx)
> >  {
> >      int rc = 0;
> >+
> >      unsigned int i = 0;
> >-    scmp_filter_ctx ctx;
> >+
> >+    for (i = 0; i < size; i++) {
> >+        rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, whitelist[i].num, 0);
> >+        if (rc < 0) {
> >+            return -1;
> >+        }
> >+
> >+        rc = seccomp_syscall_priority(ctx, whitelist[i].num,
> >+                                      whitelist[i].priority);
> >+        if (rc < 0) {
> >+            return -1;
> >+        }
> >+    }
> >+    return 0;
> >+}
> >+
> >+int
> >+seccomp_start(enum whitelist_mode mode, scmp_filter_ctx *ctx)
> >+{
> >+    int rc = 0;
> >
> >      ctx = seccomp_init(SCMP_ACT_KILL);
> 
> Is there any reason why ctx can't be a local variable in this
> function?  It is allocated and freed on each entry and exit in this
> function.

I think you're probably right. I'll declare this variable as local in the
next version.
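
For illustration only, the local-ctx arrangement discussed here could look
roughly like the sketch below. It is not the v3 patch: fake_ctx and the
fake_* helpers are hypothetical stand-ins for libseccomp's scmp_filter_ctx,
seccomp_init(), seccomp_rule_add(), seccomp_load() and seccomp_release(),
and the syscall numbers are placeholders, so the control flow can be
compiled and run without installing a real filter.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for scmp_filter_ctx: it only counts rules. */
typedef struct { unsigned int nrules; } fake_ctx;

/* Records how many rules the last "loaded" filter held (for inspection). */
static unsigned int last_loaded_rules;

static fake_ctx *fake_init(void) { return calloc(1, sizeof(fake_ctx)); }
static int fake_rule_add(fake_ctx *ctx, int num) { (void)num; ctx->nrules++; return 0; }
static int fake_load(fake_ctx *ctx) { last_loaded_rules = ctx->nrules; return 0; }
static void fake_release(fake_ctx *ctx) { free(ctx); }

enum whitelist_mode { INIT = 0, MAIN_LOOP = 1 };

/* Placeholder syscall numbers, not the real whitelists. */
static const int whitelist_init[] = { 1, 2, 3, 4 };
static const int whitelist_main_loop[] = { 1, 2 };

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static int process_whitelist(const int *list, size_t size, fake_ctx *ctx)
{
    for (size_t i = 0; i < size; i++) {
        if (fake_rule_add(ctx, list[i]) < 0) {
            return -1;
        }
    }
    return 0;
}

/* ctx is local: created on entry, released on every exit path. */
int seccomp_start_sketch(enum whitelist_mode mode)
{
    int rc;
    fake_ctx *ctx = fake_init();

    if (ctx == NULL) {
        return -1;
    }

    switch (mode) {
    case INIT:
        rc = process_whitelist(whitelist_init,
                               ARRAY_SIZE(whitelist_init), ctx);
        break;
    case MAIN_LOOP:
        rc = process_whitelist(whitelist_main_loop,
                               ARRAY_SIZE(whitelist_main_loop), ctx);
        break;
    default:
        rc = -1;
        break;
    }

    if (rc == 0) {
        rc = fake_load(ctx);
    }
    fake_release(ctx);
    return rc;
}
```

With ctx local, callers pass only the mode, for example
seccomp_start_sketch(INIT), and vl.c no longer needs its own context
variable.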

> 
> >      if (ctx == NULL) {
> >+        rc = -1;
> >          goto seccomp_return;
> >      }
> >
> >-    for (i = 0; i < ARRAY_SIZE(seccomp_whitelist); i++) {
> >-        rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, seccomp_whitelist[i].num, 0);
> >-        if (rc < 0) {
> >+    switch (mode) {
> >+    case INIT:
> >+        if (process_whitelist
> >+            (seccomp_whitelist_init,
> >+             ARRAY_SIZE(seccomp_whitelist_init), ctx) < 0) {
> >+            rc = -1;
> >              goto seccomp_return;
> >          }
> >-        rc = seccomp_syscall_priority(ctx, seccomp_whitelist[i].num,
> >-                                      seccomp_whitelist[i].priority);
> >-        if (rc < 0) {
> >+        break;
> >+    case MAIN_LOOP:
> >+        if (process_whitelist
> >+            (seccomp_whitelist_main_loop,
> >+             ARRAY_SIZE(seccomp_whitelist_main_loop), ctx) < 0) {
> >+            rc = -1;
> >              goto seccomp_return;
> >          }
> >+        break;
> >+    default:
> >+        rc = -1;
> >+        goto seccomp_return;
> >      }
> >
> >      rc = seccomp_load(ctx);
> >diff --git a/qemu-seccomp.h b/qemu-seccomp.h
> >index b2fc3f8..1c97978 100644
> >--- a/qemu-seccomp.h
> >+++ b/qemu-seccomp.h
> >@@ -18,5 +18,10 @@
> >  #include <seccomp.h>
> >  #include "osdep.h"
> >
> >-int seccomp_start(void);
> >+enum whitelist_mode {
> >+    INIT = 0,
> >+    MAIN_LOOP = 1,
> >+};
> >+
> >+int seccomp_start(enum whitelist_mode mode, scmp_filter_ctx *ctx);
> >  #endif
> >diff --git a/vl.c b/vl.c
> >index bec68cd..d50018f 100644
> >--- a/vl.c
> >+++ b/vl.c
> >@@ -774,10 +774,11 @@ static int bt_parse(const char *opt)
> >      return 1;
> >  }
> >
> >-static int install_seccomp_filters(void)
> >+static int
> >+install_seccomp_filters(enum whitelist_mode mode, scmp_filter_ctx *ctx)
> >  {
> >  #ifdef CONFIG_SECCOMP
> >-    if (seccomp_start() < 0) {
> >+    if (seccomp_start(mode, ctx) < 0) {
> >          qerror_report(ERROR_CLASS_GENERIC_ERROR,
> >                  "failed to install seccomp syscall filter in the kernel");
> 
> I heard from Luiz Capitulino on one of my patches that
> qerror_report() is deprecated.  So you'll want to update this.
> 
> >          return -1;
> >@@ -2407,6 +2408,10 @@ int main(int argc, char **argv, char **envp)
> >      const char *trace_events = NULL;
> >      const char *trace_file = NULL;
> >
> >+#ifdef CONFIG_SECCOMP
> >+    scmp_filter_ctx main_loop_ctx;
> >+#endif
> >+
> >      atexit(qemu_run_exit_notifiers);
> >      error_set_progname(argv[0]);
> >
> >@@ -3330,11 +3335,13 @@ int main(int argc, char **argv, char **envp)
> >      }
> >
> >      /* We should install seccomp filters even if -sandbox on is not used. */
> >+#ifdef CONFIG_SECCOMP
> >      if (seccomp_on) {
> >-        if (install_seccomp_filters() < 0) {
> >+        if (install_seccomp_filters(INIT, &main_loop_ctx) < 0) {
> 
> I don't think the variable name "main_loop_ctx" makes sense here.
> Should the name be more generic since it's used wherever a seccomp
> filter is installed?

I removed this variable since I'm using a local variable as I said
above.

Thanks for the comments :)

-- 
Eduardo Otubo
Software Engineer
Linux Technology Center
IBM Systems & Technology Group

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCHv2 3/4] Support for "double whitelist" filters
  2012-10-23 15:10   ` Corey Bryant
  2012-10-24 20:06     ` Eduardo Otubo
@ 2012-10-25 20:16     ` Eduardo Otubo
  1 sibling, 0 replies; 25+ messages in thread
From: Eduardo Otubo @ 2012-10-25 20:16 UTC (permalink / raw)
  To: Corey Bryant; +Cc: pmoore, aliguori, qemu-devel

On Tue, Oct 23, 2012 at 11:10:58AM -0400, Corey Bryant wrote:
> 
> 
> On 10/23/2012 01:55 AM, Eduardo Otubo wrote:
> >This patch includes a second whitelist right before the main loop. It's
> >a smaller and more restricted whitelist, excluding execve() among many
> >others.
> >
> >v2: * ctx changed to main_loop_ctx
> >     * seccomp_on now inside ifdef
> >     * open syscall added to the main_loop whitelist
> >
> >Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
> >---
> >  qemu-seccomp.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++------
> >  qemu-seccomp.h |  7 ++++-
> >  vl.c           | 21 +++++++++++--
> >  3 files changed, 114 insertions(+), 13 deletions(-)
> >
> >diff --git a/qemu-seccomp.c b/qemu-seccomp.c
> >index a7b33e2..033cfad 100644
> >--- a/qemu-seccomp.c
> >+++ b/qemu-seccomp.c
> >@@ -13,6 +13,7 @@
> >   * GNU GPL, version 2 or (at your option) any later version.
> >   */
> >  #include <stdio.h>
> >+#include <stdlib.h>
> >  #include <seccomp.h>
> >  #include "qemu-seccomp.h"
> >
> >@@ -21,7 +22,7 @@ struct QemuSeccompSyscall {
> >      uint8_t priority;
> >  };
> >
> >-static const struct QemuSeccompSyscall seccomp_whitelist[] = {
> >+static const struct QemuSeccompSyscall seccomp_whitelist_init[] = {
> >      { SCMP_SYS(timer_settime), 255 },
> >      { SCMP_SYS(timer_gettime), 254 },
> >      { SCMP_SYS(futex), 253 },
> >@@ -121,27 +122,107 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = {
> >      { SCMP_SYS(rt_sigtimedwait), 242 }
> >  };
> >
> >-int seccomp_start(void)
> >+static const struct QemuSeccompSyscall seccomp_whitelist_main_loop[] = {
> >+    { SCMP_SYS(timer_settime), 255 },
> >+    { SCMP_SYS(timer_gettime), 254 },
> >+    { SCMP_SYS(futex), 253 },
> >+    { SCMP_SYS(select), 252 },
> >+    { SCMP_SYS(recvfrom), 251 },
> >+    { SCMP_SYS(sendto), 250 },
> >+    { SCMP_SYS(read), 249 },
> >+    { SCMP_SYS(brk), 248 },
> >+    { SCMP_SYS(mmap), 247 },
> >+    { SCMP_SYS(open), 247 },
> >+#if defined(__i386__)
> >+    { SCMP_SYS(fcntl64), 245 },
> >+    { SCMP_SYS(fstat64), 245 },
> >+    { SCMP_SYS(stat64), 245 },
> >+    { SCMP_SYS(getgid32), 245 },
> >+    { SCMP_SYS(getegid32), 245 },
> >+    { SCMP_SYS(getuid32), 245 },
> >+    { SCMP_SYS(geteuid32), 245 },
> >+    { SCMP_SYS(sigreturn), 245 },
> >+    { SCMP_SYS(_newselect), 245 },
> >+    { SCMP_SYS(_llseek), 245 },
> >+    { SCMP_SYS(mmap2), 245},
> >+    { SCMP_SYS(sigprocmask), 245 },
> >+#endif
> >+    { SCMP_SYS(exit), 245 },
> >+    { SCMP_SYS(timer_delete), 245 },
> >+    { SCMP_SYS(exit_group), 245 },
> >+    { SCMP_SYS(rt_sigreturn), 245 },
> >+    { SCMP_SYS(madvise), 245 },
> >+    { SCMP_SYS(write), 244 },
> >+    { SCMP_SYS(fcntl), 243 },
> >+    { SCMP_SYS(tgkill), 242 },
> >+    { SCMP_SYS(rt_sigaction), 242 },
> >+    { SCMP_SYS(pipe2), 242 },
> >+    { SCMP_SYS(munmap), 242 },
> >+    { SCMP_SYS(mremap), 242 },
> >+    { SCMP_SYS(getsockname), 242 },
> >+    { SCMP_SYS(getpeername), 242 },
> >+    { SCMP_SYS(close), 242 },
> >+    { SCMP_SYS(accept4), 242 },
> >+    { SCMP_SYS(eventfd2), 242 },
> >+    { SCMP_SYS(recvmsg), 242 },
> >+    { SCMP_SYS(ioctl), 242 },
> >+    { SCMP_SYS(rt_sigprocmask), 242 }
> >+};
> >+
> >+static int
> >+process_whitelist(const struct QemuSeccompSyscall *whitelist,
> >+                  unsigned int size, scmp_filter_ctx *ctx)
> >  {
> >      int rc = 0;
> >+
> >      unsigned int i = 0;
> >-    scmp_filter_ctx ctx;
> >+
> >+    for (i = 0; i < size; i++) {
> >+        rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, whitelist[i].num, 0);
> >+        if (rc < 0) {
> >+            return -1;
> >+        }
> >+
> >+        rc = seccomp_syscall_priority(ctx, whitelist[i].num,
> >+                                      whitelist[i].priority);
> >+        if (rc < 0) {
> >+            return -1;
> >+        }
> >+    }
> >+    return 0;
> >+}
> >+
> >+int
> >+seccomp_start(enum whitelist_mode mode, scmp_filter_ctx *ctx)
> >+{
> >+    int rc = 0;
> >
> >      ctx = seccomp_init(SCMP_ACT_KILL);
> 
> Is there any reason why ctx can't be a local variable in this
> function?  It is allocated and freed on each entry and exit in this
> function.
> 

It does make sense. I'll make it local in the next version, thanks.

> >      if (ctx == NULL) {
> >+        rc = -1;
> >          goto seccomp_return;
> >      }
> >
> >-    for (i = 0; i < ARRAY_SIZE(seccomp_whitelist); i++) {
> >-        rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, seccomp_whitelist[i].num, 0);
> >-        if (rc < 0) {
> >+    switch (mode) {
> >+    case INIT:
> >+        if (process_whitelist
> >+            (seccomp_whitelist_init,
> >+             ARRAY_SIZE(seccomp_whitelist_init), ctx) < 0) {
> >+            rc = -1;
> >              goto seccomp_return;
> >          }
> >-        rc = seccomp_syscall_priority(ctx, seccomp_whitelist[i].num,
> >-                                      seccomp_whitelist[i].priority);
> >-        if (rc < 0) {
> >+        break;
> >+    case MAIN_LOOP:
> >+        if (process_whitelist
> >+            (seccomp_whitelist_main_loop,
> >+             ARRAY_SIZE(seccomp_whitelist_main_loop), ctx) < 0) {
> >+            rc = -1;
> >              goto seccomp_return;
> >          }
> >+        break;
> >+    default:
> >+        rc = -1;
> >+        goto seccomp_return;
> >      }
> >
> >      rc = seccomp_load(ctx);
> >diff --git a/qemu-seccomp.h b/qemu-seccomp.h
> >index b2fc3f8..1c97978 100644
> >--- a/qemu-seccomp.h
> >+++ b/qemu-seccomp.h
> >@@ -18,5 +18,10 @@
> >  #include <seccomp.h>
> >  #include "osdep.h"
> >
> >-int seccomp_start(void);
> >+enum whitelist_mode {
> >+    INIT = 0,
> >+    MAIN_LOOP = 1,
> >+};
> >+
> >+int seccomp_start(enum whitelist_mode mode, scmp_filter_ctx *ctx);
> >  #endif
> >diff --git a/vl.c b/vl.c
> >index bec68cd..d50018f 100644
> >--- a/vl.c
> >+++ b/vl.c
> >@@ -774,10 +774,11 @@ static int bt_parse(const char *opt)
> >      return 1;
> >  }
> >
> >-static int install_seccomp_filters(void)
> >+static int
> >+install_seccomp_filters(enum whitelist_mode mode, scmp_filter_ctx *ctx)
> >  {
> >  #ifdef CONFIG_SECCOMP
> >-    if (seccomp_start() < 0) {
> >+    if (seccomp_start(mode, ctx) < 0) {
> >          qerror_report(ERROR_CLASS_GENERIC_ERROR,
> >                  "failed to install seccomp syscall filter in the kernel");
> 
> I heard from Luiz Capitulino on one of my patches that
> qerror_report() is deprecated.  So you'll want to update this.

Luiz Capitulino commented on IRC that fprintf(stderr) is fine for
vl.c. I'll use that.
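
As a rough sketch of that change (not the actual v3 code), the error path
could report through fprintf(stderr) as below. seccomp_start_stub() is a
hypothetical stand-in for seccomp_start() that simply returns the value it
is given, so both paths can be exercised without libseccomp.

```c
#include <stdio.h>

/* Hypothetical stand-in for seccomp_start(): returns the given result. */
static int seccomp_start_stub(int result)
{
    return result;
}

/* Reports failure on stderr instead of going through qerror_report(). */
static int install_seccomp_filters_sketch(int stub_result)
{
    if (seccomp_start_stub(stub_result) < 0) {
        fprintf(stderr,
                "failed to install seccomp syscall filter in the kernel\n");
        return -1;
    }
    return 0;
}
```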

> 
> >          return -1;
> >@@ -2407,6 +2408,10 @@ int main(int argc, char **argv, char **envp)
> >      const char *trace_events = NULL;
> >      const char *trace_file = NULL;
> >
> >+#ifdef CONFIG_SECCOMP
> >+    scmp_filter_ctx main_loop_ctx;
> >+#endif
> >+
> >      atexit(qemu_run_exit_notifiers);
> >      error_set_progname(argv[0]);
> >
> >@@ -3330,11 +3335,13 @@ int main(int argc, char **argv, char **envp)
> >      }
> >
> >      /* We should install seccomp filters even if -sandbox on is not used. */
> >+#ifdef CONFIG_SECCOMP
> >      if (seccomp_on) {
> >-        if (install_seccomp_filters() < 0) {
> >+        if (install_seccomp_filters(INIT, &main_loop_ctx) < 0) {
> 
> I don't think the variable name "main_loop_ctx" makes sense here.
> Should the name be more generic since it's used wherever a seccomp
> filter is installed?

Since the ctx variable is now local to the function
install_seccomp_filters(), I'll remove this reference in the next
version.

Thanks for all the comments :)

-- 
Eduardo Otubo
Software Engineer
Linux Technology Center
IBM Systems & Technology Group

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162)
  2012-10-23  5:55 [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162) Eduardo Otubo
                   ` (2 preceding siblings ...)
  2012-10-23  5:55 ` [Qemu-devel] [PATCHv2 4/4] Warning messages on net devices hotplug Eduardo Otubo
@ 2012-11-01 21:43 ` Paul Moore
  2012-11-02  2:29   ` Eduardo Otubo
  2012-11-02 13:48   ` Corey Bryant
  3 siblings, 2 replies; 25+ messages in thread
From: Paul Moore @ 2012-11-01 21:43 UTC (permalink / raw)
  To: Eduardo Otubo; +Cc: aliguori, coreyb, qemu-devel

On Tuesday, October 23, 2012 03:55:29 AM Eduardo Otubo wrote:
> According to the bug 855162[0] - there's the need of adding new syscalls
> to the whitelist whenn using Qemu with Libvirt.
> 
> [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162
> 
> v2: Adding new syscalls to the list: readlink, rt_sigpending, and
>     rt_sigtimedwait
> 
> Reported-by: Paul Moore <pmoore@redhat.com>
> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
> ---
>  qemu-seccomp.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)

I had an opportunity to test this patchset on a F17 machine using QEMU 1.2 and 
unfortunately it still fails.  I'm using a relatively basic guest 
configuration running F16, the details are documented in the RH BZ that 
Eduardo mentioned in the patch description.

Eduardo, I assume you are not able to reproduce this?

> diff --git a/qemu-seccomp.c b/qemu-seccomp.c
> index 64329a3..a7b33e2 100644
> --- a/qemu-seccomp.c
> +++ b/qemu-seccomp.c
> @@ -45,6 +45,13 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = {
>      { SCMP_SYS(access), 245 },
>      { SCMP_SYS(prctl), 245 },
>      { SCMP_SYS(signalfd), 245 },
> +    { SCMP_SYS(getrlimit), 245 },
> +    { SCMP_SYS(set_tid_address), 245 },
> +    { SCMP_SYS(socketpair), 245 },
> +    { SCMP_SYS(statfs), 245 },
> +    { SCMP_SYS(unlink), 245 },
> +    { SCMP_SYS(wait4), 245 },
> +    { SCMP_SYS(getuid), 245 },
>  #if defined(__i386__)
>      { SCMP_SYS(fcntl64), 245 },
>      { SCMP_SYS(fstat64), 245 },
> @@ -107,7 +114,11 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = {
>      { SCMP_SYS(getsockname), 242 },
>      { SCMP_SYS(getpeername), 242 },
>      { SCMP_SYS(fdatasync), 242 },
> -    { SCMP_SYS(close), 242 }
> +    { SCMP_SYS(close), 242 },
> +    { SCMP_SYS(accept4), 242 },
> +    { SCMP_SYS(readlink), 242 },
> +    { SCMP_SYS(rt_sigpending), 242 },
> +    { SCMP_SYS(rt_sigtimedwait), 242 }
>  };
> 
>  int seccomp_start(void)
-- 
paul moore
security and virtualization @ redhat

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162)
  2012-11-01 21:43 ` [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162) Paul Moore
@ 2012-11-02  2:29   ` Eduardo Otubo
  2012-11-02 14:10     ` Paul Moore
  2012-11-02 13:48   ` Corey Bryant
  1 sibling, 1 reply; 25+ messages in thread
From: Eduardo Otubo @ 2012-11-02  2:29 UTC (permalink / raw)
  To: Paul Moore; +Cc: aliguori, coreyb, qemu-devel

On Thu, Nov 01, 2012 at 05:43:03PM -0400, Paul Moore wrote:
> On Tuesday, October 23, 2012 03:55:29 AM Eduardo Otubo wrote:
> > According to the bug 855162[0] - there's the need of adding new syscalls
> > to the whitelist whenn using Qemu with Libvirt.
> > 
> > [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162
> > 
> > v2: Adding new syscalls to the list: readlink, rt_sigpending, and
> >     rt_sigtimedwait
> > 
> > Reported-by: Paul Moore <pmoore@redhat.com>
> > Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
> > ---
> >  qemu-seccomp.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> I had an opportunity to test this patchset on a F17 machine using QEMU 1.2 and 
> unfortunately it still fails.  I'm using a relatively basic guest 
> configuration running F16, the details are documented in the RH BZ that 
> Eduardo mentioned in the patch description.
> 
> Eduardo, I assume you are not able to reproduce this?

Unfortunately no. But we have the v3 patchset coming soon with new
syscalls and we're hoping to get this fixed. Thanks for the feedback,
Paul!

> 
> > diff --git a/qemu-seccomp.c b/qemu-seccomp.c
> > index 64329a3..a7b33e2 100644
> > --- a/qemu-seccomp.c
> > +++ b/qemu-seccomp.c
> > @@ -45,6 +45,13 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = {
> >      { SCMP_SYS(access), 245 },
> >      { SCMP_SYS(prctl), 245 },
> >      { SCMP_SYS(signalfd), 245 },
> > +    { SCMP_SYS(getrlimit), 245 },
> > +    { SCMP_SYS(set_tid_address), 245 },
> > +    { SCMP_SYS(socketpair), 245 },
> > +    { SCMP_SYS(statfs), 245 },
> > +    { SCMP_SYS(unlink), 245 },
> > +    { SCMP_SYS(wait4), 245 },
> > +    { SCMP_SYS(getuid), 245 },
> >  #if defined(__i386__)
> >      { SCMP_SYS(fcntl64), 245 },
> >      { SCMP_SYS(fstat64), 245 },
> > @@ -107,7 +114,11 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = {
> >      { SCMP_SYS(getsockname), 242 },
> >      { SCMP_SYS(getpeername), 242 },
> >      { SCMP_SYS(fdatasync), 242 },
> > -    { SCMP_SYS(close), 242 }
> > +    { SCMP_SYS(close), 242 },
> > +    { SCMP_SYS(accept4), 242 },
> > +    { SCMP_SYS(readlink), 242 },
> > +    { SCMP_SYS(rt_sigpending), 242 },
> > +    { SCMP_SYS(rt_sigtimedwait), 242 }
> >  };
> > 
> >  int seccomp_start(void)
> -- 
> paul moore
> security and virtualization @ redhat
> 

-- 
Eduardo Otubo
Software Engineer
Linux Technology Center
IBM Systems & Technology Group

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162)
  2012-11-01 21:43 ` [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162) Paul Moore
  2012-11-02  2:29   ` Eduardo Otubo
@ 2012-11-02 13:48   ` Corey Bryant
  2012-11-02 14:10     ` Paul Moore
  1 sibling, 1 reply; 25+ messages in thread
From: Corey Bryant @ 2012-11-02 13:48 UTC (permalink / raw)
  To: Paul Moore; +Cc: aliguori, qemu-devel, Eduardo Otubo



On 11/01/2012 05:43 PM, Paul Moore wrote:
> On Tuesday, October 23, 2012 03:55:29 AM Eduardo Otubo wrote:
>> According to the bug 855162[0] - there's the need of adding new syscalls
>> to the whitelist whenn using Qemu with Libvirt.
>>
>> [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162
>>
>> v2: Adding new syscalls to the list: readlink, rt_sigpending, and
>>      rt_sigtimedwait
>>
>> Reported-by: Paul Moore <pmoore@redhat.com>
>> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
>> ---
>>   qemu-seccomp.c | 13 ++++++++++++-
>>   1 file changed, 12 insertions(+), 1 deletion(-)
>
> I had an opportunity to test this patchset on a F17 machine using QEMU 1.2 and
> unfortunately it still fails.  I'm using a relatively basic guest
> configuration running F16, the details are documented in the RH BZ that
> Eduardo mentioned in the patch description.

Paul, here's the latest diff for the whitelist.  We're looking to get
the patches out in the next few days after a bit more testing.

diff --git a/qemu-seccomp.c b/qemu-seccomp.c
index 64329a3..81aaf74 100644
--- a/qemu-seccomp.c
+++ b/qemu-seccomp.c
@@ -45,6 +45,12 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = {
      { SCMP_SYS(access), 245 },
      { SCMP_SYS(prctl), 245 },
      { SCMP_SYS(signalfd), 245 },
+    { SCMP_SYS(getrlimit), 245 },
+    { SCMP_SYS(set_tid_address), 245 },
+    { SCMP_SYS(socketpair), 245 },
+    { SCMP_SYS(statfs), 245 },
+    { SCMP_SYS(unlink), 245 },
+    { SCMP_SYS(wait4), 245 },
  #if defined(__i386__)
      { SCMP_SYS(fcntl64), 245 },
      { SCMP_SYS(fstat64), 245 },
@@ -59,6 +65,8 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = {
      { SCMP_SYS(mmap2), 245},
      { SCMP_SYS(sigprocmask), 245 },
  #elif defined(__x86_64__)
+    { SCMP_SYS(semget), 245},
+#endif
      { SCMP_SYS(sched_getparam), 245},
      { SCMP_SYS(sched_getscheduler), 245},
      { SCMP_SYS(fstat), 245},
@@ -69,11 +77,15 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = {
      { SCMP_SYS(socket), 245},
      { SCMP_SYS(setsockopt), 245},
      { SCMP_SYS(uname), 245},
-    { SCMP_SYS(semget), 245},
-#endif
      { SCMP_SYS(eventfd2), 245 },
      { SCMP_SYS(dup), 245 },
+    { SCMP_SYS(dup2), 245 },
+    { SCMP_SYS(dup3), 245 },
      { SCMP_SYS(gettid), 245 },
+    { SCMP_SYS(getgid), 245 },
+    { SCMP_SYS(getegid), 245 },
+    { SCMP_SYS(getuid), 245 },
+    { SCMP_SYS(geteuid), 245 },
      { SCMP_SYS(timer_create), 245 },
      { SCMP_SYS(exit), 245 },
      { SCMP_SYS(clock_gettime), 245 },
@@ -107,7 +119,22 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = {
      { SCMP_SYS(getsockname), 242 },
      { SCMP_SYS(getpeername), 242 },
      { SCMP_SYS(fdatasync), 242 },
-    { SCMP_SYS(close), 242 }
+    { SCMP_SYS(close), 242 },
+    { SCMP_SYS(accept4), 242 },
+    { SCMP_SYS(rt_sigpending), 242 },
+    { SCMP_SYS(rt_sigtimedwait), 242 },
+    { SCMP_SYS(readv), 242 },
+    { SCMP_SYS(writev), 242 },
+    { SCMP_SYS(preadv), 242 },
+    { SCMP_SYS(pwritev), 242 },
+    { SCMP_SYS(setrlimit), 242 },
+    { SCMP_SYS(ftruncate), 242 },
+    { SCMP_SYS(lstat), 242 },
+    { SCMP_SYS(pipe), 242 },
+    { SCMP_SYS(umask), 242 },
+    { SCMP_SYS(chdir), 242 },
+    { SCMP_SYS(setitimer), 242 },
+    { SCMP_SYS(setsid), 242 }
  };

Regards,
Corey Bryant

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162)
  2012-11-02 13:48   ` Corey Bryant
@ 2012-11-02 14:10     ` Paul Moore
  2012-11-02 14:38       ` Paul Moore
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Moore @ 2012-11-02 14:10 UTC (permalink / raw)
  To: Corey Bryant, Eduardo Otubo; +Cc: aliguori, qemu-devel

On Friday, November 02, 2012 09:48:55 AM Corey Bryant wrote:
> On 11/01/2012 05:43 PM, Paul Moore wrote:
> > On Tuesday, October 23, 2012 03:55:29 AM Eduardo Otubo wrote:
> >> According to the bug 855162[0] - there's the need of adding new syscalls
> >> to the whitelist whenn using Qemu with Libvirt.
> >> 
> >> [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162
> >> 
> >> v2: Adding new syscalls to the list: readlink, rt_sigpending, and
> >> 
> >>      rt_sigtimedwait
> >> 
> >> Reported-by: Paul Moore <pmoore@redhat.com>
> >> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
> >> ---
> >> 
> >>   qemu-seccomp.c | 13 ++++++++++++-
> >>   1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > I had an opportunity to test this patchset on a F17 machine using QEMU 1.2
> > and unfortunately it still fails.  I'm using a relatively basic guest
> > configuration running F16, the details are documented in the RH BZ that
> > Eduardo mentioned in the patch description.
> 
> Paul, Here's the latest diff for the whitelist.  We're looking to get
> the patches out in the next few days after a bit more testing.

Okay, thanks for the updated list ... I'm rebuilding QEMU right now and I'll 
report back with the results later today.

-- 
paul moore
security and virtualization @ redhat

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162)
  2012-11-02  2:29   ` Eduardo Otubo
@ 2012-11-02 14:10     ` Paul Moore
  0 siblings, 0 replies; 25+ messages in thread
From: Paul Moore @ 2012-11-02 14:10 UTC (permalink / raw)
  To: Eduardo Otubo; +Cc: aliguori, coreyb, qemu-devel

On Friday, November 02, 2012 12:29:37 AM Eduardo Otubo wrote:
> On Thu, Nov 01, 2012 at 05:43:03PM -0400, Paul Moore wrote:
> > On Tuesday, October 23, 2012 03:55:29 AM Eduardo Otubo wrote:
> > > According to the bug 855162[0] - there's the need of adding new syscalls
> > > to the whitelist whenn using Qemu with Libvirt.
> > > 
> > > [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162
> > > 
> > > v2: Adding new syscalls to the list: readlink, rt_sigpending, and
> > > 
> > >     rt_sigtimedwait
> > > 
> > > Reported-by: Paul Moore <pmoore@redhat.com>
> > > Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
> > > ---
> > > 
> > >  qemu-seccomp.c | 13 ++++++++++++-
> > >  1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > I had an opportunity to test this patchset on a F17 machine using QEMU 1.2
> > and unfortunately it still fails.  I'm using a relatively basic guest
> > configuration running F16, the details are documented in the RH BZ that
> > Eduardo mentioned in the patch description.
> > 
> > Eduardo, I assume you are not able to reproduce this?
> 
> Unfortunately no. But we have the v3  patchset coming soon with new
> syscalls and we're hoping to get this fixed. Thanks for the feedback
> Paul!

No problem, thanks for all your work on this patchset.

-- 
paul moore
security and virtualization @ redhat

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162)
  2012-11-02 14:10     ` Paul Moore
@ 2012-11-02 14:38       ` Paul Moore
  2012-11-02 14:43         ` Corey Bryant
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Moore @ 2012-11-02 14:38 UTC (permalink / raw)
  To: Corey Bryant, Eduardo Otubo; +Cc: aliguori, qemu-devel

On Friday, November 02, 2012 10:10:02 AM Paul Moore wrote:
> On Friday, November 02, 2012 09:48:55 AM Corey Bryant wrote:
> > On 11/01/2012 05:43 PM, Paul Moore wrote:
> > > On Tuesday, October 23, 2012 03:55:29 AM Eduardo Otubo wrote:
> > >> According to the bug 855162[0] - there's the need of adding new
> > >> syscalls
> > >> to the whitelist whenn using Qemu with Libvirt.
> > >> 
> > >> [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162
> > >> 
> > >> v2: Adding new syscalls to the list: readlink, rt_sigpending, and
> > >> 
> > >>      rt_sigtimedwait
> > >> 
> > >> Reported-by: Paul Moore <pmoore@redhat.com>
> > >> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
> > >> ---
> > >> 
> > >>   qemu-seccomp.c | 13 ++++++++++++-
> > >>   1 file changed, 12 insertions(+), 1 deletion(-)
> > > 
> > > I had an opportunity to test this patchset on a F17 machine using QEMU
> > > 1.2
> > > and unfortunately it still fails.  I'm using a relatively basic guest
> > > configuration running F16, the details are documented in the RH BZ that
> > > Eduardo mentioned in the patch description.
> > 
> > Paul, Here's the latest diff for the whitelist.  We're looking to get
> > the patches out in the next few days after a bit more testing.
> 
> Okay, thanks for the updated list ... I'm rebuilding QEMU right now and I'll
> report back with the results later today.

Sadly, no luck, it still fails.

-- 
paul moore
security and virtualization @ redhat

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162)
  2012-11-02 14:38       ` Paul Moore
@ 2012-11-02 14:43         ` Corey Bryant
  2012-11-02 14:46           ` Paul Moore
  0 siblings, 1 reply; 25+ messages in thread
From: Corey Bryant @ 2012-11-02 14:43 UTC (permalink / raw)
  To: Paul Moore; +Cc: aliguori, qemu-devel, Eduardo Otubo



On 11/02/2012 10:38 AM, Paul Moore wrote:
> On Friday, November 02, 2012 10:10:02 AM Paul Moore wrote:
>> On Friday, November 02, 2012 09:48:55 AM Corey Bryant wrote:
>>> On 11/01/2012 05:43 PM, Paul Moore wrote:
>>>> On Tuesday, October 23, 2012 03:55:29 AM Eduardo Otubo wrote:
>>>>> According to the bug 855162[0] - there's the need of adding new
>>>>> syscalls
>>>>> to the whitelist whenn using Qemu with Libvirt.
>>>>>
>>>>> [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162
>>>>>
>>>>> v2: Adding new syscalls to the list: readlink, rt_sigpending, and
>>>>>
>>>>>       rt_sigtimedwait
>>>>>
>>>>> Reported-by: Paul Moore <pmoore@redhat.com>
>>>>> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
>>>>> ---
>>>>>
>>>>>    qemu-seccomp.c | 13 ++++++++++++-
>>>>>    1 file changed, 12 insertions(+), 1 deletion(-)
>>>>
>>>> I had an opportunity to test this patchset on a F17 machine using QEMU
>>>> 1.2
>>>> and unfortunately it still fails.  I'm using a relatively basic guest
>>>> configuration running F16, the details are documented in the RH BZ that
>>>> Eduardo mentioned in the patch description.
>>>
>>> Paul, Here's the latest diff for the whitelist.  We're looking to get
>>> the patches out in the next few days after a bit more testing.
>>
>> Okay, thanks for the updated list ... I'm rebuilding QEMU right now and I'll
>> report back with the results later today.
>
> Sadly, no luck, it still fails.
>

Hmm, let me send you the current patch set off-line, which includes
debug support to write the failing syscall out.  If you don't mind, could
you try it out?

-- 
Regards,
Corey Bryant

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162)
  2012-11-02 14:43         ` Corey Bryant
@ 2012-11-02 14:46           ` Paul Moore
  2012-11-02 14:49             ` Corey Bryant
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Moore @ 2012-11-02 14:46 UTC (permalink / raw)
  To: Corey Bryant; +Cc: aliguori, qemu-devel, Eduardo Otubo

On Friday, November 02, 2012 10:43:41 AM Corey Bryant wrote:
> On 11/02/2012 10:38 AM, Paul Moore wrote:
> > On Friday, November 02, 2012 10:10:02 AM Paul Moore wrote:
> >> On Friday, November 02, 2012 09:48:55 AM Corey Bryant wrote:
> >>> On 11/01/2012 05:43 PM, Paul Moore wrote:
> >>>> On Tuesday, October 23, 2012 03:55:29 AM Eduardo Otubo wrote:
> >>>>> According to the bug 855162[0] - there's the need of adding new
> >>>>> syscalls
> >>>>> to the whitelist when using Qemu with Libvirt.
> >>>>> 
> >>>>> [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162
> >>>>> 
> >>>>> v2: Adding new syscalls to the list: readlink, rt_sigpending, and
> >>>>> 
> >>>>>       rt_sigtimedwait
> >>>>> 
> >>>>> Reported-by: Paul Moore <pmoore@redhat.com>
> >>>>> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
> >>>>> ---
> >>>>> 
> >>>>>    qemu-seccomp.c | 13 ++++++++++++-
> >>>>>    1 file changed, 12 insertions(+), 1 deletion(-)
> >>>> 
> >>>> I had an opportunity to test this patchset on a F17 machine using QEMU
> >>>> 1.2
> >>>> and unfortunately it still fails.  I'm using a relatively basic guest
> >>>> configuration running F16, the details are documented in the RH BZ that
> >>>> Eduardo mentioned in the patch description.
> >>> 
> >>> Paul, Here's the latest diff for the whitelist.  We're looking to get
> >>> the patches out in the next few days after a bit more testing.
> >> 
> >> Okay, thanks for the updated list ... I'm rebuilding QEMU right now and
> >> I'll report back with the results later today.
> > 
> > Sadly, no luck, it still fails.
> 
> Hmm, let me send you the current patch set off-line, which includes
> debug support to write the failing syscall out.  If you don't mind could
> you try it out?

Sure, no problem.

On a related note, I think it would be a *really* good idea to also submit the 
debug code upstream, just in a disabled state by default.  You could either 
bracket it with #ifdefs or get fancy and allow it at runtime with '-sandbox 
debug' or something similar.
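
To illustrate the two options mentioned above, here is a minimal C sketch of
debug reporting that is off by default, can be compiled out entirely, and can
otherwise be enabled at runtime.  It is not the code from the patch set: the
names sandbox_debug_enabled, sandbox_format_denied() and SANDBOX_NO_DEBUG are
hypothetical, and wiring the flag to a '-sandbox debug' option is assumed
rather than shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical runtime switch; imagine it being set when an option such as
 * "-sandbox debug" is parsed.  Off by default. */
static bool sandbox_debug_enabled = false;

/*
 * Format a diagnostic for a syscall that the filter rejected.
 * Returns the value reported by snprintf(), or 0 when debug support is
 * compiled out or disabled at runtime.
 */
int sandbox_format_denied(int syscall_nr, char *buf, size_t len)
{
#ifdef SANDBOX_NO_DEBUG            /* hypothetical build-time opt-out */
    (void)syscall_nr;
    (void)buf;
    (void)len;
    return 0;
#else
    if (!sandbox_debug_enabled || buf == NULL || len == 0) {
        return 0;
    }
    return snprintf(buf, len, "seccomp: denied syscall %d", syscall_nr);
#endif
}
```

A caller would only print the buffer when the return value is non-zero, so a
default build with the flag left off stays silent.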

-- 
paul moore
security and virtualization @ redhat


* Re: [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162)
  2012-11-02 14:46           ` Paul Moore
@ 2012-11-02 14:49             ` Corey Bryant
  0 siblings, 0 replies; 25+ messages in thread
From: Corey Bryant @ 2012-11-02 14:49 UTC (permalink / raw)
  To: Paul Moore; +Cc: aliguori, qemu-devel, Eduardo Otubo



On 11/02/2012 10:46 AM, Paul Moore wrote:
> On Friday, November 02, 2012 10:43:41 AM Corey Bryant wrote:
>> On 11/02/2012 10:38 AM, Paul Moore wrote:
>>> On Friday, November 02, 2012 10:10:02 AM Paul Moore wrote:
>>>> On Friday, November 02, 2012 09:48:55 AM Corey Bryant wrote:
>>>>> On 11/01/2012 05:43 PM, Paul Moore wrote:
>>>>>> On Tuesday, October 23, 2012 03:55:29 AM Eduardo Otubo wrote:
>>>>>>> According to the bug 855162[0] - there's the need of adding new
>>>>>>> syscalls
>>>>>>> to the whitelist when using Qemu with Libvirt.
>>>>>>>
>>>>>>> [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162
>>>>>>>
>>>>>>> v2: Adding new syscalls to the list: readlink, rt_sigpending, and
>>>>>>>
>>>>>>>        rt_sigtimedwait
>>>>>>>
>>>>>>> Reported-by: Paul Moore <pmoore@redhat.com>
>>>>>>> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
>>>>>>> ---
>>>>>>>
>>>>>>>     qemu-seccomp.c | 13 ++++++++++++-
>>>>>>>     1 file changed, 12 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> I had an opportunity to test this patchset on a F17 machine using QEMU
>>>>>> 1.2
>>>>>> and unfortunately it still fails.  I'm using a relatively basic guest
>>>>>> configuration running F16, the details are documented in the RH BZ that
>>>>>> Eduardo mentioned in the patch description.
>>>>>
>>>>> Paul, Here's the latest diff for the whitelist.  We're looking to get
>>>>> the patches out in the next few days after a bit more testing.
>>>>
>>>> Okay, thanks for the updated list ... I'm rebuilding QEMU right now and
>>>> I'll report back with the results later today.
>>>
>>> Sadly, no luck, it still fails.
>>
>> Hmm, let me send you the current patch set off-line, which includes
>> debug support to write the failing syscall out.  If you don't mind could
>> you try it out?
>
> Sure, no problem.
>
> On a related note, I think it would be a *really* good idea to also submit the
> debug code upstream, just in a disabled state by default.  You could either
> bracket it with #ifdefs or get fancy and allow it at runtime with '-sandbox
> debug' or something similar.
>

I agree.  That's the plan with the v3 patch series.  We'll get them out 
in the next few days.

-- 
Regards,
Corey Bryant


* Re: [Qemu-devel] [PATCHv2 3/4] Support for "double whitelist" filters
  2012-10-23  5:55 ` [Qemu-devel] [PATCHv2 3/4] Support for "double whitelist" filters Eduardo Otubo
  2012-10-23 15:10   ` Corey Bryant
@ 2012-11-02 21:29   ` Paul Moore
  2012-11-02 22:00     ` Corey Bryant
  2012-11-02 22:01     ` Anthony Liguori
  1 sibling, 2 replies; 25+ messages in thread
From: Paul Moore @ 2012-11-02 21:29 UTC (permalink / raw)
  To: Eduardo Otubo, coreyb; +Cc: aliguori, qemu-devel

On Tuesday, October 23, 2012 03:55:31 AM Eduardo Otubo wrote:
> This patch includes a second whitelist right before the main loop. It's
> a smaller and more restricted whitelist, excluding execve() among many
> others.
> 
> v2: * ctx changed to main_loop_ctx
>     * seccomp_on now inside ifdef
>     * open syscall added to the main_loop whitelist
> 
> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>

Unfortunately qemu.org seems to be down for me today, so I can't grab the
latest repo to review/verify this patch, and some of my comments/assumptions
below may be off.  I'm a little confused; hopefully you guys can help me out,
read below ...

The first call to seccomp_install_filter() will set up a whitelist for the
syscalls that have been explicitly specified; all others will hit the default
action, TRAP/KILL.  The second call to seccomp_install_filter() will add a
second whitelist for another set of explicitly specified syscalls; all others
will again hit the default action, TRAP/KILL.

The problem occurs when the kernel runs the filters on a syscall.  On each
syscall the first filter is executed and its action is either ALLOW or
TRAP/KILL, then the second filter is executed and its action is likewise
either ALLOW or TRAP/KILL.  Since the kernel always takes the most
restrictive action (the lowest integer action value) when multiple filters
are loaded, I think your double whitelist approach is going to have some
inherent problems.  I might suggest an initial, fairly permissive whitelist
followed by a blacklist if you want to disable certain syscalls.
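
The stacking rule described above can be modelled in a few lines of C.  This
is only a userspace sketch of the behaviour, not kernel or libseccomp code:
the action values are copied from <linux/seccomp.h>, while the syscall
identifiers, struct whitelist and run_filter_stack() are made up for the
example.

```c
#include <stddef.h>
#include <stdint.h>

/* Action values copied from <linux/seccomp.h> so the sketch stands alone. */
#define ACT_KILL  0x00000000u
#define ACT_TRAP  0x00030000u
#define ACT_ALLOW 0x7fff0000u

/* Illustrative syscall identifiers, not real syscall numbers. */
enum { NR_READ, NR_WRITE, NR_OPEN, NR_EXECVE };

/* A whitelist: listed syscalls are allowed, everything else gets
 * default_action. */
struct whitelist {
    const int *allowed;
    size_t count;
    uint32_t default_action;
};

static uint32_t run_whitelist(const struct whitelist *wl, int nr)
{
    for (size_t i = 0; i < wl->count; i++) {
        if (wl->allowed[i] == nr) {
            return ACT_ALLOW;
        }
    }
    return wl->default_action;
}

/* Every loaded filter runs; the lowest (most restrictive) action wins. */
uint32_t run_filter_stack(const struct whitelist *stack, size_t n, int nr)
{
    uint32_t verdict = ACT_ALLOW;

    for (size_t i = 0; i < n; i++) {
        uint32_t action = run_whitelist(&stack[i], nr);
        if (action < verdict) {
            verdict = action;
        }
    }
    return verdict;
}
```

With a first whitelist of {read, write, open, execve} and a second of
{read, write}, the model allows only read and write: open and execve pass
the first filter but fall through to the second filter's default action.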

-- 
paul moore
security and virtualization @ redhat


* Re: [Qemu-devel] [PATCHv2 3/4] Support for "double whitelist" filters
  2012-11-02 21:29   ` Paul Moore
@ 2012-11-02 22:00     ` Corey Bryant
  2012-11-02 22:14       ` Paul Moore
  2012-11-02 22:01     ` Anthony Liguori
  1 sibling, 1 reply; 25+ messages in thread
From: Corey Bryant @ 2012-11-02 22:00 UTC (permalink / raw)
  To: Paul Moore; +Cc: aliguori, qemu-devel, Eduardo Otubo



On 11/02/2012 05:29 PM, Paul Moore wrote:
> On Tuesday, October 23, 2012 03:55:31 AM Eduardo Otubo wrote:
>> This patch includes a second whitelist right before the main loop. It's
>> a smaller and more restricted whitelist, excluding execve() among many
>> others.
>>
>> v2: * ctx changed to main_loop_ctx
>>      * seccomp_on now inside ifdef
>>      * open syscall added to the main_loop whitelist
>>
>> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
>
> Unfortunately qemu.org seems to be down for me today so I can't grab the
> latest repo to review/verify this patch (some of my comments/assumptions below
> may be off) but I'm a little confused, hopefully you guys can help me out,
> read below ...
>
> The first call to seccomp_install_filter() will setup a whitelist for the
> syscalls that have been explicitly specified, all others will hit the default
> action TRAP/KILL.  The second call to seccomp_install_filter() will add a
> second whitelist for another set of explicitly specified syscalls, all others
> will hit the default action TRAP/KILL.

That's correct.  The goal was to have a 2nd list that is a subset of the 
1st list, and also not include execve() in the 2nd list.  At this point 
though, since it's late in the release, we've expanded the 2nd list to 
be the same as the 1st with the exception of execve() not being in the 
2nd list.

>
> The problem occurs when the filters are executed in the kernel when a syscall
> is executed.  On each syscall the first filter will be executed and the action
> will either be ALLOW or TRAP/KILL, next the second filter will be executed and
> the action will either be ALLOW or TRAP/KILL; since the kernel always takes
> the most restrictive (lowest integer action value) action when multiple
> filters are specified, I think your double whitelist value is going to have
> some inherent problems.

That's something I hadn't thought of.  But TRAP and KILL won't exist 
together in our whitelists, and our 2nd whitelist is a subset of the 
1st.  So do you think there would still be problems?

> I might suggest an initial, fairly permissive
> whitelist followed by a follow-on blacklist if you want to disable certain
> syscalls.
>

I have to admit I'm nervous about this at this point in QEMU 1.3.  It's 
getting late in the cycle and we'd hoped to get this in earlier.  A more 
permissive whitelist is probably going to be the only way we'll 
successfully turn -sandbox on by default at this point in QEMU 1.3.

-- 
Regards,
Corey Bryant


* Re: [Qemu-devel] [PATCHv2 3/4] Support for "double whitelist" filters
  2012-11-02 21:29   ` Paul Moore
  2012-11-02 22:00     ` Corey Bryant
@ 2012-11-02 22:01     ` Anthony Liguori
  1 sibling, 0 replies; 25+ messages in thread
From: Anthony Liguori @ 2012-11-02 22:01 UTC (permalink / raw)
  To: Paul Moore, Eduardo Otubo, coreyb; +Cc: qemu-devel

Paul Moore <pmoore@redhat.com> writes:

> On Tuesday, October 23, 2012 03:55:31 AM Eduardo Otubo wrote:
>> This patch includes a second whitelist right before the main loop. It's
>> a smaller and more restricted whitelist, excluding execve() among many
>> others.
>> 
>> v2: * ctx changed to main_loop_ctx
>>     * seccomp_on now inside ifdef
>>     * open syscall added to the main_loop whitelist
>> 
>> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
>
> Unfortunately qemu.org seems to be down for me today so I can't grab
> the 

qemu.org is up, just having DNS problems.  Use git.qemu-project.org
instead and you should be fine.

Regards,

Anthony Liguori

> latest repo to review/verify this patch (some of my comments/assumptions below 
> may be off) but I'm a little confused, hopefully you guys can help me out, 
> read below ...
>
> The first call to seccomp_install_filter() will setup a whitelist for the 
> syscalls that have been explicitly specified, all others will hit the default 
> action TRAP/KILL.  The second call to seccomp_install_filter() will add a 
> second whitelist for another set of explicitly specified syscalls, all others 
> will hit the default action TRAP/KILL.
>
> The problem occurs when the filters are executed in the kernel when a syscall 
> is executed.  On each syscall the first filter will be executed and the action 
> will either be ALLOW or TRAP/KILL, next the second filter will be executed and 
> the action will either be ALLOW or TRAP/KILL; since the kernel always takes 
> the most restrictive (lowest integer action value) action when multiple 
> filters are specified, I think your double whitelist value is going to have 
> some inherent problems.  I might suggest an initial, fairly permissive 
> whitelist followed by a follow-on blacklist if you want to disable certain 
> syscalls.
>
> -- 
> paul moore
> security and virtualization @ redhat


* Re: [Qemu-devel] [PATCHv2 3/4] Support for "double whitelist" filters
  2012-11-02 22:00     ` Corey Bryant
@ 2012-11-02 22:14       ` Paul Moore
  2012-11-05 14:39         ` Corey Bryant
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Moore @ 2012-11-02 22:14 UTC (permalink / raw)
  To: Corey Bryant; +Cc: aliguori, qemu-devel, Eduardo Otubo

On Friday, November 02, 2012 06:00:29 PM Corey Bryant wrote:
> On 11/02/2012 05:29 PM, Paul Moore wrote:
> > On Tuesday, October 23, 2012 03:55:31 AM Eduardo Otubo wrote:
> >> This patch includes a second whitelist right before the main loop. It's
> >> a smaller and more restricted whitelist, excluding execve() among many
> >> others.
> >> 
> >> v2: * ctx changed to main_loop_ctx
> >> 
> >>      * seccomp_on now inside ifdef
> >>      * open syscall added to the main_loop whitelist
> >> 
> >> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
> > 
> > Unfortunately qemu.org seems to be down for me today so I can't grab the
> > latest repo to review/verify this patch (some of my comments/assumptions
> > below may be off) but I'm a little confused, hopefully you guys can help
> > me out, read below ...
> > 
> > The first call to seccomp_install_filter() will setup a whitelist for the
> > syscalls that have been explicitly specified, all others will hit the
> > default action TRAP/KILL.  The second call to seccomp_install_filter()
> > will add a second whitelist for another set of explicitly specified
> > syscalls, all others will hit the default action TRAP/KILL.
> 
> That's correct.  The goal was to have a 2nd list that is a subset of the
> 1st list, and also not include execve() in the 2nd list.  At this point
> though, since it's late in the release, we've expanded the 2nd list to
> be the same as the 1st with the exception of execve() not being in the
> 2nd list.
> 
> > The problem occurs when the filters are executed in the kernel when a
> > syscall is executed.  On each syscall the first filter will be executed
> > and the action will either be ALLOW or TRAP/KILL, next the second filter
> > will be executed and the action will either be ALLOW or TRAP/KILL; since
> > the kernel always takes the most restrictive (lowest integer action
> > value) action when multiple filters are specified, I think your double
> > whitelist value is going to have some inherent problems.
> 
> That's something I hadn't thought of.  But TRAP and KILL won't exist
> together in our whitelists, and our 2nd whitelist is a subset of the
> 1st.  So do you think there would still be problems?

It doesn't really matter if the default action is TRAP and/or KILL, the point 
is that if you use a second whitelist after an initial whitelist the effective 
seccomp filter is going to be only the syscalls you explicitly allowed in the 
second whitelist.  When using multiple seccomp filters on a process, all 
filters are executed for each syscall and the most restrictive action of all 
the filters is the action that the kernel takes.

Don't get me wrong, I like the idea of progressively restricting QEMU, but if
you are going to load multiple seccomp filters into the kernel, you almost
certainly want the first whitelist filter to be the union of all the
seccomp filters you intend to load, with all subsequent filters being
blacklists which progressively remove syscalls that are allowed by the
initial whitelist.
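
As a rough comparison of the two layouts, the following C sketch checks that
"whitelist, then a smaller whitelist" and "whitelist, then blacklist" give
the same verdict for every syscall in a toy model, while the blacklist needs
far fewer entries.  The syscall identifiers and list contents are invented
for the example and are not QEMU's real lists.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative syscall identifiers, not real syscall numbers. */
enum { NR_READ, NR_WRITE, NR_OPEN, NR_CLOSE, NR_EXECVE, NR_PTRACE, NR_COUNT };

#define LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Filter loaded during initialization. */
static const int init_whitelist[] = {
    NR_READ, NR_WRITE, NR_OPEN, NR_CLOSE, NR_EXECVE
};
/* Second filter, written as a whitelist: repeats everything but execve. */
static const int loop_whitelist[] = { NR_READ, NR_WRITE, NR_OPEN, NR_CLOSE };
/* Second filter, written as a blacklist: names only what is removed. */
static const int loop_blacklist[] = { NR_EXECVE };

static bool contains(const int *list, size_t n, int nr)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i] == nr) {
            return true;
        }
    }
    return false;
}

/* A syscall goes through only if every stacked filter permits it. */
bool double_whitelist_permits(int nr)
{
    return contains(init_whitelist, LEN(init_whitelist), nr) &&
           contains(loop_whitelist, LEN(loop_whitelist), nr);
}

bool whitelist_then_blacklist_permits(int nr)
{
    return contains(init_whitelist, LEN(init_whitelist), nr) &&
           !contains(loop_blacklist, LEN(loop_blacklist), nr);
}

/* True when both layouts agree on every syscall in the model. */
bool schemes_agree(void)
{
    for (int nr = 0; nr < NR_COUNT; nr++) {
        if (double_whitelist_permits(nr) !=
            whitelist_then_blacklist_permits(nr)) {
            return false;
        }
    }
    return true;
}
```

In this model the second filter shrinks from four entries to one, and a
syscall added to the first whitelist later does not have to be repeated in
the second filter.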

> > I might suggest an initial, fairly permissive
> > whitelist followed by a follow-on blacklist if you want to disable certain
> > syscalls.
> 
> I have to admit I'm nervous about this at this point in QEMU 1.3.  It's
> getting late in the cycle and we'd hoped to get this in earlier.  A more
> permissive whitelist is probably going to be the only way we'll
> successfully turn -sandbox on by default at this point in QEMU 1.3.

That's fine, I just wanted to point out that I think the multiple whitelist
approach is going to have some inherent problems.

-- 
paul moore
security and virtualization @ redhat


* Re: [Qemu-devel] [PATCHv2 3/4] Support for "double whitelist" filters
  2012-11-02 22:14       ` Paul Moore
@ 2012-11-05 14:39         ` Corey Bryant
  2012-11-05 21:58           ` Paul Moore
  0 siblings, 1 reply; 25+ messages in thread
From: Corey Bryant @ 2012-11-05 14:39 UTC (permalink / raw)
  To: Paul Moore; +Cc: aliguori, qemu-devel, Eduardo Otubo



On 11/02/2012 06:14 PM, Paul Moore wrote:
> On Friday, November 02, 2012 06:00:29 PM Corey Bryant wrote:
>> On 11/02/2012 05:29 PM, Paul Moore wrote:
>>> On Tuesday, October 23, 2012 03:55:31 AM Eduardo Otubo wrote:
>>>> This patch includes a second whitelist right before the main loop. It's
>>>> a smaller and more restricted whitelist, excluding execve() among many
>>>> others.
>>>>
>>>> v2: * ctx changed to main_loop_ctx
>>>>
>>>>       * seccomp_on now inside ifdef
>>>>       * open syscall added to the main_loop whitelist
>>>>
>>>> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
>>>
>>> Unfortunately qemu.org seems to be down for me today so I can't grab the
>>> latest repo to review/verify this patch (some of my comments/assumptions
>>> below may be off) but I'm a little confused, hopefully you guys can help
>>> me out, read below ...
>>>
>>> The first call to seccomp_install_filter() will setup a whitelist for the
>>> syscalls that have been explicitly specified, all others will hit the
>>> default action TRAP/KILL.  The second call to seccomp_install_filter()
>>> will add a second whitelist for another set of explicitly specified
>>> syscalls, all others will hit the default action TRAP/KILL.
>>
>> That's correct.  The goal was to have a 2nd list that is a subset of the
>> 1st list, and also not include execve() in the 2nd list.  At this point
>> though, since it's late in the release, we've expanded the 2nd list to
>> be the same as the 1st with the exception of execve() not being in the
>> 2nd list.
>>
>>> The problem occurs when the filters are executed in the kernel when a
>>> syscall is executed.  On each syscall the first filter will be executed
>>> and the action will either be ALLOW or TRAP/KILL, next the second filter
>>> will be executed and the action will either be ALLOW or TRAP/KILL; since
>>> the kernel always takes the most restrictive (lowest integer action
>>> value) action when multiple filters are specified, I think your double
>>> whitelist value is going to have some inherent problems.
>>
>> That's something I hadn't thought of.  But TRAP and KILL won't exist
>> together in our whitelists, and our 2nd whitelist is a subset of the
>> 1st.  So do you think there would still be problems?
>
> It doesn't really matter if the default action is TRAP and/or KILL, the point
> is that if you use a second whitelist after an initial whitelist the effective
> seccomp filter is going to be only the syscalls you explicitly allowed in the
> second whitelist.  When using multiple seccomp filters on a process, all
> filters are executed for each syscall and the most restrictive action of all
> the filters is the action that the kernel takes.
>
> Don't get me wrong, I like the idea of progressively restricting QEMU, but if
> you are going to load multiple seccomp filters into the kernel, you almost
> certainly only want the first whitelist filter to be the union of all the
> seccomp filter you intend to load with all subsequent filters being blacklists
> which progressively remove syscalls which are allowed by the initial
> whitelist.
>

That's what we're doing though.  The first whitelist is a union of all 
subsequent filters.  Of course there's only one subsequent filter at 
this point.  But the idea is to start out with a large whitelist for 
initialization and then tighten it up before the main loop when
presumably fewer syscalls are needed.

My concern is getting the two whitelists correct.  We keep uncovering 
new syscalls as we test.

>>> I might suggest an initial, fairly permissive
>>> whitelist followed by a follow-on blacklist if you want to disable certain
>>> syscalls.
>>
>> I have to admit I'm nervous about this at this point in QEMU 1.3.  It's
>> getting late in the cycle and we'd hoped to get this in earlier.  A more
>> permissive whitelist is probably going to be the only way we'll
>> successfully turn -sandbox on by default at this point in QEMU 1.3.
>
> That's fine, I just wanted to point out that I think the multiple whitelist
> approach is going to have some inherent problems.
>

Are you thinking there will be problems with the current two-whitelist 
approach, or are you thinking there would be problems in the future if 
we continued restricting the QEMU process with further whitelists?  If 
you mean the latter, then I understand your point since QEMU is a single 
process that requires a certain subset of syscalls.

I'm thinking once the two whitelists are in place, we can move on to 
restricting syscall parameters in the existing whitelists where it makes 
sense, and then look into your original decomposition approach, where 
parts of qemu are run in separate threads/processes which would allow 
much tighter seccomp restriction.

What do you think?

-- 
Regards,
Corey Bryant


* Re: [Qemu-devel] [PATCHv2 3/4] Support for "double whitelist" filters
  2012-11-05 14:39         ` Corey Bryant
@ 2012-11-05 21:58           ` Paul Moore
  2012-11-05 22:26             ` Corey Bryant
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Moore @ 2012-11-05 21:58 UTC (permalink / raw)
  To: Corey Bryant; +Cc: aliguori, qemu-devel, Eduardo Otubo

On Monday, November 05, 2012 09:39:46 AM Corey Bryant wrote:
> On 11/02/2012 06:14 PM, Paul Moore wrote:
> > On Friday, November 02, 2012 06:00:29 PM Corey Bryant wrote:
> >> On 11/02/2012 05:29 PM, Paul Moore wrote:
> >>> On Tuesday, October 23, 2012 03:55:31 AM Eduardo Otubo wrote:
> >>>> This patch includes a second whitelist right before the main loop. It's
> >>>> a smaller and more restricted whitelist, excluding execve() among many
> >>>> others.
> >>>> 
> >>>> v2: * ctx changed to main_loop_ctx
> >>>> 
> >>>>       * seccomp_on now inside ifdef
> >>>>       * open syscall added to the main_loop whitelist
> >>>> 
> >>>> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
> >>> 
> >>> Unfortunately qemu.org seems to be down for me today so I can't grab the
> >>> latest repo to review/verify this patch (some of my comments/assumptions
> >>> below may be off) but I'm a little confused, hopefully you guys can help
> >>> me out, read below ...
> >>> 
> >>> The first call to seccomp_install_filter() will setup a whitelist for
> >>> the
> >>> syscalls that have been explicitly specified, all others will hit the
> >>> default action TRAP/KILL.  The second call to seccomp_install_filter()
> >>> will add a second whitelist for another set of explicitly specified
> >>> syscalls, all others will hit the default action TRAP/KILL.
> >> 
> >> That's correct.  The goal was to have a 2nd list that is a subset of the
> >> 1st list, and also not include execve() in the 2nd list.  At this point
> >> though, since it's late in the release, we've expanded the 2nd list to
> >> be the same as the 1st with the exception of execve() not being in the
> >> 2nd list.
> >> 
> >>> The problem occurs when the filters are executed in the kernel when a
> >>> syscall is executed.  On each syscall the first filter will be executed
> >>> and the action will either be ALLOW or TRAP/KILL, next the second filter
> >>> will be executed and the action will either be ALLOW or TRAP/KILL; since
> >>> the kernel always takes the most restrictive (lowest integer action
> >>> value) action when multiple filters are specified, I think your double
> >>> whitelist value is going to have some inherent problems.
> >> 
> >> That's something I hadn't thought of.  But TRAP and KILL won't exist
> >> together in our whitelists, and our 2nd whitelist is a subset of the
> >> 1st.  So do you think there would still be problems?
> > 
> > It doesn't really matter if the default action is TRAP and/or KILL, the
> > point is that if you use a second whitelist after an initial whitelist
> > the effective seccomp filter is going to be only the syscalls you
> > explicitly allowed in the second whitelist.  When using multiple seccomp
> > filters on a process, all filters are executed for each syscall and the
> > most restrictive action of all the filters is the action that the kernel
> > takes.
> > 
> > Don't get me wrong, I like the idea of progressively restricting QEMU, but
> > if you are going to load multiple seccomp filters into the kernel, you
> > almost certainly only want the first whitelist filter to be the union of
> > all the seccomp filter you intend to load with all subsequent filters
> > being blacklists which progressively remove syscalls which are allowed by
> > the initial whitelist.
> 
> That's what we're doing though.  The first whitelist is a union of all
> subsequent filters.  Of course there's only one subsequent filter at
> this point.  But the idea is to start out with a large whitelist for
> initialization and then tighten it up before the main loop when
> presumably less syscalls are needed.

Okay, that's good ... It still seems a bit odd to me; I think whitelist 1st,
blacklist 2nd is a more intuitive and efficient solution, but that may just
be me.

> My concern is getting the two whitelists correct.  We keep uncovering
> new syscalls as we test.

Of course, this whole whitelist/blacklist discussion assumes the list of 
allowed syscalls is correct.

> >>> I might suggest an initial, fairly permissive
> >>> whitelist followed by a follow-on blacklist if you want to disable
> >>> certain
> >>> syscalls.
> >> 
> >> I have to admit I'm nervous about this at this point in QEMU 1.3.  It's
> >> getting late in the cycle and we'd hoped to get this in earlier.  A more
> >> permissive whitelist is probably going to be the only way we'll
> >> successfully turn -sandbox on by default at this point in QEMU 1.3.
> > 
> > > That's fine, I just wanted to point out that I think the multiple whitelist
> > approach is going to have some inherent problems.
> 
> Are you thinking there will be problems with the current two-whitelist
> approach, or are you thinking there would be problems in the future if
> we continued restricting the QEMU process with further whitelists?  If
> you mean the latter, then I understand your point since QEMU is a single
> process that requires a certain subset of syscalls.

I was originally concerned that you were structuring the whitelists 
incorrectly, but it sounds like that is not the case - that's good.

I'm still concerned that the double whitelist approach may result in bigger 
syscall filters than necessary but until we get a final-ish list there is no 
point worrying about that.

> I'm thinking once the two whitelists are in place, we can move on to
> restricting syscall parameters in the existing whitelists where it makes
> sense ...

Yep, sounds reasonable.

> and then look into your original decomposition approach, where
> parts of qemu are run in separate threads/processes which would allow
> much tighter seccomp restriction.

Ultimately I think this is the right solution if we want to get serious about 
making QEMU more resistant to attacks from malicious guests.

-- 
paul moore
security and virtualization @ redhat


* Re: [Qemu-devel] [PATCHv2 3/4] Support for "double whitelist" filters
  2012-11-05 21:58           ` Paul Moore
@ 2012-11-05 22:26             ` Corey Bryant
  0 siblings, 0 replies; 25+ messages in thread
From: Corey Bryant @ 2012-11-05 22:26 UTC (permalink / raw)
  To: Paul Moore; +Cc: aliguori, qemu-devel, Eduardo Otubo



On 11/05/2012 04:58 PM, Paul Moore wrote:
> On Monday, November 05, 2012 09:39:46 AM Corey Bryant wrote:
>> On 11/02/2012 06:14 PM, Paul Moore wrote:
>>> On Friday, November 02, 2012 06:00:29 PM Corey Bryant wrote:
>>>> On 11/02/2012 05:29 PM, Paul Moore wrote:
>>>>> On Tuesday, October 23, 2012 03:55:31 AM Eduardo Otubo wrote:
>>>>>> This patch includes a second whitelist right before the main loop. It's
>>>>>> a smaller and more restricted whitelist, excluding execve() among many
>>>>>> others.
>>>>>>
>>>>>> v2: * ctx changed to main_loop_ctx
>>>>>>
>>>>>>        * seccomp_on now inside ifdef
>>>>>>        * open syscall added to the main_loop whitelist
>>>>>>
>>>>>> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
>>>>>
>>>>> Unfortunately qemu.org seems to be down for me today so I can't grab the
>>>>> latest repo to review/verify this patch (some of my comments/assumptions
>>>>> below may be off) but I'm a little confused, hopefully you guys can help
>>>>> me out, read below ...
>>>>>
>>>>> The first call to seccomp_install_filter() will setup a whitelist for
>>>>> the
>>>>> syscalls that have been explicitly specified, all others will hit the
>>>>> default action TRAP/KILL.  The second call to seccomp_install_filter()
>>>>> will add a second whitelist for another set of explicitly specified
>>>>> syscalls, all others will hit the default action TRAP/KILL.
>>>>
>>>> That's correct.  The goal was to have a 2nd list that is a subset of the
>>>> 1st list, and also not include execve() in the 2nd list.  At this point
>>>> though, since it's late in the release, we've expanded the 2nd list to
>>>> be the same as the 1st with the exception of execve() not being in the
>>>> 2nd list.
>>>>
>>>>> The problem occurs when the filters are executed in the kernel when a
>>>>> syscall is executed.  On each syscall the first filter will be executed
>>>>> and the action will either be ALLOW or TRAP/KILL, next the second filter
>>>>> will be executed and the action will either be ALLOW or TRAP/KILL; since
>>>>> the kernel always takes the most restrictive (lowest integer action
>>>>> value) action when multiple filters are specified, I think your double
>>>>> whitelist value is going to have some inherent problems.
>>>>
>>>> That's something I hadn't thought of.  But TRAP and KILL won't exist
>>>> together in our whitelists, and our 2nd whitelist is a subset of the
>>>> 1st.  So do you think there would still be problems?
>>>
>>> It doesn't really matter if the default action is TRAP and/or KILL, the
>>> point is that if you use a second whitelist after an initial whitelist
>>> the effective seccomp filter is going to be only the syscalls you
>>> explicitly allowed in the second whitelist.  When using multiple seccomp
>>> filters on a process, all filters are executed for each syscall and the
>>> most restrictive action of all the filters is the action that the kernel
>>> takes.
>>>
>>> Don't get me wrong, I like the idea of progressively restricting QEMU, but
>>> if you are going to load multiple seccomp filters into the kernel, you
>>> almost certainly only want the first whitelist filter to be the union of
>>> all the seccomp filter you intend to load with all subsequent filters
>>> being blacklists which progressively remove syscalls which are allowed by
>>> the initial whitelist.
>>
>> That's what we're doing though.  The first whitelist is a union of all
>> subsequent filters.  Of course there's only one subsequent filter at
>> this point.  But the idea is to start out with a large whitelist for
>> initialization and then tighten it up before the main loop when
>> presumably less syscalls are needed.
>
> Okay, that's good ... It still seems a bit odd to me, I think a whitelist 1st
> blacklist 2nd is a more intuitive and efficient solution but that may just be
> me.
>

I missed the blacklist point on this before.  Yes, that makes more sense
for the 2nd list.  We'll try that out.

-- 
Regards,
Corey Bryant


end of thread, other threads:[~2012-11-05 22:26 UTC | newest]

Thread overview: 25+ messages
2012-10-23  5:55 [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162) Eduardo Otubo
2012-10-23  5:55 ` [Qemu-devel] [PATCHv2 2/4] Setting "-sandbox on" as deafult Eduardo Otubo
2012-10-23  5:55 ` [Qemu-devel] [PATCHv2 3/4] Support for "double whitelist" filters Eduardo Otubo
2012-10-23 15:10   ` Corey Bryant
2012-10-24 20:06     ` Eduardo Otubo
2012-10-25 20:16     ` Eduardo Otubo
2012-11-02 21:29   ` Paul Moore
2012-11-02 22:00     ` Corey Bryant
2012-11-02 22:14       ` Paul Moore
2012-11-05 14:39         ` Corey Bryant
2012-11-05 21:58           ` Paul Moore
2012-11-05 22:26             ` Corey Bryant
2012-11-02 22:01     ` Anthony Liguori
2012-10-23  5:55 ` [Qemu-devel] [PATCHv2 4/4] Warning messages on net devices hotplug Eduardo Otubo
2012-10-23 15:59   ` Corey Bryant
2012-10-23 16:39     ` Eric Blake
2012-11-01 21:43 ` [Qemu-devel] [PATCHv2 1/4] Adding new syscalls (bugzilla 855162) Paul Moore
2012-11-02  2:29   ` Eduardo Otubo
2012-11-02 14:10     ` Paul Moore
2012-11-02 13:48   ` Corey Bryant
2012-11-02 14:10     ` Paul Moore
2012-11-02 14:38       ` Paul Moore
2012-11-02 14:43         ` Corey Bryant
2012-11-02 14:46           ` Paul Moore
2012-11-02 14:49             ` Corey Bryant
