Re: [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox
@ 2015-09-29  3:14 Namsun Ch'o
  2015-09-29 15:38 ` Eduardo Otubo
  2015-09-29 22:12 ` Paul Moore
  0 siblings, 2 replies; 14+ messages in thread
From: Namsun Ch'o @ 2015-09-29  3:14 UTC (permalink / raw)
  To: pmoore; +Cc: qemu-devel, eduardo.otubo

> My understanding of the config file you proposed is that it would allow the
> configuration of a whitelist, so changes to the config very could result in
> *less* strict of a filter, not always more.

No. Any time an administrator wants a syscall that is not enabled in the
sandbox, they either don't actually need it, or it's a bug and should be
fixed. So all the config would do is add argument filters to syscalls which
are already whitelisted. The alternative would be that the syscalls are given
no further argument filtering. The config could only make the filteres more
restrictive, never less.

Perhaps there could be a #define somewhere that toggles whether or not a
syscall argument filter can be created for a syscall which is not in the
built-in whitelist, otherwise it would throw an error saying that you cannot
create an argument filter for a syscall that is not permitted.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox
  2015-09-29  3:14 [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox Namsun Ch'o
@ 2015-09-29 15:38 ` Eduardo Otubo
  2015-09-29 22:12 ` Paul Moore
  1 sibling, 0 replies; 14+ messages in thread
From: Eduardo Otubo @ 2015-09-29 15:38 UTC (permalink / raw)
  To: Namsun Ch'o; +Cc: pmoore, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2363 bytes --]

(I'm not sure what happens to your emails that all of them does not
relate to the same thread/Message-ID, making a pain to follow through
out the volume of email on the list, please pay attention to that)

On Mon, Sep 28, 2015 at 11=14=42PM -0400, Namsun Ch'o wrote:
> > My understanding of the config file you proposed is that it would allow the
> > configuration of a whitelist, so changes to the config very could result in
> > *less* strict of a filter, not always more.
> 
> No. Any time an administrator wants a syscall that is not enabled in the
> sandbox, they either don't actually need it, or it's a bug and should be
> fixed. So all the config would do is add argument filters to syscalls which
> are already whitelisted. The alternative would be that the syscalls are given
> no further argument filtering. The config could only make the filteres more
> restrictive, never less.

By its concept, adding exception to the whitelist always makes it less
restrictive by definition, but I got your point. Another point here,
though, is that whitelisting syscall is not as common as you think it
is. I've been maintaining this sub-system for more than an year now and
I had to review/commit/push very few times.

Seccomp sandboxing is enabled by default on virt-test, so whoever is
using that framework is also testing their workload against the current
syscall whitelist. If no one has hit any problem so far, it means the
whitelist is in good shape for usual workloads and configurations.

I'm not particularly against any improvement like configuration files or
more command line args, but I'm concerned about the security itself. If
some guest can scape to the host, it's gonna be much easier to whitelist
syscalls for the next guests, changing the command line is a little too
obvious -- paranoid example, I know.

If you want to write an RFC with your idea, you're more than welcome. We
could move on this discussion and perhaps come up with a nice solution.

Regards,

> 
> Perhaps there could be a #define somewhere that toggles whether or not a
> syscall argument filter can be created for a syscall which is not in the
> built-in whitelist, otherwise it would throw an error saying that you cannot
> create an argument filter for a syscall that is not permitted.

-- 
Eduardo Otubo
ProfitBricks GmbH

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox
  2015-09-29  3:14 [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox Namsun Ch'o
  2015-09-29 15:38 ` Eduardo Otubo
@ 2015-09-29 22:12 ` Paul Moore
  1 sibling, 0 replies; 14+ messages in thread
From: Paul Moore @ 2015-09-29 22:12 UTC (permalink / raw)
  To: Namsun Ch'o; +Cc: qemu-devel, eduardo.otubo

On Monday, September 28, 2015 11:14:42 PM Namsun Ch'o wrote:
> > My understanding of the config file you proposed is that it would allow
> > the
> > configuration of a whitelist, so changes to the config very could result
> > in
> > *less* strict of a filter, not always more.
> 
> No. Any time an administrator wants a syscall that is not enabled in the
> sandbox, they either don't actually need it, or it's a bug and should be
> fixed. So all the config would do is add argument filters to syscalls which
> are already whitelisted.

If the syscall, without arguments, is already added to the whitelist then 
adding a new libseccomp rule to allow that same syscall with specific 
arguments will have no effect since a broader rule already exists in the 
filter.

> The alternative would be that the syscalls are given no further argument
> filtering. The config could only make the filteres more restrictive, never
> less.

I still don't see how this is the case, but it probably isn't worth arguing 
any further without some patches.
 
> Perhaps there could be a #define somewhere that toggles whether or not a
> syscall argument filter can be created for a syscall which is not in the
> built-in whitelist, otherwise it would throw an error saying that you cannot
> create an argument filter for a syscall that is not permitted.

I would argue you should never be able to add a syscall to the whitelist via a 
config file and/or command line option, but that is my opinion.

-- 
paul moore
security @ redhat

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox
       [not found] <d55ad1eed872006f0634c3e0067553a5@airmail.cc>
@ 2015-10-01  7:17 ` Markus Armbruster
  0 siblings, 0 replies; 14+ messages in thread
From: Markus Armbruster @ 2015-10-01  7:17 UTC (permalink / raw)
  To: Namsun Ch'o; +Cc: qemu-devel, eduardo.otubo

"Namsun Ch'o" <namnamc@airmail.cc> writes:

>> This message lacks In-Reply-To: and References: headers. Whatever you
>> use to send e-mail seems grossly deficient. Perhaps it has some "don't
>> be such an ass" configuration button, but I doubt it.
>
> I've switched providers. Is this one better?

I'm afraid not.

It's a rather common web mail defect.

Can you use a real mail user agent, i.e. *not* something running in your
browser?

If you insist on using a web browser... as much as I hate to point
people to Google's services: gmail can apparently be made to behave.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox
  2015-09-30  6:41 Namsun Ch'o
@ 2015-10-01  5:58 ` Markus Armbruster
  0 siblings, 0 replies; 14+ messages in thread
From: Markus Armbruster @ 2015-10-01  5:58 UTC (permalink / raw)
  To: Namsun Ch'o; +Cc: pmoore, qemu-devel, eduardo.otubo

"Namsun Ch'o" <namnamc@Safe-mail.net> writes:

>> (I'm not sure what happens to your emails that all of them does not
>> relate to the same thread/Message-ID, making a pain to follow through
>> out the volume of email on the list, please pay attention to that)
>
> I just click Reply All, I'm not sure how else I would do it. Are they somehow
> being posted as new top level threads instead of replies?

This message lacks In-Reply-To: and References: headers.  Whatever you
use to send e-mail seems grossly deficient.  Perhaps it has some "don't
be such an ass" configuration button, but I doubt it.

[...]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox
@ 2015-09-30  6:41 Namsun Ch'o
  2015-10-01  5:58 ` Markus Armbruster
  0 siblings, 1 reply; 14+ messages in thread
From: Namsun Ch'o @ 2015-09-30  6:41 UTC (permalink / raw)
  To: eduardo.otubo; +Cc: pmoore, qemu-devel

> (I'm not sure what happens to your emails that all of them does not
> relate to the same thread/Message-ID, making a pain to follow through
> out the volume of email on the list, please pay attention to that)

I just click Reply All, I'm not sure how else I would do it. Are they somehow
being posted as new top level threads instead of replies?

> I'm not particularly against any improvement like configuration files or
> more command line args, but I'm concerned about the security itself. If
> some guest can scape to the host, it's gonna be much easier to whitelist
> syscalls for the next guests, changing the command line is a little too
> obvious -- paranoid example, I know.

Any config would be used only for syscalls which are already whitelisted by
the default qemu-seccomp.c For example changing the config could be used to
allow ioctls only on certain file descriptors, since ioctl is already
whitelisted, but it could not be used to whiteilst something which is not
already whitelisted, such as the personality system call.

> If you want to write an RFC with your idea, you're more than welcome. We
> could move on this discussion and perhaps come up with a nice solution.

I've never attempted that before. Could you point me in the right direction?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox
  2015-09-28 21:34 Namsun Ch'o
@ 2015-09-29  1:19 ` Paul Moore
  0 siblings, 0 replies; 14+ messages in thread
From: Paul Moore @ 2015-09-29  1:19 UTC (permalink / raw)
  To: Namsun Ch'o; +Cc: qemu-devel, eduardo.otubo

On Monday, September 28, 2015 05:34:53 PM Namsun Ch'o wrote:
> > To be clear, I'm not suggesting "--enable-syscalls=foo,bar,...", what I'm
> > suggesting is a decomposition of the current filter list into blocks of
> > syscalls that are needed to enable specific functionality.  For example,
> > if you enable audio support at runtime a set of syscalls will be added to
> > the filter whitelist, if you enable a network device a different set of
> > syscalls will be added to the filter, and so on.
> > 
> > I think having an admin specified filter, either via a command line or
> > configuration file, is a step in the wrong direction.
> 
> How come? I think it is safer than forcing an admin to re-compile everything
> (which just won't happen in an enterprise environment).

For starters, I don't believe that a random administrator understands the QEMU 
code and the potential impact of any changes to such a config file as well as 
the QEMU developers.

> ... If any configuration file only increases the strictness of a syscall, I
> don't see the danger of an admin shooting themselves in the foot. Allowing
> an admin to decrease security would be a problem, but they can do -sandbox
> off anyway.

My understanding of the config file you proposed is that it would allow the 
configuration of a whitelist, so changes to the config very could result in 
*less* strict of a filter, not always more.

Also, while yes, allowing an admin to disable the sandbox via a command line 
switch does disable the seccomp filter, it is obvious that such a step is 
being taken.

> But if the dynamic sandbox is strict enough for each feature, it'd be great.

-- 
paul moore
security @ redhat

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox
@ 2015-09-28 21:34 Namsun Ch'o
  2015-09-29  1:19 ` Paul Moore
  0 siblings, 1 reply; 14+ messages in thread
From: Namsun Ch'o @ 2015-09-28 21:34 UTC (permalink / raw)
  To: pmoore; +Cc: qemu-devel, eduardo.otubo

> To be clear, I'm not suggesting "--enable-syscalls=foo,bar,...", what I'm 
> suggesting is a decomposition of the current filter list into blocks of 
> syscalls that are needed to enable specific functionality.  For example, if 
> you enable audio support at runtime a set of syscalls will be added to the 
> filter whitelist, if you enable a network device a different set of syscalls 
> will be added to the filter, and so on.
> 
> I think having an admin specified filter, either via a command line or 
> configuration file, is a step in the wrong direction.

How come? I think it is safer than forcing an admin to re-compile everything
(which just won't happen in an enterprise environment). If any configuration
file only increases the strictness of a syscall, I don't see the danger of an
admin shooting themselves in the foot. Allowing an admin to decrease security
would be a problem, but they can do -sandbox off anyway.

But if the dynamic sandbox is strict enough for each feature, it'd be great.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox
  2015-09-26  5:06 Namsun Ch'o
@ 2015-09-28 18:24 ` Paul Moore
  0 siblings, 0 replies; 14+ messages in thread
From: Paul Moore @ 2015-09-28 18:24 UTC (permalink / raw)
  To: Namsun Ch'o; +Cc: qemu-devel, eduardo.otubo

On Saturday, September 26, 2015 01:06:57 AM Namsun Ch'o wrote:
> > I've suggested this in the past but to my knowledge no has done any work
> > in this direction, including myself. Despite the lack of progress, I still
> > think this is a very worthwhile idea.
> 
> Which is exactly why I think a configuration file would be the best option
> instead of --enable-syscalls=foo,bar,baz. It would allow someone to easily
> customize their policy without needing to create a patch, or wait on QEMU
> developers to do work on it.

To be clear, I'm not suggesting "--enable-syscalls=foo,bar,...", what I'm 
suggesting is a decomposition of the current filter list into blocks of 
syscalls that are needed to enable specific functionality.  For example, if 
you enable audio support at runtime a set of syscalls will be added to the 
filter whitelist, if you enable a network device a different set of syscalls 
will be added to the filter, and so on.

I think having an admin specified filter, either via a command line or 
configuration file, is a step in the wrong direction.

-- 
paul moore
security @ redhat

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox
@ 2015-09-26  5:06 Namsun Ch'o
  2015-09-28 18:24 ` Paul Moore
  0 siblings, 1 reply; 14+ messages in thread
From: Namsun Ch'o @ 2015-09-26  5:06 UTC (permalink / raw)
  To: pmoore; +Cc: qemu-devel, eduardo.otubo

> I've suggested this in the past but to my knowledge no has done any work in
> this direction, including myself. Despite the lack of progress, I still
> think this is a very worthwhile idea.

Which is exactly why I think a configuration file would be the best option
instead of --enable-syscalls=foo,bar,baz. It would allow someone to easily
customize their policy without needing to create a patch, or wait on QEMU
developers to do work on it. The configuration file could be as simple as:

    shmctl arg0 eq IPC_PRIVATE and arg2 eq IPC_CREAT|0777 or IPC_CREAT|0600
    close  arg0 le 13 and arg0 ge 4
    ioctl  arg1 ne EVIL_IOCTL or ANOTHER_EVIL_ONE or MORE_EVIL_IOCTLS

Or something like:

    [shmctl]
    A0 EQ "IPC_PRIVATE"
    A2 EQ "IPC_CREAT|0777", "IPC_CREAT|0600"

    [close]
    A0 LE 13
    A0 GE 4

    [ioctl]
    A1 NE "EVIL_IOCTL", "ANOTHER_EVIL_ONE", "MORE_EVIL_IOCTLS"

And that would be the equivalent of hardcoding the following in the sandbox
file. Honestly, I think that the worry that admins will shoot themselves in
the foot is unfounded. Unless they know at least basic strace, QEMU will
simply get killed. That is of course if it is made such that it can only be
used to increase the strictness of already existing filtered syscalls, not
reduce the security by adding new syscalls to the argument-less whitelist.

    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(shmctl), 2,
        SCMP_A0(SCMP_CMP_EQ, IPC_PRIVATE),
        SCMP_A2(SCMP_CMP_EQ, IPC_CREAT|0777));

    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(shmctl), 2,
        SCMP_A0(SCMP_CMP_EQ, IPC_PRIVATE),
        SCMP_A2(SCMP_CMP_EQ, IPC_CREAT|0600));

    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(close), 1,
        SCMP_A0(SCMP_CMP_LE, 13),
        SCMP_A0(SCMP_CMP_GE, 4));

    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(ioctl), 1,
        SCMP_A1(SCMP_CMP_NE, EVIL_IOCTL),
        SCMP_A1(SCMP_CMP_NE, ANOTHER_EVIL_ONE),
        SCMP_A1(SCMP_CMP_NE, MORE_EVIL_IOCTLS));

I think the best part of that would be that it would be much easier for the
common VM setups to have pre-made policies, so users could include
"filesystem_access.scmp" and "remote_vnc.scmp" and "usermode_network.scmp"
inside /etc/qemu/seccomp.d for a system where they will be using QEMU with
usermode networking, remote VNC, and mounting a shared directory. That would
be significantly easier to distribute and update than it would be to create
new hardcoded code in qemu-seccomp.c.

If I find time to make a patch which would do this, would it be likely
accepted or is there a policy against such a thing?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox
  2015-09-25  4:53 Namsun Ch'o
@ 2015-09-25 17:03 ` Paul Moore
  0 siblings, 0 replies; 14+ messages in thread
From: Paul Moore @ 2015-09-25 17:03 UTC (permalink / raw)
  To: Namsun Ch'o; +Cc: qemu-devel, eduardo.otubo

On Friday, September 25, 2015 12:53:04 AM Namsun Ch'o wrote:
> Another idea which would fit in with the security model is to have a dynamic
> sandbox which enables syscalls and syscall filters based on what command
> line or config parameters are passed to QEMU on its first start.

I've suggested this in the past but to my knowledge no has done any work in 
this direction, including myself.  Despite the lack of progress, I still think 
this is a very worthwhile idea.

-- 
paul moore
security @ redhat

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox
@ 2015-09-25  4:53 Namsun Ch'o
  2015-09-25 17:03 ` Paul Moore
  0 siblings, 1 reply; 14+ messages in thread
From: Namsun Ch'o @ 2015-09-25  4:53 UTC (permalink / raw)
  To: eduardo.otubo; +Cc: pmoore, qemu-devel

>Can you point out which exact use case breaks if you don't whitelist the
>below mentioned system calls' flags?

It happens whenever I do -runas with the sandbox enabled, or chroot with the
sandbox enabled.

sh# qemu-system-x86_64 -m 2048 -enable-kvm -chroot /var/empty -sandbox on \
> -cdrom /tmp/devuan-jessie-netboot-i386-alpha2.iso
^C
Session terminated, terminating shell...^C ...terminated.
sh# tail -n 1 /var/log/audit/audit.log | fold -s -w 80
type=SECCOMP msg=audit(1443154833.702:286096): auid=0 uid=0 gid=0 ses=12
pid=985623 comm="qemu-system-x86" exe="/usr/bin/qemu-system-x86_64" sig=31
arch=c000003e syscall=161 compat=0 ip=0x309c2885397 code=0x0

>We thought about this in beggining of the development of seccomp on
>qemu. Some feature like allow all, which would print to stderr all
>illegal hits and a another argument like
>-sandbox_add="syscall1,syscall2", but this would be against the concept
>of the whole security schema. We don't want the user to take full
>control of it, and if you're a developer, you know what to do.

Is there an official security model for QEMU? I actually think a config file
which contains seccomp rules would be a very good idea, because it would let
the people who want to deploy a secure VM do so, so they can tighten it up
based on the functions they need, without needing to go to the trouble of
compiling a custom version (which might not be a very good idea when your job
is on the line when some unexpected crash caused by a custom patch causes
several hours of downtime for customers).

Another idea which would fit in with the security model is to have a dynamic
sandbox which enables syscalls and syscall filters based on what command line
or config parameters are passed to QEMU on its first start. For example QEMU
should have no need to perform every single filesystem operation that exists
on a setup where 9p is not in use. The same applies to the highly dangerous
syscalls like setsockopt, getsockopt, and ioctl, which would have to be
blanket enabled just because someone might use an obscure configuration.
Implementing a dynamic seccomp policy would be as easy as something like:

if (howerver_qemu_checks_for_enabled_options(optname) == 0)
    enable_calls_needed_for_optname();

>Isn't it IPC_CREAT? Or am I missing something?

Yes, that was a dumb typo on my part. I posted an older patch of mine before
fixing that typo

>Can you resend a v3 describing the changes you did from v1 to v2 and v3?
>This helps keep tracking of ideas and discussions.

Yes, I put it in a new top thread as the FAQ suggested.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox
  2015-09-11  0:54 namnamc
@ 2015-09-24  9:59 ` Eduardo Otubo
  0 siblings, 0 replies; 14+ messages in thread
From: Eduardo Otubo @ 2015-09-24  9:59 UTC (permalink / raw)
  To: namnamc; +Cc: pmoore, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 6110 bytes --]

On Thu, Sep 10, 2015 at 08=54=28PM -0400, namnamc@Safe-mail.net wrote:
> > The current intention of the seccomp filter in QEMU, is that /all/ existing
> > QEMU features continue to work unchanged. So even if a flag is used in a
> > seemingly uncommon code path, we still need to allow that in a seccomp
> > filter.
> It already doesn't work very well, e.g. with -chroot, it fails because chroot()
> is not whitelisted, same with -runas because setgid() etc isn't whitelisted.
> Maybe there should be an extra option for -sandbox, like "-sandbox experimental"
> which does argument filtering, which of course may break something, and the old
> behavior would do plain syscall filtering without caring about arguments, because
> that's so much easier to guarantee to work, even if it provides little security.

Can you point out which exact use case breaks if you don't whitelist the
below mentioned system calls' flags?

> 
> We could also change the default behavior from SCMP_ACT_KILL (which kills the
> entire thing as soon as a single violation occurs) to SCMP_ACT_ERRNO(EPERM), which
> will just return EPERM for a syscall with a violation. The software will be much
> more capable of handling a permission denied error without crashing. Although of
> course that violates the principle of fast-fail.

We thought about this in beggining of the development of seccomp on
qemu. Some feature like allow all, which would print to stderr all
illegal hits and a another argument like
-sandbox_add="syscall1,syscall2", but this would be against the concept
of the whole security schema. We don't want the user to take full
control of it, and if you're a developer, you know what to do.

> 
> > So we need to add DODUMP, DONTDUMP, UNMERGABLE and WILLNEED here. That
> > is still stricter than the previous allow-everything rule, so a net
> > win.
> And MADV_INVALID too I assume? That was one of the others I got with grep.
> 
> > > +
> > > +    /* shmget */
> > > +    rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(shmget), 2,
> > > +        SCMP_A0(SCMP_CMP_EQ, IPC_PRIVATE),
> > > +        SCMP_A2(SCMP_CMP_EQ, IP_CREAT|0777));
> > 
> > I'm not familiar with semantics of these seccomp rules, but is this
> > saying that the second arg must be exactly equal to IP_CREAT|0777 ?
> > If the app passes IP_CREAT|0600, would that be permitted instead ?
> > The latter is what I see gtk2 source code passing for mode.
> Argument 2 must be exactly equal to IP_CREAT|0777, yes, otherwise Qemu dies with
> SIGSYS. I did check the Qemu source and saw 0777 was harcoded. Does the 0600 mask
> in GTK2 ever get called in Qemu? Anyway I added the MADV flags and the 0600 mask
> in the v2 patch.
> 
> Signed-off-by: Namsun Ch'o <namnamc@safe-mail.net>
> ---
> diff --git a/qemu-seccomp.c b/qemu-seccomp.c
> index f9de0d3..a353ef9 100644
> --- a/qemu-seccomp.c
> +++ b/qemu-seccomp.c
> @@ -14,6 +14,8 @@
>   */
>  #include <stdio.h>
>  #include <seccomp.h>
> +#include <linux/ipc.h>
> +#include <asm-generic/mman-common.h>
>  #include "sysemu/seccomp.h"
>  
>  struct QemuSeccompSyscall {
> @@ -105,7 +107,6 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = {
>      { SCMP_SYS(rt_sigreturn), 245 },
>      { SCMP_SYS(sync), 245 },
>      { SCMP_SYS(pread64), 245 },
> -    { SCMP_SYS(madvise), 245 },
>      { SCMP_SYS(set_robust_list), 245 },
>      { SCMP_SYS(lseek), 245 },
>      { SCMP_SYS(pselect6), 245 },
> @@ -224,11 +225,9 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = {
>      { SCMP_SYS(arch_prctl), 240 },
>      { SCMP_SYS(mkdir), 240 },
>      { SCMP_SYS(fchmod), 240 },
> -    { SCMP_SYS(shmget), 240 },
>      { SCMP_SYS(shmat), 240 },
>      { SCMP_SYS(shmdt), 240 },
>      { SCMP_SYS(timerfd_create), 240 },
> -    { SCMP_SYS(shmctl), 240 },
>      { SCMP_SYS(mlockall), 240 },
>      { SCMP_SYS(mlock), 240 },
>      { SCMP_SYS(munlock), 240 },
> @@ -264,6 +263,60 @@ int seccomp_start(void)
>          }
>      }
>  
> +    /* madvise */
> +    static const int madvise_flags[] = {
> +        MADV_DODUMP,
> +        MADV_DONTDUMP,
> +        MADV_INVALID,
> +        MADV_UNMERGEABLE,
> +        MADV_WILLNEED,
> +        MADV_DONTFORK,
> +        MADV_DONTNEED,
> +        MADV_HUGEPAGE,
> +        MADV_MERGEABLE,
> +    };
> +    for (i = 0; i < ARRAY_SIZE(madvise_flags); i++) {
> +        rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(madvise), 1,
> +            SCMP_A2(SCMP_CMP_EQ, madvise_flags[i]));
> +        if (rc < 0) {
> +            goto seccomp_return;
> +        }
> +    }
> +    rc = seccomp_syscall_priority(ctx, SCMP_SYS(madvise), 245);
> +    if (rc < 0) {
> +        goto seccomp_return;
> +    }
> +
> +    /* shmget */
> +    rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(shmget), 2,
> +        SCMP_A0(SCMP_CMP_EQ, IPC_PRIVATE),
> +        SCMP_A2(SCMP_CMP_EQ, IP_CREAT|0777));
> +    if (rc < 0) {
> +        goto seccomp_return;
> +    }
> +    rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(shmget), 2,
> +        SCMP_A0(SCMP_CMP_EQ, IPC_PRIVATE),
> +        SCMP_A2(SCMP_CMP_EQ, IP_CREAT|0600));

Isn't it IPC_CREAT? Or am I missing something?

Can you resend a v3 describing the changes you did from v1 to v2 and v3?
This helps keep tracking of ideas and discussions.

Thanks a lot for the contribution!

> +    if (rc < 0) {
> +        goto seccomp_return;
> +    }
> +    rc = seccomp_syscall_priority(ctx, SCMP_SYS(shmget), 240);
> +    if (rc < 0) {
> +        goto seccomp_return;
> +    }
> +
> +    /* shmctl */
> +    rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(shmctl), 2,
> +        SCMP_A1(SCMP_CMP_EQ, IPC_RMID),
> +        SCMP_A2(SCMP_CMP_EQ, 0));
> +    if (rc < 0) {
> +        goto seccomp_return;
> +    }
> +    rc = seccomp_syscall_priority(ctx, SCMP_SYS(shmctl), 240);
> +    if (rc < 0) {
> +        goto seccomp_return;
> +    }
> +
>      rc = seccomp_load(ctx);
>  
>    seccomp_return:

-- 
Eduardo Otubo
ProfitBricks GmbH

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox
@ 2015-09-11  0:54 namnamc
  2015-09-24  9:59 ` Eduardo Otubo
  0 siblings, 1 reply; 14+ messages in thread
From: namnamc @ 2015-09-11  0:54 UTC (permalink / raw)
  To: berrange; +Cc: pmoore, qemu-devel, eduardo.otubo

> The current intention of the seccomp filter in QEMU, is that /all/ existing
> QEMU features continue to work unchanged. So even if a flag is used in a
> seemingly uncommon code path, we still need to allow that in a seccomp
> filter.
It already doesn't work very well, e.g. with -chroot, it fails because chroot()
is not whitelisted, same with -runas because setgid() etc isn't whitelisted.
Maybe there should be an extra option for -sandbox, like "-sandbox experimental"
which does argument filtering, which of course may break something, and the old
behavior would do plain syscall filtering without caring about arguments, because
that's so much easier to guarantee to work, even if it provides little security.

We could also change the default behavior from SCMP_ACT_KILL (which kills the
entire thing as soon as a single violation occurs) to SCMP_ACT_ERRNO(EPERM), which
will just return EPERM for a syscall with a violation. The software will be much
more capable of handling a permission denied error without crashing. Although of
course that violates the principle of fast-fail.

> So we need to add DODUMP, DONTDUMP, UNMERGABLE and WILLNEED here. That
> is still stricter than the previous allow-everything rule, so a net
> win.
And MADV_INVALID too I assume? That was one of the others I got with grep.

> > +
> > +    /* shmget */
> > +    rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(shmget), 2,
> > +        SCMP_A0(SCMP_CMP_EQ, IPC_PRIVATE),
> > +        SCMP_A2(SCMP_CMP_EQ, IP_CREAT|0777));
> 
> I'm not familiar with semantics of these seccomp rules, but is this
> saying that the second arg must be exactly equal to IP_CREAT|0777 ?
> If the app passes IP_CREAT|0600, would that be permitted instead ?
> The latter is what I see gtk2 source code passing for mode.
Argument 2 must be exactly equal to IP_CREAT|0777, yes, otherwise Qemu dies with
SIGSYS. I did check the Qemu source and saw 0777 was harcoded. Does the 0600 mask
in GTK2 ever get called in Qemu? Anyway I added the MADV flags and the 0600 mask
in the v2 patch.

Signed-off-by: Namsun Ch'o <namnamc@safe-mail.net>
---
diff --git a/qemu-seccomp.c b/qemu-seccomp.c
index f9de0d3..a353ef9 100644
--- a/qemu-seccomp.c
+++ b/qemu-seccomp.c
@@ -14,6 +14,8 @@
  */
 #include <stdio.h>
 #include <seccomp.h>
+#include <linux/ipc.h>
+#include <asm-generic/mman-common.h>
 #include "sysemu/seccomp.h"
 
 struct QemuSeccompSyscall {
@@ -105,7 +107,6 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = {
     { SCMP_SYS(rt_sigreturn), 245 },
     { SCMP_SYS(sync), 245 },
     { SCMP_SYS(pread64), 245 },
-    { SCMP_SYS(madvise), 245 },
     { SCMP_SYS(set_robust_list), 245 },
     { SCMP_SYS(lseek), 245 },
     { SCMP_SYS(pselect6), 245 },
@@ -224,11 +225,9 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = {
     { SCMP_SYS(arch_prctl), 240 },
     { SCMP_SYS(mkdir), 240 },
     { SCMP_SYS(fchmod), 240 },
-    { SCMP_SYS(shmget), 240 },
     { SCMP_SYS(shmat), 240 },
     { SCMP_SYS(shmdt), 240 },
     { SCMP_SYS(timerfd_create), 240 },
-    { SCMP_SYS(shmctl), 240 },
     { SCMP_SYS(mlockall), 240 },
     { SCMP_SYS(mlock), 240 },
     { SCMP_SYS(munlock), 240 },
@@ -264,6 +263,60 @@ int seccomp_start(void)
         }
     }
 
+    /* madvise */
+    static const int madvise_flags[] = {
+        MADV_DODUMP,
+        MADV_DONTDUMP,
+        MADV_INVALID,
+        MADV_UNMERGEABLE,
+        MADV_WILLNEED,
+        MADV_DONTFORK,
+        MADV_DONTNEED,
+        MADV_HUGEPAGE,
+        MADV_MERGEABLE,
+    };
+    for (i = 0; i < ARRAY_SIZE(madvise_flags); i++) {
+        rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(madvise), 1,
+            SCMP_A2(SCMP_CMP_EQ, madvise_flags[i]));
+        if (rc < 0) {
+            goto seccomp_return;
+        }
+    }
+    rc = seccomp_syscall_priority(ctx, SCMP_SYS(madvise), 245);
+    if (rc < 0) {
+        goto seccomp_return;
+    }
+
+    /* shmget */
+    rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(shmget), 2,
+        SCMP_A0(SCMP_CMP_EQ, IPC_PRIVATE),
+        SCMP_A2(SCMP_CMP_EQ, IP_CREAT|0777));
+    if (rc < 0) {
+        goto seccomp_return;
+    }
+    rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(shmget), 2,
+        SCMP_A0(SCMP_CMP_EQ, IPC_PRIVATE),
+        SCMP_A2(SCMP_CMP_EQ, IP_CREAT|0600));
+    if (rc < 0) {
+        goto seccomp_return;
+    }
+    rc = seccomp_syscall_priority(ctx, SCMP_SYS(shmget), 240);
+    if (rc < 0) {
+        goto seccomp_return;
+    }
+
+    /* shmctl */
+    rc = seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(shmctl), 2,
+        SCMP_A1(SCMP_CMP_EQ, IPC_RMID),
+        SCMP_A2(SCMP_CMP_EQ, 0));
+    if (rc < 0) {
+        goto seccomp_return;
+    }
+    rc = seccomp_syscall_priority(ctx, SCMP_SYS(shmctl), 240);
+    if (rc < 0) {
+        goto seccomp_return;
+    }
+
     rc = seccomp_load(ctx);
 
   seccomp_return:

^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-10-01  7:17 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-29  3:14 [Qemu-devel] [PATCH v2] Add argument filters to the seccomp sandbox Namsun Ch'o
2015-09-29 15:38 ` Eduardo Otubo
2015-09-29 22:12 ` Paul Moore
     [not found] <d55ad1eed872006f0634c3e0067553a5@airmail.cc>
2015-10-01  7:17 ` Markus Armbruster
  -- strict thread matches above, loose matches on Subject: below --
2015-09-30  6:41 Namsun Ch'o
2015-10-01  5:58 ` Markus Armbruster
2015-09-28 21:34 Namsun Ch'o
2015-09-29  1:19 ` Paul Moore
2015-09-26  5:06 Namsun Ch'o
2015-09-28 18:24 ` Paul Moore
2015-09-25  4:53 Namsun Ch'o
2015-09-25 17:03 ` Paul Moore
2015-09-11  0:54 namnamc
2015-09-24  9:59 ` Eduardo Otubo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.