From: Blue Swirl
Date: Tue, 3 Jul 2012 19:15:55 +0000
Subject: Re: [Qemu-devel] [RFC] [PATCHv2 2/2] Adding basic calls to libseccomp in vl.c
To: Corey Bryant
Cc: Paul Moore, qemu-devel@nongnu.org, Avi Kivity, Anthony Liguori, Eduardo Otubo

On Mon, Jul 2, 2012 at 6:05 PM, Corey Bryant wrote:
>
> On 06/28/2012 03:49 PM, Blue Swirl wrote:
>>
>> On Wed, Jun 27, 2012 at 9:25 PM, Anthony Liguori wrote:
>>>
>>> On 06/21/2012 03:04 AM, Avi Kivity wrote:
>>>>
>>>> On 06/19/2012 09:58 PM, Blue Swirl wrote:
>>>>>>>
>>>>>>> At least qemu-ifup/down scripts, migration exec and smbd have been
>>>>>>> mentioned. Only the system calls made by smbd (for some version of it)
>>>>>>> can be known.
>>>>>>> The user could specify arbitrary commands for the
>>>>>>> others; those could be assumed to use some common (large) subset of
>>>>>>> system calls, but I think the security value would be close to zero
>>>>>>> then.
>>>>>>
>>>>>> We're not trying to protect against the user, but against the guest. If
>>>>>> we assume the user wrote those scripts with care so they cannot be
>>>>>> exploited by the guest, then we are okay.
>>>>>
>>>>> My concern was that first we could accidentally filter a system call,
>>>>> changing the script or executable behavior, much like the sendmail +
>>>>> capabilities bug, and then a guest could trigger running this
>>>>> script/executable and exploit the changed behavior.
>>>>
>>>> Ah, I see. I agree this is dangerous. We should probably disable exec
>>>> if we seccomp.
>>>
>>> There's no great place to jump into this thread so I guess I'll do it
>>> here.
>>>
>>> There is absolutely no doubt that white-listing syscalls that we currently
>>> use provides an improvement in security.
>>>
>>> We need to assume:
>>>
>>> 1) QEMU is run as an unprivileged user
>>>
>>> 2) QEMU is already heavily restricted by SELinux
>>>
>>> In this case, seccomp() is not being used to replace MAC or DAC. It's
>>> supplementing both of them by additionally filtering out syscalls that may
>>> have unknown kernel exploits in them. That's all this initial effort is
>>> about. Since its scope is so limited, we can simply enable it
>>> unconditionally too.
>>
>> I don't think the scope is limited in a safe way. What is the set of
>> system calls that can't ever cause problems to any possible ifup/down
>> scripts, migration exec helpers and various versions of smbd?
>>
>> For example, unlink() is missing. What if the ifup/down script needs
>> it for lock file cleanup? ftruncate()? Every socket syscall in case
>> LDAP is used to access user information by the libc?
>>
>> I think we can't define the safe set, except 'allow all'. I'd propose
>> one of the following to avoid breakage:
>>
>> 1. Allow all system calls for the initial patch, refactor later to
>> reduce the set. Useless until refactored.
>
> One thing I like about starting with a known subset of syscalls used by QEMU
> is that it forces us to expand the whitelist if we come across more syscalls
> that QEMU uses.

Finding out what QEMU uses is the relatively easy part. Finding out
what the external helpers might use seems to be impossible.

> An issue with this approach is that if seccomp kills QEMU for using a
> disallowed syscall, I don't think we know what syscall it is. (At least, I
> don't think it is accessible anywhere.) This is good for security but makes
> it hard for developers who are debugging.
>
> Would it make sense to have the ability to configure QEMU in either:
> 1) seccomp kill mode (this is what the existing patches do), or
> 2) seccomp debug mode?
>
> In debug mode we could trap on the failing syscall (using SCMP_ACT_TRAP),
> determine the syscall value, and issue an error message that displays the
> syscall value.

I think that would be nice, and it would remain useful after any
refactoring.

> The emulator() function here gives an idea of how this could be done:
> https://lkml.org/lkml/2012/4/12/449
>
>>
>> 2. Don't enable seccomp mode by default; when enabled, forbid
>> execve(). Limits functionality when enabled, no security benefit if
>> not enabled.
>>
>> 3. Before enabling seccomp, fork a helper process without restrictions
>> that is used to launch other programs. Needs some work.
>>
>>>
>>> After we have this initial support, then we can look at a -sandbox
>>> option. This option could prevent things like open()/execve() but that
>>> will come at a cost of features.
>>>
>>> I think the reasonable thing to do for -sandbox is to basically focus on the
>>> set of syscalls that QEMU would use if it were launched under libvirt. We
>>> should obviously make improvements (things like -blockdev) to make this even
>>> more restrictive.
>>>
>>> Who knows, maybe we end up having multiple types of sandboxes. A '-sandbox
>>> libvirt' and a '-sandbox user' where the latter is focused on the typical
>>> usage of an unprivileged user.
>>>
>>> But this is all stuff that can come later. We solve a big problem by just
>>> getting the initial whitelist support in.
>>
>> Fully agree, but we'd have to agree about what is a safe initial
>> whitelist.
>>
>>>
>>> Regards,
>>>
>>> Anthony Liguori
>>>
>>>>
>>>>>> We have decomposed qemu to some extent, in that privileged operations
>>>>>> happen in libvirt. So the modes make sense - qemu has no idea whether a
>>>>>> privileged management system is controlling it or not.
>>>>>
>>>>> So with -seccomp, libvirt could tell QEMU that for example open(),
>>>>> execve(), bind() and connect() will never be needed?
>>>>
>>>> Yes.
>>>>
>>>
>>
>
> --
> Regards,
> Corey
>
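
To make option 3 in the quoted proposal a bit more concrete (fork an
unrestricted helper before enabling seccomp), something along these lines
might work. This is only a rough sketch under several assumptions:
helper_spawn() and helper_run() are made-up names, not existing QEMU
functions; the command table stands in for whatever ifup/down or migration
exec commands were given on the command line; and the seccomp filter itself
is not shown, it would be installed in the parent after helper_spawn()
returns. The main process sends only an index into the table, so a
compromised QEMU could trigger the configured commands but not supply new
ones.

```c
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical sketch, not QEMU code: fork a helper while the process is
 * still unrestricted. The helper inherits the fixed command table and
 * runs entries from it on request.
 *
 * Returns the parent's end of the socket pair, or -1 on error.
 */
static int helper_spawn(const char *const cmds[], unsigned char ncmds)
{
    int sv[2];
    pid_t pid;
    unsigned char idx;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid > 0) {
        /* Parent: the seccomp filter would be installed after this. */
        close(sv[1]);
        return sv[0];
    }

    /* Child: serve one-byte requests until the parent closes its end. */
    close(sv[0]);
    while (read(sv[1], &idx, 1) == 1) {
        int result = -1;

        if (idx < ncmds) {
            int status = system(cmds[idx]);

            if (status != -1 && WIFEXITED(status)) {
                result = WEXITSTATUS(status);
            }
        }
        if (write(sv[1], &result, sizeof(result)) != sizeof(result)) {
            break;
        }
    }
    _exit(0);
}

/*
 * Ask the helper to run table entry idx. Returns the command's exit
 * status, or -1 if the index is out of range or the exchange failed.
 */
static int helper_run(int fd, unsigned char idx)
{
    int result = -1;

    if (write(fd, &idx, 1) != 1) {
        return -1;
    }
    if (read(fd, &result, sizeof(result)) != sizeof(result)) {
        return -1;
    }
    return result;
}
```

With a table such as { "true", "exit 3" }, helper_run(fd, 1) would hand back
the exit status of the second entry. Only read() and write() on an already
open descriptor are needed in the restricted process, so execve() could stay
out of the whitelist. A real version would also have to reap the helper and
decide what to do about its output.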