From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:53576)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <blauwirbel@gmail.com>) id 1Sm8aD-0000E4-Me
	for qemu-devel@nongnu.org; Tue, 03 Jul 2012 15:16:23 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <blauwirbel@gmail.com>) id 1Sm8aB-0003gJ-CK
	for qemu-devel@nongnu.org; Tue, 03 Jul 2012 15:16:21 -0400
Received: from mail-qa0-f45.google.com ([209.85.216.45]:47485)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <blauwirbel@gmail.com>) id 1Sm8aB-0003fu-4g
	for qemu-devel@nongnu.org; Tue, 03 Jul 2012 15:16:19 -0400
Received: by qaeb19 with SMTP id b19so2711629qae.4
	for <qemu-devel@nongnu.org>; Tue, 03 Jul 2012 12:16:17 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <4FF1E2D4.1050702@linux.vnet.ibm.com>
References: <cover.1339614945.git.otubo@linux.vnet.ibm.com>
	<5022524.gIe1TV6Uvp@sifl>
	<CAAu8pHuWEARDbWxk7+cprxci6sM-Fiae2HFgdhZ5v7mePELNGQ@mail.gmail.com>
	<3077496.8pYx57Tfhz@sifl>
	<CAAu8pHs8Uiztxo5kHFwxGmSv5_MpQgAQJuM+gfvOyRNYvE4e-w@mail.gmail.com>
	<4FE05CC8.9040801@redhat.com>
	<CAAu8pHsVgmwYpahmRcxgDtHTn+4rknCTJy36vfSH=om9FswuLQ@mail.gmail.com>
	<4FE2D576.10509@redhat.com> <4FEB7A4D.7050608@redhat.com>
	<CAAu8pHtYmoJ7WCK7LAOj_j2YU-nAgiLTg7q4qXL3Vu-kPRpZnw@mail.gmail.com>
	<4FF1E2D4.1050702@linux.vnet.ibm.com>
From: Blue Swirl <blauwirbel@gmail.com>
Date: Tue, 3 Jul 2012 19:15:55 +0000
Message-ID: <CAAu8pHvCma5WPQYaGvv_1Ba7NES=zXFZQrQcf79nZHVRvvZ_Jw@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Subject: Re: [Qemu-devel] [RFC] [PATCHv2 2/2] Adding basic calls to
 libseccomp in vl.c
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Corey Bryant <coreyb@linux.vnet.ibm.com>
Cc: Paul Moore <pmoore@redhat.com>, qemu-devel@nongnu.org, Avi Kivity <avi@redhat.com>, Anthony Liguori <anthony@codemonkey.ws>, Eduardo Otubo <otubo@linux.vnet.ibm.com>

On Mon, Jul 2, 2012 at 6:05 PM, Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
>
>
> On 06/28/2012 03:49 PM, Blue Swirl wrote:
>>
>> On Wed, Jun 27, 2012 at 9:25 PM, Anthony Liguori <anthony@codemonkey.ws>
>> wrote:
>>>
>>> On 06/21/2012 03:04 AM, Avi Kivity wrote:
>>>>
>>>>
>>>> On 06/19/2012 09:58 PM, Blue Swirl wrote:
>>>>>>>
>>>>>>>
>>>>>>> At least qemu-ifup/down scripts, migration exec and smbd have been
>>>>>>> mentioned. Only the system calls made by smbd (for some version of
>>>>>>> it) can be known. The user could specify arbitrary commands for the
>>>>>>> others; those could be assumed to use some common (large) subset of
>>>>>>> system calls, but I think the security value would then be close to
>>>>>>> zero.
>>>>>>
>>>>>>
>>>>>>
>>>>>> We're not trying to protect against the user, but against the guest.
>>>>>> If we assume the user wrote those scripts with care so they cannot be
>>>>>> exploited by the guest, then we are okay.
>>>>>
>>>>>
>>>>>
>>>>> My concern was that, first, we could accidentally filter a system call
>>>>> and thereby change the behavior of the script or executable, much like
>>>>> the sendmail + capabilities bug, and then a guest could trigger running
>>>>> this script/executable and exploit the changed behavior.
>>>>
>>>>
>>>>
>>>> Ah, I see.  I agree this is dangerous.  We should probably disable exec
>>>> if we enable seccomp.
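A minimal sketch of what disabling exec could look like, assuming the raw
seccomp filter interface (Linux 3.5+, <linux/seccomp.h> and
<linux/filter.h>) rather than libseccomp: execve() and execveat() fail
with EPERM and every other syscall is allowed. deny_exec() is a
hypothetical name, not something from the patch series; the
x86_64/aarch64 arch check and the use of SYS_execveat (Linux 3.19+
headers) are assumptions about the build host.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Assumption: the build host is x86_64 or aarch64. */
#if defined(__x86_64__)
#define HOST_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define HOST_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "define HOST_AUDIT_ARCH for this architecture"
#endif

/* Hypothetical helper: make execve()/execveat() fail with EPERM and allow
 * every other syscall. Returns 0 on success, -1 with errno set.
 * On x86_64 a hardened filter would also reject x32 syscall numbers
 * (__X32_SYSCALL_BIT); that check is omitted here. */
static int deny_exec(void)
{
    struct sock_filter filter[] = {
        /* Kill the calling thread if the syscall uses an unexpected ABI. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, HOST_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* Load the syscall number and compare it with the exec family. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_execve, 1, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_execveat, 0, 1),
        /* Matched: fail the call with EPERM instead of running it. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        /* Anything else is allowed. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Lets an unprivileged process install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

Returning EPERM instead of killing means a guest-triggered ifup script or
migration exec simply fails and QEMU can report the error, which is
roughly option 2 further down the thread.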
>>>
>>>
>>>
>>> There's no great place to jump into this thread, so I guess I'll do it
>>> here.
>>>
>>> There is absolutely no doubt that whitelisting the syscalls we
>>> currently use provides an improvement in security.
>>>
>>> We need to assume:
>>>
>>> 1) QEMU is run as an unprivileged user
>>>
>>> 2) QEMU is already heavily restricted by SELinux
>>>
>>> In this case, seccomp() is not being used to replace MAC or DAC.  It's
>>> supplementing both of them by additionally filtering out syscalls that
>>> may have unknown kernel exploits in them.  That's all this initial
>>> effort is about. Since its scope is so limited, we can simply enable it
>>> unconditionally too.
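To illustrate the kill-by-default whitelist at the kernel level, here is a
rough sketch using raw BPF instead of the libseccomp calls the patch uses.
The five-entry list and the names install_whitelist() and run_confined()
are hypothetical; QEMU's real list would be much longer. The filter is
installed in a forked child so that the demo caller survives when the
child is killed. The x86_64/aarch64 arch check is an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Assumption: the build host is x86_64 or aarch64. */
#if defined(__x86_64__)
#define HOST_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define HOST_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "define HOST_AUDIT_ARCH for this architecture"
#endif

/* Two instructions per entry: if nr matches, allow; otherwise skip ahead
 * to the next entry (or to the default action after the last one). */
#define ALLOW_SYSCALL(name) \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_##name, 0, 1), \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

/* Illustrative whitelist: anything not listed kills the calling thread. */
static int install_whitelist(void)
{
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, HOST_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        ALLOW_SYSCALL(read),
        ALLOW_SYSCALL(write),
        ALLOW_SYSCALL(rt_sigreturn),
        ALLOW_SYSCALL(exit),
        ALLOW_SYSCALL(exit_group),
        /* Default action, comparable to seccomp_init(SCMP_ACT_KILL). */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Hypothetical demo helper: fork, confine the child, optionally make a
 * syscall that is not whitelisted, and return the child's wait status. */
static int run_confined(int call_forbidden)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_whitelist() != 0)
            _exit(2);
        if (call_forbidden)
            syscall(SYS_getppid);   /* not whitelisted: child is killed */
        _exit(0);                   /* exit_group is whitelisted */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

A non-whitelisted call takes the final SECCOMP_RET_KILL, so the child
dies with SIGSYS and the parent only sees the signal, not which syscall
triggered it.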
>>
>>
>> I don't think the scope is limited in a safe way. What is the set of
>> system calls that can never cause problems for any possible ifup/down
>> scripts, migration exec helpers and various versions of smbd?
>>
>> For example, unlink() is missing. What if the ifup/down script needs
>> it for lock file cleanup? ftruncate()? Every socket syscall, in case
>> libc uses LDAP to look up user information?
>>
>> I don't think we can define a safe set other than 'allow all'. I'd
>> propose one of the following to avoid breakage:
>>
>> 1. Allow all system calls for the initial patch, refactor later to
>> reduce the set. Useless until refactored.
>
>
> One thing I like about starting with a known subset of syscalls used by QEMU
> is that it forces us to expand the whitelist if we come across more syscalls
> that QEMU uses.

Finding out what QEMU uses is the relatively easy part. Finding out
what the external helpers might use seems to be impossible.

>
> An issue with this approach is that if seccomp kills QEMU for using a
> disallowed syscall, I don't think we know what syscall it is.  (At least, I
> don't think it is accessible anywhere.)  This is good for security but makes
> it hard for developers who are debugging.
>
> Would it make sense to have the ability to configure QEMU in either:
> 1) seccomp kill mode (this is what the existing patches do), or
> 2) seccomp debug mode?
>
> In debug mode we could trap on the failing syscall (using SCMP_ACT_TRAP),
> determine the syscall value, and issue an error message that displays the
> syscall value.

I think that would be nice, and it would also be useful after any refactoring.

>
> The emulator() function here gives an idea of how this could be done:
> https://lkml.org/lkml/2012/4/12/449
>
>
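A minimal sketch of the debug mode, assuming raw BPF: SECCOMP_RET_TRAP is
the kernel action behind libseccomp's SCMP_ACT_TRAP, and the resulting
SIGSYS carries the offending syscall number in siginfo_t's si_syscall.
To stay self-contained the demo traps one chosen syscall and allows the
rest; a real debug mode would make SECCOMP_RET_TRAP the whitelist's
default action, and would then have to whitelist the write() the handler
uses. install_trap_filter() and last_trapped_syscall are hypothetical
names, and the x86_64/aarch64 arch check is an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define HOST_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define HOST_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "define HOST_AUDIT_ARCH for this architecture"
#endif

/* Number of the most recently trapped syscall, or -1 if none. */
static volatile sig_atomic_t last_trapped_syscall = -1;

/* SIGSYS handler: report the syscall number using only async-signal-safe
 * calls (no printf), then return. The trapped syscall is not executed. */
static void sigsys_handler(int sig, siginfo_t *info, void *ucontext)
{
    static const char prefix[] = "seccomp: trapped syscall ";
    char digits[16];
    size_t i = sizeof(digits);
    int nr = info->si_syscall;
    int saved_errno = errno;
    ssize_t rc;

    (void)sig;
    (void)ucontext;
    last_trapped_syscall = nr;

    /* Format nr in decimal, filling the buffer from the end. */
    digits[--i] = '\n';
    do {
        digits[--i] = (char)('0' + nr % 10);
        nr /= 10;
    } while (nr > 0 && i > 0);

    rc = write(STDERR_FILENO, prefix, sizeof(prefix) - 1);
    rc = write(STDERR_FILENO, digits + i, sizeof(digits) - i);
    (void)rc;
    errno = saved_errno;
}

/* Hypothetical demo: raise SIGSYS for trapped_nr, allow everything else. */
static int install_trap_filter(int trapped_nr)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigsys_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSYS, &sa, NULL) != 0)
        return -1;

    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, HOST_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned int)trapped_nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

Usage: after install_trap_filter(SYS_getppid), a call to
syscall(SYS_getppid) prints the syscall number to stderr and returns to
the caller instead of killing the process.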
>>
>> 2. Don't enable seccomp mode by default; when it is enabled, forbid
>> execve(). Limits functionality when enabled, no security benefit when
>> not enabled.
>>
>> 3. Before enabling seccomp, fork a helper process without restrictions
>> that is used to launch other programs. Needs some work.
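Option 3 could look roughly like this: a sketch, assuming a SOCK_SEQPACKET
socketpair and /bin/sh -c, of a launcher forked before the filter is
installed. The unrestricted child runs each requested command and sends
back its wait status, so the confined parent only needs socket I/O
syscalls instead of fork() and execve(). The launcher_* API and its
one-message-per-command protocol are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical launcher handle held by the (later confined) process. */
struct launcher {
    pid_t pid;   /* unrestricted helper process */
    int fd;      /* parent's end of the socketpair */
};

/* Helper side: one command per SEQPACKET message, run via /bin/sh -c; the
 * wait status is sent back. Exits when the parent closes its end. */
static void launcher_loop(int fd)
{
    char cmd[4096];

    for (;;) {
        ssize_t n = recv(fd, cmd, sizeof(cmd) - 1, 0);
        if (n <= 0)
            _exit(0);
        cmd[n] = '\0';

        int status = -1;
        pid_t pid = fork();
        if (pid == 0) {
            execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
            _exit(127);   /* exec failed */
        }
        if (pid > 0 && waitpid(pid, &status, 0) < 0)
            status = -1;
        if (send(fd, &status, sizeof(status), MSG_NOSIGNAL) < 0)
            _exit(1);
    }
}

/* Fork the helper. Must run before the seccomp filter is installed. */
static int launcher_start(struct launcher *l)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        close(sv[0]);
        launcher_loop(sv[1]);
        _exit(0);   /* not reached */
    }
    close(sv[1]);
    l->pid = pid;
    l->fd = sv[0];
    return 0;
}

/* Ask the helper to run cmd; returns its wait status, or -1 on error. */
static int launcher_run(struct launcher *l, const char *cmd)
{
    int status;

    if (send(l->fd, cmd, strlen(cmd), MSG_NOSIGNAL) < 0)
        return -1;
    if (recv(l->fd, &status, sizeof(status), 0) != (ssize_t)sizeof(status))
        return -1;
    return status;
}

/* Closing our end makes the helper's recv() return 0, so it exits. */
static void launcher_stop(struct launcher *l)
{
    close(l->fd);
    waitpid(l->pid, NULL, 0);
}
```

Accepting arbitrary shell strings keeps the sketch short; a real helper
should probably only run the configured ifup/down scripts and migration
commands, since the parent may be under guest influence.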
>>
>>>
>>> After we have this initial support, we can look at a -sandbox option.
>>> This option could prevent things like open()/execve(), but that will
>>> come at the cost of features.
>>>
>>> I think the reasonable thing to do for -sandbox is to basically focus on
>>> the set of syscalls that QEMU would use if it were launched under
>>> libvirt. We should obviously make improvements (things like -blockdev)
>>> to make this even more restrictive.
>>>
>>> Who knows, maybe we end up having multiple types of sandboxes: a
>>> '-sandbox libvirt' and a '-sandbox user', where the latter is focused
>>> on the typical usage of an unprivileged user.
>>>
>>> But this is all stuff that can come later.  We solve a big problem by
>>> just getting the initial whitelist support in.
>>
>>
>> Fully agree, but we'd have to agree on what a safe initial whitelist
>> is.
>>
>>>
>>> Regards,
>>>
>>> Anthony Liguori
>>>
>>>
>>>>
>>>>>>
>>>>>> We have decomposed qemu to some extent, in that privileged operations
>>>>>> happen in libvirt.  So the modes make sense - qemu has no idea whether
>>>>>> a privileged management system is controlling it or not.
>>>>>
>>>>>
>>>>>
>>>>> So with -seccomp, libvirt could tell QEMU that, for example, open(),
>>>>> execve(), bind() and connect() will never be needed?
>>>>
>>>>
>>>>
>>>> Yes.
>>>>
>>>
>>
>
> --
> Regards,
> Corey
>
>