Message-ID: <4FE0AE09.8050506@linux.vnet.ibm.com>
Date: Tue, 19 Jun 2012 12:51:21 -0400
From: Corey Bryant
References: <20120613203305.GC6019@redhat.com> <20120618083335.GD28026@redhat.com> <4FDF479B.9060502@linux.vnet.ibm.com> <4FDFA36E.4010802@linux.vnet.ibm.com> <4FE08025.6030406@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [RFC] [PATCHv2 2/2] Adding basic calls to libseccomp in vl.c
To: Blue Swirl
Cc: Will Drewry, qemu-devel, Eduardo Otubo

On 06/19/2012 11:37 AM, Will Drewry wrote:
> On Tue, Jun 19, 2012 at 8:35 AM, Corey Bryant wrote:
>>
>> On 06/18/2012 06:14 PM, Will Drewry wrote:
>>>
>>> [-all]
>>>
>>> On Mon, Jun 18, 2012 at 4:53 PM, Corey Bryant wrote:
>>>>
>>>> On 06/18/2012 04:18 PM, Blue Swirl wrote:
>>>>>
>>>>> On Mon, Jun 18, 2012 at 3:22 PM, Corey Bryant wrote:
>>>>>>
>>>>>> On 06/18/2012 04:33 AM, Daniel P. Berrange wrote:
>>>>>>>
>>>>>>> On Fri, Jun 15, 2012 at 07:04:45PM +0000, Blue Swirl wrote:
>>>>>>>>
>>>>>>>> On Wed, Jun 13, 2012 at 8:33 PM, Daniel P. Berrange wrote:
>>>>>>>>>
>>>>>>>>> On Wed, Jun 13, 2012 at 07:56:06PM +0000, Blue Swirl wrote:
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 13, 2012 at 7:20 PM, Eduardo Otubo wrote:
>>>>>>>>>>>
>>>>>>>>>>> I added a syscall struct using priority levels as described in the
>>>>>>>>>>> libseccomp man page. The priority numbers are based on the frequency
>>>>>>>>>>> with which they appear in a sample strace from a regular qemu guest
>>>>>>>>>>> run under libvirt.
>>>>>>>>>>>
>>>>>>>>>>> Libseccomp generates linear BPF code to filter system calls; those
>>>>>>>>>>> rules are read one after another. The priority system places the most
>>>>>>>>>>> common rules first in order to reduce the overhead when processing
>>>>>>>>>>> them.
>>>>>>>>>>>
>>>>>>>>>>> Also, since this is just a first RFC, the whitelist is a little raw.
>>>>>>>>>>> We might need your help to improve, test and fine tune the set of
>>>>>>>>>>> system calls.
>>>>>>>>>>>
>>>>>>>>>>> v2: Fixed some style issues
>>>>>>>>>>>     Removed code from vl.c and created qemu-seccomp.[ch]
>>>>>>>>>>>     Now using ARRAY_SIZE macro
>>>>>>>>>>>     Added more syscalls without priority/frequency set yet
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Eduardo Otubo
>>>>>>>>>>> ---
>>>>>>>>>>>  qemu-seccomp.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>>>  qemu-seccomp.h |  9 +++++++
>>>>>>>>>>>  vl.c           |  7 ++++++
>>>>>>>>>>>  3 files changed, 89 insertions(+)
>>>>>>>>>>>  create mode 100644 qemu-seccomp.c
>>>>>>>>>>>  create mode 100644 qemu-seccomp.h
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/qemu-seccomp.c b/qemu-seccomp.c
>>>>>>>>>>> new file mode 100644
>>>>>>>>>>> index 0000000..048b7ba
>>>>>>>>>>> --- /dev/null
>>>>>>>>>>> +++ b/qemu-seccomp.c
>>>>>>>>>>> @@ -0,0 +1,73 @@
>>>>>>>>>>
>>>>>>>>>> Copyright and license info missing.
>>>>>>>>>>
>>>>>>>>>>> +#include
>>>>>>>>>>> +#include
>>>>>>>>>>> +#include "qemu-seccomp.h"
>>>>>>>>>>> +
>>>>>>>>>>> +static struct QemuSeccompSyscall seccomp_whitelist[] = {
>>>>>>>>>>
>>>>>>>>>> 'const'
>>>>>>>>>>
>>>>>>>>>>> +    { SCMP_SYS(timer_settime), 255 },
>>>>>>>>>>> +    { SCMP_SYS(timer_gettime), 254 },
>>>>>>>>>>> +    { SCMP_SYS(futex), 253 },
>>>>>>>>>>> +    { SCMP_SYS(select), 252 },
>>>>>>>>>>> +    { SCMP_SYS(recvfrom), 251 },
>>>>>>>>>>> +    { SCMP_SYS(sendto), 250 },
>>>>>>>>>>> +    { SCMP_SYS(read), 249 },
>>>>>>>>>>> +    { SCMP_SYS(brk), 248 },
>>>>>>>>>>> +    { SCMP_SYS(clone), 247 },
>>>>>>>>>>> +    { SCMP_SYS(mmap), 247 },
>>>>>>>>>>> +    { SCMP_SYS(mprotect), 246 },
>>>>>>>>>>> +    { SCMP_SYS(ioctl), 245 },
>>>>>>>>>>> +    { SCMP_SYS(recvmsg), 245 },
>>>>>>>>>>> +    { SCMP_SYS(sendmsg), 245 },
>>>>>>>>>>> +    { SCMP_SYS(accept), 245 },
>>>>>>>>>>> +    { SCMP_SYS(connect), 245 },
>>>>>>>>>>> +    { SCMP_SYS(bind), 245 },
>>>>>>>>>>
>>>>>>>>>> It would be nice to avoid connect() and bind().
>>>>>>>>>> Perhaps seccomp init should be postponed to after all sockets have
>>>>>>>>>> been created?
>>>>>>>>>
>>>>>>>>> If you want to migrate your guest, you need to be able to
>>>>>>>>> call connect() at an arbitrary point in the QEMU process'
>>>>>>>>> lifecycle. So you can't avoid allowing connect(). Similarly
>>>>>>>>> if you want to allow hotplug of NICs (and their backends)
>>>>>>>>> then you need to have both bind() + connect() available.
>>>>>>>>
>>>>>>>> That's bad. Migration could conceivably be extended to use file
>>>>>>>> descriptor passing, but hotplug is more tricky.
>>>>>>>
>>>>>>> As with execve(), I'm reporting this on the basis that on the previous
>>>>>>> patch posting I was told we must whitelist any syscalls QEMU can
>>>>>>> conceivably use to avoid any loss in functionality.
>>>>>>
>>>>>> Thanks for pointing out syscalls needed for the whitelist.
>>>>>>
>>>>>> As Paul has already mentioned, it was recommended that we restrict all
>>>>>> of QEMU (as a single process) from the start of execution. This is as
>>>>>> opposed to other options of restricting QEMU from the time that vCPUs
>>>>>> start, further restricting based on syscall parameters, or decomposing
>>>>>> QEMU into multiple processes that are individually restricted with
>>>>>> their own seccomp whitelists.
>>>>>
>>>>> Can each thread have separate seccomp whitelists? For example CPU
>>>>> threads should not need pretty much anything but the I/O thread needs
>>>>> I/O.
>>>>>
>>>>
>>>> No, seccomp filters are defined and enforced at the process level.
>>>
>>> I'll keep lurking :) especially since I don't know the internals of
>>> qemu well, but you can do per-thread seccomp filters since
>>> processes==threads on linux.
>>> The real risk is that threads share so
>>> much that an attack on the CPU thread may be able to parlay that into
>>> a syscall proxy on another thread. Probably what would make sense
>>> in that way is a loose global filter, then have each sub-thread
>>> install a functionality specific second filter.
>>>
>>> I may be way off base though, so feel free to just tell me to keep lurking
>>> :)
>>>
>>> Thanks again for all the support and for pushing hard to get this
>>> functionality in qemu!
>>
>> Please keep lurking! I appreciate the input and education. :)
>>
>> So whether it's a thread or process, I assume it will have its own
>> task_struct, allowing us to set a filter per thread or per process. The
>> difference being that threads share more resources than processes. Sort of
>> thinking out loud here to see if I'm right.
>
> Exactly!
>
>> It doesn't seem ideal vs process separation, but it's do-able.
>
> Yep -- so for something like qemu, you could install a global baseline
> policy (e.g., union of all needed syscalls) then for each thread, they
> can install a more restrictive set. The actual security guarantees
> will be the total synthesis because of cross-thread attacks, but it
> would make exploitation pretty painful.
>
> If you want better guarantees, then process separation is needed. One
> option is even doing brokering for complex syscalls using either
> ptrace or a sigsys handler, but that is likely too much to get into
> while establishing a baseline.
>

In response to "Can each thread have separate seccomp whitelists?" please
take a look at the thread above from Will Drewry. seccomp *can* be used
per thread. However, it's not ideal vs per process seccomp filters.

--
Regards,
Corey

>> You don't mind if I share your input with the others, do you?
>
> Of course not!
>
> cheers!
>
>>
>> --
>> Regards,
>> Corey
>>
>>>
>>>>
>>>>>> I think this approach is a good starting point that can be further
>>>>>> tuned in the future.
>>>>>> And as with most security measures, defense in depth improves
>>>>>> the cause (e.g. combining seccomp with DAC or MAC).
>>>>>
>>>>> Agreed.
>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Corey
>>>>>>
>>>>>
>>>>
>>>
>>
>
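
For readers following the priority discussion in the quoted patch description, here is a minimal sketch of why ordering matters when rules are checked one after another. It is only a model in plain C, not libseccomp's implementation and not code from the patch: `struct rule`, `order_rules()` and `checks_until_match()` are hypothetical names, and the syscall numbers used with it are arbitrary placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's whitelist entry: a syscall
 * number plus a priority in 0..255 (higher = expected more often). */
struct rule {
    int num;
    unsigned char priority;
};

/* qsort comparator: highest priority first. */
static int by_priority_desc(const void *a, const void *b)
{
    const struct rule *ra = a;
    const struct rule *rb = b;
    return (int)rb->priority - (int)ra->priority;
}

/* Reorder the rules so the most frequent syscalls are checked first. */
void order_rules(struct rule *rules, size_t n)
{
    assert(rules != NULL || n == 0);
    qsort(rules, n, sizeof(rules[0]), by_priority_desc);
}

/* Model of a linear filter: return how many rules are compared before
 * 'num' matches, or 0 if no rule matches (the default action applies). */
size_t checks_until_match(const struct rule *rules, size_t n, int num)
{
    for (size_t i = 0; i < n; i++) {
        if (rules[i].num == num) {
            return i + 1;
        }
    }
    return 0;
}
```

Under this model, a syscall given the highest priority matches after a single comparison once `order_rules()` has run, while a low-priority one pays for every rule ahead of it. How a real filter generator lays out its BPF may differ.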