From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754088Ab2ARBCI (ORCPT ); Tue, 17 Jan 2012 20:02:08 -0500
Received: from mail-tul01m020-f174.google.com ([209.85.214.174]:40390
	"EHLO mail-tul01m020-f174.google.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1753852Ab2ARBCE convert rfc822-to-8bit
	(ORCPT ); Tue, 17 Jan 2012 20:02:04 -0500
MIME-Version: 1.0
In-Reply-To: <49017bd7edab7010cd9ac767e39d99e4.squirrel@webmail.greenhost.nl>
References: <20120112172315.GA26295@redhat.com>
	<20120113173153.GA24273@redhat.com>
	<20120116183730.GB21112@redhat.com>
	<20120117164523.GA17070@redhat.com>
	<20120117170512.GB17070@redhat.com>
	<49017bd7edab7010cd9ac767e39d99e4.squirrel@webmail.greenhost.nl>
From: Andrew Lutomirski
Date: Tue, 17 Jan 2012 17:01:41 -0800
X-Google-Sender-Auth: HY3ttRpJj-jQaqJTFj2sUzxhLik
Message-ID:
Subject: Re: Compat 32-bit syscall entry from 64-bit task!? [was: Re:
	[RFC,PATCH 1/2] seccomp_filters: system call filtering using BPF]
To: Indan Zupancic
Cc: Oleg Nesterov, Will Drewry, linux-kernel@vger.kernel.org,
	keescook@chromium.org, john.johansen@canonical.com,
	serge.hallyn@canonical.com, coreyb@linux.vnet.ibm.com,
	pmoore@redhat.com, eparis@redhat.com, djm@mindrot.org,
	torvalds@linux-foundation.org, segoon@openwall.com,
	rostedt@goodmis.org, jmorris@namei.org, scarybeasts@gmail.com,
	avi@redhat.com, penberg@cs.helsinki.fi, viro@zeniv.linux.org.uk,
	mingo@elte.hu, akpm@linux-foundation.org, khilman@ti.com,
	borislav.petkov@amd.com, amwang@redhat.com, ak@linux.intel.com,
	eric.dumazet@gmail.com, gregkh@suse.de, dhowells@redhat.com,
	daniel.lezcano@free.fr, linux-fsdevel@vger.kernel.org,
	linux-security-module@vger.kernel.org, olofj@chromium.org,
	mhalcrow@google.com, dlaor@redhat.com, Roland McGrath, Andi Kleen
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8BIT
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 17, 2012 at 4:56 PM, Indan Zupancic wrote:
> On Tue, January 17, 2012 18:45, Andrew Lutomirski wrote:
>> On Tue, Jan 17, 2012 at 9:05 AM, Oleg Nesterov wrote:
>>> On 01/17, Andrew Lutomirski wrote:
>>>>
>>>> (is_compat_task says whether the executable was marked as 32-bit.  The
>>>> actual execution mode is determined by the cs register, which the user
>>>> can control.
>>>
>>> Confused... Afaics, TIF_IA32 says that the binary is 32-bit (this comes
>>> along with TS_COMPAT).
>>>
>>> TS_COMPAT says that, say, the task did "int 80" to enter the kernel.
>>> 64-bit or not, we should treat it as 32-bit in this case.
>>
>> I think you're right, and checking which entry was used is better than
>> checking the cs register (since 64-bit code can use int80).  That's
>> what I get for insufficiently careful reading of the assembly.  (And
>> for going from memory from when I wrote the vsyscall emulation code --
>> that code is entered from a page fault, so the entry point used is
>> irrelevant.)
>
> Wait: If a task is set to 64 bit mode, but calls into the kernel via
> int 0x80 it's changed to 32 bit mode for that system call and back to
> 64 bit mode when the system call is finished!?
>
> Our ptrace jailer is checking cs to figure out if a task is a compat task
> or not, if the kernel can change that behind our back it means our jailer
> isn't secure for x86_64 with compat enabled. Or is cs changed before the
> ptrace stuff and ptrace sees the "right" cs value? If not, we have to add
> an expensive PTRACE_PEEKTEXT to check if it's an int 0x80 or not. Or is
> there another way?

I don't know what your ptrace jailer does.  But a task can switch
itself between 32-bit and 64-bit execution at will, and there's
nothing the kernel can do about it.  (That isn't quite true -- in
theory the kernel could fiddle with the GDT, but that would be
expensive and wouldn't work on Xen.)
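
For illustration only, the PTRACE_PEEKTEXT check raised in the quoted
question could look roughly like the sketch below. It classifies the two
opcode bytes that sit just before the tracee's instruction pointer. The
names entry_kind and classify_entry are hypothetical and do not come from
any kernel or tracer API. The sketch assumes the tracer has already read
those two bytes (for example with PTRACE_PEEKTEXT at ip - 2) and that the
reported ip points just past the entry instruction. Another thread sharing
the address space could rewrite the bytes after the kernel was entered, so
this is a hint and not a secure test on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for this sketch; not part of any real API. */
enum entry_kind {
    ENTRY_UNKNOWN,   /* bytes do not match a known entry instruction */
    ENTRY_INT80,     /* cd 80: int $0x80, the 32-bit (compat) entry  */
    ENTRY_SYSCALL,   /* 0f 05: syscall, the native 64-bit entry      */
    ENTRY_SYSENTER   /* 0f 34: sysenter, a 32-bit fast entry         */
};

/*
 * insn holds the two bytes immediately before the tracee's instruction
 * pointer at the syscall stop. Assumption: the caller fetched them
 * itself; this function does no ptrace calls.
 */
static enum entry_kind classify_entry(const unsigned char insn[2])
{
    assert(insn != NULL);

    if (insn[0] == 0xcd && insn[1] == 0x80)
        return ENTRY_INT80;
    if (insn[0] == 0x0f && insn[1] == 0x05)
        return ENTRY_SYSCALL;
    if (insn[0] == 0x0f && insn[1] == 0x34)
        return ENTRY_SYSENTER;
    return ENTRY_UNKNOWN;
}
```

A tracer that sees ENTRY_INT80 for a task whose cs looks 64-bit would have
to interpret the syscall number with the 32-bit table, which is exactly the
case the cs check alone misses.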

That being said, is_compat_task is apparently a good indication of
whether the current *syscall* entry is a 64-bit syscall or a 32-bit
syscall.  Perhaps the function should be renamed to in_compat_syscall,
because that's what it does.

>
> I think this behaviour is so unexpected that it can only cause security
> problems in the long run. Is anyone counting on this? Where is this
> behaviour documented?

Nowhere, I think.

--Andy