From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vasiliy Kulikov
Subject: Re: [RFC] x86: restrict pid namespaces to 32 or 64 bit syscalls
Date: Sun, 14 Aug 2011 20:08:48 +0400
Message-ID: <20110814160848.GA4333@albatros>
References: <20110812150304.GC16880@albatros>
 <4E45884B.8030303@zytor.com>
 <20110813062246.GC3851@albatros>
 <36fcaf94-2e99-47cb-a835-aefb79856429@email.android.com>
 <632d03b0-6725-431e-b100-13f5046b03e9@email.android.com>
 <20110814092028.GB14293@openwall.com>
 <01ba0cce-d28e-473e-be3a-7d3c8f185681@email.android.com>
 <20110814152729.GU5782@one.firstfloor.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <20110814152729.GU5782-qrUzlfsMFqo/4alezvVtWx2eb7JE58TQ@public.gmane.org>
Sender: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: Andi Kleen
Cc: Will Drewry,
 kernel-hardening-ZwoEplunGu1jrUoiu81ncdBPR1lH4CV8@public.gmane.org,
 containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
 x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
 linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 linux-security-module-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 Ingo Molnar, Solar Designer, "H. Peter Anvin", Thomas Gleixner
List-Id: containers.vger.kernel.org

(CC'ed Will Drewry, the author of new seccomp version, and containers list)

On Sun, Aug 14, 2011 at 17:27 +0200, Andi Kleen wrote:
> > i386 vs x86-64 vs x32 is just one of many axes along which syscalls
> > can be restricted (and for that matter, one axis if backward
> > compatibility), and it does not make sense to burden the code with
> > ad hoc filters. Designing a general filter facility which can be
> > used to restrict any container to the subset of system calls it
> > actually needs would make more sense, no?
>
> I believe this is already in the newer versions of seccomp.

The "newer versions of seccomp" are NAK'ed by Ingo. AFAIU, Ingo wants
more generic filters that filter much more than syscalls. But that
contradicts the security-by-simplicity approach, which is what we're
trying to achieve with this patch.

Compatibility syscalls are much more error-prone than common syscalls,
as they lack good testing or, unfortunately, sometimes have none at
all. The link I've posted is about a crazy bug: a completely
uninitialized structure was used in the copy_from_user() function. The
function was not tested _at all_. I doubt any non-compatibility syscall
(ioctl() handler, etc.) can be completely untested.

Also, we already have CONFIG_IA32_EMULATION. This patch only moves the
configuration mechanism from the compilation stage to the runtime
stage; it doesn't draw a new line. It grants permission to use the
feature to some containers but denies it to others, which is a rather
expected property of container separation.

Thanks,

-- 
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments