Date: Thu, 28 Apr 2011 08:46:21 -0700
From: Randy Dunlap
To: Will Drewry
Cc: linux-kernel@vger.kernel.org, kees.cook@canonical.com, eparis@redhat.com, agl@chromium.org, mingo@elte.hu, jmorris@namei.org, rostedt@goodmis.org
Subject: Re: [PATCH 5/7] seccomp_filter: Document what seccomp_filter is and how it works.
Message-Id: <20110428084621.5517ec8a.rdunlap@xenotime.net>
In-Reply-To: <1303960136-14298-4-git-send-email-wad@chromium.org>
References: <1303960136-14298-1-git-send-email-wad@chromium.org> <1303960136-14298-4-git-send-email-wad@chromium.org>

On Wed, 27 Apr 2011 22:08:49 -0500 Will Drewry wrote:

> Adds a text file covering what CONFIG_SECCOMP_FILTER is, how it is
> implemented presently, and what it may be used for.  In addition,
> the limitations and caveats of the proposed implementation are
> included.
>
> Signed-off-by: Will Drewry
> ---
>  Documentation/trace/seccomp_filter.txt |   75 ++++++++++++++++++++++++++++++++
>  1 files changed, 75 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/trace/seccomp_filter.txt
>
> diff --git a/Documentation/trace/seccomp_filter.txt b/Documentation/trace/seccomp_filter.txt
> new file mode 100644
> index 0000000..6a0fd33
> --- /dev/null
> +++ b/Documentation/trace/seccomp_filter.txt
> @@ -0,0 +1,75 @@
> +                Seccomp filtering
> +                =================
> +
> +Introduction
> +------------
> +
> +A large number of system calls are exposed to every userland process
> +with many of them going unused for the entire lifetime of the
> +application.  As system calls change and mature, bugs are found and
> +quashed.  A certain subset of userland applications benefit by having
> +a reduced set of available system calls.  The reduced set reduces the
> +total kernel surface exposed to the application.  System call filtering
> +is meant for use with those applications.
> +
> +The implementation currently leverages both the existing seccomp
> +infrastructure and the kernel tracing infrastructure.  By centralizing
> +hooks for attack surface reduction in seccomp, it is possible to assure
> +attention to security that is less relevant in normal ftrace scenarios,
> +such as time of check, time of use attacks.  However, ftrace provides a
> +rich, human-friendly environment for specifying system calls by name and
> +expected arguments.  (As such, this requires FTRACE_SYSCALLS.)
> +
> +
> +What it isn't
> +-------------
> +
> +System call filtering isn't a sandbox.  It provides a clearly defined
> +mechanism for minimizing the exposed kernel surface.  Beyond that, policy for
> +logical behavior and information flow should be managed with an LSM of your
> +choosing.
> +
> +
> +Usage
> +-----
> +
> +An additional seccomp mode is exposed through mode '2'.  This mode
> +depends on CONFIG_SECCOMP_FILTER which in turn depends on
> +CONFIG_FTRACE_SYSCALLS.
> +
> +A collection of filters may be supplied via prctl, and the current set of
> +filters is exposed in /proc//seccomp_filter.
> +
> +For instance,
> +  const char filters[] =
> +    "sys_read: (fd == 1) || (fd == 2)\n"
> +    "sys_write: (fd == 0)\n"
> +    "sys_exit: 1\n"
> +    "sys_exit_group: 1\n"
> +    "on_next_syscall: 1";
> +  prctl(PR_SET_SECCOMP, 2, filters);
> +
> +This will set up system call filters for read, write, and exit where reading can
> +be done only from fds 1 and 2 and writing to fd 0.  The "on_next_syscall" directive tells
> +seccomp to not enforce the ruleset until after the next system call is run.  This allows
> +for launchers to apply system call filters to a binary before executing it.
> +
> +Once enabled, the access may only be reduced.  For example, a set of filters may be:
> +
> +  sys_read: 1
> +  sys_write: 1
> +  sys_mmap: 1
> +  sys_prctl: 1
> +
> +Then it may call the following to drop mmap access:
> +  prctl(PR_SET_SECCOMP, 2, "sys_mmap: 0");
> +
> +
> +Caveats
> +-------
> +
> +The system call names come from ftrace events.  At present, many system
> +calls are not hooked - such as x86's ptregs wrapped system calls.
> +
> +In addition compat_task()s will not be supported until a sys32s begin
> +being hooked.

Last sentence is hard to read IMO:
a.  what are compat_task()s?
b.  what is a sys32s begin?
c.  awkward wording, maybe change to:
        until a sys32s begin has been hooked.

thanks,
---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***