Date: Thu, 1 Oct 2020 12:56:31 -0600
From: Tycho Andersen
To: Jann Horn
Cc: Christian Brauner, linux-man, Song Liu, Will Drewry, Kees Cook,
 Daniel Borkmann, Giuseppe Scrivano, Robert Sesek, Linux Containers, lkml,
 Alexei Starovoitov, "Michael Kerrisk (man-pages)", bpf, Andy Lutomirski,
 Christian Brauner
Subject: Re: For review: seccomp_user_notif(2) manual page
Message-ID: <20201001185631.GD1260245@cisco>
References: <45f07f17-18b6-d187-0914-6f341fe90857@gmail.com>
 <20201001125043.dj6taeieatpw3a4w@gmail.com>
 <20201001165850.GC1260245@cisco>
X-Mailing-List: bpf@vger.kernel.org

On Thu, Oct 01, 2020 at 08:18:49PM +0200, Jann Horn wrote:
> On Thu, Oct 1, 2020 at 6:58 PM Tycho Andersen wrote:
> > On Thu, Oct 01, 2020 at 05:47:54PM +0200, Jann Horn via Containers wrote:
> > > On Thu, Oct 1, 2020 at 2:54 PM Christian Brauner wrote:
> > > > On Wed, Sep 30, 2020 at 05:53:46PM +0200, Jann Horn via Containers wrote:
> > > > > On Wed, Sep 30, 2020 at 1:07 PM Michael Kerrisk (man-pages) wrote:
> > > > > > NOTES
> > > > > >        The file descriptor returned when seccomp(2) is employed with the
> > > > > >        SECCOMP_FILTER_FLAG_NEW_LISTENER flag can be monitored using
> > > > > >        poll(2), epoll(7), and select(2). When a notification is pending,
> > > > > >        these interfaces indicate that the file descriptor is readable.
> > > > >
> > > > > We should probably also point out somewhere that, as
> > > > > include/uapi/linux/seccomp.h says:
> > > > >
> > > > >  * Similar precautions should be applied when stacking SECCOMP_RET_USER_NOTIF
> > > > >  * or SECCOMP_RET_TRACE. For SECCOMP_RET_USER_NOTIF filters acting on the
> > > > >  * same syscall, the most recently added filter takes precedence. This means
> > > > >  * that the new SECCOMP_RET_USER_NOTIF filter can override any
> > > > >  * SECCOMP_IOCTL_NOTIF_SEND from earlier filters, essentially allowing all
> > > > >  * such filtered syscalls to be executed by sending the response
> > > > >  * SECCOMP_USER_NOTIF_FLAG_CONTINUE. Note that SECCOMP_RET_TRACE can equally
> > > > >  * be overriden by SECCOMP_USER_NOTIF_FLAG_CONTINUE.
> > > > >
> > > > > In other words, from a security perspective, you must assume that the
> > > > > target process can bypass any SECCOMP_RET_USER_NOTIF (or
> > > > > SECCOMP_RET_TRACE) filters unless it is completely prohibited from
> > > > > calling seccomp(). This should also be noted over in the main
> > > > > seccomp(2) manpage, especially the SECCOMP_RET_TRACE part.
> > > >
> > > > So I was actually wondering about this when I skimmed this and a while
> > > > ago but forgot about this again... Afaict, you can only ever load a
> > > > single filter with SECCOMP_FILTER_FLAG_NEW_LISTENER set. If there
> > > > already is a filter with the SECCOMP_FILTER_FLAG_NEW_LISTENER property
> > > > in the tasks filter hierarchy then the kernel will refuse to load a new
> > > > one?
> > > >
> > > > static struct file *init_listener(struct seccomp_filter *filter)
> > > > {
> > > > 	struct file *ret = ERR_PTR(-EBUSY);
> > > > 	struct seccomp_filter *cur;
> > > >
> > > > 	for (cur = current->seccomp.filter; cur; cur = cur->prev) {
> > > > 		if (cur->notif)
> > > > 			goto out;
> > > > 	}
> > > >
> > > > shouldn't that be sufficient to guarantee that USER_NOTIF filters can't
> > > > override each other for the same task simply because there can only ever
> > > > be a single one?
> > >
> > > Good point. Exceeeept that that check seems ineffective because this
> > > happens before we take the locks that guard against TSYNC, and also
> > > before we decide to which existing filter we want to chain the new
> > > filter. So if two threads race with TSYNC, I think they'll be able to
> > > chain two filters with listeners together.
> >
> > Yep, seems the check needs to also be in seccomp_can_sync_threads() to
> > be totally effective,
> >
> > > I don't know whether we want to eternalize this "only one listener
> > > across all the filters" restriction in the manpage though, or whether
> > > the man page should just say that the kernel currently doesn't support
> > > it but that security-wise you should assume that it might at some
> > > point.
> >
> > This requirement originally came from Andy, arguing that the semantics
> > of this were/are confusing, which still makes sense to me. Perhaps we
> > should do something like the below?
> [...]
> > +static bool has_listener_parent(struct seccomp_filter *child)
> > +{
> > +	struct seccomp_filter *cur;
> > +
> > +	for (cur = current->seccomp.filter; cur; cur = cur->prev) {
> > +		if (cur->notif)
> > +			return true;
> > +	}
> > +
> > +	return false;
> > +}
> [...]
> > @@ -407,6 +419,11 @@ static inline pid_t seccomp_can_sync_threads(void)
> [...]
> > +		/* don't allow TSYNC to install multiple listeners */
> > +		if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER &&
> > +		    !has_listener_parent(thread->seccomp.filter))
> > +			continue;
> [...]
> > @@ -1462,12 +1479,9 @@ static const struct file_operations seccomp_notify_ops = {
> >  static struct file *init_listener(struct seccomp_filter *filter)
> [...]
> > -	for (cur = current->seccomp.filter; cur; cur = cur->prev) {
> > -		if (cur->notif)
> > -			goto out;
> > -	}
> > +	if (has_listener_parent(current->seccomp.filter))
> > +		goto out;
>
> I dislike this because it combines a non-locked check and a locked
> check. And I don't think this will work in the case where TSYNC and
> non-TSYNC race - if the non-TSYNC call nests around the TSYNC filter
> installation, the thread that called seccomp in non-TSYNC mode will
> still end up with two notifying filters. How about the following?

Sure, you can add,

Reviewed-by: Tycho Andersen

when you send it.

Tycho
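
For readers following the thread, the "only one listener per filter chain" rule can be modelled in user space. This is only a minimal sketch, not kernel code: struct filter, has_notif, chain_has_listener() and attach_filter() are hypothetical stand-ins for struct seccomp_filter, its notif member and the check in init_listener(). It walks from the chain head it is given, and it models a single thread only. The race discussed above is about the check and the attach not happening under the same lock, which this sketch does not try to reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct seccomp_filter. */
struct filter {
	struct filter *prev;	/* older filter in the chain, or NULL */
	bool has_notif;		/* models "cur->notif != NULL" */
};

/* Return true if any filter reachable from 'start' carries a listener. */
static bool chain_has_listener(const struct filter *start)
{
	for (const struct filter *cur = start; cur; cur = cur->prev) {
		if (cur->has_notif)
			return true;
	}
	return false;
}

/*
 * Refuse to attach a listener-carrying filter when the chain it would
 * join already has one. Returns 0 on success and -1 in the case where
 * the quoted kernel code would fail with -EBUSY.
 */
static int attach_filter(struct filter **head, struct filter *new_filter)
{
	assert(head && new_filter);

	if (new_filter->has_notif && chain_has_listener(*head))
		return -1;

	new_filter->prev = *head;
	*head = new_filter;
	return 0;
}
```

In this model the check and the attach are one step, so a second listener is always rejected; in the kernel the two steps would have to sit under the same lock for that to hold against TSYNC.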