From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jann Horn via Containers
Reply-To: Jann Horn
Date: Thu, 24 Sep 2020 02:41:36 +0200
Subject: Re: [PATCH 1/6] seccomp: Introduce SECCOMP_PIN_ARCHITECTURE
To: Kees Cook
Cc: Andrea Arcangeli, Giuseppe Scrivano, Will Drewry, bpf, YiFei Zhu,
 Linux API, Linux Containers, Tobin Feldman-Fitzthum, Hubertus Franke,
 Andy Lutomirski, Valentin Rothberg, Dimitrios Skarlatos, Jack Chen,
 Josep Torrellas, Tianyin Xu, kernel list
References: <20200923232923.3142503-1-keescook@chromium.org>
 <20200923232923.3142503-2-keescook@chromium.org>
In-Reply-To: <20200923232923.3142503-2-keescook@chromium.org>
List-Id: Linux Containers
Content-Type: text/plain; charset="us-ascii"

On Thu, Sep 24, 2020 at 1:29 AM Kees Cook wrote:
> For systems that provide multiple syscall maps based on audit
> architectures (e.g. AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 via
> CONFIG_COMPAT) or via syscall masks (e.g. x86_x32), allow a fast way
> to pin the process to a specific syscall table, instead of needing
> to generate all filters with an architecture check as the first filter
> action.
>
> This creates the internal representation that seccomp itself can use
> (which is separate from the filters, which need to stay runtime
> agnostic). Additionally paves the way for constant-action bitmaps.

I don't really see the point in providing this UAPI - the syscall
number checking will probably have much more performance cost than the
architecture number check, and it's not like this lets us avoid the
check, we're just moving it over into C code.

> Signed-off-by: Kees Cook
> ---
>  include/linux/seccomp.h                       |  9 +++
>  include/uapi/linux/seccomp.h                  |  1 +
>  kernel/seccomp.c                              | 79 ++++++++++++++++++-
>  tools/testing/selftests/seccomp/seccomp_bpf.c | 33 ++++++++
>  4 files changed, 120 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index 02aef2844c38..0be20bc81ea9 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -20,12 +20,18 @@
>  #include
>  #include
>
> +#define SECCOMP_ARCH_IS_NATIVE 1
> +#define SECCOMP_ARCH_IS_COMPAT 2

FYI, mips has three different possible "arch" values (per kernel build
config; the __AUDIT_ARCH_LE flag can also be set, but that's fixed
based on the config):

 - AUDIT_ARCH_MIPS
 - AUDIT_ARCH_MIPS | __AUDIT_ARCH_64BIT
 - AUDIT_ARCH_MIPS | __AUDIT_ARCH_64BIT | __AUDIT_ARCH_CONVENTION_MIPS64_N32

But I guess we can deal with that once someone wants to actually add
support for this on mips.

> +#define SECCOMP_ARCH_IS_MULTIPLEX 3

Why should X32 be handled specially? If the seccomp filter allows
specific syscalls (as it should), we don't have to care about X32.
Only in weird cases where the seccomp filter wants to deny specific
syscalls (a horrible idea), X32 is a concern, and in such cases, the
userspace code can generate a single conditional jump to deal with it.

And when seccomp is used properly to allow specific syscalls, the
kernel will just waste time uselessly checking this X32 stuff.

[...]
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
[...]
> +static long seccomp_pin_architecture(void)
> +{
> +#ifdef SECCOMP_ARCH
> +       struct task_struct *task = current;
> +
> +       u8 arch = seccomp_get_arch(syscall_get_arch(task),
> +                                  syscall_get_nr(task, task_pt_regs(task)));
> +
> +       /* How did you even get here? */

Via a racing TSYNC, that's how.

> +       if (task->seccomp.arch && task->seccomp.arch != arch)
> +               return -EBUSY;
> +
> +       task->seccomp.arch = arch;
> +#endif
> +       return 0;
> +}

Why does this return 0 if SECCOMP_ARCH is not defined? That suggests to
userspace that we have successfully pinned the ABI, even though we're
actually unable to do so.
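
To make the X32 point concrete, here is a rough sketch of what such a
deny-style filter could look like in userspace on x86-64. Everything in
it is illustrative rather than taken from the patch: the names
deny_filter, run_filter, decide, X32_SYSCALL_BIT and DENIED_NR are made
up for this example, the denied syscall number is an arbitrary choice,
and run_filter is only a toy interpreter for the four opcodes used, so
that the filter can be exercised without actually installing it.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Bit carried by x32 syscall numbers on x86-64 (__X32_SYSCALL_BIT in the
 * x86 uapi headers); redefined here so the sketch builds on any arch. */
#define X32_SYSCALL_BIT 0x40000000u
/* Assumption for illustration: the one syscall to deny (ptrace, x86-64). */
#define DENIED_NR 101

static const struct sock_filter deny_filter[] = {
	/* [0] A = seccomp_data.arch */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
	/* [1] arch == x86-64 ? skip the kill at [2] : fall into it */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
	/* [2] */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
	/* [3] A = seccomp_data.nr */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
	/* [4] the single conditional jump: nr >= X32 bit ? kill at [5] : go to [6] */
	BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, X32_SYSCALL_BIT, 0, 1),
	/* [5] */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
	/* [6] nr == DENIED_NR ? errno at [7] : allow at [8] */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, DENIED_NR, 0, 1),
	/* [7] */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
	/* [8] */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Toy evaluator for the opcodes used above; not the kernel's BPF engine. */
static uint32_t run_filter(const struct sock_filter *f, size_t len,
			   const struct seccomp_data *sd)
{
	uint32_t a = 0;
	size_t pc = 0;

	while (pc < len) {
		const struct sock_filter *insn = &f[pc++];

		switch (insn->code) {
		case BPF_LD | BPF_W | BPF_ABS:
			/* 32-bit native-endian load at byte offset k */
			memcpy(&a, (const char *)sd + insn->k, sizeof(a));
			break;
		case BPF_JMP | BPF_JEQ | BPF_K:
			pc += (a == insn->k) ? insn->jt : insn->jf;
			break;
		case BPF_JMP | BPF_JGE | BPF_K:
			pc += (a >= insn->k) ? insn->jt : insn->jf;
			break;
		case BPF_RET | BPF_K:
			return insn->k;
		default:
			return SECCOMP_RET_KILL_PROCESS;
		}
	}
	return SECCOMP_RET_KILL_PROCESS;
}

/* Convenience wrapper: filter verdict for a given (arch, nr) pair. */
static uint32_t decide(uint32_t arch, int nr)
{
	struct seccomp_data sd = { .nr = nr, .arch = arch };

	return run_filter(deny_filter,
			  sizeof(deny_filter) / sizeof(deny_filter[0]), &sd);
}
```

With this layout the extra cost of covering X32 in userspace is the one
BPF_JGE instruction at [4] (plus its return), and an allow-list filter
would not need it at all, since x32 numbers simply would not match any
allowed entry.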