From: Jann Horn
Date: Mon, 12 Oct 2020 08:42:50 +0200
Subject: Re: [PATCH v5 seccomp 1/5] seccomp/cache: Lookup syscall allowlist bitmap for fast path
To: YiFei Zhu
Cc: Linux Containers, YiFei Zhu, bpf, kernel list, Aleksa Sarai, Andrea Arcangeli, Andy Lutomirski, David Laight, Dimitrios Skarlatos, Giuseppe Scrivano, Hubertus Franke, Jack Chen, Josep Torrellas, Kees Cook, Tianyin Xu, Tobin Feldman-Fitzthum, Tycho Andersen, Valentin Rothberg, Will Drewry
In-Reply-To: <10f91a367ec4fcdea7fc3f086de3f5f13a4a7436.1602431034.git.yifeifz2@illinois.edu>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Oct 11, 2020 at 5:48 PM YiFei
Zhu wrote:
> The overhead of running Seccomp filters has been part of some past
> discussions [1][2][3]. Oftentimes, the filters have a large number
> of instructions that check syscall numbers one by one and jump based
> on that. Some users chain BPF filters which further enlarge the
> overhead. A recent work [6] comprehensively measures the Seccomp
> overhead and shows that the overhead is non-negligible and has a
> non-trivial impact on application performance.
>
> We observed some common filters, such as docker's [4] or
> systemd's [5], will make most decisions based only on the syscall
> numbers, and as past discussions considered, a bitmap where each bit
> represents a syscall makes most sense for these filters.
>
> The fast (common) path for seccomp should be that the filter permits
> the syscall to pass through, and failing seccomp is expected to be
> an exceptional case; it is not expected for userspace to call a
> denylisted syscall over and over.
>
> When it can be concluded that an allow must occur for the given
> architecture and syscall pair (this determination is introduced in
> the next commit), seccomp will immediately allow the syscall,
> bypassing further BPF execution.
>
> Each architecture number has its own bitmap. The architecture
> number in seccomp_data is checked against the defined architecture
> number constant before proceeding to test the bit against the
> bitmap with the syscall number as the index of the bit in the
> bitmap, and if the bit is set, seccomp returns allow. The bitmaps
> are all clear in this patch and will be initialized in the next
> commit.
>
> When only one architecture exists, the check against architecture
> number is skipped, suggested by Kees Cook [7].
>
> [1] https://lore.kernel.org/linux-security-module/c22a6c3cefc2412cad00ae14c1371711@huawei.com/T/
> [2] https://lore.kernel.org/lkml/202005181120.971232B7B@keescook/T/
> [3] https://github.com/seccomp/libseccomp/issues/116
> [4] https://github.com/moby/moby/blob/ae0ef82b90356ac613f329a8ef5ee42ca923417d/profiles/seccomp/default.json
> [5] https://github.com/systemd/systemd/blob/6743a1caf4037f03dc51a1277855018e4ab61957/src/shared/seccomp-util.c#L270
> [6] Draco: Architectural and Operating System Support for System Call Security
>     https://tianyin.github.io/pub/draco.pdf, MICRO-53, Oct. 2020
> [7] https://lore.kernel.org/bpf/202010091614.8BB0EB64@keescook/
>
> Co-developed-by: Dimitrios Skarlatos
> Signed-off-by: Dimitrios Skarlatos
> Signed-off-by: YiFei Zhu

Reviewed-by: Jann Horn
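
For readers following along without the diff, the lookup described in the
quoted commit message can be sketched in userspace C roughly as below. This
is only an illustration under stated assumptions, not the code from the
patch: every name prefixed with sketch_ or SKETCH_ is hypothetical, the
syscall count is an arbitrary placeholder, and the two-architecture layout
(one "native" and one "compat" bitmap) is just one plausible configuration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for illustration only (assumptions, not kernel values
 * taken from the patch). The arch numbers mirror AUDIT_ARCH_X86_64 and
 * AUDIT_ARCH_I386; the syscall count is an arbitrary placeholder. */
#define SKETCH_NR_SYSCALLS 512u
#define SKETCH_ARCH_NATIVE 0xC000003Eu
#define SKETCH_ARCH_COMPAT 0x40000003u

#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITMAP_WORDS \
	((SKETCH_NR_SYSCALLS + SKETCH_BITS_PER_WORD - 1) / SKETCH_BITS_PER_WORD)

/* One allow bitmap per architecture number; bit N set means
 * "syscall N is known to always be allowed on this architecture". */
struct sketch_cache {
	unsigned long allow_native[SKETCH_BITMAP_WORDS];
	unsigned long allow_compat[SKETCH_BITMAP_WORDS];
};

/* Reduced stand-in for the nr and arch fields of seccomp_data. */
struct sketch_data {
	int nr;
	uint32_t arch;
};

/* Test one bit, treating negative or out-of-range numbers as "not cached". */
static bool sketch_bitmap_test(const unsigned long *bitmap, int nr)
{
	size_t idx;

	if (nr < 0 || (size_t)nr >= SKETCH_NR_SYSCALLS)
		return false;
	idx = (size_t)nr;
	return (bitmap[idx / SKETCH_BITS_PER_WORD] >>
		(idx % SKETCH_BITS_PER_WORD)) & 1UL;
}

/* Mark a syscall as always-allowed (the real determination is made by a
 * later commit in the series; here the caller decides). */
void sketch_cache_mark_allowed(unsigned long *bitmap, size_t nr)
{
	assert(nr < SKETCH_NR_SYSCALLS);
	bitmap[nr / SKETCH_BITS_PER_WORD] |= 1UL << (nr % SKETCH_BITS_PER_WORD);
}

/* Fast path: pick the bitmap by architecture number, then test the bit
 * indexed by the syscall number. Returning false means "no shortcut,
 * run the BPF filters as usual", not "deny". */
bool sketch_cache_check_allow(const struct sketch_cache *cache,
			      const struct sketch_data *sd)
{
	switch (sd->arch) {
	case SKETCH_ARCH_NATIVE:
		return sketch_bitmap_test(cache->allow_native, sd->nr);
	case SKETCH_ARCH_COMPAT:
		return sketch_bitmap_test(cache->allow_compat, sd->nr);
	default:
		return false;
	}
}
```

With all bitmaps clear, as in this patch, sketch_cache_check_allow() always
returns false and every syscall still goes through the filters. On a
configuration with a single architecture, the switch would collapse to one
unconditional bitmap test, which is the shortcut mentioned in the last
paragraph of the commit message.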