From: Jann Horn
Date: Thu, 24 Sep 2020 16:05:59 +0200
Subject: Re: [PATCH v1 0/6] seccomp: Implement constant action bitmaps
To: Rasmus Villemoes
Cc: Kees Cook, YiFei Zhu, Christian Brauner, Tycho Andersen,
    Andy Lutomirski, Will Drewry, Andrea Arcangeli, Giuseppe Scrivano,
    Tobin Feldman-Fitzthum, Dimitrios Skarlatos, Valentin Rothberg,
    Hubertus Franke, Jack Chen, Josep Torrellas, Tianyin Xu, bpf,
    Linux Containers, Linux API, kernel list
References: <20200923232923.3142503-1-keescook@chromium.org>
    <43039bb6-9d9f-b347-fa92-ea34ccc21d3d@rasmusvillemoes.dk>
In-Reply-To: <43039bb6-9d9f-b347-fa92-ea34ccc21d3d@rasmusvillemoes.dk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 24, 2020 at 3:40 PM Rasmus Villemoes wrote:
> On 24/09/2020 01.29, Kees Cook wrote:
> > rfc: https://lore.kernel.org/lkml/20200616074934.1600036-1-keescook@chromium.org/
> > alternative: https://lore.kernel.org/containers/cover.1600661418.git.yifeifz2@illinois.edu/
> > v1:
> > - rebase to for-next/seccomp
> > - finish X86_X32 support for both pinning and bitmaps
> > - replace TLB magic with Jann's emulator
> > - add JSET insn
> >
> > TODO:
> > - add ALU|AND insn
> > - significantly more testing
> >
> > Hi,
> >
> > This is a refresh of my earlier constant action bitmap series. It looks
> > like the RFC was missed on the container list, so I've CCed it now. :)
> > I'd like to work from this series, as it handles the multi-architecture
> > stuff.
>
> So, I agree with Jann's point that the only thing that matters is that
> always-allowed syscalls are indeed allowed fast.
>
> But one thing I'm wondering about and I haven't seen addressed anywhere:
> Why build the bitmap on the kernel side (with all the complexity of
> having to emulate the filter for all syscalls)? Why can't userspace just
> hand the kernel "here's a new filter: the syscalls in this bitmap are
> always allowed noquestionsasked, for the rest, run this bpf". Sure, that
> might require a new syscall or extending seccomp(2) somewhat, but isn't
> that a _lot_ simpler? It would probably also mean that the bpf we do get
> handed is a lot smaller. Userspace might need to pass a couple of
> bitmaps, one for each relevant arch, but you get the overall idea.

It's not really a lot of logic though; and there are a bunch of
different things in userspace that talk to the seccomp() syscall that
would have to be updated if we made this part of the UAPI. libseccomp,
Chrome, Android, OpenSSH, bubblewrap, ... - overall, if we can make the
existing interface faster, it'll be less effort, and there will be less
code duplication (because otherwise every user of seccomp will have to
implement the same thing in userspace).
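For illustration only, here is a toy userspace sketch of what deriving
the bitmap from an already-installed filter could look like: run the
filter once per syscall number, and mark a syscall as always-allowed
only if the filter returns ALLOW without looking at the arguments. This
is not the emulator from the series; the instruction set, every toy_*
name, and the 64-syscall limit are assumptions made up for the example.
Only the value of TOY_RET_ALLOW mirrors the real SECCOMP_RET_ALLOW.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified instruction set standing in for classic BPF. */
enum toy_op { TOY_LD_NR, TOY_LD_ARG, TOY_JEQ, TOY_RET };

struct toy_insn {
    enum toy_op op;
    uint32_t k;     /* immediate: compare value or return action */
    uint8_t jt, jf; /* relative jump offsets, used by TOY_JEQ only */
};

#define TOY_RET_ALLOW   0x7fff0000u /* same value as SECCOMP_RET_ALLOW */
#define TOY_RET_KILL    0x00000000u
#define TOY_NR_SYSCALLS 64u         /* assumption: fits one 64-bit word */

/*
 * Returns true if the filter reaches "RET ALLOW" for syscall nr without
 * ever reading anything other than the syscall number.
 */
static bool toy_always_allows(const struct toy_insn *prog, size_t len,
                              uint32_t nr)
{
    uint32_t acc = 0;

    for (size_t pc = 0; pc < len; pc++) {
        const struct toy_insn *in = &prog[pc];

        switch (in->op) {
        case TOY_LD_NR:
            acc = nr;
            break;
        case TOY_LD_ARG:
            return false; /* result may depend on arguments */
        case TOY_JEQ:
            pc += (acc == in->k) ? in->jt : in->jf;
            break;
        case TOY_RET:
            return in->k == TOY_RET_ALLOW;
        }
    }
    return false; /* fell off the end: be conservative */
}

/* Emulate the filter for every syscall number and record the results. */
static void toy_build_bitmap(const struct toy_insn *prog, size_t len,
                             uint64_t *bitmap)
{
    assert(bitmap != NULL);
    *bitmap = 0;
    for (uint32_t nr = 0; nr < TOY_NR_SYSCALLS; nr++)
        if (toy_always_allows(prog, len, nr))
            *bitmap |= UINT64_C(1) << nr;
}

/* Example filter: syscall 1 is allowed outright, syscall 2 needs an
 * argument check, everything else is killed. */
static const struct toy_insn toy_example[] = {
    { TOY_LD_NR,  0,             0, 0 },
    { TOY_JEQ,    1,             2, 0 }, /* nr == 1: skip to RET ALLOW */
    { TOY_JEQ,    2,             0, 2 }, /* nr == 2: load arg, else KILL */
    { TOY_LD_ARG, 0,             0, 0 },
    { TOY_RET,    TOY_RET_ALLOW, 0, 0 },
    { TOY_RET,    TOY_RET_KILL,  0, 0 },
};
```

With toy_example, only syscall 1 ends up in the bitmap. Syscall 2
reaches an argument load first, so the sketch conservatively leaves it
to the full filter at runtime.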
Doing this internally with the old UAPI also means that we're not
creating any additional commitments in terms of UAPI - if we come up
with something better in the future, we can just rip this stuff out.
If we created a new UAPI, we'd have to stay, in some form, compatible
with it forever.

> I'm also a bit worried about the performance of doing that emulation;
> that's constant extra overhead for, say, launching a docker container.
>
> Regardless of how the kernel's bitmap gets created, something like
>
> +       if (nr < NR_syscalls) {
> +               if (test_bit(nr, bitmaps->allow)) {
> +                       *filter_ret = SECCOMP_RET_ALLOW;
> +                       return true;
> +               }
>
> probably wants some nospec protection somewhere to avoid the irony of
> seccomp() being used actively by bad guys.

Good point...
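
In the kernel, this kind of nospec protection is typically done with
array_index_nospec(), which clamps an index so that a mispredicted
bounds check cannot be used to read out of bounds. The sketch below
reproduces only the branchless mask arithmetic of the generic helper in
userspace. The sketch_* names and the 128-entry limit are hypothetical,
and a plain C compiler is free to optimize the mask away, so this shows
the idea rather than a working mitigation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_SYSCALLS 128UL /* assumption for the example */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * All ones if index < size, zero otherwise, computed without a branch.
 * Assumes size <= LONG_MAX and an arithmetic right shift of negative
 * values, which holds on common compilers but is implementation-defined.
 */
static unsigned long sketch_index_mask(unsigned long index,
                                       unsigned long size)
{
    return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
                           (SKETCH_BITS_PER_LONG - 1));
}

/* Bounds-checked bitmap lookup with the index clamped after the check. */
static bool sketch_allowed(const unsigned long *allow, unsigned long nr)
{
    assert(allow != NULL);
    if (nr >= SKETCH_NR_SYSCALLS)
        return false;
    /* Even if the check above is mispredicted, nr becomes 0 here. */
    nr &= sketch_index_mask(nr, SKETCH_NR_SYSCALLS);
    return (allow[nr / SKETCH_BITS_PER_LONG] >>
            (nr % SKETCH_BITS_PER_LONG)) & 1UL;
}
```

The real helper additionally hides the index from the optimizer, which
plain C cannot express portably.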