From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.6 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A5FCAC433E0 for ; Thu, 11 Jun 2020 23:31:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7AEF42078C for ; Thu, 11 Jun 2020 23:31:40 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="cQBSeLaO" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726287AbgFKXbj (ORCPT ); Thu, 11 Jun 2020 19:31:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58918 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726270AbgFKXbj (ORCPT ); Thu, 11 Jun 2020 19:31:39 -0400 Received: from mail-pl1-x643.google.com (mail-pl1-x643.google.com [IPv6:2607:f8b0:4864:20::643]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 45D03C08C5C1; Thu, 11 Jun 2020 16:31:39 -0700 (PDT) Received: by mail-pl1-x643.google.com with SMTP id y17so2941586plb.8; Thu, 11 Jun 2020 16:31:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=53niQxqahks3EKNdNU3JdcwDT5/fPF/bXLdvCKXBBLU=; b=cQBSeLaO39fs84L+DCF7EpMGoLq4AUFWvJrTHTKXio6daEkqS+rp1K6pEOm7t3tIx8 i/g2el5SbhHHeGVCBEKrQ+1nymJjnvpcE81waFAIThUZ5oOOQ/uPVzreplkuIHSp8bxZ y5uKxabngYjUdfDvvC66FY+bRyfcx6AyjUQAr1vEGWqKO4bQSfQ6V5PJLiT3EgePVnAy Fqe2xTdjb8sdMpT1DvfJDthxyq3jv21fccco0e0FsY1fouFHUFRwAfk1RB4/LiXZ43JW RrlU6/PV4VOPkHKmkUReX2yTO0lqMF+ltJVHEE+JMZujbibg/Hu50ilDT0etsMEmMSJU p1bw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=53niQxqahks3EKNdNU3JdcwDT5/fPF/bXLdvCKXBBLU=; b=oWC6Y9IrbRFvc++8nZdfKmGBkgWxlSWcy4iDGNoCjWo3XT+aCRJ01W/YmznMULFijt /Vqg33cbUq4UTyXIx8QDdW8uV0THL7/02X/baJY9b3RgreSxAnZOzNY9CXfWWF+krAlw VjKaohnuGjn4sPdNBwN1etq16J3IL6h4JL8751sbObJq5KyoQtZ3zyz1pP6n60b79wWK 3BKf60i8CbMaO2Ro8TalQe2uGySSYbzkFB/Gx0KIk6ZR5jMUWJBqK+N2oldr1/aHR4Uv SxXn8t3NkvNM3uKsWAvYgmbOgBFHAVsJbl0tZSgtsToYf5M87C+PkUvgrqeYZDuJiU6x R2uA== X-Gm-Message-State: AOAM533ZjQ0VV6N7u39+i0lIfsmxLURMlPBIaW0Dm8lNbkTRzgHkiSsU kEJ9kImKYhlt5beZ/xQzzwk= X-Google-Smtp-Source: ABdhPJxmHS9QruoYsSu8Vwr0dt8xezQlGs8fL/ApDuOch0vBMhIPalUbjDCVFD6D25ayMBdlYjWJHA== X-Received: by 2002:a17:902:d711:: with SMTP id w17mr8690412ply.139.1591918298382; Thu, 11 Jun 2020 16:31:38 -0700 (PDT) Received: from ast-mbp.dhcp.thefacebook.com ([2620:10d:c090:400::5:73f9]) by smtp.gmail.com with ESMTPSA id u128sm4078947pfu.148.2020.06.11.16.31.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 11 Jun 2020 16:31:37 -0700 (PDT) Date: Thu, 11 Jun 2020 16:31:34 -0700 From: Alexei Starovoitov To: "Eric W. Biederman" Cc: Tetsuo Handa , Linus Torvalds , Kees Cook , Andrew Morton , Alexei Starovoitov , David Miller , Al Viro , bpf , linux-fsdevel , Daniel Borkmann , Jakub Kicinski , Masahiro Yamada , Gary Lin , Bruno Meneguele Subject: Re: [RFC][PATCH] net/bpfilter: Remove this broken and apparently unmantained Message-ID: <20200611233134.5vofl53dj5wpwp5j@ast-mbp.dhcp.thefacebook.com> References: <202006051903.C44988B@keescook> <875zc4c86z.fsf_-_@x220.int.ebiederm.org> <20200606201956.rvfanoqkevjcptfl@ast-mbp> <20200607014935.vhd3scr4qmawq7no@ast-mbp> <33cf7a57-0afa-9bb9-f831-61cca6c19eba@i-love.sakura.ne.jp> <20200608162306.iu35p4xoa2kcp3bu@ast-mbp.dhcp.thefacebook.com> <87r1uo2ejt.fsf@x220.int.ebiederm.org> <20200609235631.ukpm3xngbehfqthz@ast-mbp.dhcp.thefacebook.com> <87d066vd4y.fsf@x220.int.ebiederm.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87d066vd4y.fsf@x220.int.ebiederm.org> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org On Wed, Jun 10, 2020 at 04:12:29PM -0500, Eric W. Biederman wrote: > Alexei Starovoitov writes: > > > On Tue, Jun 09, 2020 at 03:02:30PM -0500, Eric W. Biederman wrote: > >> Alexei Starovoitov writes: > >> > >> > bpf_lsm is that thing that needs to load and start acting early. > >> > It's somewhat chicken and egg. fork_usermode_blob() will start a process > >> > that will load and apply security policy to all further forks and > >> > execs. > >> > >> What is the timeframe for bpf_lsm patches wanting to use > >> fork_usermode_blob()? > >> > >> Are we possibly looking at something that will be ready for the next > >> merge window? > > > > In bpf space there are these that want to use usermode_blobs: > > 1. bpfilter itself. > > First of all I think we made a mistake delaying landing the main patches: > > https://lore.kernel.org/patchwork/patch/902785/ > > https://lore.kernel.org/patchwork/patch/902783/ > > without them bpfilter is indeed dead. That probably was the reason > > no one was brave enough to continue working on it. > > So I think the landed skeleton of bpfilter can be removed. > > I think no user space code will notice that include/uapi/linux/bpfilter.h > > is gone. So it won't be considered as user space breakage. > > Similarly CONFIG_BPFILTER can be nuked too. > > bpftool is checking for it (see tools/bpf/bpftool/feature.c) > > but it's fine to remove it. > > I still think that the approach taken was a correct one, but > > lifting that project off the ground was too much for three of us. > > So when it's staffed appropriately we can re-add that code. > > > > 2. bpf_lsm. > > It's very active at the moment. I'm working on it as well > > (sleepable progs is targeting that), but I'm not sure when folks > > would have to have it during the boot. So far it sounds that > > they're addressing more critical needs first. "bpf_lsm ready at boot" > > came up several times during "bpf office hours" conference calls, > > so it's certainly on the radar. If I to guess I don't think > > bpf_lsm will use usermode_blobs in the next 6 weeks. > > More likely 2-4 month. > > > > 3. bpf iterator. > > It's already capable extension of several things in /proc. > > See https://lore.kernel.org/bpf/20200509175921.2477493-1-yhs@fb.com/ > > Cat-ing bpf program as "cat /sys/fs/bpf/my_ipv6_route" > > will produce the same human output as "cat /proc/net/ipv6_route". > > The key difference is that bpf is all tracing based and it's unstable. > > struct fib6_info can change and prog will stop loading. > > There are few FIXME in there. That is being addressed right now. > > After that the next step is to make cat-able progs available > > right after boot via usermode_blobs. > > Unlike cases 1 and 2 here we don't care that they appear before pid 1. > > They can certainly be chef installed and started as services. > > But they are kernel dependent, so deploying them to production > > is much more complicated when they're done as separate rpm. > > Testing is harder and so on. Operational issues pile up when something > > that almost like kernel module is done as a separate package. > > Hence usermode_blob fits the best. > > Of course we were not planning to add a bunch of them to kernel tree. > > The idea was to add only _one_ such cat-able bpf prog and have it as > > a selftest for usermode_blob + bpf_iter. What we want our users to > > see in 'cat my_ipv6_route' is probably different from other companies. > > These patches will likely be using usermode_blob() in the next month. > > > > But we don't need to wait. We can make the progress right now. > > How about we remove bpfilter uapi and rename net/bpfilter/bpfilter_kern.c > > into net/umb/umb_test.c only to exercise Makefile to build elf file > > from simple main.c including .S with incbin trick > > and kernel side that does fork_usermode_blob(). > > And that's it. > > net/ipv4/bpfilter/sockopt.c and kconfig can be removed. > > That would be enough base to do use cases 2 and 3 above. > > Having such selftest will be enough to adjust the layering > > for fork_usermode_blob(), right? > > If I understand correctly you are asking people to support out of tree > code. I see some justification for this functionality for in-tree code. > For out of tree code there really is no way to understand support or > maintain the code. It's just like saying that sys_finit_module() is there to support out of tree code. There are in- and out- tree modules and there will be in- and out- of tree bpf programs, but the focus is on those that are relevant for the long term future of the kernel. The 1 case above is in-tree only. There is nothing in bpfilter that makes sense out of tree. The 2 case (bpf_lsm) is primarily in-tree. Security is something everyone wants its own way, but majority of bpf_lsm functionality should live in-tree. The 3 case is mostly out-of-tree. If there was obvious way to extend /proc it could have been in-tree, but no one will agree. > We probably also need to have a conversation about why this > functionality is a better choice that using a compiled in initramfs, > such as can be had by setting CONFIG_INITRAMFS_SOURCE. I explained it several times already. I don't see how initramfs solves 1 and 2. > Even with this write up and the conversations so far I don't understand > what problem fork_usermode_blob is supposed to be solving. Is there > anything kernel version dependent about bpf_lsm? For me the primary > justification of something like fork_usermode_blob is something that is > for all practical purposes a kernel module but it just happens to run in > usermode. that's what it is. It's a kernel module that runs in user space. > From what little I know about bpf_lsm that isn't the case. So far all It is. > you have mentioned is that bpf_lsm needs to load early. That seems like > something that could be solved by a couple of lines init/main.c that > forks and exec's a program before init if it is present. Maybe that > also needs a bit of protection so the bootloader can't override the > binary. > > The entire concept of a loadable lsm has me scratching my head. Last > time that concept was seriously looked at the races for initializing per > object data were difficult enough to deal with modular support was > removed from all of the existing lsms. I'm not sure what races you're talking about. usermode_blob will interact with kernel via syscalls and other standard communication mechanism. > Not to mention there are places where the lsm hooks are a pretty lousy > API and will be refactored to make things better with no thought of any > out of tree code. I don't see how refactoring LSM hooks is relevant in this discussion. > > > If I understood you correctly you want to replace pid_t > > in 'struct umh_info' with proper 'struct pid' pointer that > > is refcounted, so user process's exit is clean? What else? > > No "if (filename)" or "if (file)" on the exec code paths. No extra case > for the LSM's to have to deal with. Nothing fork_usermode_blob does is > something that can't be done from userspace as far as execve is > concerned so there is no justification for any special cases in the core > of the exec code. Adding getname_kernel() instead of filename==NULL is trivial enough and makes sense as a cleanup. But where do you see 'if (file)' ? The correct 'file' pointer is passed from shmem_kernel_file_setup() all the way to exec. > Getting the deny_write_count and the reference count correct on the file > argument as well as getting BPRM_FLAGS_PATH_INACCESSIBLE set. There is no fd because there is no task, but there is a file. I think do_execve should assume BINPRM_FLAGS_PATH_INACCESSIBLE in this case. > Using the proper type for argv and envp. I guess that's going to be a part of other cleanup. > Those are the things I know of that need to be addressed. > > > Getting the code refactored so that the do_open_execat can be called > in do_execveat_common instead of __do_execve_file is enough of a > challenge of code motion I really would rather not do that. Unfortunately that is > the only way I can see right now to have both do_execveat_common and > do_execve_file pass in a struct file. The 'struct file' is there. Please take another look at the code. > Calling deny_write_access and get_file in do_execve_file and probably > a bit more is the only way I can see to cleanly isoloate the special > cases fork_usermode_blob brings to the table. > > > Strictly speaking I am also aware of the issue that the kernel has to > use set_fs(KERNEL_DS) to allow argv and envp to exist in kernel space > instead of userspace. That needs to be fixed as well, but for all > kernel uses of exec. So any work fixing fork_usermode_blob can ignore > that issue. well, this is the problem of usermodehelper_exec. usermode_blob doesn't use argv/envp. They could be NULL for all practical purpose.