From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 19117C433F5 for ; Sun, 6 Feb 2022 04:31:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229569AbiBFE30 (ORCPT ); Sat, 5 Feb 2022 23:29:26 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45330 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229452AbiBFE3X (ORCPT ); Sat, 5 Feb 2022 23:29:23 -0500 Received: from mail-pl1-x632.google.com (mail-pl1-x632.google.com [IPv6:2607:f8b0:4864:20::632]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E1AC0C06173B; Sat, 5 Feb 2022 20:29:21 -0800 (PST) Received: by mail-pl1-x632.google.com with SMTP id z5so8613460plg.8; Sat, 05 Feb 2022 20:29:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Jdhw7mSR7V7cNcaM0UCeu+GU2bmiq3T19a9N118Zppg=; b=Cd+ijFk/74tk8ZNujs8fsDYd1sTXI5V/wGkDR0Fi92TrXWNha4DUr2B62eZNiFm4lw lornrc25uXGBV8m9I7uqUKok5hGpfLrnjmPt3u1ZAFfeKVXtpqOkOCJO/EWgx7vrWJdC 2WF5ogO26iW7WTGKMGpEJCJHESEVdlI/DG5JWyEXLoLBDVvyvDEwTUW7/RkDop7INl44 LckVO0k9clUROBR5+IgXhfP2UEB3gr23nWPfxBaoldKDCj015MM60H/CEiKLVTh2EH24 nTFIlyztZGJ2gDziCu6w5Q09K/7k9n5eCid3hIs94RwI4c7UFNw/xvjtTykZ7uImKVAl KBow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Jdhw7mSR7V7cNcaM0UCeu+GU2bmiq3T19a9N118Zppg=; b=dixe0EsVrjLPWcomXwFVfdeclgc46Ei/g3+zP2aTNNxQJxzcnap9dqSDhfLW5RqSEl kgk4Qt3Yssu6JSFWkS+b+Zi8/BMs2/NBR/zzJCzFTFx6zIrLWh0FlNfdBxlYg00j6DU5 KHBMiaiD23ppilF52kaR5xq6o0DwohKPxJbDAmn9lVeAUj1G0CfMpcPOIV3Uvpxta83h /Mcsbj1MvASdATbPZiwFI5C6fFwe/xizfqeOl1svodQgfDKqrS5yzNgH/ZnrAMwHTAWF Y+5CFMZcXXz4VaMiqtUHafhbYIfxY5M+muOGo1esFARZjezL5+mcllapVLLY1jg+DpDd 5YPQ== X-Gm-Message-State: AOAM5303UEZwtxnqrDMw3kMP6gG6avflh7MzxSFlPxMo+5zoFxq1MnZ2 GTPG6czZXc/C/FRjGe8wPeeyVOsffDzIMlo6OLqdS8Xe X-Google-Smtp-Source: ABdhPJxSbBYDlrcxjq4UdQFxjJ6+qrQz4UnDrLrww5Fn3AkoCYVzGO+fdu52p+w79IDqfhZo1gZK2VPPm5RnUpN047Y= X-Received: by 2002:a17:90b:4ac6:: with SMTP id mh6mr7409716pjb.138.1644121761313; Sat, 05 Feb 2022 20:29:21 -0800 (PST) MIME-Version: 1.0 References: <20220201205534.1962784-1-haoluo@google.com> <20220201205534.1962784-6-haoluo@google.com> <20220203180414.blk6ou3ccmod2qck@ast-mbp.dhcp.thefacebook.com> In-Reply-To: From: Alexei Starovoitov Date: Sat, 5 Feb 2022 20:29:10 -0800 Message-ID: Subject: Re: [PATCH RFC bpf-next v2 5/5] selftests/bpf: test for pinning for cgroup_view link To: Hao Luo Cc: Alexei Starovoitov , Andrii Nakryiko , Daniel Borkmann , Martin KaFai Lau , Song Liu , Yonghong Song , KP Singh , Shakeel Butt , Joe Burton , Stanislav Fomichev , bpf , LKML Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Feb 4, 2022 at 10:27 AM Hao Luo wrote: > > > > > In our use case, we can't ask the users who create cgroups to do the > > > pinning. Pinning requires root privilege. In our use case, we have > > > non-root users who can create cgroup directories and still want to > > > read bpf stats. They can't do pinning by themselves. This is why > > > inheritance is a requirement for us. With inheritance, they only need > > > to mkdir in cgroupfs and bpffs (unprivileged operations), no pinning > > > operation is required. Patch 1-4 are needed to implement inheritance. > > > > > > It's also not a good idea in our use case to add a userspace > > > privileged process to monitor cgroupfs operations and perform the > > > pinning. It's more complex and has a higher maintenance cost and > > > runtime overhead, compared to the solution of asking whoever makes > > > cgroups to mkdir in bpffs. The other problem is: if there are nodes in > > > the data center that don't have the userspace process deployed, the > > > stats will be unavailable, which is a no-no for some of our users. > > > > The commit log says that there will be a daemon that does that > > monitoring of cgroupfs. And that daemon needs to mkdir > > directories in bpffs when a new cgroup is created, no? > > The kernel is only doing inheritance of bpf progs into > > new dirs. I think that daemon can pin as well. > > > > The cgroup creation is typically managed by an agent like systemd. > > Sounds like you have your own agent that creates cgroups? > > If so it has to be privileged and it can mkdir in bpffs and pin too ? > > Ah, yes, we have our own daemon to manage cgroups. That daemon creates > the top-level cgroup for each job to run inside. However, the job can > create its own cgroups inside the top-level cgroup, for fine grained > resource control. This doesn't go through the daemon. The job-created > cgroups don't have the pinned objects and this is a no-no for our > users. We can whitelist certain tracepoints to be sleepable and extend tp_btf prog type to include everything from prog_type_syscall. Such prog would attach to cgroup_mkdir and cgroup_release and would call bpf_sys_bpf() helper to pin progs in new bpffs dirs. We can allow prog_type_syscall to do mkdir in bpffs as well. This feature could be useful for similar monitoring/introspection tasks. We can write a program that would monitor bpf prog load/unload and would pin an iterator prog that would show debug info about a prog. Like cat /sys/fs/bpf/progs.debug shows a list of loaded progs. With this feature we can implement: ls /sys/fs/bpf/all_progs.debug/ and each loaded prog would have a corresponding file. The file name would be a program name, for example. cat /sys/fs/bpf/all_progs.debug/my_prog would pretty print info about 'my_prog' bpf program. This way the kernfs/cgroupfs specific logic from patches 1-4 will not be necessary. wdyt?