From: Quentin Monnet
Date: Wed, 29 Jun 2022 12:11:20 +0100
Subject: Re: [PATCH bpf-next] bpftool: Probe for memcg-based accounting before bumping rlimit
To: Stanislav Fomichev
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Yafang Shao, netdev@vger.kernel.org, bpf@vger.kernel.org

On 28/06/2022 18:53, Stanislav Fomichev wrote:
> On Tue, Jun 28, 2022 at 9:45 AM Quentin Monnet wrote:
>>
>> Bpftool used to bump the memlock rlimit to make sure to be able to load
>> BPF objects. After the kernel has switched to memcg-based memory
>> accounting [0] in 5.11, bpftool has relied on libbpf to probe the system
>> for memcg-based accounting support and for raising the rlimit if
>> necessary [1]. But this was later reverted, because the probe would
>> sometimes fail, resulting in bpftool not being able to load all required
>> objects [2].
>>
>> Here we add a more efficient probe, in bpftool itself. We first lower
>> the rlimit to 0, then we attempt to load a BPF object (and finally reset
>> the rlimit): if the load succeeds, then memcg-based memory accounting is
>> supported.
>>
>> This approach was earlier proposed for the probe in libbpf itself [3],
>> but given that the library may be used in multithreaded applications,
>> the probe could have undesirable consequences if one thread attempts to
>> lock kernel memory while memlock rlimit is at 0. Since bpftool is
>> single-threaded and the rlimit is process-based, this is fine to do in
>> bpftool itself.
>>
>> This probe was inspired by the similar one from the cilium/ebpf Go
>> library [4].
>>
>> [0] commit 97306be45fbe ("Merge branch 'switch to memcg-based memory accounting'")
>> [1] commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK")
>> [2] commit 6b4384ff1088 ("Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"")
>> [3] https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/t/#u
>> [4] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
>>
>> Cc: Stanislav Fomichev
>> Cc: Yafang Shao
>> Suggested-by: Daniel Borkmann
>> Signed-off-by: Quentin Monnet
>> ---
>>  tools/bpf/bpftool/common.c   | 71 ++++++++++++++++++++++++++++++++++--
>>  tools/include/linux/kernel.h |  5 +++
>>  2 files changed, 73 insertions(+), 3 deletions(-)
>>
>> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
>> index a0d4acd7c54a..e07769802f76 100644
>> --- a/tools/bpf/bpftool/common.c
>> +++ b/tools/bpf/bpftool/common.c
>> @@ -13,14 +13,17 @@
>>  #include
>>  #include
>>  #include
>> -#include
>> -#include
>>  #include
>>  #include
>>  #include
>>  #include
>>  #include
>>
>> +#include
>> +#include
>> +#include
>> +#include
>> +
>>  #include
>>  #include
>>  #include /* libbpf_num_possible_cpus */
>> @@ -73,11 +76,73 @@ static bool is_bpffs(char *path)
>>  	return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
>>  }
>>
>> +/* Probe whether kernel switched from memlock-based (RLIMIT_MEMLOCK) to
>> + * memcg-based memory accounting for BPF maps and programs. This was done in
>> + * commit 97306be45fbe ("Merge branch 'switch to memcg-based memory
>> + * accounting'"), in Linux 5.11.
>> + *
>> + * Libbpf also offers to probe for memcg-based accounting vs rlimit, but does
>> + * so by checking for the availability of a given BPF helper and this has
>> + * failed on some kernels with backports in the past, see commit 6b4384ff1088
>> + * ("Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"").
>> + * Instead, we can probe by lowering the process-based rlimit to 0, trying to
>> + * load a BPF object, and resetting the rlimit. If the load succeeds then
>> + * memcg-based accounting is supported.
>> + *
>> + * This would be too dangerous to do in the library, because multithreaded
>> + * applications might attempt to load items while the rlimit is at 0. Given
>> + * that bpftool is single-threaded, this is fine to do here.
>> + */
>> +static bool known_to_need_rlimit(void)
>> +{
>> +	const size_t prog_load_attr_sz = offsetofend(union bpf_attr, attach_btf_obj_fd);
>
> nit:
> Any specific reason you're hard coding this sz via offsetofend? Why not
> use sizeof(bpf_attr) directly as a syscall/memset size?
> The kernel should handle all these cases where bpftool has extra zero
> padding, right?

No particular reason. Good point, I'll send a v2 to address this.

Thanks,
Quentin
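For illustration, the probe described in the patch (lower the soft RLIMIT_MEMLOCK to 0, try to load a BPF object, restore the limit) might look roughly like the plain C sketch below, already applying the review suggestion to zero the whole `union bpf_attr` and pass `sizeof(attr)` to the syscall. This is not the code from the patch or its v2. The helper name `probe_needs_memlock_rlimit()` is hypothetical. Using a two-instruction socket filter as the probe object is an assumption, and so is treating EPERM as the sign of memlock-based accounting. EPERM is also what an unprivileged or seccomp-restricted process gets, so in that case the sketch errs towards "raise the rlimit". Like bpftool, it assumes a single-threaded process, since the rlimit is process-wide.

```c
#define _GNU_SOURCE /* for syscall() */
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/bpf.h>

/* Hypothetical helper (not bpftool's code): returns true if BPF objects seem
 * to be charged against RLIMIT_MEMLOCK, i.e. the caller should raise it.
 * Not thread-safe: it briefly sets the process-wide soft limit to 0. */
bool probe_needs_memlock_rlimit(void)
{
	/* Smallest valid program, "r0 = 0; exit", built by hand because the
	 * BPF_MOV64_IMM()/BPF_EXIT_INSN() macros are not in the UAPI headers. */
	struct bpf_insn insns[] = {
		{ .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 },
		{ .code = BPF_JMP | BPF_EXIT },
	};
	static const char license[] = "GPL";
	struct rlimit rlim_init, rlim_zero;
	union bpf_attr attr;
	int prog_fd, err;

	/* Zero the whole union and pass sizeof(attr) as the size: bpf(2)
	 * accepts an attr larger than the kernel's own as long as the extra
	 * trailing bytes are zero. */
	memset(&attr, 0, sizeof(attr));
	attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
	attr.insns = (__u64)(unsigned long)insns;
	attr.insn_cnt = sizeof(insns) / sizeof(insns[0]);
	attr.license = (__u64)(unsigned long)license;

	if (getrlimit(RLIMIT_MEMLOCK, &rlim_init))
		return false;

	/* Lower only the soft limit; an unprivileged process could not raise
	 * the hard limit back afterwards. */
	rlim_zero.rlim_cur = 0;
	rlim_zero.rlim_max = rlim_init.rlim_max;
	if (setrlimit(RLIMIT_MEMLOCK, &rlim_zero))
		return false;

	/* Raw syscall, so no library wrapper bumps the rlimit behind our back. */
	prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
	err = errno;

	/* Restore the initial soft limit whatever the outcome. */
	setrlimit(RLIMIT_MEMLOCK, &rlim_init);

	if (prog_fd < 0)
		/* Memlock accounting with a zero limit makes the load fail with
		 * EPERM; other errors say nothing about accounting. */
		return err == EPERM;

	/* Load succeeded with a zero limit: memcg-based accounting. */
	close(prog_fd);
	return false;
}
```

A caller would then raise RLIMIT_MEMLOCK only when the helper returns true, leaving the limit untouched on kernels with memcg-based accounting.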