From: Ho-Ren Chuang
Date: Mon, 6 Feb 2023 16:48:56 -0800
Subject: Re: [PATCH bpf-next 0/7] bpf, mm: bpf memory usage
To: Yafang Shao
Cc: John Fastabend, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
    kafai@fb.com, songliubraving@fb.com, yhs@fb.com, kpsingh@kernel.org,
    sdf@google.com, haoluo@google.com, jolsa@kernel.org, tj@kernel.org,
    dennis@kernel.org, cl@linux.com, akpm@linux-foundation.org,
    penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com,
    roman.gushchin@linux.dev, 42.hyeyoo@gmail.com, vbabka@suse.cz,
    urezki@gmail.com, linux-mm@kvack.org, bpf@vger.kernel.org,
    hao.xiang@bytedance.com, yifeima@bytedance.com, Xiaoning Ding,
    horenchuang@bytedance.com
References: <20230202014158.19616-1-laoar.shao@gmail.com>
    <63ddbfd9ae610_6bb1520861@john.notmuch>

Hi Yafang and everyone,

We've proposed very similar features
at https://lore.kernel.org/bpf/CAAYibXgiCOOEY9NvLXbY4ve7pH8xWrZjnczrj6SHy3x_TtOU1g@mail.gmail.com/#t

We are very excited to see that we are not the only ones eager to have this
feature upstream for monitoring the actual memory usage of eBPF maps. This
shows the need for such an ability in eBPF.

Regarding the use cases, please also check
https://lore.kernel.org/all/CAADnVQLBt0snxv4bKwg1WKQ9wDFbaDCtZ03v1-LjOTYtsKPckQ@mail.gmail.com/#t
We are developing an app that monitors the memory footprint of eBPF
programs and maps, similar to the Linux `top` command.

Thank you,

On Sat, Feb 4, 2023 at 8:03 PM Yafang Shao wrote:

> On Sat, Feb 4, 2023 at 10:15 AM John Fastabend wrote:
> >
> > Yafang Shao wrote:
> > > Currently we can't get bpf memory usage reliably. bpftool now shows the
> > > bpf memory footprint, which is different from bpf memory usage. The
> > > difference between the footprint shown in bpftool and the memory
> > > actually allocated by bpf can be quite large in some cases, for example,
> > >
> > > - non-preallocated bpf map
> > >   The non-preallocated bpf map memory usage changes dynamically. The
> > >   allocated element count can be from 0 to the max entries. But the
> > >   memory footprint in bpftool only shows a fixed number.
> > > - bpf metadata consumes more memory than bpf element
> > >   In some corner cases, the bpf metadata can consume a lot more memory
> > >   than the bpf elements consume. For example, it can happen when the
> > >   element size is quite small.
> >
> > Just following up slightly on previous comment.
> >
> > The metadata should be fixed and knowable, correct?
>
> The metadata of BPF itself is fixed, but the metadata of MM allocation
> depends on the kernel configuration.
>
> > What I'm getting at
> > is if this can be calculated directly instead of through a BPF helper
> > and walking the entire map.
> >
>
> As I explained in another thread, it doesn't walk the entire map.
>
> > >
> > > We need a way to get the bpf memory usage, especially as there will be
> > > more and more bpf programs running in production environments, and thus
> > > the bpf memory usage is not trivial.
> >
> > In our environments we track map usage so we always know how many entries
> > are in a map. I don't think we use this to calculate memory footprint
> > at the moment, but just for map usage. It seems, though, that once you
> > have this, calculating memory footprint can be done out of band because
> > element and overhead costs are fixed.
> >
> > >
> > > This patchset introduces a new map ops ->map_mem_usage to get the memory
> > > usage. In this ops, the memory usage is obtained from the pointers which
> > > are already allocated by a bpf map. To keep the code simple, we ignore
> > > some small pointers as their sizes are quite small compared with the
> > > total usage.
> > >
> > > In order to get the memory size from the pointers, some generic mm
> > > helpers are introduced first, for example, percpu_size(), vsize() and
> > > kvsize().
> > >
> > > This patchset only implements the bpf memory usage for hashtab. I will
> > > extend it to other maps and bpf progs (bpf progs can dynamically allocate
> > > memory via bpf_obj_new()) in the future.
> >
> > My preference would be to calculate this out of band. Walking a
> > large map and doing it in a critical section to get the memory
> > usage seems not optimal.
> >
>
> I don't quite understand what you mean by calculating it out of band.
> This patchset introduces a BPF helper which is used in bpftool, so it
> is already out of band, right?
> We should do it in bpftool, because the sys admin wants a generic way
> to get the system-wide bpf memory usage.
>
> --
> Regards
> Yafang
>

-- 
Best regards,
Ho-Ren (Jack) Chuang
莊賀任
1(540)449-9833
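For readers following the thread, the "out of band" calculation John describes can be sketched roughly as below. This is only an illustration under stated assumptions, not the kernel's accounting and not what the patchset's ->map_mem_usage does: the class HashMapLayout, the helper names, the 8-byte rounding, and every byte count are hypothetical. In particular, treating the per-element overhead as a constant is exactly the assumption Yafang questions, since the MM allocation metadata depends on the kernel configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HashMapLayout:
    """Hypothetical sizes in bytes; none of these values come from the kernel."""
    key_size: int
    value_size: int
    per_elem_overhead: int  # assumed constant bookkeeping per element
    fixed_overhead: int     # assumed constant cost of the map itself
    max_entries: int


def round_up(n: int, align: int = 8) -> int:
    """Round n up to a multiple of align (assumed alignment, for illustration)."""
    return (n + align - 1) // align * align


def elem_bytes(layout: HashMapLayout) -> int:
    """Assumed cost of one element: overhead plus padded key and value."""
    return (layout.per_elem_overhead
            + round_up(layout.key_size)
            + round_up(layout.value_size))


def static_footprint(layout: HashMapLayout) -> int:
    """A fixed figure that assumes all max_entries elements are allocated."""
    return layout.fixed_overhead + layout.max_entries * elem_bytes(layout)


def estimated_usage(layout: HashMapLayout, live_entries: int) -> int:
    """Estimate for a non-preallocated map from a tracked element count."""
    if not 0 <= live_entries <= layout.max_entries:
        raise ValueError("live_entries must be between 0 and max_entries")
    return layout.fixed_overhead + live_entries * elem_bytes(layout)


if __name__ == "__main__":
    # Small keys and values: the assumed overhead dominates each element.
    layout = HashMapLayout(key_size=4, value_size=4, per_elem_overhead=48,
                           fixed_overhead=4096, max_entries=1000)
    print("per element:", elem_bytes(layout))
    print("fixed footprint:", static_footprint(layout))
    print("estimate at 10 entries:", estimated_usage(layout, 10))
```

With these made-up numbers, a nearly empty non-preallocated map is estimated far below the fixed figure, and the payload is a small fraction of each element, which mirrors the two cases listed in the cover letter. Whether such an estimate is accurate enough depends on how stable the real per-element overhead is on a given kernel.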