bpf.vger.kernel.org archive mirror
From: John Fastabend <john.fastabend@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	 Anton Protopopov <aspsk@isovalent.com>
Cc: Martin KaFai Lau <martin.lau@kernel.org>,
	 bpf <bpf@vger.kernel.org>,  Joe Stringer <joe@isovalent.com>,
	 John Fastabend <john.fastabend@gmail.com>
Subject: Re: [PATCH bpf-next 1/2] bpf: add new map ops ->map_pressure
Date: Fri, 23 Jun 2023 17:00:15 -0700	[thread overview]
Message-ID: <6496320f40706_7b3662086f@john.notmuch> (raw)
In-Reply-To: <CAADnVQLn2hxXPXbmPXMn4G6=jCBd6Xmty7RO2bY+S-GiS8NJ6w@mail.gmail.com>

Alexei Starovoitov wrote:
> On Fri, Jun 2, 2023 at 7:20 AM Anton Protopopov <aspsk@isovalent.com> wrote:
> >
> > On Thu, Jun 01, 2023 at 05:40:10PM -0700, Alexei Starovoitov wrote:
> > > On Thu, Jun 1, 2023 at 11:24 AM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Thu, Jun 1, 2023 at 11:17 AM Anton Protopopov <aspsk@isovalent.com> wrote:
> > > > > >
> > > > > > LRU logic doesn't kick in until the map is full.
> > > > >
> > > > > In fact, it can: a reproducible example is in the self-test from this patch
> > > > > series. In the test, N threads try to insert random values for keys 1..3000
> > > > > simultaneously. As a result, the map may contain any number of elements,
> > > > > typically 100 to 1000 (never the full 3000, which is also less than the map
> > > > > size). So a user can't even roughly estimate the number of elements in the
> > > > > LRU map based on the number of updates (with unique keys). A per-cpu counter
> > > > > inc/dec'ed from the kernel side would solve this.
> > > >
> > > > That's odd and unexpected.
> > > > Definitely something to investigate and fix in the LRU map.
> > > >
> > > > Pls cc Martin in the future.
> > > >
> > > > > > If your LRU map is not full you shouldn't be using LRU in the first place.
> > > > >
> > > > > This makes sense, yes, especially since LRU evictions may happen randomly,
> > > > > without the map being full. I will step back with this patch until we
> > > > > investigate whether we can replace LRUs with hashes.
> > > > >
> > > > > Thanks for the comments!
> > >
> > > Thinking about it more...
> > > since you're proposing to use percpu counter unconditionally for prealloc
> > > and percpu_counter_add_batch() logic is batched,
> > > it could actually be acceptable if it's paired with non-api access.
> > > Like another patch can add generic kfunc to do __percpu_counter_sum()
> > > and in the 3rd patch kernel/bpf/preload/iterators/iterators.bpf.c
> > > for maps can be extended to print the element count, so the user can have
> > > convenient 'cat /sys/fs/bpf/maps.debug' way to debug maps.
> > >
> > > But additional logic of percpu_counter_add_batch() might get in the way
> > > of debugging eventually.
> > > If we want to have stats then we can have normal per-cpu u32 in basic
> > > struct bpf_map that most maps, except array, will inc/dec on update/delete.
> > > kfunc to iterate over percpu is still necessary.
> > > This way we will be able to see not only the number of elements, but also
> > > detect bad usage where one cpu is only adding and another cpu is only
> > > deleting elements, and other cpu imbalances.
> >
> > This looks to me like two different things: one is a kfunc to get the current
> > counter (e.g., bpf_map_elements_count), the other is a kfunc to dump some more
> > detailed stats (e.g., per-cpu values or more).
> >
> > My patch, slightly modified, addresses the first goal: most maps of interest
> > already have a counter in some form (sometimes just atomic_t or u64+lock). If
> > we add a percpu (non-batch) counter for pre-allocated hashmaps, then it's done:
> > the new kfunc can get the counter based on the map type.
> >
> > If/when there's need to provide per-cpu statistics of elements or some more
> > sophisticated statistics, this can be done without changing the api of the
> > bpf_map_elements_count() kfunc.
> >
> > Would this work?
> 
> No, because bpf_map_elements_count() as a building block is too big
> and too specific. Nothing else can be made out of it, but counting
> elements.
> "for_each_cpu in per-cpu variable" would be a generic building block, usable
> beyond this particular use case of stats collection.

Without much thought: could you hook the LRU eviction logic so we know when
an evict happens, and even get more details about what was evicted? That
would let us debug the random case where we evict something from a conntrack
table and it later comes back to life and sends some data, like a
long-living UDP session.

For example, in the cases where you build an LRU map because in 99% of cases
no evictions happen and the LRU is just there as a backstop, you might even
generate events to userspace to let it know evicts are in progress and it
should do something about it.

Thanks,
John


Thread overview: 19+ messages
2023-05-31 11:05 [PATCH bpf-next 0/2] add mechanism to report map pressure Anton Protopopov
2023-05-31 11:05 ` [PATCH bpf-next 1/2] bpf: add new map ops ->map_pressure Anton Protopopov
2023-05-31 18:24   ` Alexei Starovoitov
2023-06-01  7:31     ` Anton Protopopov
2023-06-01 16:39       ` Alexei Starovoitov
2023-06-01 18:18         ` Anton Protopopov
2023-06-01 18:24           ` Alexei Starovoitov
2023-06-02  0:40             ` Alexei Starovoitov
2023-06-02 14:21               ` Anton Protopopov
2023-06-02 16:23                 ` Alexei Starovoitov
2023-06-06  7:49                   ` Anton Protopopov
2023-06-24  0:00                   ` John Fastabend [this message]
2023-06-26 16:29                     ` Anton Protopopov
2023-06-28 13:17             ` Anton Protopopov
2023-06-01  0:44   ` kernel test robot
2023-06-01  7:50     ` Anton Protopopov
2023-06-13  8:23       ` Yujie Liu
2023-06-13  8:30         ` Anton Protopopov
2023-05-31 11:05 ` [PATCH bpf-next 2/2] selftests/bpf: test map pressure Anton Protopopov
