From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Brian Vazquez <brianvv.kernel@gmail.com>
Cc: Song Liu <liu.song.a23@gmail.com>,
Brian Vazquez <brianvv@google.com>,
Daniel Borkmann <daniel@iogearbox.net>,
"David S . Miller" <davem@davemloft.net>,
Stanislav Fomichev <sdf@google.com>,
Willem de Bruijn <willemb@google.com>,
Petar Penkov <ppenkov@google.com>,
Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
Yonghong Song <yhs@fb.com>
Subject: Re: [PATCH bpf-next 2/6] bpf: add BPF_MAP_DUMP command to dump more than one entry per call
Date: Thu, 25 Jul 2019 18:47:10 -0700 [thread overview]
Message-ID: <CAADnVQJpp37fXLsu8ZnMFPoC0Uof3roz4gofX0QCewNkwtf-Xg@mail.gmail.com> (raw)
In-Reply-To: <CABCgpaXE=dkBcJVqs95NZQTFuznA-q64kYPEcbvmYvAJ4wSp1A@mail.gmail.com>
On Thu, Jul 25, 2019 at 6:24 PM Brian Vazquez <brianvv.kernel@gmail.com> wrote:
>
> On Thu, Jul 25, 2019 at 4:54 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Jul 25, 2019 at 04:25:53PM -0700, Brian Vazquez wrote:
> > > > > > If prev_key is deleted before map_get_next_key(), we get the first key
> > > > > > again. This is pretty weird.
> > > > >
> > > > > Yes, I know. But note that the current scenario happens even for the
> > > > > old interface (imagine you are walking a map from userspace and you
> > > > > tried get_next_key the prev_key was removed, you will start again from
> > > > > the beginning without noticing it).
> > > > > I tried to sent a patch in the past but I was missing some context:
> > > > > before NULL was used to get the very first_key the interface relied in
> > > > > a random (non existent) key to retrieve the first_key in the map, and
> > > > > I was told what we still have to support that scenario.
> > > >
> > > > BPF_MAP_DUMP is slightly different, as you may return the first key
> > > > multiple times in the same call. Also, BPF_MAP_DUMP is new, so we
> > > > don't have to support legacy scenarios.
> > > >
> > > > Since BPF_MAP_DUMP keeps a list of elements. It is possible to try
> > > > to look up previous keys. Would something down this direction work?
> > >
> > > I've been thinking about it and I think first we need a way to detect
> > > that since key was not present we got the first_key instead:
> > >
> > > - One solution I had in mind was to explicitly asked for the first key
> > > with map_get_next_key(map, NULL, first_key) and while walking the map
> > > check that map_get_next_key(map, prev_key, key) doesn't return the
> > > same key. This could be done using memcmp.
> > > - Discussing with Stan, he mentioned that another option is to support
> > > a flag in map_get_next_key to let it know that we want an error
> > > instead of the first_key.
> > >
> > > After detecting the problem we also need to define what we want to do,
> > > here some options:
> > >
> > > a) Return the error to the caller
> > > b) Try with previous keys if any (which be limited to the keys that we
> > > have traversed so far in this dump call)
> > > c) continue with next entries in the map. array is easy just get the
> > > next valid key (starting on i+1), but hmap might be difficult since
> > > starting on the next bucket could potentially skip some keys that were
> > > concurrently added to the same bucket where key used to be, and
> > > starting on the same bucket could lead us to return repeated elements.
> > >
> > > Or maybe we could support those 3 cases via flags and let the caller
> > > decide which one to use?
> >
> > this type of indecision is the reason why I wasn't excited about
> > batch dumping in the first place and gave 'soft yes' when Stan
> > mentioned it during lsf/mm/bpf uconf.
> > We probably shouldn't do it.
> > It feels this map_dump makes api more complex and doesn't really
> > give much benefit to the user other than large map dump becomes faster.
> > I think we gotta solve this problem differently.
>
> Some users are working around the dumping problems with the existing
> api by creating a bpf_map_get_next_key_and_delete userspace function
> (see https://www.bouncybouncy.net/blog/bpf_map_get_next_key-pitfalls/)
> which in my opinion is actually a good idea. The only problem with
> that is that calling bpf_map_get_next_key(fd, key, next_key) and then
> bpf_map_delete_elem(fd, key) from userspace is racing with kernel code
> and it might lose some information when deleting.
> We could then do map_dump_and_delete using that idea but in the kernel
> where we could better handle the racing condition. In that scenario
> even if we retrieve the same key it will contain different info ( the
> delta between old and new value). Would that work?
you mean get_next+lookup+delete at once?
Sounds useful.
Yonghong has been thinking about batching api as well.
I think if we cannot figure out how to make a batch of two commands
get_next + lookup to work correctly then we need to identify/invent one
command and make batching more generic.
Like make one jumbo/compound/atomic command to be get_next+lookup+delete.
Define the semantics of this single compound command.
And then let batching to be a multiplier of such command.
In a sense that multiplier 1 or N should be have the same way.
No extra flags to alter the batching.
The high level description of the batch would be:
pls execute get_next,lookup,delete and repeat it N times.
or
pls execute get_next,lookup and repeat N times.
where each command action is defined to be composable.
Just a rough idea.
next prev parent reply other threads:[~2019-07-26 1:47 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-07-24 16:57 [PATCH bpf-next 0/6] bpf: add BPF_MAP_DUMP command to dump more than one entry per call Brian Vazquez
2019-07-24 16:57 ` [PATCH bpf-next 1/6] bpf: add bpf_map_value_size and bp_map_copy_value helper functions Brian Vazquez
2019-07-24 20:53 ` Song Liu
2019-07-24 16:57 ` [PATCH bpf-next 2/6] bpf: add BPF_MAP_DUMP command to dump more than one entry per call Brian Vazquez
2019-07-24 19:54 ` Willem de Bruijn
2019-07-24 22:26 ` Brian Vazquez
2019-07-24 22:33 ` Willem de Bruijn
2019-07-24 21:40 ` Song Liu
2019-07-24 22:44 ` Brian Vazquez
2019-07-24 23:04 ` Song Liu
2019-07-25 23:25 ` Brian Vazquez
2019-07-25 23:54 ` Alexei Starovoitov
2019-07-26 1:02 ` Willem de Bruijn
2019-07-26 1:24 ` Brian Vazquez
2019-07-26 1:47 ` Alexei Starovoitov [this message]
2019-07-26 6:10 ` Yonghong Song
2019-07-26 23:36 ` Brian Vazquez
2019-07-27 0:02 ` Jakub Kicinski
2019-07-27 17:54 ` Yonghong Song
2019-07-24 16:58 ` [PATCH bpf-next 3/6] bpf: keep bpf.h in sync with tools/ Brian Vazquez
2019-07-24 21:41 ` Song Liu
2019-07-24 23:10 ` Andrii Nakryiko
2019-07-25 23:27 ` Brian Vazquez
2019-07-24 16:58 ` [PATCH bpf-next 4/6] libbpf: support BPF_MAP_DUMP command Brian Vazquez
2019-07-24 19:51 ` Willem de Bruijn
2019-07-24 16:58 ` [PATCH bpf-next 5/6] selftests/bpf: test BPF_MAP_DUMP command on a bpf hashmap Brian Vazquez
2019-07-24 21:58 ` Song Liu
2019-07-24 16:58 ` [PATCH bpf-next 6/6] selftests/bpf: add test to measure performance of BPF_MAP_DUMP Brian Vazquez
2019-07-24 22:01 ` Song Liu
2019-07-24 19:20 ` [PATCH bpf-next 0/6] bpf: add BPF_MAP_DUMP command to dump more than one entry per call Song Liu
2019-07-24 22:15 ` Brian Vazquez
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAADnVQJpp37fXLsu8ZnMFPoC0Uof3roz4gofX0QCewNkwtf-Xg@mail.gmail.com \
--to=alexei.starovoitov@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=brianvv.kernel@gmail.com \
--cc=brianvv@google.com \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=liu.song.a23@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=ppenkov@google.com \
--cc=sdf@google.com \
--cc=willemb@google.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).