* Re: Suggestions on iterating eBPF maps
From: Chenbo Feng @ 2018-04-27 18:33 UTC (permalink / raw)
  To: netdev, Alexei Starovoitov, Daniel Borkmann
  Cc: Lorenzo Colitti, Joel Fernandes

Resending in plain text.

On Fri, Apr 27, 2018 at 11:22 AM Chenbo Feng <fengc@google.com> wrote:

> Hi net-next,

> While developing user-space tooling for eBPF, I noticed that the
> process of iterating over a map from user space has some flaws. If we
> want to dump the whole map, the only way I know of is to start the
> iteration with a null key and keep calling bpf_map_get_next_key() and
> bpf_map_lookup_elem() for each new key/value pair until we reach the
> end of the map. I noticed the recently added bpftool uses a similar
> approach.
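For reference, a minimal sketch of that dump loop, assuming libbpf's
syscall wrappers and a hypothetical map with u32 keys and u64 values:

  /* Sketch only: map_fd and the key/value types are assumptions. */
  #include <stdio.h>
  #include <linux/types.h>
  #include <bpf/bpf.h>            /* libbpf syscall wrappers */

  static void dump_map(int map_fd)
  {
          __u32 key, next_key;
          __u64 value;
          void *prev = NULL;      /* a NULL key yields the first entry */

          while (bpf_map_get_next_key(map_fd, prev, &next_key) == 0) {
                  if (bpf_map_lookup_elem(map_fd, &next_key, &value) == 0)
                          printf("%u -> %llu\n", next_key,
                                 (unsigned long long)value);
                  key = next_key;
                  prev = &key;    /* continue from the key just seen */
          }
  }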

> The overhead of the repeated syscalls is acceptable, but the race
> that comes with this iteration process is a little annoying. If the
> current key gets deleted before we make the syscall to fetch the next
> key, the "next" key returned will start from the beginning of the map
> again, so some entries will be dumped twice, depending on the
> position of the deleted key. If the race is confined to a single
> userspace process, it can easily be fixed with some read/write locks.
> However, if multiple processes are reading the map through a pinned
> fd while one process is editing entries, or the kernel program is
> deleting entries, it becomes much harder to get a consistent and
> correct map dump.

> We are wondering whether there is an existing implementation in the
> mainline kernel, which we may have missed, that improves this
> iteration process and addresses the race described above. If not,
> what can be done to address it? One idea we came up with is to use a
> single-entry bpf map as a cross-process lock, to prevent multiple
> userspace processes from reading/writing the other maps at the same
> time. But I don't know how safe that solution is, since there would
> still be a race between reading the lock map's value and taking the
> lock.
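For what it's worth, here is a sketch of how such a lock map could
avoid the read-then-set race, relying on the fact that an update with
BPF_NOEXIST atomically fails with EEXIST if the entry already exists.
lock_fd and the key/value layout are assumptions, and a crashed holder
would leave the lock held, so this illustrates only the atomicity, not
a robust lock:

  /* Hypothetical cross-process lock over a single-entry BPF hash map.
   * bpf_map_update_elem() with BPF_NOEXIST is an atomic
   * create-if-absent, so there is no separate read-then-set window. */
  #include <unistd.h>
  #include <linux/types.h>
  #include <bpf/bpf.h>

  static int try_lock(int lock_fd)        /* 0 on success, -1 if held */
  {
          __u32 key = 0;
          __u64 owner = getpid();

          return bpf_map_update_elem(lock_fd, &key, &owner, BPF_NOEXIST);
  }

  static void unlock(int lock_fd)
  {
          __u32 key = 0;

          bpf_map_delete_elem(lock_fd, &key);
  }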

> Thanks
> Chenbo Feng


* Re: Suggestions on iterating eBPF maps
From: Alexei Starovoitov @ 2018-04-28  1:04 UTC (permalink / raw)
  To: Chenbo Feng; +Cc: netdev, Daniel Borkmann, Lorenzo Colitti, Joel Fernandes

On Fri, Apr 27, 2018 at 06:33:56PM +0000, Chenbo Feng wrote:
> [...]
> > We are wondering whether there is an existing implementation in the
> > mainline kernel, which we may have missed, that improves this
> > iteration process and addresses the race described above. If not,
> > what can be done to address it? One idea we came up with is to use a
> > single-entry bpf map as a cross-process lock, to prevent multiple
> > userspace processes from reading/writing the other maps at the same
> > time. But I don't know how safe that solution is, since there would
> > still be a race between reading the lock map's value and taking the
> > lock.

To avoid seeing duplicate keys due to parallel removal, one can walk
all keys with get_next_key first, remove the duplicates, and then look
up the values. By that time some elements could have been removed and
those lookups will fail.
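A sketch of that two-pass walk, under the same assumptions as the
dump-loop sketch earlier in the thread (u32 keys, u64 values, libbpf
wrappers, and an assumed MAX_ENTRIES map capacity):

  #include <stdio.h>
  #include <linux/types.h>
  #include <bpf/bpf.h>

  #define MAX_ENTRIES 4096        /* assumed map capacity */

  static void dump_map_two_pass(int map_fd)
  {
          __u32 keys[MAX_ENTRIES], next_key;
          __u64 value;
          void *prev = NULL;
          size_t i, n = 0;

          /* pass 1: snapshot the keys */
          while (n < MAX_ENTRIES &&
                 bpf_map_get_next_key(map_fd, prev, &next_key) == 0) {
                  keys[n++] = next_key;
                  prev = &keys[n - 1];
          }

          /* (dedupe keys[] here, e.g. qsort() plus a unique pass) */

          /* pass 2: look up values; a failed lookup just means the
           * element was removed in the meantime */
          for (i = 0; i < n; i++)
                  if (bpf_map_lookup_elem(map_fd, &keys[i], &value) == 0)
                          printf("%u -> %llu\n", keys[i],
                                 (unsigned long long)value);
  }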

Another approach could be to use map-in-map and do an almost-atomic
replace of the whole map with a new, potentially empty map. The prog
can continue using the new map, while user space walks the old map,
which is no longer accessed.
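On the program side, the map-in-map indirection is roughly as follows
(a BPF C fragment; outer_map and real_key are made-up names):

  /* The outer map is assumed to be BPF_MAP_TYPE_HASH_OF_MAPS or
   * ARRAY_OF_MAPS; slot 0 holds whichever inner map is current. */
  __u32 zero = 0;
  void *inner;
  __u64 *val;

  inner = bpf_map_lookup_elem(&outer_map, &zero);
  if (inner) {
          val = bpf_map_lookup_elem(inner, &real_key);
          /* ... read or update the inner map as usual ... */
  }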

Yet another approach would be to introduce a knob that user space
controls and make the program obey it: when the knob is on, the prog
won't delete/update map entries.


* Re: Suggestions on iterating eBPF maps
From: Lorenzo Colitti @ 2018-05-02  2:05 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Chenbo Feng, netdev, Daniel Borkmann, Joel Fernandes

On Sat, Apr 28, 2018 at 10:04 AM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> Another approach could be to use map-in-map and do an almost-atomic
> replace of the whole map with a new, potentially empty map. The prog
> can continue using the new map, while user space walks the old map,
> which is no longer accessed.

That sounds like a promising approach. I assume this would be
functionally equivalent to an approach where there is a map containing
a boolean that says whether to write to map A or map B? We'd then do
the following:

0. Kernel program is writing to map A.
1. Userspace pushes config that says to write to map B.
2. Kernel program starts to write to map B.
3. Userspace scans map A, collecting stats and deleting everything it finds.
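A sketch of the prog side of step 2's selection, in BPF C, with
config_map, map_a, map_b, key, and val as assumed names:

  /* config_map: one-element array; value 0 selects A, nonzero B */
  __u32 zero = 0;
  __u32 *write_b = bpf_map_lookup_elem(&config_map, &zero);

  if (write_b && *write_b)
          bpf_map_update_elem(&map_b, &key, &val, BPF_ANY);
  else
          bpf_map_update_elem(&map_a, &key, &val, BPF_ANY);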

One problem with this is: if the effects of #1 are not immediately
visible to the programs running on all cores, the program could still
be writing to map A and the deletes in #3 would result in loss of
data. Are there any guarantees around this? I know that hash map
writes are atomic, but I'm not aware of any other guarantees here. Are
there memory barriers around map writes and reads?

In the absence of guarantees, userspace could put a sleep between #1
and #3 and things would be correct Most Of The Time(TM), but if the
kernel is busy doing other things that might not be sufficient.
Thoughts?


* Re: Suggestions on iterating eBPF maps
From: Alexei Starovoitov @ 2018-05-02  2:33 UTC (permalink / raw)
  To: Lorenzo Colitti; +Cc: Chenbo Feng, netdev, Daniel Borkmann, Joel Fernandes

On Wed, May 02, 2018 at 11:05:19AM +0900, Lorenzo Colitti wrote:
> On Sat, Apr 28, 2018 at 10:04 AM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > Another approach could be to use map-in-map and do an almost-atomic
> > replace of the whole map with a new, potentially empty map. The prog
> > can continue using the new map, while user space walks the old map,
> > which is no longer accessed.
> 
> That sounds like a promising approach. I assume this would be
> functionally equivalent to an approach where there is a map containing
> a boolean that says whether to write to map A or map B? We'd then do
> the following:
> 
> 0. Kernel program is writing to map A.
> 1. Userspace pushes config that says to write to map B.
> 2. Kernel program starts to write to map B.
> 3. Userspace scans map A, collecting stats and deleting everything it finds.
> 
> One problem with this is: if the effects of #1 are not immediately
> visible to the programs running on all cores, the program could still
> be writing to map A and the deletes in #3 would result in loss of
> data. Are there any guarantees around this? I know that hash map
> writes are atomic, but I'm not aware of any other guarantees here. Are
> there memory barriers around map writes and reads?
> 
> In the absence of guarantees, userspace could put a sleep between #1
> and #3 and things would be correct Most Of The Time(TM), but if the
> kernel is busy doing other things that might not be sufficient.
> Thoughts?

If you use map-in-map you don't need the extra boolean map:
0. the bpf prog does
   inner_map = lookup(map_in_map, key=0);
   lookup(inner_map, your_real_key);
1. user space writes the FD of the new map into map_in_map[0]
2. some cpus are still using the old inner map and some the new one
3. user space does sys_membarrier(CMD_GLOBAL), which does
   synchronize_sched(), which on CONFIG_PREEMPT_NONE=y servers is the
   same as synchronize_rcu(), which guarantees that the progs have
   finished
4. scan the old inner map
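A user-space sketch of steps 1-4, where outer_fd, the inner-map FDs,
and scan_map() (e.g. the dump loop from earlier in the thread) are
assumptions for illustration:

  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/membarrier.h>
  #include <linux/types.h>
  #include <bpf/bpf.h>

  extern int scan_map(int map_fd);        /* hypothetical walker */

  static int swap_and_scan(int outer_fd, int old_inner_fd,
                           int new_inner_fd)
  {
          __u32 zero = 0;

          /* step 1: point map_in_map[0] at the new (empty) inner map */
          if (bpf_map_update_elem(outer_fd, &zero, &new_inner_fd, BPF_ANY))
                  return -1;

          /* step 3: wait until every prog that could still hold a
           * pointer to the old inner map has finished */
          if (syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL, 0))
                  return -1;

          /* step 4: the old inner map can now be walked race-free */
          return scan_map(old_inner_fd);
  }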

