All of lore.kernel.org
 help / color / mirror / Atom feed
* [bpf PATCH 0/2] xdp devmap improvements cleanup
@ 2020-01-08 21:34 John Fastabend
  2020-01-08 21:34 ` [bpf PATCH 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage John Fastabend
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: John Fastabend @ 2020-01-08 21:34 UTC (permalink / raw)
  To: bjorn.topel, bpf, toke; +Cc: netdev, john.fastabend, ast, daniel

Couple cleanup patches to recently posted series[0] from Bjorn to
cleanup and optimize the devmap usage. Patches have commit ids
the cleanup applies to.

[0] https://www.spinics.net/lists/netdev/msg620639.html

---

John Fastabend (2):
      bpf: xdp, update devmap comments to reflect napi/rcu usage
      bpf: xdp, remove no longer required rcu_read_{un}lock()


 kernel/bpf/devmap.c |   23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

--
Signature

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [bpf PATCH 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage
  2020-01-08 21:34 [bpf PATCH 0/2] xdp devmap improvements cleanup John Fastabend
@ 2020-01-08 21:34 ` John Fastabend
  2020-01-09  1:20   ` Song Liu
  2020-01-08 21:35 ` [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock() John Fastabend
  2020-01-09  6:09 ` [bpf PATCH 0/2] xdp devmap improvements cleanup Björn Töpel
  2 siblings, 1 reply; 11+ messages in thread
From: John Fastabend @ 2020-01-08 21:34 UTC (permalink / raw)
  To: bjorn.topel, bpf, toke; +Cc: netdev, john.fastabend, ast, daniel

Now that we rely on synchronize_rcu and call_rcu waiting to
exit perempt-disable regions (NAPI) lets update the comments
to reflect this.

Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/devmap.c |   21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index da9c832..f0bf525 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -193,10 +193,12 @@ static void dev_map_free(struct bpf_map *map)
 
 	/* At this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
 	 * so the programs (can be more than one that used this map) were
-	 * disconnected from events. Wait for outstanding critical sections in
-	 * these programs to complete. The rcu critical section only guarantees
-	 * no further reads against netdev_map. It does __not__ ensure pending
-	 * flush operations (if any) are complete.
+	 * disconnected from events. The following synchronize_rcu() guarantees
+	 * both rcu read critical sections complete and waits for
+	 * preempt-disable regions (NAPI being the relavent context here) so we
+	 * are certain there will be no further reads against the netdev_map and
+	 * all flush operations are complete. Flush operations can only be done
+	 * from NAPI context for this reason.
 	 */
 
 	spin_lock(&dev_map_lock);
@@ -498,12 +500,11 @@ static int dev_map_delete_elem(struct bpf_map *map, void *key)
 		return -EINVAL;
 
 	/* Use call_rcu() here to ensure any rcu critical sections have
-	 * completed, but this does not guarantee a flush has happened
-	 * yet. Because driver side rcu_read_lock/unlock only protects the
-	 * running XDP program. However, for pending flush operations the
-	 * dev and ctx are stored in another per cpu map. And additionally,
-	 * the driver tear down ensures all soft irqs are complete before
-	 * removing the net device in the case of dev_put equals zero.
+	 * completed as well as any flush operations because call_rcu
+	 * will wait for preempt-disable region to complete, NAPI in this
+	 * context.  And additionally, the driver tear down ensures all
+	 * soft irqs are complete before removing the net device in the
+	 * case of dev_put equals zero.
 	 */
 	old_dev = xchg(&dtab->netdev_map[k], NULL);
 	if (old_dev)


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock()
  2020-01-08 21:34 [bpf PATCH 0/2] xdp devmap improvements cleanup John Fastabend
  2020-01-08 21:34 ` [bpf PATCH 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage John Fastabend
@ 2020-01-08 21:35 ` John Fastabend
  2020-01-09  1:21   ` Song Liu
  2020-01-09  6:01   ` Toshiaki Makita
  2020-01-09  6:09 ` [bpf PATCH 0/2] xdp devmap improvements cleanup Björn Töpel
  2 siblings, 2 replies; 11+ messages in thread
From: John Fastabend @ 2020-01-08 21:35 UTC (permalink / raw)
  To: bjorn.topel, bpf, toke; +Cc: netdev, john.fastabend, ast, daniel

Now that we depend on rcu_call() and synchronize_rcu() to also wait
for preempt_disabled region to complete the rcu read critical section
in __dev_map_flush() is no longer relevant.

These originally ensured the map reference was safe while a map was
also being free'd. But flush by new rules can only be called from
preempt-disabled NAPI context. The synchronize_rcu from the map free
path and the rcu_call from the delete path will ensure the reference
here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
pair to avoid any confusion around how this is being protected.

If the rcu_read_lock was required it would mean errors in the above
logic and the original patch would also be wrong.

Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/devmap.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index f0bf525..0129d4a 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -378,10 +378,8 @@ void __dev_map_flush(void)
 	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
 	struct xdp_bulk_queue *bq, *tmp;
 
-	rcu_read_lock();
 	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
 		bq_xmit_all(bq, XDP_XMIT_FLUSH);
-	rcu_read_unlock();
 }
 
 /* rcu_read_lock (from syscall and BPF contexts) ensures that if a delete and/or


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [bpf PATCH 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage
  2020-01-08 21:34 ` [bpf PATCH 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage John Fastabend
@ 2020-01-09  1:20   ` Song Liu
  0 siblings, 0 replies; 11+ messages in thread
From: Song Liu @ 2020-01-09  1:20 UTC (permalink / raw)
  To: John Fastabend
  Cc: Björn Töpel, bpf, Toke Høiland-Jørgensen,
	Networking, Alexei Starovoitov, Daniel Borkmann

On Wed, Jan 8, 2020 at 1:36 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Now that we rely on synchronize_rcu and call_rcu waiting to
> exit perempt-disable regions (NAPI) lets update the comments
> to reflect this.
>
> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock()
  2020-01-08 21:35 ` [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock() John Fastabend
@ 2020-01-09  1:21   ` Song Liu
  2020-01-09  6:01   ` Toshiaki Makita
  1 sibling, 0 replies; 11+ messages in thread
From: Song Liu @ 2020-01-09  1:21 UTC (permalink / raw)
  To: John Fastabend
  Cc: Björn Töpel, bpf, Toke Høiland-Jørgensen,
	Networking, Alexei Starovoitov, Daniel Borkmann

On Wed, Jan 8, 2020 at 1:36 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Now that we depend on rcu_call() and synchronize_rcu() to also wait
> for preempt_disabled region to complete the rcu read critical section
> in __dev_map_flush() is no longer relevant.
>
> These originally ensured the map reference was safe while a map was
> also being free'd. But flush by new rules can only be called from
> preempt-disabled NAPI context. The synchronize_rcu from the map free
> path and the rcu_call from the delete path will ensure the reference
> here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
> pair to avoid any confusion around how this is being protected.
>
> If the rcu_read_lock was required it would mean errors in the above
> logic and the original patch would also be wrong.
>
> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock()
  2020-01-08 21:35 ` [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock() John Fastabend
  2020-01-09  1:21   ` Song Liu
@ 2020-01-09  6:01   ` Toshiaki Makita
  2020-01-09  6:34     ` Toshiaki Makita
  2020-01-09  6:35     ` John Fastabend
  1 sibling, 2 replies; 11+ messages in thread
From: Toshiaki Makita @ 2020-01-09  6:01 UTC (permalink / raw)
  To: John Fastabend, bjorn.topel, bpf, toke; +Cc: netdev, ast, daniel

On 2020/01/09 6:35, John Fastabend wrote:
> Now that we depend on rcu_call() and synchronize_rcu() to also wait
> for preempt_disabled region to complete the rcu read critical section
> in __dev_map_flush() is no longer relevant.
> 
> These originally ensured the map reference was safe while a map was
> also being free'd. But flush by new rules can only be called from
> preempt-disabled NAPI context. The synchronize_rcu from the map free
> path and the rcu_call from the delete path will ensure the reference
> here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
> pair to avoid any confusion around how this is being protected.
> 
> If the rcu_read_lock was required it would mean errors in the above
> logic and the original patch would also be wrong.
> 
> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>   kernel/bpf/devmap.c |    2 --
>   1 file changed, 2 deletions(-)
> 
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index f0bf525..0129d4a 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -378,10 +378,8 @@ void __dev_map_flush(void)
>   	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
>   	struct xdp_bulk_queue *bq, *tmp;
>   
> -	rcu_read_lock();
>   	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
>   		bq_xmit_all(bq, XDP_XMIT_FLUSH);
> -	rcu_read_unlock();

I introduced this lock because some drivers have assumption that
.ndo_xdp_xmit() is called under RCU. (commit 86723c864063)

Maybe devmap deletion logic does not need this anymore, but is it
OK to drivers?

Toshiaki Makita

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [bpf PATCH 0/2] xdp devmap improvements cleanup
  2020-01-08 21:34 [bpf PATCH 0/2] xdp devmap improvements cleanup John Fastabend
  2020-01-08 21:34 ` [bpf PATCH 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage John Fastabend
  2020-01-08 21:35 ` [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock() John Fastabend
@ 2020-01-09  6:09 ` Björn Töpel
  2 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2020-01-09  6:09 UTC (permalink / raw)
  To: John Fastabend
  Cc: bpf, Toke Høiland-Jørgensen, Netdev,
	Alexei Starovoitov, Daniel Borkmann

On Wed, 8 Jan 2020 at 22:34, John Fastabend <john.fastabend@gmail.com> wrote:
>
> Couple cleanup patches to recently posted series[0] from Bjorn to
> cleanup and optimize the devmap usage. Patches have commit ids
> the cleanup applies to.
>
> [0] https://www.spinics.net/lists/netdev/msg620639.html
>
> ---
>
> John Fastabend (2):
>       bpf: xdp, update devmap comments to reflect napi/rcu usage
>       bpf: xdp, remove no longer required rcu_read_{un}lock()
>

Thanks for the clean-up, John!

For the series:
Acked-by: Björn Töpel <bjorn.topel@intel.com>

>
>  kernel/bpf/devmap.c |   23 +++++++++++------------
>  1 file changed, 11 insertions(+), 12 deletions(-)
>
> --
> Signature

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock()
  2020-01-09  6:01   ` Toshiaki Makita
@ 2020-01-09  6:34     ` Toshiaki Makita
  2020-01-09  6:35     ` John Fastabend
  1 sibling, 0 replies; 11+ messages in thread
From: Toshiaki Makita @ 2020-01-09  6:34 UTC (permalink / raw)
  To: John Fastabend, bjorn.topel, bpf, toke; +Cc: netdev, ast, daniel

On 2020/01/09 15:01, Toshiaki Makita wrote:
> On 2020/01/09 6:35, John Fastabend wrote:
>> Now that we depend on rcu_call() and synchronize_rcu() to also wait
>> for preempt_disabled region to complete the rcu read critical section
>> in __dev_map_flush() is no longer relevant.
>>
>> These originally ensured the map reference was safe while a map was
>> also being free'd. But flush by new rules can only be called from
>> preempt-disabled NAPI context. The synchronize_rcu from the map free
>> path and the rcu_call from the delete path will ensure the reference
>> here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
>> pair to avoid any confusion around how this is being protected.
>>
>> If the rcu_read_lock was required it would mean errors in the above
>> logic and the original patch would also be wrong.
>>
>> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>> ---
>>   kernel/bpf/devmap.c |    2 --
>>   1 file changed, 2 deletions(-)
>>
>> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
>> index f0bf525..0129d4a 100644
>> --- a/kernel/bpf/devmap.c
>> +++ b/kernel/bpf/devmap.c
>> @@ -378,10 +378,8 @@ void __dev_map_flush(void)
>>       struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
>>       struct xdp_bulk_queue *bq, *tmp;
>> -    rcu_read_lock();
>>       list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
>>           bq_xmit_all(bq, XDP_XMIT_FLUSH);
>> -    rcu_read_unlock();
> 
> I introduced this lock because some drivers have assumption that
> .ndo_xdp_xmit() is called under RCU. (commit 86723c864063)
> 
> Maybe devmap deletion logic does not need this anymore, but is it
> OK to drivers?

More references for the driver problem:
- Lockdep splat in virtio_net: https://lists.openwall.net/netdev/2019/04/24/282
- Discussion for fix: https://lists.openwall.net/netdev/2019/04/25/234

Toshiaki Makita

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock()
  2020-01-09  6:01   ` Toshiaki Makita
  2020-01-09  6:34     ` Toshiaki Makita
@ 2020-01-09  6:35     ` John Fastabend
  2020-01-09  6:57       ` Toshiaki Makita
  1 sibling, 1 reply; 11+ messages in thread
From: John Fastabend @ 2020-01-09  6:35 UTC (permalink / raw)
  To: Toshiaki Makita, John Fastabend, bjorn.topel, bpf, toke
  Cc: netdev, ast, daniel

Toshiaki Makita wrote:
> On 2020/01/09 6:35, John Fastabend wrote:
> > Now that we depend on rcu_call() and synchronize_rcu() to also wait
> > for preempt_disabled region to complete the rcu read critical section
> > in __dev_map_flush() is no longer relevant.
> > 
> > These originally ensured the map reference was safe while a map was
> > also being free'd. But flush by new rules can only be called from
> > preempt-disabled NAPI context. The synchronize_rcu from the map free
> > path and the rcu_call from the delete path will ensure the reference
> > here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
> > pair to avoid any confusion around how this is being protected.
> > 
> > If the rcu_read_lock was required it would mean errors in the above
> > logic and the original patch would also be wrong.
> > 
> > Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> > Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> > ---
> >   kernel/bpf/devmap.c |    2 --
> >   1 file changed, 2 deletions(-)
> > 
> > diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> > index f0bf525..0129d4a 100644
> > --- a/kernel/bpf/devmap.c
> > +++ b/kernel/bpf/devmap.c
> > @@ -378,10 +378,8 @@ void __dev_map_flush(void)
> >   	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
> >   	struct xdp_bulk_queue *bq, *tmp;
> >   
> > -	rcu_read_lock();
> >   	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
> >   		bq_xmit_all(bq, XDP_XMIT_FLUSH);
> > -	rcu_read_unlock();
> 
> I introduced this lock because some drivers have assumption that
> .ndo_xdp_xmit() is called under RCU. (commit 86723c864063)
> 
> Maybe devmap deletion logic does not need this anymore, but is it
> OK to drivers?

Ah OK thanks for catching this. So its a strange requirement from
virto_net to need read_lock like this. Quickly scanned the drivers
and seems its the only one.

I think the best path forward is to fix virtio_net so it doesn't
need rcu_read_lock() here then the locking is much cleaner IMO.

I'll send a v2 and either move the xdp enabled check (the piece
using the rcu_read_lock) into a bitmask flag or push the
rcu_read_lock() into virtio_net so its clear that this is a detail
of virtio_net and not a general thing. FWIW I don't think the
rcu_read_lock is actually needed in the virtio_net case anymore
either but pretty sure the rcu_dereference will cause an rcu
splat. Maybe there is another annotation we can use. I'll dig
into it tomorrow. Thanks

> 
> Toshiaki Makita



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock()
  2020-01-09  6:35     ` John Fastabend
@ 2020-01-09  6:57       ` Toshiaki Makita
  2020-01-09 14:58         ` John Fastabend
  0 siblings, 1 reply; 11+ messages in thread
From: Toshiaki Makita @ 2020-01-09  6:57 UTC (permalink / raw)
  To: John Fastabend, bjorn.topel, bpf, toke; +Cc: netdev, ast, daniel

On 2020/01/09 15:35, John Fastabend wrote:
> Toshiaki Makita wrote:
>> On 2020/01/09 6:35, John Fastabend wrote:
>>> Now that we depend on rcu_call() and synchronize_rcu() to also wait
>>> for preempt_disabled region to complete the rcu read critical section
>>> in __dev_map_flush() is no longer relevant.
>>>
>>> These originally ensured the map reference was safe while a map was
>>> also being free'd. But flush by new rules can only be called from
>>> preempt-disabled NAPI context. The synchronize_rcu from the map free
>>> path and the rcu_call from the delete path will ensure the reference
>>> here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
>>> pair to avoid any confusion around how this is being protected.
>>>
>>> If the rcu_read_lock was required it would mean errors in the above
>>> logic and the original patch would also be wrong.
>>>
>>> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
>>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>>> ---
>>>    kernel/bpf/devmap.c |    2 --
>>>    1 file changed, 2 deletions(-)
>>>
>>> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
>>> index f0bf525..0129d4a 100644
>>> --- a/kernel/bpf/devmap.c
>>> +++ b/kernel/bpf/devmap.c
>>> @@ -378,10 +378,8 @@ void __dev_map_flush(void)
>>>    	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
>>>    	struct xdp_bulk_queue *bq, *tmp;
>>>    
>>> -	rcu_read_lock();
>>>    	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
>>>    		bq_xmit_all(bq, XDP_XMIT_FLUSH);
>>> -	rcu_read_unlock();
>>
>> I introduced this lock because some drivers have assumption that
>> .ndo_xdp_xmit() is called under RCU. (commit 86723c864063)
>>
>> Maybe devmap deletion logic does not need this anymore, but is it
>> OK to drivers?
> 
> Ah OK thanks for catching this. So its a strange requirement from
> virto_net to need read_lock like this. Quickly scanned the drivers
> and seems its the only one.
> 
> I think the best path forward is to fix virtio_net so it doesn't
> need rcu_read_lock() here then the locking is much cleaner IMO.

Actually veth is calling rcu_dereference in .ndo_xdp_xmit() so it needs
the same treatment. In the reference I sent in another mail, Jesper
said mlx5 also has some RCU dependency.

> I'll send a v2 and either move the xdp enabled check (the piece
> using the rcu_read_lock) into a bitmask flag or push the
> rcu_read_lock() into virtio_net so its clear that this is a detail
> of virtio_net and not a general thing. FWIW I don't think the
> rcu_read_lock is actually needed in the virtio_net case anymore
> either but pretty sure the rcu_dereference will cause an rcu
> splat. Maybe there is another annotation we can use. I'll dig
> into it tomorrow. Thanks

I'm thinking we can just move the rcu_lock to wrap around only ndo_xdp_xmit.
But as you suggest if we can identify all drivers which depends on RCU and move the
rcu_lock into the drivers (or remove the dependency) it's better.

Toshiaki Makita

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock()
  2020-01-09  6:57       ` Toshiaki Makita
@ 2020-01-09 14:58         ` John Fastabend
  0 siblings, 0 replies; 11+ messages in thread
From: John Fastabend @ 2020-01-09 14:58 UTC (permalink / raw)
  To: Toshiaki Makita, John Fastabend, bjorn.topel, bpf, toke, jbrouer
  Cc: netdev, ast, daniel

Toshiaki Makita wrote:
> On 2020/01/09 15:35, John Fastabend wrote:
> > Toshiaki Makita wrote:
> >> On 2020/01/09 6:35, John Fastabend wrote:
> >>> Now that we depend on rcu_call() and synchronize_rcu() to also wait
> >>> for preempt_disabled region to complete the rcu read critical section
> >>> in __dev_map_flush() is no longer relevant.
> >>>
> >>> These originally ensured the map reference was safe while a map was
> >>> also being free'd. But flush by new rules can only be called from
> >>> preempt-disabled NAPI context. The synchronize_rcu from the map free
> >>> path and the rcu_call from the delete path will ensure the reference
> >>> here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
> >>> pair to avoid any confusion around how this is being protected.
> >>>
> >>> If the rcu_read_lock was required it would mean errors in the above
> >>> logic and the original patch would also be wrong.
> >>>
> >>> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> >>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> >>> ---
> >>>    kernel/bpf/devmap.c |    2 --
> >>>    1 file changed, 2 deletions(-)
> >>>
> >>> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> >>> index f0bf525..0129d4a 100644
> >>> --- a/kernel/bpf/devmap.c
> >>> +++ b/kernel/bpf/devmap.c
> >>> @@ -378,10 +378,8 @@ void __dev_map_flush(void)
> >>>    	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
> >>>    	struct xdp_bulk_queue *bq, *tmp;
> >>>    
> >>> -	rcu_read_lock();
> >>>    	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
> >>>    		bq_xmit_all(bq, XDP_XMIT_FLUSH);
> >>> -	rcu_read_unlock();
> >>
> >> I introduced this lock because some drivers have assumption that
> >> .ndo_xdp_xmit() is called under RCU. (commit 86723c864063)
> >>
> >> Maybe devmap deletion logic does not need this anymore, but is it
> >> OK to drivers?
> > 
> > Ah OK thanks for catching this. So its a strange requirement from
> > virto_net to need read_lock like this. Quickly scanned the drivers
> > and seems its the only one.
> > 
> > I think the best path forward is to fix virtio_net so it doesn't
> > need rcu_read_lock() here then the locking is much cleaner IMO.
> 
> Actually veth is calling rcu_dereference in .ndo_xdp_xmit() so it needs
> the same treatment. In the reference I sent in another mail, Jesper
> said mlx5 also has some RCU dependency.

So veth, virtio and tun seem to need rcu_read_lock/unlock because
they use an rcu_dereference() in the xdp_xmit path. I'll audit the
rest today.

@Jesper, recall why mlx5 would require rcu_read_lock()/unlock()
pair? I just looked at mlx5_xdp_xmit and I'm not seeing a
rcu_dereference in there so if it is required I would want
to understand why.

> 
> > I'll send a v2 and either move the xdp enabled check (the piece
> > using the rcu_read_lock) into a bitmask flag or push the
> > rcu_read_lock() into virtio_net so its clear that this is a detail
> > of virtio_net and not a general thing. FWIW I don't think the
> > rcu_read_lock is actually needed in the virtio_net case anymore
> > either but pretty sure the rcu_dereference will cause an rcu
> > splat. Maybe there is another annotation we can use. I'll dig
> > into it tomorrow. Thanks
> 
> I'm thinking we can just move the rcu_lock to wrap around only ndo_xdp_xmit.
> But as you suggest if we can identify all drivers which depends on RCU and move the
> rcu_lock into the drivers (or remove the dependency) it's better.

I think we are working in bpf-next tree here so it would be best
to identify the minimal set of drivers that require the read_lock
and push that into the driver. I prefer these things are a precise
so its easy to understand when reading the code. Otherwise its
really not clear without grepping through the code or walking
the git history why we wrapped this in a rcu_read_lock/unlock.
At minimum we want a comment but that feels like a big hammer
that is not needed.

Most drivers should not care about the rcu_read_lock it seems
to just be special cases in the software devices where this happens.
veth for example is dereferencing the peer netdev. tun is dereference
the tun_file. virtio_net usage seems to be arbitrary to me and
is simply used to decide if xdp is enabled but we can do that
in a cleaner way.

I'll put a v2 together today and send it out for review.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-01-09 14:58 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-08 21:34 [bpf PATCH 0/2] xdp devmap improvements cleanup John Fastabend
2020-01-08 21:34 ` [bpf PATCH 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage John Fastabend
2020-01-09  1:20   ` Song Liu
2020-01-08 21:35 ` [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock() John Fastabend
2020-01-09  1:21   ` Song Liu
2020-01-09  6:01   ` Toshiaki Makita
2020-01-09  6:34     ` Toshiaki Makita
2020-01-09  6:35     ` John Fastabend
2020-01-09  6:57       ` Toshiaki Makita
2020-01-09 14:58         ` John Fastabend
2020-01-09  6:09 ` [bpf PATCH 0/2] xdp devmap improvements cleanup Björn Töpel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.