netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [bpf PATCH 0/2] xdp devmap improvements cleanup
@ 2020-01-08 21:34 John Fastabend
  2020-01-08 21:34 ` [bpf PATCH 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage John Fastabend
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: John Fastabend @ 2020-01-08 21:34 UTC (permalink / raw)
  To: bjorn.topel, bpf, toke; +Cc: netdev, john.fastabend, ast, daniel

Couple cleanup patches to recently posted series[0] from Bjorn to
cleanup and optimize the devmap usage. Patches have commit ids
the cleanup applies to.

[0] https://www.spinics.net/lists/netdev/msg620639.html

---

John Fastabend (2):
      bpf: xdp, update devmap comments to reflect napi/rcu usage
      bpf: xdp, remove no longer required rcu_read_{un}lock()


 kernel/bpf/devmap.c |   23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

--
Signature

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [bpf PATCH 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage
  2020-01-08 21:34 [bpf PATCH 0/2] xdp devmap improvements cleanup John Fastabend
@ 2020-01-08 21:34 ` John Fastabend
  2020-01-09  1:20   ` Song Liu
  2020-01-08 21:35 ` [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock() John Fastabend
  2020-01-09  6:09 ` [bpf PATCH 0/2] xdp devmap improvements cleanup Björn Töpel
  2 siblings, 1 reply; 11+ messages in thread
From: John Fastabend @ 2020-01-08 21:34 UTC (permalink / raw)
  To: bjorn.topel, bpf, toke; +Cc: netdev, john.fastabend, ast, daniel

Now that we rely on synchronize_rcu and call_rcu waiting to
exit perempt-disable regions (NAPI) lets update the comments
to reflect this.

Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/devmap.c |   21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index da9c832..f0bf525 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -193,10 +193,12 @@ static void dev_map_free(struct bpf_map *map)
 
 	/* At this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
 	 * so the programs (can be more than one that used this map) were
-	 * disconnected from events. Wait for outstanding critical sections in
-	 * these programs to complete. The rcu critical section only guarantees
-	 * no further reads against netdev_map. It does __not__ ensure pending
-	 * flush operations (if any) are complete.
+	 * disconnected from events. The following synchronize_rcu() guarantees
+	 * both rcu read critical sections complete and waits for
+	 * preempt-disable regions (NAPI being the relavent context here) so we
+	 * are certain there will be no further reads against the netdev_map and
+	 * all flush operations are complete. Flush operations can only be done
+	 * from NAPI context for this reason.
 	 */
 
 	spin_lock(&dev_map_lock);
@@ -498,12 +500,11 @@ static int dev_map_delete_elem(struct bpf_map *map, void *key)
 		return -EINVAL;
 
 	/* Use call_rcu() here to ensure any rcu critical sections have
-	 * completed, but this does not guarantee a flush has happened
-	 * yet. Because driver side rcu_read_lock/unlock only protects the
-	 * running XDP program. However, for pending flush operations the
-	 * dev and ctx are stored in another per cpu map. And additionally,
-	 * the driver tear down ensures all soft irqs are complete before
-	 * removing the net device in the case of dev_put equals zero.
+	 * completed as well as any flush operations because call_rcu
+	 * will wait for preempt-disable region to complete, NAPI in this
+	 * context.  And additionally, the driver tear down ensures all
+	 * soft irqs are complete before removing the net device in the
+	 * case of dev_put equals zero.
 	 */
 	old_dev = xchg(&dtab->netdev_map[k], NULL);
 	if (old_dev)


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock()
  2020-01-08 21:34 [bpf PATCH 0/2] xdp devmap improvements cleanup John Fastabend
  2020-01-08 21:34 ` [bpf PATCH 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage John Fastabend
@ 2020-01-08 21:35 ` John Fastabend
  2020-01-09  1:21   ` Song Liu
  2020-01-09  6:01   ` Toshiaki Makita
  2020-01-09  6:09 ` [bpf PATCH 0/2] xdp devmap improvements cleanup Björn Töpel
  2 siblings, 2 replies; 11+ messages in thread
From: John Fastabend @ 2020-01-08 21:35 UTC (permalink / raw)
  To: bjorn.topel, bpf, toke; +Cc: netdev, john.fastabend, ast, daniel

Now that we depend on rcu_call() and synchronize_rcu() to also wait
for preempt_disabled region to complete the rcu read critical section
in __dev_map_flush() is no longer relevant.

These originally ensured the map reference was safe while a map was
also being free'd. But flush by new rules can only be called from
preempt-disabled NAPI context. The synchronize_rcu from the map free
path and the rcu_call from the delete path will ensure the reference
here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
pair to avoid any confusion around how this is being protected.

If the rcu_read_lock was required it would mean errors in the above
logic and the original patch would also be wrong.

Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/devmap.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index f0bf525..0129d4a 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -378,10 +378,8 @@ void __dev_map_flush(void)
 	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
 	struct xdp_bulk_queue *bq, *tmp;
 
-	rcu_read_lock();
 	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
 		bq_xmit_all(bq, XDP_XMIT_FLUSH);
-	rcu_read_unlock();
 }
 
 /* rcu_read_lock (from syscall and BPF contexts) ensures that if a delete and/or


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [bpf PATCH 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage
  2020-01-08 21:34 ` [bpf PATCH 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage John Fastabend
@ 2020-01-09  1:20   ` Song Liu
  0 siblings, 0 replies; 11+ messages in thread
From: Song Liu @ 2020-01-09  1:20 UTC (permalink / raw)
  To: John Fastabend
  Cc: Björn Töpel, bpf, Toke Høiland-Jørgensen,
	Networking, Alexei Starovoitov, Daniel Borkmann

On Wed, Jan 8, 2020 at 1:36 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Now that we rely on synchronize_rcu and call_rcu waiting to
> exit perempt-disable regions (NAPI) lets update the comments
> to reflect this.
>
> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock()
  2020-01-08 21:35 ` [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock() John Fastabend
@ 2020-01-09  1:21   ` Song Liu
  2020-01-09  6:01   ` Toshiaki Makita
  1 sibling, 0 replies; 11+ messages in thread
From: Song Liu @ 2020-01-09  1:21 UTC (permalink / raw)
  To: John Fastabend
  Cc: Björn Töpel, bpf, Toke Høiland-Jørgensen,
	Networking, Alexei Starovoitov, Daniel Borkmann

On Wed, Jan 8, 2020 at 1:36 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Now that we depend on rcu_call() and synchronize_rcu() to also wait
> for preempt_disabled region to complete the rcu read critical section
> in __dev_map_flush() is no longer relevant.
>
> These originally ensured the map reference was safe while a map was
> also being free'd. But flush by new rules can only be called from
> preempt-disabled NAPI context. The synchronize_rcu from the map free
> path and the rcu_call from the delete path will ensure the reference
> here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
> pair to avoid any confusion around how this is being protected.
>
> If the rcu_read_lock was required it would mean errors in the above
> logic and the original patch would also be wrong.
>
> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock()
  2020-01-08 21:35 ` [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock() John Fastabend
  2020-01-09  1:21   ` Song Liu
@ 2020-01-09  6:01   ` Toshiaki Makita
  2020-01-09  6:34     ` Toshiaki Makita
  2020-01-09  6:35     ` John Fastabend
  1 sibling, 2 replies; 11+ messages in thread
From: Toshiaki Makita @ 2020-01-09  6:01 UTC (permalink / raw)
  To: John Fastabend, bjorn.topel, bpf, toke; +Cc: netdev, ast, daniel

On 2020/01/09 6:35, John Fastabend wrote:
> Now that we depend on rcu_call() and synchronize_rcu() to also wait
> for preempt_disabled region to complete the rcu read critical section
> in __dev_map_flush() is no longer relevant.
> 
> These originally ensured the map reference was safe while a map was
> also being free'd. But flush by new rules can only be called from
> preempt-disabled NAPI context. The synchronize_rcu from the map free
> path and the rcu_call from the delete path will ensure the reference
> here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
> pair to avoid any confusion around how this is being protected.
> 
> If the rcu_read_lock was required it would mean errors in the above
> logic and the original patch would also be wrong.
> 
> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>   kernel/bpf/devmap.c |    2 --
>   1 file changed, 2 deletions(-)
> 
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index f0bf525..0129d4a 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -378,10 +378,8 @@ void __dev_map_flush(void)
>   	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
>   	struct xdp_bulk_queue *bq, *tmp;
>   
> -	rcu_read_lock();
>   	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
>   		bq_xmit_all(bq, XDP_XMIT_FLUSH);
> -	rcu_read_unlock();

I introduced this lock because some drivers have assumption that
.ndo_xdp_xmit() is called under RCU. (commit 86723c864063)

Maybe devmap deletion logic does not need this anymore, but is it
OK to drivers?

Toshiaki Makita

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [bpf PATCH 0/2] xdp devmap improvements cleanup
  2020-01-08 21:34 [bpf PATCH 0/2] xdp devmap improvements cleanup John Fastabend
  2020-01-08 21:34 ` [bpf PATCH 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage John Fastabend
  2020-01-08 21:35 ` [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock() John Fastabend
@ 2020-01-09  6:09 ` Björn Töpel
  2 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2020-01-09  6:09 UTC (permalink / raw)
  To: John Fastabend
  Cc: bpf, Toke Høiland-Jørgensen, Netdev,
	Alexei Starovoitov, Daniel Borkmann

On Wed, 8 Jan 2020 at 22:34, John Fastabend <john.fastabend@gmail.com> wrote:
>
> Couple cleanup patches to recently posted series[0] from Bjorn to
> cleanup and optimize the devmap usage. Patches have commit ids
> the cleanup applies to.
>
> [0] https://www.spinics.net/lists/netdev/msg620639.html
>
> ---
>
> John Fastabend (2):
>       bpf: xdp, update devmap comments to reflect napi/rcu usage
>       bpf: xdp, remove no longer required rcu_read_{un}lock()
>

Thanks for the clean-up, John!

For the series:
Acked-by: Björn Töpel <bjorn.topel@intel.com>

>
>  kernel/bpf/devmap.c |   23 +++++++++++------------
>  1 file changed, 11 insertions(+), 12 deletions(-)
>
> --
> Signature

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock()
  2020-01-09  6:01   ` Toshiaki Makita
@ 2020-01-09  6:34     ` Toshiaki Makita
  2020-01-09  6:35     ` John Fastabend
  1 sibling, 0 replies; 11+ messages in thread
From: Toshiaki Makita @ 2020-01-09  6:34 UTC (permalink / raw)
  To: John Fastabend, bjorn.topel, bpf, toke; +Cc: netdev, ast, daniel

On 2020/01/09 15:01, Toshiaki Makita wrote:
> On 2020/01/09 6:35, John Fastabend wrote:
>> Now that we depend on rcu_call() and synchronize_rcu() to also wait
>> for preempt_disabled region to complete the rcu read critical section
>> in __dev_map_flush() is no longer relevant.
>>
>> These originally ensured the map reference was safe while a map was
>> also being free'd. But flush by new rules can only be called from
>> preempt-disabled NAPI context. The synchronize_rcu from the map free
>> path and the rcu_call from the delete path will ensure the reference
>> here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
>> pair to avoid any confusion around how this is being protected.
>>
>> If the rcu_read_lock was required it would mean errors in the above
>> logic and the original patch would also be wrong.
>>
>> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>> ---
>>   kernel/bpf/devmap.c |    2 --
>>   1 file changed, 2 deletions(-)
>>
>> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
>> index f0bf525..0129d4a 100644
>> --- a/kernel/bpf/devmap.c
>> +++ b/kernel/bpf/devmap.c
>> @@ -378,10 +378,8 @@ void __dev_map_flush(void)
>>       struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
>>       struct xdp_bulk_queue *bq, *tmp;
>> -    rcu_read_lock();
>>       list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
>>           bq_xmit_all(bq, XDP_XMIT_FLUSH);
>> -    rcu_read_unlock();
> 
> I introduced this lock because some drivers have assumption that
> .ndo_xdp_xmit() is called under RCU. (commit 86723c864063)
> 
> Maybe devmap deletion logic does not need this anymore, but is it
> OK to drivers?

More references for the driver problem:
- Lockdep splat in virtio_net: https://lists.openwall.net/netdev/2019/04/24/282
- Discussion for fix: https://lists.openwall.net/netdev/2019/04/25/234

Toshiaki Makita

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock()
  2020-01-09  6:01   ` Toshiaki Makita
  2020-01-09  6:34     ` Toshiaki Makita
@ 2020-01-09  6:35     ` John Fastabend
  2020-01-09  6:57       ` Toshiaki Makita
  1 sibling, 1 reply; 11+ messages in thread
From: John Fastabend @ 2020-01-09  6:35 UTC (permalink / raw)
  To: Toshiaki Makita, John Fastabend, bjorn.topel, bpf, toke
  Cc: netdev, ast, daniel

Toshiaki Makita wrote:
> On 2020/01/09 6:35, John Fastabend wrote:
> > Now that we depend on rcu_call() and synchronize_rcu() to also wait
> > for preempt_disabled region to complete the rcu read critical section
> > in __dev_map_flush() is no longer relevant.
> > 
> > These originally ensured the map reference was safe while a map was
> > also being free'd. But flush by new rules can only be called from
> > preempt-disabled NAPI context. The synchronize_rcu from the map free
> > path and the rcu_call from the delete path will ensure the reference
> > here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
> > pair to avoid any confusion around how this is being protected.
> > 
> > If the rcu_read_lock was required it would mean errors in the above
> > logic and the original patch would also be wrong.
> > 
> > Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> > Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> > ---
> >   kernel/bpf/devmap.c |    2 --
> >   1 file changed, 2 deletions(-)
> > 
> > diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> > index f0bf525..0129d4a 100644
> > --- a/kernel/bpf/devmap.c
> > +++ b/kernel/bpf/devmap.c
> > @@ -378,10 +378,8 @@ void __dev_map_flush(void)
> >   	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
> >   	struct xdp_bulk_queue *bq, *tmp;
> >   
> > -	rcu_read_lock();
> >   	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
> >   		bq_xmit_all(bq, XDP_XMIT_FLUSH);
> > -	rcu_read_unlock();
> 
> I introduced this lock because some drivers have assumption that
> .ndo_xdp_xmit() is called under RCU. (commit 86723c864063)
> 
> Maybe devmap deletion logic does not need this anymore, but is it
> OK to drivers?

Ah OK thanks for catching this. So its a strange requirement from
virto_net to need read_lock like this. Quickly scanned the drivers
and seems its the only one.

I think the best path forward is to fix virtio_net so it doesn't
need rcu_read_lock() here then the locking is much cleaner IMO.

I'll send a v2 and either move the xdp enabled check (the piece
using the rcu_read_lock) into a bitmask flag or push the
rcu_read_lock() into virtio_net so its clear that this is a detail
of virtio_net and not a general thing. FWIW I don't think the
rcu_read_lock is actually needed in the virtio_net case anymore
either but pretty sure the rcu_dereference will cause an rcu
splat. Maybe there is another annotation we can use. I'll dig
into it tomorrow. Thanks

> 
> Toshiaki Makita



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock()
  2020-01-09  6:35     ` John Fastabend
@ 2020-01-09  6:57       ` Toshiaki Makita
  2020-01-09 14:58         ` John Fastabend
  0 siblings, 1 reply; 11+ messages in thread
From: Toshiaki Makita @ 2020-01-09  6:57 UTC (permalink / raw)
  To: John Fastabend, bjorn.topel, bpf, toke; +Cc: netdev, ast, daniel

On 2020/01/09 15:35, John Fastabend wrote:
> Toshiaki Makita wrote:
>> On 2020/01/09 6:35, John Fastabend wrote:
>>> Now that we depend on rcu_call() and synchronize_rcu() to also wait
>>> for preempt_disabled region to complete the rcu read critical section
>>> in __dev_map_flush() is no longer relevant.
>>>
>>> These originally ensured the map reference was safe while a map was
>>> also being free'd. But flush by new rules can only be called from
>>> preempt-disabled NAPI context. The synchronize_rcu from the map free
>>> path and the rcu_call from the delete path will ensure the reference
>>> here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
>>> pair to avoid any confusion around how this is being protected.
>>>
>>> If the rcu_read_lock was required it would mean errors in the above
>>> logic and the original patch would also be wrong.
>>>
>>> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
>>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>>> ---
>>>    kernel/bpf/devmap.c |    2 --
>>>    1 file changed, 2 deletions(-)
>>>
>>> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
>>> index f0bf525..0129d4a 100644
>>> --- a/kernel/bpf/devmap.c
>>> +++ b/kernel/bpf/devmap.c
>>> @@ -378,10 +378,8 @@ void __dev_map_flush(void)
>>>    	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
>>>    	struct xdp_bulk_queue *bq, *tmp;
>>>    
>>> -	rcu_read_lock();
>>>    	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
>>>    		bq_xmit_all(bq, XDP_XMIT_FLUSH);
>>> -	rcu_read_unlock();
>>
>> I introduced this lock because some drivers have assumption that
>> .ndo_xdp_xmit() is called under RCU. (commit 86723c864063)
>>
>> Maybe devmap deletion logic does not need this anymore, but is it
>> OK to drivers?
> 
> Ah OK thanks for catching this. So its a strange requirement from
> virto_net to need read_lock like this. Quickly scanned the drivers
> and seems its the only one.
> 
> I think the best path forward is to fix virtio_net so it doesn't
> need rcu_read_lock() here then the locking is much cleaner IMO.

Actually veth is calling rcu_dereference in .ndo_xdp_xmit() so it needs
the same treatment. In the reference I sent in another mail, Jesper
said mlx5 also has some RCU dependency.

> I'll send a v2 and either move the xdp enabled check (the piece
> using the rcu_read_lock) into a bitmask flag or push the
> rcu_read_lock() into virtio_net so its clear that this is a detail
> of virtio_net and not a general thing. FWIW I don't think the
> rcu_read_lock is actually needed in the virtio_net case anymore
> either but pretty sure the rcu_dereference will cause an rcu
> splat. Maybe there is another annotation we can use. I'll dig
> into it tomorrow. Thanks

I'm thinking we can just move the rcu_lock to wrap around only ndo_xdp_xmit.
But as you suggest if we can identify all drivers which depends on RCU and move the
rcu_lock into the drivers (or remove the dependency) it's better.

Toshiaki Makita

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock()
  2020-01-09  6:57       ` Toshiaki Makita
@ 2020-01-09 14:58         ` John Fastabend
  0 siblings, 0 replies; 11+ messages in thread
From: John Fastabend @ 2020-01-09 14:58 UTC (permalink / raw)
  To: Toshiaki Makita, John Fastabend, bjorn.topel, bpf, toke, jbrouer
  Cc: netdev, ast, daniel

Toshiaki Makita wrote:
> On 2020/01/09 15:35, John Fastabend wrote:
> > Toshiaki Makita wrote:
> >> On 2020/01/09 6:35, John Fastabend wrote:
> >>> Now that we depend on rcu_call() and synchronize_rcu() to also wait
> >>> for preempt_disabled region to complete the rcu read critical section
> >>> in __dev_map_flush() is no longer relevant.
> >>>
> >>> These originally ensured the map reference was safe while a map was
> >>> also being free'd. But flush by new rules can only be called from
> >>> preempt-disabled NAPI context. The synchronize_rcu from the map free
> >>> path and the rcu_call from the delete path will ensure the reference
> >>> here is safe. So lets remove the rcu_read_lock and rcu_read_unlock
> >>> pair to avoid any confusion around how this is being protected.
> >>>
> >>> If the rcu_read_lock was required it would mean errors in the above
> >>> logic and the original patch would also be wrong.
> >>>
> >>> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> >>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> >>> ---
> >>>    kernel/bpf/devmap.c |    2 --
> >>>    1 file changed, 2 deletions(-)
> >>>
> >>> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> >>> index f0bf525..0129d4a 100644
> >>> --- a/kernel/bpf/devmap.c
> >>> +++ b/kernel/bpf/devmap.c
> >>> @@ -378,10 +378,8 @@ void __dev_map_flush(void)
> >>>    	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
> >>>    	struct xdp_bulk_queue *bq, *tmp;
> >>>    
> >>> -	rcu_read_lock();
> >>>    	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
> >>>    		bq_xmit_all(bq, XDP_XMIT_FLUSH);
> >>> -	rcu_read_unlock();
> >>
> >> I introduced this lock because some drivers have assumption that
> >> .ndo_xdp_xmit() is called under RCU. (commit 86723c864063)
> >>
> >> Maybe devmap deletion logic does not need this anymore, but is it
> >> OK to drivers?
> > 
> > Ah OK thanks for catching this. So its a strange requirement from
> > virto_net to need read_lock like this. Quickly scanned the drivers
> > and seems its the only one.
> > 
> > I think the best path forward is to fix virtio_net so it doesn't
> > need rcu_read_lock() here then the locking is much cleaner IMO.
> 
> Actually veth is calling rcu_dereference in .ndo_xdp_xmit() so it needs
> the same treatment. In the reference I sent in another mail, Jesper
> said mlx5 also has some RCU dependency.

So veth, virtio and tun seem to need rcu_read_lock/unlock because
they use an rcu_dereference() in the xdp_xmit path. I'll audit the
rest today.

@Jesper, recall why mlx5 would require rcu_read_lock()/unlock()
pair? I just looked at mlx5_xdp_xmit and I'm not seeing a
rcu_dereference in there so if it is required I would want
to understand why.

> 
> > I'll send a v2 and either move the xdp enabled check (the piece
> > using the rcu_read_lock) into a bitmask flag or push the
> > rcu_read_lock() into virtio_net so its clear that this is a detail
> > of virtio_net and not a general thing. FWIW I don't think the
> > rcu_read_lock is actually needed in the virtio_net case anymore
> > either but pretty sure the rcu_dereference will cause an rcu
> > splat. Maybe there is another annotation we can use. I'll dig
> > into it tomorrow. Thanks
> 
> I'm thinking we can just move the rcu_lock to wrap around only ndo_xdp_xmit.
> But as you suggest if we can identify all drivers which depends on RCU and move the
> rcu_lock into the drivers (or remove the dependency) it's better.

I think we are working in bpf-next tree here so it would be best
to identify the minimal set of drivers that require the read_lock
and push that into the driver. I prefer these things are a precise
so its easy to understand when reading the code. Otherwise its
really not clear without grepping through the code or walking
the git history why we wrapped this in a rcu_read_lock/unlock.
At minimum we want a comment but that feels like a big hammer
that is not needed.

Most drivers should not care about the rcu_read_lock it seems
to just be special cases in the software devices where this happens.
veth for example is dereferencing the peer netdev. tun is dereference
the tun_file. virtio_net usage seems to be arbitrary to me and
is simply used to decide if xdp is enabled but we can do that
in a cleaner way.

I'll put a v2 together today and send it out for review.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-01-09 14:58 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-08 21:34 [bpf PATCH 0/2] xdp devmap improvements cleanup John Fastabend
2020-01-08 21:34 ` [bpf PATCH 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage John Fastabend
2020-01-09  1:20   ` Song Liu
2020-01-08 21:35 ` [bpf PATCH 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock() John Fastabend
2020-01-09  1:21   ` Song Liu
2020-01-09  6:01   ` Toshiaki Makita
2020-01-09  6:34     ` Toshiaki Makita
2020-01-09  6:35     ` John Fastabend
2020-01-09  6:57       ` Toshiaki Makita
2020-01-09 14:58         ` John Fastabend
2020-01-09  6:09 ` [bpf PATCH 0/2] xdp devmap improvements cleanup Björn Töpel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).