BPF Archive on lore.kernel.org
* [bpf-next PATCH v2 0/2] xdp devmap improvements cleanup
@ 2020-01-12  2:36 John Fastabend
  2020-01-12  2:37 ` [bpf-next PATCH v2 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage John Fastabend
  2020-01-12  2:37 ` [bpf-next PATCH v2 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock() John Fastabend
  0 siblings, 2 replies; 10+ messages in thread
From: John Fastabend @ 2020-01-12  2:36 UTC (permalink / raw)
  To: bjorn.topel, bpf, toke, toshiaki.makita1
  Cc: netdev, john.fastabend, ast, daniel

A couple of cleanup patches on top of the recently posted series[0]
from Bjorn to clean up and optimize the devmap usage. The patches
carry the commit ids the cleanups apply to.

Toshiaki noted that virtio_net depends on rcu_read_lock being held
when calling the xdp flush routines. This is specific to virtio_net
dereferencing the xdp program to determine if xdp is enabled or not.
The more typical pattern, at least in the case of real hardware
drivers, is to look at a flag set by the driver. veth has a similar
requirement where it dereferences the peer netdev priv structure,
requiring the rcu_read_lock. I believe it's better to put the
rcu_read_lock()/unlock() pair where it's actually used, in the
driver. FWIW, in other xdp paths we expect driver writers to
place rcu_read_lock/unlock pairs where they are needed as well,
so this keeps that expectation. It also improves readability,
making it obvious why the rcu_read_lock and unlock pair is
needed. In the virtio case we can probably do some further
cleanup and remove it altogether if we want. For more details
see patch 2/2.

[0] https://www.spinics.net/lists/netdev/msg620639.html

v2: Place rcu_read_{un}lock pair in virtio_net and veth drivers
    so we don't break this requirement when removing rcu read
    lock from devmap.
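
The guarantee this series rests on can be sketched as follows. This is
an illustrative timing diagram, not kernel source; it summarizes my
reading of the 0536b85239b84 rationale, namely that since the RCU
flavor consolidation (around v4.20) a preempt-disabled region such as
a NAPI poll counts as a read-side critical section for
synchronize_rcu()/call_rcu():

```c
/*
 * CPU 0 (NAPI poll, softirq,           CPU 1 (devmap teardown /
 *        preemption disabled)                 delete_elem)
 * ---------------------------          ---------------------------
 * napi->poll()
 *   bpf_prog_run_xdp()                 old_dev = xchg(entry, NULL);
 *     -> XDP_REDIRECT enqueues         synchronize_rcu() / call_rcu()
 *   xdp_do_flush_map()                   blocks until CPU 0 leaves
 *     -> bq_xmit_all()                   the preempt-disabled region,
 * poll returns, preempt enabled          so any pending flush is done
 *                                      free resources safely
 */
```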

---

John Fastabend (2):
      bpf: xdp, update devmap comments to reflect napi/rcu usage
      bpf: xdp, remove no longer required rcu_read_{un}lock()


 drivers/net/veth.c       |    6 +++++-
 drivers/net/virtio_net.c |    8 ++++++--
 kernel/bpf/devmap.c      |   26 ++++++++++++++------------
 3 files changed, 25 insertions(+), 15 deletions(-)

--
Signature

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [bpf-next PATCH v2 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage
  2020-01-12  2:36 [bpf-next PATCH v2 0/2] xdp devmap improvements cleanup John Fastabend
@ 2020-01-12  2:37 ` John Fastabend
  2020-01-12  7:47   ` Maciej Fijalkowski
  2020-01-12 11:21   ` Toke Høiland-Jørgensen
  2020-01-12  2:37 ` [bpf-next PATCH v2 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock() John Fastabend
  1 sibling, 2 replies; 10+ messages in thread
From: John Fastabend @ 2020-01-12  2:37 UTC (permalink / raw)
  To: bjorn.topel, bpf, toke, toshiaki.makita1
  Cc: netdev, john.fastabend, ast, daniel

Now that we rely on synchronize_rcu and call_rcu waiting to
exit perempt-disable regions (NAPI) lets update the comments
to reflect this.

Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/devmap.c |   21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index da9c832..f0bf525 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -193,10 +193,12 @@ static void dev_map_free(struct bpf_map *map)
 
 	/* At this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
 	 * so the programs (can be more than one that used this map) were
-	 * disconnected from events. Wait for outstanding critical sections in
-	 * these programs to complete. The rcu critical section only guarantees
-	 * no further reads against netdev_map. It does __not__ ensure pending
-	 * flush operations (if any) are complete.
+	 * disconnected from events. The following synchronize_rcu() guarantees
+	 * both rcu read critical sections complete and waits for
+	 * preempt-disable regions (NAPI being the relavent context here) so we
+	 * are certain there will be no further reads against the netdev_map and
+	 * all flush operations are complete. Flush operations can only be done
+	 * from NAPI context for this reason.
 	 */
 
 	spin_lock(&dev_map_lock);
@@ -498,12 +500,11 @@ static int dev_map_delete_elem(struct bpf_map *map, void *key)
 		return -EINVAL;
 
 	/* Use call_rcu() here to ensure any rcu critical sections have
-	 * completed, but this does not guarantee a flush has happened
-	 * yet. Because driver side rcu_read_lock/unlock only protects the
-	 * running XDP program. However, for pending flush operations the
-	 * dev and ctx are stored in another per cpu map. And additionally,
-	 * the driver tear down ensures all soft irqs are complete before
-	 * removing the net device in the case of dev_put equals zero.
+	 * completed as well as any flush operations because call_rcu
+	 * will wait for preempt-disable region to complete, NAPI in this
+	 * context.  And additionally, the driver tear down ensures all
+	 * soft irqs are complete before removing the net device in the
+	 * case of dev_put equals zero.
 	 */
 	old_dev = xchg(&dtab->netdev_map[k], NULL);
 	if (old_dev)


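The teardown ordering the updated comment describes then looks roughly
like this. Names follow kernel/bpf/devmap.c but the body is heavily
elided; a sketch of the control flow, not the actual function:

```c
static void dev_map_free(struct bpf_map *map)
{
	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
	int i;

	/* Programs are already disconnected from events here. This one
	 * wait covers both rcu read critical sections and
	 * preempt-disable (NAPI) regions, so afterwards there are no
	 * readers of netdev_map and no flush operations in flight.
	 */
	synchronize_rcu();

	for (i = 0; i < dtab->map.max_entries; i++) {
		/* drop netdev references and free the entries */
	}
	kfree(dtab);
}
```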

* [bpf-next PATCH v2 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock()
  2020-01-12  2:36 [bpf-next PATCH v2 0/2] xdp devmap improvements cleanup John Fastabend
  2020-01-12  2:37 ` [bpf-next PATCH v2 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage John Fastabend
@ 2020-01-12  2:37 ` John Fastabend
  2020-01-12  6:49   ` Maciej Fijalkowski
  2020-01-12 11:22   ` Toke Høiland-Jørgensen
  1 sibling, 2 replies; 10+ messages in thread
From: John Fastabend @ 2020-01-12  2:37 UTC (permalink / raw)
  To: bjorn.topel, bpf, toke, toshiaki.makita1
  Cc: netdev, john.fastabend, ast, daniel

Now that we depend on rcu_call() and synchronize_rcu() to also wait
for preempt_disabled region to complete the rcu read critical section
in __dev_map_flush() is no longer required. Except in a few special
cases in drivers that need it for other reasons.

These originally ensured the map reference was safe while a map was
also being free'd. And additionally that bpf program updates via
ndo_bpf did not happen while flush updates were in flight. But flush
by new rules can only be called from preempt-disabled NAPI context.
The synchronize_rcu from the map free path and the rcu_call from the
delete path will ensure the reference there is safe. So lets remove
the rcu_read_lock and rcu_read_unlock pair to avoid any confusion
around how this is being protected.

If the rcu_read_lock was required it would mean errors in the above
logic and the original patch would also be wrong.

Now that we have done above we put the rcu_read_lock in the driver
code where it is needed in a driver dependent way. I think this
helps readability of the code so we know where and why we are
taking read locks. Most drivers will not need rcu_read_locks here
and further XDP drivers already have rcu_read_locks in their code
paths for reading xdp programs on RX side so this makes it symmetric
where we don't have half of rcu critical sections define in driver
and the other half in devmap.

Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 drivers/net/veth.c       |    6 +++++-
 drivers/net/virtio_net.c |    8 ++++++--
 kernel/bpf/devmap.c      |    5 +++--
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index a552df3..184e1b4 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -377,6 +377,7 @@ static int veth_xdp_xmit(struct net_device *dev, int n,
 	unsigned int max_len;
 	struct veth_rq *rq;
 
+	rcu_read_lock();
 	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) {
 		ret = -EINVAL;
 		goto drop;
@@ -418,11 +419,14 @@ static int veth_xdp_xmit(struct net_device *dev, int n,
 	if (flags & XDP_XMIT_FLUSH)
 		__veth_xdp_flush(rq);
 
-	if (likely(!drops))
+	if (likely(!drops)) {
+		rcu_read_unlock();
 		return n;
+	}
 
 	ret = n - drops;
 drop:
+	rcu_read_unlock();
 	atomic64_add(drops, &priv->dropped);
 
 	return ret;
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4d7d5434..2c11f82 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -498,12 +498,16 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	void *ptr;
 	int i;
 
+	rcu_read_lock();
+
 	/* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this
 	 * indicate XDP resources have been successfully allocated.
 	 */
 	xdp_prog = rcu_dereference(rq->xdp_prog);
-	if (!xdp_prog)
+	if (!xdp_prog) {
+		rcu_read_unlock();
 		return -ENXIO;
+	}
 
 	sq = virtnet_xdp_sq(vi);
 
@@ -552,7 +556,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	sq->stats.xdp_tx_drops += drops;
 	sq->stats.kicks += kicks;
 	u64_stats_update_end(&sq->stats.syncp);
-
+	rcu_read_unlock();
 	return ret;
 }
 
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index f0bf525..d0ce2e2 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -372,16 +372,17 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
  * from NET_RX_SOFTIRQ. Either way the poll routine must complete before the
  * net device can be torn down. On devmap tear down we ensure the flush list
  * is empty before completing to ensure all flush operations have completed.
+ * When drivers update the bpf program they may need to ensure any flush ops
+ * are also complete. Using synchronize_rcu or call_rcu will suffice for this
+ * because both wait for napi context to exit.
  */
 void __dev_map_flush(void)
 {
 	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
 	struct xdp_bulk_queue *bq, *tmp;
 
-	rcu_read_lock();
 	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
 		bq_xmit_all(bq, XDP_XMIT_FLUSH);
-	rcu_read_unlock();
 }
 
 /* rcu_read_lock (from syscall and BPF contexts) ensures that if a delete and/or


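For context on why the lock can go: after this change the only path
that reaches __dev_map_flush() looks roughly like the sketch below
(a simplified call chain, not actual source):

```c
/*
 * net_rx_action()                softirq context, preemption
 *   napi_poll()                  disabled: the whole chain already
 *     n->poll(n, weight)         runs inside a region that
 *       ... XDP_REDIRECT ...     synchronize_rcu()/call_rcu() must
 *       xdp_do_flush_map()       wait out, so the explicit
 *         __dev_map_flush()      rcu_read_lock()/unlock() pair
 *                                removed above added nothing.
 */
```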

* Re: [bpf-next PATCH v2 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock()
  2020-01-12  2:37 ` [bpf-next PATCH v2 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock() John Fastabend
@ 2020-01-12  6:49   ` Maciej Fijalkowski
  2020-01-14  3:25     ` John Fastabend
  2020-01-12 11:22   ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 10+ messages in thread
From: Maciej Fijalkowski @ 2020-01-12  6:49 UTC (permalink / raw)
  To: John Fastabend
  Cc: bjorn.topel, bpf, toke, toshiaki.makita1, netdev, ast, daniel

On Sat, Jan 11, 2020 at 06:37:42PM -0800, John Fastabend wrote:
> Now that we depend on rcu_call() and synchronize_rcu() to also wait
> for preempt_disabled region to complete the rcu read critical section
> in __dev_map_flush() is no longer required. Except in a few special
> cases in drivers that need it for other reasons.
> 
> These originally ensured the map reference was safe while a map was
> also being free'd. And additionally that bpf program updates via
> ndo_bpf did not happen while flush updates were in flight. But flush
> by new rules can only be called from preempt-disabled NAPI context.
> The synchronize_rcu from the map free path and the rcu_call from the
> delete path will ensure the reference there is safe. So lets remove
> the rcu_read_lock and rcu_read_unlock pair to avoid any confusion
> around how this is being protected.
> 
> If the rcu_read_lock was required it would mean errors in the above
> logic and the original patch would also be wrong.
> 
> Now that we have done above we put the rcu_read_lock in the driver
> code where it is needed in a driver dependent way. I think this
> helps readability of the code so we know where and why we are
> taking read locks. Most drivers will not need rcu_read_locks here
> and further XDP drivers already have rcu_read_locks in their code
> paths for reading xdp programs on RX side so this makes it symmetric
> where we don't have half of rcu critical sections define in driver
> and the other half in devmap.
> 
> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>  drivers/net/veth.c       |    6 +++++-
>  drivers/net/virtio_net.c |    8 ++++++--
>  kernel/bpf/devmap.c      |    5 +++--
>  3 files changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index a552df3..184e1b4 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -377,6 +377,7 @@ static int veth_xdp_xmit(struct net_device *dev, int n,
>  	unsigned int max_len;
>  	struct veth_rq *rq;
>  
> +	rcu_read_lock();
>  	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) {
>  		ret = -EINVAL;
>  		goto drop;
> @@ -418,11 +419,14 @@ static int veth_xdp_xmit(struct net_device *dev, int n,
>  	if (flags & XDP_XMIT_FLUSH)
>  		__veth_xdp_flush(rq);
>  
> -	if (likely(!drops))
> +	if (likely(!drops)) {
> +		rcu_read_unlock();
>  		return n;
> +	}
>  
>  	ret = n - drops;
>  drop:
> +	rcu_read_unlock();
>  	atomic64_add(drops, &priv->dropped);
>  
>  	return ret;
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 4d7d5434..2c11f82 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -498,12 +498,16 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>  	void *ptr;
>  	int i;
>  
> +	rcu_read_lock();
> +
>  	/* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this
>  	 * indicate XDP resources have been successfully allocated.
>  	 */
>  	xdp_prog = rcu_dereference(rq->xdp_prog);

We could convert that rcu_dereference to rcu_access_pointer so that we
don't need the rcu critical section here at all. Actually this was
suggested some time ago by David Ahern during the initial discussion
around this code. Not sure why we didn't change it.

Veth is also checking the xdp prog presence and it is doing that via
rcu_access_pointer so such conversion would make it more common, no?

xdp_prog is only checked against NULL, so quoting the part of comment from
rcu_access_pointer:
"This is useful when the value of this pointer is accessed, but the pointer
is not dereferenced, for example, when testing an RCU-protected pointer
against NULL."

> -	if (!xdp_prog)
> +	if (!xdp_prog) {
> +		rcu_read_unlock();
>  		return -ENXIO;
> +	}
>  
>  	sq = virtnet_xdp_sq(vi);
>  
> @@ -552,7 +556,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>  	sq->stats.xdp_tx_drops += drops;
>  	sq->stats.kicks += kicks;
>  	u64_stats_update_end(&sq->stats.syncp);
> -
> +	rcu_read_unlock();
>  	return ret;
>  }
>  
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index f0bf525..d0ce2e2 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -372,16 +372,17 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
>   * from NET_RX_SOFTIRQ. Either way the poll routine must complete before the
>   * net device can be torn down. On devmap tear down we ensure the flush list
>   * is empty before completing to ensure all flush operations have completed.
> + * When drivers update the bpf program they may need to ensure any flush ops
> + * are also complete. Using synchronize_rcu or call_rcu will suffice for this
> + * because both wait for napi context to exit.
>   */
>  void __dev_map_flush(void)
>  {
>  	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
>  	struct xdp_bulk_queue *bq, *tmp;
>  
> -	rcu_read_lock();
>  	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
>  		bq_xmit_all(bq, XDP_XMIT_FLUSH);
> -	rcu_read_unlock();
>  }
>  
>  /* rcu_read_lock (from syscall and BPF contexts) ensures that if a delete and/or
> 


* Re: [bpf-next PATCH v2 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage
  2020-01-12  2:37 ` [bpf-next PATCH v2 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage John Fastabend
@ 2020-01-12  7:47   ` Maciej Fijalkowski
  2020-01-14  3:27     ` John Fastabend
  2020-01-12 11:21   ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 10+ messages in thread
From: Maciej Fijalkowski @ 2020-01-12  7:47 UTC (permalink / raw)
  To: John Fastabend
  Cc: bjorn.topel, bpf, toke, toshiaki.makita1, netdev, ast, daniel

On Sat, Jan 11, 2020 at 06:37:21PM -0800, John Fastabend wrote:

Small nits for typos, can be ignored.

> Now that we rely on synchronize_rcu and call_rcu waiting to
> exit perempt-disable regions (NAPI) lets update the comments

s/perempt/preempt

> to reflect this.
> 
> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> Acked-by: Björn Töpel <bjorn.topel@intel.com>
> Acked-by: Song Liu <songliubraving@fb.com>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>  kernel/bpf/devmap.c |   21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index da9c832..f0bf525 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -193,10 +193,12 @@ static void dev_map_free(struct bpf_map *map)
>  
>  	/* At this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
>  	 * so the programs (can be more than one that used this map) were
> -	 * disconnected from events. Wait for outstanding critical sections in
> -	 * these programs to complete. The rcu critical section only guarantees
> -	 * no further reads against netdev_map. It does __not__ ensure pending
> -	 * flush operations (if any) are complete.
> +	 * disconnected from events. The following synchronize_rcu() guarantees
> +	 * both rcu read critical sections complete and waits for
> +	 * preempt-disable regions (NAPI being the relavent context here) so we

s/relavent/relevant

> +	 * are certain there will be no further reads against the netdev_map and
> +	 * all flush operations are complete. Flush operations can only be done
> +	 * from NAPI context for this reason.
>  	 */
>  
>  	spin_lock(&dev_map_lock);
> @@ -498,12 +500,11 @@ static int dev_map_delete_elem(struct bpf_map *map, void *key)
>  		return -EINVAL;
>  
>  	/* Use call_rcu() here to ensure any rcu critical sections have
> -	 * completed, but this does not guarantee a flush has happened
> -	 * yet. Because driver side rcu_read_lock/unlock only protects the
> -	 * running XDP program. However, for pending flush operations the
> -	 * dev and ctx are stored in another per cpu map. And additionally,
> -	 * the driver tear down ensures all soft irqs are complete before
> -	 * removing the net device in the case of dev_put equals zero.
> +	 * completed as well as any flush operations because call_rcu
> +	 * will wait for preempt-disable region to complete, NAPI in this
> +	 * context.  And additionally, the driver tear down ensures all
> +	 * soft irqs are complete before removing the net device in the
> +	 * case of dev_put equals zero.
>  	 */
>  	old_dev = xchg(&dtab->netdev_map[k], NULL);
>  	if (old_dev)
> 


* Re: [bpf-next PATCH v2 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage
  2020-01-12  2:37 ` [bpf-next PATCH v2 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage John Fastabend
  2020-01-12  7:47   ` Maciej Fijalkowski
@ 2020-01-12 11:21   ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 10+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-12 11:21 UTC (permalink / raw)
  To: John Fastabend, bjorn.topel, bpf, toshiaki.makita1
  Cc: netdev, john.fastabend, ast, daniel

John Fastabend <john.fastabend@gmail.com> writes:

> Now that we rely on synchronize_rcu and call_rcu waiting to
> exit perempt-disable regions (NAPI) lets update the comments
> to reflect this.
>
> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> Acked-by: Björn Töpel <bjorn.topel@intel.com>
> Acked-by: Song Liu <songliubraving@fb.com>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>



* Re: [bpf-next PATCH v2 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock()
  2020-01-12  2:37 ` [bpf-next PATCH v2 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock() John Fastabend
  2020-01-12  6:49   ` Maciej Fijalkowski
@ 2020-01-12 11:22   ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 10+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-12 11:22 UTC (permalink / raw)
  To: John Fastabend, bjorn.topel, bpf, toshiaki.makita1
  Cc: netdev, john.fastabend, ast, daniel

John Fastabend <john.fastabend@gmail.com> writes:

> Now that we depend on rcu_call() and synchronize_rcu() to also wait
> for preempt_disabled region to complete the rcu read critical section
> in __dev_map_flush() is no longer required. Except in a few special
> cases in drivers that need it for other reasons.
>
> These originally ensured the map reference was safe while a map was
> also being free'd. And additionally that bpf program updates via
> ndo_bpf did not happen while flush updates were in flight. But flush
> by new rules can only be called from preempt-disabled NAPI context.
> The synchronize_rcu from the map free path and the rcu_call from the
> delete path will ensure the reference there is safe. So lets remove
> the rcu_read_lock and rcu_read_unlock pair to avoid any confusion
> around how this is being protected.
>
> If the rcu_read_lock was required it would mean errors in the above
> logic and the original patch would also be wrong.
>
> Now that we have done above we put the rcu_read_lock in the driver
> code where it is needed in a driver dependent way. I think this
> helps readability of the code so we know where and why we are
> taking read locks. Most drivers will not need rcu_read_locks here
> and further XDP drivers already have rcu_read_locks in their code
> paths for reading xdp programs on RX side so this makes it symmetric
> where we don't have half of rcu critical sections define in driver
> and the other half in devmap.
>
> Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>



* Re: [bpf-next PATCH v2 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock()
  2020-01-14  3:25     ` John Fastabend
@ 2020-01-14  0:31       ` Maciej Fijalkowski
  0 siblings, 0 replies; 10+ messages in thread
From: Maciej Fijalkowski @ 2020-01-14  0:31 UTC (permalink / raw)
  To: John Fastabend
  Cc: bjorn.topel, bpf, toke, toshiaki.makita1, netdev, ast, daniel

On Mon, Jan 13, 2020 at 07:25:51PM -0800, John Fastabend wrote:
> Maciej Fijalkowski wrote:
> > On Sat, Jan 11, 2020 at 06:37:42PM -0800, John Fastabend wrote:
> > > Now that we depend on rcu_call() and synchronize_rcu() to also wait

one last thing since you'll be sending a v3, s/rcu_call/call_rcu.

> > > for preempt_disabled region to complete the rcu read critical section
> > > in __dev_map_flush() is no longer required. Except in a few special
> > > cases in drivers that need it for other reasons.
> > > 
> > > These originally ensured the map reference was safe while a map was
> > > also being free'd. And additionally that bpf program updates via
> > > ndo_bpf did not happen while flush updates were in flight. But flush
> > > by new rules can only be called from preempt-disabled NAPI context.
> > > The synchronize_rcu from the map free path and the rcu_call from the
> > > delete path will ensure the reference there is safe. So lets remove
> > > the rcu_read_lock and rcu_read_unlock pair to avoid any confusion
> > > around how this is being protected.
> > > 
> > > If the rcu_read_lock was required it would mean errors in the above
> > > logic and the original patch would also be wrong.
> > > 
> > > Now that we have done above we put the rcu_read_lock in the driver
> > > code where it is needed in a driver dependent way. I think this
> > > helps readability of the code so we know where and why we are
> > > taking read locks. Most drivers will not need rcu_read_locks here
> > > and further XDP drivers already have rcu_read_locks in their code
> > > paths for reading xdp programs on RX side so this makes it symmetric
> > > where we don't have half of rcu critical sections define in driver
> > > and the other half in devmap.
> > > 
> > > Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> > > Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> > > ---
> 
> [...]
> 
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 4d7d5434..2c11f82 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -498,12 +498,16 @@ static int virtnet_xdp_xmit(struct net_device *dev,
> > >  	void *ptr;
> > >  	int i;
> > >  
> > > +	rcu_read_lock();
> > > +
> > >  	/* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this
> > >  	 * indicate XDP resources have been successfully allocated.
> > >  	 */
> > >  	xdp_prog = rcu_dereference(rq->xdp_prog);
> > 
> > We could convert that rcu_dereference to rcu_access_pointer so that we
> > don't need the rcu critical section here at all. Actually this was
> > suggested some time ago by David Ahern during the initial discussion
> > around this code. Not sure why we didn't change it.
> 
> Makes sense I'll send a v3 with a middle patch to do this and then drop
> this segment.

Great :)

> 
> > 
> > Veth is also checking the xdp prog presence and it is doing that via
> > rcu_access_pointer so such conversion would make it more common, no?
> 
> veth dereferences rcv netdevice and this accesses it. The logic to
> drop the rcu here is less obvious to me. At least I would have to
> study it closely.

Veth does two rcu derefs in the veth_xmit, one for netdev and one for
xdp_prog and I was referring to a xdp_prog deref which is done via
rcu_access_pointer. So yeah we need to keep the rcu section in there but I
was just making an argument for having the rcu_access_pointer on the
virtio_net side.

> 
> > 
> > xdp_prog is only checked against NULL, so quoting the part of comment from
> > rcu_access_pointer:
> > "This is useful when the value of this pointer is accessed, but the pointer
> > is not dereferenced, for example, when testing an RCU-protected pointer
> > against NULL."
> 
> +1 thanks it does make the cleanup nicer.


* Re: [bpf-next PATCH v2 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock()
  2020-01-12  6:49   ` Maciej Fijalkowski
@ 2020-01-14  3:25     ` John Fastabend
  2020-01-14  0:31       ` Maciej Fijalkowski
  0 siblings, 1 reply; 10+ messages in thread
From: John Fastabend @ 2020-01-14  3:25 UTC (permalink / raw)
  To: Maciej Fijalkowski, John Fastabend
  Cc: bjorn.topel, bpf, toke, toshiaki.makita1, netdev, ast, daniel

Maciej Fijalkowski wrote:
> On Sat, Jan 11, 2020 at 06:37:42PM -0800, John Fastabend wrote:
> > Now that we depend on rcu_call() and synchronize_rcu() to also wait
> > for preempt_disabled region to complete the rcu read critical section
> > in __dev_map_flush() is no longer required. Except in a few special
> > cases in drivers that need it for other reasons.
> > 
> > These originally ensured the map reference was safe while a map was
> > also being free'd. And additionally that bpf program updates via
> > ndo_bpf did not happen while flush updates were in flight. But flush
> > by new rules can only be called from preempt-disabled NAPI context.
> > The synchronize_rcu from the map free path and the rcu_call from the
> > delete path will ensure the reference there is safe. So lets remove
> > the rcu_read_lock and rcu_read_unlock pair to avoid any confusion
> > around how this is being protected.
> > 
> > If the rcu_read_lock was required it would mean errors in the above
> > logic and the original patch would also be wrong.
> > 
> > Now that we have done above we put the rcu_read_lock in the driver
> > code where it is needed in a driver dependent way. I think this
> > helps readability of the code so we know where and why we are
> > taking read locks. Most drivers will not need rcu_read_locks here
> > and further XDP drivers already have rcu_read_locks in their code
> > paths for reading xdp programs on RX side so this makes it symmetric
> > where we don't have half of rcu critical sections define in driver
> > and the other half in devmap.
> > 
> > Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
> > Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> > ---

[...]

> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 4d7d5434..2c11f82 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -498,12 +498,16 @@ static int virtnet_xdp_xmit(struct net_device *dev,
> >  	void *ptr;
> >  	int i;
> >  
> > +	rcu_read_lock();
> > +
> >  	/* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this
> >  	 * indicate XDP resources have been successfully allocated.
> >  	 */
> >  	xdp_prog = rcu_dereference(rq->xdp_prog);
> 
> We could convert that rcu_dereference to rcu_access_pointer so that we
> don't need the rcu critical section here at all. Actually this was
> suggested some time ago by David Ahern during the initial discussion
> around this code. Not sure why we didn't change it.

Makes sense I'll send a v3 with a middle patch to do this and then drop
this segment.

> 
> Veth is also checking the xdp prog presence and it is doing that via
> rcu_access_pointer so such conversion would make it more common, no?

veth dereferences rcv netdevice and this accesses it. The logic to
drop the rcu here is less obvious to me. At least I would have to
study it closely.

> 
> xdp_prog is only checked against NULL, so quoting the part of comment from
> rcu_access_pointer:
> "This is useful when the value of this pointer is accessed, but the pointer
> is not dereferenced, for example, when testing an RCU-protected pointer
> against NULL."

+1 thanks it does make the cleanup nicer.


* Re: [bpf-next PATCH v2 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage
  2020-01-12  7:47   ` Maciej Fijalkowski
@ 2020-01-14  3:27     ` John Fastabend
  0 siblings, 0 replies; 10+ messages in thread
From: John Fastabend @ 2020-01-14  3:27 UTC (permalink / raw)
  To: Maciej Fijalkowski, John Fastabend
  Cc: bjorn.topel, bpf, toke, toshiaki.makita1, netdev, ast, daniel

Maciej Fijalkowski wrote:
> On Sat, Jan 11, 2020 at 06:37:21PM -0800, John Fastabend wrote:
> 
> Small nits for typos, can be ignored.

Thanks, better to not have typos, and I'll send a v3 anyway
for the rcu_access_pointer comment in virtio_net.


end of thread, back to index

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-12  2:36 [bpf-next PATCH v2 0/2] xdp devmap improvements cleanup John Fastabend
2020-01-12  2:37 ` [bpf-next PATCH v2 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage John Fastabend
2020-01-12  7:47   ` Maciej Fijalkowski
2020-01-14  3:27     ` John Fastabend
2020-01-12 11:21   ` Toke Høiland-Jørgensen
2020-01-12  2:37 ` [bpf-next PATCH v2 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock() John Fastabend
2020-01-12  6:49   ` Maciej Fijalkowski
2020-01-14  3:25     ` John Fastabend
2020-01-14  0:31       ` Maciej Fijalkowski
2020-01-12 11:22   ` Toke Høiland-Jørgensen
