bpf.vger.kernel.org archive mirror
* [PATCH bpf-next v3 0/3] XDP flush cleanups
@ 2020-01-27  0:13 John Fastabend
  2020-01-27  0:14 ` [PATCH bpf-next v3 1/3] bpf: xdp, update devmap comments to reflect napi/rcu usage John Fastabend
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: John Fastabend @ 2020-01-27  0:13 UTC
  To: bpf
  Cc: bjorn.topel, songliubraving, john.fastabend, ast, daniel, toke,
	maciej.fijalkowski, netdev

A couple of updates to clean up some of the XDP comments and RCU usage.

It would be best if patch 1/3 goes into current bpf-next with the
associated patch in the fixes tag so we don't have out-of-sync
comments in the code. Just noting because it's close to time to close
the {bpf|net}-next branches.

v2->v3: Jesper noticed I can't spell, so fixed spelling. If we
are fixing comments it's best to have correct spelling.

v1->v2: Added 2/3 patch for virtio_net to use rcu_access_pointer
and avoid read_lock.

John Fastabend (3):
  bpf: xdp, update devmap comments to reflect napi/rcu usage
  bpf: xdp, virtio_net use access ptr macro for xdp enable check
  bpf: xdp, remove no longer required rcu_read_{un}lock()

 drivers/net/veth.c       |  6 +++++-
 drivers/net/virtio_net.c |  2 +-
 kernel/bpf/devmap.c      | 26 ++++++++++++++------------
 3 files changed, 20 insertions(+), 14 deletions(-)

-- 
2.7.4



* [PATCH bpf-next v3 1/3] bpf: xdp, update devmap comments to reflect napi/rcu usage
  2020-01-27  0:13 [PATCH bpf-next v3 0/3] XDP flush cleanups John Fastabend
@ 2020-01-27  0:14 ` John Fastabend
  2020-01-27  0:14 ` [PATCH bpf-next v3 2/3] bpf: xdp, virtio_net use access ptr macro for xdp enable check John Fastabend
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: John Fastabend @ 2020-01-27  0:14 UTC
  To: bpf
  Cc: bjorn.topel, songliubraving, john.fastabend, ast, daniel, toke,
	maciej.fijalkowski, netdev

Now that we rely on synchronize_rcu() and call_rcu() waiting for
preempt-disabled regions (NAPI) to exit, let's update the comments
to reflect this.

Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/devmap.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index da9c832..707583f 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -193,10 +193,12 @@ static void dev_map_free(struct bpf_map *map)
 
 	/* At this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
 	 * so the programs (can be more than one that used this map) were
-	 * disconnected from events. Wait for outstanding critical sections in
-	 * these programs to complete. The rcu critical section only guarantees
-	 * no further reads against netdev_map. It does __not__ ensure pending
-	 * flush operations (if any) are complete.
+	 * disconnected from events. The following synchronize_rcu() guarantees
+	 * both rcu read critical sections complete and waits for
+	 * preempt-disable regions (NAPI being the relevant context here) so we
+	 * are certain there will be no further reads against the netdev_map and
+	 * all flush operations are complete. Flush operations can only be done
+	 * from NAPI context for this reason.
 	 */
 
 	spin_lock(&dev_map_lock);
@@ -498,12 +500,11 @@ static int dev_map_delete_elem(struct bpf_map *map, void *key)
 		return -EINVAL;
 
 	/* Use call_rcu() here to ensure any rcu critical sections have
-	 * completed, but this does not guarantee a flush has happened
-	 * yet. Because driver side rcu_read_lock/unlock only protects the
-	 * running XDP program. However, for pending flush operations the
-	 * dev and ctx are stored in another per cpu map. And additionally,
-	 * the driver tear down ensures all soft irqs are complete before
-	 * removing the net device in the case of dev_put equals zero.
+	 * completed as well as any flush operations because call_rcu
+	 * will wait for preempt-disable region to complete, NAPI in this
+	 * context.  And additionally, the driver tear down ensures all
+	 * soft irqs are complete before removing the net device in the
+	 * case of dev_put equals zero.
 	 */
 	old_dev = xchg(&dtab->netdev_map[k], NULL);
 	if (old_dev)
-- 
2.7.4



* [PATCH bpf-next v3 2/3] bpf: xdp, virtio_net use access ptr macro for xdp enable check
  2020-01-27  0:13 [PATCH bpf-next v3 0/3] XDP flush cleanups John Fastabend
  2020-01-27  0:14 ` [PATCH bpf-next v3 1/3] bpf: xdp, update devmap comments to reflect napi/rcu usage John Fastabend
@ 2020-01-27  0:14 ` John Fastabend
  2020-01-27 10:02   ` Maciej Fijalkowski
  2020-01-27  0:14 ` [PATCH bpf-next v3 3/3] bpf: xdp, remove no longer required rcu_read_{un}lock() John Fastabend
  2020-01-27 10:58 ` [PATCH bpf-next v3 0/3] XDP flush cleanups Daniel Borkmann
  3 siblings, 1 reply; 6+ messages in thread
From: John Fastabend @ 2020-01-27  0:14 UTC
  To: bpf
  Cc: bjorn.topel, songliubraving, john.fastabend, ast, daniel, toke,
	maciej.fijalkowski, netdev

virtio_net currently relies on an rcu critical section to access the xdp
program in its xdp_xmit handler. However, the pointer to the xdp program
is only used to do a NULL pointer comparison to determine if xdp is
enabled or not.

Use rcu_access_pointer() instead of rcu_dereference() to reflect this.
Then later, when we drop the rcu_read critical section, virtio_net will
not need any special handling.

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
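Note (illustration only, not part of the patch): a minimal userspace
sketch of the distinction the commit message relies on. It assumes a
simplified model in which rcu_dereference() requires the caller to be
inside a read-side section because the result will be dereferenced,
while rcu_access_pointer() only fetches the value for a comparison.
All model_* names are hypothetical and do not exist in the kernel; the
depth counter merely stands in for the kernel's debug checks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace model only: every model_* name here is hypothetical. */
static int model_rcu_depth;	/* nesting depth of the modelled read-side section */

static void model_rcu_read_lock(void)   { model_rcu_depth++; }
static void model_rcu_read_unlock(void) { model_rcu_depth--; }

struct model_prog { int id; };
struct model_rq { struct model_prog *xdp_prog; };

/* Stands in for rcu_dereference(): the result may be dereferenced, so
 * the caller must be inside a read-side section.
 */
static struct model_prog *model_rcu_dereference(struct model_prog *const *pp)
{
	assert(model_rcu_depth > 0);
	return *pp;
}

/* Stands in for rcu_access_pointer(): the value is only compared and
 * never dereferenced, so no read-side section is needed.
 */
static struct model_prog *model_rcu_access_pointer(struct model_prog *const *pp)
{
	return *pp;
}

/* Mirrors the "is XDP enabled" check: a NULL comparison only. */
int model_xdp_xmit_allowed(const struct model_rq *rq)
{
	if (!model_rcu_access_pointer(&rq->xdp_prog))
		return -ENXIO;
	return 0;
}

/* Contrast: reading a field of the program needs the read-side section. */
int model_prog_id(const struct model_rq *rq)
{
	struct model_prog *prog;
	int id = -1;

	model_rcu_read_lock();
	prog = model_rcu_dereference(&rq->xdp_prog);
	if (prog)
		id = prog->id;
	model_rcu_read_unlock();
	return id;
}
```

In this model the enable check never touches the program's contents,
which is why it can run without a read-side section.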
 drivers/net/virtio_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4d7d5434..945eabc 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -501,7 +501,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	/* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this
 	 * indicate XDP resources have been successfully allocated.
 	 */
-	xdp_prog = rcu_dereference(rq->xdp_prog);
+	xdp_prog = rcu_access_pointer(rq->xdp_prog);
 	if (!xdp_prog)
 		return -ENXIO;
 
-- 
2.7.4



* [PATCH bpf-next v3 3/3] bpf: xdp, remove no longer required rcu_read_{un}lock()
  2020-01-27  0:13 [PATCH bpf-next v3 0/3] XDP flush cleanups John Fastabend
  2020-01-27  0:14 ` [PATCH bpf-next v3 1/3] bpf: xdp, update devmap comments to reflect napi/rcu usage John Fastabend
  2020-01-27  0:14 ` [PATCH bpf-next v3 2/3] bpf: xdp, virtio_net use access ptr macro for xdp enable check John Fastabend
@ 2020-01-27  0:14 ` John Fastabend
  2020-01-27 10:58 ` [PATCH bpf-next v3 0/3] XDP flush cleanups Daniel Borkmann
  3 siblings, 0 replies; 6+ messages in thread
From: John Fastabend @ 2020-01-27  0:14 UTC
  To: bpf
  Cc: bjorn.topel, songliubraving, john.fastabend, ast, daniel, toke,
	maciej.fijalkowski, netdev

Now that we depend on call_rcu() and synchronize_rcu() to also wait
for preempt-disabled regions to complete, the rcu read critical section
in __dev_map_flush() is no longer required, except in a few special
cases in drivers that need it for other reasons.

These originally ensured the map reference was safe while a map was
also being freed, and additionally that bpf program updates via
ndo_bpf did not happen while flush updates were in flight. But under
the new rules flush can only be called from preempt-disabled NAPI
context. The synchronize_rcu() from the map free path and the
call_rcu() from the delete path will ensure the reference there is
safe. So let's remove the rcu_read_lock()/rcu_read_unlock() pair to
avoid any confusion around how this is being protected.

If the rcu_read_lock() were required, it would mean errors in the
above logic and the original patch would also be wrong.

Now that we have done the above, we put the rcu_read_lock() in the
driver code where it is needed in a driver-dependent way. I think this
helps readability of the code so we know where and why we are
taking read locks. Most drivers will not need rcu_read_lock() here,
and further, XDP drivers already have rcu_read_lock() in their code
paths for reading xdp programs on the RX side, so this makes it
symmetric: we no longer have half of the rcu critical sections defined
in the driver and the other half in devmap.

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
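Note (illustration only, not part of the patch): a minimal userspace
sketch of the lock balancing that moving rcu_read_lock() into a
driver's xmit handler requires, where every return path has to drop
the read lock exactly once. The model_* names, MODEL_XMIT_FLAGS_MASK
and the "drops" parameter are hypothetical stand-ins, not the veth
code.

```c
#include <assert.h>
#include <errno.h>

/* Userspace model only: every model_* name here is hypothetical. */
static int model_rcu_depth;	/* nesting depth of the modelled read-side section */

static void model_rcu_read_lock(void)   { model_rcu_depth++; }
static void model_rcu_read_unlock(void) { model_rcu_depth--; }

#define MODEL_XMIT_FLAGS_MASK 0x1u	/* hypothetical set of valid flags */

/* Pretend to transmit n frames, of which `drops` fail. Returns the
 * number of frames sent, or a negative errno for invalid flags.
 */
int model_xdp_xmit(int n, int drops, unsigned int flags, long *dropped_total)
{
	int ret;

	assert(n >= 0 && drops >= 0 && drops <= n);

	model_rcu_read_lock();
	if (flags & ~MODEL_XMIT_FLAGS_MASK) {
		ret = -EINVAL;
		drops = n;		/* nothing was sent */
		goto drop;
	}

	if (!drops) {
		/* Fast path: unlock before the early return. */
		model_rcu_read_unlock();
		return n;
	}

	ret = n - drops;
drop:
	/* Slow path: the shared label unlocks once for both callers. */
	model_rcu_read_unlock();
	*dropped_total += drops;
	return ret;
}
```

Checking that model_rcu_depth is back to zero after each call is a
cheap way to confirm that no path leaks the read lock.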
 drivers/net/veth.c  | 6 +++++-
 kernel/bpf/devmap.c | 5 +++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index a552df3..184e1b4 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -377,6 +377,7 @@ static int veth_xdp_xmit(struct net_device *dev, int n,
 	unsigned int max_len;
 	struct veth_rq *rq;
 
+	rcu_read_lock();
 	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) {
 		ret = -EINVAL;
 		goto drop;
@@ -418,11 +419,14 @@ static int veth_xdp_xmit(struct net_device *dev, int n,
 	if (flags & XDP_XMIT_FLUSH)
 		__veth_xdp_flush(rq);
 
-	if (likely(!drops))
+	if (likely(!drops)) {
+		rcu_read_unlock();
 		return n;
+	}
 
 	ret = n - drops;
 drop:
+	rcu_read_unlock();
 	atomic64_add(drops, &priv->dropped);
 
 	return ret;
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 707583f..2b83c56 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -372,16 +372,17 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
  * from NET_RX_SOFTIRQ. Either way the poll routine must complete before the
  * net device can be torn down. On devmap tear down we ensure the flush list
  * is empty before completing to ensure all flush operations have completed.
+ * When drivers update the bpf program they may need to ensure any flush ops
+ * are also complete. Using synchronize_rcu or call_rcu will suffice for this
+ * because both wait for napi context to exit.
  */
 void __dev_map_flush(void)
 {
 	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
 	struct xdp_bulk_queue *bq, *tmp;
 
-	rcu_read_lock();
 	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
 		bq_xmit_all(bq, XDP_XMIT_FLUSH);
-	rcu_read_unlock();
 }
 
 /* rcu_read_lock (from syscall and BPF contexts) ensures that if a delete and/or
-- 
2.7.4



* Re: [PATCH bpf-next v3 2/3] bpf: xdp, virtio_net use access ptr macro for xdp enable check
  2020-01-27  0:14 ` [PATCH bpf-next v3 2/3] bpf: xdp, virtio_net use access ptr macro for xdp enable check John Fastabend
@ 2020-01-27 10:02   ` Maciej Fijalkowski
  0 siblings, 0 replies; 6+ messages in thread
From: Maciej Fijalkowski @ 2020-01-27 10:02 UTC
  To: John Fastabend
  Cc: bpf, bjorn.topel, songliubraving, ast, daniel, toke, netdev

On Sun, Jan 26, 2020 at 04:14:01PM -0800, John Fastabend wrote:
> virtio_net currently relies on an rcu critical section to access the xdp
> program in its xdp_xmit handler. However, the pointer to the xdp program
> is only used to do a NULL pointer comparison to determine if xdp is
> enabled or not.
> 
> Use rcu_access_pointer() instead of rcu_dereference() to reflect this.
> Then later, when we drop the rcu_read critical section, virtio_net will
> not need any special handling.
> 
> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>


* Re: [PATCH bpf-next v3 0/3] XDP flush cleanups
  2020-01-27  0:13 [PATCH bpf-next v3 0/3] XDP flush cleanups John Fastabend
                   ` (2 preceding siblings ...)
  2020-01-27  0:14 ` [PATCH bpf-next v3 3/3] bpf: xdp, remove no longer required rcu_read_{un}lock() John Fastabend
@ 2020-01-27 10:58 ` Daniel Borkmann
  3 siblings, 0 replies; 6+ messages in thread
From: Daniel Borkmann @ 2020-01-27 10:58 UTC
  To: John Fastabend, bpf
  Cc: bjorn.topel, songliubraving, ast, toke, maciej.fijalkowski, netdev

On 1/27/20 1:13 AM, John Fastabend wrote:
> A couple of updates to clean up some of the XDP comments and RCU usage.
> 
> It would be best if patch 1/3 goes into current bpf-next with the
> associated patch in the fixes tag so we don't have out-of-sync
> comments in the code. Just noting because it's close to time to close
> the {bpf|net}-next branches.
> 
> v2->v3: Jesper noticed I can't spell, so fixed spelling. If we
> are fixing comments it's best to have correct spelling.
> 
> v1->v2: Added 2/3 patch for virtio_net to use rcu_access_pointer
> and avoid read_lock.
> 
> John Fastabend (3):
>    bpf: xdp, update devmap comments to reflect napi/rcu usage
>    bpf: xdp, virtio_net use access ptr macro for xdp enable check
>    bpf: xdp, remove no longer required rcu_read_{un}lock()
> 
>   drivers/net/veth.c       |  6 +++++-
>   drivers/net/virtio_net.c |  2 +-
>   kernel/bpf/devmap.c      | 26 ++++++++++++++------------
>   3 files changed, 20 insertions(+), 14 deletions(-)
> 

Series applied, thanks. I had to manually massage patch 3/3 as it
wasn't rebased onto bpf-next.

