BPF Archive on lore.kernel.org
* [PATCH bpf-next 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT
@ 2020-01-10 14:22 Toke Høiland-Jørgensen
  2020-01-10 14:22 ` [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device Toke Høiland-Jørgensen
  2020-01-10 14:22 ` [PATCH bpf-next 2/2] xdp: Use bulking for non-map XDP_REDIRECT Toke Høiland-Jørgensen
  0 siblings, 2 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-10 14:22 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, Björn Töpel, John Fastabend

Since commit 96360004b862 ("xdp: Make devmap flush_list common for all map
instances"), devmap flushing is a global operation instead of tied to a
particular map. This means that with a bit of refactoring, we can finally fix
the performance delta between the bpf_redirect_map() and bpf_redirect() helper
functions, by introducing bulking for the latter as well.

This series makes this change by moving the data structure used for the bulking
into struct net_device itself, so we can access it even when there is no
devmap. Once this is done, moving the bpf_redirect() helper to use the bulking
mechanism becomes quite trivial, and brings bpf_redirect() up to the same
performance as bpf_redirect_map():

                  Before:   After:
bpf_redirect_map: 8.4 Mpps  8.4 Mpps  (no change)
bpf_redirect:     5.0 Mpps  8.4 Mpps  (+68%)
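
To illustrate why bulking closes the gap, here is a minimal userspace sketch,
not kernel code. All sim_* names are hypothetical; only the bulk size of 16
mirrors DEV_MAP_BULK_SIZE from the series. It counts how often a simulated
driver transmit hook runs when frames are queued and flushed in bulk, versus
sent one at a time.

```c
#include <assert.h>
#include <stddef.h>

#define BULK_SIZE 16            /* mirrors DEV_MAP_BULK_SIZE in the series */

/* Hypothetical stand-in for a device; counts transmit-hook invocations. */
struct sim_dev {
	unsigned int xmit_calls;    /* number of driver invocations */
	unsigned int frames_sent;   /* total frames handed to the driver */
};

/* Hypothetical stand-in for the per-device bulk queue. */
struct sim_bulk_queue {
	int frames[BULK_SIZE];      /* placeholder for frame pointers */
	unsigned int count;
	struct sim_dev *dev;
};

static void sim_xmit(struct sim_dev *dev, unsigned int n)
{
	dev->xmit_calls++;
	dev->frames_sent += n;
}

/* Send everything queued so far in a single driver call. */
static void sim_flush(struct sim_bulk_queue *bq)
{
	if (bq->count == 0)
		return;
	sim_xmit(bq->dev, bq->count);
	bq->count = 0;
}

/* Queue one frame, flushing first if the queue is already full. */
static void sim_enqueue(struct sim_bulk_queue *bq, int frame)
{
	assert(bq->count <= BULK_SIZE);
	if (bq->count == BULK_SIZE)
		sim_flush(bq);
	bq->frames[bq->count++] = frame;
}

/* Redirect n frames with bulking; returns the number of driver calls. */
static unsigned int sim_redirect_bulked(unsigned int n)
{
	struct sim_dev dev = { 0, 0 };
	struct sim_bulk_queue bq = { .count = 0, .dev = &dev };
	unsigned int i;

	for (i = 0; i < n; i++)
		sim_enqueue(&bq, (int)i);
	sim_flush(&bq);             /* final flush at the end of the batch */
	assert(dev.frames_sent == n);
	return dev.xmit_calls;
}

/* Redirect n frames one at a time; returns the number of driver calls. */
static unsigned int sim_redirect_unbulked(unsigned int n)
{
	struct sim_dev dev = { 0, 0 };
	unsigned int i;

	for (i = 0; i < n; i++)
		sim_xmit(&dev, 1);
	return dev.xmit_calls;
}
```

In this model, sim_redirect_bulked(64) makes 4 driver calls where
sim_redirect_unbulked(64) makes 64. The real savings depend on the driver, so
treat this as an intuition aid rather than a performance model.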

After this patch series, the only semantic difference between the two variants
of the redirect helper (apart from the absence of a map argument, obviously) is
that the _map() variant will return an error if passed an invalid map index,
whereas the bpf_redirect() helper will succeed, but drop packets on
xdp_do_redirect(). This is because the helper has no reference to the calling
netdev, so unfortunately we can't do the ifindex lookup directly in the helper.
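
The difference in when the failure surfaces can be sketched in userspace as
follows. This is a toy model under stated assumptions: every sim_* name, the
return codes and the -EINVAL value are hypothetical illustrations, not the
kernel's actual helper signatures or return values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SIM_MAP_SIZE 4

/* Hypothetical helper return codes; the values are illustrative only. */
enum sim_helper_ret { SIM_HELPER_ERR = 0, SIM_HELPER_OK = 1 };

/* Hypothetical redirect state recorded by the helpers for later use. */
struct sim_redirect_info {
	int ifindex;                /* target device index, 0 if unset */
};

/* A toy devmap: a slot value of 0 means the slot is empty. */
static const int sim_devmap[SIM_MAP_SIZE] = { 3, 0, 7, 0 };

/* Toy device table standing in for the ifindex lookup. */
static int sim_dev_exists(int ifindex)
{
	return ifindex == 3 || ifindex == 7;
}

/* Map variant: the lookup happens in the helper, so it can fail early. */
static enum sim_helper_ret sim_redirect_map(struct sim_redirect_info *ri,
					    unsigned int idx)
{
	assert(ri != NULL);
	if (idx >= SIM_MAP_SIZE || sim_devmap[idx] == 0)
		return SIM_HELPER_ERR;
	ri->ifindex = sim_devmap[idx];
	return SIM_HELPER_OK;
}

/* Non-map variant: only records the ifindex, so it always succeeds. */
static enum sim_helper_ret sim_redirect(struct sim_redirect_info *ri,
					int ifindex)
{
	assert(ri != NULL);
	ri->ifindex = ifindex;
	return SIM_HELPER_OK;
}

/* Later stage: the device lookup happens here, and a miss drops the frame. */
static int sim_do_redirect(const struct sim_redirect_info *ri)
{
	assert(ri != NULL);
	if (!sim_dev_exists(ri->ifindex))
		return -EINVAL;     /* frame is dropped by the caller */
	return 0;
}
```

With this model, sim_redirect_map() with an out-of-range index fails
immediately, while sim_redirect() with a nonexistent ifindex reports success
and the failure only shows up in sim_do_redirect().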

---

Toke Høiland-Jørgensen (2):
      xdp: Move devmap bulk queue into struct net_device
      xdp: Use bulking for non-map XDP_REDIRECT


 include/linux/bpf.h        |   13 +++++-
 include/linux/netdevice.h  |    3 +
 include/trace/events/xdp.h |    2 -
 kernel/bpf/devmap.c        |   92 ++++++++++++++++++++++----------------------
 net/core/dev.c             |    2 +
 net/core/filter.c          |   30 +-------------
 6 files changed, 66 insertions(+), 76 deletions(-)



* [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device
  2020-01-10 14:22 [PATCH bpf-next 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT Toke Høiland-Jørgensen
@ 2020-01-10 14:22 ` Toke Høiland-Jørgensen
  2020-01-10 15:03   ` Björn Töpel
  2020-01-10 16:08   ` Jesper Dangaard Brouer
  2020-01-10 14:22 ` [PATCH bpf-next 2/2] xdp: Use bulking for non-map XDP_REDIRECT Toke Høiland-Jørgensen
  1 sibling, 2 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-10 14:22 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, Björn Töpel, John Fastabend

From: Toke Høiland-Jørgensen <toke@redhat.com>

Commit 96360004b862 ("xdp: Make devmap flush_list common for all map
instances") changed devmap flushing to be a global operation instead of a
per-map operation. However, the queue structure used for bulking was still
allocated as part of the containing map.

This patch moves the devmap bulk queue into struct net_device. The
motivation for this is reusing it for the non-map variant of XDP_REDIRECT,
which will be changed in a subsequent commit.

We defer the actual allocation of the bulk queue structure until the
NETDEV_REGISTER notification in devmap.c. This makes it possible to check for
ndo_xdp_xmit support before allocating the structure, which is not possible
at the time struct net_device is allocated. However, we keep the freeing in
free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER.
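
The lifecycle described above can be sketched in userspace roughly like this.
It is a simplified model, not the patch itself: the sim_* names and
SIM_NR_CPUS are hypothetical, calloc() stands in for the per-CPU allocator,
and a boolean stands in for the ndo_xdp_xmit check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SIM_NR_CPUS 4           /* hypothetical CPU count for the model */

struct sim_netdev;

struct sim_bulkq {
	struct sim_netdev *dev;     /* back-pointer, set once at allocation */
	unsigned int count;
};

struct sim_netdev {
	bool has_xdp_xmit;          /* stands in for a non-NULL ndo_xdp_xmit */
	struct sim_bulkq *bulkq;    /* one slot per CPU, or NULL */
};

/* Registration hook: allocate only for XDP-capable devices, and only once. */
static int sim_on_register(struct sim_netdev *dev)
{
	int cpu;

	assert(dev != NULL);
	if (!dev->has_xdp_xmit || dev->bulkq)
		return 0;

	dev->bulkq = calloc(SIM_NR_CPUS, sizeof(*dev->bulkq));
	if (!dev->bulkq)
		return -1;

	for (cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		dev->bulkq[cpu].dev = dev;
	return 0;
}

/* Final teardown: free(NULL) is a no-op, so no capability check is needed. */
static void sim_free_netdev(struct sim_netdev *dev)
{
	assert(dev != NULL);
	free(dev->bulkq);
	dev->bulkq = NULL;
}
```

Freeing unconditionally in the teardown path keeps the model symmetric with
the description: devices that never got a queue simply free a NULL pointer.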

Because of this change, we lose the reference back to the map that
originated the redirect, so change the tracepoint to always return 0 as the
map ID and index. Otherwise no functional change is intended with this
patch.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/netdevice.h  |    3 ++
 include/trace/events/xdp.h |    2 +
 kernel/bpf/devmap.c        |   61 ++++++++++++++++++--------------------------
 net/core/dev.c             |    2 +
 4 files changed, 31 insertions(+), 37 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2741aa35bec6..1b2bc2a7522e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -876,6 +876,7 @@ enum bpf_netdev_command {
 struct bpf_prog_offload_ops;
 struct netlink_ext_ack;
 struct xdp_umem;
+struct xdp_dev_bulk_queue;
 
 struct netdev_bpf {
 	enum bpf_netdev_command command;
@@ -1993,6 +1994,8 @@ struct net_device {
 	spinlock_t		tx_global_lock;
 	int			watchdog_timeo;
 
+	struct xdp_dev_bulk_queue __percpu *xdp_bulkq;
+
 #ifdef CONFIG_XPS
 	struct xps_dev_maps __rcu *xps_cpus_map;
 	struct xps_dev_maps __rcu *xps_rxqs_map;
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index a7378bcd9928..72bad13d4a3c 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -278,7 +278,7 @@ TRACE_EVENT(xdp_devmap_xmit,
 	),
 
 	TP_fast_assign(
-		__entry->map_id		= map->id;
+		__entry->map_id		= map ? map->id : 0;
 		__entry->act		= XDP_REDIRECT;
 		__entry->map_index	= map_index;
 		__entry->drops		= drops;
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index da9c832fc5c8..bcb05cb6b728 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -53,13 +53,11 @@
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
 
 #define DEV_MAP_BULK_SIZE 16
-struct bpf_dtab_netdev;
-
-struct xdp_bulk_queue {
+struct xdp_dev_bulk_queue {
 	struct xdp_frame *q[DEV_MAP_BULK_SIZE];
 	struct list_head flush_node;
+	struct net_device *dev;
 	struct net_device *dev_rx;
-	struct bpf_dtab_netdev *obj;
 	unsigned int count;
 };
 
@@ -67,9 +65,8 @@ struct bpf_dtab_netdev {
 	struct net_device *dev; /* must be first member, due to tracepoint */
 	struct hlist_node index_hlist;
 	struct bpf_dtab *dtab;
-	struct xdp_bulk_queue __percpu *bulkq;
 	struct rcu_head rcu;
-	unsigned int idx; /* keep track of map index for tracepoint */
+	unsigned int idx;
 };
 
 struct bpf_dtab {
@@ -219,7 +216,6 @@ static void dev_map_free(struct bpf_map *map)
 
 			hlist_for_each_entry_safe(dev, next, head, index_hlist) {
 				hlist_del_rcu(&dev->index_hlist);
-				free_percpu(dev->bulkq);
 				dev_put(dev->dev);
 				kfree(dev);
 			}
@@ -234,7 +230,6 @@ static void dev_map_free(struct bpf_map *map)
 			if (!dev)
 				continue;
 
-			free_percpu(dev->bulkq);
 			dev_put(dev->dev);
 			kfree(dev);
 		}
@@ -320,10 +315,9 @@ static int dev_map_hash_get_next_key(struct bpf_map *map, void *key,
 	return -ENOENT;
 }
 
-static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
+static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
 {
-	struct bpf_dtab_netdev *obj = bq->obj;
-	struct net_device *dev = obj->dev;
+	struct net_device *dev = bq->dev;
 	int sent = 0, drops = 0, err = 0;
 	int i;
 
@@ -346,8 +340,7 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
 out:
 	bq->count = 0;
 
-	trace_xdp_devmap_xmit(&obj->dtab->map, obj->idx,
-			      sent, drops, bq->dev_rx, dev, err);
+	trace_xdp_devmap_xmit(NULL, 0, sent, drops, bq->dev_rx, dev, err);
 	bq->dev_rx = NULL;
 	__list_del_clearprev(&bq->flush_node);
 	return 0;
@@ -374,7 +367,7 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
 void __dev_map_flush(void)
 {
 	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
-	struct xdp_bulk_queue *bq, *tmp;
+	struct xdp_dev_bulk_queue *bq, *tmp;
 
 	rcu_read_lock();
 	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
@@ -401,12 +394,12 @@ struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
 /* Runs under RCU-read-side, plus in softirq under NAPI protection.
  * Thus, safe percpu variable access.
  */
-static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf,
+static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
 		      struct net_device *dev_rx)
 
 {
 	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
-	struct xdp_bulk_queue *bq = this_cpu_ptr(obj->bulkq);
+	struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq);
 
 	if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
 		bq_xmit_all(bq, 0);
@@ -444,7 +437,7 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
 
-	return bq_enqueue(dst, xdpf, dev_rx);
+	return bq_enqueue(dev, xdpf, dev_rx);
 }
 
 int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
@@ -483,7 +476,6 @@ static void __dev_map_entry_free(struct rcu_head *rcu)
 	struct bpf_dtab_netdev *dev;
 
 	dev = container_of(rcu, struct bpf_dtab_netdev, rcu);
-	free_percpu(dev->bulkq);
 	dev_put(dev->dev);
 	kfree(dev);
 }
@@ -538,30 +530,14 @@ static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
 						    u32 ifindex,
 						    unsigned int idx)
 {
-	gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN;
 	struct bpf_dtab_netdev *dev;
-	struct xdp_bulk_queue *bq;
-	int cpu;
 
-	dev = kmalloc_node(sizeof(*dev), gfp, dtab->map.numa_node);
+	dev = kmalloc_node(sizeof(*dev), GFP_ATOMIC | __GFP_NOWARN, dtab->map.numa_node);
 	if (!dev)
 		return ERR_PTR(-ENOMEM);
 
-	dev->bulkq = __alloc_percpu_gfp(sizeof(*dev->bulkq),
-					sizeof(void *), gfp);
-	if (!dev->bulkq) {
-		kfree(dev);
-		return ERR_PTR(-ENOMEM);
-	}
-
-	for_each_possible_cpu(cpu) {
-		bq = per_cpu_ptr(dev->bulkq, cpu);
-		bq->obj = dev;
-	}
-
 	dev->dev = dev_get_by_index(net, ifindex);
 	if (!dev->dev) {
-		free_percpu(dev->bulkq);
 		kfree(dev);
 		return ERR_PTR(-EINVAL);
 	}
@@ -721,9 +697,22 @@ static int dev_map_notification(struct notifier_block *notifier,
 {
 	struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
 	struct bpf_dtab *dtab;
-	int i;
+	int i, cpu;
 
 	switch (event) {
+	case NETDEV_REGISTER:
+		if (!netdev->netdev_ops->ndo_xdp_xmit || netdev->xdp_bulkq)
+			break;
+
+		/* will be freed in free_netdev() */
+		netdev->xdp_bulkq = __alloc_percpu_gfp(sizeof(struct xdp_dev_bulk_queue),
+						       sizeof(void *), GFP_ATOMIC);
+		if (!netdev->xdp_bulkq)
+			return NOTIFY_BAD;
+
+		for_each_possible_cpu(cpu)
+			per_cpu_ptr(netdev->xdp_bulkq, cpu)->dev = netdev;
+		break;
 	case NETDEV_UNREGISTER:
 		/* This rcu_read_lock/unlock pair is needed because
 		 * dev_map_list is an RCU list AND to ensure a delete
diff --git a/net/core/dev.c b/net/core/dev.c
index d99f88c58636..e7802a41ae7f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9847,6 +9847,8 @@ void free_netdev(struct net_device *dev)
 
 	free_percpu(dev->pcpu_refcnt);
 	dev->pcpu_refcnt = NULL;
+	free_percpu(dev->xdp_bulkq);
+	dev->xdp_bulkq = NULL;
 
 	netdev_unregister_lockdep_key(dev);
 



* [PATCH bpf-next 2/2] xdp: Use bulking for non-map XDP_REDIRECT
  2020-01-10 14:22 [PATCH bpf-next 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT Toke Høiland-Jørgensen
  2020-01-10 14:22 ` [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device Toke Høiland-Jørgensen
@ 2020-01-10 14:22 ` Toke Høiland-Jørgensen
  2020-01-10 15:15   ` Björn Töpel
  1 sibling, 1 reply; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-10 14:22 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, Björn Töpel, John Fastabend

From: Toke Høiland-Jørgensen <toke@redhat.com>

Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
we can re-use the bulking for the non-map version of the bpf_redirect()
helper. This is a simple matter of having xdp_do_redirect_slow() queue the
frame on the bulk queue instead of sending it out with __bpf_tx_xdp().

Unfortunately we can't make the bpf_redirect() helper return an error if
the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
have a reference to the network namespace of the ingress device at the time
the helper is called. So we have to leave it as-is and keep the device
lookup in xdp_do_redirect_slow().

With this change, the performance of the xdp_redirect sample program goes
from 5Mpps to 8.4Mpps (a 68% increase).

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/bpf.h |   13 +++++++++++--
 kernel/bpf/devmap.c |   31 ++++++++++++++++++++++---------
 net/core/filter.c   |   30 ++----------------------------
 3 files changed, 35 insertions(+), 39 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b14e51d56a82..25c050202536 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -962,7 +962,9 @@ struct sk_buff;
 
 struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
 struct bpf_dtab_netdev *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key);
-void __dev_map_flush(void);
+void __dev_flush(void);
+int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+		    struct net_device *dev_rx);
 int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 		    struct net_device *dev_rx);
 int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
@@ -1071,13 +1073,20 @@ static inline struct net_device  *__dev_map_hash_lookup_elem(struct bpf_map *map
 	return NULL;
 }
 
-static inline void __dev_map_flush(void)
+static inline void __dev_flush(void)
 {
 }
 
 struct xdp_buff;
 struct bpf_dtab_netdev;
 
+static inline
+int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+		    struct net_device *dev_rx)
+{
+	return 0;
+}
+
 static inline
 int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 		    struct net_device *dev_rx)
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index bcb05cb6b728..adbb82770d02 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -81,7 +81,7 @@ struct bpf_dtab {
 	u32 n_buckets;
 };
 
-static DEFINE_PER_CPU(struct list_head, dev_map_flush_list);
+static DEFINE_PER_CPU(struct list_head, dev_flush_list);
 static DEFINE_SPINLOCK(dev_map_lock);
 static LIST_HEAD(dev_map_list);
 
@@ -357,16 +357,16 @@ static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
 	goto out;
 }
 
-/* __dev_map_flush is called from xdp_do_flush_map() which _must_ be signaled
+/* __dev_flush is called from xdp_do_flush_map() which _must_ be signaled
  * from the driver before returning from its napi->poll() routine. The poll()
  * routine is called either from busy_poll context or net_rx_action signaled
  * from NET_RX_SOFTIRQ. Either way the poll routine must complete before the
  * net device can be torn down. On devmap tear down we ensure the flush list
  * is empty before completing to ensure all flush operations have completed.
  */
-void __dev_map_flush(void)
+void __dev_flush(void)
 {
-	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
+	struct list_head *flush_list = this_cpu_ptr(&dev_flush_list);
 	struct xdp_dev_bulk_queue *bq, *tmp;
 
 	rcu_read_lock();
@@ -398,7 +398,7 @@ static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
 		      struct net_device *dev_rx)
 
 {
-	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
+	struct list_head *flush_list = this_cpu_ptr(&dev_flush_list);
 	struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq);
 
 	if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
@@ -419,10 +419,9 @@ static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
 	return 0;
 }
 
-int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
-		    struct net_device *dev_rx)
+static inline int _xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+			       struct net_device *dev_rx)
 {
-	struct net_device *dev = dst->dev;
 	struct xdp_frame *xdpf;
 	int err;
 
@@ -440,6 +439,20 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 	return bq_enqueue(dev, xdpf, dev_rx);
 }
 
+int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+		    struct net_device *dev_rx)
+{
+	return _xdp_enqueue(dev, xdp, dev_rx);
+}
+
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+		    struct net_device *dev_rx)
+{
+	struct net_device *dev = dst->dev;
+
+	return _xdp_enqueue(dev, xdp, dev_rx);
+}
+
 int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
 			     struct bpf_prog *xdp_prog)
 {
@@ -760,7 +773,7 @@ static int __init dev_map_init(void)
 	register_netdevice_notifier(&dev_map_notifier);
 
 	for_each_possible_cpu(cpu)
-		INIT_LIST_HEAD(&per_cpu(dev_map_flush_list, cpu));
+		INIT_LIST_HEAD(&per_cpu(dev_flush_list, cpu));
 	return 0;
 }
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 42fd17c48c5f..550488162fe1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3458,32 +3458,6 @@ static const struct bpf_func_proto bpf_xdp_adjust_meta_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
-static int __bpf_tx_xdp(struct net_device *dev,
-			struct bpf_map *map,
-			struct xdp_buff *xdp,
-			u32 index)
-{
-	struct xdp_frame *xdpf;
-	int err, sent;
-
-	if (!dev->netdev_ops->ndo_xdp_xmit) {
-		return -EOPNOTSUPP;
-	}
-
-	err = xdp_ok_fwd_dev(dev, xdp->data_end - xdp->data);
-	if (unlikely(err))
-		return err;
-
-	xdpf = convert_to_xdp_frame(xdp);
-	if (unlikely(!xdpf))
-		return -EOVERFLOW;
-
-	sent = dev->netdev_ops->ndo_xdp_xmit(dev, 1, &xdpf, XDP_XMIT_FLUSH);
-	if (sent <= 0)
-		return sent;
-	return 0;
-}
-
 static noinline int
 xdp_do_redirect_slow(struct net_device *dev, struct xdp_buff *xdp,
 		     struct bpf_prog *xdp_prog, struct bpf_redirect_info *ri)
@@ -3499,7 +3473,7 @@ xdp_do_redirect_slow(struct net_device *dev, struct xdp_buff *xdp,
 		goto err;
 	}
 
-	err = __bpf_tx_xdp(fwd, NULL, xdp, 0);
+	err = dev_xdp_enqueue(fwd, xdp, dev);
 	if (unlikely(err))
 		goto err;
 
@@ -3529,7 +3503,7 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
 
 void xdp_do_flush_map(void)
 {
-	__dev_map_flush();
+	__dev_flush();
 	__cpu_map_flush();
 	__xsk_map_flush();
 }



* Re: [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device
  2020-01-10 14:22 ` [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device Toke Høiland-Jørgensen
@ 2020-01-10 15:03   ` Björn Töpel
  2020-01-10 15:26     ` Toke Høiland-Jørgensen
  2020-01-10 16:08   ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 13+ messages in thread
From: Björn Töpel @ 2020-01-10 15:03 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, John Fastabend

On Fri, 10 Jan 2020 at 15:22, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> From: Toke Høiland-Jørgensen <toke@redhat.com>
>
> Commit 96360004b862 ("xdp: Make devmap flush_list common for all map
> instances") changed devmap flushing to be a global operation instead of a
> per-map operation. However, the queue structure used for bulking was still
> allocated as part of the containing map.
>
> This patch moves the devmap bulk queue into struct net_device. The
> motivation for this is reusing it for the non-map variant of XDP_REDIRECT,
> which will be changed in a subsequent commit.
>
> We defer the actual allocation of the bulk queue structure until the
> NETDEV_REGISTER notification in devmap.c. This makes it possible to check for
> ndo_xdp_xmit support before allocating the structure, which is not possible
> at the time struct net_device is allocated. However, we keep the freeing in
> free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER.
>
> Because of this change, we lose the reference back to the map that
> originated the redirect, so change the tracepoint to always return 0 as the
> map ID and index. Otherwise no functional change is intended with this
> patch.
>

Nice work, Toke!

I'm getting some checkpatch warnings (>80 char lines), other than that:

Acked-by: Björn Töpel <bjorn.topel@intel.com>

> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>  include/linux/netdevice.h  |    3 ++
>  include/trace/events/xdp.h |    2 +
>  kernel/bpf/devmap.c        |   61 ++++++++++++++++++--------------------------
>  net/core/dev.c             |    2 +
>  4 files changed, 31 insertions(+), 37 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 2741aa35bec6..1b2bc2a7522e 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -876,6 +876,7 @@ enum bpf_netdev_command {
>  struct bpf_prog_offload_ops;
>  struct netlink_ext_ack;
>  struct xdp_umem;
> +struct xdp_dev_bulk_queue;
>
>  struct netdev_bpf {
>         enum bpf_netdev_command command;
> @@ -1993,6 +1994,8 @@ struct net_device {
>         spinlock_t              tx_global_lock;
>         int                     watchdog_timeo;
>
> +       struct xdp_dev_bulk_queue __percpu *xdp_bulkq;
> +
>  #ifdef CONFIG_XPS
>         struct xps_dev_maps __rcu *xps_cpus_map;
>         struct xps_dev_maps __rcu *xps_rxqs_map;
> diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
> index a7378bcd9928..72bad13d4a3c 100644
> --- a/include/trace/events/xdp.h
> +++ b/include/trace/events/xdp.h
> @@ -278,7 +278,7 @@ TRACE_EVENT(xdp_devmap_xmit,
>         ),
>
>         TP_fast_assign(
> -               __entry->map_id         = map->id;
> +               __entry->map_id         = map ? map->id : 0;
>                 __entry->act            = XDP_REDIRECT;
>                 __entry->map_index      = map_index;
>                 __entry->drops          = drops;
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index da9c832fc5c8..bcb05cb6b728 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -53,13 +53,11 @@
>         (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
>
>  #define DEV_MAP_BULK_SIZE 16
> -struct bpf_dtab_netdev;
> -
> -struct xdp_bulk_queue {
> +struct xdp_dev_bulk_queue {
>         struct xdp_frame *q[DEV_MAP_BULK_SIZE];
>         struct list_head flush_node;
> +       struct net_device *dev;
>         struct net_device *dev_rx;
> -       struct bpf_dtab_netdev *obj;
>         unsigned int count;
>  };
>
> @@ -67,9 +65,8 @@ struct bpf_dtab_netdev {
>         struct net_device *dev; /* must be first member, due to tracepoint */
>         struct hlist_node index_hlist;
>         struct bpf_dtab *dtab;
> -       struct xdp_bulk_queue __percpu *bulkq;
>         struct rcu_head rcu;
> -       unsigned int idx; /* keep track of map index for tracepoint */
> +       unsigned int idx;
>  };
>
>  struct bpf_dtab {
> @@ -219,7 +216,6 @@ static void dev_map_free(struct bpf_map *map)
>
>                         hlist_for_each_entry_safe(dev, next, head, index_hlist) {
>                                 hlist_del_rcu(&dev->index_hlist);
> -                               free_percpu(dev->bulkq);
>                                 dev_put(dev->dev);
>                                 kfree(dev);
>                         }
> @@ -234,7 +230,6 @@ static void dev_map_free(struct bpf_map *map)
>                         if (!dev)
>                                 continue;
>
> -                       free_percpu(dev->bulkq);
>                         dev_put(dev->dev);
>                         kfree(dev);
>                 }
> @@ -320,10 +315,9 @@ static int dev_map_hash_get_next_key(struct bpf_map *map, void *key,
>         return -ENOENT;
>  }
>
> -static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
> +static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
>  {
> -       struct bpf_dtab_netdev *obj = bq->obj;
> -       struct net_device *dev = obj->dev;
> +       struct net_device *dev = bq->dev;
>         int sent = 0, drops = 0, err = 0;
>         int i;
>
> @@ -346,8 +340,7 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
>  out:
>         bq->count = 0;
>
> -       trace_xdp_devmap_xmit(&obj->dtab->map, obj->idx,
> -                             sent, drops, bq->dev_rx, dev, err);
> +       trace_xdp_devmap_xmit(NULL, 0, sent, drops, bq->dev_rx, dev, err);
>         bq->dev_rx = NULL;
>         __list_del_clearprev(&bq->flush_node);
>         return 0;
> @@ -374,7 +367,7 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
>  void __dev_map_flush(void)
>  {
>         struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
> -       struct xdp_bulk_queue *bq, *tmp;
> +       struct xdp_dev_bulk_queue *bq, *tmp;
>
>         rcu_read_lock();
>         list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
> @@ -401,12 +394,12 @@ struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
>  /* Runs under RCU-read-side, plus in softirq under NAPI protection.
>   * Thus, safe percpu variable access.
>   */
> -static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf,
> +static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
>                       struct net_device *dev_rx)
>
>  {
>         struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
> -       struct xdp_bulk_queue *bq = this_cpu_ptr(obj->bulkq);
> +       struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq);
>
>         if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
>                 bq_xmit_all(bq, 0);
> @@ -444,7 +437,7 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
>         if (unlikely(!xdpf))
>                 return -EOVERFLOW;
>
> -       return bq_enqueue(dst, xdpf, dev_rx);
> +       return bq_enqueue(dev, xdpf, dev_rx);
>  }
>
>  int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
> @@ -483,7 +476,6 @@ static void __dev_map_entry_free(struct rcu_head *rcu)
>         struct bpf_dtab_netdev *dev;
>
>         dev = container_of(rcu, struct bpf_dtab_netdev, rcu);
> -       free_percpu(dev->bulkq);
>         dev_put(dev->dev);
>         kfree(dev);
>  }
> @@ -538,30 +530,14 @@ static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
>                                                     u32 ifindex,
>                                                     unsigned int idx)
>  {
> -       gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN;
>         struct bpf_dtab_netdev *dev;
> -       struct xdp_bulk_queue *bq;
> -       int cpu;
>
> -       dev = kmalloc_node(sizeof(*dev), gfp, dtab->map.numa_node);
> +       dev = kmalloc_node(sizeof(*dev), GFP_ATOMIC | __GFP_NOWARN, dtab->map.numa_node);
>         if (!dev)
>                 return ERR_PTR(-ENOMEM);
>
> -       dev->bulkq = __alloc_percpu_gfp(sizeof(*dev->bulkq),
> -                                       sizeof(void *), gfp);
> -       if (!dev->bulkq) {
> -               kfree(dev);
> -               return ERR_PTR(-ENOMEM);
> -       }
> -
> -       for_each_possible_cpu(cpu) {
> -               bq = per_cpu_ptr(dev->bulkq, cpu);
> -               bq->obj = dev;
> -       }
> -
>         dev->dev = dev_get_by_index(net, ifindex);
>         if (!dev->dev) {
> -               free_percpu(dev->bulkq);
>                 kfree(dev);
>                 return ERR_PTR(-EINVAL);
>         }
> @@ -721,9 +697,22 @@ static int dev_map_notification(struct notifier_block *notifier,
>  {
>         struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
>         struct bpf_dtab *dtab;
> -       int i;
> +       int i, cpu;
>
>         switch (event) {
> +       case NETDEV_REGISTER:
> +               if (!netdev->netdev_ops->ndo_xdp_xmit || netdev->xdp_bulkq)
> +                       break;
> +
> +               /* will be freed in free_netdev() */
> +               netdev->xdp_bulkq = __alloc_percpu_gfp(sizeof(struct xdp_dev_bulk_queue),
> +                                                      sizeof(void *), GFP_ATOMIC);
> +               if (!netdev->xdp_bulkq)
> +                       return NOTIFY_BAD;
> +
> +               for_each_possible_cpu(cpu)
> +                       per_cpu_ptr(netdev->xdp_bulkq, cpu)->dev = netdev;
> +               break;
>         case NETDEV_UNREGISTER:
>                 /* This rcu_read_lock/unlock pair is needed because
>                  * dev_map_list is an RCU list AND to ensure a delete
> diff --git a/net/core/dev.c b/net/core/dev.c
> index d99f88c58636..e7802a41ae7f 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -9847,6 +9847,8 @@ void free_netdev(struct net_device *dev)
>
>         free_percpu(dev->pcpu_refcnt);
>         dev->pcpu_refcnt = NULL;
> +       free_percpu(dev->xdp_bulkq);
> +       dev->xdp_bulkq = NULL;
>
>         netdev_unregister_lockdep_key(dev);
>
>


* Re: [PATCH bpf-next 2/2] xdp: Use bulking for non-map XDP_REDIRECT
  2020-01-10 14:22 ` [PATCH bpf-next 2/2] xdp: Use bulking for non-map XDP_REDIRECT Toke Høiland-Jørgensen
@ 2020-01-10 15:15   ` Björn Töpel
  2020-01-10 15:30     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 13+ messages in thread
From: Björn Töpel @ 2020-01-10 15:15 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, John Fastabend

On Fri, 10 Jan 2020 at 15:22, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> From: Toke Høiland-Jørgensen <toke@redhat.com>
>
> Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
> we can re-use the bulking for the non-map version of the bpf_redirect()
> helper. This is a simple matter of having xdp_do_redirect_slow() queue the
> frame on the bulk queue instead of sending it out with __bpf_tx_xdp().
>
> Unfortunately we can't make the bpf_redirect() helper return an error if
> the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
> have a reference to the network namespace of the ingress device at the time
> the helper is called. So we have to leave it as-is and keep the device
> lookup in xdp_do_redirect_slow().
>
> With this change, the performance of the xdp_redirect sample program goes
> from 5Mpps to 8.4Mpps (a 68% increase).
>

After these changes, does the noinline (commit 47b123ed9e99 ("xdp:
split code for map vs non-map redirect")) still make sense?

> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>  include/linux/bpf.h |   13 +++++++++++--
>  kernel/bpf/devmap.c |   31 ++++++++++++++++++++++---------
>  net/core/filter.c   |   30 ++----------------------------
>  3 files changed, 35 insertions(+), 39 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index b14e51d56a82..25c050202536 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -962,7 +962,9 @@ struct sk_buff;
>
>  struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
>  struct bpf_dtab_netdev *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key);
> -void __dev_map_flush(void);
> +void __dev_flush(void);
> +int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
> +                   struct net_device *dev_rx);
>  int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
>                     struct net_device *dev_rx);
>  int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
> @@ -1071,13 +1073,20 @@ static inline struct net_device  *__dev_map_hash_lookup_elem(struct bpf_map *map
>         return NULL;
>  }
>
> -static inline void __dev_map_flush(void)
> +static inline void __dev_flush(void)
>  {
>  }
>
>  struct xdp_buff;
>  struct bpf_dtab_netdev;
>
> +static inline
> +int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
> +                   struct net_device *dev_rx)
> +{
> +       return 0;
> +}
> +
>  static inline
>  int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
>                     struct net_device *dev_rx)
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index bcb05cb6b728..adbb82770d02 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -81,7 +81,7 @@ struct bpf_dtab {
>         u32 n_buckets;
>  };
>
> -static DEFINE_PER_CPU(struct list_head, dev_map_flush_list);
> +static DEFINE_PER_CPU(struct list_head, dev_flush_list);
>  static DEFINE_SPINLOCK(dev_map_lock);
>  static LIST_HEAD(dev_map_list);
>
> @@ -357,16 +357,16 @@ static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
>         goto out;
>  }
>
> -/* __dev_map_flush is called from xdp_do_flush_map() which _must_ be signaled
> +/* __dev_flush is called from xdp_do_flush_map() which _must_ be signaled
>   * from the driver before returning from its napi->poll() routine. The poll()
>   * routine is called either from busy_poll context or net_rx_action signaled
>   * from NET_RX_SOFTIRQ. Either way the poll routine must complete before the
>   * net device can be torn down. On devmap tear down we ensure the flush list
>   * is empty before completing to ensure all flush operations have completed.
>   */
> -void __dev_map_flush(void)
> +void __dev_flush(void)
>  {
> -       struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
> +       struct list_head *flush_list = this_cpu_ptr(&dev_flush_list);
>         struct xdp_dev_bulk_queue *bq, *tmp;
>
>         rcu_read_lock();
> @@ -398,7 +398,7 @@ static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
>                       struct net_device *dev_rx)
>
>  {
> -       struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
> +       struct list_head *flush_list = this_cpu_ptr(&dev_flush_list);
>         struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq);
>
>         if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
> @@ -419,10 +419,9 @@ static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
>         return 0;
>  }
>
> -int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
> -                   struct net_device *dev_rx)
> +static inline int _xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
> +                              struct net_device *dev_rx)
>  {
> -       struct net_device *dev = dst->dev;
>         struct xdp_frame *xdpf;
>         int err;
>
> @@ -440,6 +439,20 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
>         return bq_enqueue(dev, xdpf, dev_rx);
>  }
>
> +int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
> +                   struct net_device *dev_rx)
> +{
> +       return _xdp_enqueue(dev, xdp, dev_rx);
> +}
> +

dev_xdp_enqueue() and dev_map_enqueue() are *very* similar. Can these be
combined, and maybe fold xdp_do_redirect_slow() into
xdp_do_redirect_map()? OTOH the tracepoints are different, so maybe
combining the two functions will be messy... It's only that with your
changes the map/ifindex redirects are very similar. Just an idea, might
be messy. :-P
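
For readers following along, here is a minimal user-space sketch of the
shape being discussed: one shared enqueue helper behind two thin wrappers,
feeding a bulk queue that is flushed when it fills up. This is not the
kernel code. Every name below (struct frame, struct device, struct
map_entry, BULK_SIZE, xdp_enqueue_common and so on) is a hypothetical
stand-in, and the queue size of 16 is an assumption chosen to mirror
DEV_MAP_BULK_SIZE.

```c
#include <stddef.h>

#define BULK_SIZE 16	/* assumption: stands in for DEV_MAP_BULK_SIZE */

struct frame { int id; };		/* stand-in for struct xdp_frame */

struct bulk_queue {			/* stand-in for struct xdp_dev_bulk_queue */
	struct frame *q[BULK_SIZE];
	unsigned int count;		/* frames currently queued */
	unsigned int xmit_calls;	/* how often the "driver" was invoked */
	unsigned int sent;		/* total frames handed to the "driver" */
};

struct device { struct bulk_queue bq; };	/* stand-in for struct net_device */
struct map_entry { struct device *dev; };	/* stand-in for struct bpf_dtab_netdev */

/* Hand every queued frame to the driver in one call, then empty the queue. */
void bq_flush(struct bulk_queue *bq)
{
	if (!bq->count)
		return;
	bq->xmit_calls++;
	bq->sent += bq->count;
	bq->count = 0;
}

/* Queue one frame; a full queue is flushed first to make room. */
void bq_enqueue(struct bulk_queue *bq, struct frame *f)
{
	if (bq->count == BULK_SIZE)
		bq_flush(bq);
	bq->q[bq->count++] = f;
}

/* Shared helper: both redirect variants end up here. */
int xdp_enqueue_common(struct device *dev, struct frame *f)
{
	if (!dev || !f)
		return -1;
	bq_enqueue(&dev->bq, f);
	return 0;
}

/* Non-map variant: the caller already holds the target device. */
int dev_enqueue(struct device *dev, struct frame *f)
{
	return xdp_enqueue_common(dev, f);
}

/* Map variant: the device comes from the map entry. */
int map_enqueue(struct map_entry *dst, struct frame *f)
{
	return xdp_enqueue_common(dst->dev, f);
}
```

In this sketch, enqueueing 64 frames and then calling bq_flush() once (as a
driver would at the end of its poll routine) results in 4 driver calls
instead of 64, which is the whole point of bulking.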

> +int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
> +                   struct net_device *dev_rx)
> +{
> +       struct net_device *dev = dst->dev;
> +
> +       return _xdp_enqueue(dev, xdp, dev_rx);
> +}
> +
>  int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
>                              struct bpf_prog *xdp_prog)
>  {
> @@ -760,7 +773,7 @@ static int __init dev_map_init(void)
>         register_netdevice_notifier(&dev_map_notifier);
>
>         for_each_possible_cpu(cpu)
> -               INIT_LIST_HEAD(&per_cpu(dev_map_flush_list, cpu));
> +               INIT_LIST_HEAD(&per_cpu(dev_flush_list, cpu));
>         return 0;
>  }
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 42fd17c48c5f..550488162fe1 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3458,32 +3458,6 @@ static const struct bpf_func_proto bpf_xdp_adjust_meta_proto = {
>         .arg2_type      = ARG_ANYTHING,
>  };
>
> -static int __bpf_tx_xdp(struct net_device *dev,
> -                       struct bpf_map *map,
> -                       struct xdp_buff *xdp,
> -                       u32 index)
> -{
> -       struct xdp_frame *xdpf;
> -       int err, sent;
> -
> -       if (!dev->netdev_ops->ndo_xdp_xmit) {
> -               return -EOPNOTSUPP;
> -       }
> -
> -       err = xdp_ok_fwd_dev(dev, xdp->data_end - xdp->data);
> -       if (unlikely(err))
> -               return err;
> -
> -       xdpf = convert_to_xdp_frame(xdp);
> -       if (unlikely(!xdpf))
> -               return -EOVERFLOW;
> -
> -       sent = dev->netdev_ops->ndo_xdp_xmit(dev, 1, &xdpf, XDP_XMIT_FLUSH);
> -       if (sent <= 0)
> -               return sent;
> -       return 0;
> -}
> -
>  static noinline int
>  xdp_do_redirect_slow(struct net_device *dev, struct xdp_buff *xdp,
>                      struct bpf_prog *xdp_prog, struct bpf_redirect_info *ri)
> @@ -3499,7 +3473,7 @@ xdp_do_redirect_slow(struct net_device *dev, struct xdp_buff *xdp,
>                 goto err;
>         }
>
> -       err = __bpf_tx_xdp(fwd, NULL, xdp, 0);
> +       err = dev_xdp_enqueue(fwd, xdp, dev);
>         if (unlikely(err))
>                 goto err;
>
> @@ -3529,7 +3503,7 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
>
>  void xdp_do_flush_map(void)
>  {
> -       __dev_map_flush();
> +       __dev_flush();
>         __cpu_map_flush();
>         __xsk_map_flush();
>  }
>


* Re: [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device
  2020-01-10 15:03   ` Björn Töpel
@ 2020-01-10 15:26     ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-10 15:26 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, John Fastabend

Björn Töpel <bjorn.topel@gmail.com> writes:

> On Fri, 10 Jan 2020 at 15:22, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>
>> Commit 96360004b862 ("xdp: Make devmap flush_list common for all map
>> instances"), changed devmap flushing to be a global operation instead of a
>> per-map operation. However, the queue structure used for bulking was still
>> allocated as part of the containing map.
>>
>> This patch moves the devmap bulk queue into struct net_device. The
>> motivation for this is reusing it for the non-map variant of XDP_REDIRECT,
>> which will be changed in a subsequent commit.
>>
>> We defer the actual allocation of the bulk queue structure until the
>> NETDEV_REGISTER notification in devmap.c. This makes it possible to check for
>> ndo_xdp_xmit support before allocating the structure, which is not possible
>> at the time struct net_device is allocated. However, we keep the freeing in
>> free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER.
>>
>> Because of this change, we lose the reference back to the map that
>> originated the redirect, so change the tracepoint to always return 0 as the
>> map ID and index. Otherwise no functional change is intended with this
>> patch.
>>
>
> Nice work, Toke!

Thanks!

> I'm getting some checkpatch warnings (>80 char lines), other than
> that:

Oh, right, totally forgot to run checkpatch; will fix and respin :)

-Toke



* Re: [PATCH bpf-next 2/2] xdp: Use bulking for non-map XDP_REDIRECT
  2020-01-10 15:15   ` Björn Töpel
@ 2020-01-10 15:30     ` Toke Høiland-Jørgensen
  2020-01-10 15:54       ` Björn Töpel
  0 siblings, 1 reply; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-10 15:30 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, John Fastabend

Björn Töpel <bjorn.topel@gmail.com> writes:

> On Fri, 10 Jan 2020 at 15:22, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>
>> Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
>> we can re-use the bulking for the non-map version of the bpf_redirect()
>> helper. This is a simple matter of having xdp_do_redirect_slow() queue the
>> frame on the bulk queue instead of sending it out with __bpf_tx_xdp().
>>
>> Unfortunately we can't make the bpf_redirect() helper return an error if
>> the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
>> have a reference to the network namespace of the ingress device at the time
>> the helper is called. So we have to leave it as-is and keep the device
>> lookup in xdp_do_redirect_slow().
>>
>> With this change, the performance of the xdp_redirect sample program goes
>> from 5Mpps to 8.4Mpps (a 68% increase).
>>
>
> After these changes, does the noinline (commit 47b123ed9e99 ("xdp:
> split code for map vs non-map redirect")) still make sense?

Hmm, good question. The two code paths are certainly close to one
another; and I guess they could be consolidated further.

The best case would be if we had a way to look up the ifindex directly in
the helper. Do you know if there's a way to get the current net
namespace from the helper? Can we use current->nsproxy->net_ns in that
context?

If we can, and if we don't mind merging the two different tracepoints,
the xdp_do_redirect() function could be made quite a bit leaner...

-Toke



* Re: [PATCH bpf-next 2/2] xdp: Use bulking for non-map XDP_REDIRECT
  2020-01-10 15:30     ` Toke Høiland-Jørgensen
@ 2020-01-10 15:54       ` Björn Töpel
  2020-01-10 15:57         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 13+ messages in thread
From: Björn Töpel @ 2020-01-10 15:54 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, John Fastabend

On Fri, 10 Jan 2020 at 16:30, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Björn Töpel <bjorn.topel@gmail.com> writes:
>
[...]
> >
> > After these changes, does the noinline (commit 47b123ed9e99 ("xdp:
> > split code for map vs non-map redirect")) still make sense?
>
> Hmm, good question. The two code paths are certainly close to one
> another; and I guess they could be consolidated further.
>
> The best case would be if we had a way to look up the ifindex directly in
> the helper. Do you know if there's a way to get the current net
> namespace from the helper? Can we use current->nsproxy->net_ns in that
> context?
>

Nope, interrupt context. :-( Another (ugly) way is adding a netns
member to the bpf_redirect_info, that is populated by the driver
(driver changes everywhere -- ick). So no.

(And *if* one would go the route of changing all drivers, I think the
percpu bpf_redirect_info should be replaced by a context that is
passed from the driver to the XDP program execution and
xdp_do_redirect/flush. But that's a much bigger patch. :-))


Björn


> If we can, and if we don't mind merging the two different tracepoints,
> the xdp_do_redirect() function could be made quite a bit leaner...
>
> -Toke
>


* Re: [PATCH bpf-next 2/2] xdp: Use bulking for non-map XDP_REDIRECT
  2020-01-10 15:54       ` Björn Töpel
@ 2020-01-10 15:57         ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-10 15:57 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, John Fastabend

Björn Töpel <bjorn.topel@gmail.com> writes:

> On Fri, 10 Jan 2020 at 16:30, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Björn Töpel <bjorn.topel@gmail.com> writes:
>>
> [...]
>> >
>> > After these changes, does the noinline (commit 47b123ed9e99 ("xdp:
>> > split code for map vs non-map redirect")) still make sense?
>>
>> Hmm, good question. The two code paths are certainly close to one
>> another; and I guess they could be consolidated further.
>>
>> The best case would be if we had a way to look up the ifindex directly in
>> the helper. Do you know if there's a way to get the current net
>> namespace from the helper? Can we use current->nsproxy->net_ns in that
>> context?
>>
>
> Nope, interrupt context. :-( Another (ugly) way is adding a netns
> member to the bpf_redirect_info, that is populated by the driver
> (driver changes everywhere -- ick). So no.

Yup, that's what I thought. OK, too bad; I'll see what other
consolidation I can do with the current code, then.

> (And *if* one would go the route of changing all drivers, I think the
> percpu bpf_redirect_info should be replaced by a context that is
> passed from the driver to the XDP program execution and
> xdp_do_redirect/flush. But that's a much bigger patch. :-))

Yeah, let's leave that until the next time we figure out we have to
change all the drivers, then ;)

-Toke



* Re: [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device
  2020-01-10 14:22 ` [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device Toke Høiland-Jørgensen
  2020-01-10 15:03   ` Björn Töpel
@ 2020-01-10 16:08   ` Jesper Dangaard Brouer
  2020-01-10 22:34     ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2020-01-10 16:08 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Björn Töpel, John Fastabend, brouer

On Fri, 10 Jan 2020 15:22:02 +0100
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 2741aa35bec6..1b2bc2a7522e 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
[...]
> @@ -1993,6 +1994,8 @@ struct net_device {
>  	spinlock_t		tx_global_lock;
>  	int			watchdog_timeo;
>  
> +	struct xdp_dev_bulk_queue __percpu *xdp_bulkq;
> +
>  #ifdef CONFIG_XPS
>  	struct xps_dev_maps __rcu *xps_cpus_map;
>  	struct xps_dev_maps __rcu *xps_rxqs_map;

We need to check that the cache-line for this location in struct
net_device is not getting updated (write operation) from different CPUs.

The test you ran was a single queue single CPU test, which will not
show any regression for that case.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device
  2020-01-10 16:08   ` Jesper Dangaard Brouer
@ 2020-01-10 22:34     ` Toke Høiland-Jørgensen
  2020-01-10 22:46       ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-10 22:34 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Björn Töpel, John Fastabend, brouer

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> On Fri, 10 Jan 2020 15:22:02 +0100
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 2741aa35bec6..1b2bc2a7522e 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
> [...]
>> @@ -1993,6 +1994,8 @@ struct net_device {
>>  	spinlock_t		tx_global_lock;
>>  	int			watchdog_timeo;
>>  
>> +	struct xdp_dev_bulk_queue __percpu *xdp_bulkq;
>> +
>>  #ifdef CONFIG_XPS
>>  	struct xps_dev_maps __rcu *xps_cpus_map;
>>  	struct xps_dev_maps __rcu *xps_rxqs_map;
>
> We need to check that the cache-line for this location in struct
> net_device is not getting updated (write operation) from different CPUs.
>
> The test you ran was a single queue single CPU test, which will not
> show any regression for that case.

Well, pahole says:

	/* --- cacheline 14 boundary (896 bytes) --- */
	struct netdev_queue *      _tx __attribute__((__aligned__(64))); /*   896     8 */
	unsigned int               num_tx_queues;        /*   904     4 */
	unsigned int               real_num_tx_queues;   /*   908     4 */
	struct Qdisc *             qdisc;                /*   912     8 */
	struct hlist_head  qdisc_hash[16];               /*   920   128 */
	/* --- cacheline 16 boundary (1024 bytes) was 24 bytes ago --- */
	unsigned int               tx_queue_len;         /*  1048     4 */
	spinlock_t                 tx_global_lock;       /*  1052     4 */
	int                        watchdog_timeo;       /*  1056     4 */

	/* XXX 4 bytes hole, try to pack */

	struct xdp_dev_bulk_queue * xdp_bulkq;           /*  1064     8 */
	struct xps_dev_maps *      xps_cpus_map;         /*  1072     8 */
	struct xps_dev_maps *      xps_rxqs_map;         /*  1080     8 */
	/* --- cacheline 17 boundary (1088 bytes) --- */


Of those, tx_queue_len is the max queue len (so only set on init),
tx_global_lock is not used by multi-queue devices, watchdog_timeo also
seems to be a static value that's set on init, and the xps* pointers also
only seem to be set once on init. So I think we're fine?

I can run a multi-CPU test just to be sure, but I really don't see which
of those fields might be updated on TX...
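
As a rough aid for reading the pahole output above, the following sketch
maps a field offset to its cache-line index. It assumes 64-byte cache
lines, which is consistent with the boundaries pahole prints (896 = 14 * 64,
1088 = 17 * 64). The helper names cacheline_of() and same_cacheline() are
made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines, matching the pahole boundaries above. */
#define CACHELINE_BYTES 64

/* Index of the cache line that contains the byte at @offset. */
size_t cacheline_of(size_t offset)
{
	return offset / CACHELINE_BYTES;
}

/* True if the bytes at @off_a and @off_b live in the same cache line. */
bool same_cacheline(size_t off_a, size_t off_b)
{
	return cacheline_of(off_a) == cacheline_of(off_b);
}
```

With the offsets quoted above, tx_queue_len (1048), xdp_bulkq (1064) and
xps_rxqs_map (1080) all land in cache line 16, while anything at offset 1088
or later starts cache line 17. Against a real kernel build one would pass
offsetof(struct net_device, field) rather than literal numbers.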

-Toke



* Re: [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device
  2020-01-10 22:34     ` Toke Høiland-Jørgensen
@ 2020-01-10 22:46       ` Eric Dumazet
  2020-01-10 23:16         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2020-01-10 22:46 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Jesper Dangaard Brouer
  Cc: netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Björn Töpel, John Fastabend



On 1/10/20 2:34 PM, Toke Høiland-Jørgensen wrote:
> Jesper Dangaard Brouer <brouer@redhat.com> writes:
> 
>> On Fri, 10 Jan 2020 15:22:02 +0100
>> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index 2741aa35bec6..1b2bc2a7522e 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>> [...]
>>> @@ -1993,6 +1994,8 @@ struct net_device {
>>>  	spinlock_t		tx_global_lock;
>>>  	int			watchdog_timeo;
>>>  
>>> +	struct xdp_dev_bulk_queue __percpu *xdp_bulkq;
>>> +
>>>  #ifdef CONFIG_XPS
>>>  	struct xps_dev_maps __rcu *xps_cpus_map;
>>>  	struct xps_dev_maps __rcu *xps_rxqs_map;
>>
>> We need to check that the cache-line for this location in struct
>> net_device is not getting updated (write operation) from different CPUs.
>>
>> The test you ran was a single queue single CPU test, which will not
>> show any regression for that case.
> 
> Well, pahole says:
> 
> 	/* --- cacheline 14 boundary (896 bytes) --- */
> 	struct netdev_queue *      _tx __attribute__((__aligned__(64))); /*   896     8 */
> 	unsigned int               num_tx_queues;        /*   904     4 */
> 	unsigned int               real_num_tx_queues;   /*   908     4 */
> 	struct Qdisc *             qdisc;                /*   912     8 */
> 	struct hlist_head  qdisc_hash[16];               /*   920   128 */
> 	/* --- cacheline 16 boundary (1024 bytes) was 24 bytes ago --- */
> 	unsigned int               tx_queue_len;         /*  1048     4 */
> 	spinlock_t                 tx_global_lock;       /*  1052     4 */
> 	int                        watchdog_timeo;       /*  1056     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	struct xdp_dev_bulk_queue * xdp_bulkq;           /*  1064     8 */
> 	struct xps_dev_maps *      xps_cpus_map;         /*  1072     8 */
> 	struct xps_dev_maps *      xps_rxqs_map;         /*  1080     8 */
> 	/* --- cacheline 17 boundary (1088 bytes) --- */
> 
> 
> Of those, tx_queue_len is the max queue len (so only set on init),
> tx_global_lock is not used by multi-queue devices, watchdog_timeo also
> seems to be a static value that's set on init, and the xps* pointers also
> only seem to be set once on init. So I think we're fine?
> 
> I can run a multi-CPU test just to be sure, but I really don't see which
> of those fields might be updated on TX...
> 

Note that another interesting field is miniq_egress, your patch
moves it to another cache line.

We probably should move qdisc_hash array elsewhere.




* Re: [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device
  2020-01-10 22:46       ` Eric Dumazet
@ 2020-01-10 23:16         ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-10 23:16 UTC (permalink / raw)
  To: Eric Dumazet, Jesper Dangaard Brouer
  Cc: netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Björn Töpel, John Fastabend

Eric Dumazet <eric.dumazet@gmail.com> writes:

> On 1/10/20 2:34 PM, Toke Høiland-Jørgensen wrote:
>> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>> 
>>> On Fri, 10 Jan 2020 15:22:02 +0100
>>> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>
>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>> index 2741aa35bec6..1b2bc2a7522e 100644
>>>> --- a/include/linux/netdevice.h
>>>> +++ b/include/linux/netdevice.h
>>> [...]
>>>> @@ -1993,6 +1994,8 @@ struct net_device {
>>>>  	spinlock_t		tx_global_lock;
>>>>  	int			watchdog_timeo;
>>>>  
>>>> +	struct xdp_dev_bulk_queue __percpu *xdp_bulkq;
>>>> +
>>>>  #ifdef CONFIG_XPS
>>>>  	struct xps_dev_maps __rcu *xps_cpus_map;
>>>>  	struct xps_dev_maps __rcu *xps_rxqs_map;
>>>
>>> We need to check that the cache-line for this location in struct
>>> net_device is not getting updated (write operation) from different CPUs.
>>>
>>> The test you ran was a single queue single CPU test, which will not
>>> show any regression for that case.
>> 
>> Well, pahole says:
>> 
>> 	/* --- cacheline 14 boundary (896 bytes) --- */
>> 	struct netdev_queue *      _tx __attribute__((__aligned__(64))); /*   896     8 */
>> 	unsigned int               num_tx_queues;        /*   904     4 */
>> 	unsigned int               real_num_tx_queues;   /*   908     4 */
>> 	struct Qdisc *             qdisc;                /*   912     8 */
>> 	struct hlist_head  qdisc_hash[16];               /*   920   128 */
>> 	/* --- cacheline 16 boundary (1024 bytes) was 24 bytes ago --- */
>> 	unsigned int               tx_queue_len;         /*  1048     4 */
>> 	spinlock_t                 tx_global_lock;       /*  1052     4 */
>> 	int                        watchdog_timeo;       /*  1056     4 */
>> 
>> 	/* XXX 4 bytes hole, try to pack */
>> 
>> 	struct xdp_dev_bulk_queue * xdp_bulkq;           /*  1064     8 */
>> 	struct xps_dev_maps *      xps_cpus_map;         /*  1072     8 */
>> 	struct xps_dev_maps *      xps_rxqs_map;         /*  1080     8 */
>> 	/* --- cacheline 17 boundary (1088 bytes) --- */
>> 
>> 
>> Of those, tx_queue_len is the max queue len (so only set on init),
>> tx_global_lock is not used by multi-queue devices, watchdog_timeo also
>> seems to be a static value that's set on init, and the xps* pointers also
>> only seem to be set once on init. So I think we're fine?
>> 
>> I can run a multi-CPU test just to be sure, but I really don't see which
>> of those fields might be updated on TX...
>> 
>
> Note that another interesting field is miniq_egress, your patch
> moves it to another cache line.

Hmm, since there's that 4-byte hole, I guess we could just move
watchdog_timeo down to fix that. Any reason that's a bad idea?
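
To make the effect of that move concrete, here is a small user-space sketch
comparing the two field orders. It is only a model: spinlock_t is
represented as a 4-byte integer (the size pahole reports above), all
pointers are plain void *, and miniq_egress is assumed to directly follow
the xps pointers. The struct and function names are hypothetical.

```c
#include <stddef.h>

/* Field order as in the patch: three 4-byte fields, then the pointers. */
struct layout_patch {
	unsigned int tx_queue_len;
	unsigned int tx_global_lock;	/* models a 4-byte spinlock_t */
	int watchdog_timeo;
	/* where pointers need 8-byte alignment, a 4-byte hole sits here */
	void *xdp_bulkq;
	void *xps_cpus_map;
	void *xps_rxqs_map;
	void *miniq_egress;
};

/* Same fields with watchdog_timeo moved below the pointers. */
struct layout_moved {
	unsigned int tx_queue_len;
	unsigned int tx_global_lock;	/* models a 4-byte spinlock_t */
	void *xdp_bulkq;
	void *xps_cpus_map;
	void *xps_rxqs_map;
	void *miniq_egress;
	int watchdog_timeo;
};

/* Offset of miniq_egress with the field order from the patch. */
size_t miniq_offset_patch(void)
{
	return offsetof(struct layout_patch, miniq_egress);
}

/* Offset of miniq_egress once watchdog_timeo has been moved down. */
size_t miniq_offset_moved(void)
{
	return offsetof(struct layout_moved, miniq_egress);
}
```

On a typical LP64 target such as x86-64, the second layout places
miniq_egress 8 bytes earlier, because the two remaining 4-byte fields pack
together and the hole disappears. Applied to the offsets quoted above, that
would be the difference between miniq_egress starting at 1088 (next cache
line) and at 1080 (same cache line as before the patch).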

> We probably should move qdisc_hash array elsewhere.

You certainly won't hear me object to that :)

-Toke



Thread overview: 13+ messages
2020-01-10 14:22 [PATCH bpf-next 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT Toke Høiland-Jørgensen
2020-01-10 14:22 ` [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device Toke Høiland-Jørgensen
2020-01-10 15:03   ` Björn Töpel
2020-01-10 15:26     ` Toke Høiland-Jørgensen
2020-01-10 16:08   ` Jesper Dangaard Brouer
2020-01-10 22:34     ` Toke Høiland-Jørgensen
2020-01-10 22:46       ` Eric Dumazet
2020-01-10 23:16         ` Toke Høiland-Jørgensen
2020-01-10 14:22 ` [PATCH bpf-next 2/2] xdp: Use bulking for non-map XDP_REDIRECT Toke Høiland-Jørgensen
2020-01-10 15:15   ` Björn Töpel
2020-01-10 15:30     ` Toke Høiland-Jørgensen
2020-01-10 15:54       ` Björn Töpel
2020-01-10 15:57         ` Toke Høiland-Jørgensen