* [PATCH bpf-next v2 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT
@ 2020-01-13 18:10 Toke Høiland-Jørgensen
  2020-01-13 18:10 ` [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device Toke Høiland-Jørgensen
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-13 18:10 UTC
To: netdev
Cc: bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, Björn Töpel, John Fastabend

Since commit 96360004b862 ("xdp: Make devmap flush_list common for all map
instances"), devmap flushing is a global operation instead of tied to a
particular map. This means that with a bit of refactoring, we can finally fix
the performance delta between the bpf_redirect_map() and bpf_redirect() helper
functions, by introducing bulking for the latter as well.

This series makes this change by moving the data structure used for the bulking
into struct net_device itself, so we can access it even when there is no
devmap. Once this is done, moving the bpf_redirect() helper to use the bulking
mechanism becomes quite trivial, and brings bpf_redirect() up to the same
performance as bpf_redirect_map():

                       Before:     After:
1 CPU:
  bpf_redirect_map:     8.4 Mpps    8.4 Mpps  (no change)
  bpf_redirect:         5.0 Mpps    8.4 Mpps  (+68%)
2 CPUs:
  bpf_redirect_map:    15.9 Mpps   16.1 Mpps  (+1% or ~no change)
  bpf_redirect:         9.5 Mpps   15.9 Mpps  (+67%)

After this patch series, the only semantic difference between the two variants
of the redirect helper (apart from the absence of a map argument, obviously) is
that the _map() variant will return an error if passed an invalid map index,
whereas the bpf_redirect() helper will succeed, but drop packets on
xdp_do_redirect(). This is because the helper has no reference to the calling
netdev, so unfortunately we can't do the ifindex lookup directly in the helper.

Changelog:

v2:
  - Consolidate code paths and tracepoints for map and non-map redirect variants
    (Björn)
  - Add performance data for 2-CPU test (Jesper)
  - Move fields to avoid shifting cache lines in struct net_device (Eric)

---

Toke Høiland-Jørgensen (2):
      xdp: Move devmap bulk queue into struct net_device
      xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths

 include/linux/bpf.h        |   13 +++++-
 include/linux/netdevice.h  |   11 +++--
 include/trace/events/xdp.h |  104 +++++++++++++++++++-------------------------
 kernel/bpf/devmap.c        |   94 +++++++++++++++++++++-------------------
 net/core/dev.c             |    2 +
 net/core/filter.c          |   86 +++++++-----------------------------
 6 files changed, 132 insertions(+), 178 deletions(-)
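
[Editor's sketch] The bulking that the cover letter refers to can be pictured
with a small userspace model. This is not kernel code: the toy_* names are
hypothetical, the per-CPU and RCU handling of the real bulk queue is left out,
and a "driver call" is just a counter. Only the queue size of 16 is taken from
DEV_MAP_BULK_SIZE in patch 1. The point it tries to show is that frames are
handed to the driver in batches, so the number of transmit calls is far lower
than the number of frames.

```c
#include <assert.h>

#define TOY_BULK_SIZE 16	/* mirrors DEV_MAP_BULK_SIZE from patch 1 */

/* Hypothetical stand-in for a net_device: only counts what it was asked to send. */
struct toy_dev {
	int frames_sent;
	int xmit_calls;
};

/* Hypothetical stand-in for struct xdp_dev_bulk_queue (one CPU only). */
struct toy_bulk_queue {
	const void *q[TOY_BULK_SIZE];
	struct toy_dev *dev;
	unsigned int count;
};

/* Hand all queued frames to the "driver" in a single call. */
static void toy_flush(struct toy_bulk_queue *bq)
{
	if (!bq->count)
		return;
	bq->dev->frames_sent += (int)bq->count;
	bq->dev->xmit_calls++;
	bq->count = 0;
}

/* Queue one frame; flush first if the queue is already full. */
static void toy_enqueue(struct toy_bulk_queue *bq, const void *frame)
{
	if (bq->count == TOY_BULK_SIZE)
		toy_flush(bq);
	bq->q[bq->count++] = frame;
}

/* Redirect nframes frames, then flush once more (as at the end of a poll
 * cycle). Returns the number of driver calls that were needed.
 */
static int toy_redirect_burst(int nframes)
{
	static const char frame;
	struct toy_dev dev = { 0, 0 };
	struct toy_bulk_queue bq = { .dev = &dev, .count = 0 };
	int i;

	for (i = 0; i < nframes; i++)
		toy_enqueue(&bq, &frame);
	toy_flush(&bq);

	assert(dev.frames_sent == nframes);
	return dev.xmit_calls;
}
```

With this model, toy_redirect_burst(64) needs 4 driver calls instead of 64,
which is the kind of amortisation the benchmark numbers above reflect.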

* [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device
From: Toke Høiland-Jørgensen @ 2020-01-13 18:10 UTC
To: netdev
Cc: bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, Björn Töpel, John Fastabend

From: Toke Høiland-Jørgensen <toke@redhat.com>

Commit 96360004b862 ("xdp: Make devmap flush_list common for all map
instances") changed devmap flushing to be a global operation instead of a
per-map operation. However, the queue structure used for bulking was still
allocated as part of the containing map.

This patch moves the devmap bulk queue into struct net_device. The
motivation for this is reusing it for the non-map variant of XDP_REDIRECT,
which will be changed in a subsequent commit. To avoid other fields of
struct net_device moving to different cache lines, we also move a couple of
other members around.

We defer the actual allocation of the bulk queue structure until the
NETDEV_REGISTER notification in devmap.c. This makes it possible to check for
ndo_xdp_xmit support before allocating the structure, which is not possible
at the time struct net_device is allocated. However, we keep the freeing in
free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER.

Because of this change, we lose the reference back to the map that
originated the redirect, so change the tracepoint to always return 0 as the
map ID and index.
Otherwise no functional change is intended with this patch.

Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/netdevice.h  |   11 +++++---
 include/trace/events/xdp.h |    2 +
 kernel/bpf/devmap.c        |   63 +++++++++++++++++++-------------------------
 net/core/dev.c             |    2 +
 4 files changed, 37 insertions(+), 41 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2741aa35bec6..1f24405c1ec5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -876,6 +876,7 @@ enum bpf_netdev_command {
 struct bpf_prog_offload_ops;
 struct netlink_ext_ack;
 struct xdp_umem;
+struct xdp_dev_bulk_queue;
 
 struct netdev_bpf {
 	enum bpf_netdev_command command;
@@ -1986,12 +1987,10 @@ struct net_device {
 	unsigned int		num_tx_queues;
 	unsigned int		real_num_tx_queues;
 	struct Qdisc		*qdisc;
-#ifdef CONFIG_NET_SCHED
-	DECLARE_HASHTABLE	(qdisc_hash, 4);
-#endif
 	unsigned int		tx_queue_len;
 	spinlock_t		tx_global_lock;
-	int			watchdog_timeo;
+
+	struct xdp_dev_bulk_queue __percpu *xdp_bulkq;
 
 #ifdef CONFIG_XPS
 	struct xps_dev_maps __rcu *xps_cpus_map;
@@ -2001,8 +2000,12 @@ struct net_device {
 	struct mini_Qdisc __rcu	*miniq_egress;
 #endif
 
+#ifdef CONFIG_NET_SCHED
+	DECLARE_HASHTABLE	(qdisc_hash, 4);
+#endif
 	/* These may be needed for future network-power-down code. */
 	struct timer_list	watchdog_timer;
+	int			watchdog_timeo;
 
 	int __percpu		*pcpu_refcnt;
 	struct list_head	todo_list;
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index a7378bcd9928..72bad13d4a3c 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -278,7 +278,7 @@ TRACE_EVENT(xdp_devmap_xmit,
 	),
 
 	TP_fast_assign(
-		__entry->map_id		= map->id;
+		__entry->map_id		= map ? map->id : 0;
 		__entry->act		= XDP_REDIRECT;
 		__entry->map_index	= map_index;
 		__entry->drops		= drops;
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index da9c832fc5c8..030d125c3839 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -53,13 +53,11 @@
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
 
 #define DEV_MAP_BULK_SIZE 16
-struct bpf_dtab_netdev;
-
-struct xdp_bulk_queue {
+struct xdp_dev_bulk_queue {
 	struct xdp_frame *q[DEV_MAP_BULK_SIZE];
 	struct list_head flush_node;
+	struct net_device *dev;
 	struct net_device *dev_rx;
-	struct bpf_dtab_netdev *obj;
 	unsigned int count;
 };
 
@@ -67,9 +65,8 @@ struct bpf_dtab_netdev {
 	struct net_device *dev; /* must be first member, due to tracepoint */
 	struct hlist_node index_hlist;
 	struct bpf_dtab *dtab;
-	struct xdp_bulk_queue __percpu *bulkq;
 	struct rcu_head rcu;
-	unsigned int idx; /* keep track of map index for tracepoint */
+	unsigned int idx;
 };
 
 struct bpf_dtab {
@@ -219,7 +216,6 @@ static void dev_map_free(struct bpf_map *map)
 
 			hlist_for_each_entry_safe(dev, next, head, index_hlist) {
 				hlist_del_rcu(&dev->index_hlist);
-				free_percpu(dev->bulkq);
 				dev_put(dev->dev);
 				kfree(dev);
 			}
@@ -234,7 +230,6 @@ static void dev_map_free(struct bpf_map *map)
 			if (!dev)
 				continue;
 
-			free_percpu(dev->bulkq);
 			dev_put(dev->dev);
 			kfree(dev);
 		}
@@ -320,10 +315,9 @@ static int dev_map_hash_get_next_key(struct bpf_map *map, void *key,
 	return -ENOENT;
 }
 
-static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
+static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
 {
-	struct bpf_dtab_netdev *obj = bq->obj;
-	struct net_device *dev = obj->dev;
+	struct net_device *dev = bq->dev;
 	int sent = 0, drops = 0, err = 0;
 	int i;
 
@@ -346,8 +340,7 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
 out:
 	bq->count = 0;
 
-	trace_xdp_devmap_xmit(&obj->dtab->map, obj->idx,
-			      sent, drops, bq->dev_rx, dev, err);
+	trace_xdp_devmap_xmit(NULL, 0, sent, drops, bq->dev_rx, dev, err);
 	bq->dev_rx = NULL;
 	__list_del_clearprev(&bq->flush_node);
 	return 0;
@@ -374,7 +367,7 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
 void __dev_map_flush(void)
 {
 	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
-	struct xdp_bulk_queue *bq, *tmp;
+	struct xdp_dev_bulk_queue *bq, *tmp;
 
 	rcu_read_lock();
 	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
@@ -401,12 +394,12 @@ struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
 /* Runs under RCU-read-side, plus in softirq under NAPI protection.
  * Thus, safe percpu variable access.
  */
-static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf,
+static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
 		      struct net_device *dev_rx)
 
 {
 	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
-	struct xdp_bulk_queue *bq = this_cpu_ptr(obj->bulkq);
+	struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq);
 
 	if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
 		bq_xmit_all(bq, 0);
@@ -444,7 +437,7 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
 
-	return bq_enqueue(dst, xdpf, dev_rx);
+	return bq_enqueue(dev, xdpf, dev_rx);
 }
 
 int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
@@ -483,7 +476,6 @@ static void __dev_map_entry_free(struct rcu_head *rcu)
 	struct bpf_dtab_netdev *dev;
 
 	dev = container_of(rcu, struct bpf_dtab_netdev, rcu);
-	free_percpu(dev->bulkq);
 	dev_put(dev->dev);
 	kfree(dev);
 }
@@ -538,30 +530,15 @@ static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
 						    u32 ifindex,
 						    unsigned int idx)
 {
-	gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN;
 	struct bpf_dtab_netdev *dev;
-	struct xdp_bulk_queue *bq;
-	int cpu;
 
-	dev = kmalloc_node(sizeof(*dev), gfp, dtab->map.numa_node);
+	dev = kmalloc_node(sizeof(*dev), GFP_ATOMIC | __GFP_NOWARN,
+			   dtab->map.numa_node);
 	if (!dev)
 		return ERR_PTR(-ENOMEM);
 
-	dev->bulkq = __alloc_percpu_gfp(sizeof(*dev->bulkq),
-					sizeof(void *), gfp);
-	if (!dev->bulkq) {
-		kfree(dev);
-		return ERR_PTR(-ENOMEM);
-	}
-
-	for_each_possible_cpu(cpu) {
-		bq = per_cpu_ptr(dev->bulkq, cpu);
-		bq->obj = dev;
-	}
-
 	dev->dev = dev_get_by_index(net, ifindex);
 	if (!dev->dev) {
-		free_percpu(dev->bulkq);
 		kfree(dev);
 		return ERR_PTR(-EINVAL);
 	}
@@ -721,9 +698,23 @@ static int dev_map_notification(struct notifier_block *notifier,
 {
 	struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
 	struct bpf_dtab *dtab;
-	int i;
+	int i, cpu;
 
 	switch (event) {
+	case NETDEV_REGISTER:
+		if (!netdev->netdev_ops->ndo_xdp_xmit || netdev->xdp_bulkq)
+			break;
+
+		/* will be freed in free_netdev() */
+		netdev->xdp_bulkq =
+			__alloc_percpu_gfp(sizeof(struct xdp_dev_bulk_queue),
+					   sizeof(void *), GFP_ATOMIC);
+		if (!netdev->xdp_bulkq)
+			return NOTIFY_BAD;
+
+		for_each_possible_cpu(cpu)
+			per_cpu_ptr(netdev->xdp_bulkq, cpu)->dev = netdev;
+		break;
 	case NETDEV_UNREGISTER:
 		/* This rcu_read_lock/unlock pair is needed because
 		 * dev_map_list is an RCU list AND to ensure a delete
diff --git a/net/core/dev.c b/net/core/dev.c
index d99f88c58636..e7802a41ae7f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9847,6 +9847,8 @@ void free_netdev(struct net_device *dev)
 
 	free_percpu(dev->pcpu_refcnt);
 	dev->pcpu_refcnt = NULL;
+	free_percpu(dev->xdp_bulkq);
+	dev->xdp_bulkq = NULL;
 
 	netdev_unregister_lockdep_key(dev);
 

* RE: [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device
From: John Fastabend @ 2020-01-15 19:45 UTC
To: Toke Høiland-Jørgensen, netdev
Cc: bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, Björn Töpel, John Fastabend

Toke Høiland-Jørgensen wrote:
> From: Toke Høiland-Jørgensen <toke@redhat.com>
>
> Commit 96360004b862 ("xdp: Make devmap flush_list common for all map
> instances") changed devmap flushing to be a global operation instead of a
> per-map operation. However, the queue structure used for bulking was still
> allocated as part of the containing map.
>
> This patch moves the devmap bulk queue into struct net_device. The
> motivation for this is reusing it for the non-map variant of XDP_REDIRECT,
> which will be changed in a subsequent commit. To avoid other fields of
> struct net_device moving to different cache lines, we also move a couple of
> other members around.
>
> We defer the actual allocation of the bulk queue structure until the
> NETDEV_REGISTER notification in devmap.c. This makes it possible to check for
> ndo_xdp_xmit support before allocating the structure, which is not possible
> at the time struct net_device is allocated. However, we keep the freeing in
> free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER.
>
> Because of this change, we lose the reference back to the map that
> originated the redirect, so change the tracepoint to always return 0 as the
> map ID and index. Otherwise no functional change is intended with this
> patch.
>
> Acked-by: Björn Töpel <bjorn.topel@intel.com>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---

LGTM. I didn't check the net_device layout with pahole though, so I'm
trusting they are good from the v1 discussion.

Acked-by: John Fastabend <john.fastabend@gmail.com>

* RE: [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device
From: Toke Høiland-Jørgensen @ 2020-01-15 22:22 UTC
To: John Fastabend, netdev
Cc: bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, Björn Töpel, John Fastabend

John Fastabend <john.fastabend@gmail.com> writes:

> Toke Høiland-Jørgensen wrote:
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>
>> Commit 96360004b862 ("xdp: Make devmap flush_list common for all map
>> instances") changed devmap flushing to be a global operation instead of a
>> per-map operation. However, the queue structure used for bulking was still
>> allocated as part of the containing map.
>>
>> This patch moves the devmap bulk queue into struct net_device. The
>> motivation for this is reusing it for the non-map variant of XDP_REDIRECT,
>> which will be changed in a subsequent commit. To avoid other fields of
>> struct net_device moving to different cache lines, we also move a couple of
>> other members around.
>>
>> We defer the actual allocation of the bulk queue structure until the
>> NETDEV_REGISTER notification in devmap.c. This makes it possible to check for
>> ndo_xdp_xmit support before allocating the structure, which is not possible
>> at the time struct net_device is allocated. However, we keep the freeing in
>> free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER.
>>
>> Because of this change, we lose the reference back to the map that
>> originated the redirect, so change the tracepoint to always return 0 as the
>> map ID and index. Otherwise no functional change is intended with this
>> patch.
>>
>> Acked-by: Björn Töpel <bjorn.topel@intel.com>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> ---
>
> LGTM. I didn't check the net_device layout with pahole though, so I'm
> trusting they are good from the v1 discussion.

I believe so; looks like this now:

	/* --- cacheline 14 boundary (896 bytes) --- */
	struct netdev_queue *      _tx __attribute__((__aligned__(64))); /*   896     8 */
	unsigned int               num_tx_queues;        /*   904     4 */
	unsigned int               real_num_tx_queues;   /*   908     4 */
	struct Qdisc *             qdisc;                /*   912     8 */
	unsigned int               tx_queue_len;         /*   920     4 */
	spinlock_t                 tx_global_lock;       /*   924     4 */
	struct xdp_dev_bulk_queue * xdp_bulkq;           /*   928     8 */
	struct xps_dev_maps *      xps_cpus_map;         /*   936     8 */
	struct xps_dev_maps *      xps_rxqs_map;         /*   944     8 */
	struct mini_Qdisc *        miniq_egress;         /*   952     8 */
	/* --- cacheline 15 boundary (960 bytes) --- */
	struct hlist_head          qdisc_hash[16];       /*   960   128 */
	/* --- cacheline 17 boundary (1088 bytes) --- */
	struct timer_list          watchdog_timer;       /*  1088    40 */

	/* XXX last struct has 4 bytes of padding */

	int                        watchdog_timeo;       /*  1128     4 */

	/* XXX 4 bytes hole, try to pack */

	int *                      pcpu_refcnt;          /*  1136     8 */
	struct list_head           todo_list;            /*  1144    16 */
	/* --- cacheline 18 boundary (1152 bytes) was 8 bytes ago --- */

-Toke
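
[Editor's sketch] The pahole dump above can be cross-checked by hand: with
64-byte cache lines, the cache line index of a member is its byte offset
divided by 64. The helper names below are hypothetical and the toy struct is
only an illustration (it is not struct net_device); the numeric offsets used
in the usage note are copied from the dump.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines, matching the boundaries in the pahole dump. */
#define TOY_CACHELINE_BYTES 64

/* Index of the cache line that contains the given byte offset. */
static size_t cacheline_of(size_t offset)
{
	return offset / TOY_CACHELINE_BYTES;
}

/* Non-zero if two member offsets fall into the same cache line. */
static int same_cacheline(size_t a, size_t b)
{
	return cacheline_of(a) == cacheline_of(b);
}

/* Toy struct with the same member order as the TX block in the dump.
 * spinlock_t is modelled as a plain unsigned int here.
 */
struct toy_tx_block {
	void *tx;
	unsigned int num_tx_queues;
	unsigned int real_num_tx_queues;
	void *qdisc;
	unsigned int tx_queue_len;
	unsigned int tx_global_lock;
	void *xdp_bulkq;
};

/* Compile-time guard in the spirit of the review: the new pointer must stay
 * in the first cache line of the toy block.
 */
_Static_assert(offsetof(struct toy_tx_block, xdp_bulkq) < TOY_CACHELINE_BYTES,
	       "xdp_bulkq left the first cache line of the toy block");
```

Plugging in the dump's numbers, cacheline_of(928) is 14, the same line as _tx
at offset 896, while watchdog_timeo at offset 1128 sits in line 17 next to
watchdog_timer.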

* Re: [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device
From: Jesper Dangaard Brouer @ 2020-01-15 20:17 UTC
To: Toke Høiland-Jørgensen
Cc: netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Björn Töpel, John Fastabend, brouer

On Mon, 13 Jan 2020 19:10:55 +0100
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index da9c832fc5c8..030d125c3839 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
[...]
> @@ -346,8 +340,7 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
>  out:
>  	bq->count = 0;
>  
> -	trace_xdp_devmap_xmit(&obj->dtab->map, obj->idx,
> -			      sent, drops, bq->dev_rx, dev, err);
> +	trace_xdp_devmap_xmit(NULL, 0, sent, drops, bq->dev_rx, dev, err);

Hmm ... I don't like that we lose the map_id and map_index identifier.
This is part of our troubleshooting interface.

>  	bq->dev_rx = NULL;
>  	__list_del_clearprev(&bq->flush_node);
>  	return 0;

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

* Re: [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device
From: Toke Høiland-Jørgensen @ 2020-01-15 22:11 UTC
To: Jesper Dangaard Brouer
Cc: netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Björn Töpel, John Fastabend, brouer

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> On Mon, 13 Jan 2020 19:10:55 +0100
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
>> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
>> index da9c832fc5c8..030d125c3839 100644
>> --- a/kernel/bpf/devmap.c
>> +++ b/kernel/bpf/devmap.c
> [...]
>> @@ -346,8 +340,7 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
>>  out:
>>  	bq->count = 0;
>>  
>> -	trace_xdp_devmap_xmit(&obj->dtab->map, obj->idx,
>> -			      sent, drops, bq->dev_rx, dev, err);
>> +	trace_xdp_devmap_xmit(NULL, 0, sent, drops, bq->dev_rx, dev, err);
>
> Hmm ... I don't like that we lose the map_id and map_index identifier.
> This is part of our troubleshooting interface.

Hmm, I guess I can take another look at whether there's a way to avoid
that. Any ideas?

-Toke

* Re: [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device
From: Jesper Dangaard Brouer @ 2020-01-16 11:24 UTC
To: Toke Høiland-Jørgensen
Cc: netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Björn Töpel, John Fastabend, brouer

On Wed, 15 Jan 2020 23:11:21 +0100
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>
> > On Mon, 13 Jan 2020 19:10:55 +0100
> > Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> >> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> >> index da9c832fc5c8..030d125c3839 100644
> >> --- a/kernel/bpf/devmap.c
> >> +++ b/kernel/bpf/devmap.c
> > [...]
> >> @@ -346,8 +340,7 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
> >>  out:
> >>  	bq->count = 0;
> >>  
> >> -	trace_xdp_devmap_xmit(&obj->dtab->map, obj->idx,
> >> -			      sent, drops, bq->dev_rx, dev, err);
> >> +	trace_xdp_devmap_xmit(NULL, 0, sent, drops, bq->dev_rx, dev, err);
> >
> > Hmm ... I don't like that we lose the map_id and map_index identifier.
> > This is part of our troubleshooting interface.
>
> Hmm, I guess I can take another look at whether there's a way to avoid
> that. Any ideas?

Looking at the code and the other tracepoints...

I would actually suggest removing these two arguments, because the
trace_xdp_redirect_map tracepoint also contains the ifindexes, and to
troubleshoot, people can record both tracepoints and do the correlation
themselves.

When changing the tracepoint I would like to keep the members 'drops' and
'sent' at the same struct offsets, as our xdp_monitor example reads
these and I hope we can keep it working this way.

I've coded it up, and tested it. The new xdp_monitor will work on
older kernels, but the old xdp_monitor will fail attaching on newer
kernels. I think this is fair enough, as we are backwards compatible.


[PATCH] devmap: adjust tracepoint after Toke's changes

From: Jesper Dangaard Brouer <brouer@redhat.com>

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/trace/events/xdp.h     |   29 ++++++++++++-----------------
 kernel/bpf/devmap.c            |    2 +-
 samples/bpf/xdp_monitor_kern.c |    8 +++-----
 3 files changed, 16 insertions(+), 23 deletions(-)

diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index cf568a38f852..f1e64689ce94 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -247,43 +247,38 @@ TRACE_EVENT(xdp_cpumap_enqueue,
 
 TRACE_EVENT(xdp_devmap_xmit,
 
-	TP_PROTO(const struct bpf_map *map, u32 map_index,
-		 int sent, int drops,
-		 const struct net_device *from_dev,
-		 const struct net_device *to_dev, int err),
+	TP_PROTO(const struct net_device *from_dev,
+		 const struct net_device *to_dev,
+		 int sent, int drops, int err),
 
-	TP_ARGS(map, map_index, sent, drops, from_dev, to_dev, err),
+	TP_ARGS(from_dev, to_dev, sent, drops, err),
 
 	TP_STRUCT__entry(
-		__field(int, map_id)
+		__field(int, from_ifindex)
 		__field(u32, act)
-		__field(u32, map_index)
+		__field(int, to_ifindex)
 		__field(int, drops)
 		__field(int, sent)
-		__field(int, from_ifindex)
-		__field(int, to_ifindex)
 		__field(int, err)
 	),
 
 	TP_fast_assign(
-		__entry->map_id		= map ? map->id : 0;
+		__entry->from_ifindex	= from_dev->ifindex;
 		__entry->act		= XDP_REDIRECT;
-		__entry->map_index	= map_index;
+		__entry->to_ifindex	= to_dev->ifindex;
 		__entry->drops		= drops;
 		__entry->sent		= sent;
-		__entry->from_ifindex	= from_dev->ifindex;
-		__entry->to_ifindex	= to_dev->ifindex;
 		__entry->err		= err;
 	),
 
 	TP_printk("ndo_xdp_xmit"
-		  " map_id=%d map_index=%d action=%s"
+		  " from_ifindex=%d to_ifindex=%d action=%s"
 		  " sent=%d drops=%d"
-		  " from_ifindex=%d to_ifindex=%d err=%d",
-		  __entry->map_id, __entry->map_index,
+		  " err=%d",
+		  __entry->from_ifindex, __entry->to_ifindex,
 		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
 		  __entry->sent, __entry->drops,
-		  __entry->from_ifindex, __entry->to_ifindex, __entry->err)
+		  __entry->err)
 );
 
 /* Expect users already include <net/xdp.h>, but not xdp_priv.h */
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index db32272c4f77..1b4bfe4e06d6 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -340,7 +340,7 @@ static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
 out:
 	bq->count = 0;
 
-	trace_xdp_devmap_xmit(NULL, 0, sent, drops, bq->dev_rx, dev, err);
+	trace_xdp_devmap_xmit(bq->dev_rx, dev, sent, drops, err);
 	bq->dev_rx = NULL;
 	__list_del_clearprev(&bq->flush_node);
 	return 0;
diff --git a/samples/bpf/xdp_monitor_kern.c b/samples/bpf/xdp_monitor_kern.c
index ad10fe700d7d..39458a44472e 100644
--- a/samples/bpf/xdp_monitor_kern.c
+++ b/samples/bpf/xdp_monitor_kern.c
@@ -222,14 +222,12 @@ struct bpf_map_def SEC("maps") devmap_xmit_cnt = {
  */
 struct devmap_xmit_ctx {
 	u64 __pad;		// First 8 bytes are not accessible by bpf code
-	int map_id;		//	offset:8;  size:4; signed:1;
+	int from_ifindex;	//	offset:8;  size:4; signed:1;
 	u32 act;		//	offset:12; size:4; signed:0;
-	u32 map_index;		//	offset:16; size:4; signed:0;
+	int to_ifindex;		//	offset:16; size:4; signed:1;
 	int drops;		//	offset:20; size:4; signed:1;
 	int sent;		//	offset:24; size:4; signed:1;
-	int from_ifindex;	//	offset:28; size:4; signed:1;
-	int to_ifindex;		//	offset:32; size:4; signed:1;
-	int err;		//	offset:36; size:4; signed:1;
+	int err;		//	offset:28; size:4; signed:1;
 };
 
 SEC("tracepoint/xdp/xdp_devmap_xmit")
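
[Editor's sketch] The layout constraint described in this mail (keep 'drops'
and 'sent' at the same struct offsets) can be checked at compile time in plain
userspace C. The two structs below copy the field order from the
xdp_monitor_kern.c hunk; the _old/_new suffixes, the 'pad' field name and the
helper function are hypothetical, and uint64_t/uint32_t stand in for the
kernel's u64/u32. The sketch assumes a 4-byte int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout before the tracepoint change (field order from the '-' lines). */
struct devmap_xmit_ctx_old {
	uint64_t pad;		/* first 8 bytes, not accessible by bpf code */
	int map_id;
	uint32_t act;
	uint32_t map_index;
	int drops;
	int sent;
	int from_ifindex;
	int to_ifindex;
	int err;
};

/* Layout after the tracepoint change (field order from the '+' lines). */
struct devmap_xmit_ctx_new {
	uint64_t pad;		/* first 8 bytes, not accessible by bpf code */
	int from_ifindex;
	uint32_t act;
	int to_ifindex;
	int drops;
	int sent;
	int err;
};

/* The two counters must not move, so a reader of 'drops' and 'sent' sees
 * the same offsets in both layouts.
 */
_Static_assert(offsetof(struct devmap_xmit_ctx_old, drops) ==
	       offsetof(struct devmap_xmit_ctx_new, drops),
	       "'drops' moved between layouts");
_Static_assert(offsetof(struct devmap_xmit_ctx_old, sent) ==
	       offsetof(struct devmap_xmit_ctx_new, sent),
	       "'sent' moved between layouts");

/* Runtime variant of the same check, convenient for a unit test. */
static int counters_layout_compatible(void)
{
	return offsetof(struct devmap_xmit_ctx_old, drops) ==
	       offsetof(struct devmap_xmit_ctx_new, drops) &&
	       offsetof(struct devmap_xmit_ctx_old, sent) ==
	       offsetof(struct devmap_xmit_ctx_new, sent);
}
```

Under these assumptions 'drops' stays at offset 20 and 'sent' at offset 24 in
both layouts, while 'err' moves from offset 36 to 28, matching the offset
comments in the hunk.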

* Re: [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device
From: Toke Høiland-Jørgensen @ 2020-01-16 13:51 UTC
To: Jesper Dangaard Brouer
Cc: netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Björn Töpel, John Fastabend, brouer

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> On Wed, 15 Jan 2020 23:11:21 +0100
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
>> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>>
>> > On Mon, 13 Jan 2020 19:10:55 +0100
>> > Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >
>> >> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
>> >> index da9c832fc5c8..030d125c3839 100644
>> >> --- a/kernel/bpf/devmap.c
>> >> +++ b/kernel/bpf/devmap.c
>> > [...]
>> >> @@ -346,8 +340,7 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
>> >>  out:
>> >>  	bq->count = 0;
>> >>  
>> >> -	trace_xdp_devmap_xmit(&obj->dtab->map, obj->idx,
>> >> -			      sent, drops, bq->dev_rx, dev, err);
>> >> +	trace_xdp_devmap_xmit(NULL, 0, sent, drops, bq->dev_rx, dev, err);
>> >
>> > Hmm ... I don't like that we lose the map_id and map_index identifier.
>> > This is part of our troubleshooting interface.
>>
>> Hmm, I guess I can take another look at whether there's a way to avoid
>> that. Any ideas?
>
> Looking at the code and the other tracepoints...
>
> I would actually suggest removing these two arguments, because the
> trace_xdp_redirect_map tracepoint also contains the ifindexes, and to
> troubleshoot, people can record both tracepoints and do the correlation
> themselves.
>
> When changing the tracepoint I would like to keep the members 'drops' and
> 'sent' at the same struct offsets, as our xdp_monitor example reads
> these and I hope we can keep it working this way.
>
> I've coded it up, and tested it. The new xdp_monitor will work on
> older kernels, but the old xdp_monitor will fail attaching on newer
> kernels. I think this is fair enough, as we are backwards compatible.

SGTM - thanks! I'll respin and include this :)

-Toke

* [PATCH bpf-next v2 2/2] xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths
From: Toke Høiland-Jørgensen @ 2020-01-13 18:10 UTC
To: netdev
Cc: bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, Björn Töpel, John Fastabend

From: Toke Høiland-Jørgensen <toke@redhat.com>

Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
we can re-use the bulking for the non-map version of the bpf_redirect()
helper. This is a simple matter of having xdp_do_redirect_slow() queue the
frame on the bulk queue instead of sending it out with __bpf_tx_xdp().

Unfortunately we can't make the bpf_redirect() helper return an error if
the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
have a reference to the network namespace of the ingress device at the time
the helper is called. So we have to leave it as-is and keep the device
lookup in xdp_do_redirect_slow().

Since this leaves less reason to have the non-map redirect code in a
separate function, we get rid of the xdp_do_redirect_slow() function
entirely. This does lose us the tracepoint disambiguation, but fortunately
the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint
entry structures. This means both can contain a map index, so we can just
amend the tracepoint definitions so we always emit the xdp_redirect(_err)
tracepoints, but with the map ID only populated if a map is present.
This means we retire the xdp_redirect_map(_err) tracepoints entirely, but
keep the definitions around in case someone is still listening for them.

With this change, the performance of the xdp_redirect sample program goes
from 5Mpps to 8.4Mpps (a 68% increase).

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/bpf.h        |   13 +++++-
 include/trace/events/xdp.h |  102 +++++++++++++++++++-------------------------
 kernel/bpf/devmap.c        |   31 +++++++++----
 net/core/filter.c          |   86 +++++++------------------------------
 4 files changed, 95 insertions(+), 137 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b14e51d56a82..25c050202536 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -962,7 +962,9 @@ struct sk_buff;
 
 struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
 struct bpf_dtab_netdev *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key);
-void __dev_map_flush(void);
+void __dev_flush(void);
+int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+		    struct net_device *dev_rx);
 int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 		    struct net_device *dev_rx);
 int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
@@ -1071,13 +1073,20 @@ static inline struct net_device *__dev_map_hash_lookup_elem(struct bpf_map *map
 	return NULL;
 }
 
-static inline void __dev_map_flush(void)
+static inline void __dev_flush(void)
 {
 }
 
 struct xdp_buff;
 struct bpf_dtab_netdev;
 
+static inline
+int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+		    struct net_device *dev_rx)
+{
+	return 0;
+}
+
 static inline
 int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 		    struct net_device *dev_rx)
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index 72bad13d4a3c..cf568a38f852 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -79,14 +79,27 @@ TRACE_EVENT(xdp_bulk_tx,
 		  __entry->sent, __entry->drops, __entry->err)
 );
 
+#ifndef __DEVMAP_OBJ_TYPE
+#define __DEVMAP_OBJ_TYPE
+struct _bpf_dtab_netdev {
+	struct net_device *dev;
+};
+#endif /* __DEVMAP_OBJ_TYPE */
+
+#define devmap_ifindex(tgt, map)				\
+	(((map->map_type == BPF_MAP_TYPE_DEVMAP ||	\
+		  map->map_type == BPF_MAP_TYPE_DEVMAP_HASH)) ? \
+	  ((struct _bpf_dtab_netdev *)tgt)->dev->ifindex : 0)
+
+
 DECLARE_EVENT_CLASS(xdp_redirect_template,
 
 	TP_PROTO(const struct net_device *dev,
 		 const struct bpf_prog *xdp,
-		 int to_ifindex, int err,
-		 const struct bpf_map *map, u32 map_index),
+		 const void *tgt, int err,
+		 const struct bpf_map *map, u32 index),
 
-	TP_ARGS(dev, xdp, to_ifindex, err, map, map_index),
+	TP_ARGS(dev, xdp, tgt, err, map, index),
 
 	TP_STRUCT__entry(
 		__field(int, prog_id)
@@ -103,90 +116,65 @@ DECLARE_EVENT_CLASS(xdp_redirect_template,
 		__entry->act		= XDP_REDIRECT;
 		__entry->ifindex	= dev->ifindex;
 		__entry->err		= err;
-		__entry->to_ifindex	= to_ifindex;
+		__entry->to_ifindex	= map ? devmap_ifindex(tgt, map) :
+						index;
 		__entry->map_id		= map ? map->id : 0;
-		__entry->map_index	= map_index;
+		__entry->map_index	= map ? index : 0;
 	),
 
-	TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d",
+	TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d"
+		  " map_id=%d map_index=%d",
 		  __entry->prog_id,
 		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
 		  __entry->ifindex, __entry->to_ifindex,
-		  __entry->err)
+		  __entry->err, __entry->map_id, __entry->map_index)
 );
 
 DEFINE_EVENT(xdp_redirect_template, xdp_redirect,
 	TP_PROTO(const struct net_device *dev,
 		 const struct bpf_prog *xdp,
-		 int to_ifindex, int err,
-		 const struct bpf_map *map, u32 map_index),
-	TP_ARGS(dev, xdp, to_ifindex, err, map, map_index)
+		 const void *tgt, int err,
+		 const struct bpf_map *map, u32 index),
+	TP_ARGS(dev, xdp, tgt, err, map, index)
 );
 
 DEFINE_EVENT(xdp_redirect_template, xdp_redirect_err,
 	TP_PROTO(const struct net_device *dev,
 		 const struct bpf_prog *xdp,
-		 int to_ifindex, int err,
-		 const struct bpf_map *map, u32 map_index),
-	TP_ARGS(dev, xdp, to_ifindex, err, map, map_index)
+		 const void *tgt, int err,
+		 const struct bpf_map *map, u32 index),
+	TP_ARGS(dev, xdp, tgt, err, map, index)
 );
 
 #define _trace_xdp_redirect(dev, xdp, to)		\
-	 trace_xdp_redirect(dev, xdp, to, 0, NULL, 0);
+	 trace_xdp_redirect(dev, xdp, NULL, 0, NULL, to);
 
 #define _trace_xdp_redirect_err(dev, xdp, to, err)	\
-	 trace_xdp_redirect_err(dev, xdp, to, err, NULL, 0);
+	 trace_xdp_redirect_err(dev, xdp, NULL, err, NULL, to);
+
+#define _trace_xdp_redirect_map(dev, xdp, to, map, index)		\
+	 trace_xdp_redirect(dev, xdp, to, 0, map, index);
 
-DEFINE_EVENT_PRINT(xdp_redirect_template, xdp_redirect_map,
+#define _trace_xdp_redirect_map_err(dev, xdp, to, map, index, err)	\
+	 trace_xdp_redirect_err(dev, xdp, to, err, map, index);
+
+/* not used anymore, but kept around so as not to break old programs */
+DEFINE_EVENT(xdp_redirect_template, xdp_redirect_map,
 	TP_PROTO(const struct net_device *dev,
 		 const struct bpf_prog *xdp,
-		 int to_ifindex, int err,
-		 const struct bpf_map *map, u32 map_index),
-	TP_ARGS(dev, xdp, to_ifindex, err, map, map_index),
-	TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d"
-		  " map_id=%d map_index=%d",
-		  __entry->prog_id,
-		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
-		  __entry->ifindex, __entry->to_ifindex,
-		  __entry->err,
-		  __entry->map_id, __entry->map_index)
+		 const void *tgt, int err,
+		 const struct bpf_map *map, u32 index),
+	TP_ARGS(dev, xdp, tgt, err, map, index)
 );
 
-DEFINE_EVENT_PRINT(xdp_redirect_template, xdp_redirect_map_err,
+DEFINE_EVENT(xdp_redirect_template, xdp_redirect_map_err,
 	TP_PROTO(const struct net_device *dev,
 		 const struct bpf_prog *xdp,
-		 int to_ifindex, int err,
-		 const struct bpf_map *map, u32 map_index),
-	TP_ARGS(dev, xdp, to_ifindex, err, map, map_index),
-	TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d"
-		  " map_id=%d map_index=%d",
-		  __entry->prog_id,
-		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
-		  __entry->ifindex, __entry->to_ifindex,
-		  __entry->err,
-		  __entry->map_id, __entry->map_index)
+		 const void *tgt, int err,
+		 const struct bpf_map *map, u32 index),
+	TP_ARGS(dev, xdp, tgt, err, map, index)
 );
 
-#ifndef __DEVMAP_OBJ_TYPE
-#define __DEVMAP_OBJ_TYPE
-struct _bpf_dtab_netdev {
-	struct net_device *dev;
-};
-#endif /* __DEVMAP_OBJ_TYPE */
-
-#define devmap_ifindex(fwd, map)				\
-	((map->map_type == BPF_MAP_TYPE_DEVMAP ||		\
-	  map->map_type == BPF_MAP_TYPE_DEVMAP_HASH) ?
\ - ((struct _bpf_dtab_netdev *)fwd)->dev->ifindex : 0) - -#define _trace_xdp_redirect_map(dev, xdp, fwd, map, idx) \ - trace_xdp_redirect_map(dev, xdp, devmap_ifindex(fwd, map), \ - 0, map, idx) - -#define _trace_xdp_redirect_map_err(dev, xdp, fwd, map, idx, err) \ - trace_xdp_redirect_map_err(dev, xdp, devmap_ifindex(fwd, map), \ - err, map, idx) - TRACE_EVENT(xdp_cpumap_kthread, TP_PROTO(int map_id, unsigned int processed, unsigned int drops, diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index 030d125c3839..db32272c4f77 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -81,7 +81,7 @@ struct bpf_dtab { u32 n_buckets; }; -static DEFINE_PER_CPU(struct list_head, dev_map_flush_list); +static DEFINE_PER_CPU(struct list_head, dev_flush_list); static DEFINE_SPINLOCK(dev_map_lock); static LIST_HEAD(dev_map_list); @@ -357,16 +357,16 @@ static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags) goto out; } -/* __dev_map_flush is called from xdp_do_flush_map() which _must_ be signaled +/* __dev_flush is called from xdp_do_flush_map() which _must_ be signaled * from the driver before returning from its napi->poll() routine. The poll() * routine is called either from busy_poll context or net_rx_action signaled * from NET_RX_SOFTIRQ. Either way the poll routine must complete before the * net device can be torn down. On devmap tear down we ensure the flush list * is empty before completing to ensure all flush operations have completed. 
*/ -void __dev_map_flush(void) +void __dev_flush(void) { - struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list); + struct list_head *flush_list = this_cpu_ptr(&dev_flush_list); struct xdp_dev_bulk_queue *bq, *tmp; rcu_read_lock(); @@ -398,7 +398,7 @@ static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf, struct net_device *dev_rx) { - struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list); + struct list_head *flush_list = this_cpu_ptr(&dev_flush_list); struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq); if (unlikely(bq->count == DEV_MAP_BULK_SIZE)) @@ -419,10 +419,9 @@ static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf, return 0; } -int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp, - struct net_device *dev_rx) +static inline int _xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp, + struct net_device *dev_rx) { - struct net_device *dev = dst->dev; struct xdp_frame *xdpf; int err; @@ -440,6 +439,20 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp, return bq_enqueue(dev, xdpf, dev_rx); } +int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp, + struct net_device *dev_rx) +{ + return _xdp_enqueue(dev, xdp, dev_rx); +} + +int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp, + struct net_device *dev_rx) +{ + struct net_device *dev = dst->dev; + + return _xdp_enqueue(dev, xdp, dev_rx); +} + int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb, struct bpf_prog *xdp_prog) { @@ -762,7 +775,7 @@ static int __init dev_map_init(void) register_netdevice_notifier(&dev_map_notifier); for_each_possible_cpu(cpu) - INIT_LIST_HEAD(&per_cpu(dev_map_flush_list, cpu)); + INIT_LIST_HEAD(&per_cpu(dev_flush_list, cpu)); return 0; } diff --git a/net/core/filter.c b/net/core/filter.c index 42fd17c48c5f..f023f3a8f351 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -3458,58 +3458,6 @@ static const struct 
bpf_func_proto bpf_xdp_adjust_meta_proto = { .arg2_type = ARG_ANYTHING, }; -static int __bpf_tx_xdp(struct net_device *dev, - struct bpf_map *map, - struct xdp_buff *xdp, - u32 index) -{ - struct xdp_frame *xdpf; - int err, sent; - - if (!dev->netdev_ops->ndo_xdp_xmit) { - return -EOPNOTSUPP; - } - - err = xdp_ok_fwd_dev(dev, xdp->data_end - xdp->data); - if (unlikely(err)) - return err; - - xdpf = convert_to_xdp_frame(xdp); - if (unlikely(!xdpf)) - return -EOVERFLOW; - - sent = dev->netdev_ops->ndo_xdp_xmit(dev, 1, &xdpf, XDP_XMIT_FLUSH); - if (sent <= 0) - return sent; - return 0; -} - -static noinline int -xdp_do_redirect_slow(struct net_device *dev, struct xdp_buff *xdp, - struct bpf_prog *xdp_prog, struct bpf_redirect_info *ri) -{ - struct net_device *fwd; - u32 index = ri->tgt_index; - int err; - - fwd = dev_get_by_index_rcu(dev_net(dev), index); - ri->tgt_index = 0; - if (unlikely(!fwd)) { - err = -EINVAL; - goto err; - } - - err = __bpf_tx_xdp(fwd, NULL, xdp, 0); - if (unlikely(err)) - goto err; - - _trace_xdp_redirect(dev, xdp_prog, index); - return 0; -err: - _trace_xdp_redirect_err(dev, xdp_prog, index, err); - return err; -} - static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd, struct bpf_map *map, struct xdp_buff *xdp) { @@ -3529,7 +3477,7 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd, void xdp_do_flush_map(void) { - __dev_map_flush(); + __dev_flush(); __cpu_map_flush(); __xsk_map_flush(); } @@ -3568,10 +3516,11 @@ void bpf_clear_redirect_map(struct bpf_map *map) } } -static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp, - struct bpf_prog *xdp_prog, struct bpf_map *map, - struct bpf_redirect_info *ri) +int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, + struct bpf_prog *xdp_prog) { + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_map *map = READ_ONCE(ri->map); u32 index = ri->tgt_index; void *fwd = ri->tgt_value; int err; @@ -3580,7 +3529,18 @@ 
static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp, ri->tgt_value = NULL; WRITE_ONCE(ri->map, NULL); - err = __bpf_tx_xdp_map(dev, fwd, map, xdp); + if (unlikely(!map)) { + fwd = dev_get_by_index_rcu(dev_net(dev), index); + if (unlikely(!fwd)) { + err = -EINVAL; + goto err; + } + + err = dev_xdp_enqueue(fwd, xdp, dev); + } else { + err = __bpf_tx_xdp_map(dev, fwd, map, xdp); + } + if (unlikely(err)) goto err; @@ -3590,18 +3550,6 @@ static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp, _trace_xdp_redirect_map_err(dev, xdp_prog, fwd, map, index, err); return err; } - -int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, - struct bpf_prog *xdp_prog) -{ - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); - struct bpf_map *map = READ_ONCE(ri->map); - - if (likely(map)) - return xdp_do_redirect_map(dev, xdp, xdp_prog, map, ri); - - return xdp_do_redirect_slow(dev, xdp, xdp_prog, ri); -} EXPORT_SYMBOL_GPL(xdp_do_redirect); static int xdp_do_generic_redirect_map(struct net_device *dev, ^ permalink raw reply related [flat|nested] 13+ messages in thread
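The patch above replaces a per-frame transmit (the removed __bpf_tx_xdp() called ndo_xdp_xmit() with one frame and XDP_XMIT_FLUSH) with an enqueue on a bulk queue that is flushed when full or at the end of the poll cycle. The following is a minimal userspace sketch of that idea, not the kernel code: bulk_queue, fake_ndo_xdp_xmit(), bulk_enqueue(), bulk_flush() and xmit_calls_for() are hypothetical names, and BULK_SIZE = 16 is an assumed stand-in for DEV_MAP_BULK_SIZE. It only counts how many driver calls a burst of frames would need.

```c
#include <assert.h>
#include <stddef.h>

#define BULK_SIZE 16 /* assumed stand-in for DEV_MAP_BULK_SIZE */

/* Hypothetical model of a per-device, per-CPU bulk queue. */
struct bulk_queue {
	const void *q[BULK_SIZE]; /* frames waiting to be sent */
	size_t count;             /* number of queued frames */
	unsigned int xmit_calls;  /* how often the "driver" was invoked */
	unsigned int frames_sent; /* total frames handed to the "driver" */
};

/* Stand-in for ndo_xdp_xmit(): takes the whole batch in one call. */
static void fake_ndo_xdp_xmit(struct bulk_queue *bq)
{
	bq->xmit_calls++;
	bq->frames_sent += (unsigned int)bq->count;
	bq->count = 0;
}

/* Same shape as bq_enqueue(): flush first if the queue is already full. */
static void bulk_enqueue(struct bulk_queue *bq, const void *frame)
{
	if (bq->count == BULK_SIZE)
		fake_ndo_xdp_xmit(bq);
	assert(bq->count < BULK_SIZE);
	bq->q[bq->count++] = frame;
}

/* Plays the role of __dev_flush() at the end of a NAPI poll. */
static void bulk_flush(struct bulk_queue *bq)
{
	if (bq->count)
		fake_ndo_xdp_xmit(bq);
}

/* Number of driver calls needed to redirect nframes in one poll cycle. */
static unsigned int xmit_calls_for(unsigned int nframes)
{
	struct bulk_queue bq = { .count = 0 };
	static const char frame; /* dummy payload shared by all entries */
	unsigned int i;

	for (i = 0; i < nframes; i++)
		bulk_enqueue(&bq, &frame);
	bulk_flush(&bq);
	assert(bq.frames_sent == nframes);
	return bq.xmit_calls;
}
```

Under these assumptions a burst of 64 frames costs 4 driver calls instead of 64, which is plausibly where most of the reported gain for bpf_redirect() comes from.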
* Re: [PATCH bpf-next v2 2/2] xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths
  2020-01-13 18:10 ` [PATCH bpf-next v2 2/2] xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths Toke Høiland-Jørgensen
@ 2020-01-15 12:16   ` Maciej Fijalkowski
  2020-01-15 19:43   ` John Fastabend
  1 sibling, 0 replies; 13+ messages in thread
From: Maciej Fijalkowski @ 2020-01-15 12:16 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, Björn Töpel, John Fastabend

On Mon, Jan 13, 2020 at 07:10:56PM +0100, Toke Høiland-Jørgensen wrote:
> From: Toke Høiland-Jørgensen <toke@redhat.com>
>
> Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
> we can re-use the bulking for the non-map version of the bpf_redirect()
> helper. This is a simple matter of having xdp_do_redirect_slow() queue the
> frame on the bulk queue instead of sending it out with __bpf_tx_xdp().
>
> Unfortunately we can't make the bpf_redirect() helper return an error if
> the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
> have a reference to the network namespace of the ingress device at the time
> the helper is called. So we have to leave it as-is and keep the device
> lookup in xdp_do_redirect_slow().
>
> Since this leaves less reason to have the non-map redirect code in a
> separate function, we get rid of the xdp_do_redirect_slow() function
> entirely. This does lose us the tracepoint disambiguation, but fortunately
> the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint
> entry structures. This means both can contain a map index, so we can just
> amend the tracepoint definitions so we always emit the xdp_redirect(_err)
> tracepoints, but with the map ID only populated if a map is present. This
> means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep
> the definitions around in case someone is still listening for them.
>
> With this change, the performance of the xdp_redirect sample program goes
> from 5Mpps to 8.4Mpps (a 68% increase).
>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>  include/linux/bpf.h        |   13 +++++-
>  include/trace/events/xdp.h |  102 +++++++++++++++++++-------------------------
>  kernel/bpf/devmap.c        |   31 +++++++++----
>  net/core/filter.c          |   86 +++++++------------------------------
>  4 files changed, 95 insertions(+), 137 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index b14e51d56a82..25c050202536 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -962,7 +962,9 @@ struct sk_buff;
>
>  struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
>  struct bpf_dtab_netdev *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key);
> -void __dev_map_flush(void);
> +void __dev_flush(void);
> +int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
> +		    struct net_device *dev_rx);
>  int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
>  		    struct net_device *dev_rx);
>  int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
> @@ -1071,13 +1073,20 @@ static inline struct net_device *__dev_map_hash_lookup_elem(struct bpf_map *map
>  	return NULL;
>  }
>
> -static inline void __dev_map_flush(void)
> +static inline void __dev_flush(void)
>  {
>  }
>
>  struct xdp_buff;
>  struct bpf_dtab_netdev;
>
> +static inline
> +int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
> +		    struct net_device *dev_rx)
> +{
> +	return 0;
> +}
> +
>  static inline
>  int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
>  		    struct net_device *dev_rx)
> diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
> index 72bad13d4a3c..cf568a38f852 100644
> --- a/include/trace/events/xdp.h
> +++ b/include/trace/events/xdp.h
> @@ -79,14 +79,27 @@ TRACE_EVENT(xdp_bulk_tx,
>  		  __entry->sent, __entry->drops, __entry->err)
>  );
>
> +#ifndef __DEVMAP_OBJ_TYPE
> +#define __DEVMAP_OBJ_TYPE
> +struct _bpf_dtab_netdev {
> +	struct net_device *dev;
> +};
> +#endif /* __DEVMAP_OBJ_TYPE */
> +
> +#define devmap_ifindex(tgt, map)				\
> +	(((map->map_type == BPF_MAP_TYPE_DEVMAP ||		\
> +	   map->map_type == BPF_MAP_TYPE_DEVMAP_HASH)) ?	\
> +	  ((struct _bpf_dtab_netdev *)tgt)->dev->ifindex : 0)
> +

Delete one blank line

> +
>  DECLARE_EVENT_CLASS(xdp_redirect_template,
>
>  	TP_PROTO(const struct net_device *dev,
>  		 const struct bpf_prog *xdp,
> -		 int to_ifindex, int err,
> -		 const struct bpf_map *map, u32 map_index),
> +		 const void *tgt, int err,
> +		 const struct bpf_map *map, u32 index),
>
> -	TP_ARGS(dev, xdp, to_ifindex, err, map, map_index),
> +	TP_ARGS(dev, xdp, tgt, err, map, index),
>
>  	TP_STRUCT__entry(
>  		__field(int, prog_id)
> @@ -103,90 +116,65 @@ DECLARE_EVENT_CLASS(xdp_redirect_template,
>  		__entry->act		= XDP_REDIRECT;
>  		__entry->ifindex	= dev->ifindex;
>  		__entry->err		= err;
> -		__entry->to_ifindex	= to_ifindex;
> +		__entry->to_ifindex	= map ? devmap_ifindex(tgt, map) :
> +						index;
>  		__entry->map_id		= map ? map->id : 0;
> -		__entry->map_index	= map_index;
> +		__entry->map_index	= map ? index : 0;
>  	),
>
> -	TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d",
> +	TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d"
> +		  " map_id=%d map_index=%d",
>  		  __entry->prog_id,
>  		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
>  		  __entry->ifindex, __entry->to_ifindex,
> -		  __entry->err)
> +		  __entry->err, __entry->map_id, __entry->map_index)
>  );
>
>  DEFINE_EVENT(xdp_redirect_template, xdp_redirect,
>  	TP_PROTO(const struct net_device *dev,
>  		 const struct bpf_prog *xdp,
> -		 int to_ifindex, int err,
> -		 const struct bpf_map *map, u32 map_index),
> -	TP_ARGS(dev, xdp, to_ifindex, err, map, map_index)
> +		 const void *tgt, int err,
> +		 const struct bpf_map *map, u32 index),
> +	TP_ARGS(dev, xdp, tgt, err, map, index)
>  );
>
>  DEFINE_EVENT(xdp_redirect_template, xdp_redirect_err,
>  	TP_PROTO(const struct net_device *dev,
>  		 const struct bpf_prog *xdp,
> -		 int to_ifindex, int err,
> -		 const struct bpf_map *map, u32 map_index),
> -	TP_ARGS(dev, xdp, to_ifindex, err, map, map_index)
> +		 const void *tgt, int err,
> +		 const struct bpf_map *map, u32 index),
> +	TP_ARGS(dev, xdp, tgt, err, map, index)
>  );
>
>  #define _trace_xdp_redirect(dev, xdp, to)		\
> -	 trace_xdp_redirect(dev, xdp, to, 0, NULL, 0);
> +	 trace_xdp_redirect(dev, xdp, NULL, 0, NULL, to);
>
>  #define _trace_xdp_redirect_err(dev, xdp, to, err)	\
> -	 trace_xdp_redirect_err(dev, xdp, to, err, NULL, 0);
> +	 trace_xdp_redirect_err(dev, xdp, NULL, err, NULL, to);
> +
> +#define _trace_xdp_redirect_map(dev, xdp, to, map, index)		\
> +	 trace_xdp_redirect(dev, xdp, to, 0, map, index);
>
> -DEFINE_EVENT_PRINT(xdp_redirect_template, xdp_redirect_map,
> +#define _trace_xdp_redirect_map_err(dev, xdp, to, map, index, err)	\
> +	 trace_xdp_redirect_err(dev, xdp, to, err, map, index);
> +
> +/* not used anymore, but kept around so as not to break old programs */
> +DEFINE_EVENT(xdp_redirect_template, xdp_redirect_map,
>  	TP_PROTO(const struct net_device *dev,
>  		 const struct bpf_prog *xdp,
> -		 int to_ifindex, int err,
> -		 const struct bpf_map *map, u32 map_index),
> -	TP_ARGS(dev, xdp, to_ifindex, err, map, map_index),
> -	TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d"
> -		  " map_id=%d map_index=%d",
> -		  __entry->prog_id,
> -		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
> -		  __entry->ifindex, __entry->to_ifindex,
> -		  __entry->err,
> -		  __entry->map_id, __entry->map_index)
> +		 const void *tgt, int err,
> +		 const struct bpf_map *map, u32 index),
> +	TP_ARGS(dev, xdp, tgt, err, map, index)
>  );
>
> -DEFINE_EVENT_PRINT(xdp_redirect_template, xdp_redirect_map_err,
> +DEFINE_EVENT(xdp_redirect_template, xdp_redirect_map_err,
>  	TP_PROTO(const struct net_device *dev,
>  		 const struct bpf_prog *xdp,
> -		 int to_ifindex, int err,
> -		 const struct bpf_map *map, u32 map_index),
> -	TP_ARGS(dev, xdp, to_ifindex, err, map, map_index),
> -	TP_printk("prog_id=%d action=%s ifindex=%d to_ifindex=%d err=%d"
> -		  " map_id=%d map_index=%d",
> -		  __entry->prog_id,
> -		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
> -		  __entry->ifindex, __entry->to_ifindex,
> -		  __entry->err,
> -		  __entry->map_id, __entry->map_index)
> +		 const void *tgt, int err,
> +		 const struct bpf_map *map, u32 index),
> +	TP_ARGS(dev, xdp, tgt, err, map, index)
>  );
>
> -#ifndef __DEVMAP_OBJ_TYPE
> -#define __DEVMAP_OBJ_TYPE
> -struct _bpf_dtab_netdev {
> -	struct net_device *dev;
> -};
> -#endif /* __DEVMAP_OBJ_TYPE */
> -
> -#define devmap_ifindex(fwd, map)				\
> -	((map->map_type == BPF_MAP_TYPE_DEVMAP ||		\
> -	  map->map_type == BPF_MAP_TYPE_DEVMAP_HASH) ?		\
> -	  ((struct _bpf_dtab_netdev *)fwd)->dev->ifindex : 0)
> -
> -#define _trace_xdp_redirect_map(dev, xdp, fwd, map, idx)		\
> -	 trace_xdp_redirect_map(dev, xdp, devmap_ifindex(fwd, map),	\
> -				0, map, idx)
> -
> -#define _trace_xdp_redirect_map_err(dev, xdp, fwd, map, idx, err)	\
> -	 trace_xdp_redirect_map_err(dev, xdp, devmap_ifindex(fwd, map),	\
> -				    err, map, idx)
> -
>  TRACE_EVENT(xdp_cpumap_kthread,
>
>  	TP_PROTO(int map_id, unsigned int processed, unsigned int drops,
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index 030d125c3839..db32272c4f77 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -81,7 +81,7 @@ struct bpf_dtab {
>  	u32 n_buckets;
>  };
>
> -static DEFINE_PER_CPU(struct list_head, dev_map_flush_list);
> +static DEFINE_PER_CPU(struct list_head, dev_flush_list);
>  static DEFINE_SPINLOCK(dev_map_lock);
>  static LIST_HEAD(dev_map_list);
>
> @@ -357,16 +357,16 @@ static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
>  	goto out;
>  }
>
> -/* __dev_map_flush is called from xdp_do_flush_map() which _must_ be signaled
> +/* __dev_flush is called from xdp_do_flush_map() which _must_ be signaled
>   * from the driver before returning from its napi->poll() routine. The poll()
>   * routine is called either from busy_poll context or net_rx_action signaled
>   * from NET_RX_SOFTIRQ. Either way the poll routine must complete before the
>   * net device can be torn down. On devmap tear down we ensure the flush list
>   * is empty before completing to ensure all flush operations have completed.
>   */
> -void __dev_map_flush(void)
> +void __dev_flush(void)
>  {
> -	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
> +	struct list_head *flush_list = this_cpu_ptr(&dev_flush_list);
>  	struct xdp_dev_bulk_queue *bq, *tmp;
>
>  	rcu_read_lock();
> @@ -398,7 +398,7 @@ static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
>  		      struct net_device *dev_rx)
>

^^^ While you're at this part of the code, maybe you could remove another
blank line? :)

>  {
> -	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
> +	struct list_head *flush_list = this_cpu_ptr(&dev_flush_list);
>  	struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq);
>
>  	if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
> @@ -419,10 +419,9 @@ static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
>  	return 0;
>  }
>
> -int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
> -		    struct net_device *dev_rx)
> +static inline int _xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
> +			       struct net_device *dev_rx)
>  {
> -	struct net_device *dev = dst->dev;
>  	struct xdp_frame *xdpf;
>  	int err;
>
> @@ -440,6 +439,20 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
>  	return bq_enqueue(dev, xdpf, dev_rx);
>  }
>
> +int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
> +		    struct net_device *dev_rx)
> +{
> +	return _xdp_enqueue(dev, xdp, dev_rx);

AFAIK normally the internal functions are prefixed with a double underscore,
no? Could we have it renamed to __xdp_enqueue?

> +}
> +
> +int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
> +		    struct net_device *dev_rx)
> +{
> +	struct net_device *dev = dst->dev;
> +
> +	return _xdp_enqueue(dev, xdp, dev_rx);
> +}
> +
>  int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
>  			     struct bpf_prog *xdp_prog)
>  {
> @@ -762,7 +775,7 @@ static int __init dev_map_init(void)
>  	register_netdevice_notifier(&dev_map_notifier);
>
>  	for_each_possible_cpu(cpu)
> -		INIT_LIST_HEAD(&per_cpu(dev_map_flush_list, cpu));
> +		INIT_LIST_HEAD(&per_cpu(dev_flush_list, cpu));
>  	return 0;
>  }
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 42fd17c48c5f..f023f3a8f351 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3458,58 +3458,6 @@ static const struct bpf_func_proto bpf_xdp_adjust_meta_proto = {
>  	.arg2_type	= ARG_ANYTHING,
>  };
>
> -static int __bpf_tx_xdp(struct net_device *dev,
> -			struct bpf_map *map,
> -			struct xdp_buff *xdp,
> -			u32 index)
> -{
> -	struct xdp_frame *xdpf;
> -	int err, sent;
> -
> -	if (!dev->netdev_ops->ndo_xdp_xmit) {
> -		return -EOPNOTSUPP;
> -	}
> -
> -	err = xdp_ok_fwd_dev(dev, xdp->data_end - xdp->data);
> -	if (unlikely(err))
> -		return err;
> -
> -	xdpf = convert_to_xdp_frame(xdp);
> -	if (unlikely(!xdpf))
> -		return -EOVERFLOW;
> -
> -	sent = dev->netdev_ops->ndo_xdp_xmit(dev, 1, &xdpf, XDP_XMIT_FLUSH);
> -	if (sent <= 0)
> -		return sent;
> -	return 0;
> -}
> -
> -static noinline int
> -xdp_do_redirect_slow(struct net_device *dev, struct xdp_buff *xdp,
> -		     struct bpf_prog *xdp_prog, struct bpf_redirect_info *ri)
> -{
> -	struct net_device *fwd;
> -	u32 index = ri->tgt_index;
> -	int err;
> -
> -	fwd = dev_get_by_index_rcu(dev_net(dev), index);
> -	ri->tgt_index = 0;
> -	if (unlikely(!fwd)) {
> -		err = -EINVAL;
> -		goto err;
> -	}
> -
> -	err = __bpf_tx_xdp(fwd, NULL, xdp, 0);
> -	if (unlikely(err))
> -		goto err;
> -
> -	_trace_xdp_redirect(dev, xdp_prog, index);
> -	return 0;
> -err:
> -	_trace_xdp_redirect_err(dev, xdp_prog, index, err);
> -	return err;
> -}
> -
>  static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
>  			    struct bpf_map *map, struct xdp_buff *xdp)
>  {
> @@ -3529,7 +3477,7 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
>
>  void xdp_do_flush_map(void)
>  {
> -	__dev_map_flush();
> +	__dev_flush();

Hmm, maybe it's also time for s/xdp_do_flush_map/xdp_do_flush/? That would
mean driver changes, though :<

>  	__cpu_map_flush();
>  	__xsk_map_flush();
>  }
> @@ -3568,10 +3516,11 @@ void bpf_clear_redirect_map(struct bpf_map *map)
>  	}
>  }
>
> -static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
> -			       struct bpf_prog *xdp_prog, struct bpf_map *map,
> -			       struct bpf_redirect_info *ri)
> +int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
> +		    struct bpf_prog *xdp_prog)
>  {
> +	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> +	struct bpf_map *map = READ_ONCE(ri->map);
>  	u32 index = ri->tgt_index;
>  	void *fwd = ri->tgt_value;
>  	int err;
> @@ -3580,7 +3529,18 @@ static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
>  	ri->tgt_value = NULL;
>  	WRITE_ONCE(ri->map, NULL);
>
> -	err = __bpf_tx_xdp_map(dev, fwd, map, xdp);
> +	if (unlikely(!map)) {
> +		fwd = dev_get_by_index_rcu(dev_net(dev), index);
> +		if (unlikely(!fwd)) {
> +			err = -EINVAL;
> +			goto err;
> +		}
> +
> +		err = dev_xdp_enqueue(fwd, xdp, dev);
> +	} else {
> +		err = __bpf_tx_xdp_map(dev, fwd, map, xdp);
> +	}
> +
>  	if (unlikely(err))
>  		goto err;
>
> @@ -3590,18 +3550,6 @@ static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
>  	_trace_xdp_redirect_map_err(dev, xdp_prog, fwd, map, index, err);
>  	return err;
>  }
> -
> -int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
> -		    struct bpf_prog *xdp_prog)
> -{
> -	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> -	struct bpf_map *map = READ_ONCE(ri->map);
> -
> -	if (likely(map))
> -		return xdp_do_redirect_map(dev, xdp, xdp_prog, map, ri);
> -
> -	return xdp_do_redirect_slow(dev, xdp, xdp_prog, ri);
> -}
>  EXPORT_SYMBOL_GPL(xdp_do_redirect);
>
>  static int xdp_do_generic_redirect_map(struct net_device *dev,
>

^ permalink raw reply	[flat|nested] 13+ messages in thread
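The consolidated tracepoint in the quoted patch fills to_ifindex, map_id and map_index from a single (tgt, map, index) triple, where index is an ifindex for the non-map case and a map index otherwise. The sketch below is a simplified userspace model of that assignment, not the kernel macros: every model_* type and function is hypothetical and only mirrors the three conditional expressions in xdp_redirect_template and the devmap_ifindex() macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
enum model_map_type { MODEL_DEVMAP, MODEL_DEVMAP_HASH, MODEL_OTHER_MAP };

struct model_netdev { int ifindex; };
struct model_dtab_netdev { struct model_netdev *dev; };
struct model_map { enum model_map_type map_type; unsigned int id; };

/* The three fields the tracepoint records about the redirect target. */
struct model_entry {
	int to_ifindex;
	int map_id;
	int map_index;
};

/* Same idea as devmap_ifindex(): only devmap targets carry a netdev. */
static int model_devmap_ifindex(const void *tgt, const struct model_map *map)
{
	if (map->map_type == MODEL_DEVMAP ||
	    map->map_type == MODEL_DEVMAP_HASH) {
		assert(tgt != NULL);
		return ((const struct model_dtab_netdev *)tgt)->dev->ifindex;
	}
	return 0;
}

/* Same idea as the field assignments in xdp_redirect_template. */
static struct model_entry model_fill(const void *tgt,
				     const struct model_map *map,
				     unsigned int index)
{
	struct model_entry e;

	/* Without a map, "index" is the target ifindex itself. */
	e.to_ifindex = map ? model_devmap_ifindex(tgt, map) : (int)index;
	e.map_id = map ? (int)map->id : 0;
	e.map_index = map ? (int)index : 0;
	return e;
}
```

In this model a non-map redirect to ifindex 3 records to_ifindex=3 with map_id and map_index both 0, which matches the commit message's statement that the map fields are only populated if a map is present.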
* RE: [PATCH bpf-next v2 2/2] xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths
  2020-01-13 18:10 ` [PATCH bpf-next v2 2/2] xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths Toke Høiland-Jørgensen
  2020-01-15 12:16   ` Maciej Fijalkowski
@ 2020-01-15 19:43   ` John Fastabend
  1 sibling, 0 replies; 13+ messages in thread
From: John Fastabend @ 2020-01-15 19:43 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, netdev
  Cc: bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, Björn Töpel, John Fastabend

Toke Høiland-Jørgensen wrote:
> From: Toke Høiland-Jørgensen <toke@redhat.com>
>
> Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
> we can re-use the bulking for the non-map version of the bpf_redirect()
> helper. This is a simple matter of having xdp_do_redirect_slow() queue the
> frame on the bulk queue instead of sending it out with __bpf_tx_xdp().
>
> Unfortunately we can't make the bpf_redirect() helper return an error if
> the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
> have a reference to the network namespace of the ingress device at the time
> the helper is called. So we have to leave it as-is and keep the device
> lookup in xdp_do_redirect_slow().
>
> Since this leaves less reason to have the non-map redirect code in a
> separate function, we get rid of the xdp_do_redirect_slow() function
> entirely. This does lose us the tracepoint disambiguation, but fortunately
> the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint
> entry structures. This means both can contain a map index, so we can just
> amend the tracepoint definitions so we always emit the xdp_redirect(_err)
> tracepoints, but with the map ID only populated if a map is present. This
> means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep
> the definitions around in case someone is still listening for them.
>
> With this change, the performance of the xdp_redirect sample program goes
> from 5Mpps to 8.4Mpps (a 68% increase).
>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH bpf-next v2 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT
  2020-01-13 18:10 [PATCH bpf-next v2 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT Toke Høiland-Jørgensen
  2020-01-13 18:10 ` [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device Toke Høiland-Jørgensen
  2020-01-13 18:10 ` [PATCH bpf-next v2 2/2] xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths Toke Høiland-Jørgensen
@ 2020-01-14 17:47 ` Alexei Starovoitov
  2020-01-15 17:49   ` John Fastabend
  2 siblings, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2020-01-14 17:47 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Network Development, bpf, Daniel Borkmann, Alexei Starovoitov,
	David Miller, Jesper Dangaard Brouer, Björn Töpel, John Fastabend

On Mon, Jan 13, 2020 at 10:11 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Since commit 96360004b862 ("xdp: Make devmap flush_list common for all map
> instances"), devmap flushing is a global operation instead of tied to a
> particular map. This means that with a bit of refactoring, we can finally fix
> the performance delta between the bpf_redirect_map() and bpf_redirect() helper
> functions, by introducing bulking for the latter as well.
>
> This series makes this change by moving the data structure used for the bulking
> into struct net_device itself, so we can access it even when there is no
> devmap. Once this is done, moving the bpf_redirect() helper to use the bulking
> mechanism becomes quite trivial, and brings bpf_redirect() up to the same
> performance as bpf_redirect_map():
>
>                        Before:     After:
> 1 CPU:
> bpf_redirect_map:      8.4 Mpps    8.4 Mpps  (no change)
> bpf_redirect:          5.0 Mpps    8.4 Mpps  (+68%)
> 2 CPUs:
> bpf_redirect_map:     15.9 Mpps   16.1 Mpps  (+1% or ~no change)
> bpf_redirect:          9.5 Mpps   15.9 Mpps  (+67%)
>
> After this patch series, the only semantic difference between the two variants
> of the bpf() helper (apart from the absence of a map argument, obviously) is
> that the _map() variant will return an error if passed an invalid map index,
> whereas the bpf_redirect() helper will succeed, but drop packets on
> xdp_do_redirect(). This is because the helper has no reference to the calling
> netdev, so unfortunately we can't do the ifindex lookup directly in the helper.
>
> Changelog:
>
> v2:
>   - Consolidate code paths and tracepoints for map and non-map redirect variants
>     (Björn)
>   - Add performance data for 2-CPU test (Jesper)
>   - Move fields to avoid shifting cache lines in struct net_device (Eric)

John, since you commented on v1, please review this v2. Thanks!

^ permalink raw reply	[flat|nested] 13+ messages in thread
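The percentages in the quoted table follow directly from the Mpps figures. As a quick check, here is a small sketch that recomputes them; pct_gain() is a hypothetical helper written for this illustration, and it rounds to the nearest whole percent, which is an assumption about how the figures in the table were rounded.

```c
#include <assert.h>

/*
 * Percentage change from "before" to "after", rounded to the nearest
 * integer. Only meant for non-negative gains such as those in the table.
 */
static int pct_gain(double before, double after)
{
	double pct;

	assert(before > 0.0 && after >= before);
	pct = (after - before) / before * 100.0;
	return (int)(pct + 0.5);
}
```

For example, pct_gain(5.0, 8.4) reproduces the +68% reported for bpf_redirect on one CPU, and pct_gain(9.5, 15.9) the +67% reported for two CPUs.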
* Re: [PATCH bpf-next v2 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT
  2020-01-14 17:47 ` [PATCH bpf-next v2 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT Alexei Starovoitov
@ 2020-01-15 17:49   ` John Fastabend
  0 siblings, 0 replies; 13+ messages in thread
From: John Fastabend @ 2020-01-15 17:49 UTC (permalink / raw)
  To: Alexei Starovoitov, Toke Høiland-Jørgensen
  Cc: Network Development, bpf, Daniel Borkmann, Alexei Starovoitov,
	David Miller, Jesper Dangaard Brouer, Björn Töpel, John Fastabend

Alexei Starovoitov wrote:
> On Mon, Jan 13, 2020 at 10:11 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > Since commit 96360004b862 ("xdp: Make devmap flush_list common for all map
> > instances"), devmap flushing is a global operation instead of tied to a
> > particular map. This means that with a bit of refactoring, we can finally fix
> > the performance delta between the bpf_redirect_map() and bpf_redirect() helper
> > functions, by introducing bulking for the latter as well.
> >
> > This series makes this change by moving the data structure used for the bulking
> > into struct net_device itself, so we can access it even when there is no
> > devmap. Once this is done, moving the bpf_redirect() helper to use the bulking
> > mechanism becomes quite trivial, and brings bpf_redirect() up to the same
> > performance as bpf_redirect_map():
> >
> >                        Before:     After:
> > 1 CPU:
> > bpf_redirect_map:      8.4 Mpps    8.4 Mpps  (no change)
> > bpf_redirect:          5.0 Mpps    8.4 Mpps  (+68%)
> > 2 CPUs:
> > bpf_redirect_map:     15.9 Mpps   16.1 Mpps  (+1% or ~no change)
> > bpf_redirect:          9.5 Mpps   15.9 Mpps  (+67%)
> >
> > After this patch series, the only semantic difference between the two variants
> > of the bpf() helper (apart from the absence of a map argument, obviously) is
> > that the _map() variant will return an error if passed an invalid map index,
> > whereas the bpf_redirect() helper will succeed, but drop packets on
> > xdp_do_redirect(). This is because the helper has no reference to the calling
> > netdev, so unfortunately we can't do the ifindex lookup directly in the helper.
> >
> > Changelog:
> >
> > v2:
> >   - Consolidate code paths and tracepoints for map and non-map redirect variants
> >     (Björn)
> >   - Add performance data for 2-CPU test (Jesper)
> >   - Move fields to avoid shifting cache lines in struct net_device (Eric)
>
> John, since you commented on v1, please review this v2. Thanks!

Hmm, I don't think I had an initial comment, but I will review regardless ;)

^ permalink raw reply	[flat|nested] 13+ messages in thread
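The remaining semantic difference described in the quoted cover letter (the _map() variant fails in the helper, while bpf_redirect() succeeds and the frame is dropped later in xdp_do_redirect()) can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel helpers: all model_* names, the two lookup tables and the -1 helper failure value are hypothetical; only the -EINVAL result for a missing device is taken from the patch.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_NUM_IFINDEX 8
#define MODEL_MAP_SIZE 4

/* Which ifindexes exist in this toy "namespace": only 1 and 2. */
static const int model_dev_exists[MODEL_NUM_IFINDEX] = { 0, 1, 1 };

/* Toy devmap: slots 0 and 1 are populated, the rest are empty. */
static const int model_map_ifindex[MODEL_MAP_SIZE] = { 1, 2 };

struct model_redirect_info {
	int via_map;        /* set by the map variant */
	unsigned int index; /* map index or ifindex */
};

/* Map variant: the lookup happens in the helper, so it can fail early. */
static int model_redirect_map(struct model_redirect_info *ri,
			      unsigned int index)
{
	if (index >= MODEL_MAP_SIZE || !model_map_ifindex[index])
		return -1; /* the program sees the failure */
	ri->via_map = 1;
	ri->index = index;
	return 0;
}

/* Non-map variant: no netdev reference here, so nothing is validated. */
static int model_redirect(struct model_redirect_info *ri,
			  unsigned int ifindex)
{
	ri->via_map = 0;
	ri->index = ifindex;
	return 0;
}

/* Driver-side step: the ifindex lookup for the non-map case happens here. */
static int model_do_redirect(const struct model_redirect_info *ri)
{
	assert(ri);
	if (!ri->via_map &&
	    (ri->index >= MODEL_NUM_IFINDEX || !model_dev_exists[ri->index]))
		return -EINVAL; /* the frame is dropped at this point */
	return 0;
}
```

In this model, a program calling model_redirect() with a nonexistent ifindex gets a success return and only the later model_do_redirect() step reports -EINVAL, whereas model_redirect_map() with an empty slot fails immediately.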
end of thread, other threads:[~2020-01-16 13:51 UTC | newest]

Thread overview: 13+ messages
2020-01-13 18:10 [PATCH bpf-next v2 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT Toke Høiland-Jørgensen
2020-01-13 18:10 ` [PATCH bpf-next v2 1/2] xdp: Move devmap bulk queue into struct net_device Toke Høiland-Jørgensen
2020-01-15 19:45 ` John Fastabend
2020-01-15 22:22 ` Toke Høiland-Jørgensen
2020-01-15 20:17 ` Jesper Dangaard Brouer
2020-01-15 22:11 ` Toke Høiland-Jørgensen
2020-01-16 11:24 ` Jesper Dangaard Brouer
2020-01-16 13:51 ` Toke Høiland-Jørgensen
2020-01-13 18:10 ` [PATCH bpf-next v2 2/2] xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths Toke Høiland-Jørgensen
2020-01-15 12:16 ` Maciej Fijalkowski
2020-01-15 19:43 ` John Fastabend
2020-01-14 17:47 ` [PATCH bpf-next v2 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT Alexei Starovoitov
2020-01-15 17:49 ` John Fastabend