* [PATCH bpf-next 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT
@ 2020-01-10 14:22 Toke Høiland-Jørgensen
  2020-01-10 14:22 ` [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device Toke Høiland-Jørgensen
  2020-01-10 14:22 ` [PATCH bpf-next 2/2] xdp: Use bulking for non-map XDP_REDIRECT Toke Høiland-Jørgensen
  0 siblings, 2 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-10 14:22 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, Björn Töpel, John Fastabend

Since commit 96360004b862 ("xdp: Make devmap flush_list common for all map
instances"), devmap flushing is a global operation instead of being tied to
a particular map. This means that with a bit of refactoring, we can finally
fix the performance delta between the bpf_redirect_map() and bpf_redirect()
helper functions, by introducing bulking for the latter as well.

This series makes this change by moving the data structure used for the
bulking into struct net_device itself, so we can access it even when there
is no devmap. Once this is done, moving the bpf_redirect() helper to use
the bulking mechanism becomes quite trivial, and brings bpf_redirect() up
to the same performance as bpf_redirect_map():

                   Before:     After:
bpf_redirect_map:  8.4 Mpps    8.4 Mpps  (no change)
bpf_redirect:      5.0 Mpps    8.4 Mpps  (+68%)

After this patch series, the only difference in semantics between the two
variants of the redirect helper (apart from the absence of a map argument,
obviously) is that the _map() variant will return an error if passed an
invalid map index, whereas the bpf_redirect() helper will succeed, but drop
the packet in xdp_do_redirect(). This is because the helper has no
reference to the calling netdev, so unfortunately we can't do the ifindex
lookup directly in the helper.

---

Toke Høiland-Jørgensen (2):
      xdp: Move devmap bulk queue into struct net_device
      xdp: Use bulking for non-map XDP_REDIRECT

 include/linux/bpf.h        |   13 +++++-
 include/linux/netdevice.h  |    3 +
 include/trace/events/xdp.h |    2 +-
 kernel/bpf/devmap.c        |   92 ++++++++++++++++++++++----------------------
 net/core/dev.c             |    2 +
 net/core/filter.c          |   30 +-------------
 6 files changed, 66 insertions(+), 76 deletions(-)

^ permalink raw reply	[flat|nested] 13+ messages in thread
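For readers comparing the two helpers, here is a minimal sketch of how they
are used from an XDP program. The helper signatures are the upstream ones;
the map layout, the egress ifindex value and the program names are
illustrative assumptions, not taken from this series.

// SPDX-License-Identifier: GPL-2.0
/* Sketch: the two redirect variants benchmarked above. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_DEVMAP);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u32);
} tx_port SEC(".maps");

/* Map variant: always used the bulked devmap path (8.4 Mpps above). */
SEC("xdp")
int redirect_map_prog(struct xdp_md *ctx)
{
	return bpf_redirect_map(&tx_port, 0, 0);
}

/* Non-map variant: used the unbulked slow path before this series
 * (5.0 Mpps above). The ifindex is only validated later, in
 * xdp_do_redirect(), as the cover letter explains. */
SEC("xdp")
int redirect_prog(struct xdp_md *ctx)
{
	return bpf_redirect(5 /* assumed egress ifindex */, 0);
}

char _license[] SEC("license") = "GPL";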
* [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device
  2020-01-10 14:22 [PATCH bpf-next 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT Toke Høiland-Jørgensen
@ 2020-01-10 14:22 ` Toke Høiland-Jørgensen
  2020-01-10 15:03   ` Björn Töpel
  2020-01-10 16:08   ` Jesper Dangaard Brouer
  2020-01-10 14:22 ` [PATCH bpf-next 2/2] xdp: Use bulking for non-map XDP_REDIRECT Toke Høiland-Jørgensen
  1 sibling, 2 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-10 14:22 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, Björn Töpel, John Fastabend

From: Toke Høiland-Jørgensen <toke@redhat.com>

Commit 96360004b862 ("xdp: Make devmap flush_list common for all map
instances") changed devmap flushing to be a global operation instead of a
per-map operation. However, the queue structure used for bulking was still
allocated as part of the containing map.

This patch moves the devmap bulk queue into struct net_device. The
motivation for this is reusing it for the non-map variant of XDP_REDIRECT,
which will be changed in a subsequent commit.

We defer the actual allocation of the bulk queue structure until the
NETDEV_REGISTER notification in devmap.c. This makes it possible to check
for ndo_xdp_xmit support before allocating the structure, which is not
possible at the time struct net_device is allocated. However, we keep the
freeing in free_netdev() to avoid adding another RCU callback on
NETDEV_UNREGISTER.

Because of this change, we lose the reference back to the map that
originated the redirect, so change the tracepoint to always return 0 as the
map ID and index. Otherwise no functional change is intended with this
patch.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/netdevice.h  |  3 ++
 include/trace/events/xdp.h |  2 +-
 kernel/bpf/devmap.c        | 61 ++++++++++++++++++--------------------------
 net/core/dev.c             |  2 +
 4 files changed, 31 insertions(+), 37 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2741aa35bec6..1b2bc2a7522e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -876,6 +876,7 @@ enum bpf_netdev_command {
 struct bpf_prog_offload_ops;
 struct netlink_ext_ack;
 struct xdp_umem;
+struct xdp_dev_bulk_queue;
 
 struct netdev_bpf {
 	enum bpf_netdev_command command;
@@ -1993,6 +1994,8 @@ struct net_device {
 	spinlock_t		tx_global_lock;
 	int			watchdog_timeo;
 
+	struct xdp_dev_bulk_queue __percpu *xdp_bulkq;
+
 #ifdef CONFIG_XPS
 	struct xps_dev_maps __rcu *xps_cpus_map;
 	struct xps_dev_maps __rcu *xps_rxqs_map;
diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index a7378bcd9928..72bad13d4a3c 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -278,7 +278,7 @@ TRACE_EVENT(xdp_devmap_xmit,
 	),
 
 	TP_fast_assign(
-		__entry->map_id = map->id;
+		__entry->map_id = map ? map->id : 0;
 		__entry->act = XDP_REDIRECT;
 		__entry->map_index = map_index;
 		__entry->drops = drops;
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index da9c832fc5c8..bcb05cb6b728 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -53,13 +53,11 @@
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
 
 #define DEV_MAP_BULK_SIZE 16
-struct bpf_dtab_netdev;
-
-struct xdp_bulk_queue {
+struct xdp_dev_bulk_queue {
 	struct xdp_frame *q[DEV_MAP_BULK_SIZE];
 	struct list_head flush_node;
+	struct net_device *dev;
 	struct net_device *dev_rx;
-	struct bpf_dtab_netdev *obj;
 	unsigned int count;
 };
 
@@ -67,9 +65,8 @@ struct bpf_dtab_netdev {
 	struct net_device *dev; /* must be first member, due to tracepoint */
 	struct hlist_node index_hlist;
 	struct bpf_dtab *dtab;
-	struct xdp_bulk_queue __percpu *bulkq;
 	struct rcu_head rcu;
-	unsigned int idx; /* keep track of map index for tracepoint */
+	unsigned int idx;
 };
 
 struct bpf_dtab {
@@ -219,7 +216,6 @@ static void dev_map_free(struct bpf_map *map)
 
 		hlist_for_each_entry_safe(dev, next, head, index_hlist) {
 			hlist_del_rcu(&dev->index_hlist);
-			free_percpu(dev->bulkq);
 			dev_put(dev->dev);
 			kfree(dev);
 		}
@@ -234,7 +230,6 @@ static void dev_map_free(struct bpf_map *map)
 			if (!dev)
 				continue;
 
-			free_percpu(dev->bulkq);
 			dev_put(dev->dev);
 			kfree(dev);
 		}
@@ -320,10 +315,9 @@ static int dev_map_hash_get_next_key(struct bpf_map *map, void *key,
 	return -ENOENT;
 }
 
-static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
+static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
 {
-	struct bpf_dtab_netdev *obj = bq->obj;
-	struct net_device *dev = obj->dev;
+	struct net_device *dev = bq->dev;
 	int sent = 0, drops = 0, err = 0;
 	int i;
 
@@ -346,8 +340,7 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
 out:
 	bq->count = 0;
 
-	trace_xdp_devmap_xmit(&obj->dtab->map, obj->idx,
-			      sent, drops, bq->dev_rx, dev, err);
+	trace_xdp_devmap_xmit(NULL, 0, sent, drops, bq->dev_rx, dev, err);
 	bq->dev_rx = NULL;
 	__list_del_clearprev(&bq->flush_node);
 	return 0;
@@ -374,7 +367,7 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
 void __dev_map_flush(void)
 {
 	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
-	struct xdp_bulk_queue *bq, *tmp;
+	struct xdp_dev_bulk_queue *bq, *tmp;
 
 	rcu_read_lock();
 	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
@@ -401,12 +394,12 @@ struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
 /* Runs under RCU-read-side, plus in softirq under NAPI protection.
  * Thus, safe percpu variable access.
  */
-static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf,
+static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
 		      struct net_device *dev_rx)
 
 {
 	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
-	struct xdp_bulk_queue *bq = this_cpu_ptr(obj->bulkq);
+	struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq);
 
 	if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
 		bq_xmit_all(bq, 0);
@@ -444,7 +437,7 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
 
-	return bq_enqueue(dst, xdpf, dev_rx);
+	return bq_enqueue(dev, xdpf, dev_rx);
 }
 
 int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
@@ -483,7 +476,6 @@ static void __dev_map_entry_free(struct rcu_head *rcu)
 	struct bpf_dtab_netdev *dev;
 
 	dev = container_of(rcu, struct bpf_dtab_netdev, rcu);
-	free_percpu(dev->bulkq);
 	dev_put(dev->dev);
 	kfree(dev);
 }
@@ -538,30 +530,14 @@ static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
 						    u32 ifindex,
 						    unsigned int idx)
 {
-	gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN;
 	struct bpf_dtab_netdev *dev;
-	struct xdp_bulk_queue *bq;
-	int cpu;
 
-	dev = kmalloc_node(sizeof(*dev), gfp, dtab->map.numa_node);
+	dev = kmalloc_node(sizeof(*dev), GFP_ATOMIC | __GFP_NOWARN, dtab->map.numa_node);
 	if (!dev)
 		return ERR_PTR(-ENOMEM);
 
-	dev->bulkq = __alloc_percpu_gfp(sizeof(*dev->bulkq),
-					sizeof(void *), gfp);
-	if (!dev->bulkq) {
-		kfree(dev);
-		return ERR_PTR(-ENOMEM);
-	}
-
-	for_each_possible_cpu(cpu) {
-		bq = per_cpu_ptr(dev->bulkq, cpu);
-		bq->obj = dev;
-	}
-
 	dev->dev = dev_get_by_index(net, ifindex);
 	if (!dev->dev) {
-		free_percpu(dev->bulkq);
 		kfree(dev);
 		return ERR_PTR(-EINVAL);
 	}
@@ -721,9 +697,22 @@ static int dev_map_notification(struct notifier_block *notifier,
 {
 	struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
 	struct bpf_dtab *dtab;
-	int i;
+	int i, cpu;
 
 	switch (event) {
+	case NETDEV_REGISTER:
+		if (!netdev->netdev_ops->ndo_xdp_xmit || netdev->xdp_bulkq)
+			break;
+
+		/* will be freed in free_netdev() */
+		netdev->xdp_bulkq = __alloc_percpu_gfp(sizeof(struct xdp_dev_bulk_queue),
+						       sizeof(void *), GFP_ATOMIC);
+		if (!netdev->xdp_bulkq)
+			return NOTIFY_BAD;
+
+		for_each_possible_cpu(cpu)
+			per_cpu_ptr(netdev->xdp_bulkq, cpu)->dev = netdev;
+		break;
 	case NETDEV_UNREGISTER:
 		/* This rcu_read_lock/unlock pair is needed because
 		 * dev_map_list is an RCU list AND to ensure a delete
diff --git a/net/core/dev.c b/net/core/dev.c
index d99f88c58636..e7802a41ae7f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9847,6 +9847,8 @@ void free_netdev(struct net_device *dev)
 
 	free_percpu(dev->pcpu_refcnt);
 	dev->pcpu_refcnt = NULL;
+	free_percpu(dev->xdp_bulkq);
+	dev->xdp_bulkq = NULL;
 
 	netdev_unregister_lockdep_key(dev);
 
^ permalink raw reply related	[flat|nested] 13+ messages in thread
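As a side note for anyone tracing this code: after this patch the
xdp_devmap_xmit tracepoint always reports map ID 0 for bulked
transmissions. A quick way to watch it is sketched below; this assumes
bpftrace is available, and the field name matches the TP_fast_assign
quoted in the patch:

# Count bulk flushes per reported map ID; expect only 0 after this patch.
bpftrace -e 'tracepoint:xdp:xdp_devmap_xmit { @flushes[args->map_id] = count(); }'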
* Re: [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device
  2020-01-10 14:22 ` [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device Toke Høiland-Jørgensen
@ 2020-01-10 15:03   ` Björn Töpel
  2020-01-10 15:26     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 13+ messages in thread
From: Björn Töpel @ 2020-01-10 15:03 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, John Fastabend

On Fri, 10 Jan 2020 at 15:22, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> From: Toke Høiland-Jørgensen <toke@redhat.com>
>
> Commit 96360004b862 ("xdp: Make devmap flush_list common for all map
> instances") changed devmap flushing to be a global operation instead of a
> per-map operation. However, the queue structure used for bulking was still
> allocated as part of the containing map.
>
> This patch moves the devmap bulk queue into struct net_device. The
> motivation for this is reusing it for the non-map variant of XDP_REDIRECT,
> which will be changed in a subsequent commit.
>
> We defer the actual allocation of the bulk queue structure until the
> NETDEV_REGISTER notification in devmap.c. This makes it possible to check
> for ndo_xdp_xmit support before allocating the structure, which is not
> possible at the time struct net_device is allocated. However, we keep the
> freeing in free_netdev() to avoid adding another RCU callback on
> NETDEV_UNREGISTER.
>
> Because of this change, we lose the reference back to the map that
> originated the redirect, so change the tracepoint to always return 0 as the
> map ID and index. Otherwise no functional change is intended with this
> patch.
>

Nice work, Toke! I'm getting some checkpatch warnings (>80 char lines),
other than that:

Acked-by: Björn Töpel <bjorn.topel@intel.com>

> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
[... rest of the patch quoted verbatim, trimmed ...]

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device
  2020-01-10 15:03   ` Björn Töpel
@ 2020-01-10 15:26     ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-10 15:26 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, John Fastabend

Björn Töpel <bjorn.topel@gmail.com> writes:

> On Fri, 10 Jan 2020 at 15:22, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>
>> Commit 96360004b862 ("xdp: Make devmap flush_list common for all map
>> instances") changed devmap flushing to be a global operation instead of a
>> per-map operation. However, the queue structure used for bulking was still
>> allocated as part of the containing map.
[...]
>
> Nice work, Toke!

Thanks!

> I'm getting some checkpatch warnings (>80 char lines), other than
> that:

Oh, right, totally forgot to run checkpatch; will fix and respin :)

-Toke

^ permalink raw reply	[flat|nested] 13+ messages in thread
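For reference, the warnings mentioned above come from the kernel's own
style checker and can be reproduced before respinning. The invocation
below is a sketch; the patch file names are illustrative:

$ ./scripts/checkpatch.pl --strict 0001-xdp-*.patch 0002-xdp-*.patch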
* Re: [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device
  2020-01-10 14:22 ` [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device Toke Høiland-Jørgensen
  2020-01-10 15:03   ` Björn Töpel
@ 2020-01-10 16:08   ` Jesper Dangaard Brouer
  2020-01-10 22:34     ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2020-01-10 16:08 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Björn Töpel, John Fastabend, brouer

On Fri, 10 Jan 2020 15:22:02 +0100
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 2741aa35bec6..1b2bc2a7522e 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
[...]
> @@ -1993,6 +1994,8 @@ struct net_device {
> 	spinlock_t		tx_global_lock;
> 	int			watchdog_timeo;
> 
> +	struct xdp_dev_bulk_queue __percpu *xdp_bulkq;
> +
> #ifdef CONFIG_XPS
> 	struct xps_dev_maps __rcu *xps_cpus_map;
> 	struct xps_dev_maps __rcu *xps_rxqs_map;

We need to check that the cache-line for this location in struct
net_device is not getting updated (write operation) from different CPUs.

The test you ran was a single-queue, single-CPU test, which will not
show any regression for that case.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device
  2020-01-10 16:08   ` Jesper Dangaard Brouer
@ 2020-01-10 22:34     ` Toke Høiland-Jørgensen
  2020-01-10 22:46       ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-10 22:34 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Björn Töpel, John Fastabend, brouer

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> On Fri, 10 Jan 2020 15:22:02 +0100
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 2741aa35bec6..1b2bc2a7522e 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
> [...]
>> @@ -1993,6 +1994,8 @@ struct net_device {
>> 	spinlock_t		tx_global_lock;
>> 	int			watchdog_timeo;
>> 
>> +	struct xdp_dev_bulk_queue __percpu *xdp_bulkq;
>> +
>> #ifdef CONFIG_XPS
>> 	struct xps_dev_maps __rcu *xps_cpus_map;
>> 	struct xps_dev_maps __rcu *xps_rxqs_map;
>
> We need to check that the cache-line for this location in struct
> net_device is not getting updated (write operation) from different CPUs.
>
> The test you ran was a single-queue, single-CPU test, which will not
> show any regression for that case.

Well, pahole says:

	/* --- cacheline 14 boundary (896 bytes) --- */
	struct netdev_queue *	_tx __attribute__((__aligned__(64))); /*  896    8 */
	unsigned int		num_tx_queues;		/*  904    4 */
	unsigned int		real_num_tx_queues;	/*  908    4 */
	struct Qdisc *		qdisc;			/*  912    8 */
	struct hlist_head	qdisc_hash[16];		/*  920  128 */
	/* --- cacheline 16 boundary (1024 bytes) was 24 bytes ago --- */
	unsigned int		tx_queue_len;		/* 1048    4 */
	spinlock_t		tx_global_lock;		/* 1052    4 */
	int			watchdog_timeo;		/* 1056    4 */

	/* XXX 4 bytes hole, try to pack */

	struct xdp_dev_bulk_queue * xdp_bulkq;		/* 1064    8 */
	struct xps_dev_maps *	xps_cpus_map;		/* 1072    8 */
	struct xps_dev_maps *	xps_rxqs_map;		/* 1080    8 */
	/* --- cacheline 17 boundary (1088 bytes) --- */

Of those, tx_queue_len is the max queue len (so only set on init),
tx_global_lock is not used by multi-queue devices, watchdog_timeo also
seems to be a static value that's set on init, and the xps* pointers
also only seem to be set once on init. So I think we're fine?

I can run a multi-CPU test just to be sure, but I really don't see which
of those fields might be updated on TX...

-Toke

^ permalink raw reply	[flat|nested] 13+ messages in thread
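The layout dump quoted above can be regenerated against any kernel build
with debug info; the vmlinux path below is an assumption about the local
build tree:

$ pahole -C net_device vmlinux | less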
* Re: [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device
  2020-01-10 22:34     ` Toke Høiland-Jørgensen
@ 2020-01-10 22:46       ` Eric Dumazet
  2020-01-10 23:16         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2020-01-10 22:46 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Jesper Dangaard Brouer
  Cc: netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Björn Töpel, John Fastabend

On 1/10/20 2:34 PM, Toke Høiland-Jørgensen wrote:
> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>
>> On Fri, 10 Jan 2020 15:22:02 +0100
>> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
[...]
>>
>> We need to check that the cache-line for this location in struct
>> net_device is not getting updated (write operation) from different CPUs.
>>
>> The test you ran was a single-queue, single-CPU test, which will not
>> show any regression for that case.
>
> Well, pahole says:
>
> 	/* --- cacheline 14 boundary (896 bytes) --- */
> 	struct netdev_queue *	_tx __attribute__((__aligned__(64))); /*  896    8 */
> 	unsigned int		num_tx_queues;		/*  904    4 */
> 	unsigned int		real_num_tx_queues;	/*  908    4 */
> 	struct Qdisc *		qdisc;			/*  912    8 */
> 	struct hlist_head	qdisc_hash[16];		/*  920  128 */
> 	/* --- cacheline 16 boundary (1024 bytes) was 24 bytes ago --- */
> 	unsigned int		tx_queue_len;		/* 1048    4 */
> 	spinlock_t		tx_global_lock;		/* 1052    4 */
> 	int			watchdog_timeo;		/* 1056    4 */
>
> 	/* XXX 4 bytes hole, try to pack */
>
> 	struct xdp_dev_bulk_queue * xdp_bulkq;		/* 1064    8 */
> 	struct xps_dev_maps *	xps_cpus_map;		/* 1072    8 */
> 	struct xps_dev_maps *	xps_rxqs_map;		/* 1080    8 */
> 	/* --- cacheline 17 boundary (1088 bytes) --- */
>
> Of those, tx_queue_len is the max queue len (so only set on init),
> tx_global_lock is not used by multi-queue devices, watchdog_timeo also
> seems to be a static value that's set on init, and the xps* pointers
> also only seem to be set once on init. So I think we're fine?
>
> I can run a multi-CPU test just to be sure, but I really don't see which
> of those fields might be updated on TX...
>

Note that another interesting field is miniq_egress; your patch
moves it to another cache line.

We probably should move the qdisc_hash array elsewhere.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device
  2020-01-10 22:46       ` Eric Dumazet
@ 2020-01-10 23:16         ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-10 23:16 UTC (permalink / raw)
  To: Eric Dumazet, Jesper Dangaard Brouer
  Cc: netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Björn Töpel, John Fastabend

Eric Dumazet <eric.dumazet@gmail.com> writes:

> On 1/10/20 2:34 PM, Toke Høiland-Jørgensen wrote:
>> Well, pahole says:
>>
[...]
>> 	int			watchdog_timeo;		/* 1056    4 */
>>
>> 	/* XXX 4 bytes hole, try to pack */
>>
>> 	struct xdp_dev_bulk_queue * xdp_bulkq;		/* 1064    8 */
[...]
>
> Note that another interesting field is miniq_egress; your patch
> moves it to another cache line.

Hmm, since there's that 4-byte hole, I guess we could just move
watchdog_timeo down to fix that. Any reason that's a bad idea?

> We probably should move the qdisc_hash array elsewhere.

You certainly won't hear me object to that :)

-Toke

^ permalink raw reply	[flat|nested] 13+ messages in thread
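To make the 4-byte hole discussion concrete, here is a small standalone
demo (not a kernel patch) that mimics the member order around xdp_bulkq
with stand-in types and prints the offsets; spinlock_t is modeled as a
4-byte integer, and real struct net_device offsets will of course differ:

#include <stdio.h>
#include <stddef.h>

struct current_order {
	unsigned int tx_queue_len;
	int tx_global_lock;	/* stand-in for spinlock_t */
	int watchdog_timeo;
	void *xdp_bulkq;	/* 4-byte hole before this on 64-bit */
};

struct reordered {
	unsigned int tx_queue_len;
	int tx_global_lock;
	void *xdp_bulkq;	/* naturally aligned, no hole */
	int watchdog_timeo;	/* moved down; in struct net_device the
				 * following members would pack after it */
};

int main(void)
{
	printf("current:   xdp_bulkq at offset %zu (hole before it)\n",
	       offsetof(struct current_order, xdp_bulkq));
	printf("reordered: xdp_bulkq at offset %zu (packed)\n",
	       offsetof(struct reordered, xdp_bulkq));
	return 0;
}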
* [PATCH bpf-next 2/2] xdp: Use bulking for non-map XDP_REDIRECT
  2020-01-10 14:22 [PATCH bpf-next 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT Toke Høiland-Jørgensen
  2020-01-10 14:22 ` [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device Toke Høiland-Jørgensen
@ 2020-01-10 14:22 ` Toke Høiland-Jørgensen
  2020-01-10 15:15   ` Björn Töpel
  1 sibling, 1 reply; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-10 14:22 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, Björn Töpel, John Fastabend

From: Toke Høiland-Jørgensen <toke@redhat.com>

Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
we can re-use the bulking for the non-map version of the bpf_redirect()
helper. This is a simple matter of having xdp_do_redirect_slow() queue the
frame on the bulk queue instead of sending it out with __bpf_tx_xdp().

Unfortunately we can't make the bpf_redirect() helper return an error if
the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
have a reference to the network namespace of the ingress device at the
time the helper is called. So we have to leave it as-is and keep the
device lookup in xdp_do_redirect_slow().

With this change, the performance of the xdp_redirect sample program goes
from 5.0 Mpps to 8.4 Mpps (a 68% increase).

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/bpf.h |   13 +++++++++++--
 kernel/bpf/devmap.c |   31 ++++++++++++++++++++++---------
 net/core/filter.c   |   30 ++----------------------------
 3 files changed, 35 insertions(+), 39 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b14e51d56a82..25c050202536 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -962,7 +962,9 @@ struct sk_buff;
 
 struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
 struct bpf_dtab_netdev *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key);
-void __dev_map_flush(void);
+void __dev_flush(void);
+int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+		    struct net_device *dev_rx);
 int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 		    struct net_device *dev_rx);
 int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
@@ -1071,13 +1073,20 @@ static inline struct net_device *__dev_map_hash_lookup_elem(struct bpf_map *map
 	return NULL;
 }
 
-static inline void __dev_map_flush(void)
+static inline void __dev_flush(void)
 {
 }
 
 struct xdp_buff;
 struct bpf_dtab_netdev;
 
+static inline
+int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+		    struct net_device *dev_rx)
+{
+	return 0;
+}
+
 static inline
 int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 		    struct net_device *dev_rx)
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index bcb05cb6b728..adbb82770d02 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -81,7 +81,7 @@ struct bpf_dtab {
 	u32 n_buckets;
 };
 
-static DEFINE_PER_CPU(struct list_head, dev_map_flush_list);
+static DEFINE_PER_CPU(struct list_head, dev_flush_list);
 static DEFINE_SPINLOCK(dev_map_lock);
 static LIST_HEAD(dev_map_list);
 
@@ -357,16 +357,16 @@ static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
 	goto out;
 }
 
-/* __dev_map_flush is called from xdp_do_flush_map() which _must_ be signaled
+/* __dev_flush is called from xdp_do_flush_map() which _must_ be signaled
 * from the driver before returning from its napi->poll() routine. The poll()
 * routine is called either from busy_poll context or net_rx_action signaled
 * from NET_RX_SOFTIRQ. Either way the poll routine must complete before the
 * net device can be torn down. On devmap tear down we ensure the flush list
 * is empty before completing to ensure all flush operations have completed.
 */
-void __dev_map_flush(void)
+void __dev_flush(void)
 {
-	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
+	struct list_head *flush_list = this_cpu_ptr(&dev_flush_list);
 	struct xdp_dev_bulk_queue *bq, *tmp;
 
 	rcu_read_lock();
@@ -398,7 +398,7 @@ static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
 		      struct net_device *dev_rx)
 
 {
-	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
+	struct list_head *flush_list = this_cpu_ptr(&dev_flush_list);
 	struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq);
 
 	if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
@@ -419,10 +419,9 @@ static int bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
 	return 0;
 }
 
-int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
-		    struct net_device *dev_rx)
+static inline int _xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+			       struct net_device *dev_rx)
 {
-	struct net_device *dev = dst->dev;
 	struct xdp_frame *xdpf;
 	int err;
 
@@ -440,6 +439,20 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 	return bq_enqueue(dev, xdpf, dev_rx);
 }
 
+int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+		    struct net_device *dev_rx)
+{
+	return _xdp_enqueue(dev, xdp, dev_rx);
+}
+
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+		    struct net_device *dev_rx)
+{
+	struct net_device *dev = dst->dev;
+
+	return _xdp_enqueue(dev, xdp, dev_rx);
+}
+
 int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
 			     struct bpf_prog *xdp_prog)
 {
@@ -760,7 +773,7 @@ static int __init dev_map_init(void)
 	register_netdevice_notifier(&dev_map_notifier);
 
 	for_each_possible_cpu(cpu)
-		INIT_LIST_HEAD(&per_cpu(dev_map_flush_list, cpu));
+		INIT_LIST_HEAD(&per_cpu(dev_flush_list, cpu));
 	return 0;
 }
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 42fd17c48c5f..550488162fe1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3458,32 +3458,6 @@ static const struct bpf_func_proto bpf_xdp_adjust_meta_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
-static int __bpf_tx_xdp(struct net_device *dev,
-			struct bpf_map *map,
-			struct xdp_buff *xdp,
-			u32 index)
-{
-	struct xdp_frame *xdpf;
-	int err, sent;
-
-	if (!dev->netdev_ops->ndo_xdp_xmit) {
-		return -EOPNOTSUPP;
-	}
-
-	err = xdp_ok_fwd_dev(dev, xdp->data_end - xdp->data);
-	if (unlikely(err))
-		return err;
-
-	xdpf = convert_to_xdp_frame(xdp);
-	if (unlikely(!xdpf))
-		return -EOVERFLOW;
-
-	sent = dev->netdev_ops->ndo_xdp_xmit(dev, 1, &xdpf, XDP_XMIT_FLUSH);
-	if (sent <= 0)
-		return sent;
-	return 0;
-}
-
 static noinline int
 xdp_do_redirect_slow(struct net_device *dev, struct xdp_buff *xdp,
 		     struct bpf_prog *xdp_prog, struct bpf_redirect_info *ri)
@@ -3499,7 +3473,7 @@ xdp_do_redirect_slow(struct net_device *dev, struct xdp_buff *xdp,
 		goto err;
 	}
 
-	err = __bpf_tx_xdp(fwd, NULL, xdp, 0);
+	err = dev_xdp_enqueue(fwd, xdp, dev);
 	if (unlikely(err))
 		goto err;
 
@@ -3529,7 +3503,7 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
 
 void xdp_do_flush_map(void)
 {
-	__dev_map_flush();
+	__dev_flush();
 	__cpu_map_flush();
 	__xsk_map_flush();
 }

^ permalink raw reply related	[flat|nested] 13+ messages in thread
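For anyone wanting to reproduce the 5.0 to 8.4 Mpps numbers, a rough
recipe using the in-tree sample is sketched below; the interface indexes,
generator interface and addresses are assumptions about the test setup:

# On the device under test: redirect from ifindex 3 to ifindex 4.
$ cd samples/bpf && make
$ ./xdp_redirect 3 4

# On the traffic generator, e.g. with the in-tree pktgen helper:
$ ./samples/pktgen/pktgen_sample03_burst_single_flow.sh -i eth1 \
	-d 198.18.0.2 -m aa:bb:cc:dd:ee:ff -t 4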
* Re: [PATCH bpf-next 2/2] xdp: Use bulking for non-map XDP_REDIRECT
  2020-01-10 14:22 ` [PATCH bpf-next 2/2] xdp: Use bulking for non-map XDP_REDIRECT Toke Høiland-Jørgensen
@ 2020-01-10 15:15   ` Björn Töpel
  2020-01-10 15:30     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 13+ messages in thread
From: Björn Töpel @ 2020-01-10 15:15 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, John Fastabend

On Fri, 10 Jan 2020 at 15:22, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> From: Toke Høiland-Jørgensen <toke@redhat.com>
>
> Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
> we can re-use the bulking for the non-map version of the bpf_redirect()
> helper. This is a simple matter of having xdp_do_redirect_slow() queue the
> frame on the bulk queue instead of sending it out with __bpf_tx_xdp().
>
> Unfortunately we can't make the bpf_redirect() helper return an error if
> the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
> have a reference to the network namespace of the ingress device at the
> time the helper is called. So we have to leave it as-is and keep the
> device lookup in xdp_do_redirect_slow().
>
> With this change, the performance of the xdp_redirect sample program goes
> from 5.0 Mpps to 8.4 Mpps (a 68% increase).
>

After these changes, does the noinline (commit 47b123ed9e99 ("xdp:
split code for map vs non-map redirect")) still make sense?

> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
[...]
> +int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
> +		    struct net_device *dev_rx)
> +{
> +	return _xdp_enqueue(dev, xdp, dev_rx);
> +}
> +

dev_xdp_enqueue and dev_map_enqueue are *very* similar. Can these be
combined, and maybe xdp_do_redirect_slow() folded into
xdp_do_redirect_map()? OTOH the tracepoints are different, so maybe
combining the two functions will be messy... It's only that with your
changes the map/ifindex redirect paths are very similar. Just an idea,
might be messy. :-P

> +int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
> +		    struct net_device *dev_rx)
> +{
> +	struct net_device *dev = dst->dev;
> +
> +	return _xdp_enqueue(dev, xdp, dev_rx);
> +}
> +
[...]

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH bpf-next 2/2] xdp: Use bulking for non-map XDP_REDIRECT
  2020-01-10 15:15   ` Björn Töpel
@ 2020-01-10 15:30     ` Toke Høiland-Jørgensen
  2020-01-10 15:54       ` Björn Töpel
  0 siblings, 1 reply; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-10 15:30 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, John Fastabend

Björn Töpel <bjorn.topel@gmail.com> writes:

> On Fri, 10 Jan 2020 at 15:22, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>
>> Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
>> we can re-use the bulking for the non-map version of the bpf_redirect()
>> helper.
[...]
>
> After these changes, does the noinline (commit 47b123ed9e99 ("xdp:
> split code for map vs non-map redirect")) still make sense?

Hmm, good question. The two code paths are certainly close to one
another; and I guess they could be consolidated further.

The best case would be if we had a way to look up the ifindex directly
in the helper. Do you know if there's a way to get the current net
namespace from the helper? Can we use current->nsproxy->net_ns in that
context?

If we can, and if we don't mind merging the two different tracepoints,
the xdp_do_redirect() function could be made quite a bit leaner...

-Toke

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH bpf-next 2/2] xdp: Use bulking for non-map XDP_REDIRECT
  2020-01-10 15:30     ` Toke Høiland-Jørgensen
@ 2020-01-10 15:54       ` Björn Töpel
  2020-01-10 15:57         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 13+ messages in thread
From: Björn Töpel @ 2020-01-10 15:54 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, John Fastabend

On Fri, 10 Jan 2020 at 16:30, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Björn Töpel <bjorn.topel@gmail.com> writes:
>
[...]
>>
>> After these changes, does the noinline (commit 47b123ed9e99 ("xdp:
>> split code for map vs non-map redirect")) still make sense?
>
> Hmm, good question. The two code paths are certainly close to one
> another; and I guess they could be consolidated further.
>
> The best case would be if we had a way to look up the ifindex directly
> in the helper. Do you know if there's a way to get the current net
> namespace from the helper? Can we use current->nsproxy->net_ns in that
> context?
>

Nope, interrupt context. :-( Another (ugly) way is adding a netns
member to bpf_redirect_info, which would have to be populated by the
driver (driver changes everywhere -- ick). So no.

(And *if* one would go the route of changing all drivers, I think the
percpu bpf_redirect_info should be replaced by a context that is
passed from the driver to the XDP program execution and
xdp_do_redirect/flush. But that's a much bigger patch. :-))


Björn

> If we can, and if we don't mind merging the two different tracepoints,
> the xdp_do_redirect() function could be made quite a bit leaner...
>
> -Toke
>

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH bpf-next 2/2] xdp: Use bulking for non-map XDP_REDIRECT
  2020-01-10 15:54       ` Björn Töpel
@ 2020-01-10 15:57         ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-10 15:57 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Netdev, bpf, Daniel Borkmann, Alexei Starovoitov, David Miller,
	Jesper Dangaard Brouer, John Fastabend

Björn Töpel <bjorn.topel@gmail.com> writes:

> On Fri, 10 Jan 2020 at 16:30, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Björn Töpel <bjorn.topel@gmail.com> writes:
>>
> [...]
>>>
>>> After these changes, does the noinline (commit 47b123ed9e99 ("xdp:
>>> split code for map vs non-map redirect")) still make sense?
>>
>> Hmm, good question. The two code paths are certainly close to one
>> another; and I guess they could be consolidated further.
>>
>> The best case would be if we had a way to look up the ifindex directly
>> in the helper. Do you know if there's a way to get the current net
>> namespace from the helper? Can we use current->nsproxy->net_ns in that
>> context?
>>
>
> Nope, interrupt context. :-( Another (ugly) way is adding a netns
> member to bpf_redirect_info, which would have to be populated by the
> driver (driver changes everywhere -- ick). So no.

Yup, that's what I thought. OK, too bad; I'll see what other
consolidation I can do with the current code, then.

> (And *if* one would go the route of changing all drivers, I think the
> percpu bpf_redirect_info should be replaced by a context that is
> passed from the driver to the XDP program execution and
> xdp_do_redirect/flush. But that's a much bigger patch. :-))

Yeah, let's leave that until the next time we figure out we have to
change all the drivers, then ;)

-Toke

^ permalink raw reply	[flat|nested] 13+ messages in thread
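The constraint discussed above is visible in the helper itself: running
in softirq context with no netns reference, all bpf_redirect() can do is
stash the target ifindex in the per-CPU bpf_redirect_info and leave the
device lookup (and thus any error reporting) to xdp_do_redirect(). Below
is a paraphrased sketch of the upstream helper, not code from this
series; field names may differ slightly between kernel versions:

/* Paraphrased from net/core/filter.c; a sketch, not the posted code. */
BPF_CALL_2(bpf_xdp_redirect, u32, ifindex, u64, flags)
{
	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);

	if (unlikely(flags))
		return XDP_ABORTED;

	/* No netns is reachable here, so the ifindex cannot be
	 * validated; xdp_do_redirect() does the lookup later. */
	ri->flags = flags;
	ri->tgt_index = ifindex;

	return XDP_REDIRECT;
}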
end of thread, other threads:[~2020-01-10 23:16 UTC | newest]

Thread overview: 13+ messages
2020-01-10 14:22 [PATCH bpf-next 0/2] xdp: Introduce bulking for non-map XDP_REDIRECT Toke Høiland-Jørgensen
2020-01-10 14:22 ` [PATCH bpf-next 1/2] xdp: Move devmap bulk queue into struct net_device Toke Høiland-Jørgensen
2020-01-10 15:03   ` Björn Töpel
2020-01-10 15:26     ` Toke Høiland-Jørgensen
2020-01-10 16:08   ` Jesper Dangaard Brouer
2020-01-10 22:34     ` Toke Høiland-Jørgensen
2020-01-10 22:46       ` Eric Dumazet
2020-01-10 23:16         ` Toke Høiland-Jørgensen
2020-01-10 14:22 ` [PATCH bpf-next 2/2] xdp: Use bulking for non-map XDP_REDIRECT Toke Høiland-Jørgensen
2020-01-10 15:15   ` Björn Töpel
2020-01-10 15:30     ` Toke Høiland-Jørgensen
2020-01-10 15:54       ` Björn Töpel
2020-01-10 15:57         ` Toke Høiland-Jørgensen