* [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps
@ 2019-12-19  6:09 Björn Töpel
  2019-12-19  6:09 ` [PATCH bpf-next v2 1/8] xdp: simplify devmap cleanup Björn Töpel
                   ` (9 more replies)
  0 siblings, 10 replies; 36+ messages in thread
From: Björn Töpel @ 2019-12-19  6:09 UTC (permalink / raw)
  To: netdev, ast, daniel
  Cc: Björn Töpel, bpf, davem, jakub.kicinski, hawk,
	john.fastabend, magnus.karlsson, jonathan.lemon

This series aims to simplify the XDP maps and
xdp_do_redirect_map()/xdp_do_flush_map(), and to crank out some more
performance from XDP_REDIRECT scenarios.

The first part of the series simplifies all XDP_REDIRECT-capable maps,
so that __XXX_flush_map() does not require the map parameter, by
moving the flush list from the map to global scope.

As a result, the map_to_flush member of struct bpf_redirect_info, and
its corresponding logic, can be removed.

Simpler code, and more performance, since per-packet checks/code are
moved to the flush stage.
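
As a rough sketch of the pattern change (simplified from the actual
patches in this series; devmap shown, the other map types follow the
same pattern), each map goes from a per-map percpu list:

  struct bpf_dtab {
          struct bpf_map map;
          struct list_head __percpu *flush_list; /* allocated per map */
          ...
  };

to one global percpu list per map type, initialized at boot:

  static DEFINE_PER_CPU(struct list_head, dev_map_flush_list);

  static int __init dev_map_init(void)
  {
          int cpu;

          for_each_possible_cpu(cpu)
                  INIT_LIST_HEAD(&per_cpu(dev_map_flush_list, cpu));
          return 0;
  }

which is why the flush functions no longer need a map argument.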

Pre-series performance:
  $ sudo taskset -c 22 ./xdpsock -i enp134s0f0 -q 20 -n 1 -r -z
  
   sock0@enp134s0f0:20 rxdrop xdp-drv 
                  pps         pkts        1.00       
  rx              20,797,350  230,942,399
  tx              0           0          
  
  $ sudo ./xdp_redirect_cpu --dev enp134s0f0 --cpu 22 xdp_cpu_map0
  
  Running XDP/eBPF prog_name:xdp_cpu_map5_lb_hash_ip_pairs
  XDP-cpumap      CPU:to  pps            drop-pps    extra-info
  XDP-RX          20      7723038        0           0
  XDP-RX          total   7723038        0
  cpumap_kthread  total   0              0           0
  redirect_err    total   0              0
  xdp_exception   total   0              0

Post-series performance:
  $ sudo taskset -c 22 ./xdpsock -i enp134s0f0 -q 20 -n 1 -r -z

   sock0@enp134s0f0:20 rxdrop xdp-drv 
                  pps         pkts        1.00       
  rx              21,524,979  86,835,327 
  tx              0           0          
  
  $ sudo ./xdp_redirect_cpu --dev enp134s0f0 --cpu 22 xdp_cpu_map0

  Running XDP/eBPF prog_name:xdp_cpu_map5_lb_hash_ip_pairs
  XDP-cpumap      CPU:to  pps            drop-pps    extra-info
  XDP-RX          20      7840124        0           0          
  XDP-RX          total   7840124        0          
  cpumap_kthread  total   0              0           0          
  redirect_err    total   0              0          
  xdp_exception   total   0              0          
  
Results: +3.5% (xdpsock rxdrop) and +1.5% (xdp_redirect_cpu) for the
microbenchmarks.

v1->v2 [1]:
  * Removed 'unused-variable' compiler warning (Jakub)

[1] https://lore.kernel.org/bpf/20191218105400.2895-1-bjorn.topel@gmail.com/

Björn Töpel (8):
  xdp: simplify devmap cleanup
  xdp: simplify cpumap cleanup
  xdp: fix graze->grace typo in cpumap comments
  xsk: make xskmap flush_list common for all map instances
  xdp: make devmap flush_list common for all map instances
  xdp: make cpumap flush_list common for all map instances
  xdp: remove map_to_flush and map swap detection
  xdp: simplify __bpf_tx_xdp_map()

 include/linux/bpf.h    |  8 ++---
 include/linux/filter.h |  1 -
 include/net/xdp_sock.h | 11 +++---
 kernel/bpf/cpumap.c    | 76 ++++++++++++++--------------------------
 kernel/bpf/devmap.c    | 78 ++++++++++--------------------------------
 kernel/bpf/xskmap.c    | 18 ++--------
 net/core/filter.c      | 63 ++++++----------------------------
 net/xdp/xsk.c          | 17 ++++-----
 8 files changed, 75 insertions(+), 197 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v2 1/8] xdp: simplify devmap cleanup
  2019-12-19  6:09 [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps Björn Töpel
@ 2019-12-19  6:09 ` Björn Töpel
  2020-01-07 17:32   ` John Fastabend
  2019-12-19  6:10 ` [PATCH bpf-next v2 2/8] xdp: simplify cpumap cleanup Björn Töpel
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 36+ messages in thread
From: Björn Töpel @ 2019-12-19  6:09 UTC (permalink / raw)
  To: netdev, ast, daniel
  Cc: Björn Töpel, bpf, davem, jakub.kicinski, hawk,
	john.fastabend, magnus.karlsson, jonathan.lemon,
	Toke Høiland-Jørgensen

From: Björn Töpel <bjorn.topel@intel.com>

After the RCU flavor consolidation [1], call_rcu() and
synchronize_rcu() wait for preempt-disable regions (NAPI) in addition
to the read-side critical sections. As a result, the cleanup code in
devmap can be simplified:

* There is no longer a need to flush in __dev_map_entry_free, since we
  know that this has been done when the call_rcu() callback is
  triggered.

* When freeing the map, there is no need to explicitly wait for a
  flush. It's guaranteed to be done after the synchronize_rcu() call
  in dev_map_free(). The rcu_barrier() is still needed, so that the
  map is not freed prior to its elements.

[1] https://lwn.net/Articles/777036/
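
For reference, with the flush removed the callback boils down to (see
the diff below):

  static void __dev_map_entry_free(struct rcu_head *rcu)
  {
          struct bpf_dtab_netdev *dev;

          dev = container_of(rcu, struct bpf_dtab_netdev, rcu);
          free_percpu(dev->bulkq);
          dev_put(dev->dev);
          kfree(dev);
  }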

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 kernel/bpf/devmap.c | 43 +++++--------------------------------------
 1 file changed, 5 insertions(+), 38 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 3d3d61b5985b..b7595de6a91a 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -201,7 +201,7 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 static void dev_map_free(struct bpf_map *map)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
-	int i, cpu;
+	int i;
 
 	/* At this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
 	 * so the programs (can be more than one that used this map) were
@@ -221,18 +221,6 @@ static void dev_map_free(struct bpf_map *map)
 	/* Make sure prior __dev_map_entry_free() have completed. */
 	rcu_barrier();
 
-	/* To ensure all pending flush operations have completed wait for flush
-	 * list to empty on _all_ cpus.
-	 * Because the above synchronize_rcu() ensures the map is disconnected
-	 * from the program we can assume no new items will be added.
-	 */
-	for_each_online_cpu(cpu) {
-		struct list_head *flush_list = per_cpu_ptr(dtab->flush_list, cpu);
-
-		while (!list_empty(flush_list))
-			cond_resched();
-	}
-
 	if (dtab->map.map_type == BPF_MAP_TYPE_DEVMAP_HASH) {
 		for (i = 0; i < dtab->n_buckets; i++) {
 			struct bpf_dtab_netdev *dev;
@@ -345,8 +333,7 @@ static int dev_map_hash_get_next_key(struct bpf_map *map, void *key,
 	return -ENOENT;
 }
 
-static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags,
-		       bool in_napi_ctx)
+static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
 {
 	struct bpf_dtab_netdev *obj = bq->obj;
 	struct net_device *dev = obj->dev;
@@ -384,11 +371,7 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags,
 	for (i = 0; i < bq->count; i++) {
 		struct xdp_frame *xdpf = bq->q[i];
 
-		/* RX path under NAPI protection, can return frames faster */
-		if (likely(in_napi_ctx))
-			xdp_return_frame_rx_napi(xdpf);
-		else
-			xdp_return_frame(xdpf);
+		xdp_return_frame_rx_napi(xdpf);
 		drops++;
 	}
 	goto out;
@@ -409,7 +392,7 @@ void __dev_map_flush(struct bpf_map *map)
 
 	rcu_read_lock();
 	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
-		bq_xmit_all(bq, XDP_XMIT_FLUSH, true);
+		bq_xmit_all(bq, XDP_XMIT_FLUSH);
 	rcu_read_unlock();
 }
 
@@ -440,7 +423,7 @@ static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf,
 	struct xdp_bulk_queue *bq = this_cpu_ptr(obj->bulkq);
 
 	if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
-		bq_xmit_all(bq, 0, true);
+		bq_xmit_all(bq, 0);
 
 	/* Ingress dev_rx will be the same for all xdp_frame's in
 	 * bulk_queue, because bq stored per-CPU and must be flushed
@@ -509,27 +492,11 @@ static void *dev_map_hash_lookup_elem(struct bpf_map *map, void *key)
 	return dev ? &dev->ifindex : NULL;
 }
 
-static void dev_map_flush_old(struct bpf_dtab_netdev *dev)
-{
-	if (dev->dev->netdev_ops->ndo_xdp_xmit) {
-		struct xdp_bulk_queue *bq;
-		int cpu;
-
-		rcu_read_lock();
-		for_each_online_cpu(cpu) {
-			bq = per_cpu_ptr(dev->bulkq, cpu);
-			bq_xmit_all(bq, XDP_XMIT_FLUSH, false);
-		}
-		rcu_read_unlock();
-	}
-}
-
 static void __dev_map_entry_free(struct rcu_head *rcu)
 {
 	struct bpf_dtab_netdev *dev;
 
 	dev = container_of(rcu, struct bpf_dtab_netdev, rcu);
-	dev_map_flush_old(dev);
 	free_percpu(dev->bulkq);
 	dev_put(dev->dev);
 	kfree(dev);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v2 2/8] xdp: simplify cpumap cleanup
  2019-12-19  6:09 [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps Björn Töpel
  2019-12-19  6:09 ` [PATCH bpf-next v2 1/8] xdp: simplify devmap cleanup Björn Töpel
@ 2019-12-19  6:10 ` Björn Töpel
  2019-12-19  6:10 ` [PATCH bpf-next v2 3/8] xdp: fix graze->grace typo in cpumap comments Björn Töpel
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 36+ messages in thread
From: Björn Töpel @ 2019-12-19  6:10 UTC (permalink / raw)
  To: netdev, ast, daniel
  Cc: Björn Töpel, bpf, davem, jakub.kicinski, hawk,
	john.fastabend, magnus.karlsson, jonathan.lemon,
	Toke Høiland-Jørgensen

From: Björn Töpel <bjorn.topel@intel.com>

After the RCU flavor consolidation [1], call_rcu() and
synchronize_rcu() wait for preempt-disable regions (NAPI) in addition
to the read-side critical sections. As a result, the cleanup code in
cpumap can be simplified:

* There is no longer a need to flush in __cpu_map_entry_free, since we
  know that this has been done when the call_rcu() callback is
  triggered.

* When freeing the map, there is no need to explicitly wait for a
  flush. It's guaranteed to be done after the synchronize_rcu() call
  in cpu_map_free().

[1] https://lwn.net/Articles/777036/
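
For reference, with the flush loop removed the callback boils down to
(see the diff below):

  static void __cpu_map_entry_free(struct rcu_head *rcu)
  {
          struct bpf_cpu_map_entry *rcpu;

          rcpu = container_of(rcu, struct bpf_cpu_map_entry, rcu);
          free_percpu(rcpu->bulkq);
          /* Cannot kthread_stop() here, last put free rcpu resources */
          put_cpu_map_entry(rcpu);
  }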

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 kernel/bpf/cpumap.c | 34 +++++-----------------------------
 1 file changed, 5 insertions(+), 29 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index ef49e17ae47c..04c8eb11cd90 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -75,7 +75,7 @@ struct bpf_cpu_map {
 	struct list_head __percpu *flush_list;
 };
 
-static int bq_flush_to_queue(struct xdp_bulk_queue *bq, bool in_napi_ctx);
+static int bq_flush_to_queue(struct xdp_bulk_queue *bq);
 
 static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 {
@@ -399,7 +399,6 @@ static struct bpf_cpu_map_entry *__cpu_map_entry_alloc(u32 qsize, u32 cpu,
 static void __cpu_map_entry_free(struct rcu_head *rcu)
 {
 	struct bpf_cpu_map_entry *rcpu;
-	int cpu;
 
 	/* This cpu_map_entry have been disconnected from map and one
 	 * RCU graze-period have elapsed.  Thus, XDP cannot queue any
@@ -408,13 +407,6 @@ static void __cpu_map_entry_free(struct rcu_head *rcu)
 	 */
 	rcpu = container_of(rcu, struct bpf_cpu_map_entry, rcu);
 
-	/* Flush remaining packets in percpu bulkq */
-	for_each_online_cpu(cpu) {
-		struct xdp_bulk_queue *bq = per_cpu_ptr(rcpu->bulkq, cpu);
-
-		/* No concurrent bq_enqueue can run at this point */
-		bq_flush_to_queue(bq, false);
-	}
 	free_percpu(rcpu->bulkq);
 	/* Cannot kthread_stop() here, last put free rcpu resources */
 	put_cpu_map_entry(rcpu);
@@ -507,7 +499,6 @@ static int cpu_map_update_elem(struct bpf_map *map, void *key, void *value,
 static void cpu_map_free(struct bpf_map *map)
 {
 	struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map);
-	int cpu;
 	u32 i;
 
 	/* At this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
@@ -522,18 +513,6 @@ static void cpu_map_free(struct bpf_map *map)
 	bpf_clear_redirect_map(map);
 	synchronize_rcu();
 
-	/* To ensure all pending flush operations have completed wait for flush
-	 * list be empty on _all_ cpus. Because the above synchronize_rcu()
-	 * ensures the map is disconnected from the program we can assume no new
-	 * items will be added to the list.
-	 */
-	for_each_online_cpu(cpu) {
-		struct list_head *flush_list = per_cpu_ptr(cmap->flush_list, cpu);
-
-		while (!list_empty(flush_list))
-			cond_resched();
-	}
-
 	/* For cpu_map the remote CPUs can still be using the entries
 	 * (struct bpf_cpu_map_entry).
 	 */
@@ -599,7 +578,7 @@ const struct bpf_map_ops cpu_map_ops = {
 	.map_check_btf		= map_check_no_btf,
 };
 
-static int bq_flush_to_queue(struct xdp_bulk_queue *bq, bool in_napi_ctx)
+static int bq_flush_to_queue(struct xdp_bulk_queue *bq)
 {
 	struct bpf_cpu_map_entry *rcpu = bq->obj;
 	unsigned int processed = 0, drops = 0;
@@ -620,10 +599,7 @@ static int bq_flush_to_queue(struct xdp_bulk_queue *bq, bool in_napi_ctx)
 		err = __ptr_ring_produce(q, xdpf);
 		if (err) {
 			drops++;
-			if (likely(in_napi_ctx))
-				xdp_return_frame_rx_napi(xdpf);
-			else
-				xdp_return_frame(xdpf);
+			xdp_return_frame_rx_napi(xdpf);
 		}
 		processed++;
 	}
@@ -646,7 +622,7 @@ static int bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf)
 	struct xdp_bulk_queue *bq = this_cpu_ptr(rcpu->bulkq);
 
 	if (unlikely(bq->count == CPU_MAP_BULK_SIZE))
-		bq_flush_to_queue(bq, true);
+		bq_flush_to_queue(bq);
 
 	/* Notice, xdp_buff/page MUST be queued here, long enough for
 	 * driver to code invoking us to finished, due to driver
@@ -688,7 +664,7 @@ void __cpu_map_flush(struct bpf_map *map)
 	struct xdp_bulk_queue *bq, *tmp;
 
 	list_for_each_entry_safe(bq, tmp, flush_list, flush_node) {
-		bq_flush_to_queue(bq, true);
+		bq_flush_to_queue(bq);
 
 		/* If already running, costs spin_lock_irqsave + smb_mb */
 		wake_up_process(bq->obj->kthread);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v2 3/8] xdp: fix graze->grace typo in cpumap comments
  2019-12-19  6:09 [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps Björn Töpel
  2019-12-19  6:09 ` [PATCH bpf-next v2 1/8] xdp: simplify devmap cleanup Björn Töpel
  2019-12-19  6:10 ` [PATCH bpf-next v2 2/8] xdp: simplify cpumap cleanup Björn Töpel
@ 2019-12-19  6:10 ` Björn Töpel
  2020-01-07 17:33   ` John Fastabend
  2019-12-19  6:10 ` [PATCH bpf-next v2 4/8] xsk: make xskmap flush_list common for all map instances Björn Töpel
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 36+ messages in thread
From: Björn Töpel @ 2019-12-19  6:10 UTC (permalink / raw)
  To: netdev, ast, daniel
  Cc: Björn Töpel, bpf, davem, jakub.kicinski, hawk,
	john.fastabend, magnus.karlsson, jonathan.lemon,
	Toke Høiland-Jørgensen

From: Björn Töpel <bjorn.topel@intel.com>

Simple spelling fix.

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 kernel/bpf/cpumap.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 04c8eb11cd90..f9deed659798 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -401,7 +401,7 @@ static void __cpu_map_entry_free(struct rcu_head *rcu)
 	struct bpf_cpu_map_entry *rcpu;
 
 	/* This cpu_map_entry have been disconnected from map and one
-	 * RCU graze-period have elapsed.  Thus, XDP cannot queue any
+	 * RCU grace-period have elapsed.  Thus, XDP cannot queue any
 	 * new packets and cannot change/set flush_needed that can
 	 * find this entry.
 	 */
@@ -428,7 +428,7 @@ static void __cpu_map_entry_free(struct rcu_head *rcu)
  * percpu bulkq to queue.  Due to caller map_delete_elem() disable
  * preemption, cannot call kthread_stop() to make sure queue is empty.
  * Instead a work_queue is started for stopping kthread,
- * cpu_map_kthread_stop, which waits for an RCU graze period before
+ * cpu_map_kthread_stop, which waits for an RCU grace period before
  * stopping kthread, emptying the queue.
  */
 static void __cpu_map_entry_replace(struct bpf_cpu_map *cmap,
@@ -523,7 +523,7 @@ static void cpu_map_free(struct bpf_map *map)
 		if (!rcpu)
 			continue;
 
-		/* bq flush and cleanup happens after RCU graze-period */
+		/* bq flush and cleanup happens after RCU grace-period */
 		__cpu_map_entry_replace(cmap, i, NULL); /* call_rcu */
 	}
 	free_percpu(cmap->flush_list);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v2 4/8] xsk: make xskmap flush_list common for all map instances
  2019-12-19  6:09 [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps Björn Töpel
                   ` (2 preceding siblings ...)
  2019-12-19  6:10 ` [PATCH bpf-next v2 3/8] xdp: fix graze->grace typo in cpumap comments Björn Töpel
@ 2019-12-19  6:10 ` Björn Töpel
  2020-01-07 17:54   ` John Fastabend
  2019-12-19  6:10 ` [PATCH bpf-next v2 5/8] xdp: make devmap " Björn Töpel
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 36+ messages in thread
From: Björn Töpel @ 2019-12-19  6:10 UTC (permalink / raw)
  To: netdev, ast, daniel
  Cc: Björn Töpel, bpf, davem, jakub.kicinski, hawk,
	john.fastabend, magnus.karlsson, jonathan.lemon,
	Toke Høiland-Jørgensen

From: Björn Töpel <bjorn.topel@intel.com>

The xskmap flush list is used to track entries that need to be
flushed via the xdp_do_flush_map() function. This list used to be
per-map, but there is really no reason for that. Instead, make the
flush list global for all xskmaps, which simplifies __xsk_map_flush()
and xsk_map_alloc().

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 include/net/xdp_sock.h | 11 ++++-------
 kernel/bpf/xskmap.c    | 18 +++---------------
 net/core/filter.c      |  9 ++++-----
 net/xdp/xsk.c          | 17 +++++++++--------
 4 files changed, 20 insertions(+), 35 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index e3780e4b74e1..48594740d67c 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -72,7 +72,6 @@ struct xdp_umem {
 
 struct xsk_map {
 	struct bpf_map map;
-	struct list_head __percpu *flush_list;
 	spinlock_t lock; /* Synchronize map updates */
 	struct xdp_sock *xsk_map[];
 };
@@ -139,9 +138,8 @@ void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
 			     struct xdp_sock **map_entry);
 int xsk_map_inc(struct xsk_map *map);
 void xsk_map_put(struct xsk_map *map);
-int __xsk_map_redirect(struct bpf_map *map, struct xdp_buff *xdp,
-		       struct xdp_sock *xs);
-void __xsk_map_flush(struct bpf_map *map);
+int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp);
+void __xsk_map_flush(void);
 
 static inline struct xdp_sock *__xsk_map_lookup_elem(struct bpf_map *map,
 						     u32 key)
@@ -369,13 +367,12 @@ static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 handle,
 	return 0;
 }
 
-static inline int __xsk_map_redirect(struct bpf_map *map, struct xdp_buff *xdp,
-				     struct xdp_sock *xs)
+static inline int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
 	return -EOPNOTSUPP;
 }
 
-static inline void __xsk_map_flush(struct bpf_map *map)
+static inline void __xsk_map_flush(void)
 {
 }
 
diff --git a/kernel/bpf/xskmap.c b/kernel/bpf/xskmap.c
index 90c4fce1c981..2cc5c8f4c800 100644
--- a/kernel/bpf/xskmap.c
+++ b/kernel/bpf/xskmap.c
@@ -72,9 +72,9 @@ static void xsk_map_sock_delete(struct xdp_sock *xs,
 static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 {
 	struct bpf_map_memory mem;
-	int cpu, err, numa_node;
+	int err, numa_node;
 	struct xsk_map *m;
-	u64 cost, size;
+	u64 size;
 
 	if (!capable(CAP_NET_ADMIN))
 		return ERR_PTR(-EPERM);
@@ -86,9 +86,8 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 
 	numa_node = bpf_map_attr_numa_node(attr);
 	size = struct_size(m, xsk_map, attr->max_entries);
-	cost = size + array_size(sizeof(*m->flush_list), num_possible_cpus());
 
-	err = bpf_map_charge_init(&mem, cost);
+	err = bpf_map_charge_init(&mem, size);
 	if (err < 0)
 		return ERR_PTR(err);
 
@@ -102,16 +101,6 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
 	bpf_map_charge_move(&m->map.memory, &mem);
 	spin_lock_init(&m->lock);
 
-	m->flush_list = alloc_percpu(struct list_head);
-	if (!m->flush_list) {
-		bpf_map_charge_finish(&m->map.memory);
-		bpf_map_area_free(m);
-		return ERR_PTR(-ENOMEM);
-	}
-
-	for_each_possible_cpu(cpu)
-		INIT_LIST_HEAD(per_cpu_ptr(m->flush_list, cpu));
-
 	return &m->map;
 }
 
@@ -121,7 +110,6 @@ static void xsk_map_free(struct bpf_map *map)
 
 	bpf_clear_redirect_map(map);
 	synchronize_net();
-	free_percpu(m->flush_list);
 	bpf_map_area_free(m);
 }
 
diff --git a/net/core/filter.c b/net/core/filter.c
index a411f7835dee..c51678c473c5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3511,8 +3511,7 @@ xdp_do_redirect_slow(struct net_device *dev, struct xdp_buff *xdp,
 
 static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
 			    struct bpf_map *map,
-			    struct xdp_buff *xdp,
-			    u32 index)
+			    struct xdp_buff *xdp)
 {
 	int err;
 
@@ -3537,7 +3536,7 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
 	case BPF_MAP_TYPE_XSKMAP: {
 		struct xdp_sock *xs = fwd;
 
-		err = __xsk_map_redirect(map, xdp, xs);
+		err = __xsk_map_redirect(xs, xdp);
 		return err;
 	}
 	default:
@@ -3562,7 +3561,7 @@ void xdp_do_flush_map(void)
 			__cpu_map_flush(map);
 			break;
 		case BPF_MAP_TYPE_XSKMAP:
-			__xsk_map_flush(map);
+			__xsk_map_flush();
 			break;
 		default:
 			break;
@@ -3619,7 +3618,7 @@ static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
 	if (ri->map_to_flush && unlikely(ri->map_to_flush != map))
 		xdp_do_flush_map();
 
-	err = __bpf_tx_xdp_map(dev, fwd, map, xdp, index);
+	err = __bpf_tx_xdp_map(dev, fwd, map, xdp);
 	if (unlikely(err))
 		goto err;
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 956793893c9d..e45c27f5cfca 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -31,6 +31,8 @@
 
 #define TX_BATCH_SIZE 16
 
+static DEFINE_PER_CPU(struct list_head, xskmap_flush_list);
+
 bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs)
 {
 	return READ_ONCE(xs->rx) &&  READ_ONCE(xs->umem) &&
@@ -264,11 +266,9 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 	return err;
 }
 
-int __xsk_map_redirect(struct bpf_map *map, struct xdp_buff *xdp,
-		       struct xdp_sock *xs)
+int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
-	struct xsk_map *m = container_of(map, struct xsk_map, map);
-	struct list_head *flush_list = this_cpu_ptr(m->flush_list);
+	struct list_head *flush_list = this_cpu_ptr(&xskmap_flush_list);
 	int err;
 
 	err = xsk_rcv(xs, xdp);
@@ -281,10 +281,9 @@ int __xsk_map_redirect(struct bpf_map *map, struct xdp_buff *xdp,
 	return 0;
 }
 
-void __xsk_map_flush(struct bpf_map *map)
+void __xsk_map_flush(void)
 {
-	struct xsk_map *m = container_of(map, struct xsk_map, map);
-	struct list_head *flush_list = this_cpu_ptr(m->flush_list);
+	struct list_head *flush_list = this_cpu_ptr(&xskmap_flush_list);
 	struct xdp_sock *xs, *tmp;
 
 	list_for_each_entry_safe(xs, tmp, flush_list, flush_node) {
@@ -1177,7 +1176,7 @@ static struct pernet_operations xsk_net_ops = {
 
 static int __init xsk_init(void)
 {
-	int err;
+	int err, cpu;
 
 	err = proto_register(&xsk_proto, 0 /* no slab */);
 	if (err)
@@ -1195,6 +1194,8 @@ static int __init xsk_init(void)
 	if (err)
 		goto out_pernet;
 
+	for_each_possible_cpu(cpu)
+		INIT_LIST_HEAD(&per_cpu(xskmap_flush_list, cpu));
 	return 0;
 
 out_pernet:
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v2 5/8] xdp: make devmap flush_list common for all map instances
  2019-12-19  6:09 [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps Björn Töpel
                   ` (3 preceding siblings ...)
  2019-12-19  6:10 ` [PATCH bpf-next v2 4/8] xsk: make xskmap flush_list common for all map instances Björn Töpel
@ 2019-12-19  6:10 ` Björn Töpel
  2020-01-07 17:58   ` John Fastabend
  2019-12-19  6:10 ` [PATCH bpf-next v2 6/8] xdp: make cpumap " Björn Töpel
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 36+ messages in thread
From: Björn Töpel @ 2019-12-19  6:10 UTC (permalink / raw)
  To: netdev, ast, daniel
  Cc: Björn Töpel, bpf, davem, jakub.kicinski, hawk,
	john.fastabend, magnus.karlsson, jonathan.lemon,
	Toke Høiland-Jørgensen

From: Björn Töpel <bjorn.topel@intel.com>

The devmap flush list is used to track entries that need to be
flushed via the xdp_do_flush_map() function. This list used to be
per-map, but there is really no reason for that. Instead, make the
flush list global for all devmaps, which simplifies __dev_map_flush()
and dev_map_init_map().

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 include/linux/bpf.h |  4 ++--
 kernel/bpf/devmap.c | 35 +++++++++++++----------------------
 net/core/filter.c   |  2 +-
 3 files changed, 16 insertions(+), 25 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index d467983e61bb..31191804ca09 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -959,7 +959,7 @@ struct sk_buff;
 
 struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
 struct bpf_dtab_netdev *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key);
-void __dev_map_flush(struct bpf_map *map);
+void __dev_map_flush(void);
 int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 		    struct net_device *dev_rx);
 int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
@@ -1068,7 +1068,7 @@ static inline struct net_device  *__dev_map_hash_lookup_elem(struct bpf_map *map
 	return NULL;
 }
 
-static inline void __dev_map_flush(struct bpf_map *map)
+static inline void __dev_map_flush(void)
 {
 }
 
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index b7595de6a91a..da9c832fc5c8 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -75,7 +75,6 @@ struct bpf_dtab_netdev {
 struct bpf_dtab {
 	struct bpf_map map;
 	struct bpf_dtab_netdev **netdev_map; /* DEVMAP type only */
-	struct list_head __percpu *flush_list;
 	struct list_head list;
 
 	/* these are only used for DEVMAP_HASH type maps */
@@ -85,6 +84,7 @@ struct bpf_dtab {
 	u32 n_buckets;
 };
 
+static DEFINE_PER_CPU(struct list_head, dev_map_flush_list);
 static DEFINE_SPINLOCK(dev_map_lock);
 static LIST_HEAD(dev_map_list);
 
@@ -109,8 +109,8 @@ static inline struct hlist_head *dev_map_index_hash(struct bpf_dtab *dtab,
 
 static int dev_map_init_map(struct bpf_dtab *dtab, union bpf_attr *attr)
 {
-	int err, cpu;
-	u64 cost;
+	u64 cost = 0;
+	int err;
 
 	/* check sanity of attributes */
 	if (attr->max_entries == 0 || attr->key_size != 4 ||
@@ -125,9 +125,6 @@ static int dev_map_init_map(struct bpf_dtab *dtab, union bpf_attr *attr)
 
 	bpf_map_init_from_attr(&dtab->map, attr);
 
-	/* make sure page count doesn't overflow */
-	cost = (u64) sizeof(struct list_head) * num_possible_cpus();
-
 	if (attr->map_type == BPF_MAP_TYPE_DEVMAP_HASH) {
 		dtab->n_buckets = roundup_pow_of_two(dtab->map.max_entries);
 
@@ -143,17 +140,10 @@ static int dev_map_init_map(struct bpf_dtab *dtab, union bpf_attr *attr)
 	if (err)
 		return -EINVAL;
 
-	dtab->flush_list = alloc_percpu(struct list_head);
-	if (!dtab->flush_list)
-		goto free_charge;
-
-	for_each_possible_cpu(cpu)
-		INIT_LIST_HEAD(per_cpu_ptr(dtab->flush_list, cpu));
-
 	if (attr->map_type == BPF_MAP_TYPE_DEVMAP_HASH) {
 		dtab->dev_index_head = dev_map_create_hash(dtab->n_buckets);
 		if (!dtab->dev_index_head)
-			goto free_percpu;
+			goto free_charge;
 
 		spin_lock_init(&dtab->index_lock);
 	} else {
@@ -161,13 +151,11 @@ static int dev_map_init_map(struct bpf_dtab *dtab, union bpf_attr *attr)
 						      sizeof(struct bpf_dtab_netdev *),
 						      dtab->map.numa_node);
 		if (!dtab->netdev_map)
-			goto free_percpu;
+			goto free_charge;
 	}
 
 	return 0;
 
-free_percpu:
-	free_percpu(dtab->flush_list);
 free_charge:
 	bpf_map_charge_finish(&dtab->map.memory);
 	return -ENOMEM;
@@ -254,7 +242,6 @@ static void dev_map_free(struct bpf_map *map)
 		bpf_map_area_free(dtab->netdev_map);
 	}
 
-	free_percpu(dtab->flush_list);
 	kfree(dtab);
 }
 
@@ -384,10 +371,9 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
  * net device can be torn down. On devmap tear down we ensure the flush list
  * is empty before completing to ensure all flush operations have completed.
  */
-void __dev_map_flush(struct bpf_map *map)
+void __dev_map_flush(void)
 {
-	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
-	struct list_head *flush_list = this_cpu_ptr(dtab->flush_list);
+	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
 	struct xdp_bulk_queue *bq, *tmp;
 
 	rcu_read_lock();
@@ -419,7 +405,7 @@ static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf,
 		      struct net_device *dev_rx)
 
 {
-	struct list_head *flush_list = this_cpu_ptr(obj->dtab->flush_list);
+	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
 	struct xdp_bulk_queue *bq = this_cpu_ptr(obj->bulkq);
 
 	if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
@@ -777,10 +763,15 @@ static struct notifier_block dev_map_notifier = {
 
 static int __init dev_map_init(void)
 {
+	int cpu;
+
 	/* Assure tracepoint shadow struct _bpf_dtab_netdev is in sync */
 	BUILD_BUG_ON(offsetof(struct bpf_dtab_netdev, dev) !=
 		     offsetof(struct _bpf_dtab_netdev, dev));
 	register_netdevice_notifier(&dev_map_notifier);
+
+	for_each_possible_cpu(cpu)
+		INIT_LIST_HEAD(&per_cpu(dev_map_flush_list, cpu));
 	return 0;
 }
 
diff --git a/net/core/filter.c b/net/core/filter.c
index c51678c473c5..b7570cb84902 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3555,7 +3555,7 @@ void xdp_do_flush_map(void)
 		switch (map->map_type) {
 		case BPF_MAP_TYPE_DEVMAP:
 		case BPF_MAP_TYPE_DEVMAP_HASH:
-			__dev_map_flush(map);
+			__dev_map_flush();
 			break;
 		case BPF_MAP_TYPE_CPUMAP:
 			__cpu_map_flush(map);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v2 6/8] xdp: make cpumap flush_list common for all map instances
  2019-12-19  6:09 [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps Björn Töpel
                   ` (4 preceding siblings ...)
  2019-12-19  6:10 ` [PATCH bpf-next v2 5/8] xdp: make devmap " Björn Töpel
@ 2019-12-19  6:10 ` Björn Töpel
  2020-01-07 17:59   ` John Fastabend
  2019-12-19  6:10 ` [PATCH bpf-next v2 7/8] xdp: remove map_to_flush and map swap detection Björn Töpel
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 36+ messages in thread
From: Björn Töpel @ 2019-12-19  6:10 UTC (permalink / raw)
  To: netdev, ast, daniel
  Cc: Björn Töpel, bpf, davem, jakub.kicinski, hawk,
	john.fastabend, magnus.karlsson, jonathan.lemon,
	Toke Høiland-Jørgensen

From: Björn Töpel <bjorn.topel@intel.com>

The cpumap flush list is used to track entries that need to be
flushed via the xdp_do_flush_map() function. This list used to be
per-map, but there is really no reason for that. Instead, make the
flush list global for all cpumaps, which simplifies __cpu_map_flush()
and cpu_map_alloc().

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 include/linux/bpf.h |  4 ++--
 kernel/bpf/cpumap.c | 36 ++++++++++++++++++------------------
 net/core/filter.c   |  2 +-
 3 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 31191804ca09..8f3e00c84f39 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -966,7 +966,7 @@ int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
 			     struct bpf_prog *xdp_prog);
 
 struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key);
-void __cpu_map_flush(struct bpf_map *map);
+void __cpu_map_flush(void);
 int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
 		    struct net_device *dev_rx);
 
@@ -1097,7 +1097,7 @@ struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key)
 	return NULL;
 }
 
-static inline void __cpu_map_flush(struct bpf_map *map)
+static inline void __cpu_map_flush(void)
 {
 }
 
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index f9deed659798..70f71b154fa5 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -72,17 +72,18 @@ struct bpf_cpu_map {
 	struct bpf_map map;
 	/* Below members specific for map type */
 	struct bpf_cpu_map_entry **cpu_map;
-	struct list_head __percpu *flush_list;
 };
 
+static DEFINE_PER_CPU(struct list_head, cpu_map_flush_list);
+
 static int bq_flush_to_queue(struct xdp_bulk_queue *bq);
 
 static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 {
 	struct bpf_cpu_map *cmap;
 	int err = -ENOMEM;
-	int ret, cpu;
 	u64 cost;
+	int ret;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return ERR_PTR(-EPERM);
@@ -106,7 +107,6 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 
 	/* make sure page count doesn't overflow */
 	cost = (u64) cmap->map.max_entries * sizeof(struct bpf_cpu_map_entry *);
-	cost += sizeof(struct list_head) * num_possible_cpus();
 
 	/* Notice returns -EPERM on if map size is larger than memlock limit */
 	ret = bpf_map_charge_init(&cmap->map.memory, cost);
@@ -115,23 +115,14 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr)
 		goto free_cmap;
 	}
 
-	cmap->flush_list = alloc_percpu(struct list_head);
-	if (!cmap->flush_list)
-		goto free_charge;
-
-	for_each_possible_cpu(cpu)
-		INIT_LIST_HEAD(per_cpu_ptr(cmap->flush_list, cpu));
-
 	/* Alloc array for possible remote "destination" CPUs */
 	cmap->cpu_map = bpf_map_area_alloc(cmap->map.max_entries *
 					   sizeof(struct bpf_cpu_map_entry *),
 					   cmap->map.numa_node);
 	if (!cmap->cpu_map)
-		goto free_percpu;
+		goto free_charge;
 
 	return &cmap->map;
-free_percpu:
-	free_percpu(cmap->flush_list);
 free_charge:
 	bpf_map_charge_finish(&cmap->map.memory);
 free_cmap:
@@ -526,7 +517,6 @@ static void cpu_map_free(struct bpf_map *map)
 		/* bq flush and cleanup happens after RCU grace-period */
 		__cpu_map_entry_replace(cmap, i, NULL); /* call_rcu */
 	}
-	free_percpu(cmap->flush_list);
 	bpf_map_area_free(cmap->cpu_map);
 	kfree(cmap);
 }
@@ -618,7 +608,7 @@ static int bq_flush_to_queue(struct xdp_bulk_queue *bq)
  */
 static int bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf)
 {
-	struct list_head *flush_list = this_cpu_ptr(rcpu->cmap->flush_list);
+	struct list_head *flush_list = this_cpu_ptr(&cpu_map_flush_list);
 	struct xdp_bulk_queue *bq = this_cpu_ptr(rcpu->bulkq);
 
 	if (unlikely(bq->count == CPU_MAP_BULK_SIZE))
@@ -657,10 +647,9 @@ int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
 	return 0;
 }
 
-void __cpu_map_flush(struct bpf_map *map)
+void __cpu_map_flush(void)
 {
-	struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map);
-	struct list_head *flush_list = this_cpu_ptr(cmap->flush_list);
+	struct list_head *flush_list = this_cpu_ptr(&cpu_map_flush_list);
 	struct xdp_bulk_queue *bq, *tmp;
 
 	list_for_each_entry_safe(bq, tmp, flush_list, flush_node) {
@@ -670,3 +659,14 @@ void __cpu_map_flush(struct bpf_map *map)
 		wake_up_process(bq->obj->kthread);
 	}
 }
+
+static int __init cpu_map_init(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		INIT_LIST_HEAD(&per_cpu(cpu_map_flush_list, cpu));
+	return 0;
+}
+
+subsys_initcall(cpu_map_init);
diff --git a/net/core/filter.c b/net/core/filter.c
index b7570cb84902..c706325b3e66 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3558,7 +3558,7 @@ void xdp_do_flush_map(void)
 			__dev_map_flush();
 			break;
 		case BPF_MAP_TYPE_CPUMAP:
-			__cpu_map_flush(map);
+			__cpu_map_flush();
 			break;
 		case BPF_MAP_TYPE_XSKMAP:
 			__xsk_map_flush();
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v2 7/8] xdp: remove map_to_flush and map swap detection
  2019-12-19  6:09 [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps Björn Töpel
                   ` (5 preceding siblings ...)
  2019-12-19  6:10 ` [PATCH bpf-next v2 6/8] xdp: make cpumap " Björn Töpel
@ 2019-12-19  6:10 ` Björn Töpel
  2020-01-07 18:15   ` John Fastabend
  2019-12-19  6:10 ` [PATCH bpf-next v2 8/8] xdp: simplify __bpf_tx_xdp_map() Björn Töpel
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 36+ messages in thread
From: Björn Töpel @ 2019-12-19  6:10 UTC (permalink / raw)
  To: netdev, ast, daniel
  Cc: Björn Töpel, bpf, davem, jakub.kicinski, hawk,
	john.fastabend, magnus.karlsson, jonathan.lemon,
	Toke Høiland-Jørgensen

From: Björn Töpel <bjorn.topel@intel.com>

Now that all XDP maps that can be used with bpf_redirect_map() track
entries to be flushed in a global fashion, there is no need to track
that the map has changed and flush from xdp_do_redirect_map()
anymore. All entries will be flushed in xdp_do_flush_map().

This means that the map_to_flush member, and the corresponding checks,
can be removed. Moving the flush logic to one place, xdp_do_flush_map(),
gives a bulking behavior and a performance boost.
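
For context, a sketch of where the flush fits in a driver's NAPI poll
loop (illustrative only; the function name is made up and the exact
hook points vary per driver):

  static int some_driver_napi_poll(struct napi_struct *napi, int budget)
  {
          int done;

          /* Runs in softirq context with preemption disabled, so the
           * percpu flush lists are safe to use without locking.
           */
          for (done = 0; done < budget; done++) {
                  /* run the XDP prog; XDP_REDIRECT enqueues the frame
                   * on the percpu flush list of the target map type,
                   * no matter which map instance was redirected into
                   */
          }

          /* one flush for everything enqueued above, preserving
           * bulking even when the prog used several different maps
           */
          xdp_do_flush_map();
          return done;
  }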

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 include/linux/filter.h |  1 -
 net/core/filter.c      | 27 +++------------------------
 2 files changed, 3 insertions(+), 25 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 37ac7025031d..69d6706fc889 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -592,7 +592,6 @@ struct bpf_redirect_info {
 	u32 tgt_index;
 	void *tgt_value;
 	struct bpf_map *map;
-	struct bpf_map *map_to_flush;
 	u32 kern_flags;
 };
 
diff --git a/net/core/filter.c b/net/core/filter.c
index c706325b3e66..d9caa3e57ea1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3547,26 +3547,9 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
 
 void xdp_do_flush_map(void)
 {
-	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
-	struct bpf_map *map = ri->map_to_flush;
-
-	ri->map_to_flush = NULL;
-	if (map) {
-		switch (map->map_type) {
-		case BPF_MAP_TYPE_DEVMAP:
-		case BPF_MAP_TYPE_DEVMAP_HASH:
-			__dev_map_flush();
-			break;
-		case BPF_MAP_TYPE_CPUMAP:
-			__cpu_map_flush();
-			break;
-		case BPF_MAP_TYPE_XSKMAP:
-			__xsk_map_flush();
-			break;
-		default:
-			break;
-		}
-	}
+	__dev_map_flush();
+	__cpu_map_flush();
+	__xsk_map_flush();
 }
 EXPORT_SYMBOL_GPL(xdp_do_flush_map);
 
@@ -3615,14 +3598,10 @@ static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
 	ri->tgt_value = NULL;
 	WRITE_ONCE(ri->map, NULL);
 
-	if (ri->map_to_flush && unlikely(ri->map_to_flush != map))
-		xdp_do_flush_map();
-
 	err = __bpf_tx_xdp_map(dev, fwd, map, xdp);
 	if (unlikely(err))
 		goto err;
 
-	ri->map_to_flush = map;
 	_trace_xdp_redirect_map(dev, xdp_prog, fwd, map, index);
 	return 0;
 err:
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH bpf-next v2 8/8] xdp: simplify __bpf_tx_xdp_map()
  2019-12-19  6:09 [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps Björn Töpel
                   ` (6 preceding siblings ...)
  2019-12-19  6:10 ` [PATCH bpf-next v2 7/8] xdp: remove map_to_flush and map swap detection Björn Töpel
@ 2019-12-19  6:10 ` Björn Töpel
  2019-12-19  7:18 ` [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps Jesper Dangaard Brouer
  2019-12-20  5:21 ` Alexei Starovoitov
  9 siblings, 0 replies; 36+ messages in thread
From: Björn Töpel @ 2019-12-19  6:10 UTC (permalink / raw)
  To: netdev, ast, daniel
  Cc: Björn Töpel, bpf, davem, jakub.kicinski, hawk,
	john.fastabend, magnus.karlsson, jonathan.lemon,
	Toke Høiland-Jørgensen

From: Björn Töpel <bjorn.topel@intel.com>

The explicit error checking is not needed. Simply return the error
instead.

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/core/filter.c | 33 +++++++--------------------------
 1 file changed, 7 insertions(+), 26 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index d9caa3e57ea1..217af9974c86 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3510,35 +3510,16 @@ xdp_do_redirect_slow(struct net_device *dev, struct xdp_buff *xdp,
 }
 
 static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
-			    struct bpf_map *map,
-			    struct xdp_buff *xdp)
+			    struct bpf_map *map, struct xdp_buff *xdp)
 {
-	int err;
-
 	switch (map->map_type) {
 	case BPF_MAP_TYPE_DEVMAP:
-	case BPF_MAP_TYPE_DEVMAP_HASH: {
-		struct bpf_dtab_netdev *dst = fwd;
-
-		err = dev_map_enqueue(dst, xdp, dev_rx);
-		if (unlikely(err))
-			return err;
-		break;
-	}
-	case BPF_MAP_TYPE_CPUMAP: {
-		struct bpf_cpu_map_entry *rcpu = fwd;
-
-		err = cpu_map_enqueue(rcpu, xdp, dev_rx);
-		if (unlikely(err))
-			return err;
-		break;
-	}
-	case BPF_MAP_TYPE_XSKMAP: {
-		struct xdp_sock *xs = fwd;
-
-		err = __xsk_map_redirect(xs, xdp);
-		return err;
-	}
+	case BPF_MAP_TYPE_DEVMAP_HASH:
+		return dev_map_enqueue(fwd, xdp, dev_rx);
+	case BPF_MAP_TYPE_CPUMAP:
+		return cpu_map_enqueue(fwd, xdp, dev_rx);
+	case BPF_MAP_TYPE_XSKMAP:
+		return __xsk_map_redirect(fwd, xdp);
 	default:
 		break;
 	}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps
  2019-12-19  6:09 [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps Björn Töpel
                   ` (7 preceding siblings ...)
  2019-12-19  6:10 ` [PATCH bpf-next v2 8/8] xdp: simplify __bpf_tx_xdp_map() Björn Töpel
@ 2019-12-19  7:18 ` Jesper Dangaard Brouer
  2019-12-20  5:21 ` Alexei Starovoitov
  9 siblings, 0 replies; 36+ messages in thread
From: Jesper Dangaard Brouer @ 2019-12-19  7:18 UTC (permalink / raw)
  To: Björn Töpel
  Cc: netdev, ast, daniel, bpf, davem, jakub.kicinski, hawk,
	john.fastabend, magnus.karlsson, jonathan.lemon

On Thu, 19 Dec 2019 07:09:58 +0100
Björn Töpel <bjorn.topel@gmail.com> wrote:

>   $ sudo ./xdp_redirect_cpu --dev enp134s0f0 --cpu 22 xdp_cpu_map0
                                                        ^^^^^^^^^^^^
>   
>   Running XDP/eBPF prog_name:xdp_cpu_map5_lb_hash_ip_pairs
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   XDP-cpumap      CPU:to  pps            drop-pps    extra-info
>   XDP-RX          20      7723038        0           0
>   XDP-RX          total   7723038        0
[...]

Talking about how to invoke the 'xdp_redirect_cpu' program: I notice
that you are using the default BPF prog 'xdp_cpu_map5_lb_hash_ip_pairs',
but on the cmdline it looks like you want 'xdp_cpu_map0'.
You need to use '--prog xdp_cpu_map0'.  It will make a HUGE performance
difference.

Like:
  sudo ./xdp_redirect_cpu --dev mlx5p1 --cpu 22 --prog xdp_cpu_map0



(P.S. The load-balancing hash_ip_pairs prog is the default, because it
solves the most common issue, where the NIC is not RSS-distributing
traffic correctly, e.g. ixgbe Q-in-Q. It is also close to the Suricata
approach.)
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps
  2019-12-19  6:09 [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps Björn Töpel
                   ` (8 preceding siblings ...)
  2019-12-19  7:18 ` [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps Jesper Dangaard Brouer
@ 2019-12-20  5:21 ` Alexei Starovoitov
  2019-12-20  7:46   ` Jesper Dangaard Brouer
  9 siblings, 1 reply; 36+ messages in thread
From: Alexei Starovoitov @ 2019-12-20  5:21 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Network Development, Alexei Starovoitov, Daniel Borkmann, bpf,
	David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Karlsson, Magnus, Jonathan Lemon

On Wed, Dec 18, 2019 at 10:10 PM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> This series aims to simplify the XDP maps and
> xdp_do_redirect_map()/xdp_do_flush_map(), and to crank out some more
> performance from XDP_REDIRECT scenarios.
>
> The first part of the series simplifies all XDP_REDIRECT-capable maps,
> so that __XXX_flush_map() does not require the map parameter, by
> moving the flush list from the map to global scope.
>
> As a result, the map_to_flush member of struct bpf_redirect_info, and
> its corresponding logic, can be removed.
>
> Simpler code, and more performance, since per-packet checks/code are
> moved to the flush stage.
>
> Pre-series performance:
>   $ sudo taskset -c 22 ./xdpsock -i enp134s0f0 -q 20 -n 1 -r -z
>
>    sock0@enp134s0f0:20 rxdrop xdp-drv
>                   pps         pkts        1.00
>   rx              20,797,350  230,942,399
>   tx              0           0
>
>   $ sudo ./xdp_redirect_cpu --dev enp134s0f0 --cpu 22 xdp_cpu_map0
>
>   Running XDP/eBPF prog_name:xdp_cpu_map5_lb_hash_ip_pairs
>   XDP-cpumap      CPU:to  pps            drop-pps    extra-info
>   XDP-RX          20      7723038        0           0
>   XDP-RX          total   7723038        0
>   cpumap_kthread  total   0              0           0
>   redirect_err    total   0              0
>   xdp_exception   total   0              0
>
> Post-series performance:
>   $ sudo taskset -c 22 ./xdpsock -i enp134s0f0 -q 20 -n 1 -r -z
>
>    sock0@enp134s0f0:20 rxdrop xdp-drv
>                   pps         pkts        1.00
>   rx              21,524,979  86,835,327
>   tx              0           0
>
>   $ sudo ./xdp_redirect_cpu --dev enp134s0f0 --cpu 22 xdp_cpu_map0
>
>   Running XDP/eBPF prog_name:xdp_cpu_map5_lb_hash_ip_pairs
>   XDP-cpumap      CPU:to  pps            drop-pps    extra-info
>   XDP-RX          20      7840124        0           0
>   XDP-RX          total   7840124        0
>   cpumap_kthread  total   0              0           0
>   redirect_err    total   0              0
>   xdp_exception   total   0              0
>
> Results: +3.5% (xdpsock rxdrop) and +1.5% (xdp_redirect_cpu) for the
> microbenchmarks.
>
> v1->v2 [1]:
>   * Removed 'unused-variable' compiler warning (Jakub)
>
> [1] https://lore.kernel.org/bpf/20191218105400.2895-1-bjorn.topel@gmail.com/

My understanding is that the outstanding discussions are not objecting
to the core ideas of the patch set, hence applied. Thanks

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps
  2019-12-20  5:21 ` Alexei Starovoitov
@ 2019-12-20  7:46   ` Jesper Dangaard Brouer
  2019-12-20  9:26     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 36+ messages in thread
From: Jesper Dangaard Brouer @ 2019-12-20  7:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: brouer, Björn Töpel, Network Development,
	Alexei Starovoitov, Daniel Borkmann, bpf, David S. Miller,
	Jakub Kicinski, John Fastabend, Karlsson, Magnus, Jonathan Lemon

On Thu, 19 Dec 2019 21:21:39 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> > v1->v2 [1]:
> >   * Removed 'unused-variable' compiler warning (Jakub)
> >
> > [1] https://lore.kernel.org/bpf/20191218105400.2895-1-bjorn.topel@gmail.com/  
> 
> My understanding is that the outstanding discussions are not objecting
> to the core ideas of the patch set, hence applied. Thanks

I had hoped to have time to review it in detail today.  But as I don't
have any objections to the core ideas, I don't mind it getting
applied. We can just fix things in followups.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps
  2019-12-20  7:46   ` Jesper Dangaard Brouer
@ 2019-12-20  9:26     ` Jesper Dangaard Brouer
  2019-12-20 10:29       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 36+ messages in thread
From: Jesper Dangaard Brouer @ 2019-12-20  9:26 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Björn Töpel, Network Development, Alexei Starovoitov,
	Daniel Borkmann, bpf, David S. Miller, Jakub Kicinski,
	John Fastabend, Karlsson, Magnus, Jonathan Lemon, brouer

On Fri, 20 Dec 2019 08:46:51 +0100
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> On Thu, 19 Dec 2019 21:21:39 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > > v1->v2 [1]:
> > >   * Removed 'unused-variable' compiler warning (Jakub)
> > >
> > > [1] https://lore.kernel.org/bpf/20191218105400.2895-1-bjorn.topel@gmail.com/    
> > 
> > My understanding is that the outstanding discussions are not objecting
> > to the core ideas of the patch set, hence applied. Thanks
> 
> I had hoped to have time to review it in detail today.  But as I don't
> have any objections to the core ideas, I don't mind it getting
> applied. We can just fix things in followups.

I have now gone over the entire patchset, and everything looks perfect;
I will go as far as saying it is brilliant.  We previously had the
issue that using different redirect maps in a BPF prog would reduce the
bulking effect, as map_to_flush caused the previous map to get
flushed. This is now solved :-)

Thanks you Bjørn!

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps
  2019-12-20  9:26     ` Jesper Dangaard Brouer
@ 2019-12-20 10:29       ` Toke Høiland-Jørgensen
  2020-01-07 11:10         ` Björn Töpel
  0 siblings, 1 reply; 36+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-12-20 10:29 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Alexei Starovoitov
  Cc: Björn Töpel, Network Development, Alexei Starovoitov,
	Daniel Borkmann, bpf, David S. Miller, Jakub Kicinski,
	John Fastabend, Karlsson, Magnus, Jonathan Lemon, brouer

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> On Fri, 20 Dec 2019 08:46:51 +0100
> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>
>> On Thu, 19 Dec 2019 21:21:39 -0800
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>> 
>> > > v1->v2 [1]:
>> > >   * Removed 'unused-variable' compiler warning (Jakub)
>> > >
>> > > [1] https://lore.kernel.org/bpf/20191218105400.2895-1-bjorn.topel@gmail.com/    
>> > 
>> > My understanding is that the outstanding discussions are not objecting
>> > to the core ideas of the patch set, hence applied. Thanks
>> 
>> I had hoped to have time to review it in detail today.  But as I don't
>> have any objections to the core ideas, I don't mind it getting
>> applied. We can just fix things in followups.
>
> I have now gone over the entire patchset, and everything looks perfect;
> I will go as far as saying it is brilliant.  We previously had the
> issue that using different redirect maps in a BPF prog would reduce the
> bulking effect, as map_to_flush caused the previous map to get
> flushed. This is now solved :-)

Another thing that occurred to me while thinking about this: Now that we
have a single flush list, is there any reason we couldn't move the
devmap xdp_bulk_queue into struct net_device? That way it could also be
used for the non-map variant of bpf_redirect()?
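
Something like this, maybe (completely untested, field name made up,
just to illustrate the idea):

  struct net_device {
          ...
          /* percpu bulk queue, drained from the global flush list */
          struct xdp_bulk_queue __percpu *xdp_bulkq;
          ...
  };

Then both bpf_redirect_map() (devmap) and plain bpf_redirect() could
enqueue on the target netdev's bulk queue.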

-Toke


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps
  2019-12-20 10:29       ` Toke Høiland-Jørgensen
@ 2020-01-07 11:10         ` Björn Töpel
  2020-01-07 11:25           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 36+ messages in thread
From: Björn Töpel @ 2020-01-07 11:10 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jesper Dangaard Brouer, Alexei Starovoitov, Network Development,
	Alexei Starovoitov, Daniel Borkmann, bpf, David S. Miller,
	Jakub Kicinski, John Fastabend, Karlsson, Magnus, Jonathan Lemon

On Fri, 20 Dec 2019 at 11:30, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>
[...]
> > I have now gone over the entire patchset, and everything looks perfect;
> > I will go as far as saying it is brilliant.  We previously had the
> > issue that using different redirect maps in a BPF prog would reduce the
> > bulking effect, as map_to_flush caused the previous map to get
> > flushed. This is now solved :-)
>
> Another thing that occurred to me while thinking about this: Now that we
> have a single flush list, is there any reason we couldn't move the
> devmap xdp_bulk_queue into struct net_device? That way it could also be
> used for the non-map variant of bpf_redirect()?
>

Indeed! (At least I don't see any blockers...)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps
  2020-01-07 11:10         ` Björn Töpel
@ 2020-01-07 11:25           ` Toke Høiland-Jørgensen
  2020-01-07 13:05             ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 36+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-07 11:25 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Jesper Dangaard Brouer, Alexei Starovoitov, Network Development,
	Alexei Starovoitov, Daniel Borkmann, bpf, David S. Miller,
	Jakub Kicinski, John Fastabend, Karlsson, Magnus, Jonathan Lemon

Björn Töpel <bjorn.topel@gmail.com> writes:

> On Fri, 20 Dec 2019 at 11:30, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>>
> [...]
>> > I have now gone over the entire patchset, and everything looks perfect;
>> > I will go as far as saying it is brilliant.  We previously had the
>> > issue that using different redirect maps in a BPF prog would reduce the
>> > bulking effect, as map_to_flush caused the previous map to get
>> > flushed. This is now solved :-)
>>
>> Another thing that occurred to me while thinking about this: Now that we
>> have a single flush list, is there any reason we couldn't move the
>> devmap xdp_bulk_queue into struct net_device? That way it could also be
>> used for the non-map variant of bpf_redirect()?
>>
>
> Indeed! (At least I don't see any blockers...)

Cool, that's what I thought. Maybe I'll give that a shot, then, unless
you beat me to it ;)

-Toke


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps
  2020-01-07 11:25           ` Toke Høiland-Jørgensen
@ 2020-01-07 13:05             ` Jesper Dangaard Brouer
  2020-01-07 13:27               ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 36+ messages in thread
From: Jesper Dangaard Brouer @ 2020-01-07 13:05 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Björn Töpel, Alexei Starovoitov, Network Development,
	Alexei Starovoitov, Daniel Borkmann, bpf, David S. Miller,
	Jakub Kicinski, John Fastabend, Karlsson, Magnus, Jonathan Lemon,
	brouer

On Tue, 07 Jan 2020 12:25:47 +0100
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> Björn Töpel <bjorn.topel@gmail.com> writes:
> 
> > On Fri, 20 Dec 2019 at 11:30, Toke Høiland-Jørgensen <toke@redhat.com> wrote:  
> >>
> >> Jesper Dangaard Brouer <brouer@redhat.com> writes:
> >>  
> > [...]  
> >> > I have now gone over the entire patchset, and everything looks perfect;
> >> > I will go as far as saying it is brilliant.  We previously had the
> >> > issue that using different redirect maps in a BPF prog would reduce the
> >> > bulking effect, as map_to_flush caused the previous map to get
> >> > flushed. This is now solved :-)
> >>
> >> Another thing that occurred to me while thinking about this: Now that we
> >> have a single flush list, is there any reason we couldn't move the
> >> devmap xdp_bulk_queue into struct net_device? That way it could also be
> >> used for the non-map variant of bpf_redirect()?
> >>  
> >
> > Indeed! (At least I don't see any blockers...)  
> 
> Cool, that's what I thought. Maybe I'll give that a shot, then, unless
> you beat me to it ;)
 
Generally sounds like a good idea.

Is this only for the devmap xdp_bulk_queue?

Some gotchas off the top of my head.

The cpumap also has a struct xdp_bulk_queue, which has a different
layout. (Sidenote: due to BTF we likely want to rename that.)

If you want to generalize this across all redirect map types, you
should know that it was on purpose that I designed the bulking to be
map specific, because that allowed each map to control its own optimal
bulking.  E.g. devmap does 16-frame bulking, cpumap does 8 frames (as
it matches sending 1 cacheline into the underlying ptr_ring), and xskmap
does 64 AFAIK (which could hurt latency, but that is another discussion).
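
To illustrate, roughly (paraphrased from memory, so double-check the
actual layouts in the tree):

/* devmap: bulks xdp_frames per destination netdev */
#define DEV_MAP_BULK_SIZE 16
struct xdp_bulk_queue {			/* kernel/bpf/devmap.c */
	struct xdp_frame *q[DEV_MAP_BULK_SIZE];
	struct list_head flush_node;
	struct net_device *dev_rx;
	unsigned int count;
};

/* cpumap: bulks raw pointers; one flush pushes ~1 cacheline of
 * entries into the destination CPU's ptr_ring
 */
#define CPU_MAP_BULK_SIZE 8
struct xdp_bulk_queue {			/* kernel/bpf/cpumap.c */
	void *q[CPU_MAP_BULK_SIZE];
	struct bpf_cpu_map_entry *obj;
	struct list_head flush_node;
	unsigned int count;
};

Note that both are called struct xdp_bulk_queue, which is the BTF
naming clash mentioned above.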

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps
  2020-01-07 13:05             ` Jesper Dangaard Brouer
@ 2020-01-07 13:27               ` Toke Høiland-Jørgensen
  2020-01-07 13:52                 ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 36+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-07 13:27 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Björn Töpel, Alexei Starovoitov, Network Development,
	Alexei Starovoitov, Daniel Borkmann, bpf, David S. Miller,
	Jakub Kicinski, John Fastabend, Karlsson, Magnus, Jonathan Lemon,
	brouer

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> On Tue, 07 Jan 2020 12:25:47 +0100
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
>> Björn Töpel <bjorn.topel@gmail.com> writes:
>> 
>> > On Fri, 20 Dec 2019 at 11:30, Toke Høiland-Jørgensen <toke@redhat.com> wrote:  
>> >>
>> >> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>> >>  
>> > [...]  
>> >> > I have now gone over the entire patchset, and everything looks perfect;
>> >> > I will go as far as saying it is brilliant.  We previously had the
>> >> > issue that using different redirect maps in a BPF-prog would cause the
>> >> > bulking effect to be reduced, as map_to_flush caused the previous map
>> >> > to get flushed. This is now solved :-)  
>> >>
>> >> Another thing that occurred to me while thinking about this: Now that we
>> >> have a single flush list, is there any reason we couldn't move the
>> >> devmap xdp_bulk_queue into struct net_device? That way it could also be
>> >> used for the non-map variant of bpf_redirect()?
>> >>  
>> >
>> > Indeed! (At least I don't see any blockers...)  
>> 
>> Cool, that's what I thought. Maybe I'll give that a shot, then, unless
>> you beat me to it ;)
>  
> Generally sounds like a good idea.
>
> Is this only for the devmap xdp_bulk_queue?  

Non-map redirect only supports redirecting across interfaces (the
parameter is an ifindex), so yeah, this would be just for that.
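
For reference, the two variants on the BPF program side look roughly
like this (a sketch; the map definition and the ifindex value are made
up for illustration):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_DEVMAP);
	__uint(max_entries, 64);
	__type(key, __u32);
	__type(value, __u32);
} tx_devmap SEC(".maps");

SEC("xdp")
int redir_map(struct xdp_md *ctx)
{
	/* map variant: frames are bulked via the devmap flush list */
	return bpf_redirect_map(&tx_devmap, 0, 0);
}

SEC("xdp")
int redir_ifindex(struct xdp_md *ctx)
{
	/* non-map variant: takes an ifindex, no bulking today */
	return bpf_redirect(5 /* hypothetical ifindex */, 0);
}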

> Some gotchas off the top of my head.
>
> The cpumap also has a struct xdp_bulk_queue, which has a different
> layout. (Sidenote: due to BTF we likely want to rename that.)
>
> If you want to generalize this across all redirect map types, you
> should know that it was on purpose that I designed the bulking to be
> map specific, because that allowed each map to control its own optimal
> bulking.  E.g. devmap does 16-frame bulking, cpumap does 8 frames (as
> it matches sending 1 cacheline into the underlying ptr_ring), and xskmap
> does 64 AFAIK (which could hurt latency, but that is another discussion).

Björn's patches do leave the per-type behaviour; they just get rid of
the per-map flush queues... :)

-Toke


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps
  2020-01-07 13:27               ` Toke Høiland-Jørgensen
@ 2020-01-07 13:52                 ` Jesper Dangaard Brouer
  2020-01-07 14:18                   ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 36+ messages in thread
From: Jesper Dangaard Brouer @ 2020-01-07 13:52 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Björn Töpel, Alexei Starovoitov, Network Development,
	Alexei Starovoitov, Daniel Borkmann, bpf, David S. Miller,
	Jakub Kicinski, John Fastabend, Karlsson, Magnus, Jonathan Lemon,
	brouer

On Tue, 07 Jan 2020 14:27:41 +0100
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> Jesper Dangaard Brouer <brouer@redhat.com> writes:
> 
> > On Tue, 07 Jan 2020 12:25:47 +0100
> > Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >  
> >> Björn Töpel <bjorn.topel@gmail.com> writes:
> >>   
> >> > On Fri, 20 Dec 2019 at 11:30, Toke Høiland-Jørgensen <toke@redhat.com> wrote:    
> >> >>
> >> >> Jesper Dangaard Brouer <brouer@redhat.com> writes:
> >> >>    
> >> > [...]    
> >> >> > I have now gone over the entire patchset, and everything looks perfect,
> >> >> > I will go as far as saying it is brilliant.  We previously had the
> >> >> > issue that using different redirect maps in a BPF-prog would cause the
> >> >> > bulking effect to be reduced, as map_to_flush caused the previous map
> >> >> > to get flushed. This is now solved :-)    
> >> >>
> >> >> Another thing that occurred to me while thinking about this: Now that we
> >> >> have a single flush list, is there any reason we couldn't move the
> >> >> devmap xdp_bulk_queue into struct net_device? That way it could also be
> >> >> used for the non-map variant of bpf_redirect()?
> >> >>    
> >> >
> >> > Indeed! (At least I don't see any blockers...)    
> >> 
> >> Cool, that's what I thought. Maybe I'll give that a shot, then, unless
> >> you beat me to it ;)  
> >  
> > Generally sounds like a good idea.
> >
> > Is this only for the devmap xdp_bulk_queue?  
> 
> Non-map redirect only supports redirecting across interfaces (the
> parameter is an ifindex), so yeah, this would be just for that.

Sure, then you don't need to worry about the gotchas below.

I do like the idea, as this would/should solve the non-map redirect
performance issue.


> > Some gotchas off the top of my head.
> >
> > The cpumap also has a struct xdp_bulk_queue, which has a different
> > layout. (Sidenote: due to BTF we likely want to rename that.)
> >
> > If you want to generalize this across all redirect map types, you
> > should know that it was on purpose that I designed the bulking to be
> > map specific, because that allowed each map to control its own optimal
> > bulking.  E.g. devmap does 16-frame bulking, cpumap does 8 frames (as
> > it matches sending 1 cacheline into the underlying ptr_ring), and xskmap
> > does 64 AFAIK (which could hurt latency, but that is another discussion).  
> 
> Björn's patches do leave the per-type behaviour; they just get rid of
> the per-map flush queues... :)

Yes, I know ;-)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps
  2020-01-07 13:52                 ` Jesper Dangaard Brouer
@ 2020-01-07 14:18                   ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 36+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-07 14:18 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Björn Töpel, Alexei Starovoitov, Network Development,
	Alexei Starovoitov, Daniel Borkmann, bpf, David S. Miller,
	Jakub Kicinski, John Fastabend, Karlsson, Magnus, Jonathan Lemon,
	brouer

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> On Tue, 07 Jan 2020 14:27:41 +0100
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
>> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>> 
>> > On Tue, 07 Jan 2020 12:25:47 +0100
>> > Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >  
>> >> Björn Töpel <bjorn.topel@gmail.com> writes:
>> >>   
>> >> > On Fri, 20 Dec 2019 at 11:30, Toke Høiland-Jørgensen <toke@redhat.com> wrote:    
>> >> >>
>> >> >> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>> >> >>    
>> >> > [...]    
>> >> >> > I have now gone over the entire patchset, and everything looks perfect,
>> >> >> > I will go as far as saying it is brilliant.  We previously had the
>> >> >> > issue that using different redirect maps in a BPF-prog would cause the
>> >> >> > bulking effect to be reduced, as map_to_flush caused the previous map
>> >> >> > to get flushed. This is now solved :-)    
>> >> >>
>> >> >> Another thing that occurred to me while thinking about this: Now that we
>> >> >> have a single flush list, is there any reason we couldn't move the
>> >> >> devmap xdp_bulk_queue into struct net_device? That way it could also be
>> >> >> used for the non-map variant of bpf_redirect()?
>> >> >>    
>> >> >
>> >> > Indeed! (At least I don't see any blockers...)    
>> >> 
>> >> Cool, that's what I thought. Maybe I'll give that a shot, then, unless
>> >> you beat me to it ;)  
>> >  
>> > Generally sounds like a good idea.
>> >
>> > Is this only for the devmap xdp_bulk_queue?  
>> 
>> Non-map redirect only supports redirecting across interfaces (the
>> parameter is an ifindex), so yeah, this would be just for that.
>
> Sure, then you don't need to worry about the gotchas below.
>
> I do like the idea, as this would/should solve the non-map redirect
> performance issue.

Yes, that was exactly my thought. Taking a stab at it now... :)

>> > Some gotchas off the top of my head.
>> >
>> > The cpumap also has a struct xdp_bulk_queue, which has a different
>> > layout. (Sidenote: due to BTF we likely want to rename that.)
>> >
>> > If you want to generalize this across all redirect map types, you
>> > should know that it was on purpose that I designed the bulking to be
>> > map specific, because that allowed each map to control its own optimal
>> > bulking.  E.g. devmap does 16-frame bulking, cpumap does 8 frames (as
>> > it matches sending 1 cacheline into the underlying ptr_ring), and xskmap
>> > does 64 AFAIK (which could hurt latency, but that is another discussion).  
>> 
>> Björn's patches do leave the per-type behaviour; they just get rid of
>> the per-map flush queues... :)
>
> Yes, I know ;-)

I know you do; was just a bit puzzled why you brought up all that other
stuff, and, well, had to answer something, didn't I? ;)

-Toke


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH bpf-next v2 1/8] xdp: simplify devmap cleanup
  2019-12-19  6:09 ` [PATCH bpf-next v2 1/8] xdp: simplify devmap cleanup Björn Töpel
@ 2020-01-07 17:32   ` John Fastabend
  2020-01-08 10:08     ` Björn Töpel
  0 siblings, 1 reply; 36+ messages in thread
From: John Fastabend @ 2020-01-07 17:32 UTC (permalink / raw)
  To: Björn Töpel, netdev, ast, daniel
  Cc: Björn Töpel, bpf, davem, jakub.kicinski, hawk,
	john.fastabend, magnus.karlsson, jonathan.lemon,
	Toke Høiland-Jørgensen

Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> After the RCU flavor consolidation [1], call_rcu() and
> synchronize_rcu() wait for preempt-disable regions (NAPI) in addition
> to the read-side critical sections. As a result of this, the cleanup
> code in devmap can be simplified:

OK, great, makes sense. One comment below.

> 
> * There is no longer a need to flush in __dev_map_entry_free, since we
>   know that this has been done when the call_rcu() callback is
>   triggered.
> 
> * When freeing the map, there is no need to explicitly wait for a
>   flush. It's guaranteed to be done after the synchronize_rcu() call
>   in dev_map_free(). The rcu_barrier() is still needed, so that the
>   map is not freed prior to the elements.
> 
> [1] https://lwn.net/Articles/777036/
> 
> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  kernel/bpf/devmap.c | 43 +++++--------------------------------------
>  1 file changed, 5 insertions(+), 38 deletions(-)
> 
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index 3d3d61b5985b..b7595de6a91a 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -201,7 +201,7 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
>  static void dev_map_free(struct bpf_map *map)
>  {
>  	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
> -	int i, cpu;
> +	int i;
>  
>  	/* At this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
>  	 * so the programs (can be more than one that used this map) were
> @@ -221,18 +221,6 @@ static void dev_map_free(struct bpf_map *map)
>  	/* Make sure prior __dev_map_entry_free() have completed. */
>  	rcu_barrier();
>  

The comment at the start of this function also needs to be fixed; it says:

  /* At this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
   * so the programs (can be more than one that used this map) were
   * disconnected from events. Wait for outstanding critical sections in
   * these programs to complete. The rcu critical section only guarantees
   * no further reads against netdev_map. It does __not__ ensure pending
   * flush operations (if any) are complete.
   */

The comment in dev_map_delete_elem() also needs an update.

> -	/* To ensure all pending flush operations have completed wait for flush
> -	 * list to empty on _all_ cpus.
> -	 * Because the above synchronize_rcu() ensures the map is disconnected
> -	 * from the program we can assume no new items will be added.
> -	 */
> -	for_each_online_cpu(cpu) {
> -		struct list_head *flush_list = per_cpu_ptr(dtab->flush_list, cpu);
> -
> -		while (!list_empty(flush_list))
> -			cond_resched();
> -	}
> -
>  	if (dtab->map.map_type == BPF_MAP_TYPE_DEVMAP_HASH) {
>  		for (i = 0; i < dtab->n_buckets; i++) {
>  			struct bpf_dtab_netdev *dev;
> @@ -345,8 +333,7 @@ static int dev_map_hash_get_next_key(struct bpf_map *map, void *key,
>  	return -ENOENT;
>  }

Otherwise LGTM, thanks.
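
For the comment at the top of dev_map_free(), something along these
lines might be accurate now (a sketch, not final wording):

	/* At this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
	 * so the programs (can be more than one that used this map) were
	 * disconnected from events. The following synchronize_rcu() guarantees
	 * that both rcu read critical sections have completed, and that any
	 * in-flight NAPI (preempt-disabled) sections, including pending flush
	 * operations, have finished.
	 */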

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH bpf-next v2 3/8] xdp: fix graze->grace type-o in cpumap comments
  2019-12-19  6:10 ` [PATCH bpf-next v2 3/8] xdp: fix graze->grace type-o in cpumap comments Björn Töpel
@ 2020-01-07 17:33   ` John Fastabend
  0 siblings, 0 replies; 36+ messages in thread
From: John Fastabend @ 2020-01-07 17:33 UTC (permalink / raw)
  To: Björn Töpel, netdev, ast, daniel
  Cc: Björn Töpel, bpf, davem, jakub.kicinski, hawk,
	john.fastabend, magnus.karlsson, jonathan.lemon,
	Toke Høiland-Jørgensen

Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> Simple spelling fix.
> 
> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH bpf-next v2 4/8] xsk: make xskmap flush_list common for all map instances
  2019-12-19  6:10 ` [PATCH bpf-next v2 4/8] xsk: make xskmap flush_list common for all map instances Björn Töpel
@ 2020-01-07 17:54   ` John Fastabend
  2020-01-08 10:13     ` Björn Töpel
  0 siblings, 1 reply; 36+ messages in thread
From: John Fastabend @ 2020-01-07 17:54 UTC (permalink / raw)
  To: Björn Töpel, netdev, ast, daniel
  Cc: Björn Töpel, bpf, davem, jakub.kicinski, hawk,
	john.fastabend, magnus.karlsson, jonathan.lemon,
	Toke Høiland-Jørgensen

Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> > The xskmap flush list is used to track entries that need to be flushed
> > via the xdp_do_flush_map() function. This list used to be
> per-map, but there is really no reason for that. Instead make the
> flush list global for all xskmaps, which simplifies __xsk_map_flush()
> and xsk_map_alloc().
> 
> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---

Just to check. The reason this is OK is because xdp_do_flush_map()
is called from NAPI context and is per CPU so the only entries on
the list will be from the current cpu napi context? Even in the case
where multiple xskmaps exist we can't have entries from more than
a single map on any list at the same time by my reading.
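
For reference, my reading of the enqueue path (paraphrasing the patch
from memory, so names may be slightly off):

int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp)
{
	struct list_head *flush_list = this_cpu_ptr(&xskmap_flush_list);
	int err;

	err = xsk_rcv(xs, xdp);
	if (err)
		return err;

	/* Added to this CPU's list only, and flushed at the end of the
	 * same NAPI poll, so there is no cross-CPU access to the list.
	 */
	if (!xs->flush_node.prev)
		list_add(&xs->flush_node, flush_list);

	return 0;
}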

LGTM,
Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH bpf-next v2 5/8] xdp: make devmap flush_list common for all map instances
  2019-12-19  6:10 ` [PATCH bpf-next v2 5/8] xdp: make devmap " Björn Töpel
@ 2020-01-07 17:58   ` John Fastabend
  2020-01-08 10:16     ` Björn Töpel
  0 siblings, 1 reply; 36+ messages in thread
From: John Fastabend @ 2020-01-07 17:58 UTC (permalink / raw)
  To: Björn Töpel, netdev, ast, daniel
  Cc: Björn Töpel, bpf, davem, jakub.kicinski, hawk,
	john.fastabend, magnus.karlsson, jonathan.lemon,
	Toke Høiland-Jørgensen

Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> The devmap flush list is used to track entries that need to be flushed
> via the xdp_do_flush_map() function. This list used to be
> per-map, but there is really no reason for that. Instead make the
> flush list global for all devmaps, which simplifies __dev_map_flush()
> and dev_map_init_map().
> 
> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  include/linux/bpf.h |  4 ++--
>  kernel/bpf/devmap.c | 35 +++++++++++++----------------------
>  net/core/filter.c   |  2 +-
>  3 files changed, 16 insertions(+), 25 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index d467983e61bb..31191804ca09 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -959,7 +959,7 @@ struct sk_buff;
>  
>  struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
>  struct bpf_dtab_netdev *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key);
> -void __dev_map_flush(struct bpf_map *map);
> +void __dev_map_flush(void);
>  int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
>  		    struct net_device *dev_rx);
>  int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
> @@ -1068,7 +1068,7 @@ static inline struct net_device  *__dev_map_hash_lookup_elem(struct bpf_map *map
>  	return NULL;
>  }
>  
> -static inline void __dev_map_flush(struct bpf_map *map)
> +static inline void __dev_map_flush(void)

How about __dev_flush(void) then, since it's not map specific anymore?
Probably same in patch 4/5.

>  {
>  }
>  
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index b7595de6a91a..da9c832fc5c8 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c

[...]
 
> @@ -384,10 +371,9 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags)
>   * net device can be torn down. On devmap tear down we ensure the flush list
>   * is empty before completing to ensure all flush operations have completed.
>   */
> -void __dev_map_flush(struct bpf_map *map)
> +void __dev_map_flush(void)

__dev_flush()?

>  {
> -	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
> -	struct list_head *flush_list = this_cpu_ptr(dtab->flush_list);
> +	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
>  	struct xdp_bulk_queue *bq, *tmp;
>  
>  	rcu_read_lock();

[...]

Looks good; changing the function name would make things a bit cleaner IMO.

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH bpf-next v2 6/8] xdp: make cpumap flush_list common for all map instances
  2019-12-19  6:10 ` [PATCH bpf-next v2 6/8] xdp: make cpumap " Björn Töpel
@ 2020-01-07 17:59   ` John Fastabend
  0 siblings, 0 replies; 36+ messages in thread
From: John Fastabend @ 2020-01-07 17:59 UTC (permalink / raw)
  To: Björn Töpel, netdev, ast, daniel
  Cc: Björn Töpel, bpf, davem, jakub.kicinski, hawk,
	john.fastabend, magnus.karlsson, jonathan.lemon,
	Toke Høiland-Jørgensen

Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> The cpumap flush list is used to track entries that need to be flushed
> via the xdp_do_flush_map() function. This list used to be
> per-map, but there is really no reason for that. Instead make the
> flush list global for all cpumaps, which simplifies __cpu_map_flush()
> and cpu_map_alloc().
> 
> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---

Consider the *map_flush() -> *_flush() change, but that's just a small
nit; it's not too important to me.

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH bpf-next v2 7/8] xdp: remove map_to_flush and map swap detection
  2019-12-19  6:10 ` [PATCH bpf-next v2 7/8] xdp: remove map_to_flush and map swap detection Björn Töpel
@ 2020-01-07 18:15   ` John Fastabend
  2020-01-07 21:07     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 36+ messages in thread
From: John Fastabend @ 2020-01-07 18:15 UTC (permalink / raw)
  To: Björn Töpel, netdev, ast, daniel
  Cc: Björn Töpel, bpf, davem, jakub.kicinski, hawk,
	john.fastabend, magnus.karlsson, jonathan.lemon,
	Toke Høiland-Jørgensen

Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> Now that all XDP maps that can be used with bpf_redirect_map() track
> entries to be flushed in a global fashion, there is no need to track
> that the map has changed and flush from xdp_do_generic_map()
> anymore. All entries will be flushed in xdp_do_flush_map().
> 
> This means that map_to_flush, and the corresponding checks, can be
> removed. Moving the flush logic to one place, xdp_do_flush_map(),
> gives a bulking behavior and performance boost.
> 
> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---

__dev_map_flush() still has rcu_read_lock/unlock() around flush_list by
this point, assuming I've followed along correctly. Can we drop those
now, seeing it's per CPU and all list ops are per-CPU inside NAPI context?

Two reasons to consider: with this patch dev_map_flush() is always
called even if the list is empty, so even in the TX case without a
redirect. But probably more important, it makes the locking requirements
clearer. This could probably be done in a follow-up patch, but I wanted
to bring it up.
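
I.e., something like this, if the per-CPU reasoning holds (a sketch,
untested):

void __dev_map_flush(void)
{
	struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list);
	struct xdp_bulk_queue *bq, *tmp;

	/* No rcu_read_lock() here: the flush list is per-CPU and only
	 * manipulated from NAPI (softirq) context on this CPU.
	 */
	list_for_each_entry_safe(bq, tmp, flush_list, flush_node)
		bq_xmit_all(bq, XDP_XMIT_FLUSH);
}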

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH bpf-next v2 7/8] xdp: remove map_to_flush and map swap detection
  2020-01-07 18:15   ` John Fastabend
@ 2020-01-07 21:07     ` Toke Høiland-Jørgensen
  2020-01-08  3:45       ` John Fastabend
  0 siblings, 1 reply; 36+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-07 21:07 UTC (permalink / raw)
  To: John Fastabend, Björn Töpel, netdev, ast, daniel
  Cc: Björn Töpel, bpf, davem, jakub.kicinski, hawk,
	john.fastabend, magnus.karlsson, jonathan.lemon

John Fastabend <john.fastabend@gmail.com> writes:

> Björn Töpel wrote:
>> From: Björn Töpel <bjorn.topel@intel.com>
>> 
>> Now that all XDP maps that can be used with bpf_redirect_map() track
>> entries to be flushed in a global fashion, there is no need to track
>> that the map has changed and flush from xdp_do_generic_map()
>> anymore. All entries will be flushed in xdp_do_flush_map().
>> 
>> This means that map_to_flush, and the corresponding checks, can be
>> removed. Moving the flush logic to one place, xdp_do_flush_map(),
>> gives a bulking behavior and performance boost.
>> 
>> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>> ---
>
> __dev_map_flush() still has rcu_read_lock/unlock() around flush_list by
> this point, assuming I've followed along correctly. Can we drop those
> now, seeing it's per CPU and all list ops are per-CPU inside NAPI context?

Hmm, I guess so? :)

> Two reasons to consider: with this patch dev_map_flush() is always
> called even if the list is empty, so even in the TX case without a
> redirect. But probably more important, it makes the locking requirements
> clearer. This could probably be done in a follow-up patch, but I wanted
> to bring it up.

This series was already merged, but I'll follow up with the non-map
redirect change. This requires a bit of refactoring anyway, so I can
incorporate the lock removal into that...

-Toke


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH bpf-next v2 7/8] xdp: remove map_to_flush and map swap detection
  2020-01-07 21:07     ` Toke Høiland-Jørgensen
@ 2020-01-08  3:45       ` John Fastabend
  2020-01-08 10:24         ` Björn Töpel
  0 siblings, 1 reply; 36+ messages in thread
From: John Fastabend @ 2020-01-08  3:45 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, John Fastabend,
	Björn Töpel, netdev, ast, daniel
  Cc: Björn Töpel, bpf, davem, jakub.kicinski, hawk,
	john.fastabend, magnus.karlsson, jonathan.lemon

Toke Høiland-Jørgensen wrote:
> John Fastabend <john.fastabend@gmail.com> writes:
> 
> > Björn Töpel wrote:
> >> From: Björn Töpel <bjorn.topel@intel.com>
> >> 
> >> Now that all XDP maps that can be used with bpf_redirect_map() track
> >> entries to be flushed in a global fashion, there is no need to track
> >> that the map has changed and flush from xdp_do_generic_map()
> >> anymore. All entries will be flushed in xdp_do_flush_map().
> >> 
> >> This means that map_to_flush, and the corresponding checks, can be
> >> removed. Moving the flush logic to one place, xdp_do_flush_map(),
> >> gives a bulking behavior and performance boost.
> >> 
> >> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> >> ---
> >
> > __dev_map_flush() still has rcu_read_lock/unlock() around flush_list by
> > this point, assuming I've followed along correctly. Can we drop those
> > now, seeing it's per CPU and all list ops are per-CPU inside NAPI context?
> 
> Hmm, I guess so? :)
> 
> > Two reasons to consider: with this patch dev_map_flush() is always
> > called even if the list is empty, so even in the TX case without a
> > redirect. But probably more important, it makes the locking requirements
> > clearer. This could probably be done in a follow-up patch, but I wanted
> > to bring it up.
> 
> This series was already merged, but I'll follow up with the non-map
> redirect change. This requires a bit of refactoring anyway, so I can
> incorporate the lock removal into that...
> 
> -Toke
> 

Ah, I was just catching up with email and missed it was already applied.

I can also submit a few fixup patches for the comments and this, no problem.

.John

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v2 1/8] xdp: simplify devmap cleanup
  2020-01-07 17:32   ` John Fastabend
@ 2020-01-08 10:08     ` Björn Töpel
  0 siblings, 0 replies; 36+ messages in thread
From: Björn Töpel @ 2020-01-08 10:08 UTC (permalink / raw)
  To: John Fastabend
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, bpf, David Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Karlsson, Magnus, Jonathan Lemon,
	Toke Høiland-Jørgensen

On Tue, 7 Jan 2020 at 18:32, John Fastabend <john.fastabend@gmail.com> wrote:
>
[...]
>
> Otherwise LGTM thanks.

Thanks for the review, John! I'll update the comments in a follow-up!


Björn

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v2 4/8] xsk: make xskmap flush_list common for all map instances
  2020-01-07 17:54   ` John Fastabend
@ 2020-01-08 10:13     ` Björn Töpel
  2020-01-08 15:52       ` John Fastabend
  0 siblings, 1 reply; 36+ messages in thread
From: Björn Töpel @ 2020-01-08 10:13 UTC (permalink / raw)
  To: John Fastabend
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, bpf, David Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Karlsson, Magnus, Jonathan Lemon,
	Toke Høiland-Jørgensen

On Tue, 7 Jan 2020 at 18:54, John Fastabend <john.fastabend@gmail.com> wrote:
>
> Björn Töpel wrote:
> > From: Björn Töpel <bjorn.topel@intel.com>
> >
> > The xskmap flush list is used to track entries that need to be flushed
> > via the xdp_do_flush_map() function. This list used to be
> > per-map, but there is really no reason for that. Instead make the
> > flush list global for all xskmaps, which simplifies __xsk_map_flush()
> > and xsk_map_alloc().
> >
> > Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> > ---
>
> Just to check. The reason this is OK is because xdp_do_flush_map()
> is called from NAPI context and is per CPU so the only entries on
> the list will be from the current cpu napi context?

Correct!

> Even in the case
> where multiple xskmaps exist we can't have entries from more than
> a single map on any list at the same time by my reading.
>

No, there can be entries from different (XSK) maps. Instead of
focusing on maps to flush, focus on *entries* to flush. At the end of
the poll function, all entries (regardless of map origin) will be
flushed. Makes sense?
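
In other words, something like this (a sketch of the post-series flow,
from memory, so names may be slightly off):

static DEFINE_PER_CPU(struct list_head, xskmap_flush_list);

void __xsk_map_flush(void)
{
	struct list_head *flush_list = this_cpu_ptr(&xskmap_flush_list);
	struct xdp_sock *xs, *tmp;

	/* Entries here may come from different xskmaps; all of them
	 * were queued by the program that just ran on this CPU.
	 */
	list_for_each_entry_safe(xs, tmp, flush_list, flush_node) {
		xsk_flush(xs);
		__list_del_clearprev(&xs->flush_node);
	}
}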


Björn


> LGTM,
> Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v2 5/8] xdp: make devmap flush_list common for all map instances
  2020-01-07 17:58   ` John Fastabend
@ 2020-01-08 10:16     ` Björn Töpel
  2020-01-08 10:23       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 36+ messages in thread
From: Björn Töpel @ 2020-01-08 10:16 UTC (permalink / raw)
  To: John Fastabend
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, bpf, David Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Karlsson, Magnus, Jonathan Lemon,
	Toke Høiland-Jørgensen

On Tue, 7 Jan 2020 at 18:58, John Fastabend <john.fastabend@gmail.com> wrote:
>
[...]
> __dev_flush()?
>
[...]
>
> Looks good changing the function name would make things a bit cleaner IMO.
>

Hmm, I actually prefer the _map_ naming, since it's more clear that
"entries from the devmap" are being flushed -- but dev_flush() works
as well! :-) I can send a follow-up with the name change!


Thanks,
Björn

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v2 5/8] xdp: make devmap flush_list common for all map instances
  2020-01-08 10:16     ` Björn Töpel
@ 2020-01-08 10:23       ` Toke Høiland-Jørgensen
  2020-01-08 10:25         ` Björn Töpel
  0 siblings, 1 reply; 36+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-08 10:23 UTC (permalink / raw)
  To: Björn Töpel, John Fastabend
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, bpf, David Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Karlsson, Magnus, Jonathan Lemon

Björn Töpel <bjorn.topel@gmail.com> writes:

> On Tue, 7 Jan 2020 at 18:58, John Fastabend <john.fastabend@gmail.com> wrote:
>>
> [...]
>> __dev_flush()?
>>
> [...]
>>
>> Looks good changing the function name would make things a bit cleaner IMO.
>>
>
> Hmm, I actually prefer the _map_ naming, since it's more clear that
> "entries from the devmap" are being flushed -- but dev_flush() works
> as well! :-) I can send a follow-up with the name change!

Or I can just change it at the point where I'm adding support for
non-map redirect (which is when the _map suffix stops being accurate)? :)

-Toke


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v2 7/8] xdp: remove map_to_flush and map swap detection
  2020-01-08  3:45       ` John Fastabend
@ 2020-01-08 10:24         ` Björn Töpel
  0 siblings, 0 replies; 36+ messages in thread
From: Björn Töpel @ 2020-01-08 10:24 UTC (permalink / raw)
  To: John Fastabend
  Cc: Toke Høiland-Jørgensen, Netdev, Alexei Starovoitov,
	Daniel Borkmann, Björn Töpel, bpf, David Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Karlsson, Magnus,
	Jonathan Lemon

On Wed, 8 Jan 2020 at 04:45, John Fastabend <john.fastabend@gmail.com> wrote:
>
> Toke Høiland-Jørgensen wrote:
[...]
> >
> > This series was already merged, but I'll follow up with the non-map
> > redirect change. This requires a bit of refactoring anyway, so I can
> > incorporate the lock removal into that...
> >
> > -Toke
> >
>
> Ah, I was just catching up with email and missed it was already applied.
>
> I can also submit a few fixup patches for the comments and this, no problem.
>

Ah, perfect! Thanks, John and Toke!


Björn

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v2 5/8] xdp: make devmap flush_list common for all map instances
  2020-01-08 10:23       ` Toke Høiland-Jørgensen
@ 2020-01-08 10:25         ` Björn Töpel
  0 siblings, 0 replies; 36+ messages in thread
From: Björn Töpel @ 2020-01-08 10:25 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: John Fastabend, Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, bpf, David Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Karlsson, Magnus, Jonathan Lemon

On Wed, 8 Jan 2020 at 11:23, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Björn Töpel <bjorn.topel@gmail.com> writes:
>
> > On Tue, 7 Jan 2020 at 18:58, John Fastabend <john.fastabend@gmail.com> wrote:
> >>
> > [...]
> >> __dev_flush()?
> >>
> > [...]
> >>
> >> Looks good changing the function name would make things a bit cleaner IMO.
> >>
> >
> > Hmm, I actually prefer the _map_ naming, since it's more clear that
> > "entries from the devmap" are being flushed -- but dev_flush() works
> > as well! :-) I can send a follow-up with the name change!
>
> Or I can just change it at the point where I'm adding support for
> non-map redirect (which is when the _map suffix stops being accurate)? :)
>

Even better! :-) Thanks!

Björn

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v2 4/8] xsk: make xskmap flush_list common for all map instances
  2020-01-08 10:13     ` Björn Töpel
@ 2020-01-08 15:52       ` John Fastabend
  2020-01-08 16:01         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 36+ messages in thread
From: John Fastabend @ 2020-01-08 15:52 UTC (permalink / raw)
  To: Björn Töpel, John Fastabend
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, bpf, David Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Karlsson, Magnus, Jonathan Lemon,
	Toke Høiland-Jørgensen

Björn Töpel wrote:
> On Tue, 7 Jan 2020 at 18:54, John Fastabend <john.fastabend@gmail.com> wrote:
> >
> > Björn Töpel wrote:
> > > From: Björn Töpel <bjorn.topel@intel.com>
> > >
> > > The xskmap flush list is used to track entries that need to be flushed
> > > via the xdp_do_flush_map() function. This list used to be
> > > per-map, but there is really no reason for that. Instead make the
> > > flush list global for all xskmaps, which simplifies __xsk_map_flush()
> > > and xsk_map_alloc().
> > >
> > > Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
> > > Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> > > ---
> >
> > Just to check. The reason this is OK is because xdp_do_flush_map()
> > is called from NAPI context and is per CPU so the only entries on
> > the list will be from the current cpu napi context?
> 
> Correct!
> 
> > Even in the case
> > where multiple xskmaps exist we can't have entries from more than
> > a single map on any list at the same time by my reading.
> >
> 
> No, there can be entries from different (XSK) maps. Instead of
> focusing on maps to flush, focus on *entries* to flush. At the end of
> the poll function, all entries (regardless of map origin) will be
> flushed. Makes sense?

Ah OK. This would mean that a single program uses multiple maps
though, correct? Because we can only run a single BPF program per
NAPI context.

What I was after was checking that the semantics haven't changed,
which I believe is true.

> 
> 
> Björn
> 
> 
> > LGTM,
> > Acked-by: John Fastabend <john.fastabend@gmail.com>



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH bpf-next v2 4/8] xsk: make xskmap flush_list common for all map instances
  2020-01-08 15:52       ` John Fastabend
@ 2020-01-08 16:01         ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 36+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-08 16:01 UTC (permalink / raw)
  To: John Fastabend, Björn Töpel, John Fastabend
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, bpf, David Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Karlsson, Magnus, Jonathan Lemon

John Fastabend <john.fastabend@gmail.com> writes:

> Björn Töpel wrote:
>> On Tue, 7 Jan 2020 at 18:54, John Fastabend <john.fastabend@gmail.com> wrote:
>> >
>> > Björn Töpel wrote:
>> > > From: Björn Töpel <bjorn.topel@intel.com>
>> > >
>> > > The xskmap flush list is used to track entries that need to be flushed
>> > > via the xdp_do_flush_map() function. This list used to be
>> > > per-map, but there is really no reason for that. Instead make the
>> > > flush list global for all xskmaps, which simplifies __xsk_map_flush()
>> > > and xsk_map_alloc().
>> > >
>> > > Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> > > Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>> > > ---
>> >
>> > Just to check. The reason this is OK is because xdp_do_flush_map()
>> > is called from NAPI context and is per CPU so the only entries on
>> > the list will be from the current cpu napi context?
>> 
>> Correct!
>> 
>> > Even in the case
>> > where multiple xskmaps exist we can't have entries from more than
>> > a single map on any list at the same time by my reading.
>> >
>> 
>> No, there can be entries from different (XSK) maps. Instead of
>> focusing on maps to flush, focus on *entries* to flush. At the end of
>> the poll function, all entries (regardless of map origin) will be
>> flushed. Makes sense?
>
> Ah OK. This would mean that a single program uses multiple maps
> though, correct? Because we can only run a single BPF program per
> NAPI context.

Yeah, there's nothing limiting each program to a single map (of any
type)...

-Toke


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2020-01-08 16:01 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-19  6:09 [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps Björn Töpel
2019-12-19  6:09 ` [PATCH bpf-next v2 1/8] xdp: simplify devmap cleanup Björn Töpel
2020-01-07 17:32   ` John Fastabend
2020-01-08 10:08     ` Björn Töpel
2019-12-19  6:10 ` [PATCH bpf-next v2 2/8] xdp: simplify cpumap cleanup Björn Töpel
2019-12-19  6:10 ` [PATCH bpf-next v2 3/8] xdp: fix graze->grace type-o in cpumap comments Björn Töpel
2020-01-07 17:33   ` John Fastabend
2019-12-19  6:10 ` [PATCH bpf-next v2 4/8] xsk: make xskmap flush_list common for all map instances Björn Töpel
2020-01-07 17:54   ` John Fastabend
2020-01-08 10:13     ` Björn Töpel
2020-01-08 15:52       ` John Fastabend
2020-01-08 16:01         ` Toke Høiland-Jørgensen
2019-12-19  6:10 ` [PATCH bpf-next v2 5/8] xdp: make devmap " Björn Töpel
2020-01-07 17:58   ` John Fastabend
2020-01-08 10:16     ` Björn Töpel
2020-01-08 10:23       ` Toke Høiland-Jørgensen
2020-01-08 10:25         ` Björn Töpel
2019-12-19  6:10 ` [PATCH bpf-next v2 6/8] xdp: make cpumap " Björn Töpel
2020-01-07 17:59   ` John Fastabend
2019-12-19  6:10 ` [PATCH bpf-next v2 7/8] xdp: remove map_to_flush and map swap detection Björn Töpel
2020-01-07 18:15   ` John Fastabend
2020-01-07 21:07     ` Toke Høiland-Jørgensen
2020-01-08  3:45       ` John Fastabend
2020-01-08 10:24         ` Björn Töpel
2019-12-19  6:10 ` [PATCH bpf-next v2 8/8] xdp: simplify __bpf_tx_xdp_map() Björn Töpel
2019-12-19  7:18 ` [PATCH bpf-next v2 0/8] Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP maps Jesper Dangaard Brouer
2019-12-20  5:21 ` Alexei Starovoitov
2019-12-20  7:46   ` Jesper Dangaard Brouer
2019-12-20  9:26     ` Jesper Dangaard Brouer
2019-12-20 10:29       ` Toke Høiland-Jørgensen
2020-01-07 11:10         ` Björn Töpel
2020-01-07 11:25           ` Toke Høiland-Jørgensen
2020-01-07 13:05             ` Jesper Dangaard Brouer
2020-01-07 13:27               ` Toke Høiland-Jørgensen
2020-01-07 13:52                 ` Jesper Dangaard Brouer
2020-01-07 14:18                   ` Toke Høiland-Jørgensen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).