* [PATCH bpf-next 00/12] xsk: clean up ring access functions
@ 2019-12-09  7:56 Magnus Karlsson
  2019-12-09  7:56 ` [PATCH bpf-next 01/12] xsk: eliminate the lazy update threshold Magnus Karlsson
                   ` (12 more replies)
  0 siblings, 13 replies; 22+ messages in thread
From: Magnus Karlsson @ 2019-12-09  7:56 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jonathan.lemon
  Cc: bpf, saeedm, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski

This patch set cleans up the ring access functions of AF_XDP in the
hope that the code becomes easier to understand and maintain. I used
to get a headache every time I looked at this code and tried to
really understand it, but now I find it a lot less painful.

The code has been simplified a lot and, as a bonus, we get better
performance. On my 2.0 GHz Broadwell machine with a standard default
config plus AF_XDP support and CONFIG_PREEMPT enabled, I get the
following performance increases, in percent, with this patch set
compared to without it:

Zero-copy (-N):
          rxdrop        txpush        l2fwd
1 core:     4%            5%            4%
2 cores:    1%            0%            2%

Zero-copy with poll() (-N -p):
          rxdrop        txpush        l2fwd
1 core:     1%            3%            3%
2 cores:   22%            0%            5%

Skb mode (-S):
Shows a 0% to 1% performance improvement over the same benchmarks as
above.

Here 1 core means that we are running the driver processing and the
application on the same core, while 2 cores means that they execute on
separate cores. The applications are from the xdpsock sample app.

When a result says 22% better, as in the case of poll mode with 2
cores and rxdrop, my first reaction is that it must be a
bug. Everything else shows between 0% and 5% performance
improvement, so what is giving rise to 22%? A quick bisect indicates
that patches 2, 3, 4, 5, and 6 are responsible for most of this
improvement. So it is not one patch in particular, but something
around a 4% improvement from each of them. Note that exactly this
benchmark has previously had an extraordinary slowdown compared to
running without poll syscalls. For all the other poll tests above,
the slowdown from using poll syscalls has always been around 4%, but
for the badly performing test in question it was above 25%.
Interestingly, after this clean up, the slowdown is 4%, just like for
all the other poll tests. Please take an extra peek at this to make
sure I have not messed something up.

The 0% for txpush with two cores is due to the test bottlenecking on
a non-CPU HW resource. If I eliminated that bottleneck on my system,
I would expect to see an increase there too.

This patch set has been applied against commit e7096c131e51 ("net: WireGuard secure network tunnel")

Structure of the patch set:

Patch 1: Eliminate the lazy update threshold used when preallocating
         entries in the completion ring
Patch 2: Consolidate the two local producer pointers into one
Patch 3: Standardize the naming of the producer ring access functions
Patch 4: Simplify the detection of empty and full rings
Patch 5: Eliminate the Rx batch size used for the fill ring
Patch 6: Simplify the functions xskq_nb_avail and xskq_nb_free
Patch 7: Simplify and standardize the naming of the consumer ring
         access functions
Patch 8: Change the names of the validation functions to improve
         readability and also the return value of these functions
Patch 9: Change the name of xsk_umem_discard_addr() to
         xsk_umem_release_addr() to better reflect the new
         names. Requires a name change in the drivers that support AF_XDP
         zero-copy.
Patch 10: Remove unnecessary READ_ONCE of data in the ring
Patch 11: Add overall function naming comment and reorder the functions
          for easier reference
Patch 12: Use the struct_size helper function when allocating rings

Thanks: Magnus

Magnus Karlsson (12):
  xsk: eliminate the lazy update threshold
  xsk: consolidate to one single cached producer pointer
  xsk: standardize naming of producer ring access functions
  xsk: simplify detection of empty and full rings
  xsk: eliminate the RX batch size
  xsk: simplify xskq_nb_avail and xskq_nb_free
  xsk: simplify the consumer ring access functions
  xsk: change names of validation functions
  xsk: ixgbe: i40e: ice: mlx5: xsk_umem_discard_addr to
    xsk_umem_release_addr
  xsk: remove unnecessary READ_ONCE of data
  xsk: add function naming comments and reorder functions
  xsk: use struct_size() helper

 drivers/net/ethernet/intel/i40e/i40e_xsk.c         |   4 +-
 drivers/net/ethernet/intel/ice/ice_xsk.c           |   4 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c       |   4 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/rx.c    |   2 +-
 include/net/xdp_sock.h                             |  14 +-
 net/xdp/xsk.c                                      |  62 ++--
 net/xdp/xsk_queue.c                                |  15 +-
 net/xdp/xsk_queue.h                                | 370 +++++++++++----------
 8 files changed, 245 insertions(+), 230 deletions(-)

--
2.7.4


* [PATCH bpf-next 01/12] xsk: eliminate the lazy update threshold
  2019-12-09  7:56 [PATCH bpf-next 00/12] xsk: clean up ring access functions Magnus Karlsson
@ 2019-12-09  7:56 ` Magnus Karlsson
  2019-12-09  7:56 ` [PATCH bpf-next 02/12] xsk: consolidate to one single cached producer pointer Magnus Karlsson
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Magnus Karlsson @ 2019-12-09  7:56 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jonathan.lemon
  Cc: bpf, saeedm, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski

The lazy update threshold was introduced to keep the producer and
consumer some distance apart in the completion ring. This was
important in the beginning of the development of AF_XDP, as the ring
format at that point in time was very sensitive to the producer and
consumer being on the same cache line. This is not the case anymore,
as the current ring format does not degrade in any noticeable way
when this happens. Moreover, this threshold makes it impossible to
run the system with rings that have fewer than 128 entries.

So let us remove this threshold and just ask for one entry from the
ring, as in all the other functions. This will enable us to remove
this function in a later commit. Note that xskq_produce_addr_lazy
followed by xskq_produce_flush_addr_n is still not the same as
xskq_produce_addr(), as it operates on another cached pointer.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 net/xdp/xsk_queue.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index eddae46..a2f0ba6 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -11,7 +11,6 @@
 #include <net/xdp_sock.h>
 
 #define RX_BATCH_SIZE 16
-#define LAZY_UPDATE_THRESHOLD 128
 
 struct xdp_ring {
 	u32 producer ____cacheline_aligned_in_smp;
@@ -239,7 +238,7 @@ static inline int xskq_produce_addr_lazy(struct xsk_queue *q, u64 addr)
 {
 	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
 
-	if (xskq_nb_free(q, q->prod_head, LAZY_UPDATE_THRESHOLD) == 0)
+	if (xskq_nb_free(q, q->prod_head, 1) == 0)
 		return -ENOSPC;
 
 	/* A, matches D */
-- 
2.7.4



* [PATCH bpf-next 02/12] xsk: consolidate to one single cached producer pointer
  2019-12-09  7:56 [PATCH bpf-next 00/12] xsk: clean up ring access functions Magnus Karlsson
  2019-12-09  7:56 ` [PATCH bpf-next 01/12] xsk: eliminate the lazy update threshold Magnus Karlsson
@ 2019-12-09  7:56 ` Magnus Karlsson
  2019-12-10  0:42   ` Martin Lau
  2019-12-13 18:04   ` Maxim Mikityanskiy
  2019-12-09  7:56 ` [PATCH bpf-next 03/12] xsk: standardize naming of producer ring access functions Magnus Karlsson
                   ` (10 subsequent siblings)
  12 siblings, 2 replies; 22+ messages in thread
From: Magnus Karlsson @ 2019-12-09  7:56 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jonathan.lemon
  Cc: bpf, saeedm, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski

Currently, the xsk ring code has two cached producer pointers:
prod_head and prod_tail. This patch consolidates these two into a
single one called cached_prod to make the code simpler and easier to
maintain. This is also in line with the user-space part of the code
found in libbpf, which only uses a single cached pointer.

The Rx path only uses the two top level functions
xskq_produce_batch_desc and xskq_produce_flush_desc and they both use
prod_head and never prod_tail. So just move them over to
cached_prod.

The Tx XDP_DRV path uses xskq_produce_addr_lazy and
xskq_produce_flush_addr_n and unnecessarily operates on both
prod_head and prod_tail, so move them over to just use cached_prod by
skipping the intermediate step of updating prod_tail.

The Tx path in XDP_SKB mode uses xskq_reserve_addr and
xskq_produce_addr. They currently use both cached pointers, but we can
operate on the global producer pointer in xskq_produce_addr since it
has to be updated anyway, thus eliminating the use of both cached
pointers. We can also remove the xskq_nb_free in xskq_produce_addr
since it is already called in xskq_reserve_addr. No need to do it
twice.

When there is only one cached producer pointer, we can also simplify
xskq_nb_free by removing one argument.
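
For illustration, here is a minimal user-space sketch of the scheme
with a single cached producer pointer (not the kernel code; the
helper names and the plain -1 error return are made up for the
example, and the kernel's memory barriers are omitted since the demo
is single-threaded):

#include <stdio.h>

#define NENTRIES 8u
#define RING_MASK (NENTRIES - 1)

struct ring {			/* state shared with the consumer */
	unsigned int producer;
	unsigned int consumer;
	unsigned long long desc[NENTRIES];
};

struct queue {			/* producer-local state */
	unsigned int cached_prod;	/* the single cached producer pointer */
	unsigned int cached_cons;	/* cached copy of ring->consumer */
	struct ring *ring;
};

static int reserve_addr(struct queue *q, unsigned long long addr)
{
	unsigned int free_entries = NENTRIES - (q->cached_prod - q->cached_cons);

	if (!free_entries) {
		/* Looks full: refresh the cached consumer pointer and retry */
		q->cached_cons = q->ring->consumer;
		free_entries = NENTRIES - (q->cached_prod - q->cached_cons);
		if (!free_entries)
			return -1;	/* the kernel returns -ENOSPC here */
	}

	q->ring->desc[q->cached_prod++ & RING_MASK] = addr;
	return 0;
}

static void submit(struct queue *q)
{
	/* In the kernel, smp_wmb() orders the descriptor writes above
	 * before this store of the global producer pointer. */
	q->ring->producer = q->cached_prod;
}

int main(void)
{
	struct ring ring = { 0 };
	struct queue q = { .ring = &ring };

	for (unsigned long long i = 0; i < 4; i++)
		if (reserve_addr(&q, i * 2048))
			break;
	submit(&q);	/* one global update for the whole batch */

	printf("producer=%u consumer=%u\n", ring.producer, ring.consumer);
	return 0;
}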

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 net/xdp/xsk_queue.h | 49 ++++++++++++++++++++++---------------------------
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index a2f0ba6..d88e1a0 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -35,8 +35,7 @@ struct xsk_queue {
 	u64 size;
 	u32 ring_mask;
 	u32 nentries;
-	u32 prod_head;
-	u32 prod_tail;
+	u32 cached_prod;
 	u32 cons_head;
 	u32 cons_tail;
 	struct xdp_ring *ring;
@@ -94,39 +93,39 @@ static inline u64 xskq_nb_invalid_descs(struct xsk_queue *q)
 
 static inline u32 xskq_nb_avail(struct xsk_queue *q, u32 dcnt)
 {
-	u32 entries = q->prod_tail - q->cons_tail;
+	u32 entries = q->cached_prod - q->cons_tail;
 
 	if (entries == 0) {
 		/* Refresh the local pointer */
-		q->prod_tail = READ_ONCE(q->ring->producer);
-		entries = q->prod_tail - q->cons_tail;
+		q->cached_prod = READ_ONCE(q->ring->producer);
+		entries = q->cached_prod - q->cons_tail;
 	}
 
 	return (entries > dcnt) ? dcnt : entries;
 }
 
-static inline u32 xskq_nb_free(struct xsk_queue *q, u32 producer, u32 dcnt)
+static inline u32 xskq_nb_free(struct xsk_queue *q, u32 dcnt)
 {
-	u32 free_entries = q->nentries - (producer - q->cons_tail);
+	u32 free_entries = q->nentries - (q->cached_prod - q->cons_tail);
 
 	if (free_entries >= dcnt)
 		return free_entries;
 
 	/* Refresh the local tail pointer */
 	q->cons_tail = READ_ONCE(q->ring->consumer);
-	return q->nentries - (producer - q->cons_tail);
+	return q->nentries - (q->cached_prod - q->cons_tail);
 }
 
 static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
 {
-	u32 entries = q->prod_tail - q->cons_tail;
+	u32 entries = q->cached_prod - q->cons_tail;
 
 	if (entries >= cnt)
 		return true;
 
 	/* Refresh the local pointer. */
-	q->prod_tail = READ_ONCE(q->ring->producer);
-	entries = q->prod_tail - q->cons_tail;
+	q->cached_prod = READ_ONCE(q->ring->producer);
+	entries = q->cached_prod - q->cons_tail;
 
 	return entries >= cnt;
 }
@@ -220,17 +219,15 @@ static inline void xskq_discard_addr(struct xsk_queue *q)
 static inline int xskq_produce_addr(struct xsk_queue *q, u64 addr)
 {
 	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
-
-	if (xskq_nb_free(q, q->prod_tail, 1) == 0)
-		return -ENOSPC;
+	unsigned int idx = q->ring->producer;
 
 	/* A, matches D */
-	ring->desc[q->prod_tail++ & q->ring_mask] = addr;
+	ring->desc[idx++ & q->ring_mask] = addr;
 
 	/* Order producer and data */
 	smp_wmb(); /* B, matches C */
 
-	WRITE_ONCE(q->ring->producer, q->prod_tail);
+	WRITE_ONCE(q->ring->producer, idx);
 	return 0;
 }
 
@@ -238,11 +235,11 @@ static inline int xskq_produce_addr_lazy(struct xsk_queue *q, u64 addr)
 {
 	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
 
-	if (xskq_nb_free(q, q->prod_head, 1) == 0)
+	if (xskq_nb_free(q, 1) == 0)
 		return -ENOSPC;
 
 	/* A, matches D */
-	ring->desc[q->prod_head++ & q->ring_mask] = addr;
+	ring->desc[q->cached_prod++ & q->ring_mask] = addr;
 	return 0;
 }
 
@@ -252,17 +249,16 @@ static inline void xskq_produce_flush_addr_n(struct xsk_queue *q,
 	/* Order producer and data */
 	smp_wmb(); /* B, matches C */
 
-	q->prod_tail += nb_entries;
-	WRITE_ONCE(q->ring->producer, q->prod_tail);
+	WRITE_ONCE(q->ring->producer, q->ring->producer + nb_entries);
 }
 
 static inline int xskq_reserve_addr(struct xsk_queue *q)
 {
-	if (xskq_nb_free(q, q->prod_head, 1) == 0)
+	if (xskq_nb_free(q, 1) == 0)
 		return -ENOSPC;
 
 	/* A, matches D */
-	q->prod_head++;
+	q->cached_prod++;
 	return 0;
 }
 
@@ -340,11 +336,11 @@ static inline int xskq_produce_batch_desc(struct xsk_queue *q,
 	struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
 	unsigned int idx;
 
-	if (xskq_nb_free(q, q->prod_head, 1) == 0)
+	if (xskq_nb_free(q, 1) == 0)
 		return -ENOSPC;
 
 	/* A, matches D */
-	idx = (q->prod_head++) & q->ring_mask;
+	idx = q->cached_prod++ & q->ring_mask;
 	ring->desc[idx].addr = addr;
 	ring->desc[idx].len = len;
 
@@ -356,8 +352,7 @@ static inline void xskq_produce_flush_desc(struct xsk_queue *q)
 	/* Order producer and data */
 	smp_wmb(); /* B, matches C */
 
-	q->prod_tail = q->prod_head;
-	WRITE_ONCE(q->ring->producer, q->prod_tail);
+	WRITE_ONCE(q->ring->producer, q->cached_prod);
 }
 
 static inline bool xskq_full_desc(struct xsk_queue *q)
@@ -367,7 +362,7 @@ static inline bool xskq_full_desc(struct xsk_queue *q)
 
 static inline bool xskq_empty_desc(struct xsk_queue *q)
 {
-	return xskq_nb_free(q, q->prod_tail, q->nentries) == q->nentries;
+	return xskq_nb_free(q, q->nentries) == q->nentries;
 }
 
 void xskq_set_umem(struct xsk_queue *q, u64 size, u64 chunk_mask);
-- 
2.7.4



* [PATCH bpf-next 03/12] xsk: standardize naming of producer ring access functions
  2019-12-09  7:56 [PATCH bpf-next 00/12] xsk: clean up ring access functions Magnus Karlsson
  2019-12-09  7:56 ` [PATCH bpf-next 01/12] xsk: eliminate the lazy update threshold Magnus Karlsson
  2019-12-09  7:56 ` [PATCH bpf-next 02/12] xsk: consolidate to one single cached producer pointer Magnus Karlsson
@ 2019-12-09  7:56 ` Magnus Karlsson
  2019-12-09  7:56 ` [PATCH bpf-next 04/12] xsk: simplify detection of empty and full rings Magnus Karlsson
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Magnus Karlsson @ 2019-12-09  7:56 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jonathan.lemon
  Cc: bpf, saeedm, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski

Adopt a naming convention for the producer ring access functions
that is similar to the one used by the functions in libbpf, but
adapted to the kernel: you first reserve a number of entries that
you later submit to the global state of the ring. This is much
clearer, IMO, than the previous naming in the kernel. Once renamed,
we also discover that two functions are actually the same, so remove
one of them. Some of the primitive ring submission operations are
also the same, so break these out into __xskq_prod_submit that the
upper-level ring access functions can use.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 net/xdp/xsk.c       | 18 ++++++++---------
 net/xdp/xsk_queue.h | 56 +++++++++++++++++++++++++----------------------------
 2 files changed, 35 insertions(+), 39 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 9567938..76013f0 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -165,7 +165,7 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 
 	offset += metalen;
 	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
-	err = xskq_produce_batch_desc(xs->rx, addr, len);
+	err = xskq_prod_reserve_desc(xs->rx, addr, len);
 	if (!err) {
 		xskq_discard_addr(xs->umem->fq);
 		xdp_return_buff(xdp);
@@ -178,7 +178,7 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 
 static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 {
-	int err = xskq_produce_batch_desc(xs->rx, (u64)xdp->handle, len);
+	int err = xskq_prod_reserve_desc(xs->rx, xdp->handle, len);
 
 	if (err)
 		xs->rx_dropped++;
@@ -214,7 +214,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 
 static void xsk_flush(struct xdp_sock *xs)
 {
-	xskq_produce_flush_desc(xs->rx);
+	xskq_prod_submit(xs->rx);
 	xs->sk.sk_data_ready(&xs->sk);
 }
 
@@ -245,12 +245,12 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 	memcpy(buffer, xdp->data_meta, len + metalen);
 
 	addr = xsk_umem_adjust_offset(xs->umem, addr, metalen);
-	err = xskq_produce_batch_desc(xs->rx, addr, len);
+	err = xskq_prod_reserve_desc(xs->rx, addr, len);
 	if (err)
 		goto out_drop;
 
 	xskq_discard_addr(xs->umem->fq);
-	xskq_produce_flush_desc(xs->rx);
+	xskq_prod_submit(xs->rx);
 
 	spin_unlock_bh(&xs->rx_lock);
 
@@ -295,7 +295,7 @@ void __xsk_map_flush(struct bpf_map *map)
 
 void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries)
 {
-	xskq_produce_flush_addr_n(umem->cq, nb_entries);
+	xskq_prod_submit_n(umem->cq, nb_entries);
 }
 EXPORT_SYMBOL(xsk_umem_complete_tx);
 
@@ -320,7 +320,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc)
 		if (!xskq_peek_desc(xs->tx, desc, umem))
 			continue;
 
-		if (xskq_produce_addr_lazy(umem->cq, desc->addr))
+		if (xskq_prod_reserve_addr(umem->cq, desc->addr))
 			goto out;
 
 		xskq_discard_desc(xs->tx);
@@ -349,7 +349,7 @@ static void xsk_destruct_skb(struct sk_buff *skb)
 	unsigned long flags;
 
 	spin_lock_irqsave(&xs->tx_completion_lock, flags);
-	WARN_ON_ONCE(xskq_produce_addr(xs->umem->cq, addr));
+	xskq_prod_submit_addr(xs->umem->cq, addr);
 	spin_unlock_irqrestore(&xs->tx_completion_lock, flags);
 
 	sock_wfree(skb);
@@ -390,7 +390,7 @@ static int xsk_generic_xmit(struct sock *sk)
 		addr = desc.addr;
 		buffer = xdp_umem_get_data(xs->umem, addr);
 		err = skb_store_bits(skb, 0, buffer, len);
-		if (unlikely(err) || xskq_reserve_addr(xs->umem->cq)) {
+		if (unlikely(err) || xskq_prod_reserve(xs->umem->cq)) {
 			kfree_skb(skb);
 			goto out;
 		}
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index d88e1a0..202d402 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -216,22 +216,17 @@ static inline void xskq_discard_addr(struct xsk_queue *q)
 	q->cons_tail++;
 }
 
-static inline int xskq_produce_addr(struct xsk_queue *q, u64 addr)
+static inline int xskq_prod_reserve(struct xsk_queue *q)
 {
-	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
-	unsigned int idx = q->ring->producer;
+	if (xskq_nb_free(q, 1) == 0)
+		return -ENOSPC;
 
 	/* A, matches D */
-	ring->desc[idx++ & q->ring_mask] = addr;
-
-	/* Order producer and data */
-	smp_wmb(); /* B, matches C */
-
-	WRITE_ONCE(q->ring->producer, idx);
+	q->cached_prod++;
 	return 0;
 }
 
-static inline int xskq_produce_addr_lazy(struct xsk_queue *q, u64 addr)
+static inline int xskq_prod_reserve_addr(struct xsk_queue *q, u64 addr)
 {
 	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
 
@@ -243,23 +238,32 @@ static inline int xskq_produce_addr_lazy(struct xsk_queue *q, u64 addr)
 	return 0;
 }
 
-static inline void xskq_produce_flush_addr_n(struct xsk_queue *q,
-					     u32 nb_entries)
+static inline void __xskq_prod_submit(struct xsk_queue *q, u32 idx)
 {
 	/* Order producer and data */
 	smp_wmb(); /* B, matches C */
 
-	WRITE_ONCE(q->ring->producer, q->ring->producer + nb_entries);
+	WRITE_ONCE(q->ring->producer, idx);
 }
 
-static inline int xskq_reserve_addr(struct xsk_queue *q)
+static inline void xskq_prod_submit(struct xsk_queue *q)
 {
-	if (xskq_nb_free(q, 1) == 0)
-		return -ENOSPC;
+	__xskq_prod_submit(q, q->cached_prod);
+}
 
-	/* A, matches D */
-	q->cached_prod++;
-	return 0;
+static inline void xskq_prod_submit_addr(struct xsk_queue *q, u64 addr)
+{
+	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
+	u32 idx = q->ring->producer;
+
+	ring->desc[idx++ & q->ring_mask] = addr;
+
+	__xskq_prod_submit(q, idx);
+}
+
+static inline void xskq_prod_submit_n(struct xsk_queue *q, u32 nb_entries)
+{
+	__xskq_prod_submit(q, q->ring->producer + nb_entries);
 }
 
 /* Rx/Tx queue */
@@ -330,11 +334,11 @@ static inline void xskq_discard_desc(struct xsk_queue *q)
 	q->cons_tail++;
 }
 
-static inline int xskq_produce_batch_desc(struct xsk_queue *q,
-					  u64 addr, u32 len)
+static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
+					 u64 addr, u32 len)
 {
 	struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
-	unsigned int idx;
+	u32 idx;
 
 	if (xskq_nb_free(q, 1) == 0)
 		return -ENOSPC;
@@ -347,14 +351,6 @@ static inline int xskq_produce_batch_desc(struct xsk_queue *q,
 	return 0;
 }
 
-static inline void xskq_produce_flush_desc(struct xsk_queue *q)
-{
-	/* Order producer and data */
-	smp_wmb(); /* B, matches C */
-
-	WRITE_ONCE(q->ring->producer, q->cached_prod);
-}
-
 static inline bool xskq_full_desc(struct xsk_queue *q)
 {
 	return xskq_nb_avail(q, q->nentries) == q->nentries;
-- 
2.7.4



* [PATCH bpf-next 04/12] xsk: simplify detection of empty and full rings
  2019-12-09  7:56 [PATCH bpf-next 00/12] xsk: clean up ring access functions Magnus Karlsson
                   ` (2 preceding siblings ...)
  2019-12-09  7:56 ` [PATCH bpf-next 03/12] xsk: standardize naming of producer ring access functions Magnus Karlsson
@ 2019-12-09  7:56 ` Magnus Karlsson
  2019-12-09  7:56 ` [PATCH bpf-next 05/12] xsk: eliminate the RX batch size Magnus Karlsson
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Magnus Karlsson @ 2019-12-09  7:56 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jonathan.lemon
  Cc: bpf, saeedm, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski

In order to set the correct return flags for poll, the xsk code has
to check if the Rx queue is empty and if the Tx queue is full. This
code was unnecessarily large and complex, as it used the functions
that update the local state from the global state (xskq_nb_free and
xskq_nb_avail). Since we are neither doing such an update nor
touching any data that depends on this state, these checks can be
simplified. Another benefit of this is that we can also simplify the
xskq_nb_free and xskq_nb_avail functions in a later commit.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 net/xdp/xsk.c       |  4 ++--
 net/xdp/xsk_queue.h | 10 ++++++----
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 76013f0..dbeb4e4 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -471,9 +471,9 @@ static __poll_t xsk_poll(struct file *file, struct socket *sock,
 			__xsk_sendmsg(sk);
 	}
 
-	if (xs->rx && !xskq_empty_desc(xs->rx))
+	if (xs->rx && !xskq_prod_is_empty(xs->rx))
 		mask |= EPOLLIN | EPOLLRDNORM;
-	if (xs->tx && !xskq_full_desc(xs->tx))
+	if (xs->tx && !xskq_cons_is_full(xs->tx))
 		mask |= EPOLLOUT | EPOLLWRNORM;
 
 	return mask;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 202d402..85358af 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -351,14 +351,16 @@ static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
 	return 0;
 }
 
-static inline bool xskq_full_desc(struct xsk_queue *q)
+static inline bool xskq_cons_is_full(struct xsk_queue *q)
 {
-	return xskq_nb_avail(q, q->nentries) == q->nentries;
+	/* No barriers needed since data is not accessed */
+	return READ_ONCE(q->ring->producer) - q->cons_tail == q->nentries;
 }
 
-static inline bool xskq_empty_desc(struct xsk_queue *q)
+static inline bool xskq_prod_is_empty(struct xsk_queue *q)
 {
-	return xskq_nb_free(q, q->nentries) == q->nentries;
+	/* No barriers needed since data is not accessed */
+	return READ_ONCE(q->ring->consumer) == q->cached_prod;
 }
 
 void xskq_set_umem(struct xsk_queue *q, u64 size, u64 chunk_mask);
-- 
2.7.4



* [PATCH bpf-next 05/12] xsk: eliminate the RX batch size
  2019-12-09  7:56 [PATCH bpf-next 00/12] xsk: clean up ring access functions Magnus Karlsson
                   ` (3 preceding siblings ...)
  2019-12-09  7:56 ` [PATCH bpf-next 04/12] xsk: simplify detection of empty and full rings Magnus Karlsson
@ 2019-12-09  7:56 ` Magnus Karlsson
  2019-12-09 10:16   ` Sergei Shtylyov
  2019-12-09  7:56 ` [PATCH bpf-next 06/12] xsk: simplify xskq_nb_avail and xskq_nb_free Magnus Karlsson
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 22+ messages in thread
From: Magnus Karlsson @ 2019-12-09  7:56 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jonathan.lemon
  Cc: bpf, saeedm, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski

In the xsk consumer ring code there is a variable called
RX_BATCH_SIZE that dictates the minimum number of entries that we try
to grab from the fill and Tx rings. In fact, the code always tries to
grab the maximum number of entries from these rings. The only thing
this variable does is to throw an error if there are fewer than 16
entries (the value it is defined to) on the ring. There is no reason
to do this, and it just leads to weird behavior from user space's
point of view. So eliminate this variable.

With this change, we will be able to simplify the xskq_nb_free and
xskq_nb_avail code in the next commit.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 net/xdp/xsk_queue.h | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 85358af..e3ae62d 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -10,8 +10,6 @@
 #include <linux/if_xdp.h>
 #include <net/xdp_sock.h>
 
-#define RX_BATCH_SIZE 16
-
 struct xdp_ring {
 	u32 producer ____cacheline_aligned_in_smp;
 	u32 consumer ____cacheline_aligned_in_smp;
@@ -202,7 +200,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
 		WRITE_ONCE(q->ring->consumer, q->cons_tail);
-		q->cons_head = q->cons_tail + xskq_nb_avail(q, RX_BATCH_SIZE);
+		q->cons_head = q->cons_tail + xskq_nb_avail(q, 1);
 
 		/* Order consumer and data */
 		smp_rmb();
@@ -320,7 +318,7 @@ static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
 		WRITE_ONCE(q->ring->consumer, q->cons_tail);
-		q->cons_head = q->cons_tail + xskq_nb_avail(q, RX_BATCH_SIZE);
+		q->cons_head = q->cons_tail + xskq_nb_avail(q, 1);
 
 		/* Order consumer and data */
 		smp_rmb(); /* C, matches B */
-- 
2.7.4



* [PATCH bpf-next 06/12] xsk: simplify xskq_nb_avail and xskq_nb_free
  2019-12-09  7:56 [PATCH bpf-next 00/12] xsk: clean up ring access functions Magnus Karlsson
                   ` (4 preceding siblings ...)
  2019-12-09  7:56 ` [PATCH bpf-next 05/12] xsk: eliminate the RX batch size Magnus Karlsson
@ 2019-12-09  7:56 ` Magnus Karlsson
  2019-12-09  7:56 ` [PATCH bpf-next 07/12] xsk: simplify the consumer ring access functions Magnus Karlsson
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Magnus Karlsson @ 2019-12-09  7:56 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jonathan.lemon
  Cc: bpf, saeedm, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski

At this point, there are no users of the functions xskq_nb_avail and
xskq_nb_free that pass any number of entries other than 1, so let us
get rid of the second argument that takes the number of entries.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 net/xdp/xsk_queue.h | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index e3ae62d..deda03c 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -89,7 +89,7 @@ static inline u64 xskq_nb_invalid_descs(struct xsk_queue *q)
 	return q ? q->invalid_descs : 0;
 }
 
-static inline u32 xskq_nb_avail(struct xsk_queue *q, u32 dcnt)
+static inline u32 xskq_nb_avail(struct xsk_queue *q)
 {
 	u32 entries = q->cached_prod - q->cons_tail;
 
@@ -99,19 +99,21 @@ static inline u32 xskq_nb_avail(struct xsk_queue *q, u32 dcnt)
 		entries = q->cached_prod - q->cons_tail;
 	}
 
-	return (entries > dcnt) ? dcnt : entries;
+	return entries;
 }
 
-static inline u32 xskq_nb_free(struct xsk_queue *q, u32 dcnt)
+static inline bool xskq_prod_is_full(struct xsk_queue *q)
 {
 	u32 free_entries = q->nentries - (q->cached_prod - q->cons_tail);
 
-	if (free_entries >= dcnt)
-		return free_entries;
+	if (free_entries)
+		return false;
 
 	/* Refresh the local tail pointer */
 	q->cons_tail = READ_ONCE(q->ring->consumer);
-	return q->nentries - (q->cached_prod - q->cons_tail);
+	free_entries = q->nentries - (q->cached_prod - q->cons_tail);
+
+	return !free_entries;
 }
 
 static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
@@ -200,7 +202,7 @@ static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
 		WRITE_ONCE(q->ring->consumer, q->cons_tail);
-		q->cons_head = q->cons_tail + xskq_nb_avail(q, 1);
+		q->cons_head = q->cons_tail + xskq_nb_avail(q);
 
 		/* Order consumer and data */
 		smp_rmb();
@@ -216,7 +218,7 @@ static inline void xskq_discard_addr(struct xsk_queue *q)
 
 static inline int xskq_prod_reserve(struct xsk_queue *q)
 {
-	if (xskq_nb_free(q, 1) == 0)
+	if (xskq_prod_is_full(q))
 		return -ENOSPC;
 
 	/* A, matches D */
@@ -228,7 +230,7 @@ static inline int xskq_prod_reserve_addr(struct xsk_queue *q, u64 addr)
 {
 	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
 
-	if (xskq_nb_free(q, 1) == 0)
+	if (xskq_prod_is_full(q))
 		return -ENOSPC;
 
 	/* A, matches D */
@@ -318,7 +320,7 @@ static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
 	if (q->cons_tail == q->cons_head) {
 		smp_mb(); /* D, matches A */
 		WRITE_ONCE(q->ring->consumer, q->cons_tail);
-		q->cons_head = q->cons_tail + xskq_nb_avail(q, 1);
+		q->cons_head = q->cons_tail + xskq_nb_avail(q);
 
 		/* Order consumer and data */
 		smp_rmb(); /* C, matches B */
@@ -338,7 +340,7 @@ static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
 	struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
 	u32 idx;
 
-	if (xskq_nb_free(q, 1) == 0)
+	if (xskq_prod_is_full(q))
 		return -ENOSPC;
 
 	/* A, matches D */
-- 
2.7.4



* [PATCH bpf-next 07/12] xsk: simplify the consumer ring access functions
  2019-12-09  7:56 [PATCH bpf-next 00/12] xsk: clean up ring access functions Magnus Karlsson
                   ` (5 preceding siblings ...)
  2019-12-09  7:56 ` [PATCH bpf-next 06/12] xsk: simplify xskq_nb_avail and xskq_nb_free Magnus Karlsson
@ 2019-12-09  7:56 ` Magnus Karlsson
  2019-12-09  7:56 ` [PATCH bpf-next 08/12] xsk: change names of validation functions Magnus Karlsson
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Magnus Karlsson @ 2019-12-09  7:56 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jonathan.lemon
  Cc: bpf, saeedm, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski

Simplify and refactor the consumer ring functions. The consumer
first "peeks" to find descriptors or addresses that are available to
read from the ring, then reads them and finally "releases" these
descriptors once it is done with them. The two local variables
cons_tail and cons_head are turned into one single variable called
cached_cons. cons_tail referred to the cached value of the global
consumer pointer and is now stored in cached_cons. For cons_head, we
just use cached_prod instead, as it was not used for a consumer
queue before. This also better reflects what it really is now: a
cached copy of the producer pointer.

The functions are also renamed in the same manner as the producer
functions. The new functions are called xskq_cons_ followed by what
they do.
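
For illustration, here is a minimal user-space sketch of the
resulting peek/release scheme (not the kernel code; the helper names
are made up for the example and the kernel's memory barriers are only
noted in comments):

#include <stdio.h>

#define NENTRIES 8u
#define RING_MASK (NENTRIES - 1)

struct ring {			/* state shared with the producer */
	unsigned int producer;
	unsigned int consumer;
	unsigned long long desc[NENTRIES];
};

struct queue {			/* consumer-local state */
	unsigned int cached_prod;	/* cached copy of ring->producer */
	unsigned int cached_cons;	/* local consumer position */
	struct ring *ring;
};

static void cons_get_entries(struct queue *q)
{
	/* Release everything consumed so far, then refresh our copy of
	 * the producer pointer. In the kernel, smp_mb() and smp_rmb()
	 * surround these two accesses. */
	q->ring->consumer = q->cached_cons;
	q->cached_prod = q->ring->producer;
}

static int cons_peek_addr(struct queue *q, unsigned long long *addr)
{
	if (q->cached_prod == q->cached_cons)
		cons_get_entries(q);
	if (q->cached_prod == q->cached_cons)
		return 0;			/* nothing available */
	*addr = q->ring->desc[q->cached_cons & RING_MASK];
	return 1;
}

static void cons_release(struct queue *q)
{
	q->cached_cons++;	/* local only; published lazily later */
}

int main(void)
{
	struct ring ring = { .producer = 3,
			     .desc = { 0x1000, 0x2000, 0x3000 } };
	struct queue q = { .ring = &ring };
	unsigned long long addr;

	while (cons_peek_addr(&q, &addr)) {
		printf("got addr 0x%llx\n", addr);
		cons_release(&q);
	}
	/* Prints 3: the release was published by cons_get_entries() */
	printf("global consumer: %u\n", ring.consumer);
	return 0;
}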

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 net/xdp/xsk.c       |  22 ++++++------
 net/xdp/xsk_queue.h | 102 ++++++++++++++++++++++++----------------------------
 2 files changed, 57 insertions(+), 67 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index dbeb4e4..fb13b64 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -39,19 +39,19 @@ bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs)
 
 bool xsk_umem_has_addrs(struct xdp_umem *umem, u32 cnt)
 {
-	return xskq_has_addrs(umem->fq, cnt);
+	return xskq_cons_has_entries(umem->fq, cnt);
 }
 EXPORT_SYMBOL(xsk_umem_has_addrs);
 
 u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
 {
-	return xskq_peek_addr(umem->fq, addr, umem);
+	return xskq_cons_peek_addr(umem->fq, addr, umem);
 }
 EXPORT_SYMBOL(xsk_umem_peek_addr);
 
 void xsk_umem_discard_addr(struct xdp_umem *umem)
 {
-	xskq_discard_addr(umem->fq);
+	xskq_cons_release(umem->fq);
 }
 EXPORT_SYMBOL(xsk_umem_discard_addr);
 
@@ -146,7 +146,7 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 	u32 metalen;
 	int err;
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
+	if (!xskq_cons_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		xs->rx_dropped++;
 		return -ENOSPC;
@@ -167,7 +167,7 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 	addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
 	err = xskq_prod_reserve_desc(xs->rx, addr, len);
 	if (!err) {
-		xskq_discard_addr(xs->umem->fq);
+		xskq_cons_release(xs->umem->fq);
 		xdp_return_buff(xdp);
 		return 0;
 	}
@@ -234,7 +234,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 		goto out_unlock;
 	}
 
-	if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
+	if (!xskq_cons_peek_addr(xs->umem->fq, &addr, xs->umem) ||
 	    len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
 		err = -ENOSPC;
 		goto out_drop;
@@ -249,7 +249,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 	if (err)
 		goto out_drop;
 
-	xskq_discard_addr(xs->umem->fq);
+	xskq_cons_release(xs->umem->fq);
 	xskq_prod_submit(xs->rx);
 
 	spin_unlock_bh(&xs->rx_lock);
@@ -317,13 +317,13 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
-		if (!xskq_peek_desc(xs->tx, desc, umem))
+		if (!xskq_cons_peek_desc(xs->tx, desc, umem))
 			continue;
 
 		if (xskq_prod_reserve_addr(umem->cq, desc->addr))
 			goto out;
 
-		xskq_discard_desc(xs->tx);
+		xskq_cons_release(xs->tx);
 		rcu_read_unlock();
 		return true;
 	}
@@ -369,7 +369,7 @@ static int xsk_generic_xmit(struct sock *sk)
 	if (xs->queue_id >= xs->dev->real_num_tx_queues)
 		goto out;
 
-	while (xskq_peek_desc(xs->tx, &desc, xs->umem)) {
+	while (xskq_cons_peek_desc(xs->tx, &desc, xs->umem)) {
 		char *buffer;
 		u64 addr;
 		u32 len;
@@ -402,7 +402,7 @@ static int xsk_generic_xmit(struct sock *sk)
 		skb->destructor = xsk_destruct_skb;
 
 		err = dev_direct_xmit(skb, xs->queue_id);
-		xskq_discard_desc(xs->tx);
+		xskq_cons_release(xs->tx);
 		/* Ignore NET_XMIT_CN as packet might have been sent */
 		if (err == NET_XMIT_DROP || err == NETDEV_TX_BUSY) {
 			/* SKB completed but not sent */
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index deda03c..588bdc5 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -34,8 +34,7 @@ struct xsk_queue {
 	u32 ring_mask;
 	u32 nentries;
 	u32 cached_prod;
-	u32 cons_head;
-	u32 cons_tail;
+	u32 cached_cons;
 	struct xdp_ring *ring;
 	u64 invalid_descs;
 };
@@ -89,43 +88,48 @@ static inline u64 xskq_nb_invalid_descs(struct xsk_queue *q)
 	return q ? q->invalid_descs : 0;
 }
 
-static inline u32 xskq_nb_avail(struct xsk_queue *q)
+static inline void __xskq_cons_release(struct xsk_queue *q)
 {
-	u32 entries = q->cached_prod - q->cons_tail;
+	smp_mb(); /* D, matches A */
+	WRITE_ONCE(q->ring->consumer, q->cached_cons);
+}
 
-	if (entries == 0) {
-		/* Refresh the local pointer */
-		q->cached_prod = READ_ONCE(q->ring->producer);
-		entries = q->cached_prod - q->cons_tail;
-	}
+static inline void __xskq_cons_peek(struct xsk_queue *q)
+{
+	/* Refresh the local pointer */
+	q->cached_prod = READ_ONCE(q->ring->producer);
+	smp_rmb(); /* C, matches B */
+}
 
-	return entries;
+static inline void xskq_cons_get_entries(struct xsk_queue *q)
+{
+	__xskq_cons_release(q);
+	__xskq_cons_peek(q);
 }
 
 static inline bool xskq_prod_is_full(struct xsk_queue *q)
 {
-	u32 free_entries = q->nentries - (q->cached_prod - q->cons_tail);
+	u32 free_entries = q->nentries - (q->cached_prod - q->cached_cons);
 
 	if (free_entries)
 		return false;
 
 	/* Refresh the local tail pointer */
-	q->cons_tail = READ_ONCE(q->ring->consumer);
-	free_entries = q->nentries - (q->cached_prod - q->cons_tail);
+	q->cached_cons = READ_ONCE(q->ring->consumer);
+	free_entries = q->nentries - (q->cached_prod - q->cached_cons);
 
 	return !free_entries;
 }
 
-static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
+static inline bool xskq_cons_has_entries(struct xsk_queue *q, u32 cnt)
 {
-	u32 entries = q->cached_prod - q->cons_tail;
+	u32 entries = q->cached_prod - q->cached_cons;
 
 	if (entries >= cnt)
 		return true;
 
-	/* Refresh the local pointer. */
-	q->cached_prod = READ_ONCE(q->ring->producer);
-	entries = q->cached_prod - q->cons_tail;
+	__xskq_cons_peek(q);
+	entries = q->cached_prod - q->cached_cons;
 
 	return entries >= cnt;
 }
@@ -172,9 +176,10 @@ static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
 static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
 				      struct xdp_umem *umem)
 {
-	while (q->cons_tail != q->cons_head) {
-		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
-		unsigned int idx = q->cons_tail & q->ring_mask;
+	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
+
+	while (q->cached_cons != q->cached_prod) {
+		u32 idx = q->cached_cons & q->ring_mask;
 
 		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
 
@@ -190,30 +195,27 @@ static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
 			return addr;
 
 out:
-		q->cons_tail++;
+		q->cached_cons++;
 	}
 
 	return NULL;
 }
 
-static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr,
-				  struct xdp_umem *umem)
+static inline u64 *xskq_cons_peek_addr(struct xsk_queue *q, u64 *addr,
+				       struct xdp_umem *umem)
 {
-	if (q->cons_tail == q->cons_head) {
-		smp_mb(); /* D, matches A */
-		WRITE_ONCE(q->ring->consumer, q->cons_tail);
-		q->cons_head = q->cons_tail + xskq_nb_avail(q);
-
-		/* Order consumer and data */
-		smp_rmb();
-	}
-
+	if (q->cached_prod == q->cached_cons)
+		xskq_cons_get_entries(q);
 	return xskq_validate_addr(q, addr, umem);
 }
 
-static inline void xskq_discard_addr(struct xsk_queue *q)
+static inline void xskq_cons_release(struct xsk_queue *q)
 {
-	q->cons_tail++;
+	/* To improve performance, only update local state here.
+	 * Do the actual release operation when we get new entries
+	 * from the ring in xskq_cons_get_entries() instead.
+	 */
+	q->cached_cons++;
 }
 
 static inline int xskq_prod_reserve(struct xsk_queue *q)
@@ -299,41 +301,29 @@ static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
 						  struct xdp_desc *desc,
 						  struct xdp_umem *umem)
 {
-	while (q->cons_tail != q->cons_head) {
+	while (q->cached_cons != q->cached_prod) {
 		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
-		unsigned int idx = q->cons_tail & q->ring_mask;
+		u32 idx = q->cached_cons & q->ring_mask;
 
 		*desc = READ_ONCE(ring->desc[idx]);
 		if (xskq_is_valid_desc(q, desc, umem))
 			return desc;
 
-		q->cons_tail++;
+		q->cached_cons++;
 	}
 
 	return NULL;
 }
 
-static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
-					      struct xdp_desc *desc,
-					      struct xdp_umem *umem)
+static inline struct xdp_desc *xskq_cons_peek_desc(struct xsk_queue *q,
+						   struct xdp_desc *desc,
+						   struct xdp_umem *umem)
 {
-	if (q->cons_tail == q->cons_head) {
-		smp_mb(); /* D, matches A */
-		WRITE_ONCE(q->ring->consumer, q->cons_tail);
-		q->cons_head = q->cons_tail + xskq_nb_avail(q);
-
-		/* Order consumer and data */
-		smp_rmb(); /* C, matches B */
-	}
-
+	if (q->cached_prod == q->cached_cons)
+		xskq_cons_get_entries(q);
 	return xskq_validate_desc(q, desc, umem);
 }
 
-static inline void xskq_discard_desc(struct xsk_queue *q)
-{
-	q->cons_tail++;
-}
-
 static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
 					 u64 addr, u32 len)
 {
@@ -354,7 +344,7 @@ static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
 static inline bool xskq_cons_is_full(struct xsk_queue *q)
 {
 	/* No barriers needed since data is not accessed */
-	return READ_ONCE(q->ring->producer) - q->cons_tail == q->nentries;
+	return READ_ONCE(q->ring->producer) - q->cached_cons == q->nentries;
 }
 
 static inline bool xskq_prod_is_empty(struct xsk_queue *q)
-- 
2.7.4



* [PATCH bpf-next 08/12] xsk: change names of validation functions
  2019-12-09  7:56 [PATCH bpf-next 00/12] xsk: clean up ring access functions Magnus Karlsson
                   ` (6 preceding siblings ...)
  2019-12-09  7:56 ` [PATCH bpf-next 07/12] xsk: simplify the consumer ring access functions Magnus Karlsson
@ 2019-12-09  7:56 ` Magnus Karlsson
  2019-12-09  7:56 ` [PATCH bpf-next 09/12] xsk: ixgbe: i40e: ice: mlx5: xsk_umem_discard_addr to xsk_umem_release_addr Magnus Karlsson
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Magnus Karlsson @ 2019-12-09  7:56 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jonathan.lemon
  Cc: bpf, saeedm, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski

Change the names of the validation functions to better reflect what
they are doing. The uppermost ones are reading entries from the rings
and only the bottom ones validate entries. So xskq_cons_read_ is a
better prefix name.

Also change the xskq_cons_read_ functions to return a bool, as the
descriptor or address is already returned by reference in the
parameters. Everyone is using the return value as a bool anyway.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 include/net/xdp_sock.h |  4 ++--
 net/xdp/xsk.c          |  4 ++--
 net/xdp/xsk_queue.h    | 59 ++++++++++++++++++++++++++------------------------
 3 files changed, 35 insertions(+), 32 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index e3780e4..0742aa9 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -119,7 +119,7 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp);
 bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs);
 /* Used from netdev driver */
 bool xsk_umem_has_addrs(struct xdp_umem *umem, u32 cnt);
-u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr);
+bool xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr);
 void xsk_umem_discard_addr(struct xdp_umem *umem);
 void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries);
 bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc);
@@ -199,7 +199,7 @@ static inline bool xsk_umem_has_addrs_rq(struct xdp_umem *umem, u32 cnt)
 	return xsk_umem_has_addrs(umem, cnt - rq->length);
 }
 
-static inline u64 *xsk_umem_peek_addr_rq(struct xdp_umem *umem, u64 *addr)
+static inline bool xsk_umem_peek_addr_rq(struct xdp_umem *umem, u64 *addr)
 {
 	struct xdp_umem_fq_reuse *rq = umem->fq_reuse;
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index fb13b64..62e27a4 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -43,7 +43,7 @@ bool xsk_umem_has_addrs(struct xdp_umem *umem, u32 cnt)
 }
 EXPORT_SYMBOL(xsk_umem_has_addrs);
 
-u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
+bool xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
 {
 	return xskq_cons_peek_addr(umem->fq, addr, umem);
 }
@@ -124,7 +124,7 @@ static void __xsk_rcv_memcpy(struct xdp_umem *umem, u64 addr, void *from_buf,
 	void *to_buf = xdp_umem_get_data(umem, addr);
 
 	addr = xsk_umem_add_offset_to_addr(addr);
-	if (xskq_crosses_non_contig_pg(umem, addr, len + metalen)) {
+	if (xskq_cons_crosses_non_contig_pg(umem, addr, len + metalen)) {
 		void *next_pg_addr = umem->pages[(addr >> PAGE_SHIFT) + 1].addr;
 		u64 page_start = addr & ~(PAGE_SIZE - 1);
 		u64 first_len = PAGE_SIZE - (addr - page_start);
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 588bdc5..7566f64 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -136,8 +136,9 @@ static inline bool xskq_cons_has_entries(struct xsk_queue *q, u32 cnt)
 
 /* UMEM queue */
 
-static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, u64 addr,
-					      u64 length)
+static inline bool xskq_cons_crosses_non_contig_pg(struct xdp_umem *umem,
+						   u64 addr,
+						   u64 length)
 {
 	bool cross_pg = (addr & (PAGE_SIZE - 1)) + length > PAGE_SIZE;
 	bool next_pg_contig =
@@ -147,7 +148,7 @@ static inline bool xskq_crosses_non_contig_pg(struct xdp_umem *umem, u64 addr,
 	return cross_pg && !next_pg_contig;
 }
 
-static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
+static inline bool xskq_cons_is_valid_addr(struct xsk_queue *q, u64 addr)
 {
 	if (addr >= q->size) {
 		q->invalid_descs++;
@@ -157,7 +158,8 @@ static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 	return true;
 }
 
-static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
+static inline bool xskq_cons_is_valid_unaligned(struct xsk_queue *q,
+						u64 addr,
 						u64 length,
 						struct xdp_umem *umem)
 {
@@ -165,7 +167,7 @@ static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
 
 	addr = xsk_umem_add_offset_to_addr(addr);
 	if (base_addr >= q->size || addr >= q->size ||
-	    xskq_crosses_non_contig_pg(umem, addr, length)) {
+	    xskq_cons_crosses_non_contig_pg(umem, addr, length)) {
 		q->invalid_descs++;
 		return false;
 	}
@@ -173,8 +175,8 @@ static inline bool xskq_is_valid_addr_unaligned(struct xsk_queue *q, u64 addr,
 	return true;
 }
 
-static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
-				      struct xdp_umem *umem)
+static inline bool xskq_cons_read_addr(struct xsk_queue *q, u64 *addr,
+				       struct xdp_umem *umem)
 {
 	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
 
@@ -184,29 +186,29 @@ static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr,
 		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
 
 		if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
-			if (xskq_is_valid_addr_unaligned(q, *addr,
+			if (xskq_cons_is_valid_unaligned(q, *addr,
 							 umem->chunk_size_nohr,
 							 umem))
-				return addr;
+				return true;
 			goto out;
 		}
 
-		if (xskq_is_valid_addr(q, *addr))
-			return addr;
+		if (xskq_cons_is_valid_addr(q, *addr))
+			return true;
 
 out:
 		q->cached_cons++;
 	}
 
-	return NULL;
+	return false;
 }
 
-static inline u64 *xskq_cons_peek_addr(struct xsk_queue *q, u64 *addr,
+static inline bool xskq_cons_peek_addr(struct xsk_queue *q, u64 *addr,
 				       struct xdp_umem *umem)
 {
 	if (q->cached_prod == q->cached_cons)
 		xskq_cons_get_entries(q);
-	return xskq_validate_addr(q, addr, umem);
+	return xskq_cons_read_addr(q, addr, umem);
 }
 
 static inline void xskq_cons_release(struct xsk_queue *q)
@@ -270,11 +272,12 @@ static inline void xskq_prod_submit_n(struct xsk_queue *q, u32 nb_entries)
 
 /* Rx/Tx queue */
 
-static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d,
-				      struct xdp_umem *umem)
+static inline bool xskq_cons_is_valid_desc(struct xsk_queue *q,
+					   struct xdp_desc *d,
+					   struct xdp_umem *umem)
 {
 	if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
-		if (!xskq_is_valid_addr_unaligned(q, d->addr, d->len, umem))
+		if (!xskq_cons_is_valid_unaligned(q, d->addr, d->len, umem))
 			return false;
 
 		if (d->len > umem->chunk_size_nohr || d->options) {
@@ -285,7 +288,7 @@ static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d,
 		return true;
 	}
 
-	if (!xskq_is_valid_addr(q, d->addr))
+	if (!xskq_cons_is_valid_addr(q, d->addr))
 		return false;
 
 	if (((d->addr + d->len) & q->chunk_mask) != (d->addr & q->chunk_mask) ||
@@ -297,31 +300,31 @@ static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d,
 	return true;
 }
 
-static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
-						  struct xdp_desc *desc,
-						  struct xdp_umem *umem)
+static inline bool xskq_cons_read_desc(struct xsk_queue *q,
+				       struct xdp_desc *desc,
+				       struct xdp_umem *umem)
 {
 	while (q->cached_cons != q->cached_prod) {
 		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
 		u32 idx = q->cached_cons & q->ring_mask;
 
 		*desc = READ_ONCE(ring->desc[idx]);
-		if (xskq_is_valid_desc(q, desc, umem))
-			return desc;
+		if (xskq_cons_is_valid_desc(q, desc, umem))
+			return true;
 
 		q->cached_cons++;
 	}
 
-	return NULL;
+	return false;
 }
 
-static inline struct xdp_desc *xskq_cons_peek_desc(struct xsk_queue *q,
-						   struct xdp_desc *desc,
-						   struct xdp_umem *umem)
+static inline bool xskq_cons_peek_desc(struct xsk_queue *q,
+				       struct xdp_desc *desc,
+				       struct xdp_umem *umem)
 {
 	if (q->cached_prod == q->cached_cons)
 		xskq_cons_get_entries(q);
-	return xskq_validate_desc(q, desc, umem);
+	return xskq_cons_read_desc(q, desc, umem);
 }
 
 static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
-- 
2.7.4



* [PATCH bpf-next 09/12] xsk: ixgbe: i40e: ice: mlx5: xsk_umem_discard_addr to xsk_umem_release_addr
  2019-12-09  7:56 [PATCH bpf-next 00/12] xsk: clean up ring access functions Magnus Karlsson
                   ` (7 preceding siblings ...)
  2019-12-09  7:56 ` [PATCH bpf-next 08/12] xsk: change names of validation functions Magnus Karlsson
@ 2019-12-09  7:56 ` Magnus Karlsson
  2019-12-09  7:56 ` [PATCH bpf-next 10/12] xsk: remove unnecessary READ_ONCE of data Magnus Karlsson
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Magnus Karlsson @ 2019-12-09  7:56 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jonathan.lemon
  Cc: bpf, saeedm, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski

Change the name of xsk_umem_discard_addr to xsk_umem_release_addr to
better reflect the new naming of the AF_XDP queue manipulation
functions. As this function is used by drivers implementing AF_XDP
zero-copy support, the name change also requires changes in those
drivers. The function xsk_umem_discard_addr_rq has been renamed
xsk_umem_release_addr_rq in the same fashion.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c          |  4 ++--
 drivers/net/ethernet/intel/ice/ice_xsk.c            |  4 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c        |  4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c |  2 +-
 include/net/xdp_sock.h                              | 10 +++++-----
 net/xdp/xsk.c                                       |  4 ++--
 6 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index d07e1a8..20d6f11 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -269,7 +269,7 @@ static bool i40e_alloc_buffer_zc(struct i40e_ring *rx_ring,
 
 	bi->handle = xsk_umem_adjust_offset(umem, handle, umem->headroom);
 
-	xsk_umem_discard_addr(umem);
+	xsk_umem_release_addr(umem);
 	return true;
 }
 
@@ -306,7 +306,7 @@ static bool i40e_alloc_buffer_slow_zc(struct i40e_ring *rx_ring,
 
 	bi->handle = xsk_umem_adjust_offset(umem, handle, umem->headroom);
 
-	xsk_umem_discard_addr_rq(umem);
+	xsk_umem_release_addr_rq(umem);
 	return true;
 }
 
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index cf9b8b2..0af1f0b 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -555,7 +555,7 @@ ice_alloc_buf_fast_zc(struct ice_ring *rx_ring, struct ice_rx_buf *rx_buf)
 
 	rx_buf->handle = handle + umem->headroom;
 
-	xsk_umem_discard_addr(umem);
+	xsk_umem_release_addr(umem);
 	return true;
 }
 
@@ -591,7 +591,7 @@ ice_alloc_buf_slow_zc(struct ice_ring *rx_ring, struct ice_rx_buf *rx_buf)
 
 	rx_buf->handle = handle + umem->headroom;
 
-	xsk_umem_discard_addr_rq(umem);
+	xsk_umem_release_addr_rq(umem);
 	return true;
 }
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index d6feaac..1501d0a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -277,7 +277,7 @@ static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
 
 	bi->handle = xsk_umem_adjust_offset(umem, handle, umem->headroom);
 
-	xsk_umem_discard_addr(umem);
+	xsk_umem_release_addr(umem);
 	return true;
 }
 
@@ -304,7 +304,7 @@ static bool ixgbe_alloc_buffer_slow_zc(struct ixgbe_ring *rx_ring,
 
 	bi->handle = xsk_umem_adjust_offset(umem, handle, umem->headroom);
 
-	xsk_umem_discard_addr_rq(umem);
+	xsk_umem_release_addr_rq(umem);
 	return true;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index 475b6bd..62fc8a1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -35,7 +35,7 @@ int mlx5e_xsk_page_alloc_umem(struct mlx5e_rq *rq,
 	 */
 	dma_info->addr = xdp_umem_get_dma(umem, handle);
 
-	xsk_umem_discard_addr_rq(umem);
+	xsk_umem_release_addr_rq(umem);
 
 	dma_sync_single_for_device(rq->pdev, dma_info->addr, PAGE_SIZE,
 				   DMA_BIDIRECTIONAL);
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 0742aa9..b23f60d 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -120,7 +120,7 @@ bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs);
 /* Used from netdev driver */
 bool xsk_umem_has_addrs(struct xdp_umem *umem, u32 cnt);
 bool xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr);
-void xsk_umem_discard_addr(struct xdp_umem *umem);
+void xsk_umem_release_addr(struct xdp_umem *umem);
 void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries);
 bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc);
 void xsk_umem_consume_tx_done(struct xdp_umem *umem);
@@ -210,12 +210,12 @@ static inline bool xsk_umem_peek_addr_rq(struct xdp_umem *umem, u64 *addr)
 	return addr;
 }
 
-static inline void xsk_umem_discard_addr_rq(struct xdp_umem *umem)
+static inline void xsk_umem_release_addr_rq(struct xdp_umem *umem)
 {
 	struct xdp_umem_fq_reuse *rq = umem->fq_reuse;
 
 	if (!rq->length)
-		xsk_umem_discard_addr(umem);
+		xsk_umem_release_addr(umem);
 	else
 		rq->length--;
 }
@@ -260,7 +260,7 @@ static inline u64 *xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
 	return NULL;
 }
 
-static inline void xsk_umem_discard_addr(struct xdp_umem *umem)
+static inline void xsk_umem_release_addr(struct xdp_umem *umem)
 {
 }
 
@@ -334,7 +334,7 @@ static inline u64 *xsk_umem_peek_addr_rq(struct xdp_umem *umem, u64 *addr)
 	return NULL;
 }
 
-static inline void xsk_umem_discard_addr_rq(struct xdp_umem *umem)
+static inline void xsk_umem_release_addr_rq(struct xdp_umem *umem)
 {
 }
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 62e27a4..f0dd740 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -49,11 +49,11 @@ bool xsk_umem_peek_addr(struct xdp_umem *umem, u64 *addr)
 }
 EXPORT_SYMBOL(xsk_umem_peek_addr);
 
-void xsk_umem_discard_addr(struct xdp_umem *umem)
+void xsk_umem_release_addr(struct xdp_umem *umem)
 {
 	xskq_cons_release(umem->fq);
 }
-EXPORT_SYMBOL(xsk_umem_discard_addr);
+EXPORT_SYMBOL(xsk_umem_release_addr);
 
 void xsk_set_rx_need_wakeup(struct xdp_umem *umem)
 {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next 10/12] xsk: remove unnecessary READ_ONCE of data
  2019-12-09  7:56 [PATCH bpf-next 00/12] xsk: clean up ring access functions Magnus Karlsson
                   ` (8 preceding siblings ...)
  2019-12-09  7:56 ` [PATCH bpf-next 09/12] xsk: ixgbe: i40e: ice: mlx5: xsk_umem_discard_addr to xsk_umem_release_addr Magnus Karlsson
@ 2019-12-09  7:56 ` Magnus Karlsson
  2019-12-09  7:56 ` [PATCH bpf-next 11/12] xsk: add function naming comments and reorder functions Magnus Karlsson
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Magnus Karlsson @ 2019-12-09  7:56 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jonathan.lemon
  Cc: bpf, saeedm, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski

There are two unnecessary READ_ONCE invocations on descriptor data.
They are not needed since the data is written by the producer before
it signals that the data is available by incrementing the producer
pointer. As access to this producer pointer is serialized and the
consumer only reads the descriptor after it has read and synchronized
with the producer counter, the write of the descriptor has fully
completed by then, so any read tearing on the consumer side is
harmless.
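
For illustration only and not part of the patch: below is a minimal
userspace sketch of the ordering this relies on, written with C11
atomics instead of the kernel's barrier and ONCE macros. All names in
it are made up. The producer publishes the new producer index with a
release store after writing the descriptor, and the consumer reads the
index with an acquire load, so its subsequent plain read of the
descriptor cannot observe a torn or partial write.

#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 256

struct toy_ring {
	_Atomic uint32_t producer;	/* shared producer index */
	uint32_t cached_cons;		/* consumer-local index */
	uint64_t desc[RING_SIZE];	/* descriptor data */
};

/* Single producer: write the descriptor first, then publish the index.
 * The full-ring check is omitted to keep the sketch short.
 */
static void toy_produce(struct toy_ring *r, uint64_t val)
{
	uint32_t prod = atomic_load_explicit(&r->producer,
					     memory_order_relaxed);

	r->desc[prod % RING_SIZE] = val;		/* plain store */
	atomic_store_explicit(&r->producer, prod + 1,
			      memory_order_release);	/* publish */
}

/* Single consumer: acquire the index; a plain read of desc[] is then
 * ordered after the producer's write of that entry.
 */
static int toy_consume(struct toy_ring *r, uint64_t *val)
{
	uint32_t prod = atomic_load_explicit(&r->producer,
					     memory_order_acquire);

	if (prod == r->cached_cons)
		return 0;				/* ring empty */
	*val = r->desc[r->cached_cons++ % RING_SIZE];	/* no READ_ONCE */
	return 1;
}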

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 net/xdp/xsk_queue.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 7566f64..89e1fbd 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -183,7 +183,7 @@ static inline bool xskq_cons_read_addr(struct xsk_queue *q, u64 *addr,
 	while (q->cached_cons != q->cached_prod) {
 		u32 idx = q->cached_cons & q->ring_mask;
 
-		*addr = READ_ONCE(ring->desc[idx]) & q->chunk_mask;
+		*addr = ring->desc[idx] & q->chunk_mask;
 
 		if (umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG) {
 			if (xskq_cons_is_valid_unaligned(q, *addr,
@@ -308,7 +308,7 @@ static inline bool xskq_cons_read_desc(struct xsk_queue *q,
 		struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
 		u32 idx = q->cached_cons & q->ring_mask;
 
-		*desc = READ_ONCE(ring->desc[idx]);
+		*desc = ring->desc[idx];
 		if (xskq_cons_is_valid_desc(q, desc, umem))
 			return true;
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next 11/12] xsk: add function naming comments and reorder functions
  2019-12-09  7:56 [PATCH bpf-next 00/12] xsk: clean up ring access functions Magnus Karlsson
                   ` (9 preceding siblings ...)
  2019-12-09  7:56 ` [PATCH bpf-next 10/12] xsk: remove unnecessary READ_ONCE of data Magnus Karlsson
@ 2019-12-09  7:56 ` Magnus Karlsson
  2019-12-09  7:56 ` [PATCH bpf-next 12/12] xsk: use struct_size() helper Magnus Karlsson
  2019-12-10  1:02 ` [PATCH bpf-next 00/12] xsk: clean up ring access functions Martin Lau
  12 siblings, 0 replies; 22+ messages in thread
From: Magnus Karlsson @ 2019-12-09  7:56 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jonathan.lemon
  Cc: bpf, saeedm, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski

Add comments on how the ring access functions are named and how they
are supposed to be used for producers and consumers. The functions are
also reordered so that the consumer functions come first and the
producer functions last, for easier reference. This is put in a
separate patch as the diff might otherwise look a little odd, but no
functionality is changed by it.
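
To illustrate the convention, here is a usage sketch only, not code
from the patch; the rx, tx and umem variables are placeholders for a
driver's Rx ring, Tx ring and umem. A producer reserves and writes
entries and then submits them, while a consumer peeks at entries,
reads them and finally releases them:

/* Producer side, e.g. filling the Rx ring: */
if (xskq_prod_reserve_desc(rx, addr, len))	/* RESERVE + WRITE */
	return -ENOSPC;				/* ring full */
/* ... reserve and write more entries if desired ... */
xskq_prod_submit(rx);				/* SUBMIT to the consumer */

/* Consumer side, e.g. draining the Tx ring: */
struct xdp_desc desc;

while (xskq_cons_peek_desc(tx, &desc, umem)) {	/* PEEK + READ */
	/* use desc.addr and desc.len */
	xskq_cons_release(tx);			/* RELEASE the entry */
}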

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 net/xdp/xsk.c       |  10 ++
 net/xdp/xsk_queue.h | 291 ++++++++++++++++++++++++++++------------------------
 2 files changed, 166 insertions(+), 135 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index f0dd740..e1ffcf1 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -320,6 +320,11 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc)
 		if (!xskq_cons_peek_desc(xs->tx, desc, umem))
 			continue;
 
+		/* This is the backpressure mechanism for the Tx path.
+		 * Reserve space in the completion queue and only proceed
+		 * if there is space in it. This avoids having to implement
+		 * any buffering in the Tx path.
+		 */
 		if (xskq_prod_reserve_addr(umem->cq, desc->addr))
 			goto out;
 
@@ -390,6 +395,11 @@ static int xsk_generic_xmit(struct sock *sk)
 		addr = desc.addr;
 		buffer = xdp_umem_get_data(xs->umem, addr);
 		err = skb_store_bits(skb, 0, buffer, len);
+		/* This is the backpressure mechanism for the Tx path.
+		 * Reserve space in the completion queue and only proceed
+		 * if there is space in it. This avoids having to implement
+		 * any buffering in the Tx path.
+		 */
 		if (unlikely(err) || xskq_prod_reserve(xs->umem->cq)) {
 			kfree_skb(skb);
 			goto out;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 89e1fbd..81a88c1 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -81,60 +81,27 @@ struct xsk_queue {
  * now and again after circling through the ring.
  */
 
-/* Common functions operating for both RXTX and umem queues */
-
-static inline u64 xskq_nb_invalid_descs(struct xsk_queue *q)
-{
-	return q ? q->invalid_descs : 0;
-}
-
-static inline void __xskq_cons_release(struct xsk_queue *q)
-{
-	smp_mb(); /* D, matches A */
-	WRITE_ONCE(q->ring->consumer, q->cached_cons);
-}
-
-static inline void __xskq_cons_peek(struct xsk_queue *q)
-{
-	/* Refresh the local pointer */
-	q->cached_prod = READ_ONCE(q->ring->producer);
-	smp_rmb(); /* C, matches B */
-}
-
-static inline void xskq_cons_get_entries(struct xsk_queue *q)
-{
-	__xskq_cons_release(q);
-	__xskq_cons_peek(q);
-}
-
-static inline bool xskq_prod_is_full(struct xsk_queue *q)
-{
-	u32 free_entries = q->nentries - (q->cached_prod - q->cached_cons);
-
-	if (free_entries)
-		return false;
-
-	/* Refresh the local tail pointer */
-	q->cached_cons = READ_ONCE(q->ring->consumer);
-	free_entries = q->nentries - (q->cached_prod - q->cached_cons);
-
-	return !free_entries;
-}
-
-static inline bool xskq_cons_has_entries(struct xsk_queue *q, u32 cnt)
-{
-	u32 entries = q->cached_prod - q->cached_cons;
-
-	if (entries >= cnt)
-		return true;
-
-	__xskq_cons_peek(q);
-	entries = q->cached_prod - q->cached_cons;
-
-	return entries >= cnt;
-}
+/* The operations on the rings are the following:
+ *
+ * producer                           consumer
+ *
+ * RESERVE entries                    PEEK in the ring for entries
+ * WRITE data into the ring           READ data from the ring
+ * SUBMIT entries                     RELEASE entries
+ *
+ * The producer reserves one or more entries in the ring. It can then
+ * fill in these entries and finally submit them so that they can be
+ * seen and read by the consumer.
+ *
+ * The consumer peeks into the ring to see if the producer has written
+ * any new entries. If so, the consumer can then read these entries
+ * and when it is done reading them release them back to the producer
+ * so that the producer can use these slots to fill in new entries.
+ *
+ * The function names below reflect these operations.
+ */
 
-/* UMEM queue */
+/* Functions that read and validate content from consumer rings. */
 
 static inline bool xskq_cons_crosses_non_contig_pg(struct xdp_umem *umem,
 						   u64 addr,
@@ -148,16 +115,6 @@ static inline bool xskq_cons_crosses_non_contig_pg(struct xdp_umem *umem,
 	return cross_pg && !next_pg_contig;
 }
 
-static inline bool xskq_cons_is_valid_addr(struct xsk_queue *q, u64 addr)
-{
-	if (addr >= q->size) {
-		q->invalid_descs++;
-		return false;
-	}
-
-	return true;
-}
-
 static inline bool xskq_cons_is_valid_unaligned(struct xsk_queue *q,
 						u64 addr,
 						u64 length,
@@ -175,6 +132,16 @@ static inline bool xskq_cons_is_valid_unaligned(struct xsk_queue *q,
 	return true;
 }
 
+static inline bool xskq_cons_is_valid_addr(struct xsk_queue *q, u64 addr)
+{
+	if (addr >= q->size) {
+		q->invalid_descs++;
+		return false;
+	}
+
+	return true;
+}
+
 static inline bool xskq_cons_read_addr(struct xsk_queue *q, u64 *addr,
 				       struct xdp_umem *umem)
 {
@@ -203,75 +170,6 @@ static inline bool xskq_cons_read_addr(struct xsk_queue *q, u64 *addr,
 	return false;
 }
 
-static inline bool xskq_cons_peek_addr(struct xsk_queue *q, u64 *addr,
-				       struct xdp_umem *umem)
-{
-	if (q->cached_prod == q->cached_cons)
-		xskq_cons_get_entries(q);
-	return xskq_cons_read_addr(q, addr, umem);
-}
-
-static inline void xskq_cons_release(struct xsk_queue *q)
-{
-	/* To improve performance, only update local state here.
-	 * Do the actual release operation when we get new entries
-	 * from the ring in xskq_cons_get_entries() instead.
-	 */
-	q->cached_cons++;
-}
-
-static inline int xskq_prod_reserve(struct xsk_queue *q)
-{
-	if (xskq_prod_is_full(q))
-		return -ENOSPC;
-
-	/* A, matches D */
-	q->cached_prod++;
-	return 0;
-}
-
-static inline int xskq_prod_reserve_addr(struct xsk_queue *q, u64 addr)
-{
-	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
-
-	if (xskq_prod_is_full(q))
-		return -ENOSPC;
-
-	/* A, matches D */
-	ring->desc[q->cached_prod++ & q->ring_mask] = addr;
-	return 0;
-}
-
-static inline void __xskq_prod_submit(struct xsk_queue *q, u32 idx)
-{
-	/* Order producer and data */
-	smp_wmb(); /* B, matches C */
-
-	WRITE_ONCE(q->ring->producer, idx);
-}
-
-static inline void xskq_prod_submit(struct xsk_queue *q)
-{
-	__xskq_prod_submit(q, q->cached_prod);
-}
-
-static inline void xskq_prod_submit_addr(struct xsk_queue *q, u64 addr)
-{
-	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
-	u32 idx = q->ring->producer;
-
-	ring->desc[idx++ & q->ring_mask] = addr;
-
-	__xskq_prod_submit(q, idx);
-}
-
-static inline void xskq_prod_submit_n(struct xsk_queue *q, u32 nb_entries)
-{
-	__xskq_prod_submit(q, q->ring->producer + nb_entries);
-}
-
-/* Rx/Tx queue */
-
 static inline bool xskq_cons_is_valid_desc(struct xsk_queue *q,
 					   struct xdp_desc *d,
 					   struct xdp_umem *umem)
@@ -318,6 +216,48 @@ static inline bool xskq_cons_read_desc(struct xsk_queue *q,
 	return false;
 }
 
+/* Functions for consumers */
+
+static inline void __xskq_cons_release(struct xsk_queue *q)
+{
+	smp_mb(); /* D, matches A */
+	WRITE_ONCE(q->ring->consumer, q->cached_cons);
+}
+
+static inline void __xskq_cons_peek(struct xsk_queue *q)
+{
+	/* Refresh the local pointer */
+	q->cached_prod = READ_ONCE(q->ring->producer);
+	smp_rmb(); /* C, matches B */
+}
+
+static inline void xskq_cons_get_entries(struct xsk_queue *q)
+{
+	__xskq_cons_release(q);
+	__xskq_cons_peek(q);
+}
+
+static inline bool xskq_cons_has_entries(struct xsk_queue *q, u32 cnt)
+{
+	u32 entries = q->cached_prod - q->cached_cons;
+
+	if (entries >= cnt)
+		return true;
+
+	__xskq_cons_peek(q);
+	entries = q->cached_prod - q->cached_cons;
+
+	return entries >= cnt;
+}
+
+static inline bool xskq_cons_peek_addr(struct xsk_queue *q, u64 *addr,
+				       struct xdp_umem *umem)
+{
+	if (q->cached_prod == q->cached_cons)
+		xskq_cons_get_entries(q);
+	return xskq_cons_read_addr(q, addr, umem);
+}
+
 static inline bool xskq_cons_peek_desc(struct xsk_queue *q,
 				       struct xdp_desc *desc,
 				       struct xdp_umem *umem)
@@ -327,6 +267,59 @@ static inline bool xskq_cons_peek_desc(struct xsk_queue *q,
 	return xskq_cons_read_desc(q, desc, umem);
 }
 
+static inline void xskq_cons_release(struct xsk_queue *q)
+{
+	/* To improve performance, only update local state here.
+	 * Reflect this to global state when we get new entries
+	 * from the ring in xskq_cons_get_entries().
+	 */
+	q->cached_cons++;
+}
+
+static inline bool xskq_cons_is_full(struct xsk_queue *q)
+{
+	/* No barriers needed since data is not accessed */
+	return READ_ONCE(q->ring->producer) - q->cached_cons == q->nentries;
+}
+
+/* Functions for producers */
+
+static inline bool xskq_prod_is_full(struct xsk_queue *q)
+{
+	u32 free_entries = q->nentries - (q->cached_prod - q->cached_cons);
+
+	if (free_entries)
+		return false;
+
+	/* Refresh the local tail pointer */
+	q->cached_cons = READ_ONCE(q->ring->consumer);
+	free_entries = q->nentries - (q->cached_prod - q->cached_cons);
+
+	return !free_entries;
+}
+
+static inline int xskq_prod_reserve(struct xsk_queue *q)
+{
+	if (xskq_prod_is_full(q))
+		return -ENOSPC;
+
+	/* A, matches D */
+	q->cached_prod++;
+	return 0;
+}
+
+static inline int xskq_prod_reserve_addr(struct xsk_queue *q, u64 addr)
+{
+	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
+
+	if (xskq_prod_is_full(q))
+		return -ENOSPC;
+
+	/* A, matches D */
+	ring->desc[q->cached_prod++ & q->ring_mask] = addr;
+	return 0;
+}
+
 static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
 					 u64 addr, u32 len)
 {
@@ -344,10 +337,31 @@ static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
 	return 0;
 }
 
-static inline bool xskq_cons_is_full(struct xsk_queue *q)
+static inline void __xskq_prod_submit(struct xsk_queue *q, u32 idx)
 {
-	/* No barriers needed since data is not accessed */
-	return READ_ONCE(q->ring->producer) - q->cached_cons == q->nentries;
+	smp_wmb(); /* B, matches C */
+
+	WRITE_ONCE(q->ring->producer, idx);
+}
+
+static inline void xskq_prod_submit(struct xsk_queue *q)
+{
+	__xskq_prod_submit(q, q->cached_prod);
+}
+
+static inline void xskq_prod_submit_addr(struct xsk_queue *q, u64 addr)
+{
+	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
+	u32 idx = q->ring->producer;
+
+	ring->desc[idx++ & q->ring_mask] = addr;
+
+	__xskq_prod_submit(q, idx);
+}
+
+static inline void xskq_prod_submit_n(struct xsk_queue *q, u32 nb_entries)
+{
+	__xskq_prod_submit(q, q->ring->producer + nb_entries);
 }
 
 static inline bool xskq_prod_is_empty(struct xsk_queue *q)
@@ -356,6 +370,13 @@ static inline bool xskq_prod_is_empty(struct xsk_queue *q)
 	return READ_ONCE(q->ring->consumer) == q->cached_prod;
 }
 
+/* For both producers and consumers */
+
+static inline u64 xskq_nb_invalid_descs(struct xsk_queue *q)
+{
+	return q ? q->invalid_descs : 0;
+}
+
 void xskq_set_umem(struct xsk_queue *q, u64 size, u64 chunk_mask);
 struct xsk_queue *xskq_create(u32 nentries, bool umem_queue);
 void xskq_destroy(struct xsk_queue *q_ops);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next 12/12] xsk: use struct_size() helper
  2019-12-09  7:56 [PATCH bpf-next 00/12] xsk: clean up ring access functions Magnus Karlsson
                   ` (10 preceding siblings ...)
  2019-12-09  7:56 ` [PATCH bpf-next 11/12] xsk: add function naming comments and reorder functions Magnus Karlsson
@ 2019-12-09  7:56 ` Magnus Karlsson
  2019-12-10  1:02 ` [PATCH bpf-next 00/12] xsk: clean up ring access functions Martin Lau
  12 siblings, 0 replies; 22+ messages in thread
From: Magnus Karlsson @ 2019-12-09  7:56 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jonathan.lemon
  Cc: bpf, saeedm, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski

Improve readability and maintainability by using the struct_size()
helper when allocating the AF_XDP rings.
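
For context, and simplified relative to the kernel's overflow.h:
struct_size(p, member, n) computes the size of a structure that ends
in a flexible array member, roughly sizeof(*p) plus n times the size
of one array element, with overflow checking so that an overflowing
size fails the allocation instead of silently wrapping. A toy
equivalent without the overflow handling could look like the sketch
below; the struct is made up and does not reproduce the real
xdp_umem_ring layout.

#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a ring that ends in a flexible array member. */
struct toy_umem_ring {
	uint32_t producer;
	uint32_t consumer;
	uint64_t desc[];		/* flexible array member */
};

/* Roughly what struct_size(ring, desc, nentries) evaluates to, minus
 * the saturating overflow checks the kernel helper adds on top.
 */
static size_t toy_ring_size(uint32_t nentries)
{
	const struct toy_umem_ring *ring = NULL;	/* only used in sizeof */

	return sizeof(*ring) + (size_t)nentries * sizeof(ring->desc[0]);
}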

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 net/xdp/xsk_queue.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/net/xdp/xsk_queue.c b/net/xdp/xsk_queue.c
index b665045..c90e9c1 100644
--- a/net/xdp/xsk_queue.c
+++ b/net/xdp/xsk_queue.c
@@ -18,14 +18,14 @@ void xskq_set_umem(struct xsk_queue *q, u64 size, u64 chunk_mask)
 	q->chunk_mask = chunk_mask;
 }
 
-static u32 xskq_umem_get_ring_size(struct xsk_queue *q)
+static size_t xskq_get_ring_size(struct xsk_queue *q, bool umem_queue)
 {
-	return sizeof(struct xdp_umem_ring) + q->nentries * sizeof(u64);
-}
+	struct xdp_umem_ring *umem_ring;
+	struct xdp_rxtx_ring *rxtx_ring;
 
-static u32 xskq_rxtx_get_ring_size(struct xsk_queue *q)
-{
-	return sizeof(struct xdp_ring) + q->nentries * sizeof(struct xdp_desc);
+	if (umem_queue)
+		return struct_size(umem_ring, desc, q->nentries);
+	return struct_size(rxtx_ring, desc, q->nentries);
 }
 
 struct xsk_queue *xskq_create(u32 nentries, bool umem_queue)
@@ -43,8 +43,7 @@ struct xsk_queue *xskq_create(u32 nentries, bool umem_queue)
 
 	gfp_flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN |
 		    __GFP_COMP  | __GFP_NORETRY;
-	size = umem_queue ? xskq_umem_get_ring_size(q) :
-	       xskq_rxtx_get_ring_size(q);
+	size = xskq_get_ring_size(q, umem_queue);
 
 	q->ring = (struct xdp_ring *)__get_free_pages(gfp_flags,
 						      get_order(size));
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 05/12] xsk: eliminate the RX batch size
  2019-12-09  7:56 ` [PATCH bpf-next 05/12] xsk: eliminate the RX batch size Magnus Karlsson
@ 2019-12-09 10:16   ` Sergei Shtylyov
  2019-12-09 13:07     ` Magnus Karlsson
  0 siblings, 1 reply; 22+ messages in thread
From: Sergei Shtylyov @ 2019-12-09 10:16 UTC (permalink / raw)
  To: Magnus Karlsson, bjorn.topel, ast, daniel, netdev, jonathan.lemon
  Cc: bpf, saeedm, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski

Hello!

On 09.12.2019 10:56, Magnus Karlsson wrote:

> In the xsk consumer ring code there is a variable call RX_BATCH_SIZE

    Called?

> that dictates the minimum number of entries that we try to grab from
> the fill and Tx rings. In fact, the code always try to grab the
   ^^^^^^^^^^^^^^^^^^^^^ hm, are you sure there's no typo here?

> maximum amount of entries from these rings. The only thing this
> variable does is to throw an error if there is less than 16 (as it is
> defined) entries on the ring. There is no reason to do this and it
> will just lead to weird behavior from user space's point of view. So
> eliminate this variable.
> 
> With this change, we will be able to simplify the xskq_nb_free and
> xskq_nb_avail code in the next commit.
> 
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
[...]

MBR, Sergei


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 05/12] xsk: eliminate the RX batch size
  2019-12-09 10:16   ` Sergei Shtylyov
@ 2019-12-09 13:07     ` Magnus Karlsson
  0 siblings, 0 replies; 22+ messages in thread
From: Magnus Karlsson @ 2019-12-09 13:07 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Magnus Karlsson, Björn Töpel, Alexei Starovoitov,
	Daniel Borkmann, Network Development, Jonathan Lemon, bpf,
	Saeed Mahameed, jeffrey.t.kirsher, Fijalkowski, Maciej,
	Maciej Fijalkowski

On Mon, Dec 9, 2019 at 11:17 AM Sergei Shtylyov
<sergei.shtylyov@cogentembedded.com> wrote:
>
> Hello!
>
> On 09.12.2019 10:56, Magnus Karlsson wrote:
>
> > In the xsk consumer ring code there is a variable call RX_BATCH_SIZE
>
>     Called?

Yes, definitely. Will fix.

Thanks: Magnus

> > that dictates the minimum number of entries that we try to grab from
> > the fill and Tx rings. In fact, the code always try to grab the
>    ^^^^^^^^^^^^^^^^^^^^^ hm, are you sure there's no typo here?
>
> > maximum amount of entries from these rings. The only thing this
> > variable does is to throw an error if there is less than 16 (as it is
> > defined) entries on the ring. There is no reason to do this and it
> > will just lead to weird behavior from user space's point of view. So
> > eliminate this variable.
> >
> > With this change, we will be able to simplify the xskq_nb_free and
> > xskq_nb_avail code in the next commit.
> >
> > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> [...]
>
> MBR, Sergei
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 02/12] xsk: consolidate to one single cached producer pointer
  2019-12-09  7:56 ` [PATCH bpf-next 02/12] xsk: consolidate to one single cached producer pointer Magnus Karlsson
@ 2019-12-10  0:42   ` Martin Lau
  2019-12-10  9:04     ` Magnus Karlsson
  2019-12-13 18:04   ` Maxim Mikityanskiy
  1 sibling, 1 reply; 22+ messages in thread
From: Martin Lau @ 2019-12-10  0:42 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: bjorn.topel, ast, daniel, netdev, jonathan.lemon, bpf, saeedm,
	jeffrey.t.kirsher, maciej.fijalkowski, maciejromanfijalkowski

On Mon, Dec 09, 2019 at 08:56:19AM +0100, Magnus Karlsson wrote:
> Currently, the xsk ring code has two cached producer pointers:
> prod_head and prod_tail. This patch consolidates these two into a
> single one called cached_prod to make the code simpler and easier to
> maintain. This will be in line with the user space part of the the
> code found in libbpf, that only uses a single cached pointer.
> 
> The Rx path only uses the two top level functions
> xskq_produce_batch_desc and xskq_produce_flush_desc and they both use
> prod_head and never prod_tail. So just move them over to
> cached_prod.
> 
> The Tx XDP_DRV path uses xskq_produce_addr_lazy and
> xskq_produce_flush_addr_n and unnecessarily operates on both prod_tail
> and prod_cons, so move them over to just use cached_prod by skipping
prod_cons or prod_head?

> the intermediate step of updating prod_tail.
> 
> The Tx path in XDP_SKB mode uses xskq_reserve_addr and
> xskq_produce_addr. They currently use both cached pointers, but we can
> operate on the global producer pointer in xskq_produce_addr since it
> has to be updated anyway, thus eliminating the use of both cached
> pointers. We can also remove the xskq_nb_free in xskq_produce_addr
> since it is already called in xskq_reserve_addr. No need to do it
> twice.
> 
> When there is only one cached producer pointer, we can also simplify
> xskq_nb_free by removing one argument.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 00/12] xsk: clean up ring access functions
  2019-12-09  7:56 [PATCH bpf-next 00/12] xsk: clean up ring access functions Magnus Karlsson
                   ` (11 preceding siblings ...)
  2019-12-09  7:56 ` [PATCH bpf-next 12/12] xsk: use struct_size() helper Magnus Karlsson
@ 2019-12-10  1:02 ` Martin Lau
  12 siblings, 0 replies; 22+ messages in thread
From: Martin Lau @ 2019-12-10  1:02 UTC (permalink / raw)
  To: Magnus Karlsson, jonathan.lemon
  Cc: bjorn.topel, ast, daniel, netdev, bpf, saeedm, jeffrey.t.kirsher,
	maciej.fijalkowski, maciejromanfijalkowski

On Mon, Dec 09, 2019 at 08:56:17AM +0100, Magnus Karlsson wrote:
> This patch set cleans up the ring access functions of AF_XDP in hope
> that it will now be easier to understand and maintain. I used to get a
> headache every time I looked at this code in order to really understand it,
> but now I do think it is a lot less painful.
> 
> The code has been simplified a lot and as a bonus we get better
> performance. On my 2.0 GHz Broadwell machine with a standard default
> config plus AF_XDP support and CONFIG_PREEMPT on I get the following
> results in percent performance increases with this patch set compared
> to without it:
> 
> Zero-copy (-N):
>           rxdrop        txpush        l2fwd
> 1 core:     4%            5%            4%
> 2 cores:    1%            0%            2%
> 
> Zero-copy with poll() (-N -p):
>           rxdrop        txpush        l2fwd
> 1 core:     1%            3%            3%
> 2 cores:   22%            0%            5%
> 
> Skb mode (-S):
> Shows a 0% to 1% performance improvement over the same benchmarks as
> above.
> 
> Here 1 core means that we are running the driver processing and the
> application on the same core, while 2 cores means that they execute on
> separate cores. The applications are from the xdpsock sample app.
> 
> When a results says 22% better, as in the case of poll mode with 2
> cores and rxdrop, my first reaction is that it must be a
> bug. Everything else shows between 0% and 5% performance
> improvement. What is giving rise to 22%? A quick bisect indicates that
> it is patches 2, 3, 4, 5, and 6 that are giving rise to most of this
> improvement. So not one patch in particular, but something around 4%
> improvement from each one of them. Note that exactly this benchmark
> has previously had an extraordinary slow down compared to when running
> without poll syscalls. For all the other poll tests above, the
> slowdown has always been around 4% for using poll syscalls. But with
> the bad performing test in question, it was above 25%. Interestingly,
> after this clean up, the slow down is 4%, just like all the other poll
> tests. Please take an extra peek at this so I have not messed up
> something.
It overall makes sense to me.

Jonathan, can you also help to review?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 02/12] xsk: consolidate to one single cached producer pointer
  2019-12-10  0:42   ` Martin Lau
@ 2019-12-10  9:04     ` Magnus Karlsson
  0 siblings, 0 replies; 22+ messages in thread
From: Magnus Karlsson @ 2019-12-10  9:04 UTC (permalink / raw)
  To: Martin Lau
  Cc: Magnus Karlsson, bjorn.topel, ast, daniel, netdev,
	jonathan.lemon, bpf, saeedm, jeffrey.t.kirsher,
	maciej.fijalkowski, maciejromanfijalkowski

On Tue, Dec 10, 2019 at 1:43 AM Martin Lau <kafai@fb.com> wrote:
>
> On Mon, Dec 09, 2019 at 08:56:19AM +0100, Magnus Karlsson wrote:
> > Currently, the xsk ring code has two cached producer pointers:
> > prod_head and prod_tail. This patch consolidates these two into a
> > single one called cached_prod to make the code simpler and easier to
> > maintain. This will be in line with the user space part of the the
> > code found in libbpf, that only uses a single cached pointer.
> >
> > The Rx path only uses the two top level functions
> > xskq_produce_batch_desc and xskq_produce_flush_desc and they both use
> > prod_head and never prod_tail. So just move them over to
> > cached_prod.
> >
> > The Tx XDP_DRV path uses xskq_produce_addr_lazy and
> > xskq_produce_flush_addr_n and unnecessarily operates on both prod_tail
> > and prod_cons, so move them over to just use cached_prod by skipping
> prod_cons or prod_head?

Thanks. It should read prod_head. Will fix in a v2.

/Magnus

> > the intermediate step of updating prod_tail.
> >
> > The Tx path in XDP_SKB mode uses xskq_reserve_addr and
> > xskq_produce_addr. They currently use both cached pointers, but we can
> > operate on the global producer pointer in xskq_produce_addr since it
> > has to be updated anyway, thus eliminating the use of both cached
> > pointers. We can also remove the xskq_nb_free in xskq_produce_addr
> > since it is already called in xskq_reserve_addr. No need to do it
> > twice.
> >
> > When there is only one cached producer pointer, we can also simplify
> > xskq_nb_free by removing one argument.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 02/12] xsk: consolidate to one single cached producer pointer
  2019-12-09  7:56 ` [PATCH bpf-next 02/12] xsk: consolidate to one single cached producer pointer Magnus Karlsson
  2019-12-10  0:42   ` Martin Lau
@ 2019-12-13 18:04   ` Maxim Mikityanskiy
  2019-12-16  8:46     ` Magnus Karlsson
  1 sibling, 1 reply; 22+ messages in thread
From: Maxim Mikityanskiy @ 2019-12-13 18:04 UTC (permalink / raw)
  To: Magnus Karlsson, bjorn.topel, ast, daniel, netdev, jonathan.lemon
  Cc: bpf, Saeed Mahameed, jeffrey.t.kirsher, maciej.fijalkowski,
	maciejromanfijalkowski, Maxim Mikityanskiy, Maxim Mikityanskiy

On 2019-12-09 09:56, Magnus Karlsson wrote:
> Currently, the xsk ring code has two cached producer pointers:
> prod_head and prod_tail. This patch consolidates these two into a
> single one called cached_prod to make the code simpler and easier to
> maintain. This will be in line with the user space part of the the
> code found in libbpf, that only uses a single cached pointer.
> 
> The Rx path only uses the two top level functions
> xskq_produce_batch_desc and xskq_produce_flush_desc and they both use
> prod_head and never prod_tail. So just move them over to
> cached_prod.
> 
> The Tx XDP_DRV path uses xskq_produce_addr_lazy and
> xskq_produce_flush_addr_n and unnecessarily operates on both prod_tail
> and prod_cons, so move them over to just use cached_prod by skipping
> the intermediate step of updating prod_tail.
> 
> The Tx path in XDP_SKB mode uses xskq_reserve_addr and
> xskq_produce_addr. They currently use both cached pointers, but we can
> operate on the global producer pointer in xskq_produce_addr since it
> has to be updated anyway, thus eliminating the use of both cached
> pointers. We can also remove the xskq_nb_free in xskq_produce_addr
> since it is already called in xskq_reserve_addr. No need to do it
> twice.
> 
> When there is only one cached producer pointer, we can also simplify
> xskq_nb_free by removing one argument.
> 
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> ---
>   net/xdp/xsk_queue.h | 49 ++++++++++++++++++++++---------------------------
>   1 file changed, 22 insertions(+), 27 deletions(-)
> 
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index a2f0ba6..d88e1a0 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -35,8 +35,7 @@ struct xsk_queue {
>   	u64 size;
>   	u32 ring_mask;
>   	u32 nentries;
> -	u32 prod_head;
> -	u32 prod_tail;
> +	u32 cached_prod;
>   	u32 cons_head;
>   	u32 cons_tail;
>   	struct xdp_ring *ring;
> @@ -94,39 +93,39 @@ static inline u64 xskq_nb_invalid_descs(struct xsk_queue *q)
>   
>   static inline u32 xskq_nb_avail(struct xsk_queue *q, u32 dcnt)
>   {
> -	u32 entries = q->prod_tail - q->cons_tail;
> +	u32 entries = q->cached_prod - q->cons_tail;
>   
>   	if (entries == 0) {
>   		/* Refresh the local pointer */
> -		q->prod_tail = READ_ONCE(q->ring->producer);
> -		entries = q->prod_tail - q->cons_tail;
> +		q->cached_prod = READ_ONCE(q->ring->producer);
> +		entries = q->cached_prod - q->cons_tail;
>   	}
>   
>   	return (entries > dcnt) ? dcnt : entries;
>   }
>   
> -static inline u32 xskq_nb_free(struct xsk_queue *q, u32 producer, u32 dcnt)
> +static inline u32 xskq_nb_free(struct xsk_queue *q, u32 dcnt)
>   {
> -	u32 free_entries = q->nentries - (producer - q->cons_tail);
> +	u32 free_entries = q->nentries - (q->cached_prod - q->cons_tail);
>   
>   	if (free_entries >= dcnt)
>   		return free_entries;
>   
>   	/* Refresh the local tail pointer */
>   	q->cons_tail = READ_ONCE(q->ring->consumer);
> -	return q->nentries - (producer - q->cons_tail);
> +	return q->nentries - (q->cached_prod - q->cons_tail);
>   }
>   
>   static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
>   {
> -	u32 entries = q->prod_tail - q->cons_tail;
> +	u32 entries = q->cached_prod - q->cons_tail;
>   
>   	if (entries >= cnt)
>   		return true;
>   
>   	/* Refresh the local pointer. */
> -	q->prod_tail = READ_ONCE(q->ring->producer);
> -	entries = q->prod_tail - q->cons_tail;
> +	q->cached_prod = READ_ONCE(q->ring->producer);
> +	entries = q->cached_prod - q->cons_tail;
>   
>   	return entries >= cnt;
>   }
> @@ -220,17 +219,15 @@ static inline void xskq_discard_addr(struct xsk_queue *q)
>   static inline int xskq_produce_addr(struct xsk_queue *q, u64 addr)
>   {
>   	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
> -
> -	if (xskq_nb_free(q, q->prod_tail, 1) == 0)
> -		return -ENOSPC;
> +	unsigned int idx = q->ring->producer;
>   
>   	/* A, matches D */
> -	ring->desc[q->prod_tail++ & q->ring_mask] = addr;
> +	ring->desc[idx++ & q->ring_mask] = addr;
>   
>   	/* Order producer and data */
>   	smp_wmb(); /* B, matches C */
>   
> -	WRITE_ONCE(q->ring->producer, q->prod_tail);
> +	WRITE_ONCE(q->ring->producer, idx);
>   	return 0;
>   }
>   
> @@ -238,11 +235,11 @@ static inline int xskq_produce_addr_lazy(struct xsk_queue *q, u64 addr)
>   {
>   	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
>   
> -	if (xskq_nb_free(q, q->prod_head, 1) == 0)
> +	if (xskq_nb_free(q, 1) == 0)
>   		return -ENOSPC;
>   
>   	/* A, matches D */
> -	ring->desc[q->prod_head++ & q->ring_mask] = addr;
> +	ring->desc[q->cached_prod++ & q->ring_mask] = addr;
>   	return 0;
>   }
>   
> @@ -252,17 +249,16 @@ static inline void xskq_produce_flush_addr_n(struct xsk_queue *q,
>   	/* Order producer and data */
>   	smp_wmb(); /* B, matches C */
>   
> -	q->prod_tail += nb_entries;
> -	WRITE_ONCE(q->ring->producer, q->prod_tail);
> +	WRITE_ONCE(q->ring->producer, q->ring->producer + nb_entries);
>   }
>   
>   static inline int xskq_reserve_addr(struct xsk_queue *q)
>   {
> -	if (xskq_nb_free(q, q->prod_head, 1) == 0)
> +	if (xskq_nb_free(q, 1) == 0)
>   		return -ENOSPC;
>   
>   	/* A, matches D */
> -	q->prod_head++;
> +	q->cached_prod++;
>   	return 0;
>   }
>   
> @@ -340,11 +336,11 @@ static inline int xskq_produce_batch_desc(struct xsk_queue *q,
>   	struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
>   	unsigned int idx;
>   
> -	if (xskq_nb_free(q, q->prod_head, 1) == 0)
> +	if (xskq_nb_free(q, 1) == 0)
>   		return -ENOSPC;
>   
>   	/* A, matches D */
> -	idx = (q->prod_head++) & q->ring_mask;
> +	idx = q->cached_prod++ & q->ring_mask;
>   	ring->desc[idx].addr = addr;
>   	ring->desc[idx].len = len;
>   
> @@ -356,8 +352,7 @@ static inline void xskq_produce_flush_desc(struct xsk_queue *q)
>   	/* Order producer and data */
>   	smp_wmb(); /* B, matches C */
>   
> -	q->prod_tail = q->prod_head;
> -	WRITE_ONCE(q->ring->producer, q->prod_tail);
> +	WRITE_ONCE(q->ring->producer, q->cached_prod);
>   }
>   
>   static inline bool xskq_full_desc(struct xsk_queue *q)
> @@ -367,7 +362,7 @@ static inline bool xskq_full_desc(struct xsk_queue *q)
>   
>   static inline bool xskq_empty_desc(struct xsk_queue *q)
>   {
> -	return xskq_nb_free(q, q->prod_tail, q->nentries) == q->nentries;
> +	return xskq_nb_free(q, q->nentries) == q->nentries;

I don't think this change is correct. The old code checked the number
of free items against prod_tail (== producer), while the new code
checks against prod_head (which is now cached_prod). xskq_nb_free is
used in xsk_poll to set EPOLLIN. After this change, EPOLLIN will be set
right after xskq_produce_batch_desc, but it should only be set after
xskq_produce_flush_desc, just as before; otherwise the application will
wake up before the data is available and just waste CPU cycles.
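
For reference, the poll-side check I mean is along these lines (a
simplified sketch, not the exact xsk_poll() code):

	if (xs->rx && !xskq_empty_desc(xs->rx))
		mask |= EPOLLIN | EPOLLRDNORM;

and xskq_empty_desc() is what ends up calling xskq_nb_free(), so the
pointer it compares against determines when userspace is woken up.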

>   }
>   
>   void xskq_set_umem(struct xsk_queue *q, u64 size, u64 chunk_mask);
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 02/12] xsk: consolidate to one single cached producer pointer
  2019-12-13 18:04   ` Maxim Mikityanskiy
@ 2019-12-16  8:46     ` Magnus Karlsson
  2019-12-19 14:35       ` Maxim Mikityanskiy
  0 siblings, 1 reply; 22+ messages in thread
From: Magnus Karlsson @ 2019-12-16  8:46 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Magnus Karlsson, bjorn.topel, ast, daniel, netdev,
	jonathan.lemon, bpf, Saeed Mahameed, jeffrey.t.kirsher,
	maciej.fijalkowski, maciejromanfijalkowski, Maxim Mikityanskiy

On Fri, Dec 13, 2019 at 7:04 PM Maxim Mikityanskiy <maximmi@mellanox.com> wrote:
>
> On 2019-12-09 09:56, Magnus Karlsson wrote:
> > Currently, the xsk ring code has two cached producer pointers:
> > prod_head and prod_tail. This patch consolidates these two into a
> > single one called cached_prod to make the code simpler and easier to
> > maintain. This will be in line with the user space part of the the
> > code found in libbpf, that only uses a single cached pointer.
> >
> > The Rx path only uses the two top level functions
> > xskq_produce_batch_desc and xskq_produce_flush_desc and they both use
> > prod_head and never prod_tail. So just move them over to
> > cached_prod.
> >
> > The Tx XDP_DRV path uses xskq_produce_addr_lazy and
> > xskq_produce_flush_addr_n and unnecessarily operates on both prod_tail
> > and prod_cons, so move them over to just use cached_prod by skipping
> > the intermediate step of updating prod_tail.
> >
> > The Tx path in XDP_SKB mode uses xskq_reserve_addr and
> > xskq_produce_addr. They currently use both cached pointers, but we can
> > operate on the global producer pointer in xskq_produce_addr since it
> > has to be updated anyway, thus eliminating the use of both cached
> > pointers. We can also remove the xskq_nb_free in xskq_produce_addr
> > since it is already called in xskq_reserve_addr. No need to do it
> > twice.
> >
> > When there is only one cached producer pointer, we can also simplify
> > xskq_nb_free by removing one argument.
> >
> > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > ---
> >   net/xdp/xsk_queue.h | 49 ++++++++++++++++++++++---------------------------
> >   1 file changed, 22 insertions(+), 27 deletions(-)
> >
> > diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> > index a2f0ba6..d88e1a0 100644
> > --- a/net/xdp/xsk_queue.h
> > +++ b/net/xdp/xsk_queue.h
> > @@ -35,8 +35,7 @@ struct xsk_queue {
> >       u64 size;
> >       u32 ring_mask;
> >       u32 nentries;
> > -     u32 prod_head;
> > -     u32 prod_tail;
> > +     u32 cached_prod;
> >       u32 cons_head;
> >       u32 cons_tail;
> >       struct xdp_ring *ring;
> > @@ -94,39 +93,39 @@ static inline u64 xskq_nb_invalid_descs(struct xsk_queue *q)
> >
> >   static inline u32 xskq_nb_avail(struct xsk_queue *q, u32 dcnt)
> >   {
> > -     u32 entries = q->prod_tail - q->cons_tail;
> > +     u32 entries = q->cached_prod - q->cons_tail;
> >
> >       if (entries == 0) {
> >               /* Refresh the local pointer */
> > -             q->prod_tail = READ_ONCE(q->ring->producer);
> > -             entries = q->prod_tail - q->cons_tail;
> > +             q->cached_prod = READ_ONCE(q->ring->producer);
> > +             entries = q->cached_prod - q->cons_tail;
> >       }
> >
> >       return (entries > dcnt) ? dcnt : entries;
> >   }
> >
> > -static inline u32 xskq_nb_free(struct xsk_queue *q, u32 producer, u32 dcnt)
> > +static inline u32 xskq_nb_free(struct xsk_queue *q, u32 dcnt)
> >   {
> > -     u32 free_entries = q->nentries - (producer - q->cons_tail);
> > +     u32 free_entries = q->nentries - (q->cached_prod - q->cons_tail);
> >
> >       if (free_entries >= dcnt)
> >               return free_entries;
> >
> >       /* Refresh the local tail pointer */
> >       q->cons_tail = READ_ONCE(q->ring->consumer);
> > -     return q->nentries - (producer - q->cons_tail);
> > +     return q->nentries - (q->cached_prod - q->cons_tail);
> >   }
> >
> >   static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
> >   {
> > -     u32 entries = q->prod_tail - q->cons_tail;
> > +     u32 entries = q->cached_prod - q->cons_tail;
> >
> >       if (entries >= cnt)
> >               return true;
> >
> >       /* Refresh the local pointer. */
> > -     q->prod_tail = READ_ONCE(q->ring->producer);
> > -     entries = q->prod_tail - q->cons_tail;
> > +     q->cached_prod = READ_ONCE(q->ring->producer);
> > +     entries = q->cached_prod - q->cons_tail;
> >
> >       return entries >= cnt;
> >   }
> > @@ -220,17 +219,15 @@ static inline void xskq_discard_addr(struct xsk_queue *q)
> >   static inline int xskq_produce_addr(struct xsk_queue *q, u64 addr)
> >   {
> >       struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
> > -
> > -     if (xskq_nb_free(q, q->prod_tail, 1) == 0)
> > -             return -ENOSPC;
> > +     unsigned int idx = q->ring->producer;
> >
> >       /* A, matches D */
> > -     ring->desc[q->prod_tail++ & q->ring_mask] = addr;
> > +     ring->desc[idx++ & q->ring_mask] = addr;
> >
> >       /* Order producer and data */
> >       smp_wmb(); /* B, matches C */
> >
> > -     WRITE_ONCE(q->ring->producer, q->prod_tail);
> > +     WRITE_ONCE(q->ring->producer, idx);
> >       return 0;
> >   }
> >
> > @@ -238,11 +235,11 @@ static inline int xskq_produce_addr_lazy(struct xsk_queue *q, u64 addr)
> >   {
> >       struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
> >
> > -     if (xskq_nb_free(q, q->prod_head, 1) == 0)
> > +     if (xskq_nb_free(q, 1) == 0)
> >               return -ENOSPC;
> >
> >       /* A, matches D */
> > -     ring->desc[q->prod_head++ & q->ring_mask] = addr;
> > +     ring->desc[q->cached_prod++ & q->ring_mask] = addr;
> >       return 0;
> >   }
> >
> > @@ -252,17 +249,16 @@ static inline void xskq_produce_flush_addr_n(struct xsk_queue *q,
> >       /* Order producer and data */
> >       smp_wmb(); /* B, matches C */
> >
> > -     q->prod_tail += nb_entries;
> > -     WRITE_ONCE(q->ring->producer, q->prod_tail);
> > +     WRITE_ONCE(q->ring->producer, q->ring->producer + nb_entries);
> >   }
> >
> >   static inline int xskq_reserve_addr(struct xsk_queue *q)
> >   {
> > -     if (xskq_nb_free(q, q->prod_head, 1) == 0)
> > +     if (xskq_nb_free(q, 1) == 0)
> >               return -ENOSPC;
> >
> >       /* A, matches D */
> > -     q->prod_head++;
> > +     q->cached_prod++;
> >       return 0;
> >   }
> >
> > @@ -340,11 +336,11 @@ static inline int xskq_produce_batch_desc(struct xsk_queue *q,
> >       struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
> >       unsigned int idx;
> >
> > -     if (xskq_nb_free(q, q->prod_head, 1) == 0)
> > +     if (xskq_nb_free(q, 1) == 0)
> >               return -ENOSPC;
> >
> >       /* A, matches D */
> > -     idx = (q->prod_head++) & q->ring_mask;
> > +     idx = q->cached_prod++ & q->ring_mask;
> >       ring->desc[idx].addr = addr;
> >       ring->desc[idx].len = len;
> >
> > @@ -356,8 +352,7 @@ static inline void xskq_produce_flush_desc(struct xsk_queue *q)
> >       /* Order producer and data */
> >       smp_wmb(); /* B, matches C */
> >
> > -     q->prod_tail = q->prod_head;
> > -     WRITE_ONCE(q->ring->producer, q->prod_tail);
> > +     WRITE_ONCE(q->ring->producer, q->cached_prod);
> >   }
> >
> >   static inline bool xskq_full_desc(struct xsk_queue *q)
> > @@ -367,7 +362,7 @@ static inline bool xskq_full_desc(struct xsk_queue *q)
> >
> >   static inline bool xskq_empty_desc(struct xsk_queue *q)
> >   {
> > -     return xskq_nb_free(q, q->prod_tail, q->nentries) == q->nentries;
> > +     return xskq_nb_free(q, q->nentries) == q->nentries;
>
> I don't think this change is correct. The old code checked the number of
> free items against prod_tail (== producer). The new code changes it to
> prod_head (which is now cached_prod). xskq_nb_free is used in xsk_poll
> to set EPOLLIN. After this change EPOLLIN will be set right after
> xskq_produce_batch_desc, but it should only be set after
> xskq_produce_flush_desc, just as before, otherwise the application will
> wake up before the data is available, and it will just waste CPU cycles.

That is correct. It will be inefficient after patches 2 and 3, since
this only gets fixed in patch 4. I chose this order as I thought the
patch progression and simplification process would be clearer this
way. So what to do about it? Some options:

* Document this in patch 2 and keep the current order

* Put patch 4 before patch 2 so that the code is always efficient.
This is doable, but I have the feeling it will be somewhat less clear
from a simplification perspective. The advantage, on the other hand,
is that the poll code is always efficient during the whole patch set.

/Magnus

> >   }
> >
> >   void xskq_set_umem(struct xsk_queue *q, u64 size, u64 chunk_mask);
> >
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 02/12] xsk: consolidate to one single cached producer pointer
  2019-12-16  8:46     ` Magnus Karlsson
@ 2019-12-19 14:35       ` Maxim Mikityanskiy
  2019-12-19 16:21         ` Magnus Karlsson
  0 siblings, 1 reply; 22+ messages in thread
From: Maxim Mikityanskiy @ 2019-12-19 14:35 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Maxim Mikityanskiy, Magnus Karlsson, bjorn.topel, ast, daniel,
	netdev, jonathan.lemon, bpf, Saeed Mahameed, jeffrey.t.kirsher,
	maciej.fijalkowski, maciejromanfijalkowski

On Mon, Dec 16, 2019 at 10:46 AM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> On Fri, Dec 13, 2019 at 7:04 PM Maxim Mikityanskiy <maximmi@mellanox.com> wrote:
> >
> > On 2019-12-09 09:56, Magnus Karlsson wrote:
> > > Currently, the xsk ring code has two cached producer pointers:
> > > prod_head and prod_tail. This patch consolidates these two into a
> > > single one called cached_prod to make the code simpler and easier to
> > > maintain. This will be in line with the user space part of the the
> > > code found in libbpf, that only uses a single cached pointer.
> > >
> > > The Rx path only uses the two top level functions
> > > xskq_produce_batch_desc and xskq_produce_flush_desc and they both use
> > > prod_head and never prod_tail. So just move them over to
> > > cached_prod.
> > >
> > > The Tx XDP_DRV path uses xskq_produce_addr_lazy and
> > > xskq_produce_flush_addr_n and unnecessarily operates on both prod_tail
> > > and prod_cons, so move them over to just use cached_prod by skipping
> > > the intermediate step of updating prod_tail.
> > >
> > > The Tx path in XDP_SKB mode uses xskq_reserve_addr and
> > > xskq_produce_addr. They currently use both cached pointers, but we can
> > > operate on the global producer pointer in xskq_produce_addr since it
> > > has to be updated anyway, thus eliminating the use of both cached
> > > pointers. We can also remove the xskq_nb_free in xskq_produce_addr
> > > since it is already called in xskq_reserve_addr. No need to do it
> > > twice.
> > >
> > > When there is only one cached producer pointer, we can also simplify
> > > xskq_nb_free by removing one argument.
> > >
> > > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > > ---
> > >   net/xdp/xsk_queue.h | 49 ++++++++++++++++++++++---------------------------
> > >   1 file changed, 22 insertions(+), 27 deletions(-)
> > >
> > > diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> > > index a2f0ba6..d88e1a0 100644
> > > --- a/net/xdp/xsk_queue.h
> > > +++ b/net/xdp/xsk_queue.h
> > > @@ -35,8 +35,7 @@ struct xsk_queue {
> > >       u64 size;
> > >       u32 ring_mask;
> > >       u32 nentries;
> > > -     u32 prod_head;
> > > -     u32 prod_tail;
> > > +     u32 cached_prod;
> > >       u32 cons_head;
> > >       u32 cons_tail;
> > >       struct xdp_ring *ring;
> > > @@ -94,39 +93,39 @@ static inline u64 xskq_nb_invalid_descs(struct xsk_queue *q)
> > >
> > >   static inline u32 xskq_nb_avail(struct xsk_queue *q, u32 dcnt)
> > >   {
> > > -     u32 entries = q->prod_tail - q->cons_tail;
> > > +     u32 entries = q->cached_prod - q->cons_tail;
> > >
> > >       if (entries == 0) {
> > >               /* Refresh the local pointer */
> > > -             q->prod_tail = READ_ONCE(q->ring->producer);
> > > -             entries = q->prod_tail - q->cons_tail;
> > > +             q->cached_prod = READ_ONCE(q->ring->producer);
> > > +             entries = q->cached_prod - q->cons_tail;
> > >       }
> > >
> > >       return (entries > dcnt) ? dcnt : entries;
> > >   }
> > >
> > > -static inline u32 xskq_nb_free(struct xsk_queue *q, u32 producer, u32 dcnt)
> > > +static inline u32 xskq_nb_free(struct xsk_queue *q, u32 dcnt)
> > >   {
> > > -     u32 free_entries = q->nentries - (producer - q->cons_tail);
> > > +     u32 free_entries = q->nentries - (q->cached_prod - q->cons_tail);
> > >
> > >       if (free_entries >= dcnt)
> > >               return free_entries;
> > >
> > >       /* Refresh the local tail pointer */
> > >       q->cons_tail = READ_ONCE(q->ring->consumer);
> > > -     return q->nentries - (producer - q->cons_tail);
> > > +     return q->nentries - (q->cached_prod - q->cons_tail);
> > >   }
> > >
> > >   static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
> > >   {
> > > -     u32 entries = q->prod_tail - q->cons_tail;
> > > +     u32 entries = q->cached_prod - q->cons_tail;
> > >
> > >       if (entries >= cnt)
> > >               return true;
> > >
> > >       /* Refresh the local pointer. */
> > > -     q->prod_tail = READ_ONCE(q->ring->producer);
> > > -     entries = q->prod_tail - q->cons_tail;
> > > +     q->cached_prod = READ_ONCE(q->ring->producer);
> > > +     entries = q->cached_prod - q->cons_tail;
> > >
> > >       return entries >= cnt;
> > >   }
> > > @@ -220,17 +219,15 @@ static inline void xskq_discard_addr(struct xsk_queue *q)
> > >   static inline int xskq_produce_addr(struct xsk_queue *q, u64 addr)
> > >   {
> > >       struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
> > > -
> > > -     if (xskq_nb_free(q, q->prod_tail, 1) == 0)
> > > -             return -ENOSPC;
> > > +     unsigned int idx = q->ring->producer;
> > >
> > >       /* A, matches D */
> > > -     ring->desc[q->prod_tail++ & q->ring_mask] = addr;
> > > +     ring->desc[idx++ & q->ring_mask] = addr;
> > >
> > >       /* Order producer and data */
> > >       smp_wmb(); /* B, matches C */
> > >
> > > -     WRITE_ONCE(q->ring->producer, q->prod_tail);
> > > +     WRITE_ONCE(q->ring->producer, idx);
> > >       return 0;
> > >   }
> > >
> > > @@ -238,11 +235,11 @@ static inline int xskq_produce_addr_lazy(struct xsk_queue *q, u64 addr)
> > >   {
> > >       struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
> > >
> > > -     if (xskq_nb_free(q, q->prod_head, 1) == 0)
> > > +     if (xskq_nb_free(q, 1) == 0)
> > >               return -ENOSPC;
> > >
> > >       /* A, matches D */
> > > -     ring->desc[q->prod_head++ & q->ring_mask] = addr;
> > > +     ring->desc[q->cached_prod++ & q->ring_mask] = addr;
> > >       return 0;
> > >   }
> > >
> > > @@ -252,17 +249,16 @@ static inline void xskq_produce_flush_addr_n(struct xsk_queue *q,
> > >       /* Order producer and data */
> > >       smp_wmb(); /* B, matches C */
> > >
> > > -     q->prod_tail += nb_entries;
> > > -     WRITE_ONCE(q->ring->producer, q->prod_tail);
> > > +     WRITE_ONCE(q->ring->producer, q->ring->producer + nb_entries);
> > >   }
> > >
> > >   static inline int xskq_reserve_addr(struct xsk_queue *q)
> > >   {
> > > -     if (xskq_nb_free(q, q->prod_head, 1) == 0)
> > > +     if (xskq_nb_free(q, 1) == 0)
> > >               return -ENOSPC;
> > >
> > >       /* A, matches D */
> > > -     q->prod_head++;
> > > +     q->cached_prod++;
> > >       return 0;
> > >   }
> > >
> > > @@ -340,11 +336,11 @@ static inline int xskq_produce_batch_desc(struct xsk_queue *q,
> > >       struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
> > >       unsigned int idx;
> > >
> > > -     if (xskq_nb_free(q, q->prod_head, 1) == 0)
> > > +     if (xskq_nb_free(q, 1) == 0)
> > >               return -ENOSPC;
> > >
> > >       /* A, matches D */
> > > -     idx = (q->prod_head++) & q->ring_mask;
> > > +     idx = q->cached_prod++ & q->ring_mask;
> > >       ring->desc[idx].addr = addr;
> > >       ring->desc[idx].len = len;
> > >
> > > @@ -356,8 +352,7 @@ static inline void xskq_produce_flush_desc(struct xsk_queue *q)
> > >       /* Order producer and data */
> > >       smp_wmb(); /* B, matches C */
> > >
> > > -     q->prod_tail = q->prod_head;
> > > -     WRITE_ONCE(q->ring->producer, q->prod_tail);
> > > +     WRITE_ONCE(q->ring->producer, q->cached_prod);
> > >   }
> > >
> > >   static inline bool xskq_full_desc(struct xsk_queue *q)
> > > @@ -367,7 +362,7 @@ static inline bool xskq_full_desc(struct xsk_queue *q)
> > >
> > >   static inline bool xskq_empty_desc(struct xsk_queue *q)
> > >   {
> > > -     return xskq_nb_free(q, q->prod_tail, q->nentries) == q->nentries;
> > > +     return xskq_nb_free(q, q->nentries) == q->nentries;
> >
> > I don't think this change is correct. The old code checked the number of
> > free items against prod_tail (== producer). The new code changes it to
> > prod_head (which is now cached_prod). xskq_nb_free is used in xsk_poll
> > to set EPOLLIN. After this change EPOLLIN will be set right after
> > xskq_produce_batch_desc, but it should only be set after
> > xskq_produce_flush_desc, just as before, otherwise the application will
> > wake up before the data is available, and it will just waste CPU cycles.
>
> That is correct. It will be inefficient during patch 2 and 3 as this
> gets fixed in patch 4.

Looking at patch 4, I see it still uses cached_prod, not producer.
However, I see you changed it in the v2, so it should be fine now.

> I chose this as I thought the patch progression
> and simplification process would be clearer this way. So what to do
> about it? Some options:
>
> * Document this in patch 2 and keep the current order
>
> *  Put patch 4 before patch 2 so that the code is always efficient.
> This is doable, but I have the feeling it will be somewhat less clear
> from a simplification perspective. The advantage, on the other hand,
> is that the poll code is always efficient during the whole patch set.

I'm sorry it is taking me so long to answer; I'm on vacation now.
Anyway, either option looks good to me, as long as xskq_prod_is_empty
has the correct check (as in v2 patch 2).

> /Magnus
>
> > >   }
> > >
> > >   void xskq_set_umem(struct xsk_queue *q, u64 size, u64 chunk_mask);
> > >
> >

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 02/12] xsk: consolidate to one single cached producer pointer
  2019-12-19 14:35       ` Maxim Mikityanskiy
@ 2019-12-19 16:21         ` Magnus Karlsson
  0 siblings, 0 replies; 22+ messages in thread
From: Magnus Karlsson @ 2019-12-19 16:21 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Maxim Mikityanskiy, Magnus Karlsson, bjorn.topel, ast, daniel,
	netdev, jonathan.lemon, bpf, Saeed Mahameed, jeffrey.t.kirsher,
	maciej.fijalkowski, maciejromanfijalkowski

On Thu, Dec 19, 2019 at 3:35 PM Maxim Mikityanskiy <maxtram95@gmail.com> wrote:
>
> On Mon, Dec 16, 2019 at 10:46 AM Magnus Karlsson
> <magnus.karlsson@gmail.com> wrote:
> >
> > On Fri, Dec 13, 2019 at 7:04 PM Maxim Mikityanskiy <maximmi@mellanox.com> wrote:
> > >
> > > On 2019-12-09 09:56, Magnus Karlsson wrote:
> > > > Currently, the xsk ring code has two cached producer pointers:
> > > > prod_head and prod_tail. This patch consolidates these two into a
> > > > single one called cached_prod to make the code simpler and easier to
> > > > maintain. This will be in line with the user space part of the the
> > > > code found in libbpf, that only uses a single cached pointer.
> > > >
> > > > The Rx path only uses the two top level functions
> > > > xskq_produce_batch_desc and xskq_produce_flush_desc and they both use
> > > > prod_head and never prod_tail. So just move them over to
> > > > cached_prod.
> > > >
> > > > The Tx XDP_DRV path uses xskq_produce_addr_lazy and
> > > > xskq_produce_flush_addr_n and unnecessarily operates on both prod_tail
> > > > and prod_cons, so move them over to just use cached_prod by skipping
> > > > the intermediate step of updating prod_tail.
> > > >
> > > > The Tx path in XDP_SKB mode uses xskq_reserve_addr and
> > > > xskq_produce_addr. They currently use both cached pointers, but we can
> > > > operate on the global producer pointer in xskq_produce_addr since it
> > > > has to be updated anyway, thus eliminating the use of both cached
> > > > pointers. We can also remove the xskq_nb_free in xskq_produce_addr
> > > > since it is already called in xskq_reserve_addr. No need to do it
> > > > twice.
> > > >
> > > > When there is only one cached producer pointer, we can also simplify
> > > > xskq_nb_free by removing one argument.
> > > >
> > > > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > > > ---
> > > >   net/xdp/xsk_queue.h | 49 ++++++++++++++++++++++---------------------------
> > > >   1 file changed, 22 insertions(+), 27 deletions(-)
> > > >
> > > > diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> > > > index a2f0ba6..d88e1a0 100644
> > > > --- a/net/xdp/xsk_queue.h
> > > > +++ b/net/xdp/xsk_queue.h
> > > > @@ -35,8 +35,7 @@ struct xsk_queue {
> > > >       u64 size;
> > > >       u32 ring_mask;
> > > >       u32 nentries;
> > > > -     u32 prod_head;
> > > > -     u32 prod_tail;
> > > > +     u32 cached_prod;
> > > >       u32 cons_head;
> > > >       u32 cons_tail;
> > > >       struct xdp_ring *ring;
> > > > @@ -94,39 +93,39 @@ static inline u64 xskq_nb_invalid_descs(struct xsk_queue *q)
> > > >
> > > >   static inline u32 xskq_nb_avail(struct xsk_queue *q, u32 dcnt)
> > > >   {
> > > > -     u32 entries = q->prod_tail - q->cons_tail;
> > > > +     u32 entries = q->cached_prod - q->cons_tail;
> > > >
> > > >       if (entries == 0) {
> > > >               /* Refresh the local pointer */
> > > > -             q->prod_tail = READ_ONCE(q->ring->producer);
> > > > -             entries = q->prod_tail - q->cons_tail;
> > > > +             q->cached_prod = READ_ONCE(q->ring->producer);
> > > > +             entries = q->cached_prod - q->cons_tail;
> > > >       }
> > > >
> > > >       return (entries > dcnt) ? dcnt : entries;
> > > >   }
> > > >
> > > > -static inline u32 xskq_nb_free(struct xsk_queue *q, u32 producer, u32 dcnt)
> > > > +static inline u32 xskq_nb_free(struct xsk_queue *q, u32 dcnt)
> > > >   {
> > > > -     u32 free_entries = q->nentries - (producer - q->cons_tail);
> > > > +     u32 free_entries = q->nentries - (q->cached_prod - q->cons_tail);
> > > >
> > > >       if (free_entries >= dcnt)
> > > >               return free_entries;
> > > >
> > > >       /* Refresh the local tail pointer */
> > > >       q->cons_tail = READ_ONCE(q->ring->consumer);
> > > > -     return q->nentries - (producer - q->cons_tail);
> > > > +     return q->nentries - (q->cached_prod - q->cons_tail);
> > > >   }
> > > >
> > > >   static inline bool xskq_has_addrs(struct xsk_queue *q, u32 cnt)
> > > >   {
> > > > -     u32 entries = q->prod_tail - q->cons_tail;
> > > > +     u32 entries = q->cached_prod - q->cons_tail;
> > > >
> > > >       if (entries >= cnt)
> > > >               return true;
> > > >
> > > >       /* Refresh the local pointer. */
> > > > -     q->prod_tail = READ_ONCE(q->ring->producer);
> > > > -     entries = q->prod_tail - q->cons_tail;
> > > > +     q->cached_prod = READ_ONCE(q->ring->producer);
> > > > +     entries = q->cached_prod - q->cons_tail;
> > > >
> > > >       return entries >= cnt;
> > > >   }
> > > > @@ -220,17 +219,15 @@ static inline void xskq_discard_addr(struct xsk_queue *q)
> > > >   static inline int xskq_produce_addr(struct xsk_queue *q, u64 addr)
> > > >   {
> > > >       struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
> > > > -
> > > > -     if (xskq_nb_free(q, q->prod_tail, 1) == 0)
> > > > -             return -ENOSPC;
> > > > +     unsigned int idx = q->ring->producer;
> > > >
> > > >       /* A, matches D */
> > > > -     ring->desc[q->prod_tail++ & q->ring_mask] = addr;
> > > > +     ring->desc[idx++ & q->ring_mask] = addr;
> > > >
> > > >       /* Order producer and data */
> > > >       smp_wmb(); /* B, matches C */
> > > >
> > > > -     WRITE_ONCE(q->ring->producer, q->prod_tail);
> > > > +     WRITE_ONCE(q->ring->producer, idx);
> > > >       return 0;
> > > >   }
> > > >
> > > > @@ -238,11 +235,11 @@ static inline int xskq_produce_addr_lazy(struct xsk_queue *q, u64 addr)
> > > >   {
> > > >       struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
> > > >
> > > > -     if (xskq_nb_free(q, q->prod_head, 1) == 0)
> > > > +     if (xskq_nb_free(q, 1) == 0)
> > > >               return -ENOSPC;
> > > >
> > > >       /* A, matches D */
> > > > -     ring->desc[q->prod_head++ & q->ring_mask] = addr;
> > > > +     ring->desc[q->cached_prod++ & q->ring_mask] = addr;
> > > >       return 0;
> > > >   }
> > > >
> > > > @@ -252,17 +249,16 @@ static inline void xskq_produce_flush_addr_n(struct xsk_queue *q,
> > > >       /* Order producer and data */
> > > >       smp_wmb(); /* B, matches C */
> > > >
> > > > -     q->prod_tail += nb_entries;
> > > > -     WRITE_ONCE(q->ring->producer, q->prod_tail);
> > > > +     WRITE_ONCE(q->ring->producer, q->ring->producer + nb_entries);
> > > >   }
> > > >
> > > >   static inline int xskq_reserve_addr(struct xsk_queue *q)
> > > >   {
> > > > -     if (xskq_nb_free(q, q->prod_head, 1) == 0)
> > > > +     if (xskq_nb_free(q, 1) == 0)
> > > >               return -ENOSPC;
> > > >
> > > >       /* A, matches D */
> > > > -     q->prod_head++;
> > > > +     q->cached_prod++;
> > > >       return 0;
> > > >   }
> > > >
> > > > @@ -340,11 +336,11 @@ static inline int xskq_produce_batch_desc(struct xsk_queue *q,
> > > >       struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
> > > >       unsigned int idx;
> > > >
> > > > -     if (xskq_nb_free(q, q->prod_head, 1) == 0)
> > > > +     if (xskq_nb_free(q, 1) == 0)
> > > >               return -ENOSPC;
> > > >
> > > >       /* A, matches D */
> > > > -     idx = (q->prod_head++) & q->ring_mask;
> > > > +     idx = q->cached_prod++ & q->ring_mask;
> > > >       ring->desc[idx].addr = addr;
> > > >       ring->desc[idx].len = len;
> > > >
> > > > @@ -356,8 +352,7 @@ static inline void xskq_produce_flush_desc(struct xsk_queue *q)
> > > >       /* Order producer and data */
> > > >       smp_wmb(); /* B, matches C */
> > > >
> > > > -     q->prod_tail = q->prod_head;
> > > > -     WRITE_ONCE(q->ring->producer, q->prod_tail);
> > > > +     WRITE_ONCE(q->ring->producer, q->cached_prod);
> > > >   }
> > > >
> > > >   static inline bool xskq_full_desc(struct xsk_queue *q)
> > > > @@ -367,7 +362,7 @@ static inline bool xskq_full_desc(struct xsk_queue *q)
> > > >
> > > >   static inline bool xskq_empty_desc(struct xsk_queue *q)
> > > >   {
> > > > -     return xskq_nb_free(q, q->prod_tail, q->nentries) == q->nentries;
> > > > +     return xskq_nb_free(q, q->nentries) == q->nentries;
> > >
> > > I don't think this change is correct. The old code checked the number of
> > > free items against prod_tail (== producer). The new code changes it to
> > > prod_head (which is now cached_prod). xskq_nb_free is used in xsk_poll
> > > to set EPOLLIN. After this change EPOLLIN will be set right after
> > > xskq_produce_batch_desc, but it should only be set after
> > > xskq_produce_flush_desc, just as before; otherwise the application will
> > > wake up before the data is available and just waste CPU cycles.
> >
> > That is correct. It will be inefficient during patches 2 and 3, as
> > this does not get fixed until patch 4.
>
> Looking at patch 4, I see it still uses cached_prod, not producer.
> However, I see you changed it in the v2, so it should be fine now.
>
> > I chose this ordering because I thought the patch progression
> > and simplification process would be clearer this way. So what to do
> > about it? Some options:
> >
> > * Document this in patch 2 and keep the current order
> >
> > * Put patch 4 before patch 2 so that the code is always efficient.
> > This is doable, but I feel it would be somewhat less clear from a
> > simplification perspective. The advantage, on the other hand, is
> > that the poll code stays efficient throughout the whole patch set.
>
> I'm sorry it is taking me so long to answer; I'm on vacation now.
> Anyway, either option looks good to me, as long as xskq_prod_is_empty
> has the correct check (as in v2 patch 2).

Thanks for taking a look at the patch during your vacation, Maxim. But
now, please go and enjoy the holidays :-)!

/Magnus

> > /Magnus
> >
> > > >   }
> > > >
> > > >   void xskq_set_umem(struct xsk_queue *q, u64 size, u64 chunk_mask);
> > > >
> > >
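
For reference, here is a minimal userspace-style C sketch of the scheme
discussed in this thread: a single producer-local cached_prod
accumulates entries until an explicit flush publishes them in one
store, and the emptiness check used for poll() looks at the published
producer index rather than the cached one, so readiness is only
signalled for data that has actually been flushed. All names
(demo_ring, demo_prod_reserve, demo_prod_flush, demo_ring_is_empty)
are invented for illustration; the real code lives in
net/xdp/xsk_queue.h and uses READ_ONCE/WRITE_ONCE plus smp_wmb()
instead of C11 atomics.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 64 /* must be a power of two */

struct demo_ring {
	/* Shared with the consumer. */
	_Atomic uint32_t producer;
	_Atomic uint32_t consumer;
	uint64_t desc[RING_SIZE];

	/* Producer-local state, never read by the consumer. */
	uint32_t cached_prod; /* entries written but not yet published */
	uint32_t cached_cons; /* last consumer index we observed */
};

/* Free slots as seen by the producer. The cached consumer index is
 * refreshed from the shared ring only when we appear to be full. */
static uint32_t demo_nb_free(struct demo_ring *r, uint32_t want)
{
	uint32_t free_entries = RING_SIZE - (r->cached_prod - r->cached_cons);

	if (free_entries >= want)
		return free_entries;

	r->cached_cons = atomic_load_explicit(&r->consumer,
					      memory_order_acquire);
	return RING_SIZE - (r->cached_prod - r->cached_cons);
}

/* Write one entry locally; it is not visible to the consumer yet. */
static int demo_prod_reserve(struct demo_ring *r, uint64_t addr)
{
	if (demo_nb_free(r, 1) == 0)
		return -1;

	r->desc[r->cached_prod++ & (RING_SIZE - 1)] = addr;
	return 0;
}

/* Publish everything written so far in a single store, with release
 * ordering so the data is visible before the new producer index. */
static void demo_prod_flush(struct demo_ring *r)
{
	atomic_store_explicit(&r->producer, r->cached_prod,
			      memory_order_release);
}

/* Emptiness as poll() should see it: based on the published producer
 * index, not on cached_prod, so no wakeup happens for entries that
 * have not been flushed yet. */
static bool demo_ring_is_empty(struct demo_ring *r)
{
	return atomic_load_explicit(&r->producer, memory_order_acquire) ==
	       atomic_load_explicit(&r->consumer, memory_order_relaxed);
}

int main(void)
{
	static struct demo_ring ring; /* zero-initialized */

	demo_prod_reserve(&ring, 0x1000);
	demo_prod_reserve(&ring, 0x2000);
	bool before = demo_ring_is_empty(&ring); /* true: not flushed yet */
	demo_prod_flush(&ring);
	bool after = demo_ring_is_empty(&ring);  /* false: now visible */

	return (before && !after) ? 0 : 1;
}

The same idea carries over to the user-space rings in libbpf mentioned
in the commit message: one cached index per side, refreshed from the
shared index only when the fast-path check fails.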

Thread overview: 22+ messages
2019-12-09  7:56 [PATCH bpf-next 00/12] xsk: clean up ring access functions Magnus Karlsson
2019-12-09  7:56 ` [PATCH bpf-next 01/12] xsk: eliminate the lazy update threshold Magnus Karlsson
2019-12-09  7:56 ` [PATCH bpf-next 02/12] xsk: consolidate to one single cached producer pointer Magnus Karlsson
2019-12-10  0:42   ` Martin Lau
2019-12-10  9:04     ` Magnus Karlsson
2019-12-13 18:04   ` Maxim Mikityanskiy
2019-12-16  8:46     ` Magnus Karlsson
2019-12-19 14:35       ` Maxim Mikityanskiy
2019-12-19 16:21         ` Magnus Karlsson
2019-12-09  7:56 ` [PATCH bpf-next 03/12] xsk: standardize naming of producer ring access functions Magnus Karlsson
2019-12-09  7:56 ` [PATCH bpf-next 04/12] xsk: simplify detection of empty and full rings Magnus Karlsson
2019-12-09  7:56 ` [PATCH bpf-next 05/12] xsk: eliminate the RX batch size Magnus Karlsson
2019-12-09 10:16   ` Sergei Shtylyov
2019-12-09 13:07     ` Magnus Karlsson
2019-12-09  7:56 ` [PATCH bpf-next 06/12] xsk: simplify xskq_nb_avail and xskq_nb_free Magnus Karlsson
2019-12-09  7:56 ` [PATCH bpf-next 07/12] xsk: simplify the consumer ring access functions Magnus Karlsson
2019-12-09  7:56 ` [PATCH bpf-next 08/12] xsk: change names of validation functions Magnus Karlsson
2019-12-09  7:56 ` [PATCH bpf-next 09/12] xsk: ixgbe: i40e: ice: mlx5: xsk_umem_discard_addr to xsk_umem_release_addr Magnus Karlsson
2019-12-09  7:56 ` [PATCH bpf-next 10/12] xsk: remove unnecessary READ_ONCE of data Magnus Karlsson
2019-12-09  7:56 ` [PATCH bpf-next 11/12] xsk: add function naming comments and reorder functions Magnus Karlsson
2019-12-09  7:56 ` [PATCH bpf-next 12/12] xsk: use struct_size() helper Magnus Karlsson
2019-12-10  1:02 ` [PATCH bpf-next 00/12] xsk: clean up ring access functions Martin Lau
