bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next 0/2] load-acquire/store-release semantics for AF_XDP rings
@ 2021-03-01 10:43 Björn Töpel
  2021-03-01 10:43 ` [PATCH bpf-next 1/2] xsk: update rings for load-acquire/store-release semantics Björn Töpel
  2021-03-01 10:43 ` [PATCH bpf-next 2/2] libbpf, xsk: add libbpf_smp_store_release libbpf_smp_load_acquire Björn Töpel
  0 siblings, 2 replies; 18+ messages in thread
From: Björn Töpel @ 2021-03-01 10:43 UTC (permalink / raw)
  To: ast, daniel, netdev, bpf
  Cc: Björn Töpel, bjorn.topel, magnus.karlsson,
	jonathan.lemon, maximmi, andrii

This two-patch series introduces load-acquire/store-release semantics
for the AF_XDP rings.

For most contemporary architectures, this is more effective than a
SPSC ring based on smp_{r,w,}mb() barriers. More importantly,
load-acquire/store-release semantics make the ring code easier to
follow.

This is effectively the change done in commit 6c43c091bdc5
("documentation: Update circular buffer for
load-acquire/store-release"), but for the AF_XDP rings.

Both libbpf and the kernel-side are updated.

More details in each commit.


Thanks,
Björn


Björn Töpel (2):
  xsk: update rings for load-acquire/store-release semantics
  libbpf, xsk: add libbpf_smp_store_release libbpf_smp_load_acquire

 net/xdp/xsk_queue.h         | 27 ++++++--------
 tools/lib/bpf/libbpf_util.h | 72 +++++++++++++++++++++++++------------
 tools/lib/bpf/xsk.h         | 17 +++------
 3 files changed, 66 insertions(+), 50 deletions(-)


base-commit: 85e142cb42a1e7b33971bf035dae432d8670c46b
-- 
2.27.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH bpf-next 1/2] xsk: update rings for load-acquire/store-release semantics
  2021-03-01 10:43 [PATCH bpf-next 0/2] load-acquire/store-release semantics for AF_XDP rings Björn Töpel
@ 2021-03-01 10:43 ` Björn Töpel
  2021-03-01 16:08   ` Toke Høiland-Jørgensen
  2021-03-01 10:43 ` [PATCH bpf-next 2/2] libbpf, xsk: add libbpf_smp_store_release libbpf_smp_load_acquire Björn Töpel
  1 sibling, 1 reply; 18+ messages in thread
From: Björn Töpel @ 2021-03-01 10:43 UTC (permalink / raw)
  To: ast, daniel, netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, jonathan.lemon, maximmi, andrii

From: Björn Töpel <bjorn.topel@intel.com>

Currently, the AF_XDP rings uses smp_{r,w,}mb() fences on the
kernel-side. By updating the rings for load-acquire/store-release
semantics, the full barrier on the consumer side can be replaced with
improved performance as a nice side-effect.

Note that this change does *not* require similar changes on the
libbpf/userland side, however it is recommended [1].

On x86-64 systems, by removing the smp_mb() on the Rx and Tx side, the
l2fwd AF_XDP xdpsock sample performance increases by
1%. Weakly-ordered platforms, such as ARM64 might benefit even more.

[1] https://lore.kernel.org/bpf/20200316184423.GA14143@willie-the-truck/

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk_queue.h | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 2823b7c3302d..e24279d8d845 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -47,19 +47,18 @@ struct xsk_queue {
 	u64 queue_empty_descs;
 };
 
-/* The structure of the shared state of the rings are the same as the
- * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion
- * ring, the kernel is the producer and user space is the consumer. For
- * the Tx and fill rings, the kernel is the consumer and user space is
- * the producer.
+/* The structure of the shared state of the rings are a simple
+ * circular buffer, as outlined in
+ * Documentation/core-api/circular-buffers.rst. For the Rx and
+ * completion ring, the kernel is the producer and user space is the
+ * consumer. For the Tx and fill rings, the kernel is the consumer and
+ * user space is the producer.
  *
  * producer                         consumer
  *
- * if (LOAD ->consumer) {           LOAD ->producer
- *                    (A)           smp_rmb()       (C)
+ * if (LOAD ->consumer) {  (A)      LOAD.acq ->producer  (C)
  *    STORE $data                   LOAD $data
- *    smp_wmb()       (B)           smp_mb()        (D)
- *    STORE ->producer              STORE ->consumer
+ *    STORE.rel ->producer (B)      STORE.rel ->consumer (D)
  * }
  *
  * (A) pairs with (D), and (B) pairs with (C).
@@ -227,15 +226,13 @@ static inline u32 xskq_cons_read_desc_batch(struct xsk_queue *q,
 
 static inline void __xskq_cons_release(struct xsk_queue *q)
 {
-	smp_mb(); /* D, matches A */
-	WRITE_ONCE(q->ring->consumer, q->cached_cons);
+	smp_store_release(&q->ring->consumer, q->cached_cons); /* D, matchees A */
 }
 
 static inline void __xskq_cons_peek(struct xsk_queue *q)
 {
 	/* Refresh the local pointer */
-	q->cached_prod = READ_ONCE(q->ring->producer);
-	smp_rmb(); /* C, matches B */
+	q->cached_prod = smp_load_acquire(&q->ring->producer);  /* C, matches B */
 }
 
 static inline void xskq_cons_get_entries(struct xsk_queue *q)
@@ -397,9 +394,7 @@ static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
 
 static inline void __xskq_prod_submit(struct xsk_queue *q, u32 idx)
 {
-	smp_wmb(); /* B, matches C */
-
-	WRITE_ONCE(q->ring->producer, idx);
+	smp_store_release(&q->ring->producer, idx); /* B, matches C */
 }
 
 static inline void xskq_prod_submit(struct xsk_queue *q)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH bpf-next 2/2] libbpf, xsk: add libbpf_smp_store_release libbpf_smp_load_acquire
  2021-03-01 10:43 [PATCH bpf-next 0/2] load-acquire/store-release semantics for AF_XDP rings Björn Töpel
  2021-03-01 10:43 ` [PATCH bpf-next 1/2] xsk: update rings for load-acquire/store-release semantics Björn Töpel
@ 2021-03-01 10:43 ` Björn Töpel
  2021-03-01 16:10   ` Toke Høiland-Jørgensen
  2021-03-03  4:38   ` Andrii Nakryiko
  1 sibling, 2 replies; 18+ messages in thread
From: Björn Töpel @ 2021-03-01 10:43 UTC (permalink / raw)
  To: ast, daniel, netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, jonathan.lemon, maximmi, andrii

From: Björn Töpel <bjorn.topel@intel.com>

Now that the AF_XDP rings have load-acquire/store-release semantics,
move libbpf to that as well.

The library-internal libbpf_smp_{load_acquire,store_release} are only
valid for 32-bit words on ARM64.

Also, remove the barriers that are no longer in use.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 tools/lib/bpf/libbpf_util.h | 72 +++++++++++++++++++++++++------------
 tools/lib/bpf/xsk.h         | 17 +++------
 2 files changed, 55 insertions(+), 34 deletions(-)

diff --git a/tools/lib/bpf/libbpf_util.h b/tools/lib/bpf/libbpf_util.h
index 59c779c5790c..94a0d7bb6f3c 100644
--- a/tools/lib/bpf/libbpf_util.h
+++ b/tools/lib/bpf/libbpf_util.h
@@ -5,6 +5,7 @@
 #define __LIBBPF_LIBBPF_UTIL_H
 
 #include <stdbool.h>
+#include <linux/compiler.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -15,29 +16,56 @@ extern "C" {
  * application that uses libbpf.
  */
 #if defined(__i386__) || defined(__x86_64__)
-# define libbpf_smp_rmb() asm volatile("" : : : "memory")
-# define libbpf_smp_wmb() asm volatile("" : : : "memory")
-# define libbpf_smp_mb() \
-	asm volatile("lock; addl $0,-4(%%rsp)" : : : "memory", "cc")
-/* Hinders stores to be observed before older loads. */
-# define libbpf_smp_rwmb() asm volatile("" : : : "memory")
+# define libbpf_smp_store_release(p, v)					\
+	do {								\
+		asm volatile("" : : : "memory");			\
+		WRITE_ONCE(*p, v);					\
+	} while (0)
+# define libbpf_smp_load_acquire(p)					\
+	({								\
+		typeof(*p) ___p1 = READ_ONCE(*p);			\
+		asm volatile("" : : : "memory");			\
+		___p1;							\
+	})
 #elif defined(__aarch64__)
-# define libbpf_smp_rmb() asm volatile("dmb ishld" : : : "memory")
-# define libbpf_smp_wmb() asm volatile("dmb ishst" : : : "memory")
-# define libbpf_smp_mb() asm volatile("dmb ish" : : : "memory")
-# define libbpf_smp_rwmb() libbpf_smp_mb()
-#elif defined(__arm__)
-/* These are only valid for armv7 and above */
-# define libbpf_smp_rmb() asm volatile("dmb ish" : : : "memory")
-# define libbpf_smp_wmb() asm volatile("dmb ishst" : : : "memory")
-# define libbpf_smp_mb() asm volatile("dmb ish" : : : "memory")
-# define libbpf_smp_rwmb() libbpf_smp_mb()
-#else
-/* Architecture missing native barrier functions. */
-# define libbpf_smp_rmb() __sync_synchronize()
-# define libbpf_smp_wmb() __sync_synchronize()
-# define libbpf_smp_mb() __sync_synchronize()
-# define libbpf_smp_rwmb() __sync_synchronize()
+# define libbpf_smp_store_release(p, v)					\
+		asm volatile ("stlr %w1, %0" : "=Q" (*p) : "r" (v) : "memory")
+# define libbpf_smp_load_acquire(p)					\
+	({								\
+		typeof(*p) ___p1;					\
+		asm volatile ("ldar %w0, %1"				\
+			      : "=r" (___p1) : "Q" (*p) : "memory");	\
+		__p1;							\
+	})
+#elif defined(__riscv)
+# define libbpf_smp_store_release(p, v)					\
+	do {								\
+		asm volatile ("fence rw,w" : : : "memory");		\
+		WRITE_ONCE(*p, v);					\
+	} while (0)
+# define libbpf_smp_load_acquire(p)					\
+	({								\
+		typeof(*p) ___p1 = READ_ONCE(*p);			\
+		asm volatile ("fence r,rw" : : : "memory");		\
+		___p1;							\
+	})
+#endif
+
+#ifndef libbpf_smp_store_release
+#define libbpf_smp_store_release(p, v)					\
+	do {								\
+		__sync_synchronize();					\
+		WRITE_ONCE(*p, v);					\
+	} while (0)
+#endif
+
+#ifndef libbpf_smp_load_acquire
+#define libbpf_smp_load_acquire(p)					\
+	({								\
+		typeof(*p) ___p1 = READ_ONCE(*p);			\
+		__sync_synchronize();					\
+		___p1;							\
+	})
 #endif
 
 #ifdef __cplusplus
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index e9f121f5d129..a9fdea87b5cd 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -96,7 +96,8 @@ static inline __u32 xsk_prod_nb_free(struct xsk_ring_prod *r, __u32 nb)
 	 * this function. Without this optimization it whould have been
 	 * free_entries = r->cached_prod - r->cached_cons + r->size.
 	 */
-	r->cached_cons = *r->consumer + r->size;
+	r->cached_cons = libbpf_smp_load_acquire(r->consumer);
+	r->cached_cons += r->size;
 
 	return r->cached_cons - r->cached_prod;
 }
@@ -106,7 +107,7 @@ static inline __u32 xsk_cons_nb_avail(struct xsk_ring_cons *r, __u32 nb)
 	__u32 entries = r->cached_prod - r->cached_cons;
 
 	if (entries == 0) {
-		r->cached_prod = *r->producer;
+		r->cached_prod = libbpf_smp_load_acquire(r->producer);
 		entries = r->cached_prod - r->cached_cons;
 	}
 
@@ -129,9 +130,7 @@ static inline void xsk_ring_prod__submit(struct xsk_ring_prod *prod, __u32 nb)
 	/* Make sure everything has been written to the ring before indicating
 	 * this to the kernel by writing the producer pointer.
 	 */
-	libbpf_smp_wmb();
-
-	*prod->producer += nb;
+	libbpf_smp_store_release(prod->producer, *prod->producer + nb);
 }
 
 static inline __u32 xsk_ring_cons__peek(struct xsk_ring_cons *cons, __u32 nb, __u32 *idx)
@@ -139,11 +138,6 @@ static inline __u32 xsk_ring_cons__peek(struct xsk_ring_cons *cons, __u32 nb, __
 	__u32 entries = xsk_cons_nb_avail(cons, nb);
 
 	if (entries > 0) {
-		/* Make sure we do not speculatively read the data before
-		 * we have received the packet buffers from the ring.
-		 */
-		libbpf_smp_rmb();
-
 		*idx = cons->cached_cons;
 		cons->cached_cons += entries;
 	}
@@ -161,9 +155,8 @@ static inline void xsk_ring_cons__release(struct xsk_ring_cons *cons, __u32 nb)
 	/* Make sure data has been read before indicating we are done
 	 * with the entries by updating the consumer pointer.
 	 */
-	libbpf_smp_rwmb();
+	libbpf_smp_store_release(cons->consumer, *cons->consumer + nb);
 
-	*cons->consumer += nb;
 }
 
 static inline void *xsk_umem__get_data(void *umem_area, __u64 addr)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 1/2] xsk: update rings for load-acquire/store-release semantics
  2021-03-01 10:43 ` [PATCH bpf-next 1/2] xsk: update rings for load-acquire/store-release semantics Björn Töpel
@ 2021-03-01 16:08   ` Toke Høiland-Jørgensen
  2021-03-02  8:04     ` Björn Töpel
  0 siblings, 1 reply; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-03-01 16:08 UTC (permalink / raw)
  To: Björn Töpel, ast, daniel, netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, jonathan.lemon, maximmi, andrii

Björn Töpel <bjorn.topel@gmail.com> writes:

> From: Björn Töpel <bjorn.topel@intel.com>
>
> Currently, the AF_XDP rings uses smp_{r,w,}mb() fences on the
> kernel-side. By updating the rings for load-acquire/store-release
> semantics, the full barrier on the consumer side can be replaced with
> improved performance as a nice side-effect.
>
> Note that this change does *not* require similar changes on the
> libbpf/userland side, however it is recommended [1].
>
> On x86-64 systems, by removing the smp_mb() on the Rx and Tx side, the
> l2fwd AF_XDP xdpsock sample performance increases by
> 1%. Weakly-ordered platforms, such as ARM64 might benefit even more.
>
> [1] https://lore.kernel.org/bpf/20200316184423.GA14143@willie-the-truck/
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  net/xdp/xsk_queue.h | 27 +++++++++++----------------
>  1 file changed, 11 insertions(+), 16 deletions(-)
>
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index 2823b7c3302d..e24279d8d845 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -47,19 +47,18 @@ struct xsk_queue {
>  	u64 queue_empty_descs;
>  };
>  
> -/* The structure of the shared state of the rings are the same as the
> - * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion
> - * ring, the kernel is the producer and user space is the consumer. For
> - * the Tx and fill rings, the kernel is the consumer and user space is
> - * the producer.
> +/* The structure of the shared state of the rings are a simple
> + * circular buffer, as outlined in
> + * Documentation/core-api/circular-buffers.rst. For the Rx and
> + * completion ring, the kernel is the producer and user space is the
> + * consumer. For the Tx and fill rings, the kernel is the consumer and
> + * user space is the producer.
>   *
>   * producer                         consumer
>   *
> - * if (LOAD ->consumer) {           LOAD ->producer
> - *                    (A)           smp_rmb()       (C)
> + * if (LOAD ->consumer) {  (A)      LOAD.acq ->producer  (C)

Why is LOAD.acq not needed on the consumer side?

-Toke


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 2/2] libbpf, xsk: add libbpf_smp_store_release libbpf_smp_load_acquire
  2021-03-01 10:43 ` [PATCH bpf-next 2/2] libbpf, xsk: add libbpf_smp_store_release libbpf_smp_load_acquire Björn Töpel
@ 2021-03-01 16:10   ` Toke Høiland-Jørgensen
  2021-03-02  8:05     ` Björn Töpel
  2021-03-03  4:38   ` Andrii Nakryiko
  1 sibling, 1 reply; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-03-01 16:10 UTC (permalink / raw)
  To: Björn Töpel, ast, daniel, netdev, bpf
  Cc: Björn Töpel, magnus.karlsson, jonathan.lemon, maximmi, andrii

Björn Töpel <bjorn.topel@gmail.com> writes:

> From: Björn Töpel <bjorn.topel@intel.com>
>
> Now that the AF_XDP rings have load-acquire/store-release semantics,
> move libbpf to that as well.
>
> The library-internal libbpf_smp_{load_acquire,store_release} are only
> valid for 32-bit words on ARM64.
>
> Also, remove the barriers that are no longer in use.

So what happens if an updated libbpf is paired with an older kernel (or
vice versa)?

-Toke


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 1/2] xsk: update rings for load-acquire/store-release semantics
  2021-03-01 16:08   ` Toke Høiland-Jørgensen
@ 2021-03-02  8:04     ` Björn Töpel
  2021-03-02 10:23       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 18+ messages in thread
From: Björn Töpel @ 2021-03-02  8:04 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Björn Töpel, ast,
	daniel, netdev, bpf, paulmck
  Cc: magnus.karlsson, jonathan.lemon, maximmi, andrii




On 2021-03-01 17:08, Toke Høiland-Jørgensen wrote:
> Björn Töpel <bjorn.topel@gmail.com> writes:
> 
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> Currently, the AF_XDP rings uses smp_{r,w,}mb() fences on the
>> kernel-side. By updating the rings for load-acquire/store-release
>> semantics, the full barrier on the consumer side can be replaced with
>> improved performance as a nice side-effect.
>>
>> Note that this change does *not* require similar changes on the
>> libbpf/userland side, however it is recommended [1].
>>
>> On x86-64 systems, by removing the smp_mb() on the Rx and Tx side, the
>> l2fwd AF_XDP xdpsock sample performance increases by
>> 1%. Weakly-ordered platforms, such as ARM64 might benefit even more.
>>
>> [1] https://lore.kernel.org/bpf/20200316184423.GA14143@willie-the-truck/
>>
>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>> ---
>>   net/xdp/xsk_queue.h | 27 +++++++++++----------------
>>   1 file changed, 11 insertions(+), 16 deletions(-)
>>
>> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
>> index 2823b7c3302d..e24279d8d845 100644
>> --- a/net/xdp/xsk_queue.h
>> +++ b/net/xdp/xsk_queue.h
>> @@ -47,19 +47,18 @@ struct xsk_queue {
>>   	u64 queue_empty_descs;
>>   };
>>   
>> -/* The structure of the shared state of the rings are the same as the
>> - * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion
>> - * ring, the kernel is the producer and user space is the consumer. For
>> - * the Tx and fill rings, the kernel is the consumer and user space is
>> - * the producer.
>> +/* The structure of the shared state of the rings are a simple
>> + * circular buffer, as outlined in
>> + * Documentation/core-api/circular-buffers.rst. For the Rx and
>> + * completion ring, the kernel is the producer and user space is the
>> + * consumer. For the Tx and fill rings, the kernel is the consumer and
>> + * user space is the producer.
>>    *
>>    * producer                         consumer
>>    *
>> - * if (LOAD ->consumer) {           LOAD ->producer
>> - *                    (A)           smp_rmb()       (C)
>> + * if (LOAD ->consumer) {  (A)      LOAD.acq ->producer  (C)
> 
> Why is LOAD.acq not needed on the consumer side?
>

You mean why LOAD.acq is not needed on the *producer* side, i.e. the
->consumer? The ->consumer is a control dependency for the store, so
there is no ordering constraint for ->consumer at producer side. If
there's no space, no data is written. So, no barrier is needed there --
at least that has been my perspective.

This is very similar to the buffer in
Documentation/core-api/circular-buffers.rst. Roping in Paul for some
guidance.


Björn

> -Toke
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 2/2] libbpf, xsk: add libbpf_smp_store_release libbpf_smp_load_acquire
  2021-03-01 16:10   ` Toke Høiland-Jørgensen
@ 2021-03-02  8:05     ` Björn Töpel
  2021-03-02  9:13       ` Daniel Borkmann
  0 siblings, 1 reply; 18+ messages in thread
From: Björn Töpel @ 2021-03-02  8:05 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Björn Töpel, ast,
	daniel, netdev, bpf
  Cc: magnus.karlsson, jonathan.lemon, maximmi, andrii, will

On 2021-03-01 17:10, Toke Høiland-Jørgensen wrote:
> Björn Töpel <bjorn.topel@gmail.com> writes:
> 
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> Now that the AF_XDP rings have load-acquire/store-release semantics,
>> move libbpf to that as well.
>>
>> The library-internal libbpf_smp_{load_acquire,store_release} are only
>> valid for 32-bit words on ARM64.
>>
>> Also, remove the barriers that are no longer in use.
> 
> So what happens if an updated libbpf is paired with an older kernel (or
> vice versa)?
>

"This is fine." ;-) This was briefly discussed in [1], outlined by the
previous commit!

...even on POWER.


Björn

[1] https://lore.kernel.org/bpf/20200316184423.GA14143@willie-the-truck/


> -Toke
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 2/2] libbpf, xsk: add libbpf_smp_store_release libbpf_smp_load_acquire
  2021-03-02  8:05     ` Björn Töpel
@ 2021-03-02  9:13       ` Daniel Borkmann
  2021-03-02  9:16         ` Björn Töpel
  2021-03-03 15:39         ` Will Deacon
  0 siblings, 2 replies; 18+ messages in thread
From: Daniel Borkmann @ 2021-03-02  9:13 UTC (permalink / raw)
  To: Björn Töpel, Toke Høiland-Jørgensen,
	Björn Töpel, ast, netdev, bpf
  Cc: magnus.karlsson, jonathan.lemon, maximmi, andrii, will

On 3/2/21 9:05 AM, Björn Töpel wrote:
> On 2021-03-01 17:10, Toke Høiland-Jørgensen wrote:
>> Björn Töpel <bjorn.topel@gmail.com> writes:
>>> From: Björn Töpel <bjorn.topel@intel.com>
>>>
>>> Now that the AF_XDP rings have load-acquire/store-release semantics,
>>> move libbpf to that as well.
>>>
>>> The library-internal libbpf_smp_{load_acquire,store_release} are only
>>> valid for 32-bit words on ARM64.
>>>
>>> Also, remove the barriers that are no longer in use.
>>
>> So what happens if an updated libbpf is paired with an older kernel (or
>> vice versa)?
> 
> "This is fine." ;-) This was briefly discussed in [1], outlined by the
> previous commit!
> 
> ...even on POWER.

Could you put a summary or quote of that discussion on 'why it is okay and does not
cause /forward or backward/ compat issues with user space' directly into patch 1's
commit message?

I feel just referring to a link is probably less suitable in this case as it should
rather be part of the commit message that contains the justification on why it is
waterproof - at least it feels that specific area may be a bit under-documented, so
having it as direct part certainly doesn't hurt.

Would also be great to get Will's ACK on that when you have a v2. :)

Thanks,
Daniel

> [1] https://lore.kernel.org/bpf/20200316184423.GA14143@willie-the-truck/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 2/2] libbpf, xsk: add libbpf_smp_store_release libbpf_smp_load_acquire
  2021-03-02  9:13       ` Daniel Borkmann
@ 2021-03-02  9:16         ` Björn Töpel
  2021-03-02  9:25           ` Daniel Borkmann
  2021-03-03 15:39         ` Will Deacon
  1 sibling, 1 reply; 18+ messages in thread
From: Björn Töpel @ 2021-03-02  9:16 UTC (permalink / raw)
  To: Daniel Borkmann, Toke Høiland-Jørgensen,
	Björn Töpel, ast, netdev, bpf
  Cc: magnus.karlsson, jonathan.lemon, maximmi, andrii, will

On 2021-03-02 10:13, Daniel Borkmann wrote:
> On 3/2/21 9:05 AM, Björn Töpel wrote:
>> On 2021-03-01 17:10, Toke Høiland-Jørgensen wrote:
>>> Björn Töpel <bjorn.topel@gmail.com> writes:
>>>> From: Björn Töpel <bjorn.topel@intel.com>
>>>>
>>>> Now that the AF_XDP rings have load-acquire/store-release semantics,
>>>> move libbpf to that as well.
>>>>
>>>> The library-internal libbpf_smp_{load_acquire,store_release} are only
>>>> valid for 32-bit words on ARM64.
>>>>
>>>> Also, remove the barriers that are no longer in use.
>>>
>>> So what happens if an updated libbpf is paired with an older kernel (or
>>> vice versa)?
>>
>> "This is fine." ;-) This was briefly discussed in [1], outlined by the
>> previous commit!
>>
>> ...even on POWER.
> 
> Could you put a summary or quote of that discussion on 'why it is okay 
> and does not
> cause /forward or backward/ compat issues with user space' directly into 
> patch 1's
> commit message?
> 
> I feel just referring to a link is probably less suitable in this case 
> as it should
> rather be part of the commit message that contains the justification on 
> why it is
> waterproof - at least it feels that specific area may be a bit 
> under-documented, so
> having it as direct part certainly doesn't hurt.
>

I agree; It's enough in the weed as it is already.

I wonder if it's possible to cook a LKMM litmus test for this...?


> Would also be great to get Will's ACK on that when you have a v2. :)
>

Yup! :-)


Björn


> Thanks,
> Daniel
> 
>> [1] https://lore.kernel.org/bpf/20200316184423.GA14143@willie-the-truck/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 2/2] libbpf, xsk: add libbpf_smp_store_release libbpf_smp_load_acquire
  2021-03-02  9:16         ` Björn Töpel
@ 2021-03-02  9:25           ` Daniel Borkmann
  2021-03-03  8:08             ` Björn Töpel
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Borkmann @ 2021-03-02  9:25 UTC (permalink / raw)
  To: Björn Töpel, Toke Høiland-Jørgensen,
	Björn Töpel, ast, netdev, bpf
  Cc: magnus.karlsson, jonathan.lemon, maximmi, andrii, will, paulmck

On 3/2/21 10:16 AM, Björn Töpel wrote:
> On 2021-03-02 10:13, Daniel Borkmann wrote:
>> On 3/2/21 9:05 AM, Björn Töpel wrote:
>>> On 2021-03-01 17:10, Toke Høiland-Jørgensen wrote:
>>>> Björn Töpel <bjorn.topel@gmail.com> writes:
>>>>> From: Björn Töpel <bjorn.topel@intel.com>
>>>>>
>>>>> Now that the AF_XDP rings have load-acquire/store-release semantics,
>>>>> move libbpf to that as well.
>>>>>
>>>>> The library-internal libbpf_smp_{load_acquire,store_release} are only
>>>>> valid for 32-bit words on ARM64.
>>>>>
>>>>> Also, remove the barriers that are no longer in use.
>>>>
>>>> So what happens if an updated libbpf is paired with an older kernel (or
>>>> vice versa)?
>>>
>>> "This is fine." ;-) This was briefly discussed in [1], outlined by the
>>> previous commit!
>>>
>>> ...even on POWER.
>>
>> Could you put a summary or quote of that discussion on 'why it is okay and does not
>> cause /forward or backward/ compat issues with user space' directly into patch 1's
>> commit message?
>>
>> I feel just referring to a link is probably less suitable in this case as it should
>> rather be part of the commit message that contains the justification on why it is
>> waterproof - at least it feels that specific area may be a bit under-documented, so
>> having it as direct part certainly doesn't hurt.
> 
> I agree; It's enough in the weed as it is already.
> 
> I wonder if it's possible to cook a LKMM litmus test for this...?

That would be amazing! :-)

(Another option which can be done independently could be to update [0] with outlining a
  pairing scenario as we have here describing the forward/backward compatibility on the
  barriers used, I think that would be quite useful as well.)

   [0] Documentation/memory-barriers.txt

>> Would also be great to get Will's ACK on that when you have a v2. :)
> 
> Yup! :-)
> 
> 
> Björn
> 
> 
>> Thanks,
>> Daniel
>>
>>> [1] https://lore.kernel.org/bpf/20200316184423.GA14143@willie-the-truck/


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 1/2] xsk: update rings for load-acquire/store-release semantics
  2021-03-02  8:04     ` Björn Töpel
@ 2021-03-02 10:23       ` Toke Høiland-Jørgensen
  2021-03-03  7:56         ` Björn Töpel
  0 siblings, 1 reply; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-03-02 10:23 UTC (permalink / raw)
  To: Björn Töpel, Björn Töpel, ast, daniel,
	netdev, bpf, paulmck
  Cc: magnus.karlsson, jonathan.lemon, maximmi, andrii

Björn Töpel <bjorn.topel@intel.com> writes:

> On 2021-03-01 17:08, Toke Høiland-Jørgensen wrote:
>> Björn Töpel <bjorn.topel@gmail.com> writes:
>> 
>>> From: Björn Töpel <bjorn.topel@intel.com>
>>>
>>> Currently, the AF_XDP rings uses smp_{r,w,}mb() fences on the
>>> kernel-side. By updating the rings for load-acquire/store-release
>>> semantics, the full barrier on the consumer side can be replaced with
>>> improved performance as a nice side-effect.
>>>
>>> Note that this change does *not* require similar changes on the
>>> libbpf/userland side, however it is recommended [1].
>>>
>>> On x86-64 systems, by removing the smp_mb() on the Rx and Tx side, the
>>> l2fwd AF_XDP xdpsock sample performance increases by
>>> 1%. Weakly-ordered platforms, such as ARM64 might benefit even more.
>>>
>>> [1] https://lore.kernel.org/bpf/20200316184423.GA14143@willie-the-truck/
>>>
>>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>>> ---
>>>   net/xdp/xsk_queue.h | 27 +++++++++++----------------
>>>   1 file changed, 11 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
>>> index 2823b7c3302d..e24279d8d845 100644
>>> --- a/net/xdp/xsk_queue.h
>>> +++ b/net/xdp/xsk_queue.h
>>> @@ -47,19 +47,18 @@ struct xsk_queue {
>>>   	u64 queue_empty_descs;
>>>   };
>>>   
>>> -/* The structure of the shared state of the rings are the same as the
>>> - * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion
>>> - * ring, the kernel is the producer and user space is the consumer. For
>>> - * the Tx and fill rings, the kernel is the consumer and user space is
>>> - * the producer.
>>> +/* The structure of the shared state of the rings are a simple
>>> + * circular buffer, as outlined in
>>> + * Documentation/core-api/circular-buffers.rst. For the Rx and
>>> + * completion ring, the kernel is the producer and user space is the
>>> + * consumer. For the Tx and fill rings, the kernel is the consumer and
>>> + * user space is the producer.
>>>    *
>>>    * producer                         consumer
>>>    *
>>> - * if (LOAD ->consumer) {           LOAD ->producer
>>> - *                    (A)           smp_rmb()       (C)
>>> + * if (LOAD ->consumer) {  (A)      LOAD.acq ->producer  (C)
>> 
>> Why is LOAD.acq not needed on the consumer side?
>>
>
> You mean why LOAD.acq is not needed on the *producer* side, i.e. the
> ->consumer?

Yes, of course! The two words were, like, right next to each other ;)

> The ->consumer is a control dependency for the store, so there is no
> ordering constraint for ->consumer at producer side. If there's no
> space, no data is written. So, no barrier is needed there -- at least
> that has been my perspective.
>
> This is very similar to the buffer in
> Documentation/core-api/circular-buffers.rst. Roping in Paul for some
> guidance.

Yeah, I did read that, but got thrown off by this bit: "Therefore, the
unlock-lock pair between consecutive invocations of the consumer
provides the necessary ordering between the read of the index indicating
that the consumer has vacated a given element and the write by the
producer to that same element."

Since there is no lock in the XSK, what provides that guarantee here?


Oh, and BTW, when I re-read the rest of the comment in xsk_queue.h
(below the diagram you are changing in this patch), the text still talks
about "memory barriers" - maybe that should be updated to
release/acquire as well while you're changing things?

-Toke


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 2/2] libbpf, xsk: add libbpf_smp_store_release libbpf_smp_load_acquire
  2021-03-01 10:43 ` [PATCH bpf-next 2/2] libbpf, xsk: add libbpf_smp_store_release libbpf_smp_load_acquire Björn Töpel
  2021-03-01 16:10   ` Toke Høiland-Jørgensen
@ 2021-03-03  4:38   ` Andrii Nakryiko
  2021-03-03  7:14     ` Björn Töpel
  1 sibling, 1 reply; 18+ messages in thread
From: Andrii Nakryiko @ 2021-03-03  4:38 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Alexei Starovoitov, Daniel Borkmann, Networking, bpf,
	Björn Töpel, Magnus Karlsson, Jonathan Lemon, maximmi,
	Andrii Nakryiko

On Mon, Mar 1, 2021 at 2:43 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Now that the AF_XDP rings have load-acquire/store-release semantics,
> move libbpf to that as well.
>
> The library-internal libbpf_smp_{load_acquire,store_release} are only
> valid for 32-bit words on ARM64.
>
> Also, remove the barriers that are no longer in use.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  tools/lib/bpf/libbpf_util.h | 72 +++++++++++++++++++++++++------------
>  tools/lib/bpf/xsk.h         | 17 +++------
>  2 files changed, 55 insertions(+), 34 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf_util.h b/tools/lib/bpf/libbpf_util.h
> index 59c779c5790c..94a0d7bb6f3c 100644
> --- a/tools/lib/bpf/libbpf_util.h
> +++ b/tools/lib/bpf/libbpf_util.h
> @@ -5,6 +5,7 @@
>  #define __LIBBPF_LIBBPF_UTIL_H
>
>  #include <stdbool.h>
> +#include <linux/compiler.h>
>
>  #ifdef __cplusplus
>  extern "C" {
> @@ -15,29 +16,56 @@ extern "C" {
>   * application that uses libbpf.
>   */
>  #if defined(__i386__) || defined(__x86_64__)
> -# define libbpf_smp_rmb() asm volatile("" : : : "memory")
> -# define libbpf_smp_wmb() asm volatile("" : : : "memory")
> -# define libbpf_smp_mb() \
> -       asm volatile("lock; addl $0,-4(%%rsp)" : : : "memory", "cc")
> -/* Hinders stores to be observed before older loads. */
> -# define libbpf_smp_rwmb() asm volatile("" : : : "memory")

So, technically, these four are part of libbpf's API, as libbpf_util.h
is actually installed on target hosts. Seems like xsk.h is the only
one that is using them, though.

So the question is whether it's ok to remove them now?

And also, why wasn't this part of xsk.h in the first place?

> +# define libbpf_smp_store_release(p, v)                                        \
> +       do {                                                            \
> +               asm volatile("" : : : "memory");                        \
> +               WRITE_ONCE(*p, v);                                      \
> +       } while (0)
> +# define libbpf_smp_load_acquire(p)                                    \
> +       ({                                                              \
> +               typeof(*p) ___p1 = READ_ONCE(*p);                       \
> +               asm volatile("" : : : "memory");                        \
> +               ___p1;                                                  \
> +       })

[...]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 2/2] libbpf, xsk: add libbpf_smp_store_release libbpf_smp_load_acquire
  2021-03-03  4:38   ` Andrii Nakryiko
@ 2021-03-03  7:14     ` Björn Töpel
  2021-03-03  8:19       ` Björn Töpel
  0 siblings, 1 reply; 18+ messages in thread
From: Björn Töpel @ 2021-03-03  7:14 UTC (permalink / raw)
  To: Andrii Nakryiko, Björn Töpel
  Cc: Alexei Starovoitov, Daniel Borkmann, Networking, bpf,
	Magnus Karlsson, Jonathan Lemon, maximmi, Andrii Nakryiko

On 2021-03-03 05:38, Andrii Nakryiko wrote:
> On Mon, Mar 1, 2021 at 2:43 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>>
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> Now that the AF_XDP rings have load-acquire/store-release semantics,
>> move libbpf to that as well.
>>
>> The library-internal libbpf_smp_{load_acquire,store_release} are only
>> valid for 32-bit words on ARM64.
>>
>> Also, remove the barriers that are no longer in use.
>>
>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>> ---
>>   tools/lib/bpf/libbpf_util.h | 72 +++++++++++++++++++++++++------------
>>   tools/lib/bpf/xsk.h         | 17 +++------
>>   2 files changed, 55 insertions(+), 34 deletions(-)
>>
>> diff --git a/tools/lib/bpf/libbpf_util.h b/tools/lib/bpf/libbpf_util.h
>> index 59c779c5790c..94a0d7bb6f3c 100644
>> --- a/tools/lib/bpf/libbpf_util.h
>> +++ b/tools/lib/bpf/libbpf_util.h
>> @@ -5,6 +5,7 @@
>>   #define __LIBBPF_LIBBPF_UTIL_H
>>
>>   #include <stdbool.h>
>> +#include <linux/compiler.h>
>>
>>   #ifdef __cplusplus
>>   extern "C" {
>> @@ -15,29 +16,56 @@ extern "C" {
>>    * application that uses libbpf.
>>    */
>>   #if defined(__i386__) || defined(__x86_64__)
>> -# define libbpf_smp_rmb() asm volatile("" : : : "memory")
>> -# define libbpf_smp_wmb() asm volatile("" : : : "memory")
>> -# define libbpf_smp_mb() \
>> -       asm volatile("lock; addl $0,-4(%%rsp)" : : : "memory", "cc")
>> -/* Hinders stores to be observed before older loads. */
>> -# define libbpf_smp_rwmb() asm volatile("" : : : "memory")
> 
> So, technically, these four are part of libbpf's API, as libbpf_util.h
> is actually installed on target hosts. Seems like xsk.h is the only
> one that is using them, though.
> 
> So the question is whether it's ok to remove them now?
>

I would say that. Ideally, the barriers shouldn't be visible at all,
since they're only used as an implementation detail for the static
inline functions.


> And also, why wasn't this part of xsk.h in the first place?
>

I guess there was a "maybe it can be useful for more than the XDP socket 
parts of libbpf"-idea. I'll move them to xsk.h for the v2, which will 
make the migration easier.


Björn


>> +# define libbpf_smp_store_release(p, v)                                        \
>> +       do {                                                            \
>> +               asm volatile("" : : : "memory");                        \
>> +               WRITE_ONCE(*p, v);                                      \
>> +       } while (0)
>> +# define libbpf_smp_load_acquire(p)                                    \
>> +       ({                                                              \
>> +               typeof(*p) ___p1 = READ_ONCE(*p);                       \
>> +               asm volatile("" : : : "memory");                        \
>> +               ___p1;                                                  \
>> +       })
> 
> [...]
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 1/2] xsk: update rings for load-acquire/store-release semantics
  2021-03-02 10:23       ` Toke Høiland-Jørgensen
@ 2021-03-03  7:56         ` Björn Töpel
  0 siblings, 0 replies; 18+ messages in thread
From: Björn Töpel @ 2021-03-03  7:56 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	Netdev, bpf, paulmck, Karlsson, Magnus, Jonathan Lemon, maximmi,
	Andrii Nakryiko

On Tue, 2 Mar 2021 at 11:23, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Björn Töpel <bjorn.topel@intel.com> writes:
>
> > On 2021-03-01 17:08, Toke Høiland-Jørgensen wrote:
> >> Björn Töpel <bjorn.topel@gmail.com> writes:
> >>
> >>> From: Björn Töpel <bjorn.topel@intel.com>
> >>>
> >>> Currently, the AF_XDP rings uses smp_{r,w,}mb() fences on the
> >>> kernel-side. By updating the rings for load-acquire/store-release
> >>> semantics, the full barrier on the consumer side can be replaced with
> >>> improved performance as a nice side-effect.
> >>>
> >>> Note that this change does *not* require similar changes on the
> >>> libbpf/userland side, however it is recommended [1].
> >>>
> >>> On x86-64 systems, by removing the smp_mb() on the Rx and Tx side, the
> >>> l2fwd AF_XDP xdpsock sample performance increases by
> >>> 1%. Weakly-ordered platforms, such as ARM64 might benefit even more.
> >>>
> >>> [1] https://lore.kernel.org/bpf/20200316184423.GA14143@willie-the-truck/
> >>>
> >>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> >>> ---
> >>>   net/xdp/xsk_queue.h | 27 +++++++++++----------------
> >>>   1 file changed, 11 insertions(+), 16 deletions(-)
> >>>
> >>> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> >>> index 2823b7c3302d..e24279d8d845 100644
> >>> --- a/net/xdp/xsk_queue.h
> >>> +++ b/net/xdp/xsk_queue.h
> >>> @@ -47,19 +47,18 @@ struct xsk_queue {
> >>>     u64 queue_empty_descs;
> >>>   };
> >>>
> >>> -/* The structure of the shared state of the rings are the same as the
> >>> - * ring buffer in kernel/events/ring_buffer.c. For the Rx and completion
> >>> - * ring, the kernel is the producer and user space is the consumer. For
> >>> - * the Tx and fill rings, the kernel is the consumer and user space is
> >>> - * the producer.
> >>> +/* The structure of the shared state of the rings are a simple
> >>> + * circular buffer, as outlined in
> >>> + * Documentation/core-api/circular-buffers.rst. For the Rx and
> >>> + * completion ring, the kernel is the producer and user space is the
> >>> + * consumer. For the Tx and fill rings, the kernel is the consumer and
> >>> + * user space is the producer.
> >>>    *
> >>>    * producer                         consumer
> >>>    *
> >>> - * if (LOAD ->consumer) {           LOAD ->producer
> >>> - *                    (A)           smp_rmb()       (C)
> >>> + * if (LOAD ->consumer) {  (A)      LOAD.acq ->producer  (C)
> >>
> >> Why is LOAD.acq not needed on the consumer side?
> >>
> >
> > You mean why LOAD.acq is not needed on the *producer* side, i.e. the
> > ->consumer?
>
> Yes, of course! The two words were, like, right next to each other ;)
>
> > The ->consumer is a control dependency for the store, so there is no
> > ordering constraint for ->consumer at producer side. If there's no
> > space, no data is written. So, no barrier is needed there -- at least
> > that has been my perspective.
> >
> > This is very similar to the buffer in
> > Documentation/core-api/circular-buffers.rst. Roping in Paul for some
> > guidance.
>
> Yeah, I did read that, but got thrown off by this bit: "Therefore, the
> unlock-lock pair between consecutive invocations of the consumer
> provides the necessary ordering between the read of the index indicating
> that the consumer has vacated a given element and the write by the
> producer to that same element."
>
> Since there is no lock in the XSK, what provides that guarantee here?
>
>
> Oh, and BTW, when I re-read the rest of the comment in xsk_queue.h
> (below the diagram you are changing in this patch), the text still talks
> about "memory barriers" - maybe that should be updated to
> release/acquire as well while you're changing things?
>

Make sense! I'll make sure to do that for the V2!

Björn

> -Toke
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 2/2] libbpf, xsk: add libbpf_smp_store_release libbpf_smp_load_acquire
  2021-03-02  9:25           ` Daniel Borkmann
@ 2021-03-03  8:08             ` Björn Töpel
  0 siblings, 0 replies; 18+ messages in thread
From: Björn Töpel @ 2021-03-03  8:08 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Björn Töpel, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Netdev, bpf, Karlsson, Magnus,
	Jonathan Lemon, maximmi, Andrii Nakryiko, Will Deacon, paulmck

On Tue, 2 Mar 2021 at 10:25, Daniel Borkmann <daniel@iogearbox.net> wrote:
>

[...]

> > I wonder if it's possible to cook a LKMM litmus test for this...?
>
> That would be amazing! :-)
>

With the help of Paul and Alan [1] (Thanks!) I've cooked 8 litmus
tests for this [2].

The litmus tests is based on a one entry ring-buffer, and there are
two scenarios. The ring is full, i.e. the producer has written an
entry, so the consumer has to go first. The ring is empty, i.e. the
producer has to go first. There is one test for each permutation:
barrier only, acqrel only, acqrel+barrier, barrier+acqrel.

According to these tests the code in this series is correct. Now, for
the v2 some more wording/explanations are needed. Do you think I
should include the litmus tests in the patch, or just refer to them?
Paste parts of them into the cover?

> (Another option which can be done independently could be to update [0] with outlining a
>   pairing scenario as we have here describing the forward/backward compatibility on the
>   barriers used, I think that would be quite useful as well.)
>
>    [0] Documentation/memory-barriers.txt
>

Yeah, I agree. There is some information on it though in the "SMP
BARRIER PAIRING" section:
--8<--
General barriers pair with each other, though they also pair with most
other types of barriers, albeit without multicopy atomicity.  An acquire
barrier pairs with a release barrier, but both may also pair with other
barriers, including of course general barriers.  A write barrier pairs
with a data dependency barrier, a control dependency, an acquire barrier,
a release barrier, a read barrier, or a general barrier.  Similarly a
read barrier, control dependency, or a data dependency barrier pairs
with a write barrier, an acquire barrier, a release barrier, or a
general barrier:
-->8--

And there's the tools/memory-model/Documentation/cheatsheet.txt.

That being said; In this case more is more. :-D


Björn

[1] https://lore.kernel.org/lkml/CAJ+HfNhxWFeKnn1aZw-YJmzpBuCaoeGkXXKn058GhY-6ZBDtZA@mail.gmail.com/
[2] https://github.com/bjoto/litmus-xsk/commit/0db0dc426a7e1248f83e21f10f9e840f970f4cb7

> >> Would also be great to get Will's ACK on that when you have a v2. :)
> >
> > Yup! :-)
> >
> >
> > Björn
> >
> >
> >> Thanks,
> >> Daniel
> >>
> >>> [1] https://lore.kernel.org/bpf/20200316184423.GA14143@willie-the-truck/
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 2/2] libbpf, xsk: add libbpf_smp_store_release libbpf_smp_load_acquire
  2021-03-03  7:14     ` Björn Töpel
@ 2021-03-03  8:19       ` Björn Töpel
  0 siblings, 0 replies; 18+ messages in thread
From: Björn Töpel @ 2021-03-03  8:19 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann, Networking,
	bpf, Magnus Karlsson, Jonathan Lemon, maximmi, Andrii Nakryiko

On Wed, 3 Mar 2021 at 08:14, Björn Töpel <bjorn.topel@intel.com> wrote:
>
> On 2021-03-03 05:38, Andrii Nakryiko wrote:
> > On Mon, Mar 1, 2021 at 2:43 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
> >>
> >> From: Björn Töpel <bjorn.topel@intel.com>
> >>
> >> Now that the AF_XDP rings have load-acquire/store-release semantics,
> >> move libbpf to that as well.
> >>
> >> The library-internal libbpf_smp_{load_acquire,store_release} are only
> >> valid for 32-bit words on ARM64.
> >>
> >> Also, remove the barriers that are no longer in use.
> >>
> >> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> >> ---
> >>   tools/lib/bpf/libbpf_util.h | 72 +++++++++++++++++++++++++------------
> >>   tools/lib/bpf/xsk.h         | 17 +++------
> >>   2 files changed, 55 insertions(+), 34 deletions(-)
> >>
> >> diff --git a/tools/lib/bpf/libbpf_util.h b/tools/lib/bpf/libbpf_util.h
> >> index 59c779c5790c..94a0d7bb6f3c 100644
> >> --- a/tools/lib/bpf/libbpf_util.h
> >> +++ b/tools/lib/bpf/libbpf_util.h
> >> @@ -5,6 +5,7 @@
> >>   #define __LIBBPF_LIBBPF_UTIL_H
> >>
> >>   #include <stdbool.h>
> >> +#include <linux/compiler.h>
> >>
> >>   #ifdef __cplusplus
> >>   extern "C" {
> >> @@ -15,29 +16,56 @@ extern "C" {
> >>    * application that uses libbpf.
> >>    */
> >>   #if defined(__i386__) || defined(__x86_64__)
> >> -# define libbpf_smp_rmb() asm volatile("" : : : "memory")
> >> -# define libbpf_smp_wmb() asm volatile("" : : : "memory")
> >> -# define libbpf_smp_mb() \
> >> -       asm volatile("lock; addl $0,-4(%%rsp)" : : : "memory", "cc")
> >> -/* Hinders stores to be observed before older loads. */
> >> -# define libbpf_smp_rwmb() asm volatile("" : : : "memory")
> >
> > So, technically, these four are part of libbpf's API, as libbpf_util.h
> > is actually installed on target hosts. Seems like xsk.h is the only
> > one that is using them, though.
> >
> > So the question is whether it's ok to remove them now?
> >
>
> I would say that. Ideally, the barriers shouldn't be visible at all,
> since they're only used as an implementation detail for the static
> inline functions.
>
>
> > And also, why wasn't this part of xsk.h in the first place?
> >
>
> I guess there was a "maybe it can be useful for more than the XDP socket
> parts of libbpf"-idea. I'll move them to xsk.h for the v2, which will
> make the migration easier.
>

Clarification! The reason for not having them in xsk.h, was that the
idea was that only the APIs allowed from the application should reside
there. IOW, libbpf_utils.h is only "implementation details". Again,
the static-inline function messes things up. Maybe moving to an
LTO-only world would be better, so we can get rid of the inlining all
together.

>
> Björn
>
>
> >> +# define libbpf_smp_store_release(p, v)                                        \
> >> +       do {                                                            \
> >> +               asm volatile("" : : : "memory");                        \
> >> +               WRITE_ONCE(*p, v);                                      \
> >> +       } while (0)
> >> +# define libbpf_smp_load_acquire(p)                                    \
> >> +       ({                                                              \
> >> +               typeof(*p) ___p1 = READ_ONCE(*p);                       \
> >> +               asm volatile("" : : : "memory");                        \
> >> +               ___p1;                                                  \
> >> +       })
> >
> > [...]
> >

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 2/2] libbpf, xsk: add libbpf_smp_store_release libbpf_smp_load_acquire
  2021-03-02  9:13       ` Daniel Borkmann
  2021-03-02  9:16         ` Björn Töpel
@ 2021-03-03 15:39         ` Will Deacon
  2021-03-03 16:34           ` Björn Töpel
  1 sibling, 1 reply; 18+ messages in thread
From: Will Deacon @ 2021-03-03 15:39 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Björn Töpel, Toke Høiland-Jørgensen,
	Björn Töpel, ast, netdev, bpf, magnus.karlsson,
	jonathan.lemon, maximmi, andrii

On Tue, Mar 02, 2021 at 10:13:21AM +0100, Daniel Borkmann wrote:
> On 3/2/21 9:05 AM, Björn Töpel wrote:
> > On 2021-03-01 17:10, Toke Høiland-Jørgensen wrote:
> > > Björn Töpel <bjorn.topel@gmail.com> writes:
> > > > From: Björn Töpel <bjorn.topel@intel.com>
> > > > 
> > > > Now that the AF_XDP rings have load-acquire/store-release semantics,
> > > > move libbpf to that as well.
> > > > 
> > > > The library-internal libbpf_smp_{load_acquire,store_release} are only
> > > > valid for 32-bit words on ARM64.
> > > > 
> > > > Also, remove the barriers that are no longer in use.
> > > 
> > > So what happens if an updated libbpf is paired with an older kernel (or
> > > vice versa)?
> > 
> > "This is fine." ;-) This was briefly discussed in [1], outlined by the
> > previous commit!
> > 
> > ...even on POWER.
> 
> Could you put a summary or quote of that discussion on 'why it is okay and does not
> cause /forward or backward/ compat issues with user space' directly into patch 1's
> commit message?
> 
> I feel just referring to a link is probably less suitable in this case as it should
> rather be part of the commit message that contains the justification on why it is
> waterproof - at least it feels that specific area may be a bit under-documented, so
> having it as direct part certainly doesn't hurt.
> 
> Would also be great to get Will's ACK on that when you have a v2. :)

Please stick me on CC for that and I'll take a look as I've forgotten pretty
much everything about this since last time :)

Will

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 2/2] libbpf, xsk: add libbpf_smp_store_release libbpf_smp_load_acquire
  2021-03-03 15:39         ` Will Deacon
@ 2021-03-03 16:34           ` Björn Töpel
  0 siblings, 0 replies; 18+ messages in thread
From: Björn Töpel @ 2021-03-03 16:34 UTC (permalink / raw)
  To: Will Deacon, Daniel Borkmann
  Cc: Toke Høiland-Jørgensen, Björn Töpel, ast,
	netdev, bpf, magnus.karlsson, jonathan.lemon, maximmi, andrii

On 2021-03-03 16:39, Will Deacon wrote:
> On Tue, Mar 02, 2021 at 10:13:21AM +0100, Daniel Borkmann wrote:

[...]

>>
>> Would also be great to get Will's ACK on that when you have a v2. :)
> 
> Please stick me on CC for that and I'll take a look as I've forgotten pretty
> much everything about this since last time :)
>

:-) I'll do that!

Thanks!


> Will
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-03-03 22:57 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-01 10:43 [PATCH bpf-next 0/2] load-acquire/store-release semantics for AF_XDP rings Björn Töpel
2021-03-01 10:43 ` [PATCH bpf-next 1/2] xsk: update rings for load-acquire/store-release semantics Björn Töpel
2021-03-01 16:08   ` Toke Høiland-Jørgensen
2021-03-02  8:04     ` Björn Töpel
2021-03-02 10:23       ` Toke Høiland-Jørgensen
2021-03-03  7:56         ` Björn Töpel
2021-03-01 10:43 ` [PATCH bpf-next 2/2] libbpf, xsk: add libbpf_smp_store_release libbpf_smp_load_acquire Björn Töpel
2021-03-01 16:10   ` Toke Høiland-Jørgensen
2021-03-02  8:05     ` Björn Töpel
2021-03-02  9:13       ` Daniel Borkmann
2021-03-02  9:16         ` Björn Töpel
2021-03-02  9:25           ` Daniel Borkmann
2021-03-03  8:08             ` Björn Töpel
2021-03-03 15:39         ` Will Deacon
2021-03-03 16:34           ` Björn Töpel
2021-03-03  4:38   ` Andrii Nakryiko
2021-03-03  7:14     ` Björn Töpel
2021-03-03  8:19       ` Björn Töpel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).