BPF Archive on lore.kernel.org
* [PATCH bpf-next] xsk: introduce padding between ring pointers
@ 2020-10-08 14:12 Magnus Karlsson
  2020-10-09 15:03 ` Daniel Borkmann
  0 siblings, 1 reply; 6+ messages in thread
From: Magnus Karlsson @ 2020-10-08 14:12 UTC
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev, jonathan.lemon; +Cc: bpf

From: Magnus Karlsson <magnus.karlsson@intel.com>

Introduce one cache line worth of padding between the producer and
consumer pointers in all the lockless rings. This is so that the HW
adjacency prefetcher will not prefetch the consumer pointer when the
producer pointer is used and vice versa. This improves throughput
for the l2fwd sample app by 2% on my machine with HW prefetching
turned on.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 net/xdp/xsk_queue.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index dc1dd5e..3c235d2 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -15,6 +15,10 @@
 
 struct xdp_ring {
 	u32 producer ____cacheline_aligned_in_smp;
+	/* Hinder the adjacent cache prefetcher to prefetch the consumer pointer if the producer
+	 * pointer is touched and vice versa.
+	 */
+	u32 pad ____cacheline_aligned_in_smp;
 	u32 consumer ____cacheline_aligned_in_smp;
 	u32 flags;
 };
-- 
2.7.4
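
To illustrate what the patch does to the struct layout, here is a minimal
userspace sketch, not the kernel code. It assumes 64-byte cache lines; the
CACHELINE_ALIGNED macro, the struct names ring_before/ring_after and the
helpers line_pair() and print_layouts() are stand-ins invented for this
example (the kernel uses ____cacheline_aligned_in_smp, whose value depends
on the architecture and on CONFIG_SMP).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 64-byte cache lines. Stand-in for ____cacheline_aligned_in_smp. */
#define CACHELINE 64
#define CACHELINE_ALIGNED __attribute__((aligned(CACHELINE)))

/* Before the patch: producer and consumer sit on neighbouring cache lines. */
struct ring_before {
	uint32_t producer CACHELINE_ALIGNED;
	uint32_t consumer CACHELINE_ALIGNED;
	uint32_t flags;
};

/* After the patch: a full padding line separates the two pointers. */
struct ring_after {
	uint32_t producer CACHELINE_ALIGNED;
	uint32_t pad CACHELINE_ALIGNED;
	uint32_t consumer CACHELINE_ALIGNED;
	uint32_t flags;
};

/* Index of the 128-byte aligned pair of cache lines that an offset falls
 * into, assuming the struct itself starts on a 128-byte boundary.
 */
size_t line_pair(size_t offset)
{
	return offset / (2 * CACHELINE);
}

/* Print where the consumer pointer lands relative to the producer. */
void print_layouts(void)
{
	printf("before: consumer at %zu (pair %zu), producer in pair %zu\n",
	       offsetof(struct ring_before, consumer),
	       line_pair(offsetof(struct ring_before, consumer)),
	       line_pair(offsetof(struct ring_before, producer)));
	printf("after:  consumer at %zu (pair %zu), producer in pair %zu\n",
	       offsetof(struct ring_after, consumer),
	       line_pair(offsetof(struct ring_after, consumer)),
	       line_pair(offsetof(struct ring_after, producer)));
}
```

Calling print_layouts() from a main() shows the consumer moving from offset
64 to offset 128. With 128 bytes between the two pointers they can no longer
share an aligned 128-byte pair, whatever the base address of the struct.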



* Re: [PATCH bpf-next] xsk: introduce padding between ring pointers
  2020-10-08 14:12 [PATCH bpf-next] xsk: introduce padding between ring pointers Magnus Karlsson
@ 2020-10-09 15:03 ` Daniel Borkmann
  2020-10-12  8:37   ` Magnus Karlsson
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Borkmann @ 2020-10-09 15:03 UTC
  To: Magnus Karlsson, magnus.karlsson, bjorn.topel, ast, netdev,
	jonathan.lemon
  Cc: bpf, john.fastabend

On 10/8/20 4:12 PM, Magnus Karlsson wrote:
> From: Magnus Karlsson <magnus.karlsson@intel.com>
> 
> Introduce one cache line worth of padding between the producer and
> consumer pointers in all the lockless rings. This so that the HW
> adjacency prefetcher will not prefetch the consumer pointer when the
> producer pointer is used and vice versa. This improves throughput
> performance for the l2fwd sample app with 2% on my machine with HW
> prefetching turned on.
> 
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>

Applied, thanks!

>   net/xdp/xsk_queue.h | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index dc1dd5e..3c235d2 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -15,6 +15,10 @@
>   
>   struct xdp_ring {
>   	u32 producer ____cacheline_aligned_in_smp;
> +	/* Hinder the adjacent cache prefetcher to prefetch the consumer pointer if the producer
> +	 * pointer is touched and vice versa.
> +	 */
> +	u32 pad ____cacheline_aligned_in_smp;
>   	u32 consumer ____cacheline_aligned_in_smp;
>   	u32 flags;
>   };
> 

I was wondering whether we should even generalize this further for reuse
elsewhere e.g. ...

diff --git a/include/linux/cache.h b/include/linux/cache.h
index 1aa8009f6d06..5521dab01649 100644
--- a/include/linux/cache.h
+++ b/include/linux/cache.h
@@ -85,4 +85,17 @@
  #define cache_line_size()      L1_CACHE_BYTES
  #endif

+/*
+ * Dummy element for use in structs in order to pad a cacheline
+ * aligned element with an extra cacheline to hinder the adjacent
+ * cache prefetcher to prefetch the subsequent struct element.
+ */
+#ifndef ____cacheline_padding_in_smp
+# ifdef CONFIG_SMP
+#  define ____cacheline_padding_in_smp u8 :8 ____cacheline_aligned_in_smp
+# else
+#  define ____cacheline_padding_in_smp
+# endif /* CONFIG_SMP */
+#endif
+
  #endif /* __LINUX_CACHE_H */
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index cdb9cf3cd136..1da36423e779 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -15,11 +15,9 @@

  struct xdp_ring {
         u32 producer ____cacheline_aligned_in_smp;
-       /* Hinder the adjacent cache prefetcher to prefetch the consumer
-        * pointer if the producer pointer is touched and vice versa.
-        */
-       u32 pad ____cacheline_aligned_in_smp;
+       ____cacheline_padding_in_smp;
         u32 consumer ____cacheline_aligned_in_smp;
+       ____cacheline_padding_in_smp;
         u32 flags;
  };

... was there any improvement to also pad after the consumer given the struct
xdp_ring is also embedded into other structs?

Thanks,
Daniel


* Re: [PATCH bpf-next] xsk: introduce padding between ring pointers
  2020-10-09 15:03 ` Daniel Borkmann
@ 2020-10-12  8:37   ` Magnus Karlsson
  2020-10-12  9:37     ` Magnus Karlsson
  0 siblings, 1 reply; 6+ messages in thread
From: Magnus Karlsson @ 2020-10-12  8:37 UTC
  To: Daniel Borkmann
  Cc: Karlsson, Magnus, Björn Töpel, Alexei Starovoitov,
	Network Development, Jonathan Lemon, bpf, John Fastabend

On Fri, Oct 9, 2020 at 5:03 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 10/8/20 4:12 PM, Magnus Karlsson wrote:
> > From: Magnus Karlsson <magnus.karlsson@intel.com>
> >
> > Introduce one cache line worth of padding between the producer and
> > consumer pointers in all the lockless rings. This so that the HW
> > adjacency prefetcher will not prefetch the consumer pointer when the
> > producer pointer is used and vice versa. This improves throughput
> > performance for the l2fwd sample app with 2% on my machine with HW
> > prefetching turned on.
> >
> > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
>
> Applied, thanks!
>
> >   net/xdp/xsk_queue.h | 4 ++++
> >   1 file changed, 4 insertions(+)
> >
> > diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> > index dc1dd5e..3c235d2 100644
> > --- a/net/xdp/xsk_queue.h
> > +++ b/net/xdp/xsk_queue.h
> > @@ -15,6 +15,10 @@
> >
> >   struct xdp_ring {
> >       u32 producer ____cacheline_aligned_in_smp;
> > +     /* Hinder the adjacent cache prefetcher to prefetch the consumer pointer if the producer
> > +      * pointer is touched and vice versa.
> > +      */
> > +     u32 pad ____cacheline_aligned_in_smp;
> >       u32 consumer ____cacheline_aligned_in_smp;
> >       u32 flags;
> >   };
> >
>
> I was wondering whether we should even generalize this further for reuse
> elsewhere e.g. ...
>
> diff --git a/include/linux/cache.h b/include/linux/cache.h
> index 1aa8009f6d06..5521dab01649 100644
> --- a/include/linux/cache.h
> +++ b/include/linux/cache.h
> @@ -85,4 +85,17 @@
>   #define cache_line_size()      L1_CACHE_BYTES
>   #endif
>
> +/*
> + * Dummy element for use in structs in order to pad a cacheline
> + * aligned element with an extra cacheline to hinder the adjacent
> + * cache prefetcher to prefetch the subsequent struct element.
> + */
> +#ifndef ____cacheline_padding_in_smp
> +# ifdef CONFIG_SMP
> +#  define ____cacheline_padding_in_smp u8 :8 ____cacheline_aligned_in_smp
> +# else
> +#  define ____cacheline_padding_in_smp
> +# endif /* CONFIG_SMP */
> +#endif
> +
>   #endif /* __LINUX_CACHE_H */
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index cdb9cf3cd136..1da36423e779 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -15,11 +15,9 @@
>
>   struct xdp_ring {
>          u32 producer ____cacheline_aligned_in_smp;
> -       /* Hinder the adjacent cache prefetcher to prefetch the consumer
> -        * pointer if the producer pointer is touched and vice versa.
> -        */
> -       u32 pad ____cacheline_aligned_in_smp;
> +       ____cacheline_padding_in_smp;
>          u32 consumer ____cacheline_aligned_in_smp;
> +       ____cacheline_padding_in_smp;
>          u32 flags;
>   };

This should be beneficial in theory, though I could not measure any
statistically significant improvement. The flags variable is touched
much less frequently than the producer and consumer pointers, though,
so that might explain it. For this to work, we also need to make the
compiler place flags on a cache line 128 bytes (2 cache lines) away
from the consumer pointer, like this:

u32 consumer ____cacheline_aligned_in_smp;
____cacheline_padding_in_smp;
u32 flags ____cacheline_aligned_in_smp;
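
The reason for the extra alignment on flags can be checked with a small
userspace sketch. It assumes 64-byte cache lines and uses named uint8_t
padding members as a stand-in for the proposed ____cacheline_padding_in_smp
macro (whether the unnamed bit-field form lays out identically is not
verified here). All struct, macro and function names below are made up for
the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. Stand-in for ____cacheline_aligned_in_smp. */
#define CACHELINE 64
#define CACHELINE_ALIGNED __attribute__((aligned(CACHELINE)))

/* Padding after consumer, but flags left unaligned: flags is packed right
 * after the one-byte pad and ends up on the padding line itself.
 */
struct ring_flags_unaligned {
	uint32_t producer CACHELINE_ALIGNED;
	uint8_t pad1 CACHELINE_ALIGNED;
	uint32_t consumer CACHELINE_ALIGNED;
	uint8_t pad2 CACHELINE_ALIGNED;
	uint32_t flags;
};

/* Same layout with flags explicitly aligned: flags starts a new line. */
struct ring_flags_aligned {
	uint32_t producer CACHELINE_ALIGNED;
	uint8_t pad1 CACHELINE_ALIGNED;
	uint32_t consumer CACHELINE_ALIGNED;
	uint8_t pad2 CACHELINE_ALIGNED;
	uint32_t flags CACHELINE_ALIGNED;
};

/* Distance in bytes from consumer to flags when flags is not aligned. */
size_t flags_distance_unaligned(void)
{
	return offsetof(struct ring_flags_unaligned, flags) -
	       offsetof(struct ring_flags_unaligned, consumer);
}

/* Distance in bytes from consumer to flags when flags is aligned. */
size_t flags_distance_aligned(void)
{
	return offsetof(struct ring_flags_aligned, flags) -
	       offsetof(struct ring_flags_aligned, consumer);
}
```

In this sketch the unaligned variant puts flags only 68 bytes after
consumer, while the aligned variant gives the intended 128 bytes.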

> ... was there any improvement to also pad after the consumer given the struct
> xdp_ring is also embedded into other structs?

Good idea. Yes, I do believe I see another ~0.4% increase and more
stable high numbers when trying this out. The xdp_ring is followed by
the ring descriptors themselves in both the rx/tx rings and the umem
rings. These rings are quite large, 2K entries in the sample app, so
the descriptors next to the xdp_ring are accessed less frequently
(1/8th of the time with a batch size of 256 and a ring size of 2K),
which might explain the lower increase. In the end, I ended up with
the following struct:

u32 producer ____cacheline_aligned_in_smp;
____cacheline_padding_in_smp;
u32 consumer ____cacheline_aligned_in_smp;
____cacheline_padding_in_smp;
u32 flags ____cacheline_aligned_in_smp;
____cacheline_padding_in_smp;
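
The 1/8th figure above can be reproduced with a small sketch. It assumes
that batches are always full and that the ring size is a power of two; the
function name and its parameters are hypothetical and do not come from the
xsk code.

```c
#include <assert.h>
#include <stdint.h>

/* Count how many of `batches` consecutive full batches touch ring entry 0,
 * i.e. the descriptor that sits closest to the ring header.
 */
uint32_t batches_touching_first_entry(uint32_t ring_size, uint32_t batch_size,
				      uint32_t batches)
{
	uint32_t mask = ring_size - 1;
	uint32_t idx = 0;
	uint32_t hits = 0;

	/* Assumptions: power-of-two ring, batch no larger than the ring. */
	assert(ring_size != 0 && (ring_size & mask) == 0);
	assert(batch_size != 0 && batch_size <= ring_size);

	for (uint32_t b = 0; b < batches; b++) {
		uint32_t start = idx & mask;

		/* Entry 0 is covered if the batch starts there or wraps past the end. */
		if (start == 0 || start + batch_size > ring_size)
			hits++;
		idx += batch_size;
	}
	return hits;
}
```

For example, batches_touching_first_entry(2048, 256, 64) returns 8, so one
batch in eight touches the descriptor next to the header.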

Do you want to submit a patch, or shall I do it? I like your
____cacheline_padding_in_smp better than my explicit "padN" member.

Thanks: Magnus

> Thanks,
> Daniel


* Re: [PATCH bpf-next] xsk: introduce padding between ring pointers
  2020-10-12  8:37   ` Magnus Karlsson
@ 2020-10-12  9:37     ` Magnus Karlsson
  2020-10-12 11:13       ` Magnus Karlsson
  0 siblings, 1 reply; 6+ messages in thread
From: Magnus Karlsson @ 2020-10-12  9:37 UTC
  To: Daniel Borkmann
  Cc: Karlsson, Magnus, Björn Töpel, Alexei Starovoitov,
	Network Development, Jonathan Lemon, bpf, John Fastabend

On Mon, Oct 12, 2020 at 10:37 AM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 5:03 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 10/8/20 4:12 PM, Magnus Karlsson wrote:
> > > From: Magnus Karlsson <magnus.karlsson@intel.com>
> > >
> > > Introduce one cache line worth of padding between the producer and
> > > consumer pointers in all the lockless rings. This so that the HW
> > > adjacency prefetcher will not prefetch the consumer pointer when the
> > > producer pointer is used and vice versa. This improves throughput
> > > performance for the l2fwd sample app with 2% on my machine with HW
> > > prefetching turned on.
> > >
> > > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> >
> > Applied, thanks!
> >
> > >   net/xdp/xsk_queue.h | 4 ++++
> > >   1 file changed, 4 insertions(+)
> > >
> > > diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> > > index dc1dd5e..3c235d2 100644
> > > --- a/net/xdp/xsk_queue.h
> > > +++ b/net/xdp/xsk_queue.h
> > > @@ -15,6 +15,10 @@
> > >
> > >   struct xdp_ring {
> > >       u32 producer ____cacheline_aligned_in_smp;
> > > +     /* Hinder the adjacent cache prefetcher to prefetch the consumer pointer if the producer
> > > +      * pointer is touched and vice versa.
> > > +      */
> > > +     u32 pad ____cacheline_aligned_in_smp;
> > >       u32 consumer ____cacheline_aligned_in_smp;
> > >       u32 flags;
> > >   };
> > >
> >
> > I was wondering whether we should even generalize this further for reuse
> > elsewhere e.g. ...
> >
> > diff --git a/include/linux/cache.h b/include/linux/cache.h
> > index 1aa8009f6d06..5521dab01649 100644
> > --- a/include/linux/cache.h
> > +++ b/include/linux/cache.h
> > @@ -85,4 +85,17 @@
> >   #define cache_line_size()      L1_CACHE_BYTES
> >   #endif
> >
> > +/*
> > + * Dummy element for use in structs in order to pad a cacheline
> > + * aligned element with an extra cacheline to hinder the adjacent
> > + * cache prefetcher to prefetch the subsequent struct element.
> > + */
> > +#ifndef ____cacheline_padding_in_smp
> > +# ifdef CONFIG_SMP
> > +#  define ____cacheline_padding_in_smp u8 :8 ____cacheline_aligned_in_smp
> > +# else
> > +#  define ____cacheline_padding_in_smp
> > +# endif /* CONFIG_SMP */
> > +#endif
> > +
> >   #endif /* __LINUX_CACHE_H */
> > diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> > index cdb9cf3cd136..1da36423e779 100644
> > --- a/net/xdp/xsk_queue.h
> > +++ b/net/xdp/xsk_queue.h
> > @@ -15,11 +15,9 @@
> >
> >   struct xdp_ring {
> >          u32 producer ____cacheline_aligned_in_smp;
> > -       /* Hinder the adjacent cache prefetcher to prefetch the consumer
> > -        * pointer if the producer pointer is touched and vice versa.
> > -        */
> > -       u32 pad ____cacheline_aligned_in_smp;
> > +       ____cacheline_padding_in_smp;
> >          u32 consumer ____cacheline_aligned_in_smp;
> > +       ____cacheline_padding_in_smp;
> >          u32 flags;
> >   };
>
> This should be beneficial in theory, though I could not measure any
> statistically significant improvement. Though, the flags variable is
> touched much less frequently than the producer and consumer pointers,
> so that might explain it. We also need to make the compiler allocate
> flags to a cache line 128 bytes (2 cache lines) from the consumer
> pointer like this:
>
> u32 consumer ____cacheline_aligned_in_smp;
> ____cacheline_padding_in_smp;
> u32 flags ____cacheline_aligned_in_smp;
>
> > ... was there any improvement to also pad after the consumer given the struct
> > xdp_ring is also embedded into other structs?
>
> Good idea. Yes, I do believe I see another ~0.4% increase and more
> stable high numbers when trying this out. The xdp_ring is followed by
> the ring descriptors themselves in both the rx/tx rings and the umem
> rings. And these rings are quite large, 2K in the sample app, so
> accessed less frequently (1/8th of the time with a batch size of 256
> and ring size 2K) which might explain the lower increase. In the end,
> I ended up with the following struct:
>
> u32 producer ____cacheline_aligned_in_smp;
> ____cacheline_padding_in_smp;
> u32 consumer ____cacheline_aligned_in_smp;
> ____cacheline_padding_in_smp;
> u32 flags ____cacheline_aligned_in_smp;
> ____cacheline_padding_in_smp;

Actually, this might make more sense and save some memory:

u32 producer ____cacheline_aligned_in_smp;
____cacheline_padding_in_smp;
u32 consumer ____cacheline_aligned_in_smp;
u32 flags;
____cacheline_padding_in_smp;

So keep the flags colocated with the consumer on the same cache line.
The reason I put it there to start with was that it is usually set in
conjunction with the consumer pointer being updated. This might also
explain why I did not see any performance improvement by putting it on
its own pair of cache lines (128 bytes). In summary, we make sure
producer and consumer are separated by 128 bytes, as are consumer and
the start of the descriptor ring.
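
How much memory the colocated variant saves can be estimated with a
userspace sketch. It assumes 64-byte cache lines and uses named uint8_t
padding members in place of ____cacheline_padding_in_smp; the struct and
function names are invented for this comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. Stand-in for ____cacheline_aligned_in_smp. */
#define CACHELINE 64
#define CACHELINE_ALIGNED __attribute__((aligned(CACHELINE)))

/* Padding between all members, flags on a line of its own. */
struct ring_all_padded {
	uint32_t producer CACHELINE_ALIGNED;
	uint8_t pad1 CACHELINE_ALIGNED;
	uint32_t consumer CACHELINE_ALIGNED;
	uint8_t pad2 CACHELINE_ALIGNED;
	uint32_t flags CACHELINE_ALIGNED;
	uint8_t pad3 CACHELINE_ALIGNED;
};

/* flags shares the consumer's cache line, one trailing padding line. */
struct ring_colocated {
	uint32_t producer CACHELINE_ALIGNED;
	uint8_t pad1 CACHELINE_ALIGNED;
	uint32_t consumer CACHELINE_ALIGNED;
	uint32_t flags;
	uint8_t pad2 CACHELINE_ALIGNED;
};

/* Bytes saved per ring header by colocating flags with consumer. */
size_t bytes_saved(void)
{
	return sizeof(struct ring_all_padded) - sizeof(struct ring_colocated);
}

/* Distance from consumer to whatever is placed right after the header. */
size_t consumer_to_end_colocated(void)
{
	return sizeof(struct ring_colocated) -
	       offsetof(struct ring_colocated, consumer);
}
```

Under these assumptions the header shrinks from 384 to 256 bytes, and the
data following the header still starts 128 bytes after consumer.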

> Do you want to submit a patch, or shall I do it? I like your
> ____cacheline_padding_in_smp better than my explicit "padN" member.
>
> Thanks: Magnus
>
> > Thanks,
> > Daniel


* Re: [PATCH bpf-next] xsk: introduce padding between ring pointers
  2020-10-12  9:37     ` Magnus Karlsson
@ 2020-10-12 11:13       ` Magnus Karlsson
  2020-10-12 22:28         ` Daniel Borkmann
  0 siblings, 1 reply; 6+ messages in thread
From: Magnus Karlsson @ 2020-10-12 11:13 UTC
  To: Daniel Borkmann
  Cc: Karlsson, Magnus, Björn Töpel, Alexei Starovoitov,
	Network Development, Jonathan Lemon, bpf, John Fastabend

On Mon, Oct 12, 2020 at 11:37 AM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> On Mon, Oct 12, 2020 at 10:37 AM Magnus Karlsson
> <magnus.karlsson@gmail.com> wrote:
> >
> > On Fri, Oct 9, 2020 at 5:03 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> > >
> > > On 10/8/20 4:12 PM, Magnus Karlsson wrote:
> > > > From: Magnus Karlsson <magnus.karlsson@intel.com>
> > > >
> > > > Introduce one cache line worth of padding between the producer and
> > > > consumer pointers in all the lockless rings. This so that the HW
> > > > adjacency prefetcher will not prefetch the consumer pointer when the
> > > > producer pointer is used and vice versa. This improves throughput
> > > > performance for the l2fwd sample app with 2% on my machine with HW
> > > > prefetching turned on.
> > > >
> > > > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > >
> > > Applied, thanks!
> > >
> > > >   net/xdp/xsk_queue.h | 4 ++++
> > > >   1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> > > > index dc1dd5e..3c235d2 100644
> > > > --- a/net/xdp/xsk_queue.h
> > > > +++ b/net/xdp/xsk_queue.h
> > > > @@ -15,6 +15,10 @@
> > > >
> > > >   struct xdp_ring {
> > > >       u32 producer ____cacheline_aligned_in_smp;
> > > > +     /* Hinder the adjacent cache prefetcher to prefetch the consumer pointer if the producer
> > > > +      * pointer is touched and vice versa.
> > > > +      */
> > > > +     u32 pad ____cacheline_aligned_in_smp;
> > > >       u32 consumer ____cacheline_aligned_in_smp;
> > > >       u32 flags;
> > > >   };
> > > >
> > >
> > > I was wondering whether we should even generalize this further for reuse
> > > elsewhere e.g. ...
> > >
> > > diff --git a/include/linux/cache.h b/include/linux/cache.h
> > > index 1aa8009f6d06..5521dab01649 100644
> > > --- a/include/linux/cache.h
> > > +++ b/include/linux/cache.h
> > > @@ -85,4 +85,17 @@
> > >   #define cache_line_size()      L1_CACHE_BYTES
> > >   #endif
> > >
> > > +/*
> > > + * Dummy element for use in structs in order to pad a cacheline
> > > + * aligned element with an extra cacheline to hinder the adjacent
> > > + * cache prefetcher to prefetch the subsequent struct element.
> > > + */
> > > +#ifndef ____cacheline_padding_in_smp
> > > +# ifdef CONFIG_SMP
> > > +#  define ____cacheline_padding_in_smp u8 :8 ____cacheline_aligned_in_smp
> > > +# else
> > > +#  define ____cacheline_padding_in_smp
> > > +# endif /* CONFIG_SMP */
> > > +#endif
> > > +
> > >   #endif /* __LINUX_CACHE_H */
> > > diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> > > index cdb9cf3cd136..1da36423e779 100644
> > > --- a/net/xdp/xsk_queue.h
> > > +++ b/net/xdp/xsk_queue.h
> > > @@ -15,11 +15,9 @@
> > >
> > >   struct xdp_ring {
> > >          u32 producer ____cacheline_aligned_in_smp;
> > > -       /* Hinder the adjacent cache prefetcher to prefetch the consumer
> > > -        * pointer if the producer pointer is touched and vice versa.
> > > -        */
> > > -       u32 pad ____cacheline_aligned_in_smp;
> > > +       ____cacheline_padding_in_smp;
> > >          u32 consumer ____cacheline_aligned_in_smp;
> > > +       ____cacheline_padding_in_smp;
> > >          u32 flags;
> > >   };
> >
> > This should be beneficial in theory, though I could not measure any
> > statistically significant improvement. Though, the flags variable is
> > touched much less frequently than the producer and consumer pointers,
> > so that might explain it. We also need to make the compiler allocate
> > flags to a cache line 128 bytes (2 cache lines) from the consumer
> > pointer like this:
> >
> > u32 consumer ____cacheline_aligned_in_smp;
> > ____cacheline_padding_in_smp;
> > u32 flags ____cacheline_aligned_in_smp;
> >
> > > ... was there any improvement to also pad after the consumer given the struct
> > > xdp_ring is also embedded into other structs?
> >
> > Good idea. Yes, I do believe I see another ~0.4% increase and more
> > stable high numbers when trying this out. The xdp_ring is followed by
> > the ring descriptors themselves in both the rx/tx rings and the umem
> > rings. And these rings are quite large, 2K in the sample app, so
> > accessed less frequently (1/8th of the time with a batch size of 256
> > and ring size 2K) which might explain the lower increase. In the end,
> > I ended up with the following struct:
> >
> > u32 producer ____cacheline_aligned_in_smp;
> > ____cacheline_padding_in_smp;
> > u32 consumer ____cacheline_aligned_in_smp;
> > ____cacheline_padding_in_smp;
> > u32 flags ____cacheline_aligned_in_smp;
> > ____cacheline_padding_in_smp;
>
> Actually, this might make more sense and save some memory:
>
> u32 producer ____cacheline_aligned_in_smp;
> ____cacheline_padding_in_smp;
> u32 consumer ____cacheline_aligned_in_smp;
> u32 flags;
> ____cacheline_padding_in_smp;
>
> So keep the flags colocated with the consumer on the same cache line.
> The reason I put it there to start with was that it is usually set in
> conjunction with the consumer pointer being updated. This might also
> explain why I did not see any performance improvement by putting it on
> its own 128-byte cache line. In summary, we make sure producer and
> consumer are separated with 128 bytes as well as consumer and the
> start of the descriptor ring.

Nope, that was a bad idea. After measuring, the layout with flags
colocated with the consumer produces worse performance than the
original suggestion with padding in between all members. Cannot
explain why at the moment, but the numbers are convincing and above
noise level for sure. So let us keep this one:

u32 producer ____cacheline_aligned_in_smp;
____cacheline_padding_in_smp;
u32 consumer ____cacheline_aligned_in_smp;
____cacheline_padding_in_smp;
u32 flags ____cacheline_aligned_in_smp;
____cacheline_padding_in_smp;
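
One way to keep such a layout from regressing is to check the distances at
compile time. The following is only a sketch under stated assumptions:
64-byte cache lines, named uint8_t members standing in for
____cacheline_padding_in_smp, and hypothetical names throughout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, prefetcher works on 128-byte pairs. */
#define CACHELINE 64
#define PREFETCH_PAIR (2 * CACHELINE)
#define CACHELINE_ALIGNED __attribute__((aligned(CACHELINE)))

/* Userspace stand-in for the layout with padding between all members. */
struct ring_padded {
	uint32_t producer CACHELINE_ALIGNED;
	uint8_t pad1 CACHELINE_ALIGNED;
	uint32_t consumer CACHELINE_ALIGNED;
	uint8_t pad2 CACHELINE_ALIGNED;
	uint32_t flags CACHELINE_ALIGNED;
	uint8_t pad3 CACHELINE_ALIGNED;
};

/* Each hot member must be at least one prefetch pair away from the next. */
_Static_assert(offsetof(struct ring_padded, consumer) -
	       offsetof(struct ring_padded, producer) >= PREFETCH_PAIR,
	       "producer and consumer are too close");
_Static_assert(offsetof(struct ring_padded, flags) -
	       offsetof(struct ring_padded, consumer) >= PREFETCH_PAIR,
	       "consumer and flags are too close");
/* Whatever follows the header must also be a prefetch pair away from flags. */
_Static_assert(sizeof(struct ring_padded) -
	       offsetof(struct ring_padded, flags) >= PREFETCH_PAIR,
	       "flags is too close to the data after the header");

/* Size of the padded header, for callers that want to report it. */
size_t ring_header_size(void)
{
	return sizeof(struct ring_padded);
}
```

If a member is later added or an alignment attribute dropped, the build
fails instead of silently reintroducing the prefetcher interference.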

> > Do you want to submit a patch, or shall I do it? I like your
> > ____cacheline_padding_in_smp better than my explicit "padN" member.
> >
> > Thanks: Magnus
> >
> > > Thanks,
> > > Daniel


* Re: [PATCH bpf-next] xsk: introduce padding between ring pointers
  2020-10-12 11:13       ` Magnus Karlsson
@ 2020-10-12 22:28         ` Daniel Borkmann
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Borkmann @ 2020-10-12 22:28 UTC
  To: Magnus Karlsson
  Cc: Karlsson, Magnus, Björn Töpel, Alexei Starovoitov,
	Network Development, Jonathan Lemon, bpf, John Fastabend

On 10/12/20 1:13 PM, Magnus Karlsson wrote:
[...]
> Nope, that was a bad idea. After measuring, this one produces worse
> performance than the original suggestion with padding in between all
> members. Cannot explain why at the moment, but the numbers are
> convincing and above noise level for sure. So let us keep this one:
> 
> u32 producer ____cacheline_aligned_in_smp;
> ____cacheline_padding_in_smp;
> u32 consumer ____cacheline_aligned_in_smp;
> ____cacheline_padding_in_smp;
> u32 flags ____cacheline_aligned_in_smp;
> ____cacheline_padding_in_smp;
> 
>>> Do you want to submit a patch, or shall I do it? I like your
>>> ____cacheline_padding_in_smp better than my explicit "padN" member.

Ok, feel free to go for it.

Thanks,
Daniel
