From: Magnus Karlsson <magnus.karlsson@gmail.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: "Karlsson, Magnus" <magnus.karlsson@intel.com>,
"Björn Töpel" <bjorn.topel@intel.com>,
"Alexei Starovoitov" <ast@kernel.org>,
"Network Development" <netdev@vger.kernel.org>,
"Jonathan Lemon" <jonathan.lemon@gmail.com>,
bpf <bpf@vger.kernel.org>,
"John Fastabend" <john.fastabend@gmail.com>
Subject: Re: [PATCH bpf-next] xsk: introduce padding between ring pointers
Date: Mon, 12 Oct 2020 11:37:28 +0200 [thread overview]
Message-ID: <CAJ8uoz1Z4dpaoK5th092gid+xbcp1Rz1wkPXZuuceh5y0wvKYw@mail.gmail.com> (raw)
In-Reply-To: <CAJ8uoz3nfDe0a9Vp0NmnHVv5qM+kvqR-f6Yd0keKSqctNzi6=g@mail.gmail.com>
On Mon, Oct 12, 2020 at 10:37 AM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 5:03 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 10/8/20 4:12 PM, Magnus Karlsson wrote:
> > > From: Magnus Karlsson <magnus.karlsson@intel.com>
> > >
> > > Introduce one cache line worth of padding between the producer and
> > > consumer pointers in all the lockless rings. This so that the HW
> > > adjacency prefetcher will not prefetch the consumer pointer when the
> > > producer pointer is used and vice versa. This improves throughput
> > > performance for the l2fwd sample app with 2% on my machine with HW
> > > prefetching turned on.
> > >
> > > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> >
> > Applied, thanks!
> >
> > > net/xdp/xsk_queue.h | 4 ++++
> > > 1 file changed, 4 insertions(+)
> > >
> > > diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> > > index dc1dd5e..3c235d2 100644
> > > --- a/net/xdp/xsk_queue.h
> > > +++ b/net/xdp/xsk_queue.h
> > > @@ -15,6 +15,10 @@
> > >
> > > struct xdp_ring {
> > > u32 producer ____cacheline_aligned_in_smp;
> > > + /* Hinder the adjacent cache prefetcher to prefetch the consumer pointer if the producer
> > > + * pointer is touched and vice versa.
> > > + */
> > > + u32 pad ____cacheline_aligned_in_smp;
> > > u32 consumer ____cacheline_aligned_in_smp;
> > > u32 flags;
> > > };
> > >
> >
> > I was wondering whether we should even generalize this further for reuse
> > elsewhere e.g. ...
> >
> > diff --git a/include/linux/cache.h b/include/linux/cache.h
> > index 1aa8009f6d06..5521dab01649 100644
> > --- a/include/linux/cache.h
> > +++ b/include/linux/cache.h
> > @@ -85,4 +85,17 @@
> > #define cache_line_size() L1_CACHE_BYTES
> > #endif
> >
> > +/*
> > + * Dummy element for use in structs in order to pad a cacheline
> > + * aligned element with an extra cacheline to hinder the adjacent
> > + * cache prefetcher to prefetch the subsequent struct element.
> > + */
> > +#ifndef ____cacheline_padding_in_smp
> > +# ifdef CONFIG_SMP
> > +# define ____cacheline_padding_in_smp u8 :8 ____cacheline_aligned_in_smp
> > +# else
> > +# define ____cacheline_padding_in_smp
> > +# endif /* CONFIG_SMP */
> > +#endif
> > +
> > #endif /* __LINUX_CACHE_H */
> > diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> > index cdb9cf3cd136..1da36423e779 100644
> > --- a/net/xdp/xsk_queue.h
> > +++ b/net/xdp/xsk_queue.h
> > @@ -15,11 +15,9 @@
> >
> > struct xdp_ring {
> > u32 producer ____cacheline_aligned_in_smp;
> > - /* Hinder the adjacent cache prefetcher to prefetch the consumer
> > - * pointer if the producer pointer is touched and vice versa.
> > - */
> > - u32 pad ____cacheline_aligned_in_smp;
> > + ____cacheline_padding_in_smp;
> > u32 consumer ____cacheline_aligned_in_smp;
> > + ____cacheline_padding_in_smp;
> > u32 flags;
> > };
>
> This should be beneficial in theory, though I could not measure any
> statistically significant improvement. Though, the flags variable is
> touched much less frequently than the producer and consumer pointers,
> so that might explain it. We also need to make the compiler allocate
> flags to a cache line 128 bytes (2 cache lines) from the consumer
> pointer like this:
>
> u32 consumer ____cacheline_aligned_in_smp;
> ____cacheline_padding_in_smp;
> u32 flags ____cacheline_aligned_in_smp;
>
> > ... was there any improvement to also pad after the consumer given the struct
> > xdp_ring is also embedded into other structs?
>
> Good idea. Yes, I do believe I see another ~0.4% increase and more
> stable high numbers when trying this out. The xdp_ring is followed by
> the ring descriptors themselves in both the rt/tx rings and the umem
> rings. And these rings are quite large, 2K in the sample app, so
> accessed less frequently (1/8th of the time with a batch size of 256
> and ring size 2K) which might explain the lower increase. In the end,
> I ended up with the following struct:
>
> u32 producer ____cacheline_aligned_in_smp;
> ____cacheline_padding_in_smp;
> u32 consumer ____cacheline_aligned_in_smp;
> ____cacheline_padding_in_smp;
> u32 flags ____cacheline_aligned_in_smp;
> ____cacheline_padding_in_smp;
Actually, this might make more sense and save some memory:
u32 producer ____cacheline_aligned_in_smp;
____cacheline_padding_in_smp;
u32 consumer ____cacheline_aligned_in_smp;
u32 flags;
____cacheline_padding_in_smp;
So keep the flags colocated with the consumer on the same cache line.
The reason I put it there to start with was that it is usually set in
conjunction with the consumer pointer being updated. This might also
explain why I did not see any performance improvement by putting it on
its own 128-byte cache line. In summary, we make sure producer and
consumer are separated with 128 bytes as well as consumer and the
start of the descriptor ring.
> Do you want to submit a patch, or shall I do it? I like your
> ____cacheline_padding_in_smp better than my explicit "padN" member.
>
> Thanks: Magnus
>
> > Thanks,
> > Daniel
next prev parent reply other threads:[~2020-10-12 9:37 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-08 14:12 [PATCH bpf-next] xsk: introduce padding between ring pointers Magnus Karlsson
2020-10-09 15:03 ` Daniel Borkmann
2020-10-12 8:37 ` Magnus Karlsson
2020-10-12 9:37 ` Magnus Karlsson [this message]
2020-10-12 11:13 ` Magnus Karlsson
2020-10-12 22:28 ` Daniel Borkmann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAJ8uoz1Z4dpaoK5th092gid+xbcp1Rz1wkPXZuuceh5y0wvKYw@mail.gmail.com \
--to=magnus.karlsson@gmail.com \
--cc=ast@kernel.org \
--cc=bjorn.topel@intel.com \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=john.fastabend@gmail.com \
--cc=jonathan.lemon@gmail.com \
--cc=magnus.karlsson@intel.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).