From: "Björn Töpel" <bjorn.topel@intel.com>
To: Christoph Hellwig <hch@lst.de>, Daniel Borkmann <daniel@iogearbox.net>
Cc: "Björn Töpel" <bjorn.topel@gmail.com>,
netdev@vger.kernel.org, davem@davemloft.net,
konrad.wilk@oracle.com, iommu@lists.linux-foundation.org,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
maximmi@mellanox.com, magnus.karlsson@intel.com,
jonathan.lemon@gmail.com
Subject: Re: [PATCH net] xsk: remove cheap_dma optimization
Date: Sun, 28 Jun 2020 19:16:33 +0200
Message-ID: <88d27e1b-dbda-301c-64ba-2391092e3236@intel.com>
In-Reply-To: <20200627070406.GB11854@lst.de>
On 2020-06-27 09:04, Christoph Hellwig wrote:
> On Sat, Jun 27, 2020 at 01:00:19AM +0200, Daniel Borkmann wrote:
>> Given there is roughly a ~5 weeks window at max where this removal could
>> still be applied in the worst case, could we come up with a fix / proposal
>> first that moves this into the DMA mapping core? If there is something that
>> can be agreed upon by all parties, then we could avoid re-adding the 9%
>> slowdown. :/
>
> I'd rather turn it upside down - this abuse of the internals blocks work
> that has basically just missed the previous window and I'm not going
> to wait weeks to sort out the API misuse. But we can add optimizations
> back later if we find a sane way.
>
I'm not super excited about the performance loss, but I do get
Christoph's frustration about gutting the DMA API, making it harder for
DMA people to get work done. Let's try to solve this properly using
proper DMA APIs.
> That being said I really can't see how this would make so much of a
> difference. What architecture and what dma_ops are you using for
> those measurements? What is the workload?
>
The 9% is for an AF_XDP (a fast raw Ethernet socket; think AF_PACKET, but
faster) benchmark: receive the packet from the NIC, and drop it. The
DMA syncs stand out in perf top:
28.63% [kernel] [k] i40e_clean_rx_irq_zc
17.12% [kernel] [k] xp_alloc
8.80% [kernel] [k] __xsk_rcv_zc
7.69% [kernel] [k] xdp_do_redirect
5.35% bpf_prog_992d9ddc835e5629 [k] bpf_prog_992d9ddc835e5629
4.77% [kernel] [k] xsk_rcv.part.0
4.07% [kernel] [k] __xsk_map_redirect
3.80% [kernel] [k] dma_direct_sync_single_for_cpu
3.03% [kernel] [k] dma_direct_sync_single_for_device
2.76% [kernel] [k] i40e_alloc_rx_buffers_zc
1.83% [kernel] [k] xsk_flush
...
For this benchmark the dma_ops are NULL (dma_is_direct() == true), and
the main issue is that SWIOTLB is now unconditionally enabled [1] for
x86, so for each sync we have to check is_swiotlb_buffer(), which
involves some costly indirection.
That was pretty much what my hack avoided. Instead we did all the checks
upfront, since AF_XDP has long-term DMA mappings, and just set a flag
for that.
Avoiding the whole "is this address swiotlb" check in
dma_direct_sync_single_for_{cpu,device}() per packet
would help a lot.
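To make the difference concrete, here is a minimal userspace sketch, not kernel code. It models a per-packet range check against a single check done once when a long-term mapping is set up. Every name in it (struct bounce_region, struct long_term_mapping, mapping_init(), sync_checked(), sync_cached(), range_checks) is hypothetical and only illustrates the idea; the real check in the DMA core looks different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bounce-buffer address range. */
struct bounce_region {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/* Hypothetical long-term mapping, e.g. a buffer pool mapped once. */
struct long_term_mapping {
	uint64_t dma_start;
	size_t len;
	bool needs_sync;	/* decided once, at map time */
};

/* Counts how many range checks were performed. */
unsigned long range_checks;

/* Per-packet variant: every sync pays for the range check. */
bool sync_checked(const struct bounce_region *r, uint64_t addr)
{
	range_checks++;
	return addr >= r->start && addr < r->end;
}

/* Map time: check the whole mapping once and cache the answer. */
void mapping_init(struct long_term_mapping *m, const struct bounce_region *r,
		  uint64_t dma_start, size_t len)
{
	assert(m && r);
	range_checks++;
	m->dma_start = dma_start;
	m->len = len;
	/* True if the mapping overlaps the bounce region at all. */
	m->needs_sync = dma_start < r->end && dma_start + len > r->start;
}

/* Per-packet variant with the cached flag: no range check at all. */
bool sync_cached(const struct long_term_mapping *m)
{
	return m->needs_sync;
}
```

With a mapping that lives for the lifetime of the socket, the cached variant does one range check in total, while the checked variant does one per packet.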
Somewhat related to the DMA API: it would have performance benefits for
AF_XDP if the DMA range of the mapped memory was linear, i.e. by IOMMU
utilization. I've started hacking on this a little bit, but it would be
nice if such an API was part of the mapping core.
Input: array of pages
Output: array of dma addrs
(and obviously dev, flags and such)

For non-IOMMU: len(array of pages) == len(array of dma addrs)
For best-case IOMMU: len(array of dma addrs) == 1 (large linear space)
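As a rough illustration of the shape such an interface could take, here is a userspace model, not a proposal for the actual kernel API. sketch_map_pages(), sketch_dma_addr(), SKETCH_PAGE_SIZE and the iova_base parameter are all made-up names, and the IOMMU branch models only the best case where everything lands in one linear range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for the sketch */

/*
 * Hypothetical mapping call: takes an array of page addresses and fills
 * in an array of DMA addresses. Returns the number of entries written.
 *
 * Without an IOMMU: one DMA address per page (identity-mapped here).
 * Best-case IOMMU: a single entry, the base of one linear range.
 */
size_t sketch_map_pages(const uint64_t *pages, size_t npages,
			bool have_iommu, uint64_t iova_base,
			uint64_t *dma_addrs)
{
	assert(pages && dma_addrs);
	if (npages == 0)
		return 0;
	if (have_iommu) {
		dma_addrs[0] = iova_base;
		return 1;
	}
	for (size_t i = 0; i < npages; i++)
		dma_addrs[i] = pages[i];
	return npages;
}

/* Device address of (page index, offset) for either layout. */
uint64_t sketch_dma_addr(const uint64_t *dma_addrs, size_t ndma,
			 size_t page_idx, size_t offset)
{
	assert(offset < SKETCH_PAGE_SIZE);
	if (ndma == 1)	/* linear range: plain arithmetic, no table lookup */
		return dma_addrs[0] + (uint64_t)page_idx * SKETCH_PAGE_SIZE + offset;
	assert(page_idx < ndma);
	return dma_addrs[page_idx] + offset;
}
```

In the linear case the per-packet address computation is a multiply and an add, with no per-page table to walk.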
But that's for later. :-)
Björn
[1] commit: 09230cbc1bab ("swiotlb: move the SWIOTLB config symbol to
lib/Kconfig")