All of lore.kernel.org
 help / color / mirror / Atom feed
From: Juerg Haefliger <juergh@proton.me>
To: Petr Tesarik <petrtesarik@huaweicloud.com>
Cc: Christoph Hellwig <hch@lst.de>, Jonathan Corbet <corbet@lwn.net>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Robin Murphy <robin.murphy@arm.com>, Borislav Petkov <bp@suse.de>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Randy Dunlap <rdunlap@infradead.org>,
	Damien Le Moal <damien.lemoal@opensource.wdc.com>,
	Kim Phillips <kim.phillips@amd.com>,
	"Steven Rostedt (Google)" <rostedt@goodmis.org>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	"open list:DMA MAPPING HELPERS" <iommu@lists.linux.dev>,
	Roberto Sassu <roberto.sassu@huawei.com>,
	petr@tesarici.cz, Alexander Graf <graf@amazon.com>
Subject: Re: [RFC v1 3/4] swiotlb: Allow dynamic allocation of bounce buffers
Date: Fri, 31 Mar 2023 07:26:09 +0000	[thread overview]
Message-ID: <20230331092553.677e9649@smeagol> (raw)
In-Reply-To: <4268fa4e-4f0f-a2f6-a2a5-5b78ca4a073d@huaweicloud.com>

[-- Attachment #1: Type: text/plain, Size: 5229 bytes --]

On Tue, 28 Mar 2023 09:54:35 +0200
Petr Tesarik <petrtesarik@huaweicloud.com> wrote:

> On 3/28/2023 6:07 AM, Christoph Hellwig wrote:
> > [adding Alex as he has been interested in this in the past]
> >
> > On Mon, Mar 20, 2023 at 01:28:15PM +0100, Petr Tesarik wrote:
> >> Second, on the Raspberry Pi 4, swiotlb is used by dma-buf for pages
> >> moved from the rendering GPU (v3d driver), which can access all
> >> memory, to the display output (vc4 driver), which is connected to a
> >> bus with an address limit of 1 GiB and no IOMMU. These buffers can
> >> be large (several megabytes) and cannot be handled by SWIOTLB,
> >> because they exceed maximum segment size of 256 KiB. Such mapping
> >> failures can be easily reproduced on a Raspberry Pi4: Starting
> >> GNOME remote desktop results in a flood of kernel messages like
> >> these:
> >
> > Shouldn't we make sure dma-buf allocates the buffers for the most
> > restricted devices, and more importantly does something like a dma
> > coherent allocation instead of a dynamic mapping of random memory?
> >
> > While a larger swiotlb works around this I don't think this fixes the root
> > cause.
>
> I tend to agree here. However, it's the DMABUF design itself that causes
> some trouble. The buffer is allocated by the v3d driver, which does not
> have the restriction, so the DMA API typically allocates an address
> somewhere near the 4G boundary. Userspace then exports the buffer, sends
> it to another process as a file descriptor and imports it into the vc4
> driver, which requires DMA below 1G. In the beginning, v3d had no idea
> that the buffer would be exported to userspace, much less that it would
> be later imported into vc4.
>
> Anyway, I suspected that the buffers need not be imported into the vc4
> driver (also hinted by Eric Anholt in a 2018 blog post [1]), and it
> seems I was right. I encountered the issue with Ubuntu 22.10; I
> installed latest openSUSE Tumbleweed yesterday, and I was not able to
> reproduce the issue there, most likely because the Mesa drivers have
> been fixed meanwhile. This makes the specific case of the Raspberry Pi 4
> drivers moot. The issue may still affect other combinations of drivers,
> but I don't have any other real-world example ATM.

I'm only seeing this problem with Wayland, no issue when switching Ubuntu to
X. It seems Tumbleweed is using X by default.

...Juerg


> [1] https://anholt.github.io/twivc4/2018/02/12/twiv/
>
> >> 1. The value is limited to ULONG_MAX, which is too little both for
> >>    physical addresses (e.g. x86 PAE or 32-bit ARM LPAE) and DMA
> >>    addresses (e.g. Xen guests on 32-bit ARM).
> >>
> >> 2. Since buffers are currently allocated with page granularity, a
> >>    PFN can be used instead. However, some values are reserved by
> >>    the maple tree implementation. Liam suggests to use
> >>    xa_mk_value() in that case, but that reduces the usable range by
> >>    half. Luckily, 31 bits are still enough to hold a PFN on all
> >>    32-bit platforms.
> >>
> >> 3. Software IO TLB is used from interrupt context. The maple tree
> >>    implementation is not IRQ-safe (MT_FLAGS_LOCK_IRQ does nothing
> >>    AFAICS). Instead, I use an external lock, spin_lock_irqsave() and
> >>    spin_unlock_irqrestore().
> >>
> >> Note that bounce buffers are never allocated dynamically if the
> >> software IO TLB is in fact a DMA restricted pool, which is intended
> >> to be stay in its designated location in physical memory.
> >
> > I'm a little worried about all that because it causes quite a bit
> > of overhead even for callers that don't end up going into the
> > dynamic range or do not use swiotlb at all.  I don't really have a
> > good answer here except for the usual avoid bounce buffering whenever
> > you can that might not always be easy to do.
>
> I'm also worried about all this overhead. OTOH I was not able to confirm
> it, because the difference between two successive fio test runs on an
> unmodified kernel was bigger than the difference between a vanilla and a
> patched kernel, except the maximum completion latency, which OTOH
> affected less than 0.01% of all requests.
>
> BTW my testing also suggests that the streaming DMA API is quite
> inefficient, because UAS performance _improved_ with swiotlb=force.
> Sure, this should probably be addressed in the UAS and/or xHCI driver,
> but what I mean is that moving away from swiotlb may even cause
> performance regressions, which is counter-intuitive. At least I would
> _not_ have expected it.
>
> >> +	gfp = (attrs & DMA_ATTR_MAY_SLEEP) ? GFP_KERNEL : GFP_NOWAIT;
> >> +	slot = kmalloc(sizeof(*slot), gfp | __GFP_NOWARN);
> >> +	if (!slot)
> >> +		goto err;
> >> +
> >> +	slot->orig_addr = orig_addr;
> >> +	slot->alloc_size = alloc_size;
> >> +	slot->page = dma_direct_alloc_pages(dev, PAGE_ALIGN(alloc_size),
> >> +					    &slot->dma_addr, dir,
> >> +					    gfp | __GFP_NOWARN);
> >> +	if (!slot->page)
> >> +		goto err_free_slot;
> >
> > Without GFP_NOIO allocations this will deadlock eventually.
>
> Ah, that would affect the non-sleeping case (GFP_KERNEL), right?
>
> Petr T
>


[-- Attachment #2: attachment.sig --]
[-- Type: application/pgp-signature, Size: 849 bytes --]

  parent reply	other threads:[~2023-03-31  7:26 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-20 12:28 [RFC v1 0/4] Allow dynamic allocation of software IO TLB bounce buffers Petr Tesarik
2023-03-20 12:28 ` [RFC v1 1/4] dma-mapping: introduce the DMA_ATTR_MAY_SLEEP attribute Petr Tesarik
2023-03-28  3:57   ` Christoph Hellwig
2023-03-28  7:21     ` Petr Tesarik
2023-04-07  5:52       ` Christoph Hellwig
2023-03-31 13:06   ` Bagas Sanjaya
2023-03-20 12:28 ` [RFC v1 2/4] swiotlb: Move code around in preparation for dynamic bounce buffers Petr Tesarik
2023-03-20 12:28 ` [RFC v1 3/4] swiotlb: Allow dynamic allocation of " Petr Tesarik
2023-03-28  4:07   ` Christoph Hellwig
2023-03-28  7:54     ` Petr Tesarik
2023-03-28 12:43       ` Petr Tesarik
2023-04-07  5:57         ` Christoph Hellwig
2023-04-07 10:15           ` Petr Tesařík
2023-04-13 11:09             ` Petr Tesarik
2023-04-21 13:03             ` Petr Tesařík
2023-04-21 14:58               ` Robin Murphy
2023-04-21 15:09                 ` Petr Tesařík
2023-04-24  6:03                   ` Christoph Hellwig
2023-03-31  7:26       ` Juerg Haefliger [this message]
2023-03-31  9:00         ` Petr Tesařík
2023-04-06 11:44           ` Juerg Haefliger
2023-05-11 10:36             ` Petr Tesařík
2023-04-07  5:55       ` Christoph Hellwig
2023-04-07 10:46         ` Petr Tesařík
2023-04-11  3:51           ` Christoph Hellwig
2023-03-20 12:28 ` [RFC v1 4/4] swiotlb: Add an option to allow dynamic " Petr Tesarik
2023-03-27 11:06 ` [RFC v1 0/4] Allow dynamic allocation of software IO TLB " Petr Tesarik
2023-04-07  6:00   ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230331092553.677e9649@smeagol \
    --to=juergh@proton.me \
    --cc=akpm@linux-foundation.org \
    --cc=bp@suse.de \
    --cc=corbet@lwn.net \
    --cc=damien.lemoal@opensource.wdc.com \
    --cc=graf@amazon.com \
    --cc=hch@lst.de \
    --cc=iommu@lists.linux.dev \
    --cc=kim.phillips@amd.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=m.szyprowski@samsung.com \
    --cc=paulmck@kernel.org \
    --cc=petr@tesarici.cz \
    --cc=petrtesarik@huaweicloud.com \
    --cc=rdunlap@infradead.org \
    --cc=roberto.sassu@huawei.com \
    --cc=robin.murphy@arm.com \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.