iommu.lists.linux-foundation.org archive mirror
From: Robin Murphy <robin.murphy@arm.com>
To: "Petr Tesařík" <petr@tesarici.cz>, "Christoph Hellwig" <hch@lst.de>
Cc: Michael Kelley <mhklinux@outlook.com>,
	Will Deacon <will@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Petr Tesarik <petr.tesarik1@huawei-partners.com>,
	"kernel-team@android.com" <kernel-team@android.com>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Dexuan Cui <decui@microsoft.com>,
	Nicolin Chen <nicolinc@nvidia.com>
Subject: Re: [PATCH v5 6/6] swiotlb: Remove pointless stride adjustment for allocations >= PAGE_SIZE
Date: Fri, 1 Mar 2024 16:38:33 +0000	[thread overview]
Message-ID: <4fbcdd49-cd93-4af8-83e2-f1cef0ea2eeb@arm.com> (raw)
In-Reply-To: <20240301163927.18358ee2@meshulam.tesarici.cz>

On 2024-03-01 3:39 pm, Petr Tesařík wrote:
> On Thu, 29 Feb 2024 16:47:56 +0100
> Christoph Hellwig <hch@lst.de> wrote:
> 
>> On Thu, Feb 29, 2024 at 03:44:11PM +0000, Michael Kelley wrote:
>>> Any thoughts on how that historical behavior should apply if
>>> the DMA min_align_mask is non-zero, or the alloc_align_mask
>>> parameter to swiotlb_tbl_map_single() is non-zero? As currently
>>> used, alloc_align_mask is page aligned if the IOMMU granule is
>>> >= PAGE_SIZE. But a non-zero min_align_mask could mandate
>>> returning a buffer that is not page aligned. Perhaps do the
>>> historical behavior only if alloc_align_mask and min_align_mask
>>> are both zero?
>>
>> I think the driver setting min_align_mask is a clear indicator
>> that the driver requested a specific alignment and the defaults
>> don't apply.  For swiotlb_tbl_map_single as used by dma-iommu
>> I'd have to take a closer look at how it is used.
> 
> I'm not sure it helps in this discussion, but let me dive into a bit
> of ancient history to understand how we ended up here.
> 
> IIRC this behaviour was originally motivated by limitations of PC AT
> hardware. Intel 8237 is a 16-bit DMA controller. To make it somewhat
> usable with addresses up to 16MB (yeah, the infamous DMA zone), IBM
> added a page register, but it was on a separate chip and it did not
> increment when the 8237 address rolled over back to zero. Effectively,
> the page register selected a 64K-aligned window of 64K bytes.
> Consequently, DMA buffers could not cross a 64K physical boundary.
> 
> Thanks to how the buddy allocator works, the 64K-boundary constraint
> was satisfied by allocation size, and drivers took advantage of it when
> allocating device buffers. IMO software bounce buffers simply followed
> the same logic that worked for buffers allocated by the buddy allocator.
> 
> OTOH min_align_mask was motivated by NVMe, which prescribes the value of
> a certain number of low bits in the DMA address (for simplicity assumed
> to be identical with the same bits in the physical address).
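[A minimal sketch of the min_align_mask guarantee described above: the bounce buffer address swiotlb returns must preserve the original buffer's low physical address bits under the mask (e.g. NVMe's PRP rules need the offset within a 4K page kept intact). This is illustrative, not the in-kernel implementation; the function name is hypothetical.]

```c
#include <stdint.h>

/*
 * Illustration only: place the bounced buffer at the matching offset
 * inside its slot, so (bounce & mask) == (orig_phys & mask).
 * NVMe effectively uses mask = 0xfff (NVME_CTRL_PAGE_SIZE - 1).
 */
static uint64_t bounce_addr(uint64_t slot_base, uint64_t orig_phys,
                            uint64_t min_align_mask)
{
        /* Keep the low bits of the original physical address. */
        return slot_base + (orig_phys & min_align_mask);
}
```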
> 
> The only pre-existing user of alloc_align_mask is x86 IOMMU code, and
> IIUC it is used to guarantee that unaligned transactions do not share
> the IOMMU granule with another device. This whole thing is weird,
> because swiotlb_tbl_map_single() is called like this:
> 
>                  aligned_size = iova_align(iovad, size);
>                  phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size,
>                                                iova_mask(iovad), dir, attrs);
> 
> Here:
> 
> * alloc_size = iova_align(iovad, size)
> * alloc_align_mask = iova_mask(iovad)
> 
> Now, iova_align() rounds up its argument to a multiple of iova granule
> and iova_mask() is simply "granule - 1". This works, because granule
> size must be a power of 2, and I assume it must also be >= PAGE_SIZE.

Not quite, the granule must *not* be larger than PAGE_SIZE (but can be 
smaller).
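[For reference, the two iova helpers being discussed reduce to the following, simplified from include/linux/iova.h; the granule is a power of two no larger than PAGE_SIZE.]

```c
#include <stddef.h>

/* Simplified stand-in for struct iova_domain. */
struct iova_domain { size_t granule; };

/* iova_mask(): "granule - 1", e.g. 0xfff for a 4K granule. */
static size_t iova_mask(const struct iova_domain *iovad)
{
        return iovad->granule - 1;
}

/* iova_align(): round size up to a multiple of the granule. */
static size_t iova_align(const struct iova_domain *iovad, size_t size)
{
        return (size + iova_mask(iovad)) & ~iova_mask(iovad);
}
```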
> In that case, the alloc_align_mask argument is not even needed if you
> adjust the code to match documentation---the resulting buffer will be
> aligned to a granule boundary by virtue of having a size that is a
> multiple of the granule size.

I think we've conflated two different notions of "allocation" here. The 
page-alignment which Christoph quoted applies for dma_alloc_coherent(), 
which nowadays *should* only be relevant to SWIOTLB code in the 
swiotlb_alloc() path for stealing coherent pages out of restricted DMA 
pools. Historically there was never any official alignment requirement 
for the DMA addresses returned by dma_map_{page,sg}(), until 
min_align_mask was introduced.

> To sum it up:
> 
> 1. min_align_mask is by far the most important constraint. Devices will
>     simply stop working if it is not met.
> 2. Alignment to the smallest PAGE_SIZE order which is greater than or
>     equal to the requested size has been documented, and some drivers
>     may rely on it.

Strictly it stopped being necessary since fafadcd16595 when the 
historical swiotlb_alloc() was removed, but 602d9858f07c implies that 
indeed the assumption of it for streaming mappings had already set in. 
Of course NVMe is now using min_align_mask, but I'm not sure if it's 
worth taking the risk to find out who else should also be.

> 3. alloc_align_mask is a misguided fix for a bug in the above.

No, alloc_align_mask is about describing the explicit requirement rather 
than relying on any implicit behaviour, and thus being able to do the 
optimal thing for, say, a 9KB mapping given a 4KB IOVA granule and 64KB 
PAGE_SIZE.
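[A back-of-the-envelope sketch of the 9KB example: with swiotlb's 2KB slots, honouring only the explicit 4KB-granule alloc_align_mask needs 6 slots (12KB), whereas rounding the allocation up to a 64KB PAGE_SIZE would consume 32. The helper below is hypothetical arithmetic, not kernel code.]

```c
#include <stddef.h>

#define IO_TLB_SIZE 2048 /* swiotlb slot size: 1 << IO_TLB_SHIFT (11) */

/* How many swiotlb slots a mapping consumes once its size is
 * rounded up to the given alignment mask. */
static size_t slots_needed(size_t size, size_t align_mask)
{
        size_t aligned = (size + align_mask) & ~align_mask;
        return (aligned + IO_TLB_SIZE - 1) / IO_TLB_SIZE;
}
```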

Thanks,
Robin.

> 
> Correct me if anything of the above is wrong.
> 
> HTH
> Petr T

