From: Robin Murphy <robin.murphy@arm.com>
To: "Petr Tesařík" <petr@tesarici.cz>, "Christoph Hellwig" <hch@lst.de>
Cc: Michael Kelley <mhklinux@outlook.com>,
Will Deacon <will@kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Petr Tesarik <petr.tesarik1@huawei-partners.com>,
"kernel-team@android.com" <kernel-team@android.com>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
Marek Szyprowski <m.szyprowski@samsung.com>,
Dexuan Cui <decui@microsoft.com>,
Nicolin Chen <nicolinc@nvidia.com>
Subject: Re: [PATCH v5 6/6] swiotlb: Remove pointless stride adjustment for allocations >= PAGE_SIZE
Date: Fri, 1 Mar 2024 16:38:33 +0000 [thread overview]
Message-ID: <4fbcdd49-cd93-4af8-83e2-f1cef0ea2eeb@arm.com> (raw)
In-Reply-To: <20240301163927.18358ee2@meshulam.tesarici.cz>
On 2024-03-01 3:39 pm, Petr Tesařík wrote:
> On Thu, 29 Feb 2024 16:47:56 +0100
> Christoph Hellwig <hch@lst.de> wrote:
>
>> On Thu, Feb 29, 2024 at 03:44:11PM +0000, Michael Kelley wrote:
>>> Any thoughts on how that historical behavior should apply if
>>> the DMA min_align_mask is non-zero, or the alloc_align_mask
>>> parameter to swiotbl_tbl_map_single() is non-zero? As currently
>>> used, alloc_align_mask is page aligned if the IOMMU granule is
>>>> = PAGE_SIZE. But a non-zero min_align_mask could mandate
>>> returning a buffer that is not page aligned. Perhaps do the
>>> historical behavior only if alloc_align_mask and min_align_mask
>>> are both zero?
>>
>> I think the driver setting min_align_mask is a clear indicator
>> that the driver requested a specific alignment and the defaults
>> don't apply. For swiotbl_tbl_map_single as used by dma-iommu
>> I'd have to tak a closer look at how it is used.
>
> I'm not sure it helps in this discussion, but let me dive into a bit
> of ancient history to understand how we ended up here.
>
> IIRC this behaviour was originally motivated by limitations of PC AT
> hardware. Intel 8237 is a 16-bit DMA controller. To make it somehow
> usable with addresses up to 16MB (yeah, the infamous DMA zone), IBM
> added a page register, but it was on a separate chip and it did not
> increment when the 8237 address rolled over back to zero. Effectively,
> the page register selected a 64K-aligned window of 64K buffers.
> Consequently, DMA buffers could not cross a 64K physical boundary.
>
> Thanks to how the buddy allocator works, the 64K-boundary constraint
> was satisfied by allocation size, and drivers took advantage of it when
> allocating device buffers. IMO software bounce buffers simply followed
> the same logic that worked for buffers allocated by the buddy allocator.
>
> OTOH min_align_mask was motivated by NVME which prescribes the value of
> a certain number of low bits in the DMA address (for simplicity assumed
> to be identical with the same bits in the physical address).
>
> The only pre-existing user of alloc_align_mask is x86 IOMMU code, and
> IIUC it is used to guarantee that unaligned transactions do not share
> the IOMMU granule with another device. This whole thing is weird,
> because swiotlb_tbl_map_single() is called like this:
>
> aligned_size = iova_align(iovad, size);
> phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size,
> iova_mask(iovad), dir, attrs);
>
> Here:
>
> * alloc_size = iova_align(iovad, size)
> * alloc_align_mask = iova_mask(iovad)
>
> Now, iova_align() rounds up its argument to a multiple of iova granule
> and iova_mask() is simply "granule - 1". This works, because granule
> size must be a power of 2, and I assume it must also be >= PAGE_SIZE.
Not quite, the granule must *not* be larger than PAGE_SIZE (but can be
smaller).
> In that case, the alloc_align_mask argument is not even needed if you
> adjust the code to match documentation---the resulting buffer will be
> aligned to a granule boundary by virtue of having a size that is a
> multiple of the granule size.
I think we've conflated two different notions of "allocation" here. The
page-alignment which Christoph quoted applies for dma_alloc_coherent(),
which nowadays *should* only be relevant to SWIOTLB code in the
swiotlb_alloc() path for stealing coherent pages out of restricted DMA
pools. Historically there was never any official alignment requirement
for the DMA addresses returned by dma_map_{page,sg}(), until
min_align_mask was introduced.
> To sum it up:
>
> 1. min_align_mask is by far the most important constraint. Devices will
> simply stop working if it is not met.
> 2. Alignment to the smallest PAGE_SIZE order which is greater than or
> equal to the requested size has been documented, and some drivers
> may rely on it.
Strictly it stopped being necessary since fafadcd16595 when the
historical swiotlb_alloc() was removed, but 602d9858f07c implies that
indeed the assumption of it for streaming mappings had already set in.
Of course NVMe is now using min_align_mask, but I'm not sure if it's
worth taking the risk to find out who else should also be.
> 3. alloc_align_mask is a misguided fix for a bug in the above.
No, alloc_align_mask is about describing the explicit requirement rather
than relying on any implicit behaviour, and thus being able to do the
optimal thing for, say, a 9KB mapping given a 4KB IOVA granule and 64KB
PAGE_SIZE.
Thanks,
Robin.
>
> Correct me if anything of the above is wrong.
>
> HTH
> Petr T
next prev parent reply other threads:[~2024-03-01 16:38 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-28 13:39 [PATCH v5 0/6] Fix double allocation in swiotlb_alloc() Will Deacon
2024-02-28 13:39 ` [PATCH v5 1/6] swiotlb: Fix double-allocation of slots due to broken alignment handling Will Deacon
2024-02-29 5:44 ` Michael Kelley
2024-02-28 13:39 ` [PATCH v5 2/6] swiotlb: Enforce page alignment in swiotlb_alloc() Will Deacon
2024-02-28 13:39 ` [PATCH v5 3/6] swiotlb: Honour dma_alloc_coherent() " Will Deacon
2024-02-28 13:39 ` [PATCH v5 4/6] swiotlb: Fix alignment checks when both allocation and DMA masks are present Will Deacon
2024-02-28 13:39 ` [PATCH v5 5/6] iommu/dma: Force swiotlb_max_mapping_size on an untrusted device Will Deacon
2024-02-29 5:57 ` Michael Kelley
2024-02-28 13:39 ` [PATCH v5 6/6] swiotlb: Remove pointless stride adjustment for allocations >= PAGE_SIZE Will Deacon
2024-02-29 6:07 ` Michael Kelley
2024-02-29 7:36 ` Michael Kelley
2024-02-29 13:33 ` Christoph Hellwig
2024-02-29 15:44 ` Michael Kelley
2024-02-29 15:47 ` Christoph Hellwig
2024-03-01 15:39 ` Petr Tesařík
2024-03-01 16:38 ` Robin Murphy [this message]
2024-03-01 17:33 ` Petr Tesařík
2024-03-01 17:08 ` Petr Tesařík
2024-03-01 17:54 ` Robin Murphy
2024-03-01 18:42 ` Petr Tesařík
2024-03-04 3:31 ` Michael Kelley
2024-03-04 11:00 ` Petr Tesařík
2024-03-04 13:37 ` Robin Murphy
2024-03-04 15:55 ` Petr Tesařík
2024-03-04 16:02 ` Will Deacon
2024-03-04 16:10 ` Michael Kelley
2024-03-04 16:53 ` Robin Murphy
2024-03-04 18:22 ` Michael Kelley
2024-03-05 11:20 ` Robin Murphy
2024-03-05 15:15 ` Petr Tesařík
2024-03-04 19:04 ` Petr Tesařík
2024-03-05 14:08 ` Christoph Hellwig
2024-03-04 16:04 ` Michael Kelley
2024-03-04 17:11 ` Robin Murphy
2024-03-04 18:08 ` Michael Kelley
2024-03-05 14:05 ` Christoph Hellwig
2024-03-04 15:32 ` Will Deacon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4fbcdd49-cd93-4af8-83e2-f1cef0ea2eeb@arm.com \
--to=robin.murphy@arm.com \
--cc=decui@microsoft.com \
--cc=hch@lst.de \
--cc=iommu@lists.linux.dev \
--cc=kernel-team@android.com \
--cc=linux-kernel@vger.kernel.org \
--cc=m.szyprowski@samsung.com \
--cc=mhklinux@outlook.com \
--cc=nicolinc@nvidia.com \
--cc=petr.tesarik1@huawei-partners.com \
--cc=petr@tesarici.cz \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).