iommu.lists.linux-foundation.org archive mirror
From: "Petr Tesařík" <petr@tesarici.cz>
To: Christoph Hellwig <hch@lst.de>
Cc: Michael Kelley <mhklinux@outlook.com>,
	Will Deacon <will@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Petr Tesarik <petr.tesarik1@huawei-partners.com>,
	"kernel-team@android.com" <kernel-team@android.com>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Robin Murphy <robin.murphy@arm.com>,
	Dexuan Cui <decui@microsoft.com>,
	Nicolin Chen <nicolinc@nvidia.com>
Subject: Re: [PATCH v5 6/6] swiotlb: Remove pointless stride adjustment for allocations >= PAGE_SIZE
Date: Fri, 1 Mar 2024 18:08:53 +0100	[thread overview]
Message-ID: <20240301180853.5ac20b27@meshulam.tesarici.cz>
In-Reply-To: <20240301163927.18358ee2@meshulam.tesarici.cz>

On Fri, 1 Mar 2024 16:39:27 +0100
Petr Tesařík <petr@tesarici.cz> wrote:

> On Thu, 29 Feb 2024 16:47:56 +0100
> Christoph Hellwig <hch@lst.de> wrote:
> 
> > On Thu, Feb 29, 2024 at 03:44:11PM +0000, Michael Kelley wrote:  
> > > Any thoughts on how that historical behavior should apply if
> > > the DMA min_align_mask is non-zero, or the alloc_align_mask
> > > parameter to swiotlb_tbl_map_single() is non-zero? As currently
> > > used, alloc_align_mask is page aligned if the IOMMU granule is    
> > > >= PAGE_SIZE. But a non-zero min_align_mask could mandate    
> > > returning a buffer that is not page aligned. Perhaps do the
> > > historical behavior only if alloc_align_mask and min_align_mask
> > > are both zero?    
> > 
> > I think the driver setting min_align_mask is a clear indicator
> > that the driver requested a specific alignment and the defaults
> > don't apply.  For swiotlb_tbl_map_single() as used by dma-iommu
> > I'd have to take a closer look at how it is used.  
> 
> I'm not sure it helps in this discussion, but let me dive into a bit
> of ancient history to understand how we ended up here.
> 
> IIRC this behaviour was originally motivated by limitations of PC AT
> hardware. Intel 8237 is a 16-bit DMA controller. To make it somehow
> usable with addresses up to 16MB (yeah, the infamous DMA zone), IBM
> added a page register, but it was on a separate chip and it did not
> increment when the 8237 address rolled over back to zero. Effectively,
> the page register selected a 64K-aligned window of 64K bytes.
> Consequently, DMA buffers could not cross a 64K physical boundary.
> 
> Thanks to how the buddy allocator works, the 64K-boundary constraint
> was satisfied by allocation size, and drivers took advantage of it when
> allocating device buffers. IMO software bounce buffers simply followed
> the same logic that worked for buffers allocated by the buddy allocator.
> 
> OTOH min_align_mask was motivated by NVMe, which prescribes the value of
> a certain number of low bits in the DMA address (for simplicity assumed
> to be identical with the same bits in the physical address).
> 
> The only pre-existing user of alloc_align_mask is x86 IOMMU code, and
> IIUC it is used to guarantee that unaligned transactions do not share
> the IOMMU granule with another device. This whole thing is weird,
> because swiotlb_tbl_map_single() is called like this:
> 
>                 aligned_size = iova_align(iovad, size);
>                 phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size,
>                                               iova_mask(iovad), dir, attrs);
> 
> Here:
> 
> * alloc_size = iova_align(iovad, size)
> * alloc_align_mask = iova_mask(iovad)
> 
> Now, iova_align() rounds up its argument to a multiple of iova granule
> and iova_mask() is simply "granule - 1". This works, because granule
> size must be a power of 2, and I assume it must also be >= PAGE_SIZE.
> 
> In that case, the alloc_align_mask argument is not even needed if you
> adjust the code to match documentation---the resulting buffer will be
> aligned to a granule boundary by virtue of having a size that is a
> multiple of the granule size.
> 
> To sum it up:
> 
> 1. min_align_mask is by far the most important constraint. Devices will
>    simply stop working if it is not met.
> 2. Alignment to the smallest PAGE_SIZE order which is greater than or
>    equal to the requested size has been documented, and some drivers
>    may rely on it.
> 3. alloc_align_mask is a misguided fix for a bug in the above.
> 
> Correct me if anything of the above is wrong.

I thought about it some more, and I believe I know what should happen
if the first two constraints appear to be mutually exclusive.

First, the alignment based on size does not guarantee that the resulting
physical address is aligned. In fact, the lowest IO_TLB_SHIFT bits must
always be identical to those of the original buffer address.

Let's take an example request like this:

   TLB_SIZE       = 0x00000800
   min_align_mask = 0x0000ffff
   orig_addr      = 0x....1234
   alloc_size     = 0x00002800

The minimum alignment mask requires keeping the 0x1234 at the end. The
allocation size requires a buffer that is aligned to 16K. Of course,
there is no 16K-aligned slot with slot_address & 0x7ff == 0x200, but if
IO_TLB_SHIFT were 14, the condition would be slot_address & 0x3fff == 0
(the low IO_TLB_SHIFT bits are masked off). Since the SWIOTLB API does
not guarantee any specific value of IO_TLB_SHIFT, callers cannot rely on
it. That means 0x1234 is a perfectly valid bounce buffer address for
this example.

The caller may rightfully expect that the 16K granule containing the
bounce buffer is not shared with any other user. For the above case I
suggest increasing the allocation size to 0x4000 already in
swiotlb_tbl_map_single() and treating 0x1234 as the offset from the
slot address.

Will, do you want to incorporate it into your patch series, or should I
send the corresponding patch?

Petr T


Thread overview: 37+ messages
2024-02-28 13:39 [PATCH v5 0/6] Fix double allocation in swiotlb_alloc() Will Deacon
2024-02-28 13:39 ` [PATCH v5 1/6] swiotlb: Fix double-allocation of slots due to broken alignment handling Will Deacon
2024-02-29  5:44   ` Michael Kelley
2024-02-28 13:39 ` [PATCH v5 2/6] swiotlb: Enforce page alignment in swiotlb_alloc() Will Deacon
2024-02-28 13:39 ` [PATCH v5 3/6] swiotlb: Honour dma_alloc_coherent() " Will Deacon
2024-02-28 13:39 ` [PATCH v5 4/6] swiotlb: Fix alignment checks when both allocation and DMA masks are present Will Deacon
2024-02-28 13:39 ` [PATCH v5 5/6] iommu/dma: Force swiotlb_max_mapping_size on an untrusted device Will Deacon
2024-02-29  5:57   ` Michael Kelley
2024-02-28 13:39 ` [PATCH v5 6/6] swiotlb: Remove pointless stride adjustment for allocations >= PAGE_SIZE Will Deacon
2024-02-29  6:07   ` Michael Kelley
2024-02-29  7:36     ` Michael Kelley
2024-02-29 13:33       ` Christoph Hellwig
2024-02-29 15:44         ` Michael Kelley
2024-02-29 15:47           ` Christoph Hellwig
2024-03-01 15:39             ` Petr Tesařík
2024-03-01 16:38               ` Robin Murphy
2024-03-01 17:33                 ` Petr Tesařík
2024-03-01 17:08               ` Petr Tesařík [this message]
2024-03-01 17:54                 ` Robin Murphy
2024-03-01 18:42                   ` Petr Tesařík
2024-03-04  3:31                     ` Michael Kelley
2024-03-04 11:00                       ` Petr Tesařík
2024-03-04 13:37                         ` Robin Murphy
2024-03-04 15:55                           ` Petr Tesařík
2024-03-04 16:02                             ` Will Deacon
2024-03-04 16:10                               ` Michael Kelley
2024-03-04 16:53                                 ` Robin Murphy
2024-03-04 18:22                                   ` Michael Kelley
2024-03-05 11:20                                     ` Robin Murphy
2024-03-05 15:15                                       ` Petr Tesařík
2024-03-04 19:04                                 ` Petr Tesařík
2024-03-05 14:08                                   ` Christoph Hellwig
2024-03-04 16:04                             ` Michael Kelley
2024-03-04 17:11                               ` Robin Murphy
2024-03-04 18:08                                 ` Michael Kelley
2024-03-05 14:05                           ` Christoph Hellwig
2024-03-04 15:32     ` Will Deacon
