iommu.lists.linux-foundation.org archive mirror
From: Jason Gunthorpe <jgg@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Nicolin Chen <nicolinc@nvidia.com>,
	kevin.tian@intel.com, joro@8bytes.org, will@kernel.org,
	robin.murphy@arm.com, shuah@kernel.org, yi.l.liu@intel.com,
	linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
	kvm@vger.kernel.org, linux-kselftest@vger.kernel.org,
	baolu.lu@linux.intel.com, "Raj, Ashok" <ashok.raj@intel.com>
Subject: Re: [PATCH v2 02/10] iommu: Introduce a new iommu_group_replace_domain() API
Date: Fri, 10 Feb 2023 20:44:40 -0400
Message-ID: <Y+bk+GSCPKOJfr1f@nvidia.com>
In-Reply-To: <20230210165110.4e89ce55.alex.williamson@redhat.com>

On Fri, Feb 10, 2023 at 04:51:10PM -0700, Alex Williamson wrote:
> On Tue, 7 Feb 2023 13:17:54 -0800
> Nicolin Chen <nicolinc@nvidia.com> wrote:
> 
> > qemu has a need to replace the translations associated with a domain
> > when the guest does large-scale operations like switching between an
> > IDENTITY domain and, say, dma-iommu.c.
> > 
> > Currently, it does this by replacing all the mappings in a single
> > domain, but this is very inefficient and means that domains have to be
> > per-device rather than per-translation.
> > 
> > Provide a high-level API to allow replacements of one domain with
> > another. This is similar to a detach/attach cycle except it doesn't
> > force the group to go to the blocking domain in-between.
> > 
> > By removing this forced blocking domain the iommu driver has the
> > opportunity to implement an atomic replacement of the domains to the
> > greatest extent its hardware allows.
> > 
> > It could be possible to address this by simply removing the protection
> > from the iommu_attach_group(), but it is not so clear if that is safe
> > for the few users. Thus, add a new API to serve this new purpose.
> > 
> > Atomic replacement allows the qemu emulation of the viommu to be more
> > complete, as real hardware has this ability.
> 
> I was under the impression that we could not atomically switch a
> device's domain relative to in-flight DMA.  

Certainly all the HW can be properly atomic, but not necessarily easily.
The usual issue is the SW complication of managing the software
controlled cache tags in a way that doesn't corrupt the cache.

This is because the cache tag and the io page table top are in
different 64 bit words so atomic writes don't cover both, and thus the
IOMMU HW could tear the two stores and mismatch the cache tag to the
table top. This would corrupt the cache.

The easiest way to avoid this is for SW to use the same DID for the
new and old tables. This is possible if this translation entry is the
only user of the DID. A more complex way would use a temporary DID
that can be safely corrupted. But realistically I'd expect VT-d
drivers to simply make the PASID invalid for the duration of the
update.

However something like AMD has a single cache tag for every entry in
the PASID table so you could do an atomic replace trivially. Just
update the PASID and invalidate the caches.

ARM has a flexible PASID table and atomic replace would be part of
resizing it, e.g. you can atomically update the top of the PASID table
with a single 64 bit store.

So replace lets the drivers implement those special behaviors if it
makes sense for them.

> Or maybe atomic is the wrong word here since we expect no in-flight DMA
> during the sort of mode transitions referred to here, and we're really
> just trying to convey that we can do this via a single operation with
> reduced latency?  Thanks,

Atomic means DMA will either translate with the old domain or the new
domain but never a blocking domain. Keep in mind that with nesting,
"domain" can mean a full PASID table in guest memory.

I should reiterate that replace is not an API that is required to be
atomic.

Jason
