linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Nicolin Chen <nicolinc@nvidia.com>,
	robin.murphy@arm.com, will@kernel.org, eric.auger@redhat.com,
	kevin.tian@intel.com, baolu.lu@linux.intel.com, joro@8bytes.org,
	shameerali.kolothum.thodi@huawei.com,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, yi.l.liu@intel.com
Subject: Re: [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3
Date: Thu, 9 Mar 2023 17:01:15 -0400	[thread overview]
Message-ID: <ZApJGwPjHhlDwTDV@nvidia.com> (raw)
In-Reply-To: <20230309182659.GA1710571@myrica>

On Thu, Mar 09, 2023 at 06:26:59PM +0000, Jean-Philippe Brucker wrote:
> On Thu, Mar 09, 2023 at 10:48:50AM -0400, Jason Gunthorpe wrote:
> > On Thu, Mar 09, 2023 at 01:42:17PM +0000, Jean-Philippe Brucker wrote:
> > 
> > > Although we can keep the alloc and hardware info separate for each IOMMU
> > > architecture, we should try to come up with common invalidation methods.
> > 
> > The invalidation language is tightly linked to the actual cache block
> > and cache tag in the IOMMU HW design.
> 
> Concretely though, what are the incompatibilities between the HW designs?
> They all need to remove a range of TLB entries, using some address space
> tag. But if there is an actual difference I do need to know.

For instance the address space tags and the cache entires they match
to are wildly different.

ARM uses a fine grained ASID and Intel uses a shared ASID called a DID
and incorporates the PASID into the cache tag.

AMD uses something called a DID that covers a different set of stuff
than the Intel DID, and it doesn't seem to work for nesting. AMD uses
PASID as the primary nested cache tag.

Superficially you can say all three have an ASID and you can have an
invalidate ASID Operation and make it "look" the same, but the actual
behavior is totally ill defined and the whole thing is utterly
obfuscated as to what does it actually MEAN.

But this doesn't matter for virtio. You have already got a spec that
defines invalidation in terms of virtio objects that sort of match
things like iommu_domains. I hope the virtio
VIRTIO_IOMMU_INVAL_S_DOMAIN is very well defined as to exactly what
objects a DOMAIN match applies to. The job of the hypervisor is to
translate that to whatever invalidation(s) the real HW requires.

ie virito is going to say "invalidate this domain" and expect the
hypervisor to spew it to every attached PASID if that is what the HW
requires (eg AMD), or do a single ASID invalidation (Intel, sometimes)

But when a vIOMMU gets a vDID or vPASID invalidation command it
doesn't mean the same thing as virtio. The driver must invalidate
exactly what the vIOMMU programming model says to invalidate because
the guest is going to spew more invalidations to cover what it
needs. Over invalidation would be a performance problem.

Exposing these subtle differences to userspace is necessary. What I do
not want is leaking those differences through an ill-defined "generic"
interface.

Even more importantly Intel and ARM should not have to fight about the
subtle implementation specific details of the specification of the
"generic" interface. If Intel needs something dumb to make their
viommu work well then they should simply be able to do it. I don't
want to spend 6 months of pointless arguing about language details in
an unnecessary "generic" spec.

> > The purpose of these interfaces is to support high performance full
> > functionality vIOMMUs of the real HW, we should not loose sight of
> > that goal.
> > 
> > We are actually planning to go futher and expose direct invalidation
> > work queues complete with HW doorbells to userspace. This obviously
> > must be in native HW format.
> 
> Doesn't seem relevant since direct access to command queue wouldn't use
> this struct.

The point is our design direction with iommufd is to expose the raw HW
to userspace, not to obfuscate it with ill defined generalizations.

> > Someone has to do the conversion. If you don't think virito should do
> > it then I'd be OK to add a type tag for virtio format invalidation and
> > put it in the IOMMU driver.
> 
> Implementing two invalidation formats in each IOMMU driver does not seem
> practical.

I don't see why.

The advantage of the kernel side is that the implementation is not
strong ABI. If we want to adjust and review the virtio invalidation
path as new HW comes along we can, so long as it is only in the
kernel.

On the other hand if we mess up the uABI for iommufd we are stuck with
it.

The safest and best uABI for iommufd is the HW native uABI because it,
almost by definition, cannot be wrong.

Anyhow, I'm still not very convinced adapting to virtio invalidation
format should be in iommu drivers. I think what you end up with for
virtio is that Intel/AMD have some nice common code to invalidate an
iommu_domain address range (probably even the existing invalidation
interface), and SMMUv3 is just totally different and special.

This is because SMMUv3 has no option to keep the PASID table in the
hypervisor so you are sadly forced to expose both the native ASID and
native PASID caches to the virtio protocol.

Given that the VM virtio driver has to have SMMUv3 specific code to
handle the CD table it must get, I don't see the problem with also
having SMMUv3 specific code in the hypervisor virtio driver to handle
invalidating based on the CD table.

Really, I want to see patches implementing all of this before we make
any decision on what the ops interface is for virtio-iommu kernel
side.

> > However, I don't know the rational for virtio-viommu, it seems like a
> > strange direction to me.
> 
> A couple of reasons are relevant here: non-QEMU VMMs don't want to emulate
> all vendor IOMMUs, new architectures get vIOMMU mostly for free,

So your argument is you can implement a simple map/unmap API riding
on the common IOMMU API and this is portable?

Seems sensible, but that falls apart pretty quickly when we talk about
nesting.. I don't think we can avoid VMM components to set this up, so
it stops being portable. At that point I'm back to asking why not use
the real HW model?

> > All the iommu drivers have native command
> > queues. ARM and AMD are both supporting native command queues directly
> > in the guest, complete with a direct guest MMIO doorbell ring.
> 
> Arm SMMUv3 mandates a single global command queue (SMMUv2 uses
> registers). An SMMUv3 can optionally implement multiple command
> queues, though I don't know if they can be safely assigned to
> guests.

It is not standardized by ARM, but it can (and has) been done.

> For a lot of SMMUv3 implementations that have a single queue and for
> other architectures, we can do better than hardware emulation.

How is using a SW emulated virtio formatted queue better than using a
SW emulated SMMUv3 ECMDQ?

The vSMMUv3 driver controls what capabilites are shown to the guest it
can definitely create a ECMDQ enabled device and do something like
assign the 2ndary ECMDQs to hypervisor kernel SW queues the same way
virito does.

I don't think there is a very solid argument that virtio-iommu is
necessary to get faster invalidation.

> > If someone wants to optimize this I'd think the way to do it is to use
> > virtio like techniques to put SW command queue processing in the
> > kernel iommu driver and continue to use the HW native interface in the
> > VM.
> 
> I didn't get which kernel this is.

hypervisor kernel.

> > IMHO this was just unioning all the different invalidation types
> > together. It makes sense for something like virtio but it is
> > illogical/obfuscated as a user/kernel interface since it still
> > requires a userspace HW driver to understand what subset of the
> > invalidations are used on the actual HW.
> 
> As above, decoding arch-specific structures into generic ones is what an
> emulated IOMMU does,

No, it is what virtio wants to do. We are deliberately trying not to
do that for real accelerated HW vIOMMU emulators.

> and it doesn't make a performance difference in which
> format it forwards that to the kernel. The host IOMMU driver checks the
> guest request and copies them into the command queue. Whether that request
> comes in the form of a structure binary-compatible with Arm SMMUvX.Y, or
> some generic structure, does not make a difference.

It is not the structure layouts that matter!

It is the semantic meaning of each request, on each unique piece of
hardware. We actually want to leak the subtle semantic differences to
userspace.

Doing that and continuing to give them the same command label is
exactly the kind of obfuscated ill defined "generic" interface I do
not want.

Jason

  reply	other threads:[~2023-03-09 21:01 UTC|newest]

Thread overview: 165+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-09 10:53 [PATCH v1 00/14] Add Nested Translation Support for SMMUv3 Nicolin Chen
2023-03-09 10:53 ` [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper Nicolin Chen
2023-03-09 12:51   ` Robin Murphy
2023-03-09 14:19     ` Jason Gunthorpe
2023-03-09 19:04       ` Robin Murphy
2023-03-10  0:23         ` Jason Gunthorpe
2023-03-10  8:41     ` Eric Auger
2023-03-10 15:55       ` Jason Gunthorpe
2023-03-16  1:21         ` Nicolin Chen
2023-03-16 18:42           ` Robin Murphy
2023-03-16 20:01             ` Nicolin Chen
2023-03-20 12:51             ` Jason Gunthorpe
2023-03-10 10:14   ` Eric Auger
2023-03-10 15:33     ` Jason Gunthorpe
2023-03-10 15:44       ` Shameerali Kolothum Thodi
2023-03-10 15:56         ` Jason Gunthorpe
2023-03-10 16:07           ` Shameerali Kolothum Thodi
2023-03-10 16:21             ` Jason Gunthorpe
2023-03-10 16:30               ` Shameerali Kolothum Thodi
2023-03-10 17:03                 ` Jason Gunthorpe
2023-03-22 16:07                   ` Eric Auger
2023-03-22 17:02                     ` Jason Gunthorpe
2023-03-22 17:41                       ` Eric Auger
2023-03-22 18:07                         ` Jason Gunthorpe
2023-03-16 19:51                 ` Nicolin Chen
2023-03-16 19:56                   ` Shameerali Kolothum Thodi
2023-03-22 15:44                     ` Eric Auger
2023-03-09 10:53 ` [PATCH v1 02/14] iommufd: Add nesting related data structures for ARM SMMUv3 Nicolin Chen
2023-03-09 13:42   ` Jean-Philippe Brucker
2023-03-09 14:48     ` Jason Gunthorpe
2023-03-09 18:26       ` Jean-Philippe Brucker
2023-03-09 21:01         ` Jason Gunthorpe [this message]
2023-03-10 12:16           ` Jean-Philippe Brucker
2023-03-10 14:52           ` Robin Murphy
2023-03-10 15:25             ` Jason Gunthorpe
2023-03-10 15:57               ` Robin Murphy
2023-03-10 16:03                 ` Jason Gunthorpe
2023-03-17 10:10                   ` Tian, Kevin
2023-03-17 10:04                 ` Tian, Kevin
2023-03-10  4:50       ` Nicolin Chen
2023-03-10 12:54         ` Jean-Philippe Brucker
2023-03-10 14:00           ` Jason Gunthorpe
2023-03-10 16:06         ` Jason Gunthorpe
2023-03-16  0:59           ` Nicolin Chen
2023-03-09 15:26     ` Shameerali Kolothum Thodi
2023-03-09 15:40       ` Jason Gunthorpe
2023-03-09 15:51         ` Shameerali Kolothum Thodi
2023-03-09 15:59           ` Jason Gunthorpe
2023-03-09 16:07             ` Shameerali Kolothum Thodi
2023-03-10  5:26               ` Nicolin Chen
2023-03-10  5:36                 ` Nicolin Chen
2023-03-10 12:55                   ` Jason Gunthorpe
2023-03-10  5:18         ` Nicolin Chen
2023-03-10  5:04     ` Nicolin Chen
2023-03-10 11:33     ` Eric Auger
2023-03-10 12:51       ` Jason Gunthorpe
2023-03-17 10:17         ` Tian, Kevin
2023-03-09 10:53 ` [PATCH v1 03/14] iommufd/device: Setup MSI on kernel-managed domains Nicolin Chen
2023-03-10 16:45   ` Eric Auger
2023-03-11  0:17     ` Nicolin Chen
2023-03-09 10:53 ` [PATCH v1 04/14] iommu/arm-smmu-v3: Add arm_smmu_hw_info Nicolin Chen
2023-03-09 13:03   ` Robin Murphy
2023-03-10  1:17     ` Nicolin Chen
2023-03-10 15:28       ` Robin Murphy
2023-03-16  0:13         ` Nicolin Chen
2023-03-16 15:19           ` Robin Murphy
2023-03-16 20:06             ` Nicolin Chen
2023-04-12  7:47               ` Nicolin Chen
2023-03-09 10:53 ` [PATCH v1 05/14] iommu/arm-smmu-v3: Remove ARM_SMMU_DOMAIN_NESTED Nicolin Chen
2023-03-10 16:39   ` Eric Auger
2023-03-10 17:05     ` Jason Gunthorpe
2023-03-11  0:24       ` Nicolin Chen
2023-03-11  0:23     ` Nicolin Chen
2023-03-09 10:53 ` [PATCH v1 06/14] iommu/arm-smmu-v3: Unset corresponding STE fields when s2_cfg is NULL Nicolin Chen
2023-03-09 13:13   ` Robin Murphy
2023-03-09 18:24     ` Shameerali Kolothum Thodi
2023-03-10  1:54       ` Nicolin Chen
2023-03-09 10:53 ` [PATCH v1 07/14] iommu/arm-smmu-v3: Add STRTAB_STE_0_CFG_NESTED for 2-stage translation Nicolin Chen
2023-03-09 10:53 ` [PATCH v1 08/14] iommu/arm-smmu-v3: Prepare for nested domain support Nicolin Chen
2023-03-10 20:39   ` Robin Murphy
2023-03-11 12:40     ` Nicolin Chen
2023-03-09 10:53 ` [PATCH v1 09/14] iommu/arm-smmu-v3: Implement arm_smmu_get_unmanaged_domain Nicolin Chen
2023-03-09 10:53 ` [PATCH v1 10/14] iommu/arm-smmu-v3: Pass in user_cfg to arm_smmu_domain_finalise Nicolin Chen
2023-03-09 10:53 ` [PATCH v1 11/14] iommu/arm-smmu-v3: Add arm_smmu_domain_alloc_user Nicolin Chen
2023-03-24 15:28   ` Eric Auger
2023-03-24 17:40     ` Nicolin Chen
2023-03-24 17:50       ` Jason Gunthorpe
2023-03-24 18:00         ` Nicolin Chen
2023-03-24 15:33   ` Eric Auger
2023-03-24 17:43     ` Nicolin Chen
2023-03-09 10:53 ` [PATCH v1 12/14] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED type of allocations Nicolin Chen
2023-03-09 13:20   ` Robin Murphy
2023-03-09 14:28     ` Robin Murphy
2023-03-10  1:34       ` Nicolin Chen
2023-03-24 15:44   ` Eric Auger
2023-03-24 16:30     ` Jason Gunthorpe
2023-03-24 17:50     ` Nicolin Chen
2023-03-24 17:51       ` Jason Gunthorpe
2023-03-24 17:55         ` Nicolin Chen
2023-03-09 10:53 ` [PATCH v1 13/14] iommu/arm-smmu-v3: Add CMDQ_OP_TLBI_NH_VAA and CMDQ_OP_TLBI_NH_ALL Nicolin Chen
2023-03-09 13:44   ` Robin Murphy
2023-03-10  1:19     ` Nicolin Chen
2023-03-09 10:53 ` [PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user Nicolin Chen
2023-03-09 14:49   ` Robin Murphy
2023-03-09 15:31     ` Jason Gunthorpe
2023-03-10  4:20       ` Nicolin Chen
2023-03-10 16:19         ` Jason Gunthorpe
2023-03-11 11:56           ` Nicolin Chen
2023-03-11 12:53             ` Nicolin Chen
2023-03-20 13:03             ` Jason Gunthorpe
2023-03-20 15:56               ` Nicolin Chen
2023-03-20 16:04                 ` Jason Gunthorpe
2023-03-20 16:59                   ` Nicolin Chen
2023-03-20 18:45                     ` Jason Gunthorpe
2023-03-20 21:22                       ` Nicolin Chen
2023-03-20 22:19                         ` Jason Gunthorpe
2023-03-22 20:57                           ` Nicolin Chen
2023-03-23 12:17                             ` Jason Gunthorpe
2023-03-17  9:41           ` Tian, Kevin
2023-03-17 14:24             ` Nicolin Chen
2023-03-20 12:59             ` Jason Gunthorpe
2023-03-20 16:12               ` Nicolin Chen
2023-03-20 18:00                 ` Jason Gunthorpe
2023-03-21  8:34                   ` Tian, Kevin
2023-03-21 11:48                     ` Jason Gunthorpe
2023-03-22  6:42                       ` Nicolin Chen
2023-03-22 12:43                         ` Jason Gunthorpe
2023-03-22 17:11                           ` Nicolin Chen
2023-03-22 17:28                             ` Jason Gunthorpe
2023-03-22 19:21                               ` Nicolin Chen
2023-03-22 19:41                                 ` Jason Gunthorpe
2023-03-22 20:43                                   ` Nicolin Chen
2023-03-23 12:16                                     ` Jason Gunthorpe
2023-03-23 18:13                                       ` Nicolin Chen
2023-03-23 18:27                                         ` Jason Gunthorpe
2023-03-24  9:02                         ` Tian, Kevin
2023-03-24 14:57                           ` Jason Gunthorpe
2023-03-24 17:35                             ` Nicolin Chen
2023-03-28  3:03                               ` Tian, Kevin
2023-03-24  8:47                       ` Tian, Kevin
2023-03-24 14:44                         ` Jason Gunthorpe
2023-03-28  2:48                           ` Tian, Kevin
2023-03-28 12:26                             ` Jason Gunthorpe
2023-03-31  8:09                               ` Tian, Kevin
2023-03-17  9:24       ` Tian, Kevin
2023-03-10  3:51     ` Nicolin Chen
2023-03-10 17:53       ` Robin Murphy
2023-03-10 18:49         ` Jason Gunthorpe
2023-03-11 12:38         ` Nicolin Chen
2023-03-13 13:07           ` Robin Murphy
2023-03-16  0:01             ` Nicolin Chen
2023-03-16 14:58               ` Robin Murphy
2023-03-16 21:09                 ` Nicolin Chen
2023-03-20  1:32                   ` Nicolin Chen
2023-03-20 13:11                     ` Jason Gunthorpe
2023-03-20 15:28                       ` Nicolin Chen
2023-03-20 16:01                         ` Jason Gunthorpe
2023-03-20 16:35                           ` Nicolin Chen
2023-03-20 18:07                             ` Jason Gunthorpe
2023-03-20 20:46                               ` Nicolin Chen
2023-03-20 22:14                                 ` Jason Gunthorpe
2023-03-22  5:14                                   ` Nicolin Chen
2023-03-24  8:55                                     ` Tian, Kevin
2023-03-17  9:47     ` Tian, Kevin
2023-03-17 14:16       ` Nicolin Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZApJGwPjHhlDwTDV@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=eric.auger@redhat.com \
    --cc=iommu@lists.linux.dev \
    --cc=jean-philippe@linaro.org \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nicolinc@nvidia.com \
    --cc=robin.murphy@arm.com \
    --cc=shameerali.kolothum.thodi@huawei.com \
    --cc=will@kernel.org \
    --cc=yi.l.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).