All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Eric Auger <eric.auger@redhat.com>
Cc: bpf@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>,
	David Woodhouse <dwmw2@infradead.org>,
	iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
	Kevin Tian <kevin.tian@intel.com>,
	linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
	llvm@lists.linux.dev, Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Miguel Ojeda <ojeda@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Shuah Khan <shuah@kernel.org>,
	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
	Tom Rix <trix@redhat.com>, Will Deacon <will@kernel.org>,
	Anthony Krowiak <akrowiak@linux.ibm.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Bagas Sanjaya <bagasdotme@gmail.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Chaitanya Kulkarni <chaitanyak@nvidia.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Daniel Jordan <daniel.m.jordan@oracle.com>,
	David Gibson <david@gibson.dropbear.id.au>,
	Eric Farman <farman@linux.ibm.com>,
	Jason Wang <jasowang@redhat.com>,
	Jean-Philippe Brucker <jean-philippe@linaro.org>,
	Jason Herne <jjherne@linux.ibm.com>,
	Joao Martins <joao.m.martins@oracle.com>,
	kvm@vger.kernel.org, Lixiao Yang <lixiao.yang@intel.com>,
	Matthew Rosato <mjrosato@linux.ibm.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Nicolin Chen <nicolinc@nvidia.com>,
	Halil Pasic <pasic@linux.ibm.com>,
	Niklas Schnelle <schnelle@linux.ibm.com>,
	Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>,
	Yi Liu <yi.l.liu@intel.com>, Keqian Zhu <zhukeqian1@huawei.com>
Subject: Re: [PATCH v5 13/19] iommufd: Add kAPI toward external drivers for physical devices
Date: Sun, 27 Nov 2022 20:14:15 -0400	[thread overview]
Message-ID: <Y4P9VzpCv/DyHeaD@nvidia.com> (raw)
In-Reply-To: <4c429c36-146e-e2b2-0cb4-d256ca659280@redhat.com>

On Sun, Nov 27, 2022 at 10:13:55PM +0100, Eric Auger wrote:
> > +static int iommufd_device_setup_msi(struct iommufd_device *idev,
> > +				    struct iommufd_hw_pagetable *hwpt,
> > +				    phys_addr_t sw_msi_start,
> > +				    unsigned int flags)
> > +{
> > +	int rc;
> > +
> > +	/*
> > +	 * IOMMU_CAP_INTR_REMAP means that the platform is isolating MSI, and it
> rather means that the *IOMMU* implements IRQ remapping.

Not really. The name is a mess, but as it is implemented, it means the
platform is implementing MSI security. How exactly that is done is not
really defined, and it doesn't really belong as an iommu property.
However the security is being created is done in a way that is
transparent to the iommu_domain user.

MSI security means that the originating device (eg the RID for PCI) is
validated when the MSI is processed. RIDs can only use MSIs they are
authorized to use.

It doesn't matter how it is done, if it remapping HW, fancy
iommu_domain tricks, or built into the MSI controller. Set this flag
if the platform is secure and doesn't need the code triggered by
irq_domain_check_msi_remap().

> irq_domain_check_msi_remap() instead means the MSI controller
> implements that functionality (a given device id is able to trigger

Not quite, it means that MSI isolation is available, however it is not
transparent and the iommu_domain user must do the little dance that
follows.

It does not mean the MSI controller implements the security
functionality because it is not set on x86.

Logically on x86 we should consider the remapping logic as part of the
MSI controller even if x86 makes them separated for historical
reasons.

> MSI #n and this #n gets translated into actual MSI #m) So what you
> want is to prevent an assigned device from being able to DMA into an
> MSI doorbell that is not protected by either the IOMMU or the MSI
> controller. If this happens this means the DMA can generate any kind
> of MSI traffic that can jeopardize the host or other VMs

I don't know of any real systems that work like this. ARM standard GIC
uses a shared ITS page, the IOMMU does nothing to provide MSI
security. MSI security is built into the GIC because it validates the
device ID as part of processing the MSI. The IOMMU is only involved
to grant access to the shared ITS page.

Intel is similar, it provides security through the MSI controller's
remapping logic, the only difference is that on Intel the MSI window
is always present in every iommu_domain (becuase it occures before the
IOMMU), and in ARM you have to do the little dance.

Even if the iommu is to be involved it is all hidden from this layer.

> and afterwards resv_msi is checked to see if we need to create the
> so-called msi cookie.  This msi cookie tells the MSI writes are
> translated by the IOMMU and somebody must create a mapping for those
> transactions.

The cookie is always necessary if we are going to use the
non-transparent mode. That is what makes it the non transparent
mode. We have to supply the reserved IOVA range from one part of the
iommu driver to another part.

> > +	 * creates the MSI window by default in the iommu domain. Nothing
> > +	 * further to do.
> > +	 */
> > +	if (device_iommu_capable(idev->dev, IOMMU_CAP_INTR_REMAP))
> > +		return 0;
> > +
> > +	/*
> > +	 * On ARM systems that set the global IRQ_DOMAIN_FLAG_MSI_REMAP every
> > +	 * allocated iommu_domain will block interrupts by default and this
> It sounds there is a confusion between IRQ remapping and the fact MSI
> writes are not bypassed by the IOMMU.

I don't think I'm confused :)

> > +static int iommufd_device_do_attach(struct iommufd_device *idev,
> > +				    struct iommufd_hw_pagetable *hwpt,
> > +				    unsigned int flags)
> > +{
> > +	phys_addr_t sw_msi_start = 0;
> > +	int rc;
> > +
> > +	mutex_lock(&hwpt->devices_lock);
> > +
> > +	/*
> > +	 * Try to upgrade the domain we have, it is an iommu driver bug to
> > +	 * report IOMMU_CAP_ENFORCE_CACHE_COHERENCY but fail
> > +	 * enforce_cache_coherency when there are no devices attached to the
> > +	 * domain.
> > +	 */
> > +	if (idev->enforce_cache_coherency && !hwpt->enforce_cache_coherency) {
> > +		if (hwpt->domain->ops->enforce_cache_coherency)
> > +			hwpt->enforce_cache_coherency =
> > +				hwpt->domain->ops->enforce_cache_coherency(
> > +					hwpt->domain);
> > +		if (!hwpt->enforce_cache_coherency) {
> > +			WARN_ON(list_empty(&hwpt->devices));
> > +			rc = -EINVAL;
> > +			goto out_unlock;
> > +		}
> > +	}
> > +
> > +	rc = iopt_table_enforce_group_resv_regions(&hwpt->ioas->iopt, idev->dev,
> > +						   idev->group, &sw_msi_start);
> > +	if (rc)
> > +		goto out_unlock;
> > +
> so in the case of any IOMMU_RESV_MSI, iommufd_device_setup_msi() will be
> called with *sw_msi_start = 0 which will return -EPERM?
> I don't think this is what we want. In that case I think we want the
> RESV_MSI region to be taken into account as a RESV region but we don't
> need the MSI cookie.

This was sort of sloppy copied from VFIO - we should just delete
it. The is no driver that sets both, and once the platform asserts
irq_domain_check_msi_remap() it is going down the non-transparent path
anyhow and must set a cookie to work. [again the names doesn't make
any sense for the functionality]

Failing with EPERM is probably not so bad since the platform is using
an invalid configuration. I'm kind of inclined to leave this for right
now since it has all be tested and I don't want to make a
regression. But I can try to send a patch to tidy it a bit more, maybe
add an appropriate WARN_ON.

The whole thing is actually backwards. The IOMMU_CAP_INTR_REMAP should
have been some global irq_has_secure_msi() and everything with
interrupt remapping, and ARM should set it.

Then the domain should have had a domain cap
IOMUU_CAP_REQUIRE_MSI_WINDOW_SETUP to do the little dance ARM drivers
need.

And even better would have been to not have the little dance in the
first place, as we don't really need the iommu_domain user to convey
the fixed MSI window from one part of the iommu driver to another.

We may try to fix this up when we get to doing nesting on ARM, or
maybe just leave the confusing sort of working thing as-is. I don't
know.

Jason

  reply	other threads:[~2022-11-28  0:14 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-16 21:00 [PATCH v5 00/19] IOMMUFD Generic interface Jason Gunthorpe
2022-11-16 21:00 ` [PATCH v5 01/19] iommu: Add IOMMU_CAP_ENFORCE_CACHE_COHERENCY Jason Gunthorpe
2022-11-23  8:30   ` Yi Liu
2022-11-23 16:56     ` Jason Gunthorpe
2022-11-16 21:00 ` [PATCH v5 02/19] iommu: Add device-centric DMA ownership interfaces Jason Gunthorpe
2022-11-16 21:00 ` [PATCH v5 03/19] interval-tree: Add a utility to iterate over spans in an interval tree Jason Gunthorpe
2022-11-16 21:00 ` [PATCH v5 04/19] scripts/kernel-doc: support EXPORT_SYMBOL_NS_GPL() with -export Jason Gunthorpe
2022-11-16 21:00 ` [PATCH v5 05/19] iommufd: Document overview of iommufd Jason Gunthorpe
2022-11-18  9:06   ` Eric Auger
2022-11-30 15:06   ` Binbin Wu
2022-12-01  0:08     ` Jason Gunthorpe
2022-11-16 21:00 ` [PATCH v5 06/19] iommufd: File descriptor, context, kconfig and makefiles Jason Gunthorpe
2022-11-18 16:27   ` Eric Auger
2022-11-18 20:23     ` Jason Gunthorpe
2022-11-25  8:43       ` Eric Auger
2022-11-16 21:00 ` [PATCH v5 07/19] kernel/user: Allow user::locked_vm to be usable for iommufd Jason Gunthorpe
2022-11-18  9:08   ` Eric Auger
2022-11-18  9:09   ` Eric Auger
2022-11-18 16:28   ` Eric Auger
2022-11-18 20:25     ` Jason Gunthorpe
2022-11-16 21:00 ` [PATCH v5 08/19] iommufd: PFN handling for iopt_pages Jason Gunthorpe
2022-11-18  2:24   ` Tian, Kevin
2022-11-18  2:27     ` Jason Gunthorpe
2022-11-16 21:00 ` [PATCH v5 09/19] iommufd: Algorithms for PFN storage Jason Gunthorpe
2022-11-16 21:00 ` [PATCH v5 10/19] iommufd: Data structure to provide IOVA to PFN mapping Jason Gunthorpe
2022-11-18  2:55   ` Tian, Kevin
2022-11-16 21:00 ` [PATCH v5 11/19] iommufd: IOCTLs for the io_pagetable Jason Gunthorpe
2022-11-27 17:49   ` Eric Auger
2022-11-28  9:05     ` Tian, Kevin
2022-11-28 18:11       ` Jason Gunthorpe
2022-11-28 18:27     ` Jason Gunthorpe
2022-11-28 20:09       ` Eric Auger
2022-11-16 21:00 ` [PATCH v5 12/19] iommufd: Add a HW pagetable object Jason Gunthorpe
2022-11-27 15:12   ` Eric Auger
2022-11-16 21:00 ` [PATCH v5 13/19] iommufd: Add kAPI toward external drivers for physical devices Jason Gunthorpe
2022-11-27 21:13   ` Eric Auger
2022-11-28  0:14     ` Jason Gunthorpe [this message]
2022-11-28 10:55       ` Eric Auger
2022-11-28 13:20         ` Jason Gunthorpe
2022-11-28 14:17           ` Eric Auger
2022-11-29  1:09             ` Jason Gunthorpe
2022-11-16 21:00 ` [PATCH v5 14/19] iommufd: Add kAPI toward external drivers for kernel access Jason Gunthorpe
2022-11-28 15:48   ` Eric Auger
2022-11-28 18:56     ` Jason Gunthorpe
2022-12-06 20:40       ` Jason Gunthorpe
2022-11-16 21:00 ` [PATCH v5 15/19] iommufd: vfio container FD ioctl compatibility Jason Gunthorpe
2022-11-18  2:58   ` Tian, Kevin
2022-11-18 15:22     ` Jason Gunthorpe
2022-11-23  1:33       ` Tian, Kevin
2022-11-23  4:31         ` Jason Wang
2022-11-23 13:03         ` Jason Gunthorpe
2022-11-24  5:23           ` Tian, Kevin
2022-11-28 17:53   ` Eric Auger
2022-11-28 19:37     ` Jason Gunthorpe
2022-11-28 20:54       ` Eric Auger
2022-11-16 21:00 ` [PATCH v5 16/19] iommufd: Add kernel support for testing iommufd Jason Gunthorpe
2022-11-16 21:00 ` [PATCH v5 17/19] iommufd: Add some fault injection points Jason Gunthorpe
2022-11-16 21:00 ` [PATCH v5 18/19] iommufd: Add additional invariant assertions Jason Gunthorpe
2022-11-16 21:00 ` [PATCH v5 19/19] iommufd: Add a selftest Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y4P9VzpCv/DyHeaD@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=akrowiak@linux.ibm.com \
    --cc=alex.williamson@redhat.com \
    --cc=bagasdotme@gmail.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=bpf@vger.kernel.org \
    --cc=chaitanyak@nvidia.com \
    --cc=cohuck@redhat.com \
    --cc=corbet@lwn.net \
    --cc=daniel.m.jordan@oracle.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=dwmw2@infradead.org \
    --cc=eric.auger@redhat.com \
    --cc=farman@linux.ibm.com \
    --cc=iommu@lists.linux.dev \
    --cc=jasowang@redhat.com \
    --cc=jean-philippe@linaro.org \
    --cc=jjherne@linux.ibm.com \
    --cc=joao.m.martins@oracle.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=lixiao.yang@intel.com \
    --cc=llvm@lists.linux.dev \
    --cc=mjrosato@linux.ibm.com \
    --cc=mst@redhat.com \
    --cc=nathan@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=nicolinc@nvidia.com \
    --cc=ojeda@kernel.org \
    --cc=pasic@linux.ibm.com \
    --cc=robin.murphy@arm.com \
    --cc=schnelle@linux.ibm.com \
    --cc=shameerali.kolothum.thodi@huawei.com \
    --cc=shuah@kernel.org \
    --cc=suravee.suthikulpanit@amd.com \
    --cc=trix@redhat.com \
    --cc=will@kernel.org \
    --cc=yi.l.liu@intel.com \
    --cc=zhukeqian1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.