KVM ARM Archive on lore.kernel.org
 help / color / Atom feed
From: Eric Auger <eric.auger@redhat.com>
To: eric.auger.pro@gmail.com, eric.auger@redhat.com,
	iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	joro@8bytes.org, alex.williamson@redhat.com,
	jacob.jun.pan@linux.intel.com, yi.l.liu@intel.com,
	jean-philippe.brucker@arm.com, will.deacon@arm.com,
	robin.murphy@arm.com
Cc: marc.zyngier@arm.com, kevin.tian@intel.com,
	vincent.stehle@arm.com, ashok.raj@intel.com
Subject: [PATCH v8 17/29] dma-iommu: Implement NESTED_MSI cookie
Date: Sun, 26 May 2019 18:09:52 +0200
Message-ID: <20190526161004.25232-18-eric.auger@redhat.com> (raw)
In-Reply-To: <20190526161004.25232-1-eric.auger@redhat.com>

Up to now, when the type was UNMANAGED, we used to
allocate IOVA pages within a reserved IOVA MSI range.

If both the host and the guest are exposed with SMMUs, each
would allocate an IOVA. The guest allocates an IOVA (gIOVA)
to map onto the guest MSI doorbell (gDB). The Host allocates
another IOVA (hIOVA) to map onto the physical doorbell (hDB).

So we end up with 2 unrelated mappings, at S1 and S2:
         S1             S2
gIOVA    ->     gDB
               hIOVA    ->    hDB

The PCI device would be programmed with hIOVA.
No stage 1 mapping would existing, causing the MSIs to fault.

iommu_dma_bind_guest_msi() allows to pass gIOVA/gDB
to the host so that gIOVA can be used by the host instead of
re-allocating a new hIOVA.

         S1           S2
gIOVA    ->    gDB    ->    hDB

this time, the PCI device can be programmed with the gIOVA MSI
doorbell which is correctly mapped through both stages.

Nested mode is not compatible with HW MSI regions as in that
case gDB and hDB should have a 1-1 mapping. This check will
be done when attaching each device to the IOMMU domain.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v7 -> v8:
- correct iommu_dma_(un)bind_guest_msi when
  !CONFIG_IOMMU_DMA
- Mentioned nested mode is not compatible with HW MSI regions
  in commit message
- protect with msi_lock on unbind

v6 -> v7:
- removed device handle

v3 -> v4:
- change function names; add unregister
- protect with msi_lock

v2 -> v3:
- also store the device handle on S1 mapping registration.
  This garantees we associate the associated S2 mapping binds
  to the correct physical MSI controller.

v1 -> v2:
- unmap stage2 on put()
---
 drivers/iommu/dma-iommu.c | 139 +++++++++++++++++++++++++++++++++++++-
 include/linux/dma-iommu.h |  16 +++++
 2 files changed, 152 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 129c4badf9ae..fd6b8b1f6700 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -35,12 +35,15 @@
 struct iommu_dma_msi_page {
 	struct list_head	list;
 	dma_addr_t		iova;
+	dma_addr_t		gpa;
 	phys_addr_t		phys;
+	size_t			s1_granule;
 };
 
 enum iommu_dma_cookie_type {
 	IOMMU_DMA_IOVA_COOKIE,
 	IOMMU_DMA_MSI_COOKIE,
+	IOMMU_DMA_NESTED_MSI_COOKIE,
 };
 
 struct iommu_dma_cookie {
@@ -110,14 +113,17 @@ EXPORT_SYMBOL(iommu_get_dma_cookie);
  *
  * Users who manage their own IOVA allocation and do not want DMA API support,
  * but would still like to take advantage of automatic MSI remapping, can use
- * this to initialise their own domain appropriately. Users should reserve a
+ * this to initialise their own domain appropriately. Users may reserve a
  * contiguous IOVA region, starting at @base, large enough to accommodate the
  * number of PAGE_SIZE mappings necessary to cover every MSI doorbell address
- * used by the devices attached to @domain.
+ * used by the devices attached to @domain. The other way round is to provide
+ * usable iova pages through the iommu_dma_bind_doorbell API (nested stages
+ * use case)
  */
 int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
 {
 	struct iommu_dma_cookie *cookie;
+	int nesting, ret;
 
 	if (domain->type != IOMMU_DOMAIN_UNMANAGED)
 		return -EINVAL;
@@ -125,7 +131,12 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
 	if (domain->iova_cookie)
 		return -EEXIST;
 
-	cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
+	ret =  iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, &nesting);
+	if (!ret && nesting)
+		cookie = cookie_alloc(IOMMU_DMA_NESTED_MSI_COOKIE);
+	else
+		cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE);
+
 	if (!cookie)
 		return -ENOMEM;
 
@@ -146,6 +157,7 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
 {
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iommu_dma_msi_page *msi, *tmp;
+	bool s2_unmap = false;
 
 	if (!cookie)
 		return;
@@ -153,7 +165,15 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
 	if (cookie->type == IOMMU_DMA_IOVA_COOKIE && cookie->iovad.granule)
 		put_iova_domain(&cookie->iovad);
 
+	if (cookie->type == IOMMU_DMA_NESTED_MSI_COOKIE)
+		s2_unmap = true;
+
 	list_for_each_entry_safe(msi, tmp, &cookie->msi_page_list, list) {
+		if (s2_unmap && msi->phys) {
+			size_t size = cookie_msi_granule(cookie);
+
+			WARN_ON(iommu_unmap(domain, msi->gpa, size) != size);
+		}
 		list_del(&msi->list);
 		kfree(msi);
 	}
@@ -162,6 +182,92 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL(iommu_put_dma_cookie);
 
+/**
+ * iommu_dma_bind_guest_msi - Allows to pass the stage 1
+ * binding of a virtual MSI doorbell used by @dev.
+ *
+ * @domain: domain handle
+ * @iova: guest iova
+ * @gpa: gpa of the virtual doorbell
+ * @size: size of the granule used for the stage1 mapping
+ *
+ * In nested stage use case, the user can provide IOVA/IPA bindings
+ * corresponding to a guest MSI stage 1 mapping. When the host needs
+ * to map its own MSI doorbells, it can use @gpa as stage 2 input
+ * and map it onto the physical MSI doorbell.
+ */
+int iommu_dma_bind_guest_msi(struct iommu_domain *domain,
+			     dma_addr_t iova, phys_addr_t gpa, size_t size)
+{
+	struct iommu_dma_cookie *cookie = domain->iova_cookie;
+	struct iommu_dma_msi_page *msi;
+	int ret = 0;
+
+	if (!cookie)
+		return -EINVAL;
+
+	if (cookie->type != IOMMU_DMA_NESTED_MSI_COOKIE)
+		return -EINVAL;
+
+	iova = iova & ~(dma_addr_t)(size - 1);
+	gpa = gpa & ~(phys_addr_t)(size - 1);
+
+	spin_lock(&cookie->msi_lock);
+
+	list_for_each_entry(msi, &cookie->msi_page_list, list) {
+		if (msi->iova == iova)
+			goto unlock; /* this page is already registered */
+	}
+
+	msi = kzalloc(sizeof(*msi), GFP_ATOMIC);
+	if (!msi) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	msi->iova = iova;
+	msi->gpa = gpa;
+	msi->s1_granule = size;
+	list_add(&msi->list, &cookie->msi_page_list);
+unlock:
+	spin_unlock(&cookie->msi_lock);
+	return ret;
+}
+EXPORT_SYMBOL(iommu_dma_bind_guest_msi);
+
+void iommu_dma_unbind_guest_msi(struct iommu_domain *domain, dma_addr_t giova)
+{
+	struct iommu_dma_cookie *cookie = domain->iova_cookie;
+	struct iommu_dma_msi_page *msi;
+
+	if (!cookie)
+		return;
+
+	if (cookie->type != IOMMU_DMA_NESTED_MSI_COOKIE)
+		return;
+
+	spin_lock(&cookie->msi_lock);
+
+	list_for_each_entry(msi, &cookie->msi_page_list, list) {
+		dma_addr_t aligned_giova =
+			giova & ~(dma_addr_t)(msi->s1_granule - 1);
+
+		if (msi->iova == aligned_giova) {
+			if (msi->phys) {
+				/* unmap the stage 2 */
+				size_t size = cookie_msi_granule(cookie);
+
+				WARN_ON(iommu_unmap(domain, msi->gpa, size) != size);
+			}
+			list_del(&msi->list);
+			kfree(msi);
+			break;
+		}
+	}
+	spin_unlock(&cookie->msi_lock);
+}
+EXPORT_SYMBOL(iommu_dma_unbind_guest_msi);
+
 /**
  * iommu_dma_get_resv_regions - Reserved region driver helper
  * @dev: Device from iommu_get_resv_regions()
@@ -888,6 +994,33 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
 		if (msi_page->phys == msi_addr)
 			return msi_page;
 
+	/*
+	 * In nested stage mode, we do not allocate an MSI page in
+	 * a range provided by the user. Instead, IOVA/IPA bindings are
+	 * individually provided. We reuse thise IOVAs to build the
+	 * GIOVA -> GPA -> MSI HPA nested stage mapping.
+	 */
+	if (cookie->type == IOMMU_DMA_NESTED_MSI_COOKIE) {
+		list_for_each_entry(msi_page, &cookie->msi_page_list, list)
+			if (!msi_page->phys) {
+				int ret;
+
+				/* do the stage 2 mapping */
+				ret = iommu_map(domain,
+						msi_page->gpa, msi_addr, size,
+						IOMMU_MMIO | IOMMU_WRITE);
+				if (ret) {
+					pr_warn("MSI S2 mapping failed (%d)\n",
+						ret);
+					return NULL;
+				}
+				msi_page->phys = msi_addr;
+				return msi_page;
+			}
+		pr_warn("%s no MSI binding found\n", __func__);
+		return NULL;
+	}
+
 	msi_page = kzalloc(sizeof(*msi_page), GFP_ATOMIC);
 	if (!msi_page)
 		return NULL;
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 476e0c54de2d..3f4b8c7a342c 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -24,6 +24,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/iommu.h>
 #include <linux/msi.h>
+#include <uapi/linux/iommu.h>
 
 int iommu_dma_init(void);
 
@@ -85,6 +86,9 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc,
 			       struct msi_msg *msg);
 
 void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list);
+int iommu_dma_bind_guest_msi(struct iommu_domain *domain,
+			     dma_addr_t iova, phys_addr_t gpa, size_t size);
+void iommu_dma_unbind_guest_msi(struct iommu_domain *domain, dma_addr_t giova);
 
 #else
 
@@ -123,6 +127,18 @@ static inline void iommu_dma_compose_msi_msg(struct msi_desc *desc,
 {
 }
 
+static inline int
+iommu_dma_bind_guest_msi(struct iommu_domain *domain,
+			 dma_addr_t iova, phys_addr_t gpa, size_t size)
+{
+	return -ENODEV;
+}
+
+static inline void
+iommu_dma_unbind_guest_msi(struct iommu_domain *domain, dma_addr_t giova);
+{
+}
+
 static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
 {
 }
-- 
2.20.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

  parent reply index

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-26 16:09 [PATCH v8 00/29] SMMUv3 Nested Stage Setup Eric Auger
2019-05-26 16:09 ` [PATCH v8 01/29] driver core: Add per device iommu param Eric Auger
2019-05-26 16:09 ` [PATCH v8 02/29] iommu: Introduce device fault data Eric Auger
2019-05-26 16:09 ` [PATCH v8 03/29] iommu: Introduce device fault report API Eric Auger
2019-05-26 16:09 ` [PATCH v8 04/29] iommu: Add recoverable fault reporting Eric Auger
2019-06-03 22:31   ` Alex Williamson
2019-06-04 15:48     ` Jacob Pan
2019-05-26 16:09 ` [PATCH v8 05/29] iommu: Add a timeout parameter for PRQ response Eric Auger
2019-06-03 22:32   ` Alex Williamson
2019-06-04 10:52     ` Jean-Philippe Brucker
2019-06-04 15:50       ` Jacob Pan
2019-05-26 16:09 ` [PATCH v8 06/29] trace/iommu: Add sva trace events Eric Auger
2019-05-26 16:09 ` [PATCH v8 07/29] iommu: Use device fault trace event Eric Auger
2019-05-26 16:09 ` [PATCH v8 08/29] iommu: Introduce attach/detach_pasid_table API Eric Auger
2019-05-26 16:09 ` [PATCH v8 09/29] iommu: Introduce cache_invalidate API Eric Auger
2019-05-26 16:09 ` [PATCH v8 10/29] iommu: Introduce bind/unbind_guest_msi Eric Auger
2019-05-26 16:09 ` [PATCH v8 11/29] iommu/arm-smmu-v3: Maintain a SID->device structure Eric Auger
2019-05-26 16:09 ` [PATCH v8 12/29] iommu/smmuv3: Dynamically allocate s1_cfg and s2_cfg Eric Auger
2019-05-26 16:09 ` [PATCH v8 13/29] iommu/smmuv3: Get prepared for nested stage support Eric Auger
2019-05-26 16:09 ` [PATCH v8 14/29] iommu/smmuv3: Implement attach/detach_pasid_table Eric Auger
2019-05-26 16:09 ` [PATCH v8 15/29] iommu/smmuv3: Introduce __arm_smmu_tlb_inv_asid/s1_range_nosync Eric Auger
2019-05-26 16:09 ` [PATCH v8 16/29] iommu/smmuv3: Implement cache_invalidate Eric Auger
2019-05-26 16:09 ` Eric Auger [this message]
2019-05-26 16:09 ` [PATCH v8 18/29] iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement Eric Auger
2019-05-26 16:09 ` [PATCH v8 19/29] iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI regions Eric Auger
2019-05-26 16:09 ` [PATCH v8 20/29] iommu/smmuv3: Implement bind/unbind_guest_msi Eric Auger
2019-05-26 16:09 ` [PATCH v8 21/29] iommu/smmuv3: Report non recoverable faults Eric Auger
2019-05-26 16:09 ` [PATCH v8 22/29] vfio: VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE Eric Auger
2019-06-03 22:32   ` Alex Williamson
2019-05-26 16:09 ` [PATCH v8 23/29] vfio: VFIO_IOMMU_CACHE_INVALIDATE Eric Auger
2019-06-14 12:38   ` Liu, Yi L
2019-06-14 13:17     ` Auger Eric
2019-05-26 16:09 ` [PATCH v8 24/29] vfio: VFIO_IOMMU_BIND/UNBIND_MSI Eric Auger
2019-06-03 22:32   ` Alex Williamson
2019-06-07  8:30     ` Auger Eric
2019-05-26 16:10 ` [PATCH v8 25/29] vfio-pci: Add a new VFIO_REGION_TYPE_NESTED region type Eric Auger
2019-06-03 22:31   ` Alex Williamson
2019-06-07  8:28     ` Auger Eric
2019-06-07 12:47       ` Jean-Philippe Brucker
2019-06-07 16:29       ` Alex Williamson
2019-05-26 16:10 ` [PATCH v8 26/29] vfio-pci: Register an iommu fault handler Eric Auger
2019-06-03 22:31   ` Alex Williamson
2019-06-04 16:11     ` Auger Eric
2019-06-05 22:45       ` Jacob Pan
2019-06-06 18:54         ` Jean-Philippe Brucker
2019-06-06 20:29           ` Jacob Pan
2019-06-07  7:02             ` Auger Eric
2019-06-07 10:28             ` Jean-Philippe Brucker
2019-06-07 17:43               ` Jacob Pan
2019-06-10 12:45                 ` Jean-Philippe Brucker
2019-06-10 21:31                   ` Jacob Pan
2019-06-11 13:14                     ` Jean-Philippe Brucker
2019-06-12 18:53                       ` Jacob Pan
2019-06-18 14:04                         ` Jean-Philippe Brucker
2019-06-19  0:19                           ` Jacob Pan
2019-06-19 11:44                             ` Jean-Philippe Brucker
2019-07-11 13:07                           ` Auger Eric
2019-06-07 12:48   ` Jean-Philippe Brucker
2019-06-07 14:18     ` Auger Eric
2019-05-26 16:10 ` [PATCH v8 27/29] vfio_pci: Allow to mmap the fault queue Eric Auger
2019-05-26 16:10 ` [PATCH v8 28/29] vfio-pci: Add VFIO_PCI_DMA_FAULT_IRQ_INDEX Eric Auger
2019-06-03 22:31   ` Alex Williamson
2019-06-04 16:11     ` Auger Eric
2019-05-26 16:10 ` [PATCH v8 29/29] vfio: Document nested stage control Eric Auger

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190526161004.25232-18-eric.auger@redhat.com \
    --to=eric.auger@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=eric.auger.pro@gmail.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jacob.jun.pan@linux.intel.com \
    --cc=jean-philippe.brucker@arm.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marc.zyngier@arm.com \
    --cc=robin.murphy@arm.com \
    --cc=vincent.stehle@arm.com \
    --cc=will.deacon@arm.com \
    --cc=yi.l.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

KVM ARM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvmarm/0 kvmarm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvmarm kvmarm/ https://lore.kernel.org/kvmarm \
		kvmarm@lists.cs.columbia.edu kvmarm@archiver.kernel.org
	public-inbox-index kvmarm


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/edu.columbia.cs.lists.kvmarm


AGPL code for this site: git clone https://public-inbox.org/ public-inbox