* [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device @ 2018-07-22 6:09 Lu Baolu 2018-07-22 6:09 ` [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a " Lu Baolu ` (9 more replies) 0 siblings, 10 replies; 32+ messages in thread From: Lu Baolu @ 2018-07-22 6:09 UTC (permalink / raw) To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu, yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu Hi, The Mediate Device is a framework for fine-grained physical device sharing across the isolated domains. Currently the mdev framework is designed to be independent of the platform IOMMU support. As the result, the DMA isolation relies on the mdev parent device in a vendor specific way. There are several cases where a mediated device could be protected and isolated by the platform IOMMU. For example, Intel vt-d rev3.0 [1] introduces a new translation mode called 'scalable mode', which enables PASID-granular translations. The vt-d scalable mode is the key ingredient for Scalable I/O Virtualization [2] [3] which allows sharing a device in minimal possible granularity (ADI - Assignable Device Interface). A mediated device backed by an ADI could be protected and isolated by the IOMMU since 1) the parent device supports tagging an unique PASID to all DMA traffic out of the mediated device; and 2) the DMA translation unit (IOMMU) supports the PASID granular translation. We can apply IOMMU protection and isolation to this kind of devices just as what we are doing with an assignable PCI device. In order to distinguish the IOMMU-capable mediated devices from those which still need to rely on parent devices, this patch set adds a domain type attribute to each mdev. enum mdev_domain_type { DOMAIN_TYPE_EXTERNAL, /* Use the external domain and all * IOMMU staff controlled by the * parent device driver. */ DOMAIN_TYPE_INHERITANCE,/* Use the same domain as the parent device. */ DOMAIN_TYPE_PRIVATE, /* Capable of having a private domain. For an * example, the parent device is able to bind * a specific PASID for a mediated device and * transfering data with the asigned PASID. */ }; The mdev parent device driver could opt-in whether an mdev is IOMMU capable when the device is created by invoking below interface within its @create callback: int mdev_set_domain_type(struct device *dev, enum mdev_domain_type type); In vfio_iommu_type1_attach_group(), a domain will be found and assigned to the group according to the domain type attributes of mdev devices in the group. Besides above, we still need below changes to sport IOMMU-capable mdev device: a) The platform specific iommu ops should be set to mdev bus. So that, the vfio/mdev framework could call iommu APIs. b) The iommu driver should be changed to support setting up the translation structures for a mediated device. This patch series extends both IOMMU and vfio components to support mdev device passing through when it could be isolated and protected by the IOMMU units. The first part of this series (PATCH 1/10 ~ 5/10) makes the Intel IOMMU driver to be aware of a mediated device. The second part (PATCH 6/10 ~ 8/10) sets the iommu ops for the mdev bus. The last part (PATCH 9/10 ~ 10/10) adds the domain type attribute to a mdev device and allocates an iommu domain if it is IOMMU-capable. This patch series depends on a patch set posted here [4] for discussion which added the support for scalable mode in Intel IOMMU driver. References: [1] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification [2] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification [3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf [4] https://lkml.org/lkml/2018/7/16/62 Best regards, Lu Baolu Lu Baolu (10): iommu/vt-d: Get iommu device for a mediated device iommu/vt-d: Alloc domain for a mediated device iommu/vt-d: Allocate groups for mediated devices iommu/vt-d: Get pasid table for a mediated device iommu/vt-d: Setup DMA remapping for mediated devices iommu: Add iommu_set_bus API interface iommu/vt-d: Add set_bus iommu ops vfio/mdev: Set iommu ops for mdev bus vfio/mdev: Add mediated device domain type vfio/type1: Allocate domain for mediated device drivers/iommu/intel-iommu.c | 127 ++++++++++++++++++++++++++++++++++++--- drivers/iommu/intel-pasid.c | 16 ++++- drivers/iommu/intel-pasid.h | 16 +++++ drivers/iommu/iommu.c | 23 +++++++ drivers/vfio/mdev/mdev_core.c | 30 ++++++++- drivers/vfio/mdev/mdev_driver.c | 10 +++ drivers/vfio/mdev/mdev_private.h | 1 + drivers/vfio/vfio_iommu_type1.c | 43 +++++++++---- include/linux/intel-iommu.h | 5 ++ include/linux/iommu.h | 12 ++++ include/linux/mdev.h | 22 +++++++ 11 files changed, 282 insertions(+), 23 deletions(-) -- 2.7.4 ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a mediated device 2018-07-22 6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu @ 2018-07-22 6:09 ` Lu Baolu 2018-07-23 4:44 ` Liu, Yi L 2018-07-24 18:50 ` Alex Williamson 2018-07-22 6:09 ` [RFC PATCH 02/10] iommu/vt-d: Alloc domain " Lu Baolu ` (8 subsequent siblings) 9 siblings, 2 replies; 32+ messages in thread From: Lu Baolu @ 2018-07-22 6:09 UTC (permalink / raw) To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu, yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu, Jacob Pan This patch adds the support to get the iommu device for a mediated device. The assumption is that each mediated device is a minimal assignable set of a physical PCI device. Hence, we should use the iommu device of the parent PCI device to manage the translation. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Liu Yi L <yi.l.liu@intel.com> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> --- drivers/iommu/intel-iommu.c | 6 ++++++ drivers/iommu/intel-pasid.h | 16 ++++++++++++++++ 2 files changed, 22 insertions(+) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 7d198ea..fc3ac1c 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -767,6 +767,12 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devf if (iommu_dummy(dev)) return NULL; + if (dev_is_mdev(dev)) { + dev = dev_mdev_parent(dev); + if (WARN_ON(!dev_is_pci(dev))) + return NULL; + } + if (dev_is_pci(dev)) { struct pci_dev *pf_pdev; diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h index 518df72..46cde66 100644 --- a/drivers/iommu/intel-pasid.h +++ b/drivers/iommu/intel-pasid.h @@ -35,6 +35,22 @@ struct pasid_table { struct list_head dev; /* device list */ }; +static inline bool dev_is_mdev(struct device *dev) +{ + if (!dev) + return false; + + return !strcmp(dev->bus->name, "mdev"); +} + +static inline struct device *dev_mdev_parent(struct device *dev) +{ + if (WARN_ON(!dev_is_mdev(dev))) + return NULL; + + return dev->parent; +} + extern u32 intel_pasid_max_id; int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp); void intel_pasid_free_id(int pasid); -- 2.7.4 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* RE: [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a mediated device 2018-07-22 6:09 ` [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a " Lu Baolu @ 2018-07-23 4:44 ` Liu, Yi L 2018-07-24 2:06 ` Lu Baolu 2018-07-24 18:50 ` Alex Williamson 1 sibling, 1 reply; 32+ messages in thread From: Liu, Yi L @ 2018-07-23 4:44 UTC (permalink / raw) To: Lu Baolu, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: Raj, Ashok, Kumar, Sanjay K, Pan, Jacob jun, Tian, Kevin, Sun, Yi Y, peterx, iommu, kvm, linux-kernel, Jacob Pan Hi Baolu, Per my understanding, the subject is not accurate. I think this patch is to get parent device structure for a mediate device instead of iommu device. This may be misleading for reviewers. Thanks, Yi Liu > From: Lu Baolu [mailto:baolu.lu@linux.intel.com] > Sent: Sunday, July 22, 2018 2:09 PM > > This patch adds the support to get the iommu device for a mediated device. The > assumption is that each mediated device is a minimal assignable set of a physical PCI > device. Hence, we should use the iommu device of the parent PCI device to manage > the translation. > > Cc: Ashok Raj <ashok.raj@intel.com> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> > Cc: Kevin Tian <kevin.tian@intel.com> > Cc: Liu Yi L <yi.l.liu@intel.com> > Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> > --- > drivers/iommu/intel-iommu.c | 6 ++++++ drivers/iommu/intel-pasid.h | 16 > ++++++++++++++++ > 2 files changed, 22 insertions(+) > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index > 7d198ea..fc3ac1c 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -767,6 +767,12 @@ static struct intel_iommu *device_to_iommu(struct device > *dev, u8 *bus, u8 *devf > if (iommu_dummy(dev)) > return NULL; > > + if (dev_is_mdev(dev)) { > + dev = dev_mdev_parent(dev); > + if (WARN_ON(!dev_is_pci(dev))) > + return NULL; > + } > + > if (dev_is_pci(dev)) { > struct pci_dev *pf_pdev; > > diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h index > 518df72..46cde66 100644 > --- a/drivers/iommu/intel-pasid.h > +++ b/drivers/iommu/intel-pasid.h > @@ -35,6 +35,22 @@ struct pasid_table { > struct list_head dev; /* device list */ > }; > > +static inline bool dev_is_mdev(struct device *dev) { > + if (!dev) > + return false; > + > + return !strcmp(dev->bus->name, "mdev"); } > + > +static inline struct device *dev_mdev_parent(struct device *dev) { > + if (WARN_ON(!dev_is_mdev(dev))) > + return NULL; > + > + return dev->parent; > +} > + > extern u32 intel_pasid_max_id; > int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp); void > intel_pasid_free_id(int pasid); > -- > 2.7.4 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a mediated device 2018-07-23 4:44 ` Liu, Yi L @ 2018-07-24 2:06 ` Lu Baolu 0 siblings, 0 replies; 32+ messages in thread From: Lu Baolu @ 2018-07-24 2:06 UTC (permalink / raw) To: Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: Raj, Ashok, Kumar, Sanjay K, Pan, Jacob jun, Tian, Kevin, Sun, Yi Y, peterx, iommu, kvm, linux-kernel, Jacob Pan Hi Yi, Thank you for reviewing my patch. On 07/23/2018 12:44 PM, Liu, Yi L wrote: > Hi Baolu, > > Per my understanding, the subject is not accurate. I think this patch is > to get parent device structure for a mediate device instead of iommu > device. This may be misleading for reviewers. Please check below. > > Thanks, > Yi Liu > >> From: Lu Baolu [mailto:baolu.lu@linux.intel.com] >> Sent: Sunday, July 22, 2018 2:09 PM >> >> This patch adds the support to get the iommu device for a mediated device. The >> assumption is that each mediated device is a minimal assignable set of a physical PCI >> device. Hence, we should use the iommu device of the parent PCI device to manage >> the translation. >> >> Cc: Ashok Raj <ashok.raj@intel.com> >> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> >> Cc: Kevin Tian <kevin.tian@intel.com> >> Cc: Liu Yi L <yi.l.liu@intel.com> >> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> >> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> >> --- >> drivers/iommu/intel-iommu.c | 6 ++++++ drivers/iommu/intel-pasid.h | 16 >> ++++++++++++++++ >> 2 files changed, 22 insertions(+) >> >> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index >> 7d198ea..fc3ac1c 100644 >> --- a/drivers/iommu/intel-iommu.c >> +++ b/drivers/iommu/intel-iommu.c >> @@ -767,6 +767,12 @@ static struct intel_iommu *device_to_iommu(struct device >> *dev, u8 *bus, u8 *devf >> if (iommu_dummy(dev)) >> return NULL; >> >> + if (dev_is_mdev(dev)) { >> + dev = dev_mdev_parent(dev); >> + if (WARN_ON(!dev_is_pci(dev))) >> + return NULL; >> + } >> + This gets the parent of a mediated device. And an iommu device will be found which services the parent device. A mediated device should in the same iommu scope as its parent. Best regards, Lu Baolu >> if (dev_is_pci(dev)) { >> struct pci_dev *pf_pdev; >> >> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h index >> 518df72..46cde66 100644 >> --- a/drivers/iommu/intel-pasid.h >> +++ b/drivers/iommu/intel-pasid.h >> @@ -35,6 +35,22 @@ struct pasid_table { >> struct list_head dev; /* device list */ >> }; >> >> +static inline bool dev_is_mdev(struct device *dev) { >> + if (!dev) >> + return false; >> + >> + return !strcmp(dev->bus->name, "mdev"); } >> + >> +static inline struct device *dev_mdev_parent(struct device *dev) { >> + if (WARN_ON(!dev_is_mdev(dev))) >> + return NULL; >> + >> + return dev->parent; >> +} >> + >> extern u32 intel_pasid_max_id; >> int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp); void >> intel_pasid_free_id(int pasid); >> -- >> 2.7.4 > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a mediated device 2018-07-22 6:09 ` [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a " Lu Baolu 2018-07-23 4:44 ` Liu, Yi L @ 2018-07-24 18:50 ` Alex Williamson 2018-07-25 1:18 ` Lu Baolu 1 sibling, 1 reply; 32+ messages in thread From: Alex Williamson @ 2018-07-24 18:50 UTC (permalink / raw) To: Lu Baolu Cc: Joerg Roedel, David Woodhouse, Kirti Wankhede, ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu, yi.y.sun, peterx, iommu, kvm, linux-kernel, Jacob Pan On Sun, 22 Jul 2018 14:09:24 +0800 Lu Baolu <baolu.lu@linux.intel.com> wrote: > This patch adds the support to get the iommu device for a mediated > device. The assumption is that each mediated device is a minimal > assignable set of a physical PCI device. Hence, we should use the > iommu device of the parent PCI device to manage the translation. Hmm, is this absolutely a valid assumption? I'm afraid there's an assumption throughout this series that the only way an mdev device could be interacting with the IOMMU is via S-IOV, but we could choose today to make an mdev wrapper for any device, which might provide basic RID granularity to the IOMMU. So if I make an mdev wrapper for a PF such that I can implement migration for that device, is it valid for the IOMMU driver to flag me as an mdev device and base mappings on my parent device? Perhaps in this patch we can assume that the parent of such an mdev device would be the PF backing it and that results in the correct drhd, but in the next patch we start imposing the assumption that because the device is an mdev, the only valid association is via pasid, which I'd say is incorrect. > Cc: Ashok Raj <ashok.raj@intel.com> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> > Cc: Kevin Tian <kevin.tian@intel.com> > Cc: Liu Yi L <yi.l.liu@intel.com> > Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> > --- > drivers/iommu/intel-iommu.c | 6 ++++++ > drivers/iommu/intel-pasid.h | 16 ++++++++++++++++ > 2 files changed, 22 insertions(+) > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c > index 7d198ea..fc3ac1c 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -767,6 +767,12 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devf > if (iommu_dummy(dev)) > return NULL; > > + if (dev_is_mdev(dev)) { > + dev = dev_mdev_parent(dev); > + if (WARN_ON(!dev_is_pci(dev))) > + return NULL; > + } > + > if (dev_is_pci(dev)) { > struct pci_dev *pf_pdev; > > diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h > index 518df72..46cde66 100644 > --- a/drivers/iommu/intel-pasid.h > +++ b/drivers/iommu/intel-pasid.h > @@ -35,6 +35,22 @@ struct pasid_table { > struct list_head dev; /* device list */ > }; > > +static inline bool dev_is_mdev(struct device *dev) > +{ > + if (!dev) > + return false; > + > + return !strcmp(dev->bus->name, "mdev"); > +} I assume we're not using mdev_bus_type because mdev is a loadable module and that symbol doesn't exist in this statically loaded driver, but strcmp is a pretty ugly alternative. Could we use symbol_get() so that we can use mdev_bus_type? Thanks, Alex > + > +static inline struct device *dev_mdev_parent(struct device *dev) > +{ > + if (WARN_ON(!dev_is_mdev(dev))) > + return NULL; > + > + return dev->parent; > +} > + > extern u32 intel_pasid_max_id; > int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp); > void intel_pasid_free_id(int pasid); ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a mediated device 2018-07-24 18:50 ` Alex Williamson @ 2018-07-25 1:18 ` Lu Baolu 0 siblings, 0 replies; 32+ messages in thread From: Lu Baolu @ 2018-07-25 1:18 UTC (permalink / raw) To: Alex Williamson Cc: Joerg Roedel, David Woodhouse, Kirti Wankhede, ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu, yi.y.sun, peterx, iommu, kvm, linux-kernel, Jacob Pan Hi Alex, On 07/25/2018 02:50 AM, Alex Williamson wrote: > On Sun, 22 Jul 2018 14:09:24 +0800 > Lu Baolu <baolu.lu@linux.intel.com> wrote: > >> This patch adds the support to get the iommu device for a mediated >> device. The assumption is that each mediated device is a minimal >> assignable set of a physical PCI device. Hence, we should use the >> iommu device of the parent PCI device to manage the translation. > Hmm, is this absolutely a valid assumption? I'm afraid there's an > assumption throughout this series that the only way an mdev device > could be interacting with the IOMMU is via S-IOV, but we could choose > today to make an mdev wrapper for any device, which might provide basic > RID granularity to the IOMMU. So if I make an mdev wrapper for a PF > such that I can implement migration for that device, is it valid for > the IOMMU driver to flag me as an mdev device and base mappings on my > parent device? You are right. We should not make it SIOV centric. Let me look into the patches and identify/fix the unreasonable assumptions. > Perhaps in this patch we can assume that the parent of > such an mdev device would be the PF backing it and that results in the > correct drhd, Okay. > but in the next patch we start imposing the assumption > that because the device is an mdev, the only valid association is via > pasid, which I'd say is incorrect. You are right. I will find and fix it. > >> Cc: Ashok Raj <ashok.raj@intel.com> >> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> >> Cc: Kevin Tian <kevin.tian@intel.com> >> Cc: Liu Yi L <yi.l.liu@intel.com> >> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> >> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> >> --- >> drivers/iommu/intel-iommu.c | 6 ++++++ >> drivers/iommu/intel-pasid.h | 16 ++++++++++++++++ >> 2 files changed, 22 insertions(+) >> >> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c >> index 7d198ea..fc3ac1c 100644 >> --- a/drivers/iommu/intel-iommu.c >> +++ b/drivers/iommu/intel-iommu.c >> @@ -767,6 +767,12 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devf >> if (iommu_dummy(dev)) >> return NULL; >> >> + if (dev_is_mdev(dev)) { >> + dev = dev_mdev_parent(dev); >> + if (WARN_ON(!dev_is_pci(dev))) >> + return NULL; >> + } >> + >> if (dev_is_pci(dev)) { >> struct pci_dev *pf_pdev; >> >> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h >> index 518df72..46cde66 100644 >> --- a/drivers/iommu/intel-pasid.h >> +++ b/drivers/iommu/intel-pasid.h >> @@ -35,6 +35,22 @@ struct pasid_table { >> struct list_head dev; /* device list */ >> }; >> >> +static inline bool dev_is_mdev(struct device *dev) >> +{ >> + if (!dev) >> + return false; >> + >> + return !strcmp(dev->bus->name, "mdev"); >> +} > I assume we're not using mdev_bus_type because mdev is a loadable > module and that symbol doesn't exist in this statically loaded driver, Yes. > but strcmp is a pretty ugly alternative. Could we use symbol_get() so > that we can use mdev_bus_type? Thanks, Sure. Best regards, Lu Baolu ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC PATCH 02/10] iommu/vt-d: Alloc domain for a mediated device 2018-07-22 6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu 2018-07-22 6:09 ` [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a " Lu Baolu @ 2018-07-22 6:09 ` Lu Baolu 2018-07-23 4:44 ` Liu, Yi L 2018-07-22 6:09 ` [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices Lu Baolu ` (7 subsequent siblings) 9 siblings, 1 reply; 32+ messages in thread From: Lu Baolu @ 2018-07-22 6:09 UTC (permalink / raw) To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu, yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu, Jacob Pan The PASID-granular 2nd level address translation makes it possible to isolate and protect a mediated device exposed by a real device which support PCI PASID features. This patch adds support to allocate a domain for a mediated device. A default pasid value will be allocated as well for the mediated device. This will be used by the mediated device attached to this domain for non-SVM DMA transactions. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Liu Yi L <yi.l.liu@intel.com> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> --- drivers/iommu/intel-iommu.c | 17 ++++++++++++++++- include/linux/intel-iommu.h | 5 +++++ 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index fc3ac1c..3ede34a 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -1926,6 +1926,9 @@ static void domain_exit(struct dmar_domain *domain) domain_remove_dev_info(domain); rcu_read_unlock(); + if (domain->default_pasid > 0) + intel_pasid_free_id(domain->default_pasid); + /* destroy iovas */ put_iova_domain(&domain->iovad); @@ -2519,11 +2522,23 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu, } } + if (dev && dev_is_mdev(dev) && domain->default_pasid <= 0) { + int max = intel_pasid_get_dev_max_id(dev_mdev_parent(dev)); + + domain->default_pasid = intel_pasid_alloc_id(domain, PASID_MIN, + max, GFP_KERNEL); + if (domain->default_pasid < 0) { + free_devinfo_mem(info); + return NULL; + } + } + spin_lock_irqsave(&device_domain_lock, flags); if (dev) found = find_domain(dev); - if (!found) { + /* A mediated device never has an DMA alias. Ignore searching. */ + if (!found && !dev_is_mdev(dev)) { struct device_domain_info *info2; info2 = dmar_search_domain_by_dev_info(iommu->segment, bus, devfn); if (info2) { diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 7efc632..9f5dcf6 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -423,6 +423,11 @@ struct dmar_domain { 2 == 1GiB, 3 == 512GiB, 4 == 1TiB */ u64 max_addr; /* maximum mapped address */ + int default_pasid; /* + * The default pasid used for non-SVM + * traffic on mediated devices. + */ + struct iommu_domain domain; /* generic domain data structure for iommu core */ }; -- 2.7.4 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* RE: [RFC PATCH 02/10] iommu/vt-d: Alloc domain for a mediated device 2018-07-22 6:09 ` [RFC PATCH 02/10] iommu/vt-d: Alloc domain " Lu Baolu @ 2018-07-23 4:44 ` Liu, Yi L 2018-07-24 2:09 ` Lu Baolu 0 siblings, 1 reply; 32+ messages in thread From: Liu, Yi L @ 2018-07-23 4:44 UTC (permalink / raw) To: Lu Baolu, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: Raj, Ashok, Kumar, Sanjay K, Pan, Jacob jun, Tian, Kevin, Sun, Yi Y, peterx, iommu, kvm, linux-kernel, Jacob Pan > From: Lu Baolu [mailto:baolu.lu@linux.intel.com] > Sent: Sunday, July 22, 2018 2:09 PM > > The PASID-granular 2nd level address translation makes it possible to isolate and > protect a mediated device exposed by a real device which support PCI PASID "support" -> "supports" > features. This patch adds support to allocate a domain for a mediated device. A > default pasid value will be allocated as well for the mediated device. This will be used This is not accurate, it is "a default pasid value will be allocated for the domain, and the mdev will be configed to use this pasid when it is added to this domain" Regards, Yi Liu > by the mediated device attached to this domain for non-SVM DMA transactions. > > Cc: Ashok Raj <ashok.raj@intel.com> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> > Cc: Kevin Tian <kevin.tian@intel.com> > Cc: Liu Yi L <yi.l.liu@intel.com> > Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> > --- > drivers/iommu/intel-iommu.c | 17 ++++++++++++++++- include/linux/intel-iommu.h > | 5 +++++ > 2 files changed, 21 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index > fc3ac1c..3ede34a 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -1926,6 +1926,9 @@ static void domain_exit(struct dmar_domain *domain) > domain_remove_dev_info(domain); > rcu_read_unlock(); > > + if (domain->default_pasid > 0) > + intel_pasid_free_id(domain->default_pasid); > + > /* destroy iovas */ > put_iova_domain(&domain->iovad); > > @@ -2519,11 +2522,23 @@ static struct dmar_domain > *dmar_insert_one_dev_info(struct intel_iommu *iommu, > } > } > > + if (dev && dev_is_mdev(dev) && domain->default_pasid <= 0) { > + int max = intel_pasid_get_dev_max_id(dev_mdev_parent(dev)); > + > + domain->default_pasid = intel_pasid_alloc_id(domain, PASID_MIN, > + max, GFP_KERNEL); > + if (domain->default_pasid < 0) { > + free_devinfo_mem(info); > + return NULL; > + } > + } > + > spin_lock_irqsave(&device_domain_lock, flags); > if (dev) > found = find_domain(dev); > > - if (!found) { > + /* A mediated device never has an DMA alias. Ignore searching. */ > + if (!found && !dev_is_mdev(dev)) { > struct device_domain_info *info2; > info2 = dmar_search_domain_by_dev_info(iommu->segment, bus, > devfn); > if (info2) { > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index > 7efc632..9f5dcf6 100644 > --- a/include/linux/intel-iommu.h > +++ b/include/linux/intel-iommu.h > @@ -423,6 +423,11 @@ struct dmar_domain { > 2 == 1GiB, 3 == 512GiB, 4 == 1TiB */ > u64 max_addr; /* maximum mapped address */ > > + int default_pasid; /* > + * The default pasid used for non-SVM > + * traffic on mediated devices. > + */ > + > struct iommu_domain domain; /* generic domain data structure for > iommu core */ > }; > -- > 2.7.4 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 02/10] iommu/vt-d: Alloc domain for a mediated device 2018-07-23 4:44 ` Liu, Yi L @ 2018-07-24 2:09 ` Lu Baolu 0 siblings, 0 replies; 32+ messages in thread From: Lu Baolu @ 2018-07-24 2:09 UTC (permalink / raw) To: Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: Raj, Ashok, Kumar, Sanjay K, Pan, Jacob jun, Tian, Kevin, Sun, Yi Y, peterx, iommu, kvm, linux-kernel, Jacob Pan Hi, On 07/23/2018 12:44 PM, Liu, Yi L wrote: >> From: Lu Baolu [mailto:baolu.lu@linux.intel.com] >> Sent: Sunday, July 22, 2018 2:09 PM >> >> The PASID-granular 2nd level address translation makes it possible to isolate and >> protect a mediated device exposed by a real device which support PCI PASID > "support" -> "supports" Sure. > >> features. This patch adds support to allocate a domain for a mediated device. A >> default pasid value will be allocated as well for the mediated device. This will be used > This is not accurate, it is "a default pasid value will be allocated for the domain, and the > mdev will be configed to use this pasid when it is added to this domain" Looks better. Thank you. Best regards, Lu Baolu > > Regards, > Yi Liu > >> by the mediated device attached to this domain for non-SVM DMA transactions. >> >> Cc: Ashok Raj <ashok.raj@intel.com> >> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> >> Cc: Kevin Tian <kevin.tian@intel.com> >> Cc: Liu Yi L <yi.l.liu@intel.com> >> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> >> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> >> --- >> drivers/iommu/intel-iommu.c | 17 ++++++++++++++++- include/linux/intel-iommu.h >> | 5 +++++ >> 2 files changed, 21 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index >> fc3ac1c..3ede34a 100644 >> --- a/drivers/iommu/intel-iommu.c >> +++ b/drivers/iommu/intel-iommu.c >> @@ -1926,6 +1926,9 @@ static void domain_exit(struct dmar_domain *domain) >> domain_remove_dev_info(domain); >> rcu_read_unlock(); >> >> + if (domain->default_pasid > 0) >> + intel_pasid_free_id(domain->default_pasid); >> + >> /* destroy iovas */ >> put_iova_domain(&domain->iovad); >> >> @@ -2519,11 +2522,23 @@ static struct dmar_domain >> *dmar_insert_one_dev_info(struct intel_iommu *iommu, >> } >> } >> >> + if (dev && dev_is_mdev(dev) && domain->default_pasid <= 0) { >> + int max = intel_pasid_get_dev_max_id(dev_mdev_parent(dev)); >> + >> + domain->default_pasid = intel_pasid_alloc_id(domain, PASID_MIN, >> + max, GFP_KERNEL); >> + if (domain->default_pasid < 0) { >> + free_devinfo_mem(info); >> + return NULL; >> + } >> + } >> + >> spin_lock_irqsave(&device_domain_lock, flags); >> if (dev) >> found = find_domain(dev); >> >> - if (!found) { >> + /* A mediated device never has an DMA alias. Ignore searching. */ >> + if (!found && !dev_is_mdev(dev)) { >> struct device_domain_info *info2; >> info2 = dmar_search_domain_by_dev_info(iommu->segment, bus, >> devfn); >> if (info2) { >> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index >> 7efc632..9f5dcf6 100644 >> --- a/include/linux/intel-iommu.h >> +++ b/include/linux/intel-iommu.h >> @@ -423,6 +423,11 @@ struct dmar_domain { >> 2 == 1GiB, 3 == 512GiB, 4 == 1TiB */ >> u64 max_addr; /* maximum mapped address */ >> >> + int default_pasid; /* >> + * The default pasid used for non-SVM >> + * traffic on mediated devices. >> + */ >> + >> struct iommu_domain domain; /* generic domain data structure for >> iommu core */ >> }; >> -- >> 2.7.4 > ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices 2018-07-22 6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu 2018-07-22 6:09 ` [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a " Lu Baolu 2018-07-22 6:09 ` [RFC PATCH 02/10] iommu/vt-d: Alloc domain " Lu Baolu @ 2018-07-22 6:09 ` Lu Baolu 2018-07-23 4:44 ` Liu, Yi L 2018-07-22 6:09 ` [RFC PATCH 04/10] iommu/vt-d: Get pasid table for a mediated device Lu Baolu ` (6 subsequent siblings) 9 siblings, 1 reply; 32+ messages in thread From: Lu Baolu @ 2018-07-22 6:09 UTC (permalink / raw) To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu, yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu, Jacob Pan With the Intel IOMMU supporting PASID granularity isolation and protection, a mediated device could be isolated and protected by an IOMMU unit. We need to allocate a new group instead of a PCI group. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Liu Yi L <yi.l.liu@intel.com> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> --- drivers/iommu/intel-iommu.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 3ede34a..57ccfc4 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5276,6 +5276,17 @@ static void intel_iommu_put_resv_regions(struct device *dev, } } +static struct iommu_group *intel_iommu_device_group(struct device *dev) +{ + if (dev_is_pci(dev)) + return pci_device_group(dev); + + if (dev_is_mdev(dev)) + return iommu_group_alloc(); + + return ERR_PTR(-EINVAL); +} + #ifdef CONFIG_INTEL_IOMMU_SVM int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev *sdev) { @@ -5370,7 +5381,7 @@ const struct iommu_ops intel_iommu_ops = { .remove_device = intel_iommu_remove_device, .get_resv_regions = intel_iommu_get_resv_regions, .put_resv_regions = intel_iommu_put_resv_regions, - .device_group = pci_device_group, + .device_group = intel_iommu_device_group, .pgsize_bitmap = INTEL_IOMMU_PGSIZES, }; -- 2.7.4 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* RE: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices 2018-07-22 6:09 ` [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices Lu Baolu @ 2018-07-23 4:44 ` Liu, Yi L 2018-07-24 2:22 ` Lu Baolu 0 siblings, 1 reply; 32+ messages in thread From: Liu, Yi L @ 2018-07-23 4:44 UTC (permalink / raw) To: Lu Baolu, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: Raj, Ashok, Kumar, Sanjay K, Pan, Jacob jun, Tian, Kevin, Sun, Yi Y, peterx, iommu, kvm, linux-kernel, Jacob Pan > From: Lu Baolu [mailto:baolu.lu@linux.intel.com] > Sent: Sunday, July 22, 2018 2:09 PM > > With the Intel IOMMU supporting PASID granularity isolation and protection, a > mediated device could be isolated and protected by an IOMMU unit. We need to > allocate a new group instead of a PCI group. Existing vfio mdev framework also allocates an iommu group for mediate device. mdev_probe() |_ mdev_attach_iommu() |_ iommu_group_alloc() IMHO, this patch actually do a wrapper to the group allocation API. The reason is: it is not necessary to apply the group allocation rules when the allocation is for mediated device. Instead, just allocate a new one for it. :) Thanks, Yi Liu > > Cc: Ashok Raj <ashok.raj@intel.com> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> > Cc: Kevin Tian <kevin.tian@intel.com> > Cc: Liu Yi L <yi.l.liu@intel.com> > Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> > --- > drivers/iommu/intel-iommu.c | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index > 3ede34a..57ccfc4 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -5276,6 +5276,17 @@ static void intel_iommu_put_resv_regions(struct device > *dev, > } > } > > +static struct iommu_group *intel_iommu_device_group(struct device *dev) > +{ > + if (dev_is_pci(dev)) > + return pci_device_group(dev); > + > + if (dev_is_mdev(dev)) > + return iommu_group_alloc(); > + > + return ERR_PTR(-EINVAL); > +} > + > #ifdef CONFIG_INTEL_IOMMU_SVM > int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev > *sdev) { @@ -5370,7 +5381,7 @@ const struct iommu_ops intel_iommu_ops = { > .remove_device = intel_iommu_remove_device, > .get_resv_regions = intel_iommu_get_resv_regions, > .put_resv_regions = intel_iommu_put_resv_regions, > - .device_group = pci_device_group, > + .device_group = intel_iommu_device_group, > .pgsize_bitmap = INTEL_IOMMU_PGSIZES, > }; > > -- > 2.7.4 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices 2018-07-23 4:44 ` Liu, Yi L @ 2018-07-24 2:22 ` Lu Baolu 2018-07-24 11:30 ` Jean-Philippe Brucker 0 siblings, 1 reply; 32+ messages in thread From: Lu Baolu @ 2018-07-24 2:22 UTC (permalink / raw) To: Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: Raj, Ashok, Kumar, Sanjay K, Pan, Jacob jun, Tian, Kevin, Sun, Yi Y, peterx, iommu, kvm, linux-kernel, Jacob Pan Hi, On 07/23/2018 12:44 PM, Liu, Yi L wrote: >> From: Lu Baolu [mailto:baolu.lu@linux.intel.com] >> Sent: Sunday, July 22, 2018 2:09 PM >> >> With the Intel IOMMU supporting PASID granularity isolation and protection, a >> mediated device could be isolated and protected by an IOMMU unit. We need to >> allocate a new group instead of a PCI group. > Existing vfio mdev framework also allocates an iommu group for mediate device. > > mdev_probe() > |_ mdev_attach_iommu() > |_ iommu_group_alloc() When external components ask iommu to allocate a group for a device, it will call pci_device_group in Intel IOMMU driver's @device_group callback. In another word, current Intel IOMMU driver doesn't aware the mediated device and treat all devices as PCI ones. This patch extends the @device_group call back to make it aware of a mediated device. Best regards, Lu Baolu > > IMHO, this patch actually do a wrapper to the group allocation API. The reason is: it > is not necessary to apply the group allocation rules when the allocation is for mediated > device. Instead, just allocate a new one for it. :) > > Thanks, > Yi Liu >> Cc: Ashok Raj <ashok.raj@intel.com> >> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> >> Cc: Kevin Tian <kevin.tian@intel.com> >> Cc: Liu Yi L <yi.l.liu@intel.com> >> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> >> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> >> --- >> drivers/iommu/intel-iommu.c | 13 ++++++++++++- >> 1 file changed, 12 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index >> 3ede34a..57ccfc4 100644 >> --- a/drivers/iommu/intel-iommu.c >> +++ b/drivers/iommu/intel-iommu.c >> @@ -5276,6 +5276,17 @@ static void intel_iommu_put_resv_regions(struct device >> *dev, >> } >> } >> >> +static struct iommu_group *intel_iommu_device_group(struct device *dev) >> +{ >> + if (dev_is_pci(dev)) >> + return pci_device_group(dev); >> + >> + if (dev_is_mdev(dev)) >> + return iommu_group_alloc(); >> + >> + return ERR_PTR(-EINVAL); >> +} >> + >> #ifdef CONFIG_INTEL_IOMMU_SVM >> int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev >> *sdev) { @@ -5370,7 +5381,7 @@ const struct iommu_ops intel_iommu_ops = { >> .remove_device = intel_iommu_remove_device, >> .get_resv_regions = intel_iommu_get_resv_regions, >> .put_resv_regions = intel_iommu_put_resv_regions, >> - .device_group = pci_device_group, >> + .device_group = intel_iommu_device_group, >> .pgsize_bitmap = INTEL_IOMMU_PGSIZES, >> }; >> >> -- >> 2.7.4 > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices 2018-07-24 2:22 ` Lu Baolu @ 2018-07-24 11:30 ` Jean-Philippe Brucker 2018-07-24 19:51 ` Alex Williamson ` (4 more replies) 0 siblings, 5 replies; 32+ messages in thread From: Jean-Philippe Brucker @ 2018-07-24 11:30 UTC (permalink / raw) To: Lu Baolu, Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y, Pan, Jacob jun Hi Baolu, On 24/07/18 03:22, Lu Baolu wrote: > Hi, > > On 07/23/2018 12:44 PM, Liu, Yi L wrote: >>> From: Lu Baolu [mailto:baolu.lu@linux.intel.com] >>> Sent: Sunday, July 22, 2018 2:09 PM >>> >>> With the Intel IOMMU supporting PASID granularity isolation and protection, a >>> mediated device could be isolated and protected by an IOMMU unit. We need to >>> allocate a new group instead of a PCI group. >> Existing vfio mdev framework also allocates an iommu group for mediate device. >> >> mdev_probe() >> |_ mdev_attach_iommu() >> |_ iommu_group_alloc() > > When external components ask iommu to allocate a group for a device, > it will call pci_device_group in Intel IOMMU driver's @device_group > callback. In another word, current Intel IOMMU driver doesn't aware > the mediated device and treat all devices as PCI ones. This patch > extends the @device_group call back to make it aware of a mediated > device. I agree that allocating two groups for an mdev seems strange, and in my opinion we shouldn't export the notion of mdev to IOMMU drivers. Otherwise each driver will have to add its own "dev_is_mdev()" special cases, which will get messy in the long run. Besides, the macro is currently private, and to be exported it should be wrapped in symbol_get/put(mdev_bus_type). There is another way: as we're planning to add a generic pasid_alloc() function to the IOMMU API, the mdev module itself could allocate a default PASID for each mdev by calling pasid_alloc() on the mdev's parent, and then do map()/unmap() with that PASID. This way we don't have to add IOMMU ops to the mdev bus, everything can still be done using the ops of the parent. And IOMMU drivers "only" have to implement PASID ops, which will be reused by drivers other than mdev. The allocated PASID also needs to be installed into the parent device. If the mdev module knows the PASID, we can do that by adding set_pasid(mdev, pasid) and clear_pasid(mdev, pasid) operations to mdev_parent_ops. Thanks, Jean ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices 2018-07-24 11:30 ` Jean-Philippe Brucker @ 2018-07-24 19:51 ` Alex Williamson 2018-07-25 2:06 ` Lu Baolu ` (3 subsequent siblings) 4 siblings, 0 replies; 32+ messages in thread From: Alex Williamson @ 2018-07-24 19:51 UTC (permalink / raw) To: Jean-Philippe Brucker Cc: Lu Baolu, Liu, Yi L, Joerg Roedel, David Woodhouse, Kirti Wankhede, Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y, Pan, Jacob jun On Tue, 24 Jul 2018 12:30:35 +0100 Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote: > Hi Baolu, > > On 24/07/18 03:22, Lu Baolu wrote: > > Hi, > > > > On 07/23/2018 12:44 PM, Liu, Yi L wrote: > >>> From: Lu Baolu [mailto:baolu.lu@linux.intel.com] > >>> Sent: Sunday, July 22, 2018 2:09 PM > >>> > >>> With the Intel IOMMU supporting PASID granularity isolation and protection, a > >>> mediated device could be isolated and protected by an IOMMU unit. We need to > >>> allocate a new group instead of a PCI group. > >> Existing vfio mdev framework also allocates an iommu group for mediate device. > >> > >> mdev_probe() > >> |_ mdev_attach_iommu() > >> |_ iommu_group_alloc() > > > > When external components ask iommu to allocate a group for a device, > > it will call pci_device_group in Intel IOMMU driver's @device_group > > callback. In another word, current Intel IOMMU driver doesn't aware > > the mediated device and treat all devices as PCI ones. This patch > > extends the @device_group call back to make it aware of a mediated > > device. > > I agree that allocating two groups for an mdev seems strange, and in my > opinion we shouldn't export the notion of mdev to IOMMU drivers. Yep, I was just thinking the same thing. This is too highly integrated into VT-d and too narrowly focused on PASID being the only way that an mdev could make use of the IOMMU. Thanks, Alex ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices 2018-07-24 11:30 ` Jean-Philippe Brucker 2018-07-24 19:51 ` Alex Williamson @ 2018-07-25 2:06 ` Lu Baolu 2018-07-25 2:16 ` Tian, Kevin ` (2 subsequent siblings) 4 siblings, 0 replies; 32+ messages in thread From: Lu Baolu @ 2018-07-25 2:06 UTC (permalink / raw) To: Jean-Philippe Brucker, Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y, Pan, Jacob jun, Tian, Kevin Hi Jean, Pull Kevin in. Not sure why he was removed from cc list. On 07/24/2018 07:30 PM, Jean-Philippe Brucker wrote: > Hi Baolu, > > On 24/07/18 03:22, Lu Baolu wrote: >> Hi, >> >> On 07/23/2018 12:44 PM, Liu, Yi L wrote: >>>> From: Lu Baolu [mailto:baolu.lu@linux.intel.com] >>>> Sent: Sunday, July 22, 2018 2:09 PM >>>> >>>> With the Intel IOMMU supporting PASID granularity isolation and protection, a >>>> mediated device could be isolated and protected by an IOMMU unit. We need to >>>> allocate a new group instead of a PCI group. >>> Existing vfio mdev framework also allocates an iommu group for mediate device. >>> >>> mdev_probe() >>> |_ mdev_attach_iommu() >>> |_ iommu_group_alloc() >> When external components ask iommu to allocate a group for a device, >> it will call pci_device_group in Intel IOMMU driver's @device_group >> callback. In another word, current Intel IOMMU driver doesn't aware >> the mediated device and treat all devices as PCI ones. This patch >> extends the @device_group call back to make it aware of a mediated >> device. > I agree that allocating two groups for an mdev seems strange, and in my > opinion we shouldn't export the notion of mdev to IOMMU drivers. > Otherwise each driver will have to add its own "dev_is_mdev()" special > cases, which will get messy in the long run. Besides, the macro is > currently private, and to be exported it should be wrapped in > symbol_get/put(mdev_bus_type). I agree with you. It should be better if we can make it in a driver agnostic way. > > There is another way: as we're planning to add a generic pasid_alloc() > function to the IOMMU API, the mdev module itself could allocate a > default PASID for each mdev by calling pasid_alloc() on the mdev's > parent, and then do map()/unmap() with that PASID. This way we don't > have to add IOMMU ops to the mdev bus, everything can still be done > using the ops of the parent. And IOMMU drivers "only" have to implement > PASID ops, which will be reused by drivers other than mdev. A quick question when I thinking about this is how to allocate a domain for the mediated device who uses the default pasid for DMA transactions? Best regards, Lu Baolu > > The allocated PASID also needs to be installed into the parent device. > If the mdev module knows the PASID, we can do that by adding > set_pasid(mdev, pasid) and clear_pasid(mdev, pasid) operations to > mdev_parent_ops. > > Thanks, > Jean > > ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices 2018-07-24 11:30 ` Jean-Philippe Brucker 2018-07-24 19:51 ` Alex Williamson 2018-07-25 2:06 ` Lu Baolu @ 2018-07-25 2:16 ` Tian, Kevin 2018-07-25 2:35 ` Liu, Yi L [not found] ` <AADFC41AFE54684AB9EE6CBC0274A5D19127FB9E@SHSMSX101.ccr.corp.intel.com> 4 siblings, 0 replies; 32+ messages in thread From: Tian, Kevin @ 2018-07-25 2:16 UTC (permalink / raw) To: Jean-Philippe Brucker, Lu Baolu, Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y, Pan, Jacob jun > From: Jean-Philippe Brucker > Sent: Tuesday, July 24, 2018 7:31 PM > > Hi Baolu, > > On 24/07/18 03:22, Lu Baolu wrote: > > Hi, > > > > On 07/23/2018 12:44 PM, Liu, Yi L wrote: > >>> From: Lu Baolu [mailto:baolu.lu@linux.intel.com] > >>> Sent: Sunday, July 22, 2018 2:09 PM > >>> > >>> With the Intel IOMMU supporting PASID granularity isolation and > protection, a > >>> mediated device could be isolated and protected by an IOMMU unit. > We need to > >>> allocate a new group instead of a PCI group. > >> Existing vfio mdev framework also allocates an iommu group for > mediate device. > >> > >> mdev_probe() > >> |_ mdev_attach_iommu() > >> |_ iommu_group_alloc() > > > > When external components ask iommu to allocate a group for a device, > > it will call pci_device_group in Intel IOMMU driver's @device_group > > callback. In another word, current Intel IOMMU driver doesn't aware > > the mediated device and treat all devices as PCI ones. This patch > > extends the @device_group call back to make it aware of a mediated > > device. > > I agree that allocating two groups for an mdev seems strange, and in my > opinion we shouldn't export the notion of mdev to IOMMU drivers. > Otherwise each driver will have to add its own "dev_is_mdev()" special > cases, which will get messy in the long run. Besides, the macro is > currently private, and to be exported it should be wrapped in > symbol_get/put(mdev_bus_type). There is only one group per mdev, but I agree that avoiding mdev awareness in IOMMU driver is definitely good... We didn't find a good way before, which is why current RFC was implemented that way. > > There is another way: as we're planning to add a generic pasid_alloc() > function to the IOMMU API, the mdev module itself could allocate a > default PASID for each mdev by calling pasid_alloc() on the mdev's > parent, and then do map()/unmap() with that PASID. This way we don't > have to add IOMMU ops to the mdev bus, everything can still be done > using the ops of the parent. And IOMMU drivers "only" have to implement > PASID ops, which will be reused by drivers other than mdev. yes, PASID is more general abstraction than mdev. However there is one problem. Please note with new scalable mode now PASID is not used just for SVA. You can do 1st-level, 2nd-level and nested in PASID granularity, meaning each PASID can be bound to an unique iommu domain. However today IOMMU driver allows only one iommu_domain per device. If we simply install allocated PASID to parent device, we should also remove that iommu_domain limitation. Is it the way that we want to pursue? mdev avoids such problem as it introduces a separate device... > > The allocated PASID also needs to be installed into the parent device. > If the mdev module knows the PASID, we can do that by adding > set_pasid(mdev, pasid) and clear_pasid(mdev, pasid) operations to > mdev_parent_ops. > Thanks Kevin ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices 2018-07-24 11:30 ` Jean-Philippe Brucker ` (2 preceding siblings ...) 2018-07-25 2:16 ` Tian, Kevin @ 2018-07-25 2:35 ` Liu, Yi L [not found] ` <AADFC41AFE54684AB9EE6CBC0274A5D19127FB9E@SHSMSX101.ccr.corp.intel.com> 4 siblings, 0 replies; 32+ messages in thread From: Liu, Yi L @ 2018-07-25 2:35 UTC (permalink / raw) To: Jean-Philippe Brucker, Lu Baolu, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y, Pan, Jacob jun, Tian, Kevin Hi Jean, > From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com] > Sent: Tuesday, July 24, 2018 7:31 PM > > Hi Baolu, > > On 24/07/18 03:22, Lu Baolu wrote: > > Hi, > > > > On 07/23/2018 12:44 PM, Liu, Yi L wrote: > >>> From: Lu Baolu [mailto:baolu.lu@linux.intel.com] > >>> Sent: Sunday, July 22, 2018 2:09 PM > >>> > >>> With the Intel IOMMU supporting PASID granularity isolation and protection, a > >>> mediated device could be isolated and protected by an IOMMU unit. We need to > >>> allocate a new group instead of a PCI group. > >> Existing vfio mdev framework also allocates an iommu group for mediate device. > >> > >> mdev_probe() > >> |_ mdev_attach_iommu() > >> |_ iommu_group_alloc() > > > > When external components ask iommu to allocate a group for a device, > > it will call pci_device_group in Intel IOMMU driver's @device_group > > callback. In another word, current Intel IOMMU driver doesn't aware > > the mediated device and treat all devices as PCI ones. This patch > > extends the @device_group call back to make it aware of a mediated > > device. > > I agree that allocating two groups for an mdev seems strange, and in my There will not be two groups for a mdev. Pls refer to Patch 08/10 of this series. Baolu added iommu_ops check when doing group allocation in mdev_attach_iommu(). [RFC PATCH 08/10] vfio/mdev: Set iommu ops for mdev bus @@ -21,6 +21,13 @@ static int mdev_attach_iommu(struct mdev_device *mdev) int ret; struct iommu_group *group; + /* + * If iommu_ops is set for bus, add_device() will allocate + * a group and add the device in the group. + */ + if (iommu_present(mdev->dev.bus)) + return 0; + > opinion we shouldn't export the notion of mdev to IOMMU drivers. The key idea of this RFC is to tag iommu domain with PASID, if any mdev is added to such a domain, it would get the PASID and config in its parent. Thus the transactions from mdev can be isolated in iommu hardware. Based on this idea, mdevs can be managed in a flexible manner. e.g. if two mdevs are assigned to same VM, they can share PASID. This share can be easily achieve by adding them to the same domain. If we default allocate a PASID for each mdev, it may be a waste. With vendor-specific iommu driver handle the mdev difference, it can largely keep the fundamental iommu concepts in current software implementation. > Otherwise each driver will have to add its own "dev_is_mdev()" special > cases, which will get messy in the long run. Besides, the macro is > currently private, and to be exported it should be wrapped in > symbol_get/put(mdev_bus_type). Agreed. Should figure out a better manner. > There is another way: as we're planning to add a generic pasid_alloc() > function to the IOMMU API, the mdev module itself could allocate a > default PASID for each mdev by calling pasid_alloc() on the mdev's > parent, and then do map()/unmap() with that PASID. This way we don't so far, map/unmap is per-domain operation. In this way, passing PASID makes it be kind of per-device operation. This may affect too much of existing software implementation. > have to add IOMMU ops to the mdev bus, everything can still be done > using the ops of the parent. And IOMMU drivers "only" have to implement > PASID ops, which will be reused by drivers other than mdev. > > The allocated PASID also needs to be installed into the parent device. > If the mdev module knows the PASID, we can do that by adding > set_pasid(mdev, pasid) and clear_pasid(mdev, pasid) operations to > mdev_parent_ops. Your idea is fascinating. Pls feel free let us know if we missed any from you. :) > Thanks, > Jean Thanks, Yi Liu ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <AADFC41AFE54684AB9EE6CBC0274A5D19127FB9E@SHSMSX101.ccr.corp.intel.com>]
* RE: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices [not found] ` <AADFC41AFE54684AB9EE6CBC0274A5D19127FB9E@SHSMSX101.ccr.corp.intel.com> @ 2018-07-25 6:19 ` Tian, Kevin 2018-07-25 19:19 ` Jean-Philippe Brucker 0 siblings, 1 reply; 32+ messages in thread From: Tian, Kevin @ 2018-07-25 6:19 UTC (permalink / raw) To: 'Jean-Philippe Brucker', Lu Baolu, Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y, Pan, Jacob jun > From: Tian, Kevin > Sent: Wednesday, July 25, 2018 10:16 AM > [...] > > > > There is another way: as we're planning to add a generic pasid_alloc() > > function to the IOMMU API, the mdev module itself could allocate a > > default PASID for each mdev by calling pasid_alloc() on the mdev's > > parent, and then do map()/unmap() with that PASID. This way we don't > > have to add IOMMU ops to the mdev bus, everything can still be done > > using the ops of the parent. And IOMMU drivers "only" have to > implement > > PASID ops, which will be reused by drivers other than mdev. > > yes, PASID is more general abstraction than mdev. However there is > one problem. Please note with new scalable mode now PASID is not > used just for SVA. You can do 1st-level, 2nd-level and nested in PASID > granularity, meaning each PASID can be bound to an unique iommu > domain. However today IOMMU driver allows only one iommu_domain > per device. If we simply install allocated PASID to parent device, we > should also remove that iommu_domain limitation. Is it the way that > we want to pursue? > > mdev avoids such problem as it introduces a separate device... another thing to think about is to simplify the caller logic. It would be good to have consistent IOMMU APIs cross PCI device and mdev, for common VFIO operations e.g. map/unmap, iommu_ attach_group, etc. If we need special version ops with PASID, it brings burden to VFIO and other IOMMU clients... For IOMMU-capable mdev, there are two use cases which have been talked so far: 1) mdev with PASID-granular DMA isolation 2) mdev inheriting RID from parent device (no sharing) 1) is what this RFC is currently for. 2) is what Alex concerned which is not covered in RFC yet. In my mind, an ideal design is like below: a) VFIO provides an interface for parent driver to specify whether a mdev is IOMMU capable, with flag to indicate whether it's PASID-granular or RID-inherit; b) Once an IOMMU-capable mdev is created, VFIO treats it no different from normal physical devices, using exactly same IOMMU APIs to operate (including future SVA). VFIO doesn't care whether PASID or RID will be used in hardware; c) IOMMU driver then handle RID/PASID difference internally, based on its own configuration and mdev type. To support above goal, I feel a new device handle is a nice fit to represent mdev within IOMMU driver, with same set of capabilities as observed on its parent device. Jean, maybe I didn't fully understand your proposal. Can you elaborate whether above goal can be achieved with it? Thanks Kevin ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices 2018-07-25 6:19 ` Tian, Kevin @ 2018-07-25 19:19 ` Jean-Philippe Brucker 2018-07-26 3:03 ` Tian, Kevin [not found] ` <AADFC41AFE54684AB9EE6CBC0274A5D1912826AE@SHSMSX101.ccr.corp.intel.com> 0 siblings, 2 replies; 32+ messages in thread From: Jean-Philippe Brucker @ 2018-07-25 19:19 UTC (permalink / raw) To: Tian, Kevin, Lu Baolu, Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y, Pan, Jacob jun On 25/07/18 07:19, Tian, Kevin wrote: >> From: Tian, Kevin >> Sent: Wednesday, July 25, 2018 10:16 AM >> > [...] >>> >>> There is another way: as we're planning to add a generic pasid_alloc() >>> function to the IOMMU API, the mdev module itself could allocate a >>> default PASID for each mdev by calling pasid_alloc() on the mdev's >>> parent, and then do map()/unmap() with that PASID. This way we don't >>> have to add IOMMU ops to the mdev bus, everything can still be done >>> using the ops of the parent. And IOMMU drivers "only" have to >> implement >>> PASID ops, which will be reused by drivers other than mdev. >> >> yes, PASID is more general abstraction than mdev. However there is >> one problem. Please note with new scalable mode now PASID is not >> used just for SVA. You can do 1st-level, 2nd-level and nested in PASID >> granularity, meaning each PASID can be bound to an unique iommu >> domain. However today IOMMU driver allows only one iommu_domain >> per device. If we simply install allocated PASID to parent device, we >> should also remove that iommu_domain limitation. Is it the way that >> we want to pursue? >> >> mdev avoids such problem as it introduces a separate device... > > another thing to think about is to simplify the caller logic. It would > be good to have consistent IOMMU APIs cross PCI device and > mdev, for common VFIO operations e.g. map/unmap, iommu_ > attach_group, etc. If we need special version ops with PASID, it > brings burden to VFIO and other IOMMU clients... Yes, providing special ops is the simplest to implement (or least invasive, I guess) but isn't particularly elegant. See the patch from Jordan below, that I was planning to add to the SVA series (I'm laboriously working towards next version) and my email from last week: https://patchwork.kernel.org/patch/10412251/ Whenever I come back to hierarchical IOMMU domains I reject it as too complicated, but maybe that is what we need. I find it difficult to reason about because domains currently represent both a collection of devices and a one or more address spaces. I proposed the io_mm thing to represent a single address space, and to avoid adding special cases to every function in the IOMMU subsystem that manipulates domains. > For IOMMU-capable mdev, there are two use cases which have > been talked so far: > > 1) mdev with PASID-granular DMA isolation > 2) mdev inheriting RID from parent device (no sharing) Would that be a single mdev per parent device? Otherwise I don't really understand how it would work without all map/unmap operations going to the parent device's driver. > 1) is what this RFC is currently for. 2) is what Alex concerned > which is not covered in RFC yet. > > In my mind, an ideal design is like below: > > a) VFIO provides an interface for parent driver to specify > whether a mdev is IOMMU capable, with flag to indicate > whether it's PASID-granular or RID-inherit; > > b) Once an IOMMU-capable mdev is created, VFIO treats it no > different from normal physical devices, using exactly same > IOMMU APIs to operate (including future SVA). VFIO doesn't > care whether PASID or RID will be used in hardware; > > c) IOMMU driver then handle RID/PASID difference internally, > based on its own configuration and mdev type. > > To support above goal, I feel a new device handle is a nice fit > to represent mdev within IOMMU driver, with same set of > capabilities as observed on its parent device. It's a good abstraction, but I'm still concerned about other users of PASID-granular DMA isolation, for example GPU drivers wanting to improve isolation of DMA bufs, will want the same functionality without going through the vfio-mdev module. The IOMMU operations we care about don't take a device handle, I think, just a domain. And VFIO itself only deals with domains when doing map/unmap. Maybe we could add this operation to the IOMMU subsystem: child_domain = domain_create_child(parent_dev, parent_domain) A child domain would be a smaller isolation granule, getting a PASID if that's what the IOMMU implements or another mechanism for 2). It is automatically attached to its parent's devices, so attach/detach operations wouldn't be necessary on the child. Depending on the underlying architecture the child domain can support map/unmap on a single stage, or map/unmap for 2nd level and bind_pgtable for 1st level. I'm not sure how this works for host SVA though. I think the sva_bind_dev() API could stay the same, but the internals will need to change. Thanks, Jean ----- I was going to send the following patch for dealing with private PASIDs. I used it for a vfio-mdev prototype internally: VFIO would call iommu_sva_alloc_pasid on the parent device and then sva_map/sva_unmap on the parent domain + allocated io_mm (which redirects to iommu_map/iommu_unmap when there is no io_mm). Since it relies on io_mm instead of child domains, it duplicates the iommu ops, and as discussed might not be adapted to Vt-d scalable mode. Subject: [PATCH] iommu: sva: Add support for private PASIDs Provide an API for allocating PASIDs and populating them manually. To ease cleanup and factor allocation code, reuse the io_mm structure for private PASID. Private io_mm has a NULL mm_struct pointer, and cannot be bound to multiple devices. The mm_alloc() IOMMU op must now check if the mm argument is NULL, in which case it should allocate io_pgtables instead of binding to an mm. Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> --- drivers/iommu/iommu-sva.c | 211 +++++++++++++++++++++++++++++++++----- drivers/iommu/iommu.c | 49 ++++++--- include/linux/iommu.h | 118 +++++++++++++++++++-- 3 files changed, 331 insertions(+), 47 deletions(-) diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c index 35d0f387fbef..12c9b861014d 100644 --- a/drivers/iommu/iommu-sva.c +++ b/drivers/iommu/iommu-sva.c @@ -15,11 +15,11 @@ /** * DOC: io_mm model * - * The io_mm keeps track of process address spaces shared between CPU and IOMMU. - * The following example illustrates the relation between structures - * iommu_domain, io_mm and iommu_bond. An iommu_bond is a link between io_mm and - * device. A device can have multiple io_mm and an io_mm may be bound to - * multiple devices. + * When used with the bind()/unbind() functions, the io_mm keeps track of + * process address spaces shared between CPU and IOMMU. The following example + * illustrates the relation between structures iommu_domain, io_mm and + * iommu_bond. An iommu_bond is a link between io_mm and device. A device can + * have multiple io_mm and an io_mm may be bound to multiple devices. * ___________________________ * | IOMMU domain A | * | ________________ | @@ -97,6 +97,12 @@ * non-PASID translations. In this case PASID 0 is reserved and entry 0 points * to the io_pgtable base. On Intel IOMMU, the io_pgtable base would be held in * the device table and PASID 0 would be available to the allocator. + * + * The io_mm can also represent a private IOMMU address space, which isn't + * shared with a process. The device driver calls iommu_sva_alloc_pasid which + * returns an io_mm that can be populated with the iommu_sva_map/unmap + * functions. The principle is the same as shared io_mm, except that a private + * io_mm cannot be bound to multiple devices. */ struct iommu_bond { @@ -130,6 +136,9 @@ static DEFINE_SPINLOCK(iommu_sva_lock); static struct mmu_notifier_ops iommu_mmu_notifier; +#define io_mm_is_private(io_mm) ((io_mm) != NULL && (io_mm)->mm == NULL) +#define io_mm_is_shared(io_mm) ((io_mm) != NULL && (io_mm)->mm != NULL) + static struct io_mm * io_mm_alloc(struct iommu_domain *domain, struct device *dev, struct mm_struct *mm, unsigned long flags) @@ -148,19 +157,10 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev, if (!io_mm) return ERR_PTR(-ENOMEM); - /* - * The mm must not be freed until after the driver frees the io_mm - * (which may involve unpinning the CPU ASID for instance, requiring a - * valid mm struct.) - */ - mmgrab(mm); - io_mm->flags = flags; io_mm->mm = mm; - io_mm->notifier.ops = &iommu_mmu_notifier; io_mm->release = domain->ops->mm_free; INIT_LIST_HEAD(&io_mm->devices); - /* Leave kref to zero until the io_mm is fully initialized */ idr_preload(GFP_KERNEL); spin_lock(&iommu_sva_lock); @@ -175,6 +175,32 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev, goto err_free_mm; } + return io_mm; + +err_free_mm: + io_mm->release(io_mm); + return ERR_PTR(ret); +} + +static struct io_mm * +io_mm_alloc_shared(struct iommu_domain *domain, struct device *dev, + struct mm_struct *mm, unsigned long flags) +{ + int ret; + struct io_mm *io_mm; + + io_mm = io_mm_alloc(domain, dev, mm, flags); + if (IS_ERR(io_mm)) + return io_mm; + + /* + * The mm must not be freed until after the driver frees the io_mm + * (which may involve unpinning the CPU ASID for instance, requiring a + * valid mm struct.) + */ + mmgrab(mm); + + io_mm->notifier.ops = &iommu_mmu_notifier; ret = mmu_notifier_register(&io_mm->notifier, mm); if (ret) goto err_free_pasid; @@ -202,7 +228,6 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev, idr_remove(&iommu_pasid_idr, io_mm->pasid); spin_unlock(&iommu_sva_lock); -err_free_mm: io_mm->release(io_mm); mmdrop(mm); @@ -231,6 +256,11 @@ static void io_mm_release(struct kref *kref) /* The PASID can now be reallocated for another mm. */ idr_remove(&iommu_pasid_idr, io_mm->pasid); + if (io_mm_is_private(io_mm)) { + io_mm->release(io_mm); + return; + } + /* * If we're being released from mm exit, the notifier callback ->release * has already been called. Otherwise we don't need ->release, the io_mm @@ -258,7 +288,7 @@ static int io_mm_get_locked(struct io_mm *io_mm) if (io_mm && kref_get_unless_zero(&io_mm->kref)) { /* * kref_get_unless_zero doesn't provide ordering for reads. This - * barrier pairs with the one in io_mm_alloc. + * barrier pairs with the one in io_mm_alloc_shared. */ smp_rmb(); return 1; @@ -289,7 +319,7 @@ static int io_mm_attach(struct iommu_domain *domain, struct device *dev, struct iommu_sva_param *param = dev->iommu_param->sva_param; if (!domain->ops->mm_attach || !domain->ops->mm_detach || - !domain->ops->mm_invalidate) + (io_mm_is_shared(io_mm) && !domain->ops->mm_invalidate)) return -ENODEV; if (pasid > param->max_pasid || pasid < param->min_pasid) @@ -550,7 +580,7 @@ int iommu_sva_device_init(struct device *dev, unsigned long features, struct iommu_sva_param *param; struct iommu_domain *domain = iommu_get_domain_for_dev(dev); - if (!domain || !domain->ops->sva_device_init) + if (!domain) return -ENODEV; if (features & ~IOMMU_SVA_FEAT_IOPF) @@ -584,9 +614,11 @@ int iommu_sva_device_init(struct device *dev, unsigned long features, * IOMMU driver updates the limits depending on the IOMMU and device * capabilities. */ - ret = domain->ops->sva_device_init(dev, param); - if (ret) - goto err_free_param; + if (domain->ops->sva_device_init) { + ret = domain->ops->sva_device_init(dev, param); + if (ret) + goto err_free_param; + } dev->iommu_param->sva_param = param; mutex_unlock(&dev->iommu_param->lock); @@ -688,7 +720,7 @@ int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, } if (!io_mm) { - io_mm = io_mm_alloc(domain, dev, mm, flags); + io_mm = io_mm_alloc_shared(domain, dev, mm, flags); if (IS_ERR(io_mm)) return PTR_ERR(io_mm); } @@ -723,6 +755,9 @@ int __iommu_sva_unbind_device(struct device *dev, int pasid) /* spin_lock_irq matches the one in wait_event_lock_irq */ spin_lock_irq(&iommu_sva_lock); list_for_each_entry(bond, ¶m->mm_list, dev_head) { + if (io_mm_is_private(bond->io_mm)) + continue; + if (bond->io_mm->pasid == pasid) { io_mm_detach_locked(bond, true); ret = 0; @@ -765,6 +800,136 @@ void __iommu_sva_unbind_dev_all(struct device *dev) } EXPORT_SYMBOL_GPL(__iommu_sva_unbind_dev_all); +/* + * iommu_sva_alloc_pasid - Allocate a private PASID + * + * Allocate a PASID for private map/unmap operations. Create a new I/O address + * space for this device, that isn't bound to any process. + * + * iommu_sva_device_init must have been called first. + */ +int iommu_sva_alloc_pasid(struct device *dev, struct io_mm **out) +{ + int ret; + struct io_mm *io_mm; + struct iommu_domain *domain; + struct iommu_sva_param *param = dev->iommu_param->sva_param; + + if (!out || !param) + return -EINVAL; + + domain = iommu_get_domain_for_dev(dev); + if (!domain) + return -EINVAL; + + io_mm = io_mm_alloc(domain, dev, NULL, 0); + if (IS_ERR(io_mm)) + return PTR_ERR(io_mm); + + kref_init(&io_mm->kref); + + ret = io_mm_attach(domain, dev, io_mm, NULL); + if (ret) { + io_mm_put(io_mm); + return ret; + } + + *out = io_mm; + return 0; +} +EXPORT_SYMBOL_GPL(iommu_sva_alloc_pasid); + +void iommu_sva_free_pasid(struct device *dev, struct io_mm *io_mm) +{ + struct iommu_bond *bond; + + if (WARN_ON(io_mm_is_shared(io_mm))) + return; + + spin_lock(&iommu_sva_lock); + list_for_each_entry(bond, &io_mm->devices, mm_head) { + if (bond->dev == dev) { + io_mm_detach_locked(bond, false); + break; + } + } + spin_unlock(&iommu_sva_lock); +} +EXPORT_SYMBOL_GPL(iommu_sva_free_pasid); + +int iommu_sva_map(struct iommu_domain *domain, struct io_mm *io_mm, + unsigned long iova, phys_addr_t paddr, size_t size, int prot) +{ + if (WARN_ON(io_mm_is_shared(io_mm))) + return -ENODEV; + + return __iommu_map(domain, io_mm, iova, paddr, size, prot); +} +EXPORT_SYMBOL_GPL(iommu_sva_map); + +size_t iommu_sva_map_sg(struct iommu_domain *domain, struct io_mm *io_mm, + unsigned long iova, struct scatterlist *sg, + unsigned int nents, int prot) +{ + if (WARN_ON(io_mm_is_shared(io_mm))) + return -ENODEV; + + return domain->ops->map_sg(domain, io_mm, iova, sg, nents, prot); +} +EXPORT_SYMBOL_GPL(iommu_sva_map_sg); + +size_t iommu_sva_unmap(struct iommu_domain *domain, struct io_mm *io_mm, + unsigned long iova, size_t size) +{ + if (WARN_ON(io_mm_is_shared(io_mm))) + return 0; + + return __iommu_unmap(domain, io_mm, iova, size, true); +} +EXPORT_SYMBOL_GPL(iommu_sva_unmap); + +size_t iommu_sva_unmap_fast(struct iommu_domain *domain, struct io_mm *io_mm, + unsigned long iova, size_t size) +{ + if (WARN_ON(io_mm_is_shared(io_mm))) + return 0; + + return __iommu_unmap(domain, io_mm, iova, size, false); +} +EXPORT_SYMBOL_GPL(iommu_sva_unmap_fast); + +phys_addr_t iommu_sva_iova_to_phys(struct iommu_domain *domain, + struct io_mm *io_mm, dma_addr_t iova) +{ + if (!io_mm) + return iommu_iova_to_phys(domain, iova); + + if (WARN_ON(io_mm_is_shared(io_mm))) + return 0; + + if (unlikely(domain->ops->sva_iova_to_phys == NULL)) + return 0; + + return domain->ops->sva_iova_to_phys(domain, io_mm, iova); +} +EXPORT_SYMBOL_GPL(iommu_sva_iova_to_phys); + +void iommu_sva_tlb_range_add(struct iommu_domain *domain, struct io_mm *io_mm, + unsigned long iova, size_t size) +{ + if (!io_mm) { + iommu_tlb_range_add(domain, iova, size); + return; + } + + if (WARN_ON(io_mm_is_shared(io_mm))) + return; + + if (domain->ops->sva_iotlb_range_add != NULL) + domain->ops->sva_iotlb_range_add(domain, io_mm, iova, size); +} +EXPORT_SYMBOL_GPL(iommu_sva_tlb_range_add); + /** * iommu_sva_find() - Find mm associated to the given PASID * @pasid: Process Address Space ID assigned to the mm @@ -779,7 +944,7 @@ struct mm_struct *iommu_sva_find(int pasid) spin_lock(&iommu_sva_lock); io_mm = idr_find(&iommu_pasid_idr, pasid); - if (io_mm && io_mm_get_locked(io_mm)) { + if (io_mm_is_shared(io_mm) && io_mm_get_locked(io_mm)) { if (mmget_not_zero(io_mm->mm)) mm = io_mm->mm; diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index b02e1d32c733..26ac9b2a813e 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1816,8 +1816,8 @@ static size_t iommu_pgsize(struct iommu_domain *domain, return pgsize; } -int iommu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot) +int __iommu_map(struct iommu_domain *domain, struct io_mm *io_mm, + unsigned long iova, phys_addr_t paddr, size_t size, int prot) { unsigned long orig_iova = iova; unsigned int min_pagesz; @@ -1825,7 +1825,8 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova, phys_addr_t orig_paddr = paddr; int ret = 0; - if (unlikely(domain->ops->map == NULL || + if (unlikely((!io_mm && domain->ops->map == NULL) || + (io_mm && domain->ops->sva_map == NULL) || domain->pgsize_bitmap == 0UL)) return -ENODEV; @@ -1854,7 +1855,12 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova, pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n", iova, &paddr, pgsize); - ret = domain->ops->map(domain, iova, paddr, pgsize, prot); + if (io_mm) + ret = domain->ops->sva_map(domain, io_mm, iova, paddr, + pgsize, prot); + else + ret = domain->ops->map(domain, iova, paddr, pgsize, + prot); if (ret) break; @@ -1865,24 +1871,30 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova, /* unroll mapping in case something went wrong */ if (ret) - iommu_unmap(domain, orig_iova, orig_size - size); + __iommu_unmap(domain, io_mm, orig_iova, orig_size - size, true); else trace_map(orig_iova, orig_paddr, orig_size); return ret; } + +int iommu_map(struct iommu_domain *domain, unsigned long iov, + phys_addr_t paddr, size_t size, int prot) +{ + return __iommu_map(domain, NULL, iov, paddr, size, prot); +} EXPORT_SYMBOL_GPL(iommu_map); -static size_t __iommu_unmap(struct iommu_domain *domain, - unsigned long iova, size_t size, - bool sync) +size_t __iommu_unmap(struct iommu_domain *domain, struct io_mm *io_mm, + unsigned long iova, size_t size, bool sync) { const struct iommu_ops *ops = domain->ops; size_t unmapped_page, unmapped = 0; unsigned long orig_iova = iova; unsigned int min_pagesz; - if (unlikely(ops->unmap == NULL || + if (unlikely((!io_mm && ops->unmap == NULL) || + (io_mm && ops->sva_unmap == NULL) || domain->pgsize_bitmap == 0UL)) return 0; @@ -1912,7 +1924,11 @@ static size_t __iommu_unmap(struct iommu_domain *domain, while (unmapped < size) { size_t pgsize = iommu_pgsize(domain, iova, size - unmapped); - unmapped_page = ops->unmap(domain, iova, pgsize); + if (io_mm) + unmapped_page = ops->sva_unmap(domain, io_mm, iova, + pgsize); + else + unmapped_page = ops->unmap(domain, iova, pgsize); if (!unmapped_page) break; @@ -1936,19 +1952,20 @@ static size_t __iommu_unmap(struct iommu_domain *domain, size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size) { - return __iommu_unmap(domain, iova, size, true); + return __iommu_unmap(domain, NULL, iova, size, true); } EXPORT_SYMBOL_GPL(iommu_unmap); size_t iommu_unmap_fast(struct iommu_domain *domain, unsigned long iova, size_t size) { - return __iommu_unmap(domain, iova, size, false); + return __iommu_unmap(domain, NULL, iova, size, false); } EXPORT_SYMBOL_GPL(iommu_unmap_fast); -size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova, - struct scatterlist *sg, unsigned int nents, int prot) +size_t default_iommu_map_sg(struct iommu_domain *domain, struct io_mm *io_mm, + unsigned long iova, struct scatterlist *sg, unsigned + int nents, int prot) { struct scatterlist *s; size_t mapped = 0; @@ -1972,7 +1989,7 @@ size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova, if (!IS_ALIGNED(s->offset, min_pagesz)) goto out_err; - ret = iommu_map(domain, iova + mapped, phys, s->length, prot); + ret = __iommu_map(domain, io_mm, iova + mapped, phys, s->length, prot); if (ret) goto out_err; @@ -1983,7 +2000,7 @@ size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova, out_err: /* undo mappings already done */ - iommu_unmap(domain, iova, mapped); + __iommu_unmap(domain, io_mm, iova, mapped, true); return 0; diff --git a/include/linux/iommu.h b/include/linux/iommu.h index b525f5636df8..1f63a8691173 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -248,13 +248,17 @@ struct iommu_sva_param { * @mm_invalidate: Invalidate a range of mappings for an mm * @map: map a physically contiguous memory region to an iommu domain * @unmap: unmap a physically contiguous memory region from an iommu domain + * @sva_map: map a physically contiguous memory region to an address space + * @sva_unmap: unmap a physically contiguous memory region from an address space * @map_sg: map a scatter-gather list of physically contiguous memory chunks * to an iommu domain * @flush_tlb_all: Synchronously flush all hardware TLBs for this domain * @tlb_range_add: Add a given iova range to the flush queue for this domain + * @sva_iotlb_range_add: Add a given iova range to the flush queue for this mm * @tlb_sync: Flush all queued ranges from the hardware TLBs and empty flush * queue * @iova_to_phys: translate iova to physical address + * @sva_iova_to_phys: translate iova to physical address * @add_device: add device to iommu grouping * @remove_device: remove device from iommu grouping * @device_group: find iommu group for a particular device @@ -302,13 +306,24 @@ struct iommu_ops { phys_addr_t paddr, size_t size, int prot); size_t (*unmap)(struct iommu_domain *domain, unsigned long iova, size_t size); - size_t (*map_sg)(struct iommu_domain *domain, unsigned long iova, - struct scatterlist *sg, unsigned int nents, int prot); + int (*sva_map)(struct iommu_domain *domain, struct io_mm *io_mm, + unsigned long iova, phys_addr_t paddr, size_t size, + int prot); + size_t (*sva_unmap)(struct iommu_domain *domain, struct io_mm *io_mm, + unsigned long iova, size_t size); + size_t (*map_sg)(struct iommu_domain *domain, struct io_mm *io_mm, + unsigned long iova, struct scatterlist *sg, + unsigned int nents, int prot); void (*flush_iotlb_all)(struct iommu_domain *domain); void (*iotlb_range_add)(struct iommu_domain *domain, unsigned long iova, size_t size); + void (*sva_iotlb_range_add)(struct iommu_domain *domain, + struct io_mm *io_mm, unsigned long iova, + size_t size); void (*iotlb_sync)(struct iommu_domain *domain); phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova); + phys_addr_t (*sva_iova_to_phys)(struct iommu_domain *domain, + struct io_mm *io_mm, dma_addr_t iova); int (*add_device)(struct device *dev); void (*remove_device)(struct device *dev); struct iommu_group *(*device_group)(struct device *dev); @@ -528,15 +543,20 @@ extern int iommu_sva_invalidate(struct iommu_domain *domain, struct device *dev, struct tlb_invalidate_info *inv_info); extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev); +extern int __iommu_map(struct iommu_domain *domain, struct io_mm *io_mm, + unsigned long iova, phys_addr_t paddr, size_t size, + int prot); extern int iommu_map(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot); +extern size_t __iommu_unmap(struct iommu_domain *domain, struct io_mm *io_mm, + unsigned long iova, size_t size, bool sync); extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size); extern size_t iommu_unmap_fast(struct iommu_domain *domain, unsigned long iova, size_t size); -extern size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova, - struct scatterlist *sg,unsigned int nents, - int prot); +extern size_t default_iommu_map_sg(struct iommu_domain *domain, struct io_mm *io_mm, + unsigned long iova, struct scatterlist *sg, + unsigned int nents, int prot); extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova); extern void iommu_set_fault_handler(struct iommu_domain *domain, iommu_fault_handler_t handler, void *token); @@ -622,7 +642,7 @@ static inline size_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova, struct scatterlist *sg, unsigned int nents, int prot) { - return domain->ops->map_sg(domain, iova, sg, nents, prot); + return domain->ops->map_sg(domain, NULL, iova, sg, nents, prot); } /* PCI device grouping function */ @@ -710,12 +730,25 @@ static inline struct iommu_domain *iommu_get_domain_for_dev(struct device *dev) return NULL; } +static inline int __iommu_map(struct iommu_domain *domain, struct io_mm *io_mm, + unsigned long iova, phys_addr_t paddr, + size_t size, int prot) +{ + return -ENODEV; +} + static inline int iommu_map(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot) { return -ENODEV; } +static inline size_t __iommu_unmap(struct iommu_domain *domain, struct io_mm *io_mm, + unsigned long iova, size_t size, bool sync) +{ + return 0; +} + static inline size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size) { @@ -729,8 +762,9 @@ static inline size_t iommu_unmap_fast(struct iommu_domain *domain, } static inline size_t iommu_map_sg(struct iommu_domain *domain, - unsigned long iova, struct scatterlist *sg, - unsigned int nents, int prot) + struct io_mm *io_mm, unsigned long iova, + struct scatterlist *sg, unsigned int nents, + int prot) { return 0; } @@ -1017,6 +1051,22 @@ extern int __iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, extern int __iommu_sva_unbind_device(struct device *dev, int pasid); extern void __iommu_sva_unbind_dev_all(struct device *dev); +int iommu_sva_alloc_pasid(struct device *dev, struct io_mm **io_mm); +void iommu_sva_free_pasid(struct device *dev, struct io_mm *io_mm); + +int iommu_sva_map(struct iommu_domain *domain, struct io_mm *io_mm, + unsigned long iova, phys_addr_t paddr, size_t size, int prot); +size_t iommu_sva_map_sg(struct iommu_domain *domain, struct io_mm *io_mm, + unsigned long iova, struct scatterlist *sg, + unsigned int nents, int prot); +size_t iommu_sva_unmap(struct iommu_domain *domain, + struct io_mm *io_mm, unsigned long iova, size_t size); +size_t iommu_sva_unmap_fast(struct iommu_domain *domain, struct io_mm *io_mm, + unsigned long iova, size_t size); +phys_addr_t iommu_sva_iova_to_phys(struct iommu_domain *domain, + struct io_mm *io_mm, dma_addr_t iova); +void iommu_sva_tlb_range_add(struct iommu_domain *domain, struct io_mm *io_mm, + unsigned long iova, size_t size); extern struct mm_struct *iommu_sva_find(int pasid); #else /* CONFIG_IOMMU_SVA */ static inline int iommu_sva_device_init(struct device *dev, @@ -1048,6 +1098,58 @@ static inline void __iommu_sva_unbind_dev_all(struct device *dev) { } +static inline struct io_mm * +iommu_sva_alloc_pasid(struct iommu_domain *domain, struct device *dev) +{ + return ERR_PTR(-ENODEV); +} + +static inline void iommu_sva_free_pasid(struct io_mm *io_mm, struct device *dev) +{ +} + +static inline int iommu_sva_map(struct iommu_domain *domain, + struct io_mm *io_mm, unsigned long iova, + phys_addr_t paddr, size_t size, int prot) +{ + return -EINVAL; +} + +static inline size_t iommu_sva_map_sg(struct iommu_domain *domain, + struct io_mm *io_mm, unsigned long iova, + struct scatterlist *sg, + unsigned int nents, int prot) +{ + return 0; +} + +static inline size_t iommu_sva_unmap(struct iommu_domain *domain, + struct io_mm *io_mm, unsigned long iova, + size_t size) +{ + return 0; +} + +static inline size_t iommu_sva_unmap_fast(struct iommu_domain *domain, + struct io_mm *io_mm, + unsigned long iova, size_t size) +{ + return 0; +} + +static inline phys_addr_t iommu_sva_iova_to_phys(struct iommu_domain *domain, + struct io_mm *io_mm, + dma_addr_t iova) +{ + return 0; +} + +static inline void iommu_sva_tlb_range_add(struct iommu_domain *domain, + struct io_mm *io_mm, + unsigned long iova, size_t size) +{ +} + static inline struct mm_struct *iommu_sva_find(int pasid) { return NULL; -- 2.18.0 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* RE: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices 2018-07-25 19:19 ` Jean-Philippe Brucker @ 2018-07-26 3:03 ` Tian, Kevin 2018-07-26 15:09 ` Jean-Philippe Brucker [not found] ` <AADFC41AFE54684AB9EE6CBC0274A5D1912826AE@SHSMSX101.ccr.corp.intel.com> 1 sibling, 1 reply; 32+ messages in thread From: Tian, Kevin @ 2018-07-26 3:03 UTC (permalink / raw) To: Jean-Philippe Brucker, Lu Baolu, Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y, Pan, Jacob jun > From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com] > Sent: Thursday, July 26, 2018 3:20 AM > > On 25/07/18 07:19, Tian, Kevin wrote: > >> From: Tian, Kevin > >> Sent: Wednesday, July 25, 2018 10:16 AM > >> > > [...] > >>> > >>> There is another way: as we're planning to add a generic pasid_alloc() > >>> function to the IOMMU API, the mdev module itself could allocate a > >>> default PASID for each mdev by calling pasid_alloc() on the mdev's > >>> parent, and then do map()/unmap() with that PASID. This way we don't > >>> have to add IOMMU ops to the mdev bus, everything can still be done > >>> using the ops of the parent. And IOMMU drivers "only" have to > >> implement > >>> PASID ops, which will be reused by drivers other than mdev. > >> > >> yes, PASID is more general abstraction than mdev. However there is > >> one problem. Please note with new scalable mode now PASID is not > >> used just for SVA. You can do 1st-level, 2nd-level and nested in PASID > >> granularity, meaning each PASID can be bound to an unique iommu > >> domain. However today IOMMU driver allows only one iommu_domain > >> per device. If we simply install allocated PASID to parent device, we > >> should also remove that iommu_domain limitation. Is it the way that > >> we want to pursue? > >> > >> mdev avoids such problem as it introduces a separate device... > > > > another thing to think about is to simplify the caller logic. It would > > be good to have consistent IOMMU APIs cross PCI device and > > mdev, for common VFIO operations e.g. map/unmap, iommu_ > > attach_group, etc. If we need special version ops with PASID, it > > brings burden to VFIO and other IOMMU clients... > > Yes, providing special ops is the simplest to implement (or least > invasive, I guess) but isn't particularly elegant. See the patch from > Jordan below, that I was planning to add to the SVA series (I'm > laboriously working towards next version) and my email from last week: > https://patchwork.kernel.org/patch/10412251/ > > Whenever I come back to hierarchical IOMMU domains I reject it as too > complicated, but maybe that is what we need. I find it difficult to > reason about because domains currently represent both a collection of > devices and a one or more address spaces. I proposed the io_mm thing to > represent a single address space, and to avoid adding special cases to > every function in the IOMMU subsystem that manipulates domains. I suppose io_mm is still under iommu_domain, right? this is different from hierarchical iommu domain concept... > > > For IOMMU-capable mdev, there are two use cases which have > > been talked so far: > > > > 1) mdev with PASID-granular DMA isolation > > 2) mdev inheriting RID from parent device (no sharing) > > Would that be a single mdev per parent device? Otherwise I don't really > understand how it would work without all map/unmap operations going to > the parent device's driver. yes, that is why I said "no sharing". In that special case, host driver itself doesn't do DMA. Only the mdev does. > > > 1) is what this RFC is currently for. 2) is what Alex concerned > > which is not covered in RFC yet. > > > > In my mind, an ideal design is like below: > > > > a) VFIO provides an interface for parent driver to specify > > whether a mdev is IOMMU capable, with flag to indicate > > whether it's PASID-granular or RID-inherit; > > > > b) Once an IOMMU-capable mdev is created, VFIO treats it no > > different from normal physical devices, using exactly same > > IOMMU APIs to operate (including future SVA). VFIO doesn't > > care whether PASID or RID will be used in hardware; > > > > c) IOMMU driver then handle RID/PASID difference internally, > > based on its own configuration and mdev type. > > > > To support above goal, I feel a new device handle is a nice fit > > to represent mdev within IOMMU driver, with same set of > > capabilities as observed on its parent device. > > It's a good abstraction, but I'm still concerned about other users of > PASID-granular DMA isolation, for example GPU drivers wanting to improve > isolation of DMA bufs, will want the same functionality without going > through the vfio-mdev module. for GPU I think you meant SVA. that one anyway needs its own interface as what we have been discussing in yours and Jacob's series. Here mdev is orthogonal to a specific capability like SVA. It is sort of logical representation of subset resource of parent device, on top of which we can enable IOMMU capabilities including SVA. I'm not sure whether it is good to combine two requirements closely... > > The IOMMU operations we care about don't take a device handle, I think, > just a domain. And VFIO itself only deals with domains when doing > map/unmap. Maybe we could add this operation to the IOMMU subsystem: > > child_domain = domain_create_child(parent_dev, parent_domain) > > A child domain would be a smaller isolation granule, getting a PASID if > that's what the IOMMU implements or another mechanism for 2). It is > automatically attached to its parent's devices, so attach/detach > operations wouldn't be necessary on the child. > > Depending on the underlying architecture the child domain can support > map/unmap on a single stage, or map/unmap for 2nd level and > bind_pgtable > for 1st level. > > I'm not sure how this works for host SVA though. I think the > sva_bind_dev() API could stay the same, but the internals will need > to change. > hierarchical domain might be the right way to go, but let's do more thinking on any corner cases. One open is whether iommu domain can fully carry all required attributes on mdev. Note today for physical device each vendor driver maintains a device structure for device specific info which may impact IOMMU setting (e.g. struct device_domain_info in intel-iommu, and struct arm_smmu_device in arm-smmu). If we want mdev to have a different attribute as its parent device, then new representation might be required. But honestly speaking I don't think it is a valid requirement now, since physically finally it is still the IOMMU structure of parent device being configured, so mdev should just inherit same attributes as parent. If the role of mdev representation in current RFC is just to connect iommu_ domain when hierarchical domain is missing, then we might instead just make the latter happen... Let's think more on this direction. btw can you elaborate any other complexities when you evaluated this option earlier? Thanks Kevin ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices 2018-07-26 3:03 ` Tian, Kevin @ 2018-07-26 15:09 ` Jean-Philippe Brucker 0 siblings, 0 replies; 32+ messages in thread From: Jean-Philippe Brucker @ 2018-07-26 15:09 UTC (permalink / raw) To: Tian, Kevin, Lu Baolu, Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y, Pan, Jacob jun On 26/07/18 04:03, Tian, Kevin wrote: >> Whenever I come back to hierarchical IOMMU domains I reject it as too >> complicated, but maybe that is what we need. I find it difficult to >> reason about because domains currently represent both a collection of >> devices and a one or more address spaces. I proposed the io_mm thing to >> represent a single address space, and to avoid adding special cases to >> every function in the IOMMU subsystem that manipulates domains. > > I suppose io_mm is still under iommu_domain, right? this is different > from hierarchical iommu domain concept... Yes, my initial solution is io_mm attached to domains, but in the rest of the mail yesterday I tried to explore a way to replace io_mm with child domains instead. In my current patches, io_mm when used for private PASIDs is basically a lightweight child domain. I don't know which solution is best or if mdev-aware IOMMU is still preferable. For the moment I'll keep the private PASID proposal that uses io_mm, until we've thought a bit more about this. [...] >> It's a good abstraction, but I'm still concerned about other users of >> PASID-granular DMA isolation, for example GPU drivers wanting to improve >> isolation of DMA bufs, will want the same functionality without going >> through the vfio-mdev module. > > for GPU I think you meant SVA. that one anyway needs its own > interface as what we have been discussing in yours and Jacob's > series. Actually I didn't mean SVA. People would like to allocate PASIDs in kernel drivers and do iommu_map/unmap on them, without binding process address spaces. Not counting hardware validation, I've actually seen more demand for private PASID management than for SVA. See for example the discussion on RFCv2 of SVA, or Jordan's series: https://www.spinics.net/lists/arm-kernel/msg611038.html https://lwn.net/Articles/747889/ It would be good if mdev and non-mdev could reuse most of the same code for private PASID, whatever it turns out to be > Here mdev is orthogonal to a specific capability like SVA. It is > sort of logical representation of subset resource of parent device, > on top of which we can enable IOMMU capabilities including SVA. > > I'm not sure whether it is good to combine two requirements closely... > >> >> The IOMMU operations we care about don't take a device handle, I think, >> just a domain. And VFIO itself only deals with domains when doing >> map/unmap. Maybe we could add this operation to the IOMMU subsystem: >> >> child_domain = domain_create_child(parent_dev, parent_domain) >> >> A child domain would be a smaller isolation granule, getting a PASID if >> that's what the IOMMU implements or another mechanism for 2). It is >> automatically attached to its parent's devices, so attach/detach >> operations wouldn't be necessary on the child. >> >> Depending on the underlying architecture the child domain can support >> map/unmap on a single stage, or map/unmap for 2nd level and >> bind_pgtable >> for 1st level. >> >> I'm not sure how this works for host SVA though. I think the >> sva_bind_dev() API could stay the same, but the internals will need >> to change. Thinking more about this, the SVA case seems a bit nasty. If we replaced io_mm with iommu_domain for SVA, a child domain would represent a single process address space, and therefore have multiple parent domains... Not sure we should go down that road. Maybe we should keep SVA separate, and keep io_mm to be a wrapper for mm_struct > hierarchical domain might be the right way to go, but let's do more > thinking on any corner cases. > > One open is whether iommu domain can fully carry all required > attributes on mdev. Note today for physical device each vendor > driver maintains a device structure for device specific info which > may impact IOMMU setting (e.g. struct device_domain_info in > intel-iommu, and struct arm_smmu_device in arm-smmu). If we > want mdev to have a different attribute as its parent device, then > new representation might be required. But honestly speaking I > don't think it is a valid requirement now, since physically finally > it is still the IOMMU structure of parent device being configured, > so mdev should just inherit same attributes as parent. If the role > of mdev representation in current RFC is just to connect iommu_ > domain when hierarchical domain is missing, then we might > instead just make the latter happen... Agreed, the mdev would inherit properties of its parent since physically the IOMMU sees DMA transactions coming from the parent > Let's think more on this direction. btw can you elaborate any > other complexities when you evaluated this option earlier? I can't think of anything right now - my prototype worked well with iommu_sva_alloc_pasid, but the goal was mainly for me to understand mdev using a toy DMA engine, I didn't spend time thinking about corner cases. Thanks, Jean ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <AADFC41AFE54684AB9EE6CBC0274A5D1912826AE@SHSMSX101.ccr.corp.intel.com>]
* RE: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices [not found] ` <AADFC41AFE54684AB9EE6CBC0274A5D1912826AE@SHSMSX101.ccr.corp.intel.com> @ 2018-07-26 3:28 ` Tian, Kevin 2018-07-26 15:09 ` Jean-Philippe Brucker 0 siblings, 1 reply; 32+ messages in thread From: Tian, Kevin @ 2018-07-26 3:28 UTC (permalink / raw) To: 'Jean-Philippe Brucker', Lu Baolu, Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y, Pan, Jacob jun > From: Tian, Kevin > Sent: Thursday, July 26, 2018 11:04 AM [...] > > > > The IOMMU operations we care about don't take a device handle, I think, > > just a domain. And VFIO itself only deals with domains when doing > > map/unmap. Maybe we could add this operation to the IOMMU > subsystem: > > > > child_domain = domain_create_child(parent_dev, parent_domain) > > > > A child domain would be a smaller isolation granule, getting a PASID if > > that's what the IOMMU implements or another mechanism for 2). It is > > automatically attached to its parent's devices, so attach/detach > > operations wouldn't be necessary on the child. > > > > Depending on the underlying architecture the child domain can support > > map/unmap on a single stage, or map/unmap for 2nd level and > > bind_pgtable > > for 1st level. > > > > I'm not sure how this works for host SVA though. I think the > > sva_bind_dev() API could stay the same, but the internals will need > > to change. > > > > hierarchical domain might be the right way to go, but let's do more > thinking on any corner cases. > btw maybe we don't need make it 'hierarchical', as maintaining hierarchy usually brings more work. What we require is possibly just the capability of having one device bind to multiple iommu_domains. One domain is reserved for parent driver's own DMA activities (e.g. serving DMA APIs), while other domains are auxiliary and can be tagged with a PASID (or any other identifier which IOMMU can use to support multiple domains). Such identifiers may be automatically provisioned when auxiliary domain is attached, i.e. not requiring an explicit request from caller. IMO it's safe to assume that supporting multiple iommu domains anyway implies some finer-grained capability than RID-based in underlying IOMMU. Then there is no need of parent/child concept. thoughts? Thanks Kevin ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices 2018-07-26 3:28 ` Tian, Kevin @ 2018-07-26 15:09 ` Jean-Philippe Brucker 0 siblings, 0 replies; 32+ messages in thread From: Jean-Philippe Brucker @ 2018-07-26 15:09 UTC (permalink / raw) To: Tian, Kevin, Lu Baolu, Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: Raj, Ashok, kvm, Kumar, Sanjay K, iommu, linux-kernel, Sun, Yi Y, Pan, Jacob jun On 26/07/18 04:28, Tian, Kevin wrote: >> hierarchical domain might be the right way to go, but let's do more >> thinking on any corner cases. >> > > btw maybe we don't need make it 'hierarchical', as maintaining > hierarchy usually brings more work. What we require is possibly > just the capability of having one device bind to multiple > iommu_domains. One domain is reserved for parent driver's > own DMA activities (e.g. serving DMA APIs), while other domains > are auxiliary and can be tagged with a PASID (or any other identifier > which IOMMU can use to support multiple domains). Such identifiers > may be automatically provisioned when auxiliary domain is attached, > i.e. not requiring an explicit request from caller. IMO it's safe to > assume that supporting multiple iommu domains anyway implies > some finer-grained capability than RID-based in underlying IOMMU. > Then there is no need of parent/child concept. Right, we probably don't need a hierarchy. I like this model (it's actually the one I favor for supporting PASID in virtio-iommu), though I'm hoping we can avoid changing the logic of iommu_attach/detach, and instead introduce a new "get_child_domain", "attach_aux_domain" or simply "clone" op. Currently the attach_dev() op toggles the domain of a device. For example a device automatically gets a default domain for kernel DMA, then VFIO attaches a new domain for assigning to a VM. attach_dev() disables the default domain and installs fresh page tables. detach_dev() isn't called, and the SMMU drivers don't even implement it. If we wanted to reuse attach_dev() for auxiliary domains, we'd need some flag to differentiate the "add auxiliary domain" operation from the "detach everything and attach one new domain" one Thanks, Jean ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC PATCH 04/10] iommu/vt-d: Get pasid table for a mediated device 2018-07-22 6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu ` (2 preceding siblings ...) 2018-07-22 6:09 ` [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices Lu Baolu @ 2018-07-22 6:09 ` Lu Baolu 2018-07-22 6:09 ` [RFC PATCH 05/10] iommu/vt-d: Setup DMA remapping for mediated devices Lu Baolu ` (5 subsequent siblings) 9 siblings, 0 replies; 32+ messages in thread From: Lu Baolu @ 2018-07-22 6:09 UTC (permalink / raw) To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu, yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu, Jacob Pan This patch adds the support to get the pasid table for a mediated device. The assumption is that each mediated device is a minimal assignable set of a physical PCI device. Hence, we should use the pasid table of the parent PCI device to manage the translation. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Liu Yi L <yi.l.liu@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> --- drivers/iommu/intel-pasid.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c index 1195c2a..9b4d462 100644 --- a/drivers/iommu/intel-pasid.c +++ b/drivers/iommu/intel-pasid.c @@ -129,7 +129,19 @@ int intel_pasid_alloc_table(struct device *dev) int ret, order; info = dev->archdata.iommu; - if (WARN_ON(!info || !dev_is_pci(dev) || info->pasid_table)) + if (WARN_ON(!info || info->pasid_table)) + return -EINVAL; + + /* Use parent PCI device pasid table for mdev: */ + if (dev_is_mdev(dev)) { + pasid_table = intel_pasid_get_table(dev_mdev_parent(dev)); + if (pasid_table) + goto attach_out; + else + return -ENOMEM; + } + + if (WARN_ON(!dev_is_pci(dev))) return -EINVAL; /* DMA alias device already has a pasid table, use it: */ @@ -190,7 +202,7 @@ void intel_pasid_free_table(struct device *dev) int i, max_pde; info = dev->archdata.iommu; - if (!info || !dev_is_pci(dev) || !info->pasid_table) + if (!info || !info->pasid_table) return; pasid_table = info->pasid_table; -- 2.7.4 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 05/10] iommu/vt-d: Setup DMA remapping for mediated devices 2018-07-22 6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu ` (3 preceding siblings ...) 2018-07-22 6:09 ` [RFC PATCH 04/10] iommu/vt-d: Get pasid table for a mediated device Lu Baolu @ 2018-07-22 6:09 ` Lu Baolu 2018-07-23 4:44 ` Liu, Yi L 2018-07-22 6:09 ` [RFC PATCH 06/10] iommu: Add iommu_set_bus API interface Lu Baolu ` (4 subsequent siblings) 9 siblings, 1 reply; 32+ messages in thread From: Lu Baolu @ 2018-07-22 6:09 UTC (permalink / raw) To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu, yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu, Jacob Pan This configures the second level page table when external components request to allocate a domain for a mediated device. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Liu Yi L <yi.l.liu@intel.com> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> --- drivers/iommu/intel-iommu.c | 73 ++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 66 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 57ccfc4..b6e9ea8 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -2569,8 +2569,9 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu, if (dev) dev->archdata.iommu = info; - if (dev && dev_is_pci(dev) && sm_supported(iommu)) { - bool pass_through; + if (dev && sm_supported(iommu)) { + bool pass_through = hw_pass_through && + domain_type_is_si(domain); ret = intel_pasid_alloc_table(dev); if (ret) { @@ -2579,12 +2580,21 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu, return NULL; } - /* Setup the PASID entry for requests without PASID: */ - pass_through = hw_pass_through && domain_type_is_si(domain); spin_lock(&iommu->lock); - intel_pasid_setup_second_level(iommu, domain, dev, - PASID_RID2PASID, - pass_through); + + /* Setup the PASID entry for requests without PASID: */ + if (dev_is_pci(dev)) + intel_pasid_setup_second_level(iommu, domain, dev, + PASID_RID2PASID, + pass_through); + /* Setup the PASID entry for mediated devices: */ + else if (dev_is_mdev(dev)) + intel_pasid_setup_second_level(iommu, domain, dev, + domain->default_pasid, + false); + else + pr_err("Unsupported device %s\n", dev_name(dev)); + spin_unlock(&iommu->lock); } spin_unlock_irqrestore(&device_domain_lock, flags); @@ -4937,6 +4947,32 @@ static void domain_context_clear(struct intel_iommu *iommu, struct device *dev) pci_for_each_dma_alias(to_pci_dev(dev), &domain_context_clear_one_cb, iommu); } +static void +iommu_flush_ext_iotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 gran) +{ + struct qi_desc desc; + + desc.high = 0; + desc.low = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) | + QI_EIOTLB_GRAN(gran) | QI_EIOTLB_TYPE; + + qi_submit_sync(&desc, iommu); +} + +static void iommu_flush_pasid_dev_iotlb(struct intel_iommu *iommu, + struct device *dev, int sid, int pasid) +{ + struct qi_desc desc; + struct pci_dev *pdev = to_pci_dev(dev); + int qdep = pci_ats_queue_depth(pdev); + + desc.low = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) | + QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE; + desc.high = QI_DEV_EIOTLB_ADDR(-1ULL >> 1) | QI_DEV_EIOTLB_SIZE; + + qi_submit_sync(&desc, iommu); +} + static void __dmar_remove_one_dev_info(struct device_domain_info *info) { struct intel_iommu *iommu; @@ -4949,6 +4985,29 @@ static void __dmar_remove_one_dev_info(struct device_domain_info *info) iommu = info->iommu; + if (dev_is_mdev(info->dev)) { + struct dmar_domain *domain = info->domain; + int did = domain->iommu_did[iommu->seq_id]; + int sid = info->bus << 8 | info->devfn; + struct device *dev = info->dev; + + intel_pasid_clear_entry(dev, domain->default_pasid); + + /* Flush IOTLB including PASID Cache: */ + iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH); + + /* + * Flush EIOTLB. The only way to flush global mappings within + * a PASID is to use QI_GRAN_ALL_ALL. + */ + iommu_flush_ext_iotlb(iommu, did, domain->default_pasid, + QI_GRAN_ALL_ALL); + + /* Flush Dev TLB: */ + iommu_flush_pasid_dev_iotlb(iommu, dev_mdev_parent(dev), sid, + domain->default_pasid); + } + if (info->dev) { iommu_disable_dev_iotlb(info); domain_context_clear(iommu, info->dev); -- 2.7.4 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* RE: [RFC PATCH 05/10] iommu/vt-d: Setup DMA remapping for mediated devices 2018-07-22 6:09 ` [RFC PATCH 05/10] iommu/vt-d: Setup DMA remapping for mediated devices Lu Baolu @ 2018-07-23 4:44 ` Liu, Yi L 2018-07-24 2:29 ` Lu Baolu 0 siblings, 1 reply; 32+ messages in thread From: Liu, Yi L @ 2018-07-23 4:44 UTC (permalink / raw) To: Lu Baolu, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: Raj, Ashok, Kumar, Sanjay K, Pan, Jacob jun, Tian, Kevin, Sun, Yi Y, peterx, iommu, kvm, linux-kernel, Jacob Pan > From: Lu Baolu [mailto:baolu.lu@linux.intel.com] > Sent: Sunday, July 22, 2018 2:09 PM > > This configures the second level page table when external components request to > allocate a domain for a mediated device. I think it should be the time when adding a mediate device to the domain. > > Cc: Ashok Raj <ashok.raj@intel.com> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> > Cc: Kevin Tian <kevin.tian@intel.com> > Cc: Liu Yi L <yi.l.liu@intel.com> > Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> > --- > drivers/iommu/intel-iommu.c | 73 > ++++++++++++++++++++++++++++++++++++++++----- > 1 file changed, 66 insertions(+), 7 deletions(-) > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index > 57ccfc4..b6e9ea8 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -2569,8 +2569,9 @@ static struct dmar_domain > *dmar_insert_one_dev_info(struct intel_iommu *iommu, > if (dev) > dev->archdata.iommu = info; > > - if (dev && dev_is_pci(dev) && sm_supported(iommu)) { > - bool pass_through; > + if (dev && sm_supported(iommu)) { > + bool pass_through = hw_pass_through && > + domain_type_is_si(domain); > > ret = intel_pasid_alloc_table(dev); > if (ret) { > @@ -2579,12 +2580,21 @@ static struct dmar_domain > *dmar_insert_one_dev_info(struct intel_iommu *iommu, > return NULL; > } > > - /* Setup the PASID entry for requests without PASID: */ > - pass_through = hw_pass_through && domain_type_is_si(domain); > spin_lock(&iommu->lock); > - intel_pasid_setup_second_level(iommu, domain, dev, > - PASID_RID2PASID, > - pass_through); > + > + /* Setup the PASID entry for requests without PASID: */ > + if (dev_is_pci(dev)) > + intel_pasid_setup_second_level(iommu, domain, dev, > + PASID_RID2PASID, can domain->default_pasid be initialized to be PASID_RID2PASID if the domain is not a domain used by mediated device? > + pass_through); > + /* Setup the PASID entry for mediated devices: */ > + else if (dev_is_mdev(dev)) > + intel_pasid_setup_second_level(iommu, domain, dev, > + domain->default_pasid, > + false); > + else > + pr_err("Unsupported device %s\n", dev_name(dev)); > + > spin_unlock(&iommu->lock); > } > spin_unlock_irqrestore(&device_domain_lock, flags); @@ -4937,6 +4947,32 > @@ static void domain_context_clear(struct intel_iommu *iommu, struct device > *dev) > pci_for_each_dma_alias(to_pci_dev(dev), &domain_context_clear_one_cb, > iommu); } > > +static void > +iommu_flush_ext_iotlb(struct intel_iommu *iommu, u16 did, u32 pasid, Per latest VT-d spec, its new name is pasid-based-iotlb. Pls update the naming accordingly. :) Regards, Yi Liu > +u64 gran) { > + struct qi_desc desc; > + > + desc.high = 0; > + desc.low = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) | > + QI_EIOTLB_GRAN(gran) | QI_EIOTLB_TYPE; > + > + qi_submit_sync(&desc, iommu); > +} > + > +static void iommu_flush_pasid_dev_iotlb(struct intel_iommu *iommu, > + struct device *dev, int sid, int pasid) { > + struct qi_desc desc; > + struct pci_dev *pdev = to_pci_dev(dev); > + int qdep = pci_ats_queue_depth(pdev); > + > + desc.low = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) | > + QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE; > + desc.high = QI_DEV_EIOTLB_ADDR(-1ULL >> 1) | QI_DEV_EIOTLB_SIZE; > + > + qi_submit_sync(&desc, iommu); > +} > + > static void __dmar_remove_one_dev_info(struct device_domain_info *info) { > struct intel_iommu *iommu; > @@ -4949,6 +4985,29 @@ static void __dmar_remove_one_dev_info(struct > device_domain_info *info) > > iommu = info->iommu; > > + if (dev_is_mdev(info->dev)) { > + struct dmar_domain *domain = info->domain; > + int did = domain->iommu_did[iommu->seq_id]; > + int sid = info->bus << 8 | info->devfn; > + struct device *dev = info->dev; > + > + intel_pasid_clear_entry(dev, domain->default_pasid); > + > + /* Flush IOTLB including PASID Cache: */ > + iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH); > + > + /* > + * Flush EIOTLB. The only way to flush global mappings within > + * a PASID is to use QI_GRAN_ALL_ALL. > + */ > + iommu_flush_ext_iotlb(iommu, did, domain->default_pasid, > + QI_GRAN_ALL_ALL); > + > + /* Flush Dev TLB: */ > + iommu_flush_pasid_dev_iotlb(iommu, dev_mdev_parent(dev), sid, > + domain->default_pasid); > + } > + > if (info->dev) { > iommu_disable_dev_iotlb(info); > domain_context_clear(iommu, info->dev); > -- > 2.7.4 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [RFC PATCH 05/10] iommu/vt-d: Setup DMA remapping for mediated devices 2018-07-23 4:44 ` Liu, Yi L @ 2018-07-24 2:29 ` Lu Baolu 0 siblings, 0 replies; 32+ messages in thread From: Lu Baolu @ 2018-07-24 2:29 UTC (permalink / raw) To: Liu, Yi L, Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: Raj, Ashok, Kumar, Sanjay K, Pan, Jacob jun, Tian, Kevin, Sun, Yi Y, peterx, iommu, kvm, linux-kernel, Jacob Pan Hi, On 07/23/2018 12:44 PM, Liu, Yi L wrote: >> From: Lu Baolu [mailto:baolu.lu@linux.intel.com] >> Sent: Sunday, July 22, 2018 2:09 PM >> >> This configures the second level page table when external components request to >> allocate a domain for a mediated device. > I think it should be the time when adding a mediate device to the domain. You are right. Will correct it. > >> Cc: Ashok Raj <ashok.raj@intel.com> >> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> >> Cc: Kevin Tian <kevin.tian@intel.com> >> Cc: Liu Yi L <yi.l.liu@intel.com> >> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> >> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> >> --- >> drivers/iommu/intel-iommu.c | 73 >> ++++++++++++++++++++++++++++++++++++++++----- >> 1 file changed, 66 insertions(+), 7 deletions(-) >> >> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index >> 57ccfc4..b6e9ea8 100644 >> --- a/drivers/iommu/intel-iommu.c >> +++ b/drivers/iommu/intel-iommu.c >> @@ -2569,8 +2569,9 @@ static struct dmar_domain >> *dmar_insert_one_dev_info(struct intel_iommu *iommu, >> if (dev) >> dev->archdata.iommu = info; >> >> - if (dev && dev_is_pci(dev) && sm_supported(iommu)) { >> - bool pass_through; >> + if (dev && sm_supported(iommu)) { >> + bool pass_through = hw_pass_through && >> + domain_type_is_si(domain); >> >> ret = intel_pasid_alloc_table(dev); >> if (ret) { >> @@ -2579,12 +2580,21 @@ static struct dmar_domain >> *dmar_insert_one_dev_info(struct intel_iommu *iommu, >> return NULL; >> } >> >> - /* Setup the PASID entry for requests without PASID: */ >> - pass_through = hw_pass_through && domain_type_is_si(domain); >> spin_lock(&iommu->lock); >> - intel_pasid_setup_second_level(iommu, domain, dev, >> - PASID_RID2PASID, >> - pass_through); >> + >> + /* Setup the PASID entry for requests without PASID: */ >> + if (dev_is_pci(dev)) >> + intel_pasid_setup_second_level(iommu, domain, dev, >> + PASID_RID2PASID, > can domain->default_pasid be initialized to be PASID_RID2PASID if the domain > is not a domain used by mediated device? PASID_RID2PASID is 0 as well. > >> + pass_through); >> + /* Setup the PASID entry for mediated devices: */ >> + else if (dev_is_mdev(dev)) >> + intel_pasid_setup_second_level(iommu, domain, dev, >> + domain->default_pasid, >> + false); >> + else >> + pr_err("Unsupported device %s\n", dev_name(dev)); >> + >> spin_unlock(&iommu->lock); >> } >> spin_unlock_irqrestore(&device_domain_lock, flags); @@ -4937,6 +4947,32 >> @@ static void domain_context_clear(struct intel_iommu *iommu, struct device >> *dev) >> pci_for_each_dma_alias(to_pci_dev(dev), &domain_context_clear_one_cb, >> iommu); } >> >> +static void >> +iommu_flush_ext_iotlb(struct intel_iommu *iommu, u16 did, u32 pasid, > Per latest VT-d spec, its new name is pasid-based-iotlb. Pls update the naming > accordingly. :) Okay. Best regards, Lu Baolu > > Regards, > Yi Liu > >> +u64 gran) { >> + struct qi_desc desc; >> + >> + desc.high = 0; >> + desc.low = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) | >> + QI_EIOTLB_GRAN(gran) | QI_EIOTLB_TYPE; >> + >> + qi_submit_sync(&desc, iommu); >> +} >> + >> +static void iommu_flush_pasid_dev_iotlb(struct intel_iommu *iommu, >> + struct device *dev, int sid, int pasid) { >> + struct qi_desc desc; >> + struct pci_dev *pdev = to_pci_dev(dev); >> + int qdep = pci_ats_queue_depth(pdev); >> + >> + desc.low = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) | >> + QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE; >> + desc.high = QI_DEV_EIOTLB_ADDR(-1ULL >> 1) | QI_DEV_EIOTLB_SIZE; >> + >> + qi_submit_sync(&desc, iommu); >> +} >> + >> static void __dmar_remove_one_dev_info(struct device_domain_info *info) { >> struct intel_iommu *iommu; >> @@ -4949,6 +4985,29 @@ static void __dmar_remove_one_dev_info(struct >> device_domain_info *info) >> >> iommu = info->iommu; >> >> + if (dev_is_mdev(info->dev)) { >> + struct dmar_domain *domain = info->domain; >> + int did = domain->iommu_did[iommu->seq_id]; >> + int sid = info->bus << 8 | info->devfn; >> + struct device *dev = info->dev; >> + >> + intel_pasid_clear_entry(dev, domain->default_pasid); >> + >> + /* Flush IOTLB including PASID Cache: */ >> + iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH); >> + >> + /* >> + * Flush EIOTLB. The only way to flush global mappings within >> + * a PASID is to use QI_GRAN_ALL_ALL. >> + */ >> + iommu_flush_ext_iotlb(iommu, did, domain->default_pasid, >> + QI_GRAN_ALL_ALL); >> + >> + /* Flush Dev TLB: */ >> + iommu_flush_pasid_dev_iotlb(iommu, dev_mdev_parent(dev), sid, >> + domain->default_pasid); >> + } >> + >> if (info->dev) { >> iommu_disable_dev_iotlb(info); >> domain_context_clear(iommu, info->dev); >> -- >> 2.7.4 > ^ permalink raw reply [flat|nested] 32+ messages in thread
* [RFC PATCH 06/10] iommu: Add iommu_set_bus API interface 2018-07-22 6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu ` (4 preceding siblings ...) 2018-07-22 6:09 ` [RFC PATCH 05/10] iommu/vt-d: Setup DMA remapping for mediated devices Lu Baolu @ 2018-07-22 6:09 ` Lu Baolu 2018-07-22 6:09 ` [RFC PATCH 07/10] iommu/vt-d: Add set_bus iommu ops Lu Baolu ` (3 subsequent siblings) 9 siblings, 0 replies; 32+ messages in thread From: Lu Baolu @ 2018-07-22 6:09 UTC (permalink / raw) To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu, yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu, Jacob Pan This adds iommu_set_bus API by adding a new set_bus ops in the iommu_ops structure. A vendor IOMMU driver could either specify its callback or safely ignore it. This interface could be used to set the iommu methods used for a particular non-pci bus. One consumer of this interface could be vfio/mdev bus when the mdev devices could be exclusively protected by the IOMMU units. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Liu Yi L <yi.l.liu@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> --- drivers/iommu/iommu.c | 23 +++++++++++++++++++++++ include/linux/iommu.h | 12 ++++++++++++ 2 files changed, 35 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 63b3756..b22b0a7 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1976,3 +1976,26 @@ int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids) return 0; } EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids); + +int iommu_set_bus(struct bus_type *bus) +{ + struct iommu_device *iommu; + int ret = -ENODEV; + + spin_lock(&iommu_device_lock); + /* + * Iterate over iommu list and try setting bus with + * each iommu until a successful setting. + */ + list_for_each_entry(iommu, &iommu_device_list, list) { + if (iommu->ops->set_bus) { + ret = iommu->ops->set_bus(bus, iommu); + if (!ret) + break; + } + } + spin_unlock(&iommu_device_lock); + + return ret; +} +EXPORT_SYMBOL_GPL(iommu_set_bus); diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 19938ee..2679796 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -48,6 +48,7 @@ struct bus_type; struct device; struct iommu_domain; struct notifier_block; +struct iommu_device; /* iommu fault flags */ #define IOMMU_FAULT_READ 0x0 @@ -187,6 +188,7 @@ struct iommu_resv_region { * @domain_get_windows: Return the number of windows for a domain * @of_xlate: add OF master IDs to iommu grouping * @pgsize_bitmap: bitmap of all possible supported page sizes + * @set_bus: set iommu ops for a non-pci bus */ struct iommu_ops { bool (*capable)(enum iommu_cap); @@ -235,6 +237,9 @@ struct iommu_ops { int (*of_xlate)(struct device *dev, struct of_phandle_args *args); bool (*is_attach_deferred)(struct iommu_domain *domain, struct device *dev); + /* Set iommu ops for a bus */ + int (*set_bus)(struct bus_type *bus, struct iommu_device *iommu); + unsigned long pgsize_bitmap; }; @@ -412,6 +417,8 @@ void iommu_fwspec_free(struct device *dev); int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids); const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode); +int iommu_set_bus(struct bus_type *bus); + #else /* CONFIG_IOMMU_API */ struct iommu_ops {}; @@ -696,6 +703,11 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode) return NULL; } +static inline int iommu_set_bus(struct bus_type *bus) +{ + return -ENODEV; +} + #endif /* CONFIG_IOMMU_API */ #endif /* __LINUX_IOMMU_H */ -- 2.7.4 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 07/10] iommu/vt-d: Add set_bus iommu ops 2018-07-22 6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu ` (5 preceding siblings ...) 2018-07-22 6:09 ` [RFC PATCH 06/10] iommu: Add iommu_set_bus API interface Lu Baolu @ 2018-07-22 6:09 ` Lu Baolu 2018-07-22 6:09 ` [RFC PATCH 08/10] vfio/mdev: Set iommu ops for mdev bus Lu Baolu ` (2 subsequent siblings) 9 siblings, 0 replies; 32+ messages in thread From: Lu Baolu @ 2018-07-22 6:09 UTC (permalink / raw) To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu, yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu, Jacob Pan This specifies the Intel IOMMU specific set_bus operation. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Liu Yi L <yi.l.liu@intel.com> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> --- drivers/iommu/intel-iommu.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index b6e9ea8..de36292 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5346,6 +5346,23 @@ static struct iommu_group *intel_iommu_device_group(struct device *dev) return ERR_PTR(-EINVAL); } +static int intel_iommu_set_bus(struct bus_type *bus, struct iommu_device *idev) +{ + struct intel_iommu *iommu; + + iommu = container_of(idev, struct intel_iommu, iommu); + if (strcmp(bus->name, "mdev")) + return -EINVAL; + + if (!sm_supported(iommu)) + return -ENODEV; + + bus_set_iommu(bus, &intel_iommu_ops); + bus_register_notifier(bus, &device_nb); + + return 0; +} + #ifdef CONFIG_INTEL_IOMMU_SVM int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev *sdev) { @@ -5440,6 +5457,7 @@ const struct iommu_ops intel_iommu_ops = { .remove_device = intel_iommu_remove_device, .get_resv_regions = intel_iommu_get_resv_regions, .put_resv_regions = intel_iommu_put_resv_regions, + .set_bus = intel_iommu_set_bus, .device_group = intel_iommu_device_group, .pgsize_bitmap = INTEL_IOMMU_PGSIZES, }; -- 2.7.4 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 08/10] vfio/mdev: Set iommu ops for mdev bus 2018-07-22 6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu ` (6 preceding siblings ...) 2018-07-22 6:09 ` [RFC PATCH 07/10] iommu/vt-d: Add set_bus iommu ops Lu Baolu @ 2018-07-22 6:09 ` Lu Baolu 2018-07-22 6:09 ` [RFC PATCH 09/10] vfio/mdev: Add mediated device domain type Lu Baolu 2018-07-22 6:09 ` [RFC PATCH 10/10] vfio/type1: Allocate domain for mediated device Lu Baolu 9 siblings, 0 replies; 32+ messages in thread From: Lu Baolu @ 2018-07-22 6:09 UTC (permalink / raw) To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu, yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu, Jacob Pan This sets the iommu ops for the mdev bus with iommu_set_bus(). With the iommu ops setting, a mediated device might be isolated and protected by an IOMMU unit. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Liu Yi L <yi.l.liu@intel.com> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> --- drivers/vfio/mdev/mdev_core.c | 9 ++++++++- drivers/vfio/mdev/mdev_driver.c | 10 ++++++++++ 2 files changed, 18 insertions(+), 1 deletion(-) diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c index 0212f0e..d8f19ba 100644 --- a/drivers/vfio/mdev/mdev_core.c +++ b/drivers/vfio/mdev/mdev_core.c @@ -16,6 +16,7 @@ #include <linux/uuid.h> #include <linux/sysfs.h> #include <linux/mdev.h> +#include <linux/iommu.h> #include "mdev_private.h" @@ -392,7 +393,13 @@ int mdev_device_remove(struct device *dev, bool force_remove) static int __init mdev_init(void) { - return mdev_bus_register(); + int ret; + + ret = mdev_bus_register(); + if (!ret) + iommu_set_bus(&mdev_bus_type); + + return ret; } static void __exit mdev_exit(void) diff --git a/drivers/vfio/mdev/mdev_driver.c b/drivers/vfio/mdev/mdev_driver.c index 6f0391f..76ac9e6 100644 --- a/drivers/vfio/mdev/mdev_driver.c +++ b/drivers/vfio/mdev/mdev_driver.c @@ -21,6 +21,13 @@ static int mdev_attach_iommu(struct mdev_device *mdev) int ret; struct iommu_group *group; + /* + * If iommu_ops is set for bus, add_device() will allocate + * a group and add the device in the group. + */ + if (iommu_present(mdev->dev.bus)) + return 0; + group = iommu_group_alloc(); if (IS_ERR(group)) return PTR_ERR(group); @@ -36,6 +43,9 @@ static int mdev_attach_iommu(struct mdev_device *mdev) static void mdev_detach_iommu(struct mdev_device *mdev) { + if (iommu_present(mdev->dev.bus)) + return; + iommu_group_remove_device(&mdev->dev); dev_info(&mdev->dev, "MDEV: detaching iommu\n"); } -- 2.7.4 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 09/10] vfio/mdev: Add mediated device domain type 2018-07-22 6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu ` (7 preceding siblings ...) 2018-07-22 6:09 ` [RFC PATCH 08/10] vfio/mdev: Set iommu ops for mdev bus Lu Baolu @ 2018-07-22 6:09 ` Lu Baolu 2018-07-22 6:09 ` [RFC PATCH 10/10] vfio/type1: Allocate domain for mediated device Lu Baolu 9 siblings, 0 replies; 32+ messages in thread From: Lu Baolu @ 2018-07-22 6:09 UTC (permalink / raw) To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu, yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu, Jacob Pan A parent device might create different types of mediated devices. For example, a mediated device could be created on the parent device with a PASID tagged. When the iommu supports PASID-granular translations, the mediated device is individually protected and isolated by the iommu. It's hence possible to allocate a domain for each such device. This patch defines the domain types of a mediated device and allows the parent driver to specify this attribute after a mdev is actually created. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Liu Yi L <yi.l.liu@intel.com> Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> --- drivers/vfio/mdev/mdev_core.c | 21 +++++++++++++++++++++ drivers/vfio/mdev/mdev_private.h | 1 + include/linux/mdev.h | 22 ++++++++++++++++++++++ 3 files changed, 44 insertions(+) diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c index d8f19ba..4d82b36 100644 --- a/drivers/vfio/mdev/mdev_core.c +++ b/drivers/vfio/mdev/mdev_core.c @@ -391,6 +391,27 @@ int mdev_device_remove(struct device *dev, bool force_remove) return 0; } +int mdev_set_domain_type(struct device *dev, enum mdev_domain_type type) +{ + struct mdev_device *mdev = to_mdev_device(dev); + + if (type == DOMAIN_TYPE_PRIVATE && !iommu_present(&mdev_bus_type)) + return -EINVAL; + + mdev->domain_type = type; + + return 0; +} +EXPORT_SYMBOL(mdev_set_domain_type); + +enum mdev_domain_type mdev_get_domain_type(struct device *dev) +{ + struct mdev_device *mdev = to_mdev_device(dev); + + return mdev->domain_type; +} +EXPORT_SYMBOL(mdev_get_domain_type); + static int __init mdev_init(void) { int ret; diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h index b5819b7..d47a670 100644 --- a/drivers/vfio/mdev/mdev_private.h +++ b/drivers/vfio/mdev/mdev_private.h @@ -34,6 +34,7 @@ struct mdev_device { struct list_head next; struct kobject *type_kobj; bool active; + int domain_type; }; #define to_mdev_device(dev) container_of(dev, struct mdev_device, dev) diff --git a/include/linux/mdev.h b/include/linux/mdev.h index b6e048e..5d862b0 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -15,6 +15,28 @@ struct mdev_device; +enum mdev_domain_type { + DOMAIN_TYPE_EXTERNAL, /* Use the external domain and all + * IOMMU staff controlled by the + * parent device driver. + */ + DOMAIN_TYPE_INHERITANCE,/* Use the same domain as the parent device. */ + DOMAIN_TYPE_PRIVATE, /* Capable of having a private domain. For an + * example, the parent device is able to bind + * a specific PASID for a mediated device and + * transferring data with the asigned PASID. + */ +}; + +/* + * Called by the parent device driver to set the domain type. + * By default, the domain type is set to DOMAIN_TYPE_EXTERNAL. + */ +int mdev_set_domain_type(struct device *dev, enum mdev_domain_type type); + +/* Check the domain type. */ +enum mdev_domain_type mdev_get_domain_type(struct device *dev); + /** * struct mdev_parent_ops - Structure to be registered for each parent device to * register the device to mdev module. -- 2.7.4 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [RFC PATCH 10/10] vfio/type1: Allocate domain for mediated device 2018-07-22 6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu ` (8 preceding siblings ...) 2018-07-22 6:09 ` [RFC PATCH 09/10] vfio/mdev: Add mediated device domain type Lu Baolu @ 2018-07-22 6:09 ` Lu Baolu 9 siblings, 0 replies; 32+ messages in thread From: Lu Baolu @ 2018-07-22 6:09 UTC (permalink / raw) To: Joerg Roedel, David Woodhouse, Alex Williamson, Kirti Wankhede Cc: ashok.raj, sanjay.k.kumar, jacob.jun.pan, kevin.tian, yi.l.liu, yi.y.sun, peterx, iommu, kvm, linux-kernel, Lu Baolu, Jacob Pan This allocates a domain for the mediated device if it is able to be isolated and protected individually by IOMMU. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Liu Yi L <yi.l.liu@intel.com> Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> --- drivers/vfio/vfio_iommu_type1.c | 43 ++++++++++++++++++++++++++++++----------- 1 file changed, 32 insertions(+), 11 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 3e5b177..496bea6 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -1177,6 +1177,22 @@ static int vfio_bus_type(struct device *dev, void *data) return 0; } +static int mdev_private_domain(struct device *dev, void *data) +{ + enum mdev_domain_type type; + enum mdev_domain_type (*fn)(struct device *dev); + + fn = symbol_get(mdev_get_domain_type); + if (fn) { + type = fn(dev); + symbol_put(mdev_get_domain_type); + + return type != DOMAIN_TYPE_PRIVATE; + } + + return -EINVAL; +} + static int vfio_iommu_replay(struct vfio_iommu *iommu, struct vfio_domain *domain) { @@ -1371,18 +1387,23 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, mdev_bus = symbol_get(mdev_bus_type); if (mdev_bus) { - if ((bus == mdev_bus) && !iommu_present(bus)) { - symbol_put(mdev_bus_type); - if (!iommu->external_domain) { - INIT_LIST_HEAD(&domain->group_list); - iommu->external_domain = domain; - } else - kfree(domain); + if (bus == mdev_bus) { + ret = iommu_group_for_each_dev(iommu_group, NULL, + mdev_private_domain); + if (!iommu_present(bus) || ret) { + symbol_put(mdev_bus_type); + if (!iommu->external_domain) { + INIT_LIST_HEAD(&domain->group_list); + iommu->external_domain = domain; + } else { + kfree(domain); + } - list_add(&group->next, - &iommu->external_domain->group_list); - mutex_unlock(&iommu->lock); - return 0; + list_add(&group->next, + &iommu->external_domain->group_list); + mutex_unlock(&iommu->lock); + return 0; + } } symbol_put(mdev_bus_type); } -- 2.7.4 ^ permalink raw reply related [flat|nested] 32+ messages in thread
end of thread, other threads:[~2018-07-26 15:09 UTC | newest] Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-07-22 6:09 [RFC PATCH 00/10] vfio/mdev: IOMMU aware mediated device Lu Baolu 2018-07-22 6:09 ` [RFC PATCH 01/10] iommu/vt-d: Get iommu device for a " Lu Baolu 2018-07-23 4:44 ` Liu, Yi L 2018-07-24 2:06 ` Lu Baolu 2018-07-24 18:50 ` Alex Williamson 2018-07-25 1:18 ` Lu Baolu 2018-07-22 6:09 ` [RFC PATCH 02/10] iommu/vt-d: Alloc domain " Lu Baolu 2018-07-23 4:44 ` Liu, Yi L 2018-07-24 2:09 ` Lu Baolu 2018-07-22 6:09 ` [RFC PATCH 03/10] iommu/vt-d: Allocate groups for mediated devices Lu Baolu 2018-07-23 4:44 ` Liu, Yi L 2018-07-24 2:22 ` Lu Baolu 2018-07-24 11:30 ` Jean-Philippe Brucker 2018-07-24 19:51 ` Alex Williamson 2018-07-25 2:06 ` Lu Baolu 2018-07-25 2:16 ` Tian, Kevin 2018-07-25 2:35 ` Liu, Yi L [not found] ` <AADFC41AFE54684AB9EE6CBC0274A5D19127FB9E@SHSMSX101.ccr.corp.intel.com> 2018-07-25 6:19 ` Tian, Kevin 2018-07-25 19:19 ` Jean-Philippe Brucker 2018-07-26 3:03 ` Tian, Kevin 2018-07-26 15:09 ` Jean-Philippe Brucker [not found] ` <AADFC41AFE54684AB9EE6CBC0274A5D1912826AE@SHSMSX101.ccr.corp.intel.com> 2018-07-26 3:28 ` Tian, Kevin 2018-07-26 15:09 ` Jean-Philippe Brucker 2018-07-22 6:09 ` [RFC PATCH 04/10] iommu/vt-d: Get pasid table for a mediated device Lu Baolu 2018-07-22 6:09 ` [RFC PATCH 05/10] iommu/vt-d: Setup DMA remapping for mediated devices Lu Baolu 2018-07-23 4:44 ` Liu, Yi L 2018-07-24 2:29 ` Lu Baolu 2018-07-22 6:09 ` [RFC PATCH 06/10] iommu: Add iommu_set_bus API interface Lu Baolu 2018-07-22 6:09 ` [RFC PATCH 07/10] iommu/vt-d: Add set_bus iommu ops Lu Baolu 2018-07-22 6:09 ` [RFC PATCH 08/10] vfio/mdev: Set iommu ops for mdev bus Lu Baolu 2018-07-22 6:09 ` [RFC PATCH 09/10] vfio/mdev: Add mediated device domain type Lu Baolu 2018-07-22 6:09 ` [RFC PATCH 10/10] vfio/type1: Allocate domain for mediated device Lu Baolu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).