From: "Tian, Kevin" <kevin.tian@intel.com>
To: Jacob Pan <jacob.jun.pan@linux.intel.com>, Lu Baolu <baolu.lu@linux.intel.com>, "iommu@lists.linux-foundation.org" <iommu@lists.linux-foundation.org>, LKML <linux-kernel@vger.kernel.org>, Joerg Roedel <joro@8bytes.org>, David Woodhouse <dwmw2@infradead.org>, "Alex Williamson" <alex.williamson@redhat.com>, Jean-Philippe Brucker <jean-philippe@linaro.com>
Cc: "Liu, Yi L" <yi.l.liu@intel.com>, "Raj, Ashok" <ashok.raj@intel.com>, Christoph Hellwig <hch@infradead.org>, Jonathan Cameron <jic23@kernel.org>, Eric Auger <eric.auger@redhat.com>, Yi L <yi.l.liu@linux.intel.com>
Subject: RE: [PATCH V10 06/11] iommu/vt-d: Add bind guest PASID support
Date: Sat, 28 Mar 2020 08:02:01 +0000
Message-ID: <AADFC41AFE54684AB9EE6CBC0274A5D19D7F77B4@SHSMSX104.ccr.corp.intel.com>
In-Reply-To: <1584746861-76386-7-git-send-email-jacob.jun.pan@linux.intel.com>

> From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Sent: Saturday, March 21, 2020 7:28 AM
>
> When supporting guest SVA with emulated IOMMU, the guest PASID
> table is shadowed in VMM. Updates to guest vIOMMU PASID table
> will result in PASID cache flush which will be passed down to
> the host as bind guest PASID calls.
>
> For the SL page tables, it will be harvested from device's
> default domain (request w/o PASID), or aux domain in case of
> mediated device.
>
>     .-------------.  .---------------------------.
>     |   vIOMMU    |  | Guest process CR3, FL only|
>     |             |  '---------------------------'
>     .----------------/
>     | PASID Entry |--- PASID cache flush -
>     '-------------'                       |
>     |             |                       V
>     |             |                CR3 in GPA
>     '-------------'
> Guest
> ------| Shadow |--------------------------|--------
>       v        v                          v
> Host
>     .-------------.  .----------------------.
>     |   pIOMMU    |  | Bind FL for GVA-GPA  |
>     |             |  '----------------------'
>     .----------------/  |
>     | PASID Entry |     V (Nested xlate)
>     '----------------\.------------------------------.
>     |             |   |SL for GPA-HPA, default domain|
>     |             |   '------------------------------'
>     '-------------'
> Where:
>  - FL = First level/stage one page tables
>  - SL = Second level/stage two page tables
>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  drivers/iommu/intel-iommu.c |   4 +
>  drivers/iommu/intel-svm.c   | 224 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/intel-iommu.h |   8 +-
>  include/linux/intel-svm.h   |  17 ++++
>  4 files changed, 252 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index e599b2537b1c..b1477cd423dd 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -6203,6 +6203,10 @@ const struct iommu_ops intel_iommu_ops = {
>  	.dev_disable_feat	= intel_iommu_dev_disable_feat,
>  	.is_attach_deferred	= intel_iommu_is_attach_deferred,
>  	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +	.sva_bind_gpasid	= intel_svm_bind_gpasid,
> +	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
> +#endif
>  };
>
>  static void quirk_iommu_igfx(struct pci_dev *dev)
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index d7f2a5358900..47c0deb5ae56 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -226,6 +226,230 @@ static LIST_HEAD(global_svm_list);
>  	list_for_each_entry((sdev), &(svm)->devs, list)	\
>  		if ((d) != (sdev)->dev) {} else
>
> +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> +			  struct device *dev,
> +			  struct iommu_gpasid_bind_data *data)
> +{
> +	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> +	struct dmar_domain *ddomain;

What about the full name, e.g. dmar_domain? It is a bit longer but
clearer than ddomain.

> +	struct intel_svm_dev *sdev;
> +	struct intel_svm *svm;
> +	int ret = 0;
> +
> +	if (WARN_ON(!iommu) || !data)
> +		return -EINVAL;
> +
> +	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> +	    data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> +		return -EINVAL;
> +
> +	if (dev_is_pci(dev)) {
> +		/* VT-d supports devices with full 20 bit PASIDs only */
> +		if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> +			return -EINVAL;
> +	} else {
> +		return -ENOTSUPP;
> +	}
> +
> +	/*
> +	 * We only check host PASID range, we have no knowledge to check
> +	 * guest PASID range nor do we use the guest PASID.
> +	 */
> +	if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> +		return -EINVAL;
> +
> +	ddomain = to_dmar_domain(domain);
> +
> +	/* Sanity check paging mode support match between host and guest */
> +	if (data->addr_width == ADDR_WIDTH_5LEVEL &&
> +	    !cap_5lp_support(iommu->cap)) {
> +		pr_err("Cannot support 5 level paging requested by guest!\n");
> +		return -EINVAL;
> +	}

-ENOTSUPP?

> +
> +	mutex_lock(&pasid_mutex);
> +	svm = ioasid_find(NULL, data->hpasid, NULL);
> +	if (IS_ERR(svm)) {
> +		ret = PTR_ERR(svm);
> +		goto out;
> +	}
> +
> +	if (svm) {
> +		/*
> +		 * If we found svm for the PASID, there must be at
> +		 * least one device bond, otherwise svm should be freed.
> +		 */
> +		if (WARN_ON(list_empty(&svm->devs))) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +
> +		if (svm->mm == get_task_mm(current) &&
> +		    data->hpasid == svm->pasid &&
> +		    data->gpasid == svm->gpasid) {
> +			pr_warn("Cannot bind the same guest-host PASID for the same process\n");

Sorry, I didn't get the rationale here. Isn't this branch for binding
the same PASID to multiple devices? In that case it is definitely
binding the same guest-host PASID for the same process. Otherwise, if
hpasid is different you'll hit a different intel_svm, while if gpasid
is different, how can you use one intel_svm to hold multiple gpasids?
I feel the error condition should be the opposite.
Also, I suppose SVM_FLAG_GUEST_PASID should be verified before checking
gpasid.

> +			mmput(svm->mm);
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +		mmput(current->mm);
> +
> +		for_each_svm_dev(sdev, svm, dev) {
> +			/* In case of multiple sub-devices of the same pdev
> +			 * assigned, we should allow multiple bind calls with
> +			 * the same PASID and pdev.

Does sub-device mean mdev? I didn't find such notation in the current
iommu directory. To make it clearer: "In case of multiple mdevs of the
same pdev assigned to the same guest process".

> +			 */
> +			sdev->users++;
> +			goto out;
> +		}
> +	} else {
> +		/* We come here when PASID has never been bond to a device. */
> +		svm = kzalloc(sizeof(*svm), GFP_KERNEL);
> +		if (!svm) {
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +		/* REVISIT: upper layer/VFIO can track host process that bind the PASID.
> +		 * ioasid_set = mm might be sufficient for vfio to check pasid VMM
> +		 * ownership.
> +		 */

The above message is unclear about what should be revisited. Does it
describe the current implementation or the expected revision in the
future?

> +		svm->mm = get_task_mm(current);
> +		svm->pasid = data->hpasid;
> +		if (data->flags & IOMMU_SVA_GPASID_VAL) {
> +			svm->gpasid = data->gpasid;
> +			svm->flags |= SVM_FLAG_GUEST_PASID;
> +		}
> +		ioasid_set_data(data->hpasid, svm);
> +		INIT_LIST_HEAD_RCU(&svm->devs);
> +		mmput(svm->mm);
> +	}
> +	sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
> +	if (!sdev) {
> +		if (list_empty(&svm->devs)) {
> +			ioasid_set_data(data->hpasid, NULL);
> +			kfree(svm);
> +		}
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +	sdev->dev = dev;
> +	sdev->users = 1;
> +
> +	/* Set up device context entry for PASID if not enabled already */
> +	ret = intel_iommu_enable_pasid(iommu, sdev->dev);
> +	if (ret) {
> +		dev_err(dev, "Failed to enable PASID capability\n");
> +		kfree(sdev);
> +		/*
> +		 * If this this a new PASID that never bond to a device, then
> +		 * the device list must be empty which indicates struct svm
> +		 * was allocated in this function.
> +		 */

It would be better to move the comment to the first occurrence, where
the sdev allocation fails, or even better put it at the out label...

> +		if (list_empty(&svm->devs)) {
> +			ioasid_set_data(data->hpasid, NULL);
> +			kfree(svm);
> +		}
> +		goto out;
> +	}
> +
> +	/*
> +	 * For guest bind, we need to set up PASID table entry as follows:
> +	 * - FLPM matches guest paging mode
> +	 * - turn on nested mode
> +	 * - SL guest address width matching
> +	 */

It looks like the above just explains the internal details of
intel_pasid_setup_nested, which is not necessary here.

> +	ret = intel_pasid_setup_nested(iommu,
> +				       dev,
> +				       (pgd_t *)data->gpgd,
> +				       data->hpasid,
> +				       &data->vtd,
> +				       ddomain,
> +				       data->addr_width);

It's worth explaining here that setup_nested is required for every
device (even when they share the same intel_svm) because we allocate
a PASID table per device. Otherwise one could mistakenly think, as I
did, that only the first device bound to a new hpasid requires this
step.
😊

> +	if (ret) {
> +		dev_err(dev, "Failed to set up PASID %llu in nested mode, Err %d\n",
> +			data->hpasid, ret);
> +		/*
> +		 * PASID entry should be in cleared state if nested mode
> +		 * set up failed. So we only need to clear IOASID tracking
> +		 * data such that free call will succeed.
> +		 */
> +		kfree(sdev);
> +		if (list_empty(&svm->devs)) {
> +			ioasid_set_data(data->hpasid, NULL);
> +			kfree(svm);
> +		}
> +		goto out;
> +	}
> +	svm->flags |= SVM_FLAG_GUEST_MODE;
> +
> +	init_rcu_head(&sdev->rcu);
> +	list_add_rcu(&sdev->list, &svm->devs);
> + out:
> +	mutex_unlock(&pasid_mutex);
> +	return ret;
> +}
> +
> +int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> +{
> +	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> +	struct intel_svm_dev *sdev;
> +	struct intel_svm *svm;
> +	int ret = -EINVAL;
> +
> +	if (WARN_ON(!iommu))
> +		return -EINVAL;
> +
> +	mutex_lock(&pasid_mutex);
> +	svm = ioasid_find(NULL, pasid, NULL);
> +	if (!svm) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	if (IS_ERR(svm)) {
> +		ret = PTR_ERR(svm);
> +		goto out;
> +	}
> +
> +	for_each_svm_dev(sdev, svm, dev) {
> +		ret = 0;
> +		sdev->users--;
> +		if (!sdev->users) {
> +			list_del_rcu(&sdev->list);
> +			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> +			/* TODO: Drain in flight PRQ for the PASID since it
> +			 * may get reused soon, we don't want to
> +			 * confuse with its previous life.
> +			 * intel_svm_drain_prq(dev, pasid);
> +			 */
> +			kfree_rcu(sdev, rcu);
> +
> +			if (list_empty(&svm->devs)) {
> +				/*
> +				 * We do not free PASID here until explicit call
> +				 * from VFIO to free. The PASID life cycle
> +				 * management is largely tied to VFIO management
> +				 * of assigned device life cycles. In case of
> +				 * guest exit without a explicit free PASID call,
> +				 * the responsibility lies in VFIO layer to free
> +				 * the PASIDs allocated for the guest.
> +				 * For security reasons, VFIO has to track the
> +				 * PASID ownership per guest anyway to ensure
> +				 * that PASID allocated by one guest cannot be
> +				 * used by another.

As commented in other patches, VFIO is only one example user of this
API...

> +				 */
> +				ioasid_set_data(pasid, NULL);
> +				kfree(svm);
> +			}
> +		}
> +		break;
> +	}

What about no dev match? An -EINVAL is also required then.

> +out:
> +	mutex_unlock(&pasid_mutex);
> +
> +	return ret;
> +}
> +
>  int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
>  {
>  	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index eda1d6687144..85b05120940e 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -681,7 +681,9 @@ struct dmar_domain *find_domain(struct device *dev);
>  extern void intel_svm_check(struct intel_iommu *iommu);
>  extern int intel_svm_enable_prq(struct intel_iommu *iommu);
>  extern int intel_svm_finish_prq(struct intel_iommu *iommu);
> -
> +extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
> +		struct device *dev, struct iommu_gpasid_bind_data *data);
> +extern int intel_svm_unbind_gpasid(struct device *dev, int pasid);
>  struct svm_dev_ops;
>
>  struct intel_svm_dev {
> @@ -698,9 +700,13 @@ struct intel_svm_dev {
>  struct intel_svm {
>  	struct mmu_notifier notifier;
>  	struct mm_struct *mm;
> +
>  	struct intel_iommu *iommu;
>  	int flags;
>  	int pasid;
> +	int gpasid; /* Guest PASID in case of vSVA bind with non-identity host
> +		     * to guest PASID mapping.
> +		     */

We don't need to highlight the identity or non-identity thing, since
either way shares the same infrastructure here, and it is not
knowledge that the kernel driver should assume.

>  	struct list_head devs;
>  	struct list_head list;
>  };
> diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
> index d7c403d0dd27..c19690937540 100644
> --- a/include/linux/intel-svm.h
> +++ b/include/linux/intel-svm.h
> @@ -44,6 +44,23 @@ struct svm_dev_ops {
>   * do such IOTLB flushes automatically.
>   */
>  #define SVM_FLAG_SUPERVISOR_MODE	(1<<1)
> +/*
> + * The SVM_FLAG_GUEST_MODE flag is used when a guest process bind to a device.
> + * In this case the mm_struct is in the guest kernel or userspace, its life
> + * cycle is managed by VMM and VFIO layer. For IOMMU driver, this API provides
> + * means to bind/unbind guest CR3 with PASIDs allocated for a device.
> + */
> +#define SVM_FLAG_GUEST_MODE	(1<<2)
> +/*
> + * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own PASID space,
> + * which requires guest and host PASID translation at both directions. We keep
> + * track of guest PASID in order to provide lookup service to device drivers.
> + * One such example is a physical function (PF) driver that supports mediated
> + * device (mdev) assignment. Guest programming of mdev configuration space can
> + * only be done with guest PASID, therefore PF driver needs to find the matching
> + * host PASID to program the real hardware.
> + */
> +#define SVM_FLAG_GUEST_PASID	(1<<3)
>
>  #ifdef CONFIG_INTEL_IOMMU_SVM
>
> --
> 2.7.4
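
For reference, the inverted check suggested for the existing-svm branch
of intel_svm_bind_gpasid() could look roughly like the userspace sketch
below. It is not driver code: struct model_svm, struct model_bind_req,
MODEL_FLAG_GUEST_PASID and model_check_rebind() are hypothetical names
used only for illustration. Rejecting a request that carries a guest
PASID when the existing svm has none is an assumption, not something
the patch or the review states.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for SVM_FLAG_GUEST_PASID; same bit, model only. */
#define MODEL_FLAG_GUEST_PASID (1 << 3)

/* Hypothetical, simplified stand-in for struct intel_svm. */
struct model_svm {
	const void *mm;	/* owner address space (opaque here) */
	int pasid;	/* host PASID */
	int gpasid;	/* guest PASID, valid only with the flag set */
	int flags;
};

/* Hypothetical, simplified stand-in for the bind request data. */
struct model_bind_req {
	const void *mm;
	int hpasid;
	int gpasid;
	int has_gpasid;	/* models IOMMU_SVA_GPASID_VAL being set */
};

/*
 * Decide whether a new bind request may share an svm that already exists
 * for its host PASID. Binding the same guest-host PASID pair of the same
 * process to another device is the expected case and returns 0; any
 * mismatch returns -EINVAL.
 */
static int model_check_rebind(const struct model_svm *svm,
			      const struct model_bind_req *req)
{
	if (svm->mm != req->mm || svm->pasid != req->hpasid)
		return -EINVAL;

	/* Verify the guest PASID flag before looking at gpasid at all. */
	if (svm->flags & MODEL_FLAG_GUEST_PASID) {
		if (!req->has_gpasid || req->gpasid != svm->gpasid)
			return -EINVAL;
	} else if (req->has_gpasid) {
		/* Assumption: one svm cannot gain a gpasid after the fact. */
		return -EINVAL;
	}

	return 0;
}
```

With this shape, a second device bound with the same mm, hpasid and
gpasid passes the check, while a request with a different gpasid for
the same hpasid is rejected.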