IOMMU Archive on lore.kernel.org
 help / color / Atom feed
From: Jacob Pan <jacob.jun.pan@linux.intel.com>
To: Lu Baolu <baolu.lu@linux.intel.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>,
	Raj Ashok <ashok.raj@intel.com>,
	David Woodhouse <dwmw2@infradead.org>,
	iommu@lists.linux-foundation.org,
	LKML <linux-kernel@vger.kernel.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Jean-Philippe Brucker <jean-philippe@linaro.com>,
	Jonathan Cameron <jic23@kernel.org>
Subject: Re: [PATCH v7 09/11] iommu/vt-d: Add bind guest PASID support
Date: Mon, 28 Oct 2019 21:11:21 -0700
Message-ID: <20191028211121.4ba2f332@jacob-builder> (raw)
In-Reply-To: <e4ac7f82-a224-3388-88ae-1ce52d2eec58@linux.intel.com>

On Tue, 29 Oct 2019 10:54:48 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> Hi,
> 
> On 10/29/19 6:29 AM, Jacob Pan wrote:
> > Hi Baolu,
> > 
> > Appreciate the thorough review, comments inline.  
> 
> You are welcome.
> 
> > 
> > On Sat, 26 Oct 2019 10:01:19 +0800
> > Lu Baolu <baolu.lu@linux.intel.com> wrote:
> >   
> >> Hi,
> >>  
> 
> [...]
> 
> >>> +			 * allow multiple bind calls with the
> >>> same PASID and pdev.
> >>> +			 */
> >>> +			sdev->users++;
> >>> +			goto out;
> >>> +		}  
> >>
> >> I remember I ever pointed this out before. But I forgot how we
> >> addressed it. So forgive me if this has been addressed.
> >>
> >> What if we have a valid bound svm but @dev doesn't belong to it
> >> (a.k.a. @dev not in svm->devs list)?
> >>  
> > If we are binding a new device to an existing/active PASID, the code
> > will allocate a new sdev and add that to the svm->devs list.  
> 
> But allocating a new sdev and adding device is in below else branch,
> so it will never reach there, right?
> 
No, allocating sdev is outside else branch.
> >>> +	} else {
> >>> +		/* We come here when PASID has never been bond to
> >>> a device. */
> >>> +		svm = kzalloc(sizeof(*svm), GFP_KERNEL);
> >>> +		if (!svm) {
> >>> +			ret = -ENOMEM;
> >>> +			goto out;
> >>> +		}
> >>> +		/* REVISIT: upper layer/VFIO can track host
> >>> process that bind the PASID.
> >>> +		 * ioasid_set = mm might be sufficient for vfio
> >>> to check pasid VMM
> >>> +		 * ownership.
> >>> +		 */
> >>> +		svm->mm = get_task_mm(current);
> >>> +		svm->pasid = data->hpasid;
> >>> +		if (data->flags & IOMMU_SVA_GPASID_VAL) {
> >>> +			svm->gpasid = data->gpasid;
> >>> +			svm->flags |= SVM_FLAG_GUEST_PASID;
> >>> +		}
> >>> +		ioasid_set_data(data->hpasid, svm);
> >>> +		INIT_LIST_HEAD_RCU(&svm->devs);
> >>> +		INIT_LIST_HEAD(&svm->list);
> >>> +
> >>> +		mmput(svm->mm);
> >>> +	}  
> >>
> >> A blank line, please.  
> > looks good.  
> >>  
> >>> +	sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
> >>> +	if (!sdev) {
> >>> +		if (list_empty(&svm->devs))
> >>> +			kfree(svm);  
> >>
> >> This is dangerous. This might leave a wild pointer bound with
> >> gpasid. 
> > why is that? can you please explain?
> > if the list is empty that means we just allocated the new svm, no
> > users. why can't we free it here?  
> 
> svm has been associated with the pasid private data. It needs to be
> unbound from pasid before getting freed. Otherwise, a wild pointer
> will be left.
> 
> 	ioasid_set_data(pasid, NULL);
> 	kfree(svm);
> 
Right, I need to clear the private data here. Thanks!

> >   
> >>> +		ret = -ENOMEM;
> >>> +		goto out;
> >>> +	}
> >>> +	sdev->dev = dev;
> >>> +	sdev->users = 1;
> >>> +
> >>> +	/* Set up device context entry for PASID if not enabled
> >>> already */
> >>> +	ret = intel_iommu_enable_pasid(iommu, sdev->dev);
> >>> +	if (ret) {
> >>> +		dev_err(dev, "Failed to enable PASID
> >>> capability\n");
> >>> +		kfree(sdev);
> >>> +		goto out;
> >>> +	}
> >>> +
> >>> +	/*
> >>> +	 * For guest bind, we need to set up PASID table entry as
> >>> follows:
> >>> +	 * - FLPM matches guest paging mode
> >>> +	 * - turn on nested mode
> >>> +	 * - SL guest address width matching
> >>> +	 */
> >>> +	ret = intel_pasid_setup_nested(iommu,
> >>> +				dev,
> >>> +				(pgd_t *)data->gpgd,
> >>> +				data->hpasid,
> >>> +				&data->vtd,
> >>> +				ddomain,
> >>> +				data->addr_width);
> >>> +	if (ret) {
> >>> +		dev_err(dev, "Failed to set up PASID %llu in
> >>> nested mode, Err %d\n",
> >>> +			data->hpasid, ret);  
> >>
> >> This error handling is insufficient. You should at least:
> >>
> >> 1. free sdev  
> > already done below
> >   
> >> 2. if list_empty(&svm->devs)
> >> 	unbound the svm from gpasid
> >> 	free svm
> >>  
> > yes, agreed.
> >   
> >> The same for above error handling. Add a branch for error recovery
> >> at the end of function might help here.
> >>  
> > not sure which code is the same as above? could you point it out?  
> 
> Above last comment. :-)
> 
Got it.
> >>> +		kfree(sdev);
> >>> +		goto out;
> >>> +	}
> >>> +	svm->flags |= SVM_FLAG_GUEST_MODE;
> >>> +
> >>> +	init_rcu_head(&sdev->rcu);
> >>> +	list_add_rcu(&sdev->list, &svm->devs);
> >>> + out:
> >>> +	mutex_unlock(&pasid_mutex);
> >>> +	return ret;
> >>> +}
> >>> +
> >>> +int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> >>> +{
> >>> +	struct intel_svm_dev *sdev;
> >>> +	struct intel_iommu *iommu;
> >>> +	struct intel_svm *svm;
> >>> +	int ret = -EINVAL;
> >>> +
> >>> +	mutex_lock(&pasid_mutex);
> >>> +	iommu = intel_svm_device_to_iommu(dev);
> >>> +	if (!iommu)
> >>> +		goto out;  
> >>
> >> Make it symmetrical with bind function.
> >>
> >> 	if (WARN_ON(!iommu))
> >> 		goto out;
> >>  
> > sounds good.  
> >>> +
> >>> +	svm = ioasid_find(NULL, pasid, NULL);
> >>> +	if (IS_ERR_OR_NULL(svm)) {
> >>> +		ret = PTR_ERR(svm);  
> >>
> >> If svm == NULL, this function will return success. This is not
> >> expected, right?
> >>  
> > good catch, will fix.  
> >>> +		goto out;
> >>> +	}
> >>> +
> >>> +	for_each_svm_dev(svm, dev) {
> >>> +		ret = 0;
> >>> +		sdev->users--;
> >>> +		if (!sdev->users) {
> >>> +			list_del_rcu(&sdev->list);
> >>> +			intel_pasid_tear_down_entry(iommu, dev,
> >>> svm->pasid);
> >>> +			/* TODO: Drain in flight PRQ for the
> >>> PASID since it
> >>> +			 * may get reused soon, we don't want to
> >>> +			 * confuse with its previous life.
> >>> +			 * intel_svm_drain_prq(dev, pasid);
> >>> +			 */
> >>> +			kfree_rcu(sdev, rcu);
> >>> +
> >>> +			if (list_empty(&svm->devs)) {
> >>> +				list_del(&svm->list);
> >>> +				kfree(svm);
> >>> +				/*
> >>> +				 * We do not free PASID here
> >>> until explicit call
> >>> +				 * from VFIO to free. The PASID
> >>> life cycle
> >>> +				 * management is largely tied to
> >>> VFIO management
> >>> +				 * of assigned device life
> >>> cycles. In case of
> >>> +				 * guest exit without a explicit
> >>> free PASID call,
> >>> +				 * the responsibility lies in
> >>> VFIO layer to free
> >>> +				 * the PASIDs allocated for the
> >>> guest.
> >>> +				 * For security reasons, VFIO has
> >>> to track the
> >>> +				 * PASID ownership per guest
> >>> anyway to ensure
> >>> +				 * that PASID allocated by one
> >>> guest cannot be
> >>> +				 * used by another.
> >>> +				 */
> >>> +				ioasid_set_data(pasid, NULL);  
> >>
> >> Exchange order. First unbind svm from gpasid and then free svm.
> >>  
> > I am not following, aren't we already doing free svm after unbind?
> > please explain.  
> 
> I meant
> 
> 	ioasid_set_data(pasid, NULL);
> 	kfree(svm);
> 
> in reverse order, it leaves a short window when svm is freed, but
> pasid private data is still kept svm (wild pointer).
> 
> 
Right. will fix
> >>> +			}
> >>> +		}
> >>> +		break;
> >>> +	}
> >>> + out:
> >>> +	mutex_unlock(&pasid_mutex);
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +
> >>>    int intel_svm_bind_mm(struct device *dev, int *pasid, int
> >>> flags, struct svm_dev_ops *ops) {
> >>>    	struct intel_iommu *iommu =
> >>> intel_svm_device_to_iommu(dev); diff --git
> >>> a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index
> >>> 3dba6ad3e9ad..6c74c71b1ebf 100644 ---
> >>> a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h
> >>> @@ -673,7 +673,9 @@ int intel_iommu_enable_pasid(struct
> >>> intel_iommu *iommu, struct device *dev); int
> >>> intel_svm_init(struct intel_iommu *iommu); extern int
> >>> intel_svm_enable_prq(struct intel_iommu *iommu); extern int
> >>> intel_svm_finish_prq(struct intel_iommu *iommu); -
> >>> +extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
> >>> +		struct device *dev, struct iommu_gpasid_bind_data
> >>> *data); +extern int intel_svm_unbind_gpasid(struct device *dev,
> >>> int pasid); struct svm_dev_ops;
> >>>    
> >>>    struct intel_svm_dev {
> >>> @@ -690,9 +692,13 @@ struct intel_svm_dev {
> >>>    struct intel_svm {
> >>>    	struct mmu_notifier notifier;
> >>>    	struct mm_struct *mm;
> >>> +
> >>>    	struct intel_iommu *iommu;
> >>>    	int flags;
> >>>    	int pasid;
> >>> +	int gpasid; /* Guest PASID in case of vSVA bind with
> >>> non-identity host
> >>> +		     * to guest PASID mapping.
> >>> +		     */
> >>>    	struct list_head devs;
> >>>    	struct list_head list;
> >>>    };
> >>> diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
> >>> index 94f047a8a845..a2c189ad0b01 100644
> >>> --- a/include/linux/intel-svm.h
> >>> +++ b/include/linux/intel-svm.h
> >>> @@ -44,6 +44,23 @@ struct svm_dev_ops {
> >>>     * do such IOTLB flushes automatically.
> >>>     */
> >>>    #define SVM_FLAG_SUPERVISOR_MODE	(1<<1)
> >>> +/*
> >>> + * The SVM_FLAG_GUEST_MODE flag is used when a guest process bind
> >>> to a device.
> >>> + * In this case the mm_struct is in the guest kernel or
> >>> userspace, its life
> >>> + * cycle is managed by VMM and VFIO layer. For IOMMU driver, this
> >>> API provides
> >>> + * means to bind/unbind guest CR3 with PASIDs allocated for a
> >>> device.
> >>> + */
> >>> +#define SVM_FLAG_GUEST_MODE	(1<<2)  
> >>
> >> How about keeping this aligned with top by adding a tab?
> >>  
> > sounds good.  
> >> BIT macro is preferred. Hence, make it BIT(1), BIT(2), BIT(3) is
> >> preferred.
> >>  
> > I know, but the existing mainline code is not using BIT, so I wanted
> > to keep coding style consistent. Perhaps a separate cleanup patch
> > will do later.  
> 
> It makes sense to me.
> 
> >>> +/*
> >>> + * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own
> >>> PASID space,
> >>> + * which requires guest and host PASID translation at both
> >>> directions. We keep
> >>> + * track of guest PASID in order to provide lookup service to
> >>> device drivers.
> >>> + * One such example is a physical function (PF) driver that
> >>> supports mediated
> >>> + * device (mdev) assignment. Guest programming of mdev
> >>> configuration space can
> >>> + * only be done with guest PASID, therefore PF driver needs to
> >>> find the matching
> >>> + * host PASID to program the real hardware.
> >>> + */
> >>> +#define SVM_FLAG_GUEST_PASID	(1<<3)  
> >>
> >> Ditto.
> >>
> >> Best regards,
> >> baolu  
> > 
> > [Jacob Pan]
> >   
> 
> Best regards,
> baolu

[Jacob Pan]
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

  reply index

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-24 19:54 [PATCH v7 00/11] Nested Shared Virtual Address (SVA) VT-d support Jacob Pan
2019-10-24 19:54 ` [PATCH v7 01/11] iommu/vt-d: Cache virtual command capability register Jacob Pan
2019-10-25  2:53   ` Lu Baolu
2019-10-25  6:06   ` Tian, Kevin
2019-11-08 10:32   ` Auger Eric
2019-10-24 19:54 ` [PATCH v7 02/11] iommu/vt-d: Enlightened PASID allocation Jacob Pan
2019-10-25  6:19   ` Tian, Kevin
2019-10-29 17:14     ` Jacob Pan
2019-10-29 18:16       ` Tian, Kevin
2019-11-08 10:33   ` Auger Eric
2019-11-08 22:22     ` Jacob Pan
2019-10-24 19:54 ` [PATCH v7 03/11] iommu/vt-d: Add custom allocator for IOASID Jacob Pan
2019-10-25  2:30   ` Lu Baolu
2019-10-25  4:43     ` Jacob Pan
2019-10-25  6:40       ` Tian, Kevin
2019-10-25 14:39         ` Lu Baolu
2019-10-25 15:52           ` Tian, Kevin
2019-10-28 22:49             ` Jacob Pan
2019-10-29  2:22               ` Lu Baolu
2019-10-25  6:31   ` Tian, Kevin
2019-10-28 22:52     ` Jacob Pan
2019-11-08 10:40   ` Auger Eric
2019-11-08 22:26     ` Jacob Pan
2019-10-24 19:54 ` [PATCH v7 04/11] iommu/vt-d: Replace Intel specific PASID allocator with IOASID Jacob Pan
2019-10-25  5:47   ` Lu Baolu
2019-11-01 18:29     ` Jacob Pan
2019-10-25  6:41   ` Tian, Kevin
2019-10-28 22:46     ` Jacob Pan
2019-11-08 11:30   ` Auger Eric
2019-11-08 22:55     ` Jacob Pan
2019-11-12  9:54       ` Auger Eric
2019-10-24 19:54 ` [PATCH v7 05/11] iommu/vt-d: Move domain helper to header Jacob Pan
2019-10-25  5:26   ` Lu Baolu
2019-10-24 19:54 ` [PATCH v7 06/11] iommu/vt-d: Avoid duplicated code for PASID setup Jacob Pan
2019-10-25  5:32   ` Lu Baolu
2019-10-25  6:42   ` Tian, Kevin
2019-10-28 22:41     ` Jacob Pan
2019-11-12  9:54   ` Auger Eric
2019-10-24 19:55 ` [PATCH v7 07/11] iommu/vt-d: Add nested translation helper function Jacob Pan
2019-10-25  7:04   ` Tian, Kevin
2019-11-01 21:10     ` Jacob Pan
2019-10-25 15:04   ` Lu Baolu
2019-10-25 16:06     ` Jacob Pan
2019-11-08 13:55   ` Auger Eric
2019-10-24 19:55 ` [PATCH v7 08/11] iommu/vt-d: Misc macro clean up for SVM Jacob Pan
2019-10-26  1:00   ` Lu Baolu
2019-10-28 22:38     ` Jacob Pan
2019-10-24 19:55 ` [PATCH v7 09/11] iommu/vt-d: Add bind guest PASID support Jacob Pan
2019-10-25  7:19   ` Tian, Kevin
2019-10-25 17:33     ` Jacob Pan
2019-10-28  6:03       ` Tian, Kevin
2019-10-28 16:02         ` Jacob Pan
2019-10-29  7:57           ` Tian, Kevin
2019-10-29 16:11             ` Jacob Pan
2019-10-29 18:04               ` Tian, Kevin
2019-10-29  2:33         ` Lu Baolu
2019-10-26  2:01   ` Lu Baolu
2019-10-28 22:29     ` Jacob Pan
2019-10-29  2:54       ` Lu Baolu
2019-10-29  4:11         ` Jacob Pan [this message]
2019-10-29  5:04           ` Lu Baolu
2019-10-24 19:55 ` [PATCH v7 10/11] iommu/vt-d: Support flushing more translation cache types Jacob Pan
2019-10-25  7:21   ` Tian, Kevin
2019-11-01 21:30     ` Jacob Pan
2019-10-26  2:22   ` Lu Baolu
2019-11-01 21:28     ` Jacob Pan
2019-11-08 16:18   ` Auger Eric
2019-11-08 23:05     ` Jacob Pan
2019-10-24 19:55 ` [PATCH v7 11/11] iommu/vt-d: Add svm/sva invalidate function Jacob Pan
2019-10-25  7:27   ` Tian, Kevin
2019-10-26  2:40     ` Lu Baolu
2019-10-26  7:03       ` Lu Baolu
2019-10-28  6:06         ` Tian, Kevin
2019-10-28 16:10           ` Jacob Pan
2019-10-29 18:52             ` Tian, Kevin
2019-10-29 19:25               ` Jacob Pan
2019-10-28 16:13     ` Jacob Pan
2019-11-12 10:28   ` Auger Eric

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191028211121.4ba2f332@jacob-builder \
    --to=jacob.jun.pan@linux.intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=dwmw2@infradead.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jean-philippe@linaro.com \
    --cc=jic23@kernel.org \
    --cc=kevin.tian@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

IOMMU Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-iommu/0 linux-iommu/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-iommu linux-iommu/ https://lore.kernel.org/linux-iommu \
		iommu@lists.linux-foundation.org
	public-inbox-index linux-iommu

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.linux-foundation.lists.iommu


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git