From: Jason Gunthorpe <jgg@nvidia.com>
To: Lu Baolu <baolu.lu@linux.intel.com>,
	bpf@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>,
	David Woodhouse <dwmw2@infradead.org>,
	iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
	Kevin Tian <kevin.tian@intel.com>,
	linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
	llvm@lists.linux.dev, Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Miguel Ojeda <ojeda@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Shuah Khan <shuah@kernel.org>,
	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
	Tom Rix <trix@redhat.com>, Will Deacon <will@kernel.org>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	Chaitanya Kulkarni <chaitanyak@nvidia.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Daniel Jordan <daniel.m.jordan@oracle.com>,
	David Gibson <david@gibson.dropbear.id.au>,
	Eric Auger <eric.auger@redhat.com>,
	Eric Farman <farman@linux.ibm.com>,
	Jason Wang <jasowang@redhat.com>,
	Jean-Philippe Brucker <jean-philippe@linaro.org>,
	Joao Martins <joao.m.martins@oracle.com>,
	kvm@vger.kernel.org, Matthew Rosato <mjrosato@linux.ibm.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Nicolin Chen <nicolinc@nvidia.com>,
	Niklas Schnelle <schnelle@linux.ibm.com>,
	Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>,
	Yi Liu <yi.l.liu@intel.com>, Keqian Zhu <zhukeqian1@huawei.com>
Subject: Re: [PATCH v3 15/15] iommufd: Add a selftest
Date: Thu, 3 Nov 2022 22:01:17 -0300	[thread overview]
Message-ID: <Y2RkXV+q0Q6bdTde@nvidia.com> (raw)
In-Reply-To: <15-v3-402a7d6459de+24b-iommufd_jgg@nvidia.com>

On Tue, Oct 25, 2022 at 03:12:24PM -0300, Jason Gunthorpe wrote:

> +static void iommufd_test_access_unmap(void *data, unsigned long iova,
> +				      unsigned long length)
> +{
> +	unsigned long iova_last = iova + length - 1;
> +	struct selftest_access *staccess = data;
> +	struct selftest_access_item *item;
> +	struct selftest_access_item *tmp;
> +
> +	spin_lock(&staccess->lock);
> +	list_for_each_entry_safe(item, tmp, &staccess->items, items_elm) {
> +		if (iova > item->iova_end || iova_last < item->iova)
> +			continue;
> +		list_del(&item->items_elm);
> +		spin_unlock(&staccess->lock);
> +		iommufd_access_unpin_pages(staccess->access, item->iova,
> +					   item->length);
> +		kfree(item);
> +		spin_lock(&staccess->lock);
> +	}
> +	spin_unlock(&staccess->lock);
> +}

> +static int iommufd_test_access_pages(struct iommufd_ucmd *ucmd,
> +				     unsigned int access_id, unsigned long iova,
> +				     size_t length, void __user *uptr,
> +				     u32 flags)
> +{
> +	struct iommu_test_cmd *cmd = ucmd->cmd;
> +	struct selftest_access_item *item;
> +	struct selftest_access *staccess;
> +	struct page **pages;
> +	size_t npages;
> +	int rc;
> +
> +	if (flags & ~MOCK_FLAGS_ACCESS_WRITE)
> +		return -EOPNOTSUPP;
> +
> +	staccess = iommufd_access_get(access_id);
> +	if (IS_ERR(staccess))
> +		return PTR_ERR(staccess);
> +
> +	npages = (ALIGN(iova + length, PAGE_SIZE) -
> +		  ALIGN_DOWN(iova, PAGE_SIZE)) /
> +		 PAGE_SIZE;
> +	pages = kvcalloc(npages, sizeof(*pages), GFP_KERNEL_ACCOUNT);
> +	if (!pages) {
> +		rc = -ENOMEM;
> +		goto out_put;
> +	}
> +
> +	rc = iommufd_access_pin_pages(staccess->access, iova, length, pages,
> +				      flags & MOCK_FLAGS_ACCESS_WRITE);
> +	if (rc)
> +		goto out_free_pages;
> +
> +	rc = iommufd_test_check_pages(
> +		uptr - (iova - ALIGN_DOWN(iova, PAGE_SIZE)), pages, npages);
> +	if (rc)
> +		goto out_unaccess;
> +
> +	item = kzalloc(sizeof(*item), GFP_KERNEL_ACCOUNT);
> +	if (!item) {
> +		rc = -ENOMEM;
> +		goto out_unaccess;
> +	}
> +
> +	item->iova = iova;
> +	item->length = length;
> +	spin_lock(&staccess->lock);
> +	item->id = staccess->next_id++;
> +	list_add_tail(&item->items_elm, &staccess->items);
> +	spin_unlock(&staccess->lock);

I haven't been remarking on the bugs that syzkaller finds in the test
suite itself (sigh), but this one is surprising and complicated enough
to deserve some wider attention.

VFIO has a protocol, now mapped into iommufd, that allows an external
driver to convert an IOVA into struct page pointers. iommufd natively
represents this as the sequence:

  access = iommufd_access_create(ops)
  iommufd_access_pin_pages(access, iova, length, pages)
  iommufd_access_unpin_pages(access, iova, length)
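For readers new to this kAPI, the pin/unpin half of the sequence above
can be modeled in userspace roughly as below. This is a sketch under
stated assumptions: the mock_* names, the 4 KiB page size and the fixed
16-page space are hypothetical, and the ops/callback side is left out.
It only shows that pins are page granular (the same npages arithmetic
as the ALIGN/ALIGN_DOWN computation in the quoted patch) and that every
pin must be matched by an unpin of the same range.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model of the access protocol; these mock_* names
 * are not the iommufd kAPI. Pins are tracked as per-page refcounts.
 */
#define MOCK_PAGE_SIZE 4096UL
#define MOCK_NPAGES 16UL

struct mock_access {
	unsigned int pin_count[MOCK_NPAGES];
};

static struct mock_access *mock_access_create(void)
{
	return calloc(1, sizeof(struct mock_access));
}

/*
 * Pin every page touched by [iova, iova + length). Returns the number of
 * pages pinned, i.e. (ALIGN(iova + length) - ALIGN_DOWN(iova)) / PAGE_SIZE,
 * or -1 if the range is empty, wraps, or falls outside the model.
 */
static long mock_access_pin_pages(struct mock_access *access,
				  unsigned long iova, unsigned long length)
{
	unsigned long first = iova / MOCK_PAGE_SIZE;
	unsigned long last;
	unsigned long i;

	if (length == 0 || iova + length - 1 < iova)
		return -1;
	last = (iova + length - 1) / MOCK_PAGE_SIZE;
	if (last >= MOCK_NPAGES)
		return -1;
	for (i = first; i <= last; i++)
		access->pin_count[i]++;
	return (long)(last - first + 1);
}

/* Undo a prior pin; callers must pass exactly the iova/length they pinned. */
static void mock_access_unpin_pages(struct mock_access *access,
				    unsigned long iova, unsigned long length)
{
	unsigned long first = iova / MOCK_PAGE_SIZE;
	unsigned long last = (iova + length - 1) / MOCK_PAGE_SIZE;
	unsigned long i;

	for (i = first; i <= last; i++) {
		assert(access->pin_count[i] > 0);
		access->pin_count[i]--;
	}
}
```

The ops argument of iommufd_access_create() is deliberately missing
here; the unmap callback it carries is what the rest of this mail is
about.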

One quirk of the VFIO design is that when userspace does an unmap, the
unmap must succeed; as with a HW IOMMU, the above pin_pages is revoked
and the external driver must stop accessing that memory. iommufd
achieves this by calling a callback:

static const struct iommufd_access_ops selftest_access_ops = {
	.unmap = iommufd_test_access_unmap,
};

The callback has the invariant that, by the time it returns, the
unpin_pages for the unmapped range must have completed.

This all sounds simple enough, but when you throw syzkaller at it, it
generates all kinds of races, including something like this:

            CPU1                         CPU2                 CPU3
    iommufd_access_create()
    iommufd_access_pin_pages()
                                       unmap_all()
                                         iommufd_test_access_unmap()
                                                            unmap_all()
                                                             iommufd_test_access_unmap()

    spin_lock(&staccess->lock);
    list_add_tail(&item->items_elm, &staccess->items);

Because the list_add_tail() runs after both unmap callbacks have
already walked the list, iommufd_test_access_unmap() never sees the
item and never undoes the pin, which triggers a WARN_ON.
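To make the failure concrete, here is a single-threaded model that
replays the CPU1/CPU2 interleaving by hand. Everything in it is
hypothetical (mock_access_ops, mock_core_unmap and friends are not
iommufd code); the core's post-callback check only stands in for the
WARN_ON, and the model treats every pin as overlapping the unmapped
range.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical single-threaded model, not the iommufd API: it replays the
 * pin / unmap / record interleaving step by step.
 */
struct mock_access_ops {
	void (*unmap)(void *data, unsigned long iova, unsigned long length);
};

struct mock_driver {
	unsigned int recorded_pins; /* pins the driver knows it holds */
};

static unsigned int core_pins; /* pins the core has handed out */

static void mock_core_pin(void)
{
	core_pins++;
}

/* Driver callback: undo every recorded pin (all treated as overlapping). */
static void mock_driver_unmap(void *data, unsigned long iova,
			      unsigned long length)
{
	struct mock_driver *drv = data;

	(void)iova;
	(void)length;
	while (drv->recorded_pins) {
		drv->recorded_pins--;
		core_pins--;
	}
}

static const struct mock_access_ops mock_ops = {
	.unmap = mock_driver_unmap,
};

/* Core unmap: call the driver back, then enforce the invariant. */
static int mock_core_unmap(void *data, unsigned long iova,
			   unsigned long length)
{
	mock_ops.unmap(data, iova, length);
	if (core_pins)
		return -EDEADLOCK; /* stands in for the WARN_ON + error */
	return 0;
}
```

Calling mock_core_pin(), then incrementing drv.recorded_pins, then
mock_core_unmap() returns 0. Moving the increment after
mock_core_unmap(), which is what the late list_add_tail() amounts to,
leaves core_pins at 1 and returns -EDEADLOCK.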

The only way I can see to solve this is to hold a serializing lock
across iommufd_access_pin_pages() so that neither
iommufd_test_access_unmap() call can progress until the pin has
completed and its record has been stored.
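A threaded sketch of that fix, again as a userspace model with
hypothetical names rather than the selftest itself: the driver holds
drv_lock across the pin and the list insertion, and the unmap callback
takes the same lock, so a callback that starts after a successful pin
cannot return until that pin is recorded and undone. The core side
(prevent_access, warn_count) is an assumption about the shape of the
check, not iommufd's implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model, not the iommufd API. The "core" refuses new
 * pins once an unmap starts and checks for leftovers after the callback; the
 * "driver" holds drv_lock across pin + record, as in the fix.
 */
struct pin_item {
	struct pin_item *next;
};

static pthread_mutex_t core_lock = PTHREAD_MUTEX_INITIALIZER;
static bool prevent_access; /* set once an unmap has started */
static int num_pins;        /* pins the core has handed out */
static int warn_count;      /* models the WARN_ON: pins left after unmap */

static pthread_mutex_t drv_lock = PTHREAD_MUTEX_INITIALIZER;
static struct pin_item *drv_items; /* driver's record of its pins */

static int core_pin(void)
{
	int rc = -1;
	pthread_mutex_lock(&core_lock);
	if (!prevent_access) {
		num_pins++;
		rc = 0;
	}
	pthread_mutex_unlock(&core_lock);
	return rc;
}

static void core_unpin(void)
{
	pthread_mutex_lock(&core_lock);
	num_pins--;
	pthread_mutex_unlock(&core_lock);
}

/* Driver unmap callback: on return every recorded pin has been undone. */
static void drv_unmap_cb(void)
{
	pthread_mutex_lock(&drv_lock);
	while (drv_items) {
		struct pin_item *item = drv_items;
		drv_items = item->next;
		core_unpin();
		free(item);
	}
	pthread_mutex_unlock(&drv_lock);
}

/* Driver pin path: no unmap callback can run between pin and record. */
static int drv_pin(void)
{
	struct pin_item *item = malloc(sizeof(*item));
	int rc = -1;
	if (!item)
		return -1;
	pthread_mutex_lock(&drv_lock);
	if (core_pin() == 0) {
		item->next = drv_items;
		drv_items = item;
		item = NULL;
		rc = 0;
	}
	pthread_mutex_unlock(&drv_lock);
	free(item); /* only non-NULL if the pin failed */
	return rc;
}

/* Core unmap: block new pins, call the driver back, then check. */
static void *core_unmap(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&core_lock);
	prevent_access = true;
	pthread_mutex_unlock(&core_lock);
	drv_unmap_cb();
	pthread_mutex_lock(&core_lock);
	if (num_pins != 0)
		warn_count++;
	pthread_mutex_unlock(&core_lock);
	return NULL;
}

static void *pin_thread(void *unused)
{
	(void)unused;
	(void)drv_pin();
	return NULL;
}
```

Lock ordering is always drv_lock before core_lock, and core_unmap()
drops core_lock before calling back into the driver, so this model
cannot deadlock; a real driver has to establish the equivalent property
for its own locks.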

Fortunately, the iommufd design lets us hold a lock like this across
these calls, and in the op callback, without deadlocking. I can't
recall whether VFIO can do the same; I suspect not, since I remember
needing to avoid that kind of locking for deadlock reasons.

I doubt any mdev drivers do this properly, so expect some oddball
bugs. Thankfully this doesn't harm kernel integrity, but it does leave
a mess for a userspace vIOMMU that is tracking a guest command to unmap
an IOVA range when the kernel throws a WARN_ON and returns EDEADLOCK. I
guess sleep and retry?
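If a vIOMMU did go the sleep-and-retry route floated above, it might
look something like the following. This is a sketch of that speculative
workaround only: unmap_fn and fake_unmap are hypothetical stand-ins for
whatever issues the unmap ioctl, and the 1 ms delay and retry bound are
arbitrary choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * Hypothetical: unmap_fn stands in for whatever issues the unmap ioctl and
 * returns 0 or a negative errno. Nothing here is an iommufd or VFIO API.
 */
typedef int (*unmap_fn_t)(void *ctx);

static int unmap_with_retry(unmap_fn_t unmap_fn, void *ctx,
			    unsigned int max_tries)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */
	unsigned int i;
	int rc = -EDEADLOCK;

	for (i = 0; i < max_tries; i++) {
		if (i)
			nanosleep(&delay, NULL); /* back off before retrying */
		rc = unmap_fn(ctx);
		if (rc != -EDEADLOCK)
			break; /* success, or an error retrying won't fix */
	}
	return rc;
}

/* Demo stand-in: fails with -EDEADLOCK while *ctx is positive. */
static int fake_unmap(void *ctx)
{
	int *fails_left = ctx;

	if (*fails_left > 0) {
		(*fails_left)--;
		return -EDEADLOCK;
	}
	return 0;
}
```

Bounding the retries keeps a persistently misbehaving driver from
wedging the vIOMMU forever; any other errno is returned immediately
since retrying cannot help.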

Anyhow, the patch below seems to fix it. This is also the last open
syzkaller bug; the rest were duplicates of the prior one. Now we wait
for it to find something else.

Jason

@@ -420,7 +420,7 @@ static int iommufd_test_md_check_refs(struct iommufd_ucmd *ucmd,
 struct selftest_access {
 	struct iommufd_access *access;
 	struct file *file;
-	spinlock_t lock;
+	struct mutex lock;
 	struct list_head items;
 	unsigned int next_id;
 	bool destroying;
@@ -458,19 +458,17 @@ static void iommufd_test_access_unmap(void *data, unsigned long iova,
 	struct selftest_access_item *item;
 	struct selftest_access_item *tmp;
 
-	spin_lock(&staccess->lock);
+	mutex_lock(&staccess->lock);
 	list_for_each_entry_safe(item, tmp, &staccess->items, items_elm) {
 		if (iova > item->iova + item->length - 1 ||
 		    iova_last < item->iova)
 			continue;
 		list_del(&item->items_elm);
-		spin_unlock(&staccess->lock);
 		iommufd_access_unpin_pages(staccess->access, item->iova,
 					   item->length);
 		kfree(item);
-		spin_lock(&staccess->lock);
 	}
-	spin_unlock(&staccess->lock);
+	mutex_unlock(&staccess->lock);
 }
 
 static int iommufd_test_access_item_destroy(struct iommufd_ucmd *ucmd,
@@ -484,19 +482,19 @@ static int iommufd_test_access_item_destroy(struct iommufd_ucmd *ucmd,
 	if (IS_ERR(staccess))
 		return PTR_ERR(staccess);
 
-	spin_lock(&staccess->lock);
+	mutex_lock(&staccess->lock);
 	list_for_each_entry(item, &staccess->items, items_elm) {
 		if (item->id == item_id) {
 			list_del(&item->items_elm);
-			spin_unlock(&staccess->lock);
 			iommufd_access_unpin_pages(staccess->access, item->iova,
 						   item->length);
+			mutex_unlock(&staccess->lock);
 			kfree(item);
 			fput(staccess->file);
 			return 0;
 		}
 	}
-	spin_unlock(&staccess->lock);
+	mutex_unlock(&staccess->lock);
 	fput(staccess->file);
 	return -ENOENT;
 }
@@ -510,6 +508,7 @@ static int iommufd_test_staccess_release(struct inode *inode,
 		iommufd_test_access_unmap(staccess, 0, ULONG_MAX);
 		iommufd_access_destroy(staccess->access);
 	}
+	mutex_destroy(&staccess->lock);
 	kfree(staccess);
 	return 0;
 }
@@ -536,7 +535,7 @@ static struct selftest_access *iommufd_test_alloc_access(void)
 	if (!staccess)
 		return ERR_PTR(-ENOMEM);
 	INIT_LIST_HEAD(&staccess->items);
-	spin_lock_init(&staccess->lock);
+	mutex_init(&staccess->lock);
 
 	filep = anon_inode_getfile("[iommufd_test_staccess]",
 				   &iommfd_test_staccess_fops, staccess,
@@ -662,10 +661,20 @@ static int iommufd_test_access_pages(struct iommufd_ucmd *ucmd,
 		goto out_put;
 	}
 
+	/*
+	 * Drivers will need to think very carefully about this locking. The
+	 * core code can do multiple unmaps instantaneously after
+	 * iommufd_access_pin_pages() and *all* the unmaps must not return until
+	 * the range is unpinned. This simple implementation puts a global lock
+	 * around the pin, which may not suit drivers that want this to be a
+	 * performance path. Drivers that get this wrong will trigger WARN_ON
+	 * races and cause EDEADLOCK failures to userspace.
+	 */
+	mutex_lock(&staccess->lock);
 	rc = iommufd_access_pin_pages(staccess->access, iova, length, pages,
 				      flags & MOCK_FLAGS_ACCESS_WRITE);
 	if (rc)
-		goto out_free_pages;
+		goto out_unlock;
 
 	/* For syzkaller allow uptr to be NULL to skip this check */
 	if (uptr) {
@@ -684,25 +693,22 @@ static int iommufd_test_access_pages(struct iommufd_ucmd *ucmd,
 
 	item->iova = iova;
 	item->length = length;
-	spin_lock(&staccess->lock);
 	item->id = staccess->next_id++;
 	list_add_tail(&item->items_elm, &staccess->items);
-	spin_unlock(&staccess->lock);
 
 	cmd->access_pages.out_access_pages_id = item->id;
 	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
 	if (rc)
 		goto out_free_item;
-	goto out_free_pages;
+	goto out_unlock;
 
 out_free_item:
-	spin_lock(&staccess->lock);
 	list_del(&item->items_elm);
-	spin_unlock(&staccess->lock);
 	kfree(item);
 out_unaccess:
 	iommufd_access_unpin_pages(staccess->access, iova, length);
-out_free_pages:
+out_unlock:
+	mutex_unlock(&staccess->lock);
 	kvfree(pages);
 out_put:
 	fput(staccess->file);
