linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] xen/gntdev: Fixes for leaks and VMA splitting
@ 2022-10-02 22:20 M. Vefa Bicakci
  2022-10-02 22:20 ` [PATCH v2 1/2] xen/gntdev: Prevent leaking grants M. Vefa Bicakci
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: M. Vefa Bicakci @ 2022-10-02 22:20 UTC (permalink / raw)
  To: xen-devel, linux-kernel
  Cc: m.v.b, Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Demi Marie Obenour, Gerd Hoffmann

Hi all,

First of all, sorry for the delay!

These patches continue the code review for the following patches:
  https://lore.kernel.org/xen-devel/20220912040002.198191-1-m.v.b@runbox.com/t/#u

The original description of the patch set is as follows:

  "The changes in this patch series intend to fix the Xen grant device
  driver, so that grant mapping leaks caused by partially failed grant
  mapping operations are avoided with the first patch, and so that the
  splitting of VMAs does not result in incorrectly unmapped grant pages
  with the second patch. The second patch also prevents a similar issue
  in a double-mapping scenario, where mmap() is used with MAP_FIXED to
  map grants over an existing mapping created with the same grants, and
  where grant pages are unmapped incorrectly as well."

A summary of the changes from v1 is as follows:
- Addressed Juergen's code review comment regarding the first patch.
- Amended the description of the second patch to note that the described
  issues are encountered with PV domains.

Verification notes:

- I have tested these commits on top of Linux v5.15.70 and v5.15.71, and
  I verified that they compile successfully on top of the tag
  "next-20220930", which corresponds to the base commit ID included at
  the bottom of this e-mail.

- My tests consist of using a kernel with Qubes OS v4.1's patches and
  these patches on my main computer for day-to-day tasks, in conjunction
  with Qubes OS's version of the Xen hypervisor v4.14.5, with the latter
  custom-compiled with CONFIG_DEBUG.

- I used a test program that verifies the following scenarios with an
  unprivileged paravirtualized (PV) Xen domain:

  - A program mmap()s two pages from another Xen domain and munmap()s
    the pages one by one. This used to result in implicit unmap errors
    to be reported by Xen and a general protection fault to be triggered
    by Xen in the affected domain, but now works as expected.
  - A program mmap()s two pages from another Xen domain and then
    attempts to remap (via MAP_FIXED) the same mapping again over the
    same virtual address. This used to result in similar issues
    (implicit unmap errors and general protection fault), but now is
    rejected by the kernel.
  - A program mmap()s two pages from another Xen domain and then
    attempts to mmap() the same mapping again to a different virtual
    address, by passing NULL as mmap()'s first argument. This used to be
    rejected by the kernel, and it continues to be rejected by the
    kernel.

- Unprivileged PVH Xen domains were also sanity tested with the same
  test program. I should note that PVH domains worked as expected
  without these patches too.

- Finally, I have verified that the original "g.e. 0x1234 still pending"
  issue does not appear after rapidly resizing GUI windows in Qubes OS
  v4.1.

Thank you,

Vefa

M. Vefa Bicakci (2):
  xen/gntdev: Prevent leaking grants
  xen/gntdev: Accommodate VMA splitting

 drivers/xen/gntdev-common.h |  3 +-
 drivers/xen/gntdev.c        | 80 +++++++++++++++++++------------------
 2 files changed, 44 insertions(+), 39 deletions(-)


base-commit: 274d7803837da78dfc911bcda0d593412676fc20
-- 
2.37.3


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/2] xen/gntdev: Prevent leaking grants
  2022-10-02 22:20 [PATCH v2 0/2] xen/gntdev: Fixes for leaks and VMA splitting M. Vefa Bicakci
@ 2022-10-02 22:20 ` M. Vefa Bicakci
  2022-10-03  0:29   ` Demi Marie Obenour
  2022-10-06  8:33   ` Juergen Gross
  2022-10-02 22:20 ` [PATCH v2 2/2] xen/gntdev: Accommodate VMA splitting M. Vefa Bicakci
  2022-10-07  5:17 ` [PATCH v2 0/2] xen/gntdev: Fixes for leaks and " Juergen Gross
  2 siblings, 2 replies; 12+ messages in thread
From: M. Vefa Bicakci @ 2022-10-02 22:20 UTC (permalink / raw)
  To: xen-devel, linux-kernel
  Cc: m.v.b, Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Demi Marie Obenour, Gerd Hoffmann

Prior to this commit, if a grant mapping operation failed partially,
some of the entries in the map_ops array would be invalid, whereas all
of the entries in the kmap_ops array would be valid. This in turn would
cause the following logic in gntdev_map_grant_pages to become invalid:

  for (i = 0; i < map->count; i++) {
    if (map->map_ops[i].status == GNTST_okay) {
      map->unmap_ops[i].handle = map->map_ops[i].handle;
      if (!use_ptemod)
        alloced++;
    }
    if (use_ptemod) {
      if (map->kmap_ops[i].status == GNTST_okay) {
        if (map->map_ops[i].status == GNTST_okay)
          alloced++;
        map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
      }
    }
  }
  ...
  atomic_add(alloced, &map->live_grants);

Assume that use_ptemod is true (i.e., the domain mapping the granted
pages is a paravirtualized domain). In the code excerpt above, note that
the "alloced" variable is only incremented when both kmap_ops[i].status
and map_ops[i].status are set to GNTST_okay (i.e., both mapping
operations are successful).  However, as also noted above, there are
cases where a grant mapping operation fails partially, breaking the
assumption of the code excerpt above.

The aforementioned causes map->live_grants to be incorrectly set. In
some cases, all of the map_ops mappings fail, but all of the kmap_ops
mappings succeed, meaning that live_grants may remain zero. This in turn
makes it impossible to unmap the successfully grant-mapped pages pointed
to by kmap_ops, because unmap_grant_pages has the following snippet of
code at its beginning:

  if (atomic_read(&map->live_grants) == 0)
    return; /* Nothing to do */

In other cases where only some of the map_ops mappings fail but all
kmap_ops mappings succeed, live_grants is made positive, but when the
user requests unmapping the grant-mapped pages, __unmap_grant_pages_done
will then make map->live_grants negative, because the latter function
does not check if all of the pages that were requested to be unmapped
were actually unmapped, and the same function unconditionally subtracts
"data->count" (i.e., a value that can be greater than map->live_grants)
from map->live_grants. The side effects of a negative live_grants value
have not been studied.

The net effect of all of this is that grant references are leaked in one
of the above conditions. In Qubes OS v4.1 (which uses Xen's grant
mechanism extensively for X11 GUI isolation), this issue manifests
itself with warning messages like the following to be printed out by the
Linux kernel in the VM that had granted pages (that contain X11 GUI
window data) to dom0: "g.e. 0x1234 still pending", especially after the
user rapidly resizes GUI VM windows (causing some grant-mapping
operations to partially or completely fail, due to the fact that the VM
unshares some of the pages as part of the window resizing, making the
pages impossible to grant-map from dom0).

The fix for this issue involves counting all successful map_ops and
kmap_ops mappings separately, and then adding the sum to live_grants.
During unmapping, only the number of successfully unmapped grants is
subtracted from live_grants. The code is also modified to check for
negative live_grants values after the subtraction and warn the user.

Link: https://github.com/QubesOS/qubes-issues/issues/7631
Fixes: dbe97cff7dd9 ("xen/gntdev: Avoid blocking in unmap_grant_pages()")
Cc: stable@vger.kernel.org
Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>
---

Changes since v1:
- To determine which unmap operations were successful, the previous
  version of this patch set the "unmap_ops[i].status" and
  "kunmap_ops[i].status" fields to the value "1" prior to passing these
  data structures to the hypervisor. Instead of doing that, the code now
  checks whether the "handle" fields in the same data structures were
  *not* set to "INVALID_GRANT_HANDLE". (Suggested by Juergen Gross.)
---
 drivers/xen/gntdev.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 84b143eef395..eb0586b9767d 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -367,8 +367,7 @@ int gntdev_map_grant_pages(struct gntdev_grant_map *map)
 	for (i = 0; i < map->count; i++) {
 		if (map->map_ops[i].status == GNTST_okay) {
 			map->unmap_ops[i].handle = map->map_ops[i].handle;
-			if (!use_ptemod)
-				alloced++;
+			alloced++;
 		} else if (!err)
 			err = -EINVAL;
 
@@ -377,8 +376,7 @@ int gntdev_map_grant_pages(struct gntdev_grant_map *map)
 
 		if (use_ptemod) {
 			if (map->kmap_ops[i].status == GNTST_okay) {
-				if (map->map_ops[i].status == GNTST_okay)
-					alloced++;
+				alloced++;
 				map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
 			} else if (!err)
 				err = -EINVAL;
@@ -394,8 +392,14 @@ static void __unmap_grant_pages_done(int result,
 	unsigned int i;
 	struct gntdev_grant_map *map = data->data;
 	unsigned int offset = data->unmap_ops - map->unmap_ops;
+	int successful_unmaps = 0;
+	int live_grants;
 
 	for (i = 0; i < data->count; i++) {
+		if (map->unmap_ops[offset + i].status == GNTST_okay &&
+		    map->unmap_ops[offset + i].handle != INVALID_GRANT_HANDLE)
+			successful_unmaps++;
+
 		WARN_ON(map->unmap_ops[offset + i].status != GNTST_okay &&
 			map->unmap_ops[offset + i].handle != INVALID_GRANT_HANDLE);
 		pr_debug("unmap handle=%d st=%d\n",
@@ -403,6 +407,10 @@ static void __unmap_grant_pages_done(int result,
 			map->unmap_ops[offset+i].status);
 		map->unmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
 		if (use_ptemod) {
+			if (map->kunmap_ops[offset + i].status == GNTST_okay &&
+			    map->kunmap_ops[offset + i].handle != INVALID_GRANT_HANDLE)
+				successful_unmaps++;
+
 			WARN_ON(map->kunmap_ops[offset + i].status != GNTST_okay &&
 				map->kunmap_ops[offset + i].handle != INVALID_GRANT_HANDLE);
 			pr_debug("kunmap handle=%u st=%d\n",
@@ -411,11 +419,15 @@ static void __unmap_grant_pages_done(int result,
 			map->kunmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
 		}
 	}
+
 	/*
 	 * Decrease the live-grant counter.  This must happen after the loop to
 	 * prevent premature reuse of the grants by gnttab_mmap().
 	 */
-	atomic_sub(data->count, &map->live_grants);
+	live_grants = atomic_sub_return(successful_unmaps, &map->live_grants);
+	if (WARN_ON(live_grants < 0))
+		pr_err("%s: live_grants became negative (%d) after unmapping %d pages!\n",
+		       __func__, live_grants, successful_unmaps);
 
 	/* Release reference taken by __unmap_grant_pages */
 	gntdev_put_map(NULL, map);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/2] xen/gntdev: Accommodate VMA splitting
  2022-10-02 22:20 [PATCH v2 0/2] xen/gntdev: Fixes for leaks and VMA splitting M. Vefa Bicakci
  2022-10-02 22:20 ` [PATCH v2 1/2] xen/gntdev: Prevent leaking grants M. Vefa Bicakci
@ 2022-10-02 22:20 ` M. Vefa Bicakci
  2022-10-06  8:36   ` Juergen Gross
  2022-10-07  5:17 ` [PATCH v2 0/2] xen/gntdev: Fixes for leaks and " Juergen Gross
  2 siblings, 1 reply; 12+ messages in thread
From: M. Vefa Bicakci @ 2022-10-02 22:20 UTC (permalink / raw)
  To: xen-devel, linux-kernel
  Cc: m.v.b, Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Demi Marie Obenour, Gerd Hoffmann

Prior to this commit, the gntdev driver code did not handle the
following scenario correctly with paravirtualized (PV) Xen domains:

* User process sets up a gntdev mapping composed of two grant mappings
  (i.e., two pages shared by another Xen domain).
* User process munmap()s one of the pages.
* User process munmap()s the remaining page.
* User process exits.

In the scenario above, the user process would cause the kernel to log
the following messages in dmesg for the first munmap(), and the second
munmap() call would result in similar log messages:

  BUG: Bad page map in process doublemap.test  pte:... pmd:...
  page:0000000057c97bff refcount:1 mapcount:-1 \
    mapping:0000000000000000 index:0x0 pfn:...
  ...
  page dumped because: bad pte
  ...
  file:gntdev fault:0x0 mmap:gntdev_mmap [xen_gntdev] readpage:0x0
  ...
  Call Trace:
   <TASK>
   dump_stack_lvl+0x46/0x5e
   print_bad_pte.cold+0x66/0xb6
   unmap_page_range+0x7e5/0xdc0
   unmap_vmas+0x78/0xf0
   unmap_region+0xa8/0x110
   __do_munmap+0x1ea/0x4e0
   __vm_munmap+0x75/0x120
   __x64_sys_munmap+0x28/0x40
   do_syscall_64+0x38/0x90
   entry_SYSCALL_64_after_hwframe+0x61/0xcb
   ...

For each munmap() call, the Xen hypervisor (if built with CONFIG_DEBUG)
would print out the following and trigger a general protection fault in
the affected Xen PV domain:

  (XEN) d0v... Attempt to implicitly unmap d0's grant PTE ...
  (XEN) d0v... Attempt to implicitly unmap d0's grant PTE ...

As of this writing, gntdev_grant_map structure's vma field (referred to
as map->vma below) is mainly used for checking the start and end
addresses of mappings. However, with split VMAs, these may change, and
there could be more than one VMA associated with a gntdev mapping.
Hence, remove the use of map->vma and rely on map->pages_vm_start for
the original start address and on (map->count << PAGE_SHIFT) for the
original mapping size. Let the invalidate() and find_special_page()
hooks use these.

Also, given that there can be multiple VMAs associated with a gntdev
mapping, move the "mmu_interval_notifier_remove(&map->notifier)" call to
the end of gntdev_put_map, so that the MMU notifier is only removed
after the closing of the last remaining VMA.

Finally, use an atomic to prevent inadvertent gntdev mapping re-use,
instead of using the map->live_grants atomic counter and/or the map->vma
pointer (the latter of which is now removed). This prevents the
userspace from mmap()'ing (with MAP_FIXED) a gntdev mapping over the
same address range as a previously set up gntdev mapping. This scenario
can be summarized with the following call-trace, which was valid prior
to this commit:

  mmap
    gntdev_mmap
  mmap (repeat mmap with MAP_FIXED over the same address range)
    gntdev_invalidate
      unmap_grant_pages (sets 'being_removed' entries to true)
        gnttab_unmap_refs_async
    unmap_single_vma
    gntdev_mmap (maps the shared pages again)
  munmap
    gntdev_invalidate
      unmap_grant_pages
        (no-op because 'being_removed' entries are true)
    unmap_single_vma (For PV domains, Xen reports that a granted page
      is being unmapped and triggers a general protection fault in the
      affected domain, if Xen was built with CONFIG_DEBUG)

The fix for this last scenario could be worth its own commit, but we
opted for a single commit, because removing the gntdev_grant_map
structure's vma field requires guarding the entry to gntdev_mmap(), and
the live_grants atomic counter is not sufficient on its own to prevent
the mmap() over a pre-existing mapping.

Link: https://github.com/QubesOS/qubes-issues/issues/7631
Fixes: ab31523c2fca ("xen/gntdev: allow usermode to map granted pages")
Cc: stable@vger.kernel.org
Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>
---

Changes since v1:
- Amended the patch description to note that the reported issues affect
  paravirtualized (PV) Xen domains.
- Note that Juergen Gross reviewed the first version of this patch.
  (Thanks!) I did not add a Reviewed-by tag in v2, because I have
  since amended the patch description.
- No source code changes.

My original note follows.

Note for reviewers:

I am not 100% sure if the "Fixes" tag is correct. Based on a quick look
at the history of the modified file, I am under the impression that VMA
splits could be broken for the Xen gntdev driver since day 1 (i.e.,
v2.6.38), but I did not yet attempt to verify this by testing older
kernels where the gntdev driver's code is sufficiently similar.

Also, resetting the being_removed flags to false after the completion of
unmap operation could be another potential solution (that I have not yet
tested in the context of this change) to the mmap and MAP_FIXED issue
discussed at the end of the patch description.
---
 drivers/xen/gntdev-common.h |  3 +-
 drivers/xen/gntdev.c        | 58 ++++++++++++++++---------------------
 2 files changed, 27 insertions(+), 34 deletions(-)

diff --git a/drivers/xen/gntdev-common.h b/drivers/xen/gntdev-common.h
index 40ef379c28ab..9c286b2a1900 100644
--- a/drivers/xen/gntdev-common.h
+++ b/drivers/xen/gntdev-common.h
@@ -44,9 +44,10 @@ struct gntdev_unmap_notify {
 };
 
 struct gntdev_grant_map {
+	atomic_t in_use;
 	struct mmu_interval_notifier notifier;
+	bool notifier_init;
 	struct list_head next;
-	struct vm_area_struct *vma;
 	int index;
 	int count;
 	int flags;
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index eb0586b9767d..4d9a3050de6a 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -286,6 +286,9 @@ void gntdev_put_map(struct gntdev_priv *priv, struct gntdev_grant_map *map)
 		 */
 	}
 
+	if (use_ptemod && map->notifier_init)
+		mmu_interval_notifier_remove(&map->notifier);
+
 	if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) {
 		notify_remote_via_evtchn(map->notify.event);
 		evtchn_put(map->notify.event);
@@ -298,7 +301,7 @@ void gntdev_put_map(struct gntdev_priv *priv, struct gntdev_grant_map *map)
 static int find_grant_ptes(pte_t *pte, unsigned long addr, void *data)
 {
 	struct gntdev_grant_map *map = data;
-	unsigned int pgnr = (addr - map->vma->vm_start) >> PAGE_SHIFT;
+	unsigned int pgnr = (addr - map->pages_vm_start) >> PAGE_SHIFT;
 	int flags = map->flags | GNTMAP_application_map | GNTMAP_contains_pte |
 		    (1 << _GNTMAP_guest_avail0);
 	u64 pte_maddr;
@@ -508,11 +511,7 @@ static void gntdev_vma_close(struct vm_area_struct *vma)
 	struct gntdev_priv *priv = file->private_data;
 
 	pr_debug("gntdev_vma_close %p\n", vma);
-	if (use_ptemod) {
-		WARN_ON(map->vma != vma);
-		mmu_interval_notifier_remove(&map->notifier);
-		map->vma = NULL;
-	}
+
 	vma->vm_private_data = NULL;
 	gntdev_put_map(priv, map);
 }
@@ -540,29 +539,30 @@ static bool gntdev_invalidate(struct mmu_interval_notifier *mn,
 	struct gntdev_grant_map *map =
 		container_of(mn, struct gntdev_grant_map, notifier);
 	unsigned long mstart, mend;
+	unsigned long map_start, map_end;
 
 	if (!mmu_notifier_range_blockable(range))
 		return false;
 
+	map_start = map->pages_vm_start;
+	map_end = map->pages_vm_start + (map->count << PAGE_SHIFT);
+
 	/*
 	 * If the VMA is split or otherwise changed the notifier is not
 	 * updated, but we don't want to process VA's outside the modified
 	 * VMA. FIXME: It would be much more understandable to just prevent
 	 * modifying the VMA in the first place.
 	 */
-	if (map->vma->vm_start >= range->end ||
-	    map->vma->vm_end <= range->start)
+	if (map_start >= range->end || map_end <= range->start)
 		return true;
 
-	mstart = max(range->start, map->vma->vm_start);
-	mend = min(range->end, map->vma->vm_end);
+	mstart = max(range->start, map_start);
+	mend = min(range->end, map_end);
 	pr_debug("map %d+%d (%lx %lx), range %lx %lx, mrange %lx %lx\n",
-			map->index, map->count,
-			map->vma->vm_start, map->vma->vm_end,
-			range->start, range->end, mstart, mend);
-	unmap_grant_pages(map,
-				(mstart - map->vma->vm_start) >> PAGE_SHIFT,
-				(mend - mstart) >> PAGE_SHIFT);
+		 map->index, map->count, map_start, map_end,
+		 range->start, range->end, mstart, mend);
+	unmap_grant_pages(map, (mstart - map_start) >> PAGE_SHIFT,
+			  (mend - mstart) >> PAGE_SHIFT);
 
 	return true;
 }
@@ -1042,18 +1042,15 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 		return -EINVAL;
 
 	pr_debug("map %d+%d at %lx (pgoff %lx)\n",
-			index, count, vma->vm_start, vma->vm_pgoff);
+		 index, count, vma->vm_start, vma->vm_pgoff);
 
 	mutex_lock(&priv->lock);
 	map = gntdev_find_map_index(priv, index, count);
 	if (!map)
 		goto unlock_out;
-	if (use_ptemod && map->vma)
-		goto unlock_out;
-	if (atomic_read(&map->live_grants)) {
-		err = -EAGAIN;
+	if (!atomic_add_unless(&map->in_use, 1, 1))
 		goto unlock_out;
-	}
+
 	refcount_inc(&map->users);
 
 	vma->vm_ops = &gntdev_vmops;
@@ -1074,15 +1071,16 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 			map->flags |= GNTMAP_readonly;
 	}
 
+	map->pages_vm_start = vma->vm_start;
+
 	if (use_ptemod) {
-		map->vma = vma;
 		err = mmu_interval_notifier_insert_locked(
 			&map->notifier, vma->vm_mm, vma->vm_start,
 			vma->vm_end - vma->vm_start, &gntdev_mmu_ops);
-		if (err) {
-			map->vma = NULL;
+		if (err)
 			goto out_unlock_put;
-		}
+
+		map->notifier_init = true;
 	}
 	mutex_unlock(&priv->lock);
 
@@ -1099,7 +1097,6 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 		 */
 		mmu_interval_read_begin(&map->notifier);
 
-		map->pages_vm_start = vma->vm_start;
 		err = apply_to_page_range(vma->vm_mm, vma->vm_start,
 					  vma->vm_end - vma->vm_start,
 					  find_grant_ptes, map);
@@ -1128,13 +1125,8 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
 out_unlock_put:
 	mutex_unlock(&priv->lock);
 out_put_map:
-	if (use_ptemod) {
+	if (use_ptemod)
 		unmap_grant_pages(map, 0, map->count);
-		if (map->vma) {
-			mmu_interval_notifier_remove(&map->notifier);
-			map->vma = NULL;
-		}
-	}
 	gntdev_put_map(priv, map);
 	return err;
 }
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] xen/gntdev: Prevent leaking grants
  2022-10-02 22:20 ` [PATCH v2 1/2] xen/gntdev: Prevent leaking grants M. Vefa Bicakci
@ 2022-10-03  0:29   ` Demi Marie Obenour
  2022-10-04  1:31     ` M. Vefa Bicakci
  2022-10-06  8:33   ` Juergen Gross
  1 sibling, 1 reply; 12+ messages in thread
From: Demi Marie Obenour @ 2022-10-03  0:29 UTC (permalink / raw)
  To: M. Vefa Bicakci, xen-devel, linux-kernel
  Cc: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko, Gerd Hoffmann

[-- Attachment #1: Type: text/plain, Size: 7531 bytes --]

On Sun, Oct 02, 2022 at 06:20:05PM -0400, M. Vefa Bicakci wrote:
> Prior to this commit, if a grant mapping operation failed partially,
> some of the entries in the map_ops array would be invalid, whereas all
> of the entries in the kmap_ops array would be valid. This in turn would
> cause the following logic in gntdev_map_grant_pages to become invalid:
> 
>   for (i = 0; i < map->count; i++) {
>     if (map->map_ops[i].status == GNTST_okay) {
>       map->unmap_ops[i].handle = map->map_ops[i].handle;
>       if (!use_ptemod)
>         alloced++;
>     }
>     if (use_ptemod) {
>       if (map->kmap_ops[i].status == GNTST_okay) {
>         if (map->map_ops[i].status == GNTST_okay)
>           alloced++;
>         map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
>       }
>     }
>   }
>   ...
>   atomic_add(alloced, &map->live_grants);
> 
> Assume that use_ptemod is true (i.e., the domain mapping the granted
> pages is a paravirtualized domain). In the code excerpt above, note that
> the "alloced" variable is only incremented when both kmap_ops[i].status
> and map_ops[i].status are set to GNTST_okay (i.e., both mapping
> operations are successful).  However, as also noted above, there are
> cases where a grant mapping operation fails partially, breaking the
> assumption of the code excerpt above.
> 
> The aforementioned causes map->live_grants to be incorrectly set. In
> some cases, all of the map_ops mappings fail, but all of the kmap_ops
> mappings succeed, meaning that live_grants may remain zero. This in turn
> makes it impossible to unmap the successfully grant-mapped pages pointed
> to by kmap_ops, because unmap_grant_pages has the following snippet of
> code at its beginning:
> 
>   if (atomic_read(&map->live_grants) == 0)
>     return; /* Nothing to do */
> 
> In other cases where only some of the map_ops mappings fail but all
> kmap_ops mappings succeed, live_grants is made positive, but when the
> user requests unmapping the grant-mapped pages, __unmap_grant_pages_done
> will then make map->live_grants negative, because the latter function
> does not check if all of the pages that were requested to be unmapped
> were actually unmapped, and the same function unconditionally subtracts
> "data->count" (i.e., a value that can be greater than map->live_grants)
> from map->live_grants. The side effects of a negative live_grants value
> have not been studied.
> 
> The net effect of all of this is that grant references are leaked in one
> of the above conditions. In Qubes OS v4.1 (which uses Xen's grant
> mechanism extensively for X11 GUI isolation), this issue manifests
> itself with warning messages like the following to be printed out by the
> Linux kernel in the VM that had granted pages (that contain X11 GUI
> window data) to dom0: "g.e. 0x1234 still pending", especially after the
> user rapidly resizes GUI VM windows (causing some grant-mapping
> operations to partially or completely fail, due to the fact that the VM
> unshares some of the pages as part of the window resizing, making the
> pages impossible to grant-map from dom0).
> 
> The fix for this issue involves counting all successful map_ops and
> kmap_ops mappings separately, and then adding the sum to live_grants.
> During unmapping, only the number of successfully unmapped grants is
> subtracted from live_grants. The code is also modified to check for
> negative live_grants values after the subtraction and warn the user.
> 
> Link: https://github.com/QubesOS/qubes-issues/issues/7631
> Fixes: dbe97cff7dd9 ("xen/gntdev: Avoid blocking in unmap_grant_pages()")

Looks like this patch has been pretty buggy, sorry.  This is the second
time there has been a problem with it.  Thanks for the fix.

> Cc: stable@vger.kernel.org
> Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>
> ---
> 
> Changes since v1:
> - To determine which unmap operations were successful, the previous
>   version of this patch set the "unmap_ops[i].status" and
>   "kunmap_ops[i].status" fields to the value "1" prior to passing these
>   data structures to the hypervisor. Instead of doing that, the code now
>   checks whether the "handle" fields in the same data structures were
>   *not* set to "INVALID_GRANT_HANDLE". (Suggested by Juergen Gross.)
> ---
>  drivers/xen/gntdev.c | 22 +++++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index 84b143eef395..eb0586b9767d 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -367,8 +367,7 @@ int gntdev_map_grant_pages(struct gntdev_grant_map *map)
>  	for (i = 0; i < map->count; i++) {
>  		if (map->map_ops[i].status == GNTST_okay) {
>  			map->unmap_ops[i].handle = map->map_ops[i].handle;
> -			if (!use_ptemod)
> -				alloced++;
> +			alloced++;
>  		} else if (!err)
>  			err = -EINVAL;
>  
> @@ -377,8 +376,7 @@ int gntdev_map_grant_pages(struct gntdev_grant_map *map)
>  
>  		if (use_ptemod) {
>  			if (map->kmap_ops[i].status == GNTST_okay) {
> -				if (map->map_ops[i].status == GNTST_okay)
> -					alloced++;
> +				alloced++;
>  				map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
>  			} else if (!err)
>  				err = -EINVAL;
> @@ -394,8 +392,14 @@ static void __unmap_grant_pages_done(int result,
>  	unsigned int i;
>  	struct gntdev_grant_map *map = data->data;
>  	unsigned int offset = data->unmap_ops - map->unmap_ops;
> +	int successful_unmaps = 0;
> +	int live_grants;
>  
>  	for (i = 0; i < data->count; i++) {
> +		if (map->unmap_ops[offset + i].status == GNTST_okay &&
> +		    map->unmap_ops[offset + i].handle != INVALID_GRANT_HANDLE)
> +			successful_unmaps++;
> +
>  		WARN_ON(map->unmap_ops[offset + i].status != GNTST_okay &&
>  			map->unmap_ops[offset + i].handle != INVALID_GRANT_HANDLE);
>  		pr_debug("unmap handle=%d st=%d\n",
> @@ -403,6 +407,10 @@ static void __unmap_grant_pages_done(int result,
>  			map->unmap_ops[offset+i].status);
>  		map->unmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
>  		if (use_ptemod) {
> +			if (map->kunmap_ops[offset + i].status == GNTST_okay &&
> +			    map->kunmap_ops[offset + i].handle != INVALID_GRANT_HANDLE)
> +				successful_unmaps++;
> +
>  			WARN_ON(map->kunmap_ops[offset + i].status != GNTST_okay &&
>  				map->kunmap_ops[offset + i].handle != INVALID_GRANT_HANDLE);
>  			pr_debug("kunmap handle=%u st=%d\n",
> @@ -411,11 +419,15 @@ static void __unmap_grant_pages_done(int result,
>  			map->kunmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
>  		}
>  	}
> +
>  	/*
>  	 * Decrease the live-grant counter.  This must happen after the loop to
>  	 * prevent premature reuse of the grants by gnttab_mmap().
>  	 */
> -	atomic_sub(data->count, &map->live_grants);
> +	live_grants = atomic_sub_return(successful_unmaps, &map->live_grants);
> +	if (WARN_ON(live_grants < 0))
> +		pr_err("%s: live_grants became negative (%d) after unmapping %d pages!\n",
> +		       __func__, live_grants, successful_unmaps);
>  
>  	/* Release reference taken by __unmap_grant_pages */
>  	gntdev_put_map(NULL, map);
> -- 
> 2.37.3

Is there a possibility that live_grants could overflow, as it is now
set to a value twice as large as what it had been previously?

If not, you can add:

Acked-by: Demi Marie Obenour <demi@invisiblethingslab.com>
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] xen/gntdev: Prevent leaking grants
  2022-10-03  0:29   ` Demi Marie Obenour
@ 2022-10-04  1:31     ` M. Vefa Bicakci
  2022-10-04  1:51       ` Demi Marie Obenour
  0 siblings, 1 reply; 12+ messages in thread
From: M. Vefa Bicakci @ 2022-10-04  1:31 UTC (permalink / raw)
  To: Demi Marie Obenour, xen-devel, linux-kernel
  Cc: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko, Gerd Hoffmann

On 2022-10-02 20:29, Demi Marie Obenour wrote:
> On Sun, Oct 02, 2022 at 06:20:05PM -0400, M. Vefa Bicakci wrote:
>> Prior to this commit, if a grant mapping operation failed partially,
>> some of the entries in the map_ops array would be invalid, whereas all
>> of the entries in the kmap_ops array would be valid. This in turn would
>> cause the following logic in gntdev_map_grant_pages to become invalid:
>>
>>    for (i = 0; i < map->count; i++) {
>>      if (map->map_ops[i].status == GNTST_okay) {
>>        map->unmap_ops[i].handle = map->map_ops[i].handle;
>>        if (!use_ptemod)
>>          alloced++;
>>      }
>>      if (use_ptemod) {
>>        if (map->kmap_ops[i].status == GNTST_okay) {
>>          if (map->map_ops[i].status == GNTST_okay)
>>            alloced++;
>>          map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
>>        }
>>      }
>>    }
>>    ...
>>    atomic_add(alloced, &map->live_grants);
>>
>> Assume that use_ptemod is true (i.e., the domain mapping the granted
>> pages is a paravirtualized domain). In the code excerpt above, note that
>> the "alloced" variable is only incremented when both kmap_ops[i].status
>> and map_ops[i].status are set to GNTST_okay (i.e., both mapping
>> operations are successful).  However, as also noted above, there are
>> cases where a grant mapping operation fails partially, breaking the
>> assumption of the code excerpt above.
>>
>> The aforementioned causes map->live_grants to be incorrectly set. In
>> some cases, all of the map_ops mappings fail, but all of the kmap_ops
>> mappings succeed, meaning that live_grants may remain zero. This in turn
>> makes it impossible to unmap the successfully grant-mapped pages pointed
>> to by kmap_ops, because unmap_grant_pages has the following snippet of
>> code at its beginning:
>>
>>    if (atomic_read(&map->live_grants) == 0)
>>      return; /* Nothing to do */
>>
>> In other cases where only some of the map_ops mappings fail but all
>> kmap_ops mappings succeed, live_grants is made positive, but when the
>> user requests unmapping the grant-mapped pages, __unmap_grant_pages_done
>> will then make map->live_grants negative, because the latter function
>> does not check if all of the pages that were requested to be unmapped
>> were actually unmapped, and the same function unconditionally subtracts
>> "data->count" (i.e., a value that can be greater than map->live_grants)
>> from map->live_grants. The side effects of a negative live_grants value
>> have not been studied.
>>
>> The net effect of all of this is that grant references are leaked in one
>> of the above conditions. In Qubes OS v4.1 (which uses Xen's grant
>> mechanism extensively for X11 GUI isolation), this issue manifests
>> itself with warning messages like the following to be printed out by the
>> Linux kernel in the VM that had granted pages (that contain X11 GUI
>> window data) to dom0: "g.e. 0x1234 still pending", especially after the
>> user rapidly resizes GUI VM windows (causing some grant-mapping
>> operations to partially or completely fail, due to the fact that the VM
>> unshares some of the pages as part of the window resizing, making the
>> pages impossible to grant-map from dom0).
>>
>> The fix for this issue involves counting all successful map_ops and
>> kmap_ops mappings separately, and then adding the sum to live_grants.
>> During unmapping, only the number of successfully unmapped grants is
>> subtracted from live_grants. The code is also modified to check for
>> negative live_grants values after the subtraction and warn the user.
>>
>> Link: https://github.com/QubesOS/qubes-issues/issues/7631
>> Fixes: dbe97cff7dd9 ("xen/gntdev: Avoid blocking in unmap_grant_pages()")
> 
> Looks like this patch has been pretty buggy, sorry.  This is the second
> time there has been a problem with it.  Thanks for the fix.

Hi,

No problem! :-) Debugging this issue and coming up with a fix was a
nice challenge for me.

> 
>> Cc: stable@vger.kernel.org
>> Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>
>> ---
>>
>> Changes since v1:
>> - To determine which unmap operations were successful, the previous
>>    version of this patch set the "unmap_ops[i].status" and
>>    "kunmap_ops[i].status" fields to the value "1" prior to passing these
>>    data structures to the hypervisor. Instead of doing that, the code now
>>    checks whether the "handle" fields in the same data structures were
>>    *not* set to "INVALID_GRANT_HANDLE". (Suggested by Juergen Gross.)
>> ---
>>   drivers/xen/gntdev.c | 22 +++++++++++++++++-----
>>   1 file changed, 17 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>> index 84b143eef395..eb0586b9767d 100644
>> --- a/drivers/xen/gntdev.c
>> +++ b/drivers/xen/gntdev.c
>> @@ -367,8 +367,7 @@ int gntdev_map_grant_pages(struct gntdev_grant_map *map)
>>   	for (i = 0; i < map->count; i++) {
>>   		if (map->map_ops[i].status == GNTST_okay) {
>>   			map->unmap_ops[i].handle = map->map_ops[i].handle;
>> -			if (!use_ptemod)
>> -				alloced++;
>> +			alloced++;
>>   		} else if (!err)
>>   			err = -EINVAL;
>>   
>> @@ -377,8 +376,7 @@ int gntdev_map_grant_pages(struct gntdev_grant_map *map)
>>   
>>   		if (use_ptemod) {
>>   			if (map->kmap_ops[i].status == GNTST_okay) {
>> -				if (map->map_ops[i].status == GNTST_okay)
>> -					alloced++;
>> +				alloced++;
>>   				map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
>>   			} else if (!err)
>>   				err = -EINVAL;
>> @@ -394,8 +392,14 @@ static void __unmap_grant_pages_done(int result,
>>   	unsigned int i;
>>   	struct gntdev_grant_map *map = data->data;
>>   	unsigned int offset = data->unmap_ops - map->unmap_ops;
>> +	int successful_unmaps = 0;
>> +	int live_grants;
>>   
>>   	for (i = 0; i < data->count; i++) {
>> +		if (map->unmap_ops[offset + i].status == GNTST_okay &&
>> +		    map->unmap_ops[offset + i].handle != INVALID_GRANT_HANDLE)
>> +			successful_unmaps++;
>> +
>>   		WARN_ON(map->unmap_ops[offset + i].status != GNTST_okay &&
>>   			map->unmap_ops[offset + i].handle != INVALID_GRANT_HANDLE);
>>   		pr_debug("unmap handle=%d st=%d\n",
>> @@ -403,6 +407,10 @@ static void __unmap_grant_pages_done(int result,
>>   			map->unmap_ops[offset+i].status);
>>   		map->unmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
>>   		if (use_ptemod) {
>> +			if (map->kunmap_ops[offset + i].status == GNTST_okay &&
>> +			    map->kunmap_ops[offset + i].handle != INVALID_GRANT_HANDLE)
>> +				successful_unmaps++;
>> +
>>   			WARN_ON(map->kunmap_ops[offset + i].status != GNTST_okay &&
>>   				map->kunmap_ops[offset + i].handle != INVALID_GRANT_HANDLE);
>>   			pr_debug("kunmap handle=%u st=%d\n",
>> @@ -411,11 +419,15 @@ static void __unmap_grant_pages_done(int result,
>>   			map->kunmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
>>   		}
>>   	}
>> +
>>   	/*
>>   	 * Decrease the live-grant counter.  This must happen after the loop to
>>   	 * prevent premature reuse of the grants by gnttab_mmap().
>>   	 */
>> -	atomic_sub(data->count, &map->live_grants);
>> +	live_grants = atomic_sub_return(successful_unmaps, &map->live_grants);
>> +	if (WARN_ON(live_grants < 0))
>> +		pr_err("%s: live_grants became negative (%d) after unmapping %d pages!\n",
>> +		       __func__, live_grants, successful_unmaps);
>>   
>>   	/* Release reference taken by __unmap_grant_pages */
>>   	gntdev_put_map(NULL, map);
>> -- 
>> 2.37.3
> 
> Is there a possibility that live_grants could overflow, as it is now
> set to a value twice as large as what it had been previously?

Good point! My answer in summary: I think that the code could be improved,
but with reasonable values for the "limit" module parameter, there should
not be issues.

Grant mappings are set up via ioctl calls, and the structure field that
holds the number of grant references has u32 type:

(Quoting from kernel v5.15.71 for convenience)
include/uapi/xen/gntdev.h
=== 8< ===
struct ioctl_gntdev_map_grant_ref {
	/* IN parameters */
	/* The number of grants to be mapped. */
	__u32 count;
=== >8 ===

However, the number of grant references is further limited in the actual
ioctl handler function gntdev_ioctl_map_grant_ref(), which calls
gntdev_test_page_count() to ensure that the number of granted pages
requested to be mapped does not exceed "limit". "limit" defaults to 64K,
which should be okay to use with an atomic_t type (i.e., a 32-bit signed
integer type) like "live_grants", assuming that the system administrator
does not go overboard and set "limit" to a very large value:

drivers/xen/gntdev.c
=== 8< ===
static unsigned int limit = 64*1024;
module_param(limit, uint, 0644);
MODULE_PARM_DESC(limit,
	"Maximum number of grants that may be mapped by one mapping request");

/* trimmed */

bool gntdev_test_page_count(unsigned int count)
{
	return !count || count > limit;
}

/* trimmed */

static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
				       struct ioctl_gntdev_map_grant_ref __user *u)
{
	/* trimmed */

	pr_debug("priv %p, add %d\n", priv, op.count);
	if (unlikely(gntdev_test_page_count(op.count)))
		return -EINVAL;

	/* trimmed */
}
=== >8 ===

To be fair, the "count" field of the gndev_grant_map structure is a signed
integer, so very large values of count could overflow live_grants, as
live_grants needs to accommodate values up to and including 2*count.

drivers/xen/gntdev-common.h
=== 8< ===
struct gntdev_grant_map {
	atomic_t in_use;
	struct mmu_interval_notifier notifier;
	bool notifier_init;
	struct list_head next;
	int index;
	int count;
	/* trimmed */
}
=== >8 ===

> If not, you can add:
> 
> Acked-by: Demi Marie Obenour <demi@invisiblethingslab.com>

Thank you! I hope that the explanation and rationale above are satisfactory.
Please let me know what you think.

Vefa

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] xen/gntdev: Prevent leaking grants
  2022-10-04  1:31     ` M. Vefa Bicakci
@ 2022-10-04  1:51       ` Demi Marie Obenour
  2022-10-05  1:37         ` M. Vefa Bicakci
  0 siblings, 1 reply; 12+ messages in thread
From: Demi Marie Obenour @ 2022-10-04  1:51 UTC (permalink / raw)
  To: M. Vefa Bicakci, xen-devel, linux-kernel
  Cc: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko, Gerd Hoffmann

[-- Attachment #1: Type: text/plain, Size: 11215 bytes --]

On Mon, Oct 03, 2022 at 09:31:25PM -0400, M. Vefa Bicakci wrote:
> On 2022-10-02 20:29, Demi Marie Obenour wrote:
> > On Sun, Oct 02, 2022 at 06:20:05PM -0400, M. Vefa Bicakci wrote:
> > > Prior to this commit, if a grant mapping operation failed partially,
> > > some of the entries in the map_ops array would be invalid, whereas all
> > > of the entries in the kmap_ops array would be valid. This in turn would
> > > cause the following logic in gntdev_map_grant_pages to become invalid:
> > > 
> > >    for (i = 0; i < map->count; i++) {
> > >      if (map->map_ops[i].status == GNTST_okay) {
> > >        map->unmap_ops[i].handle = map->map_ops[i].handle;
> > >        if (!use_ptemod)
> > >          alloced++;
> > >      }
> > >      if (use_ptemod) {
> > >        if (map->kmap_ops[i].status == GNTST_okay) {
> > >          if (map->map_ops[i].status == GNTST_okay)
> > >            alloced++;
> > >          map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
> > >        }
> > >      }
> > >    }
> > >    ...
> > >    atomic_add(alloced, &map->live_grants);
> > > 
> > > Assume that use_ptemod is true (i.e., the domain mapping the granted
> > > pages is a paravirtualized domain). In the code excerpt above, note that
> > > the "alloced" variable is only incremented when both kmap_ops[i].status
> > > and map_ops[i].status are set to GNTST_okay (i.e., both mapping
> > > operations are successful).  However, as also noted above, there are
> > > cases where a grant mapping operation fails partially, breaking the
> > > assumption of the code excerpt above.
> > > 
> > > The aforementioned causes map->live_grants to be incorrectly set. In
> > > some cases, all of the map_ops mappings fail, but all of the kmap_ops
> > > mappings succeed, meaning that live_grants may remain zero. This in turn
> > > makes it impossible to unmap the successfully grant-mapped pages pointed
> > > to by kmap_ops, because unmap_grant_pages has the following snippet of
> > > code at its beginning:
> > > 
> > >    if (atomic_read(&map->live_grants) == 0)
> > >      return; /* Nothing to do */
> > > 
> > > In other cases where only some of the map_ops mappings fail but all
> > > kmap_ops mappings succeed, live_grants is made positive, but when the
> > > user requests unmapping the grant-mapped pages, __unmap_grant_pages_done
> > > will then make map->live_grants negative, because the latter function
> > > does not check if all of the pages that were requested to be unmapped
> > > were actually unmapped, and the same function unconditionally subtracts
> > > "data->count" (i.e., a value that can be greater than map->live_grants)
> > > from map->live_grants. The side effects of a negative live_grants value
> > > have not been studied.
> > > 
> > > The net effect of all of this is that grant references are leaked in one
> > > of the above conditions. In Qubes OS v4.1 (which uses Xen's grant
> > > mechanism extensively for X11 GUI isolation), this issue manifests
> > > itself with warning messages like the following to be printed out by the
> > > Linux kernel in the VM that had granted pages (that contain X11 GUI
> > > window data) to dom0: "g.e. 0x1234 still pending", especially after the
> > > user rapidly resizes GUI VM windows (causing some grant-mapping
> > > operations to partially or completely fail, due to the fact that the VM
> > > unshares some of the pages as part of the window resizing, making the
> > > pages impossible to grant-map from dom0).
> > > 
> > > The fix for this issue involves counting all successful map_ops and
> > > kmap_ops mappings separately, and then adding the sum to live_grants.
> > > During unmapping, only the number of successfully unmapped grants is
> > > subtracted from live_grants. The code is also modified to check for
> > > negative live_grants values after the subtraction and warn the user.
> > > 
> > > Link: https://github.com/QubesOS/qubes-issues/issues/7631
> > > Fixes: dbe97cff7dd9 ("xen/gntdev: Avoid blocking in unmap_grant_pages()")
> > 
> > Looks like this patch has been pretty buggy, sorry.  This is the second
> > time there has been a problem with it.  Thanks for the fix.
> 
> Hi,
> 
> No problem! :-) Debugging this issue and coming up with a fix was a
> nice challenge for me.

You’re welcome!  I’m glad you were able to do this.

> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>
> > > ---
> > > 
> > > Changes since v1:
> > > - To determine which unmap operations were successful, the previous
> > >    version of this patch set the "unmap_ops[i].status" and
> > >    "kunmap_ops[i].status" fields to the value "1" prior to passing these
> > >    data structures to the hypervisor. Instead of doing that, the code now
> > >    checks whether the "handle" fields in the same data structures were
> > >    *not* set to "INVALID_GRANT_HANDLE". (Suggested by Juergen Gross.)
> > > ---
> > >   drivers/xen/gntdev.c | 22 +++++++++++++++++-----
> > >   1 file changed, 17 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> > > index 84b143eef395..eb0586b9767d 100644
> > > --- a/drivers/xen/gntdev.c
> > > +++ b/drivers/xen/gntdev.c
> > > @@ -367,8 +367,7 @@ int gntdev_map_grant_pages(struct gntdev_grant_map *map)
> > >   	for (i = 0; i < map->count; i++) {
> > >   		if (map->map_ops[i].status == GNTST_okay) {
> > >   			map->unmap_ops[i].handle = map->map_ops[i].handle;
> > > -			if (!use_ptemod)
> > > -				alloced++;
> > > +			alloced++;
> > >   		} else if (!err)
> > >   			err = -EINVAL;
> > > @@ -377,8 +376,7 @@ int gntdev_map_grant_pages(struct gntdev_grant_map *map)
> > >   		if (use_ptemod) {
> > >   			if (map->kmap_ops[i].status == GNTST_okay) {
> > > -				if (map->map_ops[i].status == GNTST_okay)
> > > -					alloced++;
> > > +				alloced++;
> > >   				map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
> > >   			} else if (!err)
> > >   				err = -EINVAL;
> > > @@ -394,8 +392,14 @@ static void __unmap_grant_pages_done(int result,
> > >   	unsigned int i;
> > >   	struct gntdev_grant_map *map = data->data;
> > >   	unsigned int offset = data->unmap_ops - map->unmap_ops;
> > > +	int successful_unmaps = 0;
> > > +	int live_grants;
> > >   	for (i = 0; i < data->count; i++) {
> > > +		if (map->unmap_ops[offset + i].status == GNTST_okay &&
> > > +		    map->unmap_ops[offset + i].handle != INVALID_GRANT_HANDLE)
> > > +			successful_unmaps++;
> > > +
> > >   		WARN_ON(map->unmap_ops[offset + i].status != GNTST_okay &&
> > >   			map->unmap_ops[offset + i].handle != INVALID_GRANT_HANDLE);
> > >   		pr_debug("unmap handle=%d st=%d\n",
> > > @@ -403,6 +407,10 @@ static void __unmap_grant_pages_done(int result,
> > >   			map->unmap_ops[offset+i].status);
> > >   		map->unmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
> > >   		if (use_ptemod) {
> > > +			if (map->kunmap_ops[offset + i].status == GNTST_okay &&
> > > +			    map->kunmap_ops[offset + i].handle != INVALID_GRANT_HANDLE)
> > > +				successful_unmaps++;
> > > +
> > >   			WARN_ON(map->kunmap_ops[offset + i].status != GNTST_okay &&
> > >   				map->kunmap_ops[offset + i].handle != INVALID_GRANT_HANDLE);
> > >   			pr_debug("kunmap handle=%u st=%d\n",
> > > @@ -411,11 +419,15 @@ static void __unmap_grant_pages_done(int result,
> > >   			map->kunmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
> > >   		}
> > >   	}
> > > +
> > >   	/*
> > >   	 * Decrease the live-grant counter.  This must happen after the loop to
> > >   	 * prevent premature reuse of the grants by gnttab_mmap().
> > >   	 */
> > > -	atomic_sub(data->count, &map->live_grants);
> > > +	live_grants = atomic_sub_return(successful_unmaps, &map->live_grants);
> > > +	if (WARN_ON(live_grants < 0))
> > > +		pr_err("%s: live_grants became negative (%d) after unmapping %d pages!\n",
> > > +		       __func__, live_grants, successful_unmaps);
> > >   	/* Release reference taken by __unmap_grant_pages */
> > >   	gntdev_put_map(NULL, map);
> > > -- 
> > > 2.37.3
> > 
> > Is there a possibility that live_grants could overflow, as it is now
> > set to a value twice as large as what it had been previously?
> 
> Good point! My answer in summary: I think that the code could be improved,
> but with reasonable values for the "limit" module parameter, there should
> not be issues.
> 
> Grant mappings are set up via ioctl calls, and the structure field that
> holds the number of grant references has u32 type:
> 
> (Quoting from kernel v5.15.71 for convenience)
> include/uapi/xen/gntdev.h
> === 8< ===
> struct ioctl_gntdev_map_grant_ref {
> 	/* IN parameters */
> 	/* The number of grants to be mapped. */
> 	__u32 count;
> === >8 ===
> 
> However, the number of grant references is further limited in the actual
> ioctl handler function gntdev_ioctl_map_grant_ref(), which calls
> gntdev_test_page_count() to ensure that the number of granted pages
> requested to be mapped does not exceed "limit". "limit" defaults to 64K,
> which should be okay to use with an atomic_t type (i.e., a 32-bit signed
> integer type) like "live_grants", assuming that the system administrator
> does not go overboard and set "limit" to a very large value:
> 
> drivers/xen/gntdev.c
> === 8< ===
> static unsigned int limit = 64*1024;
> module_param(limit, uint, 0644);
> MODULE_PARM_DESC(limit,
> 	"Maximum number of grants that may be mapped by one mapping request");
> 
> /* trimmed */
> 
> bool gntdev_test_page_count(unsigned int count)
> {
> 	return !count || count > limit;
> }
> 
> /* trimmed */
> 
> static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
> 				       struct ioctl_gntdev_map_grant_ref __user *u)
> {
> 	/* trimmed */
> 
> 	pr_debug("priv %p, add %d\n", priv, op.count);
> 	if (unlikely(gntdev_test_page_count(op.count)))
> 		return -EINVAL;
> 
> 	/* trimmed */
> }
> === >8 ===
> 
> To be fair, the "count" field of the gndev_grant_map structure is a signed
> integer, so very large values of count could overflow live_grants, as
> live_grants needs to accommodate values up to and including 2*count.

Could this be replaced by an unsigned and/or 64-bit integer?
Alternatively, one could use module_param_cb and param_set_uint_minmax
to enforce that the limit is something reasonable.  That said, one needs
almost 8TiB to trigger this problem, so while it ought to be fixed it
isn’t a huge deal.  Certainly should not block getting this merged.

> drivers/xen/gntdev-common.h
> === 8< ===
> struct gntdev_grant_map {
> 	atomic_t in_use;
> 	struct mmu_interval_notifier notifier;
> 	bool notifier_init;
> 	struct list_head next;
> 	int index;
> 	int count;
> 	/* trimmed */
> }
> === >8 ===
> 
> > If not, you can add:
> > 
> > Acked-by: Demi Marie Obenour <demi@invisiblethingslab.com>
> 
> Thank you! I hope that the explanation and rationale above are satisfactory.
> Please let me know what you think.

They are indeed.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] xen/gntdev: Prevent leaking grants
  2022-10-04  1:51       ` Demi Marie Obenour
@ 2022-10-05  1:37         ` M. Vefa Bicakci
  0 siblings, 0 replies; 12+ messages in thread
From: M. Vefa Bicakci @ 2022-10-05  1:37 UTC (permalink / raw)
  To: Demi Marie Obenour, xen-devel, linux-kernel
  Cc: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko, Gerd Hoffmann

On 2022-10-03 21:51, Demi Marie Obenour wrote:
> On Mon, Oct 03, 2022 at 09:31:25PM -0400, M. Vefa Bicakci wrote:
>> On 2022-10-02 20:29, Demi Marie Obenour wrote:
>>> On Sun, Oct 02, 2022 at 06:20:05PM -0400, M. Vefa Bicakci wrote:
>>>> Prior to this commit, if a grant mapping operation failed partially,
>>>> some of the entries in the map_ops array would be invalid, whereas all
>>>> of the entries in the kmap_ops array would be valid. This in turn would
>>>> cause the following logic in gntdev_map_grant_pages to become invalid:
>>>>
>>>>     for (i = 0; i < map->count; i++) {
>>>>       if (map->map_ops[i].status == GNTST_okay) {
>>>>         map->unmap_ops[i].handle = map->map_ops[i].handle;
>>>>         if (!use_ptemod)
>>>>           alloced++;
>>>>       }
>>>>       if (use_ptemod) {
>>>>         if (map->kmap_ops[i].status == GNTST_okay) {
>>>>           if (map->map_ops[i].status == GNTST_okay)
>>>>             alloced++;
>>>>           map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
>>>>         }
>>>>       }
>>>>     }
>>>>     ...
>>>>     atomic_add(alloced, &map->live_grants);
>>>>
>>>> Assume that use_ptemod is true (i.e., the domain mapping the granted
>>>> pages is a paravirtualized domain). In the code excerpt above, note that
>>>> the "alloced" variable is only incremented when both kmap_ops[i].status
>>>> and map_ops[i].status are set to GNTST_okay (i.e., both mapping
>>>> operations are successful).  However, as also noted above, there are
>>>> cases where a grant mapping operation fails partially, breaking the
>>>> assumption of the code excerpt above.
>>>>
>>>> The aforementioned causes map->live_grants to be incorrectly set. In
>>>> some cases, all of the map_ops mappings fail, but all of the kmap_ops
>>>> mappings succeed, meaning that live_grants may remain zero. This in turn
>>>> makes it impossible to unmap the successfully grant-mapped pages pointed
>>>> to by kmap_ops, because unmap_grant_pages has the following snippet of
>>>> code at its beginning:
>>>>
>>>>     if (atomic_read(&map->live_grants) == 0)
>>>>       return; /* Nothing to do */
>>>>
>>>> In other cases where only some of the map_ops mappings fail but all
>>>> kmap_ops mappings succeed, live_grants is made positive, but when the
>>>> user requests unmapping the grant-mapped pages, __unmap_grant_pages_done
>>>> will then make map->live_grants negative, because the latter function
>>>> does not check if all of the pages that were requested to be unmapped
>>>> were actually unmapped, and the same function unconditionally subtracts
>>>> "data->count" (i.e., a value that can be greater than map->live_grants)
>>>> from map->live_grants. The side effects of a negative live_grants value
>>>> have not been studied.
>>>>
>>>> The net effect of all of this is that grant references are leaked in one
>>>> of the above conditions. In Qubes OS v4.1 (which uses Xen's grant
>>>> mechanism extensively for X11 GUI isolation), this issue manifests
>>>> itself with warning messages like the following to be printed out by the
>>>> Linux kernel in the VM that had granted pages (that contain X11 GUI
>>>> window data) to dom0: "g.e. 0x1234 still pending", especially after the
>>>> user rapidly resizes GUI VM windows (causing some grant-mapping
>>>> operations to partially or completely fail, due to the fact that the VM
>>>> unshares some of the pages as part of the window resizing, making the
>>>> pages impossible to grant-map from dom0).
>>>>
>>>> The fix for this issue involves counting all successful map_ops and
>>>> kmap_ops mappings separately, and then adding the sum to live_grants.
>>>> During unmapping, only the number of successfully unmapped grants is
>>>> subtracted from live_grants. The code is also modified to check for
>>>> negative live_grants values after the subtraction and warn the user.
>>>>
>>>> Link: https://github.com/QubesOS/qubes-issues/issues/7631
>>>> Fixes: dbe97cff7dd9 ("xen/gntdev: Avoid blocking in unmap_grant_pages()")
>>>
>>> Looks like this patch has been pretty buggy, sorry.  This is the second
>>> time there has been a problem with it.  Thanks for the fix.
>>
>> Hi,
>>
>> No problem! :-) Debugging this issue and coming up with a fix was a
>> nice challenge for me.
> 
> You’re welcome!  I’m glad you were able to do this.
> 
>>>> Cc: stable@vger.kernel.org
>>>> Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>
>>>> ---
>>>>
>>>> Changes since v1:
>>>> - To determine which unmap operations were successful, the previous
>>>>     version of this patch set the "unmap_ops[i].status" and
>>>>     "kunmap_ops[i].status" fields to the value "1" prior to passing these
>>>>     data structures to the hypervisor. Instead of doing that, the code now
>>>>     checks whether the "handle" fields in the same data structures were
>>>>     *not* set to "INVALID_GRANT_HANDLE". (Suggested by Juergen Gross.)
>>>> ---
>>>>    drivers/xen/gntdev.c | 22 +++++++++++++++++-----
>>>>    1 file changed, 17 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>>>> index 84b143eef395..eb0586b9767d 100644
>>>> --- a/drivers/xen/gntdev.c
>>>> +++ b/drivers/xen/gntdev.c
>>>> @@ -367,8 +367,7 @@ int gntdev_map_grant_pages(struct gntdev_grant_map *map)
>>>>    	for (i = 0; i < map->count; i++) {
>>>>    		if (map->map_ops[i].status == GNTST_okay) {
>>>>    			map->unmap_ops[i].handle = map->map_ops[i].handle;
>>>> -			if (!use_ptemod)
>>>> -				alloced++;
>>>> +			alloced++;
>>>>    		} else if (!err)
>>>>    			err = -EINVAL;
>>>> @@ -377,8 +376,7 @@ int gntdev_map_grant_pages(struct gntdev_grant_map *map)
>>>>    		if (use_ptemod) {
>>>>    			if (map->kmap_ops[i].status == GNTST_okay) {
>>>> -				if (map->map_ops[i].status == GNTST_okay)
>>>> -					alloced++;
>>>> +				alloced++;
>>>>    				map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
>>>>    			} else if (!err)
>>>>    				err = -EINVAL;
>>>> @@ -394,8 +392,14 @@ static void __unmap_grant_pages_done(int result,
>>>>    	unsigned int i;
>>>>    	struct gntdev_grant_map *map = data->data;
>>>>    	unsigned int offset = data->unmap_ops - map->unmap_ops;
>>>> +	int successful_unmaps = 0;
>>>> +	int live_grants;
>>>>    	for (i = 0; i < data->count; i++) {
>>>> +		if (map->unmap_ops[offset + i].status == GNTST_okay &&
>>>> +		    map->unmap_ops[offset + i].handle != INVALID_GRANT_HANDLE)
>>>> +			successful_unmaps++;
>>>> +
>>>>    		WARN_ON(map->unmap_ops[offset + i].status != GNTST_okay &&
>>>>    			map->unmap_ops[offset + i].handle != INVALID_GRANT_HANDLE);
>>>>    		pr_debug("unmap handle=%d st=%d\n",
>>>> @@ -403,6 +407,10 @@ static void __unmap_grant_pages_done(int result,
>>>>    			map->unmap_ops[offset+i].status);
>>>>    		map->unmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
>>>>    		if (use_ptemod) {
>>>> +			if (map->kunmap_ops[offset + i].status == GNTST_okay &&
>>>> +			    map->kunmap_ops[offset + i].handle != INVALID_GRANT_HANDLE)
>>>> +				successful_unmaps++;
>>>> +
>>>>    			WARN_ON(map->kunmap_ops[offset + i].status != GNTST_okay &&
>>>>    				map->kunmap_ops[offset + i].handle != INVALID_GRANT_HANDLE);
>>>>    			pr_debug("kunmap handle=%u st=%d\n",
>>>> @@ -411,11 +419,15 @@ static void __unmap_grant_pages_done(int result,
>>>>    			map->kunmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
>>>>    		}
>>>>    	}
>>>> +
>>>>    	/*
>>>>    	 * Decrease the live-grant counter.  This must happen after the loop to
>>>>    	 * prevent premature reuse of the grants by gnttab_mmap().
>>>>    	 */
>>>> -	atomic_sub(data->count, &map->live_grants);
>>>> +	live_grants = atomic_sub_return(successful_unmaps, &map->live_grants);
>>>> +	if (WARN_ON(live_grants < 0))
>>>> +		pr_err("%s: live_grants became negative (%d) after unmapping %d pages!\n",
>>>> +		       __func__, live_grants, successful_unmaps);
>>>>    	/* Release reference taken by __unmap_grant_pages */
>>>>    	gntdev_put_map(NULL, map);
>>>> -- 
>>>> 2.37.3
>>>
>>> Is there a possibility that live_grants could overflow, as it is now
>>> set to a value twice as large as what it had been previously?
>>
>> Good point! My answer in summary: I think that the code could be improved,
>> but with reasonable values for the "limit" module parameter, there should
>> not be issues.
>>
>> Grant mappings are set up via ioctl calls, and the structure field that
>> holds the number of grant references has u32 type:
>>
>> (Quoting from kernel v5.15.71 for convenience)
>> include/uapi/xen/gntdev.h
>> === 8< ===
>> struct ioctl_gntdev_map_grant_ref {
>> 	/* IN parameters */
>> 	/* The number of grants to be mapped. */
>> 	__u32 count;
>> === >8 ===
>>
>> However, the number of grant references is further limited in the actual
>> ioctl handler function gntdev_ioctl_map_grant_ref(), which calls
>> gntdev_test_page_count() to ensure that the number of granted pages
>> requested to be mapped does not exceed "limit". "limit" defaults to 64K,
>> which should be okay to use with an atomic_t type (i.e., a 32-bit signed
>> integer type) like "live_grants", assuming that the system administrator
>> does not go overboard and set "limit" to a very large value:
>>
>> drivers/xen/gntdev.c
>> === 8< ===
>> static unsigned int limit = 64*1024;
>> module_param(limit, uint, 0644);
>> MODULE_PARM_DESC(limit,
>> 	"Maximum number of grants that may be mapped by one mapping request");
>>
>> /* trimmed */
>>
>> bool gntdev_test_page_count(unsigned int count)
>> {
>> 	return !count || count > limit;
>> }
>>
>> /* trimmed */
>>
>> static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
>> 				       struct ioctl_gntdev_map_grant_ref __user *u)
>> {
>> 	/* trimmed */
>>
>> 	pr_debug("priv %p, add %d\n", priv, op.count);
>> 	if (unlikely(gntdev_test_page_count(op.count)))
>> 		return -EINVAL;
>>
>> 	/* trimmed */
>> }
>> === >8 ===
>>
>> To be fair, the "count" field of the gndev_grant_map structure is a signed
>> integer, so very large values of count could overflow live_grants, as
>> live_grants needs to accommodate values up to and including 2*count.
> 
> Could this be replaced by an unsigned and/or 64-bit integer?
> Alternatively, one could use module_param_cb and param_set_uint_minmax
> to enforce that the limit is something reasonable.  That said, one needs
> almost 8TiB to trigger this problem, so while it ought to be fixed it
> isn’t a huge deal.  Certainly should not block getting this merged.

Thank you for the continued feedback.

I agree that these can be implemented to prevent overflowing "live_grants".
"live_grants" could be made an atomic64_t, and/or a to-be-chosen maximum
value less than or equal to INT_MAX/2 can be imposed on "limit" using the
approach you suggested.

I think that the latter option could be better, as the driver uses signed
integers in a number of places (including the gntdev_grant_map structure),
but the requested number of mappings (i.e., "count" in
ioctl_gntdev_map_grant_ref, provided by user-space) and "limit" are
unsigned integers.

> 
>> drivers/xen/gntdev-common.h
>> === 8< ===
>> struct gntdev_grant_map {
>> 	atomic_t in_use;
>> 	struct mmu_interval_notifier notifier;
>> 	bool notifier_init;
>> 	struct list_head next;
>> 	int index;
>> 	int count;
>> 	/* trimmed */
>> }
>> === >8 ===
>>
>>> If not, you can add:
>>>
>>> Acked-by: Demi Marie Obenour <demi@invisiblethingslab.com>
>>
>> Thank you! I hope that the explanation and rationale above are satisfactory.
>> Please let me know what you think.
> 
> They are indeed.

Thanks!

Vefa

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] xen/gntdev: Prevent leaking grants
  2022-10-02 22:20 ` [PATCH v2 1/2] xen/gntdev: Prevent leaking grants M. Vefa Bicakci
  2022-10-03  0:29   ` Demi Marie Obenour
@ 2022-10-06  8:33   ` Juergen Gross
  1 sibling, 0 replies; 12+ messages in thread
From: Juergen Gross @ 2022-10-06  8:33 UTC (permalink / raw)
  To: M. Vefa Bicakci, xen-devel, linux-kernel
  Cc: Stefano Stabellini, Oleksandr Tyshchenko, Demi Marie Obenour,
	Gerd Hoffmann


[-- Attachment #1.1.1: Type: text/plain, Size: 3806 bytes --]

On 03.10.22 00:20, M. Vefa Bicakci wrote:
> Prior to this commit, if a grant mapping operation failed partially,
> some of the entries in the map_ops array would be invalid, whereas all
> of the entries in the kmap_ops array would be valid. This in turn would
> cause the following logic in gntdev_map_grant_pages to become invalid:
> 
>    for (i = 0; i < map->count; i++) {
>      if (map->map_ops[i].status == GNTST_okay) {
>        map->unmap_ops[i].handle = map->map_ops[i].handle;
>        if (!use_ptemod)
>          alloced++;
>      }
>      if (use_ptemod) {
>        if (map->kmap_ops[i].status == GNTST_okay) {
>          if (map->map_ops[i].status == GNTST_okay)
>            alloced++;
>          map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
>        }
>      }
>    }
>    ...
>    atomic_add(alloced, &map->live_grants);
> 
> Assume that use_ptemod is true (i.e., the domain mapping the granted
> pages is a paravirtualized domain). In the code excerpt above, note that
> the "alloced" variable is only incremented when both kmap_ops[i].status
> and map_ops[i].status are set to GNTST_okay (i.e., both mapping
> operations are successful).  However, as also noted above, there are
> cases where a grant mapping operation fails partially, breaking the
> assumption of the code excerpt above.
> 
> The aforementioned causes map->live_grants to be incorrectly set. In
> some cases, all of the map_ops mappings fail, but all of the kmap_ops
> mappings succeed, meaning that live_grants may remain zero. This in turn
> makes it impossible to unmap the successfully grant-mapped pages pointed
> to by kmap_ops, because unmap_grant_pages has the following snippet of
> code at its beginning:
> 
>    if (atomic_read(&map->live_grants) == 0)
>      return; /* Nothing to do */
> 
> In other cases where only some of the map_ops mappings fail but all
> kmap_ops mappings succeed, live_grants is made positive, but when the
> user requests unmapping the grant-mapped pages, __unmap_grant_pages_done
> will then make map->live_grants negative, because the latter function
> does not check if all of the pages that were requested to be unmapped
> were actually unmapped, and the same function unconditionally subtracts
> "data->count" (i.e., a value that can be greater than map->live_grants)
> from map->live_grants. The side effects of a negative live_grants value
> have not been studied.
> 
> The net effect of all of this is that grant references are leaked in one
> of the above conditions. In Qubes OS v4.1 (which uses Xen's grant
> mechanism extensively for X11 GUI isolation), this issue manifests
> itself with warning messages like the following to be printed out by the
> Linux kernel in the VM that had granted pages (that contain X11 GUI
> window data) to dom0: "g.e. 0x1234 still pending", especially after the
> user rapidly resizes GUI VM windows (causing some grant-mapping
> operations to partially or completely fail, due to the fact that the VM
> unshares some of the pages as part of the window resizing, making the
> pages impossible to grant-map from dom0).
> 
> The fix for this issue involves counting all successful map_ops and
> kmap_ops mappings separately, and then adding the sum to live_grants.
> During unmapping, only the number of successfully unmapped grants is
> subtracted from live_grants. The code is also modified to check for
> negative live_grants values after the subtraction and warn the user.
> 
> Link: https://github.com/QubesOS/qubes-issues/issues/7631
> Fixes: dbe97cff7dd9 ("xen/gntdev: Avoid blocking in unmap_grant_pages()")
> Cc: stable@vger.kernel.org
> Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen


[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] xen/gntdev: Accommodate VMA splitting
  2022-10-02 22:20 ` [PATCH v2 2/2] xen/gntdev: Accommodate VMA splitting M. Vefa Bicakci
@ 2022-10-06  8:36   ` Juergen Gross
  0 siblings, 0 replies; 12+ messages in thread
From: Juergen Gross @ 2022-10-06  8:36 UTC (permalink / raw)
  To: M. Vefa Bicakci, xen-devel, linux-kernel
  Cc: Stefano Stabellini, Oleksandr Tyshchenko, Demi Marie Obenour,
	Gerd Hoffmann


[-- Attachment #1.1.1: Type: text/plain, Size: 4174 bytes --]

On 03.10.22 00:20, M. Vefa Bicakci wrote:
> Prior to this commit, the gntdev driver code did not handle the
> following scenario correctly with paravirtualized (PV) Xen domains:
> 
> * User process sets up a gntdev mapping composed of two grant mappings
>    (i.e., two pages shared by another Xen domain).
> * User process munmap()s one of the pages.
> * User process munmap()s the remaining page.
> * User process exits.
> 
> In the scenario above, the user process would cause the kernel to log
> the following messages in dmesg for the first munmap(), and the second
> munmap() call would result in similar log messages:
> 
>    BUG: Bad page map in process doublemap.test  pte:... pmd:...
>    page:0000000057c97bff refcount:1 mapcount:-1 \
>      mapping:0000000000000000 index:0x0 pfn:...
>    ...
>    page dumped because: bad pte
>    ...
>    file:gntdev fault:0x0 mmap:gntdev_mmap [xen_gntdev] readpage:0x0
>    ...
>    Call Trace:
>     <TASK>
>     dump_stack_lvl+0x46/0x5e
>     print_bad_pte.cold+0x66/0xb6
>     unmap_page_range+0x7e5/0xdc0
>     unmap_vmas+0x78/0xf0
>     unmap_region+0xa8/0x110
>     __do_munmap+0x1ea/0x4e0
>     __vm_munmap+0x75/0x120
>     __x64_sys_munmap+0x28/0x40
>     do_syscall_64+0x38/0x90
>     entry_SYSCALL_64_after_hwframe+0x61/0xcb
>     ...
> 
> For each munmap() call, the Xen hypervisor (if built with CONFIG_DEBUG)
> would print out the following and trigger a general protection fault in
> the affected Xen PV domain:
> 
>    (XEN) d0v... Attempt to implicitly unmap d0's grant PTE ...
>    (XEN) d0v... Attempt to implicitly unmap d0's grant PTE ...
> 
> As of this writing, gntdev_grant_map structure's vma field (referred to
> as map->vma below) is mainly used for checking the start and end
> addresses of mappings. However, with split VMAs, these may change, and
> there could be more than one VMA associated with a gntdev mapping.
> Hence, remove the use of map->vma and rely on map->pages_vm_start for
> the original start address and on (map->count << PAGE_SHIFT) for the
> original mapping size. Let the invalidate() and find_special_page()
> hooks use these.
> 
> Also, given that there can be multiple VMAs associated with a gntdev
> mapping, move the "mmu_interval_notifier_remove(&map->notifier)" call to
> the end of gntdev_put_map, so that the MMU notifier is only removed
> after the closing of the last remaining VMA.
> 
> Finally, use an atomic to prevent inadvertent gntdev mapping re-use,
> instead of using the map->live_grants atomic counter and/or the map->vma
> pointer (the latter of which is now removed). This prevents the
> userspace from mmap()'ing (with MAP_FIXED) a gntdev mapping over the
> same address range as a previously set up gntdev mapping. This scenario
> can be summarized with the following call-trace, which was valid prior
> to this commit:
> 
>    mmap
>      gntdev_mmap
>    mmap (repeat mmap with MAP_FIXED over the same address range)
>      gntdev_invalidate
>        unmap_grant_pages (sets 'being_removed' entries to true)
>          gnttab_unmap_refs_async
>      unmap_single_vma
>      gntdev_mmap (maps the shared pages again)
>    munmap
>      gntdev_invalidate
>        unmap_grant_pages
>          (no-op because 'being_removed' entries are true)
>      unmap_single_vma (For PV domains, Xen reports that a granted page
>        is being unmapped and triggers a general protection fault in the
>        affected domain, if Xen was built with CONFIG_DEBUG)
> 
> The fix for this last scenario could be worth its own commit, but we
> opted for a single commit, because removing the gntdev_grant_map
> structure's vma field requires guarding the entry to gntdev_mmap(), and
> the live_grants atomic counter is not sufficient on its own to prevent
> the mmap() over a pre-existing mapping.
> 
> Link: https://github.com/QubesOS/qubes-issues/issues/7631
> Fixes: ab31523c2fca ("xen/gntdev: allow usermode to map granted pages")
> Cc: stable@vger.kernel.org
> Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/2] xen/gntdev: Fixes for leaks and VMA splitting
  2022-10-02 22:20 [PATCH v2 0/2] xen/gntdev: Fixes for leaks and VMA splitting M. Vefa Bicakci
  2022-10-02 22:20 ` [PATCH v2 1/2] xen/gntdev: Prevent leaking grants M. Vefa Bicakci
  2022-10-02 22:20 ` [PATCH v2 2/2] xen/gntdev: Accommodate VMA splitting M. Vefa Bicakci
@ 2022-10-07  5:17 ` Juergen Gross
  2022-10-07 17:31   ` Demi Marie Obenour
  2022-10-08 15:17   ` M. Vefa Bicakci
  2 siblings, 2 replies; 12+ messages in thread
From: Juergen Gross @ 2022-10-07  5:17 UTC (permalink / raw)
  To: M. Vefa Bicakci, xen-devel, linux-kernel
  Cc: Stefano Stabellini, Oleksandr Tyshchenko, Demi Marie Obenour,
	Gerd Hoffmann


[-- Attachment #1.1.1: Type: text/plain, Size: 3166 bytes --]

On 03.10.22 00:20, M. Vefa Bicakci wrote:
> Hi all,
> 
> First of all, sorry for the delay!
> 
> These patches continue the code review for the following patches:
>    https://lore.kernel.org/xen-devel/20220912040002.198191-1-m.v.b@runbox.com/t/#u
> 
> The original description of the patch set is as follows:
> 
>    "The changes in this patch series intend to fix the Xen grant device
>    driver, so that grant mapping leaks caused by partially failed grant
>    mapping operations are avoided with the first patch, and so that the
>    splitting of VMAs does not result in incorrectly unmapped grant pages
>    with the second patch. The second patch also prevents a similar issue
>    in a double-mapping scenario, where mmap() is used with MAP_FIXED to
>    map grants over an existing mapping created with the same grants, and
>    where grant pages are unmapped incorrectly as well."
> 
> A summary of the changes from v1 is as follows:
> - Addressed Juergen's code review comment regarding the first patch.
> - Amended the description of the second patch to note that the described
>    issues are encountered with PV domains.
> 
> Verification notes:
> 
> - I have tested these commits on top of Linux v5.15.70 and v5.15.71, and
>    I verified that they compile successfully on top of the tag
>    "next-20220930", which corresponds to the base commit ID included at
>    the bottom of this e-mail.
> 
> - My tests consist of using a kernel with Qubes OS v4.1's patches and
>    these patches on my main computer for day-to-day tasks, in conjunction
>    with Qubes OS's version of the Xen hypervisor v4.14.5, with the latter
>    custom-compiled with CONFIG_DEBUG.
> 
> - I used a test program that verifies the following scenarios with an
>    unprivileged paravirtualized (PV) Xen domain:
> 
>    - A program mmap()s two pages from another Xen domain and munmap()s
>      the pages one by one. This used to result in implicit unmap errors
>      to be reported by Xen and a general protection fault to be triggered
>      by Xen in the affected domain, but now works as expected.
>    - A program mmap()s two pages from another Xen domain and then
>      attempts to remap (via MAP_FIXED) the same mapping again over the
>      same virtual address. This used to result in similar issues
>      (implicit unmap errors and general protection fault), but now is
>      rejected by the kernel.
>    - A program mmap()s two pages from another Xen domain and then
>      attempts to mmap() the same mapping again to a different virtual
>      address, by passing NULL as mmap()'s first argument. This used to be
>      rejected by the kernel, and it continues to be rejected by the
>      kernel.
> 
> - Unprivileged PVH Xen domains were also sanity tested with the same
>    test program. I should note that PVH domains worked as expected
>    without these patches too.
> 
> - Finally, I have verified that the original "g.e. 0x1234 still pending"
>    issue does not appear after rapidly resizing GUI windows in Qubes OS
>    v4.1.

Series pushed to xen/tip.git for-linus-6.1


Juergen


[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/2] xen/gntdev: Fixes for leaks and VMA splitting
  2022-10-07  5:17 ` [PATCH v2 0/2] xen/gntdev: Fixes for leaks and " Juergen Gross
@ 2022-10-07 17:31   ` Demi Marie Obenour
  2022-10-08 15:17   ` M. Vefa Bicakci
  1 sibling, 0 replies; 12+ messages in thread
From: Demi Marie Obenour @ 2022-10-07 17:31 UTC (permalink / raw)
  To: Juergen Gross, M. Vefa Bicakci, xen-devel, linux-kernel
  Cc: Stefano Stabellini, Oleksandr Tyshchenko, Gerd Hoffmann

[-- Attachment #1: Type: text/plain, Size: 3426 bytes --]

On Fri, Oct 07, 2022 at 07:17:41AM +0200, Juergen Gross wrote:
> On 03.10.22 00:20, M. Vefa Bicakci wrote:
> > Hi all,
> > 
> > First of all, sorry for the delay!
> > 
> > These patches continue the code review for the following patches:
> >    https://lore.kernel.org/xen-devel/20220912040002.198191-1-m.v.b@runbox.com/t/#u
> > 
> > The original description of the patch set is as follows:
> > 
> >    "The changes in this patch series intend to fix the Xen grant device
> >    driver, so that grant mapping leaks caused by partially failed grant
> >    mapping operations are avoided with the first patch, and so that the
> >    splitting of VMAs does not result in incorrectly unmapped grant pages
> >    with the second patch. The second patch also prevents a similar issue
> >    in a double-mapping scenario, where mmap() is used with MAP_FIXED to
> >    map grants over an existing mapping created with the same grants, and
> >    where grant pages are unmapped incorrectly as well."
> > 
> > A summary of the changes from v1 is as follows:
> > - Addressed Juergen's code review comment regarding the first patch.
> > - Amended the description of the second patch to note that the described
> >    issues are encountered with PV domains.
> > 
> > Verification notes:
> > 
> > - I have tested these commits on top of Linux v5.15.70 and v5.15.71, and
> >    I verified that they compile successfully on top of the tag
> >    "next-20220930", which corresponds to the base commit ID included at
> >    the bottom of this e-mail.
> > 
> > - My tests consist of using a kernel with Qubes OS v4.1's patches and
> >    these patches on my main computer for day-to-day tasks, in conjunction
> >    with Qubes OS's version of the Xen hypervisor v4.14.5, with the latter
> >    custom-compiled with CONFIG_DEBUG.
> > 
> > - I used a test program that verifies the following scenarios with an
> >    unprivileged paravirtualized (PV) Xen domain:
> > 
> >    - A program mmap()s two pages from another Xen domain and munmap()s
> >      the pages one by one. This used to result in implicit unmap errors
> >      to be reported by Xen and a general protection fault to be triggered
> >      by Xen in the affected domain, but now works as expected.
> >    - A program mmap()s two pages from another Xen domain and then
> >      attempts to remap (via MAP_FIXED) the same mapping again over the
> >      same virtual address. This used to result in similar issues
> >      (implicit unmap errors and general protection fault), but now is
> >      rejected by the kernel.
> >    - A program mmap()s two pages from another Xen domain and then
> >      attempts to mmap() the same mapping again to a different virtual
> >      address, by passing NULL as mmap()'s first argument. This used to be
> >      rejected by the kernel, and it continues to be rejected by the
> >      kernel.
> > 
> > - Unprivileged PVH Xen domains were also sanity tested with the same
> >    test program. I should note that PVH domains worked as expected
> >    without these patches too.
> > 
> > - Finally, I have verified that the original "g.e. 0x1234 still pending"
> >    issue does not appear after rapidly resizing GUI windows in Qubes OS
> >    v4.1.
> 
> Series pushed to xen/tip.git for-linus-6.1

Thanks!
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 0/2] xen/gntdev: Fixes for leaks and VMA splitting
  2022-10-07  5:17 ` [PATCH v2 0/2] xen/gntdev: Fixes for leaks and " Juergen Gross
  2022-10-07 17:31   ` Demi Marie Obenour
@ 2022-10-08 15:17   ` M. Vefa Bicakci
  1 sibling, 0 replies; 12+ messages in thread
From: M. Vefa Bicakci @ 2022-10-08 15:17 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, linux-kernel
  Cc: Stefano Stabellini, Oleksandr Tyshchenko, Demi Marie Obenour,
	Gerd Hoffmann

On 2022-10-07 01:17, Juergen Gross wrote:
> On 03.10.22 00:20, M. Vefa Bicakci wrote:
>> Hi all,
>>
>> First of all, sorry for the delay!
>>
>> These patches continue the code review for the following patches:
>>    https://lore.kernel.org/xen-devel/20220912040002.198191-1-m.v.b@runbox.com/t/#u
>>
>> The original description of the patch set is as follows:
>>
>>    "The changes in this patch series intend to fix the Xen grant device
>>    driver, so that grant mapping leaks caused by partially failed grant
>>    mapping operations are avoided with the first patch, and so that the
>>    splitting of VMAs does not result in incorrectly unmapped grant pages
>>    with the second patch. The second patch also prevents a similar issue
>>    in a double-mapping scenario, where mmap() is used with MAP_FIXED to
>>    map grants over an existing mapping created with the same grants, and
>>    where grant pages are unmapped incorrectly as well."
>>
>> A summary of the changes from v1 is as follows:
>> - Addressed Juergen's code review comment regarding the first patch.
>> - Amended the description of the second patch to note that the described
>>    issues are encountered with PV domains.
>>
>> Verification notes:
>>
>> - I have tested these commits on top of Linux v5.15.70 and v5.15.71, and
>>    I verified that they compile successfully on top of the tag
>>    "next-20220930", which corresponds to the base commit ID included at
>>    the bottom of this e-mail.
>>
>> - My tests consist of using a kernel with Qubes OS v4.1's patches and
>>    these patches on my main computer for day-to-day tasks, in conjunction
>>    with Qubes OS's version of the Xen hypervisor v4.14.5, with the latter
>>    custom-compiled with CONFIG_DEBUG.
>>
>> - I used a test program that verifies the following scenarios with an
>>    unprivileged paravirtualized (PV) Xen domain:
>>
>>    - A program mmap()s two pages from another Xen domain and munmap()s
>>      the pages one by one. This used to result in implicit unmap errors
>>      to be reported by Xen and a general protection fault to be triggered
>>      by Xen in the affected domain, but now works as expected.
>>    - A program mmap()s two pages from another Xen domain and then
>>      attempts to remap (via MAP_FIXED) the same mapping again over the
>>      same virtual address. This used to result in similar issues
>>      (implicit unmap errors and general protection fault), but now is
>>      rejected by the kernel.
>>    - A program mmap()s two pages from another Xen domain and then
>>      attempts to mmap() the same mapping again to a different virtual
>>      address, by passing NULL as mmap()'s first argument. This used to be
>>      rejected by the kernel, and it continues to be rejected by the
>>      kernel.
>>
>> - Unprivileged PVH Xen domains were also sanity tested with the same
>>    test program. I should note that PVH domains worked as expected
>>    without these patches too.
>>
>> - Finally, I have verified that the original "g.e. 0x1234 still pending"
>>    issue does not appear after rapidly resizing GUI windows in Qubes OS
>>    v4.1.
> 
> Series pushed to xen/tip.git for-linus-6.1
> 
> 
> Juergen

I am a bit late, but thank you for reviewing the changes and merging them!

Vefa

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-10-08 15:17 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-02 22:20 [PATCH v2 0/2] xen/gntdev: Fixes for leaks and VMA splitting M. Vefa Bicakci
2022-10-02 22:20 ` [PATCH v2 1/2] xen/gntdev: Prevent leaking grants M. Vefa Bicakci
2022-10-03  0:29   ` Demi Marie Obenour
2022-10-04  1:31     ` M. Vefa Bicakci
2022-10-04  1:51       ` Demi Marie Obenour
2022-10-05  1:37         ` M. Vefa Bicakci
2022-10-06  8:33   ` Juergen Gross
2022-10-02 22:20 ` [PATCH v2 2/2] xen/gntdev: Accommodate VMA splitting M. Vefa Bicakci
2022-10-06  8:36   ` Juergen Gross
2022-10-07  5:17 ` [PATCH v2 0/2] xen/gntdev: Fixes for leaks and " Juergen Gross
2022-10-07 17:31   ` Demi Marie Obenour
2022-10-08 15:17   ` M. Vefa Bicakci

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).