On Mon, Oct 03, 2022 at 09:31:25PM -0400, M. Vefa Bicakci wrote:
> On 2022-10-02 20:29, Demi Marie Obenour wrote:
> > On Sun, Oct 02, 2022 at 06:20:05PM -0400, M. Vefa Bicakci wrote:
> > > Prior to this commit, if a grant mapping operation failed partially,
> > > some of the entries in the map_ops array would be invalid, whereas all
> > > of the entries in the kmap_ops array would be valid. This in turn would
> > > cause the following logic in gntdev_map_grant_pages to become invalid:
> > >
> > >   for (i = 0; i < map->count; i++) {
> > >     if (map->map_ops[i].status == GNTST_okay) {
> > >       map->unmap_ops[i].handle = map->map_ops[i].handle;
> > >       if (!use_ptemod)
> > >         alloced++;
> > >     }
> > >     if (use_ptemod) {
> > >       if (map->kmap_ops[i].status == GNTST_okay) {
> > >         if (map->map_ops[i].status == GNTST_okay)
> > >           alloced++;
> > >         map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
> > >       }
> > >     }
> > >   }
> > >   ...
> > >   atomic_add(alloced, &map->live_grants);
> > >
> > > Assume that use_ptemod is true (i.e., the domain mapping the granted
> > > pages is a paravirtualized domain). In the code excerpt above, note that
> > > the "alloced" variable is only incremented when both kmap_ops[i].status
> > > and map_ops[i].status are set to GNTST_okay (i.e., both mapping
> > > operations are successful).  However, as also noted above, there are
> > > cases where a grant mapping operation fails partially, breaking the
> > > assumption of the code excerpt above.
> > >
> > > The aforementioned causes map->live_grants to be incorrectly set. In
> > > some cases, all of the map_ops mappings fail, but all of the kmap_ops
> > > mappings succeed, meaning that live_grants may remain zero.
> > > This in turn makes it impossible to unmap the successfully grant-mapped
> > > pages pointed to by kmap_ops, because unmap_grant_pages has the
> > > following snippet of code at its beginning:
> > >
> > >   if (atomic_read(&map->live_grants) == 0)
> > >     return; /* Nothing to do */
> > >
> > > In other cases where only some of the map_ops mappings fail but all
> > > kmap_ops mappings succeed, live_grants is made positive, but when the
> > > user requests unmapping the grant-mapped pages, __unmap_grant_pages_done
> > > will then make map->live_grants negative, because the latter function
> > > does not check if all of the pages that were requested to be unmapped
> > > were actually unmapped, and the same function unconditionally subtracts
> > > "data->count" (i.e., a value that can be greater than map->live_grants)
> > > from map->live_grants. The side effects of a negative live_grants value
> > > have not been studied.
> > >
> > > The net effect of all of this is that grant references are leaked in one
> > > of the above conditions. In Qubes OS v4.1 (which uses Xen's grant
> > > mechanism extensively for X11 GUI isolation), this issue manifests
> > > itself with warning messages like the following to be printed out by the
> > > Linux kernel in the VM that had granted pages (that contain X11 GUI
> > > window data) to dom0: "g.e. 0x1234 still pending", especially after the
> > > user rapidly resizes GUI VM windows (causing some grant-mapping
> > > operations to partially or completely fail, due to the fact that the VM
> > > unshares some of the pages as part of the window resizing, making the
> > > pages impossible to grant-map from dom0).
> > >
> > > The fix for this issue involves counting all successful map_ops and
> > > kmap_ops mappings separately, and then adding the sum to live_grants.
> > > During unmapping, only the number of successfully unmapped grants is
> > > subtracted from live_grants.
> > > The code is also modified to check for negative live_grants values
> > > after the subtraction and warn the user.
> > >
> > > Link: https://github.com/QubesOS/qubes-issues/issues/7631
> > > Fixes: dbe97cff7dd9 ("xen/gntdev: Avoid blocking in unmap_grant_pages()")
> >
> > Looks like this patch has been pretty buggy, sorry.  This is the second
> > time there has been a problem with it.  Thanks for the fix.
>
> Hi,
>
> No problem! :-) Debugging this issue and coming up with a fix was a
> nice challenge for me.

You’re welcome!  I’m glad you were able to do this.

> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: M. Vefa Bicakci
> > > ---
> > >
> > > Changes since v1:
> > > - To determine which unmap operations were successful, the previous
> > >   version of this patch set the "unmap_ops[i].status" and
> > >   "kunmap_ops[i].status" fields to the value "1" prior to passing these
> > >   data structures to the hypervisor. Instead of doing that, the code now
> > >   checks whether the "handle" fields in the same data structures were
> > >   *not* set to "INVALID_GRANT_HANDLE". (Suggested by Juergen Gross.)
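
As a side note on the accounting described in the commit message above,
here is a minimal userspace sketch (not the driver code) that contrasts
the old and new counting for the use_ptemod case.  The page_result
struct, the helper names and the two sample arrays are hypothetical and
exist only for illustration.  The sketch also assumes that every
mapping that succeeded is later unmapped successfully.

```c
#include <assert.h>

#define GNTST_okay          0     /* same values as Xen's grant table ABI */
#define GNTST_general_error (-1)

/* Hypothetical stand-in for one page's map_ops[i]/kmap_ops[i] results. */
struct page_result {
	int map_status;
	int kmap_status;
};

/* All map_ops fail, all kmap_ops succeed. */
const struct page_result all_map_failed[4] = {
	{ GNTST_general_error, GNTST_okay },
	{ GNTST_general_error, GNTST_okay },
	{ GNTST_general_error, GNTST_okay },
	{ GNTST_general_error, GNTST_okay },
};

/* Only some map_ops fail, all kmap_ops succeed. */
const struct page_result some_map_failed[4] = {
	{ GNTST_general_error, GNTST_okay },
	{ GNTST_general_error, GNTST_okay },
	{ GNTST_okay, GNTST_okay },
	{ GNTST_okay, GNTST_okay },
};

/* Old scheme (use_ptemod): a page counts only if both mappings succeeded. */
int count_old(const struct page_result *r, int n)
{
	int alloced = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		if (r[i].kmap_status == GNTST_okay &&
		    r[i].map_status == GNTST_okay)
			alloced++;
	return alloced;
}

/* New scheme: every successful mapping is counted on its own. */
int count_new(const struct page_result *r, int n)
{
	int alloced = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		if (r[i].map_status == GNTST_okay)
			alloced++;
		if (r[i].kmap_status == GNTST_okay)
			alloced++;
	}
	return alloced;
}

/* Old unmap path: subtract the page count unconditionally. */
int unmap_old(int live_grants, int page_count)
{
	return live_grants - page_count;
}

/* New unmap path: subtract only the unmaps that actually succeeded. */
int unmap_new(int live_grants, int successful_unmaps)
{
	return live_grants - successful_unmaps;
}
```

With all_map_failed, the old scheme leaves live_grants at 0, so the
early return in unmap_grant_pages skips the four live kmap_ops
mappings, while the new scheme counts 4.  With some_map_failed, the old
scheme gives 2 and then 2 - 4 = -2 after unmapping, whereas the new
scheme gives 6 and then 6 - 6 = 0.
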
> > > ---
> > >  drivers/xen/gntdev.c | 22 +++++++++++++++++-----
> > >  1 file changed, 17 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> > > index 84b143eef395..eb0586b9767d 100644
> > > --- a/drivers/xen/gntdev.c
> > > +++ b/drivers/xen/gntdev.c
> > > @@ -367,8 +367,7 @@ int gntdev_map_grant_pages(struct gntdev_grant_map *map)
> > >  	for (i = 0; i < map->count; i++) {
> > >  		if (map->map_ops[i].status == GNTST_okay) {
> > >  			map->unmap_ops[i].handle = map->map_ops[i].handle;
> > > -			if (!use_ptemod)
> > > -				alloced++;
> > > +			alloced++;
> > >  		} else if (!err)
> > >  			err = -EINVAL;
> > >
> > > @@ -377,8 +376,7 @@ int gntdev_map_grant_pages(struct gntdev_grant_map *map)
> > >
> > >  		if (use_ptemod) {
> > >  			if (map->kmap_ops[i].status == GNTST_okay) {
> > > -				if (map->map_ops[i].status == GNTST_okay)
> > > -					alloced++;
> > > +				alloced++;
> > >  				map->kunmap_ops[i].handle = map->kmap_ops[i].handle;
> > >  			} else if (!err)
> > >  				err = -EINVAL;
> > > @@ -394,8 +392,14 @@ static void __unmap_grant_pages_done(int result,
> > >  	unsigned int i;
> > >  	struct gntdev_grant_map *map = data->data;
> > >  	unsigned int offset = data->unmap_ops - map->unmap_ops;
> > > +	int successful_unmaps = 0;
> > > +	int live_grants;
> > >
> > >  	for (i = 0; i < data->count; i++) {
> > > +		if (map->unmap_ops[offset + i].status == GNTST_okay &&
> > > +		    map->unmap_ops[offset + i].handle != INVALID_GRANT_HANDLE)
> > > +			successful_unmaps++;
> > > +
> > >  		WARN_ON(map->unmap_ops[offset + i].status != GNTST_okay &&
> > >  			map->unmap_ops[offset + i].handle != INVALID_GRANT_HANDLE);
> > >  		pr_debug("unmap handle=%d st=%d\n",
> > > @@ -403,6 +407,10 @@ static void __unmap_grant_pages_done(int result,
> > >  			map->unmap_ops[offset+i].status);
> > >  		map->unmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
> > >  		if (use_ptemod) {
> > > +			if (map->kunmap_ops[offset + i].status == GNTST_okay &&
> > > +			    map->kunmap_ops[offset + i].handle != INVALID_GRANT_HANDLE)
> > > +				successful_unmaps++;
> > > +
> > >  			WARN_ON(map->kunmap_ops[offset + i].status != GNTST_okay &&
> > >  				map->kunmap_ops[offset + i].handle != INVALID_GRANT_HANDLE);
> > >  			pr_debug("kunmap handle=%u st=%d\n",
> > > @@ -411,11 +419,15 @@ static void __unmap_grant_pages_done(int result,
> > >  			map->kunmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
> > >  		}
> > >  	}
> > > +
> > >  	/*
> > >  	 * Decrease the live-grant counter. This must happen after the loop to
> > >  	 * prevent premature reuse of the grants by gnttab_mmap().
> > >  	 */
> > > -	atomic_sub(data->count, &map->live_grants);
> > > +	live_grants = atomic_sub_return(successful_unmaps, &map->live_grants);
> > > +	if (WARN_ON(live_grants < 0))
> > > +		pr_err("%s: live_grants became negative (%d) after unmapping %d pages!\n",
> > > +		       __func__, live_grants, successful_unmaps);
> > >
> > >  	/* Release reference taken by __unmap_grant_pages */
> > >  	gntdev_put_map(NULL, map);
> > > --
> > > 2.37.3
> >
> > Is there a possibility that live_grants could overflow, as it is now
> > set to a value twice as large as what it had been previously?
>
> Good point! My answer in summary: I think that the code could be improved,
> but with reasonable values for the "limit" module parameter, there should
> not be issues.
>
> Grant mappings are set up via ioctl calls, and the structure field that
> holds the number of grant references has u32 type:
>
> (Quoting from kernel v5.15.71 for convenience)
> include/uapi/xen/gntdev.h
> === 8< ===
> struct ioctl_gntdev_map_grant_ref {
> 	/* IN parameters */
> 	/* The number of grants to be mapped. */
> 	__u32 count;
> === >8 ===
>
> However, the number of grant references is further limited in the actual
> ioctl handler function gntdev_ioctl_map_grant_ref(), which calls
> gntdev_test_page_count() to ensure that the number of granted pages
> requested to be mapped does not exceed "limit".
> "limit" defaults to 64K, which should be okay to use with an atomic_t
> type (i.e., a 32-bit signed integer type) like "live_grants", assuming
> that the system administrator does not go overboard and set "limit" to
> a very large value:
>
> drivers/xen/gntdev.c
> === 8< ===
> static unsigned int limit = 64*1024;
> module_param(limit, uint, 0644);
> MODULE_PARM_DESC(limit,
> 	"Maximum number of grants that may be mapped by one mapping request");
>
> /* trimmed */
>
> bool gntdev_test_page_count(unsigned int count)
> {
> 	return !count || count > limit;
> }
>
> /* trimmed */
>
> static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
> 				       struct ioctl_gntdev_map_grant_ref __user *u)
> {
> 	/* trimmed */
>
> 	pr_debug("priv %p, add %d\n", priv, op.count);
> 	if (unlikely(gntdev_test_page_count(op.count)))
> 		return -EINVAL;
>
> 	/* trimmed */
> }
> === >8 ===
>
> To be fair, the "count" field of the gntdev_grant_map structure is a signed
> integer, so very large values of count could overflow live_grants, as
> live_grants needs to accommodate values up to and including 2*count.

Could this be replaced by an unsigned and/or 64-bit integer?
Alternatively, one could use module_param_cb and param_set_uint_minmax
to enforce that the limit is something reasonable.  That said, one needs
almost 8TiB to trigger this problem, so while it ought to be fixed it
isn’t a huge deal.  Certainly should not block getting this merged.

> drivers/xen/gntdev-common.h
> === 8< ===
> struct gntdev_grant_map {
> 	atomic_t in_use;
> 	struct mmu_interval_notifier notifier;
> 	bool notifier_init;
> 	struct list_head next;
> 	int index;
> 	int count;
> 	/* trimmed */
> }
> === >8 ===
>
> > If not, you can add:
> >
> > Acked-by: Demi Marie Obenour
>
> Thank you! I hope that the explanation and rationale above are satisfactory.
> Please let me know what you think.

They are indeed.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab