All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/7] Multiple fixes to XENMEM_acquire_resource
@ 2021-01-12 19:48 Andrew Cooper
  2021-01-12 19:48 ` [PATCH v3 1/7] xen/gnttab: Rework resource acquisition Andrew Cooper
                   ` (6 more replies)
  0 siblings, 7 replies; 29+ messages in thread
From: Andrew Cooper @ 2021-01-12 19:48 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Jan Beulich,
	Stefano Stabellini, Wei Liu, Julien Grall, Paul Durrant,
	Michał Leszczyński, Hubert Jasudowicz, Tamas K Lengyel,
	Juergen Gross

I thought this was going to be a very simple small bugfix for Michał's
Processor Trace series.  Serves me right for expecting it not to be full of
bear traps...

The sole implementation of acquire_resource never asks for size, so its little
surprise that Xen is broken for compat callers, and returns incorrect
information for regular callers.

v2 was delayed substantially due to the discovery of XSA-334, but is complete
now, permitting the mapping of arbitrary sized resouces, along with fixes to
the compat XLAT logic.

v3 was delayed substanitally due to other security work.  The major change
from v2 is how the size request works from userspace, which now depends on:

  https://lore.kernel.org/xen-devel/20210112115358.23346-1-roger.pau@citrix.com/T/#u

to fix the IOCTL in Linux without breaking ioctl-restrict usecases.

A branch can be obtained from:

  https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-acquire-resource

Andrew Cooper (7):
  xen/gnttab: Rework resource acquisition
  xen/memory: Fix acquire_resource size semantics
  tools/foreignmem: Support querying the size of a resource
  xen/memory: Clarify the XENMEM_acquire_resource ABI description
  xen/memory: Improve compat XENMEM_acquire_resource handling
  xen/memory: Indent part of acquire_resource()
  xen/memory: Fix mapping grant tables with XENMEM_acquire_resource

 tools/include/xenforeignmemory.h                 |  15 ++
 tools/libs/foreignmemory/Makefile                |   2 +-
 tools/libs/foreignmemory/core.c                  |  18 ++
 tools/libs/foreignmemory/freebsd.c               |  18 +-
 tools/libs/foreignmemory/libxenforeignmemory.map |   4 +
 tools/libs/foreignmemory/linux.c                 |  18 +-
 xen/arch/x86/mm.c                                |  24 ++-
 xen/common/compat/memory.c                       | 147 ++++++++++++----
 xen/common/grant_table.c                         | 106 +++++++++---
 xen/common/memory.c                              | 210 ++++++++++++++---------
 xen/include/asm-x86/mm.h                         |   3 +
 xen/include/public/memory.h                      |  23 ++-
 xen/include/xen/grant_table.h                    |  21 ++-
 xen/include/xen/mm.h                             |   6 +
 14 files changed, 445 insertions(+), 170 deletions(-)

-- 
2.11.0



^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 1/7] xen/gnttab: Rework resource acquisition
  2021-01-12 19:48 [PATCH v3 0/7] Multiple fixes to XENMEM_acquire_resource Andrew Cooper
@ 2021-01-12 19:48 ` Andrew Cooper
  2021-01-15 11:43   ` Jan Beulich
  2021-01-15 11:56   ` Jan Beulich
  2021-01-12 19:48 ` [PATCH v3 2/7] xen/memory: Fix acquire_resource size semantics Andrew Cooper
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 29+ messages in thread
From: Andrew Cooper @ 2021-01-12 19:48 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Jan Beulich,
	Stefano Stabellini, Wei Liu, Julien Grall, Paul Durrant,
	Michał Leszczyński, Hubert Jasudowicz, Tamas K Lengyel

The existing logic doesn't function in the general case for mapping a guests
grant table, due to arbitrary 32 frame limit, and the default grant table
limit being 64.

In order to start addressing this, rework the existing grant table logic by
implementing a single gnttab_acquire_resource().  This is far more efficient
than the previous acquire_grant_table() in memory.c because it doesn't take
the grant table write lock, and attempt to grow the table, for every single
frame.

The new gnttab_acquire_resource() function subsumes the previous two
gnttab_get_{shared,status}_frame() helpers.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <iwj@xenproject.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Paul Durrant <paul@xen.org>
CC: Michał Leszczyński <michal.leszczynski@cert.pl>
CC: Hubert Jasudowicz <hubert.jasudowicz@cert.pl>
CC: Tamas K Lengyel <tamas@tklengyel.com>

v3:
 * Fold switch statements in gnttab_acquire_resource()
---
 xen/common/grant_table.c      | 80 ++++++++++++++++++++++++++++---------------
 xen/common/memory.c           | 42 ++---------------------
 xen/include/xen/grant_table.h | 19 ++++------
 3 files changed, 62 insertions(+), 79 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index a5d3ed8bda..f560c250d7 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -4013,6 +4013,59 @@ static int gnttab_get_shared_frame_mfn(struct domain *d,
     return 0;
 }
 
+int gnttab_acquire_resource(
+    struct domain *d, unsigned int id, unsigned long frame,
+    unsigned int nr_frames, xen_pfn_t mfn_list[])
+{
+    struct grant_table *gt = d->grant_table;
+    unsigned int i = nr_frames, tot_frames;
+    mfn_t tmp;
+    void **vaddrs;
+    int rc;
+
+    /* Overflow checks */
+    if ( frame + nr_frames < frame )
+        return -EINVAL;
+
+    tot_frames = frame + nr_frames;
+    if ( tot_frames != frame + nr_frames )
+        return -EINVAL;
+
+    /* Grow table if necessary. */
+    grant_write_lock(gt);
+    rc = -EINVAL;
+    switch ( id )
+    {
+    case XENMEM_resource_grant_table_id_shared:
+        vaddrs = gt->shared_raw;
+        rc = gnttab_get_shared_frame_mfn(d, tot_frames - 1, &tmp);
+        break;
+
+    case XENMEM_resource_grant_table_id_status:
+        if ( gt->gt_version != 2 )
+            break;
+
+        /* Check that void ** is a suitable representation for gt->status. */
+        BUILD_BUG_ON(!__builtin_types_compatible_p(
+                         typeof(gt->status), grant_status_t **));
+        vaddrs = (void **)gt->status;
+        rc = gnttab_get_status_frame_mfn(d, tot_frames - 1, &tmp);
+        break;
+    }
+
+    /* Any errors?  Bad id, or from growing the table? */
+    if ( rc )
+        goto out;
+
+    for ( i = 0; i < nr_frames; ++i )
+        mfn_list[i] = virt_to_mfn(vaddrs[frame + i]);
+
+ out:
+    grant_write_unlock(gt);
+
+    return rc;
+}
+
 int gnttab_map_frame(struct domain *d, unsigned long idx, gfn_t gfn, mfn_t *mfn)
 {
     int rc = 0;
@@ -4047,33 +4100,6 @@ int gnttab_map_frame(struct domain *d, unsigned long idx, gfn_t gfn, mfn_t *mfn)
     return rc;
 }
 
-int gnttab_get_shared_frame(struct domain *d, unsigned long idx,
-                            mfn_t *mfn)
-{
-    struct grant_table *gt = d->grant_table;
-    int rc;
-
-    grant_write_lock(gt);
-    rc = gnttab_get_shared_frame_mfn(d, idx, mfn);
-    grant_write_unlock(gt);
-
-    return rc;
-}
-
-int gnttab_get_status_frame(struct domain *d, unsigned long idx,
-                            mfn_t *mfn)
-{
-    struct grant_table *gt = d->grant_table;
-    int rc;
-
-    grant_write_lock(gt);
-    rc = (gt->gt_version == 2) ?
-        gnttab_get_status_frame_mfn(d, idx, mfn) : -EINVAL;
-    grant_write_unlock(gt);
-
-    return rc;
-}
-
 static void gnttab_usage_print(struct domain *rd)
 {
     int first = 1;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index b21b6c452d..82cf7b41ee 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1052,44 +1052,6 @@ static long xatp_permission_check(struct domain *d, unsigned int space)
     return xsm_add_to_physmap(XSM_TARGET, current->domain, d);
 }
 
-static int acquire_grant_table(struct domain *d, unsigned int id,
-                               unsigned long frame,
-                               unsigned int nr_frames,
-                               xen_pfn_t mfn_list[])
-{
-    unsigned int i = nr_frames;
-
-    /* Iterate backwards in case table needs to grow */
-    while ( i-- != 0 )
-    {
-        mfn_t mfn = INVALID_MFN;
-        int rc;
-
-        switch ( id )
-        {
-        case XENMEM_resource_grant_table_id_shared:
-            rc = gnttab_get_shared_frame(d, frame + i, &mfn);
-            break;
-
-        case XENMEM_resource_grant_table_id_status:
-            rc = gnttab_get_status_frame(d, frame + i, &mfn);
-            break;
-
-        default:
-            rc = -EINVAL;
-            break;
-        }
-
-        if ( rc )
-            return rc;
-
-        ASSERT(!mfn_eq(mfn, INVALID_MFN));
-        mfn_list[i] = mfn_x(mfn);
-    }
-
-    return 0;
-}
-
 static int acquire_resource(
     XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
 {
@@ -1144,8 +1106,8 @@ static int acquire_resource(
     switch ( xmar.type )
     {
     case XENMEM_resource_grant_table:
-        rc = acquire_grant_table(d, xmar.id, xmar.frame, xmar.nr_frames,
-                                 mfn_list);
+        rc = gnttab_acquire_resource(d, xmar.id, xmar.frame, xmar.nr_frames,
+                                     mfn_list);
         break;
 
     default:
diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
index 98603604b8..5a2c75b880 100644
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -56,10 +56,10 @@ int mem_sharing_gref_to_gfn(struct grant_table *gt, grant_ref_t ref,
 
 int gnttab_map_frame(struct domain *d, unsigned long idx, gfn_t gfn,
                      mfn_t *mfn);
-int gnttab_get_shared_frame(struct domain *d, unsigned long idx,
-                            mfn_t *mfn);
-int gnttab_get_status_frame(struct domain *d, unsigned long idx,
-                            mfn_t *mfn);
+
+int gnttab_acquire_resource(
+    struct domain *d, unsigned int id, unsigned long frame,
+    unsigned int nr_frames, xen_pfn_t mfn_list[]);
 
 #else
 
@@ -93,14 +93,9 @@ static inline int gnttab_map_frame(struct domain *d, unsigned long idx,
     return -EINVAL;
 }
 
-static inline int gnttab_get_shared_frame(struct domain *d, unsigned long idx,
-                                          mfn_t *mfn)
-{
-    return -EINVAL;
-}
-
-static inline int gnttab_get_status_frame(struct domain *d, unsigned long idx,
-                                          mfn_t *mfn)
+static inline int gnttab_acquire_resource(
+    struct domain *d, unsigned int id, unsigned long frame,
+    unsigned int nr_frames, xen_pfn_t mfn_list[])
 {
     return -EINVAL;
 }
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 2/7] xen/memory: Fix acquire_resource size semantics
  2021-01-12 19:48 [PATCH v3 0/7] Multiple fixes to XENMEM_acquire_resource Andrew Cooper
  2021-01-12 19:48 ` [PATCH v3 1/7] xen/gnttab: Rework resource acquisition Andrew Cooper
@ 2021-01-12 19:48 ` Andrew Cooper
  2021-01-12 20:15   ` Julien Grall
  2021-01-12 19:48 ` [PATCH v3 3/7] tools/foreignmem: Support querying the size of a resource Andrew Cooper
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2021-01-12 19:48 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Jan Beulich,
	Stefano Stabellini, Wei Liu, Julien Grall, Paul Durrant,
	Michał Leszczyński, Hubert Jasudowicz, Tamas K Lengyel

Calling XENMEM_acquire_resource with a NULL frame_list is a request for the
size of the resource, but the returned 32 is bogus.

If someone tries to follow it for XENMEM_resource_ioreq_server, the acquire
call will fail as IOREQ servers currently top out at 2 frames, and it is only
half the size of the default grant table limit for guests.

Also, no users actually request a resource size, because it was never wired up
in the sole implementaion of resource acquisition in Linux.

Introduce a new resource_max_frames() to calculate the size of a resource, and
implement it the IOREQ and grant subsystems.

It is impossible to guarantee that a mapping call following a successful size
call will succeed (e.g. The target IOREQ server gets destroyed, or the domain
switches from grant v2 to v1).  Document the restriction, and use the
flexibility to simplify the paths to be lockless.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <iwj@xenproject.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Paul Durrant <paul@xen.org>
CC: Michał Leszczyński <michal.leszczynski@cert.pl>
CC: Hubert Jasudowicz <hubert.jasudowicz@cert.pl>
CC: Tamas K Lengyel <tamas@tklengyel.com>

v3:
 * Use const struct domain *
 * Fold goto out paths
v2:
 * Spelling fixes
 * Add more local variables.
 * Don't return any status frames on ARM where v2 support is compiled out.
---
 xen/arch/x86/mm.c             | 20 +++++++++++++++++
 xen/common/grant_table.c      | 23 ++++++++++++++++++++
 xen/common/memory.c           | 50 ++++++++++++++++++++++++++++++++-----------
 xen/include/asm-x86/mm.h      |  3 +++
 xen/include/public/memory.h   | 16 ++++++++++----
 xen/include/xen/grant_table.h |  8 +++++++
 xen/include/xen/mm.h          |  6 ++++++
 7 files changed, 109 insertions(+), 17 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 63e9fae919..7a2e94cd6f 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4587,6 +4587,26 @@ static int handle_iomem_range(unsigned long s, unsigned long e, void *p)
     return err || s > e ? err : _handle_iomem_range(s, e, p);
 }
 
+unsigned int arch_resource_max_frames(
+    const struct domain *d, unsigned int type, unsigned int id)
+{
+    unsigned int nr = 0;
+
+    switch ( type )
+    {
+#ifdef CONFIG_HVM
+    case XENMEM_resource_ioreq_server:
+        if ( !is_hvm_domain(d) )
+            break;
+        /* One frame for the buf-ioreq ring, and one frame per 128 vcpus. */
+        nr = 1 + DIV_ROUND_UP(d->max_vcpus * sizeof(struct ioreq), PAGE_SIZE);
+        break;
+#endif
+    }
+
+    return nr;
+}
+
 int arch_acquire_resource(struct domain *d, unsigned int type,
                           unsigned int id, unsigned long frame,
                           unsigned int nr_frames, xen_pfn_t mfn_list[])
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index f560c250d7..bd99dddbf6 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -4013,6 +4013,29 @@ static int gnttab_get_shared_frame_mfn(struct domain *d,
     return 0;
 }
 
+unsigned int gnttab_resource_max_frames(const struct domain *d, unsigned int id)
+{
+    const struct grant_table *gt = d->grant_table;
+    unsigned int nr = 0;
+
+    /* Don't need the grant lock.  This limit is fixed at domain create time. */
+    switch ( id )
+    {
+    case XENMEM_resource_grant_table_id_shared:
+        nr = gt->max_grant_frames;
+        break;
+
+    case XENMEM_resource_grant_table_id_status:
+        if ( GNTTAB_MAX_VERSION < 2 )
+            break;
+
+        nr = grant_to_status_frames(gt->max_grant_frames);
+        break;
+    }
+
+    return nr;
+}
+
 int gnttab_acquire_resource(
     struct domain *d, unsigned int id, unsigned long frame,
     unsigned int nr_frames, xen_pfn_t mfn_list[])
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 82cf7b41ee..beefa6a313 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1052,6 +1052,26 @@ static long xatp_permission_check(struct domain *d, unsigned int space)
     return xsm_add_to_physmap(XSM_TARGET, current->domain, d);
 }
 
+/*
+ * Return 0 on any kind of error.  Caller converts to -EINVAL.
+ *
+ * All nonzero values should be repeatable (i.e. derived from some fixed
+ * property of the domain), and describe the full resource (i.e. mapping the
+ * result of this call will be the entire resource).
+ */
+static unsigned int resource_max_frames(const struct domain *d,
+                                        unsigned int type, unsigned int id)
+{
+    switch ( type )
+    {
+    case XENMEM_resource_grant_table:
+        return gnttab_resource_max_frames(d, id);
+
+    default:
+        return arch_resource_max_frames(d, type, id);
+    }
+}
+
 static int acquire_resource(
     XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
 {
@@ -1063,6 +1083,7 @@ static int acquire_resource(
      * use-cases then per-CPU arrays or heap allocations may be required.
      */
     xen_pfn_t mfn_list[32];
+    unsigned int max_frames;
     int rc;
 
     /*
@@ -1079,19 +1100,6 @@ static int acquire_resource(
     if ( xmar.pad != 0 )
         return -EINVAL;
 
-    if ( guest_handle_is_null(xmar.frame_list) )
-    {
-        if ( xmar.nr_frames )
-            return -EINVAL;
-
-        xmar.nr_frames = ARRAY_SIZE(mfn_list);
-
-        if ( __copy_field_to_guest(arg, &xmar, nr_frames) )
-            return -EFAULT;
-
-        return 0;
-    }
-
     if ( xmar.nr_frames > ARRAY_SIZE(mfn_list) )
         return -E2BIG;
 
@@ -1103,6 +1111,22 @@ static int acquire_resource(
     if ( rc )
         goto out;
 
+    max_frames = resource_max_frames(d, xmar.type, xmar.id);
+
+    rc = -EINVAL;
+    if ( !max_frames )
+        goto out;
+
+    if ( guest_handle_is_null(xmar.frame_list) )
+    {
+        if ( xmar.nr_frames )
+            goto out;
+
+        xmar.nr_frames = max_frames;
+        rc = __copy_field_to_guest(arg, &xmar, nr_frames) ? -EFAULT : 0;
+        goto out;
+    }
+
     switch ( xmar.type )
     {
     case XENMEM_resource_grant_table:
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index deeba75a1c..a40a7fa024 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -639,6 +639,9 @@ static inline bool arch_mfn_in_directmap(unsigned long mfn)
     return mfn <= (virt_to_mfn(eva - 1) + 1);
 }
 
+unsigned int arch_resource_max_frames(const struct domain *d,
+                                      unsigned int type, unsigned int id);
+
 int arch_acquire_resource(struct domain *d, unsigned int type,
                           unsigned int id, unsigned long frame,
                           unsigned int nr_frames, xen_pfn_t mfn_list[]);
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 21d483298e..d7eb34f167 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -639,10 +639,18 @@ struct xen_mem_acquire_resource {
 #define XENMEM_resource_grant_table_id_status 1
 
     /*
-     * IN/OUT - As an IN parameter number of frames of the resource
-     *          to be mapped. However, if the specified value is 0 and
-     *          frame_list is NULL then this field will be set to the
-     *          maximum value supported by the implementation on return.
+     * IN/OUT
+     *
+     * As an IN parameter number of frames of the resource to be mapped.
+     *
+     * When frame_list is NULL and nr_frames is 0, this is interpreted as a
+     * request for the size of the resource, which shall be returned in the
+     * nr_frames field.
+     *
+     * The size of a resource will never be zero, but a nonzero result doesn't
+     * guarantee that a subsequent mapping request will be successful.  There
+     * are further type/id specific constraints which may change between the
+     * two calls.
      */
     uint32_t nr_frames;
     uint32_t pad;
diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
index 5a2c75b880..015704bebc 100644
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -57,6 +57,8 @@ int mem_sharing_gref_to_gfn(struct grant_table *gt, grant_ref_t ref,
 int gnttab_map_frame(struct domain *d, unsigned long idx, gfn_t gfn,
                      mfn_t *mfn);
 
+unsigned int gnttab_resource_max_frames(const struct domain *d, unsigned int id);
+
 int gnttab_acquire_resource(
     struct domain *d, unsigned int id, unsigned long frame,
     unsigned int nr_frames, xen_pfn_t mfn_list[]);
@@ -93,6 +95,12 @@ static inline int gnttab_map_frame(struct domain *d, unsigned long idx,
     return -EINVAL;
 }
 
+static inline unsigned int gnttab_resource_max_frames(
+    const struct domain *d, unsigned int id)
+{
+    return 0;
+}
+
 static inline int gnttab_acquire_resource(
     struct domain *d, unsigned int id, unsigned long frame,
     unsigned int nr_frames, xen_pfn_t mfn_list[])
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index e62a5b726e..2a919fbcb4 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -703,6 +703,12 @@ static inline void put_page_alloc_ref(struct page_info *page)
 }
 
 #ifndef CONFIG_ARCH_ACQUIRE_RESOURCE
+static inline unsigned int arch_resource_max_frames(
+    const struct domain *d, unsigned int type, unsigned int id)
+{
+    return 0;
+}
+
 static inline int arch_acquire_resource(
     struct domain *d, unsigned int type, unsigned int id, unsigned long frame,
     unsigned int nr_frames, xen_pfn_t mfn_list[])
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 3/7] tools/foreignmem: Support querying the size of a resource
  2021-01-12 19:48 [PATCH v3 0/7] Multiple fixes to XENMEM_acquire_resource Andrew Cooper
  2021-01-12 19:48 ` [PATCH v3 1/7] xen/gnttab: Rework resource acquisition Andrew Cooper
  2021-01-12 19:48 ` [PATCH v3 2/7] xen/memory: Fix acquire_resource size semantics Andrew Cooper
@ 2021-01-12 19:48 ` Andrew Cooper
  2021-01-12 19:48 ` [PATCH v3 4/7] xen/memory: Clarify the XENMEM_acquire_resource ABI description Andrew Cooper
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Andrew Cooper @ 2021-01-12 19:48 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Wei Liu, Paul Durrant, Roger Pau Monné,
	Juergen Gross, Ian Jackson, Michał Leszczyński,
	Hubert Jasudowicz, Tamas K Lengyel

With the Xen side of this interface fixed to return real sizes, userspace
needs to be able to make the query.

Introduce xenforeignmemory_resource_size() for the purpose, bumping the
library minor version.

Update both Linux and FreeBSD's osdep_xenforeignmemory_map_resource() to
understand size requests, skip the mmap() operation, and copy back the
nr_frames field.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
CC: Wei Liu <wl@xen.org>
CC: Paul Durrant <paul@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Juergen Gross <jgross@suse.com>
CC: Ian Jackson <iwj@xenproject.org>
CC: Michał Leszczyński <michal.leszczynski@cert.pl>
CC: Hubert Jasudowicz <hubert.jasudowicz@cert.pl>
CC: Tamas K Lengyel <tamas@tklengyel.com>

This depends on a bugfix to the Linux IOCTL to understand size requests and
pass them on to Xen.

  https://lore.kernel.org/xen-devel/20210112115358.23346-1-roger.pau@citrix.com/T/#u

v3:
 * Rewrite from scratch, to avoid breaking restricted domid situations.  In
   particular, we cannot open a xencall interface and issue blind hypercalls.
---
 tools/include/xenforeignmemory.h                 | 15 +++++++++++++++
 tools/libs/foreignmemory/Makefile                |  2 +-
 tools/libs/foreignmemory/core.c                  | 18 ++++++++++++++++++
 tools/libs/foreignmemory/freebsd.c               | 18 +++++++++++++++---
 tools/libs/foreignmemory/libxenforeignmemory.map |  4 ++++
 tools/libs/foreignmemory/linux.c                 | 18 +++++++++++++++---
 6 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/tools/include/xenforeignmemory.h b/tools/include/xenforeignmemory.h
index d594be8df0..1ba2f5316b 100644
--- a/tools/include/xenforeignmemory.h
+++ b/tools/include/xenforeignmemory.h
@@ -179,6 +179,21 @@ xenforeignmemory_resource_handle *xenforeignmemory_map_resource(
 int xenforeignmemory_unmap_resource(
     xenforeignmemory_handle *fmem, xenforeignmemory_resource_handle *fres);
 
+/**
+ * Determine the maximum size of a specific resource.
+ *
+ * @parm fmem handle to the open foreignmemory interface
+ * @parm domid the domain id
+ * @parm type the resource type
+ * @parm id the type-specific resource identifier
+ *
+ * Return 0 on success and fills in *nr_frames.  Sets errno and return -1 on
+ * error.
+ */
+int xenforeignmemory_resource_size(
+    xenforeignmemory_handle *fmem, domid_t domid, unsigned int type,
+    unsigned int id, unsigned long *nr_frames);
+
 #endif
 
 /*
diff --git a/tools/libs/foreignmemory/Makefile b/tools/libs/foreignmemory/Makefile
index 13850f7988..90d80a49ae 100644
--- a/tools/libs/foreignmemory/Makefile
+++ b/tools/libs/foreignmemory/Makefile
@@ -2,7 +2,7 @@ XEN_ROOT = $(CURDIR)/../../..
 include $(XEN_ROOT)/tools/Rules.mk
 
 MAJOR    = 1
-MINOR    = 3
+MINOR    = 4
 
 SRCS-y                 += core.c
 SRCS-$(CONFIG_Linux)   += linux.c
diff --git a/tools/libs/foreignmemory/core.c b/tools/libs/foreignmemory/core.c
index 63f12e2450..1e92c567e1 100644
--- a/tools/libs/foreignmemory/core.c
+++ b/tools/libs/foreignmemory/core.c
@@ -188,6 +188,24 @@ int xenforeignmemory_unmap_resource(
     return rc;
 }
 
+int xenforeignmemory_resource_size(
+    xenforeignmemory_handle *fmem, domid_t domid, unsigned int type,
+    unsigned int id, unsigned long *nr_frames)
+{
+    xenforeignmemory_resource_handle fres = {
+        .domid = domid,
+        .type  = type,
+        .id    = id,
+    };
+    int rc = osdep_xenforeignmemory_map_resource(fmem, &fres);
+
+    if ( rc )
+        return rc;
+
+    *nr_frames = fres.nr_frames;
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/foreignmemory/freebsd.c b/tools/libs/foreignmemory/freebsd.c
index 3d403a7cd0..9a2796f0b7 100644
--- a/tools/libs/foreignmemory/freebsd.c
+++ b/tools/libs/foreignmemory/freebsd.c
@@ -119,6 +119,10 @@ int osdep_xenforeignmemory_map_resource(xenforeignmemory_handle *fmem,
     };
     int rc;
 
+    if ( !fres->addr && !fres->nr_frames )
+        /* Request for resource size.  Skip mmap(). */
+        goto skip_mmap;
+
     fres->addr = mmap(fres->addr, fres->nr_frames << PAGE_SHIFT,
                       fres->prot, fres->flags | MAP_SHARED, fmem->fd, 0);
     if ( fres->addr == MAP_FAILED )
@@ -126,6 +130,7 @@ int osdep_xenforeignmemory_map_resource(xenforeignmemory_handle *fmem,
 
     mr.addr = (uintptr_t)fres->addr;
 
+ skip_mmap:
     rc = ioctl(fmem->fd, IOCTL_PRIVCMD_MMAP_RESOURCE, &mr);
     if ( rc )
     {
@@ -136,13 +141,20 @@ int osdep_xenforeignmemory_map_resource(xenforeignmemory_handle *fmem,
         else
             errno = EOPNOTSUPP;
 
-        saved_errno = errno;
-        osdep_xenforeignmemory_unmap_resource(fmem, fres);
-        errno = saved_errno;
+        if ( fres->addr )
+        {
+            saved_errno = errno;
+            osdep_xenforeignmemory_unmap_resource(fmem, fres);
+            errno = saved_errno;
+        }
 
         return -1;
     }
 
+    /* If requesting size, copy back. */
+    if ( !fres->addr )
+        fres->nr_frames = mr.num;
+
     return 0;
 }
 
diff --git a/tools/libs/foreignmemory/libxenforeignmemory.map b/tools/libs/foreignmemory/libxenforeignmemory.map
index d5323c87d9..8aca341b99 100644
--- a/tools/libs/foreignmemory/libxenforeignmemory.map
+++ b/tools/libs/foreignmemory/libxenforeignmemory.map
@@ -19,3 +19,7 @@ VERS_1.3 {
 		xenforeignmemory_map_resource;
 		xenforeignmemory_unmap_resource;
 } VERS_1.2;
+VERS_1.4 {
+	global:
+		xenforeignmemory_resource_size;
+} VERS_1.3;
diff --git a/tools/libs/foreignmemory/linux.c b/tools/libs/foreignmemory/linux.c
index fe73d5ab72..d0eead1196 100644
--- a/tools/libs/foreignmemory/linux.c
+++ b/tools/libs/foreignmemory/linux.c
@@ -312,6 +312,10 @@ int osdep_xenforeignmemory_map_resource(
     };
     int rc;
 
+    if ( !fres->addr && !fres->nr_frames )
+        /* Request for resource size.  Skip mmap(). */
+        goto skip_mmap;
+
     fres->addr = mmap(fres->addr, fres->nr_frames << PAGE_SHIFT,
                       fres->prot, fres->flags | MAP_SHARED, fmem->fd, 0);
     if ( fres->addr == MAP_FAILED )
@@ -319,6 +323,7 @@ int osdep_xenforeignmemory_map_resource(
 
     mr.addr = (uintptr_t)fres->addr;
 
+ skip_mmap:
     rc = ioctl(fmem->fd, IOCTL_PRIVCMD_MMAP_RESOURCE, &mr);
     if ( rc )
     {
@@ -329,13 +334,20 @@ int osdep_xenforeignmemory_map_resource(
         else
             errno = EOPNOTSUPP;
 
-        saved_errno = errno;
-        (void)osdep_xenforeignmemory_unmap_resource(fmem, fres);
-        errno = saved_errno;
+        if ( fres->addr )
+        {
+            saved_errno = errno;
+            osdep_xenforeignmemory_unmap_resource(fmem, fres);
+            errno = saved_errno;
+        }
 
         return -1;
     }
 
+    /* If requesting size, copy back. */
+    if ( !fres->addr )
+        fres->nr_frames = mr.num;
+
     return 0;
 }
 
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 4/7] xen/memory: Clarify the XENMEM_acquire_resource ABI description
  2021-01-12 19:48 [PATCH v3 0/7] Multiple fixes to XENMEM_acquire_resource Andrew Cooper
                   ` (2 preceding siblings ...)
  2021-01-12 19:48 ` [PATCH v3 3/7] tools/foreignmem: Support querying the size of a resource Andrew Cooper
@ 2021-01-12 19:48 ` Andrew Cooper
  2021-01-12 19:48 ` [PATCH v3 5/7] xen/memory: Improve compat XENMEM_acquire_resource handling Andrew Cooper
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Andrew Cooper @ 2021-01-12 19:48 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Jan Beulich,
	Stefano Stabellini, Wei Liu, Julien Grall, Paul Durrant,
	Michał Leszczyński, Hubert Jasudowicz, Tamas K Lengyel

This is how similar operations already operate, compatible with the sole
implementation (in Linux), and explicitly gives us some flexibility.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <iwj@xenproject.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Paul Durrant <paul@xen.org>
CC: Michał Leszczyński <michal.leszczynski@cert.pl>
CC: Hubert Jasudowicz <hubert.jasudowicz@cert.pl>
CC: Tamas K Lengyel <tamas@tklengyel.com>
---
 xen/include/public/memory.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index d7eb34f167..c4c47a0b38 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -642,6 +642,7 @@ struct xen_mem_acquire_resource {
      * IN/OUT
      *
      * As an IN parameter number of frames of the resource to be mapped.
+     * This value may be updated during the course of the operation.
      *
      * When frame_list is NULL and nr_frames is 0, this is interpreted as a
      * request for the size of the resource, which shall be returned in the
@@ -656,7 +657,8 @@ struct xen_mem_acquire_resource {
     uint32_t pad;
     /*
      * IN - the index of the initial frame to be mapped. This parameter
-     *      is ignored if nr_frames is 0.
+     *      is ignored if nr_frames is 0.  This value may be updated
+     *      during the course of the operation.
      */
     uint64_t frame;
 
@@ -672,7 +674,8 @@ struct xen_mem_acquire_resource {
      *          If -EIO is returned then the frame_list has only been
      *          partially mapped and it is up to the caller to unmap all
      *          the GFNs.
-     *          This parameter may be NULL if nr_frames is 0.
+     *          This parameter may be NULL if nr_frames is 0.  This
+     *          value may be updated during the course of the operation.
      */
     XEN_GUEST_HANDLE(xen_pfn_t) frame_list;
 };
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 5/7] xen/memory: Improve compat XENMEM_acquire_resource handling
  2021-01-12 19:48 [PATCH v3 0/7] Multiple fixes to XENMEM_acquire_resource Andrew Cooper
                   ` (3 preceding siblings ...)
  2021-01-12 19:48 ` [PATCH v3 4/7] xen/memory: Clarify the XENMEM_acquire_resource ABI description Andrew Cooper
@ 2021-01-12 19:48 ` Andrew Cooper
  2021-01-15 15:37   ` Jan Beulich
  2021-01-12 19:48 ` [PATCH v3 6/7] xen/memory: Indent part of acquire_resource() Andrew Cooper
  2021-01-12 19:48 ` [PATCH v3 7/7] xen/memory: Fix mapping grant tables with XENMEM_acquire_resource Andrew Cooper
  6 siblings, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2021-01-12 19:48 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Jan Beulich,
	Stefano Stabellini, Wei Liu, Julien Grall, Paul Durrant,
	Michał Leszczyński, Hubert Jasudowicz, Tamas K Lengyel

The frame_list is an input, or an output, depending on whether the calling
domain is translated or not.  The array does not need marshalling in both
directions.

Furthermore, the copy-in loop was very inefficient, copying 4 bytes at at
time.  Rewrite it to copy in all nr_frames at once, and then expand
compat_pfn_t to xen_pfn_t in place.

Re-position the copy-in loop to simplify continuation support in a future
patch, and reduce the scope of certain variables.

No change in guest observed behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <iwj@xenproject.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Paul Durrant <paul@xen.org>
CC: Michał Leszczyński <michal.leszczynski@cert.pl>
CC: Hubert Jasudowicz <hubert.jasudowicz@cert.pl>
CC: Tamas K Lengyel <tamas@tklengyel.com>
---
 xen/common/compat/memory.c | 65 ++++++++++++++++++++++++++++------------------
 1 file changed, 40 insertions(+), 25 deletions(-)

diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index ed92e05b08..834c5e19d1 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -55,6 +55,8 @@ static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
 
 int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 {
+    struct vcpu *curr = current;
+    struct domain *currd = curr->domain;
     int split, op = cmd & MEMOP_CMD_MASK;
     long rc;
     unsigned int start_extent = cmd >> MEMOP_EXTENT_SHIFT;
@@ -399,7 +401,7 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 
         case XENMEM_acquire_resource:
         {
-            xen_pfn_t *xen_frame_list;
+            xen_pfn_t *xen_frame_list = NULL;
             unsigned int max_nr_frames;
 
             if ( copy_from_guest(&cmp.mar, compat, 1) )
@@ -417,28 +419,10 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
             if ( cmp.mar.nr_frames > max_nr_frames )
                 return -E2BIG;
 
-            if ( compat_handle_is_null(cmp.mar.frame_list) )
-                xen_frame_list = NULL;
-            else
-            {
+            /* Marshal the frame list in the remainder of the xlat space. */
+            if ( !compat_handle_is_null(cmp.mar.frame_list) )
                 xen_frame_list = (xen_pfn_t *)(nat.mar + 1);
 
-                if ( !compat_handle_okay(cmp.mar.frame_list,
-                                         cmp.mar.nr_frames) )
-                    return -EFAULT;
-
-                for ( i = 0; i < cmp.mar.nr_frames; i++ )
-                {
-                    compat_pfn_t frame;
-
-                    if ( __copy_from_compat_offset(
-                             &frame, cmp.mar.frame_list, i, 1) )
-                        return -EFAULT;
-
-                    xen_frame_list[i] = frame;
-                }
-            }
-
 #define XLAT_mem_acquire_resource_HNDL_frame_list(_d_, _s_) \
             set_xen_guest_handle((_d_)->frame_list, xen_frame_list)
 
@@ -446,6 +430,31 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 
 #undef XLAT_mem_acquire_resource_HNDL_frame_list
 
+            if ( xen_frame_list && cmp.mar.nr_frames )
+            {
+                /*
+                 * frame_list is an input for translated guests, and an output
+                 * for untranslated guests.  Only copy in for translated guests.
+                 */
+                if ( paging_mode_translate(currd) )
+                {
+                    compat_pfn_t *compat_frame_list = (void *)xen_frame_list;
+
+                    if ( !compat_handle_okay(cmp.mar.frame_list,
+                                             cmp.mar.nr_frames) ||
+                         __copy_from_compat_offset(
+                             compat_frame_list, cmp.mar.frame_list,
+                             0, cmp.mar.nr_frames) )
+                        return -EFAULT;
+
+                    /*
+                     * Iterate backwards over compat_frame_list[] expanding
+                     * compat_pfn_t to xen_pfn_t in place.
+                     */
+                    for ( int x = cmp.mar.nr_frames - 1; x >= 0; --x )
+                        xen_frame_list[x] = compat_frame_list[x];
+                }
+            }
             break;
         }
         default:
@@ -590,8 +599,6 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 
         case XENMEM_acquire_resource:
         {
-            const xen_pfn_t *xen_frame_list = (xen_pfn_t *)(nat.mar + 1);
-            compat_pfn_t *compat_frame_list = (compat_pfn_t *)(nat.mar + 1);
             DEFINE_XEN_GUEST_HANDLE(compat_mem_acquire_resource_t);
 
             if ( compat_handle_is_null(cmp.mar.frame_list) )
@@ -601,9 +608,18 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
                                            compat_mem_acquire_resource_t),
                          nat.mar, nr_frames) )
                     return -EFAULT;
+                break;
             }
-            else
+
+            /*
+             * frame_list is an input for translated guests, and an output for
+             * untranslated guests.  Only copy out for untranslated guests.
+             */
+            if ( !paging_mode_translate(currd) )
             {
+                const xen_pfn_t *xen_frame_list = (xen_pfn_t *)(nat.mar + 1);
+                compat_pfn_t *compat_frame_list = (compat_pfn_t *)(nat.mar + 1);
+
                 /*
                  * NOTE: the smaller compat array overwrites the native
                  *       array.
@@ -625,7 +641,6 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
                                              cmp.mar.nr_frames) )
                     return -EFAULT;
             }
-
             break;
         }
 
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 6/7] xen/memory: Indent part of acquire_resource()
  2021-01-12 19:48 [PATCH v3 0/7] Multiple fixes to XENMEM_acquire_resource Andrew Cooper
                   ` (4 preceding siblings ...)
  2021-01-12 19:48 ` [PATCH v3 5/7] xen/memory: Improve compat XENMEM_acquire_resource handling Andrew Cooper
@ 2021-01-12 19:48 ` Andrew Cooper
  2021-01-12 19:48 ` [PATCH v3 7/7] xen/memory: Fix mapping grant tables with XENMEM_acquire_resource Andrew Cooper
  6 siblings, 0 replies; 29+ messages in thread
From: Andrew Cooper @ 2021-01-12 19:48 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Jan Beulich,
	Stefano Stabellini, Wei Liu, Julien Grall, Paul Durrant,
	Michał Leszczyński, Hubert Jasudowicz, Tamas K Lengyel

Indent the middle of acquire_resource() inside a do {} while ( 0 ) loop.  This
is broken out specifically to make the following change readable.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <iwj@xenproject.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Paul Durrant <paul@xen.org>
CC: Michał Leszczyński <michal.leszczynski@cert.pl>
CC: Hubert Jasudowicz <hubert.jasudowicz@cert.pl>
CC: Tamas K Lengyel <tamas@tklengyel.com>
---
 xen/common/memory.c | 66 +++++++++++++++++++++++++++--------------------------
 1 file changed, 34 insertions(+), 32 deletions(-)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index beefa6a313..fd30eefa2e 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1127,44 +1127,46 @@ static int acquire_resource(
         goto out;
     }
 
-    switch ( xmar.type )
-    {
-    case XENMEM_resource_grant_table:
-        rc = gnttab_acquire_resource(d, xmar.id, xmar.frame, xmar.nr_frames,
-                                     mfn_list);
-        break;
+    do {
+        switch ( xmar.type )
+        {
+        case XENMEM_resource_grant_table:
+            rc = gnttab_acquire_resource(d, xmar.id, xmar.frame, xmar.nr_frames,
+                                         mfn_list);
+            break;
 
-    default:
-        rc = arch_acquire_resource(d, xmar.type, xmar.id, xmar.frame,
-                                   xmar.nr_frames, mfn_list);
-        break;
-    }
+        default:
+            rc = arch_acquire_resource(d, xmar.type, xmar.id, xmar.frame,
+                                       xmar.nr_frames, mfn_list);
+            break;
+        }
 
-    if ( rc )
-        goto out;
+        if ( rc )
+            goto out;
 
-    if ( !paging_mode_translate(currd) )
-    {
-        if ( copy_to_guest(xmar.frame_list, mfn_list, xmar.nr_frames) )
-            rc = -EFAULT;
-    }
-    else
-    {
-        xen_pfn_t gfn_list[ARRAY_SIZE(mfn_list)];
-        unsigned int i;
+        if ( !paging_mode_translate(currd) )
+        {
+            if ( copy_to_guest(xmar.frame_list, mfn_list, xmar.nr_frames) )
+                rc = -EFAULT;
+        }
+        else
+        {
+            xen_pfn_t gfn_list[ARRAY_SIZE(mfn_list)];
+            unsigned int i;
 
-        if ( copy_from_guest(gfn_list, xmar.frame_list, xmar.nr_frames) )
-            rc = -EFAULT;
+            if ( copy_from_guest(gfn_list, xmar.frame_list, xmar.nr_frames) )
+                rc = -EFAULT;
 
-        for ( i = 0; !rc && i < xmar.nr_frames; i++ )
-        {
-            rc = set_foreign_p2m_entry(currd, gfn_list[i],
-                                       _mfn(mfn_list[i]));
-            /* rc should be -EIO for any iteration other than the first */
-            if ( rc && i )
-                rc = -EIO;
+            for ( i = 0; !rc && i < xmar.nr_frames; i++ )
+            {
+                rc = set_foreign_p2m_entry(currd, gfn_list[i],
+                                           _mfn(mfn_list[i]));
+                /* rc should be -EIO for any iteration other than the first */
+                if ( rc && i )
+                    rc = -EIO;
+            }
         }
-    }
+    } while ( 0 );
 
  out:
     rcu_unlock_domain(d);
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 7/7] xen/memory: Fix mapping grant tables with XENMEM_acquire_resource
  2021-01-12 19:48 [PATCH v3 0/7] Multiple fixes to XENMEM_acquire_resource Andrew Cooper
                   ` (5 preceding siblings ...)
  2021-01-12 19:48 ` [PATCH v3 6/7] xen/memory: Indent part of acquire_resource() Andrew Cooper
@ 2021-01-12 19:48 ` Andrew Cooper
  2021-01-15 16:12   ` Jan Beulich
  6 siblings, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2021-01-12 19:48 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, George Dunlap, Ian Jackson, Jan Beulich,
	Stefano Stabellini, Wei Liu, Julien Grall, Paul Durrant,
	Michał Leszczyński, Hubert Jasudowicz, Tamas K Lengyel

A guest's default number of grant frames is 64, and XENMEM_acquire_resource
will reject an attempt to map more than 32 frames.  This limit is caused by
the size of mfn_list[] on the stack.

Fix mapping of arbitrary size requests by looping over batches of 32 in
acquire_resource(), and using hypercall continuations when necessary.

To start with, break _acquire_resource() out of acquire_resource() to cope
with type-specific dispatching, and update the return semantics to indicate
the number of mfns returned.  Update gnttab_acquire_resource() and x86's
arch_acquire_resource() to match these new semantics.

Have do_memory_op() pass start_extent into acquire_resource() so it can pick
up where it left off after a continuation, and loop over batches of 32 until
all the work is done, or a continuation needs to occur.

compat_memory_op() is a bit more complicated, because it also has to marshal
frame_list in the XLAT buffer.  Have it account for continuation information
itself and hide details from the upper layer, so it can marshal the buffer in
chunks if necessary.

With these fixes in place, it is now possible to map the whole grant table for
a guest.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <iwj@xenproject.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Paul Durrant <paul@xen.org>
CC: Michał Leszczyński <michal.leszczynski@cert.pl>
CC: Hubert Jasudowicz <hubert.jasudowicz@cert.pl>
CC: Tamas K Lengyel <tamas@tklengyel.com>

v3:
 * Spelling fixes
---
 xen/arch/x86/mm.c          |   4 +-
 xen/common/compat/memory.c |  94 +++++++++++++++++++++++++++++--------
 xen/common/grant_table.c   |   3 ++
 xen/common/memory.c        | 114 ++++++++++++++++++++++++++++++++++-----------
 4 files changed, 167 insertions(+), 48 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 7a2e94cd6f..ad12429f6c 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4628,7 +4628,6 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
         if ( id != (unsigned int)ioservid )
             break;
 
-        rc = 0;
         for ( i = 0; i < nr_frames; i++ )
         {
             mfn_t mfn;
@@ -4639,6 +4638,9 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
 
             mfn_list[i] = mfn_x(mfn);
         }
+        if ( i == nr_frames )
+            /* Success.  Passed nr_frames back to the caller. */
+            rc = nr_frames;
         break;
     }
 #endif
diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index 834c5e19d1..533db228e0 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -402,23 +402,10 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
         case XENMEM_acquire_resource:
         {
             xen_pfn_t *xen_frame_list = NULL;
-            unsigned int max_nr_frames;
 
             if ( copy_from_guest(&cmp.mar, compat, 1) )
                 return -EFAULT;
 
-            /*
-             * The number of frames handled is currently limited to a
-             * small number by the underlying implementation, so the
-             * scratch space should be sufficient for bouncing the
-             * frame addresses.
-             */
-            max_nr_frames = (COMPAT_ARG_XLAT_SIZE - sizeof(*nat.mar)) /
-                sizeof(*xen_frame_list);
-
-            if ( cmp.mar.nr_frames > max_nr_frames )
-                return -E2BIG;
-
             /* Marshal the frame list in the remainder of the xlat space. */
             if ( !compat_handle_is_null(cmp.mar.frame_list) )
                 xen_frame_list = (xen_pfn_t *)(nat.mar + 1);
@@ -432,6 +419,28 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 
             if ( xen_frame_list && cmp.mar.nr_frames )
             {
+                unsigned int xlat_max_frames =
+                    (COMPAT_ARG_XLAT_SIZE - sizeof(*nat.mar)) /
+                    sizeof(*xen_frame_list);
+
+                if ( start_extent >= nat.mar->nr_frames )
+                    return -EINVAL;
+
+                /*
+                 * Adjust nat to account for work done on previous
+                 * continuations, leaving cmp pristine.  Hide the continaution
+                 * from the native code to prevent double accounting.
+                 */
+                nat.mar->nr_frames -= start_extent;
+                nat.mar->frame += start_extent;
+                cmd &= MEMOP_CMD_MASK;
+
+                /*
+                 * If there are two many frames to fit within the xlat buffer,
+                 * we'll need to loop to marshal them all.
+                 */
+                nat.mar->nr_frames = min(nat.mar->nr_frames, xlat_max_frames);
+
                 /*
                  * frame_list is an input for translated guests, and an output
                  * for untranslated guests.  Only copy in for translated guests.
@@ -444,14 +453,14 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
                                              cmp.mar.nr_frames) ||
                          __copy_from_compat_offset(
                              compat_frame_list, cmp.mar.frame_list,
-                             0, cmp.mar.nr_frames) )
+                             start_extent, nat.mar->nr_frames) )
                         return -EFAULT;
 
                     /*
                      * Iterate backwards over compat_frame_list[] expanding
                      * compat_pfn_t to xen_pfn_t in place.
                      */
-                    for ( int x = cmp.mar.nr_frames - 1; x >= 0; --x )
+                    for ( int x = nat.mar->nr_frames - 1; x >= 0; --x )
                         xen_frame_list[x] = compat_frame_list[x];
                 }
             }
@@ -600,9 +609,11 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
         case XENMEM_acquire_resource:
         {
             DEFINE_XEN_GUEST_HANDLE(compat_mem_acquire_resource_t);
+            unsigned int done;
 
             if ( compat_handle_is_null(cmp.mar.frame_list) )
             {
+                ASSERT(split == 0 && rc == 0);
                 if ( __copy_field_to_guest(
                          guest_handle_cast(compat,
                                            compat_mem_acquire_resource_t),
@@ -611,6 +622,21 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
                 break;
             }
 
+            if ( split < 0 )
+            {
+                /* Continuation occurred. */
+                ASSERT(rc != XENMEM_acquire_resource);
+                done = cmd >> MEMOP_EXTENT_SHIFT;
+            }
+            else
+            {
+                /* No continuation. */
+                ASSERT(rc == 0);
+                done = nat.mar->nr_frames;
+            }
+
+            ASSERT(done <= nat.mar->nr_frames);
+
             /*
              * frame_list is an input for translated guests, and an output for
              * untranslated guests.  Only copy out for untranslated guests.
@@ -626,7 +652,7 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
                  */
                 BUILD_BUG_ON(sizeof(compat_pfn_t) > sizeof(xen_pfn_t));
 
-                for ( i = 0; i < cmp.mar.nr_frames; i++ )
+                for ( i = 0; i < done; i++ )
                 {
                     compat_pfn_t frame = xen_frame_list[i];
 
@@ -636,15 +662,45 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
                     compat_frame_list[i] = frame;
                 }
 
-                if ( __copy_to_compat_offset(cmp.mar.frame_list, 0,
+                if ( __copy_to_compat_offset(cmp.mar.frame_list, start_extent,
                                              compat_frame_list,
-                                             cmp.mar.nr_frames) )
+                                             done) )
                     return -EFAULT;
             }
-            break;
+
+            start_extent += done;
+
+            /* Completely done. */
+            if ( start_extent == cmp.mar.nr_frames )
+                break;
+
+            /*
+             * Done a "full" batch, but we were limited by space in the xlat
+             * area.  Go around the loop again without necesserily returning
+             * to guest context.
+             */
+            if ( done == nat.mar->nr_frames )
+            {
+                split = 1;
+                break;
+            }
+
+            /* Explicit continuation request from a higher level. */
+            if ( done < nat.mar->nr_frames )
+                return hypercall_create_continuation(
+                    __HYPERVISOR_memory_op, "ih",
+                    op | (start_extent << MEMOP_EXTENT_SHIFT), compat);
+
+            /*
+             * Well... Somethings gone wrong with the two levels of chunking.
+             * My condolences to whomever next has to debug this mess.
+             */
+            ASSERT_UNREACHABLE();
+            goto crash;
         }
 
         default:
+        crash:
             domain_crash(current->domain);
             split = 0;
             break;
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index bd99dddbf6..7c0481a05b 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -4083,6 +4083,9 @@ int gnttab_acquire_resource(
     for ( i = 0; i < nr_frames; ++i )
         mfn_list[i] = virt_to_mfn(vaddrs[frame + i]);
 
+    /* Success.  Passed nr_frames back to the caller. */
+    rc = nr_frames;
+
  out:
     grant_write_unlock(gt);
 
diff --git a/xen/common/memory.c b/xen/common/memory.c
index fd30eefa2e..bb4926fc30 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1072,17 +1072,31 @@ static unsigned int resource_max_frames(const struct domain *d,
     }
 }
 
+/*
+ * Returns -errno on error, or positive in the range [1, nr_frames] on
+ * success.  Returning less than nr_frames contitutes a request for a
+ * continuation.
+ */
+static int _acquire_resource(
+    struct domain *d, unsigned int type, unsigned int id, unsigned long frame,
+    unsigned int nr_frames, xen_pfn_t mfn_list[])
+{
+    switch ( type )
+    {
+    case XENMEM_resource_grant_table:
+        return gnttab_acquire_resource(d, id, frame, nr_frames, mfn_list);
+
+    default:
+        return arch_acquire_resource(d, type, id, frame, nr_frames, mfn_list);
+    }
+}
+
 static int acquire_resource(
-    XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
+    XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg,
+    unsigned long start_extent)
 {
     struct domain *d, *currd = current->domain;
     xen_mem_acquire_resource_t xmar;
-    /*
-     * The mfn_list and gfn_list (below) arrays are ok on stack for the
-     * moment since they are small, but if they need to grow in future
-     * use-cases then per-CPU arrays or heap allocations may be required.
-     */
-    xen_pfn_t mfn_list[32];
     unsigned int max_frames;
     int rc;
 
@@ -1100,9 +1114,6 @@ static int acquire_resource(
     if ( xmar.pad != 0 )
         return -EINVAL;
 
-    if ( xmar.nr_frames > ARRAY_SIZE(mfn_list) )
-        return -E2BIG;
-
     rc = rcu_lock_remote_domain_by_id(xmar.domid, &d);
     if ( rc )
         return rc;
@@ -1119,7 +1130,7 @@ static int acquire_resource(
 
     if ( guest_handle_is_null(xmar.frame_list) )
     {
-        if ( xmar.nr_frames )
+        if ( xmar.nr_frames || start_extent )
             goto out;
 
         xmar.nr_frames = max_frames;
@@ -1127,26 +1138,47 @@ static int acquire_resource(
         goto out;
     }
 
+    /*
+     * Limiting nr_frames at (UINT_MAX >> MEMOP_EXTENT_SHIFT) isn't ideal.  If
+     * it ever becomes a practical problem, we can switch to mutating
+     * xmar.{frame,nr_frames,frame_list} in guest memory.
+     */
+    rc = -EINVAL;
+    if ( start_extent >= xmar.nr_frames ||
+         xmar.nr_frames > (UINT_MAX >> MEMOP_EXTENT_SHIFT) )
+        goto out;
+
+    /* Adjust for work done on previous continuations. */
+    xmar.nr_frames -= start_extent;
+    xmar.frame += start_extent;
+    guest_handle_add_offset(xmar.frame_list, start_extent);
+
     do {
-        switch ( xmar.type )
-        {
-        case XENMEM_resource_grant_table:
-            rc = gnttab_acquire_resource(d, xmar.id, xmar.frame, xmar.nr_frames,
-                                         mfn_list);
-            break;
+        /*
+         * Arbitrary size.  Not too much stack space, and a reasonable stride
+         * for continuation checks.
+         */
+        xen_pfn_t mfn_list[32];
+        unsigned int todo = MIN(ARRAY_SIZE(mfn_list), xmar.nr_frames), done;
 
-        default:
-            rc = arch_acquire_resource(d, xmar.type, xmar.id, xmar.frame,
-                                       xmar.nr_frames, mfn_list);
-            break;
-        }
+        rc = _acquire_resource(d, xmar.type, xmar.id, xmar.frame,
+                               todo, mfn_list);
+        if ( rc < 0 )
+            goto out;
 
-        if ( rc )
+        done = rc;
+        rc = 0;
+        if ( done == 0 || done > todo )
+        {
+            ASSERT_UNREACHABLE();
+            rc = -EINVAL;
             goto out;
+        }
 
+        /* Adjust guest frame_list appropriately. */
         if ( !paging_mode_translate(currd) )
         {
-            if ( copy_to_guest(xmar.frame_list, mfn_list, xmar.nr_frames) )
+            if ( copy_to_guest(xmar.frame_list, mfn_list, done) )
                 rc = -EFAULT;
         }
         else
@@ -1154,10 +1186,10 @@ static int acquire_resource(
             xen_pfn_t gfn_list[ARRAY_SIZE(mfn_list)];
             unsigned int i;
 
-            if ( copy_from_guest(gfn_list, xmar.frame_list, xmar.nr_frames) )
+            if ( copy_from_guest(gfn_list, xmar.frame_list, done) )
                 rc = -EFAULT;
 
-            for ( i = 0; !rc && i < xmar.nr_frames; i++ )
+            for ( i = 0; !rc && i < done; i++ )
             {
                 rc = set_foreign_p2m_entry(currd, gfn_list[i],
                                            _mfn(mfn_list[i]));
@@ -1166,7 +1198,32 @@ static int acquire_resource(
                     rc = -EIO;
             }
         }
-    } while ( 0 );
+
+        if ( rc )
+            goto out;
+
+        xmar.nr_frames -= done;
+        xmar.frame += done;
+        guest_handle_add_offset(xmar.frame_list, done);
+        start_extent += done;
+
+        /*
+         * Explicit continuation request from _acquire_resource(), or we've
+         * still got more work to do.
+         */
+        if ( done < todo ||
+             (xmar.nr_frames && hypercall_preempt_check()) )
+        {
+            rc = hypercall_create_continuation(
+                __HYPERVISOR_memory_op, "lh",
+                XENMEM_acquire_resource | (start_extent << MEMOP_EXTENT_SHIFT),
+                arg);
+            goto out;
+        }
+
+    } while ( xmar.nr_frames );
+
+    rc = 0;
 
  out:
     rcu_unlock_domain(d);
@@ -1633,7 +1690,8 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     case XENMEM_acquire_resource:
         rc = acquire_resource(
-            guest_handle_cast(arg, xen_mem_acquire_resource_t));
+            guest_handle_cast(arg, xen_mem_acquire_resource_t),
+            start_extent);
         break;
 
     default:
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 2/7] xen/memory: Fix acquire_resource size semantics
  2021-01-12 19:48 ` [PATCH v3 2/7] xen/memory: Fix acquire_resource size semantics Andrew Cooper
@ 2021-01-12 20:15   ` Julien Grall
  2021-01-12 20:57     ` Andrew Cooper
  0 siblings, 1 reply; 29+ messages in thread
From: Julien Grall @ 2021-01-12 20:15 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Xen-devel, George Dunlap, Ian Jackson, Jan Beulich,
	Stefano Stabellini, Wei Liu, Paul Durrant,
	Michał Leszczyński, Hubert Jasudowicz, Tamas K Lengyel

Hi Andrew,

On Tue, 12 Jan 2021 at 19:49, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> Calling XENMEM_acquire_resource with a NULL frame_list is a request for the
> size of the resource, but the returned 32 is bogus.
>
> If someone tries to follow it for XENMEM_resource_ioreq_server, the acquire
> call will fail as IOREQ servers currently top out at 2 frames, and it is only
> half the size of the default grant table limit for guests.

I haven't yet looked at the code, just wanted to seek some clarification here.
Are we expecting someone else outside the tools (e.g. QEMU) to rely on
the new behavior? If so, what would happen if such code ran on older
Xen?

IOW, will that code require some compatibility layer to function?

>
> Also, no users actually request a resource size, because it was never wired up
> in the sole implementaion of resource acquisition in Linux.

s/implementation/

>
> Introduce a new resource_max_frames() to calculate the size of a resource, and
> implement it the IOREQ and grant subsystems.

in the?



>
> It is impossible to guarantee that a mapping call following a successful size
> call will succeed (e.g. The target IOREQ server gets destroyed, or the domain
> switches from grant v2 to v1).  Document the restriction, and use the
> flexibility to simplify the paths to be lockless.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Paul Durrant <paul@xen.org>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> ---
> CC: George Dunlap <George.Dunlap@eu.citrix.com>
> CC: Ian Jackson <iwj@xenproject.org>
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Wei Liu <wl@xen.org>
> CC: Julien Grall <julien@xen.org>
> CC: Paul Durrant <paul@xen.org>
> CC: Michał Leszczyński <michal.leszczynski@cert.pl>
> CC: Hubert Jasudowicz <hubert.jasudowicz@cert.pl>
> CC: Tamas K Lengyel <tamas@tklengyel.com>
>
> v3:
>  * Use const struct domain *
>  * Fold goto out paths
> v2:
>  * Spelling fixes
>  * Add more local variables.
>  * Don't return any status frames on ARM where v2 support is compiled out.
> ---
>  xen/arch/x86/mm.c             | 20 +++++++++++++++++
>  xen/common/grant_table.c      | 23 ++++++++++++++++++++
>  xen/common/memory.c           | 50 ++++++++++++++++++++++++++++++++-----------
>  xen/include/asm-x86/mm.h      |  3 +++
>  xen/include/public/memory.h   | 16 ++++++++++----
>  xen/include/xen/grant_table.h |  8 +++++++
>  xen/include/xen/mm.h          |  6 ++++++
>  7 files changed, 109 insertions(+), 17 deletions(-)
>
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 63e9fae919..7a2e94cd6f 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4587,6 +4587,26 @@ static int handle_iomem_range(unsigned long s, unsigned long e, void *p)
>      return err || s > e ? err : _handle_iomem_range(s, e, p);
>  }
>
> +unsigned int arch_resource_max_frames(
> +    const struct domain *d, unsigned int type, unsigned int id)
> +{
> +    unsigned int nr = 0;
> +
> +    switch ( type )
> +    {
> +#ifdef CONFIG_HVM
> +    case XENMEM_resource_ioreq_server:
> +        if ( !is_hvm_domain(d) )
> +            break;
> +        /* One frame for the buf-ioreq ring, and one frame per 128 vcpus. */
> +        nr = 1 + DIV_ROUND_UP(d->max_vcpus * sizeof(struct ioreq), PAGE_SIZE);
> +        break;
> +#endif
> +    }
> +
> +    return nr;
> +}
> +
>  int arch_acquire_resource(struct domain *d, unsigned int type,
>                            unsigned int id, unsigned long frame,
>                            unsigned int nr_frames, xen_pfn_t mfn_list[])
> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
> index f560c250d7..bd99dddbf6 100644
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -4013,6 +4013,29 @@ static int gnttab_get_shared_frame_mfn(struct domain *d,
>      return 0;
>  }
>
> +unsigned int gnttab_resource_max_frames(const struct domain *d, unsigned int id)
> +{
> +    const struct grant_table *gt = d->grant_table;
> +    unsigned int nr = 0;
> +
> +    /* Don't need the grant lock.  This limit is fixed at domain create time. */
> +    switch ( id )
> +    {
> +    case XENMEM_resource_grant_table_id_shared:
> +        nr = gt->max_grant_frames;
> +        break;
> +
> +    case XENMEM_resource_grant_table_id_status:
> +        if ( GNTTAB_MAX_VERSION < 2 )
> +            break;
> +
> +        nr = grant_to_status_frames(gt->max_grant_frames);
> +        break;
> +    }
> +
> +    return nr;
> +}
> +
>  int gnttab_acquire_resource(
>      struct domain *d, unsigned int id, unsigned long frame,
>      unsigned int nr_frames, xen_pfn_t mfn_list[])
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 82cf7b41ee..beefa6a313 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1052,6 +1052,26 @@ static long xatp_permission_check(struct domain *d, unsigned int space)
>      return xsm_add_to_physmap(XSM_TARGET, current->domain, d);
>  }
>
> +/*
> + * Return 0 on any kind of error.  Caller converts to -EINVAL.
> + *
> + * All nonzero values should be repeatable (i.e. derived from some fixed
> + * property of the domain), and describe the full resource (i.e. mapping the
> + * result of this call will be the entire resource).
> + */
> +static unsigned int resource_max_frames(const struct domain *d,
> +                                        unsigned int type, unsigned int id)
> +{
> +    switch ( type )
> +    {
> +    case XENMEM_resource_grant_table:
> +        return gnttab_resource_max_frames(d, id);
> +
> +    default:
> +        return arch_resource_max_frames(d, type, id);
> +    }
> +}
> +
>  static int acquire_resource(
>      XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
>  {
> @@ -1063,6 +1083,7 @@ static int acquire_resource(
>       * use-cases then per-CPU arrays or heap allocations may be required.
>       */
>      xen_pfn_t mfn_list[32];
> +    unsigned int max_frames;
>      int rc;
>
>      /*
> @@ -1079,19 +1100,6 @@ static int acquire_resource(
>      if ( xmar.pad != 0 )
>          return -EINVAL;
>
> -    if ( guest_handle_is_null(xmar.frame_list) )
> -    {
> -        if ( xmar.nr_frames )
> -            return -EINVAL;
> -
> -        xmar.nr_frames = ARRAY_SIZE(mfn_list);
> -
> -        if ( __copy_field_to_guest(arg, &xmar, nr_frames) )
> -            return -EFAULT;
> -
> -        return 0;
> -    }
> -
>      if ( xmar.nr_frames > ARRAY_SIZE(mfn_list) )
>          return -E2BIG;
>
> @@ -1103,6 +1111,22 @@ static int acquire_resource(
>      if ( rc )
>          goto out;
>
> +    max_frames = resource_max_frames(d, xmar.type, xmar.id);
> +
> +    rc = -EINVAL;
> +    if ( !max_frames )
> +        goto out;
> +
> +    if ( guest_handle_is_null(xmar.frame_list) )
> +    {
> +        if ( xmar.nr_frames )
> +            goto out;
> +
> +        xmar.nr_frames = max_frames;
> +        rc = __copy_field_to_guest(arg, &xmar, nr_frames) ? -EFAULT : 0;
> +        goto out;
> +    }
> +
>      switch ( xmar.type )
>      {
>      case XENMEM_resource_grant_table:
> diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
> index deeba75a1c..a40a7fa024 100644
> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -639,6 +639,9 @@ static inline bool arch_mfn_in_directmap(unsigned long mfn)
>      return mfn <= (virt_to_mfn(eva - 1) + 1);
>  }
>
> +unsigned int arch_resource_max_frames(const struct domain *d,
> +                                      unsigned int type, unsigned int id);
> +
>  int arch_acquire_resource(struct domain *d, unsigned int type,
>                            unsigned int id, unsigned long frame,
>                            unsigned int nr_frames, xen_pfn_t mfn_list[]);
> diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
> index 21d483298e..d7eb34f167 100644
> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -639,10 +639,18 @@ struct xen_mem_acquire_resource {
>  #define XENMEM_resource_grant_table_id_status 1
>
>      /*
> -     * IN/OUT - As an IN parameter number of frames of the resource
> -     *          to be mapped. However, if the specified value is 0 and
> -     *          frame_list is NULL then this field will be set to the
> -     *          maximum value supported by the implementation on return.
> +     * IN/OUT
> +     *
> +     * As an IN parameter number of frames of the resource to be mapped.
> +     *
> +     * When frame_list is NULL and nr_frames is 0, this is interpreted as a
> +     * request for the size of the resource, which shall be returned in the
> +     * nr_frames field.
> +     *
> +     * The size of a resource will never be zero, but a nonzero result doesn't
> +     * guarantee that a subsequent mapping request will be successful.  There
> +     * are further type/id specific constraints which may change between the
> +     * two calls.
>       */
>      uint32_t nr_frames;
>      uint32_t pad;
> diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
> index 5a2c75b880..015704bebc 100644
> --- a/xen/include/xen/grant_table.h
> +++ b/xen/include/xen/grant_table.h
> @@ -57,6 +57,8 @@ int mem_sharing_gref_to_gfn(struct grant_table *gt, grant_ref_t ref,
>  int gnttab_map_frame(struct domain *d, unsigned long idx, gfn_t gfn,
>                       mfn_t *mfn);
>
> +unsigned int gnttab_resource_max_frames(const struct domain *d, unsigned int id);
> +
>  int gnttab_acquire_resource(
>      struct domain *d, unsigned int id, unsigned long frame,
>      unsigned int nr_frames, xen_pfn_t mfn_list[]);
> @@ -93,6 +95,12 @@ static inline int gnttab_map_frame(struct domain *d, unsigned long idx,
>      return -EINVAL;
>  }
>
> +static inline unsigned int gnttab_resource_max_frames(
> +    const struct domain *d, unsigned int id)
> +{
> +    return 0;
> +}
> +
>  static inline int gnttab_acquire_resource(
>      struct domain *d, unsigned int id, unsigned long frame,
>      unsigned int nr_frames, xen_pfn_t mfn_list[])
> diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
> index e62a5b726e..2a919fbcb4 100644
> --- a/xen/include/xen/mm.h
> +++ b/xen/include/xen/mm.h
> @@ -703,6 +703,12 @@ static inline void put_page_alloc_ref(struct page_info *page)
>  }
>
>  #ifndef CONFIG_ARCH_ACQUIRE_RESOURCE
> +static inline unsigned int arch_resource_max_frames(
> +    const struct domain *d, unsigned int type, unsigned int id)
> +{
> +    return 0;
> +}
> +
>  static inline int arch_acquire_resource(
>      struct domain *d, unsigned int type, unsigned int id, unsigned long frame,
>      unsigned int nr_frames, xen_pfn_t mfn_list[])
> --
> 2.11.0
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 2/7] xen/memory: Fix acquire_resource size semantics
  2021-01-12 20:15   ` Julien Grall
@ 2021-01-12 20:57     ` Andrew Cooper
  2021-01-12 21:05       ` Tamas K Lengyel
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2021-01-12 20:57 UTC (permalink / raw)
  To: Julien Grall, Andrew Cooper
  Cc: Xen-devel, George Dunlap, Ian Jackson, Jan Beulich,
	Stefano Stabellini, Wei Liu, Paul Durrant,
	Michał Leszczyński, Hubert Jasudowicz, Tamas K Lengyel,
	Andrew Cooper

On 12/01/2021 20:15, Julien Grall wrote:
> Hi Andrew,
>
> On Tue, 12 Jan 2021 at 19:49, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> Calling XENMEM_acquire_resource with a NULL frame_list is a request for the
>> size of the resource, but the returned 32 is bogus.
>>
>> If someone tries to follow it for XENMEM_resource_ioreq_server, the acquire
>> call will fail as IOREQ servers currently top out at 2 frames, and it is only
>> half the size of the default grant table limit for guests.
> I haven't yet looked at the code, just wanted to seek some clarification here.
> Are we expecting someone else outside the tools (e.g. QEMU) to rely on
> the new behavior? If so, what would happen if such code ran on older
> Xen?
>
> IOW, will that code require some compatibility layer to function?

This is total mess.

Requesting the size of a resource was never implemented for userspace. 
The two current users of this interface (domain builder for gnttab
seeding, qemu for ioreq server) make blind mapping calls with a priori
knowledge of the offsets and sizes.

The next patch in this series adds the xenforeignmemory_resource_size()
API for userspace, so we can reliably say that anything built against
anything earlier than Xen 4.15 isn't making size requests.

The added complication is that if you have Xen 4.15, and Linux 4.18 or
later without the kernel fix provided by Roger (which is CC stable),
you'll get a bogus 32 back from the size requests.

But that is fine because nothing is making size requests yet.

The first concrete user of size requests will be Michał's Processor
Trace series, where there is a daemon to map the PT buffer and shuffle
the contents into a file.  That will also depend on new content in 4.15.

At some point in the future, Qemu is going to have to change it's
approach, when we want to support more than 128 vcpus.  This is the
point at which the synchronous ioreq page needs to turn into two pages. 
In practice, qemu could make a size call, or could make the same
calculation as Xen makes, as Qemu does know d->max_vcpus via other means.


Honestly, I'm expecting that Linux stable will do the rounds faster than
Xen 4.15 gets rolled out to distros.  The chances of having a bleeding
edge Xen, and not an up-to-date (or at least patched) dom0 kernel is
minimal.

>> Also, no users actually request a resource size, because it was never wired up
>> in the sole implementaion of resource acquisition in Linux.
> s/implementation/
>
>> Introduce a new resource_max_frames() to calculate the size of a resource, and
>> implement it the IOREQ and grant subsystems.
> in the?

Fixed, thanks,

~Andrew


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 2/7] xen/memory: Fix acquire_resource size semantics
  2021-01-12 20:57     ` Andrew Cooper
@ 2021-01-12 21:05       ` Tamas K Lengyel
  0 siblings, 0 replies; 29+ messages in thread
From: Tamas K Lengyel @ 2021-01-12 21:05 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Julien Grall, Andrew Cooper, Xen-devel, George Dunlap,
	Ian Jackson, Jan Beulich, Stefano Stabellini, Wei Liu,
	Paul Durrant, Michał Leszczyński, Hubert Jasudowicz

On Tue, Jan 12, 2021 at 3:57 PM Andrew Cooper <amc96@cam.ac.uk> wrote:
>
> On 12/01/2021 20:15, Julien Grall wrote:
> > Hi Andrew,
> >
> > On Tue, 12 Jan 2021 at 19:49, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >> Calling XENMEM_acquire_resource with a NULL frame_list is a request for the
> >> size of the resource, but the returned 32 is bogus.
> >>
> >> If someone tries to follow it for XENMEM_resource_ioreq_server, the acquire
> >> call will fail as IOREQ servers currently top out at 2 frames, and it is only
> >> half the size of the default grant table limit for guests.
> > I haven't yet looked at the code, just wanted to seek some clarification here.
> > Are we expecting someone else outside the tools (e.g. QEMU) to rely on
> > the new behavior? If so, what would happen if such code ran on older
> > Xen?
> >
> > IOW, will that code require some compatibility layer to function?
>
> This is total mess.
>
> Requesting the size of a resource was never implemented for userspace.
> The two current users of this interface (domain builder for gnttab
> seeding, qemu for ioreq server) make blind mapping calls with a priori
> knowledge of the offsets and sizes.
>
> The next patch in this series adds the xenforeignmemory_resource_size()
> API for userspace, so we can reliably say that anything built against
> anything earlier than Xen 4.15 isn't making size requests.
>
> The added complication is that if you have Xen 4.15, and Linux 4.18 or
> later without the kernel fix provided by Roger (which is CC stable),
> you'll get a bogus 32 back from the size requests.
>
> But that is fine because nothing is making size requests yet.
>
> The first concrete user of size requests will be Michał's Processor
> Trace series, where there is a daemon to map the PT buffer and shuffle
> the contents into a file.  That will also depend on new content in 4.15.
>
> At some point in the future, Qemu is going to have to change it's
> approach, when we want to support more than 128 vcpus.  This is the
> point at which the synchronous ioreq page needs to turn into two pages.
> In practice, qemu could make a size call, or could make the same
> calculation as Xen makes, as Qemu does know d->max_vcpus via other means.
>
>
> Honestly, I'm expecting that Linux stable will do the rounds faster than
> Xen 4.15 gets rolled out to distros.  The chances of having a bleeding
> edge Xen, and not an up-to-date (or at least patched) dom0 kernel is
> minimal.

For our usecase it's not great that we would need a newer kernel then
what standard distros ship, but as a workaround compiling the newer
version of xen-privcmd as an out-of-tree lkm with the fix applied is
something we can live with while the distros catch up. Here:
https://github.com/tklengyel/xen-privcmd-lkm

Tamas


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/7] xen/gnttab: Rework resource acquisition
  2021-01-12 19:48 ` [PATCH v3 1/7] xen/gnttab: Rework resource acquisition Andrew Cooper
@ 2021-01-15 11:43   ` Jan Beulich
  2021-01-15 16:03     ` Andrew Cooper
  2021-01-15 11:56   ` Jan Beulich
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2021-01-15 11:43 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Paul Durrant, Michał Leszczyński,
	Hubert Jasudowicz, Tamas K Lengyel, Xen-devel

On 12.01.2021 20:48, Andrew Cooper wrote:
> The existing logic doesn't function in the general case for mapping a guests
> grant table, due to arbitrary 32 frame limit, and the default grant table
> limit being 64.
> 
> In order to start addressing this, rework the existing grant table logic by
> implementing a single gnttab_acquire_resource().  This is far more efficient
> than the previous acquire_grant_table() in memory.c because it doesn't take
> the grant table write lock, and attempt to grow the table, for every single
> frame.
> 
> The new gnttab_acquire_resource() function subsumes the previous two
> gnttab_get_{shared,status}_frame() helpers.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
albeit with a remark and a question:

> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -4013,6 +4013,59 @@ static int gnttab_get_shared_frame_mfn(struct domain *d,
>      return 0;
>  }
>  
> +int gnttab_acquire_resource(
> +    struct domain *d, unsigned int id, unsigned long frame,
> +    unsigned int nr_frames, xen_pfn_t mfn_list[])
> +{
> +    struct grant_table *gt = d->grant_table;
> +    unsigned int i = nr_frames, tot_frames;

It doesn't look like this initializer was needed. The only
use of i that I can spot is the loop near the end, which
starts from 0.

> +    mfn_t tmp;
> +    void **vaddrs;
> +    int rc;
> +
> +    /* Overflow checks */
> +    if ( frame + nr_frames < frame )
> +        return -EINVAL;
> +
> +    tot_frames = frame + nr_frames;
> +    if ( tot_frames != frame + nr_frames )
> +        return -EINVAL;

Can't these two be folded into

    unsigned int tot_frames = frame + nr_frames;

    if ( tot_frames < frame )
        return -EINVAL;

? Both truncation and wrapping look to be taken care of this
way.

Jan


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/7] xen/gnttab: Rework resource acquisition
  2021-01-12 19:48 ` [PATCH v3 1/7] xen/gnttab: Rework resource acquisition Andrew Cooper
  2021-01-15 11:43   ` Jan Beulich
@ 2021-01-15 11:56   ` Jan Beulich
  2021-01-15 16:57     ` Andrew Cooper
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2021-01-15 11:56 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Paul Durrant, Michał Leszczyński,
	Hubert Jasudowicz, Tamas K Lengyel, Xen-devel

On 12.01.2021 20:48, Andrew Cooper wrote:
> The existing logic doesn't function in the general case for mapping a guests
> grant table, due to arbitrary 32 frame limit, and the default grant table
> limit being 64.
> 
> In order to start addressing this, rework the existing grant table logic by
> implementing a single gnttab_acquire_resource().  This is far more efficient
> than the previous acquire_grant_table() in memory.c because it doesn't take
> the grant table write lock, and attempt to grow the table, for every single
> frame.
> 
> The new gnttab_acquire_resource() function subsumes the previous two
> gnttab_get_{shared,status}_frame() helpers.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

I'm sorry, but I have to withdraw the R-b given a few minutes ago.

> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -4013,6 +4013,59 @@ static int gnttab_get_shared_frame_mfn(struct domain *d,
>      return 0;
>  }
>  
> +int gnttab_acquire_resource(
> +    struct domain *d, unsigned int id, unsigned long frame,
> +    unsigned int nr_frames, xen_pfn_t mfn_list[])
> +{
> +    struct grant_table *gt = d->grant_table;
> +    unsigned int i = nr_frames, tot_frames;
> +    mfn_t tmp;
> +    void **vaddrs;
> +    int rc;
> +
> +    /* Overflow checks */
> +    if ( frame + nr_frames < frame )
> +        return -EINVAL;
> +
> +    tot_frames = frame + nr_frames;
> +    if ( tot_frames != frame + nr_frames )
> +        return -EINVAL;

While you did remove the explicit returning of an error when
nr_frames is zero, ...

> +    /* Grow table if necessary. */
> +    grant_write_lock(gt);
> +    rc = -EINVAL;
> +    switch ( id )
> +    {
> +    case XENMEM_resource_grant_table_id_shared:
> +        vaddrs = gt->shared_raw;
> +        rc = gnttab_get_shared_frame_mfn(d, tot_frames - 1, &tmp);

... this will degenerate (and still cause an error) when frame
is also zero, and will cause undue growing of the table when
frame is non-zero yet not overly large.

As indicated before, I'm of the clear opinion that here - like
elsewhere - a number of zero frames requested means that no
action be taken at all, and success be returned.

> +        break;
> +
> +    case XENMEM_resource_grant_table_id_status:
> +        if ( gt->gt_version != 2 )
> +            break;

As per various other places in this file I think you want to
wrap this in evaluate_nospec().

Jan


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 5/7] xen/memory: Improve compat XENMEM_acquire_resource handling
  2021-01-12 19:48 ` [PATCH v3 5/7] xen/memory: Improve compat XENMEM_acquire_resource handling Andrew Cooper
@ 2021-01-15 15:37   ` Jan Beulich
  2021-01-28 23:32     ` Andrew Cooper
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2021-01-15 15:37 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Paul Durrant, Michał Leszczyński,
	Hubert Jasudowicz, Tamas K Lengyel, Xen-devel

On 12.01.2021 20:48, Andrew Cooper wrote:
> @@ -446,6 +430,31 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
>  
>  #undef XLAT_mem_acquire_resource_HNDL_frame_list
>  
> +            if ( xen_frame_list && cmp.mar.nr_frames )
> +            {
> +                /*
> +                 * frame_list is an input for translated guests, and an output
> +                 * for untranslated guests.  Only copy in for translated guests.
> +                 */
> +                if ( paging_mode_translate(currd) )
> +                {
> +                    compat_pfn_t *compat_frame_list = (void *)xen_frame_list;
> +
> +                    if ( !compat_handle_okay(cmp.mar.frame_list,
> +                                             cmp.mar.nr_frames) ||
> +                         __copy_from_compat_offset(
> +                             compat_frame_list, cmp.mar.frame_list,
> +                             0, cmp.mar.nr_frames) )
> +                        return -EFAULT;
> +
> +                    /*
> +                     * Iterate backwards over compat_frame_list[] expanding
> +                     * compat_pfn_t to xen_pfn_t in place.
> +                     */
> +                    for ( int x = cmp.mar.nr_frames - 1; x >= 0; --x )
> +                        xen_frame_list[x] = compat_frame_list[x];

Just as a nit, without requiring you to adjust (but with the
request to consider adjusting) - x getting used as array index
would generally suggest it wants to be an unsigned type (despite
me guessing the compiler ought to be able to avoid an explicit
sign-extension for the actual memory accesses):

                    for ( unsigned int x = cmp.mar.nr_frames; x--; )
                        xen_frame_list[x] = compat_frame_list[x];

Jan


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/7] xen/gnttab: Rework resource acquisition
  2021-01-15 11:43   ` Jan Beulich
@ 2021-01-15 16:03     ` Andrew Cooper
  2021-01-15 16:26       ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2021-01-15 16:03 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Paul Durrant, Michał Leszczyński,
	Hubert Jasudowicz, Tamas K Lengyel, Xen-devel

On 15/01/2021 11:43, Jan Beulich wrote:
>> --- a/xen/common/grant_table.c
>> +++ b/xen/common/grant_table.c
>> @@ -4013,6 +4013,59 @@ static int gnttab_get_shared_frame_mfn(struct domain *d,
>>      return 0;
>>  }
>>  
>> +int gnttab_acquire_resource(
>> +    struct domain *d, unsigned int id, unsigned long frame,
>> +    unsigned int nr_frames, xen_pfn_t mfn_list[])
>> +{
>> +    struct grant_table *gt = d->grant_table;
>> +    unsigned int i = nr_frames, tot_frames;
> It doesn't look like this initializer was needed. The only
> use of i that I can spot is the loop near the end, which
> starts from 0.

Its possibly stale from v1.  I'll adjust.

>> +    mfn_t tmp;
>> +    void **vaddrs;
>> +    int rc;
>> +
>> +    /* Overflow checks */
>> +    if ( frame + nr_frames < frame )
>> +        return -EINVAL;
>> +
>> +    tot_frames = frame + nr_frames;
>> +    if ( tot_frames != frame + nr_frames )
>> +        return -EINVAL;
> Can't these two be folded into
>
>     unsigned int tot_frames = frame + nr_frames;
>
>     if ( tot_frames < frame )
>         return -EINVAL;
>
> ? Both truncation and wrapping look to be taken care of this
> way.

Not when frame is a multiple of 4G (or fractionally over, I think).

This ABI with mismatched widths really is obnoxious to work with.

~Andrew


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 7/7] xen/memory: Fix mapping grant tables with XENMEM_acquire_resource
  2021-01-12 19:48 ` [PATCH v3 7/7] xen/memory: Fix mapping grant tables with XENMEM_acquire_resource Andrew Cooper
@ 2021-01-15 16:12   ` Jan Beulich
  2021-01-28 23:44     ` Andrew Cooper
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2021-01-15 16:12 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Paul Durrant, Michał Leszczyński,
	Hubert Jasudowicz, Tamas K Lengyel, Xen-devel

On 12.01.2021 20:48, Andrew Cooper wrote:
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4628,7 +4628,6 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
>          if ( id != (unsigned int)ioservid )
>              break;
>  
> -        rc = 0;
>          for ( i = 0; i < nr_frames; i++ )
>          {
>              mfn_t mfn;

How "good" are our chances that older gcc won't recognize that
without this initialization ...

> @@ -4639,6 +4638,9 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
>  
>              mfn_list[i] = mfn_x(mfn);
>          }
> +        if ( i == nr_frames )
> +            /* Success.  Passed nr_frames back to the caller. */
> +            rc = nr_frames;

... rc will nevertheless not remain uninitialized when nr_frames
is zero?

> @@ -432,6 +419,28 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
>  
>              if ( xen_frame_list && cmp.mar.nr_frames )
>              {
> +                unsigned int xlat_max_frames =
> +                    (COMPAT_ARG_XLAT_SIZE - sizeof(*nat.mar)) /
> +                    sizeof(*xen_frame_list);
> +
> +                if ( start_extent >= nat.mar->nr_frames )

Comparing with the enclosing if() I find it slightly confusing
that different instances of nr_frames get used. Perhaps best to
adjust the earlier patch to also use nat.mar->nr_frames? Or is
this perhaps on purpose?

> +                    return -EINVAL;
> +
> +                /*
> +                 * Adjust nat to account for work done on previous
> +                 * continuations, leaving cmp pristine.  Hide the continaution

Nit: continuation

> +                 * from the native code to prevent double accounting.
> +                 */
> +                nat.mar->nr_frames -= start_extent;
> +                nat.mar->frame += start_extent;
> +                cmd &= MEMOP_CMD_MASK;
> +
> +                /*
> +                 * If there are two many frames to fit within the xlat buffer,

s/two/too/

> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1072,17 +1072,31 @@ static unsigned int resource_max_frames(const struct domain *d,
>      }
>  }
>  
> +/*
> + * Returns -errno on error, or positive in the range [1, nr_frames] on
> + * success.  Returning less than nr_frames contitutes a request for a

Nit: constitutes

> @@ -1127,26 +1138,47 @@ static int acquire_resource(
>          goto out;
>      }
>  
> +    /*
> +     * Limiting nr_frames at (UINT_MAX >> MEMOP_EXTENT_SHIFT) isn't ideal.  If
> +     * it ever becomes a practical problem, we can switch to mutating
> +     * xmar.{frame,nr_frames,frame_list} in guest memory.
> +     */
> +    rc = -EINVAL;
> +    if ( start_extent >= xmar.nr_frames ||

At the very least start_extend == 0 and xmar.nr_frames == 0
ought to be "success" (with nothing done). Elsewhere I think
we generalize this and only reject start_extent > nr, which
to me suggests we also should do so here (and in the compat
handling) for consistency. Of course this then means ...

> +         xmar.nr_frames > (UINT_MAX >> MEMOP_EXTENT_SHIFT) )
> +        goto out;
> +
> +    /* Adjust for work done on previous continuations. */
> +    xmar.nr_frames -= start_extent;
> +    xmar.frame += start_extent;
> +    guest_handle_add_offset(xmar.frame_list, start_extent);
> +
>      do {
> -        switch ( xmar.type )
> -        {
> -        case XENMEM_resource_grant_table:
> -            rc = gnttab_acquire_resource(d, xmar.id, xmar.frame, xmar.nr_frames,
> -                                         mfn_list);
> -            break;
> +        /*
> +         * Arbitrary size.  Not too much stack space, and a reasonable stride
> +         * for continuation checks.
> +         */
> +        xen_pfn_t mfn_list[32];
> +        unsigned int todo = MIN(ARRAY_SIZE(mfn_list), xmar.nr_frames), done;
>  
> -        default:
> -            rc = arch_acquire_resource(d, xmar.type, xmar.id, xmar.frame,
> -                                       xmar.nr_frames, mfn_list);
> -            break;
> -        }
> +        rc = _acquire_resource(d, xmar.type, xmar.id, xmar.frame,
> +                               todo, mfn_list);
> +        if ( rc < 0 )
> +            goto out;
>  
> -        if ( rc )
> +        done = rc;
> +        rc = 0;
> +        if ( done == 0 || done > todo )

... to only retain the right side of the || here and ...

> @@ -1166,7 +1198,32 @@ static int acquire_resource(
>                      rc = -EIO;
>              }
>          }
> -    } while ( 0 );
> +
> +        if ( rc )
> +            goto out;
> +
> +        xmar.nr_frames -= done;
> +        xmar.frame += done;
> +        guest_handle_add_offset(xmar.frame_list, done);
> +        start_extent += done;
> +
> +        /*
> +         * Explicit continuation request from _acquire_resource(), or we've
> +         * still got more work to do.
> +         */
> +        if ( done < todo ||
> +             (xmar.nr_frames && hypercall_preempt_check()) )
> +        {
> +            rc = hypercall_create_continuation(
> +                __HYPERVISOR_memory_op, "lh",
> +                XENMEM_acquire_resource | (start_extent << MEMOP_EXTENT_SHIFT),
> +                arg);
> +            goto out;
> +        }
> +
> +    } while ( xmar.nr_frames );

... converting this loop from do-while to just while.

Jan


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/7] xen/gnttab: Rework resource acquisition
  2021-01-15 16:03     ` Andrew Cooper
@ 2021-01-15 16:26       ` Jan Beulich
  2021-01-15 16:27         ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2021-01-15 16:26 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Paul Durrant, Michał Leszczyński,
	Hubert Jasudowicz, Tamas K Lengyel, Xen-devel

On 15.01.2021 17:03, Andrew Cooper wrote:
> On 15/01/2021 11:43, Jan Beulich wrote:
>>> +    mfn_t tmp;
>>> +    void **vaddrs;
>>> +    int rc;
>>> +
>>> +    /* Overflow checks */
>>> +    if ( frame + nr_frames < frame )
>>> +        return -EINVAL;
>>> +
>>> +    tot_frames = frame + nr_frames;
>>> +    if ( tot_frames != frame + nr_frames )
>>> +        return -EINVAL;
>> Can't these two be folded into
>>
>>     unsigned int tot_frames = frame + nr_frames;
>>
>>     if ( tot_frames < frame )
>>         return -EINVAL;
>>
>> ? Both truncation and wrapping look to be taken care of this
>> way.
> 
> Not when frame is a multiple of 4G (or fractionally over, I think).

How that? In this case any unsigned int value will be less than
frame.

Jan


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/7] xen/gnttab: Rework resource acquisition
  2021-01-15 16:26       ` Jan Beulich
@ 2021-01-15 16:27         ` Jan Beulich
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2021-01-15 16:27 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Paul Durrant, Michał Leszczyński,
	Hubert Jasudowicz, Tamas K Lengyel, Xen-devel

On 15.01.2021 17:26, Jan Beulich wrote:
> On 15.01.2021 17:03, Andrew Cooper wrote:
>> On 15/01/2021 11:43, Jan Beulich wrote:
>>>> +    mfn_t tmp;
>>>> +    void **vaddrs;
>>>> +    int rc;
>>>> +
>>>> +    /* Overflow checks */
>>>> +    if ( frame + nr_frames < frame )
>>>> +        return -EINVAL;
>>>> +
>>>> +    tot_frames = frame + nr_frames;
>>>> +    if ( tot_frames != frame + nr_frames )
>>>> +        return -EINVAL;
>>> Can't these two be folded into
>>>
>>>     unsigned int tot_frames = frame + nr_frames;
>>>
>>>     if ( tot_frames < frame )
>>>         return -EINVAL;
>>>
>>> ? Both truncation and wrapping look to be taken care of this
>>> way.
>>
>> Not when frame is a multiple of 4G (or fractionally over, I think).
> 
> How that? In this case any unsigned int value will be less than
> frame.

And in the 32-bit case the above becomes a simple overflow
check.

Jan


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/7] xen/gnttab: Rework resource acquisition
  2021-01-15 11:56   ` Jan Beulich
@ 2021-01-15 16:57     ` Andrew Cooper
  2021-01-18  8:23       ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2021-01-15 16:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Paul Durrant, Michał Leszczyński,
	Hubert Jasudowicz, Tamas K Lengyel, Xen-devel

On 15/01/2021 11:56, Jan Beulich wrote:
>> +    /* Grow table if necessary. */
>> +    grant_write_lock(gt);
>> +    rc = -EINVAL;
>> +    switch ( id )
>> +    {
>> +    case XENMEM_resource_grant_table_id_shared:
>> +        vaddrs = gt->shared_raw;
>> +        rc = gnttab_get_shared_frame_mfn(d, tot_frames - 1, &tmp);
> ... this will degenerate (and still cause an error) when frame
> is also zero, and will cause undue growing of the table when
> frame is non-zero yet not overly large.

Urgh, yes - that is why I had the check.

In which case I retract my change between v2 and v3 here.

> As indicated before, I'm of the clear opinion that here - like
> elsewhere - a number of zero frames requested means that no
> action be taken at all, and success be returned.

The general world we work in (POSIX) agrees with my opinion over yours
when it comes to this matter.

I spent a lot of time and effort getting this logic correct in v2, and I
do not have any further time to waste adding complexity to support a
non-existent corner case, nor is it reasonable to further delay all the
work which is depending on this series.  This entire mess is already too
damn complicated, without taking extra complexity.

Entertaining the idea of supporting 0 length requests is really not as
simple as you seem to think it is, and is a large part of why I'm
stubbornly refusing to do so.

I am going to commit this patch (with some of the other minor adjustments).

If you wish to add the extra complexity yourself, you are welcome to all
the unit tests I put together which exercise the complexity, but I will
not ack the resulting change.

~Andrew


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/7] xen/gnttab: Rework resource acquisition
  2021-01-15 16:57     ` Andrew Cooper
@ 2021-01-18  8:23       ` Jan Beulich
  2021-01-28 22:56         ` Andrew Cooper
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2021-01-18  8:23 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Paul Durrant, Michał Leszczyński,
	Hubert Jasudowicz, Tamas K Lengyel, Xen-devel

On 15.01.2021 17:57, Andrew Cooper wrote:
> On 15/01/2021 11:56, Jan Beulich wrote:
>>> +    /* Grow table if necessary. */
>>> +    grant_write_lock(gt);
>>> +    rc = -EINVAL;
>>> +    switch ( id )
>>> +    {
>>> +    case XENMEM_resource_grant_table_id_shared:
>>> +        vaddrs = gt->shared_raw;
>>> +        rc = gnttab_get_shared_frame_mfn(d, tot_frames - 1, &tmp);
>> ... this will degenerate (and still cause an error) when frame
>> is also zero, and will cause undue growing of the table when
>> frame is non-zero yet not overly large.
> 
> Urgh, yes - that is why I had the check.
> 
> In which case I retract my change between v2 and v3 here.
> 
>> As indicated before, I'm of the clear opinion that here - like
>> elsewhere - a number of zero frames requested means that no
>> action be taken at all, and success be returned.
> 
> The general world we work in (POSIX) agrees with my opinion over yours
> when it comes to this matter.

I assume you are referring to mmap()? I ask because I think there
are numerous counter examples (some even in the C standard):
malloc() & friends allow for either behavior. memcpy() / memmove()
happily do nothing when passed a zero size. read() / write()
are at least allowed to read/write nothing (and return success)
when told so. Otoh I notice that a zero vector count passed to
readv() / writev() is indeed an error, yet nothing is said at all
about individual vector elements specifying zero size.

Plus of course I don't think POSIX is the main reference point
here, when the rest of the hypercalls allowing for some form of
batching permit empty batches.

> I spent a lot of time and effort getting this logic correct in v2, and I
> do not have any further time to waste adding complexity to support a
> non-existent corner case, nor is it reasonable to further delay all the
> work which is depending on this series.  This entire mess is already too
> damn complicated, without taking extra complexity.
> 
> Entertaining the idea of supporting 0 length requests is really not as
> simple as you seem to think it is, and is a large part of why I'm
> stubbornly refusing to do so.

I'd be really happy to be educated of the complications; sadly
so far you've only claimed ones would exist without actually
going into sufficient detail. In particular I don't view placing 

    if ( size == 0 )
        return 0;

suitably early coming anywhere near "complexity". Even more so
that as per your reply you mean to undo removal of a respective
check, just that in your version it'll return an error instead
of success.

> I am going to commit this patch (with some of the other minor adjustments).

I'm not concerned enough of the introduced inconsistency to
outright veto you doing so, but I still don't think this is an
appropriate step to take under the present conditions.

Jan


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/7] xen/gnttab: Rework resource acquisition
  2021-01-18  8:23       ` Jan Beulich
@ 2021-01-28 22:56         ` Andrew Cooper
  2021-01-29  9:33           ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2021-01-28 22:56 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Paul Durrant, Michał Leszczyński,
	Hubert Jasudowicz, Tamas K Lengyel, Xen-devel

On 18/01/2021 08:23, Jan Beulich wrote:
> On 15.01.2021 17:57, Andrew Cooper wrote:
>> On 15/01/2021 11:56, Jan Beulich wrote:
>>>> +    /* Grow table if necessary. */
>>>> +    grant_write_lock(gt);
>>>> +    rc = -EINVAL;
>>>> +    switch ( id )
>>>> +    {
>>>> +    case XENMEM_resource_grant_table_id_shared:
>>>> +        vaddrs = gt->shared_raw;
>>>> +        rc = gnttab_get_shared_frame_mfn(d, tot_frames - 1, &tmp);
>>> ... this will degenerate (and still cause an error) when frame
>>> is also zero, and will cause undue growing of the table when
>>> frame is non-zero yet not overly large.
>> Urgh, yes - that is why I had the check.
>>
>> In which case I retract my change between v2 and v3 here.
>>
>>> As indicated before, I'm of the clear opinion that here - like
>>> elsewhere - a number of zero frames requested means that no
>>> action be taken at all, and success be returned.
>> The general world we work in (POSIX) agrees with my opinion over yours
>> when it comes to this matter.
> I assume you are referring to mmap()? I ask because I think there
> are numerous counter examples (some even in the C standard):
> malloc() & friends allow for either behavior. memcpy() / memmove()
> ...

This entire infrastructure lives behind an mmap() (or equivalent) system
call.  Other specs are not relevant.

Any request for a zero sized mapping is a bug.  It is either a buggy
caller, or buggy continuation logic.

>> I spent a lot of time and effort getting this logic correct in v2, and I
>> do not have any further time to waste adding complexity to support a
>> non-existent corner case, nor is it reasonable to further delay all the
>> work which is depending on this series.  This entire mess is already too
>> damn complicated, without taking extra complexity.
>>
>> Entertaining the idea of supporting 0 length requests is really not as
>> simple as you seem to think it is, and is a large part of why I'm
>> stubbornly refusing to do so.
> I'd be really happy to be educated of the complications; sadly
> so far you've only claimed ones would exist without actually
> going into sufficient detail. In particular I don't view placing 
>
>     if ( size == 0 )
>         return 0;
>
> suitably early coming anywhere near "complexity". Even more so
> that as per your reply you mean to undo removal of a respective
> check, just that in your version it'll return an error instead
> of success.

I am not being a belligerent arse for the sake of it.  I've got far
better things I ought to be doing with my time.

I spent a lot of time getting this working, and with sufficient testing
to demonstrate it working in practice, particularly in continuation cases.

You've spent a lot of time and effort insisting that I reintroduce
support a fundamentally broken corner case, despite my best attempts to
demonstrate why it will livelock in the hypervisor because of the mess
which is the double looping between the compat and native handlers.

You've also spent a lot of time obfuscating the overflow checks and
breaking them in the process.

You are welcome, in your own time - and not mine, to use the testing
infrastructure I've already provided to discover why the compat mess has
a habit of livelocking in certain continuation cases.  (It won't
actually livelock if you switch to using return 0.  You'll instead hit
the ASSERT_UNREACHABLE() I put in a failsafe path specifically to avoid
bugs in this are turning back into XSAs.)

Combined with the fact that 0 length requests are buggy in all
circumstances, I chose to implement logic which is robust even in the
case of failure, because the compat logic is almost intractable and
borderline untestable because noone runs 32bit kernels in anger these
days.  I can't commit my test infrastructure which spots the problems,
because we obviously can't have a hypercall which panics when the input
buffer doesn't match the test pattern.

I am not inclined to adopt an approach which is fundamentally more
likely to contain security vulnerabilities in a fragile and untestable
area of code.  You are going to have to come up with a far more
compelling argument that "because I want to support 0 length mapping
requests" to change my mind.

~Andrew


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 5/7] xen/memory: Improve compat XENMEM_acquire_resource handling
  2021-01-15 15:37   ` Jan Beulich
@ 2021-01-28 23:32     ` Andrew Cooper
  2021-02-01 14:12       ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2021-01-28 23:32 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Paul Durrant, Michał Leszczyński,
	Hubert Jasudowicz, Tamas K Lengyel, Xen-devel

On 15/01/2021 15:37, Jan Beulich wrote:
> On 12.01.2021 20:48, Andrew Cooper wrote:
>> @@ -446,6 +430,31 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
>>  
>>  #undef XLAT_mem_acquire_resource_HNDL_frame_list
>>  
>> +            if ( xen_frame_list && cmp.mar.nr_frames )
>> +            {
>> +                /*
>> +                 * frame_list is an input for translated guests, and an output
>> +                 * for untranslated guests.  Only copy in for translated guests.
>> +                 */
>> +                if ( paging_mode_translate(currd) )
>> +                {
>> +                    compat_pfn_t *compat_frame_list = (void *)xen_frame_list;
>> +
>> +                    if ( !compat_handle_okay(cmp.mar.frame_list,
>> +                                             cmp.mar.nr_frames) ||
>> +                         __copy_from_compat_offset(
>> +                             compat_frame_list, cmp.mar.frame_list,
>> +                             0, cmp.mar.nr_frames) )
>> +                        return -EFAULT;
>> +
>> +                    /*
>> +                     * Iterate backwards over compat_frame_list[] expanding
>> +                     * compat_pfn_t to xen_pfn_t in place.
>> +                     */
>> +                    for ( int x = cmp.mar.nr_frames - 1; x >= 0; --x )
>> +                        xen_frame_list[x] = compat_frame_list[x];
> Just as a nit, without requiring you to adjust (but with the
> request to consider adjusting) - x getting used as array index
> would generally suggest it wants to be an unsigned type (despite
> me guessing the compiler ought to be able to avoid an explicit
> sign-extension for the actual memory accesses):
>
>                     for ( unsigned int x = cmp.mar.nr_frames; x--; )
>                         xen_frame_list[x] = compat_frame_list[x];

Signed numbers are not inherently evil.  The range of x is between 0 and
1020 so there is no issue with failing to enter the loop.

It is the compilers job to make this optimisation.  It is a very poor
use of a developers time to write logic which takes extra effort to
figure out whether it is correct or not.

You know what my attitude will be towards a compiler which is incapable
of making the optimisation, and you've got to go back a decade to find a
processor old enough to not have identical performance between the
unoptimised signed and unsigned forms.

Both signs of numbers have their uses, and a rigid policy of using
unsigned numbers does more harm than good (in this case, concerning the
simplicity of the code).

~Andrew


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 7/7] xen/memory: Fix mapping grant tables with XENMEM_acquire_resource
  2021-01-15 16:12   ` Jan Beulich
@ 2021-01-28 23:44     ` Andrew Cooper
  2021-01-29  9:46       ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2021-01-28 23:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Paul Durrant, Michał Leszczyński,
	Hubert Jasudowicz, Tamas K Lengyel, Xen-devel

On 15/01/2021 16:12, Jan Beulich wrote:
> On 12.01.2021 20:48, Andrew Cooper wrote:
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -4628,7 +4628,6 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
>>          if ( id != (unsigned int)ioservid )
>>              break;
>>  
>> -        rc = 0;
>>          for ( i = 0; i < nr_frames; i++ )
>>          {
>>              mfn_t mfn;
> How "good" are our chances that older gcc won't recognize that
> without this initialization ...
>
>> @@ -4639,6 +4638,9 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
>>  
>>              mfn_list[i] = mfn_x(mfn);
>>          }
>> +        if ( i == nr_frames )
>> +            /* Success.  Passed nr_frames back to the caller. */
>> +            rc = nr_frames;
> ... rc will nevertheless not remain uninitialized when nr_frames
> is zero?

I don't anticipate this function getting complicated enough for us to
need to rely on tricks like that to spot bugs.

AFAICT, it would take a rather larger diffstat to make it "uninitialised
clean" again.

>> @@ -432,6 +419,28 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
>>  
>>              if ( xen_frame_list && cmp.mar.nr_frames )
>>              {
>> +                unsigned int xlat_max_frames =
>> +                    (COMPAT_ARG_XLAT_SIZE - sizeof(*nat.mar)) /
>> +                    sizeof(*xen_frame_list);
>> +
>> +                if ( start_extent >= nat.mar->nr_frames )
> Comparing with the enclosing if() I find it slightly confusing
> that different instances of nr_frames get used. Perhaps best to
> adjust the earlier patch to also use nat.mar->nr_frames? Or is
> this perhaps on purpose?

Good point - this is before the shuffle to avoid double accounting, and
the code gen will be better by using cmp consistently.

~Andrew


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/7] xen/gnttab: Rework resource acquisition
  2021-01-28 22:56         ` Andrew Cooper
@ 2021-01-29  9:33           ` Jan Beulich
  2021-01-29 17:44             ` Ian Jackson
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2021-01-29  9:33 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Paul Durrant, Michał Leszczyński,
	Hubert Jasudowicz, Tamas K Lengyel, Xen-devel

On 28.01.2021 23:56, Andrew Cooper wrote:
> On 18/01/2021 08:23, Jan Beulich wrote:
>> On 15.01.2021 17:57, Andrew Cooper wrote:
>>> On 15/01/2021 11:56, Jan Beulich wrote:
>>>>> +    /* Grow table if necessary. */
>>>>> +    grant_write_lock(gt);
>>>>> +    rc = -EINVAL;
>>>>> +    switch ( id )
>>>>> +    {
>>>>> +    case XENMEM_resource_grant_table_id_shared:
>>>>> +        vaddrs = gt->shared_raw;
>>>>> +        rc = gnttab_get_shared_frame_mfn(d, tot_frames - 1, &tmp);
>>>> ... this will degenerate (and still cause an error) when frame
>>>> is also zero, and will cause undue growing of the table when
>>>> frame is non-zero yet not overly large.
>>> Urgh, yes - that is why I had the check.
>>>
>>> In which case I retract my change between v2 and v3 here.
>>>
>>>> As indicated before, I'm of the clear opinion that here - like
>>>> elsewhere - a number of zero frames requested means that no
>>>> action be taken at all, and success be returned.
>>> The general world we work in (POSIX) agrees with my opinion over yours
>>> when it comes to this matter.
>> I assume you are referring to mmap()? I ask because I think there
>> are numerous counter examples (some even in the C standard):
>> malloc() & friends allow for either behavior. memcpy() / memmove()
>> ...
> 
> This entire infrastructure lives behind an mmap() (or equivalent) system
> call.  Other specs are not relevant.

With this you're implying restrictions on what callers might
use this for. I don't see why a ring-0 only environment
would necessarily have anything like mmap().

Anyway, I'm not going to further comment on any of the below.
I'm not going to either ack or nak this change, so if you
have the agreement of others feel free to put in.

Jan

> Any request for a zero sized mapping is a bug.  It is either a buggy
> caller, or buggy continuation logic.
> 
>>> I spent a lot of time and effort getting this logic correct in v2, and I
>>> do not have any further time to waste adding complexity to support a
>>> non-existent corner case, nor is it reasonable to further delay all the
>>> work which is depending on this series.  This entire mess is already too
>>> damn complicated, without taking extra complexity.
>>>
>>> Entertaining the idea of supporting 0 length requests is really not as
>>> simple as you seem to think it is, and is a large part of why I'm
>>> stubbornly refusing to do so.
>> I'd be really happy to be educated of the complications; sadly
>> so far you've only claimed ones would exist without actually
>> going into sufficient detail. In particular I don't view placing 
>>
>>     if ( size == 0 )
>>         return 0;
>>
>> suitably early coming anywhere near "complexity". Even more so
>> that as per your reply you mean to undo removal of a respective
>> check, just that in your version it'll return an error instead
>> of success.
> 
> I am not being a belligerent arse for the sake of it.  I've got far
> better things I ought to be doing with my time.
> 
> I spent a lot of time getting this working, and with sufficient testing
> to demonstrate it working in practice, particularly in continuation cases.
> 
> You've spent a lot of time and effort insisting that I reintroduce
> support a fundamentally broken corner case, despite my best attempts to
> demonstrate why it will livelock in the hypervisor because of the mess
> which is the double looping between the compat and native handlers.
> 
> You've also spent a lot of time obfuscating the overflow checks and
> breaking them in the process.
> 
> You are welcome, in your own time - and not mine, to use the testing
> infrastructure I've already provided to discover why the compat mess has
> a habit of livelocking in certain continuation cases.  (It won't
> actually livelock if you switch to using return 0.  You'll instead hit
> the ASSERT_UNREACHABLE() I put in a failsafe path specifically to avoid
> bugs in this are turning back into XSAs.)
> 
> Combined with the fact that 0 length requests are buggy in all
> circumstances, I chose to implement logic which is robust even in the
> case of failure, because the compat logic is almost intractable and
> borderline untestable because noone runs 32bit kernels in anger these
> days.  I can't commit my test infrastructure which spots the problems,
> because we obviously can't have a hypercall which panics when the input
> buffer doesn't match the test pattern.
> 
> I am not inclined to adopt an approach which is fundamentally more
> likely to contain security vulnerabilities in a fragile and untestable
> area of code.  You are going to have to come up with a far more
> compelling argument that "because I want to support 0 length mapping
> requests" to change my mind.
> 
> ~Andrew
> 



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 7/7] xen/memory: Fix mapping grant tables with XENMEM_acquire_resource
  2021-01-28 23:44     ` Andrew Cooper
@ 2021-01-29  9:46       ` Jan Beulich
  2021-01-29 18:18         ` Andrew Cooper
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2021-01-29  9:46 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Paul Durrant, Michał Leszczyński,
	Hubert Jasudowicz, Tamas K Lengyel, Xen-devel

On 29.01.2021 00:44, Andrew Cooper wrote:
> On 15/01/2021 16:12, Jan Beulich wrote:
>> On 12.01.2021 20:48, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -4628,7 +4628,6 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
>>>          if ( id != (unsigned int)ioservid )
>>>              break;
>>>  
>>> -        rc = 0;
>>>          for ( i = 0; i < nr_frames; i++ )
>>>          {
>>>              mfn_t mfn;
>> How "good" are our chances that older gcc won't recognize that
>> without this initialization ...
>>
>>> @@ -4639,6 +4638,9 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
>>>  
>>>              mfn_list[i] = mfn_x(mfn);
>>>          }
>>> +        if ( i == nr_frames )
>>> +            /* Success.  Passed nr_frames back to the caller. */
>>> +            rc = nr_frames;
>> ... rc will nevertheless not remain uninitialized when nr_frames
>> is zero?
> 
> I don't anticipate this function getting complicated enough for us to
> need to rely on tricks like that to spot bugs.
> 
> AFAICT, it would take a rather larger diffstat to make it "uninitialised
> clean" again.

Well, we'll see how it goes then. We still allow rather ancient
gcc, and I will eventually run into a build issue here once
having rebased accordingly, if such old gcc won't cope.

Jan


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/7] xen/gnttab: Rework resource acquisition
  2021-01-29  9:33           ` Jan Beulich
@ 2021-01-29 17:44             ` Ian Jackson
  0 siblings, 0 replies; 29+ messages in thread
From: Ian Jackson @ 2021-01-29 17:44 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Jan Beulich, George Dunlap, Ian Jackson, Stefano Stabellini,
	Wei Liu, Julien Grall, Paul Durrant,
	Michał Leszczyński, Hubert Jasudowicz, Tamas K Lengyel,
	Xen-devel

Jan Beulich writes ("Re: [PATCH v3 1/7] xen/gnttab: Rework resource acquisition"):
> Anyway, I'm not going to further comment on any of the below.
> I'm not going to either ack or nak this change, so if you
> have the agreement of others feel free to put in.

This patch,

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 7/7] xen/memory: Fix mapping grant tables with XENMEM_acquire_resource
  2021-01-29  9:46       ` Jan Beulich
@ 2021-01-29 18:18         ` Andrew Cooper
  2021-02-01 12:56           ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2021-01-29 18:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Paul Durrant, Michał Leszczyński,
	Hubert Jasudowicz, Tamas K Lengyel, Xen-devel

On 29/01/2021 09:46, Jan Beulich wrote:
> On 29.01.2021 00:44, Andrew Cooper wrote:
>> On 15/01/2021 16:12, Jan Beulich wrote:
>>> On 12.01.2021 20:48, Andrew Cooper wrote:
>>>> --- a/xen/arch/x86/mm.c
>>>> +++ b/xen/arch/x86/mm.c
>>>> @@ -4628,7 +4628,6 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
>>>>          if ( id != (unsigned int)ioservid )
>>>>              break;
>>>>  
>>>> -        rc = 0;
>>>>          for ( i = 0; i < nr_frames; i++ )
>>>>          {
>>>>              mfn_t mfn;
>>> How "good" are our chances that older gcc won't recognize that
>>> without this initialization ...
>>>
>>>> @@ -4639,6 +4638,9 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
>>>>  
>>>>              mfn_list[i] = mfn_x(mfn);
>>>>          }
>>>> +        if ( i == nr_frames )
>>>> +            /* Success.  Passed nr_frames back to the caller. */
>>>> +            rc = nr_frames;
>>> ... rc will nevertheless not remain uninitialized when nr_frames
>>> is zero?
>> I don't anticipate this function getting complicated enough for us to
>> need to rely on tricks like that to spot bugs.
>>
>> AFAICT, it would take a rather larger diffstat to make it "uninitialised
>> clean" again.
> Well, we'll see how it goes then. We still allow rather ancient
> gcc, and I will eventually run into a build issue here once
> having rebased accordingly, if such old gcc won't cope.

But those problematic compilers have problems spotting that a variable
is actually initialised on all paths.

This case is opposite.  It is unconditionally initialised to -EINVAL
(just outside of the context above), which breaks some "the compiler
would warn us about error paths" logic that we try to use.

~Andrew


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 7/7] xen/memory: Fix mapping grant tables with XENMEM_acquire_resource
  2021-01-29 18:18         ` Andrew Cooper
@ 2021-02-01 12:56           ` Jan Beulich
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2021-02-01 12:56 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Paul Durrant, Michał Leszczyński,
	Hubert Jasudowicz, Tamas K Lengyel, Xen-devel

On 29.01.2021 19:18, Andrew Cooper wrote:
> On 29/01/2021 09:46, Jan Beulich wrote:
>> On 29.01.2021 00:44, Andrew Cooper wrote:
>>> On 15/01/2021 16:12, Jan Beulich wrote:
>>>> On 12.01.2021 20:48, Andrew Cooper wrote:
>>>>> --- a/xen/arch/x86/mm.c
>>>>> +++ b/xen/arch/x86/mm.c
>>>>> @@ -4628,7 +4628,6 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
>>>>>          if ( id != (unsigned int)ioservid )
>>>>>              break;
>>>>>  
>>>>> -        rc = 0;
>>>>>          for ( i = 0; i < nr_frames; i++ )
>>>>>          {
>>>>>              mfn_t mfn;
>>>> How "good" are our chances that older gcc won't recognize that
>>>> without this initialization ...
>>>>
>>>>> @@ -4639,6 +4638,9 @@ int arch_acquire_resource(struct domain *d, unsigned int type,
>>>>>  
>>>>>              mfn_list[i] = mfn_x(mfn);
>>>>>          }
>>>>> +        if ( i == nr_frames )
>>>>> +            /* Success.  Passed nr_frames back to the caller. */
>>>>> +            rc = nr_frames;
>>>> ... rc will nevertheless not remain uninitialized when nr_frames
>>>> is zero?
>>> I don't anticipate this function getting complicated enough for us to
>>> need to rely on tricks like that to spot bugs.
>>>
>>> AFAICT, it would take a rather larger diffstat to make it "uninitialised
>>> clean" again.
>> Well, we'll see how it goes then. We still allow rather ancient
>> gcc, and I will eventually run into a build issue here once
>> having rebased accordingly, if such old gcc won't cope.
> 
> But those problematic compilers have problems spotting that a variable
> is actually initialised on all paths.
> 
> This case is opposite.  It is unconditionally initialised to -EINVAL
> (just outside of the context above), which breaks some "the compiler
> would warn us about error paths" logic that we try to use.

Oh, right you are. I'm sorry for the noise then.

Jan


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 5/7] xen/memory: Improve compat XENMEM_acquire_resource handling
  2021-01-28 23:32     ` Andrew Cooper
@ 2021-02-01 14:12       ` Jan Beulich
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2021-02-01 14:12 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Ian Jackson, Stefano Stabellini, Wei Liu,
	Julien Grall, Paul Durrant, Michał Leszczyński,
	Hubert Jasudowicz, Tamas K Lengyel, Xen-devel

On 29.01.2021 00:32, Andrew Cooper wrote:
> On 15/01/2021 15:37, Jan Beulich wrote:
>> On 12.01.2021 20:48, Andrew Cooper wrote:
>>> @@ -446,6 +430,31 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
>>>  
>>>  #undef XLAT_mem_acquire_resource_HNDL_frame_list
>>>  
>>> +            if ( xen_frame_list && cmp.mar.nr_frames )
>>> +            {
>>> +                /*
>>> +                 * frame_list is an input for translated guests, and an output
>>> +                 * for untranslated guests.  Only copy in for translated guests.
>>> +                 */
>>> +                if ( paging_mode_translate(currd) )
>>> +                {
>>> +                    compat_pfn_t *compat_frame_list = (void *)xen_frame_list;
>>> +
>>> +                    if ( !compat_handle_okay(cmp.mar.frame_list,
>>> +                                             cmp.mar.nr_frames) ||
>>> +                         __copy_from_compat_offset(
>>> +                             compat_frame_list, cmp.mar.frame_list,
>>> +                             0, cmp.mar.nr_frames) )
>>> +                        return -EFAULT;
>>> +
>>> +                    /*
>>> +                     * Iterate backwards over compat_frame_list[] expanding
>>> +                     * compat_pfn_t to xen_pfn_t in place.
>>> +                     */
>>> +                    for ( int x = cmp.mar.nr_frames - 1; x >= 0; --x )
>>> +                        xen_frame_list[x] = compat_frame_list[x];
>> Just as a nit, without requiring you to adjust (but with the
>> request to consider adjusting) - x getting used as array index
>> would generally suggest it wants to be an unsigned type (despite
>> me guessing the compiler ought to be able to avoid an explicit
>> sign-extension for the actual memory accesses):
>>
>>                     for ( unsigned int x = cmp.mar.nr_frames; x--; )
>>                         xen_frame_list[x] = compat_frame_list[x];
> 
> Signed numbers are not inherently evil.  The range of x is between 0 and
> 1020 so there is no issue with failing to enter the loop.
> 
> It is the compilers job to make this optimisation.  It is a very poor
> use of a developers time to write logic which takes extra effort to
> figure out whether it is correct or not.

I don't see why my suggested alternative is any more difficult to
understand. It's one less expression, so perhaps even less cognitive
load. But yes, this is easily getting subjective.

> You know what my attitude will be towards a compiler which is incapable
> of making the optimisation, and you've got to go back a decade to find a
> processor old enough to not have identical performance between the
> unoptimised signed and unsigned forms.

I'm not sure I see how the compiler could transform this to using
unsigned int. By observation, gcc10 doesn't, despite -O2 (release
build). It still emits an otherwise unnecessary MOVSXD, and the
loop body is one insn shorter with an unsigned induction variable
(albeit that's likely just a side effect in this specific example).

> Both signs of numbers have their uses, and a rigid policy of using
> unsigned numbers does more harm than good (in this case, concerning the
> simplicity of the code).

Of course. But array accesses are where we'd better limit ourselves
to unsigned indexing variables, imo.

Jan


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2021-02-01 14:12 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-12 19:48 [PATCH v3 0/7] Multiple fixes to XENMEM_acquire_resource Andrew Cooper
2021-01-12 19:48 ` [PATCH v3 1/7] xen/gnttab: Rework resource acquisition Andrew Cooper
2021-01-15 11:43   ` Jan Beulich
2021-01-15 16:03     ` Andrew Cooper
2021-01-15 16:26       ` Jan Beulich
2021-01-15 16:27         ` Jan Beulich
2021-01-15 11:56   ` Jan Beulich
2021-01-15 16:57     ` Andrew Cooper
2021-01-18  8:23       ` Jan Beulich
2021-01-28 22:56         ` Andrew Cooper
2021-01-29  9:33           ` Jan Beulich
2021-01-29 17:44             ` Ian Jackson
2021-01-12 19:48 ` [PATCH v3 2/7] xen/memory: Fix acquire_resource size semantics Andrew Cooper
2021-01-12 20:15   ` Julien Grall
2021-01-12 20:57     ` Andrew Cooper
2021-01-12 21:05       ` Tamas K Lengyel
2021-01-12 19:48 ` [PATCH v3 3/7] tools/foreignmem: Support querying the size of a resource Andrew Cooper
2021-01-12 19:48 ` [PATCH v3 4/7] xen/memory: Clarify the XENMEM_acquire_resource ABI description Andrew Cooper
2021-01-12 19:48 ` [PATCH v3 5/7] xen/memory: Improve compat XENMEM_acquire_resource handling Andrew Cooper
2021-01-15 15:37   ` Jan Beulich
2021-01-28 23:32     ` Andrew Cooper
2021-02-01 14:12       ` Jan Beulich
2021-01-12 19:48 ` [PATCH v3 6/7] xen/memory: Indent part of acquire_resource() Andrew Cooper
2021-01-12 19:48 ` [PATCH v3 7/7] xen/memory: Fix mapping grant tables with XENMEM_acquire_resource Andrew Cooper
2021-01-15 16:12   ` Jan Beulich
2021-01-28 23:44     ` Andrew Cooper
2021-01-29  9:46       ` Jan Beulich
2021-01-29 18:18         ` Andrew Cooper
2021-02-01 12:56           ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.