From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Xen-devel <xen-devel@lists.xenproject.org>
Cc: "Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Xen Security Team" <security@xen.org>,
	"Jan Beulich" <JBeulich@suse.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>, "Wei Liu" <wl@xen.org>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Julien Grall" <julien@xen.org>,
	"Volodymyr Babchuk" <Volodymyr_Babchuk@epam.com>,
	"Bertrand Marquis" <bertrand.marquis@arm.com>,
	"Henry Wang" <Henry.Wang@arm.com>,
	"Anthony PERARD" <anthony.perard@citrix.com>
Subject: [PATCH 1/4] xen: Introduce non-broken hypercalls for the p2m pool size
Date: Wed, 26 Oct 2022 11:20:15 +0100	[thread overview]
Message-ID: <20221026102018.4144-2-andrew.cooper3@citrix.com> (raw)
In-Reply-To: <20221026102018.4144-1-andrew.cooper3@citrix.com>

The existing XEN_DOMCTL_SHADOW_OP_{GET,SET}_ALLOCATION have problems:

 * All set_allocation() flavours have an overflow-before-widen bug when
   calculating "sc->mb << (20 - PAGE_SHIFT)".  (See the sketch after this
   list.)
 * All flavours have a granularity of 1M.  This was tolerable when the size
   of the pool could only be set at the same granularity, but is broken now
   that ARM has a 16-page stopgap allocation in use.
 * All get_allocation() flavours round up, and in particular turn 0 into 1,
   meaning the get op returns junk before a successful set op.
 * The x86 flavours reject the hypercalls before the VM has vCPUs allocated,
   despite the pool size being a domain property.
 * Even the hypercall names are long-obsolete.
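
A minimal sketch of the first bullet (standalone illustration only, assuming
the 32-bit 'mb' field from struct xen_domctl_shadow_op and PAGE_SHIFT == 12):

    #include <stdint.h>

    #define PAGE_SHIFT 12

    /* Mirrors the buggy pattern: the shift is evaluated in 32 bits and only
     * then widened, so the high bits are lost for mb >= (1u << 24). */
    static unsigned long buggy_pages(uint32_t mb)
    {
        return mb << (20 - PAGE_SHIFT);
    }

    /* Widening before the shift keeps all the bits. */
    static unsigned long fixed_pages(uint32_t mb)
    {
        return (unsigned long)mb << (20 - PAGE_SHIFT);
    }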

Implement an interface that doesn't suck, which can first be used to unit test
the behaviour, and subsequently to correct the broken implementation.  The old
interface will be retired in due course.

This is part of XSA-409 / CVE-2022-33747.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Xen Security Team <security@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Bertrand Marquis <bertrand.marquis@arm.com>
CC: Henry Wang <Henry.Wang@arm.com>
CC: Anthony PERARD <anthony.perard@citrix.com>

Name subject to improvement.  ABI not.  This is the first of many tools ABI
changes required to cleanly separate the logical operation from Xen's choice
of pagetable size.
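
For reference, a caller-side sketch of the two xc wrappers added below
(hypothetical helper name; error handling kept terse):

    #include <xenctrl.h>

    /* Read the current pool size and grow it by 1M.  Sizes are in bytes;
     * Xen enforces its own page-size granularity. */
    static int grow_p2m_pool_by_1M(xc_interface *xch, uint32_t domid)
    {
        uint64_t size;
        int rc = xc_get_p2m_mempool_size(xch, domid, &size);

        if ( rc )
            return rc;

        return xc_set_p2m_mempool_size(xch, domid, size + (1ULL << 20));
    }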

Future TODOs:
 * x86 shadow still rounds up.  This is buggy, as it's a simultaneous equation
   with tot_pages, which varies over time with ballooning.
 * x86 PV is weird.  There is no toolstack interaction with the shadow pool
   size, but the "shadow" pool does come into existence when logdirty (or
   pv-l1tf) is first enabled.
 * The shadow+hap logic is in desperate need of deduping.
---
 tools/include/xenctrl.h           |  3 +++
 tools/libs/ctrl/xc_domain.c       | 29 +++++++++++++++++++++++++++++
 xen/arch/arm/p2m.c                | 27 +++++++++++++++++++++++++++
 xen/arch/x86/include/asm/hap.h    |  1 +
 xen/arch/x86/include/asm/shadow.h |  4 ++++
 xen/arch/x86/mm/hap/hap.c         | 10 ++++++++++
 xen/arch/x86/mm/paging.c          | 39 +++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/mm/shadow/common.c   | 10 ++++++++++
 xen/common/domctl.c               | 14 ++++++++++++++
 xen/include/public/domctl.h       | 26 +++++++++++++++++++++++++-
 xen/include/xen/domain.h          |  3 +++
 11 files changed, 165 insertions(+), 1 deletion(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 0c8b4c3aa7a5..f503f03a3927 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -893,6 +893,9 @@ long long xc_logdirty_control(xc_interface *xch,
                               unsigned int mode,
                               xc_shadow_op_stats_t *stats);
 
+int xc_get_p2m_mempool_size(xc_interface *xch, uint32_t domid, uint64_t *size);
+int xc_set_p2m_mempool_size(xc_interface *xch, uint32_t domid, uint64_t size);
+
 int xc_sched_credit_domain_set(xc_interface *xch,
                                uint32_t domid,
                                struct xen_domctl_sched_credit *sdom);
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index 14c0420c35be..9ac09cfab036 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -706,6 +706,35 @@ long long xc_logdirty_control(xc_interface *xch,
     return (rc == 0) ? domctl.u.shadow_op.pages : rc;
 }
 
+int xc_get_p2m_mempool_size(xc_interface *xch, uint32_t domid, uint64_t *size)
+{
+    int rc;
+    struct xen_domctl domctl = {
+        .cmd         = XEN_DOMCTL_get_p2m_mempool_size,
+        .domain      = domid,
+    };
+
+    rc = do_domctl(xch, &domctl);
+    if ( rc )
+        return rc;
+
+    *size = domctl.u.p2m_mempool.size;
+    return 0;
+}
+
+int xc_set_p2m_mempool_size(xc_interface *xch, uint32_t domid, uint64_t size)
+{
+    struct xen_domctl domctl = {
+        .cmd         = XEN_DOMCTL_set_p2m_mempool_size,
+        .domain      = domid,
+        .u.p2m_mempool = {
+            .size = size,
+        },
+    };
+
+    return do_domctl(xch, &domctl);
+}
+
 int xc_domain_setmaxmem(xc_interface *xch,
                         uint32_t domid,
                         uint64_t max_memkb)
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 94d3b60b1387..4607cde6f0b8 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -100,6 +100,14 @@ unsigned int p2m_get_allocation(struct domain *d)
     return ROUNDUP(nr_pages, 1 << (20 - PAGE_SHIFT)) >> (20 - PAGE_SHIFT);
 }
 
+/* Return the size of the pool, in bytes. */
+int arch_get_p2m_mempool_size(struct domain *d, uint64_t *size)
+{
+    *size = ACCESS_ONCE(d->arch.paging.p2m_total_pages) << PAGE_SHIFT;
+
+    return 0;
+}
+
 /*
  * Set the pool of pages to the required number of pages.
  * Returns 0 for success, non-zero for failure.
@@ -157,6 +165,25 @@ int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted)
     return 0;
 }
 
+int arch_set_p2m_mempool_size(struct domain *d, uint64_t size)
+{
+    unsigned long pages = size >> PAGE_SHIFT;
+    bool preempted = false;
+    int rc;
+
+    if ( (size & ~PAGE_MASK) ||          /* Non page-sized request? */
+         pages != (size >> PAGE_SHIFT) ) /* 32-bit overflow? */
+        return -EINVAL;
+
+    spin_lock(&d->arch.paging.lock);
+    rc = p2m_set_allocation(d, pages, &preempted);
+    spin_unlock(&d->arch.paging.lock);
+
+    ASSERT(preempted == (rc == -ERESTART));
+
+    return rc;
+}
+
 int p2m_teardown_allocation(struct domain *d)
 {
     int ret = 0;
diff --git a/xen/arch/x86/include/asm/hap.h b/xen/arch/x86/include/asm/hap.h
index 90dece29deca..14d2f212dab9 100644
--- a/xen/arch/x86/include/asm/hap.h
+++ b/xen/arch/x86/include/asm/hap.h
@@ -47,6 +47,7 @@ int   hap_track_dirty_vram(struct domain *d,
 extern const struct paging_mode *hap_paging_get_mode(struct vcpu *);
 int hap_set_allocation(struct domain *d, unsigned int pages, bool *preempted);
 unsigned int hap_get_allocation(struct domain *d);
+int hap_get_allocation_bytes(struct domain *d, uint64_t *size);
 
 #endif /* XEN_HAP_H */
 
diff --git a/xen/arch/x86/include/asm/shadow.h b/xen/arch/x86/include/asm/shadow.h
index 1365fe480518..dad876d29499 100644
--- a/xen/arch/x86/include/asm/shadow.h
+++ b/xen/arch/x86/include/asm/shadow.h
@@ -97,6 +97,8 @@ void shadow_blow_tables_per_domain(struct domain *d);
 int shadow_set_allocation(struct domain *d, unsigned int pages,
                           bool *preempted);
 
+int shadow_get_allocation_bytes(struct domain *d, uint64_t *size);
+
 #else /* !CONFIG_SHADOW_PAGING */
 
 #define shadow_vcpu_teardown(v) ASSERT(is_pv_vcpu(v))
@@ -108,6 +110,8 @@ int shadow_set_allocation(struct domain *d, unsigned int pages,
     ({ ASSERT_UNREACHABLE(); -EOPNOTSUPP; })
 #define shadow_set_allocation(d, pages, preempted) \
     ({ ASSERT_UNREACHABLE(); -EOPNOTSUPP; })
+#define shadow_get_allocation_bytes(d, size) \
+    ({ ASSERT_UNREACHABLE(); -EOPNOTSUPP; })
 
 static inline void sh_remove_shadows(struct domain *d, mfn_t gmfn,
                                      int fast, int all) {}
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index f809ea9aa6ae..50c3d6e63fa5 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -345,6 +345,16 @@ unsigned int hap_get_allocation(struct domain *d)
             + ((pg & ((1 << (20 - PAGE_SHIFT)) - 1)) ? 1 : 0));
 }
 
+int hap_get_allocation_bytes(struct domain *d, uint64_t *size)
+{
+    unsigned long pages = (d->arch.paging.hap.total_pages +
+                           d->arch.paging.hap.p2m_pages);
+
+    *size = pages << PAGE_SHIFT;
+
+    return 0;
+}
+
 /* Set the pool of pages to the required number of pages.
  * Returns 0 for success, non-zero for failure. */
 int hap_set_allocation(struct domain *d, unsigned int pages, bool *preempted)
diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
index 3a355eee9ca3..b3f7c46e1dfd 100644
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -977,6 +977,45 @@ int __init paging_set_allocation(struct domain *d, unsigned int pages,
 }
 #endif
 
+int arch_get_p2m_mempool_size(struct domain *d, uint64_t *size)
+{
+    int rc;
+
+    if ( is_pv_domain(d) )
+        return -EOPNOTSUPP;
+
+    if ( hap_enabled(d) )
+        rc = hap_get_allocation_bytes(d, size);
+    else
+        rc = shadow_get_allocation_bytes(d, size);
+
+    return rc;
+}
+
+int arch_set_p2m_mempool_size(struct domain *d, uint64_t size)
+{
+    unsigned long pages = size >> PAGE_SHIFT;
+    bool preempted = false;
+    int rc;
+
+    if ( is_pv_domain(d) )
+        return -EOPNOTSUPP;
+
+    if ( size & ~PAGE_MASK )             /* Non page-sized request? */
+        return -EINVAL;
+
+    ASSERT(paging_mode_enabled(d));
+
+    paging_lock(d);
+    if ( hap_enabled(d) )
+        rc = hap_set_allocation(d, pages, &preempted);
+    else
+        rc = shadow_set_allocation(d, pages, &preempted);
+    paging_unlock(d);
+
+    return preempted ? -ERESTART : rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index badfd53c6b23..d190601c4424 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1427,6 +1427,16 @@ static unsigned int shadow_get_allocation(struct domain *d)
             + ((pg & ((1 << (20 - PAGE_SHIFT)) - 1)) ? 1 : 0));
 }
 
+int shadow_get_allocation_bytes(struct domain *d, uint64_t *size)
+{
+    unsigned long pages = (d->arch.paging.shadow.total_pages +
+                           d->arch.paging.shadow.p2m_pages);
+
+    *size = pages << PAGE_SHIFT;
+
+    return 0;
+}
+
 /**************************************************************************/
 /* Hash table for storing the guest->shadow mappings.
  * The table itself is an array of pointers to shadows; the shadows are then
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 69fb9abd346f..8f318b830185 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -874,6 +874,20 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         ret = iommu_do_domctl(op, d, u_domctl);
         break;
 
+    case XEN_DOMCTL_get_p2m_mempool_size:
+        ret = arch_get_p2m_mempool_size(d, &op->u.p2m_mempool.size);
+        if ( !ret )
+            copyback = 1;
+        break;
+
+    case XEN_DOMCTL_set_p2m_mempool_size:
+        ret = arch_set_p2m_mempool_size(d, op->u.p2m_mempool.size);
+
+        if ( ret == -ERESTART )
+            ret = hypercall_create_continuation(
+                __HYPERVISOR_domctl, "h", u_domctl);
+        break;
+
     default:
         ret = arch_do_domctl(op, d, u_domctl);
         break;
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index b2ae839c3632..7da09d5925c8 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -214,7 +214,10 @@ struct xen_domctl_getpageframeinfo3 {
  /* Return the bitmap but do not modify internal copy. */
 #define XEN_DOMCTL_SHADOW_OP_PEEK        12
 
-/* Memory allocation accessors. */
+/*
+ * Memory allocation accessors.  These APIs are broken and will be removed.
+ * Use XEN_DOMCTL_{get,set}_p2m_mempool_size instead.
+ */
 #define XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION   30
 #define XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION   31
 
@@ -946,6 +949,24 @@ struct xen_domctl_cacheflush {
     xen_pfn_t start_pfn, nr_pfns;
 };
 
+/*
+ * XEN_DOMCTL_get_p2m_mempool_size / XEN_DOMCTL_set_p2m_mempool_size.
+ *
+ * Get or set the P2M memory pool size.  The size is in bytes.
+ *
+ * The P2M memory pool is a dedicated pool of memory for managing the guest
+ * physical -> host physical mappings, usually containing pagetables.
+ * Implementation details cause there to be a minimum granularity, usually the
+ * size of pagetables used by Xen.  Users of this interface are required to
+ * identify the granularity by other means.
+ *
+ * The set operation can fail midway through the request (e.g. Xen running out
+ * of memory, no free memory to reclaim from the pool, etc.).
+ */
+struct xen_domctl_p2m_mempool {
+    uint64_aligned_t size; /* IN/OUT.  Size in bytes. */
+};
+
 #if defined(__i386__) || defined(__x86_64__)
 struct xen_domctl_vcpu_msr {
     uint32_t         index;
@@ -1274,6 +1295,8 @@ struct xen_domctl {
 #define XEN_DOMCTL_get_cpu_policy                82
 #define XEN_DOMCTL_set_cpu_policy                83
 #define XEN_DOMCTL_vmtrace_op                    84
+#define XEN_DOMCTL_get_p2m_mempool_size          85
+#define XEN_DOMCTL_set_p2m_mempool_size          86
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1335,6 +1358,7 @@ struct xen_domctl {
         struct xen_domctl_psr_alloc         psr_alloc;
         struct xen_domctl_vuart_op          vuart_op;
         struct xen_domctl_vmtrace_op        vmtrace_op;
+        struct xen_domctl_p2m_mempool       p2m_mempool;
         uint8_t                             pad[128];
     } u;
 };
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index 2c8116afba27..01aaf4dedbe8 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -98,6 +98,9 @@ void arch_get_info_guest(struct vcpu *, vcpu_guest_context_u);
 int arch_initialise_vcpu(struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg);
 int default_initialise_vcpu(struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg);
 
+int arch_get_p2m_mempool_size(struct domain *d, uint64_t *size /* bytes */);
+int arch_set_p2m_mempool_size(struct domain *d, uint64_t size /* bytes */);
+
 int domain_relinquish_resources(struct domain *d);
 
 void dump_pageframe_info(struct domain *d);
-- 
2.11.0



Thread overview: 33+ messages
2022-10-26 10:20 [PATCH for-4.17 0/4] XSA-409 fixes Andrew Cooper
2022-10-26 10:20 ` Andrew Cooper [this message]
2022-10-26 13:42   ` [PATCH 1/4] xen: Introduce non-broken hypercalls for the p2m pool size Jan Beulich
2022-10-26 19:22     ` Andrew Cooper
2022-10-26 21:24       ` Julien Grall
2022-10-27  6:56         ` Jan Beulich
2022-10-27  9:27           ` Julien Grall
2022-10-27  7:11       ` Jan Beulich
2022-10-28 15:27         ` George Dunlap
2022-10-31  9:26           ` Jan Beulich
2022-10-31 10:12             ` George Dunlap
2022-11-16  1:19           ` Stefano Stabellini
2022-11-16  8:26             ` Jan Beulich
2022-10-27  7:42   ` Jan Beulich
2022-10-26 10:20 ` [PATCH 2/4] tools/tests: Unit test for " Andrew Cooper
2022-10-26 14:24   ` Jan Beulich
2022-10-26 14:35     ` Andrew Cooper
2022-10-26 10:20 ` [PATCH 3/4] xen/arm, libxl: Revert XEN_DOMCTL_shadow_op; use p2m mempool hypercalls Andrew Cooper
2022-10-26 13:22   ` Jason Andryuk
2022-10-26 13:25     ` Andrew Cooper
2022-11-16  1:37   ` Stefano Stabellini
2022-11-16  1:48     ` Andrew Cooper
2022-11-16  2:00       ` Stefano Stabellini
2022-11-16  2:39         ` Henry Wang
2022-11-16  8:30         ` Jan Beulich
2022-11-16 23:41           ` Andrew Cooper
2022-11-16 23:44             ` Stefano Stabellini
2022-11-16 23:51               ` Julien Grall
2022-11-16 23:56                 ` Stefano Stabellini
2022-11-17  8:18             ` Jan Beulich
2022-10-26 10:20 ` [PATCH 4/4] xen/arm: Correct the p2m pool size calculations Andrew Cooper
2022-11-11 10:11   ` Henry Wang
2022-11-11 10:54     ` Julien Grall
