All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] Fix truncation of various XENVER_* subops
@ 2023-01-03 20:09 Andrew Cooper
  2023-01-03 20:09 ` [PATCH 1/4] public/version: Change xen_feature_info to have a fixed size Andrew Cooper
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Andrew Cooper @ 2023-01-03 20:09 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, George Dunlap, Jan Beulich, Stefano Stabellini,
	Wei Liu, Julien Grall, Daniel De Graaf, Daniel Smith

See patch 4 for details of the problem.  Other patches fix other errors found
while investigating.

This is only the hypervisor side of the change for now, because I want
agreement before starting to untangle the mess which is libxc's helpers for
this.

Also a disaster is Linux's sysfs handling for these.  In several places it
makes a heap allocation for a pointer (or two) sized object.

Andrew Cooper (4):
  public/version: Change xen_feature_info to have a fixed size
  xen/version: Drop compat/kernel.c
  xen/version: Drop bogus return values for XENVER_platform_parameters
  xen/version: Introduce non-truncating XENVER_* subops

 xen/common/Makefile          |   2 +-
 xen/common/compat/kernel.c   |  53 ---------------------
 xen/common/kernel.c          | 108 ++++++++++++++++++++++++++++++++++++++-----
 xen/include/hypercall-defs.c |   2 +-
 xen/include/public/version.h |  78 +++++++++++++++++++++++++++++--
 xen/include/xlat.lst         |   4 ++
 xen/xsm/flask/hooks.c        |   4 ++
 7 files changed, 181 insertions(+), 70 deletions(-)
 delete mode 100644 xen/common/compat/kernel.c

-- 
2.11.0



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/4] public/version: Change xen_feature_info to have a fixed size
  2023-01-03 20:09 [PATCH 0/4] Fix truncation of various XENVER_* subops Andrew Cooper
@ 2023-01-03 20:09 ` Andrew Cooper
  2023-01-04  9:03   ` Julien Grall
  2023-01-03 20:09 ` [PATCH 2/4] xen/version: Drop compat/kernel.c Andrew Cooper
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2023-01-03 20:09 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, George Dunlap, Jan Beulich, Stefano Stabellini,
	Wei Liu, Julien Grall

This is technically an ABI change, but Xen doesn't operate in any environment
where "unsigned int" is differnet to uint32_t, so switch to the explicit form.
This avoids the need to derive (identical) compat logic for handling the
subop.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
---
 xen/include/public/version.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/include/public/version.h b/xen/include/public/version.h
index 9c78b4f3b6a4..0ff8bd9077c6 100644
--- a/xen/include/public/version.h
+++ b/xen/include/public/version.h
@@ -50,7 +50,7 @@ typedef struct xen_platform_parameters xen_platform_parameters_t;
 
 #define XENVER_get_features 6
 struct xen_feature_info {
-    unsigned int submap_idx;    /* IN: which 32-bit submap to return */
+    uint32_t     submap_idx;    /* IN: which 32-bit submap to return */
     uint32_t     submap;        /* OUT: 32-bit submap */
 };
 typedef struct xen_feature_info xen_feature_info_t;
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/4] xen/version: Drop compat/kernel.c
  2023-01-03 20:09 [PATCH 0/4] Fix truncation of various XENVER_* subops Andrew Cooper
  2023-01-03 20:09 ` [PATCH 1/4] public/version: Change xen_feature_info to have a fixed size Andrew Cooper
@ 2023-01-03 20:09 ` Andrew Cooper
  2023-01-04 16:29   ` Jan Beulich
  2023-01-04 16:48   ` Jan Beulich
  2023-01-03 20:09 ` [PATCH 3/4] xen/version: Drop bogus return values for XENVER_platform_parameters Andrew Cooper
  2023-01-03 20:09 ` [PATCH 4/4] xen/version: Introduce non-truncating XENVER_* subops Andrew Cooper
  3 siblings, 2 replies; 25+ messages in thread
From: Andrew Cooper @ 2023-01-03 20:09 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, George Dunlap, Jan Beulich, Stefano Stabellini,
	Wei Liu, Julien Grall

kernel.c is mostly in an #ifndef COMPAT guard, because compat/kernel.c
reincludes kernel.c to recompile xen_version() in a compat form.

However, the xen_version hypercall is almost guest-ABI-agnostic; only
XENVER_platform_parameters has a compat split.  Handle this locally, and do
away with the reinclude entirely.

In particular, this removed the final instances of obfucation via the DO()
macro.

No functional change.  Also saves 2k of of .text in the x86 build.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
---
 xen/common/Makefile          |  2 +-
 xen/common/compat/kernel.c   | 53 --------------------------------------------
 xen/common/kernel.c          | 43 ++++++++++++++++++++++-------------
 xen/include/hypercall-defs.c |  2 +-
 xen/include/xlat.lst         |  3 +++
 5 files changed, 33 insertions(+), 70 deletions(-)
 delete mode 100644 xen/common/compat/kernel.c

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 9a3a12b12dea..bbd75b4be6d3 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -59,7 +59,7 @@ obj-y += xmalloc_tlsf.o
 
 obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma lzo unlzo unlz4 unzstd earlycpio,$(n).init.o)
 
-obj-$(CONFIG_COMPAT) += $(addprefix compat/,domain.o kernel.o memory.o multicall.o xlat.o)
+obj-$(CONFIG_COMPAT) += $(addprefix compat/,domain.o memory.o multicall.o xlat.o)
 
 ifneq ($(CONFIG_PV_SHIM_EXCLUSIVE),y)
 obj-y += domctl.o
diff --git a/xen/common/compat/kernel.c b/xen/common/compat/kernel.c
deleted file mode 100644
index 804b919bdc72..000000000000
--- a/xen/common/compat/kernel.c
+++ /dev/null
@@ -1,53 +0,0 @@
-/******************************************************************************
- * kernel.c
- */
-
-EMIT_FILE;
-
-#include <xen/init.h>
-#include <xen/lib.h>
-#include <xen/errno.h>
-#include <xen/version.h>
-#include <xen/sched.h>
-#include <xen/guest_access.h>
-#include <asm/current.h>
-#include <compat/xen.h>
-#include <compat/version.h>
-
-extern xen_commandline_t saved_cmdline;
-
-#define xen_extraversion compat_extraversion
-#define xen_extraversion_t compat_extraversion_t
-
-#define xen_compile_info compat_compile_info
-#define xen_compile_info_t compat_compile_info_t
-
-CHECK_TYPE(capabilities_info);
-
-#define xen_platform_parameters compat_platform_parameters
-#define xen_platform_parameters_t compat_platform_parameters_t
-#undef HYPERVISOR_VIRT_START
-#define HYPERVISOR_VIRT_START HYPERVISOR_COMPAT_VIRT_START(current->domain)
-
-#define xen_changeset_info compat_changeset_info
-#define xen_changeset_info_t compat_changeset_info_t
-
-#define xen_feature_info compat_feature_info
-#define xen_feature_info_t compat_feature_info_t
-
-CHECK_TYPE(domain_handle);
-
-#define DO(fn) int compat_##fn
-#define COMPAT
-
-#include "../kernel.c"
-
-/*
- * Local variables:
- * mode: C
- * c-file-style: "BSD"
- * c-basic-offset: 4
- * tab-width: 4
- * indent-tabs-mode: nil
- * End:
- */
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index f8134d3e7a9d..ccee178ff17a 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -18,7 +18,13 @@
 #include <asm/current.h>
 #include <public/version.h>
 
-#ifndef COMPAT
+#ifdef CONFIG_COMPAT
+#include <compat/version.h>
+
+CHECK_compile_info;
+CHECK_feature_info;
+CHECK_build_id;
+#endif
 
 enum system_state system_state = SYS_STATE_early_boot;
 
@@ -463,15 +469,7 @@ static int __init cf_check param_init(void)
 __initcall(param_init);
 #endif
 
-# define DO(fn) long do_##fn
-
-#endif
-
-/*
- * Simple hypercalls.
- */
-
-DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
+long do_xen_version(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     bool_t deny = !!xsm_xen_version(XSM_OTHER, cmd);
 
@@ -520,12 +518,27 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     
     case XENVER_platform_parameters:
     {
-        xen_platform_parameters_t params = {
-            .virt_start = HYPERVISOR_VIRT_START
-        };
+#ifdef CONFIG_COMPAT
+        if ( current->hcall_compat )
+        {
+            compat_platform_parameters_t params = {
+                .virt_start = HYPERVISOR_COMPAT_VIRT_START(current->domain),
+            };
+
+            if ( copy_to_guest(arg, &params, 1) )
+                return -EFAULT;
+        }
+        else
+#endif
+        {
+            xen_platform_parameters_t params = {
+                .virt_start = HYPERVISOR_VIRT_START,
+            };
+
+            if ( copy_to_guest(arg, &params, 1) )
+                return -EFAULT;
+        }
 
-        if ( copy_to_guest(arg, &params, 1) )
-            return -EFAULT;
         return 0;
         
     }
diff --git a/xen/include/hypercall-defs.c b/xen/include/hypercall-defs.c
index 41e1af01f6b2..6d361ddfce1b 100644
--- a/xen/include/hypercall-defs.c
+++ b/xen/include/hypercall-defs.c
@@ -245,7 +245,7 @@ multicall                          compat:2 do:2     compat   do       do
 update_va_mapping                  compat   do       -        -        -
 set_timer_op                       compat   do       compat   do       -
 event_channel_op_compat            do       do       -        -        dep
-xen_version                        compat   do       compat   do       do
+xen_version                        do       do       do       do       do
 console_io                         do       do       do       do       do
 physdev_op_compat                  compat   do       -        -        dep
 #if defined(CONFIG_GRANT_TABLE)
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 65f7fe3811c7..f2bae220a6df 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -169,6 +169,9 @@
 !	vcpu_runstate_info		vcpu.h
 ?	vcpu_set_periodic_timer		vcpu.h
 !	vcpu_set_singleshot_timer	vcpu.h
+?	compile_info                    version.h
+?	feature_info                    version.h
+?	build_id                        version.h
 ?	xenoprof_init			xenoprof.h
 ?	xenoprof_passive		xenoprof.h
 ?	flask_access			xsm/flask_op.h
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/4] xen/version: Drop bogus return values for XENVER_platform_parameters
  2023-01-03 20:09 [PATCH 0/4] Fix truncation of various XENVER_* subops Andrew Cooper
  2023-01-03 20:09 ` [PATCH 1/4] public/version: Change xen_feature_info to have a fixed size Andrew Cooper
  2023-01-03 20:09 ` [PATCH 2/4] xen/version: Drop compat/kernel.c Andrew Cooper
@ 2023-01-03 20:09 ` Andrew Cooper
  2023-01-04 16:40   ` Jan Beulich
  2023-01-03 20:09 ` [PATCH 4/4] xen/version: Introduce non-truncating XENVER_* subops Andrew Cooper
  3 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2023-01-03 20:09 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, George Dunlap, Jan Beulich, Stefano Stabellini,
	Wei Liu, Julien Grall

A split in virtual address space is only applicable for x86 PV guests.
Furthermore, the information returned for x86 64bit PV guests is wrong.

Explain the problem in version.h, stating the other information that PV guests
need to know.

For 64bit PV guests, and all non-x86-PV guests, return 0, which is strictly
less wrong than the values currently returned.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
---
 xen/common/kernel.c          |  6 ++++--
 xen/include/public/version.h | 20 ++++++++++++++++++++
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index ccee178ff17a..70e7dff87488 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -522,7 +522,9 @@ long do_xen_version(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( current->hcall_compat )
         {
             compat_platform_parameters_t params = {
-                .virt_start = HYPERVISOR_COMPAT_VIRT_START(current->domain),
+                .virt_start = is_pv_vcpu(current)
+                            ? HYPERVISOR_COMPAT_VIRT_START(current->domain)
+                            : 0,
             };
 
             if ( copy_to_guest(arg, &params, 1) )
@@ -532,7 +534,7 @@ long do_xen_version(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 #endif
         {
             xen_platform_parameters_t params = {
-                .virt_start = HYPERVISOR_VIRT_START,
+                .virt_start = 0,
             };
 
             if ( copy_to_guest(arg, &params, 1) )
diff --git a/xen/include/public/version.h b/xen/include/public/version.h
index 0ff8bd9077c6..c8325219f648 100644
--- a/xen/include/public/version.h
+++ b/xen/include/public/version.h
@@ -42,6 +42,26 @@ typedef char xen_capabilities_info_t[1024];
 typedef char xen_changeset_info_t[64];
 #define XEN_CHANGESET_INFO_LEN (sizeof(xen_changeset_info_t))
 
+/*
+ * This API is problematic.
+ *
+ * It is only applicable to guests which share pagetables with Xen (x86 PV
+ * guests), and is supposed to identify the virtual address split between
+ * guest kernel and Xen.
+ *
+ * For 32bit PV guests, it mostly does this, but the caller needs to know that
+ * Xen lives between the split and 4G.
+ *
+ * For 64bit PV guests, Xen lives at the bottom of the upper canonical range.
+ * This previously returned the start of the upper canonical range (which is
+ * the userspace/Xen split), not the Xen/kernel split (which is 8TB further
+ * on).  This now returns 0 because the old number wasn't correct, and
+ * changing it to anything else would be even worse.
+ *
+ * For all guest types using hardware virt extentions, Xen is not mapped into
+ * the guest kernel virtual address space.  This now return 0, where it
+ * previously returned unrelated data.
+ */
 #define XENVER_platform_parameters 5
 struct xen_platform_parameters {
     xen_ulong_t virt_start;
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/4] xen/version: Introduce non-truncating XENVER_* subops
  2023-01-03 20:09 [PATCH 0/4] Fix truncation of various XENVER_* subops Andrew Cooper
                   ` (2 preceding siblings ...)
  2023-01-03 20:09 ` [PATCH 3/4] xen/version: Drop bogus return values for XENVER_platform_parameters Andrew Cooper
@ 2023-01-03 20:09 ` Andrew Cooper
  2023-01-03 20:47   ` Julien Grall
  2023-01-04 17:04   ` Jan Beulich
  3 siblings, 2 replies; 25+ messages in thread
From: Andrew Cooper @ 2023-01-03 20:09 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, George Dunlap, Jan Beulich, Stefano Stabellini,
	Wei Liu, Julien Grall

Recently in XenServer, we have encountered problems caused by both
XENVER_extraversion and XENVER_commandline having fixed bounds.

More than just the invariant size, the APIs/ABIs also broken by typedef-ing an
array, and using an unqualified 'char' which has implementation-specific
signed-ness.

Provide brand new ops, which are capable of expressing variable length
strings, and mark the older ops as broken.

This fixes all issues around XENVER_extraversion being longer than 15 chars.
More work is required to remove other assumptions about XENVER_commandline
being 1023 chars long.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>

Tested by forcing XENVER_extraversion to be 20 chars long, and confirming that
an untruncated version can be obtained.

API/ABI wise, XENVER_build_id could be merged into xenver_varstring_op(), but
the internal infrastructure is awkward.
---
 xen/common/kernel.c          | 69 ++++++++++++++++++++++++++++++++++++++++++++
 xen/include/public/version.h | 56 +++++++++++++++++++++++++++++++++--
 xen/include/xlat.lst         |  1 +
 xen/xsm/flask/hooks.c        |  4 +++
 4 files changed, 128 insertions(+), 2 deletions(-)

diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 70e7dff87488..56bd6c6f5d9c 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -24,6 +24,7 @@
 CHECK_compile_info;
 CHECK_feature_info;
 CHECK_build_id;
+CHECK_var_string;
 #endif
 
 enum system_state system_state = SYS_STATE_early_boot;
@@ -469,6 +470,66 @@ static int __init cf_check param_init(void)
 __initcall(param_init);
 #endif
 
+static long xenver_varstring_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
+{
+    const char *str = NULL;
+    size_t sz = 0;
+    union {
+        xen_capabilities_info_t info;
+    } u;
+    struct xen_var_string user_str;
+
+    switch ( cmd )
+    {
+    case XENVER_extraversion2:
+        str = xen_extra_version();
+        break;
+
+    case XENVER_changeset2:
+        str = xen_changeset();
+        break;
+
+    case XENVER_commandline2:
+        str = saved_cmdline;
+        break;
+
+    case XENVER_capabilities2:
+        memset(u.info, 0, sizeof(u.info));
+        arch_get_xen_caps(&u.info);
+        str = u.info;
+        break;
+
+    default:
+        ASSERT_UNREACHABLE();
+        break;
+    }
+
+    if ( !str ||
+         !(sz = strlen(str)) )
+        return -ENODATA; /* failsafe */
+
+    if ( sz > INT32_MAX )
+        return -E2BIG; /* Compat guests.  2G ought to be plenty. */
+
+    if ( guest_handle_is_null(arg) ) /* Length request */
+        return sz;
+
+    if ( copy_from_guest(&user_str, arg, 1) )
+        return -EFAULT;
+
+    if ( user_str.len == 0 )
+        return -EINVAL;
+
+    if ( sz > user_str.len )
+        return -ENOBUFS;
+
+    if ( copy_to_guest_offset(arg, offsetof(struct xen_var_string, buf),
+                              str, sz) )
+        return -EFAULT;
+
+    return sz;
+}
+
 long do_xen_version(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     bool_t deny = !!xsm_xen_version(XSM_OTHER, cmd);
@@ -670,6 +731,14 @@ long do_xen_version(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
         return sz;
     }
+
+    case XENVER_extraversion2:
+    case XENVER_capabilities2:
+    case XENVER_changeset2:
+    case XENVER_commandline2:
+        if ( deny )
+            return -EPERM;
+        return xenver_varstring_op(cmd, arg);
     }
 
     return -ENOSYS;
diff --git a/xen/include/public/version.h b/xen/include/public/version.h
index c8325219f648..cf2d2ef38b54 100644
--- a/xen/include/public/version.h
+++ b/xen/include/public/version.h
@@ -19,12 +19,20 @@
 /* arg == NULL; returns major:minor (16:16). */
 #define XENVER_version      0
 
-/* arg == xen_extraversion_t. */
+/*
+ * arg == xen_extraversion_t.
+ *
+ * This API/ABI is broken.  Use XENVER_extraversion2 instead.
+ */
 #define XENVER_extraversion 1
 typedef char xen_extraversion_t[16];
 #define XEN_EXTRAVERSION_LEN (sizeof(xen_extraversion_t))
 
-/* arg == xen_compile_info_t. */
+/*
+ * arg == xen_compile_info_t.
+ *
+ * This API/ABI is broken and truncates data.
+ */
 #define XENVER_compile_info 2
 struct xen_compile_info {
     char compiler[64];
@@ -34,10 +42,20 @@ struct xen_compile_info {
 };
 typedef struct xen_compile_info xen_compile_info_t;
 
+/*
+ * arg == xen_capabilities_info_t.
+ *
+ * This API/ABI is broken.  Use XENVER_capabilities2 instead.
+ */
 #define XENVER_capabilities 3
 typedef char xen_capabilities_info_t[1024];
 #define XEN_CAPABILITIES_INFO_LEN (sizeof(xen_capabilities_info_t))
 
+/*
+ * arg == xen_changeset_info_t.
+ *
+ * This API/ABI is broken.  Use XENVER_changeset2 instead.
+ */
 #define XENVER_changeset 4
 typedef char xen_changeset_info_t[64];
 #define XEN_CHANGESET_INFO_LEN (sizeof(xen_changeset_info_t))
@@ -88,6 +106,11 @@ typedef struct xen_feature_info xen_feature_info_t;
  */
 #define XENVER_guest_handle 8
 
+/*
+ * arg == xen_commandline_t.
+ *
+ * This API/ABI is broken.  Use XENVER_commandline2 instead.
+ */
 #define XENVER_commandline 9
 typedef char xen_commandline_t[1024];
 
@@ -103,6 +126,35 @@ struct xen_build_id {
 };
 typedef struct xen_build_id xen_build_id_t;
 
+/*
+ * Container for an arbitrary variable length string.
+ */
+struct xen_var_string {
+    uint32_t len;                          /* IN:  size of buf[] in bytes. */
+    unsigned char buf[XEN_FLEX_ARRAY_DIM]; /* OUT: requested data.         */
+};
+typedef struct xen_var_string xen_var_string_t;
+
+/*
+ * arg == xenver_string_t
+ *
+ * Equivalent to the original ops, but with a non-truncating API/ABI.
+ *
+ * Passing arg == NULL is a request for size.  The returned size does not
+ * include a NUL terminator, and has a practical upper limit of INT32_MAX for
+ * 32bit guests.  This is expected to be plenty for the purpose.
+ *
+ * Otherwise, the input xenver_string_t provides the size of the following
+ * buffer.  Xen will fill the buffer, and return the number of bytes written
+ * (e.g. if the input buffer was longer than necessary).
+ *
+ * These hypercalls can fail, in which case they'll return -XEN_Exx.
+ */
+#define XENVER_extraversion2 11
+#define XENVER_capabilities2 12
+#define XENVER_changeset2    13
+#define XENVER_commandline2  14
+
 #endif /* __XEN_PUBLIC_VERSION_H__ */
 
 /*
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index f2bae220a6df..19cef4424add 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -172,6 +172,7 @@
 ?	compile_info                    version.h
 ?	feature_info                    version.h
 ?	build_id                        version.h
+?	var_string                      version.h
 ?	xenoprof_init			xenoprof.h
 ?	xenoprof_passive		xenoprof.h
 ?	flask_access			xsm/flask_op.h
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 78225f68c15c..a671dcd0322e 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1777,15 +1777,18 @@ static int cf_check flask_xen_version(uint32_t op)
         /* These sub-ops ignore the permission checks and return data. */
         return 0;
     case XENVER_extraversion:
+    case XENVER_extraversion2:
         return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
                             VERSION__XEN_EXTRAVERSION, NULL);
     case XENVER_compile_info:
         return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
                             VERSION__XEN_COMPILE_INFO, NULL);
     case XENVER_capabilities:
+    case XENVER_capabilities2:
         return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
                             VERSION__XEN_CAPABILITIES, NULL);
     case XENVER_changeset:
+    case XENVER_changeset2:
         return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
                             VERSION__XEN_CHANGESET, NULL);
     case XENVER_pagesize:
@@ -1795,6 +1798,7 @@ static int cf_check flask_xen_version(uint32_t op)
         return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
                             VERSION__XEN_GUEST_HANDLE, NULL);
     case XENVER_commandline:
+    case XENVER_commandline2:
         return avc_has_perm(dsid, SECINITSID_XEN, SECCLASS_VERSION,
                             VERSION__XEN_COMMANDLINE, NULL);
     case XENVER_build_id:
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] xen/version: Introduce non-truncating XENVER_* subops
  2023-01-03 20:09 ` [PATCH 4/4] xen/version: Introduce non-truncating XENVER_* subops Andrew Cooper
@ 2023-01-03 20:47   ` Julien Grall
  2023-01-03 21:22     ` Andrew Cooper
  2023-01-04 17:04   ` Jan Beulich
  1 sibling, 1 reply; 25+ messages in thread
From: Julien Grall @ 2023-01-03 20:47 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: George Dunlap, Jan Beulich, Stefano Stabellini, Wei Liu

Hi Andrew,

On 03/01/2023 20:09, Andrew Cooper wrote:
> diff --git a/xen/include/public/version.h b/xen/include/public/version.h
> index c8325219f648..cf2d2ef38b54 100644
> --- a/xen/include/public/version.h
> +++ b/xen/include/public/version.h
> @@ -19,12 +19,20 @@
>   /* arg == NULL; returns major:minor (16:16). */
>   #define XENVER_version      0
>   
> -/* arg == xen_extraversion_t. */
> +/*
> + * arg == xen_extraversion_t.
> + *
> + * This API/ABI is broken.  Use XENVER_extraversion2 instead.

I read this as newer tools should never try to call XENVER_extraversion. 
But I don't think this is what you intend to say, correct? If so, I 
would say that an OS should first try XENVER_extraversion2 and then 
fallback to XENVER_extraversion if it returns -ENOSYS.

Same goes for the new hypercalls.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] xen/version: Introduce non-truncating XENVER_* subops
  2023-01-03 20:47   ` Julien Grall
@ 2023-01-03 21:22     ` Andrew Cooper
  2023-01-03 21:40       ` Julien Grall
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2023-01-03 21:22 UTC (permalink / raw)
  To: Julien Grall, Xen-devel
  Cc: George Dunlap, Jan Beulich, Stefano Stabellini, Wei Liu

On 03/01/2023 8:47 pm, Julien Grall wrote:
> Hi Andrew,
>
> On 03/01/2023 20:09, Andrew Cooper wrote:
>> diff --git a/xen/include/public/version.h b/xen/include/public/version.h
>> index c8325219f648..cf2d2ef38b54 100644
>> --- a/xen/include/public/version.h
>> +++ b/xen/include/public/version.h
>> @@ -19,12 +19,20 @@
>>   /* arg == NULL; returns major:minor (16:16). */
>>   #define XENVER_version      0
>>   -/* arg == xen_extraversion_t. */
>> +/*
>> + * arg == xen_extraversion_t.
>> + *
>> + * This API/ABI is broken.  Use XENVER_extraversion2 instead.
>
> I read this as newer tools should never try to call
> XENVER_extraversion. But I don't think this is what you intend to say,
> correct? If so, I would say that an OS should first try
> XENVER_extraversion2 and then fallback to XENVER_extraversion if it
> returns -ENOSYS.
>
> Same goes for the new hypercalls.

Right, but that's sufficiently basic that it goes without saying.

This is not a "my first introduction to writing code" tutorial.

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] xen/version: Introduce non-truncating XENVER_* subops
  2023-01-03 21:22     ` Andrew Cooper
@ 2023-01-03 21:40       ` Julien Grall
  0 siblings, 0 replies; 25+ messages in thread
From: Julien Grall @ 2023-01-03 21:40 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: George Dunlap, Jan Beulich, Stefano Stabellini, Wei Liu



On 03/01/2023 21:22, Andrew Cooper wrote:
> On 03/01/2023 8:47 pm, Julien Grall wrote:
>> On 03/01/2023 20:09, Andrew Cooper wrote:
>>> diff --git a/xen/include/public/version.h b/xen/include/public/version.h
>>> index c8325219f648..cf2d2ef38b54 100644
>>> --- a/xen/include/public/version.h
>>> +++ b/xen/include/public/version.h
>>> @@ -19,12 +19,20 @@
>>>    /* arg == NULL; returns major:minor (16:16). */
>>>    #define XENVER_version      0
>>>    -/* arg == xen_extraversion_t. */
>>> +/*
>>> + * arg == xen_extraversion_t.
>>> + *
>>> + * This API/ABI is broken.  Use XENVER_extraversion2 instead.
>>
>> I read this as newer tools should never try to call
>> XENVER_extraversion. But I don't think this is what you intend to say,
>> correct? If so, I would say that an OS should first try
>> XENVER_extraversion2 and then fallback to XENVER_extraversion if it
>> returns -ENOSYS.
>>
>> Same goes for the new hypercalls.
> 
> Right, but that's sufficiently basic that it goes without saying.
It never hurts to be obvious. I couldn't say the same for the other way 
around.

> 
> This is not a "my first introduction to writing code" tutorial.

That's the first introduction to Xen for an OS developer. Adding a few 
words (maybe with the first version introducing it) would likely be 
welcomed by any developer.

This avoids any misinterpretation and/or losing time playing the git 
blame game...

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] public/version: Change xen_feature_info to have a fixed size
  2023-01-03 20:09 ` [PATCH 1/4] public/version: Change xen_feature_info to have a fixed size Andrew Cooper
@ 2023-01-04  9:03   ` Julien Grall
  0 siblings, 0 replies; 25+ messages in thread
From: Julien Grall @ 2023-01-04  9:03 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: George Dunlap, Jan Beulich, Stefano Stabellini, Wei Liu

Hi Andrew,

On 03/01/2023 20:09, Andrew Cooper wrote:
> This is technically an ABI change, but Xen doesn't operate in any environment
> where "unsigned int" is differnet to uint32_t, so switch to the explicit form.

typo: s/differnet/different/

> This avoids the need to derive (identical) compat logic for handling the
> subop.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Julien Grall <jgrall@amazon.com>

Cheers,

> ---
> CC: George Dunlap <George.Dunlap@eu.citrix.com>
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Wei Liu <wl@xen.org>
> CC: Julien Grall <julien@xen.org>
> ---
>   xen/include/public/version.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/include/public/version.h b/xen/include/public/version.h
> index 9c78b4f3b6a4..0ff8bd9077c6 100644
> --- a/xen/include/public/version.h
> +++ b/xen/include/public/version.h
> @@ -50,7 +50,7 @@ typedef struct xen_platform_parameters xen_platform_parameters_t;
>   
>   #define XENVER_get_features 6
>   struct xen_feature_info {
> -    unsigned int submap_idx;    /* IN: which 32-bit submap to return */
> +    uint32_t     submap_idx;    /* IN: which 32-bit submap to return */
>       uint32_t     submap;        /* OUT: 32-bit submap */
>   };
>   typedef struct xen_feature_info xen_feature_info_t;

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/4] xen/version: Drop compat/kernel.c
  2023-01-03 20:09 ` [PATCH 2/4] xen/version: Drop compat/kernel.c Andrew Cooper
@ 2023-01-04 16:29   ` Jan Beulich
  2023-01-04 19:15     ` Andrew Cooper
  2023-01-04 16:48   ` Jan Beulich
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2023-01-04 16:29 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Stefano Stabellini, Wei Liu, Julien Grall, Xen-devel

On 03.01.2023 21:09, Andrew Cooper wrote:
> kernel.c is mostly in an #ifndef COMPAT guard, because compat/kernel.c
> reincludes kernel.c to recompile xen_version() in a compat form.
> 
> However, the xen_version hypercall is almost guest-ABI-agnostic; only
> XENVER_platform_parameters has a compat split.  Handle this locally, and do
> away with the reinclude entirely.

And we henceforth mean to not introduce any interface structures here
which would require translation (or we're willing to accept the clutter
of handling those "locally" as well). Fine with me, just wanting to
mention it.

> --- a/xen/common/compat/kernel.c
> +++ /dev/null
> @@ -1,53 +0,0 @@
> -/******************************************************************************
> - * kernel.c
> - */
> -
> -EMIT_FILE;
> -
> -#include <xen/init.h>
> -#include <xen/lib.h>
> -#include <xen/errno.h>
> -#include <xen/version.h>
> -#include <xen/sched.h>
> -#include <xen/guest_access.h>
> -#include <asm/current.h>
> -#include <compat/xen.h>
> -#include <compat/version.h>
> -
> -extern xen_commandline_t saved_cmdline;
> -
> -#define xen_extraversion compat_extraversion
> -#define xen_extraversion_t compat_extraversion_t
> -
> -#define xen_compile_info compat_compile_info
> -#define xen_compile_info_t compat_compile_info_t
> -
> -CHECK_TYPE(capabilities_info);

This and ...

> -#define xen_platform_parameters compat_platform_parameters
> -#define xen_platform_parameters_t compat_platform_parameters_t
> -#undef HYPERVISOR_VIRT_START
> -#define HYPERVISOR_VIRT_START HYPERVISOR_COMPAT_VIRT_START(current->domain)
> -
> -#define xen_changeset_info compat_changeset_info
> -#define xen_changeset_info_t compat_changeset_info_t
> -
> -#define xen_feature_info compat_feature_info
> -#define xen_feature_info_t compat_feature_info_t
> -
> -CHECK_TYPE(domain_handle);

... this go away without any replacement. Considering they're both
char[], that's probably fine, but could do with mentioning in the
description.

> @@ -520,12 +518,27 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>      
>      case XENVER_platform_parameters:
>      {
> -        xen_platform_parameters_t params = {
> -            .virt_start = HYPERVISOR_VIRT_START
> -        };

With this gone the oddly (but intentionally) placed braces can then
also go away.

Preferably with these minor adjustments
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] xen/version: Drop bogus return values for XENVER_platform_parameters
  2023-01-03 20:09 ` [PATCH 3/4] xen/version: Drop bogus return values for XENVER_platform_parameters Andrew Cooper
@ 2023-01-04 16:40   ` Jan Beulich
  2023-01-04 19:55     ` Andrew Cooper
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2023-01-04 16:40 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Stefano Stabellini, Wei Liu, Julien Grall, Xen-devel

On 03.01.2023 21:09, Andrew Cooper wrote:
> A split in virtual address space is only applicable for x86 PV guests.
> Furthermore, the information returned for x86 64bit PV guests is wrong.
> 
> Explain the problem in version.h, stating the other information that PV guests
> need to know.
> 
> For 64bit PV guests, and all non-x86-PV guests, return 0, which is strictly
> less wrong than the values currently returned.

I disagree for the 64-bit part of this. Seeing Linux'es exposure of the
value in sysfs I even wonder whether we can change this like you do for
HVM. Who knows what is being inferred from the value, and by whom.

> --- a/xen/include/public/version.h
> +++ b/xen/include/public/version.h
> @@ -42,6 +42,26 @@ typedef char xen_capabilities_info_t[1024];
>  typedef char xen_changeset_info_t[64];
>  #define XEN_CHANGESET_INFO_LEN (sizeof(xen_changeset_info_t))
>  
> +/*
> + * This API is problematic.
> + *
> + * It is only applicable to guests which share pagetables with Xen (x86 PV
> + * guests), and is supposed to identify the virtual address split between
> + * guest kernel and Xen.
> + *
> + * For 32bit PV guests, it mostly does this, but the caller needs to know that
> + * Xen lives between the split and 4G.
> + *
> + * For 64bit PV guests, Xen lives at the bottom of the upper canonical range.
> + * This previously returned the start of the upper canonical range (which is
> + * the userspace/Xen split), not the Xen/kernel split (which is 8TB further
> + * on).  This now returns 0 because the old number wasn't correct, and
> + * changing it to anything else would be even worse.

Whether the guest runs user mode code in the low or high half (or in yet
another way of splitting) isn't really dictated by the PV ABI, is it? So
whether the value is "wrong" is entirely unclear. Instead ...

> + * For all guest types using hardware virt extentions, Xen is not mapped into
> + * the guest kernel virtual address space.  This now return 0, where it
> + * previously returned unrelated data.
> + */
>  #define XENVER_platform_parameters 5
>  struct xen_platform_parameters {
>      xen_ulong_t virt_start;

... the field name tells me that all that is being conveyed is the virtual
address of where the hypervisor area starts.

Jan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/4] xen/version: Drop compat/kernel.c
  2023-01-03 20:09 ` [PATCH 2/4] xen/version: Drop compat/kernel.c Andrew Cooper
  2023-01-04 16:29   ` Jan Beulich
@ 2023-01-04 16:48   ` Jan Beulich
  1 sibling, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2023-01-04 16:48 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Stefano Stabellini, Wei Liu, Julien Grall, Xen-devel

On 03.01.2023 21:09, Andrew Cooper wrote:
> --- a/xen/include/xlat.lst
> +++ b/xen/include/xlat.lst
> @@ -169,6 +169,9 @@
>  !	vcpu_runstate_info		vcpu.h
>  ?	vcpu_set_periodic_timer		vcpu.h
>  !	vcpu_set_singleshot_timer	vcpu.h
> +?	compile_info                    version.h
> +?	feature_info                    version.h
> +?	build_id                        version.h

Oh, btw, another minor request: Can you please sort these by name (as
secondary criteria after the name of the containing header)?

Thanks, Jan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] xen/version: Introduce non-truncating XENVER_* subops
  2023-01-03 20:09 ` [PATCH 4/4] xen/version: Introduce non-truncating XENVER_* subops Andrew Cooper
  2023-01-03 20:47   ` Julien Grall
@ 2023-01-04 17:04   ` Jan Beulich
  2023-01-04 18:34     ` Andrew Cooper
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2023-01-04 17:04 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Stefano Stabellini, Wei Liu, Julien Grall, Xen-devel

On 03.01.2023 21:09, Andrew Cooper wrote:
> Recently in XenServer, we have encountered problems caused by both
> XENVER_extraversion and XENVER_commandline having fixed bounds.
> 
> More than just the invariant size, the APIs/ABIs also broken by typedef-ing an
> array, and using an unqualified 'char' which has implementation-specific
> signed-ness.

Which is fine as long as only ASCII is returned. If non-ASCII can be returned,
I agree "unsigned char" is better, but then we also need to spell out what
encoding the strings use (UTF-8 presumably).

> API/ABI wise, XENVER_build_id could be merged into xenver_varstring_op(), but
> the internal infrastructure is awkward.

I guess build-id also doesn't fit the rest because of not returning strings,
but indeed an array of bytes. You also couldn't use strlen() on the array.

> @@ -469,6 +470,66 @@ static int __init cf_check param_init(void)
>  __initcall(param_init);
>  #endif
>  
> +static long xenver_varstring_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> +{
> +    const char *str = NULL;
> +    size_t sz = 0;
> +    union {
> +        xen_capabilities_info_t info;
> +    } u;
> +    struct xen_var_string user_str;
> +
> +    switch ( cmd )
> +    {
> +    case XENVER_extraversion2:
> +        str = xen_extra_version();
> +        break;
> +
> +    case XENVER_changeset2:
> +        str = xen_changeset();
> +        break;
> +
> +    case XENVER_commandline2:
> +        str = saved_cmdline;
> +        break;
> +
> +    case XENVER_capabilities2:
> +        memset(u.info, 0, sizeof(u.info));
> +        arch_get_xen_caps(&u.info);
> +        str = u.info;
> +        break;
> +
> +    default:
> +        ASSERT_UNREACHABLE();
> +        break;
> +    }
> +
> +    if ( !str ||
> +         !(sz = strlen(str)) )
> +        return -ENODATA; /* failsafe */

Is this really appropriate for e.g. an empty command line?

> +    if ( sz > INT32_MAX )
> +        return -E2BIG; /* Compat guests.  2G ought to be plenty. */

While the comment here and in the public header mention compat guests,
the check is uniform. What's the deal?

> +    if ( guest_handle_is_null(arg) ) /* Length request */
> +        return sz;
> +
> +    if ( copy_from_guest(&user_str, arg, 1) )
> +        return -EFAULT;
> +
> +    if ( user_str.len == 0 )
> +        return -EINVAL;
> +
> +    if ( sz > user_str.len )
> +        return -ENOBUFS;
> +
> +    if ( copy_to_guest_offset(arg, offsetof(struct xen_var_string, buf),
> +                              str, sz) )
> +        return -EFAULT;

Not inserting a nul terminator is going to make this slightly awkward to
use.

> @@ -103,6 +126,35 @@ struct xen_build_id {
>  };
>  typedef struct xen_build_id xen_build_id_t;
>  
> +/*
> + * Container for an arbitrary variable length string.
> + */
> +struct xen_var_string {
> +    uint32_t len;                          /* IN:  size of buf[] in bytes. */
> +    unsigned char buf[XEN_FLEX_ARRAY_DIM]; /* OUT: requested data.         */
> +};
> +typedef struct xen_var_string xen_var_string_t;
> +
> +/*
> + * arg == xenver_string_t

Nit: xen_var_string_t (also again in the following text).

> + * Equivalent to the original ops, but with a non-truncating API/ABI.
> + *
> + * Passing arg == NULL is a request for size.  The returned size does not
> + * include a NUL terminator, and has a practical upper limit of INT32_MAX for
> + * 32bit guests.  This is expected to be plenty for the purpose.

As said above, the limit applies to all guests, which the wording here
doesn't suggest.

Jan

> + * Otherwise, the input xenver_string_t provides the size of the following
> + * buffer.  Xen will fill the buffer, and return the number of bytes written
> + * (e.g. if the input buffer was longer than necessary).
> + *
> + * These hypercalls can fail, in which case they'll return -XEN_Exx.
> + */
> +#define XENVER_extraversion2 11
> +#define XENVER_capabilities2 12
> +#define XENVER_changeset2    13
> +#define XENVER_commandline2  14
> +
>  #endif /* __XEN_PUBLIC_VERSION_H__ */
>  
>  /*



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] xen/version: Introduce non-truncating XENVER_* subops
  2023-01-04 17:04   ` Jan Beulich
@ 2023-01-04 18:34     ` Andrew Cooper
  2023-01-05  8:15       ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2023-01-04 18:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Stefano Stabellini, Wei Liu, Julien Grall, Xen-devel

On 04/01/2023 5:04 pm, Jan Beulich wrote:
> On 03.01.2023 21:09, Andrew Cooper wrote:
>> Recently in XenServer, we have encountered problems caused by both
>> XENVER_extraversion and XENVER_commandline having fixed bounds.
>>
>> More than just the invariant size, the APIs/ABIs also broken by typedef-ing an
>> array, and using an unqualified 'char' which has implementation-specific
>> signed-ness.
> Which is fine as long as only ASCII is returned. If non-ASCII can be returned,
> I agree "unsigned char" is better, but then we also need to spell out what
> encoding the strings use (UTF-8 presumably).

Well, it "functions" as far as ASCII is concerned, but I wouldn't say
that was fine for an ABI.

The command line can reasonably have non-ASCII characters these days. 
(as can the compile information, but I intentionally didn't convert them
to this new format).

UTF-8 is probably the sensible thing to specify, if we're going to make
any statement.

>
>> API/ABI wise, XENVER_build_id could be merged into xenver_varstring_op(), but
>> the internal infrastructure is awkward.
> I guess build-id also doesn't fit the rest because of not returning strings,
> but indeed an array of bytes. You also couldn't use strlen() on the array.

The format is unspecified, but it is a base64 encoded ASCII string (not
NUL terminated).

>> @@ -469,6 +470,66 @@ static int __init cf_check param_init(void)
>>  __initcall(param_init);
>>  #endif
>>  
>> +static long xenver_varstring_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>> +{
>> +    const char *str = NULL;
>> +    size_t sz = 0;
>> +    union {
>> +        xen_capabilities_info_t info;
>> +    } u;
>> +    struct xen_var_string user_str;
>> +
>> +    switch ( cmd )
>> +    {
>> +    case XENVER_extraversion2:
>> +        str = xen_extra_version();
>> +        break;
>> +
>> +    case XENVER_changeset2:
>> +        str = xen_changeset();
>> +        break;
>> +
>> +    case XENVER_commandline2:
>> +        str = saved_cmdline;
>> +        break;
>> +
>> +    case XENVER_capabilities2:
>> +        memset(u.info, 0, sizeof(u.info));
>> +        arch_get_xen_caps(&u.info);
>> +        str = u.info;
>> +        break;
>> +
>> +    default:
>> +        ASSERT_UNREACHABLE();
>> +        break;
>> +    }
>> +
>> +    if ( !str ||
>> +         !(sz = strlen(str)) )
>> +        return -ENODATA; /* failsafe */
> Is this really appropriate for e.g. an empty command line?

Hmm.  Good point.  All can in principle be empty.

In which case I think I'll put the -ENODATA in the default case and have
a return 0 for the sz == 0 case.

>> +    if ( sz > INT32_MAX )
>> +        return -E2BIG; /* Compat guests.  2G ought to be plenty. */
> While the comment here and in the public header mention compat guests,
> the check is uniform. What's the deal?

Well its either this, or a (comat ? INT32_MAX : INT64_MAX) check, along
with the ifdefary and predicates required to make that compile.

But there's not a CPU today which can actually move 2G of data (which is
4G of L1d bandwidth) without suffering the watchdog (especially as we've
just read it once for strlen(), so that's 6G of bandwidth), nor do I
expect this to change in the forseeable future.

There's some boundary (probably far lower) beyond which we can't use the
algorithm here.

There wants to be some limit, and I don't feel it is necessary to make
it variable on the guest type.

>
>> +    if ( guest_handle_is_null(arg) ) /* Length request */
>> +        return sz;
>> +
>> +    if ( copy_from_guest(&user_str, arg, 1) )
>> +        return -EFAULT;
>> +
>> +    if ( user_str.len == 0 )
>> +        return -EINVAL;
>> +
>> +    if ( sz > user_str.len )
>> +        return -ENOBUFS;
>> +
>> +    if ( copy_to_guest_offset(arg, offsetof(struct xen_var_string, buf),
>> +                              str, sz) )
>> +        return -EFAULT;
> Not inserting a nul terminator is going to make this slightly awkward to
> use.

Not really.  It matches how build-id currently works, and forces the
caller to use proper tained-string discipline.

To safely printk() it, all you need is:

rc = xen_version(XENVER_$X, &str);
if ( rc < 0 )
    return rc;
printk("%*.s\n", rc, str.buf);

>> @@ -103,6 +126,35 @@ struct xen_build_id {
>>  };
>>  typedef struct xen_build_id xen_build_id_t;
>>  
>> +/*
>> + * Container for an arbitrary variable length string.
>> + */
>> +struct xen_var_string {
>> +    uint32_t len;                          /* IN:  size of buf[] in bytes. */
>> +    unsigned char buf[XEN_FLEX_ARRAY_DIM]; /* OUT: requested data.         */
>> +};
>> +typedef struct xen_var_string xen_var_string_t;
>> +
>> +/*
>> + * arg == xenver_string_t
> Nit: xen_var_string_t (also again in the following text).

Ah yes.  I originally called it xenver_string_t then found a whole load
of compat assumptions about xen prefixes on names.

Will fix.


But overall, I'm not seeing a major objection to this change?  In which
case I'll go ahead and do the tools/ cleanup too for v2.

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/4] xen/version: Drop compat/kernel.c
  2023-01-04 16:29   ` Jan Beulich
@ 2023-01-04 19:15     ` Andrew Cooper
  2023-01-05  7:22       ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2023-01-04 19:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Stefano Stabellini, Wei Liu, Julien Grall, Xen-devel

On 04/01/2023 4:29 pm, Jan Beulich wrote:
> On 03.01.2023 21:09, Andrew Cooper wrote:
>> kernel.c is mostly in an #ifndef COMPAT guard, because compat/kernel.c
>> reincludes kernel.c to recompile xen_version() in a compat form.
>>
>> However, the xen_version hypercall is almost guest-ABI-agnostic; only
>> XENVER_platform_parameters has a compat split.  Handle this locally, and do
>> away with the reinclude entirely.
> And we henceforth mean to not introduce any interface structures here
> which would require translation (or we're willing to accept the clutter
> of handling those "locally" as well). Fine with me, just wanting to
> mention it.

Sure - I'll put a note in the commit message.

In general, we don't want guest-variant interfaces.

>
>> --- a/xen/common/compat/kernel.c
>> +++ /dev/null
>> @@ -1,53 +0,0 @@
>> -/******************************************************************************
>> - * kernel.c
>> - */
>> -
>> -EMIT_FILE;
>> -
>> -#include <xen/init.h>
>> -#include <xen/lib.h>
>> -#include <xen/errno.h>
>> -#include <xen/version.h>
>> -#include <xen/sched.h>
>> -#include <xen/guest_access.h>
>> -#include <asm/current.h>
>> -#include <compat/xen.h>
>> -#include <compat/version.h>
>> -
>> -extern xen_commandline_t saved_cmdline;
>> -
>> -#define xen_extraversion compat_extraversion
>> -#define xen_extraversion_t compat_extraversion_t
>> -
>> -#define xen_compile_info compat_compile_info
>> -#define xen_compile_info_t compat_compile_info_t
>> -
>> -CHECK_TYPE(capabilities_info);
> This and ...
>
>> -#define xen_platform_parameters compat_platform_parameters
>> -#define xen_platform_parameters_t compat_platform_parameters_t
>> -#undef HYPERVISOR_VIRT_START
>> -#define HYPERVISOR_VIRT_START HYPERVISOR_COMPAT_VIRT_START(current->domain)
>> -
>> -#define xen_changeset_info compat_changeset_info
>> -#define xen_changeset_info_t compat_changeset_info_t
>> -
>> -#define xen_feature_info compat_feature_info
>> -#define xen_feature_info_t compat_feature_info_t
>> -
>> -CHECK_TYPE(domain_handle);
> ... this go away without any replacement. Considering they're both
> char[], that's probably fine, but could do with mentioning in the
> description.

I did actually mean to ask about these two, because they're incomplete
already.

Why do we CHECK_TYPE(capabilities_info) but define identity aliases for
compat_extraversion (amongst others) ?

Is there even a point for having a compat alias of a char array?

I'm tempted to just drop them.  I don't think the check does anything
useful for us.

>> @@ -520,12 +518,27 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>      
>>      case XENVER_platform_parameters:
>>      {
>> -        xen_platform_parameters_t params = {
>> -            .virt_start = HYPERVISOR_VIRT_START
>> -        };
> With this gone the oddly (but intentionally) placed braces can then
> also go away.

In light of how patch 3 ended up, I was considering pulling curr out
into a variable.

> Preferably with these minor adjustments
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks,

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] xen/version: Drop bogus return values for XENVER_platform_parameters
  2023-01-04 16:40   ` Jan Beulich
@ 2023-01-04 19:55     ` Andrew Cooper
  2023-01-05  7:57       ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2023-01-04 19:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Stefano Stabellini, Wei Liu, Julien Grall, Xen-devel

On 04/01/2023 4:40 pm, Jan Beulich wrote:
> On 03.01.2023 21:09, Andrew Cooper wrote:
>> A split in virtual address space is only applicable for x86 PV guests.
>> Furthermore, the information returned for x86 64bit PV guests is wrong.
>>
>> Explain the problem in version.h, stating the other information that PV guests
>> need to know.
>>
>> For 64bit PV guests, and all non-x86-PV guests, return 0, which is strictly
>> less wrong than the values currently returned.
> I disagree for the 64-bit part of this. Seeing Linux'es exposure of the
> value in sysfs I even wonder whether we can change this like you do for
> HVM. Who knows what is being inferred from the value, and by whom.

Linux's sysfs ABI isn't relevant to us here.  The sysfs ABI says it
reports what the hypervisor presents, not that it will be a nonzero number.

>> --- a/xen/include/public/version.h
>> +++ b/xen/include/public/version.h
>> @@ -42,6 +42,26 @@ typedef char xen_capabilities_info_t[1024];
>>  typedef char xen_changeset_info_t[64];
>>  #define XEN_CHANGESET_INFO_LEN (sizeof(xen_changeset_info_t))
>>  
>> +/*
>> + * This API is problematic.
>> + *
>> + * It is only applicable to guests which share pagetables with Xen (x86 PV
>> + * guests), and is supposed to identify the virtual address split between
>> + * guest kernel and Xen.
>> + *
>> + * For 32bit PV guests, it mostly does this, but the caller needs to know that
>> + * Xen lives between the split and 4G.
>> + *
>> + * For 64bit PV guests, Xen lives at the bottom of the upper canonical range.
>> + * This previously returned the start of the upper canonical range (which is
>> + * the userspace/Xen split), not the Xen/kernel split (which is 8TB further
>> + * on).  This now returns 0 because the old number wasn't correct, and
>> + * changing it to anything else would be even worse.
> Whether the guest runs user mode code in the low or high half (or in yet
> another way of splitting) isn't really dictated by the PV ABI, is it?

No, but given a choice of reporting the thing which is an architectural
boundary, or the one which is the actual split between the two adjacent
ranges, reporting the architectural boundary is clearly the unhelpful thing.

>  So
> whether the value is "wrong" is entirely unclear. Instead ...
>
>> + * For all guest types using hardware virt extentions, Xen is not mapped into
>> + * the guest kernel virtual address space.  This now return 0, where it
>> + * previously returned unrelated data.
>> + */
>>  #define XENVER_platform_parameters 5
>>  struct xen_platform_parameters {
>>      xen_ulong_t virt_start;
> ... the field name tells me that all that is being conveyed is the virtual
> address of where the hypervisor area starts.

IMO, it doesn't matter what the name of the field is.  It dates from the
days when 32bit PV was the only type of guest.

32bit PV guests really do have a variable split, so the guest kernel
really does need to get this value from Xen.

The split for 64bit PV guests is compile-time constant, hence why 64bit
PV kernels don't care.

For compat HVM, it happens to pick up the -1 from:

#ifdef CONFIG_PV32
    HYPERVISOR_COMPAT_VIRT_START(d) =
        is_pv_domain(d) ? __HYPERVISOR_COMPAT_VIRT_START : ~0u;
#endif

in arch_domain_create(), whereas for non-compat HVM, it gets a number in
an address space it has no connection to in the slightest.  ARM guests
end up getting XEN_VIRT_START (== 2M) handed back, but this absolutely
an internal detail that guests have no business knowing.


The only reason I'm not issuing an XSA for this is because we don't have
any pretence of KASLR in Xen.  Pretty much every other kernel gets CVEs
for infoleaks like this.

We feasibly could do KASLR in !PV builds, at which point this would
qualify for an XSA.

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/4] xen/version: Drop compat/kernel.c
  2023-01-04 19:15     ` Andrew Cooper
@ 2023-01-05  7:22       ` Jan Beulich
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2023-01-05  7:22 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Stefano Stabellini, Wei Liu, Julien Grall, Xen-devel

On 04.01.2023 20:15, Andrew Cooper wrote:
> On 04/01/2023 4:29 pm, Jan Beulich wrote:
>> On 03.01.2023 21:09, Andrew Cooper wrote:
>>> --- a/xen/common/compat/kernel.c
>>> +++ /dev/null
>>> @@ -1,53 +0,0 @@
>>> -/******************************************************************************
>>> - * kernel.c
>>> - */
>>> -
>>> -EMIT_FILE;
>>> -
>>> -#include <xen/init.h>
>>> -#include <xen/lib.h>
>>> -#include <xen/errno.h>
>>> -#include <xen/version.h>
>>> -#include <xen/sched.h>
>>> -#include <xen/guest_access.h>
>>> -#include <asm/current.h>
>>> -#include <compat/xen.h>
>>> -#include <compat/version.h>
>>> -
>>> -extern xen_commandline_t saved_cmdline;
>>> -
>>> -#define xen_extraversion compat_extraversion
>>> -#define xen_extraversion_t compat_extraversion_t
>>> -
>>> -#define xen_compile_info compat_compile_info
>>> -#define xen_compile_info_t compat_compile_info_t
>>> -
>>> -CHECK_TYPE(capabilities_info);
>> This and ...
>>
>>> -#define xen_platform_parameters compat_platform_parameters
>>> -#define xen_platform_parameters_t compat_platform_parameters_t
>>> -#undef HYPERVISOR_VIRT_START
>>> -#define HYPERVISOR_VIRT_START HYPERVISOR_COMPAT_VIRT_START(current->domain)
>>> -
>>> -#define xen_changeset_info compat_changeset_info
>>> -#define xen_changeset_info_t compat_changeset_info_t
>>> -
>>> -#define xen_feature_info compat_feature_info
>>> -#define xen_feature_info_t compat_feature_info_t
>>> -
>>> -CHECK_TYPE(domain_handle);
>> ... this go away without any replacement. Considering they're both
>> char[], that's probably fine, but could do with mentioning in the
>> description.
> 
> I did actually mean to ask about these two, because they're incomplete
> already.
> 
> Why do we CHECK_TYPE(capabilities_info) but define identity aliases for
> compat_extraversion (amongst others) ?
> 
> Is there even a point for having a compat alias of a char array?

To be quite honest, for capabilities_info I don't recall why it wasn't
aliased. For domain_handle I think the reason was that it's declared
outside of this header, and it could conceivably be a more UUID-like
struct.

> I'm tempted to just drop them.  I don't think the check does anything
> useful for us.

As said, I'm pretty much okay with this, but would like you to mention
in the description that this is an intentional change.

Jan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] xen/version: Drop bogus return values for XENVER_platform_parameters
  2023-01-04 19:55     ` Andrew Cooper
@ 2023-01-05  7:57       ` Jan Beulich
  2023-01-05 22:17         ` Andrew Cooper
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2023-01-05  7:57 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Stefano Stabellini, Wei Liu, Julien Grall, Xen-devel

On 04.01.2023 20:55, Andrew Cooper wrote:
> On 04/01/2023 4:40 pm, Jan Beulich wrote:
>> On 03.01.2023 21:09, Andrew Cooper wrote:
>>> A split in virtual address space is only applicable for x86 PV guests.
>>> Furthermore, the information returned for x86 64bit PV guests is wrong.
>>>
>>> Explain the problem in version.h, stating the other information that PV guests
>>> need to know.
>>>
>>> For 64bit PV guests, and all non-x86-PV guests, return 0, which is strictly
>>> less wrong than the values currently returned.
>> I disagree for the 64-bit part of this. Seeing Linux'es exposure of the
>> value in sysfs I even wonder whether we can change this like you do for
>> HVM. Who knows what is being inferred from the value, and by whom.
> 
> Linux's sysfs ABI isn't relevant to us here.  The sysfs ABI says it
> reports what the hypervisor presents, not that it will be a nonzero number.

It effectively reports the hypervisor (virtual) base address there. How
can we not care if something (kexec would come to mind) would be using
it for whatever purpose. And thinking of it, the tool stack has uses,
too. Assuming you audited them, did you consider removing dead uses in
a prereq patch (and discuss the effects on live ones in the description)?

>>> --- a/xen/include/public/version.h
>>> +++ b/xen/include/public/version.h
>>> @@ -42,6 +42,26 @@ typedef char xen_capabilities_info_t[1024];
>>>  typedef char xen_changeset_info_t[64];
>>>  #define XEN_CHANGESET_INFO_LEN (sizeof(xen_changeset_info_t))
>>>  
>>> +/*
>>> + * This API is problematic.
>>> + *
>>> + * It is only applicable to guests which share pagetables with Xen (x86 PV
>>> + * guests), and is supposed to identify the virtual address split between
>>> + * guest kernel and Xen.
>>> + *
>>> + * For 32bit PV guests, it mostly does this, but the caller needs to know that
>>> + * Xen lives between the split and 4G.
>>> + *
>>> + * For 64bit PV guests, Xen lives at the bottom of the upper canonical range.
>>> + * This previously returned the start of the upper canonical range (which is
>>> + * the userspace/Xen split), not the Xen/kernel split (which is 8TB further
>>> + * on).  This now returns 0 because the old number wasn't correct, and
>>> + * changing it to anything else would be even worse.
>> Whether the guest runs user mode code in the low or high half (or in yet
>> another way of splitting) isn't really dictated by the PV ABI, is it?
> 
> No, but given a choice of reporting the thing which is an architectural
> boundary, or the one which is the actual split between the two adjacent
> ranges, reporting the architectural boundary is clearly the unhelpful thing.

Hmm. To properly parallel the 32-bit variant, a [start,end] range would need
exposing for 64-bit, rather than exposing nothing. Not the least because ...

>>  So
>> whether the value is "wrong" is entirely unclear. Instead ...
>>
>>> + * For all guest types using hardware virt extentions, Xen is not mapped into
>>> + * the guest kernel virtual address space.  This now return 0, where it
>>> + * previously returned unrelated data.
>>> + */
>>>  #define XENVER_platform_parameters 5
>>>  struct xen_platform_parameters {
>>>      xen_ulong_t virt_start;
>> ... the field name tells me that all that is being conveyed is the virtual
>> address of where the hypervisor area starts.
> 
> IMO, it doesn't matter what the name of the field is.  It dates from the
> days when 32bit PV was the only type of guest.
> 
> 32bit PV guests really do have a variable split, so the guest kernel
> really does need to get this value from Xen.
> 
> The split for 64bit PV guests is compile-time constant, hence why 64bit
> PV kernels don't care.

... once we get to run Xen in 5-level mode, 4-level PV guests could also
gain a variable split: Like for 32-bit guests now, only the r/o M2P would
need to live in that area, and this may well occupy less than the full
range presently reserved for the hypervisor.

> For compat HVM, it happens to pick up the -1 from:
> 
> #ifdef CONFIG_PV32
>     HYPERVISOR_COMPAT_VIRT_START(d) =
>         is_pv_domain(d) ? __HYPERVISOR_COMPAT_VIRT_START : ~0u;
> #endif
> 
> in arch_domain_create(), whereas for non-compat HVM, it gets a number in
> an address space it has no connection to in the slightest.  ARM guests
> end up getting XEN_VIRT_START (== 2M) handed back, but this absolutely
> an internal detail that guests have no business knowing.

Well, okay, this looks to be good enough an argument to make the adjustment
you propose for !PV guests.

> The only reason I'm not issuing an XSA for this is because we don't have
> any pretence of KASLR in Xen.  Pretty much every other kernel gets CVEs
> for infoleaks like this.
> 
> We feasibly could do KASLR in !PV builds, at which point this would
> qualify for an XSA.

I would question that, but I can see your view as one possible one.

Jan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] xen/version: Introduce non-truncating XENVER_* subops
  2023-01-04 18:34     ` Andrew Cooper
@ 2023-01-05  8:15       ` Jan Beulich
  2023-01-05 22:28         ` Andrew Cooper
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2023-01-05  8:15 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Stefano Stabellini, Wei Liu, Julien Grall, Xen-devel

On 04.01.2023 19:34, Andrew Cooper wrote:
> On 04/01/2023 5:04 pm, Jan Beulich wrote:
>> On 03.01.2023 21:09, Andrew Cooper wrote:
>>> API/ABI wise, XENVER_build_id could be merged into xenver_varstring_op(), but
>>> the internal infrastructure is awkward.
>> I guess build-id also doesn't fit the rest because of not returning strings,
>> but indeed an array of bytes. You also couldn't use strlen() on the array.
> 
> The format is unspecified, but it is a base64 encoded ASCII string (not
> NUL terminated).

Where are you taking this base64 encoding from? I see the build ID being pure
binary data, which could easily have an embedded nul.

>>> +    if ( sz > INT32_MAX )
>>> +        return -E2BIG; /* Compat guests.  2G ought to be plenty. */
>> While the comment here and in the public header mention compat guests,
>> the check is uniform. What's the deal?
> 
> Well its either this, or a (comat ? INT32_MAX : INT64_MAX) check, along
> with the ifdefary and predicates required to make that compile.
> 
> But there's not a CPU today which can actually move 2G of data (which is
> 4G of L1d bandwidth) without suffering the watchdog (especially as we've
> just read it once for strlen(), so that's 6G of bandwidth), nor do I
> expect this to change in the forseeable future.
> 
> There's some boundary (probably far lower) beyond which we can't use the
> algorithm here.
> 
> There wants to be some limit, and I don't feel it is necessary to make
> it variable on the guest type.

Sure. My question was merely because of the special mentioning of 32-bit /
compat guests. I'm fine with the universal limit, and I'd also be fine
with a lower (universal) bound. All I'm after is that the (to me at least)
confusing comments be adjusted.

> But overall, I'm not seeing a major objection to this change?  In which
> case I'll go ahead and do the tools/ cleanup too for v2.

Well, I'm not entirely convinced of the need for the new subops (we could
as well introduce build time checks to make sure no truncation will occur
for the existing ones), but I also won't object to their introduction. At
least for the command line I can see that we will want (need) to support
more than 1k worth of a string, considering how many options we have
accumulated.

Jan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] xen/version: Drop bogus return values for XENVER_platform_parameters
  2023-01-05  7:57       ` Jan Beulich
@ 2023-01-05 22:17         ` Andrew Cooper
  2023-01-06  7:54           ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2023-01-05 22:17 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Stefano Stabellini, Wei Liu, Julien Grall, Xen-devel

On 05/01/2023 7:57 am, Jan Beulich wrote:
> On 04.01.2023 20:55, Andrew Cooper wrote:
>> On 04/01/2023 4:40 pm, Jan Beulich wrote:
>>> On 03.01.2023 21:09, Andrew Cooper wrote:
>>>> A split in virtual address space is only applicable for x86 PV guests.
>>>> Furthermore, the information returned for x86 64bit PV guests is wrong.
>>>>
>>>> Explain the problem in version.h, stating the other information that PV guests
>>>> need to know.
>>>>
>>>> For 64bit PV guests, and all non-x86-PV guests, return 0, which is strictly
>>>> less wrong than the values currently returned.
>>> I disagree for the 64-bit part of this. Seeing Linux'es exposure of the
>>> value in sysfs I even wonder whether we can change this like you do for
>>> HVM. Who knows what is being inferred from the value, and by whom.
>> Linux's sysfs ABI isn't relevant to us here.  The sysfs ABI says it
>> reports what the hypervisor presents, not that it will be a nonzero number.
> It effectively reports the hypervisor (virtual) base address there. How
> can we not care if something (kexec would come to mind) would be using
> it for whatever purpose.

What about kexec do you think would care?

The only thing kexec-tools cares about is XENVER_capabilities, but even
that's not actually correct for figuring out whether Xen can do kexec
transitions to ELF64/32.

> And thinking of it, the tool stack has uses,
> too. Assuming you audited them, did you consider removing dead uses in
> a prereq patch (and discuss the effects on live ones in the description)?

There is only one toolstack use I can spot which is non-informational,
and it's broken AFAICT.

`xl dump-core` writes out a header which includes this metadata, but it
takes dom0's value, not domU's.  (Not that this is relevant AFAICT,
because the M2P is handled specially anyway.)


Most XENVER_* information is global (and by this, I mean invariant and
non-caller dependent, outside of livepatching.)

XENVER_guest_handle is caller-variant, but the toolstack has proper
interfaces to get/set this value.

XENVER_platform_parameters (and XENVER_get_features for that matter) are
caller-variant, and the toolstack has no way to get domU's view of this
data.


Every instance (well - this is the only interesting one) of the use of
XENVER_platform_parameters I can find is broken, even in the Xen code.

>>>> --- a/xen/include/public/version.h
>>>> +++ b/xen/include/public/version.h
>>>> @@ -42,6 +42,26 @@ typedef char xen_capabilities_info_t[1024];
>>>>  typedef char xen_changeset_info_t[64];
>>>>  #define XEN_CHANGESET_INFO_LEN (sizeof(xen_changeset_info_t))
>>>>  
>>>> +/*
>>>> + * This API is problematic.
>>>> + *
>>>> + * It is only applicable to guests which share pagetables with Xen (x86 PV
>>>> + * guests), and is supposed to identify the virtual address split between
>>>> + * guest kernel and Xen.
>>>> + *
>>>> + * For 32bit PV guests, it mostly does this, but the caller needs to know that
>>>> + * Xen lives between the split and 4G.
>>>> + *
>>>> + * For 64bit PV guests, Xen lives at the bottom of the upper canonical range.
>>>> + * This previously returned the start of the upper canonical range (which is
>>>> + * the userspace/Xen split), not the Xen/kernel split (which is 8TB further
>>>> + * on).  This now returns 0 because the old number wasn't correct, and
>>>> + * changing it to anything else would be even worse.
>>> Whether the guest runs user mode code in the low or high half (or in yet
>>> another way of splitting) isn't really dictated by the PV ABI, is it?
>> No, but given a choice of reporting the thing which is an architectural
>> boundary, or the one which is the actual split between the two adjacent
>> ranges, reporting the architectural boundary is clearly the unhelpful thing.
> Hmm. To properly parallel the 32-bit variant, a [start,end] range would need
> exposing for 64-bit, rather than exposing nothing.

The 32-bit version is a start/end pair, but with end being implicit at
the 4G architectural boundary.

If we were doing 64-bit from scratch, then reporting end would have been
sensible, because for 64-bit, start is the architectural boundary which
can be implicit.

But there is no such thing as a 64bit PV guest with any (useful) idea of
a variable split, because this number has been junk for the entire
lifetime of 64bit PV guests.  In particular, ...

>>>> + * For all guest types using hardware virt extentions, Xen is not mapped into
>>>> + * the guest kernel virtual address space.  This now return 0, where it
>>>> + * previously returned unrelated data.
>>>> + */
>>>>  #define XENVER_platform_parameters 5
>>>>  struct xen_platform_parameters {
>>>>      xen_ulong_t virt_start;
>>> ... the field name tells me that all that is being conveyed is the virtual
>>> address of where the hypervisor area starts.
>> IMO, it doesn't matter what the name of the field is.  It dates from the
>> days when 32bit PV was the only type of guest.
>>
>> 32bit PV guests really do have a variable split, so the guest kernel
>> really does need to get this value from Xen.
>>
>> The split for 64bit PV guests is compile-time constant, hence why 64bit
>> PV kernels don't care.
> ... once we get to run Xen in 5-level mode, 4-level PV guests could also
> gain a variable split: Like for 32-bit guests now, only the r/o M2P would
> need to live in that area, and this may well occupy less than the full
> range presently reserved for the hypervisor.

... you can't do this, because it only works for guests which have
chosen to find the M2P using XENMEM_machphys_mapping (e.g. Linux), and
doesn't for e.g. MiniOS which does:

#define machine_to_phys_mapping ((unsigned long *)HYPERVISOR_VIRT_START)

in fact, looking at this, MiniOS is also broken as a 32bit PV dom0,
because it hardcodes __MACH2PHYS_VIRT_START in the case where the split
really is variable.


Its only PV guests which are LA57 aware which can possibly benefit from
a variable position M2P, and only because that will be a new ELFNOTE
protocol.

>
>> For compat HVM, it happens to pick up the -1 from:
>>
>> #ifdef CONFIG_PV32
>>     HYPERVISOR_COMPAT_VIRT_START(d) =
>>         is_pv_domain(d) ? __HYPERVISOR_COMPAT_VIRT_START : ~0u;
>> #endif
>>
>> in arch_domain_create(), whereas for non-compat HVM, it gets a number in
>> an address space it has no connection to in the slightest.  ARM guests
>> end up getting XEN_VIRT_START (== 2M) handed back, but this absolutely
>> an internal detail that guests have no business knowing.
> Well, okay, this looks to be good enough an argument to make the adjustment
> you propose for !PV guests.

Right, HVM (on all architectures) is very cut and dry.

But it feels wrong to not address the PV64 issue at the same time
because it is similar level of broken, despite there being (in theory) a
legitimate need for a PV guest kernel to know it.

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] xen/version: Introduce non-truncating XENVER_* subops
  2023-01-05  8:15       ` Jan Beulich
@ 2023-01-05 22:28         ` Andrew Cooper
  2023-01-06  7:56           ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2023-01-05 22:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Stefano Stabellini, Wei Liu, Julien Grall, Xen-devel

On 05/01/2023 8:15 am, Jan Beulich wrote:
> On 04.01.2023 19:34, Andrew Cooper wrote:
>> On 04/01/2023 5:04 pm, Jan Beulich wrote:
>>> On 03.01.2023 21:09, Andrew Cooper wrote:
>>>> API/ABI wise, XENVER_build_id could be merged into xenver_varstring_op(), but
>>>> the internal infrastructure is awkward.
>>> I guess build-id also doesn't fit the rest because of not returning strings,
>>> but indeed an array of bytes. You also couldn't use strlen() on the array.
>> The format is unspecified, but it is a base64 encoded ASCII string (not
>> NUL terminated).
> Where are you taking this base64 encoding from? I see the build ID being pure
> binary data, which could easily have an embedded nul.

Oh, so it is.

I'd failed to spot that libxl formats it automatically on behalf of the
caller.

>>>> +    if ( sz > INT32_MAX )
>>>> +        return -E2BIG; /* Compat guests.  2G ought to be plenty. */
>>> While the comment here and in the public header mention compat guests,
>>> the check is uniform. What's the deal?
>> Well its either this, or a (comat ? INT32_MAX : INT64_MAX) check, along
>> with the ifdefary and predicates required to make that compile.
>>
>> But there's not a CPU today which can actually move 2G of data (which is
>> 4G of L1d bandwidth) without suffering the watchdog (especially as we've
>> just read it once for strlen(), so that's 6G of bandwidth), nor do I
>> expect this to change in the forseeable future.
>>
>> There's some boundary (probably far lower) beyond which we can't use the
>> algorithm here.
>>
>> There wants to be some limit, and I don't feel it is necessary to make
>> it variable on the guest type.
> Sure. My question was merely because of the special mentioning of 32-bit /
> compat guests. I'm fine with the universal limit, and I'd also be fine
> with a lower (universal) bound. All I'm after is that the (to me at least)
> confusing comments be adjusted.

How about 16k then?

>> But overall, I'm not seeing a major objection to this change?  In which
>> case I'll go ahead and do the tools/ cleanup too for v2.
> Well, I'm not entirely convinced of the need for the new subops (we could
> as well introduce build time checks to make sure no truncation will occur
> for the existing ones), but I also won't object to their introduction. At
> least for the command line I can see that we will want (need) to support
> more than 1k worth of a string, considering how many options we have
> accumulated.

I've legitimate customer bugs caused by the cmdline limit, and real test
problems caused by the extraversion limit which I'm unwilling to "fix"
by sticking to the current API/ABI.

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] xen/version: Drop bogus return values for XENVER_platform_parameters
  2023-01-05 22:17         ` Andrew Cooper
@ 2023-01-06  7:54           ` Jan Beulich
  2023-01-06 12:14             ` Andrew Cooper
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2023-01-06  7:54 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Stefano Stabellini, Wei Liu, Julien Grall, Xen-devel

On 05.01.2023 23:17, Andrew Cooper wrote:
> On 05/01/2023 7:57 am, Jan Beulich wrote:
>> On 04.01.2023 20:55, Andrew Cooper wrote:
>>> On 04/01/2023 4:40 pm, Jan Beulich wrote:
>>>> On 03.01.2023 21:09, Andrew Cooper wrote:
>>>>> A split in virtual address space is only applicable for x86 PV guests.
>>>>> Furthermore, the information returned for x86 64bit PV guests is wrong.
>>>>>
>>>>> Explain the problem in version.h, stating the other information that PV guests
>>>>> need to know.
>>>>>
>>>>> For 64bit PV guests, and all non-x86-PV guests, return 0, which is strictly
>>>>> less wrong than the values currently returned.
>>>> I disagree for the 64-bit part of this. Seeing Linux'es exposure of the
>>>> value in sysfs I even wonder whether we can change this like you do for
>>>> HVM. Who knows what is being inferred from the value, and by whom.
>>> Linux's sysfs ABI isn't relevant to us here.  The sysfs ABI says it
>>> reports what the hypervisor presents, not that it will be a nonzero number.
>> It effectively reports the hypervisor (virtual) base address there. How
>> can we not care if something (kexec would come to mind) would be using
>> it for whatever purpose.
> 
> What about kexec do you think would care?

I didn't think about anything specific, but I could see why it may want to
know where in VA space Xen sits.

>>>>> --- a/xen/include/public/version.h
>>>>> +++ b/xen/include/public/version.h
>>>>> @@ -42,6 +42,26 @@ typedef char xen_capabilities_info_t[1024];
>>>>>  typedef char xen_changeset_info_t[64];
>>>>>  #define XEN_CHANGESET_INFO_LEN (sizeof(xen_changeset_info_t))
>>>>>  
>>>>> +/*
>>>>> + * This API is problematic.
>>>>> + *
>>>>> + * It is only applicable to guests which share pagetables with Xen (x86 PV
>>>>> + * guests), and is supposed to identify the virtual address split between
>>>>> + * guest kernel and Xen.
>>>>> + *
>>>>> + * For 32bit PV guests, it mostly does this, but the caller needs to know that
>>>>> + * Xen lives between the split and 4G.
>>>>> + *
>>>>> + * For 64bit PV guests, Xen lives at the bottom of the upper canonical range.
>>>>> + * This previously returned the start of the upper canonical range (which is
>>>>> + * the userspace/Xen split), not the Xen/kernel split (which is 8TB further
>>>>> + * on).  This now returns 0 because the old number wasn't correct, and
>>>>> + * changing it to anything else would be even worse.
>>>> Whether the guest runs user mode code in the low or high half (or in yet
>>>> another way of splitting) isn't really dictated by the PV ABI, is it?
>>> No, but given a choice of reporting the thing which is an architectural
>>> boundary, or the one which is the actual split between the two adjacent
>>> ranges, reporting the architectural boundary is clearly the unhelpful thing.
>> Hmm. To properly parallel the 32-bit variant, a [start,end] range would need
>> exposing for 64-bit, rather than exposing nothing.
> 
> The 32-bit version is a start/end pair, but with end being implicit at
> the 4G architectural boundary.
> 
> If we were doing 64-bit from scratch, then reporting end would have been
> sensible, because for 64-bit, start is the architectural boundary which
> can be implicit.
> 
> But there is no such thing as a 64bit PV guest with any (useful) idea of
> a variable split, because this number has been junk for the entire
> lifetime of 64bit PV guests.  In particular, ...
> 
>>>>> + * For all guest types using hardware virt extentions, Xen is not mapped into
>>>>> + * the guest kernel virtual address space.  This now return 0, where it
>>>>> + * previously returned unrelated data.
>>>>> + */
>>>>>  #define XENVER_platform_parameters 5
>>>>>  struct xen_platform_parameters {
>>>>>      xen_ulong_t virt_start;
>>>> ... the field name tells me that all that is being conveyed is the virtual
>>>> address of where the hypervisor area starts.
>>> IMO, it doesn't matter what the name of the field is.  It dates from the
>>> days when 32bit PV was the only type of guest.
>>>
>>> 32bit PV guests really do have a variable split, so the guest kernel
>>> really does need to get this value from Xen.
>>>
>>> The split for 64bit PV guests is compile-time constant, hence why 64bit
>>> PV kernels don't care.
>> ... once we get to run Xen in 5-level mode, 4-level PV guests could also
>> gain a variable split: Like for 32-bit guests now, only the r/o M2P would
>> need to live in that area, and this may well occupy less than the full
>> range presently reserved for the hypervisor.
> 
> ... you can't do this, because it only works for guests which have
> chosen to find the M2P using XENMEM_machphys_mapping (e.g. Linux), and
> doesn't for e.g. MiniOS which does:
> 
> #define machine_to_phys_mapping ((unsigned long *)HYPERVISOR_VIRT_START)

Hmm, looks like a misunderstanding? I certainly wasn't thinking about
making the start of that region variable, but rather the end (i.e. not
exactly like for 32-bit compat).

>>> For compat HVM, it happens to pick up the -1 from:
>>>
>>> #ifdef CONFIG_PV32
>>>     HYPERVISOR_COMPAT_VIRT_START(d) =
>>>         is_pv_domain(d) ? __HYPERVISOR_COMPAT_VIRT_START : ~0u;
>>> #endif
>>>
>>> in arch_domain_create(), whereas for non-compat HVM, it gets a number in
>>> an address space it has no connection to in the slightest.  ARM guests
>>> end up getting XEN_VIRT_START (== 2M) handed back, but this absolutely
>>> an internal detail that guests have no business knowing.
>> Well, okay, this looks to be good enough an argument to make the adjustment
>> you propose for !PV guests.
> 
> Right, HVM (on all architectures) is very cut and dry.
> 
> But it feels wrong to not address the PV64 issue at the same time
> because it is similar level of broken, despite there being (in theory) a
> legitimate need for a PV guest kernel to know it.

To me it feels wrong to address the 64-bit PV issue by removing information,
when - as you also say - it is actually _missing_ information. To me the
proper course of action would be to expose the upper bound as well (such
that, down the road, it could become dynamic). There's also no info leak
there, as the two (static) bounds are part of the PV ABI anyway.

Jan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] xen/version: Introduce non-truncating XENVER_* subops
  2023-01-05 22:28         ` Andrew Cooper
@ 2023-01-06  7:56           ` Jan Beulich
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2023-01-06  7:56 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Stefano Stabellini, Wei Liu, Julien Grall, Xen-devel

On 05.01.2023 23:28, Andrew Cooper wrote:
> On 05/01/2023 8:15 am, Jan Beulich wrote:
>> On 04.01.2023 19:34, Andrew Cooper wrote:
>>> On 04/01/2023 5:04 pm, Jan Beulich wrote:
>>>> On 03.01.2023 21:09, Andrew Cooper wrote:
>>>>> +    if ( sz > INT32_MAX )
>>>>> +        return -E2BIG; /* Compat guests.  2G ought to be plenty. */
>>>> While the comment here and in the public header mention compat guests,
>>>> the check is uniform. What's the deal?
>>> Well its either this, or a (comat ? INT32_MAX : INT64_MAX) check, along
>>> with the ifdefary and predicates required to make that compile.
>>>
>>> But there's not a CPU today which can actually move 2G of data (which is
>>> 4G of L1d bandwidth) without suffering the watchdog (especially as we've
>>> just read it once for strlen(), so that's 6G of bandwidth), nor do I
>>> expect this to change in the forseeable future.
>>>
>>> There's some boundary (probably far lower) beyond which we can't use the
>>> algorithm here.
>>>
>>> There wants to be some limit, and I don't feel it is necessary to make
>>> it variable on the guest type.
>> Sure. My question was merely because of the special mentioning of 32-bit /
>> compat guests. I'm fine with the universal limit, and I'd also be fine
>> with a lower (universal) bound. All I'm after is that the (to me at least)
>> confusing comments be adjusted.
> 
> How about 16k then?

Might be okay. If I was to pick a value, I'd use 64k.

Jan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] xen/version: Drop bogus return values for XENVER_platform_parameters
  2023-01-06  7:54           ` Jan Beulich
@ 2023-01-06 12:14             ` Andrew Cooper
  2023-01-09  8:06               ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2023-01-06 12:14 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Stefano Stabellini, Wei Liu, Julien Grall, Xen-devel

On 06/01/2023 7:54 am, Jan Beulich wrote:
> On 05.01.2023 23:17, Andrew Cooper wrote:
>> On 05/01/2023 7:57 am, Jan Beulich wrote:
>>> On 04.01.2023 20:55, Andrew Cooper wrote:
>>>> On 04/01/2023 4:40 pm, Jan Beulich wrote:
>>>>> On 03.01.2023 21:09, Andrew Cooper wrote:
>>>>>> A split in virtual address space is only applicable for x86 PV guests.
>>>>>> Furthermore, the information returned for x86 64bit PV guests is wrong.
>>>>>>
>>>>>> Explain the problem in version.h, stating the other information that PV guests
>>>>>> need to know.
>>>>>>
>>>>>> For 64bit PV guests, and all non-x86-PV guests, return 0, which is strictly
>>>>>> less wrong than the values currently returned.
>>>>> I disagree for the 64-bit part of this. Seeing Linux'es exposure of the
>>>>> value in sysfs I even wonder whether we can change this like you do for
>>>>> HVM. Who knows what is being inferred from the value, and by whom.
>>>> Linux's sysfs ABI isn't relevant to us here.  The sysfs ABI says it
>>>> reports what the hypervisor presents, not that it will be a nonzero number.
>>> It effectively reports the hypervisor (virtual) base address there. How
>>> can we not care if something (kexec would come to mind) would be using
>>> it for whatever purpose.
>> What about kexec do you think would care?
> I didn't think about anything specific, but I could see why it may want to
> know where in VA space Xen sits.

The kexec image doesn't run "under" Xen; it replaces Xen in memory, and
transition into the new image is via no paging (32bit) or identity
paging (64bit) in the reserved region.

We don't really support kexec load (it's there, but I don't expect
anyone has exercised it in anger), but if we were to load a new
Xen+dom0, then kexec-tools still has nothing relevant to do with
new-Xen+dom0's split.

>>>>>> + * For all guest types using hardware virt extentions, Xen is not mapped into
>>>>>> + * the guest kernel virtual address space.  This now return 0, where it
>>>>>> + * previously returned unrelated data.
>>>>>> + */
>>>>>>  #define XENVER_platform_parameters 5
>>>>>>  struct xen_platform_parameters {
>>>>>>      xen_ulong_t virt_start;
>>>>> ... the field name tells me that all that is being conveyed is the virtual
>>>>> address of where the hypervisor area starts.
>>>> IMO, it doesn't matter what the name of the field is.  It dates from the
>>>> days when 32bit PV was the only type of guest.
>>>>
>>>> 32bit PV guests really do have a variable split, so the guest kernel
>>>> really does need to get this value from Xen.
>>>>
>>>> The split for 64bit PV guests is compile-time constant, hence why 64bit
>>>> PV kernels don't care.
>>> ... once we get to run Xen in 5-level mode, 4-level PV guests could also
>>> gain a variable split: Like for 32-bit guests now, only the r/o M2P would
>>> need to live in that area, and this may well occupy less than the full
>>> range presently reserved for the hypervisor.
>> ... you can't do this, because it only works for guests which have
>> chosen to find the M2P using XENMEM_machphys_mapping (e.g. Linux), and
>> doesn't for e.g. MiniOS which does:
>>
>> #define machine_to_phys_mapping ((unsigned long *)HYPERVISOR_VIRT_START)
> Hmm, looks like a misunderstanding? I certainly wasn't thinking about
> making the start of that region variable, but rather the end (i.e. not
> exactly like for 32-bit compat).

But for what purpose?  You can't make 4-level guests have a variable range.

The best you can do is make a 5-level-aware guest running on 4-level Xen
have some new semantics, but a 4-level PV guest already owns the
overwhelming majority of virtual address space, so being able to hand
back a few extra TB is not interesting.  And for making that happen...

>>>> For compat HVM, it happens to pick up the -1 from:
>>>>
>>>> #ifdef CONFIG_PV32
>>>>     HYPERVISOR_COMPAT_VIRT_START(d) =
>>>>         is_pv_domain(d) ? __HYPERVISOR_COMPAT_VIRT_START : ~0u;
>>>> #endif
>>>>
>>>> in arch_domain_create(), whereas for non-compat HVM, it gets a number in
>>>> an address space it has no connection to in the slightest.  ARM guests
>>>> end up getting XEN_VIRT_START (== 2M) handed back, but this absolutely
>>>> an internal detail that guests have no business knowing.
>>> Well, okay, this looks to be good enough an argument to make the adjustment
>>> you propose for !PV guests.
>> Right, HVM (on all architectures) is very cut and dry.
>>
>> But it feels wrong to not address the PV64 issue at the same time
>> because it is similar level of broken, despite there being (in theory) a
>> legitimate need for a PV guest kernel to know it.
> To me it feels wrong to address the 64-bit PV issue by removing information,
> when - as you also say - it is actually _missing_ information. To me the
> proper course of action would be to expose the upper bound as well (such
> that, down the road, it could become dynamic). There's also no info leak
> there, as the two (static) bounds are part of the PV ABI anyway.

... the absolute best you could do here is introduce a new
XENVER_platform_parameters2 that has a different structure.

Which still leaves us with XENVER_platform_parameters providing data
which is somewhere between useless and actively unhelpful.

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] xen/version: Drop bogus return values for XENVER_platform_parameters
  2023-01-06 12:14             ` Andrew Cooper
@ 2023-01-09  8:06               ` Jan Beulich
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2023-01-09  8:06 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: George Dunlap, Stefano Stabellini, Wei Liu, Julien Grall, Xen-devel

On 06.01.2023 13:14, Andrew Cooper wrote:
> On 06/01/2023 7:54 am, Jan Beulich wrote:
>> On 05.01.2023 23:17, Andrew Cooper wrote:
>>> On 05/01/2023 7:57 am, Jan Beulich wrote:
>>>> On 04.01.2023 20:55, Andrew Cooper wrote:
>>>>> On 04/01/2023 4:40 pm, Jan Beulich wrote:
>>>>>> On 03.01.2023 21:09, Andrew Cooper wrote:
>>>>>>> + * For all guest types using hardware virt extentions, Xen is not mapped into
>>>>>>> + * the guest kernel virtual address space.  This now return 0, where it
>>>>>>> + * previously returned unrelated data.
>>>>>>> + */
>>>>>>>  #define XENVER_platform_parameters 5
>>>>>>>  struct xen_platform_parameters {
>>>>>>>      xen_ulong_t virt_start;
>>>>>> ... the field name tells me that all that is being conveyed is the virtual
>>>>>> address of where the hypervisor area starts.
>>>>> IMO, it doesn't matter what the name of the field is.  It dates from the
>>>>> days when 32bit PV was the only type of guest.
>>>>>
>>>>> 32bit PV guests really do have a variable split, so the guest kernel
>>>>> really does need to get this value from Xen.
>>>>>
>>>>> The split for 64bit PV guests is compile-time constant, hence why 64bit
>>>>> PV kernels don't care.
>>>> ... once we get to run Xen in 5-level mode, 4-level PV guests could also
>>>> gain a variable split: Like for 32-bit guests now, only the r/o M2P would
>>>> need to live in that area, and this may well occupy less than the full
>>>> range presently reserved for the hypervisor.
>>> ... you can't do this, because it only works for guests which have
>>> chosen to find the M2P using XENMEM_machphys_mapping (e.g. Linux), and
>>> doesn't for e.g. MiniOS which does:
>>>
>>> #define machine_to_phys_mapping ((unsigned long *)HYPERVISOR_VIRT_START)
>> Hmm, looks like a misunderstanding? I certainly wasn't thinking about
>> making the start of that region variable, but rather the end (i.e. not
>> exactly like for 32-bit compat).
> 
> But for what purpose?  You can't make 4-level guests have a variable range.

That entirely depends on the kernel. Linux, for example, could put its
direct map lower, ...

> The best you can do is make a 5-level-aware guest running on 4-level Xen
> have some new semantics, but a 4-level PV guest already owns the
> overwhelming majority of virtual address space, so being able to hand
> back a few extra TB is not interesting.  And for making that happen...

... allowing it to cover some more space. That's not a whole lot more,
sure.

>>>>> For compat HVM, it happens to pick up the -1 from:
>>>>>
>>>>> #ifdef CONFIG_PV32
>>>>>     HYPERVISOR_COMPAT_VIRT_START(d) =
>>>>>         is_pv_domain(d) ? __HYPERVISOR_COMPAT_VIRT_START : ~0u;
>>>>> #endif
>>>>>
>>>>> in arch_domain_create(), whereas for non-compat HVM, it gets a number in
>>>>> an address space it has no connection to in the slightest.  ARM guests
>>>>> end up getting XEN_VIRT_START (== 2M) handed back, but this absolutely
>>>>> an internal detail that guests have no business knowing.
>>>> Well, okay, this looks to be good enough an argument to make the adjustment
>>>> you propose for !PV guests.
>>> Right, HVM (on all architectures) is very cut and dry.
>>>
>>> But it feels wrong to not address the PV64 issue at the same time
>>> because it is similar level of broken, despite there being (in theory) a
>>> legitimate need for a PV guest kernel to know it.
>> To me it feels wrong to address the 64-bit PV issue by removing information,
>> when - as you also say - it is actually _missing_ information. To me the
>> proper course of action would be to expose the upper bound as well (such
>> that, down the road, it could become dynamic). There's also no info leak
>> there, as the two (static) bounds are part of the PV ABI anyway.
> 
> ... the absolute best you could do here is introduce a new
> XENVER_platform_parameters2 that has a different structure.

Indeed.

> Which still leaves us with XENVER_platform_parameters providing data
> which is somewhere between useless and actively unhelpful.

If it's useless, there's still no reason to remove / alter the information,
as you can never be absolutely certain that no-one uses this in some way.
And for the "actively unhelpful" aspect it looks like our views simply
differ. Is there no way we can settle on making the change affect HVM only?

Jan


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-01-09  8:07 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-03 20:09 [PATCH 0/4] Fix truncation of various XENVER_* subops Andrew Cooper
2023-01-03 20:09 ` [PATCH 1/4] public/version: Change xen_feature_info to have a fixed size Andrew Cooper
2023-01-04  9:03   ` Julien Grall
2023-01-03 20:09 ` [PATCH 2/4] xen/version: Drop compat/kernel.c Andrew Cooper
2023-01-04 16:29   ` Jan Beulich
2023-01-04 19:15     ` Andrew Cooper
2023-01-05  7:22       ` Jan Beulich
2023-01-04 16:48   ` Jan Beulich
2023-01-03 20:09 ` [PATCH 3/4] xen/version: Drop bogus return values for XENVER_platform_parameters Andrew Cooper
2023-01-04 16:40   ` Jan Beulich
2023-01-04 19:55     ` Andrew Cooper
2023-01-05  7:57       ` Jan Beulich
2023-01-05 22:17         ` Andrew Cooper
2023-01-06  7:54           ` Jan Beulich
2023-01-06 12:14             ` Andrew Cooper
2023-01-09  8:06               ` Jan Beulich
2023-01-03 20:09 ` [PATCH 4/4] xen/version: Introduce non-truncating XENVER_* subops Andrew Cooper
2023-01-03 20:47   ` Julien Grall
2023-01-03 21:22     ` Andrew Cooper
2023-01-03 21:40       ` Julien Grall
2023-01-04 17:04   ` Jan Beulich
2023-01-04 18:34     ` Andrew Cooper
2023-01-05  8:15       ` Jan Beulich
2023-01-05 22:28         ` Andrew Cooper
2023-01-06  7:56           ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.