All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/5] x86/pvh: Support relocating dom0 kernel
@ 2024-03-25 20:45 Jason Andryuk
  2024-03-25 20:45 ` [PATCH v4 1/5] Revert "xen/x86: bzImage parse kernel_alignment" Jason Andryuk
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Jason Andryuk @ 2024-03-25 20:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Juergen Gross, Julien Grall, Anthony PERARD, George Dunlap,
	Stefano Stabellini

Xen tries to load a PVH dom0 kernel at the fixed guest physical address
from the elf headers.  For Linux, this defaults to 0x1000000 (16MB), but
it can be configured.

Unfortunately there exist firmwares that have reserved regions at this
address, so Xen fails to load the dom0 kernel since it's not RAM.

The other issue is that the Linux PVH entry point is not
position-independent.  It expects to run at the compiled
CONFIG_PHYSICAL_ADDRESS.

This patch set expands the PVH dom0 builder to try to relocate the
kernel if needed and possible.  XEN_ELFNOTE_PHYS32_RELOC is added for
kernels to indicate they are relocatable and their acceptable address
range and alignment.

The first patch reverts "xen/x86: bzImage parse kernel_alignment" since
the alignment will be specified by the ELF note.

The second and third patches move MB/GB() to common-macros.h.

The fourth patch expands ELF note printing beyond just printing
integers and strings.

The fifth patch expands the pvh dom0 kernel placement code.

I'll post an additional patch showing the Linux changes to make PVH
relocatable.

Jason Andryuk (5):
  Revert "xen/x86: bzImage parse kernel_alignment"
  tools/init-xenstore-domain: Replace variable MB() usage
  tools: Move MB/GB() to common-macros.h
  libelf: Expand ELF note printing
  x86/PVH: Support relocatable dom0 kernels

 tools/firmware/hvmloader/util.h         |  3 -
 tools/helpers/init-xenstore-domain.c    | 11 ++-
 tools/include/xen-tools/common-macros.h |  4 +
 tools/libs/light/libxl_internal.h       |  4 -
 xen/arch/x86/bzimage.c                  |  4 +-
 xen/arch/x86/hvm/dom0_build.c           | 98 ++++++++++++++++++++++++-
 xen/arch/x86/include/asm/bzimage.h      |  2 +-
 xen/arch/x86/pv/dom0_build.c            |  2 +-
 xen/common/libelf/libelf-dominfo.c      | 97 ++++++++++++++++++------
 xen/common/libelf/libelf-private.h      |  1 +
 xen/include/public/elfnote.h            | 19 ++++-
 xen/include/xen/libelf.h                |  4 +
 12 files changed, 206 insertions(+), 43 deletions(-)

-- 
2.44.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v4 1/5] Revert "xen/x86: bzImage parse kernel_alignment"
  2024-03-25 20:45 [PATCH v4 0/5] x86/pvh: Support relocating dom0 kernel Jason Andryuk
@ 2024-03-25 20:45 ` Jason Andryuk
  2024-03-25 20:45 ` [PATCH v4 2/5] tools/init-xenstore-domain: Replace variable MB() usage Jason Andryuk
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Jason Andryuk @ 2024-03-25 20:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné

A new ELF note will specify the alignment for a relocatable PVH kernel.
ELF notes are suitable for vmlinux and other ELF files, so this
Linux-specific bzImage parsing in unnecessary.

This reverts commit c44cac229067faeec8f49247d1cf281723ac2d40.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
 xen/arch/x86/bzimage.c             | 4 +---
 xen/arch/x86/hvm/dom0_build.c      | 4 +---
 xen/arch/x86/include/asm/bzimage.h | 2 +-
 xen/arch/x86/pv/dom0_build.c       | 2 +-
 4 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/bzimage.c b/xen/arch/x86/bzimage.c
index 0f4cfc49f7..ac4fd428be 100644
--- a/xen/arch/x86/bzimage.c
+++ b/xen/arch/x86/bzimage.c
@@ -105,7 +105,7 @@ unsigned long __init bzimage_headroom(void *image_start,
 }
 
 int __init bzimage_parse(void *image_base, void **image_start,
-                         unsigned long *image_len, unsigned int *align)
+                         unsigned long *image_len)
 {
     struct setup_header *hdr = (struct setup_header *)(*image_start);
     int err = bzimage_check(hdr, *image_len);
@@ -118,8 +118,6 @@ int __init bzimage_parse(void *image_base, void **image_start,
     {
         *image_start += (hdr->setup_sects + 1) * 512 + hdr->payload_offset;
         *image_len = hdr->payload_length;
-        if ( align )
-            *align = hdr->kernel_alignment;
     }
 
     if ( elf_is_elfbinary(*image_start, *image_len) )
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index bbae8a5645..0ceda4140b 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -548,14 +548,12 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
     struct elf_binary elf;
     struct elf_dom_parms parms;
     paddr_t last_addr;
-    unsigned int align = 0;
     struct hvm_start_info start_info = { 0 };
     struct hvm_modlist_entry mod = { 0 };
     struct vcpu *v = d->vcpu[0];
     int rc;
 
-    rc = bzimage_parse(image_base, &image_start, &image_len, &align);
-    if ( rc != 0 )
+    if ( (rc = bzimage_parse(image_base, &image_start, &image_len)) != 0 )
     {
         printk("Error trying to detect bz compressed kernel\n");
         return rc;
diff --git a/xen/arch/x86/include/asm/bzimage.h b/xen/arch/x86/include/asm/bzimage.h
index 2e04f5cc7b..7ed69d3910 100644
--- a/xen/arch/x86/include/asm/bzimage.h
+++ b/xen/arch/x86/include/asm/bzimage.h
@@ -6,6 +6,6 @@
 unsigned long bzimage_headroom(void *image_start, unsigned long image_length);
 
 int bzimage_parse(void *image_base, void **image_start,
-                  unsigned long *image_len, unsigned int *align);
+                  unsigned long *image_len);
 
 #endif /* __X86_BZIMAGE_H__ */
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index e9fa8a9a82..d8043fa58a 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -416,7 +416,7 @@ int __init dom0_construct_pv(struct domain *d,
 
     d->max_pages = ~0U;
 
-    if ( (rc = bzimage_parse(image_base, &image_start, &image_len, NULL)) != 0 )
+    if ( (rc = bzimage_parse(image_base, &image_start, &image_len)) != 0 )
         return rc;
 
     if ( (rc = elf_init(&elf, image_start, image_len)) != 0 )
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 2/5] tools/init-xenstore-domain: Replace variable MB() usage
  2024-03-25 20:45 [PATCH v4 0/5] x86/pvh: Support relocating dom0 kernel Jason Andryuk
  2024-03-25 20:45 ` [PATCH v4 1/5] Revert "xen/x86: bzImage parse kernel_alignment" Jason Andryuk
@ 2024-03-25 20:45 ` Jason Andryuk
  2024-04-03 12:27   ` Anthony PERARD
  2024-03-25 20:45 ` [PATCH v4 3/5] tools: Move MB/GB() to common-macros.h Jason Andryuk
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Jason Andryuk @ 2024-03-25 20:45 UTC (permalink / raw)
  To: xen-devel; +Cc: Jason Andryuk, Juergen Gross, Julien Grall, Anthony PERARD

The local MB() & GB() macros will be replaced with a common
implementation, but those only work with constant values.  Introduce a
static inline mb_to_bytes() in place of the MB() macro to convert the
variable values.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
v4:
New
---
 tools/helpers/init-xenstore-domain.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/helpers/init-xenstore-domain.c b/tools/helpers/init-xenstore-domain.c
index 1683438c5c..5405842dfe 100644
--- a/tools/helpers/init-xenstore-domain.c
+++ b/tools/helpers/init-xenstore-domain.c
@@ -20,7 +20,6 @@
 #include "init-dom-json.h"
 
 #define LAPIC_BASE_ADDRESS  0xfee00000UL
-#define MB(x)               ((uint64_t)x << 20)
 #define GB(x)               ((uint64_t)x << 30)
 
 static uint32_t domid = ~0;
@@ -36,6 +35,11 @@ static xc_evtchn_port_or_error_t console_evtchn;
 static xentoollog_level minmsglevel = XTL_PROGRESS;
 static void *logger;
 
+static inline uint64_t mb_to_bytes(int mem)
+{
+	return (uint64_t)mem << 20;
+}
+
 static struct option options[] = {
     { "kernel", 1, NULL, 'k' },
     { "memory", 1, NULL, 'm' },
@@ -76,8 +80,8 @@ static int build(xc_interface *xch)
     int rv, xs_fd;
     struct xc_dom_image *dom = NULL;
     int limit_kb = (maxmem ? : memory) * 1024 + X86_HVM_NR_SPECIAL_PAGES * 4;
-    uint64_t mem_size = MB(memory);
-    uint64_t max_size = MB(maxmem ? : memory);
+    uint64_t mem_size = mb_to_bytes(memory);
+    uint64_t max_size = mb_to_bytes(maxmem ? : memory);
     struct e820entry e820[3];
     struct xen_domctl_createdomain config = {
         .ssidref = SECINITSID_DOMU,
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 3/5] tools: Move MB/GB() to common-macros.h
  2024-03-25 20:45 [PATCH v4 0/5] x86/pvh: Support relocating dom0 kernel Jason Andryuk
  2024-03-25 20:45 ` [PATCH v4 1/5] Revert "xen/x86: bzImage parse kernel_alignment" Jason Andryuk
  2024-03-25 20:45 ` [PATCH v4 2/5] tools/init-xenstore-domain: Replace variable MB() usage Jason Andryuk
@ 2024-03-25 20:45 ` Jason Andryuk
  2024-03-26  7:30   ` Jan Beulich
  2024-03-25 20:45 ` [PATCH v4 4/5] libelf: Expand ELF note printing Jason Andryuk
  2024-03-25 20:45 ` [PATCH v4 5/5] x86/PVH: Support relocatable dom0 kernels Jason Andryuk
  4 siblings, 1 reply; 14+ messages in thread
From: Jason Andryuk @ 2024-03-25 20:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Anthony PERARD, Juergen Gross, Julien Grall

Consolidate to a single set of common macros for tools.

MB() will gain another use in libelf, so this movement makes it
available.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
v4:
New
---
 tools/firmware/hvmloader/util.h         | 3 ---
 tools/helpers/init-xenstore-domain.c    | 1 -
 tools/include/xen-tools/common-macros.h | 4 ++++
 tools/libs/light/libxl_internal.h       | 4 ----
 4 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 87be213dec..14078bde1e 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -38,9 +38,6 @@ void __bug(const char *file, int line) __attribute__((noreturn));
 #define BUG() __bug(__FILE__, __LINE__)
 #define BUG_ON(p) do { if (p) BUG(); } while (0)
 
-#define MB(mb) (mb##ULL << 20)
-#define GB(gb) (gb##ULL << 30)
-
 static inline int test_bit(unsigned int b, const void *p)
 {
     return !!(((const uint8_t *)p)[b>>3] & (1u<<(b&7)));
diff --git a/tools/helpers/init-xenstore-domain.c b/tools/helpers/init-xenstore-domain.c
index 5405842dfe..f38ba8d6b5 100644
--- a/tools/helpers/init-xenstore-domain.c
+++ b/tools/helpers/init-xenstore-domain.c
@@ -20,7 +20,6 @@
 #include "init-dom-json.h"
 
 #define LAPIC_BASE_ADDRESS  0xfee00000UL
-#define GB(x)               ((uint64_t)x << 30)
 
 static uint32_t domid = ~0;
 static char *kernel;
diff --git a/tools/include/xen-tools/common-macros.h b/tools/include/xen-tools/common-macros.h
index 81fba2e9f5..07aed92684 100644
--- a/tools/include/xen-tools/common-macros.h
+++ b/tools/include/xen-tools/common-macros.h
@@ -91,6 +91,10 @@
 #define __AC(X, Y)   (X ## Y)
 #define _AC(X, Y)    __AC(X, Y)
 
+/* Size macros. */
+#define MB(_mb)     (_AC(_mb, ULL) << 20)
+#define GB(_gb)     (_AC(_gb, ULL) << 30)
+
 #define get_unaligned_t(type, ptr) ({                               \
     const struct { type x; } __packed *ptr_ = (typeof(ptr_))(ptr);  \
     ptr_->x;                                                        \
diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h
index 094d0df9b1..803dbc1a03 100644
--- a/tools/libs/light/libxl_internal.h
+++ b/tools/libs/light/libxl_internal.h
@@ -126,10 +126,6 @@
 #define PVSHIM_BASENAME "xen-shim"
 #define PVSHIM_CMDLINE "pv-shim console=xen,pv"
 
-/* Size macros. */
-#define MB(_mb)     (_AC(_mb, ULL) << 20)
-#define GB(_gb)     (_AC(_gb, ULL) << 30)
-
 #define DIV_ROUNDUP(n, d) (((n) + (d) - 1) / (d))
 
 #define LIBXL__LOGGING_ENABLED
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 4/5] libelf: Expand ELF note printing
  2024-03-25 20:45 [PATCH v4 0/5] x86/pvh: Support relocating dom0 kernel Jason Andryuk
                   ` (2 preceding siblings ...)
  2024-03-25 20:45 ` [PATCH v4 3/5] tools: Move MB/GB() to common-macros.h Jason Andryuk
@ 2024-03-25 20:45 ` Jason Andryuk
  2024-03-26  7:51   ` Jan Beulich
  2024-03-25 20:45 ` [PATCH v4 5/5] x86/PVH: Support relocatable dom0 kernels Jason Andryuk
  4 siblings, 1 reply; 14+ messages in thread
From: Jason Andryuk @ 2024-03-25 20:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini

The XEN_ELFNOTE_L1_MFN_VALID is an array of pairs of values, but it is
printed as:
(XEN) ELF: note: L1_MFN_VALID = 0

This is a limitation of only printing either string or numeric values.
Switch from the boolean to an enum which allows more flexibility in
printing the values.  Introduce ELFNOTE_NAME to only print the name
without a value like:
(XEN) ELF: note: L1_MFN_VALID

Details can optionally be printed for specific notes.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
v3:
Remove fatal size check
Don't print values - just the name for presence

v4:
Use switch
Always print newline for ELFNOTE_NAME
---
 xen/common/libelf/libelf-dominfo.c | 60 ++++++++++++++++++------------
 1 file changed, 37 insertions(+), 23 deletions(-)

diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
index a13a5e4db6..e7b44d238b 100644
--- a/xen/common/libelf/libelf-dominfo.c
+++ b/xen/common/libelf/libelf-dominfo.c
@@ -101,26 +101,30 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
 /* *INDENT-OFF* */
     static const struct {
         const char *name;
-        bool str;
+        enum {
+            ELFNOTE_INT,
+            ELFNOTE_STRING,
+            ELFNOTE_NAME,
+        } type;
     } note_desc[] = {
-        [XEN_ELFNOTE_ENTRY] = { "ENTRY", 0},
-        [XEN_ELFNOTE_HYPERCALL_PAGE] = { "HYPERCALL_PAGE", 0},
-        [XEN_ELFNOTE_VIRT_BASE] = { "VIRT_BASE", 0},
-        [XEN_ELFNOTE_INIT_P2M] = { "INIT_P2M", 0},
-        [XEN_ELFNOTE_PADDR_OFFSET] = { "PADDR_OFFSET", 0},
-        [XEN_ELFNOTE_HV_START_LOW] = { "HV_START_LOW", 0},
-        [XEN_ELFNOTE_XEN_VERSION] = { "XEN_VERSION", 1},
-        [XEN_ELFNOTE_GUEST_OS] = { "GUEST_OS", 1},
-        [XEN_ELFNOTE_GUEST_VERSION] = { "GUEST_VERSION", 1},
-        [XEN_ELFNOTE_LOADER] = { "LOADER", 1},
-        [XEN_ELFNOTE_PAE_MODE] = { "PAE_MODE", 1},
-        [XEN_ELFNOTE_FEATURES] = { "FEATURES", 1},
-        [XEN_ELFNOTE_SUPPORTED_FEATURES] = { "SUPPORTED_FEATURES", 0},
-        [XEN_ELFNOTE_BSD_SYMTAB] = { "BSD_SYMTAB", 1},
-        [XEN_ELFNOTE_L1_MFN_VALID] = { "L1_MFN_VALID", false },
-        [XEN_ELFNOTE_SUSPEND_CANCEL] = { "SUSPEND_CANCEL", 0 },
-        [XEN_ELFNOTE_MOD_START_PFN] = { "MOD_START_PFN", 0 },
-        [XEN_ELFNOTE_PHYS32_ENTRY] = { "PHYS32_ENTRY", 0 },
+        [XEN_ELFNOTE_ENTRY] = { "ENTRY", ELFNOTE_INT },
+        [XEN_ELFNOTE_HYPERCALL_PAGE] = { "HYPERCALL_PAGE", ELFNOTE_INT },
+        [XEN_ELFNOTE_VIRT_BASE] = { "VIRT_BASE", ELFNOTE_INT },
+        [XEN_ELFNOTE_INIT_P2M] = { "INIT_P2M", ELFNOTE_INT },
+        [XEN_ELFNOTE_PADDR_OFFSET] = { "PADDR_OFFSET", ELFNOTE_INT },
+        [XEN_ELFNOTE_HV_START_LOW] = { "HV_START_LOW", ELFNOTE_INT },
+        [XEN_ELFNOTE_XEN_VERSION] = { "XEN_VERSION", ELFNOTE_STRING },
+        [XEN_ELFNOTE_GUEST_OS] = { "GUEST_OS", ELFNOTE_STRING },
+        [XEN_ELFNOTE_GUEST_VERSION] = { "GUEST_VERSION", ELFNOTE_STRING },
+        [XEN_ELFNOTE_LOADER] = { "LOADER", ELFNOTE_STRING },
+        [XEN_ELFNOTE_PAE_MODE] = { "PAE_MODE", ELFNOTE_STRING },
+        [XEN_ELFNOTE_FEATURES] = { "FEATURES", ELFNOTE_STRING },
+        [XEN_ELFNOTE_SUPPORTED_FEATURES] = { "SUPPORTED_FEATURES", ELFNOTE_INT },
+        [XEN_ELFNOTE_BSD_SYMTAB] = { "BSD_SYMTAB", ELFNOTE_STRING },
+        [XEN_ELFNOTE_L1_MFN_VALID] = { "L1_MFN_VALID", ELFNOTE_NAME },
+        [XEN_ELFNOTE_SUSPEND_CANCEL] = { "SUSPEND_CANCEL", ELFNOTE_INT },
+        [XEN_ELFNOTE_MOD_START_PFN] = { "MOD_START_PFN", ELFNOTE_INT },
+        [XEN_ELFNOTE_PHYS32_ENTRY] = { "PHYS32_ENTRY", ELFNOTE_INT },
     };
 /* *INDENT-ON* */
 
@@ -136,8 +140,9 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
         return 0;
     }
 
-    if ( note_desc[type].str )
+    switch ( note_desc[type].type )
     {
+    case ELFNOTE_STRING :
         str = elf_strval(elf, elf_note_desc(elf, note));
         if (str == NULL)
             /* elf_strval will mark elf broken if it fails so no need to log */
@@ -145,13 +150,18 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
         elf_msg(elf, "ELF: note: %s = \"%s\"\n", note_desc[type].name, str);
         parms->elf_notes[type].type = XEN_ENT_STR;
         parms->elf_notes[type].data.str = str;
-    }
-    else
-    {
+        break;
+
+    case ELFNOTE_INT:
         val = elf_note_numeric(elf, note);
         elf_msg(elf, "ELF: note: %s = %#" PRIx64 "\n", note_desc[type].name, val);
         parms->elf_notes[type].type = XEN_ENT_LONG;
         parms->elf_notes[type].data.num = val;
+        break;
+
+    case ELFNOTE_NAME:
+        elf_msg(elf, "ELF: note: %s", note_desc[type].name);
+        break;
     }
     parms->elf_notes[type].name = note_desc[type].name;
 
@@ -218,6 +228,10 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
         parms->phys_entry = val;
         break;
     }
+
+    if ( note_desc[type].type == ELFNOTE_NAME)
+        elf_msg(elf, "\n");
+
     return 0;
 }
 
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 5/5] x86/PVH: Support relocatable dom0 kernels
  2024-03-25 20:45 [PATCH v4 0/5] x86/pvh: Support relocating dom0 kernel Jason Andryuk
                   ` (3 preceding siblings ...)
  2024-03-25 20:45 ` [PATCH v4 4/5] libelf: Expand ELF note printing Jason Andryuk
@ 2024-03-25 20:45 ` Jason Andryuk
  2024-03-26  7:50   ` Jan Beulich
  4 siblings, 1 reply; 14+ messages in thread
From: Jason Andryuk @ 2024-03-25 20:45 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, Julien Grall, Stefano Stabellini

Xen tries to load a PVH dom0 kernel at the fixed guest physical address
from the elf headers.  For Linux, this defaults to 0x1000000 (16MB), but
it can be configured.

Unfortunately there exist firmwares that have reserved regions at this
address, so Xen fails to load the dom0 kernel since it's not RAM.

The PVH entry code is not relocatable - it loads from absolute
addresses, which fail when the kernel is loaded at a different address.
With a suitably modified kernel, a reloctable entry point is possible.

Add XEN_ELFNOTE_PHYS32_RELOC which specifies optional minimum, maximum and
alignment needed for the kernel.  The presence of the NOTE indicates the
kernel supports a relocatable entry path.

Change the loading to check for an acceptable load address.  If the
kernel is relocatable, support finding an alternate load address.

The primary motivation for an explicit align field is that Linux has a
configurable CONFIG_PHYSICAL_ALIGN field.  This value is present in the
bzImage setup header, but not the ELF program headers p_align, which
report 2MB even when CONFIG_PHYSICAL_ALIGN is greater.  Since a kernel
is only considered relocatable if the PHYS32_RELOC elf note is present,
the alignment contraints can just be specified within the note instead
of searching for an alignment value via a heuristic.

libelf-private.h includes common-macros.h to satisfy the fuzzer build.

Link: https://gitlab.com/xen-project/xen/-/issues/180
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
ELF Note printing looks like:
(XEN) ELF: note: PHYS32_RELOC = max: 0xffffffff min: 0x1000000 align: 0x200000

v2:
Use elfnote for min, max & align - use 64bit values.
Print original and relocated memory addresses
Use check_and_adjust_load_address() name
Return relocated base instead of offset
Use PAGE_ALIGN
Don't load above max_phys (expected to be 4GB in kernel elf note)
Use single line comments
Exit check_load_address loop earlier
Add __init to find_kernel_memory()

v3:
Remove kernel_start/end page rounding
Change loop comment to rely on a sorted memory map.
Reorder E820_RAM check first
Use %p for dest_base
Use PRIpaddr
Use 32bit phys_min/max/align
Consolidate to if ( x || y ) continue
Use max_t
Add parms->phys_reloc
Use common "%pd kernel: " prefix for messages
Re-order phys_entry assignment
Print range ends inclusively
Remove extra "Unable to load kernel" message
s/PVH_RELOCATION/PHYS32_RELOC/
Make PHYS32_RELOC contents optional
Use 2MB default alignment
Update ELF Note comment
Update XEN_ELFNOTE_MAX

v4:
Cast dest_base to uintptr_t
Use local start variable
Mention e820 requiring adjacent entries merged
Remove extra whitespace
Re-word elfnote comment & mention x86
Use ELFNOTE_NAME
Return -ENOSPC
Use ! instead of == 0
Check kend - 1 to avoid off by one
libelf: Use MB/GB() to define the size.
Use ARCH_PHYS_* defines
---
 xen/arch/x86/hvm/dom0_build.c      | 94 ++++++++++++++++++++++++++++++
 xen/common/libelf/libelf-dominfo.c | 37 ++++++++++++
 xen/common/libelf/libelf-private.h |  1 +
 xen/include/public/elfnote.h       | 19 +++++-
 xen/include/xen/libelf.h           |  4 ++
 5 files changed, 154 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 0ceda4140b..d925fc7417 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -537,6 +537,97 @@ static paddr_t __init find_memory(
     return INVALID_PADDR;
 }
 
+static bool __init check_load_address(
+    const struct domain *d, const struct elf_binary *elf)
+{
+    paddr_t kernel_start = (uintptr_t)elf->dest_base;
+    paddr_t kernel_end = kernel_start + elf->dest_size;
+    unsigned int i;
+
+    /* Relies on a sorted memory map with adjacent entries merged. */
+    for ( i = 0; i < d->arch.nr_e820; i++ )
+    {
+        paddr_t start = d->arch.e820[i].addr;
+        paddr_t end = start + d->arch.e820[i].size;
+
+        if ( start >= kernel_end )
+            return false;
+
+        if ( d->arch.e820[i].type == E820_RAM &&
+             start <= kernel_start &&
+             end >= kernel_end )
+            return true;
+    }
+
+    return false;
+}
+
+/* Find an e820 RAM region that fits the kernel at a suitable alignment. */
+static paddr_t __init find_kernel_memory(
+    const struct domain *d, struct elf_binary *elf,
+    const struct elf_dom_parms *parms)
+{
+    paddr_t kernel_size = elf->dest_size;
+    unsigned int i;
+
+    for ( i = 0; i < d->arch.nr_e820; i++ )
+    {
+        paddr_t start = d->arch.e820[i].addr;
+        paddr_t end = start + d->arch.e820[i].size;
+        paddr_t kstart, kend;
+
+        if ( d->arch.e820[i].type != E820_RAM ||
+             d->arch.e820[i].size < kernel_size )
+            continue;
+
+        kstart = ROUNDUP(start, parms->phys_align);
+        kstart = max_t(paddr_t, kstart, parms->phys_min);
+        kend = kstart + kernel_size;
+
+        if ( kend - 1 > parms->phys_max )
+            return 0;
+
+        if ( kend <= end )
+            return kstart;
+    }
+
+    return 0;
+}
+
+/* Check the kernel load address, and adjust if necessary and possible. */
+static bool __init check_and_adjust_load_address(
+    const struct domain *d, struct elf_binary *elf, struct elf_dom_parms *parms)
+{
+    paddr_t reloc_base;
+
+    if ( check_load_address(d, elf) )
+        return true;
+
+    if ( !parms->phys_reloc )
+    {
+        printk("%pd kernel: Address conflict and not relocatable\n", d);
+        return false;
+    }
+
+    reloc_base = find_kernel_memory(d, elf, parms);
+    if ( !reloc_base )
+    {
+        printk("%pd kernel: Failed find a load address\n", d);
+        return false;
+    }
+
+    if ( opt_dom0_verbose )
+        printk("%pd kernel: Moving [%p, %p] -> [%"PRIpaddr", %"PRIpaddr"]\n", d,
+               elf->dest_base, elf->dest_base + elf->dest_size - 1,
+               reloc_base, reloc_base + elf->dest_size - 1);
+
+    parms->phys_entry = reloc_base +
+                            (parms->phys_entry - (uintptr_t)elf->dest_base);
+    elf->dest_base = (char *)reloc_base;
+
+    return true;
+}
+
 static int __init pvh_load_kernel(struct domain *d, const module_t *image,
                                   unsigned long image_headroom,
                                   module_t *initrd, void *image_base,
@@ -585,6 +676,9 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
     elf.dest_base = (void *)(parms.virt_kstart - parms.virt_base);
     elf.dest_size = parms.virt_kend - parms.virt_kstart;
 
+    if ( !check_and_adjust_load_address(d, &elf, &parms) )
+        return -ENOSPC;
+
     elf_set_vcpu(&elf, v);
     rc = elf_load_binary(&elf);
     if ( rc < 0 )
diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
index e7b44d238b..b47d540023 100644
--- a/xen/common/libelf/libelf-dominfo.c
+++ b/xen/common/libelf/libelf-dominfo.c
@@ -17,6 +17,16 @@
 
 #include "libelf-private.h"
 
+#if defined(__i386__) || defined(__x86_64__)
+#define ARCH_PHYS_MIN_DEFAULT   0;
+#define ARCH_PHYS_MAX_DEFAULT   (GB(4) - 1);
+#define ARCH_PHYS_ALIGN_DEFAULT MB(2);
+#else
+#define ARCH_PHYS_MIN_DEFAULT   0;
+#define ARCH_PHYS_MAX_DEFAULT   0;
+#define ARCH_PHYS_ALIGN_DEFAULT 0;
+#endif
+
 /* ------------------------------------------------------------------------ */
 /* xen features                                                             */
 
@@ -125,6 +135,7 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
         [XEN_ELFNOTE_SUSPEND_CANCEL] = { "SUSPEND_CANCEL", ELFNOTE_INT },
         [XEN_ELFNOTE_MOD_START_PFN] = { "MOD_START_PFN", ELFNOTE_INT },
         [XEN_ELFNOTE_PHYS32_ENTRY] = { "PHYS32_ENTRY", ELFNOTE_INT },
+        [XEN_ELFNOTE_PHYS32_RELOC] = { "PHYS32_RELOC", ELFNOTE_NAME },
     };
 /* *INDENT-ON* */
 
@@ -132,6 +143,7 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
     uint64_t val = 0;
     unsigned int i;
     unsigned type = elf_uval(elf, note, type);
+    unsigned descsz = elf_uval(elf, note, descsz);
 
     if ( (type >= sizeof(note_desc) / sizeof(note_desc[0])) ||
          (note_desc[type].name == NULL) )
@@ -227,6 +239,27 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
     case XEN_ELFNOTE_PHYS32_ENTRY:
         parms->phys_entry = val;
         break;
+
+    case XEN_ELFNOTE_PHYS32_RELOC:
+        parms->phys_reloc = true;
+
+        if ( descsz >= 4 )
+        {
+            parms->phys_max = elf_note_numeric_array(elf, note, 4, 0);
+            elf_msg(elf, " = max: %#"PRIx32, parms->phys_max);
+        }
+        if ( descsz >= 8 )
+        {
+            parms->phys_min = elf_note_numeric_array(elf, note, 4, 1);
+            elf_msg(elf, " min: %#"PRIx32, parms->phys_min);
+        }
+        if ( descsz >= 12 )
+        {
+            parms->phys_align = elf_note_numeric_array(elf, note, 4, 2);
+            elf_msg(elf, " align: %#"PRIx32, parms->phys_align);
+        }
+
+        break;
     }
 
     if ( note_desc[type].type == ELFNOTE_NAME)
@@ -542,6 +575,10 @@ elf_errorstatus elf_xen_parse(struct elf_binary *elf,
     parms->p2m_base = UNSET_ADDR;
     parms->elf_paddr_offset = UNSET_ADDR;
     parms->phys_entry = UNSET_ADDR32;
+    parms->phys_min = ARCH_PHYS_MIN_DEFAULT;
+    parms->phys_max = ARCH_PHYS_MAX_DEFAULT;
+    parms->phys_align = ARCH_PHYS_ALIGN_DEFAULT;
+    parms->phys_reloc = false;
 
     /* Find and parse elf notes. */
     count = elf_phdr_count(elf);
diff --git a/xen/common/libelf/libelf-private.h b/xen/common/libelf/libelf-private.h
index 47db679966..98cac65bc5 100644
--- a/xen/common/libelf/libelf-private.h
+++ b/xen/common/libelf/libelf-private.h
@@ -71,6 +71,7 @@
 #endif
 #include <xen/elfnote.h>
 #include <xen/libelf/libelf.h>
+#include <xen-tools/common-macros.h>
 
 #ifndef FUZZ_NO_LIBXC
 #include "xenctrl.h"
diff --git a/xen/include/public/elfnote.h b/xen/include/public/elfnote.h
index 8bf54d035b..ed87d5575d 100644
--- a/xen/include/public/elfnote.h
+++ b/xen/include/public/elfnote.h
@@ -194,10 +194,27 @@
  */
 #define XEN_ELFNOTE_PHYS32_ENTRY 18
 
+/*
+ * Physical loading constraints for PVH kernels
+ *
+ * The presence of this note indicates the kernel supports relocating itself.
+ *
+ * The note may include up to three 32bit values to place constraints on the
+ * guest physical loading addresses and alignment for a PVH kernel.  Values
+ * are read in the following order:
+ *  - a maximum address for the entire image to be loaded below (default
+ *      0xffffffff)
+ *  - a minimum address for the start of the image (default 0)
+ *  - a required start alignment (default 0x200000)
+ *
+ *  This note is only valid for x86 binaries.
+ */
+#define XEN_ELFNOTE_PHYS32_RELOC 19
+
 /*
  * The number of the highest elfnote defined.
  */
-#define XEN_ELFNOTE_MAX XEN_ELFNOTE_PHYS32_ENTRY
+#define XEN_ELFNOTE_MAX XEN_ELFNOTE_PHYS32_RELOC
 
 /*
  * System information exported through crash notes.
diff --git a/xen/include/xen/libelf.h b/xen/include/xen/libelf.h
index 1c77e3df31..777c5008ca 100644
--- a/xen/include/xen/libelf.h
+++ b/xen/include/xen/libelf.h
@@ -434,10 +434,14 @@ struct elf_dom_parms {
     uint32_t f_supported[XENFEAT_NR_SUBMAPS];
     uint32_t f_required[XENFEAT_NR_SUBMAPS];
     uint32_t phys_entry;
+    uint32_t phys_min;
+    uint32_t phys_max;
+    uint32_t phys_align;
 
     /* calculated */
     uint64_t virt_kstart;
     uint64_t virt_kend;
+    bool phys_reloc;
 };
 
 static inline void elf_xen_feature_set(int nr, uint32_t * addr)
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 3/5] tools: Move MB/GB() to common-macros.h
  2024-03-25 20:45 ` [PATCH v4 3/5] tools: Move MB/GB() to common-macros.h Jason Andryuk
@ 2024-03-26  7:30   ` Jan Beulich
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2024-03-26  7:30 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Andrew Cooper, Roger Pau Monné,
	Anthony PERARD, Juergen Gross, Julien Grall, xen-devel

On 25.03.2024 21:45, Jason Andryuk wrote:
> Consolidate to a single set of common macros for tools.
> 
> MB() will gain another use in libelf, so this movement makes it
> available.
> 
> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
(and perhaps also Requested-by: or some such)



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 5/5] x86/PVH: Support relocatable dom0 kernels
  2024-03-25 20:45 ` [PATCH v4 5/5] x86/PVH: Support relocatable dom0 kernels Jason Andryuk
@ 2024-03-26  7:50   ` Jan Beulich
  2024-03-26 13:24     ` Jason Andryuk
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2024-03-26  7:50 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Andrew Cooper, Roger Pau Monné,
	George Dunlap, Julien Grall, Stefano Stabellini, xen-devel

On 25.03.2024 21:45, Jason Andryuk wrote:
> +/* Find an e820 RAM region that fits the kernel at a suitable alignment. */
> +static paddr_t __init find_kernel_memory(
> +    const struct domain *d, struct elf_binary *elf,
> +    const struct elf_dom_parms *parms)
> +{
> +    paddr_t kernel_size = elf->dest_size;
> +    unsigned int i;
> +
> +    for ( i = 0; i < d->arch.nr_e820; i++ )
> +    {
> +        paddr_t start = d->arch.e820[i].addr;
> +        paddr_t end = start + d->arch.e820[i].size;
> +        paddr_t kstart, kend;
> +
> +        if ( d->arch.e820[i].type != E820_RAM ||
> +             d->arch.e820[i].size < kernel_size )
> +            continue;
> +
> +        kstart = ROUNDUP(start, parms->phys_align);
> +        kstart = max_t(paddr_t, kstart, parms->phys_min);
> +        kend = kstart + kernel_size;
> +
> +        if ( kend - 1 > parms->phys_max )
> +            return 0;
> +
> +        if ( kend <= end )
> +            return kstart;

IOW within a suitable region the lowest suitable part is selected. Often
low memory is deemed more precious than higher one, so if this choice is
indeed intentional, I'd like to ask for a brief comment towards the
reasons.

> --- a/xen/common/libelf/libelf-dominfo.c
> +++ b/xen/common/libelf/libelf-dominfo.c
> @@ -17,6 +17,16 @@
>  
>  #include "libelf-private.h"
>  
> +#if defined(__i386__) || defined(__x86_64__)
> +#define ARCH_PHYS_MIN_DEFAULT   0;
> +#define ARCH_PHYS_MAX_DEFAULT   (GB(4) - 1);
> +#define ARCH_PHYS_ALIGN_DEFAULT MB(2);
> +#else
> +#define ARCH_PHYS_MIN_DEFAULT   0;
> +#define ARCH_PHYS_MAX_DEFAULT   0;
> +#define ARCH_PHYS_ALIGN_DEFAULT 0;
> +#endif

None of the semicolons should really be here.

> @@ -227,6 +239,27 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>      case XEN_ELFNOTE_PHYS32_ENTRY:
>          parms->phys_entry = val;
>          break;
> +
> +    case XEN_ELFNOTE_PHYS32_RELOC:
> +        parms->phys_reloc = true;
> +
> +        if ( descsz >= 4 )
> +        {
> +            parms->phys_max = elf_note_numeric_array(elf, note, 4, 0);
> +            elf_msg(elf, " = max: %#"PRIx32, parms->phys_max);

As indicated before, I consider the = here a little odd.

> +        }
> +        if ( descsz >= 8 )
> +        {
> +            parms->phys_min = elf_note_numeric_array(elf, note, 4, 1);
> +            elf_msg(elf, " min: %#"PRIx32, parms->phys_min);
> +        }
> +        if ( descsz >= 12 )
> +        {
> +            parms->phys_align = elf_note_numeric_array(elf, note, 4, 2);
> +            elf_msg(elf, " align: %#"PRIx32, parms->phys_align);
> +        }

I'd like us to reconsider this ordering: I'm inclined to say that MAX isn't
the most likely one a guest may find a need to use. Instead I'd expect both
MIN and ALIGN wanting to be given higher priority; what I'm less certain
about is the ordering between the two. To keep MIN and MAX adjacent, how
about ALIGN, MIN, MAX?

> --- a/xen/include/public/elfnote.h
> +++ b/xen/include/public/elfnote.h
> @@ -194,10 +194,27 @@
>   */
>  #define XEN_ELFNOTE_PHYS32_ENTRY 18
>  
> +/*
> + * Physical loading constraints for PVH kernels
> + *
> + * The presence of this note indicates the kernel supports relocating itself.
> + *
> + * The note may include up to three 32bit values to place constraints on the
> + * guest physical loading addresses and alignment for a PVH kernel.  Values
> + * are read in the following order:
> + *  - a maximum address for the entire image to be loaded below (default
> + *      0xffffffff)

"below" isn't exactly true anymore with this now being an inclusive value.
Perhaps "up to", or perhaps more of a re-wording.

I also think the wrapped line's indentation is too deep (by 2 blanks).

> + *  - a minimum address for the start of the image (default 0)
> + *  - a required start alignment (default 0x200000)
> + *
> + *  This note is only valid for x86 binaries.

Maybe s/valid/recognized/ (or honored or some such)?

Jan


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 4/5] libelf: Expand ELF note printing
  2024-03-25 20:45 ` [PATCH v4 4/5] libelf: Expand ELF note printing Jason Andryuk
@ 2024-03-26  7:51   ` Jan Beulich
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2024-03-26  7:51 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	xen-devel

On 25.03.2024 21:45, Jason Andryuk wrote:
> @@ -145,13 +150,18 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>          elf_msg(elf, "ELF: note: %s = \"%s\"\n", note_desc[type].name, str);
>          parms->elf_notes[type].type = XEN_ENT_STR;
>          parms->elf_notes[type].data.str = str;
> -    }
> -    else
> -    {
> +        break;
> +
> +    case ELFNOTE_INT:
>          val = elf_note_numeric(elf, note);
>          elf_msg(elf, "ELF: note: %s = %#" PRIx64 "\n", note_desc[type].name, val);
>          parms->elf_notes[type].type = XEN_ENT_LONG;
>          parms->elf_notes[type].data.num = val;
> +        break;
> +
> +    case ELFNOTE_NAME:
> +        elf_msg(elf, "ELF: note: %s", note_desc[type].name);
> +        break;

As indicated before, I think a brief comment is warranted here towards
the seemingly missing newline. With that added
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 5/5] x86/PVH: Support relocatable dom0 kernels
  2024-03-26  7:50   ` Jan Beulich
@ 2024-03-26 13:24     ` Jason Andryuk
  2024-03-26 13:30       ` Jan Beulich
  2024-03-26 13:40       ` Jan Beulich
  0 siblings, 2 replies; 14+ messages in thread
From: Jason Andryuk @ 2024-03-26 13:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné,
	George Dunlap, Julien Grall, Stefano Stabellini, xen-devel

On 2024-03-26 03:50, Jan Beulich wrote:
> On 25.03.2024 21:45, Jason Andryuk wrote:
>> +/* Find an e820 RAM region that fits the kernel at a suitable alignment. */
>> +static paddr_t __init find_kernel_memory(
>> +    const struct domain *d, struct elf_binary *elf,
>> +    const struct elf_dom_parms *parms)
>> +{
>> +    paddr_t kernel_size = elf->dest_size;
>> +    unsigned int i;
>> +
>> +    for ( i = 0; i < d->arch.nr_e820; i++ )
>> +    {
>> +        paddr_t start = d->arch.e820[i].addr;
>> +        paddr_t end = start + d->arch.e820[i].size;
>> +        paddr_t kstart, kend;
>> +
>> +        if ( d->arch.e820[i].type != E820_RAM ||
>> +             d->arch.e820[i].size < kernel_size )
>> +            continue;
>> +
>> +        kstart = ROUNDUP(start, parms->phys_align);
>> +        kstart = max_t(paddr_t, kstart, parms->phys_min);
>> +        kend = kstart + kernel_size;
>> +
>> +        if ( kend - 1 > parms->phys_max )
>> +            return 0;
>> +
>> +        if ( kend <= end )
>> +            return kstart;
> 
> IOW within a suitable region the lowest suitable part is selected. Often
> low memory is deemed more precious than higher one, so if this choice is
> indeed intentional, I'd like to ask for a brief comment towards the
> reasons.

It is not particularly intentional.  I'll look into locating at a higher 
address.

>> --- a/xen/common/libelf/libelf-dominfo.c
>> +++ b/xen/common/libelf/libelf-dominfo.c
>> @@ -17,6 +17,16 @@
>>   
>>   #include "libelf-private.h"
>>   
>> +#if defined(__i386__) || defined(__x86_64__)
>> +#define ARCH_PHYS_MIN_DEFAULT   0;
>> +#define ARCH_PHYS_MAX_DEFAULT   (GB(4) - 1);
>> +#define ARCH_PHYS_ALIGN_DEFAULT MB(2);
>> +#else
>> +#define ARCH_PHYS_MIN_DEFAULT   0;
>> +#define ARCH_PHYS_MAX_DEFAULT   0;
>> +#define ARCH_PHYS_ALIGN_DEFAULT 0;
>> +#endif
> 
> None of the semicolons should really be here.

Yes, sorry.  I inadvertently retained them when reworking this.

>> @@ -227,6 +239,27 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>>       case XEN_ELFNOTE_PHYS32_ENTRY:
>>           parms->phys_entry = val;
>>           break;
>> +
>> +    case XEN_ELFNOTE_PHYS32_RELOC:
>> +        parms->phys_reloc = true;
>> +
>> +        if ( descsz >= 4 )
>> +        {
>> +            parms->phys_max = elf_note_numeric_array(elf, note, 4, 0);
>> +            elf_msg(elf, " = max: %#"PRIx32, parms->phys_max);
> 
> As indicated before, I consider the = here a little odd.

I retained = for consistency with other notes:
ELF: note: PHYS32_RELOC = max: 0x40000000 min: 0x1000000 align: 0x200000
ELF: note: PHYS32_ENTRY = 0x1000000
ELF: note: GUEST_OS = "linux"

I guess whitespace and labels makes it clear, so I'll drop the '='.

>> +        }
>> +        if ( descsz >= 8 )
>> +        {
>> +            parms->phys_min = elf_note_numeric_array(elf, note, 4, 1);
>> +            elf_msg(elf, " min: %#"PRIx32, parms->phys_min);
>> +        }
>> +        if ( descsz >= 12 )
>> +        {
>> +            parms->phys_align = elf_note_numeric_array(elf, note, 4, 2);
>> +            elf_msg(elf, " align: %#"PRIx32, parms->phys_align);
>> +        }
> 
> I'd like us to reconsider this ordering: I'm inclined to say that MAX isn't
> the most likely one a guest may find a need to use. Instead I'd expect both
> MIN and ALIGN wanting to be given higher priority; what I'm less certain
> about is the ordering between the two. To keep MIN and MAX adjacent, how
> about ALIGN, MIN, MAX?

ALIGN, MIN, MAX works for me.

On the Linux side, I'm expecting them all to be set:
ALIGN = CONFIG_PHYSICAL_ALIGN
MIN = LOAD_PHYSICAL_ADDR
MAX = KERNEL_IMAGE_SIZE

You need enough identity page tables to cover up to MAX. 
LOAD_PHYSICAL_ADDR is used as a minimum, so requesting placement above 
MIN makes sense to me.

>> --- a/xen/include/public/elfnote.h
>> +++ b/xen/include/public/elfnote.h
>> @@ -194,10 +194,27 @@
>>    */
>>   #define XEN_ELFNOTE_PHYS32_ENTRY 18
>>   
>> +/*
>> + * Physical loading constraints for PVH kernels
>> + *
>> + * The presence of this note indicates the kernel supports relocating itself.
>> + *
>> + * The note may include up to three 32bit values to place constraints on the
>> + * guest physical loading addresses and alignment for a PVH kernel.  Values
>> + * are read in the following order:
>> + *  - a maximum address for the entire image to be loaded below (default
>> + *      0xffffffff)
> 
> "below" isn't exactly true anymore with this now being an inclusive value.
> Perhaps "up to", or perhaps more of a re-wording.

Yes, good point.

> I also think the wrapped line's indentation is too deep (by 2 blanks).

Yes, thanks.

>> + *  - a minimum address for the start of the image (default 0)
>> + *  - a required start alignment (default 0x200000)
>> + *
>> + *  This note is only valid for x86 binaries.
> 
> Maybe s/valid/recognized/ (or honored or some such)?

Would a comment at the top of the file saying Notes are only used with 
x86 be better instead of this one-off comment?  Roger already said that, 
and elf_xen_note_check() has a successful early exit with "ELF: Not 
bothering with notes on ARM\n"

Thanks,
Jason


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 5/5] x86/PVH: Support relocatable dom0 kernels
  2024-03-26 13:24     ` Jason Andryuk
@ 2024-03-26 13:30       ` Jan Beulich
  2024-03-26 13:40       ` Jan Beulich
  1 sibling, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2024-03-26 13:30 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Andrew Cooper, Roger Pau Monné,
	George Dunlap, Julien Grall, Stefano Stabellini, xen-devel

On 26.03.2024 14:24, Jason Andryuk wrote:
> On 2024-03-26 03:50, Jan Beulich wrote:
>> On 25.03.2024 21:45, Jason Andryuk wrote:
>>> + *  - a minimum address for the start of the image (default 0)
>>> + *  - a required start alignment (default 0x200000)
>>> + *
>>> + *  This note is only valid for x86 binaries.
>>
>> Maybe s/valid/recognized/ (or honored or some such)?
> 
> Would a comment at the top of the file saying Notes are only used with 
> x86 be better instead of this one-off comment?  Roger already said that, 
> and elf_xen_note_check() has a successful early exit with "ELF: Not 
> bothering with notes on ARM\n"

If truly none of the notes are of interest for Arm, then yes, such a more
general comment would likely make sense.

Jan


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 5/5] x86/PVH: Support relocatable dom0 kernels
  2024-03-26 13:24     ` Jason Andryuk
  2024-03-26 13:30       ` Jan Beulich
@ 2024-03-26 13:40       ` Jan Beulich
  2024-03-26 15:41         ` Jason Andryuk
  1 sibling, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2024-03-26 13:40 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Andrew Cooper, Roger Pau Monné,
	George Dunlap, Julien Grall, Stefano Stabellini, xen-devel

On 26.03.2024 14:24, Jason Andryuk wrote:
> On 2024-03-26 03:50, Jan Beulich wrote:
>> On 25.03.2024 21:45, Jason Andryuk wrote:
>>> @@ -227,6 +239,27 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>>>       case XEN_ELFNOTE_PHYS32_ENTRY:
>>>           parms->phys_entry = val;
>>>           break;
>>> +
>>> +    case XEN_ELFNOTE_PHYS32_RELOC:
>>> +        parms->phys_reloc = true;
>>> +
>>> +        if ( descsz >= 4 )
>>> +        {
>>> +            parms->phys_max = elf_note_numeric_array(elf, note, 4, 0);
>>> +            elf_msg(elf, " = max: %#"PRIx32, parms->phys_max);
>>
>> As indicated before, I consider the = here a little odd.
> 
> I retained = for consistency with other notes:
> ELF: note: PHYS32_RELOC = max: 0x40000000 min: 0x1000000 align: 0x200000
> ELF: note: PHYS32_ENTRY = 0x1000000
> ELF: note: GUEST_OS = "linux"
> 
> I guess whitespace and labels makes it clear, so I'll drop the '='.
> 
>>> +        }
>>> +        if ( descsz >= 8 )
>>> +        {
>>> +            parms->phys_min = elf_note_numeric_array(elf, note, 4, 1);
>>> +            elf_msg(elf, " min: %#"PRIx32, parms->phys_min);
>>> +        }
>>> +        if ( descsz >= 12 )
>>> +        {
>>> +            parms->phys_align = elf_note_numeric_array(elf, note, 4, 2);
>>> +            elf_msg(elf, " align: %#"PRIx32, parms->phys_align);
>>> +        }
>>
>> I'd like us to reconsider this ordering: I'm inclined to say that MAX isn't
>> the most likely one a guest may find a need to use. Instead I'd expect both
>> MIN and ALIGN wanting to be given higher priority; what I'm less certain
>> about is the ordering between the two. To keep MIN and MAX adjacent, how
>> about ALIGN, MIN, MAX?
> 
> ALIGN, MIN, MAX works for me.
> 
> On the Linux side, I'm expecting them all to be set:
> ALIGN = CONFIG_PHYSICAL_ALIGN
> MIN = LOAD_PHYSICAL_ADDR
> MAX = KERNEL_IMAGE_SIZE
> 
> You need enough identity page tables to cover up to MAX. 
> LOAD_PHYSICAL_ADDR is used as a minimum, so requesting placement above 
> MIN makes sense to me.

Hmm, setting MIN like this means moving down is precluded. Why would it
not be possible to move a kernel to lower than the default of 16M, when
CONFIG_PHYSICAL_START can be as low as 0? (In fact, I doubt 0 would work
if chosen, but 2M surely does work, as I build some of my Dom0 kernels
that way.)

MAX, otoh, I guess really wants setting as you say, for KERNEL_IMAGE_SIZE
actually being commented upon as mis-named. Just that it now really wants
to be KERNEL_IMAGE_SIZE-1.

Jan


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 5/5] x86/PVH: Support relocatable dom0 kernels
  2024-03-26 13:40       ` Jan Beulich
@ 2024-03-26 15:41         ` Jason Andryuk
  0 siblings, 0 replies; 14+ messages in thread
From: Jason Andryuk @ 2024-03-26 15:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné,
	George Dunlap, Julien Grall, Stefano Stabellini, xen-devel

On 2024-03-26 09:40, Jan Beulich wrote:
> On 26.03.2024 14:24, Jason Andryuk wrote:
>> On 2024-03-26 03:50, Jan Beulich wrote:
>>> On 25.03.2024 21:45, Jason Andryuk wrote:
>>>> @@ -227,6 +239,27 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,

>>>> +        }
>>>> +        if ( descsz >= 8 )
>>>> +        {
>>>> +            parms->phys_min = elf_note_numeric_array(elf, note, 4, 1);
>>>> +            elf_msg(elf, " min: %#"PRIx32, parms->phys_min);
>>>> +        }
>>>> +        if ( descsz >= 12 )
>>>> +        {
>>>> +            parms->phys_align = elf_note_numeric_array(elf, note, 4, 2);
>>>> +            elf_msg(elf, " align: %#"PRIx32, parms->phys_align);
>>>> +        }
>>>
>>> I'd like us to reconsider this ordering: I'm inclined to say that MAX isn't
>>> the most likely one a guest may find a need to use. Instead I'd expect both
>>> MIN and ALIGN wanting to be given higher priority; what I'm less certain
>>> about is the ordering between the two. To keep MIN and MAX adjacent, how
>>> about ALIGN, MIN, MAX?
>>
>> ALIGN, MIN, MAX works for me.
>>
>> On the Linux side, I'm expecting them all to be set:
>> ALIGN = CONFIG_PHYSICAL_ALIGN
>> MIN = LOAD_PHYSICAL_ADDR
>> MAX = KERNEL_IMAGE_SIZE
>>
>> You need enough identity page tables to cover up to MAX.
>> LOAD_PHYSICAL_ADDR is used as a minimum, so requesting placement above
>> MIN makes sense to me.
> 
> Hmm, setting MIN like this means moving down is precluded. Why would it
> not be possible to move a kernel to lower than the default of 16M, when
> CONFIG_PHYSICAL_START can be as low as 0? (In fact, I doubt 0 would work
> if chosen, but 2M surely does work, as I build some of my Dom0 kernels
> that way.)

I successfully booted at a lower address when testing, so it's possible. 
  The bzImage early boot code uses LOAD_PHYSICAL_ADDR as a minimum for 
extracting vmlinux, so I matched that.  It's not clear to me exactly why 
that is used, though it avoids using the 16MB ISA DMA region.

Kconfig RELOCATABLE has this:
Note: If CONFIG_RELOCATABLE=y, then the kernel runs from the address
it has been loaded at and the compile time physical address
(CONFIG_PHYSICAL_START) is used as the minimum location.

Which is again why I thought to use it as MIN.

> MAX, otoh, I guess really wants setting as you say, for KERNEL_IMAGE_SIZE
> actually being commented upon as mis-named. Just that it now really wants
> to be KERNEL_IMAGE_SIZE-1.

Yes.

If placement changes to favor higher addresses, then ALIGN, MAX, MIN 
becomes a little more important since that should be consulted first.

Thanks,
Jason


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 2/5] tools/init-xenstore-domain: Replace variable MB() usage
  2024-03-25 20:45 ` [PATCH v4 2/5] tools/init-xenstore-domain: Replace variable MB() usage Jason Andryuk
@ 2024-04-03 12:27   ` Anthony PERARD
  0 siblings, 0 replies; 14+ messages in thread
From: Anthony PERARD @ 2024-04-03 12:27 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: xen-devel, Juergen Gross, Julien Grall

On Mon, Mar 25, 2024 at 04:45:12PM -0400, Jason Andryuk wrote:
> The local MB() & GB() macros will be replaced with a common
> implementation, but those only work with constant values.  Introduce a
> static inline mb_to_bytes() in place of the MB() macro to convert the
> variable values.
> 
> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> ---
> v4:
> New
> ---
>  tools/helpers/init-xenstore-domain.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/helpers/init-xenstore-domain.c b/tools/helpers/init-xenstore-domain.c
> index 1683438c5c..5405842dfe 100644
> --- a/tools/helpers/init-xenstore-domain.c
> +++ b/tools/helpers/init-xenstore-domain.c
> @@ -20,7 +20,6 @@
>  #include "init-dom-json.h"
>  
>  #define LAPIC_BASE_ADDRESS  0xfee00000UL
> -#define MB(x)               ((uint64_t)x << 20)
>  #define GB(x)               ((uint64_t)x << 30)
>  
>  static uint32_t domid = ~0;
> @@ -36,6 +35,11 @@ static xc_evtchn_port_or_error_t console_evtchn;
>  static xentoollog_level minmsglevel = XTL_PROGRESS;
>  static void *logger;
>  
> +static inline uint64_t mb_to_bytes(int mem)
> +{
> +	return (uint64_t)mem << 20;
> +}
> +
>  static struct option options[] = {
>      { "kernel", 1, NULL, 'k' },
>      { "memory", 1, NULL, 'm' },
> @@ -76,8 +80,8 @@ static int build(xc_interface *xch)
>      int rv, xs_fd;
>      struct xc_dom_image *dom = NULL;
>      int limit_kb = (maxmem ? : memory) * 1024 + X86_HVM_NR_SPECIAL_PAGES * 4;
> -    uint64_t mem_size = MB(memory);
> -    uint64_t max_size = MB(maxmem ? : memory);
> +    uint64_t mem_size = mb_to_bytes(memory);
> +    uint64_t max_size = mb_to_bytes(maxmem ? : memory);

Looks like this change actually fix a bug. When `maxmem` is set, we
would have "max_size = maxmem", otherwise, it would be 
"max_size = memory << 20"

The macro should have been written as
 #define MB(x)               ((uint64_t)(x) << 20)

This patch could be backported to 4.17 and later, with:
    Fixes: 134d53f57707 ("tools/init-xenstore-domain: fix memory map for PVH stubdom")

Anyway, this patch on it's own looks fine:
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>

Thanks,

-- 
Anthony PERARD


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-04-03 12:28 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-25 20:45 [PATCH v4 0/5] x86/pvh: Support relocating dom0 kernel Jason Andryuk
2024-03-25 20:45 ` [PATCH v4 1/5] Revert "xen/x86: bzImage parse kernel_alignment" Jason Andryuk
2024-03-25 20:45 ` [PATCH v4 2/5] tools/init-xenstore-domain: Replace variable MB() usage Jason Andryuk
2024-04-03 12:27   ` Anthony PERARD
2024-03-25 20:45 ` [PATCH v4 3/5] tools: Move MB/GB() to common-macros.h Jason Andryuk
2024-03-26  7:30   ` Jan Beulich
2024-03-25 20:45 ` [PATCH v4 4/5] libelf: Expand ELF note printing Jason Andryuk
2024-03-26  7:51   ` Jan Beulich
2024-03-25 20:45 ` [PATCH v4 5/5] x86/PVH: Support relocatable dom0 kernels Jason Andryuk
2024-03-26  7:50   ` Jan Beulich
2024-03-26 13:24     ` Jason Andryuk
2024-03-26 13:30       ` Jan Beulich
2024-03-26 13:40       ` Jan Beulich
2024-03-26 15:41         ` Jason Andryuk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.