All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/6] x86/pvh: Support relocating dom0 kernel
@ 2024-03-26 21:38 Jason Andryuk
  2024-03-26 21:38 ` [PATCH v5 1/6] Revert "xen/x86: bzImage parse kernel_alignment" Jason Andryuk
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Jason Andryuk @ 2024-03-26 21:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Juergen Gross, Julien Grall, Anthony PERARD, George Dunlap,
	Stefano Stabellini

Xen tries to load a PVH dom0 kernel at the fixed guest physical address
from the elf headers.  For Linux, this defaults to 0x1000000 (16MB), but
it can be configured.

Unfortunately there exist firmwares that have reserved regions at this
address, so Xen fails to load the dom0 kernel since it's not RAM.

The other issue is that the Linux PVH entry point is not
position-independent.  It expects to run at the compiled
CONFIG_PHYSICAL_ADDRESS.

This patch set expands the PVH dom0 builder to try to relocate the
kernel if needed and possible.  XEN_ELFNOTE_PHYS32_RELOC is added for
kernels to indicate they are relocatable and their acceptable address
range and alignment.

The first patch reverts "xen/x86: bzImage parse kernel_alignment" since
the alignment will be specified by the ELF note.

The second and third patches move MB/GB() to common-macros.h.

The fourth patch expands ELF note printing beyond just printing
integers and strings.

The fifth patch expands specifies the Xen ELF Notes are x86-specific.

The sixth patch expands the pvh dom0 kernel placement code.

I'll post an additional patch showing the Linux changes to make PVH
relocatable.

Jason Andryuk (6):
  Revert "xen/x86: bzImage parse kernel_alignment"
  tools/init-xenstore-domain: Replace variable MB() usage
  tools: Move MB/GB() to common-macros.h
  libelf: Expand ELF note printing
  xen/elfnote: Specify ELF Notes are x86-specific
  x86/PVH: Support relocatable dom0 kernels

 tools/firmware/hvmloader/util.h         |   3 -
 tools/helpers/init-xenstore-domain.c    |  11 ++-
 tools/include/xen-tools/common-macros.h |   4 +
 tools/libs/light/libxl_internal.h       |   4 -
 xen/arch/x86/bzimage.c                  |   4 +-
 xen/arch/x86/hvm/dom0_build.c           | 104 +++++++++++++++++++++++-
 xen/arch/x86/include/asm/bzimage.h      |   2 +-
 xen/arch/x86/pv/dom0_build.c            |   2 +-
 xen/common/libelf/libelf-dominfo.c      |  99 ++++++++++++++++------
 xen/common/libelf/libelf-private.h      |   1 +
 xen/include/public/elfnote.h            |  18 +++-
 xen/include/xen/libelf.h                |   4 +
 12 files changed, 213 insertions(+), 43 deletions(-)

-- 
2.44.0



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 1/6] Revert "xen/x86: bzImage parse kernel_alignment"
  2024-03-26 21:38 [PATCH v5 0/6] x86/pvh: Support relocating dom0 kernel Jason Andryuk
@ 2024-03-26 21:38 ` Jason Andryuk
  2024-03-27  7:22   ` Jan Beulich
  2024-03-26 21:38 ` [PATCH v5 2/6] tools/init-xenstore-domain: Replace variable MB() usage Jason Andryuk
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Jason Andryuk @ 2024-03-26 21:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné

A new ELF note will specify the alignment for a relocatable PVH kernel.
ELF notes are suitable for vmlinux and other ELF files, so this
Linux-specific bzImage parsing in unnecessary.

This reverts commit c44cac229067faeec8f49247d1cf281723ac2d40.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
 xen/arch/x86/bzimage.c             | 4 +---
 xen/arch/x86/hvm/dom0_build.c      | 4 +---
 xen/arch/x86/include/asm/bzimage.h | 2 +-
 xen/arch/x86/pv/dom0_build.c       | 2 +-
 4 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/bzimage.c b/xen/arch/x86/bzimage.c
index 0f4cfc49f7..ac4fd428be 100644
--- a/xen/arch/x86/bzimage.c
+++ b/xen/arch/x86/bzimage.c
@@ -105,7 +105,7 @@ unsigned long __init bzimage_headroom(void *image_start,
 }
 
 int __init bzimage_parse(void *image_base, void **image_start,
-                         unsigned long *image_len, unsigned int *align)
+                         unsigned long *image_len)
 {
     struct setup_header *hdr = (struct setup_header *)(*image_start);
     int err = bzimage_check(hdr, *image_len);
@@ -118,8 +118,6 @@ int __init bzimage_parse(void *image_base, void **image_start,
     {
         *image_start += (hdr->setup_sects + 1) * 512 + hdr->payload_offset;
         *image_len = hdr->payload_length;
-        if ( align )
-            *align = hdr->kernel_alignment;
     }
 
     if ( elf_is_elfbinary(*image_start, *image_len) )
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index bbae8a5645..0ceda4140b 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -548,14 +548,12 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
     struct elf_binary elf;
     struct elf_dom_parms parms;
     paddr_t last_addr;
-    unsigned int align = 0;
     struct hvm_start_info start_info = { 0 };
     struct hvm_modlist_entry mod = { 0 };
     struct vcpu *v = d->vcpu[0];
     int rc;
 
-    rc = bzimage_parse(image_base, &image_start, &image_len, &align);
-    if ( rc != 0 )
+    if ( (rc = bzimage_parse(image_base, &image_start, &image_len)) != 0 )
     {
         printk("Error trying to detect bz compressed kernel\n");
         return rc;
diff --git a/xen/arch/x86/include/asm/bzimage.h b/xen/arch/x86/include/asm/bzimage.h
index 2e04f5cc7b..7ed69d3910 100644
--- a/xen/arch/x86/include/asm/bzimage.h
+++ b/xen/arch/x86/include/asm/bzimage.h
@@ -6,6 +6,6 @@
 unsigned long bzimage_headroom(void *image_start, unsigned long image_length);
 
 int bzimage_parse(void *image_base, void **image_start,
-                  unsigned long *image_len, unsigned int *align);
+                  unsigned long *image_len);
 
 #endif /* __X86_BZIMAGE_H__ */
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index e9fa8a9a82..d8043fa58a 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -416,7 +416,7 @@ int __init dom0_construct_pv(struct domain *d,
 
     d->max_pages = ~0U;
 
-    if ( (rc = bzimage_parse(image_base, &image_start, &image_len, NULL)) != 0 )
+    if ( (rc = bzimage_parse(image_base, &image_start, &image_len)) != 0 )
         return rc;
 
     if ( (rc = elf_init(&elf, image_start, image_len)) != 0 )
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 2/6] tools/init-xenstore-domain: Replace variable MB() usage
  2024-03-26 21:38 [PATCH v5 0/6] x86/pvh: Support relocating dom0 kernel Jason Andryuk
  2024-03-26 21:38 ` [PATCH v5 1/6] Revert "xen/x86: bzImage parse kernel_alignment" Jason Andryuk
@ 2024-03-26 21:38 ` Jason Andryuk
  2024-03-26 21:38 ` [PATCH v5 3/6] tools: Move MB/GB() to common-macros.h Jason Andryuk
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Jason Andryuk @ 2024-03-26 21:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Jason Andryuk, Juergen Gross, Julien Grall, Anthony PERARD

The local MB() & GB() macros will be replaced with a common
implementation, but those only work with constant values.  Introduce a
static inline mb_to_bytes() in place of the MB() macro to convert the
variable values.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
v4:
New
---
 tools/helpers/init-xenstore-domain.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/helpers/init-xenstore-domain.c b/tools/helpers/init-xenstore-domain.c
index 1683438c5c..5405842dfe 100644
--- a/tools/helpers/init-xenstore-domain.c
+++ b/tools/helpers/init-xenstore-domain.c
@@ -20,7 +20,6 @@
 #include "init-dom-json.h"
 
 #define LAPIC_BASE_ADDRESS  0xfee00000UL
-#define MB(x)               ((uint64_t)x << 20)
 #define GB(x)               ((uint64_t)x << 30)
 
 static uint32_t domid = ~0;
@@ -36,6 +35,11 @@ static xc_evtchn_port_or_error_t console_evtchn;
 static xentoollog_level minmsglevel = XTL_PROGRESS;
 static void *logger;
 
+static inline uint64_t mb_to_bytes(int mem)
+{
+	return (uint64_t)mem << 20;
+}
+
 static struct option options[] = {
     { "kernel", 1, NULL, 'k' },
     { "memory", 1, NULL, 'm' },
@@ -76,8 +80,8 @@ static int build(xc_interface *xch)
     int rv, xs_fd;
     struct xc_dom_image *dom = NULL;
     int limit_kb = (maxmem ? : memory) * 1024 + X86_HVM_NR_SPECIAL_PAGES * 4;
-    uint64_t mem_size = MB(memory);
-    uint64_t max_size = MB(maxmem ? : memory);
+    uint64_t mem_size = mb_to_bytes(memory);
+    uint64_t max_size = mb_to_bytes(maxmem ? : memory);
     struct e820entry e820[3];
     struct xen_domctl_createdomain config = {
         .ssidref = SECINITSID_DOMU,
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 3/6] tools: Move MB/GB() to common-macros.h
  2024-03-26 21:38 [PATCH v5 0/6] x86/pvh: Support relocating dom0 kernel Jason Andryuk
  2024-03-26 21:38 ` [PATCH v5 1/6] Revert "xen/x86: bzImage parse kernel_alignment" Jason Andryuk
  2024-03-26 21:38 ` [PATCH v5 2/6] tools/init-xenstore-domain: Replace variable MB() usage Jason Andryuk
@ 2024-03-26 21:38 ` Jason Andryuk
  2024-03-26 21:38 ` [PATCH v5 4/6] libelf: Expand ELF note printing Jason Andryuk
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Jason Andryuk @ 2024-03-26 21:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Anthony PERARD, Juergen Gross, Julien Grall

Consolidate to a single set of common macros for tools.

MB() will gain another use in libelf, so this movement makes it
available.

Requested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
v4:
New

v5:
Add Jan's R-b & Req-b
---
 tools/firmware/hvmloader/util.h         | 3 ---
 tools/helpers/init-xenstore-domain.c    | 1 -
 tools/include/xen-tools/common-macros.h | 4 ++++
 tools/libs/light/libxl_internal.h       | 4 ----
 4 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 87be213dec..14078bde1e 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -38,9 +38,6 @@ void __bug(const char *file, int line) __attribute__((noreturn));
 #define BUG() __bug(__FILE__, __LINE__)
 #define BUG_ON(p) do { if (p) BUG(); } while (0)
 
-#define MB(mb) (mb##ULL << 20)
-#define GB(gb) (gb##ULL << 30)
-
 static inline int test_bit(unsigned int b, const void *p)
 {
     return !!(((const uint8_t *)p)[b>>3] & (1u<<(b&7)));
diff --git a/tools/helpers/init-xenstore-domain.c b/tools/helpers/init-xenstore-domain.c
index 5405842dfe..f38ba8d6b5 100644
--- a/tools/helpers/init-xenstore-domain.c
+++ b/tools/helpers/init-xenstore-domain.c
@@ -20,7 +20,6 @@
 #include "init-dom-json.h"
 
 #define LAPIC_BASE_ADDRESS  0xfee00000UL
-#define GB(x)               ((uint64_t)x << 30)
 
 static uint32_t domid = ~0;
 static char *kernel;
diff --git a/tools/include/xen-tools/common-macros.h b/tools/include/xen-tools/common-macros.h
index 81fba2e9f5..07aed92684 100644
--- a/tools/include/xen-tools/common-macros.h
+++ b/tools/include/xen-tools/common-macros.h
@@ -91,6 +91,10 @@
 #define __AC(X, Y)   (X ## Y)
 #define _AC(X, Y)    __AC(X, Y)
 
+/* Size macros. */
+#define MB(_mb)     (_AC(_mb, ULL) << 20)
+#define GB(_gb)     (_AC(_gb, ULL) << 30)
+
 #define get_unaligned_t(type, ptr) ({                               \
     const struct { type x; } __packed *ptr_ = (typeof(ptr_))(ptr);  \
     ptr_->x;                                                        \
diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h
index 094d0df9b1..803dbc1a03 100644
--- a/tools/libs/light/libxl_internal.h
+++ b/tools/libs/light/libxl_internal.h
@@ -126,10 +126,6 @@
 #define PVSHIM_BASENAME "xen-shim"
 #define PVSHIM_CMDLINE "pv-shim console=xen,pv"
 
-/* Size macros. */
-#define MB(_mb)     (_AC(_mb, ULL) << 20)
-#define GB(_gb)     (_AC(_gb, ULL) << 30)
-
 #define DIV_ROUNDUP(n, d) (((n) + (d) - 1) / (d))
 
 #define LIBXL__LOGGING_ENABLED
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 4/6] libelf: Expand ELF note printing
  2024-03-26 21:38 [PATCH v5 0/6] x86/pvh: Support relocating dom0 kernel Jason Andryuk
                   ` (2 preceding siblings ...)
  2024-03-26 21:38 ` [PATCH v5 3/6] tools: Move MB/GB() to common-macros.h Jason Andryuk
@ 2024-03-26 21:38 ` Jason Andryuk
  2024-03-27  7:27   ` Jan Beulich
  2024-03-26 21:38 ` [PATCH v5 5/6] xen/elfnote: Specify ELF Notes are x86-specific Jason Andryuk
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Jason Andryuk @ 2024-03-26 21:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini

The XEN_ELFNOTE_L1_MFN_VALID is an array of pairs of values, but it is
printed as:
(XEN) ELF: note: L1_MFN_VALID = 0

This is a limitation of only printing either string or numeric values.
Switch from the boolean to an enum which allows more flexibility in
printing the values.  Introduce ELFNOTE_NAME to only print the name
without a value like:
(XEN) ELF: note: L1_MFN_VALID

Details can optionally be printed for specific notes.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
v3:
Remove fatal size check
Don't print values - just the name for presence

v4:
Use switch
Always print newline for ELFNOTE_NAME

v5:
Add comment about newline printing
Add Jan's R-b
---
 xen/common/libelf/libelf-dominfo.c | 62 +++++++++++++++++++-----------
 1 file changed, 39 insertions(+), 23 deletions(-)

diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
index a13a5e4db6..77bd582a37 100644
--- a/xen/common/libelf/libelf-dominfo.c
+++ b/xen/common/libelf/libelf-dominfo.c
@@ -101,26 +101,30 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
 /* *INDENT-OFF* */
     static const struct {
         const char *name;
-        bool str;
+        enum {
+            ELFNOTE_INT,
+            ELFNOTE_STRING,
+            ELFNOTE_NAME,
+        } type;
     } note_desc[] = {
-        [XEN_ELFNOTE_ENTRY] = { "ENTRY", 0},
-        [XEN_ELFNOTE_HYPERCALL_PAGE] = { "HYPERCALL_PAGE", 0},
-        [XEN_ELFNOTE_VIRT_BASE] = { "VIRT_BASE", 0},
-        [XEN_ELFNOTE_INIT_P2M] = { "INIT_P2M", 0},
-        [XEN_ELFNOTE_PADDR_OFFSET] = { "PADDR_OFFSET", 0},
-        [XEN_ELFNOTE_HV_START_LOW] = { "HV_START_LOW", 0},
-        [XEN_ELFNOTE_XEN_VERSION] = { "XEN_VERSION", 1},
-        [XEN_ELFNOTE_GUEST_OS] = { "GUEST_OS", 1},
-        [XEN_ELFNOTE_GUEST_VERSION] = { "GUEST_VERSION", 1},
-        [XEN_ELFNOTE_LOADER] = { "LOADER", 1},
-        [XEN_ELFNOTE_PAE_MODE] = { "PAE_MODE", 1},
-        [XEN_ELFNOTE_FEATURES] = { "FEATURES", 1},
-        [XEN_ELFNOTE_SUPPORTED_FEATURES] = { "SUPPORTED_FEATURES", 0},
-        [XEN_ELFNOTE_BSD_SYMTAB] = { "BSD_SYMTAB", 1},
-        [XEN_ELFNOTE_L1_MFN_VALID] = { "L1_MFN_VALID", false },
-        [XEN_ELFNOTE_SUSPEND_CANCEL] = { "SUSPEND_CANCEL", 0 },
-        [XEN_ELFNOTE_MOD_START_PFN] = { "MOD_START_PFN", 0 },
-        [XEN_ELFNOTE_PHYS32_ENTRY] = { "PHYS32_ENTRY", 0 },
+        [XEN_ELFNOTE_ENTRY] = { "ENTRY", ELFNOTE_INT },
+        [XEN_ELFNOTE_HYPERCALL_PAGE] = { "HYPERCALL_PAGE", ELFNOTE_INT },
+        [XEN_ELFNOTE_VIRT_BASE] = { "VIRT_BASE", ELFNOTE_INT },
+        [XEN_ELFNOTE_INIT_P2M] = { "INIT_P2M", ELFNOTE_INT },
+        [XEN_ELFNOTE_PADDR_OFFSET] = { "PADDR_OFFSET", ELFNOTE_INT },
+        [XEN_ELFNOTE_HV_START_LOW] = { "HV_START_LOW", ELFNOTE_INT },
+        [XEN_ELFNOTE_XEN_VERSION] = { "XEN_VERSION", ELFNOTE_STRING },
+        [XEN_ELFNOTE_GUEST_OS] = { "GUEST_OS", ELFNOTE_STRING },
+        [XEN_ELFNOTE_GUEST_VERSION] = { "GUEST_VERSION", ELFNOTE_STRING },
+        [XEN_ELFNOTE_LOADER] = { "LOADER", ELFNOTE_STRING },
+        [XEN_ELFNOTE_PAE_MODE] = { "PAE_MODE", ELFNOTE_STRING },
+        [XEN_ELFNOTE_FEATURES] = { "FEATURES", ELFNOTE_STRING },
+        [XEN_ELFNOTE_SUPPORTED_FEATURES] = { "SUPPORTED_FEATURES", ELFNOTE_INT },
+        [XEN_ELFNOTE_BSD_SYMTAB] = { "BSD_SYMTAB", ELFNOTE_STRING },
+        [XEN_ELFNOTE_L1_MFN_VALID] = { "L1_MFN_VALID", ELFNOTE_NAME },
+        [XEN_ELFNOTE_SUSPEND_CANCEL] = { "SUSPEND_CANCEL", ELFNOTE_INT },
+        [XEN_ELFNOTE_MOD_START_PFN] = { "MOD_START_PFN", ELFNOTE_INT },
+        [XEN_ELFNOTE_PHYS32_ENTRY] = { "PHYS32_ENTRY", ELFNOTE_INT },
     };
 /* *INDENT-ON* */
 
@@ -136,8 +140,9 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
         return 0;
     }
 
-    if ( note_desc[type].str )
+    switch ( note_desc[type].type )
     {
+    case ELFNOTE_STRING :
         str = elf_strval(elf, elf_note_desc(elf, note));
         if (str == NULL)
             /* elf_strval will mark elf broken if it fails so no need to log */
@@ -145,13 +150,20 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
         elf_msg(elf, "ELF: note: %s = \"%s\"\n", note_desc[type].name, str);
         parms->elf_notes[type].type = XEN_ENT_STR;
         parms->elf_notes[type].data.str = str;
-    }
-    else
-    {
+        break;
+
+    case ELFNOTE_INT:
         val = elf_note_numeric(elf, note);
         elf_msg(elf, "ELF: note: %s = %#" PRIx64 "\n", note_desc[type].name, val);
         parms->elf_notes[type].type = XEN_ENT_LONG;
         parms->elf_notes[type].data.num = val;
+        break;
+
+    case ELFNOTE_NAME:
+        /* ELFNOTE_NAME has a newline printed at the end of the function to
+         * optionally allow printing customized details. */
+        elf_msg(elf, "ELF: note: %s", note_desc[type].name);
+        break;
     }
     parms->elf_notes[type].name = note_desc[type].name;
 
@@ -218,6 +230,10 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
         parms->phys_entry = val;
         break;
     }
+
+    if ( note_desc[type].type == ELFNOTE_NAME)
+        elf_msg(elf, "\n");
+
     return 0;
 }
 
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 5/6] xen/elfnote: Specify ELF Notes are x86-specific
  2024-03-26 21:38 [PATCH v5 0/6] x86/pvh: Support relocating dom0 kernel Jason Andryuk
                   ` (3 preceding siblings ...)
  2024-03-26 21:38 ` [PATCH v5 4/6] libelf: Expand ELF note printing Jason Andryuk
@ 2024-03-26 21:38 ` Jason Andryuk
  2024-03-27  7:24   ` Jan Beulich
  2024-03-26 21:38 ` [PATCH v5 6/6] x86/PVH: Support relocatable dom0 kernels Jason Andryuk
  2024-03-26 21:47 ` [PATCH v5] RFC: x86/pvh: Make Xen PVH entrypoint PIC for x86-64 Jason Andryuk
  6 siblings, 1 reply; 17+ messages in thread
From: Jason Andryuk @ 2024-03-26 21:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini

The Xen ELF Notes are only used with x86.  libelf's elf_xen_note_check()
exits early for ARM binaries with "ELF: Not bothering with notes on
ARM".

Add a comment to the top of elfnote.h specifying that Notes are only used
with x86 binaries.  This is to avoid adding disclaimers for individual
notes.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
 xen/include/public/elfnote.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/include/public/elfnote.h b/xen/include/public/elfnote.h
index 8bf54d035b..1d84b05f44 100644
--- a/xen/include/public/elfnote.h
+++ b/xen/include/public/elfnote.h
@@ -24,6 +24,8 @@
  *
  * String values (for non-legacy) are NULL terminated ASCII, also known
  * as ASCIZ type.
+ *
+ * Xen only uses ELF Notes contained in x86 binaries.
  */
 
 /*
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 6/6] x86/PVH: Support relocatable dom0 kernels
  2024-03-26 21:38 [PATCH v5 0/6] x86/pvh: Support relocating dom0 kernel Jason Andryuk
                   ` (4 preceding siblings ...)
  2024-03-26 21:38 ` [PATCH v5 5/6] xen/elfnote: Specify ELF Notes are x86-specific Jason Andryuk
@ 2024-03-26 21:38 ` Jason Andryuk
  2024-03-26 21:47 ` [PATCH v5] RFC: x86/pvh: Make Xen PVH entrypoint PIC for x86-64 Jason Andryuk
  6 siblings, 0 replies; 17+ messages in thread
From: Jason Andryuk @ 2024-03-26 21:38 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, Julien Grall, Stefano Stabellini

Xen tries to load a PVH dom0 kernel at the fixed guest physical address
from the elf headers.  For Linux, this defaults to 0x1000000 (16MB), but
it can be configured.

Unfortunately there exist firmwares that have reserved regions at this
address, so Xen fails to load the dom0 kernel since it's not RAM.

The PVH entry code is not relocatable - it loads from absolute
addresses, which fail when the kernel is loaded at a different address.
With a suitably modified kernel, a reloctable entry point is possible.

Add XEN_ELFNOTE_PHYS32_RELOC which specifies optional alignment,
minimum, and maximum addresses needed for the kernel.  The presence of
the NOTE indicates the kernel supports a relocatable entry path.

Change the loading to check for an acceptable load address.  If the
kernel is relocatable, support finding an alternate load address.

The primary motivation for an explicit align field is that Linux has a
configurable CONFIG_PHYSICAL_ALIGN field.  This value is present in the
bzImage setup header, but not the ELF program headers p_align, which
report 2MB even when CONFIG_PHYSICAL_ALIGN is greater.  Since a kernel
is only considered relocatable if the PHYS32_RELOC elf note is present,
the alignment contraints can just be specified within the note instead
of searching for an alignment value via a heuristic.

libelf-private.h includes common-macros.h to satisfy the fuzzer build.

Link: https://gitlab.com/xen-project/xen/-/issues/180
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
ELF Note printing looks like:
(XEN) ELF: note: PHYS32_RELOC align: 0x200000 min: 0x1000000 max: 0x3fffffff

v2:
Use elfnote for min, max & align - use 64bit values.
Print original and relocated memory addresses
Use check_and_adjust_load_address() name
Return relocated base instead of offset
Use PAGE_ALIGN
Don't load above max_phys (expected to be 4GB in kernel elf note)
Use single line comments
Exit check_load_address loop earlier
Add __init to find_kernel_memory()

v3:
Remove kernel_start/end page rounding
Change loop comment to rely on a sorted memory map.
Reorder E820_RAM check first
Use %p for dest_base
Use PRIpaddr
Use 32bit phys_min/max/align
Consolidate to if ( x || y ) continue
Use max_t
Add parms->phys_reloc
Use common "%pd kernel: " prefix for messages
Re-order phys_entry assignment
Print range ends inclusively
Remove extra "Unable to load kernel" message
s/PVH_RELOCATION/PHYS32_RELOC/
Make PHYS32_RELOC contents optional
Use 2MB default alignment
Update ELF Note comment
Update XEN_ELFNOTE_MAX

v4:
Cast dest_base to uintptr_t
Use local start variable
Mention e820 requiring adjacent entries merged
Remove extra whitespace
Re-word elfnote comment & mention x86
Use ELFNOTE_NAME
Return -ENOSPC
Use ! instead of == 0
Check kend - 1 to avoid off by one
libelf: Use MB/GB() to define the size.
Use ARCH_PHYS_* defines

v5:
Place kernel in higher memory addresses
Remove stray semicolons
ELFNOTE_NAME comment about newline
Make PHYS32_RELOC element order align, min, max
Re-word PHYS32_RELOC comment
Move phys_align next to other bool variables
---
 xen/arch/x86/hvm/dom0_build.c      | 100 +++++++++++++++++++++++++++++
 xen/common/libelf/libelf-dominfo.c |  41 +++++++++++-
 xen/common/libelf/libelf-private.h |   1 +
 xen/include/public/elfnote.h       |  16 ++++-
 xen/include/xen/libelf.h           |   4 ++
 5 files changed, 159 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 0ceda4140b..b34cb638da 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -537,6 +537,103 @@ static paddr_t __init find_memory(
     return INVALID_PADDR;
 }
 
+static bool __init check_load_address(
+    const struct domain *d, const struct elf_binary *elf)
+{
+    paddr_t kernel_start = (uintptr_t)elf->dest_base;
+    paddr_t kernel_end = kernel_start + elf->dest_size;
+    unsigned int i;
+
+    /* Relies on a sorted memory map with adjacent entries merged. */
+    for ( i = 0; i < d->arch.nr_e820; i++ )
+    {
+        paddr_t start = d->arch.e820[i].addr;
+        paddr_t end = start + d->arch.e820[i].size;
+
+        if ( start >= kernel_end )
+            return false;
+
+        if ( d->arch.e820[i].type == E820_RAM &&
+             start <= kernel_start &&
+             end >= kernel_end )
+            return true;
+    }
+
+    return false;
+}
+
+/* Find an e820 RAM region that fits the kernel at a suitable alignment. */
+static paddr_t __init find_kernel_memory(
+    const struct domain *d, struct elf_binary *elf,
+    const struct elf_dom_parms *parms)
+{
+    paddr_t kernel_size = elf->dest_size;
+    int i;
+
+    /* Search backwards to find the highest address. */
+    for ( i = d->arch.nr_e820 - 1; i >= 0 ; i-- )
+    {
+        paddr_t start = d->arch.e820[i].addr;
+        paddr_t end = start + d->arch.e820[i].size;
+        paddr_t kstart, kend;
+
+        if ( d->arch.e820[i].type != E820_RAM ||
+             d->arch.e820[i].size < kernel_size )
+            continue;
+
+        if ( start > parms->phys_max )
+            continue;
+
+        if ( end - 1 > parms->phys_max )
+            end = parms->phys_max + 1;
+
+        kstart = (end - kernel_size) & ~(parms->phys_align - 1);
+        kend = kstart + kernel_size;
+
+        if ( kstart < parms->phys_min )
+            return 0;
+
+        if ( kstart >= start && kend <= end )
+            return kstart;
+    }
+
+    return 0;
+}
+
+/* Check the kernel load address, and adjust if necessary and possible. */
+static bool __init check_and_adjust_load_address(
+    const struct domain *d, struct elf_binary *elf, struct elf_dom_parms *parms)
+{
+    paddr_t reloc_base;
+
+    if ( check_load_address(d, elf) )
+        return true;
+
+    if ( !parms->phys_reloc )
+    {
+        printk("%pd kernel: Address conflict and not relocatable\n", d);
+        return false;
+    }
+
+    reloc_base = find_kernel_memory(d, elf, parms);
+    if ( !reloc_base )
+    {
+        printk("%pd kernel: Failed find a load address\n", d);
+        return false;
+    }
+
+    if ( opt_dom0_verbose )
+        printk("%pd kernel: Moving [%p, %p] -> [%"PRIpaddr", %"PRIpaddr"]\n", d,
+               elf->dest_base, elf->dest_base + elf->dest_size - 1,
+               reloc_base, reloc_base + elf->dest_size - 1);
+
+    parms->phys_entry = reloc_base +
+                            (parms->phys_entry - (uintptr_t)elf->dest_base);
+    elf->dest_base = (char *)reloc_base;
+
+    return true;
+}
+
 static int __init pvh_load_kernel(struct domain *d, const module_t *image,
                                   unsigned long image_headroom,
                                   module_t *initrd, void *image_base,
@@ -585,6 +682,9 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
     elf.dest_base = (void *)(parms.virt_kstart - parms.virt_base);
     elf.dest_size = parms.virt_kend - parms.virt_kstart;
 
+    if ( !check_and_adjust_load_address(d, &elf, &parms) )
+        return -ENOSPC;
+
     elf_set_vcpu(&elf, v);
     rc = elf_load_binary(&elf);
     if ( rc < 0 )
diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
index 77bd582a37..08573c51e0 100644
--- a/xen/common/libelf/libelf-dominfo.c
+++ b/xen/common/libelf/libelf-dominfo.c
@@ -17,6 +17,16 @@
 
 #include "libelf-private.h"
 
+#if defined(__i386__) || defined(__x86_64__)
+#define ARCH_PHYS_ALIGN_DEFAULT MB(2)
+#define ARCH_PHYS_MIN_DEFAULT   0
+#define ARCH_PHYS_MAX_DEFAULT   (GB(4) - 1)
+#else
+#define ARCH_PHYS_ALIGN_DEFAULT 0
+#define ARCH_PHYS_MIN_DEFAULT   0
+#define ARCH_PHYS_MAX_DEFAULT   0
+#endif
+
 /* ------------------------------------------------------------------------ */
 /* xen features                                                             */
 
@@ -125,6 +135,7 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
         [XEN_ELFNOTE_SUSPEND_CANCEL] = { "SUSPEND_CANCEL", ELFNOTE_INT },
         [XEN_ELFNOTE_MOD_START_PFN] = { "MOD_START_PFN", ELFNOTE_INT },
         [XEN_ELFNOTE_PHYS32_ENTRY] = { "PHYS32_ENTRY", ELFNOTE_INT },
+        [XEN_ELFNOTE_PHYS32_RELOC] = { "PHYS32_RELOC", ELFNOTE_NAME },
     };
 /* *INDENT-ON* */
 
@@ -132,6 +143,7 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
     uint64_t val = 0;
     unsigned int i;
     unsigned type = elf_uval(elf, note, type);
+    unsigned descsz = elf_uval(elf, note, descsz);
 
     if ( (type >= sizeof(note_desc) / sizeof(note_desc[0])) ||
          (note_desc[type].name == NULL) )
@@ -160,8 +172,8 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
         break;
 
     case ELFNOTE_NAME:
-        /* ELFNOTE_NAME has a newline printed at the end of the function to
-         * optionally allow printing customized details. */
+        /* ELFNOTE_NAME has a newline printed at the end of the function and
+         * optionally allows printing custom formatted details below. */
         elf_msg(elf, "ELF: note: %s", note_desc[type].name);
         break;
     }
@@ -229,6 +241,27 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
     case XEN_ELFNOTE_PHYS32_ENTRY:
         parms->phys_entry = val;
         break;
+
+    case XEN_ELFNOTE_PHYS32_RELOC:
+        parms->phys_reloc = true;
+
+        if ( descsz >= 4 )
+        {
+            parms->phys_align = elf_note_numeric_array(elf, note, 4, 0);
+            elf_msg(elf, " align: %#"PRIx32, parms->phys_align);
+        }
+        if ( descsz >= 8 )
+        {
+            parms->phys_min = elf_note_numeric_array(elf, note, 4, 1);
+            elf_msg(elf, " min: %#"PRIx32, parms->phys_min);
+        }
+        if ( descsz >= 12 )
+        {
+            parms->phys_max = elf_note_numeric_array(elf, note, 4, 2);
+            elf_msg(elf, " max: %#"PRIx32, parms->phys_max);
+        }
+
+        break;
     }
 
     if ( note_desc[type].type == ELFNOTE_NAME)
@@ -544,6 +577,10 @@ elf_errorstatus elf_xen_parse(struct elf_binary *elf,
     parms->p2m_base = UNSET_ADDR;
     parms->elf_paddr_offset = UNSET_ADDR;
     parms->phys_entry = UNSET_ADDR32;
+    parms->phys_align = ARCH_PHYS_ALIGN_DEFAULT;
+    parms->phys_min = ARCH_PHYS_MIN_DEFAULT;
+    parms->phys_max = ARCH_PHYS_MAX_DEFAULT;
+    parms->phys_reloc = false;
 
     /* Find and parse elf notes. */
     count = elf_phdr_count(elf);
diff --git a/xen/common/libelf/libelf-private.h b/xen/common/libelf/libelf-private.h
index 47db679966..98cac65bc5 100644
--- a/xen/common/libelf/libelf-private.h
+++ b/xen/common/libelf/libelf-private.h
@@ -71,6 +71,7 @@
 #endif
 #include <xen/elfnote.h>
 #include <xen/libelf/libelf.h>
+#include <xen-tools/common-macros.h>
 
 #ifndef FUZZ_NO_LIBXC
 #include "xenctrl.h"
diff --git a/xen/include/public/elfnote.h b/xen/include/public/elfnote.h
index 1d84b05f44..a2f89bdc40 100644
--- a/xen/include/public/elfnote.h
+++ b/xen/include/public/elfnote.h
@@ -196,10 +196,24 @@
  */
 #define XEN_ELFNOTE_PHYS32_ENTRY 18
 
+/*
+ * Physical loading constraints for PVH kernels
+ *
+ * The presence of this note indicates the kernel supports relocating itself.
+ *
+ * The note may include up to three 32bit values to place constraints on the
+ * guest physical loading addresses and alignment for a PVH kernel.  Values
+ * are read in the following order:
+ *  - a required start alignment (default 0x200000)
+ *  - a minimum address for the start of the image (default 0)
+ *  - a maximum address for the last byte of the image (default 0xffffffff)
+ */
+#define XEN_ELFNOTE_PHYS32_RELOC 19
+
 /*
  * The number of the highest elfnote defined.
  */
-#define XEN_ELFNOTE_MAX XEN_ELFNOTE_PHYS32_ENTRY
+#define XEN_ELFNOTE_MAX XEN_ELFNOTE_PHYS32_RELOC
 
 /*
  * System information exported through crash notes.
diff --git a/xen/include/xen/libelf.h b/xen/include/xen/libelf.h
index 1c77e3df31..00e02c27cc 100644
--- a/xen/include/xen/libelf.h
+++ b/xen/include/xen/libelf.h
@@ -425,6 +425,7 @@ struct elf_dom_parms {
     enum xen_pae_type pae;
     bool bsd_symtab;
     bool unmapped_initrd;
+    bool phys_reloc;
     uint64_t virt_base;
     uint64_t virt_entry;
     uint64_t virt_hypercall;
@@ -434,6 +435,9 @@ struct elf_dom_parms {
     uint32_t f_supported[XENFEAT_NR_SUBMAPS];
     uint32_t f_required[XENFEAT_NR_SUBMAPS];
     uint32_t phys_entry;
+    uint32_t phys_align;
+    uint32_t phys_min;
+    uint32_t phys_max;
 
     /* calculated */
     uint64_t virt_kstart;
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5] RFC: x86/pvh: Make Xen PVH entrypoint PIC for x86-64
  2024-03-26 21:38 [PATCH v5 0/6] x86/pvh: Support relocating dom0 kernel Jason Andryuk
                   ` (5 preceding siblings ...)
  2024-03-26 21:38 ` [PATCH v5 6/6] x86/PVH: Support relocatable dom0 kernels Jason Andryuk
@ 2024-03-26 21:47 ` Jason Andryuk
  2024-03-27  8:20   ` Jan Beulich
  6 siblings, 1 reply; 17+ messages in thread
From: Jason Andryuk @ 2024-03-26 21:47 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Juergen Gross, Boris Ostrovsky, Jason Andryuk

The Xen PVH entrypoint is 32bit non-PIC code running at a default load
address of 0x1000000 (16MB) (CONFIG_PHYSICAL_START).  Xen loads the
kernel at that physical address inside the PVH container.

When running a PVH Dom0, the system reserved addresses are mapped 1-1
into the PVH container.  There exist system firmwares (Coreboot/EDK2)
with reserved memory at 16MB.  This creates a conflict where the PVH
kernel cannot be loaded at that address.

Modify the PVH entrypoint to be position-indepedent to allow flexibility
in load address.  Only the 64bit entry path is converted.  A 32bit
kernel is not PIC, so calling into other parts of the kernel, like
xen_prepare_pvh() and mk_pgtable_32(), don't work properly when
relocated.

Initial PVH entry runs at the physical addresses and then transitions to
the identity mapped address.  While executing xen_prepare_pvh() calls
through pv_ops function pointers transition to the high mapped
addresses.  Additionally, __va() is called on some hvm_start_info
physical addresses, we need the directmap address range is used.  So we
need to run page tables with all of those ranges mapped.

Modifying init_top_pgt tables ran into issue since
startup_64/__startup_64() will modify those page tables again.  Use a
dedicated set of page tables - pvh_init_top_pgt  - for the PVH entry to
avoid unwanted interactions.

In xen_pvh_init(), __pa() is called to find the physical address of the
hypercall page.  Set phys_base temporarily before calling into
xen_prepare_pvh(), which calls xen_pvh_init(), and clear it afterwards.
__startup_64() assumes phys_base is zero and adds load_delta to it.  If
phys_base is already set, the calculation results in an incorrect cr3.

TODO: Sync elfnote.h from xen.git commit xxxxxxxxxx when it is
commited.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
Put this out as an example for the Xen modifications

Instead of setting and clearing phys_base, add a dedicated variable?
Clearing phys_base is a little weird, but it leaves the kernel more
consistent when running non-entry code.

Make __startup_64() exit if phys_base is already set to allow calling
multiple times, and use that and init_top_pgt instead of adding
additional page tables?  That won't work.  __startup_64 is 64bit code,
and pvh needs to create page tables in 32bit code before it can
transition to 64bit long mode.  Hence it can't be re-used to relocate
page tables.
---
 arch/x86/platform/pvh/head.S    | 184 +++++++++++++++++++++++++++++---
 include/xen/interface/elfnote.h |  18 +++-
 2 files changed, 189 insertions(+), 13 deletions(-)

diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index f7235ef87bc3..13cfd4a35462 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -50,11 +50,32 @@
 #define PVH_CS_SEL		(PVH_GDT_ENTRY_CS * 8)
 #define PVH_DS_SEL		(PVH_GDT_ENTRY_DS * 8)
 
+#define rva(x) ((x) - pvh_start_xen)
+
 SYM_CODE_START_LOCAL(pvh_start_xen)
 	UNWIND_HINT_END_OF_STACK
 	cld
 
-	lgdt (_pa(gdt))
+	/*
+	 * See the comment for startup_32 for more details.  We need to
+	 * execute a call to get the execution address to be position
+	 * independent, but we don't have a stack.  Save and restore the
+	 * magic field of start_info in ebx, and use that as the stack.
+	 */
+	mov	(%ebx), %eax
+	leal	4(%ebx), %esp
+	ANNOTATE_INTRA_FUNCTION_CALL
+	call	1f
+1:	popl	%ebp
+	mov	%eax, (%ebx)
+	subl	$ rva(1b), %ebp
+	movl	$0, %esp
+
+	leal	rva(gdt)(%ebp), %eax
+	movl	%eax, %ecx
+	leal	rva(gdt_start)(%ebp), %ecx
+	movl	%ecx, 2(%eax)
+	lgdt	(%eax)
 
 	mov $PVH_DS_SEL,%eax
 	mov %eax,%ds
@@ -62,14 +83,14 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
 	mov %eax,%ss
 
 	/* Stash hvm_start_info. */
-	mov $_pa(pvh_start_info), %edi
+	leal rva(pvh_start_info)(%ebp), %edi
 	mov %ebx, %esi
-	mov _pa(pvh_start_info_sz), %ecx
+	movl rva(pvh_start_info_sz)(%ebp), %ecx
 	shr $2,%ecx
 	rep
 	movsl
 
-	mov $_pa(early_stack_end), %esp
+	leal rva(early_stack_end)(%ebp), %esp
 
 	/* Enable PAE mode. */
 	mov %cr4, %eax
@@ -83,29 +104,83 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
 	btsl $_EFER_LME, %eax
 	wrmsr
 
+	mov %ebp, %ebx
+	subl $LOAD_PHYSICAL_ADDR, %ebx /* offset */
+	jz .Lpagetable_done
+
+	/* Fixup page-tables for relocation. */
+	leal rva(pvh_init_top_pgt)(%ebp), %edi
+	movl $512, %ecx
+2:
+	movl 0x00(%edi), %eax
+	addl 0x04(%edi), %eax
+	jz 1f
+	addl %ebx, 0x00(%edi)
+1:
+	addl $8, %edi
+	decl %ecx
+	jnz 2b
+
+	/* L3 ident has a single entry. */
+	leal rva(pvh_level3_ident_pgt)(%ebp), %edi
+	addl %ebx, 0x00(%edi)
+
+	leal rva(pvh_level3_kernel_pgt)(%ebp), %edi
+	addl %ebx, (4096 - 16)(%edi)
+	addl %ebx, (4096 - 8)(%edi)
+
+	/* pvh_level2_ident_pgt is fine - large pages */
+
+	/* pvh_level2_kernel_pgt needs adjustment - large pages */
+	leal rva(pvh_level2_kernel_pgt)(%ebp), %edi
+	movl $512, %ecx
+2:
+	movl 0x00(%edi), %eax
+	addl 0x04(%edi), %eax
+	jz 1f
+	addl %ebx, 0x00(%edi)
+1:
+	addl $8, %edi
+	decl %ecx
+	jnz 2b
+
+.Lpagetable_done:
 	/* Enable pre-constructed page tables. */
-	mov $_pa(init_top_pgt), %eax
+	leal rva(pvh_init_top_pgt)(%ebp), %eax
 	mov %eax, %cr3
 	mov $(X86_CR0_PG | X86_CR0_PE), %eax
 	mov %eax, %cr0
 
 	/* Jump to 64-bit mode. */
-	ljmp $PVH_CS_SEL, $_pa(1f)
+	pushl $PVH_CS_SEL
+	leal  rva(1f)(%ebp), %eax
+	pushl %eax
+	lretl
 
 	/* 64-bit entry point. */
 	.code64
 1:
 	/* Set base address in stack canary descriptor. */
 	mov $MSR_GS_BASE,%ecx
-	mov $_pa(canary), %eax
+	leal rva(canary)(%ebp), %eax
 	xor %edx, %edx
 	wrmsr
 
+	/* Calculate load offset from LOAD_PHYSICAL_ADDR and store in
+	 * phys_base.  __pa() needs phys_base set to calculate the the
+	 * hypercall page in xen_pvh_init(). */
+	movq %rbp, %rbx
+	subq $LOAD_PHYSICAL_ADDR, %rbx
+	movq %rbx, phys_base(%rip)
 	call xen_prepare_pvh
+	/* Clear phys_base.  startup_64/__startup_64 will *add* to its value,
+	   so start from 0. */
+	xor  %rbx, %rbx
+	movq %rbx, phys_base(%rip)
 
 	/* startup_64 expects boot_params in %rsi. */
-	mov $_pa(pvh_bootparams), %rsi
-	mov $_pa(startup_64), %rax
+	lea rva(pvh_bootparams)(%ebp), %rsi
+	lea rva(startup_64)(%ebp), %rax
 	ANNOTATE_RETPOLINE_SAFE
 	jmp *%rax
 
@@ -137,13 +212,14 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
 
 	ljmp $PVH_CS_SEL, $_pa(startup_32)
 #endif
+
 SYM_CODE_END(pvh_start_xen)
 
 	.section ".init.data","aw"
 	.balign 8
 SYM_DATA_START_LOCAL(gdt)
-	.word gdt_end - gdt_start
-	.long _pa(gdt_start)
+	.word gdt_end - gdt_start - 1
+	.long _pa(gdt_start) /* x86-64 will overwrite if relocated. */
 	.word 0
 SYM_DATA_END(gdt)
 SYM_DATA_START_LOCAL(gdt_start)
@@ -163,5 +239,89 @@ SYM_DATA_START_LOCAL(early_stack)
 	.fill BOOT_STACK_SIZE, 1, 0
 SYM_DATA_END_LABEL(early_stack, SYM_L_LOCAL, early_stack_end)
 
+#ifdef CONFIG_X86_64
+/*
+ * We are not able to switch in one step to the final KERNEL ADDRESS SPACE
+ * because we need identity-mapped pages.
+ */
+#define l4_index(x)     (((x) >> 39) & 511)
+#define pud_index(x)    (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
+
+L4_PAGE_OFFSET  = l4_index(__PAGE_OFFSET_BASE_L4)
+L4_START_KERNEL = l4_index(__START_KERNEL_map)
+L3_START_KERNEL = pud_index(__START_KERNEL_map)
+
+#define SYM_DATA_START_PAGE_ALIGNED(name)			\
+	SYM_START(name, SYM_L_GLOBAL, .balign PAGE_SIZE)
+
+/* Automate the creation of 1 to 1 mapping pmd entries */
+#define PMDS(START, PERM, COUNT)			\
+	i = 0 ;						\
+	.rept (COUNT) ;					\
+	.quad	(START) + (i << PMD_SHIFT) + (PERM) ;	\
+	i = i + 1 ;					\
+	.endr
+
+/*
+ * Xen PVH needs a set of identity mapped and kernel high mapping
+ * page tables.  pvh_start_xen starts running on the identity mapped
+ * page tables, but xen_prepare_pvh calls into the high mapping.
+ * These page tables need to be relocatable and are only used until
+ * startup_64 transitions to init_top_pgt.
+ */
+SYM_DATA_START_PAGE_ALIGNED(pvh_init_top_pgt)
+	.quad   pvh_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+	.org    pvh_init_top_pgt + L4_PAGE_OFFSET*8, 0
+	.quad   pvh_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+	.org    pvh_init_top_pgt + L4_START_KERNEL*8, 0
+	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
+	.quad   pvh_level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
+SYM_DATA_END(pvh_init_top_pgt)
+
+SYM_DATA_START_PAGE_ALIGNED(pvh_level3_ident_pgt)
+	.quad	pvh_level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+	.fill	511, 8, 0
+SYM_DATA_END(pvh_level3_ident_pgt)
+SYM_DATA_START_PAGE_ALIGNED(pvh_level2_ident_pgt)
+	/*
+	 * Since I easily can, map the first 1G.
+	 * Don't set NX because code runs from these pages.
+	 *
+	 * Note: This sets _PAGE_GLOBAL despite whether
+	 * the CPU supports it or it is enabled.  But,
+	 * the CPU should ignore the bit.
+	 */
+	PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
+SYM_DATA_END(pvh_level2_ident_pgt)
+SYM_DATA_START_PAGE_ALIGNED(pvh_level3_kernel_pgt)
+	.fill	L3_START_KERNEL,8,0
+	/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
+	.quad	pvh_level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+	.quad	0 /* no fixmap */
+SYM_DATA_END(pvh_level3_kernel_pgt)
+
+SYM_DATA_START_PAGE_ALIGNED(pvh_level2_kernel_pgt)
+	/*
+	 * Kernel high mapping.
+	 *
+	 * The kernel code+data+bss must be located below KERNEL_IMAGE_SIZE in
+	 * virtual address space, which is 1 GiB if RANDOMIZE_BASE is enabled,
+	 * 512 MiB otherwise.
+	 *
+	 * (NOTE: after that starts the module area, see MODULES_VADDR.)
+	 *
+	 * This table is eventually used by the kernel during normal runtime.
+	 * Care must be taken to clear out undesired bits later, like _PAGE_RW
+	 * or _PAGE_GLOBAL in some cases.
+	 */
+	PMDS(0, __PAGE_KERNEL_LARGE_EXEC, KERNEL_IMAGE_SIZE/PMD_SIZE)
+SYM_DATA_END(pvh_level2_kernel_pgt)
+
+	ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_RELOC,
+		     .long CONFIG_PHYSICAL_ALIGN;
+		     .long LOAD_PHYSICAL_ADDR;
+		     .long KERNEL_IMAGE_SIZE - 1)
+#endif
+
 	ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY,
-	             _ASM_PTR (pvh_start_xen - __START_KERNEL_map))
+	             .long (pvh_start_xen - __START_KERNEL_map))
diff --git a/include/xen/interface/elfnote.h b/include/xen/interface/elfnote.h
index 38deb1214613..9ebd4e79bb41 100644
--- a/include/xen/interface/elfnote.h
+++ b/include/xen/interface/elfnote.h
@@ -185,9 +185,25 @@
  */
 #define XEN_ELFNOTE_PHYS32_ENTRY 18
 
+/*
+ * Physical loading constraints for PVH kernels
+ *
+ * Used to place constraints on the guest physical loading addresses and
+ * alignment for a PVH kernel.
+ *
+ * The presence of this note indicates the kernel supports relocating itself.
+ *
+ * The note may include up to three 32bit values in the following order:
+ *  - a maximum address for the entire image to be loaded below (default
+ *      0xffffffff)
+ *  - a minimum address for the start of the image (default 0)
+ *  - a required start alignment (default 0x200000)
+ */
+#define XEN_ELFNOTE_PHYS32_RELOC 19
+
 /*
  * The number of the highest elfnote defined.
  */
-#define XEN_ELFNOTE_MAX XEN_ELFNOTE_PHYS32_ENTRY
+#define XEN_ELFNOTE_MAX XEN_ELFNOTE_PHYS32_RELOC
 
 #endif /* __XEN_PUBLIC_ELFNOTE_H__ */
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 1/6] Revert "xen/x86: bzImage parse kernel_alignment"
  2024-03-26 21:38 ` [PATCH v5 1/6] Revert "xen/x86: bzImage parse kernel_alignment" Jason Andryuk
@ 2024-03-27  7:22   ` Jan Beulich
  2024-03-27  8:59     ` Roger Pau Monné
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2024-03-27  7:22 UTC (permalink / raw)
  To: Jason Andryuk, Roger Pau Monné; +Cc: Andrew Cooper, xen-devel

On 26.03.2024 22:38, Jason Andryuk wrote:
> A new ELF note will specify the alignment for a relocatable PVH kernel.
> ELF notes are suitable for vmlinux and other ELF files, so this
> Linux-specific bzImage parsing in unnecessary.
> 
> This reverts commit c44cac229067faeec8f49247d1cf281723ac2d40.
> 
> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>

Since you keep re-sending this: In private discussion Roger has indicated
that, like me, he too would prefer falling back to the ELF data, before
falling back to the arch default (Roger, please correct me if I got it
wrong). That would make it necessary that the change you're proposing to
revert here is actually kept.

Or wait - what you're reverting is taking the alignment out of the
bzImage header. I don't expect the BSDs to use that protocol; aiui that's
entirely Linux-specific.

I further meanwhile realized that consulting the ELF phdrs may also be
ambiguous, as there may be more than one. I guess it would need to be the
maximum of all of them then.

Jan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 5/6] xen/elfnote: Specify ELF Notes are x86-specific
  2024-03-26 21:38 ` [PATCH v5 5/6] xen/elfnote: Specify ELF Notes are x86-specific Jason Andryuk
@ 2024-03-27  7:24   ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2024-03-27  7:24 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	xen-devel

On 26.03.2024 22:38, Jason Andryuk wrote:
> The Xen ELF Notes are only used with x86.  libelf's elf_xen_note_check()
> exits early for ARM binaries with "ELF: Not bothering with notes on
> ARM".
> 
> Add a comment to the top of elfnote.h specifying that Notes are only used
> with x86 binaries.  This is to avoid adding disclaimers for individual
> notes.
> 
> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>

Acked-by: Jan Beulich <jbeulich@suse.com>




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 4/6] libelf: Expand ELF note printing
  2024-03-26 21:38 ` [PATCH v5 4/6] libelf: Expand ELF note printing Jason Andryuk
@ 2024-03-27  7:27   ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2024-03-27  7:27 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	xen-devel

On 26.03.2024 22:38, Jason Andryuk wrote:
> @@ -145,13 +150,20 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>          elf_msg(elf, "ELF: note: %s = \"%s\"\n", note_desc[type].name, str);
>          parms->elf_notes[type].type = XEN_ENT_STR;
>          parms->elf_notes[type].data.str = str;
> -    }
> -    else
> -    {
> +        break;
> +
> +    case ELFNOTE_INT:
>          val = elf_note_numeric(elf, note);
>          elf_msg(elf, "ELF: note: %s = %#" PRIx64 "\n", note_desc[type].name, val);
>          parms->elf_notes[type].type = XEN_ENT_LONG;
>          parms->elf_notes[type].data.num = val;
> +        break;
> +
> +    case ELFNOTE_NAME:
> +        /* ELFNOTE_NAME has a newline printed at the end of the function to
> +         * optionally allow printing customized details. */
> +        elf_msg(elf, "ELF: note: %s", note_desc[type].name);
> +        break;

Well. I said "brief comment" for several reasons. One of them being that
it would best fit on a single line. Since now it doesn't, I have to point
out that this way comment style is violated.

/* NB: Newline emitted further down. */

?

Jan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5] RFC: x86/pvh: Make Xen PVH entrypoint PIC for x86-64
  2024-03-26 21:47 ` [PATCH v5] RFC: x86/pvh: Make Xen PVH entrypoint PIC for x86-64 Jason Andryuk
@ 2024-03-27  8:20   ` Jan Beulich
  2024-03-27 14:15     ` Jason Andryuk
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2024-03-27  8:20 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Roger Pau Monné,
	Andrew Cooper, Juergen Gross, Boris Ostrovsky, xen-devel

On 26.03.2024 22:47, Jason Andryuk wrote:
> --- a/include/xen/interface/elfnote.h
> +++ b/include/xen/interface/elfnote.h
> @@ -185,9 +185,25 @@
>   */
>  #define XEN_ELFNOTE_PHYS32_ENTRY 18
>  
> +/*
> + * Physical loading constraints for PVH kernels
> + *
> + * Used to place constraints on the guest physical loading addresses and
> + * alignment for a PVH kernel.
> + *
> + * The presence of this note indicates the kernel supports relocating itself.
> + *
> + * The note may include up to three 32bit values in the following order:
> + *  - a maximum address for the entire image to be loaded below (default
> + *      0xffffffff)
> + *  - a minimum address for the start of the image (default 0)
> + *  - a required start alignment (default 0x200000)

This looks to be stale now.

Jan



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 1/6] Revert "xen/x86: bzImage parse kernel_alignment"
  2024-03-27  7:22   ` Jan Beulich
@ 2024-03-27  8:59     ` Roger Pau Monné
  2024-03-27 14:08       ` Jason Andryuk
  0 siblings, 1 reply; 17+ messages in thread
From: Roger Pau Monné @ 2024-03-27  8:59 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jason Andryuk, Andrew Cooper, xen-devel

On Wed, Mar 27, 2024 at 08:22:41AM +0100, Jan Beulich wrote:
> On 26.03.2024 22:38, Jason Andryuk wrote:
> > A new ELF note will specify the alignment for a relocatable PVH kernel.
> > ELF notes are suitable for vmlinux and other ELF files, so this
> > Linux-specific bzImage parsing in unnecessary.
> > 
> > This reverts commit c44cac229067faeec8f49247d1cf281723ac2d40.
> > 
> > Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> 
> Since you keep re-sending this: In private discussion Roger has indicated
> that, like me, he too would prefer falling back to the ELF data, before
> falling back to the arch default (Roger, please correct me if I got it
> wrong). That would make it necessary that the change you're proposing to
> revert here is actually kept.

Sorry, was meaning to reply yesterday but Jason is very fast at
sending new version so I'm always one version behind.

IMO the order: ELF note, PHDR alignment, arch default should be the
preferred one.

> Or wait - what you're reverting is taking the alignment out of the
> bzImage header. I don't expect the BSDs to use that protocol; aiui that's
> entirely Linux-specific.

Yeah, I don't have strong opinions in keeping this, we already do
bzImage parsing, so we might as well attempt to fetch the alignment
from there if correct:

ELF note, bzImage kernel_alignment, ELF PHDR alignment, arch default

> I further meanwhile realized that consulting the ELF phdrs may also be
> ambiguous, as there may be more than one. I guess it would need to be the
> maximum of all of them then.

My suggestion (not sure if I mentioned this before) was to use the
alignment of the first LOAD PHDR, which is the one that defines the
value of the dest_base field used as the image load start address.

Using the maximum of all load PHDRs might be safer.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 1/6] Revert "xen/x86: bzImage parse kernel_alignment"
  2024-03-27  8:59     ` Roger Pau Monné
@ 2024-03-27 14:08       ` Jason Andryuk
  2024-03-27 14:19         ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Andryuk @ 2024-03-27 14:08 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich; +Cc: Andrew Cooper, xen-devel

On 2024-03-27 04:59, Roger Pau Monné wrote:
> On Wed, Mar 27, 2024 at 08:22:41AM +0100, Jan Beulich wrote:
>> On 26.03.2024 22:38, Jason Andryuk wrote:
>>> A new ELF note will specify the alignment for a relocatable PVH kernel.
>>> ELF notes are suitable for vmlinux and other ELF files, so this
>>> Linux-specific bzImage parsing in unnecessary.
>>>
>>> This reverts commit c44cac229067faeec8f49247d1cf281723ac2d40.
>>>
>>> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
>>
>> Since you keep re-sending this: In private discussion Roger has indicated
>> that, like me, he too would prefer falling back to the ELF data, before
>> falling back to the arch default (Roger, please correct me if I got it
>> wrong). That would make it necessary that the change you're proposing to
>> revert here is actually kept.
> 
> Sorry, was meaning to reply yesterday but Jason is very fast at
> sending new version so I'm always one version behind.

:)

I was hoping to finish this up and get it in...

> IMO the order: ELF note, PHDR alignment, arch default should be the
> preferred one.
> 
>> Or wait - what you're reverting is taking the alignment out of the
>> bzImage header. I don't expect the BSDs to use that protocol; aiui that's
>> entirely Linux-specific.
> 
> Yeah, I don't have strong opinions in keeping this, we already do
> bzImage parsing, so we might as well attempt to fetch the alignment
> from there if correct:
> 
> ELF note, bzImage kernel_alignment, ELF PHDR alignment, arch default

I'm not sure how to handle ELF PHDR vs. arch default.  ELF PHDR will 
always be set, AFAIU.  Should that always be respected, which means we 
don't need an arch default?

To include arch default, it would be something like this:

     if ( parms->phys_align != UNSET_ADDR )
         align = parms->phys_align;
     else if ( bz_align )
         align = bz_align;
     else if ( elf->palign > PHYS32_RELOC_ALIGN_DEFAULT )
         align = elf->palign;
     else
         align = PHYS32_RELOC_ALIGN_DEFAULT;


>> I further meanwhile realized that consulting the ELF phdrs may also be
>> ambiguous, as there may be more than one. I guess it would need to be the
>> maximum of all of them then.
> 
> My suggestion (not sure if I mentioned this before) was to use the
> alignment of the first LOAD PHDR, which is the one that defines the
> value of the dest_base field used as the image load start address.
> 
> Using the maximum of all load PHDRs might be safer.

I'll find the maximum.

Thanks,
Jason


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5] RFC: x86/pvh: Make Xen PVH entrypoint PIC for x86-64
  2024-03-27  8:20   ` Jan Beulich
@ 2024-03-27 14:15     ` Jason Andryuk
  0 siblings, 0 replies; 17+ messages in thread
From: Jason Andryuk @ 2024-03-27 14:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	Andrew Cooper, Juergen Gross, Boris Ostrovsky, xen-devel

On 2024-03-27 04:20, Jan Beulich wrote:
> On 26.03.2024 22:47, Jason Andryuk wrote:
>> --- a/include/xen/interface/elfnote.h
>> +++ b/include/xen/interface/elfnote.h
>> @@ -185,9 +185,25 @@
>>    */
>>   #define XEN_ELFNOTE_PHYS32_ENTRY 18
>>   
>> +/*
>> + * Physical loading constraints for PVH kernels
>> + *
>> + * Used to place constraints on the guest physical loading addresses and
>> + * alignment for a PVH kernel.
>> + *
>> + * The presence of this note indicates the kernel supports relocating itself.
>> + *
>> + * The note may include up to three 32bit values in the following order:
>> + *  - a maximum address for the entire image to be loaded below (default
>> + *      0xffffffff)
>> + *  - a minimum address for the start of the image (default 0)
>> + *  - a required start alignment (default 0x200000)
> 
> This looks to be stale now.

Yes - I did not update this file.

The patch is functional as the ELF Note fields in 
arch/x86/platform/pvh/head.S were re-ordered to match the Xen side.

Regards,
Jason


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 1/6] Revert "xen/x86: bzImage parse kernel_alignment"
  2024-03-27 14:08       ` Jason Andryuk
@ 2024-03-27 14:19         ` Jan Beulich
  2024-03-27 14:49           ` Jason Andryuk
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2024-03-27 14:19 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: Andrew Cooper, xen-devel, Roger Pau Monné

On 27.03.2024 15:08, Jason Andryuk wrote:
> On 2024-03-27 04:59, Roger Pau Monné wrote:
>> On Wed, Mar 27, 2024 at 08:22:41AM +0100, Jan Beulich wrote:
>>> On 26.03.2024 22:38, Jason Andryuk wrote:
>>>> A new ELF note will specify the alignment for a relocatable PVH kernel.
>>>> ELF notes are suitable for vmlinux and other ELF files, so this
>>>> Linux-specific bzImage parsing in unnecessary.
>>>>
>>>> This reverts commit c44cac229067faeec8f49247d1cf281723ac2d40.
>>>>
>>>> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
>>>
>>> Since you keep re-sending this: In private discussion Roger has indicated
>>> that, like me, he too would prefer falling back to the ELF data, before
>>> falling back to the arch default (Roger, please correct me if I got it
>>> wrong). That would make it necessary that the change you're proposing to
>>> revert here is actually kept.
>>
>> Sorry, was meaning to reply yesterday but Jason is very fast at
>> sending new version so I'm always one version behind.
> 
> :)
> 
> I was hoping to finish this up and get it in...
> 
>> IMO the order: ELF note, PHDR alignment, arch default should be the
>> preferred one.
>>
>>> Or wait - what you're reverting is taking the alignment out of the
>>> bzImage header. I don't expect the BSDs to use that protocol; aiui that's
>>> entirely Linux-specific.
>>
>> Yeah, I don't have strong opinions in keeping this, we already do
>> bzImage parsing, so we might as well attempt to fetch the alignment
>> from there if correct:
>>
>> ELF note, bzImage kernel_alignment, ELF PHDR alignment, arch default
> 
> I'm not sure how to handle ELF PHDR vs. arch default.  ELF PHDR will 
> always be set, AFAIU.  Should that always be respected, which means we 
> don't need an arch default?

A value of 0 (and 1) is specifically permitted, to indicate no alignment.
We may take 0 to mean default, but what you suggest below is another
plausible approach. Yet another might be to take anything below PAGE_SIZE
as "use default".

> To include arch default, it would be something like this:
> 
>      if ( parms->phys_align != UNSET_ADDR )
>          align = parms->phys_align;
>      else if ( bz_align )
>          align = bz_align;

Why do you include bz again here? Didn't you previously indicate the
header field can't be relied upon? Which is also why, finally, I committed
this revert earlier today.

Jan

>      else if ( elf->palign > PHYS32_RELOC_ALIGN_DEFAULT )
>          align = elf->palign;
>      else
>          align = PHYS32_RELOC_ALIGN_DEFAULT;
> 
> 
>>> I further meanwhile realized that consulting the ELF phdrs may also be
>>> ambiguous, as there may be more than one. I guess it would need to be the
>>> maximum of all of them then.
>>
>> My suggestion (not sure if I mentioned this before) was to use the
>> alignment of the first LOAD PHDR, which is the one that defines the
>> value of the dest_base field used as the image load start address.
>>
>> Using the maximum of all load PHDRs might be safer.
> 
> I'll find the maximum.
> 
> Thanks,
> Jason



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 1/6] Revert "xen/x86: bzImage parse kernel_alignment"
  2024-03-27 14:19         ` Jan Beulich
@ 2024-03-27 14:49           ` Jason Andryuk
  0 siblings, 0 replies; 17+ messages in thread
From: Jason Andryuk @ 2024-03-27 14:49 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel, Roger Pau Monné

On 2024-03-27 10:19, Jan Beulich wrote:
> On 27.03.2024 15:08, Jason Andryuk wrote:
>> On 2024-03-27 04:59, Roger Pau Monné wrote:
>>> On Wed, Mar 27, 2024 at 08:22:41AM +0100, Jan Beulich wrote:
>>>> On 26.03.2024 22:38, Jason Andryuk wrote:
>>>>> A new ELF note will specify the alignment for a relocatable PVH kernel.
>>>>> ELF notes are suitable for vmlinux and other ELF files, so this
>>>>> Linux-specific bzImage parsing in unnecessary.
>>>>>
>>>>> This reverts commit c44cac229067faeec8f49247d1cf281723ac2d40.
>>>>>
>>>>> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
>>>>
>>>> Since you keep re-sending this: In private discussion Roger has indicated
>>>> that, like me, he too would prefer falling back to the ELF data, before
>>>> falling back to the arch default (Roger, please correct me if I got it
>>>> wrong). That would make it necessary that the change you're proposing to
>>>> revert here is actually kept.
>>>
>>> Sorry, was meaning to reply yesterday but Jason is very fast at
>>> sending new version so I'm always one version behind.
>>
>> :)
>>
>> I was hoping to finish this up and get it in...
>>
>>> IMO the order: ELF note, PHDR alignment, arch default should be the
>>> preferred one.
>>>
>>>> Or wait - what you're reverting is taking the alignment out of the
>>>> bzImage header. I don't expect the BSDs to use that protocol; aiui that's
>>>> entirely Linux-specific.
>>>
>>> Yeah, I don't have strong opinions in keeping this, we already do
>>> bzImage parsing, so we might as well attempt to fetch the alignment
>>> from there if correct:
>>>
>>> ELF note, bzImage kernel_alignment, ELF PHDR alignment, arch default
>>
>> I'm not sure how to handle ELF PHDR vs. arch default.  ELF PHDR will
>> always be set, AFAIU.  Should that always be respected, which means we
>> don't need an arch default?
> 
> A value of 0 (and 1) is specifically permitted, to indicate no alignment.
> We may take 0 to mean default, but what you suggest below is another
> plausible approach. Yet another might be to take anything below PAGE_SIZE
> as "use default".
> 
>> To include arch default, it would be something like this:
>>
>>       if ( parms->phys_align != UNSET_ADDR )
>>           align = parms->phys_align;
>>       else if ( bz_align )
>>           align = bz_align;
> 
> Why do you include bz again here? Didn't you previously indicate the
> header field can't be relied upon? Which is also why, finally, I committed
> this revert earlier today.

Roger wanted to consult the bz value?  He mentioned it above.  Locally, 
I haven't synced with staging yet, and I dropped the revert to start 
re-working this...

If present, the bzImage header field is valid.  But being 
bzImage-specific, it isn't useful for other ELF files.  Xen will only 
move a kernel with the PHSY32_RELOC Note, so it can just specify an 
alignment if it needs to.

Personally, I think using the Note's value or a default is fine.  It 
seems like the PHDR aligment will just be 0x200000 anyway (for x86-64 at 
lease), which matches the default.  Specifying the PHYS32_RELOC Note, 
but relying on a search for the alignment, seems slightly questionable 
to me.

Still, it seemed like the path of least resistance is to just implement 
the fallback search like Roger requested.  Dropping the bzImage, I guess 
I'd go with your PAGE_SIZE suggestions for:

     if ( parms->phys_align != UNSET_ADDR )
         align = parms->phys_align;
     else if ( elf->palign > PAGE_SIZE )
         align = elf->palign;
     else
         align = PHYS32_RELOC_ALIGN_DEFAULT;

Thanks for your reviews.

Regards,
Jason


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-03-27 14:49 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-26 21:38 [PATCH v5 0/6] x86/pvh: Support relocating dom0 kernel Jason Andryuk
2024-03-26 21:38 ` [PATCH v5 1/6] Revert "xen/x86: bzImage parse kernel_alignment" Jason Andryuk
2024-03-27  7:22   ` Jan Beulich
2024-03-27  8:59     ` Roger Pau Monné
2024-03-27 14:08       ` Jason Andryuk
2024-03-27 14:19         ` Jan Beulich
2024-03-27 14:49           ` Jason Andryuk
2024-03-26 21:38 ` [PATCH v5 2/6] tools/init-xenstore-domain: Replace variable MB() usage Jason Andryuk
2024-03-26 21:38 ` [PATCH v5 3/6] tools: Move MB/GB() to common-macros.h Jason Andryuk
2024-03-26 21:38 ` [PATCH v5 4/6] libelf: Expand ELF note printing Jason Andryuk
2024-03-27  7:27   ` Jan Beulich
2024-03-26 21:38 ` [PATCH v5 5/6] xen/elfnote: Specify ELF Notes are x86-specific Jason Andryuk
2024-03-27  7:24   ` Jan Beulich
2024-03-26 21:38 ` [PATCH v5 6/6] x86/PVH: Support relocatable dom0 kernels Jason Andryuk
2024-03-26 21:47 ` [PATCH v5] RFC: x86/pvh: Make Xen PVH entrypoint PIC for x86-64 Jason Andryuk
2024-03-27  8:20   ` Jan Beulich
2024-03-27 14:15     ` Jason Andryuk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.