All of lore.kernel.org
 help / color / mirror / Atom feed
* [Xen-devel] [PATCH 0/6] x86/boot: Remove mappings at 0
@ 2020-01-06 15:54 Andrew Cooper
  2020-01-06 15:54 ` [Xen-devel] [PATCH 1/6] x86/boot: Check for E820_RAM earlier when searching the E820 Andrew Cooper
                   ` (5 more replies)
  0 siblings, 6 replies; 34+ messages in thread
From: Andrew Cooper @ 2020-01-06 15:54 UTC (permalink / raw)
  To: Xen-devel
  Cc: Wei Liu, Andrew Cooper, Julien Grall, Hongyan Xia, Jan Beulich,
	Paul Durrant, David Woodhouse, Roger Pau Monné

This is the (long overdue) series which finally removes the mapping at 0
during early boot, which has bitten us in the past.  It also removes an RWX
gadget which persists past boot in the idle pagetables.

Most of the complexity was down to the differing (and hard-to-follow) uses of
the bootmap.  I first opted to get rid of the bootmap entirely.  While this is
possible for the current Multiboot paths, it is incompatible with the EFI boot
path, and works against David's existing plans to not use the trampoline at
all.

Further ideas: (not addressed here because -ETIME on my behalf.)

1) Get PV-shim to use hypercalls for AP startup, at which point we can compile
   out the trampoline entirely.  This is probably helpful for robustness
   testing in combination with David's plans.

2) Drop BOOTSTRAP_MAP_{BASE,LIMIT} and have bootstrap_map() populate into the
   directmap, as we only request RAM mappings.  This would allow us to drop 3
   of the bootmap pagetables.  However, I'm not entirely convinced the later
   logic will cope with cacheability boundaries forcing the use of small
   mappings.

This series has had complete testing for MB and EFI boot paths.  It turns out
that grub can chainload xen.efi and test those paths.

Andrew Cooper (6):
  x86/boot: Check for E820_RAM earlier when searching the E820
  x86/boot: Map the trampoline as read-only
  x86/boot: Remove the preconstructed low 16M superpage mappings
  x86/boot: Clean up l?_bootmap[] construction
  x86/boot: Don't map 0 during boot
  x86/boot: Drop INVALID_VCPU

 xen/arch/x86/boot/head.S          | 33 +++++++++++++++++----------------
 xen/arch/x86/boot/x86_64.S        | 21 ++++++++++-----------
 xen/arch/x86/cpu/mcheck/mce.c     |  2 +-
 xen/arch/x86/domain_page.c        |  2 +-
 xen/arch/x86/efi/efi-boot.h       | 17 ++++++++++-------
 xen/arch/x86/setup.c              | 24 +++++++++++++++---------
 xen/arch/x86/tboot.c              |  2 +-
 xen/arch/x86/x86_64/asm-offsets.c |  3 ---
 xen/arch/x86/x86_64/mm.c          |  2 +-
 xen/arch/x86/xen.lds.S            |  3 +++
 xen/include/asm-x86/setup.h       |  3 ---
 11 files changed, 59 insertions(+), 53 deletions(-)

-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Xen-devel] [PATCH 1/6] x86/boot: Check for E820_RAM earlier when searching the E820
  2020-01-06 15:54 [Xen-devel] [PATCH 0/6] x86/boot: Remove mappings at 0 Andrew Cooper
@ 2020-01-06 15:54 ` Andrew Cooper
  2020-01-07 15:12   ` Jan Beulich
  2020-01-06 15:54 ` [Xen-devel] [PATCH 2/6] x86/boot: Map the trampoline as read-only Andrew Cooper
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2020-01-06 15:54 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

There is no point performing the masking calculations if we are going to
throw the result away.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/setup.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 501f3f5e4b..ed54f79fea 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1033,11 +1033,14 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         uint64_t s, e, mask = (1UL << L2_PAGETABLE_SHIFT) - 1;
         uint64_t end, limit = ARRAY_SIZE(l2_identmap) << L2_PAGETABLE_SHIFT;
 
+        if ( boot_e820.map[i].type != E820_RAM )
+            continue;
+
         /* Superpage-aligned chunks from BOOTSTRAP_MAP_BASE. */
         s = (boot_e820.map[i].addr + mask) & ~mask;
         e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
         s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
-        if ( (boot_e820.map[i].type != E820_RAM) || (s >= e) )
+        if ( s >= e )
             continue;
 
         if ( s < limit )
@@ -1286,11 +1289,14 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         uint64_t s, e, mask = PAGE_SIZE - 1;
         uint64_t map_s, map_e;
 
+        if ( boot_e820.map[i].type != E820_RAM )
+            continue;
+
         /* Only page alignment required now. */
         s = (boot_e820.map[i].addr + mask) & ~mask;
         e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
         s = max_t(uint64_t, s, 1<<20);
-        if ( (boot_e820.map[i].type != E820_RAM) || (s >= e) )
+        if ( s >= e )
             continue;
 
         if ( !acpi_boot_table_init_done &&
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Xen-devel] [PATCH 2/6] x86/boot: Map the trampoline as read-only
  2020-01-06 15:54 [Xen-devel] [PATCH 0/6] x86/boot: Remove mappings at 0 Andrew Cooper
  2020-01-06 15:54 ` [Xen-devel] [PATCH 1/6] x86/boot: Check for E820_RAM earlier when searching the E820 Andrew Cooper
@ 2020-01-06 15:54 ` Andrew Cooper
  2020-01-07 15:21   ` Jan Beulich
  2020-01-06 15:54 ` [Xen-devel] [PATCH 3/6] x86/boot: Remove the preconstructed low 16M superpage mappings Andrew Cooper
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2020-01-06 15:54 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

c/s ec92fcd1d08, which caused the trampoline GDT Access bits to be set,
removed the final writes which occurred between enabling paging and switching
to the high mappings.  There don't plausibly need to be any memory writes in
few instructions is takes to perform this transition.

As a consequence, we can remove the RWX mapping of the trampoline.  It is RX
via its identity mapping below 1M, and RW via the directmap.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

This probably wants backporting, alongside ec92fcd1d08 if it hasn't yet.
---
 xen/arch/x86/x86_64/mm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 8ea09ecc30..b7ce833ffc 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -699,7 +699,7 @@ void __init zap_low_mappings(void)
     /* Replace with mapping of the boot trampoline only. */
     map_pages_to_xen(trampoline_phys, maddr_to_mfn(trampoline_phys),
                      PFN_UP(trampoline_end - trampoline_start),
-                     __PAGE_HYPERVISOR);
+                     __PAGE_HYPERVISOR_RX);
 }
 
 int setup_compat_arg_xlat(struct vcpu *v)
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Xen-devel] [PATCH 3/6] x86/boot: Remove the preconstructed low 16M superpage mappings
  2020-01-06 15:54 [Xen-devel] [PATCH 0/6] x86/boot: Remove mappings at 0 Andrew Cooper
  2020-01-06 15:54 ` [Xen-devel] [PATCH 1/6] x86/boot: Check for E820_RAM earlier when searching the E820 Andrew Cooper
  2020-01-06 15:54 ` [Xen-devel] [PATCH 2/6] x86/boot: Map the trampoline as read-only Andrew Cooper
@ 2020-01-06 15:54 ` Andrew Cooper
  2020-01-07 15:43   ` Jan Beulich
  2020-01-06 15:54 ` [Xen-devel] [PATCH 4/6] x86/boot: Clean up l?_bootmap[] construction Andrew Cooper
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2020-01-06 15:54 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

First, it is undefined to have superpages and MTRRs disagree on cacheability
boundaries, and nothing this early in boot has checked that it is safe to use
superpages here.

Furthermore, nothing actually uses the mappings on boot.  Build these entries
in the directmap when walking the E820 table along with everything else.

As a consequence, there are now no _PAGE_PRESENT entries between
__page_tables_{start,end} which need to skip relocation.  This simplifies the
MB1/2 entry path logic to remove the l2_identmap[] special case.

The low 2M (using 4k pages) is retained for now.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/boot/head.S          | 10 ++--------
 xen/arch/x86/boot/x86_64.S        | 17 ++++++-----------
 xen/arch/x86/setup.c              | 12 ++++++------
 xen/arch/x86/x86_64/asm-offsets.c |  3 ---
 4 files changed, 14 insertions(+), 28 deletions(-)

diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index 8d0ffbd1b0..7ee4511e26 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -661,15 +661,9 @@ trampoline_setup:
         mov     %eax,sym_fs(boot_tsc_stamp)
         mov     %edx,sym_fs(boot_tsc_stamp)+4
 
-        /*
-         * Update frame addresses in page tables excluding l2_identmap
-         * without its first entry which points to l1_identmap.
-         */
+        /* Relocate pagetables to point at Xen's current location in memory. */
         mov     $((__page_tables_end-__page_tables_start)/8),%ecx
-        mov     $(((l2_identmap-__page_tables_start)/8)+1),%edx
-1:      cmp     $((l2_identmap+l2_identmap_sizeof-__page_tables_start)/8),%ecx
-        cmove   %edx,%ecx
-        testl   $_PAGE_PRESENT,sym_fs(__page_tables_start)-8(,%ecx,8)
+1:      testl   $_PAGE_PRESENT,sym_fs(__page_tables_start)-8(,%ecx,8)
         jz      2f
         add     %esi,sym_fs(__page_tables_start)-8(,%ecx,8)
 2:      loop    1b
diff --git a/xen/arch/x86/boot/x86_64.S b/xen/arch/x86/boot/x86_64.S
index b54d3aceea..30c82f9d5c 100644
--- a/xen/arch/x86/boot/x86_64.S
+++ b/xen/arch/x86/boot/x86_64.S
@@ -66,24 +66,19 @@ l1_identmap:
         .size l1_identmap, . - l1_identmap
 
 /*
- * __page_tables_start does not cover l1_identmap because it (l1_identmap)
- * contains 1-1 mappings. This means that frame addresses of these mappings
- * are static and should not be updated at runtime.
+ * __page_tables_{start,end} cover the range of pagetables which need
+ * relocating as Xen moves around physical memory.  i.e. each sym_offs()
+ * reference to a different pagetable in the Xen image.
  */
 GLOBAL(__page_tables_start)
 
 /*
- * Space for mapping the first 4GB of memory, with the first 16 megabytes
- * actualy mapped (mostly using superpages).  Uses 4x 4k pages.
+ * Space for 4G worth of 2M mappings, first 2M actually mapped via
+ * l1_identmap[].  Uses 4x 4k pages.
  */
 GLOBAL(l2_identmap)
         .quad sym_offs(l1_identmap) + __PAGE_HYPERVISOR
-        idx = 1
-        .rept 7
-        .quad (idx << L2_PAGETABLE_SHIFT) | PAGE_HYPERVISOR | _PAGE_PSE
-        idx = idx + 1
-        .endr
-        .fill 4 * L2_PAGETABLE_ENTRIES - 8, 8, 0
+        .fill 4 * L2_PAGETABLE_ENTRIES - 1, 8, 0
         .size l2_identmap, . - l2_identmap
 
 /*
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index ed54f79fea..452f5bdd37 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1020,8 +1020,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
      *
      * We require superpage alignment because the boot allocator is
      * not yet initialised. Hence we can only map superpages in the
-     * address range BOOTSTRAP_MAP_BASE to 4GB, as this is guaranteed
-     * not to require dynamic allocation of pagetables.
+     * address range 2MB to 4GB, as this is guaranteed not to require
+     * dynamic allocation of pagetables.
      *
      * As well as mapping superpages in that range, in preparation for
      * initialising the boot allocator, we also look for a region to which
@@ -1036,10 +1036,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         if ( boot_e820.map[i].type != E820_RAM )
             continue;
 
-        /* Superpage-aligned chunks from BOOTSTRAP_MAP_BASE. */
+        /* Superpage-aligned chunks from 2MB. */
         s = (boot_e820.map[i].addr + mask) & ~mask;
         e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
-        s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
+        s = max_t(uint64_t, s, MB(2));
         if ( s >= e )
             continue;
 
@@ -1346,8 +1346,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
         set_pdx_range(s >> PAGE_SHIFT, e >> PAGE_SHIFT);
 
-        /* Need to create mappings above BOOTSTRAP_MAP_BASE. */
-        map_s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
+        /* Need to create mappings above 2MB. */
+        map_s = max_t(uint64_t, s, MB(2));
         map_e = min_t(uint64_t, e,
                       ARRAY_SIZE(l2_identmap) << L2_PAGETABLE_SHIFT);
 
diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index f9cb78cfdb..07d2155bf5 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -165,8 +165,5 @@ void __dummy__(void)
     OFFSET(MB2_efi64_ih, multiboot2_tag_efi64_ih_t, pointer);
     BLANK();
 
-    DEFINE(l2_identmap_sizeof, sizeof(l2_identmap));
-    BLANK();
-
     OFFSET(DOMAIN_vm_assist, struct domain, vm_assist);
 }
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Xen-devel] [PATCH 4/6] x86/boot: Clean up l?_bootmap[] construction
  2020-01-06 15:54 [Xen-devel] [PATCH 0/6] x86/boot: Remove mappings at 0 Andrew Cooper
                   ` (2 preceding siblings ...)
  2020-01-06 15:54 ` [Xen-devel] [PATCH 3/6] x86/boot: Remove the preconstructed low 16M superpage mappings Andrew Cooper
@ 2020-01-06 15:54 ` Andrew Cooper
  2020-01-07 16:16   ` Jan Beulich
  2020-01-06 15:54 ` [Xen-devel] [PATCH 5/6] x86/boot: Don't map 0 during boot Andrew Cooper
  2020-01-06 15:54 ` [Xen-devel] [PATCH 6/6] x86/boot: Drop INVALID_VCPU Andrew Cooper
  5 siblings, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2020-01-06 15:54 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

The need for Xen to be identity mapped into the bootmap is not obvious, and
differs between the MB and EFI boot paths.  Furthermore, the EFI side is
further complicated by spraying non-identity aliases into the mix.

Simplify the EFI bootmap construction code to make exactly one identity-map of
Xen, which now matches the MB path.  Comment both pieces of logic, explaining
what the mappings are needed for.

Finally, leave a linker assert covering the fact that plenty of code blindly
assumes that Xen is less that 16M.  This wants fixing in due course.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>

The MB path's dependency on Xen's identity mapping can be broken by having
trampoline_boot_cpu_entry switch the alias of gdt_48 it uses.

I took this approach first in an attempt to drop the bootmap entirely, but it
is incompatible with the EFI path, and would also work against other plans to
avoid using the trampoline during early boot.
---
 xen/arch/x86/boot/head.S    |  8 ++++++--
 xen/arch/x86/efi/efi-boot.h | 17 ++++++++++-------
 xen/arch/x86/xen.lds.S      |  3 +++
 3 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index 7ee4511e26..f7d273ca36 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -668,7 +668,11 @@ trampoline_setup:
         add     %esi,sym_fs(__page_tables_start)-8(,%ecx,8)
 2:      loop    1b
 
-        /* Initialize L2 boot-map/direct map page table entries (16MB). */
+        /*
+         * Map Xen into the directmap (needed for early-boot pagetable
+         * handling/walking), and identity map Xen into bootmap (needed for
+         * the transition into long mode), using 2M superpages.
+         */
         lea     sym_esi(start),%ebx
         lea     (1<<L2_PAGETABLE_SHIFT)*7+(PAGE_HYPERVISOR|_PAGE_PSE)(%ebx),%eax
         shr     $(L2_PAGETABLE_SHIFT-3),%ebx
@@ -678,7 +682,7 @@ trampoline_setup:
         sub     $(1<<L2_PAGETABLE_SHIFT),%eax
         loop    1b
 
-        /* Initialize L3 boot-map page directory entry. */
+        /* Initialize L3 boot-map page directory entries. */
         lea     __PAGE_HYPERVISOR+(L2_PAGETABLE_ENTRIES*8)*3+sym_esi(l2_bootmap),%eax
         mov     $4,%ecx
 1:      mov     %eax,sym_fs(l3_bootmap)-8(,%ecx,8)
diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index 676d616ff8..9c314e403a 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -584,21 +584,24 @@ static void __init efi_arch_memory_setup(void)
     if ( !efi_enabled(EFI_LOADER) )
         return;
 
-    /* Initialise L2 identity-map and boot-map page table entries (16MB). */
+    /*
+     * Map Xen into the directmap (NX, needed for early-boot pagetable
+     * handling/walking), and identity map Xen into bootmap (X, needed for the
+     * transition from the EFI pagetables to Xen), using 2M superpages.
+     */
     for ( i = 0; i < 8; ++i )
     {
         unsigned int slot = (xen_phys_start >> L2_PAGETABLE_SHIFT) + i;
         paddr_t addr = slot << L2_PAGETABLE_SHIFT;
 
         l2_identmap[slot] = l2e_from_paddr(addr, PAGE_HYPERVISOR|_PAGE_PSE);
-        slot &= L2_PAGETABLE_ENTRIES - 1;
         l2_bootmap[slot] = l2e_from_paddr(addr, __PAGE_HYPERVISOR|_PAGE_PSE);
     }
-    /* Initialise L3 boot-map page directory entries. */
-    l3_bootmap[l3_table_offset(xen_phys_start)] =
-        l3e_from_paddr((UINTN)l2_bootmap, __PAGE_HYPERVISOR);
-    l3_bootmap[l3_table_offset(xen_phys_start + (8 << L2_PAGETABLE_SHIFT) - 1)] =
-        l3e_from_paddr((UINTN)l2_bootmap, __PAGE_HYPERVISOR);
+
+    /* Initialize L3 boot-map page directory entries. */
+    for ( i = 0; i < 4; ++i )
+        l3_bootmap[i] = l3e_from_paddr((UINTN)l2_bootmap + i * PAGE_SIZE,
+                                       __PAGE_HYPERVISOR);
 }
 
 static void __init efi_arch_handle_module(struct file *file, const CHAR16 *name,
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 111edb5360..7f82f64078 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -381,3 +381,6 @@ ASSERT((trampoline_end - trampoline_start) < TRAMPOLINE_SPACE - MBI_SPACE_MIN,
     "not enough room for trampoline and mbi data")
 ASSERT((wakeup_stack - wakeup_stack_start) >= WAKEUP_STACK_MIN,
     "wakeup stack too small")
+
+/* Plenty of boot code assumes that Xen isn't larger than 16M. */
+ASSERT(_end - _start <= MB(16), "Xen too large for early-boot assumptions")
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Xen-devel] [PATCH 5/6] x86/boot: Don't map 0 during boot
  2020-01-06 15:54 [Xen-devel] [PATCH 0/6] x86/boot: Remove mappings at 0 Andrew Cooper
                   ` (3 preceding siblings ...)
  2020-01-06 15:54 ` [Xen-devel] [PATCH 4/6] x86/boot: Clean up l?_bootmap[] construction Andrew Cooper
@ 2020-01-06 15:54 ` Andrew Cooper
  2020-01-07 16:35   ` Jan Beulich
  2020-01-06 15:54 ` [Xen-devel] [PATCH 6/6] x86/boot: Drop INVALID_VCPU Andrew Cooper
  5 siblings, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2020-01-06 15:54 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

In particular, it causes accidental NULL pointer dereferences to go unnoticed.

The majority of the early operation takes place either in Real mode, or
Protected Unpaged mode.  The only bit which requires pagetable mappings is the
trampoline transition into Long mode and jump to the higher mappings, so there
is no need for the whole bottom 2M to be mapped.

Introduce a new l1_bootmap in .init.data, and use it instead of l1_identmap.
The EFI boot path doesn't pass through the trampoline, so doesn't need any
adjustment.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/boot/head.S   | 15 +++++++++------
 xen/arch/x86/boot/x86_64.S |  4 ++++
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index f7d273ca36..b338d4ba5c 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -689,12 +689,15 @@ trampoline_setup:
         sub     $(L2_PAGETABLE_ENTRIES*8),%eax
         loop    1b
 
-        /*
-         * During boot, hook 4kB mappings of first 2MB of memory into L2.
-         * This avoids mixing cachability for the legacy VGA region.
-         */
-        lea     __PAGE_HYPERVISOR+sym_esi(l1_identmap),%edi
-        mov     %edi,sym_fs(l2_bootmap)
+        /* Map the permentant trampoline page into l{1,2}_bootmap[]. */
+        mov     sym_esi(trampoline_phys), %edx
+        mov     %edx, %ecx
+        or      $__PAGE_HYPERVISOR_RX, %edx /* %edx = PTE to write  */
+        shr     $PAGE_SHIFT, %ecx           /* %ecx = Slot to write */
+        mov     %edx, sym_offs(l1_bootmap)(%esi, %ecx, 8)
+
+        lea     __PAGE_HYPERVISOR + sym_esi(l1_bootmap), %edx
+        mov     %edx, sym_offs(l2_bootmap)(%esi)
 
         /* Apply relocations to bootstrap trampoline. */
         mov     sym_fs(trampoline_phys),%edx
diff --git a/xen/arch/x86/boot/x86_64.S b/xen/arch/x86/boot/x86_64.S
index 30c82f9d5c..e0763a90e3 100644
--- a/xen/arch/x86/boot/x86_64.S
+++ b/xen/arch/x86/boot/x86_64.S
@@ -156,6 +156,10 @@ GLOBAL(__page_tables_end)
         .section .init.data, "aw", @progbits
         .align PAGE_SIZE, 0
 
+l1_bootmap:
+        .fill L1_PAGETABLE_ENTRIES, 8, 0
+        .size l1_bootmap, . - l1_bootmap
+
 GLOBAL(l2_bootmap)
         .fill 4 * L2_PAGETABLE_ENTRIES, 8, 0
         .size l2_bootmap, . - l2_bootmap
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Xen-devel] [PATCH 6/6] x86/boot: Drop INVALID_VCPU
  2020-01-06 15:54 [Xen-devel] [PATCH 0/6] x86/boot: Remove mappings at 0 Andrew Cooper
                   ` (4 preceding siblings ...)
  2020-01-06 15:54 ` [Xen-devel] [PATCH 5/6] x86/boot: Don't map 0 during boot Andrew Cooper
@ 2020-01-06 15:54 ` Andrew Cooper
  2020-01-07 16:52   ` Jan Beulich
  5 siblings, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2020-01-06 15:54 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

Now that NULL will fault at boot, there is no need for a special constant to
signify "current not set up yet".

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wl@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/cpu/mcheck/mce.c | 2 +-
 xen/arch/x86/domain_page.c    | 2 +-
 xen/arch/x86/setup.c          | 2 +-
 xen/arch/x86/tboot.c          | 2 +-
 xen/include/asm-x86/setup.h   | 3 ---
 5 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index c8cecc4976..0c572b04f2 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -260,7 +260,7 @@ static int mca_init_global(uint32_t flags, struct mcinfo_global *mig)
                         &mig->mc_coreid, &mig->mc_core_threadid,
                         &mig->mc_apicid, NULL, NULL, NULL);
 
-    if ( curr != INVALID_VCPU )
+    if ( curr )
     {
         mig->mc_domid = curr->domain->domain_id;
         mig->mc_vcpuid = curr->vcpu_id;
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index 4a07cfb18e..dd32712d2f 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -29,7 +29,7 @@ static inline struct vcpu *mapcache_current_vcpu(void)
      * When current isn't properly set up yet, this is equivalent to
      * running in an idle vCPU (callers must check for NULL).
      */
-    if ( v == INVALID_VCPU )
+    if ( !v )
         return NULL;
 
     /*
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 452f5bdd37..a7ca2236f4 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -705,7 +705,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     /* Critical region without IDT or TSS.  Any fault is deadly! */
 
     set_processor_id(0);
-    set_current(INVALID_VCPU); /* debug sanity. */
+    set_current(NULL); /* debug sanity. */
     idle_vcpu[0] = current;
     init_shadow_spec_ctrl_state();
 
diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
index 3e828fe204..5020c4ad49 100644
--- a/xen/arch/x86/tboot.c
+++ b/xen/arch/x86/tboot.c
@@ -392,7 +392,7 @@ void tboot_shutdown(uint32_t shutdown_type)
      * During early boot, we can be called by panic before idle_vcpu[0] is
      * setup, but in that case we don't need to change page tables.
      */
-    if ( idle_vcpu[0] != INVALID_VCPU )
+    if ( idle_vcpu[0] )
         write_ptbase(idle_vcpu[0]);
 
     ((void(*)(void))(unsigned long)g_tboot_shared->shutdown_entry)();
diff --git a/xen/include/asm-x86/setup.h b/xen/include/asm-x86/setup.h
index 861d46d6ac..28257bc5c8 100644
--- a/xen/include/asm-x86/setup.h
+++ b/xen/include/asm-x86/setup.h
@@ -4,9 +4,6 @@
 #include <xen/multiboot.h>
 #include <asm/numa.h>
 
-/* vCPU pointer used prior to there being a valid one around */
-#define INVALID_VCPU ((struct vcpu *)0xccccccccccccc000UL)
-
 extern const char __2M_text_start[], __2M_text_end[];
 extern const char __2M_rodata_start[], __2M_rodata_end[];
 extern char __2M_init_start[], __2M_init_end[];
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 1/6] x86/boot: Check for E820_RAM earlier when searching the E820
  2020-01-06 15:54 ` [Xen-devel] [PATCH 1/6] x86/boot: Check for E820_RAM earlier when searching the E820 Andrew Cooper
@ 2020-01-07 15:12   ` Jan Beulich
  0 siblings, 0 replies; 34+ messages in thread
From: Jan Beulich @ 2020-01-07 15:12 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 06.01.2020 16:54, Andrew Cooper wrote:
> There is no point performing the masking calculations if we are going to
> throw the result away.

A reasonably optimizing compiler ought to do so. It's slightly less
source code the original way. Nevertheless I don't really mind the
change, so ...

> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 2/6] x86/boot: Map the trampoline as read-only
  2020-01-06 15:54 ` [Xen-devel] [PATCH 2/6] x86/boot: Map the trampoline as read-only Andrew Cooper
@ 2020-01-07 15:21   ` Jan Beulich
  2020-01-07 15:51     ` Andrew Cooper
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2020-01-07 15:21 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 06.01.2020 16:54, Andrew Cooper wrote:
> c/s ec92fcd1d08, which caused the trampoline GDT Access bits to be set,
> removed the final writes which occurred between enabling paging and switching
> to the high mappings.  There don't plausibly need to be any memory writes in
> few instructions is takes to perform this transition.
> 
> As a consequence, we can remove the RWX mapping of the trampoline.  It is RX
> via its identity mapping below 1M, and RW via the directmap.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> This probably wants backporting, alongside ec92fcd1d08 if it hasn't yet.

This is just cleanup, largely cosmetic in nature. It could be argued
that once the directmap has disappeared this can serve as additional
proof that the trampoline range has no (intended) writable mappings
anymore, but prior to that point I don't see much further benefit.
Could you expand on the reasons why you see both as backporting
candidates?


Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 3/6] x86/boot: Remove the preconstructed low 16M superpage mappings
  2020-01-06 15:54 ` [Xen-devel] [PATCH 3/6] x86/boot: Remove the preconstructed low 16M superpage mappings Andrew Cooper
@ 2020-01-07 15:43   ` Jan Beulich
  2020-01-07 17:24     ` Andrew Cooper
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2020-01-07 15:43 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 06.01.2020 16:54, Andrew Cooper wrote:
> First, it is undefined to have superpages and MTRRs disagree on cacheability
> boundaries, and nothing this early in boot has checked that it is safe to use
> superpages here.

Stating this here gives, at least to me, the impression that you change
things here to obey to these restrictions. I don't see you do so, though
- map_pages_to_xen() doesn't query MTRRs at all afaics.

> Furthermore, nothing actually uses the mappings on boot.  Build these entries
> in the directmap when walking the E820 table along with everything else.

I'm pretty sure some of these mappings were used, perhaps long ago, and
possibly only by the 32-bit hypervisor. It would feel quite a bit better
if it was clear when the need for this disappeared. I wonder if I could
talk you into finding out, so you could say so here.

> --- a/xen/arch/x86/boot/x86_64.S
> +++ b/xen/arch/x86/boot/x86_64.S
> @@ -66,24 +66,19 @@ l1_identmap:
>          .size l1_identmap, . - l1_identmap
>  
>  /*
> - * __page_tables_start does not cover l1_identmap because it (l1_identmap)
> - * contains 1-1 mappings. This means that frame addresses of these mappings
> - * are static and should not be updated at runtime.
> + * __page_tables_{start,end} cover the range of pagetables which need
> + * relocating as Xen moves around physical memory.  i.e. each sym_offs()
> + * reference to a different pagetable in the Xen image.
>   */
>  GLOBAL(__page_tables_start)
>  
>  /*
> - * Space for mapping the first 4GB of memory, with the first 16 megabytes
> - * actualy mapped (mostly using superpages).  Uses 4x 4k pages.
> + * Space for 4G worth of 2M mappings, first 2M actually mapped via
> + * l1_identmap[].  Uses 4x 4k pages.

Would you mind making this say "page tables" instead of "pages" in the
2nd sentence?

> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1020,8 +1020,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>       *
>       * We require superpage alignment because the boot allocator is
>       * not yet initialised. Hence we can only map superpages in the
> -     * address range BOOTSTRAP_MAP_BASE to 4GB, as this is guaranteed
> -     * not to require dynamic allocation of pagetables.
> +     * address range 2MB to 4GB, as this is guaranteed not to require
> +     * dynamic allocation of pagetables.
>       *
>       * As well as mapping superpages in that range, in preparation for
>       * initialising the boot allocator, we also look for a region to which
> @@ -1036,10 +1036,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>          if ( boot_e820.map[i].type != E820_RAM )
>              continue;
>  
> -        /* Superpage-aligned chunks from BOOTSTRAP_MAP_BASE. */
> +        /* Superpage-aligned chunks from 2MB. */
>          s = (boot_e820.map[i].addr + mask) & ~mask;
>          e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
> -        s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
> +        s = max_t(uint64_t, s, MB(2));
>          if ( s >= e )
>              continue;
>  
> @@ -1346,8 +1346,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  
>          set_pdx_range(s >> PAGE_SHIFT, e >> PAGE_SHIFT);
>  
> -        /* Need to create mappings above BOOTSTRAP_MAP_BASE. */
> -        map_s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
> +        /* Need to create mappings above 2MB. */
> +        map_s = max_t(uint64_t, s, MB(2));

Instead of hard coding 2Mb everywhere, how about simply reducing
BOOTSTRAP_MAP_BASE? This would then also ease shrinking the build
time mappings further, e.g. to the low 1Mb (instead of touching
several of the places you touch now, it would again mainly be an
adjustment to BOOTSTRAP_MAP_BASE, alongside the assembly file
changes needed).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 2/6] x86/boot: Map the trampoline as read-only
  2020-01-07 15:21   ` Jan Beulich
@ 2020-01-07 15:51     ` Andrew Cooper
  2020-01-07 16:19       ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2020-01-07 15:51 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 07/01/2020 15:21, Jan Beulich wrote:
> On 06.01.2020 16:54, Andrew Cooper wrote:
>> c/s ec92fcd1d08, which caused the trampoline GDT Access bits to be set,
>> removed the final writes which occurred between enabling paging and switching
>> to the high mappings.  There don't plausibly need to be any memory writes in
>> few instructions is takes to perform this transition.
>>
>> As a consequence, we can remove the RWX mapping of the trampoline.  It is RX
>> via its identity mapping below 1M, and RW via the directmap.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
>> This probably wants backporting, alongside ec92fcd1d08 if it hasn't yet.
> This is just cleanup, largely cosmetic in nature. It could be argued
> that once the directmap has disappeared this can serve as additional
> proof that the trampoline range has no (intended) writable mappings
> anymore, but prior to that point I don't see much further benefit.
> Could you expand on the reasons why you see both as backporting
> candidates?

Defence in depth.

An RWX mapping is very attractive for an attacker who's broken into Xen
and is looking to expand the damage they can do.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 4/6] x86/boot: Clean up l?_bootmap[] construction
  2020-01-06 15:54 ` [Xen-devel] [PATCH 4/6] x86/boot: Clean up l?_bootmap[] construction Andrew Cooper
@ 2020-01-07 16:16   ` Jan Beulich
  2020-01-07 16:30     ` Jan Beulich
  2020-01-07 18:00     ` Andrew Cooper
  0 siblings, 2 replies; 34+ messages in thread
From: Jan Beulich @ 2020-01-07 16:16 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 06.01.2020 16:54, Andrew Cooper wrote:
> The need for Xen to be identity mapped into the bootmap is not obvious, and
> differs between the MB and EFI boot paths.  Furthermore, the EFI side is
> further complicated by spraying non-identity aliases into the mix.

What (intentional) aliases are you talking about? The changes done here
don't remove any. Or do you mean the ones occurring as a side effect of
possibly using the same L2 in two L3 slots?

> Simplify the EFI bootmap construction code to make exactly one identity-map of
> Xen, which now matches the MB path.  Comment both pieces of logic, explaining
> what the mappings are needed for.

Is both boot map variants fully matching actually needed for anything?

> --- a/xen/arch/x86/efi/efi-boot.h
> +++ b/xen/arch/x86/efi/efi-boot.h
> @@ -584,21 +584,24 @@ static void __init efi_arch_memory_setup(void)
>      if ( !efi_enabled(EFI_LOADER) )
>          return;
>  
> -    /* Initialise L2 identity-map and boot-map page table entries (16MB). */
> +    /*
> +     * Map Xen into the directmap (NX, needed for early-boot pagetable
> +     * handling/walking), and identity map Xen into bootmap (X, needed for the
> +     * transition from the EFI pagetables to Xen), using 2M superpages.
> +     */

How does NX vs X matter for the code below here? PAGE_HYPERVISOR and
__PAGE_HYPERVISOR, as used below, differ by just _PAGE_GLOBAL. Did
you mean to make further changes?

>      for ( i = 0; i < 8; ++i )
>      {
>          unsigned int slot = (xen_phys_start >> L2_PAGETABLE_SHIFT) + i;
>          paddr_t addr = slot << L2_PAGETABLE_SHIFT;
>  
>          l2_identmap[slot] = l2e_from_paddr(addr, PAGE_HYPERVISOR|_PAGE_PSE);
> -        slot &= L2_PAGETABLE_ENTRIES - 1;
>          l2_bootmap[slot] = l2e_from_paddr(addr, __PAGE_HYPERVISOR|_PAGE_PSE);
>      }
> -    /* Initialise L3 boot-map page directory entries. */
> -    l3_bootmap[l3_table_offset(xen_phys_start)] =
> -        l3e_from_paddr((UINTN)l2_bootmap, __PAGE_HYPERVISOR);
> -    l3_bootmap[l3_table_offset(xen_phys_start + (8 << L2_PAGETABLE_SHIFT) - 1)] =
> -        l3e_from_paddr((UINTN)l2_bootmap, __PAGE_HYPERVISOR);
> +
> +    /* Initialize L3 boot-map page directory entries. */
> +    for ( i = 0; i < 4; ++i )
> +        l3_bootmap[i] = l3e_from_paddr((UINTN)l2_bootmap + i * PAGE_SIZE,
> +                                       __PAGE_HYPERVISOR);

The idea behind the original code was to be immune to the number
of pages l2_bootmap[] covers, as long as it's at least one (which
it'll always be, I would say). The minimum requirement to any
change to this I have is that the build must break if the size
assumption here is violated. I.e. there may not be a literal 4 as
the upper loop bound here, or there would need to be a
BUILD_BUG_ON() right next to it. But I'd really prefer if the
code was left as is (perhaps with a comment added), unless you
can point out actual issues with it (which I can't see in the
description), or you can otherwise justify the change with better
than "the EFI side is further complicated by spraying non-identity
aliases into the mix."

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 2/6] x86/boot: Map the trampoline as read-only
  2020-01-07 15:51     ` Andrew Cooper
@ 2020-01-07 16:19       ` Jan Beulich
  2020-01-07 19:04         ` Andrew Cooper
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2020-01-07 16:19 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 07.01.2020 16:51, Andrew Cooper wrote:
> On 07/01/2020 15:21, Jan Beulich wrote:
>> On 06.01.2020 16:54, Andrew Cooper wrote:
>>> c/s ec92fcd1d08, which caused the trampoline GDT Access bits to be set,
>>> removed the final writes which occurred between enabling paging and switching
>>> to the high mappings.  There don't plausibly need to be any memory writes in
>>> few instructions is takes to perform this transition.
>>>
>>> As a consequence, we can remove the RWX mapping of the trampoline.  It is RX
>>> via its identity mapping below 1M, and RW via the directmap.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>
>>> This probably wants backporting, alongside ec92fcd1d08 if it hasn't yet.
>> This is just cleanup, largely cosmetic in nature. It could be argued
>> that once the directmap has disappeared this can serve as additional
>> proof that the trampoline range has no (intended) writable mappings
>> anymore, but prior to that point I don't see much further benefit.
>> Could you expand on the reasons why you see both as backporting
>> candidates?
> 
> Defence in depth.
> 
> An RWX mapping is very attractive for an attacker who's broken into Xen
> and is looking to expand the damage they can do.

Such an attacker is typically in the position though to make
themselves RWX mappings. Having as little as possible is only
complicating their job, not making it impossible, I would say.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 4/6] x86/boot: Clean up l?_bootmap[] construction
  2020-01-07 16:16   ` Jan Beulich
@ 2020-01-07 16:30     ` Jan Beulich
  2020-01-07 18:01       ` Andrew Cooper
  2020-01-13 12:08       ` Andrew Cooper
  2020-01-07 18:00     ` Andrew Cooper
  1 sibling, 2 replies; 34+ messages in thread
From: Jan Beulich @ 2020-01-07 16:30 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 07.01.2020 17:16, Jan Beulich wrote:
> On 06.01.2020 16:54, Andrew Cooper wrote:
>> --- a/xen/arch/x86/efi/efi-boot.h
>> +++ b/xen/arch/x86/efi/efi-boot.h
>> @@ -584,21 +584,24 @@ static void __init efi_arch_memory_setup(void)
>>      if ( !efi_enabled(EFI_LOADER) )
>>          return;
>>  
>> -    /* Initialise L2 identity-map and boot-map page table entries (16MB). */
>> +    /*
>> +     * Map Xen into the directmap (NX, needed for early-boot pagetable
>> +     * handling/walking), and identity map Xen into bootmap (X, needed for the
>> +     * transition from the EFI pagetables to Xen), using 2M superpages.
>> +     */
> 
> How does NX vs X matter for the code below here? PAGE_HYPERVISOR and
> __PAGE_HYPERVISOR, as used below, differ by just _PAGE_GLOBAL. Did
> you mean to make further changes?
> 
>>      for ( i = 0; i < 8; ++i )
>>      {
>>          unsigned int slot = (xen_phys_start >> L2_PAGETABLE_SHIFT) + i;
>>          paddr_t addr = slot << L2_PAGETABLE_SHIFT;
>>  
>>          l2_identmap[slot] = l2e_from_paddr(addr, PAGE_HYPERVISOR|_PAGE_PSE);
>> -        slot &= L2_PAGETABLE_ENTRIES - 1;
>>          l2_bootmap[slot] = l2e_from_paddr(addr, __PAGE_HYPERVISOR|_PAGE_PSE);
>>      }
>> -    /* Initialise L3 boot-map page directory entries. */
>> -    l3_bootmap[l3_table_offset(xen_phys_start)] =
>> -        l3e_from_paddr((UINTN)l2_bootmap, __PAGE_HYPERVISOR);
>> -    l3_bootmap[l3_table_offset(xen_phys_start + (8 << L2_PAGETABLE_SHIFT) - 1)] =
>> -        l3e_from_paddr((UINTN)l2_bootmap, __PAGE_HYPERVISOR);
>> +
>> +    /* Initialize L3 boot-map page directory entries. */
>> +    for ( i = 0; i < 4; ++i )
>> +        l3_bootmap[i] = l3e_from_paddr((UINTN)l2_bootmap + i * PAGE_SIZE,
>> +                                       __PAGE_HYPERVISOR);
> 
> The idea behind the original code was to be immune to the number
> of pages l2_bootmap[] covers, as long as it's at least one (which
> it'll always be, I would say). The minimum requirement to any
> change to this I have is that the build must break if the size
> assumption here is violated. I.e. there may not be a literal 4 as
> the upper loop bound here, or there would need to be a
> BUILD_BUG_ON() right next to it. But I'd really prefer if the
> code was left as is (perhaps with a comment added), unless you
> can point out actual issues with it (which I can't see in the
> description), or you can otherwise justify the change with better
> than "the EFI side is further complicated by spraying non-identity
> aliases into the mix."

And if this change is to be made, won't it mean the code in setup.c
commented with "Make boot page tables match non-EFI boot" can then
go away at the same time?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 5/6] x86/boot: Don't map 0 during boot
  2020-01-06 15:54 ` [Xen-devel] [PATCH 5/6] x86/boot: Don't map 0 during boot Andrew Cooper
@ 2020-01-07 16:35   ` Jan Beulich
  2020-01-07 18:03     ` Andrew Cooper
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2020-01-07 16:35 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 06.01.2020 16:54, Andrew Cooper wrote:
> --- a/xen/arch/x86/boot/head.S
> +++ b/xen/arch/x86/boot/head.S
> @@ -689,12 +689,15 @@ trampoline_setup:
>          sub     $(L2_PAGETABLE_ENTRIES*8),%eax
>          loop    1b
>  
> -        /*
> -         * During boot, hook 4kB mappings of first 2MB of memory into L2.
> -         * This avoids mixing cachability for the legacy VGA region.
> -         */
> -        lea     __PAGE_HYPERVISOR+sym_esi(l1_identmap),%edi
> -        mov     %edi,sym_fs(l2_bootmap)
> +        /* Map the permentant trampoline page into l{1,2}_bootmap[]. */

"permanent"?

> +        mov     sym_esi(trampoline_phys), %edx
> +        mov     %edx, %ecx
> +        or      $__PAGE_HYPERVISOR_RX, %edx /* %edx = PTE to write  */
> +        shr     $PAGE_SHIFT, %ecx           /* %ecx = Slot to write */

Following the LEA model further down, how about

        mov     sym_esi(trampoline_phys), %ecx
        lea     __PAGE_HYPERVISOR_RX(%ecx), %edx /* %edx = PTE to write  */
        shr     $PAGE_SHIFT, %ecx                /* %ecx = Slot to write */

? Anyway, with or without this adjustment
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 6/6] x86/boot: Drop INVALID_VCPU
  2020-01-06 15:54 ` [Xen-devel] [PATCH 6/6] x86/boot: Drop INVALID_VCPU Andrew Cooper
@ 2020-01-07 16:52   ` Jan Beulich
  2020-01-07 18:07     ` Andrew Cooper
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2020-01-07 16:52 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 06.01.2020 16:54, Andrew Cooper wrote:
> Now that NULL will fault at boot, there is no need for a special constant to
> signify "current not set up yet".

Mind making this "... no strong need ..."? The benefit of an easily
recognizable value goes away, but I guess we'll be fine without.
IOW I'm not meaning to object.

> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -705,7 +705,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>      /* Critical region without IDT or TSS.  Any fault is deadly! */
>  
>      set_processor_id(0);
> -    set_current(INVALID_VCPU); /* debug sanity. */
> +    set_current(NULL); /* debug sanity. */
>      idle_vcpu[0] = current;

Is any of this actually changing any value in memory? I.e. wouldn't
it be better to delete all of this, or leave it in a comment for
documentation purposes? (I'm willing to ack the patch as is, but I'd
like this alternative to at least be considered.)

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 3/6] x86/boot: Remove the preconstructed low 16M superpage mappings
  2020-01-07 15:43   ` Jan Beulich
@ 2020-01-07 17:24     ` Andrew Cooper
  2020-01-08 11:23       ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2020-01-07 17:24 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 07/01/2020 15:43, Jan Beulich wrote:
> On 06.01.2020 16:54, Andrew Cooper wrote:
>> First, it is undefined to have superpages and MTRRs disagree on cacheability
>> boundaries, and nothing this early in boot has checked that it is safe to use
>> superpages here.
> Stating this here gives, at least to me, the impression that you change
> things here to obey to these restrictions. I don't see you do so, though
> - map_pages_to_xen() doesn't query MTRRs at all afaics.

No, but it does now honour the E820 WRT holes and/or reserved regions,
rather than blindly using 2M WB superpages, which is an improvement.

>
>> Furthermore, nothing actually uses the mappings on boot.  Build these entries
>> in the directmap when walking the E820 table along with everything else.
> I'm pretty sure some of these mappings were used, perhaps long ago, and
> possibly only by the 32-bit hypervisor. It would feel quite a bit better
> if it was clear when the need for this disappeared. I wonder if I could
> talk you into finding out, so you could say so here.

TBH, its hard enough figuring out how the mappings were used on staging
alone.

At a guess, these date from the pre-MB2 days, where Xen depended on
being loaded at 1M, and will have been the equivalent of:

+        /*
+         * Map Xen into the directmap (needed for early-boot pagetable
+         * handling/walking), and identity map Xen into bootmap (needed for
+         * the transition into long mode), using 2M superpages.
+         */

which is described now in patch 4.

In my experiments, discussed in the cover letter, I did get down to
having a only the single 4k trampoline page mapped, and across a number
of machines, it was the bootscrub which then hit their absence in the
directmap.

>
>> --- a/xen/arch/x86/boot/x86_64.S
>> +++ b/xen/arch/x86/boot/x86_64.S
>> @@ -66,24 +66,19 @@ l1_identmap:
>>          .size l1_identmap, . - l1_identmap
>>  
>>  /*
>> - * __page_tables_start does not cover l1_identmap because it (l1_identmap)
>> - * contains 1-1 mappings. This means that frame addresses of these mappings
>> - * are static and should not be updated at runtime.
>> + * __page_tables_{start,end} cover the range of pagetables which need
>> + * relocating as Xen moves around physical memory.  i.e. each sym_offs()
>> + * reference to a different pagetable in the Xen image.
>>   */
>>  GLOBAL(__page_tables_start)
>>  
>>  /*
>> - * Space for mapping the first 4GB of memory, with the first 16 megabytes
>> - * actualy mapped (mostly using superpages).  Uses 4x 4k pages.
>> + * Space for 4G worth of 2M mappings, first 2M actually mapped via
>> + * l1_identmap[].  Uses 4x 4k pages.
> Would you mind making this say "page tables" instead of "pages" in the
> 2nd sentence?

Why?  Currently all the "Uses x pages" are consistent, and it is
describing the size of the objects, whose units are pages, not pagetables.

>
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -1020,8 +1020,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>       *
>>       * We require superpage alignment because the boot allocator is
>>       * not yet initialised. Hence we can only map superpages in the
>> -     * address range BOOTSTRAP_MAP_BASE to 4GB, as this is guaranteed
>> -     * not to require dynamic allocation of pagetables.
>> +     * address range 2MB to 4GB, as this is guaranteed not to require
>> +     * dynamic allocation of pagetables.
>>       *
>>       * As well as mapping superpages in that range, in preparation for
>>       * initialising the boot allocator, we also look for a region to which
>> @@ -1036,10 +1036,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>          if ( boot_e820.map[i].type != E820_RAM )
>>              continue;
>>  
>> -        /* Superpage-aligned chunks from BOOTSTRAP_MAP_BASE. */
>> +        /* Superpage-aligned chunks from 2MB. */
>>          s = (boot_e820.map[i].addr + mask) & ~mask;
>>          e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
>> -        s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
>> +        s = max_t(uint64_t, s, MB(2));
>>          if ( s >= e )
>>              continue;
>>  
>> @@ -1346,8 +1346,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>  
>>          set_pdx_range(s >> PAGE_SHIFT, e >> PAGE_SHIFT);
>>  
>> -        /* Need to create mappings above BOOTSTRAP_MAP_BASE. */
>> -        map_s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
>> +        /* Need to create mappings above 2MB. */
>> +        map_s = max_t(uint64_t, s, MB(2));
> Instead of hard coding 2Mb everywhere, how about simply reducing
> BOOTSTRAP_MAP_BASE?

Because the use of BOOTSTRAP_MAP_BASE here is conceptually wrong.

Once I've figured out one other bug on the EFI side of things only, I've
got a follow-on change which manages to undef BOOTSTRAP_MAP_BASE beside
LIMIT because, ...

>  This would then also ease shrinking the build
> time mappings further, e.g. to the low 1Mb (instead of touching
> several of the places you touch now, it would again mainly be an
> adjustment to BOOTSTRAP_MAP_BASE, alongside the assembly file
> changes needed).

... as you correctly identify here, it is a property of the prebuilt
tables (in l?_identmap[]), not a property of where we chose to put the
dynamic boot mappings (in the l?_bootmap[]).  Another change (blocked
behind the above bug) moves BOOTSTRAP_MAP_BASE to be 1G to reduce the
chance of an offset from a NULL pointer hitting a present mapping.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 4/6] x86/boot: Clean up l?_bootmap[] construction
  2020-01-07 16:16   ` Jan Beulich
  2020-01-07 16:30     ` Jan Beulich
@ 2020-01-07 18:00     ` Andrew Cooper
  2020-01-08 11:38       ` Jan Beulich
  1 sibling, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2020-01-07 18:00 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 07/01/2020 16:16, Jan Beulich wrote:
> On 06.01.2020 16:54, Andrew Cooper wrote:
>> The need for Xen to be identity mapped into the bootmap is not obvious, and
>> differs between the MB and EFI boot paths.  Furthermore, the EFI side is
>> further complicated by spraying non-identity aliases into the mix.
> What (intentional) aliases are you talking about? The changes done here
> don't remove any. Or do you mean the ones occurring as a side effect of
> possibly using the same L2 in two L3 slots?

This piece of logic took ages to reverse engineer, but yes - there are
aliases.

The logic previously read:

l2_identmap[slot] = l2e_from_paddr(addr, PAGE_HYPERVISOR|_PAGE_PSE);
slot &= L2_PAGETABLE_ENTRIES - 1;
l2_bootmap[slot] = l2e_from_paddr(addr, __PAGE_HYPERVISOR|_PAGE_PSE);

which is suspicious and looks wrong, seeing as l2_bootmap[] and
l2_idetmap[] are both 4 pages long and used elsewhere as identity mappings.

This ends up working because of the l3 logic which may, in some
circumstances (the 16M of Xen crossing a 1G boundary), edit two entries
of l3_identmap[], rather than one.  In this case, there ends up being a
second (split) alias of Xen mapped at either end of 2G range which
covers Xen, as the same L2 is used in two L3e's

>> Simplify the EFI bootmap construction code to make exactly one identity-map of
>> Xen, which now matches the MB path.  Comment both pieces of logic, explaining
>> what the mappings are needed for.
> Is both boot map variants fully matching actually needed for anything?

They don't actually fully match after this change.  Xen.efi doesn't map
the trampoline, and has only ever (AFAICT) booted because
zap_low_mappings() creates the trampoline mapping even if it was absent
previously.

The MB path needs the trampoline mapping because it unconditionally
bounces through there, even when no BIOS calls are needed.  This is
expected to change in the future with David's kexec plans.

As for why they should be matching, (or at least, used consistently when
used for the same purpose), my sanity trying to figure out how the EFI
side of things didn't explode on boot.

>
>> --- a/xen/arch/x86/efi/efi-boot.h
>> +++ b/xen/arch/x86/efi/efi-boot.h
>> @@ -584,21 +584,24 @@ static void __init efi_arch_memory_setup(void)
>>      if ( !efi_enabled(EFI_LOADER) )
>>          return;
>>  
>> -    /* Initialise L2 identity-map and boot-map page table entries (16MB). */
>> +    /*
>> +     * Map Xen into the directmap (NX, needed for early-boot pagetable
>> +     * handling/walking), and identity map Xen into bootmap (X, needed for the
>> +     * transition from the EFI pagetables to Xen), using 2M superpages.
>> +     */
> How does NX vs X matter for the code below here? PAGE_HYPERVISOR and
> __PAGE_HYPERVISOR, as used below, differ by just _PAGE_GLOBAL. Did
> you mean to make further changes?

Hmm - good question.  I really did get the EFI build dying when using
code of the form:

l2_identmap[slot] = l2_bootmap[slot] =
    l2e_from_paddr(addr, __PAGE_HYPERVISOR | _PAGE_PSE);

I put that down to trying to use an NX mapping before EFER.NXE was set
up, but in light of your point, I suspect it was something else.

>
>>      for ( i = 0; i < 8; ++i )
>>      {
>>          unsigned int slot = (xen_phys_start >> L2_PAGETABLE_SHIFT) + i;
>>          paddr_t addr = slot << L2_PAGETABLE_SHIFT;
>>  
>>          l2_identmap[slot] = l2e_from_paddr(addr, PAGE_HYPERVISOR|_PAGE_PSE);
>> -        slot &= L2_PAGETABLE_ENTRIES - 1;
>>          l2_bootmap[slot] = l2e_from_paddr(addr, __PAGE_HYPERVISOR|_PAGE_PSE);
>>      }
>> -    /* Initialise L3 boot-map page directory entries. */
>> -    l3_bootmap[l3_table_offset(xen_phys_start)] =
>> -        l3e_from_paddr((UINTN)l2_bootmap, __PAGE_HYPERVISOR);
>> -    l3_bootmap[l3_table_offset(xen_phys_start + (8 << L2_PAGETABLE_SHIFT) - 1)] =
>> -        l3e_from_paddr((UINTN)l2_bootmap, __PAGE_HYPERVISOR);
>> +
>> +    /* Initialize L3 boot-map page directory entries. */
>> +    for ( i = 0; i < 4; ++i )
>> +        l3_bootmap[i] = l3e_from_paddr((UINTN)l2_bootmap + i * PAGE_SIZE,
>> +                                       __PAGE_HYPERVISOR);
> The idea behind the original code was to be immune to the number
> of pages l2_bootmap[] covers, as long as it's at least one (which
> it'll always be, I would say). The minimum requirement to any
> change to this I have is that the build must break if the size
> assumption here is violated. I.e. there may not be a literal 4 as
> the upper loop bound here, or there would need to be a
> BUILD_BUG_ON() right next to it. But I'd really prefer if the
> code was left as is (perhaps with a comment added), unless you
> can point out actual issues with it (which I can't see in the
> description), or you can otherwise justify the change with better
> than "the EFI side is further complicated by spraying non-identity
> aliases into the mix."

Given that what you describe here is totally undocumented, and AFAICT,
totally undescribed even in commit messages, it has cost me probably a
weeks worth of time to reverse to the point at which I was confident
that I knew all of what it was attempting to do.

The purpose of this was to make the handling of l?_bootmap[] as
consistent as possible between the various environments.  The pagetables
themselves are common, and should be used consistently.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 4/6] x86/boot: Clean up l?_bootmap[] construction
  2020-01-07 16:30     ` Jan Beulich
@ 2020-01-07 18:01       ` Andrew Cooper
  2020-01-13 12:08       ` Andrew Cooper
  1 sibling, 0 replies; 34+ messages in thread
From: Andrew Cooper @ 2020-01-07 18:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 07/01/2020 16:30, Jan Beulich wrote:
>>>      for ( i = 0; i < 8; ++i )
>>>      {
>>>          unsigned int slot = (xen_phys_start >> L2_PAGETABLE_SHIFT) + i;
>>>          paddr_t addr = slot << L2_PAGETABLE_SHIFT;
>>>  
>>>          l2_identmap[slot] = l2e_from_paddr(addr, PAGE_HYPERVISOR|_PAGE_PSE);
>>> -        slot &= L2_PAGETABLE_ENTRIES - 1;
>>>          l2_bootmap[slot] = l2e_from_paddr(addr, __PAGE_HYPERVISOR|_PAGE_PSE);
>>>      }
>>> -    /* Initialise L3 boot-map page directory entries. */
>>> -    l3_bootmap[l3_table_offset(xen_phys_start)] =
>>> -        l3e_from_paddr((UINTN)l2_bootmap, __PAGE_HYPERVISOR);
>>> -    l3_bootmap[l3_table_offset(xen_phys_start + (8 << L2_PAGETABLE_SHIFT) - 1)] =
>>> -        l3e_from_paddr((UINTN)l2_bootmap, __PAGE_HYPERVISOR);
>>> +
>>> +    /* Initialize L3 boot-map page directory entries. */
>>> +    for ( i = 0; i < 4; ++i )
>>> +        l3_bootmap[i] = l3e_from_paddr((UINTN)l2_bootmap + i * PAGE_SIZE,
>>> +                                       __PAGE_HYPERVISOR);
>> The idea behind the original code was to be immune to the number
>> of pages l2_bootmap[] covers, as long as it's at least one (which
>> it'll always be, I would say). The minimum requirement to any
>> change to this I have is that the build must break if the size
>> assumption here is violated. I.e. there may not be a literal 4 as
>> the upper loop bound here, or there would need to be a
>> BUILD_BUG_ON() right next to it. But I'd really prefer if the
>> code was left as is (perhaps with a comment added), unless you
>> can point out actual issues with it (which I can't see in the
>> description), or you can otherwise justify the change with better
>> than "the EFI side is further complicated by spraying non-identity
>> aliases into the mix."
> And if this change is to be made, won't it mean the code in setup.c
> commented with "Make boot page tables match non-EFI boot" can then
> go away at the same time?

When I've figured out why altering that causes the EFI boot to fail, yes
- that was the plan...

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 5/6] x86/boot: Don't map 0 during boot
  2020-01-07 16:35   ` Jan Beulich
@ 2020-01-07 18:03     ` Andrew Cooper
  0 siblings, 0 replies; 34+ messages in thread
From: Andrew Cooper @ 2020-01-07 18:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 07/01/2020 16:35, Jan Beulich wrote:
> On 06.01.2020 16:54, Andrew Cooper wrote:
>> --- a/xen/arch/x86/boot/head.S
>> +++ b/xen/arch/x86/boot/head.S
>> @@ -689,12 +689,15 @@ trampoline_setup:
>>          sub     $(L2_PAGETABLE_ENTRIES*8),%eax
>>          loop    1b
>>  
>> -        /*
>> -         * During boot, hook 4kB mappings of first 2MB of memory into L2.
>> -         * This avoids mixing cachability for the legacy VGA region.
>> -         */
>> -        lea     __PAGE_HYPERVISOR+sym_esi(l1_identmap),%edi
>> -        mov     %edi,sym_fs(l2_bootmap)
>> +        /* Map the permentant trampoline page into l{1,2}_bootmap[]. */
> "permanent"?

Fixed.

>
>> +        mov     sym_esi(trampoline_phys), %edx
>> +        mov     %edx, %ecx
>> +        or      $__PAGE_HYPERVISOR_RX, %edx /* %edx = PTE to write  */
>> +        shr     $PAGE_SHIFT, %ecx           /* %ecx = Slot to write */
> Following the LEA model further down, how about
>
>         mov     sym_esi(trampoline_phys), %ecx
>         lea     __PAGE_HYPERVISOR_RX(%ecx), %edx /* %edx = PTE to write  */
>         shr     $PAGE_SHIFT, %ecx                /* %ecx = Slot to write */
>
> ?

LGTM

> Anyway, with or without this adjustment
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 6/6] x86/boot: Drop INVALID_VCPU
  2020-01-07 16:52   ` Jan Beulich
@ 2020-01-07 18:07     ` Andrew Cooper
  2020-01-08 11:44       ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2020-01-07 18:07 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 07/01/2020 16:52, Jan Beulich wrote:
> On 06.01.2020 16:54, Andrew Cooper wrote:
>> Now that NULL will fault at boot, there is no need for a special constant to
>> signify "current not set up yet".
> Mind making this "... no strong need ..."? The benefit of an easily
> recognizable value goes away, but I guess we'll be fine without.
> IOW I'm not meaning to object.

Fine.

>
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -705,7 +705,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>      /* Critical region without IDT or TSS.  Any fault is deadly! */
>>  
>>      set_processor_id(0);
>> -    set_current(INVALID_VCPU); /* debug sanity. */
>> +    set_current(NULL); /* debug sanity. */
>>      idle_vcpu[0] = current;
> Is any of this actually changing any value in memory?

Yes. Observe:

    /* Set up stack. */
    lea     STACK_SIZE + sym_esi(cpu0_stack), %esp

twice in head.S, meaning that the top-of-stack block is junk at this point.

Explicitly setting it to NULL here seems like a safer option than
trusting that noone has actually used the stack yet.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 2/6] x86/boot: Map the trampoline as read-only
  2020-01-07 16:19       ` Jan Beulich
@ 2020-01-07 19:04         ` Andrew Cooper
  2020-01-08 11:08           ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2020-01-07 19:04 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 07/01/2020 16:19, Jan Beulich wrote:
> On 07.01.2020 16:51, Andrew Cooper wrote:
>> On 07/01/2020 15:21, Jan Beulich wrote:
>>> On 06.01.2020 16:54, Andrew Cooper wrote:
>>>> c/s ec92fcd1d08, which caused the trampoline GDT Access bits to be set,
>>>> removed the final writes which occurred between enabling paging and switching
>>>> to the high mappings.  There don't plausibly need to be any memory writes in
>>>> few instructions is takes to perform this transition.
>>>>
>>>> As a consequence, we can remove the RWX mapping of the trampoline.  It is RX
>>>> via its identity mapping below 1M, and RW via the directmap.
>>>>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>
>>>> This probably wants backporting, alongside ec92fcd1d08 if it hasn't yet.
>>> This is just cleanup, largely cosmetic in nature. It could be argued
>>> that once the directmap has disappeared this can serve as additional
>>> proof that the trampoline range has no (intended) writable mappings
>>> anymore, but prior to that point I don't see much further benefit.
>>> Could you expand on the reasons why you see both as backporting
>>> candidates?
>> Defence in depth.
>>
>> An RWX mapping is very attractive for an attacker who's broken into Xen
>> and is looking to expand the damage they can do.
> Such an attacker is typically in the position though to make
> themselves RWX mappings.

This is one example of a possibility.  I wouldn't put it in the "likely"
category, and it definitely isn't a guarantee.

>  Having as little as possible is only
> complicating their job, not making it impossible, I would say.

Yes, and?

This is the entire point of defence in depth.  Make an attackers job harder.

Enforcing W^X is universally considered a good thing from a security
perspective, because it removes a load of trivial cases cases where a
stack over-write can easily be turned into arbitrary code execution.

Sure - this isn't going to stop an attacker who has arbitrary write
exploit, but it very well might stop an attacker who only has restricted
write exploit.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 2/6] x86/boot: Map the trampoline as read-only
  2020-01-07 19:04         ` Andrew Cooper
@ 2020-01-08 11:08           ` Jan Beulich
  2020-01-08 15:51             ` Andrew Cooper
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2020-01-08 11:08 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 07.01.2020 20:04, Andrew Cooper wrote:
> On 07/01/2020 16:19, Jan Beulich wrote:
>> On 07.01.2020 16:51, Andrew Cooper wrote:
>>> On 07/01/2020 15:21, Jan Beulich wrote:
>>>> On 06.01.2020 16:54, Andrew Cooper wrote:
>>>>> c/s ec92fcd1d08, which caused the trampoline GDT Access bits to be set,
>>>>> removed the final writes which occurred between enabling paging and switching
>>>>> to the high mappings.  There don't plausibly need to be any memory writes in
>>>>> few instructions is takes to perform this transition.
>>>>>
>>>>> As a consequence, we can remove the RWX mapping of the trampoline.  It is RX
>>>>> via its identity mapping below 1M, and RW via the directmap.
>>>>>
>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>>
>>>>> This probably wants backporting, alongside ec92fcd1d08 if it hasn't yet.
>>>> This is just cleanup, largely cosmetic in nature. It could be argued
>>>> that once the directmap has disappeared this can serve as additional
>>>> proof that the trampoline range has no (intended) writable mappings
>>>> anymore, but prior to that point I don't see much further benefit.
>>>> Could you expand on the reasons why you see both as backporting
>>>> candidates?
>>> Defence in depth.
>>>
>>> An RWX mapping is very attractive for an attacker who's broken into Xen
>>> and is looking to expand the damage they can do.
>> Such an attacker is typically in the position though to make
>> themselves RWX mappings.
> 
> This is one example of a possibility.  I wouldn't put it in the "likely"
> category, and it definitely isn't a guarantee.
> 
>>  Having as little as possible is only
>> complicating their job, not making it impossible, I would say.
> 
> Yes, and?
> 
> This is the entire point of defence in depth.  Make an attackers job harder.
> 
> Enforcing W^X is universally considered a good thing from a security
> perspective, because it removes a load of trivial cases cases where a
> stack over-write can easily be turned into arbitrary code execution.

Then let me ask the question differently: Did we backport any of the
earlier RWX elimination changes? I don't recall us doing so. Please
don't get me wrong - I'm happy to be convinced of the backport need,
but as always I'd like to take such a decision in a consistent (and
hence sufficiently predictable) manner, or alternatively with a good
enough reason to ignore this general goal.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 3/6] x86/boot: Remove the preconstructed low 16M superpage mappings
  2020-01-07 17:24     ` Andrew Cooper
@ 2020-01-08 11:23       ` Jan Beulich
  2020-01-08 19:41         ` Andrew Cooper
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2020-01-08 11:23 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 07.01.2020 18:24, Andrew Cooper wrote:
> On 07/01/2020 15:43, Jan Beulich wrote:
>> On 06.01.2020 16:54, Andrew Cooper wrote:
>>> First, it is undefined to have superpages and MTRRs disagree on cacheability
>>> boundaries, and nothing this early in boot has checked that it is safe to use
>>> superpages here.
>> Stating this here gives, at least to me, the impression that you change
>> things here to obey to these restrictions. I don't see you do so, though
>> - map_pages_to_xen() doesn't query MTRRs at all afaics.
> 
> No, but it does now honour the E820 WRT holes and/or reserved regions,
> rather than blindly using 2M WB superpages, which is an improvement.

Can you then please make more explicit in this part of the description
what gets improved and what remains to be fishy?

>>> Furthermore, nothing actually uses the mappings on boot.  Build these entries
>>> in the directmap when walking the E820 table along with everything else.
>> I'm pretty sure some of these mappings were used, perhaps long ago, and
>> possibly only by the 32-bit hypervisor. It would feel quite a bit better
>> if it was clear when the need for this disappeared. I wonder if I could
>> talk you into finding out, so you could say so here.
> 
> TBH, its hard enough figuring out how the mappings were used on staging
> alone.
> 
> At a guess, these date from the pre-MB2 days, where Xen depended on
> being loaded at 1M, and will have been the equivalent of:
> 
> +        /*
> +         * Map Xen into the directmap (needed for early-boot pagetable
> +         * handling/walking), and identity map Xen into bootmap (needed for
> +         * the transition into long mode), using 2M superpages.
> +         */
> 
> which is described now in patch 4.
> 
> In my experiments, discussed in the cover letter, I did get down to
> having a only the single 4k trampoline page mapped, and across a number
> of machines, it was the bootscrub which then hit their absence in the
> directmap.

Well, okay then without further archaeology.

>>> --- a/xen/arch/x86/boot/x86_64.S
>>> +++ b/xen/arch/x86/boot/x86_64.S
>>> @@ -66,24 +66,19 @@ l1_identmap:
>>>          .size l1_identmap, . - l1_identmap
>>>  
>>>  /*
>>> - * __page_tables_start does not cover l1_identmap because it (l1_identmap)
>>> - * contains 1-1 mappings. This means that frame addresses of these mappings
>>> - * are static and should not be updated at runtime.
>>> + * __page_tables_{start,end} cover the range of pagetables which need
>>> + * relocating as Xen moves around physical memory.  i.e. each sym_offs()
>>> + * reference to a different pagetable in the Xen image.
>>>   */
>>>  GLOBAL(__page_tables_start)
>>>  
>>>  /*
>>> - * Space for mapping the first 4GB of memory, with the first 16 megabytes
>>> - * actualy mapped (mostly using superpages).  Uses 4x 4k pages.
>>> + * Space for 4G worth of 2M mappings, first 2M actually mapped via
>>> + * l1_identmap[].  Uses 4x 4k pages.
>> Would you mind making this say "page tables" instead of "pages" in the
>> 2nd sentence?
> 
> Why?  Currently all the "Uses x pages" are consistent, and it is
> describing the size of the objects, whose units are pages, not pagetables.

Fair enough.

>>> --- a/xen/arch/x86/setup.c
>>> +++ b/xen/arch/x86/setup.c
>>> @@ -1020,8 +1020,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>>       *
>>>       * We require superpage alignment because the boot allocator is
>>>       * not yet initialised. Hence we can only map superpages in the
>>> -     * address range BOOTSTRAP_MAP_BASE to 4GB, as this is guaranteed
>>> -     * not to require dynamic allocation of pagetables.
>>> +     * address range 2MB to 4GB, as this is guaranteed not to require
>>> +     * dynamic allocation of pagetables.
>>>       *
>>>       * As well as mapping superpages in that range, in preparation for
>>>       * initialising the boot allocator, we also look for a region to which
>>> @@ -1036,10 +1036,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>>          if ( boot_e820.map[i].type != E820_RAM )
>>>              continue;
>>>  
>>> -        /* Superpage-aligned chunks from BOOTSTRAP_MAP_BASE. */
>>> +        /* Superpage-aligned chunks from 2MB. */
>>>          s = (boot_e820.map[i].addr + mask) & ~mask;
>>>          e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
>>> -        s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
>>> +        s = max_t(uint64_t, s, MB(2));
>>>          if ( s >= e )
>>>              continue;
>>>  
>>> @@ -1346,8 +1346,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>>  
>>>          set_pdx_range(s >> PAGE_SHIFT, e >> PAGE_SHIFT);
>>>  
>>> -        /* Need to create mappings above BOOTSTRAP_MAP_BASE. */
>>> -        map_s = max_t(uint64_t, s, BOOTSTRAP_MAP_BASE);
>>> +        /* Need to create mappings above 2MB. */
>>> +        map_s = max_t(uint64_t, s, MB(2));
>> Instead of hard coding 2Mb everywhere, how about simply reducing
>> BOOTSTRAP_MAP_BASE?
> 
> Because the use of BOOTSTRAP_MAP_BASE here is conceptually wrong.
> 
> Once I've figured out one other bug on the EFI side of things only, I've
> got a follow-on change which manages to undef BOOTSTRAP_MAP_BASE beside
> LIMIT because, ...
> 
>>  This would then also ease shrinking the build
>> time mappings further, e.g. to the low 1Mb (instead of touching
>> several of the places you touch now, it would again mainly be an
>> adjustment to BOOTSTRAP_MAP_BASE, alongside the assembly file
>> changes needed).
> 
> ... as you correctly identify here, it is a property of the prebuilt
> tables (in l?_identmap[]), not a property of where we chose to put the
> dynamic boot mappings (in the l?_bootmap[]).  Another change (blocked
> behind the above bug) moves BOOTSTRAP_MAP_BASE to be 1G to reduce the
> chance of an offset from a NULL pointer hitting a present mapping.

Right, BOOTSTRAP_MAP_BASE was (ab)used for a 2nd purpose. But this
would better be dealt with by introducing a new manifest constant
(e.g. PREBUILT_MAP_LIMIT) instead of open-coding 2Mb everywhere. Plus
there's (aiui) a PREBUILT_MAP_LIMIT <= BOOTSTRAP_MAP_BASE
requirement, which would better be verified (e.g. by a BUILD_BUG_ON())
then. Then again, having also seen patch 5 now, the relationship
basically goes away altogether there, so perhaps adding a check here
just to drop it there again isn't very useful (but the omission
thereof in the patch here might warrant a remark in the description
then).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 4/6] x86/boot: Clean up l?_bootmap[] construction
  2020-01-07 18:00     ` Andrew Cooper
@ 2020-01-08 11:38       ` Jan Beulich
  2020-01-08 16:15         ` Andrew Cooper
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2020-01-08 11:38 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 07.01.2020 19:00, Andrew Cooper wrote:
> On 07/01/2020 16:16, Jan Beulich wrote:
>> On 06.01.2020 16:54, Andrew Cooper wrote:
>>>      for ( i = 0; i < 8; ++i )
>>>      {
>>>          unsigned int slot = (xen_phys_start >> L2_PAGETABLE_SHIFT) + i;
>>>          paddr_t addr = slot << L2_PAGETABLE_SHIFT;
>>>  
>>>          l2_identmap[slot] = l2e_from_paddr(addr, PAGE_HYPERVISOR|_PAGE_PSE);
>>> -        slot &= L2_PAGETABLE_ENTRIES - 1;
>>>          l2_bootmap[slot] = l2e_from_paddr(addr, __PAGE_HYPERVISOR|_PAGE_PSE);
>>>      }
>>> -    /* Initialise L3 boot-map page directory entries. */
>>> -    l3_bootmap[l3_table_offset(xen_phys_start)] =
>>> -        l3e_from_paddr((UINTN)l2_bootmap, __PAGE_HYPERVISOR);
>>> -    l3_bootmap[l3_table_offset(xen_phys_start + (8 << L2_PAGETABLE_SHIFT) - 1)] =
>>> -        l3e_from_paddr((UINTN)l2_bootmap, __PAGE_HYPERVISOR);
>>> +
>>> +    /* Initialize L3 boot-map page directory entries. */
>>> +    for ( i = 0; i < 4; ++i )
>>> +        l3_bootmap[i] = l3e_from_paddr((UINTN)l2_bootmap + i * PAGE_SIZE,
>>> +                                       __PAGE_HYPERVISOR);
>> The idea behind the original code was to be immune to the number
>> of pages l2_bootmap[] covers, as long as it's at least one (which
>> it'll always be, I would say). The minimum requirement to any
>> change to this I have is that the build must break if the size
>> assumption here is violated. I.e. there may not be a literal 4 as
>> the upper loop bound here, or there would need to be a
>> BUILD_BUG_ON() right next to it. But I'd really prefer if the
>> code was left as is (perhaps with a comment added), unless you
>> can point out actual issues with it (which I can't see in the
>> description), or you can otherwise justify the change with better
>> than "the EFI side is further complicated by spraying non-identity
>> aliases into the mix."
> 
> Given that what you describe here is totally undocumented, and AFAICT,
> totally undescribed even in commit messages, it has cost me probably a
> weeks worth of time to reverse to the point at which I was confident
> that I knew all of what it was attempting to do.

This is not meant as an excuse (I really should have done better back
then), but you could have asked.

> The purpose of this was to make the handling of l?_bootmap[] as
> consistent as possible between the various environments.  The pagetables
> themselves are common, and should be used consistently.

I don't think I can wholeheartedly agree here: l?_bootmap[] are
throw-away page tables (living in .init), and with the non-EFI and
EFI boot paths being so different anyway, them using the available
tables differently is not a big issue imo. This heavy difference of
other aspects was also why back then I decided to be as defensive
towards l2_bootmap[] size changes as possible in code which doesn't
really need it to be multiple pages.

As said - I'm going to try to not stand in the way of you re-
arranging this, but
- the new code should not break silently when (in particular)
  l2_bootmap[] changes
- the description should be more explicit about the motivation of
  the change (which includes distinguishing between intentional
  mappings and ones simply appearing as a side effect, without
  getting in the way)

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 6/6] x86/boot: Drop INVALID_VCPU
  2020-01-07 18:07     ` Andrew Cooper
@ 2020-01-08 11:44       ` Jan Beulich
  0 siblings, 0 replies; 34+ messages in thread
From: Jan Beulich @ 2020-01-08 11:44 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 07.01.2020 19:07, Andrew Cooper wrote:
> On 07/01/2020 16:52, Jan Beulich wrote:
>> On 06.01.2020 16:54, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/setup.c
>>> +++ b/xen/arch/x86/setup.c
>>> @@ -705,7 +705,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>>      /* Critical region without IDT or TSS.  Any fault is deadly! */
>>>  
>>>      set_processor_id(0);
>>> -    set_current(INVALID_VCPU); /* debug sanity. */
>>> +    set_current(NULL); /* debug sanity. */
>>>      idle_vcpu[0] = current;
>> Is any of this actually changing any value in memory?
> 
> Yes. Observe:
> 
>     /* Set up stack. */
>     lea     STACK_SIZE + sym_esi(cpu0_stack), %esp
> 
> twice in head.S, meaning that the top-of-stack block is junk at this point.

Why junk, when we have

char __section(".bss.stack_aligned") __aligned(STACK_SIZE)
    cpu0_stack[STACK_SIZE];

> Explicitly setting it to NULL here seems like a safer option than
> trusting that noone has actually used the stack yet.

The actual "stack" part of cpu0_stack[] may have been used already,
but the top-of-stack block ought to be untouched, or else we have
other problems. Anyway, I don't heavily mind writing several zeros
over what is already zero here, it just seems pretty pointless (and
increasingly so by you now writing yet another zero).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 2/6] x86/boot: Map the trampoline as read-only
  2020-01-08 11:08           ` Jan Beulich
@ 2020-01-08 15:51             ` Andrew Cooper
  0 siblings, 0 replies; 34+ messages in thread
From: Andrew Cooper @ 2020-01-08 15:51 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 08/01/2020 11:08, Jan Beulich wrote:
> On 07.01.2020 20:04, Andrew Cooper wrote:
>> On 07/01/2020 16:19, Jan Beulich wrote:
>>> On 07.01.2020 16:51, Andrew Cooper wrote:
>>>> On 07/01/2020 15:21, Jan Beulich wrote:
>>>>> On 06.01.2020 16:54, Andrew Cooper wrote:
>>>>>> c/s ec92fcd1d08, which caused the trampoline GDT Access bits to be set,
>>>>>> removed the final writes which occurred between enabling paging and switching
>>>>>> to the high mappings.  There don't plausibly need to be any memory writes in
>>>>>> few instructions is takes to perform this transition.
>>>>>>
>>>>>> As a consequence, we can remove the RWX mapping of the trampoline.  It is RX
>>>>>> via its identity mapping below 1M, and RW via the directmap.
>>>>>>
>>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>>>
>>>>>> This probably wants backporting, alongside ec92fcd1d08 if it hasn't yet.
>>>>> This is just cleanup, largely cosmetic in nature. It could be argued
>>>>> that once the directmap has disappeared this can serve as additional
>>>>> proof that the trampoline range has no (intended) writable mappings
>>>>> anymore, but prior to that point I don't see much further benefit.
>>>>> Could you expand on the reasons why you see both as backporting
>>>>> candidates?
>>>> Defence in depth.
>>>>
>>>> An RWX mapping is very attractive for an attacker who's broken into Xen
>>>> and is looking to expand the damage they can do.
>>> Such an attacker is typically in the position though to make
>>> themselves RWX mappings.
>> This is one example of a possibility.  I wouldn't put it in the "likely"
>> category, and it definitely isn't a guarantee.
>>
>>>  Having as little as possible is only
>>> complicating their job, not making it impossible, I would say.
>> Yes, and?
>>
>> This is the entire point of defence in depth.  Make an attackers job harder.
>>
>> Enforcing W^X is universally considered a good thing from a security
>> perspective, because it removes a load of trivial cases cases where a
>> stack over-write can easily be turned into arbitrary code execution.
> Then let me ask the question differently: Did we backport any of the
> earlier RWX elimination changes? I don't recall us doing so.

I don't know if we did or not.

> Please
> don't get me wrong - I'm happy to be convinced of the backport need,
> but as always I'd like to take such a decision in a consistent (and
> hence sufficiently predictable) manner, or alternatively with a good
> enough reason to ignore this general goal.

If we didn't, then we really ought to have done.  There are real,
concrete security nice-to-haves from it.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 4/6] x86/boot: Clean up l?_bootmap[] construction
  2020-01-08 11:38       ` Jan Beulich
@ 2020-01-08 16:15         ` Andrew Cooper
  2020-01-08 16:55           ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2020-01-08 16:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 08/01/2020 11:38, Jan Beulich wrote:
>> The purpose of this was to make the handling of l?_bootmap[] as
>> consistent as possible between the various environments.  The pagetables
>> themselves are common, and should be used consistently.
> I don't think I can wholeheartedly agree here: l?_bootmap[] are
> throw-away page tables (living in .init), and with the non-EFI and
> EFI boot paths being so different anyway, them using the available
> tables differently is not a big issue imo. This heavy difference of
> other aspects was also why back then I decided to be as defensive
> towards l2_bootmap[] size changes as possible in code which doesn't
> really need it to be multiple pages.

From this description, it suggests that you haven't spotted the rather
more subtle bug which will trip up anyone trying to develop in the future.

This scheme is incompatible with trying to map a second object (e.g. the
trampoline) into the bootmap, because depending on alignment, it may
overlap with the PTEs which mapped Xen.  There also typically isn't an
l3_bootmap[0] => l2_bootmap[0] because of where xen.efi is loaded in memory.

>
> As said - I'm going to try to not stand in the way of you re-
> arranging this, but
> - the new code should not break silently when (in particular)
>   l2_bootmap[] changes

What practical changes do you think could be done here?  I can't spot
any which would be helpful.

A BUILD_BUG_ON() doesn't work.  The most likely case for something going
wrong here is an edit to x86_64.S and no equivalent edit to page.h,
which a BUILD_BUG_ON() wouldn't spot.  head.S similarly has no useful
protections which could be added.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 4/6] x86/boot: Clean up l?_bootmap[] construction
  2020-01-08 16:15         ` Andrew Cooper
@ 2020-01-08 16:55           ` Jan Beulich
  2020-01-08 17:09             ` Andrew Cooper
  0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2020-01-08 16:55 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 08.01.2020 17:15, Andrew Cooper wrote:
> On 08/01/2020 11:38, Jan Beulich wrote:
>> As said - I'm going to try to not stand in the way of you re-
>> arranging this, but
>> - the new code should not break silently when (in particular)
>>   l2_bootmap[] changes
> 
> What practical changes do you think could be done here?  I can't spot
> any which would be helpful.
> 
> A BUILD_BUG_ON() doesn't work.  The most likely case for something going
> wrong here is an edit to x86_64.S and no equivalent edit to page.h,
> which a BUILD_BUG_ON() wouldn't spot.  head.S similarly has no useful
> protections which could be added.

Well, the fundamental assumption is that the .S files and the
C declaration of l?_bootmap[] are kept in sync. No BUILD_BUG_ON()
can cover a mistake made there. But rather than using the literal
4 as you did, an ARRAY_SIZE() construct should be usable to either
replace it, or amend it with a BUILD_BUG_ON().

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 4/6] x86/boot: Clean up l?_bootmap[] construction
  2020-01-08 16:55           ` Jan Beulich
@ 2020-01-08 17:09             ` Andrew Cooper
  2020-01-09  9:30               ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2020-01-08 17:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 08/01/2020 16:55, Jan Beulich wrote:
> On 08.01.2020 17:15, Andrew Cooper wrote:
>> On 08/01/2020 11:38, Jan Beulich wrote:
>>> As said - I'm going to try to not stand in the way of you re-
>>> arranging this, but
>>> - the new code should not break silently when (in particular)
>>>   l2_bootmap[] changes
>> What practical changes do you think could be done here?  I can't spot
>> any which would be helpful.
>>
>> A BUILD_BUG_ON() doesn't work.  The most likely case for something going
>> wrong here is an edit to x86_64.S and no equivalent edit to page.h,
>> which a BUILD_BUG_ON() wouldn't spot.  head.S similarly has no useful
>> protections which could be added.
> Well, the fundamental assumption is that the .S files and the
> C declaration of l?_bootmap[] are kept in sync. No BUILD_BUG_ON()
> can cover a mistake made there. But rather than using the literal
> 4 as you did, an ARRAY_SIZE() construct should be usable to either
> replace it, or amend it with a BUILD_BUG_ON().

You are aware that ARRAY_SIZE(l2_bootmap) is 2048 and
ARRAY_SIZE(l3_bootmap) is 512, neither of which would be correct here?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 3/6] x86/boot: Remove the preconstructed low 16M superpage mappings
  2020-01-08 11:23       ` Jan Beulich
@ 2020-01-08 19:41         ` Andrew Cooper
  2020-01-09  9:35           ` Jan Beulich
  0 siblings, 1 reply; 34+ messages in thread
From: Andrew Cooper @ 2020-01-08 19:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 08/01/2020 11:23, Jan Beulich wrote:
>>>  This would then also ease shrinking the build
>>> time mappings further, e.g. to the low 1Mb (instead of touching
>>> several of the places you touch now, it would again mainly be an
>>> adjustment to BOOTSTRAP_MAP_BASE, alongside the assembly file
>>> changes needed).
>> ... as you correctly identify here, it is a property of the prebuilt
>> tables (in l?_identmap[]), not a property of where we chose to put the
>> dynamic boot mappings (in the l?_bootmap[]).  Another change (blocked
>> behind the above bug) moves BOOTSTRAP_MAP_BASE to be 1G to reduce the
>> chance of an offset from a NULL pointer hitting a present mapping.
> Right, BOOTSTRAP_MAP_BASE was (ab)used for a 2nd purpose. But this
> would better be dealt with by introducing a new manifest constant
> (e.g. PREBUILT_MAP_LIMIT) instead of open-coding 2Mb everywhere.

I'm hoping to get rid of even this, (although it is complicated by
CONFIG_VIDEO's blind use of the legacy VGA range).

> Plus there's (aiui) a PREBUILT_MAP_LIMIT <= BOOTSTRAP_MAP_BASE
> requirement, which would better be verified (e.g. by a BUILD_BUG_ON())
> then.

Is there?  I don't see a real connection between the two, even in this
patch.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 4/6] x86/boot: Clean up l?_bootmap[] construction
  2020-01-08 17:09             ` Andrew Cooper
@ 2020-01-09  9:30               ` Jan Beulich
  0 siblings, 0 replies; 34+ messages in thread
From: Jan Beulich @ 2020-01-09  9:30 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 08.01.2020 18:09, Andrew Cooper wrote:
> On 08/01/2020 16:55, Jan Beulich wrote:
>> On 08.01.2020 17:15, Andrew Cooper wrote:
>>> On 08/01/2020 11:38, Jan Beulich wrote:
>>>> As said - I'm going to try to not stand in the way of you re-
>>>> arranging this, but
>>>> - the new code should not break silently when (in particular)
>>>>   l2_bootmap[] changes
>>> What practical changes do you think could be done here?  I can't spot
>>> any which would be helpful.
>>>
>>> A BUILD_BUG_ON() doesn't work.  The most likely case for something going
>>> wrong here is an edit to x86_64.S and no equivalent edit to page.h,
>>> which a BUILD_BUG_ON() wouldn't spot.  head.S similarly has no useful
>>> protections which could be added.
>> Well, the fundamental assumption is that the .S files and the
>> C declaration of l?_bootmap[] are kept in sync. No BUILD_BUG_ON()
>> can cover a mistake made there. But rather than using the literal
>> 4 as you did, an ARRAY_SIZE() construct should be usable to either
>> replace it, or amend it with a BUILD_BUG_ON().
> 
> You are aware that ARRAY_SIZE(l2_bootmap) is 2048 and
> ARRAY_SIZE(l3_bootmap) is 512, neither of which would be correct here?

Yes, I am (which is why I added "construct"). Dividing by
L<n>_PAGETABLE_ENTRIES would be one option. In particular for
l2_bootmap declaring at as [4][L2_PAGETABLE_ENTRIES] would be
another.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 3/6] x86/boot: Remove the preconstructed low 16M superpage mappings
  2020-01-08 19:41         ` Andrew Cooper
@ 2020-01-09  9:35           ` Jan Beulich
  0 siblings, 0 replies; 34+ messages in thread
From: Jan Beulich @ 2020-01-09  9:35 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 08.01.2020 20:41, Andrew Cooper wrote:
> On 08/01/2020 11:23, Jan Beulich wrote:
>>>>  This would then also ease shrinking the build
>>>> time mappings further, e.g. to the low 1Mb (instead of touching
>>>> several of the places you touch now, it would again mainly be an
>>>> adjustment to BOOTSTRAP_MAP_BASE, alongside the assembly file
>>>> changes needed).
>>> ... as you correctly identify here, it is a property of the prebuilt
>>> tables (in l?_identmap[]), not a property of where we chose to put the
>>> dynamic boot mappings (in the l?_bootmap[]).  Another change (blocked
>>> behind the above bug) moves BOOTSTRAP_MAP_BASE to be 1G to reduce the
>>> chance of an offset from a NULL pointer hitting a present mapping.
>> Right, BOOTSTRAP_MAP_BASE was (ab)used for a 2nd purpose. But this
>> would better be dealt with by introducing a new manifest constant
>> (e.g. PREBUILT_MAP_LIMIT) instead of open-coding 2Mb everywhere.
> 
> I'm hoping to get rid of even this, (although it is complicated by
> CONFIG_VIDEO's blind use of the legacy VGA range).
> 
>> Plus there's (aiui) a PREBUILT_MAP_LIMIT <= BOOTSTRAP_MAP_BASE
>> requirement, which would better be verified (e.g. by a BUILD_BUG_ON())
>> then.
> 
> Is there?  I don't see a real connection between the two, even in this
> patch.

Perhaps not a "real" one (in the sense that I think you mean), but
at the very least when PREBUILT_MAP_LIMIT > BOOTSTRAP_MAP_BASE
adding a first bootstrap mapping would _require_ a TLB flush, whereas
things would be fine without otherwise (albeit this going through
map_pages_to_xen() means appropriate flushing will be done anyway).
Anything else depends on the potentially overlapping range actually
being used for something (which your series proves it's not).

Anyway, as said before, considering your subsequent changes, there's
no real need to add any further checking here.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xen-devel] [PATCH 4/6] x86/boot: Clean up l?_bootmap[] construction
  2020-01-07 16:30     ` Jan Beulich
  2020-01-07 18:01       ` Andrew Cooper
@ 2020-01-13 12:08       ` Andrew Cooper
  1 sibling, 0 replies; 34+ messages in thread
From: Andrew Cooper @ 2020-01-13 12:08 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel, Wei Liu, Roger Pau Monné

On 07/01/2020 16:30, Jan Beulich wrote:
> On 07.01.2020 17:16, Jan Beulich wrote:
>> On 06.01.2020 16:54, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/efi/efi-boot.h
>>> +++ b/xen/arch/x86/efi/efi-boot.h
>>> @@ -584,21 +584,24 @@ static void __init efi_arch_memory_setup(void)
>>>      if ( !efi_enabled(EFI_LOADER) )
>>>          return;
>>>  
>>> -    /* Initialise L2 identity-map and boot-map page table entries (16MB). */
>>> +    /*
>>> +     * Map Xen into the directmap (NX, needed for early-boot pagetable
>>> +     * handling/walking), and identity map Xen into bootmap (X, needed for the
>>> +     * transition from the EFI pagetables to Xen), using 2M superpages.
>>> +     */
>> How does NX vs X matter for the code below here? PAGE_HYPERVISOR and
>> __PAGE_HYPERVISOR, as used below, differ by just _PAGE_GLOBAL. Did
>> you mean to make further changes?

Nope.  The comments were actually correct (and the code, remained correct).

PAGE_HYPERVISOR and __PAGE_HYPERVISOR really do differ by NX as well,
outside of asm code.  I'm going to fix this because its too complicated
to reason about.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2020-01-13 12:08 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-06 15:54 [Xen-devel] [PATCH 0/6] x86/boot: Remove mappings at 0 Andrew Cooper
2020-01-06 15:54 ` [Xen-devel] [PATCH 1/6] x86/boot: Check for E820_RAM earlier when searching the E820 Andrew Cooper
2020-01-07 15:12   ` Jan Beulich
2020-01-06 15:54 ` [Xen-devel] [PATCH 2/6] x86/boot: Map the trampoline as read-only Andrew Cooper
2020-01-07 15:21   ` Jan Beulich
2020-01-07 15:51     ` Andrew Cooper
2020-01-07 16:19       ` Jan Beulich
2020-01-07 19:04         ` Andrew Cooper
2020-01-08 11:08           ` Jan Beulich
2020-01-08 15:51             ` Andrew Cooper
2020-01-06 15:54 ` [Xen-devel] [PATCH 3/6] x86/boot: Remove the preconstructed low 16M superpage mappings Andrew Cooper
2020-01-07 15:43   ` Jan Beulich
2020-01-07 17:24     ` Andrew Cooper
2020-01-08 11:23       ` Jan Beulich
2020-01-08 19:41         ` Andrew Cooper
2020-01-09  9:35           ` Jan Beulich
2020-01-06 15:54 ` [Xen-devel] [PATCH 4/6] x86/boot: Clean up l?_bootmap[] construction Andrew Cooper
2020-01-07 16:16   ` Jan Beulich
2020-01-07 16:30     ` Jan Beulich
2020-01-07 18:01       ` Andrew Cooper
2020-01-13 12:08       ` Andrew Cooper
2020-01-07 18:00     ` Andrew Cooper
2020-01-08 11:38       ` Jan Beulich
2020-01-08 16:15         ` Andrew Cooper
2020-01-08 16:55           ` Jan Beulich
2020-01-08 17:09             ` Andrew Cooper
2020-01-09  9:30               ` Jan Beulich
2020-01-06 15:54 ` [Xen-devel] [PATCH 5/6] x86/boot: Don't map 0 during boot Andrew Cooper
2020-01-07 16:35   ` Jan Beulich
2020-01-07 18:03     ` Andrew Cooper
2020-01-06 15:54 ` [Xen-devel] [PATCH 6/6] x86/boot: Drop INVALID_VCPU Andrew Cooper
2020-01-07 16:52   ` Jan Beulich
2020-01-07 18:07     ` Andrew Cooper
2020-01-08 11:44       ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.