All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Xen-devel <xen-devel@lists.xenproject.org>
Cc: "Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Jan Beulich" <JBeulich@suse.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>, "Wei Liu" <wl@xen.org>
Subject: [PATCH v2-ish] x86/boot: Factor move_xen() out of __start_xen()
Date: Fri, 16 Dec 2022 20:17:49 +0000	[thread overview]
Message-ID: <20221216201749.32164-1-andrew.cooper3@citrix.com> (raw)
In-Reply-To: <20211207105339.28440-1-andrew.cooper3@citrix.com>

Partly for clarity because there is a lot of subtle magic at work here.
Expand the commentary of what's going on.

Also, because there is no need to double copy the stack (32kB) to retrieve
local values spilled to the stack under the old alias, when all of the
aforementioned local variables are going out of scope anyway.

There is also a logic change when walking l2_xenmap[].  The old logic would
skip recursing into 4k mappings; this would corrupt Xen if it were used.
There are no 4k mappings in l2_xenmap[] this early on boot, so assert the
property instead of leaving a latent breakage.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>

We probably want to drop PGE from XEN_MINIMAL_CR4.  It will simplify several
boot paths which don't need to care about global pages, and PGE is conditional
anyway now with the PCID support.

v2-ish:
 * Split out previous series.  Was previously "x86/boot: Don't double-copy the
   stack during physical relocation" reworked to remove the risk of losing
   local stack updated we might care about.

Note that putting a printk() in the critical region wouldn't have worked even
with the old logic, and is not a reduction in functionality caused by this
patch avoiding the second stack copy.
---
 xen/arch/x86/setup.c | 178 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 108 insertions(+), 70 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 4102aae76dde..830502f2d0d9 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -467,6 +467,113 @@ static void __init move_memory(
     }
 }
 
+static void __init noinline move_xen(void)
+{
+    l4_pgentry_t *pl4e;
+    l3_pgentry_t *pl3e;
+    l2_pgentry_t *pl2e;
+    unsigned long tmp;
+    int i, j, k;
+
+    /*
+     * The caller has selected xen_phys_start, ensuring that the old and new
+     * locations do not overlap.  Make it so.
+     *
+     * Prevent the compiler from reordering anything across this point.  Such
+     * things will end badly.
+     */
+    barrier();
+
+    /*
+     * Copy out of the current alias, into the directmap at the new location.
+     * This includes a snapshot of the current stack.
+     */
+    memcpy(__va(__pa(_start)), _start, _end - _start);
+
+    /*
+     * We are now in a critical region.  Any write (modifying a global,
+     * spilling a local to the stack, etc) via the current alias will get lost
+     * when we switch to the new pagetables.  This even means no printk()'s
+     * debugging purposes.
+     *
+     * Walk the soon-to-be-used pagetables in the new location, relocating all
+     * intermediate (non-leaf) entries to point to their new-location
+     * equivalents.  All writes are via the directmap alias.
+     */
+    pl4e = __va(__pa(idle_pg_table));
+    for ( i = 0 ; i < L4_PAGETABLE_ENTRIES; i++, pl4e++ )
+    {
+        if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
+            continue;
+
+        *pl4e = l4e_from_intpte(l4e_get_intpte(*pl4e) + xen_phys_start);
+        pl3e = __va(l4e_get_paddr(*pl4e));
+        for ( j = 0; j < L3_PAGETABLE_ENTRIES; j++, pl3e++ )
+        {
+            if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) ||
+                 (l3e_get_flags(*pl3e) & _PAGE_PSE) )
+                continue;
+
+            *pl3e = l3e_from_intpte(l3e_get_intpte(*pl3e) + xen_phys_start);
+            pl2e = __va(l3e_get_paddr(*pl3e));
+            for ( k = 0; k < L2_PAGETABLE_ENTRIES; k++, pl2e++ )
+            {
+                if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) ||
+                     (l2e_get_flags(*pl2e) & _PAGE_PSE) )
+                    continue;
+
+                *pl2e = l2e_from_intpte(l2e_get_intpte(*pl2e) + xen_phys_start);
+            }
+        }
+    }
+
+    /*
+     * Walk the soon-to-be-used l2_xenmap[], relocating all the leaf mappings
+     * so text/data/bss etc refer to the new location in memory.
+     */
+    pl2e = __va(__pa(l2_xenmap));
+    for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++, pl2e++ )
+    {
+        if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
+            continue;
+
+        /*
+         * We don't have 4k mappings in l2_xenmap[] at this point in boot, nor
+         * is this anticipated to change.  Simply assert the fact, rather than
+         * introducing dead logic to decend into l1 tables.
+         */
+        ASSERT(l2e_get_flags(*pl2e) & _PAGE_PSE);
+
+        *pl2e = l2e_from_intpte(l2e_get_intpte(*pl2e) + xen_phys_start);
+    }
+
+    /*
+     * Switch to relocated pagetables, shooting down global mappings as we go.
+     */
+    asm volatile (
+        "mov    %%cr4, %[cr4]\n\t"
+        "andb   $~%c[pge], %b[cr4]\n\t"
+        "mov    %[cr4], %%cr4\n\t"     /* CR4.PGE = 0 */
+        "mov    %[cr3], %%cr3\n\t"     /* CR3 = new pagetables */
+        "orb    $%c[pge], %b[cr4]\n\t"
+        "mov    %[cr4], %%cr4\n\t"     /* CR4.PGE = 1 */
+        : [cr4] "=&a" (tmp) /* Could be "r", but "a" makes better asm */
+        : [cr3] "r" (__pa(idle_pg_table)),
+          [pge] "i" (X86_CR4_PGE)
+        : "memory" );
+
+    /*
+     * End of the critical region.  Updates to locals and globals now work as
+     * expected.
+     *
+     * Updates to local variables which have been spilled to the stack since
+     * the memcpy() have been lost.  But we don't care, because they're all
+     * going out of scope imminently.
+     */
+
+    printk("New Xen image base address: %#lx\n", xen_phys_start);
+}
+
 #undef BOOTSTRAP_MAP_LIMIT
 
 static uint64_t __init consider_modules(
@@ -1255,81 +1362,12 @@ void __init noreturn __start_xen(unsigned long mbi_p)
          */
         if ( (end > s) && (end - reloc_size + XEN_IMG_OFFSET >= __pa(_end)) )
         {
-            l4_pgentry_t *pl4e;
-            l3_pgentry_t *pl3e;
-            l2_pgentry_t *pl2e;
-            int i, j, k;
-
             /* Select relocation address. */
             xen_phys_start = end - reloc_size;
             e = xen_phys_start + XEN_IMG_OFFSET;
             bootsym(trampoline_xen_phys_start) = xen_phys_start;
 
-            /*
-             * Perform relocation to new physical address.
-             * Before doing so we must sync static/global data with main memory
-             * with a barrier(). After this we must *not* modify static/global
-             * data until after we have switched to the relocated pagetables!
-             */
-            barrier();
-            memcpy(__va(__pa(_start)), _start, _end - _start);
-
-            /* Walk idle_pg_table, relocating non-leaf entries. */
-            pl4e = __va(__pa(idle_pg_table));
-            for ( i = 0 ; i < L4_PAGETABLE_ENTRIES; i++, pl4e++ )
-            {
-                if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
-                    continue;
-                *pl4e = l4e_from_intpte(l4e_get_intpte(*pl4e) +
-                                        xen_phys_start);
-                pl3e = __va(l4e_get_paddr(*pl4e));
-                for ( j = 0; j < L3_PAGETABLE_ENTRIES; j++, pl3e++ )
-                {
-                    /* Not present, 1GB mapping, or already relocated? */
-                    if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) ||
-                         (l3e_get_flags(*pl3e) & _PAGE_PSE) )
-                        continue;
-                    *pl3e = l3e_from_intpte(l3e_get_intpte(*pl3e) +
-                                            xen_phys_start);
-                    pl2e = __va(l3e_get_paddr(*pl3e));
-                    for ( k = 0; k < L2_PAGETABLE_ENTRIES; k++, pl2e++ )
-                    {
-                        /* Not present, PSE, or already relocated? */
-                        if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) ||
-                             (l2e_get_flags(*pl2e) & _PAGE_PSE) )
-                            continue;
-                        *pl2e = l2e_from_intpte(l2e_get_intpte(*pl2e) +
-                                                xen_phys_start);
-                    }
-                }
-            }
-
-            /* Walk l2_xenmap[], relocating 2M superpage leaves. */
-            pl2e = __va(__pa(l2_xenmap));
-            for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++, pl2e++ )
-            {
-                if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) ||
-                     !(l2e_get_flags(*pl2e) & _PAGE_PSE) )
-                    continue;
-
-                *pl2e = l2e_from_intpte(l2e_get_intpte(*pl2e) + xen_phys_start);
-            }
-
-            /* Re-sync the stack and then switch to relocated pagetables. */
-            asm volatile (
-                "rep movsq        ; " /* re-sync the stack */
-                "movq %%cr4,%%rsi ; "
-                "andb $0x7f,%%sil ; "
-                "movq %%rsi,%%cr4 ; " /* CR4.PGE == 0 */
-                "movq %[pg],%%cr3 ; " /* CR3 == new pagetables */
-                "orb $0x80,%%sil  ; "
-                "movq %%rsi,%%cr4   " /* CR4.PGE == 1 */
-                : "=&S" (i), "=&D" (i), "=&c" (i) /* All outputs discarded. */
-                :  [pg] "r" (__pa(idle_pg_table)), "0" (cpu0_stack),
-                   "1" (__va(__pa(cpu0_stack))), "2" (STACK_SIZE / 8)
-                : "memory" );
-
-            printk("New Xen image base address: %#lx\n", xen_phys_start);
+            move_xen();
         }
 
         /* Is the region suitable for relocating the multiboot modules? */
-- 
2.11.0



  parent reply	other threads:[~2022-12-16 20:18 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-07 10:53 [PATCH 0/3] x86/boot: Cleanup Andrew Cooper
2021-12-07 10:53 ` [PATCH 1/3] x86/boot: Drop pte_update_limit from physical relocation logic Andrew Cooper
2021-12-07 11:37   ` Jan Beulich
2021-12-09 17:34     ` Andrew Cooper
2021-12-10  7:17       ` Jan Beulich
2022-12-09 20:36         ` Andrew Cooper
2021-12-07 10:53 ` [PATCH 2/3] x86/boot: Drop move_memory() and use memcpy() directly Andrew Cooper
2021-12-07 12:03   ` Jan Beulich
2021-12-09 19:58     ` Andrew Cooper
2021-12-10  7:35       ` Jan Beulich
2021-12-07 10:53 ` [PATCH 3/3] x86/boot: Don't double-copy the stack during physical relocation Andrew Cooper
2021-12-07 14:07   ` Jan Beulich
2022-12-09 21:42 ` [PATCH v2-ish] x86/boot: Relocate Xen using memcpy() directly Andrew Cooper
2022-12-12 10:16   ` Jan Beulich
2022-12-16 20:17 ` Andrew Cooper [this message]
2022-12-20 13:51   ` [PATCH v2-ish] x86/boot: Factor move_xen() out of __start_xen() Jan Beulich
2022-12-20 20:56     ` Andrew Cooper
2022-12-21  7:40       ` Jan Beulich
2023-01-04 20:04         ` Andrew Cooper
2023-01-05  7:13           ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221216201749.32164-1-andrew.cooper3@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=roger.pau@citrix.com \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.