xen-devel.lists.xenproject.org archive mirror
* [Xen-devel] [RFC PATCH v3 0/22] Live update: boot memory management, data stream handling, record format
@ 2020-01-30 16:12 David Woodhouse
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:12 UTC
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné


Now with added documentation:
http://david.woodhou.se/live-update-handover.pdf


v1:

Reserve a contiguous region of memory which can be safely used by the
boot allocator in the new Xen, before the live update data stream has
been processed and thus before the locations of all the other pages
which contain live domain data are known.

v2:

As the last gasp of kexec_reloc(), leave a 'breadcrumb' in the first
words of the reserved bootmem region, for the new Xen to find the live
update data. That data is mostly a guest-transparent live migration
data stream, except that the guest memory is left in place.

The breadcrumb has a magic value, the physical address of an MFN array
referencing the pages with actual data, and the number of such pages.
All of these are allocated in arbitrary heap pages (and not in the
reserved bootmem region) by the original Xen.
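
A minimal sketch of what such a breadcrumb could look like and how the
new Xen might validate it. The struct layout, the field names and the
magic constant are assumptions made for illustration; the real
definitions are whatever the patches in this series define.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic value, chosen only for this sketch. */
#define LU_BREADCRUMB_MAGIC 0x4c55424d454d3031ULL

/* Hypothetical layout of the first words of the reserved bootmem region. */
struct lu_breadcrumb {
    uint64_t magic;        /* identifies a valid breadcrumb */
    uint64_t mfn_array_pa; /* physical address of the MFN array */
    uint64_t nr_pages;     /* number of data pages the array references */
};

/* Return 1 and fill in the outputs if the breadcrumb looks valid, else 0. */
static int lu_breadcrumb_parse(const struct lu_breadcrumb *bc,
                               uint64_t *mfn_array_pa, uint64_t *nr_pages)
{
    assert(bc != NULL);

    if ( bc->magic != LU_BREADCRUMB_MAGIC || bc->nr_pages == 0 )
        return 0;

    *mfn_array_pa = bc->mfn_array_pa;
    *nr_pages = bc->nr_pages;
    return 1;
}
```

A boot without live update would simply find no magic value at the
start of the region and carry on as a normal boot.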

Provide functions on the "save" side for appending to the LU data
stream (designed to cope with the way that hvm_save_size() and
hvm_save() work), and on the "receive" side for detecting and mapping
it.

On the way to excluding "already in use" pages from being added to the
heap at start up, also fix the long-standing bug that pages marked bad
with 'badpage=' on the command line weren't being eschewed if they were
above HYPERVISOR_VIRT_END and added directly to the heap; only
init_boot_pages() was doing that filtering.

This is now handled by setting either PGC_broken (for bad pages) or
PGC_allocated (for those containing live update data) in the
corresponding page_info, at a time when the frametable is expected to
be initialised to zero. When init_heap_pages() sees such a page it
knows not to use it. Bad pages thus get completely ignored as they
should be (and put on the pointless page_broken_list that nobody ever
uses AFAICT).
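
The filtering described above can be sketched roughly as follows. This
is a standalone illustration, not the code from the series: the
SKETCH_* flag values and the simplified page structure are assumptions,
and the real init_heap_pages() does considerably more work per page.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag bits; the real PGC_* values live in Xen's headers. */
#define SKETCH_PGC_broken    (1UL << 0)
#define SKETCH_PGC_allocated (1UL << 1)

/* Stand-in for struct page_info, reduced to the one field used here. */
struct sketch_page_info {
    unsigned long count_info;
};

/* Count the pages that would reach the heap, skipping pre-marked ones. */
static size_t sketch_init_heap_pages(const struct sketch_page_info *pg,
                                     size_t nr)
{
    size_t i, added = 0;

    assert(pg != NULL || nr == 0);

    for ( i = 0; i < nr; i++ )
    {
        /* Bad pages and pages holding live update data are not freed. */
        if ( pg[i].count_info & (SKETCH_PGC_broken | SKETCH_PGC_allocated) )
            continue;
        added++;
    }

    return added;
}
```

This only works because the frametable starts out zeroed, so any flag
found at this point must have been set deliberately beforehand.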

The "in use" pages will need some rehabilitation (of refcount,
ownership etc.) before the system is in a correct state. That will come
shortly, as we start passing real domain data across this mechanism and
processing it.

v3:

Define the migration stream record format based on the libxc format,
provide helper functions for creating and parsing those records in a
live update data stream. Send a basic LU_VERSION record.
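
For readers unfamiliar with the libxc format: each record is a 32-bit
type and a 32-bit body length, followed by the body padded to an 8-byte
boundary. A minimal sketch of appending one such record to a flat
buffer follows. The function and struct names are hypothetical and the
flat buffer is an assumption; the helpers in this series operate on the
live update stream pages rather than a simple array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Record header in the style of the libxc migration stream. */
struct sketch_rhdr {
    uint32_t type;
    uint32_t length; /* body length in bytes, excluding padding */
};

#define SKETCH_REC_ALIGN 8u

/*
 * Append one record at offset 'off' in 'buf' (capacity 'cap').
 * Returns the offset just past the padded record, or 0 if it won't fit.
 */
static size_t sketch_append_record(uint8_t *buf, size_t cap, size_t off,
                                   uint32_t type, const void *body,
                                   uint32_t len)
{
    struct sketch_rhdr hdr = { type, len };
    size_t padded = ((size_t)len + SKETCH_REC_ALIGN - 1) &
                    ~(size_t)(SKETCH_REC_ALIGN - 1);

    assert(buf != NULL);

    if ( off > cap || cap - off < sizeof(hdr) + padded )
        return 0;

    memcpy(buf + off, &hdr, sizeof(hdr));
    if ( len )
        memcpy(buf + off + sizeof(hdr), body, len);
    /* Zero the padding so the stream contents are deterministic. */
    memset(buf + off + sizeof(hdr) + len, 0, padded - len);

    return off + sizeof(hdr) + padded;
}
```

Because every record ends on an 8-byte boundary, a parser can walk the
stream by reading a header, skipping the padded body, and repeating.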

Refactor __setup_xen() a little to allow for the different code paths
for creating a new dom0 vs. resuming the existing one after live
update. This currently has stub functions for lu_reserve_pages() and
lu_restore_domains().

The next step is to actually pass migration records over this
interface, which will allow us to preserve a Dom0 from one Xen to the
next. We do have that working in our internal proof-of-concept tree and
we're working on cleaning it up for public consumption.

David Woodhouse (21):
      x86/setup: Don't skip 2MiB underneath relocated Xen image
      x86/boot: Reserve live update boot memory
      Reserve live update memory regions
      Add KEXEC_RANGE_MA_LIVEUPDATE
      Add KEXEC_TYPE_LIVE_UPDATE
      Add IND_WRITE64 primitive to kexec kimage
      Add basic live update stream creation
      Add kimage_add_live_update_data()
      Add basic lu_save_all() shell
      Don't add bad pages above HYPERVISOR_VIRT_END to the domheap
      xen/vmap: allow vmap() to be called during early boot
      x86/setup: move vm_init() before end_boot_allocator()
      Detect live update breadcrumb at boot and map data stream
      Start documenting the live update handover
      Migrate migration stream definitions into Xen public headers
      Add lu_stream_{open,close,append}_record()
      Add LU_VERSION and LU_END records to live update stream
      Add shell of lu_reserve_pages()
      x86/setup: lift dom0 creation out into create_dom0 function
      x86/setup: finish plumbing in live update path through __start_xen()
      x86/setup: simplify handling of initrdidx when no initrd present

Wei Liu (1):
      xen/vmap: allow vm_init_type to be called during early_boot

 docs/misc/xen-command-line.pandoc        |   9 +
 docs/specs/libxc-migration-stream.pandoc |  19 +-
 docs/specs/live-update-handover.pandoc   | 371 +++++++++++++++++++++++++++++
 tools/libxc/xc_sr_common.c               |  20 +-
 tools/libxc/xc_sr_common_x86.c           |   4 +-
 tools/libxc/xc_sr_restore.c              |   2 +-
 tools/libxc/xc_sr_restore_x86_hvm.c      |   4 +-
 tools/libxc/xc_sr_restore_x86_pv.c       |   8 +-
 tools/libxc/xc_sr_save.c                 |   2 +-
 tools/libxc/xc_sr_save_x86_hvm.c         |   4 +-
 tools/libxc/xc_sr_save_x86_pv.c          |  12 +-
 tools/libxc/xc_sr_stream_format.h        |  97 +-------
 xen/arch/x86/machine_kexec.c             |  13 +-
 xen/arch/x86/setup.c                     | 395 ++++++++++++++++++++++---------
 xen/arch/x86/x86_64/kexec_reloc.S        |   9 +-
 xen/common/Makefile                      |   1 +
 xen/common/kexec.c                       |  24 ++
 xen/common/kimage.c                      |  34 +++
 xen/common/lu/Makefile                   |   1 +
 xen/common/lu/restore.c                  |  39 +++
 xen/common/lu/save.c                     |  67 ++++++
 xen/common/lu/stream.c                   | 219 +++++++++++++++++
 xen/common/page_alloc.c                  | 128 +++++++++-
 xen/common/vmap.c                        |  45 +++-
 xen/include/Makefile                     |   2 +-
 xen/include/asm-x86/config.h             |   1 +
 xen/include/public/kexec.h               |  13 +-
 xen/include/public/migration_stream.h    | 135 +++++++++++
 xen/include/xen/kimage.h                 |   4 +
 xen/include/xen/lu.h                     |  62 +++++
 xen/include/xen/mm.h                     |   2 +
 31 files changed, 1492 insertions(+), 254 deletions(-)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image
  2020-01-30 16:12 [Xen-devel] [RFC PATCH v3 0/22] Live update: boot memory management, data stream handling, record format David Woodhouse
@ 2020-01-30 16:13 ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 02/22] x86/boot: Reserve live update boot memory David Woodhouse
                     ` (20 more replies)
  2020-01-31 17:16 ` [Xen-devel] [RFC PATCH v3 23/22] x86/smp: reset x2apic_enabled in smp_send_stop() David Woodhouse
  2020-02-18 15:22 ` [Xen-devel] [RFC PATCH v3 0/22] Live update: boot memory management, data stream handling, record format Ian Jackson
  2 siblings, 21 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

Set 'e' correctly to reflect the location that Xen is actually relocated
to from its default 2MiB location, not 2MiB below that.

This is only vaguely a bug fix. The "missing" 2MiB would have been used
in the end, and fed to the allocator. It's just that other things don't
get to sit right up *next* to the Xen image, and it isn't very tidy.

For live update, I'd quite like a single contiguous region for the
reserved bootmem and Xen, allowing the 'slack' in the former to be used
when Xen itself grows larger. Let's not allow 2MiB of random heap pages
to get in the way...
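
To make the arithmetic concrete, here is a small standalone sketch
comparing the values before and after this change. The struct and
function names are made up for illustration, and XEN_IMG_OFFSET is
taken to be 2MiB as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match XEN_IMG_OFFSET, i.e. the default 2MiB load offset. */
#define SKETCH_XEN_IMG_OFFSET (UINT64_C(2) << 20)

struct sketch_reloc {
    uint64_t xen_phys_start; /* base used for Xen's physical addressing */
    uint64_t e;              /* lower bound left for whatever comes next */
    uint64_t image_dest;     /* where move_memory() puts the image */
};

/* Behaviour before the patch: 'e' sits 2MiB below the image. */
static struct sketch_reloc sketch_reloc_old(uint64_t end, uint64_t reloc_size)
{
    struct sketch_reloc r;

    assert(end >= reloc_size);
    r.e = end - reloc_size;
    r.xen_phys_start = r.e;
    r.image_dest = r.e + SKETCH_XEN_IMG_OFFSET;
    return r;
}

/* Behaviour after the patch: 'e' is the start of the image itself. */
static struct sketch_reloc sketch_reloc_new(uint64_t end, uint64_t reloc_size)
{
    struct sketch_reloc r;

    assert(end >= reloc_size);
    r.xen_phys_start = end - reloc_size;
    r.e = r.xen_phys_start + SKETCH_XEN_IMG_OFFSET;
    r.image_dest = r.e;
    return r;
}
```

The image lands at the same address either way, and xen_phys_start is
unchanged. Only 'e' moves up by 2MiB, so a region placed directly below
'e' now abuts the Xen image.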

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/arch/x86/setup.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index d858883404..2677f127b9 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1080,9 +1080,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
             unsigned long pte_update_limit;
 
             /* Select relocation address. */
-            e = end - reloc_size;
-            xen_phys_start = e;
-            bootsym(trampoline_xen_phys_start) = e;
+            xen_phys_start = end - reloc_size;
+            e = xen_phys_start + XEN_IMG_OFFSET;
+            bootsym(trampoline_xen_phys_start) = xen_phys_start;
 
             /*
              * No PTEs pointing above this address are candidates for relocation.
@@ -1090,7 +1090,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
              * and the beginning of region for destination image some PTEs may
              * point to addresses in range [e, e + XEN_IMG_OFFSET).
              */
-            pte_update_limit = PFN_DOWN(e + XEN_IMG_OFFSET);
+            pte_update_limit = PFN_DOWN(e);
 
             /*
              * Perform relocation to new physical address.
@@ -1099,7 +1099,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
              * data until after we have switched to the relocated pagetables!
              */
             barrier();
-            move_memory(e + XEN_IMG_OFFSET, XEN_IMG_OFFSET, _end - _start, 1);
+            move_memory(e, XEN_IMG_OFFSET, _end - _start, 1);
 
             /* Walk initial pagetables, relocating page directory entries. */
             pl4e = __va(__pa(idle_pg_table));
-- 
2.21.0



* [Xen-devel] [RFC PATCH v3 02/22] x86/boot: Reserve live update boot memory
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 03/22] Reserve live update memory regions David Woodhouse
                     ` (19 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

For live update to work, it will need a region of memory that can be
given to the boot allocator while it parses the state information from
the previous Xen and works out which of the other pages of memory it
can consume.

Reserve that like the crashdump region, and accept it on the command
line. Use only that region for early boot, and register the remaining
RAM (all of it for now, until the real live update happens) later.
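
The accepted syntax is `liveupdate=<size>[@<offset>]`. A standalone
sketch of that parsing logic follows. It is a simplification:
sketch_parse_size() is a made-up stand-in for Xen's
parse_size_and_unit() and its suffix handling is an assumption, and the
4GiB check reflects the documented constraint rather than anything the
parser in this patch enforces.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MB(x) ((uint64_t)(x) << 20)

/* Parse a number with an optional K/M/G suffix; *end points past it. */
static uint64_t sketch_parse_size(const char *s, const char **end)
{
    char *e;
    uint64_t v = strtoull(s, &e, 0);

    switch ( *e )
    {
    case 'G': case 'g': v <<= 10; /* fall through */
    case 'M': case 'm': v <<= 10; /* fall through */
    case 'K': case 'k': v <<= 10; e++; break;
    default: break;
    }

    *end = e;
    return v;
}

/* Returns 0 on success, -1 on a malformed or unacceptable value. */
static int sketch_parse_liveupdate(const char *str, uint64_t *size,
                                   uint64_t *start)
{
    const char *cur = str;

    assert(str && size && start);
    *start = 0;

    *size = sketch_parse_size(cur, &str);
    if ( !*size || cur == str || (*size & (SKETCH_MB(2) - 1)) )
        return -1;          /* size missing or not a multiple of 2MiB */

    if ( !*str )
        return 0;           /* size only; placement chosen later */
    if ( *str != '@' )
        return -1;

    cur = str + 1;
    *start = sketch_parse_size(cur, &str);
    if ( !*start || cur == str )
        return -1;
    if ( *start >= (UINT64_C(4) << 30) )
        return -1;          /* documented: offset must be below 4GiB */

    return 0;
}
```

With these rules, `64M@2G` yields a 64MiB region at 2GiB, `64M` leaves
the placement to be chosen during boot, and `3M` is rejected because it
is not a multiple of 2MiB.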

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 docs/misc/xen-command-line.pandoc |   9 +++
 xen/arch/x86/setup.c              | 120 ++++++++++++++++++++++++++++--
 xen/include/asm-x86/config.h      |   1 +
 3 files changed, 123 insertions(+), 7 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 5eb3a07276..61524f1056 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1402,6 +1402,15 @@ This option is intended for debugging purposes only.  Enable MSR_DEBUGCTL.LBR
 in hypervisor context to be able to dump the Last Interrupt/Exception To/From
 record with other registers.
 
+### liveupdate
+> `= <size>[@<offset>]`
+
+Specify size and optionally placement of the boot memory reserved for
+Xen live update. The size must be a multiple of 2MiB.
+
+A trailing `@<offset>` specifies the exact address this area should be
+placed at, which must be below 4GiB.
+
 ### loglvl
 > `= <level>[/<rate-limited level>]` where level is `none | error | warning | info | debug | all`
 
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 2677f127b9..63f06d4856 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -681,6 +681,47 @@ static unsigned int __init copy_bios_e820(struct e820entry *map, unsigned int li
 /* How much of the directmap is prebuilt at compile time. */
 #define PREBUILT_MAP_LIMIT (1 << L2_PAGETABLE_SHIFT)
 
+unsigned long lu_bootmem_start, lu_bootmem_size;
+static unsigned long lu_breadcrumb_phys;
+
+static int __init parse_liveupdate(const char *str)
+{
+    const char *cur;
+
+    lu_bootmem_size = parse_size_and_unit(cur = str, &str);
+    if ( !lu_bootmem_size || cur == str )
+        return -EINVAL;
+
+    if ( lu_bootmem_size & ((MB(2) - 1)) )
+    {
+        printk(XENLOG_WARNING "Live update size must be a multiple of 2MiB\n");
+        return -EINVAL;
+    }
+
+    if (!*str) {
+        printk(XENLOG_INFO "Live update size 0x%lx\n", lu_bootmem_size);
+        return 0;
+    }
+
+    if (*str != '@')
+        return -EINVAL;
+
+    lu_bootmem_start = parse_size_and_unit(cur = str + 1, &str);
+    if ( !lu_bootmem_start || cur == str )
+        return -EINVAL;
+
+    printk(XENLOG_INFO "Live update area 0x%lx-0x%lx (0x%lx)\n", lu_bootmem_start,
+           lu_bootmem_start + lu_bootmem_size, lu_bootmem_size);
+
+    /*
+     * If present, the breadcrumb leading to the migration data stream is
+     * in the very beginning of the reserved bootmem region.
+     */
+    lu_breadcrumb_phys = lu_bootmem_start;
+    return 0;
+}
+custom_param("liveupdate", parse_liveupdate);
+
 void __init noreturn __start_xen(unsigned long mbi_p)
 {
     char *memmap_type = NULL;
@@ -690,7 +731,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     module_t *mod;
     unsigned long nr_pages, raw_max_page, modules_headroom, module_map[1];
     int i, j, e820_warn = 0, bytes = 0;
-    bool acpi_boot_table_init_done = false, relocated = false;
+    bool acpi_boot_table_init_done = false, relocated = false, lu_reserved = false;
     int ret;
     struct ns16550_defaults ns16550 = {
         .data_bits = 8,
@@ -980,6 +1021,22 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     set_kexec_crash_area_size((u64)nr_pages << PAGE_SHIFT);
     kexec_reserve_area(&boot_e820);
 
+    if ( lu_bootmem_start )
+    {
+        /* XX: Check it's in usable memory first */
+        reserve_e820_ram(&boot_e820, lu_bootmem_start, lu_bootmem_start + lu_bootmem_size);
+
+        /* Since it will already be out of the e820 map by the time of the
+         * first loop over physical memory, map it manually now. */
+        set_pdx_range(lu_bootmem_start >> PAGE_SHIFT,
+                      (lu_bootmem_start + lu_bootmem_size) >> PAGE_SHIFT);
+        map_pages_to_xen((unsigned long)__va(lu_bootmem_start),
+                         maddr_to_mfn(lu_bootmem_start),
+                         PFN_DOWN(lu_bootmem_size), PAGE_HYPERVISOR);
+
+        lu_reserved = true;
+    }
+
     initial_images = mod;
     nr_initial_images = mbi->mods_count;
 
@@ -1207,6 +1264,16 @@ void __init noreturn __start_xen(unsigned long mbi_p)
             printk("New Xen image base address: %#lx\n", xen_phys_start);
         }
 
+        /* Is the region suitable for the live update bootmem region? */
+        if ( lu_bootmem_size && ! lu_bootmem_start && e < limit )
+        {
+            end = consider_modules(s, e, lu_bootmem_size, mod, mbi->mods_count + relocated, -1);
+            if ( end )
+            {
+                e = lu_bootmem_start = end - lu_bootmem_size;
+            }
+        }
+
         /* Is the region suitable for relocating the multiboot modules? */
         for ( j = mbi->mods_count - 1; j >= 0; j-- )
         {
@@ -1270,6 +1337,15 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     if ( !xen_phys_start )
         panic("Not enough memory to relocate Xen\n");
 
+    if ( lu_bootmem_start )
+    {
+        if ( !lu_reserved )
+            reserve_e820_ram(&boot_e820, lu_bootmem_start, lu_bootmem_start + lu_bootmem_size);
+        printk("LU bootmem: 0x%lx - 0x%lx\n", lu_bootmem_start, lu_bootmem_start + lu_bootmem_size);
+        init_boot_pages(lu_bootmem_start, lu_bootmem_start + lu_bootmem_size);
+        lu_reserved = true;
+    }
+
     /* This needs to remain in sync with xen_in_range(). */
     reserve_e820_ram(&boot_e820, __pa(_stext), __pa(__2M_rwdata_end));
 
@@ -1281,8 +1357,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         xenheap_max_mfn(PFN_DOWN(highmem_start - 1));
 
     /*
-     * Walk every RAM region and map it in its entirety (on x86/64, at least)
-     * and notify it to the boot allocator.
+     * Walk every RAM region and map it in its entirety and (unless in
+     * live update mode) notify it to the boot allocator.
      */
     for ( i = 0; i < boot_e820.nr_map; i++ )
     {
@@ -1335,6 +1411,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
                 printk(XENLOG_WARNING "Ignoring inaccessible memory range"
                                       " %013"PRIx64"-%013"PRIx64"\n",
                        s, e);
+                reserve_e820_ram(&boot_e820, s, e);
                 continue;
             }
             map_e = e;
@@ -1342,6 +1419,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
             printk(XENLOG_WARNING "Ignoring inaccessible memory range"
                                   " %013"PRIx64"-%013"PRIx64"\n",
                    e, map_e);
+            reserve_e820_ram(&boot_e820, e, map_e);
         }
 
         set_pdx_range(s >> PAGE_SHIFT, e >> PAGE_SHIFT);
@@ -1352,7 +1430,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
                       ARRAY_SIZE(l2_directmap) << L2_PAGETABLE_SHIFT);
 
         /* Pass mapped memory to allocator /before/ creating new mappings. */
-        init_boot_pages(s, min(map_s, e));
+        if ( !lu_reserved)
+            init_boot_pages(s, min(map_s, e));
+
         s = map_s;
         if ( s < map_e )
         {
@@ -1360,7 +1440,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
             map_s = (s + mask) & ~mask;
             map_e &= ~mask;
-            init_boot_pages(map_s, map_e);
+            if ( !lu_reserved)
+                init_boot_pages(map_s, map_e);
         }
 
         if ( map_s > map_e )
@@ -1376,7 +1457,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
             {
                 map_pages_to_xen((unsigned long)__va(map_e), maddr_to_mfn(map_e),
                                  PFN_DOWN(end - map_e), PAGE_HYPERVISOR);
-                init_boot_pages(map_e, end);
+                if ( !lu_reserved)
+                    init_boot_pages(map_e, end);
                 map_e = end;
             }
         }
@@ -1391,7 +1473,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         {
             map_pages_to_xen((unsigned long)__va(s), maddr_to_mfn(s),
                              PFN_DOWN(map_s - s), PAGE_HYPERVISOR);
-            init_boot_pages(s, map_s);
+            if ( !lu_reserved)
+                init_boot_pages(s, map_s);
         }
     }
 
@@ -1489,6 +1572,29 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
     numa_initmem_init(0, raw_max_page);
 
+    if ( lu_bootmem_start )
+    {
+        unsigned long limit = virt_to_mfn(HYPERVISOR_VIRT_END - 1);
+        uint64_t mask = PAGE_SIZE - 1;
+
+        for ( i = 0; i < boot_e820.nr_map; i++ )
+        {
+            uint64_t s, e;
+
+            if ( boot_e820.map[i].type != E820_RAM )
+                continue;
+            s = (boot_e820.map[i].addr + mask) & ~mask;
+            e = (boot_e820.map[i].addr + boot_e820.map[i].size) & ~mask;
+            s = max_t(uint64_t, s, 1<<20);
+            if ( PFN_DOWN(s) > limit )
+                continue;
+            if ( PFN_DOWN(e) > limit )
+                e = pfn_to_paddr(limit);
+
+            init_boot_pages(s, e);
+        }
+    }
+
     if ( max_page - 1 > virt_to_mfn(HYPERVISOR_VIRT_END - 1) )
     {
         unsigned long limit = virt_to_mfn(HYPERVISOR_VIRT_END - 1);
diff --git a/xen/include/asm-x86/config.h b/xen/include/asm-x86/config.h
index d0cfbb70a8..d59e5101c3 100644
--- a/xen/include/asm-x86/config.h
+++ b/xen/include/asm-x86/config.h
@@ -279,6 +279,7 @@ extern unsigned char boot_edid_info[128];
 
 #ifndef __ASSEMBLY__
 extern unsigned long xen_phys_start;
+extern unsigned long lu_bootmem_start, lu_bootmem_size;
 #endif
 
 /* GDT/LDT shadow mapping area. The first per-domain-mapping sub-area. */
-- 
2.21.0



* [Xen-devel] [RFC PATCH v3 03/22] Reserve live update memory regions
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 02/22] x86/boot: Reserve live update boot memory David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 04/22] Add KEXEC_RANGE_MA_LIVEUPDATE David Woodhouse
                     ` (18 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

The live update handover requires that a region of memory be reserved
for the new Xen to use in its boot allocator. The original Xen may use
that memory but not for any pages which are mapped to domains, or which
would need to be preserved across the live update for any other reason.

The same constraints apply to initmem pages freed from the Xen image,
since the new Xen will be loaded into the same physical location as the
previous Xen.

There is separate work ongoing which will make the xenheap meet this
requirement by eliminating share_xen_page_with_guest(). For the meantime,
just don't add those pages to the heap at all in the live update case.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/arch/x86/setup.c    | 12 ++++++++++-
 xen/common/page_alloc.c | 45 +++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/mm.h    |  2 ++
 3 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 63f06d4856..dba8c3f0a1 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -611,7 +611,7 @@ static void noinline init_done(void)
     }
 
     destroy_xen_mappings(start, end);
-    init_xenheap_pages(__pa(start), __pa(end));
+    init_lu_reserved_pages(__pa(start), __pa(end));
     printk("Freed %lukB init memory\n", (end - start) >> 10);
 
     startup_cpu_idle_loop();
@@ -1577,6 +1577,16 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         unsigned long limit = virt_to_mfn(HYPERVISOR_VIRT_END - 1);
         uint64_t mask = PAGE_SIZE - 1;
 
+        /*
+         * Pages in the reserved LU region must not be used for anything which
+         * will need to persist across a live update. There is ongoing work to
+         * eliminate or limit the use of share_xen_page_with_guest() and get
+         * to a point where we can actually honour that promise, but for now
+         * just *don't* add those pages to the heap. Clear the boot allocator
+         * out completely, before adding the non-reserved ranges.
+         */
+        clear_boot_allocator();
+
         for ( i = 0; i < boot_e820.nr_map; i++ )
         {
             uint64_t s, e;
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 919a270587..a74bf02559 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1879,6 +1879,51 @@ void __init end_boot_allocator(void)
     printk("\n");
 }
 
+/*
+ * Called when live update is supported. The memory ranges currently
+ * still free in the boot allocator must be added to the reserved
+ * heap, distinct from the xenheap in that pages from it MUST NOT be
+ * used for anything which will be mapped to a domain or otherwise
+ * need to survive a live update.
+ */
+void __init clear_boot_allocator(void)
+{
+    unsigned int i;
+
+    /* Add at least one range on node zero first, if we can. */
+    for ( i = 0; i < nr_bootmem_regions; i++ )
+    {
+        struct bootmem_region *r = &bootmem_region_list[i];
+        if ( (r->s < r->e) &&
+             (phys_to_nid(pfn_to_paddr(r->s)) == cpu_to_node(0)) )
+        {
+            init_lu_reserved_pages(r->s << PAGE_SHIFT, r->e << PAGE_SHIFT);
+            r->e = r->s;
+            break;
+        }
+    }
+    for ( i = nr_bootmem_regions; i-- > 0; )
+    {
+        struct bootmem_region *r = &bootmem_region_list[i];
+        if ( r->s < r->e )
+            init_lu_reserved_pages(r->s << PAGE_SHIFT, r->e << PAGE_SHIFT);
+    }
+    nr_bootmem_regions = 0;
+}
+
+void init_lu_reserved_pages(paddr_t ps, paddr_t pe)
+{
+    if (!lu_bootmem_start)
+        init_xenheap_pages(ps, pe);
+
+    /* There is ongoing work for other reasons to eliminate the use of
+     * share_xen_page_with_guest() and get to a point where the normal
+     * xenheap actually meets the requirement we need for live update
+     * reserved memory, that nothing allocated from it will be mapped
+     * to a guest and/or need to be preserved over a live update.
+     * Until then, we simply don't use these pages after boot. */
+}
+
 static void __init smp_scrub_heap_pages(void *data)
 {
     unsigned long mfn, start, end;
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index d0d095d9c7..d120d84d23 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -158,8 +158,10 @@ struct domain *__must_check page_get_owner_and_reference(struct page_info *);
 void init_boot_pages(paddr_t ps, paddr_t pe);
 mfn_t alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align);
 void end_boot_allocator(void);
+void clear_boot_allocator(void);
 
 /* Xen suballocator. These functions are interrupt-safe. */
+void init_lu_reserved_pages(paddr_t ps, paddr_t pe);
 void init_xenheap_pages(paddr_t ps, paddr_t pe);
 void xenheap_max_mfn(unsigned long mfn);
 void *alloc_xenheap_pages(unsigned int order, unsigned int memflags);
-- 
2.21.0



* [Xen-devel] [RFC PATCH v3 04/22] Add KEXEC_RANGE_MA_LIVEUPDATE
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 02/22] x86/boot: Reserve live update boot memory David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 03/22] Reserve live update memory regions David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 05/22] Add KEXEC_TYPE_LIVE_UPDATE David Woodhouse
                     ` (17 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

This allows kexec userspace to tell the next Xen where the range is,
on its command line.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/arch/x86/machine_kexec.c | 13 ++++++++++---
 xen/include/public/kexec.h   |  1 +
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/machine_kexec.c b/xen/arch/x86/machine_kexec.c
index b70d5a6a86..273bc20664 100644
--- a/xen/arch/x86/machine_kexec.c
+++ b/xen/arch/x86/machine_kexec.c
@@ -186,9 +186,16 @@ void machine_kexec(struct kexec_image *image)
 
 int machine_kexec_get(xen_kexec_range_t *range)
 {
-	if (range->range != KEXEC_RANGE_MA_XEN)
-		return -EINVAL;
-	return machine_kexec_get_xen(range);
+    switch (range->range) {
+    case KEXEC_RANGE_MA_XEN:
+        return machine_kexec_get_xen(range);
+    case KEXEC_RANGE_MA_LIVEUPDATE:
+        range->start = lu_bootmem_start;
+        range->size = lu_bootmem_size;
+        return 0;
+    default:
+        return -EINVAL;
+    }
 }
 
 void arch_crash_save_vmcoreinfo(void)
diff --git a/xen/include/public/kexec.h b/xen/include/public/kexec.h
index 3f2a118381..298381af8d 100644
--- a/xen/include/public/kexec.h
+++ b/xen/include/public/kexec.h
@@ -150,6 +150,7 @@ typedef struct xen_kexec_load_v1 {
 #define KEXEC_RANGE_MA_EFI_MEMMAP 5 /* machine address and size of
                                      * of the EFI Memory Map */
 #define KEXEC_RANGE_MA_VMCOREINFO 6 /* machine address and size of vmcoreinfo */
+#define KEXEC_RANGE_MA_LIVEUPDATE 7 /* Boot mem for live update */
 
 /*
  * Find the address and size of certain memory areas
-- 
2.21.0



* [Xen-devel] [RFC PATCH v3 05/22] Add KEXEC_TYPE_LIVE_UPDATE
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
                     ` (2 preceding siblings ...)
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 04/22] Add KEXEC_RANGE_MA_LIVEUPDATE David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 06/22] Add IND_WRITE64 primitive to kexec kimage David Woodhouse
                     ` (16 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

This is identical to the default case... for now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/common/kexec.c         | 18 ++++++++++++++++++
 xen/include/public/kexec.h | 12 ++++++++----
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/xen/common/kexec.c b/xen/common/kexec.c
index a262cc5a18..a78aa4f5b0 100644
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -404,6 +404,19 @@ static long kexec_reboot(void *_image)
     return 0;
 }
 
+static long kexec_live_update(void *_image)
+{
+    struct kexec_image *image = _image;
+
+    kexecing = TRUE;
+
+    kexec_common_shutdown();
+    machine_reboot_kexec(image);
+
+    BUG();
+    return 0;
+}
+
 static void do_crashdump_trigger(unsigned char key)
 {
     printk("'%c' pressed -> triggering crashdump\n", key);
@@ -736,6 +749,7 @@ static int kexec_load_get_bits(int type, int *base, int *bit)
     switch ( type )
     {
     case KEXEC_TYPE_DEFAULT:
+    case KEXEC_TYPE_LIVE_UPDATE:
         *base = KEXEC_IMAGE_DEFAULT_BASE;
         *bit = KEXEC_FLAG_DEFAULT_POS;
         break;
@@ -837,6 +851,10 @@ static int kexec_exec(XEN_GUEST_HANDLE_PARAM(void) uarg)
         image = kexec_image[base + pos];
         ret = continue_hypercall_on_cpu(0, kexec_reboot, image);
         break;
+    case KEXEC_TYPE_LIVE_UPDATE:
+        image = kexec_image[base + pos];
+        ret = continue_hypercall_on_cpu(0, kexec_live_update, image);
+        break;
     case KEXEC_TYPE_CRASH:
         kexec_crash(); /* Does not return */
         break;
diff --git a/xen/include/public/kexec.h b/xen/include/public/kexec.h
index 298381af8d..f5230286d3 100644
--- a/xen/include/public/kexec.h
+++ b/xen/include/public/kexec.h
@@ -71,18 +71,22 @@
  */
 
 /*
- * Kexec supports two types of operation:
+ * Kexec supports three types of operation:
  * - kexec into a regular kernel, very similar to a standard reboot
  *   - KEXEC_TYPE_DEFAULT is used to specify this type
  * - kexec into a special "crash kernel", aka kexec-on-panic
  *   - KEXEC_TYPE_CRASH is used to specify this type
  *   - parts of our system may be broken at kexec-on-panic time
  *     - the code should be kept as simple and self-contained as possible
+ * - Live update into a new Xen, preserving all running domains
+ *   - KEXEC_TYPE_LIVE_UPDATE is used to specify this type
+ *   - Xen performs guest-transparent live migration and stores live
+ *     update state in memory, passing it to the new Xen.
  */
 
-#define KEXEC_TYPE_DEFAULT 0
-#define KEXEC_TYPE_CRASH   1
-
+#define KEXEC_TYPE_DEFAULT          0
+#define KEXEC_TYPE_CRASH            1
+#define KEXEC_TYPE_LIVE_UPDATE      2
 
 /* The kexec implementation for Xen allows the user to load two
  * types of kernels, KEXEC_TYPE_DEFAULT and KEXEC_TYPE_CRASH.
-- 
2.21.0



* [Xen-devel] [RFC PATCH v3 06/22] Add IND_WRITE64 primitive to kexec kimage
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
                     ` (3 preceding siblings ...)
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 05/22] Add KEXEC_TYPE_LIVE_UPDATE David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 07/22] Add basic live update stream creation David Woodhouse
                     ` (15 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC (permalink / raw)
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

This allows a single page-aligned physical address to be written to
the current destination, intended to pass the location of the live
update data stream from one Xen to the next.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/arch/x86/x86_64/kexec_reloc.S | 9 ++++++++-
 xen/include/xen/kimage.h          | 1 +
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/x86_64/kexec_reloc.S b/xen/arch/x86/x86_64/kexec_reloc.S
index d488d127cf..a75f989926 100644
--- a/xen/arch/x86/x86_64/kexec_reloc.S
+++ b/xen/arch/x86/x86_64/kexec_reloc.S
@@ -131,11 +131,18 @@ is_source:
         jmp     next_entry
 is_zero:
         testb   $IND_ZERO, %cl
-        jz      next_entry
+        jz      is_write64
         movl    $(PAGE_SIZE / 8), %ecx  /* Zero the destination page. */
         xorl    %eax, %eax
         rep stosq
         jmp     next_entry
+is_write64:
+        testb   $IND_WRITE64, %cl
+        jz      next_entry
+        andq    $PAGE_MASK, %rcx
+        movq    %rcx, %rax
+        stosq
+        jmp     next_entry
 done:
         popq    %rbx
         ret
diff --git a/xen/include/xen/kimage.h b/xen/include/xen/kimage.h
index cbfb9e9054..e94839d7c3 100644
--- a/xen/include/xen/kimage.h
+++ b/xen/include/xen/kimage.h
@@ -6,6 +6,7 @@
 #define IND_DONE         0x4
 #define IND_SOURCE       0x8
 #define IND_ZERO        0x10
+#define IND_WRITE64     0x20
 
 #ifndef __ASSEMBLY__
 
-- 
2.21.0
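
For illustration, a minimal user-space sketch of what the new is_write64
branch in kexec_reloc.S appears to do: mask the flag bits off the entry
and store the remaining page-aligned value at the current destination,
advancing by one 64-bit word. The name model_run_entries() is
hypothetical, and only IND_DONE and IND_WRITE64 are modelled; the other
entry types handled by the real relocation code are skipped here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT   12
#define PAGE_SIZE    (UINT64_C(1) << PAGE_SHIFT)
#define PAGE_MASK    (~(PAGE_SIZE - 1))

/* Flag values as they appear in xen/include/xen/kimage.h in this patch. */
#define IND_DONE     UINT64_C(0x4)
#define IND_WRITE64  UINT64_C(0x20)

/*
 * Hypothetical model of the relocation loop (not the real code).
 * Walks 'entries' until IND_DONE and returns the number of 64-bit
 * words stored at 'dest'. Entry types other than IND_WRITE64 are
 * ignored in this sketch.
 */
static size_t model_run_entries(const uint64_t *entries, uint64_t *dest,
                                size_t dest_words)
{
    size_t cursor = 0;

    for ( ; !(*entries & IND_DONE); entries++ )
    {
        if ( !(*entries & IND_WRITE64) )
            continue;

        assert(cursor < dest_words);
        /* Mirrors: andq $PAGE_MASK, %rcx; movq %rcx, %rax; stosq */
        dest[cursor++] = *entries & PAGE_MASK;
    }

    return cursor;
}
```

Because the low bits carry the IND_* flags, only page-aligned values
survive the store, which is why the commit message speaks of a
page-aligned physical address.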



* [Xen-devel] [RFC PATCH v3 07/22] Add basic live update stream creation
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
                     ` (4 preceding siblings ...)
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 06/22] Add IND_WRITE64 primitive to kexec kimage David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 08/22] Add kimage_add_live_update_data() David Woodhouse
                     ` (14 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC (permalink / raw)
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/common/Makefile    |   1 +
 xen/common/lu/Makefile |   1 +
 xen/common/lu/stream.c | 135 +++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/lu.h   |  29 +++++++++
 4 files changed, 166 insertions(+)
 create mode 100644 xen/common/lu/Makefile
 create mode 100644 xen/common/lu/stream.c
 create mode 100644 xen/include/xen/lu.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 2abb8250b0..60502bb909 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -72,3 +72,4 @@ subdir-$(CONFIG_UBSAN) += ubsan
 
 subdir-$(CONFIG_NEEDS_LIBELF) += libelf
 subdir-$(CONFIG_HAS_DEVICE_TREE) += libfdt
+subdir-y += lu
diff --git a/xen/common/lu/Makefile b/xen/common/lu/Makefile
new file mode 100644
index 0000000000..68991b3ca4
--- /dev/null
+++ b/xen/common/lu/Makefile
@@ -0,0 +1 @@
+obj-y += stream.o
diff --git a/xen/common/lu/stream.c b/xen/common/lu/stream.c
new file mode 100644
index 0000000000..10e123a466
--- /dev/null
+++ b/xen/common/lu/stream.c
@@ -0,0 +1,135 @@
+/*
+ * Live update data stream handling.
+ *
+ * During live update, one version of Xen (Xen#1) performs a kexec into
+ * a new version of Xen (Xen#2), performing guest-transparent live
+ * migration of all existing domains.
+ *
+ * Xen#2 must avoid scribbling on any pages which may belong to existing
+ * domains. In order to achieve this, we reserve a contiguous area of
+ * physical memory to be used by the boot allocator in Xen#2. Xen must
+ * not allocate pages from that region which are later shared with
+ * guests or need to persist across live update.
+ *
+ * The live update bootmem region is reserved by the first Xen to boot,
+ * and userspace can obtain its address using KEXEC_CMD_kexec_get_range
+ * with the new KEXEC_RANGE_MA_LIVEUPDATE type. Userspace kexec(8)
+ * appends the appropriate 'liveupdate=' parameter to the command line
+ * of Xen#2 when setting up the kexec image.
+ *
+ * At the time of kexec, Xen#1 serialises the domain state into buffers
+ * allocated from its own heap, then creates a single physically
+ * contiguous scatter-gather list containing the MFNs of those data
+ * pages (which Xen#2 can then trivially vmap()). In a system with
+ * 4KiB pages, the MFN list for the live update data stream will fit
+ * into a single page until the total size of the live update data
+ * exceeds 2MiB.
+ *
+ * The physical address of the MFN list is passed to Xen#2 by placing
+ * it at the start of the reserved live update bootmem region, with a
+ * magic number to avoid false positives.
+ */
+
+#include <xen/types.h>
+#include <xen/vmap.h>
+#include <xen/lu.h>
+
+static int lu_stream_extend(struct lu_stream *stream, int nr_pages)
+{
+    int order = get_order_from_bytes((nr_pages + 1) * sizeof(mfn_t));
+    int old_order = get_order_from_bytes((stream->nr_pages + 1) * sizeof(mfn_t));
+
+    if ( !stream->nr_pages || order > old_order )
+    {
+        mfn_t *new_pglist = alloc_xenheap_pages(order, 0);
+
+        if ( !new_pglist )
+            return -ENOMEM;
+
+        if ( stream->nr_pages )
+        {
+            memcpy(new_pglist, stream->pagelist,
+                   stream->nr_pages * sizeof(mfn_t));
+            free_xenheap_pages(stream->pagelist, old_order);
+        }
+        stream->pagelist = new_pglist;
+    }
+    while ( stream->nr_pages < nr_pages )
+    {
+        struct page_info *pg = alloc_domheap_page(NULL, MEMF_no_owner);
+
+        if ( !pg )
+        {
+            /* Ensure the cleanup frees the correct order of pagelist */
+            stream->nr_pages++;
+
+            return -ENOMEM;
+        }
+        stream->pagelist[stream->nr_pages++] = page_to_mfn(pg);
+        stream->pagelist[stream->nr_pages] = INVALID_MFN;
+    }
+
+    if ( stream->data )
+        vunmap(stream->data);
+    stream->data = vmap(stream->pagelist, stream->nr_pages);
+    if ( !stream->data )
+        return -ENOMEM;
+
+    return 0;
+}
+
+void *lu_stream_reserve(struct lu_stream *stream, size_t size)
+{
+    int nr_pages = (stream->len + size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
+    if ( stream->nr_pages < nr_pages && lu_stream_extend(stream, nr_pages) )
+        return NULL;
+
+    return stream->data + stream->len;
+}
+
+void lu_stream_end_reservation(struct lu_stream *stream, size_t size)
+{
+    stream->len += size;
+}
+
+int lu_stream_append(struct lu_stream *stream, const void *data, size_t size)
+{
+    void *p = lu_stream_reserve(stream, size);
+
+    if ( !p )
+        return -ENOMEM;
+    memcpy(p, data, size);
+    lu_stream_end_reservation(stream, size);
+
+    return 0;
+}
+
+void lu_stream_free(struct lu_stream *stream)
+{
+    unsigned int order = get_order_from_bytes((stream->nr_pages + 1) * sizeof(mfn_t));
+    unsigned int i;
+
+    if ( stream->data )
+        vunmap(stream->data);
+
+    if ( stream->pagelist )
+    {
+        for ( i = 0; i < stream->nr_pages; i++ )
+        {
+            if ( mfn_valid(stream->pagelist[i]) )
+                free_domheap_page(mfn_to_page(stream->pagelist[i]));
+        }
+        free_xenheap_pages(stream->pagelist, order);
+    }
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/lu.h b/xen/include/xen/lu.h
new file mode 100644
index 0000000000..abb30545fe
--- /dev/null
+++ b/xen/include/xen/lu.h
@@ -0,0 +1,29 @@
+#ifndef __XEN_LU_H__
+#define __XEN_LU_H__
+
+#include <xen/types.h>
+#include <xen/mm.h>
+
+struct lu_stream {
+    mfn_t *pagelist;
+    size_t len;
+    int nr_pages;
+    char *data;
+};
+
+void *lu_stream_reserve(struct lu_stream *stream, size_t size);
+void lu_stream_end_reservation(struct lu_stream *stream, size_t size);
+int lu_stream_append(struct lu_stream *stream, const void *data, size_t size);
+void lu_stream_free(struct lu_stream *stream);
+
+#endif /* __XEN_LU_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.21.0
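
The 2MiB figure in the stream.c header comment follows from one 4KiB
page holding 512 eight-byte MFNs, and 512 x 4KiB = 2MiB. The sketch
below works through that arithmetic. It assumes mfn_t is 64 bits wide
and re-implements the order calculation in user space; the model_*
names are hypothetical. Note that lu_stream_extend() sizes the list for
nr_pages + 1 entries (room for the INVALID_MFN terminator), so under
these assumptions the list already needs a second page once the stream
reaches 512 data pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  ((size_t)1 << PAGE_SHIFT)

/* Smallest order such that (PAGE_SIZE << order) >= size, for size > 0. */
static unsigned int model_order_from_bytes(size_t size)
{
    unsigned int order = 0;

    assert(size > 0);
    size = (size - 1) >> PAGE_SHIFT;
    for ( ; size; order++ )
        size >>= 1;

    return order;
}

/* Order of the MFN list for nr_pages data pages plus one terminator. */
static unsigned int model_pagelist_order(unsigned int nr_pages)
{
    /* Assumption: each MFN entry occupies 8 bytes, as on x86-64. */
    return model_order_from_bytes(((size_t)nr_pages + 1) * sizeof(uint64_t));
}
```

With these definitions, 511 data pages (just under 2MiB) still fit an
order-0 list, while 512 data pages need order 1.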



* [Xen-devel] [RFC PATCH v3 08/22] Add kimage_add_live_update_data()
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
                     ` (5 preceding siblings ...)
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 07/22] Add basic live update stream creation David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 09/22] Add basic lu_save_all() shell David Woodhouse
                     ` (13 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC (permalink / raw)
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/common/kimage.c      | 34 ++++++++++++++++++++++++++++++++++
 xen/include/xen/kimage.h |  3 +++
 xen/include/xen/lu.h     |  3 +++
 3 files changed, 40 insertions(+)

diff --git a/xen/common/kimage.c b/xen/common/kimage.c
index 210241dfb7..86d2797cbc 100644
--- a/xen/common/kimage.c
+++ b/xen/common/kimage.c
@@ -20,6 +20,7 @@
 #include <xen/mm.h>
 #include <xen/kexec.h>
 #include <xen/kimage.h>
+#include <xen/lu.h>
 
 #include <asm/page.h>
 
@@ -938,6 +939,39 @@ done:
     return ret;
 }
 
+int kimage_add_live_update_data(struct kexec_image *image, mfn_t data, int nr_mfns)
+{
+    int ret;
+
+    /*
+     * For live update, we place the physical location of 'data'
+     * into the first 64 bits of the reserved live update bootmem
+     * region. At 'data' is an MFN list of pages containing the
+     * actual live update stream, which the new Xen can vmap().
+     *
+     * Append IND_WRITE64 operations to the end of the kimage stream
+     * to store the live update magic and the address of 'data' for
+     * the new Xen to see.
+     */
+    if (!lu_bootmem_start || kimage_dst_used(image, lu_bootmem_start))
+        return -EINVAL;
+
+    ret = machine_kexec_add_page(image, lu_bootmem_start, lu_bootmem_start);
+    if ( ret < 0 )
+        return ret;
+
+    ret = kimage_set_destination(image, lu_bootmem_start);
+    if (!ret)
+        ret = kimage_add_entry(image, LIVE_UPDATE_MAGIC | IND_WRITE64);
+    if (!ret)
+        ret = kimage_add_entry(image, mfn_to_maddr(data) | IND_WRITE64);
+    if (!ret)
+        ret = kimage_add_entry(image, (nr_mfns << PAGE_SHIFT) | IND_WRITE64);
+
+    kimage_terminate(image);
+
+    return ret;
+}
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/xen/kimage.h b/xen/include/xen/kimage.h
index e94839d7c3..1e0e378afd 100644
--- a/xen/include/xen/kimage.h
+++ b/xen/include/xen/kimage.h
@@ -54,6 +54,9 @@ unsigned long kimage_entry_ind(kimage_entry_t *entry, bool_t compat);
 int kimage_build_ind(struct kexec_image *image, mfn_t ind_mfn,
                      bool_t compat);
 
+int kimage_add_live_update_data(struct kexec_image *image, mfn_t data,
+                                int nr_mfns);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __XEN_KIMAGE_H__ */
diff --git a/xen/include/xen/lu.h b/xen/include/xen/lu.h
index abb30545fe..21ee1825d3 100644
--- a/xen/include/xen/lu.h
+++ b/xen/include/xen/lu.h
@@ -1,9 +1,12 @@
 #ifndef __XEN_LU_H__
 #define __XEN_LU_H__
 
+
 #include <xen/types.h>
 #include <xen/mm.h>
 
+#define LIVE_UPDATE_MAGIC        (0x4c69766555706461UL & PAGE_MASK)
+
 struct lu_stream {
     mfn_t *pagelist;
     size_t len;
-- 
2.21.0
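
As a sketch of the breadcrumb that kimage_add_live_update_data() queues
up, the three IND_WRITE64 entries leave three 64-bit words at the start
of the reserved region: the magic, the machine address of the MFN list,
and the MFN count shifted left by PAGE_SHIFT. The shift and the
PAGE_MASK on the magic are needed because the low bits of each entry
carry the IND_* flags and are masked off by the relocation code. The
struct and function names below are hypothetical, and the parser is only
a guess at what the receiving side has to undo; it is not the code from
later patches in this series.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  (~((UINT64_C(1) << PAGE_SHIFT) - 1))

/* "LiveUpda" in ASCII, with the low 12 bits cleared as in xen/lu.h. */
#define LIVE_UPDATE_MAGIC (UINT64_C(0x4c69766555706461) & PAGE_MASK)

/* Hypothetical decoded form of the breadcrumb. */
struct model_breadcrumb {
    uint64_t mfn_list_maddr;   /* machine address of the MFN list */
    uint64_t nr_mfns;          /* number of entries in that list */
};

/* Produce the three words as they would land in the bootmem region. */
static void model_build_breadcrumb(uint64_t words[3], uint64_t maddr,
                                   uint64_t nr_mfns)
{
    assert(!(maddr & ~PAGE_MASK));   /* must be page aligned */
    words[0] = LIVE_UPDATE_MAGIC;
    words[1] = maddr;
    words[2] = nr_mfns << PAGE_SHIFT;
}

/* Returns 0 and fills 'out' if the magic matches, -1 otherwise. */
static int model_parse_breadcrumb(const uint64_t words[3],
                                  struct model_breadcrumb *out)
{
    if ( words[0] != LIVE_UPDATE_MAGIC )
        return -1;

    out->mfn_list_maddr = words[1];
    out->nr_mfns = words[2] >> PAGE_SHIFT;

    return 0;
}
```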



* [Xen-devel] [RFC PATCH v3 09/22] Add basic lu_save_all() shell
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
                     ` (6 preceding siblings ...)
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 08/22] Add kimage_add_live_update_data() David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 10/22] Don't add bad pages above HYPERVISOR_VIRT_END to the domheap David Woodhouse
                     ` (12 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC (permalink / raw)
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

---
 xen/common/kexec.c     |  6 +++++
 xen/common/lu/Makefile |  2 +-
 xen/common/lu/save.c   | 56 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/lu.h   |  3 +++
 4 files changed, 66 insertions(+), 1 deletion(-)
 create mode 100644 xen/common/lu/save.c

diff --git a/xen/common/kexec.c b/xen/common/kexec.c
index a78aa4f5b0..658fe3d3d4 100644
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -29,6 +29,7 @@
 #include <public/elfnote.h>
 #include <xsm/xsm.h>
 #include <xen/cpu.h>
+#include <xen/lu.h>
 #ifdef CONFIG_COMPAT
 #include <compat/kexec.h>
 #endif
@@ -407,6 +408,11 @@ static long kexec_reboot(void *_image)
 static long kexec_live_update(void *_image)
 {
     struct kexec_image *image = _image;
+    int ret;
+
+    ret = lu_save_all(image);
+    if (ret)
+        return ret;
 
     kexecing = TRUE;
 
diff --git a/xen/common/lu/Makefile b/xen/common/lu/Makefile
index 68991b3ca4..7b7d975f65 100644
--- a/xen/common/lu/Makefile
+++ b/xen/common/lu/Makefile
@@ -1 +1 @@
-obj-y += stream.o
+obj-y += stream.o save.o
diff --git a/xen/common/lu/save.c b/xen/common/lu/save.c
new file mode 100644
index 0000000000..c43962c44e
--- /dev/null
+++ b/xen/common/lu/save.c
@@ -0,0 +1,56 @@
+
+#include <xen/types.h>
+#include <xen/vmap.h>
+#include <xen/lu.h>
+#include <xen/kimage.h>
+#include <xen/sched.h>
+
+int lu_save_global(struct lu_stream *stream)
+{
+    return 0;
+}
+
+
+int lu_save_domain(struct lu_stream *stream, struct domain *d)
+{
+    return 0;
+}
+
+int lu_save_all(struct kexec_image *image)
+{
+    struct lu_stream stream;
+    struct domain *d;
+    int ret;
+
+    memset(&stream, 0, sizeof(stream));
+
+    ret = lu_save_global(&stream);
+
+    for_each_domain ( d )
+    {
+        if (ret)
+            break;
+
+        ret = lu_save_domain(&stream, d);
+    }
+
+    if (!ret)
+        ret = kimage_add_live_update_data(image,
+                          _mfn(virt_to_mfn(stream.pagelist)),
+                          stream.nr_pages);
+
+    if (ret)
+        lu_stream_free(&stream);
+
+    return ret;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/lu.h b/xen/include/xen/lu.h
index 21ee1825d3..f232a56950 100644
--- a/xen/include/xen/lu.h
+++ b/xen/include/xen/lu.h
@@ -19,6 +19,9 @@ void lu_stream_end_reservation(struct lu_stream *stream, size_t size);
 int lu_stream_append(struct lu_stream *stream, const void *data, size_t size);
 void lu_stream_free(struct lu_stream *stream);
 
+struct kexec_image;
+int lu_save_all(struct kexec_image *image);
+
 #endif /* __XEN_LU_H__ */
 
 /*
-- 
2.21.0
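
lu_save_global() and lu_save_domain() are empty shells at this point. As
a rough idea of how a save routine might use the reserve /
end_reservation pair from stream.c, the user-space sketch below reserves
a worst-case size, writes the record, then commits only the bytes
actually used. It substitutes realloc() for the domheap pages and
vmap() of the real code, and every model_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE ((size_t)4096)

/* Hypothetical stand-in for struct lu_stream, backed by malloc memory. */
struct model_stream {
    char *data;
    size_t len;        /* bytes committed so far */
    size_t nr_pages;   /* pages currently backing 'data' */
};

/* Ensure room for 'size' more bytes; return a pointer to write them at. */
static void *model_reserve(struct model_stream *s, size_t size)
{
    size_t nr_pages = (s->len + size + PAGE_SIZE - 1) / PAGE_SIZE;

    if ( s->nr_pages < nr_pages )
    {
        char *p = realloc(s->data, nr_pages * PAGE_SIZE);

        if ( !p )
            return NULL;
        s->data = p;
        s->nr_pages = nr_pages;
    }

    return s->data + s->len;
}

/* Commit 'size' bytes of a previous reservation. */
static void model_end_reservation(struct model_stream *s, size_t size)
{
    s->len += size;
}

/* Reserve the worst case, write the record, commit the actual size. */
static int model_save_record(struct model_stream *s, const char *payload,
                             size_t max_size)
{
    size_t used = strlen(payload) + 1;
    char *p;

    if ( used > max_size )
        return -1;

    p = model_reserve(s, max_size);
    if ( !p )
        return -1;

    memcpy(p, payload, used);
    model_end_reservation(s, used);

    return 0;
}
```

Splitting reserve from commit lets a caller ask for an upper bound first
and record the real length afterwards, so an over-estimate costs only
temporarily reserved space.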



* [Xen-devel] [RFC PATCH v3 10/22] Don't add bad pages above HYPERVISOR_VIRT_END to the domheap
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
                     ` (7 preceding siblings ...)
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 09/22] Add basic lu_save_all() shell David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 11/22] xen/vmap: allow vm_init_type to be called during early_boot David Woodhouse
                     ` (11 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC (permalink / raw)
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/common/page_alloc.c | 83 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 80 insertions(+), 3 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index a74bf02559..4ada4412dd 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1758,6 +1758,18 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
     return 0;
 }
 
+static unsigned long contig_avail_pages(struct page_info *pg, unsigned long max_pages)
+{
+    unsigned long i;
+
+    for ( i = 0 ; i < max_pages; i++)
+    {
+        if ( pg[i].count_info & (PGC_broken | PGC_allocated) )
+            break;
+    }
+    return i;
+}
+
 /*
  * Hand the specified arbitrary page range to the specified heap zone
  * checking the node_id of the previous page.  If they differ and the
@@ -1799,18 +1811,24 @@ static void init_heap_pages(
     {
         unsigned int nid = phys_to_nid(page_to_maddr(pg+i));
 
+        /* If the (first) page is already marked bad, or allocated in advance
+         * due to live update, don't add it to the heap. */
+        if (pg[i].count_info & (PGC_broken | PGC_allocated))
+            continue;
+
         if ( unlikely(!avail[nid]) )
         {
+            unsigned long contig_nr_pages = contig_avail_pages(pg + i, nr_pages);
             unsigned long s = mfn_x(page_to_mfn(pg + i));
-            unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
+            unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + i + contig_nr_pages - 1), 1));
             bool use_tail = (nid == phys_to_nid(pfn_to_paddr(e - 1))) &&
                             !(s & ((1UL << MAX_ORDER) - 1)) &&
                             (find_first_set_bit(e) <= find_first_set_bit(s));
             unsigned long n;
 
-            n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), nr_pages - i,
+            n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), contig_nr_pages,
                                &use_tail);
-            BUG_ON(i + n > nr_pages);
+            BUG_ON(n > contig_nr_pages);
             if ( n && !use_tail )
             {
                 i += n - 1;
@@ -1846,6 +1864,63 @@ static unsigned long avail_heap_pages(
     return free_pages;
 }
 
+static void mark_bad_pages(void)
+{
+    unsigned long bad_spfn, bad_epfn;
+    const char *p;
+    struct page_info *pg;
+#ifdef CONFIG_X86
+    const struct platform_bad_page *badpage;
+    unsigned int i, j, array_size;
+
+    badpage = get_platform_badpages(&array_size);
+    if ( badpage )
+    {
+        for ( i = 0; i < array_size; i++, badpage++ )
+        {
+            for ( j = 0; j < 1UL << badpage->order; j++ )
+            {
+                if ( mfn_valid(_mfn(badpage->mfn + j)) )
+                {
+                    pg = mfn_to_page(_mfn(badpage->mfn + j));
+                    pg->count_info |= PGC_broken;
+                    page_list_add_tail(pg, &page_broken_list);
+                }
+            }
+        }
+    }
+#endif
+
+    /* Check new pages against the bad-page list. */
+    p = opt_badpage;
+    while ( *p != '\0' )
+    {
+        bad_spfn = simple_strtoul(p, &p, 0);
+        bad_epfn = bad_spfn;
+
+        if ( *p == '-' )
+        {
+            p++;
+            bad_epfn = simple_strtoul(p, &p, 0);
+            if ( bad_epfn < bad_spfn )
+                bad_epfn = bad_spfn;
+        }
+
+        if ( *p == ',' )
+            p++;
+        else if ( *p != '\0' )
+            break;
+
+        while ( mfn_valid(_mfn(bad_spfn)) && bad_spfn <= bad_epfn )
+        {
+            pg = mfn_to_page(_mfn(bad_spfn));
+            pg->count_info |= PGC_broken;
+            page_list_add_tail(pg, &page_broken_list);
+            bad_spfn++;
+        }
+    }
+}
+
 void __init end_boot_allocator(void)
 {
     unsigned int i;
@@ -1870,6 +1945,8 @@ void __init end_boot_allocator(void)
     }
     nr_bootmem_regions = 0;
 
+    mark_bad_pages();
+
     if ( !dma_bitsize && (num_online_nodes() > 1) )
         dma_bitsize = arch_get_dma_bitsize();
 
-- 
2.21.0
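
The 'badpage=' parsing in mark_bad_pages() can be modelled in user space
as below. This is a sketch under stated assumptions: a plain bool array
stands in for PGC_broken in page_info, strtoul() stands in for
simple_strtoul(), and ranges are treated as inclusive so that a single
page number marks exactly that page. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse a list such as "0x10-0x12,5" and mark each named page in 'bad'.
 * Returns the number of pages newly marked. Parsing stops at the first
 * malformed item, before that item is applied.
 */
static size_t model_mark_bad_ranges(const char *p, bool *bad, size_t nr_pages)
{
    size_t marked = 0;

    assert(p && bad);

    while ( *p != '\0' )
    {
        char *end;
        unsigned long first = strtoul(p, &end, 0);
        unsigned long last = first;

        if ( end == p )
            break;                   /* no number here: stop parsing */
        p = end;

        if ( *p == '-' )
        {
            last = strtoul(p + 1, &end, 0);
            p = end;
            if ( last < first )
                last = first;        /* clamp a reversed or empty range */
        }

        if ( *p == ',' )
            p++;
        else if ( *p != '\0' )
            break;

        for ( unsigned long pfn = first; pfn <= last && pfn < nr_pages; pfn++ )
        {
            if ( !bad[pfn] )
            {
                bad[pfn] = true;
                marked++;
            }
        }
    }

    return marked;
}
```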



* [Xen-devel] [RFC PATCH v3 11/22] xen/vmap: allow vm_init_type to be called during early_boot
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
                     ` (8 preceding siblings ...)
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 10/22] Don't add bad pages above HYPERVISOR_VIRT_END to the domheap David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 12/22] xen/vmap: allow vmap() to be called during early boot David Woodhouse
                     ` (10 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC (permalink / raw)
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

We want to move vm_init(), which calls vm_init_type() under the hood, to
the early boot stage. Add a path that gets pages from the boot allocator
instead.

Add an emacs block to that file while I was there.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/vmap.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index faebc1ddf1..37922f735b 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -34,9 +34,15 @@ void __init vm_init_type(enum vmap_region type, void *start, void *end)
 
     for ( i = 0, va = (unsigned long)vm_bitmap(type); i < nr; ++i, va += PAGE_SIZE )
     {
-        struct page_info *pg = alloc_domheap_page(NULL, 0);
-
-        map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR);
+        mfn_t mfn;
+        int rc;
+
+        if ( system_state == SYS_STATE_early_boot )
+            mfn = alloc_boot_pages(1, 1);
+        else
+            mfn = page_to_mfn(alloc_domheap_page(NULL, 0));
+        rc = map_pages_to_xen(va, mfn, 1, PAGE_HYPERVISOR);
+        BUG_ON(rc);
         clear_page((void *)va);
     }
     bitmap_fill(vm_bitmap(type), vm_low[type]);
@@ -330,3 +336,13 @@ void vfree(void *va)
         free_domheap_page(pg);
 }
 #endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.21.0



* [Xen-devel] [RFC PATCH v3 12/22] xen/vmap: allow vmap() to be called during early boot
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
                     ` (9 preceding siblings ...)
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 11/22] xen/vmap: allow vm_init_type to be called during early_boot David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 13/22] x86/setup: move vm_init() before end_boot_allocator() David Woodhouse
                     ` (9 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC (permalink / raw)
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/common/vmap.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index 37922f735b..8343460794 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -68,7 +68,7 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
     spin_lock(&vm_lock);
     for ( ; ; )
     {
-        struct page_info *pg;
+        mfn_t mfn;
 
         ASSERT(vm_low[t] == vm_top[t] || !test_bit(vm_low[t], vm_bitmap(t)));
         for ( start = vm_low[t]; start < vm_top[t]; )
@@ -103,9 +103,17 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
         if ( vm_top[t] >= vm_end[t] )
             return NULL;
 
-        pg = alloc_domheap_page(NULL, 0);
-        if ( !pg )
-            return NULL;
+        if ( system_state == SYS_STATE_early_boot )
+        {
+            mfn = alloc_boot_pages(1, 1);
+        }
+        else
+        {
+            struct page_info *pg = alloc_domheap_page(NULL, 0);
+            if ( !pg )
+                return NULL;
+            mfn = page_to_mfn(pg);
+        }
 
         spin_lock(&vm_lock);
 
@@ -113,7 +121,7 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
         {
             unsigned long va = (unsigned long)vm_bitmap(t) + vm_top[t] / 8;
 
-            if ( !map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR) )
+            if ( !map_pages_to_xen(va, mfn, 1, PAGE_HYPERVISOR) )
             {
                 clear_page((void *)va);
                 vm_top[t] += PAGE_SIZE * 8;
@@ -123,7 +131,10 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
             }
         }
 
-        free_domheap_page(pg);
+        if ( system_state == SYS_STATE_early_boot )
+            init_boot_pages(mfn_to_maddr(mfn), mfn_to_maddr(mfn) + PAGE_SIZE);
+        else
+            free_domheap_page(mfn_to_page(mfn));
 
         if ( start >= vm_top[t] )
         {
-- 
2.21.0



* [Xen-devel] [RFC PATCH v3 13/22] x86/setup: move vm_init() before end_boot_allocator()
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
                     ` (10 preceding siblings ...)
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 12/22] xen/vmap: allow vmap() to be called during early boot David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 14/22] Detect live update breadcrumb at boot and map data stream David Woodhouse
                     ` (8 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC (permalink / raw)
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

We would like to be able to use vmap() to map the live update data, and
we need to do a first pass of the live update data before we prime the
heap because we need to know which pages need to be preserved.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/arch/x86/setup.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index dba8c3f0a1..ea3f423b4c 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1572,6 +1572,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
     numa_initmem_init(0, raw_max_page);
 
+    vm_init();
+
     if ( lu_bootmem_start )
     {
         unsigned long limit = virt_to_mfn(HYPERVISOR_VIRT_END - 1);
@@ -1635,12 +1637,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         end_boot_allocator();
 
     system_state = SYS_STATE_boot;
-    /*
-     * No calls involving ACPI code should go between the setting of
-     * SYS_STATE_boot and vm_init() (or else acpi_os_{,un}map_memory()
-     * will break).
-     */
-    vm_init();
 
     console_init_ring();
     vesa_init();
-- 
2.21.0


* [Xen-devel] [RFC PATCH v3 14/22] Detect live update breadcrumb at boot and map data stream
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
                     ` (11 preceding siblings ...)
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 13/22] x86/setup: move vm_init() before end_boot_allocator() David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 15/22] Start documenting the live update handover David Woodhouse
                     ` (7 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC (permalink / raw)
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
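As a rough illustration of the breadcrumb check this patch adds to
__start_xen(), the stand-alone C sketch below decodes a three-word
breadcrumb laid out as magic, MFN array address, shifted page count.
The sketch_ names, the fixed 4 KiB page size and the alignment check on
the MFN array address are assumptions made for the example; they are
not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages, as on x86_64. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_MASK  (~((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1))

/* "LiveUpda" with the low bits cleared, as kexec_reloc() would store it. */
#define SKETCH_LU_MAGIC   (UINT64_C(0x4c69766555706461) & SKETCH_PAGE_MASK)

/*
 * Decode a breadcrumb of three host-endian 64-bit words.
 * Returns 0 and fills in the outputs on success, -1 otherwise.
 */
static int sketch_parse_breadcrumb(const uint64_t words[3],
                                   uint64_t *mfnlist_phys,
                                   uint64_t *nr_pages)
{
    /* Word 0: the magic value, already masked with PAGE_MASK. */
    if ( words[0] != SKETCH_LU_MAGIC )
        return -1;

    /* Word 1: address of the MFN array (assumed page-aligned here). */
    if ( words[1] & ~SKETCH_PAGE_MASK )
        return -1;

    /* Word 2: page count, stored shifted left by PAGE_SHIFT. */
    if ( !(words[2] >> SKETCH_PAGE_SHIFT) )
        return -1;

    *mfnlist_phys = words[1];
    *nr_pages = words[2] >> SKETCH_PAGE_SHIFT;

    return 0;
}
```

A caller would pass a pointer to the first three words of the reserved
bootmem region. An unmasked magic value is rejected, which is the
property the handover document relies on to refuse a mismatched page
size or endianness.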
 xen/arch/x86/setup.c   | 35 +++++++++++++++++++++++++++++++++--
 xen/common/lu/stream.c | 34 ++++++++++++++++++++++++++++++++++
 xen/include/xen/lu.h   |  2 ++
 3 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index ea3f423b4c..eea670e03b 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -25,6 +25,7 @@
 #include <xen/nodemask.h>
 #include <xen/virtual_region.h>
 #include <xen/watchdog.h>
+#include <xen/lu.h>
 #include <public/version.h>
 #include <compat/platform.h>
 #include <compat/xen.h>
@@ -745,6 +746,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         .max_maptrack_frames = -1,
     };
     const char *hypervisor_name;
+    uint64_t lu_mfnlist_phys = 0, lu_nr_pages = 0;
+    struct lu_stream lu_stream;
 
     /* Critical region without IDT or TSS.  Any fault is deadly! */
 
@@ -889,9 +892,16 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     printk(" Found %d EDD information structures\n",
            bootsym(boot_edd_info_nr));
 
-    /* Check that we have at least one Multiboot module. */
     if ( !(mbi->flags & MBI_MODULES) || (mbi->mods_count == 0) )
-        panic("dom0 kernel not specified. Check bootloader configuration\n");
+    {
+        if ( !lu_breadcrumb_phys )
+            panic("dom0 kernel not specified. Check bootloader configuration\n");
+    }
+    else
+    {
+        /* If modules are provided, don't even look for live update data. */
+        lu_breadcrumb_phys = 0;
+    }
 
     /* Check that we don't have a silly number of modules. */
     if ( mbi->mods_count > sizeof(module_map) * 8 )
@@ -1337,6 +1347,22 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     if ( !xen_phys_start )
         panic("Not enough memory to relocate Xen\n");
 
+    /* Check for the state breadcrumb before giving it to the boot allocator */
+    if ( lu_breadcrumb_phys )
+    {
+        uint64_t *breadcrumb = maddr_to_virt(lu_breadcrumb_phys);
+
+        lu_mfnlist_phys = breadcrumb[1];
+        lu_nr_pages = breadcrumb[2] >> PAGE_SHIFT;
+
+        if ( breadcrumb[0] != LIVE_UPDATE_MAGIC || !lu_nr_pages )
+            panic("Live update breadcrumb not found: %lx %lx %lx at %lx\n",
+                  breadcrumb[0], breadcrumb[1], breadcrumb[2], lu_breadcrumb_phys);
+
+        printk("%lu pages of live update data at 0x%lx\n",
+               lu_nr_pages, lu_mfnlist_phys);
+    }
+
     if ( lu_bootmem_start )
     {
         if ( !lu_reserved )
@@ -1574,6 +1600,11 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
     vm_init();
 
+    if ( lu_breadcrumb_phys )
+    {
+        lu_stream_map(&lu_stream, lu_mfnlist_phys, lu_nr_pages);
+    }
+
     if ( lu_bootmem_start )
     {
         unsigned long limit = virt_to_mfn(HYPERVISOR_VIRT_END - 1);
diff --git a/xen/common/lu/stream.c b/xen/common/lu/stream.c
index 10e123a466..8c44a4eb37 100644
--- a/xen/common/lu/stream.c
+++ b/xen/common/lu/stream.c
@@ -124,6 +124,40 @@ void lu_stream_free(struct lu_stream *stream)
     }
 }
 
+void lu_stream_map(struct lu_stream *stream, unsigned long mfns_phys, int nr_pages)
+{
+    unsigned int order = get_order_from_bytes((nr_pages + 1) * sizeof(mfn_t));
+    unsigned int i;
+
+    memset(stream, 0, sizeof(*stream));
+
+    stream->len = nr_pages << PAGE_SHIFT;
+    stream->nr_pages = nr_pages;
+    stream->pagelist = __va(mfns_phys);
+
+    map_pages_to_xen((unsigned long)stream->pagelist, maddr_to_mfn(mfns_phys),
+                     1 << order, PAGE_HYPERVISOR);
+
+    /* Reserve the pages used for the pagelist itself. */
+    for ( i = 0; i < (1 << order); i++ )
+    {
+        maddr_to_page(mfns_phys + (i << PAGE_SHIFT))->count_info |= PGC_allocated;
+    }
+
+    /* Validate and reserve the data pages */
+    for ( i = 0; i < nr_pages; i++ )
+    {
+        if ( !mfn_valid(stream->pagelist[i]) )
+            panic("Invalid MFN %lx in live update stream\n", mfn_x(stream->pagelist[i]));
+
+        mfn_to_page(stream->pagelist[i])->count_info |= PGC_allocated;
+    }
+
+    stream->data = vmap(stream->pagelist, nr_pages);
+    if ( !stream->data )
+        panic("Failed to map live update data\n");
+}
+
 /*
  * local variables:
  * mode: c
diff --git a/xen/include/xen/lu.h b/xen/include/xen/lu.h
index f232a56950..21abace130 100644
--- a/xen/include/xen/lu.h
+++ b/xen/include/xen/lu.h
@@ -22,6 +22,8 @@ void lu_stream_free(struct lu_stream *stream);
 struct kexec_image;
 int lu_save_all(struct kexec_image *image);
 
+void lu_stream_map(struct lu_stream *stream, unsigned long mfns_phys, int nr_pages);
+
 #endif /* __XEN_LU_H__ */
 
 /*
-- 
2.21.0


* [Xen-devel] [RFC PATCH v3 15/22] Start documenting the live update handover
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
                     ` (12 preceding siblings ...)
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 14/22] Detect live update breadcrumb at boot and map data stream David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 16/22] Migrate migration stream definitions into Xen public headers David Woodhouse
                     ` (6 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC (permalink / raw)
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
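To make the record layout in the new document concrete, here is a
minimal C sketch of walking a mapped stream: an 8-octet header (type,
body_length), a body padded to a multiple of 8 octets, and a
terminating END record. The sketch_ names and the domain-counting
helper are hypothetical and exist only for this example; only the type
values and the padding rule come from the specification text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_REC_OPTIONAL     UINT32_C(0x80000000) /* bit 31 */
#define SKETCH_REC_LIVE_UPDATE  UINT32_C(0x40000000) /* bit 30 */
#define SKETCH_REC_END          UINT32_C(0x00000000)
#define SKETCH_REC_LU_VERSION   UINT32_C(0x40000000)
#define SKETCH_REC_LU_DOMAIN    UINT32_C(0x40000003) /* LU_DOMAIN_INFO */

/* Record header: two 32-bit fields, 8 octets in total. */
struct sketch_rhdr {
    uint32_t type;
    uint32_t body_length;
};

static int sketch_is_optional(uint32_t type)
{
    return (type & SKETCH_REC_OPTIONAL) != 0;
}

static int sketch_is_live_update(uint32_t type)
{
    return (type & SKETCH_REC_LIVE_UPDATE) != 0;
}

/* Body length rounded up to the next multiple of 8 octets. */
static size_t sketch_padded_length(uint32_t body_length)
{
    return ((size_t)body_length + 7) & ~(size_t)7;
}

/*
 * Count LU_DOMAIN_INFO records up to the END record.
 * Returns -1 if a record overruns the buffer or END is missing.
 */
static int sketch_count_domains(const uint8_t *data, size_t len)
{
    size_t off = 0;
    int domains = 0;

    while ( len - off >= sizeof(struct sketch_rhdr) )
    {
        struct sketch_rhdr hdr;
        size_t body;

        memcpy(&hdr, data + off, sizeof(hdr));
        off += sizeof(hdr);

        if ( hdr.type == SKETCH_REC_END )
            return domains;

        body = sketch_padded_length(hdr.body_length);
        if ( body > len - off )
            return -1;

        if ( hdr.type == SKETCH_REC_LU_DOMAIN )
            domains++;

        off += body;
    }

    return -1;
}
```

A real consumer would also have to fail on any mandatory record type it
does not recognise; the sketch skips that step to stay short.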
 docs/specs/libxc-migration-stream.pandoc |  19 +-
 docs/specs/live-update-handover.pandoc   | 371 +++++++++++++++++++++++
 2 files changed, 388 insertions(+), 2 deletions(-)
 create mode 100644 docs/specs/live-update-handover.pandoc

diff --git a/docs/specs/libxc-migration-stream.pandoc b/docs/specs/libxc-migration-stream.pandoc
index a7a8a08936..9a6679f3de 100644
--- a/docs/specs/libxc-migration-stream.pandoc
+++ b/docs/specs/libxc-migration-stream.pandoc
@@ -227,12 +227,18 @@ type         0x00000000: END
 
              0x0000000F: CHECKPOINT_DIRTY_PFN_LIST (Secondary -> Primary)
 
-             0x00000010 - 0x7FFFFFFF: Reserved for future _mandatory_
+             0x00000010 - 0x3FFFFFFF: Reserved for future _mandatory_
              records.
 
-             0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
+             0x40000000 - 0x7FFFFFFF: Reserved for future _mandatory_
+             live update records.
+
+             0x80000000 - 0xBFFFFFFF: Reserved for future _optional_
              records.
 
+             0xC0000000 - 0xFFFFFFFF: Reserved for future _optional_
+             live update records.
+
 body_length  Length in octets of the record body.
 
 body         Content of the record.
@@ -246,6 +252,15 @@ Records may be _mandatory_ or _optional_.  Optional records have bit
 unsupported mandatory record must fail.  The contents of optional
 records may be ignored during a restore.
 
+Note: This basic record format, and some of the record types defined here,
+are also used for Live Update, as discussed in the Live Update Handover
+document: `docs/specs/live-update-handover.pandoc`.
+
+Records defined for live update have bit 30 set in their type value,
+are defined in that document, and are out of scope for this document.
+Such records shall not appear in the Domain Image Format as defined by
+this document.
+
 The following sub-sections specify the record body format for each of
 the record types.
 
diff --git a/docs/specs/live-update-handover.pandoc b/docs/specs/live-update-handover.pandoc
new file mode 100644
index 0000000000..31d23c7c90
--- /dev/null
+++ b/docs/specs/live-update-handover.pandoc
@@ -0,0 +1,371 @@
+% Live Update Handover Protocol
+% David Woodhouse <<dwmw@amazon.co.uk>>
+% Revision 1
+
+Introduction
+============
+
+Purpose
+-------
+
+Live update performs a _kexec_ from one running version of Xen to
+another, preserving all running domains in a form of guest-transparent
+live migration.
+
+This document outlines the memory layout requirements and data stream
+used in the handover protocol, to ensure that pages used by running
+domains are preserved during the transition from one version of Xen
+to the next.
+
+
+Compatibility
+-------------
+
+It cannot be repeated often enough that information passed over live
+update is an ABI. It is expected that live update can be performed from
+one major version of Xen to another, or even hypothetically to a system
+which is not Xen at all.
+
+It is necessary that some data are handed over "in place"; in
+particular the memory pages of the running domains. However, no
+internal Xen data structures may be transferred in this fashion; at
+least not without retrospectively declaring them to be ABI, with the
+restrictions that places on subsequent changes.
+
+
+
+Handover
+========
+
+
+Memory Usage Restrictions
+-------------------------
+
+The new Xen must take care not to use any memory pages which already
+belong to guests. To facilitate this, a contiguous region of memory
+is reserved for the boot allocator, known as _live update bootmem_.
+
+This region is reserved by the original Xen during its own boot, and
+the location made available to the _kexec(8)_ user space tool
+through the `kexec_get_range` hypercall using a new region type
+`KEXEC_RANGE_MA_LIVEUPDATE`. It is passed to the new Xen on the
+command line, using the `liveupdate=` parameter.
+
+The new Xen must not use any pages outside this region until it has
+consumed the live update data stream and determined which pages are
+already in use by running domains.
+
+At run time, Xen may use memory from the reserved region for any
+purpose that does not require preservation over a live update; in
+particular it must not be mapped to a domain.
+
+The new Xen executable image must be loaded by kexec to the same
+physical location as the running Xen, since that region of memory is
+known to be available. For that reason, freed init memory from the
+Xen image is also treated as reserved _live update bootmem_.
+
+
+Live Update Data Stream
+-----------------------
+
+During handover, the running Xen pauses all domains and creates a
+_live update data stream_ containing all the information required by
+the new Xen to restore them. This is largely the same as a
+guest-transparent live migration stream.
+
+Data pages for this stream may be allocated anywhere in physical
+memory outside the _live update bootmem_ regions.
+
+Xen creates a physically contiguous array of MFNs of the allocated
+data pages, suitable for passing to `vmap()` to obtain a virtually
+contiguous mapping of the whole data stream.
+
+
+Breadcrumb
+----------
+
+The live update data stream is created during the final `kexec_exec`
+hypercall, so its address cannot be passed to the new Xen on the
+command line, which must have been set up by `kexec(8)` in userspace
+long beforehand.
+
+Thus, to allow the new Xen to find the data stream, the old Xen places
+a _breadcrumb_ in the first words of the _live update bootmem_, containing
+the number of data pages, and the physical address of the contiguous MFN
+array.
+
+The breadcrumb is written as the last action of the `kexec_reloc()`
+routine during the `kexec` handover. It cannot overwrite anything
+important, because Xen already guarantees that it will not place any
+data in that region which needs to survive across a live update.
+
+A restriction of the `kexec_reloc()` mechanism for writing the breadcrumb
+is that the values are host-endian and are masked with PAGE_MASK; the low
+bits are zeroed. This is actually perfect for the magic value used
+to recognise a live update breadcrumb, since it neatly prevents any attempt
+to live update to a Xen which uses a different endianness or page size.
+
+For the physical address of the MFN list it's perfectly fine, since
+that list is page-aligned anyway. For the number of pages, it means
+the value must be shifted accordingly. Hence the use of `shifted_nr_pages`
+in the breadcrumb structure below:
+
+
+    0     1     2     3     4     5     6     7 octet
+    +-------------------------------------------------+
+    | live_update_magic                               |
+    +-------------------------------------------------+
+    | mfn_array_physaddr                              |
+    +-------------------------------------------------+
+    | shifted_nr_pages                                |
+    +-------------------------------------------------+
+
+--------------------------------------------------------------------
+Field               Description
+------------------- ------------------------------------------------
+live_update_magic   "LiveUpda" (0x4c69766555706461), stored in host
+                    endianness and masked with PAGE_MASK. For example,
+                    on x86_64: `00 60 70 55 65 76 69 4c`.
+
+mfn_array_physaddr  Machine address of the MFN list for the data stream.
+
+shifted_nr_pages    Number of data pages, shifted left by PAGE_SHIFT to
+                    avoid the limitation of kexec_reloc().
+--------------------------------------------------------------------
+
+
+IOMMU
+-----
+
+Where devices are passed through to domains, it may not be possible
+to quiesce those devices for the purpose of performing the update.
+
+If performing live update with assigned devices, the original Xen will
+leave the IOMMU mappings active during the handover (thus implying
+that IOMMU page tables may not be allocated in the `live update
+bootmem` region either).
+
+The new Xen must resume control of the IOMMU without causing those mappings
+to become invalid even for a short period of time. On hardware which does not
+support Posted Interrupts, interrupts may need to be generated on resume.
+
+_This section will be expanded once we actually have it working._
+
+\clearpage
+
+Data Stream Overview
+====================
+
+Once discovered and mapped, the live update data stream forms a
+virtually contiguous stream of records following the basic form
+documented in the LibXenCtrl Domain Image Format at
+`docs/specs/libxc-migration-stream.pandoc`.
+
+Some record types from the LibXenCtrl Domain Image format are used
+as-is, such as the `X86_PV_INFO`, `X86_PV_VCPU_BASIC`, `HVM_CONTEXT`
+and other records containing domain-specific data.
+
+The Domain Header from that document is not used in that form, and a new
+record of type `LU_DOMAIN_INFO` is defined below.
+
+Other new record types specific to the live update process are defined in
+this document. Of those, some contain global state such as the M2P table
+information, while others are domain-specific.
+
+The live update data stream starts with records containing global
+information, followed any number of times by a `LU_DOMAIN_INFO` record
+and subsequent domain-specific records for that domain.
+
+There is a single `END` record at the end of the live update data stream,
+indicating that no more `LU_DOMAIN_INFO` records are present.
+
+\clearpage
+
+As defined in the LibXenCtrl Domain Image format document, a record
+has the following structure. Record type values defined for live update
+have bit 30 set, and are thus in the range 0x40000000-0x7FFFFFFF for
+mandatory live update records, and 0xC0000000-0xFFFFFFFF for optional
+live update records _(of which there are none at the present time)_.
+
+
+    0     1     2     3     4     5     6     7 octet
+    +-----------------------+-------------------------+
+    | type                  | body_length             |
+    +-----------+-----------+-------------------------+
+    | body...                                         |
+    ...
+    |           | padding (0 to 7 octets)             |
+    +-----------+-------------------------------------+
+
+--------------------------------------------------------------------
+Field        Description
+-----------  -------------------------------------------------------
+type         0x40000000: LU_VERSION
+
+             0x40000001: LU_M2P
+
+             0x40000002: LU_M2P_COMPAT
+
+             0x40000003: LU_DOMAIN_INFO
+
+             0x40000004 - 0x7FFFFFFF: Reserved for future _mandatory_
+             live update records.
+
+             0xC0000000 - 0xFFFFFFFF: Reserved for future _optional_
+             live update records.
+
+body_length  Length in octets of the record body.
+
+body         Content of the record.
+
+padding      0 to 7 octets of zeros to pad the whole record to a multiple
+             of 8 octets.
+--------------------------------------------------------------------
+
+
+\clearpage
+
+Global Records
+==============
+
+LU_VERSION
+----------
+
+The version field indicates the version of Xen from which the system
+is live updating. In theory this should never be relevant, but it
+allows for version-specific workarounds to be implemented in the receiving
+Xen should they become necessary.
+
+    0     1     2     3     4     5     6     7 octet
+    +-----------------------+-------------------------+
+    | xen_major             | xen_minor               |
+    +-----------------------+-------------------------+
+
+
+--------------------------------------------------------------------
+Field       Description
+----------- --------------------------------------------------------
+xen_major   The Xen major version from which the system is updating.
+
+xen_minor   The Xen minor version from which the system is updating.
+--------------------------------------------------------------------
+
+\clearpage
+
+LU_M2P / LU_M2P_COMPAT
+----------------------
+
+The M2P and compatibility M2P records contain a scatter/gather list of
+pages containing native or 32-bit M2P data.
+
+
+    0     1     2     3     4     5     6     7 octet
+    +-----------------------+-------------------------+
+    | m2p_page_data[0]...                             |
+    ...
+    +-------------------------------------------------+
+    | m2p_page_data[N-1]...                           |
+    ...
+    +-------------------------------------------------+
+
+--------------------------------------------------------------------
+Field           Description
+-----------     --------------------------------------------------------
+m2p_page_data   A 64-bit value containing the physical address of the
+                next page of M2P data, encoding the _order_ of the page
+                into the low 12 bits. Thus, a 1GiB page at 0x4C0000000
+                would be encoded as 0x4C000001E.
+
+                In case the M2P does not contiguously cover pages starting
+                from MFN zero, a discontiguity is indicated by a field
+                with order set to zero. The high bits of the field then
+                provide the MFN for which the subsequent M2P data page
+                provides data.
+
+--------------------------------------------------------------------
+
+\clearpage
+
+Domain Specific Records
+=======================
+
+
+LU_DOMAIN_INFO
+--------------
+
+The domain info record contains general properties necessary to
+recreate a domain in the receiving Xen, and marks the start of a set
+of other domain-specific records pertaining to that domain.
+
+    0     1     2     3     4     5     6     7 octet
+    +-----------------------+-----------+-------------+
+    | type                  | page_shift| domain_id   |
+    +-----------------------+-----------+-------------+
+    | domain_handle[0-7]                              |
+    +-------------------------------------------------+
+    | domain_handle[8-15]                             |
+    +-----------------------+-------------------------+
+    | ssidref               | flags                   |
+    +-----------------------+-------------------------+
+    | max_vcpus             | emulation_flags         |
+    +-----------------------+-------------------------+
+    | extra_flags           | (padding)               |
+    +-----------------------+-------------------------+
+
+
+--------------------------------------------------------------------
+Field           Description
+--------------- --------------------------------------------------------
+type            0x0000: Reserved.
+
+                0x0001: x86 PV.
+
+                0x0002: x86 HVM.
+
+                0x0003 - 0xFFFFFFFF: Reserved.
+
+page_shift      Size of a guest page as a power of two.
+
+                i.e., page size = 2 ^page_shift^.
+
+domain_id       Domain ID
+
+
+domain_handle   UUID domain handle.
+
+ssidref         Security Identifier Index
+
+flags           Domain flags using `XEN_DOMCTL_CDF_`
+
+max_vcpus       Maximum vCPUs for domain.
+
+emulation_flags Emulation flags using `XEN_X86_EMU_`
+
+extra_flags     Additional flags:
+
+                0x00000001: Is privileged
+
+--------------------------------------------------------------------
+
+\clearpage
+
+Future Extensions
+=================
+
+All changes to this specification should bump the revision number in
+the title block.
+
+All changes to the image or domain headers require the image version
+to be increased.
+
+The format may be extended by adding additional record types.
+
+Extending an existing record type must be done by adding a new record
+type.  This allows old images with the old record to still be
+restored.
+
+The image header may only be extended by _appending_ additional
+fields.  In particular, the `marker`, `id` and `version` fields must
+never change size or location.
+
+
-- 
2.21.0


* [Xen-devel] [RFC PATCH v3 16/22] Migrate migration stream definitions into Xen public headers
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
                     ` (13 preceding siblings ...)
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 15/22] Start documenting the live update handover David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 17/22] Add lu_stream_{open, close, append}_record() David Woodhouse
                     ` (5 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC (permalink / raw)
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 tools/libxc/xc_sr_common.c            |  20 ++--
 tools/libxc/xc_sr_common_x86.c        |   4 +-
 tools/libxc/xc_sr_restore.c           |   2 +-
 tools/libxc/xc_sr_restore_x86_hvm.c   |   4 +-
 tools/libxc/xc_sr_restore_x86_pv.c    |   8 +-
 tools/libxc/xc_sr_save.c              |   2 +-
 tools/libxc/xc_sr_save_x86_hvm.c      |   4 +-
 tools/libxc/xc_sr_save_x86_pv.c       |  12 +--
 tools/libxc/xc_sr_stream_format.h     |  97 +-------------------
 xen/include/Makefile                  |   2 +-
 xen/include/public/migration_stream.h | 126 ++++++++++++++++++++++++++
 11 files changed, 157 insertions(+), 124 deletions(-)
 create mode 100644 xen/include/public/migration_stream.h

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index dd9a11b4b5..92f9332e73 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -91,7 +91,7 @@ int write_split_record(struct xc_sr_context *ctx, struct xc_sr_record *rec,
 int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec)
 {
     xc_interface *xch = ctx->xch;
-    struct xc_sr_rhdr rhdr;
+    struct mr_rhdr rhdr;
     size_t datasz;
 
     if ( read_exact(fd, &rhdr, sizeof(rhdr)) )
@@ -142,15 +142,15 @@ static void __attribute__((unused)) build_assertions(void)
 {
     BUILD_BUG_ON(sizeof(struct xc_sr_ihdr) != 24);
     BUILD_BUG_ON(sizeof(struct xc_sr_dhdr) != 16);
-    BUILD_BUG_ON(sizeof(struct xc_sr_rhdr) != 8);
-
-    BUILD_BUG_ON(sizeof(struct xc_sr_rec_page_data_header)  != 8);
-    BUILD_BUG_ON(sizeof(struct xc_sr_rec_x86_pv_info)       != 8);
-    BUILD_BUG_ON(sizeof(struct xc_sr_rec_x86_pv_p2m_frames) != 8);
-    BUILD_BUG_ON(sizeof(struct xc_sr_rec_x86_pv_vcpu_hdr)   != 8);
-    BUILD_BUG_ON(sizeof(struct xc_sr_rec_x86_tsc_info)      != 24);
-    BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params_entry)  != 16);
-    BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params)        != 8);
+    BUILD_BUG_ON(sizeof(struct mr_rhdr) != 8);
+
+    BUILD_BUG_ON(sizeof(struct mr_page_data_header)  != 8);
+    BUILD_BUG_ON(sizeof(struct mr_x86_pv_info)       != 8);
+    BUILD_BUG_ON(sizeof(struct mr_x86_pv_p2m_frames) != 8);
+    BUILD_BUG_ON(sizeof(struct mr_x86_pv_vcpu_hdr)   != 8);
+    BUILD_BUG_ON(sizeof(struct mr_x86_tsc_info)      != 24);
+    BUILD_BUG_ON(sizeof(struct mr_hvm_params_entry)  != 16);
+    BUILD_BUG_ON(sizeof(struct mr_hvm_params)        != 8);
 }
 
 /*
diff --git a/tools/libxc/xc_sr_common_x86.c b/tools/libxc/xc_sr_common_x86.c
index 011684df97..1627ff72d6 100644
--- a/tools/libxc/xc_sr_common_x86.c
+++ b/tools/libxc/xc_sr_common_x86.c
@@ -3,7 +3,7 @@
 int write_x86_tsc_info(struct xc_sr_context *ctx)
 {
     xc_interface *xch = ctx->xch;
-    struct xc_sr_rec_x86_tsc_info tsc = {};
+    struct mr_x86_tsc_info tsc = {};
     struct xc_sr_record rec = {
         .type = REC_TYPE_X86_TSC_INFO,
         .length = sizeof(tsc),
@@ -23,7 +23,7 @@ int write_x86_tsc_info(struct xc_sr_context *ctx)
 int handle_x86_tsc_info(struct xc_sr_context *ctx, struct xc_sr_record *rec)
 {
     xc_interface *xch = ctx->xch;
-    struct xc_sr_rec_x86_tsc_info *tsc = rec->data;
+    struct mr_x86_tsc_info *tsc = rec->data;
 
     if ( rec->length != sizeof(*tsc) )
     {
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 5e31908ca8..29c264ecc7 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -335,7 +335,7 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
 static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec)
 {
     xc_interface *xch = ctx->xch;
-    struct xc_sr_rec_page_data_header *pages = rec->data;
+    struct mr_page_data_header *pages = rec->data;
     unsigned int i, pages_of_data = 0;
     int rc = -1;
 
diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c b/tools/libxc/xc_sr_restore_x86_hvm.c
index 3f78248f32..e5b25f4280 100644
--- a/tools/libxc/xc_sr_restore_x86_hvm.c
+++ b/tools/libxc/xc_sr_restore_x86_hvm.c
@@ -25,8 +25,8 @@ static int handle_hvm_params(struct xc_sr_context *ctx,
                              struct xc_sr_record *rec)
 {
     xc_interface *xch = ctx->xch;
-    struct xc_sr_rec_hvm_params *hdr = rec->data;
-    struct xc_sr_rec_hvm_params_entry *entry = hdr->param;
+    struct mr_hvm_params *hdr = rec->data;
+    struct mr_hvm_params_entry *entry = hdr->param;
     unsigned int i;
     int rc;
 
diff --git a/tools/libxc/xc_sr_restore_x86_pv.c b/tools/libxc/xc_sr_restore_x86_pv.c
index 16e738884e..8e43ddcfd7 100644
--- a/tools/libxc/xc_sr_restore_x86_pv.c
+++ b/tools/libxc/xc_sr_restore_x86_pv.c
@@ -585,7 +585,7 @@ static int update_guest_p2m(struct xc_sr_context *ctx)
  * Cross-check the legitimate combinations.
  */
 static bool valid_x86_pv_info_combination(
-    const struct xc_sr_rec_x86_pv_info *info)
+    const struct mr_x86_pv_info *info)
 {
     switch ( info->guest_width )
     {
@@ -602,7 +602,7 @@ static int handle_x86_pv_info(struct xc_sr_context *ctx,
                               struct xc_sr_record *rec)
 {
     xc_interface *xch = ctx->xch;
-    struct xc_sr_rec_x86_pv_info *info = rec->data;
+    struct mr_x86_pv_info *info = rec->data;
 
     if ( ctx->x86.pv.restore.seen_pv_info )
     {
@@ -675,7 +675,7 @@ static int handle_x86_pv_p2m_frames(struct xc_sr_context *ctx,
                                     struct xc_sr_record *rec)
 {
     xc_interface *xch = ctx->xch;
-    struct xc_sr_rec_x86_pv_p2m_frames *data = rec->data;
+    struct mr_x86_pv_p2m_frames *data = rec->data;
     unsigned int start, end, x, fpp = PAGE_SIZE / ctx->x86.pv.width;
     int rc;
 
@@ -734,7 +734,7 @@ static int handle_x86_pv_vcpu_blob(struct xc_sr_context *ctx,
                                    struct xc_sr_record *rec)
 {
     xc_interface *xch = ctx->xch;
-    struct xc_sr_rec_x86_pv_vcpu_hdr *vhdr = rec->data;
+    struct mr_x86_pv_vcpu_hdr *vhdr = rec->data;
     struct xc_sr_x86_pv_restore_vcpu *vcpu;
     const char *rec_name;
     size_t blobsz;
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index fa736a311f..41af26909e 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -87,7 +87,7 @@ static int write_batch(struct xc_sr_context *ctx)
     void *page, *orig_page;
     uint64_t *rec_pfns = NULL;
     struct iovec *iov = NULL; int iovcnt = 0;
-    struct xc_sr_rec_page_data_header hdr = { 0 };
+    struct mr_page_data_header hdr = { 0 };
     struct xc_sr_record rec = {
         .type = REC_TYPE_PAGE_DATA,
     };
diff --git a/tools/libxc/xc_sr_save_x86_hvm.c b/tools/libxc/xc_sr_save_x86_hvm.c
index d99efe65e5..c4dc42479f 100644
--- a/tools/libxc/xc_sr_save_x86_hvm.c
+++ b/tools/libxc/xc_sr_save_x86_hvm.c
@@ -80,8 +80,8 @@ static int write_hvm_params(struct xc_sr_context *ctx)
     };
 
     xc_interface *xch = ctx->xch;
-    struct xc_sr_rec_hvm_params_entry entries[ARRAY_SIZE(params)];
-    struct xc_sr_rec_hvm_params hdr = {
+    struct mr_hvm_params_entry entries[ARRAY_SIZE(params)];
+    struct mr_hvm_params hdr = {
         .count = 0,
     };
     struct xc_sr_record rec = {
diff --git a/tools/libxc/xc_sr_save_x86_pv.c b/tools/libxc/xc_sr_save_x86_pv.c
index f3ccf5bb4b..916c5aad41 100644
--- a/tools/libxc/xc_sr_save_x86_pv.c
+++ b/tools/libxc/xc_sr_save_x86_pv.c
@@ -485,7 +485,7 @@ static int write_one_vcpu_basic(struct xc_sr_context *ctx, uint32_t id)
     unsigned int i, gdt_count;
     int rc = -1;
     vcpu_guest_context_any_t vcpu;
-    struct xc_sr_rec_x86_pv_vcpu_hdr vhdr = {
+    struct mr_x86_pv_vcpu_hdr vhdr = {
         .vcpu_id = id,
     };
     struct xc_sr_record rec = {
@@ -583,7 +583,7 @@ static int write_one_vcpu_basic(struct xc_sr_context *ctx, uint32_t id)
 static int write_one_vcpu_extended(struct xc_sr_context *ctx, uint32_t id)
 {
     xc_interface *xch = ctx->xch;
-    struct xc_sr_rec_x86_pv_vcpu_hdr vhdr = {
+    struct mr_x86_pv_vcpu_hdr vhdr = {
         .vcpu_id = id,
     };
     struct xc_sr_record rec = {
@@ -620,7 +620,7 @@ static int write_one_vcpu_xsave(struct xc_sr_context *ctx, uint32_t id)
     xc_interface *xch = ctx->xch;
     int rc = -1;
     DECLARE_HYPERCALL_BUFFER(void, buffer);
-    struct xc_sr_rec_x86_pv_vcpu_hdr vhdr = {
+    struct mr_x86_pv_vcpu_hdr vhdr = {
         .vcpu_id = id,
     };
     struct xc_sr_record rec = {
@@ -686,7 +686,7 @@ static int write_one_vcpu_msrs(struct xc_sr_context *ctx, uint32_t id)
     int rc = -1;
     size_t buffersz;
     DECLARE_HYPERCALL_BUFFER(void, buffer);
-    struct xc_sr_rec_x86_pv_vcpu_hdr vhdr = {
+    struct mr_x86_pv_vcpu_hdr vhdr = {
         .vcpu_id = id,
     };
     struct xc_sr_record rec = {
@@ -793,7 +793,7 @@ static int write_all_vcpu_information(struct xc_sr_context *ctx)
  */
 static int write_x86_pv_info(struct xc_sr_context *ctx)
 {
-    struct xc_sr_rec_x86_pv_info info = {
+    struct mr_x86_pv_info info = {
         .guest_width = ctx->x86.pv.width,
         .pt_levels = ctx->x86.pv.levels,
     };
@@ -816,7 +816,7 @@ static int write_x86_pv_p2m_frames(struct xc_sr_context *ctx)
     int rc; unsigned int i;
     size_t datasz = ctx->x86.pv.p2m_frames * sizeof(uint64_t);
     uint64_t *data = NULL;
-    struct xc_sr_rec_x86_pv_p2m_frames hdr = {
+    struct mr_x86_pv_p2m_frames hdr = {
         .end_pfn = ctx->x86.pv.max_pfn,
     };
     struct xc_sr_record rec = {
diff --git a/tools/libxc/xc_sr_stream_format.h b/tools/libxc/xc_sr_stream_format.h
index 37a7da6eab..0700cde54f 100644
--- a/tools/libxc/xc_sr_stream_format.h
+++ b/tools/libxc/xc_sr_stream_format.h
@@ -1,6 +1,8 @@
 #ifndef __STREAM_FORMAT__H
 #define __STREAM_FORMAT__H
 
+#include <xen/migration_stream.h>
+
 /*
  * C structures for the Migration v2 stream format.
  * See docs/specs/libxc-migration-stream.pandoc
@@ -41,101 +43,6 @@ struct xc_sr_dhdr
     uint32_t xen_minor;
 };
 
-#define DHDR_TYPE_X86_PV  0x00000001U
-#define DHDR_TYPE_X86_HVM 0x00000002U
-
-/*
- * Record Header
- */
-struct xc_sr_rhdr
-{
-    uint32_t type;
-    uint32_t length;
-};
-
-/* All records must be aligned up to an 8 octet boundary */
-#define REC_ALIGN_ORDER               (3U)
-/* Somewhat arbitrary - 128MB */
-#define REC_LENGTH_MAX                (128U << 20)
-
-#define REC_TYPE_END                        0x00000000U
-#define REC_TYPE_PAGE_DATA                  0x00000001U
-#define REC_TYPE_X86_PV_INFO                0x00000002U
-#define REC_TYPE_X86_PV_P2M_FRAMES          0x00000003U
-#define REC_TYPE_X86_PV_VCPU_BASIC          0x00000004U
-#define REC_TYPE_X86_PV_VCPU_EXTENDED       0x00000005U
-#define REC_TYPE_X86_PV_VCPU_XSAVE          0x00000006U
-#define REC_TYPE_SHARED_INFO                0x00000007U
-#define REC_TYPE_X86_TSC_INFO               0x00000008U
-#define REC_TYPE_HVM_CONTEXT                0x00000009U
-#define REC_TYPE_HVM_PARAMS                 0x0000000aU
-#define REC_TYPE_TOOLSTACK                  0x0000000bU
-#define REC_TYPE_X86_PV_VCPU_MSRS           0x0000000cU
-#define REC_TYPE_VERIFY                     0x0000000dU
-#define REC_TYPE_CHECKPOINT                 0x0000000eU
-#define REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST  0x0000000fU
-
-#define REC_TYPE_OPTIONAL             0x80000000U
-
-/* PAGE_DATA */
-struct xc_sr_rec_page_data_header
-{
-    uint32_t count;
-    uint32_t _res1;
-    uint64_t pfn[0];
-};
-
-#define PAGE_DATA_PFN_MASK  0x000fffffffffffffULL
-#define PAGE_DATA_TYPE_MASK 0xf000000000000000ULL
-
-/* X86_PV_INFO */
-struct xc_sr_rec_x86_pv_info
-{
-    uint8_t guest_width;
-    uint8_t pt_levels;
-    uint8_t _res[6];
-};
-
-/* X86_PV_P2M_FRAMES */
-struct xc_sr_rec_x86_pv_p2m_frames
-{
-    uint32_t start_pfn;
-    uint32_t end_pfn;
-    uint64_t p2m_pfns[0];
-};
-
-/* X86_PV_VCPU_{BASIC,EXTENDED,XSAVE,MSRS} */
-struct xc_sr_rec_x86_pv_vcpu_hdr
-{
-    uint32_t vcpu_id;
-    uint32_t _res1;
-    uint8_t context[0];
-};
-
-/* X86_TSC_INFO */
-struct xc_sr_rec_x86_tsc_info
-{
-    uint32_t mode;
-    uint32_t khz;
-    uint64_t nsec;
-    uint32_t incarnation;
-    uint32_t _res1;
-};
-
-/* HVM_PARAMS */
-struct xc_sr_rec_hvm_params_entry
-{
-    uint64_t index;
-    uint64_t value;
-};
-
-struct xc_sr_rec_hvm_params
-{
-    uint32_t count;
-    uint32_t _res1;
-    struct xc_sr_rec_hvm_params_entry param[0];
-};
-
 #endif
 /*
  * Local variables:
diff --git a/xen/include/Makefile b/xen/include/Makefile
index c3e0283d34..9161716e8f 100644
--- a/xen/include/Makefile
+++ b/xen/include/Makefile
@@ -101,7 +101,7 @@ all: headers.chk headers99.chk headers++.chk
 PUBLIC_HEADERS := $(filter-out public/arch-% public/dom0_ops.h, $(wildcard public/*.h public/*/*.h) $(public-y))
 
 PUBLIC_C99_HEADERS := public/io/9pfs.h public/io/pvcalls.h
-PUBLIC_ANSI_HEADERS := $(filter-out public/%ctl.h public/xsm/% public/%hvm/save.h $(PUBLIC_C99_HEADERS), $(PUBLIC_HEADERS))
+PUBLIC_ANSI_HEADERS := $(filter-out public/%ctl.h public/migration_stream.h public/xsm/% public/%hvm/save.h $(PUBLIC_C99_HEADERS), $(PUBLIC_HEADERS))
 
 public/io/9pfs.h-prereq := string
 public/io/pvcalls.h-prereq := string
diff --git a/xen/include/public/migration_stream.h b/xen/include/public/migration_stream.h
new file mode 100644
index 0000000000..92dd119f9f
--- /dev/null
+++ b/xen/include/public/migration_stream.h
@@ -0,0 +1,126 @@
+#ifndef __XEN_MIGRATION_STREAM_H__
+#define __XEN_MIGRATION_STREAM_H__
+
+#if !defined(__XEN__) && !defined(__XEN_TOOLS__)
+#error "Migration stream definitions are intended for use by node control tools only"
+#endif
+
+/*
+ * C structures for the Migration and Live Update stream formats.
+ * See docs/specs/libxc-migration-stream.pandoc
+ * and docs/specs/live-update-handover.pandoc
+ */
+
+#include "xen.h"
+
+/*
+ * Domain types are used in the libxc stream domain header as well
+ * as in the live update REC_TYPE_DOMAIN_INFO record.
+ */
+#define DHDR_TYPE_X86_PV  0x00000001U
+#define DHDR_TYPE_X86_HVM 0x00000002U
+
+/*
+ * Record Header
+ */
+struct mr_rhdr
+{
+    uint32_t type;
+    uint32_t length;
+};
+
+/* All records must be aligned up to an 8 octet boundary */
+#define REC_ALIGN_ORDER               (3U)
+/* Somewhat arbitrary - 128MB */
+#define REC_LENGTH_MAX                (128U << 20)
+
+#define REC_TYPE_END                        0x00000000U
+#define REC_TYPE_PAGE_DATA                  0x00000001U
+#define REC_TYPE_X86_PV_INFO                0x00000002U
+#define REC_TYPE_X86_PV_P2M_FRAMES          0x00000003U
+#define REC_TYPE_X86_PV_VCPU_BASIC          0x00000004U
+#define REC_TYPE_X86_PV_VCPU_EXTENDED       0x00000005U
+#define REC_TYPE_X86_PV_VCPU_XSAVE          0x00000006U
+#define REC_TYPE_SHARED_INFO                0x00000007U
+#define REC_TYPE_X86_TSC_INFO               0x00000008U
+#define REC_TYPE_HVM_CONTEXT                0x00000009U
+#define REC_TYPE_HVM_PARAMS                 0x0000000aU
+#define REC_TYPE_TOOLSTACK                  0x0000000bU
+#define REC_TYPE_X86_PV_VCPU_MSRS           0x0000000cU
+#define REC_TYPE_VERIFY                     0x0000000dU
+#define REC_TYPE_CHECKPOINT                 0x0000000eU
+#define REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST  0x0000000fU
+
+#define REC_TYPE_OPTIONAL             0x80000000U
+#define REC_TYPE_LIVE_UPDATE          0x40000000U
+
+/* PAGE_DATA */
+struct mr_page_data_header
+{
+    uint32_t count;
+    uint32_t _res1;
+    uint64_t pfn[0];
+};
+
+#define PAGE_DATA_PFN_MASK  0x000fffffffffffffULL
+#define PAGE_DATA_TYPE_MASK 0xf000000000000000ULL
+
+/* X86_PV_INFO */
+struct mr_x86_pv_info
+{
+    uint8_t guest_width;
+    uint8_t pt_levels;
+    uint8_t _res[6];
+};
+
+/* X86_PV_P2M_FRAMES */
+struct mr_x86_pv_p2m_frames
+{
+    uint32_t start_pfn;
+    uint32_t end_pfn;
+    uint64_t p2m_pfns[0];
+};
+
+/* X86_PV_VCPU_{BASIC,EXTENDED,XSAVE,MSRS} */
+struct mr_x86_pv_vcpu_hdr
+{
+    uint32_t vcpu_id;
+    uint32_t _res1;
+    uint8_t context[0];
+};
+
+/* X86_TSC_INFO */
+struct mr_x86_tsc_info
+{
+    uint32_t mode;
+    uint32_t khz;
+    uint64_t nsec;
+    uint32_t incarnation;
+    uint32_t _res1;
+};
+
+/* HVM_PARAMS */
+struct mr_hvm_params_entry
+{
+    uint64_t index;
+    uint64_t value;
+};
+
+struct mr_hvm_params
+{
+    uint32_t count;
+    uint32_t _res1;
+    struct mr_hvm_params_entry param[0];
+};
+
+#endif /* __XEN_MIGRATION_STREAM_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.21.0
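
Since these structures now describe a wire format shared between the
toolstack and the hypervisor, their layout matters more than their names.
As a rough illustration only (not part of the patch), the fixed offsets
could be pinned down with compile-time checks. The structure definitions
below are copied from the new header, except that the trailing pfn[0] is
spelled as a C99 flexible array member; the assertions themselves are an
assumption about what one might want to verify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies of three structures from migration_stream.h, for illustration. */
struct mr_rhdr {
    uint32_t type;
    uint32_t length;
};

struct mr_page_data_header {
    uint32_t count;
    uint32_t _res1;
    uint64_t pfn[];   /* the header spells this pfn[0] */
};

struct mr_x86_tsc_info {
    uint32_t mode;
    uint32_t khz;
    uint64_t nsec;
    uint32_t incarnation;
    uint32_t _res1;
};

/* The record header is two 32-bit words, i.e. one 8-octet unit. */
_Static_assert(sizeof(struct mr_rhdr) == 8,
               "record header must be 8 octets");

/* The PFN array starts right after count and the reserved word. */
_Static_assert(offsetof(struct mr_page_data_header, pfn) == 8,
               "pfn[] must start at offset 8");

/* The explicit _res1 fields keep 64-bit members naturally aligned. */
_Static_assert(offsetof(struct mr_x86_tsc_info, nsec) == 8,
               "nsec must start at offset 8");
_Static_assert(sizeof(struct mr_x86_tsc_info) == 24,
               "TSC info record must be 24 octets");
```

Checks like these fail at build time rather than at restore time if a
field is added without adjusting the reserved padding.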


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Xen-devel] [RFC PATCH v3 17/22] Add lu_stream_{open, close, append}_record()
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
                     ` (14 preceding siblings ...)
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 16/22] Migrate migration stream definitions into Xen public headers David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 18/22] Add LU_VERSION and LU_END records to live update stream David Woodhouse
                     ` (4 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC (permalink / raw)
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/common/lu/stream.c | 50 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/lu.h   |  5 +++++
 2 files changed, 55 insertions(+)

diff --git a/xen/common/lu/stream.c b/xen/common/lu/stream.c
index 8c44a4eb37..2ee870e80a 100644
--- a/xen/common/lu/stream.c
+++ b/xen/common/lu/stream.c
@@ -33,6 +33,7 @@
 #include <xen/types.h>
 #include <xen/vmap.h>
 #include <xen/lu.h>
+#include <public/migration_stream.h>
 
 static int lu_stream_extend(struct lu_stream *stream, int nr_pages)
 {
@@ -105,6 +106,55 @@ int lu_stream_append(struct lu_stream *stream, const void *data, size_t size)
     return 0;
 }
 
+int lu_stream_open_record(struct lu_stream *stream, unsigned int type)
+{
+    struct mr_rhdr *hdr;
+
+    stream->last_hdr = stream->len;
+
+    hdr = lu_stream_reserve(stream, sizeof(*hdr));
+    if (!hdr)
+        return -ENOMEM;
+
+    hdr->type = type;
+    hdr->length = 0;
+
+    lu_stream_end_reservation(stream, sizeof(*hdr));
+
+    return 0;
+}
+
+int lu_stream_close_record(struct lu_stream *stream)
+{
+    uint64_t zeroes = 0;
+    struct mr_rhdr *hdr;
+    int rc = 0;
+
+    hdr = (struct mr_rhdr *)(stream->data + stream->last_hdr);
+
+    hdr->length = stream->len - stream->last_hdr - sizeof(*hdr);
+
+    if (stream->len & 7)
+        rc = lu_stream_append(stream, &zeroes, 8 - (stream->len & 7));
+
+    return rc;
+}
+
+int lu_stream_append_record(struct lu_stream *stream, unsigned int type,
+                            void *rec, size_t len)
+{
+    int rc;
+
+
+    rc = lu_stream_open_record(stream, type);
+    if (!rc && len)
+        rc = lu_stream_append(stream, rec, len);
+    if (!rc)
+        rc = lu_stream_close_record(stream);
+
+    return rc;
+}
+
 void lu_stream_free(struct lu_stream *stream)
 {
     unsigned int order = get_order_from_bytes((stream->nr_pages + 1) * sizeof(mfn_t));
diff --git a/xen/include/xen/lu.h b/xen/include/xen/lu.h
index 21abace130..c02268e414 100644
--- a/xen/include/xen/lu.h
+++ b/xen/include/xen/lu.h
@@ -9,6 +9,7 @@
 
 struct lu_stream {
     mfn_t *pagelist;
+    size_t last_hdr;
     size_t len;
     int nr_pages;
     char *data;
@@ -17,6 +18,10 @@ struct lu_stream {
 void *lu_stream_reserve(struct lu_stream *stream, size_t size);
 void lu_stream_end_reservation(struct lu_stream *stream, size_t size);
 int lu_stream_append(struct lu_stream *stream, const void *data, size_t size);
+int lu_stream_open_record(struct lu_stream *stream, unsigned int type);
+int lu_stream_close_record(struct lu_stream *stream);
+int lu_stream_append_record(struct lu_stream *stream, unsigned int type,
+                            void *rec, size_t len);
 void lu_stream_free(struct lu_stream *stream);
 
 struct kexec_image;
-- 
2.21.0
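
The open/append/close sequence in this patch amounts to: remember where
the header starts, write a header with length 0, append the payload,
then back-patch the length and pad with zeroes to an 8-octet boundary.
A minimal userspace sketch of that framing follows. It assumes a
fixed-size buffer in place of the page-backed struct lu_stream, and the
buf_* names are hypothetical stand-ins rather than anything in the Xen
tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REC_ALIGN 8u   /* 1 << REC_ALIGN_ORDER */

struct rec_hdr {
    uint32_t type;
    uint32_t length;
};

/* Hypothetical fixed-capacity stand-in for struct lu_stream. */
struct buf_stream {
    unsigned char data[256];
    size_t len;        /* bytes written so far */
    size_t last_hdr;   /* offset of the most recently opened header */
};

static int buf_append(struct buf_stream *s, const void *p, size_t n)
{
    if (n > sizeof(s->data) - s->len)
        return -1;     /* out of space */
    if (n)
        memcpy(s->data + s->len, p, n);
    s->len += n;
    return 0;
}

static int buf_open_record(struct buf_stream *s, uint32_t type)
{
    struct rec_hdr hdr = { .type = type, .length = 0 };

    s->last_hdr = s->len;
    return buf_append(s, &hdr, sizeof(hdr));
}

static int buf_close_record(struct buf_stream *s)
{
    static const unsigned char zeroes[REC_ALIGN];
    struct rec_hdr hdr;
    size_t pad;

    /* Back-patch the payload length now that it is known. */
    memcpy(&hdr, s->data + s->last_hdr, sizeof(hdr));
    hdr.length = (uint32_t)(s->len - s->last_hdr - sizeof(hdr));
    memcpy(s->data + s->last_hdr, &hdr, sizeof(hdr));

    /* Pad so that the next header starts on an 8-octet boundary. */
    pad = (REC_ALIGN - (s->len % REC_ALIGN)) % REC_ALIGN;
    return buf_append(s, zeroes, pad);
}

static int buf_append_record(struct buf_stream *s, uint32_t type,
                             const void *rec, size_t n)
{
    int rc = buf_open_record(s, type);

    if (!rc && n)
        rc = buf_append(s, rec, n);
    if (!rc)
        rc = buf_close_record(s);

    return rc;         /* propagate the first failure */
}
```

Note that the length field records the unpadded payload size, so a
5-octet payload occupies 16 octets in the stream (8 of header, 5 of
data, 3 of padding) while its header still says 5.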


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Xen-devel] [RFC PATCH v3 18/22] Add LU_VERSION and LU_END records to live update stream
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
                     ` (15 preceding siblings ...)
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 17/22] Add lu_stream_{open, close, append}_record() David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 19/22] Add shell of lu_reserve_pages() David Woodhouse
                     ` (3 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC (permalink / raw)
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/common/lu/save.c                  | 13 ++++++++++++-
 xen/include/public/migration_stream.h |  9 +++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/xen/common/lu/save.c b/xen/common/lu/save.c
index c43962c44e..84852da35e 100644
--- a/xen/common/lu/save.c
+++ b/xen/common/lu/save.c
@@ -4,10 +4,17 @@
 #include <xen/lu.h>
 #include <xen/kimage.h>
 #include <xen/sched.h>
+#include <xen/version.h>
+#include <public/migration_stream.h>
 
 int lu_save_global(struct lu_stream *stream)
 {
-    return 0;
+    struct mr_lu_version ver_rec;
+
+    ver_rec.xen_major = xen_major_version();
+    ver_rec.xen_minor = xen_minor_version();
+    return lu_stream_append_record(stream, REC_TYPE_LU_VERSION,
+                                   &ver_rec, sizeof(ver_rec));
 }
 
 
@@ -34,6 +41,10 @@ int lu_save_all(struct kexec_image *image)
         ret = lu_save_domain(&stream, d);
     }
 
+    if (!ret)
+        ret = lu_stream_append_record(&stream, REC_TYPE_END,
+                                      NULL, 0);
+
     if (!ret)
         ret = kimage_add_live_update_data(image,
                           _mfn(virt_to_mfn(stream.pagelist)),
diff --git a/xen/include/public/migration_stream.h b/xen/include/public/migration_stream.h
index 92dd119f9f..29ed8cc2b5 100644
--- a/xen/include/public/migration_stream.h
+++ b/xen/include/public/migration_stream.h
@@ -51,6 +51,8 @@ struct mr_rhdr
 #define REC_TYPE_CHECKPOINT                 0x0000000eU
 #define REC_TYPE_CHECKPOINT_DIRTY_PFN_LIST  0x0000000fU
 
+#define REC_TYPE_LU_VERSION                 0x40000000U
+
 #define REC_TYPE_OPTIONAL             0x80000000U
 #define REC_TYPE_LIVE_UPDATE          0x40000000U
 
@@ -113,6 +115,13 @@ struct mr_hvm_params
     struct mr_hvm_params_entry param[0];
 };
 
+/* LU_VERSION */
+struct mr_lu_version
+{
+    uint32_t xen_major;
+    uint32_t xen_minor;
+};
+
 #endif /* __XEN_MIGRATION_STREAM_H__ */
 
 /*
-- 
2.21.0
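
One thing worth spelling out is that REC_TYPE_LU_VERSION has the same
value as REC_TYPE_LIVE_UPDATE. The sketch below reads that as
"live-update flag bit set, index 0 within the live-update space", by
analogy with how the libxc stream format uses the top bit to mark
optional records. That reading is an assumption on my part, and the
helper names are hypothetical; only the four constants come from the
header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants as defined in migration_stream.h after this patch. */
#define REC_TYPE_END          0x00000000U
#define REC_TYPE_LU_VERSION   0x40000000U
#define REC_TYPE_OPTIONAL     0x80000000U
#define REC_TYPE_LIVE_UPDATE  0x40000000U

/* Hypothetical helper: may a receiver skip this record if unknown? */
static bool rec_is_optional(uint32_t type)
{
    return (type & REC_TYPE_OPTIONAL) != 0;
}

/* Hypothetical helper: does the type live in the live-update space? */
static bool rec_is_live_update(uint32_t type)
{
    return (type & REC_TYPE_LIVE_UPDATE) != 0;
}

/* Hypothetical helper: the type with both flag bits masked off. */
static uint32_t rec_index(uint32_t type)
{
    return type & ~(REC_TYPE_OPTIONAL | REC_TYPE_LIVE_UPDATE);
}
```

Under that reading LU_VERSION is a mandatory live-update record with
index 0, and REC_TYPE_END stays in the shared space so the same
terminator works for both kinds of stream.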


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Xen-devel] [RFC PATCH v3 19/22] Add shell of lu_reserve_pages()
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
                     ` (16 preceding siblings ...)
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 18/22] Add LU_VERSION and LU_END records to live update stream David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 20/22] x86/setup: lift dom0 creation out into create_dom0 function David Woodhouse
                     ` (2 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC (permalink / raw)
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

This currently only iterates over the records and prints the version of
Xen that we're live updating from.

In the fullness of time, it will also reserve the pages passed over as
M2P as well as the pages belonging to preserved domains.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/arch/x86/setup.c    |  2 ++
 xen/common/lu/Makefile  |  2 +-
 xen/common/lu/restore.c | 34 ++++++++++++++++++++++++++++++++++
 xen/include/xen/lu.h    | 18 ++++++++++++++++++
 4 files changed, 55 insertions(+), 1 deletion(-)
 create mode 100644 xen/common/lu/restore.c

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index eea670e03b..f789713b1b 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1603,6 +1603,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     if ( lu_breadcrumb_phys )
     {
         lu_stream_map(&lu_stream, lu_mfnlist_phys, lu_nr_pages);
+
+        lu_reserve_pages(&lu_stream);
     }
 
     if ( lu_bootmem_start )
diff --git a/xen/common/lu/Makefile b/xen/common/lu/Makefile
index 7b7d975f65..592c72e1ec 100644
--- a/xen/common/lu/Makefile
+++ b/xen/common/lu/Makefile
@@ -1 +1 @@
-obj-y += stream.o save.o
+obj-y += stream.o save.o restore.o
diff --git a/xen/common/lu/restore.c b/xen/common/lu/restore.c
new file mode 100644
index 0000000000..f52bb660d2
--- /dev/null
+++ b/xen/common/lu/restore.c
@@ -0,0 +1,34 @@
+#include <xen/types.h>
+#include <xen/vmap.h>
+#include <xen/lu.h>
+#include <xen/sched.h>
+#include <xen/lu.h>
+
+#include <public/migration_stream.h>
+
+void lu_reserve_pages(struct lu_stream *stream)
+{
+    struct mr_rhdr *hdr;
+
+    while ( (hdr = lu_next_record(stream)) && hdr->type != REC_TYPE_END )
+    {
+        if ( hdr->type == REC_TYPE_LU_VERSION &&
+             hdr->length == sizeof(struct mr_lu_version) )
+        {
+            struct mr_lu_version *vers = LU_REC_DATA(hdr);
+
+            printk("Live update from Xen %u.%u\n",
+                   vers->xen_major, vers->xen_minor);
+        }
+    }
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/lu.h b/xen/include/xen/lu.h
index c02268e414..588f2dd137 100644
--- a/xen/include/xen/lu.h
+++ b/xen/include/xen/lu.h
@@ -5,6 +5,8 @@
 #include <xen/types.h>
 #include <xen/mm.h>
 
+#include <public/migration_stream.h>
+
 #define LIVE_UPDATE_MAGIC        (0x4c69766555706461UL & PAGE_MASK)
 
 struct lu_stream {
@@ -28,6 +30,22 @@ struct kexec_image;
 int lu_save_all(struct kexec_image *image);
 
 void lu_stream_map(struct lu_stream *stream, unsigned long mfns_phys, int nr_pages);
+void lu_reserve_pages(struct lu_stream *stream);
+
+/* Pointer to the data immediately following a record header */
+#define LU_REC_DATA(hdr) ((void *)&(hdr)[1])
+
+static inline struct mr_rhdr *lu_next_record(struct lu_stream *stream)
+{
+    struct mr_rhdr *hdr = (struct mr_rhdr *)(stream->data + stream->last_hdr);
+
+    if (stream->len < stream->last_hdr + sizeof(*hdr) ||
+        stream->len < stream->last_hdr + sizeof(*hdr) + hdr->length)
+        return NULL;
+
+    stream->last_hdr += sizeof(*hdr) + ROUNDUP(hdr->length, 1<<REC_ALIGN_ORDER);
+    return hdr;
+}
 
 #endif /* __XEN_LU_H__ */
 
-- 
2.21.0
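
The receive side walks the stream one header at a time: check that a
full header fits, check that the declared payload fits, then step over
the payload rounded up to 8 octets. A userspace sketch of such a walk
is below. The rec_cursor type and rec_next() are hypothetical names,
and the subtraction-based bounds checks are my own choice to sidestep
overflow in the additions; the patch itself compares sums.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct rec_hdr {
    uint32_t type;
    uint32_t length;
};

/* Hypothetical read cursor over a flat, already-mapped stream. */
struct rec_cursor {
    const unsigned char *data;
    size_t len;   /* total bytes in the stream */
    size_t pos;   /* offset of the next unread header */
};

/*
 * Copy the next header into *hdr and point *payload at its data.
 * Returns 1 on success, 0 if no complete record remains.
 */
static int rec_next(struct rec_cursor *c, struct rec_hdr *hdr,
                    const unsigned char **payload)
{
    size_t remaining;
    uint64_t padded;

    if (c->pos > c->len || c->len - c->pos < sizeof(*hdr))
        return 0;                       /* header would run off the end */

    memcpy(hdr, c->data + c->pos, sizeof(*hdr));
    remaining = c->len - c->pos - sizeof(*hdr);

    if (hdr->length > remaining)
        return 0;                       /* payload would run off the end */

    *payload = c->data + c->pos + sizeof(*hdr);

    /* Step over the payload plus padding up to the 8-octet boundary. */
    padded = ((uint64_t)hdr->length + 7) & ~UINT64_C(7);
    if (padded > remaining)
        padded = remaining;             /* tolerate a missing final pad */

    c->pos += sizeof(*hdr) + (size_t)padded;
    return 1;
}
```

A caller would loop while rec_next() returns 1 and the type is not
REC_TYPE_END, dispatching on hdr.type much as lu_reserve_pages() does.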


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Xen-devel] [RFC PATCH v3 20/22] x86/setup: lift dom0 creation out into create_dom0 function
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
                     ` (17 preceding siblings ...)
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 19/22] Add shell of lu_reserve_pages() David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 21/22] x86/setup: finish plumbing in live update path through __start_xen() David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 22/22] x86/setup: simplify handling of initrdidx when no initrd present David Woodhouse
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC (permalink / raw)
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

It's about to become optional as __start_xen() grows a different path
for live update, so move it out of the way.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/arch/x86/setup.c | 173 +++++++++++++++++++++++--------------------
 1 file changed, 94 insertions(+), 79 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index f789713b1b..ac93965be4 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -679,6 +679,92 @@ static unsigned int __init copy_bios_e820(struct e820entry *map, unsigned int li
     return n;
 }
 
+static struct domain * __init create_dom0(const module_t *image,
+                                          unsigned long headroom,
+                                          module_t *initrd, char *kextra,
+                                          char *loader)
+{
+    struct xen_domctl_createdomain dom0_cfg = {
+        .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
+        .max_evtchn_port = -1,
+        .max_grant_frames = -1,
+        .max_maptrack_frames = -1,
+    };
+    struct domain *d;
+    char *cmdline;
+
+    if ( opt_dom0_pvh )
+    {
+        dom0_cfg.flags |= (XEN_DOMCTL_CDF_hvm |
+                           ((hvm_hap_supported() && !opt_dom0_shadow) ?
+                            XEN_DOMCTL_CDF_hap : 0));
+
+        dom0_cfg.arch.emulation_flags |=
+            XEN_X86_EMU_LAPIC | XEN_X86_EMU_IOAPIC | XEN_X86_EMU_VPCI;
+    }
+    dom0_cfg.max_vcpus = dom0_max_vcpus();
+
+    if ( iommu_enabled )
+        dom0_cfg.flags |= XEN_DOMCTL_CDF_iommu;
+
+    /* Create initial domain 0. */
+    d = domain_create(get_initial_domain_id(), &dom0_cfg, !pv_shim);
+    if ( IS_ERR(d) || (alloc_dom0_vcpu0(d) == NULL) )
+        panic("Error creating domain 0\n");
+
+    /* Grab the DOM0 command line. */
+    cmdline = (char *)(image->string ? __va(image->string) : NULL);
+    if ( (cmdline != NULL) || (kextra != NULL) )
+    {
+        static char __initdata dom0_cmdline[MAX_GUEST_CMDLINE];
+
+        cmdline = cmdline_cook(cmdline, loader);
+        safe_strcpy(dom0_cmdline, cmdline);
+
+        if ( kextra != NULL )
+            /* kextra always includes exactly one leading space. */
+            safe_strcat(dom0_cmdline, kextra);
+
+        /* Append any extra parameters. */
+        if ( skip_ioapic_setup && !strstr(dom0_cmdline, "noapic") )
+            safe_strcat(dom0_cmdline, " noapic");
+        if ( (strlen(acpi_param) == 0) && acpi_disabled )
+        {
+            printk("ACPI is disabled, notifying Domain 0 (acpi=off)\n");
+            safe_strcpy(acpi_param, "off");
+        }
+        if ( (strlen(acpi_param) != 0) && !strstr(dom0_cmdline, "acpi=") )
+        {
+            safe_strcat(dom0_cmdline, " acpi=");
+            safe_strcat(dom0_cmdline, acpi_param);
+        }
+
+        cmdline = dom0_cmdline;
+    }
+
+    /*
+     * Temporarily clear SMAP in CR4 to allow user-accesses in construct_dom0().
+     * This saves a large number of corner cases interactions with
+     * copy_from_user().
+     */
+    if ( cpu_has_smap )
+    {
+        cr4_pv32_mask &= ~X86_CR4_SMAP;
+        write_cr4(read_cr4() & ~X86_CR4_SMAP);
+    }
+
+    if ( construct_dom0(d, image, headroom, initrd, cmdline) != 0 )
+        panic("Could not construct domain 0\n");
+
+    if ( cpu_has_smap )
+    {
+        write_cr4(read_cr4() | X86_CR4_SMAP);
+        cr4_pv32_mask |= X86_CR4_SMAP;
+    }
+
+    return d;
+}
+
 /* How much of the directmap is prebuilt at compile time. */
 #define PREBUILT_MAP_LIMIT (1 << L2_PAGETABLE_SHIFT)
 
@@ -739,12 +825,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         .parity    = 'n',
         .stop_bits = 1
     };
-    struct xen_domctl_createdomain dom0_cfg = {
-        .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
-        .max_evtchn_port = -1,
-        .max_grant_frames = -1,
-        .max_maptrack_frames = -1,
-    };
     const char *hypervisor_name;
     uint64_t lu_mfnlist_phys = 0, lu_nr_pages = 0;
     struct lu_stream lu_stream;
@@ -1889,94 +1969,29 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     init_guest_cpuid();
     init_guest_msr_policy();
 
-    if ( opt_dom0_pvh )
-    {
-        dom0_cfg.flags |= (XEN_DOMCTL_CDF_hvm |
-                           ((hvm_hap_supported() && !opt_dom0_shadow) ?
-                            XEN_DOMCTL_CDF_hap : 0));
-
-        dom0_cfg.arch.emulation_flags |=
-            XEN_X86_EMU_LAPIC | XEN_X86_EMU_IOAPIC | XEN_X86_EMU_VPCI;
-    }
-    dom0_cfg.max_vcpus = dom0_max_vcpus();
-
-    if ( iommu_enabled )
-        dom0_cfg.flags |= XEN_DOMCTL_CDF_iommu;
-
-    /* Create initial domain 0. */
-    dom0 = domain_create(get_initial_domain_id(), &dom0_cfg, !pv_shim);
-    if ( IS_ERR(dom0) || (alloc_dom0_vcpu0(dom0) == NULL) )
-        panic("Error creating domain 0\n");
-
-    /* Grab the DOM0 command line. */
-    cmdline = (char *)(mod[0].string ? __va(mod[0].string) : NULL);
-    if ( (cmdline != NULL) || (kextra != NULL) )
-    {
-        static char __initdata dom0_cmdline[MAX_GUEST_CMDLINE];
-
-        cmdline = cmdline_cook(cmdline, loader);
-        safe_strcpy(dom0_cmdline, cmdline);
-
-        if ( kextra != NULL )
-            /* kextra always includes exactly one leading space. */
-            safe_strcat(dom0_cmdline, kextra);
-
-        /* Append any extra parameters. */
-        if ( skip_ioapic_setup && !strstr(dom0_cmdline, "noapic") )
-            safe_strcat(dom0_cmdline, " noapic");
-        if ( (strlen(acpi_param) == 0) && acpi_disabled )
-        {
-            printk("ACPI is disabled, notifying Domain 0 (acpi=off)\n");
-            safe_strcpy(acpi_param, "off");
-        }
-        if ( (strlen(acpi_param) != 0) && !strstr(dom0_cmdline, "acpi=") )
-        {
-            safe_strcat(dom0_cmdline, " acpi=");
-            safe_strcat(dom0_cmdline, acpi_param);
-        }
-
-        cmdline = dom0_cmdline;
-    }
-
     if ( xen_cpuidle )
         xen_processor_pmbits |= XEN_PROCESSOR_PM_CX;
 
+    printk("%sNX (Execute Disable) protection %sactive\n",
+           cpu_has_nx ? XENLOG_INFO : XENLOG_WARNING "Warning: ",
+           cpu_has_nx ? "" : "not ");
+
     initrdidx = find_first_bit(module_map, mbi->mods_count);
     if ( bitmap_weight(module_map, mbi->mods_count) > 1 )
         printk(XENLOG_WARNING
                "Multiple initrd candidates, picking module #%u\n",
                initrdidx);
 
-    /*
-     * Temporarily clear SMAP in CR4 to allow user-accesses in construct_dom0().
-     * This saves a large number of corner cases interactions with
-     * copy_from_user().
-     */
-    if ( cpu_has_smap )
-    {
-        cr4_pv32_mask &= ~X86_CR4_SMAP;
-        write_cr4(read_cr4() & ~X86_CR4_SMAP);
-    }
-
-    printk("%sNX (Execute Disable) protection %sactive\n",
-           cpu_has_nx ? XENLOG_INFO : XENLOG_WARNING "Warning: ",
-           cpu_has_nx ? "" : "not ");
-
     /*
      * We're going to setup domain0 using the module(s) that we stashed safely
      * above our heap. The second module, if present, is an initrd ramdisk.
      */
-    if ( construct_dom0(dom0, mod, modules_headroom,
-                        (initrdidx > 0) && (initrdidx < mbi->mods_count)
-                        ? mod + initrdidx : NULL, cmdline) != 0)
+    dom0 = create_dom0(mod, modules_headroom,
+                       (initrdidx > 0) && (initrdidx < mbi->mods_count)
+                       ? mod + initrdidx : NULL, kextra, loader);
+    if ( dom0 == NULL )
         panic("Could not set up DOM0 guest OS\n");
 
-    if ( cpu_has_smap )
-    {
-        write_cr4(read_cr4() | X86_CR4_SMAP);
-        cr4_pv32_mask |= X86_CR4_SMAP;
-    }
-
     heap_init_late();
 
     init_trace_bufs();
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Xen-devel] [RFC PATCH v3 21/22] x86/setup: finish plumbing in live update path through __start_xen()
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
                     ` (18 preceding siblings ...)
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 20/22] x86/setup: lift dom0 creation out into create_dom0 function David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 22/22] x86/setup: simplify handling of initrdidx when no initrd present David Woodhouse
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC (permalink / raw)
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

With this we are fairly much done hacking up __start_xen() to support
live update. The live update functions themselves are still stubs,
but now we can start populating those with actual save/restore of
domain information.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/arch/x86/setup.c    | 52 +++++++++++++++++++++++++++--------------
 xen/common/lu/restore.c |  5 ++++
 xen/include/xen/lu.h    |  2 ++
 3 files changed, 41 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index ac93965be4..53f7b9ced4 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -816,7 +816,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     unsigned int initrdidx, num_parked = 0;
     multiboot_info_t *mbi;
     module_t *mod;
-    unsigned long nr_pages, raw_max_page, modules_headroom, module_map[1];
+    unsigned long nr_pages, raw_max_page, modules_headroom = 0, module_map[1];
     int i, j, e820_warn = 0, bytes = 0;
     bool acpi_boot_table_init_done = false, relocated = false, lu_reserved = false;
     int ret;
@@ -992,7 +992,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     }
 
     bitmap_fill(module_map, mbi->mods_count);
-    __clear_bit(0, module_map); /* Dom0 kernel is always first */
+    if ( !lu_breadcrumb_phys )
+        __clear_bit(0, module_map); /* Dom0 kernel is always first */
 
     if ( pvh_boot )
     {
@@ -1151,8 +1152,12 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         mod[mbi->mods_count].mod_end = __2M_rwdata_end - _stext;
     }
 
-    modules_headroom = bzimage_headroom(bootstrap_map(mod), mod->mod_end);
-    bootstrap_map(NULL);
+
+    if ( !lu_breadcrumb_phys )
+    {
+        modules_headroom = bzimage_headroom(bootstrap_map(mod), mod->mod_end);
+        bootstrap_map(NULL);
+    }
 
 #ifndef highmem_start
     /* Don't allow split below 4Gb. */
@@ -1976,21 +1981,32 @@ void __init noreturn __start_xen(unsigned long mbi_p)
            cpu_has_nx ? XENLOG_INFO : XENLOG_WARNING "Warning: ",
            cpu_has_nx ? "" : "not ");
 
-    initrdidx = find_first_bit(module_map, mbi->mods_count);
-    if ( bitmap_weight(module_map, mbi->mods_count) > 1 )
-        printk(XENLOG_WARNING
-               "Multiple initrd candidates, picking module #%u\n",
-               initrdidx);
 
-    /*
-     * We're going to setup domain0 using the module(s) that we stashed safely
-     * above our heap. The second module, if present, is an initrd ramdisk.
-     */
-    dom0 = create_dom0(mod, modules_headroom,
-                       (initrdidx > 0) && (initrdidx < mbi->mods_count)
-                       ? mod + initrdidx : NULL, kextra, loader);
-    if ( dom0 == NULL )
-        panic("Could not set up DOM0 guest OS\n");
+    if ( lu_breadcrumb_phys )
+    {
+        dom0 = lu_restore_domains(&lu_stream);
+        if ( dom0 == NULL )
+            panic("No DOM0 found in live update data\n");
+
+        lu_stream_free(&lu_stream);
+    }
+    else
+    {
+        initrdidx = find_first_bit(module_map, mbi->mods_count);
+        if ( bitmap_weight(module_map, mbi->mods_count) > 1 )
+            printk(XENLOG_WARNING
+                   "Multiple initrd candidates, picking module #%u\n",
+                   initrdidx);
+        /*
+         * We're going to setup domain0 using the module(s) that we stashed
+         * safely above our heap. The second module, if present, is an initrd.
+         */
+        dom0 = create_dom0(mod, modules_headroom,
+                           (initrdidx > 0) && (initrdidx < mbi->mods_count)
+                           ? mod + initrdidx : NULL, kextra, loader);
+        if ( dom0 == NULL )
+            panic("Could not set up DOM0 guest OS\n");
+    }
 
     heap_init_late();
 
diff --git a/xen/common/lu/restore.c b/xen/common/lu/restore.c
index f52bb660d2..163827f5de 100644
--- a/xen/common/lu/restore.c
+++ b/xen/common/lu/restore.c
@@ -23,6 +23,11 @@ void lu_reserve_pages(struct lu_stream *stream)
     }
 }
 
+struct domain *lu_restore_domains(struct lu_stream *stream)
+{
+    panic("Implement me!\n");
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/xen/lu.h b/xen/include/xen/lu.h
index 588f2dd137..817a88b77a 100644
--- a/xen/include/xen/lu.h
+++ b/xen/include/xen/lu.h
@@ -31,6 +31,8 @@ int lu_save_all(struct kexec_image *image);
 
 void lu_stream_map(struct lu_stream *stream, unsigned long mfns_phys, int nr_pages);
 void lu_reserve_pages(struct lu_stream *stream);
+/* Returns Dom0 in case the architecture needs to do anything special to it */
+struct domain *lu_restore_domains(struct lu_stream *stream);
 
 /* Pointer to the data immediately following a record header */
 #define LU_REC_DATA(hdr) ((void *)&(hdr)[1])
-- 
2.21.0
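
For readers following the control flow rather than the hunks, here is a
minimal standalone sketch of the three places where the patch above gates
on lu_breadcrumb_phys. It is a toy model, not Xen code: every toy_* name,
TOY_HEADROOM and the placeholder domain are assumptions made for
illustration, and unlike the real lu_restore_domains() stub (which still
panics) the toy restore function simply returns a domain.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HEADROOM 0x1000UL /* placeholder for bzimage_headroom()'s result */

struct toy_domain {
    int domain_id;
    bool restored;            /* true if it came from the live update stream */
};

struct toy_boot {
    uint64_t lu_breadcrumb_phys; /* non-zero when a breadcrumb was found */
    unsigned int mods_count;
    unsigned long module_map;
    unsigned long modules_headroom;
    bool stream_freed;
};

static struct toy_domain toy_dom0;

/* Hypothetical stand-in for lu_restore_domains(). */
static struct toy_domain *toy_lu_restore_domains(void)
{
    toy_dom0.domain_id = 0;
    toy_dom0.restored = true;
    return &toy_dom0;
}

/* Hypothetical stand-in for create_dom0(). */
static struct toy_domain *toy_create_dom0(void)
{
    toy_dom0.domain_id = 0;
    toy_dom0.restored = false;
    return &toy_dom0;
}

static struct toy_domain *toy_start(struct toy_boot *b)
{
    struct toy_domain *d;

    assert(b->mods_count < sizeof(b->module_map) * CHAR_BIT);

    b->modules_headroom = 0;                     /* now initialised to 0 */
    b->module_map = (1UL << b->mods_count) - 1;  /* bitmap_fill() */

    /* Gate 1: module 0 is only claimed as the dom0 kernel on a fresh boot. */
    if ( !b->lu_breadcrumb_phys )
        b->module_map &= ~1UL;

    /* Gate 2: headroom is only computed when there is a kernel to load. */
    if ( !b->lu_breadcrumb_phys )
        b->modules_headroom = TOY_HEADROOM;

    /* Gate 3: restore dom0 from the stream, or build it from the modules. */
    if ( b->lu_breadcrumb_phys )
    {
        d = toy_lu_restore_domains();
        b->stream_freed = true;                  /* lu_stream_free() */
        return d;
    }

    return toy_create_dom0();
}
```

Calling toy_start() with lu_breadcrumb_phys set to zero takes the normal
boot path. Any non-zero value leaves module_map fully set and
modules_headroom at zero, and takes the restore path.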



* [Xen-devel] [RFC PATCH v3 22/22] x86/setup: simplify handling of initrdidx when no initrd present
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
                     ` (19 preceding siblings ...)
  2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 21/22] x86/setup: finish plumbing in live update path through __start_xen() David Woodhouse
@ 2020-01-30 16:13   ` David Woodhouse
  20 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-30 16:13 UTC (permalink / raw)
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

Remove a ternary operator that made my brain hurt and replace it with
something simpler that makes it clearer that the >= mbi->mods_count
is because of what find_first_bit() returns when it doesn't find
anything. Just have a simple condition to set initrdidx to zero in
that case, and a much simpler ternary operator in the create_dom0()
call.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/arch/x86/setup.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 53f7b9ced4..6b3a5777cb 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1993,6 +1993,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     else
     {
         initrdidx = find_first_bit(module_map, mbi->mods_count);
+        if ( initrdidx >= mbi->mods_count )
+            initrdidx = 0;
+
         if ( bitmap_weight(module_map, mbi->mods_count) > 1 )
             printk(XENLOG_WARNING
                    "Multiple initrd candidates, picking module #%u\n",
@@ -2002,8 +2005,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
          * safely above our heap. The second module, if present, is an initrd.
          */
         dom0 = create_dom0(mod, modules_headroom,
-                           (initrdidx > 0) && (initrdidx < mbi->mods_count)
-                           ? mod + initrdidx : NULL, kextra, loader);
+                           initrdidx ? mod + initrdidx : NULL,
+                           kextra, loader);
         if ( dom0 == NULL )
             panic("Could not set up DOM0 guest OS\n");
     }
-- 
2.21.0
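
To check that the simplified condition in the patch above selects the same
initrd as the old ternary, the two forms can be compared exhaustively over
small bitmaps. This is a standalone sketch, not Xen code:
toy_find_first_bit() is a single-word stand-in that only assumes the
behaviour described in the commit message (a result >= size when no bit is
set), and -1 stands in for the NULL initrd argument.

```c
#include <assert.h>
#include <limits.h>

/*
 * Toy single-word stand-in for find_first_bit(): index of the lowest set
 * bit below 'size', or 'size' when no such bit exists.
 */
static unsigned int toy_find_first_bit(unsigned long map, unsigned int size)
{
    unsigned int i;

    assert(size <= sizeof(map) * CHAR_BIT);

    for ( i = 0; i < size; i++ )
        if ( map & (1UL << i) )
            return i;

    return size;
}

/* Selection before the patch; -1 models passing NULL as the initrd. */
static int toy_initrd_before(unsigned long map, unsigned int mods_count)
{
    unsigned int initrdidx = toy_find_first_bit(map, mods_count);

    return (initrdidx > 0) && (initrdidx < mods_count) ? (int)initrdidx : -1;
}

/* Selection after the patch: normalise "not found" to 0 first. */
static int toy_initrd_after(unsigned long map, unsigned int mods_count)
{
    unsigned int initrdidx = toy_find_first_bit(map, mods_count);

    if ( initrdidx >= mods_count )
        initrdidx = 0;

    return initrdidx ? (int)initrdidx : -1;
}

/*
 * Compare both forms for every bitmap of 1..max_mods modules.
 * Returns 1 if they always agree, 0 otherwise. Keep max_mods small.
 */
static int toy_forms_agree(unsigned int max_mods)
{
    unsigned int count;
    unsigned long map;

    for ( count = 1; count <= max_mods; count++ )
        for ( map = 0; map < (1UL << count); map++ )
            if ( toy_initrd_before(map, count) !=
                 toy_initrd_after(map, count) )
                return 0;

    return 1;
}
```

Index 0 can double as the "no initrd" value because module 0 is the dom0
kernel on this path, so it is never a valid initrd index.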



* [Xen-devel] [RFC PATCH v3 23/22] x86/smp: reset x2apic_enabled in smp_send_stop()
  2020-01-30 16:12 [Xen-devel] [RFC PATCH v3 0/22] Live update: boot memory management, data stream handling, record format David Woodhouse
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
@ 2020-01-31 17:16 ` David Woodhouse
  2020-02-18 15:22 ` [Xen-devel] [RFC PATCH v3 0/22] Live update: boot memory management, data stream handling, record format Ian Jackson
  2 siblings, 0 replies; 25+ messages in thread
From: David Woodhouse @ 2020-01-31 17:16 UTC (permalink / raw)
  To: Xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Ian Jackson,
	Hongyan Xia, Amit Shah, Roger Pau Monné



From: David Woodhouse <dwmw@amazon.co.uk>

Just before smp_send_stop() re-enables interrupts when shutting down
for reboot or kexec, it calls __stop_this_cpu(), which in turn calls
disable_local_APIC(), which puts the APIC back into the mode Xen found
it in at boot.

If that means turning x2APIC off and going back into xAPIC mode, then
a timer interrupt occurring just after interrupts come back on will
lead to a #GP when apic_timer_interrupt() attempts to ack the IRQ
through the EOI register in x2APIC MSR 0x80b:

(XEN) Executing kexec image on cpu0
(XEN) ----[ Xen-4.14-unstable  x86_64  debug=n   Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d08026c139>] apic_timer_interrupt+0x29/0x40
(XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 00000000000000fa   rcx: 000000000000080b
…
(XEN) Xen code around <ffff82d08026c139> (apic_timer_interrupt+0x29/0x40):
(XEN)  c0 b9 0b 08 00 00 89 c2 <0f> 30 31 ff e9 0e c9 fb ff 0f 1f 40 00 66 2e 0f
…
(XEN) Xen call trace:
(XEN)    [<ffff82d08026c139>] R apic_timer_interrupt+0x29/0x40
(XEN)    [<ffff82d080283825>] S do_IRQ+0x95/0x750
…
(XEN)    [<ffff82d0802a0ad2>] S smp_send_stop+0x42/0xd0

We can't clear the global x2apic_enabled variable in disable_local_APIC()
itself because that runs on each CPU. Instead, correct it (by using
current_local_apic_mode()) in smp_send_stop() while interrupts are still
disabled, immediately after calling __stop_this_cpu() for the boot CPU
and after all other CPUs have been stopped.

cf: d639bdd9bbe ("x86/apic: Disable the LAPIC later in smp_send_stop()")
    ... which didn't quite fix it completely.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/arch/x86/smp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 65eb7cbda8..fac295fa6f 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -354,6 +354,7 @@ void smp_send_stop(void)
         disable_IO_APIC();
         hpet_disable();
         __stop_this_cpu();
+        x2apic_enabled = (current_local_apic_mode() == APIC_MODE_X2APIC);
         local_irq_enable();
     }
 }
-- 
2.17.1
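
The failure described above comes down to a cached flag going stale
relative to the hardware mode. Below is a minimal standalone sketch of
that, under stated assumptions: it is a toy model rather than Xen code,
all toy_* names are hypothetical, and the #GP is modelled as a boolean
that is true when the EOI would be written through the x2APIC MSR route
while the modelled hardware is no longer in x2APIC mode.

```c
#include <stdbool.h>

enum toy_apic_mode { TOY_APIC_XAPIC, TOY_APIC_X2APIC };

static enum toy_apic_mode toy_hw_mode; /* mode the modelled hardware is in */
static bool toy_x2apic_enabled;        /* cached software view (the global) */

/* Would acking an interrupt fault? Only if the MSR route is used wrongly. */
static bool toy_eoi_faults(void)
{
    return toy_x2apic_enabled && toy_hw_mode != TOY_APIC_X2APIC;
}

/*
 * Model the tail of smp_send_stop() on the boot CPU.
 * boot_mode: the APIC mode found at boot.
 * with_fix:  whether the cached flag is resynchronised after the APIC
 *            has been put back into its boot-time mode.
 * Returns true if a timer interrupt arriving right after interrupts are
 * re-enabled would fault.
 */
static bool toy_shutdown_faults(enum toy_apic_mode boot_mode, bool with_fix)
{
    /* Running state: the hypervisor has switched to x2APIC mode. */
    toy_hw_mode = TOY_APIC_X2APIC;
    toy_x2apic_enabled = true;

    /* disable_local_APIC(): hardware returns to the mode found at boot. */
    toy_hw_mode = boot_mode;

    /* The one-line fix: make the cached flag match the hardware again. */
    if ( with_fix )
        toy_x2apic_enabled = (toy_hw_mode == TOY_APIC_X2APIC);

    /* local_irq_enable(), then a pending timer interrupt is acked. */
    return toy_eoi_faults();
}
```

In this model only one combination faults: a boot-time mode of xAPIC with
the fix absent. That matches the case in the commit message where
disabling the local APIC turns x2APIC off.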



* Re: [Xen-devel] [RFC PATCH v3 0/22] Live update: boot memory management, data stream handling, record format
  2020-01-30 16:12 [Xen-devel] [RFC PATCH v3 0/22] Live update: boot memory management, data stream handling, record format David Woodhouse
  2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
  2020-01-31 17:16 ` [Xen-devel] [RFC PATCH v3 23/22] x86/smp: reset x2apic_enabled in smp_send_stop() David Woodhouse
@ 2020-02-18 15:22 ` Ian Jackson
  2 siblings, 0 replies; 25+ messages in thread
From: Ian Jackson @ 2020-02-18 15:22 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, paul, Hongyan Xia,
	Xen-devel, Amit Shah, Roger Pau Monné

David Woodhouse writes ("[RFC PATCH v3 0/22] Live update: boot memory management, data stream handling, record format"):
> Now with added documentation:
> http://david.woodhou.se/live-update-handover.pdf

I had a look at this.  I didn't read the patches in detail, but I did
read all of
   [RFC PATCH v3 05/22] Add KEXEC_TYPE_LIVE_UPDATE
It seems plausible to me.  This seems a new and interesting way of
updating Xen underneath running guests.

BUT I am not an expert on the migration code.  I hope Andrew Cooper
will be able to comment...

Regards,
Ian.


Thread overview: 25+ messages
2020-01-30 16:12 [Xen-devel] [RFC PATCH v3 0/22] Live update: boot memory management, data stream handling, record format David Woodhouse
2020-01-30 16:13 ` [Xen-devel] [RFC PATCH v3 01/22] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 02/22] x86/boot: Reserve live update boot memory David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 03/22] Reserve live update memory regions David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 04/22] Add KEXEC_RANGE_MA_LIVEUPDATE David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 05/22] Add KEXEC_TYPE_LIVE_UPDATE David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 06/22] Add IND_WRITE64 primitive to kexec kimage David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 07/22] Add basic live update stream creation David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 08/22] Add kimage_add_live_update_data() David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 09/22] Add basic lu_save_all() shell David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 10/22] Don't add bad pages above HYPERVISOR_VIRT_END to the domheap David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 11/22] xen/vmap: allow vm_init_type to be called during early_boot David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 12/22] xen/vmap: allow vmap() to be called during early boot David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 13/22] x86/setup: move vm_init() before end_boot_allocator() David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 14/22] Detect live update breadcrumb at boot and map data stream David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 15/22] Start documenting the live update handover David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 16/22] Migrate migration stream definitions into Xen public headers David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 17/22] Add lu_stream_{open, close, append}_record() David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 18/22] Add LU_VERSION and LU_END records to live update stream David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 19/22] Add shell of lu_reserve_pages() David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 20/22] x86/setup: lift dom0 creation out into create_dom0 function David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 21/22] x86/setup: finish plumbing in live update path through __start_xen() David Woodhouse
2020-01-30 16:13   ` [Xen-devel] [RFC PATCH v3 22/22] x86/setup: simplify handling of initrdidx when no initrd present David Woodhouse
2020-01-31 17:16 ` [Xen-devel] [RFC PATCH v3 23/22] x86/smp: reset x2apic_enabled in smp_send_stop() David Woodhouse
2020-02-18 15:22 ` [Xen-devel] [RFC PATCH v3 0/22] Live update: boot memory management, data stream handling, record format Ian Jackson
