Xen-Devel Archive on lore.kernel.org
* [Xen-devel] [PATCH 0/8] Early cleanups and bug fixes in preparation for live update
@ 2020-02-01  0:32 David Woodhouse
  2020-02-01  0:32 ` [Xen-devel] [PATCH 1/8] x86/smp: reset x2apic_enabled in smp_send_stop() David Woodhouse
                   ` (7 more replies)
  0 siblings, 8 replies; 33+ messages in thread
From: David Woodhouse @ 2020-02-01  0:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, Paul Durrant, Roger Pau Monné

Picking out the things from the live update tree which are ready to be
merged.

A couple of actual bug fixes were discovered along the way, plus a weird
off-by-2MiB error with the start of the Xen image in really early memory
management. That one wasn't *strictly* a bug, because those pages did get
reclaimed and fed into the heap in the end, but it is annoying enough that
I want to fix it (and eventually I want the live update reserved
bootmem to fit snugly under the Xen image, so that the slack space we
reserve can be used for *either* of them to grow).

Make it possible to use vmap() earlier, which came out of Wei's work on
removing the directmap and is also needed for live update.

Finally, a little bit of preparation/cleanup of __start_xen() to make
way for what's to come, but which stands alone.

David Woodhouse (7):
      x86/smp: reset x2apic_enabled in smp_send_stop()
      x86/setup: Fix badpage= handling for memory above HYPERVISOR_VIRT_END
      x86/setup: Don't skip 2MiB underneath relocated Xen image
      xen/vmap: allow vmap() to be called during early boot
      x86/setup: move vm_init() before end_boot_allocator()
      x86/setup: simplify handling of initrdidx when no initrd present
      x86/setup: lift dom0 creation out into create_dom0() function

Wei Liu (1):
      xen/vmap: allow vm_init_type to be called during early_boot

 xen/arch/x86/setup.c    | 194 +++++++++++++++++++++++++-----------------------
 xen/arch/x86/smp.c      |   1 +
 xen/common/page_alloc.c |  82 +++++++++++++++++++-
 xen/common/vmap.c       |  45 ++++++++---
 4 files changed, 219 insertions(+), 103 deletions(-)




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [Xen-devel] [PATCH 1/8] x86/smp: reset x2apic_enabled in smp_send_stop()
  2020-02-01  0:32 [Xen-devel] [PATCH 0/8] Early cleanups and bug fixes in preparation for live update David Woodhouse
@ 2020-02-01  0:32 ` David Woodhouse
  2020-02-03 16:18   ` Roger Pau Monné
  2020-02-01  0:32 ` [Xen-devel] [PATCH 2/8] x86/setup: Fix badpage= handling for memory above HYPERVISOR_VIRT_END David Woodhouse
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: David Woodhouse @ 2020-02-01  0:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, Paul Durrant, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

Just before smp_send_stop() re-enables interrupts when shutting down
for reboot or kexec, it calls __stop_this_cpu() which in turn calls
disable_local_APIC(), which puts the APIC back into the mode Xen found
it in at boot.

If that means turning x2APIC off and going back into xAPIC mode, then
a timer interrupt occurring just after interrupts come back on will
lead to a #GP fault when apic_timer_interrupt() attempts to ack the IRQ
through the EOI register in x2APIC MSR 0x80b:

(XEN) Executing kexec image on cpu0
(XEN) ----[ Xen-4.14-unstable  x86_64  debug=n   Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d08026c139>] apic_timer_interrupt+0x29/0x40
(XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 00000000000000fa   rcx: 000000000000080b
…
(XEN) Xen code around <ffff82d08026c139> (apic_timer_interrupt+0x29/0x40):
(XEN)  c0 b9 0b 08 00 00 89 c2 <0f> 30 31 ff e9 0e c9 fb ff 0f 1f 40 00 66 2e 0f
…
(XEN) Xen call trace:
(XEN)    [<ffff82d08026c139>] R apic_timer_interrupt+0x29/0x40
(XEN)    [<ffff82d080283825>] S do_IRQ+0x95/0x750
…
(XEN)    [<ffff82d0802a0ad2>] S smp_send_stop+0x42/0xd0

We can't clear the global x2apic_enabled variable in disable_local_APIC()
itself because that runs on each CPU. Instead, correct it (by using
current_local_apic_mode()) in smp_send_stop() while interrupts are still
disabled immediately after calling __stop_this_cpu() for the boot CPU,
after all other CPUs have been stopped.

cf: d639bdd9bbe ("x86/apic: Disable the LAPIC later in smp_send_stop()")
    ... which didn't quite fix it completely.
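The fix works because current_local_apic_mode() re-derives the mode from
the hardware instead of trusting the now-stale x2apic_enabled flag. As a
rough standalone sketch (names and structure here are illustrative, not
Xen's), the mode can be classified from the EN (bit 11) and EXTD (bit 10)
bits of the IA32_APIC_BASE MSR:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch only: classify the local APIC mode from the raw
 * IA32_APIC_BASE MSR (0x1B) value. Bit 11 (EN) gates the APIC entirely;
 * bit 10 (EXTD) selects x2APIC when EN is also set. */
enum apic_mode { APIC_MODE_DISABLED, APIC_MODE_XAPIC, APIC_MODE_X2APIC };

static enum apic_mode apic_mode_from_base(uint64_t apic_base_msr)
{
    if ( !(apic_base_msr & (1ULL << 11)) )   /* EN clear: APIC disabled */
        return APIC_MODE_DISABLED;
    if ( apic_base_msr & (1ULL << 10) )      /* EN|EXTD: x2APIC */
        return APIC_MODE_X2APIC;
    return APIC_MODE_XAPIC;                  /* EN only: legacy xAPIC */
}
```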

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/arch/x86/smp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 65eb7cbda8..fac295fa6f 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -354,6 +354,7 @@ void smp_send_stop(void)
         disable_IO_APIC();
         hpet_disable();
         __stop_this_cpu();
+        x2apic_enabled = (current_local_apic_mode() == APIC_MODE_X2APIC);
         local_irq_enable();
     }
 }
-- 
2.21.0



* [Xen-devel] [PATCH 2/8] x86/setup: Fix badpage= handling for memory above HYPERVISOR_VIRT_END
  2020-02-01  0:32 [Xen-devel] [PATCH 0/8] Early cleanups and bug fixes in preparation for live update David Woodhouse
  2020-02-01  0:32 ` [Xen-devel] [PATCH 1/8] x86/smp: reset x2apic_enabled in smp_send_stop() David Woodhouse
@ 2020-02-01  0:32 ` David Woodhouse
  2020-02-03 10:57   ` Julien Grall
  2020-02-20 15:38   ` Jan Beulich
  2020-02-01  0:32 ` [Xen-devel] [PATCH 3/8] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 33+ messages in thread
From: David Woodhouse @ 2020-02-01  0:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, Paul Durrant, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

Bad pages are identified by get_platform_badpages() and by the badpage=
command line option.

The boot allocator currently automatically elides these from the regions
passed to it with init_boot_pages(). The xenheap is then initialised
with the pages which are still marked as free by the boot allocator when
end_boot_allocator() is called.

However, any memory above HYPERVISOR_VIRT_END is passed directly to
init_domheap_pages() later in __start_xen(), and the bad page list is
not consulted.

Fix this by marking those pages as PGC_broken in the frametable at the
time end_boot_allocator() runs, and then making init_heap_pages() skip
over any pages which are so marked.
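The badpage= option this touches is a comma-separated list of PFNs or
PFN ranges. A minimal standalone sketch of just the range-parsing part
(using standard strtoul() in place of Xen's simple_strtoul(); the
function name and output shape are invented for illustration):

```c
#include <assert.h>
#include <stdlib.h>

/* Parse "pfn[-pfn][,pfn[-pfn]]..." into (start, end) pairs, mirroring
 * the loop mark_bad_pages() uses. A reversed range clamps to a single
 * PFN, as in the patch. Returns the number of ranges parsed. */
static size_t parse_badpage_ranges(const char *p,
                                   unsigned long (*out)[2], size_t max)
{
    size_t n = 0;

    while ( *p != '\0' && n < max )
    {
        char *q;
        unsigned long s = strtoul(p, &q, 0), e = s;

        p = q;
        if ( *p == '-' )
        {
            e = strtoul(p + 1, &q, 0);
            p = q;
            if ( e < s )
                e = s;
        }
        out[n][0] = s;
        out[n][1] = e;
        n++;
        if ( *p == ',' )
            p++;
        else
            break;
    }
    return n;
}
```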

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/common/page_alloc.c | 82 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 79 insertions(+), 3 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 919a270587..3cf478311b 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1758,6 +1758,18 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
     return 0;
 }
 
+static unsigned long contig_avail_pages(struct page_info *pg, unsigned long max_pages)
+{
+    unsigned long i;
+
+    for ( i = 0 ; i < max_pages; i++)
+    {
+        if ( pg[i].count_info & PGC_broken )
+            break;
+    }
+    return i;
+}
+
 /*
  * Hand the specified arbitrary page range to the specified heap zone
  * checking the node_id of the previous page.  If they differ and the
@@ -1799,18 +1811,23 @@ static void init_heap_pages(
     {
         unsigned int nid = phys_to_nid(page_to_maddr(pg+i));
 
+        /* If the (first) page is already marked broken, don't add it. */
+        if ( pg[i].count_info & PGC_broken )
+            continue;
+
         if ( unlikely(!avail[nid]) )
         {
+            unsigned long contig_nr_pages = contig_avail_pages(pg + i, nr_pages);
             unsigned long s = mfn_x(page_to_mfn(pg + i));
-            unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
+            unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + i + contig_nr_pages - 1), 1));
             bool use_tail = (nid == phys_to_nid(pfn_to_paddr(e - 1))) &&
                             !(s & ((1UL << MAX_ORDER) - 1)) &&
                             (find_first_set_bit(e) <= find_first_set_bit(s));
             unsigned long n;
 
-            n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), nr_pages - i,
+            n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), contig_nr_pages,
                                &use_tail);
-            BUG_ON(i + n > nr_pages);
+            BUG_ON(n > contig_nr_pages);
             if ( n && !use_tail )
             {
                 i += n - 1;
@@ -1846,6 +1863,63 @@ static unsigned long avail_heap_pages(
     return free_pages;
 }
 
+static void mark_bad_pages(void)
+{
+    unsigned long bad_spfn, bad_epfn;
+    const char *p;
+    struct page_info *pg;
+#ifdef CONFIG_X86
+    const struct platform_bad_page *badpage;
+    unsigned int i, j, array_size;
+
+    badpage = get_platform_badpages(&array_size);
+    if ( badpage )
+    {
+        for ( i = 0; i < array_size; i++ )
+        {
+            for ( j = 0; j < 1UL << badpage->order; j++ )
+            {
+                if ( mfn_valid(_mfn(badpage->mfn + j)) )
+                {
+                    pg = mfn_to_page(_mfn(badpage->mfn + j));
+                    pg->count_info |= PGC_broken;
+                    page_list_add_tail(pg, &page_broken_list);
+                }
+            }
+        }
+    }
+#endif
+
+    /* Check new pages against the bad-page list. */
+    p = opt_badpage;
+    while ( *p != '\0' )
+    {
+        bad_spfn = simple_strtoul(p, &p, 0);
+        bad_epfn = bad_spfn;
+
+        if ( *p == '-' )
+        {
+            p++;
+            bad_epfn = simple_strtoul(p, &p, 0);
+            if ( bad_epfn < bad_spfn )
+                bad_epfn = bad_spfn;
+        }
+
+        if ( *p == ',' )
+            p++;
+        else if ( *p != '\0' )
+            break;
+
+        while ( mfn_valid(_mfn(bad_spfn)) && bad_spfn < bad_epfn )
+        {
+            pg = mfn_to_page(_mfn(bad_spfn));
+            pg->count_info |= PGC_broken;
+            page_list_add_tail(pg, &page_broken_list);
+            bad_spfn++;
+        }
+    }
+}
+
 void __init end_boot_allocator(void)
 {
     unsigned int i;
@@ -1870,6 +1944,8 @@ void __init end_boot_allocator(void)
     }
     nr_bootmem_regions = 0;
 
+    mark_bad_pages();
+
     if ( !dma_bitsize && (num_online_nodes() > 1) )
         dma_bitsize = arch_get_dma_bitsize();
 
-- 
2.21.0



* [Xen-devel] [PATCH 3/8] x86/setup: Don't skip 2MiB underneath relocated Xen image
  2020-02-01  0:32 [Xen-devel] [PATCH 0/8] Early cleanups and bug fixes in preparation for live update David Woodhouse
  2020-02-01  0:32 ` [Xen-devel] [PATCH 1/8] x86/smp: reset x2apic_enabled in smp_send_stop() David Woodhouse
  2020-02-01  0:32 ` [Xen-devel] [PATCH 2/8] x86/setup: Fix badpage= handling for memory above HYPERVISOR_VIRT_END David Woodhouse
@ 2020-02-01  0:32 ` David Woodhouse
  2020-02-01  0:32 ` [Xen-devel] [PATCH 4/8] xen/vmap: allow vm_init_type to be called during early_boot David Woodhouse
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 33+ messages in thread
From: David Woodhouse @ 2020-02-01  0:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, Paul Durrant, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

Set 'e' correctly to reflect the location that Xen is actually relocated
to from its default 2MiB location, not 2MiB below that.

This is only vaguely a bug fix. The "missing" 2MiB would have been used
in the end, and fed to the allocator. It's just that other things don't
get to sit right up *next* to the Xen image, and it isn't very tidy.

For live update, I'd quite like a single contiguous region for the
reserved bootmem and Xen, allowing the 'slack' in the former to be used
when Xen itself grows larger. Let's not allow 2MiB of random heap pages
to get in the way...
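To illustrate the arithmetic being changed (the constants in the test
below are invented; only the 2MiB XEN_IMG_OFFSET matches the commit
message, and the helper is a sketch, not Xen's code):

```c
#include <assert.h>

/* The image text sits XEN_IMG_OFFSET (2MiB) above its load base. */
#define XEN_IMG_OFFSET 0x200000UL

struct reloc { unsigned long xen_phys_start, e; };

/* After the patch: track the relocation base separately, and make 'e'
 * point at the image itself. The old code set e == xen_phys_start,
 * i.e. 2MiB below the image, so [xen_phys_start, image) was skipped. */
static struct reloc select_reloc(unsigned long end, unsigned long reloc_size)
{
    struct reloc r;

    r.xen_phys_start = end - reloc_size;
    r.e = r.xen_phys_start + XEN_IMG_OFFSET;
    return r;
}
```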

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/arch/x86/setup.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index d858883404..2677f127b9 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1080,9 +1080,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
             unsigned long pte_update_limit;
 
             /* Select relocation address. */
-            e = end - reloc_size;
-            xen_phys_start = e;
-            bootsym(trampoline_xen_phys_start) = e;
+            xen_phys_start = end - reloc_size;
+            e = xen_phys_start + XEN_IMG_OFFSET;
+            bootsym(trampoline_xen_phys_start) = xen_phys_start;
 
             /*
              * No PTEs pointing above this address are candidates for relocation.
@@ -1090,7 +1090,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
              * and the beginning of region for destination image some PTEs may
              * point to addresses in range [e, e + XEN_IMG_OFFSET).
              */
-            pte_update_limit = PFN_DOWN(e + XEN_IMG_OFFSET);
+            pte_update_limit = PFN_DOWN(e);
 
             /*
              * Perform relocation to new physical address.
@@ -1099,7 +1099,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
              * data until after we have switched to the relocated pagetables!
              */
             barrier();
-            move_memory(e + XEN_IMG_OFFSET, XEN_IMG_OFFSET, _end - _start, 1);
+            move_memory(e, XEN_IMG_OFFSET, _end - _start, 1);
 
             /* Walk initial pagetables, relocating page directory entries. */
             pl4e = __va(__pa(idle_pg_table));
-- 
2.21.0



* [Xen-devel] [PATCH 4/8] xen/vmap: allow vm_init_type to be called during early_boot
  2020-02-01  0:32 [Xen-devel] [PATCH 0/8] Early cleanups and bug fixes in preparation for live update David Woodhouse
                   ` (2 preceding siblings ...)
  2020-02-01  0:32 ` [Xen-devel] [PATCH 3/8] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
@ 2020-02-01  0:32 ` David Woodhouse
  2020-02-13 10:36   ` Julien Grall
  2020-02-21 16:42   ` Jan Beulich
  2020-02-01  0:33 ` [Xen-devel] [PATCH 5/8] xen/vmap: allow vmap() to be called during early boot David Woodhouse
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 33+ messages in thread
From: David Woodhouse @ 2020-02-01  0:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, Paul Durrant, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

We want to move vm_init(), which calls vm_init_type() under the hood, to
the early boot stage. Add a path to get pages from the boot allocator
instead.

Also add an emacs local-variables block to the file while at it.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/vmap.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index faebc1ddf1..37922f735b 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -34,9 +34,15 @@ void __init vm_init_type(enum vmap_region type, void *start, void *end)
 
     for ( i = 0, va = (unsigned long)vm_bitmap(type); i < nr; ++i, va += PAGE_SIZE )
     {
-        struct page_info *pg = alloc_domheap_page(NULL, 0);
-
-        map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR);
+        mfn_t mfn;
+        int rc;
+
+        if ( system_state == SYS_STATE_early_boot )
+            mfn = alloc_boot_pages(1, 1);
+        else
+            mfn = page_to_mfn(alloc_domheap_page(NULL, 0));
+        rc = map_pages_to_xen(va, mfn, 1, PAGE_HYPERVISOR);
+        BUG_ON(rc);
         clear_page((void *)va);
     }
     bitmap_fill(vm_bitmap(type), vm_low[type]);
@@ -330,3 +336,13 @@ void vfree(void *va)
         free_domheap_page(pg);
 }
 #endif
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.21.0



* [Xen-devel] [PATCH 5/8] xen/vmap: allow vmap() to be called during early boot
  2020-02-01  0:32 [Xen-devel] [PATCH 0/8] Early cleanups and bug fixes in preparation for live update David Woodhouse
                   ` (3 preceding siblings ...)
  2020-02-01  0:32 ` [Xen-devel] [PATCH 4/8] xen/vmap: allow vm_init_type to be called during early_boot David Woodhouse
@ 2020-02-01  0:33 ` David Woodhouse
  2020-02-03 14:00   ` Julien Grall
  2020-02-21 16:46   ` Jan Beulich
  2020-02-01  0:33 ` [Xen-devel] [PATCH 6/8] x86/setup: move vm_init() before end_boot_allocator() David Woodhouse
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 33+ messages in thread
From: David Woodhouse @ 2020-02-01  0:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, Paul Durrant, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/common/vmap.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index 37922f735b..8343460794 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -68,7 +68,7 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
     spin_lock(&vm_lock);
     for ( ; ; )
     {
-        struct page_info *pg;
+        mfn_t mfn;
 
         ASSERT(vm_low[t] == vm_top[t] || !test_bit(vm_low[t], vm_bitmap(t)));
         for ( start = vm_low[t]; start < vm_top[t]; )
@@ -103,9 +103,17 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
         if ( vm_top[t] >= vm_end[t] )
             return NULL;
 
-        pg = alloc_domheap_page(NULL, 0);
-        if ( !pg )
-            return NULL;
+        if ( system_state == SYS_STATE_early_boot )
+        {
+            mfn = alloc_boot_pages(1, 1);
+        }
+        else
+        {
+            struct page_info *pg = alloc_domheap_page(NULL, 0);
+            if ( !pg )
+                return NULL;
+            mfn = page_to_mfn(pg);
+        }
 
         spin_lock(&vm_lock);
 
@@ -113,7 +121,7 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
         {
             unsigned long va = (unsigned long)vm_bitmap(t) + vm_top[t] / 8;
 
-            if ( !map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR) )
+            if ( !map_pages_to_xen(va, mfn, 1, PAGE_HYPERVISOR) )
             {
                 clear_page((void *)va);
                 vm_top[t] += PAGE_SIZE * 8;
@@ -123,7 +131,10 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
             }
         }
 
-        free_domheap_page(pg);
+        if ( system_state == SYS_STATE_early_boot )
+            init_boot_pages(mfn_to_maddr(mfn), mfn_to_maddr(mfn) + PAGE_SIZE);
+        else
+            free_domheap_page(mfn_to_page(mfn));
 
         if ( start >= vm_top[t] )
         {
-- 
2.21.0



* [Xen-devel] [PATCH 6/8] x86/setup: move vm_init() before end_boot_allocator()
  2020-02-01  0:32 [Xen-devel] [PATCH 0/8] Early cleanups and bug fixes in preparation for live update David Woodhouse
                   ` (4 preceding siblings ...)
  2020-02-01  0:33 ` [Xen-devel] [PATCH 5/8] xen/vmap: allow vmap() to be called during early boot David Woodhouse
@ 2020-02-01  0:33 ` David Woodhouse
  2020-02-03 11:10   ` Xia, Hongyan
  2020-02-21 16:48   ` Jan Beulich
  2020-02-01  0:33 ` [Xen-devel] [PATCH 7/8] x86/setup: simplify handling of initrdidx when no initrd present David Woodhouse
  2020-02-01  0:33 ` [Xen-devel] [PATCH 8/8] x86/setup: lift dom0 creation out into create_dom0() function David Woodhouse
  7 siblings, 2 replies; 33+ messages in thread
From: David Woodhouse @ 2020-02-01  0:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, Paul Durrant, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

We would like to be able to use vmap() to map the live update data, and
we need to do a first pass of the live update data before we prime the
heap because we need to know which pages need to be preserved.

The warning about ACPI code can be dropped, since that problem no longer
exists when things are done in this order.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/arch/x86/setup.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 2677f127b9..5f68a1308f 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1489,6 +1489,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
     numa_initmem_init(0, raw_max_page);
 
+    vm_init();
+
     if ( max_page - 1 > virt_to_mfn(HYPERVISOR_VIRT_END - 1) )
     {
         unsigned long limit = virt_to_mfn(HYPERVISOR_VIRT_END - 1);
@@ -1519,12 +1521,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         end_boot_allocator();
 
     system_state = SYS_STATE_boot;
-    /*
-     * No calls involving ACPI code should go between the setting of
-     * SYS_STATE_boot and vm_init() (or else acpi_os_{,un}map_memory()
-     * will break).
-     */
-    vm_init();
 
     console_init_ring();
     vesa_init();
-- 
2.21.0



* [Xen-devel] [PATCH 7/8] x86/setup: simplify handling of initrdidx when no initrd present
  2020-02-01  0:32 [Xen-devel] [PATCH 0/8] Early cleanups and bug fixes in preparation for live update David Woodhouse
                   ` (5 preceding siblings ...)
  2020-02-01  0:33 ` [Xen-devel] [PATCH 6/8] x86/setup: move vm_init() before end_boot_allocator() David Woodhouse
@ 2020-02-01  0:33 ` David Woodhouse
  2020-02-13 10:47   ` Julien Grall
  2020-02-21 16:59   ` Jan Beulich
  2020-02-01  0:33 ` [Xen-devel] [PATCH 8/8] x86/setup: lift dom0 creation out into create_dom0() function David Woodhouse
  7 siblings, 2 replies; 33+ messages in thread
From: David Woodhouse @ 2020-02-01  0:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, Paul Durrant, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

Remove a ternary operator that made my brain hurt.

Replace it with something simpler that makes it somewhat clearer that
the check for initrdidx < mbi->mods_count is because mbi->mods_count
is what find_first_bit() will return when it doesn't find anything.
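That convention can be modelled in a few lines (a toy re-implementation
for illustration, not Xen's optimised find_first_bit()):

```c
#include <assert.h>

/* Return the index of the first set bit in the first `size` bits of
 * `map`, or `size` itself when none is set. A result equal to
 * mods_count therefore means "no initrd module found". */
static unsigned int find_first_bit_model(const unsigned long *map,
                                         unsigned int size)
{
    const unsigned int bpw = 8 * sizeof(unsigned long);

    for ( unsigned int i = 0; i < size; i++ )
        if ( map[i / bpw] & (1UL << (i % bpw)) )
            return i;
    return size;
}
```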

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/arch/x86/setup.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 5f68a1308f..10209e6bfb 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -687,7 +687,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     char *cmdline, *kextra, *loader;
     unsigned int initrdidx, num_parked = 0;
     multiboot_info_t *mbi;
-    module_t *mod;
+    module_t *mod, *initrd = NULL;
     unsigned long nr_pages, raw_max_page, modules_headroom, module_map[1];
     int i, j, e820_warn = 0, bytes = 0;
     bool acpi_boot_table_init_done = false, relocated = false;
@@ -1793,6 +1793,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         xen_processor_pmbits |= XEN_PROCESSOR_PM_CX;
 
     initrdidx = find_first_bit(module_map, mbi->mods_count);
+    if ( initrdidx < mbi->mods_count )
+        initrd = mod + initrdidx;
+
     if ( bitmap_weight(module_map, mbi->mods_count) > 1 )
         printk(XENLOG_WARNING
                "Multiple initrd candidates, picking module #%u\n",
@@ -1817,9 +1820,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
      * We're going to setup domain0 using the module(s) that we stashed safely
      * above our heap. The second module, if present, is an initrd ramdisk.
      */
-    if ( construct_dom0(dom0, mod, modules_headroom,
-                        (initrdidx > 0) && (initrdidx < mbi->mods_count)
-                        ? mod + initrdidx : NULL, cmdline) != 0)
+    if ( construct_dom0(dom0, mod, modules_headroom, initrd, cmdline) != 0 )
         panic("Could not set up DOM0 guest OS\n");
 
     if ( cpu_has_smap )
-- 
2.21.0



* [Xen-devel] [PATCH 8/8] x86/setup: lift dom0 creation out into create_dom0() function
  2020-02-01  0:32 [Xen-devel] [PATCH 0/8] Early cleanups and bug fixes in preparation for live update David Woodhouse
                   ` (6 preceding siblings ...)
  2020-02-01  0:33 ` [Xen-devel] [PATCH 7/8] x86/setup: simplify handling of initrdidx when no initrd present David Woodhouse
@ 2020-02-01  0:33 ` David Woodhouse
  2020-02-03 14:28   ` Julien Grall
  2020-02-21 17:06   ` Jan Beulich
  7 siblings, 2 replies; 33+ messages in thread
From: David Woodhouse @ 2020-02-01  0:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, Paul Durrant, Roger Pau Monné

From: David Woodhouse <dwmw@amazon.co.uk>

The creation of dom0 can be relatively self-contained. Shift it into
a separate function and simplify __start_xen() a little bit.

This is a cleanup in its own right, but will be even more desirable
when live update provides an alternative path through __start_xen()
that doesn't involve creating a new dom0 at all.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 xen/arch/x86/setup.c | 169 +++++++++++++++++++++++--------------------
 1 file changed, 92 insertions(+), 77 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 10209e6bfb..9d86722ecd 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -678,6 +678,92 @@ static unsigned int __init copy_bios_e820(struct e820entry *map, unsigned int li
     return n;
 }
 
+static struct domain * __init create_dom0(const module_t *image,
+                                          unsigned long headroom,
+                                          module_t *initrd, char *kextra,
+                                          char *loader)
+{
+    struct xen_domctl_createdomain dom0_cfg = {
+        .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
+        .max_evtchn_port = -1,
+        .max_grant_frames = -1,
+        .max_maptrack_frames = -1,
+    };
+    struct domain *d;
+    char *cmdline;
+
+    if ( opt_dom0_pvh )
+    {
+        dom0_cfg.flags |= (XEN_DOMCTL_CDF_hvm |
+                           ((hvm_hap_supported() && !opt_dom0_shadow) ?
+                            XEN_DOMCTL_CDF_hap : 0));
+
+        dom0_cfg.arch.emulation_flags |=
+            XEN_X86_EMU_LAPIC | XEN_X86_EMU_IOAPIC | XEN_X86_EMU_VPCI;
+    }
+    dom0_cfg.max_vcpus = dom0_max_vcpus();
+
+    if ( iommu_enabled )
+        dom0_cfg.flags |= XEN_DOMCTL_CDF_iommu;
+
+    /* Create initial domain 0. */
+    d = domain_create(get_initial_domain_id(), &dom0_cfg, !pv_shim);
+    if ( IS_ERR(d) || (alloc_dom0_vcpu0(d) == NULL) )
+        panic("Error creating domain 0\n");
+
+    /* Grab the DOM0 command line. */
+    cmdline = (char *)(image->string ? __va(image->string) : NULL);
+    if ( (cmdline != NULL) || (kextra != NULL) )
+    {
+        static char __initdata dom0_cmdline[MAX_GUEST_CMDLINE];
+
+        cmdline = cmdline_cook(cmdline, loader);
+        safe_strcpy(dom0_cmdline, cmdline);
+
+        if ( kextra != NULL )
+            /* kextra always includes exactly one leading space. */
+            safe_strcat(dom0_cmdline, kextra);
+
+        /* Append any extra parameters. */
+        if ( skip_ioapic_setup && !strstr(dom0_cmdline, "noapic") )
+            safe_strcat(dom0_cmdline, " noapic");
+        if ( (strlen(acpi_param) == 0) && acpi_disabled )
+        {
+            printk("ACPI is disabled, notifying Domain 0 (acpi=off)\n");
+            safe_strcpy(acpi_param, "off");
+        }
+        if ( (strlen(acpi_param) != 0) && !strstr(dom0_cmdline, "acpi=") )
+        {
+            safe_strcat(dom0_cmdline, " acpi=");
+            safe_strcat(dom0_cmdline, acpi_param);
+        }
+
+        cmdline = dom0_cmdline;
+    }
+
+    /*
+     * Temporarily clear SMAP in CR4 to allow user-accesses in construct_dom0().
+     * This saves a large number of corner cases interactions with
+     * copy_from_user().
+     */
+    if ( cpu_has_smap )
+    {
+        cr4_pv32_mask &= ~X86_CR4_SMAP;
+        write_cr4(read_cr4() & ~X86_CR4_SMAP);
+    }
+
+    if ( construct_dom0(d, image, headroom, initrd, cmdline) != 0 )
+        panic("Could not construct domain 0\n");
+
+    if ( cpu_has_smap )
+    {
+        write_cr4(read_cr4() | X86_CR4_SMAP);
+        cr4_pv32_mask |= X86_CR4_SMAP;
+    }
+
+    return d;
+}
+
 /* How much of the directmap is prebuilt at compile time. */
 #define PREBUILT_MAP_LIMIT (1 << L2_PAGETABLE_SHIFT)
 
@@ -697,12 +783,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         .parity    = 'n',
         .stop_bits = 1
     };
-    struct xen_domctl_createdomain dom0_cfg = {
-        .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
-        .max_evtchn_port = -1,
-        .max_grant_frames = -1,
-        .max_maptrack_frames = -1,
-    };
     const char *hypervisor_name;
 
     /* Critical region without IDT or TSS.  Any fault is deadly! */
@@ -1740,58 +1820,13 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     init_guest_cpuid();
     init_guest_msr_policy();
 
-    if ( opt_dom0_pvh )
-    {
-        dom0_cfg.flags |= (XEN_DOMCTL_CDF_hvm |
-                           ((hvm_hap_supported() && !opt_dom0_shadow) ?
-                            XEN_DOMCTL_CDF_hap : 0));
-
-        dom0_cfg.arch.emulation_flags |=
-            XEN_X86_EMU_LAPIC | XEN_X86_EMU_IOAPIC | XEN_X86_EMU_VPCI;
-    }
-    dom0_cfg.max_vcpus = dom0_max_vcpus();
-
-    if ( iommu_enabled )
-        dom0_cfg.flags |= XEN_DOMCTL_CDF_iommu;
-
-    /* Create initial domain 0. */
-    dom0 = domain_create(get_initial_domain_id(), &dom0_cfg, !pv_shim);
-    if ( IS_ERR(dom0) || (alloc_dom0_vcpu0(dom0) == NULL) )
-        panic("Error creating domain 0\n");
-
-    /* Grab the DOM0 command line. */
-    cmdline = (char *)(mod[0].string ? __va(mod[0].string) : NULL);
-    if ( (cmdline != NULL) || (kextra != NULL) )
-    {
-        static char __initdata dom0_cmdline[MAX_GUEST_CMDLINE];
-
-        cmdline = cmdline_cook(cmdline, loader);
-        safe_strcpy(dom0_cmdline, cmdline);
-
-        if ( kextra != NULL )
-            /* kextra always includes exactly one leading space. */
-            safe_strcat(dom0_cmdline, kextra);
-
-        /* Append any extra parameters. */
-        if ( skip_ioapic_setup && !strstr(dom0_cmdline, "noapic") )
-            safe_strcat(dom0_cmdline, " noapic");
-        if ( (strlen(acpi_param) == 0) && acpi_disabled )
-        {
-            printk("ACPI is disabled, notifying Domain 0 (acpi=off)\n");
-            safe_strcpy(acpi_param, "off");
-        }
-        if ( (strlen(acpi_param) != 0) && !strstr(dom0_cmdline, "acpi=") )
-        {
-            safe_strcat(dom0_cmdline, " acpi=");
-            safe_strcat(dom0_cmdline, acpi_param);
-        }
-
-        cmdline = dom0_cmdline;
-    }
-
     if ( xen_cpuidle )
         xen_processor_pmbits |= XEN_PROCESSOR_PM_CX;
 
+    printk("%sNX (Execute Disable) protection %sactive\n",
+           cpu_has_nx ? XENLOG_INFO : XENLOG_WARNING "Warning: ",
+           cpu_has_nx ? "" : "not ");
+
     initrdidx = find_first_bit(module_map, mbi->mods_count);
     if ( initrdidx < mbi->mods_count )
         initrd = mod + initrdidx;
@@ -1801,34 +1836,14 @@ void __init noreturn __start_xen(unsigned long mbi_p)
                "Multiple initrd candidates, picking module #%u\n",
                initrdidx);
 
-    /*
-     * Temporarily clear SMAP in CR4 to allow user-accesses in construct_dom0().
-     * This saves a large number of corner cases interactions with
-     * copy_from_user().
-     */
-    if ( cpu_has_smap )
-    {
-        cr4_pv32_mask &= ~X86_CR4_SMAP;
-        write_cr4(read_cr4() & ~X86_CR4_SMAP);
-    }
-
-    printk("%sNX (Execute Disable) protection %sactive\n",
-           cpu_has_nx ? XENLOG_INFO : XENLOG_WARNING "Warning: ",
-           cpu_has_nx ? "" : "not ");
-
     /*
      * We're going to setup domain0 using the module(s) that we stashed safely
      * above our heap. The second module, if present, is an initrd ramdisk.
      */
-    if ( construct_dom0(dom0, mod, modules_headroom, initrd, cmdline) != 0 )
+    dom0 = create_dom0(mod, modules_headroom, initrd, kextra, loader);
+    if ( dom0 == NULL )
         panic("Could not set up DOM0 guest OS\n");
 
-    if ( cpu_has_smap )
-    {
-        write_cr4(read_cr4() | X86_CR4_SMAP);
-        cr4_pv32_mask |= X86_CR4_SMAP;
-    }
-
     heap_init_late();
 
     init_trace_bufs();
-- 
2.21.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Xen-devel] [PATCH 2/8] x86/setup: Fix badpage= handling for memory above HYPERVISOR_VIRT_END
  2020-02-01  0:32 ` [Xen-devel] [PATCH 2/8] x86/setup: Fix badpage= handling for memory above HYPERVISOR_VIRT_END David Woodhouse
@ 2020-02-03 10:57   ` Julien Grall
  2020-02-20 15:38   ` Jan Beulich
  1 sibling, 0 replies; 33+ messages in thread
From: Julien Grall @ 2020-02-03 10:57 UTC (permalink / raw)
  To: David Woodhouse, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, Paul Durrant, Roger Pau Monné

Hi David,

On 01/02/2020 00:32, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> Bad pages are identified by get_platform_badpages() and with badpage=
> on the command line.
> 
> The boot allocator currently automatically elides these from the regions
> passed to it with init_boot_pages(). The xenheap is then initialised
> with the pages which are still marked as free by the boot allocator when
> end_boot_allocator() is called.
> 
> However, any memory above HYPERVISOR_VIRT_END is passed directly to
> init_domheap_pages() later in __start_xen(), and the bad page list is
> not consulted.
> 
> Fix this by marking those pages as PGC_broken in the frametable at the
> time end_boot_allocator() runs, and then making init_heap_pages() skip
> over any pages which are so marked.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>   xen/common/page_alloc.c | 82 +++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 79 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 919a270587..3cf478311b 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1758,6 +1758,18 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
>       return 0;
>   }
>   
> +static unsigned long contig_avail_pages(struct page_info *pg, unsigned long max_pages)
> +{
> +    unsigned long i;
> +
> +    for ( i = 0 ; i < max_pages; i++)
> +    {
> +        if ( pg[i].count_info & PGC_broken )
> +            break;
> +    }
> +    return i;
> +}
> +
>   /*
>    * Hand the specified arbitrary page range to the specified heap zone
>    * checking the node_id of the previous page.  If they differ and the
> @@ -1799,18 +1811,23 @@ static void init_heap_pages(
>       {
>           unsigned int nid = phys_to_nid(page_to_maddr(pg+i));
>   
> +        /* If the (first) page is already marked broken, don't add it. */
> +        if ( pg[i].count_info & PGC_broken )
> +            continue;
> +
>           if ( unlikely(!avail[nid]) )
>           {
> +            unsigned long contig_nr_pages = contig_avail_pages(pg + i, nr_pages);
>               unsigned long s = mfn_x(page_to_mfn(pg + i));
> -            unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
> +            unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + i + contig_nr_pages - 1), 1));
>               bool use_tail = (nid == phys_to_nid(pfn_to_paddr(e - 1))) &&
>                               !(s & ((1UL << MAX_ORDER) - 1)) &&
>                               (find_first_set_bit(e) <= find_first_set_bit(s));
>               unsigned long n;
>   
> -            n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), nr_pages - i,
> +            n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), contig_nr_pages,
>                                  &use_tail);
> -            BUG_ON(i + n > nr_pages);
> +            BUG_ON(n > contig_nr_pages);
>               if ( n && !use_tail )
>               {
>                   i += n - 1;
> @@ -1846,6 +1863,63 @@ static unsigned long avail_heap_pages(
>       return free_pages;
>   }
>   
> +static void mark_bad_pages(void)
> +{
> +    unsigned long bad_spfn, bad_epfn;

I would like to avoid adding more 'pfn' in the code. Per the current 
terminology (see include/xen/mm.h), this would be bad_smfn, bad_emfn.

Ideally, this should also use mfn_t, but I can live without it for now.

> +    const char *p;
> +    struct page_info *pg;
> +#ifdef CONFIG_X86
> +    const struct platform_bad_page *badpage;
> +    unsigned int i, j, array_size;
> +
> +    badpage = get_platform_badpages(&array_size);
> +    if ( badpage )
> +    {
> +        for ( i = 0; i < array_size; i++ )
> +        {
> +            for ( j = 0; j < 1UL << badpage->order; j++ )
> +            {
> +                if ( mfn_valid(_mfn(badpage->mfn + j)) )
> +                {
> +                    pg = mfn_to_page(_mfn(badpage->mfn + j));
> +                    pg->count_info |= PGC_broken;
> +                    page_list_add_tail(pg, &page_broken_list);
> +                }
> +            }
> +        }
> +    }

You are missing the PV shim part here.

But, AFAICT, the only difference from the implementation in 
init_boot_pages() is how we treat the page.

Could we introduce a common helper taking a callback? The callback 
would be applied to every range.

> +#endif
> +
> +    /* Check new pages against the bad-page list. */
> +    p = opt_badpage;
> +    while ( *p != '\0' )
> +    {
> +        bad_spfn = simple_strtoul(p, &p, 0);
> +        bad_epfn = bad_spfn;
> +
> +        if ( *p == '-' )
> +        {
> +            p++;
> +            bad_epfn = simple_strtoul(p, &p, 0);
> +            if ( bad_epfn < bad_spfn )
> +                bad_epfn = bad_spfn;
> +        }
> +
> +        if ( *p == ',' )
> +            p++;
> +        else if ( *p != '\0' )
> +            break;
> +
> +        while ( mfn_valid(_mfn(bad_spfn)) && bad_spfn < bad_epfn )
> +        {
> +            pg = mfn_to_page(_mfn(bad_spfn));
> +            pg->count_info |= PGC_broken;
> +            page_list_add_tail(pg, &page_broken_list);
> +            bad_spfn++;
> +        }
> +    }
> +}
> +
>   void __init end_boot_allocator(void)
>   {
>       unsigned int i;
> @@ -1870,6 +1944,8 @@ void __init end_boot_allocator(void)
>       }
>       nr_bootmem_regions = 0;
>   
> +    mark_bad_pages();
> +
>       if ( !dma_bitsize && (num_online_nodes() > 1) )
>           dma_bitsize = arch_get_dma_bitsize();
>   
> 

Cheers,

-- 
Julien Grall


* Re: [Xen-devel] [PATCH 6/8] x86/setup: move vm_init() before end_boot_allocator()
  2020-02-01  0:33 ` [Xen-devel] [PATCH 6/8] x86/setup: move vm_init() before end_boot_allocator() David Woodhouse
@ 2020-02-03 11:10   ` Xia, Hongyan
  2020-02-03 14:03     ` David Woodhouse
  2020-02-21 16:48   ` Jan Beulich
  1 sibling, 1 reply; 33+ messages in thread
From: Xia, Hongyan @ 2020-02-03 11:10 UTC (permalink / raw)
  To: dwmw2, xen-devel
  Cc: sstabellini, julien, wl, konrad.wilk, George.Dunlap,
	andrew.cooper3, Gautam, Varad, ian.jackson, Durrant, Paul,
	roger.pau

On Sat, 2020-02-01 at 00:33 +0000, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> We would like to be able to use vmap() to map the live update data,
> and
> we need to do a first pass of the live update data before we prime
> the
> heap because we need to know which pages need to be preserved.
> 
> The warning about ACPI code can be dropped, since that problem no
> longer
> exists when things are done in this order.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  xen/arch/x86/setup.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index 2677f127b9..5f68a1308f 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1489,6 +1489,8 @@ void __init noreturn __start_xen(unsigned long
> mbi_p)
>  
>      numa_initmem_init(0, raw_max_page);
>  
> +    vm_init();
> +
>      if ( max_page - 1 > virt_to_mfn(HYPERVISOR_VIRT_END - 1) )
>      {
>          unsigned long limit = virt_to_mfn(HYPERVISOR_VIRT_END - 1);
> @@ -1519,12 +1521,6 @@ void __init noreturn __start_xen(unsigned long
> mbi_p)
>          end_boot_allocator();
>  
>      system_state = SYS_STATE_boot;
> -    /*
> -     * No calls involving ACPI code should go between the setting of
> -     * SYS_STATE_boot and vm_init() (or else
> acpi_os_{,un}map_memory()
> -     * will break).
> -     */
> -    vm_init();
>  
>      console_init_ring();
>      vesa_init();

Is there any problem with moving vm_init() even earlier than this, like
right after init_frametable()? ACPI and NUMA functions need a couple of
things permanently mapped. Without the directmap (after the directmap
removal series), the xenheap or vmap available at this stage, I can only
map memory into the directmap region, which is really hacky. I would
like to use vmap at this stage, so hopefully vmap can be ready very
early.

Hongyan

* Re: [Xen-devel] [PATCH 5/8] xen/vmap: allow vmap() to be called during early boot
  2020-02-01  0:33 ` [Xen-devel] [PATCH 5/8] xen/vmap: allow vmap() to be called during early boot David Woodhouse
@ 2020-02-03 14:00   ` Julien Grall
  2020-02-03 16:37     ` David Woodhouse
  2020-02-21 16:46   ` Jan Beulich
  1 sibling, 1 reply; 33+ messages in thread
From: Julien Grall @ 2020-02-03 14:00 UTC (permalink / raw)
  To: David Woodhouse, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, Paul Durrant, Roger Pau Monné

Hi David,

On 01/02/2020 00:33, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>

I am a bit concerned with this change, particularly the consequences it 
has for the page-tables. There is an assumption that intermediate 
page-tables allocated via the boot allocator will never be freed.

On x86, a call to vunmap() will not free page-tables, but a subsequent 
call to vmap() may free them, depending on the mapping size. So we would 
call free_domheap_pages() rather than init_heap_pages().

I am not entirely sure what the full consequence is, but I think this 
calls for investigation; a summary should be written down in the commit 
message.

Cheers,

-- 
Julien Grall


* Re: [Xen-devel] [PATCH 6/8] x86/setup: move vm_init() before end_boot_allocator()
  2020-02-03 11:10   ` Xia, Hongyan
@ 2020-02-03 14:03     ` David Woodhouse
  0 siblings, 0 replies; 33+ messages in thread
From: David Woodhouse @ 2020-02-03 14:03 UTC (permalink / raw)
  To: Xia, Hongyan, xen-devel
  Cc: sstabellini, julien, wl, konrad.wilk, George.Dunlap,
	andrew.cooper3, Gautam, Varad, ian.jackson, Durrant,  Paul,
	roger.pau

On Mon, 2020-02-03 at 11:10 +0000, Xia, Hongyan wrote:
> Is there any problem with moving vm_init() even earlier than this, like
> right after init_frametable()? ACPI and NUMA functions need a couple of
> things permanently mapped. 

You want it sooner than that, don't you? The code calls
acpi_boot_table_init() and srat_parse_regions() while looping over the
e820 regions, before init_frametable(). But that's OK; you can call
vm_init() the moment you have the pages in the boot allocator to
support it.

So you can do something like the hack below, for example.

This boots in all three of the liveupdate=, <4GiB, >4GiB cases on
x86_64 — but will probably break Arm unless you make vm_init() run soon
enough there too, and will potentially run vm_init() more than once on
x86_64 if acpi_boot_table_init() fails the first time(s).

But as a proof-of-concept, sure.

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 2071e5acee..8aee55f31a 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1487,12 +1487,16 @@ void __init noreturn __start_xen(unsigned long mbi_p)
             continue;
 
         if ( !acpi_boot_table_init_done &&
-             s >= (1ULL << 32) &&
-             !acpi_boot_table_init() )
+             s >= (1ULL << 32) )
         {
-            acpi_boot_table_init_done = true;
-            srat_parse_regions(s);
-            setup_max_pdx(raw_max_page);
+            printk("acpi/vm init\n");
+            vm_init(); // XX: not idempotent
+            if ( !acpi_boot_table_init() )
+            {
+                acpi_boot_table_init_done = true;
+                srat_parse_regions(s);
+                setup_max_pdx(raw_max_page);
+            }
         }
 
         if ( pfn_to_pdx((e - 1) >> PAGE_SHIFT) >= max_pdx )
@@ -1677,14 +1681,16 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     init_frametable();
 
     if ( !acpi_boot_table_init_done )
+    {
+        printk("Late vm/acpi init\n");
+        vm_init();
         acpi_boot_table_init();
+    }
 
     acpi_numa_init();
 
     numa_initmem_init(0, raw_max_page);
 
-    vm_init();
-
     if ( lu_breadcrumb_phys )
     {
         lu_stream_map(&lu_stream, lu_mfnlist_phys, lu_nr_pages);
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index f1e7d81edc..e5d938f8ca 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -322,7 +322,7 @@ void __init init_boot_pages(paddr_t ps, paddr_t pe)
                  MAX_ORDER + 1);
 #endif
     BUILD_BUG_ON(sizeof(frame_table->u) != sizeof(unsigned long));
-
+    printk("init boot pages %lx %lx\n", ps, pe);
     ps = round_pgup(ps);
     pe = round_pgdown(pe);
     if ( pe <= ps )
@@ -395,7 +395,7 @@ mfn_t __init alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align)
     unsigned int i = nr_bootmem_regions;
 
     BUG_ON(!nr_bootmem_regions);
-
+    printk("alloc_boot_pages %ld\n", nr_pfns);
     while ( i-- )
     {
         struct bootmem_region *r = &bootmem_region_list[i];
diff --git a/xen/drivers/acpi/osl.c b/xen/drivers/acpi/osl.c
index 4c8bb7839e..b7fcee408e 100644
--- a/xen/drivers/acpi/osl.c
+++ b/xen/drivers/acpi/osl.c
@@ -92,10 +92,11 @@ acpi_physical_address __init acpi_os_get_root_pointer(void)
 void __iomem *
 acpi_os_map_memory(acpi_physical_address phys, acpi_size size)
 {
-	if (system_state >= SYS_STATE_boot) {
+	if (1 || system_state >= SYS_STATE_boot) {
 		mfn_t mfn = _mfn(PFN_DOWN(phys));
 		unsigned int offs = phys & (PAGE_SIZE - 1);
 
+        printk("ACPI vmap %08lx\n", phys);
 		/* The low first Mb is always mapped on x86. */
 		if (IS_ENABLED(CONFIG_X86) && !((phys + size - 1) >> 20))
 			return __va(phys);
@@ -114,7 +115,7 @@ void acpi_os_unmap_memory(void __iomem * virt, acpi_size size)
 		return;
 	}
 
-	if (system_state >= SYS_STATE_boot)
+	if (1 || system_state >= SYS_STATE_boot)
 		vunmap((void *)((unsigned long)virt & PAGE_MASK));
 }
 



* Re: [Xen-devel] [PATCH 8/8] x86/setup: lift dom0 creation out into create_dom0() function
  2020-02-01  0:33 ` [Xen-devel] [PATCH 8/8] x86/setup: lift dom0 creation out into create_dom0() function David Woodhouse
@ 2020-02-03 14:28   ` Julien Grall
  2020-02-03 15:03     ` David Woodhouse
  2020-02-21 17:06   ` Jan Beulich
  1 sibling, 1 reply; 33+ messages in thread
From: Julien Grall @ 2020-02-03 14:28 UTC (permalink / raw)
  To: David Woodhouse, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, Paul Durrant, Roger Pau Monné



On 01/02/2020 00:33, David Woodhouse wrote:
>       if ( xen_cpuidle )
>           xen_processor_pmbits |= XEN_PROCESSOR_PM_CX;
>   
> +    printk("%sNX (Execute Disable) protection %sactive\n",
> +           cpu_has_nx ? XENLOG_INFO : XENLOG_WARNING "Warning: ",
> +           cpu_has_nx ? "" : "not ");
> +

The placement of the printk shouldn't matter, but the change feels a bit 
out-of-context. Would you mind explaining it in the commit message?

>       initrdidx = find_first_bit(module_map, mbi->mods_count);
>       if ( initrdidx < mbi->mods_count )
>           initrd = mod + initrdidx;
> @@ -1801,34 +1836,14 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>                  "Multiple initrd candidates, picking module #%u\n",
>                  initrdidx);
>   
> -    /*
> -     * Temporarily clear SMAP in CR4 to allow user-accesses in construct_dom0().
> -     * This saves a large number of corner cases interactions with
> -     * copy_from_user().
> -     */
> -    if ( cpu_has_smap )
> -    {
> -        cr4_pv32_mask &= ~X86_CR4_SMAP;
> -        write_cr4(read_cr4() & ~X86_CR4_SMAP);
> -    }
> -
> -    printk("%sNX (Execute Disable) protection %sactive\n",
> -           cpu_has_nx ? XENLOG_INFO : XENLOG_WARNING "Warning: ",
> -           cpu_has_nx ? "" : "not ");
> -

[...]


Cheers,

-- 
Julien Grall


* Re: [Xen-devel] [PATCH 8/8] x86/setup: lift dom0 creation out into create_dom0() function
  2020-02-03 14:28   ` Julien Grall
@ 2020-02-03 15:03     ` David Woodhouse
  0 siblings, 0 replies; 33+ messages in thread
From: David Woodhouse @ 2020-02-03 15:03 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, Paul Durrant, Roger Pau Monné

On Mon, 2020-02-03 at 14:28 +0000, Julien Grall wrote:
> The placement of printk shouldn't matter but the change feels a bit 
> out-of-context. Would you mind to explain it in the commit message?

I didn't really intend to move the printk up; what I intended to do was
move the setting of 'initrd' down, so that it's right before the
create_dom0() call that it is preparing for. Which is purely cosmetic
for now, and more practical later when all that goes into a single
conditional for the non-live-update boot (as shown below).

Will update the commit message to note this; thanks.


    printk("%sNX (Execute Disable) protection %sactive\n",
           cpu_has_nx ? XENLOG_INFO : XENLOG_WARNING "Warning: ",
           cpu_has_nx ? "" : "not ");

    if ( lu_breadcrumb_phys )
    {
        dom0 = lu_restore_domains(&lu_stream);
        if ( dom0 == NULL )
            panic("No DOM0 found in live update data\n");

        lu_stream_free(&lu_stream);
    }
    else
    {
        initrdidx = find_first_bit(module_map, mbi->mods_count);
        if ( initrdidx < mbi->mods_count )
            initrd = mod + initrdidx;

        if ( bitmap_weight(module_map, mbi->mods_count) > 1 )
            printk(XENLOG_WARNING
                   "Multiple initrd candidates, picking module #%u\n",
                   initrdidx);

        /*
         * We're going to setup domain0 using the module(s) that we
         * stashed safely above our heap. The second module, if
         * present, is an initrd ramdisk.
         */
        dom0 = create_dom0(mod, modules_headroom, initrd, kextra, loader);
        if ( dom0 == NULL )
            panic("Could not set up DOM0 guest OS\n");
    }


* Re: [Xen-devel] [PATCH 1/8] x86/smp: reset x2apic_enabled in smp_send_stop()
  2020-02-01  0:32 ` [Xen-devel] [PATCH 1/8] x86/smp: reset x2apic_enabled in smp_send_stop() David Woodhouse
@ 2020-02-03 16:18   ` Roger Pau Monné
  0 siblings, 0 replies; 33+ messages in thread
From: Roger Pau Monné @ 2020-02-03 16:18 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, xen-devel, Paul Durrant

On Sat, Feb 01, 2020 at 12:32:56AM +0000, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> Just before smp_send_stop() re-enables interrupts when shutting down
> for reboot or kexec, it calls __stop_this_cpu() which in turn calls
> disable_local_APIC(), which puts the APIC back in to the mode Xen found
> it in at boot.
> 
> If that means turning x2APIC off and going back into xAPIC mode, then
> a timer interrupt occurring just after interrupts come back on will
> lead to a GP# when apic_timer_interrupt() attempts to ack the IRQ
> through the EOI register in x2APIC MSR 0x80b:
> 
> (XEN) Executing kexec image on cpu0
> (XEN) ----[ Xen-4.14-unstable  x86_64  debug=n   Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d08026c139>] apic_timer_interrupt+0x29/0x40
> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
> (XEN) rax: 0000000000000000   rbx: 00000000000000fa   rcx: 000000000000080b
> …
> (XEN) Xen code around <ffff82d08026c139> (apic_timer_interrupt+0x29/0x40):
> (XEN)  c0 b9 0b 08 00 00 89 c2 <0f> 30 31 ff e9 0e c9 fb ff 0f 1f 40 00 66 2e 0f
> …
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08026c139>] R apic_timer_interrupt+0x29/0x40
> (XEN)    [<ffff82d080283825>] S do_IRQ+0x95/0x750
> …
> (XEN)    [<ffff82d0802a0ad2>] S smp_send_stop+0x42/0xd0
> 
> We can't clear the global x2apic_enabled variable in disable_local_APIC()
> itself because that runs on each CPU. Instead, correct it (by using
> current_local_apic_mode()) in smp_send_stop() while interrupts are still
> disabled immediately after calling __stop_this_cpu() for the boot CPU,
> after all other CPUs have been stopped.
> 
> cf: d639bdd9bbe ("x86/apic: Disable the LAPIC later in smp_send_stop()")
>     ... which didn't quite fix it completely.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks!

> ---
>  xen/arch/x86/smp.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
> index 65eb7cbda8..fac295fa6f 100644
> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -354,6 +354,7 @@ void smp_send_stop(void)
>          disable_IO_APIC();
>          hpet_disable();
>          __stop_this_cpu();
> +        x2apic_enabled = (current_local_apic_mode() == APIC_MODE_X2APIC);

You could do this only when kexecing, but it's safe to do
unconditionally, and might be helpful if we also decide to play with
the lapic mode even when not kexecing.

Roger.


* Re: [Xen-devel] [PATCH 5/8] xen/vmap: allow vmap() to be called during early boot
  2020-02-03 14:00   ` Julien Grall
@ 2020-02-03 16:37     ` David Woodhouse
  2020-02-04 11:00       ` George Dunlap
  2020-02-09 18:19       ` Julien Grall
  0 siblings, 2 replies; 33+ messages in thread
From: David Woodhouse @ 2020-02-03 16:37 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, Paul Durrant, Roger Pau Monné

On Mon, 2020-02-03 at 14:00 +0000, Julien Grall wrote:
> Hi David,
> 
> On 01/02/2020 00:33, David Woodhouse wrote:
> > From: David Woodhouse <dwmw@amazon.co.uk>
> 
> I am a bit concerned with this change, particularly the consequences it 
> has for the page-tables. There is an assumption that intermediate 
> page-tables allocated via the boot allocator will never be freed.
> 
> On x86, a call to vunmap() will not free page-tables, but a subsequent 
> call to vmap() may free them, depending on the mapping size. So we would 
> call free_domheap_pages() rather than init_heap_pages().
> 
> I am not entirely sure what the full consequence is, but I think this 
> calls for investigation; a summary should be written down in the commit 
> message.

This isn't just about page tables, right? It's about *any* allocation
given out by the boot allocator, being freed with free_heap_pages() ? 

Given the amount of code that has conditionals in both alloc and free
paths along the lines of…

  if (system_state > SYS_STATE_boot)
      use xenheap
  else
      use boot allocator

… I'm not sure I'd really trust the assumption that such a thing never
happens; that no pages are ever allocated from the boot allocator and
then freed into the heap.

In fact it does work fine except for some esoteric corner cases,
because init_heap_pages() is mostly just a trivial loop over
free_heap_pages().

The corner cases are if you call free_heap_pages() on boot-allocated
memory which matches one or more of the following criteria:

 • Includes MFN #0,

 • Includes the first page the heap has seen on a given node, so
   init_node_heap() has to be called, or

 • High-order allocations crossing from one node to another.

The first is trivial to fix by lifting the check for MFN#0 up into
free_heap_pages(). The second isn't hard either.

I'll have to think about the third a little longer... one option might
be to declare that it's OK to call free_heap_pages() on a *single* page
that was allocated from the boot allocator, but not higher orders (in
case they cross nodes). That would at least cover the specific concern
about page table pages and vm_init()/vmap().



* Re: [Xen-devel] [PATCH 5/8] xen/vmap: allow vmap() to be called during early boot
  2020-02-03 16:37     ` David Woodhouse
@ 2020-02-04 11:00       ` George Dunlap
  2020-02-04 11:06         ` David Woodhouse
  2020-02-09 18:19       ` Julien Grall
  1 sibling, 1 reply; 33+ messages in thread
From: George Dunlap @ 2020-02-04 11:00 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	Andrew Cooper, Varad Gautam, Ian Jackson, Hongyan Xia, xen-devel,
	Paul Durrant, Roger Pau Monné

On Mon, Feb 3, 2020 at 4:37 PM David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Mon, 2020-02-03 at 14:00 +0000, Julien Grall wrote:
> > Hi David,
> >
> > On 01/02/2020 00:33, David Woodhouse wrote:
> > > From: David Woodhouse <dwmw@amazon.co.uk>
> >
> > I am a bit concerned with this change, particularly the consequences it
> > has for the page-tables. There is an assumption that intermediate
> > page-tables allocated via the boot allocator will never be freed.
> >
> > On x86, a call to vunmap() will not free page-tables, but a subsequent
> > call to vmap() may free them, depending on the mapping size. So we would
> > call free_domheap_pages() rather than init_heap_pages().
> >
> > I am not entirely sure what the full consequence is, but I think this
> > calls for investigation; a summary should be written down in the commit
> > message.
>
> This isn't just about page tables, right? It's about *any* allocation
> given out by the boot allocator, being freed with free_heap_pages() ?
>
> Given the amount of code that has conditionals in both alloc and free
> paths along the lines of…
>
>   if (system_state > SYS_STATE_boot)
>       use xenheap
>   else
>       use boot allocator
>
> … I'm not sure I'd really trust the assumption that such a thing never
> happens; that no pages are ever allocated from the boot allocator and
> then freed into the heap.
>
> In fact it does work fine except for some esoteric corner cases,
> because init_heap_pages() is mostly just a trivial loop over
> free_heap_pages().
>
> The corner cases are if you call free_heap_pages() on boot-allocated
> memory which matches one or more of the following criteria:
>
>  • Includes MFN #0,
>
>  • Includes the first page the heap has seen on a given node, so
>    init_node_heap() has to be called, or
>
>  • High-order allocations crossing from one node to another.

I was asked to forward a message relating to MFN 0 and allocations
crossing zones from a private discussion on the security list:

8<---

> I am having difficulty seeing how invalidating MFN0 would solve the issue here.
> The zone number for a specific page is calculated from the most significant bit
> position set in its MFN. As a result, each successive zone contains an order of
> magnitude more pages. You would need to invalidate the first or last MFN in each
> zone.

Because (unless Jan and I are reading the code wrong):

* Chunks can only be merged such that they end up on order-boundaries.
* Chunks can only be merged if they are the same order.
* Zone boundaries are on order boundaries.

So say you're freeing mfn 0x100, and mfn 0xff is free.  In that loop, (1
<< order) & mfn will always be 0, so it will always only look "forward"
for things to merge, not backwards.

Suppose on the other hand, that you're freeing mfn 0x101, and 0x98
through 0x100 are free.  The loop will look "backwards" and merge with
0x100; but then it will look "forwards" again.

Now suppose you've merged 0x100-0x1ff, and the order moves up to size
0x100.  Now the mask becomes 0x1ff; so it can't merge with 0x200-0x2ff
(which would cross zones); instead it looks backwards to 0x0-0xff.

We don't think it's possible for things to be merged across zones unless
it can (say) start at 0xff, and merge all the way back to 0x0; which
can't be done if 0x0 is never on the free list.

That's the idea anyway.  That would explain why we've never seen it on
x86 -- due to the way the architecture is, mfn 0 is never on the free list.

--->8

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* Re: [Xen-devel] [PATCH 5/8] xen/vmap: allow vmap() to be called during early boot
  2020-02-04 11:00       ` George Dunlap
@ 2020-02-04 11:06         ` David Woodhouse
  2020-02-04 11:18           ` David Woodhouse
  0 siblings, 1 reply; 33+ messages in thread
From: David Woodhouse @ 2020-02-04 11:06 UTC (permalink / raw)
  To: George Dunlap
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	Andrew Cooper, Varad Gautam, Ian Jackson, Hongyan Xia, xen-devel,
	Paul Durrant, Roger Pau Monné

On Tue, 2020-02-04 at 11:00 +0000, George Dunlap wrote:
> On Mon, Feb 3, 2020 at 4:37 PM David Woodhouse <dwmw2@infradead.org> wrote:
> > 
> > On Mon, 2020-02-03 at 14:00 +0000, Julien Grall wrote:
> > > Hi David,
> > > 
> > > On 01/02/2020 00:33, David Woodhouse wrote:
> > > > From: David Woodhouse <dwmw@amazon.co.uk>
> > > 
> > > I am a bit concerned with this change, particularly the consequences it
> > > has for the page-tables. There is an assumption that intermediate
> > > page-tables allocated via the boot allocator will never be freed.
> > > 
> > > On x86, a call to vunmap() will not free page-tables, but a subsequent
> > > call to vmap() may free it depending on the mapping size. So we would
> > > call free_domheap_pages() rather than init_heap_pages().
> > > 
> > > I am not entirely sure what the full consequence is, but I think this
> > > calls for investigation, and for a summary to be written down in the commit message.
> > 
> > This isn't just about page tables, right? It's about *any* allocation
> > given out by the boot allocator, being freed with free_heap_pages() ?
> > 
> > Given the amount of code that has conditionals in both alloc and free
> > paths along the lines of…
> > 
> >   if (system_state > SYS_STATE_boot)
> >       use xenheap
> >   else
> >       use boot allocator
> > 
> > … I'm not sure I'd really trust the assumption that such a thing never
> > happens; that no pages are ever allocated from the boot allocator and
> > then freed into the heap.
> > 
> > In fact it does work fine except for some esoteric corner cases,
> > because init_heap_pages() is mostly just a trivial loop over
> > free_heap_pages().
> > 
> > The corner cases are if you call free_heap_pages() on boot-allocated
> > memory which matches one or more of the following criteria:
> > 
> >  • Includes MFN #0,
> > 
> >  • Includes the first page the heap has seen on a given node, so
> >    init_node_heap() has to be called, or
> > 
> >  • High-order allocations crossing from one node to another.
> 
> I was asked to forward a message relating to MFN 0 and allocations
> crossing zones from a private discussion on the security list:
> 
> 8<---
> 
> > I am having difficulty seeing how invalidating MFN0 would solve the issue here.
> > The zone number for a specific page is calculated from the most significant bit
> > position set in its MFN. As a result, each successive zone contains an order of
> > magnitude more pages. You would need to invalidate the first or last MFN in each
> > zone.
> 
> Because (unless Jan and I are reading the code wrong):
> 
> * Chunks can only be merged such that they end up on order-boundaries.
> * Chunks can only be merged if they are the same order.
> * Zone boundaries are on order boundaries.
> 
> So say you're freeing mfn 0x100, and mfn 0xff is free.  In that loop, (1
> << order) & mfn will always be 0, so it will always only look "forward"
> for things to merge, not backwards.
> 
> Suppose on the other hand, that you're freeing mfn 0x101, and 0x98
> through 0x100 are free.  The loop will look "backwards" and merge with
> 0x100; but then it will look "forwards" again.
> 
> Now suppose you've merged 0x100-0x1ff, and the order moves up to size
> 0x100.  Now the mask becomes 0x1ff; so it can't merge with 0x200-0x2ff
> (which would cross zones); instead it looks backwards to 0x0-0xff.
> 
> We don't think it's possible for things to be merged across zones unless
> it can (say) start at 0xff, and merge all the way back to 0x0; which
> can't be done if 0x0 is never on the free list.
> 
> That's the idea anyway.  That would explain why we've never seen it on
> x86 -- due to the way the architecture is, mfn 0 is never on the free list.
> 
> --->8

Thanks.

I still don't really get it. What if the zone boundary is at MFN 0x300?

What prevents the buddy allocator from merging a range at 0x200-0x2FF
with another at 0x300-0x3FF, creating a single range 0x200-0x3FF
which crosses nodes?

The MFN0 trick only works if all zone boundaries must be at an address
which is 2ⁿ, doesn't it? Is that always true?



* Re: [Xen-devel] [PATCH 5/8] xen/vmap: allow vmap() to be called during early boot
  2020-02-04 11:06         ` David Woodhouse
@ 2020-02-04 11:18           ` David Woodhouse
  0 siblings, 0 replies; 33+ messages in thread
From: David Woodhouse @ 2020-02-04 11:18 UTC (permalink / raw)
  To: George Dunlap
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	Andrew Cooper, Varad Gautam, Ian Jackson, Hongyan Xia, xen-devel,
	Paul Durrant, Roger Pau Monné

On Tue, 2020-02-04 at 11:06 +0000, David Woodhouse wrote:
> 
> I still don't really get it. What if the zone boundary is at MFN 0x300?
> 
> What prevents the buddy allocator from merging a range at 0x200-0x2FF
> with another at 0x300-0x3FF, creating a single range 0x200-0x3FF
> which crosses nodes?
> 
> The MFN0 trick only works if all zone boundaries must be at an address
> which is 2ⁿ, doesn't it? Is that always true?


Ah yes, it is. And you called it out explicitly. Sorry.

I think I was conflating NUMA nodes with the allocator zones.

On the plus side, this is a whole lot easier for the boot allocator to
cope with. It's easy enough to avoid handing out any high-order
allocations from the boot allocator which would cross a *zone*
boundary. It's crossing *node* boundaries which I was thinking of,
which would be harder before we've even parsed the SRAT...


* Re: [Xen-devel] [PATCH 5/8] xen/vmap: allow vmap() to be called during early boot
  2020-02-03 16:37     ` David Woodhouse
  2020-02-04 11:00       ` George Dunlap
@ 2020-02-09 18:19       ` Julien Grall
  1 sibling, 0 replies; 33+ messages in thread
From: Julien Grall @ 2020-02-09 18:19 UTC (permalink / raw)
  To: David Woodhouse, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, Jan Beulich, Paul Durrant, Roger Pau Monné

Hi,

Answering on this thread as well, so it is easier to track.

On 03/02/2020 16:37, David Woodhouse wrote:
> On Mon, 2020-02-03 at 14:00 +0000, Julien Grall wrote:
>> Hi David,
>>
>> On 01/02/2020 00:33, David Woodhouse wrote:
>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>
>> I am a bit concerned with this change, particularly the consequences it
>> has for the page-tables. There is an assumption that intermediate
>> page-tables allocated via the boot allocator will never be freed.
>>
>> On x86, a call to vunmap() will not free page-tables, but a subsequent
>> call to vmap() may free it depending on the mapping size. So we would
>> call free_domheap_pages() rather than init_heap_pages().
>>
>> I am not entirely sure what the full consequence is, but I think this
>> calls for investigation, and for a summary to be written down in the commit message.
> 
> This isn't just about page tables, right? It's about *any* allocation
> given out by the boot allocator, being freed with free_heap_pages() ?

That's correct.

> 
> Given the amount of code that has conditionals in both alloc and free
> paths along the lines of…
> 
>    if (system_state > SYS_STATE_boot)
>        use xenheap
>    else
>        use boot allocator
> 
> … I'm not sure I'd really trust the assumption that such a thing never
> happens; that no pages are ever allocated from the boot allocator and
> then freed into the heap.

I believe this assumption holds on Arm because page-tables are so far 
never freed. The only other case is in the ACPI code 
(acpi_os_free_memory()), but I think that would result in a leak instead 
(see is_xmalloc_memory()).

I hadn't realized that this assumption does not hold on x86 :(. 
Actually, from the discussion on the MFN 0 thread, it looks like x86 is 
abusing quite a few interfaces in page alloc.

Since this is nothing new, it is probably yet another good candidate for 
future clean-up/hardening, as long as the assumption holds outside of x86.

Cheers,

-- 
Julien Grall


* Re: [Xen-devel] [PATCH 4/8] xen/vmap: allow vm_init_type to be called during early_boot
  2020-02-01  0:32 ` [Xen-devel] [PATCH 4/8] xen/vmap: allow vm_init_type to be called during early_boot David Woodhouse
@ 2020-02-13 10:36   ` Julien Grall
  2020-02-21 16:42   ` Jan Beulich
  1 sibling, 0 replies; 33+ messages in thread
From: Julien Grall @ 2020-02-13 10:36 UTC (permalink / raw)
  To: David Woodhouse, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, Jan Beulich, Paul Durrant, Roger Pau Monné

Hi David,

On 01/02/2020 01:32, David Woodhouse wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> We want to move vm_init, which calls vm_init_type under the hood, to
> the early boot stage. Add a path to get pages from the boot allocator instead.
> 
> Add an emacs block to that file while I was there.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>   xen/common/vmap.c | 22 +++++++++++++++++++---
>   1 file changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/common/vmap.c b/xen/common/vmap.c
> index faebc1ddf1..37922f735b 100644
> --- a/xen/common/vmap.c
> +++ b/xen/common/vmap.c
> @@ -34,9 +34,15 @@ void __init vm_init_type(enum vmap_region type, void *start, void *end)
>   
>       for ( i = 0, va = (unsigned long)vm_bitmap(type); i < nr; ++i, va += PAGE_SIZE )
>       {
> -        struct page_info *pg = alloc_domheap_page(NULL, 0);
> -
> -        map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR);
> +        mfn_t mfn;
> +        int rc;
> +
> +        if ( system_state == SYS_STATE_early_boot )
> +            mfn = alloc_boot_pages(1, 1);
> +        else
> +            mfn = page_to_mfn(alloc_domheap_page(NULL, 0));

It looks like the function was doing a pretty bad job of checking the 
return values :(.

As you add a check for map_pages_to_xen(), would you mind adding another 
one for alloc_domheap_page()?

Other than that, the code looks good to me.

> +        rc = map_pages_to_xen(va, mfn, 1, PAGE_HYPERVISOR);
> +        BUG_ON(rc);
>           clear_page((void *)va);
>       }
>       bitmap_fill(vm_bitmap(type), vm_low[type]);
> @@ -330,3 +336,13 @@ void vfree(void *va)
>           free_domheap_page(pg);
>   }
>   #endif
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> 

Cheers,

-- 
Julien Grall


* Re: [Xen-devel] [PATCH 7/8] x86/setup: simplify handling of initrdidx when no initrd present
  2020-02-01  0:33 ` [Xen-devel] [PATCH 7/8] x86/setup: simplify handling of initrdidx when no initrd present David Woodhouse
@ 2020-02-13 10:47   ` Julien Grall
  2020-02-21 16:59   ` Jan Beulich
  1 sibling, 0 replies; 33+ messages in thread
From: Julien Grall @ 2020-02-13 10:47 UTC (permalink / raw)
  To: David Woodhouse, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, Jan Beulich, Paul Durrant, Roger Pau Monné

Hi David,

On 01/02/2020 01:33, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> Remove a ternary operator that made my brain hurt.
> 
> Replace it with something simpler that makes it somewhat clearer that
> the check for initrdidx < mbi->mods_count is because mbi->mods_count
> is what find_first_bit() will return when it doesn't find anything.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>   xen/arch/x86/setup.c | 9 +++++----
>   1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index 5f68a1308f..10209e6bfb 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -687,7 +687,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>       char *cmdline, *kextra, *loader;
>       unsigned int initrdidx, num_parked = 0;
>       multiboot_info_t *mbi;
> -    module_t *mod;
> +    module_t *mod, *initrd = NULL;
>       unsigned long nr_pages, raw_max_page, modules_headroom, module_map[1];
>       int i, j, e820_warn = 0, bytes = 0;
>       bool acpi_boot_table_init_done = false, relocated = false;
> @@ -1793,6 +1793,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>           xen_processor_pmbits |= XEN_PROCESSOR_PM_CX;
>   
>       initrdidx = find_first_bit(module_map, mbi->mods_count);
> +    if ( initrdidx < mbi->mods_count )

IIUC, the check on initrdidx > 0 was pointless, as bit 0 will always be 
cleared (it is used for the dom0 kernel). It might be worth mentioning 
this in the commit message. Something like:

"The initrd can never be in the first module (i.e. initrdidx == 0) as 
that is always used by the dom0 kernel. So the check can be simplified."

Acked-by: Julien Grall <julien@xen.org>

> +        initrd = mod + initrdidx;
> +
>       if ( bitmap_weight(module_map, mbi->mods_count) > 1 )
>           printk(XENLOG_WARNING
>                  "Multiple initrd candidates, picking module #%u\n",
> @@ -1817,9 +1820,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>        * We're going to setup domain0 using the module(s) that we stashed safely
>        * above our heap. The second module, if present, is an initrd ramdisk.
>        */
> -    if ( construct_dom0(dom0, mod, modules_headroom,
> -                        (initrdidx > 0) && (initrdidx < mbi->mods_count)
> -                        ? mod + initrdidx : NULL, cmdline) != 0)
> +    if ( construct_dom0(dom0, mod, modules_headroom, initrd, cmdline) != 0 )
>           panic("Could not set up DOM0 guest OS\n");
>   
>       if ( cpu_has_smap )
> 

Cheers,

-- 
Julien Grall


* Re: [Xen-devel] [PATCH 2/8] x86/setup: Fix badpage= handling for memory above HYPERVISOR_VIRT_END
  2020-02-01  0:32 ` [Xen-devel] [PATCH 2/8] x86/setup: Fix badpage= handling for memory above HYPERVISOR_VIRT_END David Woodhouse
  2020-02-03 10:57   ` Julien Grall
@ 2020-02-20 15:38   ` Jan Beulich
  1 sibling, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2020-02-20 15:38 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, xen-devel, Paul Durrant, Roger Pau Monné

On 01.02.2020 01:32, David Woodhouse wrote:
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1758,6 +1758,18 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
>      return 0;
>  }
>  
> +static unsigned long contig_avail_pages(struct page_info *pg, unsigned long max_pages)
> +{
> +    unsigned long i;
> +
> +    for ( i = 0 ; i < max_pages; i++)

Nit: Stray blank before first semicolon.

> +    {
> +        if ( pg[i].count_info & PGC_broken )
> +            break;
> +    }
> +    return i;

Further nits: Commonly we omit the braces in cases like this one.
We also like to have blank lines before the main return statement
of a function.

> @@ -1846,6 +1863,63 @@ static unsigned long avail_heap_pages(
>      return free_pages;
>  }
>  
> +static void mark_bad_pages(void)

__init please

> +{
> +    unsigned long bad_spfn, bad_epfn;
> +    const char *p;
> +    struct page_info *pg;
> +#ifdef CONFIG_X86
> +    const struct platform_bad_page *badpage;
> +    unsigned int i, j, array_size;
> +
> +    badpage = get_platform_badpages(&array_size);
> +    if ( badpage )
> +    {
> +        for ( i = 0; i < array_size; i++ )
> +        {
> +            for ( j = 0; j < 1UL << badpage->order; j++ )

Either you mean badpage[i].* here and below, or you're missing an
increment of badpage somewhere.

> +            {
> +                if ( mfn_valid(_mfn(badpage->mfn + j)) )
> +                {
> +                    pg = mfn_to_page(_mfn(badpage->mfn + j));
> +                    pg->count_info |= PGC_broken;
> +                    page_list_add_tail(pg, &page_broken_list);
> +                }
> +            }
> +        }
> +    }
> +#endif
> +
> +    /* Check new pages against the bad-page list. */
> +    p = opt_badpage;
> +    while ( *p != '\0' )
> +    {
> +        bad_spfn = simple_strtoul(p, &p, 0);
> +        bad_epfn = bad_spfn;
> +
> +        if ( *p == '-' )
> +        {
> +            p++;
> +            bad_epfn = simple_strtoul(p, &p, 0);
> +            if ( bad_epfn < bad_spfn )
> +                bad_epfn = bad_spfn;
> +        }
> +
> +        if ( *p == ',' )
> +            p++;
> +        else if ( *p != '\0' )
> +            break;

I think this part of the parsing, common with init_boot_pages(), would
better be abstracted out, such that there will be just one instance
of it, and hence no risk of things going out of sync.

> +        while ( mfn_valid(_mfn(bad_spfn)) && bad_spfn < bad_epfn )

As per init_boot_pages() as well as per the "bad_epfn = bad_spfn;"
you have further up, the range here is inclusive at its end. I'm
also uncertain about stopping at the first !mfn_valid() -
there may well be further valid pages later on.

Jan


* Re: [Xen-devel] [PATCH 4/8] xen/vmap: allow vm_init_type to be called during early_boot
  2020-02-01  0:32 ` [Xen-devel] [PATCH 4/8] xen/vmap: allow vm_init_type to be called during early_boot David Woodhouse
  2020-02-13 10:36   ` Julien Grall
@ 2020-02-21 16:42   ` Jan Beulich
  1 sibling, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2020-02-21 16:42 UTC (permalink / raw)
  To: David Woodhouse, Wei Liu
  Cc: Stefano Stabellini, Julien Grall, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, xen-devel, Paul Durrant, Roger Pau Monné

On 01.02.2020 01:32, David Woodhouse wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> We want to move vm_init, which calls vm_init_type under the hood, to
> the early boot stage. Add a path to get pages from the boot allocator instead.
> 
> Add an emacs block to that file while I was there.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>


* Re: [Xen-devel] [PATCH 5/8] xen/vmap: allow vmap() to be called during early boot
  2020-02-01  0:33 ` [Xen-devel] [PATCH 5/8] xen/vmap: allow vmap() to be called during early boot David Woodhouse
  2020-02-03 14:00   ` Julien Grall
@ 2020-02-21 16:46   ` Jan Beulich
  1 sibling, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2020-02-21 16:46 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, xen-devel, Paul Durrant, Roger Pau Monné

On 01.02.2020 01:33, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

A word on the "why" would be nice, just like Wei has in the previous
patch. Besides that, just two cosmetic requests:

> --- a/xen/common/vmap.c
> +++ b/xen/common/vmap.c
> @@ -68,7 +68,7 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
>      spin_lock(&vm_lock);
>      for ( ; ; )
>      {
> -        struct page_info *pg;
> +        mfn_t mfn;
>  
>          ASSERT(vm_low[t] == vm_top[t] || !test_bit(vm_low[t], vm_bitmap(t)));
>          for ( start = vm_low[t]; start < vm_top[t]; )
> @@ -103,9 +103,17 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
>          if ( vm_top[t] >= vm_end[t] )
>              return NULL;
>  
> -        pg = alloc_domheap_page(NULL, 0);
> -        if ( !pg )
> -            return NULL;
> +        if ( system_state == SYS_STATE_early_boot )
> +        {
> +            mfn = alloc_boot_pages(1, 1);
> +        }

Please omit the braces in cases like this one.

> +        else
> +        {
> +            struct page_info *pg = alloc_domheap_page(NULL, 0);
> +            if ( !pg )

Please have a blank line between declaration(s) and statement(s).

Jan


* Re: [Xen-devel] [PATCH 6/8] x86/setup: move vm_init() before end_boot_allocator()
  2020-02-01  0:33 ` [Xen-devel] [PATCH 6/8] x86/setup: move vm_init() before end_boot_allocator() David Woodhouse
  2020-02-03 11:10   ` Xia, Hongyan
@ 2020-02-21 16:48   ` Jan Beulich
  1 sibling, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2020-02-21 16:48 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, xen-devel, Paul Durrant, Roger Pau Monné

On 01.02.2020 01:33, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> We would like to be able to use vmap() to map the live update data, and
> we need to do a first pass of the live update data before we prime the
> heap because we need to know which pages need to be preserved.
> 
> The warning about ACPI code can be dropped, since that problem no longer
> exists when things are done in this order.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

Reviewed-by: Jan Beulich <jbeulich@suse.com>


* Re: [Xen-devel] [PATCH 7/8] x86/setup: simplify handling of initrdidx when no initrd present
  2020-02-01  0:33 ` [Xen-devel] [PATCH 7/8] x86/setup: simplify handling of initrdidx when no initrd present David Woodhouse
  2020-02-13 10:47   ` Julien Grall
@ 2020-02-21 16:59   ` Jan Beulich
  2020-02-24 13:31     ` Julien Grall
  1 sibling, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2020-02-21 16:59 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, xen-devel, Paul Durrant, Roger Pau Monné

On 01.02.2020 01:33, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> Remove a ternary operator that made my brain hurt.

Personally I'd prefer the code to stay as is, but if Andrew agrees
with this being an improvement, then I also wouldn't want to stand
in the way. If it is to go in I have a few small adjustment requests:

> Replace it with something simpler that makes it somewhat clearer that
> the check for initrdidx < mbi->mods_count is because mbi->mods_count
> is what find_first_bit() will return when it doesn't find anything.

Especially in light of the recent XSA-307 I'd like to ask that we
avoid imprecise statements like this: Afaict find_first_bit() may
also validly return any value larger than the passed in bitmap
length.

> @@ -1793,6 +1793,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>          xen_processor_pmbits |= XEN_PROCESSOR_PM_CX;
>  
>      initrdidx = find_first_bit(module_map, mbi->mods_count);
> +    if ( initrdidx < mbi->mods_count )
> +        initrd = mod + initrdidx;
> +
>      if ( bitmap_weight(module_map, mbi->mods_count) > 1 )
>          printk(XENLOG_WARNING
>                 "Multiple initrd candidates, picking module #%u\n",

Since this if() is tightly coupled with the find_first_bit()
invocation, I'd like to ask for there to not be a blank line in
between.

Jan


* Re: [Xen-devel] [PATCH 8/8] x86/setup: lift dom0 creation out into create_dom0() function
  2020-02-01  0:33 ` [Xen-devel] [PATCH 8/8] x86/setup: lift dom0 creation out into create_dom0() function David Woodhouse
  2020-02-03 14:28   ` Julien Grall
@ 2020-02-21 17:06   ` Jan Beulich
  1 sibling, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2020-02-21 17:06 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, xen-devel, Paul Durrant, Roger Pau Monné

On 01.02.2020 01:33, David Woodhouse wrote:
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -678,6 +678,92 @@ static unsigned int __init copy_bios_e820(struct e820entry *map, unsigned int li
>      return n;
>  }
>  
> +static struct domain * __init create_dom0(const module_t *image,
> +                                          unsigned long headroom,
> +                                          module_t *initrd, char *kextra,
> +                                          char *loader)

Can any of these last three be pointer-to-const?

> +{
> +    struct xen_domctl_createdomain dom0_cfg = {
> +        .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
> +        .max_evtchn_port = -1,
> +        .max_grant_frames = -1,
> +        .max_maptrack_frames = -1,
> +    };
> +    struct domain *d;
> +    char *cmdline;
> +
> +    if ( opt_dom0_pvh )
> +    {
> +        dom0_cfg.flags |= (XEN_DOMCTL_CDF_hvm |
> +                           ((hvm_hap_supported() && !opt_dom0_shadow) ?
> +                            XEN_DOMCTL_CDF_hap : 0));
> +
> +        dom0_cfg.arch.emulation_flags |=
> +            XEN_X86_EMU_LAPIC | XEN_X86_EMU_IOAPIC | XEN_X86_EMU_VPCI;
> +    }
> +    dom0_cfg.max_vcpus = dom0_max_vcpus();

Can this not be part of the initializer now?

> +    if ( iommu_enabled )
> +        dom0_cfg.flags |= XEN_DOMCTL_CDF_iommu;
> +
> +    /* Create initial domain 0. */
> +    d = domain_create(get_initial_domain_id(), &dom0_cfg, !pv_shim);
> +    if ( IS_ERR(d) || (alloc_dom0_vcpu0(d) == NULL) )
> +        panic("Error creating domain 0\n");
> +
> +    /* Grab the DOM0 command line. */
> +    cmdline = (char *)(image->string ? __va(image->string) : NULL);

Is this cast needed? (I know you're only moving the code, but some
easy cleanup would be nice anyway.)

> +    if ( (cmdline != NULL) || (kextra != NULL) )

Similarly here you may want to consider shortening to

    if ( cmdline || kextra )

At least one more similar case further down.

Jan


* Re: [Xen-devel] [PATCH 7/8] x86/setup: simplify handling of initrdidx when no initrd present
  2020-02-21 16:59   ` Jan Beulich
@ 2020-02-24 13:31     ` Julien Grall
  2020-02-25 12:34       ` Jan Beulich
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2020-02-24 13:31 UTC (permalink / raw)
  To: Jan Beulich, David Woodhouse
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, xen-devel, Paul Durrant, Roger Pau Monné

Hi Jan,

On 21/02/2020 16:59, Jan Beulich wrote:
> On 01.02.2020 01:33, David Woodhouse wrote:
>> From: David Woodhouse <dwmw@amazon.co.uk>
>>
>> Remove a ternary operator that made my brain hurt.
> 
> Personally I'd prefer the code to stay as is, but if Andrew agrees
> with this being an improvement, then I also wouldn't want to stand
> in the way. If it is to go in I have a few small adjustment requests:
> 
>> Replace it with something simpler that makes it somewhat clearer that
>> the check for initrdidx < mbi->mods_count is because mbi->mods_count
>> is what find_first_bit() will return when it doesn't find anything.
> 
> Especially in light of the recent XSA-307 I'd like to ask that we
> avoid imprecise statements like this: Afaict find_first_bit() may
> also validly return any value larger than the passed in bitmap
> length.

Is it? I thought that all the implementations now return 'size' in all
the error cases.

Cheers,

-- 
Julien Grall


* Re: [Xen-devel] [PATCH 7/8] x86/setup: simplify handling of initrdidx when no initrd present
  2020-02-24 13:31     ` Julien Grall
@ 2020-02-25 12:34       ` Jan Beulich
  2020-02-26  7:13         ` Julien Grall
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2020-02-25 12:34 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, xen-devel, Paul Durrant, David Woodhouse,
	Roger Pau Monné

On 24.02.2020 14:31, Julien Grall wrote:
> On 21/02/2020 16:59, Jan Beulich wrote:
>> On 01.02.2020 01:33, David Woodhouse wrote:
>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>
>>> Remove a ternary operator that made my brain hurt.
>>
>> Personally I'd prefer the code to stay as is, but if Andrew agrees
>> with this being an improvement, then I also wouldn't want to stand
>> in the way. If it is to go in I have a few small adjustment requests:
>>
>>> Replace it with something simpler that makes it somewhat clearer that
>>> the check for initrdidx < mbi->mods_count is because mbi->mods_count
>>> is what find_first_bit() will return when it doesn't find anything.
>>
>> Especially in light of the recent XSA-307 I'd like to ask that we
>> avoid imprecise statements like this: Afaict find_first_bit() may
>> also validly return any value larger than the passed in bitmap
>> length.
> 
> Is it? I thought that all the implementations now return 'size' in all
> the error cases.

Taking (part of) the x86 example:

>#define find_next_bit(addr, size, off) ({                                   \
>    unsigned int r__;                                                       \
>    const unsigned long *a__ = (addr);                                      \
>    unsigned int s__ = (size);                                              \
>    unsigned int o__ = (off);                                               \
>    if ( o__ >= s__ )                                                       \
>        r__ = s__;                                                          \

This is what did get adjusted, guaranteeing "size" to be returned for
too large an "offset".

>    else if ( __builtin_constant_p(size) && s__ <= BITS_PER_LONG )          \
>        r__ = o__ + __scanbit(*(const unsigned long *)(a__) >> o__, s__);   \

Yet this was (deliberately) left untouched. Without s__ getting
replaced by s__ - o__ it may still produce a value larger than
"size", though.

Further, even if all current implementations obeyed the more
strict rule, this still wouldn't mean callers are actually permitted
to assume such. The more strict rule would then also need to be
written down, such that it won't get violated again in the future
(by changes to an existing arch's implementation, or by a new port).

Jan


* Re: [Xen-devel] [PATCH 7/8] x86/setup: simplify handling of initrdidx when no initrd present
  2020-02-25 12:34       ` Jan Beulich
@ 2020-02-26  7:13         ` Julien Grall
  2020-02-26  8:37           ` Jan Beulich
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2020-02-26  7:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Varad Gautam, Ian Jackson,
	Hongyan Xia, xen-devel, Paul Durrant, David Woodhouse,
	Roger Pau Monné

Hi Jan,

On 25/02/2020 12:34, Jan Beulich wrote:
> On 24.02.2020 14:31, Julien Grall wrote:
>> On 21/02/2020 16:59, Jan Beulich wrote:
>>> On 01.02.2020 01:33, David Woodhouse wrote:
>>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>>
>>>> Remove a ternary operator that made my brain hurt.
>>>
>>> Personally I'd prefer the code to stay as is, but if Andrew agrees
>>> with this being an improvement, then I also wouldn't want to stand
>>> in the way. If it is to go in I have a few small adjustment requests:
>>>
>>>> Replace it with something simpler that makes it somewhat clearer that
>>>> the check for initrdidx < mbi->mods_count is because mbi->mods_count
>>>> is what find_first_bit() will return when it doesn't find anything.
>>>
>>> Especially in light of the recent XSA-307 I'd like to ask that we
>>> avoid imprecise statements like this: Afaict find_first_bit() may
>>> also validly return any value larger than the passed in bitmap
>>> length.
>>
>> Is it? I thought that all the implementations now return 'size' in all
>> the error cases.
> 
> Taking (part of) the x86 example:
> 
>> #define find_next_bit(addr, size, off) ({                                   \
>>     unsigned int r__;                                                       \
>>     const unsigned long *a__ = (addr);                                      \
>>     unsigned int s__ = (size);                                              \
>>     unsigned int o__ = (off);                                               \
>>     if ( o__ >= s__ )                                                       \
>>         r__ = s__;                                                          \
> 
> This is what did get adjusted, guaranteeing "size" to be returned for
> too large an "offset".
> 
>>     else if ( __builtin_constant_p(size) && s__ <= BITS_PER_LONG )          \
>>         r__ = o__ + __scanbit(*(const unsigned long *)(a__) >> o__, s__);   \
> 
> Yet this was (deliberately) left untouched. Without s__ getting
> replaced by s__ - o__ it may still produce a value larger than
> "size", though.

Ah I missed this part.

> 
> Further, even if all current implementations obeyed the more
> strict rule, this still wouldn't mean callers are actually permitted
> to assume such. The more strict rule would then also need to be
> written down, such that it won't get violated again in the future
> (by changes to an existing arch's implementation, or by a new port).

To be honest, the rule should be written down in any case. The current 
one is not necessarily obvious, and it also differs from what Linux 
folks might expect.

Regarding future ports, the number of architectures in Linux using 
custom bitops is fairly limited (AFAICT only arm32 and unicore32). All 
the rest (including x86) use the generic implementation.

On Xen, Arm64 is already using the generic implementation. Is there any 
particular concern about using it for x86 as well?

If not, I can put together a patch to use the generic implementation on 
x86/arm32. This would resolve the discrepancies in the find_*_bit 
implementations.

Cheers,

-- 
Julien Grall


* Re: [Xen-devel] [PATCH 7/8] x86/setup: simplify handling of initrdidx when no initrd present
  2020-02-26  7:13         ` Julien Grall
@ 2020-02-26  8:37           ` Jan Beulich
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2020-02-26  8:37 UTC (permalink / raw)
  To: Julien Grall, Andrew Cooper
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Varad Gautam, Ian Jackson, Hongyan Xia, xen-devel,
	Paul Durrant, David Woodhouse, Roger Pau Monné

On 26.02.2020 08:13, Julien Grall wrote:
> On 25/02/2020 12:34, Jan Beulich wrote:
>> Further, even if all current implementations obeyed the more
>> strict rule, this still wouldn't mean callers are actually permitted
>> to assume such. The more strict rule would then also need to be
>> written down, such that it won't get violated again in the future
>> (by changes to an existing arch's implementation, or by a new port).
> 
> To be honest, the rule should be written down in any case. The current 
> one is not necessarily an obvious one and also differ from what Linux 
> folks can expect.

I think we should stick to the more relaxed rule in any event, unless
there's a strong reason to enforce the more strict one. Much (but not
all) of Linux code looks to assume the more relaxed rule too, likely
also for historical reasons (when the implementation on e.g. x86-64
still followed the more relaxed model).
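
Under that relaxed contract, a caller must treat any result at or above the bitmap size as "not found" rather than compare against one specific sentinel. A hedged sketch of the caller-side pattern, mirroring the initrdidx check under discussion (the helper names `ref_find_first_bit`/`pick_initrd` are invented; this is not Xen's actual implementation):

```c
#include <assert.h>

/* Portable reference find_first_bit over a single word, written to the
 * relaxed contract: returns size when no bit is set, but callers must
 * only rely on the result being >= size in that case. */
static unsigned int ref_find_first_bit(unsigned long word,
                                       unsigned int size)
{
    for (unsigned int i = 0; i < size; i++)
        if (word & (1UL << i))
            return i;
    return size;
}

/* Caller written to the relaxed rule: any result >= mods_count means
 * "no initrd", analogous to the initrdidx < mbi->mods_count check. */
static int pick_initrd(unsigned long module_map, unsigned int mods_count)
{
    unsigned int idx = ref_find_first_bit(module_map, mods_count);

    return (idx < mods_count) ? (int)idx : -1;   /* -1: no initrd */
}
```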

> Regarding future port, the number of architectures in Linux using custom 
> bitops are fairly limited (AFAICT only arm32 and unicore32). All the 
> rest (including x86) using a generic implementation.
> 
> On Xen, Arm64 is already using the generic implementation. Is there any 
> particular concern to use it for x86 as well?

According to the (over 10 years old) commit updating Linux x86 this
way, the generic implementation was even faster. If that was the
case today and for our implementation as well, then I think it
would be very nice if we updated. If, otoh, data isn't as clear,
then further consideration may be needed. Andrew, do you have any
thoughts either way?

> If not, I can pull a patch to use the generic implementation on 
> x86/arm32. This would solve the discrenpancies in find_*_bit 
> implementations.

This is orthogonal to the issue discussed - as said, I think code
using the functions would still better assume the more relaxed model.

Jan



Thread overview: 33+ messages
2020-02-01  0:32 [Xen-devel] [PATCH 0/8] Early cleanups and bug fixes in preparation for live update David Woodhouse
2020-02-01  0:32 ` [Xen-devel] [PATCH 1/8] x86/smp: reset x2apic_enabled in smp_send_stop() David Woodhouse
2020-02-03 16:18   ` Roger Pau Monné
2020-02-01  0:32 ` [Xen-devel] [PATCH 2/8] x86/setup: Fix badpage= handling for memory above HYPERVISOR_VIRT_END David Woodhouse
2020-02-03 10:57   ` Julien Grall
2020-02-20 15:38   ` Jan Beulich
2020-02-01  0:32 ` [Xen-devel] [PATCH 3/8] x86/setup: Don't skip 2MiB underneath relocated Xen image David Woodhouse
2020-02-01  0:32 ` [Xen-devel] [PATCH 4/8] xen/vmap: allow vm_init_type to be called during early_boot David Woodhouse
2020-02-13 10:36   ` Julien Grall
2020-02-21 16:42   ` Jan Beulich
2020-02-01  0:33 ` [Xen-devel] [PATCH 5/8] xen/vmap: allow vmap() to be called during early boot David Woodhouse
2020-02-03 14:00   ` Julien Grall
2020-02-03 16:37     ` David Woodhouse
2020-02-04 11:00       ` George Dunlap
2020-02-04 11:06         ` David Woodhouse
2020-02-04 11:18           ` David Woodhouse
2020-02-09 18:19       ` Julien Grall
2020-02-21 16:46   ` Jan Beulich
2020-02-01  0:33 ` [Xen-devel] [PATCH 6/8] x86/setup: move vm_init() before end_boot_allocator() David Woodhouse
2020-02-03 11:10   ` Xia, Hongyan
2020-02-03 14:03     ` David Woodhouse
2020-02-21 16:48   ` Jan Beulich
2020-02-01  0:33 ` [Xen-devel] [PATCH 7/8] x86/setup: simplify handling of initrdidx when no initrd present David Woodhouse
2020-02-13 10:47   ` Julien Grall
2020-02-21 16:59   ` Jan Beulich
2020-02-24 13:31     ` Julien Grall
2020-02-25 12:34       ` Jan Beulich
2020-02-26  7:13         ` Julien Grall
2020-02-26  8:37           ` Jan Beulich
2020-02-01  0:33 ` [Xen-devel] [PATCH 8/8] x86/setup: lift dom0 creation out into create_dom0() function David Woodhouse
2020-02-03 14:28   ` Julien Grall
2020-02-03 15:03     ` David Woodhouse
2020-02-21 17:06   ` Jan Beulich
