* [PATCH 00/22] Remove the directmap
@ 2022-12-16 11:48 Julien Grall
  2022-12-16 11:48 ` [PATCH 01/22] xen/common: page_alloc: Re-order includes Julien Grall
                   ` (21 more replies)
  0 siblings, 22 replies; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Julien Grall, Andrew Cooper, George Dunlap, Jan Beulich,
	Stefano Stabellini, Wei Liu, Bertrand Marquis, Volodymyr Babchuk,
	Roger Pau Monné

From: Julien Grall <jgrall@amazon.com>

Hi all,

A few years ago, Wei Liu implemented a PoC to remove the directmap
from Xen. The last version was sent by Hongyan Xia [1].

I will start by thanking both Wei and Hongyan for the initial work
to upstream the feature. A lot of patches already went in and these are
the last few patches missing to effectively enable the feature.

=== What is the directmap? ===

At the moment, on both arm64 and x86, most of the RAM is mapped
in Xen's address space. This means that domain memory is easily
accessible in Xen.
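
To make this concrete, here is a rough sketch (not the exact Xen macros)
of why a direct map makes domain memory cheap to reach: translating an
MFN into a virtual address is just an offset into one large linear
region.

    /*
     * Illustrative only: mirrors the idea behind mfn_to_virt() when a
     * direct map exists. DIRECTMAP_VIRT_START and PAGE_SHIFT stand in
     * for the real per-architecture definitions.
     */
    static inline void *sketch_mfn_to_virt(unsigned long mfn)
    {
        return (void *)(DIRECTMAP_VIRT_START + (mfn << PAGE_SHIFT));
    }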

=== Why do we want to remove the directmap? ===

(Summarizing my understanding of the previous discussion)

Speculation attacks (like Spectre SP1) rely on loading a piece of memory
into the cache. If the region is not mapped then it can't be loaded.

So reducing the amount of memory mapped in Xen will also reduce the
attack surface.

=== What's the performance impact? ===

As the guest memory is not always mapped, the cost of mapping
will increase. I haven't done the numbers with this new version, but
some measurements were provided in the previous version for x86.

=== Improvement possible ===

The known areas to improve on x86 are:
   * Mapcache: There was a patch sent by Hongyan:
     https://lore.kernel.org/xen-devel/4058e92ce21627731c49b588a95809dc0affd83a.1581015491.git.hongyxia@amazon.com/
   * EPT: At the moment a guest page-table walk requires about 20 map/unmap
     operations. This will have a very high impact on performance. We need
     to decide whether keeping the EPT always mapped is a problem.

The original series didn't have support for Arm64. But as there was
some interest, I have provided a PoC.

There is more work needed for Arm64:
   * The mapcache is quite simple. We should investigate the performance.
   * The mapcache should be made compliant with the Arm Arm (this is now
     more critical).
   * We will likely have the same problem as for the EPT.
   * We have no support for merging tables into a superpage, nor for
     freeing empty page-tables. (See more below)

=== Implementation ===

The subject is probably a misnomer. The directmap is still present but
the RAM is not mapped by default. Instead, the region will still be used
to map pages allocated via alloc_xenheap_pages().

The advantage is that the solution is simple (so IMHO good enough to be
merged as a tech preview). The disadvantage is that the page allocator
does not try to keep all the xenheap pages together. So we may end up
with an increase in page-table usage.

In the longer term, we should consider removing the direct map
completely and switching to vmap(). The main problem with this approach
is that mfn_to_virt() is used frequently in the code. So we would need
to cache the mapping (maybe in struct page_info).
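
For reference, the recurring conversion this series applies (and which a
vmap()-based approach would have to cover as well) looks roughly like the
sketch below. This is illustrative rather than taken from a specific
patch; error handling is omitted:

    /* Before: relies on the MFN being covered by the direct map. */
    void *p = mfn_to_virt(mfn_x(mfn));
    memcpy(p, src, PAGE_SIZE);

    /* After: create (and tear down) a transient mapping instead. */
    void *q = map_domain_page(mfn);
    memcpy(q, src, PAGE_SIZE);
    unmap_domain_page(q);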

=== Why arm32 is not covered? ===

On Arm32, the domheap and xenheap are always separate. So by design
the guest memory is not mapped by default.

At this stage, it seems unnecessary to have to map/unmap xenheap pages
every time they are allocated.

=== Why not using a separate domheap and xenheap? ===

While a separate xenheap/domheap reduces the page-table usage (all
xenheap pages are contiguous and could always be mapped), it is also
currently less scalable because the split is fixed at boot time (XXX:
can this be dynamic?).

=== Future of secret-free hypervisor ===

There is some information in an e-mail from Andrew a few years ago:

https://lore.kernel.org/xen-devel/e3219697-0759-39fc-2486-715cdec1ca9e@citrix.com/

Cheers,

[1] https://lore.kernel.org/xen-devel/cover.1588278317.git.hongyxia@amazon.com/


Hongyan Xia (12):
  acpi: vmap pages in acpi_os_alloc_memory
  xen/numa: vmap the pages for memnodemap
  x86/srat: vmap the pages for acpi_slit
  x86: map/unmap pages in restore_all_guests
  x86/pv: rewrite how building PV dom0 handles domheap mappings
  x86/mapcache: initialise the mapcache for the idle domain
  x86: add a boot option to enable and disable the direct map
  x86/domain_page: remove the fast paths when mfn is not in the
    directmap
  xen/page_alloc: add a path for xenheap when there is no direct map
  x86/setup: leave early boot slightly earlier
  x86/setup: vmap heap nodes when they are outside the direct map
  x86/setup: do not create valid mappings when directmap=no

Julien Grall (7):
  xen/common: page_alloc: Re-order includes
  xen/arm: fixmap: Rename the fixmap slots to follow the x86 convention
  xen/x86: Add support for the PMAP
  xen/arm32: mm: Rename 'first' to 'root' in init_secondary_pagetables()
  xen/arm64: mm: Use per-pCPU page-tables
  xen/arm64: Implement a mapcache for arm64
  xen/arm64: Allow the admin to enable/disable the directmap

Wei Liu (3):
  x86/setup: move vm_init() before acpi calls
  x86/pv: domheap pages should be mapped while relocating initrd
  x86: lift mapcache variable to the arch level

 docs/misc/xen-command-line.pandoc       |  12 +++
 xen/arch/arm/Kconfig                    |   1 +
 xen/arch/arm/acpi/lib.c                 |  18 ++--
 xen/arch/arm/domain_page.c              |  49 +++++++++-
 xen/arch/arm/include/asm/arm32/mm.h     |   8 --
 xen/arch/arm/include/asm/arm64/mm.h     |   2 +-
 xen/arch/arm/include/asm/config.h       |   7 ++
 xen/arch/arm/include/asm/domain_page.h  |  13 +++
 xen/arch/arm/include/asm/early_printk.h |   2 +-
 xen/arch/arm/include/asm/fixmap.h       |  16 +--
 xen/arch/arm/include/asm/mm.h           |  17 ++++
 xen/arch/arm/kernel.c                   |   6 +-
 xen/arch/arm/mm.c                       | 104 +++++++++++---------
 xen/arch/arm/setup.c                    |  10 +-
 xen/arch/x86/Kconfig                    |   1 +
 xen/arch/x86/domain.c                   |   4 +-
 xen/arch/x86/domain_page.c              |  70 +++++++++----
 xen/arch/x86/include/asm/domain.h       |  12 +--
 xen/arch/x86/include/asm/fixmap.h       |   4 +
 xen/arch/x86/include/asm/mm.h           |  17 +++-
 xen/arch/x86/include/asm/pmap.h         |  25 +++++
 xen/arch/x86/mm.c                       |   6 ++
 xen/arch/x86/pv/dom0_build.c            |  74 +++++++++++---
 xen/arch/x86/setup.c                    | 125 ++++++++++++++++++++----
 xen/arch/x86/srat.c                     |   3 +-
 xen/arch/x86/x86_64/entry.S             |  27 ++++-
 xen/common/numa.c                       |   8 +-
 xen/common/page_alloc.c                 | 112 ++++++++++++++++-----
 xen/common/pmap.c                       |   8 +-
 xen/common/vmap.c                       |  42 ++++++--
 xen/drivers/acpi/osl.c                  |  13 ++-
 xen/include/xen/vmap.h                  |   2 +
 32 files changed, 627 insertions(+), 191 deletions(-)
 create mode 100644 xen/arch/arm/include/asm/domain_page.h
 create mode 100644 xen/arch/x86/include/asm/pmap.h

-- 
2.38.1




* [PATCH 01/22] xen/common: page_alloc: Re-order includes
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2022-12-16 12:03   ` Jan Beulich
  2023-01-23 21:29   ` Stefano Stabellini
  2022-12-16 11:48 ` [PATCH 02/22] x86/setup: move vm_init() before acpi calls Julien Grall
                   ` (20 subsequent siblings)
  21 siblings, 2 replies; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Julien Grall, Andrew Cooper, George Dunlap, Jan Beulich,
	Stefano Stabellini, Wei Liu

From: Julien Grall <jgrall@amazon.com>

Order the includes with the xen headers first, then the asm headers and
lastly the public headers. Within each category, they are sorted
alphabetically.

Note that the includes protected by CONFIG_X86 haven't been sorted,
to avoid adding multiple #ifdef.

Signed-off-by: Julien Grall <jgrall@amazon.com>

----

    I am open to sorting the includes protected by CONFIG_X86
    and adding multiple #ifdef if this is preferred.
---
 xen/common/page_alloc.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 0c93a1078702..0a950288e241 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -120,27 +120,30 @@
  *   regions within it.
  */
 
+#include <xen/domain_page.h>
+#include <xen/event.h>
 #include <xen/init.h>
-#include <xen/types.h>
+#include <xen/irq.h>
+#include <xen/keyhandler.h>
 #include <xen/lib.h>
-#include <xen/sched.h>
-#include <xen/spinlock.h>
 #include <xen/mm.h>
+#include <xen/nodemask.h>
+#include <xen/numa.h>
 #include <xen/param.h>
-#include <xen/irq.h>
-#include <xen/softirq.h>
-#include <xen/domain_page.h>
-#include <xen/keyhandler.h>
 #include <xen/perfc.h>
 #include <xen/pfn.h>
-#include <xen/numa.h>
-#include <xen/nodemask.h>
-#include <xen/event.h>
+#include <xen/types.h>
+#include <xen/sched.h>
+#include <xen/softirq.h>
+#include <xen/spinlock.h>
+
+#include <asm/flushtlb.h>
+#include <asm/numa.h>
+#include <asm/page.h>
+
 #include <public/sysctl.h>
 #include <public/sched.h>
-#include <asm/page.h>
-#include <asm/numa.h>
-#include <asm/flushtlb.h>
+
 #ifdef CONFIG_X86
 #include <asm/guest.h>
 #include <asm/p2m.h>
-- 
2.38.1




* [PATCH 02/22] x86/setup: move vm_init() before acpi calls
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
  2022-12-16 11:48 ` [PATCH 01/22] xen/common: page_alloc: Re-order includes Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2022-12-20 15:08   ` Jan Beulich
  2023-01-23 21:34   ` Stefano Stabellini
  2022-12-16 11:48 ` [PATCH 03/22] acpi: vmap pages in acpi_os_alloc_memory Julien Grall
                   ` (19 subsequent siblings)
  21 siblings, 2 replies; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Wei Liu, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Wei Liu, Roger Pau Monné,
	David Woodhouse, Hongyan Xia, Julien Grall

From: Wei Liu <wei.liu2@citrix.com>

After the direct map removal, pages from the boot allocator are not
mapped at all in the direct map. Although we have map_domain_page(), its
mappings are ephemeral and less helpful for mappings spanning more than
a page, so we want a mechanism to globally map a range of pages, which
is what vmap is for. Therefore, we bring vm_init() into the early boot
stage.

To allow vmap to be initialised and used in early boot, we need to
modify vmap to receive pages from the boot allocator during the early
boot stage.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Woodhouse <dwmw2@amazon.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 xen/arch/arm/setup.c |  4 ++--
 xen/arch/x86/setup.c | 31 ++++++++++++++++++++-----------
 xen/common/vmap.c    | 37 +++++++++++++++++++++++++++++--------
 3 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 1f26f67b90e3..2311726f5ddd 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -1028,6 +1028,8 @@ void __init start_xen(unsigned long boot_phys_offset,
 
     setup_mm();
 
+    vm_init();
+
     /* Parse the ACPI tables for possible boot-time configuration */
     acpi_boot_table_init();
 
@@ -1039,8 +1041,6 @@ void __init start_xen(unsigned long boot_phys_offset,
      */
     system_state = SYS_STATE_boot;
 
-    vm_init();
-
     if ( acpi_disabled )
     {
         printk("Booting using Device Tree\n");
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 6bb5bc7c84be..1c2e09711eb0 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -870,6 +870,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     unsigned long eb_start, eb_end;
     bool acpi_boot_table_init_done = false, relocated = false;
     int ret;
+    bool vm_init_done = false;
     struct ns16550_defaults ns16550 = {
         .data_bits = 8,
         .parity    = 'n',
@@ -1442,12 +1443,23 @@ void __init noreturn __start_xen(unsigned long mbi_p)
             continue;
 
         if ( !acpi_boot_table_init_done &&
-             s >= (1ULL << 32) &&
-             !acpi_boot_table_init() )
+             s >= (1ULL << 32) )
         {
-            acpi_boot_table_init_done = true;
-            srat_parse_regions(s);
-            setup_max_pdx(raw_max_page);
+            /*
+             * We only initialise vmap and acpi after going through the bottom
+             * 4GiB, so that we have enough pages in the boot allocator.
+             */
+            if ( !vm_init_done )
+            {
+                vm_init();
+                vm_init_done = true;
+            }
+            if ( !acpi_boot_table_init() )
+            {
+                acpi_boot_table_init_done = true;
+                srat_parse_regions(s);
+                setup_max_pdx(raw_max_page);
+            }
         }
 
         if ( pfn_to_pdx((e - 1) >> PAGE_SHIFT) >= max_pdx )
@@ -1624,6 +1636,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
     init_frametable();
 
+    if ( !vm_init_done )
+        vm_init();
+
     if ( !acpi_boot_table_init_done )
         acpi_boot_table_init();
 
@@ -1661,12 +1676,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         end_boot_allocator();
 
     system_state = SYS_STATE_boot;
-    /*
-     * No calls involving ACPI code should go between the setting of
-     * SYS_STATE_boot and vm_init() (or else acpi_os_{,un}map_memory()
-     * will break).
-     */
-    vm_init();
 
     bsp_stack = cpu_alloc_stack(0);
     if ( !bsp_stack )
diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index 4fd6b3067ec1..1340c7c6faf6 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -34,9 +34,20 @@ void __init vm_init_type(enum vmap_region type, void *start, void *end)
 
     for ( i = 0, va = (unsigned long)vm_bitmap(type); i < nr; ++i, va += PAGE_SIZE )
     {
-        struct page_info *pg = alloc_domheap_page(NULL, 0);
+        mfn_t mfn;
+        int rc;
 
-        map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR);
+        if ( system_state == SYS_STATE_early_boot )
+            mfn = alloc_boot_pages(1, 1);
+        else
+        {
+            struct page_info *pg = alloc_domheap_page(NULL, 0);
+
+            BUG_ON(!pg);
+            mfn = page_to_mfn(pg);
+        }
+        rc = map_pages_to_xen(va, mfn, 1, PAGE_HYPERVISOR);
+        BUG_ON(rc);
         clear_page((void *)va);
     }
     bitmap_fill(vm_bitmap(type), vm_low[type]);
@@ -62,7 +73,7 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
     spin_lock(&vm_lock);
     for ( ; ; )
     {
-        struct page_info *pg;
+        mfn_t mfn;
 
         ASSERT(vm_low[t] == vm_top[t] || !test_bit(vm_low[t], vm_bitmap(t)));
         for ( start = vm_low[t]; start < vm_top[t]; )
@@ -97,9 +108,16 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
         if ( vm_top[t] >= vm_end[t] )
             return NULL;
 
-        pg = alloc_domheap_page(NULL, 0);
-        if ( !pg )
-            return NULL;
+        if ( system_state == SYS_STATE_early_boot )
+            mfn = alloc_boot_pages(1, 1);
+        else
+        {
+            struct page_info *pg = alloc_domheap_page(NULL, 0);
+
+            if ( !pg )
+                return NULL;
+            mfn = page_to_mfn(pg);
+        }
 
         spin_lock(&vm_lock);
 
@@ -107,7 +125,7 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
         {
             unsigned long va = (unsigned long)vm_bitmap(t) + vm_top[t] / 8;
 
-            if ( !map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR) )
+            if ( !map_pages_to_xen(va, mfn, 1, PAGE_HYPERVISOR) )
             {
                 clear_page((void *)va);
                 vm_top[t] += PAGE_SIZE * 8;
@@ -117,7 +135,10 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
             }
         }
 
-        free_domheap_page(pg);
+        if ( system_state == SYS_STATE_early_boot )
+            init_boot_pages(mfn_to_maddr(mfn), mfn_to_maddr(mfn) + PAGE_SIZE);
+        else
+            free_domheap_page(mfn_to_page(mfn));
 
         if ( start >= vm_top[t] )
         {
-- 
2.38.1




* [PATCH 03/22] acpi: vmap pages in acpi_os_alloc_memory
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
  2022-12-16 11:48 ` [PATCH 01/22] xen/common: page_alloc: Re-order includes Julien Grall
  2022-12-16 11:48 ` [PATCH 02/22] x86/setup: move vm_init() before acpi calls Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2022-12-16 12:07   ` Julien Grall
                     ` (2 more replies)
  2022-12-16 11:48 ` [PATCH 04/22] xen/numa: vmap the pages for memnodemap Julien Grall
                   ` (18 subsequent siblings)
  21 siblings, 3 replies; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Hongyan Xia, Andrew Cooper, George Dunlap, Jan Beulich,
	Stefano Stabellini, Wei Liu, Julien Grall

From: Hongyan Xia <hongyxia@amazon.com>

Also, introduce a wrapper around vmap that maps a contiguous range for
boot allocations. Unfortunately, the new helper cannot be a static inline
because the dependencies are a mess. We would need to re-include
asm/page.h (which was removed in aa4b9d1ee653 "include: don't use asm/page.h
from common headers") and that doesn't look to be enough anymore
because bits from asm/cpufeature.h are used in the definition of PAGE_NX.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>

----

    Changes since Hongyan's version:
        * Rename vmap_boot_pages() to vmap_contig_pages()
        * Move the new helper in vmap.c to avoid compilation issue
        * Don't use __pa() to translate the virtual address
---
 xen/common/vmap.c      |  5 +++++
 xen/drivers/acpi/osl.c | 13 +++++++++++--
 xen/include/xen/vmap.h |  2 ++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index 1340c7c6faf6..78f051a67682 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -244,6 +244,11 @@ void *vmap(const mfn_t *mfn, unsigned int nr)
     return __vmap(mfn, 1, nr, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
 }
 
+void *vmap_contig_pages(mfn_t mfn, unsigned int nr_pages)
+{
+    return __vmap(&mfn, nr_pages, 1, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
+}
+
 void vunmap(const void *va)
 {
     unsigned long addr = (unsigned long)va;
diff --git a/xen/drivers/acpi/osl.c b/xen/drivers/acpi/osl.c
index 389505f78666..44a9719b0dcf 100644
--- a/xen/drivers/acpi/osl.c
+++ b/xen/drivers/acpi/osl.c
@@ -221,7 +221,11 @@ void *__init acpi_os_alloc_memory(size_t sz)
 	void *ptr;
 
 	if (system_state == SYS_STATE_early_boot)
-		return mfn_to_virt(mfn_x(alloc_boot_pages(PFN_UP(sz), 1)));
+	{
+		mfn_t mfn = alloc_boot_pages(PFN_UP(sz), 1);
+
+		return vmap_contig_pages(mfn, PFN_UP(sz));
+	}
 
 	ptr = xmalloc_bytes(sz);
 	ASSERT(!ptr || is_xmalloc_memory(ptr));
@@ -246,5 +250,10 @@ void __init acpi_os_free_memory(void *ptr)
 	if (is_xmalloc_memory(ptr))
 		xfree(ptr);
 	else if (ptr && system_state == SYS_STATE_early_boot)
-		init_boot_pages(__pa(ptr), __pa(ptr) + PAGE_SIZE);
+	{
+		paddr_t addr = mfn_to_maddr(vmap_to_mfn(ptr));
+
+		vunmap(ptr);
+		init_boot_pages(addr, addr + PAGE_SIZE);
+	}
 }
diff --git a/xen/include/xen/vmap.h b/xen/include/xen/vmap.h
index b0f7632e8985..3c06c7c3ba30 100644
--- a/xen/include/xen/vmap.h
+++ b/xen/include/xen/vmap.h
@@ -23,6 +23,8 @@ void *vmalloc_xen(size_t size);
 void *vzalloc(size_t size);
 void vfree(void *va);
 
+void *vmap_contig_pages(mfn_t mfn, unsigned int nr_pages);
+
 void __iomem *ioremap(paddr_t, size_t);
 
 static inline void iounmap(void __iomem *va)
-- 
2.38.1




* [PATCH 04/22] xen/numa: vmap the pages for memnodemap
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
                   ` (2 preceding siblings ...)
  2022-12-16 11:48 ` [PATCH 03/22] acpi: vmap pages in acpi_os_alloc_memory Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2022-12-20 15:25   ` Jan Beulich
  2022-12-16 11:48 ` [PATCH 05/22] x86/srat: vmap the pages for acpi_slit Julien Grall
                   ` (17 subsequent siblings)
  21 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Hongyan Xia, Andrew Cooper, George Dunlap, Jan Beulich,
	Stefano Stabellini, Wei Liu, Julien Grall

From: Hongyan Xia <hongyxia@amazon.com>

This avoids the assumption that there is a direct map and boot pages
fall inside the direct map.

Clean up the variables so that mfn actually stores a type-safe mfn.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>

----

    Changes compared to Hongyan's version:
        * The function modified was moved to common code. So rebase it
        * vmap_boot_pages() was renamed to vmap_contig_pages()
---
 xen/common/numa.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/xen/common/numa.c b/xen/common/numa.c
index 4948b21fbe66..2040b3d974e5 100644
--- a/xen/common/numa.c
+++ b/xen/common/numa.c
@@ -424,13 +424,13 @@ static int __init populate_memnodemap(const struct node *nodes,
 static int __init allocate_cachealigned_memnodemap(void)
 {
     unsigned long size = PFN_UP(memnodemapsize * sizeof(*memnodemap));
-    unsigned long mfn = mfn_x(alloc_boot_pages(size, 1));
+    mfn_t mfn = alloc_boot_pages(size, 1);
 
-    memnodemap = mfn_to_virt(mfn);
-    mfn <<= PAGE_SHIFT;
+    memnodemap = vmap_contig_pages(mfn, size);
+    BUG_ON(!memnodemap);
     size <<= PAGE_SHIFT;
     printk(KERN_DEBUG "NUMA: Allocated memnodemap from %lx - %lx\n",
-           mfn, mfn + size);
+           mfn_to_maddr(mfn), mfn_to_maddr(mfn) + size);
     memnodemapsize = size / sizeof(*memnodemap);
 
     return 0;
-- 
2.38.1




* [PATCH 05/22] x86/srat: vmap the pages for acpi_slit
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
                   ` (3 preceding siblings ...)
  2022-12-16 11:48 ` [PATCH 04/22] xen/numa: vmap the pages for memnodemap Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2022-12-20 15:30   ` Jan Beulich
  2022-12-16 11:48 ` [PATCH 06/22] x86: map/unmap pages in restore_all_guests Julien Grall
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Hongyan Xia, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall

From: Hongyan Xia <hongyxia@amazon.com>

This avoids the assumption that boot pages are in the direct map.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>

----

    Changes since Hongyan's version:
        * vmap_boot_pages() was renamed to vmap_contig_pages()
---
 xen/arch/x86/srat.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index 56749ddca526..1fd178e89d28 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -139,7 +139,8 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
 		return;
 	}
 	mfn = alloc_boot_pages(PFN_UP(slit->header.length), 1);
-	acpi_slit = mfn_to_virt(mfn_x(mfn));
+	acpi_slit = vmap_contig_pages(mfn, PFN_UP(slit->header.length));
+	BUG_ON(!acpi_slit);
 	memcpy(acpi_slit, slit, slit->header.length);
 }
 
-- 
2.38.1




* [PATCH 06/22] x86: map/unmap pages in restore_all_guests
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
                   ` (4 preceding siblings ...)
  2022-12-16 11:48 ` [PATCH 05/22] x86/srat: vmap the pages for acpi_slit Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2022-12-22 11:12   ` Jan Beulich
  2022-12-16 11:48 ` [PATCH 07/22] x86/pv: domheap pages should be mapped while relocating initrd Julien Grall
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Hongyan Xia, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall

From: Hongyan Xia <hongyxia@amazon.com>

Before, it was assumed that the PV cr3 could be accessed via the direct
map. This is no longer true.

Note that we do not map and unmap root_pgt for now since it is still a
xenheap page.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>

----

    Changes since Hongyan's version:
        * Remove the final dot in the commit title
---
 xen/arch/x86/x86_64/entry.S | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index ae012851819a..b72abf923d9c 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -165,7 +165,24 @@ restore_all_guest:
         and   %rsi, %rdi
         and   %r9, %rsi
         add   %rcx, %rdi
-        add   %rcx, %rsi
+
+         /*
+          * Without a direct map, we have to map first before copying. We only
+          * need to map the guest root table but not the per-CPU root_pgt,
+          * because the latter is still a xenheap page.
+          */
+        pushq %r9
+        pushq %rdx
+        pushq %rax
+        pushq %rdi
+        mov   %rsi, %rdi
+        shr   $PAGE_SHIFT, %rdi
+        callq map_domain_page
+        mov   %rax, %rsi
+        popq  %rdi
+        /* Stash the pointer for unmapping later. */
+        pushq %rax
+
         mov   $ROOT_PAGETABLE_FIRST_XEN_SLOT, %ecx
         mov   root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rsi), %r8
         mov   %r8, root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rdi)
@@ -177,6 +194,14 @@ restore_all_guest:
         sub   $(ROOT_PAGETABLE_FIRST_XEN_SLOT - \
                 ROOT_PAGETABLE_LAST_XEN_SLOT - 1) * 8, %rdi
         rep movsq
+
+        /* Unmap the page. */
+        popq  %rdi
+        callq unmap_domain_page
+        popq  %rax
+        popq  %rdx
+        popq  %r9
+
 .Lrag_copy_done:
         mov   %r9, STACK_CPUINFO_FIELD(xen_cr3)(%rdx)
         movb  $1, STACK_CPUINFO_FIELD(use_pv_cr3)(%rdx)
-- 
2.38.1




* [PATCH 07/22] x86/pv: domheap pages should be mapped while relocating initrd
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
                   ` (5 preceding siblings ...)
  2022-12-16 11:48 ` [PATCH 06/22] x86: map/unmap pages in restore_all_guests Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2022-12-22 11:18   ` Jan Beulich
  2022-12-16 11:48 ` [PATCH 08/22] x86/pv: rewrite how building PV dom0 handles domheap mappings Julien Grall
                   ` (14 subsequent siblings)
  21 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Wei Liu, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Wei Wang, Julien Grall

From: Wei Liu <wei.liu2@citrix.com>

Xen shouldn't use domheap pages as if they were xenheap pages. Map and
unmap pages accordingly.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Wei Wang <wawei@amazon.de>
Signed-off-by: Julien Grall <jgrall@amazon.com>

----

    Changes since Hongyan's version:
        * Add missing newline after the variable declaration
---
 xen/arch/x86/pv/dom0_build.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index a62f0fa2ef29..c837b2d96f89 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -611,18 +611,32 @@ int __init dom0_construct_pv(struct domain *d,
         if ( d->arch.physaddr_bitsize &&
              ((mfn + count - 1) >> (d->arch.physaddr_bitsize - PAGE_SHIFT)) )
         {
+            unsigned long nr_pages;
+            unsigned long len = initrd_len;
+
             order = get_order_from_pages(count);
             page = alloc_domheap_pages(d, order, MEMF_no_scrub);
             if ( !page )
                 panic("Not enough RAM for domain 0 initrd\n");
+
+            nr_pages = 1UL << order;
             for ( count = -count; order--; )
                 if ( count & (1UL << order) )
                 {
                     free_domheap_pages(page, order);
                     page += 1UL << order;
+                    nr_pages -= 1UL << order;
                 }
-            memcpy(page_to_virt(page), mfn_to_virt(initrd->mod_start),
-                   initrd_len);
+
+            for ( i = 0; i < nr_pages; i++, len -= PAGE_SIZE )
+            {
+                void *p = __map_domain_page(page + i);
+
+                memcpy(p, mfn_to_virt(initrd_mfn + i),
+                       min(len, (unsigned long)PAGE_SIZE));
+                unmap_domain_page(p);
+            }
+
             mpt_alloc = (paddr_t)initrd->mod_start << PAGE_SHIFT;
             init_domheap_pages(mpt_alloc,
                                mpt_alloc + PAGE_ALIGN(initrd_len));
-- 
2.38.1




* [PATCH 08/22] x86/pv: rewrite how building PV dom0 handles domheap mappings
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
                   ` (6 preceding siblings ...)
  2022-12-16 11:48 ` [PATCH 07/22] x86/pv: domheap pages should be mapped while relocating initrd Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2022-12-22 11:48   ` Jan Beulich
  2022-12-16 11:48 ` [PATCH 09/22] x86: lift mapcache variable to the arch level Julien Grall
                   ` (13 subsequent siblings)
  21 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Hongyan Xia, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall

From: Hongyan Xia <hongyxia@amazon.com>

Building a PV dom0 allocates from the domheap but uses it like the
xenheap. This is clearly wrong. Fix it.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>

----

    Changes since Hongyan's version:
        * Rebase
        * Remove spurious newline
---
 xen/arch/x86/pv/dom0_build.c | 56 +++++++++++++++++++++++++++---------
 1 file changed, 42 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index c837b2d96f89..cd60f259d1b7 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -383,6 +383,10 @@ int __init dom0_construct_pv(struct domain *d,
     l3_pgentry_t *l3tab = NULL, *l3start = NULL;
     l2_pgentry_t *l2tab = NULL, *l2start = NULL;
     l1_pgentry_t *l1tab = NULL, *l1start = NULL;
+    mfn_t l4start_mfn = INVALID_MFN;
+    mfn_t l3start_mfn = INVALID_MFN;
+    mfn_t l2start_mfn = INVALID_MFN;
+    mfn_t l1start_mfn = INVALID_MFN;
 
     /*
      * This fully describes the memory layout of the initial domain. All
@@ -711,22 +715,32 @@ int __init dom0_construct_pv(struct domain *d,
         v->arch.pv.event_callback_cs    = FLAT_COMPAT_KERNEL_CS;
     }
 
+#define UNMAP_MAP_AND_ADVANCE(mfn_var, virt_var, maddr) \
+do {                                                    \
+    UNMAP_DOMAIN_PAGE(virt_var);                        \
+    mfn_var = maddr_to_mfn(maddr);                      \
+    maddr += PAGE_SIZE;                                 \
+    virt_var = map_domain_page(mfn_var);                \
+} while ( false )
+
     if ( !compat )
     {
         maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l4_page_table;
-        l4start = l4tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+        UNMAP_MAP_AND_ADVANCE(l4start_mfn, l4start, mpt_alloc);
+        l4tab = l4start;
         clear_page(l4tab);
-        init_xen_l4_slots(l4tab, _mfn(virt_to_mfn(l4start)),
-                          d, INVALID_MFN, true);
-        v->arch.guest_table = pagetable_from_paddr(__pa(l4start));
+        init_xen_l4_slots(l4tab, l4start_mfn, d, INVALID_MFN, true);
+        v->arch.guest_table = pagetable_from_mfn(l4start_mfn);
     }
     else
     {
         /* Monitor table already created by switch_compat(). */
-        l4start = l4tab = __va(pagetable_get_paddr(v->arch.guest_table));
+        l4start_mfn = pagetable_get_mfn(v->arch.guest_table);
+        l4start = l4tab = map_domain_page(l4start_mfn);
         /* See public/xen.h on why the following is needed. */
         maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l3_page_table;
         l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+        UNMAP_MAP_AND_ADVANCE(l3start_mfn, l3start, mpt_alloc);
     }
 
     l4tab += l4_table_offset(v_start);
@@ -736,14 +750,16 @@ int __init dom0_construct_pv(struct domain *d,
         if ( !((unsigned long)l1tab & (PAGE_SIZE-1)) )
         {
             maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l1_page_table;
-            l1start = l1tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+            UNMAP_MAP_AND_ADVANCE(l1start_mfn, l1start, mpt_alloc);
+            l1tab = l1start;
             clear_page(l1tab);
             if ( count == 0 )
                 l1tab += l1_table_offset(v_start);
             if ( !((unsigned long)l2tab & (PAGE_SIZE-1)) )
             {
                 maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l2_page_table;
-                l2start = l2tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+                UNMAP_MAP_AND_ADVANCE(l2start_mfn, l2start, mpt_alloc);
+                l2tab = l2start;
                 clear_page(l2tab);
                 if ( count == 0 )
                     l2tab += l2_table_offset(v_start);
@@ -753,19 +769,19 @@ int __init dom0_construct_pv(struct domain *d,
                     {
                         maddr_to_page(mpt_alloc)->u.inuse.type_info =
                             PGT_l3_page_table;
-                        l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+                        UNMAP_MAP_AND_ADVANCE(l3start_mfn, l3start, mpt_alloc);
                     }
                     l3tab = l3start;
                     clear_page(l3tab);
                     if ( count == 0 )
                         l3tab += l3_table_offset(v_start);
-                    *l4tab = l4e_from_paddr(__pa(l3start), L4_PROT);
+                    *l4tab = l4e_from_mfn(l3start_mfn, L4_PROT);
                     l4tab++;
                 }
-                *l3tab = l3e_from_paddr(__pa(l2start), L3_PROT);
+                *l3tab = l3e_from_mfn(l2start_mfn, L3_PROT);
                 l3tab++;
             }
-            *l2tab = l2e_from_paddr(__pa(l1start), L2_PROT);
+            *l2tab = l2e_from_mfn(l1start_mfn, L2_PROT);
             l2tab++;
         }
         if ( count < initrd_pfn || count >= initrd_pfn + PFN_UP(initrd_len) )
@@ -792,9 +808,9 @@ int __init dom0_construct_pv(struct domain *d,
             if ( !l3e_get_intpte(*l3tab) )
             {
                 maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l2_page_table;
-                l2tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
-                clear_page(l2tab);
-                *l3tab = l3e_from_paddr(__pa(l2tab), L3_PROT);
+                UNMAP_MAP_AND_ADVANCE(l2start_mfn, l2start, mpt_alloc);
+                clear_page(l2start);
+                *l3tab = l3e_from_mfn(l2start_mfn, L3_PROT);
             }
             if ( i == 3 )
                 l3e_get_page(*l3tab)->u.inuse.type_info |= PGT_pae_xen_l2;
@@ -805,9 +821,17 @@ int __init dom0_construct_pv(struct domain *d,
         unmap_domain_page(l2t);
     }
 
+#undef UNMAP_MAP_AND_ADVANCE
+
+    UNMAP_DOMAIN_PAGE(l1start);
+    UNMAP_DOMAIN_PAGE(l2start);
+    UNMAP_DOMAIN_PAGE(l3start);
+
     /* Pages that are part of page tables must be read only. */
     mark_pv_pt_pages_rdonly(d, l4start, vpt_start, nr_pt_pages, &flush_flags);
 
+    UNMAP_DOMAIN_PAGE(l4start);
+
     /* Mask all upcalls... */
     for ( i = 0; i < XEN_LEGACY_MAX_VCPUS; i++ )
         shared_info(d, vcpu_info[i].evtchn_upcall_mask) = 1;
@@ -977,8 +1001,12 @@ int __init dom0_construct_pv(struct domain *d,
      * !CONFIG_VIDEO case so the logic here can be simplified.
      */
     if ( pv_shim )
+    {
+        l4start = map_domain_page(l4start_mfn);
         pv_shim_setup_dom(d, l4start, v_start, vxenstore_start, vconsole_start,
                           vphysmap_start, si);
+        UNMAP_DOMAIN_PAGE(l4start);
+    }
 
 #ifdef CONFIG_COMPAT
     if ( compat )
-- 
2.38.1




* [PATCH 09/22] x86: lift mapcache variable to the arch level
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
                   ` (7 preceding siblings ...)
  2022-12-16 11:48 ` [PATCH 08/22] x86/pv: rewrite how building PV dom0 handles domheap mappings Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2022-12-22 12:53   ` Jan Beulich
  2022-12-16 11:48 ` [PATCH 10/22] x86/mapcache: initialise the mapcache for the idle domain Julien Grall
                   ` (12 subsequent siblings)
  21 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Wei Liu, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Wei Wang, Hongyan Xia, Julien Grall

From: Wei Liu <wei.liu2@citrix.com>

It is going to be needed by HVM and the idle domain as well, because
without the direct map, both need a mapcache to map pages.

This only lifts the mapcache variable up. Whether we populate the
mapcache for a domain is unchanged in this patch.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Wei Wang <wawei@amazon.de>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 xen/arch/x86/domain.c             |  4 ++--
 xen/arch/x86/domain_page.c        | 22 ++++++++++------------
 xen/arch/x86/include/asm/domain.h | 12 ++++++------
 3 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index d7a8237f01ab..069b7d2af330 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -829,6 +829,8 @@ int arch_domain_create(struct domain *d,
 
     psr_domain_init(d);
 
+    mapcache_domain_init(d);
+
     if ( is_hvm_domain(d) )
     {
         if ( (rc = hvm_domain_initialise(d, config)) != 0 )
@@ -836,8 +838,6 @@ int arch_domain_create(struct domain *d,
     }
     else if ( is_pv_domain(d) )
     {
-        mapcache_domain_init(d);
-
         if ( (rc = pv_domain_initialise(d)) != 0 )
             goto fail;
     }
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index eac5e3304fb8..55e337aaf703 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -82,11 +82,11 @@ void *map_domain_page(mfn_t mfn)
 #endif
 
     v = mapcache_current_vcpu();
-    if ( !v || !is_pv_vcpu(v) )
+    if ( !v )
         return mfn_to_virt(mfn_x(mfn));
 
-    dcache = &v->domain->arch.pv.mapcache;
-    vcache = &v->arch.pv.mapcache;
+    dcache = &v->domain->arch.mapcache;
+    vcache = &v->arch.mapcache;
     if ( !dcache->inuse )
         return mfn_to_virt(mfn_x(mfn));
 
@@ -187,14 +187,14 @@ void unmap_domain_page(const void *ptr)
     ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
 
     v = mapcache_current_vcpu();
-    ASSERT(v && is_pv_vcpu(v));
+    ASSERT(v);
 
-    dcache = &v->domain->arch.pv.mapcache;
+    dcache = &v->domain->arch.mapcache;
     ASSERT(dcache->inuse);
 
     idx = PFN_DOWN(va - MAPCACHE_VIRT_START);
     mfn = l1e_get_pfn(MAPCACHE_L1ENT(idx));
-    hashent = &v->arch.pv.mapcache.hash[MAPHASH_HASHFN(mfn)];
+    hashent = &v->arch.mapcache.hash[MAPHASH_HASHFN(mfn)];
 
     local_irq_save(flags);
 
@@ -233,11 +233,9 @@ void unmap_domain_page(const void *ptr)
 
 int mapcache_domain_init(struct domain *d)
 {
-    struct mapcache_domain *dcache = &d->arch.pv.mapcache;
+    struct mapcache_domain *dcache = &d->arch.mapcache;
     unsigned int bitmap_pages;
 
-    ASSERT(is_pv_domain(d));
-
 #ifdef NDEBUG
     if ( !mem_hotplug && max_page <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
         return 0;
@@ -261,12 +259,12 @@ int mapcache_domain_init(struct domain *d)
 int mapcache_vcpu_init(struct vcpu *v)
 {
     struct domain *d = v->domain;
-    struct mapcache_domain *dcache = &d->arch.pv.mapcache;
+    struct mapcache_domain *dcache = &d->arch.mapcache;
     unsigned long i;
     unsigned int ents = d->max_vcpus * MAPCACHE_VCPU_ENTRIES;
     unsigned int nr = PFN_UP(BITS_TO_LONGS(ents) * sizeof(long));
 
-    if ( !is_pv_vcpu(v) || !dcache->inuse )
+    if ( !dcache->inuse )
         return 0;
 
     if ( ents > dcache->entries )
@@ -293,7 +291,7 @@ int mapcache_vcpu_init(struct vcpu *v)
     BUILD_BUG_ON(MAPHASHENT_NOTINUSE < MAPCACHE_ENTRIES);
     for ( i = 0; i < MAPHASH_ENTRIES; i++ )
     {
-        struct vcpu_maphash_entry *hashent = &v->arch.pv.mapcache.hash[i];
+        struct vcpu_maphash_entry *hashent = &v->arch.mapcache.hash[i];
 
         hashent->mfn = ~0UL; /* never valid to map */
         hashent->idx = MAPHASHENT_NOTINUSE;
diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
index 43ace233d75e..eb548eb10efe 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -285,9 +285,6 @@ struct pv_domain
     /* Mitigate L1TF with shadow/crashing? */
     bool check_l1tf;
 
-    /* map_domain_page() mapping cache. */
-    struct mapcache_domain mapcache;
-
     struct cpuidmasks *cpuidmasks;
 };
 
@@ -326,6 +323,9 @@ struct arch_domain
 
     uint8_t spec_ctrl_flags; /* See SCF_DOM_MASK */
 
+    /* map_domain_page() mapping cache. */
+    struct mapcache_domain mapcache;
+
     union {
         struct pv_domain pv;
         struct hvm_domain hvm;
@@ -508,9 +508,6 @@ struct arch_domain
 
 struct pv_vcpu
 {
-    /* map_domain_page() mapping cache. */
-    struct mapcache_vcpu mapcache;
-
     unsigned int vgc_flags;
 
     struct trap_info *trap_ctxt;
@@ -610,6 +607,9 @@ struct arch_vcpu
 #define async_exception_state(t) async_exception_state[(t)-1]
     uint8_t async_exception_mask;
 
+    /* map_domain_page() mapping cache. */
+    struct mapcache_vcpu mapcache;
+
     /* Virtual Machine Extensions */
     union {
         struct pv_vcpu pv;
-- 
2.38.1




* [PATCH 10/22] x86/mapcache: initialise the mapcache for the idle domain
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
                   ` (8 preceding siblings ...)
  2022-12-16 11:48 ` [PATCH 09/22] x86: lift mapcache variable to the arch level Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2022-12-22 13:06   ` Jan Beulich
  2022-12-16 11:48 ` [PATCH 11/22] x86: add a boot option to enable and disable the direct map Julien Grall
                   ` (11 subsequent siblings)
  21 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Hongyan Xia, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Wei Wang, Julien Grall

From: Hongyan Xia <hongyxia@amazon.com>

In order to use the mapcache in the idle domain, we also have to
populate its page tables in the PERDOMAIN region, and we need to move
mapcache_domain_init() earlier in arch_domain_create().

Note, the commit 'x86: lift mapcache variable to the arch level' has
initialised the mapcache for HVM domains. With this patch, PV, HVM and
idle domains now all initialise the mapcache.

Signed-off-by: Wei Wang <wawei@amazon.de>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 xen/arch/x86/domain.c | 4 ++--
 xen/arch/x86/mm.c     | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 069b7d2af330..ec150f4fd144 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -732,6 +732,8 @@ int arch_domain_create(struct domain *d,
 
     spin_lock_init(&d->arch.e820_lock);
 
+    mapcache_domain_init(d);
+
     /* Minimal initialisation for the idle domain. */
     if ( unlikely(is_idle_domain(d)) )
     {
@@ -829,8 +831,6 @@ int arch_domain_create(struct domain *d,
 
     psr_domain_init(d);
 
-    mapcache_domain_init(d);
-
     if ( is_hvm_domain(d) )
     {
         if ( (rc = hvm_domain_initialise(d, config)) != 0 )
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 8b9740f57519..041bd4cfde17 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5963,6 +5963,9 @@ int create_perdomain_mapping(struct domain *d, unsigned long va,
         l3tab = __map_domain_page(pg);
         clear_page(l3tab);
         d->arch.perdomain_l3_pg = pg;
+        if ( is_idle_domain(d) )
+            idle_pg_table[l4_table_offset(PERDOMAIN_VIRT_START)] =
+                l4e_from_page(pg, __PAGE_HYPERVISOR_RW);
         if ( !nr )
         {
             unmap_domain_page(l3tab);
-- 
2.38.1




* [PATCH 11/22] x86: add a boot option to enable and disable the direct map
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
                   ` (9 preceding siblings ...)
  2022-12-16 11:48 ` [PATCH 10/22] x86/mapcache: initialise the mapcache for the idle domain Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2022-12-22 13:24   ` Jan Beulich
  2023-01-23 21:45   ` Stefano Stabellini
  2022-12-16 11:48 ` [PATCH 12/22] xen/arm: fixmap: Rename the fixmap slots to follow the x86 convention Julien Grall
                   ` (10 subsequent siblings)
  21 siblings, 2 replies; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Hongyan Xia, Andrew Cooper, George Dunlap, Jan Beulich,
	Stefano Stabellini, Wei Liu, Bertrand Marquis, Volodymyr Babchuk,
	Roger Pau Monné,
	Julien Grall

From: Hongyan Xia <hongyxia@amazon.com>

Also add a helper function to retrieve it. Change arch_mfns_in_directmap()
to check this option before returning.

This is added as a boot command line option, not a Kconfig option, to
allow the user to experiment with the feature without rebuilding the
hypervisor.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>

----

    TODO:
        * Do we also want to provide a Kconfig option?

    Changes since Hongyan's version:
        * Reword the commit message
        * opt_directmap is only modified during boot so mark it as
          __ro_after_init
---
 docs/misc/xen-command-line.pandoc | 12 ++++++++++++
 xen/arch/arm/include/asm/mm.h     |  5 +++++
 xen/arch/x86/include/asm/mm.h     | 17 ++++++++++++++++-
 xen/arch/x86/mm.c                 |  3 +++
 xen/arch/x86/setup.c              |  2 ++
 5 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index b7ee97be762e..a63e4612acac 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -760,6 +760,18 @@ Specify the size of the console debug trace buffer. By specifying `cpu:`
 additionally a trace buffer of the specified size is allocated per cpu.
 The debug trace feature is only enabled in debugging builds of Xen.
 
+### directmap (x86)
+> `= <boolean>`
+
+> Default: `true`
+
+Enable or disable the direct map region in Xen.
+
+By default, Xen creates the direct map region which maps physical memory
+in that region. Setting this to no will remove the direct map, blocking
+exploits that leak secrets via speculative memory access in the direct
+map.
+
 ### dma_bits
 > `= <integer>`
 
diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 68adcac9fa8d..2366928d71aa 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -406,6 +406,11 @@ static inline void page_set_xenheap_gfn(struct page_info *p, gfn_t gfn)
     } while ( (y = cmpxchg(&p->u.inuse.type_info, x, nx)) != x );
 }
 
+static inline bool arch_has_directmap(void)
+{
+    return true;
+}
+
 #endif /*  __ARCH_ARM_MM__ */
 /*
  * Local variables:
diff --git a/xen/arch/x86/include/asm/mm.h b/xen/arch/x86/include/asm/mm.h
index db29e3e2059f..cf8b20817c6c 100644
--- a/xen/arch/x86/include/asm/mm.h
+++ b/xen/arch/x86/include/asm/mm.h
@@ -464,6 +464,8 @@ static inline int get_page_and_type(struct page_info *page,
     ASSERT(((_p)->count_info & PGC_count_mask) != 0);          \
     ASSERT(page_get_owner(_p) == (_d))
 
+extern bool opt_directmap;
+
 /******************************************************************************
  * With shadow pagetables, the different kinds of address start
  * to get get confusing.
@@ -620,13 +622,26 @@ extern const char zero_page[];
 /* Build a 32bit PSE page table using 4MB pages. */
 void write_32bit_pse_identmap(uint32_t *l2);
 
+static inline bool arch_has_directmap(void)
+{
+    return opt_directmap;
+}
+
 /*
  * x86 maps part of physical memory via the directmap region.
  * Return whether the range of MFN falls in the directmap region.
+ *
+ * When boot command line sets directmap=no, we will not have a direct map at
+ * all so this will always return false.
  */
 static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned long nr)
 {
-    unsigned long eva = min(DIRECTMAP_VIRT_END, HYPERVISOR_VIRT_END);
+    unsigned long eva;
+
+    if ( !arch_has_directmap() )
+        return false;
+
+    eva = min(DIRECTMAP_VIRT_END, HYPERVISOR_VIRT_END);
 
     return (mfn + nr) <= (virt_to_mfn(eva - 1) + 1);
 }
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 041bd4cfde17..e76e135b96fc 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -157,6 +157,9 @@ l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
 l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
     l1_fixmap_x[L1_PAGETABLE_ENTRIES];
 
+bool __ro_after_init opt_directmap = true;
+boolean_param("directmap", opt_directmap);
+
 /* Frame table size in pages. */
 unsigned long max_page;
 unsigned long total_pages;
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 1c2e09711eb0..2cb051c6e4e7 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1423,6 +1423,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     if ( highmem_start )
         xenheap_max_mfn(PFN_DOWN(highmem_start - 1));
 
+    printk("Booting with directmap %s\n", arch_has_directmap() ? "on" : "off");
+
     /*
      * Walk every RAM region and map it in its entirety (on x86/64, at least)
      * and notify it to the boot allocator.
-- 
2.38.1




* [PATCH 12/22] xen/arm: fixmap: Rename the fixmap slots to follow the x86 convention
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
                   ` (10 preceding siblings ...)
  2022-12-16 11:48 ` [PATCH 11/22] x86: add a boot option to enable and disable the direct map Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2022-12-22 13:29   ` Jan Beulich
                     ` (2 more replies)
  2022-12-16 11:48 ` [PATCH 13/22] xen/x86: Add support for the PMAP Julien Grall
                   ` (9 subsequent siblings)
  21 siblings, 3 replies; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Julien Grall, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Wei Liu

From: Julien Grall <jgrall@amazon.com>

At the moment the fixmap slots are prefixed differently between arm and
x86.

Some of them (e.g. the PMAP slots) are used in common code. So it would
be better if they are named the same way to avoid having to create
aliases.

I have decided to use the x86 naming because it requires fewer changes.
So all the Arm fixmap slots will now be prefixed with FIX rather than
FIXMAP.

Signed-off-by: Julien Grall <jgrall@amazon.com>

----

    Note that potentially more renaming could be done to share
    more code in the future. I have decided not to do that to avoid
    going down a rabbit hole.
---
 xen/arch/arm/acpi/lib.c                 | 18 +++++++++---------
 xen/arch/arm/include/asm/early_printk.h |  2 +-
 xen/arch/arm/include/asm/fixmap.h       | 16 ++++++++--------
 xen/arch/arm/kernel.c                   |  6 +++---
 xen/common/pmap.c                       |  8 ++++----
 5 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/xen/arch/arm/acpi/lib.c b/xen/arch/arm/acpi/lib.c
index 41d521f720ac..736cf09ecaa8 100644
--- a/xen/arch/arm/acpi/lib.c
+++ b/xen/arch/arm/acpi/lib.c
@@ -40,10 +40,10 @@ char *__acpi_map_table(paddr_t phys, unsigned long size)
         return NULL;
 
     offset = phys & (PAGE_SIZE - 1);
-    base = FIXMAP_ADDR(FIXMAP_ACPI_BEGIN) + offset;
+    base = FIXMAP_ADDR(FIX_ACPI_BEGIN) + offset;
 
     /* Check the fixmap is big enough to map the region */
-    if ( (FIXMAP_ADDR(FIXMAP_ACPI_END) + PAGE_SIZE - base) < size )
+    if ( (FIXMAP_ADDR(FIX_ACPI_END) + PAGE_SIZE - base) < size )
         return NULL;
 
     /* With the fixmap, we can only map one region at the time */
@@ -54,7 +54,7 @@ char *__acpi_map_table(paddr_t phys, unsigned long size)
 
     size += offset;
     mfn = maddr_to_mfn(phys);
-    idx = FIXMAP_ACPI_BEGIN;
+    idx = FIX_ACPI_BEGIN;
 
     do {
         set_fixmap(idx, mfn, PAGE_HYPERVISOR);
@@ -72,8 +72,8 @@ bool __acpi_unmap_table(const void *ptr, unsigned long size)
     unsigned int idx;
 
     /* We are only handling fixmap address in the arch code */
-    if ( (vaddr < FIXMAP_ADDR(FIXMAP_ACPI_BEGIN)) ||
-         (vaddr >= (FIXMAP_ADDR(FIXMAP_ACPI_END) + PAGE_SIZE)) )
+    if ( (vaddr < FIXMAP_ADDR(FIX_ACPI_BEGIN)) ||
+         (vaddr >= (FIXMAP_ADDR(FIX_ACPI_END) + PAGE_SIZE)) )
         return false;
 
     /*
@@ -81,16 +81,16 @@ bool __acpi_unmap_table(const void *ptr, unsigned long size)
      * for the ACPI fixmap region. The caller is expected to free with
      * the same address.
      */
-    ASSERT((vaddr & PAGE_MASK) == FIXMAP_ADDR(FIXMAP_ACPI_BEGIN));
+    ASSERT((vaddr & PAGE_MASK) == FIXMAP_ADDR(FIX_ACPI_BEGIN));
 
     /* The region allocated fit in the ACPI fixmap region. */
-    ASSERT(size < (FIXMAP_ADDR(FIXMAP_ACPI_END) + PAGE_SIZE - vaddr));
+    ASSERT(size < (FIXMAP_ADDR(FIX_ACPI_END) + PAGE_SIZE - vaddr));
     ASSERT(fixmap_inuse);
 
     fixmap_inuse = false;
 
-    size += vaddr - FIXMAP_ADDR(FIXMAP_ACPI_BEGIN);
-    idx = FIXMAP_ACPI_BEGIN;
+    size += vaddr - FIXMAP_ADDR(FIX_ACPI_BEGIN);
+    idx = FIX_ACPI_BEGIN;
 
     do
     {
diff --git a/xen/arch/arm/include/asm/early_printk.h b/xen/arch/arm/include/asm/early_printk.h
index c5149b2976da..a5f48801f476 100644
--- a/xen/arch/arm/include/asm/early_printk.h
+++ b/xen/arch/arm/include/asm/early_printk.h
@@ -17,7 +17,7 @@
 
 /* need to add the uart address offset in page to the fixmap address */
 #define EARLY_UART_VIRTUAL_ADDRESS \
-    (FIXMAP_ADDR(FIXMAP_CONSOLE) + (CONFIG_EARLY_UART_BASE_ADDRESS & ~PAGE_MASK))
+    (FIXMAP_ADDR(FIX_CONSOLE) + (CONFIG_EARLY_UART_BASE_ADDRESS & ~PAGE_MASK))
 
 #endif /* !CONFIG_EARLY_PRINTK */
 
diff --git a/xen/arch/arm/include/asm/fixmap.h b/xen/arch/arm/include/asm/fixmap.h
index d0c9a52c8c28..154db85686c2 100644
--- a/xen/arch/arm/include/asm/fixmap.h
+++ b/xen/arch/arm/include/asm/fixmap.h
@@ -8,17 +8,17 @@
 #include <xen/pmap.h>
 
 /* Fixmap slots */
-#define FIXMAP_CONSOLE  0  /* The primary UART */
-#define FIXMAP_MISC     1  /* Ephemeral mappings of hardware */
-#define FIXMAP_ACPI_BEGIN  2  /* Start mappings of ACPI tables */
-#define FIXMAP_ACPI_END    (FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1)  /* End mappings of ACPI tables */
-#define FIXMAP_PMAP_BEGIN (FIXMAP_ACPI_END + 1) /* Start of PMAP */
-#define FIXMAP_PMAP_END (FIXMAP_PMAP_BEGIN + NUM_FIX_PMAP - 1) /* End of PMAP */
+#define FIX_CONSOLE  0  /* The primary UART */
+#define FIX_MISC     1  /* Ephemeral mappings of hardware */
+#define FIX_ACPI_BEGIN  2  /* Start mappings of ACPI tables */
+#define FIX_ACPI_END    (FIX_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1)  /* End mappings of ACPI tables */
+#define FIX_PMAP_BEGIN (FIX_ACPI_END + 1) /* Start of PMAP */
+#define FIX_PMAP_END (FIX_PMAP_BEGIN + NUM_FIX_PMAP - 1) /* End of PMAP */
 
-#define FIXMAP_LAST FIXMAP_PMAP_END
+#define FIX_LAST FIX_PMAP_END
 
 #define FIXADDR_START FIXMAP_ADDR(0)
-#define FIXADDR_TOP FIXMAP_ADDR(FIXMAP_LAST)
+#define FIXADDR_TOP FIXMAP_ADDR(FIX_LAST)
 
 #ifndef __ASSEMBLY__
 
diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index 23b840ea9ea8..56800750fd9c 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -49,7 +49,7 @@ struct minimal_dtb_header {
  */
 void __init copy_from_paddr(void *dst, paddr_t paddr, unsigned long len)
 {
-    void *src = (void *)FIXMAP_ADDR(FIXMAP_MISC);
+    void *src = (void *)FIXMAP_ADDR(FIX_MISC);
 
     while (len) {
         unsigned long l, s;
@@ -57,10 +57,10 @@ void __init copy_from_paddr(void *dst, paddr_t paddr, unsigned long len)
         s = paddr & (PAGE_SIZE-1);
         l = min(PAGE_SIZE - s, len);
 
-        set_fixmap(FIXMAP_MISC, maddr_to_mfn(paddr), PAGE_HYPERVISOR_WC);
+        set_fixmap(FIX_MISC, maddr_to_mfn(paddr), PAGE_HYPERVISOR_WC);
         memcpy(dst, src + s, l);
         clean_dcache_va_range(dst, l);
-        clear_fixmap(FIXMAP_MISC);
+        clear_fixmap(FIX_MISC);
 
         paddr += l;
         dst += l;
diff --git a/xen/common/pmap.c b/xen/common/pmap.c
index 14517198aae3..6e3ba9298df4 100644
--- a/xen/common/pmap.c
+++ b/xen/common/pmap.c
@@ -32,8 +32,8 @@ void *__init pmap_map(mfn_t mfn)
 
     __set_bit(idx, inuse);
 
-    slot = idx + FIXMAP_PMAP_BEGIN;
-    ASSERT(slot >= FIXMAP_PMAP_BEGIN && slot <= FIXMAP_PMAP_END);
+    slot = idx + FIX_PMAP_BEGIN;
+    ASSERT(slot >= FIX_PMAP_BEGIN && slot <= FIX_PMAP_END);
 
     /*
      * We cannot use set_fixmap() here. We use PMAP when the domain map
@@ -53,10 +53,10 @@ void __init pmap_unmap(const void *p)
     unsigned int slot = virt_to_fix((unsigned long)p);
 
     ASSERT(system_state < SYS_STATE_smp_boot);
-    ASSERT(slot >= FIXMAP_PMAP_BEGIN && slot <= FIXMAP_PMAP_END);
+    ASSERT(slot >= FIX_PMAP_BEGIN && slot <= FIX_PMAP_END);
     ASSERT(!in_irq());
 
-    idx = slot - FIXMAP_PMAP_BEGIN;
+    idx = slot - FIX_PMAP_BEGIN;
 
     __clear_bit(idx, inuse);
     arch_pmap_unmap(slot);
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 13/22] xen/x86: Add support for the PMAP
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
                   ` (11 preceding siblings ...)
  2022-12-16 11:48 ` [PATCH 12/22] xen/arm: fixmap: Rename the fixmap slots to follow the x86 convention Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2023-01-05 16:46   ` Jan Beulich
  2022-12-16 11:48 ` [PATCH 14/22] x86/domain_page: remove the fast paths when mfn is not in the directmap Julien Grall
                   ` (8 subsequent siblings)
  21 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Julien Grall, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu

From: Julien Grall <jgrall@amazon.com>

PMAP will be used in a follow-up patch to bootstrap the map-domain-page
infrastructure -- we need some way to map pages in order to set up the
mapcache without a direct map.

Signed-off-by: Julien Grall <jgrall@amazon.com>

----
    The PMAP infrastructure was upstreamed separately for Arm after
    Hongyan sent the secret-free hypervisor series. So this is a new
    patch to plumb the feature on x86.
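
    As a reminder of how these hooks end up being used, here is a minimal
    usage sketch of the common PMAP layer (pmap_map()/pmap_unmap() from
    xen/common/pmap.c) that the arch helpers below plug into. The call
    site is purely illustrative and not part of this patch; 'mfn' and
    'src' are hypothetical locals:

        /* Early boot only: no mapcache yet, so borrow a FIX_PMAP_* slot. */
        void *p = pmap_map(mfn);    /* ends up in arch_pmap_map(slot, mfn) */

        memcpy(p, src, PAGE_SIZE);  /* use the temporary mapping */

        pmap_unmap(p);              /* arch_pmap_unmap(slot) + local TLB flush */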
---
 xen/arch/x86/Kconfig              |  1 +
 xen/arch/x86/include/asm/fixmap.h |  4 ++++
 xen/arch/x86/include/asm/pmap.h   | 25 +++++++++++++++++++++++++
 3 files changed, 30 insertions(+)
 create mode 100644 xen/arch/x86/include/asm/pmap.h

diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 6a7825f4ba3c..47b120f18497 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -24,6 +24,7 @@ config X86
 	select HAS_PCI
 	select HAS_PCI_MSI
 	select HAS_PDX
+	select HAS_PMAP
 	select HAS_SCHED_GRANULARITY
 	select HAS_UBSAN
 	select HAS_VPCI if HVM
diff --git a/xen/arch/x86/include/asm/fixmap.h b/xen/arch/x86/include/asm/fixmap.h
index 516ec3fa6c95..38f079873418 100644
--- a/xen/arch/x86/include/asm/fixmap.h
+++ b/xen/arch/x86/include/asm/fixmap.h
@@ -21,6 +21,8 @@
 
 #include <xen/acpi.h>
 #include <xen/pfn.h>
+#include <xen/pmap.h>
+
 #include <asm/apicdef.h>
 #include <asm/msi.h>
 #include <acpi/apei.h>
@@ -54,6 +56,8 @@ enum fixed_addresses {
     FIX_XEN_SHARED_INFO,
 #endif /* CONFIG_XEN_GUEST */
     /* Everything else should go further down. */
+    FIX_PMAP_BEGIN,
+    FIX_PMAP_END = FIX_PMAP_BEGIN + NUM_FIX_PMAP - 1,
     FIX_APIC_BASE,
     FIX_IO_APIC_BASE_0,
     FIX_IO_APIC_BASE_END = FIX_IO_APIC_BASE_0 + MAX_IO_APICS-1,
diff --git a/xen/arch/x86/include/asm/pmap.h b/xen/arch/x86/include/asm/pmap.h
new file mode 100644
index 000000000000..62746e191d03
--- /dev/null
+++ b/xen/arch/x86/include/asm/pmap.h
@@ -0,0 +1,25 @@
+#ifndef __ASM_PMAP_H__
+#define __ASM_PMAP_H__
+
+#include <asm/fixmap.h>
+
+static inline void arch_pmap_map(unsigned int slot, mfn_t mfn)
+{
+    unsigned long linear = (unsigned long)fix_to_virt(slot);
+    l1_pgentry_t *pl1e = &l1_fixmap[l1_table_offset(linear)];
+
+    ASSERT(!(l1e_get_flags(*pl1e) & _PAGE_PRESENT));
+
+    l1e_write_atomic(pl1e, l1e_from_mfn(mfn, PAGE_HYPERVISOR));
+}
+
+static inline void arch_pmap_unmap(unsigned int slot)
+{
+    unsigned long linear = (unsigned long)fix_to_virt(slot);
+    l1_pgentry_t *pl1e = &l1_fixmap[l1_table_offset(linear)];
+
+    l1e_write_atomic(pl1e, l1e_empty());
+    flush_tlb_one_local(linear);
+}
+
+#endif /* __ASM_PMAP_H__ */
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 14/22] x86/domain_page: remove the fast paths when mfn is not in the directmap
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
                   ` (12 preceding siblings ...)
  2022-12-16 11:48 ` [PATCH 13/22] xen/x86: Add support for the PMAP Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2023-01-11 14:11   ` Jan Beulich
  2022-12-16 11:48 ` [PATCH 15/22] xen/page_alloc: add a path for xenheap when there is no direct map Julien Grall
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Hongyan Xia, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall

From: Hongyan Xia <hongyxia@amazon.com>

When the MFN is not in the direct map, never use mfn_to_virt() for any
mappings.

We replace mfn_x(mfn) <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) with
arch_mfns_in_directmap(mfn, 1) because these two checks are equivalent.
The extra comparison in arch_mfns_in_directmap() looks different, but
because DIRECTMAP_VIRT_END is always higher it does not make any
difference.

Lastly, domain_page_map_to_mfn() needs to gain a special case for
the PMAP.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>

----
    Changes since Hongyan's version:
        * arch_mfn_in_direct_map() was renamed to arch_mfns_in_directmap()
        * add a special case for the PMAP in domain_page_map_to_mfn()
---
 xen/arch/x86/domain_page.c | 50 +++++++++++++++++++++++++++++++-------
 1 file changed, 41 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index 55e337aaf703..89caefc8a210 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -14,8 +14,10 @@
 #include <xen/sched.h>
 #include <xen/vmap.h>
 #include <asm/current.h>
+#include <asm/fixmap.h>
 #include <asm/flushtlb.h>
 #include <asm/hardirq.h>
+#include <asm/pmap.h>
 #include <asm/setup.h>
 
 static DEFINE_PER_CPU(struct vcpu *, override);
@@ -35,10 +37,11 @@ static inline struct vcpu *mapcache_current_vcpu(void)
     /*
      * When using efi runtime page tables, we have the equivalent of the idle
      * domain's page tables but current may point at another domain's VCPU.
-     * Return NULL as though current is not properly set up yet.
+     * Return the idle domain's vcpu on that core because the efi per-domain
+     * region (where the mapcache is) is in-sync with the idle domain.
      */
     if ( efi_rs_using_pgtables() )
-        return NULL;
+        return idle_vcpu[smp_processor_id()];
 
     /*
      * If guest_table is NULL, and we are running a paravirtualised guest,
@@ -77,18 +80,24 @@ void *map_domain_page(mfn_t mfn)
     struct vcpu_maphash_entry *hashent;
 
 #ifdef NDEBUG
-    if ( mfn_x(mfn) <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
+    if ( arch_mfns_in_directmap(mfn_x(mfn), 1) )
         return mfn_to_virt(mfn_x(mfn));
 #endif
 
     v = mapcache_current_vcpu();
-    if ( !v )
-        return mfn_to_virt(mfn_x(mfn));
+    if ( !v || !v->domain->arch.mapcache.inuse )
+    {
+        if ( arch_mfns_in_directmap(mfn_x(mfn), 1) )
+            return mfn_to_virt(mfn_x(mfn));
+        else
+        {
+            BUG_ON(system_state >= SYS_STATE_smp_boot);
+            return pmap_map(mfn);
+        }
+    }
 
     dcache = &v->domain->arch.mapcache;
     vcache = &v->arch.mapcache;
-    if ( !dcache->inuse )
-        return mfn_to_virt(mfn_x(mfn));
 
     perfc_incr(map_domain_page_count);
 
@@ -184,6 +193,12 @@ void unmap_domain_page(const void *ptr)
     if ( !va || va >= DIRECTMAP_VIRT_START )
         return;
 
+    if ( va >= FIXADDR_START && va < FIXADDR_TOP )
+    {
+        pmap_unmap((void *)ptr);
+        return;
+    }
+
     ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
 
     v = mapcache_current_vcpu();
@@ -237,7 +252,7 @@ int mapcache_domain_init(struct domain *d)
     unsigned int bitmap_pages;
 
 #ifdef NDEBUG
-    if ( !mem_hotplug && max_page <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
+    if ( !mem_hotplug && arch_mfns_in_directmap(0, max_page) )
         return 0;
 #endif
 
@@ -308,7 +323,7 @@ void *map_domain_page_global(mfn_t mfn)
             local_irq_is_enabled()));
 
 #ifdef NDEBUG
-    if ( mfn_x(mfn) <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
+    if ( arch_mfns_in_directmap(mfn_x(mfn), 1) )
         return mfn_to_virt(mfn_x(mfn));
 #endif
 
@@ -335,6 +350,23 @@ mfn_t domain_page_map_to_mfn(const void *ptr)
     if ( va >= DIRECTMAP_VIRT_START )
         return _mfn(virt_to_mfn(ptr));
 
+    /*
+     * The fixmap is stealing the top-end of the VMAP. So the check for
+     * the PMAP *must* happen first.
+     *
+     * Also, the fixmap translates a slot to an address backwards. The
+     * logic will rely on it to avoid any complexity. So check at
+     * compile time that this will always hold.
+     */
+    BUILD_BUG_ON(fix_to_virt(FIX_PMAP_BEGIN) < fix_to_virt(FIX_PMAP_END));
+
+    if ( ((unsigned long)fix_to_virt(FIX_PMAP_END) <= va) &&
+         ((va & PAGE_MASK) <= (unsigned long)fix_to_virt(FIX_PMAP_BEGIN)) )
+    {
+        BUG_ON(system_state >= SYS_STATE_smp_boot);
+        return l1e_get_mfn(l1_fixmap[l1_table_offset(va)]);
+    }
+
     if ( va >= VMAP_VIRT_START && va < VMAP_VIRT_END )
         return vmap_to_mfn(va);
 
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 15/22] xen/page_alloc: add a path for xenheap when there is no direct map
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
                   ` (13 preceding siblings ...)
  2022-12-16 11:48 ` [PATCH 14/22] x86/domain_page: remove the fast paths when mfn is not in the directmap Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2023-01-11 14:23   ` Jan Beulich
  2022-12-16 11:48 ` [PATCH 16/22] x86/setup: leave early boot slightly earlier Julien Grall
                   ` (6 subsequent siblings)
  21 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Hongyan Xia, Andrew Cooper, George Dunlap, Jan Beulich,
	Stefano Stabellini, Wei Liu, Julien Grall

From: Hongyan Xia <hongyxia@amazon.com>

When there is not an always-mapped direct map, xenheap allocations need
to be mapped and unmapped on-demand.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>

----

    I have left the calls to map_pages_to_xen() and destroy_xen_mappings()
    in the split heap for now. I am not entirely convinced this is necessary
    because, in that setup, only the xenheap would always be mapped and
    it doesn't contain any guest memory (aside from the grant table).
    So mapping/unmapping on every allocation seems unnecessary.

    Changes since Hongyan's version:
        * Rebase
        * Fix indentation in alloc_xenheap_pages()
        * Fix build for arm32
---
 xen/common/page_alloc.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 0a950288e241..0c4af5a71407 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2222,6 +2222,7 @@ void init_xenheap_pages(paddr_t ps, paddr_t pe)
 void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
 {
     struct page_info *pg;
+    void *ret;
 
     ASSERT_ALLOC_CONTEXT();
 
@@ -2230,17 +2231,36 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
     if ( unlikely(pg == NULL) )
         return NULL;
 
+    ret = page_to_virt(pg);
+
+    if ( !arch_has_directmap() &&
+         map_pages_to_xen((unsigned long)ret, page_to_mfn(pg), 1UL << order,
+                          PAGE_HYPERVISOR) )
+    {
+        /* Failed to map xenheap pages. */
+        free_heap_pages(pg, order, false);
+        return NULL;
+    }
+
     return page_to_virt(pg);
 }
 
 
 void free_xenheap_pages(void *v, unsigned int order)
 {
+    unsigned long va = (unsigned long)v & PAGE_MASK;
+
     ASSERT_ALLOC_CONTEXT();
 
     if ( v == NULL )
         return;
 
+    if ( !arch_has_directmap() &&
+         destroy_xen_mappings(va, va + (1UL << (order + PAGE_SHIFT))) )
+        dprintk(XENLOG_WARNING,
+                "Error while destroying xenheap mappings at %p, order %u\n",
+                v, order);
+
     free_heap_pages(virt_to_page(v), order, false);
 }
 
@@ -2264,6 +2284,7 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
 {
     struct page_info *pg;
     unsigned int i;
+    void *ret;
 
     ASSERT_ALLOC_CONTEXT();
 
@@ -2276,16 +2297,28 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
     if ( unlikely(pg == NULL) )
         return NULL;
 
+    ret = page_to_virt(pg);
+
+    if ( !arch_has_directmap() &&
+         map_pages_to_xen((unsigned long)ret, page_to_mfn(pg), 1UL << order,
+                          PAGE_HYPERVISOR) )
+    {
+        /* Failed to map xenheap pages. */
+        free_domheap_pages(pg, order);
+        return NULL;
+    }
+
     for ( i = 0; i < (1u << order); i++ )
         pg[i].count_info |= PGC_xen_heap;
 
-    return page_to_virt(pg);
+    return ret;
 }
 
 void free_xenheap_pages(void *v, unsigned int order)
 {
     struct page_info *pg;
     unsigned int i;
+    unsigned long va = (unsigned long)v & PAGE_MASK;
 
     ASSERT_ALLOC_CONTEXT();
 
@@ -2297,6 +2330,12 @@ void free_xenheap_pages(void *v, unsigned int order)
     for ( i = 0; i < (1u << order); i++ )
         pg[i].count_info &= ~PGC_xen_heap;
 
+    if ( !arch_has_directmap() &&
+         destroy_xen_mappings(va, va + (1UL << (order + PAGE_SHIFT))) )
+        dprintk(XENLOG_WARNING,
+                "Error while destroying xenheap mappings at %p, order %u\n",
+                v, order);
+
     free_heap_pages(pg, order, true);
 }
 
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 16/22] x86/setup: leave early boot slightly earlier
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
                   ` (14 preceding siblings ...)
  2022-12-16 11:48 ` [PATCH 15/22] xen/page_alloc: add a path for xenheap when there is no direct map Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2023-01-11 14:34   ` Jan Beulich
  2022-12-16 11:48 ` [PATCH 17/22] x86/setup: vmap heap nodes when they are outside the direct map Julien Grall
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Hongyan Xia, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall

From: Hongyan Xia <hongyxia@amazon.com>

When we do not have a direct map, memory for metadata of heap nodes in
init_node_heap() is allocated from xenheap, which needs to be mapped and
unmapped on demand. However, we cannot just take memory from the boot
allocator to create the PTEs while we are passing memory to the heap
allocator.

To solve this race, we leave early boot slightly sooner so that Xen PTE
pages are allocated from the heap instead of the boot allocator. We can
do this because the metadata for the 1st node is statically allocated,
and by the time we need memory to create mappings for the 2nd node, we
already have enough memory in the heap allocator in the 1st node.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
---
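    For context on why moving system_state forward has this effect: the
    page-table allocation path picks its allocator based on system_state.
    A simplified sketch of that existing selection logic follows; it is
    illustrative only (not code added by this patch, and the exact
    function name/signature in the tree may differ):

        /* Roughly how Xen decides where PTE pages come from. */
        static mfn_t alloc_pte_page(void)
        {
            if ( system_state != SYS_STATE_early_boot )
            {
                void *ptr = alloc_xenheap_page();    /* heap allocator */

                return ptr ? _mfn(virt_to_mfn(ptr)) : INVALID_MFN;
            }

            return alloc_boot_pages(1, 1);           /* boot allocator */
        }
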
 xen/arch/x86/setup.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 2cb051c6e4e7..ec5a7448a225 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1648,6 +1648,22 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
     numa_initmem_init(0, raw_max_page);
 
+    /*
+     * When we do not have a direct map, memory for metadata of heap nodes in
+     * init_node_heap() is allocated from xenheap, which needs to be mapped and
+     * unmapped on demand. However, we cannot just take memory from the boot
+     * allocator to create the PTEs while we are passing memory to the heap
+     * allocator during end_boot_allocator().
+     *
+     * To solve this race, we need to leave early boot before
+     * end_boot_allocator() so that Xen PTE pages are allocated from the heap
+     * instead of the boot allocator. We can do this because the metadata for
+     * the 1st node is statically allocated, and by the time we need memory to
+     * create mappings for the 2nd node, we already have enough memory in the
+     * heap allocator in the 1st node.
+     */
+    system_state = SYS_STATE_boot;
+
     if ( max_page - 1 > virt_to_mfn(HYPERVISOR_VIRT_END - 1) )
     {
         unsigned long limit = virt_to_mfn(HYPERVISOR_VIRT_END - 1);
@@ -1677,8 +1693,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     else
         end_boot_allocator();
 
-    system_state = SYS_STATE_boot;
-
     bsp_stack = cpu_alloc_stack(0);
     if ( !bsp_stack )
         panic("No memory for BSP stack\n");
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 17/22] x86/setup: vmap heap nodes when they are outside the direct map
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
                   ` (15 preceding siblings ...)
  2022-12-16 11:48 ` [PATCH 16/22] x86/setup: leave early boot slightly earlier Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2023-01-11 14:39   ` Jan Beulich
  2023-01-23 22:03   ` Stefano Stabellini
  2022-12-16 11:48 ` [PATCH 18/22] x86/setup: do not create valid mappings when directmap=no Julien Grall
                   ` (4 subsequent siblings)
  21 siblings, 2 replies; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Hongyan Xia, Andrew Cooper, George Dunlap, Jan Beulich,
	Stefano Stabellini, Wei Liu, Julien Grall

From: Hongyan Xia <hongyxia@amazon.com>

When we do not have a direct map, arch_mfns_in_directmap() will always
return false, so init_node_heap() will allocate xenheap pages from an
existing node for the metadata of a new node. This means that the
metadata of a new node ends up in a different node, slowing down heap
allocation.

Since we now have early vmap, vmap the metadata locally in the new node.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>

----

    Changes from Hongyan's version:
        * arch_mfn_in_direct_map() was renamed to
          arch_mfns_in_directmap()
        * Use vmap_contig_pages() rather than __vmap(...).
        * Add missing include (xen/vmap.h) so it compiles on Arm
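
    vmap_contig_pages() itself was introduced earlier in this series (in
    the acpi_os_alloc_memory patch) and is not visible here. As a rough
    sketch of such a wrapper, assuming the existing __vmap() interface
    (illustrative only, not necessarily the exact implementation):

        /* Map nr_pages physically contiguous pages starting at mfn. */
        void *vmap_contig_pages(mfn_t mfn, unsigned int nr_pages)
        {
            /* One chunk of granularity nr_pages => a contiguous range. */
            return __vmap(&mfn, nr_pages, 1, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
        }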
---
 xen/common/page_alloc.c | 42 +++++++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 0c4af5a71407..581c15d74dfb 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -136,6 +136,7 @@
 #include <xen/sched.h>
 #include <xen/softirq.h>
 #include <xen/spinlock.h>
+#include <xen/vmap.h>
 
 #include <asm/flushtlb.h>
 #include <asm/numa.h>
@@ -597,22 +598,43 @@ static unsigned long init_node_heap(int node, unsigned long mfn,
         needed = 0;
     }
     else if ( *use_tail && nr >= needed &&
-              arch_mfns_in_directmap(mfn + nr - needed, needed) &&
               (!xenheap_bits ||
                !((mfn + nr - 1) >> (xenheap_bits - PAGE_SHIFT))) )
     {
-        _heap[node] = mfn_to_virt(mfn + nr - needed);
-        avail[node] = mfn_to_virt(mfn + nr - 1) +
-                      PAGE_SIZE - sizeof(**avail) * NR_ZONES;
-    }
-    else if ( nr >= needed &&
-              arch_mfns_in_directmap(mfn, needed) &&
+        if ( arch_mfns_in_directmap(mfn + nr - needed, needed) )
+        {
+            _heap[node] = mfn_to_virt(mfn + nr - needed);
+            avail[node] = mfn_to_virt(mfn + nr - 1) +
+                          PAGE_SIZE - sizeof(**avail) * NR_ZONES;
+        }
+        else
+        {
+            mfn_t needed_start = _mfn(mfn + nr - needed);
+
+            _heap[node] = vmap_contig_pages(needed_start, needed);
+            BUG_ON(!_heap[node]);
+            avail[node] = (void *)(_heap[node]) + (needed << PAGE_SHIFT) -
+                          sizeof(**avail) * NR_ZONES;
+        }
+    } else if ( nr >= needed &&
               (!xenheap_bits ||
                !((mfn + needed - 1) >> (xenheap_bits - PAGE_SHIFT))) )
     {
-        _heap[node] = mfn_to_virt(mfn);
-        avail[node] = mfn_to_virt(mfn + needed - 1) +
-                      PAGE_SIZE - sizeof(**avail) * NR_ZONES;
+        if ( arch_mfns_in_directmap(mfn, needed) )
+        {
+            _heap[node] = mfn_to_virt(mfn);
+            avail[node] = mfn_to_virt(mfn + needed - 1) +
+                          PAGE_SIZE - sizeof(**avail) * NR_ZONES;
+        }
+        else
+        {
+            mfn_t needed_start = _mfn(mfn);
+
+            _heap[node] = vmap_contig_pages(needed_start, needed);
+            BUG_ON(!_heap[node]);
+            avail[node] = (void *)(_heap[node]) + (needed << PAGE_SHIFT) -
+                          sizeof(**avail) * NR_ZONES;
+        }
         *use_tail = false;
     }
     else if ( get_order_from_bytes(sizeof(**_heap)) ==
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 18/22] x86/setup: do not create valid mappings when directmap=no
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
                   ` (16 preceding siblings ...)
  2022-12-16 11:48 ` [PATCH 17/22] x86/setup: vmap heap nodes when they are outside the direct map Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2023-01-11 14:47   ` Jan Beulich
  2022-12-16 11:48 ` [PATCH 19/22] xen/arm32: mm: Rename 'first' to 'root' in init_secondary_pagetables() Julien Grall
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Hongyan Xia, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall

From: Hongyan Xia <hongyxia@amazon.com>

Create empty mappings in the second e820 pass. Also, destroy existing
direct map mappings created in the first pass.

To make xenheap pages visible to guests, it is necessary to create empty
L3 tables in the direct map even when directmap=no, since guest cr3s
copy the idle domain's L4 entries. This means they will share mappings
in the direct map as long as we pre-populate the idle domain's L4
entries and L3 tables. A helper is introduced for this.

Also, after the direct map is actually gone, we need to stop updating
the direct map in update_xen_mappings().

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 xen/arch/x86/setup.c | 74 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 67 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index ec5a7448a225..87967abb00cb 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -856,6 +856,57 @@ static struct domain *__init create_dom0(const module_t *image,
 /* How much of the directmap is prebuilt at compile time. */
 #define PREBUILT_MAP_LIMIT (1 << L2_PAGETABLE_SHIFT)
 
+/*
+ * This either populates a valid direct map, or allocates empty L3 tables and
+ * creates the L4 entries for virtual addresses between [start, end) in the
+ * direct map, depending on arch_has_directmap().
+ *
+ * When directmap=no, we still need to populate empty L3 tables in the
+ * direct map region. The reason is that on-demand xenheap mappings are
+ * created in the idle domain's page table but must be seen by
+ * everyone. Since all domains share the direct map L4 entries, they
+ * will share xenheap mappings if we pre-populate the L4 entries and L3
+ * tables in the direct map region for all RAM. We also rely on the fact
+ * that L3 tables are never freed.
+ */
+static void __init populate_directmap(uint64_t pstart, uint64_t pend,
+                                      unsigned int flags)
+{
+    unsigned long vstart = (unsigned long)__va(pstart);
+    unsigned long vend = (unsigned long)__va(pend);
+
+    if ( pstart >= pend )
+        return;
+
+    BUG_ON(vstart < DIRECTMAP_VIRT_START);
+    BUG_ON(vend > DIRECTMAP_VIRT_END);
+
+    if ( arch_has_directmap() )
+        /* Populate valid direct map. */
+        BUG_ON(map_pages_to_xen(vstart, maddr_to_mfn(pstart),
+                                PFN_DOWN(pend - pstart), flags));
+    else
+    {
+        /* Create empty L3 tables. */
+        unsigned long vaddr = vstart & ~((1UL << L4_PAGETABLE_SHIFT) - 1);
+
+        for ( ; vaddr < vend; vaddr += (1UL << L4_PAGETABLE_SHIFT) )
+        {
+            l4_pgentry_t *pl4e = &idle_pg_table[l4_table_offset(vaddr)];
+
+            if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
+            {
+                mfn_t mfn = alloc_boot_pages(1, 1);
+                void *v = map_domain_page(mfn);
+
+                clear_page(v);
+                UNMAP_DOMAIN_PAGE(v);
+                l4e_write(pl4e, l4e_from_mfn(mfn, __PAGE_HYPERVISOR));
+            }
+        }
+    }
+}
+
 void __init noreturn __start_xen(unsigned long mbi_p)
 {
     char *memmap_type = NULL;
@@ -1507,8 +1558,17 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         map_e = min_t(uint64_t, e,
                       ARRAY_SIZE(l2_directmap) << L2_PAGETABLE_SHIFT);
 
-        /* Pass mapped memory to allocator /before/ creating new mappings. */
+        /*
+         * Pass mapped memory to allocator /before/ creating new mappings.
+         * The direct map for the bottom 4GiB has been populated in the first
+         * e820 pass. In the second pass, we make sure those existing mappings
+         * are destroyed when directmap=no.
+         */
         init_boot_pages(s, min(map_s, e));
+        if ( !arch_has_directmap() )
+            destroy_xen_mappings((unsigned long)__va(s),
+                                 (unsigned long)__va(min(map_s, e)));
+
         s = map_s;
         if ( s < map_e )
         {
@@ -1517,6 +1577,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
             map_s = (s + mask) & ~mask;
             map_e &= ~mask;
             init_boot_pages(map_s, map_e);
+            if ( !arch_has_directmap() )
+                destroy_xen_mappings((unsigned long)__va(map_s),
+                                     (unsigned long)__va(map_e));
         }
 
         if ( map_s > map_e )
@@ -1530,8 +1593,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 
             if ( map_e < end )
             {
-                map_pages_to_xen((unsigned long)__va(map_e), maddr_to_mfn(map_e),
-                                 PFN_DOWN(end - map_e), PAGE_HYPERVISOR);
+                populate_directmap(map_e, end, PAGE_HYPERVISOR);
                 init_boot_pages(map_e, end);
                 map_e = end;
             }
@@ -1540,13 +1602,11 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         {
             /* This range must not be passed to the boot allocator and
              * must also not be mapped with _PAGE_GLOBAL. */
-            map_pages_to_xen((unsigned long)__va(map_e), maddr_to_mfn(map_e),
-                             PFN_DOWN(e - map_e), __PAGE_HYPERVISOR_RW);
+            populate_directmap(map_e, e, __PAGE_HYPERVISOR_RW);
         }
         if ( s < map_s )
         {
-            map_pages_to_xen((unsigned long)__va(s), maddr_to_mfn(s),
-                             PFN_DOWN(map_s - s), PAGE_HYPERVISOR);
+            populate_directmap(s, map_s, PAGE_HYPERVISOR);
             init_boot_pages(s, map_s);
         }
     }
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 19/22] xen/arm32: mm: Rename 'first' to 'root' in init_secondary_pagetables()
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
                   ` (17 preceding siblings ...)
  2022-12-16 11:48 ` [PATCH 18/22] x86/setup: do not create valid mappings when directmap=no Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2023-01-06 14:54   ` Henry Wang
  2023-01-23 22:06   ` Stefano Stabellini
  2022-12-16 11:48 ` [PATCH 20/22] xen/arm64: mm: Use per-pCPU page-tables Julien Grall
                   ` (2 subsequent siblings)
  21 siblings, 2 replies; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Julien Grall, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk

From: Julien Grall <jgrall@amazon.com>

The arm32 version of init_secondary_pagetables() will soon be re-used
for arm64 as well, where the root table starts at level 0 rather than level 1.

So rename 'first' to 'root'.

Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 xen/arch/arm/mm.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 0fc6f2992dd1..4e208f7d20c8 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -571,32 +571,30 @@ int init_secondary_pagetables(int cpu)
 #else
 int init_secondary_pagetables(int cpu)
 {
-    lpae_t *first;
+    lpae_t *root = alloc_xenheap_page();
 
-    first = alloc_xenheap_page(); /* root == first level on 32-bit 3-level trie */
-
-    if ( !first )
+    if ( !root )
     {
-        printk("CPU%u: Unable to allocate the first page-table\n", cpu);
+        printk("CPU%u: Unable to allocate the root page-table\n", cpu);
         return -ENOMEM;
     }
 
     /* Initialise root pagetable from root of boot tables */
-    memcpy(first, cpu0_pgtable, PAGE_SIZE);
-    per_cpu(xen_pgtable, cpu) = first;
+    memcpy(root, cpu0_pgtable, PAGE_SIZE);
+    per_cpu(xen_pgtable, cpu) = root;
 
     if ( !init_domheap_mappings(cpu) )
     {
         printk("CPU%u: Unable to prepare the domheap page-tables\n", cpu);
         per_cpu(xen_pgtable, cpu) = NULL;
-        free_xenheap_page(first);
+        free_xenheap_page(root);
         return -ENOMEM;
     }
 
     clear_boot_pagetables();
 
     /* Set init_ttbr for this CPU coming up */
-    init_ttbr = __pa(first);
+    init_ttbr = __pa(root);
     clean_dcache(init_ttbr);
 
     return 0;
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 20/22] xen/arm64: mm: Use per-pCPU page-tables
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
                   ` (18 preceding siblings ...)
  2022-12-16 11:48 ` [PATCH 19/22] xen/arm32: mm: Rename 'first' to 'root' in init_secondary_pagetables() Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2023-01-06 14:54   ` Henry Wang
  2023-01-23 22:21   ` Stefano Stabellini
  2022-12-16 11:48 ` [PATCH 21/22] xen/arm64: Implement a mapcache for arm64 Julien Grall
  2022-12-16 11:48 ` [PATCH 22/22] xen/arm64: Allow the admin to enable/disable the directmap Julien Grall
  21 siblings, 2 replies; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Julien Grall, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk

From: Julien Grall <jgrall@amazon.com>

At the moment, on Arm64, all pCPUs share the same set of page-tables.

In a follow-up patch, we will make it possible to remove the
direct map, and therefore it will be necessary to have a mapcache.

While we have plenty of spare virtual address space to reserve a
region for each pCPU, sharing the page-tables means that temporary
mappings (e.g. of guest memory) are accessible by every pCPU.

In order to increase our security posture, it would be better if
those mappings were only accessible by the pCPU doing the temporary
mapping.

In addition to that, per-pCPU page-tables open the way to having a
per-domain mapping area.

Arm32 is already using per-pCPU page-tables so most of the code
can be re-used. Arm64 doesn't yet have support for the mapcache,
so a stub is provided (moved to its own header asm/domain_page.h).

Take the opportunity to fix a typo in a comment that is modified.

Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 xen/arch/arm/domain_page.c             |  2 ++
 xen/arch/arm/include/asm/arm32/mm.h    |  8 -----
 xen/arch/arm/include/asm/domain_page.h | 13 ++++++++
 xen/arch/arm/include/asm/mm.h          |  5 +++
 xen/arch/arm/mm.c                      | 42 +++++++-------------------
 xen/arch/arm/setup.c                   |  1 +
 6 files changed, 32 insertions(+), 39 deletions(-)
 create mode 100644 xen/arch/arm/include/asm/domain_page.h

diff --git a/xen/arch/arm/domain_page.c b/xen/arch/arm/domain_page.c
index b7c02c919064..4540b3c5f24c 100644
--- a/xen/arch/arm/domain_page.c
+++ b/xen/arch/arm/domain_page.c
@@ -3,6 +3,8 @@
 #include <xen/pmap.h>
 #include <xen/vmap.h>
 
+#include <asm/domain_page.h>
+
 /* Override macros from asm/page.h to make them work with mfn_t */
 #undef virt_to_mfn
 #define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
diff --git a/xen/arch/arm/include/asm/arm32/mm.h b/xen/arch/arm/include/asm/arm32/mm.h
index 8bfc906e7178..6b039d9ceaa2 100644
--- a/xen/arch/arm/include/asm/arm32/mm.h
+++ b/xen/arch/arm/include/asm/arm32/mm.h
@@ -1,12 +1,6 @@
 #ifndef __ARM_ARM32_MM_H__
 #define __ARM_ARM32_MM_H__
 
-#include <xen/percpu.h>
-
-#include <asm/lpae.h>
-
-DECLARE_PER_CPU(lpae_t *, xen_pgtable);
-
 /*
  * Only a limited amount of RAM, called xenheap, is always mapped on ARM32.
  * For convenience always return false.
@@ -16,8 +10,6 @@ static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned long nr)
     return false;
 }
 
-bool init_domheap_mappings(unsigned int cpu);
-
 #endif /* __ARM_ARM32_MM_H__ */
 
 /*
diff --git a/xen/arch/arm/include/asm/domain_page.h b/xen/arch/arm/include/asm/domain_page.h
new file mode 100644
index 000000000000..e9f52685e2ec
--- /dev/null
+++ b/xen/arch/arm/include/asm/domain_page.h
@@ -0,0 +1,13 @@
+#ifndef __ASM_ARM_DOMAIN_PAGE_H__
+#define __ASM_ARM_DOMAIN_PAGE_H__
+
+#ifdef CONFIG_ARCH_MAP_DOMAIN_PAGE
+bool init_domheap_mappings(unsigned int cpu);
+#else
+static inline bool init_domheap_mappings(unsigned int cpu)
+{
+    return true;
+}
+#endif
+
+#endif /* __ASM_ARM_DOMAIN_PAGE_H__ */
diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 2366928d71aa..7a2c775f9562 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -2,6 +2,9 @@
 #define __ARCH_ARM_MM__
 
 #include <xen/kernel.h>
+#include <xen/percpu.h>
+
+#include <asm/lpae.h>
 #include <asm/page.h>
 #include <public/xen.h>
 #include <xen/pdx.h>
@@ -14,6 +17,8 @@
 # error "unknown ARM variant"
 #endif
 
+DECLARE_PER_CPU(lpae_t *, xen_pgtable);
+
 /* Align Xen to a 2 MiB boundary. */
 #define XEN_PADDR_ALIGN (1 << 21)
 
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 4e208f7d20c8..2af751af9003 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -24,6 +24,7 @@
 
 #include <xsm/xsm.h>
 
+#include <asm/domain_page.h>
 #include <asm/fixmap.h>
 #include <asm/setup.h>
 
@@ -90,20 +91,19 @@ DEFINE_BOOT_PAGE_TABLE(boot_third);
  * xen_second, xen_fixmap and xen_xenmap are always shared between all
  * PCPUs.
  */
+/* Per-CPU pagetable pages */
+/* xen_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
+DEFINE_PER_CPU(lpae_t *, xen_pgtable);
+
+/* Root of the trie for cpu0, other CPU's PTs are dynamically allocated */
+static DEFINE_PAGE_TABLE(cpu0_pgtable);
+#define THIS_CPU_PGTABLE this_cpu(xen_pgtable)
 
 #ifdef CONFIG_ARM_64
 #define HYP_PT_ROOT_LEVEL 0
-static DEFINE_PAGE_TABLE(xen_pgtable);
 static DEFINE_PAGE_TABLE(xen_first);
-#define THIS_CPU_PGTABLE xen_pgtable
 #else
 #define HYP_PT_ROOT_LEVEL 1
-/* Per-CPU pagetable pages */
-/* xen_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
-DEFINE_PER_CPU(lpae_t *, xen_pgtable);
-#define THIS_CPU_PGTABLE this_cpu(xen_pgtable)
-/* Root of the trie for cpu0, other CPU's PTs are dynamically allocated */
-static DEFINE_PAGE_TABLE(cpu0_pgtable);
 #endif
 
 /* Common pagetable leaves */
@@ -481,14 +481,13 @@ void __init setup_pagetables(unsigned long boot_phys_offset)
 
     phys_offset = boot_phys_offset;
 
+    p = cpu0_pgtable;
+
 #ifdef CONFIG_ARM_64
-    p = (void *) xen_pgtable;
     p[0] = pte_of_xenaddr((uintptr_t)xen_first);
     p[0].pt.table = 1;
     p[0].pt.xn = 0;
     p = (void *) xen_first;
-#else
-    p = (void *) cpu0_pgtable;
 #endif
 
     /* Map xen second level page-table */
@@ -527,19 +526,13 @@ void __init setup_pagetables(unsigned long boot_phys_offset)
     pte.pt.table = 1;
     xen_second[second_table_offset(FIXMAP_ADDR(0))] = pte;
 
-#ifdef CONFIG_ARM_64
-    ttbr = (uintptr_t) xen_pgtable + phys_offset;
-#else
     ttbr = (uintptr_t) cpu0_pgtable + phys_offset;
-#endif
 
     switch_ttbr(ttbr);
 
     xen_pt_enforce_wnx();
 
-#ifdef CONFIG_ARM_32
     per_cpu(xen_pgtable, 0) = cpu0_pgtable;
-#endif
 }
 
 static void clear_boot_pagetables(void)
@@ -557,18 +550,6 @@ static void clear_boot_pagetables(void)
     clear_table(boot_third);
 }
 
-#ifdef CONFIG_ARM_64
-int init_secondary_pagetables(int cpu)
-{
-    clear_boot_pagetables();
-
-    /* Set init_ttbr for this CPU coming up. All CPus share a single setof
-     * pagetables, but rewrite it each time for consistency with 32 bit. */
-    init_ttbr = (uintptr_t) xen_pgtable + phys_offset;
-    clean_dcache(init_ttbr);
-    return 0;
-}
-#else
 int init_secondary_pagetables(int cpu)
 {
     lpae_t *root = alloc_xenheap_page();
@@ -599,7 +580,6 @@ int init_secondary_pagetables(int cpu)
 
     return 0;
 }
-#endif
 
 /* MMU setup for secondary CPUS (which already have paging enabled) */
 void mmu_init_secondary_cpu(void)
@@ -1089,7 +1069,7 @@ static int xen_pt_update(unsigned long virt,
     unsigned long left = nr_mfns;
 
     /*
-     * For arm32, page-tables are different on each CPUs. Yet, they share
+     * Page-tables are different on each CPU. Yet, they share
      * some common mappings. It is assumed that only common mappings
      * will be modified with this function.
      *
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 2311726f5ddd..88d9d90fb5ad 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -39,6 +39,7 @@
 #include <asm/gic.h>
 #include <asm/cpuerrata.h>
 #include <asm/cpufeature.h>
+#include <asm/domain_page.h>
 #include <asm/platform.h>
 #include <asm/procinfo.h>
 #include <asm/setup.h>
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 21/22] xen/arm64: Implement a mapcache for arm64
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
                   ` (19 preceding siblings ...)
  2022-12-16 11:48 ` [PATCH 20/22] xen/arm64: mm: Use per-pCPU page-tables Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2023-01-06 14:55   ` Henry Wang
  2023-01-23 22:34   ` Stefano Stabellini
  2022-12-16 11:48 ` [PATCH 22/22] xen/arm64: Allow the admin to enable/disable the directmap Julien Grall
  21 siblings, 2 replies; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Julien Grall, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk

From: Julien Grall <jgrall@amazon.com>

At the moment, on arm64, map_domain_page() is implemented using
mfn_to_virt(). Therefore it relies on the directmap.

In a follow-up patch, we will allow the admin to remove the directmap.
Therefore we want to implement a mapcache.

Thankfully there is already one for arm32. So select ARCH_MAP_DOMAIN_PAGE
and add the necessary boilerplate to support 64-bit:
    - The page-tables start at level 0, so we need to allocate the level
      1 page-table
    - map_domain_page() should check if the page is in the directmap. If
      yes, then use mfn_to_virt() to limit the performance impact
      when the directmap is still enabled (this will be selectable
      on the command line).

Take the opportunity to replace first_table_offset(...) with offsets[...].

Note that, so far, arch_mfns_in_directmap() always returns true on
arm64. So the mapcache is not yet used. This will change in a
follow-up patch.

Signed-off-by: Julien Grall <jgrall@amazon.com>

----

    There are a few TODOs:
        - It is becoming more critical to fix the mapcache
          implementation (this is not compliant with the Arm Arm)
        - Evaluate the performance
---
 xen/arch/arm/Kconfig              |  1 +
 xen/arch/arm/domain_page.c        | 47 +++++++++++++++++++++++++++----
 xen/arch/arm/include/asm/config.h |  7 +++++
 xen/arch/arm/include/asm/mm.h     |  5 ++++
 xen/arch/arm/mm.c                 |  6 ++--
 xen/arch/arm/setup.c              |  4 +++
 6 files changed, 62 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 239d3aed3c7f..9c58b2d5c3aa 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -9,6 +9,7 @@ config ARM_64
 	select 64BIT
 	select ARM_EFI
 	select HAS_FAST_MULTIPLY
+	select ARCH_MAP_DOMAIN_PAGE
 
 config ARM
 	def_bool y
diff --git a/xen/arch/arm/domain_page.c b/xen/arch/arm/domain_page.c
index 4540b3c5f24c..f3547dc853ef 100644
--- a/xen/arch/arm/domain_page.c
+++ b/xen/arch/arm/domain_page.c
@@ -1,4 +1,5 @@
 /* SPDX-License-Identifier: GPL-2.0-or-later */
+#include <xen/domain_page.h>
 #include <xen/mm.h>
 #include <xen/pmap.h>
 #include <xen/vmap.h>
@@ -8,6 +9,8 @@
 /* Override macros from asm/page.h to make them work with mfn_t */
 #undef virt_to_mfn
 #define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
+#undef mfn_to_virt
+#define mfn_to_virt(mfn) __mfn_to_virt(mfn_x(mfn))
 
 /* cpu0's domheap page tables */
 static DEFINE_PAGE_TABLES(cpu0_dommap, DOMHEAP_SECOND_PAGES);
@@ -31,13 +34,30 @@ bool init_domheap_mappings(unsigned int cpu)
 {
     unsigned int order = get_order_from_pages(DOMHEAP_SECOND_PAGES);
     lpae_t *root = per_cpu(xen_pgtable, cpu);
+    lpae_t *first;
     unsigned int i, first_idx;
     lpae_t *domheap;
     mfn_t mfn;
 
+    /* Convenience aliases */
+    DECLARE_OFFSETS(offsets, DOMHEAP_VIRT_START);
+
     ASSERT(root);
     ASSERT(!per_cpu(xen_dommap, cpu));
 
+    /*
+     * On Arm64, the root is at level 0. Therefore we need an extra step
+     * to allocate the first level page-table.
+     */
+#ifdef CONFIG_ARM_64
+    if ( create_xen_table(&root[offsets[0]]) )
+        return false;
+
+    first = xen_map_table(lpae_get_mfn(root[offsets[0]]));
+#else
+    first = root;
+#endif
+
     /*
      * The domheap for cpu0 is initialized before the heap is initialized.
      * So we need to use pre-allocated pages.
@@ -58,16 +78,20 @@ bool init_domheap_mappings(unsigned int cpu)
      * domheap mapping pages.
      */
     mfn = virt_to_mfn(domheap);
-    first_idx = first_table_offset(DOMHEAP_VIRT_START);
+    first_idx = offsets[1];
     for ( i = 0; i < DOMHEAP_SECOND_PAGES; i++ )
     {
         lpae_t pte = mfn_to_xen_entry(mfn_add(mfn, i), MT_NORMAL);
         pte.pt.table = 1;
-        write_pte(&root[first_idx + i], pte);
+        write_pte(&first[first_idx + i], pte);
     }
 
     per_cpu(xen_dommap, cpu) = domheap;
 
+#ifdef CONFIG_ARM_64
+    xen_unmap_table(first);
+#endif
+
     return true;
 }
 
@@ -91,6 +115,10 @@ void *map_domain_page(mfn_t mfn)
     lpae_t pte;
     int i, slot;
 
+    /* Bypass the mapcache if the page is in the directmap */
+    if ( arch_mfns_in_directmap(mfn_x(mfn), 1) )
+        return mfn_to_virt(mfn);
+
     local_irq_save(flags);
 
     /* The map is laid out as an open-addressed hash table where each
@@ -151,15 +179,24 @@ void *map_domain_page(mfn_t mfn)
 }
 
 /* Release a mapping taken with map_domain_page() */
-void unmap_domain_page(const void *va)
+void unmap_domain_page(const void *ptr)
 {
+    unsigned long va = (unsigned long)ptr;
     unsigned long flags;
     lpae_t *map = this_cpu(xen_dommap);
-    int slot = ((unsigned long) va - DOMHEAP_VIRT_START) >> SECOND_SHIFT;
+    unsigned int slot;
 
-    if ( !va )
+    /*
+     * map_domain_page() may not have mapped anything if the address
+     * is part of the directmap. So ignore anything outside of the
+     * domheap.
+     */
+    if ( (va < DOMHEAP_VIRT_START) ||
+         ((va - DOMHEAP_VIRT_START) >= DOMHEAP_VIRT_SIZE) )
         return;
 
+    slot = (va - DOMHEAP_VIRT_START) >> SECOND_SHIFT;
+
     local_irq_save(flags);
 
     ASSERT(slot >= 0 && slot < DOMHEAP_ENTRIES);
diff --git a/xen/arch/arm/include/asm/config.h b/xen/arch/arm/include/asm/config.h
index 0fefed1b8aa9..12b7f1f1b9ea 100644
--- a/xen/arch/arm/include/asm/config.h
+++ b/xen/arch/arm/include/asm/config.h
@@ -156,6 +156,13 @@
 #define FRAMETABLE_SIZE        GB(32)
 #define FRAMETABLE_NR          (FRAMETABLE_SIZE / sizeof(*frame_table))
 
+#define DOMHEAP_VIRT_START     SLOT0(255)
+#define DOMHEAP_VIRT_SIZE      GB(2)
+
+#define DOMHEAP_ENTRIES        1024 /* 1024 2MB mapping slots */
+/* Number of domheap pagetable pages required at the second level (2MB mappings) */
+#define DOMHEAP_SECOND_PAGES (DOMHEAP_VIRT_SIZE >> FIRST_SHIFT)
+
 #define DIRECTMAP_VIRT_START   SLOT0(256)
 #define DIRECTMAP_SIZE         (SLOT0_ENTRY_SIZE * (265-256))
 #define DIRECTMAP_VIRT_END     (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE - 1)
diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 7a2c775f9562..d73abf1bf763 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -416,6 +416,11 @@ static inline bool arch_has_directmap(void)
     return true;
 }
 
+/* Helpers to allocate, map and unmap a Xen page-table */
+int create_xen_table(lpae_t *entry);
+lpae_t *xen_map_table(mfn_t mfn);
+void xen_unmap_table(const lpae_t *table);
+
 #endif /*  __ARCH_ARM_MM__ */
 /*
  * Local variables:
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 2af751af9003..f5fb957554a5 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -177,7 +177,7 @@ static void __init __maybe_unused build_assertions(void)
 #undef CHECK_SAME_SLOT
 }
 
-static lpae_t *xen_map_table(mfn_t mfn)
+lpae_t *xen_map_table(mfn_t mfn)
 {
     /*
      * During early boot, map_domain_page() may be unusable. Use the
@@ -189,7 +189,7 @@ static lpae_t *xen_map_table(mfn_t mfn)
     return map_domain_page(mfn);
 }
 
-static void xen_unmap_table(const lpae_t *table)
+void xen_unmap_table(const lpae_t *table)
 {
     /*
      * During early boot, xen_map_table() will not use map_domain_page()
@@ -699,7 +699,7 @@ void *ioremap(paddr_t pa, size_t len)
     return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
 }
 
-static int create_xen_table(lpae_t *entry)
+int create_xen_table(lpae_t *entry)
 {
     mfn_t mfn;
     void *p;
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 88d9d90fb5ad..b1a8f91bb385 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -923,6 +923,10 @@ static void __init setup_mm(void)
      */
     populate_boot_allocator();
 
+    if ( !init_domheap_mappings(smp_processor_id()) )
+        panic("CPU%u: Unable to prepare the domheap page-tables\n",
+              smp_processor_id());
+
     total_pages = 0;
 
     for ( i = 0; i < banks->nr_banks; i++ )
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 101+ messages in thread

* [PATCH 22/22] xen/arm64: Allow the admin to enable/disable the directmap
  2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
                   ` (20 preceding siblings ...)
  2022-12-16 11:48 ` [PATCH 21/22] xen/arm64: Implement a mapcache for arm64 Julien Grall
@ 2022-12-16 11:48 ` Julien Grall
  2023-01-06 14:55   ` Henry Wang
  2023-01-23 22:52   ` Stefano Stabellini
  21 siblings, 2 replies; 101+ messages in thread
From: Julien Grall @ 2022-12-16 11:48 UTC (permalink / raw)
  To: xen-devel
  Cc: julien, Julien Grall, Andrew Cooper, George Dunlap, Jan Beulich,
	Stefano Stabellini, Wei Liu, Bertrand Marquis, Volodymyr Babchuk

From: Julien Grall <jgrall@amazon.com>

Implement the same command line option as x86 to enable/disable the
directmap. By default this is kept enabled.

Also modify setup_directmap_mappings() to populate the L0 entries
related to the directmap area.

Signed-off-by: Julien Grall <jgrall@amazon.com>

----
    This patch is in an RFC state: we need to decide what to do for arm32.

    Also, this is moving code that was introduced earlier in this series.
    So this will need to be fixed up in the next version (assuming Arm64
    will be ready).

    This was sent early as PoC to enable secret-free hypervisor
    on Arm64.
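
    For illustration, with this patch the admin would disable the
    directmap on arm64 by appending the option to Xen's command line in
    the bootloader, e.g. (the rest of the line is only an example):

        xen ... console=dtuart directmap=no

    As this is a standard boolean_param(), the usual boolean spellings
    (no/false/off/0) should all be accepted.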
---
 docs/misc/xen-command-line.pandoc   |  2 +-
 xen/arch/arm/include/asm/arm64/mm.h |  2 +-
 xen/arch/arm/include/asm/mm.h       | 12 +++++----
 xen/arch/arm/mm.c                   | 40 +++++++++++++++++++++++++++--
 xen/arch/arm/setup.c                |  1 +
 5 files changed, 48 insertions(+), 9 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index a63e4612acac..948035286acc 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -760,7 +760,7 @@ Specify the size of the console debug trace buffer. By specifying `cpu:`
 additionally a trace buffer of the specified size is allocated per cpu.
 The debug trace feature is only enabled in debugging builds of Xen.
 
-### directmap (x86)
+### directmap (arm64, x86)
 > `= <boolean>`
 
 > Default: `true`
diff --git a/xen/arch/arm/include/asm/arm64/mm.h b/xen/arch/arm/include/asm/arm64/mm.h
index aa2adac63189..8b5dcb091750 100644
--- a/xen/arch/arm/include/asm/arm64/mm.h
+++ b/xen/arch/arm/include/asm/arm64/mm.h
@@ -7,7 +7,7 @@
  */
 static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned long nr)
 {
-    return true;
+    return opt_directmap;
 }
 
 #endif /* __ARM_ARM64_MM_H__ */
diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index d73abf1bf763..ef9ad3b366e3 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -9,6 +9,13 @@
 #include <public/xen.h>
 #include <xen/pdx.h>
 
+extern bool opt_directmap;
+
+static inline bool arch_has_directmap(void)
+{
+    return opt_directmap;
+}
+
 #if defined(CONFIG_ARM_32)
 # include <asm/arm32/mm.h>
 #elif defined(CONFIG_ARM_64)
@@ -411,11 +418,6 @@ static inline void page_set_xenheap_gfn(struct page_info *p, gfn_t gfn)
     } while ( (y = cmpxchg(&p->u.inuse.type_info, x, nx)) != x );
 }
 
-static inline bool arch_has_directmap(void)
-{
-    return true;
-}
-
 /* Helpers to allocate, map and unmap a Xen page-table */
 int create_xen_table(lpae_t *entry);
 lpae_t *xen_map_table(mfn_t mfn);
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index f5fb957554a5..925d81c450e8 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -15,6 +15,7 @@
 #include <xen/init.h>
 #include <xen/libfdt/libfdt.h>
 #include <xen/mm.h>
+#include <xen/param.h>
 #include <xen/pfn.h>
 #include <xen/pmap.h>
 #include <xen/sched.h>
@@ -131,6 +132,12 @@ vaddr_t directmap_virt_start __read_mostly;
 unsigned long directmap_base_pdx __read_mostly;
 #endif
 
+bool __ro_after_init opt_directmap = true;
+/* TODO: Decide what to do for arm32. */
+#ifdef CONFIG_ARM_64
+boolean_param("directmap", opt_directmap);
+#endif
+
 unsigned long frametable_base_pdx __read_mostly;
 unsigned long frametable_virt_end __read_mostly;
 
@@ -606,16 +613,27 @@ void __init setup_directmap_mappings(unsigned long base_mfn,
     directmap_virt_end = XENHEAP_VIRT_START + nr_mfns * PAGE_SIZE;
 }
 #else /* CONFIG_ARM_64 */
-/* Map the region in the directmap area. */
+/*
+ * This either populates a valid direct map, or allocates empty L1 tables
+ * and creates the L0 entries for the given region in the direct map
+ * depending on arch_has_directmap().
+ *
+ * When directmap=no, we still need to populate empty L1 tables in the
+ * directmap region. The reason is that the root page-table (i.e. L0)
+ * is per-CPU and secondary CPUs will initialize their root page-table
+ * based on the pCPU0 one. So L0 entries will be shared if they are
+ * pre-populated. We also rely on the fact that L1 tables are never
+ * freed.
+ */
 void __init setup_directmap_mappings(unsigned long base_mfn,
                                      unsigned long nr_mfns)
 {
+    unsigned long mfn_gb = base_mfn & ~((FIRST_SIZE >> PAGE_SHIFT) - 1);
     int rc;
 
     /* First call sets the directmap physical and virtual offset. */
     if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
     {
-        unsigned long mfn_gb = base_mfn & ~((FIRST_SIZE >> PAGE_SHIFT) - 1);
 
         directmap_mfn_start = _mfn(base_mfn);
         directmap_base_pdx = mfn_to_pdx(_mfn(base_mfn));
@@ -636,6 +654,24 @@ void __init setup_directmap_mappings(unsigned long base_mfn,
         panic("cannot add directmap mapping at %lx below heap start %lx\n",
               base_mfn, mfn_x(directmap_mfn_start));
 
+
+    if ( !arch_has_directmap() )
+    {
+        vaddr_t vaddr = (vaddr_t)__mfn_to_virt(base_mfn);
+        unsigned int i, slot;
+
+        slot = first_table_offset(vaddr);
+        nr_mfns += base_mfn - mfn_gb;
+        for ( i = 0; i < nr_mfns; i += BIT(XEN_PT_LEVEL_ORDER(0), UL), slot++ )
+        {
+            lpae_t *entry = &cpu0_pgtable[slot];
+
+            if ( !lpae_is_valid(*entry) && create_xen_table(entry) )
+                panic("Unable to populate zeroeth slot %u\n", slot);
+        }
+        return;
+    }
+
     rc = map_pages_to_xen((vaddr_t)__mfn_to_virt(base_mfn),
                           _mfn(base_mfn), nr_mfns,
                           PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index b1a8f91bb385..83ded03c7b1f 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -1032,6 +1032,7 @@ void __init start_xen(unsigned long boot_phys_offset,
     cmdline_parse(cmdline);
 
     setup_mm();
+    printk("Booting with directmap %s\n", arch_has_directmap() ? "on" : "off");
 
     vm_init();
 
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 101+ messages in thread

* Re: [PATCH 01/22] xen/common: page_alloc: Re-order includes
  2022-12-16 11:48 ` [PATCH 01/22] xen/common: page_alloc: Re-order includes Julien Grall
@ 2022-12-16 12:03   ` Jan Beulich
  2022-12-23  9:29     ` Julien Grall
  2023-01-23 21:29   ` Stefano Stabellini
  1 sibling, 1 reply; 101+ messages in thread
From: Jan Beulich @ 2022-12-16 12:03 UTC (permalink / raw)
  To: Julien Grall
  Cc: Julien Grall, Andrew Cooper, George Dunlap, Stefano Stabellini,
	Wei Liu, xen-devel

On 16.12.2022 12:48, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> Order the includes with the xen headers first, then asm headers and
> last public headers. Within each category, they are sorted alphabetically.
> 
> Note that the includes protected by CONFIG_X86 haven't been sorted
> to avoid adding multiple #ifdef.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

> ----
> 
>     I am open to sorting the includes protected by CONFIG_X86
>     and adding multiple #ifdef if this is preferred.

I, for one, prefer it the way you've done it.

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 03/22] acpi: vmap pages in acpi_os_alloc_memory
  2022-12-16 11:48 ` [PATCH 03/22] acpi: vmap pages in acpi_os_alloc_memory Julien Grall
@ 2022-12-16 12:07   ` Julien Grall
  2022-12-20 15:15   ` Jan Beulich
  2023-01-23 21:39   ` Stefano Stabellini
  2 siblings, 0 replies; 101+ messages in thread
From: Julien Grall @ 2022-12-16 12:07 UTC (permalink / raw)
  To: xen-devel
  Cc: Hongyan Xia, Andrew Cooper, George Dunlap, Jan Beulich,
	Stefano Stabellini, Wei Liu, Julien Grall

Hi,

On 16/12/2022 11:48, Julien Grall wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
> 
> Also, introduce a wrapper around vmap that maps a contiguous range for
> boot allocations. Unfortunately, the new helper cannot be a static inline
> because the dependencies are a mess. We would need to re-include
> asm/page.h (was removed in aa4b9d1ee653 "include: don't use asm/page.h
> from common headers") and that no longer looks to be enough
> because bits from asm/cpufeature.h are used in the definition of PAGE_NX.
> 
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> ----

Sorry, I sent this patch (and the others) with 4 dashes rather than 3.
This is my way to work around an issue with the patchqueue tool I am
using (it would strip the text after the --- otherwise).

I will try to remember to remove the extra dash in the next version.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 02/22] x86/setup: move vm_init() before acpi calls
  2022-12-16 11:48 ` [PATCH 02/22] x86/setup: move vm_init() before acpi calls Julien Grall
@ 2022-12-20 15:08   ` Jan Beulich
  2022-12-21 10:18     ` Julien Grall
  2022-12-23  9:51     ` Julien Grall
  2023-01-23 21:34   ` Stefano Stabellini
  1 sibling, 2 replies; 101+ messages in thread
From: Jan Beulich @ 2022-12-20 15:08 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Liu, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk,
	Andrew Cooper, George Dunlap, Wei Liu, Roger Pau Monné,
	David Woodhouse, Hongyan Xia, Julien Grall, xen-devel

On 16.12.2022 12:48, Julien Grall wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> After the direct map removal, pages from the boot allocator are not
> mapped at all in the direct map. Although we have map_domain_page, they

Nit: "will not be mapped" or "are not going to be mapped", or else this
sounds like there's a bug somewhere.

> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -870,6 +870,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>      unsigned long eb_start, eb_end;
>      bool acpi_boot_table_init_done = false, relocated = false;
>      int ret;
> +    bool vm_init_done = false;

Can this please be grouped with the other bool-s (even visible in context)?
> --- a/xen/common/vmap.c
> +++ b/xen/common/vmap.c
> @@ -34,9 +34,20 @@ void __init vm_init_type(enum vmap_region type, void *start, void *end)
>  
>      for ( i = 0, va = (unsigned long)vm_bitmap(type); i < nr; ++i, va += PAGE_SIZE )
>      {
> -        struct page_info *pg = alloc_domheap_page(NULL, 0);
> +        mfn_t mfn;
> +        int rc;
>  
> -        map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR);
> +        if ( system_state == SYS_STATE_early_boot )
> +            mfn = alloc_boot_pages(1, 1);
> +        else
> +        {
> +            struct page_info *pg = alloc_domheap_page(NULL, 0);
> +
> +            BUG_ON(!pg);
> +            mfn = page_to_mfn(pg);
> +        }
> +        rc = map_pages_to_xen(va, mfn, 1, PAGE_HYPERVISOR);
> +        BUG_ON(rc);

The adding of a return value check is unrelated and not overly useful:

>          clear_page((void *)va);

This will fault anyway if the mapping attempt failed.

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 03/22] acpi: vmap pages in acpi_os_alloc_memory
  2022-12-16 11:48 ` [PATCH 03/22] acpi: vmap pages in acpi_os_alloc_memory Julien Grall
  2022-12-16 12:07   ` Julien Grall
@ 2022-12-20 15:15   ` Jan Beulich
  2022-12-21 10:23     ` Julien Grall
  2023-01-23 21:39   ` Stefano Stabellini
  2 siblings, 1 reply; 101+ messages in thread
From: Jan Beulich @ 2022-12-20 15:15 UTC (permalink / raw)
  To: Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, George Dunlap, Stefano Stabellini,
	Wei Liu, Julien Grall, xen-devel

On 16.12.2022 12:48, Julien Grall wrote:
> --- a/xen/common/vmap.c
> +++ b/xen/common/vmap.c
> @@ -244,6 +244,11 @@ void *vmap(const mfn_t *mfn, unsigned int nr)
>      return __vmap(mfn, 1, nr, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
>  }
>  
> +void *vmap_contig_pages(mfn_t mfn, unsigned int nr_pages)

I don't think the _pages suffix buys us much here. I also think parameter
names would better be consistent with other functions here, in particular
with vmap() (i.e. s/nr_pages/nr/).
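
For reference, the body of such a wrapper is likely just a thin call into
__vmap() (a sketch based on the vmap() implementation visible above, using
the suggested naming; not the actual patch):

    void *vmap_contig(mfn_t mfn, unsigned int nr)
    {
        /* One physically contiguous chunk of nr pages, default alignment. */
        return __vmap(&mfn, nr, 1, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
    }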

> --- a/xen/drivers/acpi/osl.c
> +++ b/xen/drivers/acpi/osl.c
> @@ -221,7 +221,11 @@ void *__init acpi_os_alloc_memory(size_t sz)
>  	void *ptr;
>  
>  	if (system_state == SYS_STATE_early_boot)
> -		return mfn_to_virt(mfn_x(alloc_boot_pages(PFN_UP(sz), 1)));
> +	{
> +		mfn_t mfn = alloc_boot_pages(PFN_UP(sz), 1);
> +
> +		return vmap_contig_pages(mfn, PFN_UP(sz));
> +	}

Multiple pages may be allocated here, yet ...

> @@ -246,5 +250,10 @@ void __init acpi_os_free_memory(void *ptr)
>  	if (is_xmalloc_memory(ptr))
>  		xfree(ptr);
>  	else if (ptr && system_state == SYS_STATE_early_boot)
> -		init_boot_pages(__pa(ptr), __pa(ptr) + PAGE_SIZE);
> +	{
> +		paddr_t addr = mfn_to_maddr(vmap_to_mfn(ptr));
> +
> +		vunmap(ptr);
> +		init_boot_pages(addr, addr + PAGE_SIZE);
> +	}

... (as before) only one page would be freed here. With the move to
vmap() it ought to be possible to do better now. (If you want to
defer this to a later patch, please at least mention the aspect in
the description.)
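
To illustrate "doing better", the free path could in principle return the
whole allocation, assuming some way to recover its size from the vmap
region -- the vmap_size() helper below is hypothetical and does not exist
in this series:

	else if (ptr && system_state == SYS_STATE_early_boot)
	{
		unsigned int nr = vmap_size(ptr); /* hypothetical helper */
		paddr_t addr = mfn_to_maddr(vmap_to_mfn(ptr));

		vunmap(ptr);
		init_boot_pages(addr, addr + nr * PAGE_SIZE);
	}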

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 04/22] xen/numa: vmap the pages for memnodemap
  2022-12-16 11:48 ` [PATCH 04/22] xen/numa: vmap the pages for memnodemap Julien Grall
@ 2022-12-20 15:25   ` Jan Beulich
  0 siblings, 0 replies; 101+ messages in thread
From: Jan Beulich @ 2022-12-20 15:25 UTC (permalink / raw)
  To: Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, George Dunlap, Stefano Stabellini,
	Wei Liu, Julien Grall, xen-devel

On 16.12.2022 12:48, Julien Grall wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
> 
> This avoids the assumption that there is a direct map and boot pages
> fall inside the direct map.
> 
> Clean up the variables so that mfn actually stores a type-safe mfn.
> 
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
(obviously remains valid across ...

> --- a/xen/common/numa.c
> +++ b/xen/common/numa.c
> @@ -424,13 +424,13 @@ static int __init populate_memnodemap(const struct node *nodes,
>  static int __init allocate_cachealigned_memnodemap(void)
>  {
>      unsigned long size = PFN_UP(memnodemapsize * sizeof(*memnodemap));
> -    unsigned long mfn = mfn_x(alloc_boot_pages(size, 1));
> +    mfn_t mfn = alloc_boot_pages(size, 1);
>  
> -    memnodemap = mfn_to_virt(mfn);
> -    mfn <<= PAGE_SHIFT;
> +    memnodemap = vmap_contig_pages(mfn, size);

... a possible rename of this function)

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 05/22] x86/srat: vmap the pages for acpi_slit
  2022-12-16 11:48 ` [PATCH 05/22] x86/srat: vmap the pages for acpi_slit Julien Grall
@ 2022-12-20 15:30   ` Jan Beulich
  2022-12-23 11:31     ` Julien Grall
  0 siblings, 1 reply; 101+ messages in thread
From: Jan Beulich @ 2022-12-20 15:30 UTC (permalink / raw)
  To: Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Julien Grall

On 16.12.2022 12:48, Julien Grall wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
> 
> This avoids the assumption that boot pages are in the direct map.
> 
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

However, ...

> --- a/xen/arch/x86/srat.c
> +++ b/xen/arch/x86/srat.c
> @@ -139,7 +139,8 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
>  		return;
>  	}
>  	mfn = alloc_boot_pages(PFN_UP(slit->header.length), 1);
> -	acpi_slit = mfn_to_virt(mfn_x(mfn));
> +	acpi_slit = vmap_contig_pages(mfn, PFN_UP(slit->header.length));

... with the increased use of vmap space the VA range used will need
growing. And that's perhaps better done ahead of time than late.

> +	BUG_ON(!acpi_slit);

Similarly relevant for the earlier patch: It would be nice if boot
failure for optional things like NUMA data could be avoided. But I
understand this is somewhat orthogonal to this series (the more that
alloc_boot_pages() itself is also affected). Yet not entirely so,
since previously there was no mapping failure possible here.

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 02/22] x86/setup: move vm_init() before acpi calls
  2022-12-20 15:08   ` Jan Beulich
@ 2022-12-21 10:18     ` Julien Grall
  2022-12-21 10:22       ` Jan Beulich
  2022-12-23  9:51     ` Julien Grall
  1 sibling, 1 reply; 101+ messages in thread
From: Julien Grall @ 2022-12-21 10:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk,
	Andrew Cooper, George Dunlap, Wei Liu, Roger Pau Monné,
	David Woodhouse, Hongyan Xia, Julien Grall, xen-devel

Hi Jan,

On 20/12/2022 15:08, Jan Beulich wrote:
> On 16.12.2022 12:48, Julien Grall wrote:
>> From: Wei Liu <wei.liu2@citrix.com>
>>
>> After the direct map removal, pages from the boot allocator are not
>> mapped at all in the direct map. Although we have map_domain_page, they
> 
> Nit: "will not be mapped" or "are not going to be mapped", or else this
> sounds like there's a bug somewhere.

I will update.

> 
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -870,6 +870,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>       unsigned long eb_start, eb_end;
>>       bool acpi_boot_table_init_done = false, relocated = false;
>>       int ret;
>> +    bool vm_init_done = false;
> 
> Can this please be grouped with the other bool-s (even visible in context)?
>> --- a/xen/common/vmap.c
>> +++ b/xen/common/vmap.c
>> @@ -34,9 +34,20 @@ void __init vm_init_type(enum vmap_region type, void *start, void *end)
>>   
>>       for ( i = 0, va = (unsigned long)vm_bitmap(type); i < nr; ++i, va += PAGE_SIZE )
>>       {
>> -        struct page_info *pg = alloc_domheap_page(NULL, 0);
>> +        mfn_t mfn;
>> +        int rc;
>>   
>> -        map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR);
>> +        if ( system_state == SYS_STATE_early_boot )
>> +            mfn = alloc_boot_pages(1, 1);
>> +        else
>> +        {
>> +            struct page_info *pg = alloc_domheap_page(NULL, 0);
>> +
>> +            BUG_ON(!pg);
>> +            mfn = page_to_mfn(pg);
>> +        }
>> +        rc = map_pages_to_xen(va, mfn, 1, PAGE_HYPERVISOR);
>> +        BUG_ON(rc);
> 
> The adding of a return value check is unrelated and not overly useful:
> 
>>           clear_page((void *)va);
> 
> This will fault anyway if the mapping attempt failed.

Not always. At least on Arm, map_pages_to_xen() could fail if the VA was 
mapped to another physical address.

This seems unlikely, yet I think that relying on clear_page() to always
fail when map_pages_to_xen() returns an error is bogus.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 02/22] x86/setup: move vm_init() before acpi calls
  2022-12-21 10:18     ` Julien Grall
@ 2022-12-21 10:22       ` Jan Beulich
  2022-12-23  9:51         ` Julien Grall
  0 siblings, 1 reply; 101+ messages in thread
From: Jan Beulich @ 2022-12-21 10:22 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Liu, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk,
	Andrew Cooper, George Dunlap, Wei Liu, Roger Pau Monné,
	David Woodhouse, Hongyan Xia, Julien Grall, xen-devel

On 21.12.2022 11:18, Julien Grall wrote:
> On 20/12/2022 15:08, Jan Beulich wrote:
>> On 16.12.2022 12:48, Julien Grall wrote:
>>> --- a/xen/common/vmap.c
>>> +++ b/xen/common/vmap.c
>>> @@ -34,9 +34,20 @@ void __init vm_init_type(enum vmap_region type, void *start, void *end)
>>>   
>>>       for ( i = 0, va = (unsigned long)vm_bitmap(type); i < nr; ++i, va += PAGE_SIZE )
>>>       {
>>> -        struct page_info *pg = alloc_domheap_page(NULL, 0);
>>> +        mfn_t mfn;
>>> +        int rc;
>>>   
>>> -        map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR);
>>> +        if ( system_state == SYS_STATE_early_boot )
>>> +            mfn = alloc_boot_pages(1, 1);
>>> +        else
>>> +        {
>>> +            struct page_info *pg = alloc_domheap_page(NULL, 0);
>>> +
>>> +            BUG_ON(!pg);
>>> +            mfn = page_to_mfn(pg);
>>> +        }
>>> +        rc = map_pages_to_xen(va, mfn, 1, PAGE_HYPERVISOR);
>>> +        BUG_ON(rc);
>>
>> The adding of a return value check is unrelated and not overly useful:
>>
>>>           clear_page((void *)va);
>>
>> This will fault anyway if the mapping attempt failed.
> 
> Not always. At least on Arm, map_pages_to_xen() could fail if the VA was 
> mapped to another physical address.

Oh, okay.

> This seems unlikely, yet I think that relying on clear_page() to always
> fail when map_pages_to_xen() returns an error is bogus.

Fair enough, but then please at least call out the change (and the reason)
in the description. Even better might be to make this a separate change,
but I wouldn't insist (quite likely I wouldn't have made this a separate
change either).

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 03/22] acpi: vmap pages in acpi_os_alloc_memory
  2022-12-20 15:15   ` Jan Beulich
@ 2022-12-21 10:23     ` Julien Grall
  0 siblings, 0 replies; 101+ messages in thread
From: Julien Grall @ 2022-12-21 10:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Hongyan Xia, Andrew Cooper, George Dunlap, Stefano Stabellini,
	Wei Liu, Julien Grall, xen-devel

Hi Jan,

On 20/12/2022 15:15, Jan Beulich wrote:
> On 16.12.2022 12:48, Julien Grall wrote:
>> --- a/xen/common/vmap.c
>> +++ b/xen/common/vmap.c
>> @@ -244,6 +244,11 @@ void *vmap(const mfn_t *mfn, unsigned int nr)
>>       return __vmap(mfn, 1, nr, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
>>   }
>>   
>> +void *vmap_contig_pages(mfn_t mfn, unsigned int nr_pages)
> 
> I don't think the _pages suffix buys us much here. I also think parameter
> names would better be consistent with other functions here, in particular
> with vmap() (i.e. s/nr_pages/nr/).

I will do the renaming.

> 
>> --- a/xen/drivers/acpi/osl.c
>> +++ b/xen/drivers/acpi/osl.c
>> @@ -221,7 +221,11 @@ void *__init acpi_os_alloc_memory(size_t sz)
>>   	void *ptr;
>>   
>>   	if (system_state == SYS_STATE_early_boot)
>> -		return mfn_to_virt(mfn_x(alloc_boot_pages(PFN_UP(sz), 1)));
>> +	{
>> +		mfn_t mfn = alloc_boot_pages(PFN_UP(sz), 1);
>> +
>> +		return vmap_contig_pages(mfn, PFN_UP(sz));
>> +	}
> 
> Multiple pages may be allocated here, yet ...
> 
>> @@ -246,5 +250,10 @@ void __init acpi_os_free_memory(void *ptr)
>>   	if (is_xmalloc_memory(ptr))
>>   		xfree(ptr);
>>   	else if (ptr && system_state == SYS_STATE_early_boot)
>> -		init_boot_pages(__pa(ptr), __pa(ptr) + PAGE_SIZE);
>> +	{
>> +		paddr_t addr = mfn_to_maddr(vmap_to_mfn(ptr));
>> +
>> +		vunmap(ptr);
>> +		init_boot_pages(addr, addr + PAGE_SIZE);
>> +	}
> 
> ... (as before) only one page would be freed here. With the move to
> vmap() it ought to be possible to do better now. (If you want to
> defer this to a later patch, please at least mention the aspect in
> the description.)

Good point, I will have a look at solving it in this patch.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 06/22] x86: map/unmap pages in restore_all_guests
  2022-12-16 11:48 ` [PATCH 06/22] x86: map/unmap pages in restore_all_guests Julien Grall
@ 2022-12-22 11:12   ` Jan Beulich
  2022-12-23 12:22     ` Julien Grall
  0 siblings, 1 reply; 101+ messages in thread
From: Jan Beulich @ 2022-12-22 11:12 UTC (permalink / raw)
  To: Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Julien Grall, xen-devel

On 16.12.2022 12:48, Julien Grall wrote:
> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -165,7 +165,24 @@ restore_all_guest:
>          and   %rsi, %rdi
>          and   %r9, %rsi
>          add   %rcx, %rdi
> -        add   %rcx, %rsi
> +
> +         /*
> +          * Without a direct map, we have to map first before copying. We only
> +          * need to map the guest root table but not the per-CPU root_pgt,
> +          * because the latter is still a xenheap page.
> +          */
> +        pushq %r9
> +        pushq %rdx
> +        pushq %rax
> +        pushq %rdi
> +        mov   %rsi, %rdi
> +        shr   $PAGE_SHIFT, %rdi
> +        callq map_domain_page
> +        mov   %rax, %rsi
> +        popq  %rdi
> +        /* Stash the pointer for unmapping later. */
> +        pushq %rax
> +
>          mov   $ROOT_PAGETABLE_FIRST_XEN_SLOT, %ecx
>          mov   root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rsi), %r8
>          mov   %r8, root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rdi)
> @@ -177,6 +194,14 @@ restore_all_guest:
>          sub   $(ROOT_PAGETABLE_FIRST_XEN_SLOT - \
>                  ROOT_PAGETABLE_LAST_XEN_SLOT - 1) * 8, %rdi
>          rep movsq
> +
> +        /* Unmap the page. */
> +        popq  %rdi
> +        callq unmap_domain_page
> +        popq  %rax
> +        popq  %rdx
> +        popq  %r9

While the PUSH/POP are part of what I dislike here, I think this wants
doing differently: Establish a mapping when putting in place a new guest
page table, and use the pointer here. This could be a new per-domain
mapping, to limit its visibility.

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 07/22] x86/pv: domheap pages should be mapped while relocating initrd
  2022-12-16 11:48 ` [PATCH 07/22] x86/pv: domheap pages should be mapped while relocating initrd Julien Grall
@ 2022-12-22 11:18   ` Jan Beulich
  0 siblings, 0 replies; 101+ messages in thread
From: Jan Beulich @ 2022-12-22 11:18 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Liu, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Wei Wang, Julien Grall, xen-devel

On 16.12.2022 12:48, Julien Grall wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> Xen shouldn't use domheap pages as if they were xenheap pages. Map and
> unmap pages accordingly.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Wei Wang <wawei@amazon.de>
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> ----
> 
>     Changes since Hongyan's version:
>         * Add missing newline after the variable declaration
> ---
>  xen/arch/x86/pv/dom0_build.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
> index a62f0fa2ef29..c837b2d96f89 100644
> --- a/xen/arch/x86/pv/dom0_build.c
> +++ b/xen/arch/x86/pv/dom0_build.c
> @@ -611,18 +611,32 @@ int __init dom0_construct_pv(struct domain *d,
>          if ( d->arch.physaddr_bitsize &&
>               ((mfn + count - 1) >> (d->arch.physaddr_bitsize - PAGE_SHIFT)) )
>          {
> +            unsigned long nr_pages;
> +            unsigned long len = initrd_len;
> +
>              order = get_order_from_pages(count);
>              page = alloc_domheap_pages(d, order, MEMF_no_scrub);
>              if ( !page )
>                  panic("Not enough RAM for domain 0 initrd\n");
> +
> +            nr_pages = 1UL << order;

I don't think this needs establishing here and ...

>              for ( count = -count; order--; )
>                  if ( count & (1UL << order) )
>                  {
>                      free_domheap_pages(page, order);
>                      page += 1UL << order;
> +                    nr_pages -= 1UL << order;

... updating here. Doing so just once ...

>                  }
> -            memcpy(page_to_virt(page), mfn_to_virt(initrd->mod_start),
> -                   initrd_len);
> +

... here ought to suffice, assuming this 2nd variable is needed at all
(alongside "len").

> +            for ( i = 0; i < nr_pages; i++, len -= PAGE_SIZE )
> +            {
> +                void *p = __map_domain_page(page + i);
> +
> +                memcpy(p, mfn_to_virt(initrd_mfn + i),
> +                       min(len, (unsigned long)PAGE_SIZE));
> +                unmap_domain_page(p);
> +            }

You're half open-coding copy_domain_page() without saying anywhere why
the remaining mfn_to_virt() is okay to keep. If you used
copy_domain_page(), no such remark would need adding in the description.

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 08/22] x86/pv: rewrite how building PV dom0 handles domheap mappings
  2022-12-16 11:48 ` [PATCH 08/22] x86/pv: rewrite how building PV dom0 handles domheap mappings Julien Grall
@ 2022-12-22 11:48   ` Jan Beulich
  2024-01-10 12:50     ` El Yandouzi, Elias
  0 siblings, 1 reply; 101+ messages in thread
From: Jan Beulich @ 2022-12-22 11:48 UTC (permalink / raw)
  To: Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Julien Grall, xen-devel

On 16.12.2022 12:48, Julien Grall wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
> 
> Building a PV dom0 is allocating from the domheap but uses it like the
> xenheap. This is clearly wrong. Fix.

"Clearly wrong" would mean there's a bug here, at lest under certain
conditions. But there isn't: Even on huge systems, due to running on
idle page tables, all memory is mapped at present.

> @@ -711,22 +715,32 @@ int __init dom0_construct_pv(struct domain *d,
>          v->arch.pv.event_callback_cs    = FLAT_COMPAT_KERNEL_CS;
>      }
>  
> +#define UNMAP_MAP_AND_ADVANCE(mfn_var, virt_var, maddr) \
> +do {                                                    \
> +    UNMAP_DOMAIN_PAGE(virt_var);                        \

Not much point using the macro when ...

> +    mfn_var = maddr_to_mfn(maddr);                      \
> +    maddr += PAGE_SIZE;                                 \
> +    virt_var = map_domain_page(mfn_var);                \

... the variable gets reset again to non-NULL unconditionally right
away.

> +} while ( false )

This being a local macro and all use sites passing mpt_alloc as the
last argument, I think that parameter wants dropping, which would
improve readability.

> @@ -792,9 +808,9 @@ int __init dom0_construct_pv(struct domain *d,
>              if ( !l3e_get_intpte(*l3tab) )
>              {
>                  maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l2_page_table;
> -                l2tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
> -                clear_page(l2tab);
> -                *l3tab = l3e_from_paddr(__pa(l2tab), L3_PROT);
> +                UNMAP_MAP_AND_ADVANCE(l2start_mfn, l2start, mpt_alloc);
> +                clear_page(l2start);
> +                *l3tab = l3e_from_mfn(l2start_mfn, L3_PROT);

The l2start you map on the last iteration here can be re-used ...

> @@ -805,9 +821,17 @@ int __init dom0_construct_pv(struct domain *d,
>          unmap_domain_page(l2t);
>      }

... in the code the tail of which is visible here, eliminating a
redundant map/unmap pair.

> @@ -977,8 +1001,12 @@ int __init dom0_construct_pv(struct domain *d,
>       * !CONFIG_VIDEO case so the logic here can be simplified.
>       */
>      if ( pv_shim )
> +    {
> +        l4start = map_domain_page(l4start_mfn);
>          pv_shim_setup_dom(d, l4start, v_start, vxenstore_start, vconsole_start,
>                            vphysmap_start, si);
> +        UNMAP_DOMAIN_PAGE(l4start);
> +    }

The, at first glance, redundant re-mapping of the L4 table here could
do with explaining in the description. However, I further wonder to what
extent eliminating the direct map is actually useful in shim mode. Which is
to say that I question the need for this change in the first place. Or
wait - isn't this (unlike the rest of this patch) actually a bug fix? At
this point we're on the domain's page tables, which may not cover the
page the L4 is allocated at (if a truly huge shim was configured). So I
guess the change is needed but wants breaking out, allowing us to at least
consider whether to backport it.

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 09/22] x86: lift mapcache variable to the arch level
  2022-12-16 11:48 ` [PATCH 09/22] x86: lift mapcache variable to the arch level Julien Grall
@ 2022-12-22 12:53   ` Jan Beulich
  0 siblings, 0 replies; 101+ messages in thread
From: Jan Beulich @ 2022-12-22 12:53 UTC (permalink / raw)
  To: Julien Grall
  Cc: Wei Liu, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Wei Wang, Hongyan Xia, Julien Grall, xen-devel

On 16.12.2022 12:48, Julien Grall wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> It is going to be needed by HVM and idle domain as well, because without
> the direct map, both need a mapcache to map pages.
> 
> This only lifts the mapcache variable up. Whether we populate the
> mapcache for a domain is unchanged in this patch.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Wei Wang <wawei@amazon.de>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>




^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 10/22] x86/mapcache: initialise the mapcache for the idle domain
  2022-12-16 11:48 ` [PATCH 10/22] x86/mapcache: initialise the mapcache for the idle domain Julien Grall
@ 2022-12-22 13:06   ` Jan Beulich
  2024-01-10 16:24     ` Elias El Yandouzi
  0 siblings, 1 reply; 101+ messages in thread
From: Jan Beulich @ 2022-12-22 13:06 UTC (permalink / raw)
  To: Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Wei Wang, Julien Grall, xen-devel

On 16.12.2022 12:48, Julien Grall wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
> 
> In order to use the mapcache in the idle domain, we also have to
> populate its page tables in the PERDOMAIN region, and we need to move
> mapcache_domain_init() earlier in arch_domain_create().
> 
> Note, commit 'x86: lift mapcache variable to the arch level' has
> initialised the mapcache for HVM domains. With this patch, PV, HVM,
> idle domains now all initialise the mapcache.

But they can't use it yet, can they? This needs saying explicitly, or
else one is going to make wrong implications.

> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -732,6 +732,8 @@ int arch_domain_create(struct domain *d,
>  
>      spin_lock_init(&d->arch.e820_lock);
>  
> +    mapcache_domain_init(d);
> +
>      /* Minimal initialisation for the idle domain. */
>      if ( unlikely(is_idle_domain(d)) )
>      {
> @@ -829,8 +831,6 @@ int arch_domain_create(struct domain *d,
>  
>      psr_domain_init(d);
>  
> -    mapcache_domain_init(d);

You move this ahead of error paths taking the "goto out" route, so
adjustments to affected error paths are going to be needed to avoid
memory leaks.

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5963,6 +5963,9 @@ int create_perdomain_mapping(struct domain *d, unsigned long va,
>          l3tab = __map_domain_page(pg);
>          clear_page(l3tab);
>          d->arch.perdomain_l3_pg = pg;
> +        if ( is_idle_domain(d) )
> +            idle_pg_table[l4_table_offset(PERDOMAIN_VIRT_START)] =
> +                l4e_from_page(pg, __PAGE_HYPERVISOR_RW);

Hmm, having an idle domain check here isn't very nice. I agree putting
it in arch_domain_create()'s respective conditional isn't very neat
either, but personally I'd consider this at least a little less bad.
And the layering violation aspect isn't much worse than that of setting
d->arch.ctxt_switch there as well.

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 11/22] x86: add a boot option to enable and disable the direct map
  2022-12-16 11:48 ` [PATCH 11/22] x86: add a boot option to enable and disable the direct map Julien Grall
@ 2022-12-22 13:24   ` Jan Beulich
  2024-01-11 10:47     ` Elias El Yandouzi
  2023-01-23 21:45   ` Stefano Stabellini
  1 sibling, 1 reply; 101+ messages in thread
From: Jan Beulich @ 2022-12-22 13:24 UTC (permalink / raw)
  To: Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, George Dunlap, Stefano Stabellini,
	Wei Liu, Bertrand Marquis, Volodymyr Babchuk,
	Roger Pau Monné,
	Julien Grall, xen-devel

On 16.12.2022 12:48, Julien Grall wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
> 
> Also add a helper function to retrieve it. Change arch_mfns_in_direct_map
> to check this option before returning.

I think the abstract parts of this want to be generic right away. I can't
see why Arm would not suffer from the same issue that this work is trying
to address.

> This is added as a boot command line option, not a Kconfig option, to allow
> the user to experiment with the feature without rebuilding the hypervisor.

I think there wants to be a (generic) Kconfig piece here, to control the
default of the option. Plus a 2nd, prompt-less element which an arch can
select to force the setting to always-on, suppressing the choice of
default. That 2nd control would then be used to compile out the
boolean_param() for Arm for the time being.

That said, I think this change comes too early in the series, or there is
something missing. As said in reply to patch 10, while there the mapcache
is being initialized for the idle domain, I don't think it can be used
just yet. Read through mapcache_current_vcpu() to understand why I think
that way, paying particular attention to the ASSERT() near the end. In
preparation of this patch here I think the mfn_to_virt() uses have to all
disappear from map_domain_page(). Perhaps yet more strongly all
..._to_virt() (except fix_to_virt() and friends) and __va() have to
disappear up front from x86 and any code path which can be taken on x86
(which may simply mean purging all respective x86 #define-s, without
breaking the build in any way).
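
For context, the shape of what the commit message describes is roughly the
following (a sketch, not the actual patch; the option variable and its
placement are assumptions):

    static bool __read_mostly opt_directmap = true;
    boolean_param("directmap", opt_directmap);

    bool arch_has_directmap(void)
    {
        return opt_directmap;
    }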

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 12/22] xen/arm: fixmap: Rename the fixmap slots to follow the x86 convention
  2022-12-16 11:48 ` [PATCH 12/22] xen/arm: fixmap: Rename the fixmap slots to follow the x86 convention Julien Grall
@ 2022-12-22 13:29   ` Jan Beulich
  2023-01-06 14:54   ` Henry Wang
  2023-01-23 21:47   ` Stefano Stabellini
  2 siblings, 0 replies; 101+ messages in thread
From: Jan Beulich @ 2022-12-22 13:29 UTC (permalink / raw)
  To: Julien Grall
  Cc: Julien Grall, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Wei Liu,
	xen-devel

On 16.12.2022 12:48, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> At the moment the fixmap slots are prefixed differently between arm and
> x86.
> 
> Some of them (e.g. the PMAP slots) are used in common code. So it would
> be better if they are named the same way to avoid having to create
> aliases.
> 
> I have decided to use the x86 naming because they are less change. So
> all the Arm fixmap slots will now be prefixed with FIX rather than
> FIXMAP.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>




^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 01/22] xen/common: page_alloc: Re-order includes
  2022-12-16 12:03   ` Jan Beulich
@ 2022-12-23  9:29     ` Julien Grall
  0 siblings, 0 replies; 101+ messages in thread
From: Julien Grall @ 2022-12-23  9:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Julien Grall, Andrew Cooper, George Dunlap, Stefano Stabellini,
	Wei Liu, xen-devel

Hi Jan,

On 16/12/2022 12:03, Jan Beulich wrote:
> On 16.12.2022 12:48, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> Order the includes with the xen headers first, then asm headers and
>> last public headers. Within each category, they are sorted alphabetically.
>>
>> Note that the includes protected by CONFIG_X86 haven't been sorted
>> to avoid adding multiple #ifdef.
>>
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> Acked-by: Jan Beulich <jbeulich@suse.com>

Thanks!

> 
>> ----
>>
>>      I am open to sorting the includes protected by CONFIG_X86
>>      and adding multiple #ifdef if this is preferred.
> 
> I, for one, prefer it the way you've done it.

Ok. I have committed the patch as-is.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 02/22] x86/setup: move vm_init() before acpi calls
  2022-12-20 15:08   ` Jan Beulich
  2022-12-21 10:18     ` Julien Grall
@ 2022-12-23  9:51     ` Julien Grall
  1 sibling, 0 replies; 101+ messages in thread
From: Julien Grall @ 2022-12-23  9:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk,
	Andrew Cooper, George Dunlap, Wei Liu, Roger Pau Monné,
	David Woodhouse, Hongyan Xia, Julien Grall, xen-devel

Hi Jan,

On 20/12/2022 15:08, Jan Beulich wrote:
> On 16.12.2022 12:48, Julien Grall wrote:
>> From: Wei Liu <wei.liu2@citrix.com>
>>
>> After the direct map removal, pages from the boot allocator are not
>> mapped at all in the direct map. Although we have map_domain_page, they
> 
> Nit: "will not be mapped" or "are not going to be mapped", or else this
> sounds like there's a bug somewhere.
> 
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -870,6 +870,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>       unsigned long eb_start, eb_end;
>>       bool acpi_boot_table_init_done = false, relocated = false;
>>       int ret;
>> +    bool vm_init_done = false;
> 
> Can this please be grouped with the other bool-s (even visible in context)?

This can't fit on the same line. So I went with:

bool acpi_boot_table_init_done = false, relocated = false;
bool vm_init_done = false;

I prefer this over the below:

bool acpi_boot_table_init_done = false, relocated = false,
      vm_init_done = false;

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 02/22] x86/setup: move vm_init() before acpi calls
  2022-12-21 10:22       ` Jan Beulich
@ 2022-12-23  9:51         ` Julien Grall
  0 siblings, 0 replies; 101+ messages in thread
From: Julien Grall @ 2022-12-23  9:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk,
	Andrew Cooper, George Dunlap, Wei Liu, Roger Pau Monné,
	David Woodhouse, Hongyan Xia, Julien Grall, xen-devel

Hi Jan,

On 21/12/2022 10:22, Jan Beulich wrote:
> On 21.12.2022 11:18, Julien Grall wrote:
>> On 20/12/2022 15:08, Jan Beulich wrote:
>>> On 16.12.2022 12:48, Julien Grall wrote:
>>>> --- a/xen/common/vmap.c
>>>> +++ b/xen/common/vmap.c
>>>> @@ -34,9 +34,20 @@ void __init vm_init_type(enum vmap_region type, void *start, void *end)
>>>>    
>>>>        for ( i = 0, va = (unsigned long)vm_bitmap(type); i < nr; ++i, va += PAGE_SIZE )
>>>>        {
>>>> -        struct page_info *pg = alloc_domheap_page(NULL, 0);
>>>> +        mfn_t mfn;
>>>> +        int rc;
>>>>    
>>>> -        map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR);
>>>> +        if ( system_state == SYS_STATE_early_boot )
>>>> +            mfn = alloc_boot_pages(1, 1);
>>>> +        else
>>>> +        {
>>>> +            struct page_info *pg = alloc_domheap_page(NULL, 0);
>>>> +
>>>> +            BUG_ON(!pg);
>>>> +            mfn = page_to_mfn(pg);
>>>> +        }
>>>> +        rc = map_pages_to_xen(va, mfn, 1, PAGE_HYPERVISOR);
>>>> +        BUG_ON(rc);
>>>
>>> The adding of a return value check is unrelated and not overly useful:
>>>
>>>>            clear_page((void *)va);
>>>
>>> This will fault anyway if the mapping attempt failed.
>>
>> Not always. At least on Arm, map_pages_to_xen() could fail if the VA was
>> mapped to another physical address.
> 
> Oh, okay.
> 
>> This seems unlikely, yet I think that relying on clear_page() to always
>> fail when map_pages_to_xen() return an error is bogus.
> 
> Fair enough, but then please at least call out the change (and the reason)
> in the description. Even better might be to make this a separate change,
> but I wouldn't insist (quite likely I wouldn't have made this a separate
> change either).

I have moved the change into a separate patch.

Cheers,

> 
> Jan

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 05/22] x86/srat: vmap the pages for acpi_slit
  2022-12-20 15:30   ` Jan Beulich
@ 2022-12-23 11:31     ` Julien Grall
  2023-01-04 10:23       ` Jan Beulich
  2023-01-30 19:27       ` Julien Grall
  0 siblings, 2 replies; 101+ messages in thread
From: Julien Grall @ 2022-12-23 11:31 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Julien Grall

Hi Jan,

On 20/12/2022 15:30, Jan Beulich wrote:
> On 16.12.2022 12:48, Julien Grall wrote:
>> From: Hongyan Xia <hongyxia@amazon.com>
>>
>> This avoids the assumption that boot pages are in the direct map.
>>
>> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> However, ...
> 
>> --- a/xen/arch/x86/srat.c
>> +++ b/xen/arch/x86/srat.c
>> @@ -139,7 +139,8 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
>>   		return;
>>   	}
>>   	mfn = alloc_boot_pages(PFN_UP(slit->header.length), 1);
>> -	acpi_slit = mfn_to_virt(mfn_x(mfn));
>> +	acpi_slit = vmap_contig_pages(mfn, PFN_UP(slit->header.length));
> 
> ... with the increased use of vmap space the VA range used will need
> growing. And that's perhaps better done ahead of time than late.

I will have a look at increasing the vmap area.

> 
>> +	BUG_ON(!acpi_slit);
> 
> Similarly relevant for the earlier patch: It would be nice if boot
> failure for optional things like NUMA data could be avoided.

If you can't map (or allocate the memory), then you are probably in a 
very bad situation because both should really not fail at boot.

So I think it is correct to crash early, because the admin will be able
to look at what went wrong. Otherwise, it may be missed in the noise.

>  But I
> understand this is somewhat orthogonal to this series (the more that
> alloc_boot_pages() itself is also affected). Yet not entirely so,
> since previously there was no mapping failure possible here.

See above. I don't see a problem with adding a potential mapping failure
here and before.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 06/22] x86: map/unmap pages in restore_all_guests
  2022-12-22 11:12   ` Jan Beulich
@ 2022-12-23 12:22     ` Julien Grall
  2023-01-04 10:27       ` Jan Beulich
  0 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2022-12-23 12:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Julien Grall, xen-devel

Hi,

On 22/12/2022 11:12, Jan Beulich wrote:
> On 16.12.2022 12:48, Julien Grall wrote:
>> --- a/xen/arch/x86/x86_64/entry.S
>> +++ b/xen/arch/x86/x86_64/entry.S
>> @@ -165,7 +165,24 @@ restore_all_guest:
>>           and   %rsi, %rdi
>>           and   %r9, %rsi
>>           add   %rcx, %rdi
>> -        add   %rcx, %rsi
>> +
>> +         /*
>> +          * Without a direct map, we have to map first before copying. We only
>> +          * need to map the guest root table but not the per-CPU root_pgt,
>> +          * because the latter is still a xenheap page.
>> +          */
>> +        pushq %r9
>> +        pushq %rdx
>> +        pushq %rax
>> +        pushq %rdi
>> +        mov   %rsi, %rdi
>> +        shr   $PAGE_SHIFT, %rdi
>> +        callq map_domain_page
>> +        mov   %rax, %rsi
>> +        popq  %rdi
>> +        /* Stash the pointer for unmapping later. */
>> +        pushq %rax
>> +
>>           mov   $ROOT_PAGETABLE_FIRST_XEN_SLOT, %ecx
>>           mov   root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rsi), %r8
>>           mov   %r8, root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rdi)
>> @@ -177,6 +194,14 @@ restore_all_guest:
>>           sub   $(ROOT_PAGETABLE_FIRST_XEN_SLOT - \
>>                   ROOT_PAGETABLE_LAST_XEN_SLOT - 1) * 8, %rdi
>>           rep movsq
>> +
>> +        /* Unmap the page. */
>> +        popq  %rdi
>> +        callq unmap_domain_page
>> +        popq  %rax
>> +        popq  %rdx
>> +        popq  %r9
> 
> While the PUSH/POP are part of what I dislike here, I think this wants
> doing differently: Establish a mapping when putting in place a new guest
> page table, and use the pointer here. This could be a new per-domain
> mapping, to limit its visibility.

I have looked at a per-domain approach and it looks way more complex
than the few concise lines here (not to mention the extra amount of
memory).

So I am not convinced this is worth the effort here.

I don't have another approach in mind. So are you disliking this
approach to the point that this will be nacked?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 05/22] x86/srat: vmap the pages for acpi_slit
  2022-12-23 11:31     ` Julien Grall
@ 2023-01-04 10:23       ` Jan Beulich
  2023-01-12 23:15         ` Julien Grall
  2023-01-30 19:27       ` Julien Grall
  1 sibling, 1 reply; 101+ messages in thread
From: Jan Beulich @ 2023-01-04 10:23 UTC (permalink / raw)
  To: Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Julien Grall

On 23.12.2022 12:31, Julien Grall wrote:
> On 20/12/2022 15:30, Jan Beulich wrote:
>> On 16.12.2022 12:48, Julien Grall wrote:
>>> From: Hongyan Xia <hongyxia@amazon.com>
>>>
>>> This avoids the assumption that boot pages are in the direct map.
>>>
>>> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>
>> However, ...
>>
>>> --- a/xen/arch/x86/srat.c
>>> +++ b/xen/arch/x86/srat.c
>>> @@ -139,7 +139,8 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
>>>   		return;
>>>   	}
>>>   	mfn = alloc_boot_pages(PFN_UP(slit->header.length), 1);
>>> -	acpi_slit = mfn_to_virt(mfn_x(mfn));
>>> +	acpi_slit = vmap_contig_pages(mfn, PFN_UP(slit->header.length));
>>
>> ... with the increased use of vmap space the VA range used will need
>> growing. And that's perhaps better done ahead of time than late.
> 
> I will have a look at increasing the vmap area.
> 
>>
>>> +	BUG_ON(!acpi_slit);
>>
>> Similarly relevant for the earlier patch: It would be nice if boot
>> failure for optional things like NUMA data could be avoided.
> 
> If you can't map (or allocate the memory), then you are probably in a 
> very bad situation because both should really not fail at boot.
> 
> So I think it is correct to crash early, because the admin will be able
> to look at what went wrong. Otherwise, it may be missed in the noise.

Well, I certainly can see one taking this view. However, at least in
principle allocation (or mapping) may fail _because_ of NUMA issues.
At which point it would be better to boot with NUMA support turned off.

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 06/22] x86: map/unmap pages in restore_all_guests
  2022-12-23 12:22     ` Julien Grall
@ 2023-01-04 10:27       ` Jan Beulich
  2023-01-12 23:20         ` Julien Grall
  0 siblings, 1 reply; 101+ messages in thread
From: Jan Beulich @ 2023-01-04 10:27 UTC (permalink / raw)
  To: Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Julien Grall, xen-devel

On 23.12.2022 13:22, Julien Grall wrote:
> Hi,
> 
> On 22/12/2022 11:12, Jan Beulich wrote:
>> On 16.12.2022 12:48, Julien Grall wrote:
>>> --- a/xen/arch/x86/x86_64/entry.S
>>> +++ b/xen/arch/x86/x86_64/entry.S
>>> @@ -165,7 +165,24 @@ restore_all_guest:
>>>           and   %rsi, %rdi
>>>           and   %r9, %rsi
>>>           add   %rcx, %rdi
>>> -        add   %rcx, %rsi
>>> +
>>> +         /*
>>> +          * Without a direct map, we have to map first before copying. We only
>>> +          * need to map the guest root table but not the per-CPU root_pgt,
>>> +          * because the latter is still a xenheap page.
>>> +          */
>>> +        pushq %r9
>>> +        pushq %rdx
>>> +        pushq %rax
>>> +        pushq %rdi
>>> +        mov   %rsi, %rdi
>>> +        shr   $PAGE_SHIFT, %rdi
>>> +        callq map_domain_page
>>> +        mov   %rax, %rsi
>>> +        popq  %rdi
>>> +        /* Stash the pointer for unmapping later. */
>>> +        pushq %rax
>>> +
>>>           mov   $ROOT_PAGETABLE_FIRST_XEN_SLOT, %ecx
>>>           mov   root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rsi), %r8
>>>           mov   %r8, root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rdi)
>>> @@ -177,6 +194,14 @@ restore_all_guest:
>>>           sub   $(ROOT_PAGETABLE_FIRST_XEN_SLOT - \
>>>                   ROOT_PAGETABLE_LAST_XEN_SLOT - 1) * 8, %rdi
>>>           rep movsq
>>> +
>>> +        /* Unmap the page. */
>>> +        popq  %rdi
>>> +        callq unmap_domain_page
>>> +        popq  %rax
>>> +        popq  %rdx
>>> +        popq  %r9
>>
>> While the PUSH/POP are part of what I dislike here, I think this wants
>> doing differently: Establish a mapping when putting in place a new guest
>> page table, and use the pointer here. This could be a new per-domain
>> mapping, to limit its visibility.
> 
>> I have looked at a per-domain approach and it looks way more complex
>> than the few concise lines here (not to mention the extra amount of
>> memory).

Yes, I do understand that would be a more intrusive change.

> So I am not convinced this is worth the effort here.
> 
> I don't have another approach in mind. So are you disliking this
> approach to the point that this will be nacked?

I guess I wouldn't nack it, but I also wouldn't provide an ack. I'm curious
what Andrew or Roger think here...

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 13/22] xen/x86: Add support for the PMAP
  2022-12-16 11:48 ` [PATCH 13/22] xen/x86: Add support for the PMAP Julien Grall
@ 2023-01-05 16:46   ` Jan Beulich
  2023-01-05 17:50     ` Julien Grall
  0 siblings, 1 reply; 101+ messages in thread
From: Jan Beulich @ 2023-01-05 16:46 UTC (permalink / raw)
  To: Julien Grall
  Cc: Julien Grall, Andrew Cooper, Roger Pau Monné, Wei Liu, xen-devel

On 16.12.2022 12:48, Julien Grall wrote:
> PMAP will be used in a follow-up patch to bootstrap the map domain
> page infrastructure -- we need some way to map pages to set up the
> mapcache without a direct map.

But this isn't going to be needed overly early then, seeing that ...

> --- a/xen/arch/x86/include/asm/fixmap.h
> +++ b/xen/arch/x86/include/asm/fixmap.h
> @@ -21,6 +21,8 @@
>  
>  #include <xen/acpi.h>
>  #include <xen/pfn.h>
> +#include <xen/pmap.h>
> +
>  #include <asm/apicdef.h>
>  #include <asm/msi.h>
>  #include <acpi/apei.h>
> @@ -54,6 +56,8 @@ enum fixed_addresses {
>      FIX_XEN_SHARED_INFO,
>  #endif /* CONFIG_XEN_GUEST */
>      /* Everything else should go further down. */
> +    FIX_PMAP_BEGIN,
> +    FIX_PMAP_END = FIX_PMAP_BEGIN + NUM_FIX_PMAP,

... you've inserted the new entries after the respective comment? Is
there a reason you don't insert farther towards the end of this
enumeration?

> --- /dev/null
> +++ b/xen/arch/x86/include/asm/pmap.h
> @@ -0,0 +1,25 @@
> +#ifndef __ASM_PMAP_H__
> +#define __ASM_PMAP_H__
> +
> +#include <asm/fixmap.h>
> +
> +static inline void arch_pmap_map(unsigned int slot, mfn_t mfn)
> +{
> +    unsigned long linear = (unsigned long)fix_to_virt(slot);
> +    l1_pgentry_t *pl1e = &l1_fixmap[l1_table_offset(linear)];
> +
> +    ASSERT(!(l1e_get_flags(*pl1e) & _PAGE_PRESENT));
> +
> +    l1e_write_atomic(pl1e, l1e_from_mfn(mfn, PAGE_HYPERVISOR));
> +}
> +
> +static inline void arch_pmap_unmap(unsigned int slot)
> +{
> +    unsigned long linear = (unsigned long)fix_to_virt(slot);
> +    l1_pgentry_t *pl1e = &l1_fixmap[l1_table_offset(linear)];
> +
> +    l1e_write_atomic(pl1e, l1e_empty());
> +    flush_tlb_one_local(linear);
> +}

You're effectively open-coding {set,clear}_fixmap(), just without
the L1 table allocation (should such be necessary). If you depend
on using the build-time L1 table, then you need to move your
entries ahead of said comment. But independent of that you want
to either use the existing macros / functions, or explain why you
can't.

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 13/22] xen/x86: Add support for the PMAP
  2023-01-05 16:46   ` Jan Beulich
@ 2023-01-05 17:50     ` Julien Grall
  2023-01-06  7:17       ` Jan Beulich
  0 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2023-01-05 17:50 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Julien Grall, Andrew Cooper, Roger Pau Monné, Wei Liu, xen-devel

Hi Jan,

On 05/01/2023 16:46, Jan Beulich wrote:
> On 16.12.2022 12:48, Julien Grall wrote:
>> --- a/xen/arch/x86/include/asm/fixmap.h
>> +++ b/xen/arch/x86/include/asm/fixmap.h
>> @@ -21,6 +21,8 @@
>>   
>>   #include <xen/acpi.h>
>>   #include <xen/pfn.h>
>> +#include <xen/pmap.h>
>> +
>>   #include <asm/apicdef.h>
>>   #include <asm/msi.h>
>>   #include <acpi/apei.h>
>> @@ -54,6 +56,8 @@ enum fixed_addresses {
>>       FIX_XEN_SHARED_INFO,
>>   #endif /* CONFIG_XEN_GUEST */
>>       /* Everything else should go further down. */
>> +    FIX_PMAP_BEGIN,
>> +    FIX_PMAP_END = FIX_PMAP_BEGIN + NUM_FIX_PMAP,
> 
> ... you've inserted the new entries after the respective comment? Is
> there a reason you don't insert farther towards the end of this
> enumeration?

I will answer this below.

> 
>> --- /dev/null
>> +++ b/xen/arch/x86/include/asm/pmap.h
>> @@ -0,0 +1,25 @@
>> +#ifndef __ASM_PMAP_H__
>> +#define __ASM_PMAP_H__
>> +
>> +#include <asm/fixmap.h>
>> +
>> +static inline void arch_pmap_map(unsigned int slot, mfn_t mfn)
>> +{
>> +    unsigned long linear = (unsigned long)fix_to_virt(slot);
>> +    l1_pgentry_t *pl1e = &l1_fixmap[l1_table_offset(linear)];
>> +
>> +    ASSERT(!(l1e_get_flags(*pl1e) & _PAGE_PRESENT));
>> +
>> +    l1e_write_atomic(pl1e, l1e_from_mfn(mfn, PAGE_HYPERVISOR));
>> +}
>> +
>> +static inline void arch_pmap_unmap(unsigned int slot)
>> +{
>> +    unsigned long linear = (unsigned long)fix_to_virt(slot);
>> +    l1_pgentry_t *pl1e = &l1_fixmap[l1_table_offset(linear)];
>> +
>> +    l1e_write_atomic(pl1e, l1e_empty());
>> +    flush_tlb_one_local(linear);
>> +}
> 
> You're effectively open-coding {set,clear}_fixmap(), just without
> the L1 table allocation (should such be necessary). If you depend
> on using the build-time L1 table, then you need to move your
> entries ahead of said comment.

So the problem is less about the allocation and more about the fact that we
can't use map_pages_to_xen() because it would call pmap_map().

So we need to break the loop. Hence why set_fixmap()/clear_fixmap() are 
open-coded.

And indeed, we would need to rely on the build-time L1 table in this 
case. So I will move the entries earlier.

> But independent of that you want
> to either use the existing macros / functions, or explain why you
> can't.

This is explained in the caller of arch_pmap*():

     /*
      * We cannot use set_fixmap() here. We use PMAP when the domain map
      * page infrastructure is not yet initialized, so map_pages_to_xen()
      * called by set_fixmap() needs to map pages on demand, which then
      * calls pmap() again, resulting in a loop. Modify the PTEs directly
      * instead. The same is true for pmap_unmap().
      */

The comment is valid for Arm, x86 and (I would expect in the future) 
RISC-V because the page-tables may be allocated in domheap (so not 
always mapped).

So I don't feel this comment should be duplicated in the header. But I 
can certainly explain it in the commit message.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 13/22] xen/x86: Add support for the PMAP
  2023-01-05 17:50     ` Julien Grall
@ 2023-01-06  7:17       ` Jan Beulich
  0 siblings, 0 replies; 101+ messages in thread
From: Jan Beulich @ 2023-01-06  7:17 UTC (permalink / raw)
  To: Julien Grall
  Cc: Julien Grall, Andrew Cooper, Roger Pau Monné, Wei Liu, xen-devel

On 05.01.2023 18:50, Julien Grall wrote:
> On 05/01/2023 16:46, Jan Beulich wrote:
>> On 16.12.2022 12:48, Julien Grall wrote:
>>> --- a/xen/arch/x86/include/asm/fixmap.h
>>> +++ b/xen/arch/x86/include/asm/fixmap.h
>>> @@ -21,6 +21,8 @@
>>>   
>>>   #include <xen/acpi.h>
>>>   #include <xen/pfn.h>
>>> +#include <xen/pmap.h>
>>> +
>>>   #include <asm/apicdef.h>
>>>   #include <asm/msi.h>
>>>   #include <acpi/apei.h>
>>> @@ -54,6 +56,8 @@ enum fixed_addresses {
>>>       FIX_XEN_SHARED_INFO,
>>>   #endif /* CONFIG_XEN_GUEST */
>>>       /* Everything else should go further down. */
>>> +    FIX_PMAP_BEGIN,
>>> +    FIX_PMAP_END = FIX_PMAP_BEGIN + NUM_FIX_PMAP,
>>
>> ... you've inserted the new entries after the respective comment? Is
>> there a reason you don't insert farther towards the end of this
>> enumeration?
> 
> I will answer this below.
> 
>>
>>> --- /dev/null
>>> +++ b/xen/arch/x86/include/asm/pmap.h
>>> @@ -0,0 +1,25 @@
>>> +#ifndef __ASM_PMAP_H__
>>> +#define __ASM_PMAP_H__
>>> +
>>> +#include <asm/fixmap.h>
>>> +
>>> +static inline void arch_pmap_map(unsigned int slot, mfn_t mfn)
>>> +{
>>> +    unsigned long linear = (unsigned long)fix_to_virt(slot);
>>> +    l1_pgentry_t *pl1e = &l1_fixmap[l1_table_offset(linear)];
>>> +
>>> +    ASSERT(!(l1e_get_flags(*pl1e) & _PAGE_PRESENT));
>>> +
>>> +    l1e_write_atomic(pl1e, l1e_from_mfn(mfn, PAGE_HYPERVISOR));
>>> +}
>>> +
>>> +static inline void arch_pmap_unmap(unsigned int slot)
>>> +{
>>> +    unsigned long linear = (unsigned long)fix_to_virt(slot);
>>> +    l1_pgentry_t *pl1e = &l1_fixmap[l1_table_offset(linear)];
>>> +
>>> +    l1e_write_atomic(pl1e, l1e_empty());
>>> +    flush_tlb_one_local(linear);
>>> +}
>>
>> You're effectively open-coding {set,clear}_fixmap(), just without
>> the L1 table allocation (should such be necessary). If you depend
>> on using the build-time L1 table, then you need to move your
>> entries ahead of said comment.
> 
> So the problem is less about the allocation and more about the fact that we
> can't use map_pages_to_xen() because it would call pmap_map().
> 
> So we need to break the loop. Hence why set_fixmap()/clear_fixmap() are 
> open-coded.
> 
> And indeed, we would need to rely on the build-time L1 table in this 
> case. So I will move the entries earlier.

Additionally we will now need to (finally) gain a build-time check that
all "early" entries actually fit in the static L1 table. XHCI has pushed
us quite a bit up here, and I could see us considering altering (bumping)
the number of PMAP entries.
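
Such a check could be as simple as, for example (a sketch; the exact bound
to assert against is an assumption):

    BUILD_BUG_ON(FIX_PMAP_END >= L1_PAGETABLE_ENTRIES);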

>> But independent of that you want
>> to either use the existing macros / functions, or explain why you
>> can't.
> 
> This is explained in the caller of arch_pmap*():
> 
>      /*
>       * We cannot use set_fixmap() here. We use PMAP when the domain map
>       * page infrastructure is not yet initialized, so map_pages_to_xen()
>       * called by set_fixmap() needs to map pages on demand, which then
>       * calls pmap() again, resulting in a loop. Modify the PTEs directly
>       * instead. The same is true for pmap_unmap().
>       */
> 
> The comment is valid for Arm, x86 and (I would expect in the future) 
> RISC-V because the page-tables may be allocated in domheap (so not 
> always mapped).
> 
> So I don't feel this comment should be duplicated in the header. But I 
> can certainly explain it in the commit message.

Right, that's what I was after; I'm sorry for not having worded this
precisely enough.

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* RE: [PATCH 12/22] xen/arm: fixmap: Rename the fixmap slots to follow the x86 convention
  2022-12-16 11:48 ` [PATCH 12/22] xen/arm: fixmap: Rename the fixmap slots to follow the x86 convention Julien Grall
  2022-12-22 13:29   ` Jan Beulich
@ 2023-01-06 14:54   ` Henry Wang
  2023-01-23 21:47   ` Stefano Stabellini
  2 siblings, 0 replies; 101+ messages in thread
From: Henry Wang @ 2023-01-06 14:54 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Julien Grall, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Wei Liu

Hi Julien,

> -----Original Message-----
> Subject: [PATCH 12/22] xen/arm: fixmap: Rename the fixmap slots to follow
> the x86 convention
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> At the moment the fixmap slots are prefixed differently between arm and
> x86.
> 
> Some of them (e.g. the PMAP slots) are used in common code. So it would
> be better if they are named the same way to avoid having to create
> aliases.
> 
> I have decided to use the x86 naming because they are less change. So
> all the Arm fixmap slots will now be prefixed with FIX rather than
> FIXMAP.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Henry Wang <Henry.Wang@arm.com>

Kind regards,
Henry



^ permalink raw reply	[flat|nested] 101+ messages in thread

* RE: [PATCH 19/22] xen/arm32: mm: Rename 'first' to 'root' in init_secondary_pagetables()
  2022-12-16 11:48 ` [PATCH 19/22] xen/arm32: mm: Rename 'first' to 'root' in init_secondary_pagetables() Julien Grall
@ 2023-01-06 14:54   ` Henry Wang
  2023-01-23 22:06   ` Stefano Stabellini
  1 sibling, 0 replies; 101+ messages in thread
From: Henry Wang @ 2023-01-06 14:54 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Julien Grall, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Julien,

> -----Original Message-----
> Subject: [PATCH 19/22] xen/arm32: mm: Rename 'first' to 'root' in
> init_secondary_pagetables()
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> The arm32 version of init_secondary_pagetables() will soon be re-used
> for arm64 as well where the root table start at level 0 rather than level 1.

Nit: s/start at/starts at/

> 
> So rename 'first' to 'root'.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Henry Wang <Henry.Wang@arm.com>

Kind regards,
Henry


^ permalink raw reply	[flat|nested] 101+ messages in thread

* RE: [PATCH 20/22] xen/arm64: mm: Use per-pCPU page-tables
  2022-12-16 11:48 ` [PATCH 20/22] xen/arm64: mm: Use per-pCPU page-tables Julien Grall
@ 2023-01-06 14:54   ` Henry Wang
  2023-01-06 15:44     ` Julien Grall
  2023-01-23 22:21   ` Stefano Stabellini
  1 sibling, 1 reply; 101+ messages in thread
From: Henry Wang @ 2023-01-06 14:54 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Julien Grall, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Julien,

> -----Original Message-----
> Subject: [PATCH 20/22] xen/arm64: mm: Use per-pCPU page-tables
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> At the moment, on Arm64, every pCPU are sharing the same page-tables.

Nit: s/every pCPU are/ every pCPU is/

> 
>  /*
> diff --git a/xen/arch/arm/include/asm/domain_page.h
> b/xen/arch/arm/include/asm/domain_page.h
> new file mode 100644
> index 000000000000..e9f52685e2ec
> --- /dev/null
> +++ b/xen/arch/arm/include/asm/domain_page.h
> @@ -0,0 +1,13 @@
> +#ifndef __ASM_ARM_DOMAIN_PAGE_H__
> +#define __ASM_ARM_DOMAIN_PAGE_H__
> +
> +#ifdef CONFIG_ARCH_MAP_DOMAIN_PAGE
> +bool init_domheap_mappings(unsigned int cpu);

I wonder if we can make this function "__init" as IIRC this function is only
used at Xen boot time, but since the original init_domheap_mappings()
is not "__init" anyway so this is not a strong argument.

> +#else
> +static inline bool init_domheap_mappings(unsigned int cpu)

(and also here)

Either you agree with above "__init" comment or not:
Reviewed-by: Henry Wang <Henry.Wang@arm.com>

Kind regards,
Henry


^ permalink raw reply	[flat|nested] 101+ messages in thread

* RE: [PATCH 21/22] xen/arm64: Implement a mapcache for arm64
  2022-12-16 11:48 ` [PATCH 21/22] xen/arm64: Implement a mapcache for arm64 Julien Grall
@ 2023-01-06 14:55   ` Henry Wang
  2023-01-23 22:34   ` Stefano Stabellini
  1 sibling, 0 replies; 101+ messages in thread
From: Henry Wang @ 2023-01-06 14:55 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Julien Grall, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Julien,

> -----Original Message-----
> Subject: [PATCH 21/22] xen/arm64: Implement a mapcache for arm64
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> At the moment, on arm64, map_domain_page() is implemented using
> virt_to_mfn(). Therefore it is relying on the directmap.
> 
> In a follow-up patch, we will allow the admin to remove the directmap.
> Therefore we want to implement a mapcache.
> 
> Thanksfully there is already one for arm32. So select
> ARCH_ARM_DOMAIN_PAGE
> and add the necessary boiler plate to support 64-bit:
>     - The page-table start at level 0, so we need to allocate the level
>       1 page-table
>     - map_domain_page() should check if the page is in the directmap. If
>       yes, then use virt_to_mfn() to limit the performance impact
>       when the directmap is still enabled (this will be selectable
>       on the command line).
> 
> Take the opportunity to replace first_table_offset(...) with offsets[...].
> 
> Note that, so far, arch_mfns_in_directmap() always return true on

Nit: s/return/returns/

> arm64. So the mapcache is not yet used. This will change in a
> follow-up patch.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Henry Wang <Henry.Wang@arm.com>

Kind regards,
Henry


^ permalink raw reply	[flat|nested] 101+ messages in thread

* RE: [PATCH 22/22] xen/arm64: Allow the admin to enable/disable the directmap
  2022-12-16 11:48 ` [PATCH 22/22] xen/arm64: Allow the admin to enable/disable the directmap Julien Grall
@ 2023-01-06 14:55   ` Henry Wang
  2023-01-23 22:52   ` Stefano Stabellini
  1 sibling, 0 replies; 101+ messages in thread
From: Henry Wang @ 2023-01-06 14:55 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Julien Grall, Andrew Cooper, George Dunlap, Jan Beulich,
	Stefano Stabellini, Wei Liu, Bertrand Marquis, Volodymyr Babchuk

Hi Julien,

> -----Original Message-----
> Subject: [PATCH 22/22] xen/arm64: Allow the admin to enable/disable the
> directmap
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> Implement the same command line option as x86 to enable/disable the
> directmap. By default this is kept enabled.
> 
> Also modify setup_directmap_mappings() to populate the L0 entries
> related to the directmap area.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> ----
>     This patch is in an RFC state we need to decide what to do for arm32.
> 
>     Also, this is moving code that was introduced in this series. So
>     this will need to be fix in the next version (assuming Arm64 will
>     be ready).
> 
>     This was sent early as PoC to enable secret-free hypervisor
>     on Arm64.
> ---
> @@ -606,16 +613,27 @@ void __init setup_directmap_mappings(unsigned
> long base_mfn,
>      directmap_virt_end = XENHEAP_VIRT_START + nr_mfns * PAGE_SIZE;
>  }
>  #else /* CONFIG_ARM_64 */
> -/* Map the region in the directmap area. */
> +/*
> + * This either populate a valid fdirect map, or allocates empty L1 tables

I guess this is a typo: s/fdirect/direct/ ?

Reviewed-by: Henry Wang <Henry.Wang@arm.com>

Kind regards,
Henry


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 20/22] xen/arm64: mm: Use per-pCPU page-tables
  2023-01-06 14:54   ` Henry Wang
@ 2023-01-06 15:44     ` Julien Grall
  2023-01-07  2:22       ` Henry Wang
  0 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2023-01-06 15:44 UTC (permalink / raw)
  To: Henry Wang, xen-devel
  Cc: Julien Grall, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Henry,

On 06/01/2023 14:54, Henry Wang wrote:
>> -----Original Message-----
>> Subject: [PATCH 20/22] xen/arm64: mm: Use per-pCPU page-tables
>>
>> From: Julien Grall <jgrall@amazon.com>
>>
>> At the moment, on Arm64, every pCPU are sharing the same page-tables.
> 
> Nit: s/every pCPU are/ every pCPU is/

I will fix it.

> 
>>
>>   /*
>> diff --git a/xen/arch/arm/include/asm/domain_page.h
>> b/xen/arch/arm/include/asm/domain_page.h
>> new file mode 100644
>> index 000000000000..e9f52685e2ec
>> --- /dev/null
>> +++ b/xen/arch/arm/include/asm/domain_page.h
>> @@ -0,0 +1,13 @@
>> +#ifndef __ASM_ARM_DOMAIN_PAGE_H__
>> +#define __ASM_ARM_DOMAIN_PAGE_H__
>> +
>> +#ifdef CONFIG_ARCH_MAP_DOMAIN_PAGE
>> +bool init_domheap_mappings(unsigned int cpu);
> 
> I wonder if we can make this function "__init" as IIRC this function is only
> used at Xen boot time, but since the original init_domheap_mappings()
> is not "__init" anyway so this is not a strong argument.

While this is not yet supported on Xen on Arm, CPUs can be 
onlined/offlined at runtime. So you want to keep init_domheap_mappings() 
around.

We could consider providing a new attribute that would be a NOP if 
hotplug is supported, and would match __init otherwise. But I don't think 
this is related to this series (most of the functions used for bringup 
are not in __init).
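
Just to illustrate the idea (all names are invented; Xen has no such
attribute or Kconfig option today, so treat this purely as a sketch):

    /* Keep bring-up helpers resident when pCPUs can be onlined after
     * boot; allow them to be discarded otherwise. */
    #ifdef CONFIG_HOTPLUG_CPU
    #define __cpu_bringup            /* nothing: must stay around */
    #else
    #define __cpu_bringup __init     /* boot-time only */
    #endif

    bool __cpu_bringup init_domheap_mappings(unsigned int cpu);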

>> +static inline bool init_domheap_mappings(unsigned int cpu)
> 
> (and also here)
> 
> Either you agree with above "__init" comment or not:
> Reviewed-by: Henry Wang <Henry.Wang@arm.com>

Thanks!

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* RE: [PATCH 20/22] xen/arm64: mm: Use per-pCPU page-tables
  2023-01-06 15:44     ` Julien Grall
@ 2023-01-07  2:22       ` Henry Wang
  0 siblings, 0 replies; 101+ messages in thread
From: Henry Wang @ 2023-01-07  2:22 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Julien Grall, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk

Hi Julien,

> -----Original Message-----
> Subject: Re: [PATCH 20/22] xen/arm64: mm: Use per-pCPU page-tables
> 
> Hi Henry,
> 
> >> Subject: [PATCH 20/22] xen/arm64: mm: Use per-pCPU page-tables
> >>
> >> From: Julien Grall <jgrall@amazon.com>
> >>
> >> At the moment, on Arm64, every pCPU are sharing the same page-tables.
> >
> > Nit: s/every pCPU are/ every pCPU is/
> 
> I will fix it.

Thank you.

> 
> >> +bool init_domheap_mappings(unsigned int cpu);
> >
> > I wonder if we can make this function "__init" as IIRC this function is only
> > used at Xen boot time, but since the original init_domheap_mappings()
> > is not "__init" anyway so this is not a strong argument.
> 
> While this is not yet supported on Xen on Arm, CPUs can be
> onlined/offlined at runtime. So you want to keep init_domheap_mappings()
> around.

This is a very good point. I agree that pCPU online/offline is affected by
"__init", so leaving the function without "__init", as we are doing now,
is a good idea.

> 
> We could consider providing a new attribute that would be a NOP if
> hotplug is supported, and would match __init otherwise. But I don't think
> this is related to this series (most of the functions used for bringup
> are not in __init).

Agreed.

> 
> >> +static inline bool init_domheap_mappings(unsigned int cpu)
> >
> > (and also here)
> >
> > Either you agree with above "__init" comment or not:
> > Reviewed-by: Henry Wang <Henry.Wang@arm.com>
> 
> Thanks!

No problem. To avoid confusion, my reviewed-by tag still holds.

Kind regards,
Henry

> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 14/22] x86/domain_page: remove the fast paths when mfn is not in the directmap
  2022-12-16 11:48 ` [PATCH 14/22] x86/domain_page: remove the fast paths when mfn is not in the directmap Julien Grall
@ 2023-01-11 14:11   ` Jan Beulich
  2024-01-11 14:22     ` Elias El Yandouzi
  0 siblings, 1 reply; 101+ messages in thread
From: Jan Beulich @ 2023-01-11 14:11 UTC (permalink / raw)
  To: Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Julien Grall, xen-devel

On 16.12.2022 12:48, Julien Grall wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
> 
> When mfn is not in direct map, never use mfn_to_virt for any mappings.
> 
> We replace mfn_x(mfn) <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) with
> arch_mfns_in_direct_map(mfn, 1) because these two are equivalent. The
> extra comparison in arch_mfns_in_direct_map() looks different but because
> DIRECTMAP_VIRT_END is always higher, it does not make any difference.
> 
> Lastly, domain_page_map_to_mfn() needs to gain to a special case for
> the PMAP.
> 
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> Signed-off-by: Julien Grall <jgrall@amazon.com>

This looks plausible, but cannot really be fully judged upon before the
mapcache_current_vcpu() aspects pointed out earlier have been sorted.
As to using pmap - assuming you've done an audit and the number of
simultaneous mappings that can be in use can be proven to not exceed
the number of slots available, can you please say so in the description?
I have to admit though that I'm wary - this isn't a per-CPU number of
slots aiui, but a global one. But then you also have a BUG_ON() there
restricting the use to early boot. The reasoning for this is also
missing (and might address my concern).
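
For reference, the global pool and the early-boot restriction in the
common pmap code look roughly like this (paraphrased from memory, not
quoted verbatim, so details may differ slightly):

    static DECLARE_BITMAP(inuse, NUM_FIX_PMAP);

    void *pmap_map(mfn_t mfn)
    {
        unsigned int idx;

        /* Single, small, global pool: only valid during early boot,
         * before secondary CPUs (and per-CPU mapcaches) come up. */
        ASSERT(system_state < SYS_STATE_smp_boot);

        idx = find_first_zero_bit(inuse, NUM_FIX_PMAP);
        if ( idx == NUM_FIX_PMAP )
            panic("Out of PMAP slots\n");
        __set_bit(idx, inuse);

        arch_pmap_map(FIX_PMAP_BEGIN + idx, mfn);

        return fix_to_virt(FIX_PMAP_BEGIN + idx);
    }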

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 15/22] xen/page_alloc: add a path for xenheap when there is no direct map
  2022-12-16 11:48 ` [PATCH 15/22] xen/page_alloc: add a path for xenheap when there is no direct map Julien Grall
@ 2023-01-11 14:23   ` Jan Beulich
  0 siblings, 0 replies; 101+ messages in thread
From: Jan Beulich @ 2023-01-11 14:23 UTC (permalink / raw)
  To: Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, George Dunlap, Stefano Stabellini,
	Wei Liu, Julien Grall, xen-devel

On 16.12.2022 12:48, Julien Grall wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
> 
> When there is not an always-mapped direct map, xenheap allocations need
> to be mapped and unmapped on-demand.

Hmm, that's still putting mappings in the directmap, which I thought we
mean to be doing away with. If that's just a temporary step, then please
say so here.

>     I have left the call to map_pages_to_xen() and destroy_xen_mappings()
>     in the split heap for now. I am not entirely convinced this is necessary
>     because in that setup only the xenheap would be always mapped and
>     this doesn't contain any guest memory (aside the grant-table).
>     So map/unmapping for every allocation seems unnecessary.

But if you're unmapping, that heap won't be "always mapped" anymore. So
why would it need mapping initially?

>     Changes since Hongyan's version:
>         * Rebase
>         * Fix indentation in alloc_xenheap_pages()

Looks like you did in one of the two instances only, as ...

> @@ -2230,17 +2231,36 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
>      if ( unlikely(pg == NULL) )
>          return NULL;
>  
> +    ret = page_to_virt(pg);
> +
> +    if ( !arch_has_directmap() &&
> +         map_pages_to_xen((unsigned long)ret, page_to_mfn(pg), 1UL << order,
> +                          PAGE_HYPERVISOR) )
> +        {
> +            /* Failed to map xenheap pages. */
> +            free_heap_pages(pg, order, false);
> +            return NULL;
> +        }

... this looks wrong.

An important aspect here is that to be sure of no recursion,
map_pages_to_xen() and destroy_xen_mappings() may no longer use Xen
heap pages. May be worth saying explicitly in the description (I can't
think of a good place in code where such a comment could be put _and_
be likely to be noticed at the right point in time).

>  void free_xenheap_pages(void *v, unsigned int order)
>  {
> +    unsigned long va = (unsigned long)v & PAGE_MASK;
> +
>      ASSERT_ALLOC_CONTEXT();
>  
>      if ( v == NULL )
>          return;
>  
> +    if ( !arch_has_directmap() &&
> +         destroy_xen_mappings(va, va + (1UL << (order + PAGE_SHIFT))) )
> +        dprintk(XENLOG_WARNING,
> +                "Error while destroying xenheap mappings at %p, order %u\n",
> +                v, order);

Doesn't failure here mean (intended) security henceforth isn't guaranteed
anymore? If so, a mere dprintk() can't really be sufficient to "handle"
the error.
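
I.e. if the unmap cannot be made infallible, something stronger than a
log message seems warranted - as a sketch only (whether panicking is the
right policy is exactly the open question):

    if ( !arch_has_directmap() &&
         destroy_xen_mappings(va, va + (1UL << (order + PAGE_SHIFT))) )
        panic("Failed to destroy xenheap mappings at %p, order %u\n",
              v, order);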

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 16/22] x86/setup: leave early boot slightly earlier
  2022-12-16 11:48 ` [PATCH 16/22] x86/setup: leave early boot slightly earlier Julien Grall
@ 2023-01-11 14:34   ` Jan Beulich
  0 siblings, 0 replies; 101+ messages in thread
From: Jan Beulich @ 2023-01-11 14:34 UTC (permalink / raw)
  To: Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Julien Grall, xen-devel

On 16.12.2022 12:48, Julien Grall wrote:
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1648,6 +1648,22 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  
>      numa_initmem_init(0, raw_max_page);
>  
> +    /*
> +     * When we do not have a direct map, memory for metadata of heap nodes in
> +     * init_node_heap() is allocated from xenheap, which needs to be mapped and
> +     * unmapped on demand. However, we cannot just take memory from the boot
> +     * allocator to create the PTEs while we are passing memory to the heap
> +     * allocator during end_boot_allocator().
> +     *
> +     * To solve this race, we need to leave early boot before
> +     * end_boot_allocator() so that Xen PTE pages are allocated from the heap
> +     * instead of the boot allocator. We can do this because the metadata for
> +     * the 1st node is statically allocated, and by the time we need memory to
> +     * create mappings for the 2nd node, we already have enough memory in the
> +     * heap allocator in the 1st node.
> +     */

Is this "enough" guaranteed, or merely a hope (and true in the common case,
but maybe not when the 1st node ends up having very little memory)?

> +    system_state = SYS_STATE_boot;
> +
>      if ( max_page - 1 > virt_to_mfn(HYPERVISOR_VIRT_END - 1) )
>      {
>          unsigned long limit = virt_to_mfn(HYPERVISOR_VIRT_END - 1);
> @@ -1677,8 +1693,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>      else
>          end_boot_allocator();
>  
> -    system_state = SYS_STATE_boot;

I'm afraid I don't view this as viable - there are assumptions not just in
the page table allocation functions that SYS_STATE_boot (or higher) means
that end_boot_allocator() has run (e.g. acpi_os_map_memory()). You also do
this for x86 only. I think system_state wants leaving alone here, and an
arch specific approach wants creating for the page table allocation you
talk of.
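
As a sketch of the kind of arch-specific switch I have in mind (all
names here are invented):

    /* x86-local flag, flipped just before end_boot_allocator(), instead
     * of moving the global system_state transition earlier. */
    static bool __initdata pgt_from_heap;

    static mfn_t alloc_pgt_page(void)
    {
        struct page_info *pg;

        if ( !pgt_from_heap )
            return alloc_boot_pages(1, 1);

        pg = alloc_domheap_page(NULL, 0);
        BUG_ON(!pg);

        return page_to_mfn(pg);
    }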

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 17/22] x86/setup: vmap heap nodes when they are outside the direct map
  2022-12-16 11:48 ` [PATCH 17/22] x86/setup: vmap heap nodes when they are outside the direct map Julien Grall
@ 2023-01-11 14:39   ` Jan Beulich
  2023-01-23 22:03   ` Stefano Stabellini
  1 sibling, 0 replies; 101+ messages in thread
From: Jan Beulich @ 2023-01-11 14:39 UTC (permalink / raw)
  To: Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, George Dunlap, Stefano Stabellini,
	Wei Liu, Julien Grall, xen-devel

On 16.12.2022 12:48, Julien Grall wrote:
> @@ -597,22 +598,43 @@ static unsigned long init_node_heap(int node, unsigned long mfn,
>          needed = 0;
>      }
>      else if ( *use_tail && nr >= needed &&
> -              arch_mfns_in_directmap(mfn + nr - needed, needed) &&
>                (!xenheap_bits ||
>                 !((mfn + nr - 1) >> (xenheap_bits - PAGE_SHIFT))) )
>      {
> -        _heap[node] = mfn_to_virt(mfn + nr - needed);
> -        avail[node] = mfn_to_virt(mfn + nr - 1) +
> -                      PAGE_SIZE - sizeof(**avail) * NR_ZONES;
> -    }
> -    else if ( nr >= needed &&

By replacing these two well-formed lines with ...

> -              arch_mfns_in_directmap(mfn, needed) &&
> +        if ( arch_mfns_in_directmap(mfn + nr - needed, needed) )
> +        {
> +            _heap[node] = mfn_to_virt(mfn + nr - needed);
> +            avail[node] = mfn_to_virt(mfn + nr - 1) +
> +                          PAGE_SIZE - sizeof(**avail) * NR_ZONES;
> +        }
> +        else
> +        {
> +            mfn_t needed_start = _mfn(mfn + nr - needed);
> +
> +            _heap[node] = vmap_contig_pages(needed_start, needed);
> +            BUG_ON(!_heap[node]);
> +            avail[node] = (void *)(_heap[node]) + (needed << PAGE_SHIFT) -
> +                          sizeof(**avail) * NR_ZONES;
> +        }
> +    } else if ( nr >= needed &&

... this, you're not only violating style here, but you also ...

>                (!xenheap_bits ||
>                 !((mfn + needed - 1) >> (xenheap_bits - PAGE_SHIFT))) )

... break indentation for these two lines.

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 18/22] x86/setup: do not create valid mappings when directmap=no
  2022-12-16 11:48 ` [PATCH 18/22] x86/setup: do not create valid mappings when directmap=no Julien Grall
@ 2023-01-11 14:47   ` Jan Beulich
  0 siblings, 0 replies; 101+ messages in thread
From: Jan Beulich @ 2023-01-11 14:47 UTC (permalink / raw)
  To: Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Julien Grall, xen-devel

On 16.12.2022 12:48, Julien Grall wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
> 
> Create empty mappings in the second e820 pass. Also, destroy existing
> direct map mappings created in the first pass.

Could you remind me (perhaps say a word here) why these need creating then
in the first place?

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 05/22] x86/srat: vmap the pages for acpi_slit
  2023-01-04 10:23       ` Jan Beulich
@ 2023-01-12 23:15         ` Julien Grall
  2023-01-13  9:16           ` Jan Beulich
  0 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2023-01-12 23:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Julien Grall

Hi,

On 04/01/2023 10:23, Jan Beulich wrote:
> On 23.12.2022 12:31, Julien Grall wrote:
>> On 20/12/2022 15:30, Jan Beulich wrote:
>>> On 16.12.2022 12:48, Julien Grall wrote:
>>>> From: Hongyan Xia <hongyxia@amazon.com>
>>>>
>>>> This avoids the assumption that boot pages are in the direct map.
>>>>
>>>> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> However, ...
>>>
>>>> --- a/xen/arch/x86/srat.c
>>>> +++ b/xen/arch/x86/srat.c
>>>> @@ -139,7 +139,8 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
>>>>    		return;
>>>>    	}
>>>>    	mfn = alloc_boot_pages(PFN_UP(slit->header.length), 1);
>>>> -	acpi_slit = mfn_to_virt(mfn_x(mfn));
>>>> +	acpi_slit = vmap_contig_pages(mfn, PFN_UP(slit->header.length));
>>>
>>> ... with the increased use of vmap space the VA range used will need
>>> growing. And that's perhaps better done ahead of time than late.
>>
>> I will have a look to increase the vmap().
>>
>>>
>>>> +	BUG_ON(!acpi_slit);
>>>
>>> Similarly relevant for the earlier patch: It would be nice if boot
>>> failure for optional things like NUMA data could be avoided.
>>
>> If you can't map (or allocate the memory), then you are probably in a
>> very bad situation because both should really not fail at boot.
>>
>> So I think this is correct to crash early because the admin will be able
>> to look what went wrong. Otherwise, it may be missed in the noise.
> 
> Well, I certainly can see one taking this view. However, at least in
> principle allocation (or mapping) may fail _because_ of NUMA issues.

Right. I read this as the user will likely want to add "numa=off" on the 
command line.

> At which point it would be better to boot with NUMA support turned off
I have to disagree with "better" here. This may work for a user with a 
handful of hosts. But for a large-scale setup, you will really want an 
early failure rather than having a host booting with an expected feature 
disabled (the NUMA issue may be broken HW).

It is better to fail and then ask the user to specify "numa=off". At
least the person made a conscious decision to turn off the feature.

I am curious to hear the opinion from the others.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 06/22] x86: map/unmap pages in restore_all_guests
  2023-01-04 10:27       ` Jan Beulich
@ 2023-01-12 23:20         ` Julien Grall
  2023-01-13  9:22           ` Jan Beulich
  0 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2023-01-12 23:20 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Julien Grall, xen-devel

Hi Jan,

On 04/01/2023 10:27, Jan Beulich wrote:
> On 23.12.2022 13:22, Julien Grall wrote:
>> Hi,
>>
>> On 22/12/2022 11:12, Jan Beulich wrote:
>>> On 16.12.2022 12:48, Julien Grall wrote:
>>>> --- a/xen/arch/x86/x86_64/entry.S
>>>> +++ b/xen/arch/x86/x86_64/entry.S
>>>> @@ -165,7 +165,24 @@ restore_all_guest:
>>>>            and   %rsi, %rdi
>>>>            and   %r9, %rsi
>>>>            add   %rcx, %rdi
>>>> -        add   %rcx, %rsi
>>>> +
>>>> +         /*
>>>> +          * Without a direct map, we have to map first before copying. We only
>>>> +          * need to map the guest root table but not the per-CPU root_pgt,
>>>> +          * because the latter is still a xenheap page.
>>>> +          */
>>>> +        pushq %r9
>>>> +        pushq %rdx
>>>> +        pushq %rax
>>>> +        pushq %rdi
>>>> +        mov   %rsi, %rdi
>>>> +        shr   $PAGE_SHIFT, %rdi
>>>> +        callq map_domain_page
>>>> +        mov   %rax, %rsi
>>>> +        popq  %rdi
>>>> +        /* Stash the pointer for unmapping later. */
>>>> +        pushq %rax
>>>> +
>>>>            mov   $ROOT_PAGETABLE_FIRST_XEN_SLOT, %ecx
>>>>            mov   root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rsi), %r8
>>>>            mov   %r8, root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rdi)
>>>> @@ -177,6 +194,14 @@ restore_all_guest:
>>>>            sub   $(ROOT_PAGETABLE_FIRST_XEN_SLOT - \
>>>>                    ROOT_PAGETABLE_LAST_XEN_SLOT - 1) * 8, %rdi
>>>>            rep movsq
>>>> +
>>>> +        /* Unmap the page. */
>>>> +        popq  %rdi
>>>> +        callq unmap_domain_page
>>>> +        popq  %rax
>>>> +        popq  %rdx
>>>> +        popq  %r9
>>>
>>> While the PUSH/POP are part of what I dislike here, I think this wants
>>> doing differently: Establish a mapping when putting in place a new guest
>>> page table, and use the pointer here. This could be a new per-domain
>>> mapping, to limit its visibility.
>>
>> I have looked at a per-domain approach and this looks way more complex
>> than the few concise lines here (not mentioning the extra amount of
>> memory).
> 
> Yes, I do understand that would be a more intrusive change.

I could be persuaded to look at a more intrusive change if there is a 
good reason to do it. To me, at the moment, it mostly seems a matter of 
taste.

So what would we gain from a perdomain mapping?

> 
>> So I am not convinced this is worth the effort here.
>>
>> I don't have an other approach in mind. So are you disliking this
>> approach to the point this will be nacked?
> 
> I guess I wouldn't nack it, but I also wouldn't provide an ack.
> I'm curious
> what Andrew or Roger think here...

Unfortunately Roger is on parental leave for the next couple of months. 
It would be good to make some progress beforehand. Andrew, what do you 
think?

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 05/22] x86/srat: vmap the pages for acpi_slit
  2023-01-12 23:15         ` Julien Grall
@ 2023-01-13  9:16           ` Jan Beulich
  2023-01-13  9:17             ` Julien Grall
  0 siblings, 1 reply; 101+ messages in thread
From: Jan Beulich @ 2023-01-13  9:16 UTC (permalink / raw)
  To: Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Julien Grall

On 13.01.2023 00:15, Julien Grall wrote:
> Hi,
> 
> On 04/01/2023 10:23, Jan Beulich wrote:
>> On 23.12.2022 12:31, Julien Grall wrote:
>>> On 20/12/2022 15:30, Jan Beulich wrote:
>>>> On 16.12.2022 12:48, Julien Grall wrote:
>>>>> From: Hongyan Xia <hongyxia@amazon.com>
>>>>>
>>>>> This avoids the assumption that boot pages are in the direct map.
>>>>>
>>>>> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>>
>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>>
>>>> However, ...
>>>>
>>>>> --- a/xen/arch/x86/srat.c
>>>>> +++ b/xen/arch/x86/srat.c
>>>>> @@ -139,7 +139,8 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
>>>>>    		return;
>>>>>    	}
>>>>>    	mfn = alloc_boot_pages(PFN_UP(slit->header.length), 1);
>>>>> -	acpi_slit = mfn_to_virt(mfn_x(mfn));
>>>>> +	acpi_slit = vmap_contig_pages(mfn, PFN_UP(slit->header.length));
>>>>
>>>> ... with the increased use of vmap space the VA range used will need
>>>> growing. And that's perhaps better done ahead of time than late.
>>>
>>> I will have a look to increase the vmap().
>>>
>>>>
>>>>> +	BUG_ON(!acpi_slit);
>>>>
>>>> Similarly relevant for the earlier patch: It would be nice if boot
>>>> failure for optional things like NUMA data could be avoided.
>>>
>>> If you can't map (or allocate the memory), then you are probably in a
>>> very bad situation because both should really not fail at boot.
>>>
>>> So I think this is correct to crash early because the admin will be able
>>> to look what went wrong. Otherwise, it may be missed in the noise.
>>
>> Well, I certainly can see one taking this view. However, at least in
>> principle allocation (or mapping) may fail _because_ of NUMA issues.
> 
> Right. I read this as the user will likely want to add "numa=off" on the 
> command line.
> 
>> At which point it would be better to boot with NUMA support turned off
> I have to disagree with "better" here. This may work for a user with a 
> handful of hosts. But for a large-scale setup, you will really want an 
> early failure rather than having a host booting with an expected feature 
> disabled (the NUMA issue may be broken HW).
> 
> It is better to fail and then ask the user to specify "numa=off". At
> least the person made a conscious decision to turn off the feature.

Yet how would the observing admin make the connection between the BUG_ON()
that triggered and the need to add "numa=off" to the command line,
without knowing Xen internals?

> I am curious to hear the opinion from the others.

So am I.

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 05/22] x86/srat: vmap the pages for acpi_slit
  2023-01-13  9:16           ` Jan Beulich
@ 2023-01-13  9:17             ` Julien Grall
  0 siblings, 0 replies; 101+ messages in thread
From: Julien Grall @ 2023-01-13  9:17 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Julien Grall



On 13/01/2023 09:16, Jan Beulich wrote:
> On 13.01.2023 00:15, Julien Grall wrote:
>> Hi,
>>
>> On 04/01/2023 10:23, Jan Beulich wrote:
>>> On 23.12.2022 12:31, Julien Grall wrote:
>>>> On 20/12/2022 15:30, Jan Beulich wrote:
>>>>> On 16.12.2022 12:48, Julien Grall wrote:
>>>>>> From: Hongyan Xia <hongyxia@amazon.com>
>>>>>>
>>>>>> This avoids the assumption that boot pages are in the direct map.
>>>>>>
>>>>>> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>>>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>>>
>>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>>>
>>>>> However, ...
>>>>>
>>>>>> --- a/xen/arch/x86/srat.c
>>>>>> +++ b/xen/arch/x86/srat.c
>>>>>> @@ -139,7 +139,8 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
>>>>>>     		return;
>>>>>>     	}
>>>>>>     	mfn = alloc_boot_pages(PFN_UP(slit->header.length), 1);
>>>>>> -	acpi_slit = mfn_to_virt(mfn_x(mfn));
>>>>>> +	acpi_slit = vmap_contig_pages(mfn, PFN_UP(slit->header.length));
>>>>>
>>>>> ... with the increased use of vmap space the VA range used will need
>>>>> growing. And that's perhaps better done ahead of time than late.
>>>>
>>>> I will have a look to increase the vmap().
>>>>
>>>>>
>>>>>> +	BUG_ON(!acpi_slit);
>>>>>
>>>>> Similarly relevant for the earlier patch: It would be nice if boot
>>>>> failure for optional things like NUMA data could be avoided.
>>>>
>>>> If you can't map (or allocate the memory), then you are probably in a
>>>> very bad situation because both should really not fail at boot.
>>>>
>>>> So I think this is correct to crash early because the admin will be able
>>>> to look what went wrong. Otherwise, it may be missed in the noise.
>>>
>>> Well, I certainly can see one taking this view. However, at least in
>>> principle allocation (or mapping) may fail _because_ of NUMA issues.
>>
>> Right. I read this as the user will likely want to add "numa=off" on the
>> command line.
>>
>>> At which point it would be better to boot with NUMA support turned off
>> I have to disagree with "better" here. This may work for a user with a
>> handful of hosts. But for large scale setup, you will really want a
>> failure early rather than having a host booting with an expected feature
>> disabled (the NUMA issues may be a broken HW).
>>
>> It is better to fail and then ask the user to specify "numa=off". At
>> least the person made a conscientious decision to turn off the feature.
> 
> Yet how would the observing admin make the connection from the BUG_ON()
> that triggered and the need to add "numa=off" to the command line,
> without knowing Xen internals?

I am happy to switch to a panic() that suggests turning off NUMA.
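
E.g. something along these lines (sketch only):

    acpi_slit = vmap_contig_pages(mfn, PFN_UP(slit->header.length));
    if ( !acpi_slit )
        panic("Unable to map the ACPI SLIT. Try booting with numa=off\n");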

> 
>> I am curious to hear the opinion from the others.
> 
> So am I.
> 
> Jan

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 06/22] x86: map/unmap pages in restore_all_guests
  2023-01-12 23:20         ` Julien Grall
@ 2023-01-13  9:22           ` Jan Beulich
  2023-06-22 10:44             ` Julien Grall
  0 siblings, 1 reply; 101+ messages in thread
From: Jan Beulich @ 2023-01-13  9:22 UTC (permalink / raw)
  To: Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Julien Grall, xen-devel

On 13.01.2023 00:20, Julien Grall wrote:
> On 04/01/2023 10:27, Jan Beulich wrote:
>> On 23.12.2022 13:22, Julien Grall wrote:
>>> On 22/12/2022 11:12, Jan Beulich wrote:
>>>> On 16.12.2022 12:48, Julien Grall wrote:
>>>>> --- a/xen/arch/x86/x86_64/entry.S
>>>>> +++ b/xen/arch/x86/x86_64/entry.S
>>>>> @@ -165,7 +165,24 @@ restore_all_guest:
>>>>>            and   %rsi, %rdi
>>>>>            and   %r9, %rsi
>>>>>            add   %rcx, %rdi
>>>>> -        add   %rcx, %rsi
>>>>> +
>>>>> +         /*
>>>>> +          * Without a direct map, we have to map first before copying. We only
>>>>> +          * need to map the guest root table but not the per-CPU root_pgt,
>>>>> +          * because the latter is still a xenheap page.
>>>>> +          */
>>>>> +        pushq %r9
>>>>> +        pushq %rdx
>>>>> +        pushq %rax
>>>>> +        pushq %rdi
>>>>> +        mov   %rsi, %rdi
>>>>> +        shr   $PAGE_SHIFT, %rdi
>>>>> +        callq map_domain_page
>>>>> +        mov   %rax, %rsi
>>>>> +        popq  %rdi
>>>>> +        /* Stash the pointer for unmapping later. */
>>>>> +        pushq %rax
>>>>> +
>>>>>            mov   $ROOT_PAGETABLE_FIRST_XEN_SLOT, %ecx
>>>>>            mov   root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rsi), %r8
>>>>>            mov   %r8, root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rdi)
>>>>> @@ -177,6 +194,14 @@ restore_all_guest:
>>>>>            sub   $(ROOT_PAGETABLE_FIRST_XEN_SLOT - \
>>>>>                    ROOT_PAGETABLE_LAST_XEN_SLOT - 1) * 8, %rdi
>>>>>            rep movsq
>>>>> +
>>>>> +        /* Unmap the page. */
>>>>> +        popq  %rdi
>>>>> +        callq unmap_domain_page
>>>>> +        popq  %rax
>>>>> +        popq  %rdx
>>>>> +        popq  %r9
>>>>
>>>> While the PUSH/POP are part of what I dislike here, I think this wants
>>>> doing differently: Establish a mapping when putting in place a new guest
>>>> page table, and use the pointer here. This could be a new per-domain
>>>> mapping, to limit its visibility.
>>>
>>> I have looked at a per-domain approach and this looks way more complex
>>> than the few concise lines here (not mentioning the extra amount of
>>> memory).
>>
>> Yes, I do understand that would be a more intrusive change.
> 
> I could be persuaded to look at a more intrusive change if there is a 
> good reason to do it. To me, at the moment, it mostly seems a matter of 
> taste.
> 
> So what would we gain from a perdomain mapping?

Rather than mapping/unmapping once per hypervisor entry/exit, we'd
map just once per context switch. Plus we'd save ugly/fragile assembly
code (apart from the push/pop I also dislike C functions being called
from assembly which aren't really meant to be called this way: While
these two may indeed be unlikely to ever change, any such change comes
with the risk of the assembly callers being missed - the compiler
won't tell you that e.g. argument types/count don't match parameters
anymore).

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 01/22] xen/common: page_alloc: Re-order includes
  2022-12-16 11:48 ` [PATCH 01/22] xen/common: page_alloc: Re-order includes Julien Grall
  2022-12-16 12:03   ` Jan Beulich
@ 2023-01-23 21:29   ` Stefano Stabellini
  2023-01-23 21:57     ` Julien Grall
  1 sibling, 1 reply; 101+ messages in thread
From: Stefano Stabellini @ 2023-01-23 21:29 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Julien Grall, Andrew Cooper, George Dunlap,
	Jan Beulich, Stefano Stabellini, Wei Liu

On Fri, 16 Dec 2022, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> Order the includes with the xen headers first, then asm headers and
> last public headers. Within each category, they are sorted alphabetically.
> 
> Note that the includes in protected by CONFIG_X86 hasn't been sorted
> to avoid adding multiple #ifdef.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

This patch doesn't apply as is any longer. Assuming it gets ported to
the latest staging appropriately:

Acked-by: Stefano Stabellini <sstabellini@kernel.org>


> ----
> 
>     I am open to add sort the includes protected by CONFIG_X86
>     and add multiple #ifdef if this is preferred.
> ---
>  xen/common/page_alloc.c | 29 ++++++++++++++++-------------
>  1 file changed, 16 insertions(+), 13 deletions(-)
> 
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 0c93a1078702..0a950288e241 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -120,27 +120,30 @@
>   *   regions within it.
>   */
>  
> +#include <xen/domain_page.h>
> +#include <xen/event.h>
>  #include <xen/init.h>
> -#include <xen/types.h>
> +#include <xen/irq.h>
> +#include <xen/keyhandler.h>
>  #include <xen/lib.h>
> -#include <xen/sched.h>
> -#include <xen/spinlock.h>
>  #include <xen/mm.h>
> +#include <xen/nodemask.h>
> +#include <xen/numa.h>
>  #include <xen/param.h>
> -#include <xen/irq.h>
> -#include <xen/softirq.h>
> -#include <xen/domain_page.h>
> -#include <xen/keyhandler.h>
>  #include <xen/perfc.h>
>  #include <xen/pfn.h>
> -#include <xen/numa.h>
> -#include <xen/nodemask.h>
> -#include <xen/event.h>
> +#include <xen/types.h>
> +#include <xen/sched.h>
> +#include <xen/softirq.h>
> +#include <xen/spinlock.h>
> +
> +#include <asm/flushtlb.h>
> +#include <asm/numa.h>
> +#include <asm/page.h>
> +
>  #include <public/sysctl.h>
>  #include <public/sched.h>
> -#include <asm/page.h>
> -#include <asm/numa.h>
> -#include <asm/flushtlb.h>
> +
>  #ifdef CONFIG_X86
>  #include <asm/guest.h>
>  #include <asm/p2m.h>
> -- 
> 2.38.1
> 


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 02/22] x86/setup: move vm_init() before acpi calls
  2022-12-16 11:48 ` [PATCH 02/22] x86/setup: move vm_init() before acpi calls Julien Grall
  2022-12-20 15:08   ` Jan Beulich
@ 2023-01-23 21:34   ` Stefano Stabellini
  1 sibling, 0 replies; 101+ messages in thread
From: Stefano Stabellini @ 2023-01-23 21:34 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Wei Liu, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Wei Liu, Roger Pau Monné,
	David Woodhouse, Hongyan Xia, Julien Grall

On Fri, 16 Dec 2022, Julien Grall wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> After the direct map removal, pages from the boot allocator are not
> mapped at all in the direct map. Although we have map_domain_page, they
> are ephemeral and are less helpful for mappings that are more than a
> page, so we want a mechanism to globally map a range of pages, which is
> what vmap is for. Therefore, we bring vm_init into early boot stage.
> 
> To allow vmap to be initialised and used in early boot, we need to
> modify vmap to receive pages from the boot allocator during early boot
> stage.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: David Woodhouse <dwmw2@amazon.com>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> Signed-off-by: Julien Grall <jgrall@amazon.com>

For the arm and common parts:

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


> ---
>  xen/arch/arm/setup.c |  4 ++--
>  xen/arch/x86/setup.c | 31 ++++++++++++++++++++-----------
>  xen/common/vmap.c    | 37 +++++++++++++++++++++++++++++--------
>  3 files changed, 51 insertions(+), 21 deletions(-)
> 
> diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> index 1f26f67b90e3..2311726f5ddd 100644
> --- a/xen/arch/arm/setup.c
> +++ b/xen/arch/arm/setup.c
> @@ -1028,6 +1028,8 @@ void __init start_xen(unsigned long boot_phys_offset,
>  
>      setup_mm();
>  
> +    vm_init();
> +
>      /* Parse the ACPI tables for possible boot-time configuration */
>      acpi_boot_table_init();
>  
> @@ -1039,8 +1041,6 @@ void __init start_xen(unsigned long boot_phys_offset,
>       */
>      system_state = SYS_STATE_boot;
>  
> -    vm_init();
> -
>      if ( acpi_disabled )
>      {
>          printk("Booting using Device Tree\n");
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index 6bb5bc7c84be..1c2e09711eb0 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -870,6 +870,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>      unsigned long eb_start, eb_end;
>      bool acpi_boot_table_init_done = false, relocated = false;
>      int ret;
> +    bool vm_init_done = false;
>      struct ns16550_defaults ns16550 = {
>          .data_bits = 8,
>          .parity    = 'n',
> @@ -1442,12 +1443,23 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>              continue;
>  
>          if ( !acpi_boot_table_init_done &&
> -             s >= (1ULL << 32) &&
> -             !acpi_boot_table_init() )
> +             s >= (1ULL << 32) )
>          {
> -            acpi_boot_table_init_done = true;
> -            srat_parse_regions(s);
> -            setup_max_pdx(raw_max_page);
> +            /*
> +             * We only initialise vmap and acpi after going through the bottom
> +             * 4GiB, so that we have enough pages in the boot allocator.
> +             */
> +            if ( !vm_init_done )
> +            {
> +                vm_init();
> +                vm_init_done = true;
> +            }
> +            if ( !acpi_boot_table_init() )
> +            {
> +                acpi_boot_table_init_done = true;
> +                srat_parse_regions(s);
> +                setup_max_pdx(raw_max_page);
> +            }
>          }
>  
>          if ( pfn_to_pdx((e - 1) >> PAGE_SHIFT) >= max_pdx )
> @@ -1624,6 +1636,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  
>      init_frametable();
>  
> +    if ( !vm_init_done )
> +        vm_init();
> +
>      if ( !acpi_boot_table_init_done )
>          acpi_boot_table_init();
>  
> @@ -1661,12 +1676,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>          end_boot_allocator();
>  
>      system_state = SYS_STATE_boot;
> -    /*
> -     * No calls involving ACPI code should go between the setting of
> -     * SYS_STATE_boot and vm_init() (or else acpi_os_{,un}map_memory()
> -     * will break).
> -     */
> -    vm_init();
>  
>      bsp_stack = cpu_alloc_stack(0);
>      if ( !bsp_stack )
> diff --git a/xen/common/vmap.c b/xen/common/vmap.c
> index 4fd6b3067ec1..1340c7c6faf6 100644
> --- a/xen/common/vmap.c
> +++ b/xen/common/vmap.c
> @@ -34,9 +34,20 @@ void __init vm_init_type(enum vmap_region type, void *start, void *end)
>  
>      for ( i = 0, va = (unsigned long)vm_bitmap(type); i < nr; ++i, va += PAGE_SIZE )
>      {
> -        struct page_info *pg = alloc_domheap_page(NULL, 0);
> +        mfn_t mfn;
> +        int rc;
>  
> -        map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR);
> +        if ( system_state == SYS_STATE_early_boot )
> +            mfn = alloc_boot_pages(1, 1);
> +        else
> +        {
> +            struct page_info *pg = alloc_domheap_page(NULL, 0);
> +
> +            BUG_ON(!pg);
> +            mfn = page_to_mfn(pg);
> +        }
> +        rc = map_pages_to_xen(va, mfn, 1, PAGE_HYPERVISOR);
> +        BUG_ON(rc);
>          clear_page((void *)va);
>      }
>      bitmap_fill(vm_bitmap(type), vm_low[type]);
> @@ -62,7 +73,7 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
>      spin_lock(&vm_lock);
>      for ( ; ; )
>      {
> -        struct page_info *pg;
> +        mfn_t mfn;
>  
>          ASSERT(vm_low[t] == vm_top[t] || !test_bit(vm_low[t], vm_bitmap(t)));
>          for ( start = vm_low[t]; start < vm_top[t]; )
> @@ -97,9 +108,16 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
>          if ( vm_top[t] >= vm_end[t] )
>              return NULL;
>  
> -        pg = alloc_domheap_page(NULL, 0);
> -        if ( !pg )
> -            return NULL;
> +        if ( system_state == SYS_STATE_early_boot )
> +            mfn = alloc_boot_pages(1, 1);
> +        else
> +        {
> +            struct page_info *pg = alloc_domheap_page(NULL, 0);
> +
> +            if ( !pg )
> +                return NULL;
> +            mfn = page_to_mfn(pg);
> +        }
>  
>          spin_lock(&vm_lock);
>  
> @@ -107,7 +125,7 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
>          {
>              unsigned long va = (unsigned long)vm_bitmap(t) + vm_top[t] / 8;
>  
> -            if ( !map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR) )
> +            if ( !map_pages_to_xen(va, mfn, 1, PAGE_HYPERVISOR) )
>              {
>                  clear_page((void *)va);
>                  vm_top[t] += PAGE_SIZE * 8;
> @@ -117,7 +135,10 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
>              }
>          }
>  
> -        free_domheap_page(pg);
> +        if ( system_state == SYS_STATE_early_boot )
> +            init_boot_pages(mfn_to_maddr(mfn), mfn_to_maddr(mfn) + PAGE_SIZE);
> +        else
> +            free_domheap_page(mfn_to_page(mfn));
>  
>          if ( start >= vm_top[t] )
>          {
> -- 
> 2.38.1
> 


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 03/22] acpi: vmap pages in acpi_os_alloc_memory
  2022-12-16 11:48 ` [PATCH 03/22] acpi: vmap pages in acpi_os_alloc_memory Julien Grall
  2022-12-16 12:07   ` Julien Grall
  2022-12-20 15:15   ` Jan Beulich
@ 2023-01-23 21:39   ` Stefano Stabellini
  2 siblings, 0 replies; 101+ messages in thread
From: Stefano Stabellini @ 2023-01-23 21:39 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Hongyan Xia, Andrew Cooper, George Dunlap,
	Jan Beulich, Stefano Stabellini, Wei Liu, Julien Grall

On Fri, 16 Dec 2022, Julien Grall wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
> 
> Also, introduce a wrapper around vmap that maps a contiguous range for
> boot allocations. Unfortunately, the new helper cannot be a static inline
> because the dependences are a mess. We would need to re-include
> asm/page.h (was removed in aa4b9d1ee653 "include: don't use asm/page.h
> from common headers") and it doesn't look to be enough anymore
> because bits from asm/cpufeature.h is used in the definition of PAGE_NX.
> 
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> Signed-off-by: Julien Grall <jgrall@amazon.com>

I saw Jan's comments and I agree with them but I also wanted to track
that I reviewed this patch and looks OK:

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


> ----
> 
>     Changes since Hongyan's version:
>         * Rename vmap_boot_pages() to vmap_contig_pages()
>         * Move the new helper in vmap.c to avoid compilation issue
>         * Don't use __pa() to translate the virtual address
> ---
>  xen/common/vmap.c      |  5 +++++
>  xen/drivers/acpi/osl.c | 13 +++++++++++--
>  xen/include/xen/vmap.h |  2 ++
>  3 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/common/vmap.c b/xen/common/vmap.c
> index 1340c7c6faf6..78f051a67682 100644
> --- a/xen/common/vmap.c
> +++ b/xen/common/vmap.c
> @@ -244,6 +244,11 @@ void *vmap(const mfn_t *mfn, unsigned int nr)
>      return __vmap(mfn, 1, nr, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
>  }
>  
> +void *vmap_contig_pages(mfn_t mfn, unsigned int nr_pages)
> +{
> +    return __vmap(&mfn, nr_pages, 1, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
> +}
> +
>  void vunmap(const void *va)
>  {
>      unsigned long addr = (unsigned long)va;
> diff --git a/xen/drivers/acpi/osl.c b/xen/drivers/acpi/osl.c
> index 389505f78666..44a9719b0dcf 100644
> --- a/xen/drivers/acpi/osl.c
> +++ b/xen/drivers/acpi/osl.c
> @@ -221,7 +221,11 @@ void *__init acpi_os_alloc_memory(size_t sz)
>  	void *ptr;
>  
>  	if (system_state == SYS_STATE_early_boot)
> -		return mfn_to_virt(mfn_x(alloc_boot_pages(PFN_UP(sz), 1)));
> +	{
> +		mfn_t mfn = alloc_boot_pages(PFN_UP(sz), 1);
> +
> +		return vmap_contig_pages(mfn, PFN_UP(sz));
> +	}
>  
>  	ptr = xmalloc_bytes(sz);
>  	ASSERT(!ptr || is_xmalloc_memory(ptr));
> @@ -246,5 +250,10 @@ void __init acpi_os_free_memory(void *ptr)
>  	if (is_xmalloc_memory(ptr))
>  		xfree(ptr);
>  	else if (ptr && system_state == SYS_STATE_early_boot)
> -		init_boot_pages(__pa(ptr), __pa(ptr) + PAGE_SIZE);
> +	{
> +		paddr_t addr = mfn_to_maddr(vmap_to_mfn(ptr));
> +
> +		vunmap(ptr);
> +		init_boot_pages(addr, addr + PAGE_SIZE);
> +	}
>  }
> diff --git a/xen/include/xen/vmap.h b/xen/include/xen/vmap.h
> index b0f7632e8985..3c06c7c3ba30 100644
> --- a/xen/include/xen/vmap.h
> +++ b/xen/include/xen/vmap.h
> @@ -23,6 +23,8 @@ void *vmalloc_xen(size_t size);
>  void *vzalloc(size_t size);
>  void vfree(void *va);
>  
> +void *vmap_contig_pages(mfn_t mfn, unsigned int nr_pages);
> +
>  void __iomem *ioremap(paddr_t, size_t);
>  
>  static inline void iounmap(void __iomem *va)
> -- 
> 2.38.1
> 


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 11/22] x86: add a boot option to enable and disable the direct map
  2022-12-16 11:48 ` [PATCH 11/22] x86: add a boot option to enable and disable the direct map Julien Grall
  2022-12-22 13:24   ` Jan Beulich
@ 2023-01-23 21:45   ` Stefano Stabellini
  2023-01-23 22:01     ` Julien Grall
  1 sibling, 1 reply; 101+ messages in thread
From: Stefano Stabellini @ 2023-01-23 21:45 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Hongyan Xia, Andrew Cooper, George Dunlap,
	Jan Beulich, Stefano Stabellini, Wei Liu, Bertrand Marquis,
	Volodymyr Babchuk, Roger Pau Monné,
	Julien Grall

On Fri, 16 Dec 2022, Julien Grall wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
> 
> Also add a helper function to retrieve it. Change arch_mfns_in_direct_map
> to check this option before returning.
> 
> This is added as a boot command line option, not a Kconfig to allow
> the user to experiment the feature without rebuild the hypervisor.
> 
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> ----
> 
>     TODO:
>         * Do we also want to provide a Kconfig option?
> 
>     Changes since Hongyan's version:
>         * Reword the commit message
>         * opt_directmap is only modified during boot so mark it as
>           __ro_after_init
> ---
>  docs/misc/xen-command-line.pandoc | 12 ++++++++++++
>  xen/arch/arm/include/asm/mm.h     |  5 +++++
>  xen/arch/x86/include/asm/mm.h     | 17 ++++++++++++++++-
>  xen/arch/x86/mm.c                 |  3 +++
>  xen/arch/x86/setup.c              |  2 ++
>  5 files changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index b7ee97be762e..a63e4612acac 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -760,6 +760,18 @@ Specify the size of the console debug trace buffer. By specifying `cpu:`
>  additionally a trace buffer of the specified size is allocated per cpu.
>  The debug trace feature is only enabled in debugging builds of Xen.
>  
> +### directmap (x86)
> +> `= <boolean>`
> +
> +> Default: `true`
> +
> +Enable or disable the direct map region in Xen.
> +
> +By default, Xen creates the direct map region which maps physical memory
> +in that region. Setting this to no will remove the direct map, blocking
> +exploits that leak secrets via speculative memory access in the direct
> +map.
> +
>  ### dma_bits
>  > `= <integer>`
>  
> diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
> index 68adcac9fa8d..2366928d71aa 100644
> --- a/xen/arch/arm/include/asm/mm.h
> +++ b/xen/arch/arm/include/asm/mm.h
> @@ -406,6 +406,11 @@ static inline void page_set_xenheap_gfn(struct page_info *p, gfn_t gfn)
>      } while ( (y = cmpxchg(&p->u.inuse.type_info, x, nx)) != x );
>  }
>  
> +static inline bool arch_has_directmap(void)
> +{
> +    return true;

Shouldn't arch_has_directmap return false for arm32?
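
I.e. something like (just a sketch):

    static inline bool arch_has_directmap(void)
    {
        /* arm32 always keeps domheap and xenheap separate, so there is
         * no full directmap. */
        return IS_ENABLED(CONFIG_ARM_64);
    }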



> +}
> +
>  #endif /*  __ARCH_ARM_MM__ */
>  /*
>   * Local variables:
> diff --git a/xen/arch/x86/include/asm/mm.h b/xen/arch/x86/include/asm/mm.h
> index db29e3e2059f..cf8b20817c6c 100644
> --- a/xen/arch/x86/include/asm/mm.h
> +++ b/xen/arch/x86/include/asm/mm.h
> @@ -464,6 +464,8 @@ static inline int get_page_and_type(struct page_info *page,
>      ASSERT(((_p)->count_info & PGC_count_mask) != 0);          \
>      ASSERT(page_get_owner(_p) == (_d))
>  
> +extern bool opt_directmap;
> +
>  /******************************************************************************
>   * With shadow pagetables, the different kinds of address start
>   * to get get confusing.
> @@ -620,13 +622,26 @@ extern const char zero_page[];
>  /* Build a 32bit PSE page table using 4MB pages. */
>  void write_32bit_pse_identmap(uint32_t *l2);
>  
> +static inline bool arch_has_directmap(void)
> +{
> +    return opt_directmap;
> +}
> +
>  /*
>   * x86 maps part of physical memory via the directmap region.
>   * Return whether the range of MFN falls in the directmap region.
> + *
> + * When boot command line sets directmap=no, we will not have a direct map at
> + * all so this will always return false.
>   */
>  static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned long nr)
>  {
> -    unsigned long eva = min(DIRECTMAP_VIRT_END, HYPERVISOR_VIRT_END);
> +    unsigned long eva;
> +
> +    if ( !arch_has_directmap() )
> +        return false;
> +
> +    eva = min(DIRECTMAP_VIRT_END, HYPERVISOR_VIRT_END);
>  
>      return (mfn + nr) <= (virt_to_mfn(eva - 1) + 1);
>  }
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 041bd4cfde17..e76e135b96fc 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -157,6 +157,9 @@ l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
>  l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
>      l1_fixmap_x[L1_PAGETABLE_ENTRIES];
>  
> +bool __ro_after_init opt_directmap = true;
> +boolean_param("directmap", opt_directmap);
> +
>  /* Frame table size in pages. */
>  unsigned long max_page;
>  unsigned long total_pages;
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index 1c2e09711eb0..2cb051c6e4e7 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1423,6 +1423,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>      if ( highmem_start )
>          xenheap_max_mfn(PFN_DOWN(highmem_start - 1));
>  
> +    printk("Booting with directmap %s\n", arch_has_directmap() ? "on" : "off");
> +
>      /*
>       * Walk every RAM region and map it in its entirety (on x86/64, at least)
>       * and notify it to the boot allocator.
> -- 
> 2.38.1
> 


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 12/22] xen/arm: fixmap: Rename the fixmap slots to follow the x86 convention
  2022-12-16 11:48 ` [PATCH 12/22] xen/arm: fixmap: Rename the fixmap slots to follow the x86 convention Julien Grall
  2022-12-22 13:29   ` Jan Beulich
  2023-01-06 14:54   ` Henry Wang
@ 2023-01-23 21:47   ` Stefano Stabellini
  2 siblings, 0 replies; 101+ messages in thread
From: Stefano Stabellini @ 2023-01-23 21:47 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Julien Grall, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Wei Liu

On Fri, 16 Dec 2022, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> At the moment the fixmap slots are prefixed differently between arm and
> x86.
> 
> Some of them (e.g. the PMAP slots) are used in common code. So it would
> be better if they are named the same way to avoid having to create
> aliases.
> 
> I have decided to use the x86 naming because it requires fewer changes. So
> all the Arm fixmap slots will now be prefixed with FIX rather than
> FIXMAP.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


> ----
> 
>     Note that there is potentially more renaming that could be done to share
>     more code in future. I have decided to not do that to avoid going
>     down a rabbit hole.
> ---
>  xen/arch/arm/acpi/lib.c                 | 18 +++++++++---------
>  xen/arch/arm/include/asm/early_printk.h |  2 +-
>  xen/arch/arm/include/asm/fixmap.h       | 16 ++++++++--------
>  xen/arch/arm/kernel.c                   |  6 +++---
>  xen/common/pmap.c                       |  8 ++++----
>  5 files changed, 25 insertions(+), 25 deletions(-)
> 
> diff --git a/xen/arch/arm/acpi/lib.c b/xen/arch/arm/acpi/lib.c
> index 41d521f720ac..736cf09ecaa8 100644
> --- a/xen/arch/arm/acpi/lib.c
> +++ b/xen/arch/arm/acpi/lib.c
> @@ -40,10 +40,10 @@ char *__acpi_map_table(paddr_t phys, unsigned long size)
>          return NULL;
>  
>      offset = phys & (PAGE_SIZE - 1);
> -    base = FIXMAP_ADDR(FIXMAP_ACPI_BEGIN) + offset;
> +    base = FIXMAP_ADDR(FIX_ACPI_BEGIN) + offset;
>  
>      /* Check the fixmap is big enough to map the region */
> -    if ( (FIXMAP_ADDR(FIXMAP_ACPI_END) + PAGE_SIZE - base) < size )
> +    if ( (FIXMAP_ADDR(FIX_ACPI_END) + PAGE_SIZE - base) < size )
>          return NULL;
>  
>      /* With the fixmap, we can only map one region at the time */
> @@ -54,7 +54,7 @@ char *__acpi_map_table(paddr_t phys, unsigned long size)
>  
>      size += offset;
>      mfn = maddr_to_mfn(phys);
> -    idx = FIXMAP_ACPI_BEGIN;
> +    idx = FIX_ACPI_BEGIN;
>  
>      do {
>          set_fixmap(idx, mfn, PAGE_HYPERVISOR);
> @@ -72,8 +72,8 @@ bool __acpi_unmap_table(const void *ptr, unsigned long size)
>      unsigned int idx;
>  
>      /* We are only handling fixmap address in the arch code */
> -    if ( (vaddr < FIXMAP_ADDR(FIXMAP_ACPI_BEGIN)) ||
> -         (vaddr >= (FIXMAP_ADDR(FIXMAP_ACPI_END) + PAGE_SIZE)) )
> +    if ( (vaddr < FIXMAP_ADDR(FIX_ACPI_BEGIN)) ||
> +         (vaddr >= (FIXMAP_ADDR(FIX_ACPI_END) + PAGE_SIZE)) )
>          return false;
>  
>      /*
> @@ -81,16 +81,16 @@ bool __acpi_unmap_table(const void *ptr, unsigned long size)
>       * for the ACPI fixmap region. The caller is expected to free with
>       * the same address.
>       */
> -    ASSERT((vaddr & PAGE_MASK) == FIXMAP_ADDR(FIXMAP_ACPI_BEGIN));
> +    ASSERT((vaddr & PAGE_MASK) == FIXMAP_ADDR(FIX_ACPI_BEGIN));
>  
>      /* The region allocated fit in the ACPI fixmap region. */
> -    ASSERT(size < (FIXMAP_ADDR(FIXMAP_ACPI_END) + PAGE_SIZE - vaddr));
> +    ASSERT(size < (FIXMAP_ADDR(FIX_ACPI_END) + PAGE_SIZE - vaddr));
>      ASSERT(fixmap_inuse);
>  
>      fixmap_inuse = false;
>  
> -    size += vaddr - FIXMAP_ADDR(FIXMAP_ACPI_BEGIN);
> -    idx = FIXMAP_ACPI_BEGIN;
> +    size += vaddr - FIXMAP_ADDR(FIX_ACPI_BEGIN);
> +    idx = FIX_ACPI_BEGIN;
>  
>      do
>      {
> diff --git a/xen/arch/arm/include/asm/early_printk.h b/xen/arch/arm/include/asm/early_printk.h
> index c5149b2976da..a5f48801f476 100644
> --- a/xen/arch/arm/include/asm/early_printk.h
> +++ b/xen/arch/arm/include/asm/early_printk.h
> @@ -17,7 +17,7 @@
>  
>  /* need to add the uart address offset in page to the fixmap address */
>  #define EARLY_UART_VIRTUAL_ADDRESS \
> -    (FIXMAP_ADDR(FIXMAP_CONSOLE) + (CONFIG_EARLY_UART_BASE_ADDRESS & ~PAGE_MASK))
> +    (FIXMAP_ADDR(FIX_CONSOLE) + (CONFIG_EARLY_UART_BASE_ADDRESS & ~PAGE_MASK))
>  
>  #endif /* !CONFIG_EARLY_PRINTK */
>  
> diff --git a/xen/arch/arm/include/asm/fixmap.h b/xen/arch/arm/include/asm/fixmap.h
> index d0c9a52c8c28..154db85686c2 100644
> --- a/xen/arch/arm/include/asm/fixmap.h
> +++ b/xen/arch/arm/include/asm/fixmap.h
> @@ -8,17 +8,17 @@
>  #include <xen/pmap.h>
>  
>  /* Fixmap slots */
> -#define FIXMAP_CONSOLE  0  /* The primary UART */
> -#define FIXMAP_MISC     1  /* Ephemeral mappings of hardware */
> -#define FIXMAP_ACPI_BEGIN  2  /* Start mappings of ACPI tables */
> -#define FIXMAP_ACPI_END    (FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1)  /* End mappings of ACPI tables */
> -#define FIXMAP_PMAP_BEGIN (FIXMAP_ACPI_END + 1) /* Start of PMAP */
> -#define FIXMAP_PMAP_END (FIXMAP_PMAP_BEGIN + NUM_FIX_PMAP - 1) /* End of PMAP */
> +#define FIX_CONSOLE  0  /* The primary UART */
> +#define FIX_MISC     1  /* Ephemeral mappings of hardware */
> +#define FIX_ACPI_BEGIN  2  /* Start mappings of ACPI tables */
> +#define FIX_ACPI_END    (FIX_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1)  /* End mappings of ACPI tables */
> +#define FIX_PMAP_BEGIN (FIX_ACPI_END + 1) /* Start of PMAP */
> +#define FIX_PMAP_END (FIX_PMAP_BEGIN + NUM_FIX_PMAP - 1) /* End of PMAP */
>  
> -#define FIXMAP_LAST FIXMAP_PMAP_END
> +#define FIX_LAST FIX_PMAP_END
>  
>  #define FIXADDR_START FIXMAP_ADDR(0)
> -#define FIXADDR_TOP FIXMAP_ADDR(FIXMAP_LAST)
> +#define FIXADDR_TOP FIXMAP_ADDR(FIX_LAST)
>  
>  #ifndef __ASSEMBLY__
>  
> diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
> index 23b840ea9ea8..56800750fd9c 100644
> --- a/xen/arch/arm/kernel.c
> +++ b/xen/arch/arm/kernel.c
> @@ -49,7 +49,7 @@ struct minimal_dtb_header {
>   */
>  void __init copy_from_paddr(void *dst, paddr_t paddr, unsigned long len)
>  {
> -    void *src = (void *)FIXMAP_ADDR(FIXMAP_MISC);
> +    void *src = (void *)FIXMAP_ADDR(FIX_MISC);
>  
>      while (len) {
>          unsigned long l, s;
> @@ -57,10 +57,10 @@ void __init copy_from_paddr(void *dst, paddr_t paddr, unsigned long len)
>          s = paddr & (PAGE_SIZE-1);
>          l = min(PAGE_SIZE - s, len);
>  
> -        set_fixmap(FIXMAP_MISC, maddr_to_mfn(paddr), PAGE_HYPERVISOR_WC);
> +        set_fixmap(FIX_MISC, maddr_to_mfn(paddr), PAGE_HYPERVISOR_WC);
>          memcpy(dst, src + s, l);
>          clean_dcache_va_range(dst, l);
> -        clear_fixmap(FIXMAP_MISC);
> +        clear_fixmap(FIX_MISC);
>  
>          paddr += l;
>          dst += l;
> diff --git a/xen/common/pmap.c b/xen/common/pmap.c
> index 14517198aae3..6e3ba9298df4 100644
> --- a/xen/common/pmap.c
> +++ b/xen/common/pmap.c
> @@ -32,8 +32,8 @@ void *__init pmap_map(mfn_t mfn)
>  
>      __set_bit(idx, inuse);
>  
> -    slot = idx + FIXMAP_PMAP_BEGIN;
> -    ASSERT(slot >= FIXMAP_PMAP_BEGIN && slot <= FIXMAP_PMAP_END);
> +    slot = idx + FIX_PMAP_BEGIN;
> +    ASSERT(slot >= FIX_PMAP_BEGIN && slot <= FIX_PMAP_END);
>  
>      /*
>       * We cannot use set_fixmap() here. We use PMAP when the domain map
> @@ -53,10 +53,10 @@ void __init pmap_unmap(const void *p)
>      unsigned int slot = virt_to_fix((unsigned long)p);
>  
>      ASSERT(system_state < SYS_STATE_smp_boot);
> -    ASSERT(slot >= FIXMAP_PMAP_BEGIN && slot <= FIXMAP_PMAP_END);
> +    ASSERT(slot >= FIX_PMAP_BEGIN && slot <= FIX_PMAP_END);
>      ASSERT(!in_irq());
>  
> -    idx = slot - FIXMAP_PMAP_BEGIN;
> +    idx = slot - FIX_PMAP_BEGIN;
>  
>      __clear_bit(idx, inuse);
>      arch_pmap_unmap(slot);
> -- 
> 2.38.1
> 


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 01/22] xen/common: page_alloc: Re-order includes
  2023-01-23 21:29   ` Stefano Stabellini
@ 2023-01-23 21:57     ` Julien Grall
  0 siblings, 0 replies; 101+ messages in thread
From: Julien Grall @ 2023-01-23 21:57 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, Julien Grall, Andrew Cooper, George Dunlap,
	Jan Beulich, Wei Liu

Hi Stefano,

On 23/01/2023 21:29, Stefano Stabellini wrote:
> On Fri, 16 Dec 2022, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> Order the includes with the xen headers first, then asm headers and
>> last public headers. Within each category, they are sorted alphabetically.
>>
>> Note that the includes protected by CONFIG_X86 haven't been sorted
>> to avoid adding multiple #ifdef.
>>
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> This patch doesn't apply as is any longer.

That's expected given that I committed this patch a month ago (see my 
answer to Jan's e-mail on the 23rd December).

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 11/22] x86: add a boot option to enable and disable the direct map
  2023-01-23 21:45   ` Stefano Stabellini
@ 2023-01-23 22:01     ` Julien Grall
  0 siblings, 0 replies; 101+ messages in thread
From: Julien Grall @ 2023-01-23 22:01 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, Hongyan Xia, Andrew Cooper, George Dunlap,
	Jan Beulich, Wei Liu, Bertrand Marquis, Volodymyr Babchuk,
	Roger Pau Monné,
	Julien Grall

Hi Stefano,

On 23/01/2023 21:45, Stefano Stabellini wrote:
>> diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
>> index 68adcac9fa8d..2366928d71aa 100644
>> --- a/xen/arch/arm/include/asm/mm.h
>> +++ b/xen/arch/arm/include/asm/mm.h
>> @@ -406,6 +406,11 @@ static inline void page_set_xenheap_gfn(struct page_info *p, gfn_t gfn)
>>       } while ( (y = cmpxchg(&p->u.inuse.type_info, x, nx)) != x );
>>   }
>>   
>> +static inline bool arch_has_directmap(void)
>> +{
>> +    return true;
> 
> Shouldn't arch_has_directmap return false for arm32?

We still have a directmap on Arm32, but it only covers the xenheap.
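
To make that concrete, here is a condensed sketch of how the two predicates
end up looking for arm32 with this series applied. This only restates the
hunks quoted elsewhere in this thread (asm/mm.h and asm/arm32/mm.h); it is
not new code:

    /* xen/arch/arm/include/asm/mm.h (sketch) */
    static inline bool arch_has_directmap(void)
    {
        /* A directmap region exists, but it only ever maps xenheap pages. */
        return true;
    }

    /* xen/arch/arm/include/asm/arm32/mm.h (sketch) */
    static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned long nr)
    {
        /* Arbitrary MFNs (e.g. domheap pages) are never reachable through it. */
        return false;
    }

So returning true is intentional: the helper reports whether a directmap
region exists at all, not whether every MFN is mapped in it.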

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 17/22] x86/setup: vmap heap nodes when they are outside the direct map
  2022-12-16 11:48 ` [PATCH 17/22] x86/setup: vmap heap nodes when they are outside the direct map Julien Grall
  2023-01-11 14:39   ` Jan Beulich
@ 2023-01-23 22:03   ` Stefano Stabellini
  2023-01-23 22:23     ` Julien Grall
  1 sibling, 1 reply; 101+ messages in thread
From: Stefano Stabellini @ 2023-01-23 22:03 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Hongyan Xia, Andrew Cooper, George Dunlap,
	Jan Beulich, Stefano Stabellini, Wei Liu, Julien Grall

On Fri, 16 Dec 2022, Julien Grall wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
> 
> When we do not have a direct map, archs_mfn_in_direct_map() will always
> return false, thus init_node_heap() will allocate xenheap pages from an
> existing node for the metadata of a new node. This means that the
> metadata of a new node is in a different node, slowing down heap
> allocation.
> 
> Since we now have early vmap, vmap the metadata locally in the new node.
> 
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> ----
> 
>     Changes from Hongyan's version:
>         * arch_mfn_in_direct_map() was renamed to
>           arch_mfns_in_direct_map()
>         * Use vmap_contig_pages() rather than __vmap(...).
>         * Add missing include (xen/vmap.h) so it compiles on Arm
> ---
>  xen/common/page_alloc.c | 42 +++++++++++++++++++++++++++++++----------
>  1 file changed, 32 insertions(+), 10 deletions(-)
> 
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 0c4af5a71407..581c15d74dfb 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -136,6 +136,7 @@
>  #include <xen/sched.h>
>  #include <xen/softirq.h>
>  #include <xen/spinlock.h>
> +#include <xen/vmap.h>
>  
>  #include <asm/flushtlb.h>
>  #include <asm/numa.h>
> @@ -597,22 +598,43 @@ static unsigned long init_node_heap(int node, unsigned long mfn,
>          needed = 0;
>      }
>      else if ( *use_tail && nr >= needed &&
> -              arch_mfns_in_directmap(mfn + nr - needed, needed) &&
>                (!xenheap_bits ||
>                 !((mfn + nr - 1) >> (xenheap_bits - PAGE_SHIFT))) )
>      {
> -        _heap[node] = mfn_to_virt(mfn + nr - needed);
> -        avail[node] = mfn_to_virt(mfn + nr - 1) +
> -                      PAGE_SIZE - sizeof(**avail) * NR_ZONES;
> -    }
> -    else if ( nr >= needed &&
> -              arch_mfns_in_directmap(mfn, needed) &&
> +        if ( arch_mfns_in_directmap(mfn + nr - needed, needed) )
> +        {
> +            _heap[node] = mfn_to_virt(mfn + nr - needed);
> +            avail[node] = mfn_to_virt(mfn + nr - 1) +
> +                          PAGE_SIZE - sizeof(**avail) * NR_ZONES;
> +        }
> +        else
> +        {
> +            mfn_t needed_start = _mfn(mfn + nr - needed);
> +
> +            _heap[node] = vmap_contig_pages(needed_start, needed);
> +            BUG_ON(!_heap[node]);

I see a BUG_ON here but init_node_heap is not __init. Asking because
BUG_ON is only a good idea during init time. Should init_node_heap be
__init (not necessarily in this patch, but still)?


> +            avail[node] = (void *)(_heap[node]) + (needed << PAGE_SHIFT) -
> +                          sizeof(**avail) * NR_ZONES;
> +        }
> +    } else if ( nr >= needed &&
>                (!xenheap_bits ||
>                 !((mfn + needed - 1) >> (xenheap_bits - PAGE_SHIFT))) )
>      {
> -        _heap[node] = mfn_to_virt(mfn);
> -        avail[node] = mfn_to_virt(mfn + needed - 1) +
> -                      PAGE_SIZE - sizeof(**avail) * NR_ZONES;
> +        if ( arch_mfns_in_directmap(mfn, needed) )
> +        {
> +            _heap[node] = mfn_to_virt(mfn);
> +            avail[node] = mfn_to_virt(mfn + needed - 1) +
> +                          PAGE_SIZE - sizeof(**avail) * NR_ZONES;
> +        }
> +        else
> +        {
> +            mfn_t needed_start = _mfn(mfn);
> +
> +            _heap[node] = vmap_contig_pages(needed_start, needed);
> +            BUG_ON(!_heap[node]);
> +            avail[node] = (void *)(_heap[node]) + (needed << PAGE_SHIFT) -
> +                          sizeof(**avail) * NR_ZONES;
> +        }
>          *use_tail = false;
>      }
>      else if ( get_order_from_bytes(sizeof(**_heap)) ==
> -- 
> 2.38.1
> 


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 19/22] xen/arm32: mm: Rename 'first' to 'root' in init_secondary_pagetables()
  2022-12-16 11:48 ` [PATCH 19/22] xen/arm32: mm: Rename 'first' to 'root' in init_secondary_pagetables() Julien Grall
  2023-01-06 14:54   ` Henry Wang
@ 2023-01-23 22:06   ` Stefano Stabellini
  1 sibling, 0 replies; 101+ messages in thread
From: Stefano Stabellini @ 2023-01-23 22:06 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Julien Grall, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk

On Fri, 16 Dec 2022, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> The arm32 version of init_secondary_pagetables() will soon be re-used
> for arm64 as well, where the root table starts at level 0 rather than level 1.
> 
> So rename 'first' to 'root'.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


> ---
>  xen/arch/arm/mm.c | 16 +++++++---------
>  1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 0fc6f2992dd1..4e208f7d20c8 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -571,32 +571,30 @@ int init_secondary_pagetables(int cpu)
>  #else
>  int init_secondary_pagetables(int cpu)
>  {
> -    lpae_t *first;
> +    lpae_t *root = alloc_xenheap_page();
>  
> -    first = alloc_xenheap_page(); /* root == first level on 32-bit 3-level trie */
> -
> -    if ( !first )
> +    if ( !root )
>      {
> -        printk("CPU%u: Unable to allocate the first page-table\n", cpu);
> +        printk("CPU%u: Unable to allocate the root page-table\n", cpu);
>          return -ENOMEM;
>      }
>  
>      /* Initialise root pagetable from root of boot tables */
> -    memcpy(first, cpu0_pgtable, PAGE_SIZE);
> -    per_cpu(xen_pgtable, cpu) = first;
> +    memcpy(root, cpu0_pgtable, PAGE_SIZE);
> +    per_cpu(xen_pgtable, cpu) = root;
>  
>      if ( !init_domheap_mappings(cpu) )
>      {
>          printk("CPU%u: Unable to prepare the domheap page-tables\n", cpu);
>          per_cpu(xen_pgtable, cpu) = NULL;
> -        free_xenheap_page(first);
> +        free_xenheap_page(root);
>          return -ENOMEM;
>      }
>  
>      clear_boot_pagetables();
>  
>      /* Set init_ttbr for this CPU coming up */
> -    init_ttbr = __pa(first);
> +    init_ttbr = __pa(root);
>      clean_dcache(init_ttbr);
>  
>      return 0;
> -- 
> 2.38.1
> 


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 20/22] xen/arm64: mm: Use per-pCPU page-tables
  2022-12-16 11:48 ` [PATCH 20/22] xen/arm64: mm: Use per-pCPU page-tables Julien Grall
  2023-01-06 14:54   ` Henry Wang
@ 2023-01-23 22:21   ` Stefano Stabellini
  1 sibling, 0 replies; 101+ messages in thread
From: Stefano Stabellini @ 2023-01-23 22:21 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Julien Grall, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk

On Fri, 16 Dec 2022, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> At the moment, on Arm64, every pCPU is sharing the same page-tables.
> 
> In a follow-up patch, we will allow the possibility to remove the
> direct map and therefore it will be necessary to have a mapcache.
> 
> While we have plenty of spare virtual address space to reserve
> a part for each pCPU, it means that temporary mappings
> (e.g. guest memory) could be accessible by every pCPU.
> 
> In order to increase our security posture, it would be better if
> those mappings are only accessible by the pCPU doing the temporary
> mapping.
> 
> In addition to that, per-pCPU page-tables open the way to have a
> per-domain mapping area.
> 
> Arm32 is already using per-pCPU page-tables so most of the code
> can be re-used. Arm64 doesn't yet have support for the mapcache,
> so a stub is provided (moved to its own header asm/domain_page.h).
> 
> Take the opportunity to fix a typo in a comment that is modified.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


> ---
>  xen/arch/arm/domain_page.c             |  2 ++
>  xen/arch/arm/include/asm/arm32/mm.h    |  8 -----
>  xen/arch/arm/include/asm/domain_page.h | 13 ++++++++
>  xen/arch/arm/include/asm/mm.h          |  5 +++
>  xen/arch/arm/mm.c                      | 42 +++++++-------------------
>  xen/arch/arm/setup.c                   |  1 +
>  6 files changed, 32 insertions(+), 39 deletions(-)
>  create mode 100644 xen/arch/arm/include/asm/domain_page.h
> 
> diff --git a/xen/arch/arm/domain_page.c b/xen/arch/arm/domain_page.c
> index b7c02c919064..4540b3c5f24c 100644
> --- a/xen/arch/arm/domain_page.c
> +++ b/xen/arch/arm/domain_page.c
> @@ -3,6 +3,8 @@
>  #include <xen/pmap.h>
>  #include <xen/vmap.h>
>  
> +#include <asm/domain_page.h>
> +
>  /* Override macros from asm/page.h to make them work with mfn_t */
>  #undef virt_to_mfn
>  #define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
> diff --git a/xen/arch/arm/include/asm/arm32/mm.h b/xen/arch/arm/include/asm/arm32/mm.h
> index 8bfc906e7178..6b039d9ceaa2 100644
> --- a/xen/arch/arm/include/asm/arm32/mm.h
> +++ b/xen/arch/arm/include/asm/arm32/mm.h
> @@ -1,12 +1,6 @@
>  #ifndef __ARM_ARM32_MM_H__
>  #define __ARM_ARM32_MM_H__
>  
> -#include <xen/percpu.h>
> -
> -#include <asm/lpae.h>
> -
> -DECLARE_PER_CPU(lpae_t *, xen_pgtable);
> -
>  /*
>   * Only a limited amount of RAM, called xenheap, is always mapped on ARM32.
>   * For convenience always return false.
> @@ -16,8 +10,6 @@ static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned long nr)
>      return false;
>  }
>  
> -bool init_domheap_mappings(unsigned int cpu);
> -
>  #endif /* __ARM_ARM32_MM_H__ */
>  
>  /*
> diff --git a/xen/arch/arm/include/asm/domain_page.h b/xen/arch/arm/include/asm/domain_page.h
> new file mode 100644
> index 000000000000..e9f52685e2ec
> --- /dev/null
> +++ b/xen/arch/arm/include/asm/domain_page.h
> @@ -0,0 +1,13 @@
> +#ifndef __ASM_ARM_DOMAIN_PAGE_H__
> +#define __ASM_ARM_DOMAIN_PAGE_H__
> +
> +#ifdef CONFIG_ARCH_MAP_DOMAIN_PAGE
> +bool init_domheap_mappings(unsigned int cpu);
> +#else
> +static inline bool init_domheap_mappings(unsigned int cpu)
> +{
> +    return true;
> +}
> +#endif
> +
> +#endif /* __ASM_ARM_DOMAIN_PAGE_H__ */
> diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
> index 2366928d71aa..7a2c775f9562 100644
> --- a/xen/arch/arm/include/asm/mm.h
> +++ b/xen/arch/arm/include/asm/mm.h
> @@ -2,6 +2,9 @@
>  #define __ARCH_ARM_MM__
>  
>  #include <xen/kernel.h>
> +#include <xen/percpu.h>
> +
> +#include <asm/lpae.h>
>  #include <asm/page.h>
>  #include <public/xen.h>
>  #include <xen/pdx.h>
> @@ -14,6 +17,8 @@
>  # error "unknown ARM variant"
>  #endif
>  
> +DECLARE_PER_CPU(lpae_t *, xen_pgtable);
> +
>  /* Align Xen to a 2 MiB boundary. */
>  #define XEN_PADDR_ALIGN (1 << 21)
>  
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 4e208f7d20c8..2af751af9003 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -24,6 +24,7 @@
>  
>  #include <xsm/xsm.h>
>  
> +#include <asm/domain_page.h>
>  #include <asm/fixmap.h>
>  #include <asm/setup.h>
>  
> @@ -90,20 +91,19 @@ DEFINE_BOOT_PAGE_TABLE(boot_third);
>   * xen_second, xen_fixmap and xen_xenmap are always shared between all
>   * PCPUs.
>   */
> +/* Per-CPU pagetable pages */
> +/* xen_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
> +DEFINE_PER_CPU(lpae_t *, xen_pgtable);
> +
> +/* Root of the trie for cpu0, other CPU's PTs are dynamically allocated */
> +static DEFINE_PAGE_TABLE(cpu0_pgtable);
> +#define THIS_CPU_PGTABLE this_cpu(xen_pgtable)
>  
>  #ifdef CONFIG_ARM_64
>  #define HYP_PT_ROOT_LEVEL 0
> -static DEFINE_PAGE_TABLE(xen_pgtable);
>  static DEFINE_PAGE_TABLE(xen_first);
> -#define THIS_CPU_PGTABLE xen_pgtable
>  #else
>  #define HYP_PT_ROOT_LEVEL 1
> -/* Per-CPU pagetable pages */
> -/* xen_pgtable == root of the trie (zeroeth level on 64-bit, first on 32-bit) */
> -DEFINE_PER_CPU(lpae_t *, xen_pgtable);
> -#define THIS_CPU_PGTABLE this_cpu(xen_pgtable)
> -/* Root of the trie for cpu0, other CPU's PTs are dynamically allocated */
> -static DEFINE_PAGE_TABLE(cpu0_pgtable);
>  #endif
>  
>  /* Common pagetable leaves */
> @@ -481,14 +481,13 @@ void __init setup_pagetables(unsigned long boot_phys_offset)
>  
>      phys_offset = boot_phys_offset;
>  
> +    p = cpu0_pgtable;
> +
>  #ifdef CONFIG_ARM_64
> -    p = (void *) xen_pgtable;
>      p[0] = pte_of_xenaddr((uintptr_t)xen_first);
>      p[0].pt.table = 1;
>      p[0].pt.xn = 0;
>      p = (void *) xen_first;
> -#else
> -    p = (void *) cpu0_pgtable;
>  #endif
>  
>      /* Map xen second level page-table */
> @@ -527,19 +526,13 @@ void __init setup_pagetables(unsigned long boot_phys_offset)
>      pte.pt.table = 1;
>      xen_second[second_table_offset(FIXMAP_ADDR(0))] = pte;
>  
> -#ifdef CONFIG_ARM_64
> -    ttbr = (uintptr_t) xen_pgtable + phys_offset;
> -#else
>      ttbr = (uintptr_t) cpu0_pgtable + phys_offset;
> -#endif
>  
>      switch_ttbr(ttbr);
>  
>      xen_pt_enforce_wnx();
>  
> -#ifdef CONFIG_ARM_32
>      per_cpu(xen_pgtable, 0) = cpu0_pgtable;
> -#endif
>  }
>  
>  static void clear_boot_pagetables(void)
> @@ -557,18 +550,6 @@ static void clear_boot_pagetables(void)
>      clear_table(boot_third);
>  }
>  
> -#ifdef CONFIG_ARM_64
> -int init_secondary_pagetables(int cpu)
> -{
> -    clear_boot_pagetables();
> -
> -    /* Set init_ttbr for this CPU coming up. All CPus share a single setof
> -     * pagetables, but rewrite it each time for consistency with 32 bit. */
> -    init_ttbr = (uintptr_t) xen_pgtable + phys_offset;
> -    clean_dcache(init_ttbr);
> -    return 0;
> -}
> -#else
>  int init_secondary_pagetables(int cpu)
>  {
>      lpae_t *root = alloc_xenheap_page();
> @@ -599,7 +580,6 @@ int init_secondary_pagetables(int cpu)
>  
>      return 0;
>  }
> -#endif
>  
>  /* MMU setup for secondary CPUS (which already have paging enabled) */
>  void mmu_init_secondary_cpu(void)
> @@ -1089,7 +1069,7 @@ static int xen_pt_update(unsigned long virt,
>      unsigned long left = nr_mfns;
>  
>      /*
> -     * For arm32, page-tables are different on each CPUs. Yet, they share
> +     * Page-tables are different on each CPU. Yet, they share
>       * some common mappings. It is assumed that only common mappings
>       * will be modified with this function.
>       *
> diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> index 2311726f5ddd..88d9d90fb5ad 100644
> --- a/xen/arch/arm/setup.c
> +++ b/xen/arch/arm/setup.c
> @@ -39,6 +39,7 @@
>  #include <asm/gic.h>
>  #include <asm/cpuerrata.h>
>  #include <asm/cpufeature.h>
> +#include <asm/domain_page.h>
>  #include <asm/platform.h>
>  #include <asm/procinfo.h>
>  #include <asm/setup.h>
> -- 
> 2.38.1
> 


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 17/22] x86/setup: vmap heap nodes when they are outside the direct map
  2023-01-23 22:03   ` Stefano Stabellini
@ 2023-01-23 22:23     ` Julien Grall
  2023-01-23 22:56       ` Stefano Stabellini
  0 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2023-01-23 22:23 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, Hongyan Xia, Andrew Cooper, George Dunlap,
	Jan Beulich, Wei Liu, Julien Grall

Hi,

On 23/01/2023 22:03, Stefano Stabellini wrote:
> On Fri, 16 Dec 2022, Julien Grall wrote:
>> From: Hongyan Xia <hongyxia@amazon.com>
>>
>> When we do not have a direct map, archs_mfn_in_direct_map() will always
>> return false, thus init_node_heap() will allocate xenheap pages from an
>> existing node for the metadata of a new node. This means that the
>> metadata of a new node is in a different node, slowing down heap
>> allocation.
>>
>> Since we now have early vmap, vmap the metadata locally in the new node.
>>
>> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>
>> ----
>>
>>      Changes from Hongyan's version:
>>          * arch_mfn_in_direct_map() was renamed to
>>            arch_mfns_in_direct_map()
>>          * Use vmap_contig_pages() rather than __vmap(...).
>>          * Add missing include (xen/vmap.h) so it compiles on Arm
>> ---
>>   xen/common/page_alloc.c | 42 +++++++++++++++++++++++++++++++----------
>>   1 file changed, 32 insertions(+), 10 deletions(-)
>>
>> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
>> index 0c4af5a71407..581c15d74dfb 100644
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -136,6 +136,7 @@
>>   #include <xen/sched.h>
>>   #include <xen/softirq.h>
>>   #include <xen/spinlock.h>
>> +#include <xen/vmap.h>
>>   
>>   #include <asm/flushtlb.h>
>>   #include <asm/numa.h>
>> @@ -597,22 +598,43 @@ static unsigned long init_node_heap(int node, unsigned long mfn,
>>           needed = 0;
>>       }
>>       else if ( *use_tail && nr >= needed &&
>> -              arch_mfns_in_directmap(mfn + nr - needed, needed) &&
>>                 (!xenheap_bits ||
>>                  !((mfn + nr - 1) >> (xenheap_bits - PAGE_SHIFT))) )
>>       {
>> -        _heap[node] = mfn_to_virt(mfn + nr - needed);
>> -        avail[node] = mfn_to_virt(mfn + nr - 1) +
>> -                      PAGE_SIZE - sizeof(**avail) * NR_ZONES;
>> -    }
>> -    else if ( nr >= needed &&
>> -              arch_mfns_in_directmap(mfn, needed) &&
>> +        if ( arch_mfns_in_directmap(mfn + nr - needed, needed) )
>> +        {
>> +            _heap[node] = mfn_to_virt(mfn + nr - needed);
>> +            avail[node] = mfn_to_virt(mfn + nr - 1) +
>> +                          PAGE_SIZE - sizeof(**avail) * NR_ZONES;
>> +        }
>> +        else
>> +        {
>> +            mfn_t needed_start = _mfn(mfn + nr - needed);
>> +
>> +            _heap[node] = vmap_contig_pages(needed_start, needed);
>> +            BUG_ON(!_heap[node]);
> 
> I see a BUG_ON here but init_node_heap is not __init.

FWIW, this patch is not introducing the first BUG_ON() in this function.

> Asking because
> BUG_ON is only a good idea during init time. Should init_node_heap be
> __init (not necessarily in this patch, but still)?
AFAIK, there are two uses outside of __init:
   1) Free the init sections
   2) Memory hotplug

In the first case, we will likely need to panic() in case of an error. 
For the second case, I am not entirely sure.

But there would be a fair bit of plumbing and thinking (how do you deal 
with the case where part of the memory was already added?).

Anyway, I don't think I am making the function worse, so I would rather 
not open that can of worms (yet).

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 21/22] xen/arm64: Implement a mapcache for arm64
  2022-12-16 11:48 ` [PATCH 21/22] xen/arm64: Implement a mapcache for arm64 Julien Grall
  2023-01-06 14:55   ` Henry Wang
@ 2023-01-23 22:34   ` Stefano Stabellini
  1 sibling, 0 replies; 101+ messages in thread
From: Stefano Stabellini @ 2023-01-23 22:34 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Julien Grall, Stefano Stabellini, Bertrand Marquis,
	Volodymyr Babchuk

On Fri, 16 Dec 2022, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> At the moment, on arm64, map_domain_page() is implemented using
> mfn_to_virt(). Therefore it is relying on the directmap.
> 
> In a follow-up patch, we will allow the admin to remove the directmap.
> Therefore we want to implement a mapcache.
> 
> Thankfully there is already one for arm32. So select ARCH_MAP_DOMAIN_PAGE
> and add the necessary boilerplate to support 64-bit:
>     - The page-table start at level 0, so we need to allocate the level
>       1 page-table
>     - map_domain_page() should check if the page is in the directmap. If
>       yes, then use mfn_to_virt() to limit the performance impact
>       when the directmap is still enabled (this will be selectable
>       on the command line).
> 
> Take the opportunity to replace first_table_offset(...) with offsets[...].
> 
>     Note that, so far, arch_mfns_in_directmap() always returns true on
> arm64. So the mapcache is not yet used. This will change in a
> follow-up patch.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


> ----
> 
>     There are a few TODOs:
>         - It is becoming more critical to fix the mapcache
>           implementation (this is not compliant with the Arm Arm)
>         - Evaluate the performance
> ---
>  xen/arch/arm/Kconfig              |  1 +
>  xen/arch/arm/domain_page.c        | 47 +++++++++++++++++++++++++++----
>  xen/arch/arm/include/asm/config.h |  7 +++++
>  xen/arch/arm/include/asm/mm.h     |  5 ++++
>  xen/arch/arm/mm.c                 |  6 ++--
>  xen/arch/arm/setup.c              |  4 +++
>  6 files changed, 62 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> index 239d3aed3c7f..9c58b2d5c3aa 100644
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -9,6 +9,7 @@ config ARM_64
>  	select 64BIT
>  	select ARM_EFI
>  	select HAS_FAST_MULTIPLY
> +	select ARCH_MAP_DOMAIN_PAGE
>  
>  config ARM
>  	def_bool y
> diff --git a/xen/arch/arm/domain_page.c b/xen/arch/arm/domain_page.c
> index 4540b3c5f24c..f3547dc853ef 100644
> --- a/xen/arch/arm/domain_page.c
> +++ b/xen/arch/arm/domain_page.c
> @@ -1,4 +1,5 @@
>  /* SPDX-License-Identifier: GPL-2.0-or-later */
> +#include <xen/domain_page.h>
>  #include <xen/mm.h>
>  #include <xen/pmap.h>
>  #include <xen/vmap.h>
> @@ -8,6 +9,8 @@
>  /* Override macros from asm/page.h to make them work with mfn_t */
>  #undef virt_to_mfn
>  #define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
> +#undef mfn_to_virt
> +#define mfn_to_virt(va) __mfn_to_virt(mfn_x(mfn))
>  
>  /* cpu0's domheap page tables */
>  static DEFINE_PAGE_TABLES(cpu0_dommap, DOMHEAP_SECOND_PAGES);
> @@ -31,13 +34,30 @@ bool init_domheap_mappings(unsigned int cpu)
>  {
>      unsigned int order = get_order_from_pages(DOMHEAP_SECOND_PAGES);
>      lpae_t *root = per_cpu(xen_pgtable, cpu);
> +    lpae_t *first;
>      unsigned int i, first_idx;
>      lpae_t *domheap;
>      mfn_t mfn;
>  
> +    /* Convenience aliases */
> +    DECLARE_OFFSETS(offsets, DOMHEAP_VIRT_START);
> +
>      ASSERT(root);
>      ASSERT(!per_cpu(xen_dommap, cpu));
>  
> +    /*
> +     * On Arm64, the root is at level 0. Therefore we need an extra step
> +     * to allocate the first level page-table.
> +     */
> +#ifdef CONFIG_ARM_64
> +    if ( create_xen_table(&root[offsets[0]]) )
> +        return false;
> +
> +    first = xen_map_table(lpae_get_mfn(root[offsets[0]]));
> +#else
> +    first = root;
> +#endif
> +
>      /*
>       * The domheap for cpu0 is initialized before the heap is initialized.
>       * So we need to use pre-allocated pages.
> @@ -58,16 +78,20 @@ bool init_domheap_mappings(unsigned int cpu)
>       * domheap mapping pages.
>       */
>      mfn = virt_to_mfn(domheap);
> -    first_idx = first_table_offset(DOMHEAP_VIRT_START);
> +    first_idx = offsets[1];
>      for ( i = 0; i < DOMHEAP_SECOND_PAGES; i++ )
>      {
>          lpae_t pte = mfn_to_xen_entry(mfn_add(mfn, i), MT_NORMAL);
>          pte.pt.table = 1;
> -        write_pte(&root[first_idx + i], pte);
> +        write_pte(&first[first_idx + i], pte);
>      }
>  
>      per_cpu(xen_dommap, cpu) = domheap;
>  
> +#ifdef CONFIG_ARM_64
> +    xen_unmap_table(first);
> +#endif
> +
>      return true;
>  }
>  
> @@ -91,6 +115,10 @@ void *map_domain_page(mfn_t mfn)
>      lpae_t pte;
>      int i, slot;
>  
> +    /* Bypass the mapcache if the page is in the directmap */
> +    if ( arch_mfns_in_directmap(mfn_x(mfn), 1) )
> +        return mfn_to_virt(mfn);
> +
>      local_irq_save(flags);
>  
>      /* The map is laid out as an open-addressed hash table where each
> @@ -151,15 +179,24 @@ void *map_domain_page(mfn_t mfn)
>  }
>  
>  /* Release a mapping taken with map_domain_page() */
> -void unmap_domain_page(const void *va)
> +void unmap_domain_page(const void *ptr)
>  {
> +    unsigned long va = (unsigned long)ptr;
>      unsigned long flags;
>      lpae_t *map = this_cpu(xen_dommap);
> -    int slot = ((unsigned long) va - DOMHEAP_VIRT_START) >> SECOND_SHIFT;
> +    unsigned int slot;
>  
> -    if ( !va )
> +    /*
> +     * map_domain_page() may not have mapped anything if the address
> +     * is part of the directmap. So ignore anything outside of the
> +     * domheap.
> +     */
> +    if ( (va < DOMHEAP_VIRT_START) ||
> +         ((va - DOMHEAP_VIRT_START) >= DOMHEAP_VIRT_SIZE) )
>          return;
>  
> +    slot = (va - DOMHEAP_VIRT_START) >> SECOND_SHIFT;
> +
>      local_irq_save(flags);
>  
>      ASSERT(slot >= 0 && slot < DOMHEAP_ENTRIES);
> diff --git a/xen/arch/arm/include/asm/config.h b/xen/arch/arm/include/asm/config.h
> index 0fefed1b8aa9..12b7f1f1b9ea 100644
> --- a/xen/arch/arm/include/asm/config.h
> +++ b/xen/arch/arm/include/asm/config.h
> @@ -156,6 +156,13 @@
>  #define FRAMETABLE_SIZE        GB(32)
>  #define FRAMETABLE_NR          (FRAMETABLE_SIZE / sizeof(*frame_table))
>  
> +#define DOMHEAP_VIRT_START     SLOT0(255)
> +#define DOMHEAP_VIRT_SIZE      GB(2)
> +
> +#define DOMHEAP_ENTRIES        1024 /* 1024 2MB mapping slots */
> +/* Number of domheap pagetable pages required at the second level (2MB mappings) */
> +#define DOMHEAP_SECOND_PAGES (DOMHEAP_VIRT_SIZE >> FIRST_SHIFT)
> +
>  #define DIRECTMAP_VIRT_START   SLOT0(256)
>  #define DIRECTMAP_SIZE         (SLOT0_ENTRY_SIZE * (265-256))
>  #define DIRECTMAP_VIRT_END     (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE - 1)
> diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
> index 7a2c775f9562..d73abf1bf763 100644
> --- a/xen/arch/arm/include/asm/mm.h
> +++ b/xen/arch/arm/include/asm/mm.h
> @@ -416,6 +416,11 @@ static inline bool arch_has_directmap(void)
>      return true;
>  }
>  
> +/* Helpers to allocate, map and unmap a Xen page-table */
> +int create_xen_table(lpae_t *entry);
> +lpae_t *xen_map_table(mfn_t mfn);
> +void xen_unmap_table(const lpae_t *table);
> +
>  #endif /*  __ARCH_ARM_MM__ */
>  /*
>   * Local variables:
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 2af751af9003..f5fb957554a5 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -177,7 +177,7 @@ static void __init __maybe_unused build_assertions(void)
>  #undef CHECK_SAME_SLOT
>  }
>  
> -static lpae_t *xen_map_table(mfn_t mfn)
> +lpae_t *xen_map_table(mfn_t mfn)
>  {
>      /*
>       * During early boot, map_domain_page() may be unusable. Use the
> @@ -189,7 +189,7 @@ static lpae_t *xen_map_table(mfn_t mfn)
>      return map_domain_page(mfn);
>  }
>  
> -static void xen_unmap_table(const lpae_t *table)
> +void xen_unmap_table(const lpae_t *table)
>  {
>      /*
>       * During early boot, xen_map_table() will not use map_domain_page()
> @@ -699,7 +699,7 @@ void *ioremap(paddr_t pa, size_t len)
>      return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
>  }
>  
> -static int create_xen_table(lpae_t *entry)
> +int create_xen_table(lpae_t *entry)
>  {
>      mfn_t mfn;
>      void *p;
> diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> index 88d9d90fb5ad..b1a8f91bb385 100644
> --- a/xen/arch/arm/setup.c
> +++ b/xen/arch/arm/setup.c
> @@ -923,6 +923,10 @@ static void __init setup_mm(void)
>       */
>      populate_boot_allocator();
>  
> +    if ( !init_domheap_mappings(smp_processor_id()) )
> +        panic("CPU%u: Unable to prepare the domheap page-tables\n",
> +              smp_processor_id());
> +
>      total_pages = 0;
>  
>      for ( i = 0; i < banks->nr_banks; i++ )
> -- 
> 2.38.1
> 


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 22/22] xen/arm64: Allow the admin to enable/disable the directmap
  2022-12-16 11:48 ` [PATCH 22/22] xen/arm64: Allow the admin to enable/disable the directmap Julien Grall
  2023-01-06 14:55   ` Henry Wang
@ 2023-01-23 22:52   ` Stefano Stabellini
  2023-01-23 23:09     ` Julien Grall
  1 sibling, 1 reply; 101+ messages in thread
From: Stefano Stabellini @ 2023-01-23 22:52 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Julien Grall, Andrew Cooper, George Dunlap,
	Jan Beulich, Stefano Stabellini, Wei Liu, Bertrand Marquis,
	Volodymyr Babchuk

On Fri, 16 Dec 2022, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> Implement the same command line option as x86 to enable/disable the
> directmap. By default this is kept enabled.
> 
> Also modify setup_directmap_mappings() to populate the L0 entries
> related to the directmap area.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> ----
>     This patch is in an RFC state; we need to decide what to do for arm32.
> 
>     Also, this is moving code that was introduced in this series. So
>     this will need to be fixed in the next version (assuming Arm64 will
>     be ready).
> 
>     This was sent early as a PoC to enable a secret-free hypervisor
>     on Arm64.
> ---
>  docs/misc/xen-command-line.pandoc   |  2 +-
>  xen/arch/arm/include/asm/arm64/mm.h |  2 +-
>  xen/arch/arm/include/asm/mm.h       | 12 +++++----
>  xen/arch/arm/mm.c                   | 40 +++++++++++++++++++++++++++--
>  xen/arch/arm/setup.c                |  1 +
>  5 files changed, 48 insertions(+), 9 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index a63e4612acac..948035286acc 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -760,7 +760,7 @@ Specify the size of the console debug trace buffer. By specifying `cpu:`
>  additionally a trace buffer of the specified size is allocated per cpu.
>  The debug trace feature is only enabled in debugging builds of Xen.
>  
> -### directmap (x86)
> +### directmap (arm64, x86)
>  > `= <boolean>`
>  
>  > Default: `true`
> diff --git a/xen/arch/arm/include/asm/arm64/mm.h b/xen/arch/arm/include/asm/arm64/mm.h
> index aa2adac63189..8b5dcb091750 100644
> --- a/xen/arch/arm/include/asm/arm64/mm.h
> +++ b/xen/arch/arm/include/asm/arm64/mm.h
> @@ -7,7 +7,7 @@
>   */
>  static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned long nr)
>  {
> -    return true;
> +    return opt_directmap;
>  }
>  
>  #endif /* __ARM_ARM64_MM_H__ */
> diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
> index d73abf1bf763..ef9ad3b366e3 100644
> --- a/xen/arch/arm/include/asm/mm.h
> +++ b/xen/arch/arm/include/asm/mm.h
> @@ -9,6 +9,13 @@
>  #include <public/xen.h>
>  #include <xen/pdx.h>
>  
> +extern bool opt_directmap;
> +
> +static inline bool arch_has_directmap(void)
> +{
> +    return opt_directmap;
> +}
> +
>  #if defined(CONFIG_ARM_32)
>  # include <asm/arm32/mm.h>
>  #elif defined(CONFIG_ARM_64)
> @@ -411,11 +418,6 @@ static inline void page_set_xenheap_gfn(struct page_info *p, gfn_t gfn)
>      } while ( (y = cmpxchg(&p->u.inuse.type_info, x, nx)) != x );
>  }
>  
> -static inline bool arch_has_directmap(void)
> -{
> -    return true;
> -}
> -
>  /* Helpers to allocate, map and unmap a Xen page-table */
>  int create_xen_table(lpae_t *entry);
>  lpae_t *xen_map_table(mfn_t mfn);
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index f5fb957554a5..925d81c450e8 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -15,6 +15,7 @@
>  #include <xen/init.h>
>  #include <xen/libfdt/libfdt.h>
>  #include <xen/mm.h>
> +#include <xen/param.h>
>  #include <xen/pfn.h>
>  #include <xen/pmap.h>
>  #include <xen/sched.h>
> @@ -131,6 +132,12 @@ vaddr_t directmap_virt_start __read_mostly;
>  unsigned long directmap_base_pdx __read_mostly;
>  #endif
>  
> +bool __ro_after_init opt_directmap = true;
> +/* TODO: Decide what to do for arm32. */
> +#ifdef CONFIG_ARM_64
> +boolean_param("directmap", opt_directmap);
> +#endif
> +
>  unsigned long frametable_base_pdx __read_mostly;
>  unsigned long frametable_virt_end __read_mostly;
>  
> @@ -606,16 +613,27 @@ void __init setup_directmap_mappings(unsigned long base_mfn,
>      directmap_virt_end = XENHEAP_VIRT_START + nr_mfns * PAGE_SIZE;
>  }
>  #else /* CONFIG_ARM_64 */
> -/* Map the region in the directmap area. */
> +/*
> + * This either populate a valid fdirect map, or allocates empty L1 tables

fdirect/direct


> + * and creates the L0 entries for the given region in the direct map
> + * depending on arch_has_directmap().
> + *
> + * When directmap=no, we still need to populate empty L1 tables in the
> + * directmap region. The reason is that the root page-table (i.e. L0)
> + * is per-CPU and secondary CPUs will initialize their root page-table
> + * based on the pCPU0 one. So L0 entries will be shared if they are
> + * pre-populated. We also rely on the fact that L1 tables are never
> + * freed.

You are saying that in case of directmap=no we are still creating empty
L1 tables and L0 entries because secondary CPUs will need them when they
initialize their root pagetables.

But why? Secondary CPUs will not be using the directmap either? Why do
secondary CPUs need the empty L1 tables?



> + */
>  void __init setup_directmap_mappings(unsigned long base_mfn,
>                                       unsigned long nr_mfns)
>  {
> +    unsigned long mfn_gb = base_mfn & ~((FIRST_SIZE >> PAGE_SHIFT) - 1);
>      int rc;
>  
>      /* First call sets the directmap physical and virtual offset. */
>      if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
>      {
> -        unsigned long mfn_gb = base_mfn & ~((FIRST_SIZE >> PAGE_SHIFT) - 1);
>  
>          directmap_mfn_start = _mfn(base_mfn);
>          directmap_base_pdx = mfn_to_pdx(_mfn(base_mfn));
> @@ -636,6 +654,24 @@ void __init setup_directmap_mappings(unsigned long base_mfn,
>          panic("cannot add directmap mapping at %lx below heap start %lx\n",
>                base_mfn, mfn_x(directmap_mfn_start));
>  
> +
> +    if ( !arch_has_directmap() )
> +    {
> +        vaddr_t vaddr = (vaddr_t)__mfn_to_virt(base_mfn);
> +        unsigned int i, slot;
> +
> +        slot = first_table_offset(vaddr);
> +        nr_mfns += base_mfn - mfn_gb;
> +        for ( i = 0; i < nr_mfns; i += BIT(XEN_PT_LEVEL_ORDER(0), UL), slot++ )
> +        {
> +            lpae_t *entry = &cpu0_pgtable[slot];
> +
> +            if ( !lpae_is_valid(*entry) && !create_xen_table(entry) )
> +                panic("Unable to populate zeroeth slot %u\n", slot);
> +        }
> +        return;
> +    }
> +
>      rc = map_pages_to_xen((vaddr_t)__mfn_to_virt(base_mfn),
>                            _mfn(base_mfn), nr_mfns,
>                            PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
> diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> index b1a8f91bb385..83ded03c7b1f 100644
> --- a/xen/arch/arm/setup.c
> +++ b/xen/arch/arm/setup.c
> @@ -1032,6 +1032,7 @@ void __init start_xen(unsigned long boot_phys_offset,
>      cmdline_parse(cmdline);
>  
>      setup_mm();
> +    printk("Booting with directmap %s\n", arch_has_directmap() ? "on" : "off");
>  
>      vm_init();
>  
> -- 
> 2.38.1
> 


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 17/22] x86/setup: vmap heap nodes when they are outside the direct map
  2023-01-23 22:23     ` Julien Grall
@ 2023-01-23 22:56       ` Stefano Stabellini
  0 siblings, 0 replies; 101+ messages in thread
From: Stefano Stabellini @ 2023-01-23 22:56 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, xen-devel, Hongyan Xia, Andrew Cooper,
	George Dunlap, Jan Beulich, Wei Liu, Julien Grall

On Mon, 23 Jan 2023, Julien Grall wrote:
> On 23/01/2023 22:03, Stefano Stabellini wrote:
> > On Fri, 16 Dec 2022, Julien Grall wrote:
> > > From: Hongyan Xia <hongyxia@amazon.com>
> > > 
> > > When we do not have a direct map, archs_mfn_in_direct_map() will always
> > > return false, thus init_node_heap() will allocate xenheap pages from an
> > > existing node for the metadata of a new node. This means that the
> > > metadata of a new node is in a different node, slowing down heap
> > > allocation.
> > > 
> > > Since we now have early vmap, vmap the metadata locally in the new node.
> > > 
> > > Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> > > Signed-off-by: Julien Grall <jgrall@amazon.com>
> > > 
> > > ----
> > > 
> > >      Changes from Hongyan's version:
> > >          * arch_mfn_in_direct_map() was renamed to
> > >            arch_mfns_in_direct_map()
> > >          * Use vmap_contig_pages() rather than __vmap(...).
> > >          * Add missing include (xen/vmap.h) so it compiles on Arm
> > > ---
> > >   xen/common/page_alloc.c | 42 +++++++++++++++++++++++++++++++----------
> > >   1 file changed, 32 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> > > index 0c4af5a71407..581c15d74dfb 100644
> > > --- a/xen/common/page_alloc.c
> > > +++ b/xen/common/page_alloc.c
> > > @@ -136,6 +136,7 @@
> > >   #include <xen/sched.h>
> > >   #include <xen/softirq.h>
> > >   #include <xen/spinlock.h>
> > > +#include <xen/vmap.h>
> > >     #include <asm/flushtlb.h>
> > >   #include <asm/numa.h>
> > > @@ -597,22 +598,43 @@ static unsigned long init_node_heap(int node,
> > > unsigned long mfn,
> > >           needed = 0;
> > >       }
> > >       else if ( *use_tail && nr >= needed &&
> > > -              arch_mfns_in_directmap(mfn + nr - needed, needed) &&
> > >                 (!xenheap_bits ||
> > >                  !((mfn + nr - 1) >> (xenheap_bits - PAGE_SHIFT))) )
> > >       {
> > > -        _heap[node] = mfn_to_virt(mfn + nr - needed);
> > > -        avail[node] = mfn_to_virt(mfn + nr - 1) +
> > > -                      PAGE_SIZE - sizeof(**avail) * NR_ZONES;
> > > -    }
> > > -    else if ( nr >= needed &&
> > > -              arch_mfns_in_directmap(mfn, needed) &&
> > > +        if ( arch_mfns_in_directmap(mfn + nr - needed, needed) )
> > > +        {
> > > +            _heap[node] = mfn_to_virt(mfn + nr - needed);
> > > +            avail[node] = mfn_to_virt(mfn + nr - 1) +
> > > +                          PAGE_SIZE - sizeof(**avail) * NR_ZONES;
> > > +        }
> > > +        else
> > > +        {
> > > +            mfn_t needed_start = _mfn(mfn + nr - needed);
> > > +
> > > +            _heap[node] = vmap_contig_pages(needed_start, needed);
> > > +            BUG_ON(!_heap[node]);
> > 
> > I see a BUG_ON here but init_node_heap is not __init.
> 
> FWIW, this patch is not introducing the first BUG_ON() in this function.
> 
> > Asking because
> > BUG_ON is only a good idea during init time. Should init_node_heap be
> > __init (not necessarily in this patch, but still)?
> AFAIK, there are two uses outside of __init:
>   1) Free the init sections
>   2) Memory hotplug
> 
> In the first case, we will likely need to panic() in case of an error. For
> the second case, I am not entirely sure.
> 
> But there would be a fair bit of plumbing and thinking (how do you deal with
> the case where part of the memory was already added?).
> 
> Anyway, I don't think I am making the function worse, so I would rather not
> open that can of worms (yet).

I am only trying to check that we are not introducing any BUG_ONs that
could be triggered at runtime. We don't have a rule that requires the
function with a BUG_ON to be __init, however that is a simple and nice
way to check that the BUG_ON is appropriate.

In this specific case, you are right that there are already 2 BUG_ONs
in this function so you are not making things worse.

Aside from Jan's code style comment:

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 22/22] xen/arm64: Allow the admin to enable/disable the directmap
  2023-01-23 22:52   ` Stefano Stabellini
@ 2023-01-23 23:09     ` Julien Grall
  2023-01-24  0:12       ` Stefano Stabellini
  0 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2023-01-23 23:09 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, Julien Grall, Andrew Cooper, George Dunlap,
	Jan Beulich, Wei Liu, Bertrand Marquis, Volodymyr Babchuk

Hi Stefano,

On 23/01/2023 22:52, Stefano Stabellini wrote:
> On Fri, 16 Dec 2022, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> Implement the same command line option as x86 to enable/disable the
>> directmap. By default this is kept enabled.
>>
>> Also modify setup_directmap_mappings() to populate the L0 entries
>> related to the directmap area.
>>
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>
>> ----
>>      This patch is in an RFC state; we need to decide what to do for arm32.
>>
>>      Also, this is moving code that was introduced in this series. So
>>      this will need to be fixed in the next version (assuming Arm64 will
>>      be ready).
>>
>>      This was sent early as a PoC to enable a secret-free hypervisor
>>      on Arm64.
>> ---
>>   docs/misc/xen-command-line.pandoc   |  2 +-
>>   xen/arch/arm/include/asm/arm64/mm.h |  2 +-
>>   xen/arch/arm/include/asm/mm.h       | 12 +++++----
>>   xen/arch/arm/mm.c                   | 40 +++++++++++++++++++++++++++--
>>   xen/arch/arm/setup.c                |  1 +
>>   5 files changed, 48 insertions(+), 9 deletions(-)
>>
>> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
>> index a63e4612acac..948035286acc 100644
>> --- a/docs/misc/xen-command-line.pandoc
>> +++ b/docs/misc/xen-command-line.pandoc
>> @@ -760,7 +760,7 @@ Specify the size of the console debug trace buffer. By specifying `cpu:`
>>   additionally a trace buffer of the specified size is allocated per cpu.
>>   The debug trace feature is only enabled in debugging builds of Xen.
>>   
>> -### directmap (x86)
>> +### directmap (arm64, x86)
>>   > `= <boolean>`
>>   
>>   > Default: `true`
>> diff --git a/xen/arch/arm/include/asm/arm64/mm.h b/xen/arch/arm/include/asm/arm64/mm.h
>> index aa2adac63189..8b5dcb091750 100644
>> --- a/xen/arch/arm/include/asm/arm64/mm.h
>> +++ b/xen/arch/arm/include/asm/arm64/mm.h
>> @@ -7,7 +7,7 @@
>>    */
>>   static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned long nr)
>>   {
>> -    return true;
>> +    return opt_directmap;
>>   }
>>   
>>   #endif /* __ARM_ARM64_MM_H__ */
>> diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
>> index d73abf1bf763..ef9ad3b366e3 100644
>> --- a/xen/arch/arm/include/asm/mm.h
>> +++ b/xen/arch/arm/include/asm/mm.h
>> @@ -9,6 +9,13 @@
>>   #include <public/xen.h>
>>   #include <xen/pdx.h>
>>   
>> +extern bool opt_directmap;
>> +
>> +static inline bool arch_has_directmap(void)
>> +{
>> +    return opt_directmap;
>> +}
>> +
>>   #if defined(CONFIG_ARM_32)
>>   # include <asm/arm32/mm.h>
>>   #elif defined(CONFIG_ARM_64)
>> @@ -411,11 +418,6 @@ static inline void page_set_xenheap_gfn(struct page_info *p, gfn_t gfn)
>>       } while ( (y = cmpxchg(&p->u.inuse.type_info, x, nx)) != x );
>>   }
>>   
>> -static inline bool arch_has_directmap(void)
>> -{
>> -    return true;
>> -}
>> -
>>   /* Helpers to allocate, map and unmap a Xen page-table */
>>   int create_xen_table(lpae_t *entry);
>>   lpae_t *xen_map_table(mfn_t mfn);
>> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
>> index f5fb957554a5..925d81c450e8 100644
>> --- a/xen/arch/arm/mm.c
>> +++ b/xen/arch/arm/mm.c
>> @@ -15,6 +15,7 @@
>>   #include <xen/init.h>
>>   #include <xen/libfdt/libfdt.h>
>>   #include <xen/mm.h>
>> +#include <xen/param.h>
>>   #include <xen/pfn.h>
>>   #include <xen/pmap.h>
>>   #include <xen/sched.h>
>> @@ -131,6 +132,12 @@ vaddr_t directmap_virt_start __read_mostly;
>>   unsigned long directmap_base_pdx __read_mostly;
>>   #endif
>>   
>> +bool __ro_after_init opt_directmap = true;
>> +/* TODO: Decide what to do for arm32. */
>> +#ifdef CONFIG_ARM_64
>> +boolean_param("directmap", opt_directmap);
>> +#endif
>> +
>>   unsigned long frametable_base_pdx __read_mostly;
>>   unsigned long frametable_virt_end __read_mostly;
>>   
>> @@ -606,16 +613,27 @@ void __init setup_directmap_mappings(unsigned long base_mfn,
>>       directmap_virt_end = XENHEAP_VIRT_START + nr_mfns * PAGE_SIZE;
>>   }
>>   #else /* CONFIG_ARM_64 */
>> -/* Map the region in the directmap area. */
>> +/*
>> + * This either populate a valid fdirect map, or allocates empty L1 tables
> 
> fdirect/direct
> 
> 
>> + * and creates the L0 entries for the given region in the direct map
>> + * depending on arch_has_directmap().
>> + *
>> + * When directmap=no, we still need to populate empty L1 tables in the
>> + * directmap region. The reason is that the root page-table (i.e. L0)
>> + * is per-CPU and secondary CPUs will initialize their root page-table
>> + * based on the pCPU0 one. So L0 entries will be shared if they are
>> + * pre-populated. We also rely on the fact that L1 tables are never
>> + * freed.
> 
> You are saying that in case of directmap=no we are still creating empty
> L1 tables and L0 entries because secondary CPUs will need them when they
> initialize their root pagetables.
> 
> But why? Secondary CPUs will not be using the directmap either? Why do
> secondary CPUs need the empty L1 tables?

From the cover letter,

"
The subject is probably a misnomer. The directmap is still present but
the RAM is not mapped by default. Instead, the region will still be used
to map pages allocate via alloc_xenheap_pages().

The advantage is the solution is simple (so IHMO good enough for been
merged as a tech preview). The disadvantage is the page allocator is not
trying to keep all the xenheap pages together. So we may end up to have
an increase of page table usage.

In the longer term, we should consider to remove the direct map
completely and switch to vmap(). The main problem with this approach
is it is frequent to use mfn_to_virt() in the code. So we would need
to cache the mapping (maybe in the struct page_info).
"

I can add a summary in the commit message.
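
To make the ordering requirement concrete, here is a trimmed sketch of
init_secondary_pagetables() as quoted earlier in this thread (patches 19
and 20), with comments spelling out why the L0 slots must be populated
before secondary CPUs come up. This restates existing hunks (the
init_domheap_mappings() and clear_boot_pagetables() calls are dropped for
brevity); it is not new code:

    int init_secondary_pagetables(int cpu)
    {
        lpae_t *root = alloc_xenheap_page(); /* per-CPU root, L0 on arm64 */

        if ( !root )
        {
            printk("CPU%u: Unable to allocate the root page-table\n", cpu);
            return -ENOMEM;
        }

        /*
         * Each secondary CPU starts from a copy of pCPU0's root table.
         * Any L0 slot that pCPU0 populates after this copy would not be
         * visible here. The directmap region is still used to map xenheap
         * pages (see the cover letter quote above), so its L0 slots must
         * already point to (initially empty) L1 tables, which are shared
         * between CPUs and never freed.
         */
        memcpy(root, cpu0_pgtable, PAGE_SIZE);
        per_cpu(xen_pgtable, cpu) = root;

        /* Set init_ttbr for this CPU coming up */
        init_ttbr = __pa(root);
        clean_dcache(init_ttbr);

        return 0;
    }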

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 22/22] xen/arm64: Allow the admin to enable/disable the directmap
  2023-01-23 23:09     ` Julien Grall
@ 2023-01-24  0:12       ` Stefano Stabellini
  2023-01-24 18:06         ` Julien Grall
  0 siblings, 1 reply; 101+ messages in thread
From: Stefano Stabellini @ 2023-01-24  0:12 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, xen-devel, Julien Grall, Andrew Cooper,
	George Dunlap, Jan Beulich, Wei Liu, Bertrand Marquis,
	Volodymyr Babchuk

On Mon, 23 Jan 2023, Julien Grall wrote:
> Hi Stefano,
> 
> On 23/01/2023 22:52, Stefano Stabellini wrote:
> > On Fri, 16 Dec 2022, Julien Grall wrote:
> > > From: Julien Grall <jgrall@amazon.com>
> > > 
> > > Implement the same command line option as x86 to enable/disable the
> > > directmap. By default this is kept enabled.
> > > 
> > > Also modify setup_directmap_mappings() to populate the L0 entries
> > > related to the directmap area.
> > > 
> > > Signed-off-by: Julien Grall <jgrall@amazon.com>
> > > 
> > > ----
> > >      This patch is in an RFC state we need to decide what to do for arm32.
> > > 
> > >      Also, this is moving code that was introduced in this series. So
> > >      this will need to be fix in the next version (assuming Arm64 will
> > >      be ready).
> > > 
> > >      This was sent early as PoC to enable secret-free hypervisor
> > >      on Arm64.
> > > ---
> > >   docs/misc/xen-command-line.pandoc   |  2 +-
> > >   xen/arch/arm/include/asm/arm64/mm.h |  2 +-
> > >   xen/arch/arm/include/asm/mm.h       | 12 +++++----
> > >   xen/arch/arm/mm.c                   | 40 +++++++++++++++++++++++++++--
> > >   xen/arch/arm/setup.c                |  1 +
> > >   5 files changed, 48 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/docs/misc/xen-command-line.pandoc
> > > b/docs/misc/xen-command-line.pandoc
> > > index a63e4612acac..948035286acc 100644
> > > --- a/docs/misc/xen-command-line.pandoc
> > > +++ b/docs/misc/xen-command-line.pandoc
> > > @@ -760,7 +760,7 @@ Specify the size of the console debug trace buffer. By
> > > specifying `cpu:`
> > >   additionally a trace buffer of the specified size is allocated per cpu.
> > >   The debug trace feature is only enabled in debugging builds of Xen.
> > >   -### directmap (x86)
> > > +### directmap (arm64, x86)
> > >   > `= <boolean>`
> > >     > Default: `true`
> > > diff --git a/xen/arch/arm/include/asm/arm64/mm.h
> > > b/xen/arch/arm/include/asm/arm64/mm.h
> > > index aa2adac63189..8b5dcb091750 100644
> > > --- a/xen/arch/arm/include/asm/arm64/mm.h
> > > +++ b/xen/arch/arm/include/asm/arm64/mm.h
> > > @@ -7,7 +7,7 @@
> > >    */
> > >   static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned
> > > long nr)
> > >   {
> > > -    return true;
> > > +    return opt_directmap;
> > >   }
> > >     #endif /* __ARM_ARM64_MM_H__ */
> > > diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
> > > index d73abf1bf763..ef9ad3b366e3 100644
> > > --- a/xen/arch/arm/include/asm/mm.h
> > > +++ b/xen/arch/arm/include/asm/mm.h
> > > @@ -9,6 +9,13 @@
> > >   #include <public/xen.h>
> > >   #include <xen/pdx.h>
> > >   +extern bool opt_directmap;
> > > +
> > > +static inline bool arch_has_directmap(void)
> > > +{
> > > +    return opt_directmap;
> > > +}
> > > +
> > >   #if defined(CONFIG_ARM_32)
> > >   # include <asm/arm32/mm.h>
> > >   #elif defined(CONFIG_ARM_64)
> > > @@ -411,11 +418,6 @@ static inline void page_set_xenheap_gfn(struct
> > > page_info *p, gfn_t gfn)
> > >       } while ( (y = cmpxchg(&p->u.inuse.type_info, x, nx)) != x );
> > >   }
> > >   -static inline bool arch_has_directmap(void)
> > > -{
> > > -    return true;
> > > -}
> > > -
> > >   /* Helpers to allocate, map and unmap a Xen page-table */
> > >   int create_xen_table(lpae_t *entry);
> > >   lpae_t *xen_map_table(mfn_t mfn);
> > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> > > index f5fb957554a5..925d81c450e8 100644
> > > --- a/xen/arch/arm/mm.c
> > > +++ b/xen/arch/arm/mm.c
> > > @@ -15,6 +15,7 @@
> > >   #include <xen/init.h>
> > >   #include <xen/libfdt/libfdt.h>
> > >   #include <xen/mm.h>
> > > +#include <xen/param.h>
> > >   #include <xen/pfn.h>
> > >   #include <xen/pmap.h>
> > >   #include <xen/sched.h>
> > > @@ -131,6 +132,12 @@ vaddr_t directmap_virt_start __read_mostly;
> > >   unsigned long directmap_base_pdx __read_mostly;
> > >   #endif
> > >   +bool __ro_after_init opt_directmap = true;
> > > +/* TODO: Decide what to do for arm32. */
> > > +#ifdef CONFIG_ARM_64
> > > +boolean_param("directmap", opt_directmap);
> > > +#endif
> > > +
> > >   unsigned long frametable_base_pdx __read_mostly;
> > >   unsigned long frametable_virt_end __read_mostly;
> > >   @@ -606,16 +613,27 @@ void __init setup_directmap_mappings(unsigned long
> > > base_mfn,
> > >       directmap_virt_end = XENHEAP_VIRT_START + nr_mfns * PAGE_SIZE;
> > >   }
> > >   #else /* CONFIG_ARM_64 */
> > > -/* Map the region in the directmap area. */
> > > +/*
> > > + * This either populate a valid fdirect map, or allocates empty L1 tables
> > 
> > fdirect/direct
> > 
> > 
> > > + * and creates the L0 entries for the given region in the direct map
> > > + * depending on arch_has_directmap().
> > > + *
> > > + * When directmap=no, we still need to populate empty L1 tables in the
> > > + * directmap region. The reason is that the root page-table (i.e. L0)
> > > + * is per-CPU and secondary CPUs will initialize their root page-table
> > > + * based on the pCPU0 one. So L0 entries will be shared if they are
> > > + * pre-populated. We also rely on the fact that L1 tables are never
> > > + * freed.
> > 
> > You are saying that in case of directmap=no we are still creating empty
> > L1 tables and L0 entries because secondary CPUs will need them when they
> > initialize their root pagetables.
> > 
> > But why? Secondary CPUs will not be using the directmap either? Why do
> > seconday CPUs need the empty L1 tables?
> 
> From the cover letter,
> 
> "
> The subject is probably a misnomer. The directmap is still present but
> the RAM is not mapped by default. Instead, the region will still be used
> to map pages allocate via alloc_xenheap_pages().
> 
> The advantage is the solution is simple (so IHMO good enough for been
> merged as a tech preview). The disadvantage is the page allocator is not
> trying to keep all the xenheap pages together. So we may end up to have
> an increase of page table usage.
> 
> In the longer term, we should consider to remove the direct map
> completely and switch to vmap(). The main problem with this approach
> is it is frequent to use mfn_to_virt() in the code. So we would need
> to cache the mapping (maybe in the struct page_info).
> "

Ah yes! I see it now that we are relying on the same area
(alloc_xenheap_pages calls page_to_virt() then map_pages_to_xen()).

map_pages_to_xen() is able to create pagetable entries at every level,
but we need to make sure they are shared across per-cpu pagetables. To
make that happen, we are pre-creating the L0/L1 entries here so that
they become common across all per-cpu pagetables and then we let
map_pages_to_xen() do its job.

Did I understand it right?


> I can add summary in the commit message.

I would suggest to improve a bit the in-code comment on top of
setup_directmap_mappings. I might also be able to help with the text
once I am sure I understood what is going on :-)


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 22/22] xen/arm64: Allow the admin to enable/disable the directmap
  2023-01-24  0:12       ` Stefano Stabellini
@ 2023-01-24 18:06         ` Julien Grall
  2023-01-24 20:48           ` Stefano Stabellini
  0 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2023-01-24 18:06 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, Julien Grall, Andrew Cooper, George Dunlap,
	Jan Beulich, Wei Liu, Bertrand Marquis, Volodymyr Babchuk

Hi Stefano,

On 24/01/2023 00:12, Stefano Stabellini wrote:
> On Mon, 23 Jan 2023, Julien Grall wrote:
> Ah yes! I see it now that we are relying on the same area
> (alloc_xenheap_pages calls page_to_virt() then map_pages_to_xen()).
> 
> map_pages_to_xen() is able to create pagetable entries at every level,
> but we need to make sure they are shared across per-cpu pagetables. To
> make that happen, we are pre-creating the L0/L1 entries here so that
> they become common across all per-cpu pagetables and then we let
> map_pages_to_xen() do its job.
> 
> Did I understand it right?

Your understanding is correct.

>> I can add summary in the commit message.
> 
> I would suggest to improve a bit the in-code comment on top of
> setup_directmap_mappings. I might also be able to help with the text
> once I am sure I understood what is going on :-)

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 22/22] xen/arm64: Allow the admin to enable/disable the directmap
  2023-01-24 18:06         ` Julien Grall
@ 2023-01-24 20:48           ` Stefano Stabellini
  0 siblings, 0 replies; 101+ messages in thread
From: Stefano Stabellini @ 2023-01-24 20:48 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, xen-devel, Julien Grall, Andrew Cooper,
	George Dunlap, Jan Beulich, Wei Liu, Bertrand Marquis,
	Volodymyr Babchuk

On Tue, 24 Jan 2023, Julien Grall wrote:
> Hi Stefano,
> 
> On 24/01/2023 00:12, Stefano Stabellini wrote:
> > On Mon, 23 Jan 2023, Julien Grall wrote:
> > Ah yes! I see it now that we are relying on the same area
> > (alloc_xenheap_pages calls page_to_virt() then map_pages_to_xen()).
> > 
> > map_pages_to_xen() is able to create pagetable entries at every level,
> > but we need to make sure they are shared across per-cpu pagetables. To
> > make that happen, we are pre-creating the L0/L1 entries here so that
> > they become common across all per-cpu pagetables and then we let
> > map_pages_to_xen() do its job.
> > 
> > Did I understand it right?
> 
> Your understanding is correct.

Great!


> > > I can add summary in the commit message.
> > 
> > I would suggest to improve a bit the in-code comment on top of
> > setup_directmap_mappings. I might also be able to help with the text
> > once I am sure I understood what is going on :-)

How about this comment (feel free to edit/improve it as well, just a
suggestion):

In the !arch_has_directmap() case this function allocates empty L1
tables and creates the L0 entries for the direct map region.

When the direct map is disabled, alloc_xenheap_pages results in the page
being temporarily mapped in the usual xenheap address range via
map_pages_to_xen(). map_pages_to_xen() is able to create pagetable
entries at every level, but we need to make sure they are shared across
per-cpu pagetables. For this reason, this function creates the L0 entries
and empty L1 tables in advance, so that they become common across all
per-cpu pagetables.
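
If it helps, the comment could even be paired with a small sketch like the
one below (illustrative only: the L0_ENTRY_SIZE stride and the l0_entry_of()
lookup are placeholders I made up, only create_xen_table() is the real
helper from this series):

static int __init prepopulate_directmap_l1(vaddr_t start, vaddr_t end)
{
    vaddr_t va;

    for ( va = start; va < end; va += L0_ENTRY_SIZE /* placeholder stride */ )
    {
        /* l0_entry_of() stands in for looking up pCPU0's L0 slot for va. */
        lpae_t *entry = l0_entry_of(va);

        if ( !lpae_is_valid(*entry) )
        {
            /* Allocate an empty L1 table and link it from the shared L0. */
            int rc = create_xen_table(entry);

            if ( rc )
                return rc;
        }
    }

    return 0;
}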


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 05/22] x86/srat: vmap the pages for acpi_slit
  2022-12-23 11:31     ` Julien Grall
  2023-01-04 10:23       ` Jan Beulich
@ 2023-01-30 19:27       ` Julien Grall
  2023-01-31  9:11         ` Jan Beulich
  1 sibling, 1 reply; 101+ messages in thread
From: Julien Grall @ 2023-01-30 19:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Julien Grall

Hi Jan,

On 23/12/2022 11:31, Julien Grall wrote:
> On 20/12/2022 15:30, Jan Beulich wrote:
>> On 16.12.2022 12:48, Julien Grall wrote:
>>> From: Hongyan Xia <hongyxia@amazon.com>
>>>
>>> This avoids the assumption that boot pages are in the direct map.
>>>
>>> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>
>> However, ...
>>
>>> --- a/xen/arch/x86/srat.c
>>> +++ b/xen/arch/x86/srat.c
>>> @@ -139,7 +139,8 @@ void __init acpi_numa_slit_init(struct 
>>> acpi_table_slit *slit)
>>>           return;
>>>       }
>>>       mfn = alloc_boot_pages(PFN_UP(slit->header.length), 1);
>>> -    acpi_slit = mfn_to_virt(mfn_x(mfn));
>>> +    acpi_slit = vmap_contig_pages(mfn, PFN_UP(slit->header.length));
>>
>> ... with the increased use of vmap space the VA range used will need
>> growing. And that's perhaps better done ahead of time than late.
> 
> I will have a look to increase the vmap().

I have started to look at this. The current size of VMAP is 64GB.

At least in the setup I have, I didn't see any particular issue with the
existing size of the vmap. Looking through the history, the last bump was
done by one of your commits (see b0581b9214d2), but it is not clear what
the setup was.

Given I don't have a setup where the VMAP is exhausted it is not clear 
to me what would be an acceptable bump.

AFAICT, in PML4 slot 261, we still have 62GB reserved for future use. So
I was thinking of adding an extra 32GB, which would bring the VMAP to
96GB. This is just a number that doesn't use all the reserved space but
is still a power of two.
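
To spell out the arithmetic behind those numbers:

    64GB (current vmap)       + 32GB (proposed bump) = 96GB
    62GB (currently reserved) - 32GB                 = 30GB left reserved in slot 261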

Are you fine with that?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 05/22] x86/srat: vmap the pages for acpi_slit
  2023-01-30 19:27       ` Julien Grall
@ 2023-01-31  9:11         ` Jan Beulich
  2023-01-31 21:37           ` Julien Grall
  0 siblings, 1 reply; 101+ messages in thread
From: Jan Beulich @ 2023-01-31  9:11 UTC (permalink / raw)
  To: Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Julien Grall

On 30.01.2023 20:27, Julien Grall wrote:
> Hi Jan,
> 
> On 23/12/2022 11:31, Julien Grall wrote:
>> On 20/12/2022 15:30, Jan Beulich wrote:
>>> On 16.12.2022 12:48, Julien Grall wrote:
>>>> From: Hongyan Xia <hongyxia@amazon.com>
>>>>
>>>> This avoids the assumption that boot pages are in the direct map.
>>>>
>>>> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> However, ...
>>>
>>>> --- a/xen/arch/x86/srat.c
>>>> +++ b/xen/arch/x86/srat.c
>>>> @@ -139,7 +139,8 @@ void __init acpi_numa_slit_init(struct 
>>>> acpi_table_slit *slit)
>>>>           return;
>>>>       }
>>>>       mfn = alloc_boot_pages(PFN_UP(slit->header.length), 1);
>>>> -    acpi_slit = mfn_to_virt(mfn_x(mfn));
>>>> +    acpi_slit = vmap_contig_pages(mfn, PFN_UP(slit->header.length));
>>>
>>> ... with the increased use of vmap space the VA range used will need
>>> growing. And that's perhaps better done ahead of time than late.
>>
>> I will have a look to increase the vmap().
> 
> I have started to look at this. The current size of VMAP is 64GB.
> 
> At least in the setup I have I didn't see any particular issue with the 
> existing size of the vmap. Looking through the history, the last time it 
> was bumped by one of your commit (see b0581b9214d2) but it is not clear 
> what was the setup.
> 
> Given I don't have a setup where the VMAP is exhausted it is not clear 
> to me what would be an acceptable bump.
> 
> AFAICT, in PML4 slot 261, we still have 62GB reserved for future. So I 
> was thinking to add an extra 32GB which would bring the VMAP to 96GB. 
> This is just a number that doesn't use all the reserved space but still 
> a power of two.
> 
> Are you fine with that?

Hmm. Leaving aside that 96Gb isn't a power of two, my comment saying
"ahead of time" was under the (wrong, as it now looks) impression that
the goal of your series was to truly do away with the directmap. I was
therefore expecting a much larger bump in size, perhaps moving the
vmap area into space presently occupied by the directmap. IOW for the
time being, with no _significant_ increase of space consumption, we
may well be fine with the 64Gb range.

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 05/22] x86/srat: vmap the pages for acpi_slit
  2023-01-31  9:11         ` Jan Beulich
@ 2023-01-31 21:37           ` Julien Grall
  0 siblings, 0 replies; 101+ messages in thread
From: Julien Grall @ 2023-01-31 21:37 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Julien Grall

Hi Jan,

On 31/01/2023 09:11, Jan Beulich wrote:
> On 30.01.2023 20:27, Julien Grall wrote:
>> Hi Jan,
>>
>> On 23/12/2022 11:31, Julien Grall wrote:
>>> On 20/12/2022 15:30, Jan Beulich wrote:
>>>> On 16.12.2022 12:48, Julien Grall wrote:
>>>>> From: Hongyan Xia <hongyxia@amazon.com>
>>>>>
>>>>> This avoids the assumption that boot pages are in the direct map.
>>>>>
>>>>> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>>
>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>>
>>>> However, ...
>>>>
>>>>> --- a/xen/arch/x86/srat.c
>>>>> +++ b/xen/arch/x86/srat.c
>>>>> @@ -139,7 +139,8 @@ void __init acpi_numa_slit_init(struct
>>>>> acpi_table_slit *slit)
>>>>>            return;
>>>>>        }
>>>>>        mfn = alloc_boot_pages(PFN_UP(slit->header.length), 1);
>>>>> -    acpi_slit = mfn_to_virt(mfn_x(mfn));
>>>>> +    acpi_slit = vmap_contig_pages(mfn, PFN_UP(slit->header.length));
>>>>
>>>> ... with the increased use of vmap space the VA range used will need
>>>> growing. And that's perhaps better done ahead of time than late.
>>>
>>> I will have a look to increase the vmap().
>>
>> I have started to look at this. The current size of VMAP is 64GB.
>>
>> At least in the setup I have I didn't see any particular issue with the
>> existing size of the vmap. Looking through the history, the last time it
>> was bumped by one of your commit (see b0581b9214d2) but it is not clear
>> what was the setup.
>>
>> Given I don't have a setup where the VMAP is exhausted it is not clear
>> to me what would be an acceptable bump.
>>
>> AFAICT, in PML4 slot 261, we still have 62GB reserved for future. So I
>> was thinking to add an extra 32GB which would bring the VMAP to 96GB.
>> This is just a number that doesn't use all the reserved space but still
>> a power of two.
>>
>> Are you fine with that?
> 
> Hmm. Leaving aside that 96Gb isn't a power of two, my comment saying
> "ahead of time" was under the (wrong, as it now looks) impression that
> the goal of your series was to truly do away with the directmap.

Yes, the directmap is still present with this series. There are more 
work to completely remove the vmap() (see the cover letter for some 
details) and would prefer if this is separated from this series.


> I was
> therefore expecting a much larger bump in size, perhaps moving the
> vmap area into space presently occupied by the directmap. IOW for the
> time being, with no _significant_ increase of space consumption, we
> may well be fine with the 64Gb range.

Ok. I will keep it in mind if I am working completely removing the 
directmap.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 06/22] x86: map/unmap pages in restore_all_guests
  2023-01-13  9:22           ` Jan Beulich
@ 2023-06-22 10:44             ` Julien Grall
  2023-06-22 13:19               ` Jan Beulich
  0 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2023-06-22 10:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Julien Grall, xen-devel

Hi Jan,

Sorry for the late reply.

On 13/01/2023 09:22, Jan Beulich wrote:
> On 13.01.2023 00:20, Julien Grall wrote:
>> On 04/01/2023 10:27, Jan Beulich wrote:
>>> On 23.12.2022 13:22, Julien Grall wrote:
>>>> On 22/12/2022 11:12, Jan Beulich wrote:
>>>>> On 16.12.2022 12:48, Julien Grall wrote:
>>>>>> --- a/xen/arch/x86/x86_64/entry.S
>>>>>> +++ b/xen/arch/x86/x86_64/entry.S
>>>>>> @@ -165,7 +165,24 @@ restore_all_guest:
>>>>>>             and   %rsi, %rdi
>>>>>>             and   %r9, %rsi
>>>>>>             add   %rcx, %rdi
>>>>>> -        add   %rcx, %rsi
>>>>>> +
>>>>>> +         /*
>>>>>> +          * Without a direct map, we have to map first before copying. We only
>>>>>> +          * need to map the guest root table but not the per-CPU root_pgt,
>>>>>> +          * because the latter is still a xenheap page.
>>>>>> +          */
>>>>>> +        pushq %r9
>>>>>> +        pushq %rdx
>>>>>> +        pushq %rax
>>>>>> +        pushq %rdi
>>>>>> +        mov   %rsi, %rdi
>>>>>> +        shr   $PAGE_SHIFT, %rdi
>>>>>> +        callq map_domain_page
>>>>>> +        mov   %rax, %rsi
>>>>>> +        popq  %rdi
>>>>>> +        /* Stash the pointer for unmapping later. */
>>>>>> +        pushq %rax
>>>>>> +
>>>>>>             mov   $ROOT_PAGETABLE_FIRST_XEN_SLOT, %ecx
>>>>>>             mov   root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rsi), %r8
>>>>>>             mov   %r8, root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rdi)
>>>>>> @@ -177,6 +194,14 @@ restore_all_guest:
>>>>>>             sub   $(ROOT_PAGETABLE_FIRST_XEN_SLOT - \
>>>>>>                     ROOT_PAGETABLE_LAST_XEN_SLOT - 1) * 8, %rdi
>>>>>>             rep movsq
>>>>>> +
>>>>>> +        /* Unmap the page. */
>>>>>> +        popq  %rdi
>>>>>> +        callq unmap_domain_page
>>>>>> +        popq  %rax
>>>>>> +        popq  %rdx
>>>>>> +        popq  %r9
>>>>>
>>>>> While the PUSH/POP are part of what I dislike here, I think this wants
>>>>> doing differently: Establish a mapping when putting in place a new guest
>>>>> page table, and use the pointer here. This could be a new per-domain
>>>>> mapping, to limit its visibility.
>>>>
>>>> I have looked at a per-domain approach and this looks way more complex
>>>> than the few concise lines here (not mentioning the extra amount of
>>>> memory).
>>>
>>> Yes, I do understand that would be a more intrusive change.
>>
>> I could be persuaded to look at a more intrusive change if there are a
>> good reason to do it. To me, at the moment, it mostly seem a matter of
>> taste.
>>
>> So what would we gain from a perdomain mapping?
> 
> Rather than mapping/unmapping once per hypervisor entry/exit, we'd
> map just once per context switch. Plus we'd save ugly/fragile assembly
> code (apart from the push/pop I also dislike C functions being called
> from assembly which aren't really meant to be called this way: While
> these two may indeed be unlikely to ever change, any such change comes
> with the risk of the assembly callers being missed - the compiler
> won't tell you that e.g. argument types/count don't match parameters
> anymore).

I think I have managed to write what you suggested. I would like to
share it to get early feedback before resending the series.

There are also a couple of TODOs (XXX) in places where I am not sure if
this is correct.

diff --git a/xen/arch/x86/include/asm/config.h b/xen/arch/x86/include/asm/config.h
index fbc4bb3416bd..320ddb9e1e77 100644
--- a/xen/arch/x86/include/asm/config.h
+++ b/xen/arch/x86/include/asm/config.h
@@ -202,7 +202,7 @@ extern unsigned char boot_edid_info[128];
  /* Slot 260: per-domain mappings (including map cache). */
  #define PERDOMAIN_VIRT_START    (PML4_ADDR(260))
  #define PERDOMAIN_SLOT_MBYTES   (PML4_ENTRY_BYTES >> (20 + PAGETABLE_ORDER))
-#define PERDOMAIN_SLOTS         3
+#define PERDOMAIN_SLOTS         4
  #define PERDOMAIN_VIRT_SLOT(s)  (PERDOMAIN_VIRT_START + (s) * \
                                   (PERDOMAIN_SLOT_MBYTES << 20))
  /* Slot 4: mirror of per-domain mappings (for compat xlat area accesses). */
@@ -316,6 +316,16 @@ extern unsigned long xen_phys_start;
  #define ARG_XLAT_START(v)        \
      (ARG_XLAT_VIRT_START + ((v)->vcpu_id << ARG_XLAT_VA_SHIFT))

+/* CR3 shadow mapping area. The fourth per-domain-mapping sub-area */
+#define SHADOW_CR3_VIRT_START   PERDOMAIN_VIRT_SLOT(3)
+#define SHADOW_CR3_ENTRIES      MAX_VIRT_CPUS
+#define SHADOW_CR3_VIRT_END     (SHADOW_CR3_VIRT_START +    \
+                                 (MAX_VIRT_CPUS * PAGE_SIZE))
+
+/* The address of a particular VCPU's cr3 */
+#define SHADOW_CR3_VCPU_VIRT_START(v) \
+    (SHADOW_CR3_VIRT_START + ((v)->vcpu_id * PAGE_SIZE))
+
  #define ELFSIZE 64

  #define ARCH_CRASH_SAVE_VMCOREINFO
diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
index c2d9fc333be5..d5989224f4a3 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -273,6 +273,7 @@ struct time_scale {
  struct pv_domain
  {
      l1_pgentry_t **gdt_ldt_l1tab;
+    l1_pgentry_t **shadow_cr3_l1tab;

      atomic_t nr_l4_pages;

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 9741d28cbc96..b64ee1ca47f6 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -509,6 +509,13 @@ void share_xen_page_with_guest(struct page_info *page, struct domain *d,
      spin_unlock(&d->page_alloc_lock);
  }

+#define shadow_cr3_idx(v) \
+    ((v)->vcpu_id >> PAGETABLE_ORDER)
+
+#define pv_shadow_cr3_pte(v) \
+    ((v)->domain->arch.pv.shadow_cr3_l1tab[shadow_cr3_idx(v)] + \
+     ((v)->vcpu_id & (L1_PAGETABLE_ENTRIES - 1)))
+
  void make_cr3(struct vcpu *v, mfn_t mfn)
  {
      struct domain *d = v->domain;
@@ -516,6 +523,18 @@ void make_cr3(struct vcpu *v, mfn_t mfn)
      v->arch.cr3 = mfn_x(mfn) << PAGE_SHIFT;
      if ( is_pv_domain(d) && d->arch.pv.pcid )
          v->arch.cr3 |= get_pcid_bits(v, false);
+
+    /* Update the CR3 mapping */
+    if ( is_pv_domain(d) )
+    {
+        l1_pgentry_t *pte = pv_shadow_cr3_pte(v);
+
+        /* XXX Do we need to call get page first? */
+        l1e_write(pte, l1e_from_mfn(mfn, __PAGE_HYPERVISOR_RW));
+        /* XXX Can the flush be reduced to the page? */
+        /* XXX Do we always call with current? */
+        flush_tlb_local();
+    }
  }

  void write_ptbase(struct vcpu *v)
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index 5c92812dc67a..064645ccc261 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -288,6 +288,19 @@ static void pv_destroy_gdt_ldt_l1tab(struct vcpu *v)
                                1U << GDT_LDT_VCPU_SHIFT);
  }

+static int pv_create_shadow_cr3_l1tab(struct vcpu *v)
+{
+    return create_perdomain_mapping(v->domain, SHADOW_CR3_VCPU_VIRT_START(v),
+                                    1, v->domain->arch.pv.shadow_cr3_l1tab,
+                                    NULL);
+}
+
+static void pv_destroy_shadow_cr3_l1tab(struct vcpu *v)
+
+{
+    destroy_perdomain_mapping(v->domain, SHADOW_CR3_VCPU_VIRT_START(v), 1);
+}
+
  void pv_vcpu_destroy(struct vcpu *v)
  {
      if ( is_pv_32bit_vcpu(v) )
@@ -297,6 +310,7 @@ void pv_vcpu_destroy(struct vcpu *v)
      }

      pv_destroy_gdt_ldt_l1tab(v);
+    pv_destroy_shadow_cr3_l1tab(v);
      XFREE(v->arch.pv.trap_ctxt);
  }

@@ -311,6 +325,10 @@ int pv_vcpu_initialise(struct vcpu *v)
      if ( rc )
          return rc;

+    rc = pv_create_shadow_cr3_l1tab(v);
+    if ( rc )
+        goto done;
+
      BUILD_BUG_ON(X86_NR_VECTORS * sizeof(*v->arch.pv.trap_ctxt) >
                   PAGE_SIZE);
      v->arch.pv.trap_ctxt = xzalloc_array(struct trap_info, X86_NR_VECTORS);
@@ -346,10 +364,12 @@ void pv_domain_destroy(struct domain *d)

      destroy_perdomain_mapping(d, GDT_LDT_VIRT_START,
                                GDT_LDT_MBYTES << (20 - PAGE_SHIFT));
+    destroy_perdomain_mapping(d, SHADOW_CR3_VIRT_START, SHADOW_CR3_ENTRIES);

      XFREE(d->arch.pv.cpuidmasks);

      FREE_XENHEAP_PAGE(d->arch.pv.gdt_ldt_l1tab);
+    FREE_XENHEAP_PAGE(d->arch.pv.shadow_cr3_l1tab);
  }

  void noreturn cf_check continue_pv_domain(void);
@@ -371,6 +391,12 @@ int pv_domain_initialise(struct domain *d)
          goto fail;
      clear_page(d->arch.pv.gdt_ldt_l1tab);

+    d->arch.pv.shadow_cr3_l1tab =
+        alloc_xenheap_pages(0, MEMF_node(domain_to_node(d)));
+    if ( !d->arch.pv.shadow_cr3_l1tab )
+        goto fail;
+    clear_page(d->arch.pv.shadow_cr3_l1tab);
+
      if ( levelling_caps & ~LCAP_faulting &&
           (d->arch.pv.cpuidmasks = xmemdup(&cpuidmask_defaults)) == NULL )
          goto fail;
@@ -381,6 +407,11 @@ int pv_domain_initialise(struct domain *d)
      if ( rc )
          goto fail;

+    rc = create_perdomain_mapping(d, SHADOW_CR3_VIRT_START,
+                                  SHADOW_CR3_ENTRIES, NULL, NULL);
+    if ( rc )
+        goto fail;
+
      d->arch.ctxt_switch = &pv_csw;

      d->arch.pv.xpti = is_hardware_domain(d) ? opt_xpti_hwdom : opt_xpti_domu;
diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index 287dac101ad4..ed486607bf15 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -51,6 +51,7 @@ void __dummy__(void)
      OFFSET(UREGS_kernel_sizeof, struct cpu_user_regs, es);
      BLANK();

+    OFFSET(VCPU_id, struct vcpu, vcpu_id);
      OFFSET(VCPU_processor, struct vcpu, processor);
      OFFSET(VCPU_domain, struct vcpu, domain);
      OFFSET(VCPU_vcpu_info, struct vcpu, vcpu_info);
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 8b77d7113bbf..678876a32177 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -165,7 +165,16 @@ restore_all_guest:
          and   %rsi, %rdi
          and   %r9, %rsi
          add   %rcx, %rdi
+
+        /*
+         * The address in the vCPU cr3 is always mapped in the shadow
+         * cr3 virt area.
+         */
+        mov   VCPU_id(%rbx), %rsi
+        shl   $PAGE_SHIFT, %rsi
+        movabs $SHADOW_CR3_VIRT_START, %rcx
          add   %rcx, %rsi
+
          mov   $ROOT_PAGETABLE_FIRST_XEN_SLOT, %ecx
          mov   root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rsi), %r8
          mov   %r8, root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rdi)
> 
> Jan

-- 
Julien Grall


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* Re: [PATCH 06/22] x86: map/unmap pages in restore_all_guests
  2023-06-22 10:44             ` Julien Grall
@ 2023-06-22 13:19               ` Jan Beulich
  0 siblings, 0 replies; 101+ messages in thread
From: Jan Beulich @ 2023-06-22 13:19 UTC (permalink / raw)
  To: Julien Grall
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, Julien Grall, xen-devel

On 22.06.2023 12:44, Julien Grall wrote:
> On 13/01/2023 09:22, Jan Beulich wrote:
>> On 13.01.2023 00:20, Julien Grall wrote:
>>> On 04/01/2023 10:27, Jan Beulich wrote:
>>>> On 23.12.2022 13:22, Julien Grall wrote:
>>>>> On 22/12/2022 11:12, Jan Beulich wrote:
>>>>>> On 16.12.2022 12:48, Julien Grall wrote:
>>>>>>> --- a/xen/arch/x86/x86_64/entry.S
>>>>>>> +++ b/xen/arch/x86/x86_64/entry.S
>>>>>>> @@ -165,7 +165,24 @@ restore_all_guest:
>>>>>>>             and   %rsi, %rdi
>>>>>>>             and   %r9, %rsi
>>>>>>>             add   %rcx, %rdi
>>>>>>> -        add   %rcx, %rsi
>>>>>>> +
>>>>>>> +         /*
>>>>>>> +          * Without a direct map, we have to map first before copying. We only
>>>>>>> +          * need to map the guest root table but not the per-CPU root_pgt,
>>>>>>> +          * because the latter is still a xenheap page.
>>>>>>> +          */
>>>>>>> +        pushq %r9
>>>>>>> +        pushq %rdx
>>>>>>> +        pushq %rax
>>>>>>> +        pushq %rdi
>>>>>>> +        mov   %rsi, %rdi
>>>>>>> +        shr   $PAGE_SHIFT, %rdi
>>>>>>> +        callq map_domain_page
>>>>>>> +        mov   %rax, %rsi
>>>>>>> +        popq  %rdi
>>>>>>> +        /* Stash the pointer for unmapping later. */
>>>>>>> +        pushq %rax
>>>>>>> +
>>>>>>>             mov   $ROOT_PAGETABLE_FIRST_XEN_SLOT, %ecx
>>>>>>>             mov   root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rsi), %r8
>>>>>>>             mov   %r8, root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rdi)
>>>>>>> @@ -177,6 +194,14 @@ restore_all_guest:
>>>>>>>             sub   $(ROOT_PAGETABLE_FIRST_XEN_SLOT - \
>>>>>>>                     ROOT_PAGETABLE_LAST_XEN_SLOT - 1) * 8, %rdi
>>>>>>>             rep movsq
>>>>>>> +
>>>>>>> +        /* Unmap the page. */
>>>>>>> +        popq  %rdi
>>>>>>> +        callq unmap_domain_page
>>>>>>> +        popq  %rax
>>>>>>> +        popq  %rdx
>>>>>>> +        popq  %r9
>>>>>>
>>>>>> While the PUSH/POP are part of what I dislike here, I think this wants
>>>>>> doing differently: Establish a mapping when putting in place a new guest
>>>>>> page table, and use the pointer here. This could be a new per-domain
>>>>>> mapping, to limit its visibility.
>>>>>
>>>>> I have looked at a per-domain approach and this looks way more complex
>>>>> than the few concise lines here (not mentioning the extra amount of
>>>>> memory).
>>>>
>>>> Yes, I do understand that would be a more intrusive change.
>>>
>>> I could be persuaded to look at a more intrusive change if there are a
>>> good reason to do it. To me, at the moment, it mostly seem a matter of
>>> taste.
>>>
>>> So what would we gain from a perdomain mapping?
>>
>> Rather than mapping/unmapping once per hypervisor entry/exit, we'd
>> map just once per context switch. Plus we'd save ugly/fragile assembly
>> code (apart from the push/pop I also dislike C functions being called
>> from assembly which aren't really meant to be called this way: While
>> these two may indeed be unlikely to ever change, any such change comes
>> with the risk of the assembly callers being missed - the compiler
>> won't tell you that e.g. argument types/count don't match parameters
>> anymore).
> 
> I think I have managed to write what you suggested. I would like to 
> share to get early feedback before resending the series.
> 
> There are also a couple of TODOs (XXX) in place where I am not sure if 
> this is correct.

Sure, some comments below. But note that this isn't a full review. One
remark up front: The CR3 part of the names doesn't match what you map,
as it's not the register but the page that it points to. I'd suggest
"rootpt" (or "root_pt") as the respective part of the names instead.

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -509,6 +509,13 @@ void share_xen_page_with_guest(struct page_info 
> *page, struct domain *d,
>       spin_unlock(&d->page_alloc_lock);
>   }
> 
> +#define shadow_cr3_idx(v) \
> +    ((v)->vcpu_id >> PAGETABLE_ORDER)
> +
> +#define pv_shadow_cr3_pte(v) \
> +    ((v)->domain->arch.pv.shadow_cr3_l1tab[shadow_cr3_idx(v)] + \
> +     ((v)->vcpu_id & (L1_PAGETABLE_ENTRIES - 1)))
> +
>   void make_cr3(struct vcpu *v, mfn_t mfn)
>   {
>       struct domain *d = v->domain;
> @@ -516,6 +523,18 @@ void make_cr3(struct vcpu *v, mfn_t mfn)
>       v->arch.cr3 = mfn_x(mfn) << PAGE_SHIFT;
>       if ( is_pv_domain(d) && d->arch.pv.pcid )
>           v->arch.cr3 |= get_pcid_bits(v, false);
> +
> +    /* Update the CR3 mapping */
> +    if ( is_pv_domain(d) )
> +    {
> +        l1_pgentry_t *pte = pv_shadow_cr3_pte(v);
> +
> +        /* XXX Do we need to call get page first? */

I don't think so. You piggy-back on the reference obtained when the
page address is stored in v->arch.cr3. What you need to be sure of
though is that there can't be a stale mapping left once that value is
replaced. I think the place here is the one central one, but this
will want double checking.

> +        l1e_write(pte, l1e_from_mfn(mfn, __PAGE_HYPERVISOR_RW));
> +        /* XXX Can the flush be reduced to the page? */

I think so; any reason you think more needs flushing? I'd rather
raise the question whether any flushing is needed at all. Before
this mapping can come into use, there necessarily is a CR3 write.
See also below.

> +        /* XXX Do we always call with current? */

I don't think we do. See e.g. arch_set_info_guest() or some of the
calls here from shadow code. However, I think when v != current, it
is always the case that v is paused. In which case no flushing would
be needed at all then, only when v == current.

Another question is whether this is the best place to make the
mapping. On one hand it is true that the way you do it, the mapping
isn't even re-written on each context switch. Otoh having it in
write_ptbase() may be the more natural (easier to prove as correct,
and that no dangling mappings can be left) place. For example then
you'll know that v == current in all cases (avoiding the other code
paths, examples of which I gave above). Plus explicit flushing can
be omitted, as switch_cr3_cr4() will always flush all non-global
mappings.

> +        flush_tlb_local();
> +    }
>   }
> 
>   void write_ptbase(struct vcpu *v)
>[...]
> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -165,7 +165,16 @@ restore_all_guest:
>           and   %rsi, %rdi
>           and   %r9, %rsi
>           add   %rcx, %rdi
> +
> +        /*
> +         * The address in the vCPU cr3 is always mapped in the shadow
> +         * cr3 virt area.
> +         */
> +        mov   VCPU_id(%rbx), %rsi

The field is 32 bits, so you need to use %esi here.

> +        shl   $PAGE_SHIFT, %rsi

I wonder whether these two wouldn't sensibly be combined to

        imul   $PAGE_SIZE, VCPU_id(%rbx), %esi

as the result is guaranteed to fit in 32 bits.

A final remark, with no good place to attach it to: The code path above
is bypassed when xpti is off for the domain. You may want to avoid all
of the setup (and mapping) in that case. This, btw, could be done quite
naturally if - as outlined above as an alternative - the mapping
occurred in write_ptbase(): The function already distinguishes the two
cases.
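
To illustrate what I mean, a sketch only (not meant as the final code;
pv_shadow_cr3_pte() is the helper from your draft, and I'm glossing over
the exact CR3-to-MFN masking):

void write_ptbase(struct vcpu *v)
{
    if ( is_pv_domain(v->domain) && v->domain->arch.pv.xpti )
    {
        l1_pgentry_t *pte = pv_shadow_cr3_pte(v);

        /*
         * v == current here, and the CR3/CR4 switch done below already
         * flushes non-global mappings, so no explicit flush_tlb_local()
         * is needed.
         */
        l1e_write(pte, l1e_from_mfn(maddr_to_mfn(v->arch.cr3 & PAGE_MASK),
                                    __PAGE_HYPERVISOR_RW));
    }

    /* ... existing body unchanged, ending in the switch_cr3_cr4() call ... */
}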

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 08/22] x86/pv: rewrite how building PV dom0 handles domheap mappings
  2022-12-22 11:48   ` Jan Beulich
@ 2024-01-10 12:50     ` El Yandouzi, Elias
  0 siblings, 0 replies; 101+ messages in thread
From: El Yandouzi, Elias @ 2024-01-10 12:50 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Julien Grall, xen-devel

Hi Jan,

I have been looking at this series recently and have tried my best
to address your comments. I'll get to the other patches shortly too.

On 22/12/2022 11:48, Jan Beulich wrote:
> On 16.12.2022 12:48, Julien Grall wrote:
>> From: Hongyan Xia <hongyxia@amazon.com>
>>
>> Building a PV dom0 is allocating from the domheap but uses it like the
>> xenheap. This is clearly wrong. Fix.
> 
> "Clearly wrong" would mean there's a bug here, at lest under certain
> conditions. But there isn't: Even on huge systems, due to running on
> idle page tables, all memory is mapped at present.

I agree with you, I'll rephrase the commit message.

> 
>> @@ -711,22 +715,32 @@ int __init dom0_construct_pv(struct domain *d,
>>           v->arch.pv.event_callback_cs    = FLAT_COMPAT_KERNEL_CS;
>>       }
>>   
>> +#define UNMAP_MAP_AND_ADVANCE(mfn_var, virt_var, maddr) \
>> +do {                                                    \
>> +    UNMAP_DOMAIN_PAGE(virt_var);                        \
> 
> Not much point using the macro when ...
> 
>> +    mfn_var = maddr_to_mfn(maddr);                      \
>> +    maddr += PAGE_SIZE;                                 \
>> +    virt_var = map_domain_page(mfn_var);                \
> 
> ... the variable gets reset again to non-NULL unconditionally right
> away.

Sure, I'll change that.

> 
>> +} while ( false )
> 
> This being a local macro and all use sites passing mpt_alloc as the
> last argument, I think that parameter wants dropping, which would
> improve readability.

I have to disagree. It wouldn't improve readability but would only make
things more obscure. I'll keep the macro as is.

> 
>> @@ -792,9 +808,9 @@ int __init dom0_construct_pv(struct domain *d,
>>               if ( !l3e_get_intpte(*l3tab) )
>>               {
>>                   maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l2_page_table;
>> -                l2tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
>> -                clear_page(l2tab);
>> -                *l3tab = l3e_from_paddr(__pa(l2tab), L3_PROT);
>> +                UNMAP_MAP_AND_ADVANCE(l2start_mfn, l2start, mpt_alloc);
>> +                clear_page(l2start);
>> +                *l3tab = l3e_from_mfn(l2start_mfn, L3_PROT);
> 
> The l2start you map on the last iteration here can be re-used ...
> 
>> @@ -805,9 +821,17 @@ int __init dom0_construct_pv(struct domain *d,
>>           unmap_domain_page(l2t);
>>       }
> 
> ... in the code the tail of which is visible here, eliminating a
> redundant map/unmap pair.

Good catch, I'll remove the redundant pair.

> 
>> @@ -977,8 +1001,12 @@ int __init dom0_construct_pv(struct domain *d,
>>        * !CONFIG_VIDEO case so the logic here can be simplified.
>>        */
>>       if ( pv_shim )
>> +    {
>> +        l4start = map_domain_page(l4start_mfn);
>>           pv_shim_setup_dom(d, l4start, v_start, vxenstore_start, vconsole_start,
>>                             vphysmap_start, si);
>> +        UNMAP_DOMAIN_PAGE(l4start);
>> +    }
> 
> The, at the first glance, redundant re-mapping of the L4 table here could
> do with explaining in the description. However, I further wonder in how
> far in shim mode eliminating the direct map is actually useful. Which is
> to say that I question the need for this change in the first place. Or
> wait - isn't this (unlike the rest of this patch) actually a bug fix? At
> this point we're on the domain's page tables, which may not cover the
> page the L4 is allocated at (if a truly huge shim was configured). So I
> guess the change is needed but wants breaking out, allowing to at least
> consider whether to backport it.
> 

I will create a separate patch for this change.

> Jan
> 


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 10/22] x86/mapcache: initialise the mapcache for the idle domain
  2022-12-22 13:06   ` Jan Beulich
@ 2024-01-10 16:24     ` Elias El Yandouzi
  2024-01-11  7:53       ` Jan Beulich
  0 siblings, 1 reply; 101+ messages in thread
From: Elias El Yandouzi @ 2024-01-10 16:24 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Wei Wang, Julien Grall, xen-devel

Hi,

On 22/12/2022 13:06, Jan Beulich wrote:
> On 16.12.2022 12:48, Julien Grall wrote:
>> From: Hongyan Xia <hongyxia@amazon.com>
>>
>> In order to use the mapcache in the idle domain, we also have to
>> populate its page tables in the PERDOMAIN region, and we need to move
>> mapcache_domain_init() earlier in arch_domain_create().
>>
>> Note, commit 'x86: lift mapcache variable to the arch level' has
>> initialised the mapcache for HVM domains. With this patch, PV, HVM,
>> idle domains now all initialise the mapcache.
> 
> But they can't use it yet, can they? This needs saying explicitly, or
> else one is going to make wrong implications.
> 

Yes, I tried to use the mapcache right after the idle vCPU gets 
scheduled and it worked. So, I believe it is enough.

>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -732,6 +732,8 @@ int arch_domain_create(struct domain *d,
>>   
>>       spin_lock_init(&d->arch.e820_lock);
>>   
>> +    mapcache_domain_init(d);
>> +
>>       /* Minimal initialisation for the idle domain. */
>>       if ( unlikely(is_idle_domain(d)) )
>>       {
>> @@ -829,8 +831,6 @@ int arch_domain_create(struct domain *d,
>>   
>>       psr_domain_init(d);
>>   
>> -    mapcache_domain_init(d);
> 
> You move this ahead of error paths taking the "goto out" route, so
> adjustments to affected error paths are going to be needed to avoid
> memory leaks.

Correct, I'll fix that.

> 
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -5963,6 +5963,9 @@ int create_perdomain_mapping(struct domain *d, unsigned long va,
>>           l3tab = __map_domain_page(pg);
>>           clear_page(l3tab);
>>           d->arch.perdomain_l3_pg = pg;
>> +        if ( is_idle_domain(d) )
>> +            idle_pg_table[l4_table_offset(PERDOMAIN_VIRT_START)] =
>> +                l4e_from_page(pg, __PAGE_HYPERVISOR_RW);
> 
> Hmm, having an idle domain check here isn't very nice. I agree putting
> it in arch_domain_create()'s respective conditional isn't very neat
> either, but personally I'd consider this at least a little less bad.
> And the layering violation aspect isn't much worse than that of setting
> d->arch.ctxt_switch there as well.
> 

Why do you think it would be less bad to move it in 
arch_domain_create()? To me, it would make things worse as it would 
spread the mapping stuff across different functions.
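
For concreteness, the variant you describe would look roughly like the
sketch below (reusing the names from the hunk above, and assuming
mapcache_domain_init() has already set up d->arch.perdomain_l3_pg by this
point):

    /* In arch_domain_create()'s existing idle-domain special case: */
    if ( unlikely(is_idle_domain(d)) )
    {
        /* ... minimal initialisation for the idle domain ... */
        idle_pg_table[l4_table_offset(PERDOMAIN_VIRT_START)] =
            l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
    }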

-- 
Elias


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 10/22] x86/mapcache: initialise the mapcache for the idle domain
  2024-01-10 16:24     ` Elias El Yandouzi
@ 2024-01-11  7:53       ` Jan Beulich
  0 siblings, 0 replies; 101+ messages in thread
From: Jan Beulich @ 2024-01-11  7:53 UTC (permalink / raw)
  To: Elias El Yandouzi
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Wei Wang, Julien Grall, xen-devel, Julien Grall

On 10.01.2024 17:24, Elias El Yandouzi wrote:
> On 22/12/2022 13:06, Jan Beulich wrote:
>> On 16.12.2022 12:48, Julien Grall wrote:
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -5963,6 +5963,9 @@ int create_perdomain_mapping(struct domain *d, unsigned long va,
>>>           l3tab = __map_domain_page(pg);
>>>           clear_page(l3tab);
>>>           d->arch.perdomain_l3_pg = pg;
>>> +        if ( is_idle_domain(d) )
>>> +            idle_pg_table[l4_table_offset(PERDOMAIN_VIRT_START)] =
>>> +                l4e_from_page(pg, __PAGE_HYPERVISOR_RW);
>>
>> Hmm, having an idle domain check here isn't very nice. I agree putting
>> it in arch_domain_create()'s respective conditional isn't very neat
>> either, but personally I'd consider this at least a little less bad.
>> And the layering violation aspect isn't much worse than that of setting
>> d->arch.ctxt_switch there as well.
> 
> Why do you think it would be less bad to move it in 
> arch_domain_create()? To me, it would make things worse as it would 
> spread the mapping stuff across different functions.

Not sure what to add to what I said: create_perdomain_mapping() gaining
such a check is a layering violation to me. arch_domain_create() otoh
special cases the idle domain already.

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 11/22] x86: add a boot option to enable and disable the direct map
  2022-12-22 13:24   ` Jan Beulich
@ 2024-01-11 10:47     ` Elias El Yandouzi
  2024-01-11 11:53       ` Jan Beulich
  0 siblings, 1 reply; 101+ messages in thread
From: Elias El Yandouzi @ 2024-01-11 10:47 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, George Dunlap, Stefano Stabellini,
	Wei Liu, Bertrand Marquis, Volodymyr Babchuk,
	Roger Pau Monné,
	Julien Grall, xen-devel

Hi,

On 22/12/2022 13:24, Jan Beulich wrote:
> That said, I think this change comes too early in the series, or there is
> something missing. 

At first, I had the same feeling but looking at the rest of the series, 
I can see that the option is needed in follow-up patches.

> As said in reply to patch 10, while there the mapcache
> is being initialized for the idle domain, I don't think it can be used
> just yet. Read through mapcache_current_vcpu() to understand why I think
> that way, paying particular attention to the ASSERT() near the end.

Would you be able to elaborate a bit more on why you think that? I haven't
been able to get your point.

> In preparation of this patch here I think the mfn_to_virt() uses have to all
> disappear from map_domain_page(). Perhaps yet more strongly all
> ..._to_virt() (except fix_to_virt() and friends) and __va() have to
> disappear up front from x86 and any code path which can be taken on x86
> (which may simply mean purging all respective x86 #define-s, without
> breaking the build in any way).

I agree with you on that one. I think it is what we're aiming for in the 
long term. However, as mentioned by Julien in the cover letter, the 
series's name is a misnomer and I am afraid we won't be able to remove 
all of them with this series. These helpers would still be used for 
xenheap pages or when the direct map is enabled.

-- 
Elias


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 11/22] x86: add a boot option to enable and disable the direct map
  2024-01-11 10:47     ` Elias El Yandouzi
@ 2024-01-11 11:53       ` Jan Beulich
  2024-01-11 12:25         ` Julien Grall
  0 siblings, 1 reply; 101+ messages in thread
From: Jan Beulich @ 2024-01-11 11:53 UTC (permalink / raw)
  To: Elias El Yandouzi
  Cc: Hongyan Xia, Andrew Cooper, George Dunlap, Stefano Stabellini,
	Wei Liu, Bertrand Marquis, Volodymyr Babchuk,
	Roger Pau Monné,
	Julien Grall, xen-devel, Julien Grall

On 11.01.2024 11:47, Elias El Yandouzi wrote:
> On 22/12/2022 13:24, Jan Beulich wrote:
>> That said, I think this change comes too early in the series, or there is
>> something missing. 
> 
> At first, I had the same feeling but looking at the rest of the series, 
> I can see that the option is needed in follow-up patches.
> 
>> As said in reply to patch 10, while there the mapcache
>> is being initialized for the idle domain, I don't think it can be used
>> just yet. Read through mapcache_current_vcpu() to understand why I think
>> that way, paying particular attention to the ASSERT() near the end.
> 
> Would be able to elaborate a bit more why you think that? I haven't been 
> able to get your point.

Why exactly I referred to the ASSERT() there I can't reconstruct. The
function as a whole looks problematic though when suddenly the idle
domain also gains a mapcache. I'm sorry, too much context was lost
from over a year ago; all of this will need looking at from scratch
again whenever a new version was posted.

>> In preparation of this patch here I think the mfn_to_virt() uses have to all
>> disappear from map_domain_page(). Perhaps yet more strongly all
>> ..._to_virt() (except fix_to_virt() and friends) and __va() have to
>> disappear up front from x86 and any code path which can be taken on x86
>> (which may simply mean purging all respective x86 #define-s, without
>> breaking the build in any way).
> 
> I agree with you on that one. I think it is what we're aiming for in the 
> long term. However, as mentioned by Julien in the cover letter, the 
> series's name is a misnomer and I am afraid we won't be able to remove 
> all of them with this series. These helpers would still be used for 
> xenheap pages or when the direct map is enabled.

Leaving a hazard of certain uses not having been converted, or even
overlooked in patches going in at around the same time as this series?
I view this as pretty "adventurous".

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 11/22] x86: add a boot option to enable and disable the direct map
  2024-01-11 11:53       ` Jan Beulich
@ 2024-01-11 12:25         ` Julien Grall
  2024-01-11 14:09           ` Jan Beulich
  0 siblings, 1 reply; 101+ messages in thread
From: Julien Grall @ 2024-01-11 12:25 UTC (permalink / raw)
  To: Jan Beulich, Elias El Yandouzi
  Cc: Hongyan Xia, Andrew Cooper, George Dunlap, Stefano Stabellini,
	Wei Liu, Bertrand Marquis, Volodymyr Babchuk,
	Roger Pau Monné,
	Julien Grall, xen-devel

Hi Jan,

On 11/01/2024 11:53, Jan Beulich wrote:
> On 11.01.2024 11:47, Elias El Yandouzi wrote:
>> On 22/12/2022 13:24, Jan Beulich wrote:
>>> That said, I think this change comes too early in the series, or there is
>>> something missing.
>>
>> At first, I had the same feeling but looking at the rest of the series,
>> I can see that the option is needed in follow-up patches.
>>
>>> As said in reply to patch 10, while there the mapcache
>>> is being initialized for the idle domain, I don't think it can be used
>>> just yet. Read through mapcache_current_vcpu() to understand why I think
>>> that way, paying particular attention to the ASSERT() near the end.
>>
>> Would be able to elaborate a bit more why you think that? I haven't been
>> able to get your point.
> 
> Why exactly I referred to the ASSERT() there I can't reconstruct. The
> function as a whole looks problematic though when suddenly the idle
> domain also gains a mapcache. I'm sorry, too much context was lost
> from over a year ago; all of this will need looking at from scratch
> again whenever a new version was posted.
> 
>>> In preparation of this patch here I think the mfn_to_virt() uses have to all
>>> disappear from map_domain_page(). Perhaps yet more strongly all
>>> ..._to_virt() (except fix_to_virt() and friends) and __va() have to
>>> disappear up front from x86 and any code path which can be taken on x86
>>> (which may simply mean purging all respective x86 #define-s, without
>>> breaking the build in any way).
>>
>> I agree with you on that one. I think it is what we're aiming for in the
>> long term. However, as mentioned by Julien in the cover letter, the
>> series's name is a misnomer and I am afraid we won't be able to remove
>> all of them with this series. These helpers would still be used for
>> xenheap pages or when the direct map is enabled.
> 
> Leaving a hazard of certain uses not having been converted, or even
> overlooked in patches going in at around the same time as this series?
> I view this as pretty "adventurous".

Until we get rid of the directmap completely (which is not the goal of 
this series), we will need to keep mfn_to_virt().

In fact, the one you ask to remove in map_domain_page() will need to be
replaced with a function doing the same thing. The same goes for the code
that will initially prepare the directmap. This is to avoid impacting
performance when the user still wants to use the directmap.

So are you just asking to remove most of the uses and rename *_to_virt()
to something that is more directmap specific (e.g. mfn_to_directmap_virt())?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 11/22] x86: add a boot option to enable and disable the direct map
  2024-01-11 12:25         ` Julien Grall
@ 2024-01-11 14:09           ` Jan Beulich
  2024-01-11 18:25             ` Elias El Yandouzi
  0 siblings, 1 reply; 101+ messages in thread
From: Jan Beulich @ 2024-01-11 14:09 UTC (permalink / raw)
  To: Julien Grall, Elias El Yandouzi
  Cc: Hongyan Xia, Andrew Cooper, George Dunlap, Stefano Stabellini,
	Wei Liu, Bertrand Marquis, Volodymyr Babchuk,
	Roger Pau Monné,
	Julien Grall, xen-devel

On 11.01.2024 13:25, Julien Grall wrote:
> Hi Jan,
> 
> On 11/01/2024 11:53, Jan Beulich wrote:
>> On 11.01.2024 11:47, Elias El Yandouzi wrote:
>>> On 22/12/2022 13:24, Jan Beulich wrote:
>>>> That said, I think this change comes too early in the series, or there is
>>>> something missing.
>>>
>>> At first, I had the same feeling but looking at the rest of the series,
>>> I can see that the option is needed in follow-up patches.
>>>
>>>> As said in reply to patch 10, while there the mapcache
>>>> is being initialized for the idle domain, I don't think it can be used
>>>> just yet. Read through mapcache_current_vcpu() to understand why I think
>>>> that way, paying particular attention to the ASSERT() near the end.
>>>
>>> Would be able to elaborate a bit more why you think that? I haven't been
>>> able to get your point.
>>
>> Why exactly I referred to the ASSERT() there I can't reconstruct. The
>> function as a whole looks problematic though when suddenly the idle
>> domain also gains a mapcache. I'm sorry, too much context was lost
>> from over a year ago; all of this will need looking at from scratch
>> again whenever a new version was posted.
>>
>>>> In preparation of this patch here I think the mfn_to_virt() uses have to all
>>>> disappear from map_domain_page(). Perhaps yet more strongly all
>>>> ..._to_virt() (except fix_to_virt() and friends) and __va() have to
>>>> disappear up front from x86 and any code path which can be taken on x86
>>>> (which may simply mean purging all respective x86 #define-s, without
>>>> breaking the build in any way).
>>>
>>> I agree with you on that one. I think it is what we're aiming for in the
>>> long term. However, as mentioned by Julien in the cover letter, the
>>> series's name is a misnomer and I am afraid we won't be able to remove
>>> all of them with this series. These helpers would still be used for
>>> xenheap pages or when the direct map is enabled.
>>
>> Leaving a hazard of certain uses not having been converted, or even
>> overlooked in patches going in at around the same time as this series?
>> I view this as pretty "adventurous".
> 
> Until we get rid of the directmap completely (which is not the goal of 
> this series), we will need to keep mfn_to_virt().
> 
> In fact the one you ask to remove in map_domain_page() will need to be 
> replaced with function doing the same thing. The same for the code that 
> will initially prepare the directmap. This to avoid impacting 
> performance when the user still wants to use the directmap.
> 
> So are you just asking to remove most of the use and rename *_to_virt() 
> to something that is more directmap specific (e.g. mfn_to_directmap_virt())?

Well, in a way. If done this way, mfn_to_virt() (and __va()) should have no
users by the end of the series, and it would be obvious that nothing was
missed (and by then purging the old ones we could also ensure no new uses
would appear).

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 14/22] x86/domain_page: remove the fast paths when mfn is not in the directmap
  2023-01-11 14:11   ` Jan Beulich
@ 2024-01-11 14:22     ` Elias El Yandouzi
  0 siblings, 0 replies; 101+ messages in thread
From: Elias El Yandouzi @ 2024-01-11 14:22 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Julien Grall, xen-devel

Hi Jan,

On 11/01/2023 14:11, Jan Beulich wrote:
> As to using pmap - assuming you've done an audit and the number of
> simultaneous mappings that can be in use can be proven to not exceed
> the number of slots available, can you please say so in the description?

I don't know if any audit has been made, but similar code has been used 
internally for a few years now without any problem.
Quickly looking at the slot usage, I found that at most 4 of the 8 slots 
would be used during boot time.

> I have to admit though that I'm wary - this isn't a per-CPU number of
> slots aiui, but a global one. But then you also have a BUG_ON() there
> restricting the use to early boot. The reasoning for this is also
> missing (and might address my concern).


Indeed, this isn't presented as a per-CPU number of slots, but the slots 
can only be used before we reach the SYS_STATE_smp_boot state. So 
effectively, only the BSP would use the PMAP slots.

The PMAP slots are meant to map at most as many pages as there are 
page-table levels. See the comment on top of the NUM_FIX_PMAP definition.
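
To make the restriction concrete, here is a simplified sketch of the
guard being described (not the exact patch code; the FIX_PMAP_BEGIN /
NUM_FIX_PMAP / arch_pmap_map() names follow the PMAP patches, and the
slot bookkeeping is reduced to a bare bitmap):

    static DECLARE_BITMAP(inuse, NUM_FIX_PMAP);

    void *pmap_map(mfn_t mfn)
    {
        unsigned int idx;

        /* Only usable on the BSP, before the APs are brought online. */
        BUG_ON(system_state >= SYS_STATE_smp_boot);

        idx = find_first_zero_bit(inuse, NUM_FIX_PMAP);
        BUG_ON(idx == NUM_FIX_PMAP);    /* all slots already in use */
        __set_bit(idx, inuse);

        /* Arch hook installing the mapping into the chosen fixmap slot. */
        arch_pmap_map(FIX_PMAP_BEGIN + idx, mfn);

        return fix_to_virt(FIX_PMAP_BEGIN + idx);
    }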

-- 
Elias


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 11/22] x86: add a boot option to enable and disable the direct map
  2024-01-11 14:09           ` Jan Beulich
@ 2024-01-11 18:25             ` Elias El Yandouzi
  2024-01-12  7:47               ` Jan Beulich
  0 siblings, 1 reply; 101+ messages in thread
From: Elias El Yandouzi @ 2024-01-11 18:25 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, George Dunlap, Stefano Stabellini,
	Wei Liu, Bertrand Marquis, Volodymyr Babchuk,
	Roger Pau Monné,
	Julien Grall, xen-devel



On 11/01/2024 14:09, Jan Beulich wrote:
> On 11.01.2024 13:25, Julien Grall wrote:
>> Hi Jan,
>>
>> On 11/01/2024 11:53, Jan Beulich wrote:
>>> On 11.01.2024 11:47, Elias El Yandouzi wrote:
>>>> On 22/12/2022 13:24, Jan Beulich wrote:
>>>>> That said, I think this change comes too early in the series, or there is
>>>>> something missing.
>>>>
>>>> At first, I had the same feeling but looking at the rest of the series,
>>>> I can see that the option is needed in follow-up patches.
>>>>
>>>>> As said in reply to patch 10, while the mapcache is being initialized
>>>>> there for the idle domain, I don't think it can be used just yet. Read
>>>>> through mapcache_current_vcpu() to understand why I think that way,
>>>>> paying particular attention to the ASSERT() near the end.
>>>>
>>>> Would you be able to elaborate a bit more on why you think that? I
>>>> haven't been able to get your point.
>>>
>>> Why exactly I referred to the ASSERT() there I can't reconstruct. The
>>> function as a whole looks problematic though when suddenly the idle
>>> domain also gains a mapcache. I'm sorry, too much context was lost
>>> from over a year ago; all of this will need looking at from scratch
>>> again whenever a new version is posted.
>>>
>>>>> In preparation of this patch here I think the mfn_to_virt() uses have to all
>>>>> disappear from map_domain_page(). Perhaps yet more strongly all
>>>>> ..._to_virt() (except fix_to_virt() and friends) and __va() have to
>>>>> disappear up front from x86 and any code path which can be taken on x86
>>>>> (which may simply mean purging all respective x86 #define-s, without
>>>>> breaking the build in any way).
>>>>
>>>> I agree with you on that one. I think it is what we're aiming for in the
>>>> long term. However, as mentioned by Julien in the cover letter, the
>>>> series's name is a misnomer and I am afraid we won't be able to remove
>>>> all of them with this series. These helpers would still be used for
>>>> xenheap pages or when the direct map is enabled.
>>>
>>> Leaving a hazard of certain uses not having been converted, or even
>>> overlooked in patches going in at around the same time as this series?
>>> I view this as pretty "adventurous".
>>
>> Until we get rid of the directmap completely (which is not the goal of
>> this series), we will need to keep mfn_to_virt().
>>
>> In fact the one you ask to remove in map_domain_page() will need to be
>> replaced with a function doing the same thing. The same goes for the
>> code that will initially prepare the directmap. This is to avoid
>> impacting performance when the user still wants to use the directmap.
>>
>> So are you just asking to remove most of the uses and rename *_to_virt()
>> to something that is more directmap specific (e.g. mfn_to_directmap_virt())?
> 
> Well, in a way. If done this way, mfn_to_virt() (and __va()) should have no
> users by the end of the series, and it would be obvious that nothing was
> missed (and by then purging the old ones we could also ensure no new uses
> would appear).

What about maddr_to_virt()? For instance, in the function 
xen/arch/x86/dmi_scan.c:dmi_iterate(), we need to access a very low 
machine address which isn't in the directmap range.

How would you proceed? Calling vmap() seems to be a bit overkill for 
just a temporary mapping and I don't really want to rework this function 
to use map_domain_page().

In such a case, what do you suggest?

Cheers,

-- 
Elias


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 11/22] x86: add a boot option to enable and disable the direct map
  2024-01-11 18:25             ` Elias El Yandouzi
@ 2024-01-12  7:47               ` Jan Beulich
  2024-01-15 14:50                 ` Elias El Yandouzi
  0 siblings, 1 reply; 101+ messages in thread
From: Jan Beulich @ 2024-01-12  7:47 UTC (permalink / raw)
  To: Elias El Yandouzi, Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, George Dunlap, Stefano Stabellini,
	Wei Liu, Bertrand Marquis, Volodymyr Babchuk,
	Roger Pau Monné,
	Julien Grall, xen-devel

On 11.01.2024 19:25, Elias El Yandouzi wrote:
> 
> 
> On 11/01/2024 14:09, Jan Beulich wrote:
>> On 11.01.2024 13:25, Julien Grall wrote:
>>> Hi Jan,
>>>
>>> On 11/01/2024 11:53, Jan Beulich wrote:
>>>> On 11.01.2024 11:47, Elias El Yandouzi wrote:
>>>>> On 22/12/2022 13:24, Jan Beulich wrote:
>>>>>> That said, I think this change comes too early in the series, or there is
>>>>>> something missing.
>>>>>
>>>>> At first, I had the same feeling but looking at the rest of the series,
>>>>> I can see that the option is needed in follow-up patches.
>>>>>
>>>>>> As said in reply to patch 10, while the mapcache is being initialized
>>>>>> there for the idle domain, I don't think it can be used just yet. Read
>>>>>> through mapcache_current_vcpu() to understand why I think that way,
>>>>>> paying particular attention to the ASSERT() near the end.
>>>>>
>>>>> Would you be able to elaborate a bit more on why you think that? I
>>>>> haven't been able to get your point.
>>>>
>>>> Why exactly I referred to the ASSERT() there I can't reconstruct. The
>>>> function as a whole looks problematic though when suddenly the idle
>>>> domain also gains a mapcache. I'm sorry, too much context was lost
>>>> from over a year ago; all of this will need looking at from scratch
>>>> again whenever a new version is posted.
>>>>
>>>>>> In preparation of this patch here I think the mfn_to_virt() uses have to all
>>>>>> disappear from map_domain_page(). Perhaps yet more strongly all
>>>>>> ..._to_virt() (except fix_to_virt() and friends) and __va() have to
>>>>>> disappear up front from x86 and any code path which can be taken on x86
>>>>>> (which may simply mean purging all respective x86 #define-s, without
>>>>>> breaking the build in any way).
>>>>>
>>>>> I agree with you on that one. I think it is what we're aiming for in the
>>>>> long term. However, as mentioned by Julien in the cover letter, the
>>>>> series's name is a misnomer and I am afraid we won't be able to remove
>>>>> all of them with this series. These helpers would still be used for
>>>>> xenheap pages or when the direct map is enabled.
>>>>
>>>> Leaving a hazard of certain uses not having been converted, or even
>>>> overlooked in patches going in at around the same time as this series?
>>>> I view this as pretty "adventurous".
>>>
>>> Until we get rid of the directmap completely (which is not the goal of
>>> this series), we will need to keep mfn_to_virt().
>>>
>>> In fact the one you ask to remove in map_domain_page() will need to be
>>> replaced with a function doing the same thing. The same goes for the
>>> code that will initially prepare the directmap. This is to avoid
>>> impacting performance when the user still wants to use the directmap.
>>>
>>> So are you just asking to remove most of the uses and rename *_to_virt()
>>> to something that is more directmap specific (e.g. mfn_to_directmap_virt())?
>>
>> Well, in a way. If done this way, mfn_to_virt() (and __va()) should have no
>> users by the end of the series, and it would be obvious that nothing was
>> missed (and by then purging the old ones we could also ensure no new uses
>> would appear).
> 
> What about maddr_to_virt()? For instance, in the function 
> xen/arch/x86/dmi_scan.c:dmi_iterate(), we need to access a very low 
> machine address which isn't in the directmap range.

I'm afraid I don't follow: Very low addresses are always in the
direct map range, which - on x86 - always starts at 0.

> How would you proceed? Calling vmap() seems to be a bit overkill for 
> just a temporary mapping and I don't really want to rework this function 
> to use map_domain_page().
> 
> In such a case, what do you suggest?

fixmap may be an option to consider, but I also don't see why you
apparently think using vmap() would be a possibility while at the
same time making use of map_domain_page() is too much effort.
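
For illustration, a fixmap-based transient mapping could look roughly
like the sketch below. FIX_DMI is a made-up slot name, and the usual x86
set_fixmap()/clear_fixmap() helpers (taking a physical address) are
assumed; a single slot also only covers one page, so a real conversion
of dmi_iterate() would have to advance the mapping across the scanned
range:

    paddr_t base = 0xf0000;   /* start of the legacy BIOS area dmi_iterate() scans */
    char *p;

    set_fixmap(FIX_DMI, base);
    p = (char *)fix_to_virt(FIX_DMI) + (base & ~PAGE_MASK);

    /* ... look for the DMI/SMBIOS anchor within this page ... */

    clear_fixmap(FIX_DMI);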

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 11/22] x86: add a boot option to enable and disable the direct map
  2024-01-12  7:47               ` Jan Beulich
@ 2024-01-15 14:50                 ` Elias El Yandouzi
  2024-01-16  8:30                   ` Jan Beulich
  0 siblings, 1 reply; 101+ messages in thread
From: Elias El Yandouzi @ 2024-01-15 14:50 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall
  Cc: Hongyan Xia, Andrew Cooper, George Dunlap, Stefano Stabellini,
	Wei Liu, Bertrand Marquis, Volodymyr Babchuk,
	Roger Pau Monné,
	Julien Grall, xen-devel

Hi,

On 12/01/2024 07:47, Jan Beulich wrote:
> On 11.01.2024 19:25, Elias El Yandouzi wrote:
>> On 11/01/2024 14:09, Jan Beulich wrote:
>>
>> What about maddr_to_virt()? For instance, in the function
>> xen/arch/x86/dmi_scan.c:dmi_iterate(), we need to access a very low
>> machine address which isn't in the directmap range.
> 
> I'm afraid I don't follow: Very low addresses are always in the
> direct map range, which - on x86 - always starts at 0.
> 

I reckon it was poorly phrased. IIUC, we'd like to remove every use of 
*_to_virt() in the case where the directmap option is disabled.
So I meant that, in this situation, the helper arch_mfns_in_direct_map() 
would return false.
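
To illustrate the intent, the kind of pattern the series relies on would
be the following (sketch only; arch_mfns_in_direct_map() is the helper
named above, and its exact spelling/signature may differ in the posted
patches):

    void *ptr;

    if ( arch_mfns_in_direct_map(mfn_x(mfn), 1) )
        /* Directmap available for this MFN: the cheap path keeps working. */
        ptr = mfn_to_virt(mfn_x(mfn));
    else
        /* Otherwise fall back to a transient per-page mapping. */
        ptr = map_domain_page(mfn);

    /* ... use ptr ... */

    if ( !arch_mfns_in_direct_map(mfn_x(mfn), 1) )
        unmap_domain_page(ptr);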

>> How would you proceed? Calling vmap() seems to be a bit overkill for
>> just a temporary mapping and I don't really want to rework this function
>> to use map_domain_page().
>>
>> In such a case, what do you suggest?
> 
> fixmap may be an option to consider, but I also don't see why you
> apparently think using vmap() would be a possibility while at the
> same time making use of map_domain_page() is too much effort.

I thought about using vmap() as it allows mapping a contiguous region 
easily. It is also used in the follow-up patch 17/22, so I thought it 
could be viable.

I was reluctant to use map_domain_page() for two reasons: 1) it only 
allows mapping one page at a time, so I'd need to rework the function 
dmi_iterate() more deeply; 2) because the mapcache wouldn't be ready for 
use at that time, the mapping would end up in the PMAP, which is meant 
to map page tables and nothing else.
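
For reference, a minimal sketch of the vmap() route over a contiguous
range of MFNs (illustrative only, not the actual code from patch 17/22;
start_mfn and the region size are made up for the example):

    static int __init map_and_scan(mfn_t start_mfn)
    {
        mfn_t mfns[4];                 /* arbitrary example size */
        unsigned int i;
        void *va;

        for ( i = 0; i < ARRAY_SIZE(mfns); i++ )
            mfns[i] = mfn_add(start_mfn, i);

        va = vmap(mfns, ARRAY_SIZE(mfns));
        if ( !va )
            return -ENOMEM;

        /* ... access the contiguous region through va ... */

        vunmap(va);

        return 0;
    }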

-- 
Elias


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 11/22] x86: add a boot option to enable and disable the direct map
  2024-01-15 14:50                 ` Elias El Yandouzi
@ 2024-01-16  8:30                   ` Jan Beulich
  0 siblings, 0 replies; 101+ messages in thread
From: Jan Beulich @ 2024-01-16  8:30 UTC (permalink / raw)
  To: Elias El Yandouzi
  Cc: Hongyan Xia, Andrew Cooper, George Dunlap, Stefano Stabellini,
	Wei Liu, Bertrand Marquis, Volodymyr Babchuk,
	Roger Pau Monné,
	Julien Grall, xen-devel, Julien Grall

On 15.01.2024 15:50, Elias El Yandouzi wrote:
> On 12/01/2024 07:47, Jan Beulich wrote:
>> On 11.01.2024 19:25, Elias El Yandouzi wrote:
>>> How would you proceed? Calling vmap() seems to be a bit overkill for
>>> just a temporary mapping and I don't really want to rework this function
>>> to use map_domain_page().
>>>
>>> In such a case, what do you suggest?
>>
>> fixmap may be an option to consider, but I also don't see why you
>> apparently think using vmap() would be a possibility while at the
>> same time making use of map_domain_page() is too much effort.
> 
> I thought about using vmap() as it allows mapping a contiguous region 
> easily. It is also used in the follow-up patch 17/22, so I thought it 
> could be viable.
> 
> I was reluctant to use map_domain_page() for two reasons: 1) it only 
> allows mapping one page at a time, so I'd need to rework the function 
> dmi_iterate() more deeply; 2) because the mapcache wouldn't be ready for 
> use at that time, the mapping would end up in the PMAP, which is meant 
> to map page tables and nothing else.

Oh, right, this makes sense of course.

Jan


^ permalink raw reply	[flat|nested] 101+ messages in thread

end of thread (newest message: 2024-01-16  8:30 UTC)

Thread overview: 101+ messages
2022-12-16 11:48 [PATCH 00/22] Remove the directmap Julien Grall
2022-12-16 11:48 ` [PATCH 01/22] xen/common: page_alloc: Re-order includes Julien Grall
2022-12-16 12:03   ` Jan Beulich
2022-12-23  9:29     ` Julien Grall
2023-01-23 21:29   ` Stefano Stabellini
2023-01-23 21:57     ` Julien Grall
2022-12-16 11:48 ` [PATCH 02/22] x86/setup: move vm_init() before acpi calls Julien Grall
2022-12-20 15:08   ` Jan Beulich
2022-12-21 10:18     ` Julien Grall
2022-12-21 10:22       ` Jan Beulich
2022-12-23  9:51         ` Julien Grall
2022-12-23  9:51     ` Julien Grall
2023-01-23 21:34   ` Stefano Stabellini
2022-12-16 11:48 ` [PATCH 03/22] acpi: vmap pages in acpi_os_alloc_memory Julien Grall
2022-12-16 12:07   ` Julien Grall
2022-12-20 15:15   ` Jan Beulich
2022-12-21 10:23     ` Julien Grall
2023-01-23 21:39   ` Stefano Stabellini
2022-12-16 11:48 ` [PATCH 04/22] xen/numa: vmap the pages for memnodemap Julien Grall
2022-12-20 15:25   ` Jan Beulich
2022-12-16 11:48 ` [PATCH 05/22] x86/srat: vmap the pages for acpi_slit Julien Grall
2022-12-20 15:30   ` Jan Beulich
2022-12-23 11:31     ` Julien Grall
2023-01-04 10:23       ` Jan Beulich
2023-01-12 23:15         ` Julien Grall
2023-01-13  9:16           ` Jan Beulich
2023-01-13  9:17             ` Julien Grall
2023-01-30 19:27       ` Julien Grall
2023-01-31  9:11         ` Jan Beulich
2023-01-31 21:37           ` Julien Grall
2022-12-16 11:48 ` [PATCH 06/22] x86: map/unmap pages in restore_all_guests Julien Grall
2022-12-22 11:12   ` Jan Beulich
2022-12-23 12:22     ` Julien Grall
2023-01-04 10:27       ` Jan Beulich
2023-01-12 23:20         ` Julien Grall
2023-01-13  9:22           ` Jan Beulich
2023-06-22 10:44             ` Julien Grall
2023-06-22 13:19               ` Jan Beulich
2022-12-16 11:48 ` [PATCH 07/22] x86/pv: domheap pages should be mapped while relocating initrd Julien Grall
2022-12-22 11:18   ` Jan Beulich
2022-12-16 11:48 ` [PATCH 08/22] x86/pv: rewrite how building PV dom0 handles domheap mappings Julien Grall
2022-12-22 11:48   ` Jan Beulich
2024-01-10 12:50     ` El Yandouzi, Elias
2022-12-16 11:48 ` [PATCH 09/22] x86: lift mapcache variable to the arch level Julien Grall
2022-12-22 12:53   ` Jan Beulich
2022-12-16 11:48 ` [PATCH 10/22] x86/mapcache: initialise the mapcache for the idle domain Julien Grall
2022-12-22 13:06   ` Jan Beulich
2024-01-10 16:24     ` Elias El Yandouzi
2024-01-11  7:53       ` Jan Beulich
2022-12-16 11:48 ` [PATCH 11/22] x86: add a boot option to enable and disable the direct map Julien Grall
2022-12-22 13:24   ` Jan Beulich
2024-01-11 10:47     ` Elias El Yandouzi
2024-01-11 11:53       ` Jan Beulich
2024-01-11 12:25         ` Julien Grall
2024-01-11 14:09           ` Jan Beulich
2024-01-11 18:25             ` Elias El Yandouzi
2024-01-12  7:47               ` Jan Beulich
2024-01-15 14:50                 ` Elias El Yandouzi
2024-01-16  8:30                   ` Jan Beulich
2023-01-23 21:45   ` Stefano Stabellini
2023-01-23 22:01     ` Julien Grall
2022-12-16 11:48 ` [PATCH 12/22] xen/arm: fixmap: Rename the fixmap slots to follow the x86 convention Julien Grall
2022-12-22 13:29   ` Jan Beulich
2023-01-06 14:54   ` Henry Wang
2023-01-23 21:47   ` Stefano Stabellini
2022-12-16 11:48 ` [PATCH 13/22] xen/x86: Add support for the PMAP Julien Grall
2023-01-05 16:46   ` Jan Beulich
2023-01-05 17:50     ` Julien Grall
2023-01-06  7:17       ` Jan Beulich
2022-12-16 11:48 ` [PATCH 14/22] x86/domain_page: remove the fast paths when mfn is not in the directmap Julien Grall
2023-01-11 14:11   ` Jan Beulich
2024-01-11 14:22     ` Elias El Yandouzi
2022-12-16 11:48 ` [PATCH 15/22] xen/page_alloc: add a path for xenheap when there is no direct map Julien Grall
2023-01-11 14:23   ` Jan Beulich
2022-12-16 11:48 ` [PATCH 16/22] x86/setup: leave early boot slightly earlier Julien Grall
2023-01-11 14:34   ` Jan Beulich
2022-12-16 11:48 ` [PATCH 17/22] x86/setup: vmap heap nodes when they are outside the direct map Julien Grall
2023-01-11 14:39   ` Jan Beulich
2023-01-23 22:03   ` Stefano Stabellini
2023-01-23 22:23     ` Julien Grall
2023-01-23 22:56       ` Stefano Stabellini
2022-12-16 11:48 ` [PATCH 18/22] x86/setup: do not create valid mappings when directmap=no Julien Grall
2023-01-11 14:47   ` Jan Beulich
2022-12-16 11:48 ` [PATCH 19/22] xen/arm32: mm: Rename 'first' to 'root' in init_secondary_pagetables() Julien Grall
2023-01-06 14:54   ` Henry Wang
2023-01-23 22:06   ` Stefano Stabellini
2022-12-16 11:48 ` [PATCH 20/22] xen/arm64: mm: Use per-pCPU page-tables Julien Grall
2023-01-06 14:54   ` Henry Wang
2023-01-06 15:44     ` Julien Grall
2023-01-07  2:22       ` Henry Wang
2023-01-23 22:21   ` Stefano Stabellini
2022-12-16 11:48 ` [PATCH 21/22] xen/arm64: Implement a mapcache for arm64 Julien Grall
2023-01-06 14:55   ` Henry Wang
2023-01-23 22:34   ` Stefano Stabellini
2022-12-16 11:48 ` [PATCH 22/22] xen/arm64: Allow the admin to enable/disable the directmap Julien Grall
2023-01-06 14:55   ` Henry Wang
2023-01-23 22:52   ` Stefano Stabellini
2023-01-23 23:09     ` Julien Grall
2023-01-24  0:12       ` Stefano Stabellini
2023-01-24 18:06         ` Julien Grall
2023-01-24 20:48           ` Stefano Stabellini
