kvmarm.lists.cs.columbia.edu archive mirror
* [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot
@ 2022-08-09  9:15 Alexandru Elisei
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 01/19] Makefile: Define __ASSEMBLY__ for assembly files Alexandru Elisei
                   ` (19 more replies)
  0 siblings, 20 replies; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:15 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris

I got the idea for this series as I was looking at the UEFI support series
[1]. More specifically, I realized that the cache maintenance performed by
asm_mmu_disable is insufficient. Patch #19 ("arm/arm64: Rework the cache
maintenance in asm_mmu_disable") describes what is wrong with
asm_mmu_disable. A detailed explanation of what cache maintenance is needed
and why it is needed can be found in patch #18 ("arm/arm64: Perform dcache
maintenance at boot").

Then I realized that I couldn't fix only asm_mmu_disable, and leave the
rest of kvm-unit-tests without the needed cache maintenance, so here it is,
my attempt at adding the cache maintenance operations (from now on, CMOs)
required by the architecture.

My approach is to try to enable the MMU and build the translation tables as
soon as possible, to avoid as much cache maintenance as possible. I
didn't want to do it in the early assembly code, like Linux, because I like
the fact that kvm-unit-tests keeps the assembly code to a minimum, and I
wanted to preserve that. So I made the physical allocator simpler (patches
#2-#6) so it can be used to create the translation tables immediately after
the memory regions are populated.
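
To give an idea of the intended ordering, below is a rough sketch of the
arm/arm64 mem_init() flow after the series (simplified from the
lib/arm/setup.c changes in patch #10; error handling and some arguments
are omitted):

	static void mem_init(phys_addr_t freemem_start)
	{
		phys_addr_t base, top;

		/* The early allocator is usable as soon as free memory is known. */
		phys_alloc_init(freemem_start, freemem->end - freemem_start);
		phys_alloc_set_minimum_alignment(SMP_CACHE_BYTES);

		/* Build the idmap and turn the MMU on before any other allocations. */
		if (!(auxinfo.flags & AUXINFO_MMU_OFF))
			mmu_setup_early(0, NULL);

		/* Hand whatever the early allocator did not use to the page allocator. */
		phys_alloc_get_unused(&base, &top);
		page_alloc_init_area(0, PAGE_ALIGN(base) >> PAGE_SHIFT,
				     (top & PAGE_MASK) >> PAGE_SHIFT);
		page_alloc_ops_enable();
	}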

After moving some code around, especially how the secondaries are brought
online, the dcache maintenance is implemented in patch #18 ("arm/arm64:
Perform dcache maintenance at boot").

The series is an RFC, and I am open to suggestions about how to do things
better; I'm happy to rework the entire series if a better approach is
proposed.

Why is this needed? Nobody complained about tests failing because of missing
CMOs before, so why add them now? I see two reasons for the series:

1. For architectural correctness. So far the emphasis has been on the tests
themselves being architecturally compliant, but I believe that the boot code
should get the same treatment. kvm-unit-tests has started to be used in
different ways than before, and I don't think that we should limit
ourselves to running under one hypervisor, or running under a hypervisor at
all. Which brings me to point number 2.

2. If nothing else, this can serve as a showcase of the required cache
maintenance for the UEFI support series. Although I hope that UEFI
support will end up sharing at least some of the boot code with the
non-UEFI boot path.

This is an RFC and has some rough edges, probably also bugs, but I believe
the concept to be sound. If/when the series stabilizes, I'll probably split
it into separate series (for example, the __ASSEMBLY__ define patch could
probably be separate from the others). Tested by running all the arm and
arm64 tests on a rockpro64 with qemu.

[1] https://lore.kernel.org/all/20220630100324.3153655-1-nikos.nikoleris@arm.com/

Alexandru Elisei (19):
  Makefile: Define __ASSEMBLY__ for assembly files
  lib/alloc_phys: Initialize align_min
  lib/alloc_phys: Use phys_alloc_aligned_safe and rename it to
    memalign_early
  powerpc: Use the page allocator
  lib/alloc_phys: Remove locking
  lib/alloc_phys: Remove allocation accounting
  arm/arm64: Mark the phys_end parameter as unused in setup_mmu()
  arm/arm64: Use pgd_alloc() to allocate mmu_idmap
  arm/arm64: Zero secondary CPUs' stack
  arm/arm64: Enable the MMU early
  arm/arm64: Map the UART when creating the translation tables
  arm/arm64: assembler.h: Replace size with end address for
    dcache_by_line_op
  arm: page.h: Add missing libcflat.h include
  arm/arm64: Add C functions for doing cache maintenance
  lib/alloc_phys: Add callback to perform cache maintenance
  arm/arm64: Allocate secondaries' stack using the page allocator
  arm/arm64: Configure secondaries' stack before enabling the MMU
  arm/arm64: Perform dcache maintenance at boot
  arm/arm64: Rework the cache maintenance in asm_mmu_disable

 Makefile                   |   5 +-
 arm/Makefile.arm           |   4 +-
 arm/Makefile.arm64         |   4 +-
 arm/Makefile.common        |   4 +-
 arm/cstart.S               |  59 ++++++++++++------
 arm/cstart64.S             |  56 +++++++++++++----
 lib/alloc_phys.c           | 122 ++++++++++++-------------------------
 lib/alloc_phys.h           |  13 +++-
 lib/arm/asm/assembler.h    |  15 ++---
 lib/arm/asm/cacheflush.h   |   1 +
 lib/arm/asm/mmu-api.h      |   1 +
 lib/arm/asm/mmu.h          |   6 --
 lib/arm/asm/page.h         |   2 +
 lib/arm/asm/pgtable.h      |  52 ++++++++++++++--
 lib/arm/asm/thread_info.h  |   3 +-
 lib/arm/cache.S            |  89 +++++++++++++++++++++++++++
 lib/arm/io.c               |   5 ++
 lib/arm/io.h               |   3 +
 lib/arm/mmu.c              |  60 +++++++++++-------
 lib/arm/processor.c        |   6 +-
 lib/arm/setup.c            |  66 ++++++++++++++++----
 lib/arm/smp.c              |   9 ++-
 lib/arm64/asm/assembler.h  |  11 ++--
 lib/arm64/asm/cacheflush.h |  32 ++++++++++
 lib/arm64/asm/mmu.h        |   5 --
 lib/arm64/asm/pgtable.h    |  67 ++++++++++++++++++--
 lib/arm64/cache.S          |  85 ++++++++++++++++++++++++++
 lib/arm64/processor.c      |   5 +-
 lib/devicetree.c           |   2 +-
 lib/powerpc/setup.c        |   8 +++
 powerpc/Makefile.common    |   1 +
 powerpc/cstart64.S         |   1 -
 powerpc/spapr_hcall.c      |   5 +-
 33 files changed, 608 insertions(+), 199 deletions(-)
 create mode 100644 lib/arm/asm/cacheflush.h
 create mode 100644 lib/arm/cache.S
 create mode 100644 lib/arm64/asm/cacheflush.h
 create mode 100644 lib/arm64/cache.S

-- 
2.37.1

* [kvm-unit-tests RFC PATCH 01/19] Makefile: Define __ASSEMBLY__ for assembly files
  2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
@ 2022-08-09  9:15 ` Alexandru Elisei
  2022-08-09 12:36   ` Nikos Nikoleris
  2022-09-20  8:11   ` Andrew Jones
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 02/19] lib/alloc_phys: Initialize align_min Alexandru Elisei
                   ` (18 subsequent siblings)
  19 siblings, 2 replies; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:15 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris

There are 25 header files today (found with grep -r "#ifndef __ASSEMBLY__")
whose functionality relies on the __ASSEMBLY__ preprocessor constant being
correctly defined. So far, kvm-unit-tests has relied on
the assembly files to define the constant before including any header
files which depend on it.

Let's make sure that nobody gets this wrong and define it as a compiler
constant when compiling assembly files. __ASSEMBLY__ is now defined for all
.S files, even those that didn't set it explicitly before.
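
For illustration, a typical header of this kind looks roughly like the
(made-up) example below; if __ASSEMBLY__ is not defined when the header is
included from a .S file, the C-only declarations reach the assembler and
break the build:

	#ifndef _ASM_EXAMPLE_H_
	#define _ASM_EXAMPLE_H_

	#define EXAMPLE_FLAG	(1 << 3)	/* usable from both C and assembly */

	#ifndef __ASSEMBLY__
	/* C-only declarations that the assembler cannot parse. */
	struct example_state {
		unsigned long flags;
	};
	extern void example_init(struct example_state *state);
	#endif /* !__ASSEMBLY__ */

	#endif /* _ASM_EXAMPLE_H_ */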

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Makefile           | 5 ++++-
 arm/cstart.S       | 1 -
 arm/cstart64.S     | 1 -
 powerpc/cstart64.S | 1 -
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Makefile b/Makefile
index 6ed5deaccb41..42c61aa45d50 100644
--- a/Makefile
+++ b/Makefile
@@ -94,6 +94,9 @@ CFLAGS += $(wmissing_parameter_type)
 CFLAGS += $(wold_style_declaration)
 CFLAGS += -Woverride-init -Wmissing-prototypes -Wstrict-prototypes
 
+AFLAGS  = $(CFLAGS)
+AFLAGS += -D__ASSEMBLY__
+
 autodepend-flags = -MMD -MF $(dir $*).$(notdir $*).d
 
 LDFLAGS += $(CFLAGS)
@@ -117,7 +120,7 @@ directories:
 	@mkdir -p $(OBJDIRS)
 
 %.o: %.S
-	$(CC) $(CFLAGS) -c -nostdlib -o $@ $<
+	$(CC) $(AFLAGS) -c -nostdlib -o $@ $<
 
 -include */.*.d */*/.*.d
 
diff --git a/arm/cstart.S b/arm/cstart.S
index 7036e67fc0d8..39260e0fa470 100644
--- a/arm/cstart.S
+++ b/arm/cstart.S
@@ -5,7 +5,6 @@
  *
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
-#define __ASSEMBLY__
 #include <auxinfo.h>
 #include <asm/assembler.h>
 #include <asm/thread_info.h>
diff --git a/arm/cstart64.S b/arm/cstart64.S
index e4ab7d06251e..d62360cf3859 100644
--- a/arm/cstart64.S
+++ b/arm/cstart64.S
@@ -5,7 +5,6 @@
  *
  * This work is licensed under the terms of the GNU GPL, version 2.
  */
-#define __ASSEMBLY__
 #include <auxinfo.h>
 #include <asm/asm-offsets.h>
 #include <asm/assembler.h>
diff --git a/powerpc/cstart64.S b/powerpc/cstart64.S
index 972851f9ed65..1a823385fd0f 100644
--- a/powerpc/cstart64.S
+++ b/powerpc/cstart64.S
@@ -5,7 +5,6 @@
  *
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
-#define __ASSEMBLY__
 #include <asm/hcall.h>
 #include <asm/ppc_asm.h>
 #include <asm/rtas.h>
-- 
2.37.1

* [kvm-unit-tests RFC PATCH 02/19] lib/alloc_phys: Initialize align_min
  2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 01/19] Makefile: Define __ASSEMBLY__ for assembly files Alexandru Elisei
@ 2022-08-09  9:15 ` Alexandru Elisei
  2022-09-20  8:20   ` Andrew Jones
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 03/19] lib/alloc_phys: Use phys_alloc_aligned_safe and rename it to memalign_early Alexandru Elisei
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:15 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris

Commit 11c4715fbf87 ("alloc: implement free") changed align_min from a
static variable to a field of the alloc_ops struct and carried over the
initializer value of DEFAULT_MINIMUM_ALIGNMENT.

Commit 7e3e823b78c0 ("lib/alloc.h: remove align_min from struct
alloc_ops") removed the align_min field and changed it back to a static
variable, but missed initializing it.

Initialize align_min to DEFAULT_MINIMUM_ALIGNMENT, as it was intended.
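
The practical effect of the missing initializer is that the

	if (align < align_min)
		align = align_min;

check in the allocation path never raises the alignment (align_min is
zero), so the intended 32-byte minimum alignment was silently lost.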

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 lib/alloc_phys.c | 7 +++----
 lib/alloc_phys.h | 2 --
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/lib/alloc_phys.c b/lib/alloc_phys.c
index a4d2bf23c1bc..3a78d0acd718 100644
--- a/lib/alloc_phys.c
+++ b/lib/alloc_phys.c
@@ -13,8 +13,6 @@
 
 #define PHYS_ALLOC_NR_REGIONS	256
 
-#define DEFAULT_MINIMUM_ALIGNMENT	32
-
 struct phys_alloc_region {
 	phys_addr_t base;
 	phys_addr_t size;
@@ -26,12 +24,13 @@ static int nr_regions;
 static struct spinlock lock;
 static phys_addr_t base, top;
 
+#define DEFAULT_MINIMUM_ALIGNMENT	32
+static size_t align_min = DEFAULT_MINIMUM_ALIGNMENT;
+
 static void *early_memalign(size_t alignment, size_t size);
 static struct alloc_ops early_alloc_ops = {
 	.memalign = early_memalign,
 };
-static size_t align_min;
-
 struct alloc_ops *alloc_ops = &early_alloc_ops;
 
 void phys_alloc_show(void)
diff --git a/lib/alloc_phys.h b/lib/alloc_phys.h
index 611aa70d2041..8049c340818d 100644
--- a/lib/alloc_phys.h
+++ b/lib/alloc_phys.h
@@ -15,8 +15,6 @@
  */
 #include "libcflat.h"
 
-#define DEFAULT_MINIMUM_ALIGNMENT 32
-
 /*
  * phys_alloc_init creates the initial free memory region of size @size
  * at @base. The minimum alignment is set to DEFAULT_MINIMUM_ALIGNMENT.
-- 
2.37.1

* [kvm-unit-tests RFC PATCH 03/19] lib/alloc_phys: Use phys_alloc_aligned_safe and rename it to memalign_early
  2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 01/19] Makefile: Define __ASSEMBLY__ for assembly files Alexandru Elisei
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 02/19] lib/alloc_phys: Initialize align_min Alexandru Elisei
@ 2022-08-09  9:15 ` Alexandru Elisei
  2022-09-20  8:27   ` Andrew Jones
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 04/19] powerpc: Use the page allocator Alexandru Elisei
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:15 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris

phys_alloc_aligned_safe() is called only by early_memalign() and the safe
parameter is always true. In the spirit of simplifying the code, merge the
two functions and rename the result to memalign_early(), to match the naming
scheme used by the page allocator.

Change the type of top_safe to phys_addr_t, to match the type of the top
and base variables describing the available physical memory; this is a
cosmetic change only, since libcflat.h defines phys_addr_t as an alias
for u64.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 lib/alloc_phys.c | 38 ++++++++++++++------------------------
 1 file changed, 14 insertions(+), 24 deletions(-)

diff --git a/lib/alloc_phys.c b/lib/alloc_phys.c
index 3a78d0acd718..efb783b34002 100644
--- a/lib/alloc_phys.c
+++ b/lib/alloc_phys.c
@@ -27,9 +27,9 @@ static phys_addr_t base, top;
 #define DEFAULT_MINIMUM_ALIGNMENT	32
 static size_t align_min = DEFAULT_MINIMUM_ALIGNMENT;
 
-static void *early_memalign(size_t alignment, size_t size);
+static void *memalign_early(size_t alignment, size_t sz);
 static struct alloc_ops early_alloc_ops = {
-	.memalign = early_memalign,
+	.memalign = memalign_early,
 };
 struct alloc_ops *alloc_ops = &early_alloc_ops;
 
@@ -66,18 +66,21 @@ void phys_alloc_set_minimum_alignment(phys_addr_t align)
 	spin_unlock(&lock);
 }
 
-static phys_addr_t phys_alloc_aligned_safe(phys_addr_t size,
-					   phys_addr_t align, bool safe)
+static void *memalign_early(size_t alignment, size_t sz)
 {
 	static bool warned = false;
-	phys_addr_t addr, size_orig = size;
-	u64 top_safe;
+	phys_addr_t align = (phys_addr_t)alignment;
+	phys_addr_t size = (phys_addr_t)sz;
+	phys_addr_t size_orig = size;
+	phys_addr_t addr, top_safe;
+
+	assert(align && !(align & (align - 1)));
 
 	spin_lock(&lock);
 
 	top_safe = top;
 
-	if (safe && sizeof(long) == 4)
+	if (sizeof(long) == 4)
 		top_safe = MIN(top_safe, 1ULL << 32);
 
 	assert(base < top_safe);
@@ -92,10 +95,10 @@ static phys_addr_t phys_alloc_aligned_safe(phys_addr_t size,
 		       " (align=%#" PRIx64 "), "
 		       "need=%#" PRIx64 ", but free=%#" PRIx64 ". "
 		       "top=%#" PRIx64 ", top_safe=%#" PRIx64 "\n",
-		       (u64)size_orig, (u64)align, (u64)size, top_safe - base,
-		       (u64)top, top_safe);
+		       (u64)size_orig, (u64)align, (u64)size,
+		       (u64)(top_safe - base), (u64)top, (u64)top_safe);
 		spin_unlock(&lock);
-		return INVALID_PHYS_ADDR;
+		return NULL;
 	}
 
 	base += size;
@@ -112,7 +115,7 @@ static phys_addr_t phys_alloc_aligned_safe(phys_addr_t size,
 
 	spin_unlock(&lock);
 
-	return addr;
+	return phys_to_virt(addr);
 }
 
 void phys_alloc_get_unused(phys_addr_t *p_base, phys_addr_t *p_top)
@@ -128,16 +131,3 @@ void phys_alloc_get_unused(phys_addr_t *p_base, phys_addr_t *p_top)
 	base = top;
 	spin_unlock(&lock);
 }
-
-static void *early_memalign(size_t alignment, size_t size)
-{
-	phys_addr_t addr;
-
-	assert(alignment && !(alignment & (alignment - 1)));
-
-	addr = phys_alloc_aligned_safe(size, alignment, true);
-	if (addr == INVALID_PHYS_ADDR)
-		return NULL;
-
-	return phys_to_virt(addr);
-}
-- 
2.37.1

* [kvm-unit-tests RFC PATCH 04/19] powerpc: Use the page allocator
  2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
                   ` (2 preceding siblings ...)
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 03/19] lib/alloc_phys: Use phys_alloc_aligned_safe and rename it to memalign_early Alexandru Elisei
@ 2022-08-09  9:15 ` Alexandru Elisei
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 05/19] lib/alloc_phys: Remove locking Alexandru Elisei
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:15 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris
  Cc: Laurent Vivier, kvm-ppc

The spapr_hcall test makes two page-sized allocations using the physical
allocator. Initialize the page allocator and use alloc_page() directly.

CC: Laurent Vivier <lvivier@redhat.com>
CC: Thomas Huth <thuth@redhat.com>
CC: kvm-ppc@vger.kernel.org
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 lib/powerpc/setup.c     | 8 ++++++++
 powerpc/Makefile.common | 1 +
 powerpc/spapr_hcall.c   | 5 +++--
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/lib/powerpc/setup.c b/lib/powerpc/setup.c
index 1be4c030012b..bb39711af31c 100644
--- a/lib/powerpc/setup.c
+++ b/lib/powerpc/setup.c
@@ -15,6 +15,7 @@
 #include <devicetree.h>
 #include <alloc.h>
 #include <alloc_phys.h>
+#include <alloc_page.h>
 #include <argv.h>
 #include <asm/setup.h>
 #include <asm/page.h>
@@ -111,6 +112,7 @@ static void mem_init(phys_addr_t freemem_start)
 	struct mem_region primary, mem = {
 		.start = (phys_addr_t)-1,
 	};
+	phys_addr_t base, top;
 	int nr_regs, i;
 
 	nr_regs = dt_get_memory_params(regs, NR_MEM_REGIONS);
@@ -149,6 +151,12 @@ static void mem_init(phys_addr_t freemem_start)
 	phys_alloc_init(freemem_start, primary.end - freemem_start);
 	phys_alloc_set_minimum_alignment(__icache_bytes > __dcache_bytes
 					 ? __icache_bytes : __dcache_bytes);
+
+	phys_alloc_get_unused(&base, &top);
+	base = PAGE_ALIGN(base);
+	top = top & PAGE_MASK;
+	page_alloc_init_area(0, base >> PAGE_SHIFT, top >> PAGE_SHIFT);
+	page_alloc_ops_enable();
 }
 
 void setup(const void *fdt)
diff --git a/powerpc/Makefile.common b/powerpc/Makefile.common
index 12c280c15fff..260946844998 100644
--- a/powerpc/Makefile.common
+++ b/powerpc/Makefile.common
@@ -34,6 +34,7 @@ include $(SRCDIR)/scripts/asm-offsets.mak
 cflatobjs += lib/util.o
 cflatobjs += lib/getchar.o
 cflatobjs += lib/alloc_phys.o
+cflatobjs += lib/alloc_page.o
 cflatobjs += lib/alloc.o
 cflatobjs += lib/devicetree.o
 cflatobjs += lib/powerpc/io.o
diff --git a/powerpc/spapr_hcall.c b/powerpc/spapr_hcall.c
index 823a574a4724..1a2015474b1c 100644
--- a/powerpc/spapr_hcall.c
+++ b/powerpc/spapr_hcall.c
@@ -8,6 +8,7 @@
 #include <libcflat.h>
 #include <util.h>
 #include <alloc.h>
+#include <alloc_page.h>
 #include <asm/hcall.h>
 
 #define PAGE_SIZE 4096
@@ -65,8 +66,8 @@ static void test_h_page_init(int argc, char **argv)
 	if (argc > 1)
 		report_abort("Unsupported argument: '%s'", argv[1]);
 
-	dst = memalign(PAGE_SIZE, PAGE_SIZE);
-	src = memalign(PAGE_SIZE, PAGE_SIZE);
+	dst = alloc_page();
+	src = alloc_page();
 	if (!dst || !src)
 		report_abort("Failed to alloc memory");
 
-- 
2.37.1

* [kvm-unit-tests RFC PATCH 05/19] lib/alloc_phys: Remove locking
  2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
                   ` (3 preceding siblings ...)
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 04/19] powerpc: Use the page allocator Alexandru Elisei
@ 2022-08-09  9:15 ` Alexandru Elisei
  2022-09-20  8:45   ` Andrew Jones
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 06/19] lib/alloc_phys: Remove allocation accounting Alexandru Elisei
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:15 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris

With powerpc moving to the page allocator, there are no architectures left
which use the physical allocator after the boot setup: arm, arm64,
s390x and powerpc drain the physical allocator to initialize the page
allocator; and x86 calls setup_vm() to drain the allocator for each of
the tests that allocate memory.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 lib/alloc_phys.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/lib/alloc_phys.c b/lib/alloc_phys.c
index efb783b34002..2e0b9c079d1d 100644
--- a/lib/alloc_phys.c
+++ b/lib/alloc_phys.c
@@ -21,7 +21,6 @@ struct phys_alloc_region {
 static struct phys_alloc_region regions[PHYS_ALLOC_NR_REGIONS];
 static int nr_regions;
 
-static struct spinlock lock;
 static phys_addr_t base, top;
 
 #define DEFAULT_MINIMUM_ALIGNMENT	32
@@ -37,7 +36,6 @@ void phys_alloc_show(void)
 {
 	int i;
 
-	spin_lock(&lock);
 	printf("phys_alloc minimum alignment: %#" PRIx64 "\n", (u64)align_min);
 	for (i = 0; i < nr_regions; ++i)
 		printf("%016" PRIx64 "-%016" PRIx64 " [%s]\n",
@@ -46,24 +44,19 @@ void phys_alloc_show(void)
 			"USED");
 	printf("%016" PRIx64 "-%016" PRIx64 " [%s]\n",
 		(u64)base, (u64)(top - 1), "FREE");
-	spin_unlock(&lock);
 }
 
 void phys_alloc_init(phys_addr_t base_addr, phys_addr_t size)
 {
-	spin_lock(&lock);
 	base = base_addr;
 	top = base + size;
 	nr_regions = 0;
-	spin_unlock(&lock);
 }
 
 void phys_alloc_set_minimum_alignment(phys_addr_t align)
 {
 	assert(align && !(align & (align - 1)));
-	spin_lock(&lock);
 	align_min = align;
-	spin_unlock(&lock);
 }
 
 static void *memalign_early(size_t alignment, size_t sz)
@@ -76,8 +69,6 @@ static void *memalign_early(size_t alignment, size_t sz)
 
 	assert(align && !(align & (align - 1)));
 
-	spin_lock(&lock);
-
 	top_safe = top;
 
 	if (sizeof(long) == 4)
@@ -97,7 +88,6 @@ static void *memalign_early(size_t alignment, size_t sz)
 		       "top=%#" PRIx64 ", top_safe=%#" PRIx64 "\n",
 		       (u64)size_orig, (u64)align, (u64)size,
 		       (u64)(top_safe - base), (u64)top, (u64)top_safe);
-		spin_unlock(&lock);
 		return NULL;
 	}
 
@@ -113,8 +103,6 @@ static void *memalign_early(size_t alignment, size_t sz)
 		warned = true;
 	}
 
-	spin_unlock(&lock);
-
 	return phys_to_virt(addr);
 }
 
@@ -124,10 +112,8 @@ void phys_alloc_get_unused(phys_addr_t *p_base, phys_addr_t *p_top)
 	*p_top = top;
 	if (base == top)
 		return;
-	spin_lock(&lock);
 	regions[nr_regions].base = base;
 	regions[nr_regions].size = top - base;
 	++nr_regions;
 	base = top;
-	spin_unlock(&lock);
 }
-- 
2.37.1

* [kvm-unit-tests RFC PATCH 06/19] lib/alloc_phys: Remove allocation accounting
  2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
                   ` (4 preceding siblings ...)
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 05/19] lib/alloc_phys: Remove locking Alexandru Elisei
@ 2022-08-09  9:15 ` Alexandru Elisei
  2022-09-20  8:40   ` Andrew Jones
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 07/19] arm/arm64: Mark the phys_end parameter as unused in setup_mmu() Alexandru Elisei
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:15 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris

The page allocator has better allocation tracking and is used by all
architectures, while the physical allocator is now never used for
allocating memory.

Simplify the physical allocator by removing allocation accounting. This
accomplishes two things:

1. It makes the allocator more useful, as the warning that was displayed
on each allocation after the 256th is removed.

2. Together with the lock removal, the physical allocator becomes more
appealing as a very early allocator, when using the page allocator might
not be desirable or feasible.

Also, phys_alloc_show() has received a slight change in the way it displays
the used and free regions: the end of the region is now non-inclusive, to
allow phys_alloc_show() to express that no memory has been used, or no
memory is free, in which case the start and the end addresses are equal.
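
For example, for a hypothetical 1MB region starting at 0x40000000 with the
first 64kB allocated and a minimum alignment of 64 bytes, the new output
looks like this:

	phys_alloc minimum alignment: 0x40
	0000000040000000-0000000040010000 [USED]
	0000000040010000-0000000040100000 [FREE]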

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 lib/alloc_phys.c | 65 ++++++++++++++----------------------------------
 lib/alloc_phys.h |  5 ++--
 2 files changed, 21 insertions(+), 49 deletions(-)

diff --git a/lib/alloc_phys.c b/lib/alloc_phys.c
index 2e0b9c079d1d..1e1fc179f108 100644
--- a/lib/alloc_phys.c
+++ b/lib/alloc_phys.c
@@ -11,17 +11,11 @@
 #include "asm/io.h"
 #include "alloc_phys.h"
 
-#define PHYS_ALLOC_NR_REGIONS	256
-
-struct phys_alloc_region {
-	phys_addr_t base;
-	phys_addr_t size;
-};
-
-static struct phys_alloc_region regions[PHYS_ALLOC_NR_REGIONS];
-static int nr_regions;
-
-static phys_addr_t base, top;
+/*
+ * used is the end address of the currently allocated memory, non-inclusive.
+ * used equals top means that all memory has been allocated.
+ */
+static phys_addr_t base, used, top;
 
 #define DEFAULT_MINIMUM_ALIGNMENT	32
 static size_t align_min = DEFAULT_MINIMUM_ALIGNMENT;
@@ -34,23 +28,15 @@ struct alloc_ops *alloc_ops = &early_alloc_ops;
 
 void phys_alloc_show(void)
 {
-	int i;
-
 	printf("phys_alloc minimum alignment: %#" PRIx64 "\n", (u64)align_min);
-	for (i = 0; i < nr_regions; ++i)
-		printf("%016" PRIx64 "-%016" PRIx64 " [%s]\n",
-			(u64)regions[i].base,
-			(u64)(regions[i].base + regions[i].size - 1),
-			"USED");
-	printf("%016" PRIx64 "-%016" PRIx64 " [%s]\n",
-		(u64)base, (u64)(top - 1), "FREE");
+	printf("%016" PRIx64 "-%016" PRIx64 " [USED]\n", (u64)base, (u64)used);
+	printf("%016" PRIx64 "-%016" PRIx64 " [FREE]\n", (u64)used, (u64)top);
 }
 
 void phys_alloc_init(phys_addr_t base_addr, phys_addr_t size)
 {
-	base = base_addr;
+	used = base = base_addr;
 	top = base + size;
-	nr_regions = 0;
 }
 
 void phys_alloc_set_minimum_alignment(phys_addr_t align)
@@ -61,7 +47,6 @@ void phys_alloc_set_minimum_alignment(phys_addr_t align)
 
 static void *memalign_early(size_t alignment, size_t sz)
 {
-	static bool warned = false;
 	phys_addr_t align = (phys_addr_t)alignment;
 	phys_addr_t size = (phys_addr_t)sz;
 	phys_addr_t size_orig = size;
@@ -70,50 +55,36 @@ static void *memalign_early(size_t alignment, size_t sz)
 	assert(align && !(align & (align - 1)));
 
 	top_safe = top;
-
 	if (sizeof(long) == 4)
 		top_safe = MIN(top_safe, 1ULL << 32);
+	assert(used < top_safe);
 
-	assert(base < top_safe);
 	if (align < align_min)
 		align = align_min;
 
-	addr = ALIGN(base, align);
-	size += addr - base;
+	addr = ALIGN(used, align);
+	size += addr - used;
 
-	if ((top_safe - base) < size) {
+	if (size > top_safe - used) {
 		printf("phys_alloc: requested=%#" PRIx64
 		       " (align=%#" PRIx64 "), "
 		       "need=%#" PRIx64 ", but free=%#" PRIx64 ". "
 		       "top=%#" PRIx64 ", top_safe=%#" PRIx64 "\n",
 		       (u64)size_orig, (u64)align, (u64)size,
-		       (u64)(top_safe - base), (u64)top, (u64)top_safe);
+		       (u64)(top_safe - used), (u64)top, (u64)top_safe);
 		return NULL;
 	}
 
-	base += size;
-
-	if (nr_regions < PHYS_ALLOC_NR_REGIONS) {
-		regions[nr_regions].base = addr;
-		regions[nr_regions].size = size_orig;
-		++nr_regions;
-	} else if (!warned) {
-		printf("WARNING: phys_alloc: No free log entries, "
-		       "can no longer log allocations...\n");
-		warned = true;
-	}
+	used += size;
 
 	return phys_to_virt(addr);
 }
 
 void phys_alloc_get_unused(phys_addr_t *p_base, phys_addr_t *p_top)
 {
-	*p_base = base;
+	*p_base = used;
 	*p_top = top;
-	if (base == top)
-		return;
-	regions[nr_regions].base = base;
-	regions[nr_regions].size = top - base;
-	++nr_regions;
-	base = top;
+
+	/* Empty allocator. */
+	used = top;
 }
diff --git a/lib/alloc_phys.h b/lib/alloc_phys.h
index 8049c340818d..4d350f010031 100644
--- a/lib/alloc_phys.h
+++ b/lib/alloc_phys.h
@@ -29,8 +29,9 @@ extern void phys_alloc_set_minimum_alignment(phys_addr_t align);
 
 /*
  * phys_alloc_show outputs all currently allocated regions with the
- * following format
- *   <start_addr>-<end_addr> [<USED|FREE>]
+ * following format, where <end_addr> is non-inclusive:
+ *
+ * <start_addr>-<end_addr> [<USED|FREE>]
  */
 extern void phys_alloc_show(void);
 
-- 
2.37.1

* [kvm-unit-tests RFC PATCH 07/19] arm/arm64: Mark the phys_end parameter as unused in setup_mmu()
  2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
                   ` (5 preceding siblings ...)
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 06/19] lib/alloc_phys: Remove allocation accounting Alexandru Elisei
@ 2022-08-09  9:15 ` Alexandru Elisei
  2022-09-20  8:58   ` Andrew Jones
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 08/19] arm/arm64: Use pgd_alloc() to allocate mmu_idmap Alexandru Elisei
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:15 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris

phys_end was used to cap the linearly mapped memory to 3G to allow 1G of
room for the vmalloc area to grow down. This became unnecessary in commit
c1cd1a2bed69 ("arm/arm64: mmu: Remove memory layout assumptions"), when
setup_mmu() was changed to map all the detected memory regions without
changing their limits.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 lib/arm/mmu.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/lib/arm/mmu.c b/lib/arm/mmu.c
index e1a72fe4941f..8f936acafe8b 100644
--- a/lib/arm/mmu.c
+++ b/lib/arm/mmu.c
@@ -153,14 +153,10 @@ void mmu_set_range_sect(pgd_t *pgtable, uintptr_t virt_offset,
 	}
 }
 
-void *setup_mmu(phys_addr_t phys_end, void *unused)
+void *setup_mmu(phys_addr_t unused0, void *unused1)
 {
 	struct mem_region *r;
 
-	/* 3G-4G region is reserved for vmalloc, cap phys_end at 3G */
-	if (phys_end > (3ul << 30))
-		phys_end = 3ul << 30;
-
 #ifdef __aarch64__
 	init_alloc_vpage((void*)(4ul << 30));
 
-- 
2.37.1

* [kvm-unit-tests RFC PATCH 08/19] arm/arm64: Use pgd_alloc() to allocate mmu_idmap
  2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
                   ` (6 preceding siblings ...)
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 07/19] arm/arm64: Mark the phys_end parameter as unused in setup_mmu() Alexandru Elisei
@ 2022-08-09  9:15 ` Alexandru Elisei
  2022-09-20  9:05   ` Andrew Jones
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 09/19] arm/arm64: Zero secondary CPUs' stack Alexandru Elisei
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:15 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris

Until commit 031755dbfefb ("arm: enable vmalloc"), the idmap was allocated
using pgd_alloc(). After that commit, all the page table allocator
functions were switched to using the page allocator, but pgd_alloc() was
left unchanged and became unused, with the idmap now being allocated with
alloc_page().

For arm64, the pgd table size varies based on the page size, which is
configured by the user. For arm, it will always contain 4 entries (it
translates bits 31:30 of the input address). To keep things simple and
consistent with the other functions and across both architectures, modify
pgd_alloc() to use alloc_page() instead of memalign(), like the rest of the
page table allocator functions, and use it to allocate the idmap.
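
As a quick sanity check of the new assert (the numbers are illustrative,
the actual values depend on the configured granule and VA size): on arm64
with 4k pages and 48-bit VAs the pgd has 2^(48 - 39) = 512 entries of 8
bytes each, exactly one 4k page; with 64k pages it has 2^(48 - 42) = 64
entries, i.e. 512 bytes; and on arm (LPAE) it has 4 entries of 8 bytes,
i.e. 32 bytes. In all cases PTRS_PER_PGD * sizeof(pgd_t) <= PAGE_SIZE, so
a single page is always enough.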

Note that when the idmap is created, alloc_ops->memalign is
memalign_pages(), which allocates memory with page granularity. Using
memalign() as before would still have allocated a full page.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 lib/arm/asm/pgtable.h   | 4 ++--
 lib/arm/mmu.c           | 4 ++--
 lib/arm64/asm/pgtable.h | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/arm/asm/pgtable.h b/lib/arm/asm/pgtable.h
index d7c73906ba5e..a35f42965df9 100644
--- a/lib/arm/asm/pgtable.h
+++ b/lib/arm/asm/pgtable.h
@@ -43,8 +43,8 @@
 #define pgd_free(pgd) free(pgd)
 static inline pgd_t *pgd_alloc(void)
 {
-	pgd_t *pgd = memalign(L1_CACHE_BYTES, PTRS_PER_PGD * sizeof(pgd_t));
-	memset(pgd, 0, PTRS_PER_PGD * sizeof(pgd_t));
+	assert(PTRS_PER_PGD * sizeof(pgd_t) <= PAGE_SIZE);
+	pgd_t *pgd = alloc_page();
 	return pgd;
 }
 
diff --git a/lib/arm/mmu.c b/lib/arm/mmu.c
index 8f936acafe8b..2b7405d0274f 100644
--- a/lib/arm/mmu.c
+++ b/lib/arm/mmu.c
@@ -165,7 +165,7 @@ void *setup_mmu(phys_addr_t unused0, void *unused1)
 #endif
 
 	if (!mmu_idmap)
-		mmu_idmap = alloc_page();
+		mmu_idmap = pgd_alloc();
 
 	for (r = mem_regions; r->end; ++r) {
 		if (r->flags & MR_F_IO) {
@@ -197,7 +197,7 @@ void __iomem *__ioremap(phys_addr_t phys_addr, size_t size)
 		pgtable = current_thread_info()->pgtable;
 	} else {
 		if (!mmu_idmap)
-			mmu_idmap = alloc_page();
+			mmu_idmap = pgd_alloc();
 		pgtable = mmu_idmap;
 	}
 
diff --git a/lib/arm64/asm/pgtable.h b/lib/arm64/asm/pgtable.h
index bfb8a993e112..06357920aa74 100644
--- a/lib/arm64/asm/pgtable.h
+++ b/lib/arm64/asm/pgtable.h
@@ -49,8 +49,8 @@
 #define pgd_free(pgd) free(pgd)
 static inline pgd_t *pgd_alloc(void)
 {
-	pgd_t *pgd = memalign(PAGE_SIZE, PTRS_PER_PGD * sizeof(pgd_t));
-	memset(pgd, 0, PTRS_PER_PGD * sizeof(pgd_t));
+	assert(PTRS_PER_PGD * sizeof(pgd_t) <= PAGE_SIZE);
+	pgd_t *pgd = alloc_page();
 	return pgd;
 }
 
-- 
2.37.1

* [kvm-unit-tests RFC PATCH 09/19] arm/arm64: Zero secondary CPUs' stack
  2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
                   ` (7 preceding siblings ...)
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 08/19] arm/arm64: Use pgd_alloc() to allocate mmu_idmap Alexandru Elisei
@ 2022-08-09  9:15 ` Alexandru Elisei
  2022-08-09 12:56   ` Nikos Nikoleris
  2022-09-20  9:24   ` Andrew Jones
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 10/19] arm/arm64: Enable the MMU early Alexandru Elisei
                   ` (10 subsequent siblings)
  19 siblings, 2 replies; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:15 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris

For the boot CPU, the entire stack is zeroed in the entry code. For the
secondaries, only struct thread_info, which lives at the bottom of the
stack, is zeroed in thread_info_init().

Be consistent and zero the entire stack for the secondaries. This should
also improve reproducibility of the testsuite, as all the stacks now start
with the same contents, which is zero. And now that all the stacks are
zeroed in the entry code, there is no need to explicitly zero struct
thread_info in thread_info_init().
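
In C terms, the new assembly does roughly the following for each secondary
(illustrative only, not part of the patch; THREAD_SIZE is the power-of-two
per-CPU stack size):

	unsigned long sp = (unsigned long)secondary_data.stack;	/* new stack pointer */
	unsigned long bottom = sp & ~(THREAD_SIZE - 1);		/* i.e. sp & THREAD_MASK */

	memset((void *)bottom, 0, THREAD_SIZE);			/* what zero_range does */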

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/cstart.S          | 6 ++++++
 arm/cstart64.S        | 3 +++
 lib/arm/processor.c   | 1 -
 lib/arm64/processor.c | 1 -
 4 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arm/cstart.S b/arm/cstart.S
index 39260e0fa470..39e70f40986a 100644
--- a/arm/cstart.S
+++ b/arm/cstart.S
@@ -151,7 +151,13 @@ secondary_entry:
 	 */
 	ldr	r1, =secondary_data
 	ldr	r0, [r1]
+	mov	r2, r0
+	lsr	r2, #THREAD_SHIFT
+	lsl	r2, #THREAD_SHIFT
+	add	r3, r2, #THREAD_SIZE
+	zero_range r2, r3, r4, r5
 	mov	sp, r0
+
 	bl	exceptions_init
 	bl	enable_vfp
 
diff --git a/arm/cstart64.S b/arm/cstart64.S
index d62360cf3859..54773676d1d5 100644
--- a/arm/cstart64.S
+++ b/arm/cstart64.S
@@ -156,6 +156,9 @@ secondary_entry:
 	/* set the stack */
 	adrp	x0, secondary_data
 	ldr	x0, [x0, :lo12:secondary_data]
+	and	x1, x0, #THREAD_MASK
+	add	x2, x1, #THREAD_SIZE
+	zero_range x1, x2
 	mov	sp, x0
 
 	/* finish init in C code */
diff --git a/lib/arm/processor.c b/lib/arm/processor.c
index 9d5759686b73..ceff1c0a1bd2 100644
--- a/lib/arm/processor.c
+++ b/lib/arm/processor.c
@@ -117,7 +117,6 @@ void do_handle_exception(enum vector v, struct pt_regs *regs)
 
 void thread_info_init(struct thread_info *ti, unsigned int flags)
 {
-	memset(ti, 0, sizeof(struct thread_info));
 	ti->cpu = mpidr_to_cpu(get_mpidr());
 	ti->flags = flags;
 }
diff --git a/lib/arm64/processor.c b/lib/arm64/processor.c
index 831207c16587..268b2858f0be 100644
--- a/lib/arm64/processor.c
+++ b/lib/arm64/processor.c
@@ -232,7 +232,6 @@ void install_vector_handler(enum vector v, vector_fn fn)
 
 static void __thread_info_init(struct thread_info *ti, unsigned int flags)
 {
-	memset(ti, 0, sizeof(struct thread_info));
 	ti->cpu = mpidr_to_cpu(get_mpidr());
 	ti->flags = flags;
 }
-- 
2.37.1

* [kvm-unit-tests RFC PATCH 10/19] arm/arm64: Enable the MMU early
  2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
                   ` (8 preceding siblings ...)
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 09/19] arm/arm64: Zero secondary CPUs' stack Alexandru Elisei
@ 2022-08-09  9:15 ` Alexandru Elisei
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 11/19] arm/arm64: Map the UART when creating the translation tables Alexandru Elisei
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:15 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris

Enable the MMU immediately after the physical allocator is initialized,
to make reasoning about what cache maintenance operations are needed a
lot easier.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 lib/arm/asm/mmu-api.h   |  1 +
 lib/arm/asm/pgtable.h   | 42 ++++++++++++++++++++++++++++---
 lib/arm/mmu.c           | 46 +++++++++++++++++++---------------
 lib/arm/processor.c     |  5 ++++
 lib/arm/setup.c         |  4 +++
 lib/arm/smp.c           |  4 +--
 lib/arm64/asm/pgtable.h | 55 ++++++++++++++++++++++++++++++++++++++---
 lib/arm64/processor.c   |  4 +++
 8 files changed, 131 insertions(+), 30 deletions(-)

diff --git a/lib/arm/asm/mmu-api.h b/lib/arm/asm/mmu-api.h
index 3d77cbfd8b24..456c8bc626d7 100644
--- a/lib/arm/asm/mmu-api.h
+++ b/lib/arm/asm/mmu-api.h
@@ -10,6 +10,7 @@ extern void mmu_mark_enabled(int cpu);
 extern void mmu_mark_disabled(int cpu);
 extern void mmu_enable(pgd_t *pgtable);
 extern void mmu_disable(void);
+extern void mmu_setup_early(phys_addr_t unused0, void *unused1);
 
 extern void mmu_set_range_sect(pgd_t *pgtable, uintptr_t virt_offset,
 			       phys_addr_t phys_start, phys_addr_t phys_end,
diff --git a/lib/arm/asm/pgtable.h b/lib/arm/asm/pgtable.h
index a35f42965df9..1911e35bb091 100644
--- a/lib/arm/asm/pgtable.h
+++ b/lib/arm/asm/pgtable.h
@@ -41,10 +41,21 @@
 #define pgd_offset(pgtable, addr) ((pgtable) + pgd_index(addr))
 
 #define pgd_free(pgd) free(pgd)
+static inline pgd_t *pgd_alloc_early(void)
+{
+	pgd_t *pgd = memalign(PAGE_SIZE, PAGE_SIZE);
+	memset(pgd, 0, PAGE_SIZE);
+	return pgd;
+}
 static inline pgd_t *pgd_alloc(void)
 {
+	pgd_t *pgd;
+
 	assert(PTRS_PER_PGD * sizeof(pgd_t) <= PAGE_SIZE);
-	pgd_t *pgd = alloc_page();
+	if (page_alloc_initialized())
+		pgd = alloc_page();
+	else
+		pgd = pgd_alloc_early();
 	return pgd;
 }
 
@@ -71,11 +82,23 @@ static inline pmd_t *pmd_alloc_one(void)
 	pmd_t *pmd = alloc_page();
 	return pmd;
 }
+static inline pmd_t *pmd_alloc_one_early(void)
+{
+	assert(PTRS_PER_PMD * sizeof(pmd_t) == PAGE_SIZE);
+	pmd_t *pmd = memalign(PAGE_SIZE, PAGE_SIZE);
+	memset(pmd, 0, PAGE_SIZE);
+	return pmd;
+}
 static inline pmd_t *pmd_alloc(pgd_t *pgd, unsigned long addr)
 {
 	if (pgd_none(*pgd)) {
 		pgd_t entry;
-		pgd_val(entry) = pgtable_pa(pmd_alloc_one()) | PMD_TYPE_TABLE;
+		pmd_t *pmd;
+		if (page_alloc_initialized())
+			pmd = pmd_alloc_one();
+		else
+			pmd = pmd_alloc_one_early();
+		pgd_val(entry) = pgtable_pa(pmd) | PMD_TYPE_TABLE;
 		WRITE_ONCE(*pgd, entry);
 	}
 	return pmd_offset(pgd, addr);
@@ -98,12 +121,25 @@ static inline pte_t *pte_alloc_one(void)
 	pte_t *pte = alloc_page();
 	return pte;
 }
+static inline pte_t *pte_alloc_one_early(void)
+{
+	assert(PTRS_PER_PTE * sizeof(pte_t) == PAGE_SIZE);
+	pte_t *pte = memalign(PAGE_SIZE, PAGE_SIZE);
+	memset(pte, 0, PAGE_SIZE);
+	return pte;
+}
 static inline pte_t *pte_alloc(pmd_t *pmd, unsigned long addr)
 {
 	if (pmd_none(*pmd)) {
 		pmd_t entry;
-		pmd_val(entry) = pgtable_pa(pte_alloc_one()) | PMD_TYPE_TABLE;
+		pte_t *pte;
+		if (page_alloc_initialized())
+			pte = pte_alloc_one();
+		else
+			pte = pte_alloc_one_early();
+		pmd_val(entry) = pgtable_pa(pte) | PMD_TYPE_TABLE;
 		WRITE_ONCE(*pmd, entry);
+
 	}
 	return pte_offset(pmd, addr);
 }
diff --git a/lib/arm/mmu.c b/lib/arm/mmu.c
index 2b7405d0274f..7765d47dc27a 100644
--- a/lib/arm/mmu.c
+++ b/lib/arm/mmu.c
@@ -12,11 +12,10 @@
 #include <asm/setup.h>
 #include <asm/page.h>
 #include <asm/io.h>
+#include <asm/pgtable.h>
+#include <asm/pgtable-hwdef.h>
 
-#include "alloc_page.h"
 #include "vmalloc.h"
-#include <asm/pgtable-hwdef.h>
-#include <asm/pgtable.h>
 
 #include <linux/compiler.h>
 
@@ -153,19 +152,18 @@ void mmu_set_range_sect(pgd_t *pgtable, uintptr_t virt_offset,
 	}
 }
 
-void *setup_mmu(phys_addr_t unused0, void *unused1)
+void mmu_setup_early(phys_addr_t unused0, void *unused1)
 {
 	struct mem_region *r;
 
-#ifdef __aarch64__
-	init_alloc_vpage((void*)(4ul << 30));
+	assert(!mmu_enabled() && !page_alloc_initialized());
 
+#ifdef __aarch64__
 	assert_msg(system_supports_granule(PAGE_SIZE),
 			"Unsupported translation granule %ld\n", PAGE_SIZE);
 #endif
 
-	if (!mmu_idmap)
-		mmu_idmap = pgd_alloc();
+	mmu_idmap = pgd_alloc();
 
 	for (r = mem_regions; r->end; ++r) {
 		if (r->flags & MR_F_IO) {
@@ -180,7 +178,21 @@ void *setup_mmu(phys_addr_t unused0, void *unused1)
 		}
 	}
 
-	mmu_enable(mmu_idmap);
+	asm_mmu_enable((phys_addr_t)(unsigned long)mmu_idmap);
+}
+
+void *setup_mmu(phys_addr_t unused0, void *unused1)
+{
+	struct thread_info *info = current_thread_info();
+
+	assert(mmu_idmap);
+
+#ifdef __aarch64__
+	init_alloc_vpage((void*)(4ul << 30));
+#endif
+
+	mmu_mark_enabled(info->cpu);
+
 	return mmu_idmap;
 }
 
@@ -189,17 +201,10 @@ void __iomem *__ioremap(phys_addr_t phys_addr, size_t size)
 	phys_addr_t paddr_aligned = phys_addr & PAGE_MASK;
 	phys_addr_t paddr_end = PAGE_ALIGN(phys_addr + size);
 	pgprot_t prot = __pgprot(PTE_UNCACHED | PTE_USER | PTE_UXN | PTE_PXN);
-	pgd_t *pgtable;
+	pgd_t *pgtable = current_thread_info()->pgtable;
 
 	assert(sizeof(long) == 8 || !(phys_addr >> 32));
-
-	if (mmu_enabled()) {
-		pgtable = current_thread_info()->pgtable;
-	} else {
-		if (!mmu_idmap)
-			mmu_idmap = pgd_alloc();
-		pgtable = mmu_idmap;
-	}
+	assert(pgtable);
 
 	mmu_set_range_ptes(pgtable, paddr_aligned, paddr_aligned,
 			   paddr_end, prot);
@@ -209,8 +214,9 @@ void __iomem *__ioremap(phys_addr_t phys_addr, size_t size)
 
 phys_addr_t __virt_to_phys(unsigned long addr)
 {
-	if (mmu_enabled()) {
-		pgd_t *pgtable = current_thread_info()->pgtable;
+	pgd_t *pgtable = current_thread_info()->pgtable;
+
+	if (mmu_enabled() && pgtable) {
 		return virt_to_pte_phys(pgtable, (void *)addr);
 	}
 	return addr;
diff --git a/lib/arm/processor.c b/lib/arm/processor.c
index ceff1c0a1bd2..91ef2824ac6c 100644
--- a/lib/arm/processor.c
+++ b/lib/arm/processor.c
@@ -5,7 +5,10 @@
  *
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
+#include <auxinfo.h>
 #include <libcflat.h>
+
+#include <asm/mmu.h>
 #include <asm/ptrace.h>
 #include <asm/processor.h>
 #include <asm/thread_info.h>
@@ -119,6 +122,8 @@ void thread_info_init(struct thread_info *ti, unsigned int flags)
 {
 	ti->cpu = mpidr_to_cpu(get_mpidr());
 	ti->flags = flags;
+	if (!(auxinfo.flags & AUXINFO_MMU_OFF))
+		ti->pgtable = mmu_idmap;
 }
 
 void start_usr(void (*func)(void *arg), void *arg, unsigned long sp_usr)
diff --git a/lib/arm/setup.c b/lib/arm/setup.c
index bcdf0d78c2e2..73f7c22c6828 100644
--- a/lib/arm/setup.c
+++ b/lib/arm/setup.c
@@ -26,6 +26,7 @@
 #include <asm/smp.h>
 #include <asm/timer.h>
 #include <asm/psci.h>
+#include <asm/mmu.h>
 
 #include "io.h"
 
@@ -189,6 +190,9 @@ static void mem_init(phys_addr_t freemem_start)
 	phys_alloc_init(freemem_start, freemem->end - freemem_start);
 	phys_alloc_set_minimum_alignment(SMP_CACHE_BYTES);
 
+	if (!(auxinfo.flags & AUXINFO_MMU_OFF))
+		mmu_setup_early(0, NULL);
+
 	phys_alloc_get_unused(&base, &top);
 	base = PAGE_ALIGN(base);
 	top = top & PAGE_MASK;
diff --git a/lib/arm/smp.c b/lib/arm/smp.c
index 98a5054e039b..89e44a172c15 100644
--- a/lib/arm/smp.c
+++ b/lib/arm/smp.c
@@ -38,10 +38,8 @@ secondary_entry_fn secondary_cinit(void)
 
 	thread_info_init(ti, 0);
 
-	if (!(auxinfo.flags & AUXINFO_MMU_OFF)) {
-		ti->pgtable = mmu_idmap;
+	if (!(auxinfo.flags & AUXINFO_MMU_OFF))
 		mmu_mark_enabled(ti->cpu);
-	}
 
 	/*
 	 * Save secondary_data.entry locally to avoid opening a race
diff --git a/lib/arm64/asm/pgtable.h b/lib/arm64/asm/pgtable.h
index 06357920aa74..98d51c89b7c0 100644
--- a/lib/arm64/asm/pgtable.h
+++ b/lib/arm64/asm/pgtable.h
@@ -47,10 +47,21 @@
 #define pgd_offset(pgtable, addr) ((pgtable) + pgd_index(addr))
 
 #define pgd_free(pgd) free(pgd)
+static inline pgd_t *pgd_alloc_early(void)
+{
+	pgd_t *pgd = memalign(PAGE_SIZE, PAGE_SIZE);
+	memset(pgd, 0, PAGE_SIZE);
+	return pgd;
+}
 static inline pgd_t *pgd_alloc(void)
 {
+	pgd_t *pgd;
+
 	assert(PTRS_PER_PGD * sizeof(pgd_t) <= PAGE_SIZE);
-	pgd_t *pgd = alloc_page();
+	if (page_alloc_initialized())
+		pgd = alloc_page();
+	else
+		pgd = pgd_alloc_early();
 	return pgd;
 }
 
@@ -81,11 +92,23 @@ static inline pmd_t *pmd_alloc_one(void)
 	pmd_t *pmd = alloc_page();
 	return pmd;
 }
+static inline pmd_t *pmd_alloc_one_early(void)
+{
+	assert(PTRS_PER_PMD * sizeof(pmd_t) == PAGE_SIZE);
+	pmd_t *pmd = memalign(PAGE_SIZE, PAGE_SIZE);
+	memset(pmd, 0, PAGE_SIZE);
+	return pmd;
+}
 static inline pmd_t *pmd_alloc(pud_t *pud, unsigned long addr)
 {
 	if (pud_none(*pud)) {
 		pud_t entry;
-		pud_val(entry) = pgtable_pa(pmd_alloc_one()) | PMD_TYPE_TABLE;
+		pmd_t *pmd;
+		if (page_alloc_initialized())
+			pmd = pmd_alloc_one();
+		else
+			pmd = pmd_alloc_one_early();
+		pud_val(entry) = pgtable_pa(pmd) | PMD_TYPE_TABLE;
 		WRITE_ONCE(*pud, entry);
 	}
 	return pmd_offset(pud, addr);
@@ -108,11 +131,23 @@ static inline pud_t *pud_alloc_one(void)
 	pud_t *pud = alloc_page();
 	return pud;
 }
+static inline pud_t *pud_alloc_one_early(void)
+{
+	assert(PTRS_PER_PUD * sizeof(pud_t) == PAGE_SIZE);
+	pud_t *pud = memalign(PAGE_SIZE, PAGE_SIZE);
+	memset(pud, 0, PAGE_SIZE);
+	return pud;
+}
 static inline pud_t *pud_alloc(pgd_t *pgd, unsigned long addr)
 {
 	if (pgd_none(*pgd)) {
 		pgd_t entry;
-		pgd_val(entry) = pgtable_pa(pud_alloc_one()) | PMD_TYPE_TABLE;
+		pud_t *pud;
+		if (page_alloc_initialized())
+			pud = pud_alloc_one();
+		else
+			pud = pud_alloc_one_early();
+		pgd_val(entry) = pgtable_pa(pud) | PMD_TYPE_TABLE;
 		WRITE_ONCE(*pgd, entry);
 	}
 	return pud_offset(pgd, addr);
@@ -135,11 +170,23 @@ static inline pte_t *pte_alloc_one(void)
 	pte_t *pte = alloc_page();
 	return pte;
 }
+static inline pte_t *pte_alloc_one_early(void)
+{
+	assert(PTRS_PER_PTE * sizeof(pte_t) == PAGE_SIZE);
+	pte_t *pte = memalign(PAGE_SIZE, PAGE_SIZE);
+	memset(pte, 0, PAGE_SIZE);
+	return pte;
+}
 static inline pte_t *pte_alloc(pmd_t *pmd, unsigned long addr)
 {
 	if (pmd_none(*pmd)) {
 		pmd_t entry;
-		pmd_val(entry) = pgtable_pa(pte_alloc_one()) | PMD_TYPE_TABLE;
+		pte_t *pte;
+		if (page_alloc_initialized())
+			pte = pte_alloc_one();
+		else
+			pte = pte_alloc_one_early();
+		pmd_val(entry) = pgtable_pa(pte) | PMD_TYPE_TABLE;
 		WRITE_ONCE(*pmd, entry);
 	}
 	return pte_offset(pmd, addr);
diff --git a/lib/arm64/processor.c b/lib/arm64/processor.c
index 268b2858f0be..c435fb96e373 100644
--- a/lib/arm64/processor.c
+++ b/lib/arm64/processor.c
@@ -5,7 +5,9 @@
  *
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
+#include <auxinfo.h>
 #include <libcflat.h>
+#include <asm/mmu.h>
 #include <asm/ptrace.h>
 #include <asm/processor.h>
 #include <asm/thread_info.h>
@@ -234,6 +236,8 @@ static void __thread_info_init(struct thread_info *ti, unsigned int flags)
 {
 	ti->cpu = mpidr_to_cpu(get_mpidr());
 	ti->flags = flags;
+	if (!(auxinfo.flags & AUXINFO_MMU_OFF))
+		ti->pgtable = mmu_idmap;
 }
 
 void thread_info_init(struct thread_info *ti, unsigned int flags)
-- 
2.37.1

* [kvm-unit-tests RFC PATCH 11/19] arm/arm64: Map the UART when creating the translation tables
  2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
                   ` (9 preceding siblings ...)
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 10/19] arm/arm64: Enable the MMU early Alexandru Elisei
@ 2022-08-09  9:15 ` Alexandru Elisei
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 12/19] arm/arm64: assembler.h: Replace size with end address for dcache_by_line_op Alexandru Elisei
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:15 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris

The MMU is now enabled before the UART is probed, which leaves
kvm-unit-tests with a window where attempting to write to the console
results in an infinite data abort loop triggered by the exception
handlers themselves trying to use the console. Get around this by
mapping the UART's early base address when creating the translation tables.

Note that the address remains mapped even if the devicetree address is
different.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 lib/arm/io.c  | 5 +++++
 lib/arm/io.h  | 3 +++
 lib/arm/mmu.c | 8 ++++++++
 3 files changed, 16 insertions(+)

diff --git a/lib/arm/io.c b/lib/arm/io.c
index 343e10822263..c2c3edb563a0 100644
--- a/lib/arm/io.c
+++ b/lib/arm/io.c
@@ -73,6 +73,11 @@ static void uart0_init(void)
 	}
 }
 
+void __iomem *io_uart_early_base(void)
+{
+	return UART_EARLY_BASE;
+}
+
 void io_init(void)
 {
 	uart0_init();
diff --git a/lib/arm/io.h b/lib/arm/io.h
index 183479c899a9..24704d8fe0a4 100644
--- a/lib/arm/io.h
+++ b/lib/arm/io.h
@@ -7,6 +7,9 @@
 #ifndef _ARM_IO_H_
 #define _ARM_IO_H_
 
+#include <asm/io.h>
+
 extern void io_init(void);
+extern void __iomem *io_uart_early_base(void);
 
 #endif
diff --git a/lib/arm/mmu.c b/lib/arm/mmu.c
index 7765d47dc27a..19c98a8a9640 100644
--- a/lib/arm/mmu.c
+++ b/lib/arm/mmu.c
@@ -15,6 +15,7 @@
 #include <asm/pgtable.h>
 #include <asm/pgtable-hwdef.h>
 
+#include "io.h"
 #include "vmalloc.h"
 
 #include <linux/compiler.h>
@@ -155,6 +156,8 @@ void mmu_set_range_sect(pgd_t *pgtable, uintptr_t virt_offset,
 void mmu_setup_early(phys_addr_t unused0, void *unused1)
 {
 	struct mem_region *r;
+	pgprot_t uart_prot;
+	void *uart_base;
 
 	assert(!mmu_enabled() && !page_alloc_initialized());
 
@@ -178,6 +181,11 @@ void mmu_setup_early(phys_addr_t unused0, void *unused1)
 		}
 	}
 
+	uart_base = io_uart_early_base();
+	uart_prot = __pgprot(PTE_UNCACHED | PTE_USER | PTE_UXN | PTE_PXN);
+	install_page_prot(mmu_idmap, (phys_addr_t)(unsigned long)uart_base,
+			  (uintptr_t)uart_base, uart_prot);
+
 	asm_mmu_enable((phys_addr_t)(unsigned long)mmu_idmap);
 }
 
-- 
2.37.1

* [kvm-unit-tests RFC PATCH 12/19] arm/arm64: assembler.h: Replace size with end address for dcache_by_line_op
  2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
                   ` (10 preceding siblings ...)
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 11/19] arm/arm64: Map the UART when creating the translation tables Alexandru Elisei
@ 2022-08-09  9:15 ` Alexandru Elisei
  2022-08-09 13:01   ` Nikos Nikoleris
  2022-09-20  9:37   ` Andrew Jones
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 13/19] arm: page.h: Add missing libcflat.h include Alexandru Elisei
                   ` (7 subsequent siblings)
  19 siblings, 2 replies; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:15 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris

Commit b5f659be4775 ("arm/arm64: Remove dcache_line_size global
variable") moved the dcache_by_line_op macro to assembler.h and changed
it to take the size of the region instead of the end address as a
parameter. This was done to keep the file in sync with the upstream
Linux kernel implementation at the time.

But in both places where the macro is used, the code has the start and
end address of the region, and it has to compute the size to pass to
dcache_by_line_op. Then the macro itself computes the end by adding the
size to the start.

Get rid of this massaging of parameters and change the macro to take the
end address as a parameter directly.

Besides slightly simplifying the code by removing two unneeded arithmetic
operations, this makes the macro compatible with the current upstream
version of Linux (which was similarly changed to take the end address in
commit 163d3f80695e ("arm64: dcache_by_line_op to take end parameter
instead of size")), which will allow us to reuse (part of) the Linux C
wrappers over the assembly macro.
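
As a sketch of the kind of wrapper this enables (hypothetical, the real C
helpers are added later in the series), a Linux-style caller can now pass
the end address straight through to an assembly routine built on
dcache_by_line_op:

	/* Assembly routine cleaning + invalidating [start, end) to PoC. */
	extern void dcache_clean_inval_poc(unsigned long start, unsigned long end);

	static inline void dcache_clean_inval_range(void *start, void *end)
	{
		dcache_clean_inval_poc((unsigned long)start, (unsigned long)end);
	}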

The change has been tested with the same snippet of code used to test
commit 410b3bf09e76 ("arm/arm64: Perform dcache clean + invalidate after
turning MMU off").

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/cstart.S              |  1 -
 arm/cstart64.S            |  1 -
 lib/arm/asm/assembler.h   | 11 +++++------
 lib/arm64/asm/assembler.h | 11 +++++------
 4 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/arm/cstart.S b/arm/cstart.S
index 39e70f40986a..096a77c454f4 100644
--- a/arm/cstart.S
+++ b/arm/cstart.S
@@ -226,7 +226,6 @@ asm_mmu_disable:
 	ldr	r0, [r0]
 	ldr	r1, =__phys_end
 	ldr	r1, [r1]
-	sub	r1, r1, r0
 	dcache_by_line_op dccimvac, sy, r0, r1, r2, r3
 
 	mov     pc, lr
diff --git a/arm/cstart64.S b/arm/cstart64.S
index 54773676d1d5..7cc90a9fa13f 100644
--- a/arm/cstart64.S
+++ b/arm/cstart64.S
@@ -258,7 +258,6 @@ asm_mmu_disable:
 	ldr	x0, [x0, :lo12:__phys_offset]
 	adrp	x1, __phys_end
 	ldr	x1, [x1, :lo12:__phys_end]
-	sub	x1, x1, x0
 	dcache_by_line_op civac, sy, x0, x1, x2, x3
 
 	ret
diff --git a/lib/arm/asm/assembler.h b/lib/arm/asm/assembler.h
index 4200252dd14d..db5f0f55027c 100644
--- a/lib/arm/asm/assembler.h
+++ b/lib/arm/asm/assembler.h
@@ -25,17 +25,16 @@
 
 /*
  * Macro to perform a data cache maintenance for the interval
- * [addr, addr + size).
+ * [addr, end).
  *
  * 	op:		operation to execute
  * 	domain		domain used in the dsb instruction
  * 	addr:		starting virtual address of the region
- * 	size:		size of the region
- * 	Corrupts:	addr, size, tmp1, tmp2
+ * 	end:		the end of the region (non-inclusive)
+ * 	Corrupts:	addr, tmp1, tmp2
  */
-	.macro dcache_by_line_op op, domain, addr, size, tmp1, tmp2
+	.macro dcache_by_line_op op, domain, addr, end, tmp1, tmp2
 	dcache_line_size \tmp1, \tmp2
-	add	\size, \addr, \size
 	sub	\tmp2, \tmp1, #1
 	bic	\addr, \addr, \tmp2
 9998:
@@ -45,7 +44,7 @@
 	.err
 	.endif
 	add	\addr, \addr, \tmp1
-	cmp	\addr, \size
+	cmp	\addr, \end
 	blo	9998b
 	dsb	\domain
 	.endm
diff --git a/lib/arm64/asm/assembler.h b/lib/arm64/asm/assembler.h
index aa8c65a2bb4a..1e09d65af4a7 100644
--- a/lib/arm64/asm/assembler.h
+++ b/lib/arm64/asm/assembler.h
@@ -28,25 +28,24 @@
 
 /*
  * Macro to perform a data cache maintenance for the interval
- * [addr, addr + size). Use the raw value for the dcache line size because
+ * [addr, end). Use the raw value for the dcache line size because
  * kvm-unit-tests has no concept of scheduling.
  *
  * 	op:		operation passed to dc instruction
  * 	domain:		domain used in dsb instruction
  * 	addr:		starting virtual address of the region
- * 	size:		size of the region
- * 	Corrupts:	addr, size, tmp1, tmp2
+ * 	end:		the end of the region (non-inclusive)
+ * 	Corrupts:	addr, tmp1, tmp2
  */
 
-	.macro dcache_by_line_op op, domain, addr, size, tmp1, tmp2
+	.macro dcache_by_line_op op, domain, addr, end, tmp1, tmp2
 	raw_dcache_line_size \tmp1, \tmp2
-	add	\size, \addr, \size
 	sub	\tmp2, \tmp1, #1
 	bic	\addr, \addr, \tmp2
 9998:
 	dc	\op, \addr
 	add	\addr, \addr, \tmp1
-	cmp	\addr, \size
+	cmp	\addr, \end
 	b.lo	9998b
 	dsb	\domain
 	.endm
-- 
2.37.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [kvm-unit-tests RFC PATCH 13/19] arm: page.h: Add missing libcflat.h include
  2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
                   ` (11 preceding siblings ...)
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 12/19] arm/arm64: assembler.h: Replace size with end address for dcache_by_line_op Alexandru Elisei
@ 2022-08-09  9:15 ` Alexandru Elisei
  2022-09-20  9:39   ` Andrew Jones
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 14/19] arm/arm64: Add C functions for doing cache maintenance Alexandru Elisei
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:15 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris

Include libcflat.h from page.h to avoid errors like this one:

/path/to/kvm-unit-tests/lib/asm/page.h:19:9: error: unknown type name ‘u64’
   19 | typedef u64 pteval_t;
      |         ^~~
[..]
/path/to/kvm-unit-tests/lib/asm/page.h:47:8: error: unknown type name ‘phys_addr_t’
   47 | extern phys_addr_t __virt_to_phys(unsigned long addr);
      |        ^~~~~~~~~~~
[..]
/path/to/kvm-unit-tests/lib/asm/page.h:50:47: error: unknown type name ‘size_t’
   50 | extern void *__ioremap(phys_addr_t phys_addr, size_t size);

The arm64 version of the header has included libcflat.h since commit
a2d06852fe59 ("arm64: Add support for configuring the translation
granule").

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 lib/arm/asm/page.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/arm/asm/page.h b/lib/arm/asm/page.h
index 8eb4a883808e..0a46bda018c7 100644
--- a/lib/arm/asm/page.h
+++ b/lib/arm/asm/page.h
@@ -8,6 +8,8 @@
 
 #include <linux/const.h>
 
+#include <libcflat.h>
+
 #define PAGE_SHIFT		12
 #define PAGE_SIZE		(_AC(1,UL) << PAGE_SHIFT)
 #define PAGE_MASK		(~(PAGE_SIZE-1))
-- 
2.37.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [kvm-unit-tests RFC PATCH 14/19] arm/arm64: Add C functions for doing cache maintenance
  2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
                   ` (12 preceding siblings ...)
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 13/19] arm: page.h: Add missing libcflat.h include Alexandru Elisei
@ 2022-08-09  9:15 ` Alexandru Elisei
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 15/19] lib/alloc_phys: Add callback to perform " Alexandru Elisei
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:15 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris

Add C functions for doing cache maintenance, which will be used to
implement the necessary maintenance when kvm-unit-tests boots.
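
As a rough illustration of the intended use (a sketch only, not part of
the patch; the helper name is hypothetical), a caller that publishes data
for a CPU reading it with the MMU off could clean the buffer to PoC with
the new functions:

  #include <asm/cacheflush.h>

  /* Hypothetical helper, for illustration only. */
  static void publish_for_mmu_off_reader(void *buf, size_t size)
  {
  	/*
  	 * Clean the lines to PoC so a reader with the MMU (and thus the
  	 * data cache) off sees the up-to-date values in memory.
  	 */
  	dcache_clean_poc((unsigned long)buf, (unsigned long)buf + size);
  }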

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/Makefile.arm           |  4 +-
 arm/Makefile.arm64         |  4 +-
 arm/Makefile.common        |  4 +-
 lib/arm/asm/assembler.h    |  4 +-
 lib/arm/asm/cacheflush.h   |  1 +
 lib/arm/asm/mmu.h          |  6 ---
 lib/arm/cache.S            | 89 ++++++++++++++++++++++++++++++++++++++
 lib/arm64/asm/cacheflush.h | 32 ++++++++++++++
 lib/arm64/asm/mmu.h        |  5 ---
 lib/arm64/cache.S          | 85 ++++++++++++++++++++++++++++++++++++
 10 files changed, 218 insertions(+), 16 deletions(-)
 create mode 100644 lib/arm/asm/cacheflush.h
 create mode 100644 lib/arm/cache.S
 create mode 100644 lib/arm64/asm/cacheflush.h
 create mode 100644 lib/arm64/cache.S

diff --git a/arm/Makefile.arm b/arm/Makefile.arm
index 01fd4c7bb6e2..03ec08503cfb 100644
--- a/arm/Makefile.arm
+++ b/arm/Makefile.arm
@@ -25,7 +25,9 @@ endif
 define arch_elf_check =
 endef
 
-cstart.o = $(TEST_DIR)/cstart.o
+asmobjs  = $(TEST_DIR)/cstart.o
+asmobjs += lib/arm/cache.o
+
 cflatobjs += lib/arm/spinlock.o
 cflatobjs += lib/arm/processor.o
 cflatobjs += lib/arm/stack.o
diff --git a/arm/Makefile.arm64 b/arm/Makefile.arm64
index 42e18e771b3b..31ac56dab104 100644
--- a/arm/Makefile.arm64
+++ b/arm/Makefile.arm64
@@ -20,7 +20,9 @@ define arch_elf_check =
 		$(error $(1) has unsupported reloc types))
 endef
 
-cstart.o = $(TEST_DIR)/cstart64.o
+asmobjs  = $(TEST_DIR)/cstart64.o
+asmobjs += lib/arm64/cache.o
+
 cflatobjs += lib/arm64/processor.o
 cflatobjs += lib/arm64/spinlock.o
 cflatobjs += lib/arm64/gic-v3-its.o lib/arm64/gic-v3-its-cmd.o
diff --git a/arm/Makefile.common b/arm/Makefile.common
index 38385e0c558e..a7a4787a5ef8 100644
--- a/arm/Makefile.common
+++ b/arm/Makefile.common
@@ -60,7 +60,7 @@ eabiobjs = lib/arm/eabi_compat.o
 
 FLATLIBS = $(libcflat) $(LIBFDT_archive) $(libeabi)
 %.elf: LDFLAGS = -nostdlib $(arch_LDFLAGS)
-%.elf: %.o $(FLATLIBS) $(SRCDIR)/arm/flat.lds $(cstart.o)
+%.elf: %.o $(FLATLIBS) $(SRCDIR)/arm/flat.lds $(asmobjs)
 	$(CC) $(CFLAGS) -c -o $(@:.elf=.aux.o) $(SRCDIR)/lib/auxinfo.c \
 		-DPROGNAME=\"$(@:.elf=.flat)\" -DAUXFLAGS=$(AUXFLAGS)
 	$(LD) $(LDFLAGS) -o $@ -T $(SRCDIR)/arm/flat.lds \
@@ -81,4 +81,4 @@ arm_clean: asm_offsets_clean
 	      $(TEST_DIR)/.*.d lib/arm/.*.d
 
 generated-files = $(asm-offsets)
-$(tests-all:.flat=.o) $(cstart.o) $(cflatobjs): $(generated-files)
+$(tests-all:.flat=.o) $(asmobjs) $(cflatobjs): $(generated-files)
diff --git a/lib/arm/asm/assembler.h b/lib/arm/asm/assembler.h
index db5f0f55027c..e728bb210a07 100644
--- a/lib/arm/asm/assembler.h
+++ b/lib/arm/asm/assembler.h
@@ -41,7 +41,9 @@
 	.ifc	\op, dccimvac
 	mcr	p15, 0, \addr, c7, c14, 1
 	.else
-	.err
+	.ifc	\op, dccmvac
+	mcr	p15, 0, \addr, c7, c10, 1
+	.endif
 	.endif
 	add	\addr, \addr, \tmp1
 	cmp	\addr, \end
diff --git a/lib/arm/asm/cacheflush.h b/lib/arm/asm/cacheflush.h
new file mode 100644
index 000000000000..42dc88a44ce0
--- /dev/null
+++ b/lib/arm/asm/cacheflush.h
@@ -0,0 +1 @@
+#include "../../arm64/asm/cacheflush.h"
diff --git a/lib/arm/asm/mmu.h b/lib/arm/asm/mmu.h
index b24b97e554e2..6359ba642a4c 100644
--- a/lib/arm/asm/mmu.h
+++ b/lib/arm/asm/mmu.h
@@ -45,12 +45,6 @@ static inline void flush_tlb_page(unsigned long vaddr)
 	isb();
 }
 
-static inline void flush_dcache_addr(unsigned long vaddr)
-{
-	/* DCCIMVAC */
-	asm volatile("mcr p15, 0, %0, c7, c14, 1" :: "r" (vaddr));
-}
-
 #include <asm/mmu-api.h>
 
 #endif /* _ASMARM_MMU_H_ */
diff --git a/lib/arm/cache.S b/lib/arm/cache.S
new file mode 100644
index 000000000000..f42284491b1f
--- /dev/null
+++ b/lib/arm/cache.S
@@ -0,0 +1,89 @@
+/*
+ * Based on arch/arm64/mm/cache.S
+ *
+ * Copyright (C) 2001 Deep Blue Solutions Ltd.
+ * Copyright (C) 2012, 2022 ARM Ltd.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+
+#include <asm/assembler.h>
+
+/*
+ *      dcache_inval_poc(start, end)
+ *
+ *      Ensure that any D-cache lines for the interval [start, end)
+ *      are invalidated. Any partial lines at the ends of the interval are
+ *      cleaned to PoC instead to prevent data loss.
+ *
+ *      - start   - start address of region
+ *      - end     - end address of region
+ */
+.global dcache_inval_poc
+dcache_inval_poc:
+	dmb	sy
+	dcache_line_size r2, r3
+	sub 	r3, r2, #1
+	tst	r1, r3			// end cache line aligned?
+	bic	r1, r1, r3
+	beq 	1f
+	// DCCIMVAC
+	mcr	p15, 0, r1, c7, c14, 1	// clean + invalidate end cache line
+1:	tst	r0, r3			// start cache line aligned?
+	bic	r0, r0, r3
+	beq	2f
+	mcr	p15, 0, r0, c7, c14, 1	// clean + invalidate start cache line
+	b	3f
+	// DCIMVAC
+2:	mcr	p15, 0, r0, c7, c6, 1	// invalidate current cache line
+3:	add	r0, r0, r2
+	cmp	r0, r1
+	blo	2b
+	dsb	sy
+	mov	pc, lr
+
+/*
+ *      dcache_clean_poc(start, end)
+ *
+ *      Ensure that any D-cache lines for the interval [start, end)
+ *      are cleaned to the PoC.
+ *
+ *      - start   - start address of region
+ *      - end     - end address of region
+ */
+.global dcache_clean_poc
+dcache_clean_poc:
+	dmb	sy
+	dcache_by_line_op dccmvac, sy, r0, r1, r2, r3
+	mov	pc, lr
+
+/*
+ *      dcache_clean_addr_poc(addr)
+ *
+ *      Ensure that the D-cache line for address addr is cleaned to the PoC.
+ *
+ *      - addr    - the address
+ */
+.global dcache_clean_addr_poc
+dcache_clean_addr_poc:
+	dmb	sy
+	// DCCMVAC
+	mcr 	p15, 0, r0, c7, c10, 1
+	dsb	sy
+	mov	pc, lr
+
+/*
+ *      dcache_clean_inval_addr_poc(addr)
+ *
+ *      Ensure that the D-cache line for address addr is cleaned and invalidated
+ *      to the PoC.
+ *
+ *      - addr    - the address
+ */
+.global dcache_clean_inval_addr_poc
+dcache_clean_inval_addr_poc:
+	dmb	sy
+	// DCCIMVAC
+	mcr 	p15, 0, r0, c7, c14, 1
+	dsb	sy
+	mov	pc, lr
diff --git a/lib/arm64/asm/cacheflush.h b/lib/arm64/asm/cacheflush.h
new file mode 100644
index 000000000000..2f0ac1f3e573
--- /dev/null
+++ b/lib/arm64/asm/cacheflush.h
@@ -0,0 +1,32 @@
+#ifndef _ASMARM64_CACHEFLUSH_H_
+#define _ASMARM64_CACHEFLUSH_H_
+/*
+ * Based on arch/arm64/include/asm/cacheflush.h
+ *
+ * Copyright (C) 1999-2002 Russell King.
+ * Copyright (C) 2012,2022 ARM Ltd.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+
+#include <asm/page.h>
+
+extern void dcache_clean_addr_poc(unsigned long addr);
+extern void dcache_clean_inval_addr_poc(unsigned long addr);
+
+extern void dcache_inval_poc(unsigned long start, unsigned long end);
+extern void dcache_clean_poc(unsigned long start, unsigned long end);
+
+static inline void dcache_inval_page_poc(unsigned long page_addr)
+{
+	assert(PAGE_ALIGN(page_addr) == page_addr);
+	dcache_inval_poc(page_addr, page_addr + PAGE_SIZE);
+}
+
+static inline void dcache_clean_page_poc(unsigned long page_addr)
+{
+	assert(PAGE_ALIGN(page_addr) == page_addr);
+	dcache_clean_poc(page_addr, page_addr + PAGE_SIZE);
+}
+
+#endif /* _ASMARM64_CACHEFLUSH_H_ */
diff --git a/lib/arm64/asm/mmu.h b/lib/arm64/asm/mmu.h
index 5c27edb24d2e..2ebbe925033b 100644
--- a/lib/arm64/asm/mmu.h
+++ b/lib/arm64/asm/mmu.h
@@ -28,11 +28,6 @@ static inline void flush_tlb_page(unsigned long vaddr)
 	isb();
 }
 
-static inline void flush_dcache_addr(unsigned long vaddr)
-{
-	asm volatile("dc civac, %0" :: "r" (vaddr));
-}
-
 #include <asm/mmu-api.h>
 
 #endif /* _ASMARM64_MMU_H_ */
diff --git a/lib/arm64/cache.S b/lib/arm64/cache.S
new file mode 100644
index 000000000000..28a339a451bf
--- /dev/null
+++ b/lib/arm64/cache.S
@@ -0,0 +1,85 @@
+/*
+ * Based on arch/arm64/mm/cache.S
+ *
+ * Copyright (C) 2001 Deep Blue Solutions Ltd.
+ * Copyright (C) 2012, 2022 ARM Ltd.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+
+#include <asm/assembler.h>
+
+/*
+ *      dcache_inval_poc(start, end)
+ *
+ *      Ensure that any D-cache lines for the interval [start, end)
+ *      are invalidated. Any partial lines at the ends of the interval are
+ *      cleaned to PoC instead to prevent data loss.
+ *
+ *      - start   - start address of region
+ *      - end     - end address of region
+ */
+.global dcache_inval_poc
+dcache_inval_poc:
+	dmb	sy
+	raw_dcache_line_size x2, x3
+	sub 	x3, x2, #1
+	tst	x1, x3			// end cache line aligned?
+	bic	x1, x1, x3
+	b.eq 	1f
+	dc	civac, x1		// clean + invalidate end cache line
+1:	tst	x0, x3			// start cache line aligned?
+	bic	x0, x0, x3
+	b.eq	2f
+	dc 	civac, x0		// clean + invalidate start cache line
+	b	3f
+2:	dc	ivac, x0		// invalidate current cache line
+3:	add	x0, x0, x2
+	cmp	x0, x1
+	b.lo	2b
+	dsb	sy
+	ret
+
+/*
+ *      dcache_clean_poc(start, end)
+ *
+ *      Ensure that any D-cache lines for the interval [start, end)
+ *      are cleaned to the PoC.
+ *
+ *      - start   - start address of region
+ *      - end     - end address of region
+ */
+.global dcache_clean_poc
+dcache_clean_poc:
+	dmb	sy
+	dcache_by_line_op cvac, sy, x0, x1, x2, x3
+	ret
+
+/*
+ *      dcache_clean_addr_poc(addr)
+ *
+ *      Ensure that the D-cache line for address addr is cleaned to the PoC.
+ *
+ *      - addr    - the address
+ */
+.global dcache_clean_addr_poc
+dcache_clean_addr_poc:
+	dmb	sy
+	dc	cvac, x0
+	dsb	sy
+	ret
+
+/*
+ *      dcache_clean_inval_addr_poc(addr)
+ *
+ *      Ensure that the D-cache line for address addr is cleaned and invalidated
+ *      to the PoC.
+ *
+ *      - addr    - the address
+ */
+.global dcache_clean_inval_addr_poc
+dcache_clean_inval_addr_poc:
+	dmb	sy
+	dc	civac, x0
+	dsb	sy
+	ret
-- 
2.37.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [kvm-unit-tests RFC PATCH 15/19] lib/alloc_phys: Add callback to perform cache maintenance
  2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
                   ` (13 preceding siblings ...)
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 14/19] arm/arm64: Add C functions for doing cache maintenance Alexandru Elisei
@ 2022-08-09  9:15 ` Alexandru Elisei
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 16/19] arm/arm64: Allocate secondaries' stack using the page allocator Alexandru Elisei
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:15 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris

Some architectures, like arm and arm64, require explicit cache
maintenance to keep the caches in sync with memory when the caches are
toggled. Add a function that performs the required maintenance on the
internal structures that the allocator maintains.
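
For example (a sketch that mirrors how a later patch in this series wires
this up for arm/arm64), the boot code can flush the allocator's
bookkeeping right before enabling the MMU:

  #include "alloc_phys.h"
  #include <asm/cacheflush.h>

  /* Make the allocator state visible to accesses made with the MMU on. */
  phys_alloc_perform_cache_maintenance(dcache_clean_inval_addr_poc);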

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 lib/alloc_phys.c | 8 ++++++++
 lib/alloc_phys.h | 8 ++++++++
 2 files changed, 16 insertions(+)

diff --git a/lib/alloc_phys.c b/lib/alloc_phys.c
index 1e1fc179f108..5872d2c6e38f 100644
--- a/lib/alloc_phys.c
+++ b/lib/alloc_phys.c
@@ -45,6 +45,14 @@ void phys_alloc_set_minimum_alignment(phys_addr_t align)
 	align_min = align;
 }
 
+void phys_alloc_perform_cache_maintenance(cache_maint_fn maint_fn)
+{
+	maint_fn((unsigned long)&base);
+	maint_fn((unsigned long)&used);
+	maint_fn((unsigned long)&top);
+	maint_fn((unsigned long)&align_min);
+}
+
 static void *memalign_early(size_t alignment, size_t sz)
 {
 	phys_addr_t align = (phys_addr_t)alignment;
diff --git a/lib/alloc_phys.h b/lib/alloc_phys.h
index 4d350f010031..86b3d0215d49 100644
--- a/lib/alloc_phys.h
+++ b/lib/alloc_phys.h
@@ -15,6 +15,8 @@
  */
 #include "libcflat.h"
 
+typedef void (*cache_maint_fn)(unsigned long addr);
+
 /*
  * phys_alloc_init creates the initial free memory region of size @size
  * at @base. The minimum alignment is set to DEFAULT_MINIMUM_ALIGNMENT.
@@ -27,6 +29,12 @@ extern void phys_alloc_init(phys_addr_t base, phys_addr_t size);
  */
 extern void phys_alloc_set_minimum_alignment(phys_addr_t align);
 
+/*
+ * Perform cache maintenance on the internal structures that the physical
+ * allocator maintains.
+ */
+extern void phys_alloc_perform_cache_maintenance(cache_maint_fn maint_fn);
+
 /*
  * phys_alloc_show outputs all currently allocated regions with the
  * following format, where <end_addr> is non-inclusive:
-- 
2.37.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [kvm-unit-tests RFC PATCH 16/19] arm/arm64: Allocate secondaries' stack using the page allocator
  2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
                   ` (14 preceding siblings ...)
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 15/19] lib/alloc_phys: Add callback to perform " Alexandru Elisei
@ 2022-08-09  9:15 ` Alexandru Elisei
  2022-09-20  9:58   ` Andrew Jones
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 17/19] arm/arm64: Configure secondaries' stack before enabling the MMU Alexandru Elisei
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:15 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris

The vmalloc allocator returns non-identity-mapped addresses, where the
virtual address is different from the physical address. This makes it
impossible to access the stack of the secondary CPUs while the MMU is
disabled.

On arm, THREAD_SIZE is 16K and PAGE_SIZE is 4K, which makes THREAD_SIZE
a power of two multiple of PAGE_SIZE. On arm64, THREAD_SIZE is 16K when
PAGE_SIZE is 4K or 16K, and 64K when PAGE_SIZE is 64K. In all cases,
THREAD_SIZE is a power of two multiple of PAGE_SIZE. As a result, using
memalign_pages() for the stack won't lead to wasted memory.
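
A quick sanity check of that reasoning could look like this (hypothetical
snippet, not part of the patch):

  #include <libcflat.h>
  #include <asm/page.h>
  #include <asm/thread_info.h>

  static void check_stack_size_assumptions(void)
  {
  	/* THREAD_SIZE is a power of two, at least as large as a page... */
  	assert(THREAD_SIZE >= PAGE_SIZE);
  	assert((THREAD_SIZE & (THREAD_SIZE - 1)) == 0);
  	/* ...and therefore a power of two multiple of PAGE_SIZE. */
  	assert(THREAD_SIZE % PAGE_SIZE == 0);
  }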

memalign_pages() allocates memory in chunks that are a power of two
number of pages, aligned to the allocation size, which makes it a drop-in
replacement for vm_memalign (which is the value of alloc_ops->memalign
when the stack is allocated).

Using memalign_pages() has two distinct benefits:

1. The secondary CPUs' stack can be used with the MMU off.

2. The secondary CPUs' stack is identity mapped, similar to the stack for
the primary CPU, which makes the configuration of the CPUs consistent.

memalign_pages_flags() has been used instead of memalign_pages() to
instruct the allocator not to zero the stack, as it's already zeroed in the
entry code.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 lib/arm/asm/thread_info.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/arm/asm/thread_info.h b/lib/arm/asm/thread_info.h
index eaa72582af86..190e082cbba0 100644
--- a/lib/arm/asm/thread_info.h
+++ b/lib/arm/asm/thread_info.h
@@ -25,6 +25,7 @@
 #ifndef __ASSEMBLY__
 #include <asm/processor.h>
 #include <alloc.h>
+#include <alloc_page.h>
 
 #ifdef __arm__
 #include <asm/ptrace.h>
@@ -40,7 +41,7 @@
 
 static inline void *thread_stack_alloc(void)
 {
-	void *sp = memalign(THREAD_ALIGNMENT, THREAD_SIZE);
+	void *sp = memalign_pages_flags(THREAD_ALIGNMENT, THREAD_SIZE, FLAG_DONTZERO);
 	return sp + THREAD_START_SP;
 }
 
-- 
2.37.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [kvm-unit-tests RFC PATCH 17/19] arm/arm64: Configure secondaries' stack before enabling the MMU
  2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
                   ` (15 preceding siblings ...)
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 16/19] arm/arm64: Allocate secondaries' stack using the page allocator Alexandru Elisei
@ 2022-08-09  9:15 ` Alexandru Elisei
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 18/19] arm/arm64: Perform dcache maintenance at boot Alexandru Elisei
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:15 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris

Now that the secondaries' stack is linearly mapped, we can set it before
turning the MMU on. This makes the entry code for the secondaries
consistent with the entry code for the boot CPU.

To keep it simple, the struct secondary_data, which is now read with the
MMU off by the secondaries, is cleaned to PoC.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/cstart.S   | 20 ++++++++++----------
 arm/cstart64.S | 16 ++++++++--------
 lib/arm/smp.c  |  5 +++++
 3 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/arm/cstart.S b/arm/cstart.S
index 096a77c454f4..877559b367de 100644
--- a/arm/cstart.S
+++ b/arm/cstart.S
@@ -134,16 +134,6 @@ get_mmu_off:
 
 .global secondary_entry
 secondary_entry:
-	/* enable the MMU unless requested off */
-	bl	get_mmu_off
-	cmp	r0, #0
-	bne	1f
-	mov	r1, #0
-	ldr	r0, =mmu_idmap
-	ldr	r0, [r0]
-	bl	asm_mmu_enable
-
-1:
 	/*
 	 * Set the stack, and set up vector table
 	 * and exception stacks. Exception stacks
@@ -161,6 +151,16 @@ secondary_entry:
 	bl	exceptions_init
 	bl	enable_vfp
 
+	/* enable the MMU unless requested off */
+	bl	get_mmu_off
+	cmp	r0, #0
+	bne	1f
+	mov	r1, #0
+	ldr	r0, =mmu_idmap
+	ldr	r0, [r0]
+	bl	asm_mmu_enable
+
+1:
 	/* finish init in C code */
 	bl	secondary_cinit
 
diff --git a/arm/cstart64.S b/arm/cstart64.S
index 7cc90a9fa13f..face185a7781 100644
--- a/arm/cstart64.S
+++ b/arm/cstart64.S
@@ -138,6 +138,14 @@ get_mmu_off:
 
 .globl secondary_entry
 secondary_entry:
+	/* set the stack */
+	adrp	x0, secondary_data
+	ldr	x0, [x0, :lo12:secondary_data]
+	and	x1, x0, #THREAD_MASK
+	add	x2, x1, #THREAD_SIZE
+	zero_range x1, x2
+	mov	sp, x0
+
 	/* Enable FP/ASIMD */
 	mov	x0, #(3 << 20)
 	msr	cpacr_el1, x0
@@ -153,14 +161,6 @@ secondary_entry:
 	bl	asm_mmu_enable
 
 1:
-	/* set the stack */
-	adrp	x0, secondary_data
-	ldr	x0, [x0, :lo12:secondary_data]
-	and	x1, x0, #THREAD_MASK
-	add	x2, x1, #THREAD_SIZE
-	zero_range x1, x2
-	mov	sp, x0
-
 	/* finish init in C code */
 	bl	secondary_cinit
 
diff --git a/lib/arm/smp.c b/lib/arm/smp.c
index 89e44a172c15..9c49bc3dec61 100644
--- a/lib/arm/smp.c
+++ b/lib/arm/smp.c
@@ -7,6 +7,8 @@
  */
 #include <libcflat.h>
 #include <auxinfo.h>
+
+#include <asm/cacheflush.h>
 #include <asm/thread_info.h>
 #include <asm/spinlock.h>
 #include <asm/cpumask.h>
@@ -62,6 +64,9 @@ static void __smp_boot_secondary(int cpu, secondary_entry_fn entry)
 
 	secondary_data.stack = thread_stack_alloc();
 	secondary_data.entry = entry;
+	dcache_clean_poc((unsigned long)&secondary_data,
+			 (unsigned long)&secondary_data + sizeof(secondary_data));
+
 	mmu_mark_disabled(cpu);
 	ret = cpu_psci_cpu_boot(cpu);
 	assert(ret == 0);
-- 
2.37.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [kvm-unit-tests RFC PATCH 18/19] arm/arm64: Perform dcache maintenance at boot
  2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
                   ` (16 preceding siblings ...)
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 17/19] arm/arm64: Configure secondaries' stack before enabling the MMU Alexandru Elisei
@ 2022-08-09  9:15 ` Alexandru Elisei
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 19/19] arm/arm64: Rework the cache maintenance in asm_mmu_disable Alexandru Elisei
  2022-08-09  9:49 ` [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
  19 siblings, 0 replies; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:15 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris

The arm and arm64 architectures require explicit cache maintenance to
keep the memory in sync with the caches when the MMU is turned on. When
the MMU is off, reads and writes access system memory directly, but with
the MMU on, the same accesses hit the cache.

The sequence of cache maintenance operations to keep the cache contents
in sync with memory is:

1. Dcache invalidation for memory location M before the first write to
that location. This is needed to avoid a dirty cache line for address
M* being evicted after the explicit write that follows, and overwriting
the value that software writes. Dcache clean also works, but invalidation
is faster on bare metal. When running in a virtual machine, dcache
invalidation is promoted to dcache clean + invalidation, so it makes little
difference exactly which of the dcache operations is chosen, as long as
the result is that no cache line is dirty.

2. A write, or multiple writes, to memory address M are performed.

3. After the last write, and before the first read with the MMU on, the
dcache line that holds the value for memory location M needs to be
invalidated. This is needed to make sure that loads fetch data from
memory, where the most recent value written in step 2 is, instead of
reading clean, but stale, values from the cache**.

For robustness and simplicity, the dcache invalidation at step 3 is
performed after the last write with the MMU off.
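
Put together, the sequence for a single memory location looks roughly
like this (a sketch using the helpers added earlier in the series, not a
definitive implementation):

  #include <asm/cacheflush.h>

  static void write_with_mmu_off(unsigned long *p, unsigned long val)
  {
  	unsigned long start = (unsigned long)p;
  	unsigned long end = start + sizeof(*p);

  	/* Step 1: make sure no dirty line covering *p can be evicted later. */
  	dcache_inval_poc(start, end);

  	/* Step 2: the write goes straight to memory (MMU and caches off). */
  	*p = val;

  	/*
  	 * Step 3: drop any clean, but stale, line so that reads with the
  	 * MMU on fetch the value written at step 2 from memory.
  	 */
  	dcache_inval_poc(start, end);
  }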

When running under KVM, the maintenance operations can be omitted when
first enabling the MMU because:

- KVM performs a dcache clean + invalidate on each page the first time a
guest accesses the page. This corresponds to the maintenance at step 1.

- KVM performs a dcache clean + invalidate on the entire guest memory when
  the MMU is turned on. This corresponds to the maintenance at step 3. The
dcache clean is harmless for a guest, because all accessed cache lines are
already clean because of the cache maintenance above.

The situation changes when kvm-unit-tests turns the MMU off and then turns
it back on: KVM will skip the cache maintenance when the MMU is enabled
again, and potentially skip the maintenance on page access if the page is
already mapped at stage 2. In this case, explicit cache maintenance is
required even when running as a KVM guest, although performing it is the
responsibility of the test, since the library and boot code don't turn the
MMU off after enabling it.

Do what is needed to ensure correctness and perform the CMOs before
enabling the MMU on the boot path. The stack is a special case, since
kvm-unit-tests runs C code with the MMU disabled and the compiler-generated
code can modify the stack at any point. So invalidate the entire stack in
the assembly function asm_mmu_enable.

Note that for static variables, the dcache maintenance at step 1 is omitted
because either the variable is stored in the data section of the test
image, which is cleaned to the PoC as per the Linux boot protocol, or in
the BSS section, which is invalidated to the PoC in the assembly entry code.

*The cache line could have been dirtied by higher level software, for
example by firmware.
**Even though the address M is invalidated in step 1, higher level software
could have allocated a new, clean cache line, as a result of a direct or
speculated read.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/cstart.S            | 20 +++++++++++++
 arm/cstart64.S          | 30 ++++++++++++++++++++
 lib/arm/asm/pgtable.h   |  8 ++++++
 lib/arm/mmu.c           |  4 +++
 lib/arm/setup.c         | 62 +++++++++++++++++++++++++++++++++--------
 lib/arm64/asm/pgtable.h | 10 +++++++
 lib/devicetree.c        |  2 +-
 7 files changed, 124 insertions(+), 12 deletions(-)

diff --git a/arm/cstart.S b/arm/cstart.S
index 877559b367de..fc7c558802f1 100644
--- a/arm/cstart.S
+++ b/arm/cstart.S
@@ -34,12 +34,24 @@ start:
 	/* zero BSS */
 	ldr	r4, =bss
 	ldr	r5, =ebss
+	mov	r3, r4
+	dmb	sy
+	dcache_by_line_op dcimvac, sy, r4, r5, r6, r7
+	mov	r4, r3
 	zero_range r4, r5, r6, r7
+	mov	r4, r3
+	dmb	sy
+	dcache_by_line_op dcimvac, sy, r4, r5, r6, r7
 
 	/* zero stack */
 	ldr	r5, =stacktop
 	sub	r4, r5, #THREAD_SIZE
+	mov	r3, r4
+	dmb	sy
+	dcache_by_line_op dcimvac, sy, r4, r5, r6, r7
+	mov	r4, r3
 	zero_range r4, r5, r6, r7
+	/* asm_mmu_enable will perform the rest of the cache maintenance. */
 
 	/*
 	 * set stack, making room at top of stack for cpu0's
@@ -145,6 +157,7 @@ secondary_entry:
 	lsr	r2, #THREAD_SHIFT
 	lsl	r2, #THREAD_SHIFT
 	add	r3, r2, #THREAD_SIZE
+	/* Stack already cleaned to PoC, no need for cache maintenance. */
 	zero_range r2, r3, r4, r5
 	mov	sp, r0
 
@@ -204,6 +217,13 @@ asm_mmu_enable:
 	mcrr	p15, 0, r0, r1, c2
 	isb
 
+	dmb	sy
+	mov	r0, sp
+	lsr	r0, #THREAD_SHIFT
+	lsl	r0, #THREAD_SHIFT
+	add	r1, r0, #THREAD_SIZE
+	dcache_by_line_op dcimvac, sy, r0, r1, r3, r4
+
 	/* SCTLR */
 	mrc	p15, 0, r2, c1, c0, 0
 	orr	r2, #CR_C
diff --git a/arm/cstart64.S b/arm/cstart64.S
index face185a7781..1ce6b9e14d23 100644
--- a/arm/cstart64.S
+++ b/arm/cstart64.S
@@ -49,6 +49,7 @@ start:
 	add     x5, x5, :lo12:reloc_start
 	adrp	x6, reloc_end
 	add     x6, x6, :lo12:reloc_end
+
 1:
 	cmp	x5, x6
 	b.hs	1f
@@ -56,22 +57,44 @@ start:
 	ldr	x8, [x5, #16]			// r_addend
 	add	x8, x8, x4			// val = base + r_addend
 	str	x8, [x4, x7]			// base[r_offset] = val
+	add	x7, x4, x7
+	dmb	sy
+	/* Image is cleaned to PoC, no need for CMOs before the memory write. */
+	dc	ivac, x7
 	add	x5, x5, #24
 	b	1b
 
 1:
+	/* Complete the cache maintenance operations. */
+	dsb	sy
+
 	/* zero BSS */
 	adrp	x4, bss
 	add	x4, x4, :lo12:bss
 	adrp    x5, ebss
 	add     x5, x5, :lo12:ebss
+	/* Stash start of bss, as dcache_by_line_op corrupts it. */
+	mov	x9, x4
+	dmb	sy
+	/* Make sure there are no dirty cache lines that can be evicted. */
+	dcache_by_line_op ivac, sy, x4, x5, x6, x7
+	mov	x4, x9
 	zero_range x4, x5
+	mov	x9, x4
+	dmb	sy
+	/* Invalidate clean and potentially stale cache lines. */
+	dcache_by_line_op ivac, sy, x4, x5, x6, x7
 
 	/* zero and set up stack */
 	adrp    x5, stacktop
 	add     x5, x5, :lo12:stacktop
 	sub	x4, x5, #THREAD_SIZE
+	mov	x9, x4
+	dmb	sy
+	dcache_by_line_op ivac, sy, x4, x5, x6, x7
+	mov	x4, x9
 	zero_range x4, x5
+	/* asm_mmu_enable will perform the rest of the cache maintenance. */
 
 	/* set SCTLR_EL1 to a known value */
 	ldr	x4, =INIT_SCTLR_EL1_MMU_OFF
@@ -143,6 +166,7 @@ secondary_entry:
 	ldr	x0, [x0, :lo12:secondary_data]
 	and	x1, x0, #THREAD_MASK
 	add	x2, x1, #THREAD_SIZE
+	/* Stack already cleaned to PoC, no need for cache maintenance. */
 	zero_range x1, x2
 	mov	sp, x0
 
@@ -212,6 +236,12 @@ asm_mmu_enable:
 	tlbi	vmalle1			// invalidate I + D TLBs
 	dsb	nsh
 
+	dmb	sy
+	mov	x9, sp
+	and	x9, x9, #THREAD_MASK
+	add	x10, x9, #THREAD_SIZE
+	dcache_by_line_op ivac, sy, x9, x10, x11, x12
+
 	/* TCR */
 	ldr	x1, =TCR_TxSZ(VA_BITS) |		\
 		     TCR_TG_FLAGS  |			\
diff --git a/lib/arm/asm/pgtable.h b/lib/arm/asm/pgtable.h
index 1911e35bb091..3cfdccd00733 100644
--- a/lib/arm/asm/pgtable.h
+++ b/lib/arm/asm/pgtable.h
@@ -22,6 +22,8 @@
  */
 #include <linux/compiler.h>
 
+#include <asm/cacheflush.h>
+
 #define pgtable_va(x)		((void *)(unsigned long)(x))
 #define pgtable_pa(x)		((unsigned long)(x))
 
@@ -44,7 +46,9 @@
 static inline pgd_t *pgd_alloc_early(void)
 {
 	pgd_t *pgd = memalign(PAGE_SIZE, PAGE_SIZE);
+	dcache_inval_page_poc((unsigned long)pgd);
 	memset(pgd, 0, PAGE_SIZE);
+	dcache_inval_page_poc((unsigned long)pgd);
 	return pgd;
 }
 static inline pgd_t *pgd_alloc(void)
@@ -86,7 +90,9 @@ static inline pmd_t *pmd_alloc_one_early(void)
 {
 	assert(PTRS_PER_PMD * sizeof(pmd_t) == PAGE_SIZE);
 	pmd_t *pmd = memalign(PAGE_SIZE, PAGE_SIZE);
+	dcache_inval_page_poc((unsigned long)pmd);
 	memset(pmd, 0, PAGE_SIZE);
+	dcache_inval_page_poc((unsigned long)pmd);
 	return pmd;
 }
 static inline pmd_t *pmd_alloc(pgd_t *pgd, unsigned long addr)
@@ -125,7 +131,9 @@ static inline pte_t *pte_alloc_one_early(void)
 {
 	assert(PTRS_PER_PTE * sizeof(pte_t) == PAGE_SIZE);
 	pte_t *pte = memalign(PAGE_SIZE, PAGE_SIZE);
+	dcache_inval_page_poc((unsigned long)pte);
 	memset(pte, 0, PAGE_SIZE);
+	dcache_inval_page_poc((unsigned long)pte);
 	return pte;
 }
 static inline pte_t *pte_alloc(pmd_t *pmd, unsigned long addr)
diff --git a/lib/arm/mmu.c b/lib/arm/mmu.c
index 19c98a8a9640..56c64d9c26d4 100644
--- a/lib/arm/mmu.c
+++ b/lib/arm/mmu.c
@@ -5,6 +5,7 @@
  *
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
+#include <asm/cacheflush.h>
 #include <asm/setup.h>
 #include <asm/thread_info.h>
 #include <asm/cpumask.h>
@@ -15,6 +16,7 @@
 #include <asm/pgtable.h>
 #include <asm/pgtable-hwdef.h>
 
+#include "alloc_phys.h"
 #include "io.h"
 #include "vmalloc.h"
 
@@ -186,6 +188,8 @@ void mmu_setup_early(phys_addr_t unused0, void *unused1)
 	install_page_prot(mmu_idmap, (phys_addr_t)(unsigned long)uart_base,
 			  (uintptr_t)uart_base, uart_prot);
 
+	phys_alloc_perform_cache_maintenance(dcache_clean_inval_addr_poc);
+
 	asm_mmu_enable((phys_addr_t)(unsigned long)mmu_idmap);
 }
 
diff --git a/lib/arm/setup.c b/lib/arm/setup.c
index 73f7c22c6828..54422fe7a981 100644
--- a/lib/arm/setup.c
+++ b/lib/arm/setup.c
@@ -19,6 +19,7 @@
 #include <vmalloc.h>
 #include <auxinfo.h>
 #include <argv.h>
+#include <asm/cacheflush.h>
 #include <asm/thread_info.h>
 #include <asm/setup.h>
 #include <asm/page.h>
@@ -160,6 +161,7 @@ static void mem_init(phys_addr_t freemem_start)
 	struct mem_region *freemem, *r, mem = {
 		.start = (phys_addr_t)-1,
 	};
+	int nr_regions = 0;
 
 	freemem = mem_region_find(freemem_start);
 	assert(freemem && !(freemem->flags & (MR_F_IO | MR_F_CODE)));
@@ -171,6 +173,7 @@ static void mem_init(phys_addr_t freemem_start)
 			if (r->end > mem.end)
 				mem.end = r->end;
 		}
+		nr_regions++;
 	}
 	assert(mem.end && !(mem.start & ~PHYS_MASK));
 	mem.end &= PHYS_MASK;
@@ -184,6 +187,9 @@ static void mem_init(phys_addr_t freemem_start)
 	/* Ensure our selected freemem range is somewhere in our full range */
 	assert(freemem_start >= mem.start && freemem->end <= mem.end);
 
+	dcache_inval_poc((unsigned long)mem_regions,
+			 (unsigned long)(mem_regions + nr_regions));
+
 	__phys_offset = mem.start;	/* PHYS_OFFSET */
 	__phys_end = mem.end;		/* PHYS_END */
 
@@ -240,32 +246,66 @@ static void timer_save_state(void)
 	__timer_state.vtimer.irq_flags = fdt32_to_cpu(data[8]);
 }
 
-void setup(const void *fdt, phys_addr_t freemem_start)
+extern const void *fdt;
+static void do_fdt_move(void *freemem, const void *fdt_addr, u32 *fdt_size)
 {
-	void *freemem;
-	const char *bootargs, *tmp;
-	u32 fdt_size;
 	int ret;
 
-	assert(sizeof(long) == 8 || freemem_start < (3ul << 30));
-	freemem = (void *)(unsigned long)freemem_start;
-
 	/* Move the FDT to the base of free memory */
-	fdt_size = fdt_totalsize(fdt);
-	ret = fdt_move(fdt, freemem, fdt_size);
+	*fdt_size = fdt_totalsize(fdt_addr);
+
+	/* Invalidate potentially dirty cache lines. */
+	dcache_inval_poc((unsigned long)freemem, (unsigned long)freemem + *fdt_size);
+	ret = fdt_move(fdt_addr, freemem, *fdt_size);
 	assert(ret == 0);
+
 	ret = dt_init(freemem);
+	/*
+	 * Invalidate the clean (the bootloader cleans the test image to PoC),
+	 * but potentially stale, cache line that holds the value of the
+	 * variable fdt, to force the CPU to fetch it from memory when the MMU
+	 * is enabled.
+	 */
+	dcache_clean_inval_addr_poc((unsigned long)&fdt);
 	assert(ret == 0);
-	freemem += fdt_size;
+}
+
+static void initrd_move(void *freemem)
+{
+	const char *tmp;
+	int ret;
 
 	/* Move the initrd to the top of the FDT */
 	ret = dt_get_initrd(&tmp, &initrd_size);
 	assert(ret == 0 || ret == -FDT_ERR_NOTFOUND);
 	if (ret == 0) {
 		initrd = freemem;
+		/* Invalidate the potentially stale cached value for initrd. */
+		dcache_clean_inval_addr_poc((unsigned long)&initrd);
+		/* Invalidate potentially dirty cache lines. */
+		dcache_inval_poc((unsigned long)freemem, (unsigned long)freemem + initrd_size);
 		memmove(initrd, tmp, initrd_size);
-		freemem += initrd_size;
 	}
+}
+
+void setup(const void *fdt_addr, phys_addr_t freemem_start)
+{
+	void *freemem;
+	const char *bootargs;
+	u32 fdt_size;
+	int ret;
+
+	assert(sizeof(long) == 8 || freemem_start < (3ul << 30));
+	freemem = (void *)(unsigned long)freemem_start;
+
+	do_fdt_move(freemem, fdt_addr, &fdt_size);
+	freemem += fdt_size;
+
+	initrd_move(freemem);
+	freemem += initrd_size;
+
+	/* Invalidate potentially stale cache lines. */
+	dcache_inval_poc(freemem_start, (unsigned long)freemem);
 
 	mem_regions_add_dt_regions();
 	mem_regions_add_assumed();
diff --git a/lib/arm64/asm/pgtable.h b/lib/arm64/asm/pgtable.h
index 98d51c89b7c0..c026fd01e4c8 100644
--- a/lib/arm64/asm/pgtable.h
+++ b/lib/arm64/asm/pgtable.h
@@ -15,6 +15,8 @@
  */
 #include <alloc.h>
 #include <alloc_page.h>
+
+#include <asm/cacheflush.h>
 #include <asm/setup.h>
 #include <asm/page.h>
 #include <asm/pgtable-hwdef.h>
@@ -50,7 +52,9 @@
 static inline pgd_t *pgd_alloc_early(void)
 {
 	pgd_t *pgd = memalign(PAGE_SIZE, PAGE_SIZE);
+	dcache_inval_page_poc((unsigned long)pgd);
 	memset(pgd, 0, PAGE_SIZE);
+	dcache_inval_page_poc((unsigned long)pgd);
 	return pgd;
 }
 static inline pgd_t *pgd_alloc(void)
@@ -96,7 +100,9 @@ static inline pmd_t *pmd_alloc_one_early(void)
 {
 	assert(PTRS_PER_PMD * sizeof(pmd_t) == PAGE_SIZE);
 	pmd_t *pmd = memalign(PAGE_SIZE, PAGE_SIZE);
+	dcache_inval_page_poc((unsigned long)pmd);
 	memset(pmd, 0, PAGE_SIZE);
+	dcache_inval_page_poc((unsigned long)pmd);
 	return pmd;
 }
 static inline pmd_t *pmd_alloc(pud_t *pud, unsigned long addr)
@@ -135,7 +141,9 @@ static inline pud_t *pud_alloc_one_early(void)
 {
 	assert(PTRS_PER_PUD * sizeof(pud_t) == PAGE_SIZE);
 	pud_t *pud = memalign(PAGE_SIZE, PAGE_SIZE);
+	dcache_inval_page_poc((unsigned long)pud);
 	memset(pud, 0, PAGE_SIZE);
+	dcache_inval_page_poc((unsigned long)pud);
 	return pud;
 }
 static inline pud_t *pud_alloc(pgd_t *pgd, unsigned long addr)
@@ -174,7 +182,9 @@ static inline pte_t *pte_alloc_one_early(void)
 {
 	assert(PTRS_PER_PTE * sizeof(pte_t) == PAGE_SIZE);
 	pte_t *pte = memalign(PAGE_SIZE, PAGE_SIZE);
+	dcache_inval_page_poc((unsigned long)pte);
 	memset(pte, 0, PAGE_SIZE);
+	dcache_inval_page_poc((unsigned long)pte);
 	return pte;
 }
 static inline pte_t *pte_alloc(pmd_t *pmd, unsigned long addr)
diff --git a/lib/devicetree.c b/lib/devicetree.c
index 78c1f6fbe474..be3c8a30a72d 100644
--- a/lib/devicetree.c
+++ b/lib/devicetree.c
@@ -7,7 +7,7 @@
 #include "libfdt/libfdt.h"
 #include "devicetree.h"
 
-static const void *fdt;
+const void *fdt;
 
 const void *dt_fdt(void)
 {
-- 
2.37.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [kvm-unit-tests RFC PATCH 19/19] arm/arm64: Rework the cache maintenance in asm_mmu_disable
  2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
                   ` (17 preceding siblings ...)
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 18/19] arm/arm64: Perform dcache maintenance at boot Alexandru Elisei
@ 2022-08-09  9:15 ` Alexandru Elisei
  2022-08-09 13:53   ` Nikos Nikoleris
  2022-08-09  9:49 ` [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
  19 siblings, 1 reply; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:15 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris

asm_mmu_disable is overly ambitious and provably incorrect:

1. It tries to clean and invalidate the data caches for the *entire*
memory, which is highly unnecessary, as it's very unlikely that a test
will write to the entire memory, and even more unlikely that a test will
modify the text section of the test image.

2. There is no corresponding dcache invalidation for the entire memory in
asm_mmu_enable, leaving it up to the test that disabled the MMU to do the
cache maintenance in an asymmetrical fashion: only when re-enabling the
MMU, but not when disabling it.

3. It's missing the DMB SY memory barrier to ensure that the dcache
maintenance is performed after the last store executed in program order
before calling asm_mmu_disable.

Fix all of the issues in one go by doing the cache maintenance only for
the stack, since the stack is outside the control of the C code, and add
the missing memory barrier.

The code used to test that mmu_disable works correctly is similar to the
code used to test commit 410b3bf09e76 ("arm/arm64: Perform dcache clean
+ invalidate after turning MMU off"), with extra cache maintenance
added:

+#include <alloc_page.h>
+#include <asm/cacheflush.h>
+#include <asm/mmu.h>
 int main(int argc, char **argv)
 {
+       int *x = alloc_page();
+       bool pass = true;
+       int i;
+
+       for  (i = 0; i < 1000000; i++) {
+               *x = 0x42;
+               dcache_clean_addr_poc((unsigned long)x);
+               mmu_disable();
+               if (*x != 0x42) {
+                       pass = false;
+                       break;
+               }
+               *x = 0x50;
+               /* Needed for the invalidation only. */
+               dcache_clean_inval_addr_poc((unsigned long)x);
+               mmu_enable(current_thread_info()->pgtable);
+               if (*x != 0x50) {
+                       pass = false;
+                       break;
+               }
+       }
+       report(pass, "MMU disable cache maintenance");

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/cstart.S   | 11 ++++++-----
 arm/cstart64.S | 11 +++++------
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arm/cstart.S b/arm/cstart.S
index fc7c558802f1..b27de44f30a6 100644
--- a/arm/cstart.S
+++ b/arm/cstart.S
@@ -242,11 +242,12 @@ asm_mmu_disable:
 	mcr	p15, 0, r0, c1, c0, 0
 	isb
 
-	ldr	r0, =__phys_offset
-	ldr	r0, [r0]
-	ldr	r1, =__phys_end
-	ldr	r1, [r1]
-	dcache_by_line_op dccimvac, sy, r0, r1, r2, r3
+	dmb	sy
+	mov	r0, sp
+	lsr	r0, #THREAD_SHIFT
+	lsl	r0, #THREAD_SHIFT
+	add	r1, r0, #THREAD_SIZE
+	dcache_by_line_op dccmvac, sy, r0, r1, r3, r4
 
 	mov     pc, lr
 
diff --git a/arm/cstart64.S b/arm/cstart64.S
index 1ce6b9e14d23..af4970775298 100644
--- a/arm/cstart64.S
+++ b/arm/cstart64.S
@@ -283,12 +283,11 @@ asm_mmu_disable:
 	msr	sctlr_el1, x0
 	isb
 
-	/* Clean + invalidate the entire memory */
-	adrp	x0, __phys_offset
-	ldr	x0, [x0, :lo12:__phys_offset]
-	adrp	x1, __phys_end
-	ldr	x1, [x1, :lo12:__phys_end]
-	dcache_by_line_op civac, sy, x0, x1, x2, x3
+	dmb	sy
+	mov	x9, sp
+	and	x9, x9, #THREAD_MASK
+	add	x10, x9, #THREAD_SIZE
+	dcache_by_line_op cvac, sy, x9, x10, x11, x12
 
 	ret
 
-- 
2.37.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot
  2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
                   ` (18 preceding siblings ...)
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 19/19] arm/arm64: Rework the cache maintenance in asm_mmu_disable Alexandru Elisei
@ 2022-08-09  9:49 ` Alexandru Elisei
  19 siblings, 0 replies; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09  9:49 UTC (permalink / raw)
  To: pbonzini, thuth, andrew.jones, kvm, kvmarm, nikos.nikoleris

Hi,

Forgot to add a link to the gitlab branch with the series [1].

[1] https://gitlab.arm.com/linux-arm/kvm-unit-tests-ae/-/tree/arm-arm64-rework-cache-maintenance-at-boot-v1

Thanks,
Alex

On Tue, Aug 09, 2022 at 10:15:39AM +0100, Alexandru Elisei wrote:
> I got the idea for this series as I was looking at the UEFI support series
> [1]. More specifically, I realized that the cache maintenance performed by
> asm_mmu_disable is insuficient. Patch #19 ("arm/arm64: Rework the cache
> maintenance in asm_mmu_disable") describes what is wrong with
> asm_mmu_disable. A detailed explanation of what cache maintenance is needed
> and why is needed can be found in patch #18 ("arm/arm64: Perform dcache
> maintenance at boot").
> 
> Then I realized that I couldn't fix only asm_mmu_disable, and leave the
> rest of kvm-unit-tests without the needed cache maintenance, so here it is,
> my attempt at adding the cache maintenace operations (from now on, CMOs)
> required by the architecture.
> 
> My approach is to try to enable the MMU and build the translation tables as
> soon as possible, to avoid as much of cache maintenance as possible. I
> didn't want to do it in the early assembly code, like Linux, because I like
> the fact that kvm-unit-tests keeps the assembly code to a minimum, and I
> wanted to preserve that. So I made the physical allocator simpler (patches
> #2-#6) so it can be used to create the translation tables immediately after
> the memory regions are populated.
> 
> After moving some code around, especially how the secondaries are brought
> online, the dcache maintenance is implemented in patch #18 ("arm/arm64:
> Perform dcache maintenance at boot").
> 
> The series is an RFC, and I open to suggestions about how to do things
> better; I'm happy to rework the entire series if a better approach is
> proposed.
> 
> Why is this needed? Nobody complained about test failing because of missing
> CMOs before, so why add them now? I see two reasons for the series:
> 
> 1. For architectural correctness. The emphasis has been so far on the test
> themselves to be architectural compliant, but I believe that the boot code
> should get the same treatment. kvm-unit-tests has started to be used in
> different ways than before, and I don't think that we should limit
> ourselves to running under one hypervisor, or running under a hypervisor at
> all. Which brings me to point number 2.
> 
> 2. If nothing else, this can serve as a showcase for the UEFI support
> series for the required cache maintenance. Although I hope that UEFI
> support will end up sharing at least some of the boot code with the
> non-UEFI boot path.
> 
> This is an RFC and has some rough edges, probably also bugs, but I believe
> the concept to be sound. If/when the series stabilizes, I'll probably split
> it into separate series (for example, the __ASSEMBLY__ define patch could
> probably be separate from the others). Tested by running all the arm and
> arm64 tests on a rockpro64 with qemu.
> 
> [1] https://lore.kernel.org/all/20220630100324.3153655-1-nikos.nikoleris@arm.com/
> 
> Alexandru Elisei (19):
>   Makefile: Define __ASSEMBLY__ for assembly files
>   lib/alloc_phys: Initialize align_min
>   lib/alloc_phys: Use phys_alloc_aligned_safe and rename it to
>     memalign_early
>   powerpc: Use the page allocator
>   lib/alloc_phys: Remove locking
>   lib/alloc_phys: Remove allocation accounting
>   arm/arm64: Mark the phys_end parameter as unused in setup_mmu()
>   arm/arm64: Use pgd_alloc() to allocate mmu_idmap
>   arm/arm64: Zero secondary CPUs' stack
>   arm/arm64: Enable the MMU early
>   arm/arm64: Map the UART when creating the translation tables
>   arm/arm64: assembler.h: Replace size with end address for
>     dcache_by_line_op
>   arm: page.h: Add missing libcflat.h include
>   arm/arm64: Add C functions for doing cache maintenance
>   lib/alloc_phys: Add callback to perform cache maintenance
>   arm/arm64: Allocate secondaries' stack using the page allocator
>   arm/arm64: Configure secondaries' stack before enabling the MMU
>   arm/arm64: Perform dcache maintenance at boot
>   arm/arm64: Rework the cache maintenance in asm_mmu_disable
> 
>  Makefile                   |   5 +-
>  arm/Makefile.arm           |   4 +-
>  arm/Makefile.arm64         |   4 +-
>  arm/Makefile.common        |   4 +-
>  arm/cstart.S               |  59 ++++++++++++------
>  arm/cstart64.S             |  56 +++++++++++++----
>  lib/alloc_phys.c           | 122 ++++++++++++-------------------------
>  lib/alloc_phys.h           |  13 +++-
>  lib/arm/asm/assembler.h    |  15 ++---
>  lib/arm/asm/cacheflush.h   |   1 +
>  lib/arm/asm/mmu-api.h      |   1 +
>  lib/arm/asm/mmu.h          |   6 --
>  lib/arm/asm/page.h         |   2 +
>  lib/arm/asm/pgtable.h      |  52 ++++++++++++++--
>  lib/arm/asm/thread_info.h  |   3 +-
>  lib/arm/cache.S            |  89 +++++++++++++++++++++++++++
>  lib/arm/io.c               |   5 ++
>  lib/arm/io.h               |   3 +
>  lib/arm/mmu.c              |  60 +++++++++++-------
>  lib/arm/processor.c        |   6 +-
>  lib/arm/setup.c            |  66 ++++++++++++++++----
>  lib/arm/smp.c              |   9 ++-
>  lib/arm64/asm/assembler.h  |  11 ++--
>  lib/arm64/asm/cacheflush.h |  32 ++++++++++
>  lib/arm64/asm/mmu.h        |   5 --
>  lib/arm64/asm/pgtable.h    |  67 ++++++++++++++++++--
>  lib/arm64/cache.S          |  85 ++++++++++++++++++++++++++
>  lib/arm64/processor.c      |   5 +-
>  lib/devicetree.c           |   2 +-
>  lib/powerpc/setup.c        |   8 +++
>  powerpc/Makefile.common    |   1 +
>  powerpc/cstart64.S         |   1 -
>  powerpc/spapr_hcall.c      |   5 +-
>  33 files changed, 608 insertions(+), 199 deletions(-)
>  create mode 100644 lib/arm/asm/cacheflush.h
>  create mode 100644 lib/arm/cache.S
>  create mode 100644 lib/arm64/asm/cacheflush.h
>  create mode 100644 lib/arm64/cache.S
> 
> -- 
> 2.37.1
> 
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [kvm-unit-tests RFC PATCH 01/19] Makefile: Define __ASSEMBLY__ for assembly files
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 01/19] Makefile: Define __ASSEMBLY__ for assembly files Alexandru Elisei
@ 2022-08-09 12:36   ` Nikos Nikoleris
  2022-09-20  8:11   ` Andrew Jones
  1 sibling, 0 replies; 49+ messages in thread
From: Nikos Nikoleris @ 2022-08-09 12:36 UTC (permalink / raw)
  To: Alexandru Elisei, pbonzini, thuth, andrew.jones, kvm, kvmarm

On 09/08/2022 10:15, Alexandru Elisei wrote:
> There are 25 header files today (found with grep -r "#ifndef __ASSEMBLY__)
> whose functionality relies on the __ASSEMBLY__ preprocessor constant being
> defined in order to work correctly. So far, kvm-unit-tests has relied on
> the assembly files to define the constant before including any header
> files which depend on it.
> 
> Let's make sure that nobody gets this wrong and define it as a compiler
> constant when compiling assembly files. __ASSEMBLY__ is now defined for all
> .S files, even those that didn't set it explicitly before.
> 

This is a great idea.

Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>

Thanks,

Nikos

> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>   Makefile           | 5 ++++-
>   arm/cstart.S       | 1 -
>   arm/cstart64.S     | 1 -
>   powerpc/cstart64.S | 1 -
>   4 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 6ed5deaccb41..42c61aa45d50 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -94,6 +94,9 @@ CFLAGS += $(wmissing_parameter_type)
>   CFLAGS += $(wold_style_declaration)
>   CFLAGS += -Woverride-init -Wmissing-prototypes -Wstrict-prototypes
>   
> +AFLAGS  = $(CFLAGS)
> +AFLAGS += -D__ASSEMBLY__
> +
>   autodepend-flags = -MMD -MF $(dir $*).$(notdir $*).d
>   
>   LDFLAGS += $(CFLAGS)
> @@ -117,7 +120,7 @@ directories:
>   	@mkdir -p $(OBJDIRS)
>   
>   %.o: %.S
> -	$(CC) $(CFLAGS) -c -nostdlib -o $@ $<
> +	$(CC) $(AFLAGS) -c -nostdlib -o $@ $<
>   
>   -include */.*.d */*/.*.d
>   
> diff --git a/arm/cstart.S b/arm/cstart.S
> index 7036e67fc0d8..39260e0fa470 100644
> --- a/arm/cstart.S
> +++ b/arm/cstart.S
> @@ -5,7 +5,6 @@
>    *
>    * This work is licensed under the terms of the GNU LGPL, version 2.
>    */
> -#define __ASSEMBLY__
>   #include <auxinfo.h>
>   #include <asm/assembler.h>
>   #include <asm/thread_info.h>
> diff --git a/arm/cstart64.S b/arm/cstart64.S
> index e4ab7d06251e..d62360cf3859 100644
> --- a/arm/cstart64.S
> +++ b/arm/cstart64.S
> @@ -5,7 +5,6 @@
>    *
>    * This work is licensed under the terms of the GNU GPL, version 2.
>    */
> -#define __ASSEMBLY__
>   #include <auxinfo.h>
>   #include <asm/asm-offsets.h>
>   #include <asm/assembler.h>
> diff --git a/powerpc/cstart64.S b/powerpc/cstart64.S
> index 972851f9ed65..1a823385fd0f 100644
> --- a/powerpc/cstart64.S
> +++ b/powerpc/cstart64.S
> @@ -5,7 +5,6 @@
>    *
>    * This work is licensed under the terms of the GNU LGPL, version 2.
>    */
> -#define __ASSEMBLY__
>   #include <asm/hcall.h>
>   #include <asm/ppc_asm.h>
>   #include <asm/rtas.h>
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [kvm-unit-tests RFC PATCH 09/19] arm/arm64: Zero secondary CPUs' stack
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 09/19] arm/arm64: Zero secondary CPUs' stack Alexandru Elisei
@ 2022-08-09 12:56   ` Nikos Nikoleris
  2022-08-10  9:42     ` Alexandru Elisei
  2022-09-20  9:24   ` Andrew Jones
  1 sibling, 1 reply; 49+ messages in thread
From: Nikos Nikoleris @ 2022-08-09 12:56 UTC (permalink / raw)
  To: Alexandru Elisei, pbonzini, thuth, andrew.jones, kvm, kvmarm

On 09/08/2022 10:15, Alexandru Elisei wrote:
> For the boot CPU, the entire stack is zeroed in the entry code. For the
> secondaries, only struct thread_info, which lives at the bottom of the
> stack, is zeroed in thread_info_init().
>

That's a good point.

> Be consistent and zero the entire stack for the secondaries. This should
> also improve reproducibility of the testsuite, as all the stacks now start
> with the same contents, which is zero. And now that all the stacks are
> zeroed in the entry code, there is no need to explicitly zero struct
> thread_info in thread_info_init().
>

Wouldn't it make more sense to call memset(sp, 0, THREAD_SIZE); from 
thread_stack_alloc() instead and avoid doing this in assembly? Do we 
expect anyone to jump to secondary_entry without calling 
thread_stack_alloc() first?
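
Something along these lines (just a sketch of the idea, reusing the
existing helper from lib/arm/asm/thread_info.h):

  static inline void *thread_stack_alloc(void)
  {
  	void *sp = memalign(THREAD_ALIGNMENT, THREAD_SIZE);

  	/* Zero the whole stack here instead of in the assembly entry code. */
  	memset(sp, 0, THREAD_SIZE);
  	return sp + THREAD_START_SP;
  }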

Thanks,

Nikos

> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>   arm/cstart.S          | 6 ++++++
>   arm/cstart64.S        | 3 +++
>   lib/arm/processor.c   | 1 -
>   lib/arm64/processor.c | 1 -
>   4 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/arm/cstart.S b/arm/cstart.S
> index 39260e0fa470..39e70f40986a 100644
> --- a/arm/cstart.S
> +++ b/arm/cstart.S
> @@ -151,7 +151,13 @@ secondary_entry:
>   	 */
>   	ldr	r1, =secondary_data
>   	ldr	r0, [r1]
> +	mov	r2, r0
> +	lsr	r2, #THREAD_SHIFT
> +	lsl	r2, #THREAD_SHIFT
> +	add	r3, r2, #THREAD_SIZE
> +	zero_range r2, r3, r4, r5
>   	mov	sp, r0
> +
>   	bl	exceptions_init
>   	bl	enable_vfp
>   
> diff --git a/arm/cstart64.S b/arm/cstart64.S
> index d62360cf3859..54773676d1d5 100644
> --- a/arm/cstart64.S
> +++ b/arm/cstart64.S
> @@ -156,6 +156,9 @@ secondary_entry:
>   	/* set the stack */
>   	adrp	x0, secondary_data
>   	ldr	x0, [x0, :lo12:secondary_data]
> +	and	x1, x0, #THREAD_MASK
> +	add	x2, x1, #THREAD_SIZE
> +	zero_range x1, x2
>   	mov	sp, x0
>   
>   	/* finish init in C code */
> diff --git a/lib/arm/processor.c b/lib/arm/processor.c
> index 9d5759686b73..ceff1c0a1bd2 100644
> --- a/lib/arm/processor.c
> +++ b/lib/arm/processor.c
> @@ -117,7 +117,6 @@ void do_handle_exception(enum vector v, struct pt_regs *regs)
>   
>   void thread_info_init(struct thread_info *ti, unsigned int flags)
>   {
> -	memset(ti, 0, sizeof(struct thread_info));
>   	ti->cpu = mpidr_to_cpu(get_mpidr());
>   	ti->flags = flags;
>   }
> diff --git a/lib/arm64/processor.c b/lib/arm64/processor.c
> index 831207c16587..268b2858f0be 100644
> --- a/lib/arm64/processor.c
> +++ b/lib/arm64/processor.c
> @@ -232,7 +232,6 @@ void install_vector_handler(enum vector v, vector_fn fn)
>   
>   static void __thread_info_init(struct thread_info *ti, unsigned int flags)
>   {
> -	memset(ti, 0, sizeof(struct thread_info));
>   	ti->cpu = mpidr_to_cpu(get_mpidr());
>   	ti->flags = flags;
>   }

* Re: [kvm-unit-tests RFC PATCH 12/19] arm/arm64: assembler.h: Replace size with end address for dcache_by_line_op
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 12/19] arm/arm64: assembler.h: Replace size with end address for dcache_by_line_op Alexandru Elisei
@ 2022-08-09 13:01   ` Nikos Nikoleris
  2022-09-20  9:37   ` Andrew Jones
  1 sibling, 0 replies; 49+ messages in thread
From: Nikos Nikoleris @ 2022-08-09 13:01 UTC (permalink / raw)
  To: Alexandru Elisei, pbonzini, thuth, andrew.jones, kvm, kvmarm

On 09/08/2022 10:15, Alexandru Elisei wrote:
> Commit b5f659be4775 ("arm/arm64: Remove dcache_line_size global
> variable") moved the dcache_by_line_op macro to assembler.h and changed
> it to take the size of the region instead of the end address as
> parameter. This was done to keep the file in sync with the upstream
> Linux kernel implementation at the time.
> 
> But in both places where the macro is used, the code has the start and
> end address of the region, and it has to compute the size to pass it to
> dcache_by_line_op. Then the macro itself computes the end by adding size
> to start.
> 
> Get rid of this massaging of parameters and change the macro to take the
> end address as parameter directly.
> 
> Besides slightly simplifying the code by removing two unneeded arithmetic
> operations, this makes the macro compatible with the current upstream
> version of Linux (which was similarly changed to take the end address in
> commit 163d3f80695e ("arm64: dcache_by_line_op to take end parameter
> instead of size")), which will allow us to reuse (part of) the Linux C
> wrappers over the assembly macro.
> 
> The change has been tested with the same snippet of code used to test
> commit 410b3bf09e76 ("arm/arm64: Perform dcache clean + invalidate after
> turning MMU off").
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>

Thanks,

Nikos

> ---
>   arm/cstart.S              |  1 -
>   arm/cstart64.S            |  1 -
>   lib/arm/asm/assembler.h   | 11 +++++------
>   lib/arm64/asm/assembler.h | 11 +++++------
>   4 files changed, 10 insertions(+), 14 deletions(-)
> 
> diff --git a/arm/cstart.S b/arm/cstart.S
> index 39e70f40986a..096a77c454f4 100644
> --- a/arm/cstart.S
> +++ b/arm/cstart.S
> @@ -226,7 +226,6 @@ asm_mmu_disable:
>   	ldr	r0, [r0]
>   	ldr	r1, =__phys_end
>   	ldr	r1, [r1]
> -	sub	r1, r1, r0
>   	dcache_by_line_op dccimvac, sy, r0, r1, r2, r3
>   
>   	mov     pc, lr
> diff --git a/arm/cstart64.S b/arm/cstart64.S
> index 54773676d1d5..7cc90a9fa13f 100644
> --- a/arm/cstart64.S
> +++ b/arm/cstart64.S
> @@ -258,7 +258,6 @@ asm_mmu_disable:
>   	ldr	x0, [x0, :lo12:__phys_offset]
>   	adrp	x1, __phys_end
>   	ldr	x1, [x1, :lo12:__phys_end]
> -	sub	x1, x1, x0
>   	dcache_by_line_op civac, sy, x0, x1, x2, x3
>   
>   	ret
> diff --git a/lib/arm/asm/assembler.h b/lib/arm/asm/assembler.h
> index 4200252dd14d..db5f0f55027c 100644
> --- a/lib/arm/asm/assembler.h
> +++ b/lib/arm/asm/assembler.h
> @@ -25,17 +25,16 @@
>   
>   /*
>    * Macro to perform a data cache maintenance for the interval
> - * [addr, addr + size).
> + * [addr, end).
>    *
>    * 	op:		operation to execute
>    * 	domain		domain used in the dsb instruction
>    * 	addr:		starting virtual address of the region
> - * 	size:		size of the region
> - * 	Corrupts:	addr, size, tmp1, tmp2
> + * 	end:		the end of the region (non-inclusive)
> + * 	Corrupts:	addr, tmp1, tmp2
>    */
> -	.macro dcache_by_line_op op, domain, addr, size, tmp1, tmp2
> +	.macro dcache_by_line_op op, domain, addr, end, tmp1, tmp2
>   	dcache_line_size \tmp1, \tmp2
> -	add	\size, \addr, \size
>   	sub	\tmp2, \tmp1, #1
>   	bic	\addr, \addr, \tmp2
>   9998:
> @@ -45,7 +44,7 @@
>   	.err
>   	.endif
>   	add	\addr, \addr, \tmp1
> -	cmp	\addr, \size
> +	cmp	\addr, \end
>   	blo	9998b
>   	dsb	\domain
>   	.endm
> diff --git a/lib/arm64/asm/assembler.h b/lib/arm64/asm/assembler.h
> index aa8c65a2bb4a..1e09d65af4a7 100644
> --- a/lib/arm64/asm/assembler.h
> +++ b/lib/arm64/asm/assembler.h
> @@ -28,25 +28,24 @@
>   
>   /*
>    * Macro to perform a data cache maintenance for the interval
> - * [addr, addr + size). Use the raw value for the dcache line size because
> + * [addr, end). Use the raw value for the dcache line size because
>    * kvm-unit-tests has no concept of scheduling.
>    *
>    * 	op:		operation passed to dc instruction
>    * 	domain:		domain used in dsb instruction
>    * 	addr:		starting virtual address of the region
> - * 	size:		size of the region
> - * 	Corrupts:	addr, size, tmp1, tmp2
> + * 	end:		the end of the region (non-inclusive)
> + * 	Corrupts:	addr, tmp1, tmp2
>    */
>   
> -	.macro dcache_by_line_op op, domain, addr, size, tmp1, tmp2
> +	.macro dcache_by_line_op op, domain, addr, end, tmp1, tmp2
>   	raw_dcache_line_size \tmp1, \tmp2
> -	add	\size, \addr, \size
>   	sub	\tmp2, \tmp1, #1
>   	bic	\addr, \addr, \tmp2
>   9998:
>   	dc	\op, \addr
>   	add	\addr, \addr, \tmp1
> -	cmp	\addr, \size
> +	cmp	\addr, \end
>   	b.lo	9998b
>   	dsb	\domain
>   	.endm

* Re: [kvm-unit-tests RFC PATCH 19/19] arm/arm64: Rework the cache maintenance in asm_mmu_disable
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 19/19] arm/arm64: Rework the cache maintenance in asm_mmu_disable Alexandru Elisei
@ 2022-08-09 13:53   ` Nikos Nikoleris
  2022-08-09 14:22     ` Alexandru Elisei
  0 siblings, 1 reply; 49+ messages in thread
From: Nikos Nikoleris @ 2022-08-09 13:53 UTC (permalink / raw)
  To: Alexandru Elisei, pbonzini, thuth, andrew.jones, kvm, kvmarm

Hi Alex,

On 09/08/2022 10:15, Alexandru Elisei wrote:
> asm_mmu_disable is overly ambitious and provably incorrect:
>
> 1. It tries to clean and invalidate the data caches for the *entire*
> memory, which is highly unnecessary, as it's very unlikely that a test
> will write to the entire memory, and even more unlikely that a test will
> modify the text section of the test image.
>

While it appears that we don't modify the text section, there is some
loading happening before we start executing a test. Are you sure that
the loader doesn't leave the memory dirty?

> 2. There is no corresponding dcache invalidate command for the entire
> memory in asm_mmu_enable, leaving it up to the test that disabled the
> MMU to do the cache maintenance in an asymmetrical fashion: only for
> re-enabling the MMU, but not for disabling it.
>
> 3. It's missing the DMB SY memory barrier to ensure that the dcache
> maintenance is performed after the last store executed in program order
> before calling asm_mmu_disable.
>

I am not sure why this is needed. In general, iiuc, a store to location
x followed by a DC CVAC to x in program order doesn't need a barrier (see
the Arm ARM, DDI 0487G.b, "Data cache maintenance instructions" at K11.5.1
and "Ordering and completion of data and instruction cache instructions"
at D4-2656). It doesn't hurt to have it, but I think it's unnecessary.

Thanks,

Nikos

> Fix all of the issues in one go, by doing the cache maintenance only for
> the stack, as that is out of the control of the C code, and add the missing
> memory barrier.
>
> The code used to test that mmu_disable works correctly is similar to the
> code used to test commit 410b3bf09e76 ("arm/arm64: Perform dcache clean
> + invalidate after turning MMU off"), with extra cache maintenance
> added:
>
> +#include <alloc_page.h>
> +#include <asm/cacheflush.h>
> +#include <asm/mmu.h>
>   int main(int argc, char **argv)
>   {
> +       int *x = alloc_page();
> +       bool pass = true;
> +       int i;
> +
> +       for  (i = 0; i < 1000000; i++) {
> +               *x = 0x42;
> +               dcache_clean_addr_poc((unsigned long)x);
> +               mmu_disable();
> +               if (*x != 0x42) {
> +                       pass = false;
> +                       break;
> +               }
> +               *x = 0x50;
> +               /* Needed for the invalidation only. */
> +               dcache_clean_inval_addr_poc((unsigned long)x);
> +               mmu_enable(current_thread_info()->pgtable);
> +               if (*x != 0x50) {
> +                       pass = false;
> +                       break;
> +               }
> +       }
> +       report(pass, "MMU disable cache maintenance");
>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>   arm/cstart.S   | 11 ++++++-----
>   arm/cstart64.S | 11 +++++------
>   2 files changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/arm/cstart.S b/arm/cstart.S
> index fc7c558802f1..b27de44f30a6 100644
> --- a/arm/cstart.S
> +++ b/arm/cstart.S
> @@ -242,11 +242,12 @@ asm_mmu_disable:
>       mcr     p15, 0, r0, c1, c0, 0
>       isb
>
> -     ldr     r0, =__phys_offset
> -     ldr     r0, [r0]
> -     ldr     r1, =__phys_end
> -     ldr     r1, [r1]
> -     dcache_by_line_op dccimvac, sy, r0, r1, r2, r3
> +     dmb     sy
> +     mov     r0, sp
> +     lsr     r0, #THREAD_SHIFT
> +     lsl     r0, #THREAD_SHIFT
> +     add     r1, r0, #THREAD_SIZE
> +     dcache_by_line_op dccmvac, sy, r0, r1, r3, r4
>
>       mov     pc, lr
>
> diff --git a/arm/cstart64.S b/arm/cstart64.S
> index 1ce6b9e14d23..af4970775298 100644
> --- a/arm/cstart64.S
> +++ b/arm/cstart64.S
> @@ -283,12 +283,11 @@ asm_mmu_disable:
>       msr     sctlr_el1, x0
>       isb
>
> -     /* Clean + invalidate the entire memory */
> -     adrp    x0, __phys_offset
> -     ldr     x0, [x0, :lo12:__phys_offset]
> -     adrp    x1, __phys_end
> -     ldr     x1, [x1, :lo12:__phys_end]
> -     dcache_by_line_op civac, sy, x0, x1, x2, x3
> +     dmb     sy
> +     mov     x9, sp
> +     and     x9, x9, #THREAD_MASK
> +     add     x10, x9, #THREAD_SIZE
> +     dcache_by_line_op cvac, sy, x9, x10, x11, x12
>
>       ret
>

* Re: [kvm-unit-tests RFC PATCH 19/19] arm/arm64: Rework the cache maintenance in asm_mmu_disable
  2022-08-09 13:53   ` Nikos Nikoleris
@ 2022-08-09 14:22     ` Alexandru Elisei
  2022-08-09 15:53       ` Nikos Nikoleris
  0 siblings, 1 reply; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09 14:22 UTC (permalink / raw)
  To: Nikos Nikoleris; +Cc: pbonzini, thuth, kvmarm, kvm, andrew.jones

On Tue, Aug 09, 2022 at 02:53:34PM +0100, Nikos Nikoleris wrote:
> Hi Alex,
> 
> On 09/08/2022 10:15, Alexandru Elisei wrote:
> > asm_mmu_disable is overly ambitious and provably incorrect:
> > 
> > 1. It tries to clean and invalidate the data caches for the *entire*
> > memory, which is highly unnecessary, as it's very unlikely that a test
> > will write to the entire memory, and even more unlikely that a test will
> > modify the text section of the test image.
> > 
> 
> While it appears that we don't modify the text section, there is some
> loading happening before we start executing a test. Are you sure that the
> loader doesn't leave the memory dirty?

Yes, it's in the boot protocol for Linux [1]. I also mentioned this in the
commit message for the previous patch.

[1] https://elixir.bootlin.com/linux/v5.19/source/Documentation/arm64/booting.rst#L180

> 
> > 2. There is no corresponding dcache invalidate command for the entire
> > memory in asm_mmu_enable, leaving it up to the test that disabled the
> > MMU to do the cache maintenance in an asymmetrical fashion: only for
> > re-enabling the MMU, but not for disabling it.
> > 
> > 3. It's missing the DMB SY memory barrier to ensure that the dcache
> > maintenance is performed after the last store executed in program order
> > before calling asm_mmu_disable.
> > 
> 
> I am not sure why this is needed. In general, iiuc, a store to location x
> followed by a DC CVAC to x in program order don't need an barrier (see Arm
> ARM ARM DDI 0487G.b "Data cache maintenance instructions" at K11.5.1 and

Just a note, the latest public version is H.a.

K11.5.1 looks to me like it deals with ordering of the cache maintenance
operations with regards to memory accesses that are *after* the CMO in
program order, this patch is about memory accesses that are *before* the
CMO in program order.

> "Ordering and completion of data and instruction cache instructions" at
> D4-2656). It doesn't hurt to have it but I think it's unnecessary.

D4-2656 is about PAC, I assume you meant D4-2636 judging from the section
name (please correct me if I'm wrong):

"All data cache instructions, other than DC ZVA, that specify an address:
[..]
Can execute in any order relative to loads or stores that access any
address with the Device memory attribute, or with Normal memory with Inner
Non-cacheable attribute unless a DMB or DSB is executed between the
instructions."

Since the maintenance is performed with the MMU off, I think the DMB SY is
required as per the architecture.

I prefer to keep the maintenance after the MMU is disabled, to allow for
any kind of translation table setups that a test might conjure up (a test
in theory can create and install its own translation tables).

Thanks,
Alex

> 
> Thanks,
> 
> Nikos
> 
> > Fix all of the issues in one go, by doing the cache maintenance only for
> > the stack, as that is out of the control of the C code, and add the missing
> > memory barrier.
> > 
> > The code used to test that mmu_disable works correctly is similar to the
> > code used to test commit 410b3bf09e76 ("arm/arm64: Perform dcache clean
> > + invalidate after turning MMU off"), with extra cache maintenance
> > added:
> > 
> > +#include <alloc_page.h>
> > +#include <asm/cacheflush.h>
> > +#include <asm/mmu.h>
> >   int main(int argc, char **argv)
> >   {
> > +       int *x = alloc_page();
> > +       bool pass = true;
> > +       int i;
> > +
> > +       for  (i = 0; i < 1000000; i++) {
> > +               *x = 0x42;
> > +               dcache_clean_addr_poc((unsigned long)x);
> > +               mmu_disable();
> > +               if (*x != 0x42) {
> > +                       pass = false;
> > +                       break;
> > +               }
> > +               *x = 0x50;
> > +               /* Needed for the invalidation only. */
> > +               dcache_clean_inval_addr_poc((unsigned long)x);
> > +               mmu_enable(current_thread_info()->pgtable);
> > +               if (*x != 0x50) {
> > +                       pass = false;
> > +                       break;
> > +               }
> > +       }
> > +       report(pass, "MMU disable cache maintenance");
> > 
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >   arm/cstart.S   | 11 ++++++-----
> >   arm/cstart64.S | 11 +++++------
> >   2 files changed, 11 insertions(+), 11 deletions(-)
> > 
> > diff --git a/arm/cstart.S b/arm/cstart.S
> > index fc7c558802f1..b27de44f30a6 100644
> > --- a/arm/cstart.S
> > +++ b/arm/cstart.S
> > @@ -242,11 +242,12 @@ asm_mmu_disable:
> >   	mcr	p15, 0, r0, c1, c0, 0
> >   	isb
> > -	ldr	r0, =__phys_offset
> > -	ldr	r0, [r0]
> > -	ldr	r1, =__phys_end
> > -	ldr	r1, [r1]
> > -	dcache_by_line_op dccimvac, sy, r0, r1, r2, r3
> > +	dmb	sy
> > +	mov	r0, sp
> > +	lsr	r0, #THREAD_SHIFT
> > +	lsl	r0, #THREAD_SHIFT
> > +	add	r1, r0, #THREAD_SIZE
> > +	dcache_by_line_op dccmvac, sy, r0, r1, r3, r4
> >   	mov     pc, lr
> > diff --git a/arm/cstart64.S b/arm/cstart64.S
> > index 1ce6b9e14d23..af4970775298 100644
> > --- a/arm/cstart64.S
> > +++ b/arm/cstart64.S
> > @@ -283,12 +283,11 @@ asm_mmu_disable:
> >   	msr	sctlr_el1, x0
> >   	isb
> > -	/* Clean + invalidate the entire memory */
> > -	adrp	x0, __phys_offset
> > -	ldr	x0, [x0, :lo12:__phys_offset]
> > -	adrp	x1, __phys_end
> > -	ldr	x1, [x1, :lo12:__phys_end]
> > -	dcache_by_line_op civac, sy, x0, x1, x2, x3
> > +	dmb	sy
> > +	mov	x9, sp
> > +	and	x9, x9, #THREAD_MASK
> > +	add	x10, x9, #THREAD_SIZE
> > +	dcache_by_line_op cvac, sy, x9, x10, x11, x12
> >   	ret

* Re: [kvm-unit-tests RFC PATCH 19/19] arm/arm64: Rework the cache maintenance in asm_mmu_disable
  2022-08-09 14:22     ` Alexandru Elisei
@ 2022-08-09 15:53       ` Nikos Nikoleris
  2022-08-09 16:53         ` Alexandru Elisei
  0 siblings, 1 reply; 49+ messages in thread
From: Nikos Nikoleris @ 2022-08-09 15:53 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: pbonzini, thuth, kvmarm, kvm, andrew.jones

On 09/08/2022 15:22, Alexandru Elisei wrote:
> On Tue, Aug 09, 2022 at 02:53:34PM +0100, Nikos Nikoleris wrote:
>> Hi Alex,
>>
>> On 09/08/2022 10:15, Alexandru Elisei wrote:
>>> asm_mmu_disable is overly ambitious and provably incorrect:
>>>
>>> 1. It tries to clean and invalidate the data caches for the *entire*
>>> memory, which is highly unnecessary, as it's very unlikely that a test
>>> will write to the entire memory, and even more unlikely that a test will
>>> modify the text section of the test image.
>>>
>>
>> While it appears that we don't modify the text section, there is some
>> loading happening before we start executing a test. Are you sure that the
>> loader doesn't leave the memory dirty?
> 
> Yes, it's in the boot protocol for Linux [1]. I also mentioned this in the
> commit message for the previous patch.
> 
> [1] https://elixir.bootlin.com/linux/v5.19/source/Documentation/arm64/booting.rst#L180
> 

I see, thanks!

Right now {asm_,}mmu_disable() is not used anywhere. So this patch will
introduce the assumption that mmu_disable() can be safely called only if
we didn't perform any writes outside the test's stack, won't it?

When we add support for EFI, there is a lot happening from efi_main()
until we get to the point where we can call mmu_disable(); cleaning just
the (new) stack of the test seems risky.

>>
>>> 2. There is no corresponding dcache invalidate command for the entire
>>> memory in asm_mmu_enable, leaving it up to the test that disabled the
>>> MMU to do the cache maintenance in an asymmetrical fashion: only for
>>> re-enabling the MMU, but not for disabling it.
>>>
>>> 3. It's missing the DMB SY memory barrier to ensure that the dcache
>>> maintenance is performed after the last store executed in program order
>>> before calling asm_mmu_disable.
>>>
>>
>> I am not sure why this is needed. In general, iiuc, a store to location x
>> followed by a DC CVAC to x in program order don't need an barrier (see Arm
>> ARM ARM DDI 0487G.b "Data cache maintenance instructions" at K11.5.1 and
> 
> Just a note, the latest public version is H.a.
> 
> K11.5.1 looks to me like it deals with ordering of the cache maintenance
> operations with regards to memory accesses that are *after* the CMO in
> program order, this patch is about memory accesses that are *before* the
> CMO in program order.
> 

The AArch64 example in K11.5.1 has a memory instruction before and after 
the CMO:

STR W5, [X1]
DC CVAC, X1
DMB ISH
STR W0, [X4]

The first store and the DC CVAC access the same cache line and there is 
no need for a memory barrier in between. The second store is assumed to 
be to a different location and that's why we need a barrier to order it 
with respect to the DC CVAC.

>> "Ordering and completion of data and instruction cache instructions" at
>> D4-2656). It doesn't hurt to have it but I think it's unnecessary.
> 
> D4-2656 is about PAC, I assume you meant D4-2636 judging from the section
> name (please correct me if I'm wrong): >
> "All data cache instructions, other than DC ZVA, that specify an address:
> [..]
> Can execute in any order relative to loads or stores that access any
> address with the Device memory attribute, or with Normal memory with Inner
> Non-cacheable attribute unless a DMB or DSB is executed between the
> instructions."
> 
> Since the maintenance is performed with the MMU off, I think the DMB SY is
> required as per the architecture.
> 
> I prefer to keep the maintenance after the MMU is disabled, to allow for
> any kind of translation table setups that a test might conjure up (a test
> in theory can create and install its own translation tables).
> 

Right, so between the stores and the DC CVAC, we've switched the MMU 
off, in which case the DMB SY might be necessary. I was missing this part.

The benefits of this design choice (switch the MMU off, then clean data)
are still unclear to me. This patch is modifying the CMO operation to
perform only a clean. Why can't we clean the data cache before we switch
off the MMU and use the same translation we used to write to it?

Thanks,

Nikos

> Thanks,
> Alex
> 
>>
>> Thanks,
>>
>> Nikos
>>
>>> Fix all of the issues in one go, by doing the cache maintenance only for
>>> the stack, as that is out of the control of the C code, and add the missing
>>> memory barrier.
>>>
>>> The code used to test that mmu_disable works correctly is similar to the
>>> code used to test commit 410b3bf09e76 ("arm/arm64: Perform dcache clean
>>> + invalidate after turning MMU off"), with extra cache maintenance
>>> added:
>>>
>>> +#include <alloc_page.h>
>>> +#include <asm/cacheflush.h>
>>> +#include <asm/mmu.h>
>>>    int main(int argc, char **argv)
>>>    {
>>> +       int *x = alloc_page();
>>> +       bool pass = true;
>>> +       int i;
>>> +
>>> +       for  (i = 0; i < 1000000; i++) {
>>> +               *x = 0x42;
>>> +               dcache_clean_addr_poc((unsigned long)x);
>>> +               mmu_disable();
>>> +               if (*x != 0x42) {
>>> +                       pass = false;
>>> +                       break;
>>> +               }
>>> +               *x = 0x50;
>>> +               /* Needed for the invalidation only. */
>>> +               dcache_clean_inval_addr_poc((unsigned long)x);
>>> +               mmu_enable(current_thread_info()->pgtable);
>>> +               if (*x != 0x50) {
>>> +                       pass = false;
>>> +                       break;
>>> +               }
>>> +       }
>>> +       report(pass, "MMU disable cache maintenance");
>>>
>>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>>> ---
>>>    arm/cstart.S   | 11 ++++++-----
>>>    arm/cstart64.S | 11 +++++------
>>>    2 files changed, 11 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/arm/cstart.S b/arm/cstart.S
>>> index fc7c558802f1..b27de44f30a6 100644
>>> --- a/arm/cstart.S
>>> +++ b/arm/cstart.S
>>> @@ -242,11 +242,12 @@ asm_mmu_disable:
>>>    	mcr	p15, 0, r0, c1, c0, 0
>>>    	isb
>>> -	ldr	r0, =__phys_offset
>>> -	ldr	r0, [r0]
>>> -	ldr	r1, =__phys_end
>>> -	ldr	r1, [r1]
>>> -	dcache_by_line_op dccimvac, sy, r0, r1, r2, r3
>>> +	dmb	sy
>>> +	mov	r0, sp
>>> +	lsr	r0, #THREAD_SHIFT
>>> +	lsl	r0, #THREAD_SHIFT
>>> +	add	r1, r0, #THREAD_SIZE
>>> +	dcache_by_line_op dccmvac, sy, r0, r1, r3, r4
>>>    	mov     pc, lr
>>> diff --git a/arm/cstart64.S b/arm/cstart64.S
>>> index 1ce6b9e14d23..af4970775298 100644
>>> --- a/arm/cstart64.S
>>> +++ b/arm/cstart64.S
>>> @@ -283,12 +283,11 @@ asm_mmu_disable:
>>>    	msr	sctlr_el1, x0
>>>    	isb
>>> -	/* Clean + invalidate the entire memory */
>>> -	adrp	x0, __phys_offset
>>> -	ldr	x0, [x0, :lo12:__phys_offset]
>>> -	adrp	x1, __phys_end
>>> -	ldr	x1, [x1, :lo12:__phys_end]
>>> -	dcache_by_line_op civac, sy, x0, x1, x2, x3
>>> +	dmb	sy
>>> +	mov	x9, sp
>>> +	and	x9, x9, #THREAD_MASK
>>> +	add	x10, x9, #THREAD_SIZE
>>> +	dcache_by_line_op cvac, sy, x9, x10, x11, x12
>>>    	ret

* Re: [kvm-unit-tests RFC PATCH 19/19] arm/arm64: Rework the cache maintenance in asm_mmu_disable
  2022-08-09 15:53       ` Nikos Nikoleris
@ 2022-08-09 16:53         ` Alexandru Elisei
  2022-08-09 19:48           ` Nikos Nikoleris
  0 siblings, 1 reply; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-09 16:53 UTC (permalink / raw)
  To: Nikos Nikoleris; +Cc: pbonzini, thuth, kvmarm, kvm, andrew.jones

Hi,

On Tue, Aug 09, 2022 at 04:53:18PM +0100, Nikos Nikoleris wrote:
> On 09/08/2022 15:22, Alexandru Elisei wrote:
> > On Tue, Aug 09, 2022 at 02:53:34PM +0100, Nikos Nikoleris wrote:
> > > Hi Alex,
> > > 
> > > On 09/08/2022 10:15, Alexandru Elisei wrote:
> > > > asm_mmu_disable is overly ambitious and provably incorrect:
> > > > 
> > > > 1. It tries to clean and invalidate the data caches for the *entire*
> > > > memory, which is highly unnecessary, as it's very unlikely that a test
> > > > will write to the entire memory, and even more unlikely that a test will
> > > > modify the text section of the test image.
> > > > 
> > > 
> > > While it appears that we don't modify the text section, there is some
> > > loading happening before we start executing a test. Are you sure that the
> > > loader doesn't leave the memory dirty?
> > 
> > Yes, it's in the boot protocol for Linux [1]. I also mentioned this in the
> > commit message for the previous patch.
> > 
> > [1] https://elixir.bootlin.com/linux/v5.19/source/Documentation/arm64/booting.rst#L180
> > 
> 
> I see, thanks!
> 
> Right now {asm_,}mmu_disable() is not used anywhere. So this patch will
> introduce the assumption that mmu_disable() can be safely called only if we
> didn't perform any writes, outside the test's stack, doesn't it?

This patch introduces the assumption that the code that disables the MMU
will do the necessary cache maintenance. I should reword the commit
message to make it clearer.

> 
> When we add support for EFI, there is a lot happening from efi_main() until
> we get to the point where we can mmu_disable(), cleaning just the (new)
> stack of the test seems risky.

Well, that's an understatement: the code disabling the MMU definitely needs
to do the necessary cache maintenance! asm_mmu_disable() is not a silver
bullet that removes the need to do any cache maintenance; the previous patch
explains what needs to be done and why. If you're looking for inspiration
about what maintenance needs to be done for UEFI, I suggest you look there.
Or even better, you can reuse that code, which I think is the better
approach for the UEFI series going forward, but that's a discussion for the
UEFI thread.

> 
> > > 
> > > > 2. There is no corresponding dcache invalidate command for the entire
> > > > memory in asm_mmu_enable, leaving it up to the test that disabled the
> > > > MMU to do the cache maintenance in an asymmetrical fashion: only for
> > > > re-enabling the MMU, but not for disabling it.
> > > > 
> > > > 3. It's missing the DMB SY memory barrier to ensure that the dcache
> > > > maintenance is performed after the last store executed in program order
> > > > before calling asm_mmu_disable.
> > > > 
> > > 
> > > I am not sure why this is needed. In general, iiuc, a store to location x
> > > followed by a DC CVAC to x in program order don't need an barrier (see Arm
> > > ARM ARM DDI 0487G.b "Data cache maintenance instructions" at K11.5.1 and
> > 
> > Just a note, the latest public version is H.a.
> > 
> > K11.5.1 looks to me like it deals with ordering of the cache maintenance
> > operations with regards to memory accesses that are *after* the CMO in
> > program order, this patch is about memory accesses that are *before* the
> > CMO in program order.
> > 
> 
> The AArch64 example in K11.5.1 has a memory instruction before and after the
> CMO:
> 
> STR W5, [X1]
> DC CVAC, X1
> DMB ISH
> STR W0, [X4]
> 
> The first store and the DC CVAC access the same cache line and there is no
> need for a memory barrier in between. The second store is assumed to be to a
> different location and that's why we need a barrier to order it with respect
> to the DC CVAC.

It's explained why the DMB is not necessary in the section that you've
referenced. I'll reproduce the paragraph:

"All data cache instructions, other than DC ZVA, that specify an address:

Execute in program order relative to loads or stores that have all of the
following properties:

—Access an address in Normal memory with either Inner Write Through or
Inner Write Back attributes within the same cache line of minimum size, as
indicated by CTR_EL0.DMinLine.

—Use an address with the same cacheability attributes as the address passed
to the data cache instruction."

Both the store and the dcache clean access the same cache line, indexed by
the address in register X1. Does that make sense to you?

> 
> > > "Ordering and completion of data and instruction cache instructions" at
> > > D4-2656). It doesn't hurt to have it but I think it's unnecessary.
> > 
> > D4-2656 is about PAC, I assume you meant D4-2636 judging from the section
> > name (please correct me if I'm wrong): >
> > "All data cache instructions, other than DC ZVA, that specify an address:
> > [..]
> > Can execute in any order relative to loads or stores that access any
> > address with the Device memory attribute, or with Normal memory with Inner
> > Non-cacheable attribute unless a DMB or DSB is executed between the
> > instructions."
> > 
> > Since the maintenance is performed with the MMU off, I think the DMB SY is
> > required as per the architecture.
> > 
> > I prefer to keep the maintenance after the MMU is disabled, to allow for
> > any kind of translation table setups that a test might conjure up (a test
> > in theory can create and install its own translation tables).
> > 
> 
> Right, so between the stores and the DC CVAC, we've switched the MMU off, in
> which case the DMB SY might be necessary. I was missing this part.
                       ^^^^^^^^^^^^^^^^^^^
		       might be necessary or might be **unnecessary**?

I would say that it's definitely unnecessary according to the
architecture, not "might be".

> 
> The benefits of this design choice (switch the MMU off then clean data) are
> still unclear to me. This patch is modifying the CMO operation to perform
> only a clean. Why can't we clean the data cache before we switch off the MMU
> and use the same translation we used to write to it.

What do you mean by "translation"? Same VA to PA mapping? Or same address
attributes? If it's the latter, the architecture is pretty clear that this
is correct and expected.

If it's the VA to PA mapping, asm_mmu_disable is called with an identity
mapped stack; otherwise, following the ret at the end of the function,
asm_mmu_disable would not return to the calling function (when the MMU is
disabled, all addresses are flat mapped, and x30/lr will point to some
bogus address). mmu_disable() even has an assert to check that the stack is
identity mapped.
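
Roughly something like this (a sketch from memory, not the actual lib
code, so the helper names are approximate):

static void mmu_disable_sketch(void)
{
	unsigned long sp = (unsigned long)__builtin_frame_address(0);

	/*
	 * The stack must translate to itself (identity map), otherwise the
	 * flat-mapped return address is bogus once the M bit is cleared.
	 */
	assert(__virt_to_phys(sp) == sp);
	asm_mmu_disable();
}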

So I really do think that the order of the operations is correct. Unless
you can prove otherwise.

Why is it so important to you that the dcache is cleaned with the MMU on?
It's correct either way, so I'm interested to know why you are so keen on
doing it with the MMU enabled. I've already told you my reason for doing it
with the MMU disabled, I'm waiting to hear yours.

Thanks,
Alex

> 
> Thanks,
> 
> Nikos
> 
> > Thanks,
> > Alex
> > 
> > > 
> > > Thanks,
> > > 
> > > Nikos
> > > 
> > > > Fix all of the issues in one go, by doing the cache maintenance only for
> > > > the stack, as that is out of the control of the C code, and add the missing
> > > > memory barrier.
> > > > 
> > > > The code used to test that mmu_disable works correctly is similar to the
> > > > code used to test commit 410b3bf09e76 ("arm/arm64: Perform dcache clean
> > > > + invalidate after turning MMU off"), with extra cache maintenance
> > > > added:
> > > > 
> > > > +#include <alloc_page.h>
> > > > +#include <asm/cacheflush.h>
> > > > +#include <asm/mmu.h>
> > > >    int main(int argc, char **argv)
> > > >    {
> > > > +       int *x = alloc_page();
> > > > +       bool pass = true;
> > > > +       int i;
> > > > +
> > > > +       for  (i = 0; i < 1000000; i++) {
> > > > +               *x = 0x42;
> > > > +               dcache_clean_addr_poc((unsigned long)x);
> > > > +               mmu_disable();
> > > > +               if (*x != 0x42) {
> > > > +                       pass = false;
> > > > +                       break;
> > > > +               }
> > > > +               *x = 0x50;
> > > > +               /* Needed for the invalidation only. */
> > > > +               dcache_clean_inval_addr_poc((unsigned long)x);
> > > > +               mmu_enable(current_thread_info()->pgtable);
> > > > +               if (*x != 0x50) {
> > > > +                       pass = false;
> > > > +                       break;
> > > > +               }
> > > > +       }
> > > > +       report(pass, "MMU disable cache maintenance");
> > > > 
> > > > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > > > ---
> > > >    arm/cstart.S   | 11 ++++++-----
> > > >    arm/cstart64.S | 11 +++++------
> > > >    2 files changed, 11 insertions(+), 11 deletions(-)
> > > > 
> > > > diff --git a/arm/cstart.S b/arm/cstart.S
> > > > index fc7c558802f1..b27de44f30a6 100644
> > > > --- a/arm/cstart.S
> > > > +++ b/arm/cstart.S
> > > > @@ -242,11 +242,12 @@ asm_mmu_disable:
> > > >    	mcr	p15, 0, r0, c1, c0, 0
> > > >    	isb
> > > > -	ldr	r0, =__phys_offset
> > > > -	ldr	r0, [r0]
> > > > -	ldr	r1, =__phys_end
> > > > -	ldr	r1, [r1]
> > > > -	dcache_by_line_op dccimvac, sy, r0, r1, r2, r3
> > > > +	dmb	sy
> > > > +	mov	r0, sp
> > > > +	lsr	r0, #THREAD_SHIFT
> > > > +	lsl	r0, #THREAD_SHIFT
> > > > +	add	r1, r0, #THREAD_SIZE
> > > > +	dcache_by_line_op dccmvac, sy, r0, r1, r3, r4
> > > >    	mov     pc, lr
> > > > diff --git a/arm/cstart64.S b/arm/cstart64.S
> > > > index 1ce6b9e14d23..af4970775298 100644
> > > > --- a/arm/cstart64.S
> > > > +++ b/arm/cstart64.S
> > > > @@ -283,12 +283,11 @@ asm_mmu_disable:
> > > >    	msr	sctlr_el1, x0
> > > >    	isb
> > > > -	/* Clean + invalidate the entire memory */
> > > > -	adrp	x0, __phys_offset
> > > > -	ldr	x0, [x0, :lo12:__phys_offset]
> > > > -	adrp	x1, __phys_end
> > > > -	ldr	x1, [x1, :lo12:__phys_end]
> > > > -	dcache_by_line_op civac, sy, x0, x1, x2, x3
> > > > +	dmb	sy
> > > > +	mov	x9, sp
> > > > +	and	x9, x9, #THREAD_MASK
> > > > +	add	x10, x9, #THREAD_SIZE
> > > > +	dcache_by_line_op cvac, sy, x9, x10, x11, x12
> > > >    	ret

* Re: [kvm-unit-tests RFC PATCH 19/19] arm/arm64: Rework the cache maintenance in asm_mmu_disable
  2022-08-09 16:53         ` Alexandru Elisei
@ 2022-08-09 19:48           ` Nikos Nikoleris
  2022-08-10  8:52             ` Alexandru Elisei
  0 siblings, 1 reply; 49+ messages in thread
From: Nikos Nikoleris @ 2022-08-09 19:48 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: pbonzini, thuth, kvmarm, kvm, andrew.jones

On 09/08/2022 17:53, Alexandru Elisei wrote:
> Hi,
> 
> On Tue, Aug 09, 2022 at 04:53:18PM +0100, Nikos Nikoleris wrote:
>> On 09/08/2022 15:22, Alexandru Elisei wrote:
>>> On Tue, Aug 09, 2022 at 02:53:34PM +0100, Nikos Nikoleris wrote:
>>>> Hi Alex,
>>>>
>>>> On 09/08/2022 10:15, Alexandru Elisei wrote:
>>>>> asm_mmu_disable is overly ambitious and provably incorrect:
>>>>>
>>>>> 1. It tries to clean and invalidate the data caches for the *entire*
>>>>> memory, which is highly unnecessary, as it's very unlikely that a test
>>>>> will write to the entire memory, and even more unlikely that a test will
>>>>> modify the text section of the test image.
>>>>>
>>>>
>>>> While it appears that we don't modify the text section, there is some
>>>> loading happening before we start executing a test. Are you sure that the
>>>> loader doesn't leave the memory dirty?
>>>
>>> Yes, it's in the boot protocol for Linux [1]. I also mentioned this in the
>>> commit message for the previous patch.
>>>
>>> [1] https://elixir.bootlin.com/linux/v5.19/source/Documentation/arm64/booting.rst#L180
>>>
>>
>> I see, thanks!
>>
>> Right now {asm_,}mmu_disable() is not used anywhere. So this patch will
>> introduce the assumption that mmu_disable() can be safely called only if we
>> didn't perform any writes, outside the test's stack, doesn't it?
> 
> This patch introduces the assumption that the code that disables the MMU
> will do the necessary cache maintenance. I should reword the commit
> message to make it clearer.
> 
>>
>> When we add support for EFI, there is a lot happening from efi_main() until
>> we get to the point where we can mmu_disable(), cleaning just the (new)
>> stack of the test seems risky.
> 
> Well, that's an understatement, the code disabling the MMU definitly needs
> to do the necessary cache maintenance! asm_mmu_disable() is not a silver
> bullet that removes the need to do any cache maintenace, the previous patch
> explains what needs to be done and why. If you're looking for inspiration
> about what maintenance to be done for UEFI, I suggest you look there. Or
> even better, you can reuse that code, which I think is the better approach
> for the UEFI series going forward, but that's a discussion for the UEFI
> thread.
> 

I would argue that asm_mmu_disable() was "a silver bullet" and your 
patch is changing it. How do you choose what is reasonable for 
asm_mmu_disable to clean? Why should it clean the stack?

>>
>>>>
>>>>> 2. There is no corresponding dcache invalidate command for the entire
>>>>> memory in asm_mmu_enable, leaving it up to the test that disabled the
>>>>> MMU to do the cache maintenance in an asymmetrical fashion: only for
>>>>> re-enabling the MMU, but not for disabling it.
>>>>>
>>>>> 3. It's missing the DMB SY memory barrier to ensure that the dcache
>>>>> maintenance is performed after the last store executed in program order
>>>>> before calling asm_mmu_disable.
>>>>>
>>>>
>>>> I am not sure why this is needed. In general, iiuc, a store to location x
>>>> followed by a DC CVAC to x in program order don't need an barrier (see Arm
>>>> ARM ARM DDI 0487G.b "Data cache maintenance instructions" at K11.5.1 and
>>>
>>> Just a note, the latest public version is H.a.
>>>
>>> K11.5.1 looks to me like it deals with ordering of the cache maintenance
>>> operations with regards to memory accesses that are *after* the CMO in
>>> program order, this patch is about memory accesses that are *before* the
>>> CMO in program order.
>>>
>>
>> The AArch64 example in K11.5.1 has a memory instruction before and after the
>> CMO:
>>
>> STR W5, [X1]
>> DC CVAC, X1
>> DMB ISH
>> STR W0, [X4]
>>
>> The first store and the DC CVAC access the same cache line and there is no
>> need for a memory barrier in between. The second store is assumed to be to a
>> different location and that's why we need a barrier to order it with respect
>> to the DC CVAC.
> 
> It's explained why the DMB is not necessary in the section that you've
> referenced. I'll reproduce the paragraph:
> 
> "All data cache instructions, other than DC ZVA, that specify an address:
> 
> Execute in program order relative to loads or stores that have all of the
> following properties:
> 
> —Access an address in Normal memory with either Inner Write Through or
> Inner Write Back attributes within the same cache line of minimum size, as
> indicated by CTR_EL0.DMinLine.
> 
> —Use an address with the same cacheability attributes as the address passed
> to the data cache instruction."
> 
> Both the store and the dcache clean access the same cache line, indexed by
> the adress in register X1. Does that make sense to you?
> 
>>
>>>> "Ordering and completion of data and instruction cache instructions" at
>>>> D4-2656). It doesn't hurt to have it but I think it's unnecessary.
>>>
>>> D4-2656 is about PAC, I assume you meant D4-2636 judging from the section
>>> name (please correct me if I'm wrong): >
>>> "All data cache instructions, other than DC ZVA, that specify an address:
>>> [..]
>>> Can execute in any order relative to loads or stores that access any
>>> address with the Device memory attribute, or with Normal memory with Inner
>>> Non-cacheable attribute unless a DMB or DSB is executed between the
>>> instructions."
>>>
>>> Since the maintenance is performed with the MMU off, I think the DMB SY is
>>> required as per the architecture.
>>>
>>> I prefer to keep the maintenance after the MMU is disabled, to allow for
>>> any kind of translation table setups that a test might conjure up (a test
>>> in theory can create and install its own translation tables).
>>>
>>
>> Right, so between the stores and the DC CVAC, we've switched the MMU off, in
>> which case the DMB SY might be necessary. I was missing this part.
>                         ^^^^^^^^^^^^^^^^^^^
> 		       might be necessary or might be **unnecessary**?
> 
> I would say that it's definitely unecessary according to the
> architecture, not "might be".
> 

Well, you had successfully convinced me that since we're switching the
MMU off, there needs to be a barrier to ensure that the dc cvac is 
ordered with respect to prior stores. Switching the MMU off means that 
stores could be executed with different memory attributes (e.g., Normal, 
Inner-Shareable, Writeback) than the DC CVAC (Device-nGnRnE). Some type 
of barrier might be needed. This is what your patch is doing.

>>
>> The benefits of this design choice (switch the MMU off then clean data) are
>> still unclear to me. This patch is modifying the CMO operation to perform
>> only a clean. Why can't we clean the data cache before we switch off the MMU
>> and use the same translation we used to write to it.
> 
> What do you mean by "translation"? Same VA to PA mapping? Or same address
> attributes? If it's the latter, the architecture is pretty clear that this
> is correct and expected.

Both.

> 
> If it's the VA to PA mapping, asm_mmu_disable is called with an identify
> mapped stack, otherwise following the ret at the end of the function,
> asm_mmu_disable would not return to the calling function (when the MMU is
> disabled, all addresses are flat mapped, and x30/lr will point to some
> bogus address). mmu_disable() even has an assert to check that the stack is
> identify mapped.
> 
> So I really do think that the order of the operations is correct. Unless
> you can prove otherwise.
> 
> Why is it so important to you that the dcache is cleaned with the MMU on?
> It's correct either way, so I'm interested to know why you are so keen on
> doing it with the MMU enabled. I've already told you my reason for doing it
> with the MMU disabled, I'm waiting to hear yours.

Sorry, maybe I am missing something. As far as I remember, your argument
was that invalidating the cache before switching the MMU off was
pointless: for Normal memory, any kind of speculation might result in
fetching data back into the cache. I agree. But this patch changes the CMO
we use and it doesn't invalidate anymore. What was the argument for
cleaning the cache after switching the MMU off?

I am happy either way, I am just trying to understand :)

Thanks,

Nikos

> 
> Thanks,
> Alex
> 
>>
>> Thanks,
>>
>> Nikos
>>
>>> Thanks,
>>> Alex
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Nikos
>>>>
>>>>> Fix all of the issues in one go, by doing the cache maintenance only for
>>>>> the stack, as that is out of the control of the C code, and add the missing
>>>>> memory barrier.
>>>>>
>>>>> The code used to test that mmu_disable works correctly is similar to the
>>>>> code used to test commit 410b3bf09e76 ("arm/arm64: Perform dcache clean
>>>>> + invalidate after turning MMU off"), with extra cache maintenance
>>>>> added:
>>>>>
>>>>> +#include <alloc_page.h>
>>>>> +#include <asm/cacheflush.h>
>>>>> +#include <asm/mmu.h>
>>>>>     int main(int argc, char **argv)
>>>>>     {
>>>>> +       int *x = alloc_page();
>>>>> +       bool pass = true;
>>>>> +       int i;
>>>>> +
>>>>> +       for  (i = 0; i < 1000000; i++) {
>>>>> +               *x = 0x42;
>>>>> +               dcache_clean_addr_poc((unsigned long)x);
>>>>> +               mmu_disable();
>>>>> +               if (*x != 0x42) {
>>>>> +                       pass = false;
>>>>> +                       break;
>>>>> +               }
>>>>> +               *x = 0x50;
>>>>> +               /* Needed for the invalidation only. */
>>>>> +               dcache_clean_inval_addr_poc((unsigned long)x);
>>>>> +               mmu_enable(current_thread_info()->pgtable);
>>>>> +               if (*x != 0x50) {
>>>>> +                       pass = false;
>>>>> +                       break;
>>>>> +               }
>>>>> +       }
>>>>> +       report(pass, "MMU disable cache maintenance");
>>>>>
>>>>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>>>>> ---
>>>>>     arm/cstart.S   | 11 ++++++-----
>>>>>     arm/cstart64.S | 11 +++++------
>>>>>     2 files changed, 11 insertions(+), 11 deletions(-)
>>>>>
>>>>> diff --git a/arm/cstart.S b/arm/cstart.S
>>>>> index fc7c558802f1..b27de44f30a6 100644
>>>>> --- a/arm/cstart.S
>>>>> +++ b/arm/cstart.S
>>>>> @@ -242,11 +242,12 @@ asm_mmu_disable:
>>>>>     	mcr	p15, 0, r0, c1, c0, 0
>>>>>     	isb
>>>>> -	ldr	r0, =__phys_offset
>>>>> -	ldr	r0, [r0]
>>>>> -	ldr	r1, =__phys_end
>>>>> -	ldr	r1, [r1]
>>>>> -	dcache_by_line_op dccimvac, sy, r0, r1, r2, r3
>>>>> +	dmb	sy
>>>>> +	mov	r0, sp
>>>>> +	lsr	r0, #THREAD_SHIFT
>>>>> +	lsl	r0, #THREAD_SHIFT
>>>>> +	add	r1, r0, #THREAD_SIZE
>>>>> +	dcache_by_line_op dccmvac, sy, r0, r1, r3, r4
>>>>>     	mov     pc, lr
>>>>> diff --git a/arm/cstart64.S b/arm/cstart64.S
>>>>> index 1ce6b9e14d23..af4970775298 100644
>>>>> --- a/arm/cstart64.S
>>>>> +++ b/arm/cstart64.S
>>>>> @@ -283,12 +283,11 @@ asm_mmu_disable:
>>>>>     	msr	sctlr_el1, x0
>>>>>     	isb
>>>>> -	/* Clean + invalidate the entire memory */
>>>>> -	adrp	x0, __phys_offset
>>>>> -	ldr	x0, [x0, :lo12:__phys_offset]
>>>>> -	adrp	x1, __phys_end
>>>>> -	ldr	x1, [x1, :lo12:__phys_end]
>>>>> -	dcache_by_line_op civac, sy, x0, x1, x2, x3
>>>>> +	dmb	sy
>>>>> +	mov	x9, sp
>>>>> +	and	x9, x9, #THREAD_MASK
>>>>> +	add	x10, x9, #THREAD_SIZE
>>>>> +	dcache_by_line_op cvac, sy, x9, x10, x11, x12
>>>>>     	ret

* Re: [kvm-unit-tests RFC PATCH 19/19] arm/arm64: Rework the cache maintenance in asm_mmu_disable
  2022-08-09 19:48           ` Nikos Nikoleris
@ 2022-08-10  8:52             ` Alexandru Elisei
  0 siblings, 0 replies; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-10  8:52 UTC (permalink / raw)
  To: Nikos Nikoleris; +Cc: pbonzini, thuth, kvmarm, kvm, andrew.jones

Hi Nikos,

Just want to make this clear: this patch should be read in conjunction with
the previous patch, patch #18 ("arm/arm64: Perform dcache maintenance at
boot"), which explains what dcache maintenance operations are needed for
writes when the MMU is disabled. Without it, this patch might look very
confusing.

On Tue, Aug 09, 2022 at 08:48:40PM +0100, Nikos Nikoleris wrote:
> On 09/08/2022 17:53, Alexandru Elisei wrote:
> > Hi,
> > 
> > On Tue, Aug 09, 2022 at 04:53:18PM +0100, Nikos Nikoleris wrote:
> > > On 09/08/2022 15:22, Alexandru Elisei wrote:
> > > > On Tue, Aug 09, 2022 at 02:53:34PM +0100, Nikos Nikoleris wrote:
> > > > > Hi Alex,
> > > > > 
> > > > > On 09/08/2022 10:15, Alexandru Elisei wrote:
> > > > > > asm_mmu_disable is overly ambitious and provably incorrect:
> > > > > > 
> > > > > > 1. It tries to clean and invalidate the data caches for the *entire*
> > > > > > memory, which is highly unnecessary, as it's very unlikely that a test
> > > > > > will write to the entire memory, and even more unlikely that a test will
> > > > > > modify the text section of the test image.
> > > > > > 
> > > > > 
> > > > > While it appears that we don't modify the text section, there is some
> > > > > loading happening before we start executing a test. Are you sure that the
> > > > > loader doesn't leave the memory dirty?
> > > > 
> > > > Yes, it's in the boot protocol for Linux [1]. I also mentioned this in the
> > > > commit message for the previous patch.
> > > > 
> > > > [1] https://elixir.bootlin.com/linux/v5.19/source/Documentation/arm64/booting.rst#L180
> > > > 
> > > 
> > > I see, thanks!
> > > 
> > > Right now {asm_,}mmu_disable() is not used anywhere. So this patch will
> > > introduce the assumption that mmu_disable() can be safely called only if we
> > > didn't perform any writes, outside the test's stack, doesn't it?
> > 
> > This patch introduces the assumption that the code that disables the MMU
> > will do the necessary cache maintenance. I should reword the commit
> > message to make it clearer.
> > 
> > > 
> > > When we add support for EFI, there is a lot happening from efi_main() until
> > > we get to the point where we can mmu_disable(), cleaning just the (new)
> > > stack of the test seems risky.
> > 
> > Well, that's an understatement, the code disabling the MMU definitly needs
> > to do the necessary cache maintenance! asm_mmu_disable() is not a silver
> > bullet that removes the need to do any cache maintenace, the previous patch
> > explains what needs to be done and why. If you're looking for inspiration
> > about what maintenance to be done for UEFI, I suggest you look there. Or
> > even better, you can reuse that code, which I think is the better approach
> > for the UEFI series going forward, but that's a discussion for the UEFI
> > thread.
> > 
> 
> I would argue that asm_mmu_disable() was "a silver bullet" and your patch is

This commit message explains why asm_mmu_disable cannot be a silver bullet
(reason number 2).

> changing it. How do you choose what is reasonable for asm_mmu_disable to
> clean? Why should it clean the stack?

I explained in this commit message why it should clean the stack (the
second to last paragraph).

As for deciding what's reasonable: the more memory the environment has,
the longer it will take to clean it. I tried it with 3GB of memory and
asm_mmu_disable() takes about 1 second, and it should scale linearly with
the memory size. I find that to be a poor user experience; I don't want
the user to wait for several seconds, if not minutes, on servers with a
large memory pool, especially since it's totally unnecessary and can be
avoided with a few simple rules:

- want to read something with the MMU off: do a dcache clean to PoC;
- want to write something with the MMU off: do a dcache inval to PoC
  before the write;
- want to read, with the MMU on, something that was written with the MMU
  off: do a dcache inval to PoC before the read.

That's just how it is on arm64, and has been this way since the
architecture was created. And if you're wondering why kvm-unit-tests
doesn't do it in the setup code, the previous patch explains why.
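
In other words, something like this in a test that turns the MMU off and
back on (rough sketch; dcache_clean_addr_poc() is the helper from the test
snippet in this patch's commit message, dcache_inval_addr_poc() is a
made-up name for an invalidate-only helper):

	int *x = alloc_page();

	*x = 0x42;					/* write with the MMU on */
	dcache_clean_addr_poc((unsigned long)x);	/* clean to PoC before reading with the MMU off */
	mmu_disable();
	assert(*x == 0x42);

	dcache_inval_addr_poc((unsigned long)x);	/* inval to PoC before writing with the MMU off */
	*x = 0x50;
	dcache_inval_addr_poc((unsigned long)x);	/* inval to PoC before reading with the MMU on */
	mmu_enable(current_thread_info()->pgtable);
	assert(*x == 0x50);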

So, to summarize:

- cleaning the entire memory in asm_mmu_disable is not enough to ensure
correctness if the MMU is ever re-enabled.
- cleaning the entire memory can take a long time.

In conclusion, it's better that asm_mmu_disable doesn't clean the entire
memory.

> 
> > > 
> > > > > 
> > > > > > 2. There is no corresponding dcache invalidate command for the entire
> > > > > > memory in asm_mmu_enable, leaving it up to the test that disabled the
> > > > > > MMU to do the cache maintenance in an asymmetrical fashion: only for
> > > > > > re-enabling the MMU, but not for disabling it.
> > > > > > 
> > > > > > 3. It's missing the DMB SY memory barrier to ensure that the dcache
> > > > > > maintenance is performed after the last store executed in program order
> > > > > > before calling asm_mmu_disable.
> > > > > > 
> > > > > 
> > > > > I am not sure why this is needed. In general, iiuc, a store to location x
> > > > > followed by a DC CVAC to x in program order don't need an barrier (see Arm
> > > > > ARM ARM DDI 0487G.b "Data cache maintenance instructions" at K11.5.1 and
> > > > 
> > > > Just a note, the latest public version is H.a.
> > > > 
> > > > K11.5.1 looks to me like it deals with ordering of the cache maintenance
> > > > operations with regards to memory accesses that are *after* the CMO in
> > > > program order, this patch is about memory accesses that are *before* the
> > > > CMO in program order.
> > > > 
> > > 
> > > The AArch64 example in K11.5.1 has a memory instruction before and after the
> > > CMO:
> > > 
> > > STR W5, [X1]
> > > DC CVAC, X1
> > > DMB ISH
> > > STR W0, [X4]
> > > 
> > > The first store and the DC CVAC access the same cache line and there is no
> > > need for a memory barrier in between. The second store is assumed to be to a
> > > different location and that's why we need a barrier to order it with respect
> > > to the DC CVAC.
> > 
> > It's explained why the DMB is not necessary in the section that you've
> > referenced. I'll reproduce the paragraph:
> > 
> > "All data cache instructions, other than DC ZVA, that specify an address:
> > 
> > Execute in program order relative to loads or stores that have all of the
> > following properties:
> > 
> > —Access an address in Normal memory with either Inner Write Through or
> > Inner Write Back attributes within the same cache line of minimum size, as
> > indicated by CTR_EL0.DMinLine.
> > 
> > —Use an address with the same cacheability attributes as the address passed
> > to the data cache instruction."
> > 
> > Both the store and the dcache clean access the same cache line, indexed by
> > the adress in register X1. Does that make sense to you?
> > 
> > > 
> > > > > "Ordering and completion of data and instruction cache instructions" at
> > > > > D4-2656). It doesn't hurt to have it but I think it's unnecessary.
> > > > 
> > > > D4-2656 is about PAC, I assume you meant D4-2636 judging from the section
> > > > name (please correct me if I'm wrong): >
> > > > "All data cache instructions, other than DC ZVA, that specify an address:
> > > > [..]
> > > > Can execute in any order relative to loads or stores that access any
> > > > address with the Device memory attribute, or with Normal memory with Inner
> > > > Non-cacheable attribute unless a DMB or DSB is executed between the
> > > > instructions."
> > > > 
> > > > Since the maintenance is performed with the MMU off, I think the DMB SY is
> > > > required as per the architecture.
> > > > 
> > > > I prefer to keep the maintenance after the MMU is disabled, to allow for
> > > > any kind of translation table setups that a test might conjure up (a test
> > > > in theory can create and install its own translation tables).
> > > > 
> > > 
> > > Right, so between the stores and the DC CVAC, we've switched the MMU off, in
> > > which case the DMB SY might be necessary. I was missing this part.
> >                         ^^^^^^^^^^^^^^^^^^^
> > 		       might be necessary or might be **unnecessary**?
> > 
> > I would say that it's definitely unecessary according to the
> > architecture, not "might be".
> > 
> 
> Well you had successfully convinced me that since we're switching the MMU
> off, there needs to be a barrier to ensure that the dc cvac is ordered with
> respect to prior stores. Switching the MMU off means that stores could be
> executed with different memory attributes (e.g., Normal, Inner-Shareable,
> Writeback) than the DC CVAC (Device-nGnRnE). Some type of barrier might be
> needed. This is what your patch is doing.

Exactly! I got myself confused, and wrote "unnecessary" when I meant
"necessary". So the above paragraph should read:

"I would say that it's definitely **necessary** according to the
architecture, not "might be"."

Sorry for the confusion :(

> 
> > > 
> > > The benefits of this design choice (switch the MMU off then clean data) are
> > > still unclear to me. This patch is modifying the CMO operation to perform
> > > only a clean. Why can't we clean the data cache before we switch off the MMU
> > > and use the same translation we used to write to it.
> > 
> > What do you mean by "translation"? Same VA to PA mapping? Or same address
> > attributes? If it's the latter, the architecture is pretty clear that this
> > is correct and expected.
> 
> Both.
> 
> > 
> > If it's the VA to PA mapping, asm_mmu_disable is called with an identity
> > mapped stack; otherwise, following the ret at the end of the function,
> > asm_mmu_disable would not return to the calling function (when the MMU is
> > disabled, all addresses are flat mapped, and x30/lr will point to some
> > bogus address). mmu_disable() even has an assert to check that the stack is
> > identity mapped.
> > 
> > So I really do think that the order of the operations is correct. Unless
> > you can prove otherwise.
> > 
> > Why is it so important to you that the dcache is cleaned with the MMU on?
> > It's correct either way, so I'm interested to know why you are so keen on
> > doing it with the MMU enabled. I've already told you my reason for doing it
> > with the MMU disabled, I'm waiting to hear yours.
> 
> Sorry, maybe I am missing something. As far as I remember, your argument was
> that invalidating the cache before switching the MMU off was pointless: for
> Normal Memory, any kind of speculation might result in fetching data to the
> cache. I agree. But this patch changes the CMO we use and it doesn't

Yeah, that was with the old implementation. The old implementation was
incorrect, because clean cache lines can still be allocated by higher level
software (which is running with the MMU on) while kvm-unit-tests is running
with the MMU disabled, so you also need to do an invalidate after the last
write with the MMU off (to be 100% correct, before the first read with the
MMU on; this is explained in the previous patch, and in point number 2 in
this commit message; also, if you're wondering where the corresponding
invalidation is, it's in asm_mmu_enable, added in the previous patch).

> invalidate any more. What was the argument for cleaning the cache after
> switching the MMU off?

So there's no confusion about the DMB SY being needed. And I can come up
with another reason: it's possibly faster, because the address is flat mapped
and doesn't have to be translated using the translation tables (the dcaches
are physically indexed) - not really important, but hey, it's a reason. Yes,
they're both very weak, which is why I'm waiting to be convinced otherwise.

Why did you want the cache maintenance to be done with the MMU enabled?

Thanks,
Alex

> 
> I am happy either way, I am just trying to understand :)
> 
> Thanks,
> 
> Nikos
> 
> > 
> > Thanks,
> > Alex
> > 
> > > 
> > > Thanks,
> > > 
> > > Nikos
> > > 
> > > > Thanks,
> > > > Alex
> > > > 
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > Nikos
> > > > > 
> > > > > > Fix all of the issues in one go, by doing the cache maintenance only for
> > > > > > the stack, as that is out of the control of the C code, and add the missing
> > > > > > memory barrier.
> > > > > > 
> > > > > > The code used to test that mmu_disable works correctly is similar to the
> > > > > > code used to test commit 410b3bf09e76 ("arm/arm64: Perform dcache clean
> > > > > > + invalidate after turning MMU off"), with extra cache maintenance
> > > > > > added:
> > > > > > 
> > > > > > +#include <alloc_page.h>
> > > > > > +#include <asm/cacheflush.h>
> > > > > > +#include <asm/mmu.h>
> > > > > >     int main(int argc, char **argv)
> > > > > >     {
> > > > > > +       int *x = alloc_page();
> > > > > > +       bool pass = true;
> > > > > > +       int i;
> > > > > > +
> > > > > > +       for  (i = 0; i < 1000000; i++) {
> > > > > > +               *x = 0x42;
> > > > > > +               dcache_clean_addr_poc((unsigned long)x);
> > > > > > +               mmu_disable();
> > > > > > +               if (*x != 0x42) {
> > > > > > +                       pass = false;
> > > > > > +                       break;
> > > > > > +               }
> > > > > > +               *x = 0x50;
> > > > > > +               /* Needed for the invalidation only. */
> > > > > > +               dcache_clean_inval_addr_poc((unsigned long)x);
> > > > > > +               mmu_enable(current_thread_info()->pgtable);
> > > > > > +               if (*x != 0x50) {
> > > > > > +                       pass = false;
> > > > > > +                       break;
> > > > > > +               }
> > > > > > +       }
> > > > > > +       report(pass, "MMU disable cache maintenance");
> > > > > > 
> > > > > > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > > > > > ---
> > > > > >     arm/cstart.S   | 11 ++++++-----
> > > > > >     arm/cstart64.S | 11 +++++------
> > > > > >     2 files changed, 11 insertions(+), 11 deletions(-)
> > > > > > 
> > > > > > diff --git a/arm/cstart.S b/arm/cstart.S
> > > > > > index fc7c558802f1..b27de44f30a6 100644
> > > > > > --- a/arm/cstart.S
> > > > > > +++ b/arm/cstart.S
> > > > > > @@ -242,11 +242,12 @@ asm_mmu_disable:
> > > > > >     	mcr	p15, 0, r0, c1, c0, 0
> > > > > >     	isb
> > > > > > -	ldr	r0, =__phys_offset
> > > > > > -	ldr	r0, [r0]
> > > > > > -	ldr	r1, =__phys_end
> > > > > > -	ldr	r1, [r1]
> > > > > > -	dcache_by_line_op dccimvac, sy, r0, r1, r2, r3
> > > > > > +	dmb	sy
> > > > > > +	mov	r0, sp
> > > > > > +	lsr	r0, #THREAD_SHIFT
> > > > > > +	lsl	r0, #THREAD_SHIFT
> > > > > > +	add	r1, r0, #THREAD_SIZE
> > > > > > +	dcache_by_line_op dccmvac, sy, r0, r1, r3, r4
> > > > > >     	mov     pc, lr
> > > > > > diff --git a/arm/cstart64.S b/arm/cstart64.S
> > > > > > index 1ce6b9e14d23..af4970775298 100644
> > > > > > --- a/arm/cstart64.S
> > > > > > +++ b/arm/cstart64.S
> > > > > > @@ -283,12 +283,11 @@ asm_mmu_disable:
> > > > > >     	msr	sctlr_el1, x0
> > > > > >     	isb
> > > > > > -	/* Clean + invalidate the entire memory */
> > > > > > -	adrp	x0, __phys_offset
> > > > > > -	ldr	x0, [x0, :lo12:__phys_offset]
> > > > > > -	adrp	x1, __phys_end
> > > > > > -	ldr	x1, [x1, :lo12:__phys_end]
> > > > > > -	dcache_by_line_op civac, sy, x0, x1, x2, x3
> > > > > > +	dmb	sy
> > > > > > +	mov	x9, sp
> > > > > > +	and	x9, x9, #THREAD_MASK
> > > > > > +	add	x10, x9, #THREAD_SIZE
> > > > > > +	dcache_by_line_op cvac, sy, x9, x10, x11, x12
> > > > > >     	ret

* Re: [kvm-unit-tests RFC PATCH 09/19] arm/arm64: Zero secondary CPUs' stack
  2022-08-09 12:56   ` Nikos Nikoleris
@ 2022-08-10  9:42     ` Alexandru Elisei
  2022-08-10 10:00       ` Nikos Nikoleris
  0 siblings, 1 reply; 49+ messages in thread
From: Alexandru Elisei @ 2022-08-10  9:42 UTC (permalink / raw)
  To: Nikos Nikoleris; +Cc: pbonzini, thuth, kvmarm, kvm, andrew.jones

Hi Nikos,

On Tue, Aug 09, 2022 at 01:56:13PM +0100, Nikos Nikoleris wrote:
> On 09/08/2022 10:15, Alexandru Elisei wrote:
> > For the boot CPU, the entire stack is zeroed in the entry code. For the
> > secondaries, only struct thread_info, which lives at the bottom of the
> > stack, is zeroed in thread_info_init().
> > 
> 
> That's a good point.
> 
> > Be consistent and zero the entire stack for the secondaries. This should
> > also improve reproducibility of the testsuite, as all the stacks now start
> > with the same contents, which is zero. And now that all the stacks are
> > zeroed in the entry code, there is no need to explicitly zero struct
> > thread_info in thread_info_init().
> > 
> 
> Wouldn't it make more sense to call memset(sp, 0, THREAD_SIZE); from
> thread_stack_alloc() instead and avoid doing this in assembly? Do we expect

I prefer to do the zeroing in assembly because:

1. For consistency, which is one of the main reasons this patch exists.

2. I don't want to deal with all the cache maintenance that is required for
inter-CPU communication. Let's keep it simple.

> anyone to jump to secondary_entry without calling thread_stack_alloc()
> first?

It's impossible to jump to secondary_data.entry without allocating the
stack first, because it's impossible to run C code without a valid stack.

Thanks,
Alex

> 
> Thanks,
> 
> Nikos
> 
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >   arm/cstart.S          | 6 ++++++
> >   arm/cstart64.S        | 3 +++
> >   lib/arm/processor.c   | 1 -
> >   lib/arm64/processor.c | 1 -
> >   4 files changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arm/cstart.S b/arm/cstart.S
> > index 39260e0fa470..39e70f40986a 100644
> > --- a/arm/cstart.S
> > +++ b/arm/cstart.S
> > @@ -151,7 +151,13 @@ secondary_entry:
> >   	 */
> >   	ldr	r1, =secondary_data
> >   	ldr	r0, [r1]
> > +	mov	r2, r0
> > +	lsr	r2, #THREAD_SHIFT
> > +	lsl	r2, #THREAD_SHIFT
> > +	add	r3, r2, #THREAD_SIZE
> > +	zero_range r2, r3, r4, r5
> >   	mov	sp, r0
> > +
> >   	bl	exceptions_init
> >   	bl	enable_vfp
> > diff --git a/arm/cstart64.S b/arm/cstart64.S
> > index d62360cf3859..54773676d1d5 100644
> > --- a/arm/cstart64.S
> > +++ b/arm/cstart64.S
> > @@ -156,6 +156,9 @@ secondary_entry:
> >   	/* set the stack */
> >   	adrp	x0, secondary_data
> >   	ldr	x0, [x0, :lo12:secondary_data]
> > +	and	x1, x0, #THREAD_MASK
> > +	add	x2, x1, #THREAD_SIZE
> > +	zero_range x1, x2
> >   	mov	sp, x0
> >   	/* finish init in C code */
> > diff --git a/lib/arm/processor.c b/lib/arm/processor.c
> > index 9d5759686b73..ceff1c0a1bd2 100644
> > --- a/lib/arm/processor.c
> > +++ b/lib/arm/processor.c
> > @@ -117,7 +117,6 @@ void do_handle_exception(enum vector v, struct pt_regs *regs)
> >   void thread_info_init(struct thread_info *ti, unsigned int flags)
> >   {
> > -	memset(ti, 0, sizeof(struct thread_info));
> >   	ti->cpu = mpidr_to_cpu(get_mpidr());
> >   	ti->flags = flags;
> >   }
> > diff --git a/lib/arm64/processor.c b/lib/arm64/processor.c
> > index 831207c16587..268b2858f0be 100644
> > --- a/lib/arm64/processor.c
> > +++ b/lib/arm64/processor.c
> > @@ -232,7 +232,6 @@ void install_vector_handler(enum vector v, vector_fn fn)
> >   static void __thread_info_init(struct thread_info *ti, unsigned int flags)
> >   {
> > -	memset(ti, 0, sizeof(struct thread_info));
> >   	ti->cpu = mpidr_to_cpu(get_mpidr());
> >   	ti->flags = flags;
> >   }

* Re: [kvm-unit-tests RFC PATCH 09/19] arm/arm64: Zero secondary CPUs' stack
  2022-08-10  9:42     ` Alexandru Elisei
@ 2022-08-10 10:00       ` Nikos Nikoleris
  0 siblings, 0 replies; 49+ messages in thread
From: Nikos Nikoleris @ 2022-08-10 10:00 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: pbonzini, thuth, kvmarm, kvm, andrew.jones

On 10/08/2022 10:42, Alexandru Elisei wrote:
> Hi Nikos,
> 
> On Tue, Aug 09, 2022 at 01:56:13PM +0100, Nikos Nikoleris wrote:
>> On 09/08/2022 10:15, Alexandru Elisei wrote:
>>> For the boot CPU, the entire stack is zeroed in the entry code. For the
>>> secondaries, only struct thread_info, which lives at the bottom of the
>>> stack, is zeroed in thread_info_init().
>>>
>>
>> That's a good point.
>>
>>> Be consistent and zero the entire stack for the secondaries. This should
>>> also improve reproducibility of the testsuite, as all the stacks now start
>>> with the same contents, which is zero. And now that all the stacks are
>>> zeroed in the entry code, there is no need to explicitly zero struct
>>> thread_info in thread_info_init().
>>>
>>
>> Wouldn't it make more sense to call memset(sp, 0, THREAD_SIZE); from
>> thread_stack_alloc() instead and avoid doing this in assembly? Do we expect
> 
> I prefer to do the zeroing in assembly because:
> 
> 1. For consistency, which is one of the main reasons this patch exists.
> 
> 2. I don't want to deal with all the cache maintenance that is required for
> inter-CPU communication. Let's keep it simple.
> 

I see, that's a very good point. For this reason, I agree initializing to
0 is better done locally.

Since you brought this up, we might have to worry about the thread_info 
fields we initialize in __thread_info_init(). But this is a problem for 
another patch.

Reviewed-by: Nikos Nikoleris <nikos.nikoleris@arm.com>

Thanks,

Nikos

>> anyone to jump to secondary_entry without calling thread_stack_alloc()
>> first?
> 
> It's impossible to jump to secondary_data.entry without allocating the
> stack first, because it's impossible to run C code without a valid stack.
> 
> Thanks,
> Alex
> 
>>
>> Thanks,
>>
>> Nikos
>>
>>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>>> ---
>>>    arm/cstart.S          | 6 ++++++
>>>    arm/cstart64.S        | 3 +++
>>>    lib/arm/processor.c   | 1 -
>>>    lib/arm64/processor.c | 1 -
>>>    4 files changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arm/cstart.S b/arm/cstart.S
>>> index 39260e0fa470..39e70f40986a 100644
>>> --- a/arm/cstart.S
>>> +++ b/arm/cstart.S
>>> @@ -151,7 +151,13 @@ secondary_entry:
>>>    	 */
>>>    	ldr	r1, =secondary_data
>>>    	ldr	r0, [r1]
>>> +	mov	r2, r0
>>> +	lsr	r2, #THREAD_SHIFT
>>> +	lsl	r2, #THREAD_SHIFT
>>> +	add	r3, r2, #THREAD_SIZE
>>> +	zero_range r2, r3, r4, r5
>>>    	mov	sp, r0
>>> +
>>>    	bl	exceptions_init
>>>    	bl	enable_vfp
>>> diff --git a/arm/cstart64.S b/arm/cstart64.S
>>> index d62360cf3859..54773676d1d5 100644
>>> --- a/arm/cstart64.S
>>> +++ b/arm/cstart64.S
>>> @@ -156,6 +156,9 @@ secondary_entry:
>>>    	/* set the stack */
>>>    	adrp	x0, secondary_data
>>>    	ldr	x0, [x0, :lo12:secondary_data]
>>> +	and	x1, x0, #THREAD_MASK
>>> +	add	x2, x1, #THREAD_SIZE
>>> +	zero_range x1, x2
>>>    	mov	sp, x0
>>>    	/* finish init in C code */
>>> diff --git a/lib/arm/processor.c b/lib/arm/processor.c
>>> index 9d5759686b73..ceff1c0a1bd2 100644
>>> --- a/lib/arm/processor.c
>>> +++ b/lib/arm/processor.c
>>> @@ -117,7 +117,6 @@ void do_handle_exception(enum vector v, struct pt_regs *regs)
>>>    void thread_info_init(struct thread_info *ti, unsigned int flags)
>>>    {
>>> -	memset(ti, 0, sizeof(struct thread_info));
>>>    	ti->cpu = mpidr_to_cpu(get_mpidr());
>>>    	ti->flags = flags;
>>>    }
>>> diff --git a/lib/arm64/processor.c b/lib/arm64/processor.c
>>> index 831207c16587..268b2858f0be 100644
>>> --- a/lib/arm64/processor.c
>>> +++ b/lib/arm64/processor.c
>>> @@ -232,7 +232,6 @@ void install_vector_handler(enum vector v, vector_fn fn)
>>>    static void __thread_info_init(struct thread_info *ti, unsigned int flags)
>>>    {
>>> -	memset(ti, 0, sizeof(struct thread_info));
>>>    	ti->cpu = mpidr_to_cpu(get_mpidr());
>>>    	ti->flags = flags;
>>>    }

* Re: [kvm-unit-tests RFC PATCH 01/19] Makefile: Define __ASSEMBLY__ for assembly files
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 01/19] Makefile: Define __ASSEMBLY__ for assembly files Alexandru Elisei
  2022-08-09 12:36   ` Nikos Nikoleris
@ 2022-09-20  8:11   ` Andrew Jones
  1 sibling, 0 replies; 49+ messages in thread
From: Andrew Jones @ 2022-09-20  8:11 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: nikos.nikoleris, pbonzini, thuth, kvmarm, kvm

On Tue, Aug 09, 2022 at 10:15:40AM +0100, Alexandru Elisei wrote:
> There are 25 header files today (found with grep -r "#ifndef __ASSEMBLY__")
> whose functionality relies on the __ASSEMBLY__ preprocessor constant being
> correctly defined in order to work. So far, kvm-unit-tests has relied on
> the assembly files to define the constant before including any header
> files which depend on it.
> 
> Let's make sure that nobody gets this wrong and define it as a compiler
> constant when compiling assembly files. __ASSEMBLY__ is now defined for all
> .S files, even those that didn't set it explicitly before.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  Makefile           | 5 ++++-
>  arm/cstart.S       | 1 -
>  arm/cstart64.S     | 1 -
>  powerpc/cstart64.S | 1 -
>  4 files changed, 4 insertions(+), 4 deletions(-)
>

Reviewed-by: Andrew Jones <andrew.jones@linux.dev>

* Re: [kvm-unit-tests RFC PATCH 02/19] lib/alloc_phys: Initialize align_min
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 02/19] lib/alloc_phys: Initialize align_min Alexandru Elisei
@ 2022-09-20  8:20   ` Andrew Jones
  0 siblings, 0 replies; 49+ messages in thread
From: Andrew Jones @ 2022-09-20  8:20 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: nikos.nikoleris, pbonzini, thuth, kvmarm, kvm

On Tue, Aug 09, 2022 at 10:15:41AM +0100, Alexandru Elisei wrote:
> Commit 11c4715fbf87 ("alloc: implement free") changed align_min from a
> static variable to a field of the alloc_ops struct and carried over the
> initializer value of DEFAULT_MINIMUM_ALIGNMENT.
> 
> Commit 7e3e823b78c0 ("lib/alloc.h: remove align_min from struct
> alloc_ops") removed the align_min field and changed it back to a static
> variable, but missed initializing it.
> 
> Initialize align_min to DEFAULT_MINIMUM_ALIGNMENT, as it was intended.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  lib/alloc_phys.c | 7 +++----
>  lib/alloc_phys.h | 2 --
>  2 files changed, 3 insertions(+), 6 deletions(-)

Reviewed-by: Andrew Jones <andrew.jones@linux.dev>

* Re: [kvm-unit-tests RFC PATCH 03/19] lib/alloc_phys: Use phys_alloc_aligned_safe and rename it to memalign_early
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 03/19] lib/alloc_phys: Use phys_alloc_aligned_safe and rename it to memalign_early Alexandru Elisei
@ 2022-09-20  8:27   ` Andrew Jones
  0 siblings, 0 replies; 49+ messages in thread
From: Andrew Jones @ 2022-09-20  8:27 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: nikos.nikoleris, pbonzini, thuth, kvmarm, kvm

On Tue, Aug 09, 2022 at 10:15:42AM +0100, Alexandru Elisei wrote:
> phys_alloc_aligned_safe() is called only by early_memalign() and the safe
> parameter is always true. In the spirit of simplifying the code, merge the
> two functions together. Rename it to memalign_early(), to match the naming
> scheme used by the page allocator.
> 
> Change the type of top_safe to phys_addr_t, to match the type of the top
> and base variables describing the available physical memory; this is a
> cosmetic change only, since libcflat.h defines phys_addr_t as an alias
> for u64.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  lib/alloc_phys.c | 38 ++++++++++++++------------------------
>  1 file changed, 14 insertions(+), 24 deletions(-)
>

Reviewed-by: Andrew Jones <andrew.jones@linux.dev>

* Re: [kvm-unit-tests RFC PATCH 06/19] lib/alloc_phys: Remove allocation accounting
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 06/19] lib/alloc_phys: Remove allocation accounting Alexandru Elisei
@ 2022-09-20  8:40   ` Andrew Jones
  2022-09-20 13:19     ` Alexandru Elisei
  0 siblings, 1 reply; 49+ messages in thread
From: Andrew Jones @ 2022-09-20  8:40 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: nikos.nikoleris, pbonzini, thuth, kvmarm, kvm

On Tue, Aug 09, 2022 at 10:15:45AM +0100, Alexandru Elisei wrote:
> The page allocator has better allocation tracking and is used by all
> architectures, while the physical allocator is now never used for
> allocating memory.
> 
> Simplify the physical allocator by removing allocation accounting. This
> accomplishes two things:
> 
> 1. It makes the allocator more useful, as the warning that was displayed
> on each allocation after the 256th is removed.
> 
> 2. Together with the lock removal, the physical allocator becomes more
> appealing as a very early allocator, when using the page allocator might
> not be desirable or feasible.

How does the locking cause problems when used in an early allocator?

> 
> Also, phys_alloc_show() has received a slight change in the way it displays
> the used and free regions: the end of the region is now non-inclusive, to
> allow phys_alloc_show() to express that no memory has been used, or no
> memory is free, in which case the start and the end addresses are equal.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  lib/alloc_phys.c | 65 ++++++++++++++----------------------------------
>  lib/alloc_phys.h |  5 ++--
>  2 files changed, 21 insertions(+), 49 deletions(-)
>

Reviewed-by: Andrew Jones <andrew.jones@linux.dev>

* Re: [kvm-unit-tests RFC PATCH 05/19] lib/alloc_phys: Remove locking
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 05/19] lib/alloc_phys: Remove locking Alexandru Elisei
@ 2022-09-20  8:45   ` Andrew Jones
  2022-09-20 13:20     ` Alexandru Elisei
  0 siblings, 1 reply; 49+ messages in thread
From: Andrew Jones @ 2022-09-20  8:45 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: nikos.nikoleris, pbonzini, thuth, kvmarm, kvm

On Tue, Aug 09, 2022 at 10:15:44AM +0100, Alexandru Elisei wrote:
> With powerpc moving to the page allocator, there are no architectures left
> which use the physical allocator after the boot setup:  arm, arm64,
> s390x and powerpc drain the physical allocator to initialize the page
> allocator; and x86 calls setup_vm() to drain the allocator for each of
> the tests that allocate memory.

Please put the motivation for this change in the commit message. I looked
ahead at the next patch to find it, but I'm not sure I agree with it. We
should be able to keep the locking even when used early, since we probably
need our locking to be something we can use early elsewhere anyway.

Thanks,
drew

* Re: [kvm-unit-tests RFC PATCH 07/19] arm/arm64: Mark the phys_end parameter as unused in setup_mmu()
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 07/19] arm/arm64: Mark the phys_end parameter as unused in setup_mmu() Alexandru Elisei
@ 2022-09-20  8:58   ` Andrew Jones
  2022-09-26 11:01     ` Alexandru Elisei
  0 siblings, 1 reply; 49+ messages in thread
From: Andrew Jones @ 2022-09-20  8:58 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: nikos.nikoleris, pbonzini, thuth, kvmarm, kvm

On Tue, Aug 09, 2022 at 10:15:46AM +0100, Alexandru Elisei wrote:
> phys_end was used to cap the linearly mapped memory to 3G to allow 1G of
> room for the vmalloc area to grow down. This was made useless in commit
> c1cd1a2bed69 ("arm/arm64: mmu: Remove memory layout assumptions"), when
> setup_mmu() was changed to map all the detected memory regions without
> changing their limits.

c1cd1a2bed69 was a start, but as that commit says, the 3G-4G region was
still necessary due to assumptions in the virtual memory allocator. This
patch needs to point out a vmalloc commit which removes that assumption
as well for its justification.

Thanks,
drew

> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  lib/arm/mmu.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/lib/arm/mmu.c b/lib/arm/mmu.c
> index e1a72fe4941f..8f936acafe8b 100644
> --- a/lib/arm/mmu.c
> +++ b/lib/arm/mmu.c
> @@ -153,14 +153,10 @@ void mmu_set_range_sect(pgd_t *pgtable, uintptr_t virt_offset,
>  	}
>  }
>  
> -void *setup_mmu(phys_addr_t phys_end, void *unused)
> +void *setup_mmu(phys_addr_t unused0, void *unused1)
>  {
>  	struct mem_region *r;
>  
> -	/* 3G-4G region is reserved for vmalloc, cap phys_end at 3G */
> -	if (phys_end > (3ul << 30))
> -		phys_end = 3ul << 30;
> -
>  #ifdef __aarch64__
>  	init_alloc_vpage((void*)(4ul << 30));
>  
> -- 
> 2.37.1
> 

* Re: [kvm-unit-tests RFC PATCH 08/19] arm/arm64: Use pgd_alloc() to allocate mmu_idmap
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 08/19] arm/arm64: Use pgd_alloc() to allocate mmu_idmap Alexandru Elisei
@ 2022-09-20  9:05   ` Andrew Jones
  0 siblings, 0 replies; 49+ messages in thread
From: Andrew Jones @ 2022-09-20  9:05 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: nikos.nikoleris, pbonzini, thuth, kvmarm, kvm

On Tue, Aug 09, 2022 at 10:15:47AM +0100, Alexandru Elisei wrote:
> Until commit 031755dbfefb ("arm: enable vmalloc"), the idmap was allocated
> using pgd_alloc(). After that commit, all the page table allocator
> functions were switched to using the page allocator, but pgd_alloc() was
> left unchanged and became unused, with the idmap now being allocated with
> alloc_page().
> 
> For arm64, the pgd table size varies based on the page size, which is
> configured by the user. For arm, it will always contain 4 entries (it
> translates bits 31:30 of the input address). To keep things simple and
> consistent with the other functions and across both architectures, modify
> pgd_alloc() to use alloc_page() instead of memalign like the rest of the
> page table allocator functions and use it to allocate the idmap.
> 
> Note that when the idmap is created, alloc_ops->memalign is
> memalign_pages() which allocates memory with page granularity. Using
> memalign() as before would still have allocated a full page.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  lib/arm/asm/pgtable.h   | 4 ++--
>  lib/arm/mmu.c           | 4 ++--
>  lib/arm64/asm/pgtable.h | 4 ++--
>  3 files changed, 6 insertions(+), 6 deletions(-)
>

Reviewed-by: Andrew Jones <andrew.jones@linux.dev>

* Re: [kvm-unit-tests RFC PATCH 09/19] arm/arm64: Zero secondary CPUs' stack
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 09/19] arm/arm64: Zero secondary CPUs' stack Alexandru Elisei
  2022-08-09 12:56   ` Nikos Nikoleris
@ 2022-09-20  9:24   ` Andrew Jones
  1 sibling, 0 replies; 49+ messages in thread
From: Andrew Jones @ 2022-09-20  9:24 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: nikos.nikoleris, pbonzini, thuth, kvmarm, kvm

On Tue, Aug 09, 2022 at 10:15:48AM +0100, Alexandru Elisei wrote:
> For the boot CPU, the entire stack is zeroed in the entry code. For the
> secondaries, only struct thread_info, which lives at the bottom of the
> stack, is zeroed in thread_info_init().
> 
> Be consistent and zero the entire stack for the secondaries. This should
> also improve reproducibility of the testsuite, as all the stacks now start
> with the same contents, which is zero. And now that all the stacks are
> zeroed in the entry code, there is no need to explicitly zero struct
> thread_info in thread_info_init().
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arm/cstart.S          | 6 ++++++
>  arm/cstart64.S        | 3 +++
>  lib/arm/processor.c   | 1 -
>  lib/arm64/processor.c | 1 -
>  4 files changed, 9 insertions(+), 2 deletions(-)
>

Reviewed-by: Andrew Jones <andrew.jones@linux.dev>

* Re: [kvm-unit-tests RFC PATCH 12/19] arm/arm64: assembler.h: Replace size with end address for dcache_by_line_op
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 12/19] arm/arm64: assembler.h: Replace size with end address for dcache_by_line_op Alexandru Elisei
  2022-08-09 13:01   ` Nikos Nikoleris
@ 2022-09-20  9:37   ` Andrew Jones
  1 sibling, 0 replies; 49+ messages in thread
From: Andrew Jones @ 2022-09-20  9:37 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: nikos.nikoleris, pbonzini, thuth, kvmarm, kvm

On Tue, Aug 09, 2022 at 10:15:51AM +0100, Alexandru Elisei wrote:
> Commit b5f659be4775 ("arm/arm64: Remove dcache_line_size global
> variable") moved the dcache_by_line_op macro to assembler.h and changed
> it to take the size of the region instead of the end address as
> parameter. This was done to keep the file in sync with the upstream
> Linux kernel implementation at the time.
> 
> But in both places where the macro is used, the code has the start and
> end address of the region, and it has to compute the size to pass it to
> dcache_by_line_op. Then the macro itself computes the end by adding size
> to start.
> 
> Get rid of this massaging of parameters and change the macro to take the end
> address as parameter directly.
> 
> Besides slightly simplifying the code by removing two unneeded arithmetic
> operations, this makes the macro compatible with the current upstream
> version of Linux (which was similarly changed to take the end address in
> commit 163d3f80695e ("arm64: dcache_by_line_op to take end parameter
> instead of size")), which will allow us to reuse (part of) the Linux C
> wrappers over the assembly macro.
> 
> The change has been tested with the same snippet of code used to test
> commit 410b3bf09e76 ("arm/arm64: Perform dcache clean + invalidate after
> turning MMU off").
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arm/cstart.S              |  1 -
>  arm/cstart64.S            |  1 -
>  lib/arm/asm/assembler.h   | 11 +++++------
>  lib/arm64/asm/assembler.h | 11 +++++------
>  4 files changed, 10 insertions(+), 14 deletions(-)
>

Reviewed-by: Andrew Jones <andrew.jones@linux.dev>

* Re: [kvm-unit-tests RFC PATCH 13/19] arm: page.h: Add missing libcflat.h include
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 13/19] arm: page.h: Add missing libcflat.h include Alexandru Elisei
@ 2022-09-20  9:39   ` Andrew Jones
  2022-09-26 11:02     ` Alexandru Elisei
  0 siblings, 1 reply; 49+ messages in thread
From: Andrew Jones @ 2022-09-20  9:39 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: nikos.nikoleris, pbonzini, thuth, kvmarm, kvm


I guess this should be squashed into one of the early patches in this
series since we don't have this issue with the current code.

Thanks,
drew


On Tue, Aug 09, 2022 at 10:15:52AM +0100, Alexandru Elisei wrote:
> Include libcflat from page.h to avoid errors like this one:
> 
> /path/to/kvm-unit-tests/lib/asm/page.h:19:9: error: unknown type name ‘u64’
>    19 | typedef u64 pteval_t;
>       |         ^~~
> [..]
> /path/to/kvm-unit-tests/lib/asm/page.h:47:8: error: unknown type name ‘phys_addr_t’
>    47 | extern phys_addr_t __virt_to_phys(unsigned long addr);
>       |        ^~~~~~~~~~~
>       |                                     ^~~~~~~~~~~
> [..]
> /path/to/kvm-unit-tests/lib/asm/page.h:50:47: error: unknown type name ‘size_t’
>    50 | extern void *__ioremap(phys_addr_t phys_addr, size_t size);
> 
> The arm64 version of the header already includes libcflat since commit
> a2d06852fe59 ("arm64: Add support for configuring the translation
> granule").
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  lib/arm/asm/page.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/lib/arm/asm/page.h b/lib/arm/asm/page.h
> index 8eb4a883808e..0a46bda018c7 100644
> --- a/lib/arm/asm/page.h
> +++ b/lib/arm/asm/page.h
> @@ -8,6 +8,8 @@
>  
>  #include <linux/const.h>
>  
> +#include <libcflat.h>
> +
>  #define PAGE_SHIFT		12
>  #define PAGE_SIZE		(_AC(1,UL) << PAGE_SHIFT)
>  #define PAGE_MASK		(~(PAGE_SIZE-1))
> -- 
> 2.37.1
> 

* Re: [kvm-unit-tests RFC PATCH 16/19] arm/arm64: Allocate secondaries' stack using the page allocator
  2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 16/19] arm/arm64: Allocate secondaries' stack using the page allocator Alexandru Elisei
@ 2022-09-20  9:58   ` Andrew Jones
  0 siblings, 0 replies; 49+ messages in thread
From: Andrew Jones @ 2022-09-20  9:58 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: nikos.nikoleris, pbonzini, thuth, kvmarm, kvm

On Tue, Aug 09, 2022 at 10:15:55AM +0100, Alexandru Elisei wrote:
> The vmalloc allocator returns non-id mapped addresses, where the virtual
> address is different than the physical address. This makes it impossible
> to access the stack of the secondary CPUs while the MMU is disabled.
> 
> On arm, THREAD_SIZE is 16K and PAGE_SIZE is 4K, which makes THREAD_SIZE
> a power of two multiple of PAGE_SIZE. On arm64, THREAD_SIZE is 16K when
> PAGE_SIZE is 4K or 16K, and 64K when PAGE_SIZE is 64K. In all cases,
> THREAD_SIZE is a power of two multiple of PAGE_SIZE. As a result, using
> memalign_pages() for the stack won't lead to wasted memory.
> 
> memalign_pages() allocates memory in chunks of a power-of-two number of
> pages, aligned to the allocation size, which makes it a drop-in
> replacement for vm_memalign (which is the value for alloc_ops->memalign
> when the stack is allocated).
> 
> Using memalign_pages() has two distinct benefits:
> 
> 1. The secondary CPUs' stack can be used with the MMU off.
> 
> 2. The secondary CPUs' stack is identity mapped, similar to the stack for
> the primary CPU, which makes the configuration of the CPUs consistent.
> 
> memalign_pages_flags() has been used instead of memalign_pages() to
> instruct the allocator not to zero the stack, as it's already zeroed in the
> entry code.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  lib/arm/asm/thread_info.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/arm/asm/thread_info.h b/lib/arm/asm/thread_info.h
> index eaa72582af86..190e082cbba0 100644
> --- a/lib/arm/asm/thread_info.h
> +++ b/lib/arm/asm/thread_info.h
> @@ -25,6 +25,7 @@
>  #ifndef __ASSEMBLY__
>  #include <asm/processor.h>
>  #include <alloc.h>
> +#include <alloc_page.h>
>  
>  #ifdef __arm__
>  #include <asm/ptrace.h>
> @@ -40,7 +41,7 @@
>  
>  static inline void *thread_stack_alloc(void)
>  {
> -	void *sp = memalign(THREAD_ALIGNMENT, THREAD_SIZE);
> +	void *sp = memalign_pages_flags(THREAD_ALIGNMENT, THREAD_SIZE, FLAG_DONTZERO);
>  	return sp + THREAD_START_SP;
>  }
>  
> -- 
> 2.37.1
>

Reviewed-by: Andrew Jones <andrew.jones@linux.dev>

* Re: [kvm-unit-tests RFC PATCH 06/19] lib/alloc_phys: Remove allocation accounting
  2022-09-20  8:40   ` Andrew Jones
@ 2022-09-20 13:19     ` Alexandru Elisei
  0 siblings, 0 replies; 49+ messages in thread
From: Alexandru Elisei @ 2022-09-20 13:19 UTC (permalink / raw)
  To: Andrew Jones; +Cc: nikos.nikoleris, pbonzini, thuth, kvmarm, kvm

Hi,

On Tue, Sep 20, 2022 at 10:40:47AM +0200, Andrew Jones wrote:
> On Tue, Aug 09, 2022 at 10:15:45AM +0100, Alexandru Elisei wrote:
> > The page allocator has better allocation tracking and is used by all
> > architectures, while the physical allocator is now never used for
> > allocating memory.
> > 
> > Simplify the physical allocator by removing allocation accounting. This
> > accomplishes two things:
> > 
> > 1. It makes the allocator more useful, as the warning that was displayed
> > on each allocation after the 256th is removed.
> > 
> > 2. Together with the lock removal, the physical allocator becomes more
> > appealing as a very early allocator, when using the page allocator might
> > not be desirable or feasible.
> 
> How does the locking cause problems when used in an early allocator?

By "early allocator" I mean here an allocator that can be used with the MMU
off.

The "desirable or feasible" part refers to the fact that the page allocator
cannot be used as an early allocator (when the MMU is off) because 1. It
doesn't do the necessary cache maintenance operations and 2. It would be
hard to add them, as the internal structures that the page allocator
maintains are significantly more complex than what the physical allocator
uses.

With this part: "together with the lock removal, the physical allocator
becomes more appealing as a very early allocator [..]" I was trying to say
that the physical allocator has now become as simple as it can possibly be
(well, align_min could also be removed, leaving it up to the calling code
to request correctly aligned allocations, but it's debatable whether users of the
allocator should know about how it's implemented). I can reword or remove
this part if you feel it's confusing.

Thanks,
Alex

> 
> > 
> > Also, phys_alloc_show() has received a slight change in the way it displays
> > the used and free regions: the end of the region is now non-inclusive, to
> > allow phys_alloc_show() to express that no memory has been used, or no
> > memory is free, in which case the start and the end addresses are equal.
> > 
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >  lib/alloc_phys.c | 65 ++++++++++++++----------------------------------
> >  lib/alloc_phys.h |  5 ++--
> >  2 files changed, 21 insertions(+), 49 deletions(-)
> >
> 
> Reviewed-by: Andrew Jones <andrew.jones@linux.dev>

* Re: [kvm-unit-tests RFC PATCH 05/19] lib/alloc_phys: Remove locking
  2022-09-20  8:45   ` Andrew Jones
@ 2022-09-20 13:20     ` Alexandru Elisei
  2022-09-20 14:59       ` Andrew Jones
  0 siblings, 1 reply; 49+ messages in thread
From: Alexandru Elisei @ 2022-09-20 13:20 UTC (permalink / raw)
  To: Andrew Jones; +Cc: nikos.nikoleris, pbonzini, thuth, kvmarm, kvm

Hi,

On Tue, Sep 20, 2022 at 10:45:53AM +0200, Andrew Jones wrote:
> On Tue, Aug 09, 2022 at 10:15:44AM +0100, Alexandru Elisei wrote:
> > With powerpc moving to the page allocator, there are no architectures left
> > which use the physical allocator after the boot setup:  arm, arm64,
> > s390x and powerpc drain the physical allocator to initialize the page
> > allocator; and x86 calls setup_vm() to drain the allocator for each of
> > the tests that allocate memory.
> 
> Please put the motivation for this change in the commit message. I looked
> ahead at the next patch to find it, but I'm not sure I agree with it. We
> should be able to keep the locking even when used early, since we probably
> need our locking to be something we can use early elsewhere anyway.

You are correct, the commit message doesn't explain why locking is removed,
which makes the commit confusing. I will try to do a better job for the
next iteration (if we decide to keep this patch).

I removed locking because the physical allocator by the end of the series
will end up being used only by arm64 to create the idmap, which is done on
the boot CPU and with the MMU off. After that, the translation table
allocator functions will use the page allocator, which can be used
concurrently.

Looking at the spinlock implementation, spin_lock() doesn't protect against
concurrent accesses when the MMU is disabled (lock->v is unconditionally
set to 1). This means that spin_lock() does not work (in the sense that it
doesn't protect against concurrent accesses) on the boot path, which
doesn't need a spinlock anyway, because no secondaries are online. It also
means that spinlocks don't work when AUXINFO_MMU_OFF is set. So for the
purpose of simplicity I preferred to drop it entirely.
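
To make it concrete, this is roughly the shape I'm describing (a simplified
sketch in the spirit of lib/arm/spinlock.c, not the exact source; struct
spinlock and mmu_enabled() are stand-ins for the real ones):

#include <stdbool.h>

struct spinlock { int v; };	/* stand-in for the real struct */

extern bool mmu_enabled(void);	/* stand-in for the lib/arm helper */

void spin_lock(struct spinlock *lock)
{
	if (!mmu_enabled()) {
		/*
		 * Exclusives can't be relied on with the MMU off, so the
		 * lock is just marked as taken; nothing stops another CPU
		 * from doing the same thing at the same time.
		 */
		lock->v = 1;
		return;
	}

	/* MMU on: the normal exclusive-based acquire loop goes here. */
}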

Thanks,
Alex

> 
> Thanks,
> drew

* Re: [kvm-unit-tests RFC PATCH 05/19] lib/alloc_phys: Remove locking
  2022-09-20 13:20     ` Alexandru Elisei
@ 2022-09-20 14:59       ` Andrew Jones
  2022-09-26 15:04         ` Alexandru Elisei
  0 siblings, 1 reply; 49+ messages in thread
From: Andrew Jones @ 2022-09-20 14:59 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: nikos.nikoleris, pbonzini, thuth, kvmarm, kvm

On Tue, Sep 20, 2022 at 02:20:48PM +0100, Alexandru Elisei wrote:
> Hi,
> 
> On Tue, Sep 20, 2022 at 10:45:53AM +0200, Andrew Jones wrote:
> > On Tue, Aug 09, 2022 at 10:15:44AM +0100, Alexandru Elisei wrote:
> > > With powerpc moving to the page allocator, there are no architectures left
> > > which use the physical allocator after the boot setup:  arm, arm64,
> > > s390x and powerpc drain the physical allocator to initialize the page
> > > allocator; and x86 calls setup_vm() to drain the allocator for each of
> > > the tests that allocate memory.
> > 
> > Please put the motivation for this change in the commit message. I looked
> > ahead at the next patch to find it, but I'm not sure I agree with it. We
> > should be able to keep the locking even when used early, since we probably
> > need our locking to be something we can use early elsewhere anyway.
> 
> You are correct, the commit message doesn't explain why locking is removed,
> which makes the commit confusing. I will try to do a better job for the
> next iteration (if we decide to keep this patch).
> 
> I removed locking because the physical allocator by the end of the series
> will end up being used only by arm64 to create the idmap, which is done on

If only arm, and no unit tests, needs the phys allocator, then it can be
integrated with whatever arm is using it for and removed from the general
lib.

> the boot CPU and with the MMU off. After that, the translation table
> allocator functions will use the page allocator, which can be used
> concurrently.
> 
> Looking at the spinlock implementation, spin_lock() doesn't protect against
> concurrent accesses when the MMU is disabled (lock->v is
> unconditionally set to 1). This means that spin_lock() does not work (in
> the sense that it doesn't protect against concurrent accesses) on the boot
> path, which doesn't need a spinlock anyway, because no secondaries are
> online. It also means that spinlocks don't work when
> AUXINFO_MMU_OFF is set. So for the purpose of simplicity I preferred to
> drop it entirely.

If other architectures or unit tests have / could have uses for the
phys allocator then we should either document that it doesn't have
locks or keep the locks, and arm will just know that they don't work,
but also that they don't need to for its purposes.

Finally, if we drop the locks and arm doesn't have any other places where
we use locks without the MMU enabled, then we can change the lock
implementation to not have the no-mmu fallback - maybe by switching to the
generic implementation as the other architectures have done.

Thanks,
drew

* Re: [kvm-unit-tests RFC PATCH 07/19] arm/arm64: Mark the phys_end parameter as unused in setup_mmu()
  2022-09-20  8:58   ` Andrew Jones
@ 2022-09-26 11:01     ` Alexandru Elisei
  0 siblings, 0 replies; 49+ messages in thread
From: Alexandru Elisei @ 2022-09-26 11:01 UTC (permalink / raw)
  To: Andrew Jones; +Cc: nikos.nikoleris, pbonzini, thuth, kvmarm, kvm

Hi,

On Tue, Sep 20, 2022 at 10:58:15AM +0200, Andrew Jones wrote:
> On Tue, Aug 09, 2022 at 10:15:46AM +0100, Alexandru Elisei wrote:
> > phys_end was used to cap the linearly mapped memory to 3G to allow 1G of
> > room for the vmalloc area to grow down. This was made useless in commit
> > c1cd1a2bed69 ("arm/arm64: mmu: Remove memory layout assumptions"), when
> > setup_mmu() was changed to map all the detected memory regions without
> > changing their limits.
> 
> c1cd1a2bed69 was a start, but as that commit says, the 3G-4G region was
> still necessary due to assumptions in the virtual memory allocator. This
> patch needs to point out a vmalloc commit which removes that assumption
> as well for its justification.

By "made useless" I mean that after that commit phys_end has no influence
on the way setup_mmu() creates the translation tables.

Yes, it's a problem because on real hardware or with kvmtool, which allows
the user to specify where RAM starts, the test can be loaded at the same
address from where vmalloc() will start allocating memory. But I think that
should be fixed separately from this series, maybe as part of the main
UEFI series, or as a separate patch(es).

I'll drop this patch, and leave any cleanups for when the vmalloc area
change is implemented.

Thanks,
Alex

> 
> Thanks,
> drew
> 
> > 
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >  lib/arm/mmu.c | 6 +-----
> >  1 file changed, 1 insertion(+), 5 deletions(-)
> > 
> > diff --git a/lib/arm/mmu.c b/lib/arm/mmu.c
> > index e1a72fe4941f..8f936acafe8b 100644
> > --- a/lib/arm/mmu.c
> > +++ b/lib/arm/mmu.c
> > @@ -153,14 +153,10 @@ void mmu_set_range_sect(pgd_t *pgtable, uintptr_t virt_offset,
> >  	}
> >  }
> >  
> > -void *setup_mmu(phys_addr_t phys_end, void *unused)
> > +void *setup_mmu(phys_addr_t unused0, void *unused1)
> >  {
> >  	struct mem_region *r;
> >  
> > -	/* 3G-4G region is reserved for vmalloc, cap phys_end at 3G */
> > -	if (phys_end > (3ul << 30))
> > -		phys_end = 3ul << 30;
> > -
> >  #ifdef __aarch64__
> >  	init_alloc_vpage((void*)(4ul << 30));
> >  
> > -- 
> > 2.37.1
> > 

* Re: [kvm-unit-tests RFC PATCH 13/19] arm: page.h: Add missing libcflat.h include
  2022-09-20  9:39   ` Andrew Jones
@ 2022-09-26 11:02     ` Alexandru Elisei
  0 siblings, 0 replies; 49+ messages in thread
From: Alexandru Elisei @ 2022-09-26 11:02 UTC (permalink / raw)
  To: Andrew Jones; +Cc: nikos.nikoleris, pbonzini, thuth, kvmarm, kvm

Hi,

On Tue, Sep 20, 2022 at 11:39:56AM +0200, Andrew Jones wrote:
> 
> I guess this should be squashed into one of the early patches in this
> series since we don't have this issue with the current code.

Will do, thanks for the suggestion!

Alex

> 
> Thanks,
> drew
> 
> 
> On Tue, Aug 09, 2022 at 10:15:52AM +0100, Alexandru Elisei wrote:
> > Include libcflat from page.h to avoid errors like this one:
> > 
> > /path/to/kvm-unit-tests/lib/asm/page.h:19:9: error: unknown type name ‘u64’
> >    19 | typedef u64 pteval_t;
> >       |         ^~~
> > [..]
> > /path/to/kvm-unit-tests/lib/asm/page.h:47:8: error: unknown type name ‘phys_addr_t’
> >    47 | extern phys_addr_t __virt_to_phys(unsigned long addr);
> >       |        ^~~~~~~~~~~
> >       |                                     ^~~~~~~~~~~
> > [..]
> > /path/to/kvm-unit-tests/lib/asm/page.h:50:47: error: unknown type name ‘size_t’
> >    50 | extern void *__ioremap(phys_addr_t phys_addr, size_t size);
> > 
> > The arm64 version of the header already includes libcflat since commit
> > a2d06852fe59 ("arm64: Add support for configuring the translation
> > granule").
> > 
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >  lib/arm/asm/page.h | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/lib/arm/asm/page.h b/lib/arm/asm/page.h
> > index 8eb4a883808e..0a46bda018c7 100644
> > --- a/lib/arm/asm/page.h
> > +++ b/lib/arm/asm/page.h
> > @@ -8,6 +8,8 @@
> >  
> >  #include <linux/const.h>
> >  
> > +#include <libcflat.h>
> > +
> >  #define PAGE_SHIFT		12
> >  #define PAGE_SIZE		(_AC(1,UL) << PAGE_SHIFT)
> >  #define PAGE_MASK		(~(PAGE_SIZE-1))
> > -- 
> > 2.37.1
> > 

* Re: [kvm-unit-tests RFC PATCH 05/19] lib/alloc_phys: Remove locking
  2022-09-20 14:59       ` Andrew Jones
@ 2022-09-26 15:04         ` Alexandru Elisei
  0 siblings, 0 replies; 49+ messages in thread
From: Alexandru Elisei @ 2022-09-26 15:04 UTC (permalink / raw)
  To: Andrew Jones; +Cc: nikos.nikoleris, pbonzini, thuth, kvmarm, kvm

Hi,

On Tue, Sep 20, 2022 at 04:59:52PM +0200, Andrew Jones wrote:
> On Tue, Sep 20, 2022 at 02:20:48PM +0100, Alexandru Elisei wrote:
> > Hi,
> > 
> > On Tue, Sep 20, 2022 at 10:45:53AM +0200, Andrew Jones wrote:
> > > On Tue, Aug 09, 2022 at 10:15:44AM +0100, Alexandru Elisei wrote:
> > > > With powerpc moving to the page allocator, there are no architectures left
> > > > which use the physical allocator after the boot setup:  arm, arm64,
> > > > s390x and powerpc drain the physical allocator to initialize the page
> > > > allocator; and x86 calls setup_vm() to drain the allocator for each of
> > > > the tests that allocate memory.
> > > 
> > > Please put the motivation for this change in the commit message. I looked
> > > ahead at the next patch to find it, but I'm not sure I agree with it. We
> > > should be able to keep the locking even when used early, since we probably
> > > need our locking to be something we can use early elsewhere anyway.
> > 
> > You are correct, the commit message doesn't explain why locking is removed,
> > which makes the commit confusing. I will try to do a better job for the
> > next iteration (if we decide to keep this patch).
> > 
> > I removed locking because the physical allocator by the end of the series
> > will end up being used only by arm64 to create the idmap, which is done on
> 
> If only arm, and no unit tests, needs the phys allocator, then it can be
> integrated with whatever arm is using it for and removed from the general
> lib.

I kept the allocator in lib because I thought that RISC-V might have a use
for it. Since it's a RISC architecture, I was thinking that it also might
require software cache management around enabling/disabling the MMU. But in
the end it's up to you, it would be easy to move the physical allocator to
lib/arm if you think that is best.

> 
> > the boot CPU and with the MMU off. After that, the translation table
> > allocator functions will use the page allocator, which can be used
> > concurrently.
> > 
> > Looking at the spinlock implementation, spin_lock() doesn't protect against
> > concurrent accesses when the MMU is disabled (lock->v is
> > unconditionally set to 1). This means that spin_lock() does not work (in
> > the sense that it doesn't protect against concurrent accesses) on the boot
> > path, which doesn't need a spinlock anyway, because no secondaries are
> > online. It also means that spinlocks don't work when
> > AUXINFO_MMU_OFF is set. So for the purpose of simplicity I preferred to
> > drop it entirely.
> 
> If other architectures or unit tests have / could have uses for the
> phys allocator then we should either document that it doesn't have
> locks or keep the locks, and arm will just know that they don't work,
> but also that they don't need to for its purposes.

I will write a comment explaining the baked in assumptions for the
allocator.

> 
> Finally, if we drop the locks and arm doesn't have any other places where
> we use locks without the MMU enabled, then we can change the lock
> implementation to not have the no-mmu fallback - maybe by switching to the
> generic implementation as the other architectures have done.

The architecture mandates that load-acquire/store-release instructions are
supported only on Normal memory (to be more precise: Inner Shareable, Inner
Write-Back, Outer Write-Back Normal memory with Read allocation hints and
Write allocation hints and not transient; and Outer Shareable, Inner
Write-Back, Outer Write-Back Normal memory with Read allocation hints and
Write allocation hints and not transient; see ARM DDI 0487H.a, pages B2-211
and B2-212).

If the AUXINFO_MMU_OFF flag is set, kvm-unit-tests doesn't enable the MMU
at boot, which means that all tests can be run with the MMU disabled. In
this case, all memory is Device-nGnRnE (instead of Normal). By using an
implementation that doesn't take into account that spin_lock() might be
called with the MMU disabled, kvm-unit-tests will end up using exclusive
access instructions on memory which doesn't support them. This can have
various effects, all rather unpleasant, like causing an external abort or
treating the exclusive access instruction as a NOP (ARM DDI 0487H.a, page
B2-212).

I tested this on my rockpro64 board, with kvm-unit-tests built from current
master, the mmu_disabled() path removed from spin_lock() and the
AUXINFO_MMU_OFF flag set: all tests hang indefinitely, because
phys_alloc_init() uses a spinlock. It is conceivable that we could rework
the setup code to remove the usage of spinlocks, but there's still the matter
of tests needing one for synchronization. There's also the matter of the uart
needing one for puts. And report. And probably other places.

Out of curiosity, without setting the AUXINFO_MMU_OFF flag, I tried using
the generic version of the spinlock (I assume you mean the one from
lib/asm-generic/spinlock.h; I changed lib/arm64/asm/spinlock.h to include
that header): selftest-setup hangs without displaying anything before
phys_alloc_init(), and I have no idea why that is.

In the current implementation, when AUXINFO_MMU_OFF is set, tests that
actually use more than one thread might end up being incorrect some of the
time because spin_lock() doesn't protect against concurrent accesses.
That's pretty bad, but I think the alternative off all tests hanging
indefinitely is worse.

In my opinion, the current spinlock implementation is incorrect when the
MMU is disabled, but using a generic implementation is worse. I guess it's
another thing to put on the TODO list. The Arm ARM recommends Lamport's
Bakery algorithm for mutual exclusion, and we could try to implement that
for the MMU-disabled case, but I don't see much interest at the moment in
running tests with the MMU disabled.
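
For reference, a minimal sketch of the bakery idea, using only loads and
stores (NR_CPUS, the cpu parameter and the use of a full barrier are
illustrative assumptions, not a proposal for how this would be wired into
kvm-unit-tests):

#define NR_CPUS 8	/* illustrative */

static volatile int choosing[NR_CPUS];
static volatile unsigned int number[NR_CPUS];

static void bakery_lock(int cpu)
{
	unsigned int max = 0;
	int i;

	/* Announce that this CPU is picking a ticket. */
	choosing[cpu] = 1;
	__sync_synchronize();
	for (i = 0; i < NR_CPUS; i++)
		if (number[i] > max)
			max = number[i];
	number[cpu] = max + 1;
	choosing[cpu] = 0;
	__sync_synchronize();

	for (i = 0; i < NR_CPUS; i++) {
		/* Wait until CPU i has finished picking its ticket. */
		while (choosing[i])
			;
		/* Wait while CPU i holds a smaller (ticket, cpu) pair. */
		while (number[i] &&
		       (number[i] < number[cpu] ||
			(number[i] == number[cpu] && i < cpu)))
			;
	}
	__sync_synchronize();
}

static void bakery_unlock(int cpu)
{
	__sync_synchronize();
	number[cpu] = 0;
}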

Thanks,
Alex

> 
> Thanks,
> drew

end of thread, other threads:[~2022-09-26 15:03 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
2022-08-09  9:15 [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 01/19] Makefile: Define __ASSEMBLY__ for assembly files Alexandru Elisei
2022-08-09 12:36   ` Nikos Nikoleris
2022-09-20  8:11   ` Andrew Jones
2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 02/19] lib/alloc_phys: Initialize align_min Alexandru Elisei
2022-09-20  8:20   ` Andrew Jones
2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 03/19] lib/alloc_phys: Use phys_alloc_aligned_safe and rename it to memalign_early Alexandru Elisei
2022-09-20  8:27   ` Andrew Jones
2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 04/19] powerpc: Use the page allocator Alexandru Elisei
2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 05/19] lib/alloc_phys: Remove locking Alexandru Elisei
2022-09-20  8:45   ` Andrew Jones
2022-09-20 13:20     ` Alexandru Elisei
2022-09-20 14:59       ` Andrew Jones
2022-09-26 15:04         ` Alexandru Elisei
2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 06/19] lib/alloc_phys: Remove allocation accounting Alexandru Elisei
2022-09-20  8:40   ` Andrew Jones
2022-09-20 13:19     ` Alexandru Elisei
2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 07/19] arm/arm64: Mark the phys_end parameter as unused in setup_mmu() Alexandru Elisei
2022-09-20  8:58   ` Andrew Jones
2022-09-26 11:01     ` Alexandru Elisei
2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 08/19] arm/arm64: Use pgd_alloc() to allocate mmu_idmap Alexandru Elisei
2022-09-20  9:05   ` Andrew Jones
2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 09/19] arm/arm64: Zero secondary CPUs' stack Alexandru Elisei
2022-08-09 12:56   ` Nikos Nikoleris
2022-08-10  9:42     ` Alexandru Elisei
2022-08-10 10:00       ` Nikos Nikoleris
2022-09-20  9:24   ` Andrew Jones
2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 10/19] arm/arm64: Enable the MMU early Alexandru Elisei
2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 11/19] arm/arm64: Map the UART when creating the translation tables Alexandru Elisei
2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 12/19] arm/arm64: assembler.h: Replace size with end address for dcache_by_line_op Alexandru Elisei
2022-08-09 13:01   ` Nikos Nikoleris
2022-09-20  9:37   ` Andrew Jones
2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 13/19] arm: page.h: Add missing libcflat.h include Alexandru Elisei
2022-09-20  9:39   ` Andrew Jones
2022-09-26 11:02     ` Alexandru Elisei
2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 14/19] arm/arm64: Add C functions for doing cache maintenance Alexandru Elisei
2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 15/19] lib/alloc_phys: Add callback to perform " Alexandru Elisei
2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 16/19] arm/arm64: Allocate secondaries' stack using the page allocator Alexandru Elisei
2022-09-20  9:58   ` Andrew Jones
2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 17/19] arm/arm64: Configure secondaries' stack before enabling the MMU Alexandru Elisei
2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 18/19] arm/arm64: Perform dcache maintenance at boot Alexandru Elisei
2022-08-09  9:15 ` [kvm-unit-tests RFC PATCH 19/19] arm/arm64: Rework the cache maintenance in asm_mmu_disable Alexandru Elisei
2022-08-09 13:53   ` Nikos Nikoleris
2022-08-09 14:22     ` Alexandru Elisei
2022-08-09 15:53       ` Nikos Nikoleris
2022-08-09 16:53         ` Alexandru Elisei
2022-08-09 19:48           ` Nikos Nikoleris
2022-08-10  8:52             ` Alexandru Elisei
2022-08-09  9:49 ` [kvm-unit-tests RFC PATCH 00/19] arm/arm64: Rework cache maintenance at boot Alexandru Elisei
