linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/4] RISC-V: mm: Make SV48 the default address space
@ 2023-07-14 16:54 Charlie Jenkins
  2023-07-14 16:54 ` [PATCH v6 1/4] RISC-V: mm: Restrict address space for sv39,sv48,sv57 Charlie Jenkins
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Charlie Jenkins @ 2023-07-14 16:54 UTC (permalink / raw)
  To: linux-riscv, linux-kernel
  Cc: charlie, conor, paul.walmsley, palmer, aou, anup, konstantin,
	linux-doc, linux-kselftest, linux-mm, mick, jrtc27, rdunlap,
	alexghiti

Make sv48 the default address space for mmap as some applications
currently depend on this assumption. Users can now select a
desired address space using a non-zero hint address to mmap. Previously,
requesting the default address space from mmap by passing zero as the hint
address would result in using the largest address space possible. Some
applications depend on empty bits in the virtual address space, like Go and
Java, so this patch provides more flexibility for application developers.

-Charlie

---
v6:
- Rebase onto the correct base

v5:
- Minor wording change in documentation
- Change some parenthesis in arch_get_mmap_ macros
- Added case for addr==0 in arch_get_mmap_ because without this, programs would
  crash if RLIMIT_STACK was modified before executing the program. This was
  tested using the libhugetlbfs tests. 

v4:
- Split testcases/document patch into test cases, in-code documentation, and
  formal documentation patches
- Modified the mmap_base macro to be more legible and better represent memory
  layout
- Fixed documentation to better reflect the implmentation
- Renamed DEFAULT_VA_BITS to MMAP_VA_BITS
- Added additional test case for rlimit changes
---

Charlie Jenkins (4):
  RISC-V: mm: Restrict address space for sv39,sv48,sv57
  RISC-V: mm: Add tests for RISC-V mm
  RISC-V: mm: Update pgtable comment documentation
  RISC-V: mm: Document mmap changes

 Documentation/riscv/vm-layout.rst             |  22 +++
 arch/riscv/include/asm/elf.h                  |   2 +-
 arch/riscv/include/asm/pgtable.h              |  20 ++-
 arch/riscv/include/asm/processor.h            |  46 +++++-
 tools/testing/selftests/riscv/Makefile        |   2 +-
 tools/testing/selftests/riscv/mm/.gitignore   |   1 +
 tools/testing/selftests/riscv/mm/Makefile     |  21 +++
 .../selftests/riscv/mm/testcases/mmap.c       | 133 ++++++++++++++++++
 8 files changed, 234 insertions(+), 13 deletions(-)
 create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
 create mode 100644 tools/testing/selftests/riscv/mm/Makefile
 create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap.c

-- 
2.41.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v6 1/4] RISC-V: mm: Restrict address space for sv39,sv48,sv57
  2023-07-14 16:54 [PATCH v6 0/4] RISC-V: mm: Make SV48 the default address space Charlie Jenkins
@ 2023-07-14 16:54 ` Charlie Jenkins
  2023-07-20  8:13   ` Alexandre Ghiti
  2023-07-14 16:54 ` [PATCH v6 2/4] RISC-V: mm: Add tests for RISC-V mm Charlie Jenkins
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Charlie Jenkins @ 2023-07-14 16:54 UTC (permalink / raw)
  To: linux-riscv, linux-kernel
  Cc: charlie, conor, paul.walmsley, palmer, aou, anup, konstantin,
	linux-doc, linux-kselftest, linux-mm, mick, jrtc27, rdunlap,
	alexghiti

Make sv48 the default address space for mmap as some applications
currently depend on this assumption. A hint address passed to mmap will
cause the largest address space that fits entirely into the hint to be
used. If the hint is less than or equal to 1<<38, an sv39 address will
be used. An exception is that if the hint address is 0, then a sv48
address will be used. After an address space is completely full, the next
smallest address space will be used.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 arch/riscv/include/asm/elf.h       |  2 +-
 arch/riscv/include/asm/pgtable.h   | 12 +++++++-
 arch/riscv/include/asm/processor.h | 46 +++++++++++++++++++++++++-----
 3 files changed, 51 insertions(+), 9 deletions(-)

diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h
index c24280774caf..5d3368d5585c 100644
--- a/arch/riscv/include/asm/elf.h
+++ b/arch/riscv/include/asm/elf.h
@@ -49,7 +49,7 @@ extern bool compat_elf_check_arch(Elf32_Ehdr *hdr);
  * the loader.  We need to make sure that it is out of the way of the program
  * that it will "exec", and that there is sufficient room for the brk.
  */
-#define ELF_ET_DYN_BASE		((TASK_SIZE / 3) * 2)
+#define ELF_ET_DYN_BASE		((DEFAULT_MAP_WINDOW / 3) * 2)
 
 #ifdef CONFIG_64BIT
 #ifdef CONFIG_COMPAT
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 75970ee2bda2..e13f5872bfe9 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -63,12 +63,22 @@
  * position vmemmap directly below the VMALLOC region.
  */
 #ifdef CONFIG_64BIT
+#define VA_BITS_SV39 39
+#define VA_BITS_SV48 48
+#define VA_BITS_SV57 57
+
+#define VA_USER_SV39 (UL(1) << (VA_BITS_SV39 - 1))
+#define VA_USER_SV48 (UL(1) << (VA_BITS_SV48 - 1))
+#define VA_USER_SV57 (UL(1) << (VA_BITS_SV57 - 1))
+
 #define VA_BITS		(pgtable_l5_enabled ? \
-				57 : (pgtable_l4_enabled ? 48 : 39))
+				VA_BITS_SV57 : (pgtable_l4_enabled ? VA_BITS_SV48 : VA_BITS_SV39))
 #else
 #define VA_BITS		32
 #endif
 
+#define MMAP_VA_BITS ((VA_BITS >= VA_BITS_SV48) ? VA_BITS_SV48 : VA_BITS)
+
 #define VMEMMAP_SHIFT \
 	(VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
 #define VMEMMAP_SIZE	BIT(VMEMMAP_SHIFT)
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index c950a8d9edef..14a5396eed3d 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -13,20 +13,52 @@
 
 #include <asm/ptrace.h>
 
-/*
- * This decides where the kernel will search for a free chunk of vm
- * space during mmap's.
- */
-#define TASK_UNMAPPED_BASE	PAGE_ALIGN(TASK_SIZE / 3)
-
-#define STACK_TOP		TASK_SIZE
 #ifdef CONFIG_64BIT
+#define DEFAULT_MAP_WINDOW	(UL(1) << (MMAP_VA_BITS - 1))
 #define STACK_TOP_MAX		TASK_SIZE_64
+
+#define arch_get_mmap_end(addr, len, flags)	\
+({	\
+	unsigned long mmap_end;	\
+	if ((addr) >= VA_USER_SV57)	\
+		mmap_end = STACK_TOP_MAX;	\
+	else if ((((addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48))	\
+		mmap_end = VA_USER_SV48;	\
+	else if ((addr) == 0)	\
+		mmap_end = DEFAULT_MAP_WINDOW;	\
+	else	\
+		mmap_end = VA_USER_SV39;	\
+	mmap_end;	\
+})
+
+#define arch_get_mmap_base(addr, base)	\
+({	\
+	unsigned long mmap_base;	\
+	if (((addr) >= VA_USER_SV57) && (VA_BITS >= VA_BITS_SV57))	\
+		mmap_base = (base) + (VA_USER_SV57 - DEFAULT_MAP_WINDOW);	\
+	else if ((((addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48))	\
+		mmap_base = (base) + (VA_USER_SV48 - DEFAULT_MAP_WINDOW);	\
+	else if ((addr) == 0)	\
+		mmap_base = (base);	\
+	else	\
+		mmap_base = (base) + (VA_USER_SV39 - DEFAULT_MAP_WINDOW);	\
+	mmap_base;	\
+})
+
 #else
+#define DEFAULT_MAP_WINDOW	TASK_SIZE
 #define STACK_TOP_MAX		TASK_SIZE
 #endif
 #define STACK_ALIGN		16
 
+#define STACK_TOP		DEFAULT_MAP_WINDOW
+
+/*
+ * This decides where the kernel will search for a free chunk of vm
+ * space during mmap's.
+ */
+#define TASK_UNMAPPED_BASE	PAGE_ALIGN(DEFAULT_MAP_WINDOW / 3)
+
 #ifndef __ASSEMBLY__
 
 struct task_struct;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v6 2/4] RISC-V: mm: Add tests for RISC-V mm
  2023-07-14 16:54 [PATCH v6 0/4] RISC-V: mm: Make SV48 the default address space Charlie Jenkins
  2023-07-14 16:54 ` [PATCH v6 1/4] RISC-V: mm: Restrict address space for sv39,sv48,sv57 Charlie Jenkins
@ 2023-07-14 16:54 ` Charlie Jenkins
  2023-07-20  8:16   ` Alexandre Ghiti
  2023-07-14 16:54 ` [PATCH v6 3/4] RISC-V: mm: Update pgtable comment documentation Charlie Jenkins
  2023-07-14 16:54 ` [PATCH v6 4/4] RISC-V: mm: Document mmap changes Charlie Jenkins
  3 siblings, 1 reply; 11+ messages in thread
From: Charlie Jenkins @ 2023-07-14 16:54 UTC (permalink / raw)
  To: linux-riscv, linux-kernel
  Cc: charlie, conor, paul.walmsley, palmer, aou, anup, konstantin,
	linux-doc, linux-kselftest, linux-mm, mick, jrtc27, rdunlap,
	alexghiti

Add tests that enforce mmap hint address behavior. mmap should default
to sv48. mmap will provide an address at the highest address space that
can fit into the hint address, unless the hint address is less than sv39
and not 0, then it will return a sv39 address. In addition, ensure that
rlimit changes do not cause mmap to fail.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 tools/testing/selftests/riscv/Makefile        |   2 +-
 tools/testing/selftests/riscv/mm/.gitignore   |   1 +
 tools/testing/selftests/riscv/mm/Makefile     |  21 +++
 .../selftests/riscv/mm/testcases/mmap.c       | 133 ++++++++++++++++++
 4 files changed, 156 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
 create mode 100644 tools/testing/selftests/riscv/mm/Makefile
 create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap.c

diff --git a/tools/testing/selftests/riscv/Makefile b/tools/testing/selftests/riscv/Makefile
index 9dd629cc86aa..1b79da90396e 100644
--- a/tools/testing/selftests/riscv/Makefile
+++ b/tools/testing/selftests/riscv/Makefile
@@ -5,7 +5,7 @@
 ARCH ?= $(shell uname -m 2>/dev/null || echo not)
 
 ifneq (,$(filter $(ARCH),riscv))
-RISCV_SUBTARGETS ?= hwprobe vector
+RISCV_SUBTARGETS ?= hwprobe vector mm
 else
 RISCV_SUBTARGETS :=
 endif
diff --git a/tools/testing/selftests/riscv/mm/.gitignore b/tools/testing/selftests/riscv/mm/.gitignore
new file mode 100644
index 000000000000..9a6f303edcd3
--- /dev/null
+++ b/tools/testing/selftests/riscv/mm/.gitignore
@@ -0,0 +1 @@
+mmap
diff --git a/tools/testing/selftests/riscv/mm/Makefile b/tools/testing/selftests/riscv/mm/Makefile
new file mode 100644
index 000000000000..cf68e63e7495
--- /dev/null
+++ b/tools/testing/selftests/riscv/mm/Makefile
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: GPL-2.0
+# Originally tools/testing/selftests/arm64/signal
+
+# Additional include paths needed by kselftest.h and local headers
+CFLAGS += -D_GNU_SOURCE -std=gnu99 -I.
+
+SRCS := $(filter-out testcases/testcases.c,$(wildcard testcases/*.c))
+PROGS := $(patsubst %.c,%,$(SRCS))
+
+# Generated binaries to be installed by top KSFT script
+TEST_GEN_PROGS := $(notdir $(PROGS))
+
+# Get Kernel headers installed and use them.
+
+# Including KSFT lib.mk here will also mangle the TEST_GEN_PROGS list
+# to account for any OUTPUT target-dirs optionally provided by
+# the toplevel makefile
+include ../../lib.mk
+
+$(TEST_GEN_PROGS): $(PROGS)
+	cp $(PROGS) $(OUTPUT)/
diff --git a/tools/testing/selftests/riscv/mm/testcases/mmap.c b/tools/testing/selftests/riscv/mm/testcases/mmap.c
new file mode 100644
index 000000000000..d8e751f7b8c9
--- /dev/null
+++ b/tools/testing/selftests/riscv/mm/testcases/mmap.c
@@ -0,0 +1,133 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <sys/mman.h>
+#include <sys/resource.h>
+#include <sys/time.h>
+
+#include "../../kselftest_harness.h"
+struct addresses {
+	int *no_hint;
+	int *on_37_addr;
+	int *on_38_addr;
+	int *on_46_addr;
+	int *on_47_addr;
+	int *on_55_addr;
+	int *on_56_addr;
+};
+
+void do_mmaps(struct addresses *mmap_addresses)
+{
+	// Place all of the hint addresses on the boundaries of mmap
+	// sv39, sv48, sv57
+	// User addresses end at 1<<38, 1<<47, 1<<56 respectively
+	void *on_37_bits = (void *)(1UL << 37);
+	void *on_38_bits = (void *)(1UL << 38);
+	void *on_46_bits = (void *)(1UL << 46);
+	void *on_47_bits = (void *)(1UL << 47);
+	void *on_55_bits = (void *)(1UL << 55);
+	void *on_56_bits = (void *)(1UL << 56);
+
+	int prot = PROT_READ | PROT_WRITE;
+	int flags = MAP_PRIVATE | MAP_ANONYMOUS;
+
+	mmap_addresses->no_hint =
+		mmap(NULL, 5 * sizeof(int), prot, flags, 0, 0);
+	mmap_addresses->on_37_addr =
+		mmap(on_37_bits, 5 * sizeof(int), prot, flags, 0, 0);
+	mmap_addresses->on_38_addr =
+		mmap(on_38_bits, 5 * sizeof(int), prot, flags, 0, 0);
+	mmap_addresses->on_46_addr =
+		mmap(on_46_bits, 5 * sizeof(int), prot, flags, 0, 0);
+	mmap_addresses->on_47_addr =
+		mmap(on_47_bits, 5 * sizeof(int), prot, flags, 0, 0);
+	mmap_addresses->on_55_addr =
+		mmap(on_55_bits, 5 * sizeof(int), prot, flags, 0, 0);
+	mmap_addresses->on_56_addr =
+		mmap(on_56_bits, 5 * sizeof(int), prot, flags, 0, 0);
+}
+
+TEST(default_rlimit)
+{
+// Only works on 64 bit
+#if __riscv_xlen == 64
+	struct addresses mmap_addresses;
+
+	do_mmaps(&mmap_addresses);
+
+	EXPECT_NE(mmap_addresses.no_hint, MAP_FAILED);
+	EXPECT_NE(mmap_addresses.on_37_addr, MAP_FAILED);
+	EXPECT_NE(mmap_addresses.on_38_addr, MAP_FAILED);
+	EXPECT_NE(mmap_addresses.on_46_addr, MAP_FAILED);
+	EXPECT_NE(mmap_addresses.on_47_addr, MAP_FAILED);
+	EXPECT_NE(mmap_addresses.on_55_addr, MAP_FAILED);
+	EXPECT_NE(mmap_addresses.on_56_addr, MAP_FAILED);
+
+	EXPECT_LT((unsigned long)mmap_addresses.no_hint, 1UL << 47);
+	EXPECT_LT((unsigned long)mmap_addresses.on_37_addr, 1UL << 38);
+	EXPECT_LT((unsigned long)mmap_addresses.on_38_addr, 1UL << 38);
+	EXPECT_LT((unsigned long)mmap_addresses.on_46_addr, 1UL << 38);
+	EXPECT_LT((unsigned long)mmap_addresses.on_47_addr, 1UL << 47);
+	EXPECT_LT((unsigned long)mmap_addresses.on_55_addr, 1UL << 47);
+	EXPECT_LT((unsigned long)mmap_addresses.on_56_addr, 1UL << 56);
+#endif
+}
+
+TEST(zero_rlimit)
+{
+// Only works on 64 bit
+#if __riscv_xlen == 64
+	struct addresses mmap_addresses;
+	struct rlimit rlim_new = { .rlim_cur = 0, .rlim_max = RLIM_INFINITY };
+
+	setrlimit(RLIMIT_STACK, &rlim_new);
+
+	do_mmaps(&mmap_addresses);
+
+	EXPECT_NE(mmap_addresses.no_hint, MAP_FAILED);
+	EXPECT_NE(mmap_addresses.on_37_addr, MAP_FAILED);
+	EXPECT_NE(mmap_addresses.on_38_addr, MAP_FAILED);
+	EXPECT_NE(mmap_addresses.on_46_addr, MAP_FAILED);
+	EXPECT_NE(mmap_addresses.on_47_addr, MAP_FAILED);
+	EXPECT_NE(mmap_addresses.on_55_addr, MAP_FAILED);
+	EXPECT_NE(mmap_addresses.on_56_addr, MAP_FAILED);
+
+	EXPECT_LT((unsigned long)mmap_addresses.no_hint, 1UL << 47);
+	EXPECT_LT((unsigned long)mmap_addresses.on_37_addr, 1UL << 38);
+	EXPECT_LT((unsigned long)mmap_addresses.on_38_addr, 1UL << 38);
+	EXPECT_LT((unsigned long)mmap_addresses.on_46_addr, 1UL << 38);
+	EXPECT_LT((unsigned long)mmap_addresses.on_47_addr, 1UL << 47);
+	EXPECT_LT((unsigned long)mmap_addresses.on_55_addr, 1UL << 47);
+	EXPECT_LT((unsigned long)mmap_addresses.on_56_addr, 1UL << 56);
+#endif
+}
+
+TEST(infinite_rlimit)
+{
+// Only works on 64 bit
+#if __riscv_xlen == 64
+	struct addresses mmap_addresses;
+	struct rlimit rlim_new = { .rlim_cur = RLIM_INFINITY,
+				   .rlim_max = RLIM_INFINITY };
+
+	setrlimit(RLIMIT_STACK, &rlim_new);
+
+	do_mmaps(&mmap_addresses);
+
+	EXPECT_NE(mmap_addresses.no_hint, MAP_FAILED);
+	EXPECT_NE(mmap_addresses.on_37_addr, MAP_FAILED);
+	EXPECT_NE(mmap_addresses.on_38_addr, MAP_FAILED);
+	EXPECT_NE(mmap_addresses.on_46_addr, MAP_FAILED);
+	EXPECT_NE(mmap_addresses.on_47_addr, MAP_FAILED);
+	EXPECT_NE(mmap_addresses.on_55_addr, MAP_FAILED);
+	EXPECT_NE(mmap_addresses.on_56_addr, MAP_FAILED);
+
+	EXPECT_LT((unsigned long)mmap_addresses.no_hint, 1UL << 47);
+	EXPECT_LT((unsigned long)mmap_addresses.on_37_addr, 1UL << 38);
+	EXPECT_LT((unsigned long)mmap_addresses.on_38_addr, 1UL << 38);
+	EXPECT_LT((unsigned long)mmap_addresses.on_46_addr, 1UL << 38);
+	EXPECT_LT((unsigned long)mmap_addresses.on_47_addr, 1UL << 47);
+	EXPECT_LT((unsigned long)mmap_addresses.on_55_addr, 1UL << 47);
+	EXPECT_LT((unsigned long)mmap_addresses.on_56_addr, 1UL << 56);
+#endif
+}
+
+TEST_HARNESS_MAIN
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v6 3/4] RISC-V: mm: Update pgtable comment documentation
  2023-07-14 16:54 [PATCH v6 0/4] RISC-V: mm: Make SV48 the default address space Charlie Jenkins
  2023-07-14 16:54 ` [PATCH v6 1/4] RISC-V: mm: Restrict address space for sv39,sv48,sv57 Charlie Jenkins
  2023-07-14 16:54 ` [PATCH v6 2/4] RISC-V: mm: Add tests for RISC-V mm Charlie Jenkins
@ 2023-07-14 16:54 ` Charlie Jenkins
  2023-07-20  7:00   ` Alexandre Ghiti
  2023-07-14 16:54 ` [PATCH v6 4/4] RISC-V: mm: Document mmap changes Charlie Jenkins
  3 siblings, 1 reply; 11+ messages in thread
From: Charlie Jenkins @ 2023-07-14 16:54 UTC (permalink / raw)
  To: linux-riscv, linux-kernel
  Cc: charlie, conor, paul.walmsley, palmer, aou, anup, konstantin,
	linux-doc, linux-kselftest, linux-mm, mick, jrtc27, rdunlap,
	alexghiti

sv57 is supported in the kernel so pgtable.h should reflect that.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 arch/riscv/include/asm/pgtable.h | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index e13f5872bfe9..28660bab754c 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -840,14 +840,16 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
  * Task size is 0x4000000000 for RV64 or 0x9fc00000 for RV32.
  * Note that PGDIR_SIZE must evenly divide TASK_SIZE.
  * Task size is:
- * -     0x9fc00000 (~2.5GB) for RV32.
- * -   0x4000000000 ( 256GB) for RV64 using SV39 mmu
- * - 0x800000000000 ( 128TB) for RV64 using SV48 mmu
+ * -        0x9fc00000	(~2.5GB) for RV32.
+ * -      0x4000000000	( 256GB) for RV64 using SV39 mmu
+ * -    0x800000000000	( 128TB) for RV64 using SV48 mmu
+ * - 0x100000000000000	(  64PB) for RV64 using SV57 mmu
  *
  * Note that PGDIR_SIZE must evenly divide TASK_SIZE since "RISC-V
  * Instruction Set Manual Volume II: Privileged Architecture" states that
  * "load and store effective addresses, which are 64bits, must have bits
  * 63–48 all equal to bit 47, or else a page-fault exception will occur."
+ * Similarly for SV57, bits 63–57 must be equal to bit 56.
  */
 #ifdef CONFIG_64BIT
 #define TASK_SIZE_64	(PGDIR_SIZE * PTRS_PER_PGD / 2)
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v6 4/4] RISC-V: mm: Document mmap changes
  2023-07-14 16:54 [PATCH v6 0/4] RISC-V: mm: Make SV48 the default address space Charlie Jenkins
                   ` (2 preceding siblings ...)
  2023-07-14 16:54 ` [PATCH v6 3/4] RISC-V: mm: Update pgtable comment documentation Charlie Jenkins
@ 2023-07-14 16:54 ` Charlie Jenkins
  2023-07-20  6:59   ` Alexandre Ghiti
  3 siblings, 1 reply; 11+ messages in thread
From: Charlie Jenkins @ 2023-07-14 16:54 UTC (permalink / raw)
  To: linux-riscv, linux-kernel
  Cc: charlie, conor, paul.walmsley, palmer, aou, anup, konstantin,
	linux-doc, linux-kselftest, linux-mm, mick, jrtc27, rdunlap,
	alexghiti

The behavior of mmap is modified with this patch series, so explain the
changes to the mmap hint address behavior.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
---
 Documentation/riscv/vm-layout.rst | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/Documentation/riscv/vm-layout.rst b/Documentation/riscv/vm-layout.rst
index 5462c84f4723..892412b91300 100644
--- a/Documentation/riscv/vm-layout.rst
+++ b/Documentation/riscv/vm-layout.rst
@@ -133,3 +133,25 @@ RISC-V Linux Kernel SV57
    ffffffff00000000 |  -4     GB | ffffffff7fffffff |    2 GB | modules, BPF
    ffffffff80000000 |  -2     GB | ffffffffffffffff |    2 GB | kernel
   __________________|____________|__________________|_________|____________________________________________________________
+
+
+Userspace VAs
+--------------------
+To maintain compatibility with software that relies on the VA space with a
+maximum of 48 bits the kernel will, by default, return virtual addresses to
+userspace from a 48-bit range (sv48). This default behavior is achieved by
+passing 0 into the hint address parameter of mmap. On CPUs with an address space
+smaller than sv48, the CPU maximum supported address space will be the default.
+
+Software can "opt-in" to receiving VAs from another VA space by providing
+a hint address to mmap. A call to mmap is guaranteed to return an address
+that will not override the unset left-aligned bits in the hint address,
+unless there is no space left in the address space. If there is no space
+available in the requested address space, an address in the next smallest
+available address space will be returned.
+
+For example, in order to obtain 48-bit VA space, a hint address greater than
+:code:`1 << 38` must be provided. Note that this is 38 due to sv39 userspace
+ending at :code:`1 << 38` and the addresses beyond this are reserved for the
+kernel. Similarly, to obtain 57-bit VA space addresses, a hint address greater
+than or equal to :code:`1 << 47` must be provided.
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 4/4] RISC-V: mm: Document mmap changes
  2023-07-14 16:54 ` [PATCH v6 4/4] RISC-V: mm: Document mmap changes Charlie Jenkins
@ 2023-07-20  6:59   ` Alexandre Ghiti
  2023-07-26 13:52     ` Charlie Jenkins
  0 siblings, 1 reply; 11+ messages in thread
From: Alexandre Ghiti @ 2023-07-20  6:59 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: linux-riscv, linux-kernel, conor, paul.walmsley, palmer, aou,
	anup, konstantin, linux-doc, linux-kselftest, linux-mm, mick,
	jrtc27, rdunlap

On Fri, Jul 14, 2023 at 6:56 PM Charlie Jenkins <charlie@rivosinc.com> wrote:
>
> The behavior of mmap is modified with this patch series, so explain the
> changes to the mmap hint address behavior.
>
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> ---
>  Documentation/riscv/vm-layout.rst | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/Documentation/riscv/vm-layout.rst b/Documentation/riscv/vm-layout.rst
> index 5462c84f4723..892412b91300 100644
> --- a/Documentation/riscv/vm-layout.rst
> +++ b/Documentation/riscv/vm-layout.rst
> @@ -133,3 +133,25 @@ RISC-V Linux Kernel SV57
>     ffffffff00000000 |  -4     GB | ffffffff7fffffff |    2 GB | modules, BPF
>     ffffffff80000000 |  -2     GB | ffffffffffffffff |    2 GB | kernel
>    __________________|____________|__________________|_________|____________________________________________________________
> +
> +
> +Userspace VAs
> +--------------------
> +To maintain compatibility with software that relies on the VA space with a
> +maximum of 48 bits the kernel will, by default, return virtual addresses to
> +userspace from a 48-bit range (sv48). This default behavior is achieved by
> +passing 0 into the hint address parameter of mmap. On CPUs with an address space
> +smaller than sv48, the CPU maximum supported address space will be the default.
> +
> +Software can "opt-in" to receiving VAs from another VA space by providing
> +a hint address to mmap. A call to mmap is guaranteed to return an address
> +that will not override the unset left-aligned bits in the hint address,
> +unless there is no space left in the address space. If there is no space
> +available in the requested address space, an address in the next smallest
> +available address space will be returned.
> +
> +For example, in order to obtain 48-bit VA space, a hint address greater than
> +:code:`1 << 38` must be provided.

Is this correct? Shouldn't the hint be strictly greater than the
address space it targets? In patch 1, you state that "A hint address
passed to mmap will cause the largest address space that fits entirely
into the hint to be used", it seems contradictory to me.

> Note that this is 38 due to sv39 userspace
> +ending at :code:`1 << 38` and the addresses beyond this are reserved for the
> +kernel. Similarly, to obtain 57-bit VA space addresses, a hint address greater
> +than or equal to :code:`1 << 47` must be provided.
> --
> 2.41.0
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 3/4] RISC-V: mm: Update pgtable comment documentation
  2023-07-14 16:54 ` [PATCH v6 3/4] RISC-V: mm: Update pgtable comment documentation Charlie Jenkins
@ 2023-07-20  7:00   ` Alexandre Ghiti
  0 siblings, 0 replies; 11+ messages in thread
From: Alexandre Ghiti @ 2023-07-20  7:00 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: linux-riscv, linux-kernel, conor, paul.walmsley, palmer, aou,
	anup, konstantin, linux-doc, linux-kselftest, linux-mm, mick,
	jrtc27, rdunlap

On Fri, Jul 14, 2023 at 6:56 PM Charlie Jenkins <charlie@rivosinc.com> wrote:
>
> sv57 is supported in the kernel so pgtable.h should reflect that.
>
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> ---
>  arch/riscv/include/asm/pgtable.h | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index e13f5872bfe9..28660bab754c 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -840,14 +840,16 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
>   * Task size is 0x4000000000 for RV64 or 0x9fc00000 for RV32.
>   * Note that PGDIR_SIZE must evenly divide TASK_SIZE.
>   * Task size is:
> - * -     0x9fc00000 (~2.5GB) for RV32.
> - * -   0x4000000000 ( 256GB) for RV64 using SV39 mmu
> - * - 0x800000000000 ( 128TB) for RV64 using SV48 mmu
> + * -        0x9fc00000 (~2.5GB) for RV32.
> + * -      0x4000000000 ( 256GB) for RV64 using SV39 mmu
> + * -    0x800000000000 ( 128TB) for RV64 using SV48 mmu
> + * - 0x100000000000000 (  64PB) for RV64 using SV57 mmu
>   *
>   * Note that PGDIR_SIZE must evenly divide TASK_SIZE since "RISC-V
>   * Instruction Set Manual Volume II: Privileged Architecture" states that
>   * "load and store effective addresses, which are 64bits, must have bits
>   * 63–48 all equal to bit 47, or else a page-fault exception will occur."
> + * Similarly for SV57, bits 63–57 must be equal to bit 56.
>   */
>  #ifdef CONFIG_64BIT
>  #define TASK_SIZE_64   (PGDIR_SIZE * PTRS_PER_PGD / 2)
> --
> 2.41.0
>

You can add:

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>

Thanks,

Alex

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 1/4] RISC-V: mm: Restrict address space for sv39,sv48,sv57
  2023-07-14 16:54 ` [PATCH v6 1/4] RISC-V: mm: Restrict address space for sv39,sv48,sv57 Charlie Jenkins
@ 2023-07-20  8:13   ` Alexandre Ghiti
  2023-07-26 14:02     ` Charlie Jenkins
  0 siblings, 1 reply; 11+ messages in thread
From: Alexandre Ghiti @ 2023-07-20  8:13 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: linux-riscv, linux-kernel, conor, paul.walmsley, palmer, aou,
	anup, konstantin, linux-doc, linux-kselftest, linux-mm, mick,
	jrtc27, rdunlap

On Fri, Jul 14, 2023 at 6:55 PM Charlie Jenkins <charlie@rivosinc.com> wrote:
>
> Make sv48 the default address space for mmap as some applications
> currently depend on this assumption. A hint address passed to mmap will
> cause the largest address space that fits entirely into the hint to be
> used. If the hint is less than or equal to 1<<38, an sv39 address will
> be used. An exception is that if the hint address is 0, then a sv48
> address will be used. After an address space is completely full, the next
> smallest address space will be used.
>
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> ---
>  arch/riscv/include/asm/elf.h       |  2 +-
>  arch/riscv/include/asm/pgtable.h   | 12 +++++++-
>  arch/riscv/include/asm/processor.h | 46 +++++++++++++++++++++++++-----
>  3 files changed, 51 insertions(+), 9 deletions(-)
>
> diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h
> index c24280774caf..5d3368d5585c 100644
> --- a/arch/riscv/include/asm/elf.h
> +++ b/arch/riscv/include/asm/elf.h
> @@ -49,7 +49,7 @@ extern bool compat_elf_check_arch(Elf32_Ehdr *hdr);
>   * the loader.  We need to make sure that it is out of the way of the program
>   * that it will "exec", and that there is sufficient room for the brk.
>   */
> -#define ELF_ET_DYN_BASE                ((TASK_SIZE / 3) * 2)
> +#define ELF_ET_DYN_BASE                ((DEFAULT_MAP_WINDOW / 3) * 2)
>
>  #ifdef CONFIG_64BIT
>  #ifdef CONFIG_COMPAT
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 75970ee2bda2..e13f5872bfe9 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -63,12 +63,22 @@
>   * position vmemmap directly below the VMALLOC region.
>   */
>  #ifdef CONFIG_64BIT
> +#define VA_BITS_SV39 39
> +#define VA_BITS_SV48 48
> +#define VA_BITS_SV57 57
> +
> +#define VA_USER_SV39 (UL(1) << (VA_BITS_SV39 - 1))
> +#define VA_USER_SV48 (UL(1) << (VA_BITS_SV48 - 1))
> +#define VA_USER_SV57 (UL(1) << (VA_BITS_SV57 - 1))
> +
>  #define VA_BITS                (pgtable_l5_enabled ? \
> -                               57 : (pgtable_l4_enabled ? 48 : 39))
> +                               VA_BITS_SV57 : (pgtable_l4_enabled ? VA_BITS_SV48 : VA_BITS_SV39))
>  #else
>  #define VA_BITS                32
>  #endif
>
> +#define MMAP_VA_BITS ((VA_BITS >= VA_BITS_SV48) ? VA_BITS_SV48 : VA_BITS)
> +
>  #define VMEMMAP_SHIFT \
>         (VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
>  #define VMEMMAP_SIZE   BIT(VMEMMAP_SHIFT)
> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> index c950a8d9edef..14a5396eed3d 100644
> --- a/arch/riscv/include/asm/processor.h
> +++ b/arch/riscv/include/asm/processor.h
> @@ -13,20 +13,52 @@
>
>  #include <asm/ptrace.h>
>
> -/*
> - * This decides where the kernel will search for a free chunk of vm
> - * space during mmap's.
> - */
> -#define TASK_UNMAPPED_BASE     PAGE_ALIGN(TASK_SIZE / 3)
> -
> -#define STACK_TOP              TASK_SIZE
>  #ifdef CONFIG_64BIT
> +#define DEFAULT_MAP_WINDOW     (UL(1) << (MMAP_VA_BITS - 1))
>  #define STACK_TOP_MAX          TASK_SIZE_64
> +
> +#define arch_get_mmap_end(addr, len, flags)    \
> +({     \
> +       unsigned long mmap_end; \
> +       if ((addr) >= VA_USER_SV57)     \
> +               mmap_end = STACK_TOP_MAX;       \
> +       else if ((((addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48))       \
> +               mmap_end = VA_USER_SV48;        \
> +       else if ((addr) == 0)   \
> +               mmap_end = DEFAULT_MAP_WINDOW;  \
> +       else    \
> +               mmap_end = VA_USER_SV39;        \
> +       mmap_end;       \
> +})

What about the following instead:

#define arch_get_mmap_end(addr, len, flags)    \
({     \
       unsigned long mmap_end; \
       if ((addr) >= VA_USER_SV57) \
          mmap_end = STACK_TOP_MAX; \ // Maybe a comment here that
says it returns the max user address of the current mode, not obvious
at first sight.
       else \
          mmap_end = DEFAULT_MAP_WINDOW; \
       mmap_end; \
})

The only corner case is when sv57 is active, then only a hint greater
than VA_USER_SV57 can return a sv57 user address. Otherwise, we just
need to return the default mmap end right?

> +
> +#define arch_get_mmap_base(addr, base) \
> +({     \
> +       unsigned long mmap_base;        \
> +       if (((addr) >= VA_USER_SV57) && (VA_BITS >= VA_BITS_SV57))      \
> +               mmap_base = (base) + (VA_USER_SV57 - DEFAULT_MAP_WINDOW);       \
> +       else if ((((addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48))       \
> +               mmap_base = (base) + (VA_USER_SV48 - DEFAULT_MAP_WINDOW);       \
> +       else if ((addr) == 0)   \
> +               mmap_base = (base);     \
> +       else    \
> +               mmap_base = (base) + (VA_USER_SV39 - DEFAULT_MAP_WINDOW);       \
> +       mmap_base;      \
> +})
> +

From arch_pick_mmap_layout()
(https://elixir.bootlin.com/linux/latest/source/mm/util.c#L433), the
"base" argument is:

- either STACK_TOP in top-down (more or less some random offset)
- or TASK_UNMAPPED_BASE in bottom-up (more or less some random offset)

When bottom-up is the current mode, we should not change the base, so
adding (VA_USER_SV57 - DEFAULT_MAP_WINDOW) in the first case is not
right for me. When sv48 or sv57 are the active mode,
DEFAULT_MAP_WINDOW is equal to VA_USER_SV48 right? So (VA_USER_SV48 -
DEFAULT_MAP_WINDOW) is 0, so not useful. And for the last case, when
the user asks for a sv39 address whereas the active mode is sv48 or
sv57, then  (VA_USER_SV39 - DEFAULT_MAP_WINDOW) is negative and the
base is smaller which is not correct.

In the bottom-up case, we should preserve the base and I think that
again, only sv57 is the corner case to deal with.


>  #else
> +#define DEFAULT_MAP_WINDOW     TASK_SIZE
>  #define STACK_TOP_MAX          TASK_SIZE
>  #endif
>  #define STACK_ALIGN            16
>
> +#define STACK_TOP              DEFAULT_MAP_WINDOW
> +
> +/*
> + * This decides where the kernel will search for a free chunk of vm
> + * space during mmap's.
> + */
> +#define TASK_UNMAPPED_BASE     PAGE_ALIGN(DEFAULT_MAP_WINDOW / 3)
> +
>  #ifndef __ASSEMBLY__
>
>  struct task_struct;
> --
> 2.41.0
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 2/4] RISC-V: mm: Add tests for RISC-V mm
  2023-07-14 16:54 ` [PATCH v6 2/4] RISC-V: mm: Add tests for RISC-V mm Charlie Jenkins
@ 2023-07-20  8:16   ` Alexandre Ghiti
  0 siblings, 0 replies; 11+ messages in thread
From: Alexandre Ghiti @ 2023-07-20  8:16 UTC (permalink / raw)
  To: Charlie Jenkins
  Cc: linux-riscv, linux-kernel, conor, paul.walmsley, palmer, aou,
	anup, konstantin, linux-doc, linux-kselftest, linux-mm, mick,
	jrtc27, rdunlap

On Fri, Jul 14, 2023 at 6:55 PM Charlie Jenkins <charlie@rivosinc.com> wrote:
>
> Add tests that enforce mmap hint address behavior. mmap should default
> to sv48. mmap will provide an address at the highest address space that
> can fit into the hint address, unless the hint address is less than sv39
> and not 0, then it will return a sv39 address. In addition, ensure that
> rlimit changes do not cause mmap to fail.
>
> Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> ---
>  tools/testing/selftests/riscv/Makefile        |   2 +-
>  tools/testing/selftests/riscv/mm/.gitignore   |   1 +
>  tools/testing/selftests/riscv/mm/Makefile     |  21 +++
>  .../selftests/riscv/mm/testcases/mmap.c       | 133 ++++++++++++++++++
>  4 files changed, 156 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
>  create mode 100644 tools/testing/selftests/riscv/mm/Makefile
>  create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap.c
>
> diff --git a/tools/testing/selftests/riscv/Makefile b/tools/testing/selftests/riscv/Makefile
> index 9dd629cc86aa..1b79da90396e 100644
> --- a/tools/testing/selftests/riscv/Makefile
> +++ b/tools/testing/selftests/riscv/Makefile
> @@ -5,7 +5,7 @@
>  ARCH ?= $(shell uname -m 2>/dev/null || echo not)
>
>  ifneq (,$(filter $(ARCH),riscv))
> -RISCV_SUBTARGETS ?= hwprobe vector
> +RISCV_SUBTARGETS ?= hwprobe vector mm
>  else
>  RISCV_SUBTARGETS :=
>  endif
> diff --git a/tools/testing/selftests/riscv/mm/.gitignore b/tools/testing/selftests/riscv/mm/.gitignore
> new file mode 100644
> index 000000000000..9a6f303edcd3
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/mm/.gitignore
> @@ -0,0 +1 @@
> +mmap
> diff --git a/tools/testing/selftests/riscv/mm/Makefile b/tools/testing/selftests/riscv/mm/Makefile
> new file mode 100644
> index 000000000000..cf68e63e7495
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/mm/Makefile
> @@ -0,0 +1,21 @@
> +# SPDX-License-Identifier: GPL-2.0
> +# Originally tools/testing/selftests/arm64/signal
> +
> +# Additional include paths needed by kselftest.h and local headers
> +CFLAGS += -D_GNU_SOURCE -std=gnu99 -I.
> +
> +SRCS := $(filter-out testcases/testcases.c,$(wildcard testcases/*.c))
> +PROGS := $(patsubst %.c,%,$(SRCS))
> +
> +# Generated binaries to be installed by top KSFT script
> +TEST_GEN_PROGS := $(notdir $(PROGS))
> +
> +# Get Kernel headers installed and use them.
> +
> +# Including KSFT lib.mk here will also mangle the TEST_GEN_PROGS list
> +# to account for any OUTPUT target-dirs optionally provided by
> +# the toplevel makefile
> +include ../../lib.mk
> +
> +$(TEST_GEN_PROGS): $(PROGS)
> +       cp $(PROGS) $(OUTPUT)/
> diff --git a/tools/testing/selftests/riscv/mm/testcases/mmap.c b/tools/testing/selftests/riscv/mm/testcases/mmap.c
> new file mode 100644
> index 000000000000..d8e751f7b8c9
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/mm/testcases/mmap.c
> @@ -0,0 +1,133 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#include <sys/mman.h>
> +#include <sys/resource.h>
> +#include <sys/time.h>
> +
> +#include "../../kselftest_harness.h"
> +struct addresses {
> +       int *no_hint;
> +       int *on_37_addr;
> +       int *on_38_addr;
> +       int *on_46_addr;
> +       int *on_47_addr;
> +       int *on_55_addr;
> +       int *on_56_addr;
> +};
> +
> +void do_mmaps(struct addresses *mmap_addresses)
> +{
> +       // Place all of the hint addresses on the boundaries of mmap
> +       // sv39, sv48, sv57
> +       // User addresses end at 1<<38, 1<<47, 1<<56 respectively

Doesn't checkpatch complain about those comments? Shouldn't you use /*
*/ instead?

> +       void *on_37_bits = (void *)(1UL << 37);
> +       void *on_38_bits = (void *)(1UL << 38);
> +       void *on_46_bits = (void *)(1UL << 46);
> +       void *on_47_bits = (void *)(1UL << 47);
> +       void *on_55_bits = (void *)(1UL << 55);
> +       void *on_56_bits = (void *)(1UL << 56);
> +
> +       int prot = PROT_READ | PROT_WRITE;
> +       int flags = MAP_PRIVATE | MAP_ANONYMOUS;
> +
> +       mmap_addresses->no_hint =
> +               mmap(NULL, 5 * sizeof(int), prot, flags, 0, 0);
> +       mmap_addresses->on_37_addr =
> +               mmap(on_37_bits, 5 * sizeof(int), prot, flags, 0, 0);
> +       mmap_addresses->on_38_addr =
> +               mmap(on_38_bits, 5 * sizeof(int), prot, flags, 0, 0);
> +       mmap_addresses->on_46_addr =
> +               mmap(on_46_bits, 5 * sizeof(int), prot, flags, 0, 0);
> +       mmap_addresses->on_47_addr =
> +               mmap(on_47_bits, 5 * sizeof(int), prot, flags, 0, 0);
> +       mmap_addresses->on_55_addr =
> +               mmap(on_55_bits, 5 * sizeof(int), prot, flags, 0, 0);
> +       mmap_addresses->on_56_addr =
> +               mmap(on_56_bits, 5 * sizeof(int), prot, flags, 0, 0);
> +}
> +
> +TEST(default_rlimit)
> +{
> +// Only works on 64 bit
> +#if __riscv_xlen == 64
> +       struct addresses mmap_addresses;
> +
> +       do_mmaps(&mmap_addresses);
> +
> +       EXPECT_NE(mmap_addresses.no_hint, MAP_FAILED);
> +       EXPECT_NE(mmap_addresses.on_37_addr, MAP_FAILED);
> +       EXPECT_NE(mmap_addresses.on_38_addr, MAP_FAILED);
> +       EXPECT_NE(mmap_addresses.on_46_addr, MAP_FAILED);
> +       EXPECT_NE(mmap_addresses.on_47_addr, MAP_FAILED);
> +       EXPECT_NE(mmap_addresses.on_55_addr, MAP_FAILED);
> +       EXPECT_NE(mmap_addresses.on_56_addr, MAP_FAILED);
> +
> +       EXPECT_LT((unsigned long)mmap_addresses.no_hint, 1UL << 47);
> +       EXPECT_LT((unsigned long)mmap_addresses.on_37_addr, 1UL << 38);
> +       EXPECT_LT((unsigned long)mmap_addresses.on_38_addr, 1UL << 38);
> +       EXPECT_LT((unsigned long)mmap_addresses.on_46_addr, 1UL << 38);
> +       EXPECT_LT((unsigned long)mmap_addresses.on_47_addr, 1UL << 47);
> +       EXPECT_LT((unsigned long)mmap_addresses.on_55_addr, 1UL << 47);
> +       EXPECT_LT((unsigned long)mmap_addresses.on_56_addr, 1UL << 56);
> +#endif
> +}
> +
> +TEST(zero_rlimit)
> +{
> +// Only works on 64 bit
> +#if __riscv_xlen == 64
> +       struct addresses mmap_addresses;
> +       struct rlimit rlim_new = { .rlim_cur = 0, .rlim_max = RLIM_INFINITY };
> +
> +       setrlimit(RLIMIT_STACK, &rlim_new);
> +
> +       do_mmaps(&mmap_addresses);
> +
> +       EXPECT_NE(mmap_addresses.no_hint, MAP_FAILED);
> +       EXPECT_NE(mmap_addresses.on_37_addr, MAP_FAILED);
> +       EXPECT_NE(mmap_addresses.on_38_addr, MAP_FAILED);
> +       EXPECT_NE(mmap_addresses.on_46_addr, MAP_FAILED);
> +       EXPECT_NE(mmap_addresses.on_47_addr, MAP_FAILED);
> +       EXPECT_NE(mmap_addresses.on_55_addr, MAP_FAILED);
> +       EXPECT_NE(mmap_addresses.on_56_addr, MAP_FAILED);
> +
> +       EXPECT_LT((unsigned long)mmap_addresses.no_hint, 1UL << 47);
> +       EXPECT_LT((unsigned long)mmap_addresses.on_37_addr, 1UL << 38);
> +       EXPECT_LT((unsigned long)mmap_addresses.on_38_addr, 1UL << 38);
> +       EXPECT_LT((unsigned long)mmap_addresses.on_46_addr, 1UL << 38);
> +       EXPECT_LT((unsigned long)mmap_addresses.on_47_addr, 1UL << 47);
> +       EXPECT_LT((unsigned long)mmap_addresses.on_55_addr, 1UL << 47);
> +       EXPECT_LT((unsigned long)mmap_addresses.on_56_addr, 1UL << 56);
> +#endif
> +}
> +
> +TEST(infinite_rlimit)
> +{
> +// Only works on 64 bit
> +#if __riscv_xlen == 64
> +       struct addresses mmap_addresses;
> +       struct rlimit rlim_new = { .rlim_cur = RLIM_INFINITY,
> +                                  .rlim_max = RLIM_INFINITY };
> +
> +       setrlimit(RLIMIT_STACK, &rlim_new);
> +
> +       do_mmaps(&mmap_addresses);
> +
> +       EXPECT_NE(mmap_addresses.no_hint, MAP_FAILED);
> +       EXPECT_NE(mmap_addresses.on_37_addr, MAP_FAILED);
> +       EXPECT_NE(mmap_addresses.on_38_addr, MAP_FAILED);
> +       EXPECT_NE(mmap_addresses.on_46_addr, MAP_FAILED);
> +       EXPECT_NE(mmap_addresses.on_47_addr, MAP_FAILED);
> +       EXPECT_NE(mmap_addresses.on_55_addr, MAP_FAILED);
> +       EXPECT_NE(mmap_addresses.on_56_addr, MAP_FAILED);
> +
> +       EXPECT_LT((unsigned long)mmap_addresses.no_hint, 1UL << 47);
> +       EXPECT_LT((unsigned long)mmap_addresses.on_37_addr, 1UL << 38);
> +       EXPECT_LT((unsigned long)mmap_addresses.on_38_addr, 1UL << 38);
> +       EXPECT_LT((unsigned long)mmap_addresses.on_46_addr, 1UL << 38);
> +       EXPECT_LT((unsigned long)mmap_addresses.on_47_addr, 1UL << 47);
> +       EXPECT_LT((unsigned long)mmap_addresses.on_55_addr, 1UL << 47);
> +       EXPECT_LT((unsigned long)mmap_addresses.on_56_addr, 1UL << 56);
> +#endif
> +}
> +
> +TEST_HARNESS_MAIN
> --
> 2.41.0
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 4/4] RISC-V: mm: Document mmap changes
  2023-07-20  6:59   ` Alexandre Ghiti
@ 2023-07-26 13:52     ` Charlie Jenkins
  0 siblings, 0 replies; 11+ messages in thread
From: Charlie Jenkins @ 2023-07-26 13:52 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: linux-riscv, linux-kernel, conor, paul.walmsley, palmer, aou,
	anup, konstantin, linux-doc, linux-kselftest, linux-mm, mick,
	jrtc27, rdunlap

On Thu, Jul 20, 2023 at 08:59:12AM +0200, Alexandre Ghiti wrote:
> On Fri, Jul 14, 2023 at 6:56 PM Charlie Jenkins <charlie@rivosinc.com> wrote:
> >
> > The behavior of mmap is modified with this patch series, so explain the
> > changes to the mmap hint address behavior.
> >
> > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > ---
> >  Documentation/riscv/vm-layout.rst | 22 ++++++++++++++++++++++
> >  1 file changed, 22 insertions(+)
> >
> > diff --git a/Documentation/riscv/vm-layout.rst b/Documentation/riscv/vm-layout.rst
> > index 5462c84f4723..892412b91300 100644
> > --- a/Documentation/riscv/vm-layout.rst
> > +++ b/Documentation/riscv/vm-layout.rst
> > @@ -133,3 +133,25 @@ RISC-V Linux Kernel SV57
> >     ffffffff00000000 |  -4     GB | ffffffff7fffffff |    2 GB | modules, BPF
> >     ffffffff80000000 |  -2     GB | ffffffffffffffff |    2 GB | kernel
> >    __________________|____________|__________________|_________|____________________________________________________________
> > +
> > +
> > +Userspace VAs
> > +--------------------
> > +To maintain compatibility with software that relies on the VA space with a
> > +maximum of 48 bits the kernel will, by default, return virtual addresses to
> > +userspace from a 48-bit range (sv48). This default behavior is achieved by
> > +passing 0 into the hint address parameter of mmap. On CPUs with an address space
> > +smaller than sv48, the CPU maximum supported address space will be the default.
> > +
> > +Software can "opt-in" to receiving VAs from another VA space by providing
> > +a hint address to mmap. A call to mmap is guaranteed to return an address
> > +that will not override the unset left-aligned bits in the hint address,
> > +unless there is no space left in the address space. If there is no space
> > +available in the requested address space, an address in the next smallest
> > +available address space will be returned.
> > +
> > +For example, in order to obtain 48-bit VA space, a hint address greater than
> > +:code:`1 << 38` must be provided.
> 
> Is this correct? Shouldn't the hint be strictly greater than the
> address space it targets? In patch 1, you state that "A hint address
> passed to mmap will cause the largest address space that fits entirely
> into the hint to be used", it seems contradictory to me.
> 
That is a mistake, it should have a hint address greater than 1 << 47. I
will fix up the wording here.
> > Note that this is 38 due to sv39 userspace
> > +ending at :code:`1 << 38` and the addresses beyond this are reserved for the
> > +kernel. Similarly, to obtain 57-bit VA space addresses, a hint address greater
> > +than or equal to :code:`1 << 47` must be provided.
> > --
> > 2.41.0
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 1/4] RISC-V: mm: Restrict address space for sv39,sv48,sv57
  2023-07-20  8:13   ` Alexandre Ghiti
@ 2023-07-26 14:02     ` Charlie Jenkins
  0 siblings, 0 replies; 11+ messages in thread
From: Charlie Jenkins @ 2023-07-26 14:02 UTC (permalink / raw)
  To: Alexandre Ghiti
  Cc: linux-riscv, linux-kernel, conor, paul.walmsley, palmer, aou,
	anup, konstantin, linux-doc, linux-kselftest, linux-mm, mick,
	jrtc27, rdunlap

On Thu, Jul 20, 2023 at 10:13:54AM +0200, Alexandre Ghiti wrote:
> On Fri, Jul 14, 2023 at 6:55 PM Charlie Jenkins <charlie@rivosinc.com> wrote:
> >
> > Make sv48 the default address space for mmap as some applications
> > currently depend on this assumption. A hint address passed to mmap will
> > cause the largest address space that fits entirely into the hint to be
> > used. If the hint is less than or equal to 1<<38, an sv39 address will
> > be used. An exception is that if the hint address is 0, then a sv48
> > address will be used. After an address space is completely full, the next
> > smallest address space will be used.
> >
> > Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
> > ---
> >  arch/riscv/include/asm/elf.h       |  2 +-
> >  arch/riscv/include/asm/pgtable.h   | 12 +++++++-
> >  arch/riscv/include/asm/processor.h | 46 +++++++++++++++++++++++++-----
> >  3 files changed, 51 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h
> > index c24280774caf..5d3368d5585c 100644
> > --- a/arch/riscv/include/asm/elf.h
> > +++ b/arch/riscv/include/asm/elf.h
> > @@ -49,7 +49,7 @@ extern bool compat_elf_check_arch(Elf32_Ehdr *hdr);
> >   * the loader.  We need to make sure that it is out of the way of the program
> >   * that it will "exec", and that there is sufficient room for the brk.
> >   */
> > -#define ELF_ET_DYN_BASE                ((TASK_SIZE / 3) * 2)
> > +#define ELF_ET_DYN_BASE                ((DEFAULT_MAP_WINDOW / 3) * 2)
> >
> >  #ifdef CONFIG_64BIT
> >  #ifdef CONFIG_COMPAT
> > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > index 75970ee2bda2..e13f5872bfe9 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -63,12 +63,22 @@
> >   * position vmemmap directly below the VMALLOC region.
> >   */
> >  #ifdef CONFIG_64BIT
> > +#define VA_BITS_SV39 39
> > +#define VA_BITS_SV48 48
> > +#define VA_BITS_SV57 57
> > +
> > +#define VA_USER_SV39 (UL(1) << (VA_BITS_SV39 - 1))
> > +#define VA_USER_SV48 (UL(1) << (VA_BITS_SV48 - 1))
> > +#define VA_USER_SV57 (UL(1) << (VA_BITS_SV57 - 1))
> > +
> >  #define VA_BITS                (pgtable_l5_enabled ? \
> > -                               57 : (pgtable_l4_enabled ? 48 : 39))
> > +                               VA_BITS_SV57 : (pgtable_l4_enabled ? VA_BITS_SV48 : VA_BITS_SV39))
> >  #else
> >  #define VA_BITS                32
> >  #endif
> >
> > +#define MMAP_VA_BITS ((VA_BITS >= VA_BITS_SV48) ? VA_BITS_SV48 : VA_BITS)
> > +
> >  #define VMEMMAP_SHIFT \
> >         (VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
> >  #define VMEMMAP_SIZE   BIT(VMEMMAP_SHIFT)
> > diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> > index c950a8d9edef..14a5396eed3d 100644
> > --- a/arch/riscv/include/asm/processor.h
> > +++ b/arch/riscv/include/asm/processor.h
> > @@ -13,20 +13,52 @@
> >
> >  #include <asm/ptrace.h>
> >
> > -/*
> > - * This decides where the kernel will search for a free chunk of vm
> > - * space during mmap's.
> > - */
> > -#define TASK_UNMAPPED_BASE     PAGE_ALIGN(TASK_SIZE / 3)
> > -
> > -#define STACK_TOP              TASK_SIZE
> >  #ifdef CONFIG_64BIT
> > +#define DEFAULT_MAP_WINDOW     (UL(1) << (MMAP_VA_BITS - 1))
> >  #define STACK_TOP_MAX          TASK_SIZE_64
> > +
> > +#define arch_get_mmap_end(addr, len, flags)    \
> > +({     \
> > +       unsigned long mmap_end; \
> > +       if ((addr) >= VA_USER_SV57)     \
> > +               mmap_end = STACK_TOP_MAX;       \
> > +       else if ((((addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48))       \
> > +               mmap_end = VA_USER_SV48;        \
> > +       else if ((addr) == 0)   \
> > +               mmap_end = DEFAULT_MAP_WINDOW;  \
> > +       else    \
> > +               mmap_end = VA_USER_SV39;        \
> > +       mmap_end;       \
> > +})
> 
> What about the following instead:
> 
> #define arch_get_mmap_end(addr, len, flags)    \
> ({     \
>        unsigned long mmap_end; \
>        if ((addr) >= VA_USER_SV57) \
>           mmap_end = STACK_TOP_MAX; \ // Maybe a comment here that
> says it returns the max user address of the current mode, not obvious
> at first sight.
>        else \
>           mmap_end = DEFAULT_MAP_WINDOW; \
>        mmap_end; \
> })
> 
> The only corner case is when sv57 is active, then only a hint greater
> than VA_USER_SV57 can return a sv57 user address. Otherwise, we just
> need to return the default mmap end right?
>
> > +
> > +#define arch_get_mmap_base(addr, base) \
> > +({     \
> > +       unsigned long mmap_base;        \
> > +       if (((addr) >= VA_USER_SV57) && (VA_BITS >= VA_BITS_SV57))      \
> > +               mmap_base = (base) + (VA_USER_SV57 - DEFAULT_MAP_WINDOW);       \
> > +       else if ((((addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48))       \
> > +               mmap_base = (base) + (VA_USER_SV48 - DEFAULT_MAP_WINDOW);       \
> > +       else if ((addr) == 0)   \
> > +               mmap_base = (base);     \
> > +       else    \
> > +               mmap_base = (base) + (VA_USER_SV39 - DEFAULT_MAP_WINDOW);       \
> > +       mmap_base;      \
> > +})
> > +
> 
> From arch_pick_mmap_layout()
> (https://elixir.bootlin.com/linux/latest/source/mm/util.c#L433), the
> "base" argument is:
> 
> - either STACK_TOP in top-down (more or less some random offset)
> - or TASK_UNMAPPED_BASE in bottom-up (more or less some random offset)
> 
> When bottom-up is the current mode, we should not change the base, so
> adding (VA_USER_SV57 - DEFAULT_MAP_WINDOW) in the first case is not
> right for me. When sv48 or sv57 are the active mode,
> DEFAULT_MAP_WINDOW is equal to VA_USER_SV48 right? So (VA_USER_SV48 -
> DEFAULT_MAP_WINDOW) is 0, so not useful. And for the last case, when
> the user asks for a sv39 address whereas the active mode is sv48 or
> sv57, then  (VA_USER_SV39 - DEFAULT_MAP_WINDOW) is negative and the
> base is smaller which is not correct.
> 
In the bottom-up case mm->mmap_base is directly used instead of
arch_get_mmap_base so base will not be modified in that case. The math
here is confusing so I can refactor it. I like your suggestion to
extract out the rnd value of base with (base) - DEFAULT_MAP_WINDOW since
base is defined as PAGE_ALIGN(STACK_TOP - gap - rnd) and STACK_TOP is
DEFAULT_MAP_WINDOW. This (-gap-rnd) value can then directly be used
instead of redoing the math in each if.
> In the bottom-up case, we should preserve the base and I think that
> again, only sv57 is the corner case to deal with.
> 
>
For both this comment and the one above, I think we should allow the
user to default back to sv38. I talked to Palmer and he would prefer
allowing selection of sv38. Since there is no overhead, there is no
need to prevent the user from doing so.
> >  #else
> > +#define DEFAULT_MAP_WINDOW     TASK_SIZE
> >  #define STACK_TOP_MAX          TASK_SIZE
> >  #endif
> >  #define STACK_ALIGN            16
> >
> > +#define STACK_TOP              DEFAULT_MAP_WINDOW
> > +
> > +/*
> > + * This decides where the kernel will search for a free chunk of vm
> > + * space during mmap's.
> > + */
> > +#define TASK_UNMAPPED_BASE     PAGE_ALIGN(DEFAULT_MAP_WINDOW / 3)
> > +
> >  #ifndef __ASSEMBLY__
> >
> >  struct task_struct;
> > --
> > 2.41.0
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-07-26 14:02 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-14 16:54 [PATCH v6 0/4] RISC-V: mm: Make SV48 the default address space Charlie Jenkins
2023-07-14 16:54 ` [PATCH v6 1/4] RISC-V: mm: Restrict address space for sv39,sv48,sv57 Charlie Jenkins
2023-07-20  8:13   ` Alexandre Ghiti
2023-07-26 14:02     ` Charlie Jenkins
2023-07-14 16:54 ` [PATCH v6 2/4] RISC-V: mm: Add tests for RISC-V mm Charlie Jenkins
2023-07-20  8:16   ` Alexandre Ghiti
2023-07-14 16:54 ` [PATCH v6 3/4] RISC-V: mm: Update pgtable comment documentation Charlie Jenkins
2023-07-20  7:00   ` Alexandre Ghiti
2023-07-14 16:54 ` [PATCH v6 4/4] RISC-V: mm: Document mmap changes Charlie Jenkins
2023-07-20  6:59   ` Alexandre Ghiti
2023-07-26 13:52     ` Charlie Jenkins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).