* [PATCH v9 0/3] riscv, mm: detect svnapot cpu support at runtime
@ 2022-12-04 14:11 panqinglin2020
  2022-12-04 14:11 ` [PATCH v9 1/3] riscv: mm: modify pte format for Svnapot panqinglin2020
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: panqinglin2020 @ 2022-12-04 14:11 UTC (permalink / raw)
  To: palmer, linux-riscv
  Cc: jeff, xuyinan, conor, ajones, alex, jszhang, Qinglin Pan

From: Qinglin Pan <panqinglin2020@iscas.ac.cn>

Svnapot is a RISC-V extension for marking a naturally aligned group of
contiguous 4K pages as a single larger page. This patch set uses Svnapot
in hugetlbfs and huge vmap.

This patchset adds a Kconfig item for Svnapot under
"Platform type"->"SVNAPOT extension support". It defaults to on; set it
to off if you do not want the kernel to detect Svnapot hardware support
and leverage it.

Tested on:
  - qemu rv64 with "Svnapot support" off and svnapot=true.
  - qemu rv64 with "Svnapot support" on and svnapot=true.
  - qemu rv64 with "Svnapot support" off and svnapot=false.
  - qemu rv64 with "Svnapot support" on and svnapot=false.


Changes in v2:
  - detect Svnapot hardware support at boot time.
Changes in v3:
  - do linear mapping again if has_svnapot
Changes in v4:
  - fix some errors/warns reported by checkpatch.pl, thanks @Conor
Changes in v5:
  - modify code according to @Conor and @Heiko
Changes in v6:
  - use a static key instead of alternative errata
Changes in v7:
  - add napot_cont_order for possible more napot order in the future
  - move linear mapping related code out of this patchset into a separate patch
  - refactor hugetlb code for svnapot according to suggestions from @Andrew
    and @Conor, thanks
  - use tools/testing/selftests/vm/map_hugetlb as hugetlb testing program
  - support svnapot in huge vmap on newer for-next branch
Changes in v8:
  - fix compilation errors in rv32_defconfig
  - insert some lines of whitespace according to @Conor's suggestion
Changes in v9:
  - use alternative to avoid using static branches inside heavily used
    inline functions
  - change napot_cont_mask definition
  - post test_vmalloc modification about testing vmalloc_huge


Qinglin Pan (3):
  riscv: mm: modify pte format for Svnapot
  riscv: mm: support Svnapot in hugetlb page
  riscv: mm: support Svnapot in huge vmap

 arch/riscv/Kconfig                   |  16 +-
 arch/riscv/include/asm/errata_list.h |  15 +-
 arch/riscv/include/asm/hugetlb.h     |  34 ++-
 arch/riscv/include/asm/hwcap.h       |   2 +-
 arch/riscv/include/asm/page.h        |   5 -
 arch/riscv/include/asm/pgtable-64.h  |  34 +++
 arch/riscv/include/asm/pgtable.h     |  43 +++-
 arch/riscv/include/asm/vmalloc.h     |  61 +++++-
 arch/riscv/kernel/cpu.c              |   1 +
 arch/riscv/kernel/cpufeature.c       |  15 ++
 arch/riscv/mm/hugetlbpage.c          | 297 +++++++++++++++++++++++++++
 11 files changed, 510 insertions(+), 13 deletions(-)

-- 
2.37.4


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv


* [PATCH v9 1/3] riscv: mm: modify pte format for Svnapot
  2022-12-04 14:11 [PATCH v9 0/3] riscv, mm: detect svnapot cpu support at runtime panqinglin2020
@ 2022-12-04 14:11 ` panqinglin2020
  2022-12-06 19:56   ` Conor Dooley
                     ` (2 more replies)
  2022-12-04 14:11 ` [PATCH v9 2/3] riscv: mm: support Svnapot in hugetlb page panqinglin2020
  2022-12-04 14:11 ` [PATCH v9 3/3] riscv: mm: support Svnapot in huge vmap panqinglin2020
  2 siblings, 3 replies; 22+ messages in thread
From: panqinglin2020 @ 2022-12-04 14:11 UTC (permalink / raw)
  To: palmer, linux-riscv
  Cc: jeff, xuyinan, conor, ajones, alex, jszhang, Qinglin Pan

From: Qinglin Pan <panqinglin2020@iscas.ac.cn>

Add an alternative to enable/disable Svnapot support. It is enabled when
"svnapot" is present in the "riscv,isa" field of the fdt and the SVNAPOT
Kconfig option is set, and it determines the return value of
has_svnapot(). All code that depends on Svnapot must first check that
has_svnapot() returns true.

Modify the PTE definition for Svnapot, and create some functions in
pgtable.h to mark a PTE as napot and to check whether a PTE is a Svnapot
PTE. For now, the spec supports only the 64KB napot size, so some macros
have only a 64KB version.

Signed-off-by: Qinglin Pan <panqinglin2020@iscas.ac.cn>

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index ef8d66de5f38..1d8477c0af7c 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -395,6 +395,20 @@ config RISCV_ISA_C
 
 	  If you don't know what to do here, say Y.
 
+config RISCV_ISA_SVNAPOT
+	bool "SVNAPOT extension support"
+	depends on 64BIT && MMU
+	select RISCV_ALTERNATIVE
+	default y
+	help
+	  Allow kernel to detect SVNAPOT ISA-extension dynamically in boot time
+	  and enable its usage.
+
+	  SVNAPOT extension helps to mark contiguous PTEs as a range
+	  of contiguous virtual-to-physical translations, with a naturally
+	  aligned power-of-2 (NAPOT) granularity larger than the base 4KB page
+	  size.
+
 config RISCV_ISA_SVPBMT
 	bool "SVPBMT extension support"
 	depends on 64BIT && MMU
diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
index 4180312d2a70..beadb1126ed9 100644
--- a/arch/riscv/include/asm/errata_list.h
+++ b/arch/riscv/include/asm/errata_list.h
@@ -22,9 +22,10 @@
 #define	ERRATA_THEAD_NUMBER 3
 #endif
 
-#define	CPUFEATURE_SVPBMT 0
-#define	CPUFEATURE_ZICBOM 1
-#define	CPUFEATURE_NUMBER 2
+#define	CPUFEATURE_SVPBMT	0
+#define	CPUFEATURE_ZICBOM	1
+#define	CPUFEATURE_SVNAPOT	2
+#define	CPUFEATURE_NUMBER	3
 
 #ifdef __ASSEMBLY__
 
@@ -156,6 +157,14 @@ asm volatile(ALTERNATIVE(						\
 	: "=r" (__ovl) :						\
 	: "memory")
 
+#define ALT_SVNAPOT(_val)						\
+asm(ALTERNATIVE(							\
+	"li %0, 0",							\
+	"li %0, 1",							\
+		0, CPUFEATURE_SVNAPOT, CONFIG_RISCV_ISA_SVNAPOT)	\
+	: "=r" (_val) :							\
+	: "memory")
+
 #endif /* __ASSEMBLY__ */
 
 #endif
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index b22525290073..4cbc1f45ab26 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -54,6 +54,7 @@ extern unsigned long elf_hwcap;
  */
 enum riscv_isa_ext_id {
 	RISCV_ISA_EXT_SSCOFPMF = RISCV_ISA_EXT_BASE,
+	RISCV_ISA_EXT_SVNAPOT,
 	RISCV_ISA_EXT_SVPBMT,
 	RISCV_ISA_EXT_ZICBOM,
 	RISCV_ISA_EXT_ZIHINTPAUSE,
@@ -87,7 +88,6 @@ static __always_inline int riscv_isa_ext2key(int num)
 {
 	switch (num) {
 	case RISCV_ISA_EXT_f:
-		return RISCV_ISA_EXT_KEY_FPU;
 	case RISCV_ISA_EXT_d:
 		return RISCV_ISA_EXT_KEY_FPU;
 	case RISCV_ISA_EXT_ZIHINTPAUSE:
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index ac70b0fd9a9a..349fad5e35de 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -16,11 +16,6 @@
 #define PAGE_SIZE	(_AC(1, UL) << PAGE_SHIFT)
 #define PAGE_MASK	(~(PAGE_SIZE - 1))
 
-#ifdef CONFIG_64BIT
-#define HUGE_MAX_HSTATE		2
-#else
-#define HUGE_MAX_HSTATE		1
-#endif
 #define HPAGE_SHIFT		PMD_SHIFT
 #define HPAGE_SIZE		(_AC(1, UL) << HPAGE_SHIFT)
 #define HPAGE_MASK              (~(HPAGE_SIZE - 1))
diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
index dc42375c2357..9611833907ec 100644
--- a/arch/riscv/include/asm/pgtable-64.h
+++ b/arch/riscv/include/asm/pgtable-64.h
@@ -74,6 +74,40 @@ typedef struct {
  */
 #define _PAGE_PFN_MASK  GENMASK(53, 10)
 
+/*
+ * [63] Svnapot definitions:
+ * 0 Svnapot disabled
+ * 1 Svnapot enabled
+ */
+#define _PAGE_NAPOT_SHIFT	63
+#define _PAGE_NAPOT		BIT(_PAGE_NAPOT_SHIFT)
+/*
+ * Only 64KB (order 4) napot ptes supported.
+ */
+#define NAPOT_CONT_ORDER_BASE 4
+enum napot_cont_order {
+	NAPOT_CONT64KB_ORDER = NAPOT_CONT_ORDER_BASE,
+	NAPOT_ORDER_MAX,
+};
+
+#define for_each_napot_order(order)						\
+	for (order = NAPOT_CONT_ORDER_BASE; order < NAPOT_ORDER_MAX; order++)
+#define for_each_napot_order_rev(order)						\
+	for (order = NAPOT_ORDER_MAX - 1;					\
+	     order >= NAPOT_CONT_ORDER_BASE; order--)
+#define napot_cont_order(val)	(__builtin_ctzl((val.pte >> _PAGE_PFN_SHIFT) << 1))
+
+#define napot_cont_shift(order)	((order) + PAGE_SHIFT)
+#define napot_cont_size(order)	BIT(napot_cont_shift(order))
+#define napot_cont_mask(order)	(~(napot_cont_size(order) - 1UL))
+#define napot_pte_num(order)	BIT(order)
+
+#ifdef CONFIG_RISCV_ISA_SVNAPOT
+#define HUGE_MAX_HSTATE		(2 + (NAPOT_ORDER_MAX - NAPOT_CONT_ORDER_BASE))
+#else
+#define HUGE_MAX_HSTATE		2
+#endif
+
 /*
  * [62:61] Svpbmt Memory Type definitions:
  *
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index c61ae83aadee..99957b1270f2 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -6,10 +6,12 @@
 #ifndef _ASM_RISCV_PGTABLE_H
 #define _ASM_RISCV_PGTABLE_H
 
+#include <linux/jump_label.h>
 #include <linux/mmzone.h>
 #include <linux/sizes.h>
 
 #include <asm/pgtable-bits.h>
+#include <asm/hwcap.h>
 
 #ifndef CONFIG_MMU
 #define KERNEL_LINK_ADDR	PAGE_OFFSET
@@ -264,10 +266,49 @@ static inline pte_t pud_pte(pud_t pud)
 	return __pte(pud_val(pud));
 }
 
+static __always_inline bool has_svnapot(void)
+{
+	unsigned int _val;
+
+	ALT_SVNAPOT(_val);
+
+	return _val;
+}
+
+#ifdef CONFIG_RISCV_ISA_SVNAPOT
+
+static inline unsigned long pte_napot(pte_t pte)
+{
+	return pte_val(pte) & _PAGE_NAPOT;
+}
+
+static inline pte_t pte_mknapot(pte_t pte, unsigned int order)
+{
+	int pos = order - 1 + _PAGE_PFN_SHIFT;
+	unsigned long napot_bit = BIT(pos);
+	unsigned long napot_mask = ~GENMASK(pos, _PAGE_PFN_SHIFT);
+
+	return __pte((pte_val(pte) & napot_mask) | napot_bit | _PAGE_NAPOT);
+}
+
+#else
+
+static inline unsigned long pte_napot(pte_t pte)
+{
+	return 0;
+}
+
+#endif /* CONFIG_RISCV_ISA_SVNAPOT */
+
 /* Yields the page frame number (PFN) of a page table entry */
 static inline unsigned long pte_pfn(pte_t pte)
 {
-	return __page_val_to_pfn(pte_val(pte));
+	unsigned long res  = __page_val_to_pfn(pte_val(pte));
+
+	if (has_svnapot() && pte_napot(pte))
+		res = res & (res - 1UL);
+
+	return res;
 }
 
 #define pte_page(x)     pfn_to_page(pte_pfn(x))
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index bf9dd6764bad..88495f5fcafd 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -165,6 +165,7 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
 	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
 	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
 	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
+	__RISCV_ISA_EXT_DATA(svnapot, RISCV_ISA_EXT_SVNAPOT),
 	__RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
 	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
 	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 694267d1fe81..eeed66c3d497 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -205,6 +205,7 @@ void __init riscv_fill_hwcap(void)
 				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
 				SET_ISA_EXT_MAP("sstc", RISCV_ISA_EXT_SSTC);
 				SET_ISA_EXT_MAP("svinval", RISCV_ISA_EXT_SVINVAL);
+				SET_ISA_EXT_MAP("svnapot", RISCV_ISA_EXT_SVNAPOT);
 			}
 #undef SET_ISA_EXT_MAP
 		}
@@ -252,6 +253,17 @@ void __init riscv_fill_hwcap(void)
 }
 
 #ifdef CONFIG_RISCV_ALTERNATIVE
+static bool __init_or_module cpufeature_probe_svnapot(unsigned int stage)
+{
+	if (!IS_ENABLED(CONFIG_RISCV_ISA_SVNAPOT))
+		return false;
+
+	if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
+		return false;
+
+	return riscv_isa_extension_available(NULL, SVNAPOT);
+}
+
 static bool __init_or_module cpufeature_probe_svpbmt(unsigned int stage)
 {
 	if (!IS_ENABLED(CONFIG_RISCV_ISA_SVPBMT))
@@ -289,6 +301,9 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
 {
 	u32 cpu_req_feature = 0;
 
+	if (cpufeature_probe_svnapot(stage))
+		cpu_req_feature |= BIT(CPUFEATURE_SVNAPOT);
+
 	if (cpufeature_probe_svpbmt(stage))
 		cpu_req_feature |= BIT(CPUFEATURE_SVPBMT);
 
-- 
2.37.4



* [PATCH v9 2/3] riscv: mm: support Svnapot in hugetlb page
  2022-12-04 14:11 [PATCH v9 0/3] riscv, mm: detect svnapot cpu support at runtime panqinglin2020
  2022-12-04 14:11 ` [PATCH v9 1/3] riscv: mm: modify pte format for Svnapot panqinglin2020
@ 2022-12-04 14:11 ` panqinglin2020
  2022-12-07 18:55   ` Conor Dooley
  2022-12-08 15:17   ` Andrew Jones
  2022-12-04 14:11 ` [PATCH v9 3/3] riscv: mm: support Svnapot in huge vmap panqinglin2020
  2 siblings, 2 replies; 22+ messages in thread
From: panqinglin2020 @ 2022-12-04 14:11 UTC (permalink / raw)
  To: palmer, linux-riscv
  Cc: jeff, xuyinan, conor, ajones, alex, jszhang, Qinglin Pan

From: Qinglin Pan <panqinglin2020@iscas.ac.cn>

Svnapot can be used to support 64KB hugetlb pages, so 64KB can become a
new size option for hugetlbfs. Add a basic hugetlb implementation and
support the 64KB size in it by using Svnapot.

To test, boot the kernel with a command line containing
"default_hugepagesz=64K hugepagesz=64K hugepages=20" and run a simple
test like this:

tools/testing/selftests/vm/map_hugetlb 1 16

It should pass.

Signed-off-by: Qinglin Pan <panqinglin2020@iscas.ac.cn>

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 1d8477c0af7c..be5c1edea70f 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -43,7 +43,7 @@ config RISCV
 	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
 	select ARCH_WANT_FRAME_POINTERS
-	select ARCH_WANT_GENERAL_HUGETLB
+	select ARCH_WANT_GENERAL_HUGETLB if !RISCV_ISA_SVNAPOT
 	select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
 	select ARCH_WANTS_THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select BINFMT_FLAT_NO_DATA_START_OFFSET if !MMU
diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
index ec19d6afc896..fe6f23006641 100644
--- a/arch/riscv/include/asm/hugetlb.h
+++ b/arch/riscv/include/asm/hugetlb.h
@@ -2,7 +2,6 @@
 #ifndef _ASM_RISCV_HUGETLB_H
 #define _ASM_RISCV_HUGETLB_H
 
-#include <asm-generic/hugetlb.h>
 #include <asm/page.h>
 
 static inline void arch_clear_hugepage_flags(struct page *page)
@@ -11,4 +10,37 @@ static inline void arch_clear_hugepage_flags(struct page *page)
 }
 #define arch_clear_hugepage_flags arch_clear_hugepage_flags
 
+#ifdef CONFIG_RISCV_ISA_SVNAPOT
+#define __HAVE_ARCH_HUGE_PTE_CLEAR
+void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
+		    pte_t *ptep, unsigned long sz);
+
+#define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
+void set_huge_pte_at(struct mm_struct *mm,
+		     unsigned long addr, pte_t *ptep, pte_t pte);
+
+#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
+pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
+			      unsigned long addr, pte_t *ptep);
+
+#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
+pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+			    unsigned long addr, pte_t *ptep);
+
+#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
+void huge_ptep_set_wrprotect(struct mm_struct *mm,
+			     unsigned long addr, pte_t *ptep);
+
+#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
+int huge_ptep_set_access_flags(struct vm_area_struct *vma,
+			       unsigned long addr, pte_t *ptep,
+			       pte_t pte, int dirty);
+
+pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags);
+#define arch_make_huge_pte arch_make_huge_pte
+
+#endif /*CONFIG_RISCV_ISA_SVNAPOT*/
+
+#include <asm-generic/hugetlb.h>
+
 #endif /* _ASM_RISCV_HUGETLB_H */
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index 932dadfdca54..49f92f8cd431 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -2,6 +2,301 @@
 #include <linux/hugetlb.h>
 #include <linux/err.h>
 
+#ifdef CONFIG_RISCV_ISA_SVNAPOT
+pte_t *huge_pte_alloc(struct mm_struct *mm,
+		      struct vm_area_struct *vma,
+		      unsigned long addr,
+		      unsigned long sz)
+{
+	pgd_t *pgd;
+	p4d_t *p4d;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte = NULL;
+	unsigned long order;
+
+	pgd = pgd_offset(mm, addr);
+	p4d = p4d_alloc(mm, pgd, addr);
+	if (!p4d)
+		return NULL;
+
+	pud = pud_alloc(mm, p4d, addr);
+	if (!pud)
+		return NULL;
+
+	if (sz == PUD_SIZE) {
+		pte = (pte_t *)pud;
+		goto out;
+	}
+
+	if (sz == PMD_SIZE) {
+		if (want_pmd_share(vma, addr) && pud_none(*pud))
+			pte = huge_pmd_share(mm, vma, addr, pud);
+		else
+			pte = (pte_t *)pmd_alloc(mm, pud, addr);
+		goto out;
+	}
+
+	pmd = pmd_alloc(mm, pud, addr);
+	if (!pmd)
+		return NULL;
+
+	for_each_napot_order(order) {
+		if (napot_cont_size(order) == sz) {
+			pte = pte_alloc_map(mm, pmd, (addr & napot_cont_mask(order)));
+			break;
+		}
+	}
+
+out:
+	WARN_ON_ONCE(pte && pte_present(*pte) && !pte_huge(*pte));
+	return pte;
+}
+
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr,
+		       unsigned long sz)
+{
+	pgd_t *pgd;
+	p4d_t *p4d;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte = NULL;
+	unsigned long order;
+
+	pgd = pgd_offset(mm, addr);
+	if (!pgd_present(*pgd))
+		return NULL;
+	p4d = p4d_offset(pgd, addr);
+	if (!p4d_present(*p4d))
+		return NULL;
+
+	pud = pud_offset(p4d, addr);
+	if (sz == PUD_SIZE)
+		/* must be pud huge, non-present or none */
+		return (pte_t *)pud;
+	if (!pud_present(*pud))
+		return NULL;
+
+	pmd = pmd_offset(pud, addr);
+	if (sz == PMD_SIZE)
+		/* must be pmd huge, non-present or none */
+		return (pte_t *)pmd;
+	if (!pmd_present(*pmd))
+		return NULL;
+
+	for_each_napot_order(order) {
+		if (napot_cont_size(order) == sz) {
+			pte = pte_offset_kernel(pmd, (addr & napot_cont_mask(order)));
+			break;
+		}
+	}
+	return pte;
+}
+
+static pte_t get_clear_contig(struct mm_struct *mm,
+			      unsigned long addr,
+			      pte_t *ptep,
+			      unsigned long pte_num)
+{
+	pte_t orig_pte = ptep_get(ptep);
+	unsigned long i;
+
+	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++) {
+		pte_t pte = ptep_get_and_clear(mm, addr, ptep);
+
+		if (pte_dirty(pte))
+			orig_pte = pte_mkdirty(orig_pte);
+
+		if (pte_young(pte))
+			orig_pte = pte_mkyoung(orig_pte);
+	}
+	return orig_pte;
+}
+
+static pte_t get_clear_contig_flush(struct mm_struct *mm,
+				    unsigned long addr,
+				    pte_t *ptep,
+				    unsigned long pte_num)
+{
+	pte_t orig_pte = get_clear_contig(mm, addr, ptep, pte_num);
+	bool valid = !pte_none(orig_pte);
+	struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
+
+	if (valid)
+		flush_tlb_range(&vma, addr, addr + (PAGE_SIZE * pte_num));
+
+	return orig_pte;
+}
+
+pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
+{
+	unsigned long order;
+
+	for_each_napot_order(order) {
+		if (shift == napot_cont_shift(order)) {
+			entry = pte_mknapot(entry, order);
+			break;
+		}
+	}
+	if (order == NAPOT_ORDER_MAX)
+		entry = pte_mkhuge(entry);
+
+	return entry;
+}
+
+void set_huge_pte_at(struct mm_struct *mm,
+		     unsigned long addr,
+		     pte_t *ptep,
+		     pte_t pte)
+{
+	int i;
+	int pte_num;
+
+	if (!pte_napot(pte)) {
+		set_pte_at(mm, addr, ptep, pte);
+		return;
+	}
+
+	pte_num = napot_pte_num(napot_cont_order(pte));
+	for (i = 0; i < pte_num; i++, ptep++, addr += PAGE_SIZE)
+		set_pte_at(mm, addr, ptep, pte);
+}
+
+int huge_ptep_set_access_flags(struct vm_area_struct *vma,
+			       unsigned long addr,
+			       pte_t *ptep,
+			       pte_t pte,
+			       int dirty)
+{
+	pte_t orig_pte;
+	int i;
+	int pte_num;
+	struct mm_struct *mm = vma->vm_mm;
+
+	if (!pte_napot(pte))
+		return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
+
+	pte_num = napot_pte_num(napot_cont_order(pte));
+	ptep = huge_pte_offset(mm, addr,
+			       napot_cont_size(napot_cont_order(pte)));
+	orig_pte = get_clear_contig_flush(mm, addr, ptep, pte_num);
+
+	if (pte_dirty(orig_pte))
+		pte = pte_mkdirty(pte);
+
+	if (pte_young(orig_pte))
+		pte = pte_mkyoung(pte);
+
+	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++)
+		set_pte_at(mm, addr, ptep, pte);
+
+	return true;
+}
+
+pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
+			      unsigned long addr,
+			      pte_t *ptep)
+{
+	int pte_num;
+	pte_t orig_pte = ptep_get(ptep);
+
+	if (!pte_napot(orig_pte))
+		return ptep_get_and_clear(mm, addr, ptep);
+
+	pte_num = napot_pte_num(napot_cont_order(orig_pte));
+
+	return get_clear_contig(mm, addr, ptep, pte_num);
+}
+
+void huge_ptep_set_wrprotect(struct mm_struct *mm,
+			     unsigned long addr,
+			     pte_t *ptep)
+{
+	int i;
+	int pte_num;
+	pte_t pte = ptep_get(ptep);
+
+	if (!pte_napot(pte)) {
+		ptep_set_wrprotect(mm, addr, ptep);
+		return;
+	}
+
+	pte_num = napot_pte_num(napot_cont_order(pte));
+	ptep = huge_pte_offset(mm, addr, napot_cont_size(napot_cont_order(pte)));
+
+	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++)
+		ptep_set_wrprotect(mm, addr, ptep);
+}
+
+pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+			    unsigned long addr,
+			    pte_t *ptep)
+{
+	int pte_num;
+	pte_t pte = ptep_get(ptep);
+
+	if (!pte_napot(pte))
+		return ptep_clear_flush(vma, addr, ptep);
+
+	pte_num = napot_pte_num(napot_cont_order(pte));
+
+	return get_clear_contig_flush(vma->vm_mm, addr, ptep, pte_num);
+}
+
+void huge_pte_clear(struct mm_struct *mm,
+		    unsigned long addr,
+		    pte_t *ptep,
+		    unsigned long sz)
+{
+	int i, pte_num;
+	pte_t pte = READ_ONCE(*ptep);
+
+	if (!pte_napot(pte)) {
+		pte_clear(mm, addr, ptep);
+		return;
+	}
+
+	pte_num = napot_pte_num(napot_cont_order(pte));
+	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++)
+		pte_clear(mm, addr, ptep);
+}
+
+bool __init is_napot_size(unsigned long size)
+{
+	unsigned long order;
+
+	if (!has_svnapot())
+		return false;
+
+	for_each_napot_order(order) {
+		if (size == napot_cont_size(order))
+			return true;
+	}
+	return false;
+}
+
+static __init int napot_hugetlbpages_init(void)
+{
+	if (has_svnapot()) {
+		unsigned long order;
+
+		for_each_napot_order(order)
+			hugetlb_add_hstate(order);
+	}
+	return 0;
+}
+arch_initcall(napot_hugetlbpages_init);
+
+#else
+
+bool __init is_napot_size(unsigned long size)
+{
+	return false;
+}
+
+#endif /*CONFIG_RISCV_ISA_SVNAPOT*/
+
 int pud_huge(pud_t pud)
 {
 	return pud_leaf(pud);
@@ -18,6 +313,8 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
 		return true;
 	else if (IS_ENABLED(CONFIG_64BIT) && size == PUD_SIZE)
 		return true;
+	else if (is_napot_size(size))
+		return true;
 	else
 		return false;
 }
-- 
2.37.4



* [PATCH v9 3/3] riscv: mm: support Svnapot in huge vmap
  2022-12-04 14:11 [PATCH v9 0/3] riscv, mm: detect svnapot cpu support at runtime panqinglin2020
  2022-12-04 14:11 ` [PATCH v9 1/3] riscv: mm: modify pte format for Svnapot panqinglin2020
  2022-12-04 14:11 ` [PATCH v9 2/3] riscv: mm: support Svnapot in hugetlb page panqinglin2020
@ 2022-12-04 14:11 ` panqinglin2020
  2022-12-07 18:59   ` Conor Dooley
  2022-12-08 15:22   ` Andrew Jones
  2 siblings, 2 replies; 22+ messages in thread
From: panqinglin2020 @ 2022-12-04 14:11 UTC (permalink / raw)
  To: palmer, linux-riscv
  Cc: jeff, xuyinan, conor, ajones, alex, jszhang, Qinglin Pan

From: Qinglin Pan <panqinglin2020@iscas.ac.cn>

As HAVE_ARCH_HUGE_VMAP and HAVE_ARCH_HUGE_VMALLOC are supported, we can
implement arch_vmap_pte_range_map_size() and arch_vmap_pte_supported_shift()
for Svnapot to support huge vmap at napot sizes.

It can be tested via the huge vmap used in PCI drivers. Huge vmalloc with
Svnapot can be tested by changing the test_vmalloc module as in [1] and
probing that module to run fix_size_alloc_test with use_huge set to true.

[1] https://github.com/RV-VM/linux-vm-support/commit/33f4ee399c36d355c412ebe5334ca46fd727f8f5

Signed-off-by: Qinglin Pan <panqinglin2020@iscas.ac.cn>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>

diff --git a/arch/riscv/include/asm/vmalloc.h b/arch/riscv/include/asm/vmalloc.h
index 48da5371f1e9..6f7447d563ab 100644
--- a/arch/riscv/include/asm/vmalloc.h
+++ b/arch/riscv/include/asm/vmalloc.h
@@ -17,6 +17,65 @@ static inline bool arch_vmap_pmd_supported(pgprot_t prot)
 	return true;
 }
 
-#endif
+#ifdef CONFIG_RISCV_ISA_SVNAPOT
+#include <linux/pgtable.h>
 
+#define arch_vmap_pte_range_map_size arch_vmap_pte_range_map_size
+static inline unsigned long arch_vmap_pte_range_map_size(unsigned long addr, unsigned long end,
+							 u64 pfn, unsigned int max_page_shift)
+{
+	unsigned long size, order;
+	unsigned long map_size = PAGE_SIZE;
+
+	if (!has_svnapot())
+		return map_size;
+
+	for_each_napot_order_rev(order) {
+		if (napot_cont_shift(order) > max_page_shift)
+			continue;
+
+		size = napot_cont_size(order);
+		if (end - addr < size)
+			continue;
+
+		if (!IS_ALIGNED(addr, size))
+			continue;
+
+		if (!IS_ALIGNED(PFN_PHYS(pfn), size))
+			continue;
+
+		map_size = size;
+		break;
+	}
+
+	return map_size;
+}
+
+#define arch_vmap_pte_supported_shift arch_vmap_pte_supported_shift
+static inline int arch_vmap_pte_supported_shift(unsigned long size)
+{
+	int shift = PAGE_SHIFT;
+	unsigned long order;
+
+	if (!has_svnapot())
+		return shift;
+
+	WARN_ON_ONCE(size >= PMD_SIZE);
+
+	for_each_napot_order_rev(order) {
+		if (napot_cont_size(order) > size)
+			continue;
+
+		if (!IS_ALIGNED(size, napot_cont_size(order)))
+			continue;
+
+		shift = napot_cont_shift(order);
+		break;
+	}
+
+	return shift;
+}
+
+#endif /* CONFIG_RISCV_ISA_SVNAPOT */
+#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
 #endif /* _ASM_RISCV_VMALLOC_H */
-- 
2.37.4



* Re: [PATCH v9 1/3] riscv: mm: modify pte format for Svnapot
  2022-12-04 14:11 ` [PATCH v9 1/3] riscv: mm: modify pte format for Svnapot panqinglin2020
@ 2022-12-06 19:56   ` Conor Dooley
  2022-12-07 18:46   ` Conor Dooley
  2022-12-08 14:55   ` Andrew Jones
  2 siblings, 0 replies; 22+ messages in thread
From: Conor Dooley @ 2022-12-06 19:56 UTC (permalink / raw)
  To: panqinglin2020, palmer
  Cc: palmer, linux-riscv, jeff, xuyinan, ajones, alex, jszhang



Palmer,

On Sun, Dec 04, 2022 at 10:11:35PM +0800, panqinglin2020@iscas.ac.cn wrote:
> From: Qinglin Pan <panqinglin2020@iscas.ac.cn>
> +config RISCV_ISA_SVNAPOT
> +	bool "SVNAPOT extension support"
> +	depends on 64BIT && MMU
> +	select RISCV_ALTERNATIVE
> +	default y
> +	help
> +	  Allow kernel to detect SVNAPOT ISA-extension dynamically in boot time
> +	  and enable its usage.
> +
> +	  SVNAPOT extension helps to mark contiguous PTEs as a range
> +	  of contiguous virtual-to-physical translations, with a naturally
> +	  aligned power-of-2 (NAPOT) granularity larger than the base 4KB page
> +	  size.

RMK's perl script that I yoinked complains about this patch,
specifically that it expects the order:
depends
default
select

The patch I sent out for the select order the other day also switches
everything to this ordering:
https://lore.kernel.org/all/20221202131034.66674-1-conor@kernel.org/

Is that sane? Certainly for the really long lists that order works well,
but idk about the short ones. Unless you feel strongly otherwise, maybe
we should enforce that order for uniformity?
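For the entry being added here, that ordering would look like:

```kconfig
config RISCV_ISA_SVNAPOT
	bool "SVNAPOT extension support"
	depends on 64BIT && MMU
	default y
	select RISCV_ALTERNATIVE
	help
	  ...
```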

Thanks,
Conor.



* Re: [PATCH v9 1/3] riscv: mm: modify pte format for Svnapot
  2022-12-04 14:11 ` [PATCH v9 1/3] riscv: mm: modify pte format for Svnapot panqinglin2020
  2022-12-06 19:56   ` Conor Dooley
@ 2022-12-07 18:46   ` Conor Dooley
  2022-12-08  4:51     ` Qinglin Pan
  2022-12-08 14:55   ` Andrew Jones
  2 siblings, 1 reply; 22+ messages in thread
From: Conor Dooley @ 2022-12-07 18:46 UTC (permalink / raw)
  To: panqinglin2020; +Cc: palmer, linux-riscv, jeff, xuyinan, ajones, alex, jszhang



Hey!

Couple small remarks and questions for you.

On Sun, Dec 04, 2022 at 10:11:35PM +0800, panqinglin2020@iscas.ac.cn wrote:
> From: Qinglin Pan <panqinglin2020@iscas.ac.cn>
> 
> Add an alternative to enable/disable Svnapot support. It is enabled when
> "svnapot" is present in the "riscv,isa" field of the fdt and the SVNAPOT
> Kconfig option is set, and it determines the return value of
> has_svnapot(). All code that depends on Svnapot must first check that
> has_svnapot() returns true.
> 
> Modify the PTE definition for Svnapot, and create some functions in
> pgtable.h to mark a PTE as napot and to check whether a PTE is a Svnapot
> PTE. For now, the spec supports only the 64KB napot size, so some macros
> have only a 64KB version.
> 
> Signed-off-by: Qinglin Pan <panqinglin2020@iscas.ac.cn>
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index ef8d66de5f38..1d8477c0af7c 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -395,6 +395,20 @@ config RISCV_ISA_C
>  
>  	  If you don't know what to do here, say Y.
>  
> +config RISCV_ISA_SVNAPOT
> +	bool "SVNAPOT extension support"
> +	depends on 64BIT && MMU
> +	select RISCV_ALTERNATIVE
> +	default y
> +	help
> +	  Allow kernel to detect SVNAPOT ISA-extension dynamically in boot time
> +	  and enable its usage.
> +
> +	  SVNAPOT extension helps to mark contiguous PTEs as a range
> +	  of contiguous virtual-to-physical translations, with a naturally
> +	  aligned power-of-2 (NAPOT) granularity larger than the base 4KB page
> +	  size.
> +
>  config RISCV_ISA_SVPBMT
>  	bool "SVPBMT extension support"
>  	depends on 64BIT && MMU
> diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> index 4180312d2a70..beadb1126ed9 100644
> --- a/arch/riscv/include/asm/errata_list.h
> +++ b/arch/riscv/include/asm/errata_list.h
> @@ -22,9 +22,10 @@
>  #define	ERRATA_THEAD_NUMBER 3
>  #endif
>  
> -#define	CPUFEATURE_SVPBMT 0
> -#define	CPUFEATURE_ZICBOM 1
> -#define	CPUFEATURE_NUMBER 2
> +#define	CPUFEATURE_SVPBMT	0
> +#define	CPUFEATURE_ZICBOM	1
> +#define	CPUFEATURE_SVNAPOT	2
> +#define	CPUFEATURE_NUMBER	3
>  
>  #ifdef __ASSEMBLY__
>  
> @@ -156,6 +157,14 @@ asm volatile(ALTERNATIVE(						\
>  	: "=r" (__ovl) :						\
>  	: "memory")
>  
> +#define ALT_SVNAPOT(_val)						\
> +asm(ALTERNATIVE(							\
> +	"li %0, 0",							\
> +	"li %0, 1",							\
> +		0, CPUFEATURE_SVNAPOT, CONFIG_RISCV_ISA_SVNAPOT)	\
> +	: "=r" (_val) :							\
> +	: "memory")
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif
> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> index b22525290073..4cbc1f45ab26 100644
> --- a/arch/riscv/include/asm/hwcap.h
> +++ b/arch/riscv/include/asm/hwcap.h
> @@ -54,6 +54,7 @@ extern unsigned long elf_hwcap;
>   */
>  enum riscv_isa_ext_id {
>  	RISCV_ISA_EXT_SSCOFPMF = RISCV_ISA_EXT_BASE,
> +	RISCV_ISA_EXT_SVNAPOT,
>  	RISCV_ISA_EXT_SVPBMT,
>  	RISCV_ISA_EXT_ZICBOM,
>  	RISCV_ISA_EXT_ZIHINTPAUSE,
> @@ -87,7 +88,6 @@ static __always_inline int riscv_isa_ext2key(int num)
>  {
>  	switch (num) {
>  	case RISCV_ISA_EXT_f:
> -		return RISCV_ISA_EXT_KEY_FPU;
>  	case RISCV_ISA_EXT_d:
>  		return RISCV_ISA_EXT_KEY_FPU;
>  	case RISCV_ISA_EXT_ZIHINTPAUSE:
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index ac70b0fd9a9a..349fad5e35de 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -16,11 +16,6 @@
>  #define PAGE_SIZE	(_AC(1, UL) << PAGE_SHIFT)
>  #define PAGE_MASK	(~(PAGE_SIZE - 1))
>  
> -#ifdef CONFIG_64BIT
> -#define HUGE_MAX_HSTATE		2
> -#else
> -#define HUGE_MAX_HSTATE		1
> -#endif
>  #define HPAGE_SHIFT		PMD_SHIFT
>  #define HPAGE_SIZE		(_AC(1, UL) << HPAGE_SHIFT)
>  #define HPAGE_MASK              (~(HPAGE_SIZE - 1))
> diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
> index dc42375c2357..9611833907ec 100644
> --- a/arch/riscv/include/asm/pgtable-64.h
> +++ b/arch/riscv/include/asm/pgtable-64.h
> @@ -74,6 +74,40 @@ typedef struct {
>   */
>  #define _PAGE_PFN_MASK  GENMASK(53, 10)
>  
> +/*
> + * [63] Svnapot definitions:
> + * 0 Svnapot disabled
> + * 1 Svnapot enabled
> + */
> +#define _PAGE_NAPOT_SHIFT	63
> +#define _PAGE_NAPOT		BIT(_PAGE_NAPOT_SHIFT)
> +/*
> + * Only 64KB (order 4) napot ptes supported.
> + */
> +#define NAPOT_CONT_ORDER_BASE 4
> +enum napot_cont_order {
> +	NAPOT_CONT64KB_ORDER = NAPOT_CONT_ORDER_BASE,
> +	NAPOT_ORDER_MAX,
> +};
> +
> +#define for_each_napot_order(order)						\
> +	for (order = NAPOT_CONT_ORDER_BASE; order < NAPOT_ORDER_MAX; order++)
> +#define for_each_napot_order_rev(order)						\

Would it be terrible to do s/rev/reverse to make things more obvious?

> +	for (order = NAPOT_ORDER_MAX - 1;					\
> +	     order >= NAPOT_CONT_ORDER_BASE; order--)
> +#define napot_cont_order(val)	(__builtin_ctzl((val.pte >> _PAGE_PFN_SHIFT) << 1))
> +
> +#define napot_cont_shift(order)	((order) + PAGE_SHIFT)
> +#define napot_cont_size(order)	BIT(napot_cont_shift(order))
> +#define napot_cont_mask(order)	(~(napot_cont_size(order) - 1UL))
> +#define napot_pte_num(order)	BIT(order)
> +
> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
> +#define HUGE_MAX_HSTATE		(2 + (NAPOT_ORDER_MAX - NAPOT_CONT_ORDER_BASE))
> +#else
> +#define HUGE_MAX_HSTATE		2
> +#endif

This is coming from a place of *complete* ignorance:
If CONFIG_RISCV_ISA_SVNAPOT is enabled in the kernel but there is no
support for it on a platform, is it okay to change the value of
HUGE_MAX_HSTATE? Apologies if I've missed something obvious.

> +
>  /*
>   * [62:61] Svpbmt Memory Type definitions:
>   *
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index c61ae83aadee..99957b1270f2 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -6,10 +6,12 @@
>  #ifndef _ASM_RISCV_PGTABLE_H
>  #define _ASM_RISCV_PGTABLE_H
>  
> +#include <linux/jump_label.h>
>  #include <linux/mmzone.h>
>  #include <linux/sizes.h>
>  
>  #include <asm/pgtable-bits.h>
> +#include <asm/hwcap.h>
>  
>  #ifndef CONFIG_MMU
>  #define KERNEL_LINK_ADDR	PAGE_OFFSET
> @@ -264,10 +266,49 @@ static inline pte_t pud_pte(pud_t pud)
>  	return __pte(pud_val(pud));
>  }
>  
> +static __always_inline bool has_svnapot(void)
> +{
> +	unsigned int _val;
> +
> +	ALT_SVNAPOT(_val);
> +
> +	return _val;
> +}
> +
> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
> +
> +static inline unsigned long pte_napot(pte_t pte)
> +{
> +	return pte_val(pte) & _PAGE_NAPOT;
> +}
> +
> +static inline pte_t pte_mknapot(pte_t pte, unsigned int order)
> +{
> +	int pos = order - 1 + _PAGE_PFN_SHIFT;
> +	unsigned long napot_bit = BIT(pos);
> +	unsigned long napot_mask = ~GENMASK(pos, _PAGE_PFN_SHIFT);
> +
> +	return __pte((pte_val(pte) & napot_mask) | napot_bit | _PAGE_NAPOT);
> +}
> +
> +#else
> +
> +static inline unsigned long pte_napot(pte_t pte)
> +{
> +	return 0;
> +}
> +
> +#endif /* CONFIG_RISCV_ISA_SVNAPOT */
> +
>  /* Yields the page frame number (PFN) of a page table entry */
>  static inline unsigned long pte_pfn(pte_t pte)
>  {
> -	return __page_val_to_pfn(pte_val(pte));
> +	unsigned long res  = __page_val_to_pfn(pte_val(pte));

nit: extra space before the =

> +
> +	if (has_svnapot() && pte_napot(pte))
> +		res = res & (res - 1UL);
> +
> +	return res;
>  }
>  
>  #define pte_page(x)     pfn_to_page(pte_pfn(x))
> diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> index bf9dd6764bad..88495f5fcafd 100644
> --- a/arch/riscv/kernel/cpu.c
> +++ b/arch/riscv/kernel/cpu.c
> @@ -165,6 +165,7 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
>  	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
>  	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
>  	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
> +	__RISCV_ISA_EXT_DATA(svnapot, RISCV_ISA_EXT_SVNAPOT),
>  	__RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
>  	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
>  	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 694267d1fe81..eeed66c3d497 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -205,6 +205,7 @@ void __init riscv_fill_hwcap(void)
>  				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
>  				SET_ISA_EXT_MAP("sstc", RISCV_ISA_EXT_SSTC);
>  				SET_ISA_EXT_MAP("svinval", RISCV_ISA_EXT_SVINVAL);
> +				SET_ISA_EXT_MAP("svnapot", RISCV_ISA_EXT_SVNAPOT);
>  			}
>  #undef SET_ISA_EXT_MAP
>  		}
> @@ -252,6 +253,17 @@ void __init riscv_fill_hwcap(void)
>  }
>  
>  #ifdef CONFIG_RISCV_ALTERNATIVE
> +static bool __init_or_module cpufeature_probe_svnapot(unsigned int stage)
> +{
> +	if (!IS_ENABLED(CONFIG_RISCV_ISA_SVNAPOT))
> +		return false;
> +
> +	if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
> +		return false;
> +
> +	return riscv_isa_extension_available(NULL, SVNAPOT);
> +}
> +
>  static bool __init_or_module cpufeature_probe_svpbmt(unsigned int stage)
>  {
>  	if (!IS_ENABLED(CONFIG_RISCV_ISA_SVPBMT))
> @@ -289,6 +301,9 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
>  {
>  	u32 cpu_req_feature = 0;
>  
> +	if (cpufeature_probe_svnapot(stage))
> +		cpu_req_feature |= BIT(CPUFEATURE_SVNAPOT);
> +
>  	if (cpufeature_probe_svpbmt(stage))
>  		cpu_req_feature |= BIT(CPUFEATURE_SVPBMT);
>  

There's a bunch of stuff in this patch that may go away depending on
sequencing, just FYI. See [1] for more. I wouldn't rebase on top of that,
but just so that you're aware.

1 - https://patchwork.kernel.org/project/linux-riscv/cover/20221204174632.3677-1-jszhang@kernel.org/

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

[-- Attachment #2: Type: text/plain, Size: 161 bytes --]

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v9 2/3] riscv: mm: support Svnapot in hugetlb page
  2022-12-04 14:11 ` [PATCH v9 2/3] riscv: mm: support Svnapot in hugetlb page panqinglin2020
@ 2022-12-07 18:55   ` Conor Dooley
  2022-12-08  4:54     ` Qinglin Pan
  2022-12-08 15:17   ` Andrew Jones
  1 sibling, 1 reply; 22+ messages in thread
From: Conor Dooley @ 2022-12-07 18:55 UTC (permalink / raw)
  To: panqinglin2020; +Cc: palmer, linux-riscv, jeff, xuyinan, ajones, alex, jszhang


[-- Attachment #1.1: Type: text/plain, Size: 11424 bytes --]

On Sun, Dec 04, 2022 at 10:11:36PM +0800, panqinglin2020@iscas.ac.cn wrote:
> From: Qinglin Pan <panqinglin2020@iscas.ac.cn>
> 
> Svnapot can be used to support 64KB hugetlb pages, so it can become a new
> option when using hugetlbfs. Add a basic implementation of hugetlb pages,
> and support 64KB as a size in it by using Svnapot.
> 
> For testing, boot the kernel with a command line containing "default_hugepagesz=64K
> hugepagesz=64K hugepages=20" and run a simple test like this:
> 
> tools/testing/selftests/vm/map_hugetlb 1 16
> 
> And it should be passed.
> 
> Signed-off-by: Qinglin Pan <panqinglin2020@iscas.ac.cn>
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 1d8477c0af7c..be5c1edea70f 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -43,7 +43,7 @@ config RISCV
>  	select ARCH_USE_QUEUED_RWLOCKS
>  	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
>  	select ARCH_WANT_FRAME_POINTERS
> -	select ARCH_WANT_GENERAL_HUGETLB
> +	select ARCH_WANT_GENERAL_HUGETLB if !RISCV_ISA_SVNAPOT

I am expecting this to be a dumb question too, but I'm curious again
about what happens in a system that enables CONFIG_RISCV_ISA_SVNAPOT but
the platform it is running on does not support it...

>  	select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
>  	select ARCH_WANTS_THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
>  	select BINFMT_FLAT_NO_DATA_START_OFFSET if !MMU
> diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
> index ec19d6afc896..fe6f23006641 100644
> --- a/arch/riscv/include/asm/hugetlb.h
> +++ b/arch/riscv/include/asm/hugetlb.h
> @@ -2,7 +2,6 @@
>  #ifndef _ASM_RISCV_HUGETLB_H
>  #define _ASM_RISCV_HUGETLB_H
>  
> -#include <asm-generic/hugetlb.h>
>  #include <asm/page.h>
>  
>  static inline void arch_clear_hugepage_flags(struct page *page)
> @@ -11,4 +10,37 @@ static inline void arch_clear_hugepage_flags(struct page *page)
>  }
>  #define arch_clear_hugepage_flags arch_clear_hugepage_flags
>  
> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
> +#define __HAVE_ARCH_HUGE_PTE_CLEAR
> +void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
> +		    pte_t *ptep, unsigned long sz);
> +
> +#define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
> +void set_huge_pte_at(struct mm_struct *mm,
> +		     unsigned long addr, pte_t *ptep, pte_t pte);
> +
> +#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> +pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> +			      unsigned long addr, pte_t *ptep);
> +
> +#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
> +pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> +			    unsigned long addr, pte_t *ptep);
> +
> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
> +void huge_ptep_set_wrprotect(struct mm_struct *mm,
> +			     unsigned long addr, pte_t *ptep);
> +
> +#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
> +int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> +			       unsigned long addr, pte_t *ptep,
> +			       pte_t pte, int dirty);
> +
> +pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags);
> +#define arch_make_huge_pte arch_make_huge_pte
> +
> +#endif /*CONFIG_RISCV_ISA_SVNAPOT*/
> +
> +#include <asm-generic/hugetlb.h>

...is this sufficient to fall back to generic huge pages?

Hopefully that's just my ignorance on show!

Thanks,
Conor.

> +
>  #endif /* _ASM_RISCV_HUGETLB_H */
> diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
> index 932dadfdca54..49f92f8cd431 100644
> --- a/arch/riscv/mm/hugetlbpage.c
> +++ b/arch/riscv/mm/hugetlbpage.c
> @@ -2,6 +2,301 @@
>  #include <linux/hugetlb.h>
>  #include <linux/err.h>
>  
> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
> +pte_t *huge_pte_alloc(struct mm_struct *mm,
> +		      struct vm_area_struct *vma,
> +		      unsigned long addr,
> +		      unsigned long sz)
> +{
> +	pgd_t *pgd;
> +	p4d_t *p4d;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte = NULL;
> +	unsigned long order;
> +
> +	pgd = pgd_offset(mm, addr);
> +	p4d = p4d_alloc(mm, pgd, addr);
> +	if (!p4d)
> +		return NULL;
> +
> +	pud = pud_alloc(mm, p4d, addr);
> +	if (!pud)
> +		return NULL;
> +
> +	if (sz == PUD_SIZE) {
> +		pte = (pte_t *)pud;
> +		goto out;
> +	}
> +
> +	if (sz == PMD_SIZE) {
> +		if (want_pmd_share(vma, addr) && pud_none(*pud))
> +			pte = huge_pmd_share(mm, vma, addr, pud);
> +		else
> +			pte = (pte_t *)pmd_alloc(mm, pud, addr);
> +		goto out;
> +	}
> +
> +	pmd = pmd_alloc(mm, pud, addr);
> +	if (!pmd)
> +		return NULL;
> +
> +	for_each_napot_order(order) {
> +		if (napot_cont_size(order) == sz) {
> +			pte = pte_alloc_map(mm, pmd, (addr & napot_cont_mask(order)));
> +			break;
> +		}
> +	}
> +
> +out:
> +	WARN_ON_ONCE(pte && pte_present(*pte) && !pte_huge(*pte));
> +	return pte;
> +}
> +
> +pte_t *huge_pte_offset(struct mm_struct *mm,
> +		       unsigned long addr,
> +		       unsigned long sz)
> +{
> +	pgd_t *pgd;
> +	p4d_t *p4d;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte = NULL;
> +	unsigned long order;
> +
> +	pgd = pgd_offset(mm, addr);
> +	if (!pgd_present(*pgd))
> +		return NULL;
> +	p4d = p4d_offset(pgd, addr);
> +	if (!p4d_present(*p4d))
> +		return NULL;
> +
> +	pud = pud_offset(p4d, addr);
> +	if (sz == PUD_SIZE)
> +		/* must be pud huge, non-present or none */
> +		return (pte_t *)pud;
> +	if (!pud_present(*pud))
> +		return NULL;
> +
> +	pmd = pmd_offset(pud, addr);
> +	if (sz == PMD_SIZE)
> +		/* must be pmd huge, non-present or none */
> +		return (pte_t *)pmd;
> +	if (!pmd_present(*pmd))
> +		return NULL;
> +
> +	for_each_napot_order(order) {
> +		if (napot_cont_size(order) == sz) {
> +			pte = pte_offset_kernel(pmd, (addr & napot_cont_mask(order)));
> +			break;
> +		}
> +	}
> +	return pte;
> +}
> +
> +static pte_t get_clear_contig(struct mm_struct *mm,
> +			      unsigned long addr,
> +			      pte_t *ptep,
> +			      unsigned long pte_num)
> +{
> +	pte_t orig_pte = ptep_get(ptep);
> +	unsigned long i;
> +
> +	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++) {
> +		pte_t pte = ptep_get_and_clear(mm, addr, ptep);
> +
> +		if (pte_dirty(pte))
> +			orig_pte = pte_mkdirty(orig_pte);
> +
> +		if (pte_young(pte))
> +			orig_pte = pte_mkyoung(orig_pte);
> +	}
> +	return orig_pte;
> +}
> +
> +static pte_t get_clear_contig_flush(struct mm_struct *mm,
> +				    unsigned long addr,
> +				    pte_t *ptep,
> +				    unsigned long pte_num)
> +{
> +	pte_t orig_pte = get_clear_contig(mm, addr, ptep, pte_num);
> +	bool valid = !pte_none(orig_pte);
> +	struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
> +
> +	if (valid)
> +		flush_tlb_range(&vma, addr, addr + (PAGE_SIZE * pte_num));
> +
> +	return orig_pte;
> +}
> +
> +pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
> +{
> +	unsigned long order;
> +
> +	for_each_napot_order(order) {
> +		if (shift == napot_cont_shift(order)) {
> +			entry = pte_mknapot(entry, order);
> +			break;
> +		}
> +	}
> +	if (order == NAPOT_ORDER_MAX)
> +		entry = pte_mkhuge(entry);
> +
> +	return entry;
> +}
> +
> +void set_huge_pte_at(struct mm_struct *mm,
> +		     unsigned long addr,
> +		     pte_t *ptep,
> +		     pte_t pte)
> +{
> +	int i;
> +	int pte_num;
> +
> +	if (!pte_napot(pte)) {
> +		set_pte_at(mm, addr, ptep, pte);
> +		return;
> +	}
> +
> +	pte_num = napot_pte_num(napot_cont_order(pte));
> +	for (i = 0; i < pte_num; i++, ptep++, addr += PAGE_SIZE)
> +		set_pte_at(mm, addr, ptep, pte);
> +}
> +
> +int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> +			       unsigned long addr,
> +			       pte_t *ptep,
> +			       pte_t pte,
> +			       int dirty)
> +{
> +	pte_t orig_pte;
> +	int i;
> +	int pte_num;
> +	struct mm_struct *mm = vma->vm_mm;
> +
> +	if (!pte_napot(pte))
> +		return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
> +
> +	pte_num = napot_pte_num(napot_cont_order(pte));
> +	ptep = huge_pte_offset(mm, addr,
> +			       napot_cont_size(napot_cont_order(pte)));
> +	orig_pte = get_clear_contig_flush(mm, addr, ptep, pte_num);
> +
> +	if (pte_dirty(orig_pte))
> +		pte = pte_mkdirty(pte);
> +
> +	if (pte_young(orig_pte))
> +		pte = pte_mkyoung(pte);
> +
> +	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++)
> +		set_pte_at(mm, addr, ptep, pte);
> +
> +	return true;
> +}
> +
> +pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> +			      unsigned long addr,
> +			      pte_t *ptep)
> +{
> +	int pte_num;
> +	pte_t orig_pte = ptep_get(ptep);
> +
> +	if (!pte_napot(orig_pte))
> +		return ptep_get_and_clear(mm, addr, ptep);
> +
> +	pte_num = napot_pte_num(napot_cont_order(orig_pte));
> +
> +	return get_clear_contig(mm, addr, ptep, pte_num);
> +}
> +
> +void huge_ptep_set_wrprotect(struct mm_struct *mm,
> +			     unsigned long addr,
> +			     pte_t *ptep)
> +{
> +	int i;
> +	int pte_num;
> +	pte_t pte = ptep_get(ptep);
> +
> +	if (!pte_napot(pte)) {
> +		ptep_set_wrprotect(mm, addr, ptep);
> +		return;
> +	}
> +
> +	pte_num = napot_pte_num(napot_cont_order(pte));
> +	ptep = huge_pte_offset(mm, addr, napot_cont_size(napot_cont_order(pte)));
> +
> +	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++)
> +		ptep_set_wrprotect(mm, addr, ptep);
> +}
> +
> +pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> +			    unsigned long addr,
> +			    pte_t *ptep)
> +{
> +	int pte_num;
> +	pte_t pte = ptep_get(ptep);
> +
> +	if (!pte_napot(pte))
> +		return ptep_clear_flush(vma, addr, ptep);
> +
> +	pte_num = napot_pte_num(napot_cont_order(pte));
> +
> +	return get_clear_contig_flush(vma->vm_mm, addr, ptep, pte_num);
> +}
> +
> +void huge_pte_clear(struct mm_struct *mm,
> +		    unsigned long addr,
> +		    pte_t *ptep,
> +		    unsigned long sz)
> +{
> +	int i, pte_num;
> +	pte_t pte = READ_ONCE(*ptep);
> +
> +	if (!pte_napot(pte)) {
> +		pte_clear(mm, addr, ptep);
> +		return;
> +	}
> +
> +	pte_num = napot_pte_num(napot_cont_order(pte));
> +	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++)
> +		pte_clear(mm, addr, ptep);
> +}
> +
> +bool __init is_napot_size(unsigned long size)
> +{
> +	unsigned long order;
> +
> +	if (!has_svnapot())
> +		return false;
> +
> +	for_each_napot_order(order) {
> +		if (size == napot_cont_size(order))
> +			return true;
> +	}
> +	return false;
> +}
> +
> +static __init int napot_hugetlbpages_init(void)
> +{
> +	if (has_svnapot()) {
> +		unsigned long order;
> +
> +		for_each_napot_order(order)
> +			hugetlb_add_hstate(order);
> +	}
> +	return 0;
> +}
> +arch_initcall(napot_hugetlbpages_init);
> +
> +#else
> +
> +bool __init is_napot_size(unsigned long size)
> +{
> +	return false;
> +}
> +
> +#endif /*CONFIG_RISCV_ISA_SVNAPOT*/
> +
>  int pud_huge(pud_t pud)
>  {
>  	return pud_leaf(pud);
> @@ -18,6 +313,8 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
>  		return true;
>  	else if (IS_ENABLED(CONFIG_64BIT) && size == PUD_SIZE)
>  		return true;
> +	else if (is_napot_size(size))
> +		return true;
>  	else
>  		return false;
>  }
> -- 
> 2.37.4
> 
> 
> 



* Re: [PATCH v9 3/3] riscv: mm: support Svnapot in huge vmap
  2022-12-04 14:11 ` [PATCH v9 3/3] riscv: mm: support Svnapot in huge vmap panqinglin2020
@ 2022-12-07 18:59   ` Conor Dooley
  2022-12-08  4:57     ` Qinglin Pan
  2022-12-08 15:22   ` Andrew Jones
  1 sibling, 1 reply; 22+ messages in thread
From: Conor Dooley @ 2022-12-07 18:59 UTC (permalink / raw)
  To: panqinglin2020; +Cc: palmer, linux-riscv, jeff, xuyinan, ajones, alex, jszhang



On Sun, Dec 04, 2022 at 10:11:37PM +0800, panqinglin2020@iscas.ac.cn wrote:
> From: Qinglin Pan <panqinglin2020@iscas.ac.cn>
> 
> As HAVE_ARCH_HUGE_VMAP and HAVE_ARCH_HUGE_VMALLOC are supported, we can
> implement arch_vmap_pte_range_map_size and arch_vmap_pte_supported_shift
> for Svnapot to support huge vmap for napot sizes.
> 
> It can be tested by huge vmap used in PCI drivers. Huge vmalloc with svnapot
> can be tested by changing the test_vmalloc module as in [1], and probing this
> module to run fix_size_alloc_test with use_huge true.
> 
> [1]https://github.com/RV-VM/linux-vm-support/commit/33f4ee399c36d355c412ebe5334ca46fd727f8f5

Please make this one a standard Link: tag.
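
i.e. something along the lines of (URL copied from the commit message):

```
Link: https://github.com/RV-VM/linux-vm-support/commit/33f4ee399c36d355c412ebe5334ca46fd727f8f5
```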

> 
> Signed-off-by: Qinglin Pan <panqinglin2020@iscas.ac.cn>
> Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
> 
> diff --git a/arch/riscv/include/asm/vmalloc.h b/arch/riscv/include/asm/vmalloc.h
> index 48da5371f1e9..6f7447d563ab 100644
> --- a/arch/riscv/include/asm/vmalloc.h
> +++ b/arch/riscv/include/asm/vmalloc.h
> @@ -17,6 +17,65 @@ static inline bool arch_vmap_pmd_supported(pgprot_t prot)
>  	return true;
>  }
>  
> -#endif
> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
> +#include <linux/pgtable.h>
>  
> +#define arch_vmap_pte_range_map_size arch_vmap_pte_range_map_size
> +static inline unsigned long arch_vmap_pte_range_map_size(unsigned long addr, unsigned long end,
> +							 u64 pfn, unsigned int max_page_shift)
> +{
> +	unsigned long size, order;
> +	unsigned long map_size = PAGE_SIZE;
> +
> +	if (!has_svnapot())
> +		return map_size;

These ones are obvious about what you're gonna do if !has_svnapot()
though, nice :)
With a proper Link: tag
Acked-by: Conor Dooley <conor.dooley@microchip.com>

> +
> +	for_each_napot_order_rev(order) {
> +		if (napot_cont_shift(order) > max_page_shift)
> +			continue;
> +
> +		size = napot_cont_size(order);
> +		if (end - addr < size)
> +			continue;
> +
> +		if (!IS_ALIGNED(addr, size))
> +			continue;
> +
> +		if (!IS_ALIGNED(PFN_PHYS(pfn), size))
> +			continue;
> +
> +		map_size = size;
> +		break;
> +	}
> +
> +	return map_size;
> +}
> +
> +#define arch_vmap_pte_supported_shift arch_vmap_pte_supported_shift
> +static inline int arch_vmap_pte_supported_shift(unsigned long size)
> +{
> +	int shift = PAGE_SHIFT;
> +	unsigned long order;
> +
> +	if (!has_svnapot())
> +		return shift;
> +
> +	WARN_ON_ONCE(size >= PMD_SIZE);
> +
> +	for_each_napot_order_rev(order) {
> +		if (napot_cont_size(order) > size)
> +			continue;
> +
> +		if (!IS_ALIGNED(size, napot_cont_size(order)))
> +			continue;
> +
> +		shift = napot_cont_shift(order);
> +		break;
> +	}
> +
> +	return shift;
> +}
> +
> +#endif /* CONFIG_RISCV_ISA_SVNAPOT */
> +#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
>  #endif /* _ASM_RISCV_VMALLOC_H */
> -- 
> 2.37.4
> 
> 
> 



* Re: [PATCH v9 1/3] riscv: mm: modify pte format for Svnapot
  2022-12-07 18:46   ` Conor Dooley
@ 2022-12-08  4:51     ` Qinglin Pan
  2022-12-08  4:59       ` Conor.Dooley
  2022-12-08 17:34       ` Qinglin Pan
  0 siblings, 2 replies; 22+ messages in thread
From: Qinglin Pan @ 2022-12-08  4:51 UTC (permalink / raw)
  To: Conor Dooley; +Cc: palmer, linux-riscv, jeff, xuyinan, ajones, alex, jszhang

Hey!

On 2022/12/8 02:46, Conor Dooley wrote:
> Hey!
> 
> Couple small remarks and questions for you.
> 
> On Sun, Dec 04, 2022 at 10:11:35PM +0800, panqinglin2020@iscas.ac.cn wrote:
>> From: Qinglin Pan <panqinglin2020@iscas.ac.cn>
>>
>> Add an alternative to enable/disable svnapot support; enable this static
>> key when "svnapot" is in the "riscv,isa" field of the fdt and the SVNAPOT
>> compile option is set. It will influence the behavior of has_svnapot. All
>> code dependent on svnapot should first make sure that has_svnapot returns
>> true.
>>
>> Modify the PTE definition for Svnapot, and create some functions in
>> pgtable.h to mark a PTE as napot and check if it is a Svnapot PTE. Until
>> now, only the 64KB napot size is supported in the spec, so some macros
>> have only a 64KB version.
>>
>> Signed-off-by: Qinglin Pan <panqinglin2020@iscas.ac.cn>
>>
>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>> index ef8d66de5f38..1d8477c0af7c 100644
>> --- a/arch/riscv/Kconfig
>> +++ b/arch/riscv/Kconfig
>> @@ -395,6 +395,20 @@ config RISCV_ISA_C
>>   
>>   	  If you don't know what to do here, say Y.
>>   
>> +config RISCV_ISA_SVNAPOT
>> +	bool "SVNAPOT extension support"
>> +	depends on 64BIT && MMU
>> +	select RISCV_ALTERNATIVE
>> +	default y
>> +	help
>> +	  Allow kernel to detect SVNAPOT ISA-extension dynamically in boot time
>> +	  and enable its usage.
>> +
>> +	  SVNAPOT extension helps to mark contiguous PTEs as a range
>> +	  of contiguous virtual-to-physical translations, with a naturally
>> +	  aligned power-of-2 (NAPOT) granularity larger than the base 4KB page
>> +	  size.
>> +
>>   config RISCV_ISA_SVPBMT
>>   	bool "SVPBMT extension support"
>>   	depends on 64BIT && MMU
>> diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
>> index 4180312d2a70..beadb1126ed9 100644
>> --- a/arch/riscv/include/asm/errata_list.h
>> +++ b/arch/riscv/include/asm/errata_list.h
>> @@ -22,9 +22,10 @@
>>   #define	ERRATA_THEAD_NUMBER 3
>>   #endif
>>   
>> -#define	CPUFEATURE_SVPBMT 0
>> -#define	CPUFEATURE_ZICBOM 1
>> -#define	CPUFEATURE_NUMBER 2
>> +#define	CPUFEATURE_SVPBMT	0
>> +#define	CPUFEATURE_ZICBOM	1
>> +#define	CPUFEATURE_SVNAPOT	2
>> +#define	CPUFEATURE_NUMBER	3
>>   
>>   #ifdef __ASSEMBLY__
>>   
>> @@ -156,6 +157,14 @@ asm volatile(ALTERNATIVE(						\
>>   	: "=r" (__ovl) :						\
>>   	: "memory")
>>   
>> +#define ALT_SVNAPOT(_val)						\
>> +asm(ALTERNATIVE(							\
>> +	"li %0, 0",							\
>> +	"li %0, 1",							\
>> +		0, CPUFEATURE_SVNAPOT, CONFIG_RISCV_ISA_SVNAPOT)	\
>> +	: "=r" (_val) :							\
>> +	: "memory")
>> +
>>   #endif /* __ASSEMBLY__ */
>>   
>>   #endif
>> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
>> index b22525290073..4cbc1f45ab26 100644
>> --- a/arch/riscv/include/asm/hwcap.h
>> +++ b/arch/riscv/include/asm/hwcap.h
>> @@ -54,6 +54,7 @@ extern unsigned long elf_hwcap;
>>    */
>>   enum riscv_isa_ext_id {
>>   	RISCV_ISA_EXT_SSCOFPMF = RISCV_ISA_EXT_BASE,
>> +	RISCV_ISA_EXT_SVNAPOT,
>>   	RISCV_ISA_EXT_SVPBMT,
>>   	RISCV_ISA_EXT_ZICBOM,
>>   	RISCV_ISA_EXT_ZIHINTPAUSE,
>> @@ -87,7 +88,6 @@ static __always_inline int riscv_isa_ext2key(int num)
>>   {
>>   	switch (num) {
>>   	case RISCV_ISA_EXT_f:
>> -		return RISCV_ISA_EXT_KEY_FPU;
>>   	case RISCV_ISA_EXT_d:
>>   		return RISCV_ISA_EXT_KEY_FPU;
>>   	case RISCV_ISA_EXT_ZIHINTPAUSE:
>> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
>> index ac70b0fd9a9a..349fad5e35de 100644
>> --- a/arch/riscv/include/asm/page.h
>> +++ b/arch/riscv/include/asm/page.h
>> @@ -16,11 +16,6 @@
>>   #define PAGE_SIZE	(_AC(1, UL) << PAGE_SHIFT)
>>   #define PAGE_MASK	(~(PAGE_SIZE - 1))
>>   
>> -#ifdef CONFIG_64BIT
>> -#define HUGE_MAX_HSTATE		2
>> -#else
>> -#define HUGE_MAX_HSTATE		1
>> -#endif
>>   #define HPAGE_SHIFT		PMD_SHIFT
>>   #define HPAGE_SIZE		(_AC(1, UL) << HPAGE_SHIFT)
>>   #define HPAGE_MASK              (~(HPAGE_SIZE - 1))
>> diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
>> index dc42375c2357..9611833907ec 100644
>> --- a/arch/riscv/include/asm/pgtable-64.h
>> +++ b/arch/riscv/include/asm/pgtable-64.h
>> @@ -74,6 +74,40 @@ typedef struct {
>>    */
>>   #define _PAGE_PFN_MASK  GENMASK(53, 10)
>>   
>> +/*
>> + * [63] Svnapot definitions:
>> + * 0 Svnapot disabled
>> + * 1 Svnapot enabled
>> + */
>> +#define _PAGE_NAPOT_SHIFT	63
>> +#define _PAGE_NAPOT		BIT(_PAGE_NAPOT_SHIFT)
>> +/*
>> + * Only 64KB (order 4) napot ptes supported.
>> + */
>> +#define NAPOT_CONT_ORDER_BASE 4
>> +enum napot_cont_order {
>> +	NAPOT_CONT64KB_ORDER = NAPOT_CONT_ORDER_BASE,
>> +	NAPOT_ORDER_MAX,
>> +};
>> +
>> +#define for_each_napot_order(order)						\
>> +	for (order = NAPOT_CONT_ORDER_BASE; order < NAPOT_ORDER_MAX; order++)
>> +#define for_each_napot_order_rev(order)						\
> 
> Would it be terrible to do s/rev/reverse to make things more obvious?

Maybe we should just keep it in the same style as existing macros like
for_each_mem_range/for_each_mem_range_rev :)

> 
>> +	for (order = NAPOT_ORDER_MAX - 1;					\
>> +	     order >= NAPOT_CONT_ORDER_BASE; order--)
>> +#define napot_cont_order(val)	(__builtin_ctzl((val.pte >> _PAGE_PFN_SHIFT) << 1))
>> +
>> +#define napot_cont_shift(order)	((order) + PAGE_SHIFT)
>> +#define napot_cont_size(order)	BIT(napot_cont_shift(order))
>> +#define napot_cont_mask(order)	(~(napot_cont_size(order) - 1UL))
>> +#define napot_pte_num(order)	BIT(order)
>> +
>> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
>> +#define HUGE_MAX_HSTATE		(2 + (NAPOT_ORDER_MAX - NAPOT_CONT_ORDER_BASE))
>> +#else
>> +#define HUGE_MAX_HSTATE		2
>> +#endif
> 
> This is coming from a place of *complete* ignorance:
> If CONFIG_RISCV_ISA_SVNAPOT is enabled in the kernel but there is no
> support for it on a platform is it okay to change the value of
> HUGE_MAX_HSTATE? Apologies if I've missed something obvious.

:(
You are right. If CONFIG_RISCV_ISA_SVNAPOT is enabled, HUGE_MAX_HSTATE
will always be 3 whether has_svnapot() is true or false. I will try to
fix this.

> 
>> +
>>   /*
>>    * [62:61] Svpbmt Memory Type definitions:
>>    *
>> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
>> index c61ae83aadee..99957b1270f2 100644
>> --- a/arch/riscv/include/asm/pgtable.h
>> +++ b/arch/riscv/include/asm/pgtable.h
>> @@ -6,10 +6,12 @@
>>   #ifndef _ASM_RISCV_PGTABLE_H
>>   #define _ASM_RISCV_PGTABLE_H
>>   
>> +#include <linux/jump_label.h>
>>   #include <linux/mmzone.h>
>>   #include <linux/sizes.h>
>>   
>>   #include <asm/pgtable-bits.h>
>> +#include <asm/hwcap.h>
>>   
>>   #ifndef CONFIG_MMU
>>   #define KERNEL_LINK_ADDR	PAGE_OFFSET
>> @@ -264,10 +266,49 @@ static inline pte_t pud_pte(pud_t pud)
>>   	return __pte(pud_val(pud));
>>   }
>>   
>> +static __always_inline bool has_svnapot(void)
>> +{
>> +	unsigned int _val;
>> +
>> +	ALT_SVNAPOT(_val);
>> +
>> +	return _val;
>> +}
>> +
>> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
>> +
>> +static inline unsigned long pte_napot(pte_t pte)
>> +{
>> +	return pte_val(pte) & _PAGE_NAPOT;
>> +}
>> +
>> +static inline pte_t pte_mknapot(pte_t pte, unsigned int order)
>> +{
>> +	int pos = order - 1 + _PAGE_PFN_SHIFT;
>> +	unsigned long napot_bit = BIT(pos);
>> +	unsigned long napot_mask = ~GENMASK(pos, _PAGE_PFN_SHIFT);
>> +
>> +	return __pte((pte_val(pte) & napot_mask) | napot_bit | _PAGE_NAPOT);
>> +}
>> +
>> +#else
>> +
>> +static inline unsigned long pte_napot(pte_t pte)
>> +{
>> +	return 0;
>> +}
>> +
>> +#endif /* CONFIG_RISCV_ISA_SVNAPOT */
>> +
>>   /* Yields the page frame number (PFN) of a page table entry */
>>   static inline unsigned long pte_pfn(pte_t pte)
>>   {
>> -	return __page_val_to_pfn(pte_val(pte));
>> +	unsigned long res  = __page_val_to_pfn(pte_val(pte));
> 
> nit: extra space before the =

Nice catch! Will remove this extra space :)

> 
>> +
>> +	if (has_svnapot() && pte_napot(pte))
>> +		res = res & (res - 1UL);
>> +
>> +	return res;
>>   }
>>   
>>   #define pte_page(x)     pfn_to_page(pte_pfn(x))
>> diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
>> index bf9dd6764bad..88495f5fcafd 100644
>> --- a/arch/riscv/kernel/cpu.c
>> +++ b/arch/riscv/kernel/cpu.c
>> @@ -165,6 +165,7 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
>>   	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
>>   	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
>>   	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
>> +	__RISCV_ISA_EXT_DATA(svnapot, RISCV_ISA_EXT_SVNAPOT),
>>   	__RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
>>   	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
>>   	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
>> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
>> index 694267d1fe81..eeed66c3d497 100644
>> --- a/arch/riscv/kernel/cpufeature.c
>> +++ b/arch/riscv/kernel/cpufeature.c
>> @@ -205,6 +205,7 @@ void __init riscv_fill_hwcap(void)
>>   				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
>>   				SET_ISA_EXT_MAP("sstc", RISCV_ISA_EXT_SSTC);
>>   				SET_ISA_EXT_MAP("svinval", RISCV_ISA_EXT_SVINVAL);
>> +				SET_ISA_EXT_MAP("svnapot", RISCV_ISA_EXT_SVNAPOT);
>>   			}
>>   #undef SET_ISA_EXT_MAP
>>   		}
>> @@ -252,6 +253,17 @@ void __init riscv_fill_hwcap(void)
>>   }
>>   
>>   #ifdef CONFIG_RISCV_ALTERNATIVE
>> +static bool __init_or_module cpufeature_probe_svnapot(unsigned int stage)
>> +{
>> +	if (!IS_ENABLED(CONFIG_RISCV_ISA_SVNAPOT))
>> +		return false;
>> +
>> +	if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
>> +		return false;
>> +
>> +	return riscv_isa_extension_available(NULL, SVNAPOT);
>> +}
>> +
>>   static bool __init_or_module cpufeature_probe_svpbmt(unsigned int stage)
>>   {
>>   	if (!IS_ENABLED(CONFIG_RISCV_ISA_SVPBMT))
>> @@ -289,6 +301,9 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
>>   {
>>   	u32 cpu_req_feature = 0;
>>   
>> +	if (cpufeature_probe_svnapot(stage))
>> +		cpu_req_feature |= BIT(CPUFEATURE_SVNAPOT);
>> +
>>   	if (cpufeature_probe_svpbmt(stage))
>>   		cpu_req_feature |= BIT(CPUFEATURE_SVPBMT);
>>   
> 
> There's a bunch of stuff in this patch that may go away, depending on
> sequencing just FYI. See [1] for more. I wouldn't rebase on top of that,
> but just so that you're aware.

Thanks for the information! If this patchset is merged first, Jisheng 
can easily replace the implementation of has_svnapot with his 
interface. Otherwise I will do the replacement :)

Thanks,
Qinglin

> 
> 1 - https://patchwork.kernel.org/project/linux-riscv/cover/20221204174632.3677-1-jszhang@kernel.org/


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v9 2/3] riscv: mm: support Svnapot in hugetlb page
  2022-12-07 18:55   ` Conor Dooley
@ 2022-12-08  4:54     ` Qinglin Pan
  0 siblings, 0 replies; 22+ messages in thread
From: Qinglin Pan @ 2022-12-08  4:54 UTC (permalink / raw)
  To: Conor Dooley; +Cc: palmer, linux-riscv, jeff, xuyinan, ajones, alex, jszhang

Hey!

On 2022/12/8 02:55, Conor Dooley wrote:
> On Sun, Dec 04, 2022 at 10:11:36PM +0800, panqinglin2020@iscas.ac.cn wrote:
>> From: Qinglin Pan <panqinglin2020@iscas.ac.cn>
>>
>> Svnapot can be used to support 64KB hugetlb pages, so it can become a new
>> option when using hugetlbfs. Add a basic implementation of hugetlb pages,
>> and support 64KB as one of their sizes by using Svnapot.
>>
>> To test, boot the kernel with a command line containing
>> "default_hugepagesz=64K hugepagesz=64K hugepages=20" and run a simple
>> test like this:
>>
>> tools/testing/selftests/vm/map_hugetlb 1 16
>>
>> It should pass.
>>
>> Signed-off-by: Qinglin Pan <panqinglin2020@iscas.ac.cn>
>>
>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>> index 1d8477c0af7c..be5c1edea70f 100644
>> --- a/arch/riscv/Kconfig
>> +++ b/arch/riscv/Kconfig
>> @@ -43,7 +43,7 @@ config RISCV
>>   	select ARCH_USE_QUEUED_RWLOCKS
>>   	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
>>   	select ARCH_WANT_FRAME_POINTERS
>> -	select ARCH_WANT_GENERAL_HUGETLB
>> +	select ARCH_WANT_GENERAL_HUGETLB if !RISCV_ISA_SVNAPOT
> 
> I am expecting this to be a dumb question too, but I'm curious again
> about what happens in a system that enables CONFIG_RISCV_ISA_SVNAPOT but
> the platform it is running on does not support it...
> 
>>   	select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
>>   	select ARCH_WANTS_THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
>>   	select BINFMT_FLAT_NO_DATA_START_OFFSET if !MMU
>> diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
>> index ec19d6afc896..fe6f23006641 100644
>> --- a/arch/riscv/include/asm/hugetlb.h
>> +++ b/arch/riscv/include/asm/hugetlb.h
>> @@ -2,7 +2,6 @@
>>   #ifndef _ASM_RISCV_HUGETLB_H
>>   #define _ASM_RISCV_HUGETLB_H
>>   
>> -#include <asm-generic/hugetlb.h>
>>   #include <asm/page.h>
>>   
>>   static inline void arch_clear_hugepage_flags(struct page *page)
>> @@ -11,4 +10,37 @@ static inline void arch_clear_hugepage_flags(struct page *page)
>>   }
>>   #define arch_clear_hugepage_flags arch_clear_hugepage_flags
>>   
>> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
>> +#define __HAVE_ARCH_HUGE_PTE_CLEAR
>> +void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
>> +		    pte_t *ptep, unsigned long sz);
>> +
>> +#define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
>> +void set_huge_pte_at(struct mm_struct *mm,
>> +		     unsigned long addr, pte_t *ptep, pte_t pte);
>> +
>> +#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
>> +pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>> +			      unsigned long addr, pte_t *ptep);
>> +
>> +#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
>> +pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
>> +			    unsigned long addr, pte_t *ptep);
>> +
>> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
>> +void huge_ptep_set_wrprotect(struct mm_struct *mm,
>> +			     unsigned long addr, pte_t *ptep);
>> +
>> +#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
>> +int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>> +			       unsigned long addr, pte_t *ptep,
>> +			       pte_t pte, int dirty);
>> +
>> +pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags);
>> +#define arch_make_huge_pte arch_make_huge_pte
>> +
>> +#endif /*CONFIG_RISCV_ISA_SVNAPOT*/
>> +
>> +#include <asm-generic/hugetlb.h>
> 
> ...is this sufficient to fall back to generic huge pages?

Yes. If CONFIG_RISCV_ISA_SVNAPOT is disabled, it will fall back to
generic huge pages. And if CONFIG_RISCV_ISA_SVNAPOT is enabled but
the platform does not have svnapot support, PMD_SIZE/PUD_SIZE huge
pages are still available.

Thanks,
Qinglin.

> 
> Hopefully that's just my ignorance on show!
> 
> Thanks,
> Conor.
> 
>> +
>>   #endif /* _ASM_RISCV_HUGETLB_H */
>> diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
>> index 932dadfdca54..49f92f8cd431 100644
>> --- a/arch/riscv/mm/hugetlbpage.c
>> +++ b/arch/riscv/mm/hugetlbpage.c
>> @@ -2,6 +2,301 @@
>>   #include <linux/hugetlb.h>
>>   #include <linux/err.h>
>>   
>> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
>> +pte_t *huge_pte_alloc(struct mm_struct *mm,
>> +		      struct vm_area_struct *vma,
>> +		      unsigned long addr,
>> +		      unsigned long sz)
>> +{
>> +	pgd_t *pgd;
>> +	p4d_t *p4d;
>> +	pud_t *pud;
>> +	pmd_t *pmd;
>> +	pte_t *pte = NULL;
>> +	unsigned long order;
>> +
>> +	pgd = pgd_offset(mm, addr);
>> +	p4d = p4d_alloc(mm, pgd, addr);
>> +	if (!p4d)
>> +		return NULL;
>> +
>> +	pud = pud_alloc(mm, p4d, addr);
>> +	if (!pud)
>> +		return NULL;
>> +
>> +	if (sz == PUD_SIZE) {
>> +		pte = (pte_t *)pud;
>> +		goto out;
>> +	}
>> +
>> +	if (sz == PMD_SIZE) {
>> +		if (want_pmd_share(vma, addr) && pud_none(*pud))
>> +			pte = huge_pmd_share(mm, vma, addr, pud);
>> +		else
>> +			pte = (pte_t *)pmd_alloc(mm, pud, addr);
>> +		goto out;
>> +	}
>> +
>> +	pmd = pmd_alloc(mm, pud, addr);
>> +	if (!pmd)
>> +		return NULL;
>> +
>> +	for_each_napot_order(order) {
>> +		if (napot_cont_size(order) == sz) {
>> +			pte = pte_alloc_map(mm, pmd, (addr & napot_cont_mask(order)));
>> +			break;
>> +		}
>> +	}
>> +
>> +out:
>> +	WARN_ON_ONCE(pte && pte_present(*pte) && !pte_huge(*pte));
>> +	return pte;
>> +}
>> +
>> +pte_t *huge_pte_offset(struct mm_struct *mm,
>> +		       unsigned long addr,
>> +		       unsigned long sz)
>> +{
>> +	pgd_t *pgd;
>> +	p4d_t *p4d;
>> +	pud_t *pud;
>> +	pmd_t *pmd;
>> +	pte_t *pte = NULL;
>> +	unsigned long order;
>> +
>> +	pgd = pgd_offset(mm, addr);
>> +	if (!pgd_present(*pgd))
>> +		return NULL;
>> +	p4d = p4d_offset(pgd, addr);
>> +	if (!p4d_present(*p4d))
>> +		return NULL;
>> +
>> +	pud = pud_offset(p4d, addr);
>> +	if (sz == PUD_SIZE)
>> +		/* must be pud huge, non-present or none */
>> +		return (pte_t *)pud;
>> +	if (!pud_present(*pud))
>> +		return NULL;
>> +
>> +	pmd = pmd_offset(pud, addr);
>> +	if (sz == PMD_SIZE)
>> +		/* must be pmd huge, non-present or none */
>> +		return (pte_t *)pmd;
>> +	if (!pmd_present(*pmd))
>> +		return NULL;
>> +
>> +	for_each_napot_order(order) {
>> +		if (napot_cont_size(order) == sz) {
>> +			pte = pte_offset_kernel(pmd, (addr & napot_cont_mask(order)));
>> +			break;
>> +		}
>> +	}
>> +	return pte;
>> +}
>> +
>> +static pte_t get_clear_contig(struct mm_struct *mm,
>> +			      unsigned long addr,
>> +			      pte_t *ptep,
>> +			      unsigned long pte_num)
>> +{
>> +	pte_t orig_pte = ptep_get(ptep);
>> +	unsigned long i;
>> +
>> +	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++) {
>> +		pte_t pte = ptep_get_and_clear(mm, addr, ptep);
>> +
>> +		if (pte_dirty(pte))
>> +			orig_pte = pte_mkdirty(orig_pte);
>> +
>> +		if (pte_young(pte))
>> +			orig_pte = pte_mkyoung(orig_pte);
>> +	}
>> +	return orig_pte;
>> +}
>> +
>> +static pte_t get_clear_contig_flush(struct mm_struct *mm,
>> +				    unsigned long addr,
>> +				    pte_t *ptep,
>> +				    unsigned long pte_num)
>> +{
>> +	pte_t orig_pte = get_clear_contig(mm, addr, ptep, pte_num);
>> +	bool valid = !pte_none(orig_pte);
>> +	struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
>> +
>> +	if (valid)
>> +		flush_tlb_range(&vma, addr, addr + (PAGE_SIZE * pte_num));
>> +
>> +	return orig_pte;
>> +}
>> +
>> +pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
>> +{
>> +	unsigned long order;
>> +
>> +	for_each_napot_order(order) {
>> +		if (shift == napot_cont_shift(order)) {
>> +			entry = pte_mknapot(entry, order);
>> +			break;
>> +		}
>> +	}
>> +	if (order == NAPOT_ORDER_MAX)
>> +		entry = pte_mkhuge(entry);
>> +
>> +	return entry;
>> +}
>> +
>> +void set_huge_pte_at(struct mm_struct *mm,
>> +		     unsigned long addr,
>> +		     pte_t *ptep,
>> +		     pte_t pte)
>> +{
>> +	int i;
>> +	int pte_num;
>> +
>> +	if (!pte_napot(pte)) {
>> +		set_pte_at(mm, addr, ptep, pte);
>> +		return;
>> +	}
>> +
>> +	pte_num = napot_pte_num(napot_cont_order(pte));
>> +	for (i = 0; i < pte_num; i++, ptep++, addr += PAGE_SIZE)
>> +		set_pte_at(mm, addr, ptep, pte);
>> +}
>> +
>> +int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>> +			       unsigned long addr,
>> +			       pte_t *ptep,
>> +			       pte_t pte,
>> +			       int dirty)
>> +{
>> +	pte_t orig_pte;
>> +	int i;
>> +	int pte_num;
>> +	struct mm_struct *mm = vma->vm_mm;
>> +
>> +	if (!pte_napot(pte))
>> +		return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
>> +
>> +	pte_num = napot_pte_num(napot_cont_order(pte));
>> +	ptep = huge_pte_offset(mm, addr,
>> +			       napot_cont_size(napot_cont_order(pte)));
>> +	orig_pte = get_clear_contig_flush(mm, addr, ptep, pte_num);
>> +
>> +	if (pte_dirty(orig_pte))
>> +		pte = pte_mkdirty(pte);
>> +
>> +	if (pte_young(orig_pte))
>> +		pte = pte_mkyoung(pte);
>> +
>> +	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++)
>> +		set_pte_at(mm, addr, ptep, pte);
>> +
>> +	return true;
>> +}
>> +
>> +pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>> +			      unsigned long addr,
>> +			      pte_t *ptep)
>> +{
>> +	int pte_num;
>> +	pte_t orig_pte = ptep_get(ptep);
>> +
>> +	if (!pte_napot(orig_pte))
>> +		return ptep_get_and_clear(mm, addr, ptep);
>> +
>> +	pte_num = napot_pte_num(napot_cont_order(orig_pte));
>> +
>> +	return get_clear_contig(mm, addr, ptep, pte_num);
>> +}
>> +
>> +void huge_ptep_set_wrprotect(struct mm_struct *mm,
>> +			     unsigned long addr,
>> +			     pte_t *ptep)
>> +{
>> +	int i;
>> +	int pte_num;
>> +	pte_t pte = ptep_get(ptep);
>> +
>> +	if (!pte_napot(pte)) {
>> +		ptep_set_wrprotect(mm, addr, ptep);
>> +		return;
>> +	}
>> +
>> +	pte_num = napot_pte_num(napot_cont_order(pte));
>> +	ptep = huge_pte_offset(mm, addr, napot_cont_size(napot_cont_order(pte)));
>> +
>> +	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++)
>> +		ptep_set_wrprotect(mm, addr, ptep);
>> +}
>> +
>> +pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
>> +			    unsigned long addr,
>> +			    pte_t *ptep)
>> +{
>> +	int pte_num;
>> +	pte_t pte = ptep_get(ptep);
>> +
>> +	if (!pte_napot(pte))
>> +		return ptep_clear_flush(vma, addr, ptep);
>> +
>> +	pte_num = napot_pte_num(napot_cont_order(pte));
>> +
>> +	return get_clear_contig_flush(vma->vm_mm, addr, ptep, pte_num);
>> +}
>> +
>> +void huge_pte_clear(struct mm_struct *mm,
>> +		    unsigned long addr,
>> +		    pte_t *ptep,
>> +		    unsigned long sz)
>> +{
>> +	int i, pte_num;
>> +	pte_t pte = READ_ONCE(*ptep);
>> +
>> +	if (!pte_napot(pte)) {
>> +		pte_clear(mm, addr, ptep);
>> +		return;
>> +	}
>> +
>> +	pte_num = napot_pte_num(napot_cont_order(pte));
>> +	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++)
>> +		pte_clear(mm, addr, ptep);
>> +}
>> +
>> +bool __init is_napot_size(unsigned long size)
>> +{
>> +	unsigned long order;
>> +
>> +	if (!has_svnapot())
>> +		return false;
>> +
>> +	for_each_napot_order(order) {
>> +		if (size == napot_cont_size(order))
>> +			return true;
>> +	}
>> +	return false;
>> +}
>> +
>> +static __init int napot_hugetlbpages_init(void)
>> +{
>> +	if (has_svnapot()) {
>> +		unsigned long order;
>> +
>> +		for_each_napot_order(order)
>> +			hugetlb_add_hstate(order);
>> +	}
>> +	return 0;
>> +}
>> +arch_initcall(napot_hugetlbpages_init);
>> +
>> +#else
>> +
>> +bool __init is_napot_size(unsigned long size)
>> +{
>> +	return false;
>> +}
>> +
>> +#endif /*CONFIG_RISCV_ISA_SVNAPOT*/
>> +
>>   int pud_huge(pud_t pud)
>>   {
>>   	return pud_leaf(pud);
>> @@ -18,6 +313,8 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
>>   		return true;
>>   	else if (IS_ENABLED(CONFIG_64BIT) && size == PUD_SIZE)
>>   		return true;
>> +	else if (is_napot_size(size))
>> +		return true;
>>   	else
>>   		return false;
>>   }
>> -- 
>> 2.37.4
>>




* Re: [PATCH v9 3/3] riscv: mm: support Svnapot in huge vmap
  2022-12-07 18:59   ` Conor Dooley
@ 2022-12-08  4:57     ` Qinglin Pan
  0 siblings, 0 replies; 22+ messages in thread
From: Qinglin Pan @ 2022-12-08  4:57 UTC (permalink / raw)
  To: Conor Dooley; +Cc: palmer, linux-riscv, jeff, xuyinan, ajones, alex, jszhang

Hey!

On 2022/12/8 02:59, Conor Dooley wrote:
> On Sun, Dec 04, 2022 at 10:11:37PM +0800, panqinglin2020@iscas.ac.cn wrote:
>> From: Qinglin Pan <panqinglin2020@iscas.ac.cn>
>>
>> As HAVE_ARCH_HUGE_VMAP and HAVE_ARCH_HUGE_VMALLOC are supported, we can
>> implement arch_vmap_pte_range_map_size and arch_vmap_pte_supported_shift
>> for Svnapot to support huge vmap at napot sizes.
>>
>> It can be tested via the huge vmap used in PCI drivers. Huge vmalloc with
>> svnapot can be tested by changing the test_vmalloc module as in [1], and
>> probing this module to run fix_size_alloc_test with use_huge true.
>>
>> [1]https://github.com/RV-VM/linux-vm-support/commit/33f4ee399c36d355c412ebe5334ca46fd727f8f5
> 
> Please make this one a standard Link: tag.

Got it.

> 
>>
>> Signed-off-by: Qinglin Pan <panqinglin2020@iscas.ac.cn>
>> Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
>>
>> diff --git a/arch/riscv/include/asm/vmalloc.h b/arch/riscv/include/asm/vmalloc.h
>> index 48da5371f1e9..6f7447d563ab 100644
>> --- a/arch/riscv/include/asm/vmalloc.h
>> +++ b/arch/riscv/include/asm/vmalloc.h
>> @@ -17,6 +17,65 @@ static inline bool arch_vmap_pmd_supported(pgprot_t prot)
>>   	return true;
>>   }
>>   
>> -#endif
>> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
>> +#include <linux/pgtable.h>
>>   
>> +#define arch_vmap_pte_range_map_size arch_vmap_pte_range_map_size
>> +static inline unsigned long arch_vmap_pte_range_map_size(unsigned long addr, unsigned long end,
>> +							 u64 pfn, unsigned int max_page_shift)
>> +{
>> +	unsigned long size, order;
>> +	unsigned long map_size = PAGE_SIZE;
>> +
>> +	if (!has_svnapot())
>> +		return map_size;
> 
> These ones are obvious about what you're gonna do if !has_svnapot()
> though, nice :)
> With a proper Link: tag
> Acked-by: Conor Dooley <conor.dooley@microchip.com>

Thanks,
Qinglin.

> 
>> +
>> +	for_each_napot_order_rev(order) {
>> +		if (napot_cont_shift(order) > max_page_shift)
>> +			continue;
>> +
>> +		size = napot_cont_size(order);
>> +		if (end - addr < size)
>> +			continue;
>> +
>> +		if (!IS_ALIGNED(addr, size))
>> +			continue;
>> +
>> +		if (!IS_ALIGNED(PFN_PHYS(pfn), size))
>> +			continue;
>> +
>> +		map_size = size;
>> +		break;
>> +	}
>> +
>> +	return map_size;
>> +}
>> +
>> +#define arch_vmap_pte_supported_shift arch_vmap_pte_supported_shift
>> +static inline int arch_vmap_pte_supported_shift(unsigned long size)
>> +{
>> +	int shift = PAGE_SHIFT;
>> +	unsigned long order;
>> +
>> +	if (!has_svnapot())
>> +		return shift;
>> +
>> +	WARN_ON_ONCE(size >= PMD_SIZE);
>> +
>> +	for_each_napot_order_rev(order) {
>> +		if (napot_cont_size(order) > size)
>> +			continue;
>> +
>> +		if (!IS_ALIGNED(size, napot_cont_size(order)))
>> +			continue;
>> +
>> +		shift = napot_cont_shift(order);
>> +		break;
>> +	}
>> +
>> +	return shift;
>> +}
>> +
>> +#endif /* CONFIG_RISCV_ISA_SVNAPOT */
>> +#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
>>   #endif /* _ASM_RISCV_VMALLOC_H */
>> -- 
>> 2.37.4
>>




* Re: [PATCH v9 1/3] riscv: mm: modify pte format for Svnapot
  2022-12-08  4:51     ` Qinglin Pan
@ 2022-12-08  4:59       ` Conor.Dooley
  2022-12-08 17:34       ` Qinglin Pan
  1 sibling, 0 replies; 22+ messages in thread
From: Conor.Dooley @ 2022-12-08  4:59 UTC (permalink / raw)
  To: panqinglin2020, conor
  Cc: palmer, linux-riscv, jeff, xuyinan, ajones, alex, jszhang

On 08/12/2022 04:51, Qinglin Pan wrote:
> On 2022/12/8 02:46, Conor Dooley wrote:
>> On Sun, Dec 04, 2022 at 10:11:35PM +0800, panqinglin2020@iscas.ac.cn wrote:
>>> From: Qinglin Pan <panqinglin2020@iscas.ac.cn>

>>> +#define for_each_napot_order(order)                                         \
>>> +    for (order = NAPOT_CONT_ORDER_BASE; order < NAPOT_ORDER_MAX; order++)
>>> +#define for_each_napot_order_rev(order)                                             \
>>
>> Would it be terrible to do s/rev/reverse to make things more obvious?
> 
> Maybe we should just keep it in the same style as existing macros like
> for_each_mem_range/for_each_mem_range_rev :)

Yah, makes sense. Sorry for the noise then!



* Re: [PATCH v9 1/3] riscv: mm: modify pte format for Svnapot
  2022-12-04 14:11 ` [PATCH v9 1/3] riscv: mm: modify pte format for Svnapot panqinglin2020
  2022-12-06 19:56   ` Conor Dooley
  2022-12-07 18:46   ` Conor Dooley
@ 2022-12-08 14:55   ` Andrew Jones
  2 siblings, 0 replies; 22+ messages in thread
From: Andrew Jones @ 2022-12-08 14:55 UTC (permalink / raw)
  To: panqinglin2020; +Cc: palmer, linux-riscv, jeff, xuyinan, conor, alex, jszhang

On Sun, Dec 04, 2022 at 10:11:35PM +0800, panqinglin2020@iscas.ac.cn wrote:
> From: Qinglin Pan <panqinglin2020@iscas.ac.cn>
> 
> Add an alternative to enable/disable svnapot support; enable it when
> "svnapot" is in the "riscv,isa" field of the fdt and the SVNAPOT compile
> option is set. This influences the behavior of has_svnapot. All code
> dependent on svnapot should first make sure that has_svnapot returns true.
> 
> Modify the PTE format for Svnapot, and create some functions in pgtable.h
> to mark a PTE as napot and to check whether it is a Svnapot PTE. Until
> now, only the 64KB napot size is supported in the spec, so some macros
> have only a 64KB version.
> 
> Signed-off-by: Qinglin Pan <panqinglin2020@iscas.ac.cn>
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index ef8d66de5f38..1d8477c0af7c 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -395,6 +395,20 @@ config RISCV_ISA_C
>  
>  	  If you don't know what to do here, say Y.
>  
> +config RISCV_ISA_SVNAPOT
> +	bool "SVNAPOT extension support"
> +	depends on 64BIT && MMU
> +	select RISCV_ALTERNATIVE
> +	default y
> +	help
> +	  Allow kernel to detect SVNAPOT ISA-extension dynamically in boot time
> +	  and enable its usage.
> +
> +	  SVNAPOT extension helps to mark contiguous PTEs as a range
> +	  of contiguous virtual-to-physical translations, with a naturally
> +	  aligned power-of-2 (NAPOT) granularity larger than the base 4KB page
> +	  size.

Maybe we want the usual, "If you don't know what to do here, say Y."

> +
>  config RISCV_ISA_SVPBMT
>  	bool "SVPBMT extension support"
>  	depends on 64BIT && MMU
> diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> index 4180312d2a70..beadb1126ed9 100644
> --- a/arch/riscv/include/asm/errata_list.h
> +++ b/arch/riscv/include/asm/errata_list.h
> @@ -22,9 +22,10 @@
>  #define	ERRATA_THEAD_NUMBER 3
>  #endif
>  
> -#define	CPUFEATURE_SVPBMT 0
> -#define	CPUFEATURE_ZICBOM 1
> -#define	CPUFEATURE_NUMBER 2
> +#define	CPUFEATURE_SVPBMT	0
> +#define	CPUFEATURE_ZICBOM	1
> +#define	CPUFEATURE_SVNAPOT	2
> +#define	CPUFEATURE_NUMBER	3
>  
>  #ifdef __ASSEMBLY__
>  
> @@ -156,6 +157,14 @@ asm volatile(ALTERNATIVE(						\
>  	: "=r" (__ovl) :						\
>  	: "memory")
>  
> +#define ALT_SVNAPOT(_val)						\
> +asm(ALTERNATIVE(							\
> +	"li %0, 0",							\
> +	"li %0, 1",							\
> +		0, CPUFEATURE_SVNAPOT, CONFIG_RISCV_ISA_SVNAPOT)	\
> +	: "=r" (_val) :							\
> +	: "memory")
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif
> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> index b22525290073..4cbc1f45ab26 100644
> --- a/arch/riscv/include/asm/hwcap.h
> +++ b/arch/riscv/include/asm/hwcap.h
> @@ -54,6 +54,7 @@ extern unsigned long elf_hwcap;
>   */
>  enum riscv_isa_ext_id {
>  	RISCV_ISA_EXT_SSCOFPMF = RISCV_ISA_EXT_BASE,
> +	RISCV_ISA_EXT_SVNAPOT,
>  	RISCV_ISA_EXT_SVPBMT,
>  	RISCV_ISA_EXT_ZICBOM,
>  	RISCV_ISA_EXT_ZIHINTPAUSE,
> @@ -87,7 +88,6 @@ static __always_inline int riscv_isa_ext2key(int num)
>  {
>  	switch (num) {
>  	case RISCV_ISA_EXT_f:
> -		return RISCV_ISA_EXT_KEY_FPU;
>  	case RISCV_ISA_EXT_d:
>  		return RISCV_ISA_EXT_KEY_FPU;
>  	case RISCV_ISA_EXT_ZIHINTPAUSE:
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index ac70b0fd9a9a..349fad5e35de 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -16,11 +16,6 @@
>  #define PAGE_SIZE	(_AC(1, UL) << PAGE_SHIFT)
>  #define PAGE_MASK	(~(PAGE_SIZE - 1))
>  
> -#ifdef CONFIG_64BIT
> -#define HUGE_MAX_HSTATE		2
> -#else
> -#define HUGE_MAX_HSTATE		1
> -#endif
>  #define HPAGE_SHIFT		PMD_SHIFT
>  #define HPAGE_SIZE		(_AC(1, UL) << HPAGE_SHIFT)
>  #define HPAGE_MASK              (~(HPAGE_SIZE - 1))
> diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
> index dc42375c2357..9611833907ec 100644
> --- a/arch/riscv/include/asm/pgtable-64.h
> +++ b/arch/riscv/include/asm/pgtable-64.h
> @@ -74,6 +74,40 @@ typedef struct {
>   */
>  #define _PAGE_PFN_MASK  GENMASK(53, 10)
>  
> +/*
> + * [63] Svnapot definitions:
> + * 0 Svnapot disabled
> + * 1 Svnapot enabled
> + */
> +#define _PAGE_NAPOT_SHIFT	63
> +#define _PAGE_NAPOT		BIT(_PAGE_NAPOT_SHIFT)
> +/*
> + * Only 64KB (order 4) napot ptes supported.
> + */
> +#define NAPOT_CONT_ORDER_BASE 4
> +enum napot_cont_order {
> +	NAPOT_CONT64KB_ORDER = NAPOT_CONT_ORDER_BASE,
> +	NAPOT_ORDER_MAX,
> +};
> +
> +#define for_each_napot_order(order)						\
> +	for (order = NAPOT_CONT_ORDER_BASE; order < NAPOT_ORDER_MAX; order++)
> +#define for_each_napot_order_rev(order)						\
> +	for (order = NAPOT_ORDER_MAX - 1;					\
> +	     order >= NAPOT_CONT_ORDER_BASE; order--)
> +#define napot_cont_order(val)	(__builtin_ctzl((val.pte >> _PAGE_PFN_SHIFT) << 1))
> +
> +#define napot_cont_shift(order)	((order) + PAGE_SHIFT)
> +#define napot_cont_size(order)	BIT(napot_cont_shift(order))
> +#define napot_cont_mask(order)	(~(napot_cont_size(order) - 1UL))
> +#define napot_pte_num(order)	BIT(order)
> +
> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
> +#define HUGE_MAX_HSTATE		(2 + (NAPOT_ORDER_MAX - NAPOT_CONT_ORDER_BASE))
> +#else
> +#define HUGE_MAX_HSTATE		2
> +#endif
> +
>  /*
>   * [62:61] Svpbmt Memory Type definitions:
>   *
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index c61ae83aadee..99957b1270f2 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -6,10 +6,12 @@
>  #ifndef _ASM_RISCV_PGTABLE_H
>  #define _ASM_RISCV_PGTABLE_H
>  
> +#include <linux/jump_label.h>
>  #include <linux/mmzone.h>
>  #include <linux/sizes.h>
>  
>  #include <asm/pgtable-bits.h>
> +#include <asm/hwcap.h>
>  
>  #ifndef CONFIG_MMU
>  #define KERNEL_LINK_ADDR	PAGE_OFFSET
> @@ -264,10 +266,49 @@ static inline pte_t pud_pte(pud_t pud)
>  	return __pte(pud_val(pud));
>  }
>  
> +static __always_inline bool has_svnapot(void)
> +{
> +	unsigned int _val;
> +
> +	ALT_SVNAPOT(_val);

I was actually hoping to see this based on Jisheng's series and then use
riscv_has_extension_likely() or riscv_has_extension_unlikely(). I guess
we can proceed independently and then sort it out later though as you
said in a reply to Conor.

> +
> +	return _val;
> +}
> +
> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
> +
> +static inline unsigned long pte_napot(pte_t pte)
> +{
> +	return pte_val(pte) & _PAGE_NAPOT;
> +}
> +
> +static inline pte_t pte_mknapot(pte_t pte, unsigned int order)
> +{
> +	int pos = order - 1 + _PAGE_PFN_SHIFT;
> +	unsigned long napot_bit = BIT(pos);
> +	unsigned long napot_mask = ~GENMASK(pos, _PAGE_PFN_SHIFT);
> +
> +	return __pte((pte_val(pte) & napot_mask) | napot_bit | _PAGE_NAPOT);
> +}
> +
> +#else
> +
> +static inline unsigned long pte_napot(pte_t pte)
> +{
> +	return 0;
> +}
> +
> +#endif /* CONFIG_RISCV_ISA_SVNAPOT */
> +
>  /* Yields the page frame number (PFN) of a page table entry */
>  static inline unsigned long pte_pfn(pte_t pte)
>  {
> -	return __page_val_to_pfn(pte_val(pte));
> +	unsigned long res  = __page_val_to_pfn(pte_val(pte));
> +
> +	if (has_svnapot() && pte_napot(pte))
> +		res = res & (res - 1UL);
> +
> +	return res;
>  }
>  
>  #define pte_page(x)     pfn_to_page(pte_pfn(x))
> diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> index bf9dd6764bad..88495f5fcafd 100644
> --- a/arch/riscv/kernel/cpu.c
> +++ b/arch/riscv/kernel/cpu.c
> @@ -165,6 +165,7 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
>  	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
>  	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
>  	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
> +	__RISCV_ISA_EXT_DATA(svnapot, RISCV_ISA_EXT_SVNAPOT),
>  	__RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
>  	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
>  	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 694267d1fe81..eeed66c3d497 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -205,6 +205,7 @@ void __init riscv_fill_hwcap(void)
>  				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
>  				SET_ISA_EXT_MAP("sstc", RISCV_ISA_EXT_SSTC);
>  				SET_ISA_EXT_MAP("svinval", RISCV_ISA_EXT_SVINVAL);
> +				SET_ISA_EXT_MAP("svnapot", RISCV_ISA_EXT_SVNAPOT);
>  			}
>  #undef SET_ISA_EXT_MAP
>  		}
> @@ -252,6 +253,17 @@ void __init riscv_fill_hwcap(void)
>  }
>  
>  #ifdef CONFIG_RISCV_ALTERNATIVE
> +static bool __init_or_module cpufeature_probe_svnapot(unsigned int stage)
> +{
> +	if (!IS_ENABLED(CONFIG_RISCV_ISA_SVNAPOT))
> +		return false;
> +
> +	if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
> +		return false;
> +
> +	return riscv_isa_extension_available(NULL, SVNAPOT);
> +}
> +
>  static bool __init_or_module cpufeature_probe_svpbmt(unsigned int stage)
>  {
>  	if (!IS_ENABLED(CONFIG_RISCV_ISA_SVPBMT))
> @@ -289,6 +301,9 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
>  {
>  	u32 cpu_req_feature = 0;
>  
> +	if (cpufeature_probe_svnapot(stage))
> +		cpu_req_feature |= BIT(CPUFEATURE_SVNAPOT);
> +
>  	if (cpufeature_probe_svpbmt(stage))
>  		cpu_req_feature |= BIT(CPUFEATURE_SVPBMT);
>  
> -- 
> 2.37.4
>

Other than the issue with HUGE_MAX_HSTATE that Conor pointed out this
looks good to me.

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>

Thanks,
drew



* Re: [PATCH v9 2/3] riscv: mm: support Svnapot in hugetlb page
  2022-12-04 14:11 ` [PATCH v9 2/3] riscv: mm: support Svnapot in hugetlb page panqinglin2020
  2022-12-07 18:55   ` Conor Dooley
@ 2022-12-08 15:17   ` Andrew Jones
  2022-12-08 17:43     ` Qinglin Pan
  1 sibling, 1 reply; 22+ messages in thread
From: Andrew Jones @ 2022-12-08 15:17 UTC (permalink / raw)
  To: panqinglin2020; +Cc: palmer, linux-riscv, jeff, xuyinan, conor, alex, jszhang

On Sun, Dec 04, 2022 at 10:11:36PM +0800, panqinglin2020@iscas.ac.cn wrote:
> From: Qinglin Pan <panqinglin2020@iscas.ac.cn>
> 
> Svnapot can be used to support 64KB hugetlb pages, so it can become a new
> option when using hugetlbfs. Add a basic implementation of hugetlb pages,
> and support 64KB as one of their sizes by using Svnapot.
> 
> To test, boot the kernel with a command line containing
> "default_hugepagesz=64K hugepagesz=64K hugepages=20" and run a simple
> test like this:
> 
> tools/testing/selftests/vm/map_hugetlb 1 16
> 
> It should pass.
> 
> Signed-off-by: Qinglin Pan <panqinglin2020@iscas.ac.cn>
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 1d8477c0af7c..be5c1edea70f 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -43,7 +43,7 @@ config RISCV
>  	select ARCH_USE_QUEUED_RWLOCKS
>  	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
>  	select ARCH_WANT_FRAME_POINTERS
> -	select ARCH_WANT_GENERAL_HUGETLB
> +	select ARCH_WANT_GENERAL_HUGETLB if !RISCV_ISA_SVNAPOT
>  	select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
>  	select ARCH_WANTS_THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
>  	select BINFMT_FLAT_NO_DATA_START_OFFSET if !MMU
> diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
> index ec19d6afc896..fe6f23006641 100644
> --- a/arch/riscv/include/asm/hugetlb.h
> +++ b/arch/riscv/include/asm/hugetlb.h
> @@ -2,7 +2,6 @@
>  #ifndef _ASM_RISCV_HUGETLB_H
>  #define _ASM_RISCV_HUGETLB_H
>  
> -#include <asm-generic/hugetlb.h>
>  #include <asm/page.h>
>  
>  static inline void arch_clear_hugepage_flags(struct page *page)
> @@ -11,4 +10,37 @@ static inline void arch_clear_hugepage_flags(struct page *page)
>  }
>  #define arch_clear_hugepage_flags arch_clear_hugepage_flags
>  
> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
> +#define __HAVE_ARCH_HUGE_PTE_CLEAR
> +void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
> +		    pte_t *ptep, unsigned long sz);
> +
> +#define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
> +void set_huge_pte_at(struct mm_struct *mm,
> +		     unsigned long addr, pte_t *ptep, pte_t pte);
> +
> +#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
> +pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> +			      unsigned long addr, pte_t *ptep);
> +
> +#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
> +pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> +			    unsigned long addr, pte_t *ptep);
> +
> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
> +void huge_ptep_set_wrprotect(struct mm_struct *mm,
> +			     unsigned long addr, pte_t *ptep);
> +
> +#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
> +int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> +			       unsigned long addr, pte_t *ptep,
> +			       pte_t pte, int dirty);
> +
> +pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags);
> +#define arch_make_huge_pte arch_make_huge_pte
> +
> +#endif /*CONFIG_RISCV_ISA_SVNAPOT*/
> +
> +#include <asm-generic/hugetlb.h>
> +
>  #endif /* _ASM_RISCV_HUGETLB_H */
> diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
> index 932dadfdca54..49f92f8cd431 100644
> --- a/arch/riscv/mm/hugetlbpage.c
> +++ b/arch/riscv/mm/hugetlbpage.c
> @@ -2,6 +2,301 @@
>  #include <linux/hugetlb.h>
>  #include <linux/err.h>
>  
> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
> +pte_t *huge_pte_alloc(struct mm_struct *mm,
> +		      struct vm_area_struct *vma,
> +		      unsigned long addr,
> +		      unsigned long sz)
> +{
> +	pgd_t *pgd;
> +	p4d_t *p4d;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte = NULL;
> +	unsigned long order;

Since it's Christmas time I'll make the nit that reverse fir tree is
preferred[1]. 

[1] https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#variable-declarations

> +
> +	pgd = pgd_offset(mm, addr);
> +	p4d = p4d_alloc(mm, pgd, addr);
> +	if (!p4d)
> +		return NULL;
> +
> +	pud = pud_alloc(mm, p4d, addr);
> +	if (!pud)
> +		return NULL;
> +
> +	if (sz == PUD_SIZE) {
> +		pte = (pte_t *)pud;
> +		goto out;
> +	}
> +
> +	if (sz == PMD_SIZE) {
> +		if (want_pmd_share(vma, addr) && pud_none(*pud))
> +			pte = huge_pmd_share(mm, vma, addr, pud);
> +		else
> +			pte = (pte_t *)pmd_alloc(mm, pud, addr);
> +		goto out;
> +	}
> +
> +	pmd = pmd_alloc(mm, pud, addr);
> +	if (!pmd)
> +		return NULL;
> +
> +	for_each_napot_order(order) {
> +		if (napot_cont_size(order) == sz) {
> +			pte = pte_alloc_map(mm, pmd, (addr & napot_cont_mask(order)));

nit: No need for the () in the 3rd parameter

> +			break;
> +		}
> +	}
> +
> +out:
> +	WARN_ON_ONCE(pte && pte_present(*pte) && !pte_huge(*pte));
> +	return pte;
> +}
> +
> +pte_t *huge_pte_offset(struct mm_struct *mm,
> +		       unsigned long addr,
> +		       unsigned long sz)
> +{
> +	pgd_t *pgd;
> +	p4d_t *p4d;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte = NULL;
> +	unsigned long order;
> +
> +	pgd = pgd_offset(mm, addr);
> +	if (!pgd_present(*pgd))
> +		return NULL;

nit: Add blank line here

> +	p4d = p4d_offset(pgd, addr);
> +	if (!p4d_present(*p4d))
> +		return NULL;
> +
> +	pud = pud_offset(p4d, addr);
> +	if (sz == PUD_SIZE)
> +		/* must be pud huge, non-present or none */
> +		return (pte_t *)pud;

nit: Add blank line here

> +	if (!pud_present(*pud))
> +		return NULL;
> +
> +	pmd = pmd_offset(pud, addr);
> +	if (sz == PMD_SIZE)
> +		/* must be pmd huge, non-present or none */
> +		return (pte_t *)pmd;

nit: Add blank line here

> +	if (!pmd_present(*pmd))
> +		return NULL;
> +
> +	for_each_napot_order(order) {
> +		if (napot_cont_size(order) == sz) {
> +			pte = pte_offset_kernel(pmd, (addr & napot_cont_mask(order)));

nit: extra ()

> +			break;
> +		}
> +	}

nit: Add blank line here

> +	return pte;
> +}
> +
> +static pte_t get_clear_contig(struct mm_struct *mm,
> +			      unsigned long addr,
> +			      pte_t *ptep,
> +			      unsigned long pte_num)
> +{
> +	pte_t orig_pte = ptep_get(ptep);
> +	unsigned long i;
> +
> +	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++) {
> +		pte_t pte = ptep_get_and_clear(mm, addr, ptep);
> +
> +		if (pte_dirty(pte))
> +			orig_pte = pte_mkdirty(orig_pte);
> +
> +		if (pte_young(pte))
> +			orig_pte = pte_mkyoung(orig_pte);
> +	}

nit: Add blank line here

> +	return orig_pte;
> +}
> +
> +static pte_t get_clear_contig_flush(struct mm_struct *mm,
> +				    unsigned long addr,
> +				    pte_t *ptep,
> +				    unsigned long pte_num)
> +{
> +	pte_t orig_pte = get_clear_contig(mm, addr, ptep, pte_num);
> +	bool valid = !pte_none(orig_pte);
> +	struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
> +
> +	if (valid)
> +		flush_tlb_range(&vma, addr, addr + (PAGE_SIZE * pte_num));
> +
> +	return orig_pte;
> +}
> +
> +pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
> +{
> +	unsigned long order;
> +
> +	for_each_napot_order(order) {
> +		if (shift == napot_cont_shift(order)) {
> +			entry = pte_mknapot(entry, order);
> +			break;
> +		}
> +	}
> +	if (order == NAPOT_ORDER_MAX)
> +		entry = pte_mkhuge(entry);
> +
> +	return entry;
> +}
> +
> +void set_huge_pte_at(struct mm_struct *mm,
> +		     unsigned long addr,
> +		     pte_t *ptep,
> +		     pte_t pte)
> +{
> +	int i;
> +	int pte_num;
> +
> +	if (!pte_napot(pte)) {
> +		set_pte_at(mm, addr, ptep, pte);
> +		return;
> +	}
> +
> +	pte_num = napot_pte_num(napot_cont_order(pte));
> +	for (i = 0; i < pte_num; i++, ptep++, addr += PAGE_SIZE)
> +		set_pte_at(mm, addr, ptep, pte);
> +}
> +
> +int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> +			       unsigned long addr,
> +			       pte_t *ptep,
> +			       pte_t pte,
> +			       int dirty)
> +{
> +	pte_t orig_pte;
> +	int i;
> +	int pte_num;
> +	struct mm_struct *mm = vma->vm_mm;
> +
> +	if (!pte_napot(pte))
> +		return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
> +
> +	pte_num = napot_pte_num(napot_cont_order(pte));
> +	ptep = huge_pte_offset(mm, addr,
> +			       napot_cont_size(napot_cont_order(pte)));

nit:
	order = napot_cont_order(pte);
	pte_num = napot_pte_num(order);
	ptep = huge_pte_offset(mm, addr, napot_cont_size(order));

> +	orig_pte = get_clear_contig_flush(mm, addr, ptep, pte_num);
> +
> +	if (pte_dirty(orig_pte))
> +		pte = pte_mkdirty(pte);
> +
> +	if (pte_young(orig_pte))
> +		pte = pte_mkyoung(pte);
> +
> +	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++)
> +		set_pte_at(mm, addr, ptep, pte);
> +
> +	return true;
> +}
> +
> +pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
> +			      unsigned long addr,
> +			      pte_t *ptep)
> +{
> +	int pte_num;
> +	pte_t orig_pte = ptep_get(ptep);
> +
> +	if (!pte_napot(orig_pte))
> +		return ptep_get_and_clear(mm, addr, ptep);
> +
> +	pte_num = napot_pte_num(napot_cont_order(orig_pte));
> +
> +	return get_clear_contig(mm, addr, ptep, pte_num);
> +}
> +
> +void huge_ptep_set_wrprotect(struct mm_struct *mm,
> +			     unsigned long addr,
> +			     pte_t *ptep)
> +{
> +	int i;
> +	int pte_num;
> +	pte_t pte = ptep_get(ptep);
> +
> +	if (!pte_napot(pte)) {
> +		ptep_set_wrprotect(mm, addr, ptep);
> +		return;
> +	}
> +
> +	pte_num = napot_pte_num(napot_cont_order(pte));
> +	ptep = huge_pte_offset(mm, addr, napot_cont_size(napot_cont_order(pte)));

Same "use an 'order' variable" nit as above

> +
> +	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++)
> +		ptep_set_wrprotect(mm, addr, ptep);
> +}
> +
> +pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> +			    unsigned long addr,
> +			    pte_t *ptep)
> +{
> +	int pte_num;
> +	pte_t pte = ptep_get(ptep);
> +
> +	if (!pte_napot(pte))
> +		return ptep_clear_flush(vma, addr, ptep);
> +
> +	pte_num = napot_pte_num(napot_cont_order(pte));
> +
> +	return get_clear_contig_flush(vma->vm_mm, addr, ptep, pte_num);
> +}
> +
> +void huge_pte_clear(struct mm_struct *mm,
> +		    unsigned long addr,
> +		    pte_t *ptep,
> +		    unsigned long sz)
> +{
> +	int i, pte_num;
> +	pte_t pte = READ_ONCE(*ptep);
> +
> +	if (!pte_napot(pte)) {
> +		pte_clear(mm, addr, ptep);
> +		return;
> +	}
> +
> +	pte_num = napot_pte_num(napot_cont_order(pte));
> +	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++)
> +		pte_clear(mm, addr, ptep);
> +}
> +
> +bool __init is_napot_size(unsigned long size)
> +{
> +	unsigned long order;
> +
> +	if (!has_svnapot())
> +		return false;
> +
> +	for_each_napot_order(order) {
> +		if (size == napot_cont_size(order))
> +			return true;
> +	}
> +	return false;
> +}
> +
> +static __init int napot_hugetlbpages_init(void)
> +{
> +	if (has_svnapot()) {
> +		unsigned long order;
> +
> +		for_each_napot_order(order)
> +			hugetlb_add_hstate(order);
> +	}
> +	return 0;
> +}
> +arch_initcall(napot_hugetlbpages_init);
> +
> +#else
> +
> +bool __init is_napot_size(unsigned long size)
> +{
> +	return false;
> +}
> +
> +#endif /*CONFIG_RISCV_ISA_SVNAPOT*/
> +
>  int pud_huge(pud_t pud)
>  {
>  	return pud_leaf(pud);
> @@ -18,6 +313,8 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
>  		return true;
>  	else if (IS_ENABLED(CONFIG_64BIT) && size == PUD_SIZE)
>  		return true;
> +	else if (is_napot_size(size))
> +		return true;
>  	else
>  		return false;
>  }
> -- 
> 2.37.4
>

Besides the nits,

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>

Thanks,
drew

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv


* Re: [PATCH v9 3/3] riscv: mm: support Svnapot in huge vmap
  2022-12-04 14:11 ` [PATCH v9 3/3] riscv: mm: support Svnapot in huge vmap panqinglin2020
  2022-12-07 18:59   ` Conor Dooley
@ 2022-12-08 15:22   ` Andrew Jones
  2022-12-08 17:39     ` Qinglin Pan
  1 sibling, 1 reply; 22+ messages in thread
From: Andrew Jones @ 2022-12-08 15:22 UTC (permalink / raw)
  To: panqinglin2020; +Cc: palmer, linux-riscv, jeff, xuyinan, conor, alex, jszhang

On Sun, Dec 04, 2022 at 10:11:37PM +0800, panqinglin2020@iscas.ac.cn wrote:
> From: Qinglin Pan <panqinglin2020@iscas.ac.cn>
> 
> As HAVE_ARCH_HUGE_VMAP and HAVE_ARCH_HUGE_VMALLOC are supported, we can
> implement arch_vmap_pte_range_map_size and arch_vmap_pte_supported_shift
> for Svnapot to support huge vmap with napot sizes.
> 
> It can be tested by the huge vmap used in PCI drivers. Huge vmalloc with
> svnapot can be tested by changing the test_vmalloc module like [1], and
> probing this module to run fix_size_alloc_test with use_huge true.
> 
> [1]https://github.com/RV-VM/linux-vm-support/commit/33f4ee399c36d355c412ebe5334ca46fd727f8f5

Do you plan to post this test_vmalloc.c patch?

Thanks,
drew



* Re: [PATCH v9 1/3] riscv: mm: modify pte format for Svnapot
  2022-12-08  4:51     ` Qinglin Pan
  2022-12-08  4:59       ` Conor.Dooley
@ 2022-12-08 17:34       ` Qinglin Pan
  2022-12-09  7:42         ` Andrew Jones
  1 sibling, 1 reply; 22+ messages in thread
From: Qinglin Pan @ 2022-12-08 17:34 UTC (permalink / raw)
  To: linux-riscv, Andrew Jones, Conor Dooley, palmer, jeff

Hi all,

On 2022/12/8 12:51, Qinglin Pan wrote:
> Hey!
> 
> On 2022/12/8 02:46, Conor Dooley wrote:
>> Hey!
>>
>> Couple small remarks and questions for you.
>>
>> On Sun, Dec 04, 2022 at 10:11:35PM +0800, panqinglin2020@iscas.ac.cn 
>> wrote:
>>> From: Qinglin Pan <panqinglin2020@iscas.ac.cn>
>>>
>>> Add one alternative to enable/disable svnapot support, and enable this
>>> static key when "svnapot" is in the "riscv,isa" field of the fdt and
>>> the SVNAPOT compile option is set. It will influence the behavior of
>>> has_svnapot. All code dependent on svnapot should first make sure that
>>> has_svnapot returns true.
>>>
>>> Modify the PTE definition for Svnapot, and create some functions in
>>> pgtable.h to mark a PTE as napot and check if it is a Svnapot PTE.
>>> Until now, only the 64KB napot size is supported in the spec, so some
>>> macros have only a 64KB version.
>>>
>>> Signed-off-by: Qinglin Pan <panqinglin2020@iscas.ac.cn>
>>>
>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>> index ef8d66de5f38..1d8477c0af7c 100644
>>> --- a/arch/riscv/Kconfig
>>> +++ b/arch/riscv/Kconfig
>>> @@ -395,6 +395,20 @@ config RISCV_ISA_C
>>>         If you don't know what to do here, say Y.
>>> +config RISCV_ISA_SVNAPOT
>>> +    bool "SVNAPOT extension support"
>>> +    depends on 64BIT && MMU
>>> +    select RISCV_ALTERNATIVE
>>> +    default y
>>> +    help
>>> +      Allow kernel to detect SVNAPOT ISA-extension dynamically in 
>>> boot time
>>> +      and enable its usage.
>>> +
>>> +      SVNAPOT extension helps to mark contiguous PTEs as a range
>>> +      of contiguous virtual-to-physical translations, with a naturally
>>> +      aligned power-of-2 (NAPOT) granularity larger than the base 
>>> 4KB page
>>> +      size.
>>> +
>>>   config RISCV_ISA_SVPBMT
>>>       bool "SVPBMT extension support"
>>>       depends on 64BIT && MMU
>>> diff --git a/arch/riscv/include/asm/errata_list.h 
>>> b/arch/riscv/include/asm/errata_list.h
>>> index 4180312d2a70..beadb1126ed9 100644
>>> --- a/arch/riscv/include/asm/errata_list.h
>>> +++ b/arch/riscv/include/asm/errata_list.h
>>> @@ -22,9 +22,10 @@
>>>   #define    ERRATA_THEAD_NUMBER 3
>>>   #endif
>>> -#define    CPUFEATURE_SVPBMT 0
>>> -#define    CPUFEATURE_ZICBOM 1
>>> -#define    CPUFEATURE_NUMBER 2
>>> +#define    CPUFEATURE_SVPBMT    0
>>> +#define    CPUFEATURE_ZICBOM    1
>>> +#define    CPUFEATURE_SVNAPOT    2
>>> +#define    CPUFEATURE_NUMBER    3
>>>   #ifdef __ASSEMBLY__
>>> @@ -156,6 +157,14 @@ asm volatile(ALTERNATIVE(                        \
>>>       : "=r" (__ovl) :                        \
>>>       : "memory")
>>> +#define ALT_SVNAPOT(_val)                        \
>>> +asm(ALTERNATIVE(                            \
>>> +    "li %0, 0",                            \
>>> +    "li %0, 1",                            \
>>> +        0, CPUFEATURE_SVNAPOT, CONFIG_RISCV_ISA_SVNAPOT)    \
>>> +    : "=r" (_val) :                            \
>>> +    : "memory")
>>> +
>>>   #endif /* __ASSEMBLY__ */
>>>   #endif
>>> diff --git a/arch/riscv/include/asm/hwcap.h 
>>> b/arch/riscv/include/asm/hwcap.h
>>> index b22525290073..4cbc1f45ab26 100644
>>> --- a/arch/riscv/include/asm/hwcap.h
>>> +++ b/arch/riscv/include/asm/hwcap.h
>>> @@ -54,6 +54,7 @@ extern unsigned long elf_hwcap;
>>>    */
>>>   enum riscv_isa_ext_id {
>>>       RISCV_ISA_EXT_SSCOFPMF = RISCV_ISA_EXT_BASE,
>>> +    RISCV_ISA_EXT_SVNAPOT,
>>>       RISCV_ISA_EXT_SVPBMT,
>>>       RISCV_ISA_EXT_ZICBOM,
>>>       RISCV_ISA_EXT_ZIHINTPAUSE,
>>> @@ -87,7 +88,6 @@ static __always_inline int riscv_isa_ext2key(int num)
>>>   {
>>>       switch (num) {
>>>       case RISCV_ISA_EXT_f:
>>> -        return RISCV_ISA_EXT_KEY_FPU;
>>>       case RISCV_ISA_EXT_d:
>>>           return RISCV_ISA_EXT_KEY_FPU;
>>>       case RISCV_ISA_EXT_ZIHINTPAUSE:
>>> diff --git a/arch/riscv/include/asm/page.h 
>>> b/arch/riscv/include/asm/page.h
>>> index ac70b0fd9a9a..349fad5e35de 100644
>>> --- a/arch/riscv/include/asm/page.h
>>> +++ b/arch/riscv/include/asm/page.h
>>> @@ -16,11 +16,6 @@
>>>   #define PAGE_SIZE    (_AC(1, UL) << PAGE_SHIFT)
>>>   #define PAGE_MASK    (~(PAGE_SIZE - 1))
>>> -#ifdef CONFIG_64BIT
>>> -#define HUGE_MAX_HSTATE        2
>>> -#else
>>> -#define HUGE_MAX_HSTATE        1
>>> -#endif
>>>   #define HPAGE_SHIFT        PMD_SHIFT
>>>   #define HPAGE_SIZE        (_AC(1, UL) << HPAGE_SHIFT)
>>>   #define HPAGE_MASK              (~(HPAGE_SIZE - 1))
>>> diff --git a/arch/riscv/include/asm/pgtable-64.h 
>>> b/arch/riscv/include/asm/pgtable-64.h
>>> index dc42375c2357..9611833907ec 100644
>>> --- a/arch/riscv/include/asm/pgtable-64.h
>>> +++ b/arch/riscv/include/asm/pgtable-64.h
>>> @@ -74,6 +74,40 @@ typedef struct {
>>>    */
>>>   #define _PAGE_PFN_MASK  GENMASK(53, 10)
>>> +/*
>>> + * [63] Svnapot definitions:
>>> + * 0 Svnapot disabled
>>> + * 1 Svnapot enabled
>>> + */
>>> +#define _PAGE_NAPOT_SHIFT    63
>>> +#define _PAGE_NAPOT        BIT(_PAGE_NAPOT_SHIFT)
>>> +/*
>>> + * Only 64KB (order 4) napot ptes supported.
>>> + */
>>> +#define NAPOT_CONT_ORDER_BASE 4
>>> +enum napot_cont_order {
>>> +    NAPOT_CONT64KB_ORDER = NAPOT_CONT_ORDER_BASE,
>>> +    NAPOT_ORDER_MAX,
>>> +};
>>> +
>>> +#define for_each_napot_order(order)                        \
>>> +    for (order = NAPOT_CONT_ORDER_BASE; order < NAPOT_ORDER_MAX; 
>>> order++)
>>> +#define for_each_napot_order_rev(order)                        \
>>
>> Would it be terrible to do s/rev/reverse to make things more obvious?
> 
> Maybe we should just keep it in the same style as existing macros like
> for_each_mem_range/for_each_mem_range_rev :)
> 
>>
>>> +    for (order = NAPOT_ORDER_MAX - 1;                    \
>>> +         order >= NAPOT_CONT_ORDER_BASE; order--)
>>> +#define napot_cont_order(val)    (__builtin_ctzl((val.pte >> 
>>> _PAGE_PFN_SHIFT) << 1))
>>> +
>>> +#define napot_cont_shift(order)    ((order) + PAGE_SHIFT)
>>> +#define napot_cont_size(order)    BIT(napot_cont_shift(order))
>>> +#define napot_cont_mask(order)    (~(napot_cont_size(order) - 1UL))
>>> +#define napot_pte_num(order)    BIT(order)
>>> +
>>> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
>>> +#define HUGE_MAX_HSTATE        (2 + (NAPOT_ORDER_MAX - 
>>> NAPOT_CONT_ORDER_BASE))
>>> +#else
>>> +#define HUGE_MAX_HSTATE        2
>>> +#endif
>>
>> This is coming from a place of *complete* ignorance:
>> If CONFIG_RISCV_ISA_SVNAPOT is enabled in the kernel but there is no
>> support for it on a platform is it okay to change the value of
>> HUGE_MAX_HSTATE? Apologies if I've missed something obvious.
> 
> :(
> You are right. If CONFIG_RISCV_ISA_SVNAPOT is enabled, HUGE_MAX_HSTATE
> will always be 3 whether has_svnapot() is true or false. I will try to
> fix this.

I really hoped that HUGE_MAX_HSTATE could change according to the
platform situation when CONFIG_RISCV_ISA_SVNAPOT is enabled. But it
seems impossible to achieve this with a minor patch, because this value
is used as the length of some statically allocated arrays (for example,
hstates in mm/hugetlb.c:52) in arch-independent code :(

This immature HUGE_MAX_HSTATE value will cause no functional error, but
it does leave some objects only partially used when
CONFIG_RISCV_ISA_SVNAPOT is enabled and the platform has no svnapot
support :( I am not sure whether this patch can be accepted with this
behavior? Or is there any other helpful idea to fix this problem?

Please let me know if I have missed any information about it.

Regards,
Qinglin.

> 
>>
>>> +
>>>   /*
>>>    * [62:61] Svpbmt Memory Type definitions:
>>>    *
>>> diff --git a/arch/riscv/include/asm/pgtable.h 
>>> b/arch/riscv/include/asm/pgtable.h
>>> index c61ae83aadee..99957b1270f2 100644
>>> --- a/arch/riscv/include/asm/pgtable.h
>>> +++ b/arch/riscv/include/asm/pgtable.h
>>> @@ -6,10 +6,12 @@
>>>   #ifndef _ASM_RISCV_PGTABLE_H
>>>   #define _ASM_RISCV_PGTABLE_H
>>> +#include <linux/jump_label.h>
>>>   #include <linux/mmzone.h>
>>>   #include <linux/sizes.h>
>>>   #include <asm/pgtable-bits.h>
>>> +#include <asm/hwcap.h>
>>>   #ifndef CONFIG_MMU
>>>   #define KERNEL_LINK_ADDR    PAGE_OFFSET
>>> @@ -264,10 +266,49 @@ static inline pte_t pud_pte(pud_t pud)
>>>       return __pte(pud_val(pud));
>>>   }
>>> +static __always_inline bool has_svnapot(void)
>>> +{
>>> +    unsigned int _val;
>>> +
>>> +    ALT_SVNAPOT(_val);
>>> +
>>> +    return _val;
>>> +}
>>> +
>>> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
>>> +
>>> +static inline unsigned long pte_napot(pte_t pte)
>>> +{
>>> +    return pte_val(pte) & _PAGE_NAPOT;
>>> +}
>>> +
>>> +static inline pte_t pte_mknapot(pte_t pte, unsigned int order)
>>> +{
>>> +    int pos = order - 1 + _PAGE_PFN_SHIFT;
>>> +    unsigned long napot_bit = BIT(pos);
>>> +    unsigned long napot_mask = ~GENMASK(pos, _PAGE_PFN_SHIFT);
>>> +
>>> +    return __pte((pte_val(pte) & napot_mask) | napot_bit | 
>>> _PAGE_NAPOT);
>>> +}
>>> +
>>> +#else
>>> +
>>> +static inline unsigned long pte_napot(pte_t pte)
>>> +{
>>> +    return 0;
>>> +}
>>> +
>>> +#endif /* CONFIG_RISCV_ISA_SVNAPOT */
>>> +
>>>   /* Yields the page frame number (PFN) of a page table entry */
>>>   static inline unsigned long pte_pfn(pte_t pte)
>>>   {
>>> -    return __page_val_to_pfn(pte_val(pte));
>>> +    unsigned long res  = __page_val_to_pfn(pte_val(pte));
>>
>> nit: extra space before the =
> 
> Nice catch! Will remove this extra space :)
> 
>>
>>> +
>>> +    if (has_svnapot() && pte_napot(pte))
>>> +        res = res & (res - 1UL);
>>> +
>>> +    return res;
>>>   }
>>>   #define pte_page(x)     pfn_to_page(pte_pfn(x))
>>> diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
>>> index bf9dd6764bad..88495f5fcafd 100644
>>> --- a/arch/riscv/kernel/cpu.c
>>> +++ b/arch/riscv/kernel/cpu.c
>>> @@ -165,6 +165,7 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
>>>       __RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
>>>       __RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
>>>       __RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
>>> +    __RISCV_ISA_EXT_DATA(svnapot, RISCV_ISA_EXT_SVNAPOT),
>>>       __RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
>>>       __RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
>>>       __RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
>>> diff --git a/arch/riscv/kernel/cpufeature.c 
>>> b/arch/riscv/kernel/cpufeature.c
>>> index 694267d1fe81..eeed66c3d497 100644
>>> --- a/arch/riscv/kernel/cpufeature.c
>>> +++ b/arch/riscv/kernel/cpufeature.c
>>> @@ -205,6 +205,7 @@ void __init riscv_fill_hwcap(void)
>>>                   SET_ISA_EXT_MAP("zihintpause", 
>>> RISCV_ISA_EXT_ZIHINTPAUSE);
>>>                   SET_ISA_EXT_MAP("sstc", RISCV_ISA_EXT_SSTC);
>>>                   SET_ISA_EXT_MAP("svinval", RISCV_ISA_EXT_SVINVAL);
>>> +                SET_ISA_EXT_MAP("svnapot", RISCV_ISA_EXT_SVNAPOT);
>>>               }
>>>   #undef SET_ISA_EXT_MAP
>>>           }
>>> @@ -252,6 +253,17 @@ void __init riscv_fill_hwcap(void)
>>>   }
>>>   #ifdef CONFIG_RISCV_ALTERNATIVE
>>> +static bool __init_or_module cpufeature_probe_svnapot(unsigned int 
>>> stage)
>>> +{
>>> +    if (!IS_ENABLED(CONFIG_RISCV_ISA_SVNAPOT))
>>> +        return false;
>>> +
>>> +    if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
>>> +        return false;
>>> +
>>> +    return riscv_isa_extension_available(NULL, SVNAPOT);
>>> +}
>>> +
>>>   static bool __init_or_module cpufeature_probe_svpbmt(unsigned int 
>>> stage)
>>>   {
>>>       if (!IS_ENABLED(CONFIG_RISCV_ISA_SVPBMT))
>>> @@ -289,6 +301,9 @@ static u32 __init_or_module 
>>> cpufeature_probe(unsigned int stage)
>>>   {
>>>       u32 cpu_req_feature = 0;
>>> +    if (cpufeature_probe_svnapot(stage))
>>> +        cpu_req_feature |= BIT(CPUFEATURE_SVNAPOT);
>>> +
>>>       if (cpufeature_probe_svpbmt(stage))
>>>           cpu_req_feature |= BIT(CPUFEATURE_SVPBMT);
>>
>> There's a bunch of stuff in this patch that may go away, depending on
>> sequencing just FYI. See [1] for more. I wouldn't rebase on top of that,
>> but just so that you're aware.
> 
> Thanks for the information! If this patchset is merged first, Jisheng
> can easily replace the implementation of has_svnapot with his
> interface. Otherwise I will do the replacement :)
> 
> Thanks,
> Qinglin
> 
>>
>> 1 - 
>> https://patchwork.kernel.org/project/linux-riscv/cover/20221204174632.3677-1-jszhang@kernel.org/
> 
> 




* Re: [PATCH v9 3/3] riscv: mm: support Svnapot in huge vmap
  2022-12-08 15:22   ` Andrew Jones
@ 2022-12-08 17:39     ` Qinglin Pan
  0 siblings, 0 replies; 22+ messages in thread
From: Qinglin Pan @ 2022-12-08 17:39 UTC (permalink / raw)
  To: Andrew Jones; +Cc: palmer, linux-riscv, jeff, xuyinan, conor, alex, jszhang

Hey!

On 2022/12/8 23:22, Andrew Jones wrote:
> On Sun, Dec 04, 2022 at 10:11:37PM +0800, panqinglin2020@iscas.ac.cn wrote:
>> From: Qinglin Pan <panqinglin2020@iscas.ac.cn>
>>
>> As HAVE_ARCH_HUGE_VMAP and HAVE_ARCH_HUGE_VMALLOC are supported, we can
>> implement arch_vmap_pte_range_map_size and arch_vmap_pte_supported_shift
>> for Svnapot to support huge vmap with napot sizes.
>>
>> It can be tested by the huge vmap used in PCI drivers. Huge vmalloc with
>> svnapot can be tested by changing the test_vmalloc module like [1], and
>> probing this module to run fix_size_alloc_test with use_huge true.
>>
>> [1]https://github.com/RV-VM/linux-vm-support/commit/33f4ee399c36d355c412ebe5334ca46fd727f8f5
> 
> Do you plan to post this test_vmalloc.c patch?

Sure, I will do it if it makes sense :)
I will send a patch for it later.

Thanks,
Qinglin.

> 
> Thanks,
> drew




* Re: [PATCH v9 2/3] riscv: mm: support Svnapot in hugetlb page
  2022-12-08 15:17   ` Andrew Jones
@ 2022-12-08 17:43     ` Qinglin Pan
  0 siblings, 0 replies; 22+ messages in thread
From: Qinglin Pan @ 2022-12-08 17:43 UTC (permalink / raw)
  To: Andrew Jones; +Cc: palmer, linux-riscv, jeff, xuyinan, conor, alex, jszhang

Hi Andrew,

On 2022/12/8 23:17, Andrew Jones wrote:
> On Sun, Dec 04, 2022 at 10:11:36PM +0800, panqinglin2020@iscas.ac.cn wrote:
>> From: Qinglin Pan <panqinglin2020@iscas.ac.cn>
>>
>> Svnapot can be used to support 64KB hugetlb pages, so it can become a new
>> option when using hugetlbfs. Add a basic implementation of hugetlb pages,
>> and support 64KB as a size in it by using Svnapot.
>>
>> For testing, boot the kernel with a command line containing
>> "default_hugepagesz=64K hugepagesz=64K hugepages=20" and run a simple
>> test like this:
>>
>> tools/testing/selftests/vm/map_hugetlb 1 16
>>
>> And it should pass.
>>
>> Signed-off-by: Qinglin Pan <panqinglin2020@iscas.ac.cn>
>>
>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>> index 1d8477c0af7c..be5c1edea70f 100644
>> --- a/arch/riscv/Kconfig
>> +++ b/arch/riscv/Kconfig
>> @@ -43,7 +43,7 @@ config RISCV
>>   	select ARCH_USE_QUEUED_RWLOCKS
>>   	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
>>   	select ARCH_WANT_FRAME_POINTERS
>> -	select ARCH_WANT_GENERAL_HUGETLB
>> +	select ARCH_WANT_GENERAL_HUGETLB if !RISCV_ISA_SVNAPOT
>>   	select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
>>   	select ARCH_WANTS_THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
>>   	select BINFMT_FLAT_NO_DATA_START_OFFSET if !MMU
>> diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
>> index ec19d6afc896..fe6f23006641 100644
>> --- a/arch/riscv/include/asm/hugetlb.h
>> +++ b/arch/riscv/include/asm/hugetlb.h
>> @@ -2,7 +2,6 @@
>>   #ifndef _ASM_RISCV_HUGETLB_H
>>   #define _ASM_RISCV_HUGETLB_H
>>   
>> -#include <asm-generic/hugetlb.h>
>>   #include <asm/page.h>
>>   
>>   static inline void arch_clear_hugepage_flags(struct page *page)
>> @@ -11,4 +10,37 @@ static inline void arch_clear_hugepage_flags(struct page *page)
>>   }
>>   #define arch_clear_hugepage_flags arch_clear_hugepage_flags
>>   
>> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
>> +#define __HAVE_ARCH_HUGE_PTE_CLEAR
>> +void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
>> +		    pte_t *ptep, unsigned long sz);
>> +
>> +#define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
>> +void set_huge_pte_at(struct mm_struct *mm,
>> +		     unsigned long addr, pte_t *ptep, pte_t pte);
>> +
>> +#define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
>> +pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>> +			      unsigned long addr, pte_t *ptep);
>> +
>> +#define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
>> +pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
>> +			    unsigned long addr, pte_t *ptep);
>> +
>> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
>> +void huge_ptep_set_wrprotect(struct mm_struct *mm,
>> +			     unsigned long addr, pte_t *ptep);
>> +
>> +#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
>> +int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>> +			       unsigned long addr, pte_t *ptep,
>> +			       pte_t pte, int dirty);
>> +
>> +pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags);
>> +#define arch_make_huge_pte arch_make_huge_pte
>> +
>> +#endif /*CONFIG_RISCV_ISA_SVNAPOT*/
>> +
>> +#include <asm-generic/hugetlb.h>
>> +
>>   #endif /* _ASM_RISCV_HUGETLB_H */
>> diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
>> index 932dadfdca54..49f92f8cd431 100644
>> --- a/arch/riscv/mm/hugetlbpage.c
>> +++ b/arch/riscv/mm/hugetlbpage.c
>> @@ -2,6 +2,301 @@
>>   #include <linux/hugetlb.h>
>>   #include <linux/err.h>
>>   
>> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
>> +pte_t *huge_pte_alloc(struct mm_struct *mm,
>> +		      struct vm_area_struct *vma,
>> +		      unsigned long addr,
>> +		      unsigned long sz)
>> +{
>> +	pgd_t *pgd;
>> +	p4d_t *p4d;
>> +	pud_t *pud;
>> +	pmd_t *pmd;
>> +	pte_t *pte = NULL;
>> +	unsigned long order;
> 
> Since it's Christmas time I'll make the nit that reverse fir tree is
> preferred[1].
> 
> [1] https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#variable-declarations

Got it :)

> 
>> +
>> +	pgd = pgd_offset(mm, addr);
>> +	p4d = p4d_alloc(mm, pgd, addr);
>> +	if (!p4d)
>> +		return NULL;
>> +
>> +	pud = pud_alloc(mm, p4d, addr);
>> +	if (!pud)
>> +		return NULL;
>> +
>> +	if (sz == PUD_SIZE) {
>> +		pte = (pte_t *)pud;
>> +		goto out;
>> +	}
>> +
>> +	if (sz == PMD_SIZE) {
>> +		if (want_pmd_share(vma, addr) && pud_none(*pud))
>> +			pte = huge_pmd_share(mm, vma, addr, pud);
>> +		else
>> +			pte = (pte_t *)pmd_alloc(mm, pud, addr);
>> +		goto out;
>> +	}
>> +
>> +	pmd = pmd_alloc(mm, pud, addr);
>> +	if (!pmd)
>> +		return NULL;
>> +
>> +	for_each_napot_order(order) {
>> +		if (napot_cont_size(order) == sz) {
>> +			pte = pte_alloc_map(mm, pmd, (addr & napot_cont_mask(order)));
> 
> nit: No need for the () in the 3rd parameter
> 
>> +			break;
>> +		}
>> +	}
>> +
>> +out:
>> +	WARN_ON_ONCE(pte && pte_present(*pte) && !pte_huge(*pte));
>> +	return pte;
>> +}
>> +
>> +pte_t *huge_pte_offset(struct mm_struct *mm,
>> +		       unsigned long addr,
>> +		       unsigned long sz)
>> +{
>> +	pgd_t *pgd;
>> +	p4d_t *p4d;
>> +	pud_t *pud;
>> +	pmd_t *pmd;
>> +	pte_t *pte = NULL;
>> +	unsigned long order;
>> +
>> +	pgd = pgd_offset(mm, addr);
>> +	if (!pgd_present(*pgd))
>> +		return NULL;
> 
> nit: Add blank line here
> 
>> +	p4d = p4d_offset(pgd, addr);
>> +	if (!p4d_present(*p4d))
>> +		return NULL;
>> +
>> +	pud = pud_offset(p4d, addr);
>> +	if (sz == PUD_SIZE)
>> +		/* must be pud huge, non-present or none */
>> +		return (pte_t *)pud;
> 
> nit: Add blank line here
> 
>> +	if (!pud_present(*pud))
>> +		return NULL;
>> +
>> +	pmd = pmd_offset(pud, addr);
>> +	if (sz == PMD_SIZE)
>> +		/* must be pmd huge, non-present or none */
>> +		return (pte_t *)pmd;
> 
> nit: Add blank line here
> 
>> +	if (!pmd_present(*pmd))
>> +		return NULL;
>> +
>> +	for_each_napot_order(order) {
>> +		if (napot_cont_size(order) == sz) {
>> +			pte = pte_offset_kernel(pmd, (addr & napot_cont_mask(order)));
> 
> nit: extra ()
> 
>> +			break;
>> +		}
>> +	}
> 
> nit: Add blank line here
> 
>> +	return pte;
>> +}
>> +
>> +static pte_t get_clear_contig(struct mm_struct *mm,
>> +			      unsigned long addr,
>> +			      pte_t *ptep,
>> +			      unsigned long pte_num)
>> +{
>> +	pte_t orig_pte = ptep_get(ptep);
>> +	unsigned long i;
>> +
>> +	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++) {
>> +		pte_t pte = ptep_get_and_clear(mm, addr, ptep);
>> +
>> +		if (pte_dirty(pte))
>> +			orig_pte = pte_mkdirty(orig_pte);
>> +
>> +		if (pte_young(pte))
>> +			orig_pte = pte_mkyoung(orig_pte);
>> +	}
> 
> nit: Add blank line here
> 
>> +	return orig_pte;
>> +}
>> +
>> +static pte_t get_clear_contig_flush(struct mm_struct *mm,
>> +				    unsigned long addr,
>> +				    pte_t *ptep,
>> +				    unsigned long pte_num)
>> +{
>> +	pte_t orig_pte = get_clear_contig(mm, addr, ptep, pte_num);
>> +	bool valid = !pte_none(orig_pte);
>> +	struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
>> +
>> +	if (valid)
>> +		flush_tlb_range(&vma, addr, addr + (PAGE_SIZE * pte_num));
>> +
>> +	return orig_pte;
>> +}
>> +
>> +pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
>> +{
>> +	unsigned long order;
>> +
>> +	for_each_napot_order(order) {
>> +		if (shift == napot_cont_shift(order)) {
>> +			entry = pte_mknapot(entry, order);
>> +			break;
>> +		}
>> +	}
>> +	if (order == NAPOT_ORDER_MAX)
>> +		entry = pte_mkhuge(entry);
>> +
>> +	return entry;
>> +}
>> +
>> +void set_huge_pte_at(struct mm_struct *mm,
>> +		     unsigned long addr,
>> +		     pte_t *ptep,
>> +		     pte_t pte)
>> +{
>> +	int i;
>> +	int pte_num;
>> +
>> +	if (!pte_napot(pte)) {
>> +		set_pte_at(mm, addr, ptep, pte);
>> +		return;
>> +	}
>> +
>> +	pte_num = napot_pte_num(napot_cont_order(pte));
>> +	for (i = 0; i < pte_num; i++, ptep++, addr += PAGE_SIZE)
>> +		set_pte_at(mm, addr, ptep, pte);
>> +}
>> +
>> +int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>> +			       unsigned long addr,
>> +			       pte_t *ptep,
>> +			       pte_t pte,
>> +			       int dirty)
>> +{
>> +	pte_t orig_pte;
>> +	int i;
>> +	int pte_num;
>> +	struct mm_struct *mm = vma->vm_mm;
>> +
>> +	if (!pte_napot(pte))
>> +		return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
>> +
>> +	pte_num = napot_pte_num(napot_cont_order(pte));
>> +	ptep = huge_pte_offset(mm, addr,
>> +			       napot_cont_size(napot_cont_order(pte)));
> 
> nit:
> 	order = napot_cont_order(pte);
> 	pte_num = napot_pte_num(order);
> 	ptep = huge_pte_offset(mm, addr, napot_cont_size(order));
> 
>> +	orig_pte = get_clear_contig_flush(mm, addr, ptep, pte_num);
>> +
>> +	if (pte_dirty(orig_pte))
>> +		pte = pte_mkdirty(pte);
>> +
>> +	if (pte_young(orig_pte))
>> +		pte = pte_mkyoung(pte);
>> +
>> +	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++)
>> +		set_pte_at(mm, addr, ptep, pte);
>> +
>> +	return true;
>> +}
>> +
>> +pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>> +			      unsigned long addr,
>> +			      pte_t *ptep)
>> +{
>> +	int pte_num;
>> +	pte_t orig_pte = ptep_get(ptep);
>> +
>> +	if (!pte_napot(orig_pte))
>> +		return ptep_get_and_clear(mm, addr, ptep);
>> +
>> +	pte_num = napot_pte_num(napot_cont_order(orig_pte));
>> +
>> +	return get_clear_contig(mm, addr, ptep, pte_num);
>> +}
>> +
>> +void huge_ptep_set_wrprotect(struct mm_struct *mm,
>> +			     unsigned long addr,
>> +			     pte_t *ptep)
>> +{
>> +	int i;
>> +	int pte_num;
>> +	pte_t pte = ptep_get(ptep);
>> +
>> +	if (!pte_napot(pte)) {
>> +		ptep_set_wrprotect(mm, addr, ptep);
>> +		return;
>> +	}
>> +
>> +	pte_num = napot_pte_num(napot_cont_order(pte));
>> +	ptep = huge_pte_offset(mm, addr, napot_cont_size(napot_cont_order(pte)));
> 
> Same use an 'order' variable nit as above
> 
>> +
>> +	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++)
>> +		ptep_set_wrprotect(mm, addr, ptep);
>> +}
>> +
>> +pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
>> +			    unsigned long addr,
>> +			    pte_t *ptep)
>> +{
>> +	int pte_num;
>> +	pte_t pte = ptep_get(ptep);
>> +
>> +	if (!pte_napot(pte))
>> +		return ptep_clear_flush(vma, addr, ptep);
>> +
>> +	pte_num = napot_pte_num(napot_cont_order(pte));
>> +
>> +	return get_clear_contig_flush(vma->vm_mm, addr, ptep, pte_num);
>> +}
>> +
>> +void huge_pte_clear(struct mm_struct *mm,
>> +		    unsigned long addr,
>> +		    pte_t *ptep,
>> +		    unsigned long sz)
>> +{
>> +	int i, pte_num;
>> +	pte_t pte = READ_ONCE(*ptep);
>> +
>> +	if (!pte_napot(pte)) {
>> +		pte_clear(mm, addr, ptep);
>> +		return;
>> +	}
>> +
>> +	pte_num = napot_pte_num(napot_cont_order(pte));
>> +	for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++)
>> +		pte_clear(mm, addr, ptep);
>> +}
>> +
>> +bool __init is_napot_size(unsigned long size)
>> +{
>> +	unsigned long order;
>> +
>> +	if (!has_svnapot())
>> +		return false;
>> +
>> +	for_each_napot_order(order) {
>> +		if (size == napot_cont_size(order))
>> +			return true;
>> +	}
>> +	return false;
>> +}
>> +
>> +static __init int napot_hugetlbpages_init(void)
>> +{
>> +	if (has_svnapot()) {
>> +		unsigned long order;
>> +
>> +		for_each_napot_order(order)
>> +			hugetlb_add_hstate(order);
>> +	}
>> +	return 0;
>> +}
>> +arch_initcall(napot_hugetlbpages_init);
>> +
>> +#else
>> +
>> +bool __init is_napot_size(unsigned long size)
>> +{
>> +	return false;
>> +}
>> +
>> +#endif /*CONFIG_RISCV_ISA_SVNAPOT*/
>> +
>>   int pud_huge(pud_t pud)
>>   {
>>   	return pud_leaf(pud);
>> @@ -18,6 +313,8 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
>>   		return true;
>>   	else if (IS_ENABLED(CONFIG_64BIT) && size == PUD_SIZE)
>>   		return true;
>> +	else if (is_napot_size(size))
>> +		return true;
>>   	else
>>   		return false;
>>   }
>> -- 
>> 2.37.4
>>
> 
> Besides the nits,

Thanks a lot for all the nits! Will fix them in the next version.

> 
> Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
> 
> Thanks,
> drew

Thanks,
Qinglin.


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv


* Re: [PATCH v9 1/3] riscv: mm: modify pte format for Svnapot
  2022-12-08 17:34       ` Qinglin Pan
@ 2022-12-09  7:42         ` Andrew Jones
  2022-12-09 14:33           ` Conor Dooley
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Jones @ 2022-12-09  7:42 UTC (permalink / raw)
  To: Qinglin Pan; +Cc: linux-riscv, Conor Dooley, palmer, jeff

On Fri, Dec 09, 2022 at 01:34:42AM +0800, Qinglin Pan wrote:
...
> > > > +#ifdef CONFIG_RISCV_ISA_SVNAPOT
> > > > +#define HUGE_MAX_HSTATE        (2 + (NAPOT_ORDER_MAX -
> > > > NAPOT_CONT_ORDER_BASE))
> > > > +#else
> > > > +#define HUGE_MAX_HSTATE        2
> > > > +#endif
> > > 
> > > This is coming from a place of *complete* ignorance:
> > > If CONFIG_RISCV_ISA_SVNAPOT is enabled in the kernel but there is no
> > > support for it on a platform is it okay to change the value of
> > > HUGE_MAX_HSTATE? Apologies if I've missed something obvious.
> > 
> > :(
> > You are right. If CONFIG_RISCV_ISA_SVNAPOT is enabled, HUGE_MAX_HSTATE
> > will always be 3 whether has_svnapot() is true or false. I will try to
> > fix this.
> 
> I am really expecting that HUGE_MAX_HSTATE can change according to the
> platform situation when CONFIG_RISCV_ISA_SVNAPOT is enabled. But it
> seems impossible to achieve this with a minor patch, because this value
> is used as the length of some statically allocated arrays (for example,
> hstates in mm/hugetlb.c:52) in arch-independent code :(
> 
> This immature HUGE_MAX_HSTATE value will cause no functional error, but
> it does leave some objects only partially used when
> CONFIG_RISCV_ISA_SVNAPOT is enabled but the platform has no svnapot
> support :( I am not sure whether this patch can be accepted with this
> limitation? Or is there any other helpful idea to fix this problem?
> 
> Please let me know if I have missed any information about it.

I agree that the only problem appears to be a waste of memory, particularly
if cgroups are enabled. I don't see any easy solution either. Maybe a
warning in the Kconfig text would be sufficient?

Thanks,
drew


* Re: [PATCH v9 1/3] riscv: mm: modify pte format for Svnapot
  2022-12-09  7:42         ` Andrew Jones
@ 2022-12-09 14:33           ` Conor Dooley
  2022-12-09 15:36             ` Qinglin Pan
  0 siblings, 1 reply; 22+ messages in thread
From: Conor Dooley @ 2022-12-09 14:33 UTC (permalink / raw)
  To: Andrew Jones, Qinglin Pan; +Cc: linux-riscv, palmer, jeff



On 9 December 2022 08:42:29 GMT+01:00, Andrew Jones <ajones@ventanamicro.com> wrote:
>On Fri, Dec 09, 2022 at 01:34:42AM +0800, Qinglin Pan wrote:
>...
>> > > > +#ifdef CONFIG_RISCV_ISA_SVNAPOT
>> > > > +#define HUGE_MAX_HSTATE        (2 + (NAPOT_ORDER_MAX -
>> > > > NAPOT_CONT_ORDER_BASE))
>> > > > +#else
>> > > > +#define HUGE_MAX_HSTATE        2
>> > > > +#endif
>> > > 
>> > > This is coming from a place of *complete* ignorance:
>> > > If CONFIG_RISCV_ISA_SVNAPOT is enabled in the kernel but there is no
>> > > support for it on a platform is it okay to change the value of
>> > > HUGE_MAX_HSTATE? Apologies if I've missed something obvious.
>> > 
>> > :(
>> > You are right. If CONFIG_RISCV_ISA_SVNAPOT is enabled, HUGE_MAX_HSTATE
>> > will always be 3 whether has_svnapot() is true or false. I will try to
>> > fix this.
>> 
>> I am really expecting that HUGE_MAX_HSTATE can change according to the
>> platform situation when CONFIG_RISCV_ISA_SVNAPOT is enabled. But it
>> seems impossible to achive this with a minor patch. Because this value
>> is used as some static allocated arrays' length (for example, hstates in
>> mm/hugetlb.c:52) in arch-independent code :(
>> 
>> This immuture HUGE_MAX_HSTATE value will cause no functional error but
>> it really makes some objects used only partially when
>> CONFIG_RISCV_ISA_SVNAPOT is enabled but the platform has no svnapot
>> support :( I am not sure whether this patch can be accepted with this
>> feature? Or any other helpful idea to fix this problem?
>> 
>> Please let me know if I ignore any information about it.
>
>I agree that the only problem appears to a waste of memory, particularly

Caveat about ignorance still applies: how much of a waste of memory are we talking about?

>if cgroups are enabled. I don't see any easy solution either. Maybe a
>warning in the Kconfig text would be sufficient?

I think that entirely depends on how wasteful it is.
If it's minor, sure?

Thanks,
Conor.



* Re: [PATCH v9 1/3] riscv: mm: modify pte format for Svnapot
  2022-12-09 14:33           ` Conor Dooley
@ 2022-12-09 15:36             ` Qinglin Pan
  2022-12-09 17:04               ` Andrew Jones
  0 siblings, 1 reply; 22+ messages in thread
From: Qinglin Pan @ 2022-12-09 15:36 UTC (permalink / raw)
  To: Conor Dooley, Andrew Jones; +Cc: linux-riscv, palmer, jeff

Hi all,

On 2022/12/9 22:33, Conor Dooley wrote:
> 
> 
> On 9 December 2022 08:42:29 GMT+01:00, Andrew Jones <ajones@ventanamicro.com> wrote:
>> On Fri, Dec 09, 2022 at 01:34:42AM +0800, Qinglin Pan wrote:
>> ...
>>>>>> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
>>>>>> +#define HUGE_MAX_HSTATE        (2 + (NAPOT_ORDER_MAX -
>>>>>> NAPOT_CONT_ORDER_BASE))
>>>>>> +#else
>>>>>> +#define HUGE_MAX_HSTATE        2
>>>>>> +#endif
>>>>>
>>>>> This is coming from a place of *complete* ignorance:
>>>>> If CONFIG_RISCV_ISA_SVNAPOT is enabled in the kernel but there is no
>>>>> support for it on a platform is it okay to change the value of
>>>>> HUGE_MAX_HSTATE? Apologies if I've missed something obvious.
>>>>
>>>> :(
>>>> You are right. If CONFIG_RISCV_ISA_SVNAPOT is enabled, HUGE_MAX_HSTATE
>>>> will always be 3 whether has_svnapot() is true or false. I will try to
>>>> fix this.
>>>
>>> I am really expecting that HUGE_MAX_HSTATE can change according to the
>>> platform situation when CONFIG_RISCV_ISA_SVNAPOT is enabled. But it
>>> seems impossible to achive this with a minor patch. Because this value
>>> is used as some static allocated arrays' length (for example, hstates in
>>> mm/hugetlb.c:52) in arch-independent code :(
>>>
>>> This immuture HUGE_MAX_HSTATE value will cause no functional error but
>>> it really makes some objects used only partially when
>>> CONFIG_RISCV_ISA_SVNAPOT is enabled but the platform has no svnapot
>>> support :( I am not sure whether this patch can be accepted with this
>>> feature? Or any other helpful idea to fix this problem?
>>>
>>> Please let me know if I ignore any information about it.
>>
>> I agree that the only problem appears to a waste of memory, particularly
> 
> Caveat about ignorance still applies, how much of a waste of memory are we talking?
> 
>> if cgroups are enabled. I don't see any easy solution either. Maybe a
>> warning in the Kconfig text would be sufficient?
> 
> I think that entirely depends on how wasteful it is.
> If it's minor, sure?
> 

I am also not sure how large this waste will be, but I believe it will
be acceptable: before this patchset, we already used only half of the
HUGE_MAX_HSTATE value (which is 2) when CONFIG_64BIT is enabled and
CONFIG_CONTIG_ALLOC is disabled. As HUGE_MAX_HSTATE is a 'max' value,
it is normal for it not to be fully used on every configuration.

I prefer to do as Andrew suggested: add a note about this waste to the
Kconfig help text, and keep the compile-time HUGE_MAX_HSTATE value.
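
Something along these lines, perhaps — a sketch only, extending the
existing RISCV_ISA_SVNAPOT entry; the exact wording is of course up for
discussion:

```
config RISCV_ISA_SVNAPOT
	bool "SVNAPOT extension support"
	...
	help
	  ...
	  Note: enabling this option raises the compile-time
	  HUGE_MAX_HSTATE to cover the NAPOT hstates, so on hardware
	  without Svnapot a small amount of statically allocated
	  memory (the unused hstate slots) is wasted.
```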

@Andrew @Conor what do you think of this explanation and this solution?

Please let me know if I have missed anything. Thanks a lot!

Regards,
Qinglin.

> Thanks,
> Conor.



* Re: [PATCH v9 1/3] riscv: mm: modify pte format for Svnapot
  2022-12-09 15:36             ` Qinglin Pan
@ 2022-12-09 17:04               ` Andrew Jones
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew Jones @ 2022-12-09 17:04 UTC (permalink / raw)
  To: Qinglin Pan; +Cc: Conor Dooley, linux-riscv, palmer, jeff

On Fri, Dec 09, 2022 at 11:36:53PM +0800, Qinglin Pan wrote:
> Hi all,
> 
> On 2022/12/9 22:33, Conor Dooley wrote:
> > 
> > 
> > On 9 December 2022 08:42:29 GMT+01:00, Andrew Jones <ajones@ventanamicro.com> wrote:
> > > On Fri, Dec 09, 2022 at 01:34:42AM +0800, Qinglin Pan wrote:
> > > ...
> > > > > > > +#ifdef CONFIG_RISCV_ISA_SVNAPOT
> > > > > > > +#define HUGE_MAX_HSTATE        (2 + (NAPOT_ORDER_MAX -
> > > > > > > NAPOT_CONT_ORDER_BASE))
> > > > > > > +#else
> > > > > > > +#define HUGE_MAX_HSTATE        2
> > > > > > > +#endif
> > > > > > 
> > > > > > This is coming from a place of *complete* ignorance:
> > > > > > If CONFIG_RISCV_ISA_SVNAPOT is enabled in the kernel but there is no
> > > > > > support for it on a platform is it okay to change the value of
> > > > > > HUGE_MAX_HSTATE? Apologies if I've missed something obvious.
> > > > > 
> > > > > :(
> > > > > You are right. If CONFIG_RISCV_ISA_SVNAPOT is enabled, HUGE_MAX_HSTATE
> > > > > will always be 3 whether has_svnapot() is true or false. I will try to
> > > > > fix this.
> > > > 
> > > > I am really expecting that HUGE_MAX_HSTATE can change according to the
> > > > platform situation when CONFIG_RISCV_ISA_SVNAPOT is enabled. But it
> > > > seems impossible to achive this with a minor patch. Because this value
> > > > is used as some static allocated arrays' length (for example, hstates in
> > > > mm/hugetlb.c:52) in arch-independent code :(
> > > > 
> > > > This immuture HUGE_MAX_HSTATE value will cause no functional error but
> > > > it really makes some objects used only partially when
> > > > CONFIG_RISCV_ISA_SVNAPOT is enabled but the platform has no svnapot
> > > > support :( I am not sure whether this patch can be accepted with this
> > > > feature? Or any other helpful idea to fix this problem?
> > > > 
> > > > Please let me know if I ignore any information about it.
> > > 
> > > I agree that the only problem appears to a waste of memory, particularly
> > 
> > Caveat about ignorance still applies, how much of a waste of memory are we talking?
> > 
> > > if cgroups are enabled. I don't see any easy solution either. Maybe a
> > > warning in the Kconfig text would be sufficient?
> > 
> > I think that entirely depends on how wasteful it is.
> > If it's minor, sure?
> > 
> 
> I am also not sure how minor this waste will be. But I believe it will
> be acceptable. Because before this patchset, we also use only 1/2 of
> HUGE_MAX_HSTATE value (which is 2) when CONFIG_64BIT is enabled and
> CONFIG_CONTIG_ALLOC is disabled. As HUGE_MAX_HSTATE is a 'max' value,
> it is normal to use it up only sometimes but not always.
> 
> I prefer to do as Andrew said. To add an notification about this waste
> in KConfig text, and still use the immuture HUGE_MAX_HSTATE value.
> 
> @Andrew @Conor what do you think of this explanation and this solution?

To get a better estimate of the waste it should be possible to write a
quick kernel module which outputs sizeof on a few structs. But, from a
quick read of the structs, I think we're probably fine. Certainly it looks
like without CGROUP_HUGETLB and with a small MAX_NUMNODES it's no big
deal. With CGROUP_HUGETLB it's definitely more, but probably OK.
MAX_NUMNODES probably won't ever be huge, so, while there are five arrays
of MAX_NUMNODES elements in hstates, it's unlikely to matter much,
especially since the types of the array elements are small.

It'd be nice to get actual numbers, but based on that analysis I think I'm
OK with the comment in Kconfig.

Thanks,
drew

> 
> Please let me know if I ignore anything. Thanks a lot!
> 
> Regards,
> Qinglin.
> 
> > Thanks,
> > Conor.
> 


end of thread, other threads:[~2022-12-09 17:04 UTC | newest]

Thread overview: 22+ messages
2022-12-04 14:11 [PATCH v9 0/3] riscv, mm: detect svnapot cpu support at runtime panqinglin2020
2022-12-04 14:11 ` [PATCH v9 1/3] riscv: mm: modify pte format for Svnapot panqinglin2020
2022-12-06 19:56   ` Conor Dooley
2022-12-07 18:46   ` Conor Dooley
2022-12-08  4:51     ` Qinglin Pan
2022-12-08  4:59       ` Conor.Dooley
2022-12-08 17:34       ` Qinglin Pan
2022-12-09  7:42         ` Andrew Jones
2022-12-09 14:33           ` Conor Dooley
2022-12-09 15:36             ` Qinglin Pan
2022-12-09 17:04               ` Andrew Jones
2022-12-08 14:55   ` Andrew Jones
2022-12-04 14:11 ` [PATCH v9 2/3] riscv: mm: support Svnapot in hugetlb page panqinglin2020
2022-12-07 18:55   ` Conor Dooley
2022-12-08  4:54     ` Qinglin Pan
2022-12-08 15:17   ` Andrew Jones
2022-12-08 17:43     ` Qinglin Pan
2022-12-04 14:11 ` [PATCH v9 3/3] riscv: mm: support Svnapot in huge vmap panqinglin2020
2022-12-07 18:59   ` Conor Dooley
2022-12-08  4:57     ` Qinglin Pan
2022-12-08 15:22   ` Andrew Jones
2022-12-08 17:39     ` Qinglin Pan
