* [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K)
@ 2020-06-11 13:49 Gregory CLEMENT
  2020-06-11 13:49 ` [PATCH v2 1/6] ARM: Use PAGE_SIZE for ELF_EXEC_PAGESIZE Gregory CLEMENT
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Gregory CLEMENT @ 2020-06-11 13:49 UTC (permalink / raw)
  To: Russell King, Arnd Bergmann
  Cc: Gregory CLEMENT, Thomas Petazzoni, linux-arm-kernel

Hello,

On ARM-based NAS devices it is possible to have storage volumes larger
than 16 TB, especially with the use of LVM. However, on 32-bit
architectures the page cache index is stored in 32 bits, which means
that with a page size of 4 KB we can only address volumes of up to
16 TB.

Therefore, one option for using such large volumes and filesystems on
32-bit architectures is to increase the page size.
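
As an illustration of the arithmetic above (not part of the series), the
following sketch computes the largest volume reachable through a 32-bit
page cache index for each page size. The helper names max_volume_bytes()
and max_volume_tb() are hypothetical, and 1 TB is taken as 2^40 bytes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest volume addressable through a 32-bit page cache index, in
 * bytes, for a given PAGE_SHIFT (12 = 4 KB ... 16 = 64 KB).
 * Hypothetical helper for illustration, not a kernel function.
 */
uint64_t max_volume_bytes(unsigned int page_shift)
{
	assert(page_shift >= 12 && page_shift <= 16);
	/* 2^32 possible index values, each covering one page. */
	return (UINT64_C(1) << 32) << page_shift;
}

/* Same limit expressed in TB, assuming 1 TB = 2^40 bytes. */
uint64_t max_volume_tb(unsigned int page_shift)
{
	return max_volume_bytes(page_shift) >> 40;
}
```

With PAGE_SHIFT = 12 this gives 16 TB, and each additional bit of page
shift doubles the limit, up to 256 TB for 64 KB pages.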

This series adds support for 8K, 16K, 32K and 64K kernel pages. On
ARM the hardware page size can be either 4K or 64K, so for the other
sizes a "software emulation" is used: Linux thinks it is using pages
of 8 KB, 16 KB or 32 KB, while underneath the MMU still uses 4 KB
pages.

For ARM there is already a difference between the kernel page and the
hardware page in the way they are managed. In the same 4K space the
Linux kernel deals with 2 PTE tables at the beginning, while the
hardware deals with 2 other hardware PTE tables.

This series takes advantage of this and pushes the difference between
the hardware and Linux views further by using a larger page size at
the Linux kernel level.

This series is inspired by fa0ca2726ea9 ("DSMP 64K support") and
4ef803e12baf ("mmu: large-page: Added support for multiple kernel page
sizes") from
https://github.com/MarvellEmbeddedProcessors/linux-marvell.git. This
feature has been used intensively for many years on real products.

The first four patches are preparation to distinguish between the
kernel page size and the hardware page size. For 4K kernel pages they
do not modify anything.

The fifth patch is the one that actually adds large kernel page
support. This feature is restricted to ARMv7 without LPAE. It could
perhaps be extended to other configurations, but so far it has only
been tested on ARMv7.

The last patch allows the hardware 64K large page to be used.

Gregory

Gregory CLEMENT (6):
  ARM: Use PAGE_SIZE for ELF_EXEC_PAGESIZE
  ARM: pagetable: prepare hardware page table to use large page
  ARM: Make the number of fix bitmap depend on the page size
  ARM: mm: Aligned pte allocation to one page
  ARM: Add large kernel page support
  ARM: Add 64K page support at MMU level

 arch/arm/include/asm/elf.h                  |  2 +-
 arch/arm/include/asm/fixmap.h               |  3 +-
 arch/arm/include/asm/page.h                 | 12 ++++
 arch/arm/include/asm/pgtable-2level-hwdef.h |  8 +++
 arch/arm/include/asm/pgtable-2level.h       |  6 +-
 arch/arm/include/asm/pgtable.h              |  4 ++
 arch/arm/include/asm/shmparam.h             |  4 ++
 arch/arm/include/asm/tlbflush.h             | 21 +++++-
 arch/arm/kernel/entry-common.S              | 13 ++++
 arch/arm/kernel/traps.c                     | 10 +++
 arch/arm/mm/Kconfig                         | 72 +++++++++++++++++++++
 arch/arm/mm/fault.c                         | 19 ++++++
 arch/arm/mm/mmu.c                           | 22 ++++++-
 arch/arm/mm/pgd.c                           |  2 +
 arch/arm/mm/proc-v7-2level.S                | 72 ++++++++++++++++++++-
 arch/arm/mm/tlb-v7.S                        | 14 +++-
 16 files changed, 271 insertions(+), 13 deletions(-)

-- 
2.26.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH v2 1/6] ARM: Use PAGE_SIZE for ELF_EXEC_PAGESIZE
  2020-06-11 13:49 [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K) Gregory CLEMENT
@ 2020-06-11 13:49 ` Gregory CLEMENT
  2020-06-12  8:22   ` Arnd Bergmann
  2020-06-11 13:49 ` [PATCH v2 2/6] ARM: pagetable: prepare hardware page table to use large page Gregory CLEMENT
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Gregory CLEMENT @ 2020-06-11 13:49 UTC (permalink / raw)
  To: Russell King, Arnd Bergmann
  Cc: Gregory CLEMENT, Thomas Petazzoni, linux-arm-kernel

Currently ELF_EXEC_PAGESIZE is hardcoded to 4096, which is also the
page size. To be able to use page sizes other than 4K, use PAGE_SIZE
instead of the hardcoded value.

Using PAGE_SIZE also matches what other architectures such as arm64
do.

This is inspired by fa0ca2726ea9 ("DSMP 64K support") and
4ef803e12baf ("mmu: large-page: Added support for multiple kernel page
sizes") from
https://github.com/MarvellEmbeddedProcessors/linux-marvell.git

Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
---
 arch/arm/include/asm/elf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/elf.h b/arch/arm/include/asm/elf.h
index b078d992414b..0e406ce25379 100644
--- a/arch/arm/include/asm/elf.h
+++ b/arch/arm/include/asm/elf.h
@@ -116,7 +116,7 @@ int dump_task_regs(struct task_struct *t, elf_gregset_t *elfregs);
 #define ELF_CORE_COPY_TASK_REGS dump_task_regs
 
 #define CORE_DUMP_USE_REGSET
-#define ELF_EXEC_PAGESIZE	4096
+#define ELF_EXEC_PAGESIZE	PAGE_SIZE
 
 /* This is the base location for PIE (ET_DYN with INTERP) loads. */
 #define ELF_ET_DYN_BASE		0x400000UL
-- 
2.26.2



* [PATCH v2 2/6] ARM: pagetable: prepare hardware page table to use large page
  2020-06-11 13:49 [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K) Gregory CLEMENT
  2020-06-11 13:49 ` [PATCH v2 1/6] ARM: Use PAGE_SIZE for ELF_EXEC_PAGESIZE Gregory CLEMENT
@ 2020-06-11 13:49 ` Gregory CLEMENT
  2020-06-11 13:49 ` [PATCH v2 3/6] ARM: Make the number of fix bitmap depend on the page size Gregory CLEMENT
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Gregory CLEMENT @ 2020-06-11 13:49 UTC (permalink / raw)
  To: Russell King, Arnd Bergmann
  Cc: Gregory CLEMENT, Thomas Petazzoni, linux-arm-kernel

With 4 KB pages, each page table contains 512 entries in the hardware
page tables and 512 entries in the Linux page tables, each of those
entries pointing to a 4 KB page.

With larger page sizes being emulated, the hardware page tables will
continue to contain 512 entries, as we keep using 4 KB pages at the MMU
level. Hence PTE_HWTABLE_PTRS is hardcoded to 512. However, the number
of Linux page table entries will vary depending on the page size: 512
entries with 4 KB pages, 256 entries with 8 KB pages, 128 entries with
16 KB pages, etc.
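
As an illustration of this relationship (not part of the patch), the
sketch below evaluates the new PTRS_PER_PTE expression for each page
shift. The helper names ptrs_per_pte() and hw_ptes_per_linux_pte() are
hypothetical; only the 512 >> (PAGE_SHIFT - 12) expression comes from
the diff.

```c
#include <assert.h>

#define HW_PAGE_SHIFT		12	/* the MMU keeps using 4 KB pages */
#define PTE_HWTABLE_PTRS	512	/* hardware entries per table, constant */

/* Linux entries per table; mirrors 512 >> (PAGE_SHIFT - 12). */
unsigned int ptrs_per_pte(unsigned int page_shift)
{
	assert(page_shift >= HW_PAGE_SHIFT && page_shift <= 16);
	return 512u >> (page_shift - HW_PAGE_SHIFT);
}

/* Number of 4 KB hardware entries backing one Linux entry. */
unsigned int hw_ptes_per_linux_pte(unsigned int page_shift)
{
	return PTE_HWTABLE_PTRS / ptrs_per_pte(page_shift);
}
```

For 4 KB pages the two counts match one to one, while with 64 KB pages
32 Linux entries are each backed by 16 hardware entries.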

In the case of 4K pages, this patch doesn't modify the values being
used.

This is inspired by fa0ca2726ea9 ("DSMP 64K support") and
4ef803e12baf ("mmu: large-page: Added support for multiple kernel page
sizes") from
https://github.com/MarvellEmbeddedProcessors/linux-marvell.git

Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
---
 arch/arm/include/asm/pgtable-2level.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index 9e084a464a97..6316ef4a9f5c 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -67,13 +67,13 @@
  * until either the TLB entry is evicted under pressure, or a context
  * switch which changes the user space mapping occurs.
  */
-#define PTRS_PER_PTE		512
+#define PTRS_PER_PTE		(512 >> (PAGE_SHIFT-12))
 #define PTRS_PER_PMD		1
 #define PTRS_PER_PGD		2048
 
-#define PTE_HWTABLE_PTRS	(PTRS_PER_PTE)
+#define PTE_HWTABLE_PTRS	(512)
 #define PTE_HWTABLE_OFF		(PTE_HWTABLE_PTRS * sizeof(pte_t))
-#define PTE_HWTABLE_SIZE	(PTRS_PER_PTE * sizeof(u32))
+#define PTE_HWTABLE_SIZE	(PTE_HWTABLE_PTRS * sizeof(u32))
 
 /*
  * PMD_SHIFT determines the size of the area a second-level page table can map
-- 
2.26.2



* [PATCH v2 3/6] ARM: Make the number of fix bitmap depend on the page size
  2020-06-11 13:49 [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K) Gregory CLEMENT
  2020-06-11 13:49 ` [PATCH v2 1/6] ARM: Use PAGE_SIZE for ELF_EXEC_PAGESIZE Gregory CLEMENT
  2020-06-11 13:49 ` [PATCH v2 2/6] ARM: pagetable: prepare hardware page table to use large page Gregory CLEMENT
@ 2020-06-11 13:49 ` Gregory CLEMENT
  2020-06-11 13:49 ` [PATCH v2 4/6] ARM: mm: Aligned pte allocation to one page Gregory CLEMENT
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Gregory CLEMENT @ 2020-06-11 13:49 UTC (permalink / raw)
  To: Russell King, Arnd Bergmann
  Cc: Gregory CLEMENT, Thomas Petazzoni, linux-arm-kernel

Currently the number of fixmap entries is fixed. However, if the page
size is larger than 4K, the space occupied by the fixmap becomes too
big.

Since the total fixmap size is fixed, the number of fixmap entries
should depend on the page size, as is done for arm64.

Instead of always using 32 entries, we always keep the same size:
128KB, which for 4KB pages matches these 32 pages.
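
As an illustration (not part of the patch), the sketch below evaluates
the new NR_FIX_BTMAPS expression for each page shift and checks that the
span of one slot stays at 128 KB. The helper names nr_fix_btmaps() and
btmap_slot_bytes() are hypothetical.

```c
#include <assert.h>

#define SZ_128K	0x00020000UL

/* Mirrors NR_FIX_BTMAPS = SZ_128K / PAGE_SIZE from the patch. */
unsigned long nr_fix_btmaps(unsigned int page_shift)
{
	assert(page_shift >= 12 && page_shift <= 16);
	return SZ_128K / (1UL << page_shift);
}

/* Address space covered by one slot: entries times page size. */
unsigned long btmap_slot_bytes(unsigned int page_shift)
{
	return nr_fix_btmaps(page_shift) * (1UL << page_shift);
}
```

With a fixed count of 32 entries and 64 KB pages, each slot would
instead span 2 MB.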

Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
---
 arch/arm/include/asm/fixmap.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/fixmap.h b/arch/arm/include/asm/fixmap.h
index 472c93db5dac..d4b82af5a96d 100644
--- a/arch/arm/include/asm/fixmap.h
+++ b/arch/arm/include/asm/fixmap.h
@@ -6,6 +6,7 @@
 #define FIXADDR_END		0xfff00000UL
 #define FIXADDR_TOP		(FIXADDR_END - PAGE_SIZE)
 
+#include <linux/sizes.h>
 #include <asm/kmap_types.h>
 #include <asm/pgtable.h>
 
@@ -27,7 +28,7 @@ enum fixed_addresses {
 	 * not to clash since early_ioremap() is only available before
 	 * paging_init(), and kmap() only after.
 	 */
-#define NR_FIX_BTMAPS		32
+#define NR_FIX_BTMAPS		(SZ_128K / PAGE_SIZE)
 #define FIX_BTMAPS_SLOTS	7
 #define TOTAL_FIX_BTMAPS	(NR_FIX_BTMAPS * FIX_BTMAPS_SLOTS)
 
-- 
2.26.2



* [PATCH v2 4/6] ARM: mm: Aligned pte allocation to one page
  2020-06-11 13:49 [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K) Gregory CLEMENT
                   ` (2 preceding siblings ...)
  2020-06-11 13:49 ` [PATCH v2 3/6] ARM: Make the number of fix bitmap depend on the page size Gregory CLEMENT
@ 2020-06-11 13:49 ` Gregory CLEMENT
  2020-06-12  8:37   ` Arnd Bergmann
  2020-06-11 13:49 ` [PATCH v2 5/6] ARM: Add large kernel page support Gregory CLEMENT
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Gregory CLEMENT @ 2020-06-11 13:49 UTC (permalink / raw)
  To: Russell King, Arnd Bergmann
  Cc: Gregory CLEMENT, Thomas Petazzoni, linux-arm-kernel

In pte_offset_kernel() the pte_index() macro is used. This macro
assumes that the address is aligned to the page size.

In arm_pte_alloc(), the size allocated is the size needed for 512
entries. This size was calculated to fit in a 4K page. With larger
pages, the allocated table is no longer page aligned, which ends up
giving a wrong physical address.

The solution is to round up the allocation to a page size instead of
the exact size of the tables (which is 4KB). This satisfies the
assumption made by pte_index(), but the drawback is wasted memory for
the early allocation if the page size is bigger than 4KB.
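
As an illustration of the rounding (not part of the patch), the sketch
below computes the size requested for one table pair. It assumes 4-byte
entries for both the Linux and the hardware tables; round_up_pow2() and
pte_alloc_size() are hypothetical names standing in for the kernel's
round_up() and the expression passed to alloc().

```c
#include <assert.h>
#include <stddef.h>

#define PTE_HWTABLE_PTRS	512
#define PTE_ENTRY_BYTES		4	/* assumption: 32-bit entries */
#define PTE_HWTABLE_OFF		(PTE_HWTABLE_PTRS * PTE_ENTRY_BYTES)
#define PTE_HWTABLE_SIZE	(PTE_HWTABLE_PTRS * PTE_ENTRY_BYTES)

/* Round x up to a multiple of align; align must be a power of two. */
size_t round_up_pow2(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Size requested from the early allocator for one PTE table pair. */
size_t pte_alloc_size(unsigned int page_shift)
{
	return round_up_pow2(PTE_HWTABLE_OFF + PTE_HWTABLE_SIZE,
			     (size_t)1 << page_shift);
}
```

Under these assumptions the tables need 4096 bytes, so a 64 KB page
size leaves 61440 bytes of each early allocation unused.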

This is inspired by fa0ca2726ea9 ("DSMP 64K support") and
4ef803e12baf ("mmu: large-page: Added support for multiple kernel page
sizes") from
https://github.com/MarvellEmbeddedProcessors/linux-marvell.git

Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
---
 arch/arm/mm/mmu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index ec8d0008bfa1..b7fdea7e0cbe 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -715,7 +715,9 @@ static pte_t * __init arm_pte_alloc(pmd_t *pmd, unsigned long addr,
 				void *(*alloc)(unsigned long sz))
 {
 	if (pmd_none(*pmd)) {
-		pte_t *pte = alloc(PTE_HWTABLE_OFF + PTE_HWTABLE_SIZE);
+		/* The PTE table needs to be page aligned */
+		pte_t *pte = alloc(round_up(PTE_HWTABLE_OFF + PTE_HWTABLE_SIZE,
+					    PAGE_SIZE));
 		__pmd_populate(pmd, __pa(pte), prot);
 	}
 	BUG_ON(pmd_bad(*pmd));
-- 
2.26.2



* [PATCH v2 5/6] ARM: Add large kernel page support
  2020-06-11 13:49 [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K) Gregory CLEMENT
                   ` (3 preceding siblings ...)
  2020-06-11 13:49 ` [PATCH v2 4/6] ARM: mm: Aligned pte allocation to one page Gregory CLEMENT
@ 2020-06-11 13:49 ` Gregory CLEMENT
  2020-06-11 13:49 ` [PATCH v2 6/6] ARM: Add 64K page support at MMU level Gregory CLEMENT
  2020-06-11 16:21 ` [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K) Russell King - ARM Linux admin
  6 siblings, 0 replies; 22+ messages in thread
From: Gregory CLEMENT @ 2020-06-11 13:49 UTC (permalink / raw)
  To: Russell King, Arnd Bergmann
  Cc: Gregory CLEMENT, Thomas Petazzoni, linux-arm-kernel

On 32-bit systems with 4K pages it is not possible to support volumes
larger than 16TB, even with ext4. To go beyond this limit, the page
size must be larger.

This patch supports kernel pages of up to 64K, but at the MMU level
4K pages are still used.

Indeed for ARM there is already a difference between the kernel page
and the hardware page in the way they are managed. In the same 4K
space the Linux kernel deals with 2 PTE tables at the beginning, while
the hardware deals with 2 other hardware PTE tables.

This patch takes advantage of this and pushes the difference between
the hardware and Linux views further by using larger pages at the
Linux kernel level.

At the lower level, when the Linux kernel deals with a single large
page, several 4K hardware pages are actually managed.
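
As a simplified model of this expansion (not the code in the patch),
the sketch below fills the consecutive hardware entries backing one
Linux page, each pointing 4 KB further than the previous one. It
ignores the attribute translation and cache maintenance done by
cpu_v7_set_pte_ext; fill_hw_ptes() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define HW_PAGE_SIZE	0x1000u

/*
 * Fill the n = 2^(page_shift - 12) consecutive hardware entries that
 * back one Linux page. hw_pte is the descriptor for the first 4 KB
 * chunk; each following entry points 4 KB further.
 * Returns the number of entries written.
 */
unsigned int fill_hw_ptes(uint32_t *hw_table, uint32_t hw_pte,
			  unsigned int page_shift)
{
	unsigned int n, i;

	assert(page_shift >= 12 && page_shift <= 16);
	n = 1u << (page_shift - 12);
	for (i = 0; i < n; i++)
		hw_table[i] = hw_pte + i * HW_PAGE_SIZE;
	return n;
}
```

With 16 KB pages, for example, one Linux entry turns into four hardware
entries whose addresses differ by 0x1000.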

This is inspired by fa0ca2726ea9 ("DSMP 64K support") and
4ef803e12baf ("mmu: large-page: Added support for multiple kernel page
sizes") from
https://github.com/MarvellEmbeddedProcessors/linux-marvell.git

Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
---
 arch/arm/include/asm/page.h     | 10 +++++++
 arch/arm/include/asm/pgtable.h  |  4 +++
 arch/arm/include/asm/shmparam.h |  4 +++
 arch/arm/include/asm/tlbflush.h | 21 ++++++++++++-
 arch/arm/kernel/entry-common.S  | 13 ++++++++
 arch/arm/kernel/traps.c         | 10 +++++++
 arch/arm/mm/Kconfig             | 53 +++++++++++++++++++++++++++++++++
 arch/arm/mm/fault.c             | 19 ++++++++++++
 arch/arm/mm/mmu.c               | 18 +++++++++++
 arch/arm/mm/pgd.c               |  2 ++
 arch/arm/mm/proc-v7-2level.S    | 44 +++++++++++++++++++++++++--
 arch/arm/mm/tlb-v7.S            | 14 +++++++--
 12 files changed, 205 insertions(+), 7 deletions(-)

diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
index 11b058a72a5b..42784fed8834 100644
--- a/arch/arm/include/asm/page.h
+++ b/arch/arm/include/asm/page.h
@@ -8,7 +8,17 @@
 #define _ASMARM_PAGE_H
 
 /* PAGE_SHIFT determines the page size */
+#ifdef CONFIG_ARM_8KB_SW_PAGE_SIZE_SUPPORT
+#define PAGE_SHIFT		13
+#elif defined(CONFIG_ARM_16KB_SW_PAGE_SIZE_SUPPORT)
+#define PAGE_SHIFT		14
+#elif defined(CONFIG_ARM_32KB_SW_PAGE_SIZE_SUPPORT)
+#define PAGE_SHIFT		15
+#elif defined(CONFIG_ARM_64KB_SW_PAGE_SIZE_SUPPORT)
+#define PAGE_SHIFT		16
+#else
 #define PAGE_SHIFT		12
+#endif
 #define PAGE_SIZE		(_AC(1,UL) << PAGE_SHIFT)
 #define PAGE_MASK		(~((1 << PAGE_SHIFT) - 1))
 
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index befc8fcec98f..8b0a85ec8614 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -59,7 +59,11 @@ extern void __pgd_error(const char *file, int line, pgd_t);
  * mapping to be mapped at.  This is particularly important for
  * non-high vector CPUs.
  */
+#ifndef CONFIG_ARM_LARGE_PAGE_SUPPORT
 #define FIRST_USER_ADDRESS	(PAGE_SIZE * 2)
+#else
+#define FIRST_USER_ADDRESS	PAGE_SIZE
+#endif
 
 /*
  * Use TASK_SIZE as the ceiling argument for free_pgtables() and
diff --git a/arch/arm/include/asm/shmparam.h b/arch/arm/include/asm/shmparam.h
index 367a9dac6150..01de64a57a5e 100644
--- a/arch/arm/include/asm/shmparam.h
+++ b/arch/arm/include/asm/shmparam.h
@@ -7,7 +7,11 @@
  * or page size, whichever is greater since the cache aliases
  * every size/ways bytes.
  */
+#ifdef CONFIG_ARM_LARGE_PAGE_SUPPORT
+#define	SHMLBA	(16 << 10)		 /* attach addr a multiple of (4 * 4096) */
+#else
 #define	SHMLBA	(4 * PAGE_SIZE)		 /* attach addr a multiple of this */
+#endif
 
 /*
  * Enforce SHMLBA in shmat
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index 24cbfc112dfa..d8ad4021a4da 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -419,8 +419,16 @@ static inline void __flush_tlb_mm(struct mm_struct *mm)
 static inline void
 __local_flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
 {
-	const int zero = 0;
 	const unsigned int __tlb_flag = __cpu_tlb_flags;
+#if defined(CONFIG_ARM_LARGE_PAGE_SUPPORT)
+	if (tlb_flag(TLB_WB))
+		dsb();
+
+	uaddr = (uaddr & PAGE_MASK);
+	__cpu_flush_user_tlb_range(uaddr, uaddr + PAGE_SIZE, vma);
+
+#else
+	const int zero = 0;
 
 	uaddr = (uaddr & PAGE_MASK) | ASID(vma->vm_mm);
 
@@ -436,6 +444,7 @@ __local_flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
 	tlb_op(TLB_V6_U_PAGE, "c8, c7, 1", uaddr);
 	tlb_op(TLB_V6_D_PAGE, "c8, c6, 1", uaddr);
 	tlb_op(TLB_V6_I_PAGE, "c8, c5, 1", uaddr);
+#endif /* CONFIG_ARM_LARGE_PAGE_SUPPORT */
 }
 
 static inline void
@@ -449,7 +458,9 @@ local_flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
 		dsb(nshst);
 
 	__local_flush_tlb_page(vma, uaddr);
+#if !defined(CONFIG_ARM_LARGE_PAGE_SUPPORT)
 	tlb_op(TLB_V7_UIS_PAGE, "c8, c7, 1", uaddr);
+#endif
 
 	if (tlb_flag(TLB_BARRIER))
 		dsb(nsh);
@@ -478,6 +489,9 @@ __flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
 
 static inline void __local_flush_tlb_kernel_page(unsigned long kaddr)
 {
+#if defined(CONFIG_ARM_LARGE_PAGE_SUPPORT)
+	__cpu_flush_kern_tlb_range(kaddr, kaddr + PAGE_SIZE);
+#else
 	const int zero = 0;
 	const unsigned int __tlb_flag = __cpu_tlb_flags;
 
@@ -490,6 +504,7 @@ static inline void __local_flush_tlb_kernel_page(unsigned long kaddr)
 	tlb_op(TLB_V6_U_PAGE, "c8, c7, 1", kaddr);
 	tlb_op(TLB_V6_D_PAGE, "c8, c6, 1", kaddr);
 	tlb_op(TLB_V6_I_PAGE, "c8, c5, 1", kaddr);
+#endif /* CONFIG_ARM_LARGE_PAGE_SUPPORT */
 }
 
 static inline void local_flush_tlb_kernel_page(unsigned long kaddr)
@@ -502,7 +517,9 @@ static inline void local_flush_tlb_kernel_page(unsigned long kaddr)
 		dsb(nshst);
 
 	__local_flush_tlb_kernel_page(kaddr);
+#if !defined(CONFIG_ARM_LARGE_PAGE_SUPPORT)
 	tlb_op(TLB_V7_UIS_PAGE, "c8, c7, 1", kaddr);
+#endif
 
 	if (tlb_flag(TLB_BARRIER)) {
 		dsb(nsh);
@@ -520,7 +537,9 @@ static inline void __flush_tlb_kernel_page(unsigned long kaddr)
 		dsb(ishst);
 
 	__local_flush_tlb_kernel_page(kaddr);
+#if !defined(CONFIG_ARM_LARGE_PAGE_SUPPORT)
 	tlb_op(TLB_V7_UIS_PAGE, "c8, c3, 1", kaddr);
+#endif
 
 	if (tlb_flag(TLB_BARRIER)) {
 		dsb(ish);
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index 271cb8a1eba1..3a6ff31b8554 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -407,9 +407,22 @@ ENDPROC(sys_fstatfs64_wrapper)
  * Note: off_4k (r5) is always units of 4K.  If we can't do the requested
  * offset, we return EINVAL.
  */
+
+#define PGOFF_SHIFT (PAGE_SHIFT - 12)
+#define PGOFF_MASK  ((1 << PGOFF_SHIFT) - 1)
+
 sys_mmap2:
+#ifdef CONFIG_ARM_LARGE_PAGE_SUPPORT
+		tst	r5, #PGOFF_MASK
+		moveq	r5, r5, lsr #PGOFF_SHIFT
+		streq	r5, [sp, #4]
+		beq	sys_mmap_pgoff
+		mov	r0, #-EINVAL
+		ret	lr
+#else
 		str	r5, [sp, #4]
 		b	sys_mmap_pgoff
+#endif
 ENDPROC(sys_mmap2)
 
 #ifdef CONFIG_OABI_COMPAT
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 1e70e7227f0f..19cf3e66df31 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -830,7 +830,17 @@ void __init early_trap_init(void *vectors_base)
 
 	kuser_init(vectors_base);
 
+#if defined(CONFIG_ARM_LARGE_PAGE_SUPPORT)
+	/*
+	 * With large page support, the pages are at least 8K, so there
+	 * is enough space in one page for the stubs that are copied at
+	 * 4K offset.
+	 */
+	flush_icache_range(vectors, vectors + PAGE_SIZE);
+#else
 	flush_icache_range(vectors, vectors + PAGE_SIZE * 2);
+#endif
+
 #else /* ifndef CONFIG_CPU_V7M */
 	/*
 	 * on V7-M there is no need to copy the vector table to a dedicated
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 65e4482e3849..6266caa93520 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -975,6 +975,59 @@ config MIGHT_HAVE_CACHE_L2X0
 	  instead of this option, thus preventing the user from
 	  inadvertently configuring a broken kernel.
 
+config ARM_LARGE_PAGE_SUPPORT
+	bool
+
+choice
+	prompt "Kernel Large Page Support"
+	depends on CPU_V7 && !ARM_LPAE
+	default ARM_NO_LARGE_PAGE_SUPPORT
+	help
+	  Support kernel large pages (> 4KB) by software emulation of
+	  large pages (using 4KB MMU pages).  Select one of the page
+	  sizes below.
+
+config ARM_NO_LARGE_PAGE_SUPPORT
+	bool "Disabled - Use default"
+	help
+	  Use kernel default page size (4KB).
+	  If you are not sure, select this option.
+	  This option does not make any changes to default kernel page size
+	  MMU management.
+
+config ARM_8KB_SW_PAGE_SIZE_SUPPORT
+	bool "8KB software page size support"
+	select ARM_LARGE_PAGE_SUPPORT
+	help
+	  The kernel uses 8KB pages, MMU page table will still use 4KB pages.
+	  This feature enables support for large storage volumes up to 32TB
+	  at the expense of higher memory fragmentation.
+
+config ARM_16KB_SW_PAGE_SIZE_SUPPORT
+	bool "16KB software page size support"
+	select ARM_LARGE_PAGE_SUPPORT
+	help
+	  The kernel uses 16KB pages, MMU page table will still use 4KB pages.
+	  This feature enables support for large storage volumes up to 64TB
+	  at the expense of higher memory fragmentation.
+
+config ARM_32KB_SW_PAGE_SIZE_SUPPORT
+	bool "32KB software page size support"
+	select ARM_LARGE_PAGE_SUPPORT
+	help
+	  The kernel uses 32KB pages, MMU page table will still use 4KB pages.
+	  This feature enables support for large storage volumes up to 128TB
+	  at the expense of higher memory fragmentation.
+
+config ARM_64KB_SW_PAGE_SIZE_SUPPORT
+	bool "64KB software page size support"
+	select ARM_LARGE_PAGE_SUPPORT
+	help
+	  The kernel uses 64KB pages, MMU page table will still use 4KB pages.
+	  This feature enables support for large storage volumes up to 256TB
+	  at the expense of higher memory fragmentation.
+endchoice
+
 config CACHE_L2X0
 	bool "Enable the L2x0 outer cache controller" if MIGHT_HAVE_CACHE_L2X0
 	default MIGHT_HAVE_CACHE_L2X0
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 2dd5c41cbb8d..ee4241b3cb2b 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -27,6 +27,20 @@
 
 #ifdef CONFIG_MMU
 
+#ifdef CONFIG_ARM_LARGE_PAGE_SUPPORT
+static long long get_large_pte_hw_val(pte_t *pte)
+{
+	unsigned long pte_ptr = (unsigned long)pte;
+	unsigned long tmp = pte_ptr;
+
+	pte_ptr += (PTE_HWTABLE_PTRS * sizeof(void *));
+	pte_ptr &= ~0x7FC;
+	tmp &= 0x7FC & (~(((PAGE_SHIFT - 12) - 1) << 7));
+	pte_ptr += (tmp << (PAGE_SHIFT - 12));
+	return (long long)pte_val(*(pte_t *)pte_ptr);
+}
+#endif
+
 /*
  * This is useful to dump out the page tables associated with
  * 'addr' in mm 'mm'.
@@ -86,9 +100,14 @@ void show_pte(const char *lvl, struct mm_struct *mm, unsigned long addr)
 		pte = pte_offset_map(pmd, addr);
 		pr_cont(", *pte=%08llx", (long long)pte_val(*pte));
 #ifndef CONFIG_ARM_LPAE
+#ifdef CONFIG_ARM_LARGE_PAGE_SUPPORT
+		pr_cont(", *ppte=%08llx", get_large_pte_hw_val(pte));
+
+#else
 		pr_cont(", *ppte=%08llx",
 		       (long long)pte_val(pte[PTE_HWTABLE_PTRS]));
 #endif
+#endif /* CONFIG_ARM_LPAE */
 		pte_unmap(pte);
 	} while(0);
 
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index b7fdea7e0cbe..06549714973a 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1318,8 +1318,17 @@ static void __init devicemaps_init(const struct machine_desc *mdesc)
 	/*
 	 * Allocate the vector page early.
 	 */
+#if defined(CONFIG_ARM_LARGE_PAGE_SUPPORT)
+	/*
+	 * With large page support, the pages are at least 8K, so
+	 * there is enough space in one page for the stubs that are
+	 * copied at 4K offset.
+	 */
+	vectors = early_alloc(PAGE_SIZE);
+#else
 	vectors = early_alloc(PAGE_SIZE * 2);
 
+#endif
 	early_trap_init(vectors);
 
 	/*
@@ -1380,12 +1389,21 @@ static void __init devicemaps_init(const struct machine_desc *mdesc)
 		create_mapping(&map);
 	}
 
+	/*
+	 * With large page support, the pages are at least 8K, so this
+	 * hardware page was already mapped. Actually the hardcoded
+	 * 4KB offset causes trouble with the virtual address passed
+	 * to create_mapping: the address is no longer aligned to a
+	 * page.
+	 */
+#ifndef CONFIG_ARM_LARGE_PAGE_SUPPORT
 	/* Now create a kernel read-only mapping */
 	map.pfn += 1;
 	map.virtual = 0xffff0000 + PAGE_SIZE;
 	map.length = PAGE_SIZE;
 	map.type = MT_LOW_VECTORS;
 	create_mapping(&map);
+#endif
 
 	/*
 	 * Ask the machine support to map in the statically mapped devices.
diff --git a/arch/arm/mm/pgd.c b/arch/arm/mm/pgd.c
index 478bd2c6aa50..ade3f3885b4c 100644
--- a/arch/arm/mm/pgd.c
+++ b/arch/arm/mm/pgd.c
@@ -95,7 +95,9 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 		init_pmd = pmd_offset(init_pud, 0);
 		init_pte = pte_offset_map(init_pmd, 0);
 		set_pte_ext(new_pte + 0, init_pte[0], 0);
+#ifndef CONFIG_ARM_LARGE_PAGE_SUPPORT
 		set_pte_ext(new_pte + 1, init_pte[1], 0);
+#endif
 		pte_unmap(init_pte);
 		pte_unmap(new_pte);
 	}
diff --git a/arch/arm/mm/proc-v7-2level.S b/arch/arm/mm/proc-v7-2level.S
index 5db029c8f987..7e34b421c8b8 100644
--- a/arch/arm/mm/proc-v7-2level.S
+++ b/arch/arm/mm/proc-v7-2level.S
@@ -59,6 +59,11 @@ ENTRY(cpu_v7_switch_mm)
 	bx	lr
 ENDPROC(cpu_v7_switch_mm)
 
+    .macro flush_pte adr
+	ALT_SMP(W(nop))
+	ALT_UP (mcr	p15, 0, \adr, c7, c10, 1)	@ flush_pte
+.endm
+
 /*
  *	cpu_v7_set_pte_ext(ptep, pte)
  *
@@ -73,6 +78,19 @@ ENTRY(cpu_v7_set_pte_ext)
 #ifdef CONFIG_MMU
 	str	r1, [r0]			@ linux version
 
+    /* Calc HW PTE Entry Offset */
+#ifdef CONFIG_ARM_LARGE_PAGE_SUPPORT
+#define PTE_SHIFT	(PAGE_SHIFT - 12)
+#define PTE_MASK	(0x3FC >> (PTE_SHIFT - 1))
+	mov	r3, #PTE_MASK
+	and	r3, r3, r0
+	mov	r3, r3, lsl#PTE_SHIFT
+
+	bic	r0, r0, #0x3FC
+	bic	r0, r0, #0x400
+	orr	r0, r0, r3
+#endif /* CONFIG_ARM_LARGE_PAGE_SUPPORT */
+
 	bic	r3, r1, #0x000003f0
 	bic	r3, r3, #PTE_TYPE_MASK
 	orr	r3, r3, r2
@@ -100,9 +118,29 @@ ENTRY(cpu_v7_set_pte_ext)
  ARM(	str	r3, [r0, #2048]! )
  THUMB(	add	r0, r0, #2048 )
  THUMB(	str	r3, [r0] )
-	ALT_SMP(W(nop))
-	ALT_UP (mcr	p15, 0, r0, c7, c10, 1)		@ flush_pte
-#endif
+#ifdef CONFIG_ARM_LARGE_PAGE_SUPPORT
+#define PTE_OFFSET ((1 << (PAGE_SHIFT - 12)) * 4)
+	mov	r1, #PTE_OFFSET
+	mov	r2, #4
+1:	add	r3, r3, #0x1000
+	str	r3, [r0, r2]
+	add	r2, r2, #4
+#if PAGE_SHIFT > 15 /* 64KB in this case 2 cache lines need to be flushed */
+	cmp	r2, #32  @ cache line size
+	bne	2f
+	cmp	r2, r1
+	beq	3f
+	flush_pte r0
+	mov	r1, #32
+	add	r0, r0, #32
+	mov	r2, #0
+#endif /* PAGE_SHIFT > 15 */
+2:	cmp	r2, r1
+	bne	1b
+#endif /* CONFIG_ARM_LARGE_PAGE_SUPPORT */
+3:	flush_pte r0
+#endif /* CONFIG_MMU */
+
 	bx	lr
 ENDPROC(cpu_v7_set_pte_ext)
 
diff --git a/arch/arm/mm/tlb-v7.S b/arch/arm/mm/tlb-v7.S
index 1bb28d7db567..8e68218e53d3 100644
--- a/arch/arm/mm/tlb-v7.S
+++ b/arch/arm/mm/tlb-v7.S
@@ -50,8 +50,12 @@ ENTRY(v7wbi_flush_user_tlb_range)
 #endif
 	ALT_UP(mcr	p15, 0, r0, c8, c7, 1)	@ TLB invalidate U MVA
 
-	add	r0, r0, #PAGE_SZ
-	cmp	r0, r1
+#if defined(CONFIG_ARM_LARGE_PAGE_SUPPORT)
+	add	r0, r0, #0x1000
+#else
+        add	r0, r0, #PAGE_SZ
+#endif  /* CONFIG_ARM_LARGE_PAGE_SUPPORT */
+        cmp	r0, r1
 	blo	1b
 	dsb	ish
 	ret	lr
@@ -78,7 +82,11 @@ ENTRY(v7wbi_flush_kern_tlb_range)
 	ALT_SMP(mcr	p15, 0, r0, c8, c3, 1)	@ TLB invalidate U MVA (shareable)
 #endif
 	ALT_UP(mcr	p15, 0, r0, c8, c7, 1)	@ TLB invalidate U MVA
-	add	r0, r0, #PAGE_SZ
+#if defined(CONFIG_ARM_LARGE_PAGE_SUPPORT)
+	add	r0, r0, #0x1000
+#else
+        add	r0, r0, #PAGE_SZ
+#endif /* CONFIG_ARM_LARGE_PAGE_SUPPORT */
 	cmp	r0, r1
 	blo	1b
 	dsb	ish
-- 
2.26.2



* [PATCH v2 6/6] ARM: Add 64K page support at MMU level
  2020-06-11 13:49 [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K) Gregory CLEMENT
                   ` (4 preceding siblings ...)
  2020-06-11 13:49 ` [PATCH v2 5/6] ARM: Add large kernel page support Gregory CLEMENT
@ 2020-06-11 13:49 ` Gregory CLEMENT
  2020-06-11 16:21 ` [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K) Russell King - ARM Linux admin
  6 siblings, 0 replies; 22+ messages in thread
From: Gregory CLEMENT @ 2020-06-11 13:49 UTC (permalink / raw)
  To: Russell King, Arnd Bergmann
  Cc: Gregory CLEMENT, Thomas Petazzoni, linux-arm-kernel

While 8K, 16K or 32K pages are not supported by the ARM MMU, it is
possible to use hardware large pages with a 64K size.

Compared to the software-based large page support, with real 64K pages
the TLB flush can be done on a single page instead of a range of
pages.
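
As an illustration of the difference (not part of the patch), the sketch
below counts the per-address invalidations a range flush loop issues for
a given step. It only models the loop shape of the range flush routines
(invalidate, advance, compare); tlb_invalidates() is a hypothetical
name, and the 64 KB step is an assumption about how a hardware large
page would be flushed.

```c
#include <assert.h>

/* Count iterations of: invalidate(addr); addr += step; while (addr < end) */
unsigned int tlb_invalidates(unsigned long start, unsigned long end,
			     unsigned long step)
{
	unsigned int n = 0;
	unsigned long addr = start;

	assert(step != 0 && start < end);
	do {
		n++;		/* stands in for one TLB invalidate by MVA */
		addr += step;
	} while (addr < end);
	return n;
}
```

For one 64 KB Linux page, a 4 KB step gives 16 invalidations, while a
64 KB step gives a single one.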

This is inspired by fa0ca2726ea9 ("DSMP 64K support") and
4ef803e12baf ("mmu: large-page: Added support for multiple kernel page
sizes") from
https://github.com/MarvellEmbeddedProcessors/linux-marvell.git

Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
---
 arch/arm/include/asm/page.h                 |  2 ++
 arch/arm/include/asm/pgtable-2level-hwdef.h |  8 ++++++
 arch/arm/include/asm/tlbflush.h             | 14 +++++------
 arch/arm/mm/Kconfig                         | 23 +++++++++++++++--
 arch/arm/mm/proc-v7-2level.S                | 28 +++++++++++++++++++++
 arch/arm/mm/tlb-v7.S                        |  8 +++---
 6 files changed, 70 insertions(+), 13 deletions(-)

diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
index 42784fed8834..8d6b16e73b06 100644
--- a/arch/arm/include/asm/page.h
+++ b/arch/arm/include/asm/page.h
@@ -16,6 +16,8 @@
 #define PAGE_SHIFT		15
 #elif defined(CONFIG_ARM_64KB_SW_PAGE_SIZE_SUPPORT)
 #define PAGE_SHIFT		16
+#elif defined(CONFIG_ARM_64KB_MMU_PAGE_SIZE_SUPPORT)
+#define PAGE_SHIFT		16
 #else
 #define PAGE_SHIFT		12
 #endif
diff --git a/arch/arm/include/asm/pgtable-2level-hwdef.h b/arch/arm/include/asm/pgtable-2level-hwdef.h
index 556937e1790e..37503789c6d6 100644
--- a/arch/arm/include/asm/pgtable-2level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-2level-hwdef.h
@@ -66,7 +66,11 @@
 /*
  *   - extended small page/tiny page
  */
+#ifdef CONFIG_ARM_64KB_MMU_PAGE_SIZE_SUPPORT
+#define PTE_EXT_XN		(_AT(pteval_t, 1) << 15)	/* v6 */
+#else
 #define PTE_EXT_XN		(_AT(pteval_t, 1) << 0)		/* v6 */
+#endif
 #define PTE_EXT_AP_MASK		(_AT(pteval_t, 3) << 4)
 #define PTE_EXT_AP0		(_AT(pteval_t, 1) << 4)
 #define PTE_EXT_AP1		(_AT(pteval_t, 2) << 4)
@@ -74,7 +78,11 @@
 #define PTE_EXT_AP_UNO_SRW	(PTE_EXT_AP0)
 #define PTE_EXT_AP_URO_SRW	(PTE_EXT_AP1)
 #define PTE_EXT_AP_URW_SRW	(PTE_EXT_AP1|PTE_EXT_AP0)
+#ifdef CONFIG_ARM_64KB_MMU_PAGE_SIZE_SUPPORT
+#define PTE_EXT_TEX(x)		(_AT(pteval_t, (x)) << 12)	/* Large Page */
+#else
 #define PTE_EXT_TEX(x)		(_AT(pteval_t, (x)) << 6)	/* v5 */
+#endif
 #define PTE_EXT_APX		(_AT(pteval_t, 1) << 9)		/* v6 */
 #define PTE_EXT_COHERENT	(_AT(pteval_t, 1) << 9)		/* XScale3 */
 #define PTE_EXT_SHARED		(_AT(pteval_t, 1) << 10)	/* v6 */
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index d8ad4021a4da..1d2b17a9b6ee 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -420,7 +420,7 @@ static inline void
 __local_flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
 {
 	const unsigned int __tlb_flag = __cpu_tlb_flags;
-#if defined(CONFIG_ARM_LARGE_PAGE_SUPPORT)
+#if defined(CONFIG_ARM_SW_LARGE_PAGE_SUPPORT)
 	if (tlb_flag(TLB_WB))
 		dsb();
 
@@ -444,7 +444,7 @@ __local_flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
 	tlb_op(TLB_V6_U_PAGE, "c8, c7, 1", uaddr);
 	tlb_op(TLB_V6_D_PAGE, "c8, c6, 1", uaddr);
 	tlb_op(TLB_V6_I_PAGE, "c8, c5, 1", uaddr);
-#endif /* CONFIG_ARM_LARGE_PAGE_SUPPORT */
+#endif /* CONFIG_ARM_SW_LARGE_PAGE_SUPPORT */
 }
 
 static inline void
@@ -458,7 +458,7 @@ local_flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
 		dsb(nshst);
 
 	__local_flush_tlb_page(vma, uaddr);
-#if !defined(CONFIG_ARM_LARGE_PAGE_SUPPORT)
+#if !defined(CONFIG_ARM_SW_LARGE_PAGE_SUPPORT)
 	tlb_op(TLB_V7_UIS_PAGE, "c8, c7, 1", uaddr);
 #endif
 
@@ -489,7 +489,7 @@ __flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
 
 static inline void __local_flush_tlb_kernel_page(unsigned long kaddr)
 {
-#if defined(CONFIG_ARM_LARGE_PAGE_SUPPORT)
+#if defined(CONFIG_ARM_SW_LARGE_PAGE_SUPPORT)
 	__cpu_flush_kern_tlb_range(kaddr, kaddr + PAGE_SIZE);
 #else
 	const int zero = 0;
@@ -504,7 +504,7 @@ static inline void __local_flush_tlb_kernel_page(unsigned long kaddr)
 	tlb_op(TLB_V6_U_PAGE, "c8, c7, 1", kaddr);
 	tlb_op(TLB_V6_D_PAGE, "c8, c6, 1", kaddr);
 	tlb_op(TLB_V6_I_PAGE, "c8, c5, 1", kaddr);
-#endif /* CONFIG_ARM_LARGE_PAGE_SUPPORT */
+#endif /* CONFIG_ARM_SW_LARGE_PAGE_SUPPORT */
 }
 
 static inline void local_flush_tlb_kernel_page(unsigned long kaddr)
@@ -517,7 +517,7 @@ static inline void local_flush_tlb_kernel_page(unsigned long kaddr)
 		dsb(nshst);
 
 	__local_flush_tlb_kernel_page(kaddr);
-#if !defined(CONFIG_ARM_LARGE_PAGE_SUPPORT)
+#if !defined(CONFIG_ARM_SW_LARGE_PAGE_SUPPORT)
 	tlb_op(TLB_V7_UIS_PAGE, "c8, c7, 1", kaddr);
 #endif
 
@@ -537,7 +537,7 @@ static inline void __flush_tlb_kernel_page(unsigned long kaddr)
 		dsb(ishst);
 
 	__local_flush_tlb_kernel_page(kaddr);
-#if !defined(CONFIG_ARM_LARGE_PAGE_SUPPORT)
+#if !defined(CONFIG_ARM_SW_LARGE_PAGE_SUPPORT)
 	tlb_op(TLB_V7_UIS_PAGE, "c8, c3, 1", kaddr);
 #endif
 
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 6266caa93520..b566708af0bf 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -978,13 +978,16 @@ config MIGHT_HAVE_CACHE_L2X0
 config ARM_LARGE_PAGE_SUPPORT
 	bool
 
+config ARM_SW_LARGE_PAGE_SUPPORT
+	bool
+
 choice
 	prompt "Kernel Large Page Support"
 	depends on CPU_V7 && !ARM_LPAE
 	default ARM_NO_LARGE_PAGE_SUPPORT
 	help
-	  Support kennel large pages (> 4KB) by software emulation of
-	  large pages (using 4KB MMU pages).  Select one of the page
+	  Support kernel large pages (> 4KB), this includes MMU large pages
+	  (64KB) and software emulation of large pages (using 4KB MMU pages).
 	  sizes below.
 
 config ARM_NO_LARGE_PAGE_SUPPORT
@@ -998,6 +1001,7 @@ config ARM_NO_LARGE_PAGE_SUPPORT
 config ARM_8KB_SW_PAGE_SIZE_SUPPORT
 	bool "8KB software page size support"
 	select ARM_LARGE_PAGE_SUPPORT
+	select ARM_SW_LARGE_PAGE_SUPPORT
 	help
 	  The kernel uses 8KB pages, MMU page table will still use 4KB pages.
 	  This feature enables support for large storage volumes up to 32TB
@@ -1006,6 +1010,7 @@ config ARM_8KB_SW_PAGE_SIZE_SUPPORT
 config ARM_16KB_SW_PAGE_SIZE_SUPPORT
 	bool "16KB software page size support"
 	select ARM_LARGE_PAGE_SUPPORT
+	select ARM_SW_LARGE_PAGE_SUPPORT
 	help
 	  The kernel uses 16KB pages, MMU page table will still use 4KB pages.
 	  This feature enables support for large storage volumes up to 64TB.
@@ -1014,6 +1019,7 @@ config ARM_16KB_SW_PAGE_SIZE_SUPPORT
 config ARM_32KB_SW_PAGE_SIZE_SUPPORT
 	bool "32KB software page size support"
 	select ARM_LARGE_PAGE_SUPPORT
+	select ARM_SW_LARGE_PAGE_SUPPORT
 	help
 	  The kernel uses 32KB pages, MMU page table will still use 4KB pages.
 	  This feature enables support for large storage volumes up to 128TB.
@@ -1022,10 +1028,23 @@ config ARM_32KB_SW_PAGE_SIZE_SUPPORT
 config ARM_64KB_SW_PAGE_SIZE_SUPPORT
 	bool "64KB software page size support"
 	select ARM_LARGE_PAGE_SUPPORT
+	select ARM_SW_LARGE_PAGE_SUPPORT
 	help
 	  The kernel uses 64KB pages, MMU page table will still use 4KB pages.
 	  This feature enables support for large storage volumes up to 256TB.
 	  at the expense of higher memory fragmentation.
+	  If you need 64KB pages, consider using the ARM_64KB_MMU_PAGE_SIZE_SUPPORT
+	  option.
+
+config ARM_64KB_MMU_PAGE_SIZE_SUPPORT
+	bool "64KB MMU page size support"
+	select ARM_LARGE_PAGE_SUPPORT
+	help
+	  The kernel uses 64KB pages. The page-table will use large-pages (64KB)
+	  as well.
+	  This feature enables support for large storage volumes up to 256TB.
+	  at the expense of higher memory fragmentation.
+
 endchoice
 
 config CACHE_L2X0
diff --git a/arch/arm/mm/proc-v7-2level.S b/arch/arm/mm/proc-v7-2level.S
index 7e34b421c8b8..67401f859c2d 100644
--- a/arch/arm/mm/proc-v7-2level.S
+++ b/arch/arm/mm/proc-v7-2level.S
@@ -92,9 +92,16 @@ ENTRY(cpu_v7_set_pte_ext)
 #endif /* CONFIG_ARM_LARGE_PAGE_SUPPORT */
 
 	bic	r3, r1, #0x000003f0
+#ifdef CONFIG_ARM_64KB_MMU_PAGE_SIZE_SUPPORT
+	bic	r3, r3, #0x00000F000
+#endif
 	bic	r3, r3, #PTE_TYPE_MASK
 	orr	r3, r3, r2
+#ifdef CONFIG_ARM_64KB_MMU_PAGE_SIZE_SUPPORT
+	orr	r3, r3, #PTE_EXT_AP0 | 1
+#else
 	orr	r3, r3, #PTE_EXT_AP0 | 2
+#endif
 
 	tst	r1, #1 << 4
 	orrne	r3, r3, #PTE_EXT_TEX(1)
@@ -119,6 +126,26 @@ ENTRY(cpu_v7_set_pte_ext)
  THUMB(	add	r0, r0, #2048 )
  THUMB(	str	r3, [r0] )
 #ifdef CONFIG_ARM_LARGE_PAGE_SUPPORT
+#ifdef CONFIG_ARM_64KB_MMU_PAGE_SIZE_SUPPORT
+	@ Need to duplicate the entry 16 times because of overlapping in PTE index bits.
+	str	r3, [r0, #4]
+	str	r3, [r0, #8]
+	str	r3, [r0, #12]
+	str	r3, [r0, #16]
+	str	r3, [r0, #20]
+	str	r3, [r0, #24]
+	str	r3, [r0, #28]
+	flush_pte r0
+	add	r0, r0, #32
+	str	r3, [r0]
+	str	r3, [r0, #4]
+	str	r3, [r0, #8]
+	str	r3, [r0, #12]
+	str	r3, [r0, #16]
+	str	r3, [r0, #20]
+	str	r3, [r0, #24]
+	str	r3, [r0, #28]
+#else
 #define PTE_OFFSET ((1 << (PAGE_SHIFT - 12)) * 4)
 	mov	r1, #PTE_OFFSET
 	mov	r2, #4
@@ -137,6 +164,7 @@ ENTRY(cpu_v7_set_pte_ext)
 #endif /* PAGE_SHIFT > 15 */
 2:	cmp	r2, r1
 	bne	1b
+#endif /* CONFIG_ARM_64KB_MMU_PAGE_SIZE_SUPPORT */
 #endif /* CONFIG_ARM_LARGE_PAGE_SUPPORT */
 3:	flush_pte r0
 #endif /* CONFIG_MMU */
diff --git a/arch/arm/mm/tlb-v7.S b/arch/arm/mm/tlb-v7.S
index 8e68218e53d3..c90dbbd6aa5e 100644
--- a/arch/arm/mm/tlb-v7.S
+++ b/arch/arm/mm/tlb-v7.S
@@ -50,11 +50,11 @@ ENTRY(v7wbi_flush_user_tlb_range)
 #endif
 	ALT_UP(mcr	p15, 0, r0, c8, c7, 1)	@ TLB invalidate U MVA
 
-#if defined(CONFIG_ARM_LARGE_PAGE_SUPPORT)
+#if defined(CONFIG_ARM_SW_LARGE_PAGE_SUPPORT)
 	add	r0, r0, #0x1000
 #else
         add	r0, r0, #PAGE_SZ
-#endif  /* CONFIG_ARM_LARGE_PAGE_SUPPORT */
+#endif  /* CONFIG_ARM_SW_LARGE_PAGE_SUPPORT */
         cmp	r0, r1
 	blo	1b
 	dsb	ish
@@ -82,11 +82,11 @@ ENTRY(v7wbi_flush_kern_tlb_range)
 	ALT_SMP(mcr	p15, 0, r0, c8, c3, 1)	@ TLB invalidate U MVA (shareable)
 #endif
 	ALT_UP(mcr	p15, 0, r0, c8, c7, 1)	@ TLB invalidate U MVA
-#if defined(CONFIG_ARM_LARGE_PAGE_SUPPORT)
+#if defined(CONFIG_ARM_SW_LARGE_PAGE_SUPPORT)
 	add	r0, r0, #0x1000
 #else
         add	r0, r0, #PAGE_SZ
-#endif /* CONFIG_ARM_LARGE_PAGE_SUPPORT */
+#endif /* CONFIG_ARM_SW_LARGE_PAGE_SUPPORT */
 	cmp	r0, r1
 	blo	1b
 	dsb	ish
-- 
2.26.2
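
For readers following the proc-v7-2level.S hunk above: in the ARMv7
short-descriptor format a 64KB large page uses type bits 0b01, keeps
XN in bit 15 and TEX in bits [14:12] (matching the PTE_EXT_XN and
PTE_EXT_TEX changes), and its descriptor has to be repeated in 16
consecutive second-level slots. A minimal C sketch of that
replication follows. The helper names make_large_page_desc() and
set_large_page() are hypothetical and not part of the patch, and the
cache maintenance done by flush_pte is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define LARGE_PAGE_TYPE       0x1u                  /* bits [1:0] = 0b01 */
#define LARGE_PAGE_TEX(x)     ((uint32_t)(x) << 12) /* TEX[2:0], bits [14:12] */
#define LARGE_PAGE_XN         (1u << 15)            /* execute-never, bit 15 */
#define LARGE_PAGE_ADDR_MASK  0xffff0000u           /* 64KB-aligned base */
#define SLOTS_PER_LARGE_PAGE  16                    /* 64KB / 4KB */

/* Hypothetical helper: build a large-page descriptor (AP, C, B omitted). */
uint32_t make_large_page_desc(uint32_t phys, uint32_t tex, int xn)
{
	uint32_t desc = (phys & LARGE_PAGE_ADDR_MASK) | LARGE_PAGE_TYPE;

	desc |= LARGE_PAGE_TEX(tex & 0x7);
	if (xn)
		desc |= LARGE_PAGE_XN;
	return desc;
}

/*
 * Hypothetical helper: store the same descriptor in the 16 slots of a
 * 256-entry hardware table that cover the 64KB region containing virt.
 */
void set_large_page(uint32_t *table, uint32_t virt, uint32_t desc)
{
	size_t first = ((virt >> 12) & 0xff) & ~(size_t)0xf;
	size_t i;

	for (i = 0; i < SLOTS_PER_LARGE_PAGE; i++)
		table[first + i] = desc;
}
```

The 16 identical entries are needed because the MMU still indexes
the table with address bits [19:12], so any 4KB-granule index inside
the 64KB region must find the same descriptor.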


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K)
  2020-06-11 13:49 [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K) Gregory CLEMENT
                   ` (5 preceding siblings ...)
  2020-06-11 13:49 ` [PATCH v2 6/6] ARM: Add 64K page support at MMU level Gregory CLEMENT
@ 2020-06-11 16:21 ` Russell King - ARM Linux admin
  2020-06-12  9:15   ` Gregory CLEMENT
  2020-06-12  9:23   ` Arnd Bergmann
  6 siblings, 2 replies; 22+ messages in thread
From: Russell King - ARM Linux admin @ 2020-06-11 16:21 UTC (permalink / raw)
  To: Gregory CLEMENT; +Cc: Thomas Petazzoni, linux-arm-kernel, Arnd Bergmann

Hi Gregory,

You're on your own with this one; I've no motivation to re-understand
the ARM page table code now that 32-bit ARM is basically unsupported.

I'll point out some of the things you got wrong below though.

On Thu, Jun 11, 2020 at 03:49:08PM +0200, Gregory CLEMENT wrote:
> Hello,
> 
> On ARM based NAS it is possible to have storage volume larger than
> 16TB, especially with the use of LVM. However, on 32-bit architectures,
> the page cache index is stored on 32 bits, which means that given a
> page size of 4 KB, we can only address volumes of up to 16 TB.
> 
> Therefore, one option to use such large volumes and filesystems on 32
> bits architecture is to increase the page size.
> 
> This series allows to support 8K, 16K, 32K and 64K kernel pages. On
> ARM the size of the page can be either 4K or 64K, so for the other
> size a "software emulation" is used, here Linux thinks it is using
> pages of 8 KB, 16 KB or 32 KB, while underneath the MMU still uses 4
> KB pages.
> 
> For ARM there is already a difference between the kernel page and the
> hardware page in the way they are managed. In the same 4K space the
> Linux kernel deals with 2 PTE tables at the beginning, while the
> hardware deals with 2 other hardware PTE tables.

This is incorrect.  The kernel page size and the hardware page size
match today - both are 4k.  What you're talking about here is the
PTE table size.

The kernel requires that each PTE table is contained within one
struct page.  Since one hardware PTE table is 256 entries, it
occupies 1024 bytes, so a quarter of a page.  So, to have a single
4k page per PTE table would waste quite a bit of space.

Now, the hardware PTE tables do not lend themselves to the kernel's
usage: the kernel wants additional bits to track the state of each
page in the page tables.  Hence, we need to shadow every PTE entry.
This also provides us independence of the underlying hardware PTE
entry format, which varies between ARM architecture versions.

So, we end up with a single 4k page containing two consecutive
hardware PTE tables, followed by two Linux PTE tables for the kernel's
benefit.

If you increase the page size, then you need to increase the number
of tables in a page, or suffer a huge amount of wasted memory taken
for the page tables - going to an 8k page size means that the upper
4k of each page will not be used.  Going to 16k means the upper 12k
won't be used.  And so on - as your software page size increases,
the amount of memory wasted for each PTE table will increase
unless you also increase the number of hardware 1st level entries
pointing to each PTE page.  With 64k pages, 60k of each PTE page
will remain unused.

That isn't very efficient use of memory.
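
The figures above can be checked with a small C sketch. It assumes
the layout described in this mail stays unchanged (four 1024-byte
tables per PTE page: two hardware tables plus two Linux shadow
tables) while only the page size grows. The function names are
illustrative and do not exist in the kernel.

```c
#include <stddef.h>

#define PTE_TABLE_BYTES      1024u /* 256 entries * 4 bytes each */
#define TABLES_PER_PTE_PAGE  4u    /* 2 hardware + 2 Linux shadow tables */

/* Bytes of a PTE page that actually hold page table entries. */
size_t pte_page_used(void)
{
	return PTE_TABLE_BYTES * TABLES_PER_PTE_PAGE;
}

/* Bytes left unused in one PTE page for a given kernel page size. */
size_t pte_page_wasted(size_t page_size)
{
	return page_size - pte_page_used();
}
```

With these assumptions pte_page_wasted() gives 0 for 4k pages, 4k
for 8k pages, 12k for 16k pages and 60k for 64k pages.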

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 503kbps up


* Re: [PATCH v2 1/6] ARM: Use PAGE_SIZE for ELF_EXEC_PAGESIZE
  2020-06-11 13:49 ` [PATCH v2 1/6] ARM: Use PAGE_SIZE for ELF_EXEC_PAGESIZE Gregory CLEMENT
@ 2020-06-12  8:22   ` Arnd Bergmann
  2020-06-12  8:35     ` Russell King - ARM Linux admin
  2020-06-12  8:52     ` Gregory CLEMENT
  0 siblings, 2 replies; 22+ messages in thread
From: Arnd Bergmann @ 2020-06-12  8:22 UTC (permalink / raw)
  To: Gregory CLEMENT; +Cc: Thomas Petazzoni, Russell King, Linux ARM

On Thu, Jun 11, 2020 at 3:49 PM Gregory CLEMENT
<gregory.clement@bootlin.com> wrote:
>
> Currently ELF_EXEC_PAGESIZE is 4096 which is also the page size. In
> order to be able to use other size of page than 4K, use PAGE_SIZE
> instead of the hardcoded value.
>
> The use of PAGE_SIZE will be also aligned with what we find in other
> architectures such as arm64.
>
> This is inspired from fa0ca2726ea9 ("DSMP 64K support") and
> 4ef803e12baf ("mmu: large-page: Added support for multiple kernel page
> sizes") from
> https://github.com/MarvellEmbeddedProcessors/linux-marvell.git
>
> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>

IIRC using page sizes above 16KB here also requires using a
non-ancient linker in user space that places sections on
ELF_EXEC_PAGESIZE boundaries, right?

      Arnd


* Re: [PATCH v2 1/6] ARM: Use PAGE_SIZE for ELF_EXEC_PAGESIZE
  2020-06-12  8:22   ` Arnd Bergmann
@ 2020-06-12  8:35     ` Russell King - ARM Linux admin
  2020-06-12  8:46       ` Arnd Bergmann
  2020-06-12  8:52     ` Gregory CLEMENT
  1 sibling, 1 reply; 22+ messages in thread
From: Russell King - ARM Linux admin @ 2020-06-12  8:35 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Gregory CLEMENT, Thomas Petazzoni, Linux ARM

On Fri, Jun 12, 2020 at 10:22:17AM +0200, Arnd Bergmann wrote:
> On Thu, Jun 11, 2020 at 3:49 PM Gregory CLEMENT
> <gregory.clement@bootlin.com> wrote:
> >
> > Currently ELF_EXEC_PAGESIZE is 4096 which is also the page size. In
> > order to be able to use other size of page than 4K, use PAGE_SIZE
> > instead of the hardcoded value.
> >
> > The use of PAGE_SIZE will be also aligned with what we find in other
> > architectures such as arm64.
> >
> > This is inspired from fa0ca2726ea9 ("DSMP 64K support") and
> > 4ef803e12baf ("mmu: large-page: Added support for multiple kernel page
> > sizes") from
> > https://github.com/MarvellEmbeddedProcessors/linux-marvell.git
> >
> > Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
> 
> IIRC using page sizes above 16KB here also requires using a
> non-ancient linker in user space that places sections on
> ELF_EXEC_PAGESIZE boundaries, right?

Doesn't that mean that this change breaks all existing userspace when
ELF_EXEC_PAGESIZE is not 4k?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


* Re: [PATCH v2 4/6] ARM: mm: Aligned pte allocation to one page
  2020-06-11 13:49 ` [PATCH v2 4/6] ARM: mm: Aligned pte allocation to one page Gregory CLEMENT
@ 2020-06-12  8:37   ` Arnd Bergmann
  2020-06-12 10:25     ` Catalin Marinas
  2020-06-12 11:56     ` Gregory CLEMENT
  0 siblings, 2 replies; 22+ messages in thread
From: Arnd Bergmann @ 2020-06-12  8:37 UTC (permalink / raw)
  To: Gregory CLEMENT; +Cc: Thomas Petazzoni, Russell King, Linux ARM

On Thu, Jun 11, 2020 at 3:49 PM Gregory CLEMENT
<gregory.clement@bootlin.com> wrote:
>
> In pte_offset_kernel() the pte_index macro is used. This macro makes
> the assumption that the address is aligned to a page size.
>
> In arm_pte_allocation, the size allocated is the size needed for 512
> entries. Actually this size was calculated to fit in a 4K page. When
> using larger page, the size of the table allocated is no more
> aligned which end to give a wrong physical address.
>
> The solution is to round up the allocation to a page size instead of
> the exact size of the tables (which is 4KB). It allows to comply with
> the assumption of pte_index() but the drawback is a waste of memory
> for the early allocation if page size is bigger than 4KB.

Have you considered increasing PTRS_PER_PTE instead to fill up
a logical page instead? If that doesn't work, can you explain here
why not?

       Arnd


* Re: [PATCH v2 1/6] ARM: Use PAGE_SIZE for ELF_EXEC_PAGESIZE
  2020-06-12  8:35     ` Russell King - ARM Linux admin
@ 2020-06-12  8:46       ` Arnd Bergmann
  2020-06-12  8:50         ` Russell King - ARM Linux admin
                           ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Arnd Bergmann @ 2020-06-12  8:46 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Gregory CLEMENT, Thomas Petazzoni, Linux ARM

On Fri, Jun 12, 2020 at 10:35 AM Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
> On Fri, Jun 12, 2020 at 10:22:17AM +0200, Arnd Bergmann wrote:
> > On Thu, Jun 11, 2020 at 3:49 PM Gregory CLEMENT
> > <gregory.clement@bootlin.com> wrote:
> > >
> > > Currently ELF_EXEC_PAGESIZE is 4096 which is also the page size. In
> > > order to be able to use other size of page than 4K, use PAGE_SIZE
> > > instead of the hardcoded value.
> > >
> > > The use of PAGE_SIZE will be also aligned with what we find in other
> > > architectures such as arm64.
> > >
> > > This is inspired from fa0ca2726ea9 ("DSMP 64K support") and
> > > 4ef803e12baf ("mmu: large-page: Added support for multiple kernel page
> > > sizes") from
> > > https://github.com/MarvellEmbeddedProcessors/linux-marvell.git
> > >
> > > Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
> >
> > IIRC using page sizes above 16KB here also requires using a
> > non-ancient linker in user space that places sections on
> > ELF_EXEC_PAGESIZE boundaries, right?

Correction: I was thinking of SHMLBA, not ELF_EXEC_PAGESIZE.
SHMLBA is defined to 16KB in arch/arm/ at the moment (based on 4K
page size), or (4 * PAGE_SIZE) on arm64, which can blow up to 256KB.

AFAICT, SHMLBA should now be defined as "min(16384, PAGE_SIZE)".

> Doesn't that mean that this change breaks all existing userspace when
> ELF_EXEC_PAGESIZE is not 4k?

I think a lot of older user space would be broken with page sizes larger
than 16KB, but would still work with 8KB or 16KB. Larger page sizes
would only work with user space that was linked in the last five years
or so, using a toolchain that has the workarounds for running on arm64
with 64KB page size.

      Arnd


* Re: [PATCH v2 1/6] ARM: Use PAGE_SIZE for ELF_EXEC_PAGESIZE
  2020-06-12  8:46       ` Arnd Bergmann
@ 2020-06-12  8:50         ` Russell King - ARM Linux admin
  2020-06-12 11:50         ` Catalin Marinas
  2020-06-12 12:06         ` Gregory CLEMENT
  2 siblings, 0 replies; 22+ messages in thread
From: Russell King - ARM Linux admin @ 2020-06-12  8:50 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Gregory CLEMENT, Thomas Petazzoni, Linux ARM

On Fri, Jun 12, 2020 at 10:46:17AM +0200, Arnd Bergmann wrote:
> Correction: I was thinking of SHMLBA, not ELF_EXEC_PAGESIZE.
> SHMLBA is defined to 16KB in arch/arm/ at the moment (based on 4K
> page size), or (4 * PAGE_SIZE) on arm64, which can blow up to 256KB.
> 
> AFAICT, SHMLBA should now be defined as "min(16384, PAGE_SIZE)".

Yes, because the 16k comes from the aliasing VIPT cache to avoid
aliases.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


* Re: [PATCH v2 1/6] ARM: Use PAGE_SIZE for ELF_EXEC_PAGESIZE
  2020-06-12  8:22   ` Arnd Bergmann
  2020-06-12  8:35     ` Russell King - ARM Linux admin
@ 2020-06-12  8:52     ` Gregory CLEMENT
  1 sibling, 0 replies; 22+ messages in thread
From: Gregory CLEMENT @ 2020-06-12  8:52 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Thomas Petazzoni, Russell King, Linux ARM

Hello Arnd,

> On Thu, Jun 11, 2020 at 3:49 PM Gregory CLEMENT
> <gregory.clement@bootlin.com> wrote:
>>
>> Currently ELF_EXEC_PAGESIZE is 4096 which is also the page size. In
>> order to be able to use other size of page than 4K, use PAGE_SIZE
>> instead of the hardcoded value.
>>
>> The use of PAGE_SIZE will be also aligned with what we find in other
>> architectures such as arm64.
>>
>> This is inspired from fa0ca2726ea9 ("DSMP 64K support") and
>> 4ef803e12baf ("mmu: large-page: Added support for multiple kernel page
>> sizes") from
>> https://github.com/MarvellEmbeddedProcessors/linux-marvell.git
>>
>> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
>
> IIRC using page sizes above 16KB here also requires using a
> non-ancient linker in user space that places sections on
> ELF_EXEC_PAGESIZE boundaries, right?

Actually, I only tested the kernel with userspace built with fairly
recent toolchains. The oldest one I used had gcc 7.3 and ld 2.28.

Gregory

>
>       Arnd

-- 
Gregory Clement, Bootlin
Embedded Linux and Kernel engineering
http://bootlin.com


* Re: [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K)
  2020-06-11 16:21 ` [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K) Russell King - ARM Linux admin
@ 2020-06-12  9:15   ` Gregory CLEMENT
  2020-06-12  9:23   ` Arnd Bergmann
  1 sibling, 0 replies; 22+ messages in thread
From: Gregory CLEMENT @ 2020-06-12  9:15 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Thomas Petazzoni, linux-arm-kernel, Arnd Bergmann

Hello Russell,

> Hi Gregory,
>
> You're on your own with this one; I've no motivation to re-understand
> the ARM page table code now that 32-bit ARM is basically unsupported
> now.

Understood.

>
> I'll point out some of the things you got wrong below though.

Thanks for the pointers, though.

>
> On Thu, Jun 11, 2020 at 03:49:08PM +0200, Gregory CLEMENT wrote:
>> Hello,
>> 
>> On ARM based NAS it is possible to have storage volume larger than
>> 16TB, especially with the use of LVM. However, on 32-bit architectures,
>> the page cache index is stored on 32 bits, which means that given a
>> page size of 4 KB, we can only address volumes of up to 16 TB.
>> 
>> Therefore, one option to use such large volumes and filesystems on 32
>> bits architecture is to increase the page size.
>> 
>> This series allows to support 8K, 16K, 32K and 64K kernel pages. On
>> ARM the size of the page can be either 4K or 64K, so for the other
>> size a "software emulation" is used, here Linux thinks it is using
>> pages of 8 KB, 16 KB or 32 KB, while underneath the MMU still uses 4
>> KB pages.
>> 
>> For ARM there is already a difference between the kernel page and the
>> hardware page in the way they are managed. In the same 4K space the
>> Linux kernel deals with 2 PTE tables at the beginning, while the
>> hardware deals with 2 other hardware PTE tables.
>
> This is incorrect.  The kernel page size and the hardware page size
> match today - both are 4k.  What your'e talking about here is the
> PTE table size.
>
> The kernel requires that each PTE table is contained within one
> struct page.  Since one hardware PTE table is 256 entries, it
> occupies 1024 bytes, so a quarter of a page.  So, to have a single
> 4k page per PTE table would waste quite a bit of space.
>
> Now, the hardware PTE tables do not lend themselves to the kernel's
> usage: the kernel wants additional bits to track the state of each
> page in the page tables.  Hence, we need to shadow every PTE entry.
> This also provides us independence of the underlying hardware PTE
> entry format, which varies between ARM architecture versions.
>
> So, we end up with a single 4k page containing two consecutive
> hardware PTE tables, followed by two Linux PTE tables for the kernels
> benefit.
>

That is what I understood, but it seems I didn't formulate it accurately.

> If you increase the page size, then you need to increase the number
> of tables in a page, or suffer a huge amount of wasted memory taken
> for the page tables - going to an 8k page size means that the upper
> 4k of each page will not be used.  Going to 16k means the upper 12k
> won't be used.  And so on - as your software page size increases,
> the amount of memory wasted for each PTE table will increase
> unless you also increase the number of hardware 1st level entries
> pointing to each PTE page.  With 64k pages, 60k of each PTE page
> will remain unused.

Unfortunately, I was aware of it. But I thought it was an acceptable
drawback for being able to address large volumes on a 32-bit ARM
system. Actually, this is already the case on some products.
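
As a reminder of what the larger page size buys, here is a small C
sketch of the volume limit, assuming (as in the cover letter) a
32-bit page cache index. The function names are made up for
illustration.

```c
#include <stdint.h>

/* Largest volume size in bytes addressable with a 32-bit page index. */
uint64_t max_volume_bytes(unsigned int page_shift)
{
	return (UINT64_C(1) << 32) << page_shift;
}

/* Same limit expressed in TiB (2^40 bytes). */
uint64_t max_volume_tib(unsigned int page_shift)
{
	return max_volume_bytes(page_shift) >> 40;
}
```

For PAGE_SHIFT values 12 to 16 this gives 16, 32, 64, 128 and 256,
which are the limits quoted in the Kconfig help texts.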

> That isn't very efficient use of memory.

Indeed. However, on a 3GB system, in the worst case we need 786432 pages
of 4K to map the memory. These pages can be mapped by 1536 blocks of 512
entries. So when the 64K pages are emulated we lose 1536 x 60KB =
92160KB, about 90MB (around 3% of the memory). So it is not negligible,
but given the use case it seems acceptable.
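
The estimate can be reproduced with the following C sketch. It
assumes every 4K hardware page of RAM is mapped exactly once, that
one PTE page holds 512 hardware entries, and that only 4KB of each
PTE page is used. The function names are illustrative only.

```c
#include <stdint.h>

/* Number of PTE pages needed to map ram_bytes with 4K hardware pages. */
uint64_t pte_pages_needed(uint64_t ram_bytes)
{
	const uint64_t hw_page_size = 4096;
	const uint64_t entries_per_pte_page = 512;
	uint64_t hw_pages = ram_bytes / hw_page_size;

	return (hw_pages + entries_per_pte_page - 1) / entries_per_pte_page;
}

/* Memory left unused in those PTE pages for a given kernel page size. */
uint64_t pte_waste_bytes(uint64_t ram_bytes, uint64_t page_size)
{
	return pte_pages_needed(ram_bytes) * (page_size - 4096);
}
```

For 3GB of RAM and 64K pages this yields 1536 PTE pages and
94371840 wasted bytes (92160KB), just under 3% of the memory.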

Of course, that doesn't prevent us from trying to do better.

Gregory
>
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 503kbps up

-- 
Gregory Clement, Bootlin
Embedded Linux and Kernel engineering
http://bootlin.com


* Re: [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K)
  2020-06-11 16:21 ` [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K) Russell King - ARM Linux admin
  2020-06-12  9:15   ` Gregory CLEMENT
@ 2020-06-12  9:23   ` Arnd Bergmann
  2020-06-12 12:21     ` Catalin Marinas
  1 sibling, 1 reply; 22+ messages in thread
From: Arnd Bergmann @ 2020-06-12  9:23 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Gregory CLEMENT, Thomas Petazzoni, Linux ARM

On Thu, Jun 11, 2020 at 6:21 PM Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:

> If you increase the page size, then you need to increase the number
> of tables in a page, or suffer a huge amount of wasted memory taken
> for the page tables - going to an 8k page size means that the upper
> 4k of each page will not be used.  Going to 16k means the upper 12k
> won't be used.  And so on - as your software page size increases,
> the amount of memory wasted for each PTE table will increase
> unless you also increase the number of hardware 1st level entries
> pointing to each PTE page.  With 64k pages, 60k of each PTE page
> will remain unused.
>
> That isn't very efficient use of memory.

I think this could be addressed by using the full page to contain
PTEs by making PTRS_PER_PTE larger and PTRS_PER_PGD
smaller, but there is an even bigger problem in the added memory
usage and I/O overhead for basically everything else: in any
sparsely populated memory mapped file or anonymous mapping,
the memory usage grows with the page size as well.

I think Synology's vendor kernels for their NAS boxes have a
different hack to make large file systems work, by extending
the internal data types (I forgot which ones) to 64 bit. That is
probably more invasive to the generic kernel code, but should
be much more efficient and less invasive to ARM architecture
specific code.

Either way, I wonder what the intended use cases are. Is this
work mainly intended for

a) running Debian/Buildroot/Yocto/... with (close to) upstream
   kernels on older NAS boxes,
b) commercial products that use 32-bit SoCs in multi-disk
    NAS boxes with vendor upgrades to future kernels, or
c) commercial products using 64-bit SoCs but 32-bit kernels?

My feeling is that any commercial products that need this are
either stuck on old kernels already, or they have moved on
to 64-bit chips and are better off running a 64-bit kernel[1], so
a) seems like the main purpose, right?

       Arnd


* Re: [PATCH v2 4/6] ARM: mm: Aligned pte allocation to one page
  2020-06-12  8:37   ` Arnd Bergmann
@ 2020-06-12 10:25     ` Catalin Marinas
  2020-06-12 11:56     ` Gregory CLEMENT
  1 sibling, 0 replies; 22+ messages in thread
From: Catalin Marinas @ 2020-06-12 10:25 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Linux ARM, Gregory CLEMENT, Russell King, Thomas Petazzoni

On Fri, Jun 12, 2020 at 10:37:15AM +0200, Arnd Bergmann wrote:
> On Thu, Jun 11, 2020 at 3:49 PM Gregory CLEMENT
> <gregory.clement@bootlin.com> wrote:
> > In pte_offset_kernel() the pte_index macro is used. This macro makes
> > the assumption that the address is aligned to a page size.
> >
> > In arm_pte_allocation, the size allocated is the size needed for 512
> > entries. Actually this size was calculated to fit in a 4K page. When
> > using larger page, the size of the table allocated is no more
> > aligned which end to give a wrong physical address.
> >
> > The solution is to round up the allocation to a page size instead of
> > the exact size of the tables (which is 4KB). It allows to comply with
> > the assumption of pte_index() but the drawback is a waste of memory
> > for the early allocation if page size is bigger than 4KB.
> 
> Have you considered increasing PTRS_PER_PTE instead to fill up
> a logical page instead? If that doesn't work, can you explain here
> why not?

From what I remember, increasing the PTRS_PER_PTE also requires changing
the pgd_t array to cover them. As a side-effect, {PMD,PGDIR}_SHIFT would
need to increase. cpu_v7_set_pte_ext() also needs to take care of the
software pte offset (hard-coded at 2048 now).

Many years ago I had some patches to get rid of the software pte offset
but it wasn't really worth the maintenance hassle (only possible for
ARMv6/7). I'm not even sure it's feasible now if we gained more L_PTE_*
bits in the meantime.
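
To make the dependency concrete, here is a rough C sketch of how the
two values move together. It assumes 4-byte entries, a power-of-two
PTRS_PER_PTE, and that the hardware entries sit directly after the
Linux entries, as the "add r0, r0, #2048" in cpu_v7_set_pte_ext()
suggests. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset from a Linux PTE to its hardware counterpart. */
size_t hw_pte_offset(size_t ptrs_per_pte)
{
	return ptrs_per_pte * sizeof(uint32_t);
}

/* log2 of the address range covered by one PTE page. */
unsigned int pgdir_shift(size_t ptrs_per_pte, unsigned int page_shift)
{
	unsigned int shift = page_shift;

	/* Add log2(ptrs_per_pte); assumes a power of two. */
	while (ptrs_per_pte > 1) {
		ptrs_per_pte >>= 1;
		shift++;
	}
	return shift;
}
```

With 512 entries and 4K pages this gives the current offset of 2048
and a shift of 21. Doubling to 1024 entries would move the offset to
4096 and the shift to 22, which is why the pgd handling and the
assembly would both have to change.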

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH v2 1/6] ARM: Use PAGE_SIZE for ELF_EXEC_PAGESIZE
  2020-06-12  8:46       ` Arnd Bergmann
  2020-06-12  8:50         ` Russell King - ARM Linux admin
@ 2020-06-12 11:50         ` Catalin Marinas
  2020-06-12 12:06         ` Gregory CLEMENT
  2 siblings, 0 replies; 22+ messages in thread
From: Catalin Marinas @ 2020-06-12 11:50 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linux ARM, Gregory CLEMENT, Russell King - ARM Linux admin,
	Thomas Petazzoni

On Fri, Jun 12, 2020 at 10:46:17AM +0200, Arnd Bergmann wrote:
> On Fri, Jun 12, 2020 at 10:35 AM Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
> > On Fri, Jun 12, 2020 at 10:22:17AM +0200, Arnd Bergmann wrote:
> > > On Thu, Jun 11, 2020 at 3:49 PM Gregory CLEMENT
> > > <gregory.clement@bootlin.com> wrote:
> > > >
> > > > Currently ELF_EXEC_PAGESIZE is 4096, which is also the page size. In
> > > > order to be able to use page sizes other than 4K, use PAGE_SIZE
> > > > instead of the hardcoded value.
> > > >
> > > > Using PAGE_SIZE is also consistent with what we find in other
> > > > architectures such as arm64.
> > > >
> > > > This is inspired by fa0ca2726ea9 ("DSMP 64K support") and
> > > > 4ef803e12baf ("mmu: large-page: Added support for multiple kernel page
> > > > sizes") from
> > > > https://github.com/MarvellEmbeddedProcessors/linux-marvell.git
> > > >
> > > > Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
> > >
> > > IIRC using page sizes above 16KB here also requires using a
> > > non-ancient linker in user space that places sections on
> > > ELF_EXEC_PAGESIZE boundaries, right?
> 
> Correction: I was thinking of SHMLBA, not ELF_EXEC_PAGESIZE.
> SHMLBA is defined to 16KB in arch/arm/ at the moment (based on 4K
> page size), or (4 * PAGE_SIZE) on arm64, which can blow up to 256KB.
> 
> AFAICT, SHMLBA should now be defined as "min(16384, PAGE_SIZE)".

Good point. We should do this with the COMPAT_SHMLBA on arm64 (we didn't
bother since COMPAT had a dependency on 4K but you can override it with
EXPERT).

> > Doesn't that mean that this change breaks all existing userspace when
> > ELF_EXEC_PAGESIZE is not 4k?
> 
> I think a lot of older user space would be broken with page sizes larger
> than 16KB, but would still work with 8KB or 16KB. Larger page sizes
> would only work with user space that was linked in the last five years
> or so, using a toolchain that has the workarounds for running on arm64
> with 64KB page size.

FWIW, Debian armhf now boots fine on an arm64 kernel with 64K pages (it
wasn't the case some years ago).

-- 
Catalin


* Re: [PATCH v2 4/6] ARM: mm: Aligned pte allocation to one page
  2020-06-12  8:37   ` Arnd Bergmann
  2020-06-12 10:25     ` Catalin Marinas
@ 2020-06-12 11:56     ` Gregory CLEMENT
  1 sibling, 0 replies; 22+ messages in thread
From: Gregory CLEMENT @ 2020-06-12 11:56 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Thomas Petazzoni, Russell King, Linux ARM

Arnd Bergmann <arnd@arndb.de> writes:

> On Thu, Jun 11, 2020 at 3:49 PM Gregory CLEMENT
> <gregory.clement@bootlin.com> wrote:
>>
>> In pte_offset_kernel() the pte_index macro is used. This macro makes
>> the assumption that the address is aligned to a page size.
>>
>> In arm_pte_allocation, the size allocated is the size needed for 512
>> entries. This size was calculated to fit in a 4K page. When using a
>> larger page, the allocated table is no longer page-aligned, which
>> ends up giving a wrong physical address.
>>
>> The solution is to round up the allocation to a page size instead of
>> the exact size of the tables (which is 4KB). This satisfies the
>> assumption of pte_index(), but the drawback is wasted memory for the
>> early allocation if the page size is bigger than 4KB.
>
> Have you considered increasing PTRS_PER_PTE to fill up a logical
> page instead? If that doesn't work, can you explain here why not?

Actually, for this situation I didn't try to do better, since it is only
used very early during boot. After that, I expect the allocation to be
done through slab, with objects of the exact size we need.

However, you also pointed to modifying PTRS_PER_PTE for the overall memory
consumption in the cover letter, and in that case it could be worth
modifying it.

Gregory

>
>        Arnd

-- 
Gregory Clement, Bootlin
Embedded Linux and Kernel engineering
http://bootlin.com


* Re: [PATCH v2 1/6] ARM: Use PAGE_SIZE for ELF_EXEC_PAGESIZE
  2020-06-12  8:46       ` Arnd Bergmann
  2020-06-12  8:50         ` Russell King - ARM Linux admin
  2020-06-12 11:50         ` Catalin Marinas
@ 2020-06-12 12:06         ` Gregory CLEMENT
  2 siblings, 0 replies; 22+ messages in thread
From: Gregory CLEMENT @ 2020-06-12 12:06 UTC (permalink / raw)
  To: Arnd Bergmann, Russell King - ARM Linux admin; +Cc: Thomas Petazzoni, Linux ARM

Hi Arnd,

> On Fri, Jun 12, 2020 at 10:35 AM Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
>> On Fri, Jun 12, 2020 at 10:22:17AM +0200, Arnd Bergmann wrote:
>> > On Thu, Jun 11, 2020 at 3:49 PM Gregory CLEMENT
>> > <gregory.clement@bootlin.com> wrote:
>> > >
>> > > Currently ELF_EXEC_PAGESIZE is 4096, which is also the page size. In
>> > > order to be able to use page sizes other than 4K, use PAGE_SIZE
>> > > instead of the hardcoded value.
>> > >
>> > > Using PAGE_SIZE is also consistent with what we find in other
>> > > architectures such as arm64.
>> > >
>> > > This is inspired by fa0ca2726ea9 ("DSMP 64K support") and
>> > > 4ef803e12baf ("mmu: large-page: Added support for multiple kernel page
>> > > sizes") from
>> > > https://github.com/MarvellEmbeddedProcessors/linux-marvell.git
>> > >
>> > > Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
>> >
>> > IIRC using page sizes above 16KB here also requires using a
>> > non-ancient linker in user space that places sections on
>> > ELF_EXEC_PAGESIZE boundaries, right?
>
> Correction: I was thinking of SHMLBA, not ELF_EXEC_PAGESIZE.
> SHMLBA is defined to 16KB in arch/arm/ at the moment (based on 4K
> page size), or (4 * PAGE_SIZE) on arm64, which can blow up to 256KB.
>
> AFAICT, SHMLBA should now be defined as "min(16384, PAGE_SIZE)".

We took care of it in patch 5:

+#ifdef CONFIG_ARM_LARGE_PAGE_SUPPORT
+#define    SHMLBA    (16 << 10)         /* attach addr a multiple of (4 * 4096) */
+#else
 #define    SHMLBA    (4 * PAGE_SIZE)         /* attach addr a multiple of this */
+#endif

But your version is better; with it we no longer need this ifdef.

Gregory

>
>> Doesn't that mean that this change breaks all existing userspace when
>> ELF_EXEC_PAGESIZE is not 4k?
>
> I think a lot of older user space would be broken with page sizes larger
> than 16KB, but would still work with 8KB or 16KB. Larger page sizes
> would only work with user space that was linked in the last five years
> or so, using a toolchain that has the workarounds for running on arm64
> with 64KB page size.
>
>       Arnd

-- 
Gregory Clement, Bootlin
Embedded Linux and Kernel engineering
http://bootlin.com


* Re: [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K)
  2020-06-12  9:23   ` Arnd Bergmann
@ 2020-06-12 12:21     ` Catalin Marinas
  2020-06-12 12:49       ` Arnd Bergmann
  0 siblings, 1 reply; 22+ messages in thread
From: Catalin Marinas @ 2020-06-12 12:21 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linux ARM, Gregory CLEMENT, Russell King - ARM Linux admin,
	Thomas Petazzoni

On Fri, Jun 12, 2020 at 11:23:11AM +0200, Arnd Bergmann wrote:
> On Thu, Jun 11, 2020 at 6:21 PM Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
> 
> > If you increase the page size, then you need to increase the number
> > of tables in a page, or suffer a huge amount of wasted memory taken
> > for the page tables - going to an 8k page size means that the upper
> > 4k of each page will not be used.  Going to 16k means the upper 12k
> > won't be used.  And so on - as your software page size increases,
> > the amount of memory wasted for each PTE table will increase
> > unless you also increase the number of hardware 1st level entries
> > pointing to each PTE page.  With 64k pages, 60k of each PTE page
> > will remain unused.
> >
> > That isn't very efficient use of memory.
> 
> I think this could be addressed by using the full page to contain
> PTEs by making PTRS_PER_PTE larger and PTRS_PER_PGD
> smaller, but there is an even bigger problem in the added memory
> usage and I/O overhead for basically everything else: in any
> sparsely populated memory mapped file or anonymous mapping,
> the memory usage grows with the page size as well.
> 
> I think Synology's vendor kernels for their NAS boxes have a
> different hack to make large file systems work, by extending
> the internal data types (I forgot which ones) to 64 bit. That is
> probably more invasive to the generic kernel code, but should
> be much more efficient and less invasive to ARM architecture
> specific code.

IIUC from Gregory's cover letter, the problem is page->index, which is a
pgoff_t (unsigned long). This limits us to 32-bit page offsets, so a
44-bit actual file offset (16TB). It may be worth exploring this rather
than hacking the page tables to pretend we have bigger page sizes.

-- 
Catalin


* Re: [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K)
  2020-06-12 12:21     ` Catalin Marinas
@ 2020-06-12 12:49       ` Arnd Bergmann
  0 siblings, 0 replies; 22+ messages in thread
From: Arnd Bergmann @ 2020-06-12 12:49 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Linux ARM, Gregory CLEMENT, Russell King - ARM Linux admin,
	Thomas Petazzoni

On Fri, Jun 12, 2020 at 2:21 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Fri, Jun 12, 2020 at 11:23:11AM +0200, Arnd Bergmann wrote:
> > On Thu, Jun 11, 2020 at 6:21 PM Russell King - ARM Linux admin
> > <linux@armlinux.org.uk> wrote:
> >
> > > If you increase the page size, then you need to increase the number
> > > of tables in a page, or suffer a huge amount of wasted memory taken
> > > for the page tables - going to an 8k page size means that the upper
> > > 4k of each page will not be used.  Going to 16k means the upper 12k
> > > won't be used.  And so on - as your software page size increases,
> > > the amount of memory wasted for each PTE table will increase
> > > unless you also increase the number of hardware 1st level entries
> > > pointing to each PTE page.  With 64k pages, 60k of each PTE page
> > > will remain unused.
> > >
> > > That isn't very efficient use of memory.
> >
> > I think this could be addressed by using the full page to contain
> > PTEs by making PTRS_PER_PTE larger and PTRS_PER_PGD
> > smaller, but there is an even bigger problem in the added memory
> > usage and I/O overhead for basically everything else: in any
> > sparsely populated memory mapped file or anonymous mapping,
> > the memory usage grows with the page size as well.
> >
> > I think Synology's vendor kernels for their NAS boxes have a
> > different hack to make large file systems work, by extending
> > the internal data types (I forgot which ones) to 64 bit. That is
> > probably more invasive to the generic kernel code, but should
> > be much more efficient and less invasive to ARM architecture
> > specific code.
>
> IIUC from Gregory's cover letter, the problem is page->index, which is a
> pgoff_t (unsigned long). This limits us to 32-bit page offsets, so a
> 44-bit actual file offset (16TB). It may be worth exploring this rather
> than hacking the page tables to pretend we have bigger page sizes.

Right, that's at least one type that needs to be changed; there
may be additional ones besides it. In Synology's patch, there is also
a new rdx_t type that is defined the same way and used elsewhere,
at least for some SoCs (they use a maze of #ifdefs to merge
all the vendor kernels, and they also strip all code comments
and git history from the tarballs).

https://pastebin.com/e8C1zhzG has an attempt to split out
the relevant changes from the linux-3.10.105 tarball that they
use on Armada 385, see
https://sourceforge.net/projects/dsgpl/files/Synology%20NAS%20GPL%20Source/24922branch/armada38x-source/linux-3.10.x-bsp.txz/download
for the full kernel sources.

      Arnd


end of thread, other threads:[~2020-06-12 12:49 UTC | newest]

Thread overview: 22+ messages
2020-06-11 13:49 [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K) Gregory CLEMENT
2020-06-11 13:49 ` [PATCH v2 1/6] ARM: Use PAGE_SIZE for ELF_EXEC_PAGESIZE Gregory CLEMENT
2020-06-12  8:22   ` Arnd Bergmann
2020-06-12  8:35     ` Russell King - ARM Linux admin
2020-06-12  8:46       ` Arnd Bergmann
2020-06-12  8:50         ` Russell King - ARM Linux admin
2020-06-12 11:50         ` Catalin Marinas
2020-06-12 12:06         ` Gregory CLEMENT
2020-06-12  8:52     ` Gregory CLEMENT
2020-06-11 13:49 ` [PATCH v2 2/6] ARM: pagetable: prepare hardware page table to use large page Gregory CLEMENT
2020-06-11 13:49 ` [PATCH v2 3/6] ARM: Make the number of fix bitmap depend on the page size Gregory CLEMENT
2020-06-11 13:49 ` [PATCH v2 4/6] ARM: mm: Aligned pte allocation to one page Gregory CLEMENT
2020-06-12  8:37   ` Arnd Bergmann
2020-06-12 10:25     ` Catalin Marinas
2020-06-12 11:56     ` Gregory CLEMENT
2020-06-11 13:49 ` [PATCH v2 5/6] ARM: Add large kernel page support Gregory CLEMENT
2020-06-11 13:49 ` [PATCH v2 6/6] ARM: Add 64K page support at MMU level Gregory CLEMENT
2020-06-11 16:21 ` [PATCH v2 0/6] ARM: Add support for large kernel page (from 8K to 64K) Russell King - ARM Linux admin
2020-06-12  9:15   ` Gregory CLEMENT
2020-06-12  9:23   ` Arnd Bergmann
2020-06-12 12:21     ` Catalin Marinas
2020-06-12 12:49       ` Arnd Bergmann
