* [RFC 0/3] arm: support get_user_pages_fast
@ 2018-09-06 10:22 ` Minchan Kim
0 siblings, 0 replies; 12+ messages in thread
From: Minchan Kim @ 2018-09-06 10:22 UTC (permalink / raw)
To: Andrew Morton, linux
Cc: steve.capper, will.deacon, catalin.marinas, linux-kernel,
linux-arm-kernel, kernel-team, android-treble-mediatek-ext,
Minchan Kim
Recently, I got a report that get_user_pages_fast improves app launch
time by reducing uninterruptible sleep time, because it reduces
mmap_sem lock contention while the app is launching.
To support get_user_pages_fast (gupf) on ARM without LPAE, the first
patch reorders the memory type table so that bit 5 of the page table
entry becomes free. It seems we don't use that bit on ARMv6+, but this
needs a double check from the maintainers.
The second patch introduces L_PTE_SPECIAL for arm so that the last
patch can support get_user_pages_fast.
I would greatly appreciate it if you could review whether I screwed
something up, especially in the architecture-specific parts.
Thanks.
Minchan Kim (3):
arm: mm: reordering memory type table
arm: mm: introduce L_PTE_SPECIAL
arm: mm: support get_user_pages_fast
arch/arm/Kconfig | 2 +-
arch/arm/include/asm/pgtable-2level.h | 16 +-
arch/arm/include/asm/pgtable-3level.h | 6 -
arch/arm/include/asm/pgtable.h | 13 ++
arch/arm/mm/Makefile | 6 +
arch/arm/mm/gup.c | 221 ++++++++++++++++++++++++++
arch/arm/mm/proc-macros.S | 16 +-
7 files changed, 261 insertions(+), 19 deletions(-)
create mode 100644 arch/arm/mm/gup.c
--
2.19.0.rc1.350.ge57e33dbd1-goog
^ permalink raw reply [flat|nested] 12+ messages in thread
* [RFC 1/3] arm: mm: reordering memory type table
2018-09-06 10:22 ` Minchan Kim
@ 2018-09-06 10:22 ` Minchan Kim
-1 siblings, 0 replies; 12+ messages in thread
From: Minchan Kim @ 2018-09-06 10:22 UTC (permalink / raw)
To: Andrew Morton, linux
Cc: steve.capper, will.deacon, catalin.marinas, linux-kernel,
linux-arm-kernel, kernel-team, android-treble-mediatek-ext,
Minchan Kim
To use bit 5 in the page table entry, we need to make room for it, and
it seems we don't need 4 bits for the memory type on ARMv6+.
If so, let's reorder the bits to make bit 5 free.
We will use the bit for L_PTE_SPECIAL in the next patch.
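As a quick sanity check of the bit arithmetic, here is a minimal user-space sketch. It assumes only the values visible in the diff below: the memory type field starts at bit 2, the old mask is 0x0f << 2, and the ARMv6+ mask becomes 0x07 << 2. The names MT_MASK_OLD, MT_MASK_V6PLUS, PTE_BIT5 and overlaps() are made up for illustration and are not kernel identifiers.

```c
#include <stdint.h>

typedef uint32_t pteval_t;  /* stand-in for the kernel's 2-level pteval_t */

/* The memory type field starts at bit 2 in both layouts. */
#define MT_SHIFT        2
#define MT_MASK_OLD     ((pteval_t)0x0f << MT_SHIFT)  /* covers bits 2..5 */
#define MT_MASK_V6PLUS  ((pteval_t)0x07 << MT_SHIFT)  /* covers bits 2..4 */
#define PTE_BIT5        ((pteval_t)1 << 5)            /* bit to be freed */

/* Returns 1 if a software bit collides with the memory type field. */
static int overlaps(pteval_t mt_mask, pteval_t bit)
{
    return (mt_mask & bit) != 0;
}

/* Returns 1 if a memory type index still fits inside the given mask. */
static int mt_fits(pteval_t mt_mask, unsigned int index)
{
    return ((((pteval_t)index) << MT_SHIFT) & ~mt_mask) == 0;
}
```

With the 4-bit mask, bit 5 belongs to the memory type field. With the 3-bit mask it does not, provided every memory type used on ARMv6+ has an index of 7 or less, which is what the reordering is meant to guarantee.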
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
arch/arm/include/asm/pgtable-2level.h | 13 +++++++++++--
arch/arm/mm/proc-macros.S | 16 ++++++++--------
2 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index 92fd2c8a9af0..91b99fadcba1 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -164,14 +164,23 @@
#define L_PTE_MT_BUFFERABLE (_AT(pteval_t, 0x01) << 2) /* 0001 */
#define L_PTE_MT_WRITETHROUGH (_AT(pteval_t, 0x02) << 2) /* 0010 */
#define L_PTE_MT_WRITEBACK (_AT(pteval_t, 0x03) << 2) /* 0011 */
+#define L_PTE_MT_DEV_SHARED (_AT(pteval_t, 0x04) << 2) /* 0100 */
#define L_PTE_MT_MINICACHE (_AT(pteval_t, 0x06) << 2) /* 0110 (sa1100, xscale) */
#define L_PTE_MT_WRITEALLOC (_AT(pteval_t, 0x07) << 2) /* 0111 */
-#define L_PTE_MT_DEV_SHARED (_AT(pteval_t, 0x04) << 2) /* 0100 */
-#define L_PTE_MT_DEV_NONSHARED (_AT(pteval_t, 0x0c) << 2) /* 1100 */
+#if defined(CONFIG_CPU_V7) || defined(CONFIG_CPU_V7M) || \
+ defined (CONFIG_CPU_V6) || defined(CONFIG_CPU_V6K)
+#define L_PTE_MT_DEV_WC L_PTE_MT_BUFFERABLE
+#define L_PTE_MT_DEV_CACHED L_PTE_MT_WRITEBACK
+#define L_PTE_MT_DEV_NONSHARED L_PTE_MT_MINICACHE
+#define L_PTE_MT_VECTORS (_AT(pteval_t, 0x05) << 2) /* 0101 */
+#define L_PTE_MT_MASK (_AT(pteval_t, 0x07) << 2)
+#else
#define L_PTE_MT_DEV_WC (_AT(pteval_t, 0x09) << 2) /* 1001 */
#define L_PTE_MT_DEV_CACHED (_AT(pteval_t, 0x0b) << 2) /* 1011 */
+#define L_PTE_MT_DEV_NONSHARED (_AT(pteval_t, 0x0c) << 2) /* 1100 */
#define L_PTE_MT_VECTORS (_AT(pteval_t, 0x0f) << 2) /* 1111 */
#define L_PTE_MT_MASK (_AT(pteval_t, 0x0f) << 2)
+#endif
#ifndef __ASSEMBLY__
diff --git a/arch/arm/mm/proc-macros.S b/arch/arm/mm/proc-macros.S
index 81d0efb055c6..f896a30653fa 100644
--- a/arch/arm/mm/proc-macros.S
+++ b/arch/arm/mm/proc-macros.S
@@ -134,21 +134,21 @@
.macro armv6_mt_table pfx
\pfx\()_mt_table:
.long 0x00 @ L_PTE_MT_UNCACHED
- .long PTE_EXT_TEX(1) @ L_PTE_MT_BUFFERABLE
+ .long PTE_EXT_TEX(1) @ L_PTE_MT_BUFFERABLE(L_PTE_MT_DEV_WC)
.long PTE_CACHEABLE @ L_PTE_MT_WRITETHROUGH
- .long PTE_CACHEABLE | PTE_BUFFERABLE @ L_PTE_MT_WRITEBACK
+ .long PTE_CACHEABLE | PTE_BUFFERABLE @ L_PTE_MT_WRITEBACK(L_PTE_MT_DEV_CACHED)
.long PTE_BUFFERABLE @ L_PTE_MT_DEV_SHARED
- .long 0x00 @ unused
- .long 0x00 @ L_PTE_MT_MINICACHE (not present)
+ .long PTE_CACHEABLE | PTE_BUFFERABLE | PTE_EXT_APX @ L_PTE_MT_VECTORS
+ .long PTE_EXT_TEX(2) @ L_PTE_MT_DEV_NONSHARED
.long PTE_EXT_TEX(1) | PTE_CACHEABLE | PTE_BUFFERABLE @ L_PTE_MT_WRITEALLOC
.long 0x00 @ unused
- .long PTE_EXT_TEX(1) @ L_PTE_MT_DEV_WC
.long 0x00 @ unused
- .long PTE_CACHEABLE | PTE_BUFFERABLE @ L_PTE_MT_DEV_CACHED
- .long PTE_EXT_TEX(2) @ L_PTE_MT_DEV_NONSHARED
.long 0x00 @ unused
.long 0x00 @ unused
- .long PTE_CACHEABLE | PTE_BUFFERABLE | PTE_EXT_APX @ L_PTE_MT_VECTORS
+ .long 0x00 @ unused
+ .long 0x00 @ unused
+ .long 0x00 @ unused
+ .long 0x00 @ unused
.endm
.macro armv6_set_pte_ext pfx
--
2.19.0.rc1.350.ge57e33dbd1-goog
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RFC 2/3] arm: mm: introduce L_PTE_SPECIAL
2018-09-06 10:22 ` Minchan Kim
@ 2018-09-06 10:22 ` Minchan Kim
-1 siblings, 0 replies; 12+ messages in thread
From: Minchan Kim @ 2018-09-06 10:22 UTC (permalink / raw)
To: Andrew Morton, linux
Cc: steve.capper, will.deacon, catalin.marinas, linux-kernel,
linux-arm-kernel, kernel-team, android-treble-mediatek-ext,
Minchan Kim
This patch introduces L_PTE_SPECIAL and pte functions for supporting
get_user_pages_fast.
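For readers unfamiliar with the special bit, the following user-space model sketches what the two helpers do. It assumes a 32-bit pteval_t and a simplified pte_t wrapper. The model_ prefix marks these as illustrative stand-ins, not the kernel functions, which use pte_isset() and set_pte_bit() as shown in the diff below.

```c
#include <stdint.h>

typedef uint32_t pteval_t;                 /* simplified 2-level PTE value */
typedef struct { pteval_t pte; } pte_t;    /* simplified wrapper type */

#define L_PTE_SPECIAL ((pteval_t)1 << 5)   /* bit 5, as defined in this patch */

/* Model of pte_special(): true if the special bit is set. */
static inline int model_pte_special(pte_t pte)
{
    return (pte.pte & L_PTE_SPECIAL) != 0;
}

/* Model of pte_mkspecial(): returns a copy with the special bit set. */
static inline pte_t model_pte_mkspecial(pte_t pte)
{
    pte.pte |= L_PTE_SPECIAL;
    return pte;
}
```

The lockless walk in the last patch bails out on any PTE for which pte_special() is true, since such mappings are not expected to have a normal struct page that can be pinned.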
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
arch/arm/Kconfig | 2 +-
arch/arm/include/asm/pgtable-2level.h | 3 +--
arch/arm/include/asm/pgtable-3level.h | 6 ------
arch/arm/include/asm/pgtable.h | 13 +++++++++++++
4 files changed, 15 insertions(+), 9 deletions(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e8cd55a5b04c..5d4489a019c4 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -10,7 +10,7 @@ config ARM
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_KCOV
select ARCH_HAS_MEMBARRIER_SYNC_CORE
- select ARCH_HAS_PTE_SPECIAL if ARM_LPAE
+ select ARCH_HAS_PTE_SPECIAL if (ARM_LPAE || CPU_V7 || CPU_V7M || CPU_V6 || CPU_V6K)
select ARCH_HAS_PHYS_TO_DMA
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index 91b99fadcba1..82386a2b84e7 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -120,6 +120,7 @@
#define L_PTE_VALID (_AT(pteval_t, 1) << 0) /* Valid */
#define L_PTE_PRESENT (_AT(pteval_t, 1) << 0)
#define L_PTE_YOUNG (_AT(pteval_t, 1) << 1)
+#define L_PTE_SPECIAL (_AT(pteval_t, 1) << 5)
#define L_PTE_DIRTY (_AT(pteval_t, 1) << 6)
#define L_PTE_RDONLY (_AT(pteval_t, 1) << 7)
#define L_PTE_USER (_AT(pteval_t, 1) << 8)
@@ -222,8 +223,6 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
#define pmd_addr_end(addr,end) (end)
#define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)
-#define pte_special(pte) (0)
-static inline pte_t pte_mkspecial(pte_t pte) { return pte; }
/*
* We don't have huge page support for short descriptors, for the moment
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 6d50a11d7793..b6f52e16b478 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -213,12 +213,6 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
#define pmd_present(pmd) (pmd_isset((pmd), L_PMD_SECT_VALID))
#define pmd_young(pmd) (pmd_isset((pmd), PMD_SECT_AF))
-#define pte_special(pte) (pte_isset((pte), L_PTE_SPECIAL))
-static inline pte_t pte_mkspecial(pte_t pte)
-{
- pte_val(pte) |= L_PTE_SPECIAL;
- return pte;
-}
#define pmd_write(pmd) (pmd_isclear((pmd), L_PMD_SECT_RDONLY))
#define pmd_dirty(pmd) (pmd_isset((pmd), L_PMD_SECT_DIRTY))
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index a757401129f9..6cc7ce0e423e 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -228,6 +228,11 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)
#define pte_dirty(pte) (pte_isset((pte), L_PTE_DIRTY))
#define pte_young(pte) (pte_isset((pte), L_PTE_YOUNG))
#define pte_exec(pte) (pte_isclear((pte), L_PTE_XN))
+#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
+#define pte_special(pte) (pte_isset((pte), L_PTE_SPECIAL))
+#else
+#define pte_special(pte) (0)
+#endif
#define pte_valid_user(pte) \
(pte_valid(pte) && pte_isset((pte), L_PTE_USER) && pte_young(pte))
@@ -318,6 +323,14 @@ static inline pte_t pte_mknexec(pte_t pte)
return set_pte_bit(pte, __pgprot(L_PTE_XN));
}
+#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
+static inline pte_t pte_mkspecial(pte_t pte)
+{
+ return set_pte_bit(pte, __pgprot(L_PTE_SPECIAL));
+}
+#else
+static inline pte_t pte_mkspecial(pte_t pte) { return pte; }
+#endif
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{
const pteval_t mask = L_PTE_XN | L_PTE_RDONLY | L_PTE_USER |
--
2.19.0.rc1.350.ge57e33dbd1-goog
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RFC 3/3] arm: mm: support get_user_pages_fast
2018-09-06 10:22 ` Minchan Kim
@ 2018-09-06 10:22 ` Minchan Kim
-1 siblings, 0 replies; 12+ messages in thread
From: Minchan Kim @ 2018-09-06 10:22 UTC (permalink / raw)
To: Andrew Morton, linux
Cc: steve.capper, will.deacon, catalin.marinas, linux-kernel,
linux-arm-kernel, kernel-team, android-treble-mediatek-ext,
Minchan Kim
Recently, there was a report that get_user_pages_fast helps app launch
speed by reducing uninterruptible sleep time, I believe because we
don't need to contend for mmap_sem.
With get_user_pages_fast, that uninterruptible sleep time is reduced
by about 5~10% in testing.
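The one subtle part of get_user_pages_fast() in the diff below is how the lockless result is merged with the fallback result. A minimal sketch of that rule, as I read the code: nr is the number of pages pinned by the lockless walk, slow_ret is what get_user_pages_unlocked() returned for the remainder, and combine_gup_result() is a hypothetical helper name used only here.

```c
/*
 * Model of the return value handling when the lockless walk pinned
 * fewer pages than requested and the slow path ran for the remainder.
 *
 * nr:       pages already pinned by the lockless walk (>= 0)
 * slow_ret: result of the slow path, either a page count or -errno
 */
static int combine_gup_result(int nr, int slow_ret)
{
    if (nr > 0) {
        /* Some pages are already pinned: never report an error. */
        if (slow_ret < 0)
            return nr;
        return nr + slow_ret;
    }
    /* Nothing pinned locklessly: pass the slow path result through. */
    return slow_ret;
}
```

In other words, an error from the slow path is only visible to the caller when the fast path pinned nothing at all.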
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
arch/arm/mm/Makefile | 6 ++
arch/arm/mm/gup.c | 221 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 227 insertions(+)
create mode 100644 arch/arm/mm/gup.c
diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 7cb1699fbfc4..f55f96d56843 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -13,6 +13,12 @@ obj-y += nommu.o
obj-$(CONFIG_ARM_MPU) += pmsa-v7.o pmsa-v8.o
endif
+ifneq ($(CONFIG_ARM_LPAE),y)
+ifeq ($(CONFIG_ARCH_HAS_PTE_SPECIAL),y)
+obj-$(CONFIG_MMU) += gup.o
+endif
+endif
+
obj-$(CONFIG_ARM_PTDUMP_CORE) += dump.o
obj-$(CONFIG_ARM_PTDUMP_DEBUGFS) += ptdump_debugfs.o
obj-$(CONFIG_MODULES) += proc-syms.o
diff --git a/arch/arm/mm/gup.c b/arch/arm/mm/gup.c
new file mode 100644
index 000000000000..44e12fb7430e
--- /dev/null
+++ b/arch/arm/mm/gup.c
@@ -0,0 +1,221 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+#include <linux/pagemap.h>
+#include <asm/pgtable.h>
+
+static inline pte_t gup_get_pte(pte_t *ptep)
+{
+ return READ_ONCE(*ptep);
+}
+
+static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
+ int write, struct page **pages, int *nr)
+{
+ int ret = 0;
+ pte_t *ptep, *ptem;
+
+ ptem = ptep = pte_offset_map(&pmd, addr);
+ do {
+ pte_t pte = gup_get_pte(ptep);
+ struct page *page;
+
+ if (!pte_access_permitted(pte, write))
+ goto pte_unmap;
+
+ if (pte_special(pte))
+ goto pte_unmap;
+
+ VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
+ page = pte_page(pte);
+
+ if (!page_cache_get_speculative(page))
+ goto pte_unmap;
+
+ if (unlikely(pte_val(pte) != pte_val(*ptep))) {
+ put_page(page);
+ goto pte_unmap;
+ }
+
+ SetPageReferenced(page);
+ pages[*nr] = page;
+ (*nr)++;
+
+ } while (ptep++, addr += PAGE_SIZE, addr != end);
+
+ ret = 1;
+
+pte_unmap:
+ pte_unmap(ptem);
+ return ret;
+}
+
+static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
+ int write, struct page **pages, int *nr)
+{
+ unsigned long next;
+ pmd_t *pmdp;
+
+ pmdp = pmd_offset(&pud, addr);
+ do {
+ pmd_t pmd = READ_ONCE(*pmdp);
+
+ next = pmd_addr_end(addr, end);
+ if (!pmd_present(pmd))
+ return 0;
+ else if (!gup_pte_range(pmd, addr, next, write, pages, nr))
+ return 0;
+ } while (pmdp++, addr = next, addr != end);
+
+ return 1;
+}
+
+static int gup_pud_range(p4d_t *p4dp, unsigned long addr, unsigned long end,
+ int write, struct page **pages, int *nr)
+{
+ unsigned long next;
+ pud_t *pudp;
+
+ pudp = pud_offset(p4dp, addr);
+ do {
+ pud_t pud = READ_ONCE(*pudp);
+
+ next = pud_addr_end(addr, end);
+ if (pud_none(pud))
+ return 0;
+ else if (!gup_pmd_range(pud, addr, next, write, pages, nr))
+ return 0;
+ } while (pudp++, addr = next, addr != end);
+
+ return 1;
+}
+
+static int gup_p4d_range(pgd_t *pgdp, unsigned long addr, unsigned long end,
+ int write, struct page **pages, int *nr)
+{
+ unsigned long next;
+ p4d_t *p4dp;
+
+ p4dp = p4d_offset(pgdp, addr);
+ do {
+ next = p4d_addr_end(addr, end);
+ if (p4d_none(*p4dp)) {
+ return 0;
+ } else if (!gup_pud_range(p4dp, addr, next, write, pages, nr))
+ return 0;
+ } while (p4dp++, addr = next, addr != end);
+
+ return 1;
+}
+
+
+static void gup_pgd_range(unsigned long addr, unsigned long end,
+ int write, struct page **pages, int *nr)
+{
+ unsigned long next;
+ pgd_t *pgdp;
+
+ pgdp = pgd_offset(current->mm, addr);
+ do {
+ next = pgd_addr_end(addr, end);
+ if (pgd_none(*pgdp))
+ return;
+ else if (!gup_p4d_range(pgdp, addr, next, write, pages, nr))
+ break;
+ } while (pgdp++, addr = next, addr != end);
+}
+
+bool gup_fast_permitted(unsigned long start, int nr_pages, int write)
+{
+ unsigned long len, end;
+
+ len = (unsigned long) nr_pages << PAGE_SHIFT;
+ end = start + len;
+ return end >= start;
+}
+
+/*
+ * Like get_user_pages_fast() except it's IRQ-safe in that it won't fall
+ * back to the regular GUP.
+ */
+int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
+ struct page **pages)
+{
+ unsigned long addr, len, end;
+ unsigned long flags;
+ int nr = 0;
+
+ start &= PAGE_MASK;
+ addr = start;
+ len = (unsigned long) nr_pages << PAGE_SHIFT;
+ end = start + len;
+
+ if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
+ (void __user *)start, len)))
+ return 0;
+
+ /*
+ * Disable interrupts. We use the nested form as we can already have
+ * interrupts disabled by get_futex_key.
+ *
+ * With interrupts disabled, we block page table pages from being
+ * freed from under us. See mmu_gather_tlb in asm-generic/tlb.h
+ * for more details.
+ *
+ * We do not adopt an rcu_read_lock(.) here as we also want to
+ * block IPIs that come from THPs splitting.
+ */
+
+ if (gup_fast_permitted(start, nr_pages, write)) {
+ local_irq_save(flags);
+ gup_pgd_range(addr, end, write, pages, &nr);
+ local_irq_restore(flags);
+ }
+
+ return nr;
+}
+
+int get_user_pages_fast(unsigned long start, int nr_pages, int write,
+ struct page **pages)
+{
+ unsigned long addr, len, end;
+ int nr = 0, ret = 0;
+
+ start &= PAGE_MASK;
+ addr = start;
+ len = (unsigned long) nr_pages << PAGE_SHIFT;
+ end = start + len;
+
+ if (nr_pages <= 0)
+ return 0;
+
+ if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
+ (void __user *)start, len)))
+ return -EFAULT;
+
+ if (gup_fast_permitted(start, nr_pages, write)) {
+ local_irq_disable();
+ gup_pgd_range(addr, end, write, pages, &nr);
+ local_irq_enable();
+ ret = nr;
+ }
+
+ if (nr < nr_pages) {
+ /* Try to get the remaining pages with get_user_pages */
+ start += nr << PAGE_SHIFT;
+ pages += nr;
+
+ ret = get_user_pages_unlocked(start, nr_pages - nr, pages,
+ write ? FOLL_WRITE : 0);
+
+ /* Have to be a bit careful with return values */
+ if (nr > 0) {
+ if (ret < 0)
+ ret = nr;
+ else
+ ret += nr;
+ }
+ }
+
+ return ret;
+}
--
2.19.0.rc1.350.ge57e33dbd1-goog
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [RFC 1/3] arm: mm: reordering memory type table
2018-09-06 10:22 ` Minchan Kim
@ 2018-09-10 16:50 ` Catalin Marinas
-1 siblings, 0 replies; 12+ messages in thread
From: Catalin Marinas @ 2018-09-10 16:50 UTC (permalink / raw)
To: Minchan Kim
Cc: Andrew Morton, linux, steve.capper, will.deacon, linux-kernel,
android-treble-mediatek-ext, kernel-team, linux-arm-kernel,
Simon Horman
On Thu, Sep 06, 2018 at 07:22:10PM +0900, Minchan Kim wrote:
> diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
> index 92fd2c8a9af0..91b99fadcba1 100644
> --- a/arch/arm/include/asm/pgtable-2level.h
> +++ b/arch/arm/include/asm/pgtable-2level.h
> @@ -164,14 +164,23 @@
> #define L_PTE_MT_BUFFERABLE (_AT(pteval_t, 0x01) << 2) /* 0001 */
> #define L_PTE_MT_WRITETHROUGH (_AT(pteval_t, 0x02) << 2) /* 0010 */
> #define L_PTE_MT_WRITEBACK (_AT(pteval_t, 0x03) << 2) /* 0011 */
> +#define L_PTE_MT_DEV_SHARED (_AT(pteval_t, 0x04) << 2) /* 0100 */
> #define L_PTE_MT_MINICACHE (_AT(pteval_t, 0x06) << 2) /* 0110 (sa1100, xscale) */
> #define L_PTE_MT_WRITEALLOC (_AT(pteval_t, 0x07) << 2) /* 0111 */
> -#define L_PTE_MT_DEV_SHARED (_AT(pteval_t, 0x04) << 2) /* 0100 */
> -#define L_PTE_MT_DEV_NONSHARED (_AT(pteval_t, 0x0c) << 2) /* 1100 */
> +#if defined(CONFIG_CPU_V7) || defined(CONFIG_CPU_V7M) || \
> + defined (CONFIG_CPU_V6) || defined(CONFIG_CPU_V6K)
> +#define L_PTE_MT_DEV_WC L_PTE_MT_BUFFERABLE
> +#define L_PTE_MT_DEV_CACHED L_PTE_MT_WRITEBACK
> +#define L_PTE_MT_DEV_NONSHARED L_PTE_MT_MINICACHE
I think you can just ignore v7M here, it doesn't have an MMU.
You are defining L_PTE_MT_DEV_NONSHARED to L_PTE_MT_MINICACHE but what I
think you just meant is index 6 in the cpu_v6_mt_table which I would use
explicitly to avoid confusion.
Anyway, on ARMv7 or ARMv6+LPAE, the non-shared device gets mapped to
shared device in hardware. Looking through the arm32 code, it seems that
MT_DEVICE_NONSHARED is used by arch/arm/mach-shmobile/setup-r8a7779.c
and IIUC that's a v7 platform (R-Car H1, Cortex-A9). I think the above
should be defined to L_PTE_MT_DEV_SHARED, unless I miss any place where
DEV_NONSHARED is relevant on ARMv6 (adding Simon to confirm on
shmobile).
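
To make the encoding under discussion concrete, here is a small userspace sketch of the memory-type values quoted in the hunk, with L_PTE_MT_DEV_NONSHARED aliased to L_PTE_MT_DEV_SHARED as suggested above rather than to L_PTE_MT_MINICACHE as posted. The SKETCH_* names, OLD_L_PTE_MT_DEV_NONSHARED, mt_index() and mt_fits_three_bits() are hypothetical. The three-bit mask, and the reading of the cover letter's "5th bit" as bit 5 of the PTE, are assumptions rather than something the quoted hunk states.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t pteval_t;

/* Encodings as quoted in the hunk: a type index shifted up by 2. */
#define L_PTE_MT_BUFFERABLE	((pteval_t)0x01 << 2)
#define L_PTE_MT_WRITETHROUGH	((pteval_t)0x02 << 2)
#define L_PTE_MT_WRITEBACK	((pteval_t)0x03 << 2)
#define L_PTE_MT_DEV_SHARED	((pteval_t)0x04 << 2)
#define L_PTE_MT_MINICACHE	((pteval_t)0x06 << 2)
#define L_PTE_MT_WRITEALLOC	((pteval_t)0x07 << 2)

/* ARMv6/v7 aliases from the patch. */
#define L_PTE_MT_DEV_WC		L_PTE_MT_BUFFERABLE
#define L_PTE_MT_DEV_CACHED	L_PTE_MT_WRITEBACK
/* Review suggestion, not the posted patch: fold non-shared into shared. */
#define L_PTE_MT_DEV_NONSHARED	L_PTE_MT_DEV_SHARED

/* The value removed by the patch, kept here only for comparison. */
#define OLD_L_PTE_MT_DEV_NONSHARED	((pteval_t)0x0c << 2)

/* Assumptions: a three-bit type field, and bit 5 as the bit being freed. */
#define SKETCH_MT_MASK	((pteval_t)0x07 << 2)
#define SKETCH_PTE_BIT5	((pteval_t)1 << 5)

/* Table index selected by a memory-type encoding. */
static unsigned int mt_index(pteval_t mt)
{
	return (unsigned int)((mt & SKETCH_MT_MASK) >> 2);
}

/* True when the encoding stays within the assumed three-bit field. */
static int mt_fits_three_bits(pteval_t mt)
{
	return (mt & ~SKETCH_MT_MASK) == 0;
}

/* The highest reordered encoding stays clear of bit 5; the old 0x0c did not. */
static_assert((L_PTE_MT_WRITEALLOC & SKETCH_PTE_BIT5) == 0,
	      "reordered types must not touch bit 5");
static_assert((OLD_L_PTE_MT_DEV_NONSHARED & SKETCH_PTE_BIT5) != 0,
	      "old non-shared device encoding used bit 5");
```

With the alias pointing at L_PTE_MT_DEV_SHARED, index 6 keeps a single meaning (minicache), which avoids the confusion noted above.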
> diff --git a/arch/arm/mm/proc-macros.S b/arch/arm/mm/proc-macros.S
> index 81d0efb055c6..f896a30653fa 100644
> --- a/arch/arm/mm/proc-macros.S
> +++ b/arch/arm/mm/proc-macros.S
> @@ -134,21 +134,21 @@
> .macro armv6_mt_table pfx
> \pfx\()_mt_table:
Since you changed the MT index, you'd have to fix proc-v7-*levels.S as
well. If you define DEV_NONSHARED to SHARED, I think you only need to
update the index for L_PTE_MT_VECTORS.
--
Catalin
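
The reason every per-CPU table has to follow a change of index can be seen in a simplified model of the lookup. This is a sketch, not the assembly in proc-macros.S: it assumes the type field of a PTE selects a slot in a fixed-order table, uses string labels in place of the hardware attribute words, and all sketch_* and SKETCH_* names are hypothetical. Slot 0 is assumed to be the uncached type, which the quoted hunk does not show.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MT_SHIFT		2
#define SKETCH_MT_ENTRIES	8	/* assumption: 3-bit type field after the reorder */

static_assert(SKETCH_MT_ENTRIES == (1 << 3),
	      "table size must match the assumed 3-bit field");

/* Labels stand in for the attribute words a real table would hold. */
static const char *const sketch_mt_table[SKETCH_MT_ENTRIES] = {
	"uncached",	/* 0: assumed, not shown in the quoted hunk */
	"bufferable",	/* 1 */
	"writethrough",	/* 2 */
	"writeback",	/* 3 */
	"dev_shared",	/* 4 */
	NULL,		/* 5: no type quoted for this slot */
	"minicache",	/* 6 */
	"writealloc",	/* 7 */
};

/* Returns the slot selected by a type-field value, or NULL if out of range. */
static const char *sketch_mt_lookup(unsigned long pte_mt)
{
	size_t idx = pte_mt >> SKETCH_MT_SHIFT;

	if (idx >= SKETCH_MT_ENTRIES)
		return NULL;
	return sketch_mt_table[idx];
}
```

In this model an encoding such as the old 0x0c << 2 has no slot once the table shrinks, so any table still ordered by the old indices would hand back the wrong attributes for a reordered type.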
* Re: [RFC 1/3] arm: mm: reordering memory type table
2018-09-10 16:50 ` Catalin Marinas
@ 2018-09-14 6:26 ` Minchan Kim
-1 siblings, 0 replies; 12+ messages in thread
From: Minchan Kim @ 2018-09-14 6:26 UTC (permalink / raw)
To: Catalin Marinas
Cc: Andrew Morton, linux, steve.capper, will.deacon, linux-kernel,
android-treble-mediatek-ext, kernel-team, linux-arm-kernel,
Simon Horman
On Mon, Sep 10, 2018 at 05:50:11PM +0100, Catalin Marinas wrote:
> On Thu, Sep 06, 2018 at 07:22:10PM +0900, Minchan Kim wrote:
> > diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
> > index 92fd2c8a9af0..91b99fadcba1 100644
> > --- a/arch/arm/include/asm/pgtable-2level.h
> > +++ b/arch/arm/include/asm/pgtable-2level.h
> > @@ -164,14 +164,23 @@
> > #define L_PTE_MT_BUFFERABLE (_AT(pteval_t, 0x01) << 2) /* 0001 */
> > #define L_PTE_MT_WRITETHROUGH (_AT(pteval_t, 0x02) << 2) /* 0010 */
> > #define L_PTE_MT_WRITEBACK (_AT(pteval_t, 0x03) << 2) /* 0011 */
> > +#define L_PTE_MT_DEV_SHARED (_AT(pteval_t, 0x04) << 2) /* 0100 */
> > #define L_PTE_MT_MINICACHE (_AT(pteval_t, 0x06) << 2) /* 0110 (sa1100, xscale) */
> > #define L_PTE_MT_WRITEALLOC (_AT(pteval_t, 0x07) << 2) /* 0111 */
> > -#define L_PTE_MT_DEV_SHARED (_AT(pteval_t, 0x04) << 2) /* 0100 */
> > -#define L_PTE_MT_DEV_NONSHARED (_AT(pteval_t, 0x0c) << 2) /* 1100 */
> > +#if defined(CONFIG_CPU_V7) || defined(CONFIG_CPU_V7M) || \
> > + defined (CONFIG_CPU_V6) || defined(CONFIG_CPU_V6K)
> > +#define L_PTE_MT_DEV_WC L_PTE_MT_BUFFERABLE
> > +#define L_PTE_MT_DEV_CACHED L_PTE_MT_WRITEBACK
> > +#define L_PTE_MT_DEV_NONSHARED L_PTE_MT_MINICACHE
>
> I think you can just ignore v7M here, it doesn't have an MMU.
I didn't know that. Will fix.
>
> You are defining L_PTE_MT_DEV_NONSHARED to L_PTE_MT_MINICACHE but what I
> think you just meant is index 6 in the cpu_v6_mt_table which I would use
> explicitly to avoid confusion.
>
> Anyway, on ARMv7 or ARMv6+LPAE, the non-shared device gets mapped to
> shared device in hardware. Looking through the arm32 code, it seems that
Thanks for the information. I didn't know that.
> MT_DEVICE_NONSHARED is used by arch/arm/mach-shmobile/setup-r8a7779.c
> and IIUC that's a v7 platform (R-Car H1, Cortex-A9). I think the above
> should be defined to L_PTE_MT_DEV_SHARED, unless I miss any place where
> DEV_NONSHARED is relevant on ARMv6 (adding Simon to confirm on
> shmobile).
Simon, could you confirm this?
>
> > diff --git a/arch/arm/mm/proc-macros.S b/arch/arm/mm/proc-macros.S
> > index 81d0efb055c6..f896a30653fa 100644
> > --- a/arch/arm/mm/proc-macros.S
> > +++ b/arch/arm/mm/proc-macros.S
> > @@ -134,21 +134,21 @@
> > .macro armv6_mt_table pfx
> > \pfx\()_mt_table:
>
> Since you changed the MT index, you'd have to fix proc-v7-*levels.S as
> well. If you define DEV_NONSHARED to SHARED, I think you only need to
> update the index for L_PTE_MT_VECTORS.
Good idea. I will try it.
Thanks for the review, Catalin.