Linux-csky Archive on lore.kernel.org
 help / color / Atom feed
* [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
@ 2020-09-19  9:17 Thomas Gleixner
  2020-09-19  9:17 ` [patch RFC 01/15] mm/highmem: Un-EXPORT __kmap_atomic_idx() Thomas Gleixner
                   ` (16 more replies)
  0 siblings, 17 replies; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-19  9:17 UTC (permalink / raw)
  To: LKML
  Cc: linux-arch, Linus Torvalds, Paul McKenney, x86,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta, linux-snps-arc,
	Arnd Bergmann, Guo Ren, linux-csky, Michal Simek,
	Thomas Bogendoerfer, linux-mips, Nick Hu, Greentime Hu,
	Vincent Chen, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, David S. Miller, sparclinux

First of all, sorry for the horribly big Cc list!

Following up to the discussion in:

  https://lore.kernel.org/r/20200914204209.256266093@linutronix.de

this provides a preemptible variant of kmap_atomic & related
interfaces. This is achieved by:

 - Consolidating all kmap atomic implementations in generic code

 - Switching from per CPU storage of the kmap index to a per task storage

 - Adding a pteval array to the per task storage which contains the ptevals
   of the currently active temporary kmaps

 - Adding context switch code which checks whether the outgoing or the
   incoming task has active temporary kmaps. If so, the outgoing task's
   kmaps are removed and the incoming task's kmaps are restored.

 - Adding new interfaces k[un]map_temporary*() which are not disabling
   preemption and can be called from any context (except NMI).

   Contrary to kmap() which provides preemptible and "persistant" mappings,
   these interfaces are meant to replace the temporary mappings provided by
   kmap_atomic*() today.

This allows to get rid of conditional mapping choices and allows to have
preemptible short term mappings on 64bit which are today enforced to be
non-preemptible due to the highmem constraints. It clearly puts overhead on
the highmem users, but highmem is slow anyway.

This is not a wholesale conversion which makes kmap_atomic magically
preemptible because there might be usage sites which rely on the implicit
preempt disable. So this needs to be done on a case by case basis and the
call sites converted to kmap_temporary.

Note, that this is only lightly tested on X86 and completely untested on
all other architectures.

The lot is also available from

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git highmem

Thanks,

	tglx
---
 a/arch/arm/mm/highmem.c               |  121 ---------------------
 a/arch/microblaze/mm/highmem.c        |   78 -------------
 a/arch/nds32/mm/highmem.c             |   48 --------
 a/arch/powerpc/mm/highmem.c           |   67 -----------
 a/arch/sparc/mm/highmem.c             |  115 --------------------
 arch/arc/Kconfig                      |    1 
 arch/arc/include/asm/highmem.h        |    8 +
 arch/arc/mm/highmem.c                 |   44 -------
 arch/arm/Kconfig                      |    1 
 arch/arm/include/asm/highmem.h        |   30 +++--
 arch/arm/mm/Makefile                  |    1 
 arch/csky/Kconfig                     |    1 
 arch/csky/include/asm/highmem.h       |    4 
 arch/csky/mm/highmem.c                |   75 -------------
 arch/microblaze/Kconfig               |    1 
 arch/microblaze/include/asm/highmem.h |    6 -
 arch/microblaze/mm/Makefile           |    1 
 arch/microblaze/mm/init.c             |    6 -
 arch/mips/Kconfig                     |    1 
 arch/mips/include/asm/highmem.h       |    4 
 arch/mips/mm/highmem.c                |   77 -------------
 arch/mips/mm/init.c                   |    3 
 arch/nds32/Kconfig.cpu                |    1 
 arch/nds32/include/asm/highmem.h      |   21 ++-
 arch/nds32/mm/Makefile                |    1 
 arch/powerpc/Kconfig                  |    1 
 arch/powerpc/include/asm/highmem.h    |    6 -
 arch/powerpc/mm/Makefile              |    1 
 arch/powerpc/mm/mem.c                 |    7 -
 arch/sparc/Kconfig                    |    1 
 arch/sparc/include/asm/highmem.h      |    7 -
 arch/sparc/mm/Makefile                |    3 
 arch/sparc/mm/srmmu.c                 |    2 
 arch/x86/include/asm/fixmap.h         |    1 
 arch/x86/include/asm/highmem.h        |   12 +-
 arch/x86/include/asm/iomap.h          |   29 +++--
 arch/x86/mm/highmem_32.c              |   59 ----------
 arch/x86/mm/init_32.c                 |   15 --
 arch/x86/mm/iomap_32.c                |   57 ----------
 arch/xtensa/Kconfig                   |    1 
 arch/xtensa/include/asm/highmem.h     |    9 +
 arch/xtensa/mm/highmem.c              |   44 -------
 b/arch/x86/Kconfig                    |    3 
 include/linux/highmem.h               |  141 +++++++++++++++---------
 include/linux/io-mapping.h            |    2 
 include/linux/sched.h                 |    9 +
 kernel/sched/core.c                   |   10 +
 mm/Kconfig                            |    3 
 mm/highmem.c                          |  192 ++++++++++++++++++++++++++++++++--
 49 files changed, 422 insertions(+), 909 deletions(-)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [patch RFC 01/15] mm/highmem: Un-EXPORT __kmap_atomic_idx()
  2020-09-19  9:17 [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Thomas Gleixner
@ 2020-09-19  9:17 ` Thomas Gleixner
  2020-09-21  6:23   ` Christoph Hellwig
  2020-09-19  9:17 ` [patch RFC 02/15] highmem: Provide generic variant of kmap_atomic* Thomas Gleixner
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-19  9:17 UTC (permalink / raw)
  To: LKML
  Cc: linux-arch, Linus Torvalds, Paul McKenney, x86,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta, linux-snps-arc,
	Arnd Bergmann, Guo Ren, linux-csky, Michal Simek,
	Thomas Bogendoerfer, linux-mips, Nick Hu, Greentime Hu,
	Vincent Chen, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, David S. Miller, sparclinux

Nothing in modules can use that.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 mm/highmem.c |    2 --
 1 file changed, 2 deletions(-)

--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -108,8 +108,6 @@ static inline wait_queue_head_t *get_pkm
 atomic_long_t _totalhigh_pages __read_mostly;
 EXPORT_SYMBOL(_totalhigh_pages);
 
-EXPORT_PER_CPU_SYMBOL(__kmap_atomic_idx);
-
 unsigned int nr_free_highpages (void)
 {
 	struct zone *zone;


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [patch RFC 02/15] highmem: Provide generic variant of kmap_atomic*
  2020-09-19  9:17 [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Thomas Gleixner
  2020-09-19  9:17 ` [patch RFC 01/15] mm/highmem: Un-EXPORT __kmap_atomic_idx() Thomas Gleixner
@ 2020-09-19  9:17 ` Thomas Gleixner
  2020-09-21  6:28   ` Christoph Hellwig
  2020-09-19  9:17 ` [patch RFC 03/15] x86/mm/highmem: Use generic kmap atomic implementation Thomas Gleixner
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-19  9:17 UTC (permalink / raw)
  To: LKML
  Cc: linux-arch, Linus Torvalds, Paul McKenney, x86,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta, linux-snps-arc,
	Arnd Bergmann, Guo Ren, linux-csky, Michal Simek,
	Thomas Bogendoerfer, linux-mips, Nick Hu, Greentime Hu,
	Vincent Chen, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, David S. Miller, sparclinux

The kmap_atomic* interfaces in all architectures are pretty much the same
except for post map operations (flush) and pre- and post unmap operations.

Provide a generic variant for that.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/highmem.h |   87 ++++++++++++++++++++++++++++-------
 mm/Kconfig              |    3 +
 mm/highmem.c            |  119 +++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 192 insertions(+), 17 deletions(-)

--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -31,9 +31,22 @@ static inline void invalidate_kernel_vma
 
 #include <asm/kmap_types.h>
 
+/*
+ * Outside of CONFIG_HIGHMEM to support X86 32bit iomap_atomic() cruft.
+ */
+#ifdef CONFIG_KMAP_ATOMIC_GENERIC
+void *kmap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot);
+void *kmap_atomic_page_prot(struct page *page, pgprot_t prot);
+void kunmap_atomic_indexed(void *vaddr);
+# ifndef ARCH_NEEDS_KMAP_HIGH_GET
+static inline void *arch_kmap_temporary_high_get(struct page *page)
+{
+	return NULL;
+}
+# endif
+#endif
+
 #ifdef CONFIG_HIGHMEM
-extern void *kmap_atomic_high_prot(struct page *page, pgprot_t prot);
-extern void kunmap_atomic_high(void *kvaddr);
 #include <asm/highmem.h>
 
 #ifndef ARCH_HAS_KMAP_FLUSH_TLB
@@ -81,6 +94,11 @@ static inline void kunmap(struct page *p
  * be used in IRQ contexts, so in some (very limited) cases we need
  * it.
  */
+
+#ifndef CONFIG_KMAP_ATOMIC_GENERIC
+void *kmap_atomic_high_prot(struct page *page, pgprot_t prot);
+void kunmap_atomic_high(void *kvaddr);
+
 static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
 {
 	preempt_disable();
@@ -89,7 +107,38 @@ static inline void *kmap_atomic_prot(str
 		return page_address(page);
 	return kmap_atomic_high_prot(page, prot);
 }
-#define kmap_atomic(page)	kmap_atomic_prot(page, kmap_prot)
+
+static inline void __kunmap_atomic(void *vaddr)
+{
+	kunmap_atomic_high(vaddr);
+	pagefault_enable();
+}
+#else /* !CONFIG_KMAP_ATOMIC_GENERIC */
+
+static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
+{
+	preempt_disable();
+	return kmap_atomic_page_prot(page, prot);
+}
+
+static inline void *kmap_atomic_pfn(unsigned long pfn)
+{
+	preempt_disable();
+	return kmap_atomic_pfn_prot(pfn, kmap_prot);
+}
+
+static inline void __kunmap_atomic(void *addr)
+{
+	kumap_atomic_indexed(addr);
+}
+
+
+#endif /* CONFIG_KMAP_ATOMIC_GENERIC */
+
+static inline void *kmap_atomic(struct page *page)
+{
+	return kmap_atomic_prot(page, kmap_prot);
+}
 
 /* declarations for linux/mm/highmem.c */
 unsigned int nr_free_highpages(void);
@@ -157,21 +206,29 @@ static inline void *kmap_atomic(struct p
 	pagefault_disable();
 	return page_address(page);
 }
-#define kmap_atomic_prot(page, prot)	kmap_atomic(page)
 
-static inline void kunmap_atomic_high(void *addr)
+static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
+{
+	return kmap_atomic(page);
+}
+
+static inline void *kmap_atomic_pfn(unsigned long pfn)
+{
+	return kmap_atomic(pfn_to_page(pfn));
+}
+
+static inline void __kunmap_atomic(void *addr)
 {
 	/*
 	 * Mostly nothing to do in the CONFIG_HIGHMEM=n case as kunmap_atomic()
-	 * handles re-enabling faults + preemption
+	 * handles preemption
 	 */
 #ifdef ARCH_HAS_FLUSH_ON_KUNMAP
 	kunmap_flush_on_unmap(addr);
 #endif
+	pagefault_enable();
 }
 
-#define kmap_atomic_pfn(pfn)	kmap_atomic(pfn_to_page(pfn))
-
 #define kmap_flush_unused()	do {} while(0)
 
 #endif /* CONFIG_HIGHMEM */
@@ -213,14 +270,12 @@ static inline void kmap_atomic_idx_pop(v
  * Prevent people trying to call kunmap_atomic() as if it were kunmap()
  * kunmap_atomic() should get the return value of kmap_atomic, not the page.
  */
-#define kunmap_atomic(addr)                                     \
-do {                                                            \
-	BUILD_BUG_ON(__same_type((addr), struct page *));       \
-	kunmap_atomic_high(addr);                                  \
-	pagefault_enable();                                     \
-	preempt_enable();                                       \
-} while (0)
-
+#define kunmap_atomic(addr)						\
+	do {								\
+		BUILD_BUG_ON(__same_type((addr), struct page *));	\
+		__kunmap_atomic(addr);					\
+		preempt_enable();					\
+	} while (0)
 
 /* when CONFIG_HIGHMEM is not set these will be plain clear/copy_page */
 #ifndef clear_user_highpage
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -868,4 +868,7 @@ config ARCH_HAS_HUGEPD
 config MAPPING_DIRTY_HELPERS
         bool
 
+config KMAP_ATOMIC_GENERIC
+	bool
+
 endmenu
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -314,6 +314,15 @@ void *kmap_high_get(struct page *page)
 	unlock_kmap_any(flags);
 	return (void*) vaddr;
 }
+
+/* Unmap a temporary mapping which was obtained by kmap_high_get() */
+static void kmap_high_unmap_temporary(unsigned long vaddr)
+{
+	if (vaddr >= PKMAP_ADDR(0) && vaddr < PKMAP_ADDR(LAST_PKMAP))
+		kunmap_high(pte_page(pkmap_page_table[PKMAP_NR(vaddr)]));
+}
+#else
+static inline void kmap_high_unmap_temporary(unsigned long vaddr) { }
 #endif
 
 /**
@@ -365,8 +374,116 @@ void kunmap_high(struct page *page)
 	if (need_wakeup)
 		wake_up(pkmap_map_wait);
 }
-
 EXPORT_SYMBOL(kunmap_high);
+#else
+static inline void kmap_high_unmap_temporary(unsigned long vaddr) { }
+#endif /* CONFIG_HIGHMEM */
+
+#ifdef CONFIG_KMAP_ATOMIC_GENERIC
+#ifndef arch_kmap_temp_post_map
+# define arch_kmap_temp_post_map(vaddr, pteval)	do { } while (0)
+#endif
+
+#ifndef arch_kmap_temp_pre_unmap
+# define arch_kmap_temp_pre_unmap(vaddr)	do { } while (0)
+#endif
+
+#ifndef arch_kmap_temp_post_unmap
+# define arch_kmap_temp_post_unmap(vaddr)	do { } while (0)
+#endif
+
+#ifndef arch_kmap_temp_map_idx
+#define arch_kmap_temp_map_idx(type, pfn)	kmap_temp_idx(type)
+#endif
+
+#ifndef arch_kmap_temp_unmap_idx
+#define arch_kmap_temp_unmap_idx(type, vaddr)	kmap_temp_idx(type)
+#endif
+
+static inline int kmap_temp_idx(int type)
+{
+	return type + KM_TYPE_NR * smp_processor_id();
+}
+
+static pte_t *__kmap_pte;
+
+static pte_t *kmap_get_pte(void)
+{
+	if (!__kmap_pte)
+		__kmap_pte = virt_to_kpte(__fix_to_virt(FIX_KMAP_BEGIN));
+	return __kmap_pte;
+}
+
+static void *__kmap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot)
+{
+	pte_t pteval, *kmap_pte = kmap_get_pte();
+	unsigned long vaddr;
+	int idx;
+
+	preempt_disable();
+	idx = arch_kmap_temp_map_idx(kmap_atomic_idx_push(), pfn);
+	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
+	BUG_ON(!pte_none(*(kmap_pte - idx)));
+	pteval = pfn_pte(pfn, prot);
+	set_pte(kmap_pte - idx, pteval);
+	arch_kmap_temp_post_map(vaddr, pteval);
+	preempt_enable();
+
+	return (void *)vaddr;
+}
+
+void *kmap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot)
+{
+	pagefault_disable();
+	return __kmap_atomic_pfn_prot(pfn, prot);
+}
+EXPORT_SYMBOL(kmap_atomic_pfn_prot);
+
+void *kmap_atomic_page_prot(struct page *page, pgprot_t prot)
+{
+	void *kmap;
+
+	pagefault_disable();
+	if (!PageHighMem(page))
+		return page_address(page);
+
+	/* Try kmap_high_get() if architecture has it enabled */
+	kmap = arch_kmap_temporary_high_get(page);
+	if (kmap)
+		return kmap;
+
+	return __kmap_atomic_pfn_prot(page_to_pfn(page), prot);
+}
+EXPORT_SYMBOL(kmap_atomic_page_prot);
+
+void kunmap_atomic_indexed(void *vaddr)
+{
+	unsigned long addr = (unsigned long) vaddr & PAGE_MASK;
+	pte_t *kmap_pte = kmap_get_pte();
+	int idx;
+
+	if (addr < __fix_to_virt(FIX_KMAP_END) ||
+	    addr > __fix_to_virt(FIX_KMAP_BEGIN)) {
+		WARN_ON_ONCE(addr < PAGE_OFFSET);
+
+		/* Handle mappings which were obtained by kmap_high_get() */
+		kmap_high_unmap_temporary(addr);
+		pagefault_enable();
+		return;
+	}
+
+	preempt_disable();
+	idx = arch_kmap_temp_unmap_idx(kmap_atomic_idx(), addr);
+	WARN_ON_ONCE(addr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
+
+	arch_kmap_temp_pre_unmap(addr);
+	pte_clear(&init_mm, addr, kmap_pte - idx);
+	arch_kmap_temp_post_unmap(addr);
+	kmap_atomic_idx_pop();
+	preempt_enable();
+	pagefault_enable();
+}
+EXPORT_SYMBOL(kunmap_atomic_indexed);
 #endif
 
 #if defined(HASHED_PAGE_VIRTUAL)


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [patch RFC 03/15] x86/mm/highmem: Use generic kmap atomic implementation
  2020-09-19  9:17 [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Thomas Gleixner
  2020-09-19  9:17 ` [patch RFC 01/15] mm/highmem: Un-EXPORT __kmap_atomic_idx() Thomas Gleixner
  2020-09-19  9:17 ` [patch RFC 02/15] highmem: Provide generic variant of kmap_atomic* Thomas Gleixner
@ 2020-09-19  9:17 ` Thomas Gleixner
  2020-09-19  9:17 ` [patch RFC 04/15] arc/mm/highmem: " Thomas Gleixner
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-19  9:17 UTC (permalink / raw)
  To: LKML
  Cc: linux-arch, Linus Torvalds, Paul McKenney, x86,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta, linux-snps-arc,
	Arnd Bergmann, Guo Ren, linux-csky, Michal Simek,
	Thomas Bogendoerfer, linux-mips, Nick Hu, Greentime Hu,
	Vincent Chen, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, David S. Miller, sparclinux

Convert X86 to the generic kmap atomic implementation.

Make the iomap_atomic() naming convention consistent while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/Kconfig               |    3 +-
 arch/x86/include/asm/fixmap.h  |    1 
 arch/x86/include/asm/highmem.h |   12 ++++++--
 arch/x86/include/asm/iomap.h   |   17 ++++++-----
 arch/x86/mm/highmem_32.c       |   59 -----------------------------------------
 arch/x86/mm/init_32.c          |   15 ----------
 arch/x86/mm/iomap_32.c         |   58 ++--------------------------------------
 include/linux/io-mapping.h     |    2 -
 8 files changed, 25 insertions(+), 142 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -14,10 +14,11 @@ config X86_32
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select CLKSRC_I8253
 	select CLONE_BACKWARDS
+	select GENERIC_VDSO_32
 	select HAVE_DEBUG_STACKOVERFLOW
+	select KMAP_ATOMIC_GENERIC
 	select MODULES_USE_ELF_REL
 	select OLD_SIGACTION
-	select GENERIC_VDSO_32
 
 config X86_64
 	def_bool y
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -151,7 +151,6 @@ extern void reserve_top_address(unsigned
 
 extern int fixmaps_set;
 
-extern pte_t *kmap_pte;
 extern pte_t *pkmap_page_table;
 
 void __native_set_fixmap(enum fixed_addresses idx, pte_t pte);
--- a/arch/x86/include/asm/highmem.h
+++ b/arch/x86/include/asm/highmem.h
@@ -58,11 +58,17 @@ extern unsigned long highstart_pfn, high
 #define PKMAP_NR(virt)  ((virt-PKMAP_BASE) >> PAGE_SHIFT)
 #define PKMAP_ADDR(nr)  (PKMAP_BASE + ((nr) << PAGE_SHIFT))
 
-void *kmap_atomic_pfn(unsigned long pfn);
-void *kmap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot);
-
 #define flush_cache_kmaps()	do { } while (0)
 
+#define	arch_kmap_temp_post_map(vaddr, pteval)		\
+	arch_flush_lazy_mmu_mode()
+
+#define	arch_kmap_temp_post_unmap(vaddr)		\
+	do {						\
+		flush_tlb_one_kernel((vaddr));		\
+		arch_flush_lazy_mmu_mode();		\
+	} while (0)
+
 extern void add_highpages_with_active_regions(int nid, unsigned long start_pfn,
 					unsigned long end_pfn);
 
--- a/arch/x86/include/asm/iomap.h
+++ b/arch/x86/include/asm/iomap.h
@@ -9,19 +9,20 @@
 #include <linux/fs.h>
 #include <linux/mm.h>
 #include <linux/uaccess.h>
+#include <linux/highmem.h>
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
 
-void __iomem *
-iomap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot);
+void __iomem *iomap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot);
 
-void
-iounmap_atomic(void __iomem *kvaddr);
+static inline void iounmap_atomic(void __iomem *vaddr)
+{
+	kunmap_atomic_indexed((void __force *)vaddr);
+	preempt_enable();
+}
 
-int
-iomap_create_wc(resource_size_t base, unsigned long size, pgprot_t *prot);
+int iomap_create_wc(resource_size_t base, unsigned long size, pgprot_t *prot);
 
-void
-iomap_free(resource_size_t base, unsigned long size);
+void iomap_free(resource_size_t base, unsigned long size);
 
 #endif /* _ASM_X86_IOMAP_H */
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -4,65 +4,6 @@
 #include <linux/swap.h> /* for totalram_pages */
 #include <linux/memblock.h>
 
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
-{
-	unsigned long vaddr;
-	int idx, type;
-
-	type = kmap_atomic_idx_push();
-	idx = type + KM_TYPE_NR*smp_processor_id();
-	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-	BUG_ON(!pte_none(*(kmap_pte-idx)));
-	set_pte(kmap_pte-idx, mk_pte(page, prot));
-	arch_flush_lazy_mmu_mode();
-
-	return (void *)vaddr;
-}
-EXPORT_SYMBOL(kmap_atomic_high_prot);
-
-/*
- * This is the same as kmap_atomic() but can map memory that doesn't
- * have a struct page associated with it.
- */
-void *kmap_atomic_pfn(unsigned long pfn)
-{
-	return kmap_atomic_prot_pfn(pfn, kmap_prot);
-}
-EXPORT_SYMBOL_GPL(kmap_atomic_pfn);
-
-void kunmap_atomic_high(void *kvaddr)
-{
-	unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
-
-	if (vaddr >= __fix_to_virt(FIX_KMAP_END) &&
-	    vaddr <= __fix_to_virt(FIX_KMAP_BEGIN)) {
-		int idx, type;
-
-		type = kmap_atomic_idx();
-		idx = type + KM_TYPE_NR * smp_processor_id();
-
-#ifdef CONFIG_DEBUG_HIGHMEM
-		WARN_ON_ONCE(vaddr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
-#endif
-		/*
-		 * Force other mappings to Oops if they'll try to access this
-		 * pte without first remap it.  Keeping stale mappings around
-		 * is a bad idea also, in case the page changes cacheability
-		 * attributes or becomes a protected page in a hypervisor.
-		 */
-		kpte_clear_flush(kmap_pte-idx, vaddr);
-		kmap_atomic_idx_pop();
-		arch_flush_lazy_mmu_mode();
-	}
-#ifdef CONFIG_DEBUG_HIGHMEM
-	else {
-		BUG_ON(vaddr < PAGE_OFFSET);
-		BUG_ON(vaddr >= (unsigned long)high_memory);
-	}
-#endif
-}
-EXPORT_SYMBOL(kunmap_atomic_high);
-
 void __init set_highmem_pages_init(void)
 {
 	struct zone *zone;
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -394,19 +394,6 @@ kernel_physical_mapping_init(unsigned lo
 	return last_map_addr;
 }
 
-pte_t *kmap_pte;
-
-static void __init kmap_init(void)
-{
-	unsigned long kmap_vstart;
-
-	/*
-	 * Cache the first kmap pte:
-	 */
-	kmap_vstart = __fix_to_virt(FIX_KMAP_BEGIN);
-	kmap_pte = virt_to_kpte(kmap_vstart);
-}
-
 #ifdef CONFIG_HIGHMEM
 static void __init permanent_kmaps_init(pgd_t *pgd_base)
 {
@@ -712,8 +699,6 @@ void __init paging_init(void)
 
 	__flush_tlb_all();
 
-	kmap_init();
-
 	/*
 	 * NOTE: at this point the bootmem allocator is fully available.
 	 */
--- a/arch/x86/mm/iomap_32.c
+++ b/arch/x86/mm/iomap_32.c
@@ -44,28 +44,7 @@ void iomap_free(resource_size_t base, un
 }
 EXPORT_SYMBOL_GPL(iomap_free);
 
-void *kmap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
-{
-	unsigned long vaddr;
-	int idx, type;
-
-	preempt_disable();
-	pagefault_disable();
-
-	type = kmap_atomic_idx_push();
-	idx = type + KM_TYPE_NR * smp_processor_id();
-	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-	set_pte(kmap_pte - idx, pfn_pte(pfn, prot));
-	arch_flush_lazy_mmu_mode();
-
-	return (void *)vaddr;
-}
-
-/*
- * Map 'pfn' using protections 'prot'
- */
-void __iomem *
-iomap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
+void __iomem *iomap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot)
 {
 	/*
 	 * For non-PAT systems, translate non-WB request to UC- just in
@@ -81,36 +60,7 @@ iomap_atomic_prot_pfn(unsigned long pfn,
 	/* Filter out unsupported __PAGE_KERNEL* bits: */
 	pgprot_val(prot) &= __default_kernel_pte_mask;
 
-	return (void __force __iomem *) kmap_atomic_prot_pfn(pfn, prot);
-}
-EXPORT_SYMBOL_GPL(iomap_atomic_prot_pfn);
-
-void
-iounmap_atomic(void __iomem *kvaddr)
-{
-	unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
-
-	if (vaddr >= __fix_to_virt(FIX_KMAP_END) &&
-	    vaddr <= __fix_to_virt(FIX_KMAP_BEGIN)) {
-		int idx, type;
-
-		type = kmap_atomic_idx();
-		idx = type + KM_TYPE_NR * smp_processor_id();
-
-#ifdef CONFIG_DEBUG_HIGHMEM
-		WARN_ON_ONCE(vaddr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
-#endif
-		/*
-		 * Force other mappings to Oops if they'll try to access this
-		 * pte without first remap it.  Keeping stale mappings around
-		 * is a bad idea also, in case the page changes cacheability
-		 * attributes or becomes a protected page in a hypervisor.
-		 */
-		kpte_clear_flush(kmap_pte-idx, vaddr);
-		kmap_atomic_idx_pop();
-	}
-
-	pagefault_enable();
-	preempt_enable();
+	preempt_disable();
+	return (void __force __iomem *)kmap_atomic_pfn_prot(pfn, prot);
 }
-EXPORT_SYMBOL_GPL(iounmap_atomic);
+EXPORT_SYMBOL_GPL(iomap_atomic_pfn_prot);
--- a/include/linux/io-mapping.h
+++ b/include/linux/io-mapping.h
@@ -69,7 +69,7 @@ io_mapping_map_atomic_wc(struct io_mappi
 
 	BUG_ON(offset >= mapping->size);
 	phys_addr = mapping->base + offset;
-	return iomap_atomic_prot_pfn(PHYS_PFN(phys_addr), mapping->prot);
+	return iomap_atomic_pfn_prot(PHYS_PFN(phys_addr), mapping->prot);
 }
 
 static inline void


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [patch RFC 04/15] arc/mm/highmem: Use generic kmap atomic implementation
  2020-09-19  9:17 [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Thomas Gleixner
                   ` (2 preceding siblings ...)
  2020-09-19  9:17 ` [patch RFC 03/15] x86/mm/highmem: Use generic kmap atomic implementation Thomas Gleixner
@ 2020-09-19  9:17 ` Thomas Gleixner
  2020-09-19  9:17 ` [patch RFC 05/15] ARM: highmem: Switch to generic kmap atomic Thomas Gleixner
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-19  9:17 UTC (permalink / raw)
  To: LKML
  Cc: linux-arch, Linus Torvalds, Paul McKenney, x86,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta, linux-snps-arc,
	Arnd Bergmann, Guo Ren, linux-csky, Michal Simek,
	Thomas Bogendoerfer, linux-mips, Nick Hu, Greentime Hu,
	Vincent Chen, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, David S. Miller, sparclinux

Adopt the map ordering to match the other architectures and the generic
code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: linux-snps-arc@lists.infradead.org
---
Note: Completely untested
---
 arch/arc/Kconfig               |    1 
 arch/arc/include/asm/highmem.h |    8 ++++++-
 arch/arc/mm/highmem.c          |   44 -----------------------------------------
 3 files changed, 9 insertions(+), 44 deletions(-)

--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -508,6 +508,7 @@ config LINUX_RAM_BASE
 config HIGHMEM
 	bool "High Memory Support"
 	select ARCH_DISCONTIGMEM_ENABLE
+	select KMAP_ATOMIC_GENERIC
 	help
 	  With ARC 2G:2G address split, only upper 2G is directly addressable by
 	  kernel. Enable this to potentially allow access to rest of 2G and PAE
--- a/arch/arc/include/asm/highmem.h
+++ b/arch/arc/include/asm/highmem.h
@@ -15,7 +15,10 @@
 #define FIXMAP_BASE		(PAGE_OFFSET - FIXMAP_SIZE - PKMAP_SIZE)
 #define FIXMAP_SIZE		PGDIR_SIZE	/* only 1 PGD worth */
 #define KM_TYPE_NR		((FIXMAP_SIZE >> PAGE_SHIFT)/NR_CPUS)
-#define FIXMAP_ADDR(nr)		(FIXMAP_BASE + ((nr) << PAGE_SHIFT))
+
+#define FIX_KMAP_BEGIN		(0)
+#define FIX_KMAP_END		((FIXMAP_SIZE >> PAGE_SHIFT) - 1)
+#define FIXADDR_TOP		(FIXMAP_BASE + FIXMAP_SIZE - PAGE_SIZE)
 
 /* start after fixmap area */
 #define PKMAP_BASE		(FIXMAP_BASE + FIXMAP_SIZE)
@@ -29,6 +32,9 @@
 
 extern void kmap_init(void);
 
+#define arch_kmap_temp_post_unmap(vaddr)			\
+	local_flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE)
+
 static inline void flush_cache_kmaps(void)
 {
 	flush_cache_all();
--- a/arch/arc/mm/highmem.c
+++ b/arch/arc/mm/highmem.c
@@ -47,48 +47,6 @@
  */
 
 extern pte_t * pkmap_page_table;
-static pte_t * fixmap_page_table;
-
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
-{
-	int idx, cpu_idx;
-	unsigned long vaddr;
-
-	cpu_idx = kmap_atomic_idx_push();
-	idx = cpu_idx + KM_TYPE_NR * smp_processor_id();
-	vaddr = FIXMAP_ADDR(idx);
-
-	set_pte_at(&init_mm, vaddr, fixmap_page_table + idx,
-		   mk_pte(page, prot));
-
-	return (void *)vaddr;
-}
-EXPORT_SYMBOL(kmap_atomic_high_prot);
-
-void kunmap_atomic_high(void *kv)
-{
-	unsigned long kvaddr = (unsigned long)kv;
-
-	if (kvaddr >= FIXMAP_BASE && kvaddr < (FIXMAP_BASE + FIXMAP_SIZE)) {
-
-		/*
-		 * Because preemption is disabled, this vaddr can be associated
-		 * with the current allocated index.
-		 * But in case of multiple live kmap_atomic(), it still relies on
-		 * callers to unmap in right order.
-		 */
-		int cpu_idx = kmap_atomic_idx();
-		int idx = cpu_idx + KM_TYPE_NR * smp_processor_id();
-
-		WARN_ON(kvaddr != FIXMAP_ADDR(idx));
-
-		pte_clear(&init_mm, kvaddr, fixmap_page_table + idx);
-		local_flush_tlb_kernel_range(kvaddr, kvaddr + PAGE_SIZE);
-
-		kmap_atomic_idx_pop();
-	}
-}
-EXPORT_SYMBOL(kunmap_atomic_high);
 
 static noinline pte_t * __init alloc_kmap_pgtable(unsigned long kvaddr)
 {
@@ -113,5 +71,5 @@ void __init kmap_init(void)
 	pkmap_page_table = alloc_kmap_pgtable(PKMAP_BASE);
 
 	BUILD_BUG_ON(LAST_PKMAP > PTRS_PER_PTE);
-	fixmap_page_table = alloc_kmap_pgtable(FIXMAP_BASE);
+	alloc_kmap_pgtable(FIXMAP_BASE);
 }


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [patch RFC 05/15] ARM: highmem: Switch to generic kmap atomic
  2020-09-19  9:17 [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Thomas Gleixner
                   ` (3 preceding siblings ...)
  2020-09-19  9:17 ` [patch RFC 04/15] arc/mm/highmem: " Thomas Gleixner
@ 2020-09-19  9:17 ` Thomas Gleixner
  2020-09-19  9:17 ` [patch RFC 06/15] csky/mm/highmem: " Thomas Gleixner
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-19  9:17 UTC (permalink / raw)
  To: LKML
  Cc: linux-arch, Linus Torvalds, Paul McKenney, x86,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Arnd Bergmann, Vineet Gupta,
	linux-snps-arc, Guo Ren, linux-csky, Michal Simek,
	Thomas Bogendoerfer, linux-mips, Nick Hu, Greentime Hu,
	Vincent Chen, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, David S. Miller, sparclinux

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arm-kernel@lists.infradead.org
---
Note: Completely untested
---
 arch/arm/Kconfig               |    1 
 arch/arm/include/asm/highmem.h |   30 +++++++---
 arch/arm/mm/Makefile           |    1 
 arch/arm/mm/highmem.c          |  121 -----------------------------------------
 4 files changed, 23 insertions(+), 130 deletions(-)

--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1499,6 +1499,7 @@ config HAVE_ARCH_PFN_VALID
 config HIGHMEM
 	bool "High Memory Support"
 	depends on MMU
+	select KMAP_ATOMIC_GENERIC
 	help
 	  The address space of ARM processors is only 4 Gigabytes large
 	  and it has to accommodate user address space, kernel address
--- a/arch/arm/include/asm/highmem.h
+++ b/arch/arm/include/asm/highmem.h
@@ -46,19 +46,33 @@ extern pte_t *pkmap_page_table;
 
 #ifdef ARCH_NEEDS_KMAP_HIGH_GET
 extern void *kmap_high_get(struct page *page);
+
+#ifdef CONFIG_DEBUG_HIGHMEM
+extern void *arch_kmap_temporary_high_get(struct page *page);
 #else
+static inline void *arch_kmap_temporary_high_get(struct page *page)
+{
+	return kmap_high_get(page);
+}
+#endif /* !CONFIG_DEBUG_HIGHMEM */
+
+#else /* ARCH_NEEDS_KMAP_HIGH_GET */
 static inline void *kmap_high_get(struct page *page)
 {
 	return NULL;
 }
-#endif
+#endif /* !ARCH_NEEDS_KMAP_HIGH_GET */
 
-/*
- * The following functions are already defined by <linux/highmem.h>
- * when CONFIG_HIGHMEM is not set.
- */
-#ifdef CONFIG_HIGHMEM
-extern void *kmap_atomic_pfn(unsigned long pfn);
-#endif
+#define arch_kmap_temp_post_map(vaddr, pteval)				\
+	local_flush_tlb_kernel_page(vaddr)
+
+#define arch_kmap_temp_pre_unmap(vaddr)					\
+do {									\
+	if (cache_is_vivt())						\
+		__cpuc_flush_dcache_area((void *)vaddr, PAGE_SIZE);	\
+} while (0)
+
+#define arch_kmap_temp_post_unmap(vaddr)				\
+	local_flush_tlb_kernel_page(vaddr)
 
 #endif
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -19,7 +19,6 @@ obj-$(CONFIG_MODULES)		+= proc-syms.o
 obj-$(CONFIG_DEBUG_VIRTUAL)	+= physaddr.o
 
 obj-$(CONFIG_ALIGNMENT_TRAP)	+= alignment.o
-obj-$(CONFIG_HIGHMEM)		+= highmem.o
 obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
 obj-$(CONFIG_ARM_PV_FIXUP)	+= pv-fixup-asm.o
 
--- a/arch/arm/mm/highmem.c
+++ /dev/null
@@ -1,121 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * arch/arm/mm/highmem.c -- ARM highmem support
- *
- * Author:	Nicolas Pitre
- * Created:	september 8, 2008
- * Copyright:	Marvell Semiconductors Inc.
- */
-
-#include <linux/module.h>
-#include <linux/highmem.h>
-#include <linux/interrupt.h>
-#include <asm/fixmap.h>
-#include <asm/cacheflush.h>
-#include <asm/tlbflush.h>
-#include "mm.h"
-
-static inline void set_fixmap_pte(int idx, pte_t pte)
-{
-	unsigned long vaddr = __fix_to_virt(idx);
-	pte_t *ptep = virt_to_kpte(vaddr);
-
-	set_pte_ext(ptep, pte, 0);
-	local_flush_tlb_kernel_page(vaddr);
-}
-
-static inline pte_t get_fixmap_pte(unsigned long vaddr)
-{
-	pte_t *ptep = virt_to_kpte(vaddr);
-
-	return *ptep;
-}
-
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
-{
-	unsigned int idx;
-	unsigned long vaddr;
-	void *kmap;
-	int type;
-
-#ifdef CONFIG_DEBUG_HIGHMEM
-	/*
-	 * There is no cache coherency issue when non VIVT, so force the
-	 * dedicated kmap usage for better debugging purposes in that case.
-	 */
-	if (!cache_is_vivt())
-		kmap = NULL;
-	else
-#endif
-		kmap = kmap_high_get(page);
-	if (kmap)
-		return kmap;
-
-	type = kmap_atomic_idx_push();
-
-	idx = FIX_KMAP_BEGIN + type + KM_TYPE_NR * smp_processor_id();
-	vaddr = __fix_to_virt(idx);
-#ifdef CONFIG_DEBUG_HIGHMEM
-	/*
-	 * With debugging enabled, kunmap_atomic forces that entry to 0.
-	 * Make sure it was indeed properly unmapped.
-	 */
-	BUG_ON(!pte_none(get_fixmap_pte(vaddr)));
-#endif
-	/*
-	 * When debugging is off, kunmap_atomic leaves the previous mapping
-	 * in place, so the contained TLB flush ensures the TLB is updated
-	 * with the new mapping.
-	 */
-	set_fixmap_pte(idx, mk_pte(page, prot));
-
-	return (void *)vaddr;
-}
-EXPORT_SYMBOL(kmap_atomic_high_prot);
-
-void kunmap_atomic_high(void *kvaddr)
-{
-	unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
-	int idx, type;
-
-	if (kvaddr >= (void *)FIXADDR_START) {
-		type = kmap_atomic_idx();
-		idx = FIX_KMAP_BEGIN + type + KM_TYPE_NR * smp_processor_id();
-
-		if (cache_is_vivt())
-			__cpuc_flush_dcache_area((void *)vaddr, PAGE_SIZE);
-#ifdef CONFIG_DEBUG_HIGHMEM
-		BUG_ON(vaddr != __fix_to_virt(idx));
-		set_fixmap_pte(idx, __pte(0));
-#else
-		(void) idx;  /* to kill a warning */
-#endif
-		kmap_atomic_idx_pop();
-	} else if (vaddr >= PKMAP_ADDR(0) && vaddr < PKMAP_ADDR(LAST_PKMAP)) {
-		/* this address was obtained through kmap_high_get() */
-		kunmap_high(pte_page(pkmap_page_table[PKMAP_NR(vaddr)]));
-	}
-}
-EXPORT_SYMBOL(kunmap_atomic_high);
-
-void *kmap_atomic_pfn(unsigned long pfn)
-{
-	unsigned long vaddr;
-	int idx, type;
-	struct page *page = pfn_to_page(pfn);
-
-	preempt_disable();
-	pagefault_disable();
-	if (!PageHighMem(page))
-		return page_address(page);
-
-	type = kmap_atomic_idx_push();
-	idx = FIX_KMAP_BEGIN + type + KM_TYPE_NR * smp_processor_id();
-	vaddr = __fix_to_virt(idx);
-#ifdef CONFIG_DEBUG_HIGHMEM
-	BUG_ON(!pte_none(get_fixmap_pte(vaddr)));
-#endif
-	set_fixmap_pte(idx, pfn_pte(pfn, kmap_prot));
-
-	return (void *)vaddr;
-}


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [patch RFC 06/15] csky/mm/highmem: Switch to generic kmap atomic
  2020-09-19  9:17 [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Thomas Gleixner
                   ` (4 preceding siblings ...)
  2020-09-19  9:17 ` [patch RFC 05/15] ARM: highmem: Switch to generic kmap atomic Thomas Gleixner
@ 2020-09-19  9:17 ` Thomas Gleixner
  2020-09-23  0:05   ` Guo Ren
  2020-09-19  9:17 ` [patch RFC 07/15] microblaze/mm/highmem: " Thomas Gleixner
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-19  9:17 UTC (permalink / raw)
  To: LKML
  Cc: linux-arch, Linus Torvalds, Paul McKenney, x86,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Guo Ren, linux-csky, Vineet Gupta,
	linux-snps-arc, Arnd Bergmann, Michal Simek, Thomas Bogendoerfer,
	linux-mips, Nick Hu, Greentime Hu, Vincent Chen,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, David S. Miller, sparclinux

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Cc: linux-csky@vger.kernel.org
---
Note: Completely untested
---
 arch/csky/Kconfig               |    1 
 arch/csky/include/asm/highmem.h |    4 +-
 arch/csky/mm/highmem.c          |   75 ----------------------------------------
 3 files changed, 5 insertions(+), 75 deletions(-)

--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -285,6 +285,7 @@ config NR_CPUS
 config HIGHMEM
 	bool "High Memory Support"
 	depends on !CPU_CK610
+	select KMAP_ATOMIC_GENERIC
 	default y
 
 config FORCE_MAX_ZONEORDER
--- a/arch/csky/include/asm/highmem.h
+++ b/arch/csky/include/asm/highmem.h
@@ -32,10 +32,12 @@ extern pte_t *pkmap_page_table;
 
 #define ARCH_HAS_KMAP_FLUSH_TLB
 extern void kmap_flush_tlb(unsigned long addr);
-extern void *kmap_atomic_pfn(unsigned long pfn);
 
 #define flush_cache_kmaps() do {} while (0)
 
+#define arch_kmap_temp_post_map(vaddr, pteval)	kmap_flush_tlb(vaddr)
+#define arch_kmap_temp_post_unmap(vaddr)	kmap_flush_tlb(vaddr)
+
 extern void kmap_init(void);
 
 #endif /* __KERNEL__ */
--- a/arch/csky/mm/highmem.c
+++ b/arch/csky/mm/highmem.c
@@ -9,8 +9,6 @@
 #include <asm/tlbflush.h>
 #include <asm/cacheflush.h>
 
-static pte_t *kmap_pte;
-
 unsigned long highstart_pfn, highend_pfn;
 
 void kmap_flush_tlb(unsigned long addr)
@@ -19,67 +17,7 @@ void kmap_flush_tlb(unsigned long addr)
 }
 EXPORT_SYMBOL(kmap_flush_tlb);
 
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
-{
-	unsigned long vaddr;
-	int idx, type;
-
-	type = kmap_atomic_idx_push();
-	idx = type + KM_TYPE_NR*smp_processor_id();
-	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-#ifdef CONFIG_DEBUG_HIGHMEM
-	BUG_ON(!pte_none(*(kmap_pte - idx)));
-#endif
-	set_pte(kmap_pte-idx, mk_pte(page, prot));
-	flush_tlb_one((unsigned long)vaddr);
-
-	return (void *)vaddr;
-}
-EXPORT_SYMBOL(kmap_atomic_high_prot);
-
-void kunmap_atomic_high(void *kvaddr)
-{
-	unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
-	int idx;
-
-	if (vaddr < FIXADDR_START)
-		return;
-
-#ifdef CONFIG_DEBUG_HIGHMEM
-	idx = KM_TYPE_NR*smp_processor_id() + kmap_atomic_idx();
-
-	BUG_ON(vaddr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
-
-	pte_clear(&init_mm, vaddr, kmap_pte - idx);
-	flush_tlb_one(vaddr);
-#else
-	(void) idx; /* to kill a warning */
-#endif
-	kmap_atomic_idx_pop();
-}
-EXPORT_SYMBOL(kunmap_atomic_high);
-
-/*
- * This is the same as kmap_atomic() but can map memory that doesn't
- * have a struct page associated with it.
- */
-void *kmap_atomic_pfn(unsigned long pfn)
-{
-	unsigned long vaddr;
-	int idx, type;
-
-	pagefault_disable();
-
-	type = kmap_atomic_idx_push();
-	idx = type + KM_TYPE_NR*smp_processor_id();
-	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-	set_pte(kmap_pte-idx, pfn_pte(pfn, PAGE_KERNEL));
-	flush_tlb_one(vaddr);
-
-	return (void *) vaddr;
-}
-
-static void __init kmap_pages_init(void)
+void __init kmap_init(void)
 {
 	unsigned long vaddr;
 	pgd_t *pgd;
@@ -96,14 +34,3 @@ static void __init kmap_pages_init(void)
 	pte = pte_offset_kernel(pmd, vaddr);
 	pkmap_page_table = pte;
 }
-
-void __init kmap_init(void)
-{
-	unsigned long vaddr;
-
-	kmap_pages_init();
-
-	vaddr = __fix_to_virt(FIX_KMAP_BEGIN);
-
-	kmap_pte = pte_offset_kernel((pmd_t *)pgd_offset_k(vaddr), vaddr);
-}


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [patch RFC 07/15] microblaze/mm/highmem: Switch to generic kmap atomic
  2020-09-19  9:17 [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Thomas Gleixner
                   ` (5 preceding siblings ...)
  2020-09-19  9:17 ` [patch RFC 06/15] csky/mm/highmem: " Thomas Gleixner
@ 2020-09-19  9:17 ` Thomas Gleixner
  2020-09-19  9:17 ` [patch RFC 08/15] mips/mm/highmem: " Thomas Gleixner
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-19  9:17 UTC (permalink / raw)
  To: LKML
  Cc: linux-arch, Linus Torvalds, Paul McKenney, x86,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Michal Simek, Vineet Gupta,
	linux-snps-arc, Arnd Bergmann, Guo Ren, linux-csky,
	Thomas Bogendoerfer, linux-mips, Nick Hu, Greentime Hu,
	Vincent Chen, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, David S. Miller, sparclinux

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michal Simek <monstr@monstr.eu>
---
Note: Completely untested
---
 arch/microblaze/Kconfig               |    1 
 arch/microblaze/include/asm/highmem.h |    6 ++
 arch/microblaze/mm/Makefile           |    1 
 arch/microblaze/mm/highmem.c          |   78 ----------------------------------
 arch/microblaze/mm/init.c             |    6 --
 5 files changed, 6 insertions(+), 86 deletions(-)

--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -170,6 +170,7 @@ config XILINX_UNCACHED_SHADOW
 config HIGHMEM
 	bool "High memory support"
 	depends on MMU
+	select KMAP_ATOMIC_GENERIC
 	help
 	  The address space of Microblaze processors is only 4 Gigabytes large
 	  and it has to accommodate user address space, kernel address
--- a/arch/microblaze/include/asm/highmem.h
+++ b/arch/microblaze/include/asm/highmem.h
@@ -25,7 +25,6 @@
 #include <linux/uaccess.h>
 #include <asm/fixmap.h>
 
-extern pte_t *kmap_pte;
 extern pte_t *pkmap_page_table;
 
 /*
@@ -52,6 +51,11 @@ extern pte_t *pkmap_page_table;
 
 #define flush_cache_kmaps()	{ flush_icache(); flush_dcache(); }
 
+#define arch_kmap_temp_post_map(vaddr, pteval)	\
+	local_flush_tlb_page(NULL, vaddr);
+#define arch_kmap_temp_post_unmap(vaddr)	\
+	local_flush_tlb_page(NULL, vaddr);
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_HIGHMEM_H */
--- a/arch/microblaze/mm/Makefile
+++ b/arch/microblaze/mm/Makefile
@@ -6,4 +6,3 @@
 obj-y := consistent.o init.o
 
 obj-$(CONFIG_MMU) += pgtable.o mmu_context.o fault.o
-obj-$(CONFIG_HIGHMEM) += highmem.o
--- a/arch/microblaze/mm/highmem.c
+++ /dev/null
@@ -1,78 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * highmem.c: virtual kernel memory mappings for high memory
- *
- * PowerPC version, stolen from the i386 version.
- *
- * Used in CONFIG_HIGHMEM systems for memory pages which
- * are not addressable by direct kernel virtual addresses.
- *
- * Copyright (C) 1999 Gerhard Wichert, Siemens AG
- *		      Gerhard.Wichert@pdb.siemens.de
- *
- *
- * Redesigned the x86 32-bit VM architecture to deal with
- * up to 16 Terrabyte physical memory. With current x86 CPUs
- * we now support up to 64 Gigabytes physical RAM.
- *
- * Copyright (C) 1999 Ingo Molnar <mingo@redhat.com>
- *
- * Reworked for PowerPC by various contributors. Moved from
- * highmem.h by Benjamin Herrenschmidt (c) 2009 IBM Corp.
- */
-
-#include <linux/export.h>
-#include <linux/highmem.h>
-
-/*
- * The use of kmap_atomic/kunmap_atomic is discouraged - kmap/kunmap
- * gives a more generic (and caching) interface. But kmap_atomic can
- * be used in IRQ contexts, so in some (very limited) cases we need
- * it.
- */
-#include <asm/tlbflush.h>
-
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
-{
-
-	unsigned long vaddr;
-	int idx, type;
-
-	type = kmap_atomic_idx_push();
-	idx = type + KM_TYPE_NR*smp_processor_id();
-	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-#ifdef CONFIG_DEBUG_HIGHMEM
-	BUG_ON(!pte_none(*(kmap_pte-idx)));
-#endif
-	set_pte_at(&init_mm, vaddr, kmap_pte-idx, mk_pte(page, prot));
-	local_flush_tlb_page(NULL, vaddr);
-
-	return (void *) vaddr;
-}
-EXPORT_SYMBOL(kmap_atomic_high_prot);
-
-void kunmap_atomic_high(void *kvaddr)
-{
-	unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
-	int type;
-	unsigned int idx;
-
-	if (vaddr < __fix_to_virt(FIX_KMAP_END))
-		return;
-
-	type = kmap_atomic_idx();
-
-	idx = type + KM_TYPE_NR * smp_processor_id();
-#ifdef CONFIG_DEBUG_HIGHMEM
-	BUG_ON(vaddr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
-#endif
-	/*
-	 * force other mappings to Oops if they'll try to access
-	 * this pte without first remap it
-	 */
-	pte_clear(&init_mm, vaddr, kmap_pte-idx);
-	local_flush_tlb_page(NULL, vaddr);
-
-	kmap_atomic_idx_pop();
-}
-EXPORT_SYMBOL(kunmap_atomic_high);
--- a/arch/microblaze/mm/init.c
+++ b/arch/microblaze/mm/init.c
@@ -49,17 +49,11 @@ unsigned long lowmem_size;
 EXPORT_SYMBOL(min_low_pfn);
 EXPORT_SYMBOL(max_low_pfn);
 
-#ifdef CONFIG_HIGHMEM
-pte_t *kmap_pte;
-EXPORT_SYMBOL(kmap_pte);
-
 static void __init highmem_init(void)
 {
 	pr_debug("%x\n", (u32)PKMAP_BASE);
 	map_page(PKMAP_BASE, 0, 0);	/* XXX gross */
 	pkmap_page_table = virt_to_kpte(PKMAP_BASE);
-
-	kmap_pte = virt_to_kpte(__fix_to_virt(FIX_KMAP_BEGIN));
 }
 
 static void highmem_setup(void)


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [patch RFC 08/15] mips/mm/highmem: Switch to generic kmap atomic
  2020-09-19  9:17 [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Thomas Gleixner
                   ` (6 preceding siblings ...)
  2020-09-19  9:17 ` [patch RFC 07/15] microblaze/mm/highmem: " Thomas Gleixner
@ 2020-09-19  9:17 ` Thomas Gleixner
  2020-09-19  9:18 ` [patch RFC 09/15] nds32/mm/highmem: " Thomas Gleixner
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-19  9:17 UTC (permalink / raw)
  To: LKML
  Cc: linux-arch, Linus Torvalds, Paul McKenney, x86,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Thomas Bogendoerfer, linux-mips,
	Vineet Gupta, linux-snps-arc, Arnd Bergmann, Guo Ren, linux-csky,
	Michal Simek, Nick Hu, Greentime Hu, Vincent Chen,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, David S. Miller, sparclinux

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: linux-mips@vger.kernel.org
---
Note: Completely untested
---
 arch/mips/Kconfig               |    1 
 arch/mips/include/asm/highmem.h |    4 +-
 arch/mips/mm/highmem.c          |   77 ----------------------------------------
 arch/mips/mm/init.c             |    3 -
 4 files changed, 3 insertions(+), 82 deletions(-)

--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2654,6 +2654,7 @@ config MIPS_CRC_SUPPORT
 config HIGHMEM
 	bool "High Memory Support"
 	depends on 32BIT && CPU_SUPPORTS_HIGHMEM && SYS_SUPPORTS_HIGHMEM && !CPU_MIPS32_3_5_EVA
+	select KMAP_ATOMIC_GENERIC
 
 config CPU_SUPPORTS_HIGHMEM
 	bool
--- a/arch/mips/include/asm/highmem.h
+++ b/arch/mips/include/asm/highmem.h
@@ -48,11 +48,11 @@ extern pte_t *pkmap_page_table;
 
 #define ARCH_HAS_KMAP_FLUSH_TLB
 extern void kmap_flush_tlb(unsigned long addr);
-extern void *kmap_atomic_pfn(unsigned long pfn);
 
 #define flush_cache_kmaps()	BUG_ON(cpu_has_dc_aliases)
 
-extern void kmap_init(void);
+#define arch_kmap_temp_post_map(vaddr, pteval)	local_flush_tlb_one(vaddr)
+#define arch_kmap_temp_post_unmap(vaddr)	local_flush_tlb_one(vaddr)
 
 #endif /* __KERNEL__ */
 
--- a/arch/mips/mm/highmem.c
+++ b/arch/mips/mm/highmem.c
@@ -8,8 +8,6 @@
 #include <asm/fixmap.h>
 #include <asm/tlbflush.h>
 
-static pte_t *kmap_pte;
-
 unsigned long highstart_pfn, highend_pfn;
 
 void kmap_flush_tlb(unsigned long addr)
@@ -17,78 +15,3 @@ void kmap_flush_tlb(unsigned long addr)
 	flush_tlb_one(addr);
 }
 EXPORT_SYMBOL(kmap_flush_tlb);
-
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
-{
-	unsigned long vaddr;
-	int idx, type;
-
-	type = kmap_atomic_idx_push();
-	idx = type + KM_TYPE_NR*smp_processor_id();
-	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-#ifdef CONFIG_DEBUG_HIGHMEM
-	BUG_ON(!pte_none(*(kmap_pte - idx)));
-#endif
-	set_pte(kmap_pte-idx, mk_pte(page, prot));
-	local_flush_tlb_one((unsigned long)vaddr);
-
-	return (void*) vaddr;
-}
-EXPORT_SYMBOL(kmap_atomic_high_prot);
-
-void kunmap_atomic_high(void *kvaddr)
-{
-	unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
-	int type __maybe_unused;
-
-	if (vaddr < FIXADDR_START)
-		return;
-
-	type = kmap_atomic_idx();
-#ifdef CONFIG_DEBUG_HIGHMEM
-	{
-		int idx = type + KM_TYPE_NR * smp_processor_id();
-
-		BUG_ON(vaddr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
-
-		/*
-		 * force other mappings to Oops if they'll try to access
-		 * this pte without first remap it
-		 */
-		pte_clear(&init_mm, vaddr, kmap_pte-idx);
-		local_flush_tlb_one(vaddr);
-	}
-#endif
-	kmap_atomic_idx_pop();
-}
-EXPORT_SYMBOL(kunmap_atomic_high);
-
-/*
- * This is the same as kmap_atomic() but can map memory that doesn't
- * have a struct page associated with it.
- */
-void *kmap_atomic_pfn(unsigned long pfn)
-{
-	unsigned long vaddr;
-	int idx, type;
-
-	preempt_disable();
-	pagefault_disable();
-
-	type = kmap_atomic_idx_push();
-	idx = type + KM_TYPE_NR*smp_processor_id();
-	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-	set_pte(kmap_pte-idx, pfn_pte(pfn, PAGE_KERNEL));
-	flush_tlb_one(vaddr);
-
-	return (void*) vaddr;
-}
-
-void __init kmap_init(void)
-{
-	unsigned long kmap_vstart;
-
-	/* cache the first kmap pte */
-	kmap_vstart = __fix_to_virt(FIX_KMAP_BEGIN);
-	kmap_pte = virt_to_kpte(kmap_vstart);
-}
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -402,9 +402,6 @@ void __init paging_init(void)
 
 	pagetable_init();
 
-#ifdef CONFIG_HIGHMEM
-	kmap_init();
-#endif
 #ifdef CONFIG_ZONE_DMA
 	max_zone_pfns[ZONE_DMA] = MAX_DMA_PFN;
 #endif


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [patch RFC 09/15] nds32/mm/highmem: Switch to generic kmap atomic
  2020-09-19  9:17 [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Thomas Gleixner
                   ` (7 preceding siblings ...)
  2020-09-19  9:17 ` [patch RFC 08/15] mips/mm/highmem: " Thomas Gleixner
@ 2020-09-19  9:18 ` Thomas Gleixner
  2020-09-19  9:18 ` [patch RFC 10/15] powerpc/mm/highmem: " Thomas Gleixner
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-19  9:18 UTC (permalink / raw)
  To: LKML
  Cc: linux-arch, Linus Torvalds, Paul McKenney, x86,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Nick Hu, Greentime Hu, Vincent Chen,
	Vineet Gupta, linux-snps-arc, Arnd Bergmann, Guo Ren, linux-csky,
	Michal Simek, Thomas Bogendoerfer, linux-mips, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, sparclinux

The mapping code is odd and looks broken. See FIXME in the comment.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Vincent Chen <deanbo422@gmail.com>
---
Note: Completely untested
---
 arch/nds32/Kconfig.cpu           |    1 
 arch/nds32/include/asm/highmem.h |   21 +++++++++++++----
 arch/nds32/mm/Makefile           |    1 
 arch/nds32/mm/highmem.c          |   48 ---------------------------------------
 4 files changed, 17 insertions(+), 54 deletions(-)

--- a/arch/nds32/Kconfig.cpu
+++ b/arch/nds32/Kconfig.cpu
@@ -157,6 +157,7 @@ config HW_SUPPORT_UNALIGNMENT_ACCESS
 config HIGHMEM
 	bool "High Memory Support"
 	depends on MMU && !CPU_CACHE_ALIASING
+        select KMAP_ATOMIC_GENERIC
 	help
 	  The address space of Andes processors is only 4 Gigabytes large
 	  and it has to accommodate user address space, kernel address
--- a/arch/nds32/include/asm/highmem.h
+++ b/arch/nds32/include/asm/highmem.h
@@ -45,11 +45,22 @@ extern pte_t *pkmap_page_table;
 extern void kmap_init(void);
 
 /*
- * The following functions are already defined by <linux/highmem.h>
- * when CONFIG_HIGHMEM is not set.
+ * FIXME: The below looks broken vs. a kmap_atomic() in task context which
+ * is interupted and another kmap_atomic() happens in interrupt context.
+ * But what do I know about nds32. -- tglx
  */
-#ifdef CONFIG_HIGHMEM
-extern void *kmap_atomic_pfn(unsigned long pfn);
-#endif
+#define arch_kmap_temp_post_map(vaddr, pteval)			\
+	do {							\
+		__nds32__tlbop_inv(vaddr);			\
+		__nds32__mtsr_dsb(vaddr, NDS32_SR_TLB_VPN);	\
+		__nds32__tlbop_rwr(pteval);			\
+		__nds32__isb();					\
+	} while (0)
+
+#define arch_kmap_temp_pre_unmap(vaddr, pte)			\
+	do {							\
+		__nds32__tlbop_inv(vaddr);			\
+		__nds32__isb();					\
+	} while (0)
 
 #endif
--- a/arch/nds32/mm/Makefile
+++ b/arch/nds32/mm/Makefile
@@ -3,7 +3,6 @@ obj-y				:= extable.o tlb.o fault.o init
                                    mm-nds32.o cacheflush.o proc.o
 
 obj-$(CONFIG_ALIGNMENT_TRAP)	+= alignment.o
-obj-$(CONFIG_HIGHMEM)           += highmem.o
 
 ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_proc.o     = $(CC_FLAGS_FTRACE)
--- a/arch/nds32/mm/highmem.c
+++ /dev/null
@@ -1,48 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-// Copyright (C) 2005-2017 Andes Technology Corporation
-
-#include <linux/export.h>
-#include <linux/highmem.h>
-#include <linux/sched.h>
-#include <linux/smp.h>
-#include <linux/interrupt.h>
-#include <linux/memblock.h>
-#include <asm/fixmap.h>
-#include <asm/tlbflush.h>
-
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
-{
-	unsigned int idx;
-	unsigned long vaddr, pte;
-	int type;
-	pte_t *ptep;
-
-	type = kmap_atomic_idx_push();
-
-	idx = type + KM_TYPE_NR * smp_processor_id();
-	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-	pte = (page_to_pfn(page) << PAGE_SHIFT) | prot;
-	ptep = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
-	set_pte(ptep, pte);
-
-	__nds32__tlbop_inv(vaddr);
-	__nds32__mtsr_dsb(vaddr, NDS32_SR_TLB_VPN);
-	__nds32__tlbop_rwr(pte);
-	__nds32__isb();
-	return (void *)vaddr;
-}
-EXPORT_SYMBOL(kmap_atomic_high_prot);
-
-void kunmap_atomic_high(void *kvaddr)
-{
-	if (kvaddr >= (void *)FIXADDR_START) {
-		unsigned long vaddr = (unsigned long)kvaddr;
-		pte_t *ptep;
-		kmap_atomic_idx_pop();
-		__nds32__tlbop_inv(vaddr);
-		__nds32__isb();
-		ptep = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
-		set_pte(ptep, 0);
-	}
-}
-EXPORT_SYMBOL(kunmap_atomic_high);


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [patch RFC 10/15] powerpc/mm/highmem: Switch to generic kmap atomic
  2020-09-19  9:17 [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Thomas Gleixner
                   ` (8 preceding siblings ...)
  2020-09-19  9:18 ` [patch RFC 09/15] nds32/mm/highmem: " Thomas Gleixner
@ 2020-09-19  9:18 ` Thomas Gleixner
  2020-09-19  9:18 ` [patch RFC 11/15] sparc/mm/highmem: " Thomas Gleixner
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-19  9:18 UTC (permalink / raw)
  To: LKML
  Cc: linux-arch, Linus Torvalds, Paul McKenney, x86,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	Vineet Gupta, linux-snps-arc, Arnd Bergmann, Guo Ren, linux-csky,
	Michal Simek, Thomas Bogendoerfer, linux-mips, Nick Hu,
	Greentime Hu, Vincent Chen, David S. Miller, sparclinux

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
---
Note: Completely untested
---
 arch/powerpc/Kconfig               |    1 
 arch/powerpc/include/asm/highmem.h |    6 ++-
 arch/powerpc/mm/Makefile           |    1 
 arch/powerpc/mm/highmem.c          |   67 -------------------------------------
 arch/powerpc/mm/mem.c              |    7 ---
 5 files changed, 6 insertions(+), 76 deletions(-)

--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -406,6 +406,7 @@ menu "Kernel options"
 config HIGHMEM
 	bool "High memory support"
 	depends on PPC32
+        select KMAP_ATOMIC_GENERIC
 
 source "kernel/Kconfig.hz"
 
--- a/arch/powerpc/include/asm/highmem.h
+++ b/arch/powerpc/include/asm/highmem.h
@@ -29,7 +29,6 @@
 #include <asm/page.h>
 #include <asm/fixmap.h>
 
-extern pte_t *kmap_pte;
 extern pte_t *pkmap_page_table;
 
 /*
@@ -60,6 +59,11 @@ extern pte_t *pkmap_page_table;
 
 #define flush_cache_kmaps()	flush_cache_all()
 
+#define arch_kmap_temp_post_map(vaddr, pteval)	\
+	local_flush_tlb_page(NULL, vaddr)
+#define arch_kmap_temp_post_unmap(vaddr)	\
+	local_flush_tlb_page(NULL, vaddr)
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_HIGHMEM_H */
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -16,7 +16,6 @@ obj-$(CONFIG_NEED_MULTIPLE_NODES) += num
 obj-$(CONFIG_PPC_MM_SLICES)	+= slice.o
 obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
 obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o
-obj-$(CONFIG_HIGHMEM)		+= highmem.o
 obj-$(CONFIG_PPC_COPRO_BASE)	+= copro_fault.o
 obj-$(CONFIG_PPC_PTDUMP)	+= ptdump/
 obj-$(CONFIG_KASAN)		+= kasan/
--- a/arch/powerpc/mm/highmem.c
+++ /dev/null
@@ -1,67 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * highmem.c: virtual kernel memory mappings for high memory
- *
- * PowerPC version, stolen from the i386 version.
- *
- * Used in CONFIG_HIGHMEM systems for memory pages which
- * are not addressable by direct kernel virtual addresses.
- *
- * Copyright (C) 1999 Gerhard Wichert, Siemens AG
- *		      Gerhard.Wichert@pdb.siemens.de
- *
- *
- * Redesigned the x86 32-bit VM architecture to deal with
- * up to 16 Terrabyte physical memory. With current x86 CPUs
- * we now support up to 64 Gigabytes physical RAM.
- *
- * Copyright (C) 1999 Ingo Molnar <mingo@redhat.com>
- *
- * Reworked for PowerPC by various contributors. Moved from
- * highmem.h by Benjamin Herrenschmidt (c) 2009 IBM Corp.
- */
-
-#include <linux/highmem.h>
-#include <linux/module.h>
-
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
-{
-	unsigned long vaddr;
-	int idx, type;
-
-	type = kmap_atomic_idx_push();
-	idx = type + KM_TYPE_NR*smp_processor_id();
-	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-	WARN_ON(IS_ENABLED(CONFIG_DEBUG_HIGHMEM) && !pte_none(*(kmap_pte - idx)));
-	__set_pte_at(&init_mm, vaddr, kmap_pte-idx, mk_pte(page, prot), 1);
-	local_flush_tlb_page(NULL, vaddr);
-
-	return (void*) vaddr;
-}
-EXPORT_SYMBOL(kmap_atomic_high_prot);
-
-void kunmap_atomic_high(void *kvaddr)
-{
-	unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
-
-	if (vaddr < __fix_to_virt(FIX_KMAP_END))
-		return;
-
-	if (IS_ENABLED(CONFIG_DEBUG_HIGHMEM)) {
-		int type = kmap_atomic_idx();
-		unsigned int idx;
-
-		idx = type + KM_TYPE_NR * smp_processor_id();
-		WARN_ON(vaddr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
-
-		/*
-		 * force other mappings to Oops if they'll try to access
-		 * this pte without first remap it
-		 */
-		pte_clear(&init_mm, vaddr, kmap_pte-idx);
-		local_flush_tlb_page(NULL, vaddr);
-	}
-
-	kmap_atomic_idx_pop();
-}
-EXPORT_SYMBOL(kunmap_atomic_high);
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -60,11 +60,6 @@
 unsigned long long memory_limit;
 bool init_mem_is_free;
 
-#ifdef CONFIG_HIGHMEM
-pte_t *kmap_pte;
-EXPORT_SYMBOL(kmap_pte);
-#endif
-
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 			      unsigned long size, pgprot_t vma_prot)
 {
@@ -233,8 +228,6 @@ void __init paging_init(void)
 
 	map_kernel_page(PKMAP_BASE, 0, __pgprot(0));	/* XXX gross */
 	pkmap_page_table = virt_to_kpte(PKMAP_BASE);
-
-	kmap_pte = virt_to_kpte(__fix_to_virt(FIX_KMAP_BEGIN));
 #endif /* CONFIG_HIGHMEM */
 
 	printk(KERN_DEBUG "Top of RAM: 0x%llx, Total RAM: 0x%llx\n",


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [patch RFC 11/15] sparc/mm/highmem: Switch to generic kmap atomic
  2020-09-19  9:17 [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Thomas Gleixner
                   ` (9 preceding siblings ...)
  2020-09-19  9:18 ` [patch RFC 10/15] powerpc/mm/highmem: " Thomas Gleixner
@ 2020-09-19  9:18 ` Thomas Gleixner
  2020-09-19  9:18 ` [patch RFC 12/15] xtensa/mm/highmem: " Thomas Gleixner
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-19  9:18 UTC (permalink / raw)
  To: LKML
  Cc: linux-arch, Linus Torvalds, Paul McKenney, x86,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, David S. Miller, sparclinux,
	Vineet Gupta, linux-snps-arc, Arnd Bergmann, Guo Ren, linux-csky,
	Michal Simek, Thomas Bogendoerfer, linux-mips, Nick Hu,
	Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
---
Note: Completely untested
---
 arch/sparc/Kconfig               |    1 
 arch/sparc/include/asm/highmem.h |    7 +-
 arch/sparc/mm/Makefile           |    3 -
 arch/sparc/mm/highmem.c          |  115 ---------------------------------------
 arch/sparc/mm/srmmu.c            |    2 
 5 files changed, 6 insertions(+), 122 deletions(-)

--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -137,6 +137,7 @@ config MMU
 config HIGHMEM
 	bool
 	default y if SPARC32
+        select KMAP_ATOMIC_GENERIC
 
 config ZONE_DMA
 	bool
--- a/arch/sparc/include/asm/highmem.h
+++ b/arch/sparc/include/asm/highmem.h
@@ -33,8 +33,6 @@ extern unsigned long highstart_pfn, high
 #define kmap_prot __pgprot(SRMMU_ET_PTE | SRMMU_PRIV | SRMMU_CACHE)
 extern pte_t *pkmap_page_table;
 
-void kmap_init(void) __init;
-
 /*
  * Right now we initialize only a single pte table. It can be extended
  * easily, subsequent pte tables have to be allocated in one physical
@@ -53,6 +51,11 @@ void kmap_init(void) __init;
 
 #define flush_cache_kmaps()	flush_cache_all()
 
+/* FIXME: Use __flush_tlb_one(vaddr) instead of flush_cache_all() -- Anton */
+#define arch_kmap_temp_post_map(vaddr, pteval)	flush_cache_all()
+#define arch_kmap_temp_post_unmap(vaddr)	flush_cache_all()
+
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_HIGHMEM_H */
--- a/arch/sparc/mm/Makefile
+++ b/arch/sparc/mm/Makefile
@@ -15,6 +15,3 @@ obj-$(CONFIG_SPARC32)   += leon_mm.o
 
 # Only used by sparc64
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
-
-# Only used by sparc32
-obj-$(CONFIG_HIGHMEM)   += highmem.o
--- a/arch/sparc/mm/highmem.c
+++ /dev/null
@@ -1,115 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- *  highmem.c: virtual kernel memory mappings for high memory
- *
- *  Provides kernel-static versions of atomic kmap functions originally
- *  found as inlines in include/asm-sparc/highmem.h.  These became
- *  needed as kmap_atomic() and kunmap_atomic() started getting
- *  called from within modules.
- *  -- Tomas Szepe <szepe@pinerecords.com>, September 2002
- *
- *  But kmap_atomic() and kunmap_atomic() cannot be inlined in
- *  modules because they are loaded with btfixup-ped functions.
- */
-
-/*
- * The use of kmap_atomic/kunmap_atomic is discouraged - kmap/kunmap
- * gives a more generic (and caching) interface. But kmap_atomic can
- * be used in IRQ contexts, so in some (very limited) cases we need it.
- *
- * XXX This is an old text. Actually, it's good to use atomic kmaps,
- * provided you remember that they are atomic and not try to sleep
- * with a kmap taken, much like a spinlock. Non-atomic kmaps are
- * shared by CPUs, and so precious, and establishing them requires IPI.
- * Atomic kmaps are lightweight and we may have NCPUS more of them.
- */
-#include <linux/highmem.h>
-#include <linux/export.h>
-#include <linux/mm.h>
-
-#include <asm/cacheflush.h>
-#include <asm/tlbflush.h>
-#include <asm/vaddrs.h>
-
-static pte_t *kmap_pte;
-
-void __init kmap_init(void)
-{
-	unsigned long address = __fix_to_virt(FIX_KMAP_BEGIN);
-
-        /* cache the first kmap pte */
-        kmap_pte = virt_to_kpte(address);
-}
-
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
-{
-	unsigned long vaddr;
-	long idx, type;
-
-	type = kmap_atomic_idx_push();
-	idx = type + KM_TYPE_NR*smp_processor_id();
-	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-
-/* XXX Fix - Anton */
-#if 0
-	__flush_cache_one(vaddr);
-#else
-	flush_cache_all();
-#endif
-
-#ifdef CONFIG_DEBUG_HIGHMEM
-	BUG_ON(!pte_none(*(kmap_pte-idx)));
-#endif
-	set_pte(kmap_pte-idx, mk_pte(page, prot));
-/* XXX Fix - Anton */
-#if 0
-	__flush_tlb_one(vaddr);
-#else
-	flush_tlb_all();
-#endif
-
-	return (void*) vaddr;
-}
-EXPORT_SYMBOL(kmap_atomic_high_prot);
-
-void kunmap_atomic_high(void *kvaddr)
-{
-	unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
-	int type;
-
-	if (vaddr < FIXADDR_START)
-		return;
-
-	type = kmap_atomic_idx();
-
-#ifdef CONFIG_DEBUG_HIGHMEM
-	{
-		unsigned long idx;
-
-		idx = type + KM_TYPE_NR * smp_processor_id();
-		BUG_ON(vaddr != __fix_to_virt(FIX_KMAP_BEGIN+idx));
-
-		/* XXX Fix - Anton */
-#if 0
-		__flush_cache_one(vaddr);
-#else
-		flush_cache_all();
-#endif
-
-		/*
-		 * force other mappings to Oops if they'll try to access
-		 * this pte without first remap it
-		 */
-		pte_clear(&init_mm, vaddr, kmap_pte-idx);
-		/* XXX Fix - Anton */
-#if 0
-		__flush_tlb_one(vaddr);
-#else
-		flush_tlb_all();
-#endif
-	}
-#endif
-
-	kmap_atomic_idx_pop();
-}
-EXPORT_SYMBOL(kunmap_atomic_high);
--- a/arch/sparc/mm/srmmu.c
+++ b/arch/sparc/mm/srmmu.c
@@ -971,8 +971,6 @@ void __init srmmu_paging_init(void)
 
 	sparc_context_init(num_contexts);
 
-	kmap_init();
-
 	{
 		unsigned long max_zone_pfn[MAX_NR_ZONES] = { 0 };
 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [patch RFC 12/15] xtensa/mm/highmem: Switch to generic kmap atomic
  2020-09-19  9:17 [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Thomas Gleixner
                   ` (10 preceding siblings ...)
  2020-09-19  9:18 ` [patch RFC 11/15] sparc/mm/highmem: " Thomas Gleixner
@ 2020-09-19  9:18 ` Thomas Gleixner
  2020-09-19  9:18 ` [patch RFC 13/15] mm/highmem: Remove the old kmap_atomic cruft Thomas Gleixner
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-19  9:18 UTC (permalink / raw)
  To: LKML
  Cc: linux-arch, Linus Torvalds, Paul McKenney, x86,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta, linux-snps-arc,
	Arnd Bergmann, Guo Ren, linux-csky, Michal Simek,
	Thomas Bogendoerfer, linux-mips, Nick Hu, Greentime Hu,
	Vincent Chen, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, David S. Miller, sparclinux

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: linux-xtensa@linux-xtensa.org
---
Note: Completely untested
---
 arch/xtensa/Kconfig               |    1 
 arch/xtensa/include/asm/highmem.h |    9 +++++++
 arch/xtensa/mm/highmem.c          |   44 +++-----------------------------------
 3 files changed, 14 insertions(+), 40 deletions(-)

--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -679,6 +679,7 @@ endchoice
 config HIGHMEM
 	bool "High Memory Support"
 	depends on MMU
+        select KMAP_ATOMIC_GENERIC
 	help
 	  Linux can use the full amount of RAM in the system by
 	  default. However, the default MMUv2 setup only maps the
--- a/arch/xtensa/include/asm/highmem.h
+++ b/arch/xtensa/include/asm/highmem.h
@@ -68,6 +68,15 @@ static inline void flush_cache_kmaps(voi
 	flush_cache_all();
 }
 
+enum fixed_addresses kmap_temp_map_idx(int type, unsigned long pfn);
+#define arch_kmap_temp_map_idx		kmap_temp_map_idx
+
+enum fixed_addresses kmap_temp_unmap_idx(int type, unsigned long addr);
+#define arch_kmap_temp_unmap_idx	kmap_temp_unmap_idx
+
+#define arch_kmap_temp_post_unmap(vaddr)	\
+	local_flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE)
+
 void kmap_init(void);
 
 #endif
--- a/arch/xtensa/mm/highmem.c
+++ b/arch/xtensa/mm/highmem.c
@@ -12,8 +12,6 @@
 #include <linux/highmem.h>
 #include <asm/tlbflush.h>
 
-static pte_t *kmap_pte;
-
 #if DCACHE_WAY_SIZE > PAGE_SIZE
 unsigned int last_pkmap_nr_arr[DCACHE_N_COLORS];
 wait_queue_head_t pkmap_map_wait_arr[DCACHE_N_COLORS];
@@ -37,55 +35,21 @@ static inline enum fixed_addresses kmap_
 		color;
 }
 
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
+enum fixed_addresses kmap_temp_map_idx(int type, unsigned long pfn)
 {
-	enum fixed_addresses idx;
-	unsigned long vaddr;
-
-	idx = kmap_idx(kmap_atomic_idx_push(),
-		       DCACHE_ALIAS(page_to_phys(page)));
-	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-#ifdef CONFIG_DEBUG_HIGHMEM
-	BUG_ON(!pte_none(*(kmap_pte + idx)));
-#endif
-	set_pte(kmap_pte + idx, mk_pte(page, prot));
-
-	return (void *)vaddr;
+	return kmap_idx(type, DCACHE_ALIAS(pfn << PAGE_SHIFT);
 }
-EXPORT_SYMBOL(kmap_atomic_high_prot);
 
-void kunmap_atomic_high(void *kvaddr)
+enum fixed_addresses kmap_temp_unmap_idx(int type, unsigned long addr)
 {
-	if (kvaddr >= (void *)FIXADDR_START &&
-	    kvaddr < (void *)FIXADDR_TOP) {
-		int idx = kmap_idx(kmap_atomic_idx(),
-				   DCACHE_ALIAS((unsigned long)kvaddr));
-
-		/*
-		 * Force other mappings to Oops if they'll try to access this
-		 * pte without first remap it.  Keeping stale mappings around
-		 * is a bad idea also, in case the page changes cacheability
-		 * attributes or becomes a protected page in a hypervisor.
-		 */
-		pte_clear(&init_mm, kvaddr, kmap_pte + idx);
-		local_flush_tlb_kernel_range((unsigned long)kvaddr,
-					     (unsigned long)kvaddr + PAGE_SIZE);
-
-		kmap_atomic_idx_pop();
-	}
+	return kmap_idx(type, DCACHE_ALIAS(addr));
 }
-EXPORT_SYMBOL(kunmap_atomic_high);
 
 void __init kmap_init(void)
 {
-	unsigned long kmap_vstart;
-
 	/* Check if this memory layout is broken because PKMAP overlaps
 	 * page table.
 	 */
 	BUILD_BUG_ON(PKMAP_BASE < TLBTEMP_BASE_1 + TLBTEMP_SIZE);
-	/* cache the first kmap pte */
-	kmap_vstart = __fix_to_virt(FIX_KMAP_BEGIN);
-	kmap_pte = virt_to_kpte(kmap_vstart);
 	kmap_waitqueues_init();
 }


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [patch RFC 13/15] mm/highmem: Remove the old kmap_atomic cruft
  2020-09-19  9:17 [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Thomas Gleixner
                   ` (11 preceding siblings ...)
  2020-09-19  9:18 ` [patch RFC 12/15] xtensa/mm/highmem: " Thomas Gleixner
@ 2020-09-19  9:18 ` Thomas Gleixner
  2020-09-19  9:18 ` [patch RFC 14/15] sched: highmem: Store temporary kmaps in task struct Thomas Gleixner
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-19  9:18 UTC (permalink / raw)
  To: LKML
  Cc: linux-arch, Linus Torvalds, Paul McKenney, x86,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta, linux-snps-arc,
	Arnd Bergmann, Guo Ren, linux-csky, Michal Simek,
	Thomas Bogendoerfer, linux-mips, Nick Hu, Greentime Hu,
	Vincent Chen, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, David S. Miller, sparclinux

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/highmem.h |   65 ++----------------------------------------------
 mm/highmem.c            |   28 +++++++++++++++++---
 2 files changed, 28 insertions(+), 65 deletions(-)

--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -94,27 +94,6 @@ static inline void kunmap(struct page *p
  * be used in IRQ contexts, so in some (very limited) cases we need
  * it.
  */
-
-#ifndef CONFIG_KMAP_ATOMIC_GENERIC
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot);
-void kunmap_atomic_high(void *kvaddr);
-
-static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
-{
-	preempt_disable();
-	pagefault_disable();
-	if (!PageHighMem(page))
-		return page_address(page);
-	return kmap_atomic_high_prot(page, prot);
-}
-
-static inline void __kunmap_atomic(void *vaddr)
-{
-	kunmap_atomic_high(vaddr);
-	pagefault_enable();
-}
-#else /* !CONFIG_KMAP_ATOMIC_GENERIC */
-
 static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
 {
 	preempt_disable();
@@ -127,17 +106,14 @@ static inline void *kmap_atomic_pfn(unsi
 	return kmap_atomic_pfn_prot(pfn, kmap_prot);
 }
 
-static inline void __kunmap_atomic(void *addr)
+static inline void *kmap_atomic(struct page *page)
 {
-	kumap_atomic_indexed(addr);
+	return kmap_atomic_prot(page, kmap_prot);
 }
 
-
-#endif /* CONFIG_KMAP_ATOMIC_GENERIC */
-
-static inline void *kmap_atomic(struct page *page)
+static inline void __kunmap_atomic(void *addr)
 {
-	return kmap_atomic_prot(page, kmap_prot);
+	kumap_atomic_indexed(addr);
 }
 
 /* declarations for linux/mm/highmem.c */
@@ -233,39 +209,6 @@ static inline void __kunmap_atomic(void
 
 #endif /* CONFIG_HIGHMEM */
 
-#if defined(CONFIG_HIGHMEM) || defined(CONFIG_X86_32)
-
-DECLARE_PER_CPU(int, __kmap_atomic_idx);
-
-static inline int kmap_atomic_idx_push(void)
-{
-	int idx = __this_cpu_inc_return(__kmap_atomic_idx) - 1;
-
-#ifdef CONFIG_DEBUG_HIGHMEM
-	WARN_ON_ONCE(in_irq() && !irqs_disabled());
-	BUG_ON(idx >= KM_TYPE_NR);
-#endif
-	return idx;
-}
-
-static inline int kmap_atomic_idx(void)
-{
-	return __this_cpu_read(__kmap_atomic_idx) - 1;
-}
-
-static inline void kmap_atomic_idx_pop(void)
-{
-#ifdef CONFIG_DEBUG_HIGHMEM
-	int idx = __this_cpu_dec_return(__kmap_atomic_idx);
-
-	BUG_ON(idx < 0);
-#else
-	__this_cpu_dec(__kmap_atomic_idx);
-#endif
-}
-
-#endif
-
 /*
  * Prevent people trying to call kunmap_atomic() as if it were kunmap()
  * kunmap_atomic() should get the return value of kmap_atomic, not the page.
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -31,10 +31,6 @@
 #include <asm/tlbflush.h>
 #include <linux/vmalloc.h>
 
-#if defined(CONFIG_HIGHMEM) || defined(CONFIG_X86_32)
-DEFINE_PER_CPU(int, __kmap_atomic_idx);
-#endif
-
 /*
  * Virtual_count is not a pure "count".
  *  0 means that it is not mapped, and has not been mapped
@@ -380,6 +376,30 @@ static inline void kmap_high_unmap_tempo
 #endif /* CONFIG_HIGHMEM */
 
 #ifdef CONFIG_KMAP_ATOMIC_GENERIC
+
+static DEFINE_PER_CPU(int, __kmap_atomic_idx);
+
+static inline int kmap_atomic_idx_push(void)
+{
+	int idx = __this_cpu_inc_return(__kmap_atomic_idx) - 1;
+
+	WARN_ON_ONCE(in_irq() && !irqs_disabled());
+	BUG_ON(idx >= KM_TYPE_NR);
+	return idx;
+}
+
+static inline int kmap_atomic_idx(void)
+{
+	return __this_cpu_read(__kmap_atomic_idx) - 1;
+}
+
+static inline void kmap_atomic_idx_pop(void)
+{
+	int idx = __this_cpu_dec_return(__kmap_atomic_idx);
+
+	BUG_ON(idx < 0);
+}
+
 #ifndef arch_kmap_temp_post_map
 # define arch_kmap_temp_post_map(vaddr, pteval)	do { } while (0)
 #endif


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [patch RFC 14/15] sched: highmem: Store temporary kmaps in task struct
  2020-09-19  9:17 [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Thomas Gleixner
                   ` (12 preceding siblings ...)
  2020-09-19  9:18 ` [patch RFC 13/15] mm/highmem: Remove the old kmap_atomic cruft Thomas Gleixner
@ 2020-09-19  9:18 ` Thomas Gleixner
  2020-09-19  9:18 ` [patch RFC 15/15] mm/highmem: Provide kmap_temporary* Thomas Gleixner
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-19  9:18 UTC (permalink / raw)
  To: LKML
  Cc: linux-arch, Linus Torvalds, Paul McKenney, x86,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta, linux-snps-arc,
	Arnd Bergmann, Guo Ren, linux-csky, Michal Simek,
	Thomas Bogendoerfer, linux-mips, Nick Hu, Greentime Hu,
	Vincent Chen, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, David S. Miller, sparclinux

Instead of storing the map per CPU provide and use per task storage. That
prepares for temporary kmaps which are preemptible.

The context switch code is preparatory and not yet in use because
kmap_atomic() runs with preemption disabled. Will be made usable in the
next step.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/highmem.h |    1 
 include/linux/sched.h   |    9 +++++++
 kernel/sched/core.c     |   10 ++++++++
 mm/highmem.c            |   59 ++++++++++++++++++++++++++++++++++++++++++------
 4 files changed, 72 insertions(+), 7 deletions(-)

--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -38,6 +38,7 @@ static inline void invalidate_kernel_vma
 void *kmap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot);
 void *kmap_atomic_page_prot(struct page *page, pgprot_t prot);
 void kunmap_atomic_indexed(void *vaddr);
+void kmap_switch_temporary(struct task_struct *prev, struct task_struct *next);
 # ifndef ARCH_NEEDS_KMAP_HIGH_GET
 static inline void *arch_kmap_temporary_high_get(struct page *page)
 {
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -34,6 +34,7 @@
 #include <linux/rseq.h>
 #include <linux/seqlock.h>
 #include <linux/kcsan.h>
+#include <asm/kmap_types.h>
 
 /* task_struct member predeclarations (sorted alphabetically): */
 struct audit_context;
@@ -628,6 +629,13 @@ struct wake_q_node {
 	struct wake_q_node *next;
 };
 
+struct kmap_ctrl {
+#ifdef CONFIG_KMAP_ATOMIC_GENERIC
+	int				idx;
+	pte_t				pteval[KM_TYPE_NR];
+#endif
+};
+
 struct task_struct {
 #ifdef CONFIG_THREAD_INFO_IN_TASK
 	/*
@@ -1280,6 +1288,7 @@ struct task_struct {
 	unsigned int			sequential_io;
 	unsigned int			sequential_io_avg;
 #endif
+	struct kmap_ctrl		kmap_ctrl;
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 	unsigned long			task_state_change;
 #endif
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3529,6 +3529,15 @@ static inline void finish_lock_switch(st
 # define finish_arch_post_lock_switch()	do { } while (0)
 #endif
 
+static inline void kmap_temp_switch(struct task_struct *prev,
+				    struct task_struct *next)
+{
+#ifdef CONFIG_HIGHMEM
+	if (unlikely(prev->kmap_ctrl.idx || next->kmap_ctrl.idx))
+		kmap_switch_temporary(prev, next);
+#endif
+}
+
 /**
  * prepare_task_switch - prepare to switch tasks
  * @rq: the runqueue preparing to switch
@@ -3551,6 +3560,7 @@ prepare_task_switch(struct rq *rq, struc
 	perf_event_task_sched_out(prev, next);
 	rseq_preempt(prev);
 	fire_sched_out_preempt_notifiers(prev, next);
+	kmap_temp_switch(prev, next);
 	prepare_task(next);
 	prepare_arch_switch(next);
 }
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -370,6 +370,7 @@ void kunmap_high(struct page *page)
 	if (need_wakeup)
 		wake_up(pkmap_map_wait);
 }
+
 EXPORT_SYMBOL(kunmap_high);
 #else
 static inline void kmap_high_unmap_temporary(unsigned long vaddr) { }
@@ -377,11 +378,9 @@ static inline void kmap_high_unmap_tempo
 
 #ifdef CONFIG_KMAP_ATOMIC_GENERIC
 
-static DEFINE_PER_CPU(int, __kmap_atomic_idx);
-
 static inline int kmap_atomic_idx_push(void)
 {
-	int idx = __this_cpu_inc_return(__kmap_atomic_idx) - 1;
+	int idx = current->kmap_ctrl.idx++;
 
 	WARN_ON_ONCE(in_irq() && !irqs_disabled());
 	BUG_ON(idx >= KM_TYPE_NR);
@@ -390,14 +389,13 @@ static inline int kmap_atomic_idx_push(v
 
 static inline int kmap_atomic_idx(void)
 {
-	return __this_cpu_read(__kmap_atomic_idx) - 1;
+	return current->kmap_ctrl.idx - 1;
 }
 
 static inline void kmap_atomic_idx_pop(void)
 {
-	int idx = __this_cpu_dec_return(__kmap_atomic_idx);
-
-	BUG_ON(idx < 0);
+	current->kmap_ctrl.idx--;
+	BUG_ON(current->kmap_ctrl.idx < 0);
 }
 
 #ifndef arch_kmap_temp_post_map
@@ -447,6 +445,7 @@ static void *__kmap_atomic_pfn_prot(unsi
 	pteval = pfn_pte(pfn, prot);
 	set_pte(kmap_pte - idx, pteval);
 	arch_kmap_temp_post_map(vaddr, pteval);
+	current->kmap_ctrl.pteval[kmap_atomic_idx()] = pteval;
 	preempt_enable();
 
 	return (void *)vaddr;
@@ -499,11 +498,57 @@ void kunmap_atomic_indexed(void *vaddr)
 	arch_kmap_temp_pre_unmap(addr);
 	pte_clear(&init_mm, addr, kmap_pte - idx);
 	arch_kmap_temp_post_unmap(addr);
+	current->kmap_ctrl.pteval[kmap_atomic_idx()] = __pte(0);
 	kmap_atomic_idx_pop();
 	preempt_enable();
 	pagefault_enable();
 }
 EXPORT_SYMBOL(kunmap_atomic_indexed);
+
+void kmap_switch_temporary(struct task_struct *prev, struct task_struct *next)
+{
+	pte_t *kmap_pte = kmap_get_pte();
+	int i;
+
+	/* Clear @prev's kmaps */
+	for (i = 0; i < prev->kmap_ctrl.idx; i++) {
+		pte_t pteval = prev->kmap_ctrl.pteval[i];
+		unsigned long addr;
+		int idx;
+
+		if (WARN_ON_ONCE(pte_none(pteval)))
+			continue;
+
+		/*
+		 * This is a horrible hack for XTENSA to calculate the
+		 * coloured PTE index. Uses the PFN encoded into the pteval
+		 * and the map index calculation because the actual mapped
+		 * virtual address is not stored in task::kmap_ctrl.
+		 *
+		 * For any sane architecture that address calculation is
+		 * optimized out.
+		 */
+		idx = arch_kmap_temp_map_idx(i, pte_pfn(pteval));
+
+		arch_kmap_temp_pre_unmap(addr);
+		pte_clear(&init_mm, addr, kmap_pte - idx);
+		arch_kmap_temp_post_unmap(addr);
+	}
+
+	/* Restore @next's kmaps */
+	for (i = 0; i < next->kmap_ctrl.idx; i++) {
+		pte_t pteval = next->kmap_ctrl.pteval[i];
+		int idx;
+
+		if (WARN_ON_ONCE(pte_none(pteval)))
+			continue;
+
+		idx = arch_kmap_temp_map_idx(i, pte_pfn(pteval));
+		set_pte(kmap_pte - idx, pteval);
+		arch_kmap_temp_post_map(addr, pteval);
+	}
+}
+
 #endif
 
 #if defined(HASHED_PAGE_VIRTUAL)


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [patch RFC 15/15] mm/highmem: Provide kmap_temporary*
  2020-09-19  9:17 [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Thomas Gleixner
                   ` (13 preceding siblings ...)
  2020-09-19  9:18 ` [patch RFC 14/15] sched: highmem: Store temporary kmaps in task struct Thomas Gleixner
@ 2020-09-19  9:18 ` Thomas Gleixner
  2020-09-19 10:35 ` [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Daniel Vetter
  2020-09-19 17:18 ` Linus Torvalds
  16 siblings, 0 replies; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-19  9:18 UTC (permalink / raw)
  To: LKML
  Cc: linux-arch, Linus Torvalds, Paul McKenney, x86,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta, linux-snps-arc,
	Arnd Bergmann, Guo Ren, linux-csky, Michal Simek,
	Thomas Bogendoerfer, linux-mips, Nick Hu, Greentime Hu,
	Vincent Chen, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, David S. Miller, sparclinux

Now that the kmap atomic index is stored in task struct provide a
preemptible variant. On context switch the maps of an outgoing task are
removed and the map of the incoming task are restored. That's obviously
slow, but highmem is slow anyway.

The kmap_temporary and iomap_temporary interfaces can be invoked from both
preemptible and atomic context.

A wholesale conversion of kmap_atomic to be fully preemptible is not
possible because some of the usage sites might rely on the preemption
disable for serialization or per CPUness. Needs to be done on a case by
case basis.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/iomap.h |   16 ++++++++-
 arch/x86/mm/iomap_32.c       |    7 +---
 include/linux/highmem.h      |   70 +++++++++++++++++++++++++++++++++----------
 mm/highmem.c                 |   18 +++++------
 4 files changed, 80 insertions(+), 31 deletions(-)

--- a/arch/x86/include/asm/iomap.h
+++ b/arch/x86/include/asm/iomap.h
@@ -13,11 +13,23 @@
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
 
-void __iomem *iomap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot);
+void __iomem *iomap_temporary_pfn_prot(unsigned long pfn, pgprot_t prot);
+
+static inline void __iomem *iomap_atomic_pfn_prot(unsigned long pfn,
+						  pgprot_t prot)
+{
+	preempt_disable();
+	return iomap_temporary_pfn_prot(pfn, prot);
+}
+
+static inline void iounmap_temporary(void __iomem *vaddr)
+{
+	kunmap_temporary_indexed((void __force *)vaddr);
+}
 
 static inline void iounmap_atomic(void __iomem *vaddr)
 {
-	kunmap_atomic_indexed((void __force *)vaddr);
+	iounmap_temporary(vaddr);
 	preempt_enable();
 }
 
--- a/arch/x86/mm/iomap_32.c
+++ b/arch/x86/mm/iomap_32.c
@@ -44,7 +44,7 @@ void iomap_free(resource_size_t base, un
 }
 EXPORT_SYMBOL_GPL(iomap_free);
 
-void __iomem *iomap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot)
+void __iomem *iomap_temporary_pfn_prot(unsigned long pfn, pgprot_t prot)
 {
 	/*
 	 * For non-PAT systems, translate non-WB request to UC- just in
@@ -60,7 +60,6 @@ void __iomem *iomap_atomic_pfn_prot(unsi
 	/* Filter out unsupported __PAGE_KERNEL* bits: */
 	pgprot_val(prot) &= __default_kernel_pte_mask;
 
-	preempt_disable();
-	return (void __force __iomem *)kmap_atomic_pfn_prot(pfn, prot);
+	return (void __force __iomem *)__kmap_temporary_pfn_prot(pfn, prot);
 }
-EXPORT_SYMBOL_GPL(iomap_atomic_pfn_prot);
+EXPORT_SYMBOL_GPL(iomap_temporary_pfn_prot);
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -35,9 +35,9 @@ static inline void invalidate_kernel_vma
  * Outside of CONFIG_HIGHMEM to support X86 32bit iomap_atomic() cruft.
  */
 #ifdef CONFIG_KMAP_ATOMIC_GENERIC
-void *kmap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot);
-void *kmap_atomic_page_prot(struct page *page, pgprot_t prot);
-void kunmap_atomic_indexed(void *vaddr);
+void *__kmap_temporary_pfn_prot(unsigned long pfn, pgprot_t prot);
+void *__kmap_temporary_page_prot(struct page *page, pgprot_t prot);
+void kunmap_temporary_indexed(void *vaddr);
 void kmap_switch_temporary(struct task_struct *prev, struct task_struct *next);
 # ifndef ARCH_NEEDS_KMAP_HIGH_GET
 static inline void *arch_kmap_temporary_high_get(struct page *page)
@@ -95,16 +95,35 @@ static inline void kunmap(struct page *p
  * be used in IRQ contexts, so in some (very limited) cases we need
  * it.
  */
-static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
+static inline void *kmap_temporary_page_prot(struct page *page, pgprot_t prot)
 {
-	preempt_disable();
-	return kmap_atomic_page_prot(page, prot);
+	return __kmap_temporary_page_prot(page, prot);
 }
 
-static inline void *kmap_atomic_pfn(unsigned long pfn)
+static inline void *kmap_temporary_page(struct page *page)
+{
+	return kmap_temporary_page_prot(page, kmap_prot);
+}
+
+static inline void *kmap_temporary_pfn_prot(unsigned long pfn, pgprot_t prot)
+{
+	return __kmap_temporary_pfn_prot(pfn, prot);
+}
+
+static inline void *kmap_temporary_pfn(unsigned long pfn)
+{
+	return kmap_temporary_pfn_prot(pfn, kmap_prot);
+}
+
+static inline void __kunmap_temporary(void *vaddr)
+{
+	kunmap_temporary_indexed(vaddr);
+}
+
+static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
 {
 	preempt_disable();
-	return kmap_atomic_pfn_prot(pfn, kmap_prot);
+	return kmap_temporary_page_prot(page, prot);
 }
 
 static inline void *kmap_atomic(struct page *page)
@@ -112,9 +131,10 @@ static inline void *kmap_atomic(struct p
 	return kmap_atomic_prot(page, kmap_prot);
 }
 
-static inline void __kunmap_atomic(void *addr)
+static inline void *kmap_atomic_pfn(unsigned long pfn)
 {
-	kumap_atomic_indexed(addr);
+	preempt_disable();
+	return kmap_temporary_pfn_prot(pfn, kmap_prot);
 }
 
 /* declarations for linux/mm/highmem.c */
@@ -177,6 +197,22 @@ static inline void kunmap(struct page *p
 #endif
 }
 
+static inline void *kmap_temporary_page(struct page *page)
+{
+	pagefault_disable();
+	return page_address(page);
+}
+
+static inline void *kmap_temporary_page_prot(struct page *page, pgprot_t prot)
+{
+	return kmap_temporary_page(page);
+}
+
+static inline void *kmap_temporary_pfn(unsigned long pfn)
+{
+	return kmap_temporary_page(pfn_to_page(pfn));
+}
+
 static inline void *kmap_atomic(struct page *page)
 {
 	preempt_disable();
@@ -194,12 +230,8 @@ static inline void *kmap_atomic_pfn(unsi
 	return kmap_atomic(pfn_to_page(pfn));
 }
 
-static inline void __kunmap_atomic(void *addr)
+static inline void __kunmap_temporary(void *addr)
 {
-	/*
-	 * Mostly nothing to do in the CONFIG_HIGHMEM=n case as kunmap_atomic()
-	 * handles preemption
-	 */
 #ifdef ARCH_HAS_FLUSH_ON_KUNMAP
 	kunmap_flush_on_unmap(addr);
 #endif
@@ -217,10 +249,16 @@ static inline void __kunmap_atomic(void
 #define kunmap_atomic(addr)						\
 	do {								\
 		BUILD_BUG_ON(__same_type((addr), struct page *));	\
-		__kunmap_atomic(addr);					\
+		__kunmap_temporary(addr);				\
 		preempt_enable();					\
 	} while (0)
 
+#define kunmap_temporary(addr)						\
+	do {								\
+		BUILD_BUG_ON(__same_type((addr), struct page *));	\
+		__kunmap_temporary(addr);				\
+	} while (0)
+
 /* when CONFIG_HIGHMEM is not set these will be plain clear/copy_page */
 #ifndef clear_user_highpage
 static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -432,7 +432,7 @@ static pte_t *kmap_get_pte(void)
 	return __kmap_pte;
 }
 
-static void *__kmap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot)
+static void *do_kmap_temporary_pfn_prot(unsigned long pfn, pgprot_t prot)
 {
 	pte_t pteval, *kmap_pte = kmap_get_pte();
 	unsigned long vaddr;
@@ -451,14 +451,14 @@ static void *__kmap_atomic_pfn_prot(unsi
 	return (void *)vaddr;
 }
 
-void *kmap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot)
+void *__kmap_temporary_pfn_prot(unsigned long pfn, pgprot_t prot)
 {
 	pagefault_disable();
-	return __kmap_atomic_pfn_prot(pfn, prot);
+	return do_kmap_temporary_pfn_prot(pfn, prot);
 }
-EXPORT_SYMBOL(kmap_atomic_pfn_prot);
+EXPORT_SYMBOL(__kmap_temporary_pfn_prot);
 
-void *kmap_atomic_page_prot(struct page *page, pgprot_t prot)
+void *__kmap_temporary_page_prot(struct page *page, pgprot_t prot)
 {
 	void *kmap;
 
@@ -471,11 +471,11 @@ void *kmap_atomic_page_prot(struct page
 	if (kmap)
 		return kmap;
 
-	return __kmap_atomic_pfn_prot(page_to_pfn(page), prot);
+	return do_kmap_temporary_pfn_prot(page_to_pfn(page), prot);
 }
-EXPORT_SYMBOL(kmap_atomic_page_prot);
+EXPORT_SYMBOL(__kmap_temporary_page_prot);
 
-void kunmap_atomic_indexed(void *vaddr)
+void kunmap_temporary_indexed(void *vaddr)
 {
 	unsigned long addr = (unsigned long) vaddr & PAGE_MASK;
 	pte_t *kmap_pte = kmap_get_pte();
@@ -503,7 +503,7 @@ void kunmap_atomic_indexed(void *vaddr)
 	preempt_enable();
 	pagefault_enable();
 }
-EXPORT_SYMBOL(kunmap_atomic_indexed);
+EXPORT_SYMBOL(kunmap_temporary_indexed);
 
 void kmap_switch_temporary(struct task_struct *prev, struct task_struct *next)
 {


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-19  9:17 [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Thomas Gleixner
                   ` (14 preceding siblings ...)
  2020-09-19  9:18 ` [patch RFC 15/15] mm/highmem: Provide kmap_temporary* Thomas Gleixner
@ 2020-09-19 10:35 ` Daniel Vetter
  2020-09-19 10:37   ` Daniel Vetter
  2020-09-19 17:18 ` Linus Torvalds
  16 siblings, 1 reply; 54+ messages in thread
From: Daniel Vetter @ 2020-09-19 10:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, open list:GENERIC INCLUDE/A...,
	Linus Torvalds, Paul McKenney, X86 ML, Sebastian Andrzej Siewior,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Will Deacon, Andrew Morton, Linux-MM,
	Russell King, Linux ARM, Chris Zankel, Max Filippov,
	linux-xtensa, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, intel-gfx, dri-devel, Ard Biesheuvel, Herbert Xu,
	Vineet Gupta, arcml, Arnd Bergmann, Guo Ren, linux-csky,
	Michal Simek, Thomas Bogendoerfer, linux-mips, Nick Hu,
	Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, sparclinux

On Sat, Sep 19, 2020 at 11:50 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> First of all, sorry for the horribly big Cc list!
>
> Following up to the discussion in:
>
>   https://lore.kernel.org/r/20200914204209.256266093@linutronix.de
>
> this provides a preemptible variant of kmap_atomic & related
> interfaces. This is achieved by:
>
>  - Consolidating all kmap atomic implementations in generic code
>
>  - Switching from per CPU storage of the kmap index to a per task storage
>
>  - Adding a pteval array to the per task storage which contains the ptevals
>    of the currently active temporary kmaps
>
>  - Adding context switch code which checks whether the outgoing or the
>    incoming task has active temporary kmaps. If so, the outgoing task's
>    kmaps are removed and the incoming task's kmaps are restored.
>
>  - Adding new interfaces k[un]map_temporary*() which are not disabling
>    preemption and can be called from any context (except NMI).
>
>    Contrary to kmap() which provides preemptible and "persistant" mappings,
>    these interfaces are meant to replace the temporary mappings provided by
>    kmap_atomic*() today.
>
> This allows to get rid of conditional mapping choices and allows to have
> preemptible short term mappings on 64bit which are today enforced to be
> non-preemptible due to the highmem constraints. It clearly puts overhead on
> the highmem users, but highmem is slow anyway.
>
> This is not a wholesale conversion which makes kmap_atomic magically
> preemptible because there might be usage sites which rely on the implicit
> preempt disable. So this needs to be done on a case by case basis and the
> call sites converted to kmap_temporary.
>
> Note, that this is only lightly tested on X86 and completely untested on
> all other architectures.
>
> The lot is also available from
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git highmem

I think it should be the case, but I want to double check: Will
copy_*_user be allowed within a kmap_temporary section? This would
allow us to ditch an absolute pile of slowpaths.
-Daniel

>
> Thanks,
>
>         tglx
> ---
>  a/arch/arm/mm/highmem.c               |  121 ---------------------
>  a/arch/microblaze/mm/highmem.c        |   78 -------------
>  a/arch/nds32/mm/highmem.c             |   48 --------
>  a/arch/powerpc/mm/highmem.c           |   67 -----------
>  a/arch/sparc/mm/highmem.c             |  115 --------------------
>  arch/arc/Kconfig                      |    1
>  arch/arc/include/asm/highmem.h        |    8 +
>  arch/arc/mm/highmem.c                 |   44 -------
>  arch/arm/Kconfig                      |    1
>  arch/arm/include/asm/highmem.h        |   30 +++--
>  arch/arm/mm/Makefile                  |    1
>  arch/csky/Kconfig                     |    1
>  arch/csky/include/asm/highmem.h       |    4
>  arch/csky/mm/highmem.c                |   75 -------------
>  arch/microblaze/Kconfig               |    1
>  arch/microblaze/include/asm/highmem.h |    6 -
>  arch/microblaze/mm/Makefile           |    1
>  arch/microblaze/mm/init.c             |    6 -
>  arch/mips/Kconfig                     |    1
>  arch/mips/include/asm/highmem.h       |    4
>  arch/mips/mm/highmem.c                |   77 -------------
>  arch/mips/mm/init.c                   |    3
>  arch/nds32/Kconfig.cpu                |    1
>  arch/nds32/include/asm/highmem.h      |   21 ++-
>  arch/nds32/mm/Makefile                |    1
>  arch/powerpc/Kconfig                  |    1
>  arch/powerpc/include/asm/highmem.h    |    6 -
>  arch/powerpc/mm/Makefile              |    1
>  arch/powerpc/mm/mem.c                 |    7 -
>  arch/sparc/Kconfig                    |    1
>  arch/sparc/include/asm/highmem.h      |    7 -
>  arch/sparc/mm/Makefile                |    3
>  arch/sparc/mm/srmmu.c                 |    2
>  arch/x86/include/asm/fixmap.h         |    1
>  arch/x86/include/asm/highmem.h        |   12 +-
>  arch/x86/include/asm/iomap.h          |   29 +++--
>  arch/x86/mm/highmem_32.c              |   59 ----------
>  arch/x86/mm/init_32.c                 |   15 --
>  arch/x86/mm/iomap_32.c                |   57 ----------
>  arch/xtensa/Kconfig                   |    1
>  arch/xtensa/include/asm/highmem.h     |    9 +
>  arch/xtensa/mm/highmem.c              |   44 -------
>  b/arch/x86/Kconfig                    |    3
>  include/linux/highmem.h               |  141 +++++++++++++++---------
>  include/linux/io-mapping.h            |    2
>  include/linux/sched.h                 |    9 +
>  kernel/sched/core.c                   |   10 +
>  mm/Kconfig                            |    3
>  mm/highmem.c                          |  192 ++++++++++++++++++++++++++++++++--
>  49 files changed, 422 insertions(+), 909 deletions(-)



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-19 10:35 ` [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Daniel Vetter
@ 2020-09-19 10:37   ` Daniel Vetter
  2020-09-20  6:23     ` Thomas Gleixner
  0 siblings, 1 reply; 54+ messages in thread
From: Daniel Vetter @ 2020-09-19 10:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, open list:GENERIC INCLUDE/A...,
	Linus Torvalds, Paul McKenney, X86 ML, Sebastian Andrzej Siewior,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Will Deacon, Andrew Morton, Linux-MM,
	Russell King, Linux ARM, Chris Zankel, Max Filippov,
	linux-xtensa, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, intel-gfx, dri-devel, Ard Biesheuvel, Herbert Xu,
	Vineet Gupta, arcml, Arnd Bergmann, Guo Ren, linux-csky,
	Michal Simek, Thomas Bogendoerfer, linux-mips, Nick Hu,
	Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, sparclinux

On Sat, Sep 19, 2020 at 12:35 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Sat, Sep 19, 2020 at 11:50 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > First of all, sorry for the horribly big Cc list!
> >
> > Following up to the discussion in:
> >
> >   https://lore.kernel.org/r/20200914204209.256266093@linutronix.de
> >
> > this provides a preemptible variant of kmap_atomic & related
> > interfaces. This is achieved by:
> >
> >  - Consolidating all kmap atomic implementations in generic code
> >
> >  - Switching from per CPU storage of the kmap index to a per task storage
> >
> >  - Adding a pteval array to the per task storage which contains the ptevals
> >    of the currently active temporary kmaps
> >
> >  - Adding context switch code which checks whether the outgoing or the
> >    incoming task has active temporary kmaps. If so, the outgoing task's
> >    kmaps are removed and the incoming task's kmaps are restored.
> >
> >  - Adding new interfaces k[un]map_temporary*() which are not disabling
> >    preemption and can be called from any context (except NMI).
> >
> >    Contrary to kmap() which provides preemptible and "persistant" mappings,
> >    these interfaces are meant to replace the temporary mappings provided by
> >    kmap_atomic*() today.
> >
> > This allows to get rid of conditional mapping choices and allows to have
> > preemptible short term mappings on 64bit which are today enforced to be
> > non-preemptible due to the highmem constraints. It clearly puts overhead on
> > the highmem users, but highmem is slow anyway.
> >
> > This is not a wholesale conversion which makes kmap_atomic magically
> > preemptible because there might be usage sites which rely on the implicit
> > preempt disable. So this needs to be done on a case by case basis and the
> > call sites converted to kmap_temporary.
> >
> > Note, that this is only lightly tested on X86 and completely untested on
> > all other architectures.
> >
> > The lot is also available from
> >
> >    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git highmem
>
> I think it should be the case, but I want to double check: Will
> copy_*_user be allowed within a kmap_temporary section? This would
> allow us to ditch an absolute pile of slowpaths.

(coffee just kicked in) copy_*_user is ofc allowed, but if you hit a
page fault you get a short read/write. This looks like it would remove
the need to handle these in a slowpath, since page faults can now be
served in this new kmap_temporary sections. But this sounds too good
to be true, so I'm wondering what I'm missing.
-Daniel
>
> >
> > Thanks,
> >
> >         tglx
> > ---
> >  a/arch/arm/mm/highmem.c               |  121 ---------------------
> >  a/arch/microblaze/mm/highmem.c        |   78 -------------
> >  a/arch/nds32/mm/highmem.c             |   48 --------
> >  a/arch/powerpc/mm/highmem.c           |   67 -----------
> >  a/arch/sparc/mm/highmem.c             |  115 --------------------
> >  arch/arc/Kconfig                      |    1
> >  arch/arc/include/asm/highmem.h        |    8 +
> >  arch/arc/mm/highmem.c                 |   44 -------
> >  arch/arm/Kconfig                      |    1
> >  arch/arm/include/asm/highmem.h        |   30 +++--
> >  arch/arm/mm/Makefile                  |    1
> >  arch/csky/Kconfig                     |    1
> >  arch/csky/include/asm/highmem.h       |    4
> >  arch/csky/mm/highmem.c                |   75 -------------
> >  arch/microblaze/Kconfig               |    1
> >  arch/microblaze/include/asm/highmem.h |    6 -
> >  arch/microblaze/mm/Makefile           |    1
> >  arch/microblaze/mm/init.c             |    6 -
> >  arch/mips/Kconfig                     |    1
> >  arch/mips/include/asm/highmem.h       |    4
> >  arch/mips/mm/highmem.c                |   77 -------------
> >  arch/mips/mm/init.c                   |    3
> >  arch/nds32/Kconfig.cpu                |    1
> >  arch/nds32/include/asm/highmem.h      |   21 ++-
> >  arch/nds32/mm/Makefile                |    1
> >  arch/powerpc/Kconfig                  |    1
> >  arch/powerpc/include/asm/highmem.h    |    6 -
> >  arch/powerpc/mm/Makefile              |    1
> >  arch/powerpc/mm/mem.c                 |    7 -
> >  arch/sparc/Kconfig                    |    1
> >  arch/sparc/include/asm/highmem.h      |    7 -
> >  arch/sparc/mm/Makefile                |    3
> >  arch/sparc/mm/srmmu.c                 |    2
> >  arch/x86/include/asm/fixmap.h         |    1
> >  arch/x86/include/asm/highmem.h        |   12 +-
> >  arch/x86/include/asm/iomap.h          |   29 +++--
> >  arch/x86/mm/highmem_32.c              |   59 ----------
> >  arch/x86/mm/init_32.c                 |   15 --
> >  arch/x86/mm/iomap_32.c                |   57 ----------
> >  arch/xtensa/Kconfig                   |    1
> >  arch/xtensa/include/asm/highmem.h     |    9 +
> >  arch/xtensa/mm/highmem.c              |   44 -------
> >  b/arch/x86/Kconfig                    |    3
> >  include/linux/highmem.h               |  141 +++++++++++++++---------
> >  include/linux/io-mapping.h            |    2
> >  include/linux/sched.h                 |    9 +
> >  kernel/sched/core.c                   |   10 +
> >  mm/Kconfig                            |    3
> >  mm/highmem.c                          |  192 ++++++++++++++++++++++++++++++++--
> >  49 files changed, 422 insertions(+), 909 deletions(-)
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-19  9:17 [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Thomas Gleixner
                   ` (15 preceding siblings ...)
  2020-09-19 10:35 ` [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Daniel Vetter
@ 2020-09-19 17:18 ` Linus Torvalds
  2020-09-19 17:39   ` Matthew Wilcox
  2020-09-20  6:41   ` Thomas Gleixner
  16 siblings, 2 replies; 54+ messages in thread
From: Linus Torvalds @ 2020-09-19 17:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, linux-arch, Paul McKenney, the arch/x86 maintainers,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Sat, Sep 19, 2020 at 2:50 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> this provides a preemptible variant of kmap_atomic & related
> interfaces. This is achieved by:

Ack. This looks really nice, even apart from the new capability.

The only thing I really reacted to is that the name doesn't make sense
to me: "kmap_temporary()" seems a bit odd.

Particularly for an interface that really is basically meant as a
better replacement of "kmap_atomic()" (but is perhaps also a better
replacement for "kmap()").

I think I understand how the name came about: I think the "temporary"
is there as a distinction from the "longterm" regular kmap(). So I
think it makes some sense from an internal implementation angle, but I
don't think it makes a lot of sense from an interface name.

I don't know what might be a better name, but if we want to emphasize
that it's thread-private and a one-off, maybe "local" would be a
better naming, and make it distinct from the "global" nature of the
old kmap() interface?

However, another solution might be to just use this new preemptible
"local" kmap(), and remove the old global one entirely. Yes, the old
global one caches the page table mapping and that sounds really
efficient and nice. But it's actually horribly horribly bad, because
it means that we need to use locking for them. Your new "temporary"
implementation seems to be fundamentally better locking-wise, and only
need preemption disabling as locking (and is equally fast for the
non-highmem case).

So I wonder if the single-page TLB flush isn't a better model, and
whether it wouldn't be a lot simpler to just get rid of the old
complex kmap() entirely, and replace it with this?

I agree we can't replace the kmap_atomic() version, because maybe
people depend on the preemption disabling it also implied. But what
about replacing the non-atomic kmap()?

Maybe I've missed something.  Is it because the new interface still
does "pagefault_disable()" perhaps?

But does it even need the pagefault_disable() at all? Yes, the
*atomic* one obviously needed it. But why does this new one need to
disable page faults?

[ I'm just reading the patches, I didn't try to apply them and look at
the end result, so I might have missed something ]

The only other worry I would have is just test coverage of this
change. I suspect very few developers use HIGHMEM. And I know the
various test robots don't tend to test 32-bit either.

But apart from that question about naming (and perhaps replacing
kmap() entirely), I very much like it.

                        Linus

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-19 17:18 ` Linus Torvalds
@ 2020-09-19 17:39   ` Matthew Wilcox
  2020-09-19 19:13     ` Linus Torvalds
  2020-09-21 19:58     ` Ira Weiny
  2020-09-20  6:41   ` Thomas Gleixner
  1 sibling, 2 replies; 54+ messages in thread
From: Matthew Wilcox @ 2020-09-19 17:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, LKML, linux-arch, Paul McKenney,
	the arch/x86 maintainers, Sebastian Andrzej Siewior,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Will Deacon, Andrew Morton, Linux-MM,
	Russell King, Linux ARM, Chris Zankel, Max Filippov,
	linux-xtensa, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Sat, Sep 19, 2020 at 10:18:54AM -0700, Linus Torvalds wrote:
> On Sat, Sep 19, 2020 at 2:50 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > this provides a preemptible variant of kmap_atomic & related
> > interfaces. This is achieved by:
> 
> Ack. This looks really nice, even apart from the new capability.
> 
> The only thing I really reacted to is that the name doesn't make sense
> to me: "kmap_temporary()" seems a bit odd.
> 
> Particularly for an interface that really is basically meant as a
> better replacement of "kmap_atomic()" (but is perhaps also a better
> replacement for "kmap()").
> 
> I think I understand how the name came about: I think the "temporary"
> is there as a distinction from the "longterm" regular kmap(). So I
> think it makes some sense from an internal implementation angle, but I
> don't think it makes a lot of sense from an interface name.
> 
> I don't know what might be a better name, but if we want to emphasize
> that it's thread-private and a one-off, maybe "local" would be a
> better naming, and make it distinct from the "global" nature of the
> old kmap() interface?
> 
> However, another solution might be to just use this new preemptible
> "local" kmap(), and remove the old global one entirely. Yes, the old
> global one caches the page table mapping and that sounds really
> efficient and nice. But it's actually horribly horribly bad, because
> it means that we need to use locking for them. Your new "temporary"
> implementation seems to be fundamentally better locking-wise, and only
> need preemption disabling as locking (and is equally fast for the
> non-highmem case).
> 
> So I wonder if the single-page TLB flush isn't a better model, and
> whether it wouldn't be a lot simpler to just get rid of the old
> complex kmap() entirely, and replace it with this?
> 
> I agree we can't replace the kmap_atomic() version, because maybe
> people depend on the preemption disabling it also implied. But what
> about replacing the non-atomic kmap()?

My concern with that is people might use kmap() and then pass the address
to a different task.  So we need to audit the current users of kmap()
and convert any that do that into using vmap() instead.

I like kmap_local().  Or kmap_thread().

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-19 17:39   ` Matthew Wilcox
@ 2020-09-19 19:13     ` Linus Torvalds
  2020-09-21 19:58     ` Ira Weiny
  1 sibling, 0 replies; 54+ messages in thread
From: Linus Torvalds @ 2020-09-19 19:13 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Thomas Gleixner, LKML, linux-arch, Paul McKenney,
	the arch/x86 maintainers, Sebastian Andrzej Siewior,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Will Deacon, Andrew Morton, Linux-MM,
	Russell King, Linux ARM, Chris Zankel, Max Filippov,
	linux-xtensa, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Sat, Sep 19, 2020 at 10:39 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> My concern with that is people might use kmap() and then pass the address
> to a different task.  So we need to audit the current users of kmap()
> and convert any that do that into using vmap() instead.

Ahh. Yes, I guess they might do that. It sounds strange, but not
entirely crazy - I could imagine some "PIO thread" that does IO to a
page that has been set up by somebody else using kmap(). Or similar.

                Linus

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-19 10:37   ` Daniel Vetter
@ 2020-09-20  6:23     ` Thomas Gleixner
  2020-09-20  8:23       ` Daniel Vetter
  0 siblings, 1 reply; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-20  6:23 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: LKML, open list:GENERIC INCLUDE/A...,
	Linus Torvalds, Paul McKenney, X86 ML, Sebastian Andrzej Siewior,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Will Deacon, Andrew Morton, Linux-MM,
	Russell King, Linux ARM, Chris Zankel, Max Filippov,
	linux-xtensa, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, intel-gfx, dri-devel, Ard Biesheuvel, Herbert Xu,
	Vineet Gupta, arcml, Arnd Bergmann, Guo Ren, linux-csky,
	Michal Simek, Thomas Bogendoerfer, linux-mips, Nick Hu,
	Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, sparclinux

On Sat, Sep 19 2020 at 12:37, Daniel Vetter wrote:
> On Sat, Sep 19, 2020 at 12:35 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>> I think it should be the case, but I want to double check: Will
>> copy_*_user be allowed within a kmap_temporary section? This would
>> allow us to ditch an absolute pile of slowpaths.
>
> (coffee just kicked in) copy_*_user is ofc allowed, but if you hit a
> page fault you get a short read/write. This looks like it would remove
> the need to handle these in a slowpath, since page faults can now be
> served in this new kmap_temporary sections. But this sounds too good
> to be true, so I'm wondering what I'm missing.

In principle we could allow pagefaults, but not with the currently
proposed interface which can be called from any context. Obviously if
called from atomic context it can't handle user page faults.

In theory we could make a variant which does not disable pagefaults, but
that's what kmap() already provides.

Thanks,

        tglx




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-19 17:18 ` Linus Torvalds
  2020-09-19 17:39   ` Matthew Wilcox
@ 2020-09-20  6:41   ` Thomas Gleixner
  2020-09-20  8:49     ` Thomas Gleixner
  1 sibling, 1 reply; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-20  6:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, linux-arch, Paul McKenney, the arch/x86 maintainers,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Sat, Sep 19 2020 at 10:18, Linus Torvalds wrote:
> On Sat, Sep 19, 2020 at 2:50 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> this provides a preemptible variant of kmap_atomic & related
>> interfaces. This is achieved by:
>
> Ack. This looks really nice, even apart from the new capability.
>
> The only thing I really reacted to is that the name doesn't make sense
> to me: "kmap_temporary()" seems a bit odd.

Yeah. Couldn't come up with something useful.

> Particularly for an interface that really is basically meant as a
> better replacement of "kmap_atomic()" (but is perhaps also a better
> replacement for "kmap()").
>
> I think I understand how the name came about: I think the "temporary"
> is there as a distinction from the "longterm" regular kmap(). So I
> think it makes some sense from an internal implementation angle, but I
> don't think it makes a lot of sense from an interface name.
>
> I don't know what might be a better name, but if we want to emphasize
> that it's thread-private and a one-off, maybe "local" would be a
> better naming, and make it distinct from the "global" nature of the
> old kmap() interface?

Right, _local or _thread would be more intuitive.

> However, another solution might be to just use this new preemptible
> "local" kmap(), and remove the old global one entirely. Yes, the old
> global one caches the page table mapping and that sounds really
> efficient and nice. But it's actually horribly horribly bad, because
> it means that we need to use locking for them. Your new "temporary"
> implementation seems to be fundamentally better locking-wise, and only
> need preemption disabling as locking (and is equally fast for the
> non-highmem case).
>
> So I wonder if the single-page TLB flush isn't a better model, and
> whether it wouldn't be a lot simpler to just get rid of the old
> complex kmap() entirely, and replace it with this?
>
> I agree we can't replace the kmap_atomic() version, because maybe
> people depend on the preemption disabling it also implied. But what
> about replacing the non-atomic kmap()?
>
> Maybe I've missed something.  Is it because the new interface still
> does "pagefault_disable()" perhaps?
>
> But does it even need the pagefault_disable() at all? Yes, the
> *atomic* one obviously needed it. But why does this new one need to
> disable page faults?

It disables pagefaults because it can be called from atomic and
non-atomic context. That was the point to get rid of

         if (preeemptible())
         	kmap();
         else
                kmap_atomic();

If it does not disable pagefaults, then it's just a lightweight variant
of kmap() for short lived mappings.

> But apart from that question about naming (and perhaps replacing
> kmap() entirely), I very much like it.

I thought about it, but then I figured that kmap pointers can be
handed to other contexts from the thread which sets up the mapping
because it's 'permanent'.

I'm not sure whether that actually happens, so we'd need to audit all
kmap() users to be sure. If there is no such use case, then we surely
can get of rid of kmap() completely. It's only 300+ instances to stare
at and quite some of them are wrapped into other functions.

Highmem sucks no matter what and the only sane solution is to remove it
completely.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-20  6:23     ` Thomas Gleixner
@ 2020-09-20  8:23       ` Daniel Vetter
  2020-09-20 17:24         ` Thomas Gleixner
  0 siblings, 1 reply; 54+ messages in thread
From: Daniel Vetter @ 2020-09-20  8:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Daniel Vetter, LKML, open list:GENERIC INCLUDE/A...,
	Linus Torvalds, Paul McKenney, X86 ML, Sebastian Andrzej Siewior,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Will Deacon, Andrew Morton, Linux-MM,
	Russell King, Linux ARM, Chris Zankel, Max Filippov,
	linux-xtensa, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, intel-gfx, dri-devel, Ard Biesheuvel, Herbert Xu,
	Vineet Gupta, arcml, Arnd Bergmann, Guo Ren, linux-csky,
	Michal Simek, Thomas Bogendoerfer, linux-mips, Nick Hu,
	Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, sparclinux

On Sun, Sep 20, 2020 at 08:23:26AM +0200, Thomas Gleixner wrote:
> On Sat, Sep 19 2020 at 12:37, Daniel Vetter wrote:
> > On Sat, Sep 19, 2020 at 12:35 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >> I think it should be the case, but I want to double check: Will
> >> copy_*_user be allowed within a kmap_temporary section? This would
> >> allow us to ditch an absolute pile of slowpaths.
> >
> > (coffee just kicked in) copy_*_user is ofc allowed, but if you hit a
> > page fault you get a short read/write. This looks like it would remove
> > the need to handle these in a slowpath, since page faults can now be
> > served in this new kmap_temporary sections. But this sounds too good
> > to be true, so I'm wondering what I'm missing.
> 
> In principle we could allow pagefaults, but not with the currently
> proposed interface which can be called from any context. Obviously if
> called from atomic context it can't handle user page faults.
 
Yeah that's clear, but does the implemention need to disable pagefaults
unconditionally?

> In theory we could make a variant which does not disable pagefaults, but
> that's what kmap() already provides.

Currently we have a bunch of code which roughly does

	kmap_atomic();
	copy_*_user();
	kunmap_atomic();

	if (short_copy_user) {
		kmap();
		copy_*_user(remaining_stuff);
		kunmap();
	}

And the only reason is that kmap is a notch slower, hence the fastpath. If
we get a kmap which is fast and allows pagefaults (only in contexts that
allow pagefaults already ofc) then we can ditch a pile of kmap users.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-20  6:41   ` Thomas Gleixner
@ 2020-09-20  8:49     ` Thomas Gleixner
  2020-09-20 16:57       ` Linus Torvalds
  0 siblings, 1 reply; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-20  8:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, linux-arch, Paul McKenney, the arch/x86 maintainers,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Sun, Sep 20 2020 at 08:41, Thomas Gleixner wrote:
> On Sat, Sep 19 2020 at 10:18, Linus Torvalds wrote:
>> Maybe I've missed something.  Is it because the new interface still
>> does "pagefault_disable()" perhaps?
>>
>> But does it even need the pagefault_disable() at all? Yes, the
>> *atomic* one obviously needed it. But why does this new one need to
>> disable page faults?
>
> It disables pagefaults because it can be called from atomic and
> non-atomic context. That was the point to get rid of
>
>          if (preeemptible())
>          	kmap();
>          else
>                 kmap_atomic();
>
> If it does not disable pagefaults, then it's just a lightweight variant
> of kmap() for short lived mappings.

Actually most usage sites of kmap atomic do not need page faults to be
disabled at all. As Daniel pointed out the implicit pagefault disable
enforces copy_from_user_inatomic() even in places which are clearly
preemptible task context.

As we need to look at each instance anyway we can add the PF disable
explicitely to the very few places which actually need it.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-20  8:49     ` Thomas Gleixner
@ 2020-09-20 16:57       ` Linus Torvalds
  2020-09-20 17:40         ` Thomas Gleixner
  0 siblings, 1 reply; 54+ messages in thread
From: Linus Torvalds @ 2020-09-20 16:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, linux-arch, Paul McKenney, the arch/x86 maintainers,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Sun, Sep 20, 2020 at 1:49 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Actually most usage sites of kmap atomic do not need page faults to be
> disabled at all.

Right. I think the pagefault disabling has (almost) nothing at all to
do with the kmap() itself - it comes from the "atomic" part, not the
"kmap" part.

I say *almost*, because there is one issue that needs some thought:
the amount of kmap nesting.

The kmap_atomic() interface - and your local/temporary/whatever
versions of it - depends very much inherently on being strictly
nesting. In fact, it depends so much on it that maybe that should be
part of the new name?

It's very wrong to do

    addr1 = kmap_atomic();
    addr2 = kmap_atomic();
    ..do something with addr 1..
    kunmap_atomic(addr1);
    .. do something with addr 2..
    kunmap_atomic(addr2);

because the way we allocate the slots is by using a percpu-atomic
inc-return (and we deallocate using dec).

So it's fundamentally a stack.

And that's perfectly fine for page faults - if they do any kmaps,
those will obviously nest.

So the only issue with page faults might be that the stack grows
_larger_. And that might need some thought. We already make the kmap
stack bigger for CONFIG_DEBUG_HIGHMEM, and it's possibly that if we
allow page faults we need to make the kmap stack bigger still.

Btw, looking at the stack code, Ithink your new implementation of it
is a bit scary:

   static inline int kmap_atomic_idx_push(void)
   {
  -       int idx = __this_cpu_inc_return(__kmap_atomic_idx) - 1;
  +       int idx = current->kmap_ctrl.idx++;

and now that 'current->kmap_ctrl.idx' is not atomic wrt

 (a) NMI's (this may be ok, maybe we never do kmaps in NMIs, and with
nesting I think it's fine anyway - the NMI will undo whatever it did)

 (b) the prev/next switch

And that (b) part worries me. You do the kmap_switch_temporary() to
switch the entries, but you do that *separately* from actually
switching 'current' to the new value.

So kmap_switch_temporary() looks safe, but I don't think it actually
is. Because while it first unmaps the old entries and then remaps the
new ones, an interrupt can come in, and at that point it matters what
is *CURRENT*.

And regardless of whether 'current' is 'prev' or 'next', that
kmap_switch_temporary() loop may be doing the wrong thing, depending
on which one had the deeper stack. The interrupt will be using
whatever "current->kmap_ctrl.idx" is, but that might overwrite entries
that are in the process of being restored (if current is still 'prev',
but kmap_switch_temporary() is in the "restore @next's kmaps" pgase),
or it might stomp on entries that have been pte_clear()'ed by the
'prev' thing.

I dunno. The latter may be one of those "it works anyway, it
overwrites things we don't care about", but the former will most
definitely not work.

And it will be completely impossible to debug, because it will depend
on an interrupt that uses kmap_local/atomic/whatever() coming in
_just_ at the right point in the scheduler, and only when the
scheduler has been entered with the right number of kmap entries on
the prev/next stack.

And no developer will ever see this with any amount of debug code
enabled, because it will only hit on legacy platforms that do this
kmap anyway.

So honestly, that code scares me. I think it's buggy. And even if it
"happens to work", it does so for all the wrong reasons, and is very
fragile.

So I would suggest:

 - continue to use an actual per-cpu kmap_atomic_idx

 - make the switching code save the old idx, then unmap the old
entries one by one (while doing the proper "pop" action), and then map
the new entries one by one (while doing the proper "push" action).

which would mean that the only index that is actually ever *USED* is
the percpu one, and it's always up-to-date and pushed/popped for
individual entries, rather than this - imho completely bogus -
optimization where you use "p->kmap_ctrl.idx" directly and very very
unsafely.

Alternatively, that process counter would need about a hundred lines
of commentary about exactly why it's safe. Because I don't think it
is.

                   Linus

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-20  8:23       ` Daniel Vetter
@ 2020-09-20 17:24         ` Thomas Gleixner
  0 siblings, 0 replies; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-20 17:24 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Daniel Vetter, LKML, open list:GENERIC INCLUDE/A...,
	Linus Torvalds, Paul McKenney, X86 ML, Sebastian Andrzej Siewior,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Will Deacon, Andrew Morton, Linux-MM,
	Russell King, Linux ARM, Chris Zankel, Max Filippov,
	linux-xtensa, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, intel-gfx, dri-devel, Ard Biesheuvel, Herbert Xu,
	Vineet Gupta, arcml, Arnd Bergmann, Guo Ren, linux-csky,
	Michal Simek, Thomas Bogendoerfer, linux-mips, Nick Hu,
	Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, sparclinux

On Sun, Sep 20 2020 at 10:23, Daniel Vetter wrote:

> On Sun, Sep 20, 2020 at 08:23:26AM +0200, Thomas Gleixner wrote:
>> On Sat, Sep 19 2020 at 12:37, Daniel Vetter wrote:
>> > On Sat, Sep 19, 2020 at 12:35 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>> >> I think it should be the case, but I want to double check: Will
>> >> copy_*_user be allowed within a kmap_temporary section? This would
>> >> allow us to ditch an absolute pile of slowpaths.
>> >
>> > (coffee just kicked in) copy_*_user is ofc allowed, but if you hit a
>> > page fault you get a short read/write. This looks like it would remove
>> > the need to handle these in a slowpath, since page faults can now be
>> > served in this new kmap_temporary sections. But this sounds too good
>> > to be true, so I'm wondering what I'm missing.
>> 
>> In principle we could allow pagefaults, but not with the currently
>> proposed interface which can be called from any context. Obviously if
>> called from atomic context it can't handle user page faults.
>  
> Yeah that's clear, but does the implemention need to disable pagefaults
> unconditionally?

As I wrote in the other reply it's not required and the final interface
will neither disable preemption nor pagefaults (except for the atomic
wrapper around it).

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-20 16:57       ` Linus Torvalds
@ 2020-09-20 17:40         ` Thomas Gleixner
  2020-09-20 17:42           ` Linus Torvalds
  0 siblings, 1 reply; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-20 17:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, linux-arch, Paul McKenney, the arch/x86 maintainers,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Sun, Sep 20 2020 at 09:57, Linus Torvalds wrote:
> On Sun, Sep 20, 2020 at 1:49 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> Btw, looking at the stack code, Ithink your new implementation of it
> is a bit scary:
>
>    static inline int kmap_atomic_idx_push(void)
>    {
>   -       int idx = __this_cpu_inc_return(__kmap_atomic_idx) - 1;
>   +       int idx = current->kmap_ctrl.idx++;
>
> and now that 'current->kmap_ctrl.idx' is not atomic wrt
>
>  (a) NMI's (this may be ok, maybe we never do kmaps in NMIs, and with
> nesting I think it's fine anyway - the NMI will undo whatever it did)

Right. Nesting should be a non issue, but I don't think we have
kmap_atomic() in NMI context.

>  (b) the prev/next switch
>
> And that (b) part worries me. You do the kmap_switch_temporary() to
> switch the entries, but you do that *separately* from actually
> switching 'current' to the new value.
>
> So kmap_switch_temporary() looks safe, but I don't think it actually
> is. Because while it first unmaps the old entries and then remaps the
> new ones, an interrupt can come in, and at that point it matters what
> is *CURRENT*.
>
> And regardless of whether 'current' is 'prev' or 'next', that
> kmap_switch_temporary() loop may be doing the wrong thing, depending
> on which one had the deeper stack. The interrupt will be using
> whatever "current->kmap_ctrl.idx" is, but that might overwrite entries
> that are in the process of being restored (if current is still 'prev',
> but kmap_switch_temporary() is in the "restore @next's kmaps" pgase),
> or it might stomp on entries that have been pte_clear()'ed by the
> 'prev' thing.

Duh yes. Never thought about that.

> Alternatively, that process counter would need about a hundred lines
> of commentary about exactly why it's safe. Because I don't think it
> is.

I think the more obvious solution is to split the whole exercise:


  schedule()
     prepare_switch()
        unmap()

    switch_to()

    finish_switch()
        map()

That's safe because neither the unmap() nor the map() code changes
kmap_ctrl.idx. So if there is an interrupt coming in between unmap() and
switch_to() then a kmap_local() there will use the next entry. So we
could even do the unmap() with interrupts enabled (preemption disabled).
Same for the map() part.

To explain that we need only a few lines of commentry methinks.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-20 17:40         ` Thomas Gleixner
@ 2020-09-20 17:42           ` Linus Torvalds
  2020-09-20 17:58             ` Linus Torvalds
  2020-09-21  7:39             ` Thomas Gleixner
  0 siblings, 2 replies; 54+ messages in thread
From: Linus Torvalds @ 2020-09-20 17:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, linux-arch, Paul McKenney, the arch/x86 maintainers,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Sun, Sep 20, 2020 at 10:40 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> I think the more obvious solution is to split the whole exercise:
>
>   schedule()
>      prepare_switch()
>         unmap()
>
>     switch_to()
>
>     finish_switch()
>         map()

Yeah, that looks much easier to explain. Ack.

               Linus

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-20 17:42           ` Linus Torvalds
@ 2020-09-20 17:58             ` Linus Torvalds
  2020-09-21  7:39             ` Thomas Gleixner
  1 sibling, 0 replies; 54+ messages in thread
From: Linus Torvalds @ 2020-09-20 17:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, linux-arch, Paul McKenney, the arch/x86 maintainers,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Sun, Sep 20, 2020 at 10:42 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Yeah, that looks much easier to explain. Ack.

Btw, one thing that might be a good idea at least initially is to add
a check for p->kmap_ctrl.idx being zero at fork, exit and maybe
syscall return time (but that last one may be too cumbersome to really
worry about).

The kmap_atomic() interface basically has a lot of coverage for leaked
as part of all the "might_sleep()" checks sprinkled around,  The new
kmap_temporary/local/whatever wouldn't have that kind of incidental
debug checking, and any leaked kmap indexes would be rather hard to
debug (much) later when they cause index overflows or whatever.

                Linus

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 01/15] mm/highmem: Un-EXPORT __kmap_atomic_idx()
  2020-09-19  9:17 ` [patch RFC 01/15] mm/highmem: Un-EXPORT __kmap_atomic_idx() Thomas Gleixner
@ 2020-09-21  6:23   ` Christoph Hellwig
  0 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2020-09-21  6:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, linux-arch, Linus Torvalds, Paul McKenney, x86,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta, linux-snps-arc,
	Arnd Bergmann, Guo Ren, linux-csky, Michal Simek,
	Thomas Bogendoerfer, linux-mips, Nick Hu, Greentime Hu,
	Vincent Chen, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, David S. Miller, sparclinux

On Sat, Sep 19, 2020 at 11:17:52AM +0200, Thomas Gleixner wrote:
> Nothing in modules can use that.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 02/15] highmem: Provide generic variant of kmap_atomic*
  2020-09-19  9:17 ` [patch RFC 02/15] highmem: Provide generic variant of kmap_atomic* Thomas Gleixner
@ 2020-09-21  6:28   ` Christoph Hellwig
  0 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2020-09-21  6:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, linux-arch, Linus Torvalds, Paul McKenney, x86,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta, linux-snps-arc,
	Arnd Bergmann, Guo Ren, linux-csky, Michal Simek,
	Thomas Bogendoerfer, linux-mips, Nick Hu, Greentime Hu,
	Vincent Chen, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, David S. Miller, sparclinux

> +# ifndef ARCH_NEEDS_KMAP_HIGH_GET
> +static inline void *arch_kmap_temporary_high_get(struct page *page)
> +{
> +	return NULL;
> +}
> +# endif

Turn this into a macro and use #ifndef on the symbol name?

> +static inline void __kunmap_atomic(void *addr)
> +{
> +	kumap_atomic_indexed(addr);
> +}
> +
> +
> +#endif /* CONFIG_KMAP_ATOMIC_GENERIC */

Stange double empty line above the endif.

> -#define kunmap_atomic(addr)                                     \
> -do {                                                            \
> -	BUILD_BUG_ON(__same_type((addr), struct page *));       \
> -	kunmap_atomic_high(addr);                                  \
> -	pagefault_enable();                                     \
> -	preempt_enable();                                       \
> -} while (0)
> -
> +#define kunmap_atomic(addr)						\
> +	do {								\
> +		BUILD_BUG_ON(__same_type((addr), struct page *));	\
> +		__kunmap_atomic(addr);					\
> +		preempt_enable();					\
> +	} while (0)

Why the strange re-indent to a form that is much less common and less
readable?

> +void *kmap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot)
> +{
> +	pagefault_disable();
> +	return __kmap_atomic_pfn_prot(pfn, prot);
> +}
> +EXPORT_SYMBOL(kmap_atomic_pfn_prot);

The existing kmap_atomic_pfn & co implementation is EXPORT_SYMBOL_GPL,
and this stuff should preferably stay that way.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-20 17:42           ` Linus Torvalds
  2020-09-20 17:58             ` Linus Torvalds
@ 2020-09-21  7:39             ` Thomas Gleixner
  2020-09-21 16:24               ` Linus Torvalds
  1 sibling, 1 reply; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-21  7:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, linux-arch, Paul McKenney, the arch/x86 maintainers,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Sun, Sep 20 2020 at 10:42, Linus Torvalds wrote:
> On Sun, Sep 20, 2020 at 10:40 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> I think the more obvious solution is to split the whole exercise:
>>
>>   schedule()
>>      prepare_switch()
>>         unmap()
>>
>>     switch_to()
>>
>>     finish_switch()
>>         map()
>
> Yeah, that looks much easier to explain. Ack.

So far so good, but Peter Z. just pointed out to me that I completely
missed the fact that this cannot work.

If a task is migrated to a different CPU then the mapping address will
change which will explode in colourful ways.

On RT kernels this works because we ping the task to the CPU via
migrate_disable(). On a !RT kernel migrate_disable() maps to
preempt_disable() which brings us back to square one.

/me goes back to the drawing board.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-21  7:39             ` Thomas Gleixner
@ 2020-09-21 16:24               ` Linus Torvalds
  2020-09-21 19:27                 ` Thomas Gleixner
  0 siblings, 1 reply; 54+ messages in thread
From: Linus Torvalds @ 2020-09-21 16:24 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, linux-arch, Paul McKenney, the arch/x86 maintainers,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Mon, Sep 21, 2020 at 12:39 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> If a task is migrated to a different CPU then the mapping address will
> change which will explode in colourful ways.

Heh.

Right you are.

Maybe we really *could* call this new kmap functionality something
like "kmap_percpu()" (or maybe "local" is good enough), and make it
act like your RT code does for spinlocks - not disable preemption, but
only disabling CPU migration.

That would probably be good enough for a lot of users that don't want
to expose excessive latencies, but where it's really not a huge deal
to say "stick to this CPU for a short while".

The crypto code certainly sounds like one such case.

             Linus

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-21 16:24               ` Linus Torvalds
@ 2020-09-21 19:27                 ` Thomas Gleixner
  2020-09-23  8:40                   ` peterz
                                     ` (2 more replies)
  0 siblings, 3 replies; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-21 19:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, linux-arch, Paul McKenney, the arch/x86 maintainers,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Mon, Sep 21 2020 at 09:24, Linus Torvalds wrote:
> On Mon, Sep 21, 2020 at 12:39 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> If a task is migrated to a different CPU then the mapping address will
>> change which will explode in colourful ways.
>
> Right you are.
>
> Maybe we really *could* call this new kmap functionality something
> like "kmap_percpu()" (or maybe "local" is good enough), and make it
> act like your RT code does for spinlocks - not disable preemption, but
> only disabling CPU migration.

I"m all for it, but the scheduler people have opinions :)

> That would probably be good enough for a lot of users that don't want
> to expose excessive latencies, but where it's really not a huge deal
> to say "stick to this CPU for a short while".
>
> The crypto code certainly sounds like one such case.

I looked at a lot of the kmap_atomic() places and quite some of them
only require migration to be disabled to keep the temporary map
stable.

Quite some code could be simplified significantly especially those
places which need to do copy_from/to_user inside these
sections. Graphics is the main example here as Daniel pointed out.

Alternatively this could of course be solved with per CPU page tables
which will come around some day anyway I fear.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-19 17:39   ` Matthew Wilcox
  2020-09-19 19:13     ` Linus Torvalds
@ 2020-09-21 19:58     ` Ira Weiny
  1 sibling, 0 replies; 54+ messages in thread
From: Ira Weiny @ 2020-09-21 19:58 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, Thomas Gleixner, LKML, linux-arch, Paul McKenney,
	the arch/x86 maintainers, Sebastian Andrzej Siewior,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Will Deacon, Andrew Morton, Linux-MM,
	Russell King, Linux ARM, Chris Zankel, Max Filippov,
	linux-xtensa, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Sat, Sep 19, 2020 at 06:39:06PM +0100, Matthew Wilcox wrote:
> On Sat, Sep 19, 2020 at 10:18:54AM -0700, Linus Torvalds wrote:
> > On Sat, Sep 19, 2020 at 2:50 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > this provides a preemptible variant of kmap_atomic & related
> > > interfaces. This is achieved by:
> > 
> > Ack. This looks really nice, even apart from the new capability.
> > 
> > The only thing I really reacted to is that the name doesn't make sense
> > to me: "kmap_temporary()" seems a bit odd.
> > 
> > Particularly for an interface that really is basically meant as a
> > better replacement of "kmap_atomic()" (but is perhaps also a better
> > replacement for "kmap()").
> > 
> > I think I understand how the name came about: I think the "temporary"
> > is there as a distinction from the "longterm" regular kmap(). So I
> > think it makes some sense from an internal implementation angle, but I
> > don't think it makes a lot of sense from an interface name.
> > 
> > I don't know what might be a better name, but if we want to emphasize
> > that it's thread-private and a one-off, maybe "local" would be a
> > better naming, and make it distinct from the "global" nature of the
> > old kmap() interface?
> > 
> > However, another solution might be to just use this new preemptible
> > "local" kmap(), and remove the old global one entirely. Yes, the old
> > global one caches the page table mapping and that sounds really
> > efficient and nice. But it's actually horribly horribly bad, because
> > it means that we need to use locking for them. Your new "temporary"
> > implementation seems to be fundamentally better locking-wise, and only
> > need preemption disabling as locking (and is equally fast for the
> > non-highmem case).
> > 
> > So I wonder if the single-page TLB flush isn't a better model, and
> > whether it wouldn't be a lot simpler to just get rid of the old
> > complex kmap() entirely, and replace it with this?
> > 
> > I agree we can't replace the kmap_atomic() version, because maybe
> > people depend on the preemption disabling it also implied. But what
> > about replacing the non-atomic kmap()?
> 
> My concern with that is people might use kmap() and then pass the address
> to a different task.  So we need to audit the current users of kmap()
> and convert any that do that into using vmap() instead.
> 

I've done some of this work.[3]  PKS and pmem stray write protection[2] depend
on kmap to enable the correct PKS settings.  After working through the
exception handling we realized that some users of kmap() seem to be doing just
this; passing the address to a different task.

From what I have found ~90% of kmap() callers are 'kmap_thread()' and the other
~10% are kmap().[3]  But of those 10% I'm not familiar with the code enough to
know if they really require a 'global' map.  What I do know is they save an
address which appears to be used in other threads.  But I could be wrong.

For PKS I added a 'global' implementation which could then be called by kmap()
and added a new kmap_thread() call which used the original 'local' version of
the PKS interface.  The PKS work is still being reviewed internally for the TIP
core code.  But I've pushed it all to git hub for purposes of this
discussion.[1]

> I like kmap_local().  Or kmap_thread().

I chose kmap_thread() so that makes sense to me.  I also thought about using
kmap_global() as an alternative interface which would change just ~10% of the
callers and make the series much smaller.  But internal discussions lead me to
chose kmap_thread() as the new interface so that we don't change the semantics
of kmap().

Ira


[1] https://github.com/weiny2/linux-kernel/tree/lm-pks-pmem-for-5.10-v3

[2] https://lore.kernel.org/lkml/20200717072056.73134-1-ira.weiny@intel.com/

[3]
12:42:06 > git grep ' kmap(' *.c | grep -v '* ' | wc -l
22

12:43:32 > git grep ' kmap_thread(' *.c | grep -v '* ' | wc -l
204

Here are the callers which hand an address to another thread.

12:45:25 > git grep ' kmap(' *.c | grep -v '* '
arch/x86/mm/dump_pagetables.c:  [PKMAP_BASE_NR]         = { 0UL, "Persistent kmap() Area" },
drivers/firewire/net.c:         ptr = kmap(dev->broadcast_rcv_buffer.pages[u]);
drivers/gpu/drm/i915/gem/i915_gem_pages.c:              return kmap(sg_page(sgt->sgl));
drivers/gpu/drm/i915/selftests/i915_perf.c:     scratch = kmap(ce->vm->scratch[0].base.page);
drivers/gpu/drm/ttm/ttm_bo_util.c:              map->virtual = kmap(map->page);
drivers/infiniband/hw/qib/qib_user_sdma.c:      mpage = kmap(page);
drivers/misc/vmw_vmci/vmci_host.c:      context->notify = kmap(context->notify_page) + (uva & (PAGE_SIZE - 1));
drivers/misc/xilinx_sdfec.c:            addr = kmap(pages[i]);
drivers/mmc/host/usdhi6rol0.c:  host->pg.mapped         = kmap(host->pg.page);
drivers/mmc/host/usdhi6rol0.c:  host->pg.mapped = kmap(host->pg.page);
drivers/mmc/host/usdhi6rol0.c:  host->pg.mapped = kmap(host->pg.page);
drivers/nvme/target/tcp.c:              iov->iov_base = kmap(sg_page(sg)) + sg->offset + sg_offset;
drivers/scsi/libiscsi_tcp.c:            segment->sg_mapped = kmap(sg_page(sg));
drivers/target/iscsi/iscsi_target.c:            iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off;
drivers/target/target_core_transport.c:         return kmap(sg_page(sg)) + sg->offset;
fs/btrfs/check-integrity.c:             block_ctx->datav[i] = kmap(block_ctx->pagev[i]);
fs/ceph/dir.c:          cache_ctl->dentries = kmap(cache_ctl->page);
fs/ceph/inode.c:                ctl->dentries = kmap(ctl->page);
lib/scatterlist.c:              miter->addr = kmap(miter->page) + miter->__offset;
net/ceph/pagelist.c:    pl->mapped_tail = kmap(page);
net/ceph/pagelist.c:            pl->mapped_tail = kmap(page);
virt/kvm/kvm_main.c:                    hva = kmap(page);


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 06/15] csky/mm/highmem: Switch to generic kmap atomic
  2020-09-19  9:17 ` [patch RFC 06/15] csky/mm/highmem: " Thomas Gleixner
@ 2020-09-23  0:05   ` Guo Ren
  0 siblings, 0 replies; 54+ messages in thread
From: Guo Ren @ 2020-09-23  0:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, linux-arch, Linus Torvalds, Paul McKenney, x86,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, linux-csky, Vineet Gupta,
	linux-snps-arc, Arnd Bergmann, Michal Simek, Thomas Bogendoerfer,
	linux-mips, Nick Hu, Greentime Hu, Vincent Chen,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, David S. Miller, sparclinux

Acked-by: Guo Ren <guoren@kernel.org>

On Sat, Sep 19, 2020 at 5:50 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Guo Ren <guoren@kernel.org>
> Cc: linux-csky@vger.kernel.org
> ---
> Note: Completely untested
> ---
>  arch/csky/Kconfig               |    1
>  arch/csky/include/asm/highmem.h |    4 +-
>  arch/csky/mm/highmem.c          |   75 ----------------------------------------
>  3 files changed, 5 insertions(+), 75 deletions(-)
>
> --- a/arch/csky/Kconfig
> +++ b/arch/csky/Kconfig
> @@ -285,6 +285,7 @@ config NR_CPUS
>  config HIGHMEM
>         bool "High Memory Support"
>         depends on !CPU_CK610
> +       select KMAP_ATOMIC_GENERIC
>         default y
>
>  config FORCE_MAX_ZONEORDER
> --- a/arch/csky/include/asm/highmem.h
> +++ b/arch/csky/include/asm/highmem.h
> @@ -32,10 +32,12 @@ extern pte_t *pkmap_page_table;
>
>  #define ARCH_HAS_KMAP_FLUSH_TLB
>  extern void kmap_flush_tlb(unsigned long addr);
> -extern void *kmap_atomic_pfn(unsigned long pfn);
>
>  #define flush_cache_kmaps() do {} while (0)
>
> +#define arch_kmap_temp_post_map(vaddr, pteval) kmap_flush_tlb(vaddr)
> +#define arch_kmap_temp_post_unmap(vaddr)       kmap_flush_tlb(vaddr)
> +
>  extern void kmap_init(void);
>
>  #endif /* __KERNEL__ */
> --- a/arch/csky/mm/highmem.c
> +++ b/arch/csky/mm/highmem.c
> @@ -9,8 +9,6 @@
>  #include <asm/tlbflush.h>
>  #include <asm/cacheflush.h>
>
> -static pte_t *kmap_pte;
> -
>  unsigned long highstart_pfn, highend_pfn;
>
>  void kmap_flush_tlb(unsigned long addr)
> @@ -19,67 +17,7 @@ void kmap_flush_tlb(unsigned long addr)
>  }
>  EXPORT_SYMBOL(kmap_flush_tlb);
>
> -void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
> -{
> -       unsigned long vaddr;
> -       int idx, type;
> -
> -       type = kmap_atomic_idx_push();
> -       idx = type + KM_TYPE_NR*smp_processor_id();
> -       vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
> -#ifdef CONFIG_DEBUG_HIGHMEM
> -       BUG_ON(!pte_none(*(kmap_pte - idx)));
> -#endif
> -       set_pte(kmap_pte-idx, mk_pte(page, prot));
> -       flush_tlb_one((unsigned long)vaddr);
> -
> -       return (void *)vaddr;
> -}
> -EXPORT_SYMBOL(kmap_atomic_high_prot);
> -
> -void kunmap_atomic_high(void *kvaddr)
> -{
> -       unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
> -       int idx;
> -
> -       if (vaddr < FIXADDR_START)
> -               return;
> -
> -#ifdef CONFIG_DEBUG_HIGHMEM
> -       idx = KM_TYPE_NR*smp_processor_id() + kmap_atomic_idx();
> -
> -       BUG_ON(vaddr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
> -
> -       pte_clear(&init_mm, vaddr, kmap_pte - idx);
> -       flush_tlb_one(vaddr);
> -#else
> -       (void) idx; /* to kill a warning */
> -#endif
> -       kmap_atomic_idx_pop();
> -}
> -EXPORT_SYMBOL(kunmap_atomic_high);
> -
> -/*
> - * This is the same as kmap_atomic() but can map memory that doesn't
> - * have a struct page associated with it.
> - */
> -void *kmap_atomic_pfn(unsigned long pfn)
> -{
> -       unsigned long vaddr;
> -       int idx, type;
> -
> -       pagefault_disable();
> -
> -       type = kmap_atomic_idx_push();
> -       idx = type + KM_TYPE_NR*smp_processor_id();
> -       vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
> -       set_pte(kmap_pte-idx, pfn_pte(pfn, PAGE_KERNEL));
> -       flush_tlb_one(vaddr);
> -
> -       return (void *) vaddr;
> -}
> -
> -static void __init kmap_pages_init(void)
> +void __init kmap_init(void)
>  {
>         unsigned long vaddr;
>         pgd_t *pgd;
> @@ -96,14 +34,3 @@ static void __init kmap_pages_init(void)
>         pte = pte_offset_kernel(pmd, vaddr);
>         pkmap_page_table = pte;
>  }
> -
> -void __init kmap_init(void)
> -{
> -       unsigned long vaddr;
> -
> -       kmap_pages_init();
> -
> -       vaddr = __fix_to_virt(FIX_KMAP_BEGIN);
> -
> -       kmap_pte = pte_offset_kernel((pmd_t *)pgd_offset_k(vaddr), vaddr);
> -}
>


-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-21 19:27                 ` Thomas Gleixner
@ 2020-09-23  8:40                   ` peterz
  2020-09-23 13:35                     ` Thomas Gleixner
  2020-09-23 15:52                     ` Steven Rostedt
  2020-09-23 10:19                   ` peterz
  2020-09-23 14:33                   ` Thomas Gleixner
  2 siblings, 2 replies; 54+ messages in thread
From: peterz @ 2020-09-23  8:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, LKML, linux-arch, Paul McKenney,
	the arch/x86 maintainers, Sebastian Andrzej Siewior, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Mon, Sep 21, 2020 at 09:27:57PM +0200, Thomas Gleixner wrote:
> On Mon, Sep 21 2020 at 09:24, Linus Torvalds wrote:
> > On Mon, Sep 21, 2020 at 12:39 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >>
> >> If a task is migrated to a different CPU then the mapping address will
> >> change which will explode in colourful ways.
> >
> > Right you are.
> >
> > Maybe we really *could* call this new kmap functionality something
> > like "kmap_percpu()" (or maybe "local" is good enough), and make it
> > act like your RT code does for spinlocks - not disable preemption, but
> > only disabling CPU migration.
> 
> I"m all for it, but the scheduler people have opinions :)

Right, so I'm concerned. migrate_disable() wrecks pretty much all
Real-Time scheduler theory we have, and PREEMPRT_RT bringing it in is
somewhat ironic.

Yes, it allows breaking up non-preemptible regions of non-deterministic
duration, and thereby both reduce and bound the scheduling latency, the
cost for doing that is that the theory on CPU utilization/bandwidth go
out the window.

To easily see this consider an SMP system with a number of tasks equal
to the number of CPUs. On a regular (preempt_disable) kernel we can
always run each task, by virtue of always having an idle CPU to take the
task.

However, with migrate_disable() we can have each task preempted in a
migrate_disable() region, worse we can stack them all on the _same_ CPU
(super ridiculous odds, sure). And then we end up only able to run one
task, with the rest of the CPUs picking their nose.

The end result is that, like with unbounded latency, we will still miss
our deadline, simply because we got starved for CPU.


Now, while we could (with a _lot_ of work) rework the kernel to not rely
on the implicit per-cpu ness of things like spinlock_t, the moment we
bring in basic primitives that rely on migrate_disable() we're stuck
with it.


The problem is; afaict there's been no research into this problem. There
might be scheduling (read: balancing) schemes that can mitigate/solve
this problem, or it might prove to be a 'hard' problem, I just don't
know. But once we start down this road, it's going to be hell to get rid
of it.

That's why I've been arguing (for many years) to strictly limit this to
PREEMPT_RT and only as a gap-stop, not a fundamental primitive to build
on.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-21 19:27                 ` Thomas Gleixner
  2020-09-23  8:40                   ` peterz
@ 2020-09-23 10:19                   ` peterz
  2020-09-23 12:33                     ` Thomas Gleixner
  2020-09-23 14:33                   ` Thomas Gleixner
  2 siblings, 1 reply; 54+ messages in thread
From: peterz @ 2020-09-23 10:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, LKML, linux-arch, Paul McKenney,
	the arch/x86 maintainers, Sebastian Andrzej Siewior, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Mon, Sep 21, 2020 at 09:27:57PM +0200, Thomas Gleixner wrote:
> Alternatively this could of course be solved with per CPU page tables
> which will come around some day anyway I fear.

Previously (with PTI) we looked at making the entire kernel map per-CPU,
and that takes a 2K copy on switch_mm() (or more general, the user part
of whatever the top level directory is for architectures that have a
shared kernel/user page-table setup in the first place).

The idea was having a fixed per-cpu kernel page-table, share a bunch of
(kernel) page-tables between all CPUs and then copy in the user part on
switch.

I've forgotten what the plan was for ASID/PCID in that scheme.

For x86_64 we've been fearing the performance of that 2k copy, but I
don't think we've ever actually bit the bullet and implemented it to see
how bad it really is.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-23 10:19                   ` peterz
@ 2020-09-23 12:33                     ` Thomas Gleixner
  0 siblings, 0 replies; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-23 12:33 UTC (permalink / raw)
  To: peterz
  Cc: Linus Torvalds, LKML, linux-arch, Paul McKenney,
	the arch/x86 maintainers, Sebastian Andrzej Siewior, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Wed, Sep 23 2020 at 12:19, peterz wrote:
> On Mon, Sep 21, 2020 at 09:27:57PM +0200, Thomas Gleixner wrote:
>> Alternatively this could of course be solved with per CPU page tables
>> which will come around some day anyway I fear.
>
> Previously (with PTI) we looked at making the entire kernel map per-CPU,
> and that takes a 2K copy on switch_mm() (or more general, the user part
> of whatever the top level directory is for architectures that have a
> shared kernel/user page-table setup in the first place).
>
> The idea was having a fixed per-cpu kernel page-table, share a bunch of
> (kernel) page-tables between all CPUs and then copy in the user part on
> switch.
>
> I've forgotten what the plan was for ASID/PCID in that scheme.
>
> For x86_64 we've been fearing the performance of that 2k copy, but I
> don't think we've ever actually bit the bullet and implemented it to see
> how bad it really is.

I actually did at some point and depending on the workload the overhead
was clearly measurable. And yes, it fell apart with PCID and I could not
come up with a scheme for it which did not suck horribly. So I burried
the patches in the poison cabinet.

Aside of that, we'd need to implement that for a eight other
architectures as well...

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-23  8:40                   ` peterz
@ 2020-09-23 13:35                     ` Thomas Gleixner
  2020-09-23 15:52                     ` Steven Rostedt
  1 sibling, 0 replies; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-23 13:35 UTC (permalink / raw)
  To: peterz
  Cc: Linus Torvalds, LKML, linux-arch, Paul McKenney,
	the arch/x86 maintainers, Sebastian Andrzej Siewior, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Wed, Sep 23 2020 at 10:40, peterz wrote:
> Right, so I'm concerned. migrate_disable() wrecks pretty much all
> Real-Time scheduler theory we have, and PREEMPRT_RT bringing it in is
> somewhat ironic.

It's even more ironic that the approach of PREEMPT_RT has been
'pragmatic ignorance of theory' from the very beginning and despite
violating all theories it still works. :)

> Yes, it allows breaking up non-preemptible regions of non-deterministic
> duration, and thereby both reduce and bound the scheduling latency, the
> cost for doing that is that the theory on CPU utilization/bandwidth go
> out the window.

I agree, that the theory goes out of the window, but does it matter in
practice? I've yet to see a report of migrate disable stacking being the
culprit of a missed deadline and I surely have stared at lots of reports
in the past 10+ years.

> To easily see this consider an SMP system with a number of tasks equal
> to the number of CPUs. On a regular (preempt_disable) kernel we can
> always run each task, by virtue of always having an idle CPU to take the
> task.
>
> However, with migrate_disable() we can have each task preempted in a
> migrate_disable() region, worse we can stack them all on the _same_ CPU
> (super ridiculous odds, sure). And then we end up only able to run one
> task, with the rest of the CPUs picking their nose.
>
> The end result is that, like with unbounded latency, we will still miss
> our deadline, simply because we got starved for CPU.

I'm well aware of that.

> Now, while we could (with a _lot_ of work) rework the kernel to not rely
> on the implicit per-cpu ness of things like spinlock_t, the moment we
> bring in basic primitives that rely on migrate_disable() we're stuck
> with it.

Right, but we are stuck with per CPUness and distangling that is just
infeasible IMO.

> The problem is; afaict there's been no research into this problem.

There is no research on a lot of things the kernel does :)

> There might be scheduling (read: balancing) schemes that can
> mitigate/solve this problem, or it might prove to be a 'hard' problem,
> I just don't know.

In practive balancing surely can take the number of preempted tasks
which are in a migrate disable section into account which would be just
another measure to work around the fact that the kernel is not adhering
to the theories. It never did that even w/o migrate disable.

> But once we start down this road, it's going to be hell to get rid of
> it.

Like most of the other things the kernel came up with to deal with the
oddities of modern hardware :)

> That's why I've been arguing (for many years) to strictly limit this to
> PREEMPT_RT and only as a gap-stop, not a fundamental primitive to build
> on.

I know, but short of rewriting the world, I'm not seing the faintest
plan to remove the stop gap. :)

As we discussed not long ago we have too many inconsistent preemption
models already. RT is adding yet another one. And that's worse than
introducing migrate disable as a general available facility.

IMO, reaching a point of consistency where our different preemption
models (NONE, VOLUNTARY, PREEMPT. RT) build on each other is far more
important.

For !RT migrate disable is far less of an danger than for RT kernels
because the amount of code which will use it is rather limited compared
to code which still will disable preemption implicit through spin and rw
locks.

On RT converting these locks to 'sleepable spinlocks' is just possible
because RT forces almost everything into task context and migrate
disable is just the obvious decomposition of preempt disable which
implicitely disables migration.

But that means that RT is by orders of magnitude more prone to run into
the scheduling trainwreck you are worried about. It just refuses to do
so at least with real world work loads.

I'm surely in favour of having solid theories behind implementation, but
at some point you just have to bite the bullet and chose pragmatism in
order to make progress.

Proliferating inconsistency is not real progress, as it is violating the
most fundamental engineering principles. That's by far more dangerous
than violating scheduling theories which are built on perfect models and
therefore enforce violation by practical implementations anyway.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-21 19:27                 ` Thomas Gleixner
  2020-09-23  8:40                   ` peterz
  2020-09-23 10:19                   ` peterz
@ 2020-09-23 14:33                   ` Thomas Gleixner
  2 siblings, 0 replies; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-23 14:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, linux-arch, Paul McKenney, the arch/x86 maintainers,
	Sebastian Andrzej Siewior, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
	Andrew Morton, Linux-MM, Russell King, Linux ARM, Chris Zankel,
	Max Filippov, linux-xtensa, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Mon, Sep 21 2020 at 21:27, Thomas Gleixner wrote:
> On Mon, Sep 21 2020 at 09:24, Linus Torvalds wrote:
>> On Mon, Sep 21, 2020 at 12:39 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>> Maybe we really *could* call this new kmap functionality something
>> like "kmap_percpu()" (or maybe "local" is good enough), and make it
>> act like your RT code does for spinlocks - not disable preemption, but
>> only disabling CPU migration.
>
> I"m all for it, but the scheduler people have opinions :)

I just took the latest version of migrate disable patches

  https://lore.kernel.org/r/20200921163557.234036895@infradead.org

removed the RT dependency on top of them and adopted the kmap stuff
(addressing the various comments while at it and unbreaking ARM).

I'm not going to post that until there is consensus about the general
availability of migrate disable, but for those who want to play with it
I've pushed the resulting combo out to:

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git highmem 

For testing I've tweaked a few places to use the new _local() variants
and it survived testing so far and I've verified that there is actual
preemption which means zap/restore of the thread local kmaps.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-23  8:40                   ` peterz
  2020-09-23 13:35                     ` Thomas Gleixner
@ 2020-09-23 15:52                     ` Steven Rostedt
  2020-09-23 20:55                       ` Thomas Gleixner
  2020-09-24  8:27                       ` peterz
  1 sibling, 2 replies; 54+ messages in thread
From: Steven Rostedt @ 2020-09-23 15:52 UTC (permalink / raw)
  To: peterz
  Cc: Thomas Gleixner, Linus Torvalds, LKML, linux-arch, Paul McKenney,
	the arch/x86 maintainers, Sebastian Andrzej Siewior, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Will Deacon, Andrew Morton, Linux-MM,
	Russell King, Linux ARM, Chris Zankel, Max Filippov,
	linux-xtensa, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Wed, 23 Sep 2020 10:40:32 +0200
peterz@infradead.org wrote:

> However, with migrate_disable() we can have each task preempted in a
> migrate_disable() region, worse we can stack them all on the _same_ CPU
> (super ridiculous odds, sure). And then we end up only able to run one
> task, with the rest of the CPUs picking their nose.

What if we just made migrate_disable() a local_lock() available for !RT?

I mean make it a priority inheritance PER CPU lock.

That is, no two tasks could do a migrate_disable() on the same CPU? If
one task does a migrate_disable() and then gets preempted and the
preempting task does a migrate_disable() on the same CPU, it will block
and wait for the first task to do a migrate_enable().

No two tasks on the same CPU could enter the migrate_disable() section
simultaneously, just like no two tasks could enter a preempt_disable()
section.

In essence, we just allow local_lock() to be used for both RT and !RT.

Perhaps make migrate_disable() an anonymous local_lock()?

This should lower the SHC in theory, if you can't have stacked migrate
disables on the same CPU.

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-23 15:52                     ` Steven Rostedt
@ 2020-09-23 20:55                       ` Thomas Gleixner
  2020-09-23 21:12                         ` Steven Rostedt
  2020-09-24  8:27                       ` peterz
  1 sibling, 1 reply; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-23 20:55 UTC (permalink / raw)
  To: Steven Rostedt, peterz
  Cc: Linus Torvalds, LKML, linux-arch, Paul McKenney,
	the arch/x86 maintainers, Sebastian Andrzej Siewior, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Will Deacon, Andrew Morton, Linux-MM,
	Russell King, Linux ARM, Chris Zankel, Max Filippov,
	linux-xtensa, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Wed, Sep 23 2020 at 11:52, Steven Rostedt wrote:
> On Wed, 23 Sep 2020 10:40:32 +0200
> peterz@infradead.org wrote:
>
>> However, with migrate_disable() we can have each task preempted in a
>> migrate_disable() region, worse we can stack them all on the _same_ CPU
>> (super ridiculous odds, sure). And then we end up only able to run one
>> task, with the rest of the CPUs picking their nose.
>
> What if we just made migrate_disable() a local_lock() available for !RT?
>
> I mean make it a priority inheritance PER CPU lock.
>
> That is, no two tasks could do a migrate_disable() on the same CPU? If
> one task does a migrate_disable() and then gets preempted and the
> preempting task does a migrate_disable() on the same CPU, it will block
> and wait for the first task to do a migrate_enable().
>
> No two tasks on the same CPU could enter the migrate_disable() section
> simultaneously, just like no two tasks could enter a preempt_disable()
> section.
>
> In essence, we just allow local_lock() to be used for both RT and !RT.
>
> Perhaps make migrate_disable() an anonymous local_lock()?
>
> This should lower the SHC in theory, if you can't have stacked migrate
> disables on the same CPU.

I'm pretty sure this ends up in locking hell pretty fast and aside of
that it's not working for scenarios like:

     kmap_local();
       migrate_disable();
       ...

     copy_from_user()
        -> #PF
           -> schedule()

which brought us into that discussion in the first place. You would stop
any other migrate disable user from running until the page fault is
resolved...

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-23 20:55                       ` Thomas Gleixner
@ 2020-09-23 21:12                         ` Steven Rostedt
  2020-09-24  6:57                           ` Thomas Gleixner
  0 siblings, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2020-09-23 21:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: peterz, Linus Torvalds, LKML, linux-arch, Paul McKenney,
	the arch/x86 maintainers, Sebastian Andrzej Siewior, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Will Deacon, Andrew Morton, Linux-MM,
	Russell King, Linux ARM, Chris Zankel, Max Filippov,
	linux-xtensa, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Wed, 23 Sep 2020 22:55:54 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:

> > Perhaps make migrate_disable() an anonymous local_lock()?
> >
> > This should lower the SHC in theory, if you can't have stacked migrate
> > disables on the same CPU.  
> 
> I'm pretty sure this ends up in locking hell pretty fast and aside of
> that it's not working for scenarios like:
> 
>      kmap_local();
>        migrate_disable();
>        ...
> 
>      copy_from_user()
>         -> #PF
>            -> schedule()  
> 
> which brought us into that discussion in the first place. You would stop
> any other migrate disable user from running until the page fault is
> resolved...

Then scratch the idea of having anonymous local_lock() and just bring
local_lock in directly? Then have a kmap local lock, which would only
block those that need to do a kmap.

Now as for migration disabled nesting, at least now we would have
groupings of this, and perhaps the theorists can handle that. I mean,
how is this much different that having a bunch of tasks blocked on a
mutex with the owner is pinned on a CPU?

migrate_disable() is a BKL of pinning affinity. If we only have
local_lock() available (even on !RT), then it makes the blocking in
groups. At least this way you could grep for all the different
local_locks in the system and plug that into the algorithm for WCS,
just like one would with a bunch of mutexes.

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-23 21:12                         ` Steven Rostedt
@ 2020-09-24  6:57                           ` Thomas Gleixner
  2020-09-24 12:32                             ` Steven Rostedt
  0 siblings, 1 reply; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-24  6:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: peterz, Linus Torvalds, LKML, linux-arch, Paul McKenney,
	the arch/x86 maintainers, Sebastian Andrzej Siewior, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Will Deacon, Andrew Morton, Linux-MM,
	Russell King, Linux ARM, Chris Zankel, Max Filippov,
	linux-xtensa, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Wed, Sep 23 2020 at 17:12, Steven Rostedt wrote:
> On Wed, 23 Sep 2020 22:55:54 +0200
> Then scratch the idea of having anonymous local_lock() and just bring
> local_lock in directly? Then have a kmap local lock, which would only
> block those that need to do a kmap.

That's still going to end up in lock ordering nightmares and you lose
the ability to use kmap_local from arbitrary contexts which was again
one of the goals of this exercise.

Aside of that you're imposing reentrancy protections on something which
does not need it in the first place.

> Now as for migration disabled nesting, at least now we would have
> groupings of this, and perhaps the theorists can handle that. I mean,
> how is this much different that having a bunch of tasks blocked on a
> mutex with the owner is pinned on a CPU?
>
> migrate_disable() is a BKL of pinning affinity.

No. That's just wrong. preempt disable is a concurrency control,
i.e. protecting against reentrancy on a given CPU. But it's a cpu global
protection which means that it's not protecting a specific code path.

Contrary to preempt disable, migrate disable is not protecting against
reentrancy on a given CPU. It's a temporary restriction to the scheduler
on placement.

The fact that disabling preemption implicitely disables migration does
not make them semantically equivalent.

> If we only have local_lock() available (even on !RT), then it makes
> the blocking in groups. At least this way you could grep for all the
> different local_locks in the system and plug that into the algorithm
> for WCS, just like one would with a bunch of mutexes.

You cannot do that on RT at all where migrate disable is substituting
preempt disable in spin and rw locks. The result would be the same as
with a !RT kernel just with horribly bad performance.

That means the stacking problem has to be solved anyway.

So why on earth do you want to create yet another special duct tape case
for kamp_local() which proliferates inconsistency instead of aiming for
consistency accross all preemption models?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-23 15:52                     ` Steven Rostedt
  2020-09-23 20:55                       ` Thomas Gleixner
@ 2020-09-24  8:27                       ` peterz
  2020-09-24 19:36                         ` Daniel Bristot de Oliveira
  1 sibling, 1 reply; 54+ messages in thread
From: peterz @ 2020-09-24  8:27 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, Linus Torvalds, LKML, linux-arch, Paul McKenney,
	the arch/x86 maintainers, Sebastian Andrzej Siewior, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Will Deacon, Andrew Morton, Linux-MM,
	Russell King, Linux ARM, Chris Zankel, Max Filippov,
	linux-xtensa, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Wed, Sep 23, 2020 at 11:52:51AM -0400, Steven Rostedt wrote:
> On Wed, 23 Sep 2020 10:40:32 +0200
> peterz@infradead.org wrote:
> 
> > However, with migrate_disable() we can have each task preempted in a
> > migrate_disable() region, worse we can stack them all on the _same_ CPU
> > (super ridiculous odds, sure). And then we end up only able to run one
> > task, with the rest of the CPUs picking their nose.
> 
> What if we just made migrate_disable() a local_lock() available for !RT?

Can't, neiter migrate_disable() nor migrate_enable() are allowed to
block -- which is what makes their implementation so 'interesting'.

> This should lower the SHC in theory, if you can't have stacked migrate
> disables on the same CPU.

See this email in that other thread (you're on Cc too IIRC):

  https://lkml.kernel.org/r/20200923170809.GY1362448@hirez.programming.kicks-ass.net

I think that is we 'frob' the balance PULL, we'll end up with something
similar.

Whichever way around we turn this thing, the migrate_disable() runtime
(we'll have to add a tracer for that), will be an interference term on
the lower priority task, exactly like preempt_disable() would be. We'll
just not exclude a higher priority task from running.


AFAICT; the best case is a single migrate_disable() nesting, where a
higher priority task preempts in a migrate_disable() section -- this is
per design.

When this preempted task becomes elegible to run under the ideal model
(IOW it becomes one of the M highest priority tasks), it might still
have to wait for the preemptee's migrate_disable() section to complete.
Thereby suffering interference in the exact duration of
migrate_disable() section.

Per this argument, the change from preempt_disable() to
migrate_disable() gets us:

 - higher priority tasks gain reduced wake-up latency
 - lower priority tasks are unchanged and are subject to the exact same
   interference term as if the higher priority task were using
   preempt_disable().

Since we've already established this term is unbounded, any task but the
highest priority task is basically buggered.

TL;DR, if we get balancing fixed and achieve (near) the optimal case
above, migrate_disable() is an over-all win. But it's provably
non-deterministic as long as the migrate_disable() sections are
non-deterministic.


The reason this all mostly works in practise is (I think) because:

 - People care most about the higher prio RT tasks and craft them to
   mostly avoid the migrate_disable() infected code.

 - The preemption scenario is most pronounced at higher utilization
   scenarios, and I suspect this is fairly rare to begin with.

 - And while many of these migrate_disable() regions are unbound in
   theory, in practise they're often fairly reasonable.


So my current todo list is:

 - Change RT PULL
 - Change DL PULL
 - Add migrate_disable() tracer; exactly like preempt/irqoff, except
   measuring task-runtime instead of cpu-time.
 - Add a mode that measures actual interference.
 - Add a traceevent to detect preemption in migrate_disable().


And then I suppose I should twist Daniel's arm to update his model to
include these scenarios and numbers.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-24  6:57                           ` Thomas Gleixner
@ 2020-09-24 12:32                             ` Steven Rostedt
  2020-09-24 12:42                               ` Peter Zijlstra
  2020-09-24 17:55                               ` Thomas Gleixner
  0 siblings, 2 replies; 54+ messages in thread
From: Steven Rostedt @ 2020-09-24 12:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: peterz, Linus Torvalds, LKML, linux-arch, Paul McKenney,
	the arch/x86 maintainers, Sebastian Andrzej Siewior, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Will Deacon, Andrew Morton, Linux-MM,
	Russell King, Linux ARM, Chris Zankel, Max Filippov,
	linux-xtensa, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Thu, 24 Sep 2020 08:57:52 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:

> > Now as for migration disabled nesting, at least now we would have
> > groupings of this, and perhaps the theorists can handle that. I mean,
> > how is this much different that having a bunch of tasks blocked on a
> > mutex with the owner is pinned on a CPU?
> >
> > migrate_disable() is a BKL of pinning affinity.  
> 
> No. That's just wrong. preempt disable is a concurrency control,

I think you totally misunderstood what I was saying. The above wasn't about
comparing preempt_disable to migrate_disable. It was comparing
migrate_disable to a chain of tasks blocked on mutexes where the top owner
has preempt_disable set. You still have a bunch of tasks that can't move to
other CPUs.


> > If we only have local_lock() available (even on !RT), then it makes
> > the blocking in groups. At least this way you could grep for all the
> > different local_locks in the system and plug that into the algorithm
> > for WCS, just like one would with a bunch of mutexes.  
> 
> You cannot do that on RT at all where migrate disable is substituting
> preempt disable in spin and rw locks. The result would be the same as
> with a !RT kernel just with horribly bad performance.

Note, the spin and rwlocks already have a lock associated with them. Why
would it be any different on RT? I wasn't suggesting adding another lock
inside a spinlock. Why would I recommend THAT? I wasn't recommending
blindly replacing migrate_disable() with local_lock(). I just meant expose
local_lock() but not migrate_disable().

> 
> That means the stacking problem has to be solved anyway.
> 
> So why on earth do you want to create yet another special duct tape case
> for kamp_local() which proliferates inconsistency instead of aiming for
> consistency accross all preemption models?

The idea was to help with the scheduling issue.

Anyway, instead of blocking. What about having a counter of number of
migrate disabled tasks per cpu, and when taking a migrate_disable(), and there's
already another task with migrate_disabled() set, and the current task has
an affinity greater than 1, it tries to migrate to another CPU?

This way migrate_disable() is less likely to have a bunch of tasks blocked
on one CPU serialized by each task exiting the migrate_disable() section.

Yes, there's more overhead, but it only happens if multiple tasks are in a
migrate disable section on the same CPU.

-- Steve


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-24 12:32                             ` Steven Rostedt
@ 2020-09-24 12:42                               ` Peter Zijlstra
  2020-09-24 13:51                                 ` Steven Rostedt
  2020-09-24 17:55                               ` Thomas Gleixner
  1 sibling, 1 reply; 54+ messages in thread
From: Peter Zijlstra @ 2020-09-24 12:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, Linus Torvalds, LKML, linux-arch, Paul McKenney,
	the arch/x86 maintainers, Sebastian Andrzej Siewior, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Will Deacon, Andrew Morton, Linux-MM,
	Russell King, Linux ARM, Chris Zankel, Max Filippov,
	linux-xtensa, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Thu, Sep 24, 2020 at 08:32:41AM -0400, Steven Rostedt wrote:
> Anyway, instead of blocking. What about having a counter of number of
> migrate disabled tasks per cpu, and when taking a migrate_disable(), and there's
> already another task with migrate_disabled() set, and the current task has
> an affinity greater than 1, it tries to migrate to another CPU?

That doesn't solve the problem. On wakeup we should already prefer an
idle CPU over one running a (RT) task, but you can always wake more
tasks than there's CPUs around and you'll _have_ to stack at some point.

The trick is how to unstack them correctly. We need to detect when a
migrate_disable() task _should_ start running again, and migrate away
whoever is in the way at that point.

It turns out, that getting selected for pull-balance is exactly that
condition, and clearly a migrate_disable() task cannot be pulled, but we
can use that signal to try and pull away the running task that's in the
way.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-24 12:42                               ` Peter Zijlstra
@ 2020-09-24 13:51                                 ` Steven Rostedt
  2020-09-24 13:58                                   ` Peter Zijlstra
  0 siblings, 1 reply; 54+ messages in thread
From: Steven Rostedt @ 2020-09-24 13:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Linus Torvalds, LKML, linux-arch, Paul McKenney,
	the arch/x86 maintainers, Sebastian Andrzej Siewior, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Will Deacon, Andrew Morton, Linux-MM,
	Russell King, Linux ARM, Chris Zankel, Max Filippov,
	linux-xtensa, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Thu, 24 Sep 2020 14:42:41 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Sep 24, 2020 at 08:32:41AM -0400, Steven Rostedt wrote:
> > Anyway, instead of blocking. What about having a counter of number of
> > migrate disabled tasks per cpu, and when taking a migrate_disable(), and there's
> > already another task with migrate_disabled() set, and the current task has
> > an affinity greater than 1, it tries to migrate to another CPU?  
> 
> That doesn't solve the problem. On wakeup we should already prefer an
> idle CPU over one running a (RT) task, but you can always wake more
> tasks than there's CPUs around and you'll _have_ to stack at some point.

Yes, understood.

> 
> The trick is how to unstack them correctly. We need to detect when a
> migrate_disable() task _should_ start running again, and migrate away
> whoever is in the way at that point.
> 
> It turns out, that getting selected for pull-balance is exactly that
> condition, and clearly a migrate_disable() task cannot be pulled, but we
> can use that signal to try and pull away the running task that's in the
> way.

Unless of course that running task is in a migrate disable section
itself ;-)

But I guess we will always have that SHC, and there will always be a
scenario that you can't balance properly. But hopefully in practice we
wont see that.

How to handle kmap_local(), will migrate_disable() be used only for
32bit or, for consistency, will it also apply to 64bit?

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-24 13:51                                 ` Steven Rostedt
@ 2020-09-24 13:58                                   ` Peter Zijlstra
  0 siblings, 0 replies; 54+ messages in thread
From: Peter Zijlstra @ 2020-09-24 13:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, Linus Torvalds, LKML, linux-arch, Paul McKenney,
	the arch/x86 maintainers, Sebastian Andrzej Siewior, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Will Deacon, Andrew Morton, Linux-MM,
	Russell King, Linux ARM, Chris Zankel, Max Filippov,
	linux-xtensa, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Thu, Sep 24, 2020 at 09:51:38AM -0400, Steven Rostedt wrote:

> > It turns out, that getting selected for pull-balance is exactly that
> > condition, and clearly a migrate_disable() task cannot be pulled, but we
> > can use that signal to try and pull away the running task that's in the
> > way.
> 
> Unless of course that running task is in a migrate disable section
> itself ;-)

See my ramblings here:

  https://lkml.kernel.org/r/20200924082717.GA1362448@hirez.programming.kicks-ass.net

My plan was to have the migrate_enable() of the running task trigger the
migration in that case.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-24 12:32                             ` Steven Rostedt
  2020-09-24 12:42                               ` Peter Zijlstra
@ 2020-09-24 17:55                               ` Thomas Gleixner
  2020-09-24 18:58                                 ` Steven Rostedt
  1 sibling, 1 reply; 54+ messages in thread
From: Thomas Gleixner @ 2020-09-24 17:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: peterz, Linus Torvalds, LKML, linux-arch, Paul McKenney,
	the arch/x86 maintainers, Sebastian Andrzej Siewior, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Will Deacon, Andrew Morton, Linux-MM,
	Russell King, Linux ARM, Chris Zankel, Max Filippov,
	linux-xtensa, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Thu, Sep 24 2020 at 08:32, Steven Rostedt wrote:
> On Thu, 24 Sep 2020 08:57:52 +0200
> Thomas Gleixner <tglx@linutronix.de> wrote:
>
>> > Now as for migration disabled nesting, at least now we would have
>> > groupings of this, and perhaps the theorists can handle that. I mean,
>> > how is this much different that having a bunch of tasks blocked on a
>> > mutex with the owner is pinned on a CPU?
>> >
>> > migrate_disable() is a BKL of pinning affinity.  
>> 
>> No. That's just wrong. preempt disable is a concurrency control,
>
> I think you totally misunderstood what I was saying. The above wasn't about
> comparing preempt_disable to migrate_disable. It was comparing
> migrate_disable to a chain of tasks blocked on mutexes where the top owner
> has preempt_disable set. You still have a bunch of tasks that can't move to
> other CPUs.

What? The top owner does not prevent any task from moving. The tasks
cannot move because they are blocked on the mutex, which means they are
not runnable and non runnable tasks are not migrated at all.

I really don't understand what you are trying to say.

>> > If we only have local_lock() available (even on !RT), then it makes
>> > the blocking in groups. At least this way you could grep for all the
>> > different local_locks in the system and plug that into the algorithm
>> > for WCS, just like one would with a bunch of mutexes.  
>> 
>> You cannot do that on RT at all where migrate disable is substituting
>> preempt disable in spin and rw locks. The result would be the same as
>> with a !RT kernel just with horribly bad performance.
>
> Note, the spin and rwlocks already have a lock associated with them. Why
> would it be any different on RT? I wasn't suggesting adding another lock
> inside a spinlock. Why would I recommend THAT? I wasn't recommending
> blindly replacing migrate_disable() with local_lock(). I just meant expose
> local_lock() but not migrate_disable().

We already exposed local_lock() to non RT and it's for places which do
preempt_disable() or local_irq_disable() without having a lock
associated. But both primitives are scope less and therefore behave like
CPU local BKLs. What local_lock() provides in these cases is:

  - Making the protection scope clear by associating a named local
    lock which is coverred by lockdep.

  - It still maps to preempt_disable() or local_irq_disable() in !RT
    kernels

  - The scope and the named lock allows RT kernels to substitute with
    real (recursion aware) locking primitives which keep preemption and
    interupts enabled, but provide the fine grained protection for the
    scoped critical section.
  
So how would you substitute migrate_disable() with a local_lock()? You
can't. Again migrate_disable() is NOT a concurrency control and
therefore it cannot be substituted by any concurrency control primitive.

>> That means the stacking problem has to be solved anyway.
>> 
>> So why on earth do you want to create yet another special duct tape case
>> for kamp_local() which proliferates inconsistency instead of aiming for
>> consistency accross all preemption models?
>
> The idea was to help with the scheduling issue.

I don't see how mixing concepts and trying to duct tape a problem which
is clearly in the realm of the scheduler, i.e. load balancing and
placing algorithms, is helpful.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-24 17:55                               ` Thomas Gleixner
@ 2020-09-24 18:58                                 ` Steven Rostedt
  0 siblings, 0 replies; 54+ messages in thread
From: Steven Rostedt @ 2020-09-24 18:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: peterz, Linus Torvalds, LKML, linux-arch, Paul McKenney,
	the arch/x86 maintainers, Sebastian Andrzej Siewior, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Will Deacon, Andrew Morton, Linux-MM,
	Russell King, Linux ARM, Chris Zankel, Max Filippov,
	linux-xtensa, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On Thu, 24 Sep 2020 19:55:10 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Thu, Sep 24 2020 at 08:32, Steven Rostedt wrote:
> > On Thu, 24 Sep 2020 08:57:52 +0200
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> >  
> >> > Now as for migration disabled nesting, at least now we would have
> >> > groupings of this, and perhaps the theorists can handle that. I mean,
> >> > how is this much different that having a bunch of tasks blocked on a
> >> > mutex with the owner is pinned on a CPU?
> >> >
> >> > migrate_disable() is a BKL of pinning affinity.    
> >> 
> >> No. That's just wrong. preempt disable is a concurrency control,  
> >
> > I think you totally misunderstood what I was saying. The above wasn't about
> > comparing preempt_disable to migrate_disable. It was comparing
> > migrate_disable to a chain of tasks blocked on mutexes where the top owner
> > has preempt_disable set. You still have a bunch of tasks that can't move to
> > other CPUs.  
> 
> What? The top owner does not prevent any task from moving. The tasks
> cannot move because they are blocked on the mutex, which means they are
> not runnable and non runnable tasks are not migrated at all.

And neither are migrated disabled tasks preempted by a high priority
task.

> 
> I really don't understand what you are trying to say.

Don't worry about it. I was just making a high level comparison of how
migrate disabled tasks blocked on a higher priority task is similar to
that of tasks blocked on a mutex held by a pinned task that is
preempted by a high priority task. But we can forget this analogy as
it's not appropriate for the current conversation.

> 
> >> > If we only have local_lock() available (even on !RT), then it makes
> >> > the blocking in groups. At least this way you could grep for all the
> >> > different local_locks in the system and plug that into the algorithm
> >> > for WCS, just like one would with a bunch of mutexes.    
> >> 
> >> You cannot do that on RT at all where migrate disable is substituting
> >> preempt disable in spin and rw locks. The result would be the same as
> >> with a !RT kernel just with horribly bad performance.  
> >
> > Note, the spin and rwlocks already have a lock associated with them. Why
> > would it be any different on RT? I wasn't suggesting adding another lock
> > inside a spinlock. Why would I recommend THAT? I wasn't recommending
> > blindly replacing migrate_disable() with local_lock(). I just meant expose
> > local_lock() but not migrate_disable().  
> 
> We already exposed local_lock() to non RT and it's for places which do
> preempt_disable() or local_irq_disable() without having a lock
> associated. But both primitives are scope less and therefore behave like
> CPU local BKLs. What local_lock() provides in these cases is:
> 
>   - Making the protection scope clear by associating a named local
>     lock which is coverred by lockdep.
> 
>   - It still maps to preempt_disable() or local_irq_disable() in !RT
>     kernels
> 
>   - The scope and the named lock allows RT kernels to substitute with
>     real (recursion aware) locking primitives which keep preemption and
>     interupts enabled, but provide the fine grained protection for the
>     scoped critical section.

I'm very much aware of the above.

>   
> So how would you substitute migrate_disable() with a local_lock()? You
> can't. Again migrate_disable() is NOT a concurrency control and
> therefore it cannot be substituted by any concurrency control primitive.

When I was first writing my email, I was writing about a way to replace
migrate_disable with a construct similar to local locks without
actually mentioning local locks, but then rewrote it to state local
locks, trying to simplify what I was writing. I shouldn't have done
that, because it portrayed that I wanted to use local_lock()
unmodified. I was actually thinking of a new construct that was similar
but not exactly the same as local lock.

But this will just make things more complex and we can forget about it.

I'll wait to see what Peter produces.

-- Steve

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends
  2020-09-24  8:27                       ` peterz
@ 2020-09-24 19:36                         ` Daniel Bristot de Oliveira
  0 siblings, 0 replies; 54+ messages in thread
From: Daniel Bristot de Oliveira @ 2020-09-24 19:36 UTC (permalink / raw)
  To: peterz, Steven Rostedt
  Cc: Thomas Gleixner, Linus Torvalds, LKML, linux-arch, Paul McKenney,
	the arch/x86 maintainers, Sebastian Andrzej Siewior, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Will Deacon, Andrew Morton, Linux-MM, Russell King, Linux ARM,
	Chris Zankel, Max Filippov, linux-xtensa, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, David Airlie, Daniel Vetter,
	intel-gfx, dri-devel, Ard Biesheuvel, Herbert Xu, Vineet Gupta,
	open list:SYNOPSYS ARC ARCHITECTURE, Arnd Bergmann, Guo Ren,
	linux-csky, Michal Simek, Thomas Bogendoerfer, linux-mips,
	Nick Hu, Greentime Hu, Vincent Chen, Michael Ellerman,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	David S. Miller, linux-sparc

On 9/24/20 10:27 AM, peterz@infradead.org wrote:
> So my current todo list is:
> 
>  - Change RT PULL
>  - Change DL PULL
>  - Add migrate_disable() tracer; exactly like preempt/irqoff, except
>    measuring task-runtime instead of cpu-time.
>  - Add a mode that measures actual interference.
>  - Add a traceevent to detect preemption in migrate_disable().
> 
> 
> And then I suppose I should twist Daniel's arm to update his model to
> include these scenarios and numbers.

Challenge accepted :-)

-- Daniel


^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, back to index

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-19  9:17 [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Thomas Gleixner
2020-09-19  9:17 ` [patch RFC 01/15] mm/highmem: Un-EXPORT __kmap_atomic_idx() Thomas Gleixner
2020-09-21  6:23   ` Christoph Hellwig
2020-09-19  9:17 ` [patch RFC 02/15] highmem: Provide generic variant of kmap_atomic* Thomas Gleixner
2020-09-21  6:28   ` Christoph Hellwig
2020-09-19  9:17 ` [patch RFC 03/15] x86/mm/highmem: Use generic kmap atomic implementation Thomas Gleixner
2020-09-19  9:17 ` [patch RFC 04/15] arc/mm/highmem: " Thomas Gleixner
2020-09-19  9:17 ` [patch RFC 05/15] ARM: highmem: Switch to generic kmap atomic Thomas Gleixner
2020-09-19  9:17 ` [patch RFC 06/15] csky/mm/highmem: " Thomas Gleixner
2020-09-23  0:05   ` Guo Ren
2020-09-19  9:17 ` [patch RFC 07/15] microblaze/mm/highmem: " Thomas Gleixner
2020-09-19  9:17 ` [patch RFC 08/15] mips/mm/highmem: " Thomas Gleixner
2020-09-19  9:18 ` [patch RFC 09/15] nds32/mm/highmem: " Thomas Gleixner
2020-09-19  9:18 ` [patch RFC 10/15] powerpc/mm/highmem: " Thomas Gleixner
2020-09-19  9:18 ` [patch RFC 11/15] sparc/mm/highmem: " Thomas Gleixner
2020-09-19  9:18 ` [patch RFC 12/15] xtensa/mm/highmem: " Thomas Gleixner
2020-09-19  9:18 ` [patch RFC 13/15] mm/highmem: Remove the old kmap_atomic cruft Thomas Gleixner
2020-09-19  9:18 ` [patch RFC 14/15] sched: highmem: Store temporary kmaps in task struct Thomas Gleixner
2020-09-19  9:18 ` [patch RFC 15/15] mm/highmem: Provide kmap_temporary* Thomas Gleixner
2020-09-19 10:35 ` [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends Daniel Vetter
2020-09-19 10:37   ` Daniel Vetter
2020-09-20  6:23     ` Thomas Gleixner
2020-09-20  8:23       ` Daniel Vetter
2020-09-20 17:24         ` Thomas Gleixner
2020-09-19 17:18 ` Linus Torvalds
2020-09-19 17:39   ` Matthew Wilcox
2020-09-19 19:13     ` Linus Torvalds
2020-09-21 19:58     ` Ira Weiny
2020-09-20  6:41   ` Thomas Gleixner
2020-09-20  8:49     ` Thomas Gleixner
2020-09-20 16:57       ` Linus Torvalds
2020-09-20 17:40         ` Thomas Gleixner
2020-09-20 17:42           ` Linus Torvalds
2020-09-20 17:58             ` Linus Torvalds
2020-09-21  7:39             ` Thomas Gleixner
2020-09-21 16:24               ` Linus Torvalds
2020-09-21 19:27                 ` Thomas Gleixner
2020-09-23  8:40                   ` peterz
2020-09-23 13:35                     ` Thomas Gleixner
2020-09-23 15:52                     ` Steven Rostedt
2020-09-23 20:55                       ` Thomas Gleixner
2020-09-23 21:12                         ` Steven Rostedt
2020-09-24  6:57                           ` Thomas Gleixner
2020-09-24 12:32                             ` Steven Rostedt
2020-09-24 12:42                               ` Peter Zijlstra
2020-09-24 13:51                                 ` Steven Rostedt
2020-09-24 13:58                                   ` Peter Zijlstra
2020-09-24 17:55                               ` Thomas Gleixner
2020-09-24 18:58                                 ` Steven Rostedt
2020-09-24  8:27                       ` peterz
2020-09-24 19:36                         ` Daniel Bristot de Oliveira
2020-09-23 10:19                   ` peterz
2020-09-23 12:33                     ` Thomas Gleixner
2020-09-23 14:33                   ` Thomas Gleixner

Linux-csky Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-csky/0 linux-csky/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-csky linux-csky/ https://lore.kernel.org/linux-csky \
		linux-csky@vger.kernel.org
	public-inbox-index linux-csky

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-csky


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git