* [RFC PATCH 0/7] Kernel huge I/O mapping support
From: Toshi Kani @ 2015-01-26 23:13 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo, arnd, linux-mm; +Cc: x86, linux-kernel

ioremap() and its related interfaces are used to create I/O
mappings to memory-mapped I/O devices.  The mapping sizes of
traditional I/O devices are relatively small.  Non-volatile
memory (NVM), however, spans many gigabytes today and will soon
reach terabytes.  Creating such large I/O mappings with 4KB pages
is inefficient.

This patch extends the ioremap() interfaces to transparently
create I/O mappings with huge pages.  There is no change necessary
to the drivers using ioremap().  Using huge pages will improve
performance of NVM and other devices with large memory, and reduce
the time to create their mappings as well.

The patchset introduces the following configs:
 HUGE_IOMAP - When selected, enables huge I/O mappings.  Requires
              HAVE_ARCH_HUGE_VMAP to be set.
 HAVE_ARCH_HUGE_VMAP - Indicates that the architecture supports huge
              KVA mappings

Patches 1-4 change common files to support huge I/O mappings.  There
is no functional change until HUGE_IOMAP is set by patch 7.

Patches 5 and 6 implement the HAVE_ARCH_HUGE_VMAP and HUGE_IOMAP
functions on x86, and select HAVE_ARCH_HUGE_VMAP on x86.

Patch 7 adds HUGE_IOMAP to Kconfig, which is set to Y by default on
x86.

---
Toshi Kani (7):
  1/7 mm: Change __get_vm_area_node() to use fls_long()
  2/7 lib: Add huge I/O map capability interfaces
  3/7 mm: Change ioremap to set up huge I/O mappings
  4/7 mm: Change vunmap to tear down huge KVA mappings
  5/7 x86, mm: Support huge KVA mappings on x86
  6/7 x86, mm: Support huge I/O mappings on x86
  7/7 mm: Add config HUGE_IOMAP to enable huge I/O mappings

---
 Documentation/kernel-parameters.txt |  2 ++
 arch/Kconfig                        |  3 ++
 arch/x86/Kconfig                    |  1 +
 arch/x86/include/asm/page_types.h   |  8 +++++
 arch/x86/mm/ioremap.c               | 16 ++++++++++
 arch/x86/mm/pgtable.c               | 32 ++++++++++++++++++++
 include/asm-generic/pgtable.h       | 12 ++++++++
 include/linux/io.h                  |  5 ++++
 lib/ioremap.c                       | 60 +++++++++++++++++++++++++++++++++++++
 mm/Kconfig                          | 10 +++++++
 mm/vmalloc.c                        |  8 ++++-
 11 files changed, 156 insertions(+), 1 deletion(-)

* [RFC PATCH 1/7] mm: Change __get_vm_area_node() to use fls_long()
From: Toshi Kani @ 2015-01-26 23:13 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo, arnd, linux-mm; +Cc: x86, linux-kernel, Toshi Kani

__get_vm_area_node() takes an unsigned long size, which is a 64-bit
value on a 64-bit kernel.  However, fls(size) simply ignores the
upper 32 bits.  Change the code to use fls_long(), which handles
the full size properly.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 mm/vmalloc.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 39c3388..830a4be 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -29,6 +29,7 @@
 #include <linux/atomic.h>
 #include <linux/compiler.h>
 #include <linux/llist.h>
+#include <linux/bitops.h>
 
 #include <asm/uaccess.h>
 #include <asm/tlbflush.h>
@@ -1314,7 +1315,8 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,
 
 	BUG_ON(in_interrupt());
 	if (flags & VM_IOREMAP)
-		align = 1ul << clamp(fls(size), PAGE_SHIFT, IOREMAP_MAX_ORDER);
+		align = 1ul << clamp((int)fls_long(size),
+				     PAGE_SHIFT, IOREMAP_MAX_ORDER);
 
 	size = PAGE_ALIGN(size);
 	if (unlikely(!size))

* [RFC PATCH 2/7] lib: Add huge I/O map capability interfaces
From: Toshi Kani @ 2015-01-26 23:13 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo, arnd, linux-mm; +Cc: x86, linux-kernel, Toshi Kani

Add ioremap_pud_enabled() and ioremap_pmd_enabled(), which
return 1 when I/O mappings with pud/pmd are enabled in the kernel.

ioremap_huge_init() calls arch_ioremap_pud_supported() and
arch_ioremap_pmd_supported() to initialize these capabilities.

A new kernel option, "nohgiomap", is also added so that users can
disable the huge I/O map capabilities if necessary.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 Documentation/kernel-parameters.txt |    2 ++
 include/linux/io.h                  |    5 ++++
 lib/ioremap.c                       |   44 +++++++++++++++++++++++++++++++++++
 3 files changed, 51 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 176d4fe..e3de01c 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2304,6 +2304,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			register save and restore. The kernel will only save
 			legacy floating-point registers on task switch.
 
+	nohgiomap	[KNL,x86] Disable huge I/O mappings.
+
 	noxsave		[BUGS=X86] Disables x86 extended register state save
 			and restore using xsave. The kernel will fallback to
 			enabling legacy floating-point and sse state.
diff --git a/include/linux/io.h b/include/linux/io.h
index fa02e55..8f5c8af 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -38,6 +38,11 @@ static inline int ioremap_page_range(unsigned long addr, unsigned long end,
 }
 #endif
 
+#ifdef CONFIG_HUGE_IOMAP
+int arch_ioremap_pud_supported(void);
+int arch_ioremap_pmd_supported(void);
+#endif
+
 /*
  * Managed iomap interface
  */
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 0c9216c..0a1ecb6 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -13,6 +13,44 @@
 #include <asm/cacheflush.h>
 #include <asm/pgtable.h>
 
+#ifdef CONFIG_HUGE_IOMAP
+int __read_mostly ioremap_pud_capable;
+int __read_mostly ioremap_pmd_capable;
+int __read_mostly ioremap_huge_disabled;
+
+static int __init set_nohgiomap(char *str)
+{
+	ioremap_huge_disabled = 1;
+	return 0;
+}
+early_param("nohgiomap", set_nohgiomap);
+
+static inline void ioremap_huge_init(void)
+{
+	if (!ioremap_huge_disabled) {
+		if (arch_ioremap_pud_supported())
+			ioremap_pud_capable = 1;
+		if (arch_ioremap_pmd_supported())
+			ioremap_pmd_capable = 1;
+	}
+}
+
+static inline int ioremap_pud_enabled(void)
+{
+	return ioremap_pud_capable;
+}
+
+static inline int ioremap_pmd_enabled(void)
+{
+	return ioremap_pmd_capable;
+}
+
+#else	/* !CONFIG_HUGE_IOMAP */
+static inline void ioremap_huge_init(void) { }
+static inline int ioremap_pud_enabled(void) { return 0; }
+static inline int ioremap_pmd_enabled(void) { return 0; }
+#endif	/* CONFIG_HUGE_IOMAP */
+
 static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
 		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
 {
@@ -74,6 +112,12 @@ int ioremap_page_range(unsigned long addr,
 	unsigned long start;
 	unsigned long next;
 	int err;
+	static int ioremap_huge_init_done;
+
+	if (!ioremap_huge_init_done) {
+		ioremap_huge_init_done = 1;
+		ioremap_huge_init();
+	}
 
 	BUG_ON(addr >= end);
 

* [RFC PATCH 3/7] mm: Change ioremap to set up huge I/O mappings
From: Toshi Kani @ 2015-01-26 23:13 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo, arnd, linux-mm; +Cc: x86, linux-kernel, Toshi Kani

Change ioremap_pud_range() and ioremap_pmd_range() to set up
huge I/O mappings when the capability is enabled and the conditions
of a given request are met -- both the virtual and physical
addresses are aligned to the mapping size, and the range is large
enough to fill it.

These changes are only enabled when both CONFIG_HUGE_IOMAP
and CONFIG_HAVE_ARCH_HUGE_VMAP are defined.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/Kconfig                  |    3 +++
 include/asm-generic/pgtable.h |    8 ++++++++
 lib/ioremap.c                 |   16 ++++++++++++++++
 3 files changed, 27 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index 05d7a8a..55c4440 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -446,6 +446,9 @@ config HAVE_IRQ_TIME_ACCOUNTING
 config HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	bool
 
+config HAVE_ARCH_HUGE_VMAP
+	bool
+
 config HAVE_ARCH_SOFT_DIRTY
 	bool
 
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 177d597..7dc3838 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -847,4 +847,12 @@ static inline void pmdp_set_numa(struct mm_struct *mm, unsigned long addr,
 #define io_remap_pfn_range remap_pfn_range
 #endif
 
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+void pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
+void pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
+#else	/* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+static inline void pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot) { }
+static inline void pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot) { }
+#endif	/* CONFIG_HAVE_ARCH_HUGE_VMAP */
+
 #endif /* _ASM_GENERIC_PGTABLE_H */
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 0a1ecb6..01b70aa 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -81,6 +81,14 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
 		return -ENOMEM;
 	do {
 		next = pmd_addr_end(addr, end);
+
+		if (ioremap_pmd_enabled() &&
+		    ((next - addr) == PMD_SIZE) &&
+		    !((phys_addr + addr) & (PMD_SIZE-1))) {
+			pmd_set_huge(pmd, phys_addr + addr, prot);
+			continue;
+		}
+
 		if (ioremap_pte_range(pmd, addr, next, phys_addr + addr, prot))
 			return -ENOMEM;
 	} while (pmd++, addr = next, addr != end);
@@ -99,6 +107,14 @@ static inline int ioremap_pud_range(pgd_t *pgd, unsigned long addr,
 		return -ENOMEM;
 	do {
 		next = pud_addr_end(addr, end);
+
+		if (ioremap_pud_enabled() &&
+		    ((next - addr) == PUD_SIZE) &&
+		    !((phys_addr + addr) & (PUD_SIZE-1))) {
+			pud_set_huge(pud, phys_addr + addr, prot);
+			continue;
+		}
+
 		if (ioremap_pmd_range(pud, addr, next, phys_addr + addr, prot))
 			return -ENOMEM;
 	} while (pud++, addr = next, addr != end);

* [RFC PATCH 4/7] mm: Change vunmap to tear down huge KVA mappings
From: Toshi Kani @ 2015-01-26 23:13 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo, arnd, linux-mm; +Cc: x86, linux-kernel, Toshi Kani

Change vunmap_pmd_range() and vunmap_pud_range() to tear down
huge KVA mappings when they are set.

These changes are only enabled when CONFIG_HAVE_ARCH_HUGE_VMAP
is defined.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 include/asm-generic/pgtable.h |    4 ++++
 mm/vmalloc.c                  |    4 ++++
 2 files changed, 8 insertions(+)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 7dc3838..1204ea6 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -850,9 +850,13 @@ static inline void pmdp_set_numa(struct mm_struct *mm, unsigned long addr,
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
 void pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
 void pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
+int pud_clear_huge(pud_t *pud);
+int pmd_clear_huge(pmd_t *pmd);
 #else	/* !CONFIG_HAVE_ARCH_HUGE_VMAP */
 static inline void pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot) { }
 static inline void pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot) { }
+static inline int pud_clear_huge(pud_t *pud) { return 0; }
+static inline int pmd_clear_huge(pmd_t *pmd) { return 0; }
 #endif	/* CONFIG_HAVE_ARCH_HUGE_VMAP */
 
 #endif /* _ASM_GENERIC_PGTABLE_H */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 830a4be..c9490fe 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -75,6 +75,8 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end)
 	pmd = pmd_offset(pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
+		if (pmd_clear_huge(pmd))
+			continue;
 		if (pmd_none_or_clear_bad(pmd))
 			continue;
 		vunmap_pte_range(pmd, addr, next);
@@ -89,6 +91,8 @@ static void vunmap_pud_range(pgd_t *pgd, unsigned long addr, unsigned long end)
 	pud = pud_offset(pgd, addr);
 	do {
 		next = pud_addr_end(addr, end);
+		if (pud_clear_huge(pud))
+			continue;
 		if (pud_none_or_clear_bad(pud))
 			continue;
 		vunmap_pmd_range(pud, addr, next);

* [RFC PATCH 5/7] x86, mm: Support huge KVA mappings on x86
From: Toshi Kani @ 2015-01-26 23:13 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo, arnd, linux-mm; +Cc: x86, linux-kernel, Toshi Kani

Implement huge KVA mapping interfaces and select
CONFIG_HAVE_ARCH_HUGE_VMAP on x86.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/x86/Kconfig      |    1 +
 arch/x86/mm/pgtable.c |   32 ++++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0dc9d01..8150038 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -97,6 +97,7 @@ config X86
 	select IRQ_FORCED_THREADING
 	select HAVE_BPF_JIT if X86_64
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
+	select HAVE_ARCH_HUGE_VMAP
 	select ARCH_HAS_SG_CHAIN
 	select CLKEVT_I8253
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 6fb6927..e113d69 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -481,3 +481,35 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
 {
 	__native_set_fixmap(idx, pfn_pte(phys >> PAGE_SHIFT, flags));
 }
+
+void pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
+{
+	set_pte((pte_t *)pud, pfn_pte(
+		(u64)addr >> PAGE_SHIFT,
+		__pgprot(pgprot_val(prot) | _PAGE_PSE)));
+}
+
+void pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
+{
+	set_pte((pte_t *)pmd, pfn_pte(
+		(u64)addr >> PAGE_SHIFT,
+		__pgprot(pgprot_val(prot) | _PAGE_PSE)));
+}
+
+int pud_clear_huge(pud_t *pud)
+{
+	if (pud_large(*pud)) {
+		pud_clear(pud);
+		return 1;
+	}
+	return 0;
+}
+
+int pmd_clear_huge(pmd_t *pmd)
+{
+	if (pmd_large(*pmd)) {
+		pmd_clear(pmd);
+		return 1;
+	}
+	return 0;
+}

* [RFC PATCH 6/7] x86, mm: Support huge I/O mappings on x86
From: Toshi Kani @ 2015-01-26 23:13 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo, arnd, linux-mm; +Cc: x86, linux-kernel, Toshi Kani

Implement huge I/O mapping capability interfaces on x86.

Also define IOREMAP_MAX_ORDER to the PUD or PMD size on x86.
When IOREMAP_MAX_ORDER is not defined on x86, it falls back to
the default value defined in <linux/vmalloc.h>.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/x86/include/asm/page_types.h |    8 ++++++++
 arch/x86/mm/ioremap.c             |   16 ++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index f97fbe3..debbfff 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -38,6 +38,14 @@
 
 #define __START_KERNEL		(__START_KERNEL_map + __PHYSICAL_START)
 
+#ifdef CONFIG_HUGE_IOMAP
+#ifdef CONFIG_X86_64
+#define IOREMAP_MAX_ORDER       (PUD_SHIFT)
+#elif defined(PMD_SHIFT)
+#define IOREMAP_MAX_ORDER       (PMD_SHIFT)
+#endif
+#endif  /* CONFIG_HUGE_IOMAP */
+
 #ifdef CONFIG_X86_64
 #include <asm/page_64_types.h>
 #else
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index fdf617c..ef32bec 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -326,6 +326,22 @@ void iounmap(volatile void __iomem *addr)
 }
 EXPORT_SYMBOL(iounmap);
 
+#ifdef CONFIG_HUGE_IOMAP
+int arch_ioremap_pud_supported(void)
+{
+#ifdef CONFIG_X86_64
+	return cpu_has_gbpages;
+#else
+	return 0;
+#endif
+}
+
+int arch_ioremap_pmd_supported(void)
+{
+	return cpu_has_pse;
+}
+#endif	/* CONFIG_HUGE_IOMAP */
+
 /*
  * Convert a physical pointer to a virtual kernel pointer for /dev/mem
  * access

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 7/7] mm: Add config HUGE_IOMAP to enable huge I/O mappings
  2015-01-26 23:13 ` Toshi Kani
@ 2015-01-26 23:13   ` Toshi Kani
  -1 siblings, 0 replies; 30+ messages in thread
From: Toshi Kani @ 2015-01-26 23:13 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo, arnd, linux-mm; +Cc: x86, linux-kernel, Toshi Kani

Add config HUGE_IOMAP to enable huge I/O mappings.  This feature
is set to Y by default when HAVE_ARCH_HUGE_VMAP is defined on the
architecture.

Note that the user can also disable this feature at boot time with
the kernel option "nohgiomap" if necessary.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 mm/Kconfig |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 1d1ae6b..eb738ae 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -444,6 +444,17 @@ choice
 	  benefit.
 endchoice
 
+config HUGE_IOMAP
+	bool "Kernel huge I/O mapping support"
+	depends on HAVE_ARCH_HUGE_VMAP
+	default y
+	help
+	  Kernel huge I/O mapping allows the kernel to transparently
+	  create I/O mappings with huge pages for memory-mapped I/O
+	  devices whenever possible.  This feature can improve the
+	  performance of devices with a large memory range, such as
+	  NVM, and reduce the time needed to create their mappings.
+
 #
 # UP and nommu archs use km based percpu allocator
 #

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/7] lib: Add huge I/O map capability interfaces
  2015-01-26 23:13   ` Toshi Kani
@ 2015-01-26 23:54     ` Andrew Morton
  -1 siblings, 0 replies; 30+ messages in thread
From: Andrew Morton @ 2015-01-26 23:54 UTC (permalink / raw)
  To: Toshi Kani; +Cc: hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel

On Mon, 26 Jan 2015 16:13:24 -0700 Toshi Kani <toshi.kani@hp.com> wrote:

> Add ioremap_pud_enabled() and ioremap_pmd_enabled(), which
> return 1 when I/O mappings of pud/pmd are enabled on the kernel.
> 
> ioremap_huge_init() calls arch_ioremap_pud_supported() and
> arch_ioremap_pmd_supported() to initialize the capabilities.
> 
> A new kernel option "nohgiomap" is also added, so that user can
> disable the huge I/O map capabilities if necessary.

Why?  What's the problem with leaving it enabled?

> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2304,6 +2304,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  			register save and restore. The kernel will only save
>  			legacy floating-point registers on task switch.
>  
> +	nohgiomap	[KNL,x86] Disable huge I/O mappings.

That reads like "no high iomap" to me.  "nohugeiomap" would be better.

> --- a/lib/ioremap.c
> +++ b/lib/ioremap.c
> @@ -13,6 +13,44 @@
>  #include <asm/cacheflush.h>
>  #include <asm/pgtable.h>
>  
> +#ifdef CONFIG_HUGE_IOMAP
> +int __read_mostly ioremap_pud_capable;
> +int __read_mostly ioremap_pmd_capable;
> +int __read_mostly ioremap_huge_disabled;
> +
> +static int __init set_nohgiomap(char *str)
> +{
> +	ioremap_huge_disabled = 1;
> +	return 0;
> +}
> +early_param("nohgiomap", set_nohgiomap);

Why early?

> +static inline void ioremap_huge_init(void)
> +{
> +	if (!ioremap_huge_disabled) {
> +		if (arch_ioremap_pud_supported())
> +			ioremap_pud_capable = 1;
> +		if (arch_ioremap_pmd_supported())
> +			ioremap_pmd_capable = 1;
> +	}
> +}
> +
> +static inline int ioremap_pud_enabled(void)
> +{
> +	return ioremap_pud_capable;
> +}
> +
> +static inline int ioremap_pmd_enabled(void)
> +{
> +	return ioremap_pmd_capable;
> +}
> +
> +#else	/* !CONFIG_HUGE_IOMAP */
> +static inline void ioremap_huge_init(void) { }
> +static inline int ioremap_pud_enabled(void) { return 0; }
> +static inline int ioremap_pmd_enabled(void) { return 0; }
> +#endif	/* CONFIG_HUGE_IOMAP */
> +
>  static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
>  		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
>  {
> @@ -74,6 +112,12 @@ int ioremap_page_range(unsigned long addr,
>  	unsigned long start;
>  	unsigned long next;
>  	int err;
> +	static int ioremap_huge_init_done;
> +
> +	if (!ioremap_huge_init_done) {
> +		ioremap_huge_init_done = 1;
> +		ioremap_huge_init();
> +	}

Looks hacky.  Why can't we just get the startup ordering correct?  It
at least needs a comment which fully explains the situation.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 3/7] mm: Change ioremap to set up huge I/O mappings
  2015-01-26 23:13   ` Toshi Kani
@ 2015-01-26 23:58     ` Andrew Morton
  -1 siblings, 0 replies; 30+ messages in thread
From: Andrew Morton @ 2015-01-26 23:58 UTC (permalink / raw)
  To: Toshi Kani; +Cc: hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel

On Mon, 26 Jan 2015 16:13:25 -0700 Toshi Kani <toshi.kani@hp.com> wrote:

> Change ioremap_pud_range() and ioremap_pmd_range() to set up
> huge I/O mappings when their capability is enabled and their
> conditions are met in a given request -- both virtual & physical
> addresses are aligned and its range fulfills the mapping size.
> 
> These changes are only enabled when both CONFIG_HUGE_IOMAP
> and CONFIG_HAVE_ARCH_HUGE_VMAP are defined.
> 
> --- a/lib/ioremap.c
> +++ b/lib/ioremap.c
> @@ -81,6 +81,14 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
>  		return -ENOMEM;
>  	do {
>  		next = pmd_addr_end(addr, end);
> +
> +		if (ioremap_pmd_enabled() &&
> +		    ((next - addr) == PMD_SIZE) &&
> +		    !((phys_addr + addr) & (PMD_SIZE-1))) {

IS_ALIGNED might be a little neater here.

> +			pmd_set_huge(pmd, phys_addr + addr, prot);
> +			continue;
> +		}
> +
>  		if (ioremap_pte_range(pmd, addr, next, phys_addr + addr, prot))
>  			return -ENOMEM;
>  	} while (pmd++, addr = next, addr != end);
> @@ -99,6 +107,14 @@ static inline int ioremap_pud_range(pgd_t *pgd, unsigned long addr,
>  		return -ENOMEM;
>  	do {
>  		next = pud_addr_end(addr, end);
> +
> +		if (ioremap_pud_enabled() &&
> +		    ((next - addr) == PUD_SIZE) &&
> +		    !((phys_addr + addr) & (PUD_SIZE-1))) {

And here.

> +			pud_set_huge(pud, phys_addr + addr, prot);
> +			continue;
> +		}
> +
>  		if (ioremap_pmd_range(pud, addr, next, phys_addr + addr, prot))
>  			return -ENOMEM;
>  	} while (pud++, addr = next, addr != end);

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 3/7] mm: Change ioremap to set up huge I/O mappings
  2015-01-26 23:58     ` Andrew Morton
@ 2015-01-27  0:01       ` Toshi Kani
  -1 siblings, 0 replies; 30+ messages in thread
From: Toshi Kani @ 2015-01-27  0:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel

On Mon, 2015-01-26 at 15:58 -0800, Andrew Morton wrote:
> On Mon, 26 Jan 2015 16:13:25 -0700 Toshi Kani <toshi.kani@hp.com> wrote:
> 
> > Change ioremap_pud_range() and ioremap_pmd_range() to set up
> > huge I/O mappings when their capability is enabled and their
> > conditions are met in a given request -- both virtual & physical
> > addresses are aligned and its range fulfills the mapping size.
> > 
> > These changes are only enabled when both CONFIG_HUGE_IOMAP
> > and CONFIG_HAVE_ARCH_HUGE_VMAP are defined.
> > 
> > --- a/lib/ioremap.c
> > +++ b/lib/ioremap.c
> > @@ -81,6 +81,14 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
> >  		return -ENOMEM;
> >  	do {
> >  		next = pmd_addr_end(addr, end);
> > +
> > +		if (ioremap_pmd_enabled() &&
> > +		    ((next - addr) == PMD_SIZE) &&
> > +		    !((phys_addr + addr) & (PMD_SIZE-1))) {
> 
> IS_ALIGNED might be a little neater here.

Right.  Will use IS_ALIGNED.

> > +			pmd_set_huge(pmd, phys_addr + addr, prot);
> > +			continue;
> > +		}
> > +
> >  		if (ioremap_pte_range(pmd, addr, next, phys_addr + addr, prot))
> >  			return -ENOMEM;
> >  	} while (pmd++, addr = next, addr != end);
> > @@ -99,6 +107,14 @@ static inline int ioremap_pud_range(pgd_t *pgd, unsigned long addr,
> >  		return -ENOMEM;
> >  	do {
> >  		next = pud_addr_end(addr, end);
> > +
> > +		if (ioremap_pud_enabled() &&
> > +		    ((next - addr) == PUD_SIZE) &&
> > +		    !((phys_addr + addr) & (PUD_SIZE-1))) {
> 
> And here.

Will do.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/7] lib: Add huge I/O map capability interfaces
  2015-01-26 23:54     ` Andrew Morton
@ 2015-01-27  1:01       ` Toshi Kani
  -1 siblings, 0 replies; 30+ messages in thread
From: Toshi Kani @ 2015-01-27  1:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel

On Mon, 2015-01-26 at 15:54 -0800, Andrew Morton wrote:
> On Mon, 26 Jan 2015 16:13:24 -0700 Toshi Kani <toshi.kani@hp.com> wrote:
> 
> > Add ioremap_pud_enabled() and ioremap_pmd_enabled(), which
> > return 1 when I/O mappings of pud/pmd are enabled on the kernel.
> > 
> > ioremap_huge_init() calls arch_ioremap_pud_supported() and
> > arch_ioremap_pmd_supported() to initialize the capabilities.
> > 
> > A new kernel option "nohgiomap" is also added, so that user can
> > disable the huge I/O map capabilities if necessary.
> 
> Why?  What's the problem with leaving it enabled?

No, there should not be any problem with leaving it enabled.  This
option is added as a way to work around a problem in case someone
unexpectedly hits an issue.

> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -2304,6 +2304,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> >  			register save and restore. The kernel will only save
> >  			legacy floating-point registers on task switch.
> >  
> > +	nohgiomap	[KNL,x86] Disable huge I/O mappings.
> 
> That reads like "no high iomap" to me.  "nohugeiomap" would be better.

Agreed.  Will use "nohugeiomap".

> > --- a/lib/ioremap.c
> > +++ b/lib/ioremap.c
> > @@ -13,6 +13,44 @@
> >  #include <asm/cacheflush.h>
> >  #include <asm/pgtable.h>
> >  
> > +#ifdef CONFIG_HUGE_IOMAP
> > +int __read_mostly ioremap_pud_capable;
> > +int __read_mostly ioremap_pmd_capable;
> > +int __read_mostly ioremap_huge_disabled;
> > +
> > +static int __init set_nohgiomap(char *str)
> > +{
> > +	ioremap_huge_disabled = 1;
> > +	return 0;
> > +}
> > +early_param("nohgiomap", set_nohgiomap);
> 
> Why early?

On my system, the first ioremap() call is made at:

  start_kernel()
   -> late_time_init()
     -> x86_late_time_init()
       -> hpet_time_init()

I think this is too early for module_param().  Also, lib/ioremap.c is
not really a module.

> > +static inline void ioremap_huge_init(void)
> > +{
> > +	if (!ioremap_huge_disabled) {
> > +		if (arch_ioremap_pud_supported())
> > +			ioremap_pud_capable = 1;
> > +		if (arch_ioremap_pmd_supported())
> > +			ioremap_pmd_capable = 1;
> > +	}
> > +}
> > +
> > +static inline int ioremap_pud_enabled(void)
> > +{
> > +	return ioremap_pud_capable;
> > +}
> > +
> > +static inline int ioremap_pmd_enabled(void)
> > +{
> > +	return ioremap_pmd_capable;
> > +}
> > +
> > +#else	/* !CONFIG_HUGE_IOMAP */
> > +static inline void ioremap_huge_init(void) { }
> > +static inline int ioremap_pud_enabled(void) { return 0; }
> > +static inline int ioremap_pmd_enabled(void) { return 0; }
> > +#endif	/* CONFIG_HUGE_IOMAP */
> > +
> >  static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
> >  		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
> >  {
> > @@ -74,6 +112,12 @@ int ioremap_page_range(unsigned long addr,
> >  	unsigned long start;
> >  	unsigned long next;
> >  	int err;
> > +	static int ioremap_huge_init_done;
> > +
> > +	if (!ioremap_huge_init_done) {
> > +		ioremap_huge_init_done = 1;
> > +		ioremap_huge_init();
> > +	}
> 
> Looks hacky.  Why can't we just get the startup ordering correct?  It
> at least needs a comment which fully explains the situation.

How about calling it from mm_init() after vmalloc_init()?  

void __init mm_init(void)
		:
        percpu_init_late();
        pgtable_init();
        vmalloc_init();
+       ioremap_huge_init();
 }

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/7] lib: Add huge I/O map capability interfaces
  2015-01-27  1:01       ` Toshi Kani
@ 2015-01-27 21:37         ` Andrew Morton
  -1 siblings, 0 replies; 30+ messages in thread
From: Andrew Morton @ 2015-01-27 21:37 UTC (permalink / raw)
  To: Toshi Kani; +Cc: hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel

On Mon, 26 Jan 2015 18:01:55 -0700 Toshi Kani <toshi.kani@hp.com> wrote:

> > >  static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
> > >  		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
> > >  {
> > > @@ -74,6 +112,12 @@ int ioremap_page_range(unsigned long addr,
> > >  	unsigned long start;
> > >  	unsigned long next;
> > >  	int err;
> > > +	static int ioremap_huge_init_done;
> > > +
> > > +	if (!ioremap_huge_init_done) {
> > > +		ioremap_huge_init_done = 1;
> > > +		ioremap_huge_init();
> > > +	}
> > 
> > Looks hacky.  Why can't we just get the startup ordering correct?  It
> > at least needs a comment which fully explains the situation.
> 
> How about calling it from mm_init() after vmalloc_init()?  
> 
> void __init mm_init(void)
> 		:
>         percpu_init_late();
>         pgtable_init();
>         vmalloc_init();
> +       ioremap_huge_init();
>  }

Sure, that would be better, assuming it can be made to work.  Don't
forget to mark ioremap_huge_init() as __init.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/7] lib: Add huge I/O map capability interfaces
  2015-01-27 21:37         ` Andrew Morton
@ 2015-01-27 22:01           ` Toshi Kani
  -1 siblings, 0 replies; 30+ messages in thread
From: Toshi Kani @ 2015-01-27 22:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel

On Tue, 2015-01-27 at 13:37 -0800, Andrew Morton wrote:
> On Mon, 26 Jan 2015 18:01:55 -0700 Toshi Kani <toshi.kani@hp.com> wrote:
> 
> > > >  static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
> > > >  		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
> > > >  {
> > > > @@ -74,6 +112,12 @@ int ioremap_page_range(unsigned long addr,
> > > >  	unsigned long start;
> > > >  	unsigned long next;
> > > >  	int err;
> > > > +	static int ioremap_huge_init_done;
> > > > +
> > > > +	if (!ioremap_huge_init_done) {
> > > > +		ioremap_huge_init_done = 1;
> > > > +		ioremap_huge_init();
> > > > +	}
> > > 
> > > Looks hacky.  Why can't we just get the startup ordering correct?  It
> > > at least needs a comment which fully explains the situation.
> > 
> > How about calling it from mm_init() after vmalloc_init()?  
> > 
> > void __init mm_init(void)
> > 		:
> >         percpu_init_late();
> >         pgtable_init();
> >         vmalloc_init();
> > +       ioremap_huge_init();
> >  }
> 
> Sure, that would be better, assuming it can be made to work.

Yes, I verified that ioremap() works right after this point.

> Don't forget to mark ioremap_huge_init() as __init.

Right.

Thanks,
-Toshi




* Re: [RFC PATCH 2/7] lib: Add huge I/O map capability interfaces
  2015-01-27  1:01       ` Toshi Kani
@ 2015-02-05 20:56         ` Toshi Kani
  -1 siblings, 0 replies; 30+ messages in thread
From: Toshi Kani @ 2015-02-05 20:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel, Elliott

On Mon, 2015-01-26 at 18:01 -0700, Toshi Kani wrote:
> On Mon, 2015-01-26 at 15:54 -0800, Andrew Morton wrote:
> > On Mon, 26 Jan 2015 16:13:24 -0700 Toshi Kani <toshi.kani@hp.com> wrote:
> > 
> > > Add ioremap_pud_enabled() and ioremap_pmd_enabled(), which
> > > return 1 when I/O mappings of pud/pmd are enabled on the kernel.
> > > 
> > > ioremap_huge_init() calls arch_ioremap_pud_supported() and
> > > arch_ioremap_pmd_supported() to initialize the capabilities.
> > > 
> > > A new kernel option "nohgiomap" is also added, so that user can
> > > disable the huge I/O map capabilities if necessary.
> > 
> > Why?  What's the problem with leaving it enabled?
> 
> No, there should not be any problem with leaving it enabled.  This
> option is added as a way to work around a problem if someone hits an
> issue unexpectedly.

The Intel SDM has a section on "large page size considerations", quoted
at the bottom of this email (thanks to Robert Elliott for the pointer).
It mentions two cases:

 1) When a large page is mapped to a region where MTRRs define multiple
different memory types, the processor can behave in an undefined manner.
 2) When a large page mapped over the first 1MB conflicts with the
fixed MTRRs, the processor maps the range with multiple 4KB pages.

Case 2) is not an issue here since ioremap() does not remap the ISA
space in the first 1MB, and the processor's "special" support handles
that conflict anyway.

For case 1), MTRR is a legacy feature, and a driver calling ioremap()
for a large range covered by multiple MTRRs with two different types
sounds very unlikely to me, but it is theoretically possible.  (Note
that /dev/mem uses remap_pfn_range(), not ioremap().)

Here are three options I can think of for case 1).

 A) ioremap() to change a requested type to UC in case of 1)
 B) ioremap() to force 4KB mappings in case of 1)
 C) ioremap() to have no special handling for case 1)

In option A), pat_x_mtrr_type(), called from reserve_memtype(), already
has special handling to convert a WB request to UC-.  This handling
would need to be changed to convert all request types to UC (not UC-)
in case 1).  Since reserve_memtype() is shared by other interfaces, it
would also need an additional argument indicating whether the caller
supports large page mappings, because this conversion is only needed
for large pages.

In option B), reserve_memtype() tells the caller that 4KB mappings must
be used in case 1) by returning 1.  All callers would need to handle
this new return value properly, and ioremap_page_range() would be
extended with an additional flag that forces 4KB mappings.

In option C), we only document this potential issue and add no special
handling for case 1), at least until we know the case really occurs in
the real world.

Options B), A), and C) handle case 1) in decreasing order of
thoroughness, but also of complexity and risk in the changes.  I am
willing to make the necessary changes (A or B), but I am also thinking
that we may be better off with C), since MTRRs are legacy.

Do you think we need to protect ioremap() callers from case 1)?  Any
thoughts/suggestions would be much appreciated.

Thanks,
-Toshi  

=====
11.11.9 Large Page Size Considerations

The MTRRs provide memory typing for a limited number of regions that
have a 4 KByte granularity (the same granularity as 4-KByte pages). The
memory type for a given page is cached in the processor’s TLBs. When
using large pages (2 MBytes, 4 MBytes, or 1 GBytes), a single page-table
entry covers multiple 4-KByte granules, each with a single memory type.
Because the memory type for a large page is cached in the TLB, the
processor can behave in an undefined manner if a large page is mapped to
a region of memory that MTRRs have mapped with multiple memory types. 

Undefined behavior can be avoided by insuring that all MTRR memory-type
ranges within a large page are of the same type. If a large page maps to
a region of memory containing different MTRR-defined memory types, the
PCD and PWT flags in the page-table entry should be set for the most
conservative memory type for that range. For example, a large page used
for memory mapped I/O and regular memory is mapped as UC memory.
Alternatively, the operating system can map the region using multiple
4-KByte pages each with its own memory type. 

The requirement that all 4-KByte ranges in a large page are of the same
memory type implies that large pages with different memory types may
suffer a performance penalty, since they must be marked with the lowest
common denominator memory type. The same considerations apply to 1 GByte
pages, each of which may consist of multiple 2-Mbyte ranges. 

The Pentium 4, Intel Xeon, and P6 family processors provide special
support for the physical memory range from 0 to 4 MBytes, which is
potentially mapped by both the fixed and variable MTRRs. This support is
invoked when a Pentium 4, Intel Xeon, or P6 family processor detects a
large page overlapping the first 1 MByte of this memory range with a
memory type that conflicts with the fixed MTRRs. Here, the processor
maps the memory range as multiple 4-KByte pages within the TLB. This
operation insures correct behavior at the cost of performance. To avoid 
this performance penalty, operating-system software should reserve the
large page option for regions of memory at addresses greater than or
equal to 4 MBytes.





end of thread, other threads:[~2015-02-05 20:56 UTC | newest]

Thread overview: 30+ messages
2015-01-26 23:13 [RFC PATCH 0/7] Kernel huge I/O mapping support Toshi Kani
2015-01-26 23:13 ` Toshi Kani
2015-01-26 23:13 ` [RFC PATCH 1/7] mm: Change __get_vm_area_node() to use fls_long() Toshi Kani
2015-01-26 23:13   ` Toshi Kani
2015-01-26 23:13 ` [RFC PATCH 2/7] lib: Add huge I/O map capability interfaces Toshi Kani
2015-01-26 23:13   ` Toshi Kani
2015-01-26 23:54   ` Andrew Morton
2015-01-26 23:54     ` Andrew Morton
2015-01-27  1:01     ` Toshi Kani
2015-01-27  1:01       ` Toshi Kani
2015-01-27 21:37       ` Andrew Morton
2015-01-27 21:37         ` Andrew Morton
2015-01-27 22:01         ` Toshi Kani
2015-01-27 22:01           ` Toshi Kani
2015-02-05 20:56       ` Toshi Kani
2015-02-05 20:56         ` Toshi Kani
2015-01-26 23:13 ` [RFC PATCH 3/7] mm: Change ioremap to set up huge I/O mappings Toshi Kani
2015-01-26 23:13   ` Toshi Kani
2015-01-26 23:58   ` Andrew Morton
2015-01-26 23:58     ` Andrew Morton
2015-01-27  0:01     ` Toshi Kani
2015-01-27  0:01       ` Toshi Kani
2015-01-26 23:13 ` [RFC PATCH 4/7] mm: Change vunmap to tear down huge KVA mappings Toshi Kani
2015-01-26 23:13   ` Toshi Kani
2015-01-26 23:13 ` [RFC PATCH 5/7] x86, mm: Support huge KVA mappings on x86 Toshi Kani
2015-01-26 23:13   ` Toshi Kani
2015-01-26 23:13 ` [RFC PATCH 6/7] x86, mm: Support huge I/O " Toshi Kani
2015-01-26 23:13   ` Toshi Kani
2015-01-26 23:13 ` [RFC PATCH 7/7] mm: Add config HUGE_IOMAP to enable huge I/O mappings Toshi Kani
2015-01-26 23:13   ` Toshi Kani
