* [PATCH v2 0/7] Kernel huge I/O mapping support
@ 2015-02-09 22:45 ` Toshi Kani
  0 siblings, 0 replies; 50+ messages in thread
From: Toshi Kani @ 2015-02-09 22:45 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo, arnd; +Cc: linux-mm, x86, linux-kernel, Elliott

ioremap() and its related interfaces are used to create I/O
mappings to memory-mapped I/O devices.  The mapping sizes of
the traditional I/O devices are relatively small.  Non-volatile
memory (NVM), however, has many GB and is going to have TB soon.
It is not very efficient to create large I/O mappings with 4KB pages.

This patchset extends the ioremap() interfaces to transparently
create I/O mappings with huge pages whenever possible.  ioremap()
continues to use 4KB mappings when a huge page does not fit into
a requested range.  There is no change necessary to the drivers
using ioremap().  However, a requested physical address must be aligned
to a huge page size (1GB or 2MB on x86) for a huge page mapping to be
used.  The kernel huge I/O mapping will improve performance of
NVM and other devices with large memory, and reduce the time to
create their mappings as well.
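
As a rough illustration of the alignment rule above, the following
user-space sketch counts how many 1GB, 2MB and 4KB entries a physical
range would need.  It is not code from this patchset: count_mappings()
and struct map_count are hypothetical names, and the sketch assumes that
the virtual and physical addresses share the same offset within a 1GB
page and that the size is a multiple of 4KB.

```c
#include <stdint.h>

#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)

/* Hypothetical helper type: number of entries of each size. */
struct map_count {
	uint64_t n_1g;
	uint64_t n_2m;
	uint64_t n_4k;
};

/*
 * Walk [phys, phys + size) and, at each step, take the largest page
 * size that is aligned at the current address and still fits.
 */
static struct map_count count_mappings(uint64_t phys, uint64_t size)
{
	struct map_count c = { 0, 0, 0 };
	uint64_t cur = phys;
	uint64_t end = phys + size;

	while (cur < end) {
		uint64_t left = end - cur;

		if ((cur & (SZ_1G - 1)) == 0 && left >= SZ_1G) {
			c.n_1g++;
			cur += SZ_1G;
		} else if ((cur & (SZ_2M - 1)) == 0 && left >= SZ_2M) {
			c.n_2m++;
			cur += SZ_2M;
		} else {
			c.n_4k++;
			cur += SZ_4K;
		}
	}
	return c;
}
```

Under these assumptions, a 2GB range starting at the 4GB boundary needs
two 1GB entries, where 4KB pages alone would need 524288 entries.  A 2MB
range that starts 4KB past a 2MB boundary still falls back to 4KB pages
for its whole length.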

On x86, the huge I/O mapping may not be used when a target range is
covered by multiple MTRRs with different memory types.  The caller
must make a separate request for each MTRR range, or the huge I/O
mapping can be disabled with the kernel boot option "nohugeiomap".
The details of this issue are described in the email below, and this
patchset takes option C) in favor of simplicity since MTRRs are a legacy
feature.
 https://lkml.org/lkml/2015/2/5/638

The patchset introduces the following configs:
 HUGE_IOMAP - When selected (default Y), enable huge I/O mappings.
              Require HAVE_ARCH_HUGE_VMAP set.
 HAVE_ARCH_HUGE_VMAP - Indicate arch supports huge KVA mappings.
                       Require X86_PAE set on X86_32.

Patches 1-4 change common files to support huge I/O mappings.  There
is no change in functionality until HUGE_IOMAP is set in patch 7.

Patches 5 and 6 implement HAVE_ARCH_HUGE_VMAP and HUGE_IOMAP funcs on x86,
and set HAVE_ARCH_HUGE_VMAP on x86.

Patch 7 adds HUGE_IOMAP to Kconfig, which is set to Y by default on
x86.

---
v2:
 - Addressed review comments from Andrew Morton.
 - Changed HAVE_ARCH_HUGE_VMAP to require X86_PAE set on X86_32.
 - Documented an x86 restriction with multiple MTRRs with different
   memory types.

---
Toshi Kani (7):
  1/7 mm: Change __get_vm_area_node() to use fls_long()
  2/7 lib: Add huge I/O map capability interfaces
  3/7 mm: Change ioremap to set up huge I/O mappings
  4/7 mm: Change vunmap to tear down huge KVA mappings
  5/7 x86, mm: Support huge KVA mappings on x86
  6/7 x86, mm: Support huge I/O mappings on x86
  7/7 mm: Add config HUGE_IOMAP to enable huge I/O mappings

---
 Documentation/kernel-parameters.txt |  2 ++
 arch/Kconfig                        |  3 +++
 arch/x86/Kconfig                    |  1 +
 arch/x86/include/asm/page_types.h   |  8 ++++++
 arch/x86/mm/ioremap.c               | 26 ++++++++++++++++--
 arch/x86/mm/pgtable.c               | 34 +++++++++++++++++++++++
 include/asm-generic/pgtable.h       | 12 +++++++++
 include/linux/io.h                  |  7 +++++
 init/main.c                         |  2 ++
 lib/ioremap.c                       | 54 +++++++++++++++++++++++++++++++++++++
 mm/Kconfig                          | 11 ++++++++
 mm/vmalloc.c                        |  8 +++++-
 12 files changed, 165 insertions(+), 3 deletions(-)


* [PATCH v2 1/7] mm: Change __get_vm_area_node() to use fls_long()
  2015-02-09 22:45 ` Toshi Kani
@ 2015-02-09 22:45   ` Toshi Kani
  -1 siblings, 0 replies; 50+ messages in thread
From: Toshi Kani @ 2015-02-09 22:45 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo, arnd
  Cc: linux-mm, x86, linux-kernel, Elliott, Toshi Kani

__get_vm_area_node() takes unsigned long size, which is a 64-bit
value on a 64-bit kernel.  However, fls(size) simply ignores the
upper 32 bits.  Change to use fls_long() to handle the size properly.
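
The truncation can be reproduced in user space.  The sketch below is not
kernel code: fls32() and fls64() are stand-ins written for this example
that mimic fls() and fls_long() on a 64-bit kernel, align_shift() is a
hypothetical helper that mimics the clamp, and the shift limits used in
the note that follows (12 and 30) are assumed values chosen only for
illustration.

```c
#include <stdint.h>

/* Position of the most significant set bit, 1-based; 0 if x == 0. */
static int fls32(uint32_t x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Same as fls32(), but looks at all 64 bits. */
static int fls64(uint64_t x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Clamp a bit position into [min_shift, max_shift]. */
static int align_shift(int msb, int min_shift, int max_shift)
{
	if (msb < min_shift)
		return min_shift;
	if (msb > max_shift)
		return max_shift;
	return msb;
}
```

For a 4GB request (1ULL << 32), fls32((uint32_t)size) returns 0, so
align_shift(0, 12, 30) falls back to the minimum shift of 12.  With
fls64(size) the result is 33, which is clamped to the maximum shift.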

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 mm/vmalloc.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 39c3388..40ea214 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -29,6 +29,7 @@
 #include <linux/atomic.h>
 #include <linux/compiler.h>
 #include <linux/llist.h>
+#include <linux/bitops.h>
 
 #include <asm/uaccess.h>
 #include <asm/tlbflush.h>
@@ -1314,7 +1315,8 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,
 
 	BUG_ON(in_interrupt());
 	if (flags & VM_IOREMAP)
-		align = 1ul << clamp(fls(size), PAGE_SHIFT, IOREMAP_MAX_ORDER);
+		align = 1ul << clamp_t(int, fls_long(size),
+				       PAGE_SHIFT, IOREMAP_MAX_ORDER);
 
 	size = PAGE_ALIGN(size);
 	if (unlikely(!size))


* [PATCH v2 2/7] lib: Add huge I/O map capability interfaces
  2015-02-09 22:45 ` Toshi Kani
@ 2015-02-09 22:45   ` Toshi Kani
  -1 siblings, 0 replies; 50+ messages in thread
From: Toshi Kani @ 2015-02-09 22:45 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo, arnd
  Cc: linux-mm, x86, linux-kernel, Elliott, Toshi Kani

Add ioremap_pud_enabled() and ioremap_pmd_enabled(), which
return 1 when I/O mappings of pud/pmd are enabled on the kernel.

ioremap_huge_init() calls arch_ioremap_pud_supported() and
arch_ioremap_pmd_supported() to initialize the capabilities.

A new kernel option "nohugeiomap" is also added, so that the user
can disable the huge I/O map capabilities when necessary.
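
The interaction between the boot option and the arch hooks can be
modeled in user space as below.  This is only a sketch: struct
huge_iomap_caps and model_ioremap_huge_init() are hypothetical names,
and the two arch_* arguments stand in for the return values of
arch_ioremap_pud_supported() and arch_ioremap_pmd_supported().

```c
#include <stdbool.h>

/* Hypothetical snapshot of the two capability flags. */
struct huge_iomap_caps {
	bool pud;
	bool pmd;
};

/*
 * Model of the init step: the arch answers are only consulted when
 * the "nohugeiomap" boot option was not given.
 */
static struct huge_iomap_caps model_ioremap_huge_init(bool nohugeiomap,
						      bool arch_pud,
						      bool arch_pmd)
{
	struct huge_iomap_caps caps = { false, false };

	if (!nohugeiomap) {
		caps.pud = arch_pud;
		caps.pmd = arch_pmd;
	}
	return caps;
}
```

In this model, a system whose arch code reports pmd support but not pud
support ends up with only the pmd capability set, and booting with
"nohugeiomap" leaves both capabilities clear regardless of what the
arch code reports.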

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 Documentation/kernel-parameters.txt |    2 ++
 include/linux/io.h                  |    7 ++++++
 init/main.c                         |    2 ++
 lib/ioremap.c                       |   38 +++++++++++++++++++++++++++++++++++
 4 files changed, 49 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 176d4fe..1872b46 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2304,6 +2304,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			register save and restore. The kernel will only save
 			legacy floating-point registers on task switch.
 
+	nohugeiomap	[KNL,x86] Disable kernel huge I/O mappings.
+
 	noxsave		[BUGS=X86] Disables x86 extended register state save
 			and restore using xsave. The kernel will fallback to
 			enabling legacy floating-point and sse state.
diff --git a/include/linux/io.h b/include/linux/io.h
index fa02e55..9acc588a 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -38,6 +38,13 @@ static inline int ioremap_page_range(unsigned long addr, unsigned long end,
 }
 #endif
 
+void __init ioremap_huge_init(void);
+
+#ifdef CONFIG_HUGE_IOMAP
+int arch_ioremap_pud_supported(void);
+int arch_ioremap_pmd_supported(void);
+#endif
+
 /*
  * Managed iomap interface
  */
diff --git a/init/main.c b/init/main.c
index 61b99376..9f871ac 100644
--- a/init/main.c
+++ b/init/main.c
@@ -80,6 +80,7 @@
 #include <linux/list.h>
 #include <linux/integrity.h>
 #include <linux/proc_ns.h>
+#include <linux/io.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -497,6 +498,7 @@ static void __init mm_init(void)
 	percpu_init_late();
 	pgtable_init();
 	vmalloc_init();
+	ioremap_huge_init();
 }
 
 asmlinkage __visible void __init start_kernel(void)
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 0c9216c..cafd83e 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -13,6 +13,44 @@
 #include <asm/cacheflush.h>
 #include <asm/pgtable.h>
 
+#ifdef CONFIG_HUGE_IOMAP
+int __read_mostly ioremap_pud_capable;
+int __read_mostly ioremap_pmd_capable;
+int __read_mostly ioremap_huge_disabled;
+
+static int __init set_nohugeiomap(char *str)
+{
+	ioremap_huge_disabled = 1;
+	return 0;
+}
+early_param("nohugeiomap", set_nohugeiomap);
+
+void __init ioremap_huge_init(void)
+{
+	if (!ioremap_huge_disabled) {
+		if (arch_ioremap_pud_supported())
+			ioremap_pud_capable = 1;
+		if (arch_ioremap_pmd_supported())
+			ioremap_pmd_capable = 1;
+	}
+}
+
+static inline int ioremap_pud_enabled(void)
+{
+	return ioremap_pud_capable;
+}
+
+static inline int ioremap_pmd_enabled(void)
+{
+	return ioremap_pmd_capable;
+}
+
+#else	/* !CONFIG_HUGE_IOMAP */
+void __init ioremap_huge_init(void) { }
+static inline int ioremap_pud_enabled(void) { return 0; }
+static inline int ioremap_pmd_enabled(void) { return 0; }
+#endif	/* CONFIG_HUGE_IOMAP */
+
 static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
 		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
 {


* [PATCH v2 3/7] mm: Change ioremap to set up huge I/O mappings
  2015-02-09 22:45 ` Toshi Kani
@ 2015-02-09 22:45   ` Toshi Kani
  -1 siblings, 0 replies; 50+ messages in thread
From: Toshi Kani @ 2015-02-09 22:45 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo, arnd
  Cc: linux-mm, x86, linux-kernel, Elliott, Toshi Kani

Change ioremap_pud_range() and ioremap_pmd_range() to set up
kernel huge I/O mappings when their capability is enabled, and
the request meets their conditions -- both virtual & physical
addresses are aligned and the range fulfills the mapping size.
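
The per-entry condition can be sketched in user space for the pmd level
as follows.  The model_* names are hypothetical, PMD_SHIFT is assumed to
be 21 (2MB entries), and the capability check is left out.  The sketch
shows that comparing (next - addr) with PMD_SIZE covers both the virtual
alignment and the size requirement, so only the physical address needs
an explicit alignment test.

```c
#include <stdbool.h>
#include <stdint.h>

#define PMD_SHIFT	21	/* assumed: 2MB entries */
#define PMD_SIZE	(1ULL << PMD_SHIFT)
#define PMD_MASK	(~(PMD_SIZE - 1))

/* End of the current pmd entry, or 'end' if the range stops sooner. */
static uint64_t model_pmd_addr_end(uint64_t addr, uint64_t end)
{
	uint64_t boundary = (addr + PMD_SIZE) & PMD_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

/*
 * True when the entry covering 'addr' can be a single huge mapping:
 * the virtual range fills the whole entry and 'phys' is aligned.
 */
static bool model_pmd_can_map_huge(uint64_t addr, uint64_t end,
				   uint64_t phys)
{
	uint64_t next = model_pmd_addr_end(addr, end);

	return (next - addr) == PMD_SIZE &&
	       (phys & (PMD_SIZE - 1)) == 0;
}
```

A virtual start that is 4KB past a 2MB boundary fails the size
comparison, because the distance to the next boundary is less than
PMD_SIZE.  An aligned virtual range backed by an unaligned physical
address fails the second test.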

The changes are only enabled when both CONFIG_HUGE_IOMAP and
CONFIG_HAVE_ARCH_HUGE_VMAP are defined.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/Kconfig                  |    3 +++
 include/asm-generic/pgtable.h |    8 ++++++++
 lib/ioremap.c                 |   16 ++++++++++++++++
 3 files changed, 27 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index 05d7a8a..55c4440 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -446,6 +446,9 @@ config HAVE_IRQ_TIME_ACCOUNTING
 config HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	bool
 
+config HAVE_ARCH_HUGE_VMAP
+	bool
+
 config HAVE_ARCH_SOFT_DIRTY
 	bool
 
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 177d597..7dc3838 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -847,4 +847,12 @@ static inline void pmdp_set_numa(struct mm_struct *mm, unsigned long addr,
 #define io_remap_pfn_range remap_pfn_range
 #endif
 
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+void pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
+void pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
+#else	/* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+static inline void pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot) { }
+static inline void pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot) { }
+#endif	/* CONFIG_HAVE_ARCH_HUGE_VMAP */
+
 #endif /* _ASM_GENERIC_PGTABLE_H */
diff --git a/lib/ioremap.c b/lib/ioremap.c
index cafd83e..c447832 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -81,6 +81,14 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
 		return -ENOMEM;
 	do {
 		next = pmd_addr_end(addr, end);
+
+		if (ioremap_pmd_enabled() &&
+		    ((next - addr) == PMD_SIZE) &&
+		    IS_ALIGNED(phys_addr + addr, PMD_SIZE)) {
+			pmd_set_huge(pmd, phys_addr + addr, prot);
+			continue;
+		}
+
 		if (ioremap_pte_range(pmd, addr, next, phys_addr + addr, prot))
 			return -ENOMEM;
 	} while (pmd++, addr = next, addr != end);
@@ -99,6 +107,14 @@ static inline int ioremap_pud_range(pgd_t *pgd, unsigned long addr,
 		return -ENOMEM;
 	do {
 		next = pud_addr_end(addr, end);
+
+		if (ioremap_pud_enabled() &&
+		    ((next - addr) == PUD_SIZE) &&
+		    IS_ALIGNED(phys_addr + addr, PUD_SIZE)) {
+			pud_set_huge(pud, phys_addr + addr, prot);
+			continue;
+		}
+
 		if (ioremap_pmd_range(pud, addr, next, phys_addr + addr, prot))
 			return -ENOMEM;
 	} while (pud++, addr = next, addr != end);


* [PATCH v2 4/7] mm: Change vunmap to tear down huge KVA mappings
  2015-02-09 22:45 ` Toshi Kani
@ 2015-02-09 22:45   ` Toshi Kani
  -1 siblings, 0 replies; 50+ messages in thread
From: Toshi Kani @ 2015-02-09 22:45 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo, arnd
  Cc: linux-mm, x86, linux-kernel, Elliott, Toshi Kani

Change vunmap_pmd_range() and vunmap_pud_range() to tear down
huge KVA mappings when they are set.

These changes are only enabled when CONFIG_HAVE_ARCH_HUGE_VMAP
is defined.
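
The control flow of the changed loops can be modeled in user space as
below.  This is a simplified sketch, not the kernel implementation: the
model_* names and ENTRY_* flags are hypothetical, and an entry is
treated as a plain 64-bit value with a present bit and a huge bit.

```c
#include <stddef.h>
#include <stdint.h>

#define ENTRY_PRESENT	0x001ULL
#define ENTRY_HUGE	0x080ULL	/* same bit position as x86 _PAGE_PSE */

/* Clear a huge entry and report that it was handled at this level. */
static int model_clear_huge(uint64_t *entry)
{
	if ((*entry & (ENTRY_PRESENT | ENTRY_HUGE)) ==
	    (ENTRY_PRESENT | ENTRY_HUGE)) {
		*entry = 0;
		return 1;
	}
	return 0;
}

/*
 * Walk 'n' entries the way the patched loops do and return how many
 * of them still need a walk of the next lower level.
 */
static size_t model_vunmap_range(uint64_t *entries, size_t n)
{
	size_t descents = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (model_clear_huge(&entries[i]))
			continue;	/* huge mapping torn down here */
		if (entries[i] == 0)
			continue;	/* nothing mapped */
		descents++;		/* would call the lower-level walker */
	}
	return descents;
}
```

A huge entry is cleared in place and never reaches the lower-level
walker, which matters because such an entry points at device memory
rather than at a page table.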

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 include/asm-generic/pgtable.h |    4 ++++
 mm/vmalloc.c                  |    4 ++++
 2 files changed, 8 insertions(+)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 7dc3838..1204ea6 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -850,9 +850,13 @@ static inline void pmdp_set_numa(struct mm_struct *mm, unsigned long addr,
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
 void pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
 void pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
+int pud_clear_huge(pud_t *pud);
+int pmd_clear_huge(pmd_t *pmd);
 #else	/* !CONFIG_HAVE_ARCH_HUGE_VMAP */
 static inline void pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot) { }
 static inline void pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot) { }
+static inline int pud_clear_huge(pud_t *pud) { return 0; }
+static inline int pmd_clear_huge(pmd_t *pmd) { return 0; }
 #endif	/* CONFIG_HAVE_ARCH_HUGE_VMAP */
 
 #endif /* _ASM_GENERIC_PGTABLE_H */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 40ea214..dd53a9d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -75,6 +75,8 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end)
 	pmd = pmd_offset(pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
+		if (pmd_clear_huge(pmd))
+			continue;
 		if (pmd_none_or_clear_bad(pmd))
 			continue;
 		vunmap_pte_range(pmd, addr, next);
@@ -89,6 +91,8 @@ static void vunmap_pud_range(pgd_t *pgd, unsigned long addr, unsigned long end)
 	pud = pud_offset(pgd, addr);
 	do {
 		next = pud_addr_end(addr, end);
+		if (pud_clear_huge(pud))
+			continue;
 		if (pud_none_or_clear_bad(pud))
 			continue;
 		vunmap_pmd_range(pud, addr, next);


* [PATCH v2 5/7] x86, mm: Support huge KVA mappings on x86
  2015-02-09 22:45 ` Toshi Kani
@ 2015-02-09 22:45   ` Toshi Kani
  -1 siblings, 0 replies; 50+ messages in thread
From: Toshi Kani @ 2015-02-09 22:45 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo, arnd
  Cc: linux-mm, x86, linux-kernel, Elliott, Toshi Kani

Implement huge KVA mapping interfaces on x86.  Select
HAVE_ARCH_HUGE_VMAP when X86_64 or X86_32 with X86_PAE is set.
Without X86_PAE set, the X86_32 kernel has 2-level page
tables and cannot provide huge KVA mappings.
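
The set helpers in this patch build a regular PTE-format value and add
the page-size bit.  A simplified user-space model of that encoding is
sketched below.  The model_* names and MODEL_* constants are
hypothetical, and the stand-in for pfn_pte() ignores the masking that
the real helper may apply.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT	12
#define MODEL_PAGE_PSE		0x080ULL	/* bit 7, as x86 _PAGE_PSE */

/* Rough stand-in for pfn_pte(): frame number shifted up, flags ORed in. */
static uint64_t model_pfn_entry(uint64_t pfn, uint64_t prot)
{
	return (pfn << MODEL_PAGE_SHIFT) | prot;
}

/* Mirrors the shape of pud_set_huge() and pmd_set_huge() in this patch. */
static uint64_t model_huge_entry(uint64_t phys, uint64_t prot)
{
	return model_pfn_entry(phys >> MODEL_PAGE_SHIFT,
			       prot | MODEL_PAGE_PSE);
}

/* Model of the "large" test used on the clear side. */
static int model_entry_is_huge(uint64_t entry)
{
	return (entry & MODEL_PAGE_PSE) != 0;
}
```

With a physical address of 0x80000000 and protection bits 0x3, the
model yields 0x80000083: the address, the protection bits and the
page-size bit in one entry.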

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/x86/Kconfig      |    1 +
 arch/x86/mm/pgtable.c |   34 ++++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0dc9d01..a79e286 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -97,6 +97,7 @@ config X86
 	select IRQ_FORCED_THREADING
 	select HAVE_BPF_JIT if X86_64
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
+	select HAVE_ARCH_HUGE_VMAP if X86_64 || (X86_32 && X86_PAE)
 	select ARCH_HAS_SG_CHAIN
 	select CLKEVT_I8253
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 6fb6927..e495432 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -481,3 +481,37 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
 {
 	__native_set_fixmap(idx, pfn_pte(phys >> PAGE_SHIFT, flags));
 }
+
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+void pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
+{
+	set_pte((pte_t *)pud, pfn_pte(
+		(u64)addr >> PAGE_SHIFT,
+		__pgprot(pgprot_val(prot) | _PAGE_PSE)));
+}
+
+void pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
+{
+	set_pte((pte_t *)pmd, pfn_pte(
+		(u64)addr >> PAGE_SHIFT,
+		__pgprot(pgprot_val(prot) | _PAGE_PSE)));
+}
+
+int pud_clear_huge(pud_t *pud)
+{
+	if (pud_large(*pud)) {
+		pud_clear(pud);
+		return 1;
+	}
+	return 0;
+}
+
+int pmd_clear_huge(pmd_t *pmd)
+{
+	if (pmd_large(*pmd)) {
+		pmd_clear(pmd);
+		return 1;
+	}
+	return 0;
+}
+#endif	/* CONFIG_HAVE_ARCH_HUGE_VMAP */


* [PATCH v2 6/7] x86, mm: Support huge I/O mappings on x86
  2015-02-09 22:45 ` Toshi Kani
@ 2015-02-09 22:45   ` Toshi Kani
  -1 siblings, 0 replies; 50+ messages in thread
From: Toshi Kani @ 2015-02-09 22:45 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo, arnd
  Cc: linux-mm, x86, linux-kernel, Elliott, Toshi Kani

This patch implements huge I/O mapping capability interfaces on x86.

IOREMAP_MAX_ORDER is defined as PUD_SHIFT on X86_64 and PMD_SHIFT
on X86_32.  When HUGE_IOMAP is not set, x86 leaves IOREMAP_MAX_ORDER
undefined and it falls back to the generic value in <linux/vmalloc.h>.

On x86, the huge I/O mapping may not be used when a target range is
covered by multiple MTRRs with different memory types.  The caller
must make a separate request for each MTRR range, or the huge I/O
mapping can be disabled with the kernel boot option "nohugeiomap".
The details of this issue are described in the email below, and this
patch takes option C) in favor of simplicity since MTRRs are a legacy
feature.
 https://lkml.org/lkml/2015/2/5/638

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/x86/include/asm/page_types.h |    8 ++++++++
 arch/x86/mm/ioremap.c             |   26 ++++++++++++++++++++++++--
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index f97fbe3..246426c 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -38,6 +38,14 @@
 
 #define __START_KERNEL		(__START_KERNEL_map + __PHYSICAL_START)
 
+#ifdef CONFIG_HUGE_IOMAP
+#ifdef CONFIG_X86_64
+#define IOREMAP_MAX_ORDER       (PUD_SHIFT)
+#else
+#define IOREMAP_MAX_ORDER       (PMD_SHIFT)
+#endif
+#endif  /* CONFIG_HUGE_IOMAP */
+
 #ifdef CONFIG_X86_64
 #include <asm/page_64_types.h>
 #else
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index fdf617c..f97b587 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -67,8 +67,14 @@ static int __ioremap_check_ram(unsigned long start_pfn, unsigned long nr_pages,
 
 /*
  * Remap an arbitrary physical address space into the kernel virtual
- * address space. Needed when the kernel wants to access high addresses
- * directly.
+ * address space. It transparently creates kernel huge I/O mapping when
+ * the physical address is aligned by a huge page size (1GB or 2MB) and
+ * the requested size is at least the huge page size.
+ *
+ * NOTE: The huge I/O mapping may not be used when a target range is
+ * covered by multiple MTRRs with different memory types. The caller
+ * must make a separate request for each MTRR range, or the huge I/O
+ * mapping can be disabled with the kernel boot option "nohugeiomap".
  *
  * NOTE! We need to allow non-page-aligned mappings too: we will obviously
  * have to convert them into an offset in a page-aligned mapping, but the
@@ -326,6 +332,22 @@ void iounmap(volatile void __iomem *addr)
 }
 EXPORT_SYMBOL(iounmap);
 
+#ifdef CONFIG_HUGE_IOMAP
+int arch_ioremap_pud_supported(void)
+{
+#ifdef CONFIG_X86_64
+	return cpu_has_gbpages;
+#else
+	return 0;
+#endif
+}
+
+int arch_ioremap_pmd_supported(void)
+{
+	return cpu_has_pse;
+}
+#endif	/* CONFIG_HUGE_IOMAP */
+
 /*
  * Convert a physical pointer to a virtual kernel pointer for /dev/mem
  * access
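
The updated ioremap() comment states the rule for when a huge mapping is used: the physical address must be aligned to a huge page size and the request must be at least that large, with the CPU capability checks gating each level. A minimal userspace sketch of that decision follows. It assumes the virtual address alignment matches the physical one, which the real ioremap_page_range() code (not shown in this patch) has to check separately. All names here are hypothetical; the CPUID bit positions are PSE in CPUID.01H:EDX bit 3 and 1GB pages in CPUID.80000001H:EDX bit 26.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)

/* Hypothetical capability flags, standing in for the results of
 * arch_ioremap_pud_supported() and arch_ioremap_pmd_supported(). */
struct huge_caps {
	int pud_ok;	/* 1GB mappings usable */
	int pmd_ok;	/* 2MB mappings usable */
};

/* Decode capabilities from raw CPUID register values supplied by the
 * caller; 1GB mappings are only offered on 64-bit, as in the patch. */
static struct huge_caps caps_from_cpuid(uint32_t edx_leaf1,
					uint32_t edx_leaf80000001,
					int is_64bit)
{
	struct huge_caps caps;

	caps.pmd_ok = (edx_leaf1 >> 3) & 1;
	caps.pud_ok = is_64bit ? ((edx_leaf80000001 >> 26) & 1) : 0;
	return caps;
}

/* Pick the largest mapping size usable at 'phys' with 'remaining'
 * bytes still to map; both must be 4KB-aligned. */
static uint64_t pick_map_size(uint64_t phys, uint64_t remaining,
			      struct huge_caps caps)
{
	assert(remaining > 0);
	assert(phys % SZ_4K == 0 && remaining % SZ_4K == 0);

	if (caps.pud_ok && phys % SZ_1G == 0 && remaining >= SZ_1G)
		return SZ_1G;
	if (caps.pmd_ok && phys % SZ_2M == 0 && remaining >= SZ_2M)
		return SZ_2M;
	return SZ_4K;
}

/* Count how many page-table entries a request needs. */
static unsigned long count_mappings(uint64_t phys, uint64_t size,
				    struct huge_caps caps)
{
	unsigned long n = 0;

	while (size > 0) {
		uint64_t step = pick_map_size(phys, size, caps);

		phys += step;
		size -= step;
		n++;
	}
	return n;
}
```

Under these assumptions a 1GB-aligned, 1GB-sized request needs one entry with 1GB pages, 512 entries with 2MB pages only, and 262144 entries with 4KB pages, which is the setup-time difference the cover letter refers to.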

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 7/7] mm: Add config HUGE_IOMAP to enable huge I/O mappings
  2015-02-09 22:45 ` Toshi Kani
@ 2015-02-09 22:45   ` Toshi Kani
  -1 siblings, 0 replies; 50+ messages in thread
From: Toshi Kani @ 2015-02-09 22:45 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo, arnd
  Cc: linux-mm, x86, linux-kernel, Elliott, Toshi Kani

Add config HUGE_IOMAP to enable huge I/O mappings.  This feature
is set to Y by default when HAVE_ARCH_HUGE_VMAP is defined on the
architecture.

Note that the user can also disable this feature at boot time with
the new kernel option "nohugeiomap" when necessary.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 mm/Kconfig |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 1d1ae6b..eb738ae 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -444,6 +444,17 @@ choice
 	  benefit.
 endchoice
 
+config HUGE_IOMAP
+	bool "Kernel huge I/O mapping support"
+	depends on HAVE_ARCH_HUGE_VMAP
+	default y
+	help
+	  Kernel huge I/O mapping allows the kernel to transparently
+	  create I/O mappings with huge pages for memory-mapped I/O
+	  devices whenever possible.  This feature can improve
+	  performance of certain devices with large memory size, such
+	  as NVM, and reduce the time to create their mappings.
+
 #
 # UP and nommu archs use km based percpu allocator
 #

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 5/7] x86, mm: Support huge KVA mappings on x86
  2015-02-09 22:45   ` Toshi Kani
@ 2015-02-10 18:59     ` Dave Hansen
  -1 siblings, 0 replies; 50+ messages in thread
From: Dave Hansen @ 2015-02-10 18:59 UTC (permalink / raw)
  To: Toshi Kani, akpm, hpa, tglx, mingo, arnd
  Cc: linux-mm, x86, linux-kernel, Elliott

On 02/09/2015 02:45 PM, Toshi Kani wrote:
> Implement huge KVA mapping interfaces on x86.  Select
> HAVE_ARCH_HUGE_VMAP when X86_64 or X86_32 with X86_PAE is set.
> Without X86_PAE set, the X86_32 kernel has the 2-level page
> tables and cannot provide the huge KVA mappings.

Not that it's a big deal, but what's the limitation with the 2-level
page tables on 32-bit?  We have a 4MB large page size available there
and we already use it for the kernel linear mapping.
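
The two 32-bit layouts being compared can be sketched in userspace as follows. The bit splits are the architectural ones (10/10/12 without PAE, 2/9/9/12 with PAE); the struct and helper names are hypothetical and only illustrate why the large page is 4MB in one mode and 2MB in the other, and that the middle level is folded away without PAE.

```c
#include <assert.h>
#include <stdint.h>

/* Number of virtual-address bits consumed at each level. */
struct paging_layout {
	unsigned pgd_bits;
	unsigned pmd_bits;	/* 0 means the PMD level is folded */
	unsigned pte_bits;
	unsigned page_shift;
};

static const struct paging_layout X86_32_NONPAE = { 10, 0, 10, 12 };
static const struct paging_layout X86_32_PAE    = {  2, 9,  9, 12 };

/* Size mapped by one entry directly above the PTE level, i.e. the
 * large page size available in this layout. */
static uint64_t large_page_size(struct paging_layout l)
{
	return 1ULL << (l.page_shift + l.pte_bits);
}

/* Without a real middle level, generic code walking pgd -> pud -> pmd
 * ends up operating on the top-level entry itself. */
static int pmd_is_folded(struct paging_layout l)
{
	return l.pmd_bits == 0;
}

/* Index into the top-level table for a 32-bit virtual address. */
static unsigned pgd_index_of(struct paging_layout l, uint32_t vaddr)
{
	unsigned shift = l.page_shift + l.pte_bits + l.pmd_bits;

	assert(shift < 32);
	return (vaddr >> shift) & ((1u << l.pgd_bits) - 1);
}
```

For example, virtual address 0xC0000000 selects top-level index 768 in the 2-level layout and index 3 with PAE.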

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 5/7] x86, mm: Support huge KVA mappings on x86
  2015-02-10 18:59     ` Dave Hansen
@ 2015-02-10 20:42       ` Toshi Kani
  -1 siblings, 0 replies; 50+ messages in thread
From: Toshi Kani @ 2015-02-10 20:42 UTC (permalink / raw)
  To: Dave Hansen
  Cc: akpm, hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel, Elliott

On Tue, 2015-02-10 at 10:59 -0800, Dave Hansen wrote:
> On 02/09/2015 02:45 PM, Toshi Kani wrote:
> > Implement huge KVA mapping interfaces on x86.  Select
> > HAVE_ARCH_HUGE_VMAP when X86_64 or X86_32 with X86_PAE is set.
> > Without X86_PAE set, the X86_32 kernel has the 2-level page
> > tables and cannot provide the huge KVA mappings.
> 
> Not that it's a big deal, but what's the limitation with the 2-level
> page tables on 32-bit?  We have a 4MB large page size available there
> and we already use it for the kernel linear mapping.

ioremap() calls the arch-neutral ioremap_page_range() to set up I/O
mappings with PTEs.  This patch-set enables ioremap_page_range() to set
up PUD & PMD mappings.  With a 2-level page table, I do not think this
PUD/PMD mapping code works unless we add some special code.

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 5/7] x86, mm: Support huge KVA mappings on x86
  2015-02-10 20:42       ` Toshi Kani
@ 2015-02-10 20:51         ` Dave Hansen
  -1 siblings, 0 replies; 50+ messages in thread
From: Dave Hansen @ 2015-02-10 20:51 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel, Elliott

On 02/10/2015 12:42 PM, Toshi Kani wrote:
> On Tue, 2015-02-10 at 10:59 -0800, Dave Hansen wrote:
>> On 02/09/2015 02:45 PM, Toshi Kani wrote:
>>> Implement huge KVA mapping interfaces on x86.  Select
>>> HAVE_ARCH_HUGE_VMAP when X86_64 or X86_32 with X86_PAE is set.
>>> Without X86_PAE set, the X86_32 kernel has the 2-level page
>>> tables and cannot provide the huge KVA mappings.
>>
>> Not that it's a big deal, but what's the limitation with the 2-level
>> page tables on 32-bit?  We have a 4MB large page size available there
>> and we already use it for the kernel linear mapping.
> 
> ioremap() calls arch-neutral ioremap_page_range() to set up I/O mappings
> with PTEs.  This patch-set enables ioremap_page_range() to set up PUD &
> PMD mappings.  With 2-level page table, I do not think this PUD/PMD
> mapping code works unless we add some special code.

What actually breaks, though?

Can't you just disable the pud code via ioremap_pud_enabled()?


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 5/7] x86, mm: Support huge KVA mappings on x86
  2015-02-10 20:51         ` Dave Hansen
@ 2015-02-10 22:13           ` Toshi Kani
  -1 siblings, 0 replies; 50+ messages in thread
From: Toshi Kani @ 2015-02-10 22:13 UTC (permalink / raw)
  To: Dave Hansen
  Cc: akpm, hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel, Elliott

On Tue, 2015-02-10 at 12:51 -0800, Dave Hansen wrote:
> On 02/10/2015 12:42 PM, Toshi Kani wrote:
> > On Tue, 2015-02-10 at 10:59 -0800, Dave Hansen wrote:
> >> On 02/09/2015 02:45 PM, Toshi Kani wrote:
> >>> Implement huge KVA mapping interfaces on x86.  Select
> >>> HAVE_ARCH_HUGE_VMAP when X86_64 or X86_32 with X86_PAE is set.
> >>> Without X86_PAE set, the X86_32 kernel has the 2-level page
> >>> tables and cannot provide the huge KVA mappings.
> >>
> >> Not that it's a big deal, but what's the limitation with the 2-level
> >> page tables on 32-bit?  We have a 4MB large page size available there
> >> and we already use it for the kernel linear mapping.
> > 
> > ioremap() calls arch-neutral ioremap_page_range() to set up I/O mappings
> > with PTEs.  This patch-set enables ioremap_page_range() to set up PUD &
> > PMD mappings.  With 2-level page table, I do not think this PUD/PMD
> > mapping code works unless we add some special code.
> 
> What actually breaks, though?
> 
> Can't you just disable the pud code via ioremap_pud_enabled()?

That's what v1 did, and I found in testing that the PMD mapping code did
not work when PAE was unset.  I think we need special handling similar
to one_md_table_init(), which returns pgd as pmd in case of non-PAE.
ioremap_page_range() does not have such handling and I thought it would
be worth adding it.

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 5/7] x86, mm: Support huge KVA mappings on x86
  2015-02-10 22:13           ` Toshi Kani
@ 2015-02-10 22:20             ` Toshi Kani
  -1 siblings, 0 replies; 50+ messages in thread
From: Toshi Kani @ 2015-02-10 22:20 UTC (permalink / raw)
  To: Dave Hansen
  Cc: akpm, hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel, Elliott

On Tue, 2015-02-10 at 15:13 -0700, Toshi Kani wrote:
> On Tue, 2015-02-10 at 12:51 -0800, Dave Hansen wrote:
> > On 02/10/2015 12:42 PM, Toshi Kani wrote:
> > > On Tue, 2015-02-10 at 10:59 -0800, Dave Hansen wrote:
> > >> On 02/09/2015 02:45 PM, Toshi Kani wrote:
> > >>> Implement huge KVA mapping interfaces on x86.  Select
> > >>> HAVE_ARCH_HUGE_VMAP when X86_64 or X86_32 with X86_PAE is set.
> > >>> Without X86_PAE set, the X86_32 kernel has the 2-level page
> > >>> tables and cannot provide the huge KVA mappings.
> > >>
> > >> Not that it's a big deal, but what's the limitation with the 2-level
> > >> page tables on 32-bit?  We have a 4MB large page size available there
> > >> and we already use it for the kernel linear mapping.
> > > 
> > > ioremap() calls arch-neutral ioremap_page_range() to set up I/O mappings
> > > with PTEs.  This patch-set enables ioremap_page_range() to set up PUD &
> > > PMD mappings.  With 2-level page table, I do not think this PUD/PMD
> > > mapping code works unless we add some special code.
> > 
> > What actually breaks, though?
> > 
> > Can't you just disable the pud code via ioremap_pud_enabled()?
> 
> That's what v1 did, and I found in testing that the PMD mapping code did
> not work when PAE was unset.  I think we need special handling similar
> to one_md_table_init(), which returns pgd as pmd in case of non-PAE.
> ioremap_page_range() does not have such handling and I thought it would
> be worth adding it.

Oops, a typo.  The last sentence should be "I thought it would not be
worth adding it."

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 5/7] x86, mm: Support huge KVA mappings on x86
  2015-02-10 22:13           ` Toshi Kani
@ 2015-02-10 23:10             ` Toshi Kani
  -1 siblings, 0 replies; 50+ messages in thread
From: Toshi Kani @ 2015-02-10 23:10 UTC (permalink / raw)
  To: Dave Hansen
  Cc: akpm, hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel, Elliott

On Tue, 2015-02-10 at 15:13 -0700, Toshi Kani wrote:
> On Tue, 2015-02-10 at 12:51 -0800, Dave Hansen wrote:
> > On 02/10/2015 12:42 PM, Toshi Kani wrote:
> > > On Tue, 2015-02-10 at 10:59 -0800, Dave Hansen wrote:
> > >> On 02/09/2015 02:45 PM, Toshi Kani wrote:
> > >>> Implement huge KVA mapping interfaces on x86.  Select
> > >>> HAVE_ARCH_HUGE_VMAP when X86_64 or X86_32 with X86_PAE is set.
> > >>> Without X86_PAE set, the X86_32 kernel has the 2-level page
> > >>> tables and cannot provide the huge KVA mappings.
> > >>
> > >> Not that it's a big deal, but what's the limitation with the 2-level
> > >> page tables on 32-bit?  We have a 4MB large page size available there
> > >> and we already use it for the kernel linear mapping.
> > > 
> > > ioremap() calls arch-neutral ioremap_page_range() to set up I/O mappings
> > > with PTEs.  This patch-set enables ioremap_page_range() to set up PUD &
> > > PMD mappings.  With 2-level page table, I do not think this PUD/PMD
> > > mapping code works unless we add some special code.
> > 
> > What actually breaks, though?
> > 
> > Can't you just disable the pud code via ioremap_pud_enabled()?
> 
> That's what v1 did, and I found in testing that the PMD mapping code did
> not work when PAE was unset.  I think we need special handling similar
> to one_md_table_init(), which returns pgd as pmd in case of non-PAE.
> ioremap_page_range() does not have such handling and I thought it would
> not be worth adding it.

Actually pud_alloc() and pmd_alloc() should carry pgd in this case...  I
will look into the problem to see why it did not work when PAE was
unset.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 6/7] x86, mm: Support huge I/O mappings on x86
  2015-02-09 22:45   ` Toshi Kani
@ 2015-02-18 20:44     ` Ingo Molnar
  -1 siblings, 0 replies; 50+ messages in thread
From: Ingo Molnar @ 2015-02-18 20:44 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel, Elliott


* Toshi Kani <toshi.kani@hp.com> wrote:

> This patch implements huge I/O mapping capability interfaces on x86.

> +#ifdef CONFIG_HUGE_IOMAP
> +#ifdef CONFIG_X86_64
> +#define IOREMAP_MAX_ORDER       (PUD_SHIFT)
> +#else
> +#define IOREMAP_MAX_ORDER       (PMD_SHIFT)
> +#endif
> +#endif  /* CONFIG_HUGE_IOMAP */

> +#ifdef CONFIG_HUGE_IOMAP

Hm, so why is there a Kconfig option for this? It just 
complicates things.

For example the kernel already defaults to mapping itself 
with as large mappings as possible, without a Kconfig entry 
for it. There's no reason to make this configurable - and 
quite a bit of complexity in the patches comes from this 
configurability.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 6/7] x86, mm: Support huge I/O mappings on x86
  2015-02-18 20:44     ` Ingo Molnar
@ 2015-02-18 21:13       ` Toshi Kani
  -1 siblings, 0 replies; 50+ messages in thread
From: Toshi Kani @ 2015-02-18 21:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: akpm, hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel, Elliott

On Wed, 2015-02-18 at 21:44 +0100, Ingo Molnar wrote:
> * Toshi Kani <toshi.kani@hp.com> wrote:
> 
> > This patch implements huge I/O mapping capability interfaces on x86.
> 
> > +#ifdef CONFIG_HUGE_IOMAP
> > +#ifdef CONFIG_X86_64
> > +#define IOREMAP_MAX_ORDER       (PUD_SHIFT)
> > +#else
> > +#define IOREMAP_MAX_ORDER       (PMD_SHIFT)
> > +#endif
> > +#endif  /* CONFIG_HUGE_IOMAP */
> 
> > +#ifdef CONFIG_HUGE_IOMAP
> 
> Hm, so why is there a Kconfig option for this? It just 
> complicates things.
> 
> For example the kernel already defaults to mapping itself 
> with as large mappings as possible, without a Kconfig entry 
> for it. There's no reason to make this configurable - and 
> quite a bit of complexity in the patches comes from this 
> configurability.

This Kconfig option was added to disable this feature in case there is
an issue.  That said, since the patchset also added a new nohugeiomap
boot option for the same purpose, I agree that this Kconfig option can
be removed.  So, I will remove it in the next version.

An example of such a case is the multiple-MTRR configuration described in
patch 0/7.  However, I believe such a platform/use-case is very unlikely,
and it can also be avoided by having the driver create a separate mapping
for each MTRR range.
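
As a rough illustration of the per-range workaround, a driver could
split its request at the MTRR boundaries and map each piece separately.
The sketch below is user-space C under stated assumptions: struct
mtrr_range, struct map_req and split_by_mtrr() are hypothetical names,
and the range table is assumed to be sorted and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one MTRR range: [base, base + size). */
struct mtrr_range {
    uint64_t base;
    uint64_t size;
    int      type;      /* memory type; values are illustrative only */
};

/* One piece of the split request, meant for its own ioremap() call. */
struct map_req {
    uint64_t phys;
    uint64_t size;
};

/*
 * Split [phys, phys + size) at the boundaries of the given ranges.
 * Gaps between ranges become pieces of their own.  Returns the number
 * of pieces written to out[], which never exceeds max.
 */
static size_t split_by_mtrr(uint64_t phys, uint64_t size,
                            const struct mtrr_range *ranges, size_t n,
                            struct map_req *out, size_t max)
{
    uint64_t cur = phys, end = phys + size;
    size_t count = 0;

    assert(out != NULL);
    for (size_t i = 0; i < n && cur < end && count < max; i++) {
        uint64_t rb = ranges[i].base, re = rb + ranges[i].size;

        if (re <= cur)
            continue;               /* range lies entirely below cur */
        if (rb >= end)
            break;                  /* range lies entirely above end */
        if (rb > cur) {             /* uncovered gap before this range */
            out[count++] = (struct map_req){ cur, rb - cur };
            cur = rb;
            if (count == max)
                break;
        }
        uint64_t stop = re < end ? re : end;
        out[count++] = (struct map_req){ cur, stop - cur };
        cur = stop;
    }
    if (cur < end && count < max)   /* tail above the last range */
        out[count++] = (struct map_req){ cur, end - cur };
    return count;
}
```

Each resulting piece then lies within a single range (or a single gap),
so no one mapping request spans two memory types.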

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 6/7] x86, mm: Support huge I/O mappings on x86
  2015-02-18 21:13       ` Toshi Kani
@ 2015-02-18 21:15         ` Ingo Molnar
  -1 siblings, 0 replies; 50+ messages in thread
From: Ingo Molnar @ 2015-02-18 21:15 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel, Elliott


* Toshi Kani <toshi.kani@hp.com> wrote:

> On Wed, 2015-02-18 at 21:44 +0100, Ingo Molnar wrote:
> > * Toshi Kani <toshi.kani@hp.com> wrote:
> > 
> > > This patch implements huge I/O mapping capability interfaces on x86.
> > 
> > > +#ifdef CONFIG_HUGE_IOMAP
> > > +#ifdef CONFIG_X86_64
> > > +#define IOREMAP_MAX_ORDER       (PUD_SHIFT)
> > > +#else
> > > +#define IOREMAP_MAX_ORDER       (PMD_SHIFT)
> > > +#endif
> > > +#endif  /* CONFIG_HUGE_IOMAP */
> > 
> > > +#ifdef CONFIG_HUGE_IOMAP
> > 
> > Hm, so why is there a Kconfig option for this? It just 
> > complicates things.
> > 
> > For example the kernel already defaults to mapping itself 
> > with as large mappings as possible, without a Kconfig entry 
> > for it. There's no reason to make this configurable - and 
> > quite a bit of complexity in the patches comes from this 
> > configurability.
> 
> This Kconfig option was added to disable this feature in 
> case there is an issue. [...]

If bugs are found then they should be fixed.

> [...]  That said, since the patchset also added a new 
> nohugeiomap boot option for the same purpose, I agree 
> that this Kconfig option can be removed.  So, I will 
> remove it in the next version.
> 
> An example of such case is with multiple MTRRs described 
> in patch 0/7.

So the multi-MTRR case should probably be detected and 
handled safely?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 6/7] x86, mm: Support huge I/O mappings on x86
  2015-02-18 21:15         ` Ingo Molnar
@ 2015-02-18 21:33           ` Toshi Kani
  -1 siblings, 0 replies; 50+ messages in thread
From: Toshi Kani @ 2015-02-18 21:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: akpm, hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel, Elliott

On Wed, 2015-02-18 at 22:15 +0100, Ingo Molnar wrote:
> * Toshi Kani <toshi.kani@hp.com> wrote:
> 
> > On Wed, 2015-02-18 at 21:44 +0100, Ingo Molnar wrote:
> > > * Toshi Kani <toshi.kani@hp.com> wrote:
> > > 
> > > > This patch implements huge I/O mapping capability interfaces on x86.
> > > 
> > > > +#ifdef CONFIG_HUGE_IOMAP
> > > > +#ifdef CONFIG_X86_64
> > > > +#define IOREMAP_MAX_ORDER       (PUD_SHIFT)
> > > > +#else
> > > > +#define IOREMAP_MAX_ORDER       (PMD_SHIFT)
> > > > +#endif
> > > > +#endif  /* CONFIG_HUGE_IOMAP */
> > > 
> > > > +#ifdef CONFIG_HUGE_IOMAP
> > > 
> > > Hm, so why is there a Kconfig option for this? It just 
> > > complicates things.
> > > 
> > > For example the kernel already defaults to mapping itself 
> > > with as large mappings as possible, without a Kconfig entry 
> > > for it. There's no reason to make this configurable - and 
> > > quite a bit of complexity in the patches comes from this 
> > > configurability.
> > 
> > This Kconfig option was added to disable this feature in 
> > case there is an issue. [...]
> 
> If bugs are found then they should be fixed.

Right.

> > [...]  That said, since the patchset also added a new 
> > nohugeiomap boot option for the same purpose, I agree 
> > that this Kconfig option can be removed.  So, I will 
> > remove it in the next version.
> > 
> > An example of such case is with multiple MTRRs described 
> > in patch 0/7.
> 
> So the multi-MTRR case should probably be detected and 
> handled safely?

I considered two options for handling this case safely, i.e. options A)
and B) described in the link below.
  https://lkml.org/lkml/2015/2/5/638

I thought about how much complexity we should add to the code for a
hypothetical platform that combines new NVM (or a large I/O range) with
legacy MTRRs of multiple types over contiguous ranges.  My thinking is
that we should go with option C) for simplicity, and implement A) or B)
later if we find it necessary.

Thanks,
-Toshi






^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 6/7] x86, mm: Support huge I/O mappings on x86
  2015-02-18 21:33           ` Toshi Kani
@ 2015-02-18 21:57             ` Ingo Molnar
  -1 siblings, 0 replies; 50+ messages in thread
From: Ingo Molnar @ 2015-02-18 21:57 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel, Elliott


* Toshi Kani <toshi.kani@hp.com> wrote:

> On Wed, 2015-02-18 at 22:15 +0100, Ingo Molnar wrote:
> > * Toshi Kani <toshi.kani@hp.com> wrote:
> > 
> > > On Wed, 2015-02-18 at 21:44 +0100, Ingo Molnar wrote:
> > > > * Toshi Kani <toshi.kani@hp.com> wrote:
> > > > 
> > > > > This patch implements huge I/O mapping capability interfaces on x86.
> > > > 
> > > > > +#ifdef CONFIG_HUGE_IOMAP
> > > > > +#ifdef CONFIG_X86_64
> > > > > +#define IOREMAP_MAX_ORDER       (PUD_SHIFT)
> > > > > +#else
> > > > > +#define IOREMAP_MAX_ORDER       (PMD_SHIFT)
> > > > > +#endif
> > > > > +#endif  /* CONFIG_HUGE_IOMAP */
> > > > 
> > > > > +#ifdef CONFIG_HUGE_IOMAP
> > > > 
> > > > Hm, so why is there a Kconfig option for this? It just 
> > > > complicates things.
> > > > 
> > > > For example the kernel already defaults to mapping itself 
> > > > with as large mappings as possible, without a Kconfig entry 
> > > > for it. There's no reason to make this configurable - and 
> > > > quite a bit of complexity in the patches comes from this 
> > > > configurability.
> > > 
> > > This Kconfig option was added to disable this feature in 
> > > case there is an issue. [...]
> > 
> > If bugs are found then they should be fixed.
> 
> Right.
> 
> > > [...]  That said, since the patchset also added a new 
> > > nohugeiomap boot option for the same purpose, I agree 
> > > that this Kconfig option can be removed.  So, I will 
> > > remove it in the next version.
> > > 
> > > An example of such case is with multiple MTRRs described 
> > > in patch 0/7.
> > 
> > So the multi-MTRR case should probably be detected and 
> > handled safely?
> 
> I considered two options to safely handle this case, i.e. 
> option A) and B) described in the link below.
>
>   https://lkml.org/lkml/2015/2/5/638
> 
> I thought about how much complication we should put into 
> the code for an imaginable platform with a combination of 
> new NVM (or large I/O range) and legacy MTRRs with 
> multi-types & contiguous ranges.  My thinking is that we 
> should go with option C) for simplicity, and implement A) 
> or B) later if we find it necessary.

Well, why not option D):

   D) detect unaligned requests and reject them

?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 6/7] x86, mm: Support huge I/O mappings on x86
  2015-02-18 21:57             ` Ingo Molnar
@ 2015-02-18 22:14               ` Toshi Kani
  -1 siblings, 0 replies; 50+ messages in thread
From: Toshi Kani @ 2015-02-18 22:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: akpm, hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel, Elliott

On Wed, 2015-02-18 at 22:57 +0100, Ingo Molnar wrote:
> * Toshi Kani <toshi.kani@hp.com> wrote:
> 
> > On Wed, 2015-02-18 at 22:15 +0100, Ingo Molnar wrote:
> > > * Toshi Kani <toshi.kani@hp.com> wrote:
> > > 
> > > > On Wed, 2015-02-18 at 21:44 +0100, Ingo Molnar wrote:
 :
> > 
> > > > [...]  That said, since the patchset also added a new 
> > > > nohugeiomap boot option for the same purpose, I agree 
> > > > that this Kconfig option can be removed.  So, I will 
> > > > remove it in the next version.
> > > > 
> > > > An example of such case is with multiple MTRRs described 
> > > > in patch 0/7.
> > > 
> > > So the multi-MTRR case should probably be detected and 
> > > handled safely?
> > 
> > I considered two options to safely handle this case, i.e. 
> > option A) and B) described in the link below.
> >
> >   https://lkml.org/lkml/2015/2/5/638
> > 
> > I thought about how much complication we should put into 
> > the code for an imaginable platform with a combination of 
> > new NVM (or large I/O range) and legacy MTRRs with 
> > multi-types & contiguous ranges.  My thinking is that we 
> > should go with option C) for simplicity, and implement A) 
> > or B) later if we find it necessary.
> 
> Well, why not option D):
> 
>    D) detect unaligned requests and reject them
> 

That sounds like a good idea!  I will work on it. 
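
One possible reading of option D), sketched in user-space C: before
using a huge entry, check that the candidate range is aligned to the
huge page size and that no memory type change falls inside it, and
reject the huge mapping for that range otherwise.  struct mtrr_range,
TYPE_DEFAULT and the helper names are hypothetical; this is not the
code of any later version of the patchset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one MTRR range: [base, base + size). */
struct mtrr_range {
    uint64_t base;
    uint64_t size;
    int      type;
};

#define TYPE_DEFAULT (-1)   /* assumed type for addresses in no range */

/* Memory type that applies to a single address. */
static int type_at(uint64_t addr, const struct mtrr_range *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= r[i].base && addr - r[i].base < r[i].size)
            return r[i].type;
    return TYPE_DEFAULT;
}

/*
 * True if all of [start, start + size) has one memory type.  The type
 * can only change at a range boundary, so it is enough to test every
 * boundary that lies strictly inside the candidate range.
 */
static bool range_is_uniform(uint64_t start, uint64_t size,
                             const struct mtrr_range *r, size_t n)
{
    uint64_t end = start + size;

    for (size_t i = 0; i < n; i++) {
        uint64_t edge[2] = { r[i].base, r[i].base + r[i].size };

        for (int k = 0; k < 2; k++) {
            uint64_t e = edge[k];

            if (e > start && e < end &&
                type_at(e, r, n) != type_at(e - 1, r, n))
                return false;
        }
    }
    return true;
}

/* Huge mapping is allowed only for an aligned, type-uniform range. */
static bool can_use_huge(uint64_t phys, uint64_t huge_size,
                         const struct mtrr_range *r, size_t n)
{
    /* huge_size must be a nonzero power of two */
    assert(huge_size != 0 && (huge_size & (huge_size - 1)) == 0);
    if (phys & (huge_size - 1))
        return false;
    return range_is_uniform(phys, huge_size, r, n);
}
```

A caller that gets false here would simply keep using 4KB mappings for
that range, so existing requests continue to work.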

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 0/7] Kernel huge I/O mapping support
  2015-02-09 22:45 ` Toshi Kani
@ 2015-02-23 20:22   ` Andrew Morton
  -1 siblings, 0 replies; 50+ messages in thread
From: Andrew Morton @ 2015-02-23 20:22 UTC (permalink / raw)
  To: Toshi Kani; +Cc: hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel, Elliott

On Mon,  9 Feb 2015 15:45:28 -0700 Toshi Kani <toshi.kani@hp.com> wrote:

> ioremap() and its related interfaces are used to create I/O
> mappings to memory-mapped I/O devices.  The mapping sizes of
> the traditional I/O devices are relatively small.  Non-volatile
> memory (NVM), however, has many GB and is going to have TB soon.
> It is not very efficient to create large I/O mappings with 4KB. 

The changelogging is very good - thanks for taking the time to do this.

> This patchset extends the ioremap() interfaces to transparently
> create I/O mappings with huge pages whenever possible.

I'm wondering if this is prudent.  Existing code which was tested with
4k mappings will magically start to use huge tlb mappings.  I don't
know what could go wrong, but I'd prefer not to find out!  Wouldn't it
be safer to make this an explicit opt-in?

What operations can presently be performed against an ioremapped area? 
Can kernel code perform change_page_attr() against individual pages? 
Can kernel code run iounmap() against just part of that region?  (I
forget.)  There does seem to be potential for breakage if we start
using hugetlb mappings for such things.

>  ioremap()
> continues to use 4KB mappings when a huge page does not fit into
> a requested range.  There is no change necessary to the drivers
> using ioremap().  A requested physical address must be aligned by
> a huge page size (1GB or 2MB on x86) for using huge page mapping,
> though.  The kernel huge I/O mapping will improve performance of
> NVM and other devices with large memory, and reduce the time to
> create their mappings as well.
> 
> On x86, the huge I/O mapping may not be used when a target range is
> covered by multiple MTRRs with different memory types.  The caller
> must make a separate request for each MTRR range, or the huge I/O
> mapping can be disabled with the kernel boot option "nohugeiomap".
> The detail of this issue is described in the email below, and this
> patch takes option C) in favor of simplicity since MTRRs are legacy
> feature.
>  https://lkml.org/lkml/2015/2/5/638

How is this mtrr clash handled?

- The ioremap call will fail if there are any MTRRs covering the region?

- The ioremap call will fail if there is more than one MTRR covering
  the region?

- If the ioremap succeeds when a single MTRR covers the region,
  must that MTRR cover the *entire* region?

- What happens if userspace tries fiddling with the MTRRs after the
  region has been established?

<reads the code>

Oh.  We don't do any checking at all.  We're just telling userspace
programmers "don't do that".  hrm.  What are your thoughts on adding
the overlap checks to the kernel?

This adds more potential for breaking existing code, doesn't it?  If
there's code which is using 4k ioremap on regions which are covered by
mtrrs, the transparent switch to hugeptes will cause that code to enter
the "undefined behaviour" space?

> The patchset introduces the following configs:
>  HUGE_IOMAP - When selected (default Y), enable huge I/O mappings.
>               Require HAVE_ARCH_HUGE_VMAP set.
>  HAVE_ARCH_HUGE_VMAP - Indicate arch supports huge KVA mappings.
>                        Require X86_PAE set on X86_32.
> 
> Patch 1-4 changes common files to support huge I/O mappings.  There
> is no change in the functionalities until HUGE_IOMAP is set in patch 7.
> 
> Patch 5,6 implement HAVE_ARCH_HUGE_VMAP and HUGE_IOMAP funcs on x86,
> and set HAVE_ARCH_HUGE_VMAP on x86.
> 
> Patch 7 adds HUGE_IOMAP to Kconfig, which is set to Y by default on
> x86.

What do other architectures need to do to utilize this?



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 0/7] Kernel huge I/O mapping support
  2015-02-23 20:22   ` Andrew Morton
@ 2015-02-23 23:54     ` Toshi Kani
  -1 siblings, 0 replies; 50+ messages in thread
From: Toshi Kani @ 2015-02-23 23:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel, Elliott

On Mon, 2015-02-23 at 12:22 -0800, Andrew Morton wrote:
> On Mon,  9 Feb 2015 15:45:28 -0700 Toshi Kani <toshi.kani@hp.com> wrote:
> 
> > ioremap() and its related interfaces are used to create I/O
> > mappings to memory-mapped I/O devices.  The mapping sizes of
> > the traditional I/O devices are relatively small.  Non-volatile
> > memory (NVM), however, has many GB and is going to have TB soon.
> > It is not very efficient to create large I/O mappings with 4KB. 
> 
> The changelogging is very good - thanks for taking the time to do this.
> 
> > This patchset extends the ioremap() interfaces to transparently
> > create I/O mappings with huge pages whenever possible.
> 
> I'm wondering if this is prudent.  Existing code which was tested with
> 4k mappings will magically start to use huge tlb mappings.  I don't
> know what could go wrong, but I'd prefer not to find out!  Wouldn't it
> be safer to make this an explicit opt-in?

There were related discussions on this.  This v2 patchset actually has
CONFIG_HUGE_IOMAP, which allows the user to select this feature.  As
suggested in the thread below, I am going to remove CONFIG_HUGE_IOMAP,
so that it will be simpler and similar to how we create huge mappings
for the kernel itself.  If bugs are found, they will be fixed.
https://lkml.org/lkml/2015/2/18/677

> What operations can presently be performed against an ioremapped area? 
> Can kernel code perform change_page_attr() against individual pages? 
> Can kernel code run iounmap() against just part of that region (I
> forget).  There does seem to be potential for breakage if we start
> using hugetlb mappings for such things?

Yes, kernel code can apply the CPA interfaces, such as set_memory_x() and
set_memory_ro(), to an ioremapped area.  CPA breaks a huge page into
smaller pages.  I have included them in my test cases and confirmed
they work.  (Note, memory type change interfaces, such as
set_memory_uc() and set_memory_wc(), are not supported on an ioremapped
area regardless of the page size.)
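
To illustrate the split described above, here is a minimal user-space
sketch, not kernel code.  It models one 2MB slot that is either a single
huge entry or 512 4KB entries, and shows that changing the attribute of
one 4KB page first splits the huge entry so the other pages keep the
original attribute.  The names struct pmd_slot, split_huge() and
set_page_ro() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PTRS_PER_PT 512	/* 4KB entries covered by one 2MB entry (x86-64, PAE) */

/* Toy model of one 2MB slot: a single huge entry or 512 small ones. */
struct pmd_slot {
	bool huge;			/* true while mapped by one huge entry */
	bool huge_ro;			/* attribute of the huge entry */
	bool pte_ro[PTRS_PER_PT];	/* per-page attributes after a split */
};

/* Split a huge entry into 4KB entries that inherit its attribute. */
static void split_huge(struct pmd_slot *s)
{
	if (!s->huge)
		return;
	for (size_t i = 0; i < PTRS_PER_PT; i++)
		s->pte_ro[i] = s->huge_ro;
	s->huge = false;
}

/* Mark one 4KB page read-only, splitting the huge entry first if needed. */
static void set_page_ro(struct pmd_slot *s, size_t idx)
{
	assert(idx < PTRS_PER_PT);
	split_huge(s);
	s->pte_ro[idx] = true;
}
```

The real CPA code is more involved (for example, it can keep a huge
entry when the attribute change covers the whole huge page), so this
only shows the basic idea of the split.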

iounmap() takes only a single argument, the virtual base address.  It
looks up the corresponding vm area object from that address, and always
removes the entire mapping.

> >  ioremap()
> > continues to use 4KB mappings when a huge page does not fit into
> > a requested range.  There is no change necessary to the drivers
> > using ioremap().  A requested physical address must be aligned by
> > a huge page size (1GB or 2MB on x86) for using huge page mapping,
> > though.  The kernel huge I/O mapping will improve performance of
> > NVM and other devices with large memory, and reduce the time to
> > create their mappings as well.
> > 
> > On x86, the huge I/O mapping may not be used when a target range is
> > covered by multiple MTRRs with different memory types.  The caller
> > must make a separate request for each MTRR range, or the huge I/O
> > mapping can be disabled with the kernel boot option "nohugeiomap".
> > The detail of this issue is described in the email below, and this
> > patch takes option C) in favor of simplicity since MTRRs are legacy
> > feature.
> >  https://lkml.org/lkml/2015/2/5/638
> 
> How is this mtrr clash handled?
> 
> - The iomap call will fail if there are any MTRRs covering the region?
> 
> - The iomap call will fail if there are more than one MTRRs covering
>   the region?
>
> - If the ioremap will succeed if a single MTRR covers the region,
>   must that MTRR cover the *entire* region?
> 
> - What happens if userspace tried fiddling the MTRRs after the region
>   has been established?
> 
> <reads the code>

This issue was also discussed in the same thread:
https://lkml.org/lkml/2015/2/18/677

I am going to implement option D -- the iomap call will fail if more
than one MTRR with "different types" covers the region.
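
As a rough illustration of the option D check, the following user-space
sketch reports a conflict when a requested range overlaps two or more
MTRRs whose memory types differ.  It is an assumption-laden model, not
the kernel implementation: struct mtrr_range and mtrr_types_conflict()
are hypothetical names, and the sketch ignores the default memory type
that applies to addresses no MTRR covers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy description of one variable-range MTRR; not the kernel's structure. */
struct mtrr_range {
	uint64_t base;
	uint64_t size;
	uint8_t type;	/* memory type code, e.g. 0 = UC, 6 = WB */
};

/*
 * Return true if [start, start + len) overlaps two or more MTRRs whose
 * memory types differ.  Hypothetical helper for illustration only.
 */
static bool mtrr_types_conflict(const struct mtrr_range *m, size_t n,
				uint64_t start, uint64_t len)
{
	bool seen = false;
	uint8_t first_type = 0;

	assert(len > 0);

	for (size_t i = 0; i < n; i++) {
		/* Skip MTRRs that do not intersect the request. */
		if (m[i].base >= start + len || m[i].base + m[i].size <= start)
			continue;
		if (!seen) {
			seen = true;
			first_type = m[i].type;
		} else if (m[i].type != first_type) {
			return true;
		}
	}
	return false;
}
```

With a WB MTRR covering the first 1GB and a UC MTRR covering the second
1GB, a request that straddles the 1GB boundary reports a conflict, while
a request that stays inside either MTRR does not.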

> Oh.  We don't do any checking at all.  We're just telling userspace
> programmers "don't do that".  hrm.  What are your thoughts on adding
> the overlap checks to the kernel?
>
> This adds more potential for breaking existing code, doesn't it?  If
> there's code which is using 4k ioremap on regions which are covered by
> mtrrs, the transparent switch to hugeptes will cause that code to enter
> the "undefined behaviour" space?

Yes, I agree with your concern, and I am going to add the check.  I do
not think any platform today will be affected by this change, though.

> > The patchset introduces the following configs:
> >  HUGE_IOMAP - When selected (default Y), enable huge I/O mappings.
> >               Require HAVE_ARCH_HUGE_VMAP set.
> >  HAVE_ARCH_HUGE_VMAP - Indicate arch supports huge KVA mappings.
> >                        Require X86_PAE set on X86_32.
> > 
> > Patch 1-4 change common files to support huge I/O mappings.  There
> > is no change in functionality until HUGE_IOMAP is set in patch 7.
> > 
> > Patch 5,6 implement HAVE_ARCH_HUGE_VMAP and HUGE_IOMAP funcs on x86,
> > and set HAVE_ARCH_HUGE_VMAP on x86.
> > 
> > Patch 7 adds HUGE_IOMAP to Kconfig, which is set to Y by default on
> > x86.
> 
> What do other architectures need to do to utilize this?

Other architectures can implement their version of patch 5/7 and 6/7 to
utilize this feature.

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 0/7] Kernel huge I/O mapping support
  2015-02-23 20:22   ` Andrew Morton
@ 2015-02-24  8:09     ` Ingo Molnar
  -1 siblings, 0 replies; 50+ messages in thread
From: Ingo Molnar @ 2015-02-24  8:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Toshi Kani, hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel, Elliott


* Andrew Morton <akpm@linux-foundation.org> wrote:

> <reads the code>
> 
> Oh.  We don't do any checking at all.  We're just telling 
> userspace programmers "don't do that".  hrm.  What are 
> your thoughts on adding the overlap checks to the kernel?

I have requested such sanity checking in previous review as 
well, it has to be made fool-proof for this optimization to 
be usable.

Another alternative would be to make this not a transparent 
optimization, but a separate API: ioremap_hugepage() or so.

The number of devices and drivers dealing with GBs of 
remapped pages is still relatively low, so they could make 
explicit use of the API and opt in to it.

What I was arguing against was to make it a CONFIG_ option: 
that achieves very little in practice, such APIs should be 
uniformly available.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 0/7] Kernel huge I/O mapping support
  2015-02-24  8:09     ` Ingo Molnar
@ 2015-03-02 15:51       ` Toshi Kani
  -1 siblings, 0 replies; 50+ messages in thread
From: Toshi Kani @ 2015-03-02 15:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, hpa, tglx, mingo, arnd, linux-mm, x86,
	linux-kernel, Elliott

On Tue, 2015-02-24 at 09:09 +0100, Ingo Molnar wrote:
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > <reads the code>
> > 
> > Oh.  We don't do any checking at all.  We're just telling 
> > userspace programmers "don't do that".  hrm.  What are 
> > your thoughts on adding the overlap checks to the kernel?
> 
> I have requested such sanity checking in previous review as 
> well, it has to be made fool-proof for this optimization to 
> be usable.
> 
> Another alternative would be to make this not a transparent 
> optimization, but a separate API: ioremap_hugepage() or so.
> 
> The number of devices and drivers dealing with GBs of 
> remapped pages is still relatively low, so they could make 
> explicit use of the API and opt in to it.
> 
> What I was arguing against was to make it a CONFIG_ option: 
> that achieves very little in practice, such APIs should be 
> uniformly available.

I was able to come up with simple changes that fall back to 4KB mappings
when a target range is covered by MTRRs.  So, with the changes, it is
now safe to enable huge page mappings in ioremap() transparently without
such a restriction.  I will post an updated patchset, hopefully soon.
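
The fallback rule can be sketched as a size-selection function.  This is
a user-space illustration under stated assumptions, not the code from
the patchset: pick_map_size() is a hypothetical name, the
covered_by_mtrr flag stands in for whatever MTRR lookup the real code
performs, and only the physical address alignment is checked (real code
also has to consider the virtual address).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)

/*
 * Hypothetical helper: choose the size of the next mapping entry for the
 * physical range [phys, phys + remaining).  A huge size is used only when
 * the address is aligned to it, the remaining length is large enough,
 * and the range is not covered by MTRRs; otherwise fall back to 4KB.
 */
static uint64_t pick_map_size(uint64_t phys, uint64_t remaining,
			      bool covered_by_mtrr)
{
	assert(remaining >= SZ_4K && (phys % SZ_4K) == 0);

	if (!covered_by_mtrr) {
		if ((phys % SZ_1G) == 0 && remaining >= SZ_1G)
			return SZ_1G;
		if ((phys % SZ_2M) == 0 && remaining >= SZ_2M)
			return SZ_2M;
	}
	return SZ_4K;
}
```

A caller would invoke this repeatedly, advancing phys and shrinking
remaining by the returned size, so a range that starts unaligned is
mapped with 4KB entries until it reaches a huge-page boundary.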

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 5/7] x86, mm: Support huge KVA mappings on x86
  2015-02-10 23:10             ` Toshi Kani
@ 2015-03-03  0:37               ` Toshi Kani
  -1 siblings, 0 replies; 50+ messages in thread
From: Toshi Kani @ 2015-03-03  0:37 UTC (permalink / raw)
  To: Dave Hansen
  Cc: akpm, hpa, tglx, mingo, arnd, linux-mm, x86, linux-kernel, Elliott

On Tue, 2015-02-10 at 16:10 -0700, Toshi Kani wrote:
> On Tue, 2015-02-10 at 15:13 -0700, Toshi Kani wrote:
> > On Tue, 2015-02-10 at 12:51 -0800, Dave Hansen wrote:
> > > On 02/10/2015 12:42 PM, Toshi Kani wrote:
> > > > On Tue, 2015-02-10 at 10:59 -0800, Dave Hansen wrote:
> > > >> On 02/09/2015 02:45 PM, Toshi Kani wrote:
> > > >>> Implement huge KVA mapping interfaces on x86.  Select
> > > >>> HAVE_ARCH_HUGE_VMAP when X86_64 or X86_32 with X86_PAE is set.
> > > >>> Without X86_PAE set, the X86_32 kernel has the 2-level page
> > > >>> tables and cannot provide the huge KVA mappings.
> > > >>
> > > >> Not that it's a big deal, but what's the limitation with the 2-level
> > > >> page tables on 32-bit?  We have a 4MB large page size available there
> > > >> and we already use it for the kernel linear mapping.
> > > > 
> > > > ioremap() calls arch-neutral ioremap_page_range() to set up I/O mappings
> > > > with PTEs.  This patch-set enables ioremap_page_range() to set up PUD &
> > > > PMD mappings.  With 2-level page table, I do not think this PUD/PMD
> > > > mapping code works unless we add some special code.
> > > 
> > > What actually breaks, though?
> > > 
> > > Can't you just disable the pud code via ioremap_pud_enabled()?
> > 
> > That's what v1 did, and I found in testing that the PMD mapping code did
> > not work when PAE was unset.  I think we need special handling similar
> > to one_md_table_init(), which returns pgd as pmd in case of non-PAE.
> > ioremap_page_range() does not have such handling and I thought it would
> > not be worth adding it.
> 
> Actually pud_alloc() and pmd_alloc() should carry pgd in this case...  I
> will look into the problem to see why it did not work when PAE was
> unset.

I have looked at this case, 32-bit without PAE, and confirmed that it
sets the pgd properly.  The crash utility can translate an address with
the mapping as well.  However, something is still missing in the code:
the kernel cannot access a page through the mapping (it takes a page
fault).  I tried a TLB flush, but it did not help, either.  Since this
config is unlikely to benefit from this feature, I will have to continue
to disable this case.  I hope that is OK.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2015-03-03  0:37 UTC | newest]

Thread overview: 50+ messages
2015-02-09 22:45 [PATCH v2 0/7] Kernel huge I/O mapping support Toshi Kani
2015-02-09 22:45 ` Toshi Kani
2015-02-09 22:45 ` [PATCH v2 1/7] mm: Change __get_vm_area_node() to use fls_long() Toshi Kani
2015-02-09 22:45   ` Toshi Kani
2015-02-09 22:45 ` [PATCH v2 2/7] lib: Add huge I/O map capability interfaces Toshi Kani
2015-02-09 22:45   ` Toshi Kani
2015-02-09 22:45 ` [PATCH v2 3/7] mm: Change ioremap to set up huge I/O mappings Toshi Kani
2015-02-09 22:45   ` Toshi Kani
2015-02-09 22:45 ` [PATCH v2 4/7] mm: Change vunmap to tear down huge KVA mappings Toshi Kani
2015-02-09 22:45   ` Toshi Kani
2015-02-09 22:45 ` [PATCH v2 5/7] x86, mm: Support huge KVA mappings on x86 Toshi Kani
2015-02-09 22:45   ` Toshi Kani
2015-02-10 18:59   ` Dave Hansen
2015-02-10 18:59     ` Dave Hansen
2015-02-10 20:42     ` Toshi Kani
2015-02-10 20:42       ` Toshi Kani
2015-02-10 20:51       ` Dave Hansen
2015-02-10 20:51         ` Dave Hansen
2015-02-10 22:13         ` Toshi Kani
2015-02-10 22:13           ` Toshi Kani
2015-02-10 22:20           ` Toshi Kani
2015-02-10 22:20             ` Toshi Kani
2015-02-10 23:10           ` Toshi Kani
2015-02-10 23:10             ` Toshi Kani
2015-03-03  0:37             ` Toshi Kani
2015-03-03  0:37               ` Toshi Kani
2015-02-09 22:45 ` [PATCH v2 6/7] x86, mm: Support huge I/O " Toshi Kani
2015-02-09 22:45   ` Toshi Kani
2015-02-18 20:44   ` Ingo Molnar
2015-02-18 20:44     ` Ingo Molnar
2015-02-18 21:13     ` Toshi Kani
2015-02-18 21:13       ` Toshi Kani
2015-02-18 21:15       ` Ingo Molnar
2015-02-18 21:15         ` Ingo Molnar
2015-02-18 21:33         ` Toshi Kani
2015-02-18 21:33           ` Toshi Kani
2015-02-18 21:57           ` Ingo Molnar
2015-02-18 21:57             ` Ingo Molnar
2015-02-18 22:14             ` Toshi Kani
2015-02-18 22:14               ` Toshi Kani
2015-02-09 22:45 ` [PATCH v2 7/7] mm: Add config HUGE_IOMAP to enable huge I/O mappings Toshi Kani
2015-02-09 22:45   ` Toshi Kani
2015-02-23 20:22 ` [PATCH v2 0/7] Kernel huge I/O mapping support Andrew Morton
2015-02-23 20:22   ` Andrew Morton
2015-02-23 23:54   ` Toshi Kani
2015-02-23 23:54     ` Toshi Kani
2015-02-24  8:09   ` Ingo Molnar
2015-02-24  8:09     ` Ingo Molnar
2015-03-02 15:51     ` Toshi Kani
2015-03-02 15:51       ` Toshi Kani
