* [PATCH v1 00/12] mm: THP migration support
@ 2016-09-26 15:22 ` zi.yan
  0 siblings, 0 replies; 34+ messages in thread
From: zi.yan @ 2016-09-26 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: benh, mgorman, kirill.shutemov, akpm, dave.hansen, n-horiguchi, Zi Yan

From: Zi Yan <zi.yan@cs.rutgers.edu>

Hi all,

This patchset is based on Naoya Horiguchi's page migration enhancement for THP
patchset, with additional IBM ppc64 support, and is rebased on the latest
upstream commit.

The motivation is that 4KB page migration underutilizes the memory bandwidth
compared to 2MB THP migration.

As part of my internship work at NVIDIA, I compared the bandwidth utilization
of migrating 512 4KB pages versus a single 2MB THP on both x86_64 and ppc64.
The results show that migrating 512 4KB pages takes 3x and 1.15x the time of
migrating a single 2MB THP on x86_64 and ppc64, respectively.

Here are the actual BW numbers (total_data_size/migration_time):
        | 512 4KB pages | 1 2MB THP  |  1 4KB page
x86_64  |  0.98GB/s     |  2.97GB/s  |   0.06GB/s
ppc64   |  6.14GB/s     |  7.10GB/s  |   1.24GB/s
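
(The time ratios quoted above follow directly from these numbers: both cases
move the same 2MB of data, so the time ratio is just the inverse bandwidth
ratio, i.e. 2.97/0.98 ~= 3.0x on x86_64 and 7.10/6.14 ~= 1.16x on ppc64.)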

Any comments or advice are welcome.

Here is the original message from Naoya:

This patchset enhances the page migration functionality to handle thp migration
for the various callers of page migration:
 - mbind(2)
 - move_pages(2)
 - migrate_pages(2)
 - cgroup/cpuset migration
 - memory hotremove
 - soft offline

The main benefit is that we can avoid unnecessary thp splits, which helps avoid
a performance decrease when applications handle NUMA optimization on their own.

The implementation is similar to that of normal page migration; the key point
is that we convert a pmd into a pmd migration entry, which uses a swap-entry-like
format. Note that pmd_present() is not a simple present-bit check, and it is not
enough by itself to determine whether a given pmd is a pmd migration entry.
See patch 3/11 and 5/11 for details.
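
As a rough illustration of that idea (this just restates the helper that
patch 03/12 adds; see that patch for the real definitions):

	/*
	 * A pmd migration entry is a non-present huge pmd whose bits encode
	 * a swap-entry-style migration entry pointing at the thp being moved.
	 */
	static inline int is_pmd_migration_entry(pmd_t pmd)
	{
		return !__pmd_present(pmd) &&
			is_migration_entry(pmd_to_swp_entry(pmd));
	}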

Here are some topics which might be helpful to start the discussion:

- at this point, this functionality is limited to x86_64.

- there is already an implementation of thp migration in the autonuma code,
  which this patchset does not touch because it works fine as it is.

- fallback to thp split: the current implementation simply fails a migration
  attempt if thp migration fails. It is possible to retry the migration after
  splitting the thp, but that is not included in this version.



Thanks,
Zi Yan
---

Naoya Horiguchi (11):
  mm: mempolicy: add queue_pages_node_check()
  mm: thp: introduce CONFIG_ARCH_ENABLE_THP_MIGRATION
  mm: thp: add helpers related to thp/pmd migration
  mm: thp: enable thp migration in generic path
  mm: thp: check pmd migration entry in common path
  mm: soft-dirty: keep soft-dirty bits over thp migration
  mm: hwpoison: fix race between unpoisoning and freeing migrate source
    page
  mm: hwpoison: soft offline supports thp migration
  mm: mempolicy: mbind and migrate_pages support thp migration
  mm: migrate: move_pages() supports thp migration
  mm: memory_hotplug: memory hotremove supports thp migration

Zi Yan (1):
  mm: ppc64: Add THP migration support for ppc64.

 arch/powerpc/Kconfig                         |   4 +
 arch/powerpc/include/asm/book3s/64/pgtable.h |  23 ++++
 arch/x86/Kconfig                             |   4 +
 arch/x86/include/asm/pgtable.h               |  28 ++++
 arch/x86/include/asm/pgtable_64.h            |   2 +
 arch/x86/include/asm/pgtable_types.h         |   8 +-
 arch/x86/mm/gup.c                            |   3 +
 fs/proc/task_mmu.c                           |  20 +--
 include/asm-generic/pgtable.h                |  34 ++++-
 include/linux/huge_mm.h                      |  13 ++
 include/linux/swapops.h                      |  64 ++++++++++
 mm/Kconfig                                   |   3 +
 mm/gup.c                                     |   8 ++
 mm/huge_memory.c                             | 184 +++++++++++++++++++++++++--
 mm/memcontrol.c                              |   2 +
 mm/memory-failure.c                          |  41 +++---
 mm/memory.c                                  |   5 +
 mm/memory_hotplug.c                          |   8 ++
 mm/mempolicy.c                               | 108 ++++++++++++----
 mm/migrate.c                                 |  49 ++++++-
 mm/page_isolation.c                          |   9 ++
 mm/rmap.c                                    |   5 +
 22 files changed, 549 insertions(+), 76 deletions(-)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v1 01/12] mm: mempolicy: add queue_pages_node_check()
  2016-09-26 15:22 ` zi.yan
@ 2016-09-26 15:22   ` zi.yan
  -1 siblings, 0 replies; 34+ messages in thread
From: zi.yan @ 2016-09-26 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: benh, mgorman, kirill.shutemov, akpm, dave.hansen, n-horiguchi

From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Introduce a separate check routine for the MPOL_MF_INVERT flag. This patch is
just a cleanup; there is no behavioral change.
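
Concretely, the helper returns true when a page should be skipped for the given
nodemask: without MPOL_MF_INVERT, pages whose node is outside the nodemask are
skipped; with MPOL_MF_INVERT, pages whose node is inside the nodemask are
skipped. The check being factored out, condensed:

	/* true == skip this page (its node/mask relation does not match the request) */
	node_isset(page_to_nid(page), *qp->nmask) == !!(qp->flags & MPOL_MF_INVERT)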

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/mempolicy.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 2da72a5..dc8e913 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -475,6 +475,15 @@ struct queue_pages {
 	struct vm_area_struct *prev;
 };
 
+static inline bool queue_pages_node_check(struct page *page,
+					struct queue_pages *qp)
+{
+	int nid = page_to_nid(page);
+	unsigned long flags = qp->flags;
+
+	return node_isset(nid, *qp->nmask) == !!(flags & MPOL_MF_INVERT);
+}
+
 /*
  * Scan through pages checking if pages follow certain conditions,
  * and move them to the pagelist if they do.
@@ -528,8 +537,7 @@ retry:
 		 */
 		if (PageReserved(page))
 			continue;
-		nid = page_to_nid(page);
-		if (node_isset(nid, *qp->nmask) == !!(flags & MPOL_MF_INVERT))
+		if (queue_pages_node_check(page, qp))
 			continue;
 		if (PageTransCompound(page)) {
 			get_page(page);
@@ -561,7 +569,6 @@ static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
 #ifdef CONFIG_HUGETLB_PAGE
 	struct queue_pages *qp = walk->private;
 	unsigned long flags = qp->flags;
-	int nid;
 	struct page *page;
 	spinlock_t *ptl;
 	pte_t entry;
@@ -571,8 +578,7 @@ static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
 	if (!pte_present(entry))
 		goto unlock;
 	page = pte_page(entry);
-	nid = page_to_nid(page);
-	if (node_isset(nid, *qp->nmask) == !!(flags & MPOL_MF_INVERT))
+	if (queue_pages_node_check(page, qp))
 		goto unlock;
 	/* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
 	if (flags & (MPOL_MF_MOVE_ALL) ||
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v1 02/12] mm: thp: introduce CONFIG_ARCH_ENABLE_THP_MIGRATION
  2016-09-26 15:22 ` zi.yan
@ 2016-09-26 15:22   ` zi.yan
  -1 siblings, 0 replies; 34+ messages in thread
From: zi.yan @ 2016-09-26 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: benh, mgorman, kirill.shutemov, akpm, dave.hansen, n-horiguchi

From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Introduce CONFIG_ARCH_ENABLE_THP_MIGRATION to limit the thp migration
functionality to x86_64, which should be safer as a first step.
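
For illustration, a get_new_page() callback could use the new helper to decide
whether to hand back a thp destination. This is only a sketch of the intended
usage (the function name, the allocation flags, and the prep_transhuge_page()
call here are assumptions, not part of this patch); the per-caller wiring comes
in the later patches of the series:

	static struct page *new_page(struct page *page, unsigned long node,
				     int **result)
	{
		if (thp_migration_supported() && PageTransHuge(page)) {
			struct page *thp;

			/* try to allocate the destination as a thp on 'node' */
			thp = alloc_pages_node((int)node, GFP_TRANSHUGE,
					       HPAGE_PMD_ORDER);
			if (!thp)
				return NULL;	/* caller falls back to splitting */
			prep_transhuge_page(thp);
			return thp;
		}
		/* base page destination, as before */
		return __alloc_pages_node((int)node,
					  GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, 0);
	}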

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 arch/x86/Kconfig        |  4 ++++
 include/linux/huge_mm.h | 10 ++++++++++
 mm/Kconfig              |  3 +++
 3 files changed, 17 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2a1f0ce..ad99c05 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2234,6 +2234,10 @@ config ARCH_ENABLE_HUGEPAGE_MIGRATION
 	def_bool y
 	depends on X86_64 && HUGETLB_PAGE && MIGRATION
 
+config ARCH_ENABLE_THP_MIGRATION
+	def_bool y
+	depends on X86_64 && TRANSPARENT_HUGEPAGE && MIGRATION
+
 menu "Power management and ACPI options"
 
 config ARCH_HIBERNATION_HEADER
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 6f14de4..4ae156e 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -157,6 +157,11 @@ void put_huge_zero_page(void);
 
 #define mk_huge_pmd(page, prot) pmd_mkhuge(mk_pmd(page, prot))
 
+static inline bool thp_migration_supported(void)
+{
+	return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
+}
+
 #else /* CONFIG_TRANSPARENT_HUGEPAGE */
 #define HPAGE_PMD_SHIFT ({ BUILD_BUG(); 0; })
 #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; })
@@ -223,6 +228,11 @@ static inline struct page *follow_devmap_pmd(struct vm_area_struct *vma,
 {
 	return NULL;
 }
+
+static inline bool thp_migration_supported(void)
+{
+	return false;
+}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index be0ee11..1965310 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -289,6 +289,9 @@ config MIGRATION
 config ARCH_ENABLE_HUGEPAGE_MIGRATION
 	bool
 
+config ARCH_ENABLE_THP_MIGRATION
+	bool
+
 config PHYS_ADDR_T_64BIT
 	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v1 03/12] mm: thp: add helpers related to thp/pmd migration
  2016-09-26 15:22 ` zi.yan
@ 2016-09-26 15:22   ` zi.yan
  -1 siblings, 0 replies; 34+ messages in thread
From: zi.yan @ 2016-09-26 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: benh, mgorman, kirill.shutemov, akpm, dave.hansen, n-horiguchi

From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

This patch prepares the core code for thp migration. This code will be enabled
once unmap_and_move() stops unconditionally splitting thp and get_new_page()
starts to allocate destination thps.
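
Condensed, the lifecycle of a pmd migration entry added below is: encode the
huge pmd as a non-present, swap-entry-style pmd while the thp is being copied,
then decode it and install the new thp afterwards (this snippet only stitches
together calls from this patch; see the functions below for the real code):

	/* unmap side (set_pmd_migration_entry) */
	entry  = make_migration_entry(page, pmd_write(pmdval));
	pmdswp = pmd_mkhuge(swp_entry_to_pmd(entry));
	set_pmd_at(mm, addr, pmd, pmdswp);

	/* restore side (remove_migration_pmd) */
	entry = pmd_to_swp_entry(*pmd);
	if (is_migration_entry(entry) && migration_entry_to_page(entry) == old)
		set_pmd_at(mm, addr, pmd, mk_huge_pmd(new, vma->vm_page_prot));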

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 arch/x86/include/asm/pgtable.h    | 11 ++++++
 arch/x86/include/asm/pgtable_64.h |  2 +
 include/linux/swapops.h           | 62 +++++++++++++++++++++++++++++++
 mm/huge_memory.c                  | 77 +++++++++++++++++++++++++++++++++++++++
 mm/migrate.c                      | 23 ++++++++++++
 5 files changed, 175 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 437feb4..5ff861f 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -530,6 +530,17 @@ static inline int pmd_present(pmd_t pmd)
 	return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE);
 }
 
+/*
+ * Unlike pmd_present(), __pmd_present() checks only _PAGE_PRESENT bit.
+ * Combined with is_migration_entry(), this routine is used to detect pmd
+ * migration entries. To make it work fine, callers should make sure that
+ * pmd_trans_huge() returns true beforehand.
+ */
+static inline int __pmd_present(pmd_t pmd)
+{
+	return pmd_flags(pmd) & _PAGE_PRESENT;
+}
+
 #ifdef CONFIG_NUMA_BALANCING
 /*
  * These work without NUMA balancing but the kernel does not care. See the
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 1cc82ec..3a1b48e 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -167,7 +167,9 @@ static inline int pgd_large(pgd_t pgd) { return 0; }
 					 ((type) << (SWP_TYPE_FIRST_BIT)) \
 					 | ((offset) << SWP_OFFSET_FIRST_BIT) })
 #define __pte_to_swp_entry(pte)		((swp_entry_t) { pte_val((pte)) })
+#define __pmd_to_swp_entry(pmd)		((swp_entry_t) { pmd_val((pmd)) })
 #define __swp_entry_to_pte(x)		((pte_t) { .pte = (x).val })
+#define __swp_entry_to_pmd(x)		((pmd_t) { .pmd = (x).val })
 
 extern int kern_addr_valid(unsigned long addr);
 extern void cleanup_highmap(void);
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 5c3a5f3..b402a2c 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -163,6 +163,68 @@ static inline int is_write_migration_entry(swp_entry_t entry)
 
 #endif
 
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+extern int set_pmd_migration_entry(struct page *page,
+		struct mm_struct *mm, unsigned long address);
+
+extern int remove_migration_pmd(struct page *new,
+		struct vm_area_struct *vma, unsigned long addr, void *old);
+
+extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd);
+
+static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
+{
+	swp_entry_t arch_entry;
+
+	arch_entry = __pmd_to_swp_entry(pmd);
+	return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
+}
+
+static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
+{
+	swp_entry_t arch_entry;
+
+	arch_entry = __swp_entry(swp_type(entry), swp_offset(entry));
+	return __swp_entry_to_pmd(arch_entry);
+}
+
+static inline int is_pmd_migration_entry(pmd_t pmd)
+{
+	return !__pmd_present(pmd) && is_migration_entry(pmd_to_swp_entry(pmd));
+}
+#else
+static inline int set_pmd_migration_entry(struct page *page,
+				struct mm_struct *mm, unsigned long address)
+{
+	return 0;
+}
+
+static inline int remove_migration_pmd(struct page *new,
+		struct vm_area_struct *vma, unsigned long addr, void *old)
+{
+	return 0;
+}
+
+static inline void pmd_migration_entry_wait(struct mm_struct *m, pmd_t *p) { }
+
+static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
+{
+	return swp_entry(0, 0);
+}
+
+static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
+{
+	pmd_t pmd = {};
+
+	return pmd;
+}
+
+static inline int is_pmd_migration_entry(pmd_t pmd)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_MEMORY_FAILURE
 
 extern atomic_long_t num_poisoned_pages __read_mostly;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a6abd76..0cd39ef 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2252,3 +2252,80 @@ static int __init split_huge_pages_debugfs(void)
 }
 late_initcall(split_huge_pages_debugfs);
 #endif
+
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+int set_pmd_migration_entry(struct page *page, struct mm_struct *mm,
+				unsigned long addr)
+{
+	pte_t *pte;
+	pmd_t *pmd;
+	pmd_t pmdval;
+	pmd_t pmdswp;
+	swp_entry_t entry;
+	spinlock_t *ptl;
+
+	mmu_notifier_invalidate_range_start(mm, addr, addr + HPAGE_PMD_SIZE);
+	if (!page_check_address_transhuge(page, mm, addr, &pmd, &pte, &ptl))
+		goto out;
+	if (pte)
+		goto out;
+	pmdval = pmdp_huge_get_and_clear(mm, addr, pmd);
+	entry = make_migration_entry(page, pmd_write(pmdval));
+	pmdswp = swp_entry_to_pmd(entry);
+	pmdswp = pmd_mkhuge(pmdswp);
+	set_pmd_at(mm, addr, pmd, pmdswp);
+	page_remove_rmap(page, true);
+	put_page(page);
+	spin_unlock(ptl);
+out:
+	mmu_notifier_invalidate_range_end(mm, addr, addr + HPAGE_PMD_SIZE);
+	return SWAP_AGAIN;
+}
+
+int remove_migration_pmd(struct page *new, struct vm_area_struct *vma,
+			unsigned long addr, void *old)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	spinlock_t *ptl;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pmd_t pmde;
+	swp_entry_t entry;
+	unsigned long mmun_start = addr & HPAGE_PMD_MASK;
+	unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE;
+
+	pgd = pgd_offset(mm, addr);
+	if (!pgd_present(*pgd))
+		goto out;
+	pud = pud_offset(pgd, addr);
+	if (!pud_present(*pud))
+		goto out;
+	pmd = pmd_offset(pud, addr);
+	if (!pmd)
+		goto out;
+	ptl = pmd_lock(mm, pmd);
+	pmde = *pmd;
+	if (!is_pmd_migration_entry(pmde))
+		goto unlock_ptl;
+	entry = pmd_to_swp_entry(pmde);
+	if (migration_entry_to_page(entry) != old)
+		goto unlock_ptl;
+	get_page(new);
+	pmde = mk_huge_pmd(new, vma->vm_page_prot);
+	if (is_write_migration_entry(entry))
+		pmde = maybe_pmd_mkwrite(pmde, vma);
+	flush_cache_range(vma, mmun_start, mmun_end);
+	page_add_anon_rmap(new, vma, mmun_start, true);
+	pmdp_huge_clear_flush_notify(vma, mmun_start, pmd);
+	set_pmd_at(mm, mmun_start, pmd, pmde);
+	flush_tlb_range(vma, mmun_start, mmun_end);
+	if (vma->vm_flags & VM_LOCKED)
+		mlock_vma_page(new);
+	update_mmu_cache_pmd(vma, addr, pmd);
+unlock_ptl:
+	spin_unlock(ptl);
+out:
+	return SWAP_AGAIN;
+}
+#endif
diff --git a/mm/migrate.c b/mm/migrate.c
index f7ee04a..95613e7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -207,6 +207,8 @@ static int remove_migration_pte(struct page *new, struct vm_area_struct *vma,
 		if (!ptep)
 			goto out;
 		ptl = huge_pte_lockptr(hstate_vma(vma), mm, ptep);
+	} else if (PageTransHuge(new)) {
+		return remove_migration_pmd(new, vma, addr, old);
 	} else {
 		pmd = mm_find_pmd(mm, addr);
 		if (!pmd)
@@ -344,6 +346,27 @@ void migration_entry_wait_huge(struct vm_area_struct *vma,
 	__migration_entry_wait(mm, pte, ptl);
 }
 
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
+{
+	spinlock_t *ptl;
+	struct page *page;
+
+	ptl = pmd_lock(mm, pmd);
+	if (!is_pmd_migration_entry(*pmd))
+		goto unlock;
+	page = migration_entry_to_page(pmd_to_swp_entry(*pmd));
+	if (!get_page_unless_zero(page))
+		goto unlock;
+	spin_unlock(ptl);
+	wait_on_page_locked(page);
+	put_page(page);
+	return;
+unlock:
+	spin_unlock(ptl);
+}
+#endif
+
 #ifdef CONFIG_BLOCK
 /* Returns true if all buffers are successfully locked */
 static bool buffer_migrate_lock_buffers(struct buffer_head *head,
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v1 04/12] mm: thp: enable thp migration in generic path
  2016-09-26 15:22 ` zi.yan
@ 2016-09-26 15:22   ` zi.yan
  -1 siblings, 0 replies; 34+ messages in thread
From: zi.yan @ 2016-09-26 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: benh, mgorman, kirill.shutemov, akpm, dave.hansen, n-horiguchi

From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

This patch makes it possible to support thp migration gradually. If allocating
a destination thp fails, the source thp is simply split, as we do now, and the
normal page migration path is taken. If a destination thp is allocated
successfully, thp migration is used. Subsequent patches actually enable thp
migration for each caller of page migration by allowing its get_new_page()
callback to allocate thps.
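
In other words, unmap_and_move() now only falls back to splitting when the
destination it was given is not a thp; roughly (condensed from the hunk below):

	if (PageTransHuge(page) && !PageTransHuge(newpage)) {
		/* base-page destination: split and do normal page migration */
		lock_page(page);
		rc = split_huge_page(page);
		unlock_page(page);
		if (rc)
			goto out;
	}
	/* thp destination: proceed with pmd-level (thp) migration */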

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/migrate.c | 2 +-
 mm/rmap.c    | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 95613e7..dfca530 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1123,7 +1123,7 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 		goto out;
 	}
 
-	if (unlikely(PageTransHuge(page))) {
+	if (unlikely(PageTransHuge(page) && !PageTransHuge(newpage))) {
 		lock_page(page);
 		rc = split_huge_page(page);
 		unlock_page(page);
diff --git a/mm/rmap.c b/mm/rmap.c
index 1ef3640..d53fff5 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1443,6 +1443,11 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	struct rmap_private *rp = arg;
 	enum ttu_flags flags = rp->flags;
 
+	if (!PageHuge(page) && PageTransHuge(page)) {
+		VM_BUG_ON_PAGE(!(flags & TTU_MIGRATION), page);
+		return set_pmd_migration_entry(page, mm, address);
+	}
+
 	/* munlock has nothing to gain from examining un-locked vmas */
 	if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
 		goto out;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v1 05/12] mm: thp: check pmd migration entry in common path
  2016-09-26 15:22 ` zi.yan
@ 2016-09-26 15:22   ` zi.yan
  -1 siblings, 0 replies; 34+ messages in thread
From: zi.yan @ 2016-09-26 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: benh, mgorman, kirill.shutemov, akpm, dave.hansen, n-horiguchi, Zi Yan

From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Once any caller of page migration starts to handle thp, the memory management
code will start to see pmd migration entries, so we need to prepare for them
before enabling thp migration. This patch changes the various code points that
check the status of a given pmd, in order to prevent races between thp
migration and pmd-related work.
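
The common pattern at each of these sites is to test for a pmd migration entry
under the pmd lock before dereferencing pmd_page(), and either bail out or wait
for the migration to finish; schematically (condensed from the hunks below):

	ptl = pmd_lock(mm, pmd);
	if (unlikely(is_pmd_migration_entry(*pmd))) {
		/* thp is mid-migration: do not touch pmd_page(*pmd) */
		spin_unlock(ptl);
		return 0;	/* or pmd_migration_entry_wait(mm, pmd) in the fault path */
	}
	/* safe to use pmd_page(*pmd) here */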

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
---
 arch/x86/mm/gup.c  |  3 +++
 fs/proc/task_mmu.c | 20 +++++++-------
 mm/gup.c           |  8 ++++++
 mm/huge_memory.c   | 76 +++++++++++++++++++++++++++++++++++++++++++++++-------
 mm/memcontrol.c    |  2 ++
 mm/memory.c        |  5 ++++
 6 files changed, 95 insertions(+), 19 deletions(-)

diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index b8b6a60..72d0bef 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -10,6 +10,7 @@
 #include <linux/highmem.h>
 #include <linux/swap.h>
 #include <linux/memremap.h>
+#include <linux/swapops.h>
 
 #include <asm/mmu_context.h>
 #include <asm/pgtable.h>
@@ -225,6 +226,8 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
 		if (pmd_none(pmd))
 			return 0;
 		if (unlikely(pmd_large(pmd) || !pmd_present(pmd))) {
+			if (unlikely(is_pmd_migration_entry(pmd)))
+				return 0;
 			/*
 			 * NUMA hinting faults need to be handled in the GUP
 			 * slowpath for accounting purposes and so that they
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f6fa99e..60f6ce3 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -931,6 +931,9 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
+		if (unlikely(is_pmd_migration_entry(*pmd)))
+			goto out;
+
 		if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
 			clear_soft_dirty_pmd(vma, addr, pmd);
 			goto out;
@@ -1215,19 +1218,18 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 	if (ptl) {
 		u64 flags = 0, frame = 0;
 		pmd_t pmd = *pmdp;
+		struct page *page;
 
 		if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(pmd))
 			flags |= PM_SOFT_DIRTY;
 
-		/*
-		 * Currently pmd for thp is always present because thp
-		 * can not be swapped-out, migrated, or HWPOISONed
-		 * (split in such cases instead.)
-		 * This if-check is just to prepare for future implementation.
-		 */
-		if (pmd_present(pmd)) {
-			struct page *page = pmd_page(pmd);
-
+		if (is_pmd_migration_entry(pmd)) {
+			swp_entry_t entry = pmd_to_swp_entry(pmd);
+			frame = swp_type(entry) |
+				(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
+			page = migration_entry_to_page(entry);
+		} else if (pmd_present(pmd)) {
+			page = pmd_page(pmd);
 			if (page_mapcount(page) == 1)
 				flags |= PM_MMAP_EXCLUSIVE;
 
diff --git a/mm/gup.c b/mm/gup.c
index 96b2b2f..ef56be2 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -272,6 +272,11 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
 		spin_unlock(ptl);
 		return follow_page_pte(vma, address, pmd, flags);
 	}
+	if (is_pmd_migration_entry(*pmd)) {
+		spin_unlock(ptl);
+		return no_page_table(vma, flags);
+	}
+
 	if (flags & FOLL_SPLIT) {
 		int ret;
 		page = pmd_page(*pmd);
@@ -1362,6 +1367,9 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
 			return 0;
 
 		if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) {
+			if (unlikely(is_pmd_migration_entry(pmd)))
+				return 0;
+
 			/*
 			 * NUMA hinting faults need to be handled in the GUP
 			 * slowpath for accounting purposes and so that they
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0cd39ef..f4fcfc7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -787,6 +787,19 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		goto out_unlock;
 	}
 
+	if (unlikely(is_pmd_migration_entry(pmd))) {
+		swp_entry_t entry = pmd_to_swp_entry(pmd);
+
+		if (is_write_migration_entry(entry)) {
+			make_migration_entry_read(&entry);
+			pmd = swp_entry_to_pmd(entry);
+			set_pmd_at(src_mm, addr, src_pmd, pmd);
+		}
+		set_pmd_at(dst_mm, addr, dst_pmd, pmd);
+		ret = 0;
+		goto out_unlock;
+	}
+
 	src_page = pmd_page(pmd);
 	VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
 	get_page(src_page);
@@ -952,6 +965,9 @@ int do_huge_pmd_wp_page(struct fault_env *fe, pmd_t orig_pmd)
 	if (unlikely(!pmd_same(*fe->pmd, orig_pmd)))
 		goto out_unlock;
 
+	if (unlikely(is_pmd_migration_entry(*fe->pmd)))
+		goto out_unlock;
+
 	page = pmd_page(orig_pmd);
 	VM_BUG_ON_PAGE(!PageCompound(page) || !PageHead(page), page);
 	/*
@@ -1077,7 +1093,15 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 	if ((flags & FOLL_NUMA) && pmd_protnone(*pmd))
 		goto out;
 
-	page = pmd_page(*pmd);
+	if (is_pmd_migration_entry(*pmd)) {
+		swp_entry_t entry;
+		entry = pmd_to_swp_entry(*pmd);
+		if (!is_migration_entry(entry))
+			goto out;
+		page = pfn_to_page(swp_offset(entry));
+	} else
+		page = pmd_page(*pmd);
+
 	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
 	if (flags & FOLL_TOUCH)
 		touch_pmd(vma, addr, pmd);
@@ -1273,6 +1297,9 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	if (is_huge_zero_pmd(orig_pmd))
 		goto out;
 
+	if (unlikely(is_pmd_migration_entry(orig_pmd)))
+		goto out;
+
 	page = pmd_page(orig_pmd);
 	/*
 	 * If other processes are mapping this page, we couldn't discard
@@ -1348,21 +1375,40 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		spin_unlock(ptl);
 		tlb_remove_page(tlb, pmd_page(orig_pmd));
 	} else {
-		struct page *page = pmd_page(orig_pmd);
-		page_remove_rmap(page, true);
-		VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
-		VM_BUG_ON_PAGE(!PageHead(page), page);
-		if (PageAnon(page)) {
+		struct page *page;
+		int migration = 0;
+
+		if (!is_pmd_migration_entry(orig_pmd)) {
+			page = pmd_page(orig_pmd);
+			page_remove_rmap(page, true);
+			VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
+			VM_BUG_ON_PAGE(!PageHead(page), page);
+			if (PageAnon(page)) {
+				pgtable_t pgtable;
+				pgtable = pgtable_trans_huge_withdraw(tlb->mm, pmd);
+				pte_free(tlb->mm, pgtable);
+				atomic_long_dec(&tlb->mm->nr_ptes);
+				add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+			} else {
+				add_mm_counter(tlb->mm, MM_FILEPAGES, -HPAGE_PMD_NR);
+			}
+		} else {
+			swp_entry_t entry;
 			pgtable_t pgtable;
+
+			entry = pmd_to_swp_entry(orig_pmd);
+			free_swap_and_cache(entry); /* warn on failure? */
+
+			add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
 			pgtable = pgtable_trans_huge_withdraw(tlb->mm, pmd);
 			pte_free(tlb->mm, pgtable);
 			atomic_long_dec(&tlb->mm->nr_ptes);
-			add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
-		} else {
-			add_mm_counter(tlb->mm, MM_FILEPAGES, -HPAGE_PMD_NR);
+
+			migration = 1;
 		}
 		spin_unlock(ptl);
-		tlb_remove_page_size(tlb, page, HPAGE_PMD_SIZE);
+		if (!migration)
+			tlb_remove_page_size(tlb, page, HPAGE_PMD_SIZE);
 	}
 	return 1;
 }
@@ -1445,6 +1491,11 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 			return ret;
 		}
 
+		if (is_pmd_migration_entry(*pmd)) {
+			spin_unlock(ptl);
+			return ret;
+		}
+
 		if (!prot_numa || !pmd_protnone(*pmd)) {
 			entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
 			entry = pmd_modify(entry, newprot);
@@ -1656,6 +1707,11 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 
 	if (pmd_trans_huge(*pmd)) {
 		page = pmd_page(*pmd);
+
+		if (is_pmd_migration_entry(*pmd)) {
+			goto out;
+		}
+
 		if (PageMlocked(page))
 			clear_page_mlock(page);
 	} else if (!pmd_devmap(*pmd))
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4be518d..421ac4ff 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4649,6 +4649,8 @@ static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma,
 	struct page *page = NULL;
 	enum mc_target_type ret = MC_TARGET_NONE;
 
+	if (unlikely(is_pmd_migration_entry(pmd)))
+		return ret;
 	page = pmd_page(pmd);
 	VM_BUG_ON_PAGE(!page || !PageHead(page), page);
 	if (!(mc.flags & MOVE_ANON))
diff --git a/mm/memory.c b/mm/memory.c
index 83be99d..3ad3bb2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3590,6 +3590,11 @@ static int __handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 
 		barrier();
 		if (pmd_trans_huge(orig_pmd) || pmd_devmap(orig_pmd)) {
+			if (unlikely(is_pmd_migration_entry(orig_pmd))) {
+				pmd_migration_entry_wait(mm, fe.pmd);
+				return 0;
+			}
+
 			if (pmd_protnone(orig_pmd))
 				return do_huge_pmd_numa_page(&fe, orig_pmd);
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v1 06/12] mm: soft-dirty: keep soft-dirty bits over thp migration
  2016-09-26 15:22 ` zi.yan
@ 2016-09-26 15:22   ` zi.yan
  -1 siblings, 0 replies; 34+ messages in thread
From: zi.yan @ 2016-09-26 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: benh, mgorman, kirill.shutemov, akpm, dave.hansen, n-horiguchi

From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

The soft-dirty bit is designed to be preserved across page migration, so this
patch makes that hold for thp migration as well.

This patch changes the bit used for _PAGE_SWP_SOFT_DIRTY, because the old bit
is needed for thp migration (i.e. both _PAGE_PSE and _PAGE_PRESENT are used to
detect a pmd migration entry). When soft-dirty was introduced, bit 6 was used
for nonlinear file mapping, but that feature has since been replaced with
emulation, so we can relocate _PAGE_SWP_SOFT_DIRTY to bit 6.
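
For reference, the behaviour being preserved here is what userspace observes
through the soft-dirty interface: write "4" to /proc/pid/clear_refs to clear
the bits, then read bit 55 of the matching /proc/pid/pagemap entry (see
Documentation/vm/soft-dirty.txt). A minimal sketch of that check (not part of
the patch, error handling omitted):

/* soft_dirty_check.c: clear soft-dirty, touch a page, read bit 55 back */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	long psize = sysconf(_SC_PAGESIZE);
	char *buf = aligned_alloc(psize, psize);
	uint64_t entry = 0;
	int fd;

	memset(buf, 0, psize);

	/* writing "4" clears the soft-dirty bits of the whole address space */
	fd = open("/proc/self/clear_refs", O_WRONLY);
	write(fd, "4", 1);
	close(fd);

	buf[0] = 1;	/* re-dirty the page */

	/* each pagemap entry is 8 bytes; bit 55 is the soft-dirty flag */
	fd = open("/proc/self/pagemap", O_RDONLY);
	pread(fd, &entry, sizeof(entry),
	      ((uintptr_t)buf / psize) * sizeof(entry));
	close(fd);

	printf("soft-dirty: %d\n", (int)((entry >> 55) & 1));
	free(buf);
	return 0;
}

Running the same check on a THP-backed range before and after a migration is
one way to confirm that the pmd_swp_* helpers below keep the bit intact.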

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 arch/x86/include/asm/pgtable.h       | 17 +++++++++++++++++
 arch/x86/include/asm/pgtable_types.h |  8 ++++----
 include/asm-generic/pgtable.h        | 34 +++++++++++++++++++++++++++++++++-
 include/linux/swapops.h              |  2 ++
 mm/huge_memory.c                     | 33 +++++++++++++++++++++++++++++++--
 5 files changed, 87 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 5ff861f..4304776 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -959,6 +959,23 @@ static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
 {
 	return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY);
 }
+
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
+{
+	return pmd_set_flags(pmd, _PAGE_SWP_SOFT_DIRTY);
+}
+
+static inline int pmd_swp_soft_dirty(pmd_t pmd)
+{
+	return pmd_flags(pmd) & _PAGE_SWP_SOFT_DIRTY;
+}
+
+static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd)
+{
+	return pmd_clear_flags(pmd, _PAGE_SWP_SOFT_DIRTY);
+}
+#endif
 #endif
 
 #define PKRU_AD_BIT 0x1
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index f1218f5..a38e387 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -98,14 +98,14 @@
  * Tracking soft dirty bit when a page goes to a swap is tricky.
  * We need a bit which can be stored in pte _and_ not conflict
  * with swap entry format. On x86 bits 6 and 7 are *not* involved
- * into swap entry computation, but bit 6 is used for nonlinear
- * file mapping, so we borrow bit 7 for soft dirty tracking.
+ * into swap entry computation, but bit 7 is used for thp migration,
+ * so we borrow bit 6 for soft dirty tracking.
  *
  * Please note that this bit must be treated as swap dirty page
- * mark if and only if the PTE has present bit clear!
+ * mark if and only if the PTE/PMD has present bit clear!
  */
 #ifdef CONFIG_MEM_SOFT_DIRTY
-#define _PAGE_SWP_SOFT_DIRTY	_PAGE_PSE
+#define _PAGE_SWP_SOFT_DIRTY	_PAGE_DIRTY
 #else
 #define _PAGE_SWP_SOFT_DIRTY	(_AT(pteval_t, 0))
 #endif
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index d4458b6..fdc4793 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -489,7 +489,24 @@ static inline void ptep_modify_prot_commit(struct mm_struct *mm,
 #define arch_start_context_switch(prev)	do {} while (0)
 #endif
 
-#ifndef CONFIG_HAVE_ARCH_SOFT_DIRTY
+#ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
+#ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION
+static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
+{
+	return pmd;
+}
+
+static inline int pmd_swp_soft_dirty(pmd_t pmd)
+{
+	return 0;
+}
+
+static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd)
+{
+	return pmd;
+}
+#endif
+#else /* !CONFIG_HAVE_ARCH_SOFT_DIRTY */
 static inline int pte_soft_dirty(pte_t pte)
 {
 	return 0;
@@ -534,6 +551,21 @@ static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
 {
 	return pte;
 }
+
+static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
+{
+	return pmd;
+}
+
+static inline int pmd_swp_soft_dirty(pmd_t pmd)
+{
+	return 0;
+}
+
+static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd)
+{
+	return pmd;
+}
 #endif
 
 #ifndef __HAVE_PFNMAP_TRACKING
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index b402a2c..18f3744 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -176,6 +176,8 @@ static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
 {
 	swp_entry_t arch_entry;
 
+	if (pmd_swp_soft_dirty(pmd))
+		pmd = pmd_swp_clear_soft_dirty(pmd);
 	arch_entry = __pmd_to_swp_entry(pmd);
 	return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f4fcfc7..1e1758b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -793,6 +793,8 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		if (is_write_migration_entry(entry)) {
 			make_migration_entry_read(&entry);
 			pmd = swp_entry_to_pmd(entry);
+			if (pmd_swp_soft_dirty(*src_pmd))
+				pmd = pmd_swp_mksoft_dirty(pmd);
 			set_pmd_at(src_mm, addr, src_pmd, pmd);
 		}
 		set_pmd_at(dst_mm, addr, dst_pmd, pmd);
@@ -1413,6 +1415,17 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	return 1;
 }
 
+static pmd_t move_soft_dirty_pmd(pmd_t pmd)
+{
+#ifdef CONFIG_MEM_SOFT_DIRTY
+	if (unlikely(is_pmd_migration_entry(pmd)))
+		pmd = pmd_swp_mksoft_dirty(pmd);
+	else if (pmd_present(pmd))
+		pmd = pmd_mksoft_dirty(pmd);
+#endif
+	return pmd;
+}
+
 bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 		  unsigned long new_addr, unsigned long old_end,
 		  pmd_t *old_pmd, pmd_t *new_pmd)
@@ -1453,7 +1466,8 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 			pgtable = pgtable_trans_huge_withdraw(mm, old_pmd);
 			pgtable_trans_huge_deposit(mm, new_pmd, pgtable);
 		}
-		set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
+		pmd = move_soft_dirty_pmd(pmd);
+		set_pmd_at(mm, new_addr, new_pmd, pmd);
 		if (new_ptl != old_ptl)
 			spin_unlock(new_ptl);
 		spin_unlock(old_ptl);
@@ -1492,6 +1506,17 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		}
 
 		if (is_pmd_migration_entry(*pmd)) {
+			swp_entry_t entry = pmd_to_swp_entry(*pmd);
+
+			if (is_write_migration_entry(entry)) {
+				pmd_t newpmd;
+
+				make_migration_entry_read(&entry);
+				newpmd = swp_entry_to_pmd(entry);
+				if (pmd_swp_soft_dirty(*pmd))
+					newpmd = pmd_swp_mksoft_dirty(newpmd);
+				set_pmd_at(mm, addr, pmd, newpmd);
+			}
 			spin_unlock(ptl);
 			return ret;
 		}
@@ -2329,6 +2354,8 @@ int set_pmd_migration_entry(struct page *page, struct mm_struct *mm,
 	entry = make_migration_entry(page, pmd_write(pmdval));
 	pmdswp = swp_entry_to_pmd(entry);
 	pmdswp = pmd_mkhuge(pmdswp);
+	if (pmd_soft_dirty(pmdval))
+		pmdswp = pmd_swp_mksoft_dirty(pmdswp);
 	set_pmd_at(mm, addr, pmd, pmdswp);
 	page_remove_rmap(page, true);
 	put_page(page);
@@ -2368,7 +2395,9 @@ int remove_migration_pmd(struct page *new, struct vm_area_struct *vma,
 	if (migration_entry_to_page(entry) != old)
 		goto unlock_ptl;
 	get_page(new);
-	pmde = mk_huge_pmd(new, vma->vm_page_prot);
+	pmde = pmd_mkold(mk_huge_pmd(new, vma->vm_page_prot));
+	if (pmd_swp_soft_dirty(pmde))
+		pmde = pmd_mksoft_dirty(pmde);
 	if (is_write_migration_entry(entry))
 		pmde = maybe_pmd_mkwrite(pmde, vma);
 	flush_cache_range(vma, mmun_start, mmun_end);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v1 07/12] mm: hwpoison: fix race between unpoisoning and freeing migrate source page
  2016-09-26 15:22 ` zi.yan
@ 2016-09-26 15:22   ` zi.yan
  -1 siblings, 0 replies; 34+ messages in thread
From: zi.yan @ 2016-09-26 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: benh, mgorman, kirill.shutemov, akpm, dave.hansen, n-horiguchi

From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

While testing thp migration, I saw a BUG_ON triggered by a race between soft
offline and unpoison (what I actually saw was a "bad page" warning about
freeing a page with PageActive set; the subsequent bug messages differ from
run to run.)

I have tried to solve similar problems a few times (see commit f4c18e6f7b5b
("mm: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_*")), but the
new workload exposes a new problem with the previous solution.

Unpoison never works well if the target page is not properly contained, so
I'm now going in the direction of limiting the unpoison function (as commit
230ac719c500 ("mm/hwpoison: don't try to unpoison containment-failed pages")
does). This patch takes another step in that direction by ensuring that the
target page is kicked out of any pcplist. With this change, the dirty hack of
calling put_page() instead of putback_lru_page() when the migration reason is
MR_MEMORY_FAILURE is no longer necessary, so it is reverted.
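
For context, the unpoison side of this race is typically driven through the
hwpoison-inject debugfs interface. A rough sketch of unpoisoning a given pfn
from userspace is below; the debugfs path and the need for root and the
hwpoison-inject module are assumptions about the test setup, not something
this patch adds or relies on:

/* unpoison_pfn.c: write a pfn to the hwpoison-inject debugfs file
 * usage: ./unpoison_pfn <pfn>   (assumed path, needs root) */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	char buf[32];
	int fd, len;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <pfn>\n", argv[0]);
		return 1;
	}

	fd = open("/sys/kernel/debug/hwpoison/unpoison-pfn", O_WRONLY);
	if (fd < 0) {
		perror("open unpoison-pfn");
		return 1;
	}
	len = snprintf(buf, sizeof(buf), "%s\n", argv[1]);
	if (write(fd, buf, len) != len)
		perror("write pfn");
	close(fd);
	return 0;
}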

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/memory-failure.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index de88f33..e105f91 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1435,6 +1435,13 @@ int unpoison_memory(unsigned long pfn)
 		return 0;
 	}
 
+	/*
+	 * Soft-offlined pages might stay on a PCP list because they are freed
+	 * via putback_lru_page(), and they shouldn't be unpoisoned because
+	 * that could cause list corruption. So drain the pages to avoid that.
+	 */
+	shake_page(page, 0);
+
 	nr_pages = 1 << compound_order(page);
 
 	if (!get_hwpoison_page(p)) {
@@ -1678,7 +1685,8 @@ static int __soft_offline_page(struct page *page, int flags)
 				pfn, ret, page->flags);
 			if (ret > 0)
 				ret = -EIO;
-		}
+		} else if (!TestSetPageHWPoison(page))
+			num_poisoned_pages_inc();
 	} else {
 		pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx\n",
 			pfn, ret, page_count(page), page->flags);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v1 08/12] mm: hwpoison: soft offline supports thp migration
  2016-09-26 15:22 ` zi.yan
@ 2016-09-26 15:22   ` zi.yan
  -1 siblings, 0 replies; 34+ messages in thread
From: zi.yan @ 2016-09-26 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: benh, mgorman, kirill.shutemov, akpm, dave.hansen, n-horiguchi

From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

This patch enables thp migration for soft offline.
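
As a rough illustration of the path this enables (not part of the patch):
soft-offlining a THP-backed range from userspace, which with this change can
migrate the whole THP instead of splitting it first. MADV_SOFT_OFFLINE needs
CAP_SYS_ADMIN and CONFIG_MEMORY_FAILURE, the fallback #defines carry the
values from include/uapi/asm-generic/mman-common.h, and neither the 2MB
alignment nor the THP allocation is guaranteed here:

/* soft_offline_thp.c: soft-offline a (hopefully) THP-backed 2MB range */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_HUGEPAGE
#define MADV_HUGEPAGE		14
#endif
#ifndef MADV_SOFT_OFFLINE
#define MADV_SOFT_OFFLINE	101
#endif

#define LEN (2UL << 20)		/* one 2MB THP on x86_64 */

int main(void)
{
	void *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;
	madvise(p, LEN, MADV_HUGEPAGE);	/* hint THP backing */
	memset(p, 1, LEN);		/* fault the range in */

	if (madvise(p, LEN, MADV_SOFT_OFFLINE))
		perror("MADV_SOFT_OFFLINE");
	return 0;
}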

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/memory-failure.c | 31 ++++++++++++-------------------
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e105f91..36eb064 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1494,7 +1494,17 @@ static struct page *new_page(struct page *p, unsigned long private, int **x)
 	if (PageHuge(p))
 		return alloc_huge_page_node(page_hstate(compound_head(p)),
 						   nid);
-	else
+	else if (thp_migration_supported() && PageTransHuge(p)) {
+		struct page *thp;
+
+		thp = alloc_pages_node(nid,
+			(GFP_TRANSHUGE | __GFP_THISNODE) & ~__GFP_RECLAIM,
+			HPAGE_PMD_ORDER);
+		if (!thp)
+			return NULL;
+		prep_transhuge_page(thp);
+		return thp;
+	} else
 		return __alloc_pages_node(nid, GFP_HIGHUSER_MOVABLE, 0);
 }
 
@@ -1697,28 +1707,11 @@ static int __soft_offline_page(struct page *page, int flags)
 static int soft_offline_in_use_page(struct page *page, int flags)
 {
 	int ret;
-	struct page *hpage = compound_head(page);
-
-	if (!PageHuge(page) && PageTransHuge(hpage)) {
-		lock_page(hpage);
-		if (!PageAnon(hpage) || unlikely(split_huge_page(hpage))) {
-			unlock_page(hpage);
-			if (!PageAnon(hpage))
-				pr_info("soft offline: %#lx: non anonymous thp\n", page_to_pfn(page));
-			else
-				pr_info("soft offline: %#lx: thp split failed\n", page_to_pfn(page));
-			put_hwpoison_page(hpage);
-			return -EBUSY;
-		}
-		unlock_page(hpage);
-		get_hwpoison_page(page);
-		put_hwpoison_page(hpage);
-	}
 
 	if (PageHuge(page))
 		ret = soft_offline_huge_page(page, flags);
 	else
-		ret = __soft_offline_page(page, flags);
+		ret = __soft_offline_page(compound_head(page), flags);
 
 	return ret;
 }
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v1 09/12] mm: mempolicy: mbind and migrate_pages support thp migration
  2016-09-26 15:22 ` zi.yan
@ 2016-09-26 15:22   ` zi.yan
  -1 siblings, 0 replies; 34+ messages in thread
From: zi.yan @ 2016-09-26 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: benh, mgorman, kirill.shutemov, akpm, dave.hansen, n-horiguchi

From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

This patch enables thp migration for mbind(2) and migrate_pages(2).
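
Illustration only (not part of the patch): with this change, an mbind(2) call
with MPOL_MF_MOVE on a THP-backed range can be served by migrating the
pmd-mapped page as a whole rather than splitting it first. A minimal sketch
using the mbind() wrapper from <numaif.h> (link with -lnuma; the target node
and the assumption of a second NUMA node are placeholders):

/* mbind_thp.c: bind a THP-backed range to node 1 and move it there */
#include <numaif.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define LEN (2UL << 20)		/* one 2MB THP on x86_64 */

int main(void)
{
	unsigned long nodemask = 1UL << 1;	/* target node 1 */
	void *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;
	madvise(p, LEN, MADV_HUGEPAGE);
	memset(p, 1, LEN);	/* fault in, hopefully as one THP */

	if (mbind(p, LEN, MPOL_BIND, &nodemask, 8 * sizeof(nodemask),
		  MPOL_MF_MOVE | MPOL_MF_STRICT))
		perror("mbind");
	return 0;
}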

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/mempolicy.c | 92 ++++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 70 insertions(+), 22 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index dc8e913..c10f71b 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -94,6 +94,7 @@
 #include <linux/mm_inline.h>
 #include <linux/mmu_notifier.h>
 #include <linux/printk.h>
+#include <linux/swapops.h>
 
 #include <asm/tlbflush.h>
 #include <asm/uaccess.h>
@@ -484,6 +485,49 @@ static inline bool queue_pages_node_check(struct page *page,
 	return node_isset(nid, *qp->nmask) == !!(flags & MPOL_MF_INVERT);
 }
 
+static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
+				unsigned long end, struct mm_walk *walk)
+{
+	int ret = 0;
+	struct page *page;
+	struct queue_pages *qp = walk->private;
+	unsigned long flags;
+
+	if (unlikely(is_pmd_migration_entry(*pmd))) {
+		ret = 1;
+		goto unlock;
+	}
+	page = pmd_page(*pmd);
+	if (is_huge_zero_page(page)) {
+		spin_unlock(ptl);
+		split_huge_pmd(walk->vma, pmd, addr);
+		goto out;
+	}
+	if ((end - addr != HPAGE_PMD_SIZE) || !thp_migration_supported()) {
+		get_page(page);
+		spin_unlock(ptl);
+		lock_page(page);
+		ret = split_huge_page(page);
+		unlock_page(page);
+		put_page(page);
+		goto out;
+	}
+	if (queue_pages_node_check(page, qp)) {
+		ret = 1;
+		goto unlock;
+	}
+
+	ret = 1;
+	flags = qp->flags;
+	/* go to thp migration */
+	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
+		migrate_page_add(page, qp->pagelist, flags);
+unlock:
+	spin_unlock(ptl);
+out:
+	return ret;
+}
+
 /*
  * Scan through pages checking if pages follow certain conditions,
  * and move them to the pagelist if they do.
@@ -495,30 +539,15 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
 	struct page *page;
 	struct queue_pages *qp = walk->private;
 	unsigned long flags = qp->flags;
-	int nid, ret;
+	int ret;
 	pte_t *pte;
 	spinlock_t *ptl;
 
-	if (pmd_trans_huge(*pmd)) {
-		ptl = pmd_lock(walk->mm, pmd);
-		if (pmd_trans_huge(*pmd)) {
-			page = pmd_page(*pmd);
-			if (is_huge_zero_page(page)) {
-				spin_unlock(ptl);
-				split_huge_pmd(vma, pmd, addr);
-			} else {
-				get_page(page);
-				spin_unlock(ptl);
-				lock_page(page);
-				ret = split_huge_page(page);
-				unlock_page(page);
-				put_page(page);
-				if (ret)
-					return 0;
-			}
-		} else {
-			spin_unlock(ptl);
-		}
+	ptl = pmd_trans_huge_lock(pmd, vma);
+	if (ptl) {
+		ret = queue_pages_pmd(pmd, ptl, addr, end, walk);
+		if (ret)
+			return 0;
 	}
 
 	if (pmd_trans_unstable(pmd))
@@ -979,7 +1008,17 @@ static struct page *new_node_page(struct page *page, unsigned long node, int **x
 	if (PageHuge(page))
 		return alloc_huge_page_node(page_hstate(compound_head(page)),
 					node);
-	else
+	else if (thp_migration_supported() && PageTransHuge(page)) {
+		struct page *thp;
+
+		thp = alloc_pages_node(node,
+			(GFP_TRANSHUGE | __GFP_THISNODE) & ~__GFP_RECLAIM,
+			HPAGE_PMD_ORDER);
+		if (!thp)
+			return NULL;
+		prep_transhuge_page(thp);
+		return thp;
+	} else
 		return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE |
 						    __GFP_THISNODE, 0);
 }
@@ -1145,6 +1184,15 @@ static struct page *new_page(struct page *page, unsigned long start, int **x)
 	if (PageHuge(page)) {
 		BUG_ON(!vma);
 		return alloc_huge_page_noerr(vma, address, 1);
+	} else if (thp_migration_supported() && PageTransHuge(page)) {
+		struct page *thp;
+
+		thp = alloc_hugepage_vma(GFP_TRANSHUGE, vma, address,
+					 HPAGE_PMD_ORDER);
+		if (!thp)
+			return NULL;
+		prep_transhuge_page(thp);
+		return thp;
 	}
 	/*
 	 * if !vma, alloc_page_vma() will use task or system default policy
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v1 10/12] mm: migrate: move_pages() supports thp migration
  2016-09-26 15:22 ` zi.yan
@ 2016-09-26 15:22   ` zi.yan
  -1 siblings, 0 replies; 34+ messages in thread
From: zi.yan @ 2016-09-26 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: benh, mgorman, kirill.shutemov, akpm, dave.hansen, n-horiguchi

From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

This patch enables thp migration for move_pages(2).
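
Illustration only (not part of the patch): with FOLL_SPLIT dropped, a
move_pages(2) request for the head address of a THP-backed range can migrate
the whole 2MB page. A minimal sketch using the move_pages() wrapper from
<numaif.h> (link with -lnuma; node 1 is a placeholder):

/* move_pages_thp.c: ask the kernel to move one THP-backed page to node 1 */
#include <numaif.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define LEN (2UL << 20)		/* one 2MB THP on x86_64 */

int main(void)
{
	void *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	void *pages[1] = { p };
	int nodes[1] = { 1 };
	int status[1] = { -1 };

	if (p == MAP_FAILED)
		return 1;
	madvise(p, LEN, MADV_HUGEPAGE);
	memset(p, 1, LEN);	/* fault in, hopefully as one THP */

	if (move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE))
		perror("move_pages");
	printf("status[0] = %d\n", status[0]);
	return 0;
}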

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/migrate.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index dfca530..132e8db 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1417,7 +1417,17 @@ static struct page *new_page_node(struct page *p, unsigned long private,
 	if (PageHuge(p))
 		return alloc_huge_page_node(page_hstate(compound_head(p)),
 					pm->node);
-	else
+	else if (thp_migration_supported() && PageTransHuge(p)) {
+		struct page *thp;
+
+		thp = alloc_pages_node(pm->node,
+			(GFP_TRANSHUGE | __GFP_THISNODE) & ~__GFP_RECLAIM,
+			HPAGE_PMD_ORDER);
+		if (!thp)
+			return NULL;
+		prep_transhuge_page(thp);
+		return thp;
+	} else
 		return __alloc_pages_node(pm->node,
 				GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, 0);
 }
@@ -1444,6 +1454,7 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
 	for (pp = pm; pp->node != MAX_NUMNODES; pp++) {
 		struct vm_area_struct *vma;
 		struct page *page;
+		unsigned int follflags;
 
 		err = -EFAULT;
 		vma = find_vma(mm, pp->addr);
@@ -1451,8 +1462,10 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
 			goto set_status;
 
 		/* FOLL_DUMP to ignore special (like zero) pages */
-		page = follow_page(vma, pp->addr,
-				FOLL_GET | FOLL_SPLIT | FOLL_DUMP);
+		follflags = FOLL_GET | FOLL_SPLIT | FOLL_DUMP;
+		if (thp_migration_supported())
+			follflags &= ~FOLL_SPLIT;
+		page = follow_page(vma, pp->addr, follflags);
 
 		err = PTR_ERR(page);
 		if (IS_ERR(page))
@@ -1480,6 +1493,11 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
 			if (PageHead(page))
 				isolate_huge_page(page, &pagelist);
 			goto put_and_set;
+		} else if (PageTransCompound(page)) {
+			if (PageTail(page)) {
+				err = pp->node;
+				goto put_and_set;
+			}
 		}
 
 		err = isolate_lru_page(page);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v1 11/12] mm: memory_hotplug: memory hotremove supports thp migration
  2016-09-26 15:22 ` zi.yan
@ 2016-09-26 15:22   ` zi.yan
  -1 siblings, 0 replies; 34+ messages in thread
From: zi.yan @ 2016-09-26 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: benh, mgorman, kirill.shutemov, akpm, dave.hansen, n-horiguchi

From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

This patch enables thp migration for memory hotremove. Stub definition of
prep_transhuge_page() is added for CONFIG_TRANSPARENT_HUGEPAGE=n.
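
For context, memory hotremove is usually driven by offlining a memory block
through sysfs, which is what ends up in do_migrate_range() below. A minimal
sketch, assuming a block named memory32 exists and the kernel is built with
memory hotremove support (both assumptions):

/* offline_block.c: offline one memory block via sysfs (needs root) */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/sys/devices/system/memory/memory32/state";
	int fd = open(path, O_WRONLY);

	if (fd < 0 || write(fd, "offline", 7) != 7)
		perror(path);
	if (fd >= 0)
		close(fd);
	return 0;
}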

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 include/linux/huge_mm.h | 3 +++
 mm/memory_hotplug.c     | 8 ++++++++
 mm/page_isolation.c     | 9 +++++++++
 3 files changed, 20 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 4ae156e..fe8766dc 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -174,6 +174,9 @@ static inline bool thp_migration_supported(void)
 static inline void prep_transhuge_page(struct page *page) {}
 
 #define transparent_hugepage_flags 0UL
+static inline void prep_transhuge_page(struct page *page)
+{
+}
 static inline int
 split_huge_page_to_list(struct page *page, struct list_head *list)
 {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b58906b..6abe898 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1609,6 +1609,14 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			if (isolate_huge_page(page, &source))
 				move_pages -= 1 << compound_order(head);
 			continue;
+		} else if (thp_migration_supported() && PageTransHuge(page)) {
+			struct page *head = compound_head(page);
+
+			pfn = page_to_pfn(head) + (1<<compound_order(head)) - 1;
+			if (compound_order(head) > PFN_SECTION_SHIFT) {
+				ret = -EBUSY;
+				break;
+			}
 		}
 
 		if (!get_page_unless_zero(page))
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 064b7fb..43ecdf6 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -294,6 +294,15 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private,
 		return alloc_huge_page_node(page_hstate(compound_head(page)),
 					    next_node_in(page_to_nid(page),
 							 node_online_map));
+	else if (thp_migration_supported() && PageTransHuge(page)) {
+		struct page *thp;
+
+		thp = alloc_pages(GFP_TRANSHUGE, HPAGE_PMD_ORDER);
+		if (!thp)
+			return NULL;
+		prep_transhuge_page(thp);
+		return thp;
+	}
 
 	if (PageHighMem(page))
 		gfp_mask |= __GFP_HIGHMEM;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v1 12/12] mm: ppc64: Add THP migration support for ppc64.
  2016-09-26 15:22 ` zi.yan
@ 2016-09-26 15:22   ` zi.yan
  -1 siblings, 0 replies; 34+ messages in thread
From: zi.yan @ 2016-09-26 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: benh, mgorman, kirill.shutemov, akpm, dave.hansen, n-horiguchi, Zi Yan

From: Zi Yan <zi.yan@cs.rutgers.edu>

Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
---
 arch/powerpc/Kconfig                         |  4 ++++
 arch/powerpc/include/asm/book3s/64/pgtable.h | 23 +++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 927d2ab..84ffd4c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -553,6 +553,10 @@ config ARCH_SPARSEMEM_DEFAULT
 config SYS_SUPPORTS_HUGETLBFS
 	bool
 
+config ARCH_ENABLE_THP_MIGRATION
+	def_bool y
+	depends on PPC64 && TRANSPARENT_HUGEPAGE && MIGRATION
+
 source "mm/Kconfig"
 
 config ARCH_MEMORY_PROBE
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 263bf39..9dee0467 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -521,7 +521,9 @@ static inline bool pte_user(pte_t pte)
  * Clear bits not found in swap entries here.
  */
 #define __pte_to_swp_entry(pte)	((swp_entry_t) { pte_val((pte)) & ~_PAGE_PTE })
+#define __pmd_to_swp_entry(pte)	((swp_entry_t) { pmd_val((pte)) & ~_PAGE_PTE })
 #define __swp_entry_to_pte(x)	__pte((x).val | _PAGE_PTE)
+#define __swp_entry_to_pmd(x)	__pmd((x).val | _PAGE_PTE)
 
 #ifdef CONFIG_MEM_SOFT_DIRTY
 #define _PAGE_SWP_SOFT_DIRTY   (1UL << (SWP_TYPE_BITS + _PAGE_BIT_SWAP_TYPE))
@@ -662,6 +664,10 @@ static inline int pmd_bad(pmd_t pmd)
 		return radix__pmd_bad(pmd);
 	return hash__pmd_bad(pmd);
 }
+static inline int __pmd_present(pmd_t pte)
+{
+	return !!(pmd_val(pte) & _PAGE_PRESENT);
+}
 
 static inline void pud_set(pud_t *pudp, unsigned long val)
 {
@@ -850,6 +856,23 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
 #define pmd_soft_dirty(pmd)    pte_soft_dirty(pmd_pte(pmd))
 #define pmd_mksoft_dirty(pmd)  pte_pmd(pte_mksoft_dirty(pmd_pte(pmd)))
 #define pmd_clear_soft_dirty(pmd) pte_pmd(pte_clear_soft_dirty(pmd_pte(pmd)))
+
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
+{
+	return pte_pmd(pte_swp_mksoft_dirty(pmd_pte(pmd)));
+}
+
+static inline int pmd_swp_soft_dirty(pmd_t pmd)
+{
+	return pte_swp_soft_dirty(pmd_pte(pmd));
+}
+
+static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd)
+{
+	return pte_pmd(pte_swp_clear_soft_dirty(pmd_pte(pmd)));
+}
+#endif
 #endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */
 
 #ifdef CONFIG_NUMA_BALANCING
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 00/12] THP migration support
  2016-09-26 15:22 ` zi.yan
                   ` (12 preceding siblings ...)
  (?)
@ 2016-09-26 15:38 ` Zi Yan
  2016-09-29  8:25     ` Naoya Horiguchi
  -1 siblings, 1 reply; 34+ messages in thread
From: Zi Yan @ 2016-09-26 15:38 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: benh, mgorman, kirill.shutemov, akpm, dave.hansen, n-horiguchi, Zi Yan


On 26 Sep 2016, at 11:22, zi.yan@sent.com wrote:

> From: Zi Yan <zi.yan@cs.rutgers.edu>
>
> Hi all,
>
> This patchset is based on Naoya Horiguchi's page migration enchancement
> for thp patchset with additional IBM ppc64 support. And I rebase it
> on the latest upstream commit.
>
> The motivation is that 4KB page migration is underutilizing the memory
> bandwidth compared to 2MB THP migration.

Sorry, on ppc64 the base page size used was 64KB and the THP size was 16MB.

>
> As part of my internship work in NVIDIA, I compared the bandwidth
> utilizations between 512 4KB pages and 1 2MB page in both x86_64 and ppc64.
> And the results show that migrating 512 4KB pages takes only 3x and 1.15x of
> the time, compared to migrating single 2MB THP, in x86_64 and ppc64
> respectively.
>
> Here are the actual BW numbers (total_data_size/migration_time):
>         | 512 4KB pages | 1 2MB THP  |  1 4KB page
> x86_64  |  0.98GB/s     |  2.97GB/s  |   0.06GB/s
> ppc64   |  6.14GB/s     |  7.10GB/s  |   1.24GB/s

And the BW numbers should be:
         | 512 4KB pages  | 1 2MB THP   |  1 4KB page
 x86_64  |  0.98GB/s      |  2.97GB/s   |   0.06GB/s

         | 512 64KB pages | 1 16MB THP  |  1 64KB page
 ppc64   |  6.14GB/s      |  7.10GB/s   |   1.24GB/s

>
> Any comments or advices are welcome.
>
> Here is the original message from Naoya:
>
> This patchset enhances page migration functionality to handle thp migration
> for various page migration's callers:
>  - mbind(2)
>  - move_pages(2)
>  - migrate_pages(2)
>  - cgroup/cpuset migration
>  - memory hotremove
>  - soft offline
>
> The main benefit is that we can avoid unnecessary thp splits, which helps us
> avoid performance decrease when your applications handles NUMA optimization on
> their own.
>
> The implementation is similar to that of normal page migration, the key point
> is that we modify a pmd to a pmd migration entry in swap-entry like format.
> pmd_present() is not simple and it's not enough by itself to determine whether
> a given pmd is a pmd migration entry. See patch 3/11 and 5/11 for details.
>
> Here're topics which might be helpful to start discussion:
>
> - at this point, this functionality is limited to x86_64.
>
> - there's alrealy an implementation of thp migration in autonuma code of which
>   this patchset doesn't touch anything because it works fine as it is.
>
> - fallback to thp split: current implementation just fails a migration trial if
>   thp migration fails. It's possible to retry migration after splitting the thp,
>   but that's not included in this version.
>
> Thanks,
> Zi Yan
> ---
>
> Naoya Horiguchi (11):
>   mm: mempolicy: add queue_pages_node_check()
>   mm: thp: introduce CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
>   mm: thp: add helpers related to thp/pmd migration
>   mm: thp: enable thp migration in generic path
>   mm: thp: check pmd migration entry in common path
>   mm: soft-dirty: keep soft-dirty bits over thp migration
>   mm: hwpoison: fix race between unpoisoning and freeing migrate source
>     page
>   mm: hwpoison: soft offline supports thp migration
>   mm: mempolicy: mbind and migrate_pages support thp migration
>   mm: migrate: move_pages() supports thp migration
>   mm: memory_hotplug: memory hotremove supports thp migration
>
> Zi Yan (1):
>   mm: ppc64: Add THP migration support for ppc64.
>
>  arch/powerpc/Kconfig                         |   4 +
>  arch/powerpc/include/asm/book3s/64/pgtable.h |  23 ++++
>  arch/x86/Kconfig                             |   4 +
>  arch/x86/include/asm/pgtable.h               |  28 ++++
>  arch/x86/include/asm/pgtable_64.h            |   2 +
>  arch/x86/include/asm/pgtable_types.h         |   8 +-
>  arch/x86/mm/gup.c                            |   3 +
>  fs/proc/task_mmu.c                           |  20 +--
>  include/asm-generic/pgtable.h                |  34 ++++-
>  include/linux/huge_mm.h                      |  13 ++
>  include/linux/swapops.h                      |  64 ++++++++++
>  mm/Kconfig                                   |   3 +
>  mm/gup.c                                     |   8 ++
>  mm/huge_memory.c                             | 184 +++++++++++++++++++++++++--
>  mm/memcontrol.c                              |   2 +
>  mm/memory-failure.c                          |  41 +++---
>  mm/memory.c                                  |   5 +
>  mm/memory_hotplug.c                          |   8 ++
>  mm/mempolicy.c                               | 108 ++++++++++++----
>  mm/migrate.c                                 |  49 ++++++-
>  mm/page_isolation.c                          |   9 ++
>  mm/rmap.c                                    |   5 +
>  22 files changed, 549 insertions(+), 76 deletions(-)
>
> -- 
> 2.9.3
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


--
Best Regards
Yan Zi

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 00/12] THP migration support
  2016-09-26 15:38 ` [PATCH v1 00/12] THP migration support Zi Yan
@ 2016-09-29  8:25     ` Naoya Horiguchi
  0 siblings, 0 replies; 34+ messages in thread
From: Naoya Horiguchi @ 2016-09-29  8:25 UTC (permalink / raw)
  To: Zi Yan
  Cc: linux-kernel, linux-mm, benh, mgorman, kirill.shutemov, akpm,
	dave.hansen, Zi Yan

Hi Yan,

On Mon, Sep 26, 2016 at 11:38:05AM -0400, Zi Yan wrote:
> On 26 Sep 2016, at 11:22, zi.yan@sent.com wrote:
> 
> > From: Zi Yan <zi.yan@cs.rutgers.edu>
> >
> > Hi all,
> >
> > This patchset is based on Naoya Horiguchi's page migration enchancement
> > for thp patchset with additional IBM ppc64 support. And I rebase it
> > on the latest upstream commit.

Thanks for helping,

It seems that you have done some testing with these patches on powerpc,
which shows that thp migration can be enabled relatively easily for
non-x86_64 architectures. This is good news to me.

And I apologize for my slow progress on this patchset.
My previous post was about 5 months ago, and I have not finished ver.2 due to
many interruptions. Someone also privately asked me about the progress
of this work, so I promised that ver.2 would be posted in a few weeks.
Your patch 12/12 will come with it.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 12/12] mm: ppc64: Add THP migration support for ppc64.
  2016-09-26 15:22   ` zi.yan
@ 2016-09-30  0:02     ` Balbir Singh
  -1 siblings, 0 replies; 34+ messages in thread
From: Balbir Singh @ 2016-09-30  0:02 UTC (permalink / raw)
  To: zi.yan, linux-kernel, linux-mm
  Cc: benh, mgorman, kirill.shutemov, akpm, dave.hansen, n-horiguchi, Zi Yan



On 27/09/16 01:22, zi.yan@sent.com wrote:
> From: Zi Yan <zi.yan@cs.rutgers.edu>
> 
> Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
> ---
>  arch/powerpc/Kconfig                         |  4 ++++
>  arch/powerpc/include/asm/book3s/64/pgtable.h | 23 +++++++++++++++++++++++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 927d2ab..84ffd4c 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -553,6 +553,10 @@ config ARCH_SPARSEMEM_DEFAULT
>  config SYS_SUPPORTS_HUGETLBFS
>  	bool
>  
> +config ARCH_ENABLE_THP_MIGRATION
> +	def_bool y
> +	depends on PPC64 && TRANSPARENT_HUGEPAGE && MIGRATION

I had done the same patch before but never posted it, since Naoya's patches
were blocked behind _PAGE_PSE (an x86-specific concern for __pmd_present()).

Having said that, this addition looks good to me.
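
(For readers of the archive: the x86 wrinkle behind this is that pmd_present()
on x86 already has to accept a pmd whose _PAGE_PRESENT bit is temporarily
clear, because a THP split clears the present bit while leaving _PAGE_PSE set.
A rough sketch of that pre-existing x86 check, from memory rather than quoted
from this patchset:

/*
 * Sketch only: _PAGE_PSE stays set while the present bit is temporarily
 * cleared during a THP split, so testing _PAGE_PRESENT alone is not enough.
 */
static inline int pmd_present(pmd_t pmd)
{
	return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE);
}

The narrower "is _PAGE_PRESENT itself set" question is what the
__pmd_present() helper used by this series answers, as seen in the ppc64
hunk quoted in full later in the thread.)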

Acked-by: Balbir Singh <bsingharora@gmail.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 00/12] THP migration support
  2016-09-29  8:25     ` Naoya Horiguchi
  (?)
@ 2016-09-30  2:32     ` Zi Yan
  -1 siblings, 0 replies; 34+ messages in thread
From: Zi Yan @ 2016-09-30  2:32 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-kernel, linux-mm, benh, mgorman, kirill.shutemov, akpm,
	dave.hansen


>
> Thanks for helping,

:)

>
> I think that you seem to do some testing with these patches on powerpc,
> which shows that thp migration can be enabled relatively easily for
> non-x86_64. This is a good news to me.

Right. I did some THP migration tests on both x86_64 and IBM ppc64.

You can use the code here to test THP migration and compare the migration
time between 512 base pages and 1 THP:
https://github.com/x-y-z/thp-migration-bench

A NUMA (or fake NUMA) setup and libnuma are needed, since the benchmark simply
migrates pages from node 0 to node 1.

make bench should give you output like the following (a minimal sketch of the
core migration step is included after the sample output):

THP Migration
Total time: 676.870346 us
Test successful.
-------------------
Base Page Migration
Total time: 2340.078354 us
Test successful.
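
For anyone who wants to see the core idea without cloning the repo, the
essential step is just move_pages(2) on a THP-backed buffer. Below is a
minimal sketch of that idea, not the actual benchmark code; the buffer size,
node numbers, and lack of timing or node-affinity setup are simplifying
assumptions (the real test presumably binds the allocation to node 0 first):

/* thp-migrate-sketch.c -- build with: gcc thp-migrate-sketch.c -lnuma */
#define _GNU_SOURCE
#include <numaif.h>       /* move_pages(), MPOL_MF_MOVE */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>     /* madvise(), MADV_HUGEPAGE */

#define THP_SIZE  (2UL * 1024 * 1024)
#define BASE_PAGE 4096UL

int main(void)
{
	/* 2MB-aligned buffer, hinted to be backed by a THP. */
	char *buf = aligned_alloc(THP_SIZE, THP_SIZE);
	if (!buf)
		return 1;
	madvise(buf, THP_SIZE, MADV_HUGEPAGE);
	memset(buf, 1, THP_SIZE);            /* fault the memory in */

	/* Request that every 4KB subpage move to node 1.  With THP
	 * migration the kernel can move the whole 2MB page at once;
	 * without it, the THP is split and 512 base pages are moved. */
	unsigned long n = THP_SIZE / BASE_PAGE;
	void **pages  = calloc(n, sizeof(void *));
	int   *nodes  = calloc(n, sizeof(int));
	int   *status = calloc(n, sizeof(int));
	for (unsigned long i = 0; i < n; i++) {
		pages[i] = buf + i * BASE_PAGE;
		nodes[i] = 1;
	}

	if (move_pages(0 /* self */, n, pages, nodes, status, MPOL_MF_MOVE))
		perror("move_pages");
	else
		printf("first subpage is now on node %d\n", status[0]);
	return 0;
}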

>
> And I apology for my slow development over this patchset.
> My previous post was about 5 months ago, and I've not done ver.2 due to
> many interruptions. Someone also privately asked me about the progress
> of this work, so I promised ver.2 will be posted in a few weeks.
> Your patch 12/12 will come with it.

Looking forward to it. :)

--
Best Regards,
Yan Zi


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v1 12/12] mm: ppc64: Add THP migration support for ppc64.
  2016-09-26 15:22   ` zi.yan
@ 2016-09-30  5:18     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 34+ messages in thread
From: Aneesh Kumar K.V @ 2016-09-30  5:18 UTC (permalink / raw)
  To: zi.yan, linux-kernel, linux-mm
  Cc: benh, mgorman, kirill.shutemov, akpm, dave.hansen, n-horiguchi, Zi Yan

zi.yan@sent.com writes:

> From: Zi Yan <zi.yan@cs.rutgers.edu>
>
> Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
> ---
>  arch/powerpc/Kconfig                         |  4 ++++
>  arch/powerpc/include/asm/book3s/64/pgtable.h | 23 +++++++++++++++++++++++
>  2 files changed, 27 insertions(+)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 927d2ab..84ffd4c 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -553,6 +553,10 @@ config ARCH_SPARSEMEM_DEFAULT
>  config SYS_SUPPORTS_HUGETLBFS
>  	bool
>  
> +config ARCH_ENABLE_THP_MIGRATION
> +	def_bool y
> +	depends on PPC64 && TRANSPARENT_HUGEPAGE && MIGRATION
> +
>  source "mm/Kconfig"
>  
>  config ARCH_MEMORY_PROBE
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 263bf39..9dee0467 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -521,7 +521,9 @@ static inline bool pte_user(pte_t pte)
>   * Clear bits not found in swap entries here.
>   */
>  #define __pte_to_swp_entry(pte)	((swp_entry_t) { pte_val((pte)) & ~_PAGE_PTE })
> +#define __pmd_to_swp_entry(pte)	((swp_entry_t) { pmd_val((pte)) & ~_PAGE_PTE })
>  #define __swp_entry_to_pte(x)	__pte((x).val | _PAGE_PTE)
> +#define __swp_entry_to_pmd(x)	__pmd((x).val | _PAGE_PTE)


We definitely need a comment around that. This will work only with a 64K
Linux page size; with 4K pages we may interpret the value as a hugepd
directory entry. This should be OK because we support THP only with a 64K
Linux page size. Hence my suggestion to add a proper comment or move it to
the right header.
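
For illustration, the kind of comment being asked for might read something
like the following (just a sketch of the suggestion, not wording taken from
the patchset), placed above the two pmd swap-entry macros:

/*
 * PMD migration entries reuse the PTE swap-entry encoding with _PAGE_PTE
 * cleared.  This is only safe with a 64K Linux page size: with 4K pages
 * the resulting value could be mistaken for a hugepd directory entry.
 * That is acceptable because THP is only supported with 64K pages.
 */
#define __pmd_to_swp_entry(pte)	((swp_entry_t) { pmd_val((pte)) & ~_PAGE_PTE })
#define __swp_entry_to_pmd(x)	__pmd((x).val | _PAGE_PTE)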


>  
>  #ifdef CONFIG_MEM_SOFT_DIRTY
>  #define _PAGE_SWP_SOFT_DIRTY   (1UL << (SWP_TYPE_BITS + _PAGE_BIT_SWAP_TYPE))
> @@ -662,6 +664,10 @@ static inline int pmd_bad(pmd_t pmd)
>  		return radix__pmd_bad(pmd);
>  	return hash__pmd_bad(pmd);
>  }
> +static inline int __pmd_present(pmd_t pte)
> +{
> +	return !!(pmd_val(pte) & _PAGE_PRESENT);
> +}
>  
>  static inline void pud_set(pud_t *pudp, unsigned long val)
>  {
> @@ -850,6 +856,23 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
>  #define pmd_soft_dirty(pmd)    pte_soft_dirty(pmd_pte(pmd))
>  #define pmd_mksoft_dirty(pmd)  pte_pmd(pte_mksoft_dirty(pmd_pte(pmd)))
>  #define pmd_clear_soft_dirty(pmd) pte_pmd(pte_clear_soft_dirty(pmd_pte(pmd)))
> +
> +#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
> +static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
> +{
> +	return pte_pmd(pte_swp_mksoft_dirty(pmd_pte(pmd)));
> +}
> +
> +static inline int pmd_swp_soft_dirty(pmd_t pmd)
> +{
> +	return pte_swp_soft_dirty(pmd_pte(pmd));
> +}
> +
> +static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd)
> +{
> +	return pte_pmd(pte_swp_clear_soft_dirty(pmd_pte(pmd)));
> +}
> +#endif
>  #endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */
>  
>  #ifdef CONFIG_NUMA_BALANCING

Did we test this with the Radix config? If not, I suggest we hold off on
the ppc64 patch and you can merge the rest of the changes.

-aneesh

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2016-09-30  5:18 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-26 15:22 [PATCH v1 00/12] mm: THP migration support zi.yan
2016-09-26 15:22 ` zi.yan
2016-09-26 15:22 ` [PATCH v1 01/12] mm: mempolicy: add queue_pages_node_check() zi.yan
2016-09-26 15:22   ` zi.yan
2016-09-26 15:22 ` [PATCH v1 02/12] mm: thp: introduce CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION zi.yan
2016-09-26 15:22   ` zi.yan
2016-09-26 15:22 ` [PATCH v1 03/12] mm: thp: add helpers related to thp/pmd migration zi.yan
2016-09-26 15:22   ` zi.yan
2016-09-26 15:22 ` [PATCH v1 04/12] mm: thp: enable thp migration in generic path zi.yan
2016-09-26 15:22   ` zi.yan
2016-09-26 15:22 ` [PATCH v1 05/12] mm: thp: check pmd migration entry in common path zi.yan
2016-09-26 15:22   ` zi.yan
2016-09-26 15:22 ` [PATCH v1 06/12] mm: soft-dirty: keep soft-dirty bits over thp migration zi.yan
2016-09-26 15:22   ` zi.yan
2016-09-26 15:22 ` [PATCH v1 07/12] mm: hwpoison: fix race between unpoisoning and freeing migrate source page zi.yan
2016-09-26 15:22   ` zi.yan
2016-09-26 15:22 ` [PATCH v1 08/12] mm: hwpoison: soft offline supports thp migration zi.yan
2016-09-26 15:22   ` zi.yan
2016-09-26 15:22 ` [PATCH v1 09/12] mm: mempolicy: mbind and migrate_pages support " zi.yan
2016-09-26 15:22   ` zi.yan
2016-09-26 15:22 ` [PATCH v1 10/12] mm: migrate: move_pages() supports " zi.yan
2016-09-26 15:22   ` zi.yan
2016-09-26 15:22 ` [PATCH v1 11/12] mm: memory_hotplug: memory hotremove " zi.yan
2016-09-26 15:22   ` zi.yan
2016-09-26 15:22 ` [PATCH v1 12/12] mm: ppc64: Add THP migration support for ppc64 zi.yan
2016-09-26 15:22   ` zi.yan
2016-09-30  0:02   ` Balbir Singh
2016-09-30  0:02     ` Balbir Singh
2016-09-30  5:18   ` Aneesh Kumar K.V
2016-09-30  5:18     ` Aneesh Kumar K.V
2016-09-26 15:38 ` [PATCH v1 00/12] THP migration support Zi Yan
2016-09-29  8:25   ` Naoya Horiguchi
2016-09-29  8:25     ` Naoya Horiguchi
2016-09-30  2:32     ` Zi Yan
