linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/10] Enable HugeTLB page migration on POWER
@ 2016-04-07  5:37 Anshuman Khandual
  2016-04-07  5:37 ` [PATCH 01/10] mm/mmap: Replace SHM_HUGE_MASK with MAP_HUGE_MASK inside mmap_pgoff Anshuman Khandual
                   ` (10 more replies)
  0 siblings, 11 replies; 27+ messages in thread
From: Anshuman Khandual @ 2016-04-07  5:37 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, kirill, n-horiguchi, akpm, mgorman, dave.hansen,
	aneesh.kumar, mpe

This patch series enables HugeTLB page migration on the POWER
platform. The series contains some core VM changes (patches 1-3) and
some powerpc specific changes (patches 4-10). Comments, suggestions
and inputs are welcome.
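
As a quick illustration of what the series enables, here is a minimal,
hypothetical userspace sketch that asks the kernel to migrate a
hugetlb page between NUMA nodes through move_pages(2); it assumes a
NUMA machine with a node 1, libnuma for the syscall wrapper (link with
-lnuma), and huge pages already reserved:

#include <numaif.h>
#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
	unsigned long sz = 16UL << 20;		/* one 16MB huge page */
	void *pages[1];
	int nodes[1] = { 1 };			/* assumed target node */
	int status[1];

	void *p = mmap(NULL, sz, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	*(volatile char *)p = 1;		/* fault the huge page in */

	pages[0] = p;
	/* Without this series, hugetlb pages on POWER are simply not
	 * migratable; with it, status[0] should report the new node. */
	if (move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE))
		perror("move_pages");
	else
		printf("page now on node %d\n", status[0]);

	munmap(p, sz);
	return 0;
}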

Anshuman Khandual (10):
  mm/mmap: Replace SHM_HUGE_MASK with MAP_HUGE_MASK inside mmap_pgoff
  mm/hugetlb: Add PGD based implementation awareness
  mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race
  powerpc/hugetlb: Add ABI defines for MAP_HUGE_16MB and MAP_HUGE_16GB
  powerpc/hugetlb: Split the function 'huge_pte_alloc'
  powerpc/hugetlb: Split the function 'huge_pte_offset'
  powerpc/hugetlb: Prepare arch functions for ARCH_WANT_GENERAL_HUGETLB
  powerpc/hugetlb: Selectively enable ARCH_WANT_GENERAL_HUGETLB
  powerpc/hugetlb: Selectively enable ARCH_ENABLE_HUGEPAGE_MIGRATION
  selftest/powerpc: Add memory page migration tests

 arch/powerpc/Kconfig                               |   8 +
 arch/powerpc/include/asm/book3s/64/hash-64k.h      |  10 +
 arch/powerpc/include/uapi/asm/mman.h               |   3 +
 arch/powerpc/mm/hugetlbpage.c                      |  60 +++---
 include/linux/hugetlb.h                            |   3 +
 include/linux/mm.h                                 |  33 ++++
 mm/gup.c                                           |   6 +
 mm/hugetlb.c                                       |  75 +++++++-
 mm/mmap.c                                          |   2 +-
 tools/testing/selftests/powerpc/mm/Makefile        |  14 +-
 .../selftests/powerpc/mm/hugepage-migration.c      |  30 +++
 tools/testing/selftests/powerpc/mm/migration.h     | 205 +++++++++++++++++++++
 .../testing/selftests/powerpc/mm/page-migration.c  |  33 ++++
 tools/testing/selftests/powerpc/mm/run_mmtests     | 104 +++++++++++
 14 files changed, 552 insertions(+), 34 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/mm/hugepage-migration.c
 create mode 100644 tools/testing/selftests/powerpc/mm/migration.h
 create mode 100644 tools/testing/selftests/powerpc/mm/page-migration.c
 create mode 100755 tools/testing/selftests/powerpc/mm/run_mmtests

-- 
2.1.0


* [PATCH 01/10] mm/mmap: Replace SHM_HUGE_MASK with MAP_HUGE_MASK inside mmap_pgoff
  2016-04-07  5:37 [PATCH 00/10] Enable HugeTLB page migration on POWER Anshuman Khandual
@ 2016-04-07  5:37 ` Anshuman Khandual
  2016-04-07  8:28   ` Balbir Singh
  2016-04-13  7:54   ` Michal Hocko
  2016-04-07  5:37 ` [PATCH 02/10] mm/hugetlb: Add PGD based implementation awareness Anshuman Khandual
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 27+ messages in thread
From: Anshuman Khandual @ 2016-04-07  5:37 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, kirill, n-horiguchi, akpm, mgorman, dave.hansen,
	aneesh.kumar, mpe

Commit 091d0d55b286 ("shm: fix null pointer deref when userspace
specifies invalid hugepage size") replaced MAP_HUGE_MASK with
SHM_HUGE_MASK. Although both masks carry the same numeric value of
0x3f, MAP_HUGE_MASK is the more appropriate flag in an mmap() context.
Hence change it back.
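
For reference, both macros decode the same six-bit field of the mmap()
flags; a small standalone sketch of the encoding this line manipulates
(constants as defined in the generic uapi mman headers):

#include <stdio.h>

/* From the generic uapi mman headers: bits [31:26] of the mmap()
 * flags carry log2 of the requested huge page size. */
#define MAP_HUGE_SHIFT	26
#define MAP_HUGE_MASK	0x3f

int main(void)
{
	unsigned long flags = 24UL << MAP_HUGE_SHIFT;	/* log2(16MB) == 24 */
	unsigned long shift = (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK;

	printf("requested huge page size: %lu\n", 1UL << shift);	/* 16777216 */
	return 0;
}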

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 mm/mmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index bd2e1a53..7d730a4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1315,7 +1315,7 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
 		struct user_struct *user = NULL;
 		struct hstate *hs;
 
-		hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & SHM_HUGE_MASK);
+		hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
 		if (!hs)
 			return -EINVAL;
 
-- 
2.1.0


* [PATCH 02/10] mm/hugetlb: Add PGD based implementation awareness
  2016-04-07  5:37 [PATCH 00/10] Enable HugeTLB page migration on POWER Anshuman Khandual
  2016-04-07  5:37 ` [PATCH 01/10] mm/mmap: Replace SHM_HUGE_MASK with MAP_HUGE_MASK inside mmap_pgoff Anshuman Khandual
@ 2016-04-07  5:37 ` Anshuman Khandual
  2016-04-07  9:04   ` Balbir Singh
  2016-04-07  5:37 ` [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race Anshuman Khandual
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Anshuman Khandual @ 2016-04-07  5:37 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, kirill, n-horiguchi, akpm, mgorman, dave.hansen,
	aneesh.kumar, mpe

Currently, with the config option ARCH_WANT_GENERAL_HUGETLB enabled,
functions like 'huge_pte_alloc' and 'huge_pte_offset' do not take into
account a HugeTLB page implementation at the PGD level. The same is
true for 'follow_page_mask', which is called from the move_pages()
system call. This lack of PGD level huge page support prevents some
architectures from using these generic HugeTLB functions.

This change adds the required PGD based implementation awareness. With
that, more architectures like POWER, which implements 16GB pages at
the PGD level along with 16MB pages at the PMD level, can now use the
ARCH_WANT_GENERAL_HUGETLB config option.
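
To make the PGD level arithmetic concrete, here is a worked example
assuming the 64K base page BOOK3S configuration this series targets
(PAGE_SHIFT = 16, PMD_SHIFT = 24, PGDIR_SHIFT = 34; treat these shift
values as assumptions):

/*
 * Worked example for follow_huge_pgd()'s subpage computation, with
 * 16MB pages at the PMD level and 16GB pages at the PGD level:
 *
 *	address inside the 16GB page = base + 0x23450000
 *	address & ~PGDIR_MASK        = 0x23450000  (byte offset in page)
 *	>> PAGE_SHIFT (16)           = 0x2345      (64K subpage index)
 *
 * so pte_page(*(pte_t *)pgd) + 0x2345 is the struct page returned.
 */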

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 include/linux/hugetlb.h |  3 +++
 mm/gup.c                |  6 ++++++
 mm/hugetlb.c            | 20 ++++++++++++++++++++
 3 files changed, 29 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 7d953c2..71832e1 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -115,6 +115,8 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 				pmd_t *pmd, int flags);
 struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
 				pud_t *pud, int flags);
+struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address,
+				pgd_t *pgd, int flags);
 int pmd_huge(pmd_t pmd);
 int pud_huge(pud_t pmd);
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
@@ -143,6 +145,7 @@ static inline void hugetlb_show_meminfo(void)
 }
 #define follow_huge_pmd(mm, addr, pmd, flags)	NULL
 #define follow_huge_pud(mm, addr, pud, flags)	NULL
+#define follow_huge_pgd(mm, addr, pgd, flags)	NULL
 #define prepare_hugepage_range(file, addr, len)	(-EINVAL)
 #define pmd_huge(x)	0
 #define pud_huge(x)	0
diff --git a/mm/gup.c b/mm/gup.c
index fb87aea..9bac78c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -234,6 +234,12 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
 	pgd = pgd_offset(mm, address);
 	if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
 		return no_page_table(vma, flags);
+	if (pgd_huge(*pgd) && vma->vm_flags & VM_HUGETLB) {
+		page = follow_huge_pgd(mm, address, pgd, flags);
+		if (page)
+			return page;
+		return no_page_table(vma, flags);
+	}
 
 	pud = pud_offset(pgd, address);
 	if (pud_none(*pud))
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 19d0d08..5ea3158 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4250,6 +4250,11 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	pte_t *pte = NULL;
 
 	pgd = pgd_offset(mm, addr);
+	if (sz == PGDIR_SIZE) {
+		pte = (pte_t *)pgd;
+		goto huge_pgd;
+	}
+
 	pud = pud_alloc(mm, pgd, addr);
 	if (pud) {
 		if (sz == PUD_SIZE) {
@@ -4262,6 +4267,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 				pte = (pte_t *)pmd_alloc(mm, pud, addr);
 		}
 	}
+
+huge_pgd:
 	BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
 
 	return pte;
@@ -4275,6 +4282,8 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
 
 	pgd = pgd_offset(mm, addr);
 	if (pgd_present(*pgd)) {
+		if (pgd_huge(*pgd))
+			return (pte_t *)pgd;
 		pud = pud_offset(pgd, addr);
 		if (pud_present(*pud)) {
 			if (pud_huge(*pud))
@@ -4343,6 +4352,17 @@ follow_huge_pud(struct mm_struct *mm, unsigned long address,
 	return pte_page(*(pte_t *)pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
 }
 
+struct page * __weak
+follow_huge_pgd(struct mm_struct *mm, unsigned long address,
+		pgd_t *pgd, int flags)
+{
+	if (flags & FOLL_GET)
+		return NULL;
+
+	return pte_page(*(pte_t *)pgd) +
+				((address & ~PGDIR_MASK) >> PAGE_SHIFT);
+}
+
 #ifdef CONFIG_MEMORY_FAILURE
 
 /*
-- 
2.1.0


* [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race
  2016-04-07  5:37 [PATCH 00/10] Enable HugeTLB page migration on POWER Anshuman Khandual
  2016-04-07  5:37 ` [PATCH 01/10] mm/mmap: Replace SHM_HUGE_MASK with MAP_HUGE_MASK inside mmap_pgoff Anshuman Khandual
  2016-04-07  5:37 ` [PATCH 02/10] mm/hugetlb: Add PGD based implementation awareness Anshuman Khandual
@ 2016-04-07  5:37 ` Anshuman Khandual
  2016-04-07  9:16   ` kbuild test robot
                     ` (2 more replies)
  2016-04-07  5:37 ` [PATCH 04/10] powerpc/hugetlb: Add ABI defines for MAP_HUGE_16MB and MAP_HUGE_16GB Anshuman Khandual
                   ` (7 subsequent siblings)
  10 siblings, 3 replies; 27+ messages in thread
From: Anshuman Khandual @ 2016-04-07  5:37 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, kirill, n-horiguchi, akpm, mgorman, dave.hansen,
	aneesh.kumar, mpe

The follow_huge_(pmd|pud|pgd) functions walk the page table and fetch
the page struct on behalf of 'follow_page_mask'. These functions can
race against simultaneous calls of move_pages() and freeing of huge
pages. This was fixed partly by commit e66f17ff7177 ("mm/hugetlb:
take page table lock in follow_huge_pmd()"), but only for PMD based
huge pages.

After implementing similar logic, follow_huge_(pud|pgd) are now safe
from the above mentioned race conditions and can also support
FOLL_GET. The generic version of 'follow_huge_addr' has been left as
is; it is up to each architecture to decide on it.
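
In outline, the race being closed looks like this (an editorial
timeline reconstructed from the commit referenced above, not code from
this patch):

/*
 * CPU0 (move_pages)                  CPU1 (munmap)
 * ---------------------------------  ---------------------------------
 * follow_huge_pud() sees huge leaf
 *                                    zaps the PUD, frees the huge page
 * pte_page() on the stale entry,
 * get_page() on a freed page         -> use-after-free
 *
 * Taking the page table lock, rechecking pud_huge()/pgd_huge() under
 * it, and only then doing get_page() closes the window; migration
 * entries are handled by dropping the lock and waiting in
 * __migration_entry_wait() before retrying.
 */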

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 include/linux/mm.h | 33 +++++++++++++++++++++++++++
 mm/hugetlb.c       | 67 ++++++++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 91 insertions(+), 9 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ffcff53..734182a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1751,6 +1751,19 @@ static inline void pgtable_page_dtor(struct page *page)
 		NULL: pte_offset_kernel(pmd, address))
 
 #if USE_SPLIT_PMD_PTLOCKS
+static struct page *pgd_to_page(pgd_t *pgd)
+{
+	unsigned long mask = ~(PTRS_PER_PGD * sizeof(pgd_t) - 1);
+
+	return virt_to_page((void *)((unsigned long) pgd & mask));
+}
+
+static struct page *pud_to_page(pud_t *pud)
+{
+	unsigned long mask = ~(PTRS_PER_PUD * sizeof(pud_t) - 1);
+
+	return virt_to_page((void *)((unsigned long) pud & mask));
+}
 
 static struct page *pmd_to_page(pmd_t *pmd)
 {
@@ -1758,6 +1771,16 @@ static struct page *pmd_to_page(pmd_t *pmd)
 	return virt_to_page((void *)((unsigned long) pmd & mask));
 }
 
+static inline spinlock_t *pgd_lockptr(struct mm_struct *mm, pgd_t *pgd)
+{
+	return ptlock_ptr(pgd_to_page(pgd));
+}
+
+static inline spinlock_t *pud_lockptr(struct mm_struct *mm, pud_t *pud)
+{
+	return ptlock_ptr(pud_to_page(pud));
+}
+
 static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
 {
 	return ptlock_ptr(pmd_to_page(pmd));
@@ -1783,6 +1806,16 @@ static inline void pgtable_pmd_page_dtor(struct page *page)
 
 #else
 
+static inline spinlock_t *pgd_lockptr(struct mm_struct *mm, pgd_t *pgd)
+{
+	return &mm->page_table_lock;
+}
+
+static inline spinlock_t *pud_lockptr(struct mm_struct *mm, pud_t *pud)
+{
+	return &mm->page_table_lock;
+}
+
 static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
 {
 	return &mm->page_table_lock;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5ea3158..e84e479 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4346,21 +4346,70 @@ struct page * __weak
 follow_huge_pud(struct mm_struct *mm, unsigned long address,
 		pud_t *pud, int flags)
 {
-	if (flags & FOLL_GET)
-		return NULL;
-
-	return pte_page(*(pte_t *)pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
+	struct page *page = NULL;
+	spinlock_t *ptl;
+retry:
+	ptl = pud_lockptr(mm, pud);
+	spin_lock(ptl);
+	/*
+	 * make sure that the address range covered by this pud is not
+	 * unmapped from other threads.
+	 */
+	if (!pud_huge(*pud))
+		goto out;
+	if (pud_present(*pud)) {
+		page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
+		if (flags & FOLL_GET)
+			get_page(page);
+	} else {
+		if (is_hugetlb_entry_migration(huge_ptep_get((pte_t *)pud))) {
+			spin_unlock(ptl);
+			__migration_entry_wait(mm, (pte_t *)pud, ptl);
+			goto retry;
+		}
+		/*
+		 * hwpoisoned entry is treated as no_page_table in
+		 * follow_page_mask().
+		 */
+	}
+out:
+	spin_unlock(ptl);
+	return page;
 }
 
 struct page * __weak
 follow_huge_pgd(struct mm_struct *mm, unsigned long address,
 		pgd_t *pgd, int flags)
 {
-	if (flags & FOLL_GET)
-		return NULL;
-
-	return pte_page(*(pte_t *)pgd) +
-				((address & ~PGDIR_MASK) >> PAGE_SHIFT);
+	struct page *page = NULL;
+	spinlock_t *ptl;
+retry:
+	ptl = pgd_lockptr(mm, pgd);
+	spin_lock(ptl);
+	/*
+	 * make sure that the address range covered by this pgd is not
+	 * unmapped from other threads.
+	 */
+	if (!pgd_huge(*pgd))
+		goto out;
+	if (pgd_present(*pgd)) {
+		page = pgd_page(*pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
+		if (flags & FOLL_GET)
+			get_page(page);
+	} else {
+		if (is_hugetlb_entry_migration(huge_ptep_get((pte_t *)pgd))) {
+			spin_unlock(ptl);
+			__migration_entry_wait(mm, (pte_t *)pgd, ptl);
+			goto retry;
+		}
+		/*
+		 * hwpoisoned entry is treated as no_page_table in
+		 * follow_page_mask().
+		 */
+	}
+out:
+	spin_unlock(ptl);
+	return page;
 }
 
 #ifdef CONFIG_MEMORY_FAILURE
-- 
2.1.0


* [PATCH 04/10] powerpc/hugetlb: Add ABI defines for MAP_HUGE_16MB and MAP_HUGE_16GB
  2016-04-07  5:37 [PATCH 00/10] Enable HugeTLB page migration on POWER Anshuman Khandual
                   ` (2 preceding siblings ...)
  2016-04-07  5:37 ` [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race Anshuman Khandual
@ 2016-04-07  5:37 ` Anshuman Khandual
  2016-04-07  5:37 ` [PATCH 05/10] powerpc/hugetlb: Split the function 'huge_pte_alloc' Anshuman Khandual
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Anshuman Khandual @ 2016-04-07  5:37 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, kirill, n-horiguchi, akpm, mgorman, dave.hansen,
	aneesh.kumar, mpe

This just adds user space exported ABI definitions for both the 16MB
and 16GB non default huge page sizes, to be used with the mmap()
system call.
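
A hedged usage sketch for the new defines (it assumes 16MB huge pages
have been reserved via /proc/sys/vm/nr_hugepages, and supplies the
define by hand for toolchains whose headers predate this patch):

#include <sys/mman.h>
#include <stdio.h>

#ifndef MAP_HUGE_16MB
#define MAP_HUGE_SHIFT	26
#define MAP_HUGE_16MB	(24U << MAP_HUGE_SHIFT)
#endif

int main(void)
{
	unsigned long len = 16UL << 20;

	/* Explicitly request a 16MB hugetlb mapping rather than the
	 * default huge page size. */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB |
		       MAP_HUGE_16MB, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	munmap(p, len);
	return 0;
}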

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/include/uapi/asm/mman.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
index 03c06ba..e78980b 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -29,4 +29,7 @@
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
 
+#define MAP_HUGE_16MB	(24U << MAP_HUGE_SHIFT)	/* 16MB HugeTLB Page */
+#define MAP_HUGE_16GB	(34U << MAP_HUGE_SHIFT)	/* 16GB HugeTLB Page */
+
 #endif /* _UAPI_ASM_POWERPC_MMAN_H */
-- 
2.1.0


* [PATCH 05/10] powerpc/hugetlb: Split the function 'huge_pte_alloc'
  2016-04-07  5:37 [PATCH 00/10] Enable HugeTLB page migration on POWER Anshuman Khandual
                   ` (3 preceding siblings ...)
  2016-04-07  5:37 ` [PATCH 04/10] powerpc/hugetlb: Add ABI defines for MAP_HUGE_16MB and MAP_HUGE_16GB Anshuman Khandual
@ 2016-04-07  5:37 ` Anshuman Khandual
  2016-04-11 13:51   ` Balbir Singh
  2016-04-07  5:37 ` [PATCH 06/10] powerpc/hugetlb: Split the function 'huge_pte_offset' Anshuman Khandual
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Anshuman Khandual @ 2016-04-07  5:37 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, kirill, n-horiguchi, akpm, mgorman, dave.hansen,
	aneesh.kumar, mpe

Currently the function 'huge_pte_alloc' has two versions, one for
BOOK3S server and the other for BOOK3E embedded platforms. This change
splits only the BOOK3S server version into two parts, one implementing
the ARCH_WANT_GENERAL_HUGETLB config option and the other covering
everything else. This change is one of the prerequisites towards
enabling the ARCH_WANT_GENERAL_HUGETLB config option on the POWER
platform.
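
For orientation, the page size to page table level mapping that the
two resulting versions implement (implied by the code below; the shift
values are assumptions for the 64K base page BOOK3S configuration):

/*
 *	pshift == PGDIR_SHIFT (34)  ->  16GB page, pte is the PGD entry
 *	pshift == PUD_SHIFT         ->  not used on this configuration
 *	pshift == PMD_SHIFT (24)    ->  16MB page, pte is the PMD entry
 *	other sizes                 ->  hugepd (huge page directory)
 *	                                path, non-GENERAL version only
 */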

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/mm/hugetlbpage.c | 67 +++++++++++++++++++++++++++----------------
 1 file changed, 43 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index d991b9e..e453918 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -59,6 +59,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
 	return __find_linux_pte_or_hugepte(mm->pgd, addr, NULL, NULL);
 }
 
+#ifndef CONFIG_ARCH_WANT_GENERAL_HUGETLB
 static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
 			   unsigned long address, unsigned pdshift, unsigned pshift)
 {
@@ -116,6 +117,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
 	spin_unlock(&mm->page_table_lock);
 	return 0;
 }
+#endif /* !CONFIG_ARCH_WANT_GENERAL_HUGETLB */
 
 /*
  * These macros define how to determine which level of the page table holds
@@ -130,6 +132,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
 #endif
 
 #ifdef CONFIG_PPC_BOOK3S_64
+#ifndef CONFIG_ARCH_WANT_GENERAL_HUGETLB
 /*
  * At this point we do the placement change only for BOOK3S 64. This would
  * possibly work on other subarchs.
@@ -145,32 +148,23 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
 
 	addr &= ~(sz-1);
 	pg = pgd_offset(mm, addr);
-
-	if (pshift == PGDIR_SHIFT)
-		/* 16GB huge page */
-		return (pte_t *) pg;
-	else if (pshift > PUD_SHIFT)
-		/*
-		 * We need to use hugepd table
-		 */
+	if (pshift > PUD_SHIFT) {
 		hpdp = (hugepd_t *)pg;
-	else {
-		pdshift = PUD_SHIFT;
-		pu = pud_alloc(mm, pg, addr);
-		if (pshift == PUD_SHIFT)
-			return (pte_t *)pu;
-		else if (pshift > PMD_SHIFT)
-			hpdp = (hugepd_t *)pu;
-		else {
-			pdshift = PMD_SHIFT;
-			pm = pmd_alloc(mm, pu, addr);
-			if (pshift == PMD_SHIFT)
-				/* 16MB hugepage */
-				return (pte_t *)pm;
-			else
-				hpdp = (hugepd_t *)pm;
-		}
+		goto hugepd_search;
 	}
+
+	pdshift = PUD_SHIFT;
+	pu = pud_alloc(mm, pg, addr);
+	if (pshift > PMD_SHIFT) {
+		hpdp = (hugepd_t *)pu;
+		goto hugepd_search;
+	}
+
+	pdshift = PMD_SHIFT;
+	pm = pmd_alloc(mm, pu, addr);
+	hpdp = (hugepd_t *)pm;
+
+hugepd_search:
 	if (!hpdp)
 		return NULL;
 
@@ -182,6 +176,31 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
 	return hugepte_offset(*hpdp, addr, pdshift);
 }
 
+#else /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
+pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
+{
+	pgd_t *pg;
+	pud_t *pu;
+	pmd_t *pm;
+	unsigned pshift = __ffs(sz);
+
+	addr &= ~(sz-1);
+	pg = pgd_offset(mm, addr);
+
+	if (pshift == PGDIR_SHIFT)	/* 16GB Huge Page */
+		return (pte_t *)pg;
+
+	pu = pud_alloc(mm, pg, addr);	/* NA, skipped */
+	if (pshift == PUD_SHIFT)
+		return (pte_t *)pu;
+
+	pm = pmd_alloc(mm, pu, addr);	/* 16MB Huge Page */
+	if (pshift == PMD_SHIFT)
+		return (pte_t *)pm;
+
+	return NULL;
+}
+#endif /* !CONFIG_ARCH_WANT_GENERAL_HUGETLB */
 #else
 
 pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
-- 
2.1.0


* [PATCH 06/10] powerpc/hugetlb: Split the function 'huge_pte_offset'
  2016-04-07  5:37 [PATCH 00/10] Enable HugeTLB page migration on POWER Anshuman Khandual
                   ` (4 preceding siblings ...)
  2016-04-07  5:37 ` [PATCH 05/10] powerpc/hugetlb: Split the function 'huge_pte_alloc' Anshuman Khandual
@ 2016-04-07  5:37 ` Anshuman Khandual
  2016-04-07  5:37 ` [PATCH 07/10] powerpc/hugetlb: Prepare arch functions for ARCH_WANT_GENERAL_HUGETLB Anshuman Khandual
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Anshuman Khandual @ 2016-04-07  5:37 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, kirill, n-horiguchi, akpm, mgorman, dave.hansen,
	aneesh.kumar, mpe

Currently the function 'huge_pte_offset' has a single version for all
possible configurations and platforms. This change splits that
function into two versions, one for the ARCH_WANT_GENERAL_HUGETLB
implementation and the other for everything else. This is again one of
the prerequisites towards enabling the ARCH_WANT_GENERAL_HUGETLB
config option on the POWER platform.
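
Note that the new version walks the page table locklessly; the sketch
below (editorial, mirroring the pattern in the patch) shows why each
level is snapshotted exactly once:

/*
 * The walk runs without the page table lock, so each level is read
 * once with READ_ONCE() into a local value:
 *
 *	pgd = READ_ONCE(*pgdp);		// one racy read ...
 *	if (pgd_huge(pgd))		// ... then every check tests
 *		return (pte_t *)pgdp;	// the stable local copy
 *
 * Re-reading *pgdp for each check could instead observe the entry
 * changing between the pgd_none() and pgd_huge() tests.
 */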

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/mm/hugetlbpage.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index e453918..8fc6d23 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -53,11 +53,46 @@ static unsigned nr_gpages;
 
 #define hugepd_none(hpd)	((hpd).pd == 0)
 
+#ifndef CONFIG_ARCH_WANT_GENERAL_HUGETLB
 pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
 {
 	/* Only called for hugetlbfs pages, hence can ignore THP */
 	return __find_linux_pte_or_hugepte(mm->pgd, addr, NULL, NULL);
 }
+#else /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
+pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+{
+	pgd_t pgd, *pgdp;
+	pud_t pud, *pudp;
+	pmd_t pmd, *pmdp;
+
+	pgdp = mm->pgd + pgd_index(addr);
+	pgd  = READ_ONCE(*pgdp);
+
+	if (pgd_none(pgd))
+		return NULL;
+
+	if (pgd_huge(pgd))
+		return (pte_t *)pgdp;
+
+	pudp = pud_offset(&pgd, addr);
+	pud  = READ_ONCE(*pudp);
+	if (pud_none(pud))
+		return NULL;
+
+	if (pud_huge(pud))
+		return (pte_t *)pudp;
+
+	pmdp = pmd_offset(&pud, addr);
+	pmd  = READ_ONCE(*pmdp);
+	if (pmd_none(pmd))
+		return NULL;
+
+	if (pmd_huge(pmd))
+		return (pte_t *)pmdp;
+	return NULL;
+}
+#endif /* !CONFIG_ARCH_WANT_GENERAL_HUGETLB */
 
 #ifndef CONFIG_ARCH_WANT_GENERAL_HUGETLB
 static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
-- 
2.1.0


* [PATCH 07/10] powerpc/hugetlb: Prepare arch functions for ARCH_WANT_GENERAL_HUGETLB
  2016-04-07  5:37 [PATCH 00/10] Enable HugeTLB page migration on POWER Anshuman Khandual
                   ` (5 preceding siblings ...)
  2016-04-07  5:37 ` [PATCH 06/10] powerpc/hugetlb: Split the function 'huge_pte_offset' Anshuman Khandual
@ 2016-04-07  5:37 ` Anshuman Khandual
  2016-04-07  5:37 ` [PATCH 08/10] powerpc/hugetlb: Selectively enable ARCH_WANT_GENERAL_HUGETLB Anshuman Khandual
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Anshuman Khandual @ 2016-04-07  5:37 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, kirill, n-horiguchi, akpm, mgorman, dave.hansen,
	aneesh.kumar, mpe

The arch override of 'follow_huge_addr' is called from
'follow_page_mask' to look up the page struct associated with an
address. Right now it does not support the FOLL_GET option.

With ARCH_WANT_GENERAL_HUGETLB, 'follow_page_mask' needs to use the
generic 'follow_huge_*' functions instead of the arch overrides. So
this modifies 'follow_huge_addr' to return ERR_PTR(-EINVAL) when the
ARCH_WANT_GENERAL_HUGETLB option is enabled, and hides all the arch
specific 'follow_huge_*' overrides, allowing the code to fall back on
the generic 'follow_huge_*' functions instead.

While here, this also implements the function 'pte_huge' which is
required by the generic 'huge_pte_alloc'.
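
The resulting control flow in the generic 'follow_page_mask' is, in
outline (an editorial sketch, not code from this patch):

/*
 *	page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
 *	if (!IS_ERR(page))
 *		return page;	// the arch override handled it
 *	// With ARCH_WANT_GENERAL_HUGETLB the override returns
 *	// ERR_PTR(-EINVAL), so the walk falls through to the generic
 *	// pgd_huge()/pud_huge()/pmd_huge() checks and from there to
 *	// the generic follow_huge_{pgd,pud,pmd}() helpers.
 */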

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 10 ++++++++++
 arch/powerpc/mm/hugetlbpage.c                 | 14 ++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 0a7956a..3b6dff4 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -146,6 +146,16 @@ extern bool __rpte_sub_valid(real_pte_t rpte, unsigned long index);
  * Defined in such a way that we can optimize away code block at build time
  * if CONFIG_HUGETLB_PAGE=n.
  */
+#ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
+static inline int pte_huge(pte_t pte)
+{
+	/*
+	 * leaf pte for huge page
+	 */
+	return !!(pte_val(pte) & _PAGE_PTE);
+}
+#endif
+
 static inline int pmd_huge(pmd_t pmd)
 {
 	/*
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 8fc6d23..4f44c62 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -690,6 +690,10 @@ follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
 	unsigned long mask, flags;
 	struct page *page = ERR_PTR(-EINVAL);
 
+#ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
+	return ERR_PTR(-EINVAL);
+#endif
+
 	local_irq_save(flags);
 	ptep = find_linux_pte_or_hugepte(mm->pgd, address, &is_thp, &shift);
 	if (!ptep)
@@ -717,6 +721,7 @@ no_page:
 	return page;
 }
 
+#ifndef CONFIG_ARCH_WANT_GENERAL_HUGETLB
 struct page *
 follow_huge_pmd(struct mm_struct *mm, unsigned long address,
 		pmd_t *pmd, int write)
@@ -733,6 +738,15 @@ follow_huge_pud(struct mm_struct *mm, unsigned long address,
 	return NULL;
 }
 
+struct page *
+follow_huge_pgd(struct mm_struct *mm, unsigned long address,
+		pgd_t *pgd, int write)
+{
+	BUG();
+	return NULL;
+}
+#endif /* !CONFIG_ARCH_WANT_GENERAL_HUGETLB */
+
 static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
 				      unsigned long sz)
 {
-- 
2.1.0


* [PATCH 08/10] powerpc/hugetlb: Selectively enable ARCH_WANT_GENERAL_HUGETLB
  2016-04-07  5:37 [PATCH 00/10] Enable HugeTLB page migration on POWER Anshuman Khandual
                   ` (6 preceding siblings ...)
  2016-04-07  5:37 ` [PATCH 07/10] powerpc/hugetlb: Prepare arch functions for ARCH_WANT_GENERAL_HUGETLB Anshuman Khandual
@ 2016-04-07  5:37 ` Anshuman Khandual
  2016-04-07  5:37 ` [PATCH 09/10] powerpc/hugetlb: Selectively enable ARCH_ENABLE_HUGEPAGE_MIGRATION Anshuman Khandual
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Anshuman Khandual @ 2016-04-07  5:37 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, kirill, n-horiguchi, akpm, mgorman, dave.hansen,
	aneesh.kumar, mpe

This enables the ARCH_WANT_GENERAL_HUGETLB config option, but only
for BOOK3S platforms with a 64K page size. The existing arch specific
functions for the ARCH_WANT_GENERAL_HUGETLB case, 'huge_pte_alloc' and
'huge_pte_offset', are no longer required and are removed with this
change.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/Kconfig          |  4 +++
 arch/powerpc/mm/hugetlbpage.c | 58 -------------------------------------------
 2 files changed, 4 insertions(+), 58 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 7cd32c0..9b3ce18 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -33,6 +33,10 @@ config HAVE_SETUP_PER_CPU_AREA
 config NEED_PER_CPU_EMBED_FIRST_CHUNK
 	def_bool PPC64
 
+config ARCH_WANT_GENERAL_HUGETLB
+	depends on HUGETLB_PAGE && PPC_64K_PAGES && PPC_BOOK3S_64
+	def_bool y
+
 config NR_IRQS
 	int "Number of virtual interrupt numbers"
 	range 32 32768
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 4f44c62..bd0e584 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -59,39 +59,6 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
 	/* Only called for hugetlbfs pages, hence can ignore THP */
 	return __find_linux_pte_or_hugepte(mm->pgd, addr, NULL, NULL);
 }
-#else /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
-{
-	pgd_t pgd, *pgdp;
-	pud_t pud, *pudp;
-	pmd_t pmd, *pmdp;
-
-	pgdp = mm->pgd + pgd_index(addr);
-	pgd  = READ_ONCE(*pgdp);
-
-	if (pgd_none(pgd))
-		return NULL;
-
-	if (pgd_huge(pgd))
-		return (pte_t *)pgdp;
-
-	pudp = pud_offset(&pgd, addr);
-	pud  = READ_ONCE(*pudp);
-	if (pud_none(pud))
-		return NULL;
-
-	if (pud_huge(pud))
-		return (pte_t *)pudp;
-
-	pmdp = pmd_offset(&pud, addr);
-	pmd  = READ_ONCE(*pmdp);
-	if (pmd_none(pmd))
-		return NULL;
-
-	if (pmd_huge(pmd))
-		return (pte_t *)pmdp;
-	return NULL;
-}
 #endif /* !CONFIG_ARCH_WANT_GENERAL_HUGETLB */
 
 #ifndef CONFIG_ARCH_WANT_GENERAL_HUGETLB
@@ -210,31 +177,6 @@ hugepd_search:
 
 	return hugepte_offset(*hpdp, addr, pdshift);
 }
-
-#else /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
-pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
-{
-	pgd_t *pg;
-	pud_t *pu;
-	pmd_t *pm;
-	unsigned pshift = __ffs(sz);
-
-	addr &= ~(sz-1);
-	pg = pgd_offset(mm, addr);
-
-	if (pshift == PGDIR_SHIFT)	/* 16GB Huge Page */
-		return (pte_t *)pg;
-
-	pu = pud_alloc(mm, pg, addr);	/* NA, skipped */
-	if (pshift == PUD_SHIFT)
-		return (pte_t *)pu;
-
-	pm = pmd_alloc(mm, pu, addr);	/* 16MB Huge Page */
-	if (pshift == PMD_SHIFT)
-		return (pte_t *)pm;
-
-	return NULL;
-}
 #endif /* !CONFIG_ARCH_WANT_GENERAL_HUGETLB */
 #else
 
-- 
2.1.0


* [PATCH 09/10] powerpc/hugetlb: Selectively enable ARCH_ENABLE_HUGEPAGE_MIGRATION
  2016-04-07  5:37 [PATCH 00/10] Enable HugeTLB page migration on POWER Anshuman Khandual
                   ` (7 preceding siblings ...)
  2016-04-07  5:37 ` [PATCH 08/10] powerpc/hugetlb: Selectively enable ARCH_WANT_GENERAL_HUGETLB Anshuman Khandual
@ 2016-04-07  5:37 ` Anshuman Khandual
  2016-04-07  5:37 ` [PATCH 10/10] selftest/powerpc: Add memory page migration tests Anshuman Khandual
  2016-04-18  8:52 ` [PATCH 00/10] Enable HugeTLB page migration on POWER Anshuman Khandual
  10 siblings, 0 replies; 27+ messages in thread
From: Anshuman Khandual @ 2016-04-07  5:37 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, kirill, n-horiguchi, akpm, mgorman, dave.hansen,
	aneesh.kumar, mpe

This enables the config option ARCH_ENABLE_HUGEPAGE_MIGRATION when
the platform has both ARCH_WANT_GENERAL_HUGETLB and MIGRATION enabled.
That in turn makes 'hugepage_migration_supported' report the feature
as present, which is the check HugeTLB page migration performs before
proceeding.
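
For reference, the generic helper this option gates looked roughly
like the following around v4.6 (include/linux/hugetlb.h, quoted from
memory, so treat the exact body as an assumption):

static inline bool hugepage_migration_supported(struct hstate *h)
{
#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
	return huge_page_shift(h) == PMD_SHIFT;
#else
	return false;
#endif
}

Note that, as written there, only PMD sized (16MB) huge pages would
pass the check; the 16GB PGD level pages would still be rejected by
the generic helper.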

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/Kconfig | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9b3ce18..f2a45eb 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -86,6 +86,10 @@ config GENERIC_HWEIGHT
 config ARCH_HAS_DMA_SET_COHERENT_MASK
         bool
 
+config ARCH_ENABLE_HUGEPAGE_MIGRATION
+	def_bool y
+	depends on ARCH_WANT_GENERAL_HUGETLB && MIGRATION
+
 config PPC
 	bool
 	default y
-- 
2.1.0


* [PATCH 10/10] selftest/powerpc: Add memory page migration tests
  2016-04-07  5:37 [PATCH 00/10] Enable HugeTLB page migration on POWER Anshuman Khandual
                   ` (8 preceding siblings ...)
  2016-04-07  5:37 ` [PATCH 09/10] powerpc/hugetlb: Selectively enable ARCH_ENABLE_HUGEPAGE_MIGRATION Anshuman Khandual
@ 2016-04-07  5:37 ` Anshuman Khandual
  2016-04-18  8:52 ` [PATCH 00/10] Enable HugeTLB page migration on POWER Anshuman Khandual
  10 siblings, 0 replies; 27+ messages in thread
From: Anshuman Khandual @ 2016-04-07  5:37 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, kirill, n-horiguchi, akpm, mgorman, dave.hansen,
	aneesh.kumar, mpe

This adds two tests for memory page migration: one for normal page
migration, which works on both 4K and 64K base page size kernels, and
one for huge page migration, which works only with the 64K base page
size 16MB huge page implementation at the PMD level.
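
The tests trigger migration indirectly: each page's PFN is read from
/proc/self/pagemap and its physical address is written to the
soft_offline_page sysfs file, which makes the kernel migrate the page
away before offlining it. The pagemap entry layout relied upon (per
Documentation/vm/pagemap.txt):

/*
 * /proc/<pid>/pagemap: one 64-bit entry per virtual page
 *
 *	bits 0-54 : page frame number, if present (the PMAP_PFN mask)
 *	bit  55   : soft-dirty
 *	bit  62   : page is swapped
 *	bit  63   : page is present
 *
 * physical address = PFN << PAGE_SHIFT, which is the value the tests
 * write into /sys/devices/system/memory/soft_offline_page.
 */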

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 tools/testing/selftests/powerpc/mm/Makefile        |  14 +-
 .../selftests/powerpc/mm/hugepage-migration.c      |  30 +++
 tools/testing/selftests/powerpc/mm/migration.h     | 205 +++++++++++++++++++++
 .../testing/selftests/powerpc/mm/page-migration.c  |  33 ++++
 tools/testing/selftests/powerpc/mm/run_mmtests     | 104 +++++++++++
 5 files changed, 381 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/mm/hugepage-migration.c
 create mode 100644 tools/testing/selftests/powerpc/mm/migration.h
 create mode 100644 tools/testing/selftests/powerpc/mm/page-migration.c
 create mode 100755 tools/testing/selftests/powerpc/mm/run_mmtests

diff --git a/tools/testing/selftests/powerpc/mm/Makefile b/tools/testing/selftests/powerpc/mm/Makefile
index ee179e2..c482614 100644
--- a/tools/testing/selftests/powerpc/mm/Makefile
+++ b/tools/testing/selftests/powerpc/mm/Makefile
@@ -1,12 +1,16 @@
 noarg:
 	$(MAKE) -C ../
 
-TEST_PROGS := hugetlb_vs_thp_test subpage_prot
-TEST_FILES := tempfile
+TEST_PROGS := run_mmtests
+TEST_FILES := hugetlb_vs_thp_test
+TEST_FILES += subpage_prot
+TEST_FILES += tempfile
+TEST_FILES += hugepage-migration
+TEST_FILES += page-migration
 
-all: $(TEST_PROGS) $(TEST_FILES)
+all: $(TEST_FILES)
 
-$(TEST_PROGS): ../harness.c
+$(TEST_FILES): ../harness.c
 
 include ../../lib.mk
 
@@ -14,4 +18,4 @@ tempfile:
 	dd if=/dev/zero of=tempfile bs=64k count=1
 
 clean:
-	rm -f $(TEST_PROGS) tempfile
+	rm -f $(TEST_FILES)
diff --git a/tools/testing/selftests/powerpc/mm/hugepage-migration.c b/tools/testing/selftests/powerpc/mm/hugepage-migration.c
new file mode 100644
index 0000000..b60bc10
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mm/hugepage-migration.c
@@ -0,0 +1,30 @@
+/*
+ * Copyright (C) 2015, Anshuman Khandual, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+#include "migration.h"
+
+static int hugepage_migration(void)
+{
+	int ret = 0;
+
+	if ((unsigned long)getpagesize() == 0x1000)
+		printf("Running on base page size 4K\n");
+
+	if ((unsigned long)getpagesize() == 0x10000)
+		printf("Running on base page size 64K\n");
+
+	ret = test_huge_migration(16 * MEM_MB);
+	ret |= test_huge_migration(256 * MEM_MB);
+	ret |= test_huge_migration(512 * MEM_MB);
+
+	return ret;
+}
+
+int main(void)
+{
+	return test_harness(hugepage_migration, "hugepage_migration");
+}
diff --git a/tools/testing/selftests/powerpc/mm/migration.h b/tools/testing/selftests/powerpc/mm/migration.h
new file mode 100644
index 0000000..9d4e273
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mm/migration.h
@@ -0,0 +1,205 @@
+/*
+ * Copyright (C) 2015, Anshuman Khandual, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+
+#include "utils.h"
+
+#define HPAGE_OFF	0
+#define HPAGE_ON	1
+
+#define PAGE_SHIFT_4K	12
+#define PAGE_SHIFT_64K	16
+#define PAGE_SIZE_4K	0x1000
+#define PAGE_SIZE_64K	0x10000
+#define PAGE_SIZE_HUGE	(16UL * 1024 * 1024)
+
+#define MEM_GB		(1024UL * 1024 * 1024)
+#define MEM_MB		(1024UL * 1024)
+#define MEM_KB		(1024UL)
+
+#define PMAP_FILE	"/proc/self/pagemap"
+#define PMAP_PFN	0x007FFFFFFFFFFFFFUL
+#define PMAP_SIZE	8
+
+#define SOFT_OFFLINE	"/sys/devices/system/memory/soft_offline_page"
+#define HARD_OFFLINE	"/sys/devices/system/memory/hard_offline_page"
+
+#define MMAP_LENGTH	(256 * MEM_MB)
+#define MMAP_ADDR	(void *)(0x0UL)
+#define MMAP_PROT	(PROT_READ | PROT_WRITE)
+#define MMAP_FLAGS	(MAP_PRIVATE | MAP_ANONYMOUS)
+#define MMAP_FLAGS_HUGE	(MAP_SHARED)
+
+#define FILE_NAME	"huge/hugepagefile"
+
+static void write_buffer(char *addr, unsigned long length)
+{
+	unsigned long i;
+
+	for (i = 0; i < length; i++)
+		*(addr + i) = (char)i;
+}
+
+static int read_buffer(char *addr, unsigned long length)
+{
+	unsigned long i;
+
+	for (i = 0; i < length; i++) {
+		if (*(addr + i) != (char)i) {
+			printf("Data miscompare at addr[%lu]\n", i);
+			return 1;
+		}
+	}
+	return 0;
+}
+
+static unsigned long get_npages(unsigned long length, unsigned long size)
+{
+	/* Plain 64-bit division; funnelling these through unsigned int
+	 * temporaries would truncate for regions of 4GB and above. */
+	return length / size;
+}
+
+static void soft_offline_pages(int hugepage, void *addr,
+	unsigned long npages, unsigned long *skipped, unsigned long *failed)
+{
+	unsigned long psize, offset, pfn, paddr, fail, skip, i;
+	void *tmp;
+	int fd1, fd2;
+	char buf[20];
+
+	fd1 = open(PMAP_FILE, O_RDONLY);
+	if (fd1 == -1) {
+		perror("open() failed");
+		exit(-1);
+	}
+
+	fd2 = open(SOFT_OFFLINE, O_WRONLY);
+	if (fd2 == -1) {
+		perror("open() failed");
+		exit(-1);
+	}
+
+	fail = skip = 0;
+	psize = getpagesize();
+	for (i = 0; i < npages; i++) {
+		if (hugepage)
+			tmp = addr + i * PAGE_SIZE_HUGE;
+		else
+			tmp = addr + i * psize;
+
+		offset = ((unsigned long) tmp / psize) * PMAP_SIZE;
+
+		if (lseek(fd1, offset, SEEK_SET) == -1) {
+			perror("lseek() failed");
+			exit(-1);
+		}
+
+		if (read(fd1, &pfn, sizeof(pfn)) == -1) {
+			perror("read() failed");
+			exit(-1);
+		}
+
+		/* Skip if no valid PFN */
+		pfn = pfn & PMAP_PFN;
+		if (!pfn) {
+			skip++;
+			continue;
+		}
+
+		paddr = 0;
+		if (psize == PAGE_SIZE_4K)
+			paddr = pfn << PAGE_SHIFT_4K;
+
+		if (psize == PAGE_SIZE_64K)
+			paddr = pfn << PAGE_SHIFT_64K;
+
+		sprintf(buf, "0x%lx\n", paddr);
+
+		if (write(fd2, buf, strlen(buf)) == -1) {
+			perror("write() failed");
+			printf("[%ld] PFN: %lx BUF: %s\n", i, pfn, buf);
+			fail++;
+		}
+
+	}
+
+	if (failed)
+		*failed = fail;
+
+	if (skipped)
+		*skipped = skip;
+
+	close(fd1);
+	close(fd2);
+}
+
+int test_migration(unsigned long length)
+{
+	unsigned long skipped, failed;
+	void *addr;
+	int ret;
+
+	addr = mmap(MMAP_ADDR, length, MMAP_PROT, MMAP_FLAGS, -1, 0);
+	if (addr == MAP_FAILED) {
+		perror("mmap() failed");
+		exit(-1);
+	}
+
+	write_buffer(addr, length);
+	soft_offline_pages(HPAGE_OFF, addr, length/getpagesize(), &skipped, &failed);
+	ret = read_buffer(addr, length);
+
+	printf("%ld moved %ld skipped %ld failed\n", (length/getpagesize() - skipped - failed), skipped, failed);
+
+	munmap(addr, length);
+	return ret;
+}
+
+int test_huge_migration(unsigned long length)
+{
+	unsigned long skipped, failed, npages;
+	void *addr;
+	int fd, ret;
+
+	fd = open(FILE_NAME, O_CREAT | O_RDWR, 0755);
+	if (fd < 0) {
+		perror("open() failed");
+		exit(-1);
+	}
+
+	addr = mmap(MMAP_ADDR, length, MMAP_PROT, MMAP_FLAGS_HUGE, fd, 0);
+	if (addr == MAP_FAILED) {
+		perror("mmap() failed");
+		unlink(FILE_NAME);
+		exit(-1);
+	}
+
+	if (mlock(addr, length) == -1) {
+		perror("mlock() failed");
+		munmap(addr, length);
+		unlink(FILE_NAME);
+		exit(-1);
+	}
+
+	write_buffer(addr, length);
+	npages = get_npages(length, PAGE_SIZE_HUGE);
+	soft_offline_pages(HPAGE_ON, addr, npages, &skipped, &failed);
+	ret = read_buffer(addr, length);
+
+	printf("%ld moved %ld skipped %ld failed\n", (npages - skipped - failed), skipped, failed);
+
+	munmap(addr, length);
+	unlink(FILE_NAME);
+	return ret;
+}
diff --git a/tools/testing/selftests/powerpc/mm/page-migration.c b/tools/testing/selftests/powerpc/mm/page-migration.c
new file mode 100644
index 0000000..fc6e472
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mm/page-migration.c
@@ -0,0 +1,33 @@
+/*
+ * Copyright (C) 2015, Anshuman Khandual, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+#include "migration.h"
+
+static int page_migration(void)
+{
+	int ret = 0;
+
+	if ((unsigned long)getpagesize() == 0x1000)
+		printf("Running on base page size 4K\n");
+
+	if ((unsigned long)getpagesize() == 0x10000)
+		printf("Running on base page size 64K\n");
+
+	ret = test_migration(4 * MEM_MB);
+	ret |= test_migration(64 * MEM_MB);
+	ret |= test_migration(256 * MEM_MB);
+	ret |= test_migration(512 * MEM_MB);
+	ret |= test_migration(1 * MEM_GB);
+	ret |= test_migration(2 * MEM_GB);
+
+	return ret;
+}
+
+int main(void)
+{
+	return test_harness(page_migration, "page_migration");
+}
diff --git a/tools/testing/selftests/powerpc/mm/run_mmtests b/tools/testing/selftests/powerpc/mm/run_mmtests
new file mode 100755
index 0000000..19805ba
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mm/run_mmtests
@@ -0,0 +1,104 @@
+#!/bin/bash
+
+# Mostly borrowed from tools/testing/selftests/vm/run_vmtests
+
+# Please run this as root
+# Try allocating 2GB of 16MB huge pages; the size below is in kB.
+# Please change this needed memory if the test program changes
+needmem=2097152
+mnt=./huge
+exitcode=0
+
+# Get huge pagesize and freepages from /proc/meminfo
+while read name size unit; do
+	if [ "$name" = "HugePages_Free:" ]; then
+		freepgs=$size
+	fi
+	if [ "$name" = "Hugepagesize:" ]; then
+		pgsize=$size
+	fi
+done < /proc/meminfo
+
+# Set required nr_hugepages
+if [ -n "$freepgs" ] && [ -n "$pgsize" ]; then
+	nr_hugepgs=`cat /proc/sys/vm/nr_hugepages`
+	needpgs=`expr $needmem / $pgsize`
+	tries=2
+	while [ $tries -gt 0 ] && [ $freepgs -lt $needpgs ]; do
+		lackpgs=$(( $needpgs - $freepgs ))
+		echo 3 > /proc/sys/vm/drop_caches
+		echo $(( $lackpgs + $nr_hugepgs )) > /proc/sys/vm/nr_hugepages
+		if [ $? -ne 0 ]; then
+			echo "Please run this test as root"
+		fi
+		while read name size unit; do
+			if [ "$name" = "HugePages_Free:" ]; then
+				freepgs=$size
+			fi
+		done < /proc/meminfo
+		tries=$((tries - 1))
+	done
+	if [ $freepgs -lt $needpgs ]; then
+		printf "Not enough huge pages available (%d < %d)\n" \
+		       $freepgs $needpgs
+	fi
+else
+	echo "No hugetlbfs support in kernel? Check dmesg"
+fi
+
+mkdir $mnt
+mount -t hugetlbfs none $mnt
+
+# Run the test programs
+echo "...................."
+echo "Test HugeTLB vs THP"
+echo "...................."
+./hugetlb_vs_thp_test
+if [ $? -ne 0 ]; then
+	echo "[FAIL]"
+	exitcode=1
+else
+	echo "[PASS]"
+fi
+
+echo "........................."
+echo "Test subpage protection"
+echo "........................."
+./subpage_prot
+if [ $? -ne 0 ]; then
+	echo "[FAIL]"
+	exitcode=1
+else
+	echo "[PASS]"
+fi
+
+echo "..........................."
+echo "Test normal page migration"
+echo "..........................."
+./page-migration
+if [ $? -ne 0 ]; then
+	echo "[FAIL]"
+	exitcode=1
+else
+	echo "[PASS]"
+fi
+
+# Huge page migration is enabled by the rest of this patch series
+
+echo "........................."
+echo "Test huge page migration"
+echo "........................."
+./hugepage-migration
+if [ $? -ne 0 ]; then
+	echo "[FAIL]"
+	exitcode=1
+else
+	echo "[PASS]"
+fi
+
+# Huge pages cleanup
+umount $mnt
+rm -rf $mnt
+echo $nr_hugepgs > /proc/sys/vm/nr_hugepages
+
+exit $exitcode
-- 
2.1.0


* Re: [PATCH 01/10] mm/mmap: Replace SHM_HUGE_MASK with MAP_HUGE_MASK inside mmap_pgoff
  2016-04-07  5:37 ` [PATCH 01/10] mm/mmap: Replace SHM_HUGE_MASK with MAP_HUGE_MASK inside mmap_pgoff Anshuman Khandual
@ 2016-04-07  8:28   ` Balbir Singh
  2016-04-13  7:54   ` Michal Hocko
  1 sibling, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2016-04-07  8:28 UTC (permalink / raw)
  To: Anshuman Khandual, linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, kirill, n-horiguchi, akpm, mgorman, dave.hansen,
	aneesh.kumar, mpe



On 07/04/16 15:37, Anshuman Khandual wrote:
> Commit 091d0d55b286 ("shm: fix null pointer deref when userspace
> specifies invalid hugepage size") replaced MAP_HUGE_MASK with
> SHM_HUGE_MASK. Although both masks carry the same numeric value of
> 0x3f, MAP_HUGE_MASK is the more appropriate flag in an mmap() context.
> Hence change it back.
> 
> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>

Acked-by: Balbir Singh <bsingharora@gmail.com>


* Re: [PATCH 02/10] mm/hugetlb: Add PGD based implementation awareness
  2016-04-07  5:37 ` [PATCH 02/10] mm/hugetlb: Add PGD based implementation awareness Anshuman Khandual
@ 2016-04-07  9:04   ` Balbir Singh
  2016-04-11  5:25     ` Anshuman Khandual
  0 siblings, 1 reply; 27+ messages in thread
From: Balbir Singh @ 2016-04-07  9:04 UTC (permalink / raw)
  To: Anshuman Khandual, linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, kirill, n-horiguchi, akpm, mgorman, dave.hansen,
	aneesh.kumar, mpe



On 07/04/16 15:37, Anshuman Khandual wrote:
> Currently, with the config option ARCH_WANT_GENERAL_HUGETLB enabled,
> functions like 'huge_pte_alloc' and 'huge_pte_offset' do not take into
> account a HugeTLB page implementation at the PGD level. The same is
> true for 'follow_page_mask', which is called from the move_pages()
> system call. This lack of PGD level huge page support prevents some
> architectures from using these generic HugeTLB functions.
> 

From what I know of move_pages(), it will always call follow_page_mask()
with FOLL_GET (I could be wrong here) and the implementation below
returns NULL for follow_huge_pgd().

> This change adds the required PGD based implementation awareness. With
> that, more architectures like POWER, which implements 16GB pages at
> the PGD level along with 16MB pages at the PMD level, can now use the
> ARCH_WANT_GENERAL_HUGETLB config option.
> 
> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> ---
>  include/linux/hugetlb.h |  3 +++
>  mm/gup.c                |  6 ++++++
>  mm/hugetlb.c            | 20 ++++++++++++++++++++
>  3 files changed, 29 insertions(+)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 7d953c2..71832e1 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -115,6 +115,8 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>  				pmd_t *pmd, int flags);
>  struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
>  				pud_t *pud, int flags);
> +struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address,
> +				pgd_t *pgd, int flags);
>  int pmd_huge(pmd_t pmd);
>  int pud_huge(pud_t pmd);
>  unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
> @@ -143,6 +145,7 @@ static inline void hugetlb_show_meminfo(void)
>  }
>  #define follow_huge_pmd(mm, addr, pmd, flags)	NULL
>  #define follow_huge_pud(mm, addr, pud, flags)	NULL
> +#define follow_huge_pgd(mm, addr, pgd, flags)	NULL
>  #define prepare_hugepage_range(file, addr, len)	(-EINVAL)
>  #define pmd_huge(x)	0
>  #define pud_huge(x)	0
> diff --git a/mm/gup.c b/mm/gup.c
> index fb87aea..9bac78c 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -234,6 +234,12 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
>  	pgd = pgd_offset(mm, address);
>  	if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
>  		return no_page_table(vma, flags);
> +	if (pgd_huge(*pgd) && vma->vm_flags & VM_HUGETLB) {
> +		page = follow_huge_pgd(mm, address, pgd, flags);
> +		if (page)
> +			return page;
> +		return no_page_table(vma, flags);
This will return NULL as well?
> +	}
>  
>  	pud = pud_offset(pgd, address);
>  	if (pud_none(*pud))
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 19d0d08..5ea3158 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4250,6 +4250,11 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>  	pte_t *pte = NULL;
>  
>  	pgd = pgd_offset(mm, addr);
> +	if (sz == PGDIR_SIZE) {
> +		pte = (pte_t *)pgd;
> +		goto huge_pgd;
> +	}
> +

No allocation for a pgd slot - right?

>  	pud = pud_alloc(mm, pgd, addr);
>  	if (pud) {
>  		if (sz == PUD_SIZE) {
> @@ -4262,6 +4267,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>  				pte = (pte_t *)pmd_alloc(mm, pud, addr);
>  		}
>  	}
> +
> +huge_pgd:
>  	BUG_ON(pte && !pte_none(*pte) && !pte_huge(*pte));
>  
>  	return pte;
> @@ -4275,6 +4282,8 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
>  
>  	pgd = pgd_offset(mm, addr);
>  	if (pgd_present(*pgd)) {
> +		if (pgd_huge(*pgd))
> +			return (pte_t *)pgd;
>  		pud = pud_offset(pgd, addr);
>  		if (pud_present(*pud)) {
>  			if (pud_huge(*pud))
> @@ -4343,6 +4352,17 @@ follow_huge_pud(struct mm_struct *mm, unsigned long address,
>  	return pte_page(*(pte_t *)pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
>  }
>  
> +struct page * __weak
> +follow_huge_pgd(struct mm_struct *mm, unsigned long address,
> +		pgd_t *pgd, int flags)
> +{
> +	if (flags & FOLL_GET)
> +		return NULL;
> +
> +	return pte_page(*(pte_t *)pgd) +
> +				((address & ~PGDIR_MASK) >> PAGE_SHIFT);
> +}
> +
>  #ifdef CONFIG_MEMORY_FAILURE
>  
>  /*
> 


* Re: [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race
  2016-04-07  5:37 ` [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race Anshuman Khandual
@ 2016-04-07  9:16   ` kbuild test robot
  2016-04-18  8:44     ` Anshuman Khandual
  2016-04-07  9:26   ` Balbir Singh
  2016-04-07  9:34   ` kbuild test robot
  2 siblings, 1 reply; 27+ messages in thread
From: kbuild test robot @ 2016-04-07  9:16 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: kbuild-all, linux-mm, linux-kernel, linuxppc-dev, hughd, kirill,
	n-horiguchi, akpm, mgorman, dave.hansen, aneesh.kumar, mpe


Hi Anshuman,

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.6-rc2 next-20160407]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Anshuman-Khandual/Enable-HugeTLB-page-migration-on-POWER/20160407-165841
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: sparc64-allyesconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sparc64 

All error/warnings (new ones prefixed by >>):

   mm/hugetlb.c: In function 'follow_huge_pgd':
>> mm/hugetlb.c:4395:3: error: implicit declaration of function 'pgd_page' [-Werror=implicit-function-declaration]
      page = pgd_page(*pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
      ^
>> mm/hugetlb.c:4395:8: warning: assignment makes pointer from integer without a cast
      page = pgd_page(*pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
           ^
   cc1: some warnings being treated as errors

vim +/pgd_page +4395 mm/hugetlb.c

  4389		 * make sure that the address range covered by this pgd is not
  4390		 * unmapped from other threads.
  4391		 */
  4392		if (!pgd_huge(*pgd))
  4393			goto out;
  4394		if (pgd_present(*pgd)) {
> 4395			page = pgd_page(*pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
  4396			if (flags & FOLL_GET)
  4397				get_page(page);
  4398		} else {

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



* Re: [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race
  2016-04-07  5:37 ` [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race Anshuman Khandual
  2016-04-07  9:16   ` kbuild test robot
@ 2016-04-07  9:26   ` Balbir Singh
  2016-04-11  5:39     ` Anshuman Khandual
  2016-04-07  9:34   ` kbuild test robot
  2 siblings, 1 reply; 27+ messages in thread
From: Balbir Singh @ 2016-04-07  9:26 UTC (permalink / raw)
  To: Anshuman Khandual, linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, kirill, n-horiguchi, akpm, mgorman, dave.hansen,
	aneesh.kumar, mpe



On 07/04/16 15:37, Anshuman Khandual wrote:
> The follow_huge_(pmd|pud|pgd) functions walk the page table and fetch
> the page struct on behalf of 'follow_page_mask'. These functions can
> race against simultaneous calls of move_pages() and freeing of huge
> pages. This was fixed partly by commit e66f17ff7177 ("mm/hugetlb:
> take page table lock in follow_huge_pmd()"), but only for PMD based
> huge pages.
> 
> After implementing similar logic, follow_huge_(pud|pgd) are now safe
> from the above mentioned race conditions and can also support
> FOLL_GET. The generic version of 'follow_huge_addr' has been left as
> is; it is up to each architecture to decide on it.
> 
> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> ---
>  include/linux/mm.h | 33 +++++++++++++++++++++++++++
>  mm/hugetlb.c       | 67 ++++++++++++++++++++++++++++++++++++++++++++++--------
>  2 files changed, 91 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ffcff53..734182a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1751,6 +1751,19 @@ static inline void pgtable_page_dtor(struct page *page)
>  		NULL: pte_offset_kernel(pmd, address))
>  
>  #if USE_SPLIT_PMD_PTLOCKS

Do we still use USE_SPLIT_PMD_PTLOCKS? I think it's good enough. With
pgds we are likely to use the same locks, and the split nature may not
be really split.

> +static struct page *pgd_to_page(pgd_t *pgd)
> +{
> +	unsigned long mask = ~(PTRS_PER_PGD * sizeof(pgd_t) - 1);
> +
> +	return virt_to_page((void *)((unsigned long) pgd & mask));
> +}
> +
> +static struct page *pud_to_page(pud_t *pud)
> +{
> +	unsigned long mask = ~(PTRS_PER_PUD * sizeof(pud_t) - 1);
> +
> +	return virt_to_page((void *)((unsigned long) pud & mask));
> +}
>  
>  static struct page *pmd_to_page(pmd_t *pmd)
>  {
> @@ -1758,6 +1771,16 @@ static struct page *pmd_to_page(pmd_t *pmd)
>  	return virt_to_page((void *)((unsigned long) pmd & mask));
>  }
>  
> +static inline spinlock_t *pgd_lockptr(struct mm_struct *mm, pgd_t *pgd)
> +{
> +	return ptlock_ptr(pgd_to_page(pgd));
> +}
> +
> +static inline spinlock_t *pud_lockptr(struct mm_struct *mm, pud_t *pud)
> +{
> +	return ptlock_ptr(pud_to_page(pud));
> +}
> +
>  static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  {
>  	return ptlock_ptr(pmd_to_page(pmd));
> @@ -1783,6 +1806,16 @@ static inline void pgtable_pmd_page_dtor(struct page *page)
>  
>  #else
>  
> +static inline spinlock_t *pgd_lockptr(struct mm_struct *mm, pgd_t *pgd)
> +{
> +	return &mm->page_table_lock;
> +}
> +
> +static inline spinlock_t *pud_lockptr(struct mm_struct *mm, pud_t *pud)
> +{
> +	return &mm->page_table_lock;
> +}
> +
>  static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  {
>  	return &mm->page_table_lock;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5ea3158..e84e479 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4346,21 +4346,70 @@ struct page * __weak
>  follow_huge_pud(struct mm_struct *mm, unsigned long address,
>  		pud_t *pud, int flags)
>  {
> -	if (flags & FOLL_GET)
> -		return NULL;
> -
> -	return pte_page(*(pte_t *)pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
> +	struct page *page = NULL;
> +	spinlock_t *ptl;
> +retry:
> +	ptl = pud_lockptr(mm, pud);
> +	spin_lock(ptl);
> +	/*
> +	 * make sure that the address range covered by this pud is not
> +	 * unmapped from other threads.
> +	 */
> +	if (!pud_huge(*pud))
> +		goto out;
> +	if (pud_present(*pud)) {
> +		page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
> +		if (flags & FOLL_GET)
> +			get_page(page);
> +	} else {
> +		if (is_hugetlb_entry_migration(huge_ptep_get((pte_t *)pud))) {
> +			spin_unlock(ptl);
> +			__migration_entry_wait(mm, (pte_t *)pud, ptl);
> +			goto retry;
> +		}
> +		/*
> +		 * hwpoisoned entry is treated as no_page_table in
> +		 * follow_page_mask().
> +		 */
> +	}
> +out:
> +	spin_unlock(ptl);
> +	return page;
>  }
>  
>  struct page * __weak
>  follow_huge_pgd(struct mm_struct *mm, unsigned long address,
>  		pgd_t *pgd, int flags)
>  {
> -	if (flags & FOLL_GET)
> -		return NULL;
> -
> -	return pte_page(*(pte_t *)pgd) +
> -				((address & ~PGDIR_MASK) >> PAGE_SHIFT);
> +	struct page *page = NULL;
> +	spinlock_t *ptl;
> +retry:
> +	ptl = pgd_lockptr(mm, pgd);
> +	spin_lock(ptl);
> +	/*
> +	 * make sure that the address range covered by this pgd is not
> +	 * unmapped from other threads.
> +	 */
> +	if (!pgd_huge(*pgd))
> +		goto out;
> +	if (pgd_present(*pgd)) {
> +		page = pgd_page(*pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
> +		if (flags & FOLL_GET)
> +			get_page(page);
> +	} else {
> +		if (is_hugetlb_entry_migration(huge_ptep_get((pte_t *)pgd))) {
> +			spin_unlock(ptl);
> +			__migration_entry_wait(mm, (pte_t *)pgd, ptl);
> +			goto retry;
> +		}
> +		/*
> +		 * hwpoisoned entry is treated as no_page_table in
> +		 * follow_page_mask().
> +		 */
> +	}
> +out:
> +	spin_unlock(ptl);
> +	return page;
>  }
>  
>  #ifdef CONFIG_MEMORY_FAILURE
> 


Balbir

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race
  2016-04-07  5:37 ` [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race Anshuman Khandual
  2016-04-07  9:16   ` kbuild test robot
  2016-04-07  9:26   ` Balbir Singh
@ 2016-04-07  9:34   ` kbuild test robot
  2016-04-11  6:04     ` Anshuman Khandual
  2 siblings, 1 reply; 27+ messages in thread
From: kbuild test robot @ 2016-04-07  9:34 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: kbuild-all, linux-mm, linux-kernel, linuxppc-dev, hughd, kirill,
	n-horiguchi, akpm, mgorman, dave.hansen, aneesh.kumar, mpe

[-- Attachment #1: Type: text/plain, Size: 2157 bytes --]

Hi Anshuman,

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.6-rc2 next-20160407]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Anshuman-Khandual/Enable-HugeTLB-page-migration-on-POWER/20160407-165841
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: s390-allyesconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=s390 

All errors (new ones prefixed by >>):

   mm/hugetlb.c: In function 'follow_huge_pud':
>> mm/hugetlb.c:4360:3: error: implicit declaration of function 'pud_page' [-Werror=implicit-function-declaration]
      page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
      ^
   mm/hugetlb.c:4360:8: warning: assignment makes pointer from integer without a cast
      page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
           ^
   mm/hugetlb.c: In function 'follow_huge_pgd':
   mm/hugetlb.c:4395:3: error: implicit declaration of function 'pgd_page' [-Werror=implicit-function-declaration]
      page = pgd_page(*pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
      ^
   mm/hugetlb.c:4395:8: warning: assignment makes pointer from integer without a cast
      page = pgd_page(*pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
           ^
   cc1: some warnings being treated as errors

vim +/pud_page +4360 mm/hugetlb.c

  4354		 * make sure that the address range covered by this pud is not
  4355		 * unmapped from other threads.
  4356		 */
  4357		if (!pud_huge(*pud))
  4358			goto out;
  4359		if (pud_present(*pud)) {
> 4360			page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
  4361			if (flags & FOLL_GET)
  4362				get_page(page);
  4363		} else {

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 40088 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 02/10] mm/hugetlb: Add PGD based implementation awareness
  2016-04-07  9:04   ` Balbir Singh
@ 2016-04-11  5:25     ` Anshuman Khandual
  2016-04-11  6:10       ` Anshuman Khandual
  0 siblings, 1 reply; 27+ messages in thread
From: Anshuman Khandual @ 2016-04-11  5:25 UTC (permalink / raw)
  To: Balbir Singh, linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, dave.hansen, aneesh.kumar, kirill, n-horiguchi, mgorman, akpm

On 04/07/2016 02:34 PM, Balbir Singh wrote:
> 
> 
> On 07/04/16 15:37, Anshuman Khandual wrote:
>> Currently the functions enabled by the ARCH_WANT_GENERAL_HUGETLB config,
>> like 'huge_pte_alloc' and 'huge_pte_offset', don't take into account
>> HugeTLB page implementation at the PGD level. This is also true for
>> functions like 'follow_page_mask', which is called from the move_pages()
>> system call. This lack of PGD level huge page support prevents some
>> architectures from using these generic HugeTLB functions.
>>
> 
> From what I know of move_pages(), it will always call follow_page_mask()
> with FOLL_GET (I could be wrong here) and the implementation below
> returns NULL for follow_huge_pgd().

You are right. This patch makes the ARCH_WANT_GENERAL_HUGETLB functions
aware of the PGD implementation so that we can do all transactions on
16GB pages using these functions instead of the present arch overrides.
But that also requires follow_page_mask() changes for every access to
the page other than the migrate_pages() usage.

But yes, we don't support migrate_pages() on PGD based pages yet, hence
it just returns NULL in that case. Maybe the commit message needs to
reflect this.
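
For completeness, a minimal userspace sketch of the path being discussed,
assuming libnuma is available (build with -lnuma); the 16MB mapping size
and target node 1 are illustrative assumptions:

	#include <numaif.h>
	#include <sys/mman.h>
	#include <stdio.h>

	int main(void)
	{
		size_t sz = 16UL << 20;	/* assumes a 16MB default hugepage */
		void *p = mmap(NULL, sz, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
		if (p == MAP_FAILED)
			return 1;
		*(volatile char *)p = 1;	/* fault the huge page in */

		void *pages[1] = { p };
		int nodes[1]   = { 1 };		/* target node, illustrative */
		int status[1];

		/*
		 * move_pages() reaches follow_page_mask() with FOLL_GET, so
		 * a follow_huge_pgd() that returns NULL makes migration of
		 * a PGD-level page fail here with a negative status.
		 */
		long rc = move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE);
		printf("move_pages: %ld, status: %d\n", rc, status[0]);
		return 0;
	}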

> 
>> This change adds the required PGD based implementation awareness; with
>> that, more architectures like POWER, which implements 16GB pages at the
>> PGD level along with the 16MB pages at the PMD level, can now use the
>> ARCH_WANT_GENERAL_HUGETLB config option.
>>
>> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
>> ---
>>  include/linux/hugetlb.h |  3 +++
>>  mm/gup.c                |  6 ++++++
>>  mm/hugetlb.c            | 20 ++++++++++++++++++++
>>  3 files changed, 29 insertions(+)
>>
>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>> index 7d953c2..71832e1 100644
>> --- a/include/linux/hugetlb.h
>> +++ b/include/linux/hugetlb.h
>> @@ -115,6 +115,8 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>>  				pmd_t *pmd, int flags);
>>  struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
>>  				pud_t *pud, int flags);
>> +struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address,
>> +				pgd_t *pgd, int flags);
>>  int pmd_huge(pmd_t pmd);
>>  int pud_huge(pud_t pmd);
>>  unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
>> @@ -143,6 +145,7 @@ static inline void hugetlb_show_meminfo(void)
>>  }
>>  #define follow_huge_pmd(mm, addr, pmd, flags)	NULL
>>  #define follow_huge_pud(mm, addr, pud, flags)	NULL
>> +#define follow_huge_pgd(mm, addr, pgd, flags)	NULL
>>  #define prepare_hugepage_range(file, addr, len)	(-EINVAL)
>>  #define pmd_huge(x)	0
>>  #define pud_huge(x)	0
>> diff --git a/mm/gup.c b/mm/gup.c
>> index fb87aea..9bac78c 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -234,6 +234,12 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
>>  	pgd = pgd_offset(mm, address);
>>  	if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
>>  		return no_page_table(vma, flags);
>> +	if (pgd_huge(*pgd) && vma->vm_flags & VM_HUGETLB) {
>> +		page = follow_huge_pgd(mm, address, pgd, flags);
>> +		if (page)
>> +			return page;
>> +		return no_page_table(vma, flags);
> This will return NULL as well?

That's right; no_page_table() returns NULL for FOLL_GET when we fall
through after failing in follow_huge_pgd().
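
For context, the helper in mm/gup.c of this era looks roughly like the
following (a sketch from memory, not part of this patch):

	static struct page *no_page_table(struct vm_area_struct *vma,
					  unsigned int flags)
	{
		/*
		 * Only core dumps of areas that could never have had a
		 * page get ERR_PTR(-EFAULT); every other caller, including
		 * the FOLL_GET path used by move_pages(), just sees NULL.
		 */
		if ((flags & FOLL_DUMP) && (!vma->vm_ops || !vma->vm_ops->fault))
			return ERR_PTR(-EFAULT);
		return NULL;
	}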

>> +	}
>>  
>>  	pud = pud_offset(pgd, address);
>>  	if (pud_none(*pud))
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 19d0d08..5ea3158 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -4250,6 +4250,11 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>>  	pte_t *pte = NULL;
>>  
>>  	pgd = pgd_offset(mm, addr);
>> +	if (sz == PGDIR_SIZE) {
>> +		pte = (pte_t *)pgd;
>> +		goto huge_pgd;
>> +	}
>> +
> 
> No allocation for a pgd slot - right?

No, it's already allocated for the mm during creation.
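
For reference, huge_pte_alloc() at the PGD level only indexes into the
table that came with the mm; the generic helpers are roughly (paraphrased
from memory, so treat this as a sketch):

	/* mm->pgd was allocated when the mm itself was created */
	#define pgd_index(address)	(((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
	#define pgd_offset(mm, address)	((mm)->pgd + pgd_index(address))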

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race
  2016-04-07  9:26   ` Balbir Singh
@ 2016-04-11  5:39     ` Anshuman Khandual
  2016-04-11 12:46       ` Balbir Singh
  0 siblings, 1 reply; 27+ messages in thread
From: Anshuman Khandual @ 2016-04-11  5:39 UTC (permalink / raw)
  To: Balbir Singh, linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, dave.hansen, aneesh.kumar, kirill, n-horiguchi, mgorman, akpm

On 04/07/2016 02:56 PM, Balbir Singh wrote:
> 
> On 07/04/16 15:37, Anshuman Khandual wrote:
>> > follow_huge_(pmd|pud|pgd) functions are used to walk the page table and
>> > fetch the page struct during 'follow_page_mask' call. There are possible
>> > race conditions faced by these functions which arise out of simultaneous
>> > calls of move_pages() and freeing of huge pages. This was fixed partly
>> > by the previous commit e66f17ff7177 ("mm/hugetlb: take page table lock
>> > in follow_huge_pmd()") for only PMD based huge pages.
>> > 
>> > After implementing similar logic, functions like follow_huge_(pud|pgd)
>> > are now safe from the above-mentioned race conditions and can also
>> > support FOLL_GET. The generic version of 'follow_huge_addr' has been
>> > left as it is; it is up to the architecture to decide on it.
>> > 
>> > Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
>> > ---
>> >  include/linux/mm.h | 33 +++++++++++++++++++++++++++
>> >  mm/hugetlb.c       | 67 ++++++++++++++++++++++++++++++++++++++++++++++--------
>> >  2 files changed, 91 insertions(+), 9 deletions(-)
>> > 
>> > diff --git a/include/linux/mm.h b/include/linux/mm.h
>> > index ffcff53..734182a 100644
>> > --- a/include/linux/mm.h
>> > +++ b/include/linux/mm.h
>> > @@ -1751,6 +1751,19 @@ static inline void pgtable_page_dtor(struct page *page)
>> >  		NULL: pte_offset_kernel(pmd, address))
>> >  
>> >  #if USE_SPLIT_PMD_PTLOCKS
> Do we still use USE_SPLIT_PMD_PTLOCKS? I think it is good enough. With
> PGDs we are likely to use the same locks, so the split nature may not
> really be split.
> 

Sorry Balbir, I did not get what you asked. Can you please elaborate on
this?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race
  2016-04-07  9:34   ` kbuild test robot
@ 2016-04-11  6:04     ` Anshuman Khandual
  2016-04-18  8:42       ` Anshuman Khandual
  0 siblings, 1 reply; 27+ messages in thread
From: Anshuman Khandual @ 2016-04-11  6:04 UTC (permalink / raw)
  To: kbuild test robot
  Cc: dave.hansen, mgorman, hughd, linux-kernel, linux-mm, kbuild-all,
	kirill, n-horiguchi, linuxppc-dev, akpm, aneesh.kumar

On 04/07/2016 03:04 PM, kbuild test robot wrote:
> All errors (new ones prefixed by >>):
> 
>    mm/hugetlb.c: In function 'follow_huge_pud':
>>> >> mm/hugetlb.c:4360:3: error: implicit declaration of function 'pud_page' [-Werror=implicit-function-declaration]
>       page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
>       ^
>    mm/hugetlb.c:4360:8: warning: assignment makes pointer from integer without a cast
>       page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
>            ^
>    mm/hugetlb.c: In function 'follow_huge_pgd':
>    mm/hugetlb.c:4395:3: error: implicit declaration of function 'pgd_page' [-Werror=implicit-function-declaration]
>       page = pgd_page(*pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);

Both the build errors here are because pud_page() and pgd_page() are not
available for some platforms and config options. It got missed as I ran
only powerpc config options for build testing. My bad, will fix it.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 02/10] mm/hugetlb: Add PGD based implementation awareness
  2016-04-11  5:25     ` Anshuman Khandual
@ 2016-04-11  6:10       ` Anshuman Khandual
  0 siblings, 0 replies; 27+ messages in thread
From: Anshuman Khandual @ 2016-04-11  6:10 UTC (permalink / raw)
  To: Balbir Singh, linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, dave.hansen, aneesh.kumar, kirill, n-horiguchi, mgorman, akpm

On 04/11/2016 10:55 AM, Anshuman Khandual wrote:
> On 04/07/2016 02:34 PM, Balbir Singh wrote:
>> > 
>> > 
>> > On 07/04/16 15:37, Anshuman Khandual wrote:
>>> >> Currently the functions enabled by the ARCH_WANT_GENERAL_HUGETLB config,
>>> >> like 'huge_pte_alloc' and 'huge_pte_offset', don't take into account
>>> >> HugeTLB page implementation at the PGD level. This is also true for
>>> >> functions like 'follow_page_mask', which is called from the move_pages()
>>> >> system call. This lack of PGD level huge page support prevents some
>>> >> architectures from using these generic HugeTLB functions.
>>> >>
>> > 
>> > From what I know of move_pages(), it will always call follow_page_mask()
>> > with FOLL_GET (I could be wrong here) and the implementation below
>> > returns NULL for follow_huge_pgd().
> You are right. This patch makes the ARCH_WANT_GENERAL_HUGETLB functions
> aware of the PGD implementation so that we can do all transactions on
> 16GB pages using these functions instead of the present arch overrides.
> But that also requires follow_page_mask() changes for every access to
> the page other than the migrate_pages() usage.
> 
> But yes, we don't support migrate_pages() on PGD based pages yet, hence
> it just returns NULL in that case. Maybe the commit message needs to
> reflect this.

The next commit actually changes follow_huge_pud|pgd() functions to
support FOLL_GET and PGD based huge page migration.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race
  2016-04-11  5:39     ` Anshuman Khandual
@ 2016-04-11 12:46       ` Balbir Singh
  0 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2016-04-11 12:46 UTC (permalink / raw)
  To: Anshuman Khandual, linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, dave.hansen, aneesh.kumar, kirill, n-horiguchi, mgorman, akpm



On 11/04/16 15:39, Anshuman Khandual wrote:
> On 04/07/2016 02:56 PM, Balbir Singh wrote:
>>
>> On 07/04/16 15:37, Anshuman Khandual wrote:
>>>> follow_huge_(pmd|pud|pgd) functions are used to walk the page table and
>>>> fetch the page struct during 'follow_page_mask' call. There are possible
>>>> race conditions faced by these functions which arise out of simultaneous
>>>> calls of move_pages() and freeing of huge pages. This was fixed partly
>>>> by the previous commit e66f17ff7177 ("mm/hugetlb: take page table lock
>>>> in follow_huge_pmd()") for only PMD based huge pages.
>>>>
>>>> After implementing similar logic, functions like follow_huge_(pud|pgd)
>>>> are now safe from the above-mentioned race conditions and can also
>>>> support FOLL_GET. The generic version of 'follow_huge_addr' has been
>>>> left as it is; it is up to the architecture to decide on it.
>>>>
>>>> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
>>>> ---
>>>>  include/linux/mm.h | 33 +++++++++++++++++++++++++++
>>>>  mm/hugetlb.c       | 67 ++++++++++++++++++++++++++++++++++++++++++++++--------
>>>>  2 files changed, 91 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>> index ffcff53..734182a 100644
>>>> --- a/include/linux/mm.h
>>>> +++ b/include/linux/mm.h
>>>> @@ -1751,6 +1751,19 @@ static inline void pgtable_page_dtor(struct page *page)
>>>>  		NULL: pte_offset_kernel(pmd, address))
>>>>  
>>>>  #if USE_SPLIT_PMD_PTLOCKS
>> Do we still use USE_SPLIT_PMD_PTLOCKS? I think it is good enough. With
>> PGDs we are likely to use the same locks, so the split nature may not
>> really be split.
>>
> 
> Sorry Balbir, I did not get what you asked. Can you please elaborate on
> this?
> 

What I meant is: do we need SPLIT_PUD_PTLOCKS, for example? I don't think
we do.

Balbir

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 05/10] powerpc/hugetlb: Split the function 'huge_pte_alloc'
  2016-04-07  5:37 ` [PATCH 05/10] powerpc/hugetlb: Split the function 'huge_pte_alloc' Anshuman Khandual
@ 2016-04-11 13:51   ` Balbir Singh
  2016-04-13 11:08     ` Anshuman Khandual
  0 siblings, 1 reply; 27+ messages in thread
From: Balbir Singh @ 2016-04-11 13:51 UTC (permalink / raw)
  To: Anshuman Khandual, linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, kirill, n-horiguchi, akpm, mgorman, dave.hansen,
	aneesh.kumar, mpe



On 07/04/16 15:37, Anshuman Khandual wrote:
> Currently the function 'huge_pte_alloc' has got two versions, one for the
> BOOK3S server and the other one for the BOOK3E embedded platforms. This
> change splits only the BOOK3S server version into two parts, one for the
> ARCH_WANT_GENERAL_HUGETLB config implementation and the other one for
> everything else. This change is one of the prerequisites towards enabling
> ARCH_WANT_GENERAL_HUGETLB config option on POWER platform.
> 
> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> ---
>  arch/powerpc/mm/hugetlbpage.c | 67 +++++++++++++++++++++++++++----------------
>  1 file changed, 43 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index d991b9e..e453918 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -59,6 +59,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
>  	return __find_linux_pte_or_hugepte(mm->pgd, addr, NULL, NULL);
>  }
>  
> +#ifndef CONFIG_ARCH_WANT_GENERAL_HUGETLB
>  static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
>  			   unsigned long address, unsigned pdshift, unsigned pshift)
>  {
> @@ -116,6 +117,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
>  	spin_unlock(&mm->page_table_lock);
>  	return 0;
>  }
> +#endif /* !CONFIG_ARCH_WANT_GENERAL_HUGETLB */
>  
>  /*
>   * These macros define how to determine which level of the page table holds
> @@ -130,6 +132,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
>  #endif
>  
>  #ifdef CONFIG_PPC_BOOK3S_64
> +#ifndef CONFIG_ARCH_WANT_GENERAL_HUGETLB
>  /*
>   * At this point we do the placement change only for BOOK3S 64. This would
>   * possibly work on other subarchs.
> @@ -145,32 +148,23 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
>  
>  	addr &= ~(sz-1);
>  	pg = pgd_offset(mm, addr);
> -
> -	if (pshift == PGDIR_SHIFT)
> -		/* 16GB huge page */
> -		return (pte_t *) pg;
> -	else if (pshift > PUD_SHIFT)
> -		/*
> -		 * We need to use hugepd table
> -		 */
> +	if (pshift > PUD_SHIFT) {
>  		hpdp = (hugepd_t *)pg;
> -	else {
> -		pdshift = PUD_SHIFT;
> -		pu = pud_alloc(mm, pg, addr);
> -		if (pshift == PUD_SHIFT)
> -			return (pte_t *)pu;
> -		else if (pshift > PMD_SHIFT)
> -			hpdp = (hugepd_t *)pu;
> -		else {
> -			pdshift = PMD_SHIFT;
> -			pm = pmd_alloc(mm, pu, addr);
> -			if (pshift == PMD_SHIFT)
> -				/* 16MB hugepage */
> -				return (pte_t *)pm;
> -			else
> -				hpdp = (hugepd_t *)pm;
> -		}
> +		goto hugepd_search;
>  	}
> +
> +	pdshift = PUD_SHIFT;
> +	pu = pud_alloc(mm, pg, addr);
> +	if (pshift > PMD_SHIFT) {
> +		hpdp = (hugepd_t *)pu;
> +		goto hugepd_search;
> +	}
> +
> +	pdshift = PMD_SHIFT;
> +	pm = pmd_alloc(mm, pu, addr);
> +	hpdp = (hugepd_t *)pm;
> +
> +hugepd_search:
>  	if (!hpdp)
>  		return NULL;
>  
> @@ -182,6 +176,31 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
>  	return hugepte_offset(*hpdp, addr, pdshift);
>  }
>  
> +#else /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
> +pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)

This is confusing; aren't we using the one from mm/hugetlb.c?

> +{
> +	pgd_t *pg;
> +	pud_t *pu;
> +	pmd_t *pm;
> +	unsigned pshift = __ffs(sz);
> +
> +	addr &= ~(sz-1);

Am I reading this right? Shouldn't this be addr &= ~(1 << pshift - 1)

> +	pg = pgd_offset(mm, addr);
> +
> +	if (pshift == PGDIR_SHIFT)	/* 16GB Huge Page */
> +		return (pte_t *)pg;
> +
> +	pu = pud_alloc(mm, pg, addr);	/* NA, skipped */
> +	if (pshift == PUD_SHIFT)
> +		return (pte_t *)pu;
> +
> +	pm = pmd_alloc(mm, pu, addr);	/* 16MB Huge Page */
> +	if (pshift == PMD_SHIFT)
> +		return (pte_t *)pm;
> +
> +	return NULL;
> +}
> +#endif /* !CONFIG_ARCH_WANT_GENERAL_HUGETLB */
>  #else
>  
>  pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 01/10] mm/mmap: Replace SHM_HUGE_MASK with MAP_HUGE_MASK inside mmap_pgoff
  2016-04-07  5:37 ` [PATCH 01/10] mm/mmap: Replace SHM_HUGE_MASK with MAP_HUGE_MASK inside mmap_pgoff Anshuman Khandual
  2016-04-07  8:28   ` Balbir Singh
@ 2016-04-13  7:54   ` Michal Hocko
  1 sibling, 0 replies; 27+ messages in thread
From: Michal Hocko @ 2016-04-13  7:54 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-mm, linux-kernel, linuxppc-dev, hughd, kirill, n-horiguchi,
	akpm, mgorman, dave.hansen, aneesh.kumar, mpe

On Thu 07-04-16 11:07:35, Anshuman Khandual wrote:
> The commit 091d0d55b286 ("shm: fix null pointer deref when userspace
> specifies invalid hugepage size") had replaced MAP_HUGE_MASK with
> SHM_HUGE_MASK. Though both of them contain the same numeric value of
> 0x3f, the MAP_HUGE_MASK flag sounds more appropriate than the other one
> in this context. Hence change it back.

Yes, SHM_HUGE_MASK mixing with MAP_HUGE_SHIFT is not only misleading,
it might also bite us later should either of the two change.

> 
> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/mmap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index bd2e1a53..7d730a4 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1315,7 +1315,7 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
>  		struct user_struct *user = NULL;
>  		struct hstate *hs;
>  
> -		hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & SHM_HUGE_MASK);
> +		hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
>  		if (!hs)
>  			return -EINVAL;
>  
> -- 
> 2.1.0
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 05/10] powerpc/hugetlb: Split the function 'huge_pte_alloc'
  2016-04-11 13:51   ` Balbir Singh
@ 2016-04-13 11:08     ` Anshuman Khandual
  0 siblings, 0 replies; 27+ messages in thread
From: Anshuman Khandual @ 2016-04-13 11:08 UTC (permalink / raw)
  To: Balbir Singh, linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, dave.hansen, aneesh.kumar, kirill, n-horiguchi, mgorman, akpm

On 04/11/2016 07:21 PM, Balbir Singh wrote:
> 
> 
> On 07/04/16 15:37, Anshuman Khandual wrote:
>> Currently the function 'huge_pte_alloc' has got two versions, one for the
>> BOOK3S server and the other one for the BOOK3E embedded platforms. This
>> change splits only the BOOK3S server version into two parts, one for the
>> ARCH_WANT_GENERAL_HUGETLB config implementation and the other one for
>> everything else. This change is one of the prerequisites towards enabling
>> ARCH_WANT_GENERAL_HUGETLB config option on POWER platform.
>>
>> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/mm/hugetlbpage.c | 67 +++++++++++++++++++++++++++----------------
>>  1 file changed, 43 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
>> index d991b9e..e453918 100644
>> --- a/arch/powerpc/mm/hugetlbpage.c
>> +++ b/arch/powerpc/mm/hugetlbpage.c
>> @@ -59,6 +59,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
>>  	return __find_linux_pte_or_hugepte(mm->pgd, addr, NULL, NULL);
>>  }
>>  
>> +#ifndef CONFIG_ARCH_WANT_GENERAL_HUGETLB
>>  static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
>>  			   unsigned long address, unsigned pdshift, unsigned pshift)
>>  {
>> @@ -116,6 +117,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
>>  	spin_unlock(&mm->page_table_lock);
>>  	return 0;
>>  }
>> +#endif /* !CONFIG_ARCH_WANT_GENERAL_HUGETLB */
>>  
>>  /*
>>   * These macros define how to determine which level of the page table holds
>> @@ -130,6 +132,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
>>  #endif
>>  
>>  #ifdef CONFIG_PPC_BOOK3S_64
>> +#ifndef CONFIG_ARCH_WANT_GENERAL_HUGETLB
>>  /*
>>   * At this point we do the placement change only for BOOK3S 64. This would
>>   * possibly work on other subarchs.
>> @@ -145,32 +148,23 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
>>  
>>  	addr &= ~(sz-1);
>>  	pg = pgd_offset(mm, addr);
>> -
>> -	if (pshift == PGDIR_SHIFT)
>> -		/* 16GB huge page */
>> -		return (pte_t *) pg;
>> -	else if (pshift > PUD_SHIFT)
>> -		/*
>> -		 * We need to use hugepd table
>> -		 */
>> +	if (pshift > PUD_SHIFT) {
>>  		hpdp = (hugepd_t *)pg;
>> -	else {
>> -		pdshift = PUD_SHIFT;
>> -		pu = pud_alloc(mm, pg, addr);
>> -		if (pshift == PUD_SHIFT)
>> -			return (pte_t *)pu;
>> -		else if (pshift > PMD_SHIFT)
>> -			hpdp = (hugepd_t *)pu;
>> -		else {
>> -			pdshift = PMD_SHIFT;
>> -			pm = pmd_alloc(mm, pu, addr);
>> -			if (pshift == PMD_SHIFT)
>> -				/* 16MB hugepage */
>> -				return (pte_t *)pm;
>> -			else
>> -				hpdp = (hugepd_t *)pm;
>> -		}
>> +		goto hugepd_search;
>>  	}
>> +
>> +	pdshift = PUD_SHIFT;
>> +	pu = pud_alloc(mm, pg, addr);
>> +	if (pshift > PMD_SHIFT) {
>> +		hpdp = (hugepd_t *)pu;
>> +		goto hugepd_search;
>> +	}
>> +
>> +	pdshift = PMD_SHIFT;
>> +	pm = pmd_alloc(mm, pu, addr);
>> +	hpdp = (hugepd_t *)pm;
>> +
>> +hugepd_search:
>>  	if (!hpdp)
>>  		return NULL;
>>  
>> @@ -182,6 +176,31 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz
>>  	return hugepte_offset(*hpdp, addr, pdshift);
>>  }
>>  
>> +#else /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
>> +pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
> 
> This is confusing; aren't we using the one from mm/hugetlb.c?

We are using huge_pte_alloc() from mm/hugetlb.c only when we have
CONFIG_ARCH_WANT_GENERAL_HUGETLB enabled. For everything else we
use the definition here for BOOK3S platforms.

> 
>> +{
>> +	pgd_t *pg;
>> +	pud_t *pu;
>> +	pmd_t *pm;
>> +	unsigned pshift = __ffs(sz);
>> +
>> +	addr &= ~(sz-1);
> 
> Am I reading this right? Shouldn't this be addr &= ~(1 << pshift - 1)

Both are the same. __ffs() computes the __ilog2() of the size and arrives
at the page shift; here we use the size directly instead.
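
A quick standalone check of that equivalence (the 16MB size is just an
example; any power-of-two size behaves the same way):

	#include <stdio.h>

	int main(void)
	{
		unsigned long sz = 1UL << 24;		/* 16MB, power of two */
		unsigned int pshift = __builtin_ctzl(sz); /* what __ffs(sz) gives */

		/* both masks clear the same low bits, so both align addr alike */
		printf("~(sz - 1)            = %#lx\n", ~(sz - 1));
		printf("~((1UL << pshift)-1) = %#lx\n", ~((1UL << pshift) - 1));
		return 0;
	}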

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race
  2016-04-11  6:04     ` Anshuman Khandual
@ 2016-04-18  8:42       ` Anshuman Khandual
  0 siblings, 0 replies; 27+ messages in thread
From: Anshuman Khandual @ 2016-04-18  8:42 UTC (permalink / raw)
  To: kbuild test robot
  Cc: linux-mm, linuxppc-dev, hughd, linux-kernel, dave.hansen,
	kbuild-all, kirill, n-horiguchi, mgorman, akpm, aneesh.kumar

On 04/11/2016 11:34 AM, Anshuman Khandual wrote:
> On 04/07/2016 03:04 PM, kbuild test robot wrote:
>> > All errors (new ones prefixed by >>):
>> > 
>> >    mm/hugetlb.c: In function 'follow_huge_pud':
>>>>>> >>> >> mm/hugetlb.c:4360:3: error: implicit declaration of function 'pud_page' [-Werror=implicit-function-declaration]
>> >       page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
>> >       ^
>> >    mm/hugetlb.c:4360:8: warning: assignment makes pointer from integer without a cast
>> >       page = pud_page(*pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
>> >            ^
>> >    mm/hugetlb.c: In function 'follow_huge_pgd':
>> >    mm/hugetlb.c:4395:3: error: implicit declaration of function 'pgd_page' [-Werror=implicit-function-declaration]
>> >       page = pgd_page(*pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
> Both the build errors here are because pud_page() and pgd_page() are not
> available for some platforms and config options. It got missed as I ran
> only powerpc config options for build testing. My bad, will fix it.

The following change seems to fix the build problem on S390 but will
require some inputs from S390 maintainers regarding the functional
correctness of the patch.

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 2f66645..834a8a6 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -963,6 +963,8 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
 #define pte_page(x) pfn_to_page(pte_pfn(x))
 
 #define pmd_page(pmd) pfn_to_page(pmd_pfn(pmd))
+#define pud_page(pud) pud_val(pud)
+#define pgd_page(pgd) pgd_val(pgd)

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race
  2016-04-07  9:16   ` kbuild test robot
@ 2016-04-18  8:44     ` Anshuman Khandual
  0 siblings, 0 replies; 27+ messages in thread
From: Anshuman Khandual @ 2016-04-18  8:44 UTC (permalink / raw)
  To: kbuild test robot
  Cc: dave.hansen, mgorman, hughd, linux-kernel, linux-mm, kbuild-all,
	kirill, n-horiguchi, linuxppc-dev, akpm, aneesh.kumar

On 04/07/2016 02:46 PM, kbuild test robot wrote:
> Hi Anshuman,
> 
> [auto build test ERROR on powerpc/next]
> [also build test ERROR on v4.6-rc2 next-20160407]
> [if your patch is applied to the wrong git tree, please drop us a note to help improving the system]
> 
> url:    https://github.com/0day-ci/linux/commits/Anshuman-Khandual/Enable-HugeTLB-page-migration-on-POWER/20160407-165841
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
> config: sparc64-allyesconfig (attached as .config)
> reproduce:
>         wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=sparc64 
> 
> All error/warnings (new ones prefixed by >>):
> 
>    mm/hugetlb.c: In function 'follow_huge_pgd':
>>> >> mm/hugetlb.c:4395:3: error: implicit declaration of function 'pgd_page' [-Werror=implicit-function-declaration]
>       page = pgd_page(*pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
>       ^


The following change seems to fix the build problem on SPARC but will
require some inputs from SPARC maintainers regarding the functional
correctness of the patch.

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index f089cfa..7b7e6a0 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -804,6 +804,7 @@ static inline unsigned long __pmd_page(pmd_t pmd)
 #define pmd_clear(pmdp)                        (pmd_val(*(pmdp)) = 0UL)
 #define pud_present(pud)               (pud_val(pud) != 0U)
 #define pud_clear(pudp)                        (pud_val(*(pudp)) = 0UL)
+#define pgd_page(pgd)                  (pgd_val(pgd))
 #define pgd_page_vaddr(pgd)            \
        ((unsigned long) __va(pgd_val(pgd)))
 #define pgd_present(pgd)               (pgd_val(pgd) != 0U)

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 00/10] Enable HugeTLB page migration on POWER
  2016-04-07  5:37 [PATCH 00/10] Enable HugeTLB page migration on POWER Anshuman Khandual
                   ` (9 preceding siblings ...)
  2016-04-07  5:37 ` [PATCH 10/10] selfttest/powerpc: Add memory page migration tests Anshuman Khandual
@ 2016-04-18  8:52 ` Anshuman Khandual
  10 siblings, 0 replies; 27+ messages in thread
From: Anshuman Khandual @ 2016-04-18  8:52 UTC (permalink / raw)
  To: linux-mm, linux-kernel, linuxppc-dev
  Cc: hughd, dave.hansen, aneesh.kumar, kirill, n-horiguchi, mgorman, akpm

On 04/07/2016 11:07 AM, Anshuman Khandual wrote:
> This patch series enables HugeTLB page migration on POWER platform.
> This series has some core VM changes (patch 1, 2, 3) and some powerpc
> specific changes (patch 4, 5, 6, 7, 8, 9, 10). Comments, suggestions
> and inputs are welcome.
> 
> Anshuman Khandual (10):
>   mm/mmap: Replace SHM_HUGE_MASK with MAP_HUGE_MASK inside mmap_pgoff
>   mm/hugetlb: Add PGD based implementation awareness
>   mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race

Hugh/Mel/Naoya/Andrew,

Andrew had already reviewed the changes in the first two patches during
the RFC phase and was okay with them. Could you please review the third
patch here as well and let me know your inputs/suggestions? Currently
the third patch has build failures on the SPARC and S390 platforms
(details of which are in this thread, along with possible fixes). Thank you.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-04-18  8:52 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-07  5:37 [PATCH 00/10] Enable HugeTLB page migration on POWER Anshuman Khandual
2016-04-07  5:37 ` [PATCH 01/10] mm/mmap: Replace SHM_HUGE_MASK with MAP_HUGE_MASK inside mmap_pgoff Anshuman Khandual
2016-04-07  8:28   ` Balbir Singh
2016-04-13  7:54   ` Michal Hocko
2016-04-07  5:37 ` [PATCH 02/10] mm/hugetlb: Add PGD based implementation awareness Anshuman Khandual
2016-04-07  9:04   ` Balbir Singh
2016-04-11  5:25     ` Anshuman Khandual
2016-04-11  6:10       ` Anshuman Khandual
2016-04-07  5:37 ` [PATCH 03/10] mm/hugetlb: Protect follow_huge_(pud|pgd) functions from race Anshuman Khandual
2016-04-07  9:16   ` kbuild test robot
2016-04-18  8:44     ` Anshuman Khandual
2016-04-07  9:26   ` Balbir Singh
2016-04-11  5:39     ` Anshuman Khandual
2016-04-11 12:46       ` Balbir Singh
2016-04-07  9:34   ` kbuild test robot
2016-04-11  6:04     ` Anshuman Khandual
2016-04-18  8:42       ` Anshuman Khandual
2016-04-07  5:37 ` [PATCH 04/10] powerpc/hugetlb: Add ABI defines for MAP_HUGE_16MB and MAP_HUGE_16GB Anshuman Khandual
2016-04-07  5:37 ` [PATCH 05/10] powerpc/hugetlb: Split the function 'huge_pte_alloc' Anshuman Khandual
2016-04-11 13:51   ` Balbir Singh
2016-04-13 11:08     ` Anshuman Khandual
2016-04-07  5:37 ` [PATCH 06/10] powerpc/hugetlb: Split the function 'huge_pte_offset' Anshuman Khandual
2016-04-07  5:37 ` [PATCH 07/10] powerpc/hugetlb: Prepare arch functions for ARCH_WANT_GENERAL_HUGETLB Anshuman Khandual
2016-04-07  5:37 ` [PATCH 08/10] powerpc/hugetlb: Selectively enable ARCH_WANT_GENERAL_HUGETLB Anshuman Khandual
2016-04-07  5:37 ` [PATCH 09/10] powerpc/hugetlb: Selectively enable ARCH_ENABLE_HUGEPAGE_MIGRATION Anshuman Khandual
2016-04-07  5:37 ` [PATCH 10/10] selfttest/powerpc: Add memory page migration tests Anshuman Khandual
2016-04-18  8:52 ` [PATCH 00/10] Enable HugeTLB page migration on POWER Anshuman Khandual

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).