All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] Add hstate parameter to huge_pte_offset()
@ 2017-03-30 16:38 ` Punit Agrawal
  0 siblings, 0 replies; 24+ messages in thread
From: Punit Agrawal @ 2017-03-30 16:38 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, akpm
  Cc: Punit Agrawal, linux-mm, linux-arm-kernel, linux-kernel, tbaicar,
	kirill.shutemov, mike.kravetz

On architectures that support hugepages composed of contiguous pte(s)
as well as block entries at the same level in the page table,
huge_pte_offset() is not able to determine the correct offset to
return when it encounters a swap entry (which is used to mark poisoned
as well as migrated pages in the page table).

huge_pte_offset() needs to know the size of the hugepage at the
requested address to determine the offset to return - the current
entry or the first entry of a set of contiguous hugepages. This came
up while enabling support for memory failure handling on arm64 (Patch
3-4 add this support and are included here for completeness).

Patch 1 adds a hstate parameter to huge_pte_offset() to provide
additional information about the target address. It also updates the
signatures (and usage) of huge_pte_offset() for architectures that
override the generic implementation.

Patch 2 uses the size determined by the parameter added in Patch 1, to
return the correct page table offset in the arm64 implementation of
huge_pte_offset().

The patchset is based on top of v4.11-rc4 and the arm64 huge page
cleanup for break-before-make[0]. Previous posting can be found at
[1].

Changes RFC -> v1

* Fixed a missing conversion of huge_pte_offset() prototype to add
  hstate parameter. Reported by 0-day.

[0] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-March/497027.html
[1] https://lkml.org/lkml/2017/3/23/293


Jonathan (Zhixiong) Zhang (2):
  arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
  arm64: kconfig: allow support for memory failure handling

Punit Agrawal (2):
  mm/hugetlb.c: add hstate parameter to huge_pte_offset()
  arm64: hugetlbpages: Correctly handle swap entries in
    huge_pte_offset()

 arch/arm64/Kconfig            |  1 +
 arch/arm64/mm/fault.c         | 22 +++++++++++++++++++---
 arch/arm64/mm/hugetlbpage.c   | 34 ++++++++++++++++++----------------
 arch/ia64/mm/hugetlbpage.c    |  4 ++--
 arch/metag/mm/hugetlbpage.c   |  3 ++-
 arch/mips/mm/hugetlbpage.c    |  3 ++-
 arch/parisc/mm/hugetlbpage.c  |  3 ++-
 arch/powerpc/mm/hugetlbpage.c |  2 +-
 arch/s390/mm/hugetlbpage.c    |  3 ++-
 arch/sh/mm/hugetlbpage.c      |  3 ++-
 arch/sparc/mm/hugetlbpage.c   |  3 ++-
 arch/tile/mm/hugetlbpage.c    |  3 ++-
 arch/x86/mm/hugetlbpage.c     |  2 +-
 drivers/acpi/apei/Kconfig     |  1 +
 fs/userfaultfd.c              |  7 +++++--
 include/linux/hugetlb.h       |  5 +++--
 mm/hugetlb.c                  | 21 ++++++++++++---------
 mm/page_vma_mapped.c          |  3 ++-
 mm/pagewalk.c                 |  2 +-
 19 files changed, 80 insertions(+), 45 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 0/4] Add hstate parameter to huge_pte_offset()
@ 2017-03-30 16:38 ` Punit Agrawal
  0 siblings, 0 replies; 24+ messages in thread
From: Punit Agrawal @ 2017-03-30 16:38 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, akpm
  Cc: Punit Agrawal, linux-mm, linux-arm-kernel, linux-kernel, tbaicar,
	kirill.shutemov, mike.kravetz

On architectures that support hugepages composed of contiguous pte(s)
as well as block entries at the same level in the page table,
huge_pte_offset() is not able to determine the correct offset to
return when it encounters a swap entry (which is used to mark poisoned
as well as migrated pages in the page table).

huge_pte_offset() needs to know the size of the hugepage at the
requested address to determine the offset to return - the current
entry or the first entry of a set of contiguous hugepages. This came
up while enabling support for memory failure handling on arm64 (Patch
3-4 add this support and are included here for completeness).

Patch 1 adds a hstate parameter to huge_pte_offset() to provide
additional information about the target address. It also updates the
signatures (and usage) of huge_pte_offset() for architectures that
override the generic implementation.

Patch 2 uses the size determined by the parameter added in Patch 1, to
return the correct page table offset in the arm64 implementation of
huge_pte_offset().

The patchset is based on top of v4.11-rc4 and the arm64 huge page
cleanup for break-before-make[0]. Previous posting can be found at
[1].

Changes RFC -> v1

* Fixed a missing conversion of huge_pte_offset() prototype to add
  hstate parameter. Reported by 0-day.

[0] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-March/497027.html
[1] https://lkml.org/lkml/2017/3/23/293


Jonathan (Zhixiong) Zhang (2):
  arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
  arm64: kconfig: allow support for memory failure handling

Punit Agrawal (2):
  mm/hugetlb.c: add hstate parameter to huge_pte_offset()
  arm64: hugetlbpages: Correctly handle swap entries in
    huge_pte_offset()

 arch/arm64/Kconfig            |  1 +
 arch/arm64/mm/fault.c         | 22 +++++++++++++++++++---
 arch/arm64/mm/hugetlbpage.c   | 34 ++++++++++++++++++----------------
 arch/ia64/mm/hugetlbpage.c    |  4 ++--
 arch/metag/mm/hugetlbpage.c   |  3 ++-
 arch/mips/mm/hugetlbpage.c    |  3 ++-
 arch/parisc/mm/hugetlbpage.c  |  3 ++-
 arch/powerpc/mm/hugetlbpage.c |  2 +-
 arch/s390/mm/hugetlbpage.c    |  3 ++-
 arch/sh/mm/hugetlbpage.c      |  3 ++-
 arch/sparc/mm/hugetlbpage.c   |  3 ++-
 arch/tile/mm/hugetlbpage.c    |  3 ++-
 arch/x86/mm/hugetlbpage.c     |  2 +-
 drivers/acpi/apei/Kconfig     |  1 +
 fs/userfaultfd.c              |  7 +++++--
 include/linux/hugetlb.h       |  5 +++--
 mm/hugetlb.c                  | 21 ++++++++++++---------
 mm/page_vma_mapped.c          |  3 ++-
 mm/pagewalk.c                 |  2 +-
 19 files changed, 80 insertions(+), 45 deletions(-)

-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 0/4] Add hstate parameter to huge_pte_offset()
@ 2017-03-30 16:38 ` Punit Agrawal
  0 siblings, 0 replies; 24+ messages in thread
From: Punit Agrawal @ 2017-03-30 16:38 UTC (permalink / raw)
  To: linux-arm-kernel

On architectures that support hugepages composed of contiguous pte(s)
as well as block entries at the same level in the page table,
huge_pte_offset() is not able to determine the correct offset to
return when it encounters a swap entry (which is used to mark poisoned
as well as migrated pages in the page table).

huge_pte_offset() needs to know the size of the hugepage at the
requested address to determine the offset to return - the current
entry or the first entry of a set of contiguous hugepages. This came
up while enabling support for memory failure handling on arm64 (Patch
3-4 add this support and are included here for completeness).

Patch 1 adds a hstate parameter to huge_pte_offset() to provide
additional information about the target address. It also updates the
signatures (and usage) of huge_pte_offset() for architectures that
override the generic implementation.

Patch 2 uses the size determined by the parameter added in Patch 1, to
return the correct page table offset in the arm64 implementation of
huge_pte_offset().

The patchset is based on top of v4.11-rc4 and the arm64 huge page
cleanup for break-before-make[0]. Previous posting can be found at
[1].

Changes RFC -> v1

* Fixed a missing conversion of huge_pte_offset() prototype to add
  hstate parameter. Reported by 0-day.

[0] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-March/497027.html
[1] https://lkml.org/lkml/2017/3/23/293


Jonathan (Zhixiong) Zhang (2):
  arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
  arm64: kconfig: allow support for memory failure handling

Punit Agrawal (2):
  mm/hugetlb.c: add hstate parameter to huge_pte_offset()
  arm64: hugetlbpages: Correctly handle swap entries in
    huge_pte_offset()

 arch/arm64/Kconfig            |  1 +
 arch/arm64/mm/fault.c         | 22 +++++++++++++++++++---
 arch/arm64/mm/hugetlbpage.c   | 34 ++++++++++++++++++----------------
 arch/ia64/mm/hugetlbpage.c    |  4 ++--
 arch/metag/mm/hugetlbpage.c   |  3 ++-
 arch/mips/mm/hugetlbpage.c    |  3 ++-
 arch/parisc/mm/hugetlbpage.c  |  3 ++-
 arch/powerpc/mm/hugetlbpage.c |  2 +-
 arch/s390/mm/hugetlbpage.c    |  3 ++-
 arch/sh/mm/hugetlbpage.c      |  3 ++-
 arch/sparc/mm/hugetlbpage.c   |  3 ++-
 arch/tile/mm/hugetlbpage.c    |  3 ++-
 arch/x86/mm/hugetlbpage.c     |  2 +-
 drivers/acpi/apei/Kconfig     |  1 +
 fs/userfaultfd.c              |  7 +++++--
 include/linux/hugetlb.h       |  5 +++--
 mm/hugetlb.c                  | 21 ++++++++++++---------
 mm/page_vma_mapped.c          |  3 ++-
 mm/pagewalk.c                 |  2 +-
 19 files changed, 80 insertions(+), 45 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/4] mm/hugetlb.c: add hstate parameter to huge_pte_offset()
  2017-03-30 16:38 ` Punit Agrawal
  (?)
@ 2017-03-30 16:38   ` Punit Agrawal
  -1 siblings, 0 replies; 24+ messages in thread
From: Punit Agrawal @ 2017-03-30 16:38 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, akpm
  Cc: Punit Agrawal, linux-mm, linux-arm-kernel, linux-kernel, tbaicar,
	kirill.shutemov, mike.kravetz, Tony Luck, Fenghua Yu,
	James Hogan, Ralf Baechle, James E.J. Bottomley, Helge Deller,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Martin Schwidefsky, Heiko Carstens, Yoshinori Sato, Rich Felker,
	David S. Miller, Chris Metcalf, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Alexander Viro, Steve Capper, Michal Hocko,
	Naoya Horiguchi, Aneesh Kumar K.V, Hillf Danton

A poisoned or migrated hugepage is stored as a swap entry in the page
tables. On architectures that support hugepages consisting of contiguous
page table entries (such as on arm64) this leads to ambiguity in
determining the page table entry to return in huge_pte_offset() when a
poisoned entry is encountered.

Let's remove the ambiguity by adding a hstate parameter to convey
additional information about the requested address. Also fixup the
definition/usage of huge_pte_offset() throughout the tree.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: James Hogan <james.hogan@imgtec.com> (odd fixer:METAG ARCHITECTURE)
Cc: Ralf Baechle <ralf@linux-mips.org> (supporter:MIPS)
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
---
 arch/arm64/mm/hugetlbpage.c   |  3 ++-
 arch/ia64/mm/hugetlbpage.c    |  4 ++--
 arch/metag/mm/hugetlbpage.c   |  3 ++-
 arch/mips/mm/hugetlbpage.c    |  3 ++-
 arch/parisc/mm/hugetlbpage.c  |  3 ++-
 arch/powerpc/mm/hugetlbpage.c |  2 +-
 arch/s390/mm/hugetlbpage.c    |  3 ++-
 arch/sh/mm/hugetlbpage.c      |  3 ++-
 arch/sparc/mm/hugetlbpage.c   |  3 ++-
 arch/tile/mm/hugetlbpage.c    |  3 ++-
 arch/x86/mm/hugetlbpage.c     |  2 +-
 fs/userfaultfd.c              |  7 +++++--
 include/linux/hugetlb.h       |  5 +++--
 mm/hugetlb.c                  | 21 ++++++++++++---------
 mm/page_vma_mapped.c          |  3 ++-
 mm/pagewalk.c                 |  2 +-
 16 files changed, 43 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index e2106932daa0..9ca742c4c1ab 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -189,7 +189,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
index 85de86d36fdf..09c865be3cfe 100644
--- a/arch/ia64/mm/hugetlbpage.c
+++ b/arch/ia64/mm/hugetlbpage.c
@@ -44,7 +44,7 @@ huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
 }
 
 pte_t *
-huge_pte_offset (struct mm_struct *mm, unsigned long addr)
+huge_pte_offset (struct mm_struct *mm, unsigned long addr, struct hstate *h)
 {
 	unsigned long taddr = htlbpage_to_page(addr);
 	pgd_t *pgd;
@@ -92,7 +92,7 @@ struct page *follow_huge_addr(struct mm_struct *mm, unsigned long addr, int writ
 	if (REGION_NUMBER(addr) != RGN_HPAGE)
 		return ERR_PTR(-EINVAL);
 
-	ptep = huge_pte_offset(mm, addr);
+	ptep = huge_pte_offset(mm, addr, size_to_hstate(HPAGE_SIZE));
 	if (!ptep || pte_none(*ptep))
 		return NULL;
 	page = pte_page(*ptep);
diff --git a/arch/metag/mm/hugetlbpage.c b/arch/metag/mm/hugetlbpage.c
index db1b7da91e4f..f3778c9b219d 100644
--- a/arch/metag/mm/hugetlbpage.c
+++ b/arch/metag/mm/hugetlbpage.c
@@ -74,7 +74,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/mips/mm/hugetlbpage.c b/arch/mips/mm/hugetlbpage.c
index 74aa6f62468f..f0f32c13a511 100644
--- a/arch/mips/mm/hugetlbpage.c
+++ b/arch/mips/mm/hugetlbpage.c
@@ -36,7 +36,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
+		       struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/parisc/mm/hugetlbpage.c b/arch/parisc/mm/hugetlbpage.c
index aa50ac090e9b..ff05ba5f66ac 100644
--- a/arch/parisc/mm/hugetlbpage.c
+++ b/arch/parisc/mm/hugetlbpage.c
@@ -69,7 +69,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 8c3389cbcd12..9fddb22c60d9 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -55,7 +55,7 @@ static unsigned nr_gpages;
 
 #define hugepd_none(hpd)	(hpd_val(hpd) == 0)
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, struct hstate *h)
 {
 	/* Only called for hugetlbfs pages, hence can ignore THP */
 	return __find_linux_pte_or_hugepte(mm->pgd, addr, NULL, NULL);
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index 9b4050caa4e9..7fe5532887de 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -176,7 +176,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return (pte_t *) pmdp;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgdp;
 	pud_t *pudp;
diff --git a/arch/sh/mm/hugetlbpage.c b/arch/sh/mm/hugetlbpage.c
index cc948db74878..53781fdc222c 100644
--- a/arch/sh/mm/hugetlbpage.c
+++ b/arch/sh/mm/hugetlbpage.c
@@ -42,7 +42,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index 323bc6b6e3ad..5b292864e7d1 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -270,7 +270,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/tile/mm/hugetlbpage.c b/arch/tile/mm/hugetlbpage.c
index cb10153b5c9f..58d1f11830e3 100644
--- a/arch/tile/mm/hugetlbpage.c
+++ b/arch/tile/mm/hugetlbpage.c
@@ -102,7 +102,8 @@ static pte_t *get_pte(pte_t *base, int index, int level)
 	return ptep;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index c5066a260803..49d469fd4f07 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -31,7 +31,7 @@ follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
 	if (!vma || !is_vm_hugetlb_page(vma))
 		return ERR_PTR(-EINVAL);
 
-	pte = huge_pte_offset(mm, address);
+	pte = huge_pte_offset(mm, address, hstate_vma(vma));
 
 	/* hugetlb should be locked, and hence, prefaulted */
 	WARN_ON(!pte || pte_none(*pte));
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 1d227b0fcf49..dabbf6e408d1 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -214,6 +214,7 @@ static inline struct uffd_msg userfault_msg(unsigned long address,
  * hugepmd ranges.
  */
 static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
+					 struct vm_area_struct *vma,
 					 unsigned long address,
 					 unsigned long flags,
 					 unsigned long reason)
@@ -224,7 +225,7 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 
 	VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
 
-	pte = huge_pte_offset(mm, address);
+	pte = huge_pte_offset(mm, address, hstate_vma(vma));
 	if (!pte)
 		goto out;
 
@@ -243,6 +244,7 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 }
 #else
 static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
+					 struct vm_area_struct *vma,
 					 unsigned long address,
 					 unsigned long flags,
 					 unsigned long reason)
@@ -435,7 +437,8 @@ int handle_userfault(struct vm_fault *vmf, unsigned long reason)
 		must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags,
 						  reason);
 	else
-		must_wait = userfaultfd_huge_must_wait(ctx, vmf->address,
+		must_wait = userfaultfd_huge_must_wait(ctx, vmf->vma,
+						       vmf->address,
 						       vmf->flags, reason);
 	up_read(&mm->mmap_sem);
 
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index b857fc8cc2ec..c7f80729a1f9 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -113,7 +113,8 @@ extern struct list_head huge_boot_pages;
 
 pte_t *huge_pte_alloc(struct mm_struct *mm,
 			unsigned long addr, unsigned long sz);
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr);
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h);
 int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep);
 struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
 			      int write);
@@ -157,7 +158,7 @@ static inline void hugetlb_show_meminfo(void)
 #define hugetlb_fault(mm, vma, addr, flags)	({ BUG(); 0; })
 #define hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, dst_addr, \
 				src_addr, pagep)	({ BUG(); 0; })
-#define huge_pte_offset(mm, address)	0
+#define huge_pte_offset(mm, address, h)	0
 static inline int dequeue_hwpoisoned_huge_page(struct page *page)
 {
 	return 0;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3d0aab9ee80d..24e75982a638 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3233,7 +3233,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 
 	for (addr = vma->vm_start; addr < vma->vm_end; addr += sz) {
 		spinlock_t *src_ptl, *dst_ptl;
-		src_pte = huge_pte_offset(src, addr);
+		src_pte = huge_pte_offset(src, addr, h);
 		if (!src_pte)
 			continue;
 		dst_pte = huge_pte_alloc(dst, addr, sz);
@@ -3317,7 +3317,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
 	address = start;
 	for (; address < end; address += sz) {
-		ptep = huge_pte_offset(mm, address);
+		ptep = huge_pte_offset(mm, address, h);
 		if (!ptep)
 			continue;
 
@@ -3535,7 +3535,8 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 			unmap_ref_private(mm, vma, old_page, address);
 			BUG_ON(huge_pte_none(pte));
 			spin_lock(ptl);
-			ptep = huge_pte_offset(mm, address & huge_page_mask(h));
+			ptep = huge_pte_offset(mm, address & huge_page_mask(h),
+					       h);
 			if (likely(ptep &&
 				   pte_same(huge_ptep_get(ptep), pte)))
 				goto retry_avoidcopy;
@@ -3574,7 +3575,7 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * before the page tables are altered
 	 */
 	spin_lock(ptl);
-	ptep = huge_pte_offset(mm, address & huge_page_mask(h));
+	ptep = huge_pte_offset(mm, address & huge_page_mask(h), h);
 	if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) {
 		ClearPagePrivate(new_page);
 
@@ -3861,7 +3862,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	address &= huge_page_mask(h);
 
-	ptep = huge_pte_offset(mm, address);
+	ptep = huge_pte_offset(mm, address, h);
 	if (ptep) {
 		entry = huge_ptep_get(ptep);
 		if (unlikely(is_hugetlb_entry_migration(entry))) {
@@ -4118,7 +4119,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 *
 		 * Note that page table lock is not held when pte is null.
 		 */
-		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
+		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h), h);
 		if (pte)
 			ptl = huge_pte_lock(h, mm, pte);
 		absent = !pte || huge_pte_none(huge_ptep_get(pte));
@@ -4252,7 +4253,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	i_mmap_lock_write(vma->vm_file->f_mapping);
 	for (; address < end; address += huge_page_size(h)) {
 		spinlock_t *ptl;
-		ptep = huge_pte_offset(mm, address);
+		ptep = huge_pte_offset(mm, address, h);
 		if (!ptep)
 			continue;
 		ptl = huge_pte_lock(h, mm, ptep);
@@ -4514,7 +4515,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 
 		saddr = page_table_shareable(svma, vma, addr, idx);
 		if (saddr) {
-			spte = huge_pte_offset(svma->vm_mm, saddr);
+			spte = huge_pte_offset(svma->vm_mm, saddr,
+					       hstate_vma(svma));
 			if (spte) {
 				get_page(virt_to_page(spte));
 				break;
@@ -4610,7 +4612,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index c4c9def8ffea..2bf529380079 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -120,7 +120,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 
 	if (unlikely(PageHuge(pvmw->page))) {
 		/* when pud is not present, pte will be NULL */
-		pvmw->pte = huge_pte_offset(mm, pvmw->address);
+		pvmw->pte = huge_pte_offset(mm, pvmw->address,
+					    page_hstate(page));
 		if (!pvmw->pte)
 			return false;
 
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 60f7856e508f..8805b68d353c 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -185,7 +185,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
 
 	do {
 		next = hugetlb_entry_end(h, addr, end);
-		pte = huge_pte_offset(walk->mm, addr & hmask);
+		pte = huge_pte_offset(walk->mm, addr & hmask, h);
 		if (pte && walk->hugetlb_entry)
 			err = walk->hugetlb_entry(pte, hmask, addr, next, walk);
 		if (err)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 1/4] mm/hugetlb.c: add hstate parameter to huge_pte_offset()
@ 2017-03-30 16:38   ` Punit Agrawal
  0 siblings, 0 replies; 24+ messages in thread
From: Punit Agrawal @ 2017-03-30 16:38 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, akpm
  Cc: Punit Agrawal, linux-mm, linux-arm-kernel, linux-kernel, tbaicar,
	kirill.shutemov, mike.kravetz, Tony Luck, Fenghua Yu,
	James Hogan, Ralf Baechle, James E.J. Bottomley, Helge Deller,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Martin Schwidefsky, Heiko Carstens, Yoshinori Sato, Rich Felker,
	David S. Miller, Chris Metcalf, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Alexander Viro, Steve Capper, Michal Hocko,
	Naoya Horiguchi, Aneesh Kumar K.V, Hillf Danton

A poisoned or migrated hugepage is stored as a swap entry in the page
tables. On architectures that support hugepages consisting of contiguous
page table entries (such as on arm64) this leads to ambiguity in
determining the page table entry to return in huge_pte_offset() when a
poisoned entry is encountered.

Let's remove the ambiguity by adding a hstate parameter to convey
additional information about the requested address. Also fixup the
definition/usage of huge_pte_offset() throughout the tree.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: James Hogan <james.hogan@imgtec.com> (odd fixer:METAG ARCHITECTURE)
Cc: Ralf Baechle <ralf@linux-mips.org> (supporter:MIPS)
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
---
 arch/arm64/mm/hugetlbpage.c   |  3 ++-
 arch/ia64/mm/hugetlbpage.c    |  4 ++--
 arch/metag/mm/hugetlbpage.c   |  3 ++-
 arch/mips/mm/hugetlbpage.c    |  3 ++-
 arch/parisc/mm/hugetlbpage.c  |  3 ++-
 arch/powerpc/mm/hugetlbpage.c |  2 +-
 arch/s390/mm/hugetlbpage.c    |  3 ++-
 arch/sh/mm/hugetlbpage.c      |  3 ++-
 arch/sparc/mm/hugetlbpage.c   |  3 ++-
 arch/tile/mm/hugetlbpage.c    |  3 ++-
 arch/x86/mm/hugetlbpage.c     |  2 +-
 fs/userfaultfd.c              |  7 +++++--
 include/linux/hugetlb.h       |  5 +++--
 mm/hugetlb.c                  | 21 ++++++++++++---------
 mm/page_vma_mapped.c          |  3 ++-
 mm/pagewalk.c                 |  2 +-
 16 files changed, 43 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index e2106932daa0..9ca742c4c1ab 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -189,7 +189,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
index 85de86d36fdf..09c865be3cfe 100644
--- a/arch/ia64/mm/hugetlbpage.c
+++ b/arch/ia64/mm/hugetlbpage.c
@@ -44,7 +44,7 @@ huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
 }
 
 pte_t *
-huge_pte_offset (struct mm_struct *mm, unsigned long addr)
+huge_pte_offset (struct mm_struct *mm, unsigned long addr, struct hstate *h)
 {
 	unsigned long taddr = htlbpage_to_page(addr);
 	pgd_t *pgd;
@@ -92,7 +92,7 @@ struct page *follow_huge_addr(struct mm_struct *mm, unsigned long addr, int writ
 	if (REGION_NUMBER(addr) != RGN_HPAGE)
 		return ERR_PTR(-EINVAL);
 
-	ptep = huge_pte_offset(mm, addr);
+	ptep = huge_pte_offset(mm, addr, size_to_hstate(HPAGE_SIZE));
 	if (!ptep || pte_none(*ptep))
 		return NULL;
 	page = pte_page(*ptep);
diff --git a/arch/metag/mm/hugetlbpage.c b/arch/metag/mm/hugetlbpage.c
index db1b7da91e4f..f3778c9b219d 100644
--- a/arch/metag/mm/hugetlbpage.c
+++ b/arch/metag/mm/hugetlbpage.c
@@ -74,7 +74,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/mips/mm/hugetlbpage.c b/arch/mips/mm/hugetlbpage.c
index 74aa6f62468f..f0f32c13a511 100644
--- a/arch/mips/mm/hugetlbpage.c
+++ b/arch/mips/mm/hugetlbpage.c
@@ -36,7 +36,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
+		       struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/parisc/mm/hugetlbpage.c b/arch/parisc/mm/hugetlbpage.c
index aa50ac090e9b..ff05ba5f66ac 100644
--- a/arch/parisc/mm/hugetlbpage.c
+++ b/arch/parisc/mm/hugetlbpage.c
@@ -69,7 +69,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 8c3389cbcd12..9fddb22c60d9 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -55,7 +55,7 @@ static unsigned nr_gpages;
 
 #define hugepd_none(hpd)	(hpd_val(hpd) == 0)
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, struct hstate *h)
 {
 	/* Only called for hugetlbfs pages, hence can ignore THP */
 	return __find_linux_pte_or_hugepte(mm->pgd, addr, NULL, NULL);
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index 9b4050caa4e9..7fe5532887de 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -176,7 +176,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return (pte_t *) pmdp;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgdp;
 	pud_t *pudp;
diff --git a/arch/sh/mm/hugetlbpage.c b/arch/sh/mm/hugetlbpage.c
index cc948db74878..53781fdc222c 100644
--- a/arch/sh/mm/hugetlbpage.c
+++ b/arch/sh/mm/hugetlbpage.c
@@ -42,7 +42,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index 323bc6b6e3ad..5b292864e7d1 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -270,7 +270,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/tile/mm/hugetlbpage.c b/arch/tile/mm/hugetlbpage.c
index cb10153b5c9f..58d1f11830e3 100644
--- a/arch/tile/mm/hugetlbpage.c
+++ b/arch/tile/mm/hugetlbpage.c
@@ -102,7 +102,8 @@ static pte_t *get_pte(pte_t *base, int index, int level)
 	return ptep;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index c5066a260803..49d469fd4f07 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -31,7 +31,7 @@ follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
 	if (!vma || !is_vm_hugetlb_page(vma))
 		return ERR_PTR(-EINVAL);
 
-	pte = huge_pte_offset(mm, address);
+	pte = huge_pte_offset(mm, address, hstate_vma(vma));
 
 	/* hugetlb should be locked, and hence, prefaulted */
 	WARN_ON(!pte || pte_none(*pte));
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 1d227b0fcf49..dabbf6e408d1 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -214,6 +214,7 @@ static inline struct uffd_msg userfault_msg(unsigned long address,
  * hugepmd ranges.
  */
 static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
+					 struct vm_area_struct *vma,
 					 unsigned long address,
 					 unsigned long flags,
 					 unsigned long reason)
@@ -224,7 +225,7 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 
 	VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
 
-	pte = huge_pte_offset(mm, address);
+	pte = huge_pte_offset(mm, address, hstate_vma(vma));
 	if (!pte)
 		goto out;
 
@@ -243,6 +244,7 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 }
 #else
 static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
+					 struct vm_area_struct *vma,
 					 unsigned long address,
 					 unsigned long flags,
 					 unsigned long reason)
@@ -435,7 +437,8 @@ int handle_userfault(struct vm_fault *vmf, unsigned long reason)
 		must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags,
 						  reason);
 	else
-		must_wait = userfaultfd_huge_must_wait(ctx, vmf->address,
+		must_wait = userfaultfd_huge_must_wait(ctx, vmf->vma,
+						       vmf->address,
 						       vmf->flags, reason);
 	up_read(&mm->mmap_sem);
 
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index b857fc8cc2ec..c7f80729a1f9 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -113,7 +113,8 @@ extern struct list_head huge_boot_pages;
 
 pte_t *huge_pte_alloc(struct mm_struct *mm,
 			unsigned long addr, unsigned long sz);
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr);
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h);
 int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep);
 struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
 			      int write);
@@ -157,7 +158,7 @@ static inline void hugetlb_show_meminfo(void)
 #define hugetlb_fault(mm, vma, addr, flags)	({ BUG(); 0; })
 #define hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, dst_addr, \
 				src_addr, pagep)	({ BUG(); 0; })
-#define huge_pte_offset(mm, address)	0
+#define huge_pte_offset(mm, address, h)	0
 static inline int dequeue_hwpoisoned_huge_page(struct page *page)
 {
 	return 0;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3d0aab9ee80d..24e75982a638 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3233,7 +3233,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 
 	for (addr = vma->vm_start; addr < vma->vm_end; addr += sz) {
 		spinlock_t *src_ptl, *dst_ptl;
-		src_pte = huge_pte_offset(src, addr);
+		src_pte = huge_pte_offset(src, addr, h);
 		if (!src_pte)
 			continue;
 		dst_pte = huge_pte_alloc(dst, addr, sz);
@@ -3317,7 +3317,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
 	address = start;
 	for (; address < end; address += sz) {
-		ptep = huge_pte_offset(mm, address);
+		ptep = huge_pte_offset(mm, address, h);
 		if (!ptep)
 			continue;
 
@@ -3535,7 +3535,8 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 			unmap_ref_private(mm, vma, old_page, address);
 			BUG_ON(huge_pte_none(pte));
 			spin_lock(ptl);
-			ptep = huge_pte_offset(mm, address & huge_page_mask(h));
+			ptep = huge_pte_offset(mm, address & huge_page_mask(h),
+					       h);
 			if (likely(ptep &&
 				   pte_same(huge_ptep_get(ptep), pte)))
 				goto retry_avoidcopy;
@@ -3574,7 +3575,7 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * before the page tables are altered
 	 */
 	spin_lock(ptl);
-	ptep = huge_pte_offset(mm, address & huge_page_mask(h));
+	ptep = huge_pte_offset(mm, address & huge_page_mask(h), h);
 	if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) {
 		ClearPagePrivate(new_page);
 
@@ -3861,7 +3862,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	address &= huge_page_mask(h);
 
-	ptep = huge_pte_offset(mm, address);
+	ptep = huge_pte_offset(mm, address, h);
 	if (ptep) {
 		entry = huge_ptep_get(ptep);
 		if (unlikely(is_hugetlb_entry_migration(entry))) {
@@ -4118,7 +4119,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 *
 		 * Note that page table lock is not held when pte is null.
 		 */
-		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
+		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h), h);
 		if (pte)
 			ptl = huge_pte_lock(h, mm, pte);
 		absent = !pte || huge_pte_none(huge_ptep_get(pte));
@@ -4252,7 +4253,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	i_mmap_lock_write(vma->vm_file->f_mapping);
 	for (; address < end; address += huge_page_size(h)) {
 		spinlock_t *ptl;
-		ptep = huge_pte_offset(mm, address);
+		ptep = huge_pte_offset(mm, address, h);
 		if (!ptep)
 			continue;
 		ptl = huge_pte_lock(h, mm, ptep);
@@ -4514,7 +4515,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 
 		saddr = page_table_shareable(svma, vma, addr, idx);
 		if (saddr) {
-			spte = huge_pte_offset(svma->vm_mm, saddr);
+			spte = huge_pte_offset(svma->vm_mm, saddr,
+					       hstate_vma(svma));
 			if (spte) {
 				get_page(virt_to_page(spte));
 				break;
@@ -4610,7 +4612,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index c4c9def8ffea..2bf529380079 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -120,7 +120,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 
 	if (unlikely(PageHuge(pvmw->page))) {
 		/* when pud is not present, pte will be NULL */
-		pvmw->pte = huge_pte_offset(mm, pvmw->address);
+		pvmw->pte = huge_pte_offset(mm, pvmw->address,
+					    page_hstate(page));
 		if (!pvmw->pte)
 			return false;
 
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 60f7856e508f..8805b68d353c 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -185,7 +185,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
 
 	do {
 		next = hugetlb_entry_end(h, addr, end);
-		pte = huge_pte_offset(walk->mm, addr & hmask);
+		pte = huge_pte_offset(walk->mm, addr & hmask, h);
 		if (pte && walk->hugetlb_entry)
 			err = walk->hugetlb_entry(pte, hmask, addr, next, walk);
 		if (err)
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 1/4] mm/hugetlb.c: add hstate parameter to huge_pte_offset()
@ 2017-03-30 16:38   ` Punit Agrawal
  0 siblings, 0 replies; 24+ messages in thread
From: Punit Agrawal @ 2017-03-30 16:38 UTC (permalink / raw)
  To: linux-arm-kernel

A poisoned or migrated hugepage is stored as a swap entry in the page
tables. On architectures that support hugepages consisting of contiguous
page table entries (such as on arm64) this leads to ambiguity in
determining the page table entry to return in huge_pte_offset() when a
poisoned entry is encountered.

Let's remove the ambiguity by adding a hstate parameter to convey
additional information about the requested address. Also fixup the
definition/usage of huge_pte_offset() throughout the tree.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: James Hogan <james.hogan@imgtec.com> (odd fixer:METAG ARCHITECTURE)
Cc: Ralf Baechle <ralf@linux-mips.org> (supporter:MIPS)
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
---
 arch/arm64/mm/hugetlbpage.c   |  3 ++-
 arch/ia64/mm/hugetlbpage.c    |  4 ++--
 arch/metag/mm/hugetlbpage.c   |  3 ++-
 arch/mips/mm/hugetlbpage.c    |  3 ++-
 arch/parisc/mm/hugetlbpage.c  |  3 ++-
 arch/powerpc/mm/hugetlbpage.c |  2 +-
 arch/s390/mm/hugetlbpage.c    |  3 ++-
 arch/sh/mm/hugetlbpage.c      |  3 ++-
 arch/sparc/mm/hugetlbpage.c   |  3 ++-
 arch/tile/mm/hugetlbpage.c    |  3 ++-
 arch/x86/mm/hugetlbpage.c     |  2 +-
 fs/userfaultfd.c              |  7 +++++--
 include/linux/hugetlb.h       |  5 +++--
 mm/hugetlb.c                  | 21 ++++++++++++---------
 mm/page_vma_mapped.c          |  3 ++-
 mm/pagewalk.c                 |  2 +-
 16 files changed, 43 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index e2106932daa0..9ca742c4c1ab 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -189,7 +189,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
index 85de86d36fdf..09c865be3cfe 100644
--- a/arch/ia64/mm/hugetlbpage.c
+++ b/arch/ia64/mm/hugetlbpage.c
@@ -44,7 +44,7 @@ huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
 }
 
 pte_t *
-huge_pte_offset (struct mm_struct *mm, unsigned long addr)
+huge_pte_offset (struct mm_struct *mm, unsigned long addr, struct hstate *h)
 {
 	unsigned long taddr = htlbpage_to_page(addr);
 	pgd_t *pgd;
@@ -92,7 +92,7 @@ struct page *follow_huge_addr(struct mm_struct *mm, unsigned long addr, int writ
 	if (REGION_NUMBER(addr) != RGN_HPAGE)
 		return ERR_PTR(-EINVAL);
 
-	ptep = huge_pte_offset(mm, addr);
+	ptep = huge_pte_offset(mm, addr, size_to_hstate(HPAGE_SIZE));
 	if (!ptep || pte_none(*ptep))
 		return NULL;
 	page = pte_page(*ptep);
diff --git a/arch/metag/mm/hugetlbpage.c b/arch/metag/mm/hugetlbpage.c
index db1b7da91e4f..f3778c9b219d 100644
--- a/arch/metag/mm/hugetlbpage.c
+++ b/arch/metag/mm/hugetlbpage.c
@@ -74,7 +74,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/mips/mm/hugetlbpage.c b/arch/mips/mm/hugetlbpage.c
index 74aa6f62468f..f0f32c13a511 100644
--- a/arch/mips/mm/hugetlbpage.c
+++ b/arch/mips/mm/hugetlbpage.c
@@ -36,7 +36,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
+		       struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/parisc/mm/hugetlbpage.c b/arch/parisc/mm/hugetlbpage.c
index aa50ac090e9b..ff05ba5f66ac 100644
--- a/arch/parisc/mm/hugetlbpage.c
+++ b/arch/parisc/mm/hugetlbpage.c
@@ -69,7 +69,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 8c3389cbcd12..9fddb22c60d9 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -55,7 +55,7 @@ static unsigned nr_gpages;
 
 #define hugepd_none(hpd)	(hpd_val(hpd) == 0)
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, struct hstate *h)
 {
 	/* Only called for hugetlbfs pages, hence can ignore THP */
 	return __find_linux_pte_or_hugepte(mm->pgd, addr, NULL, NULL);
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index 9b4050caa4e9..7fe5532887de 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -176,7 +176,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return (pte_t *) pmdp;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgdp;
 	pud_t *pudp;
diff --git a/arch/sh/mm/hugetlbpage.c b/arch/sh/mm/hugetlbpage.c
index cc948db74878..53781fdc222c 100644
--- a/arch/sh/mm/hugetlbpage.c
+++ b/arch/sh/mm/hugetlbpage.c
@@ -42,7 +42,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index 323bc6b6e3ad..5b292864e7d1 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -270,7 +270,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/tile/mm/hugetlbpage.c b/arch/tile/mm/hugetlbpage.c
index cb10153b5c9f..58d1f11830e3 100644
--- a/arch/tile/mm/hugetlbpage.c
+++ b/arch/tile/mm/hugetlbpage.c
@@ -102,7 +102,8 @@ static pte_t *get_pte(pte_t *base, int index, int level)
 	return ptep;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	pud_t *pud;
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index c5066a260803..49d469fd4f07 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -31,7 +31,7 @@ follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
 	if (!vma || !is_vm_hugetlb_page(vma))
 		return ERR_PTR(-EINVAL);
 
-	pte = huge_pte_offset(mm, address);
+	pte = huge_pte_offset(mm, address, hstate_vma(vma));
 
 	/* hugetlb should be locked, and hence, prefaulted */
 	WARN_ON(!pte || pte_none(*pte));
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 1d227b0fcf49..dabbf6e408d1 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -214,6 +214,7 @@ static inline struct uffd_msg userfault_msg(unsigned long address,
  * hugepmd ranges.
  */
 static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
+					 struct vm_area_struct *vma,
 					 unsigned long address,
 					 unsigned long flags,
 					 unsigned long reason)
@@ -224,7 +225,7 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 
 	VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
 
-	pte = huge_pte_offset(mm, address);
+	pte = huge_pte_offset(mm, address, hstate_vma(vma));
 	if (!pte)
 		goto out;
 
@@ -243,6 +244,7 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 }
 #else
 static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
+					 struct vm_area_struct *vma,
 					 unsigned long address,
 					 unsigned long flags,
 					 unsigned long reason)
@@ -435,7 +437,8 @@ int handle_userfault(struct vm_fault *vmf, unsigned long reason)
 		must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags,
 						  reason);
 	else
-		must_wait = userfaultfd_huge_must_wait(ctx, vmf->address,
+		must_wait = userfaultfd_huge_must_wait(ctx, vmf->vma,
+						       vmf->address,
 						       vmf->flags, reason);
 	up_read(&mm->mmap_sem);
 
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index b857fc8cc2ec..c7f80729a1f9 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -113,7 +113,8 @@ extern struct list_head huge_boot_pages;
 
 pte_t *huge_pte_alloc(struct mm_struct *mm,
 			unsigned long addr, unsigned long sz);
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr);
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h);
 int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep);
 struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
 			      int write);
@@ -157,7 +158,7 @@ static inline void hugetlb_show_meminfo(void)
 #define hugetlb_fault(mm, vma, addr, flags)	({ BUG(); 0; })
 #define hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, dst_addr, \
 				src_addr, pagep)	({ BUG(); 0; })
-#define huge_pte_offset(mm, address)	0
+#define huge_pte_offset(mm, address, h)	0
 static inline int dequeue_hwpoisoned_huge_page(struct page *page)
 {
 	return 0;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3d0aab9ee80d..24e75982a638 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3233,7 +3233,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 
 	for (addr = vma->vm_start; addr < vma->vm_end; addr += sz) {
 		spinlock_t *src_ptl, *dst_ptl;
-		src_pte = huge_pte_offset(src, addr);
+		src_pte = huge_pte_offset(src, addr, h);
 		if (!src_pte)
 			continue;
 		dst_pte = huge_pte_alloc(dst, addr, sz);
@@ -3317,7 +3317,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
 	address = start;
 	for (; address < end; address += sz) {
-		ptep = huge_pte_offset(mm, address);
+		ptep = huge_pte_offset(mm, address, h);
 		if (!ptep)
 			continue;
 
@@ -3535,7 +3535,8 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 			unmap_ref_private(mm, vma, old_page, address);
 			BUG_ON(huge_pte_none(pte));
 			spin_lock(ptl);
-			ptep = huge_pte_offset(mm, address & huge_page_mask(h));
+			ptep = huge_pte_offset(mm, address & huge_page_mask(h),
+					       h);
 			if (likely(ptep &&
 				   pte_same(huge_ptep_get(ptep), pte)))
 				goto retry_avoidcopy;
@@ -3574,7 +3575,7 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * before the page tables are altered
 	 */
 	spin_lock(ptl);
-	ptep = huge_pte_offset(mm, address & huge_page_mask(h));
+	ptep = huge_pte_offset(mm, address & huge_page_mask(h), h);
 	if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) {
 		ClearPagePrivate(new_page);
 
@@ -3861,7 +3862,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	address &= huge_page_mask(h);
 
-	ptep = huge_pte_offset(mm, address);
+	ptep = huge_pte_offset(mm, address, h);
 	if (ptep) {
 		entry = huge_ptep_get(ptep);
 		if (unlikely(is_hugetlb_entry_migration(entry))) {
@@ -4118,7 +4119,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 *
 		 * Note that page table lock is not held when pte is null.
 		 */
-		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
+		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h), h);
 		if (pte)
 			ptl = huge_pte_lock(h, mm, pte);
 		absent = !pte || huge_pte_none(huge_ptep_get(pte));
@@ -4252,7 +4253,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	i_mmap_lock_write(vma->vm_file->f_mapping);
 	for (; address < end; address += huge_page_size(h)) {
 		spinlock_t *ptl;
-		ptep = huge_pte_offset(mm, address);
+		ptep = huge_pte_offset(mm, address, h);
 		if (!ptep)
 			continue;
 		ptl = huge_pte_lock(h, mm, ptep);
@@ -4514,7 +4515,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
 
 		saddr = page_table_shareable(svma, vma, addr, idx);
 		if (saddr) {
-			spte = huge_pte_offset(svma->vm_mm, saddr);
+			spte = huge_pte_offset(svma->vm_mm, saddr,
+					       hstate_vma(svma));
 			if (spte) {
 				get_page(virt_to_page(spte));
 				break;
@@ -4610,7 +4612,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
+pte_t *huge_pte_offset(struct mm_struct *mm,
+		       unsigned long addr, struct hstate *h)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index c4c9def8ffea..2bf529380079 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -120,7 +120,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 
 	if (unlikely(PageHuge(pvmw->page))) {
 		/* when pud is not present, pte will be NULL */
-		pvmw->pte = huge_pte_offset(mm, pvmw->address);
+		pvmw->pte = huge_pte_offset(mm, pvmw->address,
+					    page_hstate(page));
 		if (!pvmw->pte)
 			return false;
 
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 60f7856e508f..8805b68d353c 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -185,7 +185,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
 
 	do {
 		next = hugetlb_entry_end(h, addr, end);
-		pte = huge_pte_offset(walk->mm, addr & hmask);
+		pte = huge_pte_offset(walk->mm, addr & hmask, h);
 		if (pte && walk->hugetlb_entry)
 			err = walk->hugetlb_entry(pte, hmask, addr, next, walk);
 		if (err)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/4] arm64: hugetlbpages: Correctly handle swap entries in huge_pte_offset()
  2017-03-30 16:38 ` Punit Agrawal
  (?)
@ 2017-03-30 16:38   ` Punit Agrawal
  -1 siblings, 0 replies; 24+ messages in thread
From: Punit Agrawal @ 2017-03-30 16:38 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, akpm
  Cc: Punit Agrawal, linux-mm, linux-arm-kernel, linux-kernel, tbaicar,
	kirill.shutemov, mike.kravetz, David Woods

huge_pte_offset() does not correctly handle poisoned or migration page
table entries. Not knowing the size of the hugepage entry being
requested only compounded the problem.

The recently added hstate parameter can be used to determine the size of
hugepage being accessed. Use the size to find the correct page table
entry to return when coming across a swap page table entry.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: David Woods <dwoods@mellanox.com>
---
 arch/arm64/mm/hugetlbpage.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 9ca742c4c1ab..44014403081f 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -192,38 +192,39 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 pte_t *huge_pte_offset(struct mm_struct *mm,
 		       unsigned long addr, struct hstate *h)
 {
+	unsigned long sz = huge_page_size(h);
 	pgd_t *pgd;
 	pud_t *pud;
-	pmd_t *pmd = NULL;
-	pte_t *pte = NULL;
+	pmd_t *pmd;
+	pte_t *pte;
 
 	pgd = pgd_offset(mm, addr);
 	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
 	if (!pgd_present(*pgd))
 		return NULL;
+
 	pud = pud_offset(pgd, addr);
-	if (!pud_present(*pud))
+	if (pud_none(*pud) && sz != PUD_SIZE)
 		return NULL;
-
-	if (pud_huge(*pud))
+	else if (!pud_table(*pud))
 		return (pte_t *)pud;
+
+	if (sz == CONT_PMD_SIZE)
+		addr &= CONT_PMD_MASK;
+
 	pmd = pmd_offset(pud, addr);
-	if (!pmd_present(*pmd))
+	if (pmd_none(*pmd) &&
+	    !(sz == PMD_SIZE || sz == CONT_PMD_SIZE))
 		return NULL;
-
-	if (pte_cont(pmd_pte(*pmd))) {
-		pmd = pmd_offset(
-			pud, (addr & CONT_PMD_MASK));
-		return (pte_t *)pmd;
-	}
-	if (pmd_huge(*pmd))
+	else if (!pmd_table(*pmd))
 		return (pte_t *)pmd;
-	pte = pte_offset_kernel(pmd, addr);
-	if (pte_present(*pte) && pte_cont(*pte)) {
+
+	if (sz == CONT_PTE_SIZE) {
 		pte = pte_offset_kernel(
 			pmd, (addr & CONT_PTE_MASK));
 		return pte;
 	}
+
 	return NULL;
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/4] arm64: hugetlbpages: Correctly handle swap entries in huge_pte_offset()
@ 2017-03-30 16:38   ` Punit Agrawal
  0 siblings, 0 replies; 24+ messages in thread
From: Punit Agrawal @ 2017-03-30 16:38 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, akpm
  Cc: Punit Agrawal, linux-mm, linux-arm-kernel, linux-kernel, tbaicar,
	kirill.shutemov, mike.kravetz, David Woods

huge_pte_offset() does not correctly handle poisoned or migration page
table entries. Not knowing the size of the hugepage entry being
requested only compounded the problem.

The recently added hstate parameter can be used to determine the size of
hugepage being accessed. Use the size to find the correct page table
entry to return when coming across a swap page table entry.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: David Woods <dwoods@mellanox.com>
---
 arch/arm64/mm/hugetlbpage.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 9ca742c4c1ab..44014403081f 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -192,38 +192,39 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 pte_t *huge_pte_offset(struct mm_struct *mm,
 		       unsigned long addr, struct hstate *h)
 {
+	unsigned long sz = huge_page_size(h);
 	pgd_t *pgd;
 	pud_t *pud;
-	pmd_t *pmd = NULL;
-	pte_t *pte = NULL;
+	pmd_t *pmd;
+	pte_t *pte;
 
 	pgd = pgd_offset(mm, addr);
 	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
 	if (!pgd_present(*pgd))
 		return NULL;
+
 	pud = pud_offset(pgd, addr);
-	if (!pud_present(*pud))
+	if (pud_none(*pud) && sz != PUD_SIZE)
 		return NULL;
-
-	if (pud_huge(*pud))
+	else if (!pud_table(*pud))
 		return (pte_t *)pud;
+
+	if (sz == CONT_PMD_SIZE)
+		addr &= CONT_PMD_MASK;
+
 	pmd = pmd_offset(pud, addr);
-	if (!pmd_present(*pmd))
+	if (pmd_none(*pmd) &&
+	    !(sz == PMD_SIZE || sz == CONT_PMD_SIZE))
 		return NULL;
-
-	if (pte_cont(pmd_pte(*pmd))) {
-		pmd = pmd_offset(
-			pud, (addr & CONT_PMD_MASK));
-		return (pte_t *)pmd;
-	}
-	if (pmd_huge(*pmd))
+	else if (!pmd_table(*pmd))
 		return (pte_t *)pmd;
-	pte = pte_offset_kernel(pmd, addr);
-	if (pte_present(*pte) && pte_cont(*pte)) {
+
+	if (sz == CONT_PTE_SIZE) {
 		pte = pte_offset_kernel(
 			pmd, (addr & CONT_PTE_MASK));
 		return pte;
 	}
+
 	return NULL;
 }
 
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/4] arm64: hugetlbpages: Correctly handle swap entries in huge_pte_offset()
@ 2017-03-30 16:38   ` Punit Agrawal
  0 siblings, 0 replies; 24+ messages in thread
From: Punit Agrawal @ 2017-03-30 16:38 UTC (permalink / raw)
  To: linux-arm-kernel

huge_pte_offset() does not correctly handle poisoned or migration page
table entries. Not knowing the size of the hugepage entry being
requested only compounded the problem.

The recently added hstate parameter can be used to determine the size of
hugepage being accessed. Use the size to find the correct page table
entry to return when coming across a swap page table entry.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: David Woods <dwoods@mellanox.com>
---
 arch/arm64/mm/hugetlbpage.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 9ca742c4c1ab..44014403081f 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -192,38 +192,39 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 pte_t *huge_pte_offset(struct mm_struct *mm,
 		       unsigned long addr, struct hstate *h)
 {
+	unsigned long sz = huge_page_size(h);
 	pgd_t *pgd;
 	pud_t *pud;
-	pmd_t *pmd = NULL;
-	pte_t *pte = NULL;
+	pmd_t *pmd;
+	pte_t *pte;
 
 	pgd = pgd_offset(mm, addr);
 	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
 	if (!pgd_present(*pgd))
 		return NULL;
+
 	pud = pud_offset(pgd, addr);
-	if (!pud_present(*pud))
+	if (pud_none(*pud) && sz != PUD_SIZE)
 		return NULL;
-
-	if (pud_huge(*pud))
+	else if (!pud_table(*pud))
 		return (pte_t *)pud;
+
+	if (sz == CONT_PMD_SIZE)
+		addr &= CONT_PMD_MASK;
+
 	pmd = pmd_offset(pud, addr);
-	if (!pmd_present(*pmd))
+	if (pmd_none(*pmd) &&
+	    !(sz == PMD_SIZE || sz == CONT_PMD_SIZE))
 		return NULL;
-
-	if (pte_cont(pmd_pte(*pmd))) {
-		pmd = pmd_offset(
-			pud, (addr & CONT_PMD_MASK));
-		return (pte_t *)pmd;
-	}
-	if (pmd_huge(*pmd))
+	else if (!pmd_table(*pmd))
 		return (pte_t *)pmd;
-	pte = pte_offset_kernel(pmd, addr);
-	if (pte_present(*pte) && pte_cont(*pte)) {
+
+	if (sz == CONT_PTE_SIZE) {
 		pte = pte_offset_kernel(
 			pmd, (addr & CONT_PTE_MASK));
 		return pte;
 	}
+
 	return NULL;
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/4] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
  2017-03-30 16:38 ` Punit Agrawal
  (?)
@ 2017-03-30 16:38   ` Punit Agrawal
  -1 siblings, 0 replies; 24+ messages in thread
From: Punit Agrawal @ 2017-03-30 16:38 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, akpm
  Cc: Jonathan (Zhixiong) Zhang, linux-mm, linux-arm-kernel,
	linux-kernel, tbaicar, kirill.shutemov, mike.kravetz,
	Punit Agrawal

From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>

Add VM_FAULT_HWPOISON[_LARGE] handling to the arm64 page fault
handler. Handling of VM_FAULT_HWPOISON[_LARGE] is very similar
to VM_FAULT_OOM, the only difference is that a different si_code
(BUS_MCEERR_AR) is passed to user space and si_addr_lsb field is
initialized.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
---
 arch/arm64/mm/fault.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 4bf899fb451b..212c862b2fd0 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -31,6 +31,7 @@
 #include <linux/highmem.h>
 #include <linux/perf_event.h>
 #include <linux/preempt.h>
+#include <linux/hugetlb.h>
 
 #include <asm/bug.h>
 #include <asm/cpufeature.h>
@@ -194,9 +195,10 @@ static void __do_kernel_fault(struct mm_struct *mm, unsigned long addr,
  */
 static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
 			    unsigned int esr, unsigned int sig, int code,
-			    struct pt_regs *regs)
+			    struct pt_regs *regs, int fault)
 {
 	struct siginfo si;
+	unsigned int lsb = 0;
 
 	if (unhandled_signal(tsk, sig) && show_unhandled_signals_ratelimited()) {
 		pr_info("%s[%d]: unhandled %s (%d) at 0x%08lx, esr 0x%03x\n",
@@ -212,6 +214,17 @@ static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
 	si.si_errno = 0;
 	si.si_code = code;
 	si.si_addr = (void __user *)addr;
+	/*
+	 * Either small page or large page may be poisoned.
+	 * In other words, VM_FAULT_HWPOISON_LARGE and
+	 * VM_FAULT_HWPOISON are mutually exclusive.
+	 */
+	if (fault & VM_FAULT_HWPOISON_LARGE)
+		lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
+	else if (fault & VM_FAULT_HWPOISON)
+		lsb = PAGE_SHIFT;
+	si.si_addr_lsb = lsb;
+
 	force_sig_info(sig, &si, tsk);
 }
 
@@ -225,7 +238,7 @@ static void do_bad_area(unsigned long addr, unsigned int esr, struct pt_regs *re
 	 * handle this fault with.
 	 */
 	if (user_mode(regs))
-		__do_user_fault(tsk, addr, esr, SIGSEGV, SEGV_MAPERR, regs);
+		__do_user_fault(tsk, addr, esr, SIGSEGV, SEGV_MAPERR, regs, 0);
 	else
 		__do_kernel_fault(mm, addr, esr, regs);
 }
@@ -427,6 +440,9 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 		 */
 		sig = SIGBUS;
 		code = BUS_ADRERR;
+	} else if (fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE)) {
+		sig = SIGBUS;
+		code = BUS_MCEERR_AR;
 	} else {
 		/*
 		 * Something tried to access memory that isn't in our memory
@@ -437,7 +453,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 			SEGV_ACCERR : SEGV_MAPERR;
 	}
 
-	__do_user_fault(tsk, addr, esr, sig, code, regs);
+	__do_user_fault(tsk, addr, esr, sig, code, regs, fault);
 	return 0;
 
 no_context:
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/4] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
@ 2017-03-30 16:38   ` Punit Agrawal
  0 siblings, 0 replies; 24+ messages in thread
From: Punit Agrawal @ 2017-03-30 16:38 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, akpm
  Cc: Jonathan (Zhixiong) Zhang, linux-mm, linux-arm-kernel,
	linux-kernel, tbaicar, kirill.shutemov, mike.kravetz,
	Punit Agrawal

From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>

Add VM_FAULT_HWPOISON[_LARGE] handling to the arm64 page fault
handler. Handling of VM_FAULT_HWPOISON[_LARGE] is very similar
to VM_FAULT_OOM, the only difference is that a different si_code
(BUS_MCEERR_AR) is passed to user space and si_addr_lsb field is
initialized.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
---
 arch/arm64/mm/fault.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 4bf899fb451b..212c862b2fd0 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -31,6 +31,7 @@
 #include <linux/highmem.h>
 #include <linux/perf_event.h>
 #include <linux/preempt.h>
+#include <linux/hugetlb.h>
 
 #include <asm/bug.h>
 #include <asm/cpufeature.h>
@@ -194,9 +195,10 @@ static void __do_kernel_fault(struct mm_struct *mm, unsigned long addr,
  */
 static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
 			    unsigned int esr, unsigned int sig, int code,
-			    struct pt_regs *regs)
+			    struct pt_regs *regs, int fault)
 {
 	struct siginfo si;
+	unsigned int lsb = 0;
 
 	if (unhandled_signal(tsk, sig) && show_unhandled_signals_ratelimited()) {
 		pr_info("%s[%d]: unhandled %s (%d) at 0x%08lx, esr 0x%03x\n",
@@ -212,6 +214,17 @@ static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
 	si.si_errno = 0;
 	si.si_code = code;
 	si.si_addr = (void __user *)addr;
+	/*
+	 * Either small page or large page may be poisoned.
+	 * In other words, VM_FAULT_HWPOISON_LARGE and
+	 * VM_FAULT_HWPOISON are mutually exclusive.
+	 */
+	if (fault & VM_FAULT_HWPOISON_LARGE)
+		lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
+	else if (fault & VM_FAULT_HWPOISON)
+		lsb = PAGE_SHIFT;
+	si.si_addr_lsb = lsb;
+
 	force_sig_info(sig, &si, tsk);
 }
 
@@ -225,7 +238,7 @@ static void do_bad_area(unsigned long addr, unsigned int esr, struct pt_regs *re
 	 * handle this fault with.
 	 */
 	if (user_mode(regs))
-		__do_user_fault(tsk, addr, esr, SIGSEGV, SEGV_MAPERR, regs);
+		__do_user_fault(tsk, addr, esr, SIGSEGV, SEGV_MAPERR, regs, 0);
 	else
 		__do_kernel_fault(mm, addr, esr, regs);
 }
@@ -427,6 +440,9 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 		 */
 		sig = SIGBUS;
 		code = BUS_ADRERR;
+	} else if (fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE)) {
+		sig = SIGBUS;
+		code = BUS_MCEERR_AR;
 	} else {
 		/*
 		 * Something tried to access memory that isn't in our memory
@@ -437,7 +453,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 			SEGV_ACCERR : SEGV_MAPERR;
 	}
 
-	__do_user_fault(tsk, addr, esr, sig, code, regs);
+	__do_user_fault(tsk, addr, esr, sig, code, regs, fault);
 	return 0;
 
 no_context:
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/4] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
@ 2017-03-30 16:38   ` Punit Agrawal
  0 siblings, 0 replies; 24+ messages in thread
From: Punit Agrawal @ 2017-03-30 16:38 UTC (permalink / raw)
  To: linux-arm-kernel

From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>

Add VM_FAULT_HWPOISON[_LARGE] handling to the arm64 page fault
handler. Handling of VM_FAULT_HWPOISON[_LARGE] is very similar
to VM_FAULT_OOM, the only difference is that a different si_code
(BUS_MCEERR_AR) is passed to user space and si_addr_lsb field is
initialized.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
---
 arch/arm64/mm/fault.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 4bf899fb451b..212c862b2fd0 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -31,6 +31,7 @@
 #include <linux/highmem.h>
 #include <linux/perf_event.h>
 #include <linux/preempt.h>
+#include <linux/hugetlb.h>
 
 #include <asm/bug.h>
 #include <asm/cpufeature.h>
@@ -194,9 +195,10 @@ static void __do_kernel_fault(struct mm_struct *mm, unsigned long addr,
  */
 static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
 			    unsigned int esr, unsigned int sig, int code,
-			    struct pt_regs *regs)
+			    struct pt_regs *regs, int fault)
 {
 	struct siginfo si;
+	unsigned int lsb = 0;
 
 	if (unhandled_signal(tsk, sig) && show_unhandled_signals_ratelimited()) {
 		pr_info("%s[%d]: unhandled %s (%d) at 0x%08lx, esr 0x%03x\n",
@@ -212,6 +214,17 @@ static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
 	si.si_errno = 0;
 	si.si_code = code;
 	si.si_addr = (void __user *)addr;
+	/*
+	 * Either small page or large page may be poisoned.
+	 * In other words, VM_FAULT_HWPOISON_LARGE and
+	 * VM_FAULT_HWPOISON are mutually exclusive.
+	 */
+	if (fault & VM_FAULT_HWPOISON_LARGE)
+		lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
+	else if (fault & VM_FAULT_HWPOISON)
+		lsb = PAGE_SHIFT;
+	si.si_addr_lsb = lsb;
+
 	force_sig_info(sig, &si, tsk);
 }
 
@@ -225,7 +238,7 @@ static void do_bad_area(unsigned long addr, unsigned int esr, struct pt_regs *re
 	 * handle this fault with.
 	 */
 	if (user_mode(regs))
-		__do_user_fault(tsk, addr, esr, SIGSEGV, SEGV_MAPERR, regs);
+		__do_user_fault(tsk, addr, esr, SIGSEGV, SEGV_MAPERR, regs, 0);
 	else
 		__do_kernel_fault(mm, addr, esr, regs);
 }
@@ -427,6 +440,9 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 		 */
 		sig = SIGBUS;
 		code = BUS_ADRERR;
+	} else if (fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE)) {
+		sig = SIGBUS;
+		code = BUS_MCEERR_AR;
 	} else {
 		/*
 		 * Something tried to access memory that isn't in our memory
@@ -437,7 +453,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 			SEGV_ACCERR : SEGV_MAPERR;
 	}
 
-	__do_user_fault(tsk, addr, esr, sig, code, regs);
+	__do_user_fault(tsk, addr, esr, sig, code, regs, fault);
 	return 0;
 
 no_context:
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/4] arm64: kconfig: allow support for memory failure handling
  2017-03-30 16:38 ` Punit Agrawal
  (?)
@ 2017-03-30 16:38   ` Punit Agrawal
  -1 siblings, 0 replies; 24+ messages in thread
From: Punit Agrawal @ 2017-03-30 16:38 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, akpm
  Cc: Jonathan (Zhixiong) Zhang, linux-mm, linux-arm-kernel,
	linux-kernel, tbaicar, kirill.shutemov, mike.kravetz,
	Punit Agrawal

From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>

If ACPI_APEI and MEMORY_FAILURE is configured, select
ACPI_APEI_MEMORY_FAILURE. This enables memory failure recovery
when such memory failure is reported through ACPI APEI. APEI
(ACPI Platform Error Interfaces) provides a means for the
platform to convey error information to the kernel.
APEI bits

Declare ARCH_SUPPORTS_MEMORY_FAILURE, as arm64 does support
memory failure recovery attempt.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
---
 arch/arm64/Kconfig        | 1 +
 drivers/acpi/apei/Kconfig | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3741859765cf..993a5fd85452 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -19,6 +19,7 @@ config ARM64
 	select ARCH_HAS_STRICT_MODULE_RWX
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_USE_CMPXCHG_LOCKREF
+	select ARCH_SUPPORTS_MEMORY_FAILURE
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index b0140c8fc733..6d9a812fd3f9 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -9,6 +9,7 @@ config ACPI_APEI
 	select MISC_FILESYSTEMS
 	select PSTORE
 	select UEFI_CPER
+	select ACPI_APEI_MEMORY_FAILURE if MEMORY_FAILURE
 	depends on HAVE_ACPI_APEI
 	help
 	  APEI allows to report errors (for example from the chipset)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/4] arm64: kconfig: allow support for memory failure handling
@ 2017-03-30 16:38   ` Punit Agrawal
  0 siblings, 0 replies; 24+ messages in thread
From: Punit Agrawal @ 2017-03-30 16:38 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, akpm
  Cc: Jonathan (Zhixiong) Zhang, linux-mm, linux-arm-kernel,
	linux-kernel, tbaicar, kirill.shutemov, mike.kravetz,
	Punit Agrawal

From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>

If ACPI_APEI and MEMORY_FAILURE is configured, select
ACPI_APEI_MEMORY_FAILURE. This enables memory failure recovery
when such memory failure is reported through ACPI APEI. APEI
(ACPI Platform Error Interfaces) provides a means for the
platform to convey error information to the kernel.
APEI bits

Declare ARCH_SUPPORTS_MEMORY_FAILURE, as arm64 does support
memory failure recovery attempt.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
---
 arch/arm64/Kconfig        | 1 +
 drivers/acpi/apei/Kconfig | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3741859765cf..993a5fd85452 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -19,6 +19,7 @@ config ARM64
 	select ARCH_HAS_STRICT_MODULE_RWX
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_USE_CMPXCHG_LOCKREF
+	select ARCH_SUPPORTS_MEMORY_FAILURE
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index b0140c8fc733..6d9a812fd3f9 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -9,6 +9,7 @@ config ACPI_APEI
 	select MISC_FILESYSTEMS
 	select PSTORE
 	select UEFI_CPER
+	select ACPI_APEI_MEMORY_FAILURE if MEMORY_FAILURE
 	depends on HAVE_ACPI_APEI
 	help
 	  APEI allows to report errors (for example from the chipset)
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/4] arm64: kconfig: allow support for memory failure handling
@ 2017-03-30 16:38   ` Punit Agrawal
  0 siblings, 0 replies; 24+ messages in thread
From: Punit Agrawal @ 2017-03-30 16:38 UTC (permalink / raw)
  To: linux-arm-kernel

From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>

If ACPI_APEI and MEMORY_FAILURE is configured, select
ACPI_APEI_MEMORY_FAILURE. This enables memory failure recovery
when such memory failure is reported through ACPI APEI. APEI
(ACPI Platform Error Interfaces) provides a means for the
platform to convey error information to the kernel.
APEI bits

Declare ARCH_SUPPORTS_MEMORY_FAILURE, as arm64 does support
memory failure recovery attempt.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
---
 arch/arm64/Kconfig        | 1 +
 drivers/acpi/apei/Kconfig | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3741859765cf..993a5fd85452 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -19,6 +19,7 @@ config ARM64
 	select ARCH_HAS_STRICT_MODULE_RWX
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_USE_CMPXCHG_LOCKREF
+	select ARCH_SUPPORTS_MEMORY_FAILURE
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index b0140c8fc733..6d9a812fd3f9 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -9,6 +9,7 @@ config ACPI_APEI
 	select MISC_FILESYSTEMS
 	select PSTORE
 	select UEFI_CPER
+	select ACPI_APEI_MEMORY_FAILURE if MEMORY_FAILURE
 	depends on HAVE_ACPI_APEI
 	help
 	  APEI allows to report errors (for example from the chipset)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] arm64: hugetlbpages: Correctly handle swap entries in huge_pte_offset()
  2017-03-30 16:38   ` Punit Agrawal
  (?)
@ 2017-03-31  9:52     ` Mark Rutland
  -1 siblings, 0 replies; 24+ messages in thread
From: Mark Rutland @ 2017-03-31  9:52 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: catalin.marinas, will.deacon, akpm, David Woods, tbaicar,
	linux-kernel, linux-mm, linux-arm-kernel, kirill.shutemov,
	mike.kravetz

Hi Punit,

On Thu, Mar 30, 2017 at 05:38:47PM +0100, Punit Agrawal wrote:
> huge_pte_offset() does not correctly handle poisoned or migration page
> table entries. 

What exactly does it do wrong?

Judging by the patch, we return NULL in some cases we shouldn't, right?

What can result from this? e.g. can we see data corruption?

> Not knowing the size of the hugepage entry being
> requested only compounded the problem.
> 
> The recently added hstate parameter can be used to determine the size of
> hugepage being accessed. Use the size to find the correct page table
> entry to return when coming across a swap page table entry.
> 
> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
> Cc: David Woods <dwoods@mellanox.com>

Given this is a fix for a bug, it sounds like it should have a fixes
tag, or a Cc stable...

Thanks,
Mark.

> ---
>  arch/arm64/mm/hugetlbpage.c | 31 ++++++++++++++++---------------
>  1 file changed, 16 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 9ca742c4c1ab..44014403081f 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -192,38 +192,39 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>  pte_t *huge_pte_offset(struct mm_struct *mm,
>  		       unsigned long addr, struct hstate *h)
>  {
> +	unsigned long sz = huge_page_size(h);
>  	pgd_t *pgd;
>  	pud_t *pud;
> -	pmd_t *pmd = NULL;
> -	pte_t *pte = NULL;
> +	pmd_t *pmd;
> +	pte_t *pte;
>  
>  	pgd = pgd_offset(mm, addr);
>  	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
>  	if (!pgd_present(*pgd))
>  		return NULL;
> +
>  	pud = pud_offset(pgd, addr);
> -	if (!pud_present(*pud))
> +	if (pud_none(*pud) && sz != PUD_SIZE)
>  		return NULL;
> -
> -	if (pud_huge(*pud))
> +	else if (!pud_table(*pud))
>  		return (pte_t *)pud;
> +
> +	if (sz == CONT_PMD_SIZE)
> +		addr &= CONT_PMD_MASK;
> +
>  	pmd = pmd_offset(pud, addr);
> -	if (!pmd_present(*pmd))
> +	if (pmd_none(*pmd) &&
> +	    !(sz == PMD_SIZE || sz == CONT_PMD_SIZE))
>  		return NULL;
> -
> -	if (pte_cont(pmd_pte(*pmd))) {
> -		pmd = pmd_offset(
> -			pud, (addr & CONT_PMD_MASK));
> -		return (pte_t *)pmd;
> -	}
> -	if (pmd_huge(*pmd))
> +	else if (!pmd_table(*pmd))
>  		return (pte_t *)pmd;
> -	pte = pte_offset_kernel(pmd, addr);
> -	if (pte_present(*pte) && pte_cont(*pte)) {
> +
> +	if (sz == CONT_PTE_SIZE) {
>  		pte = pte_offset_kernel(
>  			pmd, (addr & CONT_PTE_MASK));
>  		return pte;
>  	}
> +
>  	return NULL;
>  }
>  
> -- 
> 2.11.0
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] arm64: hugetlbpages: Correctly handle swap entries in huge_pte_offset()
@ 2017-03-31  9:52     ` Mark Rutland
  0 siblings, 0 replies; 24+ messages in thread
From: Mark Rutland @ 2017-03-31  9:52 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: catalin.marinas, will.deacon, akpm, David Woods, tbaicar,
	linux-kernel, linux-mm, linux-arm-kernel, kirill.shutemov,
	mike.kravetz

Hi Punit,

On Thu, Mar 30, 2017 at 05:38:47PM +0100, Punit Agrawal wrote:
> huge_pte_offset() does not correctly handle poisoned or migration page
> table entries. 

What exactly does it do wrong?

Judging by the patch, we return NULL in some cases we shouldn't, right?

What can result from this? e.g. can we see data corruption?

> Not knowing the size of the hugepage entry being
> requested only compounded the problem.
> 
> The recently added hstate parameter can be used to determine the size of
> hugepage being accessed. Use the size to find the correct page table
> entry to return when coming across a swap page table entry.
> 
> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
> Cc: David Woods <dwoods@mellanox.com>

Given this is a fix for a bug, it sounds like it should have a fixes
tag, or a Cc stable...

Thanks,
Mark.

> ---
>  arch/arm64/mm/hugetlbpage.c | 31 ++++++++++++++++---------------
>  1 file changed, 16 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 9ca742c4c1ab..44014403081f 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -192,38 +192,39 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>  pte_t *huge_pte_offset(struct mm_struct *mm,
>  		       unsigned long addr, struct hstate *h)
>  {
> +	unsigned long sz = huge_page_size(h);
>  	pgd_t *pgd;
>  	pud_t *pud;
> -	pmd_t *pmd = NULL;
> -	pte_t *pte = NULL;
> +	pmd_t *pmd;
> +	pte_t *pte;
>  
>  	pgd = pgd_offset(mm, addr);
>  	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
>  	if (!pgd_present(*pgd))
>  		return NULL;
> +
>  	pud = pud_offset(pgd, addr);
> -	if (!pud_present(*pud))
> +	if (pud_none(*pud) && sz != PUD_SIZE)
>  		return NULL;
> -
> -	if (pud_huge(*pud))
> +	else if (!pud_table(*pud))
>  		return (pte_t *)pud;
> +
> +	if (sz == CONT_PMD_SIZE)
> +		addr &= CONT_PMD_MASK;
> +
>  	pmd = pmd_offset(pud, addr);
> -	if (!pmd_present(*pmd))
> +	if (pmd_none(*pmd) &&
> +	    !(sz == PMD_SIZE || sz == CONT_PMD_SIZE))
>  		return NULL;
> -
> -	if (pte_cont(pmd_pte(*pmd))) {
> -		pmd = pmd_offset(
> -			pud, (addr & CONT_PMD_MASK));
> -		return (pte_t *)pmd;
> -	}
> -	if (pmd_huge(*pmd))
> +	else if (!pmd_table(*pmd))
>  		return (pte_t *)pmd;
> -	pte = pte_offset_kernel(pmd, addr);
> -	if (pte_present(*pte) && pte_cont(*pte)) {
> +
> +	if (sz == CONT_PTE_SIZE) {
>  		pte = pte_offset_kernel(
>  			pmd, (addr & CONT_PTE_MASK));
>  		return pte;
>  	}
> +
>  	return NULL;
>  }
>  
> -- 
> 2.11.0
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/4] arm64: hugetlbpages: Correctly handle swap entries in huge_pte_offset()
@ 2017-03-31  9:52     ` Mark Rutland
  0 siblings, 0 replies; 24+ messages in thread
From: Mark Rutland @ 2017-03-31  9:52 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Punit,

On Thu, Mar 30, 2017 at 05:38:47PM +0100, Punit Agrawal wrote:
> huge_pte_offset() does not correctly handle poisoned or migration page
> table entries. 

What exactly does it do wrong?

Judging by the patch, we return NULL in some cases we shouldn't, right?

What can result from this? e.g. can we see data corruption?

> Not knowing the size of the hugepage entry being
> requested only compounded the problem.
> 
> The recently added hstate parameter can be used to determine the size of
> hugepage being accessed. Use the size to find the correct page table
> entry to return when coming across a swap page table entry.
> 
> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
> Cc: David Woods <dwoods@mellanox.com>

Given this is a fix for a bug, it sounds like it should have a fixes
tag, or a Cc stable...

Thanks,
Mark.

> ---
>  arch/arm64/mm/hugetlbpage.c | 31 ++++++++++++++++---------------
>  1 file changed, 16 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 9ca742c4c1ab..44014403081f 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -192,38 +192,39 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>  pte_t *huge_pte_offset(struct mm_struct *mm,
>  		       unsigned long addr, struct hstate *h)
>  {
> +	unsigned long sz = huge_page_size(h);
>  	pgd_t *pgd;
>  	pud_t *pud;
> -	pmd_t *pmd = NULL;
> -	pte_t *pte = NULL;
> +	pmd_t *pmd;
> +	pte_t *pte;
>  
>  	pgd = pgd_offset(mm, addr);
>  	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
>  	if (!pgd_present(*pgd))
>  		return NULL;
> +
>  	pud = pud_offset(pgd, addr);
> -	if (!pud_present(*pud))
> +	if (pud_none(*pud) && sz != PUD_SIZE)
>  		return NULL;
> -
> -	if (pud_huge(*pud))
> +	else if (!pud_table(*pud))
>  		return (pte_t *)pud;
> +
> +	if (sz == CONT_PMD_SIZE)
> +		addr &= CONT_PMD_MASK;
> +
>  	pmd = pmd_offset(pud, addr);
> -	if (!pmd_present(*pmd))
> +	if (pmd_none(*pmd) &&
> +	    !(sz == PMD_SIZE || sz == CONT_PMD_SIZE))
>  		return NULL;
> -
> -	if (pte_cont(pmd_pte(*pmd))) {
> -		pmd = pmd_offset(
> -			pud, (addr & CONT_PMD_MASK));
> -		return (pte_t *)pmd;
> -	}
> -	if (pmd_huge(*pmd))
> +	else if (!pmd_table(*pmd))
>  		return (pte_t *)pmd;
> -	pte = pte_offset_kernel(pmd, addr);
> -	if (pte_present(*pte) && pte_cont(*pte)) {
> +
> +	if (sz == CONT_PTE_SIZE) {
>  		pte = pte_offset_kernel(
>  			pmd, (addr & CONT_PTE_MASK));
>  		return pte;
>  	}
> +
>  	return NULL;
>  }
>  
> -- 
> 2.11.0
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] mm/hugetlb.c: add hstate parameter to huge_pte_offset()
  2017-03-30 16:38   ` Punit Agrawal
  (?)
@ 2017-04-02 19:55     ` kbuild test robot
  -1 siblings, 0 replies; 24+ messages in thread
From: kbuild test robot @ 2017-04-02 19:55 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: kbuild-all, catalin.marinas, will.deacon, akpm, Punit Agrawal,
	linux-mm, linux-arm-kernel, linux-kernel, tbaicar,
	kirill.shutemov, mike.kravetz, Tony Luck, Fenghua Yu,
	James Hogan, Ralf Baechle, James E.J. Bottomley, Helge Deller,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Martin Schwidefsky, Heiko Carstens, Yoshinori Sato, Rich Felker,
	David S. Miller, Chris Metcalf, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Alexander Viro, Steve Capper, Michal Hocko,
	Naoya Horiguchi, Aneesh Kumar K.V, Hillf Danton

[-- Attachment #1: Type: text/plain, Size: 3399 bytes --]

Hi Punit,

[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on v4.11-rc4]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Punit-Agrawal/Add-hstate-parameter-to-huge_pte_offset/20170331-104016
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm64 

All errors (new ones prefixed by >>):

   arch/arm64/mm/hugetlbpage.c: In function 'huge_ptep_get_and_clear':
>> arch/arm64/mm/hugetlbpage.c:200:10: error: too few arguments to function 'huge_pte_offset'
      cpte = huge_pte_offset(mm, addr);
             ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c:135:8: note: declared here
    pte_t *huge_pte_offset(struct mm_struct *mm,
           ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c: In function 'huge_ptep_set_access_flags':
   arch/arm64/mm/hugetlbpage.c:238:10: error: too few arguments to function 'huge_pte_offset'
      cpte = huge_pte_offset(vma->vm_mm, addr);
             ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c:135:8: note: declared here
    pte_t *huge_pte_offset(struct mm_struct *mm,
           ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c: In function 'huge_ptep_set_wrprotect':
   arch/arm64/mm/hugetlbpage.c:263:10: error: too few arguments to function 'huge_pte_offset'
      cpte = huge_pte_offset(mm, addr);
             ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c:135:8: note: declared here
    pte_t *huge_pte_offset(struct mm_struct *mm,
           ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c: In function 'huge_ptep_clear_flush':
   arch/arm64/mm/hugetlbpage.c:280:10: error: too few arguments to function 'huge_pte_offset'
      cpte = huge_pte_offset(vma->vm_mm, addr);
             ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c:135:8: note: declared here
    pte_t *huge_pte_offset(struct mm_struct *mm,
           ^~~~~~~~~~~~~~~

vim +/huge_pte_offset +200 arch/arm64/mm/hugetlbpage.c

66b3923a David Woods 2015-12-17  194  	if (pte_cont(*ptep)) {
66b3923a David Woods 2015-12-17  195  		int ncontig, i;
66b3923a David Woods 2015-12-17  196  		size_t pgsize;
66b3923a David Woods 2015-12-17  197  		pte_t *cpte;
66b3923a David Woods 2015-12-17  198  		bool is_dirty = false;
66b3923a David Woods 2015-12-17  199  
66b3923a David Woods 2015-12-17 @200  		cpte = huge_pte_offset(mm, addr);
66b3923a David Woods 2015-12-17  201  		ncontig = find_num_contig(mm, addr, cpte, *cpte, &pgsize);
66b3923a David Woods 2015-12-17  202  		/* save the 1st pte to return */
66b3923a David Woods 2015-12-17  203  		pte = ptep_get_and_clear(mm, addr, cpte);

:::::: The code at line 200 was first introduced by commit
:::::: 66b3923a1a0f77a563b43f43f6ad091354abbfe9 arm64: hugetlb: add support for PTE contiguous bit

:::::: TO: David Woods <dwoods@ezchip.com>
:::::: CC: Will Deacon <will.deacon@arm.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34627 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] mm/hugetlb.c: add hstate parameter to huge_pte_offset()
@ 2017-04-02 19:55     ` kbuild test robot
  0 siblings, 0 replies; 24+ messages in thread
From: kbuild test robot @ 2017-04-02 19:55 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: kbuild-all, catalin.marinas, will.deacon, akpm, linux-mm,
	linux-arm-kernel, linux-kernel, tbaicar, kirill.shutemov,
	mike.kravetz, Tony Luck, Fenghua Yu, James Hogan, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Heiko Carstens, Yoshinori Sato, Rich Felker, David S. Miller,
	Chris Metcalf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Alexander Viro, Steve Capper, Michal Hocko, Naoya Horiguchi,
	Aneesh Kumar K.V, Hillf Danton

[-- Attachment #1: Type: text/plain, Size: 3399 bytes --]

Hi Punit,

[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on v4.11-rc4]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Punit-Agrawal/Add-hstate-parameter-to-huge_pte_offset/20170331-104016
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm64 

All errors (new ones prefixed by >>):

   arch/arm64/mm/hugetlbpage.c: In function 'huge_ptep_get_and_clear':
>> arch/arm64/mm/hugetlbpage.c:200:10: error: too few arguments to function 'huge_pte_offset'
      cpte = huge_pte_offset(mm, addr);
             ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c:135:8: note: declared here
    pte_t *huge_pte_offset(struct mm_struct *mm,
           ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c: In function 'huge_ptep_set_access_flags':
   arch/arm64/mm/hugetlbpage.c:238:10: error: too few arguments to function 'huge_pte_offset'
      cpte = huge_pte_offset(vma->vm_mm, addr);
             ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c:135:8: note: declared here
    pte_t *huge_pte_offset(struct mm_struct *mm,
           ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c: In function 'huge_ptep_set_wrprotect':
   arch/arm64/mm/hugetlbpage.c:263:10: error: too few arguments to function 'huge_pte_offset'
      cpte = huge_pte_offset(mm, addr);
             ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c:135:8: note: declared here
    pte_t *huge_pte_offset(struct mm_struct *mm,
           ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c: In function 'huge_ptep_clear_flush':
   arch/arm64/mm/hugetlbpage.c:280:10: error: too few arguments to function 'huge_pte_offset'
      cpte = huge_pte_offset(vma->vm_mm, addr);
             ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c:135:8: note: declared here
    pte_t *huge_pte_offset(struct mm_struct *mm,
           ^~~~~~~~~~~~~~~

vim +/huge_pte_offset +200 arch/arm64/mm/hugetlbpage.c

66b3923a David Woods 2015-12-17  194  	if (pte_cont(*ptep)) {
66b3923a David Woods 2015-12-17  195  		int ncontig, i;
66b3923a David Woods 2015-12-17  196  		size_t pgsize;
66b3923a David Woods 2015-12-17  197  		pte_t *cpte;
66b3923a David Woods 2015-12-17  198  		bool is_dirty = false;
66b3923a David Woods 2015-12-17  199  
66b3923a David Woods 2015-12-17 @200  		cpte = huge_pte_offset(mm, addr);
66b3923a David Woods 2015-12-17  201  		ncontig = find_num_contig(mm, addr, cpte, *cpte, &pgsize);
66b3923a David Woods 2015-12-17  202  		/* save the 1st pte to return */
66b3923a David Woods 2015-12-17  203  		pte = ptep_get_and_clear(mm, addr, cpte);

:::::: The code at line 200 was first introduced by commit
:::::: 66b3923a1a0f77a563b43f43f6ad091354abbfe9 arm64: hugetlb: add support for PTE contiguous bit

:::::: TO: David Woods <dwoods@ezchip.com>
:::::: CC: Will Deacon <will.deacon@arm.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34627 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/4] mm/hugetlb.c: add hstate parameter to huge_pte_offset()
@ 2017-04-02 19:55     ` kbuild test robot
  0 siblings, 0 replies; 24+ messages in thread
From: kbuild test robot @ 2017-04-02 19:55 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Punit,

[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on v4.11-rc4]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Punit-Agrawal/Add-hstate-parameter-to-huge_pte_offset/20170331-104016
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm64 

All errors (new ones prefixed by >>):

   arch/arm64/mm/hugetlbpage.c: In function 'huge_ptep_get_and_clear':
>> arch/arm64/mm/hugetlbpage.c:200:10: error: too few arguments to function 'huge_pte_offset'
      cpte = huge_pte_offset(mm, addr);
             ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c:135:8: note: declared here
    pte_t *huge_pte_offset(struct mm_struct *mm,
           ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c: In function 'huge_ptep_set_access_flags':
   arch/arm64/mm/hugetlbpage.c:238:10: error: too few arguments to function 'huge_pte_offset'
      cpte = huge_pte_offset(vma->vm_mm, addr);
             ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c:135:8: note: declared here
    pte_t *huge_pte_offset(struct mm_struct *mm,
           ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c: In function 'huge_ptep_set_wrprotect':
   arch/arm64/mm/hugetlbpage.c:263:10: error: too few arguments to function 'huge_pte_offset'
      cpte = huge_pte_offset(mm, addr);
             ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c:135:8: note: declared here
    pte_t *huge_pte_offset(struct mm_struct *mm,
           ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c: In function 'huge_ptep_clear_flush':
   arch/arm64/mm/hugetlbpage.c:280:10: error: too few arguments to function 'huge_pte_offset'
      cpte = huge_pte_offset(vma->vm_mm, addr);
             ^~~~~~~~~~~~~~~
   arch/arm64/mm/hugetlbpage.c:135:8: note: declared here
    pte_t *huge_pte_offset(struct mm_struct *mm,
           ^~~~~~~~~~~~~~~

vim +/huge_pte_offset +200 arch/arm64/mm/hugetlbpage.c

66b3923a David Woods 2015-12-17  194  	if (pte_cont(*ptep)) {
66b3923a David Woods 2015-12-17  195  		int ncontig, i;
66b3923a David Woods 2015-12-17  196  		size_t pgsize;
66b3923a David Woods 2015-12-17  197  		pte_t *cpte;
66b3923a David Woods 2015-12-17  198  		bool is_dirty = false;
66b3923a David Woods 2015-12-17  199  
66b3923a David Woods 2015-12-17 @200  		cpte = huge_pte_offset(mm, addr);
66b3923a David Woods 2015-12-17  201  		ncontig = find_num_contig(mm, addr, cpte, *cpte, &pgsize);
66b3923a David Woods 2015-12-17  202  		/* save the 1st pte to return */
66b3923a David Woods 2015-12-17  203  		pte = ptep_get_and_clear(mm, addr, cpte);

:::::: The code at line 200 was first introduced by commit
:::::: 66b3923a1a0f77a563b43f43f6ad091354abbfe9 arm64: hugetlb: add support for PTE contiguous bit

:::::: TO: David Woods <dwoods@ezchip.com>
:::::: CC: Will Deacon <will.deacon@arm.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
-------------- next part --------------
A non-text attachment was scrubbed...
Name: .config.gz
Type: application/gzip
Size: 34627 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20170403/8f10e4f1/attachment-0001.gz>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] arm64: hugetlbpages: Correctly handle swap entries in huge_pte_offset()
  2017-03-31  9:52     ` Mark Rutland
  (?)
@ 2017-04-04 18:47       ` Punit Agrawal
  -1 siblings, 0 replies; 24+ messages in thread
From: Punit Agrawal @ 2017-04-04 18:47 UTC (permalink / raw)
  To: Mark Rutland
  Cc: catalin.marinas, will.deacon, akpm, David Woods, tbaicar,
	linux-kernel, linux-mm, linux-arm-kernel, kirill.shutemov,
	mike.kravetz

Hi Mark,

Mark Rutland <mark.rutland@arm.com> writes:

> Hi Punit,
>
> On Thu, Mar 30, 2017 at 05:38:47PM +0100, Punit Agrawal wrote:
>> huge_pte_offset() does not correctly handle poisoned or migration page
>> table entries. 
>
> What exactly does it do wrong?
>
> Judging by the patch, we return NULL in some cases we shouldn't, right?

huge_pte_offset() returns NULL when it comes across swap entries for any
of the supported hugepage sizes.

>
> What can result from this? e.g. can we see data corruption?

In the tests I am running, it results in an error in the log -

[  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.

when unmapping the page tables for the process that owns the poisoned
page.

In some instances, returning NULL instead of swap entries could lead to
data corruption - especially when the page tables contain migration swap
entries. But since hugepage migration is not enabled on arm64 I haven't
seen any corruption.

I've updated the commit log with more details locally.

>
>> Not knowing the size of the hugepage entry being
>> requested only compounded the problem.
>> 
>> The recently added hstate parameter can be used to determine the size of
>> hugepage being accessed. Use the size to find the correct page table
>> entry to return when coming across a swap page table entry.
>> 
>> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
>> Cc: David Woods <dwoods@mellanox.com>
>
> Given this is a fix for a bug, it sounds like it should have a fixes
> tag, or a Cc stable...

The problem doesn't occur until we enable memory failure handling. So
there shouldn't be a problem on earlier kernels.

Thanks,
Punit

>
> Thanks,
> Mark.
>
>> ---
>>  arch/arm64/mm/hugetlbpage.c | 31 ++++++++++++++++---------------
>>  1 file changed, 16 insertions(+), 15 deletions(-)
>> 
>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>> index 9ca742c4c1ab..44014403081f 100644
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -192,38 +192,39 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>>  pte_t *huge_pte_offset(struct mm_struct *mm,
>>  		       unsigned long addr, struct hstate *h)
>>  {
>> +	unsigned long sz = huge_page_size(h);
>>  	pgd_t *pgd;
>>  	pud_t *pud;
>> -	pmd_t *pmd = NULL;
>> -	pte_t *pte = NULL;
>> +	pmd_t *pmd;
>> +	pte_t *pte;
>>  
>>  	pgd = pgd_offset(mm, addr);
>>  	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
>>  	if (!pgd_present(*pgd))
>>  		return NULL;
>> +
>>  	pud = pud_offset(pgd, addr);
>> -	if (!pud_present(*pud))
>> +	if (pud_none(*pud) && sz != PUD_SIZE)
>>  		return NULL;
>> -
>> -	if (pud_huge(*pud))
>> +	else if (!pud_table(*pud))
>>  		return (pte_t *)pud;
>> +
>> +	if (sz == CONT_PMD_SIZE)
>> +		addr &= CONT_PMD_MASK;
>> +
>>  	pmd = pmd_offset(pud, addr);
>> -	if (!pmd_present(*pmd))
>> +	if (pmd_none(*pmd) &&
>> +	    !(sz == PMD_SIZE || sz == CONT_PMD_SIZE))
>>  		return NULL;
>> -
>> -	if (pte_cont(pmd_pte(*pmd))) {
>> -		pmd = pmd_offset(
>> -			pud, (addr & CONT_PMD_MASK));
>> -		return (pte_t *)pmd;
>> -	}
>> -	if (pmd_huge(*pmd))
>> +	else if (!pmd_table(*pmd))
>>  		return (pte_t *)pmd;
>> -	pte = pte_offset_kernel(pmd, addr);
>> -	if (pte_present(*pte) && pte_cont(*pte)) {
>> +
>> +	if (sz == CONT_PTE_SIZE) {
>>  		pte = pte_offset_kernel(
>>  			pmd, (addr & CONT_PTE_MASK));
>>  		return pte;
>>  	}
>> +
>>  	return NULL;
>>  }
>>  
>> -- 
>> 2.11.0
>> 
>> 
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] arm64: hugetlbpages: Correctly handle swap entries in huge_pte_offset()
@ 2017-04-04 18:47       ` Punit Agrawal
  0 siblings, 0 replies; 24+ messages in thread
From: Punit Agrawal @ 2017-04-04 18:47 UTC (permalink / raw)
  To: Mark Rutland
  Cc: catalin.marinas, will.deacon, akpm, David Woods, tbaicar,
	linux-kernel, linux-mm, linux-arm-kernel, kirill.shutemov,
	mike.kravetz

Hi Mark,

Mark Rutland <mark.rutland@arm.com> writes:

> Hi Punit,
>
> On Thu, Mar 30, 2017 at 05:38:47PM +0100, Punit Agrawal wrote:
>> huge_pte_offset() does not correctly handle poisoned or migration page
>> table entries. 
>
> What exactly does it do wrong?
>
> Judging by the patch, we return NULL in some cases we shouldn't, right?

huge_pte_offset() returns NULL when it comes across swap entries for any
of the supported hugepage sizes.

>
> What can result from this? e.g. can we see data corruption?

In the tests I am running, it results in an error in the log -

[  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.

when unmapping the page tables for the process that owns the poisoned
page.

In some instances, returning NULL instead of swap entries could lead to
data corruption - especially when the page tables contain migration swap
entries. But since hugepage migration is not enabled on arm64 I haven't
seen any corruption.

I've updated the commit log with more details locally.

>
>> Not knowing the size of the hugepage entry being
>> requested only compounded the problem.
>> 
>> The recently added hstate parameter can be used to determine the size of
>> hugepage being accessed. Use the size to find the correct page table
>> entry to return when coming across a swap page table entry.
>> 
>> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
>> Cc: David Woods <dwoods@mellanox.com>
>
> Given this is a fix for a bug, it sounds like it should have a fixes
> tag, or a Cc stable...

The problem doesn't occur until we enable memory failure handling. So
there shouldn't be a problem on earlier kernels.

Thanks,
Punit

>
> Thanks,
> Mark.
>
>> ---
>>  arch/arm64/mm/hugetlbpage.c | 31 ++++++++++++++++---------------
>>  1 file changed, 16 insertions(+), 15 deletions(-)
>> 
>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>> index 9ca742c4c1ab..44014403081f 100644
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -192,38 +192,39 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>>  pte_t *huge_pte_offset(struct mm_struct *mm,
>>  		       unsigned long addr, struct hstate *h)
>>  {
>> +	unsigned long sz = huge_page_size(h);
>>  	pgd_t *pgd;
>>  	pud_t *pud;
>> -	pmd_t *pmd = NULL;
>> -	pte_t *pte = NULL;
>> +	pmd_t *pmd;
>> +	pte_t *pte;
>>  
>>  	pgd = pgd_offset(mm, addr);
>>  	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
>>  	if (!pgd_present(*pgd))
>>  		return NULL;
>> +
>>  	pud = pud_offset(pgd, addr);
>> -	if (!pud_present(*pud))
>> +	if (pud_none(*pud) && sz != PUD_SIZE)
>>  		return NULL;
>> -
>> -	if (pud_huge(*pud))
>> +	else if (!pud_table(*pud))
>>  		return (pte_t *)pud;
>> +
>> +	if (sz == CONT_PMD_SIZE)
>> +		addr &= CONT_PMD_MASK;
>> +
>>  	pmd = pmd_offset(pud, addr);
>> -	if (!pmd_present(*pmd))
>> +	if (pmd_none(*pmd) &&
>> +	    !(sz == PMD_SIZE || sz == CONT_PMD_SIZE))
>>  		return NULL;
>> -
>> -	if (pte_cont(pmd_pte(*pmd))) {
>> -		pmd = pmd_offset(
>> -			pud, (addr & CONT_PMD_MASK));
>> -		return (pte_t *)pmd;
>> -	}
>> -	if (pmd_huge(*pmd))
>> +	else if (!pmd_table(*pmd))
>>  		return (pte_t *)pmd;
>> -	pte = pte_offset_kernel(pmd, addr);
>> -	if (pte_present(*pte) && pte_cont(*pte)) {
>> +
>> +	if (sz == CONT_PTE_SIZE) {
>>  		pte = pte_offset_kernel(
>>  			pmd, (addr & CONT_PTE_MASK));
>>  		return pte;
>>  	}
>> +
>>  	return NULL;
>>  }
>>  
>> -- 
>> 2.11.0
>> 
>> 
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/4] arm64: hugetlbpages: Correctly handle swap entries in huge_pte_offset()
@ 2017-04-04 18:47       ` Punit Agrawal
  0 siblings, 0 replies; 24+ messages in thread
From: Punit Agrawal @ 2017-04-04 18:47 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mark,

Mark Rutland <mark.rutland@arm.com> writes:

> Hi Punit,
>
> On Thu, Mar 30, 2017 at 05:38:47PM +0100, Punit Agrawal wrote:
>> huge_pte_offset() does not correctly handle poisoned or migration page
>> table entries. 
>
> What exactly does it do wrong?
>
> Judging by the patch, we return NULL in some cases we shouldn't, right?

huge_pte_offset() returns NULL when it comes across swap entries for any
of the supported hugepage sizes.

>
> What can result from this? e.g. can we see data corruption?

In the tests I am running, it results in an error in the log -

[  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.

when unmapping the page tables for the process that owns the poisoned
page.

In some instances, returning NULL instead of swap entries could lead to
data corruption - especially when the page tables contain migration swap
entries. But since hugepage migration is not enabled on arm64 I haven't
seen any corruption.

I've updated the commit log with more details locally.

>
>> Not knowing the size of the hugepage entry being
>> requested only compounded the problem.
>> 
>> The recently added hstate parameter can be used to determine the size of
>> hugepage being accessed. Use the size to find the correct page table
>> entry to return when coming across a swap page table entry.
>> 
>> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
>> Cc: David Woods <dwoods@mellanox.com>
>
> Given this is a fix for a bug, it sounds like it should have a fixes
> tag, or a Cc stable...

The problem doesn't occur until we enable memory failure handling. So
there shouldn't be a problem on earlier kernels.

Thanks,
Punit

>
> Thanks,
> Mark.
>
>> ---
>>  arch/arm64/mm/hugetlbpage.c | 31 ++++++++++++++++---------------
>>  1 file changed, 16 insertions(+), 15 deletions(-)
>> 
>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>> index 9ca742c4c1ab..44014403081f 100644
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -192,38 +192,39 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
>>  pte_t *huge_pte_offset(struct mm_struct *mm,
>>  		       unsigned long addr, struct hstate *h)
>>  {
>> +	unsigned long sz = huge_page_size(h);
>>  	pgd_t *pgd;
>>  	pud_t *pud;
>> -	pmd_t *pmd = NULL;
>> -	pte_t *pte = NULL;
>> +	pmd_t *pmd;
>> +	pte_t *pte;
>>  
>>  	pgd = pgd_offset(mm, addr);
>>  	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
>>  	if (!pgd_present(*pgd))
>>  		return NULL;
>> +
>>  	pud = pud_offset(pgd, addr);
>> -	if (!pud_present(*pud))
>> +	if (pud_none(*pud) && sz != PUD_SIZE)
>>  		return NULL;
>> -
>> -	if (pud_huge(*pud))
>> +	else if (!pud_table(*pud))
>>  		return (pte_t *)pud;
>> +
>> +	if (sz == CONT_PMD_SIZE)
>> +		addr &= CONT_PMD_MASK;
>> +
>>  	pmd = pmd_offset(pud, addr);
>> -	if (!pmd_present(*pmd))
>> +	if (pmd_none(*pmd) &&
>> +	    !(sz == PMD_SIZE || sz == CONT_PMD_SIZE))
>>  		return NULL;
>> -
>> -	if (pte_cont(pmd_pte(*pmd))) {
>> -		pmd = pmd_offset(
>> -			pud, (addr & CONT_PMD_MASK));
>> -		return (pte_t *)pmd;
>> -	}
>> -	if (pmd_huge(*pmd))
>> +	else if (!pmd_table(*pmd))
>>  		return (pte_t *)pmd;
>> -	pte = pte_offset_kernel(pmd, addr);
>> -	if (pte_present(*pte) && pte_cont(*pte)) {
>> +
>> +	if (sz == CONT_PTE_SIZE) {
>>  		pte = pte_offset_kernel(
>>  			pmd, (addr & CONT_PTE_MASK));
>>  		return pte;
>>  	}
>> +
>>  	return NULL;
>>  }
>>  
>> -- 
>> 2.11.0
>> 
>> 
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel at lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2017-04-04 18:47 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-30 16:38 [PATCH 0/4] Add hstate parameter to huge_pte_offset() Punit Agrawal
2017-03-30 16:38 ` Punit Agrawal
2017-03-30 16:38 ` Punit Agrawal
2017-03-30 16:38 ` [PATCH 1/4] mm/hugetlb.c: add " Punit Agrawal
2017-03-30 16:38   ` Punit Agrawal
2017-03-30 16:38   ` Punit Agrawal
2017-04-02 19:55   ` kbuild test robot
2017-04-02 19:55     ` kbuild test robot
2017-04-02 19:55     ` kbuild test robot
2017-03-30 16:38 ` [PATCH 2/4] arm64: hugetlbpages: Correctly handle swap entries in huge_pte_offset() Punit Agrawal
2017-03-30 16:38   ` Punit Agrawal
2017-03-30 16:38   ` Punit Agrawal
2017-03-31  9:52   ` Mark Rutland
2017-03-31  9:52     ` Mark Rutland
2017-03-31  9:52     ` Mark Rutland
2017-04-04 18:47     ` Punit Agrawal
2017-04-04 18:47       ` Punit Agrawal
2017-04-04 18:47       ` Punit Agrawal
2017-03-30 16:38 ` [PATCH 3/4] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling Punit Agrawal
2017-03-30 16:38   ` Punit Agrawal
2017-03-30 16:38   ` Punit Agrawal
2017-03-30 16:38 ` [PATCH 4/4] arm64: kconfig: allow support for memory failure handling Punit Agrawal
2017-03-30 16:38   ` Punit Agrawal
2017-03-30 16:38   ` Punit Agrawal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.