linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] arm64: Add support for handling memory corruption
@ 2017-05-17 15:23 Punit Agrawal
  2017-05-17 15:23 ` [PATCH v2 1/3] arm64: hugetlb: Fix huge_pte_offset to return poisoned page table entries Punit Agrawal
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Punit Agrawal @ 2017-05-17 15:23 UTC (permalink / raw)
  To: catalin.marinas, will.deacon
  Cc: Punit Agrawal, tbaicar, steve.capper, linux-arm-kernel,
	linux-kernel, manoj.iyer

Hi,

This series enables memory failure handling for arm64. Previous
posting can be found at [0].

Changes since v1:

* Reworked Patch 1 based on Catalin's feedbak to symmetrically deal
  with PUD and PMD hugepages in huge_pte_offset()
* Added Steve's acks

With support for contiguous hugepages being turned off[1], some of the
problems arising from swap entries go away[2]. This simplifies the
changes needed to enable memory corruption handling for arm64 (done in
this seris).

In this series, we updates huge_pte_offset() to correctly deal with
swap entries (Patch 1). This function will need to be updated when
contiguous hugepages are re-enabled.

Patch 2 adds support to send SIGBUS to processes that have their
memory corrupted. With the prerequisites in place, enable memory
corruption handling for arm64 (patch 3).

Thanks,
Punit

[0] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1376052.html
[1] https://lkml.org/lkml/2017/4/7/486
[2] https://lkml.org/lkml/2017/4/5/402

Jonathan (Zhixiong) Zhang (2):
  arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
  arm64: kconfig: allow support for memory failure handling

Punit Agrawal (1):
  arm64: hugetlb: Fix huge_pte_offset to return poisoned page table
    entries

 arch/arm64/Kconfig               |  1 +
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/arm64/mm/fault.c            | 22 +++++++++++++++++++---
 arch/arm64/mm/hugetlbpage.c      | 29 ++++++++++-------------------
 4 files changed, 31 insertions(+), 23 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/3] arm64: hugetlb: Fix huge_pte_offset to return poisoned page table entries
  2017-05-17 15:23 [PATCH v2 0/3] arm64: Add support for handling memory corruption Punit Agrawal
@ 2017-05-17 15:23 ` Punit Agrawal
  2017-06-07 13:47   ` Will Deacon
  2017-06-07 14:58   ` Catalin Marinas
  2017-05-17 15:23 ` [PATCH v2 2/3] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling Punit Agrawal
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 15+ messages in thread
From: Punit Agrawal @ 2017-05-17 15:23 UTC (permalink / raw)
  To: catalin.marinas, will.deacon
  Cc: Punit Agrawal, tbaicar, steve.capper, linux-arm-kernel,
	linux-kernel, manoj.iyer, David Woods

When memory failure is enabled, a poisoned hugepage pte is marked as a
swap entry. huge_pte_offset() does not return the poisoned page table
entries when it encounters PUD/PMD hugepages.

This behaviour of huge_pte_offset() leads to error such as below when
munmap is called on poisoned hugepages.

[  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.

Fix huge_pte_offset() to return the poisoned pte which is then
appropriately handled by the generic layer code.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Woods <dwoods@mellanox.com>
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/arm64/mm/hugetlbpage.c      | 29 ++++++++++-------------------
 2 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index c213fdbd056c..6eae342ced6b 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -441,7 +441,7 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
 
 #define pud_none(pud)		(!pud_val(pud))
 #define pud_bad(pud)		(!(pud_val(pud) & PUD_TABLE_BIT))
-#define pud_present(pud)	(pud_val(pud))
+#define pud_present(pud)	pte_present(pud_pte(pud))
 
 static inline void set_pud(pud_t *pudp, pud_t pud)
 {
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 7514a000e361..69b8200b1cfd 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -136,36 +136,27 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
 {
 	pgd_t *pgd;
 	pud_t *pud;
-	pmd_t *pmd = NULL;
-	pte_t *pte = NULL;
+	pmd_t *pmd;
 
 	pgd = pgd_offset(mm, addr);
 	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
 	if (!pgd_present(*pgd))
 		return NULL;
+
 	pud = pud_offset(pgd, addr);
-	if (!pud_present(*pud))
+	if (pud_none(*pud))
 		return NULL;
-
-	if (pud_huge(*pud))
+	/* swap or huge page */
+	if (!pud_present(*pud) || pud_huge(*pud))
 		return (pte_t *)pud;
+	/* table; check the next level */
+
 	pmd = pmd_offset(pud, addr);
-	if (!pmd_present(*pmd))
+	if (pmd_none(*pmd))
 		return NULL;
-
-	if (pte_cont(pmd_pte(*pmd))) {
-		pmd = pmd_offset(
-			pud, (addr & CONT_PMD_MASK));
-		return (pte_t *)pmd;
-	}
-	if (pmd_huge(*pmd))
+	if (!pmd_present(*pmd) || pmd_huge(*pmd))
 		return (pte_t *)pmd;
-	pte = pte_offset_kernel(pmd, addr);
-	if (pte_present(*pte) && pte_cont(*pte)) {
-		pte = pte_offset_kernel(
-			pmd, (addr & CONT_PTE_MASK));
-		return pte;
-	}
+
 	return NULL;
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/3] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
  2017-05-17 15:23 [PATCH v2 0/3] arm64: Add support for handling memory corruption Punit Agrawal
  2017-05-17 15:23 ` [PATCH v2 1/3] arm64: hugetlb: Fix huge_pte_offset to return poisoned page table entries Punit Agrawal
@ 2017-05-17 15:23 ` Punit Agrawal
  2017-06-07 13:59   ` Will Deacon
  2017-05-17 15:23 ` [PATCH v2 3/3] arm64: kconfig: allow support for memory failure handling Punit Agrawal
       [not found] ` <1495081650.3981.15@smtp.canonical.com>
  3 siblings, 1 reply; 15+ messages in thread
From: Punit Agrawal @ 2017-05-17 15:23 UTC (permalink / raw)
  To: catalin.marinas, will.deacon
  Cc: Jonathan (Zhixiong) Zhang, tbaicar, steve.capper,
	linux-arm-kernel, linux-kernel, manoj.iyer, Punit Agrawal

From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>

Add VM_FAULT_HWPOISON[_LARGE] handling to the arm64 page fault
handler. Handling of VM_FAULT_HWPOISON[_LARGE] is very similar
to VM_FAULT_OOM, the only difference is that a different si_code
(BUS_MCEERR_AR) is passed to user space and si_addr_lsb field is
initialized.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
(fix new __do_user_fault call-site)
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
---
 arch/arm64/mm/fault.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 37b95dff0b07..a85b44343ac6 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -31,6 +31,7 @@
 #include <linux/highmem.h>
 #include <linux/perf_event.h>
 #include <linux/preempt.h>
+#include <linux/hugetlb.h>
 
 #include <asm/bug.h>
 #include <asm/cpufeature.h>
@@ -239,10 +240,11 @@ static void __do_kernel_fault(struct mm_struct *mm, unsigned long addr,
  */
 static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
 			    unsigned int esr, unsigned int sig, int code,
-			    struct pt_regs *regs)
+			    struct pt_regs *regs, int fault)
 {
 	struct siginfo si;
 	const struct fault_info *inf;
+	unsigned int lsb = 0;
 
 	if (unhandled_signal(tsk, sig) && show_unhandled_signals_ratelimited()) {
 		inf = esr_to_fault_info(esr);
@@ -259,6 +261,17 @@ static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
 	si.si_errno = 0;
 	si.si_code = code;
 	si.si_addr = (void __user *)addr;
+	/*
+	 * Either small page or large page may be poisoned.
+	 * In other words, VM_FAULT_HWPOISON_LARGE and
+	 * VM_FAULT_HWPOISON are mutually exclusive.
+	 */
+	if (fault & VM_FAULT_HWPOISON_LARGE)
+		lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
+	else if (fault & VM_FAULT_HWPOISON)
+		lsb = PAGE_SHIFT;
+	si.si_addr_lsb = lsb;
+
 	force_sig_info(sig, &si, tsk);
 }
 
@@ -274,7 +287,7 @@ static void do_bad_area(unsigned long addr, unsigned int esr, struct pt_regs *re
 	 */
 	if (user_mode(regs)) {
 		inf = esr_to_fault_info(esr);
-		__do_user_fault(tsk, addr, esr, inf->sig, inf->code, regs);
+		__do_user_fault(tsk, addr, esr, inf->sig, inf->code, regs, 0);
 	} else
 		__do_kernel_fault(mm, addr, esr, regs);
 }
@@ -461,6 +474,9 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 		 */
 		sig = SIGBUS;
 		code = BUS_ADRERR;
+	} else if (fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE)) {
+		sig = SIGBUS;
+		code = BUS_MCEERR_AR;
 	} else {
 		/*
 		 * Something tried to access memory that isn't in our memory
@@ -471,7 +487,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 			SEGV_ACCERR : SEGV_MAPERR;
 	}
 
-	__do_user_fault(tsk, addr, esr, sig, code, regs);
+	__do_user_fault(tsk, addr, esr, sig, code, regs, fault);
 	return 0;
 
 no_context:
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/3] arm64: kconfig: allow support for memory failure handling
  2017-05-17 15:23 [PATCH v2 0/3] arm64: Add support for handling memory corruption Punit Agrawal
  2017-05-17 15:23 ` [PATCH v2 1/3] arm64: hugetlb: Fix huge_pte_offset to return poisoned page table entries Punit Agrawal
  2017-05-17 15:23 ` [PATCH v2 2/3] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling Punit Agrawal
@ 2017-05-17 15:23 ` Punit Agrawal
       [not found] ` <1495081650.3981.15@smtp.canonical.com>
  3 siblings, 0 replies; 15+ messages in thread
From: Punit Agrawal @ 2017-05-17 15:23 UTC (permalink / raw)
  To: catalin.marinas, will.deacon
  Cc: Jonathan (Zhixiong) Zhang, tbaicar, steve.capper,
	linux-arm-kernel, linux-kernel, manoj.iyer, Punit Agrawal

From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>

Declare ARCH_SUPPORTS_MEMORY_FAILURE, as arm64 does support
memory failure recovery attempt.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
(Dropped changes to ACPI APEI Kconfig and updated commit log)
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
---
 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3dcd7ec69bca..39986b63383e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -20,6 +20,7 @@ config ARM64
 	select ARCH_HAS_STRICT_MODULE_RWX
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_USE_CMPXCHG_LOCKREF
+	select ARCH_SUPPORTS_MEMORY_FAILURE
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_NUMA_BALANCING
 	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/3] arm64: Add support for handling memory corruption
       [not found] ` <1495081650.3981.15@smtp.canonical.com>
@ 2017-05-18 10:27   ` Punit Agrawal
  0 siblings, 0 replies; 15+ messages in thread
From: Punit Agrawal @ 2017-05-18 10:27 UTC (permalink / raw)
  To: Manoj Iyer
  Cc: catalin.marinas, will.deacon, tbaicar, steve.capper,
	linux-arm-kernel, linux-kernel

Hi Manoj,

Manoj Iyer <manoj.iyer@canonical.com> writes:

> On Wed, May 17, 2017 at 10:23 AM, Punit Agrawal
> <punit.agrawal@arm.com> wrote:
>
>     Hi, This series enables memory failure handling for arm64.
>     Previous posting can be found at [0]. Changes since v1: * Reworked
>     Patch 1 based on Catalin's feedbak to symmetrically deal with PUD
>     and PMD hugepages in huge_pte_offset() * Added Steve's acks With
>     support for contiguous hugepages being turned off[1], some of the
>     problems arising from swap entries go away[2]. This simplifies the
>     changes needed to enable memory corruption handling for arm64
>     (done in this seris). In this series, we updates huge_pte_offset()
>     to correctly deal with swap entries (Patch 1). This function will
>     need to be updated when contiguous hugepages are re-enabled. Patch
>     2 adds support to send SIGBUS to processes that have their memory
>     corrupted. With the prerequisites in place, enable memory
>     corruption handling for arm64 (patch 3).
>
>     Thanks, Punit [0]
>     https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1376052.html
>    [1] https://lkml.org/lkml/2017/4/7/486 [2]
>     https://lkml.org/lkml/2017/4/5/402 Jonathan (Zhixiong) Zhang (2):
>     arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling arm64:
>     kconfig: allow support for memory failure handling Punit Agrawal
>     (1): arm64: hugetlb: Fix huge_pte_offset to return poisoned page
>     table entries arch/arm64/Kconfig | 1 +
>     arch/arm64/include/asm/pgtable.h | 2 +- arch/arm64/mm/fault.c | 22
>     +++++++++++++++++++--- arch/arm64/mm/hugetlbpage.c | 29
>     ++++++++++------------------- 4 files changed, 31 insertions(+),
>     23 deletions(-) 
>     -- 
>     2.11.0 
>
> I applied Jonathans 2 patches to Ubuntu Zesty kernel (4.10) and ran
> the mce-test ./run_hugepage.sh after fixing a few things in the test
> case. This generated the bad pmd messages.
>
> linux-4.10.0/mm/pgtable-generic.c:33: bad pmd 0000000172420074.
>
> Then I applied Punit's patch to the kernel and re-ran the mce-test and
> did not see the bad pmd messages. The tests were done on a Qualcomm
> Centriq 2400 platform. 
>
> Tested-by: Manoj Iyer <manoj.iyer@canonical.com>

Thanks for taking the patches for a spin.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] arm64: hugetlb: Fix huge_pte_offset to return poisoned page table entries
  2017-05-17 15:23 ` [PATCH v2 1/3] arm64: hugetlb: Fix huge_pte_offset to return poisoned page table entries Punit Agrawal
@ 2017-06-07 13:47   ` Will Deacon
  2017-06-07 14:30     ` Catalin Marinas
  2017-06-07 14:58   ` Catalin Marinas
  1 sibling, 1 reply; 15+ messages in thread
From: Will Deacon @ 2017-06-07 13:47 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: catalin.marinas, tbaicar, steve.capper, linux-arm-kernel,
	linux-kernel, manoj.iyer, David Woods

On Wed, May 17, 2017 at 04:23:34PM +0100, Punit Agrawal wrote:
> When memory failure is enabled, a poisoned hugepage pte is marked as a
> swap entry. huge_pte_offset() does not return the poisoned page table
> entries when it encounters PUD/PMD hugepages.
> 
> This behaviour of huge_pte_offset() leads to error such as below when
> munmap is called on poisoned hugepages.
> 
> [  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.
> 
> Fix huge_pte_offset() to return the poisoned pte which is then
> appropriately handled by the generic layer code.
> 
> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
> Acked-by: Steve Capper <steve.capper@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: David Woods <dwoods@mellanox.com>
> ---
>  arch/arm64/include/asm/pgtable.h |  2 +-
>  arch/arm64/mm/hugetlbpage.c      | 29 ++++++++++-------------------
>  2 files changed, 11 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index c213fdbd056c..6eae342ced6b 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -441,7 +441,7 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
>  
>  #define pud_none(pud)		(!pud_val(pud))
>  #define pud_bad(pud)		(!(pud_val(pud) & PUD_TABLE_BIT))
> -#define pud_present(pud)	(pud_val(pud))
> +#define pud_present(pud)	pte_present(pud_pte(pud))
>  
>  static inline void set_pud(pud_t *pudp, pud_t pud)
>  {
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 7514a000e361..69b8200b1cfd 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -136,36 +136,27 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
>  {
>  	pgd_t *pgd;
>  	pud_t *pud;
> -	pmd_t *pmd = NULL;
> -	pte_t *pte = NULL;
> +	pmd_t *pmd;
>  
>  	pgd = pgd_offset(mm, addr);
>  	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
>  	if (!pgd_present(*pgd))
>  		return NULL;
> +
>  	pud = pud_offset(pgd, addr);
> -	if (!pud_present(*pud))
> +	if (pud_none(*pud))
>  		return NULL;

Do you actually need this special case?

> -
> -	if (pud_huge(*pud))
> +	/* swap or huge page */
> +	if (!pud_present(*pud) || pud_huge(*pud))

... couldn't you just add a '|| pud_none(*pud)' in here?

>  		return (pte_t *)pud;
> +	/* table; check the next level */
> +
>  	pmd = pmd_offset(pud, addr);
> -	if (!pmd_present(*pmd))
> +	if (pmd_none(*pmd))
>  		return NULL;
> -
> -	if (pte_cont(pmd_pte(*pmd))) {
> -		pmd = pmd_offset(
> -			pud, (addr & CONT_PMD_MASK));
> -		return (pte_t *)pmd;
> -	}
> -	if (pmd_huge(*pmd))
> +	if (!pmd_present(*pmd) || pmd_huge(*pmd))

Similarly here.

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/3] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
  2017-05-17 15:23 ` [PATCH v2 2/3] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling Punit Agrawal
@ 2017-06-07 13:59   ` Will Deacon
  2017-06-07 17:47     ` Punit Agrawal
  0 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2017-06-07 13:59 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: catalin.marinas, Jonathan (Zhixiong) Zhang, tbaicar,
	steve.capper, linux-arm-kernel, linux-kernel, manoj.iyer

On Wed, May 17, 2017 at 04:23:35PM +0100, Punit Agrawal wrote:
> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
> 
> Add VM_FAULT_HWPOISON[_LARGE] handling to the arm64 page fault
> handler. Handling of VM_FAULT_HWPOISON[_LARGE] is very similar
> to VM_FAULT_OOM, the only difference is that a different si_code
> (BUS_MCEERR_AR) is passed to user space and si_addr_lsb field is
> initialized.
> 
> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
> (fix new __do_user_fault call-site)
> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
> Acked-by: Steve Capper <steve.capper@arm.com>
> ---
>  arch/arm64/mm/fault.c | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 37b95dff0b07..a85b44343ac6 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -31,6 +31,7 @@
>  #include <linux/highmem.h>
>  #include <linux/perf_event.h>
>  #include <linux/preempt.h>
> +#include <linux/hugetlb.h>
>  
>  #include <asm/bug.h>
>  #include <asm/cpufeature.h>
> @@ -239,10 +240,11 @@ static void __do_kernel_fault(struct mm_struct *mm, unsigned long addr,
>   */
>  static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
>  			    unsigned int esr, unsigned int sig, int code,
> -			    struct pt_regs *regs)
> +			    struct pt_regs *regs, int fault)
>  {
>  	struct siginfo si;
>  	const struct fault_info *inf;
> +	unsigned int lsb = 0;
>  
>  	if (unhandled_signal(tsk, sig) && show_unhandled_signals_ratelimited()) {
>  		inf = esr_to_fault_info(esr);
> @@ -259,6 +261,17 @@ static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
>  	si.si_errno = 0;
>  	si.si_code = code;
>  	si.si_addr = (void __user *)addr;
> +	/*
> +	 * Either small page or large page may be poisoned.
> +	 * In other words, VM_FAULT_HWPOISON_LARGE and
> +	 * VM_FAULT_HWPOISON are mutually exclusive.
> +	 */
> +	if (fault & VM_FAULT_HWPOISON_LARGE)
> +		lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
> +	else if (fault & VM_FAULT_HWPOISON)
> +		lsb = PAGE_SHIFT;
> +	si.si_addr_lsb = lsb;
> +

If we're going to start handling poison faults, then we should probably
rejig the perf page fault accounting around here so that we follow x86:

  * Always report PERF_COUNT_SW_PAGE_FAULTS,
  * Don't report anything else for VM_FAULT_ERROR
  * Report PERF_COUNT_SW_PAGE_FAULTS_MAJ if VM_FAULT_MAJOR
  * Otherwise, report PERF_COUNT_SW_PAGE_FAULTS_MIN

at the moment, I think you're accounting VM_FAULT_ERROR as
PERF_COUNT_SW_PAGE_FAULTS_MIN, which doesn't feel right at all.

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] arm64: hugetlb: Fix huge_pte_offset to return poisoned page table entries
  2017-06-07 13:47   ` Will Deacon
@ 2017-06-07 14:30     ` Catalin Marinas
  2017-06-07 14:57       ` Will Deacon
  0 siblings, 1 reply; 15+ messages in thread
From: Catalin Marinas @ 2017-06-07 14:30 UTC (permalink / raw)
  To: Will Deacon
  Cc: Punit Agrawal, David Woods, steve.capper, tbaicar, linux-kernel,
	linux-arm-kernel, manoj.iyer

On Wed, Jun 07, 2017 at 02:47:32PM +0100, Will Deacon wrote:
> On Wed, May 17, 2017 at 04:23:34PM +0100, Punit Agrawal wrote:
> > --- a/arch/arm64/mm/hugetlbpage.c
> > +++ b/arch/arm64/mm/hugetlbpage.c
> > @@ -136,36 +136,27 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
> >  {
> >  	pgd_t *pgd;
> >  	pud_t *pud;
> > -	pmd_t *pmd = NULL;
> > -	pte_t *pte = NULL;
> > +	pmd_t *pmd;
> >  
> >  	pgd = pgd_offset(mm, addr);
> >  	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
> >  	if (!pgd_present(*pgd))
> >  		return NULL;
> > +
> >  	pud = pud_offset(pgd, addr);
> > -	if (!pud_present(*pud))
> > +	if (pud_none(*pud))
> >  		return NULL;
> 
> Do you actually need this special case?
> 
> > -
> > -	if (pud_huge(*pud))
> > +	/* swap or huge page */
> > +	if (!pud_present(*pud) || pud_huge(*pud))
> 
> ... couldn't you just add a '|| pud_none(*pud)' in here?
> 
> >  		return (pte_t *)pud;

But then you no longer return NULL if *pud == 0.

-- 
Catalin

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] arm64: hugetlb: Fix huge_pte_offset to return poisoned page table entries
  2017-06-07 14:30     ` Catalin Marinas
@ 2017-06-07 14:57       ` Will Deacon
  2017-06-07 15:32         ` Punit Agrawal
  0 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2017-06-07 14:57 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Punit Agrawal, David Woods, steve.capper, tbaicar, linux-kernel,
	linux-arm-kernel, manoj.iyer

On Wed, Jun 07, 2017 at 03:30:37PM +0100, Catalin Marinas wrote:
> On Wed, Jun 07, 2017 at 02:47:32PM +0100, Will Deacon wrote:
> > On Wed, May 17, 2017 at 04:23:34PM +0100, Punit Agrawal wrote:
> > > --- a/arch/arm64/mm/hugetlbpage.c
> > > +++ b/arch/arm64/mm/hugetlbpage.c
> > > @@ -136,36 +136,27 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
> > >  {
> > >  	pgd_t *pgd;
> > >  	pud_t *pud;
> > > -	pmd_t *pmd = NULL;
> > > -	pte_t *pte = NULL;
> > > +	pmd_t *pmd;
> > >  
> > >  	pgd = pgd_offset(mm, addr);
> > >  	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
> > >  	if (!pgd_present(*pgd))
> > >  		return NULL;
> > > +
> > >  	pud = pud_offset(pgd, addr);
> > > -	if (!pud_present(*pud))
> > > +	if (pud_none(*pud))
> > >  		return NULL;
> > 
> > Do you actually need this special case?
> > 
> > > -
> > > -	if (pud_huge(*pud))
> > > +	/* swap or huge page */
> > > +	if (!pud_present(*pud) || pud_huge(*pud))
> > 
> > ... couldn't you just add a '|| pud_none(*pud)' in here?
> > 
> > >  		return (pte_t *)pud;
> 
> But then you no longer return NULL if *pud == 0.

Does that actually matter? The bits of hugetlb code I looked at will
deferenced the returned pud and handle the huge_pte_none case correctly.

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] arm64: hugetlb: Fix huge_pte_offset to return poisoned page table entries
  2017-05-17 15:23 ` [PATCH v2 1/3] arm64: hugetlb: Fix huge_pte_offset to return poisoned page table entries Punit Agrawal
  2017-06-07 13:47   ` Will Deacon
@ 2017-06-07 14:58   ` Catalin Marinas
  1 sibling, 0 replies; 15+ messages in thread
From: Catalin Marinas @ 2017-06-07 14:58 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: will.deacon, David Woods, steve.capper, tbaicar, linux-kernel,
	linux-arm-kernel, manoj.iyer

On Wed, May 17, 2017 at 04:23:34PM +0100, Punit Agrawal wrote:
> When memory failure is enabled, a poisoned hugepage pte is marked as a
> swap entry. huge_pte_offset() does not return the poisoned page table
> entries when it encounters PUD/PMD hugepages.
> 
> This behaviour of huge_pte_offset() leads to error such as below when
> munmap is called on poisoned hugepages.
> 
> [  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.
> 
> Fix huge_pte_offset() to return the poisoned pte which is then
> appropriately handled by the generic layer code.
> 
> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
> Acked-by: Steve Capper <steve.capper@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: David Woods <dwoods@mellanox.com>

Since the patch matches my suggestions in v1:

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] arm64: hugetlb: Fix huge_pte_offset to return poisoned page table entries
  2017-06-07 14:57       ` Will Deacon
@ 2017-06-07 15:32         ` Punit Agrawal
  2017-06-07 15:41           ` Will Deacon
  2017-06-07 16:54           ` Catalin Marinas
  0 siblings, 2 replies; 15+ messages in thread
From: Punit Agrawal @ 2017-06-07 15:32 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, David Woods, steve.capper, tbaicar,
	linux-kernel, linux-arm-kernel, manoj.iyer

Will Deacon <will.deacon@arm.com> writes:

> On Wed, Jun 07, 2017 at 03:30:37PM +0100, Catalin Marinas wrote:
>> On Wed, Jun 07, 2017 at 02:47:32PM +0100, Will Deacon wrote:
>> > On Wed, May 17, 2017 at 04:23:34PM +0100, Punit Agrawal wrote:
>> > > --- a/arch/arm64/mm/hugetlbpage.c
>> > > +++ b/arch/arm64/mm/hugetlbpage.c
>> > > @@ -136,36 +136,27 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
>> > >  {
>> > >  	pgd_t *pgd;
>> > >  	pud_t *pud;
>> > > -	pmd_t *pmd = NULL;
>> > > -	pte_t *pte = NULL;
>> > > +	pmd_t *pmd;
>> > >  
>> > >  	pgd = pgd_offset(mm, addr);
>> > >  	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
>> > >  	if (!pgd_present(*pgd))
>> > >  		return NULL;
>> > > +
>> > >  	pud = pud_offset(pgd, addr);
>> > > -	if (!pud_present(*pud))
>> > > +	if (pud_none(*pud))
>> > >  		return NULL;
>> > 
>> > Do you actually need this special case?
>> > 
>> > > -
>> > > -	if (pud_huge(*pud))
>> > > +	/* swap or huge page */
>> > > +	if (!pud_present(*pud) || pud_huge(*pud))
>> > 
>> > ... couldn't you just add a '|| pud_none(*pud)' in here?
>> > 

I think an earlier version took this approach but...

>> > >  		return (pte_t *)pud;
>> 
>> But then you no longer return NULL if *pud == 0.
>
> Does that actually matter? The bits of hugetlb code I looked at will
> deferenced the returned pud and handle the huge_pte_none case correctly.

For hugetlb fault handling (hugetlb_fault()), returning NULL vs pointer
to the pud/pmd results in different behaviour. If we return the pud when
pud_none(), then we lose the resulting hugepage size check we get from
huge_pte_alloc().

>
> Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] arm64: hugetlb: Fix huge_pte_offset to return poisoned page table entries
  2017-06-07 15:32         ` Punit Agrawal
@ 2017-06-07 15:41           ` Will Deacon
  2017-06-08 16:28             ` Punit Agrawal
  2017-06-07 16:54           ` Catalin Marinas
  1 sibling, 1 reply; 15+ messages in thread
From: Will Deacon @ 2017-06-07 15:41 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: Catalin Marinas, David Woods, steve.capper, tbaicar,
	linux-kernel, linux-arm-kernel, manoj.iyer

On Wed, Jun 07, 2017 at 04:32:28PM +0100, Punit Agrawal wrote:
> Will Deacon <will.deacon@arm.com> writes:
> 
> > On Wed, Jun 07, 2017 at 03:30:37PM +0100, Catalin Marinas wrote:
> >> On Wed, Jun 07, 2017 at 02:47:32PM +0100, Will Deacon wrote:
> >> > On Wed, May 17, 2017 at 04:23:34PM +0100, Punit Agrawal wrote:
> >> > > --- a/arch/arm64/mm/hugetlbpage.c
> >> > > +++ b/arch/arm64/mm/hugetlbpage.c
> >> > > @@ -136,36 +136,27 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
> >> > >  {
> >> > >  	pgd_t *pgd;
> >> > >  	pud_t *pud;
> >> > > -	pmd_t *pmd = NULL;
> >> > > -	pte_t *pte = NULL;
> >> > > +	pmd_t *pmd;
> >> > >  
> >> > >  	pgd = pgd_offset(mm, addr);
> >> > >  	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
> >> > >  	if (!pgd_present(*pgd))
> >> > >  		return NULL;
> >> > > +
> >> > >  	pud = pud_offset(pgd, addr);
> >> > > -	if (!pud_present(*pud))
> >> > > +	if (pud_none(*pud))
> >> > >  		return NULL;
> >> > 
> >> > Do you actually need this special case?
> >> > 
> >> > > -
> >> > > -	if (pud_huge(*pud))
> >> > > +	/* swap or huge page */
> >> > > +	if (!pud_present(*pud) || pud_huge(*pud))
> >> > 
> >> > ... couldn't you just add a '|| pud_none(*pud)' in here?
> >> > 
> 
> I think an earlier version took this approach but...
> 
> >> > >  		return (pte_t *)pud;
> >> 
> >> But then you no longer return NULL if *pud == 0.
> >
> > Does that actually matter? The bits of hugetlb code I looked at will
> > deferenced the returned pud and handle the huge_pte_none case correctly.
> 
> For hugetlb fault handling (hugetlb_fault()), returning NULL vs pointer
> to the pud/pmd results in different behaviour. If we return the pud when
> pud_none(), then we lose the resulting hugepage size check we get from
> huge_pte_alloc().

Ok, so does that mean that many of the huge_pte_none checks in mm/hugetlb.c
that operate on a huge_ptep_get of non-NULL output from huge_pte_offset are
actually redundant?

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] arm64: hugetlb: Fix huge_pte_offset to return poisoned page table entries
  2017-06-07 15:32         ` Punit Agrawal
  2017-06-07 15:41           ` Will Deacon
@ 2017-06-07 16:54           ` Catalin Marinas
  1 sibling, 0 replies; 15+ messages in thread
From: Catalin Marinas @ 2017-06-07 16:54 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: Will Deacon, David Woods, steve.capper, tbaicar, linux-kernel,
	linux-arm-kernel, manoj.iyer

On Wed, Jun 07, 2017 at 04:32:28PM +0100, Punit Agrawal wrote:
> Will Deacon <will.deacon@arm.com> writes:
> > On Wed, Jun 07, 2017 at 03:30:37PM +0100, Catalin Marinas wrote:
> >> On Wed, Jun 07, 2017 at 02:47:32PM +0100, Will Deacon wrote:
> >> > On Wed, May 17, 2017 at 04:23:34PM +0100, Punit Agrawal wrote:
> >> > > --- a/arch/arm64/mm/hugetlbpage.c
> >> > > +++ b/arch/arm64/mm/hugetlbpage.c
> >> > > @@ -136,36 +136,27 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
> >> > >  {
> >> > >  	pgd_t *pgd;
> >> > >  	pud_t *pud;
> >> > > -	pmd_t *pmd = NULL;
> >> > > -	pte_t *pte = NULL;
> >> > > +	pmd_t *pmd;
> >> > >  
> >> > >  	pgd = pgd_offset(mm, addr);
> >> > >  	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
> >> > >  	if (!pgd_present(*pgd))
> >> > >  		return NULL;
> >> > > +
> >> > >  	pud = pud_offset(pgd, addr);
> >> > > -	if (!pud_present(*pud))
> >> > > +	if (pud_none(*pud))
> >> > >  		return NULL;
> >> > 
> >> > Do you actually need this special case?
> >> > 
> >> > > -
> >> > > -	if (pud_huge(*pud))
> >> > > +	/* swap or huge page */
> >> > > +	if (!pud_present(*pud) || pud_huge(*pud))
> >> > 
> >> > ... couldn't you just add a '|| pud_none(*pud)' in here?
> >> > 
> 
> I think an earlier version took this approach but...
> 
> >> > >  		return (pte_t *)pud;
> >> 
> >> But then you no longer return NULL if *pud == 0.
> >
> > Does that actually matter? The bits of hugetlb code I looked at will
> > deferenced the returned pud and handle the huge_pte_none case correctly.
> 
> For hugetlb fault handling (hugetlb_fault()), returning NULL vs pointer
> to the pud/pmd results in different behaviour. If we return the pud when
> pud_none(), then we lose the resulting hugepage size check we get from
> huge_pte_alloc().

At a quick look, there are a few other places where not returning NULL
has some other effects (though I don't think any of them are fatal):

- copy_huge_tlb_page_range() - unnecessary allocation of a destination
  pud

- huge_pmd_share() - do we actually need the subsequent get_page()?

- page_vma_mapped_walk() - it even has a comment: "when pud is not
  present, pte will be NULL". Now, that's no longer true with swap
  entries but we'd never return NULL for a pud_none() case

Current code behaviour is to return NULL when !p*d_present(). We are
slightly relaxing this for swap entries while still returning NULL for
the p*d_none() case but I wouldn't go that far as to never return NULL
here.

-- 
Catalin

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/3] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
  2017-06-07 13:59   ` Will Deacon
@ 2017-06-07 17:47     ` Punit Agrawal
  0 siblings, 0 replies; 15+ messages in thread
From: Punit Agrawal @ 2017-06-07 17:47 UTC (permalink / raw)
  To: Will Deacon
  Cc: catalin.marinas, Jonathan (Zhixiong) Zhang, tbaicar,
	steve.capper, linux-arm-kernel, linux-kernel, manoj.iyer

Will Deacon <will.deacon@arm.com> writes:

> On Wed, May 17, 2017 at 04:23:35PM +0100, Punit Agrawal wrote:
>> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
>> 
>> Add VM_FAULT_HWPOISON[_LARGE] handling to the arm64 page fault
>> handler. Handling of VM_FAULT_HWPOISON[_LARGE] is very similar
>> to VM_FAULT_OOM, the only difference is that a different si_code
>> (BUS_MCEERR_AR) is passed to user space and si_addr_lsb field is
>> initialized.
>> 
>> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>> (fix new __do_user_fault call-site)
>> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
>> Acked-by: Steve Capper <steve.capper@arm.com>
>> ---
>>  arch/arm64/mm/fault.c | 22 +++++++++++++++++++---
>>  1 file changed, 19 insertions(+), 3 deletions(-)
>> 
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 37b95dff0b07..a85b44343ac6 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -31,6 +31,7 @@
>>  #include <linux/highmem.h>
>>  #include <linux/perf_event.h>
>>  #include <linux/preempt.h>
>> +#include <linux/hugetlb.h>
>>  
>>  #include <asm/bug.h>
>>  #include <asm/cpufeature.h>
>> @@ -239,10 +240,11 @@ static void __do_kernel_fault(struct mm_struct *mm, unsigned long addr,
>>   */
>>  static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
>>  			    unsigned int esr, unsigned int sig, int code,
>> -			    struct pt_regs *regs)
>> +			    struct pt_regs *regs, int fault)
>>  {
>>  	struct siginfo si;
>>  	const struct fault_info *inf;
>> +	unsigned int lsb = 0;
>>  
>>  	if (unhandled_signal(tsk, sig) && show_unhandled_signals_ratelimited()) {
>>  		inf = esr_to_fault_info(esr);
>> @@ -259,6 +261,17 @@ static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
>>  	si.si_errno = 0;
>>  	si.si_code = code;
>>  	si.si_addr = (void __user *)addr;
>> +	/*
>> +	 * Either small page or large page may be poisoned.
>> +	 * In other words, VM_FAULT_HWPOISON_LARGE and
>> +	 * VM_FAULT_HWPOISON are mutually exclusive.
>> +	 */
>> +	if (fault & VM_FAULT_HWPOISON_LARGE)
>> +		lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
>> +	else if (fault & VM_FAULT_HWPOISON)
>> +		lsb = PAGE_SHIFT;
>> +	si.si_addr_lsb = lsb;
>> +
>
> If we're going to start handling poison faults, then we should probably
> rejig the perf page fault accounting around here so that we follow x86:
>
>   * Always report PERF_COUNT_SW_PAGE_FAULTS,
>   * Don't report anything else for VM_FAULT_ERROR
>   * Report PERF_COUNT_SW_PAGE_FAULTS_MAJ if VM_FAULT_MAJOR
>   * Otherwise, report PERF_COUNT_SW_PAGE_FAULTS_MIN
>
> at the moment, I think you're accounting VM_FAULT_ERROR as
> PERF_COUNT_SW_PAGE_FAULTS_MIN, which doesn't feel right at all.

Ok. I'd missed the implication of enabling poisoned pages fault handling
on the perf accounting. I'll add a patch to update the reporting.

>
> Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] arm64: hugetlb: Fix huge_pte_offset to return poisoned page table entries
  2017-06-07 15:41           ` Will Deacon
@ 2017-06-08 16:28             ` Punit Agrawal
  0 siblings, 0 replies; 15+ messages in thread
From: Punit Agrawal @ 2017-06-08 16:28 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, David Woods, steve.capper, tbaicar,
	linux-kernel, linux-arm-kernel, manoj.iyer

Will Deacon <will.deacon@arm.com> writes:

> On Wed, Jun 07, 2017 at 04:32:28PM +0100, Punit Agrawal wrote:
>> Will Deacon <will.deacon@arm.com> writes:
>> 
>> > On Wed, Jun 07, 2017 at 03:30:37PM +0100, Catalin Marinas wrote:
>> >> On Wed, Jun 07, 2017 at 02:47:32PM +0100, Will Deacon wrote:
>> >> > On Wed, May 17, 2017 at 04:23:34PM +0100, Punit Agrawal wrote:
>> >> > > --- a/arch/arm64/mm/hugetlbpage.c
>> >> > > +++ b/arch/arm64/mm/hugetlbpage.c
>> >> > > @@ -136,36 +136,27 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
>> >> > >  {
>> >> > >  	pgd_t *pgd;
>> >> > >  	pud_t *pud;
>> >> > > -	pmd_t *pmd = NULL;
>> >> > > -	pte_t *pte = NULL;
>> >> > > +	pmd_t *pmd;
>> >> > >  
>> >> > >  	pgd = pgd_offset(mm, addr);
>> >> > >  	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
>> >> > >  	if (!pgd_present(*pgd))
>> >> > >  		return NULL;
>> >> > > +
>> >> > >  	pud = pud_offset(pgd, addr);
>> >> > > -	if (!pud_present(*pud))
>> >> > > +	if (pud_none(*pud))
>> >> > >  		return NULL;
>> >> > 
>> >> > Do you actually need this special case?
>> >> > 
>> >> > > -
>> >> > > -	if (pud_huge(*pud))
>> >> > > +	/* swap or huge page */
>> >> > > +	if (!pud_present(*pud) || pud_huge(*pud))
>> >> > 
>> >> > ... couldn't you just add a '|| pud_none(*pud)' in here?
>> >> > 
>> 
>> I think an earlier version took this approach but...
>> 
>> >> > >  		return (pte_t *)pud;
>> >> 
>> >> But then you no longer return NULL if *pud == 0.
>> >
>> > Does that actually matter? The bits of hugetlb code I looked at will
>> > deferenced the returned pud and handle the huge_pte_none case correctly.
>> 
>> For hugetlb fault handling (hugetlb_fault()), returning NULL vs pointer
>> to the pud/pmd results in different behaviour. If we return the pud when
>> pud_none(), then we lose the resulting hugepage size check we get from
>> huge_pte_alloc().
>
> Ok, so does that mean that many of the huge_pte_none checks in mm/hugetlb.c
> that operate on a huge_ptep_get of non-NULL output from huge_pte_offset are
> actually redundant?

Summarising our offline discussion - the semantics of huge_pte_offset()
are unclear in terms of when a pointer to (p*d) is returned vs when to
return NULL.

As part of enabling contiguous hugepage support[0][1], I have a patch to
add a size argument to huge_pte_offset() that can then help disambiguate
when we have a valid pte* vs when we have an error (NULL).

I'll separately look at clarifying the semantics of the generic version
of huge_pte_offse() and potentially also look at unifying the
huge_pte_offset() implementations across the various architectures.

[0] https://lkml.org/lkml/2017/5/24/463
[1] https://www.spinics.net/lists/arm-kernel/msg583367.html

>
> Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-06-08 16:28 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-17 15:23 [PATCH v2 0/3] arm64: Add support for handling memory corruption Punit Agrawal
2017-05-17 15:23 ` [PATCH v2 1/3] arm64: hugetlb: Fix huge_pte_offset to return poisoned page table entries Punit Agrawal
2017-06-07 13:47   ` Will Deacon
2017-06-07 14:30     ` Catalin Marinas
2017-06-07 14:57       ` Will Deacon
2017-06-07 15:32         ` Punit Agrawal
2017-06-07 15:41           ` Will Deacon
2017-06-08 16:28             ` Punit Agrawal
2017-06-07 16:54           ` Catalin Marinas
2017-06-07 14:58   ` Catalin Marinas
2017-05-17 15:23 ` [PATCH v2 2/3] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling Punit Agrawal
2017-06-07 13:59   ` Will Deacon
2017-06-07 17:47     ` Punit Agrawal
2017-05-17 15:23 ` [PATCH v2 3/3] arm64: kconfig: allow support for memory failure handling Punit Agrawal
     [not found] ` <1495081650.3981.15@smtp.canonical.com>
2017-05-18 10:27   ` [PATCH v2 0/3] arm64: Add support for handling memory corruption Punit Agrawal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).