linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V2] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
@ 2017-02-21 23:36 Tyler Baicar
  2017-03-02 18:28 ` Punit Agrawal
  0 siblings, 1 reply; 11+ messages in thread
From: Tyler Baicar @ 2017-02-21 23:36 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, mark.rutland, james.morse, akpm,
	paul.gortmaker, sandeepa.s.prabhu, shijie.huang,
	linux-arm-kernel, linux-kernel, punit.agrawal
  Cc: Jonathan (Zhixiong) Zhang, Tyler Baicar

From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>

Add VM_FAULT_HWPOISON[_LARGE] handling to the arm64 page fault
handler. Handling of VM_FAULT_HWPOISON[_LARGE] is very similar
to VM_FAULT_OOM, the only difference is that a different si_code
(BUS_MCEERR_AR) is passed to user space and si_addr_lsb field is
initialized.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
---
 arch/arm64/mm/fault.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 156169c..ceaa82f 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -30,6 +30,7 @@
 #include <linux/highmem.h>
 #include <linux/perf_event.h>
 #include <linux/preempt.h>
+#include <linux/hugetlb.h>
 
 #include <asm/bug.h>
 #include <asm/cpufeature.h>
@@ -193,9 +194,10 @@ static void __do_kernel_fault(struct mm_struct *mm, unsigned long addr,
  */
 static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
 			    unsigned int esr, unsigned int sig, int code,
-			    struct pt_regs *regs)
+			    struct pt_regs *regs, int fault)
 {
 	struct siginfo si;
+	unsigned int lsb = 0;
 
 	if (unhandled_signal(tsk, sig) && show_unhandled_signals_ratelimited()) {
 		pr_info("%s[%d]: unhandled %s (%d) at 0x%08lx, esr 0x%03x\n",
@@ -211,6 +213,17 @@ static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
 	si.si_errno = 0;
 	si.si_code = code;
 	si.si_addr = (void __user *)addr;
+	/*
+	 * Either small page or large page may be poisoned.
+	 * In other words, VM_FAULT_HWPOISON_LARGE and
+	 * VM_FAULT_HWPOISON are mutually exclusive.
+	 */
+	if (fault & VM_FAULT_HWPOISON_LARGE)
+		lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
+	else if (fault & VM_FAULT_HWPOISON)
+		lsb = PAGE_SHIFT;
+	si.si_addr_lsb = lsb;
+
 	force_sig_info(sig, &si, tsk);
 }
 
@@ -224,7 +237,7 @@ static void do_bad_area(unsigned long addr, unsigned int esr, struct pt_regs *re
 	 * handle this fault with.
 	 */
 	if (user_mode(regs))
-		__do_user_fault(tsk, addr, esr, SIGSEGV, SEGV_MAPERR, regs);
+		__do_user_fault(tsk, addr, esr, SIGSEGV, SEGV_MAPERR, regs, 0);
 	else
 		__do_kernel_fault(mm, addr, esr, regs);
 }
@@ -426,6 +439,9 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 		 */
 		sig = SIGBUS;
 		code = BUS_ADRERR;
+	} else if (fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE)) {
+		sig = SIGBUS;
+		code = BUS_MCEERR_AR;
 	} else {
 		/*
 		 * Something tried to access memory that isn't in our memory
@@ -436,7 +452,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 			SEGV_ACCERR : SEGV_MAPERR;
 	}
 
-	__do_user_fault(tsk, addr, esr, sig, code, regs);
+	__do_user_fault(tsk, addr, esr, sig, code, regs, fault);
 	return 0;
 
 no_context:
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH V2] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
  2017-02-21 23:36 [PATCH V2] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling Tyler Baicar
@ 2017-03-02 18:28 ` Punit Agrawal
  2017-03-07 19:56   ` Punit Agrawal
  0 siblings, 1 reply; 11+ messages in thread
From: Punit Agrawal @ 2017-03-02 18:28 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: catalin.marinas, will.deacon, mark.rutland, james.morse, akpm,
	paul.gortmaker, sandeepa.s.prabhu, shijie.huang,
	linux-arm-kernel, linux-kernel, Jonathan (Zhixiong) Zhang

Hi Tyler,

Tyler Baicar <tbaicar@codeaurora.org> writes:

> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
>
> Add VM_FAULT_HWPOISON[_LARGE] handling to the arm64 page fault
> handler. Handling of VM_FAULT_HWPOISON[_LARGE] is very similar
> to VM_FAULT_OOM, the only difference is that a different si_code
> (BUS_MCEERR_AR) is passed to user space and si_addr_lsb field is
> initialized.
>
> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
> ---
>  arch/arm64/mm/fault.c | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 156169c..ceaa82f 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -30,6 +30,7 @@
>  #include <linux/highmem.h>
>  #include <linux/perf_event.h>
>  #include <linux/preempt.h>
> +#include <linux/hugetlb.h>
>  
>  #include <asm/bug.h>
>  #include <asm/cpufeature.h>
> @@ -193,9 +194,10 @@ static void __do_kernel_fault(struct mm_struct *mm, unsigned long addr,
>   */
>  static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
>  			    unsigned int esr, unsigned int sig, int code,
> -			    struct pt_regs *regs)
> +			    struct pt_regs *regs, int fault)
>  {
>  	struct siginfo si;
> +	unsigned int lsb = 0;
>  
>  	if (unhandled_signal(tsk, sig) && show_unhandled_signals_ratelimited()) {
>  		pr_info("%s[%d]: unhandled %s (%d) at 0x%08lx, esr 0x%03x\n",
> @@ -211,6 +213,17 @@ static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
>  	si.si_errno = 0;
>  	si.si_code = code;
>  	si.si_addr = (void __user *)addr;
> +	/*
> +	 * Either small page or large page may be poisoned.
> +	 * In other words, VM_FAULT_HWPOISON_LARGE and
> +	 * VM_FAULT_HWPOISON are mutually exclusive.
> +	 */
> +	if (fault & VM_FAULT_HWPOISON_LARGE)
> +		lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
> +	else if (fault & VM_FAULT_HWPOISON)
> +		lsb = PAGE_SHIFT;
> +	si.si_addr_lsb = lsb;
> +
>  	force_sig_info(sig, &si, tsk);
>  }
>  
> @@ -224,7 +237,7 @@ static void do_bad_area(unsigned long addr, unsigned int esr, struct pt_regs *re
>  	 * handle this fault with.
>  	 */
>  	if (user_mode(regs))
> -		__do_user_fault(tsk, addr, esr, SIGSEGV, SEGV_MAPERR, regs);
> +		__do_user_fault(tsk, addr, esr, SIGSEGV, SEGV_MAPERR, regs, 0);
>  	else
>  		__do_kernel_fault(mm, addr, esr, regs);
>  }
> @@ -426,6 +439,9 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
>  		 */
>  		sig = SIGBUS;
>  		code = BUS_ADRERR;
> +	} else if (fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE)) {
> +		sig = SIGBUS;
> +		code = BUS_MCEERR_AR;
>  	} else {
>  		/*
>  		 * Something tried to access memory that isn't in our memory
> @@ -436,7 +452,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
>  			SEGV_ACCERR : SEGV_MAPERR;
>  	}
>  
> -	__do_user_fault(tsk, addr, esr, sig, code, regs);
> +	__do_user_fault(tsk, addr, esr, sig, code, regs, fault);
>  	return 0;
>  
>  no_context:

The code looks good but I ran into some failures while running the
hugepages hwpoison tests from mce-tests suite[0]. I get a bad pmd error
in dmesg -

[  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.

I suspect that this is due to the huge pte accessors not correctly
dealing with poisoned entries (which are represented as swap entries).

I am investigating the failure but could you try running the tests at
your end as well.

To run the tests, I cloned the repository[0]. It test needs a simple fix
at the end of this mail to run correctly. With that applied and running
as root -

# cd mce-test/cases/function/hwpoison
# ./run_hugepage.sh


[0] https://git.kernel.org/cgit/utils/cpu/mce/mce-test.git/

--------->8--------------
commit cb5c61f18dd86baf01b90404d4ecf51dd3d176c7
Author: Punit Agrawal <punit.agrawal@arm.com>
Date:   Thu Mar 2 18:24:40 2017 +0000

    Use correct return type for getopt_long

    getopt_long returns an int. Fix the return type to avoid issues when
    checking for negative error codes on architectures with unsigned char,
    e.g., arm.

    Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>

diff --git a/cases/function/hwpoison/thugetlb.c b/cases/function/hwpoison/thugetlb.c
index 92dc7d2..fbcf426 100644
--- a/cases/function/hwpoison/thugetlb.c
+++ b/cases/function/hwpoison/thugetlb.c
@@ -125,7 +125,7 @@ int main(int argc, char *argv[])
        int forkflag = 0;
        int privateflag = 0;
        int cowflag = 0;
-   char c;
+ int c;
        pid_t pid = 0;
        void *expected_addr = NULL;
        struct sembuf sembuffer;

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH V2] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
  2017-03-02 18:28 ` Punit Agrawal
@ 2017-03-07 19:56   ` Punit Agrawal
  2017-03-07 20:28     ` Baicar, Tyler
  0 siblings, 1 reply; 11+ messages in thread
From: Punit Agrawal @ 2017-03-07 19:56 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: catalin.marinas, will.deacon, mark.rutland, james.morse, akpm,
	paul.gortmaker, sandeepa.s.prabhu, shijie.huang,
	linux-arm-kernel, linux-kernel, Jonathan (Zhixiong) Zhang

Punit Agrawal <punit.agrawal@arm.com> writes:

[...]

>
> The code looks good but I ran into some failures while running the
> hugepages hwpoison tests from mce-tests suite[0]. I get a bad pmd error
> in dmesg -
>
> [  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.
>
> I suspect that this is due to the huge pte accessors not correctly
> dealing with poisoned entries (which are represented as swap entries).

I think I've got to the bottom of the issue - the problem is due to
huge_pte_at() returning NULL for poisoned pmd entries (which in turn is
due to pmd_present() not handling poisoned pmd entries correctly)

The following is the call chain for the failure case.

do_munmap
  unmap_region
    unmap_vmas
      unmap_single_vma
        __unmap_hugepage_range_final    # The test case uses hugepages
          __unmap_hugepage_range
            huge_pte_offset             # Returns NULL for a poisoned pmd

Reverting 5bb1cc0ff9a6 ("arm64: Ensure pmd_present() returns false after
pmd_mknotpresent()") fixes the problem for me but I don't think that is
the right fix.

While I work on a proper fix, it would be great if you can confirm that
reverting 5bb1cc0ff9a6 makes the problem go away at your end.

>
> I am investigating the failure but could you try running the tests at
> your end as well.
>
> To run the tests, I cloned the repository[0]. It test needs a simple fix
> at the end of this mail to run correctly. With that applied and running
> as root -
>
> # cd mce-test/cases/function/hwpoison
> # ./run_hugepage.sh
>
>
> [0] https://git.kernel.org/cgit/utils/cpu/mce/mce-test.git/
>
> --------->8--------------
> commit cb5c61f18dd86baf01b90404d4ecf51dd3d176c7
> Author: Punit Agrawal <punit.agrawal@arm.com>
> Date:   Thu Mar 2 18:24:40 2017 +0000
>
>     Use correct return type for getopt_long
>
>     getopt_long returns an int. Fix the return type to avoid issues when
>     checking for negative error codes on architectures with unsigned char,
>     e.g., arm.
>
>     Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
>
> diff --git a/cases/function/hwpoison/thugetlb.c b/cases/function/hwpoison/thugetlb.c
> index 92dc7d2..fbcf426 100644
> --- a/cases/function/hwpoison/thugetlb.c
> +++ b/cases/function/hwpoison/thugetlb.c
> @@ -125,7 +125,7 @@ int main(int argc, char *argv[])
>         int forkflag = 0;
>         int privateflag = 0;
>         int cowflag = 0;
> -   char c;
> + int c;
>         pid_t pid = 0;
>         void *expected_addr = NULL;
>         struct sembuf sembuffer;

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V2] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
  2017-03-07 19:56   ` Punit Agrawal
@ 2017-03-07 20:28     ` Baicar, Tyler
  2017-03-09 17:46       ` Punit Agrawal
  0 siblings, 1 reply; 11+ messages in thread
From: Baicar, Tyler @ 2017-03-07 20:28 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: catalin.marinas, will.deacon, mark.rutland, james.morse, akpm,
	paul.gortmaker, sandeepa.s.prabhu, shijie.huang,
	linux-arm-kernel, linux-kernel, Jonathan (Zhixiong) Zhang

On 3/7/2017 12:56 PM, Punit Agrawal wrote:
> Punit Agrawal <punit.agrawal@arm.com> writes:
>
> [...]
>
>> The code looks good but I ran into some failures while running the
>> hugepages hwpoison tests from mce-tests suite[0]. I get a bad pmd error
>> in dmesg -
>>
>> [  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.
>>
>> I suspect that this is due to the huge pte accessors not correctly
>> dealing with poisoned entries (which are represented as swap entries).
> I think I've got to the bottom of the issue - the problem is due to
> huge_pte_at() returning NULL for poisoned pmd entries (which in turn is
> due to pmd_present() not handling poisoned pmd entries correctly)
>
> The following is the call chain for the failure case.
>
> do_munmap
>    unmap_region
>      unmap_vmas
>        unmap_single_vma
>          __unmap_hugepage_range_final    # The test case uses hugepages
>            __unmap_hugepage_range
>              huge_pte_offset             # Returns NULL for a poisoned pmd
>
> Reverting 5bb1cc0ff9a6 ("arm64: Ensure pmd_present() returns false after
> pmd_mknotpresent()") fixes the problem for me but I don't think that is
> the right fix.
>
> While I work on a proper fix, it would be great if you can confirm that
> reverting 5bb1cc0ff9a6 makes the problem go away at your end.
Thanks Punit! I haven't got a chance to do this yet, but I will let you 
know once I get it tested :)
>
>> I am investigating the failure but could you try running the tests at
>> your end as well.
>>
>> To run the tests, I cloned the repository[0]. It test needs a simple fix
>> at the end of this mail to run correctly. With that applied and running
>> as root -
>>
>> # cd mce-test/cases/function/hwpoison
>> # ./run_hugepage.sh
>>
>>
>> [0] https://git.kernel.org/cgit/utils/cpu/mce/mce-test.git/
>>
>> --------->8--------------
>> commit cb5c61f18dd86baf01b90404d4ecf51dd3d176c7
>> Author: Punit Agrawal <punit.agrawal@arm.com>
>> Date:   Thu Mar 2 18:24:40 2017 +0000
>>
>>      Use correct return type for getopt_long
>>
>>      getopt_long returns an int. Fix the return type to avoid issues when
>>      checking for negative error codes on architectures with unsigned char,
>>      e.g., arm.
>>
>>      Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
>>
>> diff --git a/cases/function/hwpoison/thugetlb.c b/cases/function/hwpoison/thugetlb.c
>> index 92dc7d2..fbcf426 100644
>> --- a/cases/function/hwpoison/thugetlb.c
>> +++ b/cases/function/hwpoison/thugetlb.c
>> @@ -125,7 +125,7 @@ int main(int argc, char *argv[])
>>          int forkflag = 0;
>>          int privateflag = 0;
>>          int cowflag = 0;
>> -   char c;
>> + int c;
>>          pid_t pid = 0;
>>          void *expected_addr = NULL;
>>          struct sembuf sembuffer;

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V2] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
  2017-03-07 20:28     ` Baicar, Tyler
@ 2017-03-09 17:46       ` Punit Agrawal
  2017-03-10 18:06         ` Baicar, Tyler
  2017-03-15 11:19         ` Catalin Marinas
  0 siblings, 2 replies; 11+ messages in thread
From: Punit Agrawal @ 2017-03-09 17:46 UTC (permalink / raw)
  To: Baicar, Tyler
  Cc: catalin.marinas, will.deacon, mark.rutland, james.morse, akpm,
	paul.gortmaker, sandeepa.s.prabhu, shijie.huang,
	linux-arm-kernel, linux-kernel, Jonathan (Zhixiong) Zhang,
	Steve Capper

[ +steve for arm64 mm and hugepages chops ]

"Baicar, Tyler" <tbaicar@codeaurora.org> writes:

> On 3/7/2017 12:56 PM, Punit Agrawal wrote:
>> Punit Agrawal <punit.agrawal@arm.com> writes:
>>
>> [...]
>>
>>> The code looks good but I ran into some failures while running the
>>> hugepages hwpoison tests from mce-tests suite[0]. I get a bad pmd error
>>> in dmesg -
>>>
>>> [  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.
>>>
>>> I suspect that this is due to the huge pte accessors not correctly
>>> dealing with poisoned entries (which are represented as swap entries).
>> I think I've got to the bottom of the issue - the problem is due to
>> huge_pte_at() returning NULL for poisoned pmd entries (which in turn is
>> due to pmd_present() not handling poisoned pmd entries correctly)
>>
>> The following is the call chain for the failure case.
>>
>> do_munmap
>>    unmap_region
>>      unmap_vmas
>>        unmap_single_vma
>>          __unmap_hugepage_range_final    # The test case uses hugepages
>>            __unmap_hugepage_range
>>              huge_pte_offset             # Returns NULL for a poisoned pmd
>>
>> Reverting 5bb1cc0ff9a6 ("arm64: Ensure pmd_present() returns false after
>> pmd_mknotpresent()") fixes the problem for me but I don't think that is
>> the right fix.
>>
>> While I work on a proper fix, it would be great if you can confirm that
>> reverting 5bb1cc0ff9a6 makes the problem go away at your end.
> Thanks Punit! I haven't got a chance to do this yet, but I will let
> you know once I get it tested :)

This time with a patch. Please test this instead.

After a lot of head scratching, I've bit the bullet and added a check to
return the poisoned entry from huge_pte_offset(). What with having to
deal with contiguous hugepages et al., there just doesn't seem to be any
leeway in how we handle the situation here.

Let's see if there are any other ideas. Patch follows.

Thanks,
Punit

----------->8-------------
>From d5ad3f428e629c80b0f93f2bbdf99b4cae28c9bc Mon Sep 17 00:00:00 2001
From: Punit Agrawal <punit.agrawal@arm.com>
Date: Thu, 9 Mar 2017 16:16:29 +0000
Subject: [PATCH] arm64: hugetlb: Fix huge_pte_offset to return poisoned pmd

When memory failure is enabled, a poisoned hugepage PMD is marked as a
swap entry. As pmd_present() only checks for VALID and PROT_NONE
bits (turned off for swap entries), it causues huge_pte_offset() to
return NULL for poisoned PMDs.

This behaviour of huge_pte_offset() leads to the error such as below
when munmap is called on poisoned hugepages.

[  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.

Fix huge_pte_offset() to return the poisoned PMD which is then
appropriately handled by the generic layer code.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
---
 arch/arm64/mm/hugetlbpage.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index e25584d72396..9263f206353c 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -150,8 +150,17 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
        if (pud_huge(*pud))
                return (pte_t *)pud;
        pmd = pmd_offset(pud, addr);
+
+       /*
+        * In case of HW Poisoning, a hugepage pmd can contain
+        * poisoned entries. Poisoned entries are marked as swap
+        * entries.
+        *
+        * For pmds that are not present, check to see if it could be
+        * a swap entry (!present and !none) before giving up.
+        */
        if (!pmd_present(*pmd))
-               return NULL;
+               return !pmd_none(*pmd) ? (pte_t *)pmd : NULL;

        if (pte_cont(pmd_pte(*pmd))) {
                pmd = pmd_offset(
--
2.11.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH V2] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
  2017-03-09 17:46       ` Punit Agrawal
@ 2017-03-10 18:06         ` Baicar, Tyler
  2017-03-14 16:20           ` Punit Agrawal
  2017-03-15 11:19         ` Catalin Marinas
  1 sibling, 1 reply; 11+ messages in thread
From: Baicar, Tyler @ 2017-03-10 18:06 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: catalin.marinas, will.deacon, mark.rutland, james.morse, akpm,
	paul.gortmaker, sandeepa.s.prabhu, shijie.huang,
	linux-arm-kernel, linux-kernel, Jonathan (Zhixiong) Zhang,
	Steve Capper

Hello Punit,

I ran the test with and without the kernel patch you're suggesting 
below. I do not see the "bad pmd ..." print that you are seeing in 
either case. All the tests are not passing for me though, it runs 42 
test cases and 14 show up as failed for some reason.

Thanks,

Tyler


On 3/9/2017 10:46 AM, Punit Agrawal wrote:
> [ +steve for arm64 mm and hugepages chops ]
>
> "Baicar, Tyler" <tbaicar@codeaurora.org> writes:
>
>> On 3/7/2017 12:56 PM, Punit Agrawal wrote:
>>> Punit Agrawal <punit.agrawal@arm.com> writes:
>>>
>>> [...]
>>>
>>>> The code looks good but I ran into some failures while running the
>>>> hugepages hwpoison tests from mce-tests suite[0]. I get a bad pmd error
>>>> in dmesg -
>>>>
>>>> [  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.
>>>>
>>>> I suspect that this is due to the huge pte accessors not correctly
>>>> dealing with poisoned entries (which are represented as swap entries).
>>> I think I've got to the bottom of the issue - the problem is due to
>>> huge_pte_at() returning NULL for poisoned pmd entries (which in turn is
>>> due to pmd_present() not handling poisoned pmd entries correctly)
>>>
>>> The following is the call chain for the failure case.
>>>
>>> do_munmap
>>>     unmap_region
>>>       unmap_vmas
>>>         unmap_single_vma
>>>           __unmap_hugepage_range_final    # The test case uses hugepages
>>>             __unmap_hugepage_range
>>>               huge_pte_offset             # Returns NULL for a poisoned pmd
>>>
>>> Reverting 5bb1cc0ff9a6 ("arm64: Ensure pmd_present() returns false after
>>> pmd_mknotpresent()") fixes the problem for me but I don't think that is
>>> the right fix.
>>>
>>> While I work on a proper fix, it would be great if you can confirm that
>>> reverting 5bb1cc0ff9a6 makes the problem go away at your end.
>> Thanks Punit! I haven't got a chance to do this yet, but I will let
>> you know once I get it tested :)
> This time with a patch. Please test this instead.
>
> After a lot of head scratching, I've bit the bullet and added a check to
> return the poisoned entry from huge_pte_offset(). What with having to
> deal with contiguous hugepages et al., there just doesn't seem to be any
> leeway in how we handle the situation here.
>
> Let's see if there are any other ideas. Patch follows.
>
> Thanks,
> Punit
>
> ----------->8-------------
>  From d5ad3f428e629c80b0f93f2bbdf99b4cae28c9bc Mon Sep 17 00:00:00 2001
> From: Punit Agrawal <punit.agrawal@arm.com>
> Date: Thu, 9 Mar 2017 16:16:29 +0000
> Subject: [PATCH] arm64: hugetlb: Fix huge_pte_offset to return poisoned pmd
>
> When memory failure is enabled, a poisoned hugepage PMD is marked as a
> swap entry. As pmd_present() only checks for VALID and PROT_NONE
> bits (turned off for swap entries), it causues huge_pte_offset() to
> return NULL for poisoned PMDs.
>
> This behaviour of huge_pte_offset() leads to the error such as below
> when munmap is called on poisoned hugepages.
>
> [  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.
>
> Fix huge_pte_offset() to return the poisoned PMD which is then
> appropriately handled by the generic layer code.
>
> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Steve Capper <steve.capper@arm.com>
> ---
>   arch/arm64/mm/hugetlbpage.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index e25584d72396..9263f206353c 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -150,8 +150,17 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
>          if (pud_huge(*pud))
>                  return (pte_t *)pud;
>          pmd = pmd_offset(pud, addr);
> +
> +       /*
> +        * In case of HW Poisoning, a hugepage pmd can contain
> +        * poisoned entries. Poisoned entries are marked as swap
> +        * entries.
> +        *
> +        * For pmds that are not present, check to see if it could be
> +        * a swap entry (!present and !none) before giving up.
> +        */
>          if (!pmd_present(*pmd))
> -               return NULL;
> +               return !pmd_none(*pmd) ? (pte_t *)pmd : NULL;
>
>          if (pte_cont(pmd_pte(*pmd))) {
>                  pmd = pmd_offset(
> --
> 2.11.0

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V2] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
  2017-03-10 18:06         ` Baicar, Tyler
@ 2017-03-14 16:20           ` Punit Agrawal
  0 siblings, 0 replies; 11+ messages in thread
From: Punit Agrawal @ 2017-03-14 16:20 UTC (permalink / raw)
  To: Baicar, Tyler
  Cc: catalin.marinas, will.deacon, mark.rutland, james.morse, akpm,
	paul.gortmaker, sandeepa.s.prabhu, shijie.huang,
	linux-arm-kernel, linux-kernel, Jonathan (Zhixiong) Zhang,
	Steve Capper

Hi Tyler,

"Baicar, Tyler" <tbaicar@codeaurora.org> writes:

> Hello Punit,
>
> I ran the test with and without the kernel patch you're suggesting
> below. I do not see the "bad pmd ..." print that you are seeing in
> either case.

Thanks for trying out the patch. It's important to understand why we are
seeing the difference in behaviour.

Looking at the code path, you should be hitting the "bad pmd" pr_err in
dmesg. Any chance either hugepages or the memory failure configs weren't
enabled in the test kernel?

The test script (run_hugepage.sh) isn't particularly robust. It carries
on executing even though some of the pre-conditions are not satisfied. I
had seen the script continue even though some of the dependencies were
missing from "bin" directory in the mce-test repo (Fixed by running
"make install" in tools/page-types".

Also, I reduced the console output and dmesg noise by executing only the
failing test in run_hugepage.sh -

"exec_testcase head late_touch file fork_shared killed".

Can you try re-running with the other tests commented out?

> All the tests are not passing for me though, it runs 42
> test cases and 14 show up as failed for some reason.

I see similar behaviour. I think the failures are due to timing
sensitivity when synchronising multi-process test cases - I saw a
comment implying this somewhere but can't seem to find it now.

Thanks,
Punit

>
> Thanks,
>
> Tyler
>
>
> On 3/9/2017 10:46 AM, Punit Agrawal wrote:
>> [ +steve for arm64 mm and hugepages chops ]
>>
>> "Baicar, Tyler" <tbaicar@codeaurora.org> writes:
>>
>>> On 3/7/2017 12:56 PM, Punit Agrawal wrote:
>>>> Punit Agrawal <punit.agrawal@arm.com> writes:
>>>>
>>>> [...]
>>>>
>>>>> The code looks good but I ran into some failures while running the
>>>>> hugepages hwpoison tests from mce-tests suite[0]. I get a bad pmd error
>>>>> in dmesg -
>>>>>
>>>>> [  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.
>>>>>
>>>>> I suspect that this is due to the huge pte accessors not correctly
>>>>> dealing with poisoned entries (which are represented as swap entries).
>>>> I think I've got to the bottom of the issue - the problem is due to
>>>> huge_pte_at() returning NULL for poisoned pmd entries (which in turn is
>>>> due to pmd_present() not handling poisoned pmd entries correctly)
>>>>
>>>> The following is the call chain for the failure case.
>>>>
>>>> do_munmap
>>>>     unmap_region
>>>>       unmap_vmas
>>>>         unmap_single_vma
>>>>           __unmap_hugepage_range_final    # The test case uses hugepages
>>>>             __unmap_hugepage_range
>>>>               huge_pte_offset             # Returns NULL for a poisoned pmd
>>>>
>>>> Reverting 5bb1cc0ff9a6 ("arm64: Ensure pmd_present() returns false after
>>>> pmd_mknotpresent()") fixes the problem for me but I don't think that is
>>>> the right fix.
>>>>
>>>> While I work on a proper fix, it would be great if you can confirm that
>>>> reverting 5bb1cc0ff9a6 makes the problem go away at your end.
>>> Thanks Punit! I haven't got a chance to do this yet, but I will let
>>> you know once I get it tested :)
>> This time with a patch. Please test this instead.
>>
>> After a lot of head scratching, I've bit the bullet and added a check to
>> return the poisoned entry from huge_pte_offset(). What with having to
>> deal with contiguous hugepages et al., there just doesn't seem to be any
>> leeway in how we handle the situation here.
>>
>> Let's see if there are any other ideas. Patch follows.
>>
>> Thanks,
>> Punit
>>
>> ----------->8-------------
>>  From d5ad3f428e629c80b0f93f2bbdf99b4cae28c9bc Mon Sep 17 00:00:00 2001
>> From: Punit Agrawal <punit.agrawal@arm.com>
>> Date: Thu, 9 Mar 2017 16:16:29 +0000
>> Subject: [PATCH] arm64: hugetlb: Fix huge_pte_offset to return poisoned pmd
>>
>> When memory failure is enabled, a poisoned hugepage PMD is marked as a
>> swap entry. As pmd_present() only checks for VALID and PROT_NONE
>> bits (turned off for swap entries), it causues huge_pte_offset() to
>> return NULL for poisoned PMDs.
>>
>> This behaviour of huge_pte_offset() leads to the error such as below
>> when munmap is called on poisoned hugepages.
>>
>> [  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.
>>
>> Fix huge_pte_offset() to return the poisoned PMD which is then
>> appropriately handled by the generic layer code.
>>
>> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Steve Capper <steve.capper@arm.com>
>> ---
>>   arch/arm64/mm/hugetlbpage.c | 11 ++++++++++-
>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>> index e25584d72396..9263f206353c 100644
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -150,8 +150,17 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
>>          if (pud_huge(*pud))
>>                  return (pte_t *)pud;
>>          pmd = pmd_offset(pud, addr);
>> +
>> +       /*
>> +        * In case of HW Poisoning, a hugepage pmd can contain
>> +        * poisoned entries. Poisoned entries are marked as swap
>> +        * entries.
>> +        *
>> +        * For pmds that are not present, check to see if it could be
>> +        * a swap entry (!present and !none) before giving up.
>> +        */
>>          if (!pmd_present(*pmd))
>> -               return NULL;
>> +               return !pmd_none(*pmd) ? (pte_t *)pmd : NULL;
>>
>>          if (pte_cont(pmd_pte(*pmd))) {
>>                  pmd = pmd_offset(
>> --
>> 2.11.0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V2] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
  2017-03-09 17:46       ` Punit Agrawal
  2017-03-10 18:06         ` Baicar, Tyler
@ 2017-03-15 11:19         ` Catalin Marinas
  2017-03-15 16:07           ` Steve Capper
  2017-03-15 18:49           ` Punit Agrawal
  1 sibling, 2 replies; 11+ messages in thread
From: Catalin Marinas @ 2017-03-15 11:19 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: Baicar, Tyler, mark.rutland, Jonathan (Zhixiong) Zhang,
	Steve Capper, will.deacon, linux-kernel, shijie.huang,
	paul.gortmaker, james.morse, sandeepa.s.prabhu, akpm,
	linux-arm-kernel, David Woods

Hi Punit,

Adding David Woods since he seems to have added the arm64-specific
huge_pte_offset() code.

On Thu, Mar 09, 2017 at 05:46:36PM +0000, Punit Agrawal wrote:
> From d5ad3f428e629c80b0f93f2bbdf99b4cae28c9bc Mon Sep 17 00:00:00 2001
> From: Punit Agrawal <punit.agrawal@arm.com>
> Date: Thu, 9 Mar 2017 16:16:29 +0000
> Subject: [PATCH] arm64: hugetlb: Fix huge_pte_offset to return poisoned pmd
> 
> When memory failure is enabled, a poisoned hugepage PMD is marked as a
> swap entry. As pmd_present() only checks for VALID and PROT_NONE
> bits (turned off for swap entries), it causues huge_pte_offset() to
> return NULL for poisoned PMDs.
> 
> This behaviour of huge_pte_offset() leads to the error such as below
> when munmap is called on poisoned hugepages.
> 
> [  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.
> 
> Fix huge_pte_offset() to return the poisoned PMD which is then
> appropriately handled by the generic layer code.
> 
> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Steve Capper <steve.capper@arm.com>
> ---
>  arch/arm64/mm/hugetlbpage.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index e25584d72396..9263f206353c 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -150,8 +150,17 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
>         if (pud_huge(*pud))
>                 return (pte_t *)pud;
>         pmd = pmd_offset(pud, addr);
> +
> +       /*
> +        * In case of HW Poisoning, a hugepage pmd can contain
> +        * poisoned entries. Poisoned entries are marked as swap
> +        * entries.
> +        *
> +        * For pmds that are not present, check to see if it could be
> +        * a swap entry (!present and !none) before giving up.
> +        */
>         if (!pmd_present(*pmd))
> -               return NULL;
> +               return !pmd_none(*pmd) ? (pte_t *)pmd : NULL;

I'm not sure we need to return NULL here when pmd_none(). If we use
hugetlb at the pmd level we don't need to allocate a pmd page but just
fall back to hugetlb_no_page() in hugetlb_fault(). The problem is we
can't tell what kind of huge page we have when calling
huge_pte_offset(), so we always rely on huge_pte_alloc(). But there are
places where huge_pte_none() is checked explicitly and we would never
return it from huge_pte_get().

Can we improve the generic code to pass the huge page size to
huge_pte_offset()? Otherwise we make all kind of assumptions/guesses in
the arch code.

> 
>         if (pte_cont(pmd_pte(*pmd))) {
>                 pmd = pmd_offset(

Given that we can have huge pages at the pud level, we should address
that as well. The generic huge_pte_offset() doesn't need to since it
assumes huge pages at the pmd level only. If a pud is not present, you
can't dereference it to find the pmd, hence returning NULL.

Apart from hw poisoning, I think another use-case for non-present
pmd/pud entries is is_hugetlb_entry_migration() (see hugetlb_fault()),
so we need to fix this either way.

We have a discrepancy between the pud_present and pmd_present. The
latter was modified to fall back on pte_present because of THP which
does not support puds (last time I checked). So if a pud is poisoned,
huge_pte_offset thinks it is present and will try to get the pmd it
points to.

I think we can leave the pud_present() unchanged but fix the
huge_pte_offset() to check for pud_table() before dereferencing,
otherwise returning the actual value. And we need to figure out which
huge page size we have when the pud/pmd is 0.

-- 
Catalin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V2] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
  2017-03-15 11:19         ` Catalin Marinas
@ 2017-03-15 16:07           ` Steve Capper
  2017-03-15 16:42             ` Catalin Marinas
  2017-03-15 18:49           ` Punit Agrawal
  1 sibling, 1 reply; 11+ messages in thread
From: Steve Capper @ 2017-03-15 16:07 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Punit Agrawal, Mark Rutland, Steve Capper, Sandeepa Prabhu,
	David Woods, Baicar, Tyler, Will Deacon, linux-kernel,
	shijie.huang, james.morse, linux-arm-kernel,
	Jonathan (Zhixiong) Zhang, akpm, paul.gortmaker

Hi,
Sorry for replying to this thread late.

On 15 March 2017 at 11:19, Catalin Marinas <catalin.marinas@arm.com> wrote:
> Hi Punit,
>
> Adding David Woods since he seems to have added the arm64-specific
> huge_pte_offset() code.
>
> On Thu, Mar 09, 2017 at 05:46:36PM +0000, Punit Agrawal wrote:
>> From d5ad3f428e629c80b0f93f2bbdf99b4cae28c9bc Mon Sep 17 00:00:00 2001
>> From: Punit Agrawal <punit.agrawal@arm.com>
>> Date: Thu, 9 Mar 2017 16:16:29 +0000
>> Subject: [PATCH] arm64: hugetlb: Fix huge_pte_offset to return poisoned pmd
>>
>> When memory failure is enabled, a poisoned hugepage PMD is marked as a
>> swap entry. As pmd_present() only checks for VALID and PROT_NONE
>> bits (turned off for swap entries), it causues huge_pte_offset() to
>> return NULL for poisoned PMDs.
>>
>> This behaviour of huge_pte_offset() leads to the error such as below
>> when munmap is called on poisoned hugepages.
>>
>> [  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.
>>
>> Fix huge_pte_offset() to return the poisoned PMD which is then
>> appropriately handled by the generic layer code.
>>
>> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Steve Capper <steve.capper@arm.com>
>> ---
>>  arch/arm64/mm/hugetlbpage.c | 11 ++++++++++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>> index e25584d72396..9263f206353c 100644
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -150,8 +150,17 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
>>         if (pud_huge(*pud))
>>                 return (pte_t *)pud;
>>         pmd = pmd_offset(pud, addr);
>> +
>> +       /*
>> +        * In case of HW Poisoning, a hugepage pmd can contain
>> +        * poisoned entries. Poisoned entries are marked as swap
>> +        * entries.
>> +        *
>> +        * For pmds that are not present, check to see if it could be
>> +        * a swap entry (!present and !none) before giving up.
>> +        */
>>         if (!pmd_present(*pmd))
>> -               return NULL;
>> +               return !pmd_none(*pmd) ? (pte_t *)pmd : NULL;
>
> I'm not sure we need to return NULL here when pmd_none(). If we use
> hugetlb at the pmd level we don't need to allocate a pmd page but just
> fall back to hugetlb_no_page() in hugetlb_fault(). The problem is we
> can't tell what kind of huge page we have when calling
> huge_pte_offset(), so we always rely on huge_pte_alloc(). But there are
> places where huge_pte_none() is checked explicitly and we would never
> return it from huge_pte_get().
>
> Can we improve the generic code to pass the huge page size to
> huge_pte_offset()? Otherwise we make all kind of assumptions/guesses in
> the arch code.

We'll certainly need the huge page size as we are unable to
differentiate between pmd and contiguous pmd for invalid entries too;
and we'll need to return a pointer to the "head" pte_t.

>
>>
>>         if (pte_cont(pmd_pte(*pmd))) {
>>                 pmd = pmd_offset(
>
> Given that we can have huge pages at the pud level, we should address
> that as well. The generic huge_pte_offset() doesn't need to since it
> assumes huge pages at the pmd level only. If a pud is not present, you
> can't dereference it to find the pmd, hence returning NULL.
>
> Apart from hw poisoning, I think another use-case for non-present
> pmd/pud entries is is_hugetlb_entry_migration() (see hugetlb_fault()),
> so we need to fix this either way.
>
> We have a discrepancy between the pud_present and pmd_present. The
> latter was modified to fall back on pte_present because of THP which
> does not support puds (last time I checked). So if a pud is poisoned,
> huge_pte_offset thinks it is present and will try to get the pmd it
> points to.
>
> I think we can leave the pud_present() unchanged but fix the
> huge_pte_offset() to check for pud_table() before dereferencing,
> otherwise returning the actual value. And we need to figure out which
> huge page size we have when the pud/pmd is 0.

I don't understand the suggestions for puds, as they won't be contiguous?

Cheers,
--
Steve

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V2] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
  2017-03-15 16:07           ` Steve Capper
@ 2017-03-15 16:42             ` Catalin Marinas
  0 siblings, 0 replies; 11+ messages in thread
From: Catalin Marinas @ 2017-03-15 16:42 UTC (permalink / raw)
  To: Steve Capper
  Cc: Mark Rutland, Baicar, Tyler, Steve Capper, David Woods,
	Punit Agrawal, Will Deacon, linux-kernel, paul.gortmaker,
	Jonathan (Zhixiong) Zhang, shijie.huang, james.morse,
	Sandeepa Prabhu, akpm, linux-arm-kernel

On Wed, Mar 15, 2017 at 04:07:20PM +0000, Steve Capper wrote:
> On 15 March 2017 at 11:19, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Thu, Mar 09, 2017 at 05:46:36PM +0000, Punit Agrawal wrote:
> >> From d5ad3f428e629c80b0f93f2bbdf99b4cae28c9bc Mon Sep 17 00:00:00 2001
> >> From: Punit Agrawal <punit.agrawal@arm.com>
> >> Date: Thu, 9 Mar 2017 16:16:29 +0000
> >> Subject: [PATCH] arm64: hugetlb: Fix huge_pte_offset to return poisoned pmd
> >>
> >> When memory failure is enabled, a poisoned hugepage PMD is marked as a
> >> swap entry. As pmd_present() only checks for VALID and PROT_NONE
> >> bits (turned off for swap entries), it causues huge_pte_offset() to
> >> return NULL for poisoned PMDs.
[...]
> > Given that we can have huge pages at the pud level, we should address
> > that as well. The generic huge_pte_offset() doesn't need to since it
> > assumes huge pages at the pmd level only. If a pud is not present, you
> > can't dereference it to find the pmd, hence returning NULL.
> >
> > Apart from hw poisoning, I think another use-case for non-present
> > pmd/pud entries is is_hugetlb_entry_migration() (see hugetlb_fault()),
> > so we need to fix this either way.
> >
> > We have a discrepancy between the pud_present and pmd_present. The
> > latter was modified to fall back on pte_present because of THP which
> > does not support puds (last time I checked). So if a pud is poisoned,
> > huge_pte_offset thinks it is present and will try to get the pmd it
> > points to.
> >
> > I think we can leave the pud_present() unchanged but fix the
> > huge_pte_offset() to check for pud_table() before dereferencing,
> > otherwise returning the actual value. And we need to figure out which
> > huge page size we have when the pud/pmd is 0.
> 
> I don't understand the suggestions for puds, as they won't be contiguous?

I wasn't thinking of the contiguous bit for pud but rather what to
return early based on present/huge/table. I think we have the cases
below:

1. pud_present() && pud_huge():
	return pud

2. pud_present() && pud_table():
	continue to pmd

3. pud_present() && !pud_huge() && !pud_table():
	return pud (huge pte poison at the pud level)

4. !pud_present() (a.k.a. pud_none()):
	a) return pud (if we have huge pages at the pud level)
	b) return NULL

At 3 I assumed that we don't poison table entries, therefore it is safe
to assume that the pud is an invalid huge page entry (poisoned,
migration).

At 4, I don't think we can currently distinguished between an empty huge
page pud and an empty table pointing further to a pmd. We could go for
NULL and assume that huge_pte_alloc() handles it properly.

-- 
Catalin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V2] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
  2017-03-15 11:19         ` Catalin Marinas
  2017-03-15 16:07           ` Steve Capper
@ 2017-03-15 18:49           ` Punit Agrawal
  1 sibling, 0 replies; 11+ messages in thread
From: Punit Agrawal @ 2017-03-15 18:49 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Baicar, Tyler, mark.rutland, Jonathan (Zhixiong) Zhang,
	Steve Capper, will.deacon, linux-kernel, shijie.huang,
	paul.gortmaker, james.morse, sandeepa.s.prabhu, akpm,
	linux-arm-kernel, David Woods

Catalin Marinas <catalin.marinas@arm.com> writes:

> Hi Punit,
>
> Adding David Woods since he seems to have added the arm64-specific
> huge_pte_offset() code.
>
> On Thu, Mar 09, 2017 at 05:46:36PM +0000, Punit Agrawal wrote:
>> From d5ad3f428e629c80b0f93f2bbdf99b4cae28c9bc Mon Sep 17 00:00:00 2001
>> From: Punit Agrawal <punit.agrawal@arm.com>
>> Date: Thu, 9 Mar 2017 16:16:29 +0000
>> Subject: [PATCH] arm64: hugetlb: Fix huge_pte_offset to return poisoned pmd
>> 
>> When memory failure is enabled, a poisoned hugepage PMD is marked as a
>> swap entry. As pmd_present() only checks for VALID and PROT_NONE
>> bits (turned off for swap entries), it causues huge_pte_offset() to
>> return NULL for poisoned PMDs.
>> 
>> This behaviour of huge_pte_offset() leads to the error such as below
>> when munmap is called on poisoned hugepages.
>> 
>> [  344.165544] mm/pgtable-generic.c:33: bad pmd 000000083af00074.
>> 
>> Fix huge_pte_offset() to return the poisoned PMD which is then
>> appropriately handled by the generic layer code.
>> 
>> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Steve Capper <steve.capper@arm.com>
>> ---
>>  arch/arm64/mm/hugetlbpage.c | 11 ++++++++++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>> index e25584d72396..9263f206353c 100644
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -150,8 +150,17 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
>>         if (pud_huge(*pud))
>>                 return (pte_t *)pud;
>>         pmd = pmd_offset(pud, addr);
>> +
>> +       /*
>> +        * In case of HW Poisoning, a hugepage pmd can contain
>> +        * poisoned entries. Poisoned entries are marked as swap
>> +        * entries.
>> +        *
>> +        * For pmds that are not present, check to see if it could be
>> +        * a swap entry (!present and !none) before giving up.
>> +        */
>>         if (!pmd_present(*pmd))
>> -               return NULL;
>> +               return !pmd_none(*pmd) ? (pte_t *)pmd : NULL;
>
> I'm not sure we need to return NULL here when pmd_none(). If we use
> hugetlb at the pmd level we don't need to allocate a pmd page but just
> fall back to hugetlb_no_page() in hugetlb_fault(). The problem is we
> can't tell what kind of huge page we have when calling
> huge_pte_offset(), so we always rely on huge_pte_alloc(). But there are
> places where huge_pte_none() is checked explicitly and we would never
> return it from huge_pte_get().

Makes sense.

>
> Can we improve the generic code to pass the huge page size to
> huge_pte_offset()? Otherwise we make all kind of assumptions/guesses in
> the arch code.

Agreed. The present fix only works for poisoned PMD entries. I'll
prototype adding size parameter and using that to disambiguate huge page
sizes. The change will touch a lot of architectures, most seem to have a
local definition of huge_pte_offset().

>
>> 
>>         if (pte_cont(pmd_pte(*pmd))) {
>>                 pmd = pmd_offset(
>
> Given that we can have huge pages at the pud level, we should address
> that as well. The generic huge_pte_offset() doesn't need to since it
> assumes huge pages at the pmd level only. If a pud is not present, you
> can't dereference it to find the pmd, hence returning NULL.
>
> Apart from hw poisoning, I think another use-case for non-present
> pmd/pud entries is is_hugetlb_entry_migration() (see hugetlb_fault()),
> so we need to fix this either way.
>
> We have a discrepancy between the pud_present and pmd_present. The
> latter was modified to fall back on pte_present because of THP which
> does not support puds (last time I checked). So if a pud is poisoned,
> huge_pte_offset thinks it is present and will try to get the pmd it
> points to.
>
> I think we can leave the pud_present() unchanged but fix the
> huge_pte_offset() to check for pud_table() before dereferencing,
> otherwise returning the actual value. And we need to figure out which
> huge page size we have when the pud/pmd is 0.

Ack. I'll add the check in the next update.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-03-15 18:49 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-21 23:36 [PATCH V2] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling Tyler Baicar
2017-03-02 18:28 ` Punit Agrawal
2017-03-07 19:56   ` Punit Agrawal
2017-03-07 20:28     ` Baicar, Tyler
2017-03-09 17:46       ` Punit Agrawal
2017-03-10 18:06         ` Baicar, Tyler
2017-03-14 16:20           ` Punit Agrawal
2017-03-15 11:19         ` Catalin Marinas
2017-03-15 16:07           ` Steve Capper
2017-03-15 16:42             ` Catalin Marinas
2017-03-15 18:49           ` Punit Agrawal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).