linux-kernel.vger.kernel.org archive mirror
* [PATCH] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
@ 2017-02-01 22:15 Tyler Baicar
From: Tyler Baicar @ 2017-02-01 22:15 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, mark.rutland, james.morse, akpm,
	zjzhang, sandeepa.s.prabhu, shijie.huang, linux-arm-kernel,
	linux-kernel
  Cc: Tyler Baicar

From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>

Add VM_FAULT_HWPOISON[_LARGE] handling to the arm64 page fault
handler. Handling of VM_FAULT_HWPOISON[_LARGE] is very similar
to that of VM_FAULT_OOM; the only differences are that a different
si_code (BUS_MCEERR_AR) is passed to user space and the si_addr_lsb
field is initialized.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
---
 arch/arm64/mm/fault.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 156169c..50857f9 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -30,6 +30,7 @@
 #include <linux/highmem.h>
 #include <linux/perf_event.h>
 #include <linux/preempt.h>
+#include <linux/hugetlb.h>
 
 #include <asm/bug.h>
 #include <asm/cpufeature.h>
@@ -193,9 +194,10 @@ static void __do_kernel_fault(struct mm_struct *mm, unsigned long addr,
  */
 static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
 			    unsigned int esr, unsigned int sig, int code,
-			    struct pt_regs *regs)
+			    struct pt_regs *regs, int fault)
 {
 	struct siginfo si;
+	unsigned lsb = 0;
 
 	if (unhandled_signal(tsk, sig) && show_unhandled_signals_ratelimited()) {
 		pr_info("%s[%d]: unhandled %s (%d) at 0x%08lx, esr 0x%03x\n",
@@ -211,6 +213,17 @@ static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
 	si.si_errno = 0;
 	si.si_code = code;
 	si.si_addr = (void __user *)addr;
+	/*
+	 * Either small page or large page may be poisoned.
+	 * In other words, VM_FAULT_HWPOISON_LARGE and
+	 * VM_FAULT_HWPOISON are mutually exclusive.
+	 */
+	if (fault & VM_FAULT_HWPOISON_LARGE)
+		lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
+	else if (fault & VM_FAULT_HWPOISON)
+		lsb = PAGE_SHIFT;
+	si.si_addr_lsb = lsb;
+
 	force_sig_info(sig, &si, tsk);
 }
 
@@ -224,7 +237,7 @@ static void do_bad_area(unsigned long addr, unsigned int esr, struct pt_regs *re
 	 * handle this fault with.
 	 */
 	if (user_mode(regs))
-		__do_user_fault(tsk, addr, esr, SIGSEGV, SEGV_MAPERR, regs);
+		__do_user_fault(tsk, addr, esr, SIGSEGV, SEGV_MAPERR, regs, 0);
 	else
 		__do_kernel_fault(mm, addr, esr, regs);
 }
@@ -426,7 +439,17 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 		 */
 		sig = SIGBUS;
 		code = BUS_ADRERR;
-	} else {
+	}
+#ifdef CONFIG_MEMORY_FAILURE
+	else if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
+		pr_err(
+	"Killing %s:%d due to hardware memory corruption fault at %lx\n",
+			tsk->comm, tsk->pid, addr);
+		sig = SIGBUS;
+		code = BUS_MCEERR_AR;
+	}
+#endif
+	else {
 		/*
 		 * Something tried to access memory that isn't in our memory
 		 * map.
@@ -436,7 +459,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
 			SEGV_ACCERR : SEGV_MAPERR;
 	}
 
-	__do_user_fault(tsk, addr, esr, sig, code, regs);
+	__do_user_fault(tsk, addr, esr, sig, code, regs, fault);
 	return 0;
 
 no_context:
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
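To make the control flow of both hunks easy to check outside the kernel, here is a user-space model of how a fault result maps to the signal, si_code and si_addr_lsb that reach __do_user_fault(). It is a sketch under stated assumptions: the VM_FAULT_* values are copied by hand from include/linux/mm.h and arch/arm64/mm/fault.c of roughly this era, PAGE_SHIFT assumes 4K pages, and model_hstate_shift[] is a hypothetical stand-in for hstate_index_to_shift(). It omits the OOM and retry handling earlier in do_page_fault(), and it has no #ifdef or pr_err(), since neither changes which signal is chosen.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>

#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4 /* Linux UAPI value; fallback for older headers */
#endif

/* ASSUMED values, hand-copied so the logic can run in user space. */
#define VM_FAULT_SIGBUS          0x0002
#define VM_FAULT_HWPOISON        0x0010
#define VM_FAULT_HWPOISON_LARGE  0x0020
#define VM_FAULT_BADACCESS       0x020000
#define VM_FAULT_SET_HINDEX(x)   ((x) << 12)
#define VM_FAULT_GET_HINDEX(x)   (((x) >> 12) & 0xf)
#define PAGE_SHIFT               12 /* assumption: 4K pages */

/* Hypothetical hstate table: 2M and 1G huge pages on a 4K-granule system. */
static const unsigned int model_hstate_shift[] = { 21, 30 };

struct user_signal {
	int sig;
	int code;
	unsigned int addr_lsb;
};

static struct user_signal model_user_fault(int fault)
{
	struct user_signal s = { 0, 0, 0 };

	/* The patch relies on the two poison flags never being set together. */
	assert(!((fault & VM_FAULT_HWPOISON) && (fault & VM_FAULT_HWPOISON_LARGE)));

	/* Signal and si_code selection, as in the do_page_fault() hunk. */
	if (fault & VM_FAULT_SIGBUS) {
		s.sig = SIGBUS;
		s.code = BUS_ADRERR;
	} else if (fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE)) {
		s.sig = SIGBUS;
		s.code = BUS_MCEERR_AR;
	} else {
		s.sig = SIGSEGV;
		s.code = fault == VM_FAULT_BADACCESS ? SEGV_ACCERR : SEGV_MAPERR;
	}

	/* si_addr_lsb selection, as in the __do_user_fault() hunk. */
	if (fault & VM_FAULT_HWPOISON_LARGE) {
		unsigned int idx = VM_FAULT_GET_HINDEX(fault);
		assert(idx < sizeof(model_hstate_shift) / sizeof(model_hstate_shift[0]));
		s.addr_lsb = model_hstate_shift[idx];
	} else if (fault & VM_FAULT_HWPOISON) {
		s.addr_lsb = PAGE_SHIFT;
	}
	return s;
}
```

For example, VM_FAULT_HWPOISON_LARGE | VM_FAULT_SET_HINDEX(1) yields SIGBUS/BUS_MCEERR_AR with addr_lsb 30 under the assumed table, while a plain VM_FAULT_HWPOISON yields addr_lsb 12.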


* Re: [PATCH] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
  2017-02-01 22:15 [PATCH] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling Tyler Baicar
@ 2017-02-03 16:17 ` Punit Agrawal
From: Punit Agrawal @ 2017-02-03 16:17 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: catalin.marinas, will.deacon, mark.rutland, james.morse, akpm,
	zjzhang, sandeepa.s.prabhu, shijie.huang, linux-arm-kernel,
	linux-kernel

Tyler Baicar <tbaicar@codeaurora.org> writes:

> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
>
> Add VM_FAULT_HWPOISON[_LARGE] handling to the arm64 page fault
> handler. Handling of VM_FAULT_HWPOISON[_LARGE] is very similar
> to VM_FAULT_OOM, the only difference is that a different si_code
> (BUS_MCEERR_AR) is passed to user space and si_addr_lsb field is
> initialized.
>
> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
> ---
>  arch/arm64/mm/fault.c | 31 +++++++++++++++++++++++++++----
>  1 file changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c

[...]

> @@ -426,7 +439,17 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
>  		 */
>  		sig = SIGBUS;
>  		code = BUS_ADRERR;
> -	} else {
> +	}
> +#ifdef CONFIG_MEMORY_FAILURE
> +	else if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {

Please add spaces around '|'.

> +		pr_err(
> +	"Killing %s:%d due to hardware memory corruption fault at %lx\n",
> +			tsk->comm, tsk->pid, addr);

The message is misleading as we're not really killing a task but
delivering a signal (SIGBUS) which might not always lead to the receiver
being killed.

But considering that we don't print any message for the other faults,
I'd prefer that we drop this pr_err.
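The point that SIGBUS need not be fatal can be illustrated with a small, hypothetical user-space sketch: a handler that records si_code and uses siglongjmp() to resume at a recovery point instead of returning to the faulting instruction (a plain return would simply retry the access). raise() is only a stand-in for touching a poisoned page, so the recorded si_code is a user-space value (<= 0) rather than BUS_MCEERR_AR. The names survive_sigbus() and recovering_handler() are illustrative.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recover_point;
static volatile sig_atomic_t last_si_code;

static void recovering_handler(int sig, siginfo_t *si, void *ctx)
{
	(void)sig;
	(void)ctx;
	last_si_code = si->si_code;
	siglongjmp(recover_point, 1); /* also restores the saved signal mask */
}

/* Returns 1 if the process kept running after a SIGBUS, 0 otherwise. */
static int survive_sigbus(void)
{
	struct sigaction sa, old;
	volatile int survived = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = recovering_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &old) != 0)
		return 0;

	if (sigsetjmp(recover_point, 1) == 0) {
		/* Stand-in for touching a poisoned page. */
		raise(SIGBUS);
	} else {
		/* raise() produces a user-space si_code, not BUS_MCEERR_AR. */
		assert(last_si_code <= 0);
		survived = 1;
	}

	sigaction(SIGBUS, &old, NULL); /* restore the previous disposition */
	return survived;
}
```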

> +		sig = SIGBUS;
> +		code = BUS_MCEERR_AR;
> +	}
> +#endif

Although CONFIG_MEMORY_FAILURE is needed to get a HWPOISON fault in the
first place, the handling seems safe even when it is not enabled. Can
the ifdeffery be dropped?

Also, I was wondering how this code was tested. Did you by any chance
try using the hwpoison inject debugfs interface?

Thanks,
Punit

> +	else {
>  		/*
>  		 * Something tried to access memory that isn't in our memory
>  		 * map.
> @@ -436,7 +459,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
>  			SEGV_ACCERR : SEGV_MAPERR;
>  	}
>  
> -	__do_user_fault(tsk, addr, esr, sig, code, regs);
> +	__do_user_fault(tsk, addr, esr, sig, code, regs, fault);
>  	return 0;
>  
>  no_context:


* Re: [PATCH] arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
       [not found]   ` <f92b8ab6-3728-e038-e3c8-bcaa6491075e@codeaurora.org>
@ 2017-02-07 17:19     ` Punit Agrawal
From: Punit Agrawal @ 2017-02-07 17:19 UTC (permalink / raw)
  To: Baicar, Tyler
  Cc: catalin.marinas, will.deacon, mark.rutland, james.morse, akpm,
	zjzhang, sandeepa.s.prabhu, shijie.huang, linux-arm-kernel,
	linux-kernel



On 06/02/17 22:21, Baicar, Tyler wrote:
> Hello Punit,
>
>
> On 2/3/2017 9:17 AM, Punit Agrawal wrote:
>> Tyler Baicar <tbaicar@codeaurora.org> writes:
>>
>>> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
>>>
>>> Add VM_FAULT_HWPOISON[_LARGE] handling to the arm64 page fault
>>> handler. Handling of VM_FAULT_HWPOISON[_LARGE] is very similar
>>> to VM_FAULT_OOM, the only difference is that a different si_code
>>> (BUS_MCEERR_AR) is passed to user space and si_addr_lsb field is
>>> initialized.
>>>
>>> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
>>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>>> ---
>>>  arch/arm64/mm/fault.c | 31 +++++++++++++++++++++++++++----
>>>  1 file changed, 27 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> [...]
>>
>>> @@ -426,7 +439,17 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
>>>              */
>>>             sig = SIGBUS;
>>>             code = BUS_ADRERR;
>>> -   } else {
>>> +   }
>>> +#ifdef CONFIG_MEMORY_FAILURE
>>> +   else if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
>> Please add spaces around '|'.
> Will do!
>>
>>> +           pr_err(
>>> +   "Killing %s:%d due to hardware memory corruption fault at %lx\n",
>>> +                   tsk->comm, tsk->pid, addr);
>> The message is misleading as we're not really killing a task but
>> delivering a signal (SIGBUS) which might not always lead to the receiver
>> being killed.
>>
>> But considering that we don't print any message for the other faults,
>> I'd prefer that we drop this pr_err.
> Yes, I'll drop the pr_err.
>>> +           sig = SIGBUS;
>>> +           code = BUS_MCEERR_AR;
>>> +   }
>>> +#endif
>> Although to get a HWPOISON fault CONFIG_MEMORY_FAILURE is needed, the
>> handling seems safe even when it is not enabled. Can the ifdeffery be
>> dropped?
> Yes, I can drop the ifdef. The handling would be fine either way.
>>
>> Also, I was wondering how this code was tested? Did you by any chance
>> try using hwpoison inject debugfs interface?
> This was originally tested using proprietary error injection that we have.
>
> I just tried the hwpoison inject interface and it didn't result in
> hitting this code path.
>
> [   70.747697] Injecting memory failure at pfn 0x400340
> [   70.748547] Memory failure: 0x400340: Unknown page state
> [   70.752911] Memory failure: 0x400340: unknown page still referenced by 1 users
> [   70.760167] Memory failure: 0x400340: recovery action for unknown page: Failed
>
> I've never used hwpoison inject though, so maybe I'm doing something
> wrong :)

No worries. Writing the PFN at which an executable is loaded to
/sys/kernel/debug/hwpoison/corrupt-pfn triggered the code for me. On my
system the program then dies and the shell prints "Bus error", i.e. the
default action for SIGBUS terminated it. :)
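A hedged sketch of how the PFN for this experiment might be found: /proc/self/pagemap holds one 64-bit entry per virtual page, with bit 63 set when the page is present and bits 0-54 holding the PFN. The helper names pagemap_entry_pfn() and lookup_pfn() are hypothetical. Since Linux 4.2 the PFN field reads as 0 for callers without CAP_SYS_ADMIN, and the resulting value would then be written, as root, to /sys/kernel/debug/hwpoison/corrupt-pfn.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* pagemap entry layout: bit 63 = page present, bits 0-54 = PFN. */
#define PM_PRESENT   (UINT64_C(1) << 63)
#define PM_PFN_MASK  ((UINT64_C(1) << 55) - 1)

static uint64_t pagemap_entry_pfn(uint64_t entry)
{
	return (entry & PM_PRESENT) ? (entry & PM_PFN_MASK) : 0;
}

/* Look up the PFN backing addr in the calling process.
 * Returns 0 on success, -1 if pagemap cannot be opened or read. */
static int lookup_pfn(const void *addr, uint64_t *pfn)
{
	long page_size = sysconf(_SC_PAGESIZE);
	uint64_t entry;
	off_t off;
	int fd;

	assert(page_size > 0);
	/* One 8-byte entry per virtual page, indexed by page number. */
	off = (off_t)((uintptr_t)addr / (uintptr_t)page_size) * (off_t)sizeof(entry);
	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -1;
	if (pread(fd, &entry, sizeof(entry), off) != (ssize_t)sizeof(entry)) {
		close(fd);
		return -1;
	}
	close(fd);
	*pfn = pagemap_entry_pfn(entry);
	return 0;
}
```

For example, calling lookup_pfn() on the address of a function in the program's own text (cast to const void *) gives a PFN backing that code, provided the caller has CAP_SYS_ADMIN.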

>
> Thanks,
> Tyler
>

