linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH] powerpc/mce: Add MCE notification chain
@ 2020-03-30  7:12 Ganesh Goudar
  2020-04-03  2:08 ` Nicholas Piggin
  2020-05-04  6:39 ` Ganesh
  0 siblings, 2 replies; 6+ messages in thread
From: Ganesh Goudar @ 2020-03-30  7:12 UTC
  To: mpe, linuxppc-dev
  Cc: santosh, mahesh, npiggin, Ganesh Goudar, aneesh.kumar, arbab

From: Santosh S <santosh@fossix.org>

Introduce a notification chain that lets subscribers know about
uncorrected memory errors (UE). This would help prospective users in the
pmem or nvdimm subsystems track bad blocks for better handling of
persistent memory allocations.

Signed-off-by: Santosh S <santosh@fossix.org>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
---
 arch/powerpc/include/asm/mce.h |  2 ++
 arch/powerpc/kernel/mce.c      | 15 +++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index 6a6ddaabdb34..6e222a94a68a 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -218,6 +218,8 @@ extern void machine_check_queue_event(void);
 extern void machine_check_print_event_info(struct machine_check_event *evt,
 					   bool user_mode, bool in_guest);
 unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr);
+int mce_register_notifier(struct notifier_block *nb);
+int mce_unregister_notifier(struct notifier_block *nb);
 #ifdef CONFIG_PPC_BOOK3S_64
 void flush_and_reload_slb(void);
 #endif /* CONFIG_PPC_BOOK3S_64 */
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 34c1001e9e8b..f50d7f56c02c 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -47,6 +47,20 @@ static struct irq_work mce_ue_event_irq_work = {
 
 DECLARE_WORK(mce_ue_event_work, machine_process_ue_event);
 
+static BLOCKING_NOTIFIER_HEAD(mce_notifier_list);
+
+int mce_register_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&mce_notifier_list, nb);
+}
+EXPORT_SYMBOL_GPL(mce_register_notifier);
+
+int mce_unregister_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&mce_notifier_list, nb);
+}
+EXPORT_SYMBOL_GPL(mce_unregister_notifier);
+
 static void mce_set_error_info(struct machine_check_event *mce,
 			       struct mce_error_info *mce_err)
 {
@@ -263,6 +277,7 @@ static void machine_process_ue_event(struct work_struct *work)
 	while (__this_cpu_read(mce_ue_count) > 0) {
 		index = __this_cpu_read(mce_ue_count) - 1;
 		evt = this_cpu_ptr(&mce_ue_event_queue[index]);
+		blocking_notifier_call_chain(&mce_notifier_list, 0, evt);
 #ifdef CONFIG_MEMORY_FAILURE
 		/*
 		 * This should probably queued elsewhere, but
-- 
2.17.2


* Re: [PATCH] powerpc/mce: Add MCE notification chain
  2020-03-30  7:12 [PATCH] powerpc/mce: Add MCE notification chain Ganesh Goudar
@ 2020-04-03  2:08 ` Nicholas Piggin
  2020-04-04 13:05   ` Ganesh
  2020-05-04  6:39 ` Ganesh
  1 sibling, 1 reply; 6+ messages in thread
From: Nicholas Piggin @ 2020-04-03  2:08 UTC
  To: Ganesh Goudar, linuxppc-dev, mpe; +Cc: aneesh.kumar, santosh, arbab, mahesh

Ganesh Goudar's on March 30, 2020 5:12 pm:
> From: Santosh S <santosh@fossix.org>
> 
> Introduce notification chain which lets know about uncorrected memory
> errors(UE). This would help prospective users in pmem or nvdimm subsystem
> to track bad blocks for better handling of persistent memory allocations.
> 
> Signed-off-by: Santosh S <santosh@fossix.org>
> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>

Do you have any such users yet? It would be good to refer to an example 
user and give a brief description of what it does in its notifier.

> @@ -263,6 +277,7 @@ static void machine_process_ue_event(struct work_struct *work)
>  	while (__this_cpu_read(mce_ue_count) > 0) {
>  		index = __this_cpu_read(mce_ue_count) - 1;
>  		evt = this_cpu_ptr(&mce_ue_event_queue[index]);
> +		blocking_notifier_call_chain(&mce_notifier_list, 0, evt);

Can we really use a blocking notifier here? I'm not sure that we can.

Thanks,
Nick

* Re: [PATCH] powerpc/mce: Add MCE notification chain
  2020-04-03  2:08 ` Nicholas Piggin
@ 2020-04-04 13:05   ` Ganesh
  2020-04-06  2:17     ` Nicholas Piggin
  0 siblings, 1 reply; 6+ messages in thread
From: Ganesh @ 2020-04-04 13:05 UTC
  To: Nicholas Piggin, linuxppc-dev, mpe; +Cc: aneesh.kumar, santosh, arbab, mahesh

On 4/3/20 7:38 AM, Nicholas Piggin wrote:

> Ganesh Goudar's on March 30, 2020 5:12 pm:
>> From: Santosh S <santosh@fossix.org>
>>
>> Introduce notification chain which lets know about uncorrected memory
>> errors(UE). This would help prospective users in pmem or nvdimm subsystem
>> to track bad blocks for better handling of persistent memory allocations.
>>
>> Signed-off-by: Santosh S <santosh@fossix.org>
>> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
> Do you have any such users yet? It would be good to refer to an example
> user and give a brief description of what it does in its notifier.

Santosh has sent a patch which uses this notification.
https://patchwork.ozlabs.org/patch/1265062/
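
To make the subscriber side concrete, here is a minimal userspace sketch of
how a consumer might hook into the chain. It is a model, not the kernel
code: `struct mock_mce_event`, `mock_call_chain()`, `pmem_mce_notify()` and
the 64K `PAGE_SHIFT` are assumptions made for illustration, and the real
chain is a blocking notifier called from the `mce_ue_event_work` work item
with an action value of 0 and the event pointer as data. Only the two
register/unregister names are taken from the patch.

```c
/* Userspace model of the API added by the patch. All types are
 * simplified stand-ins; this is not the kernel implementation. */
#include <stddef.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define PAGE_SHIFT       16	/* assumption: 64K pages */

struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long action, void *data);
	struct notifier_block *next;
	int priority;
};

/* Hypothetical, cut-down stand-in for struct machine_check_event. */
struct mock_mce_event {
	int is_ue;
	int physical_address_provided;
	unsigned long long physical_address;
};

struct notifier_block *mce_notifier_list;

/* Insert in descending priority order, after equal-priority entries. */
int mce_register_notifier(struct notifier_block *nb)
{
	struct notifier_block **p = &mce_notifier_list;

	while (*p && (*p)->priority >= nb->priority)
		p = &(*p)->next;
	nb->next = *p;
	*p = nb;
	return 0;
}

int mce_unregister_notifier(struct notifier_block *nb)
{
	struct notifier_block **p;

	for (p = &mce_notifier_list; *p; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;
			return 0;
		}
	}
	return -1;	/* not registered; the kernel returns -ENOENT */
}

/* Stand-in for blocking_notifier_call_chain(); no locking here. */
int mock_call_chain(unsigned long action, void *data)
{
	struct notifier_block *nb;
	int ret = NOTIFY_DONE;

	for (nb = mce_notifier_list; nb; nb = nb->next) {
		ret = nb->notifier_call(nb, action, data);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Hypothetical subscriber: remember the page frame that took the UE. */
unsigned long long last_bad_pfn;
int bad_count;

int pmem_mce_notify(struct notifier_block *nb, unsigned long action,
		    void *data)
{
	struct mock_mce_event *evt = data;

	(void)nb;
	(void)action;
	if (!evt->is_ue || !evt->physical_address_provided)
		return NOTIFY_DONE;
	last_bad_pfn = evt->physical_address >> PAGE_SHIFT;
	bad_count++;
	return NOTIFY_OK;
}

struct notifier_block pmem_mce_nb = { .notifier_call = pmem_mce_notify };
```

A caller would register `pmem_mce_nb` once at init time and unregister it
on teardown; after unregistering, later events no longer reach the
callback.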

>> @@ -263,6 +277,7 @@ static void machine_process_ue_event(struct work_struct *work)
>>   	while (__this_cpu_read(mce_ue_count) > 0) {
>>   		index = __this_cpu_read(mce_ue_count) - 1;
>>   		evt = this_cpu_ptr(&mce_ue_event_queue[index]);
>> +		blocking_notifier_call_chain(&mce_notifier_list, 0, evt);
> Can we really use a blocking notifier here? I'm not sure that we can.

I think we can. Do you see any problem?

>
> Thanks,
> Nick



* Re: [PATCH] powerpc/mce: Add MCE notification chain
  2020-04-04 13:05   ` Ganesh
@ 2020-04-06  2:17     ` Nicholas Piggin
  2020-04-06 17:17       ` Mahesh J Salgaonkar
  0 siblings, 1 reply; 6+ messages in thread
From: Nicholas Piggin @ 2020-04-06  2:17 UTC
  To: Ganesh, linuxppc-dev, mpe; +Cc: aneesh.kumar, santosh, arbab, mahesh

Ganesh's on April 4, 2020 11:05 pm:
> On 4/3/20 7:38 AM, Nicholas Piggin wrote:
> 
>> Ganesh Goudar's on March 30, 2020 5:12 pm:
>>> From: Santosh S <santosh@fossix.org>
>>>
>>> Introduce notification chain which lets know about uncorrected memory
>>> errors(UE). This would help prospective users in pmem or nvdimm subsystem
>>> to track bad blocks for better handling of persistent memory allocations.
>>>
>>> Signed-off-by: Santosh S <santosh@fossix.org>
>>> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
>> Do you have any such users yet? It would be good to refer to an example
>> user and give a brief description of what it does in its notifier.
> 
> Santosh has sent a patch which uses this notification.
> https://patchwork.ozlabs.org/patch/1265062/

Okay. So these things are asynchronous after the machine check. I guess
that's the design of it and memory offlining does something similar by
the looks, but how do you prevent the memory being allocated for 
something else before the notifiers run?

>>> @@ -263,6 +277,7 @@ static void machine_process_ue_event(struct work_struct *work)
>>>   	while (__this_cpu_read(mce_ue_count) > 0) {
>>>   		index = __this_cpu_read(mce_ue_count) - 1;
>>>   		evt = this_cpu_ptr(&mce_ue_event_queue[index]);
>>> +		blocking_notifier_call_chain(&mce_notifier_list, 0, evt);
>> Can we really use a blocking notifier here? I'm not sure that we can.
> 
> I think we can, do you see any problem?

No, it looks okay after a better look. Sorry for the noise.

Thanks,
Nick

* Re: [PATCH] powerpc/mce: Add MCE notification chain
  2020-04-06  2:17     ` Nicholas Piggin
@ 2020-04-06 17:17       ` Mahesh J Salgaonkar
  0 siblings, 0 replies; 6+ messages in thread
From: Mahesh J Salgaonkar @ 2020-04-06 17:17 UTC
  To: Nicholas Piggin
  Cc: santosh, linuxppc-dev, mahesh, Ganesh, aneesh.kumar, arbab

On 2020-04-06 12:17:22 Mon, Nicholas Piggin wrote:
> Ganesh's on April 4, 2020 11:05 pm:
> > On 4/3/20 7:38 AM, Nicholas Piggin wrote:
> > 
> >> Ganesh Goudar's on March 30, 2020 5:12 pm:
> >>> From: Santosh S <santosh@fossix.org>
> >>>
> >>> Introduce notification chain which lets know about uncorrected memory
> >>> errors(UE). This would help prospective users in pmem or nvdimm subsystem
> >>> to track bad blocks for better handling of persistent memory allocations.
> >>>
> >>> Signed-off-by: Santosh S <santosh@fossix.org>
> >>> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
> >> Do you have any such users yet? It would be good to refer to an example
> >> user and give a brief description of what it does in its notifier.
> > 
> > Santosh has sent a patch which uses this notification.
> > https://patchwork.ozlabs.org/patch/1265062/
> 
> Okay. So these things are asynchronous after the machine check. I guess
> that's the design of it and memory offlining does something similar by
> the looks, but how do you prevent the memory being allocated for 
> something else before the notifiers run?

We can't. This race exists even today when we call memory_failure(). If
the same memory is allocated again, touching it may trigger another MCE
on the same address until the subsystem that has registered for
notification has finished handling the notified address.

Thanks,
-Mahesh.
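
Since the same address can raise a second MCE before a subscriber has
finished with the first, a subscriber may plausibly see one address
reported more than once. One way to tolerate that is to make the recording
step idempotent. The sketch below is userspace C and purely illustrative:
`struct bad_pfn_set`, `bad_pfn_record()` and `MAX_BAD_PFNS` are hypothetical
names, not part of the patch or of any existing subscriber.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_BAD_PFNS 64	/* arbitrary capacity for this sketch */

/* Small fixed-size set of page frame numbers known to be bad. */
struct bad_pfn_set {
	unsigned long long pfn[MAX_BAD_PFNS];
	size_t count;
};

/*
 * Record a bad page frame. Returns true only when the PFN is newly
 * added; a repeated report of the same PFN (or a full set) returns
 * false and leaves the set unchanged.
 */
bool bad_pfn_record(struct bad_pfn_set *set, unsigned long long pfn)
{
	size_t i;

	for (i = 0; i < set->count; i++) {
		if (set->pfn[i] == pfn)
			return false;
	}
	if (set->count == MAX_BAD_PFNS)
		return false;
	set->pfn[set->count++] = pfn;
	return true;
}
```

With this shape, a notifier callback can call `bad_pfn_record()` for every
event and do the expensive handling only when it returns true.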

> 
> >>> @@ -263,6 +277,7 @@ static void machine_process_ue_event(struct work_struct *work)
> >>>   	while (__this_cpu_read(mce_ue_count) > 0) {
> >>>   		index = __this_cpu_read(mce_ue_count) - 1;
> >>>   		evt = this_cpu_ptr(&mce_ue_event_queue[index]);
> >>> +		blocking_notifier_call_chain(&mce_notifier_list, 0, evt);
> >> Can we really use a blocking notifier here? I'm not sure that we can.
> > 
> > I think we can, do you see any problem?
> 
> No it looks okay after better look, sorry for the noise.
> 
> Thanks,
> Nick

-- 
Mahesh J Salgaonkar


* Re: [PATCH] powerpc/mce: Add MCE notification chain
  2020-03-30  7:12 [PATCH] powerpc/mce: Add MCE notification chain Ganesh Goudar
  2020-04-03  2:08 ` Nicholas Piggin
@ 2020-05-04  6:39 ` Ganesh
  1 sibling, 0 replies; 6+ messages in thread
From: Ganesh @ 2020-05-04  6:39 UTC
  To: mpe, linuxppc-dev; +Cc: mahesh, santosh, arbab, npiggin, aneesh.kumar

On 3/30/20 12:42 PM, Ganesh Goudar wrote:

> From: Santosh S <santosh@fossix.org>
>
> Introduce notification chain which lets know about uncorrected memory
> errors(UE). This would help prospective users in pmem or nvdimm subsystem
> to track bad blocks for better handling of persistent memory allocations.
>
> Signed-off-by: Santosh S <santosh@fossix.org>
> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
> ---

Hi mpe, do you have any comments on this patch?

