All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] powerpc/fadump: Fix the race in crash_fadump().
@ 2016-10-10  6:30 Mahesh J Salgaonkar
  2016-10-10 10:52 ` Michael Ellerman
  0 siblings, 1 reply; 5+ messages in thread
From: Mahesh J Salgaonkar @ 2016-10-10  6:30 UTC (permalink / raw)
  To: linuxppc-dev, Michael Ellerman
  Cc: Hari Bathini, Haren Myneni, Balbir Singh, Anton Blanchard

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

There are chances that multiple CPUs can call crash_fadump() simultaneously
and would start duplicating same info to vmcoreinfo ELF note section. This
causes makedumpfile to fail during kdump capture. One example is,
triggering dumprestart from HMC which sends system reset to all the CPUs at
once.

makedumpfile --dump-dmesg /proc/vmcore
read_vmcoreinfo_basic_info: Invalid data in /tmp/vmcoreinfoyjgxlL: CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971
makedumpfile Failed.
Running makedumpfile --dump-dmesg /proc/vmcore failed (1).

makedumpfile  -d 31 -l /proc/vmcore
read_vmcoreinfo_basic_info: Invalid data in /tmp/vmcoreinfo1mmVdO: CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971CRASHTIME=1475605971
makedumpfile Failed.
Running makedumpfile  -d 31 -l /proc/vmcore failed (1).

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/fadump.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index b3a6633..2ed9d1c 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -402,8 +402,14 @@ void crash_fadump(struct pt_regs *regs, const char *str)
 {
 	struct fadump_crash_info_header *fdh = NULL;
 
-	if (!fw_dump.dump_registered || !fw_dump.fadumphdr_addr)
+	mutex_lock(&fadump_mutex);
+	if (!fw_dump.dump_registered || !fw_dump.fadumphdr_addr) {
+		mutex_unlock(&fadump_mutex);
 		return;
+	}
+
+	if (crashing_cpu != -1)
+		goto unlock_out;
 
 	fdh = __va(fw_dump.fadumphdr_addr);
 	crashing_cpu = smp_processor_id();
@@ -417,6 +423,9 @@ void crash_fadump(struct pt_regs *regs, const char *str)
 
 	fdh->online_mask = *cpu_online_mask;
 
+unlock_out:
+	mutex_unlock(&fadump_mutex);
+
 	/* Call ibm,os-term rtas call to trigger firmware assisted dump */
 	rtas_os_term((char *)str);
 }

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] powerpc/fadump: Fix the race in crash_fadump().
  2016-10-10  6:30 [PATCH] powerpc/fadump: Fix the race in crash_fadump() Mahesh J Salgaonkar
@ 2016-10-10 10:52 ` Michael Ellerman
  2016-10-12 17:48   ` Mahesh Jagannath Salgaonkar
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Ellerman @ 2016-10-10 10:52 UTC (permalink / raw)
  To: Mahesh J Salgaonkar, linuxppc-dev
  Cc: Hari Bathini, Haren Myneni, Balbir Singh, Anton Blanchard

Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:

> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>
> There are chances that multiple CPUs can call crash_fadump() simultaneously
> and would start duplicating same info to vmcoreinfo ELF note section. This
> causes makedumpfile to fail during kdump capture. One example is,
> triggering dumprestart from HMC which sends system reset to all the CPUs at
> once.
...
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index b3a6633..2ed9d1c 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -402,8 +402,14 @@ void crash_fadump(struct pt_regs *regs, const char *str)
>  {
>  	struct fadump_crash_info_header *fdh = NULL;
>  
> -	if (!fw_dump.dump_registered || !fw_dump.fadumphdr_addr)
> +	mutex_lock(&fadump_mutex);

What happens when a crashing CPU can't get the mutex and goes to sleep?

cheers

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] powerpc/fadump: Fix the race in crash_fadump().
  2016-10-10 10:52 ` Michael Ellerman
@ 2016-10-12 17:48   ` Mahesh Jagannath Salgaonkar
  2016-10-13 10:48     ` Michael Ellerman
  2016-10-14  0:42     ` Balbir Singh
  0 siblings, 2 replies; 5+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2016-10-12 17:48 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
  Cc: Hari Bathini, Haren Myneni, Balbir Singh, Anton Blanchard

On 10/10/2016 04:22 PM, Michael Ellerman wrote:
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
> 
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> There are chances that multiple CPUs can call crash_fadump() simultaneously
>> and would start duplicating same info to vmcoreinfo ELF note section. This
>> causes makedumpfile to fail during kdump capture. One example is,
>> triggering dumprestart from HMC which sends system reset to all the CPUs at
>> once.
> ...
>> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
>> index b3a6633..2ed9d1c 100644
>> --- a/arch/powerpc/kernel/fadump.c
>> +++ b/arch/powerpc/kernel/fadump.c
>> @@ -402,8 +402,14 @@ void crash_fadump(struct pt_regs *regs, const char *str)
>>  {
>>  	struct fadump_crash_info_header *fdh = NULL;
>>  
>> -	if (!fw_dump.dump_registered || !fw_dump.fadumphdr_addr)
>> +	mutex_lock(&fadump_mutex);
> 
> What happens when a crashing CPU can't get the mutex and goes to sleep?

Got your point. I think I should use mutex_trylock() here. There is only
two reason crashing CPU can't get mutex, 1) Another CPU also crashing
that got the mutex and on its way to trigger fadump. OR 2) We are in
middle of fadump register/un-register, in which case we can just return
and go to normal panic.

Thanks,
-Mahesh.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] powerpc/fadump: Fix the race in crash_fadump().
  2016-10-12 17:48   ` Mahesh Jagannath Salgaonkar
@ 2016-10-13 10:48     ` Michael Ellerman
  2016-10-14  0:42     ` Balbir Singh
  1 sibling, 0 replies; 5+ messages in thread
From: Michael Ellerman @ 2016-10-13 10:48 UTC (permalink / raw)
  To: Mahesh Jagannath Salgaonkar, linuxppc-dev
  Cc: Hari Bathini, Haren Myneni, Balbir Singh, Anton Blanchard

Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com> writes:

> On 10/10/2016 04:22 PM, Michael Ellerman wrote:
>> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
>> 
>>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>>
>>> There are chances that multiple CPUs can call crash_fadump() simultaneously
>>> and would start duplicating same info to vmcoreinfo ELF note section. This
>>> causes makedumpfile to fail during kdump capture. One example is,
>>> triggering dumprestart from HMC which sends system reset to all the CPUs at
>>> once.
>> ...
>>> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
>>> index b3a6633..2ed9d1c 100644
>>> --- a/arch/powerpc/kernel/fadump.c
>>> +++ b/arch/powerpc/kernel/fadump.c
>>> @@ -402,8 +402,14 @@ void crash_fadump(struct pt_regs *regs, const char *str)
>>>  {
>>>  	struct fadump_crash_info_header *fdh = NULL;
>>>  
>>> -	if (!fw_dump.dump_registered || !fw_dump.fadumphdr_addr)
>>> +	mutex_lock(&fadump_mutex);
>> 
>> What happens when a crashing CPU can't get the mutex and goes to sleep?
>
> Got your point. I think I should use mutex_trylock() here. There is only
> two reason crashing CPU can't get mutex, 1) Another CPU also crashing
> that got the mutex and on its way to trigger fadump. OR 2) We are in
> middle of fadump register/un-register, in which case we can just return
> and go to normal panic.

mutex_trylock() is probably OK, though with DEBUG options on it might
still run a bunch of code, you don't want it to go and call into lockdep
for example.

Normally the advice is not to write your own locks, but in this case the
safest option might be just to do a cmpxchg() or something similarly
simple.

cheers

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] powerpc/fadump: Fix the race in crash_fadump().
  2016-10-12 17:48   ` Mahesh Jagannath Salgaonkar
  2016-10-13 10:48     ` Michael Ellerman
@ 2016-10-14  0:42     ` Balbir Singh
  1 sibling, 0 replies; 5+ messages in thread
From: Balbir Singh @ 2016-10-14  0:42 UTC (permalink / raw)
  To: Mahesh Jagannath Salgaonkar, Michael Ellerman, linuxppc-dev
  Cc: Hari Bathini, Haren Myneni, Anton Blanchard



On 13/10/16 04:48, Mahesh Jagannath Salgaonkar wrote:
> On 10/10/2016 04:22 PM, Michael Ellerman wrote:
>> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
>>
>>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>>
>>> There are chances that multiple CPUs can call crash_fadump() simultaneously
>>> and would start duplicating same info to vmcoreinfo ELF note section. This
>>> causes makedumpfile to fail during kdump capture. One example is,
>>> triggering dumprestart from HMC which sends system reset to all the CPUs at
>>> once.
>> ...
>>> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
>>> index b3a6633..2ed9d1c 100644
>>> --- a/arch/powerpc/kernel/fadump.c
>>> +++ b/arch/powerpc/kernel/fadump.c
>>> @@ -402,8 +402,14 @@ void crash_fadump(struct pt_regs *regs, const char *str)
>>>  {
>>>  	struct fadump_crash_info_header *fdh = NULL;
>>>  
>>> -	if (!fw_dump.dump_registered || !fw_dump.fadumphdr_addr)
>>> +	mutex_lock(&fadump_mutex);
>>
>> What happens when a crashing CPU can't get the mutex and goes to sleep?
> 
> Got your point. I think I should use mutex_trylock() here. There is only
> two reason crashing CPU can't get mutex, 1) Another CPU also crashing
> that got the mutex and on its way to trigger fadump. OR 2) We are in
> middle of fadump register/un-register, in which case we can just return
> and go to normal panic.

I think trylock is a good idea, but having said that not getting the lock
and having the CPU active will still lead to the same issue. 
I don't quite know the source of failure in makedumpfile but
should we fix makedumpfile to deal better with these issues?

Another option is to check to see if anyone started writing at the ELF
note section and have others bail out if they get there after the try
lock

Balbir Singh.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-10-14  0:42 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-10  6:30 [PATCH] powerpc/fadump: Fix the race in crash_fadump() Mahesh J Salgaonkar
2016-10-10 10:52 ` Michael Ellerman
2016-10-12 17:48   ` Mahesh Jagannath Salgaonkar
2016-10-13 10:48     ` Michael Ellerman
2016-10-14  0:42     ` Balbir Singh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.