All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH -next] pstore: replace spin_lock with spin_trylock_irqsave in panic path
@ 2011-09-27 17:14 Seiji Aguchi
  2011-09-27 17:34 ` Don Zickus
  0 siblings, 1 reply; 10+ messages in thread
From: Seiji Aguchi @ 2011-09-27 17:14 UTC (permalink / raw)
  To: linux-kernel, Vivek Goyal, Matthew Garrett, tony.luck, dzickus,
	gong.chen, Andrew Morton
  Cc: dle-develop, Satoru Moriya

Hi,

[Problem]
Currently, pstore takes spin_trylock(&psinfo->buf_lock) in NMI context.
And it takes spin_lock(&psinfo->buf_lock) in other cases.

If there are some bugs in pstore and kernel panics, spin_lock(&psinfo->buf_lock) causes deadlock 
and panic_notifier_chain will not work.

[Patch Description]
For solving this problem, this patch replaces spin_lock with spin_trylock_irqsave in panic path.

Dead lock in panic path will not happen by applying this patch.

 Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>

---
 fs/pstore/platform.c |   17 ++++++++---------
 1 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index 0472924..9882892 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -97,12 +97,15 @@ static void pstore_dump(struct kmsg_dumper *dumper,
 	else
 		why = "Unknown";
 
-	if (in_nmi()) {
-		is_locked = spin_trylock(&psinfo->buf_lock);
-		if (!is_locked)
-			pr_err("pstore dump routine blocked in NMI, may corrupt error record\n");
+	if (reason == KMSG_DUMP_PANIC) {
+		is_locked = spin_trylock_irqsave(&psinfo->buf_lock, flags);
+		if (!is_locked) {
+			pr_err("pstore dump routine skipped in panic path\n");
+			return;
+		}
 	} else
 		spin_lock_irqsave(&psinfo->buf_lock, flags);
+
 	oopscount++;
 	while (total < kmsg_bytes) {
 		dst = psinfo->buf;
@@ -131,11 +134,7 @@ static void pstore_dump(struct kmsg_dumper *dumper,
 		total += l1_cpy + l2_cpy;
 		part++;
 	}
-	if (in_nmi()) {
-		if (is_locked)
-			spin_unlock(&psinfo->buf_lock);
-	} else
-		spin_unlock_irqrestore(&psinfo->buf_lock, flags);
+	spin_unlock_irqrestore(&psinfo->buf_lock, flags);
 }
 
 static struct kmsg_dumper pstore_dumper = {
--
1.7.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH -next] pstore: replace spin_lock with spin_trylock_irqsave in panic path
  2011-09-27 17:14 [RFC][PATCH -next] pstore: replace spin_lock with spin_trylock_irqsave in panic path Seiji Aguchi
@ 2011-09-27 17:34 ` Don Zickus
  2011-09-27 17:46   ` Luck, Tony
  0 siblings, 1 reply; 10+ messages in thread
From: Don Zickus @ 2011-09-27 17:34 UTC (permalink / raw)
  To: Seiji Aguchi
  Cc: linux-kernel, Vivek Goyal, Matthew Garrett, tony.luck, gong.chen,
	Andrew Morton, dle-develop, Satoru Moriya

On Tue, Sep 27, 2011 at 01:14:59PM -0400, Seiji Aguchi wrote:
> Hi,
> 
> [Problem]
> Currently, pstore takes spin_trylock(&psinfo->buf_lock) in NMI context.
> And it takes spin_lock(&psinfo->buf_lock) in other cases.
> 
> If there are some bugs in pstore and kernel panics, spin_lock(&psinfo->buf_lock) causes deadlock 
> and panic_notifier_chain will not work.

Ok, so I missed your 'return' first time through and originally had a
bunch of comments.  So I would suggest adding a comment explaining why we
are returning in that failure.

Personally, I am not sure we want to abort here at the pstore layer, it
should probably be aborted lower.  There isn't any reason why we can't
continue from a pstore perspective (we can just bust the spinlock).

>From an ERST perspective, the state machine might be screwed up, hence
aborting in that layer could make sense.  But I don't think I agree with
the 'return' statement.

So I am opposed to it for now.

Cheers,
Don


> 
> [Patch Description]
> For solving this problem, this patch replaces spin_lock with spin_trylock_irqsave in panic path.
> 
> Dead lock in panic path will not happen by applying this patch.
> 
>  Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
> 
> ---
>  fs/pstore/platform.c |   17 ++++++++---------
>  1 files changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index 0472924..9882892 100644
> --- a/fs/pstore/platform.c
> +++ b/fs/pstore/platform.c
> @@ -97,12 +97,15 @@ static void pstore_dump(struct kmsg_dumper *dumper,
>  	else
>  		why = "Unknown";
>  
> -	if (in_nmi()) {
> -		is_locked = spin_trylock(&psinfo->buf_lock);
> -		if (!is_locked)
> -			pr_err("pstore dump routine blocked in NMI, may corrupt error record\n");
> +	if (reason == KMSG_DUMP_PANIC) {
> +		is_locked = spin_trylock_irqsave(&psinfo->buf_lock, flags);
> +		if (!is_locked) {
> +			pr_err("pstore dump routine skipped in panic path\n");
> +			return;
> +		}
>  	} else
>  		spin_lock_irqsave(&psinfo->buf_lock, flags);
> +
>  	oopscount++;
>  	while (total < kmsg_bytes) {
>  		dst = psinfo->buf;
> @@ -131,11 +134,7 @@ static void pstore_dump(struct kmsg_dumper *dumper,
>  		total += l1_cpy + l2_cpy;
>  		part++;
>  	}
> -	if (in_nmi()) {
> -		if (is_locked)
> -			spin_unlock(&psinfo->buf_lock);
> -	} else
> -		spin_unlock_irqrestore(&psinfo->buf_lock, flags);
> +	spin_unlock_irqrestore(&psinfo->buf_lock, flags);
>  }
>  
>  static struct kmsg_dumper pstore_dumper = {
> --
> 1.7.1
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [RFC][PATCH -next] pstore: replace spin_lock with spin_trylock_irqsave in panic path
  2011-09-27 17:34 ` Don Zickus
@ 2011-09-27 17:46   ` Luck, Tony
  2011-09-27 17:59     ` Don Zickus
  0 siblings, 1 reply; 10+ messages in thread
From: Luck, Tony @ 2011-09-27 17:46 UTC (permalink / raw)
  To: Don Zickus, Seiji Aguchi
  Cc: linux-kernel, Vivek Goyal, Matthew Garrett, Chen, Gong,
	Andrew Morton, dle-develop, Satoru Moriya

> Personally, I am not sure we want to abort here at the pstore layer, it
> should probably be aborted lower.  There isn't any reason why we can't
> continue from a pstore perspective (we can just bust the spinlock).

But do we really have much chance at getting a real dump in this case?
The pstore buf_lock is protecting the memory that the backend uses to
save the data. If we can't get the lock, then we are going to conflict
using that buffer with whoever does have the lock. So we will probably
mess up whatever data they were trying to save, as well as not managing
to save our panic data. So this isn't just a back-end issue, it is
fundamental to the pstore layer (since it depends on this back end buffer).

This is a tough call - but I'm leaning a bit towards taking this patch.

I agree with your suggestion that we need a better comment by the "return"
(and also in the change log) saying why we are not saving the panic dmesg.

-Tony 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH -next] pstore: replace spin_lock with spin_trylock_irqsave in panic path
  2011-09-27 17:46   ` Luck, Tony
@ 2011-09-27 17:59     ` Don Zickus
  2011-09-27 19:02       ` Luck, Tony
  0 siblings, 1 reply; 10+ messages in thread
From: Don Zickus @ 2011-09-27 17:59 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Seiji Aguchi, linux-kernel, Vivek Goyal, Matthew Garrett, Chen,
	Gong, Andrew Morton, dle-develop, Satoru Moriya

On Tue, Sep 27, 2011 at 10:46:32AM -0700, Luck, Tony wrote:
> > Personally, I am not sure we want to abort here at the pstore layer, it
> > should probably be aborted lower.  There isn't any reason why we can't
> > continue from a pstore perspective (we can just bust the spinlock).
> 
> But do we really have much chance at getting a real dump in this case?
> The pstore buf_lock is protecting the memory that the backend uses to
> save the data. If we can't get the lock, then we are going to conflict
> using that buffer with whoever does have the lock. So we will probably
> mess up whatever data they were trying to save, as well as not managing

Ok.  Do we care?  I assumed the panic data would be more
relevant/interesting than whatever pstore was doing before (like loading
previous log files).

> to save our panic data. So this isn't just a back-end issue, it is

I assumed we are just overwriting the buffer with the current data, so
unless the other cpu is chugging along while this cpu is in panic, the new
data shouldn't get corrupted, no?

Cheers,
Don

> fundamental to the pstore layer (since it depends on this back end buffer).
> 
> This is a tough call - but I'm leaning a bit towards taking this patch.
> 
> I agree with your suggestion that we need a better comment by the "return"
> (and also in the change log) saying why we are not saving the panic dmesg.
> 
> -Tony 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [RFC][PATCH -next] pstore: replace spin_lock with spin_trylock_irqsave in panic path
  2011-09-27 17:59     ` Don Zickus
@ 2011-09-27 19:02       ` Luck, Tony
  2011-09-27 19:46         ` Seiji Aguchi
  2011-09-28 13:57         ` Don Zickus
  0 siblings, 2 replies; 10+ messages in thread
From: Luck, Tony @ 2011-09-27 19:02 UTC (permalink / raw)
  To: Don Zickus
  Cc: Seiji Aguchi, linux-kernel, Vivek Goyal, Matthew Garrett, Chen,
	Gong, Andrew Morton, dle-develop, Satoru Moriya

> Ok.  Do we care?  I assumed the panic data would be more
> relevant/interesting than whatever pstore was doing before (like loading
> previous log files).

Yes we care - saving panic data is most likely the single most important
thing that pstore does.  I just have severe doubts that it will actually
save anything useful if we just blindly continue if we can't get the lock.

What actually happens next will be dependent on the back-end. For
the state machine in ERST, one possible outcome is a hang. For many
people a hang is considered worse than a panic.

> I assumed we are just overwriting the buffer with the current data, so
> unless the other cpu is chugging along while this cpu is in panic, the new
> data shouldn't get corrupted, no?

I really have no idea what *will* happen.  Lots of things are possible, only
some of them are desirable.

Is this patch based on a real-life case of a system deadlocking? I'd
like to know if we are just talking around the theoretical case that
the lock may be held at panic time - or something that has actually been
seen in real life.

-Tony

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [RFC][PATCH -next] pstore: replace spin_lock with spin_trylock_irqsave in panic path
  2011-09-27 19:02       ` Luck, Tony
@ 2011-09-27 19:46         ` Seiji Aguchi
  2011-09-28 14:09           ` Don Zickus
  2011-09-28 13:57         ` Don Zickus
  1 sibling, 1 reply; 10+ messages in thread
From: Seiji Aguchi @ 2011-09-27 19:46 UTC (permalink / raw)
  To: Luck, Tony, Don Zickus
  Cc: linux-kernel, Vivek Goyal, Matthew Garrett, Chen, Gong,
	Andrew Morton, dle-develop, Satoru Moriya

Hi,

>Yes we care - saving panic data is most likely the single most important
>thing that pstore does.  I just have severe doubts that it will actually
>save anything useful if we just blindly continue if we can't get the lock.

I agree with Tony. We may not get useful information if pstore just blindly continues
while other cpus are running.

>Is this patch based on a real-life case of a system deadlocking? I'd
>like to know if we are just talking around the theoretical case that
>the lock may be held at panic time - or something that has actually been
>seen in real life.

This patch is _not_ based on real-life case. I would like to avoid potential deadlock.

If Don disagrees to  my "return" code, I have another idea which moves pstore_dump() behind smp_send_stop().
smp_send_stop() stops other cpus by sending IPI.
So pstore can continue reliably and get useful information by just busting spinlock.

It depends on each backend driver whether it actually accesses to NVRAM/storage.

Idea
====
 Panic()
   |- smp_send_stop() (Send IPI to other cpus)
   |- bust spin_lock(&psinfo->buf_lock)
   |- call pstore_dump()

Seiji


>-----Original Message-----
>From: Luck, Tony [mailto:tony.luck@intel.com]
>Sent: Tuesday, September 27, 2011 3:03 PM
>To: Don Zickus
>Cc: Seiji Aguchi; linux-kernel@vger.kernel.org; Vivek Goyal; Matthew Garrett; Chen, Gong; Andrew Morton;
>dle-develop@lists.sourceforge.net; Satoru Moriya
>Subject: RE: [RFC][PATCH -next] pstore: replace spin_lock with spin_trylock_irqsave in panic path
>
>> Ok.  Do we care?  I assumed the panic data would be more
>> relevant/interesting than whatever pstore was doing before (like loading
>> previous log files).
>
>Yes we care - saving panic data is most likely the single most important
>thing that pstore does.  I just have severe doubts that it will actually
>save anything useful if we just blindly continue if we can't get the lock.
>
>What actually happens next will be dependent on the back-end. For
>the state machine in ERST, one possible outcome is a hang. For many
>people a hang is considered worse than a panic.
>
>> I assumed we are just overwriting the buffer with the current data, so
>> unless the other cpu is chugging along while this cpu is in panic, the new
>> data shouldn't get corrupted, no?
>
>I really have no idea what *will* happen.  Lots of things are possible, only
>some of them are desirable.
>
>Is this patch based on a real-life case of a system deadlocking? I'd
>like to know if we are just talking around the theoretical case that
>the lock may be held at panic time - or something that has actually been
>seen in real life.
>
>-Tony

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH -next] pstore: replace spin_lock with spin_trylock_irqsave in panic path
  2011-09-27 19:02       ` Luck, Tony
  2011-09-27 19:46         ` Seiji Aguchi
@ 2011-09-28 13:57         ` Don Zickus
  2011-09-28 22:30           ` Luck, Tony
  1 sibling, 1 reply; 10+ messages in thread
From: Don Zickus @ 2011-09-28 13:57 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Seiji Aguchi, linux-kernel, Vivek Goyal, Matthew Garrett, Chen,
	Gong, Andrew Morton, dle-develop, Satoru Moriya

On Tue, Sep 27, 2011 at 12:02:38PM -0700, Luck, Tony wrote:
> > Ok.  Do we care?  I assumed the panic data would be more
> > relevant/interesting than whatever pstore was doing before (like loading
> > previous log files).
> 
> Yes we care - saving panic data is most likely the single most important
> thing that pstore does.  I just have severe doubts that it will actually
> save anything useful if we just blindly continue if we can't get the lock.

Well, I was trying to imply that any pre-panic info is uninteresting.  It
is the panic/NMI stuff that should be top priority, worthy of busting the
spin lock.

> 
> What actually happens next will be dependent on the back-end. For
> the state machine in ERST, one possible outcome is a hang. For many
> people a hang is considered worse than a panic.

That should be up to the backend, no?  ERST has two modes, only one which
has a state machine.  The other is NVRAM which can probably handle
simultaneous writes.  And I believe the EFI back-end can handle that as
well.  That is why I was suggesting that the back-end return a failure.

> 
> > I assumed we are just overwriting the buffer with the current data, so
> > unless the other cpu is chugging along while this cpu is in panic, the new
> > data shouldn't get corrupted, no?
> 
> I really have no idea what *will* happen.  Lots of things are possible, only
> some of them are desirable.

My concern here is that if someone is just toying with pstore,
writing/reading data or even just poking at it to see what is going on
with the system, they may accidentally block real system errors or panics
from properly logging.  That doesn't seem right.

Cheers,
Don

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH -next] pstore: replace spin_lock with spin_trylock_irqsave in panic path
  2011-09-27 19:46         ` Seiji Aguchi
@ 2011-09-28 14:09           ` Don Zickus
  2011-09-28 18:55             ` Seiji Aguchi
  0 siblings, 1 reply; 10+ messages in thread
From: Don Zickus @ 2011-09-28 14:09 UTC (permalink / raw)
  To: Seiji Aguchi
  Cc: Luck, Tony, linux-kernel, Vivek Goyal, Matthew Garrett, Chen,
	Gong, Andrew Morton, dle-develop, Satoru Moriya

On Tue, Sep 27, 2011 at 03:46:08PM -0400, Seiji Aguchi wrote:
> Hi,
> 
> >Yes we care - saving panic data is most likely the single most important
> >thing that pstore does.  I just have severe doubts that it will actually
> >save anything useful if we just blindly continue if we can't get the lock.
> 
> I agree with Tony. We may not get useful information if pstore just blindly continues
> while other cpus are running.
> 
> >Is this patch based on a real-life case of a system deadlocking? I'd
> >like to know if we are just talking around the theoretical case that
> >the lock may be held at panic time - or something that has actually been
> >seen in real life.
> 
> This patch is _not_ based on real-life case. I would like to avoid potential deadlock.
> 
> If Don disagrees to  my "return" code, I have another idea which moves pstore_dump() behind smp_send_stop().
> smp_send_stop() stops other cpus by sending IPI.
> So pstore can continue reliably and get useful information by just busting spinlock.

Yeah, Vivek had a similar idea to have the common panic path mimic what
they do with kdump, stop all the cpus except for the crashing one, to
serialize the crashing path.  This would allow us to more easily bust
spinlocks without worrying about what the other cpus are doing.

The kdump solution involves using NMI whereas smp_send_stop (on x86)
avoids it because of past issues and instead uses the IRQ line.  This
won't work if pstore_dump uses a spin_try_lock_irqsave() because the IRQ
line will be disable and never get the smp_send_stop() message (unless I
am reading the code wrong).

[reads the kernel/panic.c code] oh, I see this already exists, you would
just move the smp_send_stop() command up a couple lines of code.

[Side note] perhaps we should change the behaviour of smp_send_stop to use
NMI and create a blacklist of machines to use the IRQ line instead.  I
assume the list of broken machines is small as Red Hat has been kdumping
kernels since 2.6.18 with little evidence that machines were failing
because NMI wasn't working properly.


Cheers,
Don


^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [RFC][PATCH -next] pstore: replace spin_lock with spin_trylock_irqsave in panic path
  2011-09-28 14:09           ` Don Zickus
@ 2011-09-28 18:55             ` Seiji Aguchi
  0 siblings, 0 replies; 10+ messages in thread
From: Seiji Aguchi @ 2011-09-28 18:55 UTC (permalink / raw)
  To: Don Zickus
  Cc: Luck, Tony, linux-kernel, Vivek Goyal, Matthew Garrett, Chen,
	Gong, Andrew Morton, dle-develop, Satoru Moriya

Hi Don,

>[reads the kernel/panic.c code] oh, I see this already exists, you would
>just move the smp_send_stop() command up a couple lines of code.
>
>[Side note] perhaps we should change the behaviour of smp_send_stop to use
>NMI and create a blacklist of machines to use the IRQ line instead.  I
>assume the list of broken machines is small as Red Hat has been kdumping
>kernels since 2.6.18 with little evidence that machines were failing
>because NMI wasn't working properly.

OK. I will develop a patch in accordance with your comment above.
In addition to that, I have to improve mtdoops/ramoops because
There are some blocking codes in them.

Seiji

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [RFC][PATCH -next] pstore: replace spin_lock with spin_trylock_irqsave in panic path
  2011-09-28 13:57         ` Don Zickus
@ 2011-09-28 22:30           ` Luck, Tony
  0 siblings, 0 replies; 10+ messages in thread
From: Luck, Tony @ 2011-09-28 22:30 UTC (permalink / raw)
  To: Don Zickus
  Cc: Seiji Aguchi, linux-kernel, Vivek Goyal, Matthew Garrett, Chen,
	Gong, Andrew Morton, dle-develop, Satoru Moriya

> That should be up to the backend, no?  ERST has two modes, only one which
> has a state machine.  The other is NVRAM which can probably handle
> simultaneous writes.  And I believe the EFI back-end can handle that as
> well.  That is why I was suggesting that the back-end return a failure.

ERST tries to provide a lot of flexibility to the platform on how to
make persistent space available.  If the platform has directly addressable
NVRAM - then ERST can point to it, so the "save" operation degenerates into
a simple write to the next available block.  But this isn't required. The
ERST buffer may be in normal memory, and the actions in the state machine
may trigger an SMI to make the BIOS copy it away to some safe place, or
the actions could ping a doorbell on a management controller which could
initiate a DMA transfer to pick up the buffer from memory.

-Tony

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2011-09-28 22:31 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-27 17:14 [RFC][PATCH -next] pstore: replace spin_lock with spin_trylock_irqsave in panic path Seiji Aguchi
2011-09-27 17:34 ` Don Zickus
2011-09-27 17:46   ` Luck, Tony
2011-09-27 17:59     ` Don Zickus
2011-09-27 19:02       ` Luck, Tony
2011-09-27 19:46         ` Seiji Aguchi
2011-09-28 14:09           ` Don Zickus
2011-09-28 18:55             ` Seiji Aguchi
2011-09-28 13:57         ` Don Zickus
2011-09-28 22:30           ` Luck, Tony

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.