pstore dump inside an nmi handler

From: Don Zickus @ 2011-07-08 20:17 UTC
  To: tony.luck; +Cc: mjg, linux-kernel

Hi Tony,

I was playing with the APEI EINJ module, injecting errors trying to
capture a GHES record, then panic into a kdump kernel and reboot.

Matthew brought to my attention that pstore should capture an error record
on the panic path using kmsg_dump().  After injecting an error with EINJ,
I went to check to see if there was a pstore entry.  There wasn't.

Playing on another box, I noticed the machine double faulted and didn't
even make it into a kdump kernel.

Upon investigation, I noticed that when a fatal error occurs on the
platform, it generates an NMI that is handled by the ghes_nmi_handler.
This handler calls panic(), which calls kmsg_dump(), which calls
pstore_dump().

The first thing pstore_dump() does is take a mutex with mutex_lock()
(inside an NMI handler).  This seems to be the root cause of my problems.
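
To make the hazard concrete: a mutex may sleep, which an NMI handler cannot
do, and if the code the NMI interrupted already holds the lock, that code
cannot run again to release it until the handler returns.  The following is
only a userspace model of that second case, not kernel code.  The names
model_lock, model_trylock(), model_unlock() and model_dump_from_nmi() are
hypothetical, and a C11 atomic_flag stands in for the lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lock taken at the top of pstore_dump(). */
static atomic_flag model_lock = ATOMIC_FLAG_INIT;

/* Returns true if the lock was free and is now held by the caller. */
static bool model_trylock(void)
{
    return !atomic_flag_test_and_set_explicit(&model_lock,
                                              memory_order_acquire);
}

static void model_unlock(void)
{
    atomic_flag_clear_explicit(&model_lock, memory_order_release);
}

/*
 * Models a dump attempt made from NMI context.  If the interrupted code
 * holds the lock, waiting for it would never finish, so the model reports
 * that case instead of waiting.
 */
static bool model_dump_from_nmi(void)
{
    if (!model_trylock())
        return false;   /* a blocking lock would hang at this point */
    /* ... the record would be written here ... */
    model_unlock();
    return true;
}
```

In this model the dump succeeds when the lock is free and is skipped when
the "interrupted" side holds it, which is the case where a blocking lock
would never return.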

I am not familiar enough with pstore to just modify its locking, so I
wanted to ask you.

My first thought was to wrap the mutex_lock() with an 'if (!in_nmi())', but
that seemed kinda hacky.  Then I wondered whether there is a way to do this
locklessly or atomically, since I think you are only dealing with whole
blocks.  I don't know.
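
One way to picture the lockless idea is an atomic counter that hands out
whole blocks, so that no writer ever waits on another.  This is a minimal
userspace sketch under that assumption only.  MODEL_NBLOCKS,
model_next_block and model_claim_block() are hypothetical names, and the
message does not establish whether a pstore backend can be driven this way.

```c
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_NBLOCKS 4   /* hypothetical number of record slots */

/* Index of the next unclaimed block; static storage starts at zero. */
static atomic_size_t model_next_block;

/*
 * Claims one whole block without taking a lock.  Each caller gets a
 * distinct index from the atomic increment.  Returns the block index,
 * or -1 once all blocks have been handed out.
 */
static int model_claim_block(void)
{
    size_t idx = atomic_fetch_add_explicit(&model_next_block, 1,
                                           memory_order_relaxed);
    return idx < MODEL_NBLOCKS ? (int)idx : -1;
}
```

Because the claim is a single atomic increment, it never blocks, which is
the property an NMI path needs.  The sketch leaves out everything about
actually writing the record to the claimed block.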

Wanted to give you a heads up and seek your thoughts.  I am willing to
hack up some code and test. :-)

Cheers,
Don
