linux-kernel.vger.kernel.org archive mirror
* [GIT PULL] MCE recovery changes
@ 2012-01-24 23:06 Luck, Tony
  2012-01-26 10:46 ` Ingo Molnar
  0 siblings, 1 reply; 6+ messages in thread
From: Luck, Tony @ 2012-01-24 23:06 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Borislav Petkov

Ingo,

Time to move these from "ras" tree to "tip" so they will be nicely
seasoned for the 3.4 merge window.

I tried to follow the instructions in
 http://git-blame.blogspot.com/2012/01/using-signed-tag-in-pull-requests.html
to use the fancy new signed tag scheme. If something is wrong here, then it
is most likely that I typoed (or thinkoed) while following them.

-Tony

The following changes since commit dc47ce90c3a822cd7c9e9339fe4d5f61dcb26b50:

  Linux 3.2-rc5 (2011-12-09 15:09:32 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git mce-recovery-for-tip

for you to fetch changes up to 5f7b88d51e89771f64c15903b96b5933dd0bc6d8:

  x86/mce: Recognise machine check bank signature for data path error (2012-01-03 12:07:07 -0800)

----------------------------------------------------------------
MCE recovery (data path only)

----------------------------------------------------------------
Tony Luck (6):
      HWPOISON: Clean up memory_failure() vs. __memory_failure()
      HWPOISON: Add code to handle "action required" errors.
      x86/mce: Create helper function to save addr/misc when needed
      x86/mce: Add mechanism to safely save information in MCE handler
      x86/mce: Handle "action required" errors
      x86/mce: Recognise machine check bank signature for data path error

 arch/x86/kernel/cpu/mcheck/mce-severity.c |   16 +++-
 arch/x86/kernel/cpu/mcheck/mce.c          |  179 ++++++++++++++++++++---------
 drivers/base/memory.c                     |    2 +-
 include/linux/mm.h                        |    4 +-
 mm/hwpoison-inject.c                      |    4 +-
 mm/madvise.c                              |    2 +-
 mm/memory-failure.c                       |   96 ++++++++--------
 7 files changed, 197 insertions(+), 106 deletions(-)


* Re: [GIT PULL] MCE recovery changes
  2012-01-24 23:06 [GIT PULL] MCE recovery changes Luck, Tony
@ 2012-01-26 10:46 ` Ingo Molnar
  2012-01-26 17:29   ` Tony Luck
  0 siblings, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2012-01-26 10:46 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-kernel, Borislav Petkov, Thomas Gleixner, H. Peter Anvin


* Luck, Tony <tony.luck@intel.com> wrote:

> Ingo,
> 
> Time to move these from "ras" tree to "tip" so they will be nicely
> seasoned for the 3.4 merge window.
> 
> I tried to follow the instructions in
>  http://git-blame.blogspot.com/2012/01/using-signed-tag-in-pull-requests.html
> to use the fancy new signed tag scheme. If something is wrong here, then it
> is most likely that I typoed (or thinkoed) while following them.

It worked perfectly.

> 
> -Tony
> 
> The following changes since commit dc47ce90c3a822cd7c9e9339fe4d5f61dcb26b50:
> 
>   Linux 3.2-rc5 (2011-12-09 15:09:32 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git mce-recovery-for-tip
> 
> for you to fetch changes up to 5f7b88d51e89771f64c15903b96b5933dd0bc6d8:
> 
>   x86/mce: Recognise machine check bank signature for data path error (2012-01-03 12:07:07 -0800)
> 
> ----------------------------------------------------------------
> MCE recovery (data path only)
> 
> ----------------------------------------------------------------
> Tony Luck (6):
>       HWPOISON: Clean up memory_failure() vs. __memory_failure()
>       HWPOISON: Add code to handle "action required" errors.
>       x86/mce: Create helper function to save addr/misc when needed
>       x86/mce: Add mechanism to safely save information in MCE handler
>       x86/mce: Handle "action required" errors
>       x86/mce: Recognise machine check bank signature for data path error
> 
>  arch/x86/kernel/cpu/mcheck/mce-severity.c |   16 +++-
>  arch/x86/kernel/cpu/mcheck/mce.c          |  179 ++++++++++++++++++++---------
>  drivers/base/memory.c                     |    2 +-
>  include/linux/mm.h                        |    4 +-
>  mm/hwpoison-inject.c                      |    4 +-
>  mm/madvise.c                              |    2 +-
>  mm/memory-failure.c                       |   96 ++++++++--------
>  7 files changed, 197 insertions(+), 106 deletions(-)

Pulled, thanks!

One thing i noticed was the magic constant 0x134:

+               SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|0x0134),

don't we want that defined a bit more clearly?

Thanks,

	Ingo


* Re: [GIT PULL] MCE recovery changes
  2012-01-26 10:46 ` Ingo Molnar
@ 2012-01-26 17:29   ` Tony Luck
  2012-01-26 18:28     ` Ingo Molnar
  0 siblings, 1 reply; 6+ messages in thread
From: Tony Luck @ 2012-01-26 17:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Borislav Petkov, Thomas Gleixner, H. Peter Anvin

On Thu, Jan 26, 2012 at 2:46 AM, Ingo Molnar <mingo@elte.hu> wrote:
> It worked perfectly.

Hurrah!

> Pulled, thanks!

Thank you.

> One thing i noticed was the magic constant 0x134:
>
> +               SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|0x0134),
>
> don't we want that defined a bit more clearly?

Stylistically it is compatible with the existing:
MASK(MCI_STATUS_OVER|MCI_UC_SAR|0xfff0, MCI_UC_S|0x00c0)
and
MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCACOD, MCI_UC_S|0x017a)

... but that's just a sign that they need some love too :-)

I'll see what I can do - but meaningful names will clearly be longer than
the hex constants that they replace - and I'm already pushing line length
limits here, so it will need more than a trivial restructure.

-Tony
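
[Editorial sketch, not part of the original message.] For readers who have not looked at mce-severity.c: each MASK(mask, result) pair in the severity table is, as far as this thread shows, a bit test of the form (status & mask) == result against the machine check bank status. The standalone sketch below illustrates that idea for the data load rule quoted above. The bit positions are assumed to match the kernel's MCI_STATUS_* definitions, and struct rule, rule_matches(), data_load_rule and check_data_load_rule() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bank status bits; positions assumed to match the kernel's definitions. */
#define MCI_STATUS_OVER  (1ULL << 62)	/* previous error overwritten */
#define MCI_STATUS_UC    (1ULL << 61)	/* uncorrected error */
#define MCI_STATUS_MISCV (1ULL << 59)	/* MISC register valid */
#define MCI_STATUS_ADDRV (1ULL << 58)	/* ADDR register valid */
#define MCI_STATUS_S     (1ULL << 56)	/* signaled */
#define MCI_STATUS_AR    (1ULL << 55)	/* action required */

/* Composite masks as they appear in the quoted severity table. */
#define MCI_UC_SAR (MCI_STATUS_UC|MCI_STATUS_S|MCI_STATUS_AR)
#define MCI_ADDR   (MCI_STATUS_ADDRV|MCI_STATUS_MISCV)
#define MCACOD     0xffff		/* low 16 bits: MCA error code */

/* Hypothetical stand-in for one severity table entry. */
struct rule {
	uint64_t mask;		/* which status bits the rule inspects */
	uint64_t result;	/* value those bits must have */
};

/* A rule matches when the inspected bits equal the expected value. */
bool rule_matches(const struct rule *r, uint64_t status)
{
	return (status & r->mask) == r->result;
}

/* The "data load error" rule, still using the bare 0x0134 constant. */
const struct rule data_load_rule = {
	.mask   = MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD,
	.result = MCI_UC_SAR|MCI_ADDR|0x0134,
};

void check_data_load_rule(void)
{
	uint64_t hit = MCI_UC_SAR|MCI_ADDR|0x0134;

	/* Exact signature matches. */
	assert(rule_matches(&data_load_rule, hit));
	/* OVER is in the mask but not in the result, so it must be clear. */
	assert(!rule_matches(&data_load_rule, hit | MCI_STATUS_OVER));
	/* A different MCACOD (0x0150) does not match. */
	assert(!rule_matches(&data_load_rule, MCI_UC_SAR|MCI_ADDR|0x0150));
	/* Bits outside the mask (here bit 63) are ignored. */
	assert(rule_matches(&data_load_rule, hit | (1ULL << 63)));
}
```

Because the hex code sits in the result half of the pair, giving it a name changes nothing about the match itself; only the line gets longer, which is the line-length concern raised above.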


* Re: [GIT PULL] MCE recovery changes
  2012-01-26 17:29   ` Tony Luck
@ 2012-01-26 18:28     ` Ingo Molnar
  2012-01-27  0:02       ` [PATCH] x86/mce: Replace hard coded hex constants with symbolic defines Tony Luck
  0 siblings, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2012-01-26 18:28 UTC (permalink / raw)
  To: Tony Luck; +Cc: linux-kernel, Borislav Petkov, Thomas Gleixner, H. Peter Anvin


* Tony Luck <tony.luck@intel.com> wrote:

> On Thu, Jan 26, 2012 at 2:46 AM, Ingo Molnar <mingo@elte.hu> wrote:
> > It worked perfectly.
> 
> Hurrah!
> 
> > Pulled, thanks!
> 
> Thank you.
> 
> > One thing i noticed was the magic constant 0x134:
> >
> > +               SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|0x0134),
> >
> > don't we want that defined a bit more clearly?
> 
> Stylistically it is compatible with the existing:
> MASK(MCI_STATUS_OVER|MCI_UC_SAR|0xfff0, MCI_UC_S|0x00c0)
> and
> MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCACOD, MCI_UC_S|0x017a)
> 
> ... but that's just a sign that they need some love too :-)
> 
> I'll see what I can do - but meaningful names will clearly be 
> longer than the hex constants that they replace - and I'm 
> already pushing line length limits here, so it will need more 
> than a trivial restructure.

Well, one option is to let the line grow - for such things it's 
ok up to 100 cols or so.

Thanks,

	Ingo


* [PATCH] x86/mce: Replace hard coded hex constants with symbolic defines
  2012-01-26 18:28     ` Ingo Molnar
@ 2012-01-27  0:02       ` Tony Luck
  2012-01-27 10:49         ` Ingo Molnar
  0 siblings, 1 reply; 6+ messages in thread
From: Tony Luck @ 2012-01-27  0:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Borislav Petkov, Thomas Gleixner, H. Peter Anvin

Magic constants like 0x0134 in code just invite questions: where do
they come from, what do they mean, and can they be changed?

Provide #defines for the architecturally defined MCACOD values,
with a reference to the Intel Software Developer's Manual, which
describes them.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---

Ingo: You said "ok up to 100 cols or so" ... this goes to 103.
If this is OK, you can either apply it from here on top of the
x86/mce branch in tip - or I can push it to the ras tree and
send you another pull.

 arch/x86/kernel/cpu/mcheck/mce-severity.c |   14 ++++++++++----
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index f6c92f9..0c82091 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -56,6 +56,12 @@ static struct severity {
 #define MCI_UC_SAR (MCI_STATUS_UC|MCI_STATUS_S|MCI_STATUS_AR)
 #define	MCI_ADDR (MCI_STATUS_ADDRV|MCI_STATUS_MISCV)
 #define MCACOD 0xffff
+/* Architecturally defined codes from SDM Vol. 3B Chapter 15 */
+#define MCACOD_SCRUB	0x00C0	/* 0xC0-0xCF Memory Scrubbing */
+#define MCACOD_SCRUBMSK	0xfff0
+#define MCACOD_L3WB	0x017A	/* L3 Explicit Writeback */
+#define MCACOD_DATA	0x0134	/* Data Load */
+#define MCACOD_INSTR	0x0150	/* Instruction Fetch */
 
 	MCESEV(
 		NO, "Invalid",
@@ -112,12 +118,12 @@ static struct severity {
 #ifdef	CONFIG_MEMORY_FAILURE
 	MCESEV(
 		KEEP, "HT thread notices Action required: data load error",
-		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|0x0134),
+		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_DATA),
 		MCGMASK(MCG_STATUS_EIPV, 0)
 		),
 	MCESEV(
 		AR, "Action required: data load error",
-		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|0x0134),
+		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_DATA),
 		USER
 		),
 #endif
@@ -129,11 +135,11 @@ static struct severity {
 	/* known AO MCACODs: */
 	MCESEV(
 		AO, "Action optional: memory scrubbing error",
-		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|0xfff0, MCI_UC_S|0x00c0)
+		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCACOD_SCRUBMSK, MCI_UC_S|MCACOD_SCRUB)
 		),
 	MCESEV(
 		AO, "Action optional: last level cache writeback error",
-		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCACOD, MCI_UC_S|0x017a)
+		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCACOD, MCI_UC_S|MCACOD_L3WB)
 		),
 	MCESEV(
 		SOME, "Action optional: unknown MCACOD",
-- 
1.7.9.rc2.1.g69204
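
[Editorial sketch, not part of the original patch.] The patch names two kinds of MCACOD match: MCACOD_SCRUB is compared under the narrower MCACOD_SCRUBMSK (0xfff0), which ignores the low nibble and so covers the whole 0xC0-0xCF range noted in the comment, while MCACOD_L3WB is compared under the full 16-bit MCACOD mask and matches exactly one code. The standalone sketch below isolates just that low-16-bit comparison. The helpers is_scrub_code(), is_l3wb_code() and check_mcacod_ranges() are hypothetical and do not exist in the kernel, and the real table entries also test status flag bits that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the patch above. */
#define MCACOD		0xffff	/* full 16-bit MCA error code field */
#define MCACOD_SCRUB	0x00C0	/* 0xC0-0xCF Memory Scrubbing */
#define MCACOD_SCRUBMSK	0xfff0	/* ignore the low nibble */
#define MCACOD_L3WB	0x017A	/* L3 Explicit Writeback */

/* Hypothetical helper: range match, low nibble is "don't care". */
bool is_scrub_code(uint64_t status)
{
	return (status & MCACOD_SCRUBMSK) == MCACOD_SCRUB;
}

/* Hypothetical helper: exact match on all 16 code bits. */
bool is_l3wb_code(uint64_t status)
{
	return (status & MCACOD) == MCACOD_L3WB;
}

void check_mcacod_ranges(void)
{
	uint64_t code;

	/* Every code in 0xC0..0xCF is a scrubbing code. */
	for (code = 0xC0; code <= 0xCF; code++)
		assert(is_scrub_code(code));

	/* The neighbours just outside the range are not. */
	assert(!is_scrub_code(0xBF));
	assert(!is_scrub_code(0xD0));

	/* L3 writeback matches one code only. */
	assert(is_l3wb_code(0x017A));
	assert(!is_l3wb_code(0x017B));

	/* Bits above the low 16 do not affect either helper. */
	assert(is_scrub_code((1ULL << 61) | 0xC5));
}
```

Keeping the mask and the base value as separate defines mirrors the patch: the same comparison shape serves both the range match and the exact match, and only the mask differs.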



* Re: [PATCH] x86/mce: Replace hard coded hex constants with symbolic defines
  2012-01-27  0:02       ` [PATCH] x86/mce: Replace hard coded hex constants with symbolic defines Tony Luck
@ 2012-01-27 10:49         ` Ingo Molnar
  0 siblings, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2012-01-27 10:49 UTC (permalink / raw)
  To: Tony Luck; +Cc: linux-kernel, Borislav Petkov, Thomas Gleixner, H. Peter Anvin


* Tony Luck <tony.luck@intel.com> wrote:

> Magic constants like 0x0134 in code just invite questions: where do
> they come from, what do they mean, and can they be changed?
> 
> Provide #defines for the architecturally defined MCACOD values,
> with a reference to the Intel Software Developer's Manual, which
> describes them.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> 
> Ingo: You said "ok up to 100 cols or so" ... this goes to 103.
> If this is OK, you can either apply it from here on top of the
> x86/mce branch in tip - or I can push it to the ras tree and
> send you another pull.

Yeah, looks good to me.

Thanks,

	Ingo

