linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/mce: Always save severity in machine_check_poll
@ 2017-06-12 16:54 Yazen Ghannam
  2017-06-12 18:07 ` Luck, Tony
  2017-06-14 14:20 ` Borislav Petkov
  0 siblings, 2 replies; 7+ messages in thread
From: Yazen Ghannam @ 2017-06-12 16:54 UTC (permalink / raw)
  To: linux-edac; +Cc: Borislav Petkov, Tony Luck, x86, linux-kernel, Yazen Ghannam

From: Yazen Ghannam <yazen.ghannam@amd.com>

Remove code that was used to decide whether to schedule work. The decision
to schedule work is made later, so this code is now only deciding if we
should save the error severity.

Save the severity since we have it, and let the notifier blocks decide if
they want to do anything.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index b58b778..6dde049 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -673,7 +673,6 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 {
 	bool error_seen = false;
 	struct mce m;
-	int severity;
 	int i;
 
 	this_cpu_inc(mce_poll_count);
@@ -710,11 +709,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 
 		mce_read_aux(&m, i);
 
-		severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
-
-		if (severity == MCE_DEFERRED_SEVERITY && mce_is_memory_error(&m))
-			if (m.status & MCI_STATUS_ADDRV)
-				m.severity = severity;
+		m.severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
 
 		/*
 		 * Don't get the IP here because it's unlikely to
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/mce: Always save severity in machine_check_poll
  2017-06-12 16:54 [PATCH] x86/mce: Always save severity in machine_check_poll Yazen Ghannam
@ 2017-06-12 18:07 ` Luck, Tony
  2017-06-12 18:55   ` Ghannam, Yazen
  2017-06-14 14:20 ` Borislav Petkov
  1 sibling, 1 reply; 7+ messages in thread
From: Luck, Tony @ 2017-06-12 18:07 UTC (permalink / raw)
  To: Yazen Ghannam; +Cc: linux-edac, Borislav Petkov, x86, linux-kernel

On Mon, Jun 12, 2017 at 11:54:06AM -0500, Yazen Ghannam wrote:
> -		severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
> -
> -		if (severity == MCE_DEFERRED_SEVERITY && mce_is_memory_error(&m))
> -			if (m.status & MCI_STATUS_ADDRV)
> -				m.severity = severity;
> +		m.severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);

So that isn't quite the same. Before we only set m.severity for
memory errors where we had a valid address. Now you unconditionally
set it.

Maybe that's more useful. But it now needs an audit of the code
the registered notifiers to make sure they didn't assume that
severity set meant that this is a memory error.

-Tony

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] x86/mce: Always save severity in machine_check_poll
  2017-06-12 18:07 ` Luck, Tony
@ 2017-06-12 18:55   ` Ghannam, Yazen
  0 siblings, 0 replies; 7+ messages in thread
From: Ghannam, Yazen @ 2017-06-12 18:55 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-edac, Borislav Petkov, x86, linux-kernel

> -----Original Message-----
> From: linux-edac-owner@vger.kernel.org [mailto:linux-edac-
> owner@vger.kernel.org] On Behalf Of Luck, Tony
> Sent: Monday, June 12, 2017 2:08 PM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; Borislav Petkov <bp@suse.de>;
> x86@kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] x86/mce: Always save severity in machine_check_poll
> 
> On Mon, Jun 12, 2017 at 11:54:06AM -0500, Yazen Ghannam wrote:
> > -		severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
> > -
> > -		if (severity == MCE_DEFERRED_SEVERITY &&
> mce_is_memory_error(&m))
> > -			if (m.status & MCI_STATUS_ADDRV)
> > -				m.severity = severity;
> > +		m.severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
> 
> So that isn't quite the same. Before we only set m.severity for memory errors
> where we had a valid address. Now you unconditionally set it.
> 
> Maybe that's more useful. But it now needs an audit of the code the
> registered notifiers to make sure they didn't assume that severity set meant
> that this is a memory error.
> 

Only the SRAO notifier checks for severity as far as I can tell, and it specifically
checks for m.serverity=MCE_SRAO_SEVERITY.

However, it looks like all the other actionable notifiers check for a memory
error either using mce_is_memory_error() or by checking the status bits
directly.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/mce: Always save severity in machine_check_poll
  2017-06-12 16:54 [PATCH] x86/mce: Always save severity in machine_check_poll Yazen Ghannam
  2017-06-12 18:07 ` Luck, Tony
@ 2017-06-14 14:20 ` Borislav Petkov
  2017-06-16 14:49   ` Ghannam, Yazen
  1 sibling, 1 reply; 7+ messages in thread
From: Borislav Petkov @ 2017-06-14 14:20 UTC (permalink / raw)
  To: Yazen Ghannam; +Cc: linux-edac, Tony Luck, x86, linux-kernel

On Mon, Jun 12, 2017 at 11:54:06AM -0500, Yazen Ghannam wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> Remove code that was used to decide whether to schedule work. The decision

???

I'm missing a *lot* of background in order to understand what that sentence
means.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] x86/mce: Always save severity in machine_check_poll
  2017-06-14 14:20 ` Borislav Petkov
@ 2017-06-16 14:49   ` Ghannam, Yazen
  2017-06-19 16:48     ` Borislav Petkov
  0 siblings, 1 reply; 7+ messages in thread
From: Ghannam, Yazen @ 2017-06-16 14:49 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, Tony Luck, x86, linux-kernel

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Borislav Petkov
> Sent: Wednesday, June 14, 2017 10:21 AM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; Tony Luck <tony.luck@intel.com>;
> x86@kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] x86/mce: Always save severity in machine_check_poll
> 
> On Mon, Jun 12, 2017 at 11:54:06AM -0500, Yazen Ghannam wrote:
> > From: Yazen Ghannam <yazen.ghannam@amd.com>
> >
> > Remove code that was used to decide whether to schedule work. The
> > decision
> 
> ???
> 
> I'm missing a *lot* of background in order to understand what that sentence
> means.
> 

The code block being removed here was added in the following commit to decide
whether or not to schedule work.

fa92c58 x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll

The following commit based the decision to schedule work on if we have a usable
address and made this decision later in machine_check_poll().

8b38937b x86/mce: Do not enter deferred errors into the generic pool twice

Then the following commit removed m.usable_addr from the code block.

c0ec382 x86/RAS: Remove mce.usable_addr

So now this code block just decides whether or not to save the severity. We can
remove this block, since the original purpose of this code (to schedule work) is no
longer happening.

Tony has a concern that some notifiers may assume that the severity being
set means that the error is a memory error. As far as I can tell, the only notifier
that uses severity is the SRAO notifier and it doesn't make an assumption.

We schedule work if we want to log the error or if we have a usable address.
So there's no reason not to save the severity anymore.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/mce: Always save severity in machine_check_poll
  2017-06-16 14:49   ` Ghannam, Yazen
@ 2017-06-19 16:48     ` Borislav Petkov
  2017-06-21 19:41       ` Ghannam, Yazen
  0 siblings, 1 reply; 7+ messages in thread
From: Borislav Petkov @ 2017-06-19 16:48 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, Tony Luck, x86, linux-kernel

On Fri, Jun 16, 2017 at 02:49:58PM +0000, Ghannam, Yazen wrote:
> The code block being removed here was added in the following commit to decide
> whether or not to schedule work.
> 
> fa92c58 x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll

Are you sure?

That commit makes the code log poison memory errors which have a valid
address. The work being scheduled is only an implementation detail for
how we're doing the logging and calling to the userspace handlers to
start consuming.

Btw, do

$ git config --replace-all core.abbrev 12

to set your commit abbreviation length to 12 so that there's no
ambiguity for the recent years. :)

> The following commit based the decision to schedule work on if we have a usable
> address and made this decision later in machine_check_poll().

> 8b38937b x86/mce: Do not enter deferred errors into the generic pool twice

Yes, because we were logging them twice.

> Then the following commit removed m.usable_addr from the code block.
> 
> c0ec382 x86/RAS: Remove mce.usable_addr

That's just a cleanup.

> So now this code block just decides whether or not to save the severity. We can
> remove this block, since the original purpose of this code (to schedule work) is no
> longer happening.

Not really. The original purpose of this code was to log deferred errors
with AddrV set - the work scheduling is just an implementation detail,
as I mentioned above.

> Tony has a concern that some notifiers may assume that the severity being
> set means that the error is a memory error. As far as I can tell, the only notifier
> that uses severity is the SRAO notifier and it doesn't make an assumption.
> 
> We schedule work if we want to log the error or if we have a usable address.
> So there's no reason not to save the severity anymore.

Just forget the work scheduling - it is only a marginal implementation
thing - you only want to say that we want to log the severity since we
already have it. And the most important aspect in your v2 commit message
is explaining *why* we want the severity. And yes, it is possible to
simply say, this is a cleanup and we can just as well always save it
because it is a valid reason. And it makes sense too.

All I'm trying to remind you of, is, you should explain in your commit
message what the issue is and *why* you're doing the change. When the
first sentence of your commit message is "Remove code that was used
to decide whether to schedule work." I'm starting to scratch my head,
wondering, what the actual problem is and what you're trying to fix.

Ok?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] x86/mce: Always save severity in machine_check_poll
  2017-06-19 16:48     ` Borislav Petkov
@ 2017-06-21 19:41       ` Ghannam, Yazen
  0 siblings, 0 replies; 7+ messages in thread
From: Ghannam, Yazen @ 2017-06-21 19:41 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, Tony Luck, x86, linux-kernel

> -----Original Message-----
> From: Borislav Petkov [mailto:bp@alien8.de]
> Sent: Monday, June 19, 2017 12:49 PM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; Tony Luck <tony.luck@intel.com>;
> x86@kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] x86/mce: Always save severity in machine_check_poll
> 
> 
> Just forget the work scheduling - it is only a marginal implementation thing -
> you only want to say that we want to log the severity since we already have it.
> And the most important aspect in your v2 commit message is explaining
> *why* we want the severity. And yes, it is possible to simply say, this is a
> cleanup and we can just as well always save it because it is a valid reason. And
> it makes sense too.
> 
> All I'm trying to remind you of, is, you should explain in your commit message
> what the issue is and *why* you're doing the change. When the first sentence
> of your commit message is "Remove code that was used to decide whether to
> schedule work." I'm starting to scratch my head, wondering, what the actual
> problem is and what you're trying to fix.
> 
> Ok?
> 

Yep, got it.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-06-21 19:41 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-12 16:54 [PATCH] x86/mce: Always save severity in machine_check_poll Yazen Ghannam
2017-06-12 18:07 ` Luck, Tony
2017-06-12 18:55   ` Ghannam, Yazen
2017-06-14 14:20 ` Borislav Petkov
2017-06-16 14:49   ` Ghannam, Yazen
2017-06-19 16:48     ` Borislav Petkov
2017-06-21 19:41       ` Ghannam, Yazen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).