All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86/mce: Always save severity in machine_check_poll
@ 2017-06-12 16:54 ` Yazen Ghannam
  0 siblings, 0 replies; 14+ messages in thread
From: Yazen Ghannam @ 2017-06-12 16:54 UTC (permalink / raw)
  To: linux-edac; +Cc: Borislav Petkov, Tony Luck, x86, linux-kernel, Yazen Ghannam

From: Yazen Ghannam <yazen.ghannam@amd.com>

Remove code that was used to decide whether to schedule work. The decision
to schedule work is made later, so this code is now only deciding if we
should save the error severity.

Save the severity since we have it, and let the notifier blocks decide if
they want to do anything.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index b58b778..6dde049 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -673,7 +673,6 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 {
 	bool error_seen = false;
 	struct mce m;
-	int severity;
 	int i;
 
 	this_cpu_inc(mce_poll_count);
@@ -710,11 +709,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 
 		mce_read_aux(&m, i);
 
-		severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
-
-		if (severity == MCE_DEFERRED_SEVERITY && mce_is_memory_error(&m))
-			if (m.status & MCI_STATUS_ADDRV)
-				m.severity = severity;
+		m.severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
 
 		/*
 		 * Don't get the IP here because it's unlikely to
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* x86/mce: Always save severity in machine_check_poll
@ 2017-06-12 16:54 ` Yazen Ghannam
  0 siblings, 0 replies; 14+ messages in thread
From: Yazen Ghannam @ 2017-06-12 16:54 UTC (permalink / raw)
  To: linux-edac; +Cc: Borislav Petkov, Tony Luck, x86, linux-kernel, Yazen Ghannam

From: Yazen Ghannam <yazen.ghannam@amd.com>

Remove code that was used to decide whether to schedule work. The decision
to schedule work is made later, so this code is now only deciding if we
should save the error severity.

Save the severity since we have it, and let the notifier blocks decide if
they want to do anything.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index b58b778..6dde049 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -673,7 +673,6 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 {
 	bool error_seen = false;
 	struct mce m;
-	int severity;
 	int i;
 
 	this_cpu_inc(mce_poll_count);
@@ -710,11 +709,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 
 		mce_read_aux(&m, i);
 
-		severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
-
-		if (severity == MCE_DEFERRED_SEVERITY && mce_is_memory_error(&m))
-			if (m.status & MCI_STATUS_ADDRV)
-				m.severity = severity;
+		m.severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
 
 		/*
 		 * Don't get the IP here because it's unlikely to

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/mce: Always save severity in machine_check_poll
@ 2017-06-12 18:07   ` Luck, Tony
  0 siblings, 0 replies; 14+ messages in thread
From: Luck, Tony @ 2017-06-12 18:07 UTC (permalink / raw)
  To: Yazen Ghannam; +Cc: linux-edac, Borislav Petkov, x86, linux-kernel

On Mon, Jun 12, 2017 at 11:54:06AM -0500, Yazen Ghannam wrote:
> -		severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
> -
> -		if (severity == MCE_DEFERRED_SEVERITY && mce_is_memory_error(&m))
> -			if (m.status & MCI_STATUS_ADDRV)
> -				m.severity = severity;
> +		m.severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);

So that isn't quite the same. Before we only set m.severity for
memory errors where we had a valid address. Now you unconditionally
set it.

Maybe that's more useful. But it now needs an audit of the code
the registered notifiers to make sure they didn't assume that
severity set meant that this is a memory error.

-Tony

^ permalink raw reply	[flat|nested] 14+ messages in thread

* x86/mce: Always save severity in machine_check_poll
@ 2017-06-12 18:07   ` Luck, Tony
  0 siblings, 0 replies; 14+ messages in thread
From: Luck, Tony @ 2017-06-12 18:07 UTC (permalink / raw)
  To: Yazen Ghannam; +Cc: linux-edac, Borislav Petkov, x86, linux-kernel

On Mon, Jun 12, 2017 at 11:54:06AM -0500, Yazen Ghannam wrote:
> -		severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
> -
> -		if (severity == MCE_DEFERRED_SEVERITY && mce_is_memory_error(&m))
> -			if (m.status & MCI_STATUS_ADDRV)
> -				m.severity = severity;
> +		m.severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);

So that isn't quite the same. Before we only set m.severity for
memory errors where we had a valid address. Now you unconditionally
set it.

Maybe that's more useful. But it now needs an audit of the code
the registered notifiers to make sure they didn't assume that
severity set meant that this is a memory error.

-Tony
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] x86/mce: Always save severity in machine_check_poll
@ 2017-06-12 18:55     ` Yazen Ghannam
  0 siblings, 0 replies; 14+ messages in thread
From: Ghannam, Yazen @ 2017-06-12 18:55 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-edac, Borislav Petkov, x86, linux-kernel

> -----Original Message-----
> From: linux-edac-owner@vger.kernel.org [mailto:linux-edac-
> owner@vger.kernel.org] On Behalf Of Luck, Tony
> Sent: Monday, June 12, 2017 2:08 PM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; Borislav Petkov <bp@suse.de>;
> x86@kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] x86/mce: Always save severity in machine_check_poll
> 
> On Mon, Jun 12, 2017 at 11:54:06AM -0500, Yazen Ghannam wrote:
> > -		severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
> > -
> > -		if (severity == MCE_DEFERRED_SEVERITY &&
> mce_is_memory_error(&m))
> > -			if (m.status & MCI_STATUS_ADDRV)
> > -				m.severity = severity;
> > +		m.severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
> 
> So that isn't quite the same. Before we only set m.severity for memory errors
> where we had a valid address. Now you unconditionally set it.
> 
> Maybe that's more useful. But it now needs an audit of the code the
> registered notifiers to make sure they didn't assume that severity set meant
> that this is a memory error.
> 

Only the SRAO notifier checks for severity as far as I can tell, and it specifically
checks for m.serverity=MCE_SRAO_SEVERITY.

However, it looks like all the other actionable notifiers check for a memory
error either using mce_is_memory_error() or by checking the status bits
directly.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 14+ messages in thread

* x86/mce: Always save severity in machine_check_poll
@ 2017-06-12 18:55     ` Yazen Ghannam
  0 siblings, 0 replies; 14+ messages in thread
From: Yazen Ghannam @ 2017-06-12 18:55 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-edac, Borislav Petkov, x86, linux-kernel

> -----Original Message-----
> From: linux-edac-owner@vger.kernel.org [mailto:linux-edac-
> owner@vger.kernel.org] On Behalf Of Luck, Tony
> Sent: Monday, June 12, 2017 2:08 PM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; Borislav Petkov <bp@suse.de>;
> x86@kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] x86/mce: Always save severity in machine_check_poll
> 
> On Mon, Jun 12, 2017 at 11:54:06AM -0500, Yazen Ghannam wrote:
> > -		severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
> > -
> > -		if (severity == MCE_DEFERRED_SEVERITY &&
> mce_is_memory_error(&m))
> > -			if (m.status & MCI_STATUS_ADDRV)
> > -				m.severity = severity;
> > +		m.severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
> 
> So that isn't quite the same. Before we only set m.severity for memory errors
> where we had a valid address. Now you unconditionally set it.
> 
> Maybe that's more useful. But it now needs an audit of the code the
> registered notifiers to make sure they didn't assume that severity set meant
> that this is a memory error.
> 

Only the SRAO notifier checks for severity as far as I can tell, and it specifically
checks for m.serverity=MCE_SRAO_SEVERITY.

However, it looks like all the other actionable notifiers check for a memory
error either using mce_is_memory_error() or by checking the status bits
directly.

Thanks,
Yazen
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/mce: Always save severity in machine_check_poll
@ 2017-06-14 14:20   ` Borislav Petkov
  0 siblings, 0 replies; 14+ messages in thread
From: Borislav Petkov @ 2017-06-14 14:20 UTC (permalink / raw)
  To: Yazen Ghannam; +Cc: linux-edac, Tony Luck, x86, linux-kernel

On Mon, Jun 12, 2017 at 11:54:06AM -0500, Yazen Ghannam wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> Remove code that was used to decide whether to schedule work. The decision

???

I'm missing a *lot* of background in order to understand what that sentence
means.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* x86/mce: Always save severity in machine_check_poll
@ 2017-06-14 14:20   ` Borislav Petkov
  0 siblings, 0 replies; 14+ messages in thread
From: Borislav Petkov @ 2017-06-14 14:20 UTC (permalink / raw)
  To: Yazen Ghannam; +Cc: linux-edac, Tony Luck, x86, linux-kernel

On Mon, Jun 12, 2017 at 11:54:06AM -0500, Yazen Ghannam wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> Remove code that was used to decide whether to schedule work. The decision

???

I'm missing a *lot* of background in order to understand what that sentence
means.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] x86/mce: Always save severity in machine_check_poll
@ 2017-06-16 14:49     ` Yazen Ghannam
  0 siblings, 0 replies; 14+ messages in thread
From: Ghannam, Yazen @ 2017-06-16 14:49 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, Tony Luck, x86, linux-kernel

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Borislav Petkov
> Sent: Wednesday, June 14, 2017 10:21 AM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; Tony Luck <tony.luck@intel.com>;
> x86@kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] x86/mce: Always save severity in machine_check_poll
> 
> On Mon, Jun 12, 2017 at 11:54:06AM -0500, Yazen Ghannam wrote:
> > From: Yazen Ghannam <yazen.ghannam@amd.com>
> >
> > Remove code that was used to decide whether to schedule work. The
> > decision
> 
> ???
> 
> I'm missing a *lot* of background in order to understand what that sentence
> means.
> 

The code block being removed here was added in the following commit to decide
whether or not to schedule work.

fa92c58 x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll

The following commit based the decision to schedule work on if we have a usable
address and made this decision later in machine_check_poll().

8b38937b x86/mce: Do not enter deferred errors into the generic pool twice

Then the following commit removed m.usable_addr from the code block.

c0ec382 x86/RAS: Remove mce.usable_addr

So now this code block just decides whether or not to save the severity. We can
remove this block, since the original purpose of this code (to schedule work) is no
longer happening.

Tony has a concern that some notifiers may assume that the severity being
set means that the error is a memory error. As far as I can tell, the only notifier
that uses severity is the SRAO notifier and it doesn't make an assumption.

We schedule work if we want to log the error or if we have a usable address.
So there's no reason not to save the severity anymore.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 14+ messages in thread

* x86/mce: Always save severity in machine_check_poll
@ 2017-06-16 14:49     ` Yazen Ghannam
  0 siblings, 0 replies; 14+ messages in thread
From: Yazen Ghannam @ 2017-06-16 14:49 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, Tony Luck, x86, linux-kernel

PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBsaW51eC1rZXJuZWwtb3duZXJA
dmdlci5rZXJuZWwub3JnIFttYWlsdG86bGludXgta2VybmVsLQ0KPiBvd25lckB2Z2VyLmtlcm5l
bC5vcmddIE9uIEJlaGFsZiBPZiBCb3Jpc2xhdiBQZXRrb3YNCj4gU2VudDogV2VkbmVzZGF5LCBK
dW5lIDE0LCAyMDE3IDEwOjIxIEFNDQo+IFRvOiBHaGFubmFtLCBZYXplbiA8WWF6ZW4uR2hhbm5h
bUBhbWQuY29tPg0KPiBDYzogbGludXgtZWRhY0B2Z2VyLmtlcm5lbC5vcmc7IFRvbnkgTHVjayA8
dG9ueS5sdWNrQGludGVsLmNvbT47DQo+IHg4NkBrZXJuZWwub3JnOyBsaW51eC1rZXJuZWxAdmdl
ci5rZXJuZWwub3JnDQo+IFN1YmplY3Q6IFJlOiBbUEFUQ0hdIHg4Ni9tY2U6IEFsd2F5cyBzYXZl
IHNldmVyaXR5IGluIG1hY2hpbmVfY2hlY2tfcG9sbA0KPiANCj4gT24gTW9uLCBKdW4gMTIsIDIw
MTcgYXQgMTE6NTQ6MDZBTSAtMDUwMCwgWWF6ZW4gR2hhbm5hbSB3cm90ZToNCj4gPiBGcm9tOiBZ
YXplbiBHaGFubmFtIDx5YXplbi5naGFubmFtQGFtZC5jb20+DQo+ID4NCj4gPiBSZW1vdmUgY29k
ZSB0aGF0IHdhcyB1c2VkIHRvIGRlY2lkZSB3aGV0aGVyIHRvIHNjaGVkdWxlIHdvcmsuIFRoZQ0K
PiA+IGRlY2lzaW9uDQo+IA0KPiA/Pz8NCj4gDQo+IEknbSBtaXNzaW5nIGEgKmxvdCogb2YgYmFj
a2dyb3VuZCBpbiBvcmRlciB0byB1bmRlcnN0YW5kIHdoYXQgdGhhdCBzZW50ZW5jZQ0KPiBtZWFu
cy4NCj4gDQoNClRoZSBjb2RlIGJsb2NrIGJlaW5nIHJlbW92ZWQgaGVyZSB3YXMgYWRkZWQgaW4g
dGhlIGZvbGxvd2luZyBjb21taXQgdG8gZGVjaWRlDQp3aGV0aGVyIG9yIG5vdCB0byBzY2hlZHVs
ZSB3b3JrLg0KDQpmYTkyYzU4IHg4NiwgbWNlOiBTdXBwb3J0IG1lbW9yeSBlcnJvciByZWNvdmVy
eSBmb3IgYm90aCBVQ05BIGFuZCBEZWZlcnJlZCBlcnJvciBpbiBtYWNoaW5lX2NoZWNrX3BvbGwN
Cg0KVGhlIGZvbGxvd2luZyBjb21taXQgYmFzZWQgdGhlIGRlY2lzaW9uIHRvIHNjaGVkdWxlIHdv
cmsgb24gaWYgd2UgaGF2ZSBhIHVzYWJsZQ0KYWRkcmVzcyBhbmQgbWFkZSB0aGlzIGRlY2lzaW9u
IGxhdGVyIGluIG1hY2hpbmVfY2hlY2tfcG9sbCgpLg0KDQo4YjM4OTM3YiB4ODYvbWNlOiBEbyBu
b3QgZW50ZXIgZGVmZXJyZWQgZXJyb3JzIGludG8gdGhlIGdlbmVyaWMgcG9vbCB0d2ljZQ0KDQpU
aGVuIHRoZSBmb2xsb3dpbmcgY29tbWl0IHJlbW92ZWQgbS51c2FibGVfYWRkciBmcm9tIHRoZSBj
b2RlIGJsb2NrLg0KDQpjMGVjMzgyIHg4Ni9SQVM6IFJlbW92ZSBtY2UudXNhYmxlX2FkZHINCg0K
U28gbm93IHRoaXMgY29kZSBibG9jayBqdXN0IGRlY2lkZXMgd2hldGhlciBvciBub3QgdG8gc2F2
ZSB0aGUgc2V2ZXJpdHkuIFdlIGNhbg0KcmVtb3ZlIHRoaXMgYmxvY2ssIHNpbmNlIHRoZSBvcmln
aW5hbCBwdXJwb3NlIG9mIHRoaXMgY29kZSAodG8gc2NoZWR1bGUgd29yaykgaXMgbm8NCmxvbmdl
ciBoYXBwZW5pbmcuDQoNClRvbnkgaGFzIGEgY29uY2VybiB0aGF0IHNvbWUgbm90aWZpZXJzIG1h
eSBhc3N1bWUgdGhhdCB0aGUgc2V2ZXJpdHkgYmVpbmcNCnNldCBtZWFucyB0aGF0IHRoZSBlcnJv
ciBpcyBhIG1lbW9yeSBlcnJvci4gQXMgZmFyIGFzIEkgY2FuIHRlbGwsIHRoZSBvbmx5IG5vdGlm
aWVyDQp0aGF0IHVzZXMgc2V2ZXJpdHkgaXMgdGhlIFNSQU8gbm90aWZpZXIgYW5kIGl0IGRvZXNu
J3QgbWFrZSBhbiBhc3N1bXB0aW9uLg0KDQpXZSBzY2hlZHVsZSB3b3JrIGlmIHdlIHdhbnQgdG8g
bG9nIHRoZSBlcnJvciBvciBpZiB3ZSBoYXZlIGEgdXNhYmxlIGFkZHJlc3MuDQpTbyB0aGVyZSdz
IG5vIHJlYXNvbiBub3QgdG8gc2F2ZSB0aGUgc2V2ZXJpdHkgYW55bW9yZS4NCg0KVGhhbmtzLA0K
WWF6ZW4NCg0K
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/mce: Always save severity in machine_check_poll
@ 2017-06-19 16:48       ` Borislav Petkov
  0 siblings, 0 replies; 14+ messages in thread
From: Borislav Petkov @ 2017-06-19 16:48 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, Tony Luck, x86, linux-kernel

On Fri, Jun 16, 2017 at 02:49:58PM +0000, Ghannam, Yazen wrote:
> The code block being removed here was added in the following commit to decide
> whether or not to schedule work.
> 
> fa92c58 x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll

Are you sure?

That commit makes the code log poison memory errors which have a valid
address. The work being scheduled is only an implementation detail for
how we're doing the logging and calling to the userspace handlers to
start consuming.

Btw, do

$ git config --replace-all core.abbrev 12

to set your commit abbreviation length to 12 so that there's no
ambiguity for the recent years. :)

> The following commit based the decision to schedule work on if we have a usable
> address and made this decision later in machine_check_poll().

> 8b38937b x86/mce: Do not enter deferred errors into the generic pool twice

Yes, because we were logging them twice.

> Then the following commit removed m.usable_addr from the code block.
> 
> c0ec382 x86/RAS: Remove mce.usable_addr

That's just a cleanup.

> So now this code block just decides whether or not to save the severity. We can
> remove this block, since the original purpose of this code (to schedule work) is no
> longer happening.

Not really. The original purpose of this code was to log deferred errors
with AddrV set - the work scheduling is just an implementation detail,
as I mentioned above.

> Tony has a concern that some notifiers may assume that the severity being
> set means that the error is a memory error. As far as I can tell, the only notifier
> that uses severity is the SRAO notifier and it doesn't make an assumption.
> 
> We schedule work if we want to log the error or if we have a usable address.
> So there's no reason not to save the severity anymore.

Just forget the work scheduling - it is only a marginal implementation
thing - you only want to say that we want to log the severity since we
already have it. And the most important aspect in your v2 commit message
is explaining *why* we want the severity. And yes, it is possible to
simply say, this is a cleanup and we can just as well always save it
because it is a valid reason. And it makes sense too.

All I'm trying to remind you of, is, you should explain in your commit
message what the issue is and *why* you're doing the change. When the
first sentence of your commit message is "Remove code that was used
to decide whether to schedule work." I'm starting to scratch my head,
wondering, what the actual problem is and what you're trying to fix.

Ok?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* x86/mce: Always save severity in machine_check_poll
@ 2017-06-19 16:48       ` Borislav Petkov
  0 siblings, 0 replies; 14+ messages in thread
From: Borislav Petkov @ 2017-06-19 16:48 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, Tony Luck, x86, linux-kernel

On Fri, Jun 16, 2017 at 02:49:58PM +0000, Ghannam, Yazen wrote:
> The code block being removed here was added in the following commit to decide
> whether or not to schedule work.
> 
> fa92c58 x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll

Are you sure?

That commit makes the code log poison memory errors which have a valid
address. The work being scheduled is only an implementation detail for
how we're doing the logging and calling to the userspace handlers to
start consuming.

Btw, do

$ git config --replace-all core.abbrev 12

to set your commit abbreviation length to 12 so that there's no
ambiguity for the recent years. :)

> The following commit based the decision to schedule work on if we have a usable
> address and made this decision later in machine_check_poll().

> 8b38937b x86/mce: Do not enter deferred errors into the generic pool twice

Yes, because we were logging them twice.

> Then the following commit removed m.usable_addr from the code block.
> 
> c0ec382 x86/RAS: Remove mce.usable_addr

That's just a cleanup.

> So now this code block just decides whether or not to save the severity. We can
> remove this block, since the original purpose of this code (to schedule work) is no
> longer happening.

Not really. The original purpose of this code was to log deferred errors
with AddrV set - the work scheduling is just an implementation detail,
as I mentioned above.

> Tony has a concern that some notifiers may assume that the severity being
> set means that the error is a memory error. As far as I can tell, the only notifier
> that uses severity is the SRAO notifier and it doesn't make an assumption.
> 
> We schedule work if we want to log the error or if we have a usable address.
> So there's no reason not to save the severity anymore.

Just forget the work scheduling - it is only a marginal implementation
thing - you only want to say that we want to log the severity since we
already have it. And the most important aspect in your v2 commit message
is explaining *why* we want the severity. And yes, it is possible to
simply say, this is a cleanup and we can just as well always save it
because it is a valid reason. And it makes sense too.

All I'm trying to remind you of, is, you should explain in your commit
message what the issue is and *why* you're doing the change. When the
first sentence of your commit message is "Remove code that was used
to decide whether to schedule work." I'm starting to scratch my head,
wondering, what the actual problem is and what you're trying to fix.

Ok?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] x86/mce: Always save severity in machine_check_poll
@ 2017-06-21 19:41         ` Yazen Ghannam
  0 siblings, 0 replies; 14+ messages in thread
From: Ghannam, Yazen @ 2017-06-21 19:41 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, Tony Luck, x86, linux-kernel

> -----Original Message-----
> From: Borislav Petkov [mailto:bp@alien8.de]
> Sent: Monday, June 19, 2017 12:49 PM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; Tony Luck <tony.luck@intel.com>;
> x86@kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] x86/mce: Always save severity in machine_check_poll
> 
> 
> Just forget the work scheduling - it is only a marginal implementation thing -
> you only want to say that we want to log the severity since we already have it.
> And the most important aspect in your v2 commit message is explaining
> *why* we want the severity. And yes, it is possible to simply say, this is a
> cleanup and we can just as well always save it because it is a valid reason. And
> it makes sense too.
> 
> All I'm trying to remind you of, is, you should explain in your commit message
> what the issue is and *why* you're doing the change. When the first sentence
> of your commit message is "Remove code that was used to decide whether to
> schedule work." I'm starting to scratch my head, wondering, what the actual
> problem is and what you're trying to fix.
> 
> Ok?
> 

Yep, got it.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 14+ messages in thread

* x86/mce: Always save severity in machine_check_poll
@ 2017-06-21 19:41         ` Yazen Ghannam
  0 siblings, 0 replies; 14+ messages in thread
From: Yazen Ghannam @ 2017-06-21 19:41 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, Tony Luck, x86, linux-kernel

> -----Original Message-----
> From: Borislav Petkov [mailto:bp@alien8.de]
> Sent: Monday, June 19, 2017 12:49 PM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; Tony Luck <tony.luck@intel.com>;
> x86@kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] x86/mce: Always save severity in machine_check_poll
> 
> 
> Just forget the work scheduling - it is only a marginal implementation thing -
> you only want to say that we want to log the severity since we already have it.
> And the most important aspect in your v2 commit message is explaining
> *why* we want the severity. And yes, it is possible to simply say, this is a
> cleanup and we can just as well always save it because it is a valid reason. And
> it makes sense too.
> 
> All I'm trying to remind you of, is, you should explain in your commit message
> what the issue is and *why* you're doing the change. When the first sentence
> of your commit message is "Remove code that was used to decide whether to
> schedule work." I'm starting to scratch my head, wondering, what the actual
> problem is and what you're trying to fix.
> 
> Ok?
> 

Yep, got it.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-06-21 19:41 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-12 16:54 [PATCH] x86/mce: Always save severity in machine_check_poll Yazen Ghannam
2017-06-12 16:54 ` Yazen Ghannam
2017-06-12 18:07 ` [PATCH] " Luck, Tony
2017-06-12 18:07   ` Luck, Tony
2017-06-12 18:55   ` [PATCH] " Ghannam, Yazen
2017-06-12 18:55     ` Yazen Ghannam
2017-06-14 14:20 ` [PATCH] " Borislav Petkov
2017-06-14 14:20   ` Borislav Petkov
2017-06-16 14:49   ` [PATCH] " Ghannam, Yazen
2017-06-16 14:49     ` Yazen Ghannam
2017-06-19 16:48     ` [PATCH] " Borislav Petkov
2017-06-19 16:48       ` Borislav Petkov
2017-06-21 19:41       ` [PATCH] " Ghannam, Yazen
2017-06-21 19:41         ` Yazen Ghannam

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.