All of lore.kernel.org
 help / color / mirror / Atom feed
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>,
	Prarit Bhargava <prarit@redhat.com>,
	"Vivek Goyal" <vgoyal@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Junichi Nomura <j-nomura@ce.jp.nec.com>,
	Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Subject: Re: [PATCH v4] x86: mce: kexec: switch MCE handler for kexec/kdump
Date: Thu, 5 Mar 2015 01:24:47 +0000	[thread overview]
Message-ID: <20150305012447.GA16001@hori1.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F329F835A@ORSMSX114.amr.corp.intel.com>

On Wed, Mar 04, 2015 at 11:12:33PM +0000, Luck, Tony wrote:
> > - fixed AR and UC order in enum severity_level because UC is severer than AR
> >  by definition. Current code is not affected by this wrong order by chance.
> 
> AR and AO are both UC errors - that happen also to be recoverable.

Maybe just saying "UC" might be confusing, meaning "UC bit is set" or "type of
error (defined in Table SDM's Table 15-6 in vol.3B) is 'Uncorrected Error'
(clearly separate from SRAR/SRAO)". You seem to mean the former, and I meant
the latter. So I should write the description more accurately like "UC=1,PCC=0"
error and "UC=1,PCC=1" error.

>  Are you really sure
> about this re-order not affecting existing code?

Sorry, I thought I checked that but I missed one, so let me check again now.
I checked all referencing site of MCE_*_SEVERITY. Most of them are using '=='
to compare the severity, where the re-order doesn't affect them.
Some are using inequalities:

- around ./arch/x86/kernel/cpu/mcheck/mce.c:720, 

          if (mce_severity(m, mca_cfg.tolerant, msg, true) >=
              MCE_PANIC_SEVERITY)                            
  
  , the re-order doesn't affect.

- ./arch/x86/kernel/cpu/mcheck/mce.c:819:

          if (m && global_worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3)

  , the re-order doesn't affect.

- ./arch/x86/kernel/cpu/mcheck/mce.c:832:

          if (global_worst <= MCE_KEEP_SEVERITY && mca_cfg.tolerant < 3) 

  , ditto.

- ./arch/x86/kernel/cpu/mcheck/mce.c:1196:

                no_way_out = worst >= MCE_PANIC_SEVERITY;

  , ditto.

- ./arch/x86/kernel/cpu/mcheck/mce-severity.c:211:

                if (s->sev >= MCE_UC_SEVERITY && ctx == IN_KERNEL) {

  , the re-order should change to s->sev >= MCE_AR_SEVERITY to keep the
  kernel behavior.

So I fixed the last part to be included in the re-order patch.

>  You might well be right, but as every one
> else has pointed out mce_severity() is full of odd subtleties that catch people out.

I agree that this one big table is hard to maintain and bug-prone. One problem
is that it has too many fields to check so the parameter space is huge. I think
some field are checked only once, so separating it out makes table more simple
and readable.

> Is the "UC" entry at the end of the severities[] table just a catch-all for things that made it
> past all the other entries? Does it ever really get used?

I read through the severity check table and it seems that all UC=1 case
are already considered by the above entries, so it seems not used.

> What was the test case that made you promote UC above AR?

I thought of "Action required but unaffected thread is continuable" case
on kexec kernel, but I'm not sure that such a case really happens.
My motivation on the promotion was mainly from what SDM says rather than
the real testcase.

> This absolutely should not be buried in the middle of your other patch - it needs to
> be separate with a much more than two lines of commit description.

OK, I might not include this part in this series in later post, but if I do,
I'll separete it out.

Thanks,
Naoya Horiguchi

  reply	other threads:[~2015-03-05  1:26 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-03  9:01 [PATCH v3 1/2] x86: mce: kexec: switch MCE handler for kexec/kdump Naoya Horiguchi
2015-03-03  9:01 ` [PATCH v3 2/2] x86: mce: comment about MCE synchronization timeout on definition of tolerant Naoya Horiguchi
2015-03-03 18:09 ` [PATCH v3 1/2] x86: mce: kexec: switch MCE handler for kexec/kdump Luck, Tony
2015-03-04  7:41   ` [PATCH v4] " Naoya Horiguchi
2015-03-04 23:12     ` Luck, Tony
2015-03-05  1:24       ` Naoya Horiguchi [this message]
2015-03-05  6:45         ` [PATCH v5] " Naoya Horiguchi
2015-03-05  8:57           ` Borislav Petkov
2015-03-05  9:37             ` Naoya Horiguchi
2015-03-06  2:59               ` [PATCH v6] " Naoya Horiguchi
2015-03-06  8:34                 ` Borislav Petkov
2015-03-06  9:09                   ` Naoya Horiguchi
2015-03-06  9:27                     ` Borislav Petkov
2015-03-06  9:32                       ` Naoya Horiguchi
2015-03-06 10:22                         ` [PATCH v7] " Naoya Horiguchi
2015-04-06  7:18                           ` Naoya Horiguchi
2015-04-06 11:59                             ` Borislav Petkov
2015-04-07  8:00                               ` Naoya Horiguchi
2015-04-07  8:02                                 ` [PATCH v8] " Naoya Horiguchi
2015-04-09  6:13                                   ` Borislav Petkov
2015-04-09  6:57                                     ` Naoya Horiguchi
2015-04-09  7:02                                       ` Borislav Petkov
2015-04-09 18:07                                         ` Luck, Tony
2015-04-09  8:00                                     ` Ingo Molnar
2015-04-09  8:21                                       ` Borislav Petkov
2015-04-09  8:59                                         ` Naoya Horiguchi
2015-04-09  9:53                                           ` Borislav Petkov
2015-04-09 18:22                                             ` Luck, Tony
2015-04-09 19:05                                               ` Borislav Petkov
2015-04-10  0:49                                                 ` Naoya Horiguchi
2015-04-10  4:07                                                   ` Naoya Horiguchi
2015-04-10  7:24                                                     ` Borislav Petkov
2015-04-28  8:41                                                   ` Baoquan He
2015-04-09  8:39                                       ` Naoya Horiguchi
2015-04-09  9:13                                         ` Ingo Molnar
2015-04-06 11:56                           ` [PATCH v7] " Borislav Petkov
2015-04-07  7:59                             ` Naoya Horiguchi
2015-03-06  8:28               ` [PATCH v5] " Borislav Petkov
2015-03-06  5:44         ` [PATCH v4] " Naoya Horiguchi
2015-03-05  8:48       ` Borislav Petkov
2015-03-03 18:53 ` [PATCH v3 1/2] " Borislav Petkov
2015-03-04  7:51   ` Naoya Horiguchi
2015-03-04  9:12     ` Borislav Petkov
2015-03-05  1:27       ` Naoya Horiguchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150305012447.GA16001@hori1.linux.bs1.fc.nec.co.jp \
    --to=n-horiguchi@ah.jp.nec.com \
    --cc=bp@alien8.de \
    --cc=j-nomura@ce.jp.nec.com \
    --cc=k-ueda@ct.jp.nec.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=prarit@redhat.com \
    --cc=tony.luck@intel.com \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.