Linux-EDAC Archive on lore.kernel.org
From: "Ghannam, Yazen" <Yazen.Ghannam@amd.com>
To: Borislav Petkov <bp@alien8.de>, "Luck, Tony" <tony.luck@intel.com>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>
Subject: RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware
Date: Fri, 17 May 2019 15:46:07 +0000
Message-ID: <SN6PR12MB26391A0C3979030082EE38F8F80B0@SN6PR12MB2639.namprd12.prod.outlook.com>
In-Reply-To: <20190517101006.GA32065@zn.tnic>

> -----Original Message-----
> From: linux-edac-owner@vger.kernel.org <linux-edac-owner@vger.kernel.org> On Behalf Of Borislav Petkov
> Sent: Friday, May 17, 2019 5:10 AM
> To: Luck, Tony <tony.luck@intel.com>
> Cc: Ghannam, Yazen <Yazen.Ghannam@amd.com>; linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org; x86@kernel.org
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware
> 
> 
> On Thu, May 16, 2019 at 01:59:43PM -0700, Luck, Tony wrote:
> > I think the intent of the original patch was to find out
> > which bits are "implemented in hardware". I.e. throw all
> > 1's at the register and see if any of them stick.
> 
> And, in addition, check ->init before showing/setting a bank:
> 
> ---
> @@ -2095,6 +2098,9 @@ static ssize_t show_bank(struct device *s, struct device_attribute *attr,
> 
>         b = &per_cpu(mce_banks_array, s->id)[bank];
> 
> +       if (!b->init)
> +               return -ENODEV;
> +
>         return sprintf(buf, "%llx\n", b->ctl);
>  }
> 
> @@ -2113,6 +2119,9 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,
> 
>         b = &per_cpu(mce_banks_array, s->id)[bank];
> 
> +       if (!b->init)
> +               return -ENODEV;
> +
>         b->ctl = new;
>         mce_restart();
> ---
> 
> so that you get a feedback whether the setting has even succeeded or
> not. Right now we're doing "something" blindly and accepting any b->ctl
> from userspace. Yeah, it is root-only but still...
> 
> > I don't object to the idea behind the patch. But if you want
> > to do this you just should not modify b->ctl.
> >
> > So something like:
> >
> >
> > static void __mcheck_cpu_init_clear_banks(void)
> > {
> >         struct mce_bank *mce_banks = this_cpu_read(mce_banks_array);
> >         u64 tmp;
> >         int i;
> >
> >         for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
> >                 struct mce_bank *b = &mce_banks[i];
> >
> >                 if (b->init) {
> >                         wrmsrl(msr_ops.ctl(i), b->ctl);
> >                         wrmsrl(msr_ops.status(i), 0);
> >                         rdmsrl(msr_ops.ctl(i), tmp);
> >
> >                         /* Check if any bits implemented in h/w */
> >                         b->init = !!tmp;
> >                 }
> 
> ... except that we unconditionally set ->init to 1 in
> __mcheck_cpu_mce_banks_init() and I think we should query it. Btw, that
> name __mcheck_cpu_mce_banks_init() is hideous too. I'll fix those up. In
> the meantime, how does the below look like? The change is to tickle out
> from the hw whether some CTL bits stick and then use that to determine
> b->init setting:
> 
> ---
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> Date: Tue, 30 Apr 2019 20:32:21 +0000
> Subject: [PATCH] x86/MCE: Determine MCA banks' init state properly
> 
> The OS is expected to write all bits to MCA_CTL for each bank,
> thus enabling error reporting in all banks. However, some banks
> may be unused in which case the registers for such banks are
> Read-as-Zero/Writes-Ignored. Also, the OS may avoid setting some control
> bits because of quirks, etc.
> 
> A bank can be considered uninitialized if the MCA_CTL register returns
> zero. This is because either the OS did not write anything or because
> the hardware is enforcing RAZ/WI for the bank.
> 
> Set a bank's init value based on if the control bits are set or not in
> hardware. Return an error code in the sysfs interface for uninitialized
> banks.
> 
>  [ bp: Massage a bit. Discover bank init state at boot. ]
> 
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: "x86@kernel.org" <x86@kernel.org>
> Link: https://lkml.kernel.org/r/20190430203206.104163-7-Yazen.Ghannam@amd.com
> ---
>  arch/x86/kernel/cpu/mce/core.c | 23 ++++++++++++++++++-----
>  1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 5bcecadcf4d9..d84b0c707d0e 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1492,9 +1492,16 @@ static int __mcheck_cpu_mce_banks_init(void)
> 
>         for (i = 0; i < n_banks; i++) {
>                 struct mce_bank *b = &mce_banks[i];
> +               u64 val;
> 
>                 b->ctl = -1ULL;
> -               b->init = 1;
> +
> +               /* Check if any bits are implemented in h/w */
> +               wrmsrl(msr_ops.ctl(i), b->ctl);
> +               rdmsrl(msr_ops.ctl(i), val);
> +               b->init = !!val;
> +
> +               wrmsrl(msr_ops.status(i), 0);
>         }

I think there are a couple of issues here.
1) The bank is being initialized without accounting for any quirks.
2) The bank is being initialized before any handler has been installed or any other required setup has been done.
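
To illustrate the first point, here is a rough userspace sketch, not kernel code. Everything in it is made up for illustration: the simulated MSR helpers (sim_wrmsr_ctl(), sim_rdmsr_ctl()), the hw_implemented masks, and the quirk_clear mask. It only models the write-all-ones/read-back probe from the diff above and shows that the probe leaves a bit enabled that a quirk would have wanted clear.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_BANKS 4

/*
 * Made-up hardware model: which MCA_CTL bits each bank implements.
 * A mask of 0 models a Read-as-Zero/Writes-Ignored (unused) bank.
 */
static const uint64_t hw_implemented[NUM_BANKS] = {
	0xffULL, 0x0ULL, 0xfULL, 0x3ULL
};

/* Made-up quirk: bit 0 of bank 2 is supposed to stay clear. */
static const uint64_t quirk_clear[NUM_BANKS] = {
	0x0ULL, 0x0ULL, 0x1ULL, 0x0ULL
};

/* Simulated per-bank MCA_CTL register contents. */
static uint64_t hw_ctl[NUM_BANKS];

/* Simulated MSR write: only implemented bits stick. */
static void sim_wrmsr_ctl(int bank, uint64_t val)
{
	assert(bank >= 0 && bank < NUM_BANKS);
	hw_ctl[bank] = val & hw_implemented[bank];
}

/* Simulated MSR read. */
static uint64_t sim_rdmsr_ctl(int bank)
{
	assert(bank >= 0 && bank < NUM_BANKS);
	return hw_ctl[bank];
}

/* Same shape as the probe in the quoted diff: write all ones, read back. */
static int probe_naive(int bank)
{
	sim_wrmsr_ctl(bank, ~0ULL);
	return sim_rdmsr_ctl(bank) != 0;
}

/* Nonzero if the bank has a bit enabled that the quirk wants clear. */
static int quirk_violated(int bank)
{
	return (sim_rdmsr_ctl(bank) & quirk_clear[bank]) != 0;
}

/* Probe every bank and print what the probe left behind in "hardware". */
static void report_banks(void)
{
	for (int i = 0; i < NUM_BANKS; i++) {
		int init = probe_naive(i);

		printf("bank %d: init=%d ctl=%#llx quirk_violated=%d\n",
		       i, init, (unsigned long long)sim_rdmsr_ctl(i),
		       quirk_violated(i));
	}
}
```

Calling report_banks() from a main() prints one line per bank. In this model the RAZ/WI bank is correctly detected as uninitialized, but bank 2 ends up with its quirk bit set because the probe ran before any quirk handling.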

Thanks,
Yazen



