Linux-EDAC Archive on lore.kernel.org
From: Borislav Petkov <bp@alien8.de>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: "Ghannam, Yazen" <Yazen.Ghannam@amd.com>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>
Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware
Date: Fri, 17 May 2019 12:10:06 +0200
Message-ID: <20190517101006.GA32065@zn.tnic>
In-Reply-To: <20190516205943.GA3299@agluck-desk>

On Thu, May 16, 2019 at 01:59:43PM -0700, Luck, Tony wrote:
> I think the intent of the original patch was to find out
> which bits are "implemented in hardware". I.e. throw all
> 1's at the register and see if any of them stick.

And, in addition, check ->init before showing/setting a bank:

---
@@ -2095,6 +2098,9 @@ static ssize_t show_bank(struct device *s, struct device_attribute *attr,
 
        b = &per_cpu(mce_banks_array, s->id)[bank];
 
+       if (!b->init)
+               return -ENODEV;
+
        return sprintf(buf, "%llx\n", b->ctl);
 }
 
@@ -2113,6 +2119,9 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,
 
        b = &per_cpu(mce_banks_array, s->id)[bank];
 
+       if (!b->init)
+               return -ENODEV;
+
        b->ctl = new;
        mce_restart();
---

so that you get feedback on whether the setting has even succeeded.
Right now we're doing "something" blindly and accepting any b->ctl
from userspace. Yeah, it is root-only but still...
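
To make the intended behavior concrete, here is a minimal user-space
sketch of the guard. It is not kernel code: struct bank_model and the
model_*() helpers are hypothetical stand-ins for struct mce_bank,
show_bank() and set_bank(), and the sysfs plumbing is left out. It only
illustrates that a bank with ->init clear returns -ENODEV and keeps its
cached ctl value untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct mce_bank; not the kernel definition. */
struct bank_model {
	uint64_t ctl;	/* cached control bits */
	int init;	/* nonzero if the bank is usable */
};

/* Mirrors the show_bank() guard: refuse to report an uninitialized bank. */
static int model_show_bank(const struct bank_model *b, char *buf, size_t len)
{
	assert(b && buf);

	if (!b->init)
		return -ENODEV;

	return snprintf(buf, len, "%llx\n", (unsigned long long)b->ctl);
}

/* Mirrors the set_bank() guard: refuse to cache a value for such a bank. */
static int model_set_bank(struct bank_model *b, uint64_t new_ctl)
{
	assert(b);

	if (!b->init)
		return -ENODEV;

	b->ctl = new_ctl;
	return 0;
}
```

With that, a write to an unused bank fails visibly instead of being
silently accepted.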

> I don't object to the idea behind the patch. But if you want
> to do this you just should not modify b->ctl.
> 
> So something like:
> 	
> 
> static void __mcheck_cpu_init_clear_banks(void)
> {
>         struct mce_bank *mce_banks = this_cpu_read(mce_banks_array);
> 	u64 tmp;
>         int i;
> 
>         for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
>                 struct mce_bank *b = &mce_banks[i];
> 
>                 if (b->init) {
>                         wrmsrl(msr_ops.ctl(i), b->ctl);
>                         wrmsrl(msr_ops.status(i), 0);
> 			rdmsrl(msr_ops.ctl(i), tmp);
> 
> 			/* Check if any bits implemented in h/w */
> 			b->init = !!tmp;
>                 }

... except that we unconditionally set ->init to 1 in
__mcheck_cpu_mce_banks_init() and I think we should query it. Btw, that
name __mcheck_cpu_mce_banks_init() is hideous too. I'll fix those up. In
the meantime, how does the below look? The change is to tickle out
of the hw whether any CTL bits stick and then use that to determine
the b->init setting:
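
As a rough illustration of the probe idea only (a user-space sketch, not
the patch): struct ctl_reg_model and its "implemented" mask are
assumptions made up for this example, and model_wrmsrl()/model_rdmsrl()
merely stand in for wrmsrl()/rdmsrl(). A mask of 0 models a
Read-as-Zero/Writes-Ignored bank.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of one bank's MCA_CTL register. "implemented" is an
 * assumption of this sketch: the set of bits the hardware actually backs.
 * A mask of 0 models a Read-as-Zero/Writes-Ignored bank.
 */
struct ctl_reg_model {
	uint64_t implemented;
	uint64_t value;
};

/* Stand-in for wrmsrl(): only implemented bits stick. */
static void model_wrmsrl(struct ctl_reg_model *r, uint64_t val)
{
	r->value = val & r->implemented;
}

/* Stand-in for rdmsrl(). */
static uint64_t model_rdmsrl(const struct ctl_reg_model *r)
{
	return r->value;
}

/* Write all 1s, read back, and report whether any bit stuck. */
static int model_probe_bank(struct ctl_reg_model *r)
{
	assert(r);

	model_wrmsrl(r, ~0ULL);
	return !!model_rdmsrl(r);
}
```

The return value of model_probe_bank() plays the role of b->init: 0 for
a bank where nothing sticks, 1 as soon as a single bit does.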

---
From: Yazen Ghannam <yazen.ghannam@amd.com>
Date: Tue, 30 Apr 2019 20:32:21 +0000
Subject: [PATCH] x86/MCE: Determine MCA banks' init state properly

The OS is expected to write all bits to MCA_CTL for each bank,
thus enabling error reporting in all banks. However, some banks
may be unused in which case the registers for such banks are
Read-as-Zero/Writes-Ignored. Also, the OS may avoid setting some control
bits because of quirks, etc.

A bank can be considered uninitialized if the MCA_CTL register returns
zero. This is because either the OS did not write anything or because
the hardware is enforcing RAZ/WI for the bank.

Set a bank's init value based on whether any control bits are set in
hardware. Return an error code in the sysfs interface for uninitialized
banks.

 [ bp: Massage a bit. Discover bank init state at boot. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "x86@kernel.org" <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190430203206.104163-7-Yazen.Ghannam@amd.com
---
 arch/x86/kernel/cpu/mce/core.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 5bcecadcf4d9..d84b0c707d0e 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1492,9 +1492,16 @@ static int __mcheck_cpu_mce_banks_init(void)
 
 	for (i = 0; i < n_banks; i++) {
 		struct mce_bank *b = &mce_banks[i];
+		u64 val;
 
 		b->ctl = -1ULL;
-		b->init = 1;
+
+		/* Check if any bits are implemented in h/w */
+		wrmsrl(msr_ops.ctl(i), b->ctl);
+		rdmsrl(msr_ops.ctl(i), val);
+		b->init = !!val;
+
+		wrmsrl(msr_ops.status(i), 0);
 	}
 
 	per_cpu(mce_banks_array, smp_processor_id()) = mce_banks;
@@ -1567,10 +1574,10 @@ static void __mcheck_cpu_init_clear_banks(void)
 	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
 		struct mce_bank *b = &mce_banks[i];
 
-		if (!b->init)
-			continue;
-		wrmsrl(msr_ops.ctl(i), b->ctl);
-		wrmsrl(msr_ops.status(i), 0);
+		if (b->init) {
+			wrmsrl(msr_ops.ctl(i), b->ctl);
+			wrmsrl(msr_ops.status(i), 0);
+		}
 	}
 }
 
@@ -2095,6 +2102,9 @@ static ssize_t show_bank(struct device *s, struct device_attribute *attr,
 
 	b = &per_cpu(mce_banks_array, s->id)[bank];
 
+	if (!b->init)
+		return -ENODEV;
+
 	return sprintf(buf, "%llx\n", b->ctl);
 }
 
@@ -2113,6 +2123,9 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,
 
 	b = &per_cpu(mce_banks_array, s->id)[bank];
 
+	if (!b->init)
+		return -ENODEV;
+
 	b->ctl = new;
 	mce_restart();
 
-- 
2.21.0

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


