linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RESEND 1/5] x86/MCE: Make struct mce_banks[] static
  2019-04-08 14:12 [PATCH RESEND 0/5] Handle MCA banks in a per_cpu way Ghannam, Yazen
@ 2019-04-08 14:12 ` Ghannam, Yazen
  2019-04-08 14:12 ` [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way Ghannam, Yazen
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Ghannam, Yazen @ 2019-04-08 14:12 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp, tony.luck, x86

From: Yazen Ghannam <yazen.ghannam@amd.com>

The struct mce_banks[] array is only used in mce/core.c so move the
definition of struct mce_bank to mce/core.c and make the array static.

Also, change the "init" field to bool type.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/core.c     | 11 ++++++++++-
 arch/x86/kernel/cpu/mce/internal.h | 10 ----------
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 58925e7ccb27..8d0d1e8425db 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -64,7 +64,16 @@ static DEFINE_MUTEX(mce_sysfs_mutex);
 
 DEFINE_PER_CPU(unsigned, mce_exception_count);
 
-struct mce_bank *mce_banks __read_mostly;
+#define ATTR_LEN               16
+/* One object for each MCE bank, shared by all CPUs */
+struct mce_bank {
+	u64			ctl;			/* subevents to enable */
+	bool			init;			/* initialise bank? */
+	struct device_attribute	attr;			/* device attribute */
+	char			attrname[ATTR_LEN];	/* attribute name */
+};
+
+static struct mce_bank *mce_banks __read_mostly;
 struct mce_vendor_flags mce_flags __read_mostly;
 
 struct mca_config mca_cfg __read_mostly = {
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index af5eab1e65e2..032d52c66616 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -22,17 +22,8 @@ enum severity_level {
 
 extern struct blocking_notifier_head x86_mce_decoder_chain;
 
-#define ATTR_LEN		16
 #define INITIAL_CHECK_INTERVAL	5 * 60 /* 5 minutes */
 
-/* One object for each MCE bank, shared by all CPUs */
-struct mce_bank {
-	u64			ctl;			/* subevents to enable */
-	unsigned char init;				/* initialise bank? */
-	struct device_attribute attr;			/* device attribute */
-	char			attrname[ATTR_LEN];	/* attribute name */
-};
-
 struct mce_evt_llist {
 	struct llist_node llnode;
 	struct mce mce;
@@ -47,7 +38,6 @@ struct llist_node *mce_gen_pool_prepare_records(void);
 extern int (*mce_severity)(struct mce *a, int tolerant, char **msg, bool is_excp);
 struct dentry *mce_get_debugfs_dir(void);
 
-extern struct mce_bank *mce_banks;
 extern mce_banks_t mce_banks_ce_disabled;
 
 #ifdef CONFIG_X86_MCE_INTEL
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH RESEND 0/5]  Handle MCA banks in a per_cpu way
@ 2019-04-08 14:12 Ghannam, Yazen
  2019-04-08 14:12 ` [PATCH RESEND 1/5] x86/MCE: Make struct mce_banks[] static Ghannam, Yazen
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Ghannam, Yazen @ 2019-04-08 14:12 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp, tony.luck, x86

From: Yazen Ghannam <yazen.ghannam@amd.com>

The focus of this patchset is define and use the MCA bank structures
and bank count per logical CPU.

With the exception of patch 4, this set applies to systems in production
today.

Patch 1:
Moves the declaration of struct mce_banks[] to the only file it's used.

Patch 2:
Splits struct mce_bank into a structure for fields common to MCA banks
on all CPUs and another structure that can be used per_cpu.

Patch 3:
Brings full circle the saga of the threshold block addresses on SMCA
systems. After taking a step back and reviewing the AMD documentation, I
think that this implimentation is the simplest and more robust way to
follow the spec.

Patch 4:
Saves and uses the MCA bank count as a per_cpu variable. This is to
support systems that have MCA bank counts that are different between
logical CPUs.

Patch 5:
Makes sure that sysfs reports the MCA_CTL value as set in hardware. This
is not something related to making things per_cpu but rather just
something I noticed while working on the other patches.

Thanks,
Yazen

Yazen Ghannam (5):
  x86/MCE: Make struct mce_banks[] static
  x86/MCE: Handle MCA controls in a per_cpu way
  x86/MCE/AMD: Don't cache block addresses on SMCA systems
  x86/MCE: Make number of MCA banks per_cpu
  x86/MCE: Save MCA control bits that get set in hardware

 arch/x86/kernel/cpu/mce/amd.c      |  87 +++++++++---------
 arch/x86/kernel/cpu/mce/core.c     | 138 +++++++++++++++++++----------
 arch/x86/kernel/cpu/mce/inject.c   |   7 +-
 arch/x86/kernel/cpu/mce/internal.h |  12 +--
 4 files changed, 137 insertions(+), 107 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way
  2019-04-08 14:12 [PATCH RESEND 0/5] Handle MCA banks in a per_cpu way Ghannam, Yazen
  2019-04-08 14:12 ` [PATCH RESEND 1/5] x86/MCE: Make struct mce_banks[] static Ghannam, Yazen
@ 2019-04-08 14:12 ` Ghannam, Yazen
  2019-04-08 17:51   ` Borislav Petkov
  2019-04-08 14:12 ` [PATCH RESEND 3/5] x86/MCE/AMD: Don't cache block addresses on SMCA systems Ghannam, Yazen
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Ghannam, Yazen @ 2019-04-08 14:12 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp, tony.luck, x86

From: Yazen Ghannam <yazen.ghannam@amd.com>

Current AMD systems have unique MCA banks per logical CPU even though
the type of the banks may all align to the same bank number. Each CPU
will have control of a set of MCA banks in the hardware and these are
not shared with other CPUs.

For example, bank 0 may be the Load-Store Unit on every logical CPU, but
each bank 0 is a unique structure in the hardware. In other words, there
isn't a *single* Load-Store Unit at MCA bank 0 that all logical CPUs
share.

This idea extends even to non-core MCA banks. For example, CPU0 and CPU4
may see a Unified Memory Controller at bank 15, but each CPU is actually
seeing a unique hardware structure that is not shared with other CPUs.

Because the MCA banks are all unique hardware structures, it would be
good to control them in a more granular way. For example, if there is a
known issue with the Floating Point Unit on CPU5 and a user wishes to
disable an error type on the Floating Point Unit, then it would be good
to do this only for CPU5 rather than all CPUs.

Also, future AMD systems may have heterogeneous MCA banks. Meaning the
bank numbers may not necessarily represent the same types between CPUs.
For example, bank 20 visible to CPU0 may be a Unified Memory Controller
and bank 20 visible to CPU4 may be a Coherent Slave. So granular control
will be even more necessary should the user wish to control specific MCA
banks.

Split the device attributes from struct mce_bank leaving only the MCA
bank control fields.

Make struct mce_banks[] per_cpu in order to have more granular control
over individual MCA banks in the hardware.

Allocate the device attributes statically based on the maximum number of
MCA banks supported. The sysfs interface will use as many as needed per
CPU. Currently, this is set to mca_cfg.banks, but will be changed to a
per_cpu bank count in a future patch.

Allocate the MCA control bits dynamically. Use the maximum number of MCA
banks supported for now. This will be changed to a per_cpu bank count in
a future patch.

Redo the sysfs store/show functions to handle the per_cpu mce_banks[].

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/core.c | 77 ++++++++++++++++++++++------------
 1 file changed, 51 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 8d0d1e8425db..14583c5c6e12 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -64,16 +64,21 @@ static DEFINE_MUTEX(mce_sysfs_mutex);
 
 DEFINE_PER_CPU(unsigned, mce_exception_count);
 
+struct mce_bank {
+	u64	ctl;	/* subevents to enable */
+	bool	init;	/* initialise bank? */
+};
+static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank*, mce_banks);
+
 #define ATTR_LEN               16
 /* One object for each MCE bank, shared by all CPUs */
-struct mce_bank {
-	u64			ctl;			/* subevents to enable */
-	bool			init;			/* initialise bank? */
+struct mce_bank_dev {
 	struct device_attribute	attr;			/* device attribute */
 	char			attrname[ATTR_LEN];	/* attribute name */
+	u8			bank;			/* bank number */
 };
+static struct mce_bank_dev mce_bank_devs[MAX_NR_BANKS];
 
-static struct mce_bank *mce_banks __read_mostly;
 struct mce_vendor_flags mce_flags __read_mostly;
 
 struct mca_config mca_cfg __read_mostly = {
@@ -695,7 +700,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		m.tsc = rdtsc();
 
 	for (i = 0; i < mca_cfg.banks; i++) {
-		if (!mce_banks[i].ctl || !test_bit(i, *b))
+		if (!this_cpu_read(mce_banks)[i].ctl || !test_bit(i, *b))
 			continue;
 
 		m.misc = 0;
@@ -1138,7 +1143,7 @@ static void __mc_scan_banks(struct mce *m, struct mce *final,
 		if (!test_bit(i, valid_banks))
 			continue;
 
-		if (!mce_banks[i].ctl)
+		if (!this_cpu_read(mce_banks)[i].ctl)
 			continue;
 
 		m->misc = 0;
@@ -1475,16 +1480,19 @@ static int __mcheck_cpu_mce_banks_init(void)
 {
 	int i;
 
-	mce_banks = kcalloc(MAX_NR_BANKS, sizeof(struct mce_bank), GFP_KERNEL);
-	if (!mce_banks)
+	per_cpu(mce_banks, smp_processor_id()) =
+		kcalloc(MAX_NR_BANKS, sizeof(struct mce_bank), GFP_KERNEL);
+
+	if (!this_cpu_read(mce_banks))
 		return -ENOMEM;
 
 	for (i = 0; i < MAX_NR_BANKS; i++) {
-		struct mce_bank *b = &mce_banks[i];
+		struct mce_bank *b = &this_cpu_read(mce_banks)[i];
 
 		b->ctl = -1ULL;
 		b->init = 1;
 	}
+
 	return 0;
 }
 
@@ -1504,7 +1512,7 @@ static int __mcheck_cpu_cap_init(void)
 
 	mca_cfg.banks = max(mca_cfg.banks, b);
 
-	if (!mce_banks) {
+	if (!this_cpu_read(mce_banks)) {
 		int err = __mcheck_cpu_mce_banks_init();
 		if (err)
 			return err;
@@ -1547,7 +1555,7 @@ static void __mcheck_cpu_init_clear_banks(void)
 	int i;
 
 	for (i = 0; i < mca_cfg.banks; i++) {
-		struct mce_bank *b = &mce_banks[i];
+		struct mce_bank *b = &this_cpu_read(mce_banks)[i];
 
 		if (!b->init)
 			continue;
@@ -1602,7 +1610,7 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 			 * trips off incorrectly with the IOMMU & 3ware
 			 * & Cerberus:
 			 */
-			clear_bit(10, (unsigned long *)&mce_banks[4].ctl);
+			clear_bit(10, (unsigned long *)&this_cpu_read(mce_banks)[4].ctl);
 		}
 		if (c->x86 < 0x11 && cfg->bootlog < 0) {
 			/*
@@ -1616,7 +1624,7 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 		 * by default.
 		 */
 		if (c->x86 == 6 && cfg->banks > 0)
-			mce_banks[0].ctl = 0;
+			this_cpu_read(mce_banks)[0].ctl = 0;
 
 		/*
 		 * overflow_recov is supported for F15h Models 00h-0fh
@@ -1638,7 +1646,7 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 		 */
 
 		if (c->x86 == 6 && c->x86_model < 0x1A && cfg->banks > 0)
-			mce_banks[0].init = 0;
+			this_cpu_read(mce_banks)[0].init = 0;
 
 		/*
 		 * All newer Intel systems support MCE broadcasting. Enable
@@ -1952,7 +1960,7 @@ static void mce_disable_error_reporting(void)
 	int i;
 
 	for (i = 0; i < mca_cfg.banks; i++) {
-		struct mce_bank *b = &mce_banks[i];
+		struct mce_bank *b = &this_cpu_read(mce_banks)[i];
 
 		if (b->init)
 			wrmsrl(msr_ops.ctl(i), 0);
@@ -2051,26 +2059,41 @@ static struct bus_type mce_subsys = {
 
 DEFINE_PER_CPU(struct device *, mce_device);
 
-static inline struct mce_bank *attr_to_bank(struct device_attribute *attr)
+static inline struct mce_bank_dev *attr_to_bank(struct device_attribute *attr)
 {
-	return container_of(attr, struct mce_bank, attr);
+	return container_of(attr, struct mce_bank_dev, attr);
 }
 
 static ssize_t show_bank(struct device *s, struct device_attribute *attr,
 			 char *buf)
 {
-	return sprintf(buf, "%llx\n", attr_to_bank(attr)->ctl);
+	struct mce_bank *b;
+	u8 bank = attr_to_bank(attr)->bank;
+
+	if (bank >= mca_cfg.banks)
+		return -EINVAL;
+
+	b = &per_cpu(mce_banks, s->id)[bank];
+
+	return sprintf(buf, "%llx\n", b->ctl);
 }
 
 static ssize_t set_bank(struct device *s, struct device_attribute *attr,
 			const char *buf, size_t size)
 {
 	u64 new;
+	struct mce_bank *b;
+	u8 bank = attr_to_bank(attr)->bank;
 
 	if (kstrtou64(buf, 0, &new) < 0)
 		return -EINVAL;
 
-	attr_to_bank(attr)->ctl = new;
+	if (bank >= mca_cfg.banks)
+		return -EINVAL;
+
+	b = &per_cpu(mce_banks, s->id)[bank];
+
+	b->ctl = new;
 	mce_restart();
 
 	return size;
@@ -2185,7 +2208,7 @@ static void mce_device_release(struct device *dev)
 	kfree(dev);
 }
 
-/* Per cpu device init. All of the cpus still share the same ctrl bank: */
+/* Per cpu device init. All of the cpus still share the same bank device: */
 static int mce_device_create(unsigned int cpu)
 {
 	struct device *dev;
@@ -2218,7 +2241,7 @@ static int mce_device_create(unsigned int cpu)
 			goto error;
 	}
 	for (j = 0; j < mca_cfg.banks; j++) {
-		err = device_create_file(dev, &mce_banks[j].attr);
+		err = device_create_file(dev, &mce_bank_devs[j].attr);
 		if (err)
 			goto error2;
 	}
@@ -2228,7 +2251,7 @@ static int mce_device_create(unsigned int cpu)
 	return 0;
 error2:
 	while (--j >= 0)
-		device_remove_file(dev, &mce_banks[j].attr);
+		device_remove_file(dev, &mce_bank_devs[j].attr);
 error:
 	while (--i >= 0)
 		device_remove_file(dev, mce_device_attrs[i]);
@@ -2250,7 +2273,7 @@ static void mce_device_remove(unsigned int cpu)
 		device_remove_file(dev, mce_device_attrs[i]);
 
 	for (i = 0; i < mca_cfg.banks; i++)
-		device_remove_file(dev, &mce_banks[i].attr);
+		device_remove_file(dev, &mce_bank_devs[i].attr);
 
 	device_unregister(dev);
 	cpumask_clear_cpu(cpu, mce_device_initialized);
@@ -2279,7 +2302,7 @@ static void mce_reenable_cpu(void)
 	if (!cpuhp_tasks_frozen)
 		cmci_reenable();
 	for (i = 0; i < mca_cfg.banks; i++) {
-		struct mce_bank *b = &mce_banks[i];
+		struct mce_bank *b = &this_cpu_read(mce_banks)[i];
 
 		if (b->init)
 			wrmsrl(msr_ops.ctl(i), b->ctl);
@@ -2328,10 +2351,12 @@ static __init void mce_init_banks(void)
 {
 	int i;
 
-	for (i = 0; i < mca_cfg.banks; i++) {
-		struct mce_bank *b = &mce_banks[i];
+	for (i = 0; i < MAX_NR_BANKS; i++) {
+		struct mce_bank_dev *b = &mce_bank_devs[i];
 		struct device_attribute *a = &b->attr;
 
+		b->bank = i;
+
 		sysfs_attr_init(&a->attr);
 		a->attr.name	= b->attrname;
 		snprintf(b->attrname, ATTR_LEN, "bank%d", i);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH RESEND 4/5] x86/MCE: Make number of MCA banks per_cpu
  2019-04-08 14:12 [PATCH RESEND 0/5] Handle MCA banks in a per_cpu way Ghannam, Yazen
                   ` (2 preceding siblings ...)
  2019-04-08 14:12 ` [PATCH RESEND 3/5] x86/MCE/AMD: Don't cache block addresses on SMCA systems Ghannam, Yazen
@ 2019-04-08 14:12 ` Ghannam, Yazen
  2019-04-08 20:26   ` Luck, Tony
  2019-04-08 14:12 ` [PATCH RESEND 5/5] x86/MCE: Save MCA control bits that get set in hardware Ghannam, Yazen
  4 siblings, 1 reply; 23+ messages in thread
From: Ghannam, Yazen @ 2019-04-08 14:12 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp, tony.luck, x86

From: Yazen Ghannam <yazen.ghannam@amd.com>

The number of MCA banks is provided per logical CPU. Historically, this
number has been the same across all CPUs, but this is not an
architectural guarantee. Future AMD systems may have MCA bank counts
that vary between logical CPUs in a system.

This issue was partially addressed in

commit ("006c077041dc x86/mce: Handle varying MCA bank counts")

by allocating structures using the maximum number of MCA banks and by
saving the maximum MCA bank count in a system as the global count. This
means that some extra structures are allocated. Also, this means that
CPUs will spend more time in the #MC and other handlers checking extra
MCA banks.

Define the number of MCA banks as a per_cpu variable. Replace all uses
of mca_cfg.banks with this.

Also, use the per_cpu bank count when allocating per_cpu structures.

Print the number of banks per CPU as a debug message for those who may
be interested.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/amd.c      | 16 +++++-----
 arch/x86/kernel/cpu/mce/core.c     | 49 +++++++++++++++++-------------
 arch/x86/kernel/cpu/mce/inject.c   |  7 +----
 arch/x86/kernel/cpu/mce/internal.h |  2 +-
 4 files changed, 38 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index f0644b59848d..acce672efb45 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -493,7 +493,7 @@ static u32 get_block_address(u32 current_addr, u32 low, u32 high,
 {
 	u32 addr = 0, offset = 0;
 
-	if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS))
+	if ((bank >= per_cpu(num_banks, cpu)) || (block >= NR_BLOCKS))
 		return addr;
 
 	if (mce_flags.smca)
@@ -605,7 +605,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 
 	disable_err_thresholding(c);
 
-	for (bank = 0; bank < mca_cfg.banks; ++bank) {
+	for (bank = 0; bank < this_cpu_read(num_banks); ++bank) {
 		if (mce_flags.smca)
 			smca_configure(bank, cpu);
 
@@ -948,7 +948,7 @@ static void amd_deferred_error_interrupt(void)
 {
 	unsigned int bank;
 
-	for (bank = 0; bank < mca_cfg.banks; ++bank)
+	for (bank = 0; bank < this_cpu_read(num_banks); ++bank)
 		log_error_deferred(bank);
 }
 
@@ -989,7 +989,7 @@ static void amd_threshold_interrupt(void)
 	struct threshold_block *first_block = NULL, *block = NULL, *tmp = NULL;
 	unsigned int bank, cpu = smp_processor_id();
 
-	for (bank = 0; bank < mca_cfg.banks; ++bank) {
+	for (bank = 0; bank < this_cpu_read(num_banks); ++bank) {
 		if (!(per_cpu(bank_map, cpu) & (1 << bank)))
 			continue;
 
@@ -1176,7 +1176,7 @@ static int allocate_threshold_blocks(unsigned int cpu, unsigned int bank,
 	u32 low, high;
 	int err;
 
-	if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS))
+	if ((bank >= per_cpu(num_banks, cpu)) || (block >= NR_BLOCKS))
 		return 0;
 
 	if (rdmsr_safe_on_cpu(cpu, address, &low, &high))
@@ -1410,7 +1410,7 @@ int mce_threshold_remove_device(unsigned int cpu)
 {
 	unsigned int bank;
 
-	for (bank = 0; bank < mca_cfg.banks; ++bank) {
+	for (bank = 0; bank < per_cpu(num_banks, cpu); ++bank) {
 		if (!(per_cpu(bank_map, cpu) & (1 << bank)))
 			continue;
 		threshold_remove_bank(cpu, bank);
@@ -1431,14 +1431,14 @@ int mce_threshold_create_device(unsigned int cpu)
 	if (bp)
 		return 0;
 
-	bp = kcalloc(mca_cfg.banks, sizeof(struct threshold_bank *),
+	bp = kcalloc(per_cpu(num_banks, cpu), sizeof(struct threshold_bank *),
 		     GFP_KERNEL);
 	if (!bp)
 		return -ENOMEM;
 
 	per_cpu(threshold_banks, cpu) = bp;
 
-	for (bank = 0; bank < mca_cfg.banks; ++bank) {
+	for (bank = 0; bank < per_cpu(num_banks, cpu); ++bank) {
 		if (!(per_cpu(bank_map, cpu) & (1 << bank)))
 			continue;
 		err = threshold_create_bank(cpu, bank);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 14583c5c6e12..4a066c1e8ab2 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -64,6 +64,9 @@ static DEFINE_MUTEX(mce_sysfs_mutex);
 
 DEFINE_PER_CPU(unsigned, mce_exception_count);
 
+DEFINE_PER_CPU_READ_MOSTLY(u8, num_banks);
+EXPORT_PER_CPU_SYMBOL_GPL(num_banks);
+
 struct mce_bank {
 	u64	ctl;	/* subevents to enable */
 	bool	init;	/* initialise bank? */
@@ -699,7 +702,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 	if (flags & MCP_TIMESTAMP)
 		m.tsc = rdtsc();
 
-	for (i = 0; i < mca_cfg.banks; i++) {
+	for (i = 0; i < this_cpu_read(num_banks); i++) {
 		if (!this_cpu_read(mce_banks)[i].ctl || !test_bit(i, *b))
 			continue;
 
@@ -801,7 +804,7 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
 	char *tmp;
 	int i;
 
-	for (i = 0; i < mca_cfg.banks; i++) {
+	for (i = 0; i < this_cpu_read(num_banks); i++) {
 		m->status = mce_rdmsrl(msr_ops.status(i));
 		if (!(m->status & MCI_STATUS_VAL))
 			continue;
@@ -1081,7 +1084,7 @@ static void mce_clear_state(unsigned long *toclear)
 {
 	int i;
 
-	for (i = 0; i < mca_cfg.banks; i++) {
+	for (i = 0; i < this_cpu_read(num_banks); i++) {
 		if (test_bit(i, toclear))
 			mce_wrmsrl(msr_ops.status(i), 0);
 	}
@@ -1138,7 +1141,7 @@ static void __mc_scan_banks(struct mce *m, struct mce *final,
 	struct mca_config *cfg = &mca_cfg;
 	int severity, i;
 
-	for (i = 0; i < cfg->banks; i++) {
+	for (i = 0; i < this_cpu_read(num_banks); i++) {
 		__clear_bit(i, toclear);
 		if (!test_bit(i, valid_banks))
 			continue;
@@ -1478,15 +1481,16 @@ EXPORT_SYMBOL_GPL(mce_notify_irq);
 
 static int __mcheck_cpu_mce_banks_init(void)
 {
+	u8 n_banks = this_cpu_read(num_banks);
 	int i;
 
 	per_cpu(mce_banks, smp_processor_id()) =
-		kcalloc(MAX_NR_BANKS, sizeof(struct mce_bank), GFP_KERNEL);
+		kcalloc(n_banks, sizeof(struct mce_bank), GFP_KERNEL);
 
 	if (!this_cpu_read(mce_banks))
 		return -ENOMEM;
 
-	for (i = 0; i < MAX_NR_BANKS; i++) {
+	for (i = 0; i < n_banks; i++) {
 		struct mce_bank *b = &this_cpu_read(mce_banks)[i];
 
 		b->ctl = -1ULL;
@@ -1507,10 +1511,15 @@ static int __mcheck_cpu_cap_init(void)
 	rdmsrl(MSR_IA32_MCG_CAP, cap);
 
 	b = cap & MCG_BANKCNT_MASK;
-	if (WARN_ON_ONCE(b > MAX_NR_BANKS))
+	pr_debug("CPU%d supports %d MCE banks\n", smp_processor_id(), b);
+
+	if (b > MAX_NR_BANKS) {
+		pr_warn("CPU%d: Using only %u machine check banks out of %u\n",
+			smp_processor_id(), MAX_NR_BANKS, b);
 		b = MAX_NR_BANKS;
+	}
 
-	mca_cfg.banks = max(mca_cfg.banks, b);
+	this_cpu_write(num_banks, b);
 
 	if (!this_cpu_read(mce_banks)) {
 		int err = __mcheck_cpu_mce_banks_init();
@@ -1554,7 +1563,7 @@ static void __mcheck_cpu_init_clear_banks(void)
 {
 	int i;
 
-	for (i = 0; i < mca_cfg.banks; i++) {
+	for (i = 0; i < this_cpu_read(num_banks); i++) {
 		struct mce_bank *b = &this_cpu_read(mce_banks)[i];
 
 		if (!b->init)
@@ -1604,7 +1613,7 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 
 	/* This should be disabled by the BIOS, but isn't always */
 	if (c->x86_vendor == X86_VENDOR_AMD) {
-		if (c->x86 == 15 && cfg->banks > 4) {
+		if (c->x86 == 15 && this_cpu_read(num_banks) > 4) {
 			/*
 			 * disable GART TBL walk error reporting, which
 			 * trips off incorrectly with the IOMMU & 3ware
@@ -1623,7 +1632,7 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 		 * Various K7s with broken bank 0 around. Always disable
 		 * by default.
 		 */
-		if (c->x86 == 6 && cfg->banks > 0)
+		if (c->x86 == 6 && this_cpu_read(num_banks) > 0)
 			this_cpu_read(mce_banks)[0].ctl = 0;
 
 		/*
@@ -1645,7 +1654,7 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 		 * valid event later, merely don't write CTL0.
 		 */
 
-		if (c->x86 == 6 && c->x86_model < 0x1A && cfg->banks > 0)
+		if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(num_banks) > 0)
 			this_cpu_read(mce_banks)[0].init = 0;
 
 		/*
@@ -1871,7 +1880,7 @@ static void __mce_disable_bank(void *arg)
 
 void mce_disable_bank(int bank)
 {
-	if (bank >= mca_cfg.banks) {
+	if (bank >= this_cpu_read(num_banks)) {
 		pr_warn(FW_BUG
 			"Ignoring request to disable invalid MCA bank %d.\n",
 			bank);
@@ -1959,7 +1968,7 @@ static void mce_disable_error_reporting(void)
 {
 	int i;
 
-	for (i = 0; i < mca_cfg.banks; i++) {
+	for (i = 0; i < this_cpu_read(num_banks); i++) {
 		struct mce_bank *b = &this_cpu_read(mce_banks)[i];
 
 		if (b->init)
@@ -2070,7 +2079,7 @@ static ssize_t show_bank(struct device *s, struct device_attribute *attr,
 	struct mce_bank *b;
 	u8 bank = attr_to_bank(attr)->bank;
 
-	if (bank >= mca_cfg.banks)
+	if (bank >= per_cpu(num_banks, s->id))
 		return -EINVAL;
 
 	b = &per_cpu(mce_banks, s->id)[bank];
@@ -2088,7 +2097,7 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,
 	if (kstrtou64(buf, 0, &new) < 0)
 		return -EINVAL;
 
-	if (bank >= mca_cfg.banks)
+	if (bank >= per_cpu(num_banks, s->id))
 		return -EINVAL;
 
 	b = &per_cpu(mce_banks, s->id)[bank];
@@ -2240,7 +2249,7 @@ static int mce_device_create(unsigned int cpu)
 		if (err)
 			goto error;
 	}
-	for (j = 0; j < mca_cfg.banks; j++) {
+	for (j = 0; j < per_cpu(num_banks, cpu); j++) {
 		err = device_create_file(dev, &mce_bank_devs[j].attr);
 		if (err)
 			goto error2;
@@ -2272,7 +2281,7 @@ static void mce_device_remove(unsigned int cpu)
 	for (i = 0; mce_device_attrs[i]; i++)
 		device_remove_file(dev, mce_device_attrs[i]);
 
-	for (i = 0; i < mca_cfg.banks; i++)
+	for (i = 0; i < per_cpu(num_banks, cpu); i++)
 		device_remove_file(dev, &mce_bank_devs[i].attr);
 
 	device_unregister(dev);
@@ -2301,7 +2310,7 @@ static void mce_reenable_cpu(void)
 
 	if (!cpuhp_tasks_frozen)
 		cmci_reenable();
-	for (i = 0; i < mca_cfg.banks; i++) {
+	for (i = 0; i < this_cpu_read(num_banks); i++) {
 		struct mce_bank *b = &this_cpu_read(mce_banks)[i];
 
 		if (b->init)
@@ -2489,8 +2498,6 @@ EXPORT_SYMBOL_GPL(mcsafe_key);
 
 static int __init mcheck_late_init(void)
 {
-	pr_info("Using %d MCE banks\n", mca_cfg.banks);
-
 	if (mca_cfg.recovery)
 		static_branch_inc(&mcsafe_key);
 
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 3f82afd0f46f..8340f8213d6b 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -568,12 +568,7 @@ static void do_inject(void)
 static int inj_bank_set(void *data, u64 val)
 {
 	struct mce *m = (struct mce *)data;
-	u8 n_banks;
-	u64 cap;
-
-	/* Get bank count on target CPU so we can handle non-uniform values. */
-	rdmsrl_on_cpu(m->extcpu, MSR_IA32_MCG_CAP, &cap);
-	n_banks = cap & MCG_BANKCNT_MASK;
+	u8 n_banks = per_cpu(num_banks, m->extcpu);
 
 	if (val >= n_banks) {
 		pr_err("MCA bank %llu non-existent on CPU%d\n", val, m->extcpu);
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 032d52c66616..f0078859afe3 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -118,7 +118,6 @@ struct mca_config {
 	      bios_cmci_threshold	: 1,
 	      __reserved		: 59;
 
-	u8 banks;
 	s8 bootlog;
 	int tolerant;
 	int monarch_timeout;
@@ -127,6 +126,7 @@ struct mca_config {
 };
 
 extern struct mca_config mca_cfg;
+DECLARE_PER_CPU_READ_MOSTLY(u8, num_banks);
 
 struct mce_vendor_flags {
 	/*
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH RESEND 3/5] x86/MCE/AMD: Don't cache block addresses on SMCA systems
  2019-04-08 14:12 [PATCH RESEND 0/5] Handle MCA banks in a per_cpu way Ghannam, Yazen
  2019-04-08 14:12 ` [PATCH RESEND 1/5] x86/MCE: Make struct mce_banks[] static Ghannam, Yazen
  2019-04-08 14:12 ` [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way Ghannam, Yazen
@ 2019-04-08 14:12 ` Ghannam, Yazen
  2019-04-08 14:12 ` [PATCH RESEND 4/5] x86/MCE: Make number of MCA banks per_cpu Ghannam, Yazen
  2019-04-08 14:12 ` [PATCH RESEND 5/5] x86/MCE: Save MCA control bits that get set in hardware Ghannam, Yazen
  4 siblings, 0 replies; 23+ messages in thread
From: Ghannam, Yazen @ 2019-04-08 14:12 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp, tony.luck, x86

From: Yazen Ghannam <yazen.ghannam@amd.com>

On legacy systems, the addresses of the MCA_MISC* registers need to be
recursively discovered based on a Block Pointer field in the registers.

On Scalable MCA systems, the register space is fixed, and particular
addresses can be derived by regular offsets for bank and register type.
This fixed address space includes the MCA_MISC* registers.

MCA_MISC0 is always available for each MCA bank. MCA_MISC1 through
MCA_MISC4 are considered available if MCA_MISC0[BlkPtr]=1.

Cache the value of MCA_MISC0[BlkPtr] for each bank and per CPU. This
needs to be done only during init. The values should be saved per CPU
to accommodate heterogeneous SMCA systems.

Redo smca_get_block_address() to directly return the block addresses.
Use the cached Block Pointer value to decide if the MCA_MISC1-4
addresses should be returned.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/amd.c | 71 +++++++++++++++++------------------
 1 file changed, 35 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index e64de5149e50..f0644b59848d 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -101,11 +101,6 @@ static struct smca_bank_name smca_names[] = {
 	[SMCA_PCIE]	= { "pcie",		"PCI Express Unit" },
 };
 
-static u32 smca_bank_addrs[MAX_NR_BANKS][NR_BLOCKS] __ro_after_init =
-{
-	[0 ... MAX_NR_BANKS - 1] = { [0 ... NR_BLOCKS - 1] = -1 }
-};
-
 static const char *smca_get_name(enum smca_bank_types t)
 {
 	if (t >= N_SMCA_BANK_TYPES)
@@ -198,6 +193,7 @@ static char buf_mcatype[MAX_MCATYPE_NAME_LEN];
 
 static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
 static DEFINE_PER_CPU(unsigned int, bank_map);	/* see which banks are on */
+static DEFINE_PER_CPU(u32, smca_blkptr_map); /* see which banks use BlkPtr */
 
 static void amd_threshold_interrupt(void);
 static void amd_deferred_error_interrupt(void);
@@ -208,6 +204,28 @@ static void default_deferred_error_interrupt(void)
 }
 void (*deferred_error_int_vector)(void) = default_deferred_error_interrupt;
 
+static void smca_set_blkptr_map(unsigned int bank, unsigned int cpu)
+{
+	u32 low, high;
+
+	/*
+	 * For SMCA enabled processors, BLKPTR field of the first MISC register
+	 * (MCx_MISC0) indicates presence of additional MISC regs set (MISC1-4).
+	 */
+	if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
+		return;
+
+	if (!(low & MCI_CONFIG_MCAX))
+		return;
+
+	if (rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank), &low, &high))
+		return;
+
+	if (low & MASK_BLKPTR_LO)
+		per_cpu(smca_blkptr_map, cpu) |= 1 << bank;
+
+}
+
 static void smca_configure(unsigned int bank, unsigned int cpu)
 {
 	unsigned int i, hwid_mcatype;
@@ -245,6 +263,8 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
 		wrmsr(smca_config, low, high);
 	}
 
+	smca_set_blkptr_map(bank, cpu);
+
 	/* Return early if this bank was already initialized. */
 	if (smca_banks[bank].hwid)
 		return;
@@ -455,42 +475,21 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
 	wrmsr(MSR_CU_DEF_ERR, low, high);
 }
 
-static u32 smca_get_block_address(unsigned int bank, unsigned int block)
+static u32 smca_get_block_address(unsigned int bank, unsigned int block,
+				  unsigned int cpu)
 {
-	u32 low, high;
-	u32 addr = 0;
-
-	if (smca_get_bank_type(bank) == SMCA_RESERVED)
-		return addr;
-
 	if (!block)
 		return MSR_AMD64_SMCA_MCx_MISC(bank);
 
-	/* Check our cache first: */
-	if (smca_bank_addrs[bank][block] != -1)
-		return smca_bank_addrs[bank][block];
-
-	/*
-	 * For SMCA enabled processors, BLKPTR field of the first MISC register
-	 * (MCx_MISC0) indicates presence of additional MISC regs set (MISC1-4).
-	 */
-	if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
-		goto out;
-
-	if (!(low & MCI_CONFIG_MCAX))
-		goto out;
-
-	if (!rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank), &low, &high) &&
-	    (low & MASK_BLKPTR_LO))
-		addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
+	if (!(per_cpu(smca_blkptr_map, cpu) & (1 << bank)))
+		return 0;
 
-out:
-	smca_bank_addrs[bank][block] = addr;
-	return addr;
+	return MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
 }
 
 static u32 get_block_address(u32 current_addr, u32 low, u32 high,
-			     unsigned int bank, unsigned int block)
+			     unsigned int bank, unsigned int block,
+			     unsigned int cpu)
 {
 	u32 addr = 0, offset = 0;
 
@@ -498,7 +497,7 @@ static u32 get_block_address(u32 current_addr, u32 low, u32 high,
 		return addr;
 
 	if (mce_flags.smca)
-		return smca_get_block_address(bank, block);
+		return smca_get_block_address(bank, block, cpu);
 
 	/* Fall back to method we used for older processors: */
 	switch (block) {
@@ -611,7 +610,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 			smca_configure(bank, cpu);
 
 		for (block = 0; block < NR_BLOCKS; ++block) {
-			address = get_block_address(address, low, high, bank, block);
+			address = get_block_address(address, low, high, bank, block, cpu);
 			if (!address)
 				break;
 
@@ -1228,7 +1227,7 @@ static int allocate_threshold_blocks(unsigned int cpu, unsigned int bank,
 	if (err)
 		goto out_free;
 recurse:
-	address = get_block_address(address, low, high, bank, ++block);
+	address = get_block_address(address, low, high, bank, ++block, cpu);
 	if (!address)
 		return 0;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH RESEND 5/5] x86/MCE: Save MCA control bits that get set in hardware
  2019-04-08 14:12 [PATCH RESEND 0/5] Handle MCA banks in a per_cpu way Ghannam, Yazen
                   ` (3 preceding siblings ...)
  2019-04-08 14:12 ` [PATCH RESEND 4/5] x86/MCE: Make number of MCA banks per_cpu Ghannam, Yazen
@ 2019-04-08 14:12 ` Ghannam, Yazen
  4 siblings, 0 replies; 23+ messages in thread
From: Ghannam, Yazen @ 2019-04-08 14:12 UTC (permalink / raw)
  To: linux-edac; +Cc: Ghannam, Yazen, linux-kernel, bp, tony.luck, x86

From: Yazen Ghannam <yazen.ghannam@amd.com>

The OS is expected to write all bits in MCA_CTL. However, only
implemented bits get set in the hardware.

Read back MCA_CTL so that the value in the hardware is saved and
reported through sysfs.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/core.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 4a066c1e8ab2..ed5374ab3ac3 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1566,10 +1566,13 @@ static void __mcheck_cpu_init_clear_banks(void)
 	for (i = 0; i < this_cpu_read(num_banks); i++) {
 		struct mce_bank *b = &this_cpu_read(mce_banks)[i];
 
-		if (!b->init)
-			continue;
-		wrmsrl(msr_ops.ctl(i), b->ctl);
-		wrmsrl(msr_ops.status(i), 0);
+		if (b->init) {
+			wrmsrl(msr_ops.ctl(i), b->ctl);
+			wrmsrl(msr_ops.status(i), 0);
+		}
+
+		/* Save bits set in hardware. */
+		rdmsrl(msr_ops.ctl(i), b->ctl);
 	}
 }
 
@@ -2313,8 +2316,10 @@ static void mce_reenable_cpu(void)
 	for (i = 0; i < this_cpu_read(num_banks); i++) {
 		struct mce_bank *b = &this_cpu_read(mce_banks)[i];
 
-		if (b->init)
+		if (b->init) {
 			wrmsrl(msr_ops.ctl(i), b->ctl);
+			rdmsrl(msr_ops.ctl(i), b->ctl);
+		}
 	}
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way
  2019-04-08 14:12 ` [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way Ghannam, Yazen
@ 2019-04-08 17:51   ` Borislav Petkov
  2019-04-08 18:55     ` Ghannam, Yazen
  0 siblings, 1 reply; 23+ messages in thread
From: Borislav Petkov @ 2019-04-08 17:51 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel, tony.luck, x86

On Mon, Apr 08, 2019 at 02:12:16PM +0000, Ghannam, Yazen wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> Current AMD systems have unique MCA banks per logical CPU even though
> the type of the banks may all align to the same bank number. Each CPU
> will have control of a set of MCA banks in the hardware and these are
> not shared with other CPUs.
> 
> For example, bank 0 may be the Load-Store Unit on every logical CPU, but
> each bank 0 is a unique structure in the hardware. In other words, there
> isn't a *single* Load-Store Unit at MCA bank 0 that all logical CPUs
> share.
> 
> This idea extends even to non-core MCA banks. For example, CPU0 and CPU4
> may see a Unified Memory Controller at bank 15, but each CPU is actually
> seeing a unique hardware structure that is not shared with other CPUs.
> 
> Because the MCA banks are all unique hardware structures, it would be
> good to control them in a more granular way. For example, if there is a
> known issue with the Floating Point Unit on CPU5 and a user wishes to
> disable an error type on the Floating Point Unit, then it would be good
> to do this only for CPU5 rather than all CPUs.
> 
> Also, future AMD systems may have heterogeneous MCA banks. Meaning the
> bank numbers may not necessarily represent the same types between CPUs.
> For example, bank 20 visible to CPU0 may be a Unified Memory Controller
> and bank 20 visible to CPU4 may be a Coherent Slave. So granular control
> will be even more necessary should the user wish to control specific MCA
> banks.
> 
> Split the device attributes from struct mce_bank leaving only the MCA
> bank control fields.
> 
> Make struct mce_banks[] per_cpu in order to have more granular control
> over individual MCA banks in the hardware.
> 
> Allocate the device attributes statically based on the maximum number of
> MCA banks supported. The sysfs interface will use as many as needed per
> CPU. Currently, this is set to mca_cfg.banks, but will be changed to a
> per_cpu bank count in a future patch.
> 
> Allocate the MCA control bits dynamically. Use the maximum number of MCA
> banks supported for now. This will be changed to a per_cpu bank count in
> a future patch.
> 
> Redo the sysfs store/show functions to handle the per_cpu mce_banks[].
> 
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> ---
>  arch/x86/kernel/cpu/mce/core.c | 77 ++++++++++++++++++++++------------
>  1 file changed, 51 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 8d0d1e8425db..14583c5c6e12 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -64,16 +64,21 @@ static DEFINE_MUTEX(mce_sysfs_mutex);
>  
>  DEFINE_PER_CPU(unsigned, mce_exception_count);
>  
> +struct mce_bank {
> +	u64	ctl;	/* subevents to enable */
> +	bool	init;	/* initialise bank? */

Keep that vertical alignment as of that of the members if mce_bank_dev
below.

> +};
> +static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank*, mce_banks);

Space between mce_bank and *.

> +
>  #define ATTR_LEN               16
>  /* One object for each MCE bank, shared by all CPUs */
> -struct mce_bank {
> -	u64			ctl;			/* subevents to enable */
> -	bool			init;			/* initialise bank? */
> +struct mce_bank_dev {
>  	struct device_attribute	attr;			/* device attribute */
>  	char			attrname[ATTR_LEN];	/* attribute name */
> +	u8			bank;			/* bank number */
>  };
> +static struct mce_bank_dev mce_bank_devs[MAX_NR_BANKS];

What bothers me here is the connection between the mce_bank and the
mce_bank_dev: it is simply not there.

Why isn't there a

	struct mce_bank_dev *dev;

in struct mce_bank?

Because - and correct me if I'm wrong here - but I think if we do
per-CPU banks, then we need to selectively point from each mce_bank to
its corresponding mce_bank_dev descriptor so that you have the proper
names.

For example, if bank3 on CPU5 is not present/disabled/N/A/whatever, then
you need to not initialize the that sysfs file there and have:

/sys/devices/system/machinecheck/machinecheck5/
├── bank0
├── bank1
├── bank10
├── bank11
├── bank12
├── bank13
├── bank14
├── bank15
├── bank16
├── bank17
├── bank18
├── bank19
├── bank2
├── bank20
├── bank21
├── bank22
		<--- bank 3 is not there because unsupported.
├── bank4
├── bank5
├── bank6
├── bank7
├── bank8
├── bank9


Which means that mce_device_create() should learn to be able to create
non-contiguous per-CPU bank sysfs files so that you'll have to iterate
over the per-CPU struct mce_banks array and use only those mce_bank_dev
* pointers which represent present banks on this CPU only.

Yes, no, am I way off?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way
  2019-04-08 17:51   ` Borislav Petkov
@ 2019-04-08 18:55     ` Ghannam, Yazen
  2019-04-09 20:34       ` Borislav Petkov
  0 siblings, 1 reply; 23+ messages in thread
From: Ghannam, Yazen @ 2019-04-08 18:55 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel, tony.luck, x86

> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Monday, April 8, 2019 12:52 PM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org; tony.luck@intel.com; x86@kernel.org
> Subject: Re: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way
> 
> On Mon, Apr 08, 2019 at 02:12:16PM +0000, Ghannam, Yazen wrote:
> > From: Yazen Ghannam <yazen.ghannam@amd.com>
> >
> > Current AMD systems have unique MCA banks per logical CPU even though
> > the type of the banks may all align to the same bank number. Each CPU
> > will have control of a set of MCA banks in the hardware and these are
> > not shared with other CPUs.
> >
> > For example, bank 0 may be the Load-Store Unit on every logical CPU, but
> > each bank 0 is a unique structure in the hardware. In other words, there
> > isn't a *single* Load-Store Unit at MCA bank 0 that all logical CPUs
> > share.
> >
> > This idea extends even to non-core MCA banks. For example, CPU0 and CPU4
> > may see a Unified Memory Controller at bank 15, but each CPU is actually
> > seeing a unique hardware structure that is not shared with other CPUs.
> >
> > Because the MCA banks are all unique hardware structures, it would be
> > good to control them in a more granular way. For example, if there is a
> > known issue with the Floating Point Unit on CPU5 and a user wishes to
> > disable an error type on the Floating Point Unit, then it would be good
> > to do this only for CPU5 rather than all CPUs.
> >
> > Also, future AMD systems may have heterogeneous MCA banks. Meaning the
> > bank numbers may not necessarily represent the same types between CPUs.
> > For example, bank 20 visible to CPU0 may be a Unified Memory Controller
> > and bank 20 visible to CPU4 may be a Coherent Slave. So granular control
> > will be even more necessary should the user wish to control specific MCA
> > banks.
> >
> > Split the device attributes from struct mce_bank leaving only the MCA
> > bank control fields.
> >
> > Make struct mce_banks[] per_cpu in order to have more granular control
> > over individual MCA banks in the hardware.
> >
> > Allocate the device attributes statically based on the maximum number of
> > MCA banks supported. The sysfs interface will use as many as needed per
> > CPU. Currently, this is set to mca_cfg.banks, but will be changed to a
> > per_cpu bank count in a future patch.
> >
> > Allocate the MCA control bits dynamically. Use the maximum number of MCA
> > banks supported for now. This will be changed to a per_cpu bank count in
> > a future patch.
> >
> > Redo the sysfs store/show functions to handle the per_cpu mce_banks[].
> >
> > Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> > ---
> >  arch/x86/kernel/cpu/mce/core.c | 77 ++++++++++++++++++++++------------
> >  1 file changed, 51 insertions(+), 26 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> > index 8d0d1e8425db..14583c5c6e12 100644
> > --- a/arch/x86/kernel/cpu/mce/core.c
> > +++ b/arch/x86/kernel/cpu/mce/core.c
> > @@ -64,16 +64,21 @@ static DEFINE_MUTEX(mce_sysfs_mutex);
> >
> >  DEFINE_PER_CPU(unsigned, mce_exception_count);
> >
> > +struct mce_bank {
> > +	u64	ctl;	/* subevents to enable */
> > +	bool	init;	/* initialise bank? */
> 
> Keep that vertical alignment as of that of the members if mce_bank_dev
> below.
> 
> > +};
> > +static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank*, mce_banks);
> 
> Space between mce_bank and *.
> 

Okay.

> > +
> >  #define ATTR_LEN               16
> >  /* One object for each MCE bank, shared by all CPUs */
> > -struct mce_bank {
> > -	u64			ctl;			/* subevents to enable */
> > -	bool			init;			/* initialise bank? */
> > +struct mce_bank_dev {
> >  	struct device_attribute	attr;			/* device attribute */
> >  	char			attrname[ATTR_LEN];	/* attribute name */
> > +	u8			bank;			/* bank number */
> >  };
> > +static struct mce_bank_dev mce_bank_devs[MAX_NR_BANKS];
> 
> What bothers me here is the connection between the mce_bank and the
> mce_bank_dev: it is simply not there.
> 

The connection is the bank number.

> Why isn't there a
> 
> 	struct mce_bank_dev *dev;
> 
> in struct mce_bank?
> 
> Because - and correct me if I'm wrong here - but I think if we do
> per-CPU banks, then we need to selectively point from each mce_bank to
> its corresponding mce_bank_dev descriptor so that you have the proper
> names.
> 

The file name is always "bank#", so it seems redundant to make the file descriptor per_cpu.

My thinking was to make the file descriptor part common to all CPUs. Then redo the show/store functions to handle the per_cpu control values.

> For example, if bank3 on CPU5 is not present/disabled/N/A/whatever, then
> you need to not initialize the that sysfs file there and have:
> 
> /sys/devices/system/machinecheck/machinecheck5/
> ├── bank0
> ├── bank1
> ├── bank10
> ├── bank11
> ├── bank12
> ├── bank13
> ├── bank14
> ├── bank15
> ├── bank16
> ├── bank17
> ├── bank18
> ├── bank19
> ├── bank2
> ├── bank20
> ├── bank21
> ├── bank22
> 		<--- bank 3 is not there because unsupported.
> ├── bank4
> ├── bank5
> ├── bank6
> ├── bank7
> ├── bank8
> ├── bank9
> 
> 
> Which means that mce_device_create() should learn to be able to create
> non-contiguous per-CPU bank sysfs files so that you'll have to iterate
> over the per-CPU struct mce_banks array and use only those mce_bank_dev
> * pointers which represent present banks on this CPU only.
> 
> Yes, no, am I way off?
> 

I see what you're saying.

We already have the case where some banks are not initialized either due to quirks or because they are Read-as-Zero, but we don't try to skip creating their files. With this full set (see patch 5), an unused bank will return a control value of 0. Would that be sufficient to indicate that a bank is not used?

I don't have any strong opinion about skipping the file creation or not. The files are created per CPU, so I think an "if (per_cpu(mce_banks, cpu)[i].ctl)" check could be enough to decide to create a file or not.

But I do have a couple of thoughts:
1) Will missing banks confuse users? As mentioned, we already have the case of unused/uninitialized banks today, but we don't skip their file creation.
	a) Will this affect any userspace tools?
2) Is the added complexity for file creation/destruction worth it? As mentioned, the file will return 0 for unused/uninitialized banks.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RESEND 4/5] x86/MCE: Make number of MCA banks per_cpu
  2019-04-08 14:12 ` [PATCH RESEND 4/5] x86/MCE: Make number of MCA banks per_cpu Ghannam, Yazen
@ 2019-04-08 20:26   ` Luck, Tony
  2019-04-08 20:34     ` Borislav Petkov
  0 siblings, 1 reply; 23+ messages in thread
From: Luck, Tony @ 2019-04-08 20:26 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel, bp, x86

On Mon, Apr 08, 2019 at 02:12:17PM +0000, Ghannam, Yazen wrote:
> +DEFINE_PER_CPU_READ_MOSTLY(u8, num_banks);
> +EXPORT_PER_CPU_SYMBOL_GPL(num_banks);

The name "num_banks" is a bit generic for an exported symbol.
I think it should have a "mce_" prefix.

-Tony

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RESEND 4/5] x86/MCE: Make number of MCA banks per_cpu
  2019-04-08 20:26   ` Luck, Tony
@ 2019-04-08 20:34     ` Borislav Petkov
  2019-04-08 20:42       ` Luck, Tony
  0 siblings, 1 reply; 23+ messages in thread
From: Borislav Petkov @ 2019-04-08 20:34 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Ghannam, Yazen, linux-edac, linux-kernel, x86

On Mon, Apr 08, 2019 at 01:26:59PM -0700, Luck, Tony wrote:
> On Mon, Apr 08, 2019 at 02:12:17PM +0000, Ghannam, Yazen wrote:
> > +DEFINE_PER_CPU_READ_MOSTLY(u8, num_banks);
> > +EXPORT_PER_CPU_SYMBOL_GPL(num_banks);
> 
> The name "num_banks" is a bit generic for an exported symbol.
> I think it should have a "mce_" prefix.

Actually, it should not be exported at all. A function returning the num
banks is better instead.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH RESEND 4/5] x86/MCE: Make number of MCA banks per_cpu
  2019-04-08 20:34     ` Borislav Petkov
@ 2019-04-08 20:42       ` Luck, Tony
  2019-04-08 20:50         ` Borislav Petkov
  0 siblings, 1 reply; 23+ messages in thread
From: Luck, Tony @ 2019-04-08 20:42 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Ghannam, Yazen, linux-edac, linux-kernel, x86

> Actually, it should not be exported at all. A function returning the num
> banks is better instead.

Are all the places it is used in non-pre-emptible sections of code? Looping
in the CMCI and #MC handlers should be fine. But do we need get_cpu()/put_cpu()
in any places?

-Tony

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RESEND 4/5] x86/MCE: Make number of MCA banks per_cpu
  2019-04-08 20:42       ` Luck, Tony
@ 2019-04-08 20:50         ` Borislav Petkov
  2019-04-08 22:48           ` Ghannam, Yazen
  0 siblings, 1 reply; 23+ messages in thread
From: Borislav Petkov @ 2019-04-08 20:50 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Ghannam, Yazen, linux-edac, linux-kernel, x86

On Mon, Apr 08, 2019 at 08:42:36PM +0000, Luck, Tony wrote:
> > Actually, it should not be exported at all. A function returning the num
> > banks is better instead.
> 
> Are all the places it is used in non-pre-emptible sections of code? Looping
> in the CMCI and #MC handlers should be fine. But do we need get_cpu()/put_cpu()
> in any places?

That export is needed only in the mce injector. Actually, it would be
much cleaner if the injector would find out the count straight from the
MSR as it does now, but be changed to do rdmsr_on_cpu() now, since can
have different num_banks on a CPU.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH RESEND 4/5] x86/MCE: Make number of MCA banks per_cpu
  2019-04-08 20:50         ` Borislav Petkov
@ 2019-04-08 22:48           ` Ghannam, Yazen
  2019-04-08 23:23             ` Luck, Tony
  2019-04-09 11:38             ` Borislav Petkov
  0 siblings, 2 replies; 23+ messages in thread
From: Ghannam, Yazen @ 2019-04-08 22:48 UTC (permalink / raw)
  To: Borislav Petkov, Luck, Tony; +Cc: linux-edac, linux-kernel, x86

> -----Original Message-----
> From: linux-edac-owner@vger.kernel.org <linux-edac-owner@vger.kernel.org> On Behalf Of Borislav Petkov
> Sent: Monday, April 8, 2019 3:50 PM
> To: Luck, Tony <tony.luck@intel.com>
> Cc: Ghannam, Yazen <Yazen.Ghannam@amd.com>; linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org; x86@kernel.org
> Subject: Re: [PATCH RESEND 4/5] x86/MCE: Make number of MCA banks per_cpu
> 
> On Mon, Apr 08, 2019 at 08:42:36PM +0000, Luck, Tony wrote:
> > > Actually, it should not be exported at all. A function returning the num
> > > banks is better instead.
> >
> > Are all the places it is used in non-pre-emptible sections of code? Looping
> > in the CMCI and #MC handlers should be fine. But do we need get_cpu()/put_cpu()
> > in any places?
> 
> That export is needed only in the mce injector. Actually, it would be
> much cleaner if the injector would find out the count straight from the
> MSR as it does now, but be changed to do rdmsr_on_cpu() now, since can
> have different num_banks on a CPU.
> 

Okay, so drop the export and leave the injector code as-is (it's already doing a rdmsrl_on_cpu()).

Is that okay?

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RESEND 4/5] x86/MCE: Make number of MCA banks per_cpu
  2019-04-08 22:48           ` Ghannam, Yazen
@ 2019-04-08 23:23             ` Luck, Tony
  2019-04-08 23:37               ` Ghannam, Yazen
  2019-04-09 11:38             ` Borislav Petkov
  1 sibling, 1 reply; 23+ messages in thread
From: Luck, Tony @ 2019-04-08 23:23 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: Borislav Petkov, linux-edac, linux-kernel, x86

On Mon, Apr 08, 2019 at 10:48:34PM +0000, Ghannam, Yazen wrote:
> Okay, so drop the export and leave the injector code as-is (it's
> already doing a rdmsrl_on_cpu()).

It's still a globally visible symbol (shared by core.c and amd.c).
So I think it needs a "mce_" prefix.

While it doesn't collide now, there are a bunch of other
subsystems that have "banks" and a variable to count them.

Look at output from "git grep -w num_banks".

-Tony

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH RESEND 4/5] x86/MCE: Make number of MCA banks per_cpu
  2019-04-08 23:23             ` Luck, Tony
@ 2019-04-08 23:37               ` Ghannam, Yazen
  0 siblings, 0 replies; 23+ messages in thread
From: Ghannam, Yazen @ 2019-04-08 23:37 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Borislav Petkov, linux-edac, linux-kernel, x86

> -----Original Message-----
> From: linux-edac-owner@vger.kernel.org <linux-edac-owner@vger.kernel.org> On Behalf Of Luck, Tony
> Sent: Monday, April 8, 2019 6:23 PM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: Borislav Petkov <bp@alien8.de>; linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org; x86@kernel.org
> Subject: Re: [PATCH RESEND 4/5] x86/MCE: Make number of MCA banks per_cpu
> 
> On Mon, Apr 08, 2019 at 10:48:34PM +0000, Ghannam, Yazen wrote:
> > Okay, so drop the export and leave the injector code as-is (it's
> > already doing a rdmsrl_on_cpu()).
> 
> It's still a globally visible symbol (shared by core.c and amd.c).
> So I think it needs a "mce_" prefix.
> 
> While it doesn't collide now, there are a bunch of other
> subsystems that have "banks" and a variable to count them.
> 
> Look at output from "git grep -w num_banks".
> 

Okay, I'll add the prefix.

And thanks for the tip. I'll try to keep this in mind.

Thanks,
Yazen 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RESEND 4/5] x86/MCE: Make number of MCA banks per_cpu
  2019-04-08 22:48           ` Ghannam, Yazen
  2019-04-08 23:23             ` Luck, Tony
@ 2019-04-09 11:38             ` Borislav Petkov
  1 sibling, 0 replies; 23+ messages in thread
From: Borislav Petkov @ 2019-04-09 11:38 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: Luck, Tony, linux-edac, linux-kernel, x86

On Mon, Apr 08, 2019 at 10:48:34PM +0000, Ghannam, Yazen wrote:
> Okay, so drop the export and leave the injector code as-is (it's
> already doing a rdmsrl_on_cpu()).

Yes of course. The number of MCA banks is none of modules' business and
should not be exported at all.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way
  2019-04-08 18:55     ` Ghannam, Yazen
@ 2019-04-09 20:34       ` Borislav Petkov
  2019-04-10 16:36         ` Ghannam, Yazen
  0 siblings, 1 reply; 23+ messages in thread
From: Borislav Petkov @ 2019-04-09 20:34 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel, tony.luck, x86

On Mon, Apr 08, 2019 at 06:55:59PM +0000, Ghannam, Yazen wrote:
> We already have the case where some banks are not initialized either
> due to quirks or because they are Read-as-Zero, but we don't try to
> skip creating their files. With this full set (see patch 5), an unused
> bank will return a control value of 0.

So set_bank() is changed to do:

@@ -2088,7 +2097,7 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,
        if (kstrtou64(buf, 0, &new) < 0)
                return -EINVAL;

-       if (bank >= mca_cfg.banks)
+       if (bank >= per_cpu(num_banks, s->id))
                return -EINVAL;


How would that work if the disabled/not-present bank is in the middle?
The old example: bank3 on CPU5.

> Would that be sufficient to indicate that a bank is not used?

Well, it should not allow for any control bits to be set and it should
have the proper bank number.

> But I do have a couple of thoughts:

> 1) Will missing banks confuse users? As mentioned, we already have the
> case of unused/uninitialized banks today, but we don't skip their file
> creation. a) Will this affect any userspace tools?

I guess it would be easier if we keep creating all files but denote properly
which banks are disabled.

> 2) Is the added complexity for file creation/destruction worth it? As
> mentioned, the file will return 0 for unused/uninitialized banks.

Yeah.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way
  2019-04-09 20:34       ` Borislav Petkov
@ 2019-04-10 16:36         ` Ghannam, Yazen
  2019-04-10 16:40           ` Borislav Petkov
  0 siblings, 1 reply; 23+ messages in thread
From: Ghannam, Yazen @ 2019-04-10 16:36 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel, tony.luck, x86

> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Tuesday, April 9, 2019 3:34 PM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org; tony.luck@intel.com; x86@kernel.org
> Subject: Re: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way
> 
> On Mon, Apr 08, 2019 at 06:55:59PM +0000, Ghannam, Yazen wrote:
> > We already have the case where some banks are not initialized either
> > due to quirks or because they are Read-as-Zero, but we don't try to
> > skip creating their files. With this full set (see patch 5), an unused
> > bank will return a control value of 0.
> 
> So set_bank() is changed to do:
> 
> @@ -2088,7 +2097,7 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,
>         if (kstrtou64(buf, 0, &new) < 0)
>                 return -EINVAL;
> 
> -       if (bank >= mca_cfg.banks)
> +       if (bank >= per_cpu(num_banks, s->id))
>                 return -EINVAL;
> 
> 
> How would that work if the disabled/not-present bank is in the middle?
> The old example: bank3 on CPU5.
> 
> > Would that be sufficient to indicate that a bank is not used?
> 
> Well, it should not allow for any control bits to be set and it should
> have the proper bank number.
> 

We have this case on AMD Family 17h with Bank 4. The hardware enforces this bank to be Read-as-Zero/Writes-Ignored.

This behavior is enforced whether the bank is in the middle or at the end.

> > But I do have a couple of thoughts:
> 
> > 1) Will missing banks confuse users? As mentioned, we already have the
> > case of unused/uninitialized banks today, but we don't skip their file
> > creation. a) Will this affect any userspace tools?
> 
> I guess it would be easier if we keep creating all files but denote properly
> which banks are disabled.
> 

I'm thinking to redo the sysfs interface for banks in another patch set. I could include a new file to indicate enabled/disabled, or maybe just update the documentation to describe this case.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way
  2019-04-10 16:36         ` Ghannam, Yazen
@ 2019-04-10 16:40           ` Borislav Petkov
  2019-04-10 16:58             ` Ghannam, Yazen
  0 siblings, 1 reply; 23+ messages in thread
From: Borislav Petkov @ 2019-04-10 16:40 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel, tony.luck, x86

On Wed, Apr 10, 2019 at 04:36:30PM +0000, Ghannam, Yazen wrote:
> We have this case on AMD Family 17h with Bank 4. The hardware enforces
> this bank to be Read-as-Zero/Writes-Ignored.
>
> This behavior is enforced whether the bank is in the middle or at the
> end.

Does num_banks contain the disabled bank? If so, then it will work.

> I'm thinking to redo the sysfs interface for banks in another patch
> set. I could include a new file to indicate enabled/disabled, or maybe
> just update the documentation to describe this case.

No, the write to the bank controls should fail on a disabled bank.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way
  2019-04-10 16:40           ` Borislav Petkov
@ 2019-04-10 16:58             ` Ghannam, Yazen
  2019-04-10 17:25               ` Borislav Petkov
  0 siblings, 1 reply; 23+ messages in thread
From: Ghannam, Yazen @ 2019-04-10 16:58 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel, tony.luck, x86

> -----Original Message-----
> From: Borislav Petkov <bp@alien8.de>
> Sent: Wednesday, April 10, 2019 11:41 AM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org; tony.luck@intel.com; x86@kernel.org
> Subject: Re: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way
> 
> On Wed, Apr 10, 2019 at 04:36:30PM +0000, Ghannam, Yazen wrote:
> > We have this case on AMD Family 17h with Bank 4. The hardware enforces
> > this bank to be Read-as-Zero/Writes-Ignored.
> >
> > This behavior is enforced whether the bank is in the middle or at the
> > end.
> 
> Does num_banks contain the disabled bank? If so, then it will work.
> 

Yes, unused banks in the middle are counted in the MCG_CAP[Count] value.

> > I'm thinking to redo the sysfs interface for banks in another patch
> > set. I could include a new file to indicate enabled/disabled, or maybe
> > just update the documentation to describe this case.
> 
> No, the write to the bank controls should fail on a disabled bank.
> 

Okay, so you're saying the sysfs access should fail if a bank is disabled. Is that correct?

Does "disabled" mean one or both of these?
Unused = RAZ/WI in hardware
Uninitialized = Not initialized by kernel due to quirks, etc.

For an unused bank, it doesn't hurt to write MCA_CTL, but really there's no reason to do so and go through mce_restart().

For an uninitialized bank, should we prevent users from overriding the kernel's settings?

Thanks,
Yazen


 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way
  2019-04-10 16:58             ` Ghannam, Yazen
@ 2019-04-10 17:25               ` Borislav Petkov
  2019-04-10 19:41                 ` Ghannam, Yazen
  0 siblings, 1 reply; 23+ messages in thread
From: Borislav Petkov @ 2019-04-10 17:25 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel, tony.luck, x86

On Wed, Apr 10, 2019 at 04:58:12PM +0000, Ghannam, Yazen wrote:
> Yes, unused banks in the middle are counted in the MCG_CAP[Count] value.

Good.

> Okay, so you're saying the sysfs access should fail if a bank is
> disabled. Is that correct?

Well, think about it. If a bank is not operational for whatever reason,
we should tell the user that.

> Does "disabled" mean one or both of these?
> Unused = RAZ/WI in hardware
> Uninitialized = Not initialized by kernel due to quirks, etc.
> 
> For an unused bank, it doesn't hurt to write MCA_CTL, but really
> there's no reason to do so and go through mce_restart().

Yes, but that bank is non-operational in some form. So we should prevent
all writes to it because, well, it is not going to do anything. And this
would be a good way to give feedback to the user that that is the case.

> For an uninitialized bank, should we prevent users from overriding the
> kernel's settings?

That all depends on the quirks. Whether we should allow them to be
overridden or not. I don't think we've ever thought about it, though.

Let's look at one:

        if (c->x86_vendor == X86_VENDOR_AMD) {
                if (c->x86 == 15 && cfg->banks > 4) {
                        /*
                         * disable GART TBL walk error reporting, which
                         * trips off incorrectly with the IOMMU & 3ware
                         * & Cerberus:
                         */
                        clear_bit(10, (unsigned long *)&mce_banks[4].ctl);


Yah, so if the user reenables those GART errors, then she/he will see a
lot of MCEs reported and will maybe complain about it. And then we'll
say, but why did you enable them then. And she/he'll say: uh, didn't
know. Or, I was just poking at sysfs and this happened.

Then we can say, well, don't do that then! :-)

So my current position is, meh, who cares. But then I'm looking at
another quirk:

        if (c->x86_vendor == X86_VENDOR_INTEL) {
                /*
                 * SDM documents that on family 6 bank 0 should not be written
                 * because it aliases to another special BIOS controlled
                 * register.
                 * But it's not aliased anymore on model 0x1a+
                 * Don't ignore bank 0 completely because there could be a
                 * valid event later, merely don't write CTL0.
                 */

                if (c->x86 == 6 && c->x86_model < 0x1A && cfg->banks > 0)
                        mce_banks[0].init = 0;


which basically prevents that bank from being reinitialized. So I guess
we have that functionality already - we simply need to pay attention to
w->init.

Right?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way
  2019-04-10 17:25               ` Borislav Petkov
@ 2019-04-10 19:41                 ` Ghannam, Yazen
  2019-04-10 20:04                   ` Borislav Petkov
  0 siblings, 1 reply; 23+ messages in thread
From: Ghannam, Yazen @ 2019-04-10 19:41 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, linux-kernel, tony.luck, x86

> -----Original Message-----
> From: linux-edac-owner@vger.kernel.org <linux-edac-owner@vger.kernel.org> On Behalf Of Borislav Petkov
> Sent: Wednesday, April 10, 2019 12:26 PM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org; tony.luck@intel.com; x86@kernel.org
> Subject: Re: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way
> 
> On Wed, Apr 10, 2019 at 04:58:12PM +0000, Ghannam, Yazen wrote:
> > Yes, unused banks in the middle are counted in the MCG_CAP[Count] value.
> 
> Good.
> 
> > Okay, so you're saying the sysfs access should fail if a bank is
> > disabled. Is that correct?
> 
> Well, think about it. If a bank is not operational for whatever reason,
> we should tell the user that.
> 
> > Does "disabled" mean one or both of these?
> > Unused = RAZ/WI in hardware
> > Uninitialized = Not initialized by kernel due to quirks, etc.
> >
> > For an unused bank, it doesn't hurt to write MCA_CTL, but really
> > there's no reason to do so and go through mce_restart().
> 
> Yes, but that bank is non-operational in some form. So we should prevent
> all writes to it because, well, it is not going to do anything. And this
> would be a good way to give feedback to the user that that is the case.
> 
> > For an uninitialized bank, should we prevent users from overriding the
> > kernel's settings?
> 
> That all depends on the quirks. Whether we should allow them to be
> overridden or not. I don't think we've ever thought about it, though.
> 
> Let's look at one:
> 
>         if (c->x86_vendor == X86_VENDOR_AMD) {
>                 if (c->x86 == 15 && cfg->banks > 4) {
>                         /*
>                          * disable GART TBL walk error reporting, which
>                          * trips off incorrectly with the IOMMU & 3ware
>                          * & Cerberus:
>                          */
>                         clear_bit(10, (unsigned long *)&mce_banks[4].ctl);
> 
> 
> Yah, so if the user reenables those GART errors, then she/he will see a
> lot of MCEs reported and will maybe complain about it. And then we'll
> say, but why did you enable them then. And she/he'll say: uh, didn't
> know. Or, I was just poking at sysfs and this happened.
> 
> Then we can say, well, don't do that then! :-)
> 
> So my current position is, meh, who cares. But then I'm looking at
> another quirk:
> 
>         if (c->x86_vendor == X86_VENDOR_INTEL) {
>                 /*
>                  * SDM documents that on family 6 bank 0 should not be written
>                  * because it aliases to another special BIOS controlled
>                  * register.
>                  * But it's not aliased anymore on model 0x1a+
>                  * Don't ignore bank 0 completely because there could be a
>                  * valid event later, merely don't write CTL0.
>                  */
> 
>                 if (c->x86 == 6 && c->x86_model < 0x1A && cfg->banks > 0)
>                         mce_banks[0].init = 0;
> 
> 
> which basically prevents that bank from being reinitialized. So I guess
> we have that functionality already - we simply need to pay attention to
> w->init.
> 
> Right?

Okay, I'm with you.

So I'm thinking to add another patch to the set. This will set mce_bank.init=0 if we read MCA_CTL=0 from the hardware.

Then we check if mce_bank.init=0 in the set/show functions and give a message if the bank is not used.

How does that sound?

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way
  2019-04-10 19:41                 ` Ghannam, Yazen
@ 2019-04-10 20:04                   ` Borislav Petkov
  0 siblings, 0 replies; 23+ messages in thread
From: Borislav Petkov @ 2019-04-10 20:04 UTC (permalink / raw)
  To: Ghannam, Yazen; +Cc: linux-edac, linux-kernel, tony.luck, x86

On Wed, Apr 10, 2019 at 07:41:47PM +0000, Ghannam, Yazen wrote:
> So I'm thinking to add another patch to the set. This will set
> mce_bank.init=0 if we read MCA_CTL=0 from the hardware.

Ok.

> Then we check if mce_bank.init=0 in the set/show functions and give a
> message if the bank is not used.

Nah, not a message. return -ENODEV or -EINVAL or so.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-04-10 20:04 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-08 14:12 [PATCH RESEND 0/5] Handle MCA banks in a per_cpu way Ghannam, Yazen
2019-04-08 14:12 ` [PATCH RESEND 1/5] x86/MCE: Make struct mce_banks[] static Ghannam, Yazen
2019-04-08 14:12 ` [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way Ghannam, Yazen
2019-04-08 17:51   ` Borislav Petkov
2019-04-08 18:55     ` Ghannam, Yazen
2019-04-09 20:34       ` Borislav Petkov
2019-04-10 16:36         ` Ghannam, Yazen
2019-04-10 16:40           ` Borislav Petkov
2019-04-10 16:58             ` Ghannam, Yazen
2019-04-10 17:25               ` Borislav Petkov
2019-04-10 19:41                 ` Ghannam, Yazen
2019-04-10 20:04                   ` Borislav Petkov
2019-04-08 14:12 ` [PATCH RESEND 3/5] x86/MCE/AMD: Don't cache block addresses on SMCA systems Ghannam, Yazen
2019-04-08 14:12 ` [PATCH RESEND 4/5] x86/MCE: Make number of MCA banks per_cpu Ghannam, Yazen
2019-04-08 20:26   ` Luck, Tony
2019-04-08 20:34     ` Borislav Petkov
2019-04-08 20:42       ` Luck, Tony
2019-04-08 20:50         ` Borislav Petkov
2019-04-08 22:48           ` Ghannam, Yazen
2019-04-08 23:23             ` Luck, Tony
2019-04-08 23:37               ` Ghannam, Yazen
2019-04-09 11:38             ` Borislav Petkov
2019-04-08 14:12 ` [PATCH RESEND 5/5] x86/MCE: Save MCA control bits that get set in hardware Ghannam, Yazen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).