linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] Add DIMM 2x refresh event and failure syndrome
@ 2023-02-28  9:20 Quan Nguyen
  2023-02-28  9:20 ` [PATCH v2 1/2] misc: smpro-errmon: Add DIMM 2x Refresh rate event Quan Nguyen
  2023-02-28  9:20 ` [PATCH v2 2/2] misc: smpro-errmon: Add dimm training failure syndrome Quan Nguyen
  0 siblings, 2 replies; 6+ messages in thread
From: Quan Nguyen @ 2023-02-28  9:20 UTC (permalink / raw)
  To: Quan Nguyen, Arnd Bergmann, Greg Kroah-Hartman, Paul Menzel,
	Joel Stanley, Andrew Jeffery, linux-kernel, openbmc,
	Open Source Submission
  Cc: Phong Vo, Thang Nguyen

Adds event_dimm_2x_refresh sysfs to report doubling of DIMM refresh
rate on high temperature condition and event_dimm[0-15]_syndrome sysfs
to report the DDR failure syndrome to BMC when DIMM training failed.

Discussion for v1:
https://lore.kernel.org/lkml/20230214064509.3622044-1-quan@os.amperecomputing.com/T/

v2:
  + Change "to initialized" to "to initialize"            [Paul]
  + Corrected kernel version to 6.3                  [Greg,Paul]
  + Corrected variables type to 'unsigned int"            [Paul]

Quan Nguyen (2):
  misc: smpro-errmon: Add DIMM 2x Refresh rate event
  misc: smpro-errmon: Add dimm training failure syndrome

 .../sysfs-bus-platform-devices-ampere-smpro   | 19 ++++-
 drivers/misc/smpro-errmon.c                   | 82 +++++++++++++++++++
 2 files changed, 98 insertions(+), 3 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 1/2] misc: smpro-errmon: Add DIMM 2x Refresh rate event
  2023-02-28  9:20 [PATCH v2 0/2] Add DIMM 2x refresh event and failure syndrome Quan Nguyen
@ 2023-02-28  9:20 ` Quan Nguyen
  2023-02-28  9:20 ` [PATCH v2 2/2] misc: smpro-errmon: Add dimm training failure syndrome Quan Nguyen
  1 sibling, 0 replies; 6+ messages in thread
From: Quan Nguyen @ 2023-02-28  9:20 UTC (permalink / raw)
  To: Quan Nguyen, Arnd Bergmann, Greg Kroah-Hartman, Paul Menzel,
	Joel Stanley, Andrew Jeffery, linux-kernel, openbmc,
	Open Source Submission
  Cc: Phong Vo, Thang Nguyen

In high temperature condition, JEDEC spec requires memory controller to
double the refresh rate. This commit adds event_dimm_2x_refresh sysfs
to report that events for all memory channels.

Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
---
v2:
  + Corrected kernel version to 6.3                  [Greg,Paul]
---
 .../ABI/testing/sysfs-bus-platform-devices-ampere-smpro  | 9 ++++++---
 drivers/misc/smpro-errmon.c                              | 5 +++++
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
index ca93c215ef99..5a8d1af49a2b 100644
--- a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
+++ b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
@@ -234,8 +234,8 @@ Description:
 		For details, see section `5.10 RAS Internal Error Register Definitions,
 		Altra Family Soc BMC Interface Specification`.
 
-What:		/sys/bus/platform/devices/smpro-errmon.*/event_[vrd_warn_fault|vrd_hot|dimm_hot]
-KernelVersion:	6.1
+What:		/sys/bus/platform/devices/smpro-errmon.*/event_[vrd_warn_fault|vrd_hot|dimm_hot|dimm_2x_refresh]
+KernelVersion:	6.1 (event_[vrd_warn_fault|vrd_hot|dimm_hot]), 6.3 (event_dimm_2x_refresh)
 Contact:	Quan Nguyen <quan@os.amperecomputing.com>
 Description:
 		(RO) Contains the detail information in case of VRD/DIMM warning/hot events
@@ -258,8 +258,11 @@ Description:
 		+---------------+---------------------------------------------------------------+---------------------+
 		| DIMM HOT      | /sys/bus/platform/devices/smpro-errmon.*/event_dimm_hot       | DIMM Hot            |
 		+---------------+---------------------------------------------------------------+---------------------+
+		| DIMM 2X       | /sys/bus/platform/devices/smpro-errmon.*/event_dimm_2x_refresh| DIMM 2x refresh rate|
+		| REFRESH RATE  |                                                               | event in high temp  |
+		+---------------+---------------------------------------------------------------+---------------------+
 
-		For more details, see section `5.7 GPI Status Registers,
+		For more details, see section `5.7 GPI Status Registers and 5.9 Memory Error Register Definitions,
 		Altra Family Soc BMC Interface Specification`.
 
 What:		/sys/bus/platform/devices/smpro-misc.*/boot_progress
diff --git a/drivers/misc/smpro-errmon.c b/drivers/misc/smpro-errmon.c
index d1431d419aa4..1635e881aefb 100644
--- a/drivers/misc/smpro-errmon.c
+++ b/drivers/misc/smpro-errmon.c
@@ -67,6 +67,7 @@
 #define VRD_WARN_FAULT_EVENT_DATA	0x78
 #define VRD_HOT_EVENT_DATA		0x79
 #define DIMM_HOT_EVENT_DATA		0x7A
+#define DIMM_2X_REFRESH_EVENT_DATA	0x96
 
 #define MAX_READ_BLOCK_LENGTH	48
 
@@ -190,6 +191,7 @@ enum EVENT_TYPES {
 	VRD_WARN_FAULT_EVENT,
 	VRD_HOT_EVENT,
 	DIMM_HOT_EVENT,
+	DIMM_2X_REFRESH_EVENT,
 	NUM_EVENTS_TYPE,
 };
 
@@ -198,6 +200,7 @@ static u8 smpro_event_table[NUM_EVENTS_TYPE] = {
 	VRD_WARN_FAULT_EVENT_DATA,
 	VRD_HOT_EVENT_DATA,
 	DIMM_HOT_EVENT_DATA,
+	DIMM_2X_REFRESH_EVENT_DATA,
 };
 
 static ssize_t smpro_event_data_read(struct device *dev,
@@ -463,6 +466,7 @@ static DEVICE_ATTR_RO(warn_pmpro);
 EVENT_RO(vrd_warn_fault, VRD_WARN_FAULT_EVENT);
 EVENT_RO(vrd_hot, VRD_HOT_EVENT);
 EVENT_RO(dimm_hot, DIMM_HOT_EVENT);
+EVENT_RO(dimm_2x_refresh, DIMM_2X_REFRESH_EVENT);
 
 static struct attribute *smpro_errmon_attrs[] = {
 	&dev_attr_overflow_core_ce.attr,
@@ -488,6 +492,7 @@ static struct attribute *smpro_errmon_attrs[] = {
 	&dev_attr_event_vrd_warn_fault.attr,
 	&dev_attr_event_vrd_hot.attr,
 	&dev_attr_event_dimm_hot.attr,
+	&dev_attr_event_dimm_2x_refresh.attr,
 	NULL
 };
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 2/2] misc: smpro-errmon: Add dimm training failure syndrome
  2023-02-28  9:20 [PATCH v2 0/2] Add DIMM 2x refresh event and failure syndrome Quan Nguyen
  2023-02-28  9:20 ` [PATCH v2 1/2] misc: smpro-errmon: Add DIMM 2x Refresh rate event Quan Nguyen
@ 2023-02-28  9:20 ` Quan Nguyen
  2023-03-09 17:05   ` Greg Kroah-Hartman
  1 sibling, 1 reply; 6+ messages in thread
From: Quan Nguyen @ 2023-02-28  9:20 UTC (permalink / raw)
  To: Quan Nguyen, Arnd Bergmann, Greg Kroah-Hartman, Paul Menzel,
	Joel Stanley, Andrew Jeffery, linux-kernel, openbmc,
	Open Source Submission
  Cc: Phong Vo, Thang Nguyen

Adds event_dimm[0-15]_syndrome sysfs to report the failure syndrome
to BMC when DIMM training failed.

Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
---
v2:
  + Change "to initialized" to "to initialize"            [Paul]
  + Corrected kernel version to 6.3                  [Greg,Paul]
  + Corrected variables type to 'unsigned int"            [Paul]
---
 .../sysfs-bus-platform-devices-ampere-smpro   | 10 +++
 drivers/misc/smpro-errmon.c                   | 77 +++++++++++++++++++
 2 files changed, 87 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
index 5a8d1af49a2b..44101d3fed22 100644
--- a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
+++ b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
@@ -265,6 +265,16 @@ Description:
 		For more details, see section `5.7 GPI Status Registers and 5.9 Memory Error Register Definitions,
 		Altra Family Soc BMC Interface Specification`.
 
+What:		/sys/bus/platform/devices/smpro-errmon.*/event_dimm[0-15]_syndrome
+KernelVersion:	6.3
+Contact:	Quan Nguyen <quan@os.amperecomputing.com>
+Description:
+		(RO) The sysfs returns the 2-byte DIMM failure syndrome data for slot
+		0-15 if it failed to initialize.
+
+		For more details, see section `5.11 Boot Stage Register Definitions,
+		Altra Family Soc BMC Interface Specification`.
+
 What:		/sys/bus/platform/devices/smpro-misc.*/boot_progress
 KernelVersion:	6.1
 Contact:	Quan Nguyen <quan@os.amperecomputing.com>
diff --git a/drivers/misc/smpro-errmon.c b/drivers/misc/smpro-errmon.c
index 1635e881aefb..a1f0b2c77fac 100644
--- a/drivers/misc/smpro-errmon.c
+++ b/drivers/misc/smpro-errmon.c
@@ -47,6 +47,12 @@
 #define WARN_PMPRO_INFO_LO	0xAC
 #define WARN_PMPRO_INFO_HI	0xAD
 
+/* Boot Stage Register */
+#define BOOTSTAGE		0xB0
+#define DIMM_SYNDROME_SEL	0xB4
+#define DIMM_SYNDROME_ERR	0xB5
+#define DIMM_SYNDROME_STAGE	4
+
 /* PCIE Error Registers */
 #define PCIE_CE_ERR_CNT		0xC0
 #define PCIE_CE_ERR_LEN		0xC1
@@ -468,6 +474,61 @@ EVENT_RO(vrd_hot, VRD_HOT_EVENT);
 EVENT_RO(dimm_hot, DIMM_HOT_EVENT);
 EVENT_RO(dimm_2x_refresh, DIMM_2X_REFRESH_EVENT);
 
+static ssize_t smpro_dimm_syndrome_read(struct device *dev, struct device_attribute *da,
+					char *buf, unsigned int slot)
+{
+	struct smpro_errmon *errmon = dev_get_drvdata(dev);
+	unsigned int data;
+	int ret;
+
+	ret = regmap_read(errmon->regmap, BOOTSTAGE, &data);
+	if (ret)
+		return ret;
+
+	/* check for valid stage */
+	data = (data >> 8) & 0xff;
+	if (data != DIMM_SYNDROME_STAGE)
+		return ret;
+
+	/* Write the slot ID to retrieve Error Syndrome */
+	ret = regmap_write(errmon->regmap, DIMM_SYNDROME_SEL, slot);
+	if (ret)
+		return ret;
+
+	/* Read the Syndrome error */
+	ret = regmap_read(errmon->regmap, DIMM_SYNDROME_ERR, &data);
+	if (ret || !data)
+		return ret;
+
+	return sysfs_emit(buf, "%04x\n", data);
+}
+
+#define EVENT_DIMM_SYNDROME(_slot) \
+	static ssize_t event_dimm##_slot##_syndrome_show(struct device *dev,          \
+							 struct device_attribute *da, \
+							 char *buf)                   \
+	{                                                                             \
+		return smpro_dimm_syndrome_read(dev, da, buf, _slot);                 \
+	}                                                                             \
+	static DEVICE_ATTR_RO(event_dimm##_slot##_syndrome)
+
+EVENT_DIMM_SYNDROME(0);
+EVENT_DIMM_SYNDROME(1);
+EVENT_DIMM_SYNDROME(2);
+EVENT_DIMM_SYNDROME(3);
+EVENT_DIMM_SYNDROME(4);
+EVENT_DIMM_SYNDROME(5);
+EVENT_DIMM_SYNDROME(6);
+EVENT_DIMM_SYNDROME(7);
+EVENT_DIMM_SYNDROME(8);
+EVENT_DIMM_SYNDROME(9);
+EVENT_DIMM_SYNDROME(10);
+EVENT_DIMM_SYNDROME(11);
+EVENT_DIMM_SYNDROME(12);
+EVENT_DIMM_SYNDROME(13);
+EVENT_DIMM_SYNDROME(14);
+EVENT_DIMM_SYNDROME(15);
+
 static struct attribute *smpro_errmon_attrs[] = {
 	&dev_attr_overflow_core_ce.attr,
 	&dev_attr_overflow_core_ue.attr,
@@ -493,6 +554,22 @@ static struct attribute *smpro_errmon_attrs[] = {
 	&dev_attr_event_vrd_hot.attr,
 	&dev_attr_event_dimm_hot.attr,
 	&dev_attr_event_dimm_2x_refresh.attr,
+	&dev_attr_event_dimm0_syndrome.attr,
+	&dev_attr_event_dimm1_syndrome.attr,
+	&dev_attr_event_dimm2_syndrome.attr,
+	&dev_attr_event_dimm3_syndrome.attr,
+	&dev_attr_event_dimm4_syndrome.attr,
+	&dev_attr_event_dimm5_syndrome.attr,
+	&dev_attr_event_dimm6_syndrome.attr,
+	&dev_attr_event_dimm7_syndrome.attr,
+	&dev_attr_event_dimm8_syndrome.attr,
+	&dev_attr_event_dimm9_syndrome.attr,
+	&dev_attr_event_dimm10_syndrome.attr,
+	&dev_attr_event_dimm11_syndrome.attr,
+	&dev_attr_event_dimm12_syndrome.attr,
+	&dev_attr_event_dimm13_syndrome.attr,
+	&dev_attr_event_dimm14_syndrome.attr,
+	&dev_attr_event_dimm15_syndrome.attr,
 	NULL
 };
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 2/2] misc: smpro-errmon: Add dimm training failure syndrome
  2023-02-28  9:20 ` [PATCH v2 2/2] misc: smpro-errmon: Add dimm training failure syndrome Quan Nguyen
@ 2023-03-09 17:05   ` Greg Kroah-Hartman
  2023-03-10  0:20     ` Quan Nguyen
  0 siblings, 1 reply; 6+ messages in thread
From: Greg Kroah-Hartman @ 2023-03-09 17:05 UTC (permalink / raw)
  To: Quan Nguyen
  Cc: Arnd Bergmann, Paul Menzel, Joel Stanley, Andrew Jeffery,
	linux-kernel, openbmc, Open Source Submission, Phong Vo,
	Thang Nguyen

On Tue, Feb 28, 2023 at 04:20:10PM +0700, Quan Nguyen wrote:
> Adds event_dimm[0-15]_syndrome sysfs to report the failure syndrome
> to BMC when DIMM training failed.
> 
> Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
> ---
> v2:
>   + Change "to initialized" to "to initialize"            [Paul]
>   + Corrected kernel version to 6.3                  [Greg,Paul]

Sorry, but this is going to have to be 6.4 now.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 2/2] misc: smpro-errmon: Add dimm training failure syndrome
  2023-03-09 17:05   ` Greg Kroah-Hartman
@ 2023-03-10  0:20     ` Quan Nguyen
  2023-03-10  7:06       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 6+ messages in thread
From: Quan Nguyen @ 2023-03-10  0:20 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Arnd Bergmann, Paul Menzel, Joel Stanley, Andrew Jeffery,
	linux-kernel, openbmc, Open Source Submission, Phong Vo,
	Thang Nguyen



On 10/03/2023 00:05, Greg Kroah-Hartman wrote:
> On Tue, Feb 28, 2023 at 04:20:10PM +0700, Quan Nguyen wrote:
>> Adds event_dimm[0-15]_syndrome sysfs to report the failure syndrome
>> to BMC when DIMM training failed.
>>
>> Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
>> ---
>> v2:
>>    + Change "to initialized" to "to initialize"            [Paul]
>>    + Corrected kernel version to 6.3                  [Greg,Paul]
> 
> Sorry, but this is going to have to be 6.4 now.
> 

Thanks Greg for the review,

So should I send another version for the patch with 6.3 to 6.4 correction ?

Thanks,

-Quan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 2/2] misc: smpro-errmon: Add dimm training failure syndrome
  2023-03-10  0:20     ` Quan Nguyen
@ 2023-03-10  7:06       ` Greg Kroah-Hartman
  0 siblings, 0 replies; 6+ messages in thread
From: Greg Kroah-Hartman @ 2023-03-10  7:06 UTC (permalink / raw)
  To: Quan Nguyen
  Cc: Arnd Bergmann, Paul Menzel, Joel Stanley, Andrew Jeffery,
	linux-kernel, openbmc, Open Source Submission, Phong Vo,
	Thang Nguyen

On Fri, Mar 10, 2023 at 07:20:56AM +0700, Quan Nguyen wrote:
> 
> 
> On 10/03/2023 00:05, Greg Kroah-Hartman wrote:
> > On Tue, Feb 28, 2023 at 04:20:10PM +0700, Quan Nguyen wrote:
> > > Adds event_dimm[0-15]_syndrome sysfs to report the failure syndrome
> > > to BMC when DIMM training failed.
> > > 
> > > Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
> > > ---
> > > v2:
> > >    + Change "to initialized" to "to initialize"            [Paul]
> > >    + Corrected kernel version to 6.3                  [Greg,Paul]
> > 
> > Sorry, but this is going to have to be 6.4 now.
> > 
> 
> Thanks Greg for the review,
> 
> So should I send another version for the patch with 6.3 to 6.4 correction ?

Yes please, for both, so I can queue them up now for the 6.4-rc1 merge
window.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-03-10  7:06 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-28  9:20 [PATCH v2 0/2] Add DIMM 2x refresh event and failure syndrome Quan Nguyen
2023-02-28  9:20 ` [PATCH v2 1/2] misc: smpro-errmon: Add DIMM 2x Refresh rate event Quan Nguyen
2023-02-28  9:20 ` [PATCH v2 2/2] misc: smpro-errmon: Add dimm training failure syndrome Quan Nguyen
2023-03-09 17:05   ` Greg Kroah-Hartman
2023-03-10  0:20     ` Quan Nguyen
2023-03-10  7:06       ` Greg Kroah-Hartman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).