openbmc.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Add DIMM 2x refresh event and failure syndrome
@ 2023-02-14  6:45 Quan Nguyen
  2023-02-14  6:45 ` [PATCH 1/2] misc: smpro-errmon: Add DIMM 2x Refresh rate event Quan Nguyen
  2023-02-14  6:45 ` [PATCH 2/2] misc: smpro-errmon: Add dimm training failure syndrome Quan Nguyen
  0 siblings, 2 replies; 7+ messages in thread
From: Quan Nguyen @ 2023-02-14  6:45 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman, Joel Stanley, Andrew Jeffery,
	linux-kernel, openbmc, Open Source Submission
  Cc: Thang Nguyen, Phong Vo, Quan Nguyen

Adds event_dimm_2x_refresh sysfs to report doubling of DIMM refresh
rate on high temperature condition and event_dimm[0-15]_syndrome sysfs
to report the DDR failure syndrome to BMC when DIMM training failed.

Quan Nguyen (2):
  misc: smpro-errmon: Add DIMM 2x Refresh rate event
  misc: smpro-errmon: Add dimm training failure syndrome

 .../sysfs-bus-platform-devices-ampere-smpro   | 15 +++-
 drivers/misc/smpro-errmon.c                   | 82 +++++++++++++++++++
 2 files changed, 96 insertions(+), 1 deletion(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] misc: smpro-errmon: Add DIMM 2x Refresh rate event
  2023-02-14  6:45 [PATCH 0/2] Add DIMM 2x refresh event and failure syndrome Quan Nguyen
@ 2023-02-14  6:45 ` Quan Nguyen
  2023-02-14  6:45 ` [PATCH 2/2] misc: smpro-errmon: Add dimm training failure syndrome Quan Nguyen
  1 sibling, 0 replies; 7+ messages in thread
From: Quan Nguyen @ 2023-02-14  6:45 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman, Joel Stanley, Andrew Jeffery,
	linux-kernel, openbmc, Open Source Submission
  Cc: Thang Nguyen, Phong Vo, Quan Nguyen

In high temperature condition, JEDEC spec requires memory controller to
double the refresh rate. This commit adds event_dimm_2x_refresh sysfs
to report that events for all memory channels.

Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
---
 .../ABI/testing/sysfs-bus-platform-devices-ampere-smpro      | 5 ++++-
 drivers/misc/smpro-errmon.c                                  | 5 +++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
index ca93c215ef99..d4e3f308c451 100644
--- a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
+++ b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
@@ -258,8 +258,11 @@ Description:
 		+---------------+---------------------------------------------------------------+---------------------+
 		| DIMM HOT      | /sys/bus/platform/devices/smpro-errmon.*/event_dimm_hot       | DIMM Hot            |
 		+---------------+---------------------------------------------------------------+---------------------+
+		| DIMM 2X       | /sys/bus/platform/devices/smpro-errmon.*/event_dimm_2x_refresh| DIMM 2x refresh rate|
+		| REFRESH RATE  |                                                               | event in high temp  |
+		+---------------+---------------------------------------------------------------+---------------------+
 
-		For more details, see section `5.7 GPI Status Registers,
+		For more details, see section `5.7 GPI Status Registers and 5.9 Memory Error Register Definitions,
 		Altra Family Soc BMC Interface Specification`.
 
 What:		/sys/bus/platform/devices/smpro-misc.*/boot_progress
diff --git a/drivers/misc/smpro-errmon.c b/drivers/misc/smpro-errmon.c
index d1431d419aa4..1635e881aefb 100644
--- a/drivers/misc/smpro-errmon.c
+++ b/drivers/misc/smpro-errmon.c
@@ -67,6 +67,7 @@
 #define VRD_WARN_FAULT_EVENT_DATA	0x78
 #define VRD_HOT_EVENT_DATA		0x79
 #define DIMM_HOT_EVENT_DATA		0x7A
+#define DIMM_2X_REFRESH_EVENT_DATA	0x96
 
 #define MAX_READ_BLOCK_LENGTH	48
 
@@ -190,6 +191,7 @@ enum EVENT_TYPES {
 	VRD_WARN_FAULT_EVENT,
 	VRD_HOT_EVENT,
 	DIMM_HOT_EVENT,
+	DIMM_2X_REFRESH_EVENT,
 	NUM_EVENTS_TYPE,
 };
 
@@ -198,6 +200,7 @@ static u8 smpro_event_table[NUM_EVENTS_TYPE] = {
 	VRD_WARN_FAULT_EVENT_DATA,
 	VRD_HOT_EVENT_DATA,
 	DIMM_HOT_EVENT_DATA,
+	DIMM_2X_REFRESH_EVENT_DATA,
 };
 
 static ssize_t smpro_event_data_read(struct device *dev,
@@ -463,6 +466,7 @@ static DEVICE_ATTR_RO(warn_pmpro);
 EVENT_RO(vrd_warn_fault, VRD_WARN_FAULT_EVENT);
 EVENT_RO(vrd_hot, VRD_HOT_EVENT);
 EVENT_RO(dimm_hot, DIMM_HOT_EVENT);
+EVENT_RO(dimm_2x_refresh, DIMM_2X_REFRESH_EVENT);
 
 static struct attribute *smpro_errmon_attrs[] = {
 	&dev_attr_overflow_core_ce.attr,
@@ -488,6 +492,7 @@ static struct attribute *smpro_errmon_attrs[] = {
 	&dev_attr_event_vrd_warn_fault.attr,
 	&dev_attr_event_vrd_hot.attr,
 	&dev_attr_event_dimm_hot.attr,
+	&dev_attr_event_dimm_2x_refresh.attr,
 	NULL
 };
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] misc: smpro-errmon: Add dimm training failure syndrome
  2023-02-14  6:45 [PATCH 0/2] Add DIMM 2x refresh event and failure syndrome Quan Nguyen
  2023-02-14  6:45 ` [PATCH 1/2] misc: smpro-errmon: Add DIMM 2x Refresh rate event Quan Nguyen
@ 2023-02-14  6:45 ` Quan Nguyen
  2023-02-15  7:33   ` Paul Menzel
  1 sibling, 1 reply; 7+ messages in thread
From: Quan Nguyen @ 2023-02-14  6:45 UTC (permalink / raw)
  To: Arnd Bergmann, Greg Kroah-Hartman, Joel Stanley, Andrew Jeffery,
	linux-kernel, openbmc, Open Source Submission
  Cc: Thang Nguyen, Phong Vo, Quan Nguyen

Adds event_dimm[0-15]_syndrome sysfs to report the failure syndrome
to BMC when DIMM training failed.

Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
---
 .../sysfs-bus-platform-devices-ampere-smpro   | 10 +++
 drivers/misc/smpro-errmon.c                   | 77 +++++++++++++++++++
 2 files changed, 87 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
index d4e3f308c451..c35f1d45e656 100644
--- a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
+++ b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
@@ -265,6 +265,16 @@ Description:
 		For more details, see section `5.7 GPI Status Registers and 5.9 Memory Error Register Definitions,
 		Altra Family Soc BMC Interface Specification`.
 
+What:		/sys/bus/platform/devices/smpro-errmon.*/event_dimm[0-15]_syndrome
+KernelVersion:	6.1
+Contact:	Quan Nguyen <quan@os.amperecomputing.com>
+Description:
+		(RO) The sysfs returns the 2-byte DIMM failure syndrome data for slot
+		0-15 if it failed to initialized.
+
+		For more details, see section `5.11 Boot Stage Register Definitions,
+		Altra Family Soc BMC Interface Specification`.
+
 What:		/sys/bus/platform/devices/smpro-misc.*/boot_progress
 KernelVersion:	6.1
 Contact:	Quan Nguyen <quan@os.amperecomputing.com>
diff --git a/drivers/misc/smpro-errmon.c b/drivers/misc/smpro-errmon.c
index 1635e881aefb..3e8570cbb740 100644
--- a/drivers/misc/smpro-errmon.c
+++ b/drivers/misc/smpro-errmon.c
@@ -47,6 +47,12 @@
 #define WARN_PMPRO_INFO_LO	0xAC
 #define WARN_PMPRO_INFO_HI	0xAD
 
+/* Boot Stage Register */
+#define BOOTSTAGE		0xB0
+#define DIMM_SYNDROME_SEL	0xB4
+#define DIMM_SYNDROME_ERR	0xB5
+#define DIMM_SYNDROME_STAGE	4
+
 /* PCIE Error Registers */
 #define PCIE_CE_ERR_CNT		0xC0
 #define PCIE_CE_ERR_LEN		0xC1
@@ -468,6 +474,61 @@ EVENT_RO(vrd_hot, VRD_HOT_EVENT);
 EVENT_RO(dimm_hot, DIMM_HOT_EVENT);
 EVENT_RO(dimm_2x_refresh, DIMM_2X_REFRESH_EVENT);
 
+static ssize_t smpro_dimm_syndrome_read(struct device *dev, struct device_attribute *da,
+					char *buf, int slot)
+{
+	struct smpro_errmon *errmon = dev_get_drvdata(dev);
+	s32 data;
+	int ret;
+
+	ret = regmap_read(errmon->regmap, BOOTSTAGE, &data);
+	if (ret)
+		return ret;
+
+	/* check for valid stage */
+	data = (data >> 8) & 0xff;
+	if (data != DIMM_SYNDROME_STAGE)
+		return ret;
+
+	/* Write the slot ID to retrieve Error Syndrome */
+	ret = regmap_write(errmon->regmap, DIMM_SYNDROME_SEL, slot);
+	if (ret)
+		return ret;
+
+	/* Read the Syndrome error */
+	ret = regmap_read(errmon->regmap, DIMM_SYNDROME_ERR, &data);
+	if (ret || !data)
+		return ret;
+
+	return sysfs_emit(buf, "%04x\n", data);
+}
+
+#define EVENT_DIMM_SYNDROME(_slot) \
+	static ssize_t event_dimm##_slot##_syndrome_show(struct device *dev,          \
+							 struct device_attribute *da, \
+							 char *buf)                   \
+	{                                                                             \
+		return smpro_dimm_syndrome_read(dev, da, buf, _slot);                 \
+	}                                                                             \
+	static DEVICE_ATTR_RO(event_dimm##_slot##_syndrome)
+
+EVENT_DIMM_SYNDROME(0);
+EVENT_DIMM_SYNDROME(1);
+EVENT_DIMM_SYNDROME(2);
+EVENT_DIMM_SYNDROME(3);
+EVENT_DIMM_SYNDROME(4);
+EVENT_DIMM_SYNDROME(5);
+EVENT_DIMM_SYNDROME(6);
+EVENT_DIMM_SYNDROME(7);
+EVENT_DIMM_SYNDROME(8);
+EVENT_DIMM_SYNDROME(9);
+EVENT_DIMM_SYNDROME(10);
+EVENT_DIMM_SYNDROME(11);
+EVENT_DIMM_SYNDROME(12);
+EVENT_DIMM_SYNDROME(13);
+EVENT_DIMM_SYNDROME(14);
+EVENT_DIMM_SYNDROME(15);
+
 static struct attribute *smpro_errmon_attrs[] = {
 	&dev_attr_overflow_core_ce.attr,
 	&dev_attr_overflow_core_ue.attr,
@@ -493,6 +554,22 @@ static struct attribute *smpro_errmon_attrs[] = {
 	&dev_attr_event_vrd_hot.attr,
 	&dev_attr_event_dimm_hot.attr,
 	&dev_attr_event_dimm_2x_refresh.attr,
+	&dev_attr_event_dimm0_syndrome.attr,
+	&dev_attr_event_dimm1_syndrome.attr,
+	&dev_attr_event_dimm2_syndrome.attr,
+	&dev_attr_event_dimm3_syndrome.attr,
+	&dev_attr_event_dimm4_syndrome.attr,
+	&dev_attr_event_dimm5_syndrome.attr,
+	&dev_attr_event_dimm6_syndrome.attr,
+	&dev_attr_event_dimm7_syndrome.attr,
+	&dev_attr_event_dimm8_syndrome.attr,
+	&dev_attr_event_dimm9_syndrome.attr,
+	&dev_attr_event_dimm10_syndrome.attr,
+	&dev_attr_event_dimm11_syndrome.attr,
+	&dev_attr_event_dimm12_syndrome.attr,
+	&dev_attr_event_dimm13_syndrome.attr,
+	&dev_attr_event_dimm14_syndrome.attr,
+	&dev_attr_event_dimm15_syndrome.attr,
 	NULL
 };
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] misc: smpro-errmon: Add dimm training failure syndrome
  2023-02-14  6:45 ` [PATCH 2/2] misc: smpro-errmon: Add dimm training failure syndrome Quan Nguyen
@ 2023-02-15  7:33   ` Paul Menzel
  2023-02-16  3:22     ` Quan Nguyen
  0 siblings, 1 reply; 7+ messages in thread
From: Paul Menzel @ 2023-02-15  7:33 UTC (permalink / raw)
  To: Quan Nguyen
  Cc: Arnd Bergmann, Andrew Jeffery, Greg Kroah-Hartman, openbmc,
	Thang Nguyen, linux-kernel, Phong Vo, Joel Stanley,
	Open Source Submission

Dear Quan,


Thank you for your patch.

Am 14.02.23 um 07:45 schrieb Quan Nguyen:
> Adds event_dimm[0-15]_syndrome sysfs to report the failure syndrome
> to BMC when DIMM training failed.

Where you able to verify that it works? Out of curiosity, how?

> Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
> ---
>   .../sysfs-bus-platform-devices-ampere-smpro   | 10 +++
>   drivers/misc/smpro-errmon.c                   | 77 +++++++++++++++++++
>   2 files changed, 87 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
> index d4e3f308c451..c35f1d45e656 100644
> --- a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
> +++ b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
> @@ -265,6 +265,16 @@ Description:
>   		For more details, see section `5.7 GPI Status Registers and 5.9 Memory Error Register Definitions,
>   		Altra Family Soc BMC Interface Specification`.
>   
> +What:		/sys/bus/platform/devices/smpro-errmon.*/event_dimm[0-15]_syndrome
> +KernelVersion:	6.1

Should it be 6.2, as it probably won’t make it into 6.1?

> +Contact:	Quan Nguyen <quan@os.amperecomputing.com>
> +Description:
> +		(RO) The sysfs returns the 2-byte DIMM failure syndrome data for slot
> +		0-15 if it failed to initialized.

to initialize

> +
> +		For more details, see section `5.11 Boot Stage Register Definitions,
> +		Altra Family Soc BMC Interface Specification`.
> +
>   What:		/sys/bus/platform/devices/smpro-misc.*/boot_progress
>   KernelVersion:	6.1
>   Contact:	Quan Nguyen <quan@os.amperecomputing.com>
> diff --git a/drivers/misc/smpro-errmon.c b/drivers/misc/smpro-errmon.c
> index 1635e881aefb..3e8570cbb740 100644
> --- a/drivers/misc/smpro-errmon.c
> +++ b/drivers/misc/smpro-errmon.c
> @@ -47,6 +47,12 @@
>   #define WARN_PMPRO_INFO_LO	0xAC
>   #define WARN_PMPRO_INFO_HI	0xAD
>   
> +/* Boot Stage Register */
> +#define BOOTSTAGE		0xB0
> +#define DIMM_SYNDROME_SEL	0xB4
> +#define DIMM_SYNDROME_ERR	0xB5
> +#define DIMM_SYNDROME_STAGE	4
> +
>   /* PCIE Error Registers */
>   #define PCIE_CE_ERR_CNT		0xC0
>   #define PCIE_CE_ERR_LEN		0xC1
> @@ -468,6 +474,61 @@ EVENT_RO(vrd_hot, VRD_HOT_EVENT);
>   EVENT_RO(dimm_hot, DIMM_HOT_EVENT);
>   EVENT_RO(dimm_2x_refresh, DIMM_2X_REFRESH_EVENT);
>   
> +static ssize_t smpro_dimm_syndrome_read(struct device *dev, struct device_attribute *da,
> +					char *buf, int slot)

Could `slot` be passed as `unsigned int`?

> +{
> +	struct smpro_errmon *errmon = dev_get_drvdata(dev);
> +	s32 data;
> +	int ret;
> +
> +	ret = regmap_read(errmon->regmap, BOOTSTAGE, &data);

The function signature is:

     int regmap_read(struct regmap *map, unsigned int reg, unsigned int 
*val)

So why not use unsigned int as data type for `data`?

> +	if (ret)
> +		return ret;
> +
> +	/* check for valid stage */
> +	data = (data >> 8) & 0xff;
> +	if (data != DIMM_SYNDROME_STAGE)
> +		return ret;

Isn’t now success returned? Should a debug message be printed?

> +
> +	/* Write the slot ID to retrieve Error Syndrome */
> +	ret = regmap_write(errmon->regmap, DIMM_SYNDROME_SEL, slot);
> +	if (ret)
> +		return ret;
> +
> +	/* Read the Syndrome error */
> +	ret = regmap_read(errmon->regmap, DIMM_SYNDROME_ERR, &data);
> +	if (ret || !data)
> +		return ret;
> +
> +	return sysfs_emit(buf, "%04x\n", data);
> +}
> +
> +#define EVENT_DIMM_SYNDROME(_slot) \
> +	static ssize_t event_dimm##_slot##_syndrome_show(struct device *dev,          \
> +							 struct device_attribute *da, \
> +							 char *buf)                   \
> +	{                                                                             \
> +		return smpro_dimm_syndrome_read(dev, da, buf, _slot);                 \
> +	}                                                                             \
> +	static DEVICE_ATTR_RO(event_dimm##_slot##_syndrome)
> +
> +EVENT_DIMM_SYNDROME(0);
> +EVENT_DIMM_SYNDROME(1);
> +EVENT_DIMM_SYNDROME(2);
> +EVENT_DIMM_SYNDROME(3);
> +EVENT_DIMM_SYNDROME(4);
> +EVENT_DIMM_SYNDROME(5);
> +EVENT_DIMM_SYNDROME(6);
> +EVENT_DIMM_SYNDROME(7);
> +EVENT_DIMM_SYNDROME(8);
> +EVENT_DIMM_SYNDROME(9);
> +EVENT_DIMM_SYNDROME(10);
> +EVENT_DIMM_SYNDROME(11);
> +EVENT_DIMM_SYNDROME(12);
> +EVENT_DIMM_SYNDROME(13);
> +EVENT_DIMM_SYNDROME(14);
> +EVENT_DIMM_SYNDROME(15);
> +
>   static struct attribute *smpro_errmon_attrs[] = {
>   	&dev_attr_overflow_core_ce.attr,
>   	&dev_attr_overflow_core_ue.attr,
> @@ -493,6 +554,22 @@ static struct attribute *smpro_errmon_attrs[] = {
>   	&dev_attr_event_vrd_hot.attr,
>   	&dev_attr_event_dimm_hot.attr,
>   	&dev_attr_event_dimm_2x_refresh.attr,
> +	&dev_attr_event_dimm0_syndrome.attr,
> +	&dev_attr_event_dimm1_syndrome.attr,
> +	&dev_attr_event_dimm2_syndrome.attr,
> +	&dev_attr_event_dimm3_syndrome.attr,
> +	&dev_attr_event_dimm4_syndrome.attr,
> +	&dev_attr_event_dimm5_syndrome.attr,
> +	&dev_attr_event_dimm6_syndrome.attr,
> +	&dev_attr_event_dimm7_syndrome.attr,
> +	&dev_attr_event_dimm8_syndrome.attr,
> +	&dev_attr_event_dimm9_syndrome.attr,
> +	&dev_attr_event_dimm10_syndrome.attr,
> +	&dev_attr_event_dimm11_syndrome.attr,
> +	&dev_attr_event_dimm12_syndrome.attr,
> +	&dev_attr_event_dimm13_syndrome.attr,
> +	&dev_attr_event_dimm14_syndrome.attr,
> +	&dev_attr_event_dimm15_syndrome.attr,
>   	NULL
>   };


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] misc: smpro-errmon: Add dimm training failure syndrome
  2023-02-15  7:33   ` Paul Menzel
@ 2023-02-16  3:22     ` Quan Nguyen
  2023-02-16 12:30       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 7+ messages in thread
From: Quan Nguyen @ 2023-02-16  3:22 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Arnd Bergmann, Andrew Jeffery, Greg Kroah-Hartman, openbmc,
	Thang Nguyen, linux-kernel, Phong Vo, Joel Stanley,
	Open Source Submission



On 15/02/2023 14:33, Paul Menzel wrote:
> Dear Quan,
> 
> 
> Thank you for your patch.
> 

Thanks Paul for the review.

> Am 14.02.23 um 07:45 schrieb Quan Nguyen:
>> Adds event_dimm[0-15]_syndrome sysfs to report the failure syndrome
>> to BMC when DIMM training failed.
> 
> Where you able to verify that it works? Out of curiosity, how?
> 

Yes, we verified it by injecting DIMM errors and confirm that errors was 
reported correctly via sysfs.
For about how to do error injection, we may  need to refer to section 
3.2 Memory Error Group in Altra Family RAS Error Injection User Manual. 
It is shared in our Ampere Customer Connect [1]. The latest version is 
v1.00_20220329.

[1] https://connect.amperecomputing.com

>> Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
>> ---
>>   .../sysfs-bus-platform-devices-ampere-smpro   | 10 +++
>>   drivers/misc/smpro-errmon.c                   | 77 +++++++++++++++++++
>>   2 files changed, 87 insertions(+)
>>
>> diff --git 
>> a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro 
>> b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
>> index d4e3f308c451..c35f1d45e656 100644
>> --- a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
>> +++ b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
>> @@ -265,6 +265,16 @@ Description:
>>           For more details, see section `5.7 GPI Status Registers and 
>> 5.9 Memory Error Register Definitions,
>>           Altra Family Soc BMC Interface Specification`.
>> +What:        
>> /sys/bus/platform/devices/smpro-errmon.*/event_dimm[0-15]_syndrome
>> +KernelVersion:    6.1
> 
> Should it be 6.2, as it probably won’t make it into 6.1?
> 

Thanks for the catch. Will fix in next version.

>> +Contact:    Quan Nguyen <quan@os.amperecomputing.com>
>> +Description:
>> +        (RO) The sysfs returns the 2-byte DIMM failure syndrome data 
>> for slot
>> +        0-15 if it failed to initialized.
> 
> to initialize
> 

Will fix in next version.

>> +
>> +        For more details, see section `5.11 Boot Stage Register 
>> Definitions,
>> +        Altra Family Soc BMC Interface Specification`.
>> +
>>   What:        /sys/bus/platform/devices/smpro-misc.*/boot_progress
>>   KernelVersion:    6.1
>>   Contact:    Quan Nguyen <quan@os.amperecomputing.com>
>> diff --git a/drivers/misc/smpro-errmon.c b/drivers/misc/smpro-errmon.c
>> index 1635e881aefb..3e8570cbb740 100644
>> --- a/drivers/misc/smpro-errmon.c
>> +++ b/drivers/misc/smpro-errmon.c
>> @@ -47,6 +47,12 @@
>>   #define WARN_PMPRO_INFO_LO    0xAC
>>   #define WARN_PMPRO_INFO_HI    0xAD
>> +/* Boot Stage Register */
>> +#define BOOTSTAGE        0xB0
>> +#define DIMM_SYNDROME_SEL    0xB4
>> +#define DIMM_SYNDROME_ERR    0xB5
>> +#define DIMM_SYNDROME_STAGE    4
>> +
>>   /* PCIE Error Registers */
>>   #define PCIE_CE_ERR_CNT        0xC0
>>   #define PCIE_CE_ERR_LEN        0xC1
>> @@ -468,6 +474,61 @@ EVENT_RO(vrd_hot, VRD_HOT_EVENT);
>>   EVENT_RO(dimm_hot, DIMM_HOT_EVENT);
>>   EVENT_RO(dimm_2x_refresh, DIMM_2X_REFRESH_EVENT);
>> +static ssize_t smpro_dimm_syndrome_read(struct device *dev, struct 
>> device_attribute *da,
>> +                    char *buf, int slot)
> 
> Could `slot` be passed as `unsigned int`?
> 

Yes, I will update in next version.
Indeed, the "slot" will be passed as the last param to regmap_write() 
which is in unsigned int as below.

int regmap_write(struct regmap *map, unsigned int reg, unsigned int val);

So, change to "unsigned int" is corrected.

>> +{
>> +    struct smpro_errmon *errmon = dev_get_drvdata(dev);
>> +    s32 data;
>> +    int ret;
>> +
>> +    ret = regmap_read(errmon->regmap, BOOTSTAGE, &data);
> 
> The function signature is:
> 
>      int regmap_read(struct regmap *map, unsigned int reg, unsigned int 
> *val)
> 
> So why not use unsigned int as data type for `data`?
> 

Agree, you comment is correct. Will change data to "unsigned int" in new 
version.

>> +    if (ret)
>> +        return ret;
>> +
>> +    /* check for valid stage */
>> +    data = (data >> 8) & 0xff;
>> +    if (data != DIMM_SYNDROME_STAGE)
>> +        return ret;
> 
> Isn’t now success returned? Should a debug message be printed?
> 

Yes, this is a success case, there is no dimm training error and has 
nothing to return in sysfs.
If no dimm training error, host will never stay in DIMM_SYNDROME_STAGE.

I dont think printing a debug message in this case is good because 
printing out something like "No dimm traning error" is not necessary as 
I think.

>> +
>> +    /* Write the slot ID to retrieve Error Syndrome */
>> +    ret = regmap_write(errmon->regmap, DIMM_SYNDROME_SEL, slot);
>> +    if (ret)
>> +        return ret;
>> +
>> +    /* Read the Syndrome error */
>> +    ret = regmap_read(errmon->regmap, DIMM_SYNDROME_ERR, &data);
>> +    if (ret || !data)
>> +        return ret;
>> +
>> +    return sysfs_emit(buf, "%04x\n", data);
>> +}
>> +
>> +#define EVENT_DIMM_SYNDROME(_slot) \
>> +    static ssize_t event_dimm##_slot##_syndrome_show(struct device 
>> *dev,          \
>> +                             struct device_attribute *da, \
>> +                             char *buf)                   \
>> +    
>> {                                                                             \
>> +        return smpro_dimm_syndrome_read(dev, da, buf, 
>> _slot);                 \
>> +    
>> }                                                                             \
>> +    static DEVICE_ATTR_RO(event_dimm##_slot##_syndrome)
>> +
>> +EVENT_DIMM_SYNDROME(0);
>> +EVENT_DIMM_SYNDROME(1);
>> +EVENT_DIMM_SYNDROME(2);
>> +EVENT_DIMM_SYNDROME(3);
>> +EVENT_DIMM_SYNDROME(4);
>> +EVENT_DIMM_SYNDROME(5);
>> +EVENT_DIMM_SYNDROME(6);
>> +EVENT_DIMM_SYNDROME(7);
>> +EVENT_DIMM_SYNDROME(8);
>> +EVENT_DIMM_SYNDROME(9);
>> +EVENT_DIMM_SYNDROME(10);
>> +EVENT_DIMM_SYNDROME(11);
>> +EVENT_DIMM_SYNDROME(12);
>> +EVENT_DIMM_SYNDROME(13);
>> +EVENT_DIMM_SYNDROME(14);
>> +EVENT_DIMM_SYNDROME(15);
>> +
>>   static struct attribute *smpro_errmon_attrs[] = {
>>       &dev_attr_overflow_core_ce.attr,
>>       &dev_attr_overflow_core_ue.attr,
>> @@ -493,6 +554,22 @@ static struct attribute *smpro_errmon_attrs[] = {
>>       &dev_attr_event_vrd_hot.attr,
>>       &dev_attr_event_dimm_hot.attr,
>>       &dev_attr_event_dimm_2x_refresh.attr,
>> +    &dev_attr_event_dimm0_syndrome.attr,
>> +    &dev_attr_event_dimm1_syndrome.attr,
>> +    &dev_attr_event_dimm2_syndrome.attr,
>> +    &dev_attr_event_dimm3_syndrome.attr,
>> +    &dev_attr_event_dimm4_syndrome.attr,
>> +    &dev_attr_event_dimm5_syndrome.attr,
>> +    &dev_attr_event_dimm6_syndrome.attr,
>> +    &dev_attr_event_dimm7_syndrome.attr,
>> +    &dev_attr_event_dimm8_syndrome.attr,
>> +    &dev_attr_event_dimm9_syndrome.attr,
>> +    &dev_attr_event_dimm10_syndrome.attr,
>> +    &dev_attr_event_dimm11_syndrome.attr,
>> +    &dev_attr_event_dimm12_syndrome.attr,
>> +    &dev_attr_event_dimm13_syndrome.attr,
>> +    &dev_attr_event_dimm14_syndrome.attr,
>> +    &dev_attr_event_dimm15_syndrome.attr,
>>       NULL
>>   };
> 
> 
> Kind regards,
> 
> Paul

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] misc: smpro-errmon: Add dimm training failure syndrome
  2023-02-16  3:22     ` Quan Nguyen
@ 2023-02-16 12:30       ` Greg Kroah-Hartman
  2023-02-17  2:36         ` Quan Nguyen
  0 siblings, 1 reply; 7+ messages in thread
From: Greg Kroah-Hartman @ 2023-02-16 12:30 UTC (permalink / raw)
  To: Quan Nguyen
  Cc: Paul Menzel, Arnd Bergmann, Andrew Jeffery, openbmc,
	Thang Nguyen, linux-kernel, Phong Vo, Joel Stanley,
	Open Source Submission

On Thu, Feb 16, 2023 at 10:22:14AM +0700, Quan Nguyen wrote:
> 
> 
> On 15/02/2023 14:33, Paul Menzel wrote:
> > Dear Quan,
> > 
> > 
> > Thank you for your patch.
> > 
> 
> Thanks Paul for the review.
> 
> > Am 14.02.23 um 07:45 schrieb Quan Nguyen:
> > > Adds event_dimm[0-15]_syndrome sysfs to report the failure syndrome
> > > to BMC when DIMM training failed.
> > 
> > Where you able to verify that it works? Out of curiosity, how?
> > 
> 
> Yes, we verified it by injecting DIMM errors and confirm that errors was
> reported correctly via sysfs.
> For about how to do error injection, we may  need to refer to section 3.2
> Memory Error Group in Altra Family RAS Error Injection User Manual. It is
> shared in our Ampere Customer Connect [1]. The latest version is
> v1.00_20220329.
> 
> [1] https://connect.amperecomputing.com
> 
> > > Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
> > > ---
> > >   .../sysfs-bus-platform-devices-ampere-smpro   | 10 +++
> > >   drivers/misc/smpro-errmon.c                   | 77 +++++++++++++++++++
> > >   2 files changed, 87 insertions(+)
> > > 
> > > diff --git
> > > a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
> > > b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
> > > index d4e3f308c451..c35f1d45e656 100644
> > > --- a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
> > > +++ b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
> > > @@ -265,6 +265,16 @@ Description:
> > >           For more details, see section `5.7 GPI Status Registers
> > > and 5.9 Memory Error Register Definitions,
> > >           Altra Family Soc BMC Interface Specification`.
> > > +What:
> > > /sys/bus/platform/devices/smpro-errmon.*/event_dimm[0-15]_syndrome
> > > +KernelVersion:    6.1
> > 
> > Should it be 6.2, as it probably won’t make it into 6.1?
> > 
> 
> Thanks for the catch. Will fix in next version.

Should be 6.3, it's missed the 6.2 merge window cycle, sorry.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] misc: smpro-errmon: Add dimm training failure syndrome
  2023-02-16 12:30       ` Greg Kroah-Hartman
@ 2023-02-17  2:36         ` Quan Nguyen
  0 siblings, 0 replies; 7+ messages in thread
From: Quan Nguyen @ 2023-02-17  2:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Paul Menzel, Arnd Bergmann, Andrew Jeffery, openbmc,
	Thang Nguyen, linux-kernel, Phong Vo, Joel Stanley,
	Open Source Submission



On 16/02/2023 19:30, Greg Kroah-Hartman wrote:
> On Thu, Feb 16, 2023 at 10:22:14AM +0700, Quan Nguyen wrote:
>>
>>
>> On 15/02/2023 14:33, Paul Menzel wrote:
>>> Dear Quan,
>>>
>>>
>>> Thank you for your patch.
>>>
>>
>> Thanks Paul for the review.
>>
>>> Am 14.02.23 um 07:45 schrieb Quan Nguyen:
>>>> Adds event_dimm[0-15]_syndrome sysfs to report the failure syndrome
>>>> to BMC when DIMM training failed.
>>>
>>> Where you able to verify that it works? Out of curiosity, how?
>>>
>>
>> Yes, we verified it by injecting DIMM errors and confirm that errors was
>> reported correctly via sysfs.
>> For about how to do error injection, we may  need to refer to section 3.2
>> Memory Error Group in Altra Family RAS Error Injection User Manual. It is
>> shared in our Ampere Customer Connect [1]. The latest version is
>> v1.00_20220329.
>>
>> [1] https://connect.amperecomputing.com
>>
>>>> Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
>>>> ---
>>>>    .../sysfs-bus-platform-devices-ampere-smpro   | 10 +++
>>>>    drivers/misc/smpro-errmon.c                   | 77 +++++++++++++++++++
>>>>    2 files changed, 87 insertions(+)
>>>>
>>>> diff --git
>>>> a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
>>>> b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
>>>> index d4e3f308c451..c35f1d45e656 100644
>>>> --- a/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
>>>> +++ b/Documentation/ABI/testing/sysfs-bus-platform-devices-ampere-smpro
>>>> @@ -265,6 +265,16 @@ Description:
>>>>            For more details, see section `5.7 GPI Status Registers
>>>> and 5.9 Memory Error Register Definitions,
>>>>            Altra Family Soc BMC Interface Specification`.
>>>> +What:
>>>> /sys/bus/platform/devices/smpro-errmon.*/event_dimm[0-15]_syndrome
>>>> +KernelVersion:    6.1
>>>
>>> Should it be 6.2, as it probably won’t make it into 6.1?
>>>
>>
>> Thanks for the catch. Will fix in next version.
> 
> Should be 6.3, it's missed the 6.2 merge window cycle, sorry.
> 
> thanks,
> 

Thanks Greg,
Will update to 6.3



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-02-17  2:37 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-14  6:45 [PATCH 0/2] Add DIMM 2x refresh event and failure syndrome Quan Nguyen
2023-02-14  6:45 ` [PATCH 1/2] misc: smpro-errmon: Add DIMM 2x Refresh rate event Quan Nguyen
2023-02-14  6:45 ` [PATCH 2/2] misc: smpro-errmon: Add dimm training failure syndrome Quan Nguyen
2023-02-15  7:33   ` Paul Menzel
2023-02-16  3:22     ` Quan Nguyen
2023-02-16 12:30       ` Greg Kroah-Hartman
2023-02-17  2:36         ` Quan Nguyen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).