Linux-EDAC Archive on lore.kernel.org
* [PATCH 1/1] edac: fsl_ddr_edac: fix expected data message
@ 2020-07-24 11:18 Gregor Herburger
  2020-08-17  9:53 ` Borislav Petkov
  0 siblings, 1 reply; 4+ messages in thread
From: Gregor Herburger @ 2020-07-24 11:18 UTC
  To: york.sun, bp, mchehab, tony.luck, james.morse, rrichter
  Cc: linux-edac, Gregor Herburger

In some cases a wrong 'Expected Data' is calculated and reported.
When comparing Expected/Captured Data this looks like dual bit errors when
only a single bit error occurred.

On my aarch64 machine it prints something similar to this:
[  311.103794] EDAC FSL_DDR MC0: Faulty Data bit: 36
[  311.108490] EDAC FSL_DDR MC0: Expected Data / ECC:   0xffffffef_ffffffff / 0x80000059
[  311.116135] EDAC FSL_DDR MC0: Captured Data / ECC:   0xffffffff_ffffffef / 0x59

Fix this by shifting only the register where the error occurred.

Signed-off-by: Gregor Herburger <gregor.herburger@ew.tq-group.com>
---
 drivers/edac/fsl_ddr_edac.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/edac/fsl_ddr_edac.c b/drivers/edac/fsl_ddr_edac.c
index 6d8ea226010d..4b6989cf1947 100644
--- a/drivers/edac/fsl_ddr_edac.c
+++ b/drivers/edac/fsl_ddr_edac.c
@@ -343,9 +343,9 @@ static void fsl_mc_check(struct mem_ctl_info *mci)
 
 		fsl_mc_printk(mci, KERN_ERR,
 			"Expected Data / ECC:\t%#8.8x_%08x / %#2.2x\n",
-			cap_high ^ (1 << (bad_data_bit - 32)),
-			cap_low ^ (1 << bad_data_bit),
-			syndrome ^ (1 << bad_ecc_bit));
+			(bad_data_bit > 31) ? cap_high ^ (1 << (bad_data_bit - 32)) : cap_high,
+			(bad_data_bit <= 31) ? cap_low ^ (1 << (bad_data_bit)) : cap_low,
+			(bad_ecc_bit != -1) ? syndrome ^ (1 << (bad_ecc_bit)) : syndrome);
 	}
 
 	fsl_mc_printk(mci, KERN_ERR,
-- 
2.17.1



* Re: [PATCH 1/1] edac: fsl_ddr_edac: fix expected data message
  2020-07-24 11:18 [PATCH 1/1] edac: fsl_ddr_edac: fix expected data message Gregor Herburger
@ 2020-08-17  9:53 ` Borislav Petkov
  2020-08-27  7:56   ` [PATCH v2 " Gregor Herburger
  0 siblings, 1 reply; 4+ messages in thread
From: Borislav Petkov @ 2020-08-17  9:53 UTC
  To: Gregor Herburger
  Cc: york.sun, mchehab, tony.luck, james.morse, rrichter, linux-edac

On Fri, Jul 24, 2020 at 01:18:46PM +0200, Gregor Herburger wrote:
> In some cases a wrong 'Expected Data' is calculated and reported.

In some cases? Which cases?

You need to expand that sentence with more details as to what the
problem is because I'm not getting any smarter from it.

> When comparing Expected/Captured Data this looks like dual bit errors when
> only a single bit error occurred.
> 
> On my aarch64 machine it prints something similar to this:
> [  311.103794] EDAC FSL_DDR MC0: Faulty Data bit: 36
> [  311.108490] EDAC FSL_DDR MC0: Expected Data / ECC:   0xffffffef_ffffffff / 0x80000059
> [  311.116135] EDAC FSL_DDR MC0: Captured Data / ECC:   0xffffffff_ffffffef / 0x59

Is that output before or after your change?

0xffffffef is with bit 4 XORed and cap_high was -1 before, cap_low is -1
too. The expected data syndrome has bit 31 set?!

Yeah, I'm confused. Please explain the issue in greater detail, try
structuring it this way:

Problem is A.

It happens because of B.

Fix it by doing C.

(Potentially do D).

For more detailed info, see
Documentation/process/submitting-patches.rst, Section "2) Describe your
changes".

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* [PATCH v2 1/1] edac: fsl_ddr_edac: fix expected data message
  2020-08-17  9:53 ` Borislav Petkov
@ 2020-08-27  7:56   ` Gregor Herburger
  2020-09-03 10:58     ` Borislav Petkov
  0 siblings, 1 reply; 4+ messages in thread
From: Gregor Herburger @ 2020-08-27  7:56 UTC
  To: york.sun, bp, mchehab, tony.luck, james.morse, rrichter
  Cc: linux-edac, linux-kernel, Gregor Herburger

When a correctable single-bit error occurs, the driver calculates
bad_data_bit and bad_ecc_bit. If the error is not in the corresponding
field, that value is -1. The expected data message is then calculated
from these values.

In the case of an error in the lower 32 bits or no error (-1) the right
side operand of the bit-shift becomes negative which is undefined
behavior.

This can result in wrong and misleading messages like this:
[  311.103794] EDAC FSL_DDR MC0: Faulty Data bit: 36
[  311.108490] EDAC FSL_DDR MC0: Expected Data / ECC:   0xffffffef_ffffffff / 0x80000059
[  311.116135] EDAC FSL_DDR MC0: Captured Data / ECC:   0xffffffff_ffffffef / 0x59

Fix this by only calculating the expected data where the error occurred.

With the fix the dmesg output looks like this:
[  311.103794] EDAC FSL_DDR MC0: Faulty Data bit: 36
[  311.108490] EDAC FSL_DDR MC0: Expected Data / ECC:   0xffffffef_ffffffef / 0x59
[  311.116135] EDAC FSL_DDR MC0: Captured Data / ECC:   0xffffffff_ffffffef / 0x59

Signed-off-by: Gregor Herburger <gregor.herburger@ew.tq-group.com>
---
 drivers/edac/fsl_ddr_edac.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/edac/fsl_ddr_edac.c b/drivers/edac/fsl_ddr_edac.c
index 6d8ea226010d..4b6989cf1947 100644
--- a/drivers/edac/fsl_ddr_edac.c
+++ b/drivers/edac/fsl_ddr_edac.c
@@ -343,9 +343,9 @@ static void fsl_mc_check(struct mem_ctl_info *mci)
 
 		fsl_mc_printk(mci, KERN_ERR,
 			"Expected Data / ECC:\t%#8.8x_%08x / %#2.2x\n",
-			cap_high ^ (1 << (bad_data_bit - 32)),
-			cap_low ^ (1 << bad_data_bit),
-			syndrome ^ (1 << bad_ecc_bit));
+			(bad_data_bit > 31) ? cap_high ^ (1 << (bad_data_bit - 32)) : cap_high,
+			(bad_data_bit <= 31) ? cap_low ^ (1 << (bad_data_bit)) : cap_low,
+			(bad_ecc_bit != -1) ? syndrome ^ (1 << (bad_ecc_bit)) : syndrome);
 	}
 
 	fsl_mc_printk(mci, KERN_ERR,
-- 
2.17.1



* Re: [PATCH v2 1/1] edac: fsl_ddr_edac: fix expected data message
  2020-08-27  7:56   ` [PATCH v2 " Gregor Herburger
@ 2020-09-03 10:58     ` Borislav Petkov
  0 siblings, 0 replies; 4+ messages in thread
From: Borislav Petkov @ 2020-09-03 10:58 UTC
  To: Gregor Herburger
  Cc: york.sun, mchehab, tony.luck, james.morse, rrichter, linux-edac,
	linux-kernel

On Thu, Aug 27, 2020 at 09:56:00AM +0200, Gregor Herburger wrote:
> When a correctable single-bit error occurs, the driver calculates
> bad_data_bit and bad_ecc_bit. If the error is not in the corresponding
> field, that value is -1. The expected data message is then calculated
> from these values.
> 
> In the case of an error in the lower 32 bits or no error (-1) the right
> side operand of the bit-shift becomes negative which is undefined
> behavior.
> 
> This can result in wrong and misleading messages like this:
> [  311.103794] EDAC FSL_DDR MC0: Faulty Data bit: 36
> [  311.108490] EDAC FSL_DDR MC0: Expected Data / ECC:   0xffffffef_ffffffff / 0x80000059
> [  311.116135] EDAC FSL_DDR MC0: Captured Data / ECC:   0xffffffff_ffffffef / 0x59
> 
> Fix this by only calculating the expected data where the error occurred.
> 
> With the fix the dmesg output looks like this:
> [  311.103794] EDAC FSL_DDR MC0: Faulty Data bit: 36
> [  311.108490] EDAC FSL_DDR MC0: Expected Data / ECC:   0xffffffef_ffffffef / 0x59
> [  311.116135] EDAC FSL_DDR MC0: Captured Data / ECC:   0xffffffff_ffffffef / 0x59
> 
> Signed-off-by: Gregor Herburger <gregor.herburger@ew.tq-group.com>
> ---
>  drivers/edac/fsl_ddr_edac.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/edac/fsl_ddr_edac.c b/drivers/edac/fsl_ddr_edac.c
> index 6d8ea226010d..4b6989cf1947 100644
> --- a/drivers/edac/fsl_ddr_edac.c
> +++ b/drivers/edac/fsl_ddr_edac.c
> @@ -343,9 +343,9 @@ static void fsl_mc_check(struct mem_ctl_info *mci)
>  
>  		fsl_mc_printk(mci, KERN_ERR,
>  			"Expected Data / ECC:\t%#8.8x_%08x / %#2.2x\n",
> -			cap_high ^ (1 << (bad_data_bit - 32)),
> -			cap_low ^ (1 << bad_data_bit),
> -			syndrome ^ (1 << bad_ecc_bit));
> +			(bad_data_bit > 31) ? cap_high ^ (1 << (bad_data_bit - 32)) : cap_high,
> +			(bad_data_bit <= 31) ? cap_low ^ (1 << (bad_data_bit)) : cap_low,

But if bad_data_bit is -1, this check above will hit and you'd still
shift by -1, IINM.
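
To make the point about the `<= 31` check concrete, here is a minimal
userspace sketch, not driver code. The helper names v2_low_guard(),
strict_low_guard() and expected_low() are hypothetical and exist only for
this illustration. It assumes, as described in the patch, that -1 means
"no faulty data bit found":

```c
#include <stdbool.h>
#include <stdint.h>

/* Guard as written in the v2 patch: true for -1 as well as for 0..31. */
static bool v2_low_guard(int bad_data_bit)
{
	return bad_data_bit <= 31;
}

/* Stricter guard: true only for a real bit position in the low word. */
static bool strict_low_guard(int bad_data_bit)
{
	return bad_data_bit >= 0 && bad_data_bit <= 31;
}

/*
 * Flip the faulty bit in the low word only when the index is valid.
 * An unsigned constant is shifted so that bit 31 is also well defined.
 */
static uint32_t expected_low(uint32_t cap_low, int bad_data_bit)
{
	if (strict_low_guard(bad_data_bit))
		return cap_low ^ (UINT32_C(1) << bad_data_bit);

	return cap_low;
}
```

With bad_data_bit == -1, v2_low_guard() is true, so the v2 expression
would still evaluate `1 << -1`, while expected_low() returns cap_low
unchanged.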

How about you fix it properly, clean it up and make it more readable in
the process (pasting the code directly instead of a diff because a diff
is less readable):

        if ((err_detect & DDR_EDE_SBE) && (bus_width == 64)) {
                sbe_ecc_decode(cap_high, cap_low, syndrome,
                                &bad_data_bit, &bad_ecc_bit);

                if (bad_data_bit != -1) {
                        if (bad_data_bit > 31)
                                cap_high ^= 1 << (bad_data_bit - 32);
                        else
                                cap_low  ^= 1 << bad_data_bit;

                        fsl_mc_printk(mci, KERN_ERR, "Faulty Data bit: %d\n", bad_data_bit);
                        fsl_mc_printk(mci, KERN_ERR, "Expected Data: %#8.8x_%08x\n",
                                      cap_high, cap_low);
                }

                if (bad_ecc_bit != -1) {
                        fsl_mc_printk(mci, KERN_ERR, "Faulty ECC bit: %d\n", bad_ecc_bit);
                        fsl_mc_printk(mci, KERN_ERR, "Expected ECC: %#2.2x\n",
                                      syndrome ^ (1 << bad_ecc_bit));
                }
        }

This way you print only when the respective faulty bits have been
properly found and not print anything otherwise.
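
The data-bit half of the logic above can be tried out in userspace with
a sketch like the following. It is an illustration under stated
assumptions, not the driver implementation: struct captured,
expected_data() and report() are hypothetical names, printf() stands in
for fsl_mc_printk(), and the bit index is assumed to be either -1 or in
the range 0..63.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the two captured 32-bit data words. */
struct captured {
	uint32_t high;
	uint32_t low;
};

/*
 * Compute the expected data by flipping the faulty bit.
 * Returns 1 and fills *out when bad_data_bit is a valid index (0..63),
 * returns 0 and leaves *out untouched otherwise (e.g. for -1).
 */
static int expected_data(struct captured cap, int bad_data_bit,
			 struct captured *out)
{
	if (bad_data_bit < 0 || bad_data_bit > 63)
		return 0;

	*out = cap;
	if (bad_data_bit > 31)
		out->high ^= UINT32_C(1) << (bad_data_bit - 32);
	else
		out->low ^= UINT32_C(1) << bad_data_bit;

	return 1;
}

/* Print only when a faulty data bit was actually identified. */
static void report(struct captured cap, int bad_data_bit)
{
	struct captured exp;

	if (!expected_data(cap, bad_data_bit, &exp))
		return;

	printf("Faulty Data bit: %d\n", bad_data_bit);
	printf("Expected Data: %#8.8x_%08x\n",
	       (unsigned int)exp.high, (unsigned int)exp.low);
}
```

For the values from the report in this thread (captured data
0xffffffff_ffffffef, faulty bit 36), expected_data() yields a high word
of 0xffffffef and leaves the low word at 0xffffffef.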

Hmm?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
