[PATCH 0/2] fsi and hwmon (occ): Prevent occasional checksum failures

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2] fsi and hwmon (occ): Prevent occasional checksum failures
@ 2022-03-21 15:31 Eddie James
  2022-03-21 15:31 ` [PATCH 1/2] fsi: occ: Fix checksum failure mode Eddie James
  2022-03-21 15:31 ` [PATCH 2/2] hwmon (occ): Retry for checksum failure Eddie James
  0 siblings, 2 replies; 6+ messages in thread
From: Eddie James @ 2022-03-21 15:31 UTC (permalink / raw)
  To: linux-fsi
  Cc: linux-kernel, linux-hwmon, jdelvare, linux, joel, jk, alistair,
	Eddie James

Due to the OCC communication design with a shared SRAM area,
checkum errors are expected due to corrupted buffer from OCC
communications with other system components. Therefore, use a
unique errno for checksum failures and retry the command twice
in that case.

Eddie James (2):
  fsi: occ: Fix checksum failure mode
  hwmon (occ): Retry for checksum failure

 drivers/fsi/fsi-occ.c      |  7 +++++--
 drivers/hwmon/occ/p9_sbe.c | 28 ++++++++++++++++++----------
 2 files changed, 23 insertions(+), 12 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] fsi: occ: Fix checksum failure mode
  2022-03-21 15:31 [PATCH 0/2] fsi and hwmon (occ): Prevent occasional checksum failures Eddie James
@ 2022-03-21 15:31 ` Eddie James
  2022-03-21 15:31 ` [PATCH 2/2] hwmon (occ): Retry for checksum failure Eddie James
  1 sibling, 0 replies; 6+ messages in thread
From: Eddie James @ 2022-03-21 15:31 UTC (permalink / raw)
  To: linux-fsi
  Cc: linux-kernel, linux-hwmon, jdelvare, linux, joel, jk, alistair,
	Eddie James

Change the checksum errno to something different than the errno
used for a bad SBE message. In addition, don't set the user's
response length to the data length in this case, since it's not
SBE FFDC.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
---
 drivers/fsi/fsi-occ.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/fsi/fsi-occ.c b/drivers/fsi/fsi-occ.c
index c9cc75fbdfb9..3d04e8baecbb 100644
--- a/drivers/fsi/fsi-occ.c
+++ b/drivers/fsi/fsi-occ.c
@@ -246,7 +246,7 @@ static int occ_verify_checksum(struct occ *occ, struct occ_response *resp,
 	if (checksum != checksum_resp) {
 		dev_err(occ->dev, "Bad checksum: %04x!=%04x\n", checksum,
 			checksum_resp);
-		return -EBADMSG;
+		return -EBADE;
 	}
 
 	return 0;
@@ -575,8 +575,11 @@ int fsi_occ_submit(struct device *dev, const void *request, size_t req_len,
 	dev_dbg(dev, "resp_status=%02x resp_data_len=%d\n",
 		resp->return_status, resp_data_length);
 
-	occ->client_response_size = resp_data_length + 7;
 	rc = occ_verify_checksum(occ, resp, resp_data_length);
+	if (rc)
+		goto done;
+
+	occ->client_response_size = resp_data_length + 7;
 
  done:
 	*resp_len = occ->client_response_size;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] hwmon (occ): Retry for checksum failure
  2022-03-21 15:31 [PATCH 0/2] fsi and hwmon (occ): Prevent occasional checksum failures Eddie James
  2022-03-21 15:31 ` [PATCH 1/2] fsi: occ: Fix checksum failure mode Eddie James
@ 2022-03-21 15:31 ` Eddie James
  2022-04-24 17:18   ` Guenter Roeck
  1 sibling, 1 reply; 6+ messages in thread
From: Eddie James @ 2022-03-21 15:31 UTC (permalink / raw)
  To: linux-fsi
  Cc: linux-kernel, linux-hwmon, jdelvare, linux, joel, jk, alistair,
	Eddie James

Due to the OCC communication design with a shared SRAM area,
checkum errors are expected due to corrupted buffer from OCC
communications with other system components. Therefore, retry
the command twice in the event of a checksum failure.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
---
 drivers/hwmon/occ/p9_sbe.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/hwmon/occ/p9_sbe.c b/drivers/hwmon/occ/p9_sbe.c
index 49b13cc01073..7f4c3f979c54 100644
--- a/drivers/hwmon/occ/p9_sbe.c
+++ b/drivers/hwmon/occ/p9_sbe.c
@@ -84,17 +84,25 @@ static int p9_sbe_occ_send_cmd(struct occ *occ, u8 *cmd, size_t len)
 	struct p9_sbe_occ *ctx = to_p9_sbe_occ(occ);
 	size_t resp_len = sizeof(*resp);
 	int rc;
-
-	rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
-	if (rc < 0) {
-		if (resp_len) {
-			if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len))
-				sysfs_notify(&occ->bus_dev->kobj, NULL,
-					     bin_attr_ffdc.attr.name);
+	int tries = 0;
+
+	do {
+		rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
+		if (rc < 0) {
+			if (resp_len) {
+				if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len))
+					sysfs_notify(&occ->bus_dev->kobj, NULL,
+						     bin_attr_ffdc.attr.name);
+
+				return rc;
+			} else if (rc != -EBADE) {
+				return rc;
+			}
+			/* retry twice for checksum failures */
+		} else {
+			break;
 		}
-
-		return rc;
-	}
+	} while (++tries < 3);
 
 	switch (resp->return_status) {
 	case OCC_RESP_CMD_IN_PRG:
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] hwmon (occ): Retry for checksum failure
  2022-03-21 15:31 ` [PATCH 2/2] hwmon (occ): Retry for checksum failure Eddie James
@ 2022-04-24 17:18   ` Guenter Roeck
  2022-04-25  9:10     ` David Laight
  0 siblings, 1 reply; 6+ messages in thread
From: Guenter Roeck @ 2022-04-24 17:18 UTC (permalink / raw)
  To: Eddie James
  Cc: linux-fsi, linux-kernel, linux-hwmon, jdelvare, joel, jk, alistair

On Mon, Mar 21, 2022 at 10:31:12AM -0500, Eddie James wrote:
> Due to the OCC communication design with a shared SRAM area,
> checkum errors are expected due to corrupted buffer from OCC
> communications with other system components. Therefore, retry
> the command twice in the event of a checksum failure.
> 
> Signed-off-by: Eddie James <eajames@linux.ibm.com>

I assume this will be applied together with patch 1 of the series.

Acked-by: Guenter Roeck <linux@roeck-us.net>

Guenter

> ---
>  drivers/hwmon/occ/p9_sbe.c | 28 ++++++++++++++++++----------
>  1 file changed, 18 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/hwmon/occ/p9_sbe.c b/drivers/hwmon/occ/p9_sbe.c
> index 49b13cc01073..7f4c3f979c54 100644
> --- a/drivers/hwmon/occ/p9_sbe.c
> +++ b/drivers/hwmon/occ/p9_sbe.c
> @@ -84,17 +84,25 @@ static int p9_sbe_occ_send_cmd(struct occ *occ, u8 *cmd, size_t len)
>  	struct p9_sbe_occ *ctx = to_p9_sbe_occ(occ);
>  	size_t resp_len = sizeof(*resp);
>  	int rc;
> -
> -	rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
> -	if (rc < 0) {
> -		if (resp_len) {
> -			if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len))
> -				sysfs_notify(&occ->bus_dev->kobj, NULL,
> -					     bin_attr_ffdc.attr.name);
> +	int tries = 0;
> +
> +	do {
> +		rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
> +		if (rc < 0) {
> +			if (resp_len) {
> +				if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len))
> +					sysfs_notify(&occ->bus_dev->kobj, NULL,
> +						     bin_attr_ffdc.attr.name);
> +
> +				return rc;
> +			} else if (rc != -EBADE) {
> +				return rc;
> +			}
> +			/* retry twice for checksum failures */
> +		} else {
> +			break;
>  		}
> -
> -		return rc;
> -	}
> +	} while (++tries < 3);
>  
>  	switch (resp->return_status) {
>  	case OCC_RESP_CMD_IN_PRG:

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH 2/2] hwmon (occ): Retry for checksum failure
  2022-04-24 17:18   ` Guenter Roeck
@ 2022-04-25  9:10     ` David Laight
  2022-04-26 15:46       ` Eddie James
  0 siblings, 1 reply; 6+ messages in thread
From: David Laight @ 2022-04-25  9:10 UTC (permalink / raw)
  To: 'Guenter Roeck', Eddie James
  Cc: linux-fsi, linux-kernel, linux-hwmon, jdelvare, joel, jk, alistair

From: Guenter Roeck
> Sent: 24 April 2022 18:18
> 
> On Mon, Mar 21, 2022 at 10:31:12AM -0500, Eddie James wrote:
> > Due to the OCC communication design with a shared SRAM area,
> > checkum errors are expected due to corrupted buffer from OCC
> > communications with other system components. Therefore, retry
> > the command twice in the event of a checksum failure.
> >
> > Signed-off-by: Eddie James <eajames@linux.ibm.com>
> 
> I assume this will be applied together with patch 1 of the series.
> 
> Acked-by: Guenter Roeck <linux@roeck-us.net>
> 
> Guenter
> 
> > ---
> >  drivers/hwmon/occ/p9_sbe.c | 28 ++++++++++++++++++----------
> >  1 file changed, 18 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/hwmon/occ/p9_sbe.c b/drivers/hwmon/occ/p9_sbe.c
> > index 49b13cc01073..7f4c3f979c54 100644
> > --- a/drivers/hwmon/occ/p9_sbe.c
> > +++ b/drivers/hwmon/occ/p9_sbe.c
> > @@ -84,17 +84,25 @@ static int p9_sbe_occ_send_cmd(struct occ *occ, u8 *cmd, size_t len)
> >  	struct p9_sbe_occ *ctx = to_p9_sbe_occ(occ);
> >  	size_t resp_len = sizeof(*resp);
> >  	int rc;
> > -
> > -	rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
> > -	if (rc < 0) {
> > -		if (resp_len) {
> > -			if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len))
> > -				sysfs_notify(&occ->bus_dev->kobj, NULL,
> > -					     bin_attr_ffdc.attr.name);
> > +	int tries = 0;
> > +
> > +	do {

Why not use a for() loop?

> > +		rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
> > +		if (rc < 0) {
> > +			if (resp_len) {
> > +				if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len))
> > +					sysfs_notify(&occ->bus_dev->kobj, NULL,
> > +						     bin_attr_ffdc.attr.name);
> > +
> > +				return rc;
> > +			} else if (rc != -EBADE) {
> > +				return rc;
> > +			}

No need for else after return.

> > +			/* retry twice for checksum failures */
> > +		} else {
> > +			break;

I'd break on the success path after testing (rc >= 0).
Saves a level of indent.

> >  		}
> > -
> > -		return rc;
> > -	}
> > +	} while (++tries < 3);
> >
> >  	switch (resp->return_status) {
> >  	case OCC_RESP_CMD_IN_PRG:

Probably end up with something like:
	for (tries = 0; tries < 3; tries++) {
		rc = ...;
		if (rc >= 0)
			break;
		if (resp_len) {
			...
			return rc;
		}
		if (rc != -EBADE)
			return rc;
	}

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] hwmon (occ): Retry for checksum failure
  2022-04-25  9:10     ` David Laight
@ 2022-04-26 15:46       ` Eddie James
  0 siblings, 0 replies; 6+ messages in thread
From: Eddie James @ 2022-04-26 15:46 UTC (permalink / raw)
  To: David Laight, 'Guenter Roeck'
  Cc: linux-fsi, linux-kernel, linux-hwmon, jdelvare, joel, jk, alistair


On 4/25/22 04:10, David Laight wrote:
> From: Guenter Roeck
>> Sent: 24 April 2022 18:18
>>
>> On Mon, Mar 21, 2022 at 10:31:12AM -0500, Eddie James wrote:
>>> Due to the OCC communication design with a shared SRAM area,
>>> checkum errors are expected due to corrupted buffer from OCC
>>> communications with other system components. Therefore, retry
>>> the command twice in the event of a checksum failure.
>>>
>>> Signed-off-by: Eddie James <eajames@linux.ibm.com>
>> I assume this will be applied together with patch 1 of the series.
>>
>> Acked-by: Guenter Roeck <linux@roeck-us.net>
>>
>> Guenter
>>
>>> ---
>>>   drivers/hwmon/occ/p9_sbe.c | 28 ++++++++++++++++++----------
>>>   1 file changed, 18 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/hwmon/occ/p9_sbe.c b/drivers/hwmon/occ/p9_sbe.c
>>> index 49b13cc01073..7f4c3f979c54 100644
>>> --- a/drivers/hwmon/occ/p9_sbe.c
>>> +++ b/drivers/hwmon/occ/p9_sbe.c
>>> @@ -84,17 +84,25 @@ static int p9_sbe_occ_send_cmd(struct occ *occ, u8 *cmd, size_t len)
>>>   	struct p9_sbe_occ *ctx = to_p9_sbe_occ(occ);
>>>   	size_t resp_len = sizeof(*resp);
>>>   	int rc;
>>> -
>>> -	rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
>>> -	if (rc < 0) {
>>> -		if (resp_len) {
>>> -			if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len))
>>> -				sysfs_notify(&occ->bus_dev->kobj, NULL,
>>> -					     bin_attr_ffdc.attr.name);
>>> +	int tries = 0;
>>> +
>>> +	do {
> Why not use a for() loop?
>
>>> +		rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
>>> +		if (rc < 0) {
>>> +			if (resp_len) {
>>> +				if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len))
>>> +					sysfs_notify(&occ->bus_dev->kobj, NULL,
>>> +						     bin_attr_ffdc.attr.name);
>>> +
>>> +				return rc;
>>> +			} else if (rc != -EBADE) {
>>> +				return rc;
>>> +			}
> No need for else after return.
>
>>> +			/* retry twice for checksum failures */
>>> +		} else {
>>> +			break;
> I'd break on the success path after testing (rc >= 0).
> Saves a level of indent.
>
>>>   		}
>>> -
>>> -		return rc;
>>> -	}
>>> +	} while (++tries < 3);
>>>
>>>   	switch (resp->return_status) {
>>>   	case OCC_RESP_CMD_IN_PRG:
> Probably end up with something like:
> 	for (tries = 0; tries < 3; tries++) {
> 		rc = ...;
> 		if (rc >= 0)
> 			break;
> 		if (resp_len) {
> 			...
> 			return rc;
> 		}
> 		if (rc != -EBADE)
> 			return rc;
> 	}
>
> 	David


Thanks David, Ack.


>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-04-26 15:46 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-21 15:31 [PATCH 0/2] fsi and hwmon (occ): Prevent occasional checksum failures Eddie James
2022-03-21 15:31 ` [PATCH 1/2] fsi: occ: Fix checksum failure mode Eddie James
2022-03-21 15:31 ` [PATCH 2/2] hwmon (occ): Retry for checksum failure Eddie James
2022-04-24 17:18   ` Guenter Roeck
2022-04-25  9:10     ` David Laight
2022-04-26 15:46       ` Eddie James

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).