* [PATCH v2 0/2] fsi and hwmon (occ): Prevent occasional checksum failures
@ 2022-04-26 15:49 Eddie James
2022-04-26 15:49 ` [PATCH v2 1/2] fsi: occ: Fix checksum failure mode Eddie James
2022-04-26 15:49 ` [PATCH v2 2/2] hwmon (occ): Retry for checksum failure Eddie James
0 siblings, 2 replies; 4+ messages in thread
From: Eddie James @ 2022-04-26 15:49 UTC
To: linux-fsi
Cc: linux-hwmon, linux-kernel, jdelvare, linux, joel, jk, David.Laight
Due to the OCC communication design with a shared SRAM area,
checksum errors are expected when the buffer is corrupted by OCC
communications with other system components. Therefore, use a
unique errno for checksum failures and retry the command twice
in that case.
Changes since v1:
- Refactor the retry loop
Eddie James (2):
fsi: occ: Fix checksum failure mode
hwmon (occ): Retry for checksum failure
drivers/fsi/fsi-occ.c | 7 +++++--
drivers/hwmon/occ/p9_sbe.c | 15 +++++++++++----
2 files changed, 16 insertions(+), 6 deletions(-)
--
2.27.0
* [PATCH v2 1/2] fsi: occ: Fix checksum failure mode
2022-04-26 15:49 [PATCH v2 0/2] fsi and hwmon (occ): Prevent occasional checksum failures Eddie James
@ 2022-04-26 15:49 ` Eddie James
2022-04-26 15:49 ` [PATCH v2 2/2] hwmon (occ): Retry for checksum failure Eddie James
1 sibling, 0 replies; 4+ messages in thread
From: Eddie James @ 2022-04-26 15:49 UTC
To: linux-fsi
Cc: linux-hwmon, linux-kernel, jdelvare, linux, joel, jk, David.Laight
Return -EBADE instead of -EBADMSG on a checksum failure so that it
can be told apart from a bad SBE message. In addition, don't set the
user's response length to the data length in this case, since the
data is not SBE FFDC.
Signed-off-by: Eddie James <eajames@linux.ibm.com>
---
drivers/fsi/fsi-occ.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/fsi/fsi-occ.c b/drivers/fsi/fsi-occ.c
index c9cc75fbdfb9..3d04e8baecbb 100644
--- a/drivers/fsi/fsi-occ.c
+++ b/drivers/fsi/fsi-occ.c
@@ -246,7 +246,7 @@ static int occ_verify_checksum(struct occ *occ, struct occ_response *resp,
 	if (checksum != checksum_resp) {
 		dev_err(occ->dev, "Bad checksum: %04x!=%04x\n", checksum,
 			checksum_resp);
-		return -EBADMSG;
+		return -EBADE;
 	}

 	return 0;
@@ -575,8 +575,11 @@ int fsi_occ_submit(struct device *dev, const void *request, size_t req_len,
 	dev_dbg(dev, "resp_status=%02x resp_data_len=%d\n",
 		resp->return_status, resp_data_length);

-	occ->client_response_size = resp_data_length + 7;
 	rc = occ_verify_checksum(occ, resp, resp_data_length);
+	if (rc)
+		goto done;
+
+	occ->client_response_size = resp_data_length + 7;

  done:
 	*resp_len = occ->client_response_size;
--
2.27.0
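
The ordering change in patch 1/2 can be illustrated outside the kernel with a small sketch. Everything below is hypothetical: the `demo_` names, the 16-byte buffer, the plain byte-sum checksum, and the assumption that the reported size starts at zero are illustrative stand-ins, not the real `struct occ_response` layout or the driver's checksum routine. The point is only that a checksum mismatch yields a distinct errno and leaves the caller's response length unset.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EBADE
#define EBADE 52	/* Linux value; defined here only for non-Linux builds */
#endif

/* Hypothetical response buffer: payload followed by a big-endian u16 checksum. */
struct demo_resp {
	uint8_t data[16];
};

/* Assumed checksum for illustration: 16-bit sum of the payload bytes. */
static int demo_verify_checksum(const struct demo_resp *resp, size_t data_len)
{
	uint16_t sum = 0;
	uint16_t expected;
	size_t i;

	if (data_len + 2 > sizeof(resp->data))
		return -EINVAL;

	for (i = 0; i < data_len; i++)
		sum += resp->data[i];

	expected = (uint16_t)((resp->data[data_len] << 8) |
			      resp->data[data_len + 1]);

	/* Distinct errno so callers can tell a mismatch from other failures. */
	return sum == expected ? 0 : -EBADE;
}

/* Report a response length only after the checksum has been verified. */
static int demo_submit(const struct demo_resp *resp, size_t data_len,
		       size_t *resp_len)
{
	size_t client_size = 0;	/* assumed to start at zero */
	int rc;

	rc = demo_verify_checksum(resp, data_len);
	if (rc)
		goto done;

	client_size = data_len + 2;
done:
	*resp_len = client_size;
	return rc;
}
```

With this ordering a caller that sees a negative return code and a zero length knows there is no error data to save, which is what lets the hwmon side distinguish a retryable checksum failure from an error that carries FFDC.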
* [PATCH v2 2/2] hwmon (occ): Retry for checksum failure
2022-04-26 15:49 [PATCH v2 0/2] fsi and hwmon (occ): Prevent occasional checksum failures Eddie James
2022-04-26 15:49 ` [PATCH v2 1/2] fsi: occ: Fix checksum failure mode Eddie James
@ 2022-04-26 15:49 ` Eddie James
2022-04-27 8:34 ` Joel Stanley
1 sibling, 1 reply; 4+ messages in thread
From: Eddie James @ 2022-04-26 15:49 UTC
To: linux-fsi
Cc: linux-hwmon, linux-kernel, jdelvare, linux, joel, jk, David.Laight
Due to the OCC communication design with a shared SRAM area,
checksum errors are expected when the buffer is corrupted by OCC
communications with other system components. Therefore, retry
the command twice in the event of a checksum failure.
Signed-off-by: Eddie James <eajames@linux.ibm.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
---
drivers/hwmon/occ/p9_sbe.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/drivers/hwmon/occ/p9_sbe.c b/drivers/hwmon/occ/p9_sbe.c
index 49b13cc01073..e6ccef2af659 100644
--- a/drivers/hwmon/occ/p9_sbe.c
+++ b/drivers/hwmon/occ/p9_sbe.c
@@ -14,6 +14,8 @@

 #include "common.h"

+#define OCC_CHECKSUM_RETRIES 3
+
 struct p9_sbe_occ {
 	struct occ occ;
 	bool sbe_error;
@@ -83,17 +85,22 @@ static int p9_sbe_occ_send_cmd(struct occ *occ, u8 *cmd, size_t len)
 	struct occ_response *resp = &occ->resp;
 	struct p9_sbe_occ *ctx = to_p9_sbe_occ(occ);
 	size_t resp_len = sizeof(*resp);
+	int i;
 	int rc;

-	rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
-	if (rc < 0) {
+	for (i = 0; i < OCC_CHECKSUM_RETRIES; ++i) {
+		rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
+		if (rc >= 0)
+			break;
 		if (resp_len) {
 			if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len))
 				sysfs_notify(&occ->bus_dev->kobj, NULL,
 					     bin_attr_ffdc.attr.name);
-		}

-		return rc;
+			return rc;
+		}
+		if (rc != -EBADE)
+			return rc;
 	}

 	switch (resp->return_status) {
--
2.27.0
* Re: [PATCH v2 2/2] hwmon (occ): Retry for checksum failure
2022-04-26 15:49 ` [PATCH v2 2/2] hwmon (occ): Retry for checksum failure Eddie James
@ 2022-04-27 8:34 ` Joel Stanley
0 siblings, 0 replies; 4+ messages in thread
From: Joel Stanley @ 2022-04-27 8:34 UTC
To: Eddie James
Cc: linux-fsi, linux-hwmon, Linux Kernel Mailing List, Jean Delvare,
Guenter Roeck, Jeremy Kerr, David Laight
On Tue, 26 Apr 2022 at 15:50, Eddie James <eajames@linux.ibm.com> wrote:
>
> Due to the OCC communication design with a shared SRAM area,
> checksum errors are expected when the buffer is corrupted by OCC
> communications with other system components. Therefore, retry
> the command twice in the event of a checksum failure.
>
> Signed-off-by: Eddie James <eajames@linux.ibm.com>
> Acked-by: Guenter Roeck <linux@roeck-us.net>
> ---
> drivers/hwmon/occ/p9_sbe.c | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/hwmon/occ/p9_sbe.c b/drivers/hwmon/occ/p9_sbe.c
> index 49b13cc01073..e6ccef2af659 100644
> --- a/drivers/hwmon/occ/p9_sbe.c
> +++ b/drivers/hwmon/occ/p9_sbe.c
> @@ -14,6 +14,8 @@
>
> #include "common.h"
>
> +#define OCC_CHECKSUM_RETRIES 3
> +
> struct p9_sbe_occ {
> struct occ occ;
> bool sbe_error;
> @@ -83,17 +85,22 @@ static int p9_sbe_occ_send_cmd(struct occ *occ, u8 *cmd, size_t len)
> struct occ_response *resp = &occ->resp;
> struct p9_sbe_occ *ctx = to_p9_sbe_occ(occ);
> size_t resp_len = sizeof(*resp);
> + int i;
> int rc;
>
> - rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
> - if (rc < 0) {
> + for (i = 0; i < OCC_CHECKSUM_RETRIES; ++i) {
> + rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
> + if (rc >= 0)
> + break;
> if (resp_len) {
> if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len))
> sysfs_notify(&occ->bus_dev->kobj, NULL,
> bin_attr_ffdc.attr.name);
> - }
>
> - return rc;
> + return rc;
> + }
> + if (rc != -EBADE)
> + return rc;
Future you might appreciate a comment above the EBADE check clarifying
why that error is being special cased.
> }
>
> switch (resp->return_status) {
> --
> 2.27.0
>
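
The retry loop under review, with the kind of comment suggested above the -EBADE check, might look roughly like the following user-space sketch. The `demo_` names, the callback type, and the fake transport are hypothetical stand-ins for `fsi_occ_submit()` and the driver context. The sketch also returns the last error once the attempts are exhausted, which is an assumption made here for a self-contained example and not something the posted diff states.

```c
#include <errno.h>
#include <stddef.h>

#ifndef EBADE
#define EBADE 52	/* Linux value; defined here only for non-Linux builds */
#endif

#define DEMO_CHECKSUM_RETRIES 3	/* total attempts: one try plus two retries */

/* Hypothetical transport callback standing in for fsi_occ_submit(). */
typedef int (*demo_submit_fn)(void *ctx, size_t *resp_len);

static int demo_send_cmd(demo_submit_fn submit, void *ctx)
{
	size_t resp_len;
	int rc = -EBADE;
	int i;

	for (i = 0; i < DEMO_CHECKSUM_RETRIES; ++i) {
		resp_len = 0;
		rc = submit(ctx, &resp_len);
		if (rc >= 0)
			break;

		/* Error data was returned (FFDC in the driver): do not retry. */
		if (resp_len)
			return rc;

		/*
		 * -EBADE means the response checksum did not match. The
		 * response buffer lives in SRAM shared with other components,
		 * so a mismatch is expected occasionally and is worth
		 * retrying. Any other error is reported immediately.
		 */
		if (rc != -EBADE)
			return rc;
	}

	return rc;
}

/* Fake transport for illustration: fails with -EBADE a fixed number of times. */
struct demo_fake {
	int failures_left;
	int calls;
};

static int demo_fake_submit(void *ctx, size_t *resp_len)
{
	struct demo_fake *fake = ctx;

	fake->calls++;
	*resp_len = 0;
	if (fake->failures_left > 0) {
		fake->failures_left--;
		return -EBADE;
	}
	return 0;
}
```

Passing a `struct demo_fake` with `failures_left = 2` to `demo_send_cmd()` succeeds on the third attempt, matching the "retry the command twice" wording in the commit message.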