* [PATCH 0/2] fsi and hwmon (occ): Prevent occasional checksum failures @ 2022-03-21 15:31 Eddie James 2022-03-21 15:31 ` [PATCH 1/2] fsi: occ: Fix checksum failure mode Eddie James 2022-03-21 15:31 ` [PATCH 2/2] hwmon (occ): Retry for checksum failure Eddie James 0 siblings, 2 replies; 6+ messages in thread From: Eddie James @ 2022-03-21 15:31 UTC (permalink / raw) To: linux-fsi Cc: linux-kernel, linux-hwmon, jdelvare, linux, joel, jk, alistair, Eddie James Due to the OCC communication design with a shared SRAM area, checkum errors are expected due to corrupted buffer from OCC communications with other system components. Therefore, use a unique errno for checksum failures and retry the command twice in that case. Eddie James (2): fsi: occ: Fix checksum failure mode hwmon (occ): Retry for checksum failure drivers/fsi/fsi-occ.c | 7 +++++-- drivers/hwmon/occ/p9_sbe.c | 28 ++++++++++++++++++---------- 2 files changed, 23 insertions(+), 12 deletions(-) -- 2.27.0 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] fsi: occ: Fix checksum failure mode 2022-03-21 15:31 [PATCH 0/2] fsi and hwmon (occ): Prevent occasional checksum failures Eddie James @ 2022-03-21 15:31 ` Eddie James 2022-03-21 15:31 ` [PATCH 2/2] hwmon (occ): Retry for checksum failure Eddie James 1 sibling, 0 replies; 6+ messages in thread From: Eddie James @ 2022-03-21 15:31 UTC (permalink / raw) To: linux-fsi Cc: linux-kernel, linux-hwmon, jdelvare, linux, joel, jk, alistair, Eddie James Change the checksum errno to something different than the errno used for a bad SBE message. In addition, don't set the user's response length to the data length in this case, since it's not SBE FFDC. Signed-off-by: Eddie James <eajames@linux.ibm.com> --- drivers/fsi/fsi-occ.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/fsi/fsi-occ.c b/drivers/fsi/fsi-occ.c index c9cc75fbdfb9..3d04e8baecbb 100644 --- a/drivers/fsi/fsi-occ.c +++ b/drivers/fsi/fsi-occ.c @@ -246,7 +246,7 @@ static int occ_verify_checksum(struct occ *occ, struct occ_response *resp, if (checksum != checksum_resp) { dev_err(occ->dev, "Bad checksum: %04x!=%04x\n", checksum, checksum_resp); - return -EBADMSG; + return -EBADE; } return 0; @@ -575,8 +575,11 @@ int fsi_occ_submit(struct device *dev, const void *request, size_t req_len, dev_dbg(dev, "resp_status=%02x resp_data_len=%d\n", resp->return_status, resp_data_length); - occ->client_response_size = resp_data_length + 7; rc = occ_verify_checksum(occ, resp, resp_data_length); + if (rc) + goto done; + + occ->client_response_size = resp_data_length + 7; done: *resp_len = occ->client_response_size; -- 2.27.0 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] hwmon (occ): Retry for checksum failure 2022-03-21 15:31 [PATCH 0/2] fsi and hwmon (occ): Prevent occasional checksum failures Eddie James 2022-03-21 15:31 ` [PATCH 1/2] fsi: occ: Fix checksum failure mode Eddie James @ 2022-03-21 15:31 ` Eddie James 2022-04-24 17:18 ` Guenter Roeck 1 sibling, 1 reply; 6+ messages in thread From: Eddie James @ 2022-03-21 15:31 UTC (permalink / raw) To: linux-fsi Cc: linux-kernel, linux-hwmon, jdelvare, linux, joel, jk, alistair, Eddie James Due to the OCC communication design with a shared SRAM area, checkum errors are expected due to corrupted buffer from OCC communications with other system components. Therefore, retry the command twice in the event of a checksum failure. Signed-off-by: Eddie James <eajames@linux.ibm.com> --- drivers/hwmon/occ/p9_sbe.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/drivers/hwmon/occ/p9_sbe.c b/drivers/hwmon/occ/p9_sbe.c index 49b13cc01073..7f4c3f979c54 100644 --- a/drivers/hwmon/occ/p9_sbe.c +++ b/drivers/hwmon/occ/p9_sbe.c @@ -84,17 +84,25 @@ static int p9_sbe_occ_send_cmd(struct occ *occ, u8 *cmd, size_t len) struct p9_sbe_occ *ctx = to_p9_sbe_occ(occ); size_t resp_len = sizeof(*resp); int rc; - - rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len); - if (rc < 0) { - if (resp_len) { - if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len)) - sysfs_notify(&occ->bus_dev->kobj, NULL, - bin_attr_ffdc.attr.name); + int tries = 0; + + do { + rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len); + if (rc < 0) { + if (resp_len) { + if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len)) + sysfs_notify(&occ->bus_dev->kobj, NULL, + bin_attr_ffdc.attr.name); + + return rc; + } else if (rc != -EBADE) { + return rc; + } + /* retry twice for checksum failures */ + } else { + break; } - - return rc; - } + } while (++tries < 3); switch (resp->return_status) { case OCC_RESP_CMD_IN_PRG: -- 2.27.0 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] hwmon (occ): Retry for checksum failure 2022-03-21 15:31 ` [PATCH 2/2] hwmon (occ): Retry for checksum failure Eddie James @ 2022-04-24 17:18 ` Guenter Roeck 2022-04-25 9:10 ` David Laight 0 siblings, 1 reply; 6+ messages in thread From: Guenter Roeck @ 2022-04-24 17:18 UTC (permalink / raw) To: Eddie James Cc: linux-fsi, linux-kernel, linux-hwmon, jdelvare, joel, jk, alistair On Mon, Mar 21, 2022 at 10:31:12AM -0500, Eddie James wrote: > Due to the OCC communication design with a shared SRAM area, > checkum errors are expected due to corrupted buffer from OCC > communications with other system components. Therefore, retry > the command twice in the event of a checksum failure. > > Signed-off-by: Eddie James <eajames@linux.ibm.com> I assume this will be applied together with patch 1 of the series. Acked-by: Guenter Roeck <linux@roeck-us.net> Guenter > --- > drivers/hwmon/occ/p9_sbe.c | 28 ++++++++++++++++++---------- > 1 file changed, 18 insertions(+), 10 deletions(-) > > diff --git a/drivers/hwmon/occ/p9_sbe.c b/drivers/hwmon/occ/p9_sbe.c > index 49b13cc01073..7f4c3f979c54 100644 > --- a/drivers/hwmon/occ/p9_sbe.c > +++ b/drivers/hwmon/occ/p9_sbe.c > @@ -84,17 +84,25 @@ static int p9_sbe_occ_send_cmd(struct occ *occ, u8 *cmd, size_t len) > struct p9_sbe_occ *ctx = to_p9_sbe_occ(occ); > size_t resp_len = sizeof(*resp); > int rc; > - > - rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len); > - if (rc < 0) { > - if (resp_len) { > - if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len)) > - sysfs_notify(&occ->bus_dev->kobj, NULL, > - bin_attr_ffdc.attr.name); > + int tries = 0; > + > + do { > + rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len); > + if (rc < 0) { > + if (resp_len) { > + if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len)) > + sysfs_notify(&occ->bus_dev->kobj, NULL, > + bin_attr_ffdc.attr.name); > + > + return rc; > + } else if (rc != -EBADE) { > + return rc; > + } > + /* retry twice for checksum failures */ > + } else { > + break; > } > - > - return rc; > - } > + } while (++tries < 3); > > switch (resp->return_status) { > case OCC_RESP_CMD_IN_PRG: ^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: [PATCH 2/2] hwmon (occ): Retry for checksum failure 2022-04-24 17:18 ` Guenter Roeck @ 2022-04-25 9:10 ` David Laight 2022-04-26 15:46 ` Eddie James 0 siblings, 1 reply; 6+ messages in thread From: David Laight @ 2022-04-25 9:10 UTC (permalink / raw) To: 'Guenter Roeck', Eddie James Cc: linux-fsi, linux-kernel, linux-hwmon, jdelvare, joel, jk, alistair From: Guenter Roeck > Sent: 24 April 2022 18:18 > > On Mon, Mar 21, 2022 at 10:31:12AM -0500, Eddie James wrote: > > Due to the OCC communication design with a shared SRAM area, > > checkum errors are expected due to corrupted buffer from OCC > > communications with other system components. Therefore, retry > > the command twice in the event of a checksum failure. > > > > Signed-off-by: Eddie James <eajames@linux.ibm.com> > > I assume this will be applied together with patch 1 of the series. > > Acked-by: Guenter Roeck <linux@roeck-us.net> > > Guenter > > > --- > > drivers/hwmon/occ/p9_sbe.c | 28 ++++++++++++++++++---------- > > 1 file changed, 18 insertions(+), 10 deletions(-) > > > > diff --git a/drivers/hwmon/occ/p9_sbe.c b/drivers/hwmon/occ/p9_sbe.c > > index 49b13cc01073..7f4c3f979c54 100644 > > --- a/drivers/hwmon/occ/p9_sbe.c > > +++ b/drivers/hwmon/occ/p9_sbe.c > > @@ -84,17 +84,25 @@ static int p9_sbe_occ_send_cmd(struct occ *occ, u8 *cmd, size_t len) > > struct p9_sbe_occ *ctx = to_p9_sbe_occ(occ); > > size_t resp_len = sizeof(*resp); > > int rc; > > - > > - rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len); > > - if (rc < 0) { > > - if (resp_len) { > > - if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len)) > > - sysfs_notify(&occ->bus_dev->kobj, NULL, > > - bin_attr_ffdc.attr.name); > > + int tries = 0; > > + > > + do { Why not use a for() loop? > > + rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len); > > + if (rc < 0) { > > + if (resp_len) { > > + if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len)) > > + sysfs_notify(&occ->bus_dev->kobj, NULL, > > + bin_attr_ffdc.attr.name); > > + > > + return rc; > > + } else if (rc != -EBADE) { > > + return rc; > > + } No need for else after return. > > + /* retry twice for checksum failures */ > > + } else { > > + break; I'd break on the success path after testing (rc >= 0). Saves a level of indent. > > } > > - > > - return rc; > > - } > > + } while (++tries < 3); > > > > switch (resp->return_status) { > > case OCC_RESP_CMD_IN_PRG: Probably end up with something like: for (tries = 0; tries < 3; tries++) { rc = ...; if (rc >= 0) break; if (resp_len) { ... return rc; } if (rc != -EBADE) return rc; } David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales) ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] hwmon (occ): Retry for checksum failure 2022-04-25 9:10 ` David Laight @ 2022-04-26 15:46 ` Eddie James 0 siblings, 0 replies; 6+ messages in thread From: Eddie James @ 2022-04-26 15:46 UTC (permalink / raw) To: David Laight, 'Guenter Roeck' Cc: linux-fsi, linux-kernel, linux-hwmon, jdelvare, joel, jk, alistair On 4/25/22 04:10, David Laight wrote: > From: Guenter Roeck >> Sent: 24 April 2022 18:18 >> >> On Mon, Mar 21, 2022 at 10:31:12AM -0500, Eddie James wrote: >>> Due to the OCC communication design with a shared SRAM area, >>> checkum errors are expected due to corrupted buffer from OCC >>> communications with other system components. Therefore, retry >>> the command twice in the event of a checksum failure. >>> >>> Signed-off-by: Eddie James <eajames@linux.ibm.com> >> I assume this will be applied together with patch 1 of the series. >> >> Acked-by: Guenter Roeck <linux@roeck-us.net> >> >> Guenter >> >>> --- >>> drivers/hwmon/occ/p9_sbe.c | 28 ++++++++++++++++++---------- >>> 1 file changed, 18 insertions(+), 10 deletions(-) >>> >>> diff --git a/drivers/hwmon/occ/p9_sbe.c b/drivers/hwmon/occ/p9_sbe.c >>> index 49b13cc01073..7f4c3f979c54 100644 >>> --- a/drivers/hwmon/occ/p9_sbe.c >>> +++ b/drivers/hwmon/occ/p9_sbe.c >>> @@ -84,17 +84,25 @@ static int p9_sbe_occ_send_cmd(struct occ *occ, u8 *cmd, size_t len) >>> struct p9_sbe_occ *ctx = to_p9_sbe_occ(occ); >>> size_t resp_len = sizeof(*resp); >>> int rc; >>> - >>> - rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len); >>> - if (rc < 0) { >>> - if (resp_len) { >>> - if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len)) >>> - sysfs_notify(&occ->bus_dev->kobj, NULL, >>> - bin_attr_ffdc.attr.name); >>> + int tries = 0; >>> + >>> + do { > Why not use a for() loop? > >>> + rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len); >>> + if (rc < 0) { >>> + if (resp_len) { >>> + if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len)) >>> + sysfs_notify(&occ->bus_dev->kobj, NULL, >>> + bin_attr_ffdc.attr.name); >>> + >>> + return rc; >>> + } else if (rc != -EBADE) { >>> + return rc; >>> + } > No need for else after return. > >>> + /* retry twice for checksum failures */ >>> + } else { >>> + break; > I'd break on the success path after testing (rc >= 0). > Saves a level of indent. > >>> } >>> - >>> - return rc; >>> - } >>> + } while (++tries < 3); >>> >>> switch (resp->return_status) { >>> case OCC_RESP_CMD_IN_PRG: > Probably end up with something like: > for (tries = 0; tries < 3; tries++) { > rc = ...; > if (rc >= 0) > break; > if (resp_len) { > ... > return rc; > } > if (rc != -EBADE) > return rc; > } > > David Thanks David, Ack. > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > Registration No: 1397386 (Wales) > ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2022-04-26 15:46 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-03-21 15:31 [PATCH 0/2] fsi and hwmon (occ): Prevent occasional checksum failures Eddie James 2022-03-21 15:31 ` [PATCH 1/2] fsi: occ: Fix checksum failure mode Eddie James 2022-03-21 15:31 ` [PATCH 2/2] hwmon (occ): Retry for checksum failure Eddie James 2022-04-24 17:18 ` Guenter Roeck 2022-04-25 9:10 ` David Laight 2022-04-26 15:46 ` Eddie James
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).