* [PATCH 1/4] fsi: occ: Don't accept response from un-initialized OCC
2021-02-09 17:12 [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC Eddie James
@ 2021-02-09 17:12 ` Eddie James
2021-04-06 7:18 ` Joel Stanley
2021-02-09 17:12 ` [PATCH 2/4] fsi: occ: Log error for checksum failure Eddie James
` (3 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Eddie James @ 2021-02-09 17:12 UTC (permalink / raw)
To: linux-hwmon
Cc: linux-fsi, linux-kernel, jk, joel, alistair, jdelvare, linux,
Eddie James
If the OCC is not initialized and responds as such, the driver
should continue waiting for a valid response until the timeout
expires.
Signed-off-by: Eddie James <eajames@linux.ibm.com>
---
drivers/fsi/fsi-occ.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/fsi/fsi-occ.c b/drivers/fsi/fsi-occ.c
index 10ca2e290655..cb05b6dacc9d 100644
--- a/drivers/fsi/fsi-occ.c
+++ b/drivers/fsi/fsi-occ.c
@@ -495,6 +495,7 @@ int fsi_occ_submit(struct device *dev, const void *request, size_t req_len,
goto done;
if (resp->return_status == OCC_RESP_CMD_IN_PRG ||
+ resp->return_status == OCC_RESP_CRIT_INIT ||
resp->seq_no != seq_no) {
rc = -ETIMEDOUT;
--
2.27.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/4] fsi: occ: Don't accept response from un-initialized OCC
2021-02-09 17:12 ` [PATCH 1/4] fsi: occ: Don't accept response from " Eddie James
@ 2021-04-06 7:18 ` Joel Stanley
0 siblings, 0 replies; 11+ messages in thread
From: Joel Stanley @ 2021-04-06 7:18 UTC (permalink / raw)
To: Eddie James
Cc: linux-hwmon, linux-fsi, Linux Kernel Mailing List, Jeremy Kerr,
Alistair Popple, Jean Delvare, Guenter Roeck
On Tue, 9 Feb 2021 at 17:12, Eddie James <eajames@linux.ibm.com> wrote:
>
> If the OCC is not initialized and responds as such, the driver
> should continue waiting for a valid response until the timeout
> expires.
>
> Signed-off-by: Eddie James <eajames@linux.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
I guess we should add this too?
Fixes: 7ed98dddb764 ("fsi: Add On-Chip Controller (OCC) driver")
> ---
> drivers/fsi/fsi-occ.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/fsi/fsi-occ.c b/drivers/fsi/fsi-occ.c
> index 10ca2e290655..cb05b6dacc9d 100644
> --- a/drivers/fsi/fsi-occ.c
> +++ b/drivers/fsi/fsi-occ.c
> @@ -495,6 +495,7 @@ int fsi_occ_submit(struct device *dev, const void *request, size_t req_len,
> goto done;
>
> if (resp->return_status == OCC_RESP_CMD_IN_PRG ||
> + resp->return_status == OCC_RESP_CRIT_INIT ||
> resp->seq_no != seq_no) {
> rc = -ETIMEDOUT;
>
> --
> 2.27.0
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/4] fsi: occ: Log error for checksum failure
2021-02-09 17:12 [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC Eddie James
2021-02-09 17:12 ` [PATCH 1/4] fsi: occ: Don't accept response from " Eddie James
@ 2021-02-09 17:12 ` Eddie James
2021-04-06 7:21 ` Joel Stanley
2021-02-09 17:12 ` [PATCH 3/4] hwmon: (occ) Start sequence number at one Eddie James
` (2 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Eddie James @ 2021-02-09 17:12 UTC (permalink / raw)
To: linux-hwmon
Cc: linux-fsi, linux-kernel, jk, joel, alistair, jdelvare, linux,
Eddie James
Log an error if the response checksum doesn't match the
calculated checksum.
Signed-off-by: Eddie James <eajames@linux.ibm.com>
---
drivers/fsi/fsi-occ.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/fsi/fsi-occ.c b/drivers/fsi/fsi-occ.c
index cb05b6dacc9d..524460995465 100644
--- a/drivers/fsi/fsi-occ.c
+++ b/drivers/fsi/fsi-occ.c
@@ -223,7 +223,8 @@ static const struct file_operations occ_fops = {
.release = occ_release,
};
-static int occ_verify_checksum(struct occ_response *resp, u16 data_length)
+static int occ_verify_checksum(struct occ *occ, struct occ_response *resp,
+ u16 data_length)
{
/* Fetch the two bytes after the data for the checksum. */
u16 checksum_resp = get_unaligned_be16(&resp->data[data_length]);
@@ -238,8 +239,11 @@ static int occ_verify_checksum(struct occ_response *resp, u16 data_length)
for (i = 0; i < data_length; ++i)
checksum += resp->data[i];
- if (checksum != checksum_resp)
+ if (checksum != checksum_resp) {
+ dev_err(occ->dev, "Bad checksum: %04x!=%04x\n", checksum,
+ checksum_resp);
return -EBADMSG;
+ }
return 0;
}
@@ -533,7 +537,7 @@ int fsi_occ_submit(struct device *dev, const void *request, size_t req_len,
}
*resp_len = resp_data_length + 7;
- rc = occ_verify_checksum(resp, resp_data_length);
+ rc = occ_verify_checksum(occ, resp, resp_data_length);
done:
mutex_unlock(&occ->occ_lock);
--
2.27.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 2/4] fsi: occ: Log error for checksum failure
2021-02-09 17:12 ` [PATCH 2/4] fsi: occ: Log error for checksum failure Eddie James
@ 2021-04-06 7:21 ` Joel Stanley
0 siblings, 0 replies; 11+ messages in thread
From: Joel Stanley @ 2021-04-06 7:21 UTC (permalink / raw)
To: Eddie James
Cc: linux-hwmon, linux-fsi, Linux Kernel Mailing List, Jeremy Kerr,
Alistair Popple, Jean Delvare, Guenter Roeck
On Tue, 9 Feb 2021 at 17:13, Eddie James <eajames@linux.ibm.com> wrote:
>
> Log an error if the response checksum doesn't match the
> calculated checksum.
Reviewed-by: Joel Stanley <joel@jms.id.au>
>
> Signed-off-by: Eddie James <eajames@linux.ibm.com>
> ---
> drivers/fsi/fsi-occ.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/fsi/fsi-occ.c b/drivers/fsi/fsi-occ.c
> index cb05b6dacc9d..524460995465 100644
> --- a/drivers/fsi/fsi-occ.c
> +++ b/drivers/fsi/fsi-occ.c
> @@ -223,7 +223,8 @@ static const struct file_operations occ_fops = {
> .release = occ_release,
> };
>
> -static int occ_verify_checksum(struct occ_response *resp, u16 data_length)
> +static int occ_verify_checksum(struct occ *occ, struct occ_response *resp,
> + u16 data_length)
> {
> /* Fetch the two bytes after the data for the checksum. */
> u16 checksum_resp = get_unaligned_be16(&resp->data[data_length]);
> @@ -238,8 +239,11 @@ static int occ_verify_checksum(struct occ_response *resp, u16 data_length)
> for (i = 0; i < data_length; ++i)
> checksum += resp->data[i];
>
> - if (checksum != checksum_resp)
> + if (checksum != checksum_resp) {
> + dev_err(occ->dev, "Bad checksum: %04x!=%04x\n", checksum,
> + checksum_resp);
Just confirming that this is unexpected, we won't see this eg. if the
system is booting or when the BMC is reset while the host is running?
> return -EBADMSG;
> + }
>
> return 0;
> }
> @@ -533,7 +537,7 @@ int fsi_occ_submit(struct device *dev, const void *request, size_t req_len,
> }
>
> *resp_len = resp_data_length + 7;
> - rc = occ_verify_checksum(resp, resp_data_length);
> + rc = occ_verify_checksum(occ, resp, resp_data_length);
>
> done:
> mutex_unlock(&occ->occ_lock);
> --
> 2.27.0
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 3/4] hwmon: (occ) Start sequence number at one
2021-02-09 17:12 [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC Eddie James
2021-02-09 17:12 ` [PATCH 1/4] fsi: occ: Don't accept response from " Eddie James
2021-02-09 17:12 ` [PATCH 2/4] fsi: occ: Log error for checksum failure Eddie James
@ 2021-02-09 17:12 ` Eddie James
2021-02-09 18:12 ` Guenter Roeck
2021-02-09 17:12 ` [PATCH 4/4] hwmon: (occ) Print response status in first poll error message Eddie James
2021-04-05 15:33 ` [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC Eddie James
4 siblings, 1 reply; 11+ messages in thread
From: Eddie James @ 2021-02-09 17:12 UTC (permalink / raw)
To: linux-hwmon
Cc: linux-fsi, linux-kernel, jk, joel, alistair, jdelvare, linux,
Eddie James
Initialize the sequence number at one, rather than zero, in order
to prevent false matches with the zero-initialized OCC SRAM
buffer before the OCC is fully initialized.
Signed-off-by: Eddie James <eajames@linux.ibm.com>
---
drivers/hwmon/occ/common.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
index 7a5e539b567b..ee0c5d12dfdf 100644
--- a/drivers/hwmon/occ/common.c
+++ b/drivers/hwmon/occ/common.c
@@ -1150,6 +1150,8 @@ int occ_setup(struct occ *occ, const char *name)
{
int rc;
+ /* start with 1 to avoid false match with zero-initialized SRAM buffer */
+ occ->seq_no = 1;
mutex_init(&occ->lock);
occ->groups[0] = &occ->group;
--
2.27.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 3/4] hwmon: (occ) Start sequence number at one
2021-02-09 17:12 ` [PATCH 3/4] hwmon: (occ) Start sequence number at one Eddie James
@ 2021-02-09 18:12 ` Guenter Roeck
0 siblings, 0 replies; 11+ messages in thread
From: Guenter Roeck @ 2021-02-09 18:12 UTC (permalink / raw)
To: Eddie James
Cc: linux-hwmon, linux-fsi, linux-kernel, jk, joel, alistair, jdelvare
On Tue, Feb 09, 2021 at 11:12:34AM -0600, Eddie James wrote:
> Initialize the sequence number at one, rather than zero, in order
> to prevent false matches with the zero-initialized OCC SRAM
> buffer before the OCC is fully initialized.
>
> Signed-off-by: Eddie James <eajames@linux.ibm.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
For now I'll assume that the series has to be submitted together,
and that this won't happen through the hwmon branch.
Guenter
> ---
> drivers/hwmon/occ/common.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
> index 7a5e539b567b..ee0c5d12dfdf 100644
> --- a/drivers/hwmon/occ/common.c
> +++ b/drivers/hwmon/occ/common.c
> @@ -1150,6 +1150,8 @@ int occ_setup(struct occ *occ, const char *name)
> {
> int rc;
>
> + /* start with 1 to avoid false match with zero-initialized SRAM buffer */
> + occ->seq_no = 1;
> mutex_init(&occ->lock);
> occ->groups[0] = &occ->group;
>
> --
> 2.27.0
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 4/4] hwmon: (occ) Print response status in first poll error message
2021-02-09 17:12 [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC Eddie James
` (2 preceding siblings ...)
2021-02-09 17:12 ` [PATCH 3/4] hwmon: (occ) Start sequence number at one Eddie James
@ 2021-02-09 17:12 ` Eddie James
2021-02-09 18:12 ` Guenter Roeck
2021-04-05 15:33 ` [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC Eddie James
4 siblings, 1 reply; 11+ messages in thread
From: Eddie James @ 2021-02-09 17:12 UTC (permalink / raw)
To: linux-hwmon
Cc: linux-fsi, linux-kernel, jk, joel, alistair, jdelvare, linux,
Eddie James
In order to better debug problems starting up the driver, print
the response status from the OCC in the error logged when the first
poll command fails.
Signed-off-by: Eddie James <eajames@linux.ibm.com>
---
drivers/hwmon/occ/common.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
index ee0c5d12dfdf..f71d62b57468 100644
--- a/drivers/hwmon/occ/common.c
+++ b/drivers/hwmon/occ/common.c
@@ -1161,8 +1161,9 @@ int occ_setup(struct occ *occ, const char *name)
dev_info(occ->bus_dev, "host is not ready\n");
return rc;
} else if (rc < 0) {
- dev_err(occ->bus_dev, "failed to get OCC poll response: %d\n",
- rc);
+ dev_err(occ->bus_dev,
+ "failed to get OCC poll response=%02x: %d\n",
+ occ->resp.return_status, rc);
return rc;
}
--
2.27.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 4/4] hwmon: (occ) Print response status in first poll error message
2021-02-09 17:12 ` [PATCH 4/4] hwmon: (occ) Print response status in first poll error message Eddie James
@ 2021-02-09 18:12 ` Guenter Roeck
0 siblings, 0 replies; 11+ messages in thread
From: Guenter Roeck @ 2021-02-09 18:12 UTC (permalink / raw)
To: Eddie James
Cc: linux-hwmon, linux-fsi, linux-kernel, jk, joel, alistair, jdelvare
On Tue, Feb 09, 2021 at 11:12:35AM -0600, Eddie James wrote:
> In order to better debug problems starting up the driver, print
> the response status from the OCC in the error logged when the first
> poll command fails.
>
> Signed-off-by: Eddie James <eajames@linux.ibm.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
> ---
> drivers/hwmon/occ/common.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
> index ee0c5d12dfdf..f71d62b57468 100644
> --- a/drivers/hwmon/occ/common.c
> +++ b/drivers/hwmon/occ/common.c
> @@ -1161,8 +1161,9 @@ int occ_setup(struct occ *occ, const char *name)
> dev_info(occ->bus_dev, "host is not ready\n");
> return rc;
> } else if (rc < 0) {
> - dev_err(occ->bus_dev, "failed to get OCC poll response: %d\n",
> - rc);
> + dev_err(occ->bus_dev,
> + "failed to get OCC poll response=%02x: %d\n",
> + occ->resp.return_status, rc);
> return rc;
> }
>
> --
> 2.27.0
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC
2021-02-09 17:12 [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC Eddie James
` (3 preceding siblings ...)
2021-02-09 17:12 ` [PATCH 4/4] hwmon: (occ) Print response status in first poll error message Eddie James
@ 2021-04-05 15:33 ` Eddie James
2021-04-06 7:22 ` Joel Stanley
4 siblings, 1 reply; 11+ messages in thread
From: Eddie James @ 2021-04-05 15:33 UTC (permalink / raw)
To: joel; +Cc: linux-fsi, linux-kernel, jk, alistair, jdelvare, linux, linux-hwmon
On Tue, 2021-02-09 at 11:12 -0600, Eddie James wrote:
> In the event that the OCC is not initialized when the driver sends a
> poll
> command, the driver may receive an invalid response. This isn't an
> error
> condition unless there is no valid response before the timeout
> expires. So
> change the starting sequence number and check for the un-initialized
> OCC
> state before returning the response in order to detect this condition
> and
> continue waiting if necessary.
Hi Joel,
Do you have any comments on the FSI side of this series?
Thanks,
Eddie
>
> Eddie James (4):
> fsi: occ: Don't accept response from un-initialized OCC
> fsi: occ: Log error for checksum failure
> hwmon: (occ) Start sequence number at one
> hwmon: (occ) Print response status in first poll error message
>
> drivers/fsi/fsi-occ.c | 11 ++++++++---
> drivers/hwmon/occ/common.c | 7 +++++--
> 2 files changed, 13 insertions(+), 5 deletions(-)
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC
2021-04-05 15:33 ` [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC Eddie James
@ 2021-04-06 7:22 ` Joel Stanley
0 siblings, 0 replies; 11+ messages in thread
From: Joel Stanley @ 2021-04-06 7:22 UTC (permalink / raw)
To: Eddie James
Cc: linux-fsi, Linux Kernel Mailing List, Jeremy Kerr,
Alistair Popple, Jean Delvare, Guenter Roeck, linux-hwmon
On Mon, 5 Apr 2021 at 15:34, Eddie James <eajames@linux.ibm.com> wrote:
>
> On Tue, 2021-02-09 at 11:12 -0600, Eddie James wrote:
> > In the event that the OCC is not initialized when the driver sends a
> > poll
> > command, the driver may receive an invalid response. This isn't an
> > error
> > condition unless there is no valid response before the timeout
> > expires. So
> > change the starting sequence number and check for the un-initialized
> > OCC
> > state before returning the response in order to detect this condition
> > and
> > continue waiting if necessary.
>
> Hi Joel,
>
> Do you have any comments on the FSI side of this series?
They look fine to me. Guenter, if you want to take the series through
the hwmon tree I do not anticipate any conflicts.
Cheers,
Joel
^ permalink raw reply [flat|nested] 11+ messages in thread