All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC
@ 2021-02-09 17:12 Eddie James
  2021-02-09 17:12 ` [PATCH 1/4] fsi: occ: Don't accept response from " Eddie James
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Eddie James @ 2021-02-09 17:12 UTC (permalink / raw)
  To: linux-hwmon
  Cc: linux-fsi, linux-kernel, jk, joel, alistair, jdelvare, linux,
	Eddie James

In the event that the OCC is not initialized when the driver sends a poll
command, the driver may receive an invalid response. This isn't an error
condition unless there is no valid response before the timeout expires. So
change the starting sequence number and check for the un-initialized OCC
state before returning the response in order to detect this condition and
continue waiting if necessary.

Eddie James (4):
  fsi: occ: Don't accept response from un-initialized OCC
  fsi: occ: Log error for checksum failure
  hwmon: (occ) Start sequence number at one
  hwmon: (occ) Print response status in first poll error message

 drivers/fsi/fsi-occ.c      | 11 ++++++++---
 drivers/hwmon/occ/common.c |  7 +++++--
 2 files changed, 13 insertions(+), 5 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/4] fsi: occ: Don't accept response from un-initialized OCC
  2021-02-09 17:12 [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC Eddie James
@ 2021-02-09 17:12 ` Eddie James
  2021-04-06  7:18   ` Joel Stanley
  2021-02-09 17:12 ` [PATCH 2/4] fsi: occ: Log error for checksum failure Eddie James
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Eddie James @ 2021-02-09 17:12 UTC (permalink / raw)
  To: linux-hwmon
  Cc: linux-fsi, linux-kernel, jk, joel, alistair, jdelvare, linux,
	Eddie James

If the OCC is not initialized and responds as such, the driver
should continue waiting for a valid response until the timeout
expires.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
---
 drivers/fsi/fsi-occ.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/fsi/fsi-occ.c b/drivers/fsi/fsi-occ.c
index 10ca2e290655..cb05b6dacc9d 100644
--- a/drivers/fsi/fsi-occ.c
+++ b/drivers/fsi/fsi-occ.c
@@ -495,6 +495,7 @@ int fsi_occ_submit(struct device *dev, const void *request, size_t req_len,
 			goto done;
 
 		if (resp->return_status == OCC_RESP_CMD_IN_PRG ||
+		    resp->return_status == OCC_RESP_CRIT_INIT ||
 		    resp->seq_no != seq_no) {
 			rc = -ETIMEDOUT;
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/4] fsi: occ: Log error for checksum failure
  2021-02-09 17:12 [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC Eddie James
  2021-02-09 17:12 ` [PATCH 1/4] fsi: occ: Don't accept response from " Eddie James
@ 2021-02-09 17:12 ` Eddie James
  2021-04-06  7:21   ` Joel Stanley
  2021-02-09 17:12 ` [PATCH 3/4] hwmon: (occ) Start sequence number at one Eddie James
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Eddie James @ 2021-02-09 17:12 UTC (permalink / raw)
  To: linux-hwmon
  Cc: linux-fsi, linux-kernel, jk, joel, alistair, jdelvare, linux,
	Eddie James

Log an error if the response checksum doesn't match the
calculated checksum.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
---
 drivers/fsi/fsi-occ.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/fsi/fsi-occ.c b/drivers/fsi/fsi-occ.c
index cb05b6dacc9d..524460995465 100644
--- a/drivers/fsi/fsi-occ.c
+++ b/drivers/fsi/fsi-occ.c
@@ -223,7 +223,8 @@ static const struct file_operations occ_fops = {
 	.release = occ_release,
 };
 
-static int occ_verify_checksum(struct occ_response *resp, u16 data_length)
+static int occ_verify_checksum(struct occ *occ, struct occ_response *resp,
+			       u16 data_length)
 {
 	/* Fetch the two bytes after the data for the checksum. */
 	u16 checksum_resp = get_unaligned_be16(&resp->data[data_length]);
@@ -238,8 +239,11 @@ static int occ_verify_checksum(struct occ_response *resp, u16 data_length)
 	for (i = 0; i < data_length; ++i)
 		checksum += resp->data[i];
 
-	if (checksum != checksum_resp)
+	if (checksum != checksum_resp) {
+		dev_err(occ->dev, "Bad checksum: %04x!=%04x\n", checksum,
+			checksum_resp);
 		return -EBADMSG;
+	}
 
 	return 0;
 }
@@ -533,7 +537,7 @@ int fsi_occ_submit(struct device *dev, const void *request, size_t req_len,
 	}
 
 	*resp_len = resp_data_length + 7;
-	rc = occ_verify_checksum(resp, resp_data_length);
+	rc = occ_verify_checksum(occ, resp, resp_data_length);
 
  done:
 	mutex_unlock(&occ->occ_lock);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/4] hwmon: (occ) Start sequence number at one
  2021-02-09 17:12 [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC Eddie James
  2021-02-09 17:12 ` [PATCH 1/4] fsi: occ: Don't accept response from " Eddie James
  2021-02-09 17:12 ` [PATCH 2/4] fsi: occ: Log error for checksum failure Eddie James
@ 2021-02-09 17:12 ` Eddie James
  2021-02-09 18:12   ` Guenter Roeck
  2021-02-09 17:12 ` [PATCH 4/4] hwmon: (occ) Print response status in first poll error message Eddie James
  2021-04-05 15:33 ` [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC Eddie James
  4 siblings, 1 reply; 11+ messages in thread
From: Eddie James @ 2021-02-09 17:12 UTC (permalink / raw)
  To: linux-hwmon
  Cc: linux-fsi, linux-kernel, jk, joel, alistair, jdelvare, linux,
	Eddie James

Initialize the sequence number at one, rather than zero, in order
to prevent false matches with the zero-initialized OCC SRAM
buffer before the OCC is fully initialized.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
---
 drivers/hwmon/occ/common.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
index 7a5e539b567b..ee0c5d12dfdf 100644
--- a/drivers/hwmon/occ/common.c
+++ b/drivers/hwmon/occ/common.c
@@ -1150,6 +1150,8 @@ int occ_setup(struct occ *occ, const char *name)
 {
 	int rc;
 
+	/* start with 1 to avoid false match with zero-initialized SRAM buffer */
+	occ->seq_no = 1;
 	mutex_init(&occ->lock);
 	occ->groups[0] = &occ->group;
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/4] hwmon: (occ) Print response status in first poll error message
  2021-02-09 17:12 [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC Eddie James
                   ` (2 preceding siblings ...)
  2021-02-09 17:12 ` [PATCH 3/4] hwmon: (occ) Start sequence number at one Eddie James
@ 2021-02-09 17:12 ` Eddie James
  2021-02-09 18:12   ` Guenter Roeck
  2021-04-05 15:33 ` [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC Eddie James
  4 siblings, 1 reply; 11+ messages in thread
From: Eddie James @ 2021-02-09 17:12 UTC (permalink / raw)
  To: linux-hwmon
  Cc: linux-fsi, linux-kernel, jk, joel, alistair, jdelvare, linux,
	Eddie James

In order to better debug problems starting up the driver, print
the response status from the OCC in the error logged when the first
poll command fails.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
---
 drivers/hwmon/occ/common.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
index ee0c5d12dfdf..f71d62b57468 100644
--- a/drivers/hwmon/occ/common.c
+++ b/drivers/hwmon/occ/common.c
@@ -1161,8 +1161,9 @@ int occ_setup(struct occ *occ, const char *name)
 		dev_info(occ->bus_dev, "host is not ready\n");
 		return rc;
 	} else if (rc < 0) {
-		dev_err(occ->bus_dev, "failed to get OCC poll response: %d\n",
-			rc);
+		dev_err(occ->bus_dev,
+			"failed to get OCC poll response=%02x: %d\n",
+			occ->resp.return_status, rc);
 		return rc;
 	}
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/4] hwmon: (occ) Start sequence number at one
  2021-02-09 17:12 ` [PATCH 3/4] hwmon: (occ) Start sequence number at one Eddie James
@ 2021-02-09 18:12   ` Guenter Roeck
  0 siblings, 0 replies; 11+ messages in thread
From: Guenter Roeck @ 2021-02-09 18:12 UTC (permalink / raw)
  To: Eddie James
  Cc: linux-hwmon, linux-fsi, linux-kernel, jk, joel, alistair, jdelvare

On Tue, Feb 09, 2021 at 11:12:34AM -0600, Eddie James wrote:
> Initialize the sequence number at one, rather than zero, in order
> to prevent false matches with the zero-initialized OCC SRAM
> buffer before the OCC is fully initialized.
> 
> Signed-off-by: Eddie James <eajames@linux.ibm.com>

Acked-by: Guenter Roeck <linux@roeck-us.net>

For now I'll assume that the series has to be submitted together,
and that this won't happen through the hwmon branch.

Guenter

> ---
>  drivers/hwmon/occ/common.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
> index 7a5e539b567b..ee0c5d12dfdf 100644
> --- a/drivers/hwmon/occ/common.c
> +++ b/drivers/hwmon/occ/common.c
> @@ -1150,6 +1150,8 @@ int occ_setup(struct occ *occ, const char *name)
>  {
>  	int rc;
>  
> +	/* start with 1 to avoid false match with zero-initialized SRAM buffer */
> +	occ->seq_no = 1;
>  	mutex_init(&occ->lock);
>  	occ->groups[0] = &occ->group;
>  
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/4] hwmon: (occ) Print response status in first poll error message
  2021-02-09 17:12 ` [PATCH 4/4] hwmon: (occ) Print response status in first poll error message Eddie James
@ 2021-02-09 18:12   ` Guenter Roeck
  0 siblings, 0 replies; 11+ messages in thread
From: Guenter Roeck @ 2021-02-09 18:12 UTC (permalink / raw)
  To: Eddie James
  Cc: linux-hwmon, linux-fsi, linux-kernel, jk, joel, alistair, jdelvare

On Tue, Feb 09, 2021 at 11:12:35AM -0600, Eddie James wrote:
> In order to better debug problems starting up the driver, print
> the response status from the OCC in the error logged when the first
> poll command fails.
> 
> Signed-off-by: Eddie James <eajames@linux.ibm.com>

Acked-by: Guenter Roeck <linux@roeck-us.net>

> ---
>  drivers/hwmon/occ/common.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hwmon/occ/common.c b/drivers/hwmon/occ/common.c
> index ee0c5d12dfdf..f71d62b57468 100644
> --- a/drivers/hwmon/occ/common.c
> +++ b/drivers/hwmon/occ/common.c
> @@ -1161,8 +1161,9 @@ int occ_setup(struct occ *occ, const char *name)
>  		dev_info(occ->bus_dev, "host is not ready\n");
>  		return rc;
>  	} else if (rc < 0) {
> -		dev_err(occ->bus_dev, "failed to get OCC poll response: %d\n",
> -			rc);
> +		dev_err(occ->bus_dev,
> +			"failed to get OCC poll response=%02x: %d\n",
> +			occ->resp.return_status, rc);
>  		return rc;
>  	}
>  
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC
  2021-02-09 17:12 [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC Eddie James
                   ` (3 preceding siblings ...)
  2021-02-09 17:12 ` [PATCH 4/4] hwmon: (occ) Print response status in first poll error message Eddie James
@ 2021-04-05 15:33 ` Eddie James
  2021-04-06  7:22   ` Joel Stanley
  4 siblings, 1 reply; 11+ messages in thread
From: Eddie James @ 2021-04-05 15:33 UTC (permalink / raw)
  To: joel; +Cc: linux-fsi, linux-kernel, jk, alistair, jdelvare, linux, linux-hwmon

On Tue, 2021-02-09 at 11:12 -0600, Eddie James wrote:
> In the event that the OCC is not initialized when the driver sends a
> poll
> command, the driver may receive an invalid response. This isn't an
> error
> condition unless there is no valid response before the timeout
> expires. So
> change the starting sequence number and check for the un-initialized
> OCC
> state before returning the response in order to detect this condition
> and
> continue waiting if necessary.

Hi Joel,

Do you have any comments on the FSI side of this series?

Thanks,
Eddie

> 
> Eddie James (4):
>   fsi: occ: Don't accept response from un-initialized OCC
>   fsi: occ: Log error for checksum failure
>   hwmon: (occ) Start sequence number at one
>   hwmon: (occ) Print response status in first poll error message
> 
>  drivers/fsi/fsi-occ.c      | 11 ++++++++---
>  drivers/hwmon/occ/common.c |  7 +++++--
>  2 files changed, 13 insertions(+), 5 deletions(-)
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/4] fsi: occ: Don't accept response from un-initialized OCC
  2021-02-09 17:12 ` [PATCH 1/4] fsi: occ: Don't accept response from " Eddie James
@ 2021-04-06  7:18   ` Joel Stanley
  0 siblings, 0 replies; 11+ messages in thread
From: Joel Stanley @ 2021-04-06  7:18 UTC (permalink / raw)
  To: Eddie James
  Cc: linux-hwmon, linux-fsi, Linux Kernel Mailing List, Jeremy Kerr,
	Alistair Popple, Jean Delvare, Guenter Roeck

On Tue, 9 Feb 2021 at 17:12, Eddie James <eajames@linux.ibm.com> wrote:
>
> If the OCC is not initialized and responds as such, the driver
> should continue waiting for a valid response until the timeout
> expires.
>
> Signed-off-by: Eddie James <eajames@linux.ibm.com>

Reviewed-by: Joel Stanley <joel@jms.id.au>

I guess we should add this too?

Fixes: 7ed98dddb764 ("fsi: Add On-Chip Controller (OCC) driver")


> ---
>  drivers/fsi/fsi-occ.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/fsi/fsi-occ.c b/drivers/fsi/fsi-occ.c
> index 10ca2e290655..cb05b6dacc9d 100644
> --- a/drivers/fsi/fsi-occ.c
> +++ b/drivers/fsi/fsi-occ.c
> @@ -495,6 +495,7 @@ int fsi_occ_submit(struct device *dev, const void *request, size_t req_len,
>                         goto done;
>
>                 if (resp->return_status == OCC_RESP_CMD_IN_PRG ||
> +                   resp->return_status == OCC_RESP_CRIT_INIT ||
>                     resp->seq_no != seq_no) {
>                         rc = -ETIMEDOUT;
>
> --
> 2.27.0
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/4] fsi: occ: Log error for checksum failure
  2021-02-09 17:12 ` [PATCH 2/4] fsi: occ: Log error for checksum failure Eddie James
@ 2021-04-06  7:21   ` Joel Stanley
  0 siblings, 0 replies; 11+ messages in thread
From: Joel Stanley @ 2021-04-06  7:21 UTC (permalink / raw)
  To: Eddie James
  Cc: linux-hwmon, linux-fsi, Linux Kernel Mailing List, Jeremy Kerr,
	Alistair Popple, Jean Delvare, Guenter Roeck

On Tue, 9 Feb 2021 at 17:13, Eddie James <eajames@linux.ibm.com> wrote:
>
> Log an error if the response checksum doesn't match the
> calculated checksum.

Reviewed-by: Joel Stanley <joel@jms.id.au>

>
> Signed-off-by: Eddie James <eajames@linux.ibm.com>
> ---
>  drivers/fsi/fsi-occ.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/fsi/fsi-occ.c b/drivers/fsi/fsi-occ.c
> index cb05b6dacc9d..524460995465 100644
> --- a/drivers/fsi/fsi-occ.c
> +++ b/drivers/fsi/fsi-occ.c
> @@ -223,7 +223,8 @@ static const struct file_operations occ_fops = {
>         .release = occ_release,
>  };
>
> -static int occ_verify_checksum(struct occ_response *resp, u16 data_length)
> +static int occ_verify_checksum(struct occ *occ, struct occ_response *resp,
> +                              u16 data_length)
>  {
>         /* Fetch the two bytes after the data for the checksum. */
>         u16 checksum_resp = get_unaligned_be16(&resp->data[data_length]);
> @@ -238,8 +239,11 @@ static int occ_verify_checksum(struct occ_response *resp, u16 data_length)
>         for (i = 0; i < data_length; ++i)
>                 checksum += resp->data[i];
>
> -       if (checksum != checksum_resp)
> +       if (checksum != checksum_resp) {
> +               dev_err(occ->dev, "Bad checksum: %04x!=%04x\n", checksum,
> +                       checksum_resp);

Just confirming that this is unexpected, we won't see this eg. if the
system is booting or when the BMC is reset while the host is running?

>                 return -EBADMSG;
> +       }
>
>         return 0;
>  }
> @@ -533,7 +537,7 @@ int fsi_occ_submit(struct device *dev, const void *request, size_t req_len,
>         }
>
>         *resp_len = resp_data_length + 7;
> -       rc = occ_verify_checksum(resp, resp_data_length);
> +       rc = occ_verify_checksum(occ, resp, resp_data_length);
>
>   done:
>         mutex_unlock(&occ->occ_lock);
> --
> 2.27.0
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC
  2021-04-05 15:33 ` [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC Eddie James
@ 2021-04-06  7:22   ` Joel Stanley
  0 siblings, 0 replies; 11+ messages in thread
From: Joel Stanley @ 2021-04-06  7:22 UTC (permalink / raw)
  To: Eddie James
  Cc: linux-fsi, Linux Kernel Mailing List, Jeremy Kerr,
	Alistair Popple, Jean Delvare, Guenter Roeck, linux-hwmon

On Mon, 5 Apr 2021 at 15:34, Eddie James <eajames@linux.ibm.com> wrote:
>
> On Tue, 2021-02-09 at 11:12 -0600, Eddie James wrote:
> > In the event that the OCC is not initialized when the driver sends a
> > poll
> > command, the driver may receive an invalid response. This isn't an
> > error
> > condition unless there is no valid response before the timeout
> > expires. So
> > change the starting sequence number and check for the un-initialized
> > OCC
> > state before returning the response in order to detect this condition
> > and
> > continue waiting if necessary.
>
> Hi Joel,
>
> Do you have any comments on the FSI side of this series?

They look fine to me. Guenter, if you want to take the series through
the hwmon tree I do not anticipate any conflicts.

Cheers,

Joel

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-04-06  7:22 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-09 17:12 [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC Eddie James
2021-02-09 17:12 ` [PATCH 1/4] fsi: occ: Don't accept response from " Eddie James
2021-04-06  7:18   ` Joel Stanley
2021-02-09 17:12 ` [PATCH 2/4] fsi: occ: Log error for checksum failure Eddie James
2021-04-06  7:21   ` Joel Stanley
2021-02-09 17:12 ` [PATCH 3/4] hwmon: (occ) Start sequence number at one Eddie James
2021-02-09 18:12   ` Guenter Roeck
2021-02-09 17:12 ` [PATCH 4/4] hwmon: (occ) Print response status in first poll error message Eddie James
2021-02-09 18:12   ` Guenter Roeck
2021-04-05 15:33 ` [PATCH 0/4] occ: fsi and hwmon: Fixes for polling un-initialized OCC Eddie James
2021-04-06  7:22   ` Joel Stanley

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.