All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling
@ 2017-09-27 21:35 Shawn Nematbakhsh
  2017-09-27 22:19 ` Brian Norris
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Shawn Nematbakhsh @ 2017-09-27 21:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: lee.jones, jonathanh, briannorris, Shawn Nematbakhsh

For host commands that take a long time to process, cros ec can return
early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
status with EC_CMD_GET_COMMS_STATUS until completion of the command.

None of the above applies when data link errors are encountered. When
errors such as EC_SPI_PAST_END are encountered during command
transmission, it usually means the command was not received by the EC.
Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
almost always the wrong decision, and can result in host commands
silently being lost.

Reported-and-tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Shawn Nematbakhsh <shawnn@chromium.org>
---
 drivers/mfd/cros_ec_spi.c | 52 ++++++++++++++++++++++-------------------------
 1 file changed, 24 insertions(+), 28 deletions(-)

diff --git a/drivers/mfd/cros_ec_spi.c b/drivers/mfd/cros_ec_spi.c
index c9714072e224..d019b5b00b67 100644
--- a/drivers/mfd/cros_ec_spi.c
+++ b/drivers/mfd/cros_ec_spi.c
@@ -377,6 +377,7 @@ static int cros_ec_pkt_xfer_spi(struct cros_ec_device *ec_dev,
 	u8 *ptr;
 	u8 *rx_buf;
 	u8 sum;
+	u8 rx_byte;
 	int ret = 0, final_ret;
 
 	len = cros_ec_prepare_tx(ec_dev, ec_msg);
@@ -421,25 +422,22 @@ static int cros_ec_pkt_xfer_spi(struct cros_ec_device *ec_dev,
 	if (!ret) {
 		/* Verify that EC can process command */
 		for (i = 0; i < len; i++) {
-			switch (rx_buf[i]) {
-			case EC_SPI_PAST_END:
-			case EC_SPI_RX_BAD_DATA:
-			case EC_SPI_NOT_READY:
-				ret = -EAGAIN;
-				ec_msg->result = EC_RES_IN_PROGRESS;
-			default:
+			rx_byte = rx_buf[i];
+			if (rx_byte == EC_SPI_PAST_END  ||
+			    rx_byte == EC_SPI_RX_BAD_DATA ||
+			    rx_byte == EC_SPI_NOT_READY) {
+				ret = -EREMOTEIO;
 				break;
 			}
-			if (ret)
-				break;
 		}
-		if (!ret)
-			ret = cros_ec_spi_receive_packet(ec_dev,
-					ec_msg->insize + sizeof(*response));
-	} else {
-		dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
 	}
 
+	if (!ret)
+		ret = cros_ec_spi_receive_packet(ec_dev,
+				ec_msg->insize + sizeof(*response));
+	else
+		dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
+
 	final_ret = terminate_request(ec_dev);
 
 	spi_bus_unlock(ec_spi->spi->master);
@@ -508,6 +506,7 @@ static int cros_ec_cmd_xfer_spi(struct cros_ec_device *ec_dev,
 	int i, len;
 	u8 *ptr;
 	u8 *rx_buf;
+	u8 rx_byte;
 	int sum;
 	int ret = 0, final_ret;
 
@@ -544,25 +543,22 @@ static int cros_ec_cmd_xfer_spi(struct cros_ec_device *ec_dev,
 	if (!ret) {
 		/* Verify that EC can process command */
 		for (i = 0; i < len; i++) {
-			switch (rx_buf[i]) {
-			case EC_SPI_PAST_END:
-			case EC_SPI_RX_BAD_DATA:
-			case EC_SPI_NOT_READY:
-				ret = -EAGAIN;
-				ec_msg->result = EC_RES_IN_PROGRESS;
-			default:
+			rx_byte = rx_buf[i];
+			if (rx_byte == EC_SPI_PAST_END  ||
+			    rx_byte == EC_SPI_RX_BAD_DATA ||
+			    rx_byte == EC_SPI_NOT_READY) {
+				ret = -EREMOTEIO;
 				break;
 			}
-			if (ret)
-				break;
 		}
-		if (!ret)
-			ret = cros_ec_spi_receive_response(ec_dev,
-					ec_msg->insize + EC_MSG_TX_PROTO_BYTES);
-	} else {
-		dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
 	}
 
+	if (!ret)
+		ret = cros_ec_spi_receive_response(ec_dev,
+				ec_msg->insize + EC_MSG_TX_PROTO_BYTES);
+	else
+		dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
+
 	final_ret = terminate_request(ec_dev);
 
 	spi_bus_unlock(ec_spi->spi->master);
-- 
2.12.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling
  2017-09-27 21:35 [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling Shawn Nematbakhsh
@ 2017-09-27 22:19 ` Brian Norris
  2017-11-14 17:00   ` Shawn N
  2017-11-22  3:54 ` Benson Leung
  2017-11-29 12:11 ` Lee Jones
  2 siblings, 1 reply; 10+ messages in thread
From: Brian Norris @ 2017-09-27 22:19 UTC (permalink / raw)
  To: Shawn Nematbakhsh; +Cc: linux-kernel, lee.jones, jonathanh, Benson Leung

On Wed, Sep 27, 2017 at 02:35:27PM -0700, Shawn Nematbakhsh wrote:
> For host commands that take a long time to process, cros ec can return
> early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
> status with EC_CMD_GET_COMMS_STATUS until completion of the command.
> 
> None of the above applies when data link errors are encountered. When
> errors such as EC_SPI_PAST_END are encountered during command
> transmission, it usually means the command was not received by the EC.
> Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
> almost always the wrong decision, and can result in host commands
> silently being lost.
> 
> Reported-and-tested-by: Jon Hunter <jonathanh@nvidia.com>
> Signed-off-by: Shawn Nematbakhsh <shawnn@chromium.org>
> ---
>  drivers/mfd/cros_ec_spi.c | 52 ++++++++++++++++++++++-------------------------
>  1 file changed, 24 insertions(+), 28 deletions(-)

I'm still not sure I understand the full extent of the
originally-reported error (it's still likely a SPI transport issue?),
but I believe this patch is good anyway:

Reviewed-by: Brian Norris <briannorris@chromium.org>

I wonder if we should tone down the BUG_ON()'s in drivers/mfd/cros_ec*
and drivers/platform/chrome/* too. That's basically a no-no these days,
as all of these type of things should be able to gracefully propagate
errors, no matter how "unlikely" it should be to see a crazy protocol
version number or a bad message length.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling
  2017-09-27 22:19 ` Brian Norris
@ 2017-11-14 17:00   ` Shawn N
  2017-11-29 11:50     ` Jon Hunter
  0 siblings, 1 reply; 10+ messages in thread
From: Shawn N @ 2017-11-14 17:00 UTC (permalink / raw)
  To: Brian Norris; +Cc: linux-kernel, Lee Jones, Jon Hunter, Benson Leung

On Wed, Sep 27, 2017 at 3:19 PM, Brian Norris <briannorris@chromium.org> wrote:
> On Wed, Sep 27, 2017 at 02:35:27PM -0700, Shawn Nematbakhsh wrote:
>> For host commands that take a long time to process, cros ec can return
>> early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
>> status with EC_CMD_GET_COMMS_STATUS until completion of the command.
>>
>> None of the above applies when data link errors are encountered. When
>> errors such as EC_SPI_PAST_END are encountered during command
>> transmission, it usually means the command was not received by the EC.
>> Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
>> almost always the wrong decision, and can result in host commands
>> silently being lost.
>>
>> Reported-and-tested-by: Jon Hunter <jonathanh@nvidia.com>
>> Signed-off-by: Shawn Nematbakhsh <shawnn@chromium.org>
>> ---
>>  drivers/mfd/cros_ec_spi.c | 52 ++++++++++++++++++++++-------------------------
>>  1 file changed, 24 insertions(+), 28 deletions(-)
>
> I'm still not sure I understand the full extent of the
> originally-reported error (it's still likely a SPI transport issue?),
> but I believe this patch is good anyway:
>
> Reviewed-by: Brian Norris <briannorris@chromium.org>

Jon tracked down the root cause of the originally-reported error, but
we should still land this patch, as it fixes error signaling that was
previously broken.

>
> I wonder if we should tone down the BUG_ON()'s in drivers/mfd/cros_ec*
> and drivers/platform/chrome/* too. That's basically a no-no these days,
> as all of these type of things should be able to gracefully propagate
> errors, no matter how "unlikely" it should be to see a crazy protocol
> version number or a bad message length.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling
  2017-09-27 21:35 [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling Shawn Nematbakhsh
  2017-09-27 22:19 ` Brian Norris
@ 2017-11-22  3:54 ` Benson Leung
  2017-11-29 12:11 ` Lee Jones
  2 siblings, 0 replies; 10+ messages in thread
From: Benson Leung @ 2017-11-22  3:54 UTC (permalink / raw)
  To: Shawn Nematbakhsh
  Cc: linux-kernel, lee.jones, jonathanh, briannorris, bleung, bleung

[-- Attachment #1: Type: text/plain, Size: 3999 bytes --]

Hi Shawn,

On Wed, Sep 27, 2017 at 02:35:27PM -0700, Shawn Nematbakhsh wrote:
> For host commands that take a long time to process, cros ec can return
> early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
> status with EC_CMD_GET_COMMS_STATUS until completion of the command.
> 
> None of the above applies when data link errors are encountered. When
> errors such as EC_SPI_PAST_END are encountered during command
> transmission, it usually means the command was not received by the EC.
> Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
> almost always the wrong decision, and can result in host commands
> silently being lost.
> 
> Reported-and-tested-by: Jon Hunter <jonathanh@nvidia.com>
> Signed-off-by: Shawn Nematbakhsh <shawnn@chromium.org>

Acked-by: Benson Leung <bleung@chromium.org>

> ---
>  drivers/mfd/cros_ec_spi.c | 52 ++++++++++++++++++++++-------------------------
>  1 file changed, 24 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/mfd/cros_ec_spi.c b/drivers/mfd/cros_ec_spi.c
> index c9714072e224..d019b5b00b67 100644
> --- a/drivers/mfd/cros_ec_spi.c
> +++ b/drivers/mfd/cros_ec_spi.c
> @@ -377,6 +377,7 @@ static int cros_ec_pkt_xfer_spi(struct cros_ec_device *ec_dev,
>  	u8 *ptr;
>  	u8 *rx_buf;
>  	u8 sum;
> +	u8 rx_byte;
>  	int ret = 0, final_ret;
>  
>  	len = cros_ec_prepare_tx(ec_dev, ec_msg);
> @@ -421,25 +422,22 @@ static int cros_ec_pkt_xfer_spi(struct cros_ec_device *ec_dev,
>  	if (!ret) {
>  		/* Verify that EC can process command */
>  		for (i = 0; i < len; i++) {
> -			switch (rx_buf[i]) {
> -			case EC_SPI_PAST_END:
> -			case EC_SPI_RX_BAD_DATA:
> -			case EC_SPI_NOT_READY:
> -				ret = -EAGAIN;
> -				ec_msg->result = EC_RES_IN_PROGRESS;
> -			default:
> +			rx_byte = rx_buf[i];
> +			if (rx_byte == EC_SPI_PAST_END  ||
> +			    rx_byte == EC_SPI_RX_BAD_DATA ||
> +			    rx_byte == EC_SPI_NOT_READY) {
> +				ret = -EREMOTEIO;
>  				break;
>  			}
> -			if (ret)
> -				break;
>  		}
> -		if (!ret)
> -			ret = cros_ec_spi_receive_packet(ec_dev,
> -					ec_msg->insize + sizeof(*response));
> -	} else {
> -		dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
>  	}
>  
> +	if (!ret)
> +		ret = cros_ec_spi_receive_packet(ec_dev,
> +				ec_msg->insize + sizeof(*response));
> +	else
> +		dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
> +
>  	final_ret = terminate_request(ec_dev);
>  
>  	spi_bus_unlock(ec_spi->spi->master);
> @@ -508,6 +506,7 @@ static int cros_ec_cmd_xfer_spi(struct cros_ec_device *ec_dev,
>  	int i, len;
>  	u8 *ptr;
>  	u8 *rx_buf;
> +	u8 rx_byte;
>  	int sum;
>  	int ret = 0, final_ret;
>  
> @@ -544,25 +543,22 @@ static int cros_ec_cmd_xfer_spi(struct cros_ec_device *ec_dev,
>  	if (!ret) {
>  		/* Verify that EC can process command */
>  		for (i = 0; i < len; i++) {
> -			switch (rx_buf[i]) {
> -			case EC_SPI_PAST_END:
> -			case EC_SPI_RX_BAD_DATA:
> -			case EC_SPI_NOT_READY:
> -				ret = -EAGAIN;
> -				ec_msg->result = EC_RES_IN_PROGRESS;
> -			default:
> +			rx_byte = rx_buf[i];
> +			if (rx_byte == EC_SPI_PAST_END  ||
> +			    rx_byte == EC_SPI_RX_BAD_DATA ||
> +			    rx_byte == EC_SPI_NOT_READY) {
> +				ret = -EREMOTEIO;
>  				break;
>  			}
> -			if (ret)
> -				break;
>  		}
> -		if (!ret)
> -			ret = cros_ec_spi_receive_response(ec_dev,
> -					ec_msg->insize + EC_MSG_TX_PROTO_BYTES);
> -	} else {
> -		dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
>  	}
>  
> +	if (!ret)
> +		ret = cros_ec_spi_receive_response(ec_dev,
> +				ec_msg->insize + EC_MSG_TX_PROTO_BYTES);
> +	else
> +		dev_err(ec_dev->dev, "spi transfer failed: %d\n", ret);
> +
>  	final_ret = terminate_request(ec_dev);
>  
>  	spi_bus_unlock(ec_spi->spi->master);
> -- 
> 2.12.2
> 

-- 
Benson Leung
Staff Software Engineer
Chrome OS Kernel
Google Inc.
bleung@google.com
Chromium OS Project
bleung@chromium.org

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling
  2017-11-14 17:00   ` Shawn N
@ 2017-11-29 11:50     ` Jon Hunter
  0 siblings, 0 replies; 10+ messages in thread
From: Jon Hunter @ 2017-11-29 11:50 UTC (permalink / raw)
  To: Lee Jones; +Cc: Shawn N, Brian Norris, linux-kernel, Benson Leung

Hi Lee,

On 14/11/17 17:00, Shawn N wrote:
> On Wed, Sep 27, 2017 at 3:19 PM, Brian Norris <briannorris@chromium.org> wrote:
>> On Wed, Sep 27, 2017 at 02:35:27PM -0700, Shawn Nematbakhsh wrote:
>>> For host commands that take a long time to process, cros ec can return
>>> early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
>>> status with EC_CMD_GET_COMMS_STATUS until completion of the command.
>>>
>>> None of the above applies when data link errors are encountered. When
>>> errors such as EC_SPI_PAST_END are encountered during command
>>> transmission, it usually means the command was not received by the EC.
>>> Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
>>> almost always the wrong decision, and can result in host commands
>>> silently being lost.
>>>
>>> Reported-and-tested-by: Jon Hunter <jonathanh@nvidia.com>
>>> Signed-off-by: Shawn Nematbakhsh <shawnn@chromium.org>
>>> ---
>>>  drivers/mfd/cros_ec_spi.c | 52 ++++++++++++++++++++++-------------------------
>>>  1 file changed, 24 insertions(+), 28 deletions(-)
>>
>> I'm still not sure I understand the full extent of the
>> originally-reported error (it's still likely a SPI transport issue?),
>> but I believe this patch is good anyway:
>>
>> Reviewed-by: Brian Norris <briannorris@chromium.org>
> 
> Jon tracked down the root cause of the originally-reported error, but
> we should still land this patch, as it fixes error signaling that was
> previously broken.

Can you also queue this one as a fix for v4.15?

Cheers
Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling
  2017-09-27 21:35 [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling Shawn Nematbakhsh
  2017-09-27 22:19 ` Brian Norris
  2017-11-22  3:54 ` Benson Leung
@ 2017-11-29 12:11 ` Lee Jones
  2018-03-26 16:48   ` Enric Balletbo Serra
  2 siblings, 1 reply; 10+ messages in thread
From: Lee Jones @ 2017-11-29 12:11 UTC (permalink / raw)
  To: Shawn Nematbakhsh; +Cc: linux-kernel, jonathanh, briannorris

On Wed, 27 Sep 2017, Shawn Nematbakhsh wrote:

> For host commands that take a long time to process, cros ec can return
> early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
> status with EC_CMD_GET_COMMS_STATUS until completion of the command.
> 
> None of the above applies when data link errors are encountered. When
> errors such as EC_SPI_PAST_END are encountered during command
> transmission, it usually means the command was not received by the EC.
> Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
> almost always the wrong decision, and can result in host commands
> silently being lost.
> 
> Reported-and-tested-by: Jon Hunter <jonathanh@nvidia.com>
> Signed-off-by: Shawn Nematbakhsh <shawnn@chromium.org>
> ---
>  drivers/mfd/cros_ec_spi.c | 52 ++++++++++++++++++++++-------------------------
>  1 file changed, 24 insertions(+), 28 deletions(-)

Applied, thanks.

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling
  2017-11-29 12:11 ` Lee Jones
@ 2018-03-26 16:48   ` Enric Balletbo Serra
  2018-03-26 17:26     ` Alexandru M Stan
  0 siblings, 1 reply; 10+ messages in thread
From: Enric Balletbo Serra @ 2018-03-26 16:48 UTC (permalink / raw)
  To: Lee Jones
  Cc: Shawn Nematbakhsh, linux-kernel, jonathanh, Brian Norris,
	Tomeu Vizoso, Doug Anderson, Alexandru M Stan

Dear all,

Cc'ing some more chromium folks.

2017-11-29 13:11 GMT+01:00 Lee Jones <lee.jones@linaro.org>:
> On Wed, 27 Sep 2017, Shawn Nematbakhsh wrote:
>
>> For host commands that take a long time to process, cros ec can return
>> early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
>> status with EC_CMD_GET_COMMS_STATUS until completion of the command.
>>
>> None of the above applies when data link errors are encountered. When
>> errors such as EC_SPI_PAST_END are encountered during command
>> transmission, it usually means the command was not received by the EC.
>> Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
>> almost always the wrong decision, and can result in host commands
>> silently being lost.
>>
>> Reported-and-tested-by: Jon Hunter <jonathanh@nvidia.com>
>> Signed-off-by: Shawn Nematbakhsh <shawnn@chromium.org>
>> ---
>>  drivers/mfd/cros_ec_spi.c | 52 ++++++++++++++++++++++-------------------------
>>  1 file changed, 24 insertions(+), 28 deletions(-)
>
> Applied, thanks.
>

This patch is a bit old and is already applied but I would like to
discuss an issue that Tomeu found playing with kernelci and a Veyron
Jaq attached to our lab.

Seems that after this patch, the veyron jaq spits out lots of spi
tranfer error messages. See full log here [1]

 cros-ec-spi spi0.0: spi transfer failed: -121
 cros-ec-spi spi0.0: Command xfer error (err:-121)
 cros-ec-i2c-tunnel ff110000.spi:ec@0:i2c-tunnel: Error transferring
EC i2c message -121

The issue is random, not always happens, but when happens makes
cros-ec unusable. Reproduce the issue is easier if CONFIG_SMP is not
set.

What happens is that the master starts the transmission and the
cros-ec returns an spi error event (EC_SPI_RX_BAD_DATA  - data is
0xfb). The difference between after and before the patch is that now
the cros-ec does not recover, whereas without this patch after some
tries it succeeds (note that before the patch the affected code
returned -EAGAIN whereas now returns -EREMOTEIO)

I think that reverting this patch is NOT the solution, but I don't
have enough knowledge yet to understand why the cros-ec fails. I need
to look at the cros-ec firmware to understand better what is
happening, but meanwhile, thoughts? clues?

An important note is that I did not reproduce the issue on a Veyron
Minnie, even with CONFIG_SMP disabled.

[1] https://lava.collabora.co.uk/scheduler/job/1085493#L905

Best regards,
  Enric

> --
> Lee Jones
> Linaro STMicroelectronics Landing Team Lead
> Linaro.org │ Open source software for ARM SoCs
> Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling
  2018-03-26 16:48   ` Enric Balletbo Serra
@ 2018-03-26 17:26     ` Alexandru M Stan
  2018-03-27 10:49       ` Enric Balletbo Serra
  0 siblings, 1 reply; 10+ messages in thread
From: Alexandru M Stan @ 2018-03-26 17:26 UTC (permalink / raw)
  To: Enric Balletbo Serra
  Cc: Lee Jones, Shawn Nematbakhsh, linux-kernel, jonathanh,
	Brian Norris, Tomeu Vizoso, Doug Anderson

Hello Enric

On Mon, Mar 26, 2018 at 9:48 AM, Enric Balletbo Serra
<eballetbo@gmail.com> wrote:
> Dear all,
>
> Cc'ing some more chromium folks.
>
> 2017-11-29 13:11 GMT+01:00 Lee Jones <lee.jones@linaro.org>:
>> On Wed, 27 Sep 2017, Shawn Nematbakhsh wrote:
>>
>>> For host commands that take a long time to process, cros ec can return
>>> early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
>>> status with EC_CMD_GET_COMMS_STATUS until completion of the command.
>>>
>>> None of the above applies when data link errors are encountered. When
>>> errors such as EC_SPI_PAST_END are encountered during command
>>> transmission, it usually means the command was not received by the EC.
>>> Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
>>> almost always the wrong decision, and can result in host commands
>>> silently being lost.
>>>
>>> Reported-and-tested-by: Jon Hunter <jonathanh@nvidia.com>
>>> Signed-off-by: Shawn Nematbakhsh <shawnn@chromium.org>
>>> ---
>>>  drivers/mfd/cros_ec_spi.c | 52 ++++++++++++++++++++++-------------------------
>>>  1 file changed, 24 insertions(+), 28 deletions(-)
>>
>> Applied, thanks.
>>
>
> This patch is a bit old and is already applied but I would like to
> discuss an issue that Tomeu found playing with kernelci and a Veyron
> Jaq attached to our lab.
>
> Seems that after this patch, the veyron jaq spits out lots of spi
> tranfer error messages. See full log here [1]
>
>  cros-ec-spi spi0.0: spi transfer failed: -121
>  cros-ec-spi spi0.0: Command xfer error (err:-121)
>  cros-ec-i2c-tunnel ff110000.spi:ec@0:i2c-tunnel: Error transferring
> EC i2c message -121
>
> The issue is random, not always happens, but when happens makes
> cros-ec unusable. Reproduce the issue is easier if CONFIG_SMP is not
> set.
>
> What happens is that the master starts the transmission and the
> cros-ec returns an spi error event (EC_SPI_RX_BAD_DATA  - data is
> 0xfb). The difference between after and before the patch is that now
> the cros-ec does not recover, whereas without this patch after some
> tries it succeeds (note that before the patch the affected code
> returned -EAGAIN whereas now returns -EREMOTEIO)
>
> I think that reverting this patch is NOT the solution, but I don't
> have enough knowledge yet to understand why the cros-ec fails. I need
> to look at the cros-ec firmware to understand better what is
> happening, but meanwhile, thoughts? clues?
>
> An important note is that I did not reproduce the issue on a Veyron
> Minnie, even with CONFIG_SMP disabled.
>
> [1] https://lava.collabora.co.uk/scheduler/job/1085493#L905
>
> Best regards,
>   Enric
>
>> --
>> Lee Jones
>> Linaro STMicroelectronics Landing Team Lead
>> Linaro.org │ Open source software for ARM SoCs
>> Follow Linaro: Facebook | Twitter | Blog

Would it be possible for you to run "ectool version" on both your
machines? Based on the code the EC is running we might have some hints
on what the differences are.

You can find both ectool and the code the ec runs on
https://chromium.googlesource.com/chromiumos/platform/ec/+/firmware-veyron-6588.B.
Though I would use ectool from the master branch.

One thing I suspect is different is that veyron_minnie regularly polls
an accelerometer, depending on the userspace you're running it's
possible it unwedged itself with a few accelerometer requests.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling
  2018-03-26 17:26     ` Alexandru M Stan
@ 2018-03-27 10:49       ` Enric Balletbo Serra
  2018-03-29 22:08         ` Alexandru M Stan
  0 siblings, 1 reply; 10+ messages in thread
From: Enric Balletbo Serra @ 2018-03-27 10:49 UTC (permalink / raw)
  To: Alexandru M Stan
  Cc: Lee Jones, Shawn Nematbakhsh, linux-kernel, jonathanh,
	Brian Norris, Tomeu Vizoso, Doug Anderson

Hi Alexandru

2018-03-26 19:26 GMT+02:00 Alexandru M Stan <amstan@chromium.org>:
> Hello Enric
>
> On Mon, Mar 26, 2018 at 9:48 AM, Enric Balletbo Serra
> <eballetbo@gmail.com> wrote:
>> Dear all,
>>
>> Cc'ing some more chromium folks.
>>
>> 2017-11-29 13:11 GMT+01:00 Lee Jones <lee.jones@linaro.org>:
>>> On Wed, 27 Sep 2017, Shawn Nematbakhsh wrote:
>>>
>>>> For host commands that take a long time to process, cros ec can return
>>>> early by signaling a EC_RES_IN_PROGRESS result. The host must then poll
>>>> status with EC_CMD_GET_COMMS_STATUS until completion of the command.
>>>>
>>>> None of the above applies when data link errors are encountered. When
>>>> errors such as EC_SPI_PAST_END are encountered during command
>>>> transmission, it usually means the command was not received by the EC.
>>>> Treating such errors as if they were 'EC_RES_IN_PROGRESS' results is
>>>> almost always the wrong decision, and can result in host commands
>>>> silently being lost.
>>>>
>>>> Reported-and-tested-by: Jon Hunter <jonathanh@nvidia.com>
>>>> Signed-off-by: Shawn Nematbakhsh <shawnn@chromium.org>
>>>> ---
>>>>  drivers/mfd/cros_ec_spi.c | 52 ++++++++++++++++++++++-------------------------
>>>>  1 file changed, 24 insertions(+), 28 deletions(-)
>>>
>>> Applied, thanks.
>>>
>>
>> This patch is a bit old and is already applied but I would like to
>> discuss an issue that Tomeu found playing with kernelci and a Veyron
>> Jaq attached to our lab.
>>
>> Seems that after this patch, the veyron jaq spits out lots of spi
>> tranfer error messages. See full log here [1]
>>
>>  cros-ec-spi spi0.0: spi transfer failed: -121
>>  cros-ec-spi spi0.0: Command xfer error (err:-121)
>>  cros-ec-i2c-tunnel ff110000.spi:ec@0:i2c-tunnel: Error transferring
>> EC i2c message -121
>>
>> The issue is random, not always happens, but when happens makes
>> cros-ec unusable. Reproduce the issue is easier if CONFIG_SMP is not
>> set.
>>
>> What happens is that the master starts the transmission and the
>> cros-ec returns an spi error event (EC_SPI_RX_BAD_DATA  - data is
>> 0xfb). The difference between after and before the patch is that now
>> the cros-ec does not recover, whereas without this patch after some
>> tries it succeeds (note that before the patch the affected code
>> returned -EAGAIN whereas now returns -EREMOTEIO)
>>
>> I think that reverting this patch is NOT the solution, but I don't
>> have enough knowledge yet to understand why the cros-ec fails. I need
>> to look at the cros-ec firmware to understand better what is
>> happening, but meanwhile, thoughts? clues?
>>
>> An important note is that I did not reproduce the issue on a Veyron
>> Minnie, even with CONFIG_SMP disabled.
>>
>> [1] https://lava.collabora.co.uk/scheduler/job/1085493#L905
>>
>> Best regards,
>>   Enric
>>
>>> --
>>> Lee Jones
>>> Linaro STMicroelectronics Landing Team Lead
>>> Linaro.org │ Open source software for ARM SoCs
>>> Follow Linaro: Facebook | Twitter | Blog
>
> Would it be possible for you to run "ectool version" on both your
> machines? Based on the code the EC is running we might have some hints
> on what the differences are.
>

I think that accessing to the ec console should give the same result, right?

In such case here is:

Veyron Minnie ( ASUS Chromebook Flip C100PA )
-------------------------------------------------------------------
Chip:    stm stm32f07x
Board:   0
RO:      minnie_v1.1.2697-faafaa5
RW:      minnie_v1.1.2712-242f6bd
Build: minnie_v1.1.2712-242f6bd 2016-11-03 00:34:41 @build196-m2

Veyron Jaq (  Haier Mighty MP )
--------------------------------------------------------------------
Chip:    stm stm32f07x
Board:   0
RO:      mighty_v1.1.2680-6727e1d
RW:      mighty_v1.1.2712-242f6bd
Build: mighty_v1.1.2680-6727e1d 2015-03-24 01:12:48 @build290-m2

We're running the RW firmware and I just discovered that our jaq is a
mighty (but I guess it's the same?)

Thanks,
  Enric

> You can find both ectool and the code the ec runs on
> https://chromium.googlesource.com/chromiumos/platform/ec/+/firmware-veyron-6588.B.
> Though I would use ectool from the master branch.
>
> One thing I suspect is different is that veyron_minnie regularly polls
> an accelerometer, depending on the userspace you're running it's
> possible it unwedged itself with a few accelerometer requests.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling
  2018-03-27 10:49       ` Enric Balletbo Serra
@ 2018-03-29 22:08         ` Alexandru M Stan
  0 siblings, 0 replies; 10+ messages in thread
From: Alexandru M Stan @ 2018-03-29 22:08 UTC (permalink / raw)
  To: Enric Balletbo Serra
  Cc: Lee Jones, Shawn Nematbakhsh, linux-kernel, jonathanh,
	Brian Norris, Tomeu Vizoso, Doug Anderson

On Tue, Mar 27, 2018 at 3:49 AM, Enric Balletbo Serra
<eballetbo@gmail.com> wrote:
> I think that accessing to the ec console should give the same result, right?

Yep, even better.

>
> In such case here is:
>
> Veyron Minnie ( ASUS Chromebook Flip C100PA )
> -------------------------------------------------------------------
> Chip:    stm stm32f07x
> Board:   0
> RO:      minnie_v1.1.2697-faafaa5
> RW:      minnie_v1.1.2712-242f6bd
> Build: minnie_v1.1.2712-242f6bd 2016-11-03 00:34:41 @build196-m2
>
> Veyron Jaq (  Haier Mighty MP )
> --------------------------------------------------------------------
> Chip:    stm stm32f07x
> Board:   0
> RO:      mighty_v1.1.2680-6727e1d
> RW:      mighty_v1.1.2712-242f6bd
> Build: mighty_v1.1.2680-6727e1d 2015-03-24 01:12:48 @build290-m2

Looks like your mighty is running the RO firmware, whereas your minnie
runs RW. Is it possible you have the 0x200 bit in the gbb flags set on
mighty? That would prevent the RO->RW transition, and give you an
older firmware.

6727e1d..242f6bd is quite the change. I see some spi changes too,
though i believe it's mostly at power state transitions
(suspend/resume). Other changes include battery settings (yeah.. you
should really avoid running that RO if you can avoid it) and a ton of
accelerometer stuff for minnie.

If it's not the gbb flags, and we can't figure it out why you're stuck
in RO, you can also use "sysjump RW" to force the RW copy on mighty.
See if there's any behavior changes in what you care about.

>
> We're running the RW firmware and I just discovered that our jaq is a
> mighty (but I guess it's the same?)

They're essentially the same, but they're running slightly different
firmware. In practice the only difference is that mighty's firmware
reads an extra gpio for the battery presence.

Feel free to diff the board/{jaq,mighty} ec folders for yourself for
more details/assurances.

>
> Thanks,
>   Enric

All in all I'm not sure that the version differences are enough to
explain the spi errors you see in the kernel.

My bet is back to the accelerometer stuff:
Are you running chromeos ui on this device (is there a chromeos-chrome
process constantly polling the accelerometer, so asking the cros-ec
driver for transfers)?
Another thing to make sure accelerometer is disabled is to run
"accelrate 0 0" on the minnie EC.
If none of that accelerometer stuff is enabled, minnie should
essentially act like a mighty/jaq.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-03-29 22:08 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-27 21:35 [PATCH] mfd: cros ec: spi: Fix "in progress" error signaling Shawn Nematbakhsh
2017-09-27 22:19 ` Brian Norris
2017-11-14 17:00   ` Shawn N
2017-11-29 11:50     ` Jon Hunter
2017-11-22  3:54 ` Benson Leung
2017-11-29 12:11 ` Lee Jones
2018-03-26 16:48   ` Enric Balletbo Serra
2018-03-26 17:26     ` Alexandru M Stan
2018-03-27 10:49       ` Enric Balletbo Serra
2018-03-29 22:08         ` Alexandru M Stan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.