All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] i2c-omap: Trigger bus recovery in lockup case
@ 2017-09-14 15:39 Claudio Foellmi
  2017-09-15 23:31 ` Grygorii Strashko
  0 siblings, 1 reply; 20+ messages in thread
From: Claudio Foellmi @ 2017-09-14 15:39 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Claudio Foellmi, Aaro Koskinen, Wolfram Sang, linux-omap, linux-i2c

A very conservative check for bus activity (to prevent interference
in multimaster setups) prevented the bus recovery methods from being
triggered in the case that SDA or SCL was stuck low.
This defeats the purpose of the recovery mechanism, which was introduced
for exactly this situation (a slave device keeping SDA pulled down).

Note that bus lockups can persist across reboots. The only other options
are to reset or power cycle the offending slave device, and many i2c
slaves do not even have a reset pin.

If we see that one of the lines is low for the entire timeout duration,
we can actually be sure that there is no other master driving the bus.
It is therefore save for us to attempt a bus recovery.

Signed-off-by: Claudio Foellmi <claudio.foellmi@ergon.ch>
---
Caveat: It turns out I don't have the hardware to fully test the
recovery mechanism. My faulty i2c slave device actually pulls down SCL,
not SDA (so the recovery will not succeed in my case).
But by directly connecting SDA to ground, I could at least make sure
the recovery function gets called after applying this patch.

 drivers/i2c/busses/i2c-omap.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c
index 1ebb5e9..4b25fd1 100644
--- a/drivers/i2c/busses/i2c-omap.c
+++ b/drivers/i2c/busses/i2c-omap.c
@@ -563,8 +563,13 @@ static int omap_i2c_wait_for_bb_valid(struct omap_i2c_dev *omap)
 		}
 
 		if (time_after(jiffies, timeout)) {
+			/*
+			 * SDA or SCL were low for the entire timeout without
+			 * any activity detected. Most likely, a slave is
+			 * locking up the bus with no master driving the clock.
+			 */
 			dev_warn(omap->dev, "timeout waiting for bus ready\n");
-			return -ETIMEDOUT;
+			return i2c_recover_bus(&omap->adapter);
 		}
 
 		msleep(1);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] i2c-omap: Trigger bus recovery in lockup case
  2017-09-14 15:39 [PATCH] i2c-omap: Trigger bus recovery in lockup case Claudio Foellmi
@ 2017-09-15 23:31 ` Grygorii Strashko
  2017-09-18  5:24   ` Vignesh R
  0 siblings, 1 reply; 20+ messages in thread
From: Grygorii Strashko @ 2017-09-15 23:31 UTC (permalink / raw)
  To: Claudio Foellmi, Tony Lindgren, Sekhar Nori, Franklin S Cooper Jr
  Cc: Aaro Koskinen, Wolfram Sang, linux-omap, linux-i2c



On 09/14/2017 10:39 AM, Claudio Foellmi wrote:
> A very conservative check for bus activity (to prevent interference
> in multimaster setups) prevented the bus recovery methods from being
> triggered in the case that SDA or SCL was stuck low.
> This defeats the purpose of the recovery mechanism, which was introduced
> for exactly this situation (a slave device keeping SDA pulled down).
> 
> Note that bus lockups can persist across reboots. The only other options
> are to reset or power cycle the offending slave device, and many i2c
> slaves do not even have a reset pin.
> 
> If we see that one of the lines is low for the entire timeout duration,
> we can actually be sure that there is no other master driving the bus.
> It is therefore save for us to attempt a bus recovery.
> 

Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>

> Signed-off-by: Claudio Foellmi <claudio.foellmi@ergon.ch>
> ---
> Caveat: It turns out I don't have the hardware to fully test the
> recovery mechanism. My faulty i2c slave device actually pulls down SCL,
> not SDA (so the recovery will not succeed in my case).
> But by directly connecting SDA to ground, I could at least make sure
> the recovery function gets called after applying this patch.
> 
>   drivers/i2c/busses/i2c-omap.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c
> index 1ebb5e9..4b25fd1 100644
> --- a/drivers/i2c/busses/i2c-omap.c
> +++ b/drivers/i2c/busses/i2c-omap.c
> @@ -563,8 +563,13 @@ static int omap_i2c_wait_for_bb_valid(struct omap_i2c_dev *omap)
>   		}
>   
>   		if (time_after(jiffies, timeout)) {
> +			/*
> +			 * SDA or SCL were low for the entire timeout without
> +			 * any activity detected. Most likely, a slave is
> +			 * locking up the bus with no master driving the clock.
> +			 */
>   			dev_warn(omap->dev, "timeout waiting for bus ready\n");
> -			return -ETIMEDOUT;
> +			return i2c_recover_bus(&omap->adapter);
>   		}
>   
>   		msleep(1);
> 

-- 
regards,
-grygorii

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i2c-omap: Trigger bus recovery in lockup case
  2017-09-15 23:31 ` Grygorii Strashko
@ 2017-09-18  5:24   ` Vignesh R
  2017-09-18 12:01     ` Claudio Foellmi
  0 siblings, 1 reply; 20+ messages in thread
From: Vignesh R @ 2017-09-18  5:24 UTC (permalink / raw)
  To: Strashko, Grygorii, Claudio Foellmi, Tony Lindgren, Nori, Sekhar,
	Cooper Jr.,
	Franklin
  Cc: Aaro Koskinen, Wolfram Sang, linux-omap, linux-i2c



On Saturday 16 September 2017 05:01 AM, Strashko, Grygorii wrote:
> 
> 
> On 09/14/2017 10:39 AM, Claudio Foellmi wrote:
>> A very conservative check for bus activity (to prevent interference
>> in multimaster setups) prevented the bus recovery methods from being
>> triggered in the case that SDA or SCL was stuck low.
>> This defeats the purpose of the recovery mechanism, which was introduced
>> for exactly this situation (a slave device keeping SDA pulled down).
>>
>> Note that bus lockups can persist across reboots. The only other options
>> are to reset or power cycle the offending slave device, and many i2c
>> slaves do not even have a reset pin.
>>
>> If we see that one of the lines is low for the entire timeout duration,
>> we can actually be sure that there is no other master driving the bus.
>> It is therefore save for us to attempt a bus recovery.
>>
> 
> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
> 
>> Signed-off-by: Claudio Foellmi <claudio.foellmi@ergon.ch>
>> ---
>> Caveat: It turns out I don't have the hardware to fully test the
>> recovery mechanism. My faulty i2c slave device actually pulls down SCL,
>> not SDA (so the recovery will not succeed in my case).

Maybe, you could detect SCL stuck low case by reading status of SCL line
from OMAP_I2C_SYSTEST_REG and then call IP reset (there is nothing much
that can be done) instead of bus recovery.

>> But by directly connecting SDA to ground, I could at least make sure
>> the recovery function gets called after applying this patch.
>>

I had seen flood of XRDY & RRDY interrupts as soon as TMODE is set to
0x3 as part of omap_i2c_prepare_recovery() leading to unusable system.
Did you observe this behavior on your system? Could you mention the
platform on which this experiment done?

>>   drivers/i2c/busses/i2c-omap.c | 7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c
>> index 1ebb5e9..4b25fd1 100644
>> --- a/drivers/i2c/busses/i2c-omap.c
>> +++ b/drivers/i2c/busses/i2c-omap.c
>> @@ -563,8 +563,13 @@ static int omap_i2c_wait_for_bb_valid(struct omap_i2c_dev *omap)
>>   		}
>>   
>>   		if (time_after(jiffies, timeout)) {
>> +			/*
>> +			 * SDA or SCL were low for the entire timeout without
>> +			 * any activity detected. Most likely, a slave is
>> +			 * locking up the bus with no master driving the clock.
>> +			 */
>>   			dev_warn(omap->dev, "timeout waiting for bus ready\n");
>> -			return -ETIMEDOUT;
>> +			return i2c_recover_bus(&omap->adapter);
>>   		}
>>   
>>   		msleep(1);
>>
> 

-- 
Regards
Vignesh

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i2c-omap: Trigger bus recovery in lockup case
  2017-09-18  5:24   ` Vignesh R
@ 2017-09-18 12:01     ` Claudio Foellmi
  2017-09-19 10:50       ` Vignesh R
  0 siblings, 1 reply; 20+ messages in thread
From: Claudio Foellmi @ 2017-09-18 12:01 UTC (permalink / raw)
  To: Vignesh R, Strashko, Grygorii, Tony Lindgren, Nori, Sekhar,
	Cooper Jr.,
	Franklin
  Cc: Aaro Koskinen, Wolfram Sang, linux-omap, linux-i2c

On 18.09.2017 07:24, Vignesh R wrote:
> 
> 
> On Saturday 16 September 2017 05:01 AM, Strashko, Grygorii wrote:
>>
>>
>> On 09/14/2017 10:39 AM, Claudio Foellmi wrote:
>>> A very conservative check for bus activity (to prevent interference
>>> in multimaster setups) prevented the bus recovery methods from being
>>> triggered in the case that SDA or SCL was stuck low.
>>> This defeats the purpose of the recovery mechanism, which was introduced
>>> for exactly this situation (a slave device keeping SDA pulled down).
>>>
>>> Note that bus lockups can persist across reboots. The only other options
>>> are to reset or power cycle the offending slave device, and many i2c
>>> slaves do not even have a reset pin.
>>>
>>> If we see that one of the lines is low for the entire timeout duration,
>>> we can actually be sure that there is no other master driving the bus.
>>> It is therefore save for us to attempt a bus recovery.
>>>
>>
>> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
>>
>>> Signed-off-by: Claudio Foellmi <claudio.foellmi@ergon.ch>
>>> ---
>>> Caveat: It turns out I don't have the hardware to fully test the
>>> recovery mechanism. My faulty i2c slave device actually pulls down SCL,
>>> not SDA (so the recovery will not succeed in my case).
> 
> Maybe, you could detect SCL stuck low case by reading status of SCL line
> from OMAP_I2C_SYSTEST_REG and then call IP reset (there is nothing much
> that can be done) instead of bus recovery.
> 

I plan on posting a related patch soon, that will print better messages
if the generic recovery fails. If SCL is stuck low, I think the best we
can do is make the problem visible in the kernel log.

>>> But by directly connecting SDA to ground, I could at least make sure
>>> the recovery function gets called after applying this patch.
>>>
> 
> I had seen flood of XRDY & RRDY interrupts as soon as TMODE is set to
> 0x3 as part of omap_i2c_prepare_recovery() leading to unusable system.
> Did you observe this behavior on your system? Could you mention the
> platform on which this experiment done?
> 

So attempting bus recovery is dangerous on some platforms?
I did not notice any obvious problems (assuming an 'unusable system'
would be hard to miss), but then again I only have one target to test on.
I'm working with a TI AM3352, the slave is a NXP NT3H2211 on i2c-1.

>>>   drivers/i2c/busses/i2c-omap.c | 7 ++++++-
>>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c
>>> index 1ebb5e9..4b25fd1 100644
>>> --- a/drivers/i2c/busses/i2c-omap.c
>>> +++ b/drivers/i2c/busses/i2c-omap.c
>>> @@ -563,8 +563,13 @@ static int omap_i2c_wait_for_bb_valid(struct omap_i2c_dev *omap)
>>>   		}
>>>   
>>>   		if (time_after(jiffies, timeout)) {
>>> +			/*
>>> +			 * SDA or SCL were low for the entire timeout without
>>> +			 * any activity detected. Most likely, a slave is
>>> +			 * locking up the bus with no master driving the clock.
>>> +			 */
>>>   			dev_warn(omap->dev, "timeout waiting for bus ready\n");
>>> -			return -ETIMEDOUT;
>>> +			return i2c_recover_bus(&omap->adapter);
>>>   		}
>>>   
>>>   		msleep(1);
>>>
>>
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i2c-omap: Trigger bus recovery in lockup case
  2017-09-18 12:01     ` Claudio Foellmi
@ 2017-09-19 10:50       ` Vignesh R
  2017-09-20  9:24         ` Claudio Foellmi
                           ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Vignesh R @ 2017-09-19 10:50 UTC (permalink / raw)
  To: Claudio Foellmi, Strashko, Grygorii, Tony Lindgren, Nori, Sekhar,
	Cooper Jr.,
	Franklin
  Cc: Aaro Koskinen, Wolfram Sang, linux-omap, linux-i2c



On Monday 18 September 2017 05:31 PM, Claudio Foellmi wrote:
> On 18.09.2017 07:24, Vignesh R wrote:
>>
>>
>> On Saturday 16 September 2017 05:01 AM, Strashko, Grygorii wrote:
>>>
>>>
>>> On 09/14/2017 10:39 AM, Claudio Foellmi wrote:
>>>> A very conservative check for bus activity (to prevent interference
>>>> in multimaster setups) prevented the bus recovery methods from being
>>>> triggered in the case that SDA or SCL was stuck low.
>>>> This defeats the purpose of the recovery mechanism, which was introduced
>>>> for exactly this situation (a slave device keeping SDA pulled down).
>>>>
>>>> Note that bus lockups can persist across reboots. The only other options
>>>> are to reset or power cycle the offending slave device, and many i2c
>>>> slaves do not even have a reset pin.
>>>>
>>>> If we see that one of the lines is low for the entire timeout duration,
>>>> we can actually be sure that there is no other master driving the bus.
>>>> It is therefore save for us to attempt a bus recovery.
>>>>
>>>
>>> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
>>>
>>>> Signed-off-by: Claudio Foellmi <claudio.foellmi@ergon.ch>
>>>> ---
>>>> Caveat: It turns out I don't have the hardware to fully test the
>>>> recovery mechanism. My faulty i2c slave device actually pulls down SCL,
>>>> not SDA (so the recovery will not succeed in my case).
>>
>> Maybe, you could detect SCL stuck low case by reading status of SCL line
>> from OMAP_I2C_SYSTEST_REG and then call IP reset (there is nothing much
>> that can be done) instead of bus recovery.
>>
> 
> I plan on posting a related patch soon, that will print better messages
> if the generic recovery fails. If SCL is stuck low, I think the best we
> can do is make the problem visible in the kernel log.
> 
>>>> But by directly connecting SDA to ground, I could at least make sure
>>>> the recovery function gets called after applying this patch.
>>>>
>>
>> I had seen flood of XRDY & RRDY interrupts as soon as TMODE is set to
>> 0x3 as part of omap_i2c_prepare_recovery() leading to unusable system.
>> Did you observe this behavior on your system? Could you mention the
>> platform on which this experiment done?
>>
> 
> So attempting bus recovery is dangerous on some platforms?
> I did not notice any obvious problems (assuming an 'unusable system'
> would be hard to miss), but then again I only have one target to test on.
> I'm working with a TI AM3352, the slave is a NXP NT3H2211 on i2c-1.
> 

I hit a situation where when communicating with a faulty i2c device, the
last transaction on the bus does not end with proper STOP condition on
the i2c bus. Since, STOP condition was not detected by IP, Bus Busy will
remain set even though both SCL and SDA are high. Thus,
omap_i2c_wait_for_bb() function would end up calling bus recovery. And
as soon as TMODE is set to 0x3 and ST_EN to 0x1, there is a flood of
XRDY & RRDY interrupts.

This spurious irqs can be reproduced easily by setting TMODE to 0x3 and
ST_EN to 0x1 in OMAP_I2C_SYSTEST_REG when both SCL and SDA are high (bus
is idle) even on AM335x.

So, if you are not seeing irq flood when SCL/SDA is stuck low, then
maybe its safe to enter TMODE 0x3 in such cases. Could you modify the
patch to test whether or not SDA is stuck low before initiating bus
recovery?


>>>>   drivers/i2c/busses/i2c-omap.c | 7 ++++++-
>>>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c
>>>> index 1ebb5e9..4b25fd1 100644
>>>> --- a/drivers/i2c/busses/i2c-omap.c
>>>> +++ b/drivers/i2c/busses/i2c-omap.c
>>>> @@ -563,8 +563,13 @@ static int omap_i2c_wait_for_bb_valid(struct omap_i2c_dev *omap)
>>>>   		}
>>>>   
>>>>   		if (time_after(jiffies, timeout)) {
>>>> +			/*
>>>> +			 * SDA or SCL were low for the entire timeout without
>>>> +			 * any activity detected. Most likely, a slave is
>>>> +			 * locking up the bus with no master driving the clock.
>>>> +			 */
>>>>   			dev_warn(omap->dev, "timeout waiting for bus ready\n");
>>>> -			return -ETIMEDOUT;
>>>> +			return i2c_recover_bus(&omap->adapter);
>>>>   		}
>>>>   
>>>>   		msleep(1);
>>>>
>>>
>>
> 
> --
> Regards
> Claudio
> 

-- 
Regards
Vignesh

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i2c-omap: Trigger bus recovery in lockup case
  2017-09-19 10:50       ` Vignesh R
@ 2017-09-20  9:24         ` Claudio Foellmi
  2017-09-20 15:02         ` Sebastian Reichel
  2017-09-26 12:24         ` Claudio Foellmi
  2 siblings, 0 replies; 20+ messages in thread
From: Claudio Foellmi @ 2017-09-20  9:24 UTC (permalink / raw)
  To: Vignesh R, Strashko, Grygorii, Tony Lindgren, Nori, Sekhar,
	Cooper Jr.,
	Franklin
  Cc: Aaro Koskinen, Wolfram Sang, linux-omap, linux-i2c

On 19.09.2017 12:50, Vignesh R wrote:
> 
> 
> On Monday 18 September 2017 05:31 PM, Claudio Foellmi wrote:
>> On 18.09.2017 07:24, Vignesh R wrote:
>>>
>>>
>>> On Saturday 16 September 2017 05:01 AM, Strashko, Grygorii wrote:
>>>>
>>>>
>>>> On 09/14/2017 10:39 AM, Claudio Foellmi wrote:
>>>>> A very conservative check for bus activity (to prevent interference
>>>>> in multimaster setups) prevented the bus recovery methods from being
>>>>> triggered in the case that SDA or SCL was stuck low.
>>>>> This defeats the purpose of the recovery mechanism, which was introduced
>>>>> for exactly this situation (a slave device keeping SDA pulled down).
>>>>>
>>>>> Note that bus lockups can persist across reboots. The only other options
>>>>> are to reset or power cycle the offending slave device, and many i2c
>>>>> slaves do not even have a reset pin.
>>>>>
>>>>> If we see that one of the lines is low for the entire timeout duration,
>>>>> we can actually be sure that there is no other master driving the bus.
>>>>> It is therefore save for us to attempt a bus recovery.
>>>>>
>>>>
>>>> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
>>>>
>>>>> Signed-off-by: Claudio Foellmi <claudio.foellmi@ergon.ch>
>>>>> ---
>>>>> Caveat: It turns out I don't have the hardware to fully test the
>>>>> recovery mechanism. My faulty i2c slave device actually pulls down SCL,
>>>>> not SDA (so the recovery will not succeed in my case).
>>>
>>> Maybe, you could detect SCL stuck low case by reading status of SCL line
>>> from OMAP_I2C_SYSTEST_REG and then call IP reset (there is nothing much
>>> that can be done) instead of bus recovery.
>>>
>>
>> I plan on posting a related patch soon, that will print better messages
>> if the generic recovery fails. If SCL is stuck low, I think the best we
>> can do is make the problem visible in the kernel log.
>>
>>>>> But by directly connecting SDA to ground, I could at least make sure
>>>>> the recovery function gets called after applying this patch.
>>>>>
>>>
>>> I had seen flood of XRDY & RRDY interrupts as soon as TMODE is set to
>>> 0x3 as part of omap_i2c_prepare_recovery() leading to unusable system.
>>> Did you observe this behavior on your system? Could you mention the
>>> platform on which this experiment done?
>>>
>>
>> So attempting bus recovery is dangerous on some platforms?
>> I did not notice any obvious problems (assuming an 'unusable system'
>> would be hard to miss), but then again I only have one target to test on.
>> I'm working with a TI AM3352, the slave is a NXP NT3H2211 on i2c-1.
>>
>
> I hit a situation where when communicating with a faulty i2c device, the
> last transaction on the bus does not end with proper STOP condition on
> the i2c bus. Since, STOP condition was not detected by IP, Bus Busy will
> remain set even though both SCL and SDA are high. Thus,
> omap_i2c_wait_for_bb() function would end up calling bus recovery. And
> as soon as TMODE is set to 0x3 and ST_EN to 0x1, there is a flood of
> XRDY & RRDY interrupts.
>

If I understand the code correctly, the problem is that omap_i2c_wait_for_bb_valid
does not quite do what the name would suggest:
If SDA, SCL and BB are all 1 for a while, it correctly detects that
the bus is actually free, but then just sets bb_valid=1 and returns.
So now bb_valid is set, even though the value of BB is wrong.
Later, omap_i2c_wait_for_bb runs into a timeout because it trusts that BB flag,
and triggers the recovery.

It would be nice if we could manually set BB to 0 in that case.
But from what I've read in the manual, the only way to reset BB is to
reset the device.

> This spurious irqs can be reproduced easily by setting TMODE to 0x3 and
> ST_EN to 0x1 in OMAP_I2C_SYSTEST_REG when both SCL and SDA are high (bus
> is idle) even on AM335x.
>
> So, if you are not seeing irq flood when SCL/SDA is stuck low, then
> maybe its safe to enter TMODE 0x3 in such cases. Could you modify the
> patch to test whether or not SDA is stuck low before initiating bus
> recovery?
>

What happens if we try to transmit while BB is still 1?
If the transmission succeeds, adding such a check in omap_i2c_wait_for_bb
should be enough to deal with your case (as the message we then send
will contain the stop condition to correct BB).

Btw, it also looks like omap_i2c_wait_for_bb currently assumes that if
recovery succeeds, the bus will then also be free. I'm not sure if that
is actually correct.

>>>>>   drivers/i2c/busses/i2c-omap.c | 7 ++++++-
>>>>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c
>>>>> index 1ebb5e9..4b25fd1 100644
>>>>> --- a/drivers/i2c/busses/i2c-omap.c
>>>>> +++ b/drivers/i2c/busses/i2c-omap.c
>>>>> @@ -563,8 +563,13 @@ static int omap_i2c_wait_for_bb_valid(struct omap_i2c_dev *omap)
>>>>>   		}
>>>>>   
>>>>>   		if (time_after(jiffies, timeout)) {
>>>>> +			/*
>>>>> +			 * SDA or SCL were low for the entire timeout without
>>>>> +			 * any activity detected. Most likely, a slave is
>>>>> +			 * locking up the bus with no master driving the clock.
>>>>> +			 */
>>>>>   			dev_warn(omap->dev, "timeout waiting for bus ready\n");
>>>>> -			return -ETIMEDOUT;
>>>>> +			return i2c_recover_bus(&omap->adapter);
>>>>>   		}
>>>>>   
>>>>>   		msleep(1);
>>>>>
>>>>
>>>
>>
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i2c-omap: Trigger bus recovery in lockup case
  2017-09-19 10:50       ` Vignesh R
  2017-09-20  9:24         ` Claudio Foellmi
@ 2017-09-20 15:02         ` Sebastian Reichel
  2017-09-26 12:24         ` Claudio Foellmi
  2 siblings, 0 replies; 20+ messages in thread
From: Sebastian Reichel @ 2017-09-20 15:02 UTC (permalink / raw)
  To: Vignesh R
  Cc: Claudio Foellmi, Strashko, Grygorii, Tony Lindgren, Nori, Sekhar,
	Cooper Jr.,
	Franklin, Aaro Koskinen, Wolfram Sang, linux-omap, linux-i2c

[-- Attachment #1: Type: text/plain, Size: 493 bytes --]

Hi,

On Tue, Sep 19, 2017 at 04:20:49PM +0530, Vignesh R wrote:
> I hit a situation where when communicating with a faulty i2c device, the
> last transaction on the bus does not end with proper STOP condition on
> the i2c bus.

Yeah, omap-i2c seems to have big issues with a device not acking a
transaction properly. It can be easily triggered on N950 (OMAP3 based):

https://patchwork.kernel.org/patch/9914785/

Please Cc me when sending a patch, so that I can test it on N950.

-- Sebastian

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i2c-omap: Trigger bus recovery in lockup case
  2017-09-19 10:50       ` Vignesh R
  2017-09-20  9:24         ` Claudio Foellmi
  2017-09-20 15:02         ` Sebastian Reichel
@ 2017-09-26 12:24         ` Claudio Foellmi
  2017-09-29 12:52           ` Sebastian Reichel
  2 siblings, 1 reply; 20+ messages in thread
From: Claudio Foellmi @ 2017-09-26 12:24 UTC (permalink / raw)
  To: Vignesh R, Strashko, Grygorii, Tony Lindgren, Nori, Sekhar,
	Cooper Jr.,
	Franklin, Sebastian Reichel
  Cc: Aaro Koskinen, Wolfram Sang, linux-omap, linux-i2c

On 19.09.2017 12:50, Vignesh R wrote:
> 
> 
> On Monday 18 September 2017 05:31 PM, Claudio Foellmi wrote:
>> On 18.09.2017 07:24, Vignesh R wrote:
>>>
>>>
>>> On Saturday 16 September 2017 05:01 AM, Strashko, Grygorii wrote:
>>>>
>>>>
>>>> On 09/14/2017 10:39 AM, Claudio Foellmi wrote:
>>>>> A very conservative check for bus activity (to prevent interference
>>>>> in multimaster setups) prevented the bus recovery methods from being
>>>>> triggered in the case that SDA or SCL was stuck low.
>>>>> This defeats the purpose of the recovery mechanism, which was introduced
>>>>> for exactly this situation (a slave device keeping SDA pulled down).
>>>>>
>>>>> Note that bus lockups can persist across reboots. The only other options
>>>>> are to reset or power cycle the offending slave device, and many i2c
>>>>> slaves do not even have a reset pin.
>>>>>
>>>>> If we see that one of the lines is low for the entire timeout duration,
>>>>> we can actually be sure that there is no other master driving the bus.
>>>>> It is therefore save for us to attempt a bus recovery.
>>>>>
>>>>
>>>> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
>>>>
>>>>> Signed-off-by: Claudio Foellmi <claudio.foellmi@ergon.ch>
>>>>> ---
>>>>> Caveat: It turns out I don't have the hardware to fully test the
>>>>> recovery mechanism. My faulty i2c slave device actually pulls down SCL,
>>>>> not SDA (so the recovery will not succeed in my case).
>>>
>>> Maybe, you could detect SCL stuck low case by reading status of SCL line
>>> from OMAP_I2C_SYSTEST_REG and then call IP reset (there is nothing much
>>> that can be done) instead of bus recovery.
>>>
>>
>> I plan on posting a related patch soon, that will print better messages
>> if the generic recovery fails. If SCL is stuck low, I think the best we
>> can do is make the problem visible in the kernel log.
>>
>>>>> But by directly connecting SDA to ground, I could at least make sure
>>>>> the recovery function gets called after applying this patch.
>>>>>
>>>
>>> I had seen flood of XRDY & RRDY interrupts as soon as TMODE is set to
>>> 0x3 as part of omap_i2c_prepare_recovery() leading to unusable system.
>>> Did you observe this behavior on your system? Could you mention the
>>> platform on which this experiment done?
>>>
>>
>> So attempting bus recovery is dangerous on some platforms?
>> I did not notice any obvious problems (assuming an 'unusable system'
>> would be hard to miss), but then again I only have one target to test on.
>> I'm working with a TI AM3352, the slave is a NXP NT3H2211 on i2c-1.
>>
> 
> I hit a situation where when communicating with a faulty i2c device, the
> last transaction on the bus does not end with proper STOP condition on
> the i2c bus. Since, STOP condition was not detected by IP, Bus Busy will
> remain set even though both SCL and SDA are high. Thus,
> omap_i2c_wait_for_bb() function would end up calling bus recovery. And
> as soon as TMODE is set to 0x3 and ST_EN to 0x1, there is a flood of
> XRDY & RRDY interrupts.
> 
> This spurious irqs can be reproduced easily by setting TMODE to 0x3 and
> ST_EN to 0x1 in OMAP_I2C_SYSTEST_REG when both SCL and SDA are high (bus
> is idle) even on AM335x.
> 
> So, if you are not seeing irq flood when SCL/SDA is stuck low, then
> maybe its safe to enter TMODE 0x3 in such cases. Could you modify the
> patch to test whether or not SDA is stuck low before initiating bus
> recovery?
> 

This sounds more like a problem with the interrupt handler than with
bus recovery, so I'm a bit hesitant to just add such a workaround.
Instead, I spent a few hours looking through the interrupt handling
(and poking my i2c bus with a wire to induce random faults), and
I suspect to have found the underlying cause, or at least part of it:

We sometimes ignore some interrupts (such as RRDY if we think we are
not in receiving mode), but don't really deal with their cause.
As a result, the same interrupt will just be raised again as soon as
we leave the handler. It will then be ignored again, and raised again...

I'm still not quite sure how we can reliably get into such situations in
the first place, but not sending a stop condition seems to be part of it.
Maybe it is somehow connected to the automatic internal state change
that happens as part of AL or NACK interrupts.


Below is a small patch that should test my assumptions.
It clears the incoming fifo and acks the ignored RRDY interrupts.

Sebastian, can you please check if this helps with your problems on N950?
If it does, I'll turn it into a proper standalone patch.


Cc: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Claudio Foellmi <claudio.foellmi@ergon.ch>
---
 drivers/i2c/busses/i2c-omap.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c
index 1ebb5e9..1ad1b0c 100644
--- a/drivers/i2c/busses/i2c-omap.c
+++ b/drivers/i2c/busses/i2c-omap.c
@@ -1009,6 +1009,7 @@ omap_i2c_isr_thread(int this_irq, void *dev_id)
 	struct omap_i2c_dev *omap = dev_id;
 	u16 bits;
 	u16 stat;
+	u16 buf;
 	int err = 0, count = 0;
 
 	do {
@@ -1016,11 +1017,21 @@ omap_i2c_isr_thread(int this_irq, void *dev_id)
 		stat = omap_i2c_read_reg(omap, OMAP_I2C_STAT_REG);
 		stat &= bits;
 
-		/* If we're in receiver mode, ignore XDR/XRDY */
-		if (omap->receiver)
+		/* If we're in receiver mode, ignore XDR/XRDY and vice versa */
+		if (omap->receiver && stat & (OMAP_I2C_STAT_XDR |
+				OMAP_I2C_STAT_XRDY)) {
+			dev_err(omap->dev, "ignoring TX interrupts\n");
 			stat &= ~(OMAP_I2C_STAT_XDR | OMAP_I2C_STAT_XRDY);
-		else
+		} else if (!omap->receiver && stat & (OMAP_I2C_STAT_RDR |
+				OMAP_I2C_STAT_RRDY)) {
+			dev_err(omap->dev, "ignoring RX interrupts\n");
 			stat &= ~(OMAP_I2C_STAT_RDR | OMAP_I2C_STAT_RRDY);
+			buf = omap_i2c_read_reg(omap, OMAP_I2C_BUF_REG);
+			buf |= OMAP_I2C_BUF_RXFIF_CLR;
+			omap_i2c_write_reg(omap, OMAP_I2C_BUF_REG, buf);
+			omap_i2c_ack_stat(omap,
+				OMAP_I2C_STAT_RRDY | OMAP_I2C_STAT_RDR);
+		}
 
 		if (!stat) {
 			/* my work here is done */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] i2c-omap: Trigger bus recovery in lockup case
  2017-09-26 12:24         ` Claudio Foellmi
@ 2017-09-29 12:52           ` Sebastian Reichel
  2017-09-29 15:17             ` Claudio Foellmi
  0 siblings, 1 reply; 20+ messages in thread
From: Sebastian Reichel @ 2017-09-29 12:52 UTC (permalink / raw)
  To: Claudio Foellmi
  Cc: Vignesh R, Strashko, Grygorii, Tony Lindgren, Nori, Sekhar,
	Cooper Jr.,
	Franklin, Aaro Koskinen, Wolfram Sang, linux-omap, linux-i2c

[-- Attachment #1: Type: text/plain, Size: 5394 bytes --]

Hi Claudio,

On Tue, Sep 26, 2017 at 02:24:59PM +0200, Claudio Foellmi wrote:
> On 19.09.2017 12:50, Vignesh R wrote:
> > 
> > 
> > On Monday 18 September 2017 05:31 PM, Claudio Foellmi wrote:
> >> On 18.09.2017 07:24, Vignesh R wrote:
> >>>
> >>>
> >>> On Saturday 16 September 2017 05:01 AM, Strashko, Grygorii wrote:
> >>>>
> >>>>
> >>>> On 09/14/2017 10:39 AM, Claudio Foellmi wrote:
> >>>>> A very conservative check for bus activity (to prevent interference
> >>>>> in multimaster setups) prevented the bus recovery methods from being
> >>>>> triggered in the case that SDA or SCL was stuck low.
> >>>>> This defeats the purpose of the recovery mechanism, which was introduced
> >>>>> for exactly this situation (a slave device keeping SDA pulled down).
> >>>>>
> >>>>> Note that bus lockups can persist across reboots. The only other options
> >>>>> are to reset or power cycle the offending slave device, and many i2c
> >>>>> slaves do not even have a reset pin.
> >>>>>
> >>>>> If we see that one of the lines is low for the entire timeout duration,
> >>>>> we can actually be sure that there is no other master driving the bus.
> >>>>> It is therefore save for us to attempt a bus recovery.
> >>>>>
> >>>>
> >>>> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
> >>>>
> >>>>> Signed-off-by: Claudio Foellmi <claudio.foellmi@ergon.ch>
> >>>>> ---
> >>>>> Caveat: It turns out I don't have the hardware to fully test the
> >>>>> recovery mechanism. My faulty i2c slave device actually pulls down SCL,
> >>>>> not SDA (so the recovery will not succeed in my case).
> >>>
> >>> Maybe, you could detect SCL stuck low case by reading status of SCL line
> >>> from OMAP_I2C_SYSTEST_REG and then call IP reset (there is nothing much
> >>> that can be done) instead of bus recovery.
> >>>
> >>
> >> I plan on posting a related patch soon, that will print better messages
> >> if the generic recovery fails. If SCL is stuck low, I think the best we
> >> can do is make the problem visible in the kernel log.
> >>
> >>>>> But by directly connecting SDA to ground, I could at least make sure
> >>>>> the recovery function gets called after applying this patch.
> >>>>>
> >>>
> >>> I had seen flood of XRDY & RRDY interrupts as soon as TMODE is set to
> >>> 0x3 as part of omap_i2c_prepare_recovery() leading to unusable system.
> >>> Did you observe this behavior on your system? Could you mention the
> >>> platform on which this experiment done?
> >>>
> >>
> >> So attempting bus recovery is dangerous on some platforms?
> >> I did not notice any obvious problems (assuming an 'unusable system'
> >> would be hard to miss), but then again I only have one target to test on.
> >> I'm working with a TI AM3352, the slave is a NXP NT3H2211 on i2c-1.
> >>
> > 
> > I hit a situation where when communicating with a faulty i2c device, the
> > last transaction on the bus does not end with proper STOP condition on
> > the i2c bus. Since, STOP condition was not detected by IP, Bus Busy will
> > remain set even though both SCL and SDA are high. Thus,
> > omap_i2c_wait_for_bb() function would end up calling bus recovery. And
> > as soon as TMODE is set to 0x3 and ST_EN to 0x1, there is a flood of
> > XRDY & RRDY interrupts.
> > 
> > This spurious irqs can be reproduced easily by setting TMODE to 0x3 and
> > ST_EN to 0x1 in OMAP_I2C_SYSTEST_REG when both SCL and SDA are high (bus
> > is idle) even on AM335x.
> > 
> > So, if you are not seeing irq flood when SCL/SDA is stuck low, then
> > maybe its safe to enter TMODE 0x3 in such cases. Could you modify the
> > patch to test whether or not SDA is stuck low before initiating bus
> > recovery?
> > 
> 
> This sounds more like a problem with the interrupt handler than with
> bus recovery, so I'm a bit hesitant to just add such a workaround.
> Instead, I spent a few hours looking through the interrupt handling
> (and poking my i2c bus with a wire to induce random faults), and
> I suspect to have found the underlying cause, or at least part of it:
> 
> We sometimes ignore some interrupts (such as RRDY if we think we are
> not in receiving mode), but don't really deal with their cause.
> As a result, the same interrupt will just be raised again as soon as
> we leave the handler. It will then be ignored again, and raised again...
> 
> I'm still not quite sure how we can reliably get into such situations in
> the first place, but not sending a stop condition seems to be part of it.
> Maybe it is somehow connected to the automatic internal state change
> that happens as part of AL or NACK interrupts.
> 
> 
> Below is a small patch that should test my assumptions.
> It clears the incoming fifo and acks the ignored RRDY interrupts.
> 
> Sebastian, can you please check if this helps with your problems on N950?
> If it does, I'll turn it into a proper standalone patch.

No, it does not. Also no interrupts ignoring messages appearing
in dmesg:

n950# dmesg | grep -E "48072000.i2c|lp5523x"
[    0.791046] omap_i2c 48072000.i2c: bus 1 rev4.4 at 400 kHz
[    4.934265] lp5523x 1-0032: reset command sent (no ACK)!
[    6.003875] omap_i2c 48072000.i2c: controller timed out
[    6.033874] lp5523x 1-0032: device detection err: -110
[    6.039154] lp5523x: probe of 1-0032 failed with error -110

-- Sebastian

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i2c-omap: Trigger bus recovery in lockup case
  2017-09-29 12:52           ` Sebastian Reichel
@ 2017-09-29 15:17             ` Claudio Foellmi
  2017-09-29 16:37               ` Sebastian Reichel
  2017-10-03 10:32               ` Vignesh R
  0 siblings, 2 replies; 20+ messages in thread
From: Claudio Foellmi @ 2017-09-29 15:17 UTC (permalink / raw)
  To: Sebastian Reichel, Vignesh R
  Cc: Strashko, Grygorii, Tony Lindgren, Nori, Sekhar, Cooper Jr.,
	Franklin, Aaro Koskinen, Wolfram Sang, linux-omap, linux-i2c

On 29.09.2017 14:52, Sebastian Reichel wrote:
> Hi Claudio,
> 
> On Tue, Sep 26, 2017 at 02:24:59PM +0200, Claudio Foellmi wrote:
>> On 19.09.2017 12:50, Vignesh R wrote:
>>>
>>>
>>> On Monday 18 September 2017 05:31 PM, Claudio Foellmi wrote:
>>>> On 18.09.2017 07:24, Vignesh R wrote:
>>>>>
>>>>>
>>>>> On Saturday 16 September 2017 05:01 AM, Strashko, Grygorii wrote:
>>>>>>
>>>>>>
>>>>>> On 09/14/2017 10:39 AM, Claudio Foellmi wrote:
>>>>>>> A very conservative check for bus activity (to prevent interference
>>>>>>> in multimaster setups) prevented the bus recovery methods from being
>>>>>>> triggered in the case that SDA or SCL was stuck low.
>>>>>>> This defeats the purpose of the recovery mechanism, which was introduced
>>>>>>> for exactly this situation (a slave device keeping SDA pulled down).
>>>>>>>
>>>>>>> Note that bus lockups can persist across reboots. The only other options
>>>>>>> are to reset or power cycle the offending slave device, and many i2c
>>>>>>> slaves do not even have a reset pin.
>>>>>>>
>>>>>>> If we see that one of the lines is low for the entire timeout duration,
>>>>>>> we can actually be sure that there is no other master driving the bus.
>>>>>>> It is therefore save for us to attempt a bus recovery.
>>>>>>>
>>>>>>
>>>>>> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
>>>>>>
>>>>>>> Signed-off-by: Claudio Foellmi <claudio.foellmi@ergon.ch>
>>>>>>> ---
>>>>>>> Caveat: It turns out I don't have the hardware to fully test the
>>>>>>> recovery mechanism. My faulty i2c slave device actually pulls down SCL,
>>>>>>> not SDA (so the recovery will not succeed in my case).
>>>>>
>>>>> Maybe, you could detect SCL stuck low case by reading status of SCL line
>>>>> from OMAP_I2C_SYSTEST_REG and then call IP reset (there is nothing much
>>>>> that can be done) instead of bus recovery.
>>>>>
>>>>
>>>> I plan on posting a related patch soon, that will print better messages
>>>> if the generic recovery fails. If SCL is stuck low, I think the best we
>>>> can do is make the problem visible in the kernel log.
>>>>
>>>>>>> But by directly connecting SDA to ground, I could at least make sure
>>>>>>> the recovery function gets called after applying this patch.
>>>>>>>
>>>>>
>>>>> I had seen flood of XRDY & RRDY interrupts as soon as TMODE is set to
>>>>> 0x3 as part of omap_i2c_prepare_recovery() leading to unusable system.
>>>>> Did you observe this behavior on your system? Could you mention the
>>>>> platform on which this experiment done?
>>>>>
>>>>
>>>> So attempting bus recovery is dangerous on some platforms?
>>>> I did not notice any obvious problems (assuming an 'unusable system'
>>>> would be hard to miss), but then again I only have one target to test on.
>>>> I'm working with a TI AM3352, the slave is a NXP NT3H2211 on i2c-1.
>>>>
>>>
>>> I hit a situation where when communicating with a faulty i2c device, the
>>> last transaction on the bus does not end with proper STOP condition on
>>> the i2c bus. Since, STOP condition was not detected by IP, Bus Busy will
>>> remain set even though both SCL and SDA are high. Thus,
>>> omap_i2c_wait_for_bb() function would end up calling bus recovery. And
>>> as soon as TMODE is set to 0x3 and ST_EN to 0x1, there is a flood of
>>> XRDY & RRDY interrupts.
>>>
>>> This spurious irqs can be reproduced easily by setting TMODE to 0x3 and
>>> ST_EN to 0x1 in OMAP_I2C_SYSTEST_REG when both SCL and SDA are high (bus
>>> is idle) even on AM335x.
>>>
>>> So, if you are not seeing irq flood when SCL/SDA is stuck low, then
>>> maybe its safe to enter TMODE 0x3 in such cases. Could you modify the
>>> patch to test whether or not SDA is stuck low before initiating bus
>>> recovery?
>>>
>>
>> This sounds more like a problem with the interrupt handler than with
>> bus recovery, so I'm a bit hesitant to just add such a workaround.
>> Instead, I spent a few hours looking through the interrupt handling
>> (and poking my i2c bus with a wire to induce random faults), and
>> I suspect to have found the underlying cause, or at least part of it:
>>
>> We sometimes ignore some interrupts (such as RRDY if we think we are
>> not in receiving mode), but don't really deal with their cause.
>> As a result, the same interrupt will just be raised again as soon as
>> we leave the handler. It will then be ignored again, and raised again...
>>
>> I'm still not quite sure how we can reliably get into such situations in
>> the first place, but not sending a stop condition seems to be part of it.
>> Maybe it is somehow connected to the automatic internal state change
>> that happens as part of AL or NACK interrupts.
>>
>>
>> Below is a small patch that should test my assumptions.
>> It clears the incoming fifo and acks the ignored RRDY interrupts.
>>
>> Sebastian, can you please check if this helps with your problems on N950?
>> If it does, I'll turn it into a proper standalone patch.
> 
> No, it does not. Also no interrupts ignoring messages appearing
> in dmesg:
> 
> n950# dmesg | grep -E "48072000.i2c|lp5523x"
> [    0.791046] omap_i2c 48072000.i2c: bus 1 rev4.4 at 400 kHz
> [    4.934265] lp5523x 1-0032: reset command sent (no ACK)!
> [    6.003875] omap_i2c 48072000.i2c: controller timed out
> [    6.033874] lp5523x 1-0032: device detection err: -110
> [    6.039154] lp5523x: probe of 1-0032 failed with error -110
> 

Hi Sebastian

Thank you for trying it out.
It seems that your symptoms are quite different from the ones that Vignesh
described earlier. He had never-ending storms of spurious interrupts
(which that patch would have addressed), but you don't seem to get
any interrupts at all. Not even the NACK one, which just looks wrong.

If you want to still dig deeper, you can enable debug logs for i2c-omap,
so you can see every single interrupt. But if there are none, I don't see
what we could possibly do to fix it.


Vignesh, do you still have access to any of those devices with interrupt
floods? If so, could you try the previous patch on one of them?

Also note that Sebastian's issue is definitely not caused (or helped)
by bus recovery, the timeout he sees resets the adapter right away.
So he is not affected by my original patch either way.

-- Claudio

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i2c-omap: Trigger bus recovery in lockup case
  2017-09-29 15:17             ` Claudio Foellmi
@ 2017-09-29 16:37               ` Sebastian Reichel
  2017-10-02 23:01                 ` Grygorii Strashko
  2017-10-03 10:32               ` Vignesh R
  1 sibling, 1 reply; 20+ messages in thread
From: Sebastian Reichel @ 2017-09-29 16:37 UTC (permalink / raw)
  To: Claudio Foellmi
  Cc: Vignesh R, Strashko, Grygorii, Tony Lindgren, Nori, Sekhar,
	Cooper Jr.,
	Franklin, Aaro Koskinen, Wolfram Sang, linux-omap, linux-i2c

[-- Attachment #1: Type: text/plain, Size: 3013 bytes --]

Hi,

On Fri, Sep 29, 2017 at 05:17:47PM +0200, Claudio Foellmi wrote:
> >> Sebastian, can you please check if this helps with your problems on N950?
> >> If it does, I'll turn it into a proper standalone patch.
> > 
> > No, it does not. Also no interrupts ignoring messages appearing
> > in dmesg:
> > 
> > n950# dmesg | grep -E "48072000.i2c|lp5523x"
> > [    0.791046] omap_i2c 48072000.i2c: bus 1 rev4.4 at 400 kHz
> > [    4.934265] lp5523x 1-0032: reset command sent (no ACK)!
> > [    6.003875] omap_i2c 48072000.i2c: controller timed out
> > [    6.033874] lp5523x 1-0032: device detection err: -110
> > [    6.039154] lp5523x: probe of 1-0032 failed with error -110
> > 
> 
> Hi Sebastian
> 
> Thank you for trying it out.
> It seems that your symptoms are quite different from the ones that Vignesh
> described earlier. He had never-ending storms of spurious interrupts
> (which that patch would have addressed), but you don't seem to get
> any interrupts at all. Not even the NACK one, which just looks wrong.
> 
> If you want to still dig deeper, you can enable debug logs for i2c-omap,
> so you can see every single interrupt. But if there are none, I don't see
> what we could possibly do to fix it.
> 
> 
> Vignesh, do you still have access to any of those devices with interrupt
> floods? If so, could you try the previous patch on one of them?
> 
> Also note that Sebastian's issue is definitely not caused (or helped)
> by bus recovery, the timeout he sees resets the adapter right away.
> So he is not affected by my original patch either way.
> 
> -- Claudio

Yeah, I don't see IRQ storm, so this might be a different issue.
FWIW this is what I see on N950:

n950# dmesg | grep -E "48072000.i2c|lp55"
[    2.136383] omap_i2c 48072000.i2c: bus 1 rev4.4 at 400 kHz
[    8.212951] omap_i2c 48072000.i2c: addr: 0x0032, len: 2, flags: 0x0, stop: 1
[    8.213287] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
[    8.213592] omap_i2c 48072000.i2c: IRQ (ISR = 0x0002)
[    8.213897] lp5523x 1-0032: reset command sent (no ACK)!
[    8.242462] omap_i2c 48072000.i2c: addr: 0x0032, len: 2, flags: 0x0, stop: 1
[    8.243255] omap_i2c 48072000.i2c: IRQ (ISR = 0x0014)
[    8.243469] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
[    8.253387] omap_i2c 48072000.i2c: addr: 0x0032, len: 1, flags: 0x0, stop: 0
[    8.253631] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
[    9.272735] omap_i2c 48072000.i2c: controller timed out
[    9.278167] lp5523x 1-0032: device detection err: -110
[    9.283843] lp5523x: probe of 1-0032 failed with error -110
[    9.662780] omap_i2c 48072000.i2c: addr: 0x0060, len: 1, flags: 0x0, stop: 0
[    9.670166] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
[    9.675659] omap_i2c 48072000.i2c: IRQ (ISR = 0x0004)
[    9.682617] omap_i2c 48072000.i2c: addr: 0x0060, len: 1, flags: 0x1, stop: 1
[    9.689819] omap_i2c 48072000.i2c: IRQ (ISR = 0x0008)
[    9.695007] omap_i2c 48072000.i2c: IRQ (ISR = 0x0004)

-- Sebastian

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i2c-omap: Trigger bus recovery in lockup case
  2017-09-29 16:37               ` Sebastian Reichel
@ 2017-10-02 23:01                 ` Grygorii Strashko
  2017-10-06 15:22                   ` Sebastian Reichel
  0 siblings, 1 reply; 20+ messages in thread
From: Grygorii Strashko @ 2017-10-02 23:01 UTC (permalink / raw)
  To: Sebastian Reichel, Claudio Foellmi
  Cc: Vignesh R, Tony Lindgren, Nori, Sekhar, Cooper Jr.,
	Franklin, Aaro Koskinen, Wolfram Sang, linux-omap, linux-i2c



On 09/29/2017 11:37 AM, Sebastian Reichel wrote:
> Hi,
> 
> On Fri, Sep 29, 2017 at 05:17:47PM +0200, Claudio Foellmi wrote:
>>>> Sebastian, can you please check if this helps with your problems on N950?
>>>> If it does, I'll turn it into a proper standalone patch.
>>>
>>> No, it does not. Also no interrupts ignoring messages appearing
>>> in dmesg:
>>>
>>> n950# dmesg | grep -E "48072000.i2c|lp5523x"
>>> [    0.791046] omap_i2c 48072000.i2c: bus 1 rev4.4 at 400 kHz
>>> [    4.934265] lp5523x 1-0032: reset command sent (no ACK)!
>>> [    6.003875] omap_i2c 48072000.i2c: controller timed out
>>> [    6.033874] lp5523x 1-0032: device detection err: -110
>>> [    6.039154] lp5523x: probe of 1-0032 failed with error -110
>>>
>>
>> Hi Sebastian
>>
>> Thank you for trying it out.
>> It seems that your symptoms are quite different from the ones that Vignesh
>> described earlier. He had never-ending storms of spurious interrupts
>> (which that patch would have addressed), but you don't seem to get
>> any interrupts at all. Not even the NACK one, which just looks wrong.

regarding spurious interrupts during recovery. In my opinion,
I2C IRQs should be disabled on entering recovery and re-enabled after, as
bus state is unpredictable during recovery and OMAP I2C driver expect to
receive IRQs only and only when omap_i2c_xfer() is called.
(omap->receiver will have correct value only in above case)

>>
>> If you want to still dig deeper, you can enable debug logs for i2c-omap,
>> so you can see every single interrupt. But if there are none, I don't see
>> what we could possibly do to fix it.
>>
>>
>> Vignesh, do you still have access to any of those devices with interrupt
>> floods? If so, could you try the previous patch on one of them?
>>
>> Also note that Sebastian's issue is definitely not caused (or helped)
>> by bus recovery, the timeout he sees resets the adapter right away.
>> So he is not affected by my original patch either way.
>>
>> -- Claudio
> 
> Yeah, I don't see IRQ storm, so this might be a different issue.
> FWIW this is what I see on N950:
> 
> n950# dmesg | grep -E "48072000.i2c|lp55"
> [    2.136383] omap_i2c 48072000.i2c: bus 1 rev4.4 at 400 kHz
> [    8.212951] omap_i2c 48072000.i2c: addr: 0x0032, len: 2, flags: 0x0, stop: 1
> [    8.213287] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
> [    8.213592] omap_i2c 48072000.i2c: IRQ (ISR = 0x0002)
> [    8.213897] lp5523x 1-0032: reset command sent (no ACK)!
> [    8.242462] omap_i2c 48072000.i2c: addr: 0x0032, len: 2, flags: 0x0, stop: 1
> [    8.243255] omap_i2c 48072000.i2c: IRQ (ISR = 0x0014)
> [    8.243469] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)

this second XRDY looks suspicious, as it's received after ARDY.

> [    8.253387] omap_i2c 48072000.i2c: addr: 0x0032, len: 1, flags: 0x0, stop: 0
> [    8.253631] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
> [    9.272735] omap_i2c 48072000.i2c: controller timed out
> [    9.278167] lp5523x 1-0032: device detection err: -110
> [    9.283843] lp5523x: probe of 1-0032 failed with error -110
> [    9.662780] omap_i2c 48072000.i2c: addr: 0x0060, len: 1, flags: 0x0, stop: 0
> [    9.670166] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
> [    9.675659] omap_i2c 48072000.i2c: IRQ (ISR = 0x0004)
> [    9.682617] omap_i2c 48072000.i2c: addr: 0x0060, len: 1, flags: 0x1, stop: 1
> [    9.689819] omap_i2c 48072000.i2c: IRQ (ISR = 0x0008)
> [    9.695007] omap_i2c 48072000.i2c: IRQ (ISR = 0x0004)

 
Wouldn't below diff help:

diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c
index 1ebb5e9..e52bdae 100644
--- a/drivers/i2c/busses/i2c-omap.c
+++ b/drivers/i2c/busses/i2c-omap.c
@@ -959,7 +959,7 @@ static int omap_i2c_transmit_data(struct omap_i2c_dev *omap, u8 num_bytes,
 {
        u16             w;
 
-       while (num_bytes--) {
+       while (omap->buf_len && num_bytes--) {
                w = *omap->buf++;
                omap->buf_len--;




-- 
regards,
-grygorii

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] i2c-omap: Trigger bus recovery in lockup case
  2017-09-29 15:17             ` Claudio Foellmi
  2017-09-29 16:37               ` Sebastian Reichel
@ 2017-10-03 10:32               ` Vignesh R
  2017-10-04  9:43                 ` [PATCH v2] i2c: omap: " Claudio Foellmi
  1 sibling, 1 reply; 20+ messages in thread
From: Vignesh R @ 2017-10-03 10:32 UTC (permalink / raw)
  To: Claudio Foellmi, Sebastian Reichel
  Cc: Strashko, Grygorii, Tony Lindgren, Nori, Sekhar, Cooper Jr.,
	Franklin, Aaro Koskinen, Wolfram Sang, linux-omap, linux-i2c



On Friday 29 September 2017 08:47 PM, Claudio Foellmi wrote:
[...]
>>>> I hit a situation where when communicating with a faulty i2c device, the
>>>> last transaction on the bus does not end with proper STOP condition on
>>>> the i2c bus. Since, STOP condition was not detected by IP, Bus Busy will
>>>> remain set even though both SCL and SDA are high. Thus,
>>>> omap_i2c_wait_for_bb() function would end up calling bus recovery. And
>>>> as soon as TMODE is set to 0x3 and ST_EN to 0x1, there is a flood of
>>>> XRDY & RRDY interrupts.
>>>>
>>>> This spurious irqs can be reproduced easily by setting TMODE to 0x3 and
>>>> ST_EN to 0x1 in OMAP_I2C_SYSTEST_REG when both SCL and SDA are high (bus
>>>> is idle) even on AM335x.
>>>>
>>>> So, if you are not seeing irq flood when SCL/SDA is stuck low, then
>>>> maybe its safe to enter TMODE 0x3 in such cases. Could you modify the
>>>> patch to test whether or not SDA is stuck low before initiating bus
>>>> recovery?
>>>>
>>>
>>> This sounds more like a problem with the interrupt handler than with
>>> bus recovery, so I'm a bit hesitant to just add such a workaround.

I would not say its a workaround. As per I2C spec, bus recovery is to be
tried only when SDA is stuck low. My suggestion is to check this
condition before requesting recovery.

>>> Instead, I spent a few hours looking through the interrupt handling
>>> (and poking my i2c bus with a wire to induce random faults), and
>>> I suspect to have found the underlying cause, or at least part of it:
>>>
>>> We sometimes ignore some interrupts (such as RRDY if we think we are
>>> not in receiving mode), but don't really deal with their cause.
>>> As a result, the same interrupt will just be raised again as soon as
>>> we leave the handler. It will then be ignored again, and raised again...
>>>
>>> I'm still not quite sure how we can reliably get into such situations in
>>> the first place, but not sending a stop condition seems to be part of it.
>>> Maybe it is somehow connected to the automatic internal state change
>>> that happens as part of AL or NACK interrupts.
>>>
>>>
>>> Below is a small patch that should test my assumptions.
>>> It clears the incoming fifo and acks the ignored RRDY interrupts.
>>>
>>> Sebastian, can you please check if this helps with your problems on N950?
>>> If it does, I'll turn it into a proper standalone patch.
>>
>> No, it does not. Also no interrupts ignoring messages appearing
>> in dmesg:
>>
>> n950# dmesg | grep -E "48072000.i2c|lp5523x"
>> [    0.791046] omap_i2c 48072000.i2c: bus 1 rev4.4 at 400 kHz
>> [    4.934265] lp5523x 1-0032: reset command sent (no ACK)!
>> [    6.003875] omap_i2c 48072000.i2c: controller timed out
>> [    6.033874] lp5523x 1-0032: device detection err: -110
>> [    6.039154] lp5523x: probe of 1-0032 failed with error -110
>>
> 
> Hi Sebastian
> 
> Thank you for trying it out.
> It seems that your symptoms are quite different from the ones that Vignesh
> described earlier. He had never-ending storms of spurious interrupts
> (which that patch would have addressed), but you don't seem to get
> any interrupts at all. Not even the NACK one, which just looks wrong.
> 
> If you want to still dig deeper, you can enable debug logs for i2c-omap,
> so you can see every single interrupt. But if there are none, I don't see
> what we could possibly do to fix it.
> 
> 
> Vignesh, do you still have access to any of those devices with interrupt
> floods? If so, could you try the previous patch on one of them?

In past, I had tried to ACK all the IRQs instead of ignoring, but that
did not help. Anyway, I tried your patch, but unfortunately that does
not help either. I see interrupts being ignored, but the IRQ flood
continues. Here is the log:
http://pastebin.ubuntu.com/25666141/

> 
> Also note that Sebastian's issue is definitely not caused (or helped)
> by bus recovery, the timeout he sees resets the adapter right away.
> So he is not affected by my original patch either way.
> 
> -- Claudio
> 

-- 
Regards
Vignesh

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2] i2c: omap: Trigger bus recovery in lockup case
  2017-10-03 10:32               ` Vignesh R
@ 2017-10-04  9:43                 ` Claudio Foellmi
  2017-10-05  6:01                   ` Vignesh R
                                     ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Claudio Foellmi @ 2017-10-04  9:43 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Claudio Foellmi, Nori, Sekhar, Cooper Jr.,
	Franklin, Aaro Koskinen, Wolfram Sang, Sebastian Reichel,
	linux-omap, linux-i2c, Grygorii Strashko, Vignesh R

A very conservative check for bus activity (to prevent interference
in multimaster setups) prevented the bus recovery methods from being
triggered in the case that SDA or SCL was stuck low.
This defeats the purpose of the recovery mechanism, which was introduced
for exactly this situation (a slave device keeping SDA pulled down).

Also added a check to make sure SDA is low before attempting recovery.
If SDA is not stuck low, recovery will not help, so we can skip it.

Note that bus lockups can persist across reboots. The only other options
are to reset or power cycle the offending slave device, and many i2c
slaves do not even have a reset pin.

If we see that one of the lines is low for the entire timeout duration,
we can actually be sure that there is no other master driving the bus.
It is therefore save for us to attempt a bus recovery.

Signed-off-by: Claudio Foellmi <claudio.foellmi@ergon.ch>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Vignesh R <vigneshr@ti.com>
---
Changes since v1:
Added a check before all bus recoveries, to make sure SDA actually is
low. This should prevent most unnecessary attempts, which are not
without risk.

The full discussion of v1 can be found at
http://patchwork.ozlabs.org/patch/813889/

 drivers/i2c/busses/i2c-omap.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c
index 1ebb5e9..8cdf40a 100644
--- a/drivers/i2c/busses/i2c-omap.c
+++ b/drivers/i2c/busses/i2c-omap.c
@@ -473,6 +473,22 @@ static int omap_i2c_init(struct omap_i2c_dev *omap)
 }
 
 /*
+ * Try bus recovery, but only if SDA is actually low.
+ */
+static int omap_i2c_recover_bus(struct omap_i2c_dev *omap)
+{
+	u16 systest;
+
+	systest = omap_i2c_read_reg(omap, OMAP_I2C_SYSTEST_REG);
+	if ((systest & OMAP_I2C_SYSTEST_SCL_I_FUNC) &&
+	    (systest & OMAP_I2C_SYSTEST_SDA_I_FUNC))
+		return 0; /* bus seems to already be fine */
+	if (!(systest & OMAP_I2C_SYSTEST_SCL_I_FUNC))
+		return -ETIMEDOUT; /* recovery would not fix SCL */
+	return i2c_recover_bus(&omap->adapter);
+}
+
+/*
  * Waiting on Bus Busy
  */
 static int omap_i2c_wait_for_bb(struct omap_i2c_dev *omap)
@@ -482,7 +498,7 @@ static int omap_i2c_wait_for_bb(struct omap_i2c_dev *omap)
 	timeout = jiffies + OMAP_I2C_TIMEOUT;
 	while (omap_i2c_read_reg(omap, OMAP_I2C_STAT_REG) & OMAP_I2C_STAT_BB) {
 		if (time_after(jiffies, timeout))
-			return i2c_recover_bus(&omap->adapter);
+			return omap_i2c_recover_bus(omap);
 		msleep(1);
 	}
 
@@ -563,8 +579,13 @@ static int omap_i2c_wait_for_bb_valid(struct omap_i2c_dev *omap)
 		}
 
 		if (time_after(jiffies, timeout)) {
+			/*
+			 * SDA or SCL were low for the entire timeout without
+			 * any activity detected. Most likely, a slave is
+			 * locking up the bus with no master driving the clock.
+			 */
 			dev_warn(omap->dev, "timeout waiting for bus ready\n");
-			return -ETIMEDOUT;
+			return omap_i2c_recover_bus(omap);
 		}
 
 		msleep(1);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] i2c: omap: Trigger bus recovery in lockup case
  2017-10-04  9:43                 ` [PATCH v2] i2c: omap: " Claudio Foellmi
@ 2017-10-05  6:01                   ` Vignesh R
  2017-10-05 12:30                     ` Grygorii Strashko
  2017-10-28 20:52                   ` Wolfram Sang
  2017-10-30 14:19                   ` Wolfram Sang
  2 siblings, 1 reply; 20+ messages in thread
From: Vignesh R @ 2017-10-05  6:01 UTC (permalink / raw)
  To: Claudio Foellmi, Tony Lindgren
  Cc: Nori, Sekhar, Cooper Jr.,
	Franklin, Aaro Koskinen, Wolfram Sang, Sebastian Reichel,
	linux-omap, linux-i2c, Grygorii Strashko

Hi,

On Wednesday 04 October 2017 03:13 PM, Claudio Foellmi wrote:
> A very conservative check for bus activity (to prevent interference
> in multimaster setups) prevented the bus recovery methods from being
> triggered in the case that SDA or SCL was stuck low.
> This defeats the purpose of the recovery mechanism, which was introduced
> for exactly this situation (a slave device keeping SDA pulled down).
> 
> Also added a check to make sure SDA is low before attempting recovery.
> If SDA is not stuck low, recovery will not help, so we can skip it.
> 
> Note that bus lockups can persist across reboots. The only other options
> are to reset or power cycle the offending slave device, and many i2c
> slaves do not even have a reset pin.
> 
> If we see that one of the lines is low for the entire timeout duration,
> we can actually be sure that there is no other master driving the bus.
> It is therefore save for us to attempt a bus recovery.
> 
> Signed-off-by: Claudio Foellmi <claudio.foellmi@ergon.ch>
> Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> Cc: Vignesh R <vigneshr@ti.com>
> ---
> Changes since v1:
> Added a check before all bus recoveries, to make sure SDA actually is
> low. This should prevent most unnecessary attempts, which are not
> without risk.
> 

Thanks for the patch! This fixes my case. I no longer see recovery
attempt or IRQ flood:
Tested-by: Vignesh R <vigneshr@ti.com>

Regards
Vignesh

> The full discussion of v1 can be found at
> http://patchwork.ozlabs.org/patch/813889/
> 
>  drivers/i2c/busses/i2c-omap.c | 25 +++++++++++++++++++++++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c
> index 1ebb5e9..8cdf40a 100644
> --- a/drivers/i2c/busses/i2c-omap.c
> +++ b/drivers/i2c/busses/i2c-omap.c
> @@ -473,6 +473,22 @@ static int omap_i2c_init(struct omap_i2c_dev *omap)
>  }
>  
>  /*
> + * Try bus recovery, but only if SDA is actually low.
> + */
> +static int omap_i2c_recover_bus(struct omap_i2c_dev *omap)
> +{
> +	u16 systest;
> +
> +	systest = omap_i2c_read_reg(omap, OMAP_I2C_SYSTEST_REG);
> +	if ((systest & OMAP_I2C_SYSTEST_SCL_I_FUNC) &&
> +	    (systest & OMAP_I2C_SYSTEST_SDA_I_FUNC))
> +		return 0; /* bus seems to already be fine */
> +	if (!(systest & OMAP_I2C_SYSTEST_SCL_I_FUNC))
> +		return -ETIMEDOUT; /* recovery would not fix SCL */
> +	return i2c_recover_bus(&omap->adapter);
> +}
> +
> +/*
>   * Waiting on Bus Busy
>   */
>  static int omap_i2c_wait_for_bb(struct omap_i2c_dev *omap)
> @@ -482,7 +498,7 @@ static int omap_i2c_wait_for_bb(struct omap_i2c_dev *omap)
>  	timeout = jiffies + OMAP_I2C_TIMEOUT;
>  	while (omap_i2c_read_reg(omap, OMAP_I2C_STAT_REG) & OMAP_I2C_STAT_BB) {
>  		if (time_after(jiffies, timeout))
> -			return i2c_recover_bus(&omap->adapter);
> +			return omap_i2c_recover_bus(omap);
>  		msleep(1);
>  	}
>  
> @@ -563,8 +579,13 @@ static int omap_i2c_wait_for_bb_valid(struct omap_i2c_dev *omap)
>  		}
>  
>  		if (time_after(jiffies, timeout)) {
> +			/*
> +			 * SDA or SCL were low for the entire timeout without
> +			 * any activity detected. Most likely, a slave is
> +			 * locking up the bus with no master driving the clock.
> +			 */
>  			dev_warn(omap->dev, "timeout waiting for bus ready\n");
> -			return -ETIMEDOUT;
> +			return omap_i2c_recover_bus(omap);
>  		}
>  
>  		msleep(1);
> 

-- 
Regards
Vignesh

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] i2c: omap: Trigger bus recovery in lockup case
  2017-10-05  6:01                   ` Vignesh R
@ 2017-10-05 12:30                     ` Grygorii Strashko
  0 siblings, 0 replies; 20+ messages in thread
From: Grygorii Strashko @ 2017-10-05 12:30 UTC (permalink / raw)
  To: Vignesh R, Claudio Foellmi, Tony Lindgren
  Cc: Nori, Sekhar, Cooper Jr.,
	Franklin, Aaro Koskinen, Wolfram Sang, Sebastian Reichel,
	linux-omap, linux-i2c



On 10/05/2017 01:01 AM, Vignesh R wrote:
> Hi,
> 
> On Wednesday 04 October 2017 03:13 PM, Claudio Foellmi wrote:
>> A very conservative check for bus activity (to prevent interference
>> in multimaster setups) prevented the bus recovery methods from being
>> triggered in the case that SDA or SCL was stuck low.
>> This defeats the purpose of the recovery mechanism, which was introduced
>> for exactly this situation (a slave device keeping SDA pulled down).
>>
>> Also added a check to make sure SDA is low before attempting recovery.
>> If SDA is not stuck low, recovery will not help, so we can skip it.
>>
>> Note that bus lockups can persist across reboots. The only other options
>> are to reset or power cycle the offending slave device, and many i2c
>> slaves do not even have a reset pin.
>>
>> If we see that one of the lines is low for the entire timeout duration,
>> we can actually be sure that there is no other master driving the bus.
>> It is therefore save for us to attempt a bus recovery.
>>
>> Signed-off-by: Claudio Foellmi <claudio.foellmi@ergon.ch>
>> Cc: Grygorii Strashko <grygorii.strashko@ti.com>
>> Cc: Vignesh R <vigneshr@ti.com>
>> ---
>> Changes since v1:
>> Added a check before all bus recoveries, to make sure SDA actually is
>> low. This should prevent most unnecessary attempts, which are not
>> without risk.
>>
> 
> Thanks for the patch! This fixes my case. I no longer see recovery
> attempt or IRQ flood:
> Tested-by: Vignesh R <vigneshr@ti.com>

Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>

-- 
regards,
-grygorii

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] i2c-omap: Trigger bus recovery in lockup case
  2017-10-02 23:01                 ` Grygorii Strashko
@ 2017-10-06 15:22                   ` Sebastian Reichel
  0 siblings, 0 replies; 20+ messages in thread
From: Sebastian Reichel @ 2017-10-06 15:22 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Claudio Foellmi, Vignesh R, Tony Lindgren, Nori, Sekhar,
	Cooper Jr.,
	Franklin, Aaro Koskinen, Wolfram Sang, linux-omap, linux-i2c

[-- Attachment #1: Type: text/plain, Size: 5625 bytes --]

Hi,

On Mon, Oct 02, 2017 at 06:01:19PM -0500, Grygorii Strashko wrote:
> 
> 
> On 09/29/2017 11:37 AM, Sebastian Reichel wrote:
> > Hi,
> > 
> > On Fri, Sep 29, 2017 at 05:17:47PM +0200, Claudio Foellmi wrote:
> >>>> Sebastian, can you please check if this helps with your problems on N950?
> >>>> If it does, I'll turn it into a proper standalone patch.
> >>>
> >>> No, it does not. Also no interrupts ignoring messages appearing
> >>> in dmesg:
> >>>
> >>> n950# dmesg | grep -E "48072000.i2c|lp5523x"
> >>> [    0.791046] omap_i2c 48072000.i2c: bus 1 rev4.4 at 400 kHz
> >>> [    4.934265] lp5523x 1-0032: reset command sent (no ACK)!
> >>> [    6.003875] omap_i2c 48072000.i2c: controller timed out
> >>> [    6.033874] lp5523x 1-0032: device detection err: -110
> >>> [    6.039154] lp5523x: probe of 1-0032 failed with error -110
> >>>
> >>
> >> Hi Sebastian
> >>
> >> Thank you for trying it out.
> >> It seems that your symptoms are quite different from the ones that Vignesh
> >> described earlier. He had never-ending storms of spurious interrupts
> >> (which that patch would have addressed), but you don't seem to get
> >> any interrupts at all. Not even the NACK one, which just looks wrong.
> 
> regarding spurious interrupts during recovery. In my opinion,
> I2C IRQs should be disabled on entering recovery and re-enabled after, as
> bus state is unpredictable during recovery and OMAP I2C driver expect to
> receive IRQs only and only when omap_i2c_xfer() is called.
> (omap->receiver will have correct value only in above case)
> 
> >>
> >> If you want to still dig deeper, you can enable debug logs for i2c-omap,
> >> so you can see every single interrupt. But if there are none, I don't see
> >> what we could possibly do to fix it.
> >>
> >>
> >> Vignesh, do you still have access to any of those devices with interrupt
> >> floods? If so, could you try the previous patch on one of them?
> >>
> >> Also note that Sebastian's issue is definitely not caused (or helped)
> >> by bus recovery, the timeout he sees resets the adapter right away.
> >> So he is not affected by my original patch either way.
> >>
> >> -- Claudio
> > 
> > Yeah, I don't see IRQ storm, so this might be a different issue.
> > FWIW this is what I see on N950:
> > 
> > n950# dmesg | grep -E "48072000.i2c|lp55"
> > [    2.136383] omap_i2c 48072000.i2c: bus 1 rev4.4 at 400 kHz
> > [    8.212951] omap_i2c 48072000.i2c: addr: 0x0032, len: 2, flags: 0x0, stop: 1
> > [    8.213287] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
> > [    8.213592] omap_i2c 48072000.i2c: IRQ (ISR = 0x0002)
> > [    8.213897] lp5523x 1-0032: reset command sent (no ACK)!
> > [    8.242462] omap_i2c 48072000.i2c: addr: 0x0032, len: 2, flags: 0x0, stop: 1
> > [    8.243255] omap_i2c 48072000.i2c: IRQ (ISR = 0x0014)
> > [    8.243469] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
> 
> this second XRDY looks suspicious, as it's received after ARDY.
> 
> > [    8.253387] omap_i2c 48072000.i2c: addr: 0x0032, len: 1, flags: 0x0, stop: 0
> > [    8.253631] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
> > [    9.272735] omap_i2c 48072000.i2c: controller timed out
> > [    9.278167] lp5523x 1-0032: device detection err: -110
> > [    9.283843] lp5523x: probe of 1-0032 failed with error -110
> > [    9.662780] omap_i2c 48072000.i2c: addr: 0x0060, len: 1, flags: 0x0, stop: 0
> > [    9.670166] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
> > [    9.675659] omap_i2c 48072000.i2c: IRQ (ISR = 0x0004)
> > [    9.682617] omap_i2c 48072000.i2c: addr: 0x0060, len: 1, flags: 0x1, stop: 1
> > [    9.689819] omap_i2c 48072000.i2c: IRQ (ISR = 0x0008)
> > [    9.695007] omap_i2c 48072000.i2c: IRQ (ISR = 0x0004)
> 
>  
> Wouldn't below diff help:
> 
> diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c
> index 1ebb5e9..e52bdae 100644
> --- a/drivers/i2c/busses/i2c-omap.c
> +++ b/drivers/i2c/busses/i2c-omap.c
> @@ -959,7 +959,7 @@ static int omap_i2c_transmit_data(struct omap_i2c_dev *omap, u8 num_bytes,
>  {
>         u16             w;
>  
> -       while (num_bytes--) {
> +       while (omap->buf_len && num_bytes--) {
>                 w = *omap->buf++;
>                 omap->buf_len--;

that did not change anything as far as I can see:

n950# dmesg | grep -E "48072000.i2c|lp5523x"
[    2.139801] omap_i2c 48072000.i2c: bus 1 rev4.4 at 400 kHz
[    8.236968] omap_i2c 48072000.i2c: addr: 0x0032, len: 2, flags: 0x0, stop: 1
[    8.237457] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
[    8.237792] omap_i2c 48072000.i2c: IRQ (ISR = 0x0002)
[    8.238189] lp5523x 1-0032: reset command sent (no ACK)!
[    8.256500] omap_i2c 48072000.i2c: addr: 0x0032, len: 2, flags: 0x0, stop: 1
[    8.256835] omap_i2c 48072000.i2c: IRQ (ISR = 0x0014)
[    8.257080] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
[    8.258941] omap_i2c 48072000.i2c: addr: 0x0032, len: 1, flags: 0x0, stop: 0
[    8.259246] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
[    9.276062] omap_i2c 48072000.i2c: controller timed out
[    9.306060] lp5523x 1-0032: device detection err: -110
[    9.311492] lp5523x: probe of 1-0032 failed with error -110
[    9.705993] omap_i2c 48072000.i2c: addr: 0x0060, len: 1, flags: 0x0, stop: 0
[    9.713256] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
[    9.718536] omap_i2c 48072000.i2c: IRQ (ISR = 0x0004)
[    9.723754] omap_i2c 48072000.i2c: addr: 0x0060, len: 1, flags: 0x1, stop: 1
[    9.731079] omap_i2c 48072000.i2c: IRQ (ISR = 0x0008)
[    9.736297] omap_i2c 48072000.i2c: IRQ (ISR = 0x0004)

-- Sebastian

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] i2c: omap: Trigger bus recovery in lockup case
  2017-10-04  9:43                 ` [PATCH v2] i2c: omap: " Claudio Foellmi
  2017-10-05  6:01                   ` Vignesh R
@ 2017-10-28 20:52                   ` Wolfram Sang
  2017-10-30  9:11                     ` Claudio Foellmi
  2017-10-30 14:19                   ` Wolfram Sang
  2 siblings, 1 reply; 20+ messages in thread
From: Wolfram Sang @ 2017-10-28 20:52 UTC (permalink / raw)
  To: Claudio Foellmi
  Cc: Tony Lindgren, Nori, Sekhar, Cooper Jr.,
	Franklin, Aaro Koskinen, Sebastian Reichel, linux-omap,
	linux-i2c, Grygorii Strashko, Vignesh R

[-- Attachment #1: Type: text/plain, Size: 263 bytes --]


> +	if (!(systest & OMAP_I2C_SYSTEST_SCL_I_FUNC))
> +		return -ETIMEDOUT; /* recovery would not fix SCL */

I'd rather return EBUSY here. It is not that the transaction timed out.
If you agree, I can fix it locally, no need to resend.

Rest looks good, thanks!


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] i2c: omap: Trigger bus recovery in lockup case
  2017-10-28 20:52                   ` Wolfram Sang
@ 2017-10-30  9:11                     ` Claudio Foellmi
  0 siblings, 0 replies; 20+ messages in thread
From: Claudio Foellmi @ 2017-10-30  9:11 UTC (permalink / raw)
  To: Wolfram Sang
  Cc: Tony Lindgren, Nori, Sekhar, Cooper Jr.,
	Franklin, Aaro Koskinen, Sebastian Reichel, linux-omap,
	linux-i2c, Grygorii Strashko, Vignesh R

On 10/28/2017 10:52 PM, Wolfram Sang wrote:
> 
>> +	if (!(systest & OMAP_I2C_SYSTEST_SCL_I_FUNC))
>> +		return -ETIMEDOUT; /* recovery would not fix SCL */
> 
> I'd rather return EBUSY here. It is not that the transaction timed out.
> If you agree, I can fix it locally, no need to resend.
> 
> Rest looks good, thanks!
>

Agreed, EBUSY is the better choice since that is also what the actual recovery would return.

Thanks,
Claudio

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] i2c: omap: Trigger bus recovery in lockup case
  2017-10-04  9:43                 ` [PATCH v2] i2c: omap: " Claudio Foellmi
  2017-10-05  6:01                   ` Vignesh R
  2017-10-28 20:52                   ` Wolfram Sang
@ 2017-10-30 14:19                   ` Wolfram Sang
  2 siblings, 0 replies; 20+ messages in thread
From: Wolfram Sang @ 2017-10-30 14:19 UTC (permalink / raw)
  To: Claudio Foellmi
  Cc: Tony Lindgren, Nori, Sekhar, Cooper Jr.,
	Franklin, Aaro Koskinen, Sebastian Reichel, linux-omap,
	linux-i2c, Grygorii Strashko, Vignesh R

[-- Attachment #1: Type: text/plain, Size: 1177 bytes --]

On Wed, Oct 04, 2017 at 11:43:45AM +0200, Claudio Foellmi wrote:
> A very conservative check for bus activity (to prevent interference
> in multimaster setups) prevented the bus recovery methods from being
> triggered in the case that SDA or SCL was stuck low.
> This defeats the purpose of the recovery mechanism, which was introduced
> for exactly this situation (a slave device keeping SDA pulled down).
> 
> Also added a check to make sure SDA is low before attempting recovery.
> If SDA is not stuck low, recovery will not help, so we can skip it.
> 
> Note that bus lockups can persist across reboots. The only other options
> are to reset or power cycle the offending slave device, and many i2c
> slaves do not even have a reset pin.
> 
> If we see that one of the lines is low for the entire timeout duration,
> we can actually be sure that there is no other master driving the bus.
> It is therefore save for us to attempt a bus recovery.
> 
> Signed-off-by: Claudio Foellmi <claudio.foellmi@ergon.ch>
> Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> Cc: Vignesh R <vigneshr@ti.com>

Applied with the discussed change to for-next, thanks!


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-10-30 14:19 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-14 15:39 [PATCH] i2c-omap: Trigger bus recovery in lockup case Claudio Foellmi
2017-09-15 23:31 ` Grygorii Strashko
2017-09-18  5:24   ` Vignesh R
2017-09-18 12:01     ` Claudio Foellmi
2017-09-19 10:50       ` Vignesh R
2017-09-20  9:24         ` Claudio Foellmi
2017-09-20 15:02         ` Sebastian Reichel
2017-09-26 12:24         ` Claudio Foellmi
2017-09-29 12:52           ` Sebastian Reichel
2017-09-29 15:17             ` Claudio Foellmi
2017-09-29 16:37               ` Sebastian Reichel
2017-10-02 23:01                 ` Grygorii Strashko
2017-10-06 15:22                   ` Sebastian Reichel
2017-10-03 10:32               ` Vignesh R
2017-10-04  9:43                 ` [PATCH v2] i2c: omap: " Claudio Foellmi
2017-10-05  6:01                   ` Vignesh R
2017-10-05 12:30                     ` Grygorii Strashko
2017-10-28 20:52                   ` Wolfram Sang
2017-10-30  9:11                     ` Claudio Foellmi
2017-10-30 14:19                   ` Wolfram Sang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.