kernelnewbies.kernelnewbies.org archive mirror
* I2C bus driver TIMEDOUT because of PM autosuspend
@ 2019-11-27 14:49 Primoz Beltram
  0 siblings, 0 replies; 4+ messages in thread
From: Primoz Beltram @ 2019-11-27 14:49 UTC
  To: kernelnewbies

I am analysing a problem with the I2C bus driver where the problem shows
up as the I2C bus being completely blocked. The Linux driver in question
is /drivers/i2c/busses/i2c-xiic.c. This driver is for an FPGA-based I2C
controller.
The problem is difficult to reproduce; it happens very rarely. So far the
main precondition seems to be very heavy traffic on the I2C bus.
In my case this is achieved via the network device driving the SFP LEDs
through /sys/class/leds/ (via gpio-pca953x), with network traffic
generated by iperf3 on a 10 Gbps EMAC. The Linux kernel is 4.14.0 on
ARM64, with a dual-core CPU and 2 GB of memory.
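
As a hypothetical alternative to the iperf3-driven setup, traffic to the LED expander could also be generated directly from user space by toggling an LED's brightness attribute. The C sketch below is only a stress helper, not the method used above; the sysfs path is an assumption (substitute the LED name from /sys/class/leds/ on your board).

```c
#include <stdio.h>

/*
 * Hypothetical stress helper (not from the original setup): toggle an LED
 * brightness attribute n times. If the LED sits behind an I2C GPIO
 * expander (gpio-pca953x), each write should turn into I2C traffic.
 * The path is an assumption; use the LED name from /sys/class/leds/.
 * Returns the number of writes that succeeded.
 */
static int toggle_led(const char *brightness_path, int n)
{
    int ok = 0;

    for (int i = 0; i < n; i++) {
        FILE *f = fopen(brightness_path, "w");
        if (f == NULL)
            break;                       /* missing path or no permission */

        /* Alternate off ("0") and on ("1"). */
        int wrote = fputs((i % 2) ? "1\n" : "0\n", f) >= 0;

        /* sysfs store errors usually surface when the buffer is flushed,
         * i.e. on fclose(), so count a write only if both steps succeed. */
        if (fclose(f) == 0 && wrote)
            ok++;
    }
    return ok;
}
```

For example, toggle_led("/sys/class/leds/<led-name>/brightness", 100000) on the target, where <led-name> is a placeholder. Writing 0 to brightness also removes any trigger attached to that LED, so this replaces, rather than adds to, the network-driven LED activity.
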

Debugging shows that the I2C bus gets into this blocked state when
wait_event_timeout() returns because the timeout expired. The timeout
error handling in this driver is probably not robust enough (the bus
should not remain blocked after an error), but at this point that is
only my speculation; I don't know enough of the details.
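
For reference, wait_event_timeout() returns 0 only when the condition is still false after the timeout has elapsed; a non-zero value means the condition became true. The user-space C model below encodes that documented contract, plus the error mapping we assume xiic_xfer() uses in this era (zero becomes -ETIMEDOUT; otherwise STATE_DONE means success and anything else -EIO). The model_ names and the done_at parameter are hypothetical, and the mapping should be checked against the actual 4.14 source.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * User-space model (not kernel code) of the documented return value of
 * wait_event_timeout(wq, condition, timeout):
 *   0          condition still false when the timeout elapsed
 *   1          condition true, but only once the timeout had elapsed
 *   remaining  condition true earlier; remaining jiffies (always >= 1)
 * done_at is a hypothetical jiffy at which the interrupt handler marks the
 * transfer finished, or -1 if that interrupt never arrives.
 */
static long model_wait_event_timeout(long done_at, long timeout)
{
    if (done_at < 0 || done_at > timeout)
        return 0;                  /* timed out with the condition false */
    if (done_at == timeout)
        return 1;                  /* completed exactly as the timer expired */
    return timeout - done_at;      /* completed in time */
}

/*
 * Assumed shape of the caller's decision (check your tree): a zero return
 * is reported as -ETIMEDOUT, otherwise the final state decides between
 * success (number of messages) and -EIO.
 */
static int model_xfer_result(long wait_ret, bool state_done, int num)
{
    if (wait_ret == 0)
        return -ETIMEDOUT;
    return state_done ? num : -EIO;
}
```

The key point for this bug is the first branch: a lost or late interrupt is only visible to the caller as a 0 return, so whatever state the controller is left in at that moment is what the next transfer inherits.
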

Looking at the data on the oscilloscope, I saw that SCL within a single
I2C transfer sequence (several messages) can be paused for very long
delays, e.g. up to hundreds of usec (SCL is 100 kHz). There are only two
delays in the driver code: the first in wait_event_timeout() and the
second in the autosuspend delay setting. So I started to suspect that
the PM autosuspend delay could play some role here. The case is a bit
strange, because under very busy I2C traffic PM autosuspend should not
be triggered at all. Additionally, if I lower the PM timeout, e.g. from
1000 (default) to 100, I hit the problem sooner (it then takes on the
order of tens of minutes).
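
To put the observed stalls in proportion, a quick calculation helps. The sketch assumes standard-mode I2C at 100 kHz and counts 9 SCL clocks per byte (8 data bits plus ACK/NACK), ignoring START/STOP conditions and clock stretching.

```c
/*
 * Back-of-the-envelope I2C timing, assuming standard mode at 100 kHz.
 * One byte on the wire takes 9 SCL clocks: 8 data bits plus the ACK/NACK bit.
 */
static double scl_period_us(double scl_hz)
{
    return 1e6 / scl_hz;                 /* 100 kHz -> 10 us per clock */
}

static double byte_time_us(double scl_hz)
{
    return 9.0 * scl_period_us(scl_hz);  /* 100 kHz -> 90 us per byte */
}

/* How many byte slots an SCL stall of stall_us would cover. */
static double stall_in_byte_times(double stall_us, double scl_hz)
{
    return stall_us / byte_time_us(scl_hz);
}
```

So a 500 usec stall is 50 clock periods, or a little over five byte-times, which is why such gaps stand out clearly on the oscilloscope.
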

It looks like PM autosuspend is playing some role here.

Power management options in my Linux kernel build .config:
# CONFIG_SUSPEND is not set
# CONFIG_PM is not set
CONFIG_ARCH_SUSPEND_POSSIBLE=y

This is not logical (PM is not configured, so the runtime PM calls are
empty stubs), but it reproduces repeatedly on my test setup:

XIIC_PM_TIMEOUT=1000 (or less) -> I2C timeout error, bus blocked

XIIC_PM_TIMEOUT=10000 -> No I2C timeout error
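
The "empty stubs" remark can be made concrete. Below is a simplified user-space model of the CONFIG_PM=n stubs in include/linux/pm_runtime.h, as we read the 4.14 header (please verify in your tree); the model_ names and struct are hypothetical. If XIIC_PM_TIMEOUT is only used as the argument of pm_runtime_set_autosuspend_delay() (easy to confirm with grep), then with this .config the value is simply discarded and should not, by itself, change runtime PM behaviour.

```c
#include <errno.h>

/*
 * Simplified user-space model of the CONFIG_PM=n stubs in
 * include/linux/pm_runtime.h, as we read the 4.14 header (verify in your
 * tree). The model_ names and struct are hypothetical.
 */
struct model_device {
    int autosuspend_delay_ms;          /* never written: the stub ignores it */
};

/* Stub resume path: returns 1, i.e. "device already active". */
static int model_pm_runtime_get_sync(struct model_device *dev)
{
    (void)dev;
    return 1;
}

/* Stub: the delay argument (XIIC_PM_TIMEOUT in the driver) is discarded. */
static void model_pm_runtime_set_autosuspend_delay(struct model_device *dev,
                                                   int delay_ms)
{
    (void)dev;
    (void)delay_ms;
}

/* Stub suspend path: nothing is scheduled, -ENOSYS is returned. */
static int model_pm_runtime_put_autosuspend(struct model_device *dev)
{
    (void)dev;
    return -ENOSYS;
}
```

This is consistent with the suspicion, stated below, that the 1000 vs 10000 difference points to a timing coincidence elsewhere rather than to runtime PM itself.
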

I also tried building the i2c-xiic driver with the kernel's default
build flags and without optimization (ccflags-y += -O0 -g), both
built into the kernel and as a loadable module.

The test results were always the same.

The workaround that currently works is to change the PM delay from 1000
(default) to 10000.

With this change I cannot reproduce the problem (24-hour runs pass),
but honestly I don't believe this is a fix. More likely the problem is a
timing coincidence somewhere.
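
If CONFIG_PM were enabled, autosuspend would only trigger after the device had been idle for longer than the autosuspend delay since the last pm_runtime_mark_last_busy(). The hypothetical model below counts how often that would happen for a list of transfer start times. It is a simplification (fixed transfer duration, no suspend after the last transfer, no suspend/resume latency) meant only to illustrate why sustained traffic should never let a 1000 ms or 10000 ms delay expire.

```c
/*
 * Hypothetical model of autosuspend with CONFIG_PM=y: after a transfer ends
 * (last-busy mark), the device is suspended only if no new transfer starts
 * within delay_ms. start_ms[] holds sorted transfer start times, busy_ms is
 * how long each transfer keeps the device busy. Returns the number of
 * suspends that would happen between consecutive transfers.
 */
static int count_autosuspends(const long *start_ms, int n, long busy_ms,
                              long delay_ms)
{
    int suspends = 0;

    for (int i = 0; i + 1 < n; i++) {
        long last_busy = start_ms[i] + busy_ms;      /* mark_last_busy() */
        long idle = start_ms[i + 1] - last_busy;     /* gap before next get */
        if (idle > delay_ms)
            suspends++;
    }
    return suspends;
}
```

With transfers every 5 ms the count is 0 for both delays; only gaps longer than the delay, such as idle periods of several seconds, would produce a suspend.
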

I intentionally did not include a detailed description of the embedded
system and test setup (it is a long list), because the main purpose of
this post is:

I would like to expose and discuss this issue with the maintainer of
the code, or others. The root cause may be much more complex and located
somewhere else.

So my question is: whom should I contact? The M: entry in the
MAINTAINERS file, the MODULE_AUTHOR, ...?
How should I proceed?

WBR Primoz


_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: I2C bus driver TIMEDOUT because of PM autosuspend
  2019-12-03  1:30 ` anish singh
@ 2019-12-03  9:26   ` Primoz Beltram
  0 siblings, 0 replies; 4+ messages in thread
From: Primoz Beltram @ 2019-12-03  9:26 UTC
  Cc: kernelnewbies

On 3. 12. 19 02:30, anish singh wrote:
> On Fri, Nov 29, 2019 at 12:53 PM Primoz Beltram
> <primoz.beltram@gmail.com> wrote:
>> What I saw from debugging this problem is that the I2C bus gets blocked
>> when wait_event_timeout() completes because of a timeout. The timeout
>> handling in this driver is probably not robust enough (the bus should
>> not remain blocked), but at this moment these are just my speculations
>> (I don't know enough details).
> Check with a Saleae logic analyzer what happens on the I2C bus.
>
>> So my question is: whom should I contact? The M: entry in the
>> MAINTAINERS file, the MODULE_AUTHOR, ...?
> You can certainly add the author in the loop, but I am afraid
> you won't get any help, as this would be specific to your board. So
> it is best to check with the SoC vendor who wrote your I2C
> bus driver. Or it could be an issue with your I2C client; in that
> case show them your Saleae logic analyzer logs to see
> if they can figure out anything wrong.

Thanks for the reply and suggestions.

My first suspicion was signal integrity on the PCB, but if I add some
debug prints to the i2c-xiic driver (e.g. build with DEBUG defined), the
problem is no longer reproducible (not a single timeout in
wait_event_timeout()).

A signal integrity problem therefore does not look credible to me.

For my system I fixed the problem in the i2c-xiic driver (by handling
the timeout so that the bus is not left blocked).
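
The fix itself is not posted. As one possible shape, under stated assumptions, the timeout branch could both drop the in-flight messages and re-initialise the controller before returning. The user-space sketch below models that; the struct, states and model_reinit() are hypothetical stand-ins, not the real struct xiic_i2c. The driver does contain an xiic_reinit() helper that might serve this role, but we have not verified that it is safe to call from this path.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified controller state; not the real struct xiic_i2c. */
enum model_state { MODEL_IDLE, MODEL_BUSY, MODEL_DONE, MODEL_ERROR };

struct model_xiic {
    enum model_state state;
    const void *tx_msg;
    const void *rx_msg;
    int nmsgs;
    int reinit_count;          /* how many times the controller was reset */
};

/* Hypothetical stand-in for re-initialising the IP core (soft reset,
 * FIFO flush, re-enable). In a real driver this would touch registers. */
static void model_reinit(struct model_xiic *i2c)
{
    i2c->state = MODEL_IDLE;
    i2c->reinit_count++;
}

/* Completion step after the wait; wait_ret == 0 means it timed out. */
static int model_finish_xfer(struct model_xiic *i2c, long wait_ret, int num)
{
    if (wait_ret == 0) {
        /* Timed out: forget the in-flight messages AND reset the
         * controller, so the next transfer does not find it still busy. */
        i2c->tx_msg = NULL;
        i2c->rx_msg = NULL;
        i2c->nmsgs = 0;
        model_reinit(i2c);
        return -ETIMEDOUT;
    }

    int ret = (i2c->state == MODEL_DONE) ? num : -EIO;
    i2c->state = MODEL_IDLE;
    return ret;
}
```

The design choice is simply that the error path must leave the hardware in the same state a fresh probe would, rather than only clearing software bookkeeping.
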

I also found a contact at the SoC vendor and filed a report.

WBR Primoz

* Re: I2C bus driver TIMEDOUT because of PM autosuspend
  2019-11-21 16:59 Primoz Beltram
@ 2019-12-03  1:30 ` anish singh
  2019-12-03  9:26   ` Primoz Beltram
  0 siblings, 1 reply; 4+ messages in thread
From: anish singh @ 2019-12-03  1:30 UTC
  To: primoz.beltram; +Cc: kernelnewbies

On Fri, Nov 29, 2019 at 12:53 PM Primoz Beltram
<primoz.beltram@gmail.com> wrote:
>
> I am analysing a problem with the I2C bus driver where the problem shows
> up as the I2C bus being completely blocked. The Linux driver in question
> is /drivers/i2c/busses/i2c-xiic.c.
> The problem is difficult to reproduce; it happens very rarely. So far I
> saw that the main precondition is very heavy I2C traffic on the bus.
> In my case this is achieved/reproduced via netdev driving SFP LEDs via
> /sys/class/leds/ (via gpio-pca953x). I generate traffic with iperf3.
> Network traffic is on a 10 Gbps EMAC. The Linux kernel is 4.14.0.
> What I saw from debugging this problem is that the I2C bus gets blocked
> when wait_event_timeout() completes because of a timeout. The timeout
> handling in this driver is probably not robust enough (the bus should
> not remain blocked), but at this moment these are just my speculations
> (I don't know enough details).

Check with a Saleae logic analyzer what happens on the I2C bus.

> So my question is: whom should I contact? The M: entry in the
> MAINTAINERS file, the MODULE_AUTHOR, ...?

You can certainly add the author in the loop, but I am afraid
you won't get any help, as this would be specific to your board. So
it is best to check with the SoC vendor who wrote your I2C
bus driver. Or it could be an issue with your I2C client; in that
case show them your Saleae logic analyzer logs to see
if they can figure out anything wrong.

* I2C bus driver TIMEDOUT because of PM autosuspend
@ 2019-11-21 16:59 Primoz Beltram
  2019-12-03  1:30 ` anish singh
  0 siblings, 1 reply; 4+ messages in thread
From: Primoz Beltram @ 2019-11-21 16:59 UTC
  To: kernelnewbies

I am analysing a problem with the I2C bus driver where the problem shows
up as the I2C bus being completely blocked. The Linux driver in question
is /drivers/i2c/busses/i2c-xiic.c.
The problem is difficult to reproduce; it happens very rarely. So far the
main precondition seems to be very heavy traffic on the I2C bus.
In my case this is achieved via the network device driving the SFP LEDs
through /sys/class/leds/ (via gpio-pca953x), with network traffic
generated by iperf3 on a 10 Gbps EMAC. The Linux kernel is 4.14.0.
Debugging shows that the I2C bus gets blocked when wait_event_timeout()
returns because the timeout expired. The timeout handling in this driver
is probably not robust enough (the bus should not remain blocked), but
at this point that is only my speculation; I don't know enough of the
details.

Looking at the driver code and the data on the oscilloscope, I saw that
SCL within a single I2C transfer sequence can be paused for very long
delays, e.g. up to hundreds of usec (SCL is 100 kHz). I started to
suspect that the PM autosuspend delay could play some role here. There
are only two delays in the driver code: the first in wait_event_timeout()
and the second in the autosuspend delay setting. The case is a bit
strange, because under very busy I2C traffic PM autosuspend should not
be triggered at all. Additionally, if I lower the PM timeout, e.g. from
1000 (default) to 100, I hit the problem sooner (it then takes on the
order of tens of minutes).

It looks to me like PM autosuspend is playing some role here.

Power management options in my .config:
# CONFIG_SUSPEND is not set
# CONFIG_PM is not set
CONFIG_ARCH_SUSPEND_POSSIBLE=y

I intentionally did not include a detailed description of the embedded
system and test setup (it is a long list), because the main purpose of
this post is to discuss the issue itself.

The workaround that works for me and the customer (for now) is to
disable PM autosuspend in the driver code, either by increasing the PM
delay from 1000 to 10000 or by disabling autosuspend entirely
(commenting out the call to pm_runtime_put_autosuspend() in xiic_xfer()).
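
For a kernel built with CONFIG_PM=y, it may help to see why commenting out pm_runtime_put_autosuspend() disables autosuspend: every pm_runtime_get_sync() increments the runtime-PM usage counter and nothing decrements it any more, so the device never becomes idle. The hypothetical C model below shows only that counter balance. With CONFIG_PM unset, as in this .config, both calls are stubs and there is no counter at all.

```c
/*
 * Hypothetical model of the runtime-PM usage counter (CONFIG_PM=y).
 * Each get increments it, each put decrements it; the device may only
 * (auto)suspend while the counter is zero.
 */
struct model_rpm {
    int usage_count;
};

static void model_get_sync(struct model_rpm *d)
{
    d->usage_count++;
}

static void model_put_autosuspend(struct model_rpm *d)
{
    if (d->usage_count > 0)
        d->usage_count--;
}

static int model_may_suspend(const struct model_rpm *d)
{
    return d->usage_count == 0;
}

/* Run n transfers; skip_put != 0 mimics commenting out the put call. */
static int model_run_xfers(struct model_rpm *d, int n, int skip_put)
{
    for (int i = 0; i < n; i++) {
        model_get_sync(d);
        /* ... the I2C transfer itself would happen here ... */
        if (!skip_put)
            model_put_autosuspend(d);
    }
    return model_may_suspend(d);
}
```

The leaked reference is what keeps the device powered. As far as we know, the runtime-PM documentation also allows achieving the same effect without code changes by writing a negative value to the device's power/autosuspend_delay_ms sysfs attribute, which prevents runtime suspend while autosuspend is in use.
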

Even with this workaround in place, I would like to expose and discuss
this issue with the maintainer of the code, or others. The root cause
may be much more complex and located somewhere else.

So my question is: whom should I contact? The M: entry in the
MAINTAINERS file, the MODULE_AUTHOR, ...?
How should I proceed?

WBR Primoz

end of thread, other threads:[~2019-12-03  9:28 UTC | newest]

Thread overview: 4+ messages
2019-11-27 14:49 I2C bus driver TIMEDOUT because of PM autosuspend Primoz Beltram
  -- strict thread matches above, loose matches on Subject: below --
2019-11-21 16:59 Primoz Beltram
2019-12-03  1:30 ` anish singh
2019-12-03  9:26   ` Primoz Beltram
