* Errant readings on LM81 with T2080 SoC
From: Chris Packham @ 2021-03-07 22:52 UTC (permalink / raw)
  To: jdelvare, Guenter Roeck
  Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev

Hi,

I've got a system using a PowerPC T2080 SoC that, among other things,
has an LM81 hwmon chip.

Under a high CPU load we see errant readings from the LM81 as well as 
actual failures. It's the errant readings that cause the most concern 
since we can easily ignore the read errors in our monitoring application 
(although it would be better if they weren't there at all).

I'm able to reproduce this by using a test application[0] to create an
artificially high CPU load and then repeatedly checking for the all-1s
values from the LM81 datasheet[1] (page 17). The all-1s readings stick
out as they are obviously higher than the voltage rails that are 
connected and disagree with measurements taken with a multimeter.
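
As a rough check on the values being grepped for in the output, the following sketch scales an all-1s (0xFF) register reading into millivolts. The nominal rail voltages and the divisor of 192 are assumptions, chosen because they reproduce numbers seen in this thread; the datasheet[1] remains the authority. The function name is hypothetical.

```python
# Assumption: an input at its nominal voltage reads as 192 counts (3/4 of
# the 8-bit range), so an all-1s reading (255) maps to 255/192 of nominal.
NOMINAL_MV = [2500, 2700, 3300, 5000]  # assumed nominal rails, in mV
COUNTS_AT_NOMINAL = 192                # assumed register value at nominal


def all_ones_mv(nominal_mv: int, raw: int = 0xFF) -> int:
    """Scale a raw 8-bit reading to millivolts, rounding to nearest."""
    return (raw * nominal_mv + COUNTS_AT_NOMINAL // 2) // COUNTS_AT_NOMINAL


if __name__ == "__main__":
    for nominal in NOMINAL_MV:
        print(nominal, "->", all_ones_mv(nominal))
```

Under these assumptions the four inputs come out as 3320, 3586, 4383 and 6641, which are the values that appear in the output.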

Here's the output from my device

[root@linuxbox ~]# cpuload 90&
[root@linuxbox ~]# (while true; do cat /sys/class/hwmon/hwmon0/in*_input | grep '3320\|4383\|6641\|15930\|3586'; sleep 1; done)&
3586
3586
cat: read error: No such device or address
cat: read error: No such device or address
3320
3320
3586
3586
6641
6641
4383
4383

Fundamentally, I think the problem is that the LM81 is an SMBus device
but the T2080 (like other Freescale SoCs) uses i2c and we emulate
SMBus. I suspect the errant readings occur when we don't get round to
completing the read within the timeout specified by the SMBus
specification. Depending on when that happens we either fail the
transfer or interpret the result as all-1s.
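
To make the two failure modes concrete, here is a minimal sketch of how a monitoring application might separate them. A failed transfer surfaces as an OSError with ENXIO ("No such device or address"), while an errant reading arrives as a normal-looking value that equals the all-1s figure for that input. The function name, the `opener` parameter and the all-1s value passed in are illustrative assumptions, not part of the test application[0].

```python
import errno


def classify_reading(path: str, all_ones_mv: int, opener=open) -> str:
    """Return 'ok', 'read_error' or 'suspect' for one hwmon input file.

    opener is injectable so the logic can be exercised without hardware.
    """
    try:
        with opener(path) as f:
            value_mv = int(f.read().strip())
    except OSError as exc:
        if exc.errno == errno.ENXIO:
            return "read_error"  # the transfer itself failed
        raise  # anything else is unexpected; let the caller see it
    if value_mv == all_ones_mv:
        return "suspect"  # the read succeeded but the value is all-1s
    return "ok"
```

A caller would use something like `classify_reading("/sys/class/hwmon/hwmon0/in0_input", 3320)`, where both the file name and the threshold depend on which input is being checked. Only the "suspect" case is hard to handle, because nothing in the read itself marks the value as bad.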

[0] - https://gist.github.com/cpackham/6356a3a943accebb228135dc10daf721
[1] - https://www.ti.com/lit/ds/symlink/lm81.pdf

* Re: Errant readings on LM81 with T2080 SoC
From: Guenter Roeck @ 2021-03-08  0:31 UTC (permalink / raw)
  To: Chris Packham, jdelvare
  Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev

On 3/7/21 2:52 PM, Chris Packham wrote:
> Hi,
> 
> I've got a system using a PowerPC T2080 SoC and among other things has 
> an LM81 hwmon chip.
> 
> Under a high CPU load we see errant readings from the LM81 as well as 
> actual failures. It's the errant readings that cause the most concern 
> since we can easily ignore the read errors in our monitoring application 
> (although it would be better if they weren't there at all).
> 
> I'm able to reproduce this with a test application[0] that artificially 
> creates a high CPU load then by repeatedly checking for the all-1s 
> values from the LM81 datasheet[1](page 17). The all-1s readings stick 
> out as they are obviously higher than the voltage rails that are 
> connected and disagree with measurements taken with a multimeter.
> 
> Here's the output from my device
> 
> [root@linuxbox ~]# cpuload 90&
> [root@linuxbox ~]# (while true; do cat /sys/class/hwmon/hwmon0/in*_input 
> | grep '3320\|4383\|6641\|15930\|3586'; sleep 1; done)&
> 3586
> 3586
> cat: read error: No such device or address
> cat: read error: No such device or address
> 3320
> 3320
> 3586
> 3586
> 6641
> 6641
> 4383
> 4383
> 
> Fundamentally I think this is a problem with the fact that the LM81 is 
> an SMBus device but the T2080 (and other Freescale SoCs) uses i2c and we 
> emulate SMBus. I suspect the errant readings are when we don't get round 
> to completing the read within the timeout specified by the SMBus 
> specification. Depending on when that happens we either fail the 
> transfer or interpret the result as all-1s.
> 

That is quite unlikely. Many sensor chips are SMBus chips connected to
i2c busses. It is much more likely that there is a bug in the T2080 i2c driver,
that the chip doesn't like the bulk read command issued through regmap, that
the chip has problems with the i2c bus speed, or that the i2c bus is noisy.

In this context, the "No such device or address" responses are very suspicious.
Those are reported by the i2c driver, not by the hwmon driver, and suggest
that the chip did not respond to a read request. It may help to enable
debugging in the i2c driver to see if it reports anything useful. Even
better might be to connect an i2c bus analyzer to the bus and check
what is going on.

Guenter

* Re: Errant readings on LM81 with T2080 SoC
From: Chris Packham @ 2021-03-08  2:27 UTC (permalink / raw)
  To: Guenter Roeck, jdelvare
  Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev


On 8/03/21 1:31 pm, Guenter Roeck wrote:
> On 3/7/21 2:52 PM, Chris Packham wrote:
>> Hi,
>>
>> I've got a system using a PowerPC T2080 SoC and among other things has
>> an LM81 hwmon chip.
>>
>> Under a high CPU load we see errant readings from the LM81 as well as
>> actual failures. It's the errant readings that cause the most concern
>> since we can easily ignore the read errors in our monitoring application
>> (although it would be better if they weren't there at all).
>>
>> I'm able to reproduce this with a test application[0] that artificially
>> creates a high CPU load then by repeatedly checking for the all-1s
>> values from the LM81 datasheet[1](page 17). The all-1s readings stick
>> out as they are obviously higher than the voltage rails that are
>> connected and disagree with measurements taken with a multimeter.
>>
>> Here's the output from my device
>>
>> [root@linuxbox ~]# cpuload 90&
>> [root@linuxbox ~]# (while true; do cat /sys/class/hwmon/hwmon0/in*_input
>> | grep '3320\|4383\|6641\|15930\|3586'; sleep 1; done)&
>> 3586
>> 3586
>> cat: read error: No such device or address
>> cat: read error: No such device or address
>> 3320
>> 3320
>> 3586
>> 3586
>> 6641
>> 6641
>> 4383
>> 4383
>>
>> Fundamentally I think this is a problem with the fact that the LM81 is
>> an SMBus device but the T2080 (and other Freescale SoCs) uses i2c and we
>> emulate SMBus. I suspect the errant readings are when we don't get round
>> to completing the read within the timeout specified by the SMBus
>> specification. Depending on when that happens we either fail the
>> transfer or interpret the result as all-1s.
>>
> That is quite unlikely. Many sensor chips are SMBus chips connected to
> i2c busses. It is much more likely that there is a bug in the T2080 i2c driver,
> that the chip doesn't like the bulk read command issued through regmap, that
> the chip has problems with the i2c bus speed, or that the i2c bus is noisy.
Perhaps something gets upset when interrupt processing is delayed 
because of CPU load. I don't see the problem when there isn't a CPU load 
so I think that eliminates board issues.
> In this context, the "No such device or address" responses are very suspicious.
> Those are reported by the i2c driver, not by the hwmon driver, and suggest
> that the chip did not respond to a read request. Maybe it helps to enable
> debugging to the i2c driver to see if it reports anything useful. Even
> better might be to connect an i2c bus analyzer to the i2c bus and check
> what is going on.
That's from -ENXIO which is used in only one place in i2c-mpc.c. I'll 
enable some debug and see what we get.
>
> Guenter

* Re: Errant readings on LM81 with T2080 SoC
From: Chris Packham @ 2021-03-08  4:37 UTC (permalink / raw)
  To: Guenter Roeck, jdelvare
  Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev


On 8/03/21 3:27 pm, Chris Packham wrote:
>
> On 8/03/21 1:31 pm, Guenter Roeck wrote:
>> On 3/7/21 2:52 PM, Chris Packham wrote:
>>> Hi,
>>>
>>> I've got a system using a PowerPC T2080 SoC and among other things has
>>> an LM81 hwmon chip.
>>>
>>> Under a high CPU load we see errant readings from the LM81 as well as
>>> actual failures. It's the errant readings that cause the most concern
>>> since we can easily ignore the read errors in our monitoring 
>>> application
>>> (although it would be better if they weren't there at all).
>>>
>>> I'm able to reproduce this with a test application[0] that artificially
>>> creates a high CPU load then by repeatedly checking for the all-1s
>>> values from the LM81 datasheet[1](page 17). The all-1s readings stick
>>> out as they are obviously higher than the voltage rails that are
>>> connected and disagree with measurements taken with a multimeter.
>>>
>>> Here's the output from my device
>>>
>>> [root@linuxbox ~]# cpuload 90&
>>> [root@linuxbox ~]# (while true; do cat 
>>> /sys/class/hwmon/hwmon0/in*_input
>>> | grep '3320\|4383\|6641\|15930\|3586'; sleep 1; done)&
>>> 3586
>>> 3586
>>> cat: read error: No such device or address
>>> cat: read error: No such device or address
>>> 3320
>>> 3320
>>> 3586
>>> 3586
>>> 6641
>>> 6641
>>> 4383
>>> 4383
>>>
>>> Fundamentally I think this is a problem with the fact that the LM81 is
>>> an SMBus device but the T2080 (and other Freescale SoCs) uses i2c 
>>> and we
>>> emulate SMBus. I suspect the errant readings are when we don't get 
>>> round
>>> to completing the read within the timeout specified by the SMBus
>>> specification. Depending on when that happens we either fail the
>>> transfer or interpret the result as all-1s.
>>>
>> That is quite unlikely. Many sensor chips are SMBus chips connected to
>> i2c busses. It is much more likely that there is a bug in the T2080 
>> i2c driver,
>> that the chip doesn't like the bulk read command issued through 
>> regmap, that
>> the chip has problems with the i2c bus speed, or that the i2c bus is 
>> noisy.
> Perhaps something gets upset when interrupt processing is delayed 
> because of CPU load. I don't see the problem when there isn't a CPU 
> load so I think that eliminates board issues.
>> In this context, the "No such device or address" responses are very 
>> suspicious.
>> Those are reported by the i2c driver, not by the hwmon driver, and 
>> suggest
>> that the chip did not respond to a read request. Maybe it helps to 
>> enable
>> debugging to the i2c driver to see if it reports anything useful. Even
>> better might be to connect an i2c bus analyzer to the i2c bus and check
>> what is going on.
> That's from -ENXIO which is used in only one place in i2c-mpc.c. I'll 
> enable some debug and see what we get.

For the errant readings there was nothing abnormal reported by the driver.

For the "No such device or address" I saw "mpc-i2c ffe119000.i2c: No 
RXAK" which matches up with the -ENXIO return.

* Re: Errant readings on LM81 with T2080 SoC
From: Guenter Roeck @ 2021-03-08  4:59 UTC (permalink / raw)
  To: Chris Packham, jdelvare
  Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev

On 3/7/21 8:37 PM, Chris Packham wrote:
[ ... ]
>> That's from -ENXIO which is used in only one place in i2c-mpc.c. I'll 
>> enable some debug and see what we get.
> 
> For the errant readings there was nothing abnormal reported by the driver.
> 
> For the "No such device or address" I saw "mpc-i2c ffe119000.i2c: No 
> RXAK" which matches up with the -ENXIO return.
> 

I'd suggest checking the time spent waiting for not-busy and for stop
in mpc_xfer(). Those hot loops are unusual, and may well misbehave,
especially if preempt is enabled. Also, are you using interrupts or
polling in your system? The interrupt handler looks a bit odd, with
"Read again to allow register to stabilise".
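
One way to picture the concern about hot loops: a busy-wait that spins on a status flag counts iterations rather than time, so if the waiting task is preempted the wall-clock delay can be far longer than the loop suggests. The sketch below is a generic bounded wait that sleeps between polls and reports how long it actually took. It is not the code in mpc_xfer(), and the names `wait_until`, `is_ready`, `timeout_s` and `poll_interval_s` are hypothetical.

```python
import time


def wait_until(is_ready, timeout_s: float, poll_interval_s: float = 0.001):
    """Poll is_ready() until it returns True or timeout_s elapses.

    Returns (ready, elapsed_seconds) so the caller can log slow waits.
    """
    start = time.monotonic()
    deadline = start + timeout_s
    while True:
        if is_ready():
            return True, time.monotonic() - start
        if time.monotonic() >= deadline:
            return False, time.monotonic() - start
        time.sleep(poll_interval_s)  # yield the CPU instead of spinning
```

Logging the returned elapsed time under load and without load would show whether the waits stretch when the CPU is busy.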

Do you have fsl,timeout set in the devicetree properties and, if so,
have you played with it?

Other than that, the only other real idea I have would be to monitor
the i2c bus.

Guenter

* Re: Errant readings on LM81 with T2080 SoC
From: Chris Packham @ 2021-03-08 20:27 UTC (permalink / raw)
  To: Guenter Roeck, jdelvare
  Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev


On 8/03/21 5:59 pm, Guenter Roeck wrote:
> On 3/7/21 8:37 PM, Chris Packham wrote:
> [ ... ]
>>> That's from -ENXIO which is used in only one place in i2c-mpc.c. I'll
>>> enable some debug and see what we get.
>> For the errant readings there was nothing abnormal reported by the driver.
>>
>> For the "No such device or address" I saw "mpc-i2c ffe119000.i2c: No
>> RXAK" which matches up with the -ENXIO return.
>>
> Id suggest to check the time until not busy and stop in mpc_xfer().
> Those hot loops are unusual, and may well mess up the code especially
> if preempt is enabled. Also, are you using interrupts or polling in
> your system ?
I'm using interrupts but I see the same issue if I comment out the 
interrupts in the dtsi file (i.e. force it to use polling).
> The interrupt handler looks a bit odd, with "Read again
> to allow register to stabilise".

Yeah, that stuck out to me too. The code in question predates git; I went
spelunking in history.git and the "Read again" seems to be in the
initial version[0]. I did try to alter the interrupt handler so that it
only does one read but that didn't seem to change anything.

> Do you have fsl,timeout set in the devicetree properties and, if so,
> have you played with it ?
Haven't got it set but I'll have a go at tweaking it.
> Other than that, the only other real idea I have would be to monitor
> the i2c bus.
I am in the fortunate position of being able to go into the office and 
even happen to have the expensive scope at the moment. Now I just need 
to find a tame HW engineer so I don't burn myself trying to attach the 
probes.

-- 

[0] - https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=11b3235dc04a306f6a9ba14c1ab621b2d54f2c56


* Re: Errant readings on LM81 with T2080 SoC
From: Chris Packham @ 2021-03-08 22:10 UTC (permalink / raw)
  To: Guenter Roeck, jdelvare
  Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev


On 8/03/21 5:59 pm, Guenter Roeck wrote:
> On 3/7/21 8:37 PM, Chris Packham wrote:
> [ ... ]
>>> That's from -ENXIO which is used in only one place in i2c-mpc.c. I'll
>>> enable some debug and see what we get.
>> For the errant readings there was nothing abnormal reported by the driver.
>>
>> For the "No such device or address" I saw "mpc-i2c ffe119000.i2c: No
>> RXAK" which matches up with the -ENXIO return.
>>
> Id suggest to check the time until not busy and stop in mpc_xfer().
> Those hot loops are unusual, and may well mess up the code especially
> if preempt is enabled.
Reworking those loops seems to have had a positive result. I'll do a bit 
more testing and hopefully get a patch out later today.
>   Also, are you using interrupts or polling in
> your system ? The interrupt handler looks a bit odd, with "Read again
> to allow register to stabilise".
>
> Do you have fsl,timeout set in the devicetree properties and, if so,
> have you played with it ?
>
> Other than that, the only other real idea I have would be to monitor
> the i2c bus.
>
> Guenter

* Re: Errant readings on LM81 with T2080 SoC
From: Guenter Roeck @ 2021-03-08 22:39 UTC (permalink / raw)
  To: Chris Packham
  Cc: jdelvare, linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev

On Mon, Mar 08, 2021 at 08:27:30PM +0000, Chris Packham wrote:
[ ... ]
> > Other than that, the only other real idea I have would be to monitor
> > the i2c bus.
> I am in the fortunate position of being able to go into the office and 
> even happen to have the expensive scope at the moment. Now I just need 
> to find a tame HW engineer so I don't burn myself trying to attach the 
> probes.
> 
A bit unrelated, but you can get scopes connected through usb which are
quite low-cost (like in the $100 range) and good enough for i2c testing.

Guenter

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Errant readings on LM81 with T2080 SoC
  2021-03-08 22:10           ` Chris Packham
@ 2021-03-09  4:36             ` Chris Packham
  -1 siblings, 0 replies; 57+ messages in thread
From: Chris Packham @ 2021-03-09  4:36 UTC (permalink / raw)
  To: Guenter Roeck, jdelvare
  Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev


On 9/03/21 11:10 am, Chris Packham wrote:
>
> On 8/03/21 5:59 pm, Guenter Roeck wrote:
>> On 3/7/21 8:37 PM, Chris Packham wrote:
>> [ ... ]
>>>> That's from -ENXIO which is used in only one place in i2c-mpc.c. I'll
>>>> enable some debug and see what we get.
>>> For the errant readings there was nothing abnormal reported by the 
>>> driver.
>>>
>>> For the "No such device or address" I saw "mpc-i2c ffe119000.i2c: No
>>> RXAK" which matches up with the -ENXIO return.
>>>
>> Id suggest to check the time until not busy and stop in mpc_xfer().
>> Those hot loops are unusual, and may well mess up the code especially
>> if preempt is enabled.
> Reworking those loops seems to have had a positive result. I'll do a 
> bit more testing and hopefully get a patch out later today.
D'oh, my "fix" was to replace the cond_resched() with msleep(10) which did 
"fix" the problem but made every i2c read slow. I didn't notice when 
testing just the lm81 but as soon as I booted the system with more i2c 
devices I saw stupidly slow boot times.
>>   Also, are you using interrupts or polling in
>> your system ? The interrupt handler looks a bit odd, with "Read again
>> to allow register to stabilise".
>>
>> Do you have fsl,timeout set in the devicetree properties and, if so,
>> have you played with it ?
>>
>> Other than that, the only other real idea I have would be to monitor
>> the i2c bus.
>>
>> Guenter

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Errant readings on LM81 with T2080 SoC
  2021-03-09  4:36             ` Chris Packham
@ 2021-03-09  5:24               ` Guenter Roeck
  -1 siblings, 0 replies; 57+ messages in thread
From: Guenter Roeck @ 2021-03-09  5:24 UTC (permalink / raw)
  To: Chris Packham, jdelvare
  Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev

On 3/8/21 8:36 PM, Chris Packham wrote:
> 
> On 9/03/21 11:10 am, Chris Packham wrote:
>>
>> On 8/03/21 5:59 pm, Guenter Roeck wrote:
>>> On 3/7/21 8:37 PM, Chris Packham wrote:
>>> [ ... ]
>>>>> That's from -ENXIO which is used in only one place in i2c-mpc.c. I'll
>>>>> enable some debug and see what we get.
>>>> For the errant readings there was nothing abnormal reported by the 
>>>> driver.
>>>>
>>>> For the "No such device or address" I saw "mpc-i2c ffe119000.i2c: No
>>>> RXAK" which matches up with the -ENXIO return.
>>>>
>>> Id suggest to check the time until not busy and stop in mpc_xfer().
>>> Those hot loops are unusual, and may well mess up the code especially
>>> if preempt is enabled.
>> Reworking those loops seems to have had a positive result. I'll do a 
>> bit more testing and hopefully get a patch out later today.
> D'oh, my "fix" was to replace the cond_resched() with msleep(10) which did 
> "fix" the problem but made every i2c read slow. I didn't notice when 
> testing just the lm81 but as soon as I booted the system with more i2c 
> devices I saw stupidly slow boot times.

msleep() is indeed a bad idea. You'd want something like usleep_range()
with an increasing timeout: start with a few us and double the sleep time
with each iteration (e.g. 4-8 / 8-16 / 16-32 / 32-64 / ...).
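
A minimal userspace sketch of that doubling scheme, for illustration only.
It is not the i2c-mpc code: backoff_window_us(), sleep_range_us(),
poll_with_backoff() and ready_after_n() are hypothetical names, the cap on
the window is an arbitrary choice, and nanosleep() stands in for the
kernel's usleep_range().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: sleep window for a given retry, doubling each time
 * (4-8, 8-16, 16-32, ... us). The cap at 4096-8192 us is an assumption. */
static void backoff_window_us(unsigned int attempt, unsigned int *min_us,
                              unsigned int *max_us)
{
    if (attempt > 10)
        attempt = 10;
    *min_us = 4u << attempt;
    *max_us = 8u << attempt;
}

/* Userspace stand-in for usleep_range(): sleeps for the lower bound only. */
static void sleep_range_us(unsigned int min_us, unsigned int max_us)
{
    struct timespec ts = {
        .tv_sec = min_us / 1000000u,
        .tv_nsec = (long)(min_us % 1000000u) * 1000L,
    };

    (void)max_us;
    nanosleep(&ts, NULL);
}

/* Poll ready() with a growing sleep between checks.
 * Returns 0 once ready() is true, -1 once budget_us has been slept away. */
static int poll_with_backoff(bool (*ready)(void *), void *ctx,
                             unsigned int budget_us)
{
    unsigned int slept_us = 0, attempt = 0, min_us, max_us;

    while (!ready(ctx)) {
        if (slept_us >= budget_us)
            return -1;
        backoff_window_us(attempt++, &min_us, &max_us);
        sleep_range_us(min_us, max_us);
        slept_us += min_us;
    }
    return 0;
}

/* Demo condition: becomes ready once the counter behind ctx reaches zero. */
static bool ready_after_n(void *ctx)
{
    unsigned int *remaining = ctx;

    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}
```

The short first window keeps fast transfers fast, while the doubling
bounds how often a slow transfer gets polled.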

Guenter

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Errant readings on LM81 with T2080 SoC
  2021-03-08  0:31   ` Guenter Roeck
@ 2021-03-09 23:35     ` Chris Packham
  -1 siblings, 0 replies; 57+ messages in thread
From: Chris Packham @ 2021-03-09 23:35 UTC (permalink / raw)
  To: Guenter Roeck, jdelvare
  Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev


On 8/03/21 1:31 pm, Guenter Roeck wrote:
> On 3/7/21 2:52 PM, Chris Packham wrote:
>> Fundamentally I think this is a problem with the fact that the LM81 is
>> an SMBus device but the T2080 (and other Freescale SoCs) uses i2c and we
>> emulate SMBus. I suspect the errant readings are when we don't get round
>> to completing the read within the timeout specified by the SMBus
>> specification. Depending on when that happens we either fail the
>> transfer or interpret the result as all-1s.
> That is quite unlikely. Many sensor chips are SMBus chips connected to
> i2c busses. It is much more likely that there is a bug in the T2080 i2c driver,
> that the chip doesn't like the bulk read command issued through regmap, that
> the chip has problems with the i2c bus speed, or that the i2c bus is noisy.
I have noticed that with the switch to regmap we end up using plain i2c 
instead of SMBUS. There appears to be no way of saying use SMBUS 
semantics if the i2c adapter reports I2C_FUNC_I2C.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Errant readings on LM81 with T2080 SoC
  2021-03-08 20:27           ` Chris Packham
@ 2021-03-10  2:19             ` Chris Packham
  -1 siblings, 0 replies; 57+ messages in thread
From: Chris Packham @ 2021-03-10  2:19 UTC (permalink / raw)
  To: Guenter Roeck, jdelvare
  Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev

On 9/03/21 9:27 am, Chris Packham wrote:
> On 8/03/21 5:59 pm, Guenter Roeck wrote:
>> Other than that, the only other real idea I have would be to monitor
>> the i2c bus.
> I am in the fortunate position of being able to go into the office and 
> even happen to have the expensive scope at the moment. Now I just need 
> to find a tame HW engineer so I don't burn myself trying to attach the 
> probes.
One thing I see on the scope is that when there is a CPU load there 
appears to be some clock stretching going on (SCL is sometimes held 
low). I don't see it without the CPU load. It's hard to correlate a 
clock stretching event with a bad read or error, but it is one area where 
the SMBus spec has a maximum that might cause the device to give up waiting.
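
A rough sketch of how SCL-low intervals exported from a scope capture
could be screened against that maximum. The 25 ms threshold is the
t_TIMEOUT minimum as I recall it from the SMBus specification, so treat
it as an assumption and check it against the spec and the LM81
datasheet. first_overlong_scl_low() is a hypothetical helper, not part
of any driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed SMBus t_TIMEOUT minimum: a device may reset its interface once
 * SCL has been held low for longer than this. Verify against the spec. */
#define SMBUS_T_TIMEOUT_MIN_US 25000u

/* Returns the index of the first SCL-low interval (in microseconds) that
 * exceeds the assumed timeout, or -1 if every interval is within it. */
static int first_overlong_scl_low(const uint32_t *low_us, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (low_us[i] > SMBUS_T_TIMEOUT_MIN_US)
            return (int)i;
    }
    return -1;
}
```

If no interval comes anywhere near the threshold, the timeout theory
becomes less likely and the stretching is probably only a symptom.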

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Errant readings on LM81 with T2080 SoC
  2021-03-09 23:35     ` Chris Packham
@ 2021-03-10  3:29       ` Guenter Roeck
  -1 siblings, 0 replies; 57+ messages in thread
From: Guenter Roeck @ 2021-03-10  3:29 UTC (permalink / raw)
  To: Chris Packham, jdelvare
  Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev

On 3/9/21 3:35 PM, Chris Packham wrote:
> 
> On 8/03/21 1:31 pm, Guenter Roeck wrote:
>> On 3/7/21 2:52 PM, Chris Packham wrote:
>>> Fundamentally I think this is a problem with the fact that the LM81 is
>>> an SMBus device but the T2080 (and other Freescale SoCs) uses i2c and we
>>> emulate SMBus. I suspect the errant readings are when we don't get round
>>> to completing the read within the timeout specified by the SMBus
>>> specification. Depending on when that happens we either fail the
>>> transfer or interpret the result as all-1s.
>> That is quite unlikely. Many sensor chips are SMBus chips connected to
>> i2c busses. It is much more likely that there is a bug in the T2080 i2c driver,
>> that the chip doesn't like the bulk read command issued through regmap, that
>> the chip has problems with the i2c bus speed, or that the i2c bus is noisy.
> I have noticed that with the switch to regmap we end up using plain i2c 
> instead of SMBUS. There appears to be no way of saying use SMBUS 
> semantics if the i2c adapter reports I2C_FUNC_I2C.
> 

The driver only really supports I2C; SMBUS functions are emulated.
I don't think that makes a real difference.

Guenter

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Errant readings on LM81 with T2080 SoC
  2021-03-10  2:19             ` Chris Packham
@ 2021-03-10  5:06               ` Guenter Roeck
  -1 siblings, 0 replies; 57+ messages in thread
From: Guenter Roeck @ 2021-03-10  5:06 UTC (permalink / raw)
  To: Chris Packham, jdelvare
  Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev

On 3/9/21 6:19 PM, Chris Packham wrote:
> On 9/03/21 9:27 am, Chris Packham wrote:
>> On 8/03/21 5:59 pm, Guenter Roeck wrote:
>>> Other than that, the only other real idea I have would be to monitor
>>> the i2c bus.
>> I am in the fortunate position of being able to go into the office and 
>> even happen to have the expensive scope at the moment. Now I just need 
>> to find a tame HW engineer so I don't burn myself trying to attach the 
>> probes.
> One thing I see on the scope is that when there is a CPU load there 
> appears to be some clock stretching going on (SCL is held low some 
> times). I don't see it without the CPU load. It's hard to correlate a 
> clock stretching event with a bad read or error but it is one area where 
> the SMBUS spec has a maximum that might cause the device to give up waiting.
> 
Do you have CONFIG_PREEMPT enabled in your kernel ? But even without
that it is possible that the hot loops at the beginning and end of
each operation mess up the driver and cause it to sleep longer
than intended. Did you try usleep_range() ?

On a side note, can you send me a register dump for the lm81 ?
It would be useful for my module test code.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Errant readings on LM81 with T2080 SoC
  2021-03-10  5:06               ` Guenter Roeck
@ 2021-03-10 21:48                 ` Chris Packham
  -1 siblings, 0 replies; 57+ messages in thread
From: Chris Packham @ 2021-03-10 21:48 UTC (permalink / raw)
  To: Guenter Roeck, jdelvare
  Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev


On 10/03/21 6:06 pm, Guenter Roeck wrote:
> On 3/9/21 6:19 PM, Chris Packham wrote:
>> On 9/03/21 9:27 am, Chris Packham wrote:
>>> On 8/03/21 5:59 pm, Guenter Roeck wrote:
>>>> Other than that, the only other real idea I have would be to monitor
>>>> the i2c bus.
>>> I am in the fortunate position of being able to go into the office and
>>> even happen to have the expensive scope at the moment. Now I just need
>>> to find a tame HW engineer so I don't burn myself trying to attach the
>>> probes.
>> One thing I see on the scope is that when there is a CPU load there
>> appears to be some clock stretching going on (SCL is held low some
>> times). I don't see it without the CPU load. It's hard to correlate a
>> clock stretching event with a bad read or error but it is one area where
>> the SMBUS spec has a maximum that might cause the device to give up waiting.
>>
> Do you have CONFIG_PREEMPT enabled in your kernel ? But even without
> that it is possible that the hot loops at the beginning and end of
> each operation mess up the driver and cause it to sleep longer
> than intended. Did you try usleep_range() ?

I've been running with and without CONFIG_PREEMPT. The failures happen 
with both.

I did try usleep_range() and still saw failures.

> On a side note, can you send me a register dump for the lm81 ?
> It would be useful for my module test code.

Here you go. This is from a largely unconfigured LM81:

      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f 0123456789abcdef
00: 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 GGGGGGGGGGGGGGGG
10: 47 81 24 03 94 00 00 00 00 ff ff ff ff ff ff ff G?$??...........
20: bf cb c1 00 c0 47 ec 24 ff ff 65 ff 00 ff 00 ff ???.?G?$..e.....
30: 00 ff 00 ff 00 ff 00 71 a9 7f 7f ff ff 58 01 04 .......q???..X??
40: 01 08 00 00 00 00 00 50 2f 80 80 01 44 00 00 00 ??.....P/???D...
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
90: 00 81 24 03 94 00 00 00 00 ff ff ff ff ff ff ff .?$??...........
a0: bf cb c1 00 c0 47 ec 24 ff ff 65 ff 00 ff 00 ff ???.?G?$..e.....
b0: 00 ff 00 ff 00 ff 00 71 a9 7f 7f ff ff 58 01 04 .......q???..X??
c0: 01 00 00 00 00 00 00 50 2f 80 80 01 44 00 00 00 ?......P/???D...
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

This is from a LM81 that's been configured by our application SW with 
limits appropriate for the platform.

      0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f 0123456789abcdef
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
10: ff 81 24 03 94 00 00 00 00 ff ff ff ff ff ff ff ..$.............
20: bf cc c1 00 c0 47 ec 1c ff ff 65 dc b4 ff c0 d3 .....G....e.....
30: ad ff 00 d3 ad 4e 40 71 a9 4b 46 ff ff 58 01 04 .....N@q.KF..X..
40: 01 08 00 00 00 00 00 f0 2f 80 80 81 44 80 80 80 ......../...D...
50: 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 ................
60: 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 ................
70: 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 ................
80: 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 ................
90: 80 81 24 03 94 00 00 00 00 ff ff ff ff ff ff ff ..$.............
a0: bf cc c1 00 c0 47 ec 1c ff ff 65 dc b4 ff c0 d3 .....G....e.....
b0: ad ff 00 d3 ad 4e 40 71 a9 4b 46 ff ff 58 01 04 .....N@q.KF..X..
c0: 01 00 00 00 00 00 00 f0 2f 80 80 81 44 80 80 80 ......../...D...
d0: 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 ................
e0: 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 ................
f0: 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 ................
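
For reading the voltage bytes in these dumps, here is a small sketch of
the raw-to-millivolt conversion. It assumes the in0..in5 readings sit at
registers 0x20..0x25 and that a raw value of 192 (0xc0) corresponds to
the nominal rail voltage, which is how I understand the adm9240/lm81
scaling. The nominal table and lm81_in_mv() are illustrative, not copied
from the driver.

```c
#include <stdint.h>

/* Assumed nominal rail voltages in mV for in0..in5 (registers 0x20..0x25). */
static const unsigned int nom_mv[6] = { 2500, 2700, 3300, 5000, 12000, 2700 };

/* Convert a raw 8-bit reading to millivolts, rounding to nearest.
 * A raw value of 192 maps to the nominal voltage for that channel.
 * Returns 0 for a channel outside in0..in5. */
static unsigned int lm81_in_mv(unsigned int channel, uint8_t raw)
{
    if (channel >= 6)
        return 0;
    return (raw * nom_mv[channel] + 96) / 192;
}
```

With this scaling an all-1s byte (0xff) on in0 comes out as 3320, and on
the 2.7 V, 3.3 V and 5 V channels as 3586, 4383 and 6641, which matches
the values grepped for in the original report.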

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Errant readings on LM81 with T2080 SoC
  2021-03-10 21:48                 ` Chris Packham
@ 2021-03-11  7:41                   ` Guenter Roeck
  -1 siblings, 0 replies; 57+ messages in thread
From: Guenter Roeck @ 2021-03-11  7:41 UTC (permalink / raw)
  To: Chris Packham, jdelvare
  Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev

On 3/10/21 1:48 PM, Chris Packham wrote:
> 
> On 10/03/21 6:06 pm, Guenter Roeck wrote:
>> On 3/9/21 6:19 PM, Chris Packham wrote:
>>> On 9/03/21 9:27 am, Chris Packham wrote:
>>>> On 8/03/21 5:59 pm, Guenter Roeck wrote:
>>>>> Other than that, the only other real idea I have would be to monitor
>>>>> the i2c bus.
>>>> I am in the fortunate position of being able to go into the office and
>>>> even happen to have the expensive scope at the moment. Now I just need
>>>> to find a tame HW engineer so I don't burn myself trying to attach the
>>>> probes.
>>> One thing I see on the scope is that when there is a CPU load there
>>> appears to be some clock stretching going on (SCL is held low some
>>> times). I don't see it without the CPU load. It's hard to correlate a
>>> clock stretching event with a bad read or error but it is one area where
>>> the SMBUS spec has a maximum that might cause the device to give up waiting.
>>>
>> Do you have CONFIG_PREEMPT enabled in your kernel ? But even without
>> that it is possible that the hot loops at the beginning and end of
>> each operation mess up the driver and cause it to sleep longer
>> than intended. Did you try usleep_range() ?
> 
> I've been running with and without CONFIG_PREEMPT. The failures happen 
> with both.
> 
> I did try usleep_range() and still saw failures.
> 

Bummer. What is really weird is that you see clock stretching under
CPU load. Normally clock stretching is triggered by the device, not
by the host. I wonder if there are some timing differences before
the clock stretching happens.

Anyway, I just sent a set of three patches to the list; maybe you
can give it a try. The patches convert the driver to the with_info
API and drop local caching.

The code is module tested with the register dumps I have available
for adm9240 and lm81, but it would be great to get test coverage
on real hardware. I don't really expect it to solve your problem,
but it does reduce and modify the load on the chip (because
registers are no longer read in bursts), so it may have some
positive impact.

>> On a side note, can you send me a register dump for the lm81 ?
>> It would be useful for my module test code.
> 
> Here you go this is from a largely unconfigured LM81
> 

Thanks, that helped a lot!

Guenter

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Errant readings on LM81 with T2080 SoC
  2021-03-11  7:41                   ` Guenter Roeck
@ 2021-03-11  8:18                     ` Wolfram Sang
  -1 siblings, 0 replies; 57+ messages in thread
From: Wolfram Sang @ 2021-03-11  8:18 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Chris Packham, jdelvare, linux-hwmon, linux-kernel, linux-i2c,
	linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 304 bytes --]


> Bummer. What is really weird is that you see clock stretching under
> CPU load. Normally clock stretching is triggered by the device, not
> by the host.

One example: Some hosts need an interrupt per byte to know if they
should send ACK or NACK. If that interrupt is delayed, they stretch the
clock.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Errant readings on LM81 with T2080 SoC
  2021-03-11  8:18                     ` Wolfram Sang
@ 2021-03-11 15:19                       ` Guenter Roeck
  -1 siblings, 0 replies; 57+ messages in thread
From: Guenter Roeck @ 2021-03-11 15:19 UTC (permalink / raw)
  To: Wolfram Sang
  Cc: Chris Packham, jdelvare, linux-hwmon, linux-kernel, linux-i2c,
	linuxppc-dev


[-- Attachment #1.1: Type: text/plain, Size: 763 bytes --]

On 3/11/21 12:18 AM, Wolfram Sang wrote:
> 
>> Bummer. What is really weird is that you see clock stretching under
>> CPU load. Normally clock stretching is triggered by the device, not
>> by the host.
> 
> One example: Some hosts need an interrupt per byte to know if they
> should send ACK or NACK. If that interrupt is delayed, they stretch the
> clock.
> 

Indeed, the i2c-mpc driver sends TXAK (only) after receiving
that interrupt. Since that is running in the context of the user
process, that may well be delayed substantially on a loaded system.

Maybe the interrupt handler will need to play a more active role
in the i2c-mpc driver. Alternatively, the transfer function could
be handled by a high priority kernel thread.
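
To illustrate the first option, here is a minimal user-space sketch of a
receive path where the ACK/NACK decision is made entirely in the per-byte
handler, so nothing in the critical path waits for process context. All
names (rx_state, rx_byte_irq, simulate_read) are hypothetical, and the
model deliberately ignores the real i2c-mpc register semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum ack { ACK, NACK };

/* Hypothetical per-transfer state, owned by the interrupt handler. */
struct rx_state {
	uint8_t *buf;	/* destination buffer */
	size_t len;	/* number of bytes requested */
	size_t pos;	/* number of bytes stored so far */
	bool done;	/* set once the last byte has been NACKed */
};

/*
 * Stand-in for the "byte received" interrupt. The bus is stretched
 * until this returns, so it only stores the byte and decides ACK/NACK.
 */
enum ack rx_byte_irq(struct rx_state *s, uint8_t byte)
{
	assert(s != NULL && s->buf != NULL);

	if (s->done || s->pos >= s->len)
		return NACK;		/* nothing more wanted */

	s->buf[s->pos++] = byte;
	if (s->pos == s->len) {
		s->done = true;
		return NACK;		/* last byte: tell the device to stop */
	}
	return ACK;
}

/* Simulated bus: clocks bytes in until the handler answers NACK. */
size_t simulate_read(struct rx_state *s, const uint8_t *wire, size_t wire_len)
{
	size_t clocked = 0;

	while (clocked < wire_len) {
		if (rx_byte_irq(s, wire[clocked++]) == NACK)
			break;
	}
	return clocked;
}
```

In a real driver the process context would only sleep on a completion
that the handler signals once done is set.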

Guenter


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Errant readings on LM81 with T2080 SoC
  2021-03-11  8:18                     ` Wolfram Sang
@ 2021-03-11 21:17                       ` Chris Packham
  -1 siblings, 0 replies; 57+ messages in thread
From: Chris Packham @ 2021-03-11 21:17 UTC (permalink / raw)
  To: Wolfram Sang, Guenter Roeck
  Cc: jdelvare, linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev


On 11/03/21 9:18 pm, Wolfram Sang wrote:
>> Bummer. What is really weird is that you see clock stretching under
>> CPU load. Normally clock stretching is triggered by the device, not
>> by the host.
> One example: Some hosts need an interrupt per byte to know if they
> should send ACK or NACK. If that interrupt is delayed, they stretch the
> clock.
>
It feels like something like that is happening. Looking at the T2080 
Reference manual there is an interesting timing diagram (Figure 14-2 if 
someone feels like looking it up). It shows SCL low between the ACK for 
the address and the data byte. I think if we're delayed in sending the 
next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec.
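
As a rough sketch of the two limits involved: the checker below takes the
clock-low extensions a master added within one byte and reports which
limit, if any, is exceeded. The numbers are my reading of the SMBus 2.0
timing table (Ttimeout 25ms minimum, Tlow:mext 10ms cumulative per byte)
and should be checked against the spec; the function and enum names are
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed limits from the SMBus 2.0 timing table, in microseconds. */
#define SMBUS_TTIMEOUT_MIN_US	25000u	/* single clock-low period */
#define SMBUS_TLOW_MEXT_MAX_US	10000u	/* cumulative master extension per byte */

enum smbus_verdict {
	SMBUS_OK,
	SMBUS_TLOW_MEXT_EXCEEDED,
	SMBUS_TTIMEOUT_EXCEEDED,
};

/*
 * stretch_us[] holds how long the master held SCL low beyond the
 * nominal low period, one entry per stretch, all within one byte
 * (START-to-ACK, ACK-to-ACK or ACK-to-STOP).
 */
enum smbus_verdict smbus_check_byte(const uint32_t *stretch_us, size_t n)
{
	uint64_t total = 0;
	size_t i;

	assert(stretch_us != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* A single long stretch may make the device reset its interface. */
		if (stretch_us[i] > SMBUS_TTIMEOUT_MIN_US)
			return SMBUS_TTIMEOUT_EXCEEDED;
		total += stretch_us[i];
	}
	return total > SMBUS_TLOW_MEXT_MAX_US ? SMBUS_TLOW_MEXT_EXCEEDED : SMBUS_OK;
}
```

Note that several short stretches can add up to a Tlow:mext violation
even though none of them comes anywhere near Ttimeout.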

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Errant readings on LM81 with T2080 SoC
  2021-03-11 21:17                       ` Chris Packham
@ 2021-03-11 21:34                         ` Guenter Roeck
  -1 siblings, 0 replies; 57+ messages in thread
From: Guenter Roeck @ 2021-03-11 21:34 UTC (permalink / raw)
  To: Chris Packham, Wolfram Sang
  Cc: jdelvare, linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev

On 3/11/21 1:17 PM, Chris Packham wrote:
> 
> On 11/03/21 9:18 pm, Wolfram Sang wrote:
>>> Bummer. What is really weird is that you see clock stretching under
>>> CPU load. Normally clock stretching is triggered by the device, not
>>> by the host.
>> One example: Some hosts need an interrupt per byte to know if they
>> should send ACK or NACK. If that interrupt is delayed, they stretch the
>> clock.
>>
> It feels like something like that is happening. Looking at the T2080 
> Reference manual there is an interesting timing diagram (Figure 14-2 if 
> someone feels like looking it up). It shows SCL low between the ACK for 
> the address and the data byte. I think if we're delayed in sending the 
> next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec.
> 

I think that really leaves you only two options that I can see:
Rework the driver to handle critical actions (such as setting TXAK,
and everything else that might result in clock stretching) in the
interrupt handler, or rework the driver to handle everything in
a high priority kernel thread.

Guenter

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Errant readings on LM81 with T2080 SoC
  2021-03-11 21:34                         ` Guenter Roeck
@ 2021-03-11 23:47                           ` Chris Packham
  -1 siblings, 0 replies; 57+ messages in thread
From: Chris Packham @ 2021-03-11 23:47 UTC (permalink / raw)
  To: Guenter Roeck, Wolfram Sang
  Cc: jdelvare, linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev


On 12/03/21 10:34 am, Guenter Roeck wrote:
> On 3/11/21 1:17 PM, Chris Packham wrote:
>> On 11/03/21 9:18 pm, Wolfram Sang wrote:
>>>> Bummer. What is really weird is that you see clock stretching under
>>>> CPU load. Normally clock stretching is triggered by the device, not
>>>> by the host.
>>> One example: Some hosts need an interrupt per byte to know if they
>>> should send ACK or NACK. If that interrupt is delayed, they stretch the
>>> clock.
>>>
>> It feels like something like that is happening. Looking at the T2080
>> Reference manual there is an interesting timing diagram (Figure 14-2 if
>> someone feels like looking it up). It shows SCL low between the ACK for
>> the address and the data byte. I think if we're delayed in sending the
>> next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec.
>>
> I think that really leaves you only two options that I can see:
> Rework the driver to handle critical actions (such as setting TXAK,
> and everything else that might result in clock stretching) in the
> interrupt handler, or rework the driver to handle everything in
> a high priority kernel thread.
One thing I've found that does seem to avoid the problem is to disable 
preemption, use polling and replace the schedule() in i2c_wait() with 
udelay(50). That's kind of like the kernel thread option.
> Guenter

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Errant readings on LM81 with T2080 SoC
  2021-03-11 23:47                           ` Chris Packham
@ 2021-03-12  0:07                             ` Guenter Roeck
  -1 siblings, 0 replies; 57+ messages in thread
From: Guenter Roeck @ 2021-03-12  0:07 UTC (permalink / raw)
  To: Chris Packham, Wolfram Sang
  Cc: jdelvare, linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev

On 3/11/21 3:47 PM, Chris Packham wrote:
> 
> On 12/03/21 10:34 am, Guenter Roeck wrote:
>> On 3/11/21 1:17 PM, Chris Packham wrote:
>>> On 11/03/21 9:18 pm, Wolfram Sang wrote:
>>>>> Bummer. What is really weird is that you see clock stretching under
>>>>> CPU load. Normally clock stretching is triggered by the device, not
>>>>> by the host.
>>>> One example: Some hosts need an interrupt per byte to know if they
>>>> should send ACK or NACK. If that interrupt is delayed, they stretch the
>>>> clock.
>>>>
>>> It feels like something like that is happening. Looking at the T2080
>>> Reference manual there is an interesting timing diagram (Figure 14-2 if
>>> someone feels like looking it up). It shows SCL low between the ACK for
>>> the address and the data byte. I think if we're delayed in sending the
>>> next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec.
>>>
>> I think that really leaves you only two options that I can see:
>> Rework the driver to handle critical actions (such as setting TXAK,
>> and everything else that might result in clock stretching) in the
>> interrupt handler, or rework the driver to handle everything in
>> a high priority kernel thread.
> One thing I've found that does seem to avoid the problem is to disable 
> preemption, use polling and replace the schedule() in i2c_wait() with 
> udelay(50). That's kind of like the kernel thread option.

It is kind of hackish, though, especially since it makes the "loaded system"
situation even worse by adding even more active wait loops.

Guenter

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Errant readings on LM81 with T2080 SoC
  2021-03-12  0:07                             ` Guenter Roeck
@ 2021-03-12  0:19                               ` Chris Packham
  -1 siblings, 0 replies; 57+ messages in thread
From: Chris Packham @ 2021-03-12  0:19 UTC (permalink / raw)
  To: Guenter Roeck, Wolfram Sang
  Cc: jdelvare, linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev


On 12/03/21 1:07 pm, Guenter Roeck wrote:
> On 3/11/21 3:47 PM, Chris Packham wrote:
>> On 12/03/21 10:34 am, Guenter Roeck wrote:
>>> On 3/11/21 1:17 PM, Chris Packham wrote:
>>>> On 11/03/21 9:18 pm, Wolfram Sang wrote:
>>>>>> Bummer. What is really weird is that you see clock stretching under
>>>>>> CPU load. Normally clock stretching is triggered by the device, not
>>>>>> by the host.
>>>>> One example: Some hosts need an interrupt per byte to know if they
>>>>> should send ACK or NACK. If that interrupt is delayed, they stretch the
>>>>> clock.
>>>>>
>>>> It feels like something like that is happening. Looking at the T2080
>>>> Reference manual there is an interesting timing diagram (Figure 14-2 if
>>>> someone feels like looking it up). It shows SCL low between the ACK for
>>>> the address and the data byte. I think if we're delayed in sending the
>>>> next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec.
>>>>
>>> I think that really leaves you only two options that I can see:
>>> Rework the driver to handle critical actions (such as setting TXAK,
>>> and everything else that might result in clock stretching) in the
>>> interrupt handler, or rework the driver to handle everything in
>>> a high priority kernel thread.
>> One thing I've found that does seem to avoid the problem is to disable
>> preemption, use polling and replace the schedule() in i2c_wait() with
>> udelay(50). That's kind of like the kernel thread option.
> It is kind of hackish, though, especially since it makes the "loaded system"
> situation even worse by adding even more active wait loops.
No -ish about it :). But it might put out one fire for me while I'm 
looking at doing some kind of interrupt driven state machine.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: Errant readings on LM81 with T2080 SoC
  2021-03-11 21:34                         ` Guenter Roeck
  (?)
  (?)
@ 2021-03-12  9:25                         ` David Laight
  2021-03-14 21:26                           ` Chris Packham
  -1 siblings, 1 reply; 57+ messages in thread
From: David Laight @ 2021-03-12  9:25 UTC (permalink / raw)
  To: 'Guenter Roeck', Chris Packham, Wolfram Sang
  Cc: linux-hwmon, jdelvare, linuxppc-dev, linux-kernel, linux-i2c

From: Linuxppc-dev Guenter Roeck
> Sent: 11 March 2021 21:35
> 
> On 3/11/21 1:17 PM, Chris Packham wrote:
> >
> > On 11/03/21 9:18 pm, Wolfram Sang wrote:
> >>> Bummer. What is really weird is that you see clock stretching under
> >>> CPU load. Normally clock stretching is triggered by the device, not
> >>> by the host.
> >> One example: Some hosts need an interrupt per byte to know if they
> >> should send ACK or NACK. If that interrupt is delayed, they stretch the
> >> clock.
> >>
> > It feels like something like that is happening. Looking at the T2080
> > Reference manual there is an interesting timing diagram (Figure 14-2 if
> > someone feels like looking it up). It shows SCL low between the ACK for
> > the address and the data byte. I think if we're delayed in sending the
> > next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec.
> >
> 
> I think that really leaves you only two options that I can see:
> Rework the driver to handle critical actions (such as setting TXAK,
> and everything else that might result in clock stretching) in the
> interrupt handler, or rework the driver to handle everything in
> a high priority kernel thread.

I'm not sure a high priority kernel thread will help.
Without CONFIG_PREEMPT (which has its own set of nasties)
a RT process won't be scheduled until the processor it last
ran on does a reschedule.
I don't think a kernel thread will be any different from a
user process running under the RT scheduler.

I'm trying to remember the smbus spec (without remembering the I2C one).
While basically a clock+data bit-bang the slave is allowed to drive
the clock low to extend a cycle.
It may be allowed to do this at any point?
The master can generate the data at almost any rate (below the maximum)
but I don't think it can go down to zero.
But I do remember one of the specs having a timeout.

But I'd have thought the slave should answer the cycle correctly
regardless of any 'random' delays the master adds in.
Unless you are getting away with de-asserting chipselect?

The only implementation I've done is on an FPGA, so it doesn't have
to worry about interrupt latencies.
It doesn't actually support clock stretching; it wasn't in the
code I started from and none of the slaves we need to connect to
ever does it.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Errant readings on LM81 with T2080 SoC
  2021-03-12  9:25                         ` David Laight
@ 2021-03-14 21:26                           ` Chris Packham
  2021-03-15  9:46                             ` David Laight
  2021-03-18  5:44                               ` Wolfram Sang
  0 siblings, 2 replies; 57+ messages in thread
From: Chris Packham @ 2021-03-14 21:26 UTC (permalink / raw)
  To: David Laight, 'Guenter Roeck', Wolfram Sang
  Cc: linux-hwmon, jdelvare, linuxppc-dev, linux-kernel, linux-i2c

On 12/03/21 10:25 pm, David Laight wrote:
> From: Linuxppc-dev Guenter Roeck
>> Sent: 11 March 2021 21:35
>>
>> On 3/11/21 1:17 PM, Chris Packham wrote:
>>> On 11/03/21 9:18 pm, Wolfram Sang wrote:
>>>>> Bummer. What is really weird is that you see clock stretching under
>>>>> CPU load. Normally clock stretching is triggered by the device, not
>>>>> by the host.
>>>> One example: Some hosts need an interrupt per byte to know if they
>>>> should send ACK or NACK. If that interrupt is delayed, they stretch the
>>>> clock.
>>>>
>>> It feels like something like that is happening. Looking at the T2080
>>> Reference manual there is an interesting timing diagram (Figure 14-2 if
>>> someone feels like looking it up). It shows SCL low between the ACK for
>>> the address and the data byte. I think if we're delayed in sending the
>>> next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec.
>>>
>> I think that really leaves you only two options that I can see:
>> Rework the driver to handle critical actions (such as setting TXAK,
>> and everything else that might result in clock stretching) in the
>> interrupt handler, or rework the driver to handle everything in
>> a high priority kernel thread.
> I'm not sure a high priority kernel thread will help.
> Without CONFIG_PREEMPT (which has its own set of nasties)
> a RT process won't be scheduled until the processor it last
> ran on does a reschedule.
> I don't think a kernel thread will be any different from a
> user process running under the RT scheduler.
>
> I'm trying to remember the smbus spec (without remembering the I2C one).
For those following along the spec is available here[0]. I know there's 
a 3.0 version[1] as well but the devices I'm dealing with are from a 2.0 
vintage.
> While basically a clock+data bit-bang the slave is allowed to drive
> the clock low to extend a cycle.
> It may be allowed to do this at any point?
 From what I can see it's actually the master extending the clock. Or 
more accurately holding it low between the address and data bytes (which 
from the T2080 reference manual looks expected). I think this may cause 
a strictly compliant SMBUS device to determine that Tlow:mext has been 
violated.
> The master can generate the data at almost any rate (below the maximum)
> but I don't think it can go down to zero.
> But I do remember one of the specs having a timeout.
>
> But I'd have thought the slave should answer the cycle correctly
> regardless of any 'random' delays the master adds in.
Probably depends on the device implementation. I've got multiple other 
I2C/SMBUS devices and the LM81 seems to be the one that objects.
> Unless you are getting away with de-asserting chipselect?
>
> The only implementation I've done is on an FPGA, so it doesn't have
> to worry about interrupt latencies.
> It doesn't actually support clock stretching; it wasn't in the
> code I started from and none of the slaves we need to connect to
> ever does it.
>
> 	David

[0] - http://www.smbus.org/specs/smbus20.pdf
[1] - https://pmbus.org/Assets/PDFS/Public/SMBus_3_0_20141220.pdf

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: Errant readings on LM81 with T2080 SoC
  2021-03-14 21:26                           ` Chris Packham
@ 2021-03-15  9:46                             ` David Laight
  2021-03-18  5:44                               ` Wolfram Sang
  1 sibling, 0 replies; 57+ messages in thread
From: David Laight @ 2021-03-15  9:46 UTC (permalink / raw)
  To: 'Chris Packham', 'Guenter Roeck', Wolfram Sang
  Cc: linux-hwmon, jdelvare, linuxppc-dev, linux-kernel, linux-i2c

From: Chris Packham
> Sent: 14 March 2021 21:26
> 
> On 12/03/21 10:25 pm, David Laight wrote:
> > From: Linuxppc-dev Guenter Roeck
> >> Sent: 11 March 2021 21:35
> >>
> >> On 3/11/21 1:17 PM, Chris Packham wrote:
> >>> On 11/03/21 9:18 pm, Wolfram Sang wrote:
> >>>>> Bummer. What is really weird is that you see clock stretching under
> >>>>> CPU load. Normally clock stretching is triggered by the device, not
> >>>>> by the host.
> >>>> One example: Some hosts need an interrupt per byte to know if they
> >>>> should send ACK or NACK. If that interrupt is delayed, they stretch the
> >>>> clock.
> >>>>
> >>> It feels like something like that is happening. Looking at the T2080
> >>> Reference manual there is an interesting timing diagram (Figure 14-2 if
> >>> someone feels like looking it up). It shows SCL low between the ACK for
> >>> the address and the data byte. I think if we're delayed in sending the
> >>> next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec.
> >>>
> >> I think that really leaves you only two options that I can see:
> >> Rework the driver to handle critical actions (such as setting TXAK,
> >> and everything else that might result in clock stretching) in the
> >> interrupt handler, or rework the driver to handle everything in
> >> a high priority kernel thread.
> >
> > I'm not sure a high priority kernel thread will help.
> > Without CONFIG_PREEMPT (which has its own set of nasties)
> > a RT process won't be scheduled until the processor it last
> > ran on does a reschedule.
> > I don't think a kernel thread will be any different from a
> > user process running under the RT scheduler.
> >
> > I'm trying to remember the smbus spec (without remembering the I2C one).

> For those following along the spec is available here[0]. I know there's
> a 3.0 version[1] as well but the devices I'm dealing with are from a 2.0
> vintage.
> > While basically a clock+data bit-bang the slave is allowed to drive
> > the clock low to extend a cycle.
> > It may be allowed to do this at any point?
>
>  From what I can see it's actually the master extending the clock. Or
> more accurately holding it low between the address and data bytes (which
> from the T2080 reference manual looks expected). I think this may cause
> a strictly compliant SMBUS device to determine that Tlow:mext has been
> violated.

Yes, the spec does seem to assume that if a signal is stable
for 20ms something has gone 'horribly wrong'.
I wasn't worried about that; our fpga does the whole transaction
as a single command.
None of our slaves generate interrupts - so it is purely master/slave.

If you run your process under the RT scheduler it is unlikely
that pre-emption will be delayed by long enough to stop the process
running for 10ms.
I've seen >1ms delays (testing RTP audio), but most of the long
loops have a cond_resched() in them.
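
For anyone following the timing argument, here is a minimal sketch of the
two SMBus 2.0 limits under discussion, assuming I'm reading the spec
correctly: T_TIMEOUT (a device may reset its interface once SCL has been
low for 25 ms, and must have done so by 35 ms) and T_LOW:MEXT (a master may
extend the clock by at most 10 ms cumulatively within one byte). The struct
and function names are made up for illustration and do not come from any
driver.

```c
#include <stdint.h>

/* Limits as given in the SMBus 2.0 timing table, in microseconds. */
#define SMBUS_T_TIMEOUT_MIN_US  25000u  /* device may time out from here on */
#define SMBUS_T_LOW_MEXT_MAX_US 10000u  /* cumulative master extension per byte */

/* Hypothetical measurements for one byte on the wire (illustration only). */
struct byte_timing {
	uint32_t longest_scl_low_us;  /* longest single SCL-low interval */
	uint32_t master_stretch_us;   /* total time the master held SCL low */
};

enum smbus_verdict { SMBUS_OK, SMBUS_MEXT_VIOLATION, SMBUS_TIMEOUT };

/* Classify one byte against the two limits, worst case first. */
static enum smbus_verdict smbus_check_byte(const struct byte_timing *t)
{
	if (t->longest_scl_low_us >= SMBUS_T_TIMEOUT_MIN_US)
		return SMBUS_TIMEOUT;
	if (t->master_stretch_us > SMBUS_T_LOW_MEXT_MAX_US)
		return SMBUS_MEXT_VIOLATION;
	return SMBUS_OK;
}
```

A host that is 12 ms late servicing a byte would land in the
SMBUS_MEXT_VIOLATION case here without ever reaching the 25 ms timeout,
which is one way a strict device could object while a lenient one carries on.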

...

> Probably depends on the device implementation. I've got multiple other
> I2C/SMBUS devices and the LM81 seems to be the one that objects.

I bet most don't implement any of the timeouts.

I found one interesting pmbus device.
Sometimes it would detect a STOP condition because the data line
went high when it tri-stated its output driver in response to the
rising clock edge!
So it saw the same clock edge twice.

> [0] - http://www.smbus.org/specs/smbus20.pdf
> [1] - https://pmbus.org/Assets/PDFS/Public/SMBus_3_0_20141220.pdf

I should have both those - I've copied them to the directory where
I'd look for them first!

	David

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Errant readings on LM81 with T2080 SoC
  2021-03-11 21:34                         ` Guenter Roeck
@ 2021-03-18  3:46                           ` Chris Packham
  -1 siblings, 0 replies; 57+ messages in thread
From: Chris Packham @ 2021-03-18  3:46 UTC (permalink / raw)
  To: Guenter Roeck, Wolfram Sang
  Cc: jdelvare, linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev


On 12/03/21 10:34 am, Guenter Roeck wrote:
> On 3/11/21 1:17 PM, Chris Packham wrote:
>> On 11/03/21 9:18 pm, Wolfram Sang wrote:
>>>> Bummer. What is really weird is that you see clock stretching under
>>>> CPU load. Normally clock stretching is triggered by the device, not
>>>> by the host.
>>> One example: Some hosts need an interrupt per byte to know if they
>>> should send ACK or NACK. If that interrupt is delayed, they stretch the
>>> clock.
>>>
>> It feels like something like that is happening. Looking at the T2080
>> Reference manual there is an interesting timing diagram (Figure 14-2 if
>> someone feels like looking it up). It shows SCL low between the ACK for
>> the address and the data byte. I think if we're delayed in sending the
>> next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec.
>>
> I think that really leaves you only two options that I can see:
> Rework the driver to handle critical actions (such as setting TXAK,
> and everything else that might result in clock stretching) in the
> interrupt handler, or rework the driver to handle everything in
> a high priority kernel thread.
I've made some reasonable progress on making i2c-mpc more interrupt 
driven. Assuming it works out for my use-case is there an opinion on 
making interrupt support mandatory? Looking at all the in-tree dts files 
that use one of the compatible strings from i2c-mpc.c they all have 
interrupt properties so in theory nothing is using the polling mode. But 
there may be some out-of-tree boards or boards using an old dtb that 
would be affected?
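
To make "more interrupt driven" concrete, a rough sketch of the idea
follows: the ACK/NACK decision for a read is taken in the per-byte interrupt
handler, so the bus never waits on a process-context thread being scheduled.
This is a simplified model, not the actual i2c-mpc code; the struct, field
and function names are hypothetical, and the controller's real register
sequencing (when exactly TXAK has to be written) is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-transfer state (not i2c-mpc's real layout). */
struct xfer_state {
	uint8_t *buf;       /* destination buffer */
	size_t   len;       /* total bytes to read */
	size_t   pos;       /* bytes received so far */
	bool     nack_next; /* the next byte received must be NACKed */
	bool     done;      /* transfer complete, waiter can be woken */
};

/* Set up a read; a one-byte read needs NACK armed before the first byte. */
static void xfer_start(struct xfer_state *s, uint8_t *buf, size_t len)
{
	s->buf = buf;
	s->len = len;
	s->pos = 0;
	s->nack_next = (len == 1);
	s->done = (len == 0);
}

/* Called once per "byte received" interrupt with the byte just read. */
static void xfer_byte_irq(struct xfer_state *s, uint8_t byte)
{
	if (s->done)
		return;
	s->buf[s->pos++] = byte;
	if (s->pos == s->len) {
		s->done = true;  /* only now is process context involved */
		return;
	}
	/* Exactly one byte left: arm NACK here, in interrupt context. */
	if (s->len - s->pos == 1)
		s->nack_next = true;
}
```

For a three-byte read, nack_next becomes true in the handler call for the
second byte, and done is set in the call for the third.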

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Errant readings on LM81 with T2080 SoC
  2021-03-18  3:46                           ` Chris Packham
@ 2021-03-18  4:02                             ` Guenter Roeck
  -1 siblings, 0 replies; 57+ messages in thread
From: Guenter Roeck @ 2021-03-18  4:02 UTC (permalink / raw)
  To: Chris Packham, Wolfram Sang
  Cc: jdelvare, linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev

On 3/17/21 8:46 PM, Chris Packham wrote:
> 
> On 12/03/21 10:34 am, Guenter Roeck wrote:
>> On 3/11/21 1:17 PM, Chris Packham wrote:
>>> On 11/03/21 9:18 pm, Wolfram Sang wrote:
>>>>> Bummer. What is really weird is that you see clock stretching under
>>>>> CPU load. Normally clock stretching is triggered by the device, not
>>>>> by the host.
>>>> One example: Some hosts need an interrupt per byte to know if they
>>>> should send ACK or NACK. If that interrupt is delayed, they stretch the
>>>> clock.
>>>>
>>> It feels like something like that is happening. Looking at the T2080
>>> Reference manual there is an interesting timing diagram (Figure 14-2 if
>>> someone feels like looking it up). It shows SCL low between the ACK for
>>> the address and the data byte. I think if we're delayed in sending the
>>> next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec.
>>>
>> I think that really leaves you only two options that I can see:
>> Rework the driver to handle critical actions (such as setting TXAK,
>> and everything else that might result in clock stretching) in the
>> interrupt handler, or rework the driver to handle everything in
>> a high priority kernel thread.
> I've made some reasonable progress on making i2c-mpc more interrupt 
> driven. Assuming it works out for my use-case is there an opinion on 
> making interrupt support mandatory? Looking at all the in-tree dts files 
> that use one of the compatible strings from i2c-mpc.c they all have 
> interrupt properties so in theory nothing is using the polling mode. But 
> there may be some out-of-tree boards or boards using an old dtb that 
> would be affected?
> 

The polling code is from pre-git times. Like 2005 and earlier.
I'd say it is about time to get rid of it. Any out-of-tree users
had more than 15 years to upstream their code, after all.

Guenter

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Errant readings on LM81 with T2080 SoC
  2021-03-18  4:02                             ` Guenter Roeck
@ 2021-03-18  5:39                               ` Wolfram Sang
  -1 siblings, 0 replies; 57+ messages in thread
From: Wolfram Sang @ 2021-03-18  5:39 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Chris Packham, jdelvare, linux-hwmon, linux-kernel, linux-i2c,
	linuxppc-dev

> The polling code is from pre-git times. Like 2005 and earlier.
> I'd say it is about time to get rid of it. Any out-of-tree users
> had more than 15 years to upstream their code, after all.

Parts of the polling mode might be interesting for the atomic_xfer mode
maybe? Which is not implemented yet.
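
As a sketch of what could be reused: an atomic transfer (the
master_xfer_atomic callback in struct i2c_algorithm) cannot sleep or wait
for an interrupt, so each byte has to be awaited with a bounded busy-poll.
Everything below is a stand-alone illustration; the status flag, the
accessor type and the fake device are invented so the loop can be exercised
outside the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_BYTE_DONE 0x01u  /* made-up flag, for illustration only */

/* Hypothetical status-register accessor. */
typedef uint8_t (*read_status_fn)(void *ctx);

/*
 * Poll until the byte-done flag is seen or max_polls reads have been made.
 * A real driver would udelay() between reads; sleeping is not allowed here.
 */
static bool poll_byte_done(read_status_fn read_status, void *ctx,
			   unsigned int max_polls)
{
	while (max_polls--) {
		if (read_status(ctx) & STATUS_BYTE_DONE)
			return true;
	}
	return false;
}

/* Stand-in for hardware: reports done on the ready_after-th read. */
struct fake_dev {
	unsigned int ready_after;
	unsigned int reads;
};

static uint8_t fake_read_status(void *ctx)
{
	struct fake_dev *d = ctx;

	d->reads++;
	return d->reads >= d->ready_after ? STATUS_BYTE_DONE : 0;
}
```

The bound on the loop is the important part: without it a stuck bus would
hang a context that by definition cannot be preempted.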


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Errant readings on LM81 with T2080 SoC
  2021-03-14 21:26                           ` Chris Packham
@ 2021-03-18  5:44                               ` Wolfram Sang
  2021-03-18  5:44                               ` Wolfram Sang
  1 sibling, 0 replies; 57+ messages in thread
From: Wolfram Sang @ 2021-03-18  5:44 UTC (permalink / raw)
  To: Chris Packham
  Cc: David Laight, 'Guenter Roeck',
	linux-hwmon, jdelvare, linuxppc-dev, linux-kernel, linux-i2c

> Probably depends on the device implementation. I've got multiple other 
> I2C/SMBUS devices and the LM81 seems to be the one that objects.

For the record, there was just a similar case with a DA9063, but that
one luckily had a bit to switch from SMBus to I2C mode, i.e. no timeout
handling:

  [PATCH v6 1/1] mfd: da9063: Support SMBus and I2C mode


^ permalink raw reply	[flat|nested] 57+ messages in thread
