* Errant readings on LM81 with T2080 SoC
@ 2021-03-07 22:52 Chris Packham
  2021-03-08  0:31 ` Guenter Roeck
  0 siblings, 1 reply; 30+ messages in thread
From: Chris Packham @ 2021-03-07 22:52 UTC (permalink / raw)
  To: jdelvare, Guenter Roeck
  Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev

Hi,

I've got a system using a PowerPC T2080 SoC which, among other things, has
an LM81 hwmon chip.

Under a high CPU load we see errant readings from the LM81 as well as
actual failures. It's the errant readings that cause the most concern
since we can easily ignore the read errors in our monitoring application
(although it would be better if they weren't there at all).

I'm able to reproduce this by using a test application[0] to artificially
create a high CPU load and then repeatedly checking for the all-1s
values from the LM81 datasheet[1] (page 17). The all-1s readings stick
out as they are obviously higher than the voltage rails that are
connected and disagree with measurements taken with a multimeter.

Here's the output from my device:

[root@linuxbox ~]# cpuload 90&
[root@linuxbox ~]# (while true; do cat /sys/class/hwmon/hwmon0/in*_input | grep '3320\|4383\|6641\|15930\|3586'; sleep 1; done)&
3586
3586
cat: read error: No such device or address
cat: read error: No such device or address
3320
3320
3586
3586
6641
6641
4383
4383

Fundamentally I think this is a problem with the fact that the LM81 is
an SMBus device but the T2080 (and other Freescale SoCs) uses i2c and we
emulate SMBus. I suspect the errant readings happen when we don't get round
to completing the read within the timeout specified by the SMBus
specification. Depending on when that happens we either fail the
transfer or interpret the result as all-1s.

[0] - https://gist.github.com/cpackham/6356a3a943accebb228135dc10daf721
[1] - https://www.ti.com/lit/ds/symlink/lm81.pdf

^ permalink raw reply	[flat|nested] 30+ messages in thread
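The shell loop in the report can also be sketched in Python, which makes it easier to tell the two failure modes apart (a failed read versus a plausible-looking all-1s value). This is only a minimal sketch: the set of suspect values is copied from the grep pattern above rather than derived, the sysfs path is the one from the transcript, and the function names `classify` and `poll_once` are hypothetical.

```python
import glob

# Full-scale ("all-1s") readings in millivolts, copied from the grep
# pattern in the report above; they are not derived here.
ALL_ONES_MV = {3320, 4383, 6641, 15930, 3586}


def classify(raw):
    """Classify one hwmon reading given as text.

    Returns "suspect" for an all-1s value, "ok" for any other integer,
    and "invalid" if the text is not an integer at all.
    """
    try:
        value = int(raw.strip())
    except ValueError:
        return "invalid"
    return "suspect" if value in ALL_ONES_MV else "ok"


def poll_once(pattern="/sys/class/hwmon/hwmon0/in*_input"):
    """Read every matching sysfs attribute once and classify it.

    A failed read (such as the "No such device or address" error in the
    report) is recorded as "error" instead of aborting the poll.
    """
    results = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                results[path] = classify(f.read())
        except OSError:
            results[path] = "error"
    return results
```

Calling `poll_once()` once a second while the CPU load generator runs would give roughly the same information as the shell loop, with read errors and suspect values counted separately.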
* Re: Errant readings on LM81 with T2080 SoC 2021-03-07 22:52 Errant readings on LM81 with T2080 SoC Chris Packham @ 2021-03-08 0:31 ` Guenter Roeck 2021-03-08 2:27 ` Chris Packham 2021-03-09 23:35 ` Chris Packham 0 siblings, 2 replies; 30+ messages in thread From: Guenter Roeck @ 2021-03-08 0:31 UTC (permalink / raw) To: Chris Packham, jdelvare Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev On 3/7/21 2:52 PM, Chris Packham wrote: > Hi, > > I've got a system using a PowerPC T2080 SoC and among other things has > an LM81 hwmon chip. > > Under a high CPU load we see errant readings from the LM81 as well as > actual failures. It's the errant readings that cause the most concern > since we can easily ignore the read errors in our monitoring application > (although it would be better if they weren't there at all). > > I'm able to reproduce this with a test application[0] that artificially > creates a high CPU load then by repeatedly checking for the all-1s > values from the LM81 datasheet[1](page 17). The all-1s readings stick > out as they are obviously higher than the voltage rails that are > connected and disagree with measurements taken with a multimeter. > > Here's the output from my device > > [root@linuxbox ~]# cpuload 90& > [root@linuxbox ~]# (while true; do cat /sys/class/hwmon/hwmon0/in*_input > | grep '3320\|4383\|6641\|15930\|3586'; sleep 1; done)& > 3586 > 3586 > cat: read error: No such device or address > cat: read error: No such device or address > 3320 > 3320 > 3586 > 3586 > 6641 > 6641 > 4383 > 4383 > > Fundamentally I think this is a problem with the fact that the LM81 is > an SMBus device but the T2080 (and other Freescale SoCs) uses i2c and we > emulate SMBus. I suspect the errant readings are when we don't get round > to completing the read within the timeout specified by the SMBus > specification. Depending on when that happens we either fail the > transfer or interpret the result as all-1s. > That is quite unlikely. 
Many sensor chips are SMBus chips connected to i2c busses. It is much more likely that there is a bug in the T2080 i2c driver, that the chip doesn't like the bulk read command issued through regmap, that the chip has problems with the i2c bus speed, or that the i2c bus is noisy. In this context, the "No such device or address" responses are very suspicious. Those are reported by the i2c driver, not by the hwmon driver, and suggest that the chip did not respond to a read request. Maybe it helps to enable debugging to the i2c driver to see if it reports anything useful. Even better might be to connect an i2c bus analyzer to the i2c bus and check what is going on. Guenter ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC 2021-03-08 0:31 ` Guenter Roeck @ 2021-03-08 2:27 ` Chris Packham 2021-03-08 4:37 ` Chris Packham 2021-03-09 23:35 ` Chris Packham 1 sibling, 1 reply; 30+ messages in thread From: Chris Packham @ 2021-03-08 2:27 UTC (permalink / raw) To: Guenter Roeck, jdelvare Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev On 8/03/21 1:31 pm, Guenter Roeck wrote: > On 3/7/21 2:52 PM, Chris Packham wrote: >> Hi, >> >> I've got a system using a PowerPC T2080 SoC and among other things has >> an LM81 hwmon chip. >> >> Under a high CPU load we see errant readings from the LM81 as well as >> actual failures. It's the errant readings that cause the most concern >> since we can easily ignore the read errors in our monitoring application >> (although it would be better if they weren't there at all). >> >> I'm able to reproduce this with a test application[0] that artificially >> creates a high CPU load then by repeatedly checking for the all-1s >> values from the LM81 datasheet[1](page 17). The all-1s readings stick >> out as they are obviously higher than the voltage rails that are >> connected and disagree with measurements taken with a multimeter. >> >> Here's the output from my device >> >> [root@linuxbox ~]# cpuload 90& >> [root@linuxbox ~]# (while true; do cat /sys/class/hwmon/hwmon0/in*_input >> | grep '3320\|4383\|6641\|15930\|3586'; sleep 1; done)& >> 3586 >> 3586 >> cat: read error: No such device or address >> cat: read error: No such device or address >> 3320 >> 3320 >> 3586 >> 3586 >> 6641 >> 6641 >> 4383 >> 4383 >> >> Fundamentally I think this is a problem with the fact that the LM81 is >> an SMBus device but the T2080 (and other Freescale SoCs) uses i2c and we >> emulate SMBus. I suspect the errant readings are when we don't get round >> to completing the read within the timeout specified by the SMBus >> specification. Depending on when that happens we either fail the >> transfer or interpret the result as all-1s. 
>> > That is quite unlikely. Many sensor chips are SMBus chips connected to > i2c busses. It is much more likely that there is a bug in the T2080 i2c driver, > that the chip doesn't like the bulk read command issued through regmap, that > the chip has problems with the i2c bus speed, or that the i2c bus is noisy. Perhaps something gets upset when interrupt processing is delayed because of CPU load. I don't see the problem when there isn't a CPU load so I think that eliminates board issues. > In this context, the "No such device or address" responses are very suspicious. > Those are reported by the i2c driver, not by the hwmon driver, and suggest > that the chip did not respond to a read request. Maybe it helps to enable > debugging to the i2c driver to see if it reports anything useful. Even > better might be to connect an i2c bus analyzer to the i2c bus and check > what is going on. That's from -ENXIO which is used in only one place in i2c-mpc.c. I'll enable some debug and see what we get. > > Guenter ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC 2021-03-08 2:27 ` Chris Packham @ 2021-03-08 4:37 ` Chris Packham 2021-03-08 4:59 ` Guenter Roeck 0 siblings, 1 reply; 30+ messages in thread From: Chris Packham @ 2021-03-08 4:37 UTC (permalink / raw) To: Guenter Roeck, jdelvare Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev On 8/03/21 3:27 pm, Chris Packham wrote: > > On 8/03/21 1:31 pm, Guenter Roeck wrote: >> On 3/7/21 2:52 PM, Chris Packham wrote: >>> Hi, >>> >>> I've got a system using a PowerPC T2080 SoC and among other things has >>> an LM81 hwmon chip. >>> >>> Under a high CPU load we see errant readings from the LM81 as well as >>> actual failures. It's the errant readings that cause the most concern >>> since we can easily ignore the read errors in our monitoring >>> application >>> (although it would be better if they weren't there at all). >>> >>> I'm able to reproduce this with a test application[0] that artificially >>> creates a high CPU load then by repeatedly checking for the all-1s >>> values from the LM81 datasheet[1](page 17). The all-1s readings stick >>> out as they are obviously higher than the voltage rails that are >>> connected and disagree with measurements taken with a multimeter. >>> >>> Here's the output from my device >>> >>> [root@linuxbox ~]# cpuload 90& >>> [root@linuxbox ~]# (while true; do cat >>> /sys/class/hwmon/hwmon0/in*_input >>> | grep '3320\|4383\|6641\|15930\|3586'; sleep 1; done)& >>> 3586 >>> 3586 >>> cat: read error: No such device or address >>> cat: read error: No such device or address >>> 3320 >>> 3320 >>> 3586 >>> 3586 >>> 6641 >>> 6641 >>> 4383 >>> 4383 >>> >>> Fundamentally I think this is a problem with the fact that the LM81 is >>> an SMBus device but the T2080 (and other Freescale SoCs) uses i2c >>> and we >>> emulate SMBus. I suspect the errant readings are when we don't get >>> round >>> to completing the read within the timeout specified by the SMBus >>> specification. 
Depending on when that happens we either fail the >>> transfer or interpret the result as all-1s. >>> >> That is quite unlikely. Many sensor chips are SMBus chips connected to >> i2c busses. It is much more likely that there is a bug in the T2080 >> i2c driver, >> that the chip doesn't like the bulk read command issued through >> regmap, that >> the chip has problems with the i2c bus speed, or that the i2c bus is >> noisy. > Perhaps something gets upset when interrupt processing is delayed > because of CPU load. I don't see the problem when there isn't a CPU > load so I think that eliminates board issues. >> In this context, the "No such device or address" responses are very >> suspicious. >> Those are reported by the i2c driver, not by the hwmon driver, and >> suggest >> that the chip did not respond to a read request. Maybe it helps to >> enable >> debugging to the i2c driver to see if it reports anything useful. Even >> better might be to connect an i2c bus analyzer to the i2c bus and check >> what is going on. > That's from -ENXIO which is used in only one place in i2c-mpc.c. I'll > enable some debug and see what we get. For the errant readings there was nothing abnormal reported by the driver. For the "No such device or address" I saw "mpc-i2c ffe119000.i2c: No RXAK" which matches up with the -ENXIO return. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC
  2021-03-08  4:37 ` Chris Packham
@ 2021-03-08  4:59 ` Guenter Roeck
  2021-03-08 20:27 ` Chris Packham
  2021-03-08 22:10 ` Chris Packham
  0 siblings, 2 replies; 30+ messages in thread
From: Guenter Roeck @ 2021-03-08 4:59 UTC (permalink / raw)
  To: Chris Packham, jdelvare
  Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev

On 3/7/21 8:37 PM, Chris Packham wrote:
[ ... ]
>> That's from -ENXIO which is used in only one place in i2c-mpc.c. I'll
>> enable some debug and see what we get.
>
> For the errant readings there was nothing abnormal reported by the driver.
>
> For the "No such device or address" I saw "mpc-i2c ffe119000.i2c: No
> RXAK" which matches up with the -ENXIO return.
>

I'd suggest checking the time spent waiting for not-busy and for stop in
mpc_xfer(). Those hot loops are unusual, and may well mess up the code,
especially if preempt is enabled. Also, are you using interrupts or
polling in your system?

The interrupt handler looks a bit odd, with "Read again
to allow register to stabilise".

Do you have fsl,timeout set in the devicetree properties and, if so,
have you played with it?

Other than that, the only other real idea I have would be to monitor
the i2c bus.

Guenter

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC 2021-03-08 4:59 ` Guenter Roeck @ 2021-03-08 20:27 ` Chris Packham 2021-03-08 22:39 ` Guenter Roeck 2021-03-10 2:19 ` Chris Packham 2021-03-08 22:10 ` Chris Packham 1 sibling, 2 replies; 30+ messages in thread From: Chris Packham @ 2021-03-08 20:27 UTC (permalink / raw) To: Guenter Roeck, jdelvare Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev On 8/03/21 5:59 pm, Guenter Roeck wrote: > On 3/7/21 8:37 PM, Chris Packham wrote: > [ ... ] >>> That's from -ENXIO which is used in only one place in i2c-mpc.c. I'll >>> enable some debug and see what we get. >> For the errant readings there was nothing abnormal reported by the driver. >> >> For the "No such device or address" I saw "mpc-i2c ffe119000.i2c: No >> RXAK" which matches up with the -ENXIO return. >> > Id suggest to check the time until not busy and stop in mpc_xfer(). > Those hot loops are unusual, and may well mess up the code especially > if preempt is enabled. Also, are you using interrupts or polling in > your system ? I'm using interrupts but I see the same issue if I comment out the interrupts in the dtsi file (i.e. force it to use polling). > The interrupt handler looks a bit odd, with "Read again > to allow register to stabilise". Yeah that stuck out to me too. The code in question predates git, I went spelunking in history.git and the "Read again" seems to be in the initial version[0]. I did try to alter the interrupt handler so that it only does one read but that didn't seem to change anything. > Do you have fsl,timeout set in the devicetree properties and, if so, > have you played with it ? Haven't got it set but I'll have a go at tweaking it. > Other than that, the only other real idea I have would be to monitor > the i2c bus. I am in the fortunate position of being able to go into the office and even happen to have the expensive scope at the moment. Now I just need to find a tame HW engineer so I don't burn myself trying to attach the probes. 
-- [0] - https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=11b3235dc04a306f6a9ba14c1ab621b2d54f2c56 ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC 2021-03-08 20:27 ` Chris Packham @ 2021-03-08 22:39 ` Guenter Roeck 2021-03-10 2:19 ` Chris Packham 1 sibling, 0 replies; 30+ messages in thread From: Guenter Roeck @ 2021-03-08 22:39 UTC (permalink / raw) To: Chris Packham Cc: jdelvare, linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev On Mon, Mar 08, 2021 at 08:27:30PM +0000, Chris Packham wrote: [ ... ] > > Other than that, the only other real idea I have would be to monitor > > the i2c bus. > I am in the fortunate position of being able to go into the office and > even happen to have the expensive scope at the moment. Now I just need > to find a tame HW engineer so I don't burn myself trying to attach the > probes. > A bit unrelated, but you can get scopes connected through usb which are quite low-cost (like in the $100 range) and good enough for i2c testing. Guenter ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC
  2021-03-08 20:27 ` Chris Packham
  2021-03-08 22:39 ` Guenter Roeck
@ 2021-03-10  2:19 ` Chris Packham
  2021-03-10  5:06 ` Guenter Roeck
  1 sibling, 1 reply; 30+ messages in thread
From: Chris Packham @ 2021-03-10 2:19 UTC (permalink / raw)
  To: Guenter Roeck, jdelvare
  Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev

On 9/03/21 9:27 am, Chris Packham wrote:
> On 8/03/21 5:59 pm, Guenter Roeck wrote:
>> Other than that, the only other real idea I have would be to monitor
>> the i2c bus.
> I am in the fortunate position of being able to go into the office and
> even happen to have the expensive scope at the moment. Now I just need
> to find a tame HW engineer so I don't burn myself trying to attach the
> probes.

One thing I see on the scope is that when there is a CPU load there
appears to be some clock stretching going on (SCL is sometimes held
low). I don't see it without the CPU load. It's hard to correlate a
clock stretching event with a bad read or error, but it is one area where
the SMBus spec has a maximum that might cause the device to give up waiting.

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC 2021-03-10 2:19 ` Chris Packham @ 2021-03-10 5:06 ` Guenter Roeck 2021-03-10 21:48 ` Chris Packham 0 siblings, 1 reply; 30+ messages in thread From: Guenter Roeck @ 2021-03-10 5:06 UTC (permalink / raw) To: Chris Packham, jdelvare Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev On 3/9/21 6:19 PM, Chris Packham wrote: > On 9/03/21 9:27 am, Chris Packham wrote: >> On 8/03/21 5:59 pm, Guenter Roeck wrote: >>> Other than that, the only other real idea I have would be to monitor >>> the i2c bus. >> I am in the fortunate position of being able to go into the office and >> even happen to have the expensive scope at the moment. Now I just need >> to find a tame HW engineer so I don't burn myself trying to attach the >> probes. > One thing I see on the scope is that when there is a CPU load there > appears to be some clock stretching going on (SCL is held low some > times). I don't see it without the CPU load. It's hard to correlate a > clock stretching event with a bad read or error but it is one area where > the SMBUS spec has a maximum that might cause the device to give up waiting. > Do you have CONFIG_PREEMPT enabled in your kernel ? But even without that it is possible that the hot loops at the beginning and end of each operation mess up the driver and cause it to sleep longer than intended. Did you try usleep_range() ? On a side note, can you send me a register dump for the lm81 ? It would be useful for my module test code. Thanks, Guenter ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC
  2021-03-10  5:06 ` Guenter Roeck
@ 2021-03-10 21:48 ` Chris Packham
  2021-03-11  7:41 ` Guenter Roeck
  0 siblings, 1 reply; 30+ messages in thread
From: Chris Packham @ 2021-03-10 21:48 UTC (permalink / raw)
  To: Guenter Roeck, jdelvare
  Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev

On 10/03/21 6:06 pm, Guenter Roeck wrote:
> On 3/9/21 6:19 PM, Chris Packham wrote:
>> On 9/03/21 9:27 am, Chris Packham wrote:
>>> On 8/03/21 5:59 pm, Guenter Roeck wrote:
>>>> Other than that, the only other real idea I have would be to monitor
>>>> the i2c bus.
>>> I am in the fortunate position of being able to go into the office and
>>> even happen to have the expensive scope at the moment. Now I just need
>>> to find a tame HW engineer so I don't burn myself trying to attach the
>>> probes.
>> One thing I see on the scope is that when there is a CPU load there
>> appears to be some clock stretching going on (SCL is held low some
>> times). I don't see it without the CPU load. It's hard to correlate a
>> clock stretching event with a bad read or error but it is one area where
>> the SMBUS spec has a maximum that might cause the device to give up waiting.
>>
> Do you have CONFIG_PREEMPT enabled in your kernel ? But even without
> that it is possible that the hot loops at the beginning and end of
> each operation mess up the driver and cause it to sleep longer
> than intended. Did you try usleep_range() ?

I've been running with and without CONFIG_PREEMPT. The failures happen
with both.

I did try usleep_range() and still saw failures.

> On a side note, can you send me a register dump for the lm81 ?
> It would be useful for my module test code.

Here you go this is from a largely unconfigured LM81

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47    GGGGGGGGGGGGGGGG
10: 47 81 24 03 94 00 00 00 00 ff ff ff ff ff ff ff    G?$??...........
20: bf cb c1 00 c0 47 ec 24 ff ff 65 ff 00 ff 00 ff    ???.?G?$..e.....
30: 00 ff 00 ff 00 ff 00 71 a9 7f 7f ff ff 58 01 04    .......q???..X??
40: 01 08 00 00 00 00 00 50 2f 80 80 01 44 00 00 00    ??.....P/???D...
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
90: 00 81 24 03 94 00 00 00 00 ff ff ff ff ff ff ff    .?$??...........
a0: bf cb c1 00 c0 47 ec 24 ff ff 65 ff 00 ff 00 ff    ???.?G?$..e.....
b0: 00 ff 00 ff 00 ff 00 71 a9 7f 7f ff ff 58 01 04    .......q???..X??
c0: 01 00 00 00 00 00 00 50 2f 80 80 01 44 00 00 00    ?......P/???D...
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................

This is from a LM81 that's been configured by our application SW with
limits appropriate for the platform.

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
10: ff 81 24 03 94 00 00 00 00 ff ff ff ff ff ff ff    ..$.............
20: bf cc c1 00 c0 47 ec 1c ff ff 65 dc b4 ff c0 d3    .....G....e.....
30: ad ff 00 d3 ad 4e 40 71 a9 4b 46 ff ff 58 01 04    .....N@q.KF..X..
40: 01 08 00 00 00 00 00 f0 2f 80 80 81 44 80 80 80    ......../...D...
50: 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80    ................
60: 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80    ................
70: 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80    ................
80: 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80    ................
90: 80 81 24 03 94 00 00 00 00 ff ff ff ff ff ff ff    ..$.............
a0: bf cc c1 00 c0 47 ec 1c ff ff 65 dc b4 ff c0 d3    .....G....e.....
b0: ad ff 00 d3 ad 4e 40 71 a9 4b 46 ff ff 58 01 04    .....N@q.KF..X..
c0: 01 00 00 00 00 00 00 f0 2f 80 80 81 44 80 80 80    ......../...D...
d0: 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80    ................
e0: 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80    ................
f0: 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80    ................

^ permalink raw reply	[flat|nested] 30+ messages in thread
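For readers who want to relate the dump to the sysfs values, here is a rough sketch of the conversion. It rests on assumptions that are not confirmed anywhere in this thread: that the six voltage readings sit in registers 0x20 to 0x25, and that each raw byte scales as raw × nominal / 192 (rounded to nearest) with nominal values of 2500, 2700, 3300, 5000, 12000 and 2700 mV. Under that assumption a raw 0xff reproduces 3320, 3586, 4383 and 6641 from the grep pattern in the first message, but gives 15938 rather than 15930 for the 12 V input, so treat the scaling as illustrative. The names `NOMINAL_MV`, `raw_to_mv` and `decode_voltages` are hypothetical.

```python
# Assumed nominal voltages (mV) for registers 0x20..0x25, in order.
NOMINAL_MV = [2500, 2700, 3300, 5000, 12000, 2700]


def raw_to_mv(raw, nominal_mv):
    """Scale one raw 8-bit reading to millivolts.

    Assumes a raw value of 192 corresponds to the nominal voltage and
    rounds to the nearest millivolt using integer arithmetic.
    """
    return (raw * nominal_mv + 96) // 192


def decode_voltages(row_0x20):
    """Decode the first six bytes of the 0x20 row of a register dump.

    row_0x20: the hex bytes of that row as text, e.g. "bf cb c1 00 c0 47".
    Returns a list of millivolt values, one per assumed voltage input.
    """
    raw = [int(b, 16) for b in row_0x20.split()[:6]]
    return [raw_to_mv(r, n) for r, n in zip(raw, NOMINAL_MV)]
```

For example, `decode_voltages("bf cb c1 00 c0 47 ec 24")` decodes the row at 0x20 from the first dump; a result where an input comes out at its full-scale value would correspond to the all-1s readings discussed earlier.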
* Re: Errant readings on LM81 with T2080 SoC 2021-03-10 21:48 ` Chris Packham @ 2021-03-11 7:41 ` Guenter Roeck 2021-03-11 8:18 ` Wolfram Sang 0 siblings, 1 reply; 30+ messages in thread From: Guenter Roeck @ 2021-03-11 7:41 UTC (permalink / raw) To: Chris Packham, jdelvare Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev On 3/10/21 1:48 PM, Chris Packham wrote: > > On 10/03/21 6:06 pm, Guenter Roeck wrote: >> On 3/9/21 6:19 PM, Chris Packham wrote: >>> On 9/03/21 9:27 am, Chris Packham wrote: >>>> On 8/03/21 5:59 pm, Guenter Roeck wrote: >>>>> Other than that, the only other real idea I have would be to monitor >>>>> the i2c bus. >>>> I am in the fortunate position of being able to go into the office and >>>> even happen to have the expensive scope at the moment. Now I just need >>>> to find a tame HW engineer so I don't burn myself trying to attach the >>>> probes. >>> One thing I see on the scope is that when there is a CPU load there >>> appears to be some clock stretching going on (SCL is held low some >>> times). I don't see it without the CPU load. It's hard to correlate a >>> clock stretching event with a bad read or error but it is one area where >>> the SMBUS spec has a maximum that might cause the device to give up waiting. >>> >> Do you have CONFIG_PREEMPT enabled in your kernel ? But even without >> that it is possible that the hot loops at the beginning and end of >> each operation mess up the driver and cause it to sleep longer >> than intended. Did you try usleep_range() ? > > I've been running with and without CONFIG_PREEMPT. The failures happen > with both. > > I did try usleep_range() and still saw failures. > Bummer. What is really weird is that you see clock stretching under CPU load. Normally clock stretching is triggered by the device, not by the host. I wonder if there are some timing differences before the clock stretching happens. Anyway, I just sent a set of three patches to the list; maybe you can give it a try. 
The patches convert the driver to the with_info API and drop local caching. The code is module tested with the register dumps I have available for adm9240 and lm81, but it would be great to get test coverage on real hardware. I don't really expect it to solve your problem, but it does reduce and modify the load on the chip (because registers are no longer read in bursts), so it may have some positive impact. >> On a side note, can you send me a register dump for the lm81 ? >> It would be useful for my module test code. > > Here you go this is from a largely unconfigured LM81 > Thanks, that helped a lot! Guenter ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC 2021-03-11 7:41 ` Guenter Roeck @ 2021-03-11 8:18 ` Wolfram Sang 2021-03-11 15:19 ` Guenter Roeck 2021-03-11 21:17 ` Chris Packham 0 siblings, 2 replies; 30+ messages in thread From: Wolfram Sang @ 2021-03-11 8:18 UTC (permalink / raw) To: Guenter Roeck Cc: Chris Packham, jdelvare, linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev > Bummer. What is really weird is that you see clock stretching under > CPU load. Normally clock stretching is triggered by the device, not > by the host. One example: Some hosts need an interrupt per byte to know if they should send ACK or NACK. If that interrupt is delayed, they stretch the clock. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC 2021-03-11 8:18 ` Wolfram Sang @ 2021-03-11 15:19 ` Guenter Roeck 2021-03-11 21:17 ` Chris Packham 1 sibling, 0 replies; 30+ messages in thread From: Guenter Roeck @ 2021-03-11 15:19 UTC (permalink / raw) To: Wolfram Sang Cc: Chris Packham, jdelvare, linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev On 3/11/21 12:18 AM, Wolfram Sang wrote: > >> Bummer. What is really weird is that you see clock stretching under >> CPU load. Normally clock stretching is triggered by the device, not >> by the host. > > One example: Some hosts need an interrupt per byte to know if they > should send ACK or NACK. If that interrupt is delayed, they stretch the > clock. > Indeed, the i2c-mpc driver sends TXAK (only) after receiving that interrupt. Since that is running in the context of the user process, that may well be delayed substantially on a loaded system. Maybe the interrupt handler will need to play a more active role in the i2c-mpc driver. Alternatively, the transfer function could be handled by a high priority kernel thread. Guenter ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC 2021-03-11 8:18 ` Wolfram Sang 2021-03-11 15:19 ` Guenter Roeck @ 2021-03-11 21:17 ` Chris Packham 2021-03-11 21:34 ` Guenter Roeck 1 sibling, 1 reply; 30+ messages in thread From: Chris Packham @ 2021-03-11 21:17 UTC (permalink / raw) To: Wolfram Sang, Guenter Roeck Cc: jdelvare, linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev On 11/03/21 9:18 pm, Wolfram Sang wrote: >> Bummer. What is really weird is that you see clock stretching under >> CPU load. Normally clock stretching is triggered by the device, not >> by the host. > One example: Some hosts need an interrupt per byte to know if they > should send ACK or NACK. If that interrupt is delayed, they stretch the > clock. > It feels like something like that is happening. Looking at the T2080 Reference manual there is an interesting timing diagram (Figure 14-2 if someone feels like looking it up). It shows SCL low between the ACK for the address and the data byte. I think if we're delayed in sending the next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec. ^ permalink raw reply [flat|nested] 30+ messages in thread
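The two SMBus limits mentioned above can be turned into a quick check for durations read off a scope capture. This is a sketch under stated assumptions: the limit values used here (25 ms as the minimum Ttimeout, 10 ms for Tlow:mext accumulated within one byte) are the commonly cited figures and should be verified against the SMBus specification revision that applies, and the function name `check_byte` and its input format are hypothetical.

```python
# Assumed limits in milliseconds; verify against the SMBus specification.
T_TIMEOUT_MIN_MS = 25.0  # a device may time out once SCL is low this long
T_LOW_MEXT_MS = 10.0     # cumulative master clock-low extension per byte


def check_byte(low_extensions_ms):
    """Check the SCL-low extensions measured within one byte transfer.

    low_extensions_ms: durations (ms) for which the master held SCL low
    beyond the normal clock period while moving one byte.
    Returns the names of the violated limits (empty list if none).
    """
    violations = []
    # A single long low period can trip the device's clock-low timeout.
    if any(d >= T_TIMEOUT_MIN_MS for d in low_extensions_ms):
        violations.append("Ttimeout")
    # Several shorter stretches can still add up past the per-byte budget.
    if sum(low_extensions_ms) > T_LOW_MEXT_MS:
        violations.append("Tlow:mext")
    return violations
```

Feeding it the stretch durations measured under CPU load would show whether the delays are in the range where an SMBus device is allowed to give up, which is the hypothesis being discussed here.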
* Re: Errant readings on LM81 with T2080 SoC 2021-03-11 21:17 ` Chris Packham @ 2021-03-11 21:34 ` Guenter Roeck 2021-03-11 23:47 ` Chris Packham ` (2 more replies) 0 siblings, 3 replies; 30+ messages in thread From: Guenter Roeck @ 2021-03-11 21:34 UTC (permalink / raw) To: Chris Packham, Wolfram Sang Cc: jdelvare, linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev On 3/11/21 1:17 PM, Chris Packham wrote: > > On 11/03/21 9:18 pm, Wolfram Sang wrote: >>> Bummer. What is really weird is that you see clock stretching under >>> CPU load. Normally clock stretching is triggered by the device, not >>> by the host. >> One example: Some hosts need an interrupt per byte to know if they >> should send ACK or NACK. If that interrupt is delayed, they stretch the >> clock. >> > It feels like something like that is happening. Looking at the T2080 > Reference manual there is an interesting timing diagram (Figure 14-2 if > someone feels like looking it up). It shows SCL low between the ACK for > the address and the data byte. I think if we're delayed in sending the > next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec. > I think that really leaves you only two options that I can see: Rework the driver to handle critical actions (such as setting TXAK, and everything else that might result in clock stretching) in the interrupt handler, or rework the driver to handle everything in a high priority kernel thread. Guenter ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC 2021-03-11 21:34 ` Guenter Roeck @ 2021-03-11 23:47 ` Chris Packham 2021-03-12 0:07 ` Guenter Roeck 2021-03-12 9:25 ` David Laight 2021-03-18 3:46 ` Chris Packham 2 siblings, 1 reply; 30+ messages in thread From: Chris Packham @ 2021-03-11 23:47 UTC (permalink / raw) To: Guenter Roeck, Wolfram Sang Cc: jdelvare, linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev On 12/03/21 10:34 am, Guenter Roeck wrote: > On 3/11/21 1:17 PM, Chris Packham wrote: >> On 11/03/21 9:18 pm, Wolfram Sang wrote: >>>> Bummer. What is really weird is that you see clock stretching under >>>> CPU load. Normally clock stretching is triggered by the device, not >>>> by the host. >>> One example: Some hosts need an interrupt per byte to know if they >>> should send ACK or NACK. If that interrupt is delayed, they stretch the >>> clock. >>> >> It feels like something like that is happening. Looking at the T2080 >> Reference manual there is an interesting timing diagram (Figure 14-2 if >> someone feels like looking it up). It shows SCL low between the ACK for >> the address and the data byte. I think if we're delayed in sending the >> next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec. >> > I think that really leaves you only two options that I can see: > Rework the driver to handle critical actions (such as setting TXAK, > and everything else that might result in clock stretching) in the > interrupt handler, or rework the driver to handle everything in > a high priority kernel thread. One thing I've found that does seem to avoid the problem is to disable preemption, use polling and replace the schedule() in i2c_wait() with udelay(50). That's kind of like the kernel thread option. > Guenter ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC 2021-03-11 23:47 ` Chris Packham @ 2021-03-12 0:07 ` Guenter Roeck 2021-03-12 0:19 ` Chris Packham 0 siblings, 1 reply; 30+ messages in thread From: Guenter Roeck @ 2021-03-12 0:07 UTC (permalink / raw) To: Chris Packham, Wolfram Sang Cc: jdelvare, linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev On 3/11/21 3:47 PM, Chris Packham wrote: > > On 12/03/21 10:34 am, Guenter Roeck wrote: >> On 3/11/21 1:17 PM, Chris Packham wrote: >>> On 11/03/21 9:18 pm, Wolfram Sang wrote: >>>>> Bummer. What is really weird is that you see clock stretching under >>>>> CPU load. Normally clock stretching is triggered by the device, not >>>>> by the host. >>>> One example: Some hosts need an interrupt per byte to know if they >>>> should send ACK or NACK. If that interrupt is delayed, they stretch the >>>> clock. >>>> >>> It feels like something like that is happening. Looking at the T2080 >>> Reference manual there is an interesting timing diagram (Figure 14-2 if >>> someone feels like looking it up). It shows SCL low between the ACK for >>> the address and the data byte. I think if we're delayed in sending the >>> next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec. >>> >> I think that really leaves you only two options that I can see: >> Rework the driver to handle critical actions (such as setting TXAK, >> and everything else that might result in clock stretching) in the >> interrupt handler, or rework the driver to handle everything in >> a high priority kernel thread. > One thing I've found that does seem to avoid the problem is to disable > preemption, use polling and replace the schedule() in i2c_wait() with > udelay(50). That's kind of like the kernel thread option. It is kind of hackish, though, especially since it makes the "loaded system" situation even worse by adding even more active wait loops. Guenter ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC 2021-03-12 0:07 ` Guenter Roeck @ 2021-03-12 0:19 ` Chris Packham 0 siblings, 0 replies; 30+ messages in thread From: Chris Packham @ 2021-03-12 0:19 UTC (permalink / raw) To: Guenter Roeck, Wolfram Sang Cc: jdelvare, linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev On 12/03/21 1:07 pm, Guenter Roeck wrote: > On 3/11/21 3:47 PM, Chris Packham wrote: >> On 12/03/21 10:34 am, Guenter Roeck wrote: >>> On 3/11/21 1:17 PM, Chris Packham wrote: >>>> On 11/03/21 9:18 pm, Wolfram Sang wrote: >>>>>> Bummer. What is really weird is that you see clock stretching under >>>>>> CPU load. Normally clock stretching is triggered by the device, not >>>>>> by the host. >>>>> One example: Some hosts need an interrupt per byte to know if they >>>>> should send ACK or NACK. If that interrupt is delayed, they stretch the >>>>> clock. >>>>> >>>> It feels like something like that is happening. Looking at the T2080 >>>> Reference manual there is an interesting timing diagram (Figure 14-2 if >>>> someone feels like looking it up). It shows SCL low between the ACK for >>>> the address and the data byte. I think if we're delayed in sending the >>>> next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec. >>>> >>> I think that really leaves you only two options that I can see: >>> Rework the driver to handle critical actions (such as setting TXAK, >>> and everything else that might result in clock stretching) in the >>> interrupt handler, or rework the driver to handle everything in >>> a high priority kernel thread. >> One thing I've found that does seem to avoid the problem is to disable >> preemption, use polling and replace the schedule() in i2c_wait() with >> udelay(50). That's kind of like the kernel thread option. > It is kind of hackish, though, especially since it makes the "loaded system" > situation even worse by adding even more active wait loops. No -ish about it :). 
But it might put out one fire for me while I'm looking at doing some kind of interrupt driven state machine. ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: Errant readings on LM81 with T2080 SoC 2021-03-11 21:34 ` Guenter Roeck 2021-03-11 23:47 ` Chris Packham @ 2021-03-12 9:25 ` David Laight 2021-03-14 21:26 ` Chris Packham 2021-03-18 3:46 ` Chris Packham 2 siblings, 1 reply; 30+ messages in thread From: David Laight @ 2021-03-12 9:25 UTC (permalink / raw) To: 'Guenter Roeck', Chris Packham, Wolfram Sang Cc: linux-hwmon, jdelvare, linuxppc-dev, linux-kernel, linux-i2c From: Linuxppc-dev Guenter Roeck > Sent: 11 March 2021 21:35 > > On 3/11/21 1:17 PM, Chris Packham wrote: > > > > On 11/03/21 9:18 pm, Wolfram Sang wrote: > >>> Bummer. What is really weird is that you see clock stretching under > >>> CPU load. Normally clock stretching is triggered by the device, not > >>> by the host. > >> One example: Some hosts need an interrupt per byte to know if they > >> should send ACK or NACK. If that interrupt is delayed, they stretch the > >> clock. > >> > > It feels like something like that is happening. Looking at the T2080 > > Reference manual there is an interesting timing diagram (Figure 14-2 if > > someone feels like looking it up). It shows SCL low between the ACK for > > the address and the data byte. I think if we're delayed in sending the > > next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec. > > > > I think that really leaves you only two options that I can see: > Rework the driver to handle critical actions (such as setting TXAK, > and everything else that might result in clock stretching) in the > interrupt handler, or rework the driver to handle everything in > a high priority kernel thread. I'm not sure a high priority kernel thread will help. Without CONFIG_PREEMPT (which has its own set of nasties) a RT process won't be scheduled until the processor it last ran on does a reschedule. I don't think a kernel thread will be any different from a user process running under the RT scheduler. I'm trying to remember the smbus spec (without remembering the I2C one). 
While basically a clock+data bit-bang, the slave is allowed to drive the clock low to extend a cycle. It may be allowed to do this at any point? The master can generate the data at almost any rate (below the maximum) but I don't think it can go down to zero. But I do remember one of the specs having a timeout. But I'd have thought the slave should answer the cycle correctly regardless of any 'random' delays the master adds in. Unless you are getting away with de-asserting chipselect? The only implementation I've done is on an FPGA, so it doesn't have to worry about interrupt latencies. It doesn't actually support clock stretching; it wasn't in the code I started from and none of the slaves we need to connect to ever does it. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales) ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC 2021-03-12 9:25 ` David Laight @ 2021-03-14 21:26 ` Chris Packham 2021-03-15 9:46 ` David Laight 2021-03-18 5:44 ` Wolfram Sang 0 siblings, 2 replies; 30+ messages in thread From: Chris Packham @ 2021-03-14 21:26 UTC (permalink / raw) To: David Laight, 'Guenter Roeck', Wolfram Sang Cc: linux-hwmon, jdelvare, linuxppc-dev, linux-kernel, linux-i2c On 12/03/21 10:25 pm, David Laight wrote: > From: Linuxppc-dev Guenter Roeck >> Sent: 11 March 2021 21:35 >> >> On 3/11/21 1:17 PM, Chris Packham wrote: >>> On 11/03/21 9:18 pm, Wolfram Sang wrote: >>>>> Bummer. What is really weird is that you see clock stretching under >>>>> CPU load. Normally clock stretching is triggered by the device, not >>>>> by the host. >>>> One example: Some hosts need an interrupt per byte to know if they >>>> should send ACK or NACK. If that interrupt is delayed, they stretch the >>>> clock. >>>> >>> It feels like something like that is happening. Looking at the T2080 >>> Reference manual there is an interesting timing diagram (Figure 14-2 if >>> someone feels like looking it up). It shows SCL low between the ACK for >>> the address and the data byte. I think if we're delayed in sending the >>> next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec. >>> >> I think that really leaves you only two options that I can see: >> Rework the driver to handle critical actions (such as setting TXAK, >> and everything else that might result in clock stretching) in the >> interrupt handler, or rework the driver to handle everything in >> a high priority kernel thread. > I'm not sure a high priority kernel thread will help. > Without CONFIG_PREEMPT (which has its own set of nasties) > a RT process won't be scheduled until the processor it last > ran on does a reschedule. > I don't think a kernel thread will be any different from a > user process running under the RT scheduler. 
> > I'm trying to remember the smbus spec (without remembering the I2C one). For those following along the spec is available here[0]. I know there's a 3.0 version[1] as well but the devices I'm dealing with are from a 2.0 vintage. > While basically a clock+data bit-bang the slave is allowed to drive > the clock low to extend a cycle. > It may be allowed to do this at any point? From what I can see it's actually the master extending the clock. Or more accurately holding it low between the address and data bytes (which from the T2080 reference manual looks expected). I think this may cause a strictly compliant SMBUS device to determine that Tlow:mext has been violated. > The master can generate the data at almost any rate (below the maximum) > but I don't think it can go down to zero. > But I do remember one of the specs having a timeout. > > But I'd have thought the slave should answer the cycle correctly > regardless of any 'random' delays the master adds in. Probably depends on the device implementation. I've got multiple other I2C/SMBUS devices and the LM81 seems to be the one that objects. > Unless you are getting away with de-asserting chipselect? > > The only implementation I've done is one an FPGA so doesn't have > worry about interrupt latencies. > It doesn't actually support clock stretching; it wasn't in the > code I started from and none of the slaves we need to connect to > ever does it. > > David [0] - http://www.smbus.org/specs/smbus20.pdf [1] - https://pmbus.org/Assets/PDFS/Public/SMBus_3_0_20141220.pdf > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > Registration No: 1397386 (Wales) > ^ permalink raw reply [flat|nested] 30+ messages in thread
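The timing concern raised above can be made concrete with a small userspace sketch. It assumes the limits I recall from the SMBus 2.0 timing table (Tlow:mext at most 10 ms of cumulative master clock-low extension per byte, Ttimeout between 25 ms and 35 ms); check them against the spec linked as [0] before relying on them. The function and enum names are hypothetical, and nothing in the thread confirms which of these checks the LM81 actually implements.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SMBus 2.0 limits, in microseconds (verify against the spec). */
#define SMBUS_TLOW_MEXT_MAX_US 10000u /* master clock-low extension per byte */
#define SMBUS_TTIMEOUT_MIN_US  25000u /* a device may time out from here on */
#define SMBUS_TTIMEOUT_MAX_US  35000u /* a device must have reset by here */

/* Hypothetical verdicts for one continuous SCL-low period held by the master. */
enum smbus_verdict {
    SMBUS_OK,
    SMBUS_MEXT_EXCEEDED,
    SMBUS_DEVICE_MAY_TIMEOUT,
    SMBUS_DEVICE_MUST_RESET
};

/* Classify how long the master kept SCL low between two bytes. */
static enum smbus_verdict classify_scl_low(uint32_t low_us)
{
    if (low_us >= SMBUS_TTIMEOUT_MAX_US)
        return SMBUS_DEVICE_MUST_RESET;
    if (low_us >= SMBUS_TTIMEOUT_MIN_US)
        return SMBUS_DEVICE_MAY_TIMEOUT;
    if (low_us > SMBUS_TLOW_MEXT_MAX_US)
        return SMBUS_MEXT_EXCEEDED;
    return SMBUS_OK;
}

/* Usage example: a 50 us gap is fine, a 12 ms stall already breaks Tlow:mext. */
void smbus_timing_demo(void)
{
    assert(classify_scl_low(50) == SMBUS_OK);
    assert(classify_scl_low(12000) == SMBUS_MEXT_EXCEEDED);
}
```

Under these assumptions, a driver that is scheduled out for more than 10 ms between the address byte and the data byte is already outside what a strictly compliant device has to tolerate, well before the 25 ms point at which the device is allowed to abandon the transfer.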
* RE: Errant readings on LM81 with T2080 SoC 2021-03-14 21:26 ` Chris Packham @ 2021-03-15 9:46 ` David Laight 2021-03-18 5:44 ` Wolfram Sang 1 sibling, 0 replies; 30+ messages in thread From: David Laight @ 2021-03-15 9:46 UTC (permalink / raw) To: 'Chris Packham', 'Guenter Roeck', Wolfram Sang Cc: linux-hwmon, jdelvare, linuxppc-dev, linux-kernel, linux-i2c From: Chris Packham > Sent: 14 March 2021 21:26 > > On 12/03/21 10:25 pm, David Laight wrote: > > From: Linuxppc-dev Guenter Roeck > >> Sent: 11 March 2021 21:35 > >> > >> On 3/11/21 1:17 PM, Chris Packham wrote: > >>> On 11/03/21 9:18 pm, Wolfram Sang wrote: > >>>>> Bummer. What is really weird is that you see clock stretching under > >>>>> CPU load. Normally clock stretching is triggered by the device, not > >>>>> by the host. > >>>> One example: Some hosts need an interrupt per byte to know if they > >>>> should send ACK or NACK. If that interrupt is delayed, they stretch the > >>>> clock. > >>>> > >>> It feels like something like that is happening. Looking at the T2080 > >>> Reference manual there is an interesting timing diagram (Figure 14-2 if > >>> someone feels like looking it up). It shows SCL low between the ACK for > >>> the address and the data byte. I think if we're delayed in sending the > >>> next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec. > >>> > >> I think that really leaves you only two options that I can see: > >> Rework the driver to handle critical actions (such as setting TXAK, > >> and everything else that might result in clock stretching) in the > >> interrupt handler, or rework the driver to handle everything in > >> a high priority kernel thread. > > > > I'm not sure a high priority kernel thread will help. > > Without CONFIG_PREEMPT (which has its own set of nasties) > > a RT process won't be scheduled until the processor it last > > ran on does a reschedule. 
> > I don't think a kernel thread will be any different from a > > user process running under the RT scheduler. > > > > I'm trying to remember the smbus spec (without remembering the I2C one). > For those following along the spec is available here[0]. I know there's > a 3.0 version[1] as well but the devices I'm dealing with are from a 2.0 > vintage. > > While basically a clock+data bit-bang the slave is allowed to drive > > the clock low to extend a cycle. > > It may be allowed to do this at any point? > > From what I can see it's actually the master extending the clock. Or > more accurately holding it low between the address and data bytes (which > from the T2080 reference manual looks expected). I think this may cause > a strictly compliant SMBUS device to determine that Tlow:mext has been > violated. Yes, the spec does seem to assume that if a signal is stable for 20ms something has gone 'horribly wrong'. I wasn't worried about that; our FPGA does the whole transaction as a single command. None of our slaves generate interrupts - so it is purely master/slave. If you run your process under the RT scheduler it is unlikely that pre-emption will be delayed by long enough to stop the process running for 10ms. I've seen >1ms delays (testing RTP audio), but most of the long loops have a cond_resched() in them. ... > Probably depends on the device implementation. I've got multiple other > I2C/SMBUS devices and the LM81 seems to be the one that objects. I bet most don't implement any of the timeouts. I found one interesting pmbus device. Sometimes it would detect a STOP condition because the data line went high when it tri-stated its output driver in response to the rising clock edge! So it saw the same clock edge twice. > [0] - http://www.smbus.org/specs/smbus20.pdf > [1] - https://pmbus.org/Assets/PDFS/Public/SMBus_3_0_20141220.pdf I should have both those - I've copied them to the directory where I'd look for them first!
David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales) ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC 2021-03-14 21:26 ` Chris Packham 2021-03-15 9:46 ` David Laight @ 2021-03-18 5:44 ` Wolfram Sang 1 sibling, 0 replies; 30+ messages in thread From: Wolfram Sang @ 2021-03-18 5:44 UTC (permalink / raw) To: Chris Packham Cc: David Laight, 'Guenter Roeck', linux-hwmon, jdelvare, linuxppc-dev, linux-kernel, linux-i2c > Probably depends on the device implementation. I've got multiple other > I2C/SMBUS devices and the LM81 seems to be the one that objects. For the record, there was just a similar case with a DA9063, but that one luckily had a bit to switch from SMBus to I2C mode, i.e. no timeout handling: [PATCH v6 1/1] mfd: da9063: Support SMBus and I2C mode ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC 2021-03-11 21:34 ` Guenter Roeck 2021-03-11 23:47 ` Chris Packham 2021-03-12 9:25 ` David Laight @ 2021-03-18 3:46 ` Chris Packham 2021-03-18 4:02 ` Guenter Roeck 2 siblings, 1 reply; 30+ messages in thread From: Chris Packham @ 2021-03-18 3:46 UTC (permalink / raw) To: Guenter Roeck, Wolfram Sang Cc: jdelvare, linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev On 12/03/21 10:34 am, Guenter Roeck wrote: > On 3/11/21 1:17 PM, Chris Packham wrote: >> On 11/03/21 9:18 pm, Wolfram Sang wrote: >>>> Bummer. What is really weird is that you see clock stretching under >>>> CPU load. Normally clock stretching is triggered by the device, not >>>> by the host. >>> One example: Some hosts need an interrupt per byte to know if they >>> should send ACK or NACK. If that interrupt is delayed, they stretch the >>> clock. >>> >> It feels like something like that is happening. Looking at the T2080 >> Reference manual there is an interesting timing diagram (Figure 14-2 if >> someone feels like looking it up). It shows SCL low between the ACK for >> the address and the data byte. I think if we're delayed in sending the >> next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec. >> > I think that really leaves you only two options that I can see: > Rework the driver to handle critical actions (such as setting TXAK, > and everything else that might result in clock stretching) in the > interrupt handler, or rework the driver to handle everything in > a high priority kernel thread. I've made some reasonable progress on making i2c-mpc more interrupt driven. Assuming it works out for my use-case is there an opinion on making interrupt support mandatory? Looking at all the in-tree dts files that use one of the compatible strings from i2c-mpc.c they all have interrupt properties so in theory nothing is using the polling mode. But there may be some out-of-tree boards or boards using an old dtb that would be affected? 
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC 2021-03-18 3:46 ` Chris Packham @ 2021-03-18 4:02 ` Guenter Roeck 2021-03-18 5:39 ` Wolfram Sang 0 siblings, 1 reply; 30+ messages in thread From: Guenter Roeck @ 2021-03-18 4:02 UTC (permalink / raw) To: Chris Packham, Wolfram Sang Cc: jdelvare, linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev On 3/17/21 8:46 PM, Chris Packham wrote: > > On 12/03/21 10:34 am, Guenter Roeck wrote: >> On 3/11/21 1:17 PM, Chris Packham wrote: >>> On 11/03/21 9:18 pm, Wolfram Sang wrote: >>>>> Bummer. What is really weird is that you see clock stretching under >>>>> CPU load. Normally clock stretching is triggered by the device, not >>>>> by the host. >>>> One example: Some hosts need an interrupt per byte to know if they >>>> should send ACK or NACK. If that interrupt is delayed, they stretch the >>>> clock. >>>> >>> It feels like something like that is happening. Looking at the T2080 >>> Reference manual there is an interesting timing diagram (Figure 14-2 if >>> someone feels like looking it up). It shows SCL low between the ACK for >>> the address and the data byte. I think if we're delayed in sending the >>> next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec. >>> >> I think that really leaves you only two options that I can see: >> Rework the driver to handle critical actions (such as setting TXAK, >> and everything else that might result in clock stretching) in the >> interrupt handler, or rework the driver to handle everything in >> a high priority kernel thread. > I've made some reasonable progress on making i2c-mpc more interrupt > driven. Assuming it works out for my use-case is there an opinion on > making interrupt support mandatory? Looking at all the in-tree dts files > that use one of the compatible strings from i2c-mpc.c they all have > interrupt properties so in theory nothing is using the polling mode. But > there may be some out-of-tree boards or boards using an old dtb that > would be affected? 
> The polling code is from pre-git times. Like 2005 and earlier. I'd say it is about time to get rid of it. Any out-of-tree users had more than 15 years to upstream their code, after all. Guenter ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC 2021-03-18 4:02 ` Guenter Roeck @ 2021-03-18 5:39 ` Wolfram Sang 0 siblings, 0 replies; 30+ messages in thread From: Wolfram Sang @ 2021-03-18 5:39 UTC (permalink / raw) To: Guenter Roeck Cc: Chris Packham, jdelvare, linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev > The polling code is from pre-git times. Like 2005 and earlier. > I'd say it is about time to get rid of it. Any out-of-tree users > had more than 15 years to upstream their code, after all. Parts of the polling mode might be interesting for the atomic_xfer mode, maybe? That mode is not implemented yet. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC 2021-03-08 4:59 ` Guenter Roeck 2021-03-08 20:27 ` Chris Packham @ 2021-03-08 22:10 ` Chris Packham 2021-03-09 4:36 ` Chris Packham 1 sibling, 1 reply; 30+ messages in thread From: Chris Packham @ 2021-03-08 22:10 UTC (permalink / raw) To: Guenter Roeck, jdelvare Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev On 8/03/21 5:59 pm, Guenter Roeck wrote: > On 3/7/21 8:37 PM, Chris Packham wrote: > [ ... ] >>> That's from -ENXIO which is used in only one place in i2c-mpc.c. I'll >>> enable some debug and see what we get. >> For the errant readings there was nothing abnormal reported by the driver. >> >> For the "No such device or address" I saw "mpc-i2c ffe119000.i2c: No >> RXAK" which matches up with the -ENXIO return. >> > I'd suggest to check the time until not busy and stop in mpc_xfer(). > Those hot loops are unusual, and may well mess up the code especially > if preempt is enabled. Reworking those loops seems to have had a positive result. I'll do a bit more testing and hopefully get a patch out later today. > Also, are you using interrupts or polling in > your system? The interrupt handler looks a bit odd, with "Read again > to allow register to stabilise". > > Do you have fsl,timeout set in the devicetree properties and, if so, > have you played with it? > > Other than that, the only other real idea I have would be to monitor > the i2c bus. > > Guenter ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC 2021-03-08 22:10 ` Chris Packham @ 2021-03-09 4:36 ` Chris Packham 2021-03-09 5:24 ` Guenter Roeck 0 siblings, 1 reply; 30+ messages in thread From: Chris Packham @ 2021-03-09 4:36 UTC (permalink / raw) To: Guenter Roeck, jdelvare Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev On 9/03/21 11:10 am, Chris Packham wrote: > > On 8/03/21 5:59 pm, Guenter Roeck wrote: >> On 3/7/21 8:37 PM, Chris Packham wrote: >> [ ... ] >>>> That's from -ENXIO which is used in only one place in i2c-mpc.c. I'll >>>> enable some debug and see what we get. >>> For the errant readings there was nothing abnormal reported by the >>> driver. >>> >>> For the "No such device or address" I saw "mpc-i2c ffe119000.i2c: No >>> RXAK" which matches up with the -ENXIO return. >>> >> I'd suggest to check the time until not busy and stop in mpc_xfer(). >> Those hot loops are unusual, and may well mess up the code especially >> if preempt is enabled. > Reworking those loops seems to have had a positive result. I'll do a > bit more testing and hopefully get a patch out later today. D'oh, my "fix" was to replace the cond_resched() with msleep(10), which did "fix" the problem but made every i2c read slow. I didn't notice when testing just the LM81, but as soon as I booted the system with more i2c devices I saw stupidly slow boot times. >> Also, are you using interrupts or polling in >> your system? The interrupt handler looks a bit odd, with "Read again >> to allow register to stabilise". >> >> Do you have fsl,timeout set in the devicetree properties and, if so, >> have you played with it? >> >> Other than that, the only other real idea I have would be to monitor >> the i2c bus. >> >> Guenter ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC 2021-03-09 4:36 ` Chris Packham @ 2021-03-09 5:24 ` Guenter Roeck 0 siblings, 0 replies; 30+ messages in thread From: Guenter Roeck @ 2021-03-09 5:24 UTC (permalink / raw) To: Chris Packham, jdelvare Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev On 3/8/21 8:36 PM, Chris Packham wrote: > > On 9/03/21 11:10 am, Chris Packham wrote: >> >> On 8/03/21 5:59 pm, Guenter Roeck wrote: >>> On 3/7/21 8:37 PM, Chris Packham wrote: >>> [ ... ] >>>>> That's from -ENXIO which is used in only one place in i2c-mpc.c. I'll >>>>> enable some debug and see what we get. >>>> For the errant readings there was nothing abnormal reported by the >>>> driver. >>>> >>>> For the "No such device or address" I saw "mpc-i2c ffe119000.i2c: No >>>> RXAK" which matches up with the -ENXIO return. >>>> >>> I'd suggest to check the time until not busy and stop in mpc_xfer(). >>> Those hot loops are unusual, and may well mess up the code especially >>> if preempt is enabled. >> Reworking those loops seems to have had a positive result. I'll do a >> bit more testing and hopefully get a patch out later today. > D'oh my "fix" was to replace the cond_resched() with msleep(10) which did > "fix" the problem but made every i2c read slow. I didn't notice when > testing just the lm81 but as soon as I booted the system with more i2c > devices I saw stupidly slow boot times. msleep() is indeed a bad idea. You'd want something like usleep_range() with an increasing timeout: start with a few microseconds and double the sleep time with each iteration (e.g. 4-8 / 8-16 / 16-32 / 32-64 / ...). Guenter ^ permalink raw reply [flat|nested] 30+ messages in thread
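The doubling back-off suggested above can be sketched in plain userspace C. This is only an illustration of the arithmetic, not code from i2c-mpc: the struct, the helper names and the upper cap are all hypothetical. In a driver, each computed window would presumably be handed to usleep_range(min, max) inside the wait loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of bounds, as would be passed to usleep_range(). */
struct sleep_window {
    uint32_t min_us;
    uint32_t max_us;
};

/*
 * Window for the given poll iteration: the lower bound starts at
 * first_min_us and doubles each iteration until it reaches cap_us;
 * the upper bound is twice the lower bound (4-8, 8-16, 16-32, ...).
 * The cap is an assumption added here so the sleep cannot grow without limit.
 */
static struct sleep_window backoff_window(unsigned int iteration,
                                          uint32_t first_min_us,
                                          uint32_t cap_us)
{
    uint32_t min_us = first_min_us;
    struct sleep_window w;

    assert(first_min_us > 0 && cap_us >= first_min_us);

    while (iteration-- > 0 && min_us < cap_us)
        min_us = (min_us > cap_us / 2) ? cap_us : min_us * 2;

    w.min_us = min_us;
    w.max_us = (min_us > UINT32_MAX / 2) ? UINT32_MAX : min_us * 2;
    return w;
}

/* Upper bound on the time slept if the first `polls` iterations all sleep. */
static uint64_t worst_case_total_us(unsigned int polls,
                                    uint32_t first_min_us,
                                    uint32_t cap_us)
{
    uint64_t total = 0;

    for (unsigned int i = 0; i < polls; i++)
        total += backoff_window(i, first_min_us, cap_us).max_us;
    return total;
}
```

With a 4 us start, the first four polls sleep at most 8 + 16 + 32 + 64 = 120 us in total, so a transfer that completes quickly is barely delayed, while a slow one stops burning CPU after a handful of iterations. That is the contrast with a fixed msleep(10), which charges every transfer the full 10 ms.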
* Re: Errant readings on LM81 with T2080 SoC 2021-03-08 0:31 ` Guenter Roeck 2021-03-08 2:27 ` Chris Packham @ 2021-03-09 23:35 ` Chris Packham 2021-03-10 3:29 ` Guenter Roeck 1 sibling, 1 reply; 30+ messages in thread From: Chris Packham @ 2021-03-09 23:35 UTC (permalink / raw) To: Guenter Roeck, jdelvare Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev On 8/03/21 1:31 pm, Guenter Roeck wrote: > On 3/7/21 2:52 PM, Chris Packham wrote: >> Fundamentally I think this is a problem with the fact that the LM81 is >> an SMBus device but the T2080 (and other Freescale SoCs) uses i2c and we >> emulate SMBus. I suspect the errant readings are when we don't get round >> to completing the read within the timeout specified by the SMBus >> specification. Depending on when that happens we either fail the >> transfer or interpret the result as all-1s. > That is quite unlikely. Many sensor chips are SMBus chips connected to > i2c busses. It is much more likely that there is a bug in the T2080 i2c driver, > that the chip doesn't like the bulk read command issued through regmap, that > the chip has problems with the i2c bus speed, or that the i2c bus is noisy. I have noticed that with the switch to regmap we end up using plain i2c instead of SMBUS. There appears to be no way of saying use SMBUS semantics if the i2c adapter reports I2C_FUNC_I2C. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Errant readings on LM81 with T2080 SoC 2021-03-09 23:35 ` Chris Packham @ 2021-03-10 3:29 ` Guenter Roeck 0 siblings, 0 replies; 30+ messages in thread From: Guenter Roeck @ 2021-03-10 3:29 UTC (permalink / raw) To: Chris Packham, jdelvare Cc: linux-hwmon, linux-kernel, linux-i2c, linuxppc-dev On 3/9/21 3:35 PM, Chris Packham wrote: > > On 8/03/21 1:31 pm, Guenter Roeck wrote: >> On 3/7/21 2:52 PM, Chris Packham wrote: >>> Fundamentally I think this is a problem with the fact that the LM81 is >>> an SMBus device but the T2080 (and other Freescale SoCs) uses i2c and we >>> emulate SMBus. I suspect the errant readings are when we don't get round >>> to completing the read within the timeout specified by the SMBus >>> specification. Depending on when that happens we either fail the >>> transfer or interpret the result as all-1s. >> That is quite unlikely. Many sensor chips are SMBus chips connected to >> i2c busses. It is much more likely that there is a bug in the T2080 i2c driver, >> that the chip doesn't like the bulk read command issued through regmap, that >> the chip has problems with the i2c bus speed, or that the i2c bus is noisy. > I have noticed that with the switch to regmap we end up using plain i2c > instead of SMBUS. There appears to be no way of saying use SMBUS > semantics if the i2c adapter reports I2C_FUNC_I2C. > The driver only really supports I2C; SMBUS functions are emulated. I don't think that makes a real difference. Guenter ^ permalink raw reply [flat|nested] 30+ messages in thread
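The observation about regmap can be pictured as a simple priority check. The sketch below is a userspace model under stated assumptions, not the kernel's regmap code: the capability bits are stand-ins rather than the real I2C_FUNC_* values, and the function name is hypothetical. It only encodes what is described in the thread, namely that an adapter advertising plain I2C gets plain I2C transfers, with SMBus-style access considered only when plain I2C is unavailable.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in capability bits; the kernel's real I2C_FUNC_* values differ. */
#define FUNC_PLAIN_I2C       (1u << 0)
#define FUNC_SMBUS_BYTE_DATA (1u << 1)

/* Hypothetical result of the bus selection. */
enum bus_choice { BUS_NONE, BUS_PLAIN_I2C, BUS_SMBUS_BYTE_DATA };

/* Plain I2C wins whenever the adapter reports it; SMBus is only a fallback. */
static enum bus_choice choose_bus(uint32_t adapter_funcs)
{
    if (adapter_funcs & FUNC_PLAIN_I2C)
        return BUS_PLAIN_I2C;
    if (adapter_funcs & FUNC_SMBUS_BYTE_DATA)
        return BUS_SMBUS_BYTE_DATA;
    return BUS_NONE;
}

/* Usage example: an adapter offering both still ends up on plain I2C. */
void bus_choice_demo(void)
{
    assert(choose_bus(FUNC_PLAIN_I2C | FUNC_SMBUS_BYTE_DATA) == BUS_PLAIN_I2C);
    assert(choose_bus(FUNC_SMBUS_BYTE_DATA) == BUS_SMBUS_BYTE_DATA);
}
```

In this model the client driver has no input into the decision, which matches the complaint that there is no way to request SMBus semantics on such an adapter. As noted in the reply, on i2c-mpc the SMBus calls are emulated on top of I2C anyway, so the choice may not change what appears on the wire.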
Thread overview: 30+ messages
2021-03-07 22:52 Errant readings on LM81 with T2080 SoC Chris Packham
2021-03-08 0:31 ` Guenter Roeck
2021-03-08 2:27 ` Chris Packham
2021-03-08 4:37 ` Chris Packham
2021-03-08 4:59 ` Guenter Roeck
2021-03-08 20:27 ` Chris Packham
2021-03-08 22:39 ` Guenter Roeck
2021-03-10 2:19 ` Chris Packham
2021-03-10 5:06 ` Guenter Roeck
2021-03-10 21:48 ` Chris Packham
2021-03-11 7:41 ` Guenter Roeck
2021-03-11 8:18 ` Wolfram Sang
2021-03-11 15:19 ` Guenter Roeck
2021-03-11 21:17 ` Chris Packham
2021-03-11 21:34 ` Guenter Roeck
2021-03-11 23:47 ` Chris Packham
2021-03-12 0:07 ` Guenter Roeck
2021-03-12 0:19 ` Chris Packham
2021-03-12 9:25 ` David Laight
2021-03-14 21:26 ` Chris Packham
2021-03-15 9:46 ` David Laight
2021-03-18 5:44 ` Wolfram Sang
2021-03-18 3:46 ` Chris Packham
2021-03-18 4:02 ` Guenter Roeck
2021-03-18 5:39 ` Wolfram Sang
2021-03-08 22:10 ` Chris Packham
2021-03-09 4:36 ` Chris Packham
2021-03-09 5:24 ` Guenter Roeck
2021-03-09 23:35 ` Chris Packham
2021-03-10 3:29 ` Guenter Roeck