All of lore.kernel.org
 help / color / mirror / Atom feed
* Sluggish AT91 I2C driver causes SMBus timeouts
@ 2017-10-12 11:35 Peter Rosin
  2017-10-12 14:32 ` Peter Rosin
  2017-10-13 13:29 ` Alan Cox
  0 siblings, 2 replies; 10+ messages in thread
From: Peter Rosin @ 2017-10-12 11:35 UTC (permalink / raw)
  To: Ludovic Desroches; +Cc: linux-kernel, linux-i2c

Hi!

I have encountered an "interesting" bug. It silently corrupts data
and is generally nasty...

On an I2C bus, driven by the at91 driver and DMA (an Atmel
sama5d31 chip), I have an 256 byte eeprom (NXP SE97BTP). I'm using
Linux v4.13.

The at24 driver for the eeprom detects that the I2C adapter is
capable of I2C transactions and selects that over SMBus. Reads
are done in 128 byte chunks. However, sometimes there is some
kind of event that disturbs the transactions such that the very
last bit och the very last byte (and the following NACK and STOP)
of such chunks are delayed for a long time (the latest incident
shows 85ms on the scope). That is too long for the eeprom which
is expecting SMBus and times out after about 30 ms. When the
eeprom times out, it just releases the data line so that it is
pulled up high. The I2C driver does not notice this, and when it
finally gets going, it reads a one for the last bit instead of
the expected zero. Since it is the last byte of the read, a NACK
is expected and since the eeprom has timed out the NACK is there.
And the STOP condition also looks normal (expected since it is
generated by the driver itself). So, the driver has not noticed
anything funny. But the data is corrupted.

I can work around this by disabling the SMBus timeout in the eeprom
with:

	i2cset -f 0 0x18 0x22 0x8100

But that is done on a different I2C address (the eeprom is on
address 0x50), since the chip is a combined temperature sensor and
eeprom, and the SMBus timeout bit is of course in a temperature
sensor register.

HOWEVER, I fail to see how this is limited to my case with this
eeprom. Any SMBus chip with a timeout will suffer the same fate.
The real bug is that this happens without the driver noticing it.
And why is there a 85ms delay in the middle of the last byte?
Sure, I can see why there might be a delay before finishing up
with a STOP condition or between bytes if there needs to be some
DMA setup at some interval, but after the 7th bit of a byte?

For a lot of transactions on the I2C bus there is no delay before
the last bit. And most of the time there is no delay for the
eeprom reads either; the delay only occurs when it feels like it.

This does not feel good at all.

Cheers,
Peter

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Sluggish AT91 I2C driver causes SMBus timeouts
  2017-10-12 11:35 Sluggish AT91 I2C driver causes SMBus timeouts Peter Rosin
@ 2017-10-12 14:32 ` Peter Rosin
  2017-10-12 14:42   ` Guenter Roeck
  2017-10-13 13:29 ` Alan Cox
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Rosin @ 2017-10-12 14:32 UTC (permalink / raw)
  To: Ludovic Desroches
  Cc: linux-kernel, linux-i2c, Guenter Roeck, Jean Delvare, linux-hwmon

On 2017-10-12 13:35, Peter Rosin wrote:
> Hi!
> 
> I have encountered an "interesting" bug. It silently corrupts data
> and is generally nasty...
> 
> On an I2C bus, driven by the at91 driver and DMA (an Atmel
> sama5d31 chip), I have an 256 byte eeprom (NXP SE97BTP). I'm using
> Linux v4.13.
> 
> The at24 driver for the eeprom detects that the I2C adapter is
> capable of I2C transactions and selects that over SMBus. Reads
> are done in 128 byte chunks. However, sometimes there is some
> kind of event that disturbs the transactions such that the very
> last bit och the very last byte (and the following NACK and STOP)
> of such chunks are delayed for a long time (the latest incident
> shows 85ms on the scope). That is too long for the eeprom which
> is expecting SMBus and times out after about 30 ms. When the
> eeprom times out, it just releases the data line so that it is
> pulled up high. The I2C driver does not notice this, and when it
> finally gets going, it reads a one for the last bit instead of
> the expected zero. Since it is the last byte of the read, a NACK
> is expected and since the eeprom has timed out the NACK is there.
> And the STOP condition also looks normal (expected since it is
> generated by the driver itself). So, the driver has not noticed
> anything funny. But the data is corrupted.
> 
> I can work around this by disabling the SMBus timeout in the eeprom
> with:
> 
> 	i2cset -f 0 0x18 0x22 0x8100
> 
> But that is done on a different I2C address (the eeprom is on
> address 0x50), since the chip is a combined temperature sensor and
> eeprom, and the SMBus timeout bit is of course in a temperature
> sensor register.
> 
> HOWEVER, I fail to see how this is limited to my case with this
> eeprom. Any SMBus chip with a timeout will suffer the same fate.
> The real bug is that this happens without the driver noticing it.
> And why is there a 85ms delay in the middle of the last byte?
> Sure, I can see why there might be a delay before finishing up
> with a STOP condition or between bytes if there needs to be some
> DMA setup at some interval, but after the 7th bit of a byte?
> 
> For a lot of transactions on the I2C bus there is no delay before
> the last bit. And most of the time there is no delay for the
> eeprom reads either; the delay only occurs when it feels like it.
> 
> This does not feel good at all.

I added some traces to i2c-at91.c and, AFAIU, it's the call to
at91_twi_read_data_dma_callback that sometimes arrives later than
desired. Once the callback runs, the transfer completes swiftly.

After reading the comments in that driver I suppose the HW holds
on to the last data-bit until it knows whether to ACK or NACK in
the following bit.

But given this, it is questionable if this driver/HW combo can
claim support for SMBus. But then again, I expect many things
suffer from similar scheduling delays (presumably that's what's
going on) so this is probably not special to i2c-at91.c...

Since this is probably a very generic problem and I just happened
to hit it for the eeprom, I wonder if it would be ok to add a
workaround, as below, to the temperature sensor driver part of this
chip? (with suitable comments, defines for the constants etc -
setting the 0x0080 bit in reg 0x22 disables the SMBus timeout)

Cheers,
Peter

diff --git a/drivers/hwmon/jc42.c b/drivers/hwmon/jc42.c
index 1bf22eff0b08..3e72bd8e06d1 100644
--- a/drivers/hwmon/jc42.c
+++ b/drivers/hwmon/jc42.c
@@ -416,6 +416,13 @@ static int jc42_detect(struct i2c_client *client, struct i2c_board_info *info)
 	if ((cap & 0xff00) || (config & 0xf800))
 		return -ENODEV;
 
+	if (manid == NXP_MANID && (devid & SE97_DEVID_MASK) == SE97_DEVID) {
+		int smbus = i2c_smbus_read_word_swapped(client, 0x22);
+		if (smbus < 0)
+			return -ENODEV;
+		i2c_smbus_write_word_swapped(client, 0x22, smbus | 0x0080);
+	}
+
 	for (i = 0; i < ARRAY_SIZE(jc42_chips); i++) {
 		struct jc42_chips *chip = &jc42_chips[i];
 		if (manid == chip->manid &&

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Sluggish AT91 I2C driver causes SMBus timeouts
  2017-10-12 14:32 ` Peter Rosin
@ 2017-10-12 14:42   ` Guenter Roeck
  2017-10-12 21:19     ` Peter Rosin
  0 siblings, 1 reply; 10+ messages in thread
From: Guenter Roeck @ 2017-10-12 14:42 UTC (permalink / raw)
  To: Peter Rosin, Ludovic Desroches
  Cc: linux-kernel, linux-i2c, Jean Delvare, linux-hwmon

On 10/12/2017 07:32 AM, Peter Rosin wrote:
> On 2017-10-12 13:35, Peter Rosin wrote:
>> Hi!
>>
>> I have encountered an "interesting" bug. It silently corrupts data
>> and is generally nasty...
>>
>> On an I2C bus, driven by the at91 driver and DMA (an Atmel
>> sama5d31 chip), I have an 256 byte eeprom (NXP SE97BTP). I'm using
>> Linux v4.13.
>>
>> The at24 driver for the eeprom detects that the I2C adapter is
>> capable of I2C transactions and selects that over SMBus. Reads
>> are done in 128 byte chunks. However, sometimes there is some
>> kind of event that disturbs the transactions such that the very
>> last bit och the very last byte (and the following NACK and STOP)
>> of such chunks are delayed for a long time (the latest incident
>> shows 85ms on the scope). That is too long for the eeprom which
>> is expecting SMBus and times out after about 30 ms. When the
>> eeprom times out, it just releases the data line so that it is
>> pulled up high. The I2C driver does not notice this, and when it
>> finally gets going, it reads a one for the last bit instead of
>> the expected zero. Since it is the last byte of the read, a NACK
>> is expected and since the eeprom has timed out the NACK is there.
>> And the STOP condition also looks normal (expected since it is
>> generated by the driver itself). So, the driver has not noticed
>> anything funny. But the data is corrupted.
>>
>> I can work around this by disabling the SMBus timeout in the eeprom
>> with:
>>
>> 	i2cset -f 0 0x18 0x22 0x8100
>>
>> But that is done on a different I2C address (the eeprom is on
>> address 0x50), since the chip is a combined temperature sensor and
>> eeprom, and the SMBus timeout bit is of course in a temperature
>> sensor register.
>>
>> HOWEVER, I fail to see how this is limited to my case with this
>> eeprom. Any SMBus chip with a timeout will suffer the same fate.
>> The real bug is that this happens without the driver noticing it.
>> And why is there a 85ms delay in the middle of the last byte?
>> Sure, I can see why there might be a delay before finishing up
>> with a STOP condition or between bytes if there needs to be some
>> DMA setup at some interval, but after the 7th bit of a byte?
>>
>> For a lot of transactions on the I2C bus there is no delay before
>> the last bit. And most of the time there is no delay for the
>> eeprom reads either; the delay only occurs when it feels like it.
>>
>> This does not feel good at all.
> 
> I added some traces to i2c-at91.c and, AFAIU, it's the call to
> at91_twi_read_data_dma_callback that sometimes arrives later than
> desired. Once the callback runs, the transfer completes swiftly.
> 
> After reading the comments in that driver I suppose the HW holds
> on to the last data-bit until it knows whether to ACK or NACK in
> the following bit.
> 
> But given this, it is questionable if this driver/HW combo can
> claim support for SMBus. But then again, I expect many things
> suffer from similar scheduling delays (presumably that's what's
> going on) so this is probably not special to i2c-at91.c...
> 
> Since this is probably a very generic problem and I just happened
> to hit it for the eeprom, I wonder if it would be ok to add a
> workaround, as below, to the temperature sensor driver part of this
> chip? (with suitable comments, defines for the constants etc -
> setting the 0x0080 bit in reg 0x22 disables the SMBus timeout)
> 
> Cheers,
> Peter
> 
> diff --git a/drivers/hwmon/jc42.c b/drivers/hwmon/jc42.c
> index 1bf22eff0b08..3e72bd8e06d1 100644
> --- a/drivers/hwmon/jc42.c
> +++ b/drivers/hwmon/jc42.c
> @@ -416,6 +416,13 @@ static int jc42_detect(struct i2c_client *client, struct i2c_board_info *info)
>   	if ((cap & 0xff00) || (config & 0xf800))
>   		return -ENODEV;
>   
> +	if (manid == NXP_MANID && (devid & SE97_DEVID_MASK) == SE97_DEVID) {
> +		int smbus = i2c_smbus_read_word_swapped(client, 0x22);
> +		if (smbus < 0)
> +			return -ENODEV;
> +		i2c_smbus_write_word_swapped(client, 0x22, smbus | 0x0080);
> +	}
> +

Outch. Not like that; this would affect every board with this chip, not just this one.
We would need something like a DT property to do that (smbus-timeout-disable is used
in other drivers).

.. and definitely not in the detect function. This would have to be done in probe.

Guenter

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Sluggish AT91 I2C driver causes SMBus timeouts
  2017-10-12 14:42   ` Guenter Roeck
@ 2017-10-12 21:19     ` Peter Rosin
  2017-10-13  1:27       ` Guenter Roeck
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Rosin @ 2017-10-12 21:19 UTC (permalink / raw)
  To: Guenter Roeck, Ludovic Desroches
  Cc: linux-kernel, linux-i2c, Jean Delvare, linux-hwmon

On 2017-10-12 16:42, Guenter Roeck wrote:
> On 10/12/2017 07:32 AM, Peter Rosin wrote:
>> On 2017-10-12 13:35, Peter Rosin wrote:
>>> Hi!
>>>
>>> I have encountered an "interesting" bug. It silently corrupts data
>>> and is generally nasty...
>>>
>>> On an I2C bus, driven by the at91 driver and DMA (an Atmel
>>> sama5d31 chip), I have an 256 byte eeprom (NXP SE97BTP). I'm using
>>> Linux v4.13.
>>>
>>> The at24 driver for the eeprom detects that the I2C adapter is
>>> capable of I2C transactions and selects that over SMBus. Reads
>>> are done in 128 byte chunks. However, sometimes there is some
>>> kind of event that disturbs the transactions such that the very
>>> last bit och the very last byte (and the following NACK and STOP)
>>> of such chunks are delayed for a long time (the latest incident
>>> shows 85ms on the scope). That is too long for the eeprom which
>>> is expecting SMBus and times out after about 30 ms. When the
>>> eeprom times out, it just releases the data line so that it is
>>> pulled up high. The I2C driver does not notice this, and when it
>>> finally gets going, it reads a one for the last bit instead of
>>> the expected zero. Since it is the last byte of the read, a NACK
>>> is expected and since the eeprom has timed out the NACK is there.
>>> And the STOP condition also looks normal (expected since it is
>>> generated by the driver itself). So, the driver has not noticed
>>> anything funny. But the data is corrupted.
>>>
>>> I can work around this by disabling the SMBus timeout in the eeprom
>>> with:
>>>
>>> 	i2cset -f 0 0x18 0x22 0x8100
>>>
>>> But that is done on a different I2C address (the eeprom is on
>>> address 0x50), since the chip is a combined temperature sensor and
>>> eeprom, and the SMBus timeout bit is of course in a temperature
>>> sensor register.
>>>
>>> HOWEVER, I fail to see how this is limited to my case with this
>>> eeprom. Any SMBus chip with a timeout will suffer the same fate.
>>> The real bug is that this happens without the driver noticing it.
>>> And why is there a 85ms delay in the middle of the last byte?
>>> Sure, I can see why there might be a delay before finishing up
>>> with a STOP condition or between bytes if there needs to be some
>>> DMA setup at some interval, but after the 7th bit of a byte?
>>>
>>> For a lot of transactions on the I2C bus there is no delay before
>>> the last bit. And most of the time there is no delay for the
>>> eeprom reads either; the delay only occurs when it feels like it.
>>>
>>> This does not feel good at all.
>>
>> I added some traces to i2c-at91.c and, AFAIU, it's the call to
>> at91_twi_read_data_dma_callback that sometimes arrives later than
>> desired. Once the callback runs, the transfer completes swiftly.
>>
>> After reading the comments in that driver I suppose the HW holds
>> on to the last data-bit until it knows whether to ACK or NACK in
>> the following bit.
>>
>> But given this, it is questionable if this driver/HW combo can
>> claim support for SMBus. But then again, I expect many things
>> suffer from similar scheduling delays (presumably that's what's
>> going on) so this is probably not special to i2c-at91.c...
>>
>> Since this is probably a very generic problem and I just happened
>> to hit it for the eeprom, I wonder if it would be ok to add a
>> workaround, as below, to the temperature sensor driver part of this
>> chip? (with suitable comments, defines for the constants etc -
>> setting the 0x0080 bit in reg 0x22 disables the SMBus timeout)
>>
>> Cheers,
>> Peter
>>
>> diff --git a/drivers/hwmon/jc42.c b/drivers/hwmon/jc42.c
>> index 1bf22eff0b08..3e72bd8e06d1 100644
>> --- a/drivers/hwmon/jc42.c
>> +++ b/drivers/hwmon/jc42.c
>> @@ -416,6 +416,13 @@ static int jc42_detect(struct i2c_client *client, struct i2c_board_info *info)
>>   	if ((cap & 0xff00) || (config & 0xf800))
>>   		return -ENODEV;
>>   
>> +	if (manid == NXP_MANID && (devid & SE97_DEVID_MASK) == SE97_DEVID) {
>> +		int smbus = i2c_smbus_read_word_swapped(client, 0x22);
>> +		if (smbus < 0)
>> +			return -ENODEV;
>> +		i2c_smbus_write_word_swapped(client, 0x22, smbus | 0x0080);
>> +	}
>> +
> 
> Outch. Not like that; this would affect every board with this chip, not just this one.

That was kind of the intention. Silent corruption is nasty. But ok, let's
do the opt-in thing.

> We would need something like a DT property to do that (smbus-timeout-disable is used
> in other drivers).
> 
> .. and definitely not in the detect function. This would have to be done in probe.

Arrgh, I did test it, but not properly. The patch passed my tests
because I didn't power-cycle the board and the already disabled timeout
stayed in place for the next boot. I just assumed jc42_detect was
called from probe and it was convenient to have the manufacturer etc
already at hand. Silly silly.

I'll send a proper patch tomorrow with these things taken care of...

Cheers,
peda

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Sluggish AT91 I2C driver causes SMBus timeouts
  2017-10-12 21:19     ` Peter Rosin
@ 2017-10-13  1:27       ` Guenter Roeck
  0 siblings, 0 replies; 10+ messages in thread
From: Guenter Roeck @ 2017-10-13  1:27 UTC (permalink / raw)
  To: Peter Rosin, Ludovic Desroches
  Cc: linux-kernel, linux-i2c, Jean Delvare, linux-hwmon

On 10/12/2017 02:19 PM, Peter Rosin wrote:
> On 2017-10-12 16:42, Guenter Roeck wrote:
>> On 10/12/2017 07:32 AM, Peter Rosin wrote:
>>> On 2017-10-12 13:35, Peter Rosin wrote:
>>>> Hi!
>>>>
>>>> I have encountered an "interesting" bug. It silently corrupts data
>>>> and is generally nasty...
>>>>
>>>> On an I2C bus, driven by the at91 driver and DMA (an Atmel
>>>> sama5d31 chip), I have an 256 byte eeprom (NXP SE97BTP). I'm using
>>>> Linux v4.13.
>>>>
>>>> The at24 driver for the eeprom detects that the I2C adapter is
>>>> capable of I2C transactions and selects that over SMBus. Reads
>>>> are done in 128 byte chunks. However, sometimes there is some
>>>> kind of event that disturbs the transactions such that the very
>>>> last bit och the very last byte (and the following NACK and STOP)
>>>> of such chunks are delayed for a long time (the latest incident
>>>> shows 85ms on the scope). That is too long for the eeprom which
>>>> is expecting SMBus and times out after about 30 ms. When the
>>>> eeprom times out, it just releases the data line so that it is
>>>> pulled up high. The I2C driver does not notice this, and when it
>>>> finally gets going, it reads a one for the last bit instead of
>>>> the expected zero. Since it is the last byte of the read, a NACK
>>>> is expected and since the eeprom has timed out the NACK is there.
>>>> And the STOP condition also looks normal (expected since it is
>>>> generated by the driver itself). So, the driver has not noticed
>>>> anything funny. But the data is corrupted.
>>>>
>>>> I can work around this by disabling the SMBus timeout in the eeprom
>>>> with:
>>>>
>>>> 	i2cset -f 0 0x18 0x22 0x8100
>>>>
>>>> But that is done on a different I2C address (the eeprom is on
>>>> address 0x50), since the chip is a combined temperature sensor and
>>>> eeprom, and the SMBus timeout bit is of course in a temperature
>>>> sensor register.
>>>>
>>>> HOWEVER, I fail to see how this is limited to my case with this
>>>> eeprom. Any SMBus chip with a timeout will suffer the same fate.
>>>> The real bug is that this happens without the driver noticing it.
>>>> And why is there a 85ms delay in the middle of the last byte?
>>>> Sure, I can see why there might be a delay before finishing up
>>>> with a STOP condition or between bytes if there needs to be some
>>>> DMA setup at some interval, but after the 7th bit of a byte?
>>>>
>>>> For a lot of transactions on the I2C bus there is no delay before
>>>> the last bit. And most of the time there is no delay for the
>>>> eeprom reads either; the delay only occurs when it feels like it.
>>>>
>>>> This does not feel good at all.
>>>
>>> I added some traces to i2c-at91.c and, AFAIU, it's the call to
>>> at91_twi_read_data_dma_callback that sometimes arrives later than
>>> desired. Once the callback runs, the transfer completes swiftly.
>>>
>>> After reading the comments in that driver I suppose the HW holds
>>> on to the last data-bit until it knows whether to ACK or NACK in
>>> the following bit.
>>>
>>> But given this, it is questionable if this driver/HW combo can
>>> claim support for SMBus. But then again, I expect many things
>>> suffer from similar scheduling delays (presumably that's what's
>>> going on) so this is probably not special to i2c-at91.c...
>>>
>>> Since this is probably a very generic problem and I just happened
>>> to hit it for the eeprom, I wonder if it would be ok to add a
>>> workaround, as below, to the temperature sensor driver part of this
>>> chip? (with suitable comments, defines for the constants etc -
>>> setting the 0x0080 bit in reg 0x22 disables the SMBus timeout)
>>>
>>> Cheers,
>>> Peter
>>>
>>> diff --git a/drivers/hwmon/jc42.c b/drivers/hwmon/jc42.c
>>> index 1bf22eff0b08..3e72bd8e06d1 100644
>>> --- a/drivers/hwmon/jc42.c
>>> +++ b/drivers/hwmon/jc42.c
>>> @@ -416,6 +416,13 @@ static int jc42_detect(struct i2c_client *client, struct i2c_board_info *info)
>>>    	if ((cap & 0xff00) || (config & 0xf800))
>>>    		return -ENODEV;
>>>    
>>> +	if (manid == NXP_MANID && (devid & SE97_DEVID_MASK) == SE97_DEVID) {
>>> +		int smbus = i2c_smbus_read_word_swapped(client, 0x22);
>>> +		if (smbus < 0)
>>> +			return -ENODEV;
>>> +		i2c_smbus_write_word_swapped(client, 0x22, smbus | 0x0080);
>>> +	}
>>> +
>>
>> Outch. Not like that; this would affect every board with this chip, not just this one.
> 
> That was kind of the intention. Silent corruption is nasty. But ok, let's
> do the opt-in thing.
> 

Maybe, but you only have your point of view in mind: You are effectively
arguing that SMBus timeout should be unconditionally disabled for the SE97.
In reality, the problem is due to the i2c controller driver, or maybe
the i2c controller hardware. So, strictly speaking, there should be a
controller-based property stating that SMBus timeouts should be disabled
for this controller because the controller (or maybe its driver) can not
handle it. After all, the chip is just an innocent victim.

Guenter

>> We would need something like a DT property to do that (smbus-timeout-disable is used
>> in other drivers).
>>
>> .. and definitely not in the detect function. This would have to be done in probe.
> 
> Arrgh, I did test it, but not properly. The patch passed my tests
> because I didn't power-cycle the board and the already disabled timeout
> stayed in place for the next boot. I just assumed jc42_detect was
> called from probe and it was convenient to have the manufacturer etc
> already at hand. Silly silly.
> 
> I'll send a proper patch tomorrow with these things taken care of...
> 
> Cheers,
> peda
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Sluggish AT91 I2C driver causes SMBus timeouts
  2017-10-12 11:35 Sluggish AT91 I2C driver causes SMBus timeouts Peter Rosin
  2017-10-12 14:32 ` Peter Rosin
@ 2017-10-13 13:29 ` Alan Cox
  2017-10-13 15:01   ` Peter Rosin
  1 sibling, 1 reply; 10+ messages in thread
From: Alan Cox @ 2017-10-13 13:29 UTC (permalink / raw)
  To: Peter Rosin; +Cc: Ludovic Desroches, linux-kernel, linux-i2c

On Thu, 12 Oct 2017 13:35:17 +0200
Peter Rosin <peda@axentia.se> wrote:

> Hi!
> 
> I have encountered an "interesting" bug. It silently corrupts data
> and is generally nasty...
> 
> On an I2C bus, driven by the at91 driver and DMA (an Atmel
> sama5d31 chip), I have an 256 byte eeprom (NXP SE97BTP). I'm using
> Linux v4.13.

If your force the transfer to PIO does it behave ? Does the controller in
fact need to siwtch to PIO for SMBUS ?

Alan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Sluggish AT91 I2C driver causes SMBus timeouts
  2017-10-13 13:29 ` Alan Cox
@ 2017-10-13 15:01   ` Peter Rosin
  2017-10-17  7:58     ` Ludovic Desroches
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Rosin @ 2017-10-13 15:01 UTC (permalink / raw)
  To: Alan Cox; +Cc: Ludovic Desroches, linux-kernel, linux-i2c

On 2017-10-13 15:29, Alan Cox wrote:
> On Thu, 12 Oct 2017 13:35:17 +0200
> Peter Rosin <peda@axentia.se> wrote:
> 
>> Hi!
>>
>> I have encountered an "interesting" bug. It silently corrupts data
>> and is generally nasty...
>>
>> On an I2C bus, driven by the at91 driver and DMA (an Atmel
>> sama5d31 chip), I have an 256 byte eeprom (NXP SE97BTP). I'm using
>> Linux v4.13.
> 
> If your force the transfer to PIO does it behave ? Does the controller in
> fact need to siwtch to PIO for SMBUS ?

Like, what if I disable DMA?

I saw no way to do that, short of short-cutting a few things in the
driver code. So, did that and I cannot tickle the bug. But I don't
know if that makes me safe?

Ludovic, any reason to believe disabling DMA will prevent these
stalls, or will they just appear under different circumstances?

I used this dirty "patch" to i2c-at91.c:at91_twi_configure_dma() for
testing:

-	dev->use_dma = true;
+	//dev->use_dma = true;

Cheers,
Peter

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Sluggish AT91 I2C driver causes SMBus timeouts
  2017-10-13 15:01   ` Peter Rosin
@ 2017-10-17  7:58     ` Ludovic Desroches
  2017-10-25 21:40       ` Peter Rosin
  0 siblings, 1 reply; 10+ messages in thread
From: Ludovic Desroches @ 2017-10-17  7:58 UTC (permalink / raw)
  To: Peter Rosin; +Cc: Alan Cox, Ludovic Desroches, linux-kernel, linux-i2c

Hi Peter,

On Fri, Oct 13, 2017 at 05:01:04PM +0200, Peter Rosin wrote:
> On 2017-10-13 15:29, Alan Cox wrote:
> > On Thu, 12 Oct 2017 13:35:17 +0200
> > Peter Rosin <peda@axentia.se> wrote:
> > 
> >> Hi!
> >>
> >> I have encountered an "interesting" bug. It silently corrupts data
> >> and is generally nasty...
> >>
> >> On an I2C bus, driven by the at91 driver and DMA (an Atmel
> >> sama5d31 chip), I have an 256 byte eeprom (NXP SE97BTP). I'm using
> >> Linux v4.13.
> > 
> > If your force the transfer to PIO does it behave ? Does the controller in
> > fact need to siwtch to PIO for SMBUS ?
> 
> Like, what if I disable DMA?
> 
> I saw no way to do that, short of short-cutting a few things in the
> driver code. So, did that and I cannot tickle the bug. But I don't
> know if that makes me safe?
> 
> Ludovic, any reason to believe disabling DMA will prevent these
> stalls, or will they just appear under different circumstances?

Sorry I am currently on vacation. I outlined this discussion.

As you noticed, there are some hardware constraints when using DMA.
Switching from DMA to PIO to handle the end of the transfer is probably the
root cause of the delay you get.

I read you added traces, did you manage to get some information about
timings? Do we waste time waiting for the dma callback? for the RXRDY
interrupt?

If we spend time waiting for the dma callback for sure, disabling DMA
should prevent these stalls. If the stall is inbetween the two last
RXRDY interrupts, it seems it can appear under different circumstances.

> 
> I used this dirty "patch" to i2c-at91.c:at91_twi_configure_dma() for
> testing:
> 
> -	dev->use_dma = true;
> +	//dev->use_dma = true;
> 

You can simply remove dma bindings from the i2c node to force the i2c
controller to use the PIO mode.


Regards

Ludovic

> Cheers,
> Peter

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Sluggish AT91 I2C driver causes SMBus timeouts
  2017-10-17  7:58     ` Ludovic Desroches
@ 2017-10-25 21:40       ` Peter Rosin
  2017-10-31 16:10         ` Ludovic Desroches
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Rosin @ 2017-10-25 21:40 UTC (permalink / raw)
  To: linux-kernel, linux-i2c, Ludovic Desroches; +Cc: Alan Cox

Hi Ludovic,

On 2017-10-17 09:58, Ludovic Desroches wrote:
> Hi Peter,
> 
> On Fri, Oct 13, 2017 at 05:01:04PM +0200, Peter Rosin wrote:
>> On 2017-10-13 15:29, Alan Cox wrote:
>>> On Thu, 12 Oct 2017 13:35:17 +0200
>>> Peter Rosin <peda@axentia.se> wrote:
>>>
>>>> Hi!
>>>>
>>>> I have encountered an "interesting" bug. It silently corrupts data
>>>> and is generally nasty...
>>>>
>>>> On an I2C bus, driven by the at91 driver and DMA (an Atmel
>>>> sama5d31 chip), I have an 256 byte eeprom (NXP SE97BTP). I'm using
>>>> Linux v4.13.
>>>
>>> If your force the transfer to PIO does it behave ? Does the controller in
>>> fact need to siwtch to PIO for SMBUS ?
>>
>> Like, what if I disable DMA?
>>
>> I saw no way to do that, short of short-cutting a few things in the
>> driver code. So, did that and I cannot tickle the bug. But I don't
>> know if that makes me safe?
>>
>> Ludovic, any reason to believe disabling DMA will prevent these
>> stalls, or will they just appear under different circumstances?
> 
> Sorry I am currently on vacation. I outlined this discussion.

And I got buried in other stuff so I managed to ignore and then forget
this for a couple of days. Sorry for the delay...

> As you noticed, there are some hardware constraints when using DMA.
> Switching from DMA to PIO to handle the end of the transfer is probably the
> root cause of the delay you get.
> 
> I read you added traces, did you manage to get some information about
> timings? Do we waste time waiting for the dma callback? for the RXRDY
> interrupt?

I *think* the stalls I'm seeing are from the dma callback.

> If we spend time waiting for the dma callback for sure, disabling DMA
> should prevent these stalls. If the stall is inbetween the two last
> RXRDY interrupts, it seems it can appear under different circumstances.

Exactly my point. It is hard to tell for sure. If we don't do dma, there
is simply no guarantee that the problem goes away. I fear that disabling
dma will only make the problem less likely, and that it therefore is not
a real fix. I can test this any number of times, and Murphy will make
sure that it doesn't trigger. Until it's in the hands of the customer...

The smbus timeout is quite hard to handle when there is no way to
guarantee that deadlines are met. The way I see it, the only safe option
is to disable the smbus timeout. I prefer that over killing dma
completely.

See my patches that take that approach (sorry for not having you on the
cc list)
https://lkml.org/lkml/2017/10/13/184

>>
>> I used this dirty "patch" to i2c-at91.c:at91_twi_configure_dma() for
>> testing:
>>
>> -	dev->use_dma = true;
>> +	//dev->use_dma = true;
>>
> 
> You can simply remove dma bindings from the i2c node to force the i2c
> controller to use the PIO mode.

Ok, that's less intrusive...

Cheers,
Peter

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Sluggish AT91 I2C driver causes SMBus timeouts
  2017-10-25 21:40       ` Peter Rosin
@ 2017-10-31 16:10         ` Ludovic Desroches
  0 siblings, 0 replies; 10+ messages in thread
From: Ludovic Desroches @ 2017-10-31 16:10 UTC (permalink / raw)
  To: Peter Rosin; +Cc: linux-kernel, linux-i2c, Ludovic Desroches, Alan Cox

Hi Peter,

On Wed, Oct 25, 2017 at 11:40:50PM +0200, Peter Rosin wrote:
> Hi Ludovic,
> 
> On 2017-10-17 09:58, Ludovic Desroches wrote:
> > Hi Peter,
> > 
> > On Fri, Oct 13, 2017 at 05:01:04PM +0200, Peter Rosin wrote:
> >> On 2017-10-13 15:29, Alan Cox wrote:
> >>> On Thu, 12 Oct 2017 13:35:17 +0200
> >>> Peter Rosin <peda@axentia.se> wrote:
> >>>
> >>>> Hi!
> >>>>
> >>>> I have encountered an "interesting" bug. It silently corrupts data
> >>>> and is generally nasty...
> >>>>
> >>>> On an I2C bus, driven by the at91 driver and DMA (an Atmel
> >>>> sama5d31 chip), I have an 256 byte eeprom (NXP SE97BTP). I'm using
> >>>> Linux v4.13.
> >>>
> >>> If your force the transfer to PIO does it behave ? Does the controller in
> >>> fact need to siwtch to PIO for SMBUS ?
> >>
> >> Like, what if I disable DMA?
> >>
> >> I saw no way to do that, short of short-cutting a few things in the
> >> driver code. So, did that and I cannot tickle the bug. But I don't
> >> know if that makes me safe?
> >>
> >> Ludovic, any reason to believe disabling DMA will prevent these
> >> stalls, or will they just appear under different circumstances?
> > 
> > Sorry I am currently on vacation. I outlined this discussion.
> 
> And I got buried in other stuff so I managed to ignore and then forget
> this for a couple of days. Sorry for the delay...
> 

No problem.

> > As you noticed, there are some hardware constraints when using DMA.
> > Switching from DMA to PIO to handle the end of the transfer is probably the
> > root cause of the delay you get.
> > 
> > I read you added traces, did you manage to get some information about
> > timings? Do we waste time waiting for the dma callback? for the RXRDY
> > interrupt?
> 
> I *think* the stalls I'm seeing are from the dma callback.
> 
> > If we spend time waiting for the dma callback for sure, disabling DMA
> > should prevent these stalls. If the stall is inbetween the two last
> > RXRDY interrupts, it seems it can appear under different circumstances.
> 
> Exactly my point. It is hard to tell for sure. If we don't do dma, there
> is simply no guarantee that the problem goes away. I fear that disabling
> dma will only make the problem less likely, and that it therefore is not
> a real fix. I can test this any number of times, and Murphy will make
> sure that it doesn't trigger. Until it's in the hands of the customer...
> 
> The smbus timeout is quite hard to handle when there is no way to
> guarantee that deadlines are met. The way I see it, the only safe option
> is to disable the smbus timeout. I prefer that over killing dma
> completely.
> 

Your approach is probably the good one as it seems that the i2c-at91
controller is not the only one that cannot handle the SMBus timeout
feature.

> See my patches that take that approach (sorry for not having you on the
> cc list)
> https://lkml.org/lkml/2017/10/13/184
> 

Thanks.

Regards

Ludovic

> >>
> >> I used this dirty "patch" to i2c-at91.c:at91_twi_configure_dma() for
> >> testing:
> >>
> >> -	dev->use_dma = true;
> >> +	//dev->use_dma = true;
> >>
> > 
> > You can simply remove dma bindings from the i2c node to force the i2c
> > controller to use the PIO mode.
> 
> Ok, that's less intrusive...
> 
> Cheers,
> Peter

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-10-31 16:10 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-12 11:35 Sluggish AT91 I2C driver causes SMBus timeouts Peter Rosin
2017-10-12 14:32 ` Peter Rosin
2017-10-12 14:42   ` Guenter Roeck
2017-10-12 21:19     ` Peter Rosin
2017-10-13  1:27       ` Guenter Roeck
2017-10-13 13:29 ` Alan Cox
2017-10-13 15:01   ` Peter Rosin
2017-10-17  7:58     ` Ludovic Desroches
2017-10-25 21:40       ` Peter Rosin
2017-10-31 16:10         ` Ludovic Desroches

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.