* i2c-i801 driver quit working in 3.8.11 (was: adt7475 driver quit working)
From: Guenter Roeck @ 2013-08-07 19:25 UTC (permalink / raw)
  To: Soren Harward
  Cc: Jean Delvare, Wolfram Sang, linux-i2c-u79uwXL29TY76Z2rM5mHXA

On Wed, Aug 07, 2013 at 02:19:42PM -0400, Soren Harward wrote:
> Yeah, that fixed it.  Sensors are working properly now.
> 
> My computer is a Lenovo D10, which uses the Intel 5400 chipset.
> Here's the lspci dump for the SMBus controller:
> 
> root@jens:~# lspci -vv -s 0000:00:1f.3
> 00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller (rev 09)
>         Subsystem: Lenovo Device 101d
>         Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Interrupt: pin B routed to IRQ 23
>         Region 4: I/O ports at 1100 [size=32]
>         Kernel driver in use: i801_smbus
> 
> So what else can I do to help debug this?
> 
I copied the i2c mailing list and the i2c maintainer, and also changed the
subject to reflect the real problem. I don't really know what else you can do.
Maybe Jean has an idea, or someone else on the list.

One possibility is that the driver doesn't enable/use MSI interrupts.
Maybe that is causing trouble with your chipset. Another option might be
that there are subtle differences with this chipset, and the interrupt code
simply does not work with it.

Guenter

> 
> On Wed, Aug 7, 2013 at 1:48 PM, Guenter Roeck <linux-0h96xk9xTtrk1uMJSBkQmQ@public.gmane.org> wrote:
> > Can you try reverting commit 6676a847d48ac48908cf467b42da9045b5463a6e ?
> >
> > [ Unless you are using an ASUS Z8 board, then it's more complicated ]
> >
> > Guenter
> >
> > On Wed, Aug 07, 2013 at 01:38:15PM -0400, Soren Harward wrote:
> >> The module is i2c-i801, so I guess that's the i801_smbus driver.
> >>
> >> On Wed, Aug 7, 2013 at 11:18 AM, Guenter Roeck <linux-0h96xk9xTtrk1uMJSBkQmQ@public.gmane.org> wrote:
> >> > Thought so. What is the driver ?
> >> >
> >> > Thanks,
> >> > Guenter
> >> >
> >> >
> >> > On 08/07/2013 05:08 AM, Soren Harward wrote:
> >> >>
> >> >> Oops, I swapped kernel numbers. 3.6.11 was good, 3.8.13 is broken.
> >> >>
> >> >> It looks like a problem with i2c bus driver because i2cdetect also locks
> >> >> up when trying to read from the bus.  I'll go pester one of the I2C
> >> >> developers.
> >> >>
> >> >> --
> >> >> Soren Harward
> >> >>
> >> >> On Aug 6, 2013 3:22 PM, "Guenter Roeck" <linux-0h96xk9xTtrk1uMJSBkQmQ@public.gmane.org
> >> >> <mailto:linux-0h96xk9xTtrk1uMJSBkQmQ@public.gmane.org>> wrote:
> >> >>
> >> >>     On Tue, Aug 06, 2013 at 02:30:47PM -0400, Soren Harward wrote:
> >> >>      > When I upgraded from 3.6.13 to kernel 3.8.11 (I know it's old; I'm
> >> >>      > limited by some binary drivers that don't work in >=3.9), hardware
> >> >>      > monitoring thru my ADT7475 quit working.  The chip is still
> >> >>      > recognized, the driver creates the sysfs entries in
> >> >>      > /sys/devices/pci0000:00/0000:00:1f.3/i2c-0/0-002e, but if I try to
> >> >>      > read any of the files (eg, "cat fan1_input"), the process just gets
> >> >>      > stuck in "disk wait" status forever.
> >> >>      >
> >> >>      > Is this a known bug that has since been fixed?  If not, what can I do
> >> >>      > to help fix it?
> >> >>      >
> >> >>
> >> >>     Problem is that v3.6.13 does not exist in mainline. Are you sure this is your
> >> >>     version ? Assuming it is 3.6.11 vs. 3.8.13, the only difference in the driver is
> >> >>     an added include of linux/jiffies.h, which should not cause any problems. It is
> >> >>     more likely a problem with the i2c bus driver. Can you let us know what driver
> >> >>     that is, and check if you have the same problem with another chip on the same bus
> >> >>     (eg an eeprom if there is one) ?
> >> >>
> >> >>     Thanks,
> >> >>     Guenter
> >> >>
> >> >
> >>
> >>
> >>
> >> --
> >> Soren Harward
> >>
> 
> 
> 
> -- 
> Soren Harward
> 


* Re: i2c-i801 driver quit working in 3.8.11
From: Jean Delvare @ 2014-08-07  8:11 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Soren Harward, Wolfram Sang, linux-i2c-u79uwXL29TY76Z2rM5mHXA

Hi Guenter, Soren,

Sorry for the late reply. Yes, one year, no kidding :(

On Wed, 7 Aug 2013 12:25:56 -0700, Guenter Roeck wrote:
> On Wed, Aug 07, 2013 at 02:19:42PM -0400, Soren Harward wrote:
> > Yeah, that fixed it.  Sensors are working properly now.
> > 
> > My computer is a Lenovo D10, which uses the Intel 5400 chipset.
> > Here's the lspci dump for the SMBus controller:
> > 
> > root@jens:~# lspci -vv -s 0000:00:1f.3
> > 00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller (rev 09)
> >         Subsystem: Lenovo Device 101d
> >         Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> >         Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >         Interrupt: pin B routed to IRQ 23
> >         Region 4: I/O ports at 1100 [size=32]
> >         Kernel driver in use: i801_smbus
> > 
> > So what else can I do to help debug this?
>
> I copied the i2c mailing list and the i2c maintainer, and also changed the
> subject to reflect the real problem. I don't really know what else you can do.
> Maybe Jean has an idea, or someone else on the list.

The interrupt code in i2c-i801 has worked for a majority of users and
brought major performance improvements. So I'm not going to revert
it. However I really would like to understand the few failure reports
we had, and hopefully fix them.

Soren, what's the status on your side? Are you still running a kernel
with commit 6676a847 reverted? Note that you can also disable interrupt
support in i2c-i801 manually by passing option disable_features=0x10 to
the driver.

Also I had one report of a problem with the same chipset you are using,
and the reporter claims that kernel v3.16 no longer has the problem. We
don't know why yet, but it might be worth a try.

> One possibility is that the driver doesn't enable/use MSI interrupts.
> Maybe that is causing trouble with your chipset. Another option might be
> that there are subtle differences with this chipset, and the interrupt code
> simply does not work with it.

Guenter, I can confirm that the i2c-i801 driver only uses regular
interrupts. The datasheet does not mention anything about MSI. What
makes you think the problem could be related to the lack of MSI support?

Thanks,
-- 
Jean Delvare
SUSE L3 Support


* Re: i2c-i801 driver quit working in 3.8.11
From: Guenter Roeck @ 2014-08-07 14:19 UTC (permalink / raw)
  To: Jean Delvare
  Cc: Soren Harward, Wolfram Sang, linux-i2c-u79uwXL29TY76Z2rM5mHXA

On 08/07/2014 01:11 AM, Jean Delvare wrote:
> Hi Guenter, Soren,
>
> Sorry for the late reply. Yes, one year, no kidding :(
>
> On Wed, 7 Aug 2013 12:25:56 -0700, Guenter Roeck wrote:
>> On Wed, Aug 07, 2013 at 02:19:42PM -0400, Soren Harward wrote:
>>> Yeah, that fixed it.  Sensors are working properly now.
>>>
>>> My computer is a Lenovo D10, which uses the Intel 5400 chipset.
>>> Here's the lspci dump for the SMBus controller:
>>>
>>> root@jens:~# lspci -vv -s 0000:00:1f.3
>>> 00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller (rev 09)
>>>          Subsystem: Lenovo Device 101d
>>>          Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>>>          Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>          Interrupt: pin B routed to IRQ 23
>>>          Region 4: I/O ports at 1100 [size=32]
>>>          Kernel driver in use: i801_smbus
>>>
>>> So what else can I do to help debug this?
>>
>> I copied the i2c mailing list and the i2c maintainer, and also changed the
>> subject to reflect the real problem. I don't really know what else you can do.
>> Maybe Jean has an idea, or someone else on the list.
>
> The interrupt code in i2c-i801 has worked for a majority of users and
> brought major performance improvements. So I'm not going to revert
> it. However I really would like to understand the few failure reports
> we had, and hopefully fix them.
>
> Soren, what's the status on your side? Are you still running a kernel
> with commit 6676a847 reverted? Note that you can also disable interrupt
> support in i2c-i801 manually by passing option disable_features=0x10 to
> the driver.
>
> Also I had one report of a problem with the same chipset you are using,
> and the reporter claims that kernel v3.16 no longer has the problem. We
> don't know why yet, but it might be worth a try.
>
>> One possibility is that the driver doesn't enable/use MSI interrupts.
>> Maybe that is causing trouble with your chipset. Another option might be
>> that there are subtle differences with this chipset, and the interrupt code
>> simply does not work with it.
>
> Guenter, I can confirm that the i2c-i801 driver only uses regular
> interrupts. The datasheet does not mention anything about MSI. What
> makes you think the problem could be related to the lack of MSI support?
>

Working for a company that does lots of weird and non-standard stuff with PCIe
creates a state of constant paranoia ;-). For example, INTB / irq23 may be
used by some other chip and may not be handled properly, masking the interrupt.
Really, I don't know. I have had instances at work where INTx didn't work
and I _had_ to use MSI, that is all I can say.

Cheers,
Guenter


* Re: i2c-i801 driver quit working in 3.8.11
From: Jean Delvare @ 2014-08-07 16:04 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Soren Harward, Wolfram Sang, linux-i2c-u79uwXL29TY76Z2rM5mHXA

Hi Guenter,

On Thursday 07 August 2014 at 07:19 -0700, Guenter Roeck wrote:
> On 08/07/2014 01:11 AM, Jean Delvare wrote:
> > Guenter, I can confirm that the i2c-i801 driver only uses regular
> > interrupts. The datasheet does not mention anything about MSI. What
> > makes you think the problem could be related to the lack of MSI support?
> 
> Working for a company that does lots of weird and non-standard stuff with PCIe
> creates a state of constant paranoia ;-). For example, INTB / irq23 may be
> used by some other chip and may not be handled properly, masking the interrupt.
> Really, I don't know. I have had instances at work where INTx didn't work
> and I _had_ to use MSI, that is all I can say.

I had my share of MSI (and MSI-X) drama at work too, but in most cases
the problem was in the MSI/MSI-X handling code and using legacy
interrupts solved the problem (although at the expense of
performance...)

Anyway, my understanding is that using MSI requires support at the
hardware level, you can't do that for any arbitrary PCI device, can you?
As I said I saw no mention of MSI support for the Intel SMBus device in
the datasheet, so I don't think this is possible, and thus I don't think
this can be the issue.

However I found a few interesting interrupt-related bits which the
i2c-i801 driver does not handle properly. I have patches almost ready,
I'll post them soon.

Speaking of "masking the interrupt", I am debugging another i2c-i801
issue. On one system at work, I see the following in dmesg:

[  601.485791] irq 18: nobody cared (try booting with the "irqpoll" option)
[  601.489785] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G          I     3.12.25-2-default #1
[  601.489785] Hardware name: Supermicro X7DW3/X7DWN, BIOS 6.00 08/28/2007
[  601.489785]  ffff880220d9574c ffffffff815096a7 ffff880220d956c0 ffffffff810ad6ad
[  601.489785]  ffff880220d956c0 0000000000000012 0000000000000000 ffffffff810adb15
[  601.489785]  0000000000000000 0000000000000000 0000000000000012 0000000000000000
[  601.489785] Call Trace:
[  601.489785]  [<ffffffff810044bd>] dump_trace+0x7d/0x2d0
[  601.489785]  [<ffffffff810047a4>] show_stack_log_lvl+0x94/0x170
[  601.489785]  [<ffffffff81005be1>] show_stack+0x21/0x50
[  601.489785]  [<ffffffff815096a7>] dump_stack+0x41/0x51
[  601.489785]  [<ffffffff810ad6ad>] __report_bad_irq+0x2d/0xc0
[  601.489785]  [<ffffffff810adb15>] note_interrupt+0x1a5/0x260
[  601.489785]  [<ffffffff810ab649>] handle_irq_event_percpu+0xc9/0x1b0
[  601.489785]  [<ffffffff810ab762>] handle_irq_event+0x32/0x50
[  601.489785]  [<ffffffff810ae5e1>] handle_fasteoi_irq+0x51/0xf0
[  601.489785]  [<ffffffff8100439a>] handle_irq+0x1a/0x30
[  601.489785]  [<ffffffff815198f5>] do_IRQ+0x45/0xb0
[  601.489785]  [<ffffffff8150faad>] common_interrupt+0x6d/0x6d
[  601.489785]  [<ffffffff8103f0c2>] native_safe_halt+0x2/0x10
[  601.489785]  [<ffffffff8100b029>] default_idle+0x19/0xb0
[  601.489785]  [<ffffffff810aab51>] cpu_startup_entry+0xe1/0x270
[  601.489785]  [<ffffffff8103097a>] start_secondary+0x21a/0x2c0
[  601.489785] handlers:
[  601.489785] [<ffffffffa01ee220>] i801_isr [i2c_i801]
[  601.489785] Disabling IRQ #18

I suppose that the IRQ is shared with another device, for which no
driver is loaded. I wonder if/how I can figure out which device this
is. /proc/interrupts says:
 18:          0      50001          0      49999          0          0          1          0   IO-APIC-fasteoi   i801_smbus
and 100000 is the number of unhandled interrupts it takes to trigger the
code which disables the interrupt (note_interrupt in
kernel/irq/spurious.c.) So far so good.
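
As a quick check of that arithmetic, the per-CPU counters on the IRQ 18 row can be summed. This is only a minimal sketch: the helper name `total_irq_count` is made up for illustration, and it assumes the row layout shown above (IRQ label, one counter per CPU, then the controller and handler names).

```python
def total_irq_count(line: str) -> int:
    """Sum the per-CPU counters on one /proc/interrupts row."""
    fields = line.split()
    total = 0
    # fields[0] is the IRQ label ("18:"); counters follow until the
    # first non-numeric field (the interrupt controller name).
    for field in fields[1:]:
        if not field.isdigit():
            break
        total += int(field)
    return total


# Row copied from the /proc/interrupts output quoted above.
row = (" 18:          0      50001          0      49999          0"
       "          0          1          0   IO-APIC-fasteoi   i801_smbus")
print(total_irq_count(row))  # 100001
```

The total is 100001, which is in line with the interrupt being disabled once roughly 100000 interrupts had gone unhandled.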

But what I do not understand is that even after that, the i2c-i801
driver is still working. It is very slow: it takes 100 ms to complete
each command, which is even more than when using polling. And the fact
that the figures in /proc/interrupts do not increase means that IRQ 18
is really disabled. But it does not hang. However, looking at the code,
it's using wait_event() to wait for the interrupt, so I think it should
hang there if the interrupt is disabled.

Can you (or anyone else) explain this mystery? I am most certainly
missing something, maybe something obvious.

-- 
Jean Delvare
SUSE L3 Support


* Re: i2c-i801 driver quit working in 3.8.11
From: Guenter Roeck @ 2014-08-07 16:52 UTC (permalink / raw)
  To: Jean Delvare
  Cc: Soren Harward, Wolfram Sang, linux-i2c-u79uwXL29TY76Z2rM5mHXA

On Thu, Aug 07, 2014 at 06:04:54PM +0200, Jean Delvare wrote:
> Hi Guenter,
> 
> On Thursday 07 August 2014 at 07:19 -0700, Guenter Roeck wrote:
> > On 08/07/2014 01:11 AM, Jean Delvare wrote:
> > > Guenter, I can confirm that the i2c-i801 driver only uses regular
> > > interrupts. The datasheet does not mention anything about MSI. What
> > > makes you think the problem could be related to the lack of MSI support?
> > 
> > Working for a company that does lots of weird and non-standard stuff with PCIe
> > creates a state of constant paranoia ;-). For example, INTB / irq23 may be
> > used by some other chip and may not be handled properly, masking the interrupt.
> > Really, I don't know. I have had instances at work where INTx didn't work
> > and I _had_ to use MSI, that is all I can say.
> 
> I had my share of MSI (and MSI-X) drama at work too, but in most cases
> the problem was in the MSI/MSI-X handling code and using legacy
> interrupts solved the problem (although at the expense of
> performance...)
> 
> Anyway, my understanding is that using MSI requires support at the
> hardware level, you can't do that for any arbitrary PCI device, can you?
> As I said I saw no mention of MSI support for the Intel SMBus device in
> the datasheet, so I don't think this is possible, and thus I don't think
> this can be the issue.
> 
As far as I recall, MSI support is mandatory in PCIe.

> However I found a few interesting interrupt-related bits which the
> i2c-i801 driver does not handle properly. I have patches almost ready,
> I'll post them soon.
> 
> Speaking of "masking the interrupt", I am debugging another i2c-i801
> issue. On one system at work, I see the following in dmesg:
> 
> [  601.485791] irq 18: nobody cared (try booting with the "irqpoll" option)
> [  601.489785] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G          I     3.12.25-2-default #1
> [  601.489785] Hardware name: Supermicro X7DW3/X7DWN, BIOS 6.00 08/28/2007
> [  601.489785]  ffff880220d9574c ffffffff815096a7 ffff880220d956c0 ffffffff810ad6ad
> [  601.489785]  ffff880220d956c0 0000000000000012 0000000000000000 ffffffff810adb15
> [  601.489785]  0000000000000000 0000000000000000 0000000000000012 0000000000000000
> [  601.489785] Call Trace:
> [  601.489785]  [<ffffffff810044bd>] dump_trace+0x7d/0x2d0
> [  601.489785]  [<ffffffff810047a4>] show_stack_log_lvl+0x94/0x170
> [  601.489785]  [<ffffffff81005be1>] show_stack+0x21/0x50
> [  601.489785]  [<ffffffff815096a7>] dump_stack+0x41/0x51
> [  601.489785]  [<ffffffff810ad6ad>] __report_bad_irq+0x2d/0xc0
> [  601.489785]  [<ffffffff810adb15>] note_interrupt+0x1a5/0x260
> [  601.489785]  [<ffffffff810ab649>] handle_irq_event_percpu+0xc9/0x1b0
> [  601.489785]  [<ffffffff810ab762>] handle_irq_event+0x32/0x50
> [  601.489785]  [<ffffffff810ae5e1>] handle_fasteoi_irq+0x51/0xf0
> [  601.489785]  [<ffffffff8100439a>] handle_irq+0x1a/0x30
> [  601.489785]  [<ffffffff815198f5>] do_IRQ+0x45/0xb0
> [  601.489785]  [<ffffffff8150faad>] common_interrupt+0x6d/0x6d
> [  601.489785]  [<ffffffff8103f0c2>] native_safe_halt+0x2/0x10
> [  601.489785]  [<ffffffff8100b029>] default_idle+0x19/0xb0
> [  601.489785]  [<ffffffff810aab51>] cpu_startup_entry+0xe1/0x270
> [  601.489785]  [<ffffffff8103097a>] start_secondary+0x21a/0x2c0
> [  601.489785] handlers:
> [  601.489785] [<ffffffffa01ee220>] i801_isr [i2c_i801]
> [  601.489785] Disabling IRQ #18
> 
> I suppose that the IRQ is shared with another device, for which no
> driver is loaded. I wonder if/how I can figure out which device this
> is. /proc/interrupts says:
>  18:          0      50001          0      49999          0          0          1          0   IO-APIC-fasteoi   i801_smbus
> and 100000 is the number of unhandled interrupts it takes to trigger the
> code which disables the interrupt (note_interrupt in
> kernel/irq/spurious.c.) So far so good.
> 
You could just try to enable MSI in the driver. That won't solve the
unhandled interrupt problem, but it may help you discover the other
interrupt source. What happens if you do not load the i801 driver ?
Do you still see an unhandled interrupt ?

As for how to find the source of the unhandled interrupt - once you
find out, please let me know. I have a similar problem at work.

> But what I do not understand is that even after that, the i2c-i801
> driver is still working. It is very slow: it takes 100 ms to complete
> each command, which is even more than when using polling. And the fact
> that the figures in /proc/interrupts do not increase means that IRQ 18
> is really disabled. But it does not hang. However, looking at the code,
> it's using wait_event() to wait for the interrupt, so I think it should
> hang there if the interrupt is disabled.
> 
> Can you (or anyone else) explain this mystery? I am most certainly
> missing something, maybe something obvious.
> 
Not me, sorry. Time for some debug messages ?

Guenter


* Re: i2c-i801 driver quit working in 3.8.11
From: Jean Delvare @ 2014-08-07 16:55 UTC (permalink / raw)
  To: Soren Harward
  Cc: Wolfram Sang, linux-i2c-u79uwXL29TY76Z2rM5mHXA, Guenter Roeck

Hi Soren,

Please keep the list and everyone else involved in Cc.

On Thu, 7 Aug 2014 07:33:16 -0400, Soren Harward wrote:
> On Aug 7, 2014 4:11 AM, "Jean Delvare" <jdelvare-l3A5Bk7waGM@public.gmane.org> wrote:
> > Soren, what's the status on your side? Are you still running a kernel
> > with commit 6676a847 reverted?
> 
> Yes, I'm running 3.14 (Gentoo is always a few versions behind) with
> 6676a847 reverted.  Thanks for telling me about the command-line option;
> that'll be way more convenient than reverting the patch every time I
> upgrade.

OK. Did you ever try 3.14 without the revert? Or do you always revert
before even testing because you assume the problem is still present?

> > Also I had one report of a problem with the same chipset you are using,
> > and the reporter claims that kernel v3.16 no longer has the problem. We
> > don't know why yet, but it might be worth a try.
> 
> I'm out of town for a few days, so I'll try v3.16 when I get home. I'll get
> back to you mid next week.

Great, thanks.

Meanwhile I have prepared a debug flavor of the i2c-i801 driver, which
can be built separately from the kernel:

http://jdelvare.nerim.net/devel/lm-sensors/drivers/i2c-i801_debug/

Feel free to give it a try and report the results, this may help us
understand where the problem lies.

-- 
Jean Delvare
SUSE L3 Support


* Re: i2c-i801 driver quit working in 3.8.11
From: Guenter Roeck @ 2014-08-07 20:42 UTC (permalink / raw)
  To: Jean Delvare
  Cc: Soren Harward, Wolfram Sang, linux-i2c-u79uwXL29TY76Z2rM5mHXA

On Thu, Aug 07, 2014 at 06:55:57PM +0200, Jean Delvare wrote:
> Hi Soren,
> 
> Please keep the list and everyone else involved in Cc.
> 
> On Thu, 7 Aug 2014 07:33:16 -0400, Soren Harward wrote:
> > On Aug 7, 2014 4:11 AM, "Jean Delvare" <jdelvare-l3A5Bk7waGM@public.gmane.org> wrote:
> > > Soren, what's the status on your side? Are you still running a kernel
> > > with commit 6676a847 reverted?
> > 
> > Yes, I'm running 3.14 (Gentoo is always a few versions behind) with
> > 6676a847 reverted.  Thanks for telling me about the command-line option;
> > that'll be way more convenient than reverting the patch every time I
> > upgrade.
> 
> OK. Did you ever try 3.14 without the revert? Or do you always revert
> before even testing because you assume the problem is still present?
> 
> > > Also I had one report of a problem with the same chipset you are using,
> > > and the reporter claims that kernel v3.16 no longer has the problem. We
> > > don't know why yet, but it might be worth a try.
> > 
> > I'm out of town for a few days, so I'll try v3.16 when I get home. I'll get
> > back to you mid next week.
> 
> Great, thanks.
> 
> Meanwhile I have prepared a debug flavor of the i2c-i801 driver, which
> can be built separately from the kernel:
> 
> http://jdelvare.nerim.net/devel/lm-sensors/drivers/i2c-i801_debug/
> 
> Feel free to give it a try and report the results, this may help us
> understand where the problem lies.
> 
I tried to enable MSI on a Xeon L5238, but it failed.
So I guess MSI does not work after all.

Guenter


* Re: i2c-i801 driver quit working in 3.8.11
From: Soren Harward @ 2014-11-05  3:27 UTC (permalink / raw)
  To: Jean Delvare
  Cc: Guenter Roeck, Wolfram Sang, linux-i2c-u79uwXL29TY76Z2rM5mHXA

On Thu, Aug 7, 2014 at 4:11 AM, Jean Delvare <jdelvare-l3A5Bk7waGM@public.gmane.org> wrote:
> Also I had one report of a problem with the same chipset you are using,
> and the reporter claims that kernel v3.16 no longer has the problem. We
> don't know why yet, but it might be worth a try.

Gentoo stabilized 3.16 a couple weeks ago, and I just realized "oh
yeah, I forgot to follow up with the i2c guys".  3.16.5 works without
reverting commit 6676a847.

-- 
Soren Harward


* Re: i2c-i801 driver quit working in 3.8.11
From: Jean Delvare @ 2014-11-05 10:22 UTC (permalink / raw)
  To: Soren Harward
  Cc: Guenter Roeck, Wolfram Sang, linux-i2c-u79uwXL29TY76Z2rM5mHXA

Hi Soren,

On Tue, 4 Nov 2014 22:27:29 -0500, Soren Harward wrote:
> On Thu, Aug 7, 2014 at 4:11 AM, Jean Delvare <jdelvare-l3A5Bk7waGM@public.gmane.org> wrote:
> > Also I had one report of a problem with the same chipset you are using,
> > and the reporter claims that kernel v3.16 no longer has the problem. We
> > don't know why yet, but it might be worth a try.
> 
> Gentoo stabilized 3.16 a couple weeks ago, and I just realized "oh
> yeah, I forgot to follow up with the i2c guys".  3.16.5 works without
> reverting commit 6676a847.

Thanks for the report, much appreciated. There was no significant
change to the i2c-i801 driver between kernel versions 3.14 and 3.16, so
something else must have fixed it. If we knew which commit that was, we
could backport it to stable kernels. So if you have the expertise and
time to kill, it would be wonderful if you could bisect the kernel from
3.14 to 3.16 to find out the commit that fixed the problem (beware that
git bisect assumes you are looking for a regression, not a fix, so bad
will mean good and good will mean bad.) If you can't, no worry, your
success report was already very valuable.
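
The inverted sense of such a bisection can be illustrated with a small stand-alone sketch. It is not git itself, only a binary search over a hypothetical, linear list of commits in which the oldest is known broken and the newest is known fixed; the names `first_fixed`, `commits` and `is_fixed` are invented for the example. Whenever a tested commit turns out to work, it is treated the way git bisect treats a "bad" commit.

```python
def first_fixed(commits, is_fixed):
    """Return the first commit for which is_fixed() is true.

    Assumes commits[0] is still broken, commits[-1] is fixed, and the
    fix, once present, stays present in all later commits.
    """
    lo, hi = 0, len(commits) - 1  # lo: known broken, hi: known fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid  # works here: mark it "bad" in git bisect terms
        else:
            lo = mid  # still hangs: mark it "good" in git bisect terms
    return commits[hi]


# Hypothetical history of ten commits where the fix landed in "c6".
commits = [f"c{i}" for i in range(10)]
print(first_fixed(commits, lambda c: int(c[1:]) >= 6))  # c6
```

Each step halves the remaining range, so about ten test builds would cover roughly a thousand commits.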

The only change specific to the Intel 5400 chipset I could find during
that timeframe was:

commit c2e650c49a1795238895a474873a12c6c5662833
Author: Aristeu Rozanski <aris-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Date:   Thu Jan 16 11:20:21 2014 -0500

    i5400_edac: Disable device when unloading module

but I very much doubt this is related.

Thanks,
-- 
Jean Delvare
SUSE L3 Support


* Re: i2c-i801 driver quit working in 3.8.11
From: Soren Harward @ 2014-11-07 17:05 UTC (permalink / raw)
  To: Jean Delvare
  Cc: Guenter Roeck, Wolfram Sang, linux-i2c-u79uwXL29TY76Z2rM5mHXA

On Wed, Nov 5, 2014 at 5:22 AM, Jean Delvare <jdelvare-l3A5Bk7waGM@public.gmane.org> wrote:
>> Gentoo stabilized 3.16 a couple weeks ago, and I just realized "oh
>> yeah, I forgot to follow up with the i2c guys".  3.16.5 works without
>> reverting commit 6676a847.

Okay, so I guess I spoke too soon.  Using 3.16.5, the sensors can be
read for a little while after boot (anywhere from 20–50 minutes), and
then it locks up again: any process that tries to read the sensors
gets stuck in disk-wait.

But using the debug version of the i2c-i801.c that Jean linked to
above, I was able to figure out the problem.  It's in
drivers/i2c/busses/i2c-i801.c:386 (line number in commit 7cfc183b3d63,
which is the latest GIT commit at the time I'm writing this):

wait_event(priv->waitq, (status = priv->status));

The debug version uses wait_event_timeout() instead of wait_event().
It looks like this call hangs about once an hour at irregular and
unpredictable intervals, because the debug version dumps the following
message into the kernel logs:

Nov 07 10:04:03 jens kernel: i801_smbus 0000:00:1f.3: Timeout waiting for interrupt!
Nov 07 10:04:03 jens kernel: i801_smbus 0000:00:1f.3: Reg dump: STS=42 CNT=08 CMD=81 ADD=5d DAT0=40 DAT1=00 BLKDAT=00
Nov 07 10:04:03 jens kernel: i801_smbus 0000:00:1f.3: Reg dump: PEC=00 AUXSTS=00 AUXCTL=00
Nov 07 10:04:03 jens kernel: i801_smbus 0000:00:1f.3: Transaction timeout
Nov 07 10:04:03 jens kernel: i801_smbus 0000:00:1f.3: Failed terminating the transaction

The stock kernel has no such timeout, so I infer that the wait never
returning is what causes it to hang.
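
The STS value in the register dump above can be decoded bit by bit. The sketch below rests on two assumptions: that the debug driver prints the registers in hexadecimal, and that the bit names match the SMBHSTSTS_* defines in i2c-i801.c as recalled here (they are not taken from this thread). The helper name `decode_hst_sts` is made up.

```python
# Host status register bits, named after the SMBHSTSTS_* defines in
# i2c-i801.c (recalled from the driver source; treat as an assumption).
HST_STS_BITS = {
    0x01: "HOST_BUSY",
    0x02: "INTR",
    0x04: "DEV_ERR",
    0x08: "BUS_ERR",
    0x10: "FAILED",
    0x20: "SMBALERT_STS",
    0x40: "INUSE_STS",
    0x80: "BYTE_DONE",
}


def decode_hst_sts(value: int) -> list:
    """Return the names of all status bits set in value, lowest bit first."""
    return [name for mask, name in sorted(HST_STS_BITS.items()) if value & mask]


# STS=42 from the dump, read as hexadecimal.
print(decode_hst_sts(int("42", 16)))  # ['INTR', 'INUSE_STS']
```

If both assumptions hold, INTR being set while the driver was still waiting would be consistent with a completion interrupt that never reached the handler, although the dump alone does not prove that.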

I'm not sure what's actually causing the "timeout waiting for
interrupt"; I hope you guys can figure that out from the dumped
registers.  At the very least, it looks like we need to work around
buggy hardware by using wait_event_timeout().
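
The difference between an unbounded wait and a bounded one can be shown with a user-space analogue. This is not kernel code and not the driver's implementation: `threading.Event` merely stands in for the wait queue, and `FakeController` and `wait_for_completion` are hypothetical names. The sketch models a lost interrupt, where the status register is updated but the waiter is never woken.

```python
import threading


class FakeController:
    """Stand-in for the SMBus host: a status value plus a wakeup event."""

    def __init__(self):
        self.status = 0
        self.irq = threading.Event()  # set by the simulated interrupt handler


def wait_for_completion(ctrl, timeout_s):
    """Wait for the wakeup, but give up after timeout_s seconds.

    Returns (status, timed_out). On timeout the status is read directly,
    which is the fallback a bounded wait makes possible.
    """
    woke = ctrl.irq.wait(timeout=timeout_s)
    return ctrl.status, not woke


# Lost interrupt: the hardware finished (status changed) but nobody
# signalled the event. An unbounded irq.wait() would block forever here.
lost = FakeController()
lost.status = 0x42
print(wait_for_completion(lost, 0.05))  # (66, True)

# Normal case: the handler records the status and wakes the waiter.
ok = FakeController()
ok.status = 0x02
ok.irq.set()
print(wait_for_completion(ok, 0.05))  # (2, False)
```

With a bounded wait the caller at least gets control back and can log the registers or retry, which is what the debug driver's timeout message above reflects.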

-- 
Soren Harward

