* 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
@ 2009-09-24 18:21 Alexander Huemer
  2009-09-24 18:31 ` David Daney
  2009-09-24 19:24 ` Frans Pop
  0 siblings, 2 replies; 33+ messages in thread
From: Alexander Huemer @ 2009-09-24 18:21 UTC
  To: linux-kernel; +Cc: alexander.huemer

the problem appears under heavy system load and slows down the system to
unusable speed.
kernels before .30 were not affected.
irqpoll does not change behavior.

error message from .31:

    [157152.418524] irq 23: nobody cared (try booting with the "irqpoll" option)
    [157152.418530] Pid: 1359, comm: cc1plus Tainted: G        W  2.6.31-gentoo-blackbit #2
    [157152.418532] Call Trace:
    [157152.418534]  <IRQ>  [<ffffffff81066e3f>] ? __report_bad_irq+0x30/0x7d
    [157152.418544]  [<ffffffff81066f93>] ? note_interrupt+0x107/0x170
    [157152.418547]  [<ffffffff81067580>] ? handle_fasteoi_irq+0x8a/0xaa
    [157152.418551]  [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d
    [157152.418554]  [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2
    [157152.418558]  [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa
    [157152.418559]  <EOI>
    [157152.418560] handlers:
    [157152.418562] [<ffffffff813d2a6f>] (ahci_interrupt+0x0/0x426)
    [157152.418566] Disabling IRQ #23


bios of the machine is up to date,
i tried all related bios settings, no change.

kernel config for .31     http://xx.vu/~ahuemer/config_ahuemer_20090923.gz
lspci -vxxx               http://xx.vu/~ahuemer/lspci_ahuemer_20090923
lsusb -v                  http://xx.vu/~ahuemer/lsusb_ahuemer_20090923
/proc/interrupts          http://xx.vu/~ahuemer/proc_interrupts_ahuemer_20090923
thread in gentoo forums   http://forums.gentoo.org/viewtopic-t-780725-start-0.html

please tell me what additional info is needed.
please CC me on replies, i am not subscribed.

-alex

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-09-24 18:21 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared Alexander Huemer
@ 2009-09-24 18:31 ` David Daney
  2009-09-24 19:15   ` Alexander Huemer
  2009-09-24 19:24 ` Frans Pop
  1 sibling, 1 reply; 33+ messages in thread
From: David Daney @ 2009-09-24 18:31 UTC
  To: Alexander Huemer; +Cc: linux-kernel

Alexander Huemer wrote:
> the problem appears under heavy system load and slows down the system to
> unusable speed.
> kernels before .30 were not affected.
> irqpoll does not change behavior.
> 
> error message from .31:
> 
>     [157152.418524] irq 23: nobody cared (try booting with the "irqpoll" option)
>     [157152.418530] Pid: 1359, comm: cc1plus Tainted: G        W 

Right here is the problem  ->         ^^^^^^^^

Haven't you read all the threads about the evil of C++?  This is just
one more example of why we shouldn't be using it. :-)

David Daney

>     2.6.31-gentoo-blackbit #2
>     [157152.418532] Call Trace:
>     [157152.418534]  <IRQ>  [<ffffffff81066e3f>] ? __report_bad_irq+0x30/0x7d
>     [157152.418544]  [<ffffffff81066f93>] ? note_interrupt+0x107/0x170
>     [157152.418547]  [<ffffffff81067580>] ? handle_fasteoi_irq+0x8a/0xaa
>     [157152.418551]  [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d
>     [157152.418554]  [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2
>     [157152.418558]  [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa
>     [157152.418559]  <EOI>
>     [157152.418560] handlers:
>     [157152.418562] [<ffffffff813d2a6f>] (ahci_interrupt+0x0/0x426)
>     [157152.418566] Disabling IRQ #23
> 
> 
> bios of the machine is up to date,
> i tried all related bios settings, no change.
> 
> kernel config for .31     http://xx.vu/~ahuemer/config_ahuemer_20090923.gz
> lspci -vxxx               http://xx.vu/~ahuemer/lspci_ahuemer_20090923
> lsusb -v                  http://xx.vu/~ahuemer/lsusb_ahuemer_20090923
> /proc/interrupts          http://xx.vu/~ahuemer/proc_interrupts_ahuemer_20090923
> thread in gentoo forums   http://forums.gentoo.org/viewtopic-t-780725-start-0.html
> 
> please tell me what additional info is needed.
> please CC me on replies, i am not subscribed.
> 
> -alex
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-09-24 18:31 ` David Daney
@ 2009-09-24 19:15   ` Alexander Huemer
  2009-09-24 19:38     ` David Daney
  0 siblings, 1 reply; 33+ messages in thread
From: Alexander Huemer @ 2009-09-24 19:15 UTC
  To: David Daney; +Cc: linux-kernel, alexander.huemer

David Daney wrote:
> Alexander Huemer wrote:
>> the problem appears under heavy system load and slows down the system to
>> unusable speed.
>> kernels before .30 were not affected.
>> irqpoll does not change behavior.
>>
>> error message from .31:
>>
>>     [157152.418524] irq 23: nobody cared (try booting with the "irqpoll" option)
>>     [157152.418530] Pid: 1359, comm: cc1plus Tainted: G        W 
>
> Right here is the problem  ->         ^^^^^^^^
>
> Haven't you read all the threads about the evil of C++?  This is just
> one more example of why we shouldn't be using it. :-)
>
> David Daney
>
>>     2.6.31-gentoo-blackbit #2
>>     [157152.418532] Call Trace:
>>     [157152.418534]  <IRQ>  [<ffffffff81066e3f>] ? __report_bad_irq+0x30/0x7d
>>     [157152.418544]  [<ffffffff81066f93>] ? note_interrupt+0x107/0x170
>>     [157152.418547]  [<ffffffff81067580>] ? handle_fasteoi_irq+0x8a/0xaa
>>     [157152.418551]  [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d
>>     [157152.418554]  [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2
>>     [157152.418558]  [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa
>>     [157152.418559]  <EOI>
>>     [157152.418560] handlers:
>>     [157152.418562] [<ffffffff813d2a6f>] (ahci_interrupt+0x0/0x426)
>>     [157152.418566] Disabling IRQ #23
>>
>>
>> bios of the machine is up to date,
>> i tried all related bios settings, no change.
>>
>> kernel config for .31     http://xx.vu/~ahuemer/config_ahuemer_20090923.gz
>> lspci -vxxx               http://xx.vu/~ahuemer/lspci_ahuemer_20090923
>> lsusb -v                  http://xx.vu/~ahuemer/lsusb_ahuemer_20090923
>> /proc/interrupts          http://xx.vu/~ahuemer/proc_interrupts_ahuemer_20090923
>> thread in gentoo forums   http://forums.gentoo.org/viewtopic-t-780725-start-0.html
>>
>> please tell me what additional info is needed.
>> please CC me on replies, i am not subscribed.
>>
>> -alex
>>
>
thanks for your quick answer, david.
so this isn't a kernel issue at all?
imho a user process shouldn't be able to cause such a situation.
what can i do about it?

-alex

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-09-24 18:21 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared Alexander Huemer
  2009-09-24 18:31 ` David Daney
@ 2009-09-24 19:24 ` Frans Pop
  2009-09-24 19:30   ` Alexander Huemer
  2009-09-24 19:40   ` Frans Pop
  1 sibling, 2 replies; 33+ messages in thread
From: Frans Pop @ 2009-09-24 19:24 UTC
  To: Alexander Huemer; +Cc: linux-kernel, linux-ide

Adding linux-ide to CC.

Alexander Huemer wrote:
> the problem appears under heavy system load and slows down the system to
> unusable speed.
> kernels before .30 were not affected.
> irqpoll does not change behavior.
> 
> error message from .31:
>  [157152.418524] irq 23: nobody cared (try booting with the "irqpoll" option)
>  [157152.418530] Pid: 1359, comm: cc1plus Tainted: G        W  2.6.31-gentoo-blackbit #2
>  [157152.418532] Call Trace:
>  [157152.418534]  <IRQ>  [<ffffffff81066e3f>] ?  __report_bad_irq+0x30/0x7d
>  [157152.418544]  [<ffffffff81066f93>] ? note_interrupt+0x107/0x170
>  [157152.418547]  [<ffffffff81067580>] ? handle_fasteoi_irq+0x8a/0xaa
>  [157152.418551]  [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d
>  [157152.418554]  [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2
>  [157152.418558]  [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa
>  [157152.418559]  <EOI>
>  [157152.418560] handlers:
>  [157152.418562] [<ffffffff813d2a6f>] (ahci_interrupt+0x0/0x426)
>  [157152.418566] Disabling IRQ #23
>
> bios of the machine is up to date,
> i tried all related bios settings, no change.
> 
> kernel config for .31   http://xx.vu/~ahuemer/config_ahuemer_20090923.gz
> lspci -vxxx             http://xx.vu/~ahuemer/lspci_ahuemer_20090923
> lsusb -v                http://xx.vu/~ahuemer/lsusb_ahuemer_20090923
> /proc/interrupts        http://xx.vu/~ahuemer/proc_interrupts_ahuemer_20090923
> thread in gentoo forums http://forums.gentoo.org/viewtopic-t-780725-start-0.html
> 
> please tell me what additional info is needed.

A full dmesg (or kernel log) starting from a clean boot up to the error
could be useful.

If no others reply and the issue can be reproduced reliably, running a
git bisect between v2.6.29 and v2.6.30 to trace the cause of the regression
could be an option.
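
As a rough illustration of why bisecting stays practical even when every test means building a kernel and then compiling gcc-4.3.4 under it: the search behaves like a binary search over the commits between the two releases, so N candidate commits need only about log2(N) build-and-test rounds. This is a simplified Python sketch, not how `git bisect` is implemented. It assumes a linear, oldest-first list of commits (real history is a DAG, which `git bisect` handles itself), a hypothetical `is_bad` callback standing in for "boot this kernel and try to trigger irq 23: nobody cared", and that the bug, once introduced, is present in every later commit.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search for the first bad commit (simplified bisect model).

    Assumptions: `commits` is a linear, oldest-first list; commits[0] is
    known good (e.g. v2.6.29) and commits[-1] is known bad (e.g. v2.6.30);
    once the bug appears it stays in every later commit. `is_bad` is a
    hypothetical callback: build, boot, run the reproducer, report the result.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # first bad commit is at or before mid
        else:
            lo = mid  # first bad commit is after mid
    return commits[hi]


if __name__ == "__main__":
    history = [f"commit{i}" for i in range(1000)]
    tested = []

    def fake_test(commit: str) -> bool:
        # Hypothetical stand-in: pretend commit637 introduced the regression.
        tested.append(commit)
        return int(commit[len("commit"):]) >= 637

    culprit = first_bad_commit(history, fake_test)
    print(f"first bad: {culprit} after {len(tested)} test runs")
```

Because this bug only shows up under heavy load, a single clean run can be a false "good" that sends the search into the wrong half; repeating each test a few times before marking a commit good reduces that risk.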

Cheers,
FJP

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-09-24 19:24 ` Frans Pop
@ 2009-09-24 19:30   ` Alexander Huemer
  2009-09-24 19:40   ` Frans Pop
  1 sibling, 0 replies; 33+ messages in thread
From: Alexander Huemer @ 2009-09-24 19:30 UTC
  To: Frans Pop; +Cc: linux-kernel, linux-ide, alexander.huemer

Frans Pop wrote:
> Adding linux-ide to CC.
>
> Alexander Huemer wrote:
>> the problem appears under heavy system load and slows down the system to
>> unusable speed.
>> kernels before .30 were not affected.
>> irqpoll does not change behavior.
>>
>> error message from .31:
>>  [157152.418524] irq 23: nobody cared (try booting with the "irqpoll" option)
>>  [157152.418530] Pid: 1359, comm: cc1plus Tainted: G        W  2.6.31-gentoo-blackbit #2
>>  [157152.418532] Call Trace:
>>  [157152.418534]  <IRQ>  [<ffffffff81066e3f>] ?  __report_bad_irq+0x30/0x7d
>>  [157152.418544]  [<ffffffff81066f93>] ? note_interrupt+0x107/0x170
>>  [157152.418547]  [<ffffffff81067580>] ? handle_fasteoi_irq+0x8a/0xaa
>>  [157152.418551]  [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d
>>  [157152.418554]  [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2
>>  [157152.418558]  [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa
>>  [157152.418559]  <EOI>
>>  [157152.418560] handlers:
>>  [157152.418562] [<ffffffff813d2a6f>] (ahci_interrupt+0x0/0x426)
>>  [157152.418566] Disabling IRQ #23
>>
>> bios of the machine is up to date,
>> i tried all related bios settings, no change.
>>
>> kernel config for .31   http://xx.vu/~ahuemer/config_ahuemer_20090923.gz
>> lspci -vxxx             http://xx.vu/~ahuemer/lspci_ahuemer_20090923
>> lsusb -v                http://xx.vu/~ahuemer/lsusb_ahuemer_20090923
>> /proc/interrupts        http://xx.vu/~ahuemer/proc_interrupts_ahuemer_20090923
>> thread in gentoo forums http://forums.gentoo.org/viewtopic-t-780725-start-0.html
>>
>> please tell me what additional info is needed.
>
> A full dmesg (or kernel log) starting from a clean boot up to the error
> could be useful.
>
> If no others reply and the issue can be reproduced reliably, running a
> git bisect between v2.6.29 and v2.6.30 to trace the cause of the regression
> could be an option.
>
> Cheers,
> FJP
full dmesg: http://xx.vu/~ahuemer/dmesg_ahuemer_20090923
i rebooted and am now trying to reproduce the error.
the last time, the problem appeared while compiling gcc-4.3.4.

regards
-alex

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-09-24 19:15   ` Alexander Huemer
@ 2009-09-24 19:38     ` David Daney
  0 siblings, 0 replies; 33+ messages in thread
From: David Daney @ 2009-09-24 19:38 UTC
  To: Alexander Huemer; +Cc: linux-kernel

Alexander Huemer wrote:
> David Daney wrote:
>> Alexander Huemer wrote:
>>> the problem appears under heavy system load and slows down the system to
>>> unusable speed.
>>> kernels before .30 were not affected.
>>> irqpoll does not change behavior.
>>>
>>> error message from .31:
>>>
>>>     [157152.418524] irq 23: nobody cared (try booting with the "irqpoll" option)
>>>     [157152.418530] Pid: 1359, comm: cc1plus Tainted: G        W 
>> Right here is the problem  ->         ^^^^^^^^
>>
>> Haven't you read all the threads about the evil of C++?  This is just
>> one more example of why we shouldn't be using it. :-)
>>
>> David Daney
>>
[...]
> thanks for your quick answer, david.
> so this isn't a kernel issue at all?
> imho a user process shouldn't be able to cause such a situation.
> what can i do about it?
> 

Just for avoidance of doubt, it was an attempt at a joke.

You are of course correct.  It looks like a real bug.  User-space 
shouldn't matter.
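
To illustrate why the process name in the trace does not matter: the "Pid: ..., comm: cc1plus" line only names whatever task happened to be running when the interrupt arrived, while the check that prints "nobody cared" looks solely at what the registered handler (here `ahci_interrupt`) returns for each interrupt on that line. Below is a simplified Python model loosely based on `note_interrupt()` in `kernel/irq/spurious.c` of 2.6.3x kernels. The 100000/99900 thresholds are recalled from that file and should be treated as an assumption, and the real function contains further logic (irqpoll/irqfixup handling, timing checks) that is omitted here; `IrqDesc` is a hypothetical stripped-down stand-in for the kernel's per-IRQ descriptor.

```python
from dataclasses import dataclass

IRQ_NONE = 0     # handler: "this interrupt was not from my device"
IRQ_HANDLED = 1  # handler: "I serviced this interrupt"


@dataclass
class IrqDesc:
    """Hypothetical, stripped-down per-IRQ bookkeeping."""
    irq_count: int = 0
    irqs_unhandled: int = 0
    disabled: bool = False


def note_interrupt(desc: IrqDesc, irq: int, ret: int) -> None:
    """Simplified spurious-IRQ check; thresholds are assumed, not verified."""
    if ret != IRQ_HANDLED:
        desc.irqs_unhandled += 1
    desc.irq_count += 1
    if desc.irq_count < 100_000:  # evaluate only once per 100000 interrupts
        return
    desc.irq_count = 0
    if desc.irqs_unhandled > 99_900:  # almost no interrupt was claimed
        print(f'irq {irq}: nobody cared (try booting with the "irqpoll" option)')
        print(f"Disabling IRQ #{irq}")
        desc.disabled = True
    desc.irqs_unhandled = 0


if __name__ == "__main__":
    # A stuck line: every interrupt on IRQ 23 is rejected by the handler.
    stuck = IrqDesc()
    for _ in range(100_000):
        note_interrupt(stuck, 23, IRQ_NONE)
    print("disabled:", stuck.disabled)
```

No user-space process appears among the inputs: the line gets disabled only because the interrupt keeps firing while the handler reports it as not its own. Once IRQ 23 is disabled, the AHCI controller's interrupts are no longer delivered, which would plausibly explain the severe slowdown described at the start of the thread.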

David Daney

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-09-24 19:24 ` Frans Pop
  2009-09-24 19:30   ` Alexander Huemer
@ 2009-09-24 19:40   ` Frans Pop
  2009-09-24 19:43     ` Alexander Huemer
  2009-09-25  0:02     ` Alexander Huemer
  1 sibling, 2 replies; 33+ messages in thread
From: Frans Pop @ 2009-09-24 19:40 UTC
  To: Alexander Huemer; +Cc: linux-kernel, linux-ide

On Thursday 24 September 2009, Frans Pop wrote:
> > error message from .31:
> >  [157152.418524] irq 23: nobody cared
>
> If no others reply and the issue can be reproduced reliably, running a
> git bisect between v2.6.29 and v2.6.30 to trace the cause of the
> regression could be an option.

Looking at the changes in drivers/ata/ahci.c, it might be worth trying
whether reverting the following commit fixes the issue:

commit a5bfc4714b3f01365aef89a92673f2ceb1ccf246
Author: Tejun Heo <tj@kernel.org>
Date:   Fri Jan 23 11:31:39 2009 +0900

    ahci: drop intx manipulation on msi enable

It's a bit of a wild guess though.

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-09-24 19:40   ` Frans Pop
@ 2009-09-24 19:43     ` Alexander Huemer
  2009-09-25  0:02     ` Alexander Huemer
  1 sibling, 0 replies; 33+ messages in thread
From: Alexander Huemer @ 2009-09-24 19:43 UTC
  To: Frans Pop; +Cc: linux-kernel, linux-ide

Frans Pop wrote:
> On Thursday 24 September 2009, Frans Pop wrote:
>   
>>> error message from .31:
>>>  [157152.418524] irq 23: nobody cared
>>>       
>> If no others reply and the issue can be reproduced reliably, running a
>> git bisect between v2.6.29 and v2.6.30 to trace the cause of the
>> regression could be an option.
>>     
>
> Looking at the changes in drivers/ata/ahci.c, it might be worth trying
> whether reverting the following commit fixes the issue:
>
> commit a5bfc4714b3f01365aef89a92673f2ceb1ccf246
> Author: Tejun Heo <tj@kernel.org>
> Date:   Fri Jan 23 11:31:39 2009 +0900
>
>     ahci: drop intx manipulation on msi enable
>
> It's a bit of a wild guess though.
>   
thanks for the hint.
i'll wait for the gcc-4.3.4 compilation to finish, which will take ~45 minutes.
afterwards i'll check out the kernel sources from git and try the revert.
many thanks in the meantime.

-alex

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-09-24 19:40   ` Frans Pop
  2009-09-24 19:43     ` Alexander Huemer
@ 2009-09-25  0:02     ` Alexander Huemer
  2009-09-25 11:28       ` Alexander Huemer
  1 sibling, 1 reply; 33+ messages in thread
From: Alexander Huemer @ 2009-09-25  0:02 UTC
  To: Frans Pop; +Cc: linux-kernel, linux-ide, alexander.huemer

Frans Pop wrote:
> On Thursday 24 September 2009, Frans Pop wrote:
>>> error message from .31:
>>>  [157152.418524] irq 23: nobody cared
>> If no others reply and the issue can be reproduced reliably, running a
>> git bisect between v2.6.29 and v2.6.30 to trace the cause of the
>> regression could be an option.
>
> Looking at the changes in drivers/ata/ahci.c, it might be worth trying
> whether reverting the following commit fixes the issue:
>
> commit a5bfc4714b3f01365aef89a92673f2ceb1ccf246
> Author: Tejun Heo <tj@kernel.org>
> Date:   Fri Jan 23 11:31:39 2009 +0900
>
>     ahci: drop intx manipulation on msi enable
>
> It's a bit of a wild guess though.
i reproduced the issue.

    [ 3486.747729] Pid: 9573, comm: jc1 Tainted: G        W  2.6.31-gentoo-blackbit #2
    [ 3486.747731] Call Trace:
    [ 3486.747733]  <IRQ>  [<ffffffff81066e3f>] ? __report_bad_irq+0x30/0x7d
    [ 3486.747743]  [<ffffffff81066f93>] ? note_interrupt+0x107/0x170
    [ 3486.747746]  [<ffffffff81067580>] ? handle_fasteoi_irq+0x8a/0xaa
    [ 3486.747750]  [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d
    [ 3486.747752]  [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2
    [ 3486.747756]  [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa
    [ 3486.747758]  <EOI>
    [ 3486.747759] handlers:
    [ 3486.747761] [<ffffffff813d2a6f>] (ahci_interrupt+0x0/0x426)
    [ 3486.747765] Disabling IRQ #23

i will report back after a compile run of gcc-4.3.4 with a kernel
without the commit you suggested.

-alex

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-09-25  0:02     ` Alexander Huemer
@ 2009-09-25 11:28       ` Alexander Huemer
  2009-09-25 12:24         ` Frans Pop
  0 siblings, 1 reply; 33+ messages in thread
From: Alexander Huemer @ 2009-09-25 11:28 UTC
  To: Frans Pop; +Cc: linux-kernel, linux-ide, alexander.huemer

Alexander Huemer wrote:
> Frans Pop wrote:
>> On Thursday 24 September 2009, Frans Pop wrote:
>>>> error message from .31:
>>>>  [157152.418524] irq 23: nobody cared
>>> If no others reply and the issue can be reproduced reliably, running a
>>> git bisect between v2.6.29 and v2.6.30 to trace the cause of the
>>> regression could be an option.
>> Looking at the changes in drivers/ata/ahci.c, it might be worth trying
>> whether reverting the following commit fixes the issue:
>>
>> commit a5bfc4714b3f01365aef89a92673f2ceb1ccf246
>> Author: Tejun Heo <tj@kernel.org>
>> Date:   Fri Jan 23 11:31:39 2009 +0900
>>
>>     ahci: drop intx manipulation on msi enable
>>
>> It's a bit of a wild guess though.
> i reproduced the issue.
>
>     [ 3486.747729] Pid: 9573, comm: jc1 Tainted: G        W  2.6.31-gentoo-blackbit #2
>     [ 3486.747731] Call Trace:
>     [ 3486.747733]  <IRQ>  [<ffffffff81066e3f>] ? __report_bad_irq+0x30/0x7d
>     [ 3486.747743]  [<ffffffff81066f93>] ? note_interrupt+0x107/0x170
>     [ 3486.747746]  [<ffffffff81067580>] ? handle_fasteoi_irq+0x8a/0xaa
>     [ 3486.747750]  [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d
>     [ 3486.747752]  [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2
>     [ 3486.747756]  [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa
>     [ 3486.747758]  <EOI>
>     [ 3486.747759] handlers:
>     [ 3486.747761] [<ffffffff813d2a6f>] (ahci_interrupt+0x0/0x426)
>     [ 3486.747765] Disabling IRQ #23
>
> i will report back after a compile run of gcc-4.3.4 with a kernel
> without the commit you suggested.
>
> -alex
4 compilation runs of gcc-4.3.4 finished without the issue re-appearing.
it seems like you guessed right, Frans.
i also found this:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=31b239ad1ba7225435e13f5afc47e48eb674c0cc
i'll report on bugzilla.

thanks for the help.
-alex

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-09-25 11:28       ` Alexander Huemer
@ 2009-09-25 12:24         ` Frans Pop
  2009-09-25 12:27           ` Alexander Huemer
  0 siblings, 1 reply; 33+ messages in thread
From: Frans Pop @ 2009-09-25 12:24 UTC
  To: Alexander Huemer; +Cc: linux-kernel, linux-ide, Tejun Heo, stable

On Friday 25 September 2009, Alexander Huemer wrote:
> Alexander Huemer wrote:
> > Frans Pop wrote:
> >> On Thursday 24 September 2009, Frans Pop wrote:
> >>>> error message from .31:
> >>>>  [157152.418524] irq 23: nobody cared
> >>
> >> Looking at the changes in drivers/ata/ahci.c, it might be worth
> >> trying whether reverting the following commit fixes the issue:
> >>
> >> commit a5bfc4714b3f01365aef89a92673f2ceb1ccf246
> >> Author: Tejun Heo <tj@kernel.org>
> >> Date:   Fri Jan 23 11:31:39 2009 +0900
> >>
> >>     ahci: drop intx manipulation on msi enable
> >
> > i reproduced the issue.
> >
> >     [ 3486.747729] Pid: 9573, comm: jc1 Tainted: G        W  2.6.31-gentoo-blackbit #2
> >     [ 3486.747731] Call Trace:
> >     [ 3486.747733]  <IRQ>  [<ffffffff81066e3f>] ? __report_bad_irq+0x30/0x7d
> >     [ 3486.747743]  [<ffffffff81066f93>] ? note_interrupt+0x107/0x170
> >     [ 3486.747746]  [<ffffffff81067580>] ? handle_fasteoi_irq+0x8a/0xaa
> >     [ 3486.747750]  [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d
> >     [ 3486.747752]  [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2
> >     [ 3486.747756]  [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa
> >     [ 3486.747758]  <EOI>
> >     [ 3486.747759] handlers:
> >     [ 3486.747761] [<ffffffff813d2a6f>] (ahci_interrupt+0x0/0x426)
> >     [ 3486.747765] Disabling IRQ #23
> >
> > i will report back after a compile run of gcc-4.3.4 with a kernel
> > without the commit you suggested.
>
> 4 compilation runs of gcc-4.3.4 finished without the issue re-appearing.
> it seems like you guessed right, Frans.

Great. Glad to hear it worked out.

> i also found this:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=31b239ad1ba7225435e13f5afc47e48eb674c0cc
> i'll report on bugzilla.

So with the revert already in mainline for .32, the only thing left is for
that to get included in stable updates for .30 and .31.

Cheers,
FJP

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-09-25 12:24         ` Frans Pop
@ 2009-09-25 12:27           ` Alexander Huemer
  2009-09-25 12:48             ` Frans Pop
  0 siblings, 1 reply; 33+ messages in thread
From: Alexander Huemer @ 2009-09-25 12:27 UTC
  To: Frans Pop; +Cc: linux-kernel, linux-ide, Tejun Heo, stable

Frans Pop wrote:
> On Friday 25 September 2009, Alexander Huemer wrote:
>   
>> Alexander Huemer wrote:
>>     
>>> Frans Pop wrote:
>>>       
>>>> On Thursday 24 September 2009, Frans Pop wrote:
>>>>         
>>>>>> error message from .31:
>>>>>>  [157152.418524] irq 23: nobody cared
>>>>>>             
>>>> Looking at the changes in drivers/ata/ahci.c, it might be worth
>>>> trying whether reverting the following commit fixes the issue:
>>>>
>>>> commit a5bfc4714b3f01365aef89a92673f2ceb1ccf246
>>>> Author: Tejun Heo <tj@kernel.org>
>>>> Date:   Fri Jan 23 11:31:39 2009 +0900
>>>>
>>>>     ahci: drop intx manipulation on msi enable
>>>>         
>>> i reproduced the issue.
>>>
>>>     [ 3486.747729] Pid: 9573, comm: jc1 Tainted: G        W  2.6.31-gentoo-blackbit #2
>>>     [ 3486.747731] Call Trace:
>>>     [ 3486.747733]  <IRQ>  [<ffffffff81066e3f>] ? __report_bad_irq+0x30/0x7d
>>>     [ 3486.747743]  [<ffffffff81066f93>] ? note_interrupt+0x107/0x170
>>>     [ 3486.747746]  [<ffffffff81067580>] ? handle_fasteoi_irq+0x8a/0xaa
>>>     [ 3486.747750]  [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d
>>>     [ 3486.747752]  [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2
>>>     [ 3486.747756]  [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa
>>>     [ 3486.747758]  <EOI>
>>>     [ 3486.747759] handlers:
>>>     [ 3486.747761] [<ffffffff813d2a6f>] (ahci_interrupt+0x0/0x426)
>>>     [ 3486.747765] Disabling IRQ #23
>>>
>>> i will report back after a compile run of gcc-4.3.4 with a kernel
>>> without the commit you suggested.
>>>       
>> 4 compilation runs of gcc-4.3.4 finished without the issue re-appearing.
>> it seems like you guessed right, Frans.
>>     
>
> Great. Glad to hear it worked out.
>
>   
>> i also found this:
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=31b239ad1ba7225435e13f5afc47e48eb674c0cc
>> i'll report on bugzilla.
>>     
>
> So with the revert already in mainline for .32, the only thing left is for
> that to get included in stable updates for .30 and .31.
>
> Cheers,
> FJP
>   
please see the last comment in [1].
can i do anything else to help?

thanks again
-alex

[1] http://bugzilla.kernel.org/show_bug.cgi?id=14124

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-09-25 12:27           ` Alexander Huemer
@ 2009-09-25 12:48             ` Frans Pop
  2009-10-08 12:00               ` Alexander Huemer
  0 siblings, 1 reply; 33+ messages in thread
From: Frans Pop @ 2009-09-25 12:48 UTC
  To: Alexander Huemer; +Cc: linux-kernel, linux-ide, Tejun Heo, stable

On Friday 25 September 2009, Alexander Huemer wrote:
> > So with the revert already in mainline for .32, the only thing left is
> > for that to get included in stable updates for .30 and .31.
>
> please see the last comment in [1].
> can i do anything else to help?

> [1] http://bugzilla.kernel.org/show_bug.cgi?id=14124

Yes, adding that comment was excellent. I also added the relevant people
in the CC of my previous mail, so it should get taken care of now. Unless 
they have additional questions no further action from you should be 
needed.

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-09-25 12:48             ` Frans Pop
@ 2009-10-08 12:00               ` Alexander Huemer
  2009-10-09 21:30                 ` Alexander Huemer
  2009-10-10 13:13                 ` Frans Pop
  0 siblings, 2 replies; 33+ messages in thread
From: Alexander Huemer @ 2009-10-08 12:00 UTC
  To: Frans Pop; +Cc: linux-kernel, linux-ide, Tejun Heo, stable

Frans Pop wrote:
> On Friday 25 September 2009, Alexander Huemer wrote:
>>> So with the revert already in mainline for .32, the only thing left is
>>> for that to get included in stable updates for .30 and .31.
>> please see the last comment in [1].
>> can i do anything else to help?
>
>> [1] http://bugzilla.kernel.org/show_bug.cgi?id=14124
>
> Yes, adding that comment was excellent. I also added the relevant people
> in the CC of my previous mail, so it should get taken care of now. Unless 
> they have additional questions no further action from you should be 
> needed.
it seems like the problem is _not_ solved.
i just booted with 2.6.31.3.
2.6.31-gentoo-r2 is vanilla-2.6.31-r2 with a few unrelated patches.
did the usual verification (compilation of gcc-4.3.4),
and got this again:

    [ 1018.059729] irq 23: nobody cared (try booting with the "irqpoll" option)
    [ 1018.059734] Pid: 8656, comm: sh Tainted: G        W  2.6.31-gentoo-r2-blackbit #1
    [ 1018.059736] Call Trace:
    [ 1018.059738]  <IRQ>  [<ffffffff81066ecf>] ? __report_bad_irq+0x30/0x7d
    [ 1018.059748]  [<ffffffff81067023>] ? note_interrupt+0x107/0x170
    [ 1018.059751]  [<ffffffff81067610>] ? handle_fasteoi_irq+0x8a/0xaa
    [ 1018.059755]  [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d
    [ 1018.059757]  [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2
    [ 1018.059761]  [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa
    [ 1018.059762]  <EOI>  [<ffffffff815c7d2c>] ? do_page_fault+0xed/0x2ef
    [ 1018.059769]  [<ffffffff815c7f12>] ? do_page_fault+0x2d3/0x2ef
    [ 1018.059773]  [<ffffffff812dd5ed>] ? __put_user_4+0x1d/0x30
    [ 1018.059776]  [<ffffffff815c5fdf>] ? page_fault+0x1f/0x30
    [ 1018.059777] handlers:
    [ 1018.059778] [<ffffffff813d2d8c>] (ahci_interrupt+0x0/0x426)
    [ 1018.059783] Disabling IRQ #23

so in my opinion reverting commit [1] with commit [2] missed the point.
please comment.

-alex

[1] a5bfc4714b3f01365aef89a92673f2ceb1ccf246
[2] 31b239ad1ba7225435e13f5afc47e48eb674c0cc

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-10-08 12:00               ` Alexander Huemer
@ 2009-10-09 21:30                 ` Alexander Huemer
  2009-10-10 13:13                 ` Frans Pop
  1 sibling, 0 replies; 33+ messages in thread
From: Alexander Huemer @ 2009-10-09 21:30 UTC
  To: Frans Pop; +Cc: linux-kernel, linux-ide, Tejun Heo, stable

Alexander Huemer wrote:
> Frans Pop wrote:
>> On Friday 25 September 2009, Alexander Huemer wrote:
>>>> So with the revert already in mainline for .32, the only thing left is
>>>> for that to get included in stable updates for .30 and .31.
>>> please see the last comment in [1].
>>> can i do anything else to help?
>>> [1] http://bugzilla.kernel.org/show_bug.cgi?id=14124
>> Yes, adding that comment was excellent. I also added the relevant people
>> in the CC of my previous mail, so it should get taken care of now. Unless 
>> they have additional questions no further action from you should be 
>> needed.
> it seems like the problem is _not_ solved.
> i just booted with 2.6.31.3.
> 2.6.31-gentoo-r2 is vanilla-2.6.31-r2 with a few unrelated patches.
> did the usual verification (compilation of gcc-4.3.4),
> and got this again:
>
>     [ 1018.059729] irq 23: nobody cared (try booting with the "irqpoll" option)
>     [ 1018.059734] Pid: 8656, comm: sh Tainted: G        W  2.6.31-gentoo-r2-blackbit #1
>     [ 1018.059736] Call Trace:
>     [ 1018.059738]  <IRQ>  [<ffffffff81066ecf>] ? __report_bad_irq+0x30/0x7d
>     [ 1018.059748]  [<ffffffff81067023>] ? note_interrupt+0x107/0x170
>     [ 1018.059751]  [<ffffffff81067610>] ? handle_fasteoi_irq+0x8a/0xaa
>     [ 1018.059755]  [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d
>     [ 1018.059757]  [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2
>     [ 1018.059761]  [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa
>     [ 1018.059762]  <EOI>  [<ffffffff815c7d2c>] ? do_page_fault+0xed/0x2ef
>     [ 1018.059769]  [<ffffffff815c7f12>] ? do_page_fault+0x2d3/0x2ef
>     [ 1018.059773]  [<ffffffff812dd5ed>] ? __put_user_4+0x1d/0x30
>     [ 1018.059776]  [<ffffffff815c5fdf>] ? page_fault+0x1f/0x30
>     [ 1018.059777] handlers:
>     [ 1018.059778] [<ffffffff813d2d8c>] (ahci_interrupt+0x0/0x426)
>     [ 1018.059783] Disabling IRQ #23
>
> so in my opinion reverting commit [1] with commit [2] missed the point.
> please comment.
>
> -alex
>
> [1] a5bfc4714b3f01365aef89a92673f2ceb1ccf246
> [2] 31b239ad1ba7225435e13f5afc47e48eb674c0cc
>
i hope i'm not annoying anybody by posting again, but i am afraid my last
message went unnoticed.
is there something i don't know but should? it seems the problem still
exists.
i would be happy to test whatever is needed to trace the problem.

please respond.
regards
-alex

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-10-08 12:00               ` Alexander Huemer
  2009-10-09 21:30                 ` Alexander Huemer
@ 2009-10-10 13:13                 ` Frans Pop
  2009-10-11 20:57                   ` Alexander Huemer
  2009-10-12  7:49                   ` Tejun Heo
  1 sibling, 2 replies; 33+ messages in thread
From: Frans Pop @ 2009-10-10 13:13 UTC
  To: Alexander Huemer; +Cc: linux-kernel, linux-ide, Tejun Heo, Jeff Garzik

(dropped stable from CC)

On Thursday 08 October 2009, you wrote:
> Frans Pop wrote:
> > On Friday 25 September 2009, Alexander Huemer wrote:
> >>> So with the revert already in mainline for .32, the only thing left
> >>> is for that to get included in stable updates for .30 and .31.
> >>
> >> please see the last comment in [1].
> >> can i do anything else to help ?
> >>
> >> [1] http://bugzilla.kernel.org/show_bug.cgi?id=14124
>
> it seems like the problem is _not_ solved.
> i just booted with 2.6.31.3.
> 2.6.31-gentoo-r2 is vanilla-2.6.31-r2 with a few unrelated patches.

I don't know what vanilla-2.6.31-r2 is, but I assume it's based on either
2.6.31.3 or 2.6.31.2.

> did the usual verification (compilation of gcc-4.3.4),

> so in my opinion reverting commit [1] with commit [2] missed the point.
>
> [1] a5bfc4714b3f01365aef89a92673f2ceb1ccf246
> [2] 31b239ad1ba7225435e13f5afc47e48eb674c0cc

The most likely explanation is that your earlier test from which you
concluded that the revert did fix the problem was incorrect. It seems
unlikely that some other stable commit interferes here.

So basically we're back where we started.

>     [ 1018.059729] irq 23: nobody cared (try booting with the "irqpoll" option)
>     [ 1018.059734] Pid: 8656, comm: sh Tainted: G        W    2.6.31-gentoo-r2-blackbit #1
>     [ 1018.059736] Call Trace:
>     [ 1018.059738]  <IRQ>  [<ffffffff81066ecf>] ? __report_bad_irq+0x30/0x7d
>     [ 1018.059748]  [<ffffffff81067023>] ? note_interrupt+0x107/0x170
>     [ 1018.059751]  [<ffffffff81067610>] ? handle_fasteoi_irq+0x8a/0xaa
>     [ 1018.059755]  [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d
>     [ 1018.059757]  [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2
>     [ 1018.059761]  [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa
>     [ 1018.059762]  <EOI>  [<ffffffff815c7d2c>] ? do_page_fault+0xed/0x2ef
>     [ 1018.059769]  [<ffffffff815c7f12>] ? do_page_fault+0x2d3/0x2ef
>     [ 1018.059773]  [<ffffffff812dd5ed>] ? __put_user_4+0x1d/0x30
>     [ 1018.059776]  [<ffffffff815c5fdf>] ? page_fault+0x1f/0x30
>     [ 1018.059777] handlers:
>     [ 1018.059778] [<ffffffff813d2d8c>] (ahci_interrupt+0x0/0x426) 
>     [ 1018.059783] Disabling IRQ #23

How reproducible is the error for you? Do you see it every time or not?
If it is reliably reproducible, can you think of any explanation why your
earlier test was a success while we now see that the revert does not help?

Does the error *only* occur during gcc compilation, or was that just the
simplest way to reproduce it? Does it always occur at the same point during
the compilation or does it vary?
Can you create a test case that does not require doing the whole
compilation, but only executes the step that triggers the error?

If you can find a reliable and fairly quick way to reproduce the error, I
would suggest doing a bisection.
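Bisection is a binary search over an ordered history: starting from a known-good and a known-bad point, test the midpoint and keep whichever half still contains the first bad change. What follows is a minimal sketch in C. It assumes the candidates (commits or releases) are numbered in history order, candidate 0 is known good, the last one is known bad, and the bug persists once introduced. `bisect_first_bad`, `bad_fn` and `threshold_is_bad` are hypothetical names, and the callback stands in for "build this kernel and run the gcc compile loop". For kernel commits, `git bisect` does this bookkeeping for you.

```c
#include <stddef.h>

/* Hypothetical test callback: return nonzero if the kernel built from
 * candidate i reproduces "irq 23: nobody cared" under the compile loop. */
typedef int (*bad_fn)(size_t i, void *ctx);

/* Binary search for the first bad candidate.
 * Assumptions: n >= 2, candidate 0 is known good, candidate n-1 is known
 * bad, and once a candidate is bad every later one is bad too. */
size_t bisect_first_bad(size_t n, bad_fn is_bad, void *ctx)
{
	size_t good = 0;     /* invariant: candidate `good` tested good */
	size_t bad = n - 1;  /* invariant: candidate `bad` tested bad */

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	return bad;  /* good + 1 == bad: the first bad candidate */
}

/* Demo predicate for trying the search without building kernels:
 * ctx points at the index of the (pretend) first bad commit. */
int threshold_is_bad(size_t i, void *ctx)
{
	return i >= *(const size_t *)ctx;
}
```

Each iteration halves the remaining range, so about log2(n) test runs are needed (roughly 10 for 1,000 candidates). With a flaky reproducer, a single wrong "good" verdict sends the search into the wrong half, which is why a reliable test matters first.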

Jeff, Tejun: do you have any ideas what could cause this issue to suddenly
appear or how to debug/instrument it?

Cheers,
FJP

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-10-10 13:13                 ` Frans Pop
@ 2009-10-11 20:57                   ` Alexander Huemer
  2009-10-12  7:49                   ` Tejun Heo
  1 sibling, 0 replies; 33+ messages in thread
From: Alexander Huemer @ 2009-10-11 20:57 UTC (permalink / raw)
  To: Frans Pop
  Cc: linux-kernel, linux-ide, Tejun Heo, Jeff Garzik, alexander.huemer

> I don't know what vanilla-2.6.31-r2 is, but I assume it's based on
> either 2.6.31.3 or 2.6.31.2.

vanilla just means the unpatched kernel from kernel.org.

> The most likely explanation is that your earlier test from which you
> concluded that the revert did fix the problem was incorrect. It seems
> unlikely that some other stable commit interferes here.
>
> So basically we're back where we started.

unfortunately you seem to be right.

> How reproducible is the error for you? Do you see it every time or not?
> If it is reliably reproducible, can you think of any explanation why your
> earlier test was a success while we now see that the revert does not help?

the error is reproducible. i'll try to pin it down to specific kernel
versions in the next few days.

> Does the error *only* occur during gcc compilation, or was that just the
> simplest way to reproduce it? Does it always occur at the same point during
> the compilation or does it vary?

it was the simplest way.
i don't know how i could find out whether the error always happens at
exactly the same point.
i'll think about that.

> Can you create a test case that does not require doing the whole
> compilation, but only executes the step that triggers the error?

sure, once i know what happens when the error occurs.

> If you can find a reliable and fairly quick way to reproduce the error, I
> would suggest doing a bisection.

i would be happy to do that.

thanks for now.


* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-10-10 13:13                 ` Frans Pop
  2009-10-11 20:57                   ` Alexander Huemer
@ 2009-10-12  7:49                   ` Tejun Heo
  2009-10-12  9:48                     ` Frans Pop
  1 sibling, 1 reply; 33+ messages in thread
From: Tejun Heo @ 2009-10-12  7:49 UTC (permalink / raw)
  To: Frans Pop; +Cc: Alexander Huemer, linux-kernel, linux-ide, Jeff Garzik

Hello,

Frans Pop wrote:
>> so in my opinion reverting commit [1] with commit [2] missed the point.
>>
>> [1] a5bfc4714b3f01365aef89a92673f2ceb1ccf246
>> [2] 31b239ad1ba7225435e13f5afc47e48eb674c0cc
> 
> The most likely explanation is that your earlier test from which you
> concluded that the revert did fix the problem was incorrect. It seems
> unlikely that some other stable commit interferes here.

Hmm...

> So basically we're back where we started.
> 
>>     [ 1018.059729] irq 23: nobody cared (try booting with the "irqpoll" option)
>>     [ 1018.059734] Pid: 8656, comm: sh Tainted: G        W    2.6.31-gentoo-r2-blackbit #1
>>     [ 1018.059736] Call Trace:
>>     [ 1018.059738]  <IRQ>  [<ffffffff81066ecf>] ? __report_bad_irq+0x30/0x7d
>>     [ 1018.059748]  [<ffffffff81067023>] ? note_interrupt+0x107/0x170
>>     [ 1018.059751]  [<ffffffff81067610>] ? handle_fasteoi_irq+0x8a/0xaa
>>     [ 1018.059755]  [<ffffffff8100d1cf>] ? handle_irq+0x17/0x1d
>>     [ 1018.059757]  [<ffffffff8100c84b>] ? do_IRQ+0x54/0xb2
>>     [ 1018.059761]  [<ffffffff8100b6d3>] ? ret_from_intr+0x0/0xa
>>     [ 1018.059762]  <EOI>  [<ffffffff815c7d2c>] ? do_page_fault+0xed/0x2ef
>>     [ 1018.059769]  [<ffffffff815c7f12>] ? do_page_fault+0x2d3/0x2ef
>>     [ 1018.059773]  [<ffffffff812dd5ed>] ? __put_user_4+0x1d/0x30
>>     [ 1018.059776]  [<ffffffff815c5fdf>] ? page_fault+0x1f/0x30
>>     [ 1018.059777] handlers:
>>     [ 1018.059778] [<ffffffff813d2d8c>] (ahci_interrupt+0x0/0x426) 
>>     [ 1018.059783] Disabling IRQ #23
> 
> How reproducible is the error for you? Do you see it every time or not?
> If it is reliably reproducible, can you think of any explanation why your
> earlier test was a success while we now see that the revert does not help?
> 
> Does the error *only* occur during gcc compilation, or was that just the
> simplest way to reproduce it? Does it always occur at the same point during
> the compilation or does it vary?
> Can you create a test case that does not require doing the whole
> compilation, but only executes the step that triggers the error?
> 
> If you can find a reliable and fairly quick way to reproduce the error, I
> would suggest doing a bisection.
> 
> Jeff, Tejun: do you have any ideas what could cause this issue to suddenly
> appear or how to debug/instrument it?

Alexander, can you please attach full boot log and the output of
"lspci -nn"?  Also, how reproducible is the problem?  You already
answered Frans' question, but can you be more specific?

Thanks.

-- 
tejun

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-10-12  7:49                   ` Tejun Heo
@ 2009-10-12  9:48                     ` Frans Pop
  2009-10-12  9:52                       ` Tejun Heo
  0 siblings, 1 reply; 33+ messages in thread
From: Frans Pop @ 2009-10-12  9:48 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Alexander Huemer, linux-kernel, linux-ide, Jeff Garzik

On Monday 12 October 2009, Tejun Heo wrote:
> Alexander, can you please attach full boot log and the output of
> "lspci -nn"?  Also, how reproducible is the problem?  You already
> answered Frans' question, but can you be more specific?

Full dmesg was made available earlier at:
http://xx.vu/~ahuemer/dmesg_ahuemer_20090923

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-10-12  9:48                     ` Frans Pop
@ 2009-10-12  9:52                       ` Tejun Heo
  2009-10-12  9:55                         ` Alexander Huemer
  0 siblings, 1 reply; 33+ messages in thread
From: Tejun Heo @ 2009-10-12  9:52 UTC (permalink / raw)
  To: Frans Pop; +Cc: Alexander Huemer, linux-kernel, linux-ide, Jeff Garzik

Frans Pop wrote:
> On Monday 12 October 2009, Tejun Heo wrote:
>> Alexander, can you please attach full boot log and the output of
>> "lspci -nn"?  Also, how reproducible is the problem?  You already
>> answered Frans' question, but can you be more specific?
> 
> Full dmesg was made available earlier at:
> http://xx.vu/~ahuemer/dmesg_ahuemer_20090923

Does blacklisting i801_smbus make any difference?

-- 
tejun

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-10-12  9:52                       ` Tejun Heo
@ 2009-10-12  9:55                         ` Alexander Huemer
  2009-10-12 10:07                           ` Tejun Heo
  0 siblings, 1 reply; 33+ messages in thread
From: Alexander Huemer @ 2009-10-12  9:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Frans Pop, linux-kernel, linux-ide, Jeff Garzik, alexander.huemer

Tejun Heo wrote:
> Frans Pop wrote:
>   
>> On Monday 12 October 2009, Tejun Heo wrote:
>>     
>>> Alexander, can you please attach full boot log and the output of
>>> "lspci -nn"?  Also, how reproducible is the problem?  You already
>>> answered Frans' question, but can you be more specific?
>>>       
>> Full dmesg was made available earlier at:
>> http://xx.vu/~ahuemer/dmesg_ahuemer_20090923
>>     
>
> Does blacklisting i801_smbus make any difference?
>
>   
lspci -nn:
http://xx.vu/~ahuemer/lspci_nn_ahuemer_20091012

what do you mean by "blacklisting i801_smbus"?

regards
-alex

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-10-12  9:55                         ` Alexander Huemer
@ 2009-10-12 10:07                           ` Tejun Heo
  2009-10-12 10:11                             ` Alexander Huemer
  0 siblings, 1 reply; 33+ messages in thread
From: Tejun Heo @ 2009-10-12 10:07 UTC (permalink / raw)
  To: Alexander Huemer; +Cc: Frans Pop, linux-kernel, linux-ide, Jeff Garzik

Alexander Huemer wrote:
> Tejun Heo wrote:
>> Frans Pop wrote:
>>  
>>> On Monday 12 October 2009, Tejun Heo wrote:
>>>    
>>>> Alexander, can you please attach full boot log and the output of
>>>> "lspci -nn"?  Also, how reproducible is the problem?  You already
>>>> answered Frans' question, but can you be more specific?
>>>>       
>>> Full dmesg was made available earlier at:
>>> http://xx.vu/~ahuemer/dmesg_ahuemer_20090923
>>>     
>>
>> Does blacklisting i801_smbus make any difference?
>>
>>   
> lspci -nn:
> http://xx.vu/~ahuemer/lspci_nn_ahuemer_20091012
> 
> what do you mean by "blacklisting i801_smbus"?

[    3.872387] i2c /dev entries driver
[    3.873943] i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, low) -> IRQ 23
[    3.875580] w83627hf: Found W83627HF chip at 0x290

IRQ23 is also used by i801_smbus and it would be nice to confirm
whether the problem can still be triggered with that driver not
loaded.  Adding "blacklist i2c_i801" to /etc/modprobe.d/blacklist
should probably do the trick.

Thanks.

-- 
tejun

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-10-12 10:07                           ` Tejun Heo
@ 2009-10-12 10:11                             ` Alexander Huemer
  2009-10-12 15:03                               ` Alexander Huemer
  0 siblings, 1 reply; 33+ messages in thread
From: Alexander Huemer @ 2009-10-12 10:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Frans Pop, linux-kernel, linux-ide, Jeff Garzik, alexander.huemer

Tejun Heo wrote:
> Alexander Huemer wrote:
>   
>> Tejun Heo wrote:
>>     
>>> Frans Pop wrote:
>>>  
>>>       
>>>> On Monday 12 October 2009, Tejun Heo wrote:
>>>>    
>>>>         
>>>>> Alexander, can you please attach full boot log and the output of
>>>>> "lspci -nn"?  Also, how reproducible is the problem?  You already
>>>>> answered Frans' question, but can you be more specific?
>>>>>       
>>>>>           
>>>> Full dmesg was made available earlier at:
>>>> http://xx.vu/~ahuemer/dmesg_ahuemer_20090923
>>>>     
>>>>         
>>> Does blacklisting i801_smbus make any difference?
>>>
>>>   
>>>       
>> lspci -nn:
>> http://xx.vu/~ahuemer/lspci_nn_ahuemer_20091012
>>
>> what do you mean by "blacklisting i801_smbus"?
>>     
>
> [    3.872387] i2c /dev entries driver
> [    3.873943] i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, low) -> IRQ 23
> [    3.875580] w83627hf: Found W83627HF chip at 0x290
>
> IRQ23 is also used by i801_smbus and it would be nice to confirm
> whether the problem can still be triggered with that driver not
> loaded.  Adding "blacklist i2c_i801" to /etc/modprobe.d/blacklist
> should probably do the trick.
>
> Thanks.
>
>   
okay, i think you assume that i2c_i801 is a module.
it is actually built into the kernel.
i'll rebuild the kernel without that component and run a test again.

regards
-alex

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-10-12 10:11                             ` Alexander Huemer
@ 2009-10-12 15:03                               ` Alexander Huemer
  2009-10-12 17:28                                 ` Robert Hancock
  2009-10-13  2:17                                 ` Tejun Heo
  0 siblings, 2 replies; 33+ messages in thread
From: Alexander Huemer @ 2009-10-12 15:03 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Frans Pop, linux-kernel, linux-ide, Jeff Garzik, alexander.huemer

Alexander Huemer wrote:
> Tejun Heo wrote:
>> Alexander Huemer wrote:
>>  
>>> Tejun Heo wrote:
>>>    
>>>> Frans Pop wrote:
>>>>  
>>>>      
>>>>> On Monday 12 October 2009, Tejun Heo wrote:
>>>>>           
>>>>>> Alexander, can you please attach full boot log and the output of
>>>>>> "lspci -nn"?  Also, how reproducible is the problem?  You already
>>>>>> answered Frans' question, but can you be more specific?
>>>>>>                 
>>>>> Full dmesg was made available earlier at:
>>>>> http://xx.vu/~ahuemer/dmesg_ahuemer_20090923
>>>>>             
>>>> Does blacklisting i801_smbus make any difference?
>>>>
>>>>         
>>> lspci -nn:
>>> http://xx.vu/~ahuemer/lspci_nn_ahuemer_20091012
>>>
>>> what do you mean by "blacklisting i801_smbus"?
>>>     
>>
>> [    3.872387] i2c /dev entries driver
>> [    3.873943] i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, 
>> low) -> IRQ 23
>> [    3.875580] w83627hf: Found W83627HF chip at 0x290
>>
>> IRQ23 is also used by i801_smbus and it would be nice to confirm
>> whether the problem can still be triggered with that driver not
>> loaded.  Adding "blacklist i2c_i801" to /etc/modprobe.d/blacklist
>> should probably do the trick.
>>
>> Thanks.
>>
>>   
> okay, i think you assume that i2c_i801 is a module.
> it is actually built into the kernel.
> i'll rebuild the kernel without that component and run a test again.
>
> regards
> -alex
tejun, it seems you hit an interesting point.
i compiled kernel-2.6.31.3 with my usual config _without_ i2c_i801.
my usual test (compilation of gcc-4.3.2) finished 5 times without the error.
i'll let it run some more times overnight.
does anybody have an idea how i can trace what exactly causes the error
during the compilation run so that i can create a short test program?
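One way to see when the storm starts, before a dedicated test program exists, is to watch the counters for IRQ 23 in /proc/interrupts while the compile loop runs. Below is a rough sketch in C. It assumes the usual layout of that file: an `NN:` prefix, one decimal count per CPU, then the controller type and handler names. `parse_irq_line` and `read_irq_count` are hypothetical helpers, and lines longer than the 1024-byte buffer are not handled.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse one /proc/interrupts line such as
 *   " 23:   1234   5678   IO-APIC-fasteoi   ahci"
 * On success, store the IRQ number and the sum of the per-CPU counts
 * and return 0. Return -1 for lines that do not start with a number
 * (the CPU header line, "NMI:", and so on). */
int parse_irq_line(const char *line, int *irq, unsigned long long *total)
{
	char *end;
	long n = strtol(line, &end, 10);  /* skips leading blanks */

	if (end == line || *end != ':')
		return -1;
	*irq = (int)n;
	*total = 0;

	const char *p = end + 1;
	for (;;) {
		while (*p == ' ' || *p == '\t')
			p++;
		if (!isdigit((unsigned char)*p))
			break;  /* reached the controller/handler columns */
		*total += strtoull(p, &end, 10);
		p = end;
	}
	return 0;
}

/* Return the summed count for want_irq, or -1 if it is not listed
 * or /proc/interrupts cannot be opened. */
long long read_irq_count(int want_irq)
{
	FILE *f = fopen("/proc/interrupts", "r");
	char buf[1024];
	long long result = -1;

	if (!f)
		return -1;
	while (fgets(buf, sizeof(buf), f)) {
		int irq;
		unsigned long long total;

		if (parse_irq_line(buf, &irq, &total) == 0 && irq == want_irq) {
			result = (long long)total;
			break;
		}
	}
	fclose(f);
	return result;
}
```

Calling `read_irq_count(23)` once a second from a small loop and printing the difference between samples should show whether the rate on IRQ 23 climbs sharply shortly before the "Disabling IRQ #23" message.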

regards
-alex

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-10-12 15:03                               ` Alexander Huemer
@ 2009-10-12 17:28                                 ` Robert Hancock
  2009-10-13  2:17                                 ` Tejun Heo
  1 sibling, 0 replies; 33+ messages in thread
From: Robert Hancock @ 2009-10-12 17:28 UTC (permalink / raw)
  To: Alexander Huemer
  Cc: Tejun Heo, Frans Pop, linux-kernel, linux-ide, Jeff Garzik

On 10/12/2009 09:03 AM, Alexander Huemer wrote:
> Alexander Huemer wrote:
>> Tejun Heo wrote:
>>> Alexander Huemer wrote:
>>>
>>>> Tejun Heo wrote:
>>>>> Frans Pop wrote:
>>>>>
>>>>>> On Monday 12 October 2009, Tejun Heo wrote:
>>>>>>> Alexander, can you please attach full boot log and the output of
>>>>>>> "lspci -nn"? Also, how reproducible is the problem? You already
>>>>>>> answered Frans' question, but can you be more specific?
>>>>>> Full dmesg was made available earlier at:
>>>>>> http://xx.vu/~ahuemer/dmesg_ahuemer_20090923
>>>>> Does blacklisting i801_smbus make any difference?
>>>>>
>>>> lspci -nn:
>>>> http://xx.vu/~ahuemer/lspci_nn_ahuemer_20091012
>>>>
>>>> what do you mean by "blacklisting i801_smbus"?
>>>
>>> [ 3.872387] i2c /dev entries driver
>>> [ 3.873943] i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, low)
>>> -> IRQ 23
>>> [ 3.875580] w83627hf: Found W83627HF chip at 0x290
>>>
>>> IRQ23 is also used by i801_smbus and it would be nice to confirm
>>> whether the problem can still be triggered with that driver not
>>> loaded. Adding "blacklist i2c_i801" to /etc/modprobe.d/blacklist
>>> should probably do the trick.
>>>
>>> Thanks.
>>>
>> okay, i think you assume that i2c_i801 is a module.
>> it is actually built into the kernel.
>> i'll rebuild the kernel without that component and run a test again.
>>
>> regards
>> -alex
> tejun, it seems you hit an interesting point.
> i compiled kernel-2.6.31.3 with my usual config _without_ i2c_i801.
> my usual test (compilation of gcc-4.3.2) finished 5 times without the
> error.
> i'll let it run some more times overnight.
> does anybody have an idea how i can trace what exactly causes the error
> during the compilation run so that i can create a short test program?

Do you have any hardware sensor monitoring software running (such as
the GNOME sensors panel applet or something similar)? Something like that
would be the most likely reason for anything to access the SMBus driver.

Interesting that the device seems to be on the same interrupt but it
hasn't registered itself as a handler (it looks like that driver doesn't
use interrupts). If the device did generate an interrupt, though, no
registered handler would claim it, so it would indeed cause this problem.
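The failure mode described above can be illustrated with a toy model of the kernel's spurious-interrupt accounting, the note_interrupt() step visible in the call trace. Every delivery of a shared, level-triggered line runs the registered handlers. If none of them claims the interrupt, it counts as unhandled, and when nearly all interrupts in a window go unclaimed, the line is reported ("nobody cared") and disabled.

This is not kernel code. The 100000/99900 thresholds follow the 2.6-era kernel/irq/spurious.c and should be double-checked against the source. The time-based reset of the unhandled counter is omitted, and the device flags are hypothetical. `ahci_handler` only mimics the fact that ahci_interrupt() claims interrupts raised by its own controller.

```c
#include <stdbool.h>

#define IRQ_WINDOW      100000  /* interrupts per accounting window */
#define UNHANDLED_LIMIT  99900  /* unhandled count that triggers disable */

/* Toy state of one shared, level-triggered IRQ line. */
struct irq_line {
	unsigned long count;      /* interrupts seen in this window */
	unsigned long unhandled;  /* of those, how many nobody claimed */
	bool disabled;
};

/* Hypothetical device state: ahci has work pending, and the SMBus
 * controller (which has no handler registered) asserts the line. */
struct devices {
	bool ahci_pending;
	bool smbus_asserting;
};

/* Models ahci_interrupt(): claims the interrupt only if its own
 * controller reports pending status, and clears that status. */
static bool ahci_handler(struct devices *d)
{
	if (!d->ahci_pending)
		return false;   /* like IRQ_NONE: not ours */
	d->ahci_pending = false;
	return true;        /* like IRQ_HANDLED */
}

/* One delivery: run the only registered handler, then do simplified
 * note_interrupt()-style accounting. */
void deliver_irq(struct irq_line *line, struct devices *d)
{
	if (line->disabled)
		return;
	if (!ahci_handler(d))
		line->unhandled++;

	if (++line->count < IRQ_WINDOW)
		return;
	if (line->unhandled > UNHANDLED_LIMIT)
		line->disabled = true;  /* "Disabling IRQ #23" */
	line->count = 0;
	line->unhandled = 0;
}

/* Keep delivering while any device holds the level-triggered line
 * asserted; stop when the line is quiet or has been disabled.
 * Returns the number of deliveries (bounded by max_iter). */
unsigned long run_storm(struct irq_line *line, struct devices *d,
			unsigned long max_iter)
{
	unsigned long n = 0;

	while (n < max_iter && !line->disabled &&
	       (d->ahci_pending || d->smbus_asserting)) {
		deliver_irq(line, d);
		n++;
	}
	return n;
}
```

With only `ahci_pending` set, one delivery is handled and the line goes quiet. With `smbus_asserting` set, every delivery is unclaimed, and the line is disabled after 100000 interrupts, which matches the shape of the "Disabling IRQ #23" report.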

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-10-12 15:03                               ` Alexander Huemer
  2009-10-12 17:28                                 ` Robert Hancock
@ 2009-10-13  2:17                                 ` Tejun Heo
  2009-10-13  6:49                                   ` Alexander Huemer
  1 sibling, 1 reply; 33+ messages in thread
From: Tejun Heo @ 2009-10-13  2:17 UTC (permalink / raw)
  To: Alexander Huemer, Jean Delvare
  Cc: Frans Pop, linux-kernel, linux-ide, Jeff Garzik

[cc'ing Jean and quoting whole body]

Hello, Jean.

It seems i2c_i801 is triggering an IRQ storm on Alexander's machine.  The
original thread is

 http://thread.gmane.org/gmane.linux.kernel/894187

Any ideas?

Thanks.

Alexander Huemer wrote:
> Alexander Huemer wrote:
>> Tejun Heo wrote:
>>> Alexander Huemer wrote:
>>>  
>>>> Tejun Heo wrote:
>>>>   
>>>>> Frans Pop wrote:
>>>>>  
>>>>>     
>>>>>> On Monday 12 October 2009, Tejun Heo wrote:
>>>>>>          
>>>>>>> Alexander, can you please attach full boot log and the output of
>>>>>>> "lspci -nn"?  Also, how reproducible is the problem?  You already
>>>>>>> answered Frans' question, but can you be more specific?
>>>>>>>                 
>>>>>> Full dmesg was made available earlier at:
>>>>>> http://xx.vu/~ahuemer/dmesg_ahuemer_20090923
>>>>>>             
>>>>> Does blacklisting i801_smbus make any difference?
>>>>>
>>>>>         
>>>> lspci -nn:
>>>> http://xx.vu/~ahuemer/lspci_nn_ahuemer_20091012
>>>>
>>>> what do you mean by "blacklisting i801_smbus"?
>>>>     
>>>
>>> [    3.872387] i2c /dev entries driver
>>> [    3.873943] i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level,
>>> low) -> IRQ 23
>>> [    3.875580] w83627hf: Found W83627HF chip at 0x290
>>>
>>> IRQ23 is also used by i801_smbus and it would be nice to confirm
>>> whether the problem can still be triggered with that driver not
>>> loaded.  Adding "blacklist i2c_i801" to /etc/modprobe.d/blacklist
>>> should probably do the trick.
>>>
>>> Thanks.
>>>
>>>   
>> okay, i think you assume that i2c_i801 is a module.
>> it is actually built into the kernel.
>> i'll rebuild the kernel without that component and run a test again.
>>
>> regards
>> -alex
> tejun, it seems you hit an interesting point.
> i compiled kernel-2.6.31.3 with my usual config _without_ i2c_i801.
> my usual test (compilation of gcc-4.3.2) finished 5 times without the
> error.
> i'll let it run some more times overnight.
> does anybody have an idea how i can trace what exactly causes the error
> during the compilation run so that i can create a short test program?
> 
> regards
> -alex

-- 
tejun

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-10-13  2:17                                 ` Tejun Heo
@ 2009-10-13  6:49                                   ` Alexander Huemer
  2009-10-13 12:35                                     ` Tejun Heo
  0 siblings, 1 reply; 33+ messages in thread
From: Alexander Huemer @ 2009-10-13  6:49 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jean Delvare, Frans Pop, linux-kernel, linux-ide, Jeff Garzik,
	alexander.huemer

Tejun Heo wrote:
> [cc'ing Jean and quoting whole body]
>
> Hello, Jean.
>
> It seems i2c_i801 is triggering an IRQ storm on Alexander's machine.  The
> original thread is
>
>  http://thread.gmane.org/gmane.linux.kernel/894187
>
> Any ideas?
>
> Thanks.
>
> Alexander Huemer wrote:
>   
>> Alexander Huemer wrote:
>>     
>>> Tejun Heo wrote:
>>>       
>>>> Alexander Huemer wrote:
>>>>  
>>>>         
>>>>> Tejun Heo wrote:
>>>>>   
>>>>>           
>>>>>> Frans Pop wrote:
>>>>>>  
>>>>>>     
>>>>>>             
>>>>>>> On Monday 12 October 2009, Tejun Heo wrote:
>>>>>>>          
>>>>>>>               
>>>>>>>> Alexander, can you please attach full boot log and the output of
>>>>>>>> "lspci -nn"?  Also, how reproducible is the problem?  You already
>>>>>>>> answered Frans' question, but can you be more specific?
>>>>>>>>                 
>>>>>>>>                 
>>>>>>> Full dmesg was made available earlier at:
>>>>>>> http://xx.vu/~ahuemer/dmesg_ahuemer_20090923
>>>>>>>             
>>>>>>>               
>>>>>> Does blacklisting i801_smbus make any difference?
>>>>>>
>>>>>>         
>>>>>>             
>>>>> lspci -nn:
>>>>> http://xx.vu/~ahuemer/lspci_nn_ahuemer_20091012
>>>>>
>>>>> what do you mean by "blacklisting i801_smbus"?
>>>>>     
>>>>>           
>>>> [    3.872387] i2c /dev entries driver
>>>> [    3.873943] i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level,
>>>> low) -> IRQ 23
>>>> [    3.875580] w83627hf: Found W83627HF chip at 0x290
>>>>
>>>> IRQ23 is also used by i801_smbus and it would be nice to confirm
>>>> whether the problem can still be triggered with that driver not
>>>> loaded.  Adding "blacklist i2c_i801" to /etc/modprobe.d/blacklist
>>>> should probably do the trick.
>>>>
>>>> Thanks.
>>>>
>>>>   
>>>>         
>>> okay, i think you assume that i2c_i801 is a module.
>>> it is actually built into the kernel.
>>> i'll rebuild the kernel without that component and run a test again.
>>>
>>> regards
>>> -alex
>>>       
>> tejun, it seems you hit an interesting point.
>> i compiled kernel-2.6.31.3 with my usual config _without_ i2c_i801.
>> my usual test (compilation of gcc-4.3.2) finished 5 times without the
>> error.
>> i'll let it run some more times overnight.
>> does anybody have an idea how i can trace what exactly causes the error
>> during the compilation run so that i can create a short test program?
>>
>> regards
>> -alex
>>     
>
>   
hi,

i compiled gcc in a loop overnight, 14 times. no error.
it really seems i2c_i801 was the cause...
unfortunately i still don't know how i can extract the part of the gcc
compilation process that causes the error on an affected kernel.
that would enable me to create a simple test program.
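As a rough check on how much the clean runs prove: if each compile run on an affected kernel independently hits the error with probability p, then k consecutive clean runs happen by chance with probability (1 - p)^k. The per-run probability p is an assumption here, since the thread only says the error is reproducible. This sketch just does the arithmetic.

```c
/* Probability that k independent runs all finish without the error,
 * assuming each run triggers it with probability p (0 <= p <= 1). */
double p_all_clean(double p, unsigned k)
{
	double r = 1.0;

	while (k-- > 0)
		r *= 1.0 - p;
	return r;
}
```

For example, with p = 0.5, 14 clean runs would happen by chance with probability 1/16384 (about 0.006%). Even with p = 0.2, the probability is only about 4.4%.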

regards
-alex

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-10-13  6:49                                   ` Alexander Huemer
@ 2009-10-13 12:35                                     ` Tejun Heo
  2009-10-14 11:45                                       ` Jean Delvare
  2009-10-21  8:38                                       ` Jean Delvare
  0 siblings, 2 replies; 33+ messages in thread
From: Tejun Heo @ 2009-10-13 12:35 UTC (permalink / raw)
  To: Alexander Huemer
  Cc: Jean Delvare, Frans Pop, linux-kernel, linux-ide, Jeff Garzik

Alexander Huemer wrote:
> i compiled gcc in a loop overnight, 14 times. no error.
> it really seems i2c_i801 was the cause...
> unfortunately i still don't know how i can extract the part of the gcc
> compilation process that causes the error on an affected kernel.
> that would enable me to create a simple test program.

Given that i2c is used for temperature monitoring, I think it is not
triggered by any single step of the compiling but rather by the
accumulated heat load during compilation.  Let's wait for Jean to
chime in.  :-)

Thanks.

-- 
tejun

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-10-13 12:35                                     ` Tejun Heo
@ 2009-10-14 11:45                                       ` Jean Delvare
  2009-10-21  8:38                                       ` Jean Delvare
  1 sibling, 0 replies; 33+ messages in thread
From: Jean Delvare @ 2009-10-14 11:45 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alexander Huemer, Frans Pop, linux-kernel, linux-ide, Jeff Garzik

On Tuesday 13 October 2009, Tejun Heo wrote:
> Alexander Huemer wrote:
> > i compiled gcc in a loop overnight, 14 times. no error.
> > it really seems i2c_i801 was the cause...
> > unfortunately i still don't know how i can extract the part of the gcc
> > compilation process that causes the error on an affected kernel.
> > that would enable me to create a simple test program.
> 
> Given that i2c is used for temperature monitoring, I think it is not
> triggered by any single step of the compiling but rather by the
> accumulated heat load during compilation.  Let's wait for Jean to
> chime in.  :-)

Sorry, I'm somewhat busy at the moment; I'll take a look as soon
as I get a chance.

-- 
Jean Delvare
Suse L3

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-10-13 12:35                                     ` Tejun Heo
  2009-10-14 11:45                                       ` Jean Delvare
@ 2009-10-21  8:38                                       ` Jean Delvare
  2009-10-21 10:01                                         ` Alexander Huemer
  1 sibling, 1 reply; 33+ messages in thread
From: Jean Delvare @ 2009-10-21  8:38 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alexander Huemer, Frans Pop, linux-kernel, linux-ide, Jeff Garzik

Hi Tejun, Alexander,

On Tuesday 13 October 2009, Tejun Heo wrote:
> Alexander Huemer wrote:
> > i compiled gcc in a loop overnight, 14 times. no error.
> > it really seems i2c_i801 was the cause...
> > unfortunately i still don't know how i can extract the part of the gcc
> > compilation process that causes the error on an affected kernel.
> > that would enable me to create a simple test program.
> 
> Given that i2c is used for temperature monitoring, I think it is not
> triggered by any single step of the compiling but rather by the
> accumulated heat load during compilation.  Let's wait for Jean to
> chime in.  :-)

OK, here I am, sorry for the delay. I've read the discussion thread.
Here are the few data points I can offer, in the hope it will help:

* While the i2c-i801 driver received some changes in kernel 2.6.30,
  none of these are related to PCI or interrupts. So as the problem
  is new in kernel 2.6.30, the i2c-i801 driver alone is unlikely to
  cause it. This may, however, be a combination of something i2c-i801
  does and something the PCI subsystem has done since kernel 2.6.30. For
  this reason, I would still recommend a bisection if the problem can
  be reliably reproduced. I know it takes time, but it is always
  easier to fix a bug when we know which commit introduced it.

* The i2c-i801 driver does _not_ make use of interrupts. It is
  poll-based (I am not exactly proud of that, but that's the way it
  is.)

  #define ENABLE_INT9		0	/* set to 0x01 to enable - untested */

  So I am very surprised to read that this driver would cause an IRQ
  storm.

* One thing the i2c-i801 driver does on the PCI device is:

  err = pci_enable_device(dev);

  I presume this is what causes the following message in dmesg:

  i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, low) -> IRQ 23

  Basically, even though the driver doesn't make use of interrupts,
  the IRQ is still registered because this is how the hardware is
  set up.
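The poll-based approach from the second point can be sketched as follows: start a transaction with the controller's interrupt-enable bit left clear, then read the host status register until the busy bit drops or a retry limit runs out. This is a simplified model with a simulated register. The bit names and values (`HST_BUSY`, `HST_INTR`, `INTREN`) and the `MAX_POLLS` limit are illustrative assumptions, not the i2c-i801 definitions. Nothing in the loop depends on the interrupt line, so if the controller asserted its PCI INT pin anyway (the first possibility in the conclusion), no handler would ever acknowledge it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit assignments (assumed, not copied from i2c-i801). */
#define HST_BUSY  0x01  /* status: transaction in progress */
#define HST_INTR  0x02  /* status: transaction completed */
#define INTREN    0x01  /* control: raise an interrupt on completion */

#define MAX_POLLS 100   /* hypothetical retry limit */

/* Simulated SMBus host controller: stays busy for busy_reads reads. */
struct fake_smbus {
	uint8_t control;
	unsigned busy_reads;
	bool irq_asserted;   /* what the PCI INT pin would be doing */
};

static void start_transaction(struct fake_smbus *c, uint8_t ctl,
			      unsigned busy_reads)
{
	c->control = ctl;
	c->busy_reads = busy_reads;
	c->irq_asserted = false;
}

static uint8_t read_status(struct fake_smbus *c)
{
	if (c->busy_reads > 0) {
		c->busy_reads--;
		return HST_BUSY;
	}
	/* On completion, a well-behaved controller only drives its
	 * interrupt pin if software asked for it via INTREN. */
	if (c->control & INTREN)
		c->irq_asserted = true;
	return HST_INTR;
}

/* Poll-based completion: returns 0 when the transaction finished,
 * -1 on timeout. */
int smbus_poll_transaction(struct fake_smbus *c, unsigned busy_reads)
{
	start_transaction(c, 0 /* INTREN left clear */, busy_reads);
	for (unsigned i = 0; i < MAX_POLLS; i++) {
		uint8_t sts = read_status(c);

		if (!(sts & HST_BUSY))
			return (sts & HST_INTR) ? 0 : -1;
		/* a real driver would delay or sleep between reads */
	}
	return -1;
}
```

A transaction that stays busy for 10 reads completes without `irq_asserted` ever being set. A controller that stays busy for 500 reads makes the loop give up after `MAX_POLLS` reads.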

In conclusion, I suspect that one of two things may be happening:
either the SMBus is triggering interrupts when told not to (the ICH6
is a bit different from all the other supported chips, so I'll
double-check whether we may have missed something), or something else
is triggering SMBus transactions; SMI and ACPI come to mind. If the
latter is the case, then you do not want to use i2c-i801 on this
motherboard.

Questions for Alexander:

* Can I please see the output of "sensors" on your system?
* What are the brand and model of your motherboard?
* Can we get an acpidump for your system?

-- 
Jean Delvare
Suse L3

* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-10-21  8:38                                       ` Jean Delvare
@ 2009-10-21 10:01                                         ` Alexander Huemer
  2009-10-21 11:28                                           ` Jean Delvare
  0 siblings, 1 reply; 33+ messages in thread
From: Alexander Huemer @ 2009-10-21 10:01 UTC (permalink / raw)
  To: Jean Delvare
  Cc: Tejun Heo, Frans Pop, linux-kernel, linux-ide, Jeff Garzik,
	alexander.huemer

Jean Delvare wrote:
> Hi Tejun, Alexander,
>
> On Tuesday 13 October 2009, Tejun Heo wrote:
>   
>> Alexander Huemer wrote:
>>     
>>> i compiled gcc in a loop overnight, 14 times. no error.
>>> it really seems i2c_i801 was the cause...
>>> unfortunately i still don't know how i can extract the part of the gcc
>>> compilation process that causes the error on an affected kernel.
>>> that would enable me to create a simple test program.
>>>       
>> Given that i2c is used for temperature monitoring, I think it is not
>> triggered by any single step of the compiling but rather by the
>> accumulated heat load during compilation.  Let's wait for Jean to
>> chime in.  :-)
>>     
>
> OK, here I am, sorry for the delay. I've read the discussion thread.
> Here are the few data points I can offer, in the hope it will help:
>
> * While the i2c-i801 driver received some changes in kernel 2.6.30,
>   none of these are related to PCI nor interrupts. So as the problem
>   is new in kernel 2.6.30, the i2c-i801 driver alone is unlikely to
>   cause it. This may, however, be a combination of something i2c-i801
>   does and something the pci subsystem does since kernel 2.6.30. For
>   this reason, I would still recommend a bisection if the problem can
>   be reliably reproduced. I know it takes time, but it is always
>   easier to fix a bug when we know which commit introduced it.
>
> * The i2c-i801 driver does _not_ make use of interrupts. It is
>   poll-based (I am not exactly proud of that, but that's the way it
>   is.)
>
>   #define ENABLE_INT9		0	/* set to 0x01 to enable - untested */
>
>   So I am very surprised to read that this driver would cause an IRQ
>   storm.
>
> * One thing the i2c-i801 driver does on the PCI device is:
>
>   err = pci_enable_device(dev);
>
>   I presume this is what causes the following message in dmesg:
>
>   i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, low) -> IRQ 23
>
>   Basically, even though the driver doesn't make use of interrupts,
>   the IRQ is still registered because this is how the hardware is
>   set up.
>
> As a conclusion, I suspect that 2 things may be happening: either
> the SMBus is triggering interrupts when told not to. The ICH6 is a
> bit different from all the other supported chips, I'll double check
> if we may have missed something. Or, something else is triggering
> SMBus transactions. SMI and ACPI come to mind. If this is the case
> then you do not want to use i2c-i801 on this motherboard.
>
> Questions to Alexander :
>
> * Can I please see the output of "sensors" on your system?
> * What are the brand and model of your motherboard?
> * Can we get an acpidump for your system?
>
>   
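The poll-based design quoted above can be illustrated with a small simulation. Assumptions, all hypothetical: the register bit values are modeled loosely on the ICH-family SMBus host control and status registers, the fake_smbus struct stands in for real port I/O, and the loop spins where a real driver would sleep between reads. The point is that the driver starts a transaction with the interrupt-enable bit clear and then polls the busy bit, so a well-behaved controller never drives its interrupt line.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values modeled loosely on ICH-family SMBus host registers;
 * treat them as assumptions for this illustration. */
#define HST_CNT_INTREN    0x01  /* interrupt enable: left clear when polling */
#define HST_CNT_START     0x40  /* kick off the transaction */
#define HST_STS_HOST_BUSY 0x01  /* transaction in progress */
#define HST_STS_INTR      0x02  /* transaction completed */

/* Hypothetical simulated controller: no real port I/O happens here. */
struct fake_smbus {
    uint8_t hst_cnt;
    uint8_t hst_sts;
    int busy_polls;     /* status reads until the transaction completes */
    bool irq_asserted;  /* would the INTx line be driven? */
};

static void write_cnt(struct fake_smbus *c, uint8_t val)
{
    c->hst_cnt = val;
    if (val & HST_CNT_START)
        c->hst_sts = HST_STS_HOST_BUSY;
}

static uint8_t read_sts(struct fake_smbus *c)
{
    if ((c->hst_sts & HST_STS_HOST_BUSY) && --c->busy_polls <= 0) {
        c->hst_sts = HST_STS_INTR;  /* done */
        /* A well-behaved controller drives INTx only if INTREN is set. */
        c->irq_asserted = (c->hst_cnt & HST_CNT_INTREN) != 0;
    }
    return c->hst_sts;
}

/* Poll-based transaction: start with INTREN clear, then check the busy
 * bit up to max_polls times. Returns 0 on completion, -1 on timeout. */
static int smbus_transaction_polled(struct fake_smbus *c, int max_polls)
{
    write_cnt(c, HST_CNT_START);  /* INTREN deliberately not set */
    for (int i = 0; i < max_polls; i++) {
        if (!(read_sts(c) & HST_STS_HOST_BUSY))
            return 0;
    }
    return -1;
}
```

Jean's first hypothesis corresponds to a controller that raises its interrupt even though HST_CNT_INTREN is clear, which this model's well-behaved controller never does.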
many thanks for your response. i appreciate that.
first, the data you requested:

    sensors:        http://xx.vu/~ahuemer/sensors-ahuemer-20091021.txt
    acpidump:       http://xx.vu/~ahuemer/acpidump-ahuemer-20091021.txt
    motherboard:    tyan tempest i5400pw/s5397 with one intel xeon e5420.

the output of sensors was made _without_ i801_smbus in the kernel.
i noticed that the data of w83627hf-isa-0290 is quite weird. i do not
have an explanation for that.
if a bisection is what will bring light into this, i am willing to take
the time.
so that would be a bisection between 2.6.29 and 2.6.30?
a quicker test case would be good for that, but i don't have one yet,
just the compilation of gcc, which takes time, even on this machine with
tmpfs and ccache.
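
Since 2.6.29 was fine and 2.6.30 is broken, the search is a binary search over the commits in between. The sketch below models it on a linear list of revisions; real `git bisect` works on the commit graph, so this is only an approximation, and `first_bad`, `max_steps` and the `is_bad_fn` predicate are hypothetical names. Its main use here is estimating effort: each test roughly halves the candidates, so n candidates need at most ceil(log2 n) test builds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Tester-supplied predicate: true if revision idx shows the bug.
 * Assumes monotonicity: once a revision is bad, all later ones are too. */
typedef bool (*is_bad_fn)(size_t idx, void *ctx);

/* Find the first bad revision among n >= 1 candidates, where candidate
 * n-1 is known bad. Stores the number of tests performed in *tests. */
static size_t first_bad(size_t n, is_bad_fn is_bad, void *ctx, unsigned *tests)
{
    size_t lo = 0, hi = n - 1;  /* answer lies in [lo, hi]; hi is bad */
    *tests = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        ++*tests;
        if (is_bad(mid, ctx))
            hi = mid;       /* bug present: first bad is mid or earlier */
        else
            lo = mid + 1;   /* bug absent: first bad is after mid */
    }
    return lo;
}

/* Worst-case number of tests for n candidates: ceil(log2(n)). */
static unsigned max_steps(size_t n)
{
    unsigned steps = 0;
    while (((size_t)1 << steps) < n)
        steps++;
    return steps;
}
```

The monotonicity assumption matters: an intermittent bug that only shows up after hours of compiling can make a bad revision test as good, which sends the search in the wrong direction. That is another reason a quicker, reliable reproducer is worth looking for.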




* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-10-21 10:01                                         ` Alexander Huemer
@ 2009-10-21 11:28                                           ` Jean Delvare
  2009-10-26 15:01                                             ` Alexander Huemer
  0 siblings, 1 reply; 33+ messages in thread
From: Jean Delvare @ 2009-10-21 11:28 UTC (permalink / raw)
  To: Alexander Huemer
  Cc: Tejun Heo, Frans Pop, linux-kernel, linux-ide, Jeff Garzik

Le mercredi 21 octobre 2009, Alexander Huemer a écrit :
> Jean Delvare wrote:
> > OK, here I am, sorry for the delay. I've read the discussion thread.
> > Here are the few data points I can offer, in the hope it will help:
> >
> > * While the i2c-i801 driver received some changes in kernel 2.6.30,
> >   none of these are related to PCI nor interrupts. So as the problem
> >   is new in kernel 2.6.30, the i2c-i801 driver alone is unlikely to
> >   cause it. This may, however, be a combination of something i2c-i801
> >   does and something the pci subsystem does since kernel 2.6.30. For
> >   this reason, I would still recommend a bisection if the problem can
> >   be reliably reproduced. I know it takes time, but it is always
> >   easier to fix a bug when we know which commit introduced it.
> >
> > * The i2c-i801 driver does _not_ make use of interrupts. It is
> >   poll-based (I am not exactly proud of that, but that's the way it
> >   is.)
> >
> >   #define ENABLE_INT9		0	/* set to 0x01 to enable - untested */
> >
> >   So I am very surprised to read that this driver would cause an IRQ
> >   storm.
> >
> > * One thing the i2c-i801 driver does on the PCI device is:
> >
> >   err = pci_enable_device(dev);
> >
> >   I presume this is what causes the following message in dmesg:
> >
> >   i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, low) -> IRQ 23
> >
> >   Basically, even though the driver doesn't make use of interrupts,
> >   the IRQ is still registered because this is how the hardware is
> >   set up.
> >
> > As a conclusion, I suspect that 2 things may be happening: either
> > the SMBus is triggering interrupts when told not to. The ICH6 is a
> > bit different from all the other supported chips, I'll double check

My bad, it's a 63xxESB-based board, not ICH6. I must have been
mixing in data from a different bug.

> > if we may have missed something. Or, something else is triggering
> > SMBus transactions. SMI and ACPI come to mind. If this is the case
> > then you do not want to use i2c-i801 on this motherboard.
> >
> > Questions to Alexander :
> >
> > * Can I please see the output of "sensors" on your system?
> > * What are the brand and model of your motherboard?
> > * Can we get an acpidump for your system?
> >
> >   
> many thanks for your response. i appreciate that.
> first, the data you requested:
> 
>     sensors:        http://xx.vu/~ahuemer/sensors-ahuemer-20091021.txt
>     acpidump:       http://xx.vu/~ahuemer/acpidump-ahuemer-20091021.txt

The good news is that I can't see any access to the SMBus in the
ACPI tables. Nothing can be said about the SMIs though, without an
intimate knowledge of the BIOS.

>     motherboard:    tyan tempest i5400pw/s5397 with one intel xeon e5420.
> 
> the output of sensors was made _without_ i801_smbus in the kernel.

Then please run it once again with i2c-i801 loaded. My whole point
was to know whether there is any hardware monitoring chip connected
to the SMBus. Your initial kernel configuration suggests that you
have a W83793G chip there.

> i noticed that the data of w83627hf-isa-0290 is quite weird. i do not
> have an explanation for that.

I do. This happens when the manufacturer decides that the hardware
monitoring features of the Super-I/O are insufficient for their
needs. They add a dedicated chip for the hardware monitoring. This
is particularly frequent on server boards from Tyan and SuperMicro.
Ideally they would _also_ disable the feature on the Super-I/O side,
but often they do not, so the driver still loads, but outputs
garbage.

You can see the following messages in your log:
[    3.878703] w83627hf w83627hf.656: Enabling temp2, readings might not make sense
[    3.881708] w83627hf w83627hf.656: Enabling temp3, readings might not make sense
This is a good hint that this is the case (if the nonsensical data
displayed by "sensors" wasn't enough to convince you.)

So you should stop loading (or building in) the w83627hf kernel module.

> if a bisection is what will bring light into this, i am willing to take
> the time.
> so that would be a bisection between 2.6.29 and 2.6.30 ?
> a quicker test case would be good for that, but i don't have one yet,
> just the compilation of gcc, which takes time, even on this machine with
> tmpfs and ccache.

-- 
Jean Delvare
Suse L3


* Re: 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared
  2009-10-21 11:28                                           ` Jean Delvare
@ 2009-10-26 15:01                                             ` Alexander Huemer
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Huemer @ 2009-10-26 15:01 UTC (permalink / raw)
  To: Jean Delvare
  Cc: Tejun Heo, Frans Pop, linux-kernel, linux-ide, Jeff Garzik,
	alexander.huemer

Jean Delvare wrote:
> Le mercredi 21 octobre 2009, Alexander Huemer a écrit :
>   
>> Jean Delvare wrote:
>>     
>>> OK, here I am, sorry for the delay. I've read the discussion thread.
>>> Here are the few data points I can offer, in the hope it will help:
>>>
>>> * While the i2c-i801 driver received some changes in kernel 2.6.30,
>>>   none of these are related to PCI nor interrupts. So as the problem
>>>   is new in kernel 2.6.30, the i2c-i801 driver alone is unlikely to
>>>   cause it. This may, however, be a combination of something i2c-i801
>>>   does and something the pci subsystem does since kernel 2.6.30. For
>>>   this reason, I would still recommend a bisection if the problem can
>>>   be reliably reproduced. I know it takes time, but it is always
>>>   easier to fix a bug when we know which commit introduced it.
>>>
>>> * The i2c-i801 driver does _not_ make use of interrupts. It is
>>>   poll-based (I am not exactly proud of that, but that's the way it
>>>   is.)
>>>
>>>   #define ENABLE_INT9		0	/* set to 0x01 to enable - untested */
>>>
>>>   So I am very surprised to read that this driver would cause an IRQ
>>>   storm.
>>>
>>> * One thing the i2c-i801 driver does on the PCI device is:
>>>
>>>   err = pci_enable_device(dev);
>>>
>>>   I presume this is what causes the following message in dmesg:
>>>
>>>   i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, low) -> IRQ 23
>>>
>>>   Basically, even though the driver doesn't make use of interrupts,
>>>   the IRQ is still registered because this is how the hardware is
>>>   set up.
>>>
>>> As a conclusion, I suspect that 2 things may be happening: either
>>> the SMBus is triggering interrupts when told not to. The ICH6 is a
>>> bit different from all the other supported chips, I'll double check
>>>       
>
> My bad, it's a 63xxESB-based board, not ICH6. I must have been
> mixing in data from a different bug.
>
>   
>>> if we may have missed something. Or, something else is triggering
>>> SMBus transactions. SMI and ACPI come to mind. If this is the case
>>> then you do not want to use i2c-i801 on this motherboard.
>>>
>>> Questions to Alexander :
>>>
>>> * Can I please see the output of "sensors" on your system?
>>> * What are the brand and model of your motherboard?
>>> * Can we get an acpidump for your system?
>>>
>>>   
>>>       
>> many thanks for your response. i appreciate that.
>> first, the data you requested:
>>
>>     sensors:        http://xx.vu/~ahuemer/sensors-ahuemer-20091021.txt
>>     acpidump:       http://xx.vu/~ahuemer/acpidump-ahuemer-20091021.txt
>>     
>
> The good news is that I can't see any access to the SMBus in the
> ACPI tables. Nothing can be said about the SMIs though, without an
> intimate knowledge of the BIOS.
>
>   
>>     motherboard:    tyan tempest i5400pw/s5397 with one intel xeon e5420.
>>
>> the output of sensors was made _without_ i801_smbus in the kernel.
>>     
>
> Then please run it once again with i2c-i801 loaded. My whole point
> was to know whether there is any hardware monitoring chip connected
> to the SMBus. Your initial kernel configuration suggests that you
> have a W83793G chip there.
>
>   
>> i noticed that the data of w83627hf-isa-0290 is quite weird. i do not
>> have an explanation for that.
>>     
>
> I do. This happens when the manufacturer decides that the hardware
> monitoring features of the Super-I/O are insufficient for their
> needs. They add a dedicated chip for the hardware monitoring. This
> is particularly frequent on server boards from Tyan and SuperMicro.
> Ideally they would _also_ disable the feature on the Super-I/O side,
> but often they do not, so the driver still loads, but outputs
> garbage.
>
> You can see the following messages in your log:
> [    3.878703] w83627hf w83627hf.656: Enabling temp2, readings might not make sense
> [    3.881708] w83627hf w83627hf.656: Enabling temp3, readings might not make sense
> This is a good hint that this is the case (if the nonsensical data
> displayed by "sensors" wasn't enough to convince you.)
>
> So you should stop loading (or building in) the w83627hf kernel module.
>
>   
>> if a bisection is what will bring light into this, i am willing to take
>> the time.
>> so that would be a bisection between 2.6.29 and 2.6.30 ?
>> a quicker test case would be good for that, but i don't have one yet,
>> just the compilation of gcc, which takes time, even on this machine with
>> tmpfs and ccache.
>>     
>
>   
here is the output you requested:
http://xx.vu/~ahuemer/sensors_ahuemer_with_i801_20091026.txt
i am currently in the middle of a bisection between 2.6.29 and 2.6.30, 8
steps left.
many thanks for the info on hardware monitoring.
i'll report back when bisection is finished.

regards
-alex


end of thread

Thread overview: 33+ messages
2009-09-24 18:21 2.6.{30,31} x86_64 ahci problem - irq 23: nobody cared Alexander Huemer
2009-09-24 18:31 ` David Daney
2009-09-24 19:15   ` Alexander Huemer
2009-09-24 19:38     ` David Daney
2009-09-24 19:24 ` Frans Pop
2009-09-24 19:30   ` Alexander Huemer
2009-09-24 19:40   ` Frans Pop
2009-09-24 19:43     ` Alexander Huemer
2009-09-25  0:02     ` Alexander Huemer
2009-09-25 11:28       ` Alexander Huemer
2009-09-25 12:24         ` Frans Pop
2009-09-25 12:27           ` Alexander Huemer
2009-09-25 12:48             ` Frans Pop
2009-10-08 12:00               ` Alexander Huemer
2009-10-09 21:30                 ` Alexander Huemer
2009-10-10 13:13                 ` Frans Pop
2009-10-11 20:57                   ` Alexander Huemer
2009-10-12  7:49                   ` Tejun Heo
2009-10-12  9:48                     ` Frans Pop
2009-10-12  9:52                       ` Tejun Heo
2009-10-12  9:55                         ` Alexander Huemer
2009-10-12 10:07                           ` Tejun Heo
2009-10-12 10:11                             ` Alexander Huemer
2009-10-12 15:03                               ` Alexander Huemer
2009-10-12 17:28                                 ` Robert Hancock
2009-10-13  2:17                                 ` Tejun Heo
2009-10-13  6:49                                   ` Alexander Huemer
2009-10-13 12:35                                     ` Tejun Heo
2009-10-14 11:45                                       ` Jean Delvare
2009-10-21  8:38                                       ` Jean Delvare
2009-10-21 10:01                                         ` Alexander Huemer
2009-10-21 11:28                                           ` Jean Delvare
2009-10-26 15:01                                             ` Alexander Huemer
