* Re: [Bugme-new] [Bug 13617] New: GRO:__napi_complete from net_rx_action crash
       [not found] <bug-13617-10286@http.bugzilla.kernel.org/>
@ 2009-06-25 19:29 ` Andrew Morton
  2009-06-26 17:13   ` Dhananjay Phadke
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2009-06-25 19:29 UTC
  To: netdev; +Cc: bugzilla-daemon, bugme-daemon, amit, Dhananjay Phadke



(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).


netdev core crashed.  The netxen driver may be implicated.


Why did amit@netxen.com create this bug report?  Isn't Dhananjay
sitting in the next cube?  Perhaps you believe that the driver is OK
and that the bug lies in the netdev core?



On Thu, 25 Jun 2009 06:55:14 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=13617
> 
>            Summary: GRO:__napi_complete from net_rx_action crash
>            Product: Drivers
>            Version: 2.5
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Network
>         AssignedTo: drivers_network@kernel-bugs.osdl.org
>         ReportedBy: amit@netxen.com
>         Regression: No
> 
> 
> In net_rx_action, there is a check: if napi_disable_pending(n) is true,
> __napi_complete is called.
> __napi_complete contains BUG_ON(n->gro_list), which was hit in the bug
> dump below.
> Why is __napi_complete called from net_rx_action instead of napi_complete?
> napi_complete flushes the GRO list.
> 
> Code excerpt from net_rx_action:
> http://lxr.linux.no/linux+v2.6.30/net/core/dev.c#L2736
> 
>    if (unlikely(work == weight)) {
>            if (unlikely(napi_disable_pending(n)))
>                    __napi_complete(n);
>            else
>                    list_move_tail(&n->poll_list, list);
>    }
> 
> ------------[ cut here ]------------
> kernel BUG at net/core/dev.c:2672!
> invalid opcode: 0000 [#1] SMP 
> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
> CPU 2 
> Modules linked in: netxen_nic nfs lockd nfs_acl auth_rpcgss ipv6 deflate
> zlib_deflate ctr twofish twofish_common serpent blowfish des_generic cbc
> aes_x86_64 aes_generic xcbc sha256_generic md5 crypto_null af_key autofs4
> sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror
> dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc pci_slot
> battery acpi_memhotplug ac parport ipmi_devintf ide_cd_mod rtc_cmos bnx2 cdrom
> serio_raw ipmi_si rtc_core button ipmi_msghandler iTCO_wdt rtc_lib shpchp hpilo
> hpwdt i5000_edac pcspkr edac_core ata_piix libata sd_mod scsi_mod cciss ext3
> jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
> Pid: 0, comm: swapper Tainted: G        W  2.6.30 #1 ProLiant DL380 G5
> RIP: 0010:[<ffffffff8043b128>]  [<ffffffff8043b128>] __napi_complete+0x15/0x25
> RSP: 0018:ffff880028139eb0  EFLAGS: 00010086
> RAX: ffff88023d4056b8 RBX: ffff88023d4056a8 RCX: 0000000002202318
> RDX: 00000000001b0000 RSI: ffff880028139d98 RDI: ffff88023d4056a8
> RBP: 0000000000000080 R08: 0000000002200000 R09: 000006de15931680
> R10: ffffc20011a32318 R11: 0000000000000005 R12: 0000000000000000
> R13: ffff8800281440e0 R14: 0000000000000080 R15: 000000000000012c
> FS:  0000000000000000(0000) GS:ffff880028136000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 00000000008cb530 CR3: 000000023d9ab000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffff88023ed28000, task ffff88023ed27570)
> Stack:
>  ffff88023d4056a8 ffffffff8043ec9f 0000000000000001 000000010004f429
>  ffff88023d4056b8 0000000000000046 0000000000000001 0000000000000100
>  ffffffff8069a098 0000000000000018 000000000000000a ffffffff8023eba6
> Call Trace:
>  <IRQ> <0> [<ffffffff8043ec9f>] ? net_rx_action+0xf0/0x162
>  [<ffffffff8023eba6>] ? __do_softirq+0xa3/0x163
>  [<ffffffff8020ca7c>] ? call_softirq+0x1c/0x28
>  [<ffffffff8020dc1a>] ? do_softirq+0x2c/0x68
>  [<ffffffff8023eac6>] ? irq_exit+0x3f/0x7c
>  [<ffffffff8020d46b>] ? do_IRQ+0xa9/0xbf
>  [<ffffffff8020c353>] ? ret_from_intr+0x0/0xa
>  <EOI> <0> [<ffffffff80220e41>] ? hpet_legacy_next_event+0x0/0x7
>  [<ffffffff80386e2c>] ? acpi_hw_register_read+0x52/0xe5
>  [<ffffffff80394b2a>] ? acpi_idle_enter_simple+0x120/0x14e
>  [<ffffffff80394b20>] ? acpi_idle_enter_simple+0x116/0x14e
>  [<ffffffff8039486b>] ? acpi_idle_enter_bm+0xd5/0x274
>  [<ffffffff8041c020>] ? cpuidle_idle_call+0x7f/0xbb
>  [<ffffffff8020aaa5>] ? cpu_idle+0x4a/0x6d
> Code: 48 8d 43 70 48 39 c2 0f 18 0e 75 df 31 c9 41 58 5b 5d 48 89 c8 c3 53 f6 47
> 10 01 48 89 fb 75 04 0f 0b eb fe 48 83 7f 50 00 74 04 <0f> 0b eb fe e8 1f 10 f1
> ff f0 80 63 10 fe 5b c3 53 48 89 fb e8
> RIP  [<ffffffff8043b128>] __napi_complete+0x15/0x25
>  RSP <ffff880028139eb0>
> ---[ end trace 9c6b22b26aefd1b1 ]---
> Jun 23 23:41:27 dut4146 last message repeated 6 times
> Jun 23 23:41:32 dut4146 kernel: BUG: scheduling while atomic: swapper/0/0x10000100
> Jun 23 23:41:32 dut4146 kernel: Modules linked in: netxen_nic nfs lockd nfs_acl
> auth_rpcgss ipv6 deflate zlib_deflate ctr twofish twofish_common serpent
> blowfish des_generic cbc aes_x86_64 aes_generic xcbc sha256_generic md5
> crypto_null af_key autofs4 sunrpc iscsi_tcp libiscsi_tcp libiscsi
> scsi_transport_iscsi dm_mirror dm_region_hash dm_log dm_multipath dm_mod video
> output sbs sbshc pci_slot battery acpi_memhotplug ac parport ipmi_devintf
> ide_cd_mod rtc_cmos bnx2 cdrom serio_raw ipmi_si rtc_core button
> ipmi_msghandler iTCO_wdt rtc_lib shpchp
> Kernel panic - not syncing: Fatal exception in interrupt
> Pid: 0, comm: swapper Tainted: G      D W  2.6.30 #1
> Call Trace:
>  <IRQ>  [<ffffffff8023a3b5>] ? panic+0x86/0x134
>  [<ffffffff8020e348>] ? show_registers+0x211/0x21d
>  [<ffffffff8024f5ea>] ? up+0xe/0x36
>  [<ffffffff8023a9db>] ? release_console_sem+0x174/0x18e
>  [<ffffffff804bdd54>] ? oops_end+0xa0/0xad
>  [<ffffffff8020cf2c>] ? do_invalid_op+0x85/0x8f
>  [<ffffffff8043b128>] ? __napi_complete+0x15/0x25
>  [<ffffffffa03ebfe2>] ? netxen_nic_hw_write_wx_2M+0x24/0xa8 [netxen_nic]
>  [<ffffffffa03ef866>] ? netxen_process_rcv_ring+0x4eb/0x501 [netxen_nic]
>  [<ffffffff8020c715>] ? invalid_op+0x15/0x20
>  [<ffffffff8043b128>] ? __napi_complete+0x15/0x25
>  [<ffffffff8043ec9f>] ? net_rx_action+0xf0/0x162
>  [<ffffffff8023eba6>] ? __do_softirq+0xa3/0x163
>  [<ffffffff8020ca7c>] ? call_softirq+0x1c/0x28
>  [<ffffffff8020dc1a>] ? do_softirq+0x2c/0x68
>  [<ffffffff8023eac6>] ? irq_exit+0x3f/0x7c
>  [<ffffffff8020d46b>] ? do_IRQ+0xa9/0xbf
>  [<ffffffff8020c353>] ? ret_from_intr+0x0/0xa
>  <EOI>  [<ffffffff80220e41>] ? hpet_legacy_next_event+0x0/0x7
>  [<ffffffff80386e2c>] ? acpi_hw_register_read+0x52/0xe5
>  [<ffffffff80394b2a>] ? acpi_idle_enter_simple+0x120/0x14e
>  [<ffffffff80394b20>] ? acpi_idle_enter_simple+0x116/0x14e
>  [<ffffffff8039486b>] ? acpi_idle_enter_bm+0xd5/0x274
>  [<ffffffff8041c020>] ? cpuidle_idle_call+0x7f/0xbb
>  [<ffffffff8020aaa5>] ? cpu_idle+0x4a/0x6d
> 
> -- 
> Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug.


* Re: [Bugme-new] [Bug 13617] New: GRO:__napi_complete from net_rx_action crash
  2009-06-25 19:29 ` [Bugme-new] [Bug 13617] New: GRO:__napi_complete from net_rx_action crash Andrew Morton
@ 2009-06-26 17:13   ` Dhananjay Phadke
  2009-06-26 17:24     ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Dhananjay Phadke @ 2009-06-26 17:13 UTC
  To: Andrew Morton, bugme-daemon
  Cc: netdev, bugzilla-daemon, Amit Salecha, David Miller

mea culpa, likely driver can wait more for rx to drain
so that we race with napi disable.

However, I have a question for Dave: if the NAPI code is
forcing NAPI completion anyway, should it not also flush
the GRO flows? This code predates GRO.

-Dhananjay



* Re: [Bugme-new] [Bug 13617] New: GRO:__napi_complete from net_rx_action crash
  2009-06-26 17:13   ` Dhananjay Phadke
@ 2009-06-26 17:24     ` David Miller
  2009-06-27  1:49       ` Herbert Xu
  0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2009-06-26 17:24 UTC
  To: dhananjay.phadke
  Cc: akpm, bugme-daemon, netdev, bugzilla-daemon, amit.salecha, herbert

From: Dhananjay Phadke <dhananjay.phadke@qlogic.com>
Date: Fri, 26 Jun 2009 10:13:59 -0700

> mea culpa, likely driver can wait more for rx to drain
> so that we race with napi disable.
> 
> However, I have a question for Dave: if the NAPI code is
> forcing NAPI completion anyway, should it not also flush
> the GRO flows? This code predates GRO.

I think there are some reasons, but Herbert Xu is more likely
to remember than I am, CC:'d :-)



* Re: [Bugme-new] [Bug 13617] New: GRO:__napi_complete from net_rx_action crash
  2009-06-26 17:24     ` David Miller
@ 2009-06-27  1:49       ` Herbert Xu
  2009-06-27  2:28         ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Herbert Xu @ 2009-06-27  1:49 UTC
  To: David Miller
  Cc: dhananjay.phadke, akpm, bugme-daemon, netdev, bugzilla-daemon,
	amit.salecha

On Fri, Jun 26, 2009 at 10:24:58AM -0700, David Miller wrote:
>
> >>> In net_rx_action, there is a check: if napi_disable_pending(n) is true,
> >>> __napi_complete is called.
> >>> __napi_complete contains BUG_ON(n->gro_list), which was hit in the bug
> >>> dump below.
> >>> Why is __napi_complete called from net_rx_action instead of napi_complete?
> >>> napi_complete flushes the GRO list.

Indeed, it was an oversight.  Thanks for catching it!

gro: Flush GRO packets in napi_disable_pending path

When NAPI is disabled while we're in net_rx_action, we end up
calling __napi_complete without flushing GRO packets.  This is
a bug as it would cause the GRO packets to linger; of course, it
also literally BUGs to catch errors like this :)

This patch changes it to napi_complete, with the obligatory IRQ
reenabling.  This should be safe because we've only just disabled
IRQs and it does not materially affect the test conditions in
between.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/core/dev.c b/net/core/dev.c
index 60b5728..70c27e0 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2823,9 +2823,11 @@ static void net_rx_action(struct softirq_action *h)
 		 * move the instance around on the list at-will.
 		 */
 		if (unlikely(work == weight)) {
-			if (unlikely(napi_disable_pending(n)))
-				__napi_complete(n);
-			else
+			if (unlikely(napi_disable_pending(n))) {
+				local_irq_enable();
+				napi_complete(n);
+				local_irq_disable();
+			} else
 				list_move_tail(&n->poll_list, list);
 		}
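
The effect of the hunk above can be sketched in user space. This is a
rough model, not kernel code: every model_* name, the irqs_enabled flag
and the bug_hits counter are hypothetical stand-ins. It assumes, as the
report says, that napi_complete flushes the GRO list before calling
__napi_complete, and it further assumes that napi_complete saves and
restores the IRQ state around that call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct napi_struct; only what the model needs. */
struct model_napi {
	int gro_held;          /* packets held for merging (models n->gro_list) */
	int delivered;         /* packets handed up the stack */
	bool scheduled;        /* models NAPI_STATE_SCHED */
	bool disable_pending;  /* models napi_disable_pending(n) */
};

bool irqs_enabled = true;  /* models the local CPU interrupt flag */
int bug_hits = 0;          /* counts cases where BUG_ON(n->gro_list) would fire */

/* Models __napi_complete(): it does not flush GRO, it only checks. */
void model_napi_complete_raw(struct model_napi *n)
{
	assert(n->scheduled);
	if (n->gro_held != 0)
		bug_hits++;        /* the kernel would BUG here */
	n->scheduled = false;
}

/* Models napi_complete(): flush GRO first, then complete with IRQs off.
 * The save/restore of the IRQ flag is an assumption of this sketch. */
void model_napi_complete(struct model_napi *n)
{
	bool saved = irqs_enabled;

	n->delivered += n->gro_held;   /* models the GRO flush */
	n->gro_held = 0;
	irqs_enabled = false;          /* models local_irq_save() */
	model_napi_complete_raw(n);
	irqs_enabled = saved;          /* models local_irq_restore() */
}

/* Models the "work == weight" branch, entered with IRQs disabled.
 * patched selects the behaviour after the hunk above. */
void model_budget_exhausted(struct model_napi *n, bool patched)
{
	if (!n->disable_pending)
		return;                    /* the real code requeues n instead */
	if (patched) {
		irqs_enabled = true;       /* local_irq_enable() */
		model_napi_complete(n);
		irqs_enabled = false;      /* local_irq_disable() */
	} else {
		model_napi_complete_raw(n);
	}
}
```

With packets still held for GRO and a disable pending, the unpatched
path reaches the check with a non-empty list, while the patched path
flushes first and leaves the IRQ flag disabled again on return, as the
caller expects.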

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [Bugme-new] [Bug 13617] New: GRO:__napi_complete from net_rx_action crash
  2009-06-27  1:49       ` Herbert Xu
@ 2009-06-27  2:28         ` David Miller
  2009-06-27  3:33           ` Herbert Xu
  0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2009-06-27  2:28 UTC
  To: herbert
  Cc: dhananjay.phadke, akpm, bugme-daemon, netdev, bugzilla-daemon,
	amit.salecha

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sat, 27 Jun 2009 09:49:00 +0800

> On Fri, Jun 26, 2009 at 10:24:58AM -0700, David Miller wrote:
>>
>> >>> In net_rx_action, there is a check: if napi_disable_pending(n) is true,
>> >>> __napi_complete is called.
>> >>> __napi_complete contains BUG_ON(n->gro_list), which was hit in the bug
>> >>> dump below.
>> >>> Why is __napi_complete called from net_rx_action instead of napi_complete?
>> >>> napi_complete flushes the GRO list.
> 
> Indeed, it was an oversight.  Thanks for catching it!
> 
> gro: Flush GRO packets in napi_disable_pending path
> 
> When NAPI is disabled while we're in net_rx_action, we end up
> calling __napi_complete without flushing GRO packets.  This is
> a bug as it would cause the GRO packets to linger; of course, it
> also literally BUGs to catch errors like this :)
> 
> This patch changes it to napi_complete, with the obligatory IRQ
> reenabling.  This should be safe because we've only just disabled
> IRQs and it does not materially affect the test conditions in
> between.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied.

I remembered that change where we had to disable GRO in the
legacy RX path and all the IRQ disabling problems we ran into
there.  So I went and had a look at that to make sure we won't
have similar issues here, luckily it seems not.

Thanks!


* Re: [Bugme-new] [Bug 13617] New: GRO:__napi_complete from net_rx_action crash
  2009-06-27  2:28         ` David Miller
@ 2009-06-27  3:33           ` Herbert Xu
  0 siblings, 0 replies; 6+ messages in thread
From: Herbert Xu @ 2009-06-27  3:33 UTC
  To: David Miller
  Cc: dhananjay.phadke, akpm, bugme-daemon, netdev, bugzilla-daemon,
	amit.salecha

On Fri, Jun 26, 2009 at 07:28:04PM -0700, David Miller wrote:
>
> I remembered that change where we had to disable GRO in the
> legacy RX path and all the IRQ disabling problems we ran into
> there.  So I went and had a look at that to make sure we won't
> have similar issues here, luckily it seems not.

Yeah the netpoll lock is all we need since netpoll is the only
other thing that can race against this.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
