netdev.vger.kernel.org archive mirror
* Sudden kernel panic with skge in 3.3-rc2
@ 2012-02-02 19:21 Nick Bowler
  2012-02-02 20:45 ` Stephen Hemminger
  0 siblings, 1 reply; 7+ messages in thread
From: Nick Bowler @ 2012-02-02 19:21 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: Stephen Hemminger

Hi,

I just saw this panic on 3.3-rc2 with skge.  I don't know whether it's
reproducible yet -- the machine crashed while I was not actively using
it.  We've had this type of card for a few years and I've never seen this
before so it may be a regression, but admittedly we don't use them all
that often.

At the time of the crash, the network interface in question was up, but
not configured with any addresses; mtu configured to (the default) 1500
bytes.  It was used for packet capture (tcpdump) shortly beforehand,
although this was not running at the time of the crash.  It's a PCI
gigabit ethernet card:

  03:01.0 Ethernet controller: D-Link System Inc DGE-530T Gigabit Ethernet Adapter (rev 11)

In case it matters, the primary network interface of the system is an
onboard device using the sky2 driver:

  02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8057 PCI-E Gigabit Ethernet Controller (rev 10)

I took a photo of the screen, but unfortunately the top few lines are
lost forever.  Posted at imgur because it seems a bit big for the
mailing lists: <http://i.imgur.com/xklGU.jpg>. Apologies for the
terrible quality; the best camera I could find around was in someone's
mobile phone.  I'll see about configuring a serial console in case it
crashes again.

For convenience (hopefully), here's the call trace retyped in plain text
(addresses elided, see photo for them).  The full code line is
reproduced by hand because it's almost unreadable in the photo (and
truncated to boot).

  Call Trace:
   <IRQ>
   [...] net_rx_action+0xaa/0x1c0
   [...] __do_softirq+0x7e/0x125
   [...] ? _raw_spin_unlock+0x26/0x31
   [...] call_softirq+0x1c/0x30
   [...] do_softirq+0x33/0x68
   [...] irq_exit+0x3f/0xb9
   [...] do_IRQ+0x97/0xae
   [...] common_interrupt+0x6b/0x6b
   <EOI>
   [...] ? hrtimer_start+0x13/0x15
   [...] ? mwait_idle+0x6e/0x80
   [...] ? mwait_idle+0x61/0x80
   [...] cpu_idle+0x61/0xbd
   [...] rest_init+0x8d/0x91
   [...] start_kernel+0x338/0x343
   [...] x86_64_start_reservations+0xb8/0xbd
   [...] x86_64_start_kernel+0xed/0xf4
  Code: 48 8b 40 30 48 85 c0 74 0a b9 02 00 00 00 4c 89 fa ff d0 49 8b 86 d0 00 00 00 49 8b 4d b4 48 89 c7 48 8b b2 d0 00 00 00 <f3> a4 31 ff 48 8b 03 49 8b 75 18 48 8b 40 08 48 85 c0 74 13 48
  RIP  [...] skge_poll+0x367/0x5cd [skge]
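
The Code: line and the RIP symbol can be read mechanically. The Python sketch below finds the byte that the oops wraps in angle brackets (the first byte of the faulting instruction) and splits the symbol into function, offset, function length and module. The helper names are made up for illustration. The only opcode knowledge assumed is that `f3 a4` encodes `rep movsb` on x86, a byte-copy loop that compilers may emit for an inlined memcpy.

```python
import re

# Code: line as retyped in this message; <f3> marks the faulting byte.
CODE_LINE = (
    "48 8b 40 30 48 85 c0 74 0a b9 02 00 00 00 4c 89 fa ff d0 49 8b 86 d0 "
    "00 00 00 49 8b 4d b4 48 89 c7 48 8b b2 d0 00 00 00 <f3> a4 31 ff 48 8b "
    "03 49 8b 75 18 48 8b 40 08 48 85 c0 74 13 48"
)

# Tiny lookup table for this example only, not a disassembler.
KNOWN_OPCODES = {(0xF3, 0xA4): "rep movsb"}


def faulting_bytes(code_line, count=2):
    """Return `count` opcode bytes starting at the <xx> marker."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            marked = [tok.strip("<>")] + tokens[i + 1:i + count]
            return [int(b, 16) for b in marked]
    raise ValueError("no <..> marker found in Code: line")


def parse_symbol(sym):
    """Split 'func+0xOFF/0xLEN [module]' into (func, offset, length, module)."""
    m = re.match(r"([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?:\s+\[(\w+)\])?", sym)
    if m is None:
        raise ValueError("unrecognised symbol: %r" % sym)
    func, off, length, module = m.groups()
    return func, int(off, 16), int(length, 16), module


opcode = tuple(faulting_bytes(CODE_LINE))
print(KNOWN_OPCODES.get(opcode, "unknown"))
print(parse_symbol("skge_poll+0x367/0x5cd [skge]"))
```

The offset/length pair says the fault is 0x367 bytes into a 0x5cd-byte skge_poll, which is the number to feed to a disassembly of skge.ko when looking for the source line.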

Let me know if you need any more info,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Sudden kernel panic with skge in 3.3-rc2
  2012-02-02 19:21 Sudden kernel panic with skge in 3.3-rc2 Nick Bowler
@ 2012-02-02 20:45 ` Stephen Hemminger
  2012-02-03 19:28   ` Nick Bowler
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Stephen Hemminger @ 2012-02-02 20:45 UTC (permalink / raw)
  To: Nick Bowler; +Cc: netdev, linux-kernel

On Thu, 2 Feb 2012 14:21:15 -0500
Nick Bowler <nbowler@elliptictech.com> wrote:

> Hi,
> 
> I just saw this panic on 3.3-rc2 with skge.  I don't know whether it's
> reproducible yet -- the machine crashed while I was not actively using
> it.  We've had this type of card for a few years and I've never seen this
> before so it may be a regression, but admittedly we don't use them all
> that often.
> 
> At the time of the crash, the network interface in question was up, but
> not configured with any addresses; mtu configured to (the default) 1500
> bytes.  It was used for packet capture (tcpdump) shortly beforehand,
> although this was not running at the time of the crash.  It's a PCI
> gigabit ethernet card:
> 
>   03:01.0 Ethernet controller: D-Link System Inc DGE-530T Gigabit Ethernet Adapter (rev 11)
> 
> In case it matters, the primary network interface of the system is an
> onboard device using the sky2 driver:
> 
>   02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8057 PCI-E Gigabit Ethernet Controller (rev 10)
> 
> I took a photo of the screen, but unfortunately the top few lines are
> lost forever.  Posted at imgur because it seems a bit big for the
> mailing lists: <http://i.imgur.com/xklGU.jpg>. Apologies for the
> terrible quality; the best camera I could find around was in someone's
> mobile phone.  I'll see about configuring a serial console in case it
> crashes again.
> 
> For convenience (hopefully), here's the call trace retyped in plain text
> (addresses elided, see photo for them).  The full code line is
> reproduced by hand because it's almost unreadable in the photo (and
> truncated to boot).
> 
>   Call Trace:
>    <IRQ>
>    [...] net_rx_action+0xaa/0x1c0
>    [...] __do_softirq+0x7e/0x125
>    [...] ? _raw_spin_unlock+0x26/0x31
>    [...] call_softirq+0x1c/0x30
>    [...] do_softirq+0x33/0x68
>    [...] irq_exit+0x3f/0xb9
>    [...] do_IRQ+0x97/0xae
>    [...] common_interrupt+0x6b/0x6b
>    <EOI>
>    [...] ? hrtimer_start+0x13/0x15
>    [...] ? mwait_idle+0x6e/0x80
>    [...] ? mwait_idle+0x61/0x80
>    [...] cpu_idle+0x61/0xbd
>    [...] rest_init+0x8d/0x91
>    [...] start_kernel+0x338/0x343
>    [...] x86_64_start_reservations+0xb8/0xbd
>    [...] x86_64_start_kernel+0xed/0xf4
>   Code: 48 8b 40 30 48 85 c0 74 0a b9 02 00 00 00 4c 89 fa ff d0 49 8b 86 d0 00 00 00 49 8b 4d b4 48 89 c7 48 8b b2 d0 00 00 00 <f3> a4 31 ff 48 8b 03 49 8b 75 18 48 8b 40 08 48 85 c0 74 13 48
>   RIP  [...] skge_poll+0x367/0x5cd [skge]
> 
> Let me know if you need any more info,

Try reverting this commit, it seems problematic
commit d0249e44432aa0ffcf710b64449b8eaa3722547e
Author: stephen hemminger <shemminger@vyatta.com>
Date:   Thu Jan 19 14:37:18 2012 +0000

    skge: check for PCI dma mapping errors


* Re: Sudden kernel panic with skge in 3.3-rc2
  2012-02-02 20:45 ` Stephen Hemminger
@ 2012-02-03 19:28   ` Nick Bowler
  2012-02-08 16:32     ` Nick Bowler
  2012-02-03 22:05   ` Guillaume Chazarain
  2012-02-07  0:11   ` Paul Gortmaker
  2 siblings, 1 reply; 7+ messages in thread
From: Nick Bowler @ 2012-02-03 19:28 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, linux-kernel

Hi Stephen,

On 2012-02-02 12:45 -0800, Stephen Hemminger wrote:
> On Thu, 2 Feb 2012 14:21:15 -0500
> Nick Bowler <nbowler@elliptictech.com> wrote:
> > I just saw this panic on 3.3-rc2 with skge.  I don't know whether it's
> > reproducible yet -- the machine crashed while I was not actively using
> > it.  We've had this type of card for a few years and I've never seen this
> > before so it may be a regression, but admittedly we don't use them all
> > that often.
[...]
> 
> Try reverting this commit, it seems problematic
> commit d0249e44432aa0ffcf710b64449b8eaa3722547e
> Author: stephen hemminger <shemminger@vyatta.com>
> Date:   Thu Jan 19 14:37:18 2012 +0000
> 
>     skge: check for PCI dma mapping errors

Thanks for the pointer, I'll try that.  Unfortunately some other stuff
has come up so I probably won't be able to test it until next week.

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


* Re: Sudden kernel panic with skge in 3.3-rc2
  2012-02-02 20:45 ` Stephen Hemminger
  2012-02-03 19:28   ` Nick Bowler
@ 2012-02-03 22:05   ` Guillaume Chazarain
  2012-02-07  0:11   ` Paul Gortmaker
  2 siblings, 0 replies; 7+ messages in thread
From: Guillaume Chazarain @ 2012-02-03 22:05 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Nick Bowler, netdev, linux-kernel

On Thu, Feb 2, 2012 at 9:45 PM, Stephen Hemminger <shemminger@vyatta.com> wrote:
> Try reverting this commit, it seems problematic
> commit d0249e44432aa0ffcf710b64449b8eaa3722547e

Reverting this commit fixed skge for me. The NIC stopped working after
a kernel upgrade with nothing interesting in dmesg:

skge: 1.14 addr 0xfbffc000 irq 18 chip Yukon-Lite rev 9
skge 0000:03:00.0: eth0: addr 00:15:f2:46:d3:3b
skge 0000:03:00.0: eth0: enabling interface
ADDRCONF(NETDEV_UP): eth0: link is not ready
skge 0000:03:00.0: eth0: Link is up at 100 Mbps, full duplex, flow control both
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
eth0: no IPv6 routers present

NetworkManager would Oops during shutdown: http://i.imgur.com/eAFgM.jpg

-- 
Guillaume


* Re: Sudden kernel panic with skge in 3.3-rc2
  2012-02-02 20:45 ` Stephen Hemminger
  2012-02-03 19:28   ` Nick Bowler
  2012-02-03 22:05   ` Guillaume Chazarain
@ 2012-02-07  0:11   ` Paul Gortmaker
  2012-02-07  0:17     ` Stephen Hemminger
  2 siblings, 1 reply; 7+ messages in thread
From: Paul Gortmaker @ 2012-02-07  0:11 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Nick Bowler, netdev, linux-kernel

On Thu, Feb 2, 2012 at 3:45 PM, Stephen Hemminger <shemminger@vyatta.com> wrote:

[...]

>
> Try reverting this commit, it seems problematic
> commit d0249e44432aa0ffcf710b64449b8eaa3722547e
> Author: stephen hemminger <shemminger@vyatta.com>
> Date:   Thu Jan 19 14:37:18 2012 +0000
>
>    skge: check for PCI dma mapping errors
>

I'm seeing similar issues, and a revert of the above caused the
problems to go away.  I'm testing on a baseline of net-next
as of today (3238a9be4d7a) plus some TIPC patches I was
trying to test (which are 99.9% unrelated to this, I'm sure).

Details captured from serial console are below.  100% reproducible.

I can probably try a test/debug patch for you if need be.

Paul.

---

00:09.0 Ethernet controller: 3Com Corporation 3c940 10/100/1000Base-T [Marvell] (rev 12)
(right on motherboard, older AMD platform with NVIDIA chipset)

[    1.698965] skge 0000:00:09.0: PCI: Disallowing DAC for device
[    1.704861] skge: 1.14 addr 0xef000000 irq 18 chip Yukon rev 1
[    1.711171] skge 0000:00:09.0: eth0: addr 00:0e:a6:71:ed:b4

These hw csum failures repeat at regular intervals:

[  162.830840] eth0: hw csum failure
[  162.831829] Pid: 0, comm: swapper/0 Not tainted 3.3.0-rc1+ #5
[  162.831829] Call Trace:
[  162.831829]  [<c16912bf>] ? printk+0x18/0x1a
[  162.831829]  [<c1515607>] netdev_rx_csum_fault+0x37/0x40
[  162.831829]  [<c1510dff>] __skb_checksum_complete_head+0x5f/0x70
[  162.831829]  [<c1510e1b>] __skb_checksum_complete+0xb/0x10
[  162.831829]  [<c1593332>] nf_ip_checksum+0x62/0x130
[  162.831829]  [<c15455d7>] udp_error+0xa7/0x260
[  162.831829]  [<c1598f27>] ? ipt_do_table+0x1e7/0x370
[  162.831829]  [<c1545530>] ? udp_print_tuple+0x40/0x40
[  162.831829]  [<c1540cf0>] nf_conntrack_in+0xc0/0x5f0
[  162.831829]  [<c1599955>] ? nf_nat_rule_find+0x85/0xa0
[  162.831829]  [<c1551a38>] ? ip_route_input_common+0x368/0xb20
[  162.831829]  [<c153fe69>] ? nf_conntrack_free+0x49/0x60
[  162.831829]  [<c153fe69>] ? nf_conntrack_free+0x49/0x60
[  162.831829]  [<c15538f0>] ? inet_del_protocol+0x30/0x30
[  162.831829]  [<c159432e>] ipv4_conntrack_in+0x1e/0x30
[  162.831829]  [<c153d1f3>] nf_iterate+0x63/0x90
[  162.831829]  [<c15538f0>] ? inet_del_protocol+0x30/0x30
[  162.831829]  [<c153d27a>] nf_hook_slow+0x5a/0x110
[  162.831829]  [<c15538f0>] ? inet_del_protocol+0x30/0x30
[  162.831829]  [<c1554265>] ip_rcv+0x235/0x310
[  162.831829]  [<c15538f0>] ? inet_del_protocol+0x30/0x30
[  162.831829]  [<c1517887>] __netif_receive_skb+0x477/0x530
[  162.831829]  [<c1518d22>] netif_receive_skb+0x22/0x80
[  162.831829]  [<c10077b8>] ? nommu_map_page+0x38/0x70
[  162.831829]  [<c1518ea7>] napi_skb_finish+0x37/0x50
[  162.831829]  [<c151937b>] napi_gro_receive+0xbb/0xd0
[  162.831829]  [<c13edc41>] skge_poll+0x381/0x690
[  162.831829]  [<c141d7e1>] ? usb_hcd_poll_rh_status+0xf1/0x120
[  162.831829]  [<c100a27d>] ? save_i387_fxsave+0x3d/0xa0
[  162.831829]  [<c151950d>] net_rx_action+0xed/0x1d0
[  162.831829]  [<c141deb0>] ? usb_add_hcd+0x6a0/0x6a0
[  162.831829]  [<c1034196>] __do_softirq+0x86/0x170
[  162.831829]  [<c1034110>] ? send_remote_softirq+0x30/0x30
[  162.831829]  <IRQ>  [<c103448e>] ? irq_exit+0x6e/0x90
[  162.831829]  [<c1004116>] ? do_IRQ+0x46/0xb0
[  162.831829]  [<c1034477>] ? irq_exit+0x57/0x90
[  162.831829]  [<c101ba64>] ? smp_apic_timer_interrupt+0x54/0x90
[  162.831829]  [<c169a2e9>] ? common_interrupt+0x29/0x30
[  162.831829]  [<c1009589>] ? default_idle+0x69/0x160
[  162.831829]  [<c100190f>] ? cpu_idle+0x5f/0xa0
[  162.831829]  [<c16729c8>] ? rest_init+0x58/0x60
[  162.831829]  [<c19136c5>] ? start_kernel+0x2db/0x2e1
[  162.831829]  [<c1913172>] ? loglevel+0x2b/0x2b
[  162.831829]  [<c1913075>] ? i386_start_kernel+0x75/0x79

root@asus-a7v600:~# cat /proc/net/dev
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:      88       1    0    0    0     0          0         0       88       1    0    0    0     0       0          0
  sit0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
  eth0:  641588    6994    0    0    0     0          0      6957     8544      47    0    0    0     0       0          0
root@asus-a7v600:~#
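
The counters in that dump can be pulled out with a few lines of Python. This is a sketch, assuming the usual /proc/net/dev layout of two header lines followed by one `iface:` line per device carrying eight receive and eight transmit counters. The function name and the sample string are illustrative; the sample is the eth0 line from this capture.

```python
RX_FIELDS = ["bytes", "packets", "errs", "drop",
             "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop",
             "fifo", "colls", "carrier", "compressed"]

SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0:  641588    6994    0    0    0     0          0      6957     8544      47    0    0    0     0       0          0
"""


def parse_net_dev(text):
    """Return {iface: {"rx": {...}, "tx": {...}}} from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        values = [int(v) for v in rest.split()]
        if len(values) != 16:
            continue  # not the layout this sketch assumes
        stats[name.strip()] = {
            "rx": dict(zip(RX_FIELDS, values[:8])),
            "tx": dict(zip(TX_FIELDS, values[8:])),
        }
    return stats


eth0 = parse_net_dev(SAMPLE)["eth0"]
print(eth0["rx"]["packets"], eth0["rx"]["multicast"], eth0["rx"]["errs"])
```

On a live system the same function can be fed the contents of /proc/net/dev directly. In this capture eth0 reports 6957 multicast frames out of 6994 received and zero receive errors, so the checksum faults do not show up in the errs counter.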

This happens when I reboot it:

[ OK ] processes ended within 1 seconds....
 * Deconfiguring network interfaces...        [  402.315402] BUG: unable to handle kernel NULL pointer dereference at 00000c78
[  402.316001] IP: [<c10c6c40>] pagevec_move_tail+0x30/0x30
[  402.316001] *pde = 00000000
[  402.316001] Oops: 0000 [#1] SMP
[  402.316001] Modules linked in:
[  402.316001]
[  402.316001] Pid: 4201, comm: ip Not tainted 3.3.0-rc1+ #2 System Manufacturer System Name/A7V600
[  402.316001] EIP: 0060:[<c10c6c40>] EFLAGS: 00010202 CPU: 0
[  402.316001] EIP is at put_page+0x0/0x40
[  402.316001] EAX: 00000c78 EBX: 00000001 ECX: f42ca640 EDX: 00000001
[  402.316001] ESI: f4164000 EDI: f4ff27e0 EBP: f419ba4c ESP: f419ba40
[  402.316001]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  402.316001] Process ip (pid: 4201, ti=f419a000 task=f6df44e0 task.ti=f419a000)
[  402.316001] Stack:
[  402.316001]  c14c0c84 f4164000 f4164000 f419ba58 c14c0cf2 f4d5c000 f419ba68 c14c0d96
[  402.316001]  f4d5c000 00000000 f419ba88 c13a0b3c 00000aa8 f4d72000 f4d72488 00000000
[  402.316001]  00000001 00000001 f419bab4 c13a2426 00001800 c17fea66 00000000 f4d72400
[  402.316001] Call Trace:
[  402.316001]  [<c14c0c84>] ? skb_release_data+0x54/0xb0
[  402.316001]  [<c14c0cf2>] __kfree_skb+0x12/0x90
[  402.316001]  [<c14c0d96>] consume_skb+0x26/0x60
[  402.316001]  [<c13a0b3c>] skge_rx_clean.clone.77+0x5c/0x80
[  402.316001]  [<c13a2426>] skge_down+0x3d6/0x4f0
[  402.316001]  [<c14c9f49>] __dev_close_many+0x69/0xb0
[  402.316001]  [<c139ee38>] ? skge_set_multicast+0x8/0x10
[  402.316001]  [<c14c9faf>] __dev_close+0x1f/0x30
[  402.316001]  [<c14ceaad>] __dev_change_flags+0x7d/0x150
[  402.316001]  [<c14cec1e>] dev_change_flags+0x1e/0x60
[  402.316001]  [<c14d9e37>] do_setlink+0x177/0x900
[  402.316001]  [<c122885f>] ? nla_parse+0x1f/0xa0
[  402.316001]  [<c10e1a54>] ? page_add_new_anon_rmap+0x74/0x90
[  402.316001]  [<c14daf19>] rtnl_newlink+0x359/0x530
[  402.316001]  [<c11d02fe>] ? selinux_capable+0x2e/0x40
[  402.316001]  [<c1037200>] ? sys_sysctl+0x100/0x1a0
[  402.316001]  [<c14da820>] rtnetlink_rcv_msg+0x140/0x290
[  402.316001]  [<c10eff24>] ? kmem_cache_alloc+0x24/0x100
[  402.316001]  [<c14c0cc0>] ? skb_release_data+0x90/0xb0
[  402.316001]  [<c14dabc0>] ? rtnl_configure_link+0x80/0x80
[  402.316001]  [<c14da6e0>] ? __rtnl_unlock+0x10/0x10
[  402.316001]  [<c14ef5ae>] netlink_rcv_skb+0x8e/0xb0
[  402.316001]  [<c14d8dd7>] rtnetlink_rcv+0x17/0x20
[  402.316001]  [<c14ef045>] netlink_unicast+0x175/0x1c0
[  402.316001]  [<c14ef271>] netlink_sendmsg+0x1e1/0x2e0
[  402.316001]  [<c14bb03f>] sock_sendmsg+0xdf/0x110
[  402.316001]  [<c1028f0e>] ? __kmap_atomic+0xe/0x10
[  402.316001]  [<c10c2470>] ? get_page_from_freelist+0x250/0x4a0
[  402.316001]  [<c121ca3f>] ? _copy_from_user+0x3f/0x60
[  402.316001]  [<c14c4903>] ? verify_iovec+0x53/0xb0
[  402.316001]  [<c14bb36d>] __sys_sendmsg+0x2ad/0x2c0
[  402.316001]  [<c10bc64d>] ? unlock_page+0x3d/0x40
[  402.316001]  [<c10d8cc8>] ? __do_fault+0x368/0x460
[  402.316001]  [<c10dafe0>] ? handle_pte_fault+0x80/0x690
[  402.316001]  [<c1227ef5>] ? __percpu_counter_add+0x75/0xa0
[  402.316001]  [<c10db693>] ? handle_mm_fault+0xa3/0x130
[  402.316001]  [<c14ba1d4>] ? sockfd_lookup_light+0x24/0x80
[  402.316001]  [<c14bc336>] sys_sendmsg+0x36/0x60
[  402.316001]  [<c14bc82b>] sys_socketcall+0xfb/0x2c0
[  402.316001]  [<c164da4c>] sysenter_do_call+0x12/0x22


* Re: Sudden kernel panic with skge in 3.3-rc2
  2012-02-07  0:11   ` Paul Gortmaker
@ 2012-02-07  0:17     ` Stephen Hemminger
  0 siblings, 0 replies; 7+ messages in thread
From: Stephen Hemminger @ 2012-02-07  0:17 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: Nick Bowler, netdev, linux-kernel

On Mon, 6 Feb 2012 19:11:27 -0500
Paul Gortmaker <paul.gortmaker@windriver.com> wrote:

> On Thu, Feb 2, 2012 at 3:45 PM, Stephen Hemminger <shemminger@vyatta.com> wrote:
> 
> [...]
> 
> >
> > Try reverting this commit, it seems problematic
> > commit d0249e44432aa0ffcf710b64449b8eaa3722547e
> > Author: stephen hemminger <shemminger@vyatta.com>
> > Date:   Thu Jan 19 14:37:18 2012 +0000
> >
> >    skge: check for PCI dma mapping errors
> >
> 
> I'm seeing similar issues, and a revert of the above caused the
> problems to go away.  I'm testing on a baseline of net-next
> as of today (3238a9be4d7a) plus some TIPC patches I was
> trying to test (which are 99.9% unrelated to this, I'm sure).
> 
> Details captured from serial console are below.  100% reproducible.
> 
> I can probably try a test/debug patch for you if need be.
> 
> Paul.

There is a simple bug in the cleanup code reordering. And it is
reproducible here. Working on a better solution.


* Re: Sudden kernel panic with skge in 3.3-rc2
  2012-02-03 19:28   ` Nick Bowler
@ 2012-02-08 16:32     ` Nick Bowler
  0 siblings, 0 replies; 7+ messages in thread
From: Nick Bowler @ 2012-02-08 16:32 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, linux-kernel

On 2012-02-03 14:28 -0500, Nick Bowler wrote:
> On 2012-02-02 12:45 -0800, Stephen Hemminger wrote:
> > On Thu, 2 Feb 2012 14:21:15 -0500
> > Nick Bowler <nbowler@elliptictech.com> wrote:
> > > I just saw this panic on 3.3-rc2 with skge.  I don't know whether it's
> > > reproducible yet -- the machine crashed while I was not actively using
> > > it.  We've had this type of card for a few years and I've never seen this
> > > before so it may be a regression, but admittedly we don't use them all
> > > that often.
> [...]
> > 
> > Try reverting this commit, it seems problematic
> > commit d0249e44432aa0ffcf710b64449b8eaa3722547e
> > Author: stephen hemminger <shemminger@vyatta.com>
> > Date:   Thu Jan 19 14:37:18 2012 +0000
> > 
> >     skge: check for PCI dma mapping errors
> 
> Thanks for the pointer, I'll try that.  Unfortunately some other stuff
> has come up so I probably won't be able to test it until next week.

Just to confirm: I can reliably reproduce the crash and reverting that
commit fixes it.

For reference, I captured the full trace over serial console:

  skge 0000:03:01.0: eth1: enabling interface
  ADDRCONF(NETDEV_UP): eth1: link is not ready
  skge 0000:03:01.0: eth1: Link is up at 1000 Mbps, full duplex, flow control none
  ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
  device eth1 entered promiscuous mode
  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: [<ffffffffa001826e>] skge_poll+0x367/0x5cd [skge]
  PGD 0 
  Oops: 0000 [#1] PREEMPT SMP 
  CPU 0 
  Modules linked in: nfs lockd auth_rpcgss nfs_acl sunrpc autofs4 acpi_cpufreq mperf deflate zlib_deflate ctr aes_x86_64 aes_generic des_generic cbc sha512_generic sha256_generic sha1_ssse3 sha1_generic md5 hmac crypto_null af_key ipv6 loop snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm snd_seq snd_timer snd_seq_device snd soundcore skge snd_page_alloc sky2 evdev i2c_i801
  
  Pid: 10, comm: kworker/0:1 Not tainted 3.3.0-rc2+ #10 LENOVO 0841A5U/LENOVO
  RIP: 0010:[<ffffffffa001826e>]  [<ffffffffa001826e>] skge_poll+0x367/0x5cd [skge]
  RSP: 0018:ffff88007f403e00  EFLAGS: 00010246
  RAX: ffff880079e3bc40 RBX: ffff88007baf3600 RCX: 0000000000000046
  RDX: ffff88007bddaf00 RSI: 0000000000000000 RDI: ffff880079e3bc40
  RBP: ffff88007f403e70 R08: 0000000000000300 R09: ffffffff812d7e11
  R10: ffff880079eb7200 R11: ffff88007baf3600 R12: ffff88007baf3000
  R13: ffff88007ae98208 R14: ffff880079eb7200 R15: 0000000000000046
  FS:  0000000000000000(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000000 CR3: 000000007ae4d000 CR4: 00000000000406f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kworker/0:1 (pid: 10, threadinfo ffff88007bca8000, task ffff88007c87c260)
  Stack:
   ffff88007c870600 ffff88007f403e18 ffff88007c870600 897488007f403e58
   0000004600000040 ffff88007baf3600 0046000000000001 ffff88007baf3610
   0000000000000000 ffff88007baf3610 ffff88007f411380 0000000000000000
  Call Trace:
   <IRQ> 
   [<ffffffff812e432c>] net_rx_action+0xaa/0x1c0
   [<ffffffff8102fe91>] __do_softirq+0x7e/0x125
   [<ffffffff8135ecb7>] ? _raw_spin_unlock+0x26/0x31
   [<ffffffff8136092c>] call_softirq+0x1c/0x30
   [<ffffffff8100411b>] do_softirq+0x33/0x68
   [<ffffffff8102fc7f>] irq_exit+0x3f/0xb9
   [<ffffffff81003a20>] do_IRQ+0x97/0xae
   [<ffffffff8135f02b>] common_interrupt+0x6b/0x6b
   <EOI> 
   [<ffffffff8135ed02>] ? _raw_spin_unlock_irq+0xd/0x32
   [<ffffffff8103ea56>] worker_thread+0x24b/0x255
   [<ffffffff8103e80b>] ? manage_workers+0x190/0x190
   [<ffffffff81041f31>] kthread+0x84/0x8c
   [<ffffffff81360834>] kernel_thread_helper+0x4/0x10
   [<ffffffff81041ead>] ? kthread_freezable_should_stop+0x6b/0x6b
   [<ffffffff81360830>] ? gs_change+0xb/0xb
  Code: 48 8b 40 30 48 85 c0 74 0a b9 02 00 00 00 4c 89 fa ff d0 49 8b 86 d0 00 00 00 49 8b 55 10 8b 4d b4 48 89 c7 48 8b b2 d0 00 00 00 <f3> a4 31 ff 48 8b 03 49 8b 75 18 48 8b 40 08 48 85 c0 74 13 48 
  RIP  [<ffffffffa001826e>] skge_poll+0x367/0x5cd [skge]
   RSP <ffff88007f403e00>
  CR2: 0000000000000000
  ---[ end trace 13c07164f6f205a2 ]---
  Kernel panic - not syncing: Fatal exception in interrupt
  panic occurred, switching back to text console
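
The register dump lines up with the marked instruction. On x86-64, `rep movsb` (opcode `f3 a4`) copies RCX bytes from the address in RSI to the address in RDI, so a read fault on its first byte should show CR2 equal to RSI. The Python sketch below checks that against values copied from this trace. The parsing helper is illustrative and not part of any kernel tooling.

```python
import re

# Register lines copied from the oops in this message.
OOPS_REGS = """\
RAX: ffff880079e3bc40 RBX: ffff88007baf3600 RCX: 0000000000000046
RDX: ffff88007bddaf00 RSI: 0000000000000000 RDI: ffff880079e3bc40
CR2: 0000000000000000 CR3: 000000007ae4d000 CR4: 00000000000406f0
"""


def parse_registers(text):
    """Map register names to integer values from 'NAME: 16-hex-digit' pairs."""
    pairs = re.findall(r"\b([A-Z0-9]{2,3}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


regs = parse_registers(OOPS_REGS)

# For rep movsb: RSI = source, RDI = destination, RCX = bytes still to copy.
source_is_fault_address = regs["RSI"] == regs["CR2"]
destination_untouched = regs["RDI"] == regs["RAX"]  # 48 89 c7 is mov %rax,%rdi
bytes_remaining = regs["RCX"]

print(source_is_fault_address, destination_untouched, bytes_remaining)
```

RSI and CR2 are both zero, RDI still equals RAX, and RCX still holds 0x46 (70), which is consistent with a 70-byte copy faulting on its first byte because the source pointer was NULL. Which pointer in skge_poll ended up NULL cannot be read from the dump alone.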

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


end of thread, other threads:[~2012-02-08 16:32 UTC | newest]

Thread overview: 7+ messages
2012-02-02 19:21 Sudden kernel panic with skge in 3.3-rc2 Nick Bowler
2012-02-02 20:45 ` Stephen Hemminger
2012-02-03 19:28   ` Nick Bowler
2012-02-08 16:32     ` Nick Bowler
2012-02-03 22:05   ` Guillaume Chazarain
2012-02-07  0:11   ` Paul Gortmaker
2012-02-07  0:17     ` Stephen Hemminger
