All of lore.kernel.org
 help / color / mirror / Atom feed
* Panic in iwl3945 driver
@ 2009-11-25 16:49 Maxim Levitsky
  2009-11-27  0:00 ` Maxim Levitsky
  0 siblings, 1 reply; 20+ messages in thread
From: Maxim Levitsky @ 2009-11-25 16:49 UTC (permalink / raw)
  To: linux-wireless, iwlwifi maling list

Just captured a panic in iwl3945 driver.
Will investigate.

<1>[ 7290.414172] IP: [<ffffffffa0dd53a1>] iwl3945_rx_reply_tx+0xc1/0x450 [iwl3945]
<4>[ 7290.414205] PGD 0 
<1>[ 7290.414214] Thread overran stack, or stack corrupted
<0>[ 7290.414229] Oops: 0002 [#1] PREEMPT SMP 
<0>[ 7290.414246] last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
<4>[ 7290.414265] CPU 0 
<4>[ 7290.414274] Modules linked in: af_packet nfsd usb_storage usb_libusual cpufreq_powersave exportfs cpufreq_conservative iwl3945 nfs cpufreq_userspace snd_hda_codec_realtek acpi_cpufreq uvcvideo lockd iwlcore snd_hda_intel joydev coretemp nfs_acl videodev snd_hda_codec mac80211 v4l1_compat snd_hwdep sbp2 v4l2_compat_ioctl32 uhci_hcd psmouse auth_rpcgss ohci1394 cfg80211 ehci_hcd video ieee1394 snd_pcm serio_raw battery ac nvidia(P) usbcore output sunrpc evdev lirc_ene0100 snd_page_alloc rfkill tg3 libphy fuse lzo lzo_decompress lzo_compress
<6>[ 7290.414486] Pid: 0, comm: swapper Tainted: P           2.6.32-rc8-wl #213 Aspire 5720     
<6>[ 7290.414507] RIP: 0010:[<ffffffffa0dd53a1>]  [<ffffffffa0dd53a1>] iwl3945_rx_reply_tx+0xc1/0x450 [iwl3945]
<6>[ 7290.414541] RSP: 0018:ffff880002203d60  EFLAGS: 00010246
<6>[ 7290.414557] RAX: 000000000000004f RBX: ffff880064c11600 RCX: 0000000000000013
<6>[ 7290.414576] RDX: ffffffffa0ddcf20 RSI: ffff8800512b7008 RDI: 0000000000000038
<6>[ 7290.414596] RBP: ffff880002203dd0 R08: 0000000000000000 R09: 0000000000000100
<6>[ 7290.414616] R10: 0000000000000001 R11: 0000000000000000 R12: 00000000000000a0
<6>[ 7290.414635] R13: 0000000000000002 R14: 0000000000000013 R15: 0000000000020201
<6>[ 7290.414655] FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
<6>[ 7290.414677] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<6>[ 7290.414693] CR2: 0000000000000041 CR3: 0000000001001000 CR4: 00000000000006f0
<6>[ 7290.414712] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<6>[ 7290.414732] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[ 7290.414752] Process swapper (pid: 0, threadinfo ffffffff81524000, task ffffffff81528b60)
<0>[ 7290.414772] Stack:
<4>[ 7290.414780]  ffff880002203da0 0000000000000046 0000000000000000 0000000000000046
<4>[ 7290.414804] <0> 0000000000000282 0000000000000282 0000000000000282 ffff880064c12010
<4>[ 7290.414830] <0> ffff880002203db0 ffff880064c11600 ffff880064c12e50 ffff8800512b7000
<0>[ 7290.414858] Call Trace:
<0>[ 7290.414867]  <IRQ> 
<4>[ 7290.414884]  [<ffffffffa0dc8c47>] iwl3945_irq_tasklet+0x657/0x1740 [iwl3945]
<4>[ 7290.414910]  [<ffffffff8138fc60>] ? _spin_unlock+0x30/0x60
<4>[ 7290.414931]  [<ffffffff81049a21>] tasklet_action+0x101/0x110
<4>[ 7290.414950]  [<ffffffff8104a3d0>] __do_softirq+0xc0/0x160
<4>[ 7290.414968]  [<ffffffff8100d01c>] call_softirq+0x1c/0x30
<4>[ 7290.414986]  [<ffffffff8100eff5>] do_softirq+0x75/0xb0
<4>[ 7290.415003]  [<ffffffff81049ee5>] irq_exit+0x95/0xa0
<4>[ 7290.415020]  [<ffffffff8100e547>] do_IRQ+0x77/0xf0
<4>[ 7290.415038]  [<ffffffff8100c7d3>] ret_from_intr+0x0/0xf
<0>[ 7290.415052]  <EOI> 
<4>[ 7290.415067]  [<ffffffff81234efa>] ? acpi_idle_enter_bm+0x270/0x2a5
<4>[ 7290.415087]  [<ffffffff81234f04>] ? acpi_idle_enter_bm+0x27a/0x2a5
<4>[ 7290.415107]  [<ffffffff81234efa>] ? acpi_idle_enter_bm+0x270/0x2a5
<4>[ 7290.415130]  [<ffffffff812c11f3>] ? cpuidle_idle_call+0x93/0xf0
<4>[ 7290.415149]  [<ffffffff8100b0d7>] ? cpu_idle+0xa7/0x110
<4>[ 7290.415168]  [<ffffffff8137b3d5>] ? rest_init+0x75/0x80
<4>[ 7290.415187]  [<ffffffff8158cd0a>] ? start_kernel+0x3a7/0x3b3
<4>[ 7290.415206]  [<ffffffff8158c315>] ? x86_64_start_reservations+0x125/0x129
<4>[ 7290.415227]  [<ffffffff8158c3fd>] ? x86_64_start_kernel+0xe4/0xeb
<0>[ 7290.415243] Code: 00 41 39 ce 0f 8d e8 01 00 00 48 8b 47 40 48 63 d2 48 69 d2 98 00 00 00 4c 8b 04 02 48 c7 c2 20 cf dd a0 49 8d 78 38 49 8d 40 4f <c6> 47 09 00 c6 47 0c 00 c6 47 0f 00 c6 47 12 00 c6 47 15 00 49 
<1>[ 7290.415382] RIP  [<ffffffffa0dd53a1>] iwl3945_rx_reply_tx+0xc1/0x450 [iwl3945]
<4>[ 7290.415410]  RSP <ffff880002203d60>
<0>[ 7290.415421] CR2: 0000000000000041
<4>[ 7290.415436] ---[ end trace ec46807277caa515 ]---
<0>[ 7290.415450] Kernel panic - not syncing: Fatal exception in interrupt
<4>[ 7290.415468] Pid: 0, comm: swapper Tainted: P      D    2.6.32-rc8-wl #213
<4>[ 7290.415486] Call Trace:
<4>[ 7290.415495]  <IRQ>  [<ffffffff8138c040>] panic+0x7d/0x13a
<4>[ 7290.415519]  [<ffffffff8101071a>] oops_end+0xda/0xe0
<4>[ 7290.415538]  [<ffffffff8102e1ea>] no_context+0xea/0x250
<4>[ 7290.415557]  [<ffffffff81038991>] ? select_task_rq_fair+0x511/0x780
<4>[ 7290.415578]  [<ffffffff8102e475>] __bad_area_nosemaphore+0x125/0x1e0
<4>[ 7290.415597]  [<ffffffff81038d0c>] ? __enqueue_entity+0x7c/0x80
<4>[ 7290.415616]  [<ffffffff81039201>] ? enqueue_task_fair+0x111/0x150
<4>[ 7290.415636]  [<ffffffff8102e53e>] bad_area_nosemaphore+0xe/0x10
<4>[ 7290.415656]  [<ffffffff8102e8fa>] do_page_fault+0x26a/0x320
<4>[ 7290.415674]  [<ffffffff813905df>] page_fault+0x1f/0x30
<4>[ 7290.415697]  [<ffffffffa0dd53a1>] ? iwl3945_rx_reply_tx+0xc1/0x450 [iwl3945]
<4>[ 7290.415723]  [<ffffffffa0dc8c47>] iwl3945_irq_tasklet+0x657/0x1740 [iwl3945]
<4>[ 7290.415746]  [<ffffffff8138fc60>] ? _spin_unlock+0x30/0x60
<4>[ 7290.415764]  [<ffffffff81049a21>] tasklet_action+0x101/0x110
<4>[ 7290.415783]  [<ffffffff8104a3d0>] __do_softirq+0xc0/0x160
<4>[ 7290.415801]  [<ffffffff8100d01c>] call_softirq+0x1c/0x30
<4>[ 7290.415818]  [<ffffffff8100eff5>] do_softirq+0x75/0xb0
<4>[ 7290.415835]  [<ffffffff81049ee5>] irq_exit+0x95/0xa0
<4>[ 7290.415852]  [<ffffffff8100e547>] do_IRQ+0x77/0xf0
<4>[ 7290.415869]  [<ffffffff8100c7d3>] ret_from_intr+0x0/0xf
<4>[ 7290.415883]  <EOI>  [<ffffffff81234efa>] ? acpi_idle_enter_bm+0x270/0x2a5
<4>[ 7290.415911]  [<ffffffff81234f04>] ? acpi_idle_enter_bm+0x27a/0x2a5
<4>[ 7290.415931]  [<ffffffff81234efa>] ? acpi_idle_enter_bm+0x270/0x2a5
<4>[ 7290.415952]  [<ffffffff812c11f3>] ? cpuidle_idle_call+0x93/0xf0
<4>[ 7290.415971]  [<ffffffff8100b0d7>] ? cpu_idle+0xa7/0x110
<4>[ 7290.415989]  [<ffffffff8137b3d5>] ? rest_init+0x75/0x80
<4>[ 7290.416007]  [<ffffffff8158cd0a>] ? start_kernel+0x3a7/0x3b3
<4>[ 7290.416026]  [<ffffffff8158c315>] ? x86_64_start_reservations+0x125/0x129
<4>[ 7290.416047]  [<ffffffff8158c3fd>] ? x86_64_start_kernel+0xe4/0xeb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Panic in iwl3945 driver
  2009-11-25 16:49 Panic in iwl3945 driver Maxim Levitsky
@ 2009-11-27  0:00 ` Maxim Levitsky
  2009-11-30 15:55   ` Maxim Levitsky
  0 siblings, 1 reply; 20+ messages in thread
From: Maxim Levitsky @ 2009-11-27  0:00 UTC (permalink / raw)
  To: linux-wireless; +Cc: iwlwifi maling list

On Wed, 2009-11-25 at 18:49 +0200, Maxim Levitsky wrote: 
> Just captured a panic in iwl3945 driver.
> Will investigate.
> 
> <1>[ 7290.414172] IP: [<ffffffffa0dd53a1>] iwl3945_rx_reply_tx+0xc1/0x450 [iwl3945]
> <4>[ 7290.414205] PGD 0 
> <1>[ 7290.414214] Thread overran stack, or stack corrupted
> <0>[ 7290.414229] Oops: 0002 [#1] PREEMPT SMP 
> <0>[ 7290.414246] last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
> <4>[ 7290.414265] CPU 0 
> <4>[ 7290.414274] Modules linked in: af_packet nfsd usb_storage usb_libusual cpufreq_powersave exportfs cpufreq_conservative iwl3945 nfs cpufreq_userspace snd_hda_codec_realtek acpi_cpufreq uvcvideo lockd iwlcore snd_hda_intel joydev coretemp nfs_acl videodev snd_hda_codec mac80211 v4l1_compat snd_hwdep sbp2 v4l2_compat_ioctl32 uhci_hcd psmouse auth_rpcgss ohci1394 cfg80211 ehci_hcd video ieee1394 snd_pcm serio_raw battery ac nvidia(P) usbcore output sunrpc evdev lirc_ene0100 snd_page_alloc rfkill tg3 libphy fuse lzo lzo_decompress lzo_compress
> <6>[ 7290.414486] Pid: 0, comm: swapper Tainted: P           2.6.32-rc8-wl #213 Aspire 5720     
> <6>[ 7290.414507] RIP: 0010:[<ffffffffa0dd53a1>]  [<ffffffffa0dd53a1>] iwl3945_rx_reply_tx+0xc1/0x450 [iwl3945]
> <6>[ 7290.414541] RSP: 0018:ffff880002203d60  EFLAGS: 00010246
> <6>[ 7290.414557] RAX: 000000000000004f RBX: ffff880064c11600 RCX: 0000000000000013
> <6>[ 7290.414576] RDX: ffffffffa0ddcf20 RSI: ffff8800512b7008 RDI: 0000000000000038
> <6>[ 7290.414596] RBP: ffff880002203dd0 R08: 0000000000000000 R09: 0000000000000100
> <6>[ 7290.414616] R10: 0000000000000001 R11: 0000000000000000 R12: 00000000000000a0
> <6>[ 7290.414635] R13: 0000000000000002 R14: 0000000000000013 R15: 0000000000020201
> <6>[ 7290.414655] FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
> <6>[ 7290.414677] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> <6>[ 7290.414693] CR2: 0000000000000041 CR3: 0000000001001000 CR4: 00000000000006f0
> <6>[ 7290.414712] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> <6>[ 7290.414732] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> <4>[ 7290.414752] Process swapper (pid: 0, threadinfo ffffffff81524000, task ffffffff81528b60)
> <0>[ 7290.414772] Stack:
> <4>[ 7290.414780]  ffff880002203da0 0000000000000046 0000000000000000 0000000000000046
> <4>[ 7290.414804] <0> 0000000000000282 0000000000000282 0000000000000282 ffff880064c12010
> <4>[ 7290.414830] <0> ffff880002203db0 ffff880064c11600 ffff880064c12e50 ffff8800512b7000
> <0>[ 7290.414858] Call Trace:
> <0>[ 7290.414867]  <IRQ> 
> <4>[ 7290.414884]  [<ffffffffa0dc8c47>] iwl3945_irq_tasklet+0x657/0x1740 [iwl3945]
> <4>[ 7290.414910]  [<ffffffff8138fc60>] ? _spin_unlock+0x30/0x60
> <4>[ 7290.414931]  [<ffffffff81049a21>] tasklet_action+0x101/0x110
> <4>[ 7290.414950]  [<ffffffff8104a3d0>] __do_softirq+0xc0/0x160
> <4>[ 7290.414968]  [<ffffffff8100d01c>] call_softirq+0x1c/0x30
> <4>[ 7290.414986]  [<ffffffff8100eff5>] do_softirq+0x75/0xb0
> <4>[ 7290.415003]  [<ffffffff81049ee5>] irq_exit+0x95/0xa0
> <4>[ 7290.415020]  [<ffffffff8100e547>] do_IRQ+0x77/0xf0
> <4>[ 7290.415038]  [<ffffffff8100c7d3>] ret_from_intr+0x0/0xf
> <0>[ 7290.415052]  <EOI> 
> <4>[ 7290.415067]  [<ffffffff81234efa>] ? acpi_idle_enter_bm+0x270/0x2a5
> <4>[ 7290.415087]  [<ffffffff81234f04>] ? acpi_idle_enter_bm+0x27a/0x2a5
> <4>[ 7290.415107]  [<ffffffff81234efa>] ? acpi_idle_enter_bm+0x270/0x2a5
> <4>[ 7290.415130]  [<ffffffff812c11f3>] ? cpuidle_idle_call+0x93/0xf0
> <4>[ 7290.415149]  [<ffffffff8100b0d7>] ? cpu_idle+0xa7/0x110
> <4>[ 7290.415168]  [<ffffffff8137b3d5>] ? rest_init+0x75/0x80
> <4>[ 7290.415187]  [<ffffffff8158cd0a>] ? start_kernel+0x3a7/0x3b3
> <4>[ 7290.415206]  [<ffffffff8158c315>] ? x86_64_start_reservations+0x125/0x129
> <4>[ 7290.415227]  [<ffffffff8158c3fd>] ? x86_64_start_kernel+0xe4/0xeb
> <0>[ 7290.415243] Code: 00 41 39 ce 0f 8d e8 01 00 00 48 8b 47 40 48 63 d2 48 69 d2 98 00 00 00 4c 8b 04 02 48 c7 c2 20 cf dd a0 49 8d 78 38 49 8d 40 4f <c6> 47 09 00 c6 47 0c 00 c6 47 0f 00 c6 47 12 00 c6 47 15 00 49 
> <1>[ 7290.415382] RIP  [<ffffffffa0dd53a1>] iwl3945_rx_reply_tx+0xc1/0x450 [iwl3945]
> <4>[ 7290.415410]  RSP <ffff880002203d60>
> <0>[ 7290.415421] CR2: 0000000000000041
> <4>[ 7290.415436] ---[ end trace ec46807277caa515 ]---
> <0>[ 7290.415450] Kernel panic - not syncing: Fatal exception in interrupt
> <4>[ 7290.415468] Pid: 0, comm: swapper Tainted: P      D    2.6.32-rc8-wl #213
> <4>[ 7290.415486] Call Trace:
> <4>[ 7290.415495]  <IRQ>  [<ffffffff8138c040>] panic+0x7d/0x13a
> <4>[ 7290.415519]  [<ffffffff8101071a>] oops_end+0xda/0xe0
> <4>[ 7290.415538]  [<ffffffff8102e1ea>] no_context+0xea/0x250
> <4>[ 7290.415557]  [<ffffffff81038991>] ? select_task_rq_fair+0x511/0x780
> <4>[ 7290.415578]  [<ffffffff8102e475>] __bad_area_nosemaphore+0x125/0x1e0
> <4>[ 7290.415597]  [<ffffffff81038d0c>] ? __enqueue_entity+0x7c/0x80
> <4>[ 7290.415616]  [<ffffffff81039201>] ? enqueue_task_fair+0x111/0x150
> <4>[ 7290.415636]  [<ffffffff8102e53e>] bad_area_nosemaphore+0xe/0x10
> <4>[ 7290.415656]  [<ffffffff8102e8fa>] do_page_fault+0x26a/0x320
> <4>[ 7290.415674]  [<ffffffff813905df>] page_fault+0x1f/0x30
> <4>[ 7290.415697]  [<ffffffffa0dd53a1>] ? iwl3945_rx_reply_tx+0xc1/0x450 [iwl3945]
> <4>[ 7290.415723]  [<ffffffffa0dc8c47>] iwl3945_irq_tasklet+0x657/0x1740 [iwl3945]
> <4>[ 7290.415746]  [<ffffffff8138fc60>] ? _spin_unlock+0x30/0x60
> <4>[ 7290.415764]  [<ffffffff81049a21>] tasklet_action+0x101/0x110
> <4>[ 7290.415783]  [<ffffffff8104a3d0>] __do_softirq+0xc0/0x160
> <4>[ 7290.415801]  [<ffffffff8100d01c>] call_softirq+0x1c/0x30
> <4>[ 7290.415818]  [<ffffffff8100eff5>] do_softirq+0x75/0xb0
> <4>[ 7290.415835]  [<ffffffff81049ee5>] irq_exit+0x95/0xa0
> <4>[ 7290.415852]  [<ffffffff8100e547>] do_IRQ+0x77/0xf0
> <4>[ 7290.415869]  [<ffffffff8100c7d3>] ret_from_intr+0x0/0xf
> <4>[ 7290.415883]  <EOI>  [<ffffffff81234efa>] ? acpi_idle_enter_bm+0x270/0x2a5
> <4>[ 7290.415911]  [<ffffffff81234f04>] ? acpi_idle_enter_bm+0x27a/0x2a5
> <4>[ 7290.415931]  [<ffffffff81234efa>] ? acpi_idle_enter_bm+0x270/0x2a5
> <4>[ 7290.415952]  [<ffffffff812c11f3>] ? cpuidle_idle_call+0x93/0xf0
> <4>[ 7290.415971]  [<ffffffff8100b0d7>] ? cpu_idle+0xa7/0x110
> <4>[ 7290.415989]  [<ffffffff8137b3d5>] ? rest_init+0x75/0x80
> <4>[ 7290.416007]  [<ffffffff8158cd0a>] ? start_kernel+0x3a7/0x3b3
> <4>[ 7290.416026]  [<ffffffff8158c315>] ? x86_64_start_reservations+0x125/0x129
> <4>[ 7290.416047]  [<ffffffff8158c3fd>] ? x86_64_start_kernel+0xe4/0xeb


This is some very unpleasant problem.
The thing is that this happens very rarely, and while you use X.
I had recently few such embarrassing kernel panics (I never had any
random and rare kernel panics) and I strongly suspect them to be of same
origin.

This one is first I captured, due to some code that I wrote recently
that saves printk buffer in predefined location in system ram that isn't
cleared on reboot in my notebook.

I had put some NULL checks in iwl3945_rx_reply_tx, none did trigger yet,
nor I had another kernel panic.

Best regards,
Maxim Levitsky



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Panic in iwl3945 driver
  2009-11-27  0:00 ` Maxim Levitsky
@ 2009-11-30 15:55   ` Maxim Levitsky
  2009-11-30 21:42     ` reinette chatre
  0 siblings, 1 reply; 20+ messages in thread
From: Maxim Levitsky @ 2009-11-30 15:55 UTC (permalink / raw)
  To: linux-wireless; +Cc: iwlwifi maling list

On Fri, 2009-11-27 at 02:00 +0200, Maxim Levitsky wrote: 
> On Wed, 2009-11-25 at 18:49 +0200, Maxim Levitsky wrote: 
> > Just captured a panic in iwl3945 driver.
> > Will investigate.
> > 
> > <1>[ 7290.414172] IP: [<ffffffffa0dd53a1>] iwl3945_rx_reply_tx+0xc1/0x450 [iwl3945]
> > <4>[ 7290.414205] PGD 0 
> > <1>[ 7290.414214] Thread overran stack, or stack corrupted
> > <0>[ 7290.414229] Oops: 0002 [#1] PREEMPT SMP 
> > <0>[ 7290.414246] last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
> > <4>[ 7290.414265] CPU 0 
> > <4>[ 7290.414274] Modules linked in: af_packet nfsd usb_storage usb_libusual cpufreq_powersave exportfs cpufreq_conservative iwl3945 nfs cpufreq_userspace snd_hda_codec_realtek acpi_cpufreq uvcvideo lockd iwlcore snd_hda_intel joydev coretemp nfs_acl videodev snd_hda_codec mac80211 v4l1_compat snd_hwdep sbp2 v4l2_compat_ioctl32 uhci_hcd psmouse auth_rpcgss ohci1394 cfg80211 ehci_hcd video ieee1394 snd_pcm serio_raw battery ac nvidia(P) usbcore output sunrpc evdev lirc_ene0100 snd_page_alloc rfkill tg3 libphy fuse lzo lzo_decompress lzo_compress
> > <6>[ 7290.414486] Pid: 0, comm: swapper Tainted: P           2.6.32-rc8-wl #213 Aspire 5720     
> > <6>[ 7290.414507] RIP: 0010:[<ffffffffa0dd53a1>]  [<ffffffffa0dd53a1>] iwl3945_rx_reply_tx+0xc1/0x450 [iwl3945]
> > <6>[ 7290.414541] RSP: 0018:ffff880002203d60  EFLAGS: 00010246
> > <6>[ 7290.414557] RAX: 000000000000004f RBX: ffff880064c11600 RCX: 0000000000000013
> > <6>[ 7290.414576] RDX: ffffffffa0ddcf20 RSI: ffff8800512b7008 RDI: 0000000000000038
> > <6>[ 7290.414596] RBP: ffff880002203dd0 R08: 0000000000000000 R09: 0000000000000100
> > <6>[ 7290.414616] R10: 0000000000000001 R11: 0000000000000000 R12: 00000000000000a0
> > <6>[ 7290.414635] R13: 0000000000000002 R14: 0000000000000013 R15: 0000000000020201
> > <6>[ 7290.414655] FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
> > <6>[ 7290.414677] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > <6>[ 7290.414693] CR2: 0000000000000041 CR3: 0000000001001000 CR4: 00000000000006f0
> > <6>[ 7290.414712] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > <6>[ 7290.414732] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > <4>[ 7290.414752] Process swapper (pid: 0, threadinfo ffffffff81524000, task ffffffff81528b60)
> > <0>[ 7290.414772] Stack:
> > <4>[ 7290.414780]  ffff880002203da0 0000000000000046 0000000000000000 0000000000000046
> > <4>[ 7290.414804] <0> 0000000000000282 0000000000000282 0000000000000282 ffff880064c12010
> > <4>[ 7290.414830] <0> ffff880002203db0 ffff880064c11600 ffff880064c12e50 ffff8800512b7000
> > <0>[ 7290.414858] Call Trace:
> > <0>[ 7290.414867]  <IRQ> 
> > <4>[ 7290.414884]  [<ffffffffa0dc8c47>] iwl3945_irq_tasklet+0x657/0x1740 [iwl3945]
> > <4>[ 7290.414910]  [<ffffffff8138fc60>] ? _spin_unlock+0x30/0x60
> > <4>[ 7290.414931]  [<ffffffff81049a21>] tasklet_action+0x101/0x110
> > <4>[ 7290.414950]  [<ffffffff8104a3d0>] __do_softirq+0xc0/0x160
> > <4>[ 7290.414968]  [<ffffffff8100d01c>] call_softirq+0x1c/0x30
> > <4>[ 7290.414986]  [<ffffffff8100eff5>] do_softirq+0x75/0xb0
> > <4>[ 7290.415003]  [<ffffffff81049ee5>] irq_exit+0x95/0xa0
> > <4>[ 7290.415020]  [<ffffffff8100e547>] do_IRQ+0x77/0xf0
> > <4>[ 7290.415038]  [<ffffffff8100c7d3>] ret_from_intr+0x0/0xf
> > <0>[ 7290.415052]  <EOI> 
> > <4>[ 7290.415067]  [<ffffffff81234efa>] ? acpi_idle_enter_bm+0x270/0x2a5
> > <4>[ 7290.415087]  [<ffffffff81234f04>] ? acpi_idle_enter_bm+0x27a/0x2a5
> > <4>[ 7290.415107]  [<ffffffff81234efa>] ? acpi_idle_enter_bm+0x270/0x2a5
> > <4>[ 7290.415130]  [<ffffffff812c11f3>] ? cpuidle_idle_call+0x93/0xf0
> > <4>[ 7290.415149]  [<ffffffff8100b0d7>] ? cpu_idle+0xa7/0x110
> > <4>[ 7290.415168]  [<ffffffff8137b3d5>] ? rest_init+0x75/0x80
> > <4>[ 7290.415187]  [<ffffffff8158cd0a>] ? start_kernel+0x3a7/0x3b3
> > <4>[ 7290.415206]  [<ffffffff8158c315>] ? x86_64_start_reservations+0x125/0x129
> > <4>[ 7290.415227]  [<ffffffff8158c3fd>] ? x86_64_start_kernel+0xe4/0xeb
> > <0>[ 7290.415243] Code: 00 41 39 ce 0f 8d e8 01 00 00 48 8b 47 40 48 63 d2 48 69 d2 98 00 00 00 4c 8b 04 02 48 c7 c2 20 cf dd a0 49 8d 78 38 49 8d 40 4f <c6> 47 09 00 c6 47 0c 00 c6 47 0f 00 c6 47 12 00 c6 47 15 00 49 
> > <1>[ 7290.415382] RIP  [<ffffffffa0dd53a1>] iwl3945_rx_reply_tx+0xc1/0x450 [iwl3945]
> > <4>[ 7290.415410]  RSP <ffff880002203d60>
> > <0>[ 7290.415421] CR2: 0000000000000041
> > <4>[ 7290.415436] ---[ end trace ec46807277caa515 ]---
> > <0>[ 7290.415450] Kernel panic - not syncing: Fatal exception in interrupt
> > <4>[ 7290.415468] Pid: 0, comm: swapper Tainted: P      D    2.6.32-rc8-wl #213
> > <4>[ 7290.415486] Call Trace:
> > <4>[ 7290.415495]  <IRQ>  [<ffffffff8138c040>] panic+0x7d/0x13a
> > <4>[ 7290.415519]  [<ffffffff8101071a>] oops_end+0xda/0xe0
> > <4>[ 7290.415538]  [<ffffffff8102e1ea>] no_context+0xea/0x250
> > <4>[ 7290.415557]  [<ffffffff81038991>] ? select_task_rq_fair+0x511/0x780
> > <4>[ 7290.415578]  [<ffffffff8102e475>] __bad_area_nosemaphore+0x125/0x1e0
> > <4>[ 7290.415597]  [<ffffffff81038d0c>] ? __enqueue_entity+0x7c/0x80
> > <4>[ 7290.415616]  [<ffffffff81039201>] ? enqueue_task_fair+0x111/0x150
> > <4>[ 7290.415636]  [<ffffffff8102e53e>] bad_area_nosemaphore+0xe/0x10
> > <4>[ 7290.415656]  [<ffffffff8102e8fa>] do_page_fault+0x26a/0x320
> > <4>[ 7290.415674]  [<ffffffff813905df>] page_fault+0x1f/0x30
> > <4>[ 7290.415697]  [<ffffffffa0dd53a1>] ? iwl3945_rx_reply_tx+0xc1/0x450 [iwl3945]
> > <4>[ 7290.415723]  [<ffffffffa0dc8c47>] iwl3945_irq_tasklet+0x657/0x1740 [iwl3945]
> > <4>[ 7290.415746]  [<ffffffff8138fc60>] ? _spin_unlock+0x30/0x60
> > <4>[ 7290.415764]  [<ffffffff81049a21>] tasklet_action+0x101/0x110
> > <4>[ 7290.415783]  [<ffffffff8104a3d0>] __do_softirq+0xc0/0x160
> > <4>[ 7290.415801]  [<ffffffff8100d01c>] call_softirq+0x1c/0x30
> > <4>[ 7290.415818]  [<ffffffff8100eff5>] do_softirq+0x75/0xb0
> > <4>[ 7290.415835]  [<ffffffff81049ee5>] irq_exit+0x95/0xa0
> > <4>[ 7290.415852]  [<ffffffff8100e547>] do_IRQ+0x77/0xf0
> > <4>[ 7290.415869]  [<ffffffff8100c7d3>] ret_from_intr+0x0/0xf
> > <4>[ 7290.415883]  <EOI>  [<ffffffff81234efa>] ? acpi_idle_enter_bm+0x270/0x2a5
> > <4>[ 7290.415911]  [<ffffffff81234f04>] ? acpi_idle_enter_bm+0x27a/0x2a5
> > <4>[ 7290.415931]  [<ffffffff81234efa>] ? acpi_idle_enter_bm+0x270/0x2a5
> > <4>[ 7290.415952]  [<ffffffff812c11f3>] ? cpuidle_idle_call+0x93/0xf0
> > <4>[ 7290.415971]  [<ffffffff8100b0d7>] ? cpu_idle+0xa7/0x110
> > <4>[ 7290.415989]  [<ffffffff8137b3d5>] ? rest_init+0x75/0x80
> > <4>[ 7290.416007]  [<ffffffff8158cd0a>] ? start_kernel+0x3a7/0x3b3
> > <4>[ 7290.416026]  [<ffffffff8158c315>] ? x86_64_start_reservations+0x125/0x129
> > <4>[ 7290.416047]  [<ffffffff8158c3fd>] ? x86_64_start_kernel+0xe4/0xeb
> 
> 
> This is some very unpleasant problem.
> The thing is that this happens very rarely, and while you use X.
> I had recently few such embarrassing kernel panics (I never had any
> random and rare kernel panics) and I strongly suspect them to be of same
> origin.
> 
> This one is first I captured, due to some code that I wrote recently
> that saves printk buffer in predefined location in system ram that isn't
> cleared on reboot in my notebook.
> 
> I had put some NULL checks in iwl3945_rx_reply_tx, none did trigger yet,
> nor I had another kernel panic.
> 
> Best regards,
> Maxim Levitsky
> 
> 

Happened again:


<7>[39142.650078] No probe response from AP 00:1b:9e:d8:77:02 after 500ms, try 2
<1>[39329.299446] BUG: unable to handle kernel NULL pointer dereference at 0000000000000041
<1>[39329.299478] IP: [<ffffffffa0c7e416>] iwl3945_rx_reply_tx+0x136/0x460 [iwl3945]
<4>[39329.299513] PGD 49be6067 PUD 48f6b067 PMD 0 
<0>[39329.299533] Oops: 0002 [#1] PREEMPT SMP 
<0>[39329.299551] last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
<4>[39329.299571] CPU 0 
<4>[39329.299581] Modules linked in: tg3 libphy lirc_ene0100 usbhid af_packet vmnet vmblock vsock vmci vmmon nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc usb_storage usb_libusual cpufreq_powersave cpufreq_conservative uvcvideo cpufreq_userspace snd_hda_codec_realtek acpi_cpufreq videodev iwl3945 joydev coretemp snd_hda_intel v4l1_compat iwlcore snd_hda_codec v4l2_compat_ioctl32 sbp2 uhci_hcd snd_hwdep psmouse ohci1394 mac80211 video ehci_hcd snd_pcm serio_raw ieee1394 nvidia(P) evdev output usbcore cfg80211 snd_page_alloc ac battery rfkill fuse lzo lzo_decompress lzo_compress [last unloaded: libphy]
<6>[39329.299809] Pid: 323, comm: firefox Tainted: P           2.6.32-rc8-wl #216 Aspire 5720     
<6>[39329.299833] RIP: 0010:[<ffffffffa0c7e416>]  [<ffffffffa0c7e416>] iwl3945_rx_reply_tx+0x136/0x460 [iwl3945]
<6>[39329.299868] RSP: 0000:ffff880002203d60  EFLAGS: 00010246
<6>[39329.299884] RAX: 000000000000004f RBX: ffff880065351600 RCX: 0000000000000057
<6>[39329.299905] RDX: ffffffffa0c85f40 RSI: ffff88002fd90008 RDI: 0000000000000038
<6>[39329.299925] RBP: ffff880002203dd0 R08: 0000000000000000 R09: 0000000000000100
<6>[39329.299946] R10: 0000000000000001 R11: 0000000000000000 R12: 00000000000000a0
<6>[39329.299966] R13: 0000000000000002 R14: 0000000000000057 R15: 0000000000020201
<6>[39329.299987] FS:  00007f89d8872710(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
<6>[39329.300010] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<6>[39329.300027] CR2: 0000000000000041 CR3: 0000000047d6f000 CR4: 00000000000006f0
<6>[39329.300048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<6>[39329.300068] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[39329.300090] Process firefox (pid: 323, threadinfo ffff880047ce0000, task ffff88003bc35e20)
<0>[39329.300112] Stack:
<4>[39329.300119]  ffff880002203da0 0000000000000046 0000000000000000 0000000000000046
<4>[39329.300145] <0> 0000000000000282 0000000000000282 0000000000000282 ffff880065352010
<4>[39329.300172] <0> ffff880002203db0 ffff880065351600 ffff880065354110 ffff88002fd90000
<0>[39329.300201] Call Trace:
<0>[39329.300210]  <IRQ> 
<4>[39329.300228]  [<ffffffffa0c71c47>] iwl3945_irq_tasklet+0x657/0x1740 [iwl3945]
<4>[39329.300256]  [<ffffffff8138b880>] ? _spin_unlock+0x30/0x60
<4>[39329.300277]  [<ffffffff81049a21>] tasklet_action+0x101/0x110
<4>[39329.300297]  [<ffffffff8104a3d0>] __do_softirq+0xc0/0x160
<4>[39329.300317]  [<ffffffff8100d01c>] call_softirq+0x1c/0x30
<4>[39329.300336]  [<ffffffff8100eff5>] do_softirq+0x75/0xb0
<4>[39329.300353]  [<ffffffff81049ee5>] irq_exit+0x95/0xa0
<4>[39329.300370]  [<ffffffff8100e547>] do_IRQ+0x77/0xf0
<4>[39329.300389]  [<ffffffff8100c7d3>] ret_from_intr+0x0/0xf
<0>[39329.300405]  <EOI> 
<0>[39329.300413] Code: 00 00 00 00 00 41 39 d6 7c 89 48 8b 47 40 48 63 d2 48 69 d2 98 00 00 00 4c 8b 04 02 48 c7 c2 40 5f c8 a0 49 8d 78 38 49 8d 40 4f <c6> 47 09 00 c6 47 0c 00 c6 47 0f 00 c6 47 12 00 c6 47 15 00 49 
<1>[39329.300556] RIP  [<ffffffffa0c7e416>] iwl3945_rx_reply_tx+0x136/0x460 [iwl3945]
<4>[39329.300585]  RSP <ffff880002203d60>
<0>[39329.300597] CR2: 0000000000000041
<4>[39329.300643] ---[ end trace 7b44b083a7088d66 ]---
<0>[39329.300658] Kernel panic - not syncing: Fatal exception in interrupt
<4>[39329.300678] Pid: 323, comm: firefox Tainted: P      D    2.6.32-rc8-wl #216
<4>[39329.300697] Call Trace:
<4>[39329.300706]  <IRQ>  [<ffffffff81387c60>] panic+0x7d/0x13a
<4>[39329.300732]  [<ffffffff8101071a>] oops_end+0xda/0xe0
<4>[39329.300753]  [<ffffffff8102e1ea>] no_context+0xea/0x250
<4>[39329.300773]  [<ffffffff8102e475>] __bad_area_nosemaphore+0x125/0x1e0
<4>[39329.300795]  [<ffffffff8102e53e>] bad_area_nosemaphore+0xe/0x10
<4>[39329.300815]  [<ffffffff8102e8fa>] do_page_fault+0x26a/0x320
<4>[39329.300835]  [<ffffffff8138c1ff>] page_fault+0x1f/0x30
<4>[39329.300860]  [<ffffffffa0c7e416>] ? iwl3945_rx_reply_tx+0x136/0x460 [iwl3945]
<4>[39329.300888]  [<ffffffffa0c71c47>] iwl3945_irq_tasklet+0x657/0x1740 [iwl3945]
<4>[39329.300912]  [<ffffffff8138b880>] ? _spin_unlock+0x30/0x60
<4>[39329.300931]  [<ffffffff81049a21>] tasklet_action+0x101/0x110
<4>[39329.300950]  [<ffffffff8104a3d0>] __do_softirq+0xc0/0x160
<4>[39329.300969]  [<ffffffff8100d01c>] call_softirq+0x1c/0x30
<4>[39329.300987]  [<ffffffff8100eff5>] do_softirq+0x75/0xb0
<4>[39329.301005]  [<ffffffff81049ee5>] irq_exit+0x95/0xa0
<4>[39329.301022]  [<ffffffff8100e547>] do_IRQ+0x77/0xf0
<4>[39329.301040]  [<ffffffff8100c7d3>] ret_from_intr+0x0/0xf
<4>[39329.301055]  <EOI> 
<0>[39329.301089] Rebooting in 60 seconds..maxim@maxim-laptop:~$ 



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Panic in iwl3945 driver
  2009-11-30 15:55   ` Maxim Levitsky
@ 2009-11-30 21:42     ` reinette chatre
  2009-11-30 22:35       ` Maxim Levitsky
  0 siblings, 1 reply; 20+ messages in thread
From: reinette chatre @ 2009-11-30 21:42 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: linux-wireless, iwlwifi maling list

On Mon, 2009-11-30 at 07:55 -0800, Maxim Levitsky wrote:
> > This is some very unpleasant problem.
> > The thing is that this happens very rarely, and while you use X.
> > I had recently few such embarrassing kernel panics (I never had any
> > random and rare kernel panics) and I strongly suspect them to be of same
> > origin.
> >
> > This one is first I captured, due to some code that I wrote recently
> > that saves printk buffer in predefined location in system ram that isn't
> > cleared on reboot in my notebook.
> >
> > I had put some NULL checks in iwl3945_rx_reply_tx, none did trigger yet,
> > nor I had another kernel panic.

Did this problem happen with your NULL checks in place? Can you perhaps
help here with which line the problem occurred? Any idea how to trigger
this?

Thank you

Reinette



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Panic in iwl3945 driver
  2009-11-30 21:42     ` reinette chatre
@ 2009-11-30 22:35       ` Maxim Levitsky
  2009-12-01  9:28         ` Zhu Yi
  0 siblings, 1 reply; 20+ messages in thread
From: Maxim Levitsky @ 2009-11-30 22:35 UTC (permalink / raw)
  To: reinette chatre; +Cc: linux-wireless, iwlwifi maling list

On Mon, 2009-11-30 at 13:42 -0800, reinette chatre wrote: 
> On Mon, 2009-11-30 at 07:55 -0800, Maxim Levitsky wrote:
> > > This is some very unpleasant problem.
> > > The thing is that this happens very rarely, and while you use X.
> > > I had recently few such embarrassing kernel panics (I never had any
> > > random and rare kernel panics) and I strongly suspect them to be of same
> > > origin.
> > >
> > > This one is first I captured, due to some code that I wrote recently
> > > that saves printk buffer in predefined location in system ram that isn't
> > > cleared on reboot in my notebook.
> > >
> > > I had put some NULL checks in iwl3945_rx_reply_tx, none did trigger yet,
> > > nor I had another kernel panic.
> 
> Did this problem happen with your NULL checks in place? Can you perhaps
> help here with which line the problem occurred? Any idea how to trigger
> this?

Second one is with the NULL checks.
It is very rare, it seems to be related to playing music, and using
wireless at same time, although both use MSI here.


Now did some disassembly, and this is the results:


......
0x0000000000016652 <iwl3945_rx_reply_tx+242>:	callq  0x16657 <iwl3945_rx_reply_tx+247>
0x0000000000016657 <iwl3945_rx_reply_tx+247>:	nopw   0x0(%rax,%rax,1)
0x0000000000016660 <iwl3945_rx_reply_tx+256>:	add    $0x48,%rsp
0x0000000000016664 <iwl3945_rx_reply_tx+260>:	pop    %rbx
0x0000000000016665 <iwl3945_rx_reply_tx+261>:	pop    %r12
0x0000000000016667 <iwl3945_rx_reply_tx+263>:	pop    %r13
0x0000000000016669 <iwl3945_rx_reply_tx+265>:	pop    %r14
0x000000000001666b <iwl3945_rx_reply_tx+267>:	pop    %r15
0x000000000001666d <iwl3945_rx_reply_tx+269>:	leaveq 
0x000000000001666e <iwl3945_rx_reply_tx+270>:	retq   
0x000000000001666f <iwl3945_rx_reply_tx+271>:	nop

0x0000000000016670 <iwl3945_rx_reply_tx+272>:	cmp    %edx,%r14d
0x0000000000016673 <iwl3945_rx_reply_tx+275>:	jl     0x16603 <iwl3945_rx_reply_tx+163>
0x0000000000016675 <iwl3945_rx_reply_tx+277>:	mov    0x40(%rdi),%rax
0x0000000000016679 <iwl3945_rx_reply_tx+281>:	movslq %edx,%rdx



/home/maxim/software/kernel/linux-2.6/include/net/mac80211.h: 487
	memset(&info->status.ampdu_ack_len, 0,
	       sizeof(struct ieee80211_tx_info) -
	       offsetof(struct ieee80211_tx_info, status.ampdu_ack_len));

0x000000000001667c <iwl3945_rx_reply_tx+284>:	imul   $0x98,%rdx,%rdx
0x0000000000016683 <iwl3945_rx_reply_tx+291>:	mov    (%rdx,%rax,1),%r8
0x0000000000016687 <iwl3945_rx_reply_tx+295>:	mov    $0x0,%rdx
0x000000000001668e <iwl3945_rx_reply_tx+302>:	lea    0x38(%r8),%rdi
0x0000000000016692 <iwl3945_rx_reply_tx+306>:	lea    0x4f(%r8),%rax



	rate_idx = iwl3945_hwrate_to_plcp_idx(tx_resp->rate);

0x0000000000016696 <iwl3945_rx_reply_tx+310>:	movb   $0x0,0x9(%rdi)        <---------- RIP
0x000000000001669a <iwl3945_rx_reply_tx+314>:	movb   $0x0,0xc(%rdi)
0x000000000001669e <iwl3945_rx_reply_tx+318>:	movb   $0x0,0xf(%rdi)
0x00000000000166a2 <iwl3945_rx_reply_tx+322>:	movb   $0x0,0x12(%rdi)
0x00000000000166a6 <iwl3945_rx_reply_tx+326>:	movb   $0x0,0x15(%rdi)
0x00000000000166aa <iwl3945_rx_reply_tx+330>:	movq   $0x0,0x4f(%r8)
0x00000000000166b2 <iwl3945_rx_reply_tx+338>:	movq   $0x0,0x8(%rax)
0x00000000000166ba <iwl3945_rx_reply_tx+346>:	movq   $0x0,0x10(%rax)
0x00000000000166c2 <iwl3945_rx_reply_tx+354>:	movb   $0x0,0x18(%rax)
0x00000000000166c6 <iwl3945_rx_reply_tx+358>:	xor    %eax,%eax
0x00000000000166c8 <iwl3945_rx_reply_tx+360>:	movzbl 0x3(%rsi),%ecx
0x00000000000166cc <iwl3945_rx_reply_tx+364>:	nopl   0x0(%rax)
0x00000000000166d0 <iwl3945_rx_reply_tx+368>:	cmp    (%rdx),%cl
0x00000000000166d2 <iwl3945_rx_reply_tx+370>:	je     0x166e4 <iwl3945_rx_reply_tx+388>
0x00000000000166d4 <iwl3945_rx_reply_tx+372>:	inc    %eax


Even though addr2line seems to think that faulty instruction is outside the inlined function,
thus looks awfully similar to memset.


So the reason seem that, info somehow is null.

info = IEEE80211_SKB_CB(txq->txb[txq->q.read_ptr].skb[0]);
ieee80211_tx_info_clear_status(info);


I will add checks for this now.


Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Panic in iwl3945 driver
  2009-11-30 22:35       ` Maxim Levitsky
@ 2009-12-01  9:28         ` Zhu Yi
  2009-12-01 18:52           ` reinette chatre
  2009-12-02  5:42           ` Zhu Yi
  0 siblings, 2 replies; 20+ messages in thread
From: Zhu Yi @ 2009-12-01  9:28 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: Chatre, Reinette, linux-wireless, iwlwifi maling list

On Tue, 2009-12-01 at 06:35 +0800, Maxim Levitsky wrote:
> 0x000000000001668e <iwl3945_rx_reply_tx+302>:	lea    0x38(%r8),%rdi
> 0x0000000000016692 <iwl3945_rx_reply_tx+306>:	lea    0x4f(%r8),%rax

When this happened, from your previous post, r8 is 0x0 and rdi is 0x38.
Since "info" is %rdi (see below), this means
txq->txb[txq->q.read_ptr].skb[0], aka. r8 is 0.

> 	rate_idx = iwl3945_hwrate_to_plcp_idx(tx_resp->rate);
> 
> 0x0000000000016696 <iwl3945_rx_reply_tx+310>:	movb   $0x0,0x9(%rdi)        <---------- RIP
> 0x000000000001669a <iwl3945_rx_reply_tx+314>:	movb   $0x0,0xc(%rdi)
> 0x000000000001669e <iwl3945_rx_reply_tx+318>:	movb   $0x0,0xf(%rdi)
> 0x00000000000166a2 <iwl3945_rx_reply_tx+322>:	movb   $0x0,0x12(%rdi)
> 0x00000000000166a6 <iwl3945_rx_reply_tx+326>:	movb   $0x0,0x15(%rdi)

This equals to below code in ieee80211_tx_info_clear_status(). "info" is
%rdi, which is 0x38. That matches NULL pointer dereference at 0x41 in
your oops header.

	for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
                info->status.rates[i].count = 0;

I guess there is a race for txq->q.read_ptr somewhere. Haven't checked
though.

Thanks,
-yi


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Panic in iwl3945 driver
  2009-12-01  9:28         ` Zhu Yi
@ 2009-12-01 18:52           ` reinette chatre
  2009-12-02  2:06             ` Zhu Yi
  2009-12-02  5:42           ` Zhu Yi
  1 sibling, 1 reply; 20+ messages in thread
From: reinette chatre @ 2009-12-01 18:52 UTC (permalink / raw)
  To: Zhu, Yi; +Cc: Maxim Levitsky, linux-wireless, iwlwifi maling list

On Tue, 2009-12-01 at 01:28 -0800, Zhu, Yi wrote:
> On Tue, 2009-12-01 at 06:35 +0800, Maxim Levitsky wrote:
> > 0x000000000001668e <iwl3945_rx_reply_tx+302>:	lea    0x38(%r8),%rdi
> > 0x0000000000016692 <iwl3945_rx_reply_tx+306>:	lea    0x4f(%r8),%rax
> 
> When this happened, from your previous post, r8 is 0x0 and rdi is 0x38.
> Since "info" is %rdi (see below), this means
> txq->txb[txq->q.read_ptr].skb[0], aka. r8 is 0.

I agree with your conclusion here that r8 is 0 at this point. I am very
confused how this can be since the two lines above the ones you pasted
here are:

0x0000000000016683 <iwl3945_rx_reply_tx+291>:»­­mov    (%rdx,%rax,1),%r8
0x0000000000016687 <iwl3945_rx_reply_tx+295>:»­­mov    $0x0,%rdx

>From the information in the post you refer to we have rdx as
0xffffffffa0c85f40 and rax as 0x4f. Since r8 is initialized above from
these two registers ... how could it be that r8 ends up as zero?

Reinette



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Panic in iwl3945 driver
  2009-12-01 18:52           ` reinette chatre
@ 2009-12-02  2:06             ` Zhu Yi
  0 siblings, 0 replies; 20+ messages in thread
From: Zhu Yi @ 2009-12-02  2:06 UTC (permalink / raw)
  To: Chatre, Reinette; +Cc: Maxim Levitsky, linux-wireless, iwlwifi maling list

On Wed, 2009-12-02 at 02:52 +0800, Chatre, Reinette wrote:
> I agree with your conclusion here that r8 is 0 at this point. I am
> very
> confused how this can be since the two lines above the ones you pasted
> here are:
> 
> 0x0000000000016683 <iwl3945_rx_reply_tx+291>:»­­mov    (%rdx,%rax,1),%
> r8
> 0x0000000000016687 <iwl3945_rx_reply_tx+295>:»­­mov    $0x0,%rdx
> 
> From the information in the post you refer to we have rdx as
> 0xffffffffa0c85f40 and rax as 0x4f. Since r8 is initialized above from
> these two registers ... how could it be that r8 ends up as zero? 

Is it related? They are memory address references.

Thanks,
-yi


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Panic in iwl3945 driver
  2009-12-01  9:28         ` Zhu Yi
  2009-12-01 18:52           ` reinette chatre
@ 2009-12-02  5:42           ` Zhu Yi
  2009-12-02 17:17             ` Maxim Levitsky
  2009-12-02 19:18             ` [ipw3945-devel] " Cahill, Ben M
  1 sibling, 2 replies; 20+ messages in thread
From: Zhu Yi @ 2009-12-02  5:42 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: Chatre, Reinette, linux-wireless, iwlwifi maling list

On Tue, 2009-12-01 at 17:28 +0800, Zhu Yi wrote:
> On Tue, 2009-12-01 at 06:35 +0800, Maxim Levitsky wrote:
> > 0x000000000001668e <iwl3945_rx_reply_tx+302>:	lea    0x38(%r8),%rdi
> > 0x0000000000016692 <iwl3945_rx_reply_tx+306>:	lea    0x4f(%r8),%rax
> 
> When this happened, from your previous post, r8 is 0x0 and rdi is 0x38.
> Since "info" is %rdi (see below), this means
> txq->txb[txq->q.read_ptr].skb[0], aka. r8 is 0.
> 
> > 	rate_idx = iwl3945_hwrate_to_plcp_idx(tx_resp->rate);
> > 
> > 0x0000000000016696 <iwl3945_rx_reply_tx+310>:	movb   $0x0,0x9(%rdi)        <---------- RIP
> > 0x000000000001669a <iwl3945_rx_reply_tx+314>:	movb   $0x0,0xc(%rdi)
> > 0x000000000001669e <iwl3945_rx_reply_tx+318>:	movb   $0x0,0xf(%rdi)
> > 0x00000000000166a2 <iwl3945_rx_reply_tx+322>:	movb   $0x0,0x12(%rdi)
> > 0x00000000000166a6 <iwl3945_rx_reply_tx+326>:	movb   $0x0,0x15(%rdi)
> 
> This equals to below code in ieee80211_tx_info_clear_status(). "info" is
> %rdi, which is 0x38. That matches NULL pointer dereference at 0x41 in
> your oops header.
> 
> 	for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
>                 info->status.rates[i].count = 0;
> 
> I guess there is a race for txq->q.read_ptr somewhere. Haven't checked
> though.

OK. 3945 updated write_ptr without regard to read_ptr on the Tx path.
This messes up our TFD on high load. The patch should fix your problem.

Signed-off-by: Zhu Yi <yi.zhu@intel.com>

diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
index 994db4a..b31b34c 100644
--- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
+++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
@@ -548,6 +548,9 @@ static int iwl3945_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
 	txq = &priv->txq[txq_id];
 	q = &txq->q;
 
+	if ((iwl_queue_space(q) < q->high_mark))
+		goto drop;
+
 	spin_lock_irqsave(&priv->lock, flags);
 
 	idx = get_cmd_index(q, q->write_ptr, 0);
 


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: Panic in iwl3945 driver
  2009-12-02  5:42           ` Zhu Yi
@ 2009-12-02 17:17             ` Maxim Levitsky
  2009-12-21 16:40               ` Maxim Levitsky
  2009-12-02 19:18             ` [ipw3945-devel] " Cahill, Ben M
  1 sibling, 1 reply; 20+ messages in thread
From: Maxim Levitsky @ 2009-12-02 17:17 UTC (permalink / raw)
  To: Zhu Yi; +Cc: Chatre, Reinette, linux-wireless, iwlwifi maling list

On Wed, 2009-12-02 at 13:42 +0800, Zhu Yi wrote: 
> On Tue, 2009-12-01 at 17:28 +0800, Zhu Yi wrote:
> > On Tue, 2009-12-01 at 06:35 +0800, Maxim Levitsky wrote:
> > > 0x000000000001668e <iwl3945_rx_reply_tx+302>:	lea    0x38(%r8),%rdi
> > > 0x0000000000016692 <iwl3945_rx_reply_tx+306>:	lea    0x4f(%r8),%rax
> > 
> > When this happened, from your previous post, r8 is 0x0 and rdi is 0x38.
> > Since "info" is %rdi (see below), this means
> > txq->txb[txq->q.read_ptr].skb[0], aka. r8 is 0.
> > 
> > > 	rate_idx = iwl3945_hwrate_to_plcp_idx(tx_resp->rate);
> > > 
> > > 0x0000000000016696 <iwl3945_rx_reply_tx+310>:	movb   $0x0,0x9(%rdi)        <---------- RIP
> > > 0x000000000001669a <iwl3945_rx_reply_tx+314>:	movb   $0x0,0xc(%rdi)
> > > 0x000000000001669e <iwl3945_rx_reply_tx+318>:	movb   $0x0,0xf(%rdi)
> > > 0x00000000000166a2 <iwl3945_rx_reply_tx+322>:	movb   $0x0,0x12(%rdi)
> > > 0x00000000000166a6 <iwl3945_rx_reply_tx+326>:	movb   $0x0,0x15(%rdi)
> > 
> > This equals to below code in ieee80211_tx_info_clear_status(). "info" is
> > %rdi, which is 0x38. That matches NULL pointer dereference at 0x41 in
> > your oops header.
> > 
> > 	for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
> >                 info->status.rates[i].count = 0;
> > 
> > I guess there is a race for txq->q.read_ptr somewhere. Haven't checked
> > though.
> 
> OK. 3945 updated write_ptr without regard to read_ptr on the Tx path.
> This messes up our TFD on high load. The patch should fix your problem.
> 
> Signed-off-by: Zhu Yi <yi.zhu@intel.com>
> 
> diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> index 994db4a..b31b34c 100644
> --- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
> +++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> @@ -548,6 +548,9 @@ static int iwl3945_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
>  	txq = &priv->txq[txq_id];
>  	q = &txq->q;
>  
> +	if ((iwl_queue_space(q) < q->high_mark))
> +		goto drop;
> +
>  	spin_lock_irqsave(&priv->lock, flags);
>  
>  	idx = get_cmd_index(q, q->write_ptr, 0);
>  
> 
I applied that patch, everything works.
I let you know if I see another kernel panic
(I can capture any panic on that system, I set up everything for that)

Best regards,
Maxim Levitsky



^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [ipw3945-devel] Panic in iwl3945 driver
  2009-12-02  5:42           ` Zhu Yi
  2009-12-02 17:17             ` Maxim Levitsky
@ 2009-12-02 19:18             ` Cahill, Ben M
  2009-12-02 19:40               ` Abhijeet Kolekar
  1 sibling, 1 reply; 20+ messages in thread
From: Cahill, Ben M @ 2009-12-02 19:18 UTC (permalink / raw)
  To: Zhu, Yi, Maxim Levitsky
  Cc: Chatre, Reinette, linux-wireless, iwlwifi maling list

 

>-----Original Message-----
>From: Zhu Yi [mailto:yi.zhu@intel.com] 
>Sent: Wednesday, December 02, 2009 12:43 AM
>To: Maxim Levitsky
>Cc: Chatre, Reinette; linux-wireless; iwlwifi maling list
>Subject: Re: [ipw3945-devel] Panic in iwl3945 driver
>
>On Tue, 2009-12-01 at 17:28 +0800, Zhu Yi wrote:
>> On Tue, 2009-12-01 at 06:35 +0800, Maxim Levitsky wrote:
>> > 0x000000000001668e <iwl3945_rx_reply_tx+302>:	lea    
>0x38(%r8),%rdi
>> > 0x0000000000016692 <iwl3945_rx_reply_tx+306>:	lea    
>0x4f(%r8),%rax
>> 
>> When this happened, from your previous post, r8 is 0x0 and 
>rdi is 0x38.
>> Since "info" is %rdi (see below), this means
>> txq->txb[txq->q.read_ptr].skb[0], aka. r8 is 0.
>> 
>> > 	rate_idx = iwl3945_hwrate_to_plcp_idx(tx_resp->rate);
>> > 
>> > 0x0000000000016696 <iwl3945_rx_reply_tx+310>:	movb   
>$0x0,0x9(%rdi)        <---------- RIP
>> > 0x000000000001669a <iwl3945_rx_reply_tx+314>:	movb   
>$0x0,0xc(%rdi)
>> > 0x000000000001669e <iwl3945_rx_reply_tx+318>:	movb   
>$0x0,0xf(%rdi)
>> > 0x00000000000166a2 <iwl3945_rx_reply_tx+322>:	movb   
>$0x0,0x12(%rdi)
>> > 0x00000000000166a6 <iwl3945_rx_reply_tx+326>:	movb   
>$0x0,0x15(%rdi)
>> 
>> This equals to below code in 
>ieee80211_tx_info_clear_status(). "info" is
>> %rdi, which is 0x38. That matches NULL pointer dereference at 0x41 in
>> your oops header.
>> 
>> 	for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
>>                 info->status.rates[i].count = 0;
>> 
>> I guess there is a race for txq->q.read_ptr somewhere. 
>Haven't checked
>> though.
>
>OK. 3945 updated write_ptr without regard to read_ptr on the Tx path.
>This messes up our TFD on high load. The patch should fix your problem.

Wow.  Could this be happening with other devices as well?

-- Ben --


>
>Signed-off-by: Zhu Yi <yi.zhu@intel.com>
>
>diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c 
>b/drivers/net/wireless/iwlwifi/iwl3945-base.c
>index 994db4a..b31b34c 100644
>--- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
>+++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
>@@ -548,6 +548,9 @@ static int iwl3945_tx_skb(struct iwl_priv 
>*priv, struct sk_buff *skb)
> 	txq = &priv->txq[txq_id];
> 	q = &txq->q;
> 
>+	if ((iwl_queue_space(q) < q->high_mark))
>+		goto drop;
>+
> 	spin_lock_irqsave(&priv->lock, flags);
> 
> 	idx = get_cmd_index(q, q->write_ptr, 0);
> 
>
>
>---------------------------------------------------------------
>---------------
>Join us December 9, 2009 for the Red Hat Virtual Experience,
>a free event focused on virtualization and cloud computing. 
>Attend in-depth sessions from your desk. Your couch. Anywhere.
>http://p.sf.net/sfu/redhat-sfdev2dev
>_______________________________________________
>Ipw3945-devel mailing list
>Ipw3945-devel@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/ipw3945-devel
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [ipw3945-devel] Panic in iwl3945 driver
  2009-12-02 19:18             ` [ipw3945-devel] " Cahill, Ben M
@ 2009-12-02 19:40               ` Abhijeet Kolekar
  0 siblings, 0 replies; 20+ messages in thread
From: Abhijeet Kolekar @ 2009-12-02 19:40 UTC (permalink / raw)
  To: Cahill, Ben M
  Cc: Zhu, Yi, Maxim Levitsky, Chatre, Reinette, linux-wireless,
	iwlwifi maling list

On Wed, 2009-12-02 at 11:18 -0800, Cahill, Ben M wrote:
> 
> >-----Original Message-----
> >From: Zhu Yi [mailto:yi.zhu@intel.com] 
> >Sent: Wednesday, December 02, 2009 12:43 AM
> >To: Maxim Levitsky
> >Cc: Chatre, Reinette; linux-wireless; iwlwifi maling list
> >Subject: Re: [ipw3945-devel] Panic in iwl3945 driver
> >
> >On Tue, 2009-12-01 at 17:28 +0800, Zhu Yi wrote:
> >> On Tue, 2009-12-01 at 06:35 +0800, Maxim Levitsky wrote:
> >> > 0x000000000001668e <iwl3945_rx_reply_tx+302>:	lea    
> >0x38(%r8),%rdi
> >> > 0x0000000000016692 <iwl3945_rx_reply_tx+306>:	lea    
> >0x4f(%r8),%rax
> >> 
> >> When this happened, from your previous post, r8 is 0x0 and 
> >rdi is 0x38.
> >> Since "info" is %rdi (see below), this means
> >> txq->txb[txq->q.read_ptr].skb[0], aka. r8 is 0.
> >> 
> >> > 	rate_idx = iwl3945_hwrate_to_plcp_idx(tx_resp->rate);
> >> > 
> >> > 0x0000000000016696 <iwl3945_rx_reply_tx+310>:	movb   
> >$0x0,0x9(%rdi)        <---------- RIP
> >> > 0x000000000001669a <iwl3945_rx_reply_tx+314>:	movb   
> >$0x0,0xc(%rdi)
> >> > 0x000000000001669e <iwl3945_rx_reply_tx+318>:	movb   
> >$0x0,0xf(%rdi)
> >> > 0x00000000000166a2 <iwl3945_rx_reply_tx+322>:	movb   
> >$0x0,0x12(%rdi)
> >> > 0x00000000000166a6 <iwl3945_rx_reply_tx+326>:	movb   
> >$0x0,0x15(%rdi)
> >> 
> >> This equals to below code in 
> >ieee80211_tx_info_clear_status(). "info" is
> >> %rdi, which is 0x38. That matches NULL pointer dereference at 0x41 in
> >> your oops header.
> >> 
> >> 	for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
> >>                 info->status.rates[i].count = 0;
> >> 
> >> I guess there is a race for txq->q.read_ptr somewhere. 
> >Haven't checked
> >> though.
> >
> >OK. 3945 updated write_ptr without regard to read_ptr on the Tx path.
> >This messes up our TFD on high load. The patch should fix your problem.
> 
> Wow.  Could this be happening with other devices as well?
> 

We do check this on other devices.

Abhijeet

> -- Ben --
> 
> 
> >
> >Signed-off-by: Zhu Yi <yi.zhu@intel.com>
> >
> >diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c 
> >b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> >index 994db4a..b31b34c 100644
> >--- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
> >+++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> >@@ -548,6 +548,9 @@ static int iwl3945_tx_skb(struct iwl_priv 
> >*priv, struct sk_buff *skb)
> > 	txq = &priv->txq[txq_id];
> > 	q = &txq->q;
> > 
> >+	if ((iwl_queue_space(q) < q->high_mark))
> >+		goto drop;
> >+
> > 	spin_lock_irqsave(&priv->lock, flags);
> > 
> > 	idx = get_cmd_index(q, q->write_ptr, 0);
> > 
> >
> >
> >---------------------------------------------------------------
> >---------------
> >Join us December 9, 2009 for the Red Hat Virtual Experience,
> >a free event focused on virtualization and cloud computing. 
> >Attend in-depth sessions from your desk. Your couch. Anywhere.
> >http://p.sf.net/sfu/redhat-sfdev2dev
> >_______________________________________________
> >Ipw3945-devel mailing list
> >Ipw3945-devel@lists.sourceforge.net
> >https://lists.sourceforge.net/lists/listinfo/ipw3945-devel
> >--
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Panic in iwl3945 driver
  2009-12-02 17:17             ` Maxim Levitsky
@ 2009-12-21 16:40               ` Maxim Levitsky
  2009-12-22  8:57                 ` Zhu Yi
  0 siblings, 1 reply; 20+ messages in thread
From: Maxim Levitsky @ 2009-12-21 16:40 UTC (permalink / raw)
  To: Zhu Yi; +Cc: Chatre, Reinette, linux-wireless, iwlwifi maling list

On Wed, 2009-12-02 at 19:17 +0200, Maxim Levitsky wrote: 
> On Wed, 2009-12-02 at 13:42 +0800, Zhu Yi wrote: 
> > On Tue, 2009-12-01 at 17:28 +0800, Zhu Yi wrote:
> > > On Tue, 2009-12-01 at 06:35 +0800, Maxim Levitsky wrote:
> > > > 0x000000000001668e <iwl3945_rx_reply_tx+302>:	lea    0x38(%r8),%rdi
> > > > 0x0000000000016692 <iwl3945_rx_reply_tx+306>:	lea    0x4f(%r8),%rax
> > > 
> > > When this happened, from your previous post, r8 is 0x0 and rdi is 0x38.
> > > Since "info" is %rdi (see below), this means
> > > txq->txb[txq->q.read_ptr].skb[0], aka. r8 is 0.
> > > 
> > > > 	rate_idx = iwl3945_hwrate_to_plcp_idx(tx_resp->rate);
> > > > 
> > > > 0x0000000000016696 <iwl3945_rx_reply_tx+310>:	movb   $0x0,0x9(%rdi)        <---------- RIP
> > > > 0x000000000001669a <iwl3945_rx_reply_tx+314>:	movb   $0x0,0xc(%rdi)
> > > > 0x000000000001669e <iwl3945_rx_reply_tx+318>:	movb   $0x0,0xf(%rdi)
> > > > 0x00000000000166a2 <iwl3945_rx_reply_tx+322>:	movb   $0x0,0x12(%rdi)
> > > > 0x00000000000166a6 <iwl3945_rx_reply_tx+326>:	movb   $0x0,0x15(%rdi)
> > > 
> > > This equals to below code in ieee80211_tx_info_clear_status(). "info" is
> > > %rdi, which is 0x38. That matches NULL pointer dereference at 0x41 in
> > > your oops header.
> > > 
> > > 	for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
> > >                 info->status.rates[i].count = 0;
> > > 
> > > I guess there is a race for txq->q.read_ptr somewhere. Haven't checked
> > > though.
> > 
> > OK. 3945 updated write_ptr without regard to read_ptr on the Tx path.
> > This messes up our TFD on high load. The patch should fix your problem.
> > 
> > Signed-off-by: Zhu Yi <yi.zhu@intel.com>
> > 
> > diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> > index 994db4a..b31b34c 100644
> > --- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
> > +++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> > @@ -548,6 +548,9 @@ static int iwl3945_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
> >  	txq = &priv->txq[txq_id];
> >  	q = &txq->q;
> >  
> > +	if ((iwl_queue_space(q) < q->high_mark))
> > +		goto drop;
> > +
> >  	spin_lock_irqsave(&priv->lock, flags);
> >  
> >  	idx = get_cmd_index(q, q->write_ptr, 0);
> >  
> >
Few days ago, I had an idea to reply here that I am sure that this
problem disappeared with this patch.


Today I got same kernel panic _with_ the patch applied....



<1>[ 3075.773505] BUG: unable to handle kernel NULL pointer dereference at 0000000000000041
<1>[ 3075.773540] IP: [<ffffffffa0c175f6>] iwl3945_rx_reply_tx+0xc6/0x450 [iwl3945]
<4>[ 3075.773564] PGD 0 
<1>[ 3075.773570] Thread overran stack, or stack corrupted
<0>[ 3075.773579] Oops: 0002 [#1] PREEMPT SMP 
<0>[ 3075.773591] last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
<4>[ 3075.773604] CPU 0 
<4>[ 3075.773611] Modules linked in: af_packet nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc snd_hda_codec_realtek uvcvideo videodev iwl3945 cpufreq_powersave usb_storage v4l1_compat snd_hda_intel iwlcore cpufreq_conservative joydev usb_libusual v4l2_compat_ioctl32 snd_hda_codec cpufreq_userspace mac80211 acpi_cpufreq uhci_hcd coretemp snd_hwdep cfg80211 psmouse tg3 ohci1394 video ehci_hcd snd_pcm sbp2 ac battery output usbcore libphy snd_page_alloc lirc_ene0100 ieee1394 nvidia(P) serio_raw evdev rfkill fuse lzo lzo_decompress lzo_compress
<6>[ 3075.773757] Pid: 0, comm: swapper Tainted: P           2.6.32-wl #225 Aspire 5720     
<6>[ 3075.773772] RIP: 0010:[<ffffffffa0c175f6>]  [<ffffffffa0c175f6>] iwl3945_rx_reply_tx+0xc6/0x450 [iwl3945]
<6>[ 3075.773795] RSP: 0018:ffff880002203d20  EFLAGS: 00010246
<6>[ 3075.773806] RAX: 000000000000004f RBX: ffff880065121600 RCX: 00000000000000b0
<6>[ 3075.773820] RDX: ffffffffa0c1f220 RSI: ffff88004f42e008 RDI: 0000000000000038
<6>[ 3075.773834] RBP: ffff880002203d90 R08: 0000000000000000 R09: 0000000000000100
<6>[ 3075.773847] R10: 0000000000000001 R11: 0000000000000046 R12: 00000000000000a0
<6>[ 3075.773861] R13: 0000000000000002 R14: 00000000000000b0 R15: 0000000000020201
<6>[ 3075.773875] FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
<6>[ 3075.773891] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<6>[ 3075.773903] CR2: 0000000000000041 CR3: 0000000001001000 CR4: 00000000000006f0
<6>[ 3075.773917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<6>[ 3075.773930] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[ 3075.773945] Process swapper (pid: 0, threadinfo ffffffff815b8000, task ffffffff815bcb60)
<0>[ 3075.773959] Stack:
<4>[ 3075.773964]  ffff880065122070 0000000000000046 ffff880002203d40 0000000000000046
<4>[ 3075.773981] <0> 0000000000000282 0000000000000282 0000000000000282 ffff880065122010
<4>[ 3075.774000] <0> ffff880002203d70 ffff880065121600 ffff880065122eb0 ffff88004f42e000
<0>[ 3075.774020] Call Trace:
<0>[ 3075.774026]  <IRQ> 
<4>[ 3075.774037]  [<ffffffffa0c0acf7>] iwl3945_irq_tasklet+0x657/0x1740 [iwl3945]
<4>[ 3075.774060]  [<ffffffffa09b883e>] ? iwl_isr_legacy+0x3e/0x600 [iwlcore]
<4>[ 3075.774077]  [<ffffffff81053f86>] tasklet_action+0x106/0x110
<4>[ 3075.774090]  [<ffffffff810556a4>] __do_softirq+0x114/0x2c0
<4>[ 3075.774103]  [<ffffffff8109eb68>] ? handle_edge_irq+0x78/0x160
<4>[ 3075.774117]  [<ffffffff8100d25c>] call_softirq+0x1c/0x30
<4>[ 3075.774128]  [<ffffffff8100f1fd>] do_softirq+0x7d/0xb0
<4>[ 3075.774140]  [<ffffffff81055155>] irq_exit+0x95/0xa0
<4>[ 3075.774152]  [<ffffffff814031fc>] do_IRQ+0x7c/0xf0
<4>[ 3075.774165]  [<ffffffff8100ca13>] ret_from_intr+0x0/0xf
<0>[ 3075.774175]  <EOI> 
<4>[ 3075.774183]  [<ffffffff8129996a>] ? acpi_idle_enter_bm+0x275/0x2aa
<4>[ 3075.774196]  [<ffffffff81299974>] ? acpi_idle_enter_bm+0x27f/0x2aa
<4>[ 3075.774209]  [<ffffffff8129996a>] ? acpi_idle_enter_bm+0x275/0x2aa
<4>[ 3075.774224]  [<ffffffff8132996f>] ? cpuidle_idle_call+0x9f/0x160
<4>[ 3075.774237]  [<ffffffff8100b1df>] ? cpu_idle+0xaf/0x110
<4>[ 3075.774250]  [<ffffffff813eca3a>] ? rest_init+0x7a/0x80
<4>[ 3075.774264]  [<ffffffff8162dd01>] ? start_kernel+0x3ac/0x3b8
<4>[ 3075.774277]  [<ffffffff8162d315>] ? x86_64_start_reservations+0x125/0x129
<4>[ 3075.774291]  [<ffffffff8162d3fd>] ? x86_64_start_kernel+0xe4/0xeb
<0>[ 3075.774302] Code: 00 41 39 ce 0f 8d e3 01 00 00 48 8b 47 40 48 63 d2 48 69 d2 98 00 00 00 4c 8b 04 02 48 c7 c2 20 f2 c1 a0 49 8d 78 38 49 8d 40 4f <c6> 47 09 00 c6 47 0c 00 c6 47 0f 00 c6 47 12 00 c6 47 15 00 49 
<1>[ 3075.774394] RIP  [<ffffffffa0c175f6>] iwl3945_rx_reply_tx+0xc6/0x450 [iwl3945]
<4>[ 3075.774412]  RSP <ffff880002203d20>
<0>[ 3075.774420] CR2: 0000000000000041
<4>[ 3075.774429] ---[ end trace 7ea524291c193896 ]---
<0>[ 3075.774439] Kernel panic - not syncing: Fatal exception in interrupt
<4>[ 3075.774451] Pid: 0, comm: swapper Tainted: P      D    2.6.32-wl #225
<4>[ 3075.774463] Call Trace:
<4>[ 3075.774469]  <IRQ>  [<ffffffff813fe515>] panic+0x82/0x13f
<4>[ 3075.774486]  [<ffffffff81010a52>] oops_end+0xe2/0xf0
<4>[ 3075.774497]  [<ffffffff81031382>] no_context+0xf2/0x260
<4>[ 3075.774509]  [<ffffffff81031615>] __bad_area_nosemaphore+0x125/0x1e0
<4>[ 3075.774523]  [<ffffffff810316e3>] bad_area_nosemaphore+0x13/0x20
<4>[ 3075.774536]  [<ffffffff81031aca>] do_page_fault+0x26a/0x320
<4>[ 3075.774549]  [<ffffffff81402dcf>] page_fault+0x1f/0x30
<4>[ 3075.774564]  [<ffffffffa0c175f6>] ? iwl3945_rx_reply_tx+0xc6/0x450 [iwl3945]
<4>[ 3075.774581]  [<ffffffffa0c0acf7>] iwl3945_irq_tasklet+0x657/0x1740 [iwl3945]
<4>[ 3075.774602]  [<ffffffffa09b883e>] ? iwl_isr_legacy+0x3e/0x600 [iwlcore]
<4>[ 3075.774616]  [<ffffffff81053f86>] tasklet_action+0x106/0x110
<4>[ 3075.774628]  [<ffffffff810556a4>] __do_softirq+0x114/0x2c0
<4>[ 3075.774640]  [<ffffffff8109eb68>] ? handle_edge_irq+0x78/0x160
<4>[ 3075.774653]  [<ffffffff8100d25c>] call_softirq+0x1c/0x30
<4>[ 3075.774665]  [<ffffffff8100f1fd>] do_softirq+0x7d/0xb0
<4>[ 3075.774676]  [<ffffffff81055155>] irq_exit+0x95/0xa0
<4>[ 3075.774687]  [<ffffffff814031fc>] do_IRQ+0x7c/0xf0
<4>[ 3075.774698]  [<ffffffff8100ca13>] ret_from_intr+0x0/0xf
<4>[ 3075.774708]  <EOI>  [<ffffffff8129996a>] ? acpi_idle_enter_bm+0x275/0x2aa
<4>[ 3075.774726]  [<ffffffff81299974>] ? acpi_idle_enter_bm+0x27f/0x2aa
<4>[ 3075.774739]  [<ffffffff8129996a>] ? acpi_idle_enter_bm+0x275/0x2aa
<4>[ 3075.774752]  [<ffffffff8132996f>] ? cpuidle_idle_call+0x9f/0x160
<4>[ 3075.774765]  [<ffffffff8100b1df>] ? cpu_idle+0xaf/0x110
<4>[ 3075.774777]  [<ffffffff813eca3a>] ? rest_init+0x7a/0x80
<4>[ 3075.774789]  [<ffffffff8162dd01>] ? start_kernel+0x3ac/0x3b8
<4>[ 3075.774802]  [<ffffffff8162d315>] ? x86_64_start_reservations+0x125/0x129
<4>[ 3075.774816]  [<ffffffff8162d3fd>] ? x86_64_start_kernel+0xe4/0xeb
<0>[ 3075.774845] Rebooting in 10 seconds..



(gdb) info line *iwl3945_rx_reply_tx+0xc6
Line 483 of "/home/maxim/software/kernel/linux-2.6/include/net/mac80211.h"
   starts at address 0x16626 <iwl3945_rx_reply_tx+198> and ends at 0x1663a <iwl3945_rx_reply_tx+218>.


(gdb) list * 0x16626
0x16626 is in iwl3945_rx_reply_tx (/home/maxim/software/kernel/linux-2.6/include/net/mac80211.h:483).
478		BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
479			     offsetof(struct ieee80211_tx_info, driver_rates));
480		BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) != 8);
481		/* clear the rate counts */
482		for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
483			info->status.rates[i].count = 0;
484	
485		BUILD_BUG_ON(
486		    offsetof(struct ieee80211_tx_info, status.ampdu_ack_len) != 23);
487		memset(&info->status.ampdu_ack_len, 0,



Kernel is the dbb6e436ef8e1713258bf1218d09e927d8de3590
(wireless: update old static regulatory domain rules)

Plus few patches that only one that touches wireless is:


diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
index 2a28a1f..a36de73 100644
--- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
+++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
@@ -548,6 +548,9 @@ static int iwl3945_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
        txq = &priv->txq[txq_id];
        q = &txq->q;
 
+       if ((iwl_queue_space(q) < q->high_mark))
+               goto drop;
+
        spin_lock_irqsave(&priv->lock, flags);
 
        idx = get_cmd_index(q, q->write_ptr, 0);



Best regards,
	Maxim Levitsky



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: Panic in iwl3945 driver
  2009-12-21 16:40               ` Maxim Levitsky
@ 2009-12-22  8:57                 ` Zhu Yi
  2010-01-05 16:56                   ` Maxim Levitsky
  0 siblings, 1 reply; 20+ messages in thread
From: Zhu Yi @ 2009-12-22  8:57 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: Chatre, Reinette, linux-wireless, iwlwifi maling list

On Tue, 2009-12-22 at 00:40 +0800, Maxim Levitsky wrote:
> On Wed, 2009-12-02 at 19:17 +0200, Maxim Levitsky wrote: 
> > On Wed, 2009-12-02 at 13:42 +0800, Zhu Yi wrote: 
> > > On Tue, 2009-12-01 at 17:28 +0800, Zhu Yi wrote:
> > > > On Tue, 2009-12-01 at 06:35 +0800, Maxim Levitsky wrote:
> > > > > 0x000000000001668e <iwl3945_rx_reply_tx+302>:	lea    0x38(%r8),%rdi
> > > > > 0x0000000000016692 <iwl3945_rx_reply_tx+306>:	lea    0x4f(%r8),%rax
> > > > 
> > > > When this happened, from your previous post, r8 is 0x0 and rdi is 0x38.
> > > > Since "info" is %rdi (see below), this means
> > > > txq->txb[txq->q.read_ptr].skb[0], aka. r8 is 0.
> > > > 
> > > > > 	rate_idx = iwl3945_hwrate_to_plcp_idx(tx_resp->rate);
> > > > > 
> > > > > 0x0000000000016696 <iwl3945_rx_reply_tx+310>:	movb   $0x0,0x9(%rdi)        <---------- RIP
> > > > > 0x000000000001669a <iwl3945_rx_reply_tx+314>:	movb   $0x0,0xc(%rdi)
> > > > > 0x000000000001669e <iwl3945_rx_reply_tx+318>:	movb   $0x0,0xf(%rdi)
> > > > > 0x00000000000166a2 <iwl3945_rx_reply_tx+322>:	movb   $0x0,0x12(%rdi)
> > > > > 0x00000000000166a6 <iwl3945_rx_reply_tx+326>:	movb   $0x0,0x15(%rdi)
> > > > 
> > > > This equals to below code in ieee80211_tx_info_clear_status(). "info" is
> > > > %rdi, which is 0x38. That matches NULL pointer dereference at 0x41 in
> > > > your oops header.
> > > > 
> > > > 	for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
> > > >                 info->status.rates[i].count = 0;
> > > > 
> > > > I guess there is a race for txq->q.read_ptr somewhere. Haven't checked
> > > > though.
> > > 
> > > OK. 3945 updated write_ptr without regard to read_ptr on the Tx path.
> > > This messes up our TFD on high load. The patch should fix your problem.
> > > 
> > > Signed-off-by: Zhu Yi <yi.zhu@intel.com>
> > > 
> > > diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> > > index 994db4a..b31b34c 100644
> > > --- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
> > > +++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> > > @@ -548,6 +548,9 @@ static int iwl3945_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
> > >  	txq = &priv->txq[txq_id];
> > >  	q = &txq->q;
> > >  
> > > +	if ((iwl_queue_space(q) < q->high_mark))
> > > +		goto drop;
> > > +
> > >  	spin_lock_irqsave(&priv->lock, flags);
> > >  
> > >  	idx = get_cmd_index(q, q->write_ptr, 0);
> > >  
> > >
> Few days ago, I had an idea to reply here that I am sure that this
> problem disappeared with this patch.
> 
> 
> Today I got same kernel panic _with_ the patch applied....

Looks like (all of) the root causes are still not found yet. The symptom
is exactly the same as the previous one.

One thing I found today is when txq read_ptr catches up to write_ptr
(read_ptr == write_ptr), iwl_queue_used() will _always_ return TRUE!
This will be a problem if the firmware sends us a wrong index
(sequence), then we will fail the check in this condition. I'm not sure
if firmware can really send us a wrong sequence. Can you please try this
patch? Apply it on top of the previous one. If you do see the "FIRMWARE
BUG" in dmesg, then I think we find the root cause.


diff --git a/drivers/net/wireless/iwlwifi/iwl-dev.h b/drivers/net/wireless/iwlwifi/iwl-dev.h
index 2673e9a..02070cc 100644
--- a/drivers/net/wireless/iwlwifi/iwl-dev.h
+++ b/drivers/net/wireless/iwlwifi/iwl-dev.h
@@ -711,7 +711,11 @@ extern void iwl_txq_ctx_stop(struct iwl_priv *priv);
 extern int iwl_queue_space(const struct iwl_queue *q);
 static inline int iwl_queue_used(const struct iwl_queue *q, int i)
 {
-	return q->write_ptr > q->read_ptr ?
+	if (q->write_ptr == q->read_ptr)
+		printk("FIRMWARE BUG: index %d is given while read_ptr is %d\n",
+		       i, q->read_ptr);
+
+	return q->write_ptr >= q->read_ptr ?
 		(i >= q->read_ptr && i < q->write_ptr) :
 		!(i < q->read_ptr && i >= q->write_ptr);
 }


Thanks,
-yi

> <1>[ 3075.773505] BUG: unable to handle kernel NULL pointer dereference at 0000000000000041
> <1>[ 3075.773540] IP: [<ffffffffa0c175f6>] iwl3945_rx_reply_tx+0xc6/0x450 [iwl3945]
> <4>[ 3075.773564] PGD 0 
> <1>[ 3075.773570] Thread overran stack, or stack corrupted
> <0>[ 3075.773579] Oops: 0002 [#1] PREEMPT SMP 
> <0>[ 3075.773591] last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
> <4>[ 3075.773604] CPU 0 
> <4>[ 3075.773611] Modules linked in: af_packet nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc snd_hda_codec_realtek uvcvideo videodev iwl3945 cpufreq_powersave usb_storage v4l1_compat snd_hda_intel iwlcore cpufreq_conservative joydev usb_libusual v4l2_compat_ioctl32 snd_hda_codec cpufreq_userspace mac80211 acpi_cpufreq uhci_hcd coretemp snd_hwdep cfg80211 psmouse tg3 ohci1394 video ehci_hcd snd_pcm sbp2 ac battery output usbcore libphy snd_page_alloc lirc_ene0100 ieee1394 nvidia(P) serio_raw evdev rfkill fuse lzo lzo_decompress lzo_compress
> <6>[ 3075.773757] Pid: 0, comm: swapper Tainted: P           2.6.32-wl #225 Aspire 5720     
> <6>[ 3075.773772] RIP: 0010:[<ffffffffa0c175f6>]  [<ffffffffa0c175f6>] iwl3945_rx_reply_tx+0xc6/0x450 [iwl3945]
> <6>[ 3075.773795] RSP: 0018:ffff880002203d20  EFLAGS: 00010246
> <6>[ 3075.773806] RAX: 000000000000004f RBX: ffff880065121600 RCX: 00000000000000b0
> <6>[ 3075.773820] RDX: ffffffffa0c1f220 RSI: ffff88004f42e008 RDI: 0000000000000038
> <6>[ 3075.773834] RBP: ffff880002203d90 R08: 0000000000000000 R09: 0000000000000100
> <6>[ 3075.773847] R10: 0000000000000001 R11: 0000000000000046 R12: 00000000000000a0
> <6>[ 3075.773861] R13: 0000000000000002 R14: 00000000000000b0 R15: 0000000000020201
> <6>[ 3075.773875] FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
> <6>[ 3075.773891] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> <6>[ 3075.773903] CR2: 0000000000000041 CR3: 0000000001001000 CR4: 00000000000006f0
> <6>[ 3075.773917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> <6>[ 3075.773930] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> <4>[ 3075.773945] Process swapper (pid: 0, threadinfo ffffffff815b8000, task ffffffff815bcb60)
> <0>[ 3075.773959] Stack:
> <4>[ 3075.773964]  ffff880065122070 0000000000000046 ffff880002203d40 0000000000000046
> <4>[ 3075.773981] <0> 0000000000000282 0000000000000282 0000000000000282 ffff880065122010
> <4>[ 3075.774000] <0> ffff880002203d70 ffff880065121600 ffff880065122eb0 ffff88004f42e000
> <0>[ 3075.774020] Call Trace:
> <0>[ 3075.774026]  <IRQ> 
> <4>[ 3075.774037]  [<ffffffffa0c0acf7>] iwl3945_irq_tasklet+0x657/0x1740 [iwl3945]
> <4>[ 3075.774060]  [<ffffffffa09b883e>] ? iwl_isr_legacy+0x3e/0x600 [iwlcore]
> <4>[ 3075.774077]  [<ffffffff81053f86>] tasklet_action+0x106/0x110
> <4>[ 3075.774090]  [<ffffffff810556a4>] __do_softirq+0x114/0x2c0
> <4>[ 3075.774103]  [<ffffffff8109eb68>] ? handle_edge_irq+0x78/0x160
> <4>[ 3075.774117]  [<ffffffff8100d25c>] call_softirq+0x1c/0x30
> <4>[ 3075.774128]  [<ffffffff8100f1fd>] do_softirq+0x7d/0xb0
> <4>[ 3075.774140]  [<ffffffff81055155>] irq_exit+0x95/0xa0
> <4>[ 3075.774152]  [<ffffffff814031fc>] do_IRQ+0x7c/0xf0
> <4>[ 3075.774165]  [<ffffffff8100ca13>] ret_from_intr+0x0/0xf
> <0>[ 3075.774175]  <EOI> 
> <4>[ 3075.774183]  [<ffffffff8129996a>] ? acpi_idle_enter_bm+0x275/0x2aa
> <4>[ 3075.774196]  [<ffffffff81299974>] ? acpi_idle_enter_bm+0x27f/0x2aa
> <4>[ 3075.774209]  [<ffffffff8129996a>] ? acpi_idle_enter_bm+0x275/0x2aa
> <4>[ 3075.774224]  [<ffffffff8132996f>] ? cpuidle_idle_call+0x9f/0x160
> <4>[ 3075.774237]  [<ffffffff8100b1df>] ? cpu_idle+0xaf/0x110
> <4>[ 3075.774250]  [<ffffffff813eca3a>] ? rest_init+0x7a/0x80
> <4>[ 3075.774264]  [<ffffffff8162dd01>] ? start_kernel+0x3ac/0x3b8
> <4>[ 3075.774277]  [<ffffffff8162d315>] ? x86_64_start_reservations+0x125/0x129
> <4>[ 3075.774291]  [<ffffffff8162d3fd>] ? x86_64_start_kernel+0xe4/0xeb
> <0>[ 3075.774302] Code: 00 41 39 ce 0f 8d e3 01 00 00 48 8b 47 40 48 63 d2 48 69 d2 98 00 00 00 4c 8b 04 02 48 c7 c2 20 f2 c1 a0 49 8d 78 38 49 8d 40 4f <c6> 47 09 00 c6 47 0c 00 c6 47 0f 00 c6 47 12 00 c6 47 15 00 49 
> <1>[ 3075.774394] RIP  [<ffffffffa0c175f6>] iwl3945_rx_reply_tx+0xc6/0x450 [iwl3945]
> <4>[ 3075.774412]  RSP <ffff880002203d20>
> <0>[ 3075.774420] CR2: 0000000000000041
> <4>[ 3075.774429] ---[ end trace 7ea524291c193896 ]---
> <0>[ 3075.774439] Kernel panic - not syncing: Fatal exception in interrupt
> <4>[ 3075.774451] Pid: 0, comm: swapper Tainted: P      D    2.6.32-wl #225
> <4>[ 3075.774463] Call Trace:
> <4>[ 3075.774469]  <IRQ>  [<ffffffff813fe515>] panic+0x82/0x13f
> <4>[ 3075.774486]  [<ffffffff81010a52>] oops_end+0xe2/0xf0
> <4>[ 3075.774497]  [<ffffffff81031382>] no_context+0xf2/0x260
> <4>[ 3075.774509]  [<ffffffff81031615>] __bad_area_nosemaphore+0x125/0x1e0
> <4>[ 3075.774523]  [<ffffffff810316e3>] bad_area_nosemaphore+0x13/0x20
> <4>[ 3075.774536]  [<ffffffff81031aca>] do_page_fault+0x26a/0x320
> <4>[ 3075.774549]  [<ffffffff81402dcf>] page_fault+0x1f/0x30
> <4>[ 3075.774564]  [<ffffffffa0c175f6>] ? iwl3945_rx_reply_tx+0xc6/0x450 [iwl3945]
> <4>[ 3075.774581]  [<ffffffffa0c0acf7>] iwl3945_irq_tasklet+0x657/0x1740 [iwl3945]
> <4>[ 3075.774602]  [<ffffffffa09b883e>] ? iwl_isr_legacy+0x3e/0x600 [iwlcore]
> <4>[ 3075.774616]  [<ffffffff81053f86>] tasklet_action+0x106/0x110
> <4>[ 3075.774628]  [<ffffffff810556a4>] __do_softirq+0x114/0x2c0
> <4>[ 3075.774640]  [<ffffffff8109eb68>] ? handle_edge_irq+0x78/0x160
> <4>[ 3075.774653]  [<ffffffff8100d25c>] call_softirq+0x1c/0x30
> <4>[ 3075.774665]  [<ffffffff8100f1fd>] do_softirq+0x7d/0xb0
> <4>[ 3075.774676]  [<ffffffff81055155>] irq_exit+0x95/0xa0
> <4>[ 3075.774687]  [<ffffffff814031fc>] do_IRQ+0x7c/0xf0
> <4>[ 3075.774698]  [<ffffffff8100ca13>] ret_from_intr+0x0/0xf
> <4>[ 3075.774708]  <EOI>  [<ffffffff8129996a>] ? acpi_idle_enter_bm+0x275/0x2aa
> <4>[ 3075.774726]  [<ffffffff81299974>] ? acpi_idle_enter_bm+0x27f/0x2aa
> <4>[ 3075.774739]  [<ffffffff8129996a>] ? acpi_idle_enter_bm+0x275/0x2aa
> <4>[ 3075.774752]  [<ffffffff8132996f>] ? cpuidle_idle_call+0x9f/0x160
> <4>[ 3075.774765]  [<ffffffff8100b1df>] ? cpu_idle+0xaf/0x110
> <4>[ 3075.774777]  [<ffffffff813eca3a>] ? rest_init+0x7a/0x80
> <4>[ 3075.774789]  [<ffffffff8162dd01>] ? start_kernel+0x3ac/0x3b8
> <4>[ 3075.774802]  [<ffffffff8162d315>] ? x86_64_start_reservations+0x125/0x129
> <4>[ 3075.774816]  [<ffffffff8162d3fd>] ? x86_64_start_kernel+0xe4/0xeb
> <0>[ 3075.774845] Rebooting in 10 seconds..
> 
> 
> 
> (gdb) info line *iwl3945_rx_reply_tx+0xc6
> Line 483 of "/home/maxim/software/kernel/linux-2.6/include/net/mac80211.h"
>    starts at address 0x16626 <iwl3945_rx_reply_tx+198> and ends at 0x1663a <iwl3945_rx_reply_tx+218>.
> 
> 
> (gdb) list * 0x16626
> 0x16626 is in iwl3945_rx_reply_tx (/home/maxim/software/kernel/linux-2.6/include/net/mac80211.h:483).
> 478		BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) !=
> 479			     offsetof(struct ieee80211_tx_info, driver_rates));
> 480		BUILD_BUG_ON(offsetof(struct ieee80211_tx_info, status.rates) != 8);
> 481		/* clear the rate counts */
> 482		for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
> 483			info->status.rates[i].count = 0;
> 484	
> 485		BUILD_BUG_ON(
> 486		    offsetof(struct ieee80211_tx_info, status.ampdu_ack_len) != 23);
> 487		memset(&info->status.ampdu_ack_len, 0,
> 
> 
> 
> Kernel is the dbb6e436ef8e1713258bf1218d09e927d8de3590
> (wireless: update old static regulatory domain rules)
> 
> Plus few patches that only one that touches wireless is:
> 
> 
> diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> index 2a28a1f..a36de73 100644
> --- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
> +++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> @@ -548,6 +548,9 @@ static int iwl3945_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
>         txq = &priv->txq[txq_id];
>         q = &txq->q;
>  
> +       if ((iwl_queue_space(q) < q->high_mark))
> +               goto drop;
> +
>         spin_lock_irqsave(&priv->lock, flags);
>  
>         idx = get_cmd_index(q, q->write_ptr, 0);
> 
> 
> 
> Best regards,
> 	Maxim Levitsky
> 
> 



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: Panic in iwl3945 driver
  2009-12-22  8:57                 ` Zhu Yi
@ 2010-01-05 16:56                   ` Maxim Levitsky
  2010-01-05 22:16                     ` [ipw3945-devel] " Cahill, Ben M
  2010-01-06  3:55                     ` Zhu Yi
  0 siblings, 2 replies; 20+ messages in thread
From: Maxim Levitsky @ 2010-01-05 16:56 UTC (permalink / raw)
  To: Zhu Yi; +Cc: Chatre, Reinette, linux-wireless, iwlwifi maling list

On Tue, 2009-12-22 at 16:57 +0800, Zhu Yi wrote: 
> On Tue, 2009-12-22 at 00:40 +0800, Maxim Levitsky wrote:
> > On Wed, 2009-12-02 at 19:17 +0200, Maxim Levitsky wrote: 
> > > On Wed, 2009-12-02 at 13:42 +0800, Zhu Yi wrote: 
> > > > On Tue, 2009-12-01 at 17:28 +0800, Zhu Yi wrote:
> > > > > On Tue, 2009-12-01 at 06:35 +0800, Maxim Levitsky wrote:
> > > > > > 0x000000000001668e <iwl3945_rx_reply_tx+302>:	lea    0x38(%r8),%rdi
> > > > > > 0x0000000000016692 <iwl3945_rx_reply_tx+306>:	lea    0x4f(%r8),%rax
> > > > > 
> > > > > When this happened, from your previous post, r8 is 0x0 and rdi is 0x38.
> > > > > Since "info" is %rdi (see below), this means
> > > > > txq->txb[txq->q.read_ptr].skb[0], aka. r8 is 0.
> > > > > 
> > > > > > 	rate_idx = iwl3945_hwrate_to_plcp_idx(tx_resp->rate);
> > > > > > 
> > > > > > 0x0000000000016696 <iwl3945_rx_reply_tx+310>:	movb   $0x0,0x9(%rdi)        <---------- RIP
> > > > > > 0x000000000001669a <iwl3945_rx_reply_tx+314>:	movb   $0x0,0xc(%rdi)
> > > > > > 0x000000000001669e <iwl3945_rx_reply_tx+318>:	movb   $0x0,0xf(%rdi)
> > > > > > 0x00000000000166a2 <iwl3945_rx_reply_tx+322>:	movb   $0x0,0x12(%rdi)
> > > > > > 0x00000000000166a6 <iwl3945_rx_reply_tx+326>:	movb   $0x0,0x15(%rdi)
> > > > > 
> > > > > This equals to below code in ieee80211_tx_info_clear_status(). "info" is
> > > > > %rdi, which is 0x38. That matches NULL pointer dereference at 0x41 in
> > > > > your oops header.
> > > > > 
> > > > > 	for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
> > > > >                 info->status.rates[i].count = 0;
> > > > > 
> > > > > I guess there is a race for txq->q.read_ptr somewhere. Haven't checked
> > > > > though.
> > > > 
> > > > OK. 3945 updated write_ptr without regard to read_ptr on the Tx path.
> > > > This messes up our TFD on high load. The patch should fix your problem.
> > > > 
> > > > Signed-off-by: Zhu Yi <yi.zhu@intel.com>
> > > > 
> > > > diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> > > > index 994db4a..b31b34c 100644
> > > > --- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
> > > > +++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> > > > @@ -548,6 +548,9 @@ static int iwl3945_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
> > > >  	txq = &priv->txq[txq_id];
> > > >  	q = &txq->q;
> > > >  
> > > > +	if ((iwl_queue_space(q) < q->high_mark))
> > > > +		goto drop;
> > > > +
> > > >  	spin_lock_irqsave(&priv->lock, flags);
> > > >  
> > > >  	idx = get_cmd_index(q, q->write_ptr, 0);
> > > >  
> > > >
> > Few days ago, I had an idea to reply here that I am sure that this
> > problem disappeared with this patch.
> > 
> > 
> > Today I got same kernel panic _with_ the patch applied....
> 
> Looks like (all of) the root causes are still not found yet. The symptom
> is exactly the same as the previous one.
> 
> One thing I found today is when txq read_ptr catches up to write_ptr
> (read_ptr == write_ptr), iwl_queue_used() will _always_ return TRUE!
> This will be a problem if the firmware sends us a wrong index
> (sequence), then we will fail the check in this condition. I'm not sure
> if firmware can really send us a wrong sequence. Can you please try this
> patch? Apply it on top of the previous one. If you do see the "FIRMWARE
> BUG" in dmesg, then I think we find the root cause.



Finally exploded....


<3>[  330.761347] iwl3945 0000:06:00.0: Microcode SW error detected. Restarting 0x82000008.
<3>[  330.761403] iwl3945 0000:06:00.0: Start IWL Error Log Dump:
<3>[  330.761421] iwl3945 0000:06:00.0: Status: 0x000302E6, count: 1
<3>[  330.761438] iwl3945 0000:06:00.0: Desc       Time       asrtPC  blink2 ilink1  nmiPC   Line
<3>[  330.761683] iwl3945 0000:06:00.0: SYSASSERT     (#5) 1035963230 0x008B6 0x017B8 0x0031C 0x00000 65
<3>[  330.761692] 
<3>[  330.761837] iwl3945 0000:06:00.0: Start IWL Event Log Dump: display last 20 count
<3>[  330.761895] iwl3945 0000:06:00.0: 1035963151	0x00000000	1106
<3>[  330.761931] iwl3945 0000:06:00.0: 1035963155	0x00000000	1106
<3>[  330.761967] iwl3945 0000:06:00.0: 1035963160	0x00000000	1106
<3>[  330.762003] iwl3945 0000:06:00.0: 1035963164	0x00000000	1106
<3>[  330.762039] iwl3945 0000:06:00.0: 1035963168	0x00000000	1106
<3>[  330.762075] iwl3945 0000:06:00.0: 1035963173	0x00000000	1106
<3>[  330.762112] iwl3945 0000:06:00.0: 1035963177	0x00000000	1106
<3>[  330.762148] iwl3945 0000:06:00.0: 1035963181	0x00000000	1106
<3>[  330.762184] iwl3945 0000:06:00.0: 1035963185	0x00000000	1106
<3>[  330.762220] iwl3945 0000:06:00.0: 1035963190	0x00000001	1106
<3>[  330.762256] iwl3945 0000:06:00.0: 1035963190	0x0000000f	1106
<3>[  330.762292] iwl3945 0000:06:00.0: 1035963192	0x00000417	1105
<3>[  330.762329] iwl3945 0000:06:00.0: 1035963195	0x000005a5	1115
<3>[  330.762365] iwl3945 0000:06:00.0: 1035963199	0x00000001	0463
<3>[  330.762401] iwl3945 0000:06:00.0: 1035963200	0x00000001	0462
<3>[  330.762437] iwl3945 0000:06:00.0: 1035963212	0x000002b1	0603
<3>[  330.762474] iwl3945 0000:06:00.0: 1035963217	0x04fb0077	0401
<3>[  330.762510] iwl3945 0000:06:00.0: 1035963222	0x04dc0018	0401
<3>[  330.762547] iwl3945 0000:06:00.0: 1035963229	0x00000000	0401
<3>[  330.762583] iwl3945 0000:06:00.0: 1035963232	0x00000100	0125
<4>[  330.762623] FIRMWARE BUG: index 220 is given while read_ptr is 252
<0>[  330.762674] ------------[ cut here ]------------
<2>[  330.762689] kernel BUG at /home/maxim/software/kernel/linux-2.6/drivers/net/wireless/iwlwifi/iwl-dev.h:717!
<0>[  330.762715] invalid opcode: 0000 [#1] PREEMPT SMP 
<0>[  330.762735] last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
<4>[  330.762755] CPU 0 
<4>[  330.762765] Modules linked in: nvidia(P) af_packet nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc usb_storage usb_libusual cpufreq_powersave cpufreq_conservative snd_hda_codec_realtek cpufreq_userspace snd_hda_intel acpi_cpufreq uvcvideo iwl3945 snd_hda_codec coretemp videodev iwlcore v4l1_compat snd_hwdep joydev ohci1394 v4l2_compat_ioctl32 tg3 snd_pcm sbp2 mac80211 uhci_hcd psmouse video snd_page_alloc ieee1394 libphy ehci_hcd output ac battery lirc_ene0100 serio_raw evdev cfg80211 rfkill usbcore fuse lzo lzo_decompress lzo_compress
<6>[  330.762987] Pid: 0, comm: swapper Tainted: P           2.6.32-wl #243 Aspire 5720     
<6>[  330.763011] RIP: 0010:[<ffffffffa01c3fe7>]  [<ffffffffa01c3fe7>] iwl_tx_cmd_complete+0x397/0x3a0 [iwlcore]
<6>[  330.763057] RSP: 0018:ffff880002203d20  EFLAGS: 00010286
<6>[  330.763075] RAX: 0000000000000049 RBX: ffff88006fc09b40 RCX: 0000000000000102
<6>[  330.763096] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
<6>[  330.763118] RBP: ffff880002203d90 R08: 0000000000000002 R09: 0000000000000000
<6>[  330.763139] R10: 0000000000000000 R11: 0000000000000001 R12: 00000000000000fc
<6>[  330.763160] R13: 00000000000000dc R14: ffff88006fd6d460 R15: ffff88006a2a1600
<6>[  330.763182] FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
<6>[  330.763206] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<6>[  330.763224] CR2: 00007f7ffbf95000 CR3: 0000000001001000 CR4: 00000000000006f0
<6>[  330.763246] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<6>[  330.763267] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[  330.763289] Process swapper (pid: 0, threadinfo ffffffff815ca000, task ffffffff815ceb60)
<0>[  330.763310] Stack:
<4>[  330.763318]  ffff88006a2a9690 0000000000000046 ffff880002203d40 0000000000000046
<4>[  330.763346] <0> 0000000000000282 0000000000000282 0000000000000282 ffff8800699eba00
<4>[  330.763375] <0> ffff880002203d70 ffff88006a2a1600 ffff88006a2a2f10 ffff88006a673000
<0>[  330.763406] Call Trace:
<0>[  330.763416]  <IRQ> 
<4>[  330.763436]  [<ffffffffa0230d13>] iwl3945_irq_tasklet+0x673/0x1740 [iwl3945]
<4>[  330.763469]  [<ffffffffa01b683e>] ? iwl_isr_legacy+0x3e/0x600 [iwlcore]
<4>[  330.763496]  [<ffffffff81053f56>] tasklet_action+0x106/0x110
<4>[  330.763516]  [<ffffffff81055674>] __do_softirq+0x114/0x2c0
<4>[  330.763536]  [<ffffffff8109eb38>] ? handle_edge_irq+0x78/0x160
<4>[  330.763558]  [<ffffffff8100d25c>] call_softirq+0x1c/0x30
<4>[  330.763577]  [<ffffffff8100f1fd>] do_softirq+0x7d/0xb0
<4>[  330.763595]  [<ffffffff81055125>] irq_exit+0x95/0xa0
<4>[  330.763615]  [<ffffffff8140bddc>] do_IRQ+0x7c/0xf0
<4>[  330.763634]  [<ffffffff8100ca13>] ret_from_intr+0x0/0xf
<0>[  330.763650]  <EOI> 
<4>[  330.763664]  [<ffffffff812a1d5b>] ? acpi_idle_enter_bm+0x275/0x2aa
<4>[  330.763684]  [<ffffffff812a1d65>] ? acpi_idle_enter_bm+0x27f/0x2aa
<4>[  330.763705]  [<ffffffff812a1d5b>] ? acpi_idle_enter_bm+0x275/0x2aa
<4>[  330.763729]  [<ffffffff8133227f>] ? cpuidle_idle_call+0x9f/0x160
<4>[  330.763749]  [<ffffffff8100b1df>] ? cpu_idle+0xaf/0x110
<4>[  330.763769]  [<ffffffff813f534a>] ? rest_init+0x7a/0x80
<4>[  330.763789]  [<ffffffff81640d01>] ? start_kernel+0x3ac/0x3b8
<4>[  330.763808]  [<ffffffff81640315>] ? x86_64_start_reservations+0x125/0x129
<4>[  330.763831]  [<ffffffff816403fd>] ? x86_64_start_kernel+0xe4/0xeb
<0>[  330.763848] Code: c7 c6 48 86 1d a0 48 c7 c7 53 86 1d a0 e8 a2 b3 07 e1 e9 68 fe ff ff 44 89 e2 44 89 ee 48 c7 c7 80 5a 1d a0 31 c0 e8 c7 31 24 e1 <0f> 0b eb fe 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 
<1>[  330.764008] RIP  [<ffffffffa01c3fe7>] iwl_tx_cmd_complete+0x397/0x3a0 [iwlcore]
<4>[  330.764042]  RSP <ffff880002203d20>
<4>[  330.764059] ---[ end trace e386a588d450d76f ]---
<0>[  330.764073] Kernel panic - not syncing: Fatal exception in interrupt
<4>[  330.764093] Pid: 0, comm: swapper Tainted: P      D    2.6.32-wl #243
<4>[  330.764111] Call Trace:
<4>[  330.764120]  <IRQ>  [<ffffffff814070f1>] panic+0x82/0x13f
<4>[  330.764148]  [<ffffffff81010a52>] oops_end+0xe2/0xf0
<4>[  330.764166]  [<ffffffff81010c7b>] die+0x5b/0x90
<4>[  330.764183]  [<ffffffff8100db24>] do_trap+0xc4/0x170
<4>[  330.764201]  [<ffffffff8100e025>] do_invalid_op+0x95/0xb0
<4>[  330.764230]  [<ffffffffa01c3fe7>] ? iwl_tx_cmd_complete+0x397/0x3a0 [iwlcore]
<4>[  330.764254]  [<ffffffff8140a94d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
<4>[  330.764276]  [<ffffffff8100cef5>] invalid_op+0x15/0x20
<4>[  330.764304]  [<ffffffffa01c3fe7>] ? iwl_tx_cmd_complete+0x397/0x3a0 [iwlcore]
<4>[  330.764333]  [<ffffffffa0230d13>] iwl3945_irq_tasklet+0x673/0x1740 [iwl3945]
<4>[  330.764364]  [<ffffffffa01b683e>] ? iwl_isr_legacy+0x3e/0x600 [iwlcore]
<4>[  330.764388]  [<ffffffff81053f56>] tasklet_action+0x106/0x110
<4>[  330.764408]  [<ffffffff81055674>] __do_softirq+0x114/0x2c0
<4>[  330.764427]  [<ffffffff8109eb38>] ? handle_edge_irq+0x78/0x160
<4>[  330.764447]  [<ffffffff8100d25c>] call_softirq+0x1c/0x30
<4>[  330.764465]  [<ffffffff8100f1fd>] do_softirq+0x7d/0xb0
<4>[  330.764483]  [<ffffffff81055125>] irq_exit+0x95/0xa0
<4>[  330.764500]  [<ffffffff8140bddc>] do_IRQ+0x7c/0xf0
<4>[  330.764519]  [<ffffffff8100ca13>] ret_from_intr+0x0/0xf
<4>[  330.764534]  <EOI>  [<ffffffff812a1d5b>] ? acpi_idle_enter_bm+0x275/0x2aa
<4>[  330.764562]  [<ffffffff812a1d65>] ? acpi_idle_enter_bm+0x27f/0x2aa
<4>[  330.764583]  [<ffffffff812a1d5b>] ? acpi_idle_enter_bm+0x275/0x2aa
<4>[  330.764605]  [<ffffffff8133227f>] ? cpuidle_idle_call+0x9f/0x160
<4>[  330.764625]  [<ffffffff8100b1df>] ? cpu_idle+0xaf/0x110
<4>[  330.764643]  [<ffffffff813f534a>] ? rest_init+0x7a/0x80
<4>[  330.764662]  [<ffffffff81640d01>] ? start_kernel+0x3ac/0x3b8
<4>[  330.764682]  [<ffffffff81640315>] ? x86_64_start_reservations+0x125/0x129
<4>[  330.764703]  [<ffffffff816403fd>] ? x86_64_start_kernel+0xe4/0xeb



Note that these days I see the 'Microcode SW error detected' somewhat more often that I used to.


This is the combined patch I used:


diff --git a/drivers/net/wireless/iwlwifi/iwl-dev.h b/drivers/net/wireless/iwlwifi/iwl-dev.h
index 2673e9a..8efc469 100644
--- a/drivers/net/wireless/iwlwifi/iwl-dev.h
+++ b/drivers/net/wireless/iwlwifi/iwl-dev.h
@@ -711,7 +711,13 @@ extern void iwl_txq_ctx_stop(struct iwl_priv *priv);
 extern int iwl_queue_space(const struct iwl_queue *q);
 static inline int iwl_queue_used(const struct iwl_queue *q, int i)
 {
-       return q->write_ptr > q->read_ptr ?
+       if (q->write_ptr == q->read_ptr) {
+               printk("FIRMWARE BUG: index %d is given while read_ptr is %d\n",
+                      i, q->read_ptr);
+               BUG();
+       }
+
+       return q->write_ptr >= q->read_ptr ?
                (i >= q->read_ptr && i < q->write_ptr) :
                !(i < q->read_ptr && i >= q->write_ptr);
 }
diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
index 2a28a1f..a36de73 100644
--- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
+++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
@@ -548,6 +548,9 @@ static int iwl3945_tx_skb(struct iwl_priv *priv, struct sk_buff *skb)
        txq = &priv->txq[txq_id];
        q = &txq->q;
 
+       if ((iwl_queue_space(q) < q->high_mark))
+               goto drop;
+
        spin_lock_irqsave(&priv->lock, flags);
 
        idx = get_cmd_index(q, q->write_ptr, 0);


Best regards,
	Maxim Levitsky


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* RE: [ipw3945-devel] Panic in iwl3945 driver
  2010-01-05 16:56                   ` Maxim Levitsky
@ 2010-01-05 22:16                     ` Cahill, Ben M
  2010-01-05 22:26                       ` Cahill, Ben M
  2010-01-06  3:55                     ` Zhu Yi
  1 sibling, 1 reply; 20+ messages in thread
From: Cahill, Ben M @ 2010-01-05 22:16 UTC (permalink / raw)
  To: Maxim Levitsky, Zhu, Yi
  Cc: Chatre, Reinette, linux-wireless, iwlwifi maling list

uCode is receiving bad command from command queue ... See below (search "Bad").

-- Ben --

>-----Original Message-----
>From: Maxim Levitsky [mailto:maximlevitsky@gmail.com]
>Sent: Tuesday, January 05, 2010 11:56 AM
>To: Zhu, Yi
>Cc: Chatre, Reinette; linux-wireless; iwlwifi maling list
>Subject: Re: [ipw3945-devel] Panic in iwl3945 driver
>
>On Tue, 2009-12-22 at 16:57 +0800, Zhu Yi wrote:
>> On Tue, 2009-12-22 at 00:40 +0800, Maxim Levitsky wrote:
>> > On Wed, 2009-12-02 at 19:17 +0200, Maxim Levitsky wrote:
>> > > On Wed, 2009-12-02 at 13:42 +0800, Zhu Yi wrote:
>> > > > On Tue, 2009-12-01 at 17:28 +0800, Zhu Yi wrote:
>> > > > > On Tue, 2009-12-01 at 06:35 +0800, Maxim Levitsky wrote:
>> > > > > > 0x000000000001668e <iwl3945_rx_reply_tx+302>:
>lea    0x38(%r8),%rdi
>> > > > > > 0x0000000000016692 <iwl3945_rx_reply_tx+306>:
>lea    0x4f(%r8),%rax
>> > > > >
>> > > > > When this happened, from your previous post, r8 is
>0x0 and rdi is 0x38.
>> > > > > Since "info" is %rdi (see below), this means
>> > > > > txq->txb[txq->q.read_ptr].skb[0], aka. r8 is 0.
>> > > > >
>> > > > > >    rate_idx = iwl3945_hwrate_to_plcp_idx(tx_resp->rate);
>> > > > > >
>> > > > > > 0x0000000000016696 <iwl3945_rx_reply_tx+310>:
>movb   $0x0,0x9(%rdi)        <---------- RIP
>> > > > > > 0x000000000001669a <iwl3945_rx_reply_tx+314>:
>movb   $0x0,0xc(%rdi)
>> > > > > > 0x000000000001669e <iwl3945_rx_reply_tx+318>:
>movb   $0x0,0xf(%rdi)
>> > > > > > 0x00000000000166a2 <iwl3945_rx_reply_tx+322>:
>movb   $0x0,0x12(%rdi)
>> > > > > > 0x00000000000166a6 <iwl3945_rx_reply_tx+326>:
>movb   $0x0,0x15(%rdi)
>> > > > >
>> > > > > This equals to below code in
>ieee80211_tx_info_clear_status(). "info" is
>> > > > > %rdi, which is 0x38. That matches NULL pointer
>dereference at 0x41 in
>> > > > > your oops header.
>> > > > >
>> > > > >      for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
>> > > > >                 info->status.rates[i].count = 0;
>> > > > >
>> > > > > I guess there is a race for txq->q.read_ptr
>somewhere. Haven't checked
>> > > > > though.
>> > > >
>> > > > OK. 3945 updated write_ptr without regard to read_ptr
>on the Tx path.
>> > > > This messes up our TFD on high load. The patch should
>fix your problem.
>> > > >
>> > > > Signed-off-by: Zhu Yi <yi.zhu@intel.com>
>> > > >
>> > > > diff --git
>a/drivers/net/wireless/iwlwifi/iwl3945-base.c
>b/drivers/net/wireless/iwlwifi/iwl3945-base.c
>> > > > index 994db4a..b31b34c 100644
>> > > > --- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
>> > > > +++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
>> > > > @@ -548,6 +548,9 @@ static int iwl3945_tx_skb(struct
>iwl_priv *priv, struct sk_buff *skb)
>> > > >        txq = &priv->txq[txq_id];
>> > > >        q = &txq->q;
>> > > >
>> > > > +      if ((iwl_queue_space(q) < q->high_mark))
>> > > > +              goto drop;
>> > > > +
>> > > >        spin_lock_irqsave(&priv->lock, flags);
>> > > >
>> > > >        idx = get_cmd_index(q, q->write_ptr, 0);
>> > > >
>> > > >
>> > Few days ago, I had an idea to reply here that I am sure that this
>> > problem disappeared with this patch.
>> >
>> >
>> > Today I got same kernel panic _with_ the patch applied....
>>
>> Looks like (all of) the root causes are still not found yet.
>The symptom
>> is exactly the same as the previous one.
>>
>> One thing I found today is when txq read_ptr catches up to write_ptr
>> (read_ptr == write_ptr), iwl_queue_used() will _always_ return TRUE!
>> This will be a problem if the firmware sends us a wrong index
>> (sequence), then we will fail the check in this condition.
>I'm not sure
>> if firmware can really send us a wrong sequence. Can you
>please try this
>> patch? Apply it on top of the previous one. If you do see
>the "FIRMWARE
>> BUG" in dmesg, then I think we find the root cause.
>
>
>
>Finally exploded....
>
>
><3>[  330.761347] iwl3945 0000:06:00.0: Microcode SW error
>detected. Restarting 0x82000008.
><3>[  330.761403] iwl3945 0000:06:00.0: Start IWL Error Log Dump:
><3>[  330.761421] iwl3945 0000:06:00.0: Status: 0x000302E6, count: 1
><3>[  330.761438] iwl3945 0000:06:00.0: Desc       Time
>asrtPC  blink2 ilink1  nmiPC   Line
><3>[  330.761683] iwl3945 0000:06:00.0: SYSASSERT     (#5)
>1035963230 0x008B6 0x017B8 0x0031C 0x00000 65

"Bad Command" sysassert ... See below (search "obviously").

><3>[  330.761692]
><3>[  330.761837] iwl3945 0000:06:00.0: Start IWL Event Log
>Dump: display last 20 count
><3>[  330.761895] iwl3945 0000:06:00.0: 1035963151
>0x00000000     1106
><3>[  330.761931] iwl3945 0000:06:00.0: 1035963155
>0x00000000     1106
><3>[  330.761967] iwl3945 0000:06:00.0: 1035963160
>0x00000000     1106
><3>[  330.762003] iwl3945 0000:06:00.0: 1035963164
>0x00000000     1106
><3>[  330.762039] iwl3945 0000:06:00.0: 1035963168
>0x00000000     1106
><3>[  330.762075] iwl3945 0000:06:00.0: 1035963173
>0x00000000     1106
><3>[  330.762112] iwl3945 0000:06:00.0: 1035963177
>0x00000000     1106
><3>[  330.762148] iwl3945 0000:06:00.0: 1035963181
>0x00000000     1106
><3>[  330.762184] iwl3945 0000:06:00.0: 1035963185
>0x00000000     1106
><3>[  330.762220] iwl3945 0000:06:00.0: 1035963190
>0x00000001     1106
><3>[  330.762256] iwl3945 0000:06:00.0: 1035963190
>0x0000000f     1106
><3>[  330.762292] iwl3945 0000:06:00.0: 1035963192
>0x00000417     1105
><3>[  330.762329] iwl3945 0000:06:00.0: 1035963195
>0x000005a5     1115
><3>[  330.762365] iwl3945 0000:06:00.0: 1035963199
>0x00000001     0463
><3>[  330.762401] iwl3945 0000:06:00.0: 1035963200
>0x00000001     0462
><3>[  330.762437] iwl3945 0000:06:00.0: 1035963212
>0x000002b1     0603
><3>[  330.762474] iwl3945 0000:06:00.0: 1035963217
>0x04fb0077     0401
><3>[  330.762510] iwl3945 0000:06:00.0: 1035963222
>0x04dc0018     0401

Last good command 0x18, with (driver provided) header indicating sent via queue 4, queue index 0xdc

><3>[  330.762547] iwl3945 0000:06:00.0: 1035963229
>0x00000000     0401

This shows command "0" (non-existant/unsupported/bad command) with (driver provided) header indicating sent via queue "0", index "0"; obviously bad data in command queue.

-- Ben --

><3>[  330.762583] iwl3945 0000:06:00.0: 1035963232
>0x00000100     0125
><4>[  330.762623] FIRMWARE BUG: index 220 is given while
>read_ptr is 252
><0>[  330.762674] ------------[ cut here ]------------
><2>[  330.762689] kernel BUG at
>/home/maxim/software/kernel/linux-2.6/drivers/net/wireless/iwlw
>ifi/iwl-dev.h:717!
><0>[  330.762715] invalid opcode: 0000 [#1] PREEMPT SMP
><0>[  330.762735] last sysfs file:
>/sys/devices/platform/coretemp.1/temp1_input
><4>[  330.762755] CPU 0
><4>[  330.762765] Modules linked in: nvidia(P) af_packet nfsd
>exportfs nfs lockd nfs_acl auth_rpcgss sunrpc usb_storage
>usb_libusual cpufreq_powersave cpufreq_conservative
>snd_hda_codec_realtek cpufreq_userspace snd_hda_intel
>acpi_cpufreq uvcvideo iwl3945 snd_hda_codec coretemp videodev
>iwlcore v4l1_compat snd_hwdep joydev ohci1394
>v4l2_compat_ioctl32 tg3 snd_pcm sbp2 mac80211 uhci_hcd psmouse
>video snd_page_alloc ieee1394 libphy ehci_hcd output ac
>battery lirc_ene0100 serio_raw evdev cfg80211 rfkill usbcore
>fuse lzo lzo_decompress lzo_compress
><6>[  330.762987] Pid: 0, comm: swapper Tainted: P
>2.6.32-wl #243 Aspire 5720
><6>[  330.763011] RIP: 0010:[<ffffffffa01c3fe7>]
>[<ffffffffa01c3fe7>] iwl_tx_cmd_complete+0x397/0x3a0 [iwlcore]
><6>[  330.763057] RSP: 0018:ffff880002203d20  EFLAGS: 00010286
><6>[  330.763075] RAX: 0000000000000049 RBX: ffff88006fc09b40
>RCX: 0000000000000102
><6>[  330.763096] RDX: 0000000000000000 RSI: 0000000000000000
>RDI: 0000000000000001
><6>[  330.763118] RBP: ffff880002203d90 R08: 0000000000000002
>R09: 0000000000000000
><6>[  330.763139] R10: 0000000000000000 R11: 0000000000000001
>R12: 00000000000000fc
><6>[  330.763160] R13: 00000000000000dc R14: ffff88006fd6d460
>R15: ffff88006a2a1600
><6>[  330.763182] FS:  0000000000000000(0000)
>GS:ffff880002200000(0000) knlGS:0000000000000000
><6>[  330.763206] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
><6>[  330.763224] CR2: 00007f7ffbf95000 CR3: 0000000001001000
>CR4: 00000000000006f0
><6>[  330.763246] DR0: 0000000000000000 DR1: 0000000000000000
>DR2: 0000000000000000
><6>[  330.763267] DR3: 0000000000000000 DR6: 00000000ffff0ff0
>DR7: 0000000000000400
><4>[  330.763289] Process swapper (pid: 0, threadinfo
>ffffffff815ca000, task ffffffff815ceb60)
><0>[  330.763310] Stack:
><4>[  330.763318]  ffff88006a2a9690 0000000000000046
>ffff880002203d40 0000000000000046
><4>[  330.763346] <0> 0000000000000282 0000000000000282
>0000000000000282 ffff8800699eba00
><4>[  330.763375] <0> ffff880002203d70 ffff88006a2a1600
>ffff88006a2a2f10 ffff88006a673000
><0>[  330.763406] Call Trace:
><0>[  330.763416]  <IRQ>
><4>[  330.763436]  [<ffffffffa0230d13>]
>iwl3945_irq_tasklet+0x673/0x1740 [iwl3945]
><4>[  330.763469]  [<ffffffffa01b683e>] ?
>iwl_isr_legacy+0x3e/0x600 [iwlcore]
><4>[  330.763496]  [<ffffffff81053f56>] tasklet_action+0x106/0x110
><4>[  330.763516]  [<ffffffff81055674>] __do_softirq+0x114/0x2c0
><4>[  330.763536]  [<ffffffff8109eb38>] ? handle_edge_irq+0x78/0x160
><4>[  330.763558]  [<ffffffff8100d25c>] call_softirq+0x1c/0x30
><4>[  330.763577]  [<ffffffff8100f1fd>] do_softirq+0x7d/0xb0
><4>[  330.763595]  [<ffffffff81055125>] irq_exit+0x95/0xa0
><4>[  330.763615]  [<ffffffff8140bddc>] do_IRQ+0x7c/0xf0
><4>[  330.763634]  [<ffffffff8100ca13>] ret_from_intr+0x0/0xf
><0>[  330.763650]  <EOI>
><4>[  330.763664]  [<ffffffff812a1d5b>] ?
>acpi_idle_enter_bm+0x275/0x2aa
><4>[  330.763684]  [<ffffffff812a1d65>] ?
>acpi_idle_enter_bm+0x27f/0x2aa
><4>[  330.763705]  [<ffffffff812a1d5b>] ?
>acpi_idle_enter_bm+0x275/0x2aa
><4>[  330.763729]  [<ffffffff8133227f>] ? cpuidle_idle_call+0x9f/0x160
><4>[  330.763749]  [<ffffffff8100b1df>] ? cpu_idle+0xaf/0x110
><4>[  330.763769]  [<ffffffff813f534a>] ? rest_init+0x7a/0x80
><4>[  330.763789]  [<ffffffff81640d01>] ? start_kernel+0x3ac/0x3b8
><4>[  330.763808]  [<ffffffff81640315>] ?
>x86_64_start_reservations+0x125/0x129
><4>[  330.763831]  [<ffffffff816403fd>] ? x86_64_start_kernel+0xe4/0xeb
><0>[  330.763848] Code: c7 c6 48 86 1d a0 48 c7 c7 53 86 1d a0
>e8 a2 b3 07 e1 e9 68 fe ff ff 44 89 e2 44 89 ee 48 c7 c7 80 5a
>1d a0 31 c0 e8 c7 31 24 e1 <0f> 0b eb fe 0f 1f 44 00 00 55 48
>89 e5 41 57 41 56 41 55 41 54
><1>[  330.764008] RIP  [<ffffffffa01c3fe7>]
>iwl_tx_cmd_complete+0x397/0x3a0 [iwlcore]
><4>[  330.764042]  RSP <ffff880002203d20>
><4>[  330.764059] ---[ end trace e386a588d450d76f ]---
><0>[  330.764073] Kernel panic - not syncing: Fatal exception
>in interrupt
><4>[  330.764093] Pid: 0, comm: swapper Tainted: P      D
>2.6.32-wl #243
><4>[  330.764111] Call Trace:
><4>[  330.764120]  <IRQ>  [<ffffffff814070f1>] panic+0x82/0x13f
><4>[  330.764148]  [<ffffffff81010a52>] oops_end+0xe2/0xf0
><4>[  330.764166]  [<ffffffff81010c7b>] die+0x5b/0x90
><4>[  330.764183]  [<ffffffff8100db24>] do_trap+0xc4/0x170
><4>[  330.764201]  [<ffffffff8100e025>] do_invalid_op+0x95/0xb0
><4>[  330.764230]  [<ffffffffa01c3fe7>] ?
>iwl_tx_cmd_complete+0x397/0x3a0 [iwlcore]
><4>[  330.764254]  [<ffffffff8140a94d>] ?
>trace_hardirqs_off_thunk+0x3a/0x3c
><4>[  330.764276]  [<ffffffff8100cef5>] invalid_op+0x15/0x20
><4>[  330.764304]  [<ffffffffa01c3fe7>] ?
>iwl_tx_cmd_complete+0x397/0x3a0 [iwlcore]
><4>[  330.764333]  [<ffffffffa0230d13>]
>iwl3945_irq_tasklet+0x673/0x1740 [iwl3945]
><4>[  330.764364]  [<ffffffffa01b683e>] ?
>iwl_isr_legacy+0x3e/0x600 [iwlcore]
><4>[  330.764388]  [<ffffffff81053f56>] tasklet_action+0x106/0x110
><4>[  330.764408]  [<ffffffff81055674>] __do_softirq+0x114/0x2c0
><4>[  330.764427]  [<ffffffff8109eb38>] ? handle_edge_irq+0x78/0x160
><4>[  330.764447]  [<ffffffff8100d25c>] call_softirq+0x1c/0x30
><4>[  330.764465]  [<ffffffff8100f1fd>] do_softirq+0x7d/0xb0
><4>[  330.764483]  [<ffffffff81055125>] irq_exit+0x95/0xa0
><4>[  330.764500]  [<ffffffff8140bddc>] do_IRQ+0x7c/0xf0
><4>[  330.764519]  [<ffffffff8100ca13>] ret_from_intr+0x0/0xf
><4>[  330.764534]  <EOI>  [<ffffffff812a1d5b>] ?
>acpi_idle_enter_bm+0x275/0x2aa
><4>[  330.764562]  [<ffffffff812a1d65>] ?
>acpi_idle_enter_bm+0x27f/0x2aa
><4>[  330.764583]  [<ffffffff812a1d5b>] ?
>acpi_idle_enter_bm+0x275/0x2aa
><4>[  330.764605]  [<ffffffff8133227f>] ? cpuidle_idle_call+0x9f/0x160
><4>[  330.764625]  [<ffffffff8100b1df>] ? cpu_idle+0xaf/0x110
><4>[  330.764643]  [<ffffffff813f534a>] ? rest_init+0x7a/0x80
><4>[  330.764662]  [<ffffffff81640d01>] ? start_kernel+0x3ac/0x3b8
><4>[  330.764682]  [<ffffffff81640315>] ?
>x86_64_start_reservations+0x125/0x129
><4>[  330.764703]  [<ffffffff816403fd>] ? x86_64_start_kernel+0xe4/0xeb
>
>
>
>Note that these days I see the 'Microcode SW error detected'
>somewhat more often that I used to.
>
>
>This is the combined patch I used:
>
>
>diff --git a/drivers/net/wireless/iwlwifi/iwl-dev.h
>b/drivers/net/wireless/iwlwifi/iwl-dev.h
>index 2673e9a..8efc469 100644
>--- a/drivers/net/wireless/iwlwifi/iwl-dev.h
>+++ b/drivers/net/wireless/iwlwifi/iwl-dev.h
>@@ -711,7 +711,13 @@ extern void iwl_txq_ctx_stop(struct
>iwl_priv *priv);
> extern int iwl_queue_space(const struct iwl_queue *q);
> static inline int iwl_queue_used(const struct iwl_queue *q, int i)
> {
>-       return q->write_ptr > q->read_ptr ?
>+       if (q->write_ptr == q->read_ptr) {
>+               printk("FIRMWARE BUG: index %d is given while
>read_ptr is %d\n",
>+                      i, q->read_ptr);
>+               BUG();
>+       }
>+
>+       return q->write_ptr >= q->read_ptr ?
>                (i >= q->read_ptr && i < q->write_ptr) :
>                !(i < q->read_ptr && i >= q->write_ptr);
> }
>diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c
>b/drivers/net/wireless/iwlwifi/iwl3945-base.c
>index 2a28a1f..a36de73 100644
>--- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
>+++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
>@@ -548,6 +548,9 @@ static int iwl3945_tx_skb(struct iwl_priv
>*priv, struct sk_buff *skb)
>        txq = &priv->txq[txq_id];
>        q = &txq->q;
>
>+       if ((iwl_queue_space(q) < q->high_mark))
>+               goto drop;
>+
>        spin_lock_irqsave(&priv->lock, flags);
>
>        idx = get_cmd_index(q, q->write_ptr, 0);
>
>
>Best regards,
>       Maxim Levitsky
>
>
>---------------------------------------------------------------
>---------------
>This SF.Net email is sponsored by the Verizon Developer Community
>Take advantage of Verizon's best-in-class app development support
>A streamlined, 14 day to market process makes app distribution
>fast and easy
>Join now and get one step closer to millions of Verizon customers
>http://p.sf.net/sfu/verizon-dev2dev
>_______________________________________________
>Ipw3945-devel mailing list
>Ipw3945-devel@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/ipw3945-devel
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [ipw3945-devel] Panic in iwl3945 driver
  2010-01-05 22:16                     ` [ipw3945-devel] " Cahill, Ben M
@ 2010-01-05 22:26                       ` Cahill, Ben M
  2010-01-06  3:57                         ` Zhu Yi
  0 siblings, 1 reply; 20+ messages in thread
From: Cahill, Ben M @ 2010-01-05 22:26 UTC (permalink / raw)
  To: Cahill, Ben M, Maxim Levitsky, Zhu, Yi
  Cc: Chatre, Reinette, linux-wireless, iwlwifi maling list



>-----Original Message-----
>From: Cahill, Ben M [mailto:ben.m.cahill@intel.com]
>Sent: Tuesday, January 05, 2010 5:16 PM
>To: Maxim Levitsky; Zhu, Yi
>Cc: Chatre, Reinette; linux-wireless; iwlwifi maling list
>Subject: Re: [ipw3945-devel] Panic in iwl3945 driver
>
>uCode is receiving bad command from command queue ... See
>below (search "Bad").
>
>-- Ben --
>
>>-----Original Message-----
>>From: Maxim Levitsky [mailto:maximlevitsky@gmail.com]
>>Sent: Tuesday, January 05, 2010 11:56 AM
>>To: Zhu, Yi
>>Cc: Chatre, Reinette; linux-wireless; iwlwifi maling list
>>Subject: Re: [ipw3945-devel] Panic in iwl3945 driver
>>
>>On Tue, 2009-12-22 at 16:57 +0800, Zhu Yi wrote:
>>> On Tue, 2009-12-22 at 00:40 +0800, Maxim Levitsky wrote:
>>> > On Wed, 2009-12-02 at 19:17 +0200, Maxim Levitsky wrote:
>>> > > On Wed, 2009-12-02 at 13:42 +0800, Zhu Yi wrote:
>>> > > > On Tue, 2009-12-01 at 17:28 +0800, Zhu Yi wrote:
>>> > > > > On Tue, 2009-12-01 at 06:35 +0800, Maxim Levitsky wrote:
>>> > > > > > 0x000000000001668e <iwl3945_rx_reply_tx+302>:
>>lea    0x38(%r8),%rdi
>>> > > > > > 0x0000000000016692 <iwl3945_rx_reply_tx+306>:
>>lea    0x4f(%r8),%rax
>>> > > > >
>>> > > > > When this happened, from your previous post, r8 is
>>0x0 and rdi is 0x38.
>>> > > > > Since "info" is %rdi (see below), this means
>>> > > > > txq->txb[txq->q.read_ptr].skb[0], aka. r8 is 0.
>>> > > > >
>>> > > > > >    rate_idx = iwl3945_hwrate_to_plcp_idx(tx_resp->rate);
>>> > > > > >
>>> > > > > > 0x0000000000016696 <iwl3945_rx_reply_tx+310>:
>>movb   $0x0,0x9(%rdi)        <---------- RIP
>>> > > > > > 0x000000000001669a <iwl3945_rx_reply_tx+314>:
>>movb   $0x0,0xc(%rdi)
>>> > > > > > 0x000000000001669e <iwl3945_rx_reply_tx+318>:
>>movb   $0x0,0xf(%rdi)
>>> > > > > > 0x00000000000166a2 <iwl3945_rx_reply_tx+322>:
>>movb   $0x0,0x12(%rdi)
>>> > > > > > 0x00000000000166a6 <iwl3945_rx_reply_tx+326>:
>>movb   $0x0,0x15(%rdi)
>>> > > > >
>>> > > > > This equals to below code in
>>ieee80211_tx_info_clear_status(). "info" is
>>> > > > > %rdi, which is 0x38. That matches NULL pointer
>>dereference at 0x41 in
>>> > > > > your oops header.
>>> > > > >
>>> > > > >      for (i = 0; i < IEEE80211_TX_MAX_RATES; i++)
>>> > > > >                 info->status.rates[i].count = 0;
>>> > > > >
>>> > > > > I guess there is a race for txq->q.read_ptr
>>somewhere. Haven't checked
>>> > > > > though.
>>> > > >
>>> > > > OK. 3945 updated write_ptr without regard to read_ptr
>>on the Tx path.
>>> > > > This messes up our TFD on high load. The patch should
>>fix your problem.
>>> > > >
>>> > > > Signed-off-by: Zhu Yi <yi.zhu@intel.com>
>>> > > >
>>> > > > diff --git
>>a/drivers/net/wireless/iwlwifi/iwl3945-base.c
>>b/drivers/net/wireless/iwlwifi/iwl3945-base.c
>>> > > > index 994db4a..b31b34c 100644
>>> > > > --- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
>>> > > > +++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
>>> > > > @@ -548,6 +548,9 @@ static int iwl3945_tx_skb(struct
>>iwl_priv *priv, struct sk_buff *skb)
>>> > > >        txq = &priv->txq[txq_id];
>>> > > >        q = &txq->q;
>>> > > >
>>> > > > +      if ((iwl_queue_space(q) < q->high_mark))
>>> > > > +              goto drop;
>>> > > > +
>>> > > >        spin_lock_irqsave(&priv->lock, flags);
>>> > > >
>>> > > >        idx = get_cmd_index(q, q->write_ptr, 0);
>>> > > >
>>> > > >
>>> > Few days ago, I had an idea to reply here that I am sure that this
>>> > problem disappeared with this patch.
>>> >
>>> >
>>> > Today I got same kernel panic _with_ the patch applied....
>>>
>>> Looks like (all of) the root causes are still not found yet.
>>The symptom
>>> is exactly the same as the previous one.
>>>
>>> One thing I found today is when txq read_ptr catches up to write_ptr
>>> (read_ptr == write_ptr), iwl_queue_used() will _always_ return TRUE!
>>> This will be a problem if the firmware sends us a wrong index
>>> (sequence), then we will fail the check in this condition.
>>I'm not sure
>>> if firmware can really send us a wrong sequence. Can you
>>please try this
>>> patch? Apply it on top of the previous one. If you do see
>>the "FIRMWARE
>>> BUG" in dmesg, then I think we find the root cause.
>>
>>
>>
>>Finally exploded....
>>
>>
>><3>[  330.761347] iwl3945 0000:06:00.0: Microcode SW error
>>detected. Restarting 0x82000008.
>><3>[  330.761403] iwl3945 0000:06:00.0: Start IWL Error Log Dump:
>><3>[  330.761421] iwl3945 0000:06:00.0: Status: 0x000302E6, count: 1
>><3>[  330.761438] iwl3945 0000:06:00.0: Desc       Time
>>asrtPC  blink2 ilink1  nmiPC   Line
>><3>[  330.761683] iwl3945 0000:06:00.0: SYSASSERT     (#5)
>>1035963230 0x008B6 0x017B8 0x0031C 0x00000 65
>
>"Bad Command" sysassert ... See below (search "obviously").
>
>><3>[  330.761692]
>><3>[  330.761837] iwl3945 0000:06:00.0: Start IWL Event Log
>>Dump: display last 20 count
>><3>[  330.761895] iwl3945 0000:06:00.0: 1035963151
>>0x00000000     1106
>><3>[  330.761931] iwl3945 0000:06:00.0: 1035963155
>>0x00000000     1106
>><3>[  330.761967] iwl3945 0000:06:00.0: 1035963160
>>0x00000000     1106
>><3>[  330.762003] iwl3945 0000:06:00.0: 1035963164
>>0x00000000     1106
>><3>[  330.762039] iwl3945 0000:06:00.0: 1035963168
>>0x00000000     1106
>><3>[  330.762075] iwl3945 0000:06:00.0: 1035963173
>>0x00000000     1106
>><3>[  330.762112] iwl3945 0000:06:00.0: 1035963177
>>0x00000000     1106
>><3>[  330.762148] iwl3945 0000:06:00.0: 1035963181
>>0x00000000     1106
>><3>[  330.762184] iwl3945 0000:06:00.0: 1035963185
>>0x00000000     1106
>><3>[  330.762220] iwl3945 0000:06:00.0: 1035963190
>>0x00000001     1106
>><3>[  330.762256] iwl3945 0000:06:00.0: 1035963190
>>0x0000000f     1106
>><3>[  330.762292] iwl3945 0000:06:00.0: 1035963192
>>0x00000417     1105
>><3>[  330.762329] iwl3945 0000:06:00.0: 1035963195
>>0x000005a5     1115
>><3>[  330.762365] iwl3945 0000:06:00.0: 1035963199
>>0x00000001     0463
>><3>[  330.762401] iwl3945 0000:06:00.0: 1035963200
>>0x00000001     0462
>><3>[  330.762437] iwl3945 0000:06:00.0: 1035963212
>>0x000002b1     0603
>><3>[  330.762474] iwl3945 0000:06:00.0: 1035963217
>>0x04fb0077     0401
>><3>[  330.762510] iwl3945 0000:06:00.0: 1035963222
>>0x04dc0018     0401
>
>Last good command 0x18, with (driver provided) header
>indicating sent via queue 4, queue index 0xdc
>
>><3>[  330.762547] iwl3945 0000:06:00.0: 1035963229
>>0x00000000     0401
>
>This shows command "0" (non-existant/unsupported/bad command)
>with (driver provided) header indicating sent via queue "0",
>index "0"; obviously bad data in command queue.

Or index ("pointer") is getting out of sync.  (Reminder to self; never use the word "obviously").

-- Ben --
>
>-- Ben --
>
>><3>[  330.762583] iwl3945 0000:06:00.0: 1035963232
>>0x00000100     0125
>><4>[  330.762623] FIRMWARE BUG: index 220 is given while
>>read_ptr is 252
>><0>[  330.762674] ------------[ cut here ]------------
>><2>[  330.762689] kernel BUG at
>>/home/maxim/software/kernel/linux-2.6/drivers/net/wireless/iwlw
>>ifi/iwl-dev.h:717!
>><0>[  330.762715] invalid opcode: 0000 [#1] PREEMPT SMP
>><0>[  330.762735] last sysfs file:
>>/sys/devices/platform/coretemp.1/temp1_input
>><4>[  330.762755] CPU 0
>><4>[  330.762765] Modules linked in: nvidia(P) af_packet nfsd
>>exportfs nfs lockd nfs_acl auth_rpcgss sunrpc usb_storage
>>usb_libusual cpufreq_powersave cpufreq_conservative
>>snd_hda_codec_realtek cpufreq_userspace snd_hda_intel
>>acpi_cpufreq uvcvideo iwl3945 snd_hda_codec coretemp videodev
>>iwlcore v4l1_compat snd_hwdep joydev ohci1394
>>v4l2_compat_ioctl32 tg3 snd_pcm sbp2 mac80211 uhci_hcd psmouse
>>video snd_page_alloc ieee1394 libphy ehci_hcd output ac
>>battery lirc_ene0100 serio_raw evdev cfg80211 rfkill usbcore
>>fuse lzo lzo_decompress lzo_compress
>><6>[  330.762987] Pid: 0, comm: swapper Tainted: P
>>2.6.32-wl #243 Aspire 5720
>><6>[  330.763011] RIP: 0010:[<ffffffffa01c3fe7>]
>>[<ffffffffa01c3fe7>] iwl_tx_cmd_complete+0x397/0x3a0 [iwlcore]
>><6>[  330.763057] RSP: 0018:ffff880002203d20  EFLAGS: 00010286
>><6>[  330.763075] RAX: 0000000000000049 RBX: ffff88006fc09b40
>>RCX: 0000000000000102
>><6>[  330.763096] RDX: 0000000000000000 RSI: 0000000000000000
>>RDI: 0000000000000001
>><6>[  330.763118] RBP: ffff880002203d90 R08: 0000000000000002
>>R09: 0000000000000000
>><6>[  330.763139] R10: 0000000000000000 R11: 0000000000000001
>>R12: 00000000000000fc
>><6>[  330.763160] R13: 00000000000000dc R14: ffff88006fd6d460
>>R15: ffff88006a2a1600
>><6>[  330.763182] FS:  0000000000000000(0000)
>>GS:ffff880002200000(0000) knlGS:0000000000000000
>><6>[  330.763206] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>><6>[  330.763224] CR2: 00007f7ffbf95000 CR3: 0000000001001000
>>CR4: 00000000000006f0
>><6>[  330.763246] DR0: 0000000000000000 DR1: 0000000000000000
>>DR2: 0000000000000000
>><6>[  330.763267] DR3: 0000000000000000 DR6: 00000000ffff0ff0
>>DR7: 0000000000000400
>><4>[  330.763289] Process swapper (pid: 0, threadinfo
>>ffffffff815ca000, task ffffffff815ceb60)
>><0>[  330.763310] Stack:
>><4>[  330.763318]  ffff88006a2a9690 0000000000000046
>>ffff880002203d40 0000000000000046
>><4>[  330.763346] <0> 0000000000000282 0000000000000282
>>0000000000000282 ffff8800699eba00
>><4>[  330.763375] <0> ffff880002203d70 ffff88006a2a1600
>>ffff88006a2a2f10 ffff88006a673000
>><0>[  330.763406] Call Trace:
>><0>[  330.763416]  <IRQ>
>><4>[  330.763436]  [<ffffffffa0230d13>]
>>iwl3945_irq_tasklet+0x673/0x1740 [iwl3945]
>><4>[  330.763469]  [<ffffffffa01b683e>] ?
>>iwl_isr_legacy+0x3e/0x600 [iwlcore]
>><4>[  330.763496]  [<ffffffff81053f56>] tasklet_action+0x106/0x110
>><4>[  330.763516]  [<ffffffff81055674>] __do_softirq+0x114/0x2c0
>><4>[  330.763536]  [<ffffffff8109eb38>] ? handle_edge_irq+0x78/0x160
>><4>[  330.763558]  [<ffffffff8100d25c>] call_softirq+0x1c/0x30
>><4>[  330.763577]  [<ffffffff8100f1fd>] do_softirq+0x7d/0xb0
>><4>[  330.763595]  [<ffffffff81055125>] irq_exit+0x95/0xa0
>><4>[  330.763615]  [<ffffffff8140bddc>] do_IRQ+0x7c/0xf0
>><4>[  330.763634]  [<ffffffff8100ca13>] ret_from_intr+0x0/0xf
>><0>[  330.763650]  <EOI>
>><4>[  330.763664]  [<ffffffff812a1d5b>] ?
>>acpi_idle_enter_bm+0x275/0x2aa
>><4>[  330.763684]  [<ffffffff812a1d65>] ?
>>acpi_idle_enter_bm+0x27f/0x2aa
>><4>[  330.763705]  [<ffffffff812a1d5b>] ?
>>acpi_idle_enter_bm+0x275/0x2aa
>><4>[  330.763729]  [<ffffffff8133227f>] ? cpuidle_idle_call+0x9f/0x160
>><4>[  330.763749]  [<ffffffff8100b1df>] ? cpu_idle+0xaf/0x110
>><4>[  330.763769]  [<ffffffff813f534a>] ? rest_init+0x7a/0x80
>><4>[  330.763789]  [<ffffffff81640d01>] ? start_kernel+0x3ac/0x3b8
>><4>[  330.763808]  [<ffffffff81640315>] ?
>>x86_64_start_reservations+0x125/0x129
>><4>[  330.763831]  [<ffffffff816403fd>] ?
>x86_64_start_kernel+0xe4/0xeb
>><0>[  330.763848] Code: c7 c6 48 86 1d a0 48 c7 c7 53 86 1d a0
>>e8 a2 b3 07 e1 e9 68 fe ff ff 44 89 e2 44 89 ee 48 c7 c7 80 5a
>>1d a0 31 c0 e8 c7 31 24 e1 <0f> 0b eb fe 0f 1f 44 00 00 55 48
>>89 e5 41 57 41 56 41 55 41 54
>><1>[  330.764008] RIP  [<ffffffffa01c3fe7>]
>>iwl_tx_cmd_complete+0x397/0x3a0 [iwlcore]
>><4>[  330.764042]  RSP <ffff880002203d20>
>><4>[  330.764059] ---[ end trace e386a588d450d76f ]---
>><0>[  330.764073] Kernel panic - not syncing: Fatal exception
>>in interrupt
>><4>[  330.764093] Pid: 0, comm: swapper Tainted: P      D
>>2.6.32-wl #243
>><4>[  330.764111] Call Trace:
>><4>[  330.764120]  <IRQ>  [<ffffffff814070f1>] panic+0x82/0x13f
>><4>[  330.764148]  [<ffffffff81010a52>] oops_end+0xe2/0xf0
>><4>[  330.764166]  [<ffffffff81010c7b>] die+0x5b/0x90
>><4>[  330.764183]  [<ffffffff8100db24>] do_trap+0xc4/0x170
>><4>[  330.764201]  [<ffffffff8100e025>] do_invalid_op+0x95/0xb0
>><4>[  330.764230]  [<ffffffffa01c3fe7>] ?
>>iwl_tx_cmd_complete+0x397/0x3a0 [iwlcore]
>><4>[  330.764254]  [<ffffffff8140a94d>] ?
>>trace_hardirqs_off_thunk+0x3a/0x3c
>><4>[  330.764276]  [<ffffffff8100cef5>] invalid_op+0x15/0x20
>><4>[  330.764304]  [<ffffffffa01c3fe7>] ?
>>iwl_tx_cmd_complete+0x397/0x3a0 [iwlcore]
>><4>[  330.764333]  [<ffffffffa0230d13>]
>>iwl3945_irq_tasklet+0x673/0x1740 [iwl3945]
>><4>[  330.764364]  [<ffffffffa01b683e>] ?
>>iwl_isr_legacy+0x3e/0x600 [iwlcore]
>><4>[  330.764388]  [<ffffffff81053f56>] tasklet_action+0x106/0x110
>><4>[  330.764408]  [<ffffffff81055674>] __do_softirq+0x114/0x2c0
>><4>[  330.764427]  [<ffffffff8109eb38>] ? handle_edge_irq+0x78/0x160
>><4>[  330.764447]  [<ffffffff8100d25c>] call_softirq+0x1c/0x30
>><4>[  330.764465]  [<ffffffff8100f1fd>] do_softirq+0x7d/0xb0
>><4>[  330.764483]  [<ffffffff81055125>] irq_exit+0x95/0xa0
>><4>[  330.764500]  [<ffffffff8140bddc>] do_IRQ+0x7c/0xf0
>><4>[  330.764519]  [<ffffffff8100ca13>] ret_from_intr+0x0/0xf
>><4>[  330.764534]  <EOI>  [<ffffffff812a1d5b>] ?
>>acpi_idle_enter_bm+0x275/0x2aa
>><4>[  330.764562]  [<ffffffff812a1d65>] ?
>>acpi_idle_enter_bm+0x27f/0x2aa
>><4>[  330.764583]  [<ffffffff812a1d5b>] ?
>>acpi_idle_enter_bm+0x275/0x2aa
>><4>[  330.764605]  [<ffffffff8133227f>] ? cpuidle_idle_call+0x9f/0x160
>><4>[  330.764625]  [<ffffffff8100b1df>] ? cpu_idle+0xaf/0x110
>><4>[  330.764643]  [<ffffffff813f534a>] ? rest_init+0x7a/0x80
>><4>[  330.764662]  [<ffffffff81640d01>] ? start_kernel+0x3ac/0x3b8
>><4>[  330.764682]  [<ffffffff81640315>] ?
>>x86_64_start_reservations+0x125/0x129
>><4>[  330.764703]  [<ffffffff816403fd>] ?
>x86_64_start_kernel+0xe4/0xeb
>>
>>
>>
>>Note that these days I see the 'Microcode SW error detected'
>>somewhat more often that I used to.
>>
>>
>>This is the combined patch I used:
>>
>>
>>diff --git a/drivers/net/wireless/iwlwifi/iwl-dev.h
>>b/drivers/net/wireless/iwlwifi/iwl-dev.h
>>index 2673e9a..8efc469 100644
>>--- a/drivers/net/wireless/iwlwifi/iwl-dev.h
>>+++ b/drivers/net/wireless/iwlwifi/iwl-dev.h
>>@@ -711,7 +711,13 @@ extern void iwl_txq_ctx_stop(struct
>>iwl_priv *priv);
>> extern int iwl_queue_space(const struct iwl_queue *q);
>> static inline int iwl_queue_used(const struct iwl_queue *q, int i)
>> {
>>-       return q->write_ptr > q->read_ptr ?
>>+       if (q->write_ptr == q->read_ptr) {
>>+               printk("FIRMWARE BUG: index %d is given while
>>read_ptr is %d\n",
>>+                      i, q->read_ptr);
>>+               BUG();
>>+       }
>>+
>>+       return q->write_ptr >= q->read_ptr ?
>>                (i >= q->read_ptr && i < q->write_ptr) :
>>                !(i < q->read_ptr && i >= q->write_ptr);
>> }
>>diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c
>>b/drivers/net/wireless/iwlwifi/iwl3945-base.c
>>index 2a28a1f..a36de73 100644
>>--- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
>>+++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
>>@@ -548,6 +548,9 @@ static int iwl3945_tx_skb(struct iwl_priv
>>*priv, struct sk_buff *skb)
>>        txq = &priv->txq[txq_id];
>>        q = &txq->q;
>>
>>+       if ((iwl_queue_space(q) < q->high_mark))
>>+               goto drop;
>>+
>>        spin_lock_irqsave(&priv->lock, flags);
>>
>>        idx = get_cmd_index(q, q->write_ptr, 0);
>>
>>
>>Best regards,
>>       Maxim Levitsky
>>
>>
>>---------------------------------------------------------------
>>---------------
>>This SF.Net email is sponsored by the Verizon Developer Community
>>Take advantage of Verizon's best-in-class app development support
>>A streamlined, 14 day to market process makes app distribution
>>fast and easy
>>Join now and get one step closer to millions of Verizon customers
>>http://p.sf.net/sfu/verizon-dev2dev
>>_______________________________________________
>>Ipw3945-devel mailing list
>>Ipw3945-devel@lists.sourceforge.net
>>https://lists.sourceforge.net/lists/listinfo/ipw3945-devel
>>
>
>---------------------------------------------------------------
>---------------
>This SF.Net email is sponsored by the Verizon Developer Community
>Take advantage of Verizon's best-in-class app development support
>A streamlined, 14 day to market process makes app distribution
>fast and easy
>Join now and get one step closer to millions of Verizon customers
>http://p.sf.net/sfu/verizon-dev2dev
>_______________________________________________
>Ipw3945-devel mailing list
>Ipw3945-devel@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/ipw3945-devel
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Panic in iwl3945 driver
  2010-01-05 16:56                   ` Maxim Levitsky
  2010-01-05 22:16                     ` [ipw3945-devel] " Cahill, Ben M
@ 2010-01-06  3:55                     ` Zhu Yi
  1 sibling, 0 replies; 20+ messages in thread
From: Zhu Yi @ 2010-01-06  3:55 UTC (permalink / raw)
  To: Maxim Levitsky; +Cc: Chatre, Reinette, linux-wireless, iwlwifi maling list

On Wed, 2010-01-06 at 00:56 +0800, Maxim Levitsky wrote:
> > One thing I found today is when txq read_ptr catches up to write_ptr
> > (read_ptr == write_ptr), iwl_queue_used() will _always_ return TRUE!
> > This will be a problem if the firmware sends us a wrong index
> > (sequence), then we will fail the check in this condition. I'm not sure
> > if firmware can really send us a wrong sequence. Can you please try this
> > patch? Apply it on top of the previous one. If you do see the "FIRMWARE
> > BUG" in dmesg, then I think we find the root cause.
> 
> 
> 
> Finally exploded....
> 
> 
> <3>[  330.761347] iwl3945 0000:06:00.0: Microcode SW error detected. Restarting 0x82000008.
> <3>[  330.761403] iwl3945 0000:06:00.0: Start IWL Error Log Dump:
> <3>[  330.761421] iwl3945 0000:06:00.0: Status: 0x000302E6, count: 1
> <3>[  330.761438] iwl3945 0000:06:00.0: Desc       Time       asrtPC  blink2 ilink1  nmiPC   Line
> <3>[  330.761683] iwl3945 0000:06:00.0: SYSASSERT     (#5) 1035963230 0x008B6 0x017B8 0x0031C 0x00000 65
> <3>[  330.761692]
> <3>[  330.761837] iwl3945 0000:06:00.0: Start IWL Event Log Dump: display last 20 count
> <3>[  330.761895] iwl3945 0000:06:00.0: 1035963151      0x00000000      1106
> <3>[  330.761931] iwl3945 0000:06:00.0: 1035963155      0x00000000      1106
> <3>[  330.761967] iwl3945 0000:06:00.0: 1035963160      0x00000000      1106
> <3>[  330.762003] iwl3945 0000:06:00.0: 1035963164      0x00000000      1106
> <3>[  330.762039] iwl3945 0000:06:00.0: 1035963168      0x00000000      1106
> <3>[  330.762075] iwl3945 0000:06:00.0: 1035963173      0x00000000      1106
> <3>[  330.762112] iwl3945 0000:06:00.0: 1035963177      0x00000000      1106
> <3>[  330.762148] iwl3945 0000:06:00.0: 1035963181      0x00000000      1106
> <3>[  330.762184] iwl3945 0000:06:00.0: 1035963185      0x00000000      1106
> <3>[  330.762220] iwl3945 0000:06:00.0: 1035963190      0x00000001      1106
> <3>[  330.762256] iwl3945 0000:06:00.0: 1035963190      0x0000000f      1106
> <3>[  330.762292] iwl3945 0000:06:00.0: 1035963192      0x00000417      1105
> <3>[  330.762329] iwl3945 0000:06:00.0: 1035963195      0x000005a5      1115
> <3>[  330.762365] iwl3945 0000:06:00.0: 1035963199      0x00000001      0463
> <3>[  330.762401] iwl3945 0000:06:00.0: 1035963200      0x00000001      0462
> <3>[  330.762437] iwl3945 0000:06:00.0: 1035963212      0x000002b1      0603
> <3>[  330.762474] iwl3945 0000:06:00.0: 1035963217      0x04fb0077      0401
> <3>[  330.762510] iwl3945 0000:06:00.0: 1035963222      0x04dc0018      0401
> <3>[  330.762547] iwl3945 0000:06:00.0: 1035963229      0x00000000      0401
> <3>[  330.762583] iwl3945 0000:06:00.0: 1035963232      0x00000100      0125
> <4>[  330.762623] FIRMWARE BUG: index 220 is given while read_ptr is 252

OK. This confirms my supposition. The firmware does give us an invalid
read pointer in some situations. In the above case, the firmware also
detected a "bad command" error itself.

So we can use the second patch to prevent the panic (will do a formal
submission) and start to track the firmware issue on the other hand.
Thanks for your testing!

Thanks,
-yi

> <0>[  330.762674] ------------[ cut here ]------------
> <2>[  330.762689] kernel BUG at /home/maxim/software/kernel/linux-2.6/drivers/net/wireless/iwlwifi/iwl-dev.h:717!
> <0>[  330.762715] invalid opcode: 0000 [#1] PREEMPT SMP
> <0>[  330.762735] last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
> <4>[  330.762755] CPU 0
> <4>[  330.762765] Modules linked in: nvidia(P) af_packet nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc usb_storage usb_libusual cpufreq_powersave cpufreq_conservative snd_hda_codec_realtek cpufreq_userspace snd_hda_intel acpi_cpufreq uvcvideo iwl3945 snd_hda_codec coretemp videodev iwlcore v4l1_compat snd_hwdep joydev ohci1394 v4l2_compat_ioctl32 tg3 snd_pcm sbp2 mac80211 uhci_hcd psmouse video snd_page_alloc ieee1394 libphy ehci_hcd output ac battery lirc_ene0100 serio_raw evdev cfg80211 rfkill usbcore fuse lzo lzo_decompress lzo_compress
> <6>[  330.762987] Pid: 0, comm: swapper Tainted: P           2.6.32-wl #243 Aspire 5720
> <6>[  330.763011] RIP: 0010:[<ffffffffa01c3fe7>]  [<ffffffffa01c3fe7>] iwl_tx_cmd_complete+0x397/0x3a0 [iwlcore]
> <6>[  330.763057] RSP: 0018:ffff880002203d20  EFLAGS: 00010286
> <6>[  330.763075] RAX: 0000000000000049 RBX: ffff88006fc09b40 RCX: 0000000000000102
> <6>[  330.763096] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
> <6>[  330.763118] RBP: ffff880002203d90 R08: 0000000000000002 R09: 0000000000000000
> <6>[  330.763139] R10: 0000000000000000 R11: 0000000000000001 R12: 00000000000000fc
> <6>[  330.763160] R13: 00000000000000dc R14: ffff88006fd6d460 R15: ffff88006a2a1600
> <6>[  330.763182] FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
> <6>[  330.763206] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> <6>[  330.763224] CR2: 00007f7ffbf95000 CR3: 0000000001001000 CR4: 00000000000006f0
> <6>[  330.763246] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> <6>[  330.763267] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> <4>[  330.763289] Process swapper (pid: 0, threadinfo ffffffff815ca000, task ffffffff815ceb60)
> <0>[  330.763310] Stack:
> <4>[  330.763318]  ffff88006a2a9690 0000000000000046 ffff880002203d40 0000000000000046
> <4>[  330.763346] <0> 0000000000000282 0000000000000282 0000000000000282 ffff8800699eba00
> <4>[  330.763375] <0> ffff880002203d70 ffff88006a2a1600 ffff88006a2a2f10 ffff88006a673000
> <0>[  330.763406] Call Trace:
> <0>[  330.763416]  <IRQ>
> <4>[  330.763436]  [<ffffffffa0230d13>] iwl3945_irq_tasklet+0x673/0x1740 [iwl3945]
> <4>[  330.763469]  [<ffffffffa01b683e>] ? iwl_isr_legacy+0x3e/0x600 [iwlcore]
> <4>[  330.763496]  [<ffffffff81053f56>] tasklet_action+0x106/0x110
> <4>[  330.763516]  [<ffffffff81055674>] __do_softirq+0x114/0x2c0
> <4>[  330.763536]  [<ffffffff8109eb38>] ? handle_edge_irq+0x78/0x160
> <4>[  330.763558]  [<ffffffff8100d25c>] call_softirq+0x1c/0x30
> <4>[  330.763577]  [<ffffffff8100f1fd>] do_softirq+0x7d/0xb0
> <4>[  330.763595]  [<ffffffff81055125>] irq_exit+0x95/0xa0
> <4>[  330.763615]  [<ffffffff8140bddc>] do_IRQ+0x7c/0xf0
> <4>[  330.763634]  [<ffffffff8100ca13>] ret_from_intr+0x0/0xf
> <0>[  330.763650]  <EOI>
> <4>[  330.763664]  [<ffffffff812a1d5b>] ? acpi_idle_enter_bm+0x275/0x2aa
> <4>[  330.763684]  [<ffffffff812a1d65>] ? acpi_idle_enter_bm+0x27f/0x2aa
> <4>[  330.763705]  [<ffffffff812a1d5b>] ? acpi_idle_enter_bm+0x275/0x2aa
> <4>[  330.763729]  [<ffffffff8133227f>] ? cpuidle_idle_call+0x9f/0x160
> <4>[  330.763749]  [<ffffffff8100b1df>] ? cpu_idle+0xaf/0x110
> <4>[  330.763769]  [<ffffffff813f534a>] ? rest_init+0x7a/0x80
> <4>[  330.763789]  [<ffffffff81640d01>] ? start_kernel+0x3ac/0x3b8
> <4>[  330.763808]  [<ffffffff81640315>] ? x86_64_start_reservations+0x125/0x129
> <4>[  330.763831]  [<ffffffff816403fd>] ? x86_64_start_kernel+0xe4/0xeb
> <0>[  330.763848] Code: c7 c6 48 86 1d a0 48 c7 c7 53 86 1d a0 e8 a2 b3 07 e1 e9 68 fe ff ff 44 89 e2 44 89 ee 48 c7 c7 80 5a 1d a0 31 c0 e8 c7 31 24 e1 <0f> 0b eb fe 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54
> <1>[  330.764008] RIP  [<ffffffffa01c3fe7>] iwl_tx_cmd_complete+0x397/0x3a0 [iwlcore]
> <4>[  330.764042]  RSP <ffff880002203d20>
> <4>[  330.764059] ---[ end trace e386a588d450d76f ]---
> <0>[  330.764073] Kernel panic - not syncing: Fatal exception in interrupt
> <4>[  330.764093] Pid: 0, comm: swapper Tainted: P      D    2.6.32-wl #243
> <4>[  330.764111] Call Trace:
> <4>[  330.764120]  <IRQ>  [<ffffffff814070f1>] panic+0x82/0x13f
> <4>[  330.764148]  [<ffffffff81010a52>] oops_end+0xe2/0xf0
> <4>[  330.764166]  [<ffffffff81010c7b>] die+0x5b/0x90
> <4>[  330.764183]  [<ffffffff8100db24>] do_trap+0xc4/0x170
> <4>[  330.764201]  [<ffffffff8100e025>] do_invalid_op+0x95/0xb0
> <4>[  330.764230]  [<ffffffffa01c3fe7>] ? iwl_tx_cmd_complete+0x397/0x3a0 [iwlcore]
> <4>[  330.764254]  [<ffffffff8140a94d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> <4>[  330.764276]  [<ffffffff8100cef5>] invalid_op+0x15/0x20
> <4>[  330.764304]  [<ffffffffa01c3fe7>] ? iwl_tx_cmd_complete+0x397/0x3a0 [iwlcore]
> <4>[  330.764333]  [<ffffffffa0230d13>] iwl3945_irq_tasklet+0x673/0x1740 [iwl3945]
> <4>[  330.764364]  [<ffffffffa01b683e>] ? iwl_isr_legacy+0x3e/0x600 [iwlcore]
> <4>[  330.764388]  [<ffffffff81053f56>] tasklet_action+0x106/0x110
> <4>[  330.764408]  [<ffffffff81055674>] __do_softirq+0x114/0x2c0
> <4>[  330.764427]  [<ffffffff8109eb38>] ? handle_edge_irq+0x78/0x160
> <4>[  330.764447]  [<ffffffff8100d25c>] call_softirq+0x1c/0x30
> <4>[  330.764465]  [<ffffffff8100f1fd>] do_softirq+0x7d/0xb0
> <4>[  330.764483]  [<ffffffff81055125>] irq_exit+0x95/0xa0
> <4>[  330.764500]  [<ffffffff8140bddc>] do_IRQ+0x7c/0xf0
> <4>[  330.764519]  [<ffffffff8100ca13>] ret_from_intr+0x0/0xf
> <4>[  330.764534]  <EOI>  [<ffffffff812a1d5b>] ? acpi_idle_enter_bm+0x275/0x2aa
> <4>[  330.764562]  [<ffffffff812a1d65>] ? acpi_idle_enter_bm+0x27f/0x2aa
> <4>[  330.764583]  [<ffffffff812a1d5b>] ? acpi_idle_enter_bm+0x275/0x2aa
> <4>[  330.764605]  [<ffffffff8133227f>] ? cpuidle_idle_call+0x9f/0x160
> <4>[  330.764625]  [<ffffffff8100b1df>] ? cpu_idle+0xaf/0x110
> <4>[  330.764643]  [<ffffffff813f534a>] ? rest_init+0x7a/0x80
> <4>[  330.764662]  [<ffffffff81640d01>] ? start_kernel+0x3ac/0x3b8
> <4>[  330.764682]  [<ffffffff81640315>] ? x86_64_start_reservations+0x125/0x129
> <4>[  330.764703]  [<ffffffff816403fd>] ? x86_64_start_kernel+0xe4/0xeb

Thanks,
-yi


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [ipw3945-devel] Panic in iwl3945 driver
  2010-01-05 22:26                       ` Cahill, Ben M
@ 2010-01-06  3:57                         ` Zhu Yi
  2010-01-06  5:23                           ` Cahill, Ben M
  0 siblings, 1 reply; 20+ messages in thread
From: Zhu Yi @ 2010-01-06  3:57 UTC (permalink / raw)
  To: Cahill, Ben M
  Cc: Maxim Levitsky, Chatre, Reinette, linux-wireless, iwlwifi maling list

On Wed, 2010-01-06 at 06:26 +0800, Cahill, Ben M wrote:
> >><3>[  330.762510] iwl3945 0000:06:00.0: 1035963222
> >>0x04dc0018     0401
> >
> >Last good command 0x18, with (driver provided) header
> >indicating sent via queue 4, queue index 0xdc
> >
> >><3>[  330.762547] iwl3945 0000:06:00.0: 1035963229
> >>0x00000000     0401
> >
> >This shows command "0" (non-existant/unsupported/bad command)
> >with (driver provided) header indicating sent via queue "0",
> >index "0"; obviously bad data in command queue.
> 
> Or index ("pointer") is getting out of sync.  (Reminder to self; never
> use the word "obviously"). 

Ben, does it smell like driver issue or firmware bug?

Thanks,
-yi


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [ipw3945-devel] Panic in iwl3945 driver
  2010-01-06  3:57                         ` Zhu Yi
@ 2010-01-06  5:23                           ` Cahill, Ben M
  0 siblings, 0 replies; 20+ messages in thread
From: Cahill, Ben M @ 2010-01-06  5:23 UTC (permalink / raw)
  To: Zhu, Yi
  Cc: Maxim Levitsky, Chatre, Reinette, linux-wireless, iwlwifi maling list

 

>-----Original Message-----
>From: Zhu, Yi 
>Sent: Tuesday, January 05, 2010 10:57 PM
>To: Cahill, Ben M
>Cc: Maxim Levitsky; Chatre, Reinette; linux-wireless; iwlwifi 
>maling list
>Subject: RE: [ipw3945-devel] Panic in iwl3945 driver
>
>On Wed, 2010-01-06 at 06:26 +0800, Cahill, Ben M wrote:
>> >><3>[  330.762510] iwl3945 0000:06:00.0: 1035963222
>> >>0x04dc0018     0401
>> >
>> >Last good command 0x18, with (driver provided) header
>> >indicating sent via queue 4, queue index 0xdc
>> >
>> >><3>[  330.762547] iwl3945 0000:06:00.0: 1035963229
>> >>0x00000000     0401
>> >
>> >This shows command "0" (non-existant/unsupported/bad command)
>> >with (driver provided) header indicating sent via queue "0",
>> >index "0"; obviously bad data in command queue.
>> 
>> Or index ("pointer") is getting out of sync.  (Reminder to 
>self; never
>> use the word "obviously"). 
>
>Ben, does it smell like driver issue or firmware bug?

I would tend to suspect driver or platform issue, but that's just a guess of course.  Something seems to be telling the device to read DRAM that contains "0".  Do we reset unused command buffers to 0?  Just wondering if the index/pointers are getting out of sync somehow (double-increment or who-knows-what?).

I haven't followed this thread closely (plus I've been on vacation!); can you give me a summary of what you're trying to fix?

Also, not sure if it's relevant, but a community member discovered that MSI should be enabled in kernel config.  See bugzilla 1944.

-- Ben --

>
>Thanks,
>-yi
>
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2010-01-06  5:23 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-25 16:49 Panic in iwl3945 driver Maxim Levitsky
2009-11-27  0:00 ` Maxim Levitsky
2009-11-30 15:55   ` Maxim Levitsky
2009-11-30 21:42     ` reinette chatre
2009-11-30 22:35       ` Maxim Levitsky
2009-12-01  9:28         ` Zhu Yi
2009-12-01 18:52           ` reinette chatre
2009-12-02  2:06             ` Zhu Yi
2009-12-02  5:42           ` Zhu Yi
2009-12-02 17:17             ` Maxim Levitsky
2009-12-21 16:40               ` Maxim Levitsky
2009-12-22  8:57                 ` Zhu Yi
2010-01-05 16:56                   ` Maxim Levitsky
2010-01-05 22:16                     ` [ipw3945-devel] " Cahill, Ben M
2010-01-05 22:26                       ` Cahill, Ben M
2010-01-06  3:57                         ` Zhu Yi
2010-01-06  5:23                           ` Cahill, Ben M
2010-01-06  3:55                     ` Zhu Yi
2009-12-02 19:18             ` [ipw3945-devel] " Cahill, Ben M
2009-12-02 19:40               ` Abhijeet Kolekar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.