From: Pedro Francisco <pedrogfrancisco@gmail.com>
To: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: ML linux-wireless <linux-wireless@vger.kernel.org>,
Johannes Berg <johannes@sipsolutions.net>
Subject: Re: unloading WiFi modules is usually triggering kernel crash
Date: Tue, 9 Oct 2012 10:14:40 +0100 [thread overview]
Message-ID: <CAJZjf_xZC2473B2SQAKFfc7m6SSeEK7z7N-GaUsZUNV-K9c=Ww@mail.gmail.com> (raw)
In-Reply-To: <20121003143029.GF2259@redhat.com>
On Wed, Oct 3, 2012 at 3:30 PM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
> On Wed, Sep 26, 2012 at 01:47:18PM +0100, Pedro Francisco wrote:
>> On Thu, Aug 30, 2012 at 4:58 PM, Pedro Francisco
>> <pedrogfrancisco@gmail.com> wrote:
>> > On Tue, Aug 7, 2012 at 11:22 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
>> >> On Tue, Jul 31, 2012 at 01:54:52PM +0100, Pedro Francisco wrote:
>> >>> I've noticed in the past few days a pattern: sometimes nm-applet
>> >>> starts showing empty bars for the signal strength.
>> >>
>> >> RSSI reporting problem or maybe NM issue. When you change kernel to
>> >> older or newer does this problem go away ?
>> >>
>> >>> Running the script:
>> >>> sudo ifconfig wlan0 down; sleep 1
>> >>> sudo rmmod hp_wmi; sudo rmmod iwl3945; sudo rmmod iwlegacy; sudo rmmod
>> >>> mac80211; sudo rmmod cfg80211
>> >>> sleep 2; sudo rmmod rfkill; sync
>> >>> sudo modprobe rfkill; sudo modprobe cfg80211; sudo modprobe mac80211;
>> >>> sudo modprobe iwlegacy
>> >>> sudo modprobe iwl3945; sudo modprobe hp_wmi; sleep 1; sudo ifconfig wlan0 up
>> >>
>> >> I run a bit modified script (I do not have hp_wmi.ko and rfkill.ko) for few
>> >> hours, and did not get any WARNING/crash. I used 3.5, can you check if that
>> >> problem is also fixed on your system on 3.5 or newer.
>> >
>> > On 3.5.2-3.fc17.i686.PAE everything seems stable. The problem I had
>> > described hasn't happened recently.
>> > I guess it got fixed in the meantime.
>>
>> I was wrong, got it again.
>>
>> So, to recap: once the network applet shows no signal, but only then,
>> removing the wireless modules triggers an unrecoverable kernel panic.
>> I still haven't compiled a relocatable x86 kernel to get a proper
>> backtrace using kexec/kdump, sorry.
>>
>> I found something else as well. Notice this output of "iwconfig" when
>> everything is _normal_:
>> $ iwconfig wlan0
>> wlan0 IEEE 802.11abg ESSID:"eduroam"
>> Mode:Managed Frequency:2.437 GHz Access Point: B8:62:1F:XX:XX:XX
>> Bit Rate=54 Mb/s Tx-Power=15 dBm
>> Retry long limit:7 RTS thr:off Fragment thr:off
>> Power Management:off
>> Link Quality=58/70 Signal level=-52 dBm
>> Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
>> Tx excessive retries:0 Invalid misc:0 Missed beacon:0
>>
>> When I have the "empty signal bars" issue:
>> $ iwconfig wlan0
>> wlan0 IEEE 802.11abg ESSID:off/any
>> Mode:Managed Access Point: Not-Associated Tx-Power=15 dBm
>> Retry long limit:7 RTS thr:off Fragment thr:off
>> Power Management:off
>>
>> In case you're wondering, it is connected and streaming stuff :)
>>
>> I can sometimes trigger it on purpose: I just have to roam to a 5GHz
>> AP of the same ESS, cycle around 2GHz and back to 5GHz (using wpa_cli
>> roam XX:XX:XX:XX:XX ). If I get "SME: Authentication request to the
>> driver failed", then disabling NetworkManager (not wireless) and
>> reenabling will _probably_ get the "empty signal bars" (I was just
>> able to trigger the "empty signal bars" now after a clean boot).
>> So I'm guessing something gets corrupted, which is why reloading the
>> modules will crash.
>
> We do not stop mac80211 timers on module unload. I reproduced below
> warnings with iwlwifi on 3.5 kernel with DEBUG_OBJECTS enabled.
> I forced roaming many times, and then do "modprobe -r iwlwifi".
> Unfortunately those steps do not trigger warnings anytime, they
> happened just once.
>
> iwlwifi 0000:02:00.0: ACTIVATE a non DRIVER active station id 0 addr 6c:50:4d:3f:79:73
> ------------[ cut here ]------------
> WARNING: at lib/debugobjects.c:261 debug_print_object+0x8e/0xb0()
> Hardware name: SandyBridge Platform
> ODEBUG: free active (active state 0) object type: timer_list hint:
> ieee80211_sta_conn_mon_timer+0x0/0x40 [mac80211]
> Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq
> freq_table mperf ipv6 uinput arc4 sg iwlwifi(-) mac80211 cfg80211 rfkill
> coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr
> lpc_ich mfd_core i2c_i801 e1000e ext4 mbcache jbd2 sd_mod crc_t10dif
> sr_mod cdrom aesni_intel cryptd aes_x86_64 aes_generic ahci libahci i915
> drm_kms_helper drm i2c_algo_bit i2c_core video dm_mirror dm_region_hash
> dm_log dm_mod [last unloaded: scsi_wait_scan]
> Pid: 3064, comm: modprobe Not tainted 3.5.0 #1
> Call Trace:
> [<ffffffff810535af>] warn_slowpath_common+0x7f/0xc0
> [<ffffffff810536a6>] warn_slowpath_fmt+0x46/0x50
> [<ffffffff812901be>] debug_print_object+0x8e/0xb0
> [<ffffffffa03a09b0>] ? ieee80211_chswitch_timer+0x40/0x40 [mac80211]
> [<ffffffff81290a0d>] __debug_check_no_obj_freed+0x10d/0x200
> [<ffffffff81290b1d>] debug_check_no_obj_freed+0x1d/0x30
> [<ffffffff8117a2b0>] kfree+0xc0/0x330
> [<ffffffff810b9083>] ? __lock_release+0x133/0x1a0
> [<ffffffff815555f0>] ? _raw_spin_unlock_irqrestore+0x40/0x80
> [<ffffffff814957c4>] netdev_release+0x44/0x60
> [<ffffffff813704b7>] device_release+0x27/0xa0
> [<ffffffff8127da42>] kobject_cleanup+0x82/0x1b0
> [<ffffffff8127db7d>] kobject_release+0xd/0x10
> [<ffffffff8127d8cc>] kobject_put+0x2c/0x60
> [<ffffffff8147e371>] netdev_run_todo+0x101/0x180
> [<ffffffff8148f5ae>] rtnl_unlock+0xe/0x10
> [<ffffffffa0366178>] ieee80211_unregister_hw+0x58/0x120 [mac80211]
> [<ffffffffa040912b>] iwlagn_mac_unregister+0x2b/0x40 [iwlwifi]
> [<ffffffffa03fdf59>] iwl_op_mode_dvm_stop+0x49/0xf0 [iwlwifi]
> [<ffffffffa041f730>] iwl_drv_stop+0x40/0x60 [iwlwifi]
> [<ffffffffa0430a39>] iwl_pci_remove+0x25/0x3c [iwlwifi]
> [<ffffffff812aafc2>] pci_device_remove+0x52/0x120
> [<ffffffff813741cc>] __device_release_driver+0x7c/0xe0
> [<ffffffff81374308>] driver_detach+0xd8/0xe0
> [<ffffffff81372f61>] bus_remove_driver+0x91/0x110
> [<ffffffff81374fd2>] driver_unregister+0x62/0xa0
> [<ffffffff812ab2b4>] pci_unregister_driver+0x44/0xa0
> [<ffffffffa041f3d5>] iwl_pci_unregister_driver+0x15/0x20 [iwlwifi]
> [<ffffffffa0430a01>] iwl_exit+0x9/0x1c [iwlwifi]
> [<ffffffff810c50f1>] sys_delete_module+0x1d1/0x2c0
> [<ffffffff81555855>] ? retint_swapgs+0x13/0x1b
> [<ffffffff810e169c>] ? __audit_syscall_entry+0xcc/0x210
> [<ffffffff812896ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff8155de69>] system_call_fastpath+0x16/0x1b
> ---[ end trace 8070f580fc119b8b ]---
> ------------[ cut here ]------------
> WARNING: at lib/debugobjects.c:261 debug_print_object+0x8e/0xb0()
> Hardware name: SandyBridge Platform
> ODEBUG: free active (active state 0) object type: timer_list hint:
> ieee80211_sta_bcn_mon_timer+0x0/0x40 [mac80211]
> Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq
> freq_table mperf ipv6 uinput arc4 sg iwlwifi(-) mac80211 cfg80211 rfkill
> coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr
> lpc_ich mfd_core i2c_i801 e1000e ext4 mbcache jbd2 sd_mod crc_t10dif
> sr_mod cdrom aesni_intel cryptd aes_x86_64 aes_generic ahci libahci i915
> drm_kms_helper drm i2c_algo_bit i2c_core video dm_mirror dm_region_hash
> dm_log dm_mod [last unloaded: scsi_wait_scan]
> Pid: 3064, comm: modprobe Tainted: G W 3.5.0 #1
> Call Trace:
> [<ffffffff810535af>] warn_slowpath_common+0x7f/0xc0
> [<ffffffff810536a6>] warn_slowpath_fmt+0x46/0x50
> [<ffffffff812901be>] debug_print_object+0x8e/0xb0
> [<ffffffffa03a09f0>] ? ieee80211_sta_conn_mon_timer+0x40/0x40
> [mac80211]
> [<ffffffff81290a0d>] __debug_check_no_obj_freed+0x10d/0x200
> [<ffffffff81290b1d>] debug_check_no_obj_freed+0x1d/0x30
> [<ffffffff8117a2b0>] kfree+0xc0/0x330
> [<ffffffff810b9083>] ? __lock_release+0x133/0x1a0
> [<ffffffff815555f0>] ? _raw_spin_unlock_irqrestore+0x40/0x80
> [<ffffffff814957c4>] netdev_release+0x44/0x60
> [<ffffffff813704b7>] device_release+0x27/0xa0
> [<ffffffff8127da42>] kobject_cleanup+0x82/0x1b0
> [<ffffffff8127db7d>] kobject_release+0xd/0x10
> [<ffffffff8127d8cc>] kobject_put+0x2c/0x60
> [<ffffffff8147e371>] netdev_run_todo+0x101/0x180
> [<ffffffff8148f5ae>] rtnl_unlock+0xe/0x10
> [<ffffffffa0366178>] ieee80211_unregister_hw+0x58/0x120 [mac80211]
> [<ffffffffa040912b>] iwlagn_mac_unregister+0x2b/0x40 [iwlwifi]
> [<ffffffffa03fdf59>] iwl_op_mode_dvm_stop+0x49/0xf0 [iwlwifi]
> [<ffffffffa041f730>] iwl_drv_stop+0x40/0x60 [iwlwifi]
> [<ffffffffa0430a39>] iwl_pci_remove+0x25/0x3c [iwlwifi]
> [<ffffffff812aafc2>] pci_device_remove+0x52/0x120
> [<ffffffff813741cc>] __device_release_driver+0x7c/0xe0
> [<ffffffff81374308>] driver_detach+0xd8/0xe0
> [<ffffffff81372f61>] bus_remove_driver+0x91/0x110
> [<ffffffff81374fd2>] driver_unregister+0x62/0xa0
> [<ffffffff812ab2b4>] pci_unregister_driver+0x44/0xa0
> [<ffffffffa041f3d5>] iwl_pci_unregister_driver+0x15/0x20 [iwlwifi]
> [<ffffffffa0430a01>] iwl_exit+0x9/0x1c [iwlwifi]
> [<ffffffff810c50f1>] sys_delete_module+0x1d1/0x2c0
> [<ffffffff81555855>] ? retint_swapgs+0x13/0x1b
> [<ffffffff810e169c>] ? __audit_syscall_entry+0xcc/0x210
> [<ffffffff812896ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff8155de69>] system_call_fastpath+0x16/0x1b
> ---[ end trace 8070f580fc119b8c ]---
> Bridge firewalling registered
>
Hi!
I was finally able to compile a relocatable kernel: here's what I got,
after a crash on iwlegacy module removal:
# crash vmcore /usr/lib/debug/lib/modules/`uname -r`/vmlinux
crash 6.0.8-1.fc18 [note, I'm on FC17 but had to install FC18's crash
to workaround a log structure change in 3.5 kernel]
(...)
This GDB was configured as "i686-pc-linux-gnu"...
============ CRASH 1 ============
KERNEL: /usr/lib/debug/lib/modules/3.5.4-2.pedro.fc17.i686.PAE/vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 2
DATE: Tue Oct 9 09:17:35 2012
UPTIME: 00:12:47
LOAD AVERAGE: 0.50, 0.62, 0.60
TASKS: 322
NODENAME: s2
RELEASE: 3.5.4-2.pedro.fc17.i686.PAE
VERSION: #1 SMP Mon Oct 8 23:15:44 WEST 2012
MACHINE: i686 (1496 Mhz)
MEMORY: 2 GB
PANIC: "Oops: 0000 [#1] SMP " (check log for details)
PID: 0
COMMAND: "swapper/1"
TASK: f4104240 (1 of 2) [THREAD_INFO: f4146000]
CPU: 1
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 0 TASK: f4104240 CPU: 1 COMMAND: "swapper/1"
#0 [f4147db4] crash_kexec at c04a7d59
#1 [f4147e04] timerqueue_add at c0675503
#2 [f4147e14] ktime_get at c04921ee
#3 [f4147e30] bad_area_nosemaphore at c0958328
#4 [f4147e3c] do_page_fault at c0964c25
#5 [f4147eb8] error_code (via page_fault) at c0961eb1
EAX: 6b6b6b6b EBX: 00072420 ECX: 00000001 EDX: f4178930 EBP: f4147f30
DS: 007b ESI: 00000024 ES: 007b EDI: 00072420 GS: 00e0
CS: 0060 EIP: c0457993 ERR: ffffffff EFLAGS: 00010003
#6 [f4147eec] get_next_timer_interrupt at c0457993
#7 [f4147f34] tick_nohz_stop_sched_tick.isra.11 at c049a19a
#8 [f4147f78] tick_nohz_idle_enter at c049a661
#9 [f4147f80] cpu_idle at c04195da
============ CRASH 2 ============
KERNEL: /usr/lib/debug/lib/modules/3.5.4-2.pedro.fc17.i686.PAE/vmlinux
DUMPFILE: vmcore
CPUS: 2
DATE: Tue Oct 9 09:29:35 2012
UPTIME: 00:10:22
LOAD AVERAGE: 0.30, 0.78, 0.67
TASKS: 323
NODENAME: s2
RELEASE: 3.5.4-2.pedro.fc17.i686.PAE
VERSION: #1 SMP Mon Oct 8 23:15:44 WEST 2012
MACHINE: i686 (1496 Mhz)
MEMORY: 2 GB
PANIC: "kernel BUG at kernel/timer.c:1091!"
PID: 6563
COMMAND: "rpm" <-- ?
TASK: eaa5dcc0 [THREAD_INFO: f414a000]
CPU: 0
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 6563 TASK: eaa5dcc0 CPU: 0 COMMAND: "rpm"
bt: cannot resolve stack trace:
#0 [f414bd58] __schedule at c095fba6
#1 [f414bdd4] sched_clock_local at c047a56d
bt: text symbols on stack:
[f414bd5c] kmap_atomic_prot at c0441244
[f414bd70] __kunmap_atomic at c04410dd
[f414bd80] get_page_from_freelist at c0504e40
[f414bdcc] sched_clock at c0417a28
[f414bdd4] sched_clock_local at c047a572
[f414be28] update_curr at c047cdb2
[f414be60] clear_nohz_tick_stopped.part.37 at c0958f63
[f414be6c] trigger_load_balance at c047ff73
[f414be88] scheduler_tick at c04774e5
[f414beac] timerqueue_add at c0675508
[f414bec0] ktime_get at c04921f0
[f414bed4] lapic_next_event at c042f75b
[f414bedc] clockevents_program_event at c049877d
[f414bef4] tick_program_event at c0499a79
[f414bf04] hrtimer_interrupt at c046bcc8
[f414bf54] irq_exit at c045001d
[f414bf5c] smp_apic_timer_interrupt at c042fdbe
[f414bf74] apic_timer_interrupt at c0961c85
[f414bfa8] sysenter_past_esp at c0968322
bt: possible exception frame:
USER-MODE EXCEPTION FRAME AT f414bfb4:
EAX: 0000002d EBX: 09fba000 ECX: 45a4bff4 EDX: 09fba000
DS: 007b ESI: 09f99000 ES: 007b EDI: 09fba000
SS: 007b ESP: bff95804 EBP: bff95804 GS: 0033
CS: 0073 EIP: b7720424 ERR: 0000002d EFLAGS: 00000202
So, I'm guessing this means it is related to what you found on iwlwifi
(even if I'm on iwlegacy)?
The crash kernel crashed again but I can try to add a script to try to
recover dmesg -- I believe slub_debug caught something as well...
--
Pedro
next prev parent reply other threads:[~2012-10-09 9:15 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-07-31 12:54 unloading WiFi modules is usually triggering kernel crash Pedro Francisco
2012-07-31 13:13 ` John W. Linville
2012-08-07 10:22 ` Stanislaw Gruszka
2012-08-30 15:58 ` Pedro Francisco
2012-09-26 12:47 ` Pedro Francisco
2012-10-03 14:30 ` Stanislaw Gruszka
2012-10-09 9:14 ` Pedro Francisco [this message]
2012-10-12 12:13 ` Stanislaw Gruszka
2012-10-15 11:03 ` Johannes Berg
2012-10-15 15:48 ` Pedro Francisco
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAJZjf_xZC2473B2SQAKFfc7m6SSeEK7z7N-GaUsZUNV-K9c=Ww@mail.gmail.com' \
--to=pedrogfrancisco@gmail.com \
--cc=johannes@sipsolutions.net \
--cc=linux-wireless@vger.kernel.org \
--cc=sgruszka@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.