* unloading WiFi modules is usually triggering kernel crash
@ 2012-07-31 12:54 Pedro Francisco
  2012-07-31 13:13 ` John W. Linville
  2012-08-07 10:22 ` Stanislaw Gruszka
  0 siblings, 2 replies; 10+ messages in thread
From: Pedro Francisco @ 2012-07-31 12:54 UTC
  To: ML linux-wireless

I've noticed in the past few days a pattern: sometimes nm-applet
starts showing empty bars for the signal strength.

Running the script:
sudo ifconfig wlan0 down; sleep 1
sudo rmmod hp_wmi; sudo rmmod iwl3945; sudo rmmod iwlegacy
sudo rmmod mac80211; sudo rmmod cfg80211
sleep 2; sudo rmmod rfkill; sync
sudo modprobe rfkill; sudo modprobe cfg80211; sudo modprobe mac80211
sudo modprobe iwlegacy; sudo modprobe iwl3945; sudo modprobe hp_wmi
sleep 1; sudo ifconfig wlan0 up

usually triggers a kernel crash. This has happened twice so far. I
tried it now for the third time but it didn't crash.
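
For anyone trying to reproduce this, the cycle can be wrapped in a function with a dry-run switch. This is only a sketch: the names DRY_RUN, IFACE, run, reverse_words and reload_wifi are made up for illustration, the module list and ordering are copied from the script in this message, and dry-run is the default because a real run may crash the machine.

```shell
#!/bin/sh
# Sketch of the unload/reload cycle. DRY_RUN defaults to 1 so that nothing
# is touched unless the caller explicitly sets DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
IFACE="${IFACE:-wlan0}"
# Unload order taken from the original script: users first, dependencies last.
MODULES="hp_wmi iwl3945 iwlegacy mac80211 cfg80211 rfkill"

# run: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# reverse_words: print the given words in reverse order (the load order).
reverse_words() {
    rev=""
    for w in "$@"; do
        rev="$w $rev"
    done
    echo "${rev% }"
}

# reload_wifi: bring the interface down, unload, reload, bring it back up.
reload_wifi() {
    run sudo ifconfig "$IFACE" down
    run sleep 1
    for m in $MODULES; do
        run sudo rmmod "$m"
    done
    run sync
    for m in $(reverse_words $MODULES); do
        run sudo modprobe "$m"
    done
    run sleep 1
    run sudo ifconfig "$IFACE" up
}
```

Calling reload_wifi as-is only prints the commands; setting DRY_RUN=0 first performs the real cycle.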

Logs (running with slub_debug ):
https://dl.dropbox.com/u/1332655/WiFi-issues/notTainted-cfg80211_mlme_disassoc-WARNING.log
https://dl.dropbox.com/u/1332655/WiFi-issues/alreadyTainted-debug_print_object-WARNING.log
(debug_print_object-WARNING was caused by running the above script
rmmoding things)
https://dl.dropbox.com/u/1332655/WiFi-issues/iw_dev_scan.log
https://dl.dropbox.com/u/1332655/WiFi-issues/gshell-wifiBars_empty.png

Any ideas on what is going on? Looking at other mails around here it
seems not to be driver specific, at least the cfg80211_mlme_disassoc
part.

Thanks in Advance,
-- 
Pedro


* Re: unloading WiFi modules is usually triggering kernel crash
  2012-07-31 12:54 unloading WiFi modules is usually triggering kernel crash Pedro Francisco
@ 2012-07-31 13:13 ` John W. Linville
  2012-08-07 10:22 ` Stanislaw Gruszka
  1 sibling, 0 replies; 10+ messages in thread
From: John W. Linville @ 2012-07-31 13:13 UTC
  To: Pedro Francisco; +Cc: ML linux-wireless, johannes

On Tue, Jul 31, 2012 at 01:54:52PM +0100, Pedro Francisco wrote:
> I've noticed in the past few days a pattern: sometimes nm-applet
> starts showing empty bars for the signal strength.
> 
> Running the script:
> sudo ifconfig wlan0 down; sleep 1
> sudo rmmod hp_wmi; sudo rmmod iwl3945; sudo rmmod iwlegacy; sudo rmmod
> mac80211; sudo rmmod cfg80211
> sleep 2; sudo rmmod rfkill; sync
> sudo modprobe rfkill; sudo modprobe cfg80211; sudo modprobe mac80211;
> sudo modprobe iwlegacy
> sudo modprobe iwl3945; sudo modprobe hp_wmi; sleep 1; sudo ifconfig wlan0 up
> 
> usually triggers a kernel crash. This has happened twice so far. I
> tried it now for the third time but it didn't crash.
> 
> Logs (running with slub_debug ):
> https://dl.dropbox.com/u/1332655/WiFi-issues/notTainted-cfg80211_mlme_disassoc-WARNING.log
> https://dl.dropbox.com/u/1332655/WiFi-issues/alreadyTainted-debug_print_object-WARNING.log
> (debug_print_object-WARNING was caused by running the above script
> rmmoding things)
> https://dl.dropbox.com/u/1332655/WiFi-issues/iw_dev_scan.log
> https://dl.dropbox.com/u/1332655/WiFi-issues/gshell-wifiBars_empty.png
> 
> Any ideas on what is going on? Looking at other mails around here it
> seems not to be driver specific, at least the cfg80211_mlme_disassoc
> part.

Looks the same as this one, FWIW...

	https://bugzilla.redhat.com/show_bug.cgi?id=834158

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.


* Re: unloading WiFi modules is usually triggering kernel crash
  2012-07-31 12:54 unloading WiFi modules is usually triggering kernel crash Pedro Francisco
  2012-07-31 13:13 ` John W. Linville
@ 2012-08-07 10:22 ` Stanislaw Gruszka
  2012-08-30 15:58   ` Pedro Francisco
  1 sibling, 1 reply; 10+ messages in thread
From: Stanislaw Gruszka @ 2012-08-07 10:22 UTC
  To: Pedro Francisco; +Cc: ML linux-wireless

On Tue, Jul 31, 2012 at 01:54:52PM +0100, Pedro Francisco wrote:
> I've noticed in the past few days a pattern: sometimes nm-applet
> starts showing empty bars for the signal strength.

This could be an RSSI reporting problem or maybe a NetworkManager issue.
Does the problem go away when you switch to an older or newer kernel?

> Running the script:
> sudo ifconfig wlan0 down; sleep 1
> sudo rmmod hp_wmi; sudo rmmod iwl3945; sudo rmmod iwlegacy; sudo rmmod
> mac80211; sudo rmmod cfg80211
> sleep 2; sudo rmmod rfkill; sync
> sudo modprobe rfkill; sudo modprobe cfg80211; sudo modprobe mac80211;
> sudo modprobe iwlegacy
> sudo modprobe iwl3945; sudo modprobe hp_wmi; sleep 1; sudo ifconfig wlan0 up

I ran a slightly modified script (I do not have hp_wmi.ko and rfkill.ko)
for a few hours and did not get any WARNING or crash. I used 3.5; can you
check whether the problem is also fixed on your system with 3.5 or newer?

Stanislaw


* Re: unloading WiFi modules is usually triggering kernel crash
  2012-08-07 10:22 ` Stanislaw Gruszka
@ 2012-08-30 15:58   ` Pedro Francisco
  2012-09-26 12:47     ` Pedro Francisco
  0 siblings, 1 reply; 10+ messages in thread
From: Pedro Francisco @ 2012-08-30 15:58 UTC
  To: Stanislaw Gruszka; +Cc: ML linux-wireless

On Tue, Aug 7, 2012 at 11:22 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
> On Tue, Jul 31, 2012 at 01:54:52PM +0100, Pedro Francisco wrote:
>> I've noticed in the past few days a pattern: sometimes nm-applet
>> starts showing empty bars for the signal strength.
>
> RSSI reporting problem or maybe NM issue. When you change kernel to
> older or newer does this problem go away ?
>
>> Running the script:
>> sudo ifconfig wlan0 down; sleep 1
>> sudo rmmod hp_wmi; sudo rmmod iwl3945; sudo rmmod iwlegacy; sudo rmmod
>> mac80211; sudo rmmod cfg80211
>> sleep 2; sudo rmmod rfkill; sync
>> sudo modprobe rfkill; sudo modprobe cfg80211; sudo modprobe mac80211;
>> sudo modprobe iwlegacy
>> sudo modprobe iwl3945; sudo modprobe hp_wmi; sleep 1; sudo ifconfig wlan0 up
>
> I run a bit modified script (I do not have hp_wmi.ko and rfkill.ko) for few
> hours, and did not get any WARNING/crash. I used 3.5, can you check if that
> problem is also fixed on your system on 3.5 or newer.

On 3.5.2-3.fc17.i686.PAE everything seems stable. The problem I had
described hasn't happened recently.
I guess it got fixed in the meantime.

Thank you for your time,
-- 
Pedro


* Re: unloading WiFi modules is usually triggering kernel crash
  2012-08-30 15:58   ` Pedro Francisco
@ 2012-09-26 12:47     ` Pedro Francisco
  2012-10-03 14:30       ` Stanislaw Gruszka
  0 siblings, 1 reply; 10+ messages in thread
From: Pedro Francisco @ 2012-09-26 12:47 UTC
  To: Stanislaw Gruszka; +Cc: ML linux-wireless

On Thu, Aug 30, 2012 at 4:58 PM, Pedro Francisco
<pedrogfrancisco@gmail.com> wrote:
> On Tue, Aug 7, 2012 at 11:22 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
>> On Tue, Jul 31, 2012 at 01:54:52PM +0100, Pedro Francisco wrote:
>>> I've noticed in the past few days a pattern: sometimes nm-applet
>>> starts showing empty bars for the signal strength.
>>
>> RSSI reporting problem or maybe NM issue. When you change kernel to
>> older or newer does this problem go away ?
>>
>>> Running the script:
>>> sudo ifconfig wlan0 down; sleep 1
>>> sudo rmmod hp_wmi; sudo rmmod iwl3945; sudo rmmod iwlegacy; sudo rmmod
>>> mac80211; sudo rmmod cfg80211
>>> sleep 2; sudo rmmod rfkill; sync
>>> sudo modprobe rfkill; sudo modprobe cfg80211; sudo modprobe mac80211;
>>> sudo modprobe iwlegacy
>>> sudo modprobe iwl3945; sudo modprobe hp_wmi; sleep 1; sudo ifconfig wlan0 up
>>
>> I run a bit modified script (I do not have hp_wmi.ko and rfkill.ko) for few
>> hours, and did not get any WARNING/crash. I used 3.5, can you check if that
>> problem is also fixed on your system on 3.5 or newer.
>
> On 3.5.2-3.fc17.i686.PAE everything seems stable. The problem I had
> described hasn't happened recently.
> I guess it got fixed in the meantime.

I was wrong, got it again.

So, to recap: once the network applet shows no signal, but only then,
removing the wireless modules triggers an unrecoverable kernel panic.
I still haven't compiled a relocatable x86 kernel to get a proper
backtrace using kexec/kdump, sorry.

I found something else as well. Notice this output of "iwconfig" when
everything is _normal_:
$ iwconfig wlan0
wlan0     IEEE 802.11abg  ESSID:"eduroam"
          Mode:Managed  Frequency:2.437 GHz  Access Point: B8:62:1F:XX:XX:XX
          Bit Rate=54 Mb/s   Tx-Power=15 dBm
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Power Management:off
          Link Quality=58/70  Signal level=-52 dBm
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:0   Missed beacon:0

When I have the "empty signal bars" issue:
$ iwconfig wlan0
wlan0     IEEE 802.11abg  ESSID:off/any
          Mode:Managed  Access Point: Not-Associated   Tx-Power=15 dBm
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Power Management:off

In case you're wondering, it is connected and streaming stuff :)
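
The difference between the two outputs can be checked mechanically. The sketch below assumes the `iwconfig` output format shown above; the function name assoc_state is made up for illustration. It only reports what the "Access Point:" field says and does not by itself prove that traffic is still flowing.

```shell
# assoc_state: read `iwconfig <iface>` output on stdin and print
# "associated", "not-associated" or "unknown", based only on the
# "Access Point:" field.
assoc_state() {
    awk '
        /Access Point: Not-Associated/           { state = "not-associated" }
        /Access Point: [0-9A-Fa-f][0-9A-Fa-f]:/  { state = "associated" }
        END { print (state == "" ? "unknown" : state) }
    '
}
```

Typical use would be `iwconfig wlan0 | assoc_state`. A result of "not-associated" while the connection is clearly passing traffic corresponds to the "empty signal bars" state described here.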

I can sometimes trigger it on purpose: I roam to a 5GHz AP of the same
ESS, cycle to 2GHz and back to 5GHz (using "wpa_cli roam
XX:XX:XX:XX:XX:XX"). If I get "SME: Authentication request to the
driver failed", then disabling NetworkManager (not wireless) and
re-enabling it will _probably_ produce the "empty signal bars" (I was
just able to trigger them after a clean boot).
So I'm guessing something gets corrupted, which is why reloading the
modules will crash.

I'm aware, from a patch to _iwlwifi_ (not iwl3945/iwlegacy) [1], that
2->5GHz roaming is not working very well on newer Intel wireless cards,
so it is worth considering that the same is happening here.

Also, here is some data collected two days ago about the "Invalid misc:"
counter. Is getting 10 "invalid misc" packets in 10 seconds normal?
The output of several runs of
VAL=`date`; VAL="$VAL $(iwconfig wlan0 |grep "Invalid misc")"; echo $VAL
follows:
Seg Set 24 15:06:36 WEST 2012 Tx excessive retries:5 Invalid misc:133 Missed beacon:0
Seg Set 24 15:06:46 WEST 2012 Tx excessive retries:5 Invalid misc:143 Missed beacon:0
Seg Set 24 15:07:00 WEST 2012 Tx excessive retries:5 Invalid misc:148 Missed beacon:0
Seg Set 24 15:21:46 WEST 2012 Tx excessive retries:22 Invalid misc:495 Missed beacon:0
Seg Set 24 15:24:41 WEST 2012 Tx excessive retries:24 Invalid misc:593 Missed beacon:0
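
To put a number on the growth, the counter deltas can be computed from samples like these. This is a sketch under two assumptions: each sample has an HH:MM:SS timestamp and an "Invalid misc:N" field on the same line, and all samples are from the same day (no midnight wrap). The function name misc_rate is made up for illustration.

```shell
# misc_rate: read timestamped "Invalid misc" samples on stdin and print,
# for each consecutive pair, the elapsed seconds and the counter increase.
misc_rate() {
    awk '
        {
            t = -1; n = -1
            for (i = 1; i <= NF; i++) {
                # HH:MM:SS timestamp, converted to seconds since midnight
                if ($i ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/) {
                    split($i, hms, ":")
                    t = hms[1] * 3600 + hms[2] * 60 + hms[3]
                }
                # second half of the "Invalid misc:N" field
                if ($i ~ /^misc:[0-9]+$/)
                    n = substr($i, 6) + 0
            }
            # skip lines without both values (for example wrapped remainders)
            if (t < 0 || n < 0) next
            if (have_prev && t > prev_t)
                printf "%ds: +%d invalid misc (%.2f/s)\n", t - prev_t, n - prev_n, (n - prev_n) / (t - prev_t)
            prev_t = t; prev_n = n; have_prev = 1
        }
    '
}
```

Fed the first two samples above, it reports an increase of 10 over 10 seconds, which matches the "10 in 10 seconds" observation.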


So, something is getting corrupted here. Do you want the full logs?

[1] http://thread.gmane.org/gmane.linux.kernel.wireless.general/89361/focus=89445

-- 
Pedro


* Re: unloading WiFi modules is usually triggering kernel crash
  2012-09-26 12:47     ` Pedro Francisco
@ 2012-10-03 14:30       ` Stanislaw Gruszka
  2012-10-09  9:14         ` Pedro Francisco
  0 siblings, 1 reply; 10+ messages in thread
From: Stanislaw Gruszka @ 2012-10-03 14:30 UTC
  To: Pedro Francisco; +Cc: ML linux-wireless, Johannes Berg

On Wed, Sep 26, 2012 at 01:47:18PM +0100, Pedro Francisco wrote:
> On Thu, Aug 30, 2012 at 4:58 PM, Pedro Francisco
> <pedrogfrancisco@gmail.com> wrote:
> > On Tue, Aug 7, 2012 at 11:22 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
> >> On Tue, Jul 31, 2012 at 01:54:52PM +0100, Pedro Francisco wrote:
> >>> I've noticed in the past few days a pattern: sometimes nm-applet
> >>> starts showing empty bars for the signal strength.
> >>
> >> RSSI reporting problem or maybe NM issue. When you change kernel to
> >> older or newer does this problem go away ?
> >>
> >>> Running the script:
> >>> sudo ifconfig wlan0 down; sleep 1
> >>> sudo rmmod hp_wmi; sudo rmmod iwl3945; sudo rmmod iwlegacy; sudo rmmod
> >>> mac80211; sudo rmmod cfg80211
> >>> sleep 2; sudo rmmod rfkill; sync
> >>> sudo modprobe rfkill; sudo modprobe cfg80211; sudo modprobe mac80211;
> >>> sudo modprobe iwlegacy
> >>> sudo modprobe iwl3945; sudo modprobe hp_wmi; sleep 1; sudo ifconfig wlan0 up
> >>
> >> I run a bit modified script (I do not have hp_wmi.ko and rfkill.ko) for few
> >> hours, and did not get any WARNING/crash. I used 3.5, can you check if that
> >> problem is also fixed on your system on 3.5 or newer.
> >
> > On 3.5.2-3.fc17.i686.PAE everything seems stable. The problem I had
> > described hasn't happened recently.
> > I guess it got fixed in the meantime.
> 
> I was wrong, got it again.
> 
> So, to recap: once the network applet shows no signal, but only then,
> removing the wireless modules triggers an unrecoverable kernel panic.
> I still haven't compiled a relocatable x86 kernel to get a proper
> backtrace using kexec/kdump, sorry.
> 
> I found something else as well. Notice this output of "iwconfig" when
> everything is _normal_:
> $ iwconfig wlan0
> wlan0     IEEE 802.11abg  ESSID:"eduroam"
>           Mode:Managed  Frequency:2.437 GHz  Access Point: B8:62:1F:XX:XX:XX
>           Bit Rate=54 Mb/s   Tx-Power=15 dBm
>           Retry  long limit:7   RTS thr:off   Fragment thr:off
>           Power Management:off
>           Link Quality=58/70  Signal level=-52 dBm
>           Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
>           Tx excessive retries:0  Invalid misc:0   Missed beacon:0
> 
> When I have the "empty signal bars" issue:
> $ iwconfig wlan0
> wlan0     IEEE 802.11abg  ESSID:off/any
>           Mode:Managed  Access Point: Not-Associated   Tx-Power=15 dBm
>           Retry  long limit:7   RTS thr:off   Fragment thr:off
>           Power Management:off
> 
> In case you're wondering, it is connected and streaming stuff :)
> 
> I can sometimes trigger it on purpose: I just have to roam to a 5GHz
> AP of the same ESS, cycle around 2GHz and back to 5GHz (using wpa_cli
> roam XX:XX:XX:XX:XX ). If I get "SME: Authentication request to the
> driver failed", then disabling NetworkManager (not wireless) and
> reenabling will _probably_ get the "empty signal bars" (I was just
> able to trigger the "empty signal bars" now after a clean boot).
> So I'm guessing something gets corrupted, which is why reloading the
> modules will crash.

We do not stop mac80211 timers on module unload. I reproduced the
warnings below with iwlwifi on a 3.5 kernel with DEBUG_OBJECTS enabled.
I forced roaming many times and then ran "modprobe -r iwlwifi".
Unfortunately, those steps do not trigger the warnings every time; they
happened just once.

iwlwifi 0000:02:00.0: ACTIVATE a non DRIVER active station id 0 addr 6c:50:4d:3f:79:73
------------[ cut here ]------------
WARNING: at lib/debugobjects.c:261 debug_print_object+0x8e/0xb0()
Hardware name: SandyBridge Platform
ODEBUG: free active (active state 0) object type: timer_list hint: ieee80211_sta_conn_mon_timer+0x0/0x40 [mac80211]
Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq
freq_table mperf ipv6 uinput arc4 sg iwlwifi(-) mac80211 cfg80211 rfkill
coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr
lpc_ich mfd_core i2c_i801 e1000e ext4 mbcache jbd2 sd_mod crc_t10dif
sr_mod cdrom aesni_intel cryptd aes_x86_64 aes_generic ahci libahci i915
drm_kms_helper drm i2c_algo_bit i2c_core video dm_mirror dm_region_hash
dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 3064, comm: modprobe Not tainted 3.5.0 #1
Call Trace:
 [<ffffffff810535af>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff810536a6>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff812901be>] debug_print_object+0x8e/0xb0
 [<ffffffffa03a09b0>] ? ieee80211_chswitch_timer+0x40/0x40 [mac80211]
 [<ffffffff81290a0d>] __debug_check_no_obj_freed+0x10d/0x200
 [<ffffffff81290b1d>] debug_check_no_obj_freed+0x1d/0x30
 [<ffffffff8117a2b0>] kfree+0xc0/0x330
 [<ffffffff810b9083>] ? __lock_release+0x133/0x1a0
 [<ffffffff815555f0>] ? _raw_spin_unlock_irqrestore+0x40/0x80
 [<ffffffff814957c4>] netdev_release+0x44/0x60
 [<ffffffff813704b7>] device_release+0x27/0xa0
 [<ffffffff8127da42>] kobject_cleanup+0x82/0x1b0
 [<ffffffff8127db7d>] kobject_release+0xd/0x10
 [<ffffffff8127d8cc>] kobject_put+0x2c/0x60
 [<ffffffff8147e371>] netdev_run_todo+0x101/0x180
 [<ffffffff8148f5ae>] rtnl_unlock+0xe/0x10
 [<ffffffffa0366178>] ieee80211_unregister_hw+0x58/0x120 [mac80211]
 [<ffffffffa040912b>] iwlagn_mac_unregister+0x2b/0x40 [iwlwifi]
 [<ffffffffa03fdf59>] iwl_op_mode_dvm_stop+0x49/0xf0 [iwlwifi]
 [<ffffffffa041f730>] iwl_drv_stop+0x40/0x60 [iwlwifi]
 [<ffffffffa0430a39>] iwl_pci_remove+0x25/0x3c [iwlwifi]
 [<ffffffff812aafc2>] pci_device_remove+0x52/0x120
 [<ffffffff813741cc>] __device_release_driver+0x7c/0xe0
 [<ffffffff81374308>] driver_detach+0xd8/0xe0
 [<ffffffff81372f61>] bus_remove_driver+0x91/0x110
 [<ffffffff81374fd2>] driver_unregister+0x62/0xa0
 [<ffffffff812ab2b4>] pci_unregister_driver+0x44/0xa0
 [<ffffffffa041f3d5>] iwl_pci_unregister_driver+0x15/0x20 [iwlwifi]
 [<ffffffffa0430a01>] iwl_exit+0x9/0x1c [iwlwifi]
 [<ffffffff810c50f1>] sys_delete_module+0x1d1/0x2c0
 [<ffffffff81555855>] ? retint_swapgs+0x13/0x1b
 [<ffffffff810e169c>] ? __audit_syscall_entry+0xcc/0x210
 [<ffffffff812896ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8155de69>] system_call_fastpath+0x16/0x1b
---[ end trace 8070f580fc119b8b ]---
------------[ cut here ]------------
WARNING: at lib/debugobjects.c:261 debug_print_object+0x8e/0xb0()
Hardware name: SandyBridge Platform
ODEBUG: free active (active state 0) object type: timer_list hint: ieee80211_sta_bcn_mon_timer+0x0/0x40 [mac80211]
Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq
freq_table mperf ipv6 uinput arc4 sg iwlwifi(-) mac80211 cfg80211 rfkill
coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr
lpc_ich mfd_core i2c_i801 e1000e ext4 mbcache jbd2 sd_mod crc_t10dif
sr_mod cdrom aesni_intel cryptd aes_x86_64 aes_generic ahci libahci i915
drm_kms_helper drm i2c_algo_bit i2c_core video dm_mirror dm_region_hash
dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 3064, comm: modprobe Tainted: G        W    3.5.0 #1
Call Trace:
 [<ffffffff810535af>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff810536a6>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff812901be>] debug_print_object+0x8e/0xb0
 [<ffffffffa03a09f0>] ? ieee80211_sta_conn_mon_timer+0x40/0x40 [mac80211]
 [<ffffffff81290a0d>] __debug_check_no_obj_freed+0x10d/0x200
 [<ffffffff81290b1d>] debug_check_no_obj_freed+0x1d/0x30
 [<ffffffff8117a2b0>] kfree+0xc0/0x330
 [<ffffffff810b9083>] ? __lock_release+0x133/0x1a0
 [<ffffffff815555f0>] ? _raw_spin_unlock_irqrestore+0x40/0x80
 [<ffffffff814957c4>] netdev_release+0x44/0x60
 [<ffffffff813704b7>] device_release+0x27/0xa0
 [<ffffffff8127da42>] kobject_cleanup+0x82/0x1b0
 [<ffffffff8127db7d>] kobject_release+0xd/0x10
 [<ffffffff8127d8cc>] kobject_put+0x2c/0x60
 [<ffffffff8147e371>] netdev_run_todo+0x101/0x180
 [<ffffffff8148f5ae>] rtnl_unlock+0xe/0x10
 [<ffffffffa0366178>] ieee80211_unregister_hw+0x58/0x120 [mac80211]
 [<ffffffffa040912b>] iwlagn_mac_unregister+0x2b/0x40 [iwlwifi]
 [<ffffffffa03fdf59>] iwl_op_mode_dvm_stop+0x49/0xf0 [iwlwifi]
 [<ffffffffa041f730>] iwl_drv_stop+0x40/0x60 [iwlwifi]
 [<ffffffffa0430a39>] iwl_pci_remove+0x25/0x3c [iwlwifi]
 [<ffffffff812aafc2>] pci_device_remove+0x52/0x120
 [<ffffffff813741cc>] __device_release_driver+0x7c/0xe0
 [<ffffffff81374308>] driver_detach+0xd8/0xe0
 [<ffffffff81372f61>] bus_remove_driver+0x91/0x110
 [<ffffffff81374fd2>] driver_unregister+0x62/0xa0
 [<ffffffff812ab2b4>] pci_unregister_driver+0x44/0xa0
 [<ffffffffa041f3d5>] iwl_pci_unregister_driver+0x15/0x20 [iwlwifi]
 [<ffffffffa0430a01>] iwl_exit+0x9/0x1c [iwlwifi]
 [<ffffffff810c50f1>] sys_delete_module+0x1d1/0x2c0
 [<ffffffff81555855>] ? retint_swapgs+0x13/0x1b
 [<ffffffff810e169c>] ? __audit_syscall_entry+0xcc/0x210
 [<ffffffff812896ce>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8155de69>] system_call_fastpath+0x16/0x1b
---[ end trace 8070f580fc119b8c ]---
Bridge firewalling registered
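
When comparing several such logs, it helps to pull out just the timer callbacks that debugobjects names in its "hint:" field. The sketch below assumes raw (unquoted) kernel log text on stdin and copes with the hint being wrapped onto the following line; the function name odebug_hints is made up for illustration.

```shell
# odebug_hints: list the callback named in each "ODEBUG: free active"
# warning, with a count of how often each one appears.
odebug_hints() {
    awk '
        /ODEBUG: free active.*hint:/ {
            hint = $0
            sub(/.*hint:[ \t]*/, "", hint)
            # If nothing follows "hint:", the line was wrapped and the
            # symbol is at the start of the next line.
            if (hint == "" && (getline wrapped) > 0)
                hint = wrapped
            sub(/^[ \t]+/, "", hint)
            # Keep only the symbol name, dropping "+0x0/0x40 [module]".
            sub(/[ \t+].*/, "", hint)
            print hint
        }
    ' | sort | uniq -c
}
```

Run against the two warnings above, it should list ieee80211_sta_conn_mon_timer and ieee80211_sta_bcn_mon_timer once each.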

> Also, note some info, collected two days ago, relative to "Invalid
> misc:" is getting 10 "invalid misc" packets in 10 seconds normal?
> Several 'VAL=`date`; VAL="$VAL $(iwconfig wlan0 |grep "Invalid
> misc")"; echo $VAL' follow:
> Seg Set 24 15:06:36 WEST 2012 Tx excessive retries:5 Invalid misc:133
> Missed beacon:0
> Seg Set 24 15:06:46 WEST 2012 Tx excessive retries:5 Invalid misc:143
> Missed beacon:0
> Seg Set 24 15:07:00 WEST 2012 Tx excessive retries:5 Invalid misc:148
> Missed beacon:0
> Seg Set 24 15:21:46 WEST 2012 Tx excessive retries:22 Invalid misc:495
> Missed beacon:0
> Seg Set 24 15:24:41 WEST 2012 Tx excessive retries:24 Invalid misc:593
> Missed beacon:0

I see a lot of that. It can be caused by a noisy radio environment, but
it can also be a firmware/driver bug. Unfortunately, bugs of that kind
are not easy to fix.

Stanislaw


* Re: unloading WiFi modules is usually triggering kernel crash
  2012-10-03 14:30       ` Stanislaw Gruszka
@ 2012-10-09  9:14         ` Pedro Francisco
  2012-10-12 12:13           ` Stanislaw Gruszka
  0 siblings, 1 reply; 10+ messages in thread
From: Pedro Francisco @ 2012-10-09  9:14 UTC
  To: Stanislaw Gruszka; +Cc: ML linux-wireless, Johannes Berg

On Wed, Oct 3, 2012 at 3:30 PM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
> On Wed, Sep 26, 2012 at 01:47:18PM +0100, Pedro Francisco wrote:
>> On Thu, Aug 30, 2012 at 4:58 PM, Pedro Francisco
>> <pedrogfrancisco@gmail.com> wrote:
>> > On Tue, Aug 7, 2012 at 11:22 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
>> >> On Tue, Jul 31, 2012 at 01:54:52PM +0100, Pedro Francisco wrote:
>> >>> I've noticed in the past few days a pattern: sometimes nm-applet
>> >>> starts showing empty bars for the signal strength.
>> >>
>> >> RSSI reporting problem or maybe NM issue. When you change kernel to
>> >> older or newer does this problem go away ?
>> >>
>> >>> Running the script:
>> >>> sudo ifconfig wlan0 down; sleep 1
>> >>> sudo rmmod hp_wmi; sudo rmmod iwl3945; sudo rmmod iwlegacy; sudo rmmod
>> >>> mac80211; sudo rmmod cfg80211
>> >>> sleep 2; sudo rmmod rfkill; sync
>> >>> sudo modprobe rfkill; sudo modprobe cfg80211; sudo modprobe mac80211;
>> >>> sudo modprobe iwlegacy
>> >>> sudo modprobe iwl3945; sudo modprobe hp_wmi; sleep 1; sudo ifconfig wlan0 up
>> >>
>> >> I run a bit modified script (I do not have hp_wmi.ko and rfkill.ko) for few
>> >> hours, and did not get any WARNING/crash. I used 3.5, can you check if that
>> >> problem is also fixed on your system on 3.5 or newer.
>> >
>> > On 3.5.2-3.fc17.i686.PAE everything seems stable. The problem I had
>> > described hasn't happened recently.
>> > I guess it got fixed in the meantime.
>>
>> I was wrong, got it again.
>>
>> So, to recap: once the network applet shows no signal, but only then,
>> removing the wireless modules triggers an unrecoverable kernel panic.
>> I still haven't compiled a relocatable x86 kernel to get a proper
>> backtrace using kexec/kdump, sorry.
>>
>> I found something else as well. Notice this output of "iwconfig" when
>> everything is _normal_:
>> $ iwconfig wlan0
>> wlan0     IEEE 802.11abg  ESSID:"eduroam"
>>           Mode:Managed  Frequency:2.437 GHz  Access Point: B8:62:1F:XX:XX:XX
>>           Bit Rate=54 Mb/s   Tx-Power=15 dBm
>>           Retry  long limit:7   RTS thr:off   Fragment thr:off
>>           Power Management:off
>>           Link Quality=58/70  Signal level=-52 dBm
>>           Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
>>           Tx excessive retries:0  Invalid misc:0   Missed beacon:0
>>
>> When I have the "empty signal bars" issue:
>> $ iwconfig wlan0
>> wlan0     IEEE 802.11abg  ESSID:off/any
>>           Mode:Managed  Access Point: Not-Associated   Tx-Power=15 dBm
>>           Retry  long limit:7   RTS thr:off   Fragment thr:off
>>           Power Management:off
>>
>> In case you're wondering, it is connected and streaming stuff :)
>>
>> I can sometimes trigger it on purpose: I just have to roam to a 5GHz
>> AP of the same ESS, cycle around 2GHz and back to 5GHz (using wpa_cli
>> roam XX:XX:XX:XX:XX ). If I get "SME: Authentication request to the
>> driver failed", then disabling NetworkManager (not wireless) and
>> reenabling will _probably_ get the "empty signal bars" (I was just
>> able to trigger the "empty signal bars" now after a clean boot).
>> So I'm guessing something gets corrupted, which is why reloading the
>> modules will crash.
>
> We do not stop mac80211 timers on module unload. I reproduced below
> warnings with iwlwifi on 3.5 kernel with DEBUG_OBJECTS enabled.
> I forced roaming many times, and then do "modprobe -r iwlwifi".
> Unfortunately those steps do not trigger warnings anytime, they
> happened just once.
>
> [...]
>

Hi!
I was finally able to compile a relocatable kernel. Here's what I got
after a crash on iwlegacy module removal:
# crash vmcore /usr/lib/debug/lib/modules/`uname -r`/vmlinux

crash 6.0.8-1.fc18 [note: I'm on FC17 but had to install FC18's crash
to work around a log structure change in the 3.5 kernel]
(...)
This GDB was configured as "i686-pc-linux-gnu"...


============ CRASH 1 ============
      KERNEL: /usr/lib/debug/lib/modules/3.5.4-2.pedro.fc17.i686.PAE/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 2
        DATE: Tue Oct  9 09:17:35 2012
      UPTIME: 00:12:47
LOAD AVERAGE: 0.50, 0.62, 0.60
       TASKS: 322
    NODENAME: s2
     RELEASE: 3.5.4-2.pedro.fc17.i686.PAE
     VERSION: #1 SMP Mon Oct 8 23:15:44 WEST 2012
     MACHINE: i686  (1496 Mhz)
      MEMORY: 2 GB
       PANIC: "Oops: 0000 [#1] SMP " (check log for details)
         PID: 0
     COMMAND: "swapper/1"
        TASK: f4104240  (1 of 2)  [THREAD_INFO: f4146000]
         CPU: 1
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 0      TASK: f4104240  CPU: 1   COMMAND: "swapper/1"
 #0 [f4147db4] crash_kexec at c04a7d59
 #1 [f4147e04] timerqueue_add at c0675503
 #2 [f4147e14] ktime_get at c04921ee
 #3 [f4147e30] bad_area_nosemaphore at c0958328
 #4 [f4147e3c] do_page_fault at c0964c25
 #5 [f4147eb8] error_code (via page_fault) at c0961eb1
    EAX: 6b6b6b6b  EBX: 00072420  ECX: 00000001  EDX: f4178930  EBP: f4147f30
    DS:  007b      ESI: 00000024  ES:  007b      EDI: 00072420  GS:  00e0
    CS:  0060      EIP: c0457993  ERR: ffffffff  EFLAGS: 00010003
 #6 [f4147eec] get_next_timer_interrupt at c0457993
 #7 [f4147f34] tick_nohz_stop_sched_tick.isra.11 at c049a19a
 #8 [f4147f78] tick_nohz_idle_enter at c049a661
 #9 [f4147f80] cpu_idle at c04195da
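
A note on reading CRASH 1: EAX holds 6b6b6b6b, and 0x6b is the byte the slab
allocator writes over freed objects when poisoning is enabled (POISON_FREE in
include/linux/poison.h). If poisoning was active for this boot, that would
suggest get_next_timer_interrupt() followed a pointer read out of memory that
had already been freed. A minimal sketch of that check in plain C; the helper
name is hypothetical and this is not kernel code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte written over freed slab objects when poisoning is enabled
 * (POISON_FREE in include/linux/poison.h). */
#define POISON_FREE_BYTE 0x6bu

/* Hypothetical helper: true if every byte of a 32-bit register value
 * equals the free-poison byte, as in "EAX: 6b6b6b6b". */
static bool looks_like_freed_slab_word(uint32_t value)
{
	for (int i = 0; i < 4; i++) {
		if (((value >> (8 * i)) & 0xffu) != POISON_FREE_BYTE)
			return false;
	}
	return true;
}
```

Of the registers in the exception frame, only EAX (6b6b6b6b) matches this
pattern; EBX (00072420) and EDX (f4178930) do not.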


============ CRASH 2 ============

      KERNEL: /usr/lib/debug/lib/modules/3.5.4-2.pedro.fc17.i686.PAE/vmlinux
    DUMPFILE: vmcore
        CPUS: 2
        DATE: Tue Oct  9 09:29:35 2012
      UPTIME: 00:10:22
LOAD AVERAGE: 0.30, 0.78, 0.67
       TASKS: 323
    NODENAME: s2
     RELEASE: 3.5.4-2.pedro.fc17.i686.PAE
     VERSION: #1 SMP Mon Oct 8 23:15:44 WEST 2012
     MACHINE: i686  (1496 Mhz)
      MEMORY: 2 GB
       PANIC: "kernel BUG at kernel/timer.c:1091!"
         PID: 6563
     COMMAND: "rpm" <-- ?
        TASK: eaa5dcc0  [THREAD_INFO: f414a000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 6563   TASK: eaa5dcc0  CPU: 0   COMMAND: "rpm"
bt: cannot resolve stack trace:
 #0 [f414bd58] __schedule at c095fba6
 #1 [f414bdd4] sched_clock_local at c047a56d
bt: text symbols on stack:
    [f414bd5c] kmap_atomic_prot at c0441244
    [f414bd70] __kunmap_atomic at c04410dd
    [f414bd80] get_page_from_freelist at c0504e40
    [f414bdcc] sched_clock at c0417a28
    [f414bdd4] sched_clock_local at c047a572
    [f414be28] update_curr at c047cdb2
    [f414be60] clear_nohz_tick_stopped.part.37 at c0958f63
    [f414be6c] trigger_load_balance at c047ff73
    [f414be88] scheduler_tick at c04774e5
    [f414beac] timerqueue_add at c0675508
    [f414bec0] ktime_get at c04921f0
    [f414bed4] lapic_next_event at c042f75b
    [f414bedc] clockevents_program_event at c049877d
    [f414bef4] tick_program_event at c0499a79
    [f414bf04] hrtimer_interrupt at c046bcc8
    [f414bf54] irq_exit at c045001d
    [f414bf5c] smp_apic_timer_interrupt at c042fdbe
    [f414bf74] apic_timer_interrupt at c0961c85
    [f414bfa8] sysenter_past_esp at c0968322
bt: possible exception frame:
  USER-MODE EXCEPTION FRAME AT f414bfb4:
    EAX: 0000002d  EBX: 09fba000  ECX: 45a4bff4  EDX: 09fba000
    DS:  007b      ESI: 09f99000  ES:  007b      EDI: 09fba000
    SS:  007b      ESP: bff95804  EBP: bff95804  GS:  0033
    CS:  0073      EIP: b7720424  ERR: 0000002d  EFLAGS: 00000202



So, I'm guessing this means it is related to what you found on iwlwifi
(even if I'm on iwlegacy)?
The crash kernel crashed again, but I can add a script to try to
recover dmesg; I believe slub_debug caught something as well.

-- 
Pedro


* Re: unloading WiFi modules is usually triggering kernel crash
  2012-10-09  9:14         ` Pedro Francisco
@ 2012-10-12 12:13           ` Stanislaw Gruszka
  2012-10-15 11:03             ` Johannes Berg
  2012-10-15 15:48             ` Pedro Francisco
  0 siblings, 2 replies; 10+ messages in thread
From: Stanislaw Gruszka @ 2012-10-12 12:13 UTC (permalink / raw)
  To: Pedro Francisco; +Cc: ML linux-wireless, Johannes Berg

On Tue, Oct 09, 2012 at 10:14:40AM +0100, Pedro Francisco wrote:
> So, I'm guessing this means it is related to what you found on iwlwifi
> (even if I'm on iwlegacy)?

Yes, this seems to be a cfg80211 problem. I think the crash happens because
cfg80211 is in the disassociated state (i.e. wdev->current_bss is NULL) while
mac80211 erroneously stays in the associated state. So when the module is
unloaded, cfg80211_mlme_down() does not call ieee80211_deauth().

I think this state mismatch is caused by wrong behaviour in
__cfg80211_mlme_deauth(). The patch below tries to correct that.
Can you check whether it prevents the crash? In my environment I cannot
reproduce this problem reliably.
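
To make the suspected divergence concrete, here is a toy userspace model of
it. This is only a sketch: the struct and the toy_* functions are made up for
illustration, each layer is reduced to a single flag, and the way cfg80211
learns about a completed deauth is folded into one step.

```c
#include <stdbool.h>

/* Toy model, not kernel code: one flag per layer. */
struct toy_link {
	bool cfg_has_current_bss; /* stands in for wdev->current_bss != NULL */
	bool mac_associated;      /* stands in for ifmgd->associated != NULL */
	int deauth_frames_sent;   /* frames that would go out on the air */
};

/* Stand-in for the mac80211 deauth path: the association is always torn
 * down, a frame is only sent when tx is true. The model assumes cfg80211
 * drops its current BSS as part of the same step. */
static void toy_mac_deauth(struct toy_link *l, bool tx)
{
	if (tx)
		l->deauth_frames_sent++;
	l->mac_associated = false;
	l->cfg_has_current_bss = false;
}

/* Behaviour before the patch: a local state change is handled in cfg80211
 * alone, so the lower layer is never told about it. */
static void toy_deauth_old(struct toy_link *l, bool local_state_change)
{
	if (local_state_change) {
		l->cfg_has_current_bss = false;
		return;
	}
	toy_mac_deauth(l, true);
}

/* Behaviour with the patch: always call down, only suppress the frame. */
static void toy_deauth_new(struct toy_link *l, bool local_state_change)
{
	toy_mac_deauth(l, !local_state_change);
}
```

Starting from an associated link, toy_deauth_old(&l, true) leaves
mac_associated set while cfg_has_current_bss is cleared, which is the
inconsistent state described above. toy_deauth_new(&l, true) clears both
flags and sends no frame.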

Thanks
Stanislaw

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index ab78b53..9b99b60 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -1218,6 +1218,7 @@ struct cfg80211_deauth_request {
 	const u8 *ie;
 	size_t ie_len;
 	u16 reason_code;
+	bool local_state_change;
 };
 
 /**
diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index e714ed8..e510a33 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -3549,6 +3549,7 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
 {
 	struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
 	u8 frame_buf[IEEE80211_DEAUTH_FRAME_LEN];
+	bool tx = !req->local_state_change;
 
 	mutex_lock(&ifmgd->mtx);
 
@@ -3565,12 +3566,12 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
 	if (ifmgd->associated &&
 	    ether_addr_equal(ifmgd->associated->bssid, req->bssid)) {
 		ieee80211_set_disassoc(sdata, IEEE80211_STYPE_DEAUTH,
-				       req->reason_code, true, frame_buf);
+				       req->reason_code, tx, frame_buf);
 	} else {
 		drv_mgd_prepare_tx(sdata->local, sdata);
 		ieee80211_send_deauth_disassoc(sdata, req->bssid,
 					       IEEE80211_STYPE_DEAUTH,
-					       req->reason_code, true,
+					       req->reason_code, tx,
 					       frame_buf);
 	}
 
diff --git a/net/wireless/mlme.c b/net/wireless/mlme.c
index 3df195a..4954010 100644
--- a/net/wireless/mlme.c
+++ b/net/wireless/mlme.c
@@ -457,21 +457,11 @@ int __cfg80211_mlme_deauth(struct cfg80211_registered_device *rdev,
 		.reason_code = reason,
 		.ie = ie,
 		.ie_len = ie_len,
+		.local_state_change = local_state_change,
 	};
 
 	ASSERT_WDEV_LOCK(wdev);
 
-	if (local_state_change) {
-		if (wdev->current_bss &&
-		    ether_addr_equal(wdev->current_bss->pub.bssid, bssid)) {
-			cfg80211_unhold_bss(wdev->current_bss);
-			cfg80211_put_bss(&wdev->current_bss->pub);
-			wdev->current_bss = NULL;
-		}
-
-		return 0;
-	}
-
 	return rdev->ops->deauth(&rdev->wiphy, dev, &req);
 }
 


* Re: unloading WiFi modules is usually triggering kernel crash
  2012-10-12 12:13           ` Stanislaw Gruszka
@ 2012-10-15 11:03             ` Johannes Berg
  2012-10-15 15:48             ` Pedro Francisco
  1 sibling, 0 replies; 10+ messages in thread
From: Johannes Berg @ 2012-10-15 11:03 UTC (permalink / raw)
  To: Stanislaw Gruszka; +Cc: Pedro Francisco, ML linux-wireless

On Fri, 2012-10-12 at 14:13 +0200, Stanislaw Gruszka wrote:
> On Tue, Oct 09, 2012 at 10:14:40AM +0100, Pedro Francisco wrote:
> > So, I'm guessing this means it is related to what you found on iwlwifi
> > (even if I'm on iwlegacy)?
> 
> Yes, this seems to be a cfg80211 problem. I think the crash happens because
> cfg80211 is in the disassociated state (i.e. wdev->current_bss is NULL) while
> mac80211 erroneously stays in the associated state. So when the module is
> unloaded, cfg80211_mlme_down() does not call ieee80211_deauth().
> 
> I think this state mismatch is caused by wrong behaviour in
> __cfg80211_mlme_deauth(). The patch below tries to correct that.
> Can you check whether it prevents the crash? In my environment I cannot
> reproduce this problem reliably.

Ugh, yeah, what was I thinking with the code below ... ??


> diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
> index ab78b53..9b99b60 100644
> --- a/include/net/cfg80211.h
> +++ b/include/net/cfg80211.h
> @@ -1218,6 +1218,7 @@ struct cfg80211_deauth_request {
>  	const u8 *ie;
>  	size_t ie_len;
>  	u16 reason_code;
> +	bool local_state_change;
>  };
>  
>  /**
> diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
> index e714ed8..e510a33 100644
> --- a/net/mac80211/mlme.c
> +++ b/net/mac80211/mlme.c
> @@ -3549,6 +3549,7 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
>  {
>  	struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
>  	u8 frame_buf[IEEE80211_DEAUTH_FRAME_LEN];
> +	bool tx = !req->local_state_change;
>  
>  	mutex_lock(&ifmgd->mtx);
>  
> @@ -3565,12 +3566,12 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
>  	if (ifmgd->associated &&
>  	    ether_addr_equal(ifmgd->associated->bssid, req->bssid)) {
>  		ieee80211_set_disassoc(sdata, IEEE80211_STYPE_DEAUTH,
> -				       req->reason_code, true, frame_buf);
> +				       req->reason_code, tx, frame_buf);
>  	} else {
>  		drv_mgd_prepare_tx(sdata->local, sdata);
>  		ieee80211_send_deauth_disassoc(sdata, req->bssid,
>  					       IEEE80211_STYPE_DEAUTH,
> -					       req->reason_code, true,
> +					       req->reason_code, tx,
>  					       frame_buf);
>  	}
>  
> diff --git a/net/wireless/mlme.c b/net/wireless/mlme.c
> index 3df195a..4954010 100644
> --- a/net/wireless/mlme.c
> +++ b/net/wireless/mlme.c
> @@ -457,21 +457,11 @@ int __cfg80211_mlme_deauth(struct cfg80211_registered_device *rdev,
>  		.reason_code = reason,
>  		.ie = ie,
>  		.ie_len = ie_len,
> +		.local_state_change = local_state_change,
>  	};
>  
>  	ASSERT_WDEV_LOCK(wdev);
>  
> -	if (local_state_change) {
> -		if (wdev->current_bss &&
> -		    ether_addr_equal(wdev->current_bss->pub.bssid, bssid)) {
> -			cfg80211_unhold_bss(wdev->current_bss);
> -			cfg80211_put_bss(&wdev->current_bss->pub);
> -			wdev->current_bss = NULL;
> -		}
> -
> -		return 0;
> -	}
> -


This looks fine to me. Probably needs Cc: stable?

Then again, maybe if the deauth request is for a BSS that *isn't* the
current BSS we should "swallow" it in cfg80211? IOW, something like

if (local_state_change && (!wdev->current_bss ||
			   !ether_addr_equal(...)))
	return 0;

since neither mac80211 nor cfg80211 track authentication... Doesn't
matter much though.
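
Spelled out as something that compiles outside the kernel, the check could
look roughly like the sketch below. The function name and TOY_ETH_ALEN are
made up for illustration, and it assumes the elided ether_addr_equal()
arguments compare the current BSS's BSSID against the one in the request:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_ETH_ALEN 6 /* bytes in a MAC address / BSSID */

/* Hypothetical userspace stand-in for the suggested check.
 * current_bssid is NULL when there is no current BSS.
 * Returns true when the request should be swallowed, i.e. when the
 * caller would return 0 without calling into the driver. */
static bool toy_swallow_local_deauth(bool local_state_change,
				     const unsigned char *current_bssid,
				     const unsigned char *req_bssid)
{
	if (!local_state_change)
		return false;
	if (current_bssid == NULL)
		return true;
	return memcmp(current_bssid, req_bssid, TOY_ETH_ALEN) != 0;
}
```

With this, a local-only deauth for the current BSS still goes down to the
driver, while one for any other BSSID (or with no current BSS at all) is
dropped early.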

johannes



* Re: unloading WiFi modules is usually triggering kernel crash
  2012-10-12 12:13           ` Stanislaw Gruszka
  2012-10-15 11:03             ` Johannes Berg
@ 2012-10-15 15:48             ` Pedro Francisco
  1 sibling, 0 replies; 10+ messages in thread
From: Pedro Francisco @ 2012-10-15 15:48 UTC (permalink / raw)
  To: Stanislaw Gruszka; +Cc: ML linux-wireless, Johannes Berg

[-- Attachment #1: Type: text/plain, Size: 3578 bytes --]

On Fri, Oct 12, 2012 at 1:13 PM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
> On Tue, Oct 09, 2012 at 10:14:40AM +0100, Pedro Francisco wrote:
>> So, I'm guessing this means it is related to what you found on iwlwifi
>> (even if I'm on iwlegacy)?
>
> Yes, this seems to be a cfg80211 problem. I think the crash happens because
> cfg80211 is in the disassociated state (i.e. wdev->current_bss is NULL) while
> mac80211 erroneously stays in the associated state. So when the module is
> unloaded, cfg80211_mlme_down() does not call ieee80211_deauth().
>
> I think this state mismatch is caused by wrong behaviour in
> __cfg80211_mlme_deauth(). The patch below tries to correct that.
> Can you check whether it prevents the crash? In my environment I cannot
> reproduce this problem reliably.
>
> Thanks
> Stanislaw
>
> diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
> index ab78b53..9b99b60 100644
> --- a/include/net/cfg80211.h
> +++ b/include/net/cfg80211.h
> @@ -1218,6 +1218,7 @@ struct cfg80211_deauth_request {
>         const u8 *ie;
>         size_t ie_len;
>         u16 reason_code;
> +       bool local_state_change;
>  };
>
>  /**
> diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
> index e714ed8..e510a33 100644
> --- a/net/mac80211/mlme.c
> +++ b/net/mac80211/mlme.c
> @@ -3549,6 +3549,7 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
>  {
>         struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
>         u8 frame_buf[IEEE80211_DEAUTH_FRAME_LEN];
> +       bool tx = !req->local_state_change;
>
>         mutex_lock(&ifmgd->mtx);
>
> @@ -3565,12 +3566,12 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
>         if (ifmgd->associated &&
>             ether_addr_equal(ifmgd->associated->bssid, req->bssid)) {
>                 ieee80211_set_disassoc(sdata, IEEE80211_STYPE_DEAUTH,
> -                                      req->reason_code, true, frame_buf);
> +                                      req->reason_code, tx, frame_buf);
>         } else {
>                 drv_mgd_prepare_tx(sdata->local, sdata);
>                 ieee80211_send_deauth_disassoc(sdata, req->bssid,
>                                                IEEE80211_STYPE_DEAUTH,
> -                                              req->reason_code, true,
> +                                              req->reason_code, tx,
>                                                frame_buf);
>         }
>
> diff --git a/net/wireless/mlme.c b/net/wireless/mlme.c
> index 3df195a..4954010 100644
> --- a/net/wireless/mlme.c
> +++ b/net/wireless/mlme.c
> @@ -457,21 +457,11 @@ int __cfg80211_mlme_deauth(struct cfg80211_registered_device *rdev,
>                 .reason_code = reason,
>                 .ie = ie,
>                 .ie_len = ie_len,
> +               .local_state_change = local_state_change,
>         };
>
>         ASSERT_WDEV_LOCK(wdev);
>
> -       if (local_state_change) {
> -               if (wdev->current_bss &&
> -                   ether_addr_equal(wdev->current_bss->pub.bssid, bssid)) {
> -                       cfg80211_unhold_bss(wdev->current_bss);
> -                       cfg80211_put_bss(&wdev->current_bss->pub);
> -                       wdev->current_bss = NULL;
> -               }
> -
> -               return 0;
> -       }
> -
>         return rdev->ops->deauth(&rdev->wiphy, dev, &req);
>  }
>

I've been testing the patch since this morning (GMT), and I can't
reproduce any of the issues I referred to in this thread (I had to adapt
the patch slightly, though). Seems to be fixed!

Thank you for your help!
-- 
Pedro Francisco

[-- Attachment #2: mlme-timers-fedora-3.6.1-fc17-kernel.patch --]
[-- Type: application/octet-stream, Size: 1921 bytes --]

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index 3d254e1..f10553c 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -1217,6 +1217,7 @@ struct cfg80211_deauth_request {
 	const u8 *ie;
 	size_t ie_len;
 	u16 reason_code;
+	bool local_state_change;
 };
 
 /**
diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index f76b833..da3f5e4 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -3457,6 +3457,7 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
 {
 	struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
 	u8 frame_buf[DEAUTH_DISASSOC_LEN];
+	bool tx = !req->local_state_change;
 
 	mutex_lock(&ifmgd->mtx);
 
@@ -3473,11 +3474,11 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
 	if (ifmgd->associated &&
 	    ether_addr_equal(ifmgd->associated->bssid, req->bssid))
 		ieee80211_set_disassoc(sdata, IEEE80211_STYPE_DEAUTH,
-				       req->reason_code, true, frame_buf);
+				       req->reason_code, tx, frame_buf);
 	else
 		ieee80211_send_deauth_disassoc(sdata, req->bssid,
 					       IEEE80211_STYPE_DEAUTH,
-					       req->reason_code, true,
+					       req->reason_code, tx,
 					       frame_buf);
 	mutex_unlock(&ifmgd->mtx);
 
diff --git a/net/wireless/mlme.c b/net/wireless/mlme.c
index 1cdb1d5..0877efb 100644
--- a/net/wireless/mlme.c
+++ b/net/wireless/mlme.c
@@ -457,21 +457,11 @@ int __cfg80211_mlme_deauth(struct cfg80211_registered_device *rdev,
 		.reason_code = reason,
 		.ie = ie,
 		.ie_len = ie_len,
+		.local_state_change = local_state_change,
 	};
 
 	ASSERT_WDEV_LOCK(wdev);
 
-	if (local_state_change) {
-		if (wdev->current_bss &&
-		    ether_addr_equal(wdev->current_bss->pub.bssid, bssid)) {
-			cfg80211_unhold_bss(wdev->current_bss);
-			cfg80211_put_bss(&wdev->current_bss->pub);
-			wdev->current_bss = NULL;
-		}
-
-		return 0;
-	}
-
 	return rdev->ops->deauth(&rdev->wiphy, dev, &req);
 }
 


end of thread, other threads:[~2012-10-15 15:49 UTC | newest]

Thread overview: 10+ messages
2012-07-31 12:54 unloading WiFi modules is usually triggering kernel crash Pedro Francisco
2012-07-31 13:13 ` John W. Linville
2012-08-07 10:22 ` Stanislaw Gruszka
2012-08-30 15:58   ` Pedro Francisco
2012-09-26 12:47     ` Pedro Francisco
2012-10-03 14:30       ` Stanislaw Gruszka
2012-10-09  9:14         ` Pedro Francisco
2012-10-12 12:13           ` Stanislaw Gruszka
2012-10-15 11:03             ` Johannes Berg
2012-10-15 15:48             ` Pedro Francisco
