linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Kernel BUG due to kernel page fault
@ 2012-01-26 22:36 Larry Finger
  2012-01-27 16:42 ` Stanislaw Gruszka
  2012-01-28  4:30 ` Johannes Berg
  0 siblings, 2 replies; 5+ messages in thread
From: Larry Finger @ 2012-01-26 22:36 UTC (permalink / raw)
  To: LKML, wireless

I am trying to convert the rtlwifi family of drivers to use asynchronous 
firmware loading. With recent changes in udev, many problems are being reported. 
See Bug 42632] at b.k.o.

Most of the code is now working, but I still have one problem. When the firmware 
is available, everything is fine. I delay the starting of the driver and the MAC 
layer until the firmware is available. The problem occurs when I start the 
driver with the firmware file not available. Under those conditions, when I 
unload the driver, I get the following BUG output:

BUG: unable to handle kernel paging request at ffffffffa0056641
IP: [<ffffffff811e3ecb>] strnlen+0xb/0x30
PGD 1607067 PUD 160b063 PMD b2aed067 PTE 0
Oops: 0000 [#1] SMP
CPU 1
Modules linked in: fuse autofs4 snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device 
edd nfs nfsd lockd auth_rpcgss nfs_acl sunrpc exportfs ipv6 vboxpci(O) 
vboxnetadp(O) vboxnetflt(O) vboxdrv(O) cpufreq_conservative cpufreq_userspace 
cpufreq_powersave powernow_k8 mperf ext3 jbd ide_cd_mod cdrom 
snd_hda_codec_conexant ide_pci_generic snd_hda_intel snd_hda_codec snd_pcm 
k8temp amd74xx sg snd_timer ide_core joydev button hwmon i2c_nforce2 serio_raw 
ac battery forcedeth snd i2c_core video soundcore snd_page_alloc ext4 mbcache 
jbd2 crc16 sd_mod ohci_hcd ahci libahci libata ehci_hcd scsi_mod usbcore 
usb_common fan processor thermal [last unloaded: rfkill]

Pid: 3605, comm: irqbalance Tainted: G           O 3.3.0-rc1-wl+ #175 
Hewlett-Packard HP Pavilion dv2700 Notebook PC/30D6
RIP: 0010:[<ffffffff811e3ecb>]  [<ffffffff811e3ecb>] strnlen+0xb/0x30
RSP: 0018:ffff8800b3179c98  EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffff880084cdd28f RCX: 0000000000000000
RDX: ffffffffa0056641 RSI: ffffffffffffffff RDI: ffffffffa0056641
RBP: ffff8800b3179c98 R08: 000000000000ffff R09: 000000000000ffff
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa0056641
R13: ffff880084cde000 R14: 0000000000000000 R15: 000000000000ffff
FS:  00007fa070fdd720(0000) GS:ffff8800bbd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0056641 CR3: 00000000b2bba000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process irqbalance (pid: 3605, threadinfo ffff8800b3178000, task ffff8800b5875880)
Stack:
  ffff8800b3179cd8 ffffffff811e58ae ffff8800b3179cf8 ffff880084cdd28f
  ffffffff8151f664 ffff8800b3179d70 ffffffff8151f664 ffff880084cde000
  ffff8800b3179d58 ffffffff811e6889 000000000000ffff 0000000000000002
Call Trace:
  [<ffffffff811e58ae>] string.isra.4+0x3e/0xd0
  [<ffffffff811e6889>] vsnprintf+0x219/0x600
  [<ffffffff81153093>] seq_printf+0x53/0x80
  [<ffffffff811e21db>] ? radix_tree_lookup+0xb/0x10
  [<ffffffff810ba5ab>] show_interrupts+0x24b/0x300
  [<ffffffff81153474>] seq_read+0x2b4/0x3f0
  [<ffffffff811531c0>] ? seq_lseek+0x100/0x100
  [<ffffffff8118b651>] proc_reg_read+0x81/0xc0
  [<ffffffff8112f967>] vfs_read+0xa7/0x160
  [<ffffffff8112fa65>] sys_read+0x45/0x90
  [<ffffffff813a33a2>] system_call_fastpath+0x16/0x1b
Code: 31 c0 80 3f 00 55 48 89 e5 74 11 48 89 f8 66 90 48 83 c0 01 80 38 00 75 f7 
48 29 f8 5d c3 66 90 55 31 c0 48 85 f6 48 89 e5 74 23 <80> 3f 00 74 1e 48 89 f8 
eb 0c 0f 1f 00 48 83 ee 01 80 38 00 74
RIP  [<ffffffff811e3ecb>] strnlen+0xb/0x30
RSP <ffff8800b3179c98>
CR2: ffffffffa0056641
---[ end trace ad5f2d26ea4a9371 ]---

Nothing that I have tried has made any difference. Does this kind of fault 
suggest a fix? I found nothing useful with Google.

Thanks,

Larry

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Kernel BUG due to kernel page fault
  2012-01-26 22:36 Kernel BUG due to kernel page fault Larry Finger
@ 2012-01-27 16:42 ` Stanislaw Gruszka
  2012-01-27 18:07   ` Larry Finger
  2012-01-28  4:30 ` Johannes Berg
  1 sibling, 1 reply; 5+ messages in thread
From: Stanislaw Gruszka @ 2012-01-27 16:42 UTC (permalink / raw)
  To: Larry Finger; +Cc: LKML, wireless

On Thu, Jan 26, 2012 at 04:36:47PM -0600, Larry Finger wrote:
> Modules linked in: fuse autofs4 snd_pcm_oss snd_mixer_oss snd_seq
> snd_seq_device edd nfs nfsd lockd auth_rpcgss nfs_acl sunrpc
> exportfs ipv6 vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
> RIP: 0010:[<ffffffff811e3ecb>]  [<ffffffff811e3ecb>] strnlen+0xb/0x30
[snip]
> Call Trace:
>  [<ffffffff811e58ae>] string.isra.4+0x3e/0xd0
>  [<ffffffff811e6889>] vsnprintf+0x219/0x600
>  [<ffffffff81153093>] seq_printf+0x53/0x80
>  [<ffffffff811e21db>] ? radix_tree_lookup+0xb/0x10
>  [<ffffffff810ba5ab>] show_interrupts+0x24b/0x300
>  [<ffffffff81153474>] seq_read+0x2b4/0x3f0
>  [<ffffffff811531c0>] ? seq_lseek+0x100/0x100
>  [<ffffffff8118b651>] proc_reg_read+0x81/0xc0
>  [<ffffffff8112f967>] vfs_read+0xa7/0x160
>  [<ffffffff8112fa65>] sys_read+0x45/0x90
>  [<ffffffff813a33a2>] system_call_fastpath+0x16/0x1b

This is call trace from cat /proc/interrupts, seems something pass
malformed string to request_irq(). I do not think this is rtlwifi,
rather that vbox thing.

Stanislaw

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Kernel BUG due to kernel page fault
  2012-01-27 16:42 ` Stanislaw Gruszka
@ 2012-01-27 18:07   ` Larry Finger
  0 siblings, 0 replies; 5+ messages in thread
From: Larry Finger @ 2012-01-27 18:07 UTC (permalink / raw)
  To: Stanislaw Gruszka; +Cc: LKML, wireless

On 01/27/2012 10:42 AM, Stanislaw Gruszka wrote:
> On Thu, Jan 26, 2012 at 04:36:47PM -0600, Larry Finger wrote:
>> Modules linked in: fuse autofs4 snd_pcm_oss snd_mixer_oss snd_seq
>> snd_seq_device edd nfs nfsd lockd auth_rpcgss nfs_acl sunrpc
>> exportfs ipv6 vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O)
>                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> RIP: 0010:[<ffffffff811e3ecb>]  [<ffffffff811e3ecb>] strnlen+0xb/0x30
> [snip]
>> Call Trace:
>>   [<ffffffff811e58ae>] string.isra.4+0x3e/0xd0
>>   [<ffffffff811e6889>] vsnprintf+0x219/0x600
>>   [<ffffffff81153093>] seq_printf+0x53/0x80
>>   [<ffffffff811e21db>] ? radix_tree_lookup+0xb/0x10
>>   [<ffffffff810ba5ab>] show_interrupts+0x24b/0x300
>>   [<ffffffff81153474>] seq_read+0x2b4/0x3f0
>>   [<ffffffff811531c0>] ? seq_lseek+0x100/0x100
>>   [<ffffffff8118b651>] proc_reg_read+0x81/0xc0
>>   [<ffffffff8112f967>] vfs_read+0xa7/0x160
>>   [<ffffffff8112fa65>] sys_read+0x45/0x90
>>   [<ffffffff813a33a2>] system_call_fastpath+0x16/0x1b
>
> This is call trace from cat /proc/interrupts, seems something pass
> malformed string to request_irq(). I do not think this is rtlwifi,
> rather that vbox thing.

Thanks for responding. Vbox may be guilty of a lot of things, but this is not 
one of them.

Just before I received your reply, I found the problem. The PCI driver in 
rtlwifi sets a flag to indicate that interrupts have been enabled, and uses that 
flag to tell if the cleanup needs to disable them. When I moved the code that 
starts mac80211, I also moved the statement that sets the flag. As a result, the 
driver was exiting with interrupts still enabled, and the BUG resulted from 
that. I can now unload the driver when firmware is not available.

Larry





^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Kernel BUG due to kernel page fault
  2012-01-26 22:36 Kernel BUG due to kernel page fault Larry Finger
  2012-01-27 16:42 ` Stanislaw Gruszka
@ 2012-01-28  4:30 ` Johannes Berg
  2012-01-28 23:02   ` Larry Finger
  1 sibling, 1 reply; 5+ messages in thread
From: Johannes Berg @ 2012-01-28  4:30 UTC (permalink / raw)
  To: Larry Finger; +Cc: LKML, wireless

Larry,

On 1/26/2012 2:36 PM, Larry Finger wrote:
> I am trying to convert the rtlwifi family of drivers to use asynchronous
> firmware loading. With recent changes in udev, many problems are being
> reported. See Bug 42632] at b.k.o.

I'm not sure what caused your problem, but your email reminded me of an 
issue we had with asynchronous firmware loading that you may want to 
consider.

When request_firmware_async() is called, it eventually calls the 
callback you give it. This may, however, be a long time later (I think 
the current default timeout is 1 minute). In the meantime, the module 
could be unloaded by the user, and then the system crashes as soon as 
the firmwaer code attempts to invoke the callback.

To fix this, we have a completion in iwlwifi that some exit code path 
waits for to make sure this scenario doesn't happen.

I think you should implement that.

A better fix would probably be to make request_firmware_async() take a 
struct module * argument and pass THIS_MODULE to it, and make the 
firmware code handle this, but currently it doesn't.

johannes

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Kernel BUG due to kernel page fault
  2012-01-28  4:30 ` Johannes Berg
@ 2012-01-28 23:02   ` Larry Finger
  0 siblings, 0 replies; 5+ messages in thread
From: Larry Finger @ 2012-01-28 23:02 UTC (permalink / raw)
  To: Johannes Berg; +Cc: LKML, wireless

On 01/27/2012 10:30 PM, Johannes Berg wrote:
> Larry,
>
> On 1/26/2012 2:36 PM, Larry Finger wrote:
>> I am trying to convert the rtlwifi family of drivers to use asynchronous
>> firmware loading. With recent changes in udev, many problems are being
>> reported. See Bug 42632] at b.k.o.
>
> I'm not sure what caused your problem, but your email reminded me of an issue we
> had with asynchronous firmware loading that you may want to consider.
>
> When request_firmware_async() is called, it eventually calls the callback you
> give it. This may, however, be a long time later (I think the current default
> timeout is 1 minute). In the meantime, the module could be unloaded by the user,
> and then the system crashes as soon as the firmwaer code attempts to invoke the
> callback.
>
> To fix this, we have a completion in iwlwifi that some exit code path waits for
> to make sure this scenario doesn't happen.
>
> I think you should implement that.
>
> A better fix would probably be to make request_firmware_async() take a struct
> module * argument and pass THIS_MODULE to it, and make the firmware code handle
> this, but currently it doesn't.

Thanks for explaining why the completion queue is in iwlwifi. That was easy to 
implement.

Larry


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2012-01-28 23:02 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-26 22:36 Kernel BUG due to kernel page fault Larry Finger
2012-01-27 16:42 ` Stanislaw Gruszka
2012-01-27 18:07   ` Larry Finger
2012-01-28  4:30 ` Johannes Berg
2012-01-28 23:02   ` Larry Finger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).