* Elevator bug in concert with usb-storage
From: Fredrik Tolf @ 2003-10-23 2:25 UTC
To: linux-kernel
Hello,
I believe that there is a bug in the usb-storage code. I'm using
2.6.0-test8-mm1, but I have experienced this in essentially all
2.6.0-test* kernels. Almost every time I remove a usb-storage device
(especially before umounting it), I get a SEGV followed by general
instability in the SCSI subsys.
After doing some research, I believe it is because the I/O scheduler
is released before the SCSI subsys stops using it. Here is the SEGV
report:
Unable to handle kernel NULL pointer dereference at virtual address 00000001
printing eip:
00000001
*pde = 00000000
Oops: 0000 [#1]
PREEMPT
CPU: 0
EIP: 0060:[<00000001>] Not tainted VLI
EFLAGS: 00010202
EIP is at 0x1
eax: d8b83180 ebx: dafc4e00 ecx: 00000000 edx: dafc4e00
esi: dafc4e10 edi: c03465e0 ebp: d990fe64 esp: d990fe58
ds: 007b es: 007b ss: 0068
Process umount (pid: 1744, threadinfo=d990e000 task=d8ba4080)
Stack: c02173d7 dafc4e00 d8b83180 d990fe78 c0218f5d dafc4e00 df774800 c03465b0
d990fe8c e0952fb5 dafc4e00 db8e3ca0 c03465b0 d990fe9c e0953f83 df774800
00000000 d990feac c021480a df774980 df7749a8 d990fec4 c01c26e3 df7749a8
Call Trace:
[<c02173d7>] elevator_exit+0x2d/0x3c
[<c0218f5d>] blk_cleanup_queue+0x65/0x70
[<e0952fb5>] scsi_free_sdev+0x11f/0x15a [scsi_mod]
[<e0953f83>] scsi_device_dev_release+0x15/0x22 [scsi_mod]
[<c021480a>] device_release+0x1a/0x60
[<c01c26e3>] kobject_cleanup+0x63/0x70
[<e094eb1c>] scsi_device_put+0x68/0xa6 [scsi_mod]
[<e08a94cb>] sd_release+0x27/0x46 [sd_mod]
[<c01569c5>] blkdev_put+0x1a7/0x1bc
[<c015698f>] blkdev_put+0x171/0x1bc
[<c015566b>] kill_block_super+0x25/0x2c
[<c01548c7>] deactivate_super+0x67/0xae
[<c01686c2>] sys_umount+0x36/0x84
[<c014e9c5>] filp_close+0x43/0x66
[<c016871d>] sys_oldumount+0xd/0x10
[<c02bafeb>] syscall_call+0x7/0xb
Code: Bad EIP value.
It seems as if elevator_exit_fn is set to 1, and just before the SEGV
report a message is logged saying that an I/O scheduler is being
released, which leads me to conclude that the kernel tries to free
the I/O scheduler twice.
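To make the double-release theory concrete, here is a minimal userspace sketch in C. It is not kernel code: every toy_* name is hypothetical and only stands in for the queue, the elevator and its exit hook. It shows one way a cleanup path can be made safe against being entered twice, by detaching the scheduler from the queue before calling its exit hook.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel structures, not the real ones. */
struct toy_elevator {
    void (*exit_fn)(struct toy_elevator *e); /* models elevator_exit_fn */
    int exit_calls;                          /* how often the hook ran */
};

struct toy_queue {
    struct toy_elevator *elevator; /* NULL once the scheduler is released */
};

/* Toy exit hook: only records that it was called. */
static void toy_exit(struct toy_elevator *e)
{
    e->exit_calls++;
    printf("releasing toy io scheduler\n");
}

/* Release the scheduler at most once, however often this is called. */
static void toy_cleanup_queue(struct toy_queue *q)
{
    struct toy_elevator *e;

    assert(q != NULL);
    e = q->elevator;
    if (e == NULL)
        return;          /* already released: a second call is a no-op */
    q->elevator = NULL;  /* detach first, so no later caller sees it */
    if (e->exit_fn != NULL)
        e->exit_fn(e);
}
```

Calling toy_cleanup_queue() twice on the same queue runs the hook once. Without the NULL check, the second call would go through whatever the elevator memory holds by then, which is the kind of jump to a bogus address seen in the report above.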
My apologies for taking up your time if this is a known and fixed
problem. However, if it is, can you point me to some information on
how to remedy it?
If it isn't, please tell me how I can help fix this.
Also, between the message that indicates that the I/O scheduler has
been released and the SEGV, there are two messages from the hotplug
handler, indicating that it can't find the agents for "block" and
"scsi" (in that order), if that helps you determine the order of
things. There is also another message of the same kind for "block"
just before the I/O scheduler is first released.
Thank you for your time!
Fredrik Tolf
* Re: Elevator bug in concert with usb-storage
From: Nick Piggin @ 2003-10-23 2:48 UTC
To: Fredrik Tolf; +Cc: linux-kernel, James Bottomley
Fredrik Tolf wrote:
>Hello,
>
>I believe that there is a bug in the usb-storage code. I'm using
>2.6.0-test8-mm1, but I have experienced this in essentially all
>2.6.0-test* kernels. Almost every time I remove a usb-storage device
>(especially before umounting it), I get a SEGV followed by general
>instability in the SCSI subsys.
>
>After doing some research, I believe it is because the I/O scheduler
>is released before the SCSI subsys stops using it. Here is the SEGV
>report:
>
>Unable to handle kernel NULL pointer dereference at virtual address 00000001
> printing eip:
>00000001
>*pde = 00000000
>Oops: 0000 [#1]
>PREEMPT
>CPU: 0
>EIP: 0060:[<00000001>] Not tainted VLI
>EFLAGS: 00010202
>EIP is at 0x1
>eax: d8b83180 ebx: dafc4e00 ecx: 00000000 edx: dafc4e00
>esi: dafc4e10 edi: c03465e0 ebp: d990fe64 esp: d990fe58
>ds: 007b es: 007b ss: 0068
>Process umount (pid: 1744, threadinfo=d990e000 task=d8ba4080)
>Stack: c02173d7 dafc4e00 d8b83180 d990fe78 c0218f5d dafc4e00 df774800 c03465b0
> d990fe8c e0952fb5 dafc4e00 db8e3ca0 c03465b0 d990fe9c e0953f83 df774800
> 00000000 d990feac c021480a df774980 df7749a8 d990fec4 c01c26e3 df7749a8
>Call Trace:
> [<c02173d7>] elevator_exit+0x2d/0x3c
> [<c0218f5d>] blk_cleanup_queue+0x65/0x70
> [<e0952fb5>] scsi_free_sdev+0x11f/0x15a [scsi_mod]
> [<e0953f83>] scsi_device_dev_release+0x15/0x22 [scsi_mod]
> [<c021480a>] device_release+0x1a/0x60
> [<c01c26e3>] kobject_cleanup+0x63/0x70
> [<e094eb1c>] scsi_device_put+0x68/0xa6 [scsi_mod]
> [<e08a94cb>] sd_release+0x27/0x46 [sd_mod]
> [<c01569c5>] blkdev_put+0x1a7/0x1bc
> [<c015698f>] blkdev_put+0x171/0x1bc
> [<c015566b>] kill_block_super+0x25/0x2c
> [<c01548c7>] deactivate_super+0x67/0xae
> [<c01686c2>] sys_umount+0x36/0x84
> [<c014e9c5>] filp_close+0x43/0x66
> [<c016871d>] sys_oldumount+0xd/0x10
> [<c02bafeb>] syscall_call+0x7/0xb
>
>Code: Bad EIP value.
>
>It seems as if elevator_exit_fn is set to 1, and just before the SEGV
>report a message is logged saying that an I/O scheduler is being
>released, which leads me to conclude that the kernel tries to free
>the I/O scheduler twice.
>
>My apologies for taking up your time if this is a known and fixed
>problem. However, if it is, can you point me to some information on
>how to remedy it?
>
James is having a look at refcounting issues now. So it will be fixed.
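For context, refcounting here means that an object is torn down only when its last holder drops its reference. The following is a minimal userspace sketch of that idea in C, assuming a single thread and using hypothetical toy_* names. It is not the SCSI or driver-model code, and it says nothing about what the actual fix looks like.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted device, a toy model only. */
struct toy_dev {
    int refcount; /* number of current holders */
    int released; /* how often teardown ran */
};

/* The creator starts out holding one reference. */
static void toy_dev_init(struct toy_dev *d)
{
    d->refcount = 1;
    d->released = 0;
}

/* Take an extra reference, e.g. while a filesystem is mounted. */
static void toy_dev_get(struct toy_dev *d)
{
    assert(d->refcount > 0); /* reviving a dead object is a bug */
    d->refcount++;
}

/* Drop one reference; returns 1 if it was the last and teardown ran. */
static int toy_dev_put(struct toy_dev *d)
{
    assert(d->refcount > 0); /* an unbalanced put is a bug */
    if (--d->refcount == 0) {
        d->released++;       /* teardown happens here, exactly once */
        return 1;
    }
    return 0;
}
```

With one reference held by the driver and one by the mounted filesystem, unplugging the device drops the first and umount drops the second, so teardown runs only at umount. A missing get or an extra put breaks that balance and lets teardown run early or twice.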
* Re: Elevator bug in concert with usb-storage
From: Patrick Mansfield @ 2003-10-23 15:27 UTC
To: Fredrik Tolf; +Cc: linux-kernel
On Thu, Oct 23, 2003 at 04:25:37AM +0200, Fredrik Tolf wrote:
> Hello,
>
> I believe that there is a bug in the usb-storage code. I'm using
> 2.6.0-test8-mm1, but I have experienced this in essentially all
> 2.6.0-test* kernels. Almost every time I remove a usb-storage device
> (especially before umounting it), I get a SEGV followed by general
> instability in the SCSI subsys.
Try Mike's patch:
http://marc.theaimsgroup.com/?l=linux-scsi&m=106646263401437&w=2
-- Patrick Mansfield
* Re: Elevator bug in concert with usb-storage
From: Fredrik Tolf @ 2003-10-23 19:06 UTC
To: Patrick Mansfield; +Cc: Fredrik Tolf, linux-kernel
Patrick Mansfield writes:
> On Thu, Oct 23, 2003 at 04:25:37AM +0200, Fredrik Tolf wrote:
> > Hello,
> >
> > I believe that there is a bug in the usb-storage code. I'm using
> > 2.6.0-test8-mm1, but I have experienced this in essentially all
> > 2.6.0-test* kernels. Almost every time I remove a usb-storage device
> > (especially before umounting it), I get a SEGV followed by general
> > instability in the SCSI subsys.
>
> Try Mike's patch:
>
> http://marc.theaimsgroup.com/?l=linux-scsi&m=106646263401437&w=2
Sorry, that didn't work well. It no longer crashes in the same place,
but it still crashes. In addition, after removing the device but
before umounting the filesystem, I tried to ls its root dir.
Previously nothing extraordinary happened in that case, but now there
are a couple of oopses for the ls process as well.
The dmesg output since the device removal follows. I'm sorry for
filling up your mboxes; please tell me if this is unacceptable
behavior.
Fredrik Tolf
---
usb 3-1: USB disconnect, address 2
releasing anticipatory io scheduler
Unable to handle kernel NULL pointer dereference at virtual address 00000004
printing eip:
00000004
*pde = 00000000
Oops: 0000 [#1]
PREEMPT
CPU: 0
EIP: 0060:[<00000004>] Not tainted VLI
EFLAGS: 00010002
EIP is at 0x4
eax: dea9c800 ebx: 00000001 ecx: dea9c800 edx: df9749c0
esi: dea9c800 edi: 00000000 ebp: d9bedc0c esp: d9bedc04
ds: 007b es: 007b ss: 0068
Process ls (pid: 1277, threadinfo=d9bec000 task=dc2440c0)
Stack: c02176b3 dea9c800 d9bedc40 c0219ab4 dea9c800 00000041 00000000 00000000
00000001 00000000 00000000 c1702620 00000000 dea9c800 db8e8dc0 d9bedc94
c021a016 dea9c800 db8e8dc0 0003cfd0 00000000 dea9c800 00000041 00000000
Call Trace:
[<c02176b3>] elv_queue_empty+0x1d/0x20
[<c0219ab4>] __make_request+0x80/0x4ae
[<c021a016>] generic_make_request+0x134/0x186
[<c01534a7>] submit_bh+0x97/0x1e6
[<c021a09d>] submit_bio+0x35/0x60
[<c0151659>] __bread_slow_wq+0x4b/0xdc
[<c0151952>] __bread+0x2c/0x32
[<e08949e8>] fat__get_entry+0x9e/0x16e [fat]
[<e08913be>] fat_readdirx+0xdc0/0xe86 [fat]
[<c01868c3>] ext3_do_update_inode+0x19f/0x36c
[<c018a5d7>] __ext3_journal_stop+0x1b/0x3c
[<c013638e>] find_get_page+0x1e/0x44
[<c015178f>] bh_lru_install+0xa5/0xdc
[<c0191af8>] __journal_file_buffer+0x168/0x228
[<c0191af8>] __journal_file_buffer+0x168/0x228
[<c013638e>] find_get_page+0x1e/0x44
[<c013732b>] filemap_nopage+0x1e7/0x2b6
[<c0147240>] pte_chain_alloc+0x7a/0x7e
[<c0143150>] do_no_page+0x17c/0x32e
[<c0143485>] handle_mm_fault+0xa7/0x124
[<c011b284>] do_page_fault+0x2de/0x4dd
[<e08914a0>] fat_readdir+0x1c/0x1e [fat]
[<c015f62c>] filldir64+0x0/0x10a
[<c015f36d>] vfs_readdir+0x6d/0x72
[<c015f62c>] filldir64+0x0/0x10a
[<c015f798>] sys_getdents64+0x62/0x9d
[<c015f62c>] filldir64+0x0/0x10a
[<c02bafeb>] syscall_call+0x7/0xb
Code: Bad EIP value.
<6>note: ls[1277] exited with preempt_count 2
bad: scheduling while atomic!
Call Trace:
[<c011ce2e>] schedule+0x572/0x578
[<c0141df4>] unmap_vmas+0x18e/0x1d2
[<c0145534>] exit_mmap+0x60/0x164
[<c011e6b3>] mmput+0x71/0xd4
[<c01220be>] do_exit+0x130/0x3aa
[<c010b305>] die+0xd3/0xd4
[<c011b165>] do_page_fault+0x1bf/0x4dd
[<c0139de7>] buffered_rmqueue+0xd3/0x16a
[<c0139f16>] __alloc_pages+0x98/0x2d4
[<c013cdb1>] cache_init_objs+0x4f/0x54
[<c013cf6b>] cache_grow+0x185/0x29c
[<c011afa6>] do_page_fault+0x0/0x4dd
[<c02bb9f7>] error_code+0x2f/0x38
[<c02176b3>] elv_queue_empty+0x1d/0x20
[<c0219ab4>] __make_request+0x80/0x4ae
[<c021a016>] generic_make_request+0x134/0x186
[<c01534a7>] submit_bh+0x97/0x1e6
[<c021a09d>] submit_bio+0x35/0x60
[<c0151659>] __bread_slow_wq+0x4b/0xdc
[<c0151952>] __bread+0x2c/0x32
[<e08949e8>] fat__get_entry+0x9e/0x16e [fat]
[<e08913be>] fat_readdirx+0xdc0/0xe86 [fat]
[<c01868c3>] ext3_do_update_inode+0x19f/0x36c
[<c018a5d7>] __ext3_journal_stop+0x1b/0x3c
[<c013638e>] find_get_page+0x1e/0x44
[<c015178f>] bh_lru_install+0xa5/0xdc
[<c0191af8>] __journal_file_buffer+0x168/0x228
[<c0191af8>] __journal_file_buffer+0x168/0x228
[<c013638e>] find_get_page+0x1e/0x44
[<c013732b>] filemap_nopage+0x1e7/0x2b6
[<c0147240>] pte_chain_alloc+0x7a/0x7e
[<c0143150>] do_no_page+0x17c/0x32e
[<c0143485>] handle_mm_fault+0xa7/0x124
[<c011b284>] do_page_fault+0x2de/0x4dd
[<e08914a0>] fat_readdir+0x1c/0x1e [fat]
[<c015f62c>] filldir64+0x0/0x10a
[<c015f36d>] vfs_readdir+0x6d/0x72
[<c015f62c>] filldir64+0x0/0x10a
[<c015f798>] sys_getdents64+0x62/0x9d
[<c015f62c>] filldir64+0x0/0x10a
[<c02bafeb>] syscall_call+0x7/0xb
bad: scheduling while atomic!
Call Trace:
[<c011ce2e>] schedule+0x572/0x578
[<c0141df4>] unmap_vmas+0x18e/0x1d2
[<c0145534>] exit_mmap+0x60/0x164
[<c011e6b3>] mmput+0x71/0xd4
[<c01220be>] do_exit+0x130/0x3aa
[<c010b305>] die+0xd3/0xd4
[<c011b165>] do_page_fault+0x1bf/0x4dd
[<c0139de7>] buffered_rmqueue+0xd3/0x16a
[<c0139f16>] __alloc_pages+0x98/0x2d4
[<c013cdb1>] cache_init_objs+0x4f/0x54
[<c013cf6b>] cache_grow+0x185/0x29c
[<c011afa6>] do_page_fault+0x0/0x4dd
[<c02bb9f7>] error_code+0x2f/0x38
[<c02176b3>] elv_queue_empty+0x1d/0x20
[<c0219ab4>] __make_request+0x80/0x4ae
[<c021a016>] generic_make_request+0x134/0x186
[<c01534a7>] submit_bh+0x97/0x1e6
[<c021a09d>] submit_bio+0x35/0x60
[<c0151659>] __bread_slow_wq+0x4b/0xdc
[<c0151952>] __bread+0x2c/0x32
[<e08949e8>] fat__get_entry+0x9e/0x16e [fat]
[<e08913be>] fat_readdirx+0xdc0/0xe86 [fat]
[<c01868c3>] ext3_do_update_inode+0x19f/0x36c
[<c018a5d7>] __ext3_journal_stop+0x1b/0x3c
[<c013638e>] find_get_page+0x1e/0x44
[<c015178f>] bh_lru_install+0xa5/0xdc
[<c0191af8>] __journal_file_buffer+0x168/0x228
[<c0191af8>] __journal_file_buffer+0x168/0x228
[<c013638e>] find_get_page+0x1e/0x44
[<c013732b>] filemap_nopage+0x1e7/0x2b6
[<c0147240>] pte_chain_alloc+0x7a/0x7e
[<c0143150>] do_no_page+0x17c/0x32e
[<c0143485>] handle_mm_fault+0xa7/0x124
[<c011b284>] do_page_fault+0x2de/0x4dd
[<e08914a0>] fat_readdir+0x1c/0x1e [fat]
[<c015f62c>] filldir64+0x0/0x10a
[<c015f36d>] vfs_readdir+0x6d/0x72
[<c015f62c>] filldir64+0x0/0x10a
[<c015f798>] sys_getdents64+0x62/0x9d
[<c015f62c>] filldir64+0x0/0x10a
[<c02bafeb>] syscall_call+0x7/0xb
Debug: sleeping function called from invalid context at include/asm/semaphore.h:119
in_atomic():1, irqs_disabled():0
Call Trace:
[<c011e194>] __might_sleep+0x94/0xb8
[<c0143e42>] remove_shared_vm_struct+0x22/0x7a
[<c01455da>] exit_mmap+0x106/0x164
[<c011e6b3>] mmput+0x71/0xd4
[<c01220be>] do_exit+0x130/0x3aa
[<c010b305>] die+0xd3/0xd4
[<c011b165>] do_page_fault+0x1bf/0x4dd
[<c0139de7>] buffered_rmqueue+0xd3/0x16a
[<c0139f16>] __alloc_pages+0x98/0x2d4
[<c013cdb1>] cache_init_objs+0x4f/0x54
[<c013cf6b>] cache_grow+0x185/0x29c
[<c011afa6>] do_page_fault+0x0/0x4dd
[<c02bb9f7>] error_code+0x2f/0x38
[<c02176b3>] elv_queue_empty+0x1d/0x20
[<c0219ab4>] __make_request+0x80/0x4ae
[<c021a016>] generic_make_request+0x134/0x186
[<c01534a7>] submit_bh+0x97/0x1e6
[<c021a09d>] submit_bio+0x35/0x60
[<c0151659>] __bread_slow_wq+0x4b/0xdc
[<c0151952>] __bread+0x2c/0x32
[<e08949e8>] fat__get_entry+0x9e/0x16e [fat]
[<e08913be>] fat_readdirx+0xdc0/0xe86 [fat]
[<c01868c3>] ext3_do_update_inode+0x19f/0x36c
[<c018a5d7>] __ext3_journal_stop+0x1b/0x3c
[<c013638e>] find_get_page+0x1e/0x44
[<c015178f>] bh_lru_install+0xa5/0xdc
[<c0191af8>] __journal_file_buffer+0x168/0x228
[<c0191af8>] __journal_file_buffer+0x168/0x228
[<c013638e>] find_get_page+0x1e/0x44
[<c013732b>] filemap_nopage+0x1e7/0x2b6
[<c0147240>] pte_chain_alloc+0x7a/0x7e
[<c0143150>] do_no_page+0x17c/0x32e
[<c0143485>] handle_mm_fault+0xa7/0x124
[<c011b284>] do_page_fault+0x2de/0x4dd
[<e08914a0>] fat_readdir+0x1c/0x1e [fat]
[<c015f62c>] filldir64+0x0/0x10a
[<c015f36d>] vfs_readdir+0x6d/0x72
[<c015f62c>] filldir64+0x0/0x10a
[<c015f798>] sys_getdents64+0x62/0x9d
[<c015f62c>] filldir64+0x0/0x10a
[<c02bafeb>] syscall_call+0x7/0xb
* Re: Elevator bug in concert with usb-storage
From: Mike Anderson @ 2003-10-25 6:27 UTC
To: Fredrik Tolf; +Cc: Patrick Mansfield, linux-kernel
Fredrik Tolf [fredrik@dolda2000.com] wrote:
> Sorry, that didn't work well. It no longer crashes in the same place,
> but it still crashes. In addition, after removing the device but
> before umounting the filesystem, I tried to ls its root dir.
> Previously nothing extraordinary happened in that case, but now there
> are a couple of oopses for the ls process as well.
>
> .. snip ..
>
> Call Trace:
> [<c02176b3>] elv_queue_empty+0x1d/0x20
> [<c0219ab4>] __make_request+0x80/0x4ae
> [<c021a016>] generic_make_request+0x134/0x186
I tried to reproduce this error last night using the scsi_debug driver,
but could not. I tried different combinations of file systems (fat, ext2,
ext3). I did notice that I had elevator=deadline on the cmdline, so I
removed it in case the difference in elevator_queue_empty_fn between the
two elevators mattered. I was still unable to reproduce it. I will try
a few more combinations this weekend and let you know what I
find out. There might still be a race in our cleanups that I am
unable to trigger on my system.
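The quoted trace ends in elv_queue_empty, reached from __make_request after the device was unplugged, which fits a request being submitted to a queue whose scheduler is already gone. Below is a minimal userspace sketch in C of guarding against that case. All toy_* names are hypothetical, and failing with -ENODEV is an assumption made for illustration, not a description of the block layer or of any patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical scheduler with a queue-empty hook. */
struct toy_sched {
    int (*queue_empty_fn)(void *data); /* models elevator_queue_empty_fn */
    void *data;
};

struct toy_rq_queue {
    struct toy_sched *sched; /* NULL after teardown */
};

/* Toy hook: reports the queue as empty. */
static int toy_always_empty(void *data)
{
    (void)data;
    return 1;
}

/* Models device removal: the scheduler is detached from the queue. */
static void toy_teardown(struct toy_rq_queue *q)
{
    q->sched = NULL;
}

/* Returns 0 if the request was accepted, -ENODEV if the queue is gone. */
static int toy_submit(struct toy_rq_queue *q)
{
    assert(q != NULL);
    if (q->sched == NULL || q->sched->queue_empty_fn == NULL)
        return -ENODEV; /* fail the I/O instead of calling a stale hook */
    (void)q->sched->queue_empty_fn(q->sched->data);
    return 0;
}
```

In this model an ls on the still-mounted filesystem after removal would get an I/O error rather than an oops. The sketch ignores locking, and a real race between submission and teardown would need more than a pointer check to close.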
-andmike
--
Michael Anderson
andmike@us.ibm.com
* Re: Elevator bug in concert with usb-storage
From: Fredrik Tolf @ 2003-10-25 11:46 UTC
To: Mike Anderson; +Cc: Fredrik Tolf, Patrick Mansfield, linux-kernel
Mike Anderson writes:
> Fredrik Tolf [fredrik@dolda2000.com] wrote:
> > Sorry, that didn't work well. It no longer crashes in the same place,
> > but it still crashes. In addition, after removing the device but
> > before umounting the filesystem, I tried to ls its root dir.
> > Previously nothing extraordinary happened in that case, but now there
> > are a couple of oopses for the ls process as well.
> >
> > .. snip ..
> >
> > Call Trace:
> > [<c02176b3>] elv_queue_empty+0x1d/0x20
> > [<c0219ab4>] __make_request+0x80/0x4ae
> > [<c021a016>] generic_make_request+0x134/0x186
>
> I tried to reproduce this error last night using the scsi_debug driver,
> but could not. I tried different combinations of file systems (fat, ext2,
> ext3). I did notice that I had elevator=deadline on the cmdline, so I
> removed it in case the difference in elevator_queue_empty_fn between the
> two elevators mattered. I was still unable to reproduce it. I will try
> a few more combinations this weekend and let you know what I
> find out. There might still be a race in our cleanups that I am
> unable to trigger on my system.
Be aware that this was with usb-storage as the back-end driver. I
don't know if that might make some difference (as opposed to using
scsi_debug, that is).
Is there anything I can do to help?
Fredrik Tolf