linux-kernel.vger.kernel.org archive mirror
* Kernel crashing on eject SD card
@ 2012-02-08  0:19 Naveen Goswamy
  2012-02-12 21:08 ` Stefan Richter
  0 siblings, 1 reply; 17+ messages in thread
From: Naveen Goswamy @ 2012-02-08  0:19 UTC
  To: linux-kernel

The details are here:


https://bugs.gentoo.org/show_bug.cgi?id=402433

Any ideas?  Is this a known issue?

Thanks,

Naveen

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Kernel crashing on eject SD card
  2012-02-08  0:19 Kernel crashing on eject SD card Naveen Goswamy
@ 2012-02-12 21:08 ` Stefan Richter
  2012-02-12 21:20   ` Stefan Richter
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Richter @ 2012-02-12 21:08 UTC
  To: Naveen Goswamy; +Cc: linux-kernel, linux-scsi

On Feb 07 Naveen Goswamy wrote:
> The details are here:
> 
> 
> https://bugs.gentoo.org/show_bug.cgi?id=402433
> 
> Any ideas?  Is this a known issue?

It has been reported repeatedly, AFAICT without any progress so far.
http://marc.info/?l=linux-scsi&m=132388619710052

It's the old story: apparently a udev helper opens a block device while
the device is being torn down, and the block subsystem soils itself.

Here is the kernel log from Naveen's report at the Gentoo bug tracker, for
the convenience of the list subscribers:

Feb  6 10:51:09 speedy kernel: ieee80211 phy0: wl0: brcms_c_d11hdrs_mac80211:  txop exceeded phylen 130/256 dur 1546/1504
Feb  6 10:54:39 speedy kernel: usb 1-1.6: new high-speed USB device number 4 using ehci_hcd
Feb  6 10:54:39 speedy kernel: usb 1-1.6: New USB device found, idVendor=0bda, idProduct=0159
Feb  6 10:54:39 speedy kernel: usb 1-1.6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Feb  6 10:54:39 speedy kernel: usb 1-1.6: Product: USB2.0-CRW
Feb  6 10:54:39 speedy kernel: usb 1-1.6: Manufacturer: Generic
Feb  6 10:54:39 speedy kernel: usb 1-1.6: SerialNumber: 20071114173400000
Feb  6 10:54:39 speedy kernel: scsi7 : usb-storage 1-1.6:1.0
Feb  6 10:54:40 speedy kernel: scsi 7:0:0:0: Direct-Access     Generic- Multi-Card       1.00 PQ: 0 ANSI: 0 CCS
Feb  6 10:54:40 speedy kernel: sd 7:0:0:0: Attached scsi generic sg2 type 0
Feb  6 10:54:41 speedy kernel: sd 7:0:0:0: [sdb] 3862528 512-byte logical blocks: (1.97 GB/1.84 GiB)
Feb  6 10:54:41 speedy kernel: sd 7:0:0:0: [sdb] Write Protect is off
Feb  6 10:54:41 speedy kernel: sd 7:0:0:0: [sdb] Mode Sense: 03 00 00 00
Feb  6 10:54:41 speedy kernel: sd 7:0:0:0: [sdb] No Caching mode page present
Feb  6 10:54:41 speedy kernel: sd 7:0:0:0: [sdb] Assuming drive cache: write through
Feb  6 10:54:41 speedy kernel: sd 7:0:0:0: [sdb] No Caching mode page present
Feb  6 10:54:41 speedy kernel: sd 7:0:0:0: [sdb] Assuming drive cache: write through
Feb  6 10:54:41 speedy kernel: sdb: sdb1
Feb  6 10:54:41 speedy kernel: sd 7:0:0:0: [sdb] No Caching mode page present
Feb  6 10:54:41 speedy kernel: sd 7:0:0:0: [sdb] Assuming drive cache: write through
Feb  6 10:54:41 speedy kernel: sd 7:0:0:0: [sdb] Attached SCSI removable disk
Feb  6 10:58:08 speedy kernel: usb 1-1.6: USB disconnect, device number 4
Feb  6 10:58:08 speedy kernel: scsi 7:0:0:0: killing request
Feb  6 10:58:08 speedy kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Feb  6 10:58:08 speedy kernel: IP: [<ffffffff8135b1c7>] sd_revalidate_disk+0x1a/0x16ee
Feb  6 10:58:08 speedy kernel: PGD 2209b0067 PUD 2209b1067 PMD 0 
Feb  6 10:58:08 speedy kernel: Oops: 0000 [#1] SMP 
Feb  6 10:58:08 speedy kernel: CPU 3 
Feb  6 10:58:08 speedy kernel: Modules linked in: aes_x86_64 aes_generic ipt_REJECT iptable_mangle iptable_nat nf_nat iptable_filter ip_tables ipv6 dm_mod vboxnetadp(O) vboxnetflt(O) vboxdrv(O) uvcvideo videodev v4l2_compat_ioctl32 usb_storage arc4 brcmsmac snd_hda_codec_hdmi snd_hda_codec_idt mac80211 snd_hda_intel brcmutil snd_hda_codec dell_wmi ehci_hcd r8169 usbcore snd_pcm cfg80211 snd_timer dcdbas firmware_class snd usb_common sparse_keymap soundcore rtc rfkill sg wmi snd_page_alloc crc8 cordic
Feb  6 10:58:08 speedy kernel: 
Feb  6 10:58:08 speedy kernel: Pid: 2434, comm: udisks-daemon Tainted: G           O 3.2.1-gentoo-r2_MINE_V01 #1 Dell Inc. Vostro 3400/07MJFM
Feb  6 10:58:08 speedy kernel: RIP: 0010:[<ffffffff8135b1c7>]  [<ffffffff8135b1c7>] sd_revalidate_disk+0x1a/0x16ee
Feb  6 10:58:08 speedy kernel: RSP: 0018:ffff8802209cfb08  EFLAGS: 00010292
Feb  6 10:58:08 speedy kernel: RAX: ffffffff8135b1ad RBX: 0000000000000000 RCX: 0000000000000002
Feb  6 10:58:08 speedy kernel: RDX: 0000000000000002 RSI: 0000000800000000 RDI: ffff880231668800
Feb  6 10:58:08 speedy kernel: RBP: ffff880231668800 R08: ffff8802315447a0 R09: ffffffff81852e48
Feb  6 10:58:08 speedy kernel: R10: 0000000000000000 R11: 0000000000011e00 R12: ffff880231668800
Feb  6 10:58:08 speedy kernel: R13: ffff88021746acd8 R14: 0000000000000000 R15: ffff88021746acc0
Feb  6 10:58:08 speedy kernel: FS:  00007fc6dd3ab700(0000) GS:ffff88023bd80000(0000) knlGS:0000000000000000
Feb  6 10:58:08 speedy kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb  6 10:58:08 speedy kernel: CR2: 0000000000000008 CR3: 00000002209af000 CR4: 00000000000006e0
Feb  6 10:58:08 speedy kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb  6 10:58:08 speedy kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb  6 10:58:08 speedy kernel: Process udisks-daemon (pid: 2434, threadinfo ffff8802209ce000, task ffff88023aac4110)
Feb  6 10:58:08 speedy kernel: Stack:
Feb  6 10:58:08 speedy kernel: ffffffff8103461a ffff880231668848 0000000000000000 ffff880231668800
Feb  6 10:58:08 speedy kernel: ffff88021746acd8 000000000000001d ffff88021746acc0 ffffffff810a0ec9
Feb  6 10:58:08 speedy kernel: ffff88021746acc0 ffff880231668800 0000000000000000 ffff88021746ad98
Feb  6 10:58:08 speedy kernel: Call Trace:
Feb  6 10:58:08 speedy kernel: [<ffffffff8103461a>] ? try_to_wake_up+0x200/0x200
Feb  6 10:58:08 speedy kernel: [<ffffffff810a0ec9>] ? get_super+0x1a/0x95
Feb  6 10:58:08 speedy kernel: [<ffffffff810b23d8>] ? iput+0x2b/0x17e
Feb  6 10:58:08 speedy kernel: [<ffffffff810eb3ce>] ? rescan_partitions+0xac/0x446
Feb  6 10:58:08 speedy kernel: [<ffffffff810c5410>] ? __blkdev_get+0x162/0x33f
Feb  6 10:58:08 speedy kernel: [<ffffffff810c588b>] ? blkdev_get+0x29e/0x29e
Feb  6 10:58:08 speedy kernel: [<ffffffff810c57ad>] ? blkdev_get+0x1c0/0x29e
Feb  6 10:58:08 speedy kernel: [<ffffffff810c588b>] ? blkdev_get+0x29e/0x29e
Feb  6 10:58:08 speedy kernel: [<ffffffff8109e03b>] ? __dentry_open.clone.14+0x16b/0x294
Feb  6 10:58:08 speedy kernel: [<ffffffff810aaacb>] ? do_last.clone.34+0x64e/0x662
Feb  6 10:58:08 speedy kernel: [<ffffffff810aabe1>] ? path_openat+0xcb/0x354
Feb  6 10:58:08 speedy kernel: [<ffffffff8133e3fc>] ? scsi_set_medium_removal+0x46/0x6b
Feb  6 10:58:08 speedy kernel: [<ffffffff8102c3b7>] ? ttwu_do_wakeup+0x11/0x86
Feb  6 10:58:08 speedy kernel: [<ffffffff810aaf45>] ? do_filp_open+0x2c/0x72
Feb  6 10:58:08 speedy kernel: [<ffffffff810b3fde>] ? alloc_fd+0x69/0x10f
Feb  6 10:58:08 speedy kernel: [<ffffffff8109ed2e>] ? do_sys_open+0x101/0x18f
Feb  6 10:58:08 speedy kernel: [<ffffffff81482b52>] ? system_call_fastpath+0x16/0x1b
Feb  6 10:58:08 speedy kernel: Code: ff ff 48 83 c4 68 5b 5d 41 5c 41 5d 41 5e 41 5f c3 41 57 41 56 41 55 41 54 55 53 48 83 ec 78 48 8b 9f 50 02 00 00 48 89 7c 24 48 <48> 8b 43 08 48 89 44 24 28 8b 05 1a e2 7e 00 c1 e8 15 83 e0 07 
Feb  6 10:58:08 speedy kernel: RIP  [<ffffffff8135b1c7>] sd_revalidate_disk+0x1a/0x16ee
Feb  6 10:58:08 speedy kernel: RSP <ffff8802209cfb08>
Feb  6 10:58:08 speedy kernel: CR2: 0000000000000008
Feb  6 10:58:09 speedy kernel: ---[ end trace 2cb4da56c38cb030 ]---

-- 
Stefan Richter
-=====-===-- --=- -==--
http://arcgraph.de/sr/


* Re: Kernel crashing on eject SD card
  2012-02-12 21:08 ` Stefan Richter
@ 2012-02-12 21:20   ` Stefan Richter
  2012-02-13  1:46     ` Naveen Goswamy
  2012-02-13  2:18     ` Dave Jones
  0 siblings, 2 replies; 17+ messages in thread
From: Stefan Richter @ 2012-02-12 21:20 UTC
  To: Stefan Richter; +Cc: Naveen Goswamy, linux-kernel, linux-scsi

On Feb 12 Stefan Richter wrote:
> Modules linked in: [...] vboxnetadp(O) vboxnetflt(O) vboxdrv(O) [...]

Oh, could you try without virtualbox?

The debian bug report hints that kernel 3.2 /without virtualbox drivers/
seems to behave itself.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=649735
-- 
Stefan Richter
-=====-===-- --=- -==--
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Kernel crashing on eject SD card
  2012-02-12 21:20   ` Stefan Richter
@ 2012-02-13  1:46     ` Naveen Goswamy
  2012-02-13  2:18     ` Dave Jones
  1 sibling, 0 replies; 17+ messages in thread
From: Naveen Goswamy @ 2012-02-13  1:46 UTC
  To: Stefan Richter; +Cc: linux-kernel, linux-scsi

I confirm that it behaves properly when virtualbox drivers are not loaded.

Cheers,

Naveen


Quoting Stefan Richter <stefanr@s5r6.in-berlin.de>:

> On Feb 12 Stefan Richter wrote:
> > Modules linked in: [...] vboxnetadp(O) vboxnetflt(O) vboxdrv(O) [...]
>
> Oh, could you try without virtualbox?
>
> The debian bug report hints that kernel 3.2 /without virtualbox drivers/
> seems to behave itself.
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=649735
> --
> Stefan Richter
> -=====-===-- --=- -==--
> http://arcgraph.de/sr/
>


* Re: Kernel crashing on eject SD card
  2012-02-12 21:20   ` Stefan Richter
  2012-02-13  1:46     ` Naveen Goswamy
@ 2012-02-13  2:18     ` Dave Jones
  2012-02-13 17:40       ` Naveen Goswamy
  1 sibling, 1 reply; 17+ messages in thread
From: Dave Jones @ 2012-02-13  2:18 UTC
  To: Stefan Richter; +Cc: Naveen Goswamy, linux-kernel, linux-scsi

On Sun, Feb 12, 2012 at 10:20:27PM +0100, Stefan Richter wrote:
 > On Feb 12 Stefan Richter wrote:
 > > Modules linked in: [...] vboxnetadp(O) vboxnetflt(O) vboxdrv(O) [...]
 > 
 > Oh, could you try without virtualbox?
 > 
 > The debian bug report hints that kernel 3.2 /without virtualbox drivers/
 > seems to behave itself.
 > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=649735

We've seen this a bunch of times in Fedora too.

Here's a report we've been duping similar bugs against
https://bugzilla.redhat.com/show_bug.cgi?id=754518

Some of them are using vbox/vmware, but there are a few in there that
haven't used either, so I think that might be a red herring.

	Dave


* Re: Kernel crashing on eject SD card
  2012-02-13  2:18     ` Dave Jones
@ 2012-02-13 17:40       ` Naveen Goswamy
  2012-02-14 11:14         ` Jun'ichi Nomura
  0 siblings, 1 reply; 17+ messages in thread
From: Naveen Goswamy @ 2012-02-13 17:40 UTC
  To: Dave Jones; +Cc: Stefan Richter, linux-kernel, linux-scsi

Quoting Dave Jones <davej@redhat.com>:


> Some of them are using vbox/vmware, but there's a few in there that
> haven't used either, so I think that might be a red herring.


You are correct, Dave: it was a red herring.  I ran into the same crash
and burn again today, without the vbox drivers loaded.  Here are the logs.


Feb 13 08:50:53 speedy kernel: scsi 6:0:0:0: killing request
Feb 13 08:50:53 speedy kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Feb 13 08:50:53 speedy kernel: IP: [<ffffffff8135b798>] sd_revalidate_disk+0x1a/0x16ee
Feb 13 08:50:53 speedy kernel: PGD 223493067 PUD 2234de067 PMD 0
Feb 13 08:50:53 speedy kernel: Oops: 0000 [#1] SMP
Feb 13 08:50:53 speedy kernel: CPU 2
Feb 13 08:50:53 speedy kernel: Modules linked in: aes_x86_64 aes_generic ipt_REJECT iptable_mangle iptable_nat nf_nat iptable_filter ip_tables ipv6 dm_mod uvcvideo videodev v4l2_compat_ioctl32 usb_storage arc4 brcmsmac snd_hda_codec_hdmi snd_hda_codec_idt mac80211 brcmutil snd_hda_intel snd_hda_codec cfg80211 r8169 rfkill snd_pcm snd_timer dell_wmi snd sparse_keymap ehci_hcd wmi firmware_class dcdbas crc8 soundcore rtc usbcore snd_page_alloc sg cordic usb_common
Feb 13 08:50:53 speedy kernel:
Feb 13 08:50:53 speedy kernel: Pid: 2721, comm: udisks-daemon Not tainted 3.2.5-gentoo_MINE_V00 #1 Dell Inc. Vostro 3400/07MJFM
Feb 13 08:50:53 speedy kernel: RIP: 0010:[<ffffffff8135b798>]  [<ffffffff8135b798>] sd_revalidate_disk+0x1a/0x16ee
Feb 13 08:50:53 speedy kernel: RSP: 0018:ffff8802234ddb08  EFLAGS: 00010292
Feb 13 08:50:53 speedy kernel: RAX: ffffffff8135b77e RBX: 0000000000000000 RCX: 0000000000000002
Feb 13 08:50:53 speedy kernel: RDX: 0000000000000002 RSI: 0000000800000000 RDI: ffff880231599000
Feb 13 08:50:53 speedy kernel: RBP: ffff880231599000 R08: ffff88023ab4f9a0 R09: ffffffff81852ec8
Feb 13 08:50:53 speedy kernel: R10: 0000000000000002 R11: 0000000000011e00 R12: ffff880231599000
Feb 13 08:50:53 speedy kernel: R13: ffff880232322698 R14: 0000000000000000 R15: ffff880232322680
Feb 13 08:50:53 speedy kernel: FS:  00007f7666c6b700(0000) GS:ffff88023bd00000(0000) knlGS:0000000000000000
Feb 13 08:50:53 speedy kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 13 08:50:53 speedy kernel: CR2: 0000000000000008 CR3: 0000000223492000 CR4: 00000000000006e0
Feb 13 08:50:53 speedy kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 13 08:50:53 speedy kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 13 08:50:53 speedy kernel: Process udisks-daemon (pid: 2721, threadinfo ffff8802234dc000, task ffff880230f76920)
Feb 13 08:50:53 speedy kernel: Stack:
Feb 13 08:50:53 speedy kernel: ffffffff8103468a ffff880231599048 0000000000000000 ffff880231599000
Feb 13 08:50:53 speedy kernel: ffff880232322698 000000000000001d ffff880232322680 ffffffff810a0f35
Feb 13 08:50:53 speedy kernel: ffff880232322680 ffff880231599000 0000000000000000 ffff880232322758
Feb 13 08:50:53 speedy kernel: Call Trace:
Feb 13 08:50:53 speedy kernel: [<ffffffff8103468a>] ? try_to_wake_up+0x200/0x200
Feb 13 08:50:53 speedy kernel: [<ffffffff810a0f35>] ? get_super+0x1a/0x95
Feb 13 08:50:53 speedy kernel: [<ffffffff810b2460>] ? iput+0x2b/0x17e
Feb 13 08:50:53 speedy kernel: [<ffffffff810eb4b6>] ? rescan_partitions+0xac/0x446
Feb 13 08:50:53 speedy kernel: [<ffffffff810c5498>] ? __blkdev_get+0x162/0x33f
Feb 13 08:50:53 speedy kernel: [<ffffffff810c5913>] ? blkdev_get+0x29e/0x29e
Feb 13 08:50:53 speedy kernel: [<ffffffff810c5835>] ? blkdev_get+0x1c0/0x29e
Feb 13 08:50:53 speedy kernel: [<ffffffff810c5913>] ? blkdev_get+0x29e/0x29e
Feb 13 08:50:53 speedy kernel: [<ffffffff8109e0a7>] ? __dentry_open.clone.14+0x16b/0x294
Feb 13 08:50:53 speedy kernel: [<ffffffff810aab37>] ? do_last.clone.34+0x64e/0x662
Feb 13 08:50:53 speedy kernel: [<ffffffff810aac4d>] ? path_openat+0xcb/0x354
Feb 13 08:50:53 speedy kernel: [<ffffffff8133e9b0>] ? scsi_set_medium_removal+0x46/0x6b
Feb 13 08:50:53 speedy kernel: [<ffffffff810aafb1>] ? do_filp_open+0x2c/0x72
Feb 13 08:50:53 speedy kernel: [<ffffffff810b4066>] ? alloc_fd+0x69/0x10f
Feb 13 08:50:53 speedy kernel: [<ffffffff8109ed9a>] ? do_sys_open+0x101/0x18f
Feb 13 08:50:53 speedy kernel: [<ffffffff81483292>] ? system_call_fastpath+0x16/0x1b
Feb 13 08:50:53 speedy kernel: Code: ff ff 48 83 c4 68 5b 5d 41 5c 41 5d 41 5e 41 5f c3 41 57 41 56 41 55 41 54 55 53 48 83 ec 78 48 8b 9f 50 02 00 00 48 89 7c 24 48 <48> 8b 43 08 48 89 44 24 28 8b 05 49 dc 7e 00 c1 e8 15 83 e0 07
Feb 13 08:50:53 speedy kernel: RIP  [<ffffffff8135b798>] sd_revalidate_disk+0x1a/0x16ee
Feb 13 08:50:53 speedy kernel: RSP <ffff8802234ddb08>
Feb 13 08:50:53 speedy kernel: CR2: 0000000000000008
Feb 13 08:50:53 speedy kernel: ---[ end trace 0370d79d444e26e5 ]---


* Re: Kernel crashing on eject SD card
  2012-02-13 17:40       ` Naveen Goswamy
@ 2012-02-14 11:14         ` Jun'ichi Nomura
  2012-02-14 13:31           ` Stefan Richter
  2012-02-14 16:28           ` Tejun Heo
  0 siblings, 2 replies; 17+ messages in thread
From: Jun'ichi Nomura @ 2012-02-14 11:14 UTC
  To: Naveen Goswamy, Jens Axboe, Tejun Heo, James Bottomley
  Cc: Stefan Richter, Dave Jones, linux-kernel, linux-scsi

On 02/14/12 02:40, Naveen Goswamy wrote:
> Feb 13 08:50:53 speedy kernel: scsi 6:0:0:0: killing request
> Feb 13 08:50:53 speedy kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> Feb 13 08:50:53 speedy kernel: IP: [<ffffffff8135b798>] sd_revalidate_disk+0x1a/0x16ee
> Feb 13 08:50:53 speedy kernel: PGD 223493067 PUD 2234de067 PMD 0
> Feb 13 08:50:53 speedy kernel: Oops: 0000 [#1] SMP
> Feb 13 08:50:53 speedy kernel: CPU 2
> [...]

According to the comments by Huajun Li:
http://www.spinics.net/lists/linux-scsi/msg55698.html

The following commit has changed __blkdev_get() to end up calling
sd_revalidate_disk() without getting a refcount of scsi_device:

  commit 1196f8b814f32cd04df334abf47648c2a9fd8324
  Author: Tejun Heo <tj@kernel.org>
  Date:   Thu Apr 21 20:54:45 2011 +0200

    block: rescan partitions on invalidated devices on -ENOMEDIA too

which could lead to an oops like this:

  process A                  process B
  ----------------------------------------------
  sys_open
    __blkdev_get
      sd_open
        returns -ENOMEDIUM
                             scsi_remove_device
                               <scsi_device torn down>
      rescan_partitions
        sd_revalidate_disk
          <oops>

Should "revalidate_disk" of block_device_operations work
without successful open()?

If so, sd_revalidate_disk() (and possibly other drivers) needs to be
fixed. (e.g. use scsi_disk_get/put by itself)
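For the first option, a minimal single-threaded userspace sketch of "pin the driver object before touching it". The names `model_sdev`, `model_disk`, `model_get()`, `model_put()`, `model_revalidate()` and `model_remove()` are hypothetical stand-ins, not the real scsi_disk_get()/scsi_disk_put(), and the locking that real helpers need against the remove path is deliberately left out. For reference, both oopses show RBX = 0 and CR2 = 0x8, which fits the faulting instruction in the Code: bytes (`48 8b 43 08`, i.e. `mov 0x8(%rbx),%rax`) reading a field through a NULL pointer that the preceding instruction loaded from the gendisk in RDI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the SCSI device object owned by the driver. */
struct model_sdev {
	int refs;      /* references currently held by users */
	bool removed;  /* set once the hot-unplug path has started */
};

/* Hypothetical stand-in for the gendisk; sdev plays the role of the
 * driver-private pointer that turned out to be NULL in the oops. */
struct model_disk {
	struct model_sdev *sdev;
};

/* Pin the device, or fail if it is already gone. A real version must
 * do this under a lock shared with the remove path. */
static struct model_sdev *model_get(struct model_disk *disk)
{
	struct model_sdev *sdev = disk->sdev;

	if (sdev == NULL || sdev->removed)
		return NULL;
	sdev->refs++;
	return sdev;
}

static void model_put(struct model_sdev *sdev)
{
	assert(sdev->refs > 0);
	sdev->refs--;
}

/* Revalidate only while holding a reference; returns -1 instead of
 * dereferencing driver data that the remove path has torn down. */
static int model_revalidate(struct model_disk *disk, int *revalidated)
{
	struct model_sdev *sdev = model_get(disk);

	if (sdev == NULL)
		return -1;
	(*revalidated)++;  /* stands in for reading capacity etc. */
	model_put(sdev);
	return 0;
}

/* Hot-unplug path: mark the device removed and detach it from the disk. */
static void model_remove(struct model_disk *disk)
{
	if (disk->sdev != NULL)
		disk->sdev->removed = true;
	disk->sdev = NULL;
}
```

After model_remove(), model_revalidate() returns -1 instead of following the NULL pointer, which is the behaviour the unpatched path lacks.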

If not, __blkdev_get() or rescan_partitions() should avoid calling
"revalidate_disk" in the -ENOMEDIUM case.

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation


* Re: Kernel crashing on eject SD card
  2012-02-14 11:14         ` Jun'ichi Nomura
@ 2012-02-14 13:31           ` Stefan Richter
  2012-02-14 16:28           ` Tejun Heo
  1 sibling, 0 replies; 17+ messages in thread
From: Stefan Richter @ 2012-02-14 13:31 UTC
  To: Jun'ichi Nomura
  Cc: Naveen Goswamy, Jens Axboe, Tejun Heo, James Bottomley,
	Dave Jones, linux-kernel, linux-scsi

On Feb 14 Jun'ichi Nomura wrote:
> According to the comments by Huajun Li:
> http://www.spinics.net/lists/linux-scsi/msg55698.html
> 
> The following commit has changed __blkdev_get() to end up calling
> sd_revalidate_disk() without getting a refcount of scsi_device:
> 
>   commit 1196f8b814f32cd04df334abf47648c2a9fd8324
>   Author: Tejun Heo <tj@kernel.org>
>   Date:   Thu Apr 21 20:54:45 2011 +0200
> 
>     block: rescan partitions on invalidated devices on -ENOMEDIA too
> 
> which could lead to an oops like this:
> 
>   process A                  process B
>   ----------------------------------------------
>   sys_open
>     __blkdev_get
>       sd_open
>         returns -ENOMEDIUM
>                              scsi_remove_device
>                                <scsi_device torn down>
>       rescan_partitions
>         sd_revalidate_disk
>           <oops>
> 
> Should "revalidate_disk" of block_device_operations work
> without successful open()?
> 
> If so, sd_revalidate_disk() (and possibly other drivers) needs to be
> fixed. (e.g. use scsi_disk_get/put by itself)
> 
> If not, __blkdev_get() or rescan_partitions() should avoid calling
> "revalidate_disk" in the -ENOMEDIUM case.

It may very well be that not only sd_revalidate_disk is affected.

I have yet to check whether the "open -> unplug -> ioctl -> oops" bug from
http://www.spinics.net/lists/linux-scsi/msg56254.html
(a) happens under 3.3-rc still (was reported against 3.2-rc7),
(b) affects sd devices too (was reported against sr devices).
-- 
Stefan Richter
-=====-===-- --=- -===-
http://arcgraph.de/sr/


* Re: Kernel crashing on eject SD card
  2012-02-14 11:14         ` Jun'ichi Nomura
  2012-02-14 13:31           ` Stefan Richter
@ 2012-02-14 16:28           ` Tejun Heo
  2012-02-15  2:56             ` Jun'ichi Nomura
  1 sibling, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2012-02-14 16:28 UTC
  To: Jun'ichi Nomura
  Cc: Naveen Goswamy, Jens Axboe, James Bottomley, Stefan Richter,
	Dave Jones, linux-kernel, linux-scsi

Hello,

On Tue, Feb 14, 2012 at 08:14:40PM +0900, Jun'ichi Nomura wrote:
> The following commit has changed __blkdev_get() to end up calling
> sd_revalidate_disk() without getting a refcount of scsi_device:
> 
>   commit 1196f8b814f32cd04df334abf47648c2a9fd8324
>   Author: Tejun Heo <tj@kernel.org>
>   Date:   Thu Apr 21 20:54:45 2011 +0200
> 
>     block: rescan partitions on invalidated devices on -ENOMEDIA too
> 
> which could lead to an oops like this:
> 
>   process A                  process B
>   ----------------------------------------------
>   sys_open
>     __blkdev_get
>       sd_open
>         returns -ENOMEDIUM
>                              scsi_remove_device
>                                <scsi_device torn down>
>       rescan_partitions
>         sd_revalidate_disk
>           <oops>
> 
> Should "revalidate_disk" of block_device_operations work
> without successful open()?
> 
> If so, sd_revalidate_disk() (and possibly other drivers) needs to be
> fixed. (e.g. use scsi_disk_get/put by itself)
> 
> If not, __blkdev_get() or rescan_partitions() should avoid calling
> "revalidate_disk" in the -ENOMEDIUM case.

Hmmm... right, that's a problem.  I missed rescan_partitions() calling
into the driver.  What we should probably do is separate out
invalidation & partition shoot-down into a separate function, say
truncate_disk(), and call that on -ENOMEDIUM instead of
rescan_partitions().  All that's necessary is killing the partition
devices (and maybe zapping the device size to zero).  Anyone interested
in trying it?

Thanks.

-- 
tejun


* Re: Kernel crashing on eject SD card
  2012-02-14 16:28           ` Tejun Heo
@ 2012-02-15  2:56             ` Jun'ichi Nomura
  2012-02-15 17:26               ` Tejun Heo
  0 siblings, 1 reply; 17+ messages in thread
From: Jun'ichi Nomura @ 2012-02-15  2:56 UTC
  To: Tejun Heo, Naveen Goswamy
  Cc: Jens Axboe, James Bottomley, Stefan Richter, Dave Jones,
	linux-kernel, linux-scsi

Hi,

Thank you for the comments.

On 02/15/12 01:28, Tejun Heo wrote:
>> which could lead to an oops like this:
>>
>>   process A                  process B
>>   ----------------------------------------------
>>   sys_open
>>     __blkdev_get
>>       sd_open
>>         returns -ENOMEDIUM
>>                              scsi_remove_device
>>                                <scsi_device torn down>
>>       rescan_partitions
>>         sd_revalidate_disk
>>           <oops>
>>
>> Should "revalidate_disk" of block_device_operations work
>> without successful open()?
>>
>> If so, sd_revalidate_disk() (and possibly other drivers) needs to be
>> fixed. (e.g. use scsi_disk_get/put by itself)
>>
>> If not, __blkdev_get() or rescan_partitions() should avoid calling
>> "revalidate_disk" in the -ENOMEDIUM case.
> 
> Hmmm... right, that's a problem.  I missed rescan_partitions() calling
> into the driver.  What we should probably do is separate out
> invalidation & partition shoot-down into a separate function, say
> truncate_disk(), and call that on -ENOMEDIUM instead of
> rescan_partitions().  All that's necessary is killing the partition
> devices (and maybe zapping the device size to zero).  Anyone interested
> in trying it?

How about this?

If the patch looks OK, I would appreciate it if somebody with removable
media could test the following:
  - the oops in sd_revalidate_disk() should not occur:
    http://marc.info/?l=linux-scsi&m=132388619710052
  - the problem reported here should not be re-introduced:
    https://bugzilla.kernel.org/show_bug.cgi?id=13029

Index: linux-3.3/block/partition-generic.c
===================================================================
--- linux-3.3.orig/block/partition-generic.c	2012-02-15 09:00:25.147293790 +0900
+++ linux-3.3/block/partition-generic.c	2012-02-15 11:31:33.835554974 +0900
@@ -389,17 +389,11 @@ static bool disk_unlock_native_capacity(
 	}
 }
 
-int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
+static int drop_partitions(struct gendisk *disk, struct block_device *bdev)
 {
-	struct parsed_partitions *state = NULL;
 	struct disk_part_iter piter;
 	struct hd_struct *part;
-	int p, highest, res;
-rescan:
-	if (state && !IS_ERR(state)) {
-		kfree(state);
-		state = NULL;
-	}
+	int res;
 
 	if (bdev->bd_part_count)
 		return -EBUSY;
@@ -412,6 +406,24 @@ rescan:
 		delete_partition(disk, part->partno);
 	disk_part_iter_exit(&piter);
 
+	return 0;
+}
+
+int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
+{
+	struct parsed_partitions *state = NULL;
+	struct hd_struct *part;
+	int p, highest, res;
+rescan:
+	if (state && !IS_ERR(state)) {
+		kfree(state);
+		state = NULL;
+	}
+
+	res = drop_partitions(disk, bdev);
+	if (res)
+		return res;
+
 	if (disk->fops->revalidate_disk)
 		disk->fops->revalidate_disk(disk);
 	check_disk_size_change(disk, bdev);
@@ -515,6 +527,22 @@ rescan:
 	return 0;
 }
 
+int invalidate_partitions(struct gendisk *disk, struct block_device *bdev)
+{
+	int res;
+
+	res = drop_partitions(disk, bdev);
+	if (res)
+		return res;
+
+	check_disk_size_change(disk, bdev);
+	bdev->bd_invalidated = 0;
+	/* tell userspace that the media / partition table may have changed */
+	kobject_uevent(&disk_to_dev(disk)->kobj, KOBJ_CHANGE);
+
+	return 0;
+}
+
 unsigned char *read_dev_sector(struct block_device *bdev, sector_t n, Sector *p)
 {
 	struct address_space *mapping = bdev->bd_inode->i_mapping;
Index: linux-3.3/include/linux/genhd.h
===================================================================
--- linux-3.3.orig/include/linux/genhd.h	2012-02-09 12:21:53.000000000 +0900
+++ linux-3.3/include/linux/genhd.h	2012-02-15 11:18:59.661594629 +0900
@@ -596,6 +596,7 @@ extern char *disk_name (struct gendisk *
 
 extern int disk_expand_part_tbl(struct gendisk *disk, int target);
 extern int rescan_partitions(struct gendisk *disk, struct block_device *bdev);
+extern int invalidate_partitions(struct gendisk *disk, struct block_device *bdev);
 extern struct hd_struct * __must_check add_partition(struct gendisk *disk,
 						     int partno, sector_t start,
 						     sector_t len, int flags,
Index: linux-3.3/fs/block_dev.c
===================================================================
--- linux-3.3.orig/fs/block_dev.c	2012-02-09 12:21:53.000000000 +0900
+++ linux-3.3/fs/block_dev.c	2012-02-15 11:34:48.800549266 +0900
@@ -1183,8 +1183,12 @@ static int __blkdev_get(struct block_dev
 			 * The latter is necessary to prevent ghost
 			 * partitions on a removed medium.
 			 */
-			if (bdev->bd_invalidated && (!ret || ret == -ENOMEDIUM))
-				rescan_partitions(disk, bdev);
+			if (bdev->bd_invalidated) {
+				if (!ret)
+					rescan_partitions(disk, bdev);
+				else if (ret == -ENOMEDIUM)
+					invalidate_partitions(disk, bdev);
+			}
 			if (ret)
 				goto out_clear;
 		} else {
@@ -1214,8 +1218,12 @@ static int __blkdev_get(struct block_dev
 			if (bdev->bd_disk->fops->open)
 				ret = bdev->bd_disk->fops->open(bdev, mode);
 			/* the same as first opener case, read comment there */
-			if (bdev->bd_invalidated && (!ret || ret == -ENOMEDIUM))
-				rescan_partitions(bdev->bd_disk, bdev);
+			if (bdev->bd_invalidated) {
+				if (!ret)
+					rescan_partitions(bdev->bd_disk, bdev);
+				else if (ret == -ENOMEDIUM)
+					invalidate_partitions(bdev->bd_disk, bdev);
+			}
 			if (ret)
 				goto out_unlock_bdev;
 		}

* Re: Kernel crashing on eject SD card
  2012-02-15  2:56             ` Jun'ichi Nomura
@ 2012-02-15 17:26               ` Tejun Heo
  2012-02-16  1:26                 ` Jun'ichi Nomura
  0 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2012-02-15 17:26 UTC
  To: Jun'ichi Nomura
  Cc: Naveen Goswamy, Jens Axboe, James Bottomley, Stefan Richter,
	Dave Jones, linux-kernel, linux-scsi

Hello,

This seems like the right approach to me, but..

On Wed, Feb 15, 2012 at 11:56:19AM +0900, Jun'ichi Nomura wrote:
> +int invalidate_partitions(struct gendisk *disk, struct block_device *bdev)
> +{
> +	int res;
> +
> +	res = drop_partitions(disk, bdev);
> +	if (res)
> +		return res;
> +

Hmmm... shouldn't we have set_capacity(disk, 0) here?

> +	check_disk_size_change(disk, bdev);
> +	bdev->bd_invalidated = 0;
> +	/* tell userspace that the media / partition table may have changed */
> +	kobject_uevent(&disk_to_dev(disk)->kobj, KOBJ_CHANGE);

Also, we really shouldn't be generating KOBJ_CHANGE after every
-ENOMEDIUM open.  This can easily lead to infinite loop.  We should
generate this iff we actually dropped partitions && modified the size.

Thanks.

-- 
tejun

* Re: Kernel crashing on eject SD card
  2012-02-15 17:26               ` Tejun Heo
@ 2012-02-16  1:26                 ` Jun'ichi Nomura
  2012-02-16 16:36                   ` Tejun Heo
  2012-03-01 18:58                   ` Luis Henriques
  0 siblings, 2 replies; 17+ messages in thread
From: Jun'ichi Nomura @ 2012-02-16  1:26 UTC
  To: Tejun Heo
  Cc: Naveen Goswamy, Jens Axboe, James Bottomley, Stefan Richter,
	Dave Jones, linux-kernel, linux-scsi

Hi,

Thank you for the review and comments.

On 02/16/12 02:26, Tejun Heo wrote:
> On Wed, Feb 15, 2012 at 11:56:19AM +0900, Jun'ichi Nomura wrote:
>> +int invalidate_partitions(struct gendisk *disk, struct block_device *bdev)
>> +{
>> +	int res;
>> +
>> +	res = drop_partitions(disk, bdev);
>> +	if (res)
>> +		return res;
>> +
> 
> Hmmm... shouldn't we have set_capacity(disk, 0) here?

Added.
I wasn't sure whether I should leave it to drivers.
But it seems capacity 0 for ENOMEDIUM device is reasonable.

>> +	check_disk_size_change(disk, bdev);
>> +	bdev->bd_invalidated = 0;
>> +	/* tell userspace that the media / partition table may have changed */
>> +	kobject_uevent(&disk_to_dev(disk)->kobj, KOBJ_CHANGE);
> 
> Also, we really shouldn't be generating KOBJ_CHANGE after every
> -ENOMEDIUM open.  This can easily lead to infinite loop.  We should
> generate this iff we actually dropped partitions && modified the size.

invalidate_partitions() is called only when bd_invalidated is set.
So KOBJ_CHANGE is not raised for every ENOMEDIUM open.

I made it explicit in the function to make it safer against
possible misuse.

How about this?

---------------------------------------------------------
Do not call drivers when invalidating partitions for -ENOMEDIUM

When a scsi driver returns -ENOMEDIUM for open(),
__blkdev_get() calls rescan_partitions(), which ends up calling
sd_revalidate_disk() without holding a reference to the scsi_device.

That could lead to an oops like this:

  process A                  process B
  ----------------------------------------------
  sys_open
    __blkdev_get
      sd_open
        returns -ENOMEDIUM
                             scsi_remove_device
                               <scsi_device torn down>
      rescan_partitions
        sd_revalidate_disk
          <oops>

Oopses are reported here:
http://marc.info/?l=linux-scsi&m=132388619710052

This patch separates the partition invalidation from rescan_partitions()
and uses it for the -ENOMEDIUM case.

Index: linux-3.3/block/partition-generic.c
===================================================================
--- linux-3.3.orig/block/partition-generic.c	2012-02-15 09:00:25.147293790 +0900
+++ linux-3.3/block/partition-generic.c	2012-02-16 10:48:22.257680685 +0900
@@ -389,17 +389,11 @@ static bool disk_unlock_native_capacity(
 	}
 }
 
-int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
+static int drop_partitions(struct gendisk *disk, struct block_device *bdev)
 {
-	struct parsed_partitions *state = NULL;
 	struct disk_part_iter piter;
 	struct hd_struct *part;
-	int p, highest, res;
-rescan:
-	if (state && !IS_ERR(state)) {
-		kfree(state);
-		state = NULL;
-	}
+	int res;
 
 	if (bdev->bd_part_count)
 		return -EBUSY;
@@ -412,6 +406,24 @@ rescan:
 		delete_partition(disk, part->partno);
 	disk_part_iter_exit(&piter);
 
+	return 0;
+}
+
+int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
+{
+	struct parsed_partitions *state = NULL;
+	struct hd_struct *part;
+	int p, highest, res;
+rescan:
+	if (state && !IS_ERR(state)) {
+		kfree(state);
+		state = NULL;
+	}
+
+	res = drop_partitions(disk, bdev);
+	if (res)
+		return res;
+
 	if (disk->fops->revalidate_disk)
 		disk->fops->revalidate_disk(disk);
 	check_disk_size_change(disk, bdev);
@@ -515,6 +527,26 @@ rescan:
 	return 0;
 }
 
+int invalidate_partitions(struct gendisk *disk, struct block_device *bdev)
+{
+	int res;
+
+	if (!bdev->bd_invalidated)
+		return 0;
+
+	res = drop_partitions(disk, bdev);
+	if (res)
+		return res;
+
+	set_capacity(disk, 0);
+	check_disk_size_change(disk, bdev);
+	bdev->bd_invalidated = 0;
+	/* tell userspace that the media / partition table may have changed */
+	kobject_uevent(&disk_to_dev(disk)->kobj, KOBJ_CHANGE);
+
+	return 0;
+}
+
 unsigned char *read_dev_sector(struct block_device *bdev, sector_t n, Sector *p)
 {
 	struct address_space *mapping = bdev->bd_inode->i_mapping;
Index: linux-3.3/include/linux/genhd.h
===================================================================
--- linux-3.3.orig/include/linux/genhd.h	2012-02-09 12:21:53.000000000 +0900
+++ linux-3.3/include/linux/genhd.h	2012-02-16 10:47:43.783681813 +0900
@@ -596,6 +596,7 @@ extern char *disk_name (struct gendisk *
 
 extern int disk_expand_part_tbl(struct gendisk *disk, int target);
 extern int rescan_partitions(struct gendisk *disk, struct block_device *bdev);
+extern int invalidate_partitions(struct gendisk *disk, struct block_device *bdev);
 extern struct hd_struct * __must_check add_partition(struct gendisk *disk,
 						     int partno, sector_t start,
 						     sector_t len, int flags,
Index: linux-3.3/fs/block_dev.c
===================================================================
--- linux-3.3.orig/fs/block_dev.c	2012-02-09 12:21:53.000000000 +0900
+++ linux-3.3/fs/block_dev.c	2012-02-16 10:47:52.602681441 +0900
@@ -1183,8 +1183,12 @@ static int __blkdev_get(struct block_dev
 			 * The latter is necessary to prevent ghost
 			 * partitions on a removed medium.
 			 */
-			if (bdev->bd_invalidated && (!ret || ret == -ENOMEDIUM))
-				rescan_partitions(disk, bdev);
+			if (bdev->bd_invalidated) {
+				if (!ret)
+					rescan_partitions(disk, bdev);
+				else if (ret == -ENOMEDIUM)
+					invalidate_partitions(disk, bdev);
+			}
 			if (ret)
 				goto out_clear;
 		} else {
@@ -1214,8 +1218,12 @@ static int __blkdev_get(struct block_dev
 			if (bdev->bd_disk->fops->open)
 				ret = bdev->bd_disk->fops->open(bdev, mode);
 			/* the same as first opener case, read comment there */
-			if (bdev->bd_invalidated && (!ret || ret == -ENOMEDIUM))
-				rescan_partitions(bdev->bd_disk, bdev);
+			if (bdev->bd_invalidated) {
+				if (!ret)
+					rescan_partitions(bdev->bd_disk, bdev);
+				else if (ret == -ENOMEDIUM)
+					invalidate_partitions(bdev->bd_disk, bdev);
+			}
 			if (ret)
 				goto out_unlock_bdev;
 		}

* Re: Kernel crashing on eject SD card
  2012-02-16  1:26                 ` Jun'ichi Nomura
@ 2012-02-16 16:36                   ` Tejun Heo
  2012-03-01 18:58                   ` Luis Henriques
  1 sibling, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2012-02-16 16:36 UTC
  To: Jun'ichi Nomura
  Cc: Naveen Goswamy, Jens Axboe, James Bottomley, Stefan Richter,
	Dave Jones, linux-kernel, linux-scsi

Hello,

On Thu, Feb 16, 2012 at 10:26:38AM +0900, Jun'ichi Nomura wrote:
> >> +int invalidate_partitions(struct gendisk *disk, struct block_device *bdev)
> >> +{
> >> +	int res;
> >> +
> >> +	res = drop_partitions(disk, bdev);
> >> +	if (res)
> >> +		return res;
> >> +
> > 
> > Hmmm... shouldn't we have set_capacity(disk, 0) here?
> 
> Added.
> I wasn't sure whether I should leave it to drivers.

The problem is that we shouldn't call into drivers without first
opening the device, so....

> But it seems capacity 0 for ENOMEDIUM device is reasonable.

Yeah, I *think* it should be okay.

> >> +	check_disk_size_change(disk, bdev);
> >> +	bdev->bd_invalidated = 0;
> >> +	/* tell userspace that the media / partition table may have changed */
> >> +	kobject_uevent(&disk_to_dev(disk)->kobj, KOBJ_CHANGE);
> > 
> > Also, we really shouldn't be generating KOBJ_CHANGE after every
> > -ENOMEDIUM open.  This can easily lead to infinite loop.  We should
> > generate this iff we actually dropped partitions && modified the size.
> 
> invalidate_partitions() is called only when bd_invalidated is set.
> So KOBJ_CHANGE is not raised for every ENOMEDIUM open.

Ah, okay.

> I made it explicit in the function to make it safer against
> possible misuse.
> 
> How about this?
> 
> ---------------------------------------------------------
> Do not call drivers when invalidating partitions for -ENOMEDIUM
> 
> When a scsi driver returns -ENOMEDIUM for open(),
> __blkdev_get() calls rescan_partitions(), which ends up calling
> sd_revalidate_disk() without holding a reference to the scsi_device.
> 
> That could lead to an oops like this:
> 
>   process A                  process B
>   ----------------------------------------------
>   sys_open
>     __blkdev_get
>       sd_open
>         returns -ENOMEDIUM
>                              scsi_remove_device
>                                <scsi_device torn down>
>       rescan_partitions
>         sd_revalidate_disk
>           <oops>
> 
> Oopses are reported here:
> http://marc.info/?l=linux-scsi&m=132388619710052
> 
> This patch separates the partition invalidation from rescan_partitions()
> and uses it for the -ENOMEDIUM case.

Yeah, this looks good to me.

Thank you.

-- 
tejun

* Re: Kernel crashing on eject SD card
  2012-02-16  1:26                 ` Jun'ichi Nomura
  2012-02-16 16:36                   ` Tejun Heo
@ 2012-03-01 18:58                   ` Luis Henriques
  2012-03-02  0:12                     ` Jun'ichi Nomura
  1 sibling, 1 reply; 17+ messages in thread
From: Luis Henriques @ 2012-03-01 18:58 UTC
  To: Jun'ichi Nomura
  Cc: Tejun Heo, Naveen Goswamy, Jens Axboe, James Bottomley,
	Stefan Richter, Dave Jones, linux-kernel, linux-scsi

Hi,

On Thu, Feb 16, 2012 at 10:26:38AM +0900, Jun'ichi Nomura wrote:
> Hi,
> 
> Thank you for the review and comments.
> 
> On 02/16/12 02:26, Tejun Heo wrote:
> > On Wed, Feb 15, 2012 at 11:56:19AM +0900, Jun'ichi Nomura wrote:
> >> +int invalidate_partitions(struct gendisk *disk, struct block_device *bdev)
> >> +{
> >> +	int res;
> >> +
> >> +	res = drop_partitions(disk, bdev);
> >> +	if (res)
> >> +		return res;
> >> +
> > 
> > Hmmm... shouldn't we have set_capacity(disk, 0) here?
> 
> Added.
> I wasn't sure whether I should leave it to drivers.
> But it seems capacity 0 for ENOMEDIUM device is reasonable.
> 
> >> +	check_disk_size_change(disk, bdev);
> >> +	bdev->bd_invalidated = 0;
> >> +	/* tell userspace that the media / partition table may have changed */
> >> +	kobject_uevent(&disk_to_dev(disk)->kobj, KOBJ_CHANGE);
> > 
> > Also, we really shouldn't be generating KOBJ_CHANGE after every
> > -ENOMEDIUM open.  This can easily lead to infinite loop.  We should
> > generate this iff we actually dropped partitions && modified the size.
> 
> invalidate_partitions() is called only when bd_invalidated is set.
> So KOBJ_CHANGE is not raised for every ENOMEDIUM open.
> 
> I made it explicit in the function to make it safer against
> possible misuse.
> 
> How about this?

Are there any updates on this fix?  I was wondering if any progress has been
made and if this patch has any chances of hitting mainline soon.

I have executed a quick test and it seems to solve the problem (or, at least, I
am not able to reproduce the oops anymore).

Cheers,
-- 
Luis

* Re: Kernel crashing on eject SD card
  2012-03-01 18:58                   ` Luis Henriques
@ 2012-03-02  0:12                     ` Jun'ichi Nomura
  2012-03-02  9:35                       ` Luis Henriques
  2012-03-02  9:41                       ` Jens Axboe
  0 siblings, 2 replies; 17+ messages in thread
From: Jun'ichi Nomura @ 2012-03-02  0:12 UTC
  To: Luis Henriques, Jens Axboe
  Cc: Tejun Heo, Naveen Goswamy, James Bottomley, Stefan Richter,
	Dave Jones, linux-kernel, linux-scsi

Hi,

On 03/02/12 03:58, Luis Henriques wrote:
> Are there any updates on this fix?  I was wondering if any progress has been
> made and if this patch has any chances of hitting mainline soon.
> 
> I have executed a quick test and it seems to solve the problem (or, at least, I
> am not able to reproduce the oops anymore).

Thank you for testing and feedback.

The patch is posted here and waiting for Jens to pick up:
  https://lkml.org/lkml/2012/2/21/443
  [PATCH] Fix NULL pointer dereference in sd_revalidate_disk

-- 
Jun'ichi Nomura, NEC Corporation

* Re: Kernel crashing on eject SD card
  2012-03-02  0:12                     ` Jun'ichi Nomura
@ 2012-03-02  9:35                       ` Luis Henriques
  2012-03-02  9:41                       ` Jens Axboe
  1 sibling, 0 replies; 17+ messages in thread
From: Luis Henriques @ 2012-03-02  9:35 UTC
  To: Jun'ichi Nomura
  Cc: Jens Axboe, Tejun Heo, Naveen Goswamy, James Bottomley,
	Stefan Richter, Dave Jones, linux-kernel, linux-scsi

On Fri, Mar 02, 2012 at 09:12:02AM +0900, Jun'ichi Nomura wrote:
> Hi,
> 
> On 03/02/12 03:58, Luis Henriques wrote:
> > Are there any updates on this fix?  I was wondering if any progress has been
> > made and if this patch has any chances of hitting mainline soon.
> > 
> > I have executed a quick test and it seems to solve the problem (or, at least, I
> > am not able to reproduce the oops anymore).
> 
> Thank you for testing and feedback.
> 
> The patch is posted here and waiting for Jens to pick up:
>   https://lkml.org/lkml/2012/2/21/443
>   [PATCH] Fix NULL pointer dereference in sd_revalidate_disk

Thanks for pointing me to the post.  I don't know how I missed that update.
I'll take a look at it and re-test it.

Cheers,
-- 
Luis

* Re: Kernel crashing on eject SD card
  2012-03-02  0:12                     ` Jun'ichi Nomura
  2012-03-02  9:35                       ` Luis Henriques
@ 2012-03-02  9:41                       ` Jens Axboe
  1 sibling, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2012-03-02  9:41 UTC
  To: Jun'ichi Nomura
  Cc: Luis Henriques, Tejun Heo, Naveen Goswamy, James Bottomley,
	Stefan Richter, Dave Jones, linux-kernel, linux-scsi

On 03/02/2012 01:12 AM, Jun'ichi Nomura wrote:
> Hi,
> 
> On 03/02/12 03:58, Luis Henriques wrote:
>> Are there any updates on this fix?  I was wondering if any progress has been
>> made and if this patch has any chances of hitting mainline soon.
>>
>> I have executed a quick test and it seems to solve the problem (or, at least, I
>> am not able to reproduce the oops anymore).
> 
> Thank you for testing and feedback.
> 
> The patch is posted here and waiting for Jens to pick up:
>   https://lkml.org/lkml/2012/2/21/443
>   [PATCH] Fix NULL pointer dereference in sd_revalidate_disk

It's queued up, I'll send off patches to Linus today.

-- 
Jens Axboe

Thread overview: 17+ messages
2012-02-08  0:19 Kernel crashing on eject SD card Naveen Goswamy
2012-02-12 21:08 ` Stefan Richter
2012-02-12 21:20   ` Stefan Richter
2012-02-13  1:46     ` Naveen Goswamy
2012-02-13  2:18     ` Dave Jones
2012-02-13 17:40       ` Naveen Goswamy
2012-02-14 11:14         ` Jun'ichi Nomura
2012-02-14 13:31           ` Stefan Richter
2012-02-14 16:28           ` Tejun Heo
2012-02-15  2:56             ` Jun'ichi Nomura
2012-02-15 17:26               ` Tejun Heo
2012-02-16  1:26                 ` Jun'ichi Nomura
2012-02-16 16:36                   ` Tejun Heo
2012-03-01 18:58                   ` Luis Henriques
2012-03-02  0:12                     ` Jun'ichi Nomura
2012-03-02  9:35                       ` Luis Henriques
2012-03-02  9:41                       ` Jens Axboe