* Re: mount stuck, khubd blocked
       [not found] <CAGGBzX+uGFOXb0WMD-bkDSL8nq=rD4ZFiV=xwAJyqNx=rZ20sw@mail.gmail.com>
@ 2012-06-19 14:45 ` Alan Stern
  2012-06-19 21:41   ` Dave Chinner
  2012-06-20 18:47   ` Jeff Moyer
  0 siblings, 2 replies; 14+ messages in thread
From: Alan Stern @ 2012-06-19 14:45 UTC
  To: Dima Tisnek, Alexander Viro, Jens Axboe
  Cc: USB list, linux-fsdevel, Kernel development list

On Tue, 19 Jun 2012, Dima Tisnek wrote:

> I made a microsd flash with 2 partitions, sdb1 is data partition, and
> sdb2 is a sentinel partition, 1 block in size.
> 
> I attached the usb-microsd reader with that card in it and by mistake
> tried to mount the sentinel partition, I ran:
> mount /dev/sdb2 /mnt/flash/
> 
> mount got stuck, I was not able to kill or strace it, I pulled the usb
> reader from the port, mount was still stuck, here's the dmesg log:
> 
> [65464.536212] usb 4-1.2: new high-speed USB device number 3 using ehci_hcd
> [65464.700933] usbcore: registered new interface driver uas
> [65464.703478] Initializing USB Mass Storage driver...
> [65464.703762] scsi8 : usb-storage 4-1.2:1.0
> [65464.703852] usbcore: registered new interface driver usb-storage
> [65464.703854] USB Mass Storage support registered.
> [65465.706479] scsi 8:0:0:0: Direct-Access     Generic- Card Reader      1.00 PQ: 0 ANSI: 0 CCS
> [65466.389664] sd 8:0:0:0: [sdb] 3862528 512-byte logical blocks: (1.97 GB/1.84 GiB)
> [65466.390493] sd 8:0:0:0: [sdb] Write Protect is off
> [65466.390497] sd 8:0:0:0: [sdb] Mode Sense: 03 00 00 00
> [65466.391263] sd 8:0:0:0: [sdb] No Caching mode page present
> [65466.391267] sd 8:0:0:0: [sdb] Assuming drive cache: write through
> [65466.394723] sd 8:0:0:0: [sdb] No Caching mode page present
> [65466.394727] sd 8:0:0:0: [sdb] Assuming drive cache: write through
> [65466.397500]  sdb: sdb1 sdb2
> [65466.400468] sd 8:0:0:0: [sdb] No Caching mode page present
> [65466.400471] sd 8:0:0:0: [sdb] Assuming drive cache: write through
> [65466.400474] sd 8:0:0:0: [sdb] Attached SCSI removable disk
> [66159.793752] usb 4-1.2: USB disconnect, device number 3
> [66291.567080] INFO: task khubd:90 blocked for more than 120 seconds.
> [66291.567083] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [66291.567086] khubd           D 0000000000000001     0    90      2 0x00000000
> [66291.567090]  ffff880230313948 0000000000000046 ffff88023233e000 ffff880230313fd8
> [66291.567095]  ffff880230313fd8 ffff880230313fd8 ffff880194f8b000 ffff88023233e000
> [66291.567099]  ffffffff81244c85 ffff880194f8b048 0000000000000046 ffff8802303138c0
> [66291.567104] Call Trace:
> [66291.567112]  [<ffffffff81244c85>] ? number.isra.2+0x315/0x350
> [66291.567117]  [<ffffffff811b091b>] ? ep_poll_callback+0xeb/0x120
> [66291.567121]  [<ffffffff81244cfe>] ? string.isra.4+0x3e/0xd0
> [66291.567125]  [<ffffffff8145e50f>] schedule+0x3f/0x60
> [66291.567129]  [<ffffffff8145ef35>] rwsem_down_failed_common+0xc5/0x160
> [66291.567133]  [<ffffffff81185f49>] ? find_inode+0xa9/0xb0
> [66291.567136]  [<ffffffff8145f005>] rwsem_down_read_failed+0x15/0x17
> [66291.567139]  [<ffffffff81248434>] call_rwsem_down_read_failed+0x14/0x30
> [66291.567143]  [<ffffffff8145d3d7>] ? down_read+0x17/0x20
> [66291.567146]  [<ffffffff8116ebdf>] get_super+0x9f/0xe0
> [66291.567149]  [<ffffffff811a4d9d>] fsync_bdev+0x1d/0x60
> [66291.567152]  [<ffffffff81227a2d>] invalidate_partition+0x2d/0x60
> [66291.567155]  [<ffffffff81228920>] del_gendisk+0x90/0x250

As can be seen from the stack entries above, this problem lies in the 
block or filesystem layer and not in USB or SCSI.

> [66291.567170]  [<ffffffffa01004ed>] sd_remove+0x6d/0xb0 [sd_mod]
> [66291.567177]  [<ffffffff8130ae0c>] __device_release_driver+0x7c/0xe0
> [66291.567181]  [<ffffffff8130ae9c>] device_release_driver+0x2c/0x40
> [66291.567185]  [<ffffffff8130a931>] bus_remove_device+0xe1/0x120
> [66291.567188]  [<ffffffff8130848a>] device_del+0x12a/0x1b0
> [66291.567195]  [<ffffffffa01211f5>] __scsi_remove_device+0xc5/0xd0 [scsi_mod]
> [66291.567202]  [<ffffffffa011fb54>] scsi_forget_host+0x64/0x70 [scsi_mod]
> [66291.567209]  [<ffffffffa01165cf>] scsi_remove_host+0x6f/0x120 [scsi_mod]
> [66291.567213]  [<ffffffffa04d26e3>] usb_stor_disconnect+0x63/0xd0 [usb_storage]
> [66291.567221]  [<ffffffffa0154240>] usb_unbind_interface+0x50/0x180 [usbcore]
> [66291.567226]  [<ffffffff8130ae0c>] __device_release_driver+0x7c/0xe0
> [66291.567229]  [<ffffffff8130ae9c>] device_release_driver+0x2c/0x40
> [66291.567233]  [<ffffffff8130a931>] bus_remove_device+0xe1/0x120
> [66291.567236]  [<ffffffff8130848a>] device_del+0x12a/0x1b0
> [66291.567243]  [<ffffffffa0151fef>] usb_disable_device+0xaf/0x1f0 [usbcore]
> [66291.567250]  [<ffffffffa014a427>] usb_disconnect+0x87/0x120 [usbcore]
> [66291.567256]  [<ffffffffa014be1b>] hub_thread+0x54b/0x12a0 [usbcore]
> [66291.567261]  [<ffffffff81072540>] ? abort_exclusive_wait+0xb0/0xb0
> [66291.567268]  [<ffffffffa014b8d0>] ? usb_remote_wakeup+0x40/0x40 [usbcore]
> [66291.567272]  [<ffffffff81071bc3>] kthread+0x93/0xa0
> [66291.567275]  [<ffffffff81461664>] kernel_thread_helper+0x4/0x10
>
> Subsequent to this, another khubd and systemd-udevd are reported
> blocked in turn.
>
> kernel  3.3.8-1-ARCH #1 x86_64
>
> I think I will have to reboot to get usb running again
>
> Tell me if there's something I can do to narrow down this issue, as I
> see it is rather vague as it is now.

See also Bugzilla #43269:

	https://bugzilla.kernel.org/show_bug.cgi?id=43269

It looks like the same problem.

Alan Stern



* Re: mount stuck, khubd blocked
  2012-06-19 14:45 ` mount stuck, khubd blocked Alan Stern
@ 2012-06-19 21:41   ` Dave Chinner
  2012-06-20 14:31     ` Alan Stern
  2012-06-20 18:47   ` Jeff Moyer
  1 sibling, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2012-06-19 21:41 UTC
  To: Alan Stern
  Cc: Dima Tisnek, Alexander Viro, Jens Axboe, USB list, linux-fsdevel,
	Kernel development list

On Tue, Jun 19, 2012 at 10:45:10AM -0400, Alan Stern wrote:
> On Tue, 19 Jun 2012, Dima Tisnek wrote:
> 
> > I made a microsd flash with 2 partitions, sdb1 is data partition, and
> > sdb2 is a sentinel partition, 1 block in size.
> > 
> > I attached the usb-microsd reader with that card in it and by mistake
> > tried to mount the sentinel partition, I ran:
> > mount /dev/sdb2 /mnt/flash/
> > 
> > mount got stuck, I was not able to kill or strace it, I pulled the usb
> > reader from the port, mount was still stuck, here's the dmesg log:

So where is the mount process stuck? It's holding the lock that
khubd is stuck on....

> > 
> > [65464.536212] usb 4-1.2: new high-speed USB device number 3 using ehci_hcd
> > [65464.700933] usbcore: registered new interface driver uas
> > [65464.703478] Initializing USB Mass Storage driver...
> > [65464.703762] scsi8 : usb-storage 4-1.2:1.0
> > [65464.703852] usbcore: registered new interface driver usb-storage
> > [65464.703854] USB Mass Storage support registered.
> > [65465.706479] scsi 8:0:0:0: Direct-Access     Generic- Card Reader
> >   1.00 PQ: 0 ANSI: 0 CCS
> > [65466.389664] sd 8:0:0:0: [sdb] 3862528 512-byte logical blocks:
> > (1.97 GB/1.84 GiB)
> > [65466.390493] sd 8:0:0:0: [sdb] Write Protect is off
> > [65466.390497] sd 8:0:0:0: [sdb] Mode Sense: 03 00 00 00
> > [65466.391263] sd 8:0:0:0: [sdb] No Caching mode page present
> > [65466.391267] sd 8:0:0:0: [sdb] Assuming drive cache: write through
> > [65466.394723] sd 8:0:0:0: [sdb] No Caching mode page present
> > [65466.394727] sd 8:0:0:0: [sdb] Assuming drive cache: write through
> > [65466.397500]  sdb: sdb1 sdb2
> > [65466.400468] sd 8:0:0:0: [sdb] No Caching mode page present
> > [65466.400471] sd 8:0:0:0: [sdb] Assuming drive cache: write through
> > [65466.400474] sd 8:0:0:0: [sdb] Attached SCSI removable disk
> > [66159.793752] usb 4-1.2: USB disconnect, device number 3
> > [66291.567080] INFO: task khubd:90 blocked for more than 120 seconds.
> > [66291.567083] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [66291.567086] khubd           D 0000000000000001     0    90      2 0x00000000
> > [66291.567090]  ffff880230313948 0000000000000046 ffff88023233e000
> > ffff880230313fd8
> > [66291.567095]  ffff880230313fd8 ffff880230313fd8 ffff880194f8b000
> > ffff88023233e000
> > [66291.567099]  ffffffff81244c85 ffff880194f8b048 0000000000000046
> > ffff8802303138c0
> > [66291.567104] Call Trace:
> > [66291.567112]  [<ffffffff81244c85>] ? number.isra.2+0x315/0x350
> > [66291.567117]  [<ffffffff811b091b>] ? ep_poll_callback+0xeb/0x120
> > [66291.567121]  [<ffffffff81244cfe>] ? string.isra.4+0x3e/0xd0
> > [66291.567125]  [<ffffffff8145e50f>] schedule+0x3f/0x60
> > [66291.567129]  [<ffffffff8145ef35>] rwsem_down_failed_common+0xc5/0x160
> > [66291.567133]  [<ffffffff81185f49>] ? find_inode+0xa9/0xb0
> > [66291.567136]  [<ffffffff8145f005>] rwsem_down_read_failed+0x15/0x17
> > [66291.567139]  [<ffffffff81248434>] call_rwsem_down_read_failed+0x14/0x30
> > [66291.567143]  [<ffffffff8145d3d7>] ? down_read+0x17/0x20
> > [66291.567146]  [<ffffffff8116ebdf>] get_super+0x9f/0xe0
> > [66291.567149]  [<ffffffff811a4d9d>] fsync_bdev+0x1d/0x60
> > [66291.567152]  [<ffffffff81227a2d>] invalidate_partition+0x2d/0x60
> > [66291.567155]  [<ffffffff81228920>] del_gendisk+0x90/0x250
> 
> As can be seen from the stack entries above, this problem lies in the 
> block or filesystem layer and not in USB or SCSI.

Don't blame the higher layers as the cause of the problem simply
because they are the ones that show the visible symptoms ;)

The problem lies in the fact that the error handling callback that
is run when the device is removed triggers IO to the block device
that was just removed.  If all outstanding IOs have been error'd out
correctly, and all new IOs return errors, then there is no reason
for the fsync to block here. i.e. the mount process should have
received an error.

However, the mount could have hung because the underlying device has not
been cleaned up properly before the device disconnect has proceeded.
i.e. that it is possible that the cause is a SCSI or USB issue, not a
filesystem issue. :)

So, what other blocked tasks are there in the system (echo w >
/proc/sysrq-trigger)?

As it is, I think that invalidate_partition() is doing something
somewhat insane for a block device that has been removed - you can't
write to it so fsync_bdev() is useless. And cleaning up the dentry
and inode caches is something that should be done when unmounting
the filesystem, not when the block device goes away as they can
trigger more IO and potentially deadlock with other operations that
have not handled the IO errors properly. Yes, shut a filesystem down
that has had its block device removed, but filesystem level cleanup
should be left to the filesystem, not this error handling path.

And another question - why doesn't having an active filesystem on a
block device (i.e. an active reference to the gendisk) prevent the
block device from being removed from underneath it?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: mount stuck, khubd blocked
  2012-06-19 21:41   ` Dave Chinner
@ 2012-06-20 14:31     ` Alan Stern
  2012-06-21  1:34         ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Alan Stern @ 2012-06-20 14:31 UTC
  To: Dave Chinner
  Cc: Dima Tisnek, Alexander Viro, Jens Axboe, USB list, linux-fsdevel,
	Kernel development list

On Wed, 20 Jun 2012, Dave Chinner wrote:

> On Tue, Jun 19, 2012 at 10:45:10AM -0400, Alan Stern wrote:
> > On Tue, 19 Jun 2012, Dima Tisnek wrote:
> > 
> > > I made a microsd flash with 2 partitions, sdb1 is data partition, and
> > > sdb2 is a sentinel partition, 1 block in size.
> > > 
> > > I attached the usb-microsd reader with that card in it and by mistake
> > > tried to mount the sentinel partition, I ran:
> > > mount /dev/sdb2 /mnt/flash/
> > > 
> > > mount got stuck, I was not able to kill or strace it, I pulled the usb
> > > reader from the port, mount was still stuck, here's the dmesg log:
> 
> So where is the mount process stuck? It's holding the lock that
> khubd is stuck on....

Yes, that's most likely the right explanation.

> > > [65464.536212] usb 4-1.2: new high-speed USB device number 3 using ehci_hcd
> > > [65464.700933] usbcore: registered new interface driver uas
> > > [65464.703478] Initializing USB Mass Storage driver...
> > > [65464.703762] scsi8 : usb-storage 4-1.2:1.0
> > > [65464.703852] usbcore: registered new interface driver usb-storage
> > > [65464.703854] USB Mass Storage support registered.
> > > [65465.706479] scsi 8:0:0:0: Direct-Access     Generic- Card Reader
> > >   1.00 PQ: 0 ANSI: 0 CCS
> > > [65466.389664] sd 8:0:0:0: [sdb] 3862528 512-byte logical blocks:
> > > (1.97 GB/1.84 GiB)
> > > [65466.390493] sd 8:0:0:0: [sdb] Write Protect is off
> > > [65466.390497] sd 8:0:0:0: [sdb] Mode Sense: 03 00 00 00
> > > [65466.391263] sd 8:0:0:0: [sdb] No Caching mode page present
> > > [65466.391267] sd 8:0:0:0: [sdb] Assuming drive cache: write through
> > > [65466.394723] sd 8:0:0:0: [sdb] No Caching mode page present
> > > [65466.394727] sd 8:0:0:0: [sdb] Assuming drive cache: write through
> > > [65466.397500]  sdb: sdb1 sdb2
> > > [65466.400468] sd 8:0:0:0: [sdb] No Caching mode page present
> > > [65466.400471] sd 8:0:0:0: [sdb] Assuming drive cache: write through
> > > [65466.400474] sd 8:0:0:0: [sdb] Attached SCSI removable disk
> > > [66159.793752] usb 4-1.2: USB disconnect, device number 3
> > > [66291.567080] INFO: task khubd:90 blocked for more than 120 seconds.
> > > [66291.567083] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > > disables this message.
> > > [66291.567086] khubd           D 0000000000000001     0    90      2 0x00000000
> > > [66291.567090]  ffff880230313948 0000000000000046 ffff88023233e000
> > > ffff880230313fd8
> > > [66291.567095]  ffff880230313fd8 ffff880230313fd8 ffff880194f8b000
> > > ffff88023233e000
> > > [66291.567099]  ffffffff81244c85 ffff880194f8b048 0000000000000046
> > > ffff8802303138c0
> > > [66291.567104] Call Trace:
> > > [66291.567112]  [<ffffffff81244c85>] ? number.isra.2+0x315/0x350
> > > [66291.567117]  [<ffffffff811b091b>] ? ep_poll_callback+0xeb/0x120
> > > [66291.567121]  [<ffffffff81244cfe>] ? string.isra.4+0x3e/0xd0
> > > [66291.567125]  [<ffffffff8145e50f>] schedule+0x3f/0x60
> > > [66291.567129]  [<ffffffff8145ef35>] rwsem_down_failed_common+0xc5/0x160
> > > [66291.567133]  [<ffffffff81185f49>] ? find_inode+0xa9/0xb0
> > > [66291.567136]  [<ffffffff8145f005>] rwsem_down_read_failed+0x15/0x17
> > > [66291.567139]  [<ffffffff81248434>] call_rwsem_down_read_failed+0x14/0x30
> > > [66291.567143]  [<ffffffff8145d3d7>] ? down_read+0x17/0x20
> > > [66291.567146]  [<ffffffff8116ebdf>] get_super+0x9f/0xe0
> > > [66291.567149]  [<ffffffff811a4d9d>] fsync_bdev+0x1d/0x60
> > > [66291.567152]  [<ffffffff81227a2d>] invalidate_partition+0x2d/0x60
> > > [66291.567155]  [<ffffffff81228920>] del_gendisk+0x90/0x250
> > 
> > As can be seen from the stack entries above, this problem lies in the 
> > block or filesystem layer and not in USB or SCSI.
> 
> Don't blame the higher layers as the cause of the problem simply
> because they are the ones that show the visible symptoms ;)

Okay, point taken.  It's always good to have a new point of view when 
tackling a tough problem.

> The problem lies in the fact that the error handling callback that
> is run when the device is removed triggers IO to the block device
> that was just removed.  If all outstanding IOs have been error'd out
> correctly, and all new IOs return errors, then there is no reason
> for the fsync to block here. i.e. the mount process should have
> received an error.
> 
> However, the mount could have hung because the underlying device has not
> been cleaned up properly before the device disconnect has proceeded.
> i.e. that it is possible that the cause is a SCSI or USB issue, not a
> filesystem issue. :)

But the mount got stuck _before_ the device was unplugged.  Hence
failure to clean up cannot be the underlying cause.

> So, what other blocked tasks are there in the system (echo w >
> /proc/sysrq-trigger)?
> 
> As it is, I think that invalidate_partition() is doing something
> somewhat insane for a block device that has been removed - you can't
> write to it so fsync_bdev() is useless.

That depends.  If by "removed" you mean physically disconnected from
the computer, then yes.  But if "removed" means merely unregistered
from the device core then writes can still succeed.  
invalidate_partition() doesn't know which has happened.

>  And cleaning up the dentry
> and inode caches is something that should be done when unmounting
> the filesystem, not when the block device goes away as they can
> trigger more IO and potentially deadlock with other operations that
> have not handled the IO errors properly. Yes, shut a filesystem down
> that has had its block device removed, but filesystem level cleanup
> should be left to the filesystem, not this error handling path.
> 
> And another question - why doesn't having an active filesystem on a
> block device (i.e. an active reference to the gendisk) prevent the
> block device from being removed from underneath it?

References prevent data structures from being deallocated, not from 
being unregistered (or as James Bottomley likes to call it, "removed 
from visibility").
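
The distinction can be sketched with a toy object that has both a reference count and a registration flag. This is not the gendisk code; struct toy_disk and the toy_disk_* helpers are hypothetical names used only to illustrate the point that unregistering never consults the reference count, while freeing waits for the last reference.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy object: lifetime and visibility are tracked separately. */
struct toy_disk {
	int refcount;		/* controls when the memory is freed */
	bool registered;	/* controls whether the object is visible */
};

/* Create a registered object holding one reference (the driver's). */
struct toy_disk *toy_disk_add(void)
{
	struct toy_disk *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->refcount = 1;
	d->registered = true;
	return d;
}

/* Take an extra reference, e.g. on behalf of a mounted filesystem. */
void toy_disk_get(struct toy_disk *d)
{
	d->refcount++;
}

/* Drop a reference; free on the last one. Returns true if freed. */
bool toy_disk_put(struct toy_disk *d)
{
	if (--d->refcount > 0)
		return false;
	free(d);
	return true;
}

/* Remove from visibility. Outstanding references do not prevent this. */
void toy_disk_unregister(struct toy_disk *d)
{
	d->registered = false;
}
```

A holder of an extra reference can therefore still dereference the structure after toy_disk_unregister(), but it can no longer assume the object is visible to anyone else.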

Alan Stern



* Re: mount stuck, khubd blocked
  2012-06-19 14:45 ` mount stuck, khubd blocked Alan Stern
  2012-06-19 21:41   ` Dave Chinner
@ 2012-06-20 18:47   ` Jeff Moyer
  2012-07-23 19:07     ` Alan Stern
  1 sibling, 1 reply; 14+ messages in thread
From: Jeff Moyer @ 2012-06-20 18:47 UTC
  To: Dima Tisnek
  Cc: Alan Stern, Alexander Viro, Jens Axboe, USB list, linux-fsdevel,
	Kernel development list

Alan Stern <stern@rowland.harvard.edu> writes:

> On Tue, 19 Jun 2012, Dima Tisnek wrote:
>
>> I made a microsd flash with 2 partitions, sdb1 is data partition, and
>> sdb2 is a sentinel partition, 1 block in size.
>> 
>> I attached the usb-microsd reader with that card in it and by mistake
>> tried to mount the sentinel partition, I ran:
>> mount /dev/sdb2 /mnt/flash/
>> 
>> mount got stuck, I was not able to kill or strace it, I pulled the usb
>> reader from the port, mount was still stuck, here's the dmesg log:

Hi, Dima,

Could you try the following patch?

Thanks,
Jeff

diff --git a/fs/buffer.c b/fs/buffer.c
index 838a9cf..769b30b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -930,7 +930,7 @@ init_page_buffers(struct page *page, struct block_device *bdev,
 			bh->b_blocknr = block;
 			if (uptodate)
 				set_buffer_uptodate(bh);
-			if (block < end_block)
+			if (block <= end_block)
 				set_buffer_mapped(bh);
 		}
 		block++;
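
The patch changes a single comparison. A standalone model of just that comparison (not kernel code) shows the effect: with the original test the block whose number equals end_block is left unmapped, and with the patched test it is mapped. What end_block holds inside init_page_buffers() is not spelled out in this thread, so the values used with this sketch are arbitrary stand-ins, and would_map and count_mapped are illustrative names.

```c
#include <stdbool.h>

/* Would a buffer for block number `block` be marked mapped? */
bool would_map(unsigned long block, unsigned long end_block, bool patched)
{
	/* original: block < end_block; patched: block <= end_block */
	return patched ? block <= end_block : block < end_block;
}

/* Count mapped buffers among `count` consecutive blocks from `first`. */
unsigned int count_mapped(unsigned long first, unsigned int count,
			  unsigned long end_block, bool patched)
{
	unsigned int mapped = 0;

	for (unsigned int i = 0; i < count; i++)
		if (would_map(first + i, end_block, patched))
			mapped++;
	return mapped;
}
```

For example, with end_block set to 1 and four buffers starting at block 0, the original comparison maps one buffer and the patched comparison maps two.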


* Re: mount stuck, khubd blocked
@ 2012-06-21  1:34         ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2012-06-21  1:34 UTC
  To: Alan Stern
  Cc: Dima Tisnek, Alexander Viro, Jens Axboe, USB list, linux-fsdevel,
	Kernel development list

On Wed, Jun 20, 2012 at 10:31:37AM -0400, Alan Stern wrote:
> On Wed, 20 Jun 2012, Dave Chinner wrote:
> 
> > On Tue, Jun 19, 2012 at 10:45:10AM -0400, Alan Stern wrote:
> > > On Tue, 19 Jun 2012, Dima Tisnek wrote:
> > > 
> > > > I made a microsd flash with 2 partitions, sdb1 is data partition, and
> > > > sdb2 is a sentinel partition, 1 block in size.
> > > > 
> > > > I attached the usb-microsd reader with that card in it and by mistake
> > > > tried to mount the sentinel partition, I ran:
> > > > mount /dev/sdb2 /mnt/flash/
> > > > 
> > > > mount got stuck, I was not able to kill or strace it, I pulled the usb
> > > > reader from the port, mount was still stuck, here's the dmesg log:
> > 
> > So where is the mount process stuck? It's holding the lock that
> > khubd is stuck on....
> 
> Yes, that's most likely the right explanation.

.....

> > > As can be seen from the stack entries above, this problem lies in the 
> > > block or filesystem layer and not in USB or SCSI.
> > 
> > Don't blame the higher layers as the cause of the problem simply
> > because they are the ones that show the visible symptoms ;)
> 
> Okay, point taken.  It's always good to have a new point of view when 
> tackling a tough problem.
> 
> > The problem lies in the fact that the error handling callback that
> > is run when the device is removed triggers IO to the block device
> > that was just removed.  If all outstanding IOs have been error'd out
> > correctly, and all new IOs return errors, then there is no reason
> > for the fsync to block here. i.e. the mount process should have
> > received an error.
> > 
> > However, the mount could have hung because the underlying device has not
> > been cleaned up properly before the device disconnect has proceeded.
> > i.e. that it is possible that the cause is a SCSI or USB issue, not a
> > filesystem issue. :)
> 
> But the mount got stuck _before_ the device was unplugged.  Hence
> failure to clean up cannot be the underlying cause.

Perhaps. It might not be stuck - sometimes mount does a lot of IO
(e.g. due to journal recovery or quota checks) and it can't be
killed when this is occurring, and it's only a single system call so
strace won't return anything. Hence the filesystem -could- have been
actively issuing IO when the device was pulled.

Only stack traces of all the blocked tasks will tell us any
different...

> > So, what other blocked tasks are there in the system (echo w >
> > /proc/sysrq-trigger)?
> > 
> > As it is, I think that invalidate_partition() is doing something
> > somewhat insane for a block device that has been removed - you can't
> > write to it so fsync_bdev() is useless.
> 
> That depends.  If by "removed" you mean physically disconnected from
> the computer, then yes.  But if "removed" means merely unregistered
> from the device core then writes can still succeed.  
> invalidate_partition() doesn't know which has happened.

Which means the lower layers probably need to pass that distinction
up to the invalidation function.

> >  And cleaning up the dentry
> > and inode caches is something that should be done when unmounting
> > the filesystem, not when the block device goes away as they can
> > trigger more IO and potentially deadlock with other operations that
> > have not handled the IO errors properly. Yes, shut a filesystem down
> > that has had its block device removed, but filesystem level cleanup
> > should be left to the filesystem, not this error handling path.
> > 
> > And another question - why doesn't having an active filesystem on a
> > block device (i.e. an active reference to the gendisk) prevent the
> > block device from being removed from underneath it?
> 
> References prevent data structures from being deallocated, not from 
> being unregistered (or as James Bottomley likes to call it, "removed 
> from visibility").

Except the unregister path appears to assume that a valid block
device is available when it is unregistered. That seems to me like
there is a bad assumption being made in this error handling path...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: mount stuck, khubd blocked
  2012-06-21  1:34         ` Dave Chinner
  (?)
@ 2012-06-21 14:25         ` Alan Stern
  2012-06-22  3:22           ` Dave Chinner
  -1 siblings, 1 reply; 14+ messages in thread
From: Alan Stern @ 2012-06-21 14:25 UTC
  To: Dave Chinner
  Cc: Dima Tisnek, Alexander Viro, Jens Axboe, USB list, linux-fsdevel,
	Kernel development list

On Thu, 21 Jun 2012, Dave Chinner wrote:

> > > As it is, I think that invalidate_partition() is doing something
> > > somewhat insane for a block device that has been removed - you can't
> > > write to it so fsync_bdev() is useless.
> > 
> > That depends.  If by "removed" you mean physically disconnected from
> > the computer, then yes.  But if "removed" means merely unregistered
> > from the device core then writes can still succeed.  
> > invalidate_partition() doesn't know which has happened.
> 
> Which means the lower layers probably need to pass that distinction
> up to the invalidation function.

I don't think that information is passed anywhere in the kernel.  And 
in any case, it's not really important.  When a device is unregistered, 
the upper layers shouldn't care about the reason why.

> > > And another question - why doesn't having an active filesystem on a
> > > block device (i.e. an active reference to the gendisk) prevent the
> > > block device from being removed from underneath it?
> > 
> > References prevent data structures from being deallocated, not from 
> > being unregistered (or as James Bottomley likes to call it, "removed 
> > from visibility").
> 
> Except the unregister path appears to assume that a valid block
> device is available when it is unregistered.

It may very well be available during the unregistration procedure.  
There's nothing wrong with assuming it is -- if it isn't, I/O attempts 
will simply fail.

> That seems to me like
> there is a bad assumption being made in this error handling path...

No; a bad assumption would be if the code assumed the device was 
available _after_ the unregistration call had completed.

Alan Stern



* Re: mount stuck, khubd blocked
  2012-06-21 14:25         ` Alan Stern
@ 2012-06-22  3:22           ` Dave Chinner
  2012-06-22 14:32             ` Alan Stern
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2012-06-22  3:22 UTC
  To: Alan Stern
  Cc: Dima Tisnek, Alexander Viro, Jens Axboe, USB list, linux-fsdevel,
	Kernel development list

On Thu, Jun 21, 2012 at 10:25:02AM -0400, Alan Stern wrote:
> On Thu, 21 Jun 2012, Dave Chinner wrote:
> 
> > > > As it is, I think that invalidate_partition() is doing something
> > > > somewhat insane for a block device that has been removed - you can't
> > > > write to it so fsync_bdev() is useless.
> > > 
> > > That depends.  If by "removed" you mean physically disconnected from
> > > the computer, then yes.  But if "removed" means merely unregistered
> > > from the device core then writes can still succeed.  
> > > invalidate_partition() doesn't know which has happened.
> > 
> > Which means the lower layers probably need to pass that distinction
> > up to the invalidation function.
> 
> I don't think that information is passed anywhere in the kernel.  And 
> in any case, it's not really important.  When a device is unregistered, 
> the upper layers shouldn't care about the reason why.

Then why have filesystem developers been asking for notifications
from the block layer that the device has been disconnected for the
past couple of LSF summits? :)

Because we'd much prefer to know that part of the filesystem has
just disappeared and can't be used, rather than get back errors
every time we try to send an IO to that region of the filesystem.
IO errors can be transient - disconnected block devices are not -
and so being able to tell the difference is important to handling
storage errors in a robust manner.

Think about BTRFS - knowing that a leg of an internal mirror has
been pulled out means it can select the other leg for all its
metadata IO rather than just getting IO errors to it, and that it
can perhaps allocate a region on another device to mirror all new
metadata and avoid the problem altogether.

IOWs, there's plenty of good reasons for knowing that a device has
been disconnected at the higher layers of the storage stack....
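
To make the mirroring example concrete, here is a minimal userspace
sketch of the decision a filesystem could make if it were told about a
disconnect. This is not Btrfs or kernel code: struct mirror_leg,
pick_leg() and the "disconnected" flag are hypothetical names, and the
notification that would set the flag is assumed rather than an existing
interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-leg state; not a kernel or Btrfs structure. */
struct mirror_leg {
	bool disconnected;      /* permanent: set by an assumed removal notification */
	unsigned int io_errors; /* possibly transient: counted per failed I/O */
};

/*
 * Return the index of the leg to use for metadata I/O, or -1 if every
 * leg is gone.  A disconnected leg is skipped outright; among the
 * remaining legs the one with the fewest recorded errors wins, and the
 * lowest index wins a tie.
 */
int pick_leg(const struct mirror_leg *legs, size_t nr_legs)
{
	int best = -1;

	assert(legs != NULL || nr_legs == 0);

	for (size_t i = 0; i < nr_legs; i++) {
		if (legs[i].disconnected)
			continue;
		if (best < 0 || legs[i].io_errors < legs[best].io_errors)
			best = (int)i;
	}
	return best;
}
```

Without the flag, the only input is io_errors, so a leg that has been
pulled looks the same as a leg that hit a few transient errors until
enough failures accumulate.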

> > > > And another question - why doesn't having an active filesystem on a
> > > > block device (i.e. an active reference to the gendisk) prevent the
> > > > block device from being removed from underneath it?
> > > 
> > > References prevent data structures from being deallocated, not from 
> > > being unregistered (or as James Bottomley likes to call it, "removed 
> > > from visibility").
> > 
> > Except the unregister path appears to assume that a valid block
> > device available when it is unregistered.
> 
> It may very well be available during the unregistration procedure.  
> There's nothing wrong with assuming it is -- if it isn't, I/O attempts 
> will simply fail.

It's clear that it isn't available, and you're assuming that IO
attempts are possible and that they will fail. If that assumption
was always valid, then we wouldn't have got this bug report....

> > That seems to me like
> > there is a bad assumption being made in this error handling path...
> 
> No; a bad assumption would be if the code assumed the device was 
> available _after_ the unregistration call had completed.

It's known to be unavailable *during* the unregistration call, and
that code is assuming it is available.  When a device is forcibly
unplugged from underneath an active filesystem, there is no guarantee
that it can extract itself from the mess that this leaves behind,
and assuming that it can is just wrong...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: mount stuck, khubd blocked
  2012-06-22  3:22           ` Dave Chinner
@ 2012-06-22 14:32             ` Alan Stern
  0 siblings, 0 replies; 14+ messages in thread
From: Alan Stern @ 2012-06-22 14:32 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Dima Tisnek, Alexander Viro, Jens Axboe, USB list, linux-fsdevel,
	Kernel development list

On Fri, 22 Jun 2012, Dave Chinner wrote:

> On Thu, Jun 21, 2012 at 10:25:02AM -0400, Alan Stern wrote:
> > On Thu, 21 Jun 2012, Dave Chinner wrote:
> > 
> > > > > As it is, I think that invalidate_partition() is doing something
> > > > > somewhat insane for a block device that has been removed - you can't
> > > > > write to it so fsync_bdev() is useless.
> > > > 
> > > > That depends.  If by "removed" you mean physically disconnected from
> > > > the computer, then yes.  But if "removed" means merely unregistered
> > > > from the device core then writes can still succeed.  
> > > > invalidate_partition() doesn't know which has happened.
> > > 
> > > Which means the lower layers probably need to pass that distinction
> > > up to the invalidation function.
> > 
> > I don't think that information is passed anywhere in the kernel.  And 
> > in any case, it's not really important.  When a device is unregistered, 
> > the upper layers shouldn't care about the reason why.
> 
> Then why have filesystem developers been asking for notifications
> from the block layer that the device has been disconnected for the
> past couple of LSF summits? :)

I don't know -- I don't attend LSF summits (and I can't read the 
filesystem developers' minds).  :-)

Still, I have nothing _against_ such notifications.  I'm just saying 
that things should work properly even in their absence.

> Because we'd much prefer to know that part of the filesystem has
> just disappeared and can't be used, rather than get back errors
> every time we try to send an IO to that region of the filesystem.
> IO errors can be transient - disconnected block devices are not -
> and so being able to tell the difference is important to handling
> storage errors in a robust manner.
> 
> Think about BTRFS - knowing that a leg of an internal mirror has
> been pulled out means it can select the other leg for all its
> metadata IO rather than just getting IO errors to it, and that it
> can perhaps allocate a region on another device to mirror all new
> metadata and avoid the problem altogether.
> 
> IOWs, there's plenty of good reasons for knowing that a device has
> been disconnected at the higher layers of the storage stack....

There was a discussion about this about half a year ago (although from 
a somewhat different point of view):

	http://marc.info/?t=132577666300004&r=1&w=2

Ted Ts'o took your position and Tejun Heo took mine.  But nobody 
mentioned the mirroring example, or even anything like it.

> > > Except the unregister path appears to assume that a valid block
> > > device available when it is unregistered.
> > 
> > It may very well be available during the unregistration procedure.  
> > There's nothing wrong with assuming it is -- if it isn't, I/O attempts 
> > will simply fail.
> 
> It's clear that it isn't available, and you're assuming that IO
> attempts are possible and that they will fail. If that assumption
> was always valid, then we wouldn't have got this bug report....

Not true.  This particular bug has nothing to do with device removal.  
It was caused by mount getting trapped in a loop (presumably while 
holding a lock).

> > No; a bad assumption would be if the code assumed the device was 
> > available _after_ the unregistration call had completed.
> 
> It's known to be unavailable *during* the unregistration call, and
> that code is assuming it is available.  When a device is forcibly
> unplugged from underneath an active filesystem, there is no guarantee
> that it can extract itself from the mess that this leaves behind,
> and assuming that it can is just wrong...

Filesystems _have_ to be able to extricate themselves from this sort 
of mess.  If they can't then they are broken, period.  See Greg KH's 
comment in the thread mentioned above.

Alan Stern


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: mount stuck, khubd blocked
  2012-06-20 18:47   ` Jeff Moyer
@ 2012-07-23 19:07     ` Alan Stern
  2012-07-23 19:22       ` Jeff Moyer
  0 siblings, 1 reply; 14+ messages in thread
From: Alan Stern @ 2012-07-23 19:07 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Dima Tisnek, Alexander Viro, Jens Axboe, amethyst623, USB list,
	linux-fsdevel, Kernel development list

On Wed, 20 Jun 2012, Jeff Moyer wrote:

> Alan Stern <stern@rowland.harvard.edu> writes:
> 
> > On Tue, 19 Jun 2012, Dima Tisnek wrote:
> >
> >> I made a microsd flash with 2 partitions, sdb1 is data partition, and
> >> sdb2 is a sentinel partition, 1 block in size.
> >> 
> >> I attached the usb-microsd reader with that card in it and by mistake
> >> tried to mount the sentinel partition, I ran:
> >> mount /dev/sdb2 /mnt/flash/
> >> 
> >> mount got stuck, I was not able to kill or strace it, I pulled the usb
> >> reader from the port, mount was still stuck, here's the dmesg log:
> 
> Hi, Dima,
> 
> Could you try the following patch?
> 
> Thanks,
> Jeff
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 838a9cf..769b30b 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -930,7 +930,7 @@ init_page_buffers(struct page *page, struct block_device *bdev,
>  			bh->b_blocknr = block;
>  			if (uptodate)
>  				set_buffer_uptodate(bh);
> -			if (block < end_block)
> +			if (block <= end_block)
>  				set_buffer_mapped(bh);
>  		}
>  		block++;

Jeff, does this also fix Bugzilla #43269?

Alan Stern


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: mount stuck, khubd blocked
  2012-07-23 19:07     ` Alan Stern
@ 2012-07-23 19:22       ` Jeff Moyer
  2012-07-23 19:57           ` Alan Stern
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Moyer @ 2012-07-23 19:22 UTC (permalink / raw)
  To: Alan Stern
  Cc: Dima Tisnek, Alexander Viro, Jens Axboe, amethyst623, USB list,
	linux-fsdevel, Kernel development list

Alan Stern <stern@rowland.harvard.edu> writes:

> On Wed, 20 Jun 2012, Jeff Moyer wrote:
>
>> Alan Stern <stern@rowland.harvard.edu> writes:
>> 
>> > On Tue, 19 Jun 2012, Dima Tisnek wrote:
>> >
>> >> I made a microsd flash with 2 partitions, sdb1 is data partition, and
>> >> sdb2 is a sentinel partition, 1 block in size.
>> >> 
>> >> I attached the usb-microsd reader with that card in it and by mistake
>> >> tried to mount the sentinel partition, I ran:
>> >> mount /dev/sdb2 /mnt/flash/
>> >> 
>> >> mount got stuck, I was not able to kill or strace it, I pulled the usb
>> >> reader from the port, mount was still stuck, here's the dmesg log:
>> 
>> Hi, Dima,
>> 
>> Could you try the following patch?
>> 
>> Thanks,
>> Jeff
>> 
>> diff --git a/fs/buffer.c b/fs/buffer.c
>> index 838a9cf..769b30b 100644
>> --- a/fs/buffer.c
>> +++ b/fs/buffer.c
>> @@ -930,7 +930,7 @@ init_page_buffers(struct page *page, struct block_device *bdev,
>>  			bh->b_blocknr = block;
>>  			if (uptodate)
>>  				set_buffer_uptodate(bh);
>> -			if (block < end_block)
>> +			if (block <= end_block)
>>  				set_buffer_mapped(bh);
>>  		}
>>  		block++;
>
> Jeff, does this also fix Bugzilla #43269?

First, this patch is wrong.  I posted another version later on that got
merged for 3.5.
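
One way to see the problem with the "<=" comparison, as a minimal
userspace sketch and not the kernel code: it assumes end_block is an
exclusive bound, that is, the number of whole blocks that fit on the
device.  The helper names and the sizes mentioned below are
illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Number of whole blocks of size (1 << blkbits) that fit in dev_bytes.
 * Assumption: this plays the role of end_block, an exclusive bound.
 */
uint64_t device_block_count(uint64_t dev_bytes, unsigned int blkbits)
{
	assert(blkbits < 64);
	return dev_bytes >> blkbits;
}

/* Valid block numbers run from 0 to end_block - 1, hence "<". */
bool block_in_range(uint64_t block, uint64_t end_block)
{
	return block < end_block;
}
```

Under that assumption a 512-byte device with 1024-byte blocks has a
block count of 0, so no block is in range at all.  Comparing with "<="
would accept block 0 there, which lies past the end of the device.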

As for bug 43269, it does not look like the same symptoms, so I would
not expect the patches I posted to resolve that issue.  Sorry.

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: mount stuck, khubd blocked
@ 2012-07-23 19:57           ` Alan Stern
  0 siblings, 0 replies; 14+ messages in thread
From: Alan Stern @ 2012-07-23 19:57 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Dima Tisnek, Alexander Viro, Jens Axboe, amethyst623, USB list,
	linux-fsdevel, Kernel development list

On Mon, 23 Jul 2012, Jeff Moyer wrote:

> > Jeff, does this also fix Bugzilla #43269?
> 
> First, this patch is wrong.  I posted another version later on that got
> merged for 3.5.
> 
> As for bug 43269, it does not look like the same symptoms, so I would
> not expect the patches I posted to resolve that issue.  Sorry.

Can you suggest someone who might be able to help with #43269?  It has 
been languishing since May.

Alan Stern


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: mount stuck, khubd blocked
  2012-07-23 19:57           ` Alan Stern
  (?)
@ 2012-07-23 20:19           ` Jeff Moyer
  -1 siblings, 0 replies; 14+ messages in thread
From: Jeff Moyer @ 2012-07-23 20:19 UTC (permalink / raw)
  To: Alan Stern
  Cc: Dima Tisnek, Alexander Viro, Jens Axboe, amethyst623, USB list,
	linux-fsdevel, Kernel development list, James Bottomley

Alan Stern <stern@rowland.harvard.edu> writes:

> On Mon, 23 Jul 2012, Jeff Moyer wrote:
>
>> > Jeff, does this also fix Bugzilla #43269?
>> 
>> First, this patch is wrong.  I posted another version later on that got
>> merged for 3.5.
>> 
>> As for bug 43269, it does not look like the same symptoms, so I would
>> not expect the patches I posted to resolve that issue.  Sorry.
>
> Can you suggest someone who might be able to help with #43269?  It has 
> been languishing since May.

Well, either Jens or James may be able to make more progress faster than
I could.  I'm a little concerned that the reporter has unreasonable
expectations for his use of library calls, but I suppose that's a
separate issue.

James, Jens:
  https://bugzilla.kernel.org/show_bug.cgi?id=43269

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2012-07-23 20:19 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAGGBzX+uGFOXb0WMD-bkDSL8nq=rD4ZFiV=xwAJyqNx=rZ20sw@mail.gmail.com>
2012-06-19 14:45 ` mount stuck, khubd blocked Alan Stern
2012-06-19 21:41   ` Dave Chinner
2012-06-20 14:31     ` Alan Stern
2012-06-21  1:34       ` Dave Chinner
2012-06-21  1:34         ` Dave Chinner
2012-06-21 14:25         ` Alan Stern
2012-06-22  3:22           ` Dave Chinner
2012-06-22 14:32             ` Alan Stern
2012-06-20 18:47   ` Jeff Moyer
2012-07-23 19:07     ` Alan Stern
2012-07-23 19:22       ` Jeff Moyer
2012-07-23 19:57         ` Alan Stern
2012-07-23 19:57           ` Alan Stern
2012-07-23 20:19           ` Jeff Moyer
