linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: INFO: task hung in __get_super
       [not found] ` <201905150102.x4F12b6o009249@www262.sakura.ne.jp>
@ 2019-05-15 10:21   ` Jan Kara
  2019-05-15 11:32     ` Tetsuo Handa
  2019-05-15 11:59     ` syzbot
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Kara @ 2019-05-15 10:21 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jan Kara, Jens Axboe, Alexander Viro, syzbot, dvyukov,
	linux-fsdevel, linux-kernel, syzkaller-bugs, linux-block

[-- Attachment #1: Type: text/plain, Size: 3813 bytes --]

On Wed 15-05-19 10:02:37, Tetsuo Handa wrote:
> Since lo_ioctl() is calling sb_set_blocksize() immediately after udf_load_vrs()
> called sb_set_blocksize(), udf_tread() can't use expected i_blkbits settings...

Thanks for debugging this but this doesn't quiet make sense to me. See
below:

> [   48.685672][ T7322] fs/block_dev.c:135 bdev=0000000014fa0ec2 12 -> 9
> [   48.694675][ T7322] CPU: 4 PID: 7322 Comm: a.out Not tainted 5.1.0+ #196
> [   48.701321][ T7322] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
> [   48.710265][ T7322] Call Trace:
> [   48.710272][ T7322]  dump_stack+0xaa/0xd8
> [   48.715633][ T7322]  set_blocksize+0xff/0x140
> [   48.822094][ T7322]  sb_set_blocksize+0x27/0x70
> [   48.824843][ T7322]  udf_load_vrs+0x4b/0x500
> [   48.827322][ T7322]  udf_fill_super+0x32e/0x890
> [   48.830125][ T7322]  ? snprintf+0x66/0x90
> [   48.832572][ T7322]  mount_bdev+0x1c7/0x210
> [   48.835293][ T7322]  ? udf_load_vrs+0x500/0x500
> [   48.838009][ T7322]  udf_mount+0x34/0x40
> [   48.840504][ T7322]  legacy_get_tree+0x2d/0x80
> [   48.843192][ T7322]  vfs_get_tree+0x30/0x140
> [   48.845787][ T7322]  do_mount+0x830/0xc30
> [   48.848325][ T7322]  ? copy_mount_options+0x152/0x1c0
> [   48.851066][ T7322]  ksys_mount+0xab/0x120
> [   48.853627][ T7322]  __x64_sys_mount+0x26/0x30
> [   48.856168][ T7322]  do_syscall_64+0x7c/0x1a0
> [   48.858943][ T7322]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

So this is normal - UDF sets block size it wants on the device during
mount. Now we have the block device exclusively open so nobody should be
changing it.

> [   48.978376][ T7332] fs/block_dev.c:135 bdev=0000000014fa0ec2 9 -> 12
> [   49.079394][ T7332] CPU: 6 PID: 7332 Comm: a.out Not tainted 5.1.0+ #196
> [   49.082769][ T7332] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
> [   49.089007][ T7332] Call Trace:
> [   49.091410][ T7332]  dump_stack+0xaa/0xd8
> [   49.094053][ T7332]  set_blocksize+0xff/0x140
> [   49.096734][ T7332]  lo_ioctl+0x570/0xc60
> [   49.099366][ T7332]  ? loop_queue_work+0xdb0/0xdb0
> [   49.102079][ T7332]  blkdev_ioctl+0xb69/0xc10
> [   49.104667][ T7332]  block_ioctl+0x56/0x70
> [   49.107267][ T7332]  ? blkdev_fallocate+0x230/0x230
> [   49.110035][ T7332]  do_vfs_ioctl+0xc1/0x7e0
> [   49.112728][ T7332]  ? tomoyo_file_ioctl+0x23/0x30
> [   49.115452][ T7332]  ksys_ioctl+0x94/0xb0
> [   49.118008][ T7332]  __x64_sys_ioctl+0x1e/0x30
> [   49.120686][ T7332]  do_syscall_64+0x7c/0x1a0
> [   49.123470][ T7332]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

And this is strange. set_blocksize() is only called from loop_set_fd() but
that means that the loop device must already be in lo->lo_state ==
Lo_unbound. But loop device that is being used should never be in
Lo_unbound state... Except if... Oh, I see what the problem is:

UDF opens unbound loop device (through mount_bdev() ->
blkdev_get_by_path()). That succeeds as loop allows opens on unbound
devices so that ioctl can be run to set it up. UDF sets block size for the
block device. Someone else comes and calls LOOP_SET_FD for the device and
plop, block device block size changes under UDF's hands.

The question is how to fix this problem. The simplest fix I can see is that
we'd just refuse to do LOOP_SET_FD if someone has the block device
exclusively open as there are high chances such user will be unpleasantly
surprised by the device changing under him. OTOH this has some potential
for userspace visible regressions. But I guess it's worth a try. Something
like attached patch?

Let syzbot test the patch as well:

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.1

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-loop-Don-t-change-loop-device-under-exclusive-opener.patch --]
[-- Type: text/x-patch, Size: 1703 bytes --]

From 0145358ae24581b7af36261caee0c6dbe22cce0c Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 15 May 2019 11:45:10 +0200
Subject: [PATCH] loop: Don't change loop device under exclusive opener

Loop module allows calling LOOP_SET_FD while there are other openers of
the loop device. Even exclusive ones. This can lead to weird
consequences such as kernel deadlocks like:

mount_bdev()				lo_ioctl()
  udf_fill_super()
    udf_load_vrs()
      sb_set_blocksize() - sets desired block size B
      udf_tread()
        sb_bread()
          __bread_gfp(bdev, block, B)
					  loop_set_fd()
					    set_blocksize()
            - now __getblk_slow() indefinitely loops because B != bdev
              block size

Fix the problem by disallowing LOOP_SET_FD ioctl when there are
exclusive openers of a loop device.

[Deliberately chosen not to CC stable as a user with priviledges to
trigger this race has other means of taking the system down and this
has a potential of breaking some weird userspace setup]

Reported-by: syzbot+10007d66ca02b08f0e60@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/block/loop.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 102d79575895..9dcf8bb60c4e 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -952,6 +952,9 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	error = -EBUSY;
 	if (lo->lo_state != Lo_unbound)
 		goto out_unlock;
+	/* Avoid changing loop device under an exclusive opener... */
+	if (!(mode & FMODE_EXCL) && bdev->bd_holders > 0)
+		goto out_unlock;
 
 	error = loop_validate_file(file, bdev);
 	if (error)
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: INFO: task hung in __get_super
  2019-05-15 10:21   ` INFO: task hung in __get_super Jan Kara
@ 2019-05-15 11:32     ` Tetsuo Handa
  2019-05-15 13:07       ` Jan Kara
  2019-05-15 11:59     ` syzbot
  1 sibling, 1 reply; 9+ messages in thread
From: Tetsuo Handa @ 2019-05-15 11:32 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, Alexander Viro, syzbot, dvyukov, linux-fsdevel,
	linux-kernel, syzkaller-bugs, linux-block

On 2019/05/15 19:21, Jan Kara wrote:
> The question is how to fix this problem. The simplest fix I can see is that
> we'd just refuse to do LOOP_SET_FD if someone has the block device
> exclusively open as there are high chances such user will be unpleasantly
> surprised by the device changing under him. OTOH this has some potential
> for userspace visible regressions. But I guess it's worth a try. Something
> like attached patch?

(1) If I understand correctly, FMODE_EXCL is set at blkdev_open() only if O_EXCL
    is specified. How can we detect if O_EXCL was not used, for the reproducer
    ( https://syzkaller.appspot.com/text?tag=ReproC&x=135385a8a00000 ) is not
    using O_EXCL ?

(2) There seems to be no serialization. What guarantees that mount_bdev()
    does not start due to preempted after the check added by this patch?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: INFO: task hung in __get_super
  2019-05-15 10:21   ` INFO: task hung in __get_super Jan Kara
  2019-05-15 11:32     ` Tetsuo Handa
@ 2019-05-15 11:59     ` syzbot
  1 sibling, 0 replies; 9+ messages in thread
From: syzbot @ 2019-05-15 11:59 UTC (permalink / raw)
  To: axboe, dvyukov, jack, linux-block, linux-fsdevel, linux-kernel,
	penguin-kernel, syzkaller-bugs, viro

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger  
crash:

Reported-and-tested-by:  
syzbot+10007d66ca02b08f0e60@syzkaller.appspotmail.com

Tested on:

commit:         e93c9c99 Linux 5.1
git tree:        
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.1
kernel config:  https://syzkaller.appspot.com/x/.config?x=5edd1df52e9bc982
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
patch:          https://syzkaller.appspot.com/x/patch.diff?x=133626d8a00000

Note: testing is done by a robot and is best-effort only.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: INFO: task hung in __get_super
  2019-05-15 11:32     ` Tetsuo Handa
@ 2019-05-15 13:07       ` Jan Kara
  2019-05-16 11:48         ` Jan Kara
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2019-05-15 13:07 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jan Kara, Jens Axboe, Alexander Viro, syzbot, dvyukov,
	linux-fsdevel, linux-kernel, syzkaller-bugs, linux-block

On Wed 15-05-19 20:32:27, Tetsuo Handa wrote:
> On 2019/05/15 19:21, Jan Kara wrote:
> > The question is how to fix this problem. The simplest fix I can see is that
> > we'd just refuse to do LOOP_SET_FD if someone has the block device
> > exclusively open as there are high chances such user will be unpleasantly
> > surprised by the device changing under him. OTOH this has some potential
> > for userspace visible regressions. But I guess it's worth a try. Something
> > like attached patch?
> 
> (1) If I understand correctly, FMODE_EXCL is set at blkdev_open() only if
> O_EXCL is specified.

Yes.

> How can we detect if O_EXCL was not used, for the reproducer (
> https://syzkaller.appspot.com/text?tag=ReproC&x=135385a8a00000 ) is not
> using O_EXCL ?

mount_bdev() is using O_EXCL and that's what matters.

> (2) There seems to be no serialization. What guarantees that mount_bdev()
>     does not start due to preempted after the check added by this patch?

That's a good question. lo_ctl_mutex actually synchronizes most of this
(taken in both loop_set_fd() and lo_open()) but you're right that there's
still a small race window where loop_set_fd() need not see bdev->bd_holders
elevated while blkdev_get() will succeed. So I need to think a bit more
about proper synchronization of this.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: INFO: task hung in __get_super
  2019-05-15 13:07       ` Jan Kara
@ 2019-05-16 11:48         ` Jan Kara
  2019-05-16 12:17           ` Tetsuo Handa
  2019-05-16 12:33           ` syzbot
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Kara @ 2019-05-16 11:48 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jan Kara, Jens Axboe, Alexander Viro, syzbot, dvyukov,
	linux-fsdevel, linux-kernel, syzkaller-bugs, linux-block

[-- Attachment #1: Type: text/plain, Size: 1801 bytes --]

On Wed 15-05-19 15:07:30, Jan Kara wrote:
> On Wed 15-05-19 20:32:27, Tetsuo Handa wrote:
> > On 2019/05/15 19:21, Jan Kara wrote:
> > > The question is how to fix this problem. The simplest fix I can see is that
> > > we'd just refuse to do LOOP_SET_FD if someone has the block device
> > > exclusively open as there are high chances such user will be unpleasantly
> > > surprised by the device changing under him. OTOH this has some potential
> > > for userspace visible regressions. But I guess it's worth a try. Something
> > > like attached patch?
> > 
> > (1) If I understand correctly, FMODE_EXCL is set at blkdev_open() only if
> > O_EXCL is specified.
> 
> Yes.
> 
> > How can we detect if O_EXCL was not used, for the reproducer (
> > https://syzkaller.appspot.com/text?tag=ReproC&x=135385a8a00000 ) is not
> > using O_EXCL ?
> 
> mount_bdev() is using O_EXCL and that's what matters.
> 
> > (2) There seems to be no serialization. What guarantees that mount_bdev()
> >     does not start due to preempted after the check added by this patch?
> 
> That's a good question. lo_ctl_mutex actually synchronizes most of this
> (taken in both loop_set_fd() and lo_open()) but you're right that there's
> still a small race window where loop_set_fd() need not see bdev->bd_holders
> elevated while blkdev_get() will succeed. So I need to think a bit more
> about proper synchronization of this.

OK, so non-racy fix was a bit more involved and I've ended up just
upgrading the file reference to an exclusive one in loop_set_fd() instead
of trying to hand-craft some locking solution. The result is attached and
it passes blktests.

Let syzbot also test it:

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.1

									Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-loop-Don-t-change-loop-device-under-exclusive-opener.patch --]
[-- Type: text/x-patch, Size: 2344 bytes --]

From e7a35f48a902b42894eba8cc4201ca745bfd5dfe Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 15 May 2019 11:45:10 +0200
Subject: [PATCH] loop: Don't change loop device under exclusive opener

Loop module allows calling LOOP_SET_FD while there are other openers of
the loop device. Even exclusive ones. This can lead to weird
consequences such as kernel deadlocks like:

mount_bdev()				lo_ioctl()
  udf_fill_super()
    udf_load_vrs()
      sb_set_blocksize() - sets desired block size B
      udf_tread()
        sb_bread()
          __bread_gfp(bdev, block, B)
					  loop_set_fd()
					    set_blocksize()
            - now __getblk_slow() indefinitely loops because B != bdev
              block size

Fix the problem by disallowing LOOP_SET_FD ioctl when there are
exclusive openers of a loop device.

[Deliberately chosen not to CC stable as a user with priviledges to
trigger this race has other means of taking the system down and this
has a potential of breaking some weird userspace setup]

Reported-by: syzbot+10007d66ca02b08f0e60@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/block/loop.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 102d79575895..f11b7dc16e9d 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -945,9 +945,20 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	if (!file)
 		goto out;
 
+	/*
+	 * If we don't hold exclusive handle for the device, upgrade to it
+	 * here to avoid changing device under exclusive owner.
+	 */
+	if (!(mode & FMODE_EXCL)) {
+		bdgrab(bdev);
+		error = blkdev_get(bdev, mode | FMODE_EXCL, loop_set_fd);
+		if (error)
+			goto out_putf;
+	}
+
 	error = mutex_lock_killable(&loop_ctl_mutex);
 	if (error)
-		goto out_putf;
+		goto out_bdev;
 
 	error = -EBUSY;
 	if (lo->lo_state != Lo_unbound)
@@ -1012,10 +1023,15 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	mutex_unlock(&loop_ctl_mutex);
 	if (partscan)
 		loop_reread_partitions(lo, bdev);
+	if (!(mode & FMODE_EXCL))
+		blkdev_put(bdev, mode | FMODE_EXCL);
 	return 0;
 
 out_unlock:
 	mutex_unlock(&loop_ctl_mutex);
+out_bdev:
+	if (!(mode & FMODE_EXCL))
+		blkdev_put(bdev, mode | FMODE_EXCL);
 out_putf:
 	fput(file);
 out:
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: INFO: task hung in __get_super
  2019-05-16 11:48         ` Jan Kara
@ 2019-05-16 12:17           ` Tetsuo Handa
  2019-05-16 12:32             ` Jan Kara
  2019-05-16 12:33           ` syzbot
  1 sibling, 1 reply; 9+ messages in thread
From: Tetsuo Handa @ 2019-05-16 12:17 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, Alexander Viro, syzbot, dvyukov, linux-fsdevel,
	linux-kernel, syzkaller-bugs, linux-block

On 2019/05/16 20:48, Jan Kara wrote:
> OK, so non-racy fix was a bit more involved and I've ended up just
> upgrading the file reference to an exclusive one in loop_set_fd() instead
> of trying to hand-craft some locking solution. The result is attached and
> it passes blktests.

blkdev_get() has corresponding blkdev_put().
bdgrab() does not have corresponding bdput() ?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: INFO: task hung in __get_super
  2019-05-16 12:17           ` Tetsuo Handa
@ 2019-05-16 12:32             ` Jan Kara
  2019-05-16 12:50               ` Tetsuo Handa
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2019-05-16 12:32 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jan Kara, Jens Axboe, Alexander Viro, syzbot, dvyukov,
	linux-fsdevel, linux-kernel, syzkaller-bugs, linux-block

On Thu 16-05-19 21:17:14, Tetsuo Handa wrote:
> On 2019/05/16 20:48, Jan Kara wrote:
> > OK, so non-racy fix was a bit more involved and I've ended up just
> > upgrading the file reference to an exclusive one in loop_set_fd() instead
> > of trying to hand-craft some locking solution. The result is attached and
> > it passes blktests.
> 
> blkdev_get() has corresponding blkdev_put().
> bdgrab() does not have corresponding bdput() ?

Yes, and that's hidden inside blkdev_put() (or failing blkdev_get()). Don't
get me started on calling conventions of these functions... I've wasted half
an hour trying to figure out where I'm leaking inode references in my patch
;).

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: INFO: task hung in __get_super
  2019-05-16 11:48         ` Jan Kara
  2019-05-16 12:17           ` Tetsuo Handa
@ 2019-05-16 12:33           ` syzbot
  1 sibling, 0 replies; 9+ messages in thread
From: syzbot @ 2019-05-16 12:33 UTC (permalink / raw)
  To: axboe, dvyukov, jack, linux-block, linux-fsdevel, linux-kernel,
	penguin-kernel, syzkaller-bugs, viro

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger  
crash:

Reported-and-tested-by:  
syzbot+10007d66ca02b08f0e60@syzkaller.appspotmail.com

Tested on:

commit:         e93c9c99 Linux 5.1
git tree:        
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.1
kernel config:  https://syzkaller.appspot.com/x/.config?x=5edd1df52e9bc982
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
patch:          https://syzkaller.appspot.com/x/patch.diff?x=135c5b02a00000

Note: testing is done by a robot and is best-effort only.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: INFO: task hung in __get_super
  2019-05-16 12:32             ` Jan Kara
@ 2019-05-16 12:50               ` Tetsuo Handa
  0 siblings, 0 replies; 9+ messages in thread
From: Tetsuo Handa @ 2019-05-16 12:50 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, Alexander Viro, syzbot, dvyukov, linux-fsdevel,
	linux-kernel, syzkaller-bugs, linux-block

On 2019/05/16 21:32, Jan Kara wrote:
> On Thu 16-05-19 21:17:14, Tetsuo Handa wrote:
>> On 2019/05/16 20:48, Jan Kara wrote:
>>> OK, so non-racy fix was a bit more involved and I've ended up just
>>> upgrading the file reference to an exclusive one in loop_set_fd() instead
>>> of trying to hand-craft some locking solution. The result is attached and
>>> it passes blktests.
>>
>> blkdev_get() has corresponding blkdev_put().
>> bdgrab() does not have corresponding bdput() ?
> 
> Yes, and that's hidden inside blkdev_put() (or failing blkdev_get()). Don't
> get me started on calling conventions of these functions... I've wasted half
> an hour trying to figure out where I'm leaking inode references in my patch
> ;).

Ah, found tricky comment. Please apply the patch. Thank you.

/**
 * blkdev_get - open a block device
 * @bdev: block_device to open
 * @mode: FMODE_* mask
 * @holder: exclusive holder identifier
 *
 * Open @bdev with @mode.  If @mode includes %FMODE_EXCL, @bdev is
 * open with exclusive access.  Specifying %FMODE_EXCL with %NULL
 * @holder is invalid.  Exclusive opens may nest for the same @holder.
 *
 * On success, the reference count of @bdev is unchanged.  On failure,
 * @bdev is put.
 *
 * CONTEXT:
 * Might sleep.
 *
 * RETURNS:
 * 0 on success, -errno on failure.
 */

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-05-16 12:51 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <0000000000002cd22305879b22c4@google.com>
     [not found] ` <201905150102.x4F12b6o009249@www262.sakura.ne.jp>
2019-05-15 10:21   ` INFO: task hung in __get_super Jan Kara
2019-05-15 11:32     ` Tetsuo Handa
2019-05-15 13:07       ` Jan Kara
2019-05-16 11:48         ` Jan Kara
2019-05-16 12:17           ` Tetsuo Handa
2019-05-16 12:32             ` Jan Kara
2019-05-16 12:50               ` Tetsuo Handa
2019-05-16 12:33           ` syzbot
2019-05-15 11:59     ` syzbot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).