Linux-Block Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] loop: Don't change loop device under exclusive opener
@ 2019-05-16 14:01 Jan Kara
  2019-05-16 20:44 ` Jens Axboe
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2019-05-16 14:01 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Tetsuo Handa, linux-block, Jan Kara

Loop module allows calling LOOP_SET_FD while there are other openers of
the loop device. Even exclusive ones. This can lead to weird
consequences such as kernel deadlocks like:

mount_bdev()				lo_ioctl()
  udf_fill_super()
    udf_load_vrs()
      sb_set_blocksize() - sets desired block size B
      udf_tread()
        sb_bread()
          __bread_gfp(bdev, block, B)
					  loop_set_fd()
					    set_blocksize()
            - now __getblk_slow() indefinitely loops because B != bdev
              block size

Fix the problem by disallowing LOOP_SET_FD ioctl when there are
exclusive openers of a loop device.

[Deliberately chosen not to CC stable as a user with priviledges to
trigger this race has other means of taking the system down and this
has a potential of breaking some weird userspace setup]

Reported-and-tested-by: syzbot+10007d66ca02b08f0e60@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/block/loop.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

Hi Jens!

What do you think about this patch? It fixes the problem but it also changes
user visible behavior so there are chances it breaks some existing setup
(although I have hard time coming up with a realistic scenario where it would
matter).

Alternatively we could change getblk() code handle changing block size. That
would fix the particular issue syzkaller found as well but I'm not sure what
else is broken when block device changes while fs driver is working with it.

Honza

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 102d79575895..f11b7dc16e9d 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -945,9 +945,20 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	if (!file)
 		goto out;
 
+	/*
+	 * If we don't hold exclusive handle for the device, upgrade to it
+	 * here to avoid changing device under exclusive owner.
+	 */
+	if (!(mode & FMODE_EXCL)) {
+		bdgrab(bdev);
+		error = blkdev_get(bdev, mode | FMODE_EXCL, loop_set_fd);
+		if (error)
+			goto out_putf;
+	}
+
 	error = mutex_lock_killable(&loop_ctl_mutex);
 	if (error)
-		goto out_putf;
+		goto out_bdev;
 
 	error = -EBUSY;
 	if (lo->lo_state != Lo_unbound)
@@ -1012,10 +1023,15 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	mutex_unlock(&loop_ctl_mutex);
 	if (partscan)
 		loop_reread_partitions(lo, bdev);
+	if (!(mode & FMODE_EXCL))
+		blkdev_put(bdev, mode | FMODE_EXCL);
 	return 0;
 
 out_unlock:
 	mutex_unlock(&loop_ctl_mutex);
+out_bdev:
+	if (!(mode & FMODE_EXCL))
+		blkdev_put(bdev, mode | FMODE_EXCL);
 out_putf:
 	fput(file);
 out:
-- 
2.16.4


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] loop: Don't change loop device under exclusive opener
  2019-05-16 14:01 [PATCH] loop: Don't change loop device under exclusive opener Jan Kara
@ 2019-05-16 20:44 ` Jens Axboe
  2019-05-27 12:29   ` Jan Kara
  0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2019-05-16 20:44 UTC (permalink / raw)
  To: Jan Kara; +Cc: Tetsuo Handa, linux-block

On 5/16/19 8:01 AM, Jan Kara wrote:
> Loop module allows calling LOOP_SET_FD while there are other openers of
> the loop device. Even exclusive ones. This can lead to weird
> consequences such as kernel deadlocks like:
> 
> mount_bdev()				lo_ioctl()
>   udf_fill_super()
>     udf_load_vrs()
>       sb_set_blocksize() - sets desired block size B
>       udf_tread()
>         sb_bread()
>           __bread_gfp(bdev, block, B)
> 					  loop_set_fd()
> 					    set_blocksize()
>             - now __getblk_slow() indefinitely loops because B != bdev
>               block size
> 
> Fix the problem by disallowing LOOP_SET_FD ioctl when there are
> exclusive openers of a loop device.
> 
> [Deliberately chosen not to CC stable as a user with priviledges to
> trigger this race has other means of taking the system down and this
> has a potential of breaking some weird userspace setup]
> 
> Reported-and-tested-by: syzbot+10007d66ca02b08f0e60@syzkaller.appspotmail.com
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  drivers/block/loop.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> Hi Jens!
> 
> What do you think about this patch? It fixes the problem but it also
> changes user visible behavior so there are chances it breaks some
> existing setup (although I have hard time coming up with a realistic
> scenario where it would matter).

I also have a hard time thinking about valid cases where this would be a
problem. I think, in the end, that fixing the issue is more important
than a potentially hypothetical use case.

> Alternatively we could change getblk() code handle changing block
> size. That would fix the particular issue syzkaller found as well but
> I'm not sure what else is broken when block device changes while fs
> driver is working with it.

I think your solution here is saner.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] loop: Don't change loop device under exclusive opener
  2019-05-16 20:44 ` Jens Axboe
@ 2019-05-27 12:29   ` Jan Kara
  2019-05-27 13:34     ` Jens Axboe
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2019-05-27 12:29 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Jan Kara, Tetsuo Handa, linux-block

On Thu 16-05-19 14:44:07, Jens Axboe wrote:
> On 5/16/19 8:01 AM, Jan Kara wrote:
> > Loop module allows calling LOOP_SET_FD while there are other openers of
> > the loop device. Even exclusive ones. This can lead to weird
> > consequences such as kernel deadlocks like:
> > 
> > mount_bdev()				lo_ioctl()
> >   udf_fill_super()
> >     udf_load_vrs()
> >       sb_set_blocksize() - sets desired block size B
> >       udf_tread()
> >         sb_bread()
> >           __bread_gfp(bdev, block, B)
> > 					  loop_set_fd()
> > 					    set_blocksize()
> >             - now __getblk_slow() indefinitely loops because B != bdev
> >               block size
> > 
> > Fix the problem by disallowing LOOP_SET_FD ioctl when there are
> > exclusive openers of a loop device.
> > 
> > [Deliberately chosen not to CC stable as a user with priviledges to
> > trigger this race has other means of taking the system down and this
> > has a potential of breaking some weird userspace setup]
> > 
> > Reported-and-tested-by: syzbot+10007d66ca02b08f0e60@syzkaller.appspotmail.com
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  drivers/block/loop.c | 18 +++++++++++++++++-
> >  1 file changed, 17 insertions(+), 1 deletion(-)
> > 
> > Hi Jens!
> > 
> > What do you think about this patch? It fixes the problem but it also
> > changes user visible behavior so there are chances it breaks some
> > existing setup (although I have hard time coming up with a realistic
> > scenario where it would matter).
> 
> I also have a hard time thinking about valid cases where this would be a
> problem. I think, in the end, that fixing the issue is more important
> than a potentially hypothetical use case.
> 
> > Alternatively we could change getblk() code handle changing block
> > size. That would fix the particular issue syzkaller found as well but
> > I'm not sure what else is broken when block device changes while fs
> > driver is working with it.
> 
> I think your solution here is saner.

Will you pick up the patch please? I cannot find it in your tree... Thanks!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] loop: Don't change loop device under exclusive opener
  2019-05-27 12:29   ` Jan Kara
@ 2019-05-27 13:34     ` Jens Axboe
  2019-07-18  8:15       ` Kai-Heng Feng
  0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2019-05-27 13:34 UTC (permalink / raw)
  To: Jan Kara; +Cc: Tetsuo Handa, linux-block

On 5/27/19 6:29 AM, Jan Kara wrote:
> On Thu 16-05-19 14:44:07, Jens Axboe wrote:
>> On 5/16/19 8:01 AM, Jan Kara wrote:
>>> Loop module allows calling LOOP_SET_FD while there are other openers of
>>> the loop device. Even exclusive ones. This can lead to weird
>>> consequences such as kernel deadlocks like:
>>>
>>> mount_bdev()				lo_ioctl()
>>>    udf_fill_super()
>>>      udf_load_vrs()
>>>        sb_set_blocksize() - sets desired block size B
>>>        udf_tread()
>>>          sb_bread()
>>>            __bread_gfp(bdev, block, B)
>>> 					  loop_set_fd()
>>> 					    set_blocksize()
>>>              - now __getblk_slow() indefinitely loops because B != bdev
>>>                block size
>>>
>>> Fix the problem by disallowing LOOP_SET_FD ioctl when there are
>>> exclusive openers of a loop device.
>>>
>>> [Deliberately chosen not to CC stable as a user with priviledges to
>>> trigger this race has other means of taking the system down and this
>>> has a potential of breaking some weird userspace setup]
>>>
>>> Reported-and-tested-by: syzbot+10007d66ca02b08f0e60@syzkaller.appspotmail.com
>>> Signed-off-by: Jan Kara <jack@suse.cz>
>>> ---
>>>   drivers/block/loop.c | 18 +++++++++++++++++-
>>>   1 file changed, 17 insertions(+), 1 deletion(-)
>>>
>>> Hi Jens!
>>>
>>> What do you think about this patch? It fixes the problem but it also
>>> changes user visible behavior so there are chances it breaks some
>>> existing setup (although I have hard time coming up with a realistic
>>> scenario where it would matter).
>>
>> I also have a hard time thinking about valid cases where this would be a
>> problem. I think, in the end, that fixing the issue is more important
>> than a potentially hypothetical use case.
>>
>>> Alternatively we could change getblk() code handle changing block
>>> size. That would fix the particular issue syzkaller found as well but
>>> I'm not sure what else is broken when block device changes while fs
>>> driver is working with it.
>>
>> I think your solution here is saner.
> 
> Will you pick up the patch please? I cannot find it in your tree... Thanks!

Done!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] loop: Don't change loop device under exclusive opener
  2019-05-27 13:34     ` Jens Axboe
@ 2019-07-18  8:15       ` Kai-Heng Feng
  2019-07-30  9:29         ` Jan Kara
  0 siblings, 1 reply; 19+ messages in thread
From: Kai-Heng Feng @ 2019-07-18  8:15 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, Tetsuo Handa, linux-block, John Lenton,
	jean-baptiste.lallement

Hi Jan,

at 21:34, Jens Axboe <axboe@kernel.dk> wrote:

> On 5/27/19 6:29 AM, Jan Kara wrote:
>> On Thu 16-05-19 14:44:07, Jens Axboe wrote:
>>> On 5/16/19 8:01 AM, Jan Kara wrote:
>>>> Loop module allows calling LOOP_SET_FD while there are other openers of
>>>> the loop device. Even exclusive ones. This can lead to weird
>>>> consequences such as kernel deadlocks like:
>>>>
>>>> mount_bdev()				lo_ioctl()
>>>>    udf_fill_super()
>>>>      udf_load_vrs()
>>>>        sb_set_blocksize() - sets desired block size B
>>>>        udf_tread()
>>>>          sb_bread()
>>>>            __bread_gfp(bdev, block, B)
>>>> 					  loop_set_fd()
>>>> 					    set_blocksize()
>>>>              - now __getblk_slow() indefinitely loops because B != bdev
>>>>                block size
>>>>
>>>> Fix the problem by disallowing LOOP_SET_FD ioctl when there are
>>>> exclusive openers of a loop device.
>>>>
>>>> [Deliberately chosen not to CC stable as a user with priviledges to
>>>> trigger this race has other means of taking the system down and this
>>>> has a potential of breaking some weird userspace setup]
>>>>
>>>> Reported-and-tested-by:  
>>>> syzbot+10007d66ca02b08f0e60@syzkaller.appspotmail.com
>>>> Signed-off-by: Jan Kara <jack@suse.cz>
>>>> ---
>>>>   drivers/block/loop.c | 18 +++++++++++++++++-
>>>>   1 file changed, 17 insertions(+), 1 deletion(-)
>>>>
>>>> Hi Jens!
>>>>
>>>> What do you think about this patch? It fixes the problem but it also
>>>> changes user visible behavior so there are chances it breaks some
>>>> existing setup (although I have hard time coming up with a realistic
>>>> scenario where it would matter).
>>>
>>> I also have a hard time thinking about valid cases where this would be a
>>> problem. I think, in the end, that fixing the issue is more important
>>> than a potentially hypothetical use case.
>>>
>>>> Alternatively we could change getblk() code handle changing block
>>>> size. That would fix the particular issue syzkaller found as well but
>>>> I'm not sure what else is broken when block device changes while fs
>>>> driver is working with it.
>>>
>>> I think your solution here is saner.
>>
>> Will you pick up the patch please? I cannot find it in your tree...  
>> Thanks!
>
> Done!

This patch introduced a regression [1].
A reproducer can be found at [2].

[1] https://bugs.launchpad.net/bugs/1836914
[2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1836914/comments/4

Kai-Heng

>
> -- 
> Jens Axboe



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] loop: Don't change loop device under exclusive opener
  2019-07-18  8:15       ` Kai-Heng Feng
@ 2019-07-30  9:29         ` Jan Kara
  2019-07-30  9:36           ` John Lenton
  2019-07-30 10:16           ` Tetsuo Handa
  0 siblings, 2 replies; 19+ messages in thread
From: Jan Kara @ 2019-07-30  9:29 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Jan Kara, Jens Axboe, Tetsuo Handa, linux-block, John Lenton,
	jean-baptiste.lallement

On Thu 18-07-19 16:15:42, Kai-Heng Feng wrote:
> Hi Jan,
> 
> at 21:34, Jens Axboe <axboe@kernel.dk> wrote:
> 
> > On 5/27/19 6:29 AM, Jan Kara wrote:
> > > On Thu 16-05-19 14:44:07, Jens Axboe wrote:
> > > > On 5/16/19 8:01 AM, Jan Kara wrote:
> > > > > Loop module allows calling LOOP_SET_FD while there are other openers of
> > > > > the loop device. Even exclusive ones. This can lead to weird
> > > > > consequences such as kernel deadlocks like:
> > > > > 
> > > > > mount_bdev()				lo_ioctl()
> > > > >    udf_fill_super()
> > > > >      udf_load_vrs()
> > > > >        sb_set_blocksize() - sets desired block size B
> > > > >        udf_tread()
> > > > >          sb_bread()
> > > > >            __bread_gfp(bdev, block, B)
> > > > > 					  loop_set_fd()
> > > > > 					    set_blocksize()
> > > > >              - now __getblk_slow() indefinitely loops because B != bdev
> > > > >                block size
> > > > > 
> > > > > Fix the problem by disallowing LOOP_SET_FD ioctl when there are
> > > > > exclusive openers of a loop device.
> > > > > 
> > > > > [Deliberately chosen not to CC stable as a user with priviledges to
> > > > > trigger this race has other means of taking the system down and this
> > > > > has a potential of breaking some weird userspace setup]
> > > > > 
> > > > > Reported-and-tested-by:
> > > > > syzbot+10007d66ca02b08f0e60@syzkaller.appspotmail.com
> > > > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > > > ---
> > > > >   drivers/block/loop.c | 18 +++++++++++++++++-
> > > > >   1 file changed, 17 insertions(+), 1 deletion(-)
> > > > > 
> > > > > Hi Jens!
> > > > > 
> > > > > What do you think about this patch? It fixes the problem but it also
> > > > > changes user visible behavior so there are chances it breaks some
> > > > > existing setup (although I have hard time coming up with a realistic
> > > > > scenario where it would matter).
> > > > 
> > > > I also have a hard time thinking about valid cases where this would be a
> > > > problem. I think, in the end, that fixing the issue is more important
> > > > than a potentially hypothetical use case.
> > > > 
> > > > > Alternatively we could change getblk() code handle changing block
> > > > > size. That would fix the particular issue syzkaller found as well but
> > > > > I'm not sure what else is broken when block device changes while fs
> > > > > driver is working with it.
> > > > 
> > > > I think your solution here is saner.
> > > 
> > > Will you pick up the patch please? I cannot find it in your tree...
> > > Thanks!
> > 
> > Done!
> 
> This patch introduced a regression [1].
> A reproducer can be found at [2].
> 
> [1] https://bugs.launchpad.net/bugs/1836914
> [2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1836914/comments/4

Thanks for the notice and the references. What's your version of
util-linux? What your test script does is indeed racy. You have there:

echo Running:
for i in {a..z}{a..z}; do
    mount $i.squash /mnt/$i &
done

So all mount(8) commands will run in parallel and race to setup loop
devices with LOOP_SET_FD and mount them. However util-linux (at least in
the current version) seems to handle EBUSY from LOOP_SET_FD just fine and
retries with the new loop device. So at this point I don't see why the patch
makes difference... I guess I'll need to reproduce and see what's going on
in detail.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] loop: Don't change loop device under exclusive opener
  2019-07-30  9:29         ` Jan Kara
@ 2019-07-30  9:36           ` John Lenton
  2019-07-30 10:16             ` Jan Kara
  2019-07-30 10:16           ` Tetsuo Handa
  1 sibling, 1 reply; 19+ messages in thread
From: John Lenton @ 2019-07-30  9:36 UTC (permalink / raw)
  To: Jan Kara
  Cc: Kai-Heng Feng, Jens Axboe, Tetsuo Handa, linux-block,
	jean-baptiste.lallement

On Tue, 30 Jul 2019 at 10:29, Jan Kara <jack@suse.cz> wrote:
>
> Thanks for the notice and the references. What's your version of
> util-linux? What your test script does is indeed racy. You have there:
>
> echo Running:
> for i in {a..z}{a..z}; do
>     mount $i.squash /mnt/$i &
> done
>
> So all mount(8) commands will run in parallel and race to setup loop
> devices with LOOP_SET_FD and mount them. However util-linux (at least in
> the current version) seems to handle EBUSY from LOOP_SET_FD just fine and
> retries with the new loop device. So at this point I don't see why the patch
> makes difference... I guess I'll need to reproduce and see what's going on
> in detail.

We've observed this in arch with util-linux 2.34, and ubuntu 19.10
(eoan ermine) with util-linux 2.33.

just to be clear, the initial reports didn't involve a zany loop of
mounts, but were triggered by effectively the same thing as systemd
booted a system with a lot of snaps. The reroducer tries to makes
things simpler to reproduce :-). FWIW,  systemd versions were 244 and
242 for those systems, respectively.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] loop: Don't change loop device under exclusive opener
  2019-07-30  9:29         ` Jan Kara
  2019-07-30  9:36           ` John Lenton
@ 2019-07-30 10:16           ` Tetsuo Handa
  1 sibling, 0 replies; 19+ messages in thread
From: Tetsuo Handa @ 2019-07-30 10:16 UTC (permalink / raw)
  To: Jan Kara, Kai-Heng Feng
  Cc: Jens Axboe, linux-block, John Lenton, jean-baptiste.lallement

On 2019/07/30 18:29, Jan Kara wrote:
>> This patch introduced a regression [1].
>> A reproducer can be found at [2].
>>
>> [1] https://bugs.launchpad.net/bugs/1836914
>> [2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1836914/comments/4
> 
> Thanks for the notice and the references. What's your version of
> util-linux? What your test script does is indeed racy. You have there:
> 
> echo Running:
> for i in {a..z}{a..z}; do
>     mount $i.squash /mnt/$i &
> done
> 
> So all mount(8) commands will run in parallel and race to setup loop
> devices with LOOP_SET_FD and mount them. However util-linux (at least in
> the current version) seems to handle EBUSY from LOOP_SET_FD just fine and
> retries with the new loop device. So at this point I don't see why the patch
> makes difference... I guess I'll need to reproduce and see what's going on
> in detail.

Firstly, why not to check the return value of blkdev_get() ?
EBUSY is not the only error code blkdev_get() might return.

 	/*
 	 * If we don't hold exclusive handle for the device, upgrade to it
 	 * here to avoid changing device under exclusive owner.
 	 */
 	if (!(mode & FMODE_EXCL)) {
 		bdgrab(bdev);
 		error = blkdev_get(bdev, mode | FMODE_EXCL, loop_set_fd);
-		if (error)
+		if (error) {
+			printk("loop_set_fd: %d\n", error);
 			goto out_putf;
+		}
 	}

And try finding which line is returning an error
(like https://marc.info/?l=linux-xfs&m=156437221703110 does).

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] loop: Don't change loop device under exclusive opener
  2019-07-30  9:36           ` John Lenton
@ 2019-07-30 10:16             ` Jan Kara
  2019-07-30 13:36               ` Jan Kara
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2019-07-30 10:16 UTC (permalink / raw)
  To: John Lenton
  Cc: Jan Kara, Kai-Heng Feng, Jens Axboe, Tetsuo Handa, linux-block,
	jean-baptiste.lallement

On Tue 30-07-19 10:36:59, John Lenton wrote:
> On Tue, 30 Jul 2019 at 10:29, Jan Kara <jack@suse.cz> wrote:
> >
> > Thanks for the notice and the references. What's your version of
> > util-linux? What your test script does is indeed racy. You have there:
> >
> > echo Running:
> > for i in {a..z}{a..z}; do
> >     mount $i.squash /mnt/$i &
> > done
> >
> > So all mount(8) commands will run in parallel and race to setup loop
> > devices with LOOP_SET_FD and mount them. However util-linux (at least in
> > the current version) seems to handle EBUSY from LOOP_SET_FD just fine and
> > retries with the new loop device. So at this point I don't see why the patch
> > makes difference... I guess I'll need to reproduce and see what's going on
> > in detail.
> 
> We've observed this in arch with util-linux 2.34, and ubuntu 19.10
> (eoan ermine) with util-linux 2.33.
> 
> just to be clear, the initial reports didn't involve a zany loop of
> mounts, but were triggered by effectively the same thing as systemd
> booted a system with a lot of snaps. The reroducer tries to makes
> things simpler to reproduce :-). FWIW,  systemd versions were 244 and
> 242 for those systems, respectively.

Thanks for info! So I think I see what's going on. The two mounts race
like:

MOUNT1					MOUNT2
num = ioctl(LOOP_CTL_GET_FREE)
					num = ioctl(LOOP_CTL_GET_FREE)
ioctl("/dev/loop$num", LOOP_SET_FD, ..)
 - returns OK
					ioctl("/dev/loop$num", LOOP_SET_FD, ..)
					  - acquires exclusine loop$num
					    reference
mount("/dev/loop$num", ...)
 - sees exclusive reference from MOUNT2 and fails
					  - sees loop device is already
					    bound and fails

It is a bug in the scheme I've chosen that racing LOOP_SET_FD can block
perfectly valid mount. I'll think how to fix this...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] loop: Don't change loop device under exclusive opener
  2019-07-30 10:16             ` Jan Kara
@ 2019-07-30 13:36               ` Jan Kara
  2019-07-30 17:59                 ` Kai-Heng Feng
                                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Jan Kara @ 2019-07-30 13:36 UTC (permalink / raw)
  To: John Lenton
  Cc: Jan Kara, Kai-Heng Feng, Jens Axboe, Tetsuo Handa, linux-block,
	jean-baptiste.lallement

[-- Attachment #1: Type: text/plain, Size: 2027 bytes --]

On Tue 30-07-19 12:16:46, Jan Kara wrote:
> On Tue 30-07-19 10:36:59, John Lenton wrote:
> > On Tue, 30 Jul 2019 at 10:29, Jan Kara <jack@suse.cz> wrote:
> > >
> > > Thanks for the notice and the references. What's your version of
> > > util-linux? What your test script does is indeed racy. You have there:
> > >
> > > echo Running:
> > > for i in {a..z}{a..z}; do
> > >     mount $i.squash /mnt/$i &
> > > done
> > >
> > > So all mount(8) commands will run in parallel and race to setup loop
> > > devices with LOOP_SET_FD and mount them. However util-linux (at least in
> > > the current version) seems to handle EBUSY from LOOP_SET_FD just fine and
> > > retries with the new loop device. So at this point I don't see why the patch
> > > makes difference... I guess I'll need to reproduce and see what's going on
> > > in detail.
> > 
> > We've observed this in arch with util-linux 2.34, and ubuntu 19.10
> > (eoan ermine) with util-linux 2.33.
> > 
> > just to be clear, the initial reports didn't involve a zany loop of
> > mounts, but were triggered by effectively the same thing as systemd
> > booted a system with a lot of snaps. The reroducer tries to makes
> > things simpler to reproduce :-). FWIW,  systemd versions were 244 and
> > 242 for those systems, respectively.
> 
> Thanks for info! So I think I see what's going on. The two mounts race
> like:
> 
> MOUNT1					MOUNT2
> num = ioctl(LOOP_CTL_GET_FREE)
> 					num = ioctl(LOOP_CTL_GET_FREE)
> ioctl("/dev/loop$num", LOOP_SET_FD, ..)
>  - returns OK
> 					ioctl("/dev/loop$num", LOOP_SET_FD, ..)
> 					  - acquires exclusine loop$num
> 					    reference
> mount("/dev/loop$num", ...)
>  - sees exclusive reference from MOUNT2 and fails
> 					  - sees loop device is already
> 					    bound and fails
> 
> It is a bug in the scheme I've chosen that racing LOOP_SET_FD can block
> perfectly valid mount. I'll think how to fix this...

So how about attached patch? It fixes the regression for me.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-loop-Fix-mount-2-failure-due-to-race-with-LOOP_SET_F.patch --]
[-- Type: text/x-patch, Size: 7200 bytes --]

From 5069263402e9daef5df1ee02576107e11bd138a6 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 30 Jul 2019 13:10:14 +0200
Subject: [PATCH] loop: Fix mount(2) failure due to race with LOOP_SET_FD

Commit 33ec3e53e7b1 ("loop: Don't change loop device under exclusive
opener") made LOOP_SET_FD ioctl acquire exclusive block device reference
while it updates loop device binding. However this can make perfectly
valid mount(2) fail with EBUSY due to racing LOOP_SET_FD holding
temporarily the exclusive bdev reference in cases like this:

for i in {a..z}{a..z}; do
        dd if=/dev/zero of=$i.image bs=1k count=0 seek=1024
        mkfs.ext2 $i.image
        mkdir mnt$i
done

echo "Run"
for i in {a..z}{a..z}; do
        mount -o loop -t ext2 $i.image mnt$i &
done

Fix the problem by not getting full exclusive bdev reference in
LOOP_SET_FD but instead just mark the bdev as being claimed while we
update the binding information. This just blocks new exclusive openers
instead of failing them with EBUSY thus fixing the problem.

Fixes: 33ec3e53e7b1 ("loop: Don't change loop device under exclusive opener")
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/block/loop.c | 16 +++++-----
 fs/block_dev.c       | 83 ++++++++++++++++++++++++++++++++++++----------------
 include/linux/fs.h   |  6 ++++
 3 files changed, 73 insertions(+), 32 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 44c9985f352a..3036883fc9f8 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -924,6 +924,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	struct file	*file;
 	struct inode	*inode;
 	struct address_space *mapping;
+	struct block_device *claimed_bdev = NULL;
 	int		lo_flags = 0;
 	int		error;
 	loff_t		size;
@@ -942,10 +943,11 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	 * here to avoid changing device under exclusive owner.
 	 */
 	if (!(mode & FMODE_EXCL)) {
-		bdgrab(bdev);
-		error = blkdev_get(bdev, mode | FMODE_EXCL, loop_set_fd);
-		if (error)
+		claimed_bdev = bd_start_claiming(bdev, loop_set_fd);
+		if (IS_ERR(claimed_bdev)) {
+			error = PTR_ERR(claimed_bdev);
 			goto out_putf;
+		}
 	}
 
 	error = mutex_lock_killable(&loop_ctl_mutex);
@@ -1015,15 +1017,15 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	mutex_unlock(&loop_ctl_mutex);
 	if (partscan)
 		loop_reread_partitions(lo, bdev);
-	if (!(mode & FMODE_EXCL))
-		blkdev_put(bdev, mode | FMODE_EXCL);
+	if (claimed_bdev)
+		bd_abort_claiming(bdev, claimed_bdev, loop_set_fd);
 	return 0;
 
 out_unlock:
 	mutex_unlock(&loop_ctl_mutex);
 out_bdev:
-	if (!(mode & FMODE_EXCL))
-		blkdev_put(bdev, mode | FMODE_EXCL);
+	if (claimed_bdev)
+		bd_abort_claiming(bdev, claimed_bdev, loop_set_fd);
 out_putf:
 	fput(file);
 out:
diff --git a/fs/block_dev.c b/fs/block_dev.c
index c2a85b587922..22591bad9353 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1181,8 +1181,7 @@ static struct gendisk *bdev_get_gendisk(struct block_device *bdev, int *partno)
  * Pointer to the block device containing @bdev on success, ERR_PTR()
  * value on failure.
  */
-static struct block_device *bd_start_claiming(struct block_device *bdev,
-					      void *holder)
+struct block_device *bd_start_claiming(struct block_device *bdev, void *holder)
 {
 	struct gendisk *disk;
 	struct block_device *whole;
@@ -1229,6 +1228,62 @@ static struct block_device *bd_start_claiming(struct block_device *bdev,
 		return ERR_PTR(err);
 	}
 }
+EXPORT_SYMBOL(bd_start_claiming);
+
+static void bd_clear_claiming(struct block_device *whole, void *holder)
+{
+	lockdep_assert_held(&bdev_lock);
+	/* tell others that we're done */
+	BUG_ON(whole->bd_claiming != holder);
+	whole->bd_claiming = NULL;
+	wake_up_bit(&whole->bd_claiming, 0);
+}
+
+/**
+ * bd_finish_claiming - finish claiming of a block device
+ * @bdev: block device of interest
+ * @whole: whole block device (returned from bd_start_claiming())
+ * @holder: holder that has claimed @bdev
+ *
+ * Finish exclusive open of a block device. Mark the device as exlusively
+ * open by the holder and wake up all waiters for exclusive open to finish.
+ */
+void bd_finish_claiming(struct block_device *bdev, struct block_device *whole,
+			void *holder)
+{
+	spin_lock(&bdev_lock);
+	BUG_ON(!bd_may_claim(bdev, whole, holder));
+	/*
+	 * Note that for a whole device bd_holders will be incremented twice,
+	 * and bd_holder will be set to bd_may_claim before being set to holder
+	 */
+	whole->bd_holders++;
+	whole->bd_holder = bd_may_claim;
+	bdev->bd_holders++;
+	bdev->bd_holder = holder;
+	bd_clear_claiming(whole, holder);
+	spin_unlock(&bdev_lock);
+}
+EXPORT_SYMBOL(bd_finish_claiming);
+
+/**
+ * bd_abort_claiming - abort claiming of a block device
+ * @bdev: block device of interest
+ * @whole: whole block device (returned from bd_start_claiming())
+ * @holder: holder that has claimed @bdev
+ *
+ * Abort claiming of a block device when the exclusive open failed. This can be
+ * also used when exclusive open is not actually desired and we just needed
+ * to block other exclusive openers for a while.
+ */
+void bd_abort_claiming(struct block_device *bdev, struct block_device *whole,
+		       void *holder)
+{
+	spin_lock(&bdev_lock);
+	bd_clear_claiming(whole, holder);
+	spin_unlock(&bdev_lock);
+}
+EXPORT_SYMBOL(bd_abort_claiming);
 
 #ifdef CONFIG_SYSFS
 struct bd_holder_disk {
@@ -1698,29 +1753,7 @@ int blkdev_get(struct block_device *bdev, fmode_t mode, void *holder)
 
 		/* finish claiming */
 		mutex_lock(&bdev->bd_mutex);
-		spin_lock(&bdev_lock);
-
-		if (!res) {
-			BUG_ON(!bd_may_claim(bdev, whole, holder));
-			/*
-			 * Note that for a whole device bd_holders
-			 * will be incremented twice, and bd_holder
-			 * will be set to bd_may_claim before being
-			 * set to holder
-			 */
-			whole->bd_holders++;
-			whole->bd_holder = bd_may_claim;
-			bdev->bd_holders++;
-			bdev->bd_holder = holder;
-		}
-
-		/* tell others that we're done */
-		BUG_ON(whole->bd_claiming != holder);
-		whole->bd_claiming = NULL;
-		wake_up_bit(&whole->bd_claiming, 0);
-
-		spin_unlock(&bdev_lock);
-
+		bd_finish_claiming(bdev, whole, holder);
 		/*
 		 * Block event polling for write claims if requested.  Any
 		 * write holder makes the write_holder state stick until
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 56b8e358af5c..997a530ff4e9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2598,6 +2598,12 @@ extern struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
 					       void *holder);
 extern struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode,
 					      void *holder);
+extern struct block_device *bd_start_claiming(struct block_device *bdev,
+					      void *holder);
+extern void bd_finish_claiming(struct block_device *bdev,
+			       struct block_device *whole, void *holder);
+extern void bd_abort_claiming(struct block_device *bdev,
+			      struct block_device *whole, void *holder);
 extern void blkdev_put(struct block_device *bdev, fmode_t mode);
 extern int __blkdev_reread_part(struct block_device *bdev);
 extern int blkdev_reread_part(struct block_device *bdev);
-- 
2.16.4


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] loop: Don't change loop device under exclusive opener
  2019-07-30 13:36               ` Jan Kara
@ 2019-07-30 17:59                 ` Kai-Heng Feng
  2019-07-30 19:17                 ` Jens Axboe
  2019-08-05 16:41                 ` Bart Van Assche
  2 siblings, 0 replies; 19+ messages in thread
From: Kai-Heng Feng @ 2019-07-30 17:59 UTC (permalink / raw)
  To: Jan Kara
  Cc: John Lenton, Jens Axboe, Tetsuo Handa, linux-block,
	jean-baptiste.lallement

at 21:36, Jan Kara <jack@suse.cz> wrote:

> On Tue 30-07-19 12:16:46, Jan Kara wrote:
>> On Tue 30-07-19 10:36:59, John Lenton wrote:
>>> On Tue, 30 Jul 2019 at 10:29, Jan Kara <jack@suse.cz> wrote:
>>>> Thanks for the notice and the references. What's your version of
>>>> util-linux? What your test script does is indeed racy. You have there:
>>>>
>>>> echo Running:
>>>> for i in {a..z}{a..z}; do
>>>>     mount $i.squash /mnt/$i &
>>>> done
>>>>
>>>> So all mount(8) commands will run in parallel and race to setup loop
>>>> devices with LOOP_SET_FD and mount them. However util-linux (at least in
>>>> the current version) seems to handle EBUSY from LOOP_SET_FD just fine  
>>>> and
>>>> retries with the new loop device. So at this point I don't see why the  
>>>> patch
>>>> makes difference... I guess I'll need to reproduce and see what's  
>>>> going on
>>>> in detail.
>>>
>>> We've observed this in arch with util-linux 2.34, and ubuntu 19.10
>>> (eoan ermine) with util-linux 2.33.
>>>
>>> just to be clear, the initial reports didn't involve a zany loop of
>>> mounts, but were triggered by effectively the same thing as systemd
>>> booted a system with a lot of snaps. The reroducer tries to makes
>>> things simpler to reproduce :-). FWIW,  systemd versions were 244 and
>>> 242 for those systems, respectively.
>>
>> Thanks for info! So I think I see what's going on. The two mounts race
>> like:
>>
>> MOUNT1					MOUNT2
>> num = ioctl(LOOP_CTL_GET_FREE)
>> 					num = ioctl(LOOP_CTL_GET_FREE)
>> ioctl("/dev/loop$num", LOOP_SET_FD, ..)
>>  - returns OK
>> 					ioctl("/dev/loop$num", LOOP_SET_FD, ..)
>> 					  - acquires exclusine loop$num
>> 					    reference
>> mount("/dev/loop$num", ...)
>>  - sees exclusive reference from MOUNT2 and fails
>> 					  - sees loop device is already
>> 					    bound and fails
>>
>> It is a bug in the scheme I've chosen that racing LOOP_SET_FD can block
>> perfectly valid mount. I'll think how to fix this...
>
> So how about attached patch? It fixes the regression for me.

Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>

>
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
> <0001-loop-Fix-mount-2-failure-due-to-race-with-LOOP_SET_F.patch>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] loop: Don't change loop device under exclusive opener
  2019-07-30 13:36               ` Jan Kara
  2019-07-30 17:59                 ` Kai-Heng Feng
@ 2019-07-30 19:17                 ` Jens Axboe
  2019-07-30 21:11                   ` John Lenton
  2019-07-31  8:56                   ` Jan Kara
  2019-08-05 16:41                 ` Bart Van Assche
  2 siblings, 2 replies; 19+ messages in thread
From: Jens Axboe @ 2019-07-30 19:17 UTC (permalink / raw)
  To: Jan Kara, John Lenton
  Cc: Kai-Heng Feng, Tetsuo Handa, linux-block, jean-baptiste.lallement

On 7/30/19 7:36 AM, Jan Kara wrote:
> On Tue 30-07-19 12:16:46, Jan Kara wrote:
>> On Tue 30-07-19 10:36:59, John Lenton wrote:
>>> On Tue, 30 Jul 2019 at 10:29, Jan Kara <jack@suse.cz> wrote:
>>>>
>>>> Thanks for the notice and the references. What's your version of
>>>> util-linux? What your test script does is indeed racy. You have there:
>>>>
>>>> echo Running:
>>>> for i in {a..z}{a..z}; do
>>>>      mount $i.squash /mnt/$i &
>>>> done
>>>>
>>>> So all mount(8) commands will run in parallel and race to setup loop
>>>> devices with LOOP_SET_FD and mount them. However util-linux (at least in
>>>> the current version) seems to handle EBUSY from LOOP_SET_FD just fine and
>>>> retries with the new loop device. So at this point I don't see why the patch
>>>> makes difference... I guess I'll need to reproduce and see what's going on
>>>> in detail.
>>>
>>> We've observed this in arch with util-linux 2.34, and ubuntu 19.10
>>> (eoan ermine) with util-linux 2.33.
>>>
>>> just to be clear, the initial reports didn't involve a zany loop of
>>> mounts, but were triggered by effectively the same thing as systemd
>>> booted a system with a lot of snaps. The reroducer tries to makes
>>> things simpler to reproduce :-). FWIW,  systemd versions were 244 and
>>> 242 for those systems, respectively.
>>
>> Thanks for info! So I think I see what's going on. The two mounts race
>> like:
>>
>> MOUNT1					MOUNT2
>> num = ioctl(LOOP_CTL_GET_FREE)
>> 					num = ioctl(LOOP_CTL_GET_FREE)
>> ioctl("/dev/loop$num", LOOP_SET_FD, ..)
>>   - returns OK
>> 					ioctl("/dev/loop$num", LOOP_SET_FD, ..)
>> 					  - acquires exclusine loop$num
>> 					    reference
>> mount("/dev/loop$num", ...)
>>   - sees exclusive reference from MOUNT2 and fails
>> 					  - sees loop device is already
>> 					    bound and fails
>>
>> It is a bug in the scheme I've chosen that racing LOOP_SET_FD can block
>> perfectly valid mount. I'll think how to fix this...
> 
> So how about attached patch? It fixes the regression for me.

Jan, I've applied this patch - and also marked it for stable, so it'll
end up in 5.2-stable. Thanks.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] loop: Don't change loop device under exclusive opener
  2019-07-30 19:17                 ` Jens Axboe
@ 2019-07-30 21:11                   ` John Lenton
  2019-07-31  8:56                   ` Jan Kara
  1 sibling, 0 replies; 19+ messages in thread
From: John Lenton @ 2019-07-30 21:11 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Jan Kara, Kai-Heng Feng, Tetsuo Handa, linux-block,
	jean-baptiste.lallement

On Tue, 30 Jul 2019 at 20:17, Jens Axboe <axboe@kernel.dk> wrote:
>
> Jan, I've applied this patch - and also marked it for stable, so it'll
> end up in 5.2-stable. Thanks.

thank you all!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] loop: Don't change loop device under exclusive opener
  2019-07-30 19:17                 ` Jens Axboe
  2019-07-30 21:11                   ` John Lenton
@ 2019-07-31  8:56                   ` Jan Kara
  1 sibling, 0 replies; 19+ messages in thread
From: Jan Kara @ 2019-07-31  8:56 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Jan Kara, John Lenton, Kai-Heng Feng, Tetsuo Handa, linux-block,
	jean-baptiste.lallement

On Tue 30-07-19 13:17:28, Jens Axboe wrote:
> On 7/30/19 7:36 AM, Jan Kara wrote:
> > On Tue 30-07-19 12:16:46, Jan Kara wrote:
> >> On Tue 30-07-19 10:36:59, John Lenton wrote:
> >>> On Tue, 30 Jul 2019 at 10:29, Jan Kara <jack@suse.cz> wrote:
> >>>>
> >>>> Thanks for the notice and the references. What's your version of
> >>>> util-linux? What your test script does is indeed racy. You have there:
> >>>>
> >>>> echo Running:
> >>>> for i in {a..z}{a..z}; do
> >>>>      mount $i.squash /mnt/$i &
> >>>> done
> >>>>
> >>>> So all mount(8) commands will run in parallel and race to setup loop
> >>>> devices with LOOP_SET_FD and mount them. However util-linux (at least in
> >>>> the current version) seems to handle EBUSY from LOOP_SET_FD just fine and
> >>>> retries with the new loop device. So at this point I don't see why the patch
> >>>> makes difference... I guess I'll need to reproduce and see what's going on
> >>>> in detail.
> >>>
> >>> We've observed this in arch with util-linux 2.34, and ubuntu 19.10
> >>> (eoan ermine) with util-linux 2.33.
> >>>
> >>> just to be clear, the initial reports didn't involve a zany loop of
> >>> mounts, but were triggered by effectively the same thing as systemd
> >>> booted a system with a lot of snaps. The reroducer tries to makes
> >>> things simpler to reproduce :-). FWIW,  systemd versions were 244 and
> >>> 242 for those systems, respectively.
> >>
> >> Thanks for info! So I think I see what's going on. The two mounts race
> >> like:
> >>
> >> MOUNT1					MOUNT2
> >> num = ioctl(LOOP_CTL_GET_FREE)
> >> 					num = ioctl(LOOP_CTL_GET_FREE)
> >> ioctl("/dev/loop$num", LOOP_SET_FD, ..)
> >>   - returns OK
> >> 					ioctl("/dev/loop$num", LOOP_SET_FD, ..)
> >> 					  - acquires exclusine loop$num
> >> 					    reference
> >> mount("/dev/loop$num", ...)
> >>   - sees exclusive reference from MOUNT2 and fails
> >> 					  - sees loop device is already
> >> 					    bound and fails
> >>
> >> It is a bug in the scheme I've chosen that racing LOOP_SET_FD can block
> >> perfectly valid mount. I'll think how to fix this...
> > 
> > So how about attached patch? It fixes the regression for me.
> 
> Jan, I've applied this patch - and also marked it for stable, so it'll
> end up in 5.2-stable. Thanks.

Thanks Jens!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] loop: Don't change loop device under exclusive opener
  2019-07-30 13:36               ` Jan Kara
  2019-07-30 17:59                 ` Kai-Heng Feng
  2019-07-30 19:17                 ` Jens Axboe
@ 2019-08-05 16:41                 ` Bart Van Assche
  2019-08-05 21:01                   ` Tetsuo Handa
  2019-08-07  9:45                   ` Jan Kara
  2 siblings, 2 replies; 19+ messages in thread
From: Bart Van Assche @ 2019-08-05 16:41 UTC (permalink / raw)
  To: Jan Kara, John Lenton
  Cc: Kai-Heng Feng, Jens Axboe, Tetsuo Handa, linux-block,
	jean-baptiste.lallement

On 7/30/19 6:36 AM, Jan Kara wrote:
> On Tue 30-07-19 12:16:46, Jan Kara wrote:
>> On Tue 30-07-19 10:36:59, John Lenton wrote:
>>> On Tue, 30 Jul 2019 at 10:29, Jan Kara <jack@suse.cz> wrote:
>>>>
>>>> Thanks for the notice and the references. What's your version of
>>>> util-linux? What your test script does is indeed racy. You have there:
>>>>
>>>> echo Running:
>>>> for i in {a..z}{a..z}; do
>>>>      mount $i.squash /mnt/$i &
>>>> done
>>>>
>>>> So all mount(8) commands will run in parallel and race to setup loop
>>>> devices with LOOP_SET_FD and mount them. However util-linux (at least in
>>>> the current version) seems to handle EBUSY from LOOP_SET_FD just fine and
>>>> retries with the new loop device. So at this point I don't see why the patch
>>>> makes difference... I guess I'll need to reproduce and see what's going on
>>>> in detail.
>>>
>>> We've observed this in arch with util-linux 2.34, and ubuntu 19.10
>>> (eoan ermine) with util-linux 2.33.
>>>
>>> just to be clear, the initial reports didn't involve a zany loop of
>>> mounts, but were triggered by effectively the same thing as systemd
>>> booted a system with a lot of snaps. The reroducer tries to makes
>>> things simpler to reproduce :-). FWIW,  systemd versions were 244 and
>>> 242 for those systems, respectively.
>>
>> Thanks for info! So I think I see what's going on. The two mounts race
>> like:
>>
>> MOUNT1					MOUNT2
>> num = ioctl(LOOP_CTL_GET_FREE)
>> 					num = ioctl(LOOP_CTL_GET_FREE)
>> ioctl("/dev/loop$num", LOOP_SET_FD, ..)
>>   - returns OK
>> 					ioctl("/dev/loop$num", LOOP_SET_FD, ..)
>> 					  - acquires exclusine loop$num
>> 					    reference
>> mount("/dev/loop$num", ...)
>>   - sees exclusive reference from MOUNT2 and fails
>> 					  - sees loop device is already
>> 					    bound and fails
>>
>> It is a bug in the scheme I've chosen that racing LOOP_SET_FD can block
>> perfectly valid mount. I'll think how to fix this...
> 
> So how about attached patch? It fixes the regression for me.

Hi Jan,

A new kernel warning is triggered by blktests block/001 that did not happen
without this patch. Reverting commit 89e524c04fa9 ("loop: Fix mount(2)
failure due to race with LOOP_SET_FD") makes that kernel warning disappear.
Is this reproducible on your setup?

Thanks,

Bart.

kernel: sr 10:0:0:0: [sr1] scsi-1 drive
kernel: scsi host9: scsi_runtime_resume
kernel: sr 10:0:0:0: Attached scsi CD-ROM sr1
kernel: driver: 'sr': driver_bound: bound to device '10:0:0:0'
kernel: bus: 'scsi': really_probe: bound device 10:0:0:0 to driver sr
kernel: scsi 9:0:0:0: CD-ROM            Linux    scsi_debug       0188 PQ: 0 ANSI: 7
kernel: WARNING: CPU: 5 PID: 907 at fs/block_dev.c:1899 __blkdev_put+0x396/0x3a0
systemd-udevd[906]: Process 'cdrom_id --lock-media /dev/sr2' failed with exit code 1.
kernel: bus: 'scsi': driver_probe_device: matched device 9:0:0:0 with driver sr
kernel: Modules linked in: scsi_debug crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 glue_helper crypto_simd joydev cryptd virtio_balloon virtio_console serio_raw qemu_fw_cfg iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables 
autofs4 hid_generic usbhid virtio_net psmouse hid ahci net_failover libahci i2c_piix4 floppy virtio_blk virtio_scsi failover pata_acpi [last unloaded: scsi_debug]
kernel: bus: 'scsi': really_probe: probing driver sr with device 9:0:0:0
kernel: sr 10:0:0:0: Attached scsi generic sg2 type 5
kernel: CPU: 5 PID: 907 Comm: systemd-udevd Not tainted 5.3.0-rc3-dbg+ #5
kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
kernel: RIP: 0010:__blkdev_put+0x396/0x3a0
kernel: Code: 49 8d 7d 08 e8 6b c8 f4 ff 49 8b 45 08 48 85 c0 0f 84 9c fe ff ff 8b b5 64 ff ff ff 48 8b bd 70 ff ff ff ff d0 e9 88 fe ff ff <0f> 0b e9 54 fd ff ff 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 41 57 41
kernel: RSP: 0018:ffff8881111afd18 EFLAGS: 00010202
kernel: RAX: 0000000000000000 RBX: ffff888035472e00 RCX: ffffffff814a4783
kernel: RDX: 0000000000000002 RSI: dffffc0000000000 RDI: ffff888035472ea8
kernel: RBP: ffff8881111afde0 R08: ffffed1006a8e5c4 R09: ffffed1006a8e5c4
kernel: R10: ffff8881111afd08 R11: ffff888035472e1f R12: 0000000000000000
kernel: R13: 00000000080a005d R14: ffff888035472e18 R15: ffff888115e5f028
kernel: sr 9:0:0:0: Power-on or device reset occurred
kernel: FS:  00007f113411d8c0(0000) GS:ffff88811b740000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
systemd-udevd[1257]: Process '/usr/bin/sg_inq --export /dev/sr1' failed with exit code 15.
systemd-udevd[1257]: Process 'cdrom_id --lock-media /dev/sr1' failed with exit code 1.
systemd-udevd[906]: Process '/usr/bin/sg_inq --export /dev/sr3' failed with exit code 15.
systemd-udevd[906]: Process 'cdrom_id --lock-media /dev/sr3' failed with exit code 1.
kernel: sr 9:0:0:0: [sr2] scsi-1 drive
kernel: CR2: 00007ffc4ac7fc98 CR3: 000000011127a001 CR4: 0000000000360ee0
kernel: debugfs: Directory 'sr2' with parent 'block' already present!
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel:  ? __kasan_check_write+0x14/0x20
kernel: sr 9:0:0:0: Attached scsi CD-ROM sr2
kernel:  ? __mutex_unlock_slowpath+0xbd/0x400
kernel:  ? bd_set_size+0x70/0x70
kernel:  ? preempt_count_sub+0x18/0xc0
kernel:  blkdev_put+0x62/0x200
kernel:  blkdev_close+0x49/0x50
kernel:  __fput+0x15c/0x390
kernel: driver: 'sr': driver_bound: bound to device '9:0:0:0'
systemd-udevd[913]: Process '/usr/bin/sg_inq --export /dev/sr1' failed with exit code 15.
kernel:  ____fput+0xe/0x10
kernel:  task_work_run+0xc3/0xf0
kernel: bus: 'scsi': really_probe: bound device 9:0:0:0 to driver sr
kernel:  exit_to_usermode_loop+0xee/0xf0
systemd-udevd[913]: Process 'cdrom_id --lock-media /dev/sr1' failed with exit code 1.
kernel: sr 9:0:0:0: Attached scsi generic sg2 type 5
kernel:  do_syscall_64+0x213/0x270
systemd-udevd[1004]: Process '/usr/bin/sg_inq --export /dev/sr2' failed with exit code 15.
systemd-udevd[906]: Process '/usr/bin/sg_inq --export /dev/sr1' failed with exit code 15.
kernel:  entry_SYSCALL_64_after_hwframe+0x49/0xbe
kernel: RIP: 0033:0x7f11343a0674
kernel: Code: eb 8d e8 bf 1b 02 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8d 05 69 cd 0d 00 8b 00 85 c0 75 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3c c3 0f 1f 00 53 89 fb 48 83 ec 10 e8 44 e7
kernel: RSP: 002b:00007ffc4ac80d78 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
kernel: RAX: 0000000000000000 RBX: 0000000000000006 RCX: 00007f11343a0674
kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000006
kernel: RBP: 00007f113411d7e0 R08: 000055f286c9f4b0 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
kernel: R13: 00000000000061b0 R14: 000055f287600fd0 R15: 000055f28757f3d0
kernel: irq event stamp: 2072492
kernel: hardirqs last  enabled at (2072491): [<ffffffff81ef8b1c>] _raw_spin_unlock_irq+0x2c/0x50
kernel: hardirqs last disabled at (2072492): [<ffffffff8100291a>] trace_hardirqs_off_thunk+0x1a/0x20
kernel: softirqs last  enabled at (2069158): [<ffffffff81043a03>] fpu__copy+0xe3/0x470
kernel: softirqs last disabled at (2069156): [<ffffffff81043975>] fpu__copy+0x55/0x470
kernel: ---[ end trace 352d17ea465743b6 ]---
systemd-udevd[1004]: Process 'cdrom_id --lock-media /dev/sr2' failed with exit code 1.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] loop: Don't change loop device under exclusive opener
  2019-08-05 16:41                 ` Bart Van Assche
@ 2019-08-05 21:01                   ` Tetsuo Handa
  2019-08-07  9:45                   ` Jan Kara
  1 sibling, 0 replies; 19+ messages in thread
From: Tetsuo Handa @ 2019-08-05 21:01 UTC (permalink / raw)
  To: Bart Van Assche, Jan Kara, John Lenton
  Cc: Kai-Heng Feng, Jens Axboe, linux-block, jean-baptiste.lallement

On 2019/08/06 1:41, Bart Van Assche wrote:
> Hi Jan,
> 
> A new kernel warning is triggered by blktests block/001 that did not happen
> without this patch. Reverting commit 89e524c04fa9 ("loop: Fix mount(2)
> failure due to race with LOOP_SET_FD") makes that kernel warning disappear.
> Is this reproducible on your setup?
> 
> Thanks,
> 
> Bart.
> 
> kernel: WARNING: CPU: 5 PID: 907 at fs/block_dev.c:1899 __blkdev_put+0x396/0x3a0

Hmm, this is reported as below one.

  WARNING in __blkdev_put (2)
  https://syzkaller.appspot.com/bug?id=1ac7a7a8440522302ccb634ba69f8e1e6f203902

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] loop: Don't change loop device under exclusive opener
  2019-08-05 16:41                 ` Bart Van Assche
  2019-08-05 21:01                   ` Tetsuo Handa
@ 2019-08-07  9:45                   ` Jan Kara
  2019-08-07 18:45                     ` Bart Van Assche
  2019-08-08 13:37                     ` Jens Axboe
  1 sibling, 2 replies; 19+ messages in thread
From: Jan Kara @ 2019-08-07  9:45 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jan Kara, John Lenton, Kai-Heng Feng, Jens Axboe, Tetsuo Handa,
	linux-block, jean-baptiste.lallement

[-- Attachment #1: Type: text/plain, Size: 2839 bytes --]

On Mon 05-08-19 09:41:39, Bart Van Assche wrote:
> On 7/30/19 6:36 AM, Jan Kara wrote:
> > On Tue 30-07-19 12:16:46, Jan Kara wrote:
> > > On Tue 30-07-19 10:36:59, John Lenton wrote:
> > > > On Tue, 30 Jul 2019 at 10:29, Jan Kara <jack@suse.cz> wrote:
> > > > > 
> > > > > Thanks for the notice and the references. What's your version of
> > > > > util-linux? What your test script does is indeed racy. You have there:
> > > > > 
> > > > > echo Running:
> > > > > for i in {a..z}{a..z}; do
> > > > >      mount $i.squash /mnt/$i &
> > > > > done
> > > > > 
> > > > > So all mount(8) commands will run in parallel and race to setup loop
> > > > > devices with LOOP_SET_FD and mount them. However util-linux (at least in
> > > > > the current version) seems to handle EBUSY from LOOP_SET_FD just fine and
> > > > > retries with the new loop device. So at this point I don't see why the patch
> > > > > makes difference... I guess I'll need to reproduce and see what's going on
> > > > > in detail.
> > > > 
> > > > We've observed this in arch with util-linux 2.34, and ubuntu 19.10
> > > > (eoan ermine) with util-linux 2.33.
> > > > 
> > > > just to be clear, the initial reports didn't involve a zany loop of
> > > > mounts, but were triggered by effectively the same thing as systemd
> > > > booted a system with a lot of snaps. The reroducer tries to makes
> > > > things simpler to reproduce :-). FWIW,  systemd versions were 244 and
> > > > 242 for those systems, respectively.
> > > 
> > > Thanks for info! So I think I see what's going on. The two mounts race
> > > like:
> > > 
> > > MOUNT1					MOUNT2
> > > num = ioctl(LOOP_CTL_GET_FREE)
> > > 					num = ioctl(LOOP_CTL_GET_FREE)
> > > ioctl("/dev/loop$num", LOOP_SET_FD, ..)
> > >   - returns OK
> > > 					ioctl("/dev/loop$num", LOOP_SET_FD, ..)
> > > 					  - acquires exclusine loop$num
> > > 					    reference
> > > mount("/dev/loop$num", ...)
> > >   - sees exclusive reference from MOUNT2 and fails
> > > 					  - sees loop device is already
> > > 					    bound and fails
> > > 
> > > It is a bug in the scheme I've chosen that racing LOOP_SET_FD can block
> > > perfectly valid mount. I'll think how to fix this...
> > 
> > So how about attached patch? It fixes the regression for me.
 
Hi Bart,

> A new kernel warning is triggered by blktests block/001 that did not happen
> without this patch. Reverting commit 89e524c04fa9 ("loop: Fix mount(2)
> failure due to race with LOOP_SET_FD") makes that kernel warning disappear.
> Is this reproducible on your setup?

Thanks for report! Hum, no, it seems the warning doesn't trigger in my test
VM. But reviewing the mentioned commit with fresh head, I can see where I
did a mistake during my conversion of blkdev_get(). Does attached patch fix
the warning for you?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-bdev-Fixup-error-handling-in-blkdev_get.patch --]
[-- Type: text/x-patch, Size: 1501 bytes --]

From c4cd39244088d8de548264bc33dc9fb8f0f1db2d Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 7 Aug 2019 11:36:47 +0200
Subject: [PATCH] bdev: Fixup error handling in blkdev_get()

Commit 89e524c04fa9 ("loop: Fix mount(2) failure due to race with
LOOP_SET_FD") converted blkdev_get() to use the new helpers for
finishing claiming of a block device. However the conversion botched the
error handling in blkdev_get() and thus the bdev has been marked as held
even in case __blkdev_get() returned error. This led to occasional
warnings with block/001 test from blktests like:

kernel: WARNING: CPU: 5 PID: 907 at fs/block_dev.c:1899 __blkdev_put+0x396/0x3a0

Correct the error handling.

CC: stable@vger.kernel.org
Fixes: 89e524c04fa9 ("loop: Fix mount(2) failure due to race with LOOP_SET_FD")
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/block_dev.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index a6f7c892cb4a..7db181581606 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1754,7 +1754,10 @@ int blkdev_get(struct block_device *bdev, fmode_t mode, void *holder)
 
 		/* finish claiming */
 		mutex_lock(&bdev->bd_mutex);
-		bd_finish_claiming(bdev, whole, holder);
+		if (!res)
+			bd_finish_claiming(bdev, whole, holder);
+		else
+			bd_abort_claiming(bdev, whole, holder);
 		/*
 		 * Block event polling for write claims if requested.  Any
 		 * write holder makes the write_holder state stick until
-- 
2.16.4


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] loop: Don't change loop device under exclusive opener
  2019-08-07  9:45                   ` Jan Kara
@ 2019-08-07 18:45                     ` Bart Van Assche
  2019-08-08 13:37                     ` Jens Axboe
  1 sibling, 0 replies; 19+ messages in thread
From: Bart Van Assche @ 2019-08-07 18:45 UTC (permalink / raw)
  To: Jan Kara
  Cc: John Lenton, Kai-Heng Feng, Jens Axboe, Tetsuo Handa,
	linux-block, jean-baptiste.lallement

On 8/7/19 2:45 AM, Jan Kara wrote:
> On Mon 05-08-19 09:41:39, Bart Van Assche wrote:
>> A new kernel warning is triggered by blktests block/001 that did not happen
>> without this patch. Reverting commit 89e524c04fa9 ("loop: Fix mount(2)
>> failure due to race with LOOP_SET_FD") makes that kernel warning disappear.
>> Is this reproducible on your setup?
> 
> Thanks for report! Hum, no, it seems the warning doesn't trigger in my test
> VM. But reviewing the mentioned commit with fresh head, I can see where I
> did a mistake during my conversion of blkdev_get(). Does attached patch fix
> the warning for you?

Hi Jan,

That patch indeed fixes the warning. Feel free to add my Tested-by.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] loop: Don't change loop device under exclusive opener
  2019-08-07  9:45                   ` Jan Kara
  2019-08-07 18:45                     ` Bart Van Assche
@ 2019-08-08 13:37                     ` Jens Axboe
  1 sibling, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2019-08-08 13:37 UTC (permalink / raw)
  To: Jan Kara, Bart Van Assche
  Cc: John Lenton, Kai-Heng Feng, Tetsuo Handa, linux-block,
	jean-baptiste.lallement

On 8/7/19 2:45 AM, Jan Kara wrote:
> On Mon 05-08-19 09:41:39, Bart Van Assche wrote:
>> On 7/30/19 6:36 AM, Jan Kara wrote:
>>> On Tue 30-07-19 12:16:46, Jan Kara wrote:
>>>> On Tue 30-07-19 10:36:59, John Lenton wrote:
>>>>> On Tue, 30 Jul 2019 at 10:29, Jan Kara <jack@suse.cz> wrote:
>>>>>>
>>>>>> Thanks for the notice and the references. What's your version of
>>>>>> util-linux? What your test script does is indeed racy. You have there:
>>>>>>
>>>>>> echo Running:
>>>>>> for i in {a..z}{a..z}; do
>>>>>>       mount $i.squash /mnt/$i &
>>>>>> done
>>>>>>
>>>>>> So all mount(8) commands will run in parallel and race to setup loop
>>>>>> devices with LOOP_SET_FD and mount them. However util-linux (at least in
>>>>>> the current version) seems to handle EBUSY from LOOP_SET_FD just fine and
>>>>>> retries with the new loop device. So at this point I don't see why the patch
>>>>>> makes difference... I guess I'll need to reproduce and see what's going on
>>>>>> in detail.
>>>>>
>>>>> We've observed this in arch with util-linux 2.34, and ubuntu 19.10
>>>>> (eoan ermine) with util-linux 2.33.
>>>>>
>>>>> just to be clear, the initial reports didn't involve a zany loop of
>>>>> mounts, but were triggered by effectively the same thing as systemd
>>>>> booted a system with a lot of snaps. The reroducer tries to makes
>>>>> things simpler to reproduce :-). FWIW,  systemd versions were 244 and
>>>>> 242 for those systems, respectively.
>>>>
>>>> Thanks for info! So I think I see what's going on. The two mounts race
>>>> like:
>>>>
>>>> MOUNT1					MOUNT2
>>>> num = ioctl(LOOP_CTL_GET_FREE)
>>>> 					num = ioctl(LOOP_CTL_GET_FREE)
>>>> ioctl("/dev/loop$num", LOOP_SET_FD, ..)
>>>>    - returns OK
>>>> 					ioctl("/dev/loop$num", LOOP_SET_FD, ..)
>>>> 					  - acquires exclusine loop$num
>>>> 					    reference
>>>> mount("/dev/loop$num", ...)
>>>>    - sees exclusive reference from MOUNT2 and fails
>>>> 					  - sees loop device is already
>>>> 					    bound and fails
>>>>
>>>> It is a bug in the scheme I've chosen that racing LOOP_SET_FD can block
>>>> perfectly valid mount. I'll think how to fix this...
>>>
>>> So how about attached patch? It fixes the regression for me.
>   
> Hi Bart,
> 
>> A new kernel warning is triggered by blktests block/001 that did not happen
>> without this patch. Reverting commit 89e524c04fa9 ("loop: Fix mount(2)
>> failure due to race with LOOP_SET_FD") makes that kernel warning disappear.
>> Is this reproducible on your setup?
> 
> Thanks for report! Hum, no, it seems the warning doesn't trigger in my test
> VM. But reviewing the mentioned commit with fresh head, I can see where I
> did a mistake during my conversion of blkdev_get(). Does attached patch fix
> the warning for you?

I've queued this up, Jan. Thanks for taking a look at it.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, back to index

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-16 14:01 [PATCH] loop: Don't change loop device under exclusive opener Jan Kara
2019-05-16 20:44 ` Jens Axboe
2019-05-27 12:29   ` Jan Kara
2019-05-27 13:34     ` Jens Axboe
2019-07-18  8:15       ` Kai-Heng Feng
2019-07-30  9:29         ` Jan Kara
2019-07-30  9:36           ` John Lenton
2019-07-30 10:16             ` Jan Kara
2019-07-30 13:36               ` Jan Kara
2019-07-30 17:59                 ` Kai-Heng Feng
2019-07-30 19:17                 ` Jens Axboe
2019-07-30 21:11                   ` John Lenton
2019-07-31  8:56                   ` Jan Kara
2019-08-05 16:41                 ` Bart Van Assche
2019-08-05 21:01                   ` Tetsuo Handa
2019-08-07  9:45                   ` Jan Kara
2019-08-07 18:45                     ` Bart Van Assche
2019-08-08 13:37                     ` Jens Axboe
2019-07-30 10:16           ` Tetsuo Handa

Linux-Block Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-block/0 linux-block/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-block linux-block/ https://lore.kernel.org/linux-block \
		linux-block@vger.kernel.org linux-block@archiver.kernel.org
	public-inbox-index linux-block


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-block


AGPL code for this site: git clone https://public-inbox.org/ public-inbox