All of lore.kernel.org
 help / color / mirror / Atom feed
* Racy loop device reuse logic
@ 2022-01-13 15:47 Jan Kara
  2022-01-19  8:52 ` Jan Kara
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2022-01-13 15:47 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux, linux-block, Tetsuo Handa

Hello,

Tetsuo has been doing some changes to the loop device shutdown in the
kernel and that broke LTP that is doing essentially the following loop:

while :; do mount -o loop,ro isofs.iso isofs/; umount isofs/; done

And this loop is broken because of a subtle interaction with systemd-udev
that also opens the loop device. The race seems to be in mount(8) handling
itself and the altered kernel timing makes it happen. It look like:

bash					systemd-udev
  mount -o loop,ro isofs.iso isofs/
    /dev/loop0 is created and bound to isofs.iso, autoclear is set for
    loop0
  					opens /dev/loop0
  umount isofs/
  loop0 still lives because systemd-udev still has device open
  mount -o loop,ro isofs.iso isofs/
    gets to mnt_context_setup_loopdev()
      loopcxt_find_overlap()
      sees loop0 is still valid and with proper parameters
      reuse = true;
					close /dev/loop0
					  last fd closed => loop0 is
					    cleaned up
      loopcxt_get_fd()
        opens loop0 but it is no longer the device we wanted!
    calls mount(2) which fails because we cannot read from the loop device

It seems to me that mnt_context_setup_loopdev() should actually recheck
that loop device parameters still match what we need after opening
/dev/loop0 (if LOOP_GET_STATUS ioctl succeeds on the fd, you are guaranteed
the loop device is in that state and will not be torn down under your
hands). What do you think?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Racy loop device reuse logic
  2022-01-13 15:47 Racy loop device reuse logic Jan Kara
@ 2022-01-19  8:52 ` Jan Kara
  2022-01-19 11:30   ` Tetsuo Handa
  2022-01-19 11:39   ` Karel Zak
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Kara @ 2022-01-19  8:52 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux, linux-block, Tetsuo Handa

Ping? Any opinion?

								Honza

On Thu 13-01-22 16:47:35, Jan Kara wrote:
> Hello,
> 
> Tetsuo has been doing some changes to the loop device shutdown in the
> kernel and that broke LTP that is doing essentially the following loop:
> 
> while :; do mount -o loop,ro isofs.iso isofs/; umount isofs/; done
> 
> And this loop is broken because of a subtle interaction with systemd-udev
> that also opens the loop device. The race seems to be in mount(8) handling
> itself and the altered kernel timing makes it happen. It look like:
> 
> bash					systemd-udev
>   mount -o loop,ro isofs.iso isofs/
>     /dev/loop0 is created and bound to isofs.iso, autoclear is set for
>     loop0
>   					opens /dev/loop0
>   umount isofs/
>   loop0 still lives because systemd-udev still has device open
>   mount -o loop,ro isofs.iso isofs/
>     gets to mnt_context_setup_loopdev()
>       loopcxt_find_overlap()
>       sees loop0 is still valid and with proper parameters
>       reuse = true;
> 					close /dev/loop0
> 					  last fd closed => loop0 is
> 					    cleaned up
>       loopcxt_get_fd()
>         opens loop0 but it is no longer the device we wanted!
>     calls mount(2) which fails because we cannot read from the loop device
> 
> It seems to me that mnt_context_setup_loopdev() should actually recheck
> that loop device parameters still match what we need after opening
> /dev/loop0 (if LOOP_GET_STATUS ioctl succeeds on the fd, you are guaranteed
> the loop device is in that state and will not be torn down under your
> hands). What do you think?
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Racy loop device reuse logic
  2022-01-19  8:52 ` Jan Kara
@ 2022-01-19 11:30   ` Tetsuo Handa
  2022-01-19 21:34     ` Jan Kara
  2022-01-19 11:39   ` Karel Zak
  1 sibling, 1 reply; 9+ messages in thread
From: Tetsuo Handa @ 2022-01-19 11:30 UTC (permalink / raw)
  To: Jan Kara; +Cc: util-linux, linux-block, Karel Zak

I found a way to avoid this race by splitting lo_open() into two phases using task_work_add().
Christoph Hellwig is trying to take a look at https://lkml.kernel.org/r/f6b947d0-1047-66b3-0243-af5017c9ab55@I-love.SAKURA.ne.jp .

On 2022/01/12 22:16, Jan Kara wrote:
> I don't think we can fully solve this race in the kernel and IMO it is
> futile to try that - depending on when exactly systemd-udev decides to

s/systemd-udev/systemd-udevd/g

> close /dev/loop0 and how exactly mount decides to implement its loop device
> reuse, different strategies will / will not work.

Assuming memory leak

 +	spin_unlock(&loop_delete_spinlock);
-+	if (err)
-+		return err;
++	if (err) {
++		kfree(lot);
++		kfree(lrt);
++		return err;
++	}
 +	/* Add to spool, for -ENOMEM upon release() cannot be handled. */

in that patch is fixed, trying to solve this race on the kernel side looks not such bad.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Racy loop device reuse logic
  2022-01-19  8:52 ` Jan Kara
  2022-01-19 11:30   ` Tetsuo Handa
@ 2022-01-19 11:39   ` Karel Zak
  2022-01-19 21:34     ` Jan Kara
  1 sibling, 1 reply; 9+ messages in thread
From: Karel Zak @ 2022-01-19 11:39 UTC (permalink / raw)
  To: Jan Kara; +Cc: util-linux, linux-block, Tetsuo Handa

On Wed, Jan 19, 2022 at 09:52:47AM +0100, Jan Kara wrote:
> Ping? Any opinion?

 Sorry for the delay.

> On Thu 13-01-22 16:47:35, Jan Kara wrote:
> > Hello,
> > 
> > Tetsuo has been doing some changes to the loop device shutdown in the
> > kernel and that broke LTP that is doing essentially the following loop:
> > 
> > while :; do mount -o loop,ro isofs.iso isofs/; umount isofs/; done
> > 
> > And this loop is broken because of a subtle interaction with systemd-udev
> > that also opens the loop device. The race seems to be in mount(8) handling
> > itself and the altered kernel timing makes it happen. It look like:
> > 
> > bash					systemd-udev
> >   mount -o loop,ro isofs.iso isofs/
> >     /dev/loop0 is created and bound to isofs.iso, autoclear is set for
> >     loop0
> >   					opens /dev/loop0
> >   umount isofs/
> >   loop0 still lives because systemd-udev still has device open
> >   mount -o loop,ro isofs.iso isofs/
> >     gets to mnt_context_setup_loopdev()
> >       loopcxt_find_overlap()
> >       sees loop0 is still valid and with proper parameters
> >       reuse = true;
> > 					close /dev/loop0
> > 					  last fd closed => loop0 is
> > 					    cleaned up
> >       loopcxt_get_fd()
> >         opens loop0 but it is no longer the device we wanted!
> >     calls mount(2) which fails because we cannot read from the loop device
> > 
> > It seems to me that mnt_context_setup_loopdev() should actually recheck
> > that loop device parameters still match what we need after opening
> > /dev/loop0 (if LOOP_GET_STATUS ioctl succeeds on the fd, you are guaranteed
> > the loop device is in that state and will not be torn down under your
> > hands). What do you think?

Seems like elegant solution. Please, send a patch.

(It would be possible to be care about autoclear in
loopcxt_find_overlap(), but it sounds complicated and probably still
racy.

> > 
> > 								Honza
> > -- 
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
> 

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Racy loop device reuse logic
  2022-01-19 11:30   ` Tetsuo Handa
@ 2022-01-19 21:34     ` Jan Kara
  2022-01-20  8:50       ` Karel Zak
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2022-01-19 21:34 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Jan Kara, util-linux, linux-block, Karel Zak

On Wed 19-01-22 20:30:52, Tetsuo Handa wrote:
> I found a way to avoid this race by splitting lo_open() into two phases
> using task_work_add().  Christoph Hellwig is trying to take a look at
> https://lkml.kernel.org/r/f6b947d0-1047-66b3-0243-af5017c9ab55@I-love.SAKURA.ne.jp
> .

No, you have found a way to make the race window for mount(8) smaller. And
I still disagree with that kernel change because it is making kernel more
complex only to make the race window smaller. On another machine or with
different scheduling decisions, you can still hit this race. This problem
must be fixed in mount...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Racy loop device reuse logic
  2022-01-19 11:39   ` Karel Zak
@ 2022-01-19 21:34     ` Jan Kara
  2022-01-20 12:13       ` Jan Kara
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2022-01-19 21:34 UTC (permalink / raw)
  To: Karel Zak; +Cc: Jan Kara, util-linux, linux-block, Tetsuo Handa

On Wed 19-01-22 12:39:00, Karel Zak wrote:
> On Wed, Jan 19, 2022 at 09:52:47AM +0100, Jan Kara wrote:
> > Ping? Any opinion?
> 
>  Sorry for the delay.
> 
> > On Thu 13-01-22 16:47:35, Jan Kara wrote:
> > > Hello,
> > > 
> > > Tetsuo has been doing some changes to the loop device shutdown in the
> > > kernel and that broke LTP that is doing essentially the following loop:
> > > 
> > > while :; do mount -o loop,ro isofs.iso isofs/; umount isofs/; done
> > > 
> > > And this loop is broken because of a subtle interaction with systemd-udev
> > > that also opens the loop device. The race seems to be in mount(8) handling
> > > itself and the altered kernel timing makes it happen. It look like:
> > > 
> > > bash					systemd-udev
> > >   mount -o loop,ro isofs.iso isofs/
> > >     /dev/loop0 is created and bound to isofs.iso, autoclear is set for
> > >     loop0
> > >   					opens /dev/loop0
> > >   umount isofs/
> > >   loop0 still lives because systemd-udev still has device open
> > >   mount -o loop,ro isofs.iso isofs/
> > >     gets to mnt_context_setup_loopdev()
> > >       loopcxt_find_overlap()
> > >       sees loop0 is still valid and with proper parameters
> > >       reuse = true;
> > > 					close /dev/loop0
> > > 					  last fd closed => loop0 is
> > > 					    cleaned up
> > >       loopcxt_get_fd()
> > >         opens loop0 but it is no longer the device we wanted!
> > >     calls mount(2) which fails because we cannot read from the loop device
> > > 
> > > It seems to me that mnt_context_setup_loopdev() should actually recheck
> > > that loop device parameters still match what we need after opening
> > > /dev/loop0 (if LOOP_GET_STATUS ioctl succeeds on the fd, you are guaranteed
> > > the loop device is in that state and will not be torn down under your
> > > hands). What do you think?
> 
> Seems like elegant solution. Please, send a patch.

OK, will do.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Racy loop device reuse logic
  2022-01-19 21:34     ` Jan Kara
@ 2022-01-20  8:50       ` Karel Zak
  2022-01-20 10:09         ` Tetsuo Handa
  0 siblings, 1 reply; 9+ messages in thread
From: Karel Zak @ 2022-01-20  8:50 UTC (permalink / raw)
  To: Jan Kara; +Cc: Tetsuo Handa, util-linux, linux-block

On Wed, Jan 19, 2022 at 10:34:15PM +0100, Jan Kara wrote:
> On Wed 19-01-22 20:30:52, Tetsuo Handa wrote:
> > I found a way to avoid this race by splitting lo_open() into two phases
> > using task_work_add().  Christoph Hellwig is trying to take a look at
> > https://lkml.kernel.org/r/f6b947d0-1047-66b3-0243-af5017c9ab55@I-love.SAKURA.ne.jp
> > .
> 
> No, you have found a way to make the race window for mount(8) smaller. And
> I still disagree with that kernel change because it is making kernel more
> complex only to make the race window smaller. On another machine or with
> different scheduling decisions, you can still hit this race. This problem
> must be fixed in mount...

+1

I think Jan is right. In this case mount(8) is not robust enough. It
reads info about the device from /sys and then it opens the device.
Unfortunately, whatever can happen before the open() call.

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Racy loop device reuse logic
  2022-01-20  8:50       ` Karel Zak
@ 2022-01-20 10:09         ` Tetsuo Handa
  0 siblings, 0 replies; 9+ messages in thread
From: Tetsuo Handa @ 2022-01-20 10:09 UTC (permalink / raw)
  To: Karel Zak, Jan Kara; +Cc: util-linux, linux-block

On 2022/01/20 17:50, Karel Zak wrote:
> On Wed, Jan 19, 2022 at 10:34:15PM +0100, Jan Kara wrote:
>> On Wed 19-01-22 20:30:52, Tetsuo Handa wrote:
>>> I found a way to avoid this race by splitting lo_open() into two phases
>>> using task_work_add().  Christoph Hellwig is trying to take a look at
>>> https://lkml.kernel.org/r/f6b947d0-1047-66b3-0243-af5017c9ab55@I-love.SAKURA.ne.jp
>>> .
>>
>> No, you have found a way to make the race window for mount(8) smaller. And
>> I still disagree with that kernel change because it is making kernel more
>> complex only to make the race window smaller. On another machine or with
>> different scheduling decisions, you can still hit this race. This problem
>> must be fixed in mount...
> 
> +1
> 
> I think Jan is right. In this case mount(8) is not robust enough. It
> reads info about the device from /sys and then it opens the device.
> Unfortunately, whatever can happen before the open() call.
> 

I'm not objecting to fix /bin/mount itself. Please check
[PATCH 4/4] loop: wait for __loop_clr_fd() to complete upon lo_open()
in https://lkml.kernel.org/r/cdaf1346-2885-f0da-8878-12264bd48348@I-love.SAKURA.ne.jp .

  /bin/mount needs to be updated to check ioctl(LOOP_GET_STATUS) after open()
  in order to confirm that lo->lo_state remains Lo_bound. But we need some
  migration period for allowing users to update their util-linux package.
  Thus, meantime emulate serialization between lo_open() and lo_release()
  without using disk->open_mutex.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Racy loop device reuse logic
  2022-01-19 21:34     ` Jan Kara
@ 2022-01-20 12:13       ` Jan Kara
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Kara @ 2022-01-20 12:13 UTC (permalink / raw)
  To: Karel Zak; +Cc: Jan Kara, util-linux, linux-block, Tetsuo Handa

On Wed 19-01-22 22:34:49, Jan Kara wrote:
> On Wed 19-01-22 12:39:00, Karel Zak wrote:
> > On Wed, Jan 19, 2022 at 09:52:47AM +0100, Jan Kara wrote:
> > > Ping? Any opinion?
> > 
> >  Sorry for the delay.
> > 
> > > On Thu 13-01-22 16:47:35, Jan Kara wrote:
> > > > Hello,
> > > > 
> > > > Tetsuo has been doing some changes to the loop device shutdown in the
> > > > kernel and that broke LTP that is doing essentially the following loop:
> > > > 
> > > > while :; do mount -o loop,ro isofs.iso isofs/; umount isofs/; done
> > > > 
> > > > And this loop is broken because of a subtle interaction with systemd-udev
> > > > that also opens the loop device. The race seems to be in mount(8) handling
> > > > itself and the altered kernel timing makes it happen. It look like:
> > > > 
> > > > bash					systemd-udev
> > > >   mount -o loop,ro isofs.iso isofs/
> > > >     /dev/loop0 is created and bound to isofs.iso, autoclear is set for
> > > >     loop0
> > > >   					opens /dev/loop0
> > > >   umount isofs/
> > > >   loop0 still lives because systemd-udev still has device open
> > > >   mount -o loop,ro isofs.iso isofs/
> > > >     gets to mnt_context_setup_loopdev()
> > > >       loopcxt_find_overlap()
> > > >       sees loop0 is still valid and with proper parameters
> > > >       reuse = true;
> > > > 					close /dev/loop0
> > > > 					  last fd closed => loop0 is
> > > > 					    cleaned up
> > > >       loopcxt_get_fd()
> > > >         opens loop0 but it is no longer the device we wanted!
> > > >     calls mount(2) which fails because we cannot read from the loop device
> > > > 
> > > > It seems to me that mnt_context_setup_loopdev() should actually recheck
> > > > that loop device parameters still match what we need after opening
> > > > /dev/loop0 (if LOOP_GET_STATUS ioctl succeeds on the fd, you are guaranteed
> > > > the loop device is in that state and will not be torn down under your
> > > > hands). What do you think?
> > 
> > Seems like elegant solution. Please, send a patch.
> 
> OK, will do.

Patch sent which fixes the problem for me. Related to this, I have found out
that loop device reuse detection code sometimes fails with EPERM (if I run
the mount loop long enough). I have tracked this down to e.g.
loopcxt_get_sizelimit() calling ul_path_read_u64() which fails (loop device
just got torn down) and ul_path_read_u64() always returns -1 on error which
gets somewhat confusingly translated to EPERM error. I think this should be
handled more gracefully so I'll send patches to fix this as well.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-01-20 12:13 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-13 15:47 Racy loop device reuse logic Jan Kara
2022-01-19  8:52 ` Jan Kara
2022-01-19 11:30   ` Tetsuo Handa
2022-01-19 21:34     ` Jan Kara
2022-01-20  8:50       ` Karel Zak
2022-01-20 10:09         ` Tetsuo Handa
2022-01-19 11:39   ` Karel Zak
2022-01-19 21:34     ` Jan Kara
2022-01-20 12:13       ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.