All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] btrfs-safe implementation of -oloop
@ 2016-04-12 18:21 Stanislav Brabec
  2016-04-13 11:08 ` Karel Zak
  2016-04-22 11:22 ` Karel Zak
  0 siblings, 2 replies; 7+ messages in thread
From: Stanislav Brabec @ 2016-04-12 18:21 UTC (permalink / raw)
  To: util-linux

As it was discussed in LKML[1], kernel has no way to detect backing
file data changes if multiple loop devices have the same backing file.

Exactly this happened if -oloop is used multiple times with the same
source.

It could not happen in past, but with introduction of btrfs sub-volumes
it has a perfectly legal use cases.

This patch set introduces a new behavior of -oloop:

First check, whether the same backing file with the same offset is
already used. If yes, reuse already initialized loop device.

If not, initialize another loop device.

There are some controversial things with these patches. Some of them
cannot be fixed.

- Kernel never returns EBUSY for LOOP_CLR_FD. It prevents errors when
   mount will steal other's loop device. But I am not sure, whether it
   will work correctly in all possible situations. Anyway, I don't see a
   better solution.

- If the same backing file is already used for a read-only loop device,
   there is no safe way to continue.
   - There is no way to turn R/O loop to R/W in kernel.
   - If another loop is initialized, changes will not propagate to R/O
     volume.
   - One would need to umount all R/O devices, initialize loop R/W, and
     then everything mount again.
   I can imagine partial solution: Introduce looprw option. Such option
   would cause to initialize loop device R/W even for R/O mount.

- If the same backing file is already used for a loop device with
   correct offset, but incorrect sizelimit, there is no solution. The
   current implementation does not check for it.

- There exists a change for a race condition between device lookup and
   mount syscall.

- The implementation does not check for crypto. I think it is not a big
   problem, as it makes no sense to initialize the same backing file as
   encrypted and non-encrypted at once.

References:
[1] https://lkml.org/lkml/2016/2/26/897

-- 
Best Regards / S pozdravem,

Stanislav Brabec
software developer
---------------------------------------------------------------------
SUSE LINUX, s. r. o.                         e-mail: sbrabec@suse.com
Lihovarská 1060/12                            tel: +49 911 7405384547
190 00 Praha 9                                 fax:  +420 284 084 001
Czech Republic                                    http://www.suse.cz/
PGP: 830B 40D5 9E05 35D8 5E27 6FA3 717C 209F A04F CD76

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/3] btrfs-safe implementation of -oloop
  2016-04-12 18:21 [PATCH 0/3] btrfs-safe implementation of -oloop Stanislav Brabec
@ 2016-04-13 11:08 ` Karel Zak
  2016-04-13 11:29   ` Ruediger Meier
  2016-04-13 12:52   ` Stanislav Brabec
  2016-04-22 11:22 ` Karel Zak
  1 sibling, 2 replies; 7+ messages in thread
From: Karel Zak @ 2016-04-13 11:08 UTC (permalink / raw)
  To: Stanislav Brabec; +Cc: util-linux

On Tue, Apr 12, 2016 at 08:21:59PM +0200, Stanislav Brabec wrote:
> Exactly this happened if -oloop is used multiple times with the same
> source.
> 
> It could not happen in past, but with introduction of btrfs sub-volumes
> it has a perfectly legal use cases.
> 
> This patch set introduces a new behavior of -oloop:
> 
> First check, whether the same backing file with the same offset is
> already used. If yes, reuse already initialized loop device.
> 
> If not, initialize another loop device.

I think this behaviour (re-use loopdev) is better, I thought about it
years ago, but it has never been implemented due to backward
compatibility. Now it's obvious it's the right way to go.

> There are some controversial things with these patches. Some of them
> cannot be fixed.
> 
> - Kernel never returns EBUSY for LOOP_CLR_FD. It prevents errors when
>   mount will steal other's loop device. But I am not sure, whether it
>   will work correctly in all possible situations. Anyway, I don't see a
>   better solution.
> 
> - If the same backing file is already used for a read-only loop device,
>   there is no safe way to continue.

Why? Just re-use the read-only loopdev. We don't have have to promise
read-write device if the backend is read only. This is generic
behavior used in another cases too. For example if you try to mount
NFS exported by server in read-only mode than your "rw" is ignored and
result is "ro", the same is with read-only media (CDROM, etc.).

IMHO all we need is a warning, "backing file already used as
read-only, mounting read-only".

>   - There is no way to turn R/O loop to R/W in kernel.
>   - If another loop is initialized, changes will not propagate to R/O
>     volume.
>   - One would need to umount all R/O devices, initialize loop R/W, and
>     then everything mount again.
>   I can imagine partial solution: Introduce looprw option. Such option
>   would cause to initialize loop device R/W even for R/O mount.

Sounds like over-engineering :-)
 
> - If the same backing file is already used for a loop device with
>   correct offset, but incorrect sizelimit, there is no solution. The
>   current implementation does not check for it.

IMHO the current "check start offset only" is a poor solution if we want
to re-use loopdevs. It would be better to think about offset+limit
as about partition and about backing-file as about whole-disk. 

It means check if the area specified by offset+limit does not overlap
any existing backing-file mapping. It should be possible to re-use
loopdev (by mount(8)) only if offset and size matches.

It should be also forbidden to create a new (non re-used) loopdev if
area specified by offset and sizelimit overlap any existing backing
file mapping. This is important!

Maybe we can also improve losetup to warn about overlapping loop
devices.

> - There exists a change for a race condition between device lookup and
>   mount syscall.

Sure.

> - The implementation does not check for crypto. I think it is not a big
>   problem, as it makes no sense to initialize the same backing file as
>   encrypted and non-encrypted at once.

There is nothing like loop devices encryption :-)

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/3] btrfs-safe implementation of -oloop
  2016-04-13 11:08 ` Karel Zak
@ 2016-04-13 11:29   ` Ruediger Meier
  2016-04-13 11:41     ` Karel Zak
  2016-04-13 12:52   ` Stanislav Brabec
  1 sibling, 1 reply; 7+ messages in thread
From: Ruediger Meier @ 2016-04-13 11:29 UTC (permalink / raw)
  To: Karel Zak; +Cc: Stanislav Brabec, util-linux

On Wednesday 13 April 2016, Karel Zak wrote:
> On Tue, Apr 12, 2016 at 08:21:59PM +0200, Stanislav Brabec wrote:
> > Exactly this happened if -oloop is used multiple times with the
> > same source.
> >
> > It could not happen in past, but with introduction of btrfs
> > sub-volumes it has a perfectly legal use cases.
> >
> > This patch set introduces a new behavior of -oloop:
> >
> > First check, whether the same backing file with the same offset is
> > already used. If yes, reuse already initialized loop device.
> >
> > If not, initialize another loop device.
>
> I think this behaviour (re-use loopdev) is better, I thought about it
> years ago, but it has never been implemented due to backward
> compatibility. Now it's obvious it's the right way to go.
>
> > There are some controversial things with these patches. Some of
> > them cannot be fixed.
> >
> > - Kernel never returns EBUSY for LOOP_CLR_FD. It prevents errors
> > when mount will steal other's loop device. But I am not sure,
> > whether it will work correctly in all possible situations. Anyway,
> > I don't see a better solution.
> >
> > - If the same backing file is already used for a read-only loop
> > device, there is no safe way to continue.
>
> Why? Just re-use the read-only loopdev. We don't have have to promise
> read-write device if the backend is read only. This is generic
> behavior used in another cases too. For example if you try to mount
> NFS exported by server in read-only mode than your "rw" is ignored
> and result is "ro", the same is with read-only media (CDROM, etc.).
>
> IMHO all we need is a warning, "backing file already used as
> read-only, mounting read-only".
>
> >   - There is no way to turn R/O loop to R/W in kernel.
> >   - If another loop is initialized, changes will not propagate to
> > R/O volume.
> >   - One would need to umount all R/O devices, initialize loop R/W,
> > and then everything mount again.
> >   I can imagine partial solution: Introduce looprw option. Such
> > option would cause to initialize loop device R/W even for R/O
> > mount.
>
> Sounds like over-engineering :-)
>
> > - If the same backing file is already used for a loop device with
> >   correct offset, but incorrect sizelimit, there is no solution.
> > The current implementation does not check for it.
>
> IMHO the current "check start offset only" is a poor solution if we
> want to re-use loopdevs. It would be better to think about
> offset+limit as about partition and about backing-file as about
> whole-disk.
>
> It means check if the area specified by offset+limit does not overlap
> any existing backing-file mapping. It should be possible to re-use
> loopdev (by mount(8)) only if offset and size matches.
>
> It should be also forbidden to create a new (non re-used) loopdev if
> area specified by offset and sizelimit overlap any existing backing
> file mapping. This is important!
>
> Maybe we can also improve losetup to warn about overlapping loop
> devices.
>
> > - There exists a change for a race condition between device lookup
> > and mount syscall.
>
> Sure.
>
> > - The implementation does not check for crypto. I think it is not a
> > big problem, as it makes no sense to initialize the same backing
> > file as encrypted and non-encrypted at once.
>
> There is nothing like loop devices encryption :-)

openSUSE had cryptoloop before 12.2. But it's not supported anymore.
There was also losetup -e:
       -e, -E, --encryption encryption_type
              enable data encryption with specified name or number

cu,
Rudi

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/3] btrfs-safe implementation of -oloop
  2016-04-13 11:29   ` Ruediger Meier
@ 2016-04-13 11:41     ` Karel Zak
  0 siblings, 0 replies; 7+ messages in thread
From: Karel Zak @ 2016-04-13 11:41 UTC (permalink / raw)
  To: Ruediger Meier; +Cc: Stanislav Brabec, util-linux

On Wed, Apr 13, 2016 at 01:29:35PM +0200, Ruediger Meier wrote:
> > There is nothing like loop devices encryption :-)
> 
> openSUSE had cryptoloop before 12.2. But it's not supported anymore.
> There was also losetup -e:
>        -e, -E, --encryption encryption_type
>               enable data encryption with specified name or number

   /me is sarcastic sometimes :-)  
 

It wasn't easy to force Suse to sent:

  commit 5cf05c71472bf7230075cbdcd5cd6eb12b1d3654
  Author: Ludwig Nussel <ludwig.nussel@suse.de>
  Date:   Tue Sep 11 10:46:11 2012 +0200

    mount: losetup: remove obsolete encryption support


... but I guess we all are happy with it after years.

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/3] btrfs-safe implementation of -oloop
  2016-04-13 11:08 ` Karel Zak
  2016-04-13 11:29   ` Ruediger Meier
@ 2016-04-13 12:52   ` Stanislav Brabec
  1 sibling, 0 replies; 7+ messages in thread
From: Stanislav Brabec @ 2016-04-13 12:52 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux

Karel Zak wrote:
> On Tue, Apr 12, 2016 at 08:21:59PM +0200, Stanislav Brabec wrote:

>> - If the same backing file is already used for a read-only loop device,
>>    there is no safe way to continue.
>
> Why? Just re-use the read-only loopdev. We don't have have to promise
> read-write device if the backend is read only. This is generic
> behavior used in another cases too. For example if you try to mount
> NFS exported by server in read-only mode than your "rw" is ignored and
> result is "ro", the same is with read-only media (CDROM, etc.).

There is a problem that mount itself initializes loop as R/O:

# ./mount -oro,loop /btrfs.img /mnt/1
# ./mount -oloop,subvol=/ /btrfs.img /mnt/2
mount: /btrfs.img is used as read only loop, mounting read-only

> IMHO all we need is a warning, "backing file already used as
> read-only, mounting read-only".

This is part of patch 3. See above. Feel free to improve the warning.

>>    - There is no way to turn R/O loop to R/W in kernel.
>>    - If another loop is initialized, changes will not propagate to R/O
>>      volume.
>>    - One would need to umount all R/O devices, initialize loop R/W, and
>>      then everything mount again.
>>    I can imagine partial solution: Introduce looprw option. Such option
>>    would cause to initialize loop device R/W even for R/O mount.
>
> Sounds like over-engineering :-)

I think that mounting / as R/W and /snapshots as R/O could be a legal 
use. If mount mounts /snapshots first, there is no way to mount / R/W.

>> - If the same backing file is already used for a loop device with
>>    correct offset, but incorrect sizelimit, there is no solution. The
>>    current implementation does not check for it.
>
> IMHO the current "check start offset only" is a poor solution if we want
> to re-use loopdevs. It would be better to think about offset+limit
> as about partition and about backing-file as about whole-disk.

I am aware that it is unsafe. But it was easy, and will work in most 
"normal" cases. Nobody has a reason to initialize loop device that 
contains only half of the partition.

> It means check if the area specified by offset+limit does not overlap
> any existing backing-file mapping. It should be possible to re-use
> loopdev (by mount(8)) only if offset and size matches.
 >
> It should be also forbidden to create a new (non re-used) loopdev if
> area specified by offset and sizelimit overlap any existing backing
> file mapping. This is important!
>
> Maybe we can also improve losetup to warn about overlapping loop
> devices.

This sounds as a 100% safe solution.

>> - There exists a change for a race condition between device lookup and
>>    mount syscall.
>
> Sure.

I was thinking about LO_FLAGS_OPEN_FD flag for 
loopcxt_find_by_backing_file(). It will use open() as soon as possible, 
and if it fails, it repeats the iteration.
loopctxt_close_fd() would then close it.

But it would need more work, as mount does not have access to loopctxt.

>> - The implementation does not check for crypto. I think it is not a big
>>    problem, as it makes no sense to initialize the same backing file as
>>    encrypted and non-encrypted at once.
>
> There is nothing like loop devices encryption :-)

Good to know. It is still a part of kernel structures.

-- 
Best Regards / S pozdravem,

Stanislav Brabec
software developer
---------------------------------------------------------------------
SUSE LINUX, s. r. o.                         e-mail: sbrabec@suse.com
Lihovarská 1060/12                            tel: +49 911 7405384547
190 00 Praha 9                                 fax:  +420 284 084 001
Czech Republic                                    http://www.suse.cz/
PGP: 830B 40D5 9E05 35D8 5E27 6FA3 717C 209F A04F CD76

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/3] btrfs-safe implementation of -oloop
  2016-04-12 18:21 [PATCH 0/3] btrfs-safe implementation of -oloop Stanislav Brabec
  2016-04-13 11:08 ` Karel Zak
@ 2016-04-22 11:22 ` Karel Zak
  2016-04-22 12:42   ` Stanislav Brabec
  1 sibling, 1 reply; 7+ messages in thread
From: Karel Zak @ 2016-04-22 11:22 UTC (permalink / raw)
  To: Stanislav Brabec; +Cc: util-linux

On Tue, Apr 12, 2016 at 08:21:59PM +0200, Stanislav Brabec wrote:
> As it was discussed in LKML[1], kernel has no way to detect backing
> file data changes if multiple loop devices have the same backing file.
> 
> Exactly this happened if -oloop is used multiple times with the same
> source.
> 
> It could not happen in past, but with introduction of btrfs sub-volumes
> it has a perfectly legal use cases.
> 
> This patch set introduces a new behavior of -oloop:
> 
> First check, whether the same backing file with the same offset is
> already used. If yes, reuse already initialized loop device.
> 
> If not, initialize another loop device.

 Applied, thanks.

> - If the same backing file is already used for a loop device with
>   correct offset, but incorrect sizelimit, there is no solution. The
>   current implementation does not check for it.

 We need a warning/error for this.

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/3] btrfs-safe implementation of -oloop
  2016-04-22 11:22 ` Karel Zak
@ 2016-04-22 12:42   ` Stanislav Brabec
  0 siblings, 0 replies; 7+ messages in thread
From: Stanislav Brabec @ 2016-04-22 12:42 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux

Karel Zak wrote:

>> - If the same backing file is already used for a loop device with
>>    correct offset, but incorrect sizelimit, there is no solution. The
>>    current implementation does not check for it.
>
>   We need a warning/error for this.
>
I am working on an improved, more safe loop checks.

-- 
Best Regards / S pozdravem,

Stanislav Brabec
software developer
---------------------------------------------------------------------
SUSE LINUX, s. r. o.                         e-mail: sbrabec@suse.com
Lihovarská 1060/12                            tel: +49 911 7405384547
190 00 Praha 9                                 fax:  +420 284 084 001
Czech Republic                                    http://www.suse.cz/
PGP: 830B 40D5 9E05 35D8 5E27 6FA3 717C 209F A04F CD76

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-04-22 12:42 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-12 18:21 [PATCH 0/3] btrfs-safe implementation of -oloop Stanislav Brabec
2016-04-13 11:08 ` Karel Zak
2016-04-13 11:29   ` Ruediger Meier
2016-04-13 11:41     ` Karel Zak
2016-04-13 12:52   ` Stanislav Brabec
2016-04-22 11:22 ` Karel Zak
2016-04-22 12:42   ` Stanislav Brabec

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.