All of lore.kernel.org
* How about adding an ioctl to convert a directory to a subvolume?
@ 2017-11-27  9:41 Lu Fengqi
  2017-11-27 10:13 ` Qu Wenruo
  2017-11-28 18:48 ` David Sterba
  0 siblings, 2 replies; 11+ messages in thread
From: Lu Fengqi @ 2017-11-27  9:41 UTC (permalink / raw)
  To: linux-btrfs

Hi all,

As we all know, under certain circumstances, it is more appropriate to
create some subvolumes rather than keep everything in the same
subvolume. As requirements change, the user may need to convert an
existing directory into a subvolume. For this reason, how about adding
an ioctl to convert a directory to a subvolume?

Users can already convert with the scripts mentioned in this
thread (https://www.spinics.net/lists/linux-btrfs/msg33252.html), but
wouldn't it be easier to use an off-the-shelf btrfs subcommand?

After an initial consideration, our implementation is broadly divided
into the following steps:
1. Freeze the filesystem or set the subvolume above the source directory
to read-only;
2. Perform a pre-check, for example, check whether the conversion would
require creating a cross-device link;
3. Perform conversion, such as creating a new subvolume and moving the
contents of the source directory;
4. Thaw the filesystem or restore the subvolume writable property.
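In the plan above, whatever step 1 changes must be undone by step 4 even if step 2 or 3 fails. A minimal Python sketch of that bracketing for the read-only variant, assuming the existing `btrfs property set <path> ro true|false` command and a subvolume that was writable to begin with. `readonly_subvolume()`, `convert()` and the injectable `run` callback are hypothetical names; the callback only exists so the flow can be dry-run. The sketch shows the bracketing only; the pre-check and conversion are passed in as callables.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command, raising CalledProcessError if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


@contextmanager
def readonly_subvolume(subvol: str, run: Runner = run_checked) -> Iterator[None]:
    """Step 1: mark the subvolume read-only; step 4: always make it writable again.

    Assumes the subvolume was writable before step 1.
    """
    run(["btrfs", "property", "set", subvol, "ro", "true"])
    try:
        yield
    finally:
        # Executed even when the pre-check or the conversion raises.
        run(["btrfs", "property", "set", subvol, "ro", "false"])


def convert(subvol: str, precheck: Callable[[], None],
            do_convert: Callable[[], None], run: Runner = run_checked) -> None:
    """Hypothetical driver: steps 2 and 3 run inside the read-only window."""
    with readonly_subvolume(subvol, run):
        precheck()    # step 2, e.g. reject cross-device hard links
        do_convert()  # step 3
```

The `try`/`finally` (wrapped in `contextlib.contextmanager`) is what guarantees that an exception from the pre-check, for example on finding a cross-device hard link, still restores the writable property.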

In fact, I am not so sure whether this use of freeze is appropriate
because the source directory the user needs to convert may be located
at / or /home and this pre-check and conversion process may take a long
time, which can leave shells and graphical applications hanging.

Please give your comments if any.

-- 
Thanks,
Lu



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How about adding an ioctl to convert a directory to a subvolume?
  2017-11-27  9:41 How about adding an ioctl to convert a directory to a subvolume? Lu Fengqi
@ 2017-11-27 10:13 ` Qu Wenruo
  2017-11-27 13:02   ` Austin S. Hemmelgarn
  2017-11-28  8:29   ` Lu Fengqi
  2017-11-28 18:48 ` David Sterba
  1 sibling, 2 replies; 11+ messages in thread
From: Qu Wenruo @ 2017-11-27 10:13 UTC (permalink / raw)
  To: Lu Fengqi, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 3952 bytes --]



On 2017年11月27日 17:41, Lu Fengqi wrote:
> Hi all,
> 
> As we all know, under certain circumstances, it is more appropriate to
> create some subvolumes rather than keep everything in the same
> subvolume. As requirements change, the user may need to convert an
> existing directory into a subvolume. For this reason, how about adding
> an ioctl to convert a directory to a subvolume?

The idea seems interesting.

However in my opinion, this can be done quite easily in (mostly) user
space, thanks to btrfs's reflink support.

The method from Hugo or Chris is quite good, and maybe it can be
enhanced a little.

Use the following layout as an example:

root_subv
|- subvolume_1
|  |- dir_1
|  |  |- file_1
|  |  |- file_2
|  |- dir_2
|     |- file_3
|- subvolume_2

If we want to convert dir_1 into subvolume, we can do it like:

1) Create a temporary readonly snapshot of parent subvolume containing
   the desired dir
   # btrfs sub snapshot -r root_subv/subvolume_1 \
     root_subv/tmp_snapshot_1

2) Create a new subvolume, as destination.
   # btrfs sub create root_subv/tmp_dest/

3) Copy the content and sync the fs
   Use of reflink is necessary.
   # cp -r --reflink=always root_subv/tmp_snapshot_1/dir_1/. \
     root_subv/tmp_dest/
   # btrfs filesystem sync root_subv/tmp_dest

4) Delete temporary readonly snapshot
   # btrfs subvolume delete root_subv/tmp_snapshot_1

5) Remove the source dir
   # rm -rf root_subv/subvolume_1/dir_1

6) Create the final destination as a snapshot of "root_subv/tmp_dest"
   # btrfs subvolume snapshot root_subv/tmp_dest \
     root_subv/subvolume_1/dir_1

7) Remove the temporary destination
   # btrfs subvolume delete root_subv/tmp_dest
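The numbered steps above can be scripted. A minimal Python sketch that only builds the command list and, by default, just prints it; `conversion_commands()`, `run_all()` and the `dry_run` switch are hypothetical names. Two deliberate differences from the commands as written: `cp` gets `-a` instead of `-r` so ownership, permissions and timestamps are kept, and the source is given as `dir_1/.` so the contents of dir_1 land directly in tmp_dest rather than in tmp_dest/dir_1. The sync is spelled out as `btrfs filesystem sync`. Like the manual procedure, it does not handle subvolumes nested inside dir_1.

```python
import os
import subprocess
from typing import List


def conversion_commands(root: str, parent: str, dirname: str,
                        snap: str = "tmp_snapshot_1",
                        dest: str = "tmp_dest") -> List[List[str]]:
    """Build the command list for turning root/parent/dirname into a subvolume."""
    parent_path = os.path.join(root, parent)
    snap_path = os.path.join(root, snap)
    dest_path = os.path.join(root, dest)
    target = os.path.join(parent_path, dirname)
    return [
        # 1) temporary read-only snapshot of the parent subvolume
        ["btrfs", "subvolume", "snapshot", "-r", parent_path, snap_path],
        # 2) empty destination subvolume
        ["btrfs", "subvolume", "create", dest_path],
        # 3) reflink-copy the *contents* of the dir ("/.") and sync
        ["cp", "-a", "--reflink=always",
         os.path.join(snap_path, dirname) + "/.", dest_path],
        ["btrfs", "filesystem", "sync", dest_path],
        # 4) drop the temporary snapshot
        ["btrfs", "subvolume", "delete", snap_path],
        # 5) remove the source dir
        ["rm", "-rf", target],
        # 6) snapshot the destination into the original location
        ["btrfs", "subvolume", "snapshot", dest_path, target],
        # 7) drop the temporary destination
        ["btrfs", "subvolume", "delete", dest_path],
    ]


def run_all(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print("#", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # abort on the first failure
```

With the default `dry_run=True`, `run_all(conversion_commands("root_subv", "subvolume_1", "dir_1"))` only prints the eight commands, which makes it cheap to review the sequence before running it for real.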


The main challenge is in step 3).
In fact, the above method can only handle normal dirs/files.
If there is another subvolume inside the desired dir, the "cp -r" above
is a bad idea.
We need to skip subvolume dirs and create snapshots for them instead.

But it's quite easy to write a user space program to handle it.
Maybe the "find" command alone can already handle it well.

Anyway, doing it in user space is already possible and much easier than
doing it in the kernel.

> 
> Users can already convert with the scripts mentioned in this
> thread (https://www.spinics.net/lists/linux-btrfs/msg33252.html), but
> wouldn't it be easier to use an off-the-shelf btrfs subcommand?

If you just want to integrate the functionality into btrfs-progs, maybe
it's possible.

But if you insist on providing a new ioctl for this, I strongly doubt
the extra hassle is worth it.

> 
> After an initial consideration, our implementation is broadly divided
> into the following steps:
> 1. Freeze the filesystem or set the subvolume above the source directory
> to read-only;

There's no real need to freeze the whole fs.
Just create a readonly snapshot of the parent subvolume which contains
the dir.
That's what snapshots are designed for.

> 2. Perform a pre-check, for example, check whether the conversion would
> require creating a cross-device link;

This can be done on the fly, as the check is easy (we only need to
check whether the inode number is 256).
We only need a mid-order iteration of the source dir (in the temporary
snapshot): for a normal file, use reflink; for a subvolume dir, create
a snapshot of it.

For such an iteration, a Python script of less than 100 lines would be
sufficient.
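A minimal Python sketch of such an iteration, under these assumptions: it only plans the work, returning `(action, path)` tuples, instead of executing it; it walks parents before children (pre-order), since a directory must exist before its entries are recreated; nested subvolumes are detected with the inode-256 rule described above; and the `reflink()` helper uses the Linux FICLONE ioctl. `plan()`, `is_subvolume_root()` and `reflink()` are hypothetical names. The subvolume test is injectable so the planner can also be exercised on an ordinary filesystem.

```python
import fcntl
import os
import stat
from typing import Callable, List, Tuple

BTRFS_SUBVOL_ROOT_INO = 256  # inode number of a subvolume's top directory
# fcntl.FICLONE exists on Python 3.12+ (Linux); the fallback is the
# x86/ARM encoding of _IOW(0x94, 9, int) and is an assumption elsewhere.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)

Action = Tuple[str, str]  # (what to do, path relative to the source dir)


def is_subvolume_root(st: os.stat_result) -> bool:
    return stat.S_ISDIR(st.st_mode) and st.st_ino == BTRFS_SUBVOL_ROOT_INO


def plan(src: str,
         is_subvol: Callable[[os.stat_result], bool] = is_subvolume_root,
         ) -> List[Action]:
    """Walk src parents-first and decide how each entry gets recreated."""
    actions: List[Action] = []

    def walk(rel: str) -> None:
        with os.scandir(os.path.join(src, rel)) as it:
            entries = sorted(it, key=lambda e: e.name)
        for entry in entries:
            path = os.path.join(rel, entry.name)
            st = entry.stat(follow_symlinks=False)
            if stat.S_ISDIR(st.st_mode):
                if is_subvol(st):
                    actions.append(("snapshot", path))  # do not descend
                else:
                    actions.append(("mkdir", path))     # parent before children
                    walk(path)
            elif stat.S_ISREG(st.st_mode):
                actions.append(("reflink", path))
            else:
                actions.append(("copy", path))  # symlinks, fifos, devices

    walk("")
    return actions


def reflink(src_file: str, dst_file: str) -> None:
    """Share src_file's extents with a new dst_file (same filesystem required)."""
    with open(src_file, "rb") as s, open(dst_file, "wb") as d:
        fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
```

Separating planning from execution keeps the inode check testable and lets a tool show the user what it is about to do before it touches anything.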

Thanks,
Qu

> 3. Perform conversion, such as creating a new subvolume and moving the
> contents of the source directory;
> 4. Thaw the filesystem or restore the subvolume writable property.
> 
> In fact, I am not so sure whether this use of freeze is appropriate
> because the source directory the user needs to convert may be located
> at / or /home and this pre-check and conversion process may take a long
> time, which can leave shells and graphical applications hanging.
> 
> Please give your comments if any.
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How about adding an ioctl to convert a directory to a subvolume?
  2017-11-27 10:13 ` Qu Wenruo
@ 2017-11-27 13:02   ` Austin S. Hemmelgarn
  2017-11-27 13:17     ` Qu Wenruo
  2017-11-28  8:29   ` Lu Fengqi
  1 sibling, 1 reply; 11+ messages in thread
From: Austin S. Hemmelgarn @ 2017-11-27 13:02 UTC (permalink / raw)
  To: Qu Wenruo, Lu Fengqi, linux-btrfs

On 2017-11-27 05:13, Qu Wenruo wrote:
> 
> 
> On 2017年11月27日 17:41, Lu Fengqi wrote:
>> Hi all,
>>
>> As we all know, under certain circumstances, it is more appropriate to
>> create some subvolumes rather than keep everything in the same
>> subvolume. As requirements change, the user may need to convert an
>> existing directory into a subvolume. For this reason, how about adding
>> an ioctl to convert a directory to a subvolume?
> 
> The idea seems interesting.
> 
> However in my opinion, this can be done quite easily in (mostly) user
> space, thanks to btrfs's reflink support.
>
> The method from Hugo or Chris is quite good, and maybe it can be
> enhanced a little.
> 
> Use the following layout as an example:
> 
> root_subv
> |- subvolume_1
> |  |- dir_1
> |  |  |- file_1
> |  |  |- file_2
> |  |- dir_2
> |     |- file_3
> |- subvolume_2
> 
> If we want to convert dir_1 into subvolume, we can do it like:
> 
> 1) Create a temporary readonly snapshot of parent subvolume containing
>     the desired dir
>     # btrfs sub snapshot -r root_subv/subvolume_1 \
>       root_subv/tmp_snapshot_1
> 
> 2) Create a new subvolume, as destination.
>     # btrfs sub create root_subv/tmp_dest/
> 
> 3) Copy the content and sync the fs
>     Use of reflink is necessary.
>     # cp -r --reflink=always root_subv/tmp_snapshot_1/dir_1/. \
>       root_subv/tmp_dest/
>     # btrfs filesystem sync root_subv/tmp_dest
> 
> 4) Delete temporary readonly snapshot
>     # btrfs subvolume delete root_subv/tmp_snapshot_1
> 
> 5) Remove the source dir
>     # rm -rf root_subv/subvolume_1/dir_1
> 
> 6) Create the final destination as a snapshot of "root_subv/tmp_dest"
>     # btrfs subvolume snapshot root_subv/tmp_dest \
>       root_subv/subvolume_1/dir_1
>
> 7) Remove the temporary destination
>     # btrfs subvolume delete root_subv/tmp_dest
> 
> 
> The main challenge is in step 3).
> In fact, the above method can only handle normal dirs/files.
> If there is another subvolume inside the desired dir, the "cp -r" above
> is a bad idea.
> We need to skip subvolume dirs and create snapshots for them instead.
>
> But it's quite easy to write a user space program to handle it.
> Maybe the "find" command alone can already handle it well.
>
> Anyway, doing it in user space is already possible and much easier than
> doing it in the kernel.
> 
>>
>> Users can already convert with the scripts mentioned in this
>> thread (https://www.spinics.net/lists/linux-btrfs/msg33252.html), but
>> wouldn't it be easier to use an off-the-shelf btrfs subcommand?
> 
> If you just want to integrate the functionality into btrfs-progs, maybe
> it's possible.
> 
> But if you insist on providing a new ioctl for this, I strongly doubt
> the extra hassle is worth it.
> 
>>
>> After an initial consideration, our implementation is broadly divided
>> into the following steps:
>> 1. Freeze the filesystem or set the subvolume above the source directory
>> to read-only;
> 
> There's no real need to freeze the whole fs.
> Just create a readonly snapshot of the parent subvolume which contains
> the dir.
> That's what snapshots are designed for.
> 
>> 2. Perform a pre-check, for example, check whether the conversion would
>> require creating a cross-device link;
> 
> This can be done on the fly, as the check is easy (we only need to
> check whether the inode number is 256).
> We only need a mid-order iteration of the source dir (in the temporary
> snapshot): for a normal file, use reflink; for a subvolume dir, create
> a snapshot of it.
>
> For such an iteration, a Python script of less than 100 lines would be
> sufficient.
On that note, see the function convert_dir_to_subv() in:
https://github.com/Ferroin/btrfs-subv-backup/blob/master/btrfs-subv-backup.py

For an example of how to do it in Python (albeit with some extra code to 
handle the case of not having the reflink module from PyPI, and without 
anything to prevent the source from being modified).

It would still be nice to be able to do this atomically though, or at 
least get cross-rename support in BTRFS, which would allow the final 
rename to replace the source with a subvolume to be atomic (assuming of 
course you could cross-rename a directory and subvolume).
> 
> Thanks,
> Qu
> 
>> 3. Perform conversion, such as creating a new subvolume and moving the
>> contents of the source directory;
>> 4. Thaw the filesystem or restore the subvolume writable property.
>>
>> In fact, I am not so sure whether this use of freeze is appropriate
>> because the source directory the user needs to convert may be located
>> at / or /home and this pre-check and conversion process may take a long
>> time, which can leave shells and graphical applications hanging.
>>
>> Please give your comments if any.
>>
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How about adding an ioctl to convert a directory to a subvolume?
  2017-11-27 13:02   ` Austin S. Hemmelgarn
@ 2017-11-27 13:17     ` Qu Wenruo
  2017-11-27 13:49       ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 11+ messages in thread
From: Qu Wenruo @ 2017-11-27 13:17 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Lu Fengqi, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 5918 bytes --]



On 2017年11月27日 21:02, Austin S. Hemmelgarn wrote:
> On 2017-11-27 05:13, Qu Wenruo wrote:
>>
>>
>> On 2017年11月27日 17:41, Lu Fengqi wrote:
>>> Hi all,
>>>
>>> As we all know, under certain circumstances, it is more appropriate to
>>> create some subvolumes rather than keep everything in the same
>>> subvolume. As requirements change, the user may need to convert an
>>> existing directory into a subvolume. For this reason, how about adding
>>> an ioctl to convert a directory to a subvolume?
>>
>> The idea seems interesting.
>>
>> However in my opinion, this can be done quite easily in (mostly) user
>> space, thanks to btrfs's reflink support.
>>
>> The method from Hugo or Chris is quite good, and maybe it can be
>> enhanced a little.
>>
>> Use the following layout as an example:
>>
>> root_subv
>> |- subvolume_1
>> |  |- dir_1
>> |  |  |- file_1
>> |  |  |- file_2
>> |  |- dir_2
>> |     |- file_3
>> |- subvolume_2
>>
>> If we want to convert dir_1 into subvolume, we can do it like:
>>
>> 1) Create a temporary readonly snapshot of parent subvolume containing
>>     the desired dir
>>     # btrfs sub snapshot -r root_subv/subvolume_1 \
>>       root_subv/tmp_snapshot_1
>>
>> 2) Create a new subvolume, as destination.
>>     # btrfs sub create root_subv/tmp_dest/
>>
>> 3) Copy the content and sync the fs
>>     Use of reflink is necessary.
>>     # cp -r --reflink=always root_subv/tmp_snapshot_1/dir_1/. \
>>       root_subv/tmp_dest/
>>     # btrfs filesystem sync root_subv/tmp_dest
>>
>> 4) Delete temporary readonly snapshot
>>     # btrfs subvolume delete root_subv/tmp_snapshot_1
>>
>> 5) Remove the source dir
>>     # rm -rf root_subv/subvolume_1/dir_1
>>
>> 6) Create the final destination as a snapshot of "root_subv/tmp_dest"
>>     # btrfs subvolume snapshot root_subv/tmp_dest \
>>       root_subv/subvolume_1/dir_1
>>
>> 7) Remove the temporary destination
>>     # btrfs subvolume delete root_subv/tmp_dest
>>
>>
>> The main challenge is in step 3).
>> In fact, the above method can only handle normal dirs/files.
>> If there is another subvolume inside the desired dir, the "cp -r" above
>> is a bad idea.
>> We need to skip subvolume dirs and create snapshots for them instead.
>>
>> But it's quite easy to write a user space program to handle it.
>> Maybe the "find" command alone can already handle it well.
>>
>> Anyway, doing it in user space is already possible and much easier than
>> doing it in the kernel.
>>
>>>
>>> Users can already convert with the scripts mentioned in this
>>> thread (https://www.spinics.net/lists/linux-btrfs/msg33252.html), but
>>> wouldn't it be easier to use an off-the-shelf btrfs subcommand?
>>
>> If you just want to integrate the functionality into btrfs-progs, maybe
>> it's possible.
>>
>> But if you insist on providing a new ioctl for this, I strongly doubt
>> the extra hassle is worth it.
>>
>>>
>>> After an initial consideration, our implementation is broadly divided
>>> into the following steps:
>>> 1. Freeze the filesystem or set the subvolume above the source directory
>>> to read-only;
>>
>> There's no real need to freeze the whole fs.
>> Just create a readonly snapshot of the parent subvolume which contains
>> the dir.
>> That's what snapshots are designed for.
>>
>>> 2. Perform a pre-check, for example, check whether the conversion would
>>> require creating a cross-device link;
>>
>> This can be done on the fly, as the check is easy (we only need to
>> check whether the inode number is 256).
>> We only need a mid-order iteration of the source dir (in the temporary
>> snapshot): for a normal file, use reflink; for a subvolume dir, create
>> a snapshot of it.
>>
>> For such an iteration, a Python script of less than 100 lines would be
>> sufficient.
> On that note, see the function convert_dir_to_subv() in:
> https://github.com/Ferroin/btrfs-subv-backup/blob/master/btrfs-subv-backup.py
> 
> 
> For an example of how to do it in Python (albeit with some extra code to
> handle the case of not having the reflink module from PyPI, and without
> anything to prevent the source from being modified).
> 
> It would still be nice to be able to do this atomically though, or at
> least get cross-rename support in BTRFS, which would allow the final
> rename to replace the source with a subvolume to be atomic (assuming of
> course you could cross-rename a directory and subvolume).

The problem behind cross-rename is that btrfs doesn't follow the
organization used by most filesystems, where all inodes live in a
single tree (or table).

This prevents an inode from being referenced outside of its subvolume.

And since btrfs uses a one-subvolume-one-tree design, which greatly
simplifies the snapshot implementation, it's very hard, if not
impossible, to do a real rename across subvolumes.

But at least we can reflink, which avoids a huge amount of data IO and
leaves us only needing to handle inode creation/linking.

(Although the one-subvolume-one-tree design also makes metadata
concurrency very low, further slowing down metadata operations.)

Thanks,
Qu

>>
>> Thanks,
>> Qu
>>
>>> 3. Perform conversion, such as creating a new subvolume and moving the
>>> contents of the source directory;
>>> 4. Thaw the filesystem or restore the subvolume writable property.
>>>
>>> In fact, I am not so sure whether this use of freeze is appropriate
>>> because the source directory the user needs to convert may be located
>>> at / or /home and this pre-check and conversion process may take a long
>>> time, which can leave shells and graphical applications hanging.
>>>
>>> Please give your comments if any.
>>>
>>
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How about adding an ioctl to convert a directory to a subvolume?
  2017-11-27 13:17     ` Qu Wenruo
@ 2017-11-27 13:49       ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 11+ messages in thread
From: Austin S. Hemmelgarn @ 2017-11-27 13:49 UTC (permalink / raw)
  To: Qu Wenruo, Lu Fengqi, linux-btrfs

On 2017-11-27 08:17, Qu Wenruo wrote:
> 
> 
> On 2017年11月27日 21:02, Austin S. Hemmelgarn wrote:
>> On 2017-11-27 05:13, Qu Wenruo wrote:
>>>
>>>
>>> On 2017年11月27日 17:41, Lu Fengqi wrote:
>>>> Hi all,
>>>>
>>>> As we all know, under certain circumstances, it is more appropriate to
>>>> create some subvolumes rather than keep everything in the same
>>>> subvolume. As requirements change, the user may need to convert an
>>>> existing directory into a subvolume. For this reason, how about adding
>>>> an ioctl to convert a directory to a subvolume?
>>>
>>> The idea seems interesting.
>>>
>>> However in my opinion, this can be done quite easily in (mostly) user
>>> space, thanks to btrfs's reflink support.
>>>
>>> The method from Hugo or Chris is quite good, and maybe it can be
>>> enhanced a little.
>>>
>>> Use the following layout as an example:
>>>
>>> root_subv
>>> |- subvolume_1
>>> |  |- dir_1
>>> |  |  |- file_1
>>> |  |  |- file_2
>>> |  |- dir_2
>>> |     |- file_3
>>> |- subvolume_2
>>>
>>> If we want to convert dir_1 into subvolume, we can do it like:
>>>
>>> 1) Create a temporary readonly snapshot of parent subvolume containing
>>>      the desired dir
>>>      # btrfs sub snapshot -r root_subv/subvolume_1 \
>>>        root_subv/tmp_snapshot_1
>>>
>>> 2) Create a new subvolume, as destination.
>>>      # btrfs sub create root_subv/tmp_dest/
>>>
>>> 3) Copy the content and sync the fs
>>>      Use of reflink is necessary.
>>>      # cp -r --reflink=always root_subv/tmp_snapshot_1/dir_1/. \
>>>        root_subv/tmp_dest/
>>>      # btrfs filesystem sync root_subv/tmp_dest
>>>
>>> 4) Delete temporary readonly snapshot
>>>      # btrfs subvolume delete root_subv/tmp_snapshot_1
>>>
>>> 5) Remove the source dir
>>>      # rm -rf root_subv/subvolume_1/dir_1
>>>
>>> 6) Create the final destination as a snapshot of "root_subv/tmp_dest"
>>>      # btrfs subvolume snapshot root_subv/tmp_dest \
>>>        root_subv/subvolume_1/dir_1
>>>
>>> 7) Remove the temporary destination
>>>      # btrfs subvolume delete root_subv/tmp_dest
>>>
>>>
>>> The main challenge is in step 3).
>>> In fact, the above method can only handle normal dirs/files.
>>> If there is another subvolume inside the desired dir, the "cp -r" above
>>> is a bad idea.
>>> We need to skip subvolume dirs and create snapshots for them instead.
>>>
>>> But it's quite easy to write a user space program to handle it.
>>> Maybe the "find" command alone can already handle it well.
>>>
>>> Anyway, doing it in user space is already possible and much easier than
>>> doing it in the kernel.
>>>
>>>>
>>>> Users can already convert with the scripts mentioned in this
>>>> thread (https://www.spinics.net/lists/linux-btrfs/msg33252.html), but
>>>> wouldn't it be easier to use an off-the-shelf btrfs subcommand?
>>>
>>> If you just want to integrate the functionality into btrfs-progs, maybe
>>> it's possible.
>>>
>>> But if you insist on providing a new ioctl for this, I strongly doubt
>>> the extra hassle is worth it.
>>>
>>>>
>>>> After an initial consideration, our implementation is broadly divided
>>>> into the following steps:
>>>> 1. Freeze the filesystem or set the subvolume above the source directory
>>>> to read-only;
>>>
>>> There's no real need to freeze the whole fs.
>>> Just create a readonly snapshot of the parent subvolume which contains
>>> the dir.
>>> That's what snapshots are designed for.
>>>
>>>> 2. Perform a pre-check, for example, check whether the conversion would
>>>> require creating a cross-device link;
>>>
>>> This can be done on the fly, as the check is easy (we only need to
>>> check whether the inode number is 256).
>>> We only need a mid-order iteration of the source dir (in the temporary
>>> snapshot): for a normal file, use reflink; for a subvolume dir, create
>>> a snapshot of it.
>>>
>>> For such an iteration, a Python script of less than 100 lines would be
>>> sufficient.
>> On that note, see the function convert_dir_to_subv() in:
>> https://github.com/Ferroin/btrfs-subv-backup/blob/master/btrfs-subv-backup.py
>>
>>
>> For an example of how to do it in Python (albeit with some extra code to
>> handle the case of not having the reflink module from PyPI, and without
>> anything to prevent the source from being modified).
>>
>> It would still be nice to be able to do this atomically though, or at
>> least get cross-rename support in BTRFS, which would allow the final
>> rename to replace the source with a subvolume to be atomic (assuming of
>> course you could cross-rename a directory and subvolume).
> 
> The problem behind cross-rename is that btrfs doesn't follow the
> organization used by most filesystems, where all inodes live in a
> single tree (or table).
>
> This prevents an inode from being referenced outside of its subvolume.
>
> And since btrfs uses a one-subvolume-one-tree design, which greatly
> simplifies the snapshot implementation, it's very hard, if not
> impossible, to do a real rename across subvolumes.
I seriously doubt that that matters in almost all real-world use cases. 
Everything I've seen that uses cross-rename does it with a temporary 
file in the same directory as the target file, using it to avoid the 
non-atomic nature of creating a backup and replacing a file without 
needing extra I/O (yes, reflinks help here, but still aren't perfect).

Just supporting it within a subvolume and returning whatever errno gets 
returned for trying to call rename(2) across filesystem boundaries (EXDEV)
should be more than sufficient for most use cases, even if it doesn't 
work with what I had suggested (which I believe probably qualifies as 
'novel' usage), and in theory would side-step the issues with inodes not 
being globally unique within the filesystem.
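For reference, the cross-rename under discussion is what Linux exposes as renameat2(2) with the RENAME_EXCHANGE flag, which atomically swaps two existing paths. Python's `os` module has no wrapper for it, so this minimal sketch calls the libc `renameat2()` function through ctypes; that assumes glibc 2.28 or newer (or another libc exporting the symbol), and the `exchange()` name is hypothetical. A refusal such as the EXDEV discussed above comes back as an `OSError` carrying that errno, so a caller can fall back to a non-atomic path.

```python
import ctypes
import errno
import os

AT_FDCWD = -100           # <fcntl.h>: resolve relative paths against the cwd
RENAME_EXCHANGE = 1 << 1  # <linux/fs.h>: atomically swap old and new

_libc = ctypes.CDLL(None, use_errno=True)
_renameat2 = getattr(_libc, "renameat2", None)  # absent in older libcs
if _renameat2 is not None:
    _renameat2.argtypes = [ctypes.c_int, ctypes.c_char_p,
                           ctypes.c_int, ctypes.c_char_p, ctypes.c_uint]
    _renameat2.restype = ctypes.c_int


def exchange(a: str, b: str) -> None:
    """Atomically swap the paths a and b; raise OSError on failure."""
    if _renameat2 is None:
        raise OSError(errno.ENOSYS, "renameat2() not available in this libc")
    rc = _renameat2(AT_FDCWD, os.fsencode(a), AT_FDCWD, os.fsencode(b),
                    RENAME_EXCHANGE)
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), a, None, b)
```

On success both names stay valid throughout and simply trade targets, which is the property wanted for a final "replace the source with the new tree" step.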
> 
> But at least we can reflink, which avoids a huge amount of data IO and
> leaves us only needing to handle inode creation/linking.
>
> (Although the one-subvolume-one-tree design also makes metadata
> concurrency very low, further slowing down metadata operations.)
> 
> Thanks,
> Qu
> 
>>>
>>> Thanks,
>>> Qu
>>>
>>>> 3. Perform conversion, such as creating a new subvolume and moving the
>>>> contents of the source directory;
>>>> 4. Thaw the filesystem or restore the subvolume writable property.
>>>>
>>>> In fact, I am not so sure whether this use of freeze is appropriate
>>>> because the source directory the user needs to convert may be located
>>>> at / or /home and this pre-check and conversion process may take a long
>>>> time, which can leave shells and graphical applications hanging.
>>>>
>>>> Please give your comments if any.
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How about adding an ioctl to convert a directory to a subvolume?
  2017-11-27 10:13 ` Qu Wenruo
  2017-11-27 13:02   ` Austin S. Hemmelgarn
@ 2017-11-28  8:29   ` Lu Fengqi
  2017-11-28  8:35     ` Qu Wenruo
  1 sibling, 1 reply; 11+ messages in thread
From: Lu Fengqi @ 2017-11-28  8:29 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Nov 27, 2017 at 06:13:10PM +0800, Qu Wenruo wrote:
>
>
>On 2017年11月27日 17:41, Lu Fengqi wrote:
>> Hi all,
>> 
>> As we all know, under certain circumstances, it is more appropriate to
>> create some subvolumes rather than keep everything in the same
>> subvolume. As requirements change, the user may need to convert an
>> existing directory into a subvolume. For this reason, how about adding
>> an ioctl to convert a directory to a subvolume?
>
>The idea seems interesting.
>
>However in my opinion, this can be done quite easily in (mostly) user
>space, thanks to btrfs's reflink support.
>
>The method from Hugo or Chris is quite good, and maybe it can be
>enhanced a little.
>
>Use the following layout as an example:
>
>root_subv
>|- subvolume_1
>|  |- dir_1
>|  |  |- file_1
>|  |  |- file_2
>|  |- dir_2
>|     |- file_3
>|- subvolume_2
>
>If we want to convert dir_1 into subvolume, we can do it like:
>
>1) Create a temporary readonly snapshot of parent subvolume containing
>   the desired dir
>   # btrfs sub snapshot -r root_subv/subvolume_1 \
>     root_subv/tmp_snapshot_1
>
>2) Create a new subvolume, as destination.
>   # btrfs sub create root_subv/tmp_dest/
>
>3) Copy the content and sync the fs
>   Use of reflink is necessary.
>   # cp -r --reflink=always root_subv/tmp_snapshot_1/dir_1/. \
>     root_subv/tmp_dest/
>   # btrfs filesystem sync root_subv/tmp_dest
>
>4) Delete temporary readonly snapshot
>   # btrfs subvolume delete root_subv/tmp_snapshot_1
>
>5) Remove the source dir
>   # rm -rf root_subv/subvolume_1/dir_1
>
>6) Create the final destination as a snapshot of "root_subv/tmp_dest"
>   # btrfs subvolume snapshot root_subv/tmp_dest \
>     root_subv/subvolume_1/dir_1
>
>7) Remove the temporary destination
>   # btrfs subvolume delete root_subv/tmp_dest
>
>
>The main challenge is in step 3).
>In fact, the above method can only handle normal dirs/files.
>If there is another subvolume inside the desired dir, the "cp -r" above
>is a bad idea.
>We need to skip subvolume dirs and create snapshots for them instead.
>
>But it's quite easy to write a user space program to handle it.
>Maybe the "find" command alone can already handle it well.
>
>Anyway, doing it in user space is already possible and much easier than
>doing it in the kernel.
>
>> 
>> Users can already convert with the scripts mentioned in this
>> thread (https://www.spinics.net/lists/linux-btrfs/msg33252.html), but
>> wouldn't it be easier to use an off-the-shelf btrfs subcommand?
>
>If you just want to integrate the functionality into btrfs-progs, maybe
>it's possible.
>
>But if you insist on providing a new ioctl for this, I strongly doubt
>the extra hassle is worth it.
>

Thanks for getting back to me.

The enhanced approach you provide is pretty good, and I agree that it
meets the needs of some users.

>> 
>> After an initial consideration, our implementation is broadly divided
>> into the following steps:
>> 1. Freeze the filesystem or set the subvolume above the source directory
>> to read-only;
>
>There's no real need to freeze the whole fs.
>Just create a readonly snapshot of the parent subvolume which contains
>the dir.
>That's what snapshots are designed for.
>

I still worry about the following problem. Although tmp_snapshot_1 is
read-only, the source directory dir_1 is still writable. Also, step 5)
will delete dir_1, which may result in the loss of newly written data
during the conversion.

I cannot think of any other method for now, except setting subvolume_1
read-only (obviously a bad idea). Could you provide some suggestions?

>> 2. Perform a pre-check, for example, check whether the conversion would
>> require creating a cross-device link;
>
>This can be done on the fly, as the check is easy (we only need to
>check whether the inode number is 256).
>We only need a mid-order iteration of the source dir (in the temporary
>snapshot): for a normal file, use reflink; for a subvolume dir, create
>a snapshot of it.
>
>For such an iteration, a Python script of less than 100 lines would be
>sufficient.

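Just to clarify, the check is to confirm whether there is a hard link
across the boundary of dir_1, i.e. a file inside dir_1 that also has a
name outside it. dir_1 can't be converted to a subvolume in that case,
because the conversion would require a cross-device link, which cannot
be created.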
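This hard-link pre-check can be done from user space by comparing each regular file's total link count (`st_nlink`) with the number of names for the same inode found inside dir_1; if fewer are found inside, the rest must be outside. A minimal Python sketch; `external_hardlinks()` is a hypothetical name, and it assumes dir_1 is not modified during the scan, for example because it runs against the read-only temporary snapshot.

```python
import os
import stat
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (st_dev, st_ino) identifies an inode


def external_hardlinks(src: str) -> List[str]:
    """Return paths (relative to src) of regular files that also have
    hard links outside src."""
    found: Dict[Key, int] = {}   # names for this inode seen inside src
    nlink: Dict[Key, int] = {}   # total link count reported by lstat
    first: Dict[Key, str] = {}   # first path under src naming the inode
    for dirpath, _dirnames, filenames in os.walk(src):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode) or st.st_nlink < 2:
                continue  # symlinks and singly-linked files cannot cross
            key = (st.st_dev, st.st_ino)
            found[key] = found.get(key, 0) + 1
            nlink[key] = st.st_nlink
            first.setdefault(key, os.path.relpath(path, src))
    # Fewer names inside src than the inode has in total: some are outside.
    return sorted(first[k] for k in found if found[k] < nlink[k])
```

An empty result means every hard link stays within dir_1 and would survive the move into a new subvolume; a non-empty one lists the files that would make the conversion fail.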

>
>Thanks,
>Qu
>
>> 3. Perform conversion, such as creating a new subvolume and moving the
>> contents of the source directory;
>> 4. Thaw the filesystem or restore the subvolume writable property.
>> 
>> In fact, I am not so sure whether this use of freeze is appropriate
>> because the source directory the user needs to convert may be located
>> at / or /home and this pre-check and conversion process may take a long
>> time, which can leave shells and graphical applications hanging.
>> 
>> Please give your comments if any.
>> 
>

-- 
Thanks,
Lu



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How about adding an ioctl to convert a directory to a subvolume?
  2017-11-28  8:29   ` Lu Fengqi
@ 2017-11-28  8:35     ` Qu Wenruo
  0 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2017-11-28  8:35 UTC (permalink / raw)
  To: Lu Fengqi; +Cc: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 5529 bytes --]



On 2017年11月28日 16:29, Lu Fengqi wrote:
> On Mon, Nov 27, 2017 at 06:13:10PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2017年11月27日 17:41, Lu Fengqi wrote:
>>> Hi all,
>>>
>>> As we all know, under certain circumstances, it is more appropriate to
>>> create some subvolumes rather than keep everything in the same
>>> subvolume. As requirements change, the user may need to convert an
>>> existing directory into a subvolume. For this reason, how about adding
>>> an ioctl to convert a directory to a subvolume?
>>
>> The idea seems interesting.
>>
>> However in my opinion, this can be done quite easily in (mostly) user
>> space, thanks to btrfs's reflink support.
>>
>> The method from Hugo or Chris is quite good, and maybe it can be
>> enhanced a little.
>>
>> Use the following layout as an example:
>>
>> root_subv
>> |- subvolume_1
>> |  |- dir_1
>> |  |  |- file_1
>> |  |  |- file_2
>> |  |- dir_2
>> |     |- file_3
>> |- subvolume_2
>>
>> If we want to convert dir_1 into subvolume, we can do it like:
>>
>> 1) Create a temporary readonly snapshot of parent subvolume containing
>>   the desired dir
>>   # btrfs sub snapshot -r root_subv/subvolume_1 \
>>     root_subv/tmp_snapshot_1
>>
>> 2) Create a new subvolume, as destination.
>>   # btrfs sub create root_subv/tmp_dest/
>>
>> 3) Copy the content and sync the fs
>>   Use of reflink is necessary.
>>   # cp -r --reflink=always root_subv/tmp_snapshot_1/dir_1/. \
>>     root_subv/tmp_dest/
>>   # btrfs filesystem sync root_subv/tmp_dest
>>
>> 4) Delete temporary readonly snapshot
>>   # btrfs subvolume delete root_subv/tmp_snapshot_1
>>
>> 5) Remove the source dir
>>   # rm -rf root_subv/subvolume_1/dir_1
>>
>> 6) Create the final destination as a snapshot of "root_subv/tmp_dest"
>>   # btrfs subvolume snapshot root_subv/tmp_dest \
>>     root_subv/subvolume_1/dir_1
>>
>> 7) Remove the temporary destination
>>   # btrfs subvolume delete root_subv/tmp_dest
>>
>>
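A minimal Python sketch of the numbered steps above, assuming the example layout. The helper name `plan_dir_to_subvol` is hypothetical, and it only builds the command lines (a dry run) so they can be reviewed before anything is executed. It copies the *contents* of the directory (note the trailing `/.`) with `cp -a --reflink=always`, uses `btrfs filesystem sync`, and does not handle nested subvolumes inside the directory.

```python
import posixpath
import shlex
from typing import List


def plan_dir_to_subvol(parent_subvol: str, rel_dir: str,
                       tmp_snap: str, tmp_dest: str) -> List[List[str]]:
    """Return the argv lists for the snapshot + reflink conversion.

    Hypothetical helper: nested subvolumes inside rel_dir are NOT handled.
    """
    src_in_snap = posixpath.join(tmp_snap, rel_dir)    # dir as seen in the snapshot
    live_dir = posixpath.join(parent_subvol, rel_dir)  # dir in the live subvolume
    return [
        # 1) temporary read-only snapshot of the containing subvolume
        ["btrfs", "subvolume", "snapshot", "-r", parent_subvol, tmp_snap],
        # 2) new, empty destination subvolume
        ["btrfs", "subvolume", "create", tmp_dest],
        # 3) reflink-copy the directory's contents, then sync
        ["cp", "-a", "--reflink=always", src_in_snap + "/.", tmp_dest + "/"],
        ["btrfs", "filesystem", "sync", tmp_dest],
        # 4) drop the temporary snapshot
        ["btrfs", "subvolume", "delete", tmp_snap],
        # 5) remove the source directory (data written since step 1 is lost)
        ["rm", "-rf", live_dir],
        # 6) put the new subvolume where the directory used to be
        ["btrfs", "subvolume", "snapshot", tmp_dest, live_dir],
        # 7) drop the temporary destination
        ["btrfs", "subvolume", "delete", tmp_dest],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in plan_dir_to_subvol("root_subv/subvolume_1", "dir_1",
                                  "root_subv/tmp_snapshot_1",
                                  "root_subv/tmp_dest"):
        print(shlex.join(cmd))
```

Executing each command with `subprocess.run(cmd, check=True)` and stopping at the first failure would keep a script from continuing with a half-converted tree; `--reflink=always` makes `cp` fail rather than silently fall back to a full data copy.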
>> The main challenge is in step 3).
>> In fact, the above method can only handle normal dirs/files.
>> If there is another subvolume inside the desired dir, the plain "cp"
>> approach is a bad idea.
>> We need to skip subvolume dirs and create a snapshot for each of them.
>>
>> But it's quite easy to write a user space program to handle it.
>> Maybe the "find" command alone can already handle it well.
>>
>> Anyway, doing it in user space is already possible and much easier than
>> doing it in kernel.
>>
>>>
>>> Users can convert with the scripts mentioned in this
>>> thread (https://www.spinics.net/lists/linux-btrfs/msg33252.html), but
>>> wouldn't it be easier to use an off-the-shelf btrfs subcommand?
>>
>> If you just want to integrate the functionality into btrfs-progs, maybe
>> it's possible.
>>
>> But if you insist on providing a new ioctl for this, I highly doubt
>> the extra hassle is worth it.
>>
> 
> Thanks for getting back to me.
> 
> The enhanced approach you provide is pretty good, and I agree that it
> meets the needs of some users.
> 
>>>
>>> After an initial consideration, our implementation is broadly divided
>>> into the following steps:
>>> 1. Freeze the filesystem or set the subvolume above the source directory
>>> to read-only;
>>
>> There's no real need to freeze the whole fs.
>> Just create a readonly snapshot of the parent subvolume which contains
>> the dir.
>> That's what snapshots are designed for.
>>
> 
> I still worry about the following problem. Although tmp_snapshot_1 is
> read-only, the source directory dir_1 is still writable. Also, step 5)
> will delete dir_1, which may result in the loss of newly written data
> during the conversion.

Just a problem of timing.

Outputting a message before or after creating the read-only snapshot,
to inform the user, will be good enough.

Especially considering that btrfs snapshot creation won't flush data,
which already means some buffered data will not appear in the snapshot.

Thanks,
Qu

> 
> I cannot think of any other method for now, except setting subvolume_1
> read-only (obviously a bad idea). Could you provide some
> suggestions?
> 
>>> 2. Perform a pre-check, for example, check whether the conversion
>>> would require creating a cross-device link;
>>
>> This can be done on the fly, as the check is easy (it only needs to
>> check whether the inode number is 256).
>> We only need a depth-first iteration of the source dir (in the temporary
>> snapshot), using reflink for each normal file.
>> For each subvolume dir, create a snapshot of it.
>>
>> And for such iteration, a python script less than 100 lines would be
>> sufficient.
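A minimal Python sketch of such an iteration, not the script referred to here: `plan_walk` is a hypothetical name, and it only emits a pre-order plan (parents before children) of actions without performing them. It uses the inode-256 test described above as the default subvolume predicate; the predicate is a parameter because whether that test is right for the tree being walked (live directory or temporary snapshot) is an assumption worth verifying.

```python
import os
import stat
from typing import Callable, Iterator, Tuple

# Inode number of a btrfs subvolume root (BTRFS_FIRST_FREE_OBJECTID).
SUBVOL_ROOT_INO = 256


def is_subvol_root(st: os.stat_result) -> bool:
    """Inode-256 test described in the thread (assumed, not verified here)."""
    return stat.S_ISDIR(st.st_mode) and st.st_ino == SUBVOL_ROOT_INO


def plan_walk(top: str,
              is_subvol: Callable[[os.stat_result], bool] = is_subvol_root,
              rel: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (action, path relative to top) in pre-order.

    mkdir    -> create the directory in the destination, then recurse
    reflink  -> clone the regular file into the destination
    snapshot -> nested subvolume: snapshot it instead of copying into it
    symlink / special -> recreate the link or fifo/socket/device node
    """
    with os.scandir(os.path.join(top, rel)) as it:
        entries = sorted(it, key=lambda e: e.name)  # deterministic order
    for entry in entries:
        child = os.path.join(rel, entry.name)
        st = entry.stat(follow_symlinks=False)
        if stat.S_ISLNK(st.st_mode):
            yield ("symlink", child)
        elif stat.S_ISDIR(st.st_mode):
            if is_subvol(st):
                yield ("snapshot", child)  # do not descend into it
            else:
                yield ("mkdir", child)
                yield from plan_walk(top, is_subvol, child)
        elif stat.S_ISREG(st.st_mode):
            yield ("reflink", child)
        else:
            yield ("special", child)
```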
> 
> Just to clarify, the check is to confirm whether any hard link crosses
> the boundary of dir_1. dir_1 can't be converted to a subvolume if there
> is such a link, because it would become a cross-device link, which
> cannot be created.
> 
>>
>> Thanks,
>> Qu
>>
>>> 3. Perform conversion, such as creating a new subvolume and moving the
>>> contents of the source directory;
>>> 4. Thaw the filesystem or restore the subvolume writable property.
>>>
>>> In fact, I am not so sure whether this use of freeze is appropriate
>>> because the source directory the user needs to convert may be located
>>> at / or /home and this pre-check and conversion process may take a long
>>> time, which can leave some shells and graphical applications hung.
>>>
>>> Please give your comments if any.
>>>
>>
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How about adding an ioctl to convert a directory to a subvolume?
  2017-11-27  9:41 How about adding an ioctl to convert a directory to a subvolume? Lu Fengqi
  2017-11-27 10:13 ` Qu Wenruo
@ 2017-11-28 18:48 ` David Sterba
  2017-11-28 19:54   ` Austin S. Hemmelgarn
                     ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: David Sterba @ 2017-11-28 18:48 UTC (permalink / raw)
  To: Lu Fengqi; +Cc: linux-btrfs

On Mon, Nov 27, 2017 at 05:41:56PM +0800, Lu Fengqi wrote:
> As we all know, under certain circumstances, it is more appropriate to
> create some subvolumes rather than keep everything in the same
> subvolume. As requirements change, the user may need to
> convert an existing directory to a subvolume. For this reason, how about
> adding an ioctl to convert a directory to a subvolume?

I'd say it's too difficult to get everything right in the kernel. This
can be done in userspace, with existing tools.

The problem is that the conversion cannot be done atomically in most
cases, so even if it's just one ioctl call, there are several possible
intermediate states that would exist during the call. Reporting where
the ioctl failed would need some extended error code semantics.
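In userspace, where each step is a separate command, reporting which step failed needs no new error-code semantics. A minimal Python sketch with hypothetical `Step` and `run_steps` names: each step may carry an undo action, and on failure the completed steps are undone in reverse order. Irreversible steps (such as removing the source directory) carry no undo, which is exactly where the intermediate states mentioned above become visible.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    do: Callable[[], None]
    undo: Optional[Callable[[], None]] = None  # None: step cannot be undone


@dataclass
class Report:
    completed: List[str] = field(default_factory=list)
    failed: Optional[str] = None               # name of the step that failed
    error: Optional[Exception] = None
    rolled_back: List[str] = field(default_factory=list)


def run_steps(steps: List[Step]) -> Report:
    """Run steps in order. On the first failure, undo the completed steps in
    reverse order (where an undo exists) and report which step failed."""
    report = Report()
    for i, step in enumerate(steps):
        try:
            step.do()
        except Exception as exc:
            report.failed, report.error = step.name, exc
            for prev in reversed(steps[:i]):
                if prev.undo is not None:
                    prev.undo()  # an exception here propagates to the caller
                    report.rolled_back.append(prev.name)
            return report
        report.completed.append(step.name)
    return report
```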

> Users can convert with the scripts mentioned in this
> thread (https://www.spinics.net/lists/linux-btrfs/msg33252.html), but
> wouldn't it be easier to use an off-the-shelf btrfs subcommand?

Adding a subcommand would work, though I'd rather avoid reimplementing
'cp -ax' or 'rsync -ax'.  We want to copy the files preserving all
attributes, with reflink, and be able to identify partially synced
files, and not cross the mountpoints or subvolumes.
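Not crossing mount points or subvolumes can be done from userspace by comparing `st_dev`: btrfs reports a separate device number for each subvolume, which is why `cp -x` and `find -xdev` stop at subvolume boundaries. A minimal Python sketch with a hypothetical `walk_one_device` helper; it does not attempt the "partially synced files" part.

```python
import os
from typing import Iterator


def walk_one_device(top: str) -> Iterator[str]:
    """Yield paths relative to `top`, never descending into a directory whose
    st_dev differs from top's (a mount point or, on btrfs, a nested
    subvolume). Boundary directories are yielded but not entered."""
    root_dev = os.lstat(top).st_dev
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames.sort()
        descend = []
        for name in dirnames:
            full = os.path.join(dirpath, name)
            yield os.path.relpath(full, top)
            if os.lstat(full).st_dev == root_dev:
                descend.append(name)  # same filesystem/subvolume: enter it
        dirnames[:] = descend  # prune in place; top-down os.walk honours this
        for name in sorted(filenames):
            yield os.path.relpath(os.path.join(dirpath, name), top)
```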

The middle step with snapshotting the containing subvolume before
syncing the data is also a valid option, but not always necessary.

> After an initial consideration, our implementation is broadly divided
> into the following steps:
> 1. Freeze the filesystem or set the subvolume above the source directory
> to read-only;

Freezing the filesystem will freeze all IO, so this would not work, but
I understand what you mean. The file data are synced before the snapshot
is taken, but nothing prevents applications from continuing to write data.

Open and live files are a problem, and I don't see a nice solution here.

> 2. Perform a pre-check, for example, check whether the conversion
> would require creating a cross-device link;

Cross-device links are not a problem as long as we use 'cp', i.e. the
manual creation of files in the target.

> 3. Perform conversion, such as creating a new subvolume and moving the
> contents of the source directory;
> 4. Thaw the filesystem or restore the subvolume writable property.
> 
> In fact, I am not so sure whether this use of freeze is appropriate
> because the source directory the user needs to convert may be located
> at / or /home and this pre-check and conversion process may take a long
> time, which can leave some shells and graphical applications hung.

I think the closest operation is a read-only remount, which is not
always possible due to open files and can otherwise be considered quite an
intrusive operation for the whole system. And the root filesystem cannot
be easily remounted read-only in the systemd days anyway.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How about adding an ioctl to convert a directory to a subvolume?
  2017-11-28 18:48 ` David Sterba
@ 2017-11-28 19:54   ` Austin S. Hemmelgarn
  2017-11-28 20:04   ` Timofey Titovets
  2017-11-29 11:23   ` Lu Fengqi
  2 siblings, 0 replies; 11+ messages in thread
From: Austin S. Hemmelgarn @ 2017-11-28 19:54 UTC (permalink / raw)
  To: dsterba, Lu Fengqi, linux-btrfs

On 2017-11-28 13:48, David Sterba wrote:
> On Mon, Nov 27, 2017 at 05:41:56PM +0800, Lu Fengqi wrote:
>> As we all know, under certain circumstances, it is more appropriate to
>> create some subvolumes rather than keep everything in the same
>> subvolume. As requirements change, the user may need to
>> convert an existing directory to a subvolume. For this reason, how about
>> adding an ioctl to convert a directory to a subvolume?
> 
> I'd say it's too difficult to get everything right in the kernel. This
> can be done in userspace, with existing tools.
>
> The problem is that the conversion cannot be done atomically in most
> cases, so even if it's just one ioctl call, there are several possible
> intermediate states that would exist during the call. Reporting where
> the ioctl failed would need some extended error code semantics.
I think you mean it can't be done atomically in an inexpensive manner 
without significant work on the kernel side.  It should in theory be 
possible to do it atomically by watching for and mirroring changes from 
the source directory to the new subvolume.  Such an approach is however 
expensive, and is not guaranteed to ever finish if the source directory 
is under active usage.  The only issue is updating open file descriptors 
to point to the new files.

In short, the flow I'm thinking of is:

1. Create the subvolume in a temporary state that would get cleaned up 
by garbage collection if the FS got remounted.
2. Start watching the directory structure in the source directory for 
changes, recursively, and mirror all such changes to the subvolume as 
they happen.
3. For each file in the source directory and sub-directories, create a
file in the new subvolume using a reflink, and start watching that
file for changes.  Reflink any detected updates to the temporary file.
Beyond this point, there are two possible methods to finish things:
A:
     4. Freeze all userspace I/O to the filesystem.
     5. Update the dentry for the source directory to point to the new 
subvolume, remove the subvolume's 'temporary' status, and force a commit.
     6. Update all open file descriptors to point to the files in the 
new subvolume.
     7. Thaw all userspace I/O to the filesystem.
     8. Garbage collect the source directory and its contents.
or:
B:
     4. Update the dentry for the source directory to point to the new 
subvolume, remove the subvolume's temporary status, and force a commit.
     5. Keep the old data around until there are no references to it,
and redirect accesses to files that were already open prior to step 4 so
they point to the old file, while keeping watches around for the old files
that were open.

Prior to step A5 or B4, this can be atomically rolled back by simply 
nuking the temporary subvolume and removing the watches.  After those 
steps, it's fully complete as far as the on-device state is concerned. 
Method A of completing the conversion has less overall impact on 
long-term operation of the system, but may require significant changes 
to the VFS API to be viable (I don't know if some of the overlay stuff 
could be used or not).  Method B will continue to negatively impact 
performance until all the files that were open are closed, but shouldn't 
require as much legwork on the kernel side.  In both cases, it ends up 
working similarly to the device replace operation, or LVM's pvmove 
operation, both of which can be made atomic.
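The property that makes this flow atomic is a single commit point (step A5 or B4): before it, rollback only discards the temporary copy; after it, the switch is complete. The toy Python model below illustrates only that ordering. It is a hypothetical `ToyConversion` class with plain dicts standing in for directory trees; there are no real watches, reflinks or VFS changes.

```python
from typing import Dict, Optional


class ToyConversion:
    """Toy model: dicts stand in for directory trees (path -> contents)."""

    def __init__(self, source: Dict[str, bytes]) -> None:
        self.live = source  # what applications see
        # Steps 1 and 3: temporary "subvolume" seeded from the source.
        self.temp: Optional[Dict[str, bytes]] = dict(source)
        self.committed = False

    @staticmethod
    def _apply(tree: Dict[str, bytes], path: str, data: Optional[bytes]) -> None:
        if data is None:
            tree.pop(path, None)  # unlink
        else:
            tree[path] = data

    def write(self, path: str, data: Optional[bytes]) -> None:
        """An application write (data=None means unlink)."""
        self._apply(self.live, path, data)
        if not self.committed and self.temp is not None:
            self._apply(self.temp, path, data)  # step 2: mirror the change

    def rollback(self) -> None:
        if self.committed:
            raise RuntimeError("past the commit point; nothing to roll back")
        self.temp = None  # just drop the temporary copy

    def commit(self) -> Dict[str, bytes]:
        if self.temp is None:
            raise RuntimeError("conversion was rolled back")
        self.live = self.temp  # the single switch (A5 / B4)
        self.committed = True
        return self.live
```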

For what it's worth, I am of the opinion that this would be nice to have 
not so that stuff could be converted on-line, but so that you can 
convert a directory more easily off-line.  Right now, it's a serious 
pain in the arse to do such a conversion (the Python code I linked is 
simple because it's using high-level operations out of the shutil 
standard module, and/or the reflink module on PyPI), largely because it 
is decidedly non-trivial to actually copy all the data about a file (and 
both `cp -ax` and `rsync -ax` miss information other than reflinks, most 
notably file attributes normally set by `chattr`).
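To illustrate that gap, here is a minimal Python sketch of copying a single file with a reflink (`reflink_copy` is a hypothetical name). It uses the Linux `FICLONE` ioctl, taken from `fcntl.FICLONE` if the running Python exports it, otherwise the value from `<linux/fs.h>`; that fallback value assumes an architecture with the generic ioctl encoding, such as x86-64 or arm64. `shutil.copystat()` copies mode bits, timestamps and, on Linux, extended attributes where possible. Ownership and `chattr` flags are still not copied, which is the kind of metadata loss described above.

```python
import fcntl
import shutil

# FICLONE = _IOW(0x94, 9, int) from <linux/fs.h>. The fallback value assumes
# the generic ioctl encoding (e.g. x86-64, arm64).
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def reflink_copy(src: str, dst: str, allow_fallback: bool = True) -> bool:
    """Copy src to dst, sharing extents via FICLONE when the filesystem allows.

    Returns True if the data was cloned, False if it was copied byte by byte
    (like `cp --reflink=auto`); with allow_fallback=False a failed clone is
    raised (like `cp --reflink=always`).
    Not copied: ownership (needs privileges) and chattr flags such as +C/+i.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            cloned = True
        except OSError:
            if not allow_fallback:
                raise
            shutil.copyfileobj(fsrc, fdst)  # plain data copy
            cloned = False
    # Mode bits, atime/mtime and (on Linux, where possible) xattrs.
    shutil.copystat(src, dst)
    return cloned
```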

I personally care less about the atomicity of the operation than the 
fact that it actually preserves _everything_ (with the likely exception 
of EVM and IMA xattrs, but _nothing_ should be preserving those).  IOW, 
I would be perfectly fine with something that does this in the kernel 
but returns -EWHATEVER if there are open files below the source 
directory and blocks modification to them until the switch is done.
> 
>> Users can convert with the scripts mentioned in this
>> thread (https://www.spinics.net/lists/linux-btrfs/msg33252.html), but
>> wouldn't it be easier to use an off-the-shelf btrfs subcommand?
> 
> Adding a subcommand would work, though I'd rather avoid reimplementing
> 'cp -ax' or 'rsync -ax'.  We want to copy the files preserving all
> attributes, with reflink, and be able to identify partially synced
> files, and not cross the mountpoints or subvolumes.
> 
> The middle step with snapshotting the containing subvolume before
> syncing the data is also a valid option, but not always necessary.
>
>> After an initial consideration, our implementation is broadly divided
>> into the following steps:
>> 1. Freeze the filesystem or set the subvolume above the source directory
>> to read-only;
> 
> Freezing the filesystem will freeze all IO, so this would not work, but
> I understand what you mean. The file data are synced before the snapshot
> is taken, but nothing prevents applications from continuing to write data.
>
> Open and live files are a problem, and I don't see a nice solution here.
> 
>> 2. Perform a pre-check, for example, check whether the conversion
>> would require creating a cross-device link;
>
> Cross-device links are not a problem as long as we use 'cp', i.e. the
> manual creation of files in the target.
Avoiding this would be a nice side effect of having it in the kernel,
but of course that is mostly irrelevant because such a kernel-powered
solution would only be available on newer kernels.
> 
>> 3. Perform conversion, such as creating a new subvolume and moving the
>> contents of the source directory;
>> 4. Thaw the filesystem or restore the subvolume writable property.
>>
>> In fact, I am not so sure whether this use of freeze is appropriate
>> because the source directory the user needs to convert may be located
>> at / or /home and this pre-check and conversion process may take a long
>> time, which can leave some shells and graphical applications hung.
>
> I think the closest operation is a read-only remount, which is not
> always possible due to open files and can otherwise be considered quite an
> intrusive operation for the whole system. And the root filesystem cannot
> be easily remounted read-only in the systemd days anyway.
That's not exactly a systemd-specific thing (or a new thing for that
matter) unless you've got /var on a separate partition (and I know of no
distributions that do so without manual intervention).

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How about adding an ioctl to convert a directory to a subvolume?
  2017-11-28 18:48 ` David Sterba
  2017-11-28 19:54   ` Austin S. Hemmelgarn
@ 2017-11-28 20:04   ` Timofey Titovets
  2017-11-29 11:23   ` Lu Fengqi
  2 siblings, 0 replies; 11+ messages in thread
From: Timofey Titovets @ 2017-11-28 20:04 UTC (permalink / raw)
  To: David Sterba, Lu Fengqi, linux-btrfs

2017-11-28 21:48 GMT+03:00 David Sterba <dsterba@suse.cz>:
> On Mon, Nov 27, 2017 at 05:41:56PM +0800, Lu Fengqi wrote:
>> As we all know, under certain circumstances, it is more appropriate to
>> create some subvolumes rather than keep everything in the same
>> subvolume. As requirements change, the user may need to
>> convert an existing directory to a subvolume. For this reason, how about
>> adding an ioctl to convert a directory to a subvolume?
>
> I'd say it's too difficult to get everything right in the kernel. This
> can be done in userspace, with existing tools.
>
> The problem is that the conversion cannot be done atomically in most
> cases, so even if it's just one ioctl call, there are several possible
> intermediate states that would exist during the call. Reporting where
> the ioctl failed would need some extended error code semantics.
>
>> Users can convert with the scripts mentioned in this
>> thread (https://www.spinics.net/lists/linux-btrfs/msg33252.html), but
>> wouldn't it be easier to use an off-the-shelf btrfs subcommand?
>
> Adding a subcommand would work, though I'd rather avoid reimplementing
> 'cp -ax' or 'rsync -ax'.  We want to copy the files preserving all
> attributes, with reflink, and be able to identify partially synced
> files, and not cross the mountpoints or subvolumes.
>
> The middle step with snapshotting the containing subvolume before
> syncing the data is also a valid option, but not always necessary.
>
>> After an initial consideration, our implementation is broadly divided
>> into the following steps:
>> 1. Freeze the filesystem or set the subvolume above the source directory
>> to read-only;
>
> Freezing the filesystem will freeze all IO, so this would not work, but
> I understand what you mean. The file data are synced before the snapshot
> is taken, but nothing prevents applications from continuing to write data.
>
> Open and live files are a problem, and I don't see a nice solution here.
>
>> 2. Perform a pre-check, for example, check whether the conversion
>> would require creating a cross-device link;
>
> Cross-device links are not a problem as long as we use 'cp', i.e. the
> manual creation of files in the target.
>
>> 3. Perform conversion, such as creating a new subvolume and moving the
>> contents of the source directory;
>> 4. Thaw the filesystem or restore the subvolume writable property.
>>
>> In fact, I am not so sure whether this use of freeze is appropriate
>> because the source directory the user needs to convert may be located
>> at / or /home and this pre-check and conversion process may take a long
>> time, which can leave some shells and graphical applications hung.
>
> I think the closest operation is a read-only remount, which is not
> always possible due to open files and can otherwise be considered quite an
> intrusive operation for the whole system. And the root filesystem cannot
> be easily remounted read-only in the systemd days anyway.

My 2c:
if we are talking about a 'fast' conversion of a dir to a subvolume
(i.e. I like the idea of the ioctl call being fast), it could be done
like this (sorry if I misunderstood something and this is a rave, or
I'm crazy..):

To make the idea clearer, in userspace it could look like:
1. Create a snapshot of the parent subvol of that dir
2. Clean up all data in the snapshot, except the content of the dir
3. Move the content of that dir to the snapshot root
4. Replace the dir with that snapshot/subvol
i.e. no copying, no cp, only rename() and garbage collection.
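Steps 2 and 3 of the userspace variant above can be sketched with plain renames. A minimal Python sketch: `hoist_dir_to_root` is a hypothetical name, `root` is assumed to be a fresh writable snapshot of the parent subvolume, and the deletion of everything else is where the garbage-collection cost would be paid. Step 4, moving the resulting subvolume over the original directory, is not shown.

```python
import os
import shutil


def hoist_dir_to_root(root: str, rel_dir: str) -> None:
    """Leave only the contents of root/rel_dir in root, using renames for the
    kept data (nothing is copied). Everything else under root is deleted."""
    rel = os.path.normpath(rel_dir)
    if os.path.isabs(rel) or rel == "." or rel.split(os.sep)[0] == "..":
        raise ValueError("rel_dir must be a subdirectory of root")
    src = os.path.join(root, rel)
    if not os.path.isdir(src) or os.path.islink(src):
        raise NotADirectoryError(src)

    # Parking name that clashes neither with root's entries nor with the
    # entries that will be moved up.
    taken = set(os.listdir(root)) | set(os.listdir(src))
    park, n = ".hoist-tmp", 0
    while park in taken:
        n += 1
        park = f".hoist-tmp{n}"
    park_path = os.path.join(root, park)

    os.rename(src, park_path)  # park the wanted directory at the top level
    for name in os.listdir(root):  # clean up everything else
        if name == park:
            continue
        path = os.path.join(root, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.unlink(path)
    for name in os.listdir(park_path):  # move its contents to the root
        os.rename(os.path.join(park_path, name), os.path.join(root, name))
    os.rmdir(park_path)
```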

In the kernel, in "theory", it would look like:
1. Copy the subvol root inode
2. Replace the root inode with the target dir inode
3. Replace the target dir in the old subvol with the new subvol
4. GC the old dir content from the parent subvol, and GC all useless
content around the dir in the new subvol

That may be the fastest way for the user. It will not solve the problems
with open files etc., but it should be fast from the user's point of
view, and all the other stuff can simply be cleaned up in the background.

Thanks
-- 
Have a nice day,
Timofey.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: How about adding an ioctl to convert a directory to a subvolume?
  2017-11-28 18:48 ` David Sterba
  2017-11-28 19:54   ` Austin S. Hemmelgarn
  2017-11-28 20:04   ` Timofey Titovets
@ 2017-11-29 11:23   ` Lu Fengqi
  2 siblings, 0 replies; 11+ messages in thread
From: Lu Fengqi @ 2017-11-29 11:23 UTC (permalink / raw)
  To: dsterba, linux-btrfs


On Tue, Nov 28, 2017 at 07:48:28PM +0100, David Sterba wrote:
>On Mon, Nov 27, 2017 at 05:41:56PM +0800, Lu Fengqi wrote:
>> As we all know, under certain circumstances, it is more appropriate to
>> create some subvolumes rather than keep everything in the same
>> subvolume. As requirements change, the user may need to
>> convert an existing directory to a subvolume. For this reason, how about
>> adding an ioctl to convert a directory to a subvolume?

Thanks for taking the time to reply.

>
>I'd say it's too difficult to get everything right in the kernel. This
>can be done in userspace, with existing tools.
>
>The problem is that the conversion cannot be done atomically in most
>cases, so even if it's just one ioctl call, there are several possible
>intermediate states that would exist during the call. Reporting where
>the ioctl failed would need some extended error code semantics.

Makes sense.

>
>> Users can convert with the scripts mentioned in this
>> thread (https://www.spinics.net/lists/linux-btrfs/msg33252.html), but
>> wouldn't it be easier to use an off-the-shelf btrfs subcommand?
>
>Adding a subcommand would work, though I'd rather avoid reimplementing
>'cp -ax' or 'rsync -ax'.  We want to copy the files preserving all
>attributes, with reflink, and be able to identify partially synced
>files, and not cross the mountpoints or subvolumes.

I agree that re-implementing cp is pointless. 'cp -ax
--reflink=always' already meets the above requirements, except for
identifying partially synced files.

In fact, I'm not sure what 'partially synced files' means. Files that
are only partially written to disk? If so, why do we need to identify
them? Or does this refer to the open and live files mentioned below?

>
>The middle step with snapshotting the containing subvolume before
>syncing the data is also a valid option, but not always necessary.
>
>> After an initial consideration, our implementation is broadly divided
>> into the following steps:
>> 1. Freeze the filesystem or set the subvolume above the source directory
>> to read-only;
>
>Freezing the filesystem will freeze all IO, so this would not work, but
>I understand what you mean. The file data are synced before the snapshot
>is taken, but nothing prevents applications from continuing to write data.
>
>Open and live files are a problem, and I don't see a nice solution here.
>
>> 2. Perform a pre-check, for example, check whether the conversion
>> would require creating a cross-device link;
>
>Cross-device links are not a problem as long as we use 'cp', i.e. the
>manual creation of files in the target.

└── subvol_1
    ├── dir_1
    │   └── file_1_hard_link
    └── file_1

If we want to convert dir_1 to a new subvolume, we can't create the hard
link to file_1 in the new subvolume, so we must abort the conversion.
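This pre-check can be done by comparing each file's link count with the number of names found for the same inode inside the directory. A minimal Python sketch; `hard_links_crossing` is a hypothetical name, and its result only reflects one moment in time, since links can be added or removed while it runs.

```python
import os
import stat
from collections import Counter
from typing import Dict, List, Tuple


def hard_links_crossing(top: str) -> List[str]:
    """Return regular files under `top` (relative paths) that also have hard
    links outside `top`: their st_nlink exceeds the number of names found for
    the same inode inside `top`. Such links cannot be recreated once `top`
    becomes a separate subvolume, so the conversion should abort."""
    names_inside: Counter = Counter()
    example: Dict[Tuple[int, int], Tuple[str, int]] = {}
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode) or st.st_nlink < 2:
                continue  # symlinks, special files and single-link files
            key = (st.st_dev, st.st_ino)
            names_inside[key] += 1
            example.setdefault(key, (os.path.relpath(path, top), st.st_nlink))
    return sorted(rel for key, (rel, nlink) in example.items()
                  if names_inside[key] < nlink)
```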

>
>> 3. Perform conversion, such as creating a new subvolume and moving the
>> contents of the source directory;
>> 4. Thaw the filesystem or restore the subvolume writable property.
>> 
>> In fact, I am not so sure whether this use of freeze is appropriate
>> because the source directory the user needs to convert may be located
>> at / or /home and this pre-check and conversion process may take a long
>> time, which can leave some shells and graphical applications hung.
>
>I think the closest operation is a read-only remount, which is not
>always possible due to open files and can otherwise be considered quite an
>intrusive operation for the whole system. And the root filesystem cannot
>be easily remounted read-only in the systemd days anyway.
>
>

-- 
Thanks,
Lu



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-11-29 11:24 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-27  9:41 How about adding an ioctl to convert a directory to a subvolume? Lu Fengqi
2017-11-27 10:13 ` Qu Wenruo
2017-11-27 13:02   ` Austin S. Hemmelgarn
2017-11-27 13:17     ` Qu Wenruo
2017-11-27 13:49       ` Austin S. Hemmelgarn
2017-11-28  8:29   ` Lu Fengqi
2017-11-28  8:35     ` Qu Wenruo
2017-11-28 18:48 ` David Sterba
2017-11-28 19:54   ` Austin S. Hemmelgarn
2017-11-28 20:04   ` Timofey Titovets
2017-11-29 11:23   ` Lu Fengqi
