* [btrfs-progs] Bug in mkfs.btrfs -r
@ 2017-08-31 17:27 Goffredo Baroncelli
  2017-08-31 18:49 ` Austin S. Hemmelgarn
  2017-09-01  0:13 ` Qu Wenruo
  0 siblings, 2 replies; 15+ messages in thread
From: Goffredo Baroncelli @ 2017-08-31 17:27 UTC (permalink / raw)
  To: linux-btrfs

Hi All,


I found a bug in mkfs.btrfs when the '-r' option is used: it seems that the full disk is not visible.

$ uname -a
Linux venice.bhome 4.12.8 #268 SMP Thu Aug 17 09:03:26 CEST 2017 x86_64 GNU/Linux
$ btrfs --version
btrfs-progs v4.12


--- First try without '-r' (/dev/sda4 is about 80GB)

$ sudo mkfs.btrfs  -f /dev/sda4
btrfs-progs v4.12
See http://btrfs.wiki.kernel.org for more information.

Label:              (null)
UUID:               6f11971d-a945-4f33-8750-c16a7438a15d
Node size:          16384
Sector size:        4096
Filesystem size:    83.73GiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         DUP               1.00GiB
  System:           DUP               8.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1    83.73GiB  /dev/sda4


The whole disk (~80GB) is visible.


---- Now try with '-r'

$ sudo mkfs.btrfs  -r /tmp/test/ -f /dev/sda4
btrfs-progs v4.12
See http://btrfs.wiki.kernel.org for more information.

Making image is completed.
Label:              (null)
UUID:               60ea1c38-e5b1-403d-8a25-1cac2258922d
Node size:          16384
Sector size:        4096
Filesystem size:    52.00MiB
Block group profiles:
  Data:             single           29.19MiB
  Metadata:         DUP               5.19MiB
  System:           DUP               4.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1    52.00MiB  /dev/sda4


where /tmp/test contains:
$ ls -l /tmp/test/
total 16392
-rw-r--r-- 1 ghigo ghigo        5 Aug 31 18:26 123
-rw-r--r-- 1 ghigo ghigo 16777216 Aug 31 18:26 123456
-rw-r--r-- 1 ghigo ghigo        5 Aug 31 18:26 456


BTRFS sees only 52MB instead of ~80GB.

Even if I mount and unmount the filesystem, nothing changes.

$ sudo mount /dev/sda4 /mnt/other/
$ echo 123 | sudo tee /mnt/other/9999
123
$ sync
$ sudo umount /mnt/other 
$ sudo mount /dev/sda4 /mnt/other/

Below is the output of "btrfs fi us":

$ sudo btrfs fi us /mnt/other/
Overall:
    Device size:		  52.00MiB
    Device allocated:		  52.00MiB
    Device unallocated:		     0.00B
    Device missing:		     0.00B
    Used:			  16.16MiB
    Free (estimated):		  13.19MiB	(min: 13.19MiB)
    Data ratio:			      1.00
    Metadata ratio:		      1.00
    Global reserve:		  16.00MiB	(used: 0.00B)

Data,single: Size:29.19MiB, Used:16.00MiB
   /dev/sda4	  29.19MiB

Metadata,single: Size:18.81MiB, Used:144.00KiB
   /dev/sda4	  18.81MiB

System,single: Size:4.00MiB, Used:16.00KiB
   /dev/sda4	   4.00MiB

Unallocated:
   /dev/sda4	     0.00B



And a balance is impossible

$ sudo btrfs bala start --full-balance /mnt/other/
ERROR: error during balancing '/mnt/other/': No space left on device
There may be more info in syslog - try dmesg | tail

$ dmesg | tail
[ 2034.684649] BTRFS: device fsid 60ea1c38-e5b1-403d-8a25-1cac2258922d devid 1 transid 7 /dev/sda4
[ 2140.835629] BTRFS info (device sda4): disk space caching is enabled
[ 2140.835632] BTRFS info (device sda4): has skinny extents
[ 2140.835633] BTRFS info (device sda4): flagging fs with big metadata feature
[ 2140.837381] BTRFS info (device sda4): creating UUID tree
[ 2171.646349] BTRFS info (device sda4): disk space caching is enabled
[ 2171.646362] BTRFS info (device sda4): has skinny extents
[ 2273.696914] BTRFS info (device sda4): relocating block group 20512768 flags data
[ 2273.721995] BTRFS info (device sda4): relocating block group 9633792 flags data
[ 2273.746950] BTRFS info (device sda4): 6 enospc errors during balance


I tried several btrfs-progs versions (going back as far as v0.20-rc1, from 2012!), and the behavior is the same: with '-r' the disk is not fully used. So I suppose it is a kernel problem (IIRC the kernel should "complete" the mkfs at the first mount). Any ideas?

BR
G.Baroncelli

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Re: [btrfs-progs] Bug in mkfs.btrfs -r
  2017-08-31 17:27 [btrfs-progs] Bug in mkfs.btrfs -r Goffredo Baroncelli
@ 2017-08-31 18:49 ` Austin S. Hemmelgarn
  2017-08-31 20:29   ` Goffredo Baroncelli
  2017-09-01  0:13 ` Qu Wenruo
  1 sibling, 1 reply; 15+ messages in thread
From: Austin S. Hemmelgarn @ 2017-08-31 18:49 UTC (permalink / raw)
  To: kreijack, linux-btrfs

On 2017-08-31 13:27, Goffredo Baroncelli wrote:
> Hi All,
> 
> 
> I found a bug in mkfs.btrfs, when it is used the option '-r'. It seems that it is not visible the full disk.
> 
> $ uname -a
> Linux venice.bhome 4.12.8 #268 SMP Thu Aug 17 09:03:26 CEST 2017 x86_64 GNU/Linux
> $ btrfs --version
> btrfs-progs v4.12
As far as I understand it, this is intended behavior.  Tools that offer 
equivalent options (genext2fs for example) are designed to generate 
pre-packaged system images that can then be resized to fit the target 
device's space.  As an example use case, I do full system image updates 
on some of my systems (that is, I keep per-system configuration to a 
minimum and replace the entire root filesystem when I update the 
system).  I use this option to generate a base image, which then gets 
automatically resized by my update scripts to fill the partition during 
the update process.
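
To make that concrete, the flow is roughly the following (device names 
and the mount point here are just examples):

# mkfs.btrfs -r /path/to/rootfs-tree -f /dev/sdX1   # build the base image from a directory tree
# mount /dev/sdX1 /mnt
# btrfs filesystem resize max /mnt                  # grow the filesystem to fill the partition
# umount /mnt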

Overall, this could probably stand to be documented better though (I'll 
look at writing a patch to update the documentation to clarify this when 
I have some spare time over the weekend).
> 
> 
> --- First try without '-r' (/dev/sda is about 80GB)
> 
> $ sudo mkfs.btrfs  -f /dev/sda4
> btrfs-progs v4.12
> See http://btrfs.wiki.kernel.org for more information.
> 
> Label:              (null)
> UUID:               6f11971d-a945-4f33-8750-c16a7438a15d
> Node size:          16384
> Sector size:        4096
> Filesystem size:    83.73GiB
> Block group profiles:
>    Data:             single            8.00MiB
>    Metadata:         DUP               1.00GiB
>    System:           DUP               8.00MiB
> SSD detected:       no
> Incompat features:  extref, skinny-metadata
> Number of devices:  1
> Devices:
>     ID        SIZE  PATH
>      1    83.73GiB  /dev/sda4
> 
> 
> All the disk (~80GB) is visible
> 
> 
> ---- Now try with '-r'
> 
> $ sudo mkfs.btrfs  -r /tmp/test/ -f /dev/sda4
> btrfs-progs v4.12
> See http://btrfs.wiki.kernel.org for more information.
> 
> Making image is completed.
> Label:              (null)
> UUID:               60ea1c38-e5b1-403d-8a25-1cac2258922d
> Node size:          16384
> Sector size:        4096
> Filesystem size:    52.00MiB
> Block group profiles:
>    Data:             single           29.19MiB
>    Metadata:         DUP               5.19MiB
>    System:           DUP               4.00MiB
> SSD detected:       no
> Incompat features:  extref, skinny-metadata
> Number of devices:  1
> Devices:
>     ID        SIZE  PATH
>      1    52.00MiB  /dev/sda4
> 
> 
> where /tmp/test contains:
> $ ls -l /tmp/test/
> total 16392
> -rw-r--r-- 1 ghigo ghigo        5 Aug 31 18:26 123
> -rw-r--r-- 1 ghigo ghigo 16777216 Aug 31 18:26 123456
> -rw-r--r-- 1 ghigo ghigo        5 Aug 31 18:26 456
> 
> 
> BTRFS sees only 52MB instead of ~80GB.
> 
> Even if I try to mount and umount, the thing doesn't change.
> 
> $ sudo mount /dev/sda4 /mnt/other/
> $ echo 123 | sudo tee /mnt/other/9999
> 123
> $ sync
> $ sudo umount /mnt/other
> $ sudo mount /dev/sda4 /mnt/other/
> 
> Below an output of "btrfs fi us"
> 
> $ sudo btrfs fi us /mnt/other/
> Overall:
>      Device size:		  52.00MiB
>      Device allocated:		  52.00MiB
>      Device unallocated:		     0.00B
>      Device missing:		     0.00B
>      Used:			  16.16MiB
>      Free (estimated):		  13.19MiB	(min: 13.19MiB)
>      Data ratio:			      1.00
>      Metadata ratio:		      1.00
>      Global reserve:		  16.00MiB	(used: 0.00B)
> 
> Data,single: Size:29.19MiB, Used:16.00MiB
>     /dev/sda4	  29.19MiB
> 
> Metadata,single: Size:18.81MiB, Used:144.00KiB
>     /dev/sda4	  18.81MiB
> 
> System,single: Size:4.00MiB, Used:16.00KiB
>     /dev/sda4	   4.00MiB
> 
> Unallocated:
>     /dev/sda4	     0.00B
> 
> 
> 
> And a balance is impossible
> 
> $ sudo btrfs bala start --full-balance /mnt/other/
> ERROR: error during balancing '/mnt/other/': No space left on device
> There may be more info in syslog - try dmesg | tail
> 
> $ dmesg | tail
> [ 2034.684649] BTRFS: device fsid 60ea1c38-e5b1-403d-8a25-1cac2258922d devid 1 transid 7 /dev/sda4
> [ 2140.835629] BTRFS info (device sda4): disk space caching is enabled
> [ 2140.835632] BTRFS info (device sda4): has skinny extents
> [ 2140.835633] BTRFS info (device sda4): flagging fs with big metadata feature
> [ 2140.837381] BTRFS info (device sda4): creating UUID tree
> [ 2171.646349] BTRFS info (device sda4): disk space caching is enabled
> [ 2171.646362] BTRFS info (device sda4): has skinny extents
> [ 2273.696914] BTRFS info (device sda4): relocating block group 20512768 flags data
> [ 2273.721995] BTRFS info (device sda4): relocating block group 9633792 flags data
> [ 2273.746950] BTRFS info (device sda4): 6 enospc errors during balance
> 
> 
> I tried several btrfs-progs version ( I go back until v0.20-rc1: 2012!!!), and the behavior is the same: with '-r' the disk is not fully used. So I suppose that it is a kernel problem (IIRC the kernel should "complete" the mkfs at the first mount). Any idea ?
> 
> BR
> G.Baroncelli
> 



* Re: [btrfs-progs] Bug in mkfs.btrfs -r
  2017-08-31 18:49 ` Austin S. Hemmelgarn
@ 2017-08-31 20:29   ` Goffredo Baroncelli
  2017-09-01 11:49     ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 15+ messages in thread
From: Goffredo Baroncelli @ 2017-08-31 20:29 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, linux-btrfs

On 2017-08-31 20:49, Austin S. Hemmelgarn wrote:
> On 2017-08-31 13:27, Goffredo Baroncelli wrote:
>> Hi All,
>> 
>> 
>> I found a bug in mkfs.btrfs, when it is used the option '-r'. It
>> seems that it is not visible the full disk.
>> 
>> $ uname -a Linux venice.bhome 4.12.8 #268 SMP Thu Aug 17 09:03:26
>> CEST 2017 x86_64 GNU/Linux $ btrfs --version btrfs-progs v4.12
> As far as I understand it, this is intended behavior.  Tools that
> offer equivalent options (genext2fs for example) are designed to
> generate pre-packaged system images that can then be resized to fit
> the target device's space.  As an example use case, I do full system
> image updates on some of my systems (that is, I keep per-system
> configuration to a minimum and replace the entire root filesystem
> when I update the system) I use this option to generate a base-image,
> which then gets automatically resized by my update scripts to fill
> the partition during the update process.

Sorry, but I am a bit confused. If I run "mkfs.btrfs -r ..." on a partition, how can I detect the end of the filesystem in order to cut off the unused space?

From your explanation I should do:

# mkfs.btrfs -r <source> /dev/sdX

then

# dd if=/dev/sdX of=/tmp/image bs=1M count=NNNN

What do I have to put in NNNN?

genext2fs, in effect, works by generating a file. Instead, mkfs.btrfs seems to work only with disks (or with a file that has already been created)...
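
That is, today I would have to pre-create the file by hand, something 
like the following (the size is just an illustration):

$ truncate -s 512M /tmp/btrfs.img          # pre-create a sparse file of an arbitrary size
$ mkfs.btrfs -r /tmp/test/ /tmp/btrfs.img  # then point mkfs.btrfs -r at it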

> 
> Overall, this could probably stand to be documented better though
> (I'll look at writing a patch to update the documentation to clarify
> this when I have some spare time over the weekend).

This would be great. However, I think that some code should be updated in order to generate a file instead of relying on a block device.

BR
G.Baroncelli 
-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Re: [btrfs-progs] Bug in mkfs.btrfs -r
  2017-08-31 17:27 [btrfs-progs] Bug in mkfs.btrfs -r Goffredo Baroncelli
  2017-08-31 18:49 ` Austin S. Hemmelgarn
@ 2017-09-01  0:13 ` Qu Wenruo
  2017-09-01 11:28   ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2017-09-01  0:13 UTC (permalink / raw)
  To: kreijack, linux-btrfs



On 2017-09-01 01:27, Goffredo Baroncelli wrote:
> Hi All,
> 
> 
> I found a bug in mkfs.btrfs, when it is used the option '-r'. It seems that it is not visible the full disk.

Apart from the new bug you found, -r has several existing bugs.

For example, it will create dev extents starting from physical offset 0, 
while the kernel and mkfs avoid that range, as 0~1M on each device is 
reserved.

According to the code, -r modifies the chunk layout by itself, rather 
than following the traditional way the kernel does it.

I'll fix them (if I'm not a lazybone); until that fix, please don't use 
the -r option, as it's not well maintained or fully tested.

Thanks,
Qu

> 
> $ uname -a
> Linux venice.bhome 4.12.8 #268 SMP Thu Aug 17 09:03:26 CEST 2017 x86_64 GNU/Linux
> $ btrfs --version
> btrfs-progs v4.12
> 
> 
> --- First try without '-r' (/dev/sda is about 80GB)
> 
> $ sudo mkfs.btrfs  -f /dev/sda4
> btrfs-progs v4.12
> See http://btrfs.wiki.kernel.org for more information.
> 
> Label:              (null)
> UUID:               6f11971d-a945-4f33-8750-c16a7438a15d
> Node size:          16384
> Sector size:        4096
> Filesystem size:    83.73GiB
> Block group profiles:
>    Data:             single            8.00MiB
>    Metadata:         DUP               1.00GiB
>    System:           DUP               8.00MiB
> SSD detected:       no
> Incompat features:  extref, skinny-metadata
> Number of devices:  1
> Devices:
>     ID        SIZE  PATH
>      1    83.73GiB  /dev/sda4
> 
> 
> All the disk (~80GB) is visible
> 
> 
> ---- Now try with '-r'
> 
> $ sudo mkfs.btrfs  -r /tmp/test/ -f /dev/sda4
> btrfs-progs v4.12
> See http://btrfs.wiki.kernel.org for more information.
> 
> Making image is completed.
> Label:              (null)
> UUID:               60ea1c38-e5b1-403d-8a25-1cac2258922d
> Node size:          16384
> Sector size:        4096
> Filesystem size:    52.00MiB
> Block group profiles:
>    Data:             single           29.19MiB
>    Metadata:         DUP               5.19MiB
>    System:           DUP               4.00MiB
> SSD detected:       no
> Incompat features:  extref, skinny-metadata
> Number of devices:  1
> Devices:
>     ID        SIZE  PATH
>      1    52.00MiB  /dev/sda4
> 
> 
> where /tmp/test contains:
> $ ls -l /tmp/test/
> total 16392
> -rw-r--r-- 1 ghigo ghigo        5 Aug 31 18:26 123
> -rw-r--r-- 1 ghigo ghigo 16777216 Aug 31 18:26 123456
> -rw-r--r-- 1 ghigo ghigo        5 Aug 31 18:26 456
> 
> 
> BTRFS sees only 52MB instead of ~80GB.
> 
> Even if I try to mount and umount, the thing doesn't change.
> 
> $ sudo mount /dev/sda4 /mnt/other/
> $ echo 123 | sudo tee /mnt/other/9999
> 123
> $ sync
> $ sudo umount /mnt/other
> $ sudo mount /dev/sda4 /mnt/other/
> 
> Below an output of "btrfs fi us"
> 
> $ sudo btrfs fi us /mnt/other/
> Overall:
>      Device size:		  52.00MiB
>      Device allocated:		  52.00MiB
>      Device unallocated:		     0.00B
>      Device missing:		     0.00B
>      Used:			  16.16MiB
>      Free (estimated):		  13.19MiB	(min: 13.19MiB)
>      Data ratio:			      1.00
>      Metadata ratio:		      1.00
>      Global reserve:		  16.00MiB	(used: 0.00B)
> 
> Data,single: Size:29.19MiB, Used:16.00MiB
>     /dev/sda4	  29.19MiB
> 
> Metadata,single: Size:18.81MiB, Used:144.00KiB
>     /dev/sda4	  18.81MiB
> 
> System,single: Size:4.00MiB, Used:16.00KiB
>     /dev/sda4	   4.00MiB
> 
> Unallocated:
>     /dev/sda4	     0.00B
> 
> 
> 
> And a balance is impossible
> 
> $ sudo btrfs bala start --full-balance /mnt/other/
> ERROR: error during balancing '/mnt/other/': No space left on device
> There may be more info in syslog - try dmesg | tail
> 
> $ dmesg | tail
> [ 2034.684649] BTRFS: device fsid 60ea1c38-e5b1-403d-8a25-1cac2258922d devid 1 transid 7 /dev/sda4
> [ 2140.835629] BTRFS info (device sda4): disk space caching is enabled
> [ 2140.835632] BTRFS info (device sda4): has skinny extents
> [ 2140.835633] BTRFS info (device sda4): flagging fs with big metadata feature
> [ 2140.837381] BTRFS info (device sda4): creating UUID tree
> [ 2171.646349] BTRFS info (device sda4): disk space caching is enabled
> [ 2171.646362] BTRFS info (device sda4): has skinny extents
> [ 2273.696914] BTRFS info (device sda4): relocating block group 20512768 flags data
> [ 2273.721995] BTRFS info (device sda4): relocating block group 9633792 flags data
> [ 2273.746950] BTRFS info (device sda4): 6 enospc errors during balance
> 
> 
> I tried several btrfs-progs version ( I go back until v0.20-rc1: 2012!!!), and the behavior is the same: with '-r' the disk is not fully used. So I suppose that it is a kernel problem (IIRC the kernel should "complete" the mkfs at the first mount). Any idea ?
> 
> BR
> G.Baroncelli
> 


* Re: [btrfs-progs] Bug in mkfs.btrfs -r
  2017-09-01  0:13 ` Qu Wenruo
@ 2017-09-01 11:28   ` Austin S. Hemmelgarn
  2017-09-01 11:49     ` Qu Wenruo
  2017-09-01 11:54     ` Qu Wenruo
  0 siblings, 2 replies; 15+ messages in thread
From: Austin S. Hemmelgarn @ 2017-09-01 11:28 UTC (permalink / raw)
  To: Qu Wenruo, kreijack, linux-btrfs

On 2017-08-31 20:13, Qu Wenruo wrote:
> 
> 
> On 2017年09月01日 01:27, Goffredo Baroncelli wrote:
>> Hi All,
>>
>>
>> I found a bug in mkfs.btrfs, when it is used the option '-r'. It seems 
>> that it is not visible the full disk.
> 
> Despite the new bug you found, -r has several existing bugs.
Is this actually a bug though?  Every other filesystem creation tool 
that I know of that offers functionality like this generates the 
filesystem just large enough to contain the data you want in it, so I 
would argue that making this use the whole device is actually breaking 
consistency with other tools, not to mention removing functionality that 
is useful (even aside from the system image generation use case I 
mentioned, there are other practical applications; seed 'device' 
generation comes to mind).
> 
> For example it will create dev extent starting from physical offset 0, 
> while kernel and mkfs will avoid that range, as 0~1M on each device is 
> reserved.
> 
> According to the code, -r will modify chunk layout by itself, not the 
> traditional way kernel is doing.
> 
> I'll fix them (if I'm not a lazybone), before that fix, please don't use 
> -r option as it's not well maintained or fully tested.
FWIW, based on my own testing, filesystems generated with '-r' work just 
fine as long as you don't try to embed boot code in the FS itself.



* Re: [btrfs-progs] Bug in mkfs.btrfs -r
  2017-08-31 20:29   ` Goffredo Baroncelli
@ 2017-09-01 11:49     ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 15+ messages in thread
From: Austin S. Hemmelgarn @ 2017-09-01 11:49 UTC (permalink / raw)
  To: kreijack, linux-btrfs

On 2017-08-31 16:29, Goffredo Baroncelli wrote:
> On 2017-08-31 20:49, Austin S. Hemmelgarn wrote:
>> On 2017-08-31 13:27, Goffredo Baroncelli wrote:
>>> Hi All,
>>>
>>>
>>> I found a bug in mkfs.btrfs, when it is used the option '-r'. It
>>> seems that it is not visible the full disk.
>>>
>>> $ uname -a Linux venice.bhome 4.12.8 #268 SMP Thu Aug 17 09:03:26
>>> CEST 2017 x86_64 GNU/Linux $ btrfs --version btrfs-progs v4.12
>> As far as I understand it, this is intended behavior.  Tools that
>> offer equivalent options (genext2fs for example) are designed to
>> generate pre-packaged system images that can then be resized to fit
>> the target device's space.  As an example use case, I do full system
>> image updates on some of my systems (that is, I keep per-system
>> configuration to a minimum and replace the entire root filesystem
>> when I update the system) I use this option to generate a base-image,
>> which then gets automatically resized by my update scripts to fill
>> the partition during the update process.
> 
> Sorry, but I am a bit confused. If I run "mkfs.btrfs -r ...." on a partition... how I can detect the end of the filesystem in order to cut the unused space ?
> 
>  From your explanation I should do
> 
> # mkfs.btrfs -r <source> /dev/sdX
> 
> then
> 
> # dd if=/dev/sdX of=/tmp/image bs=1M count=NNNN
> 
> What I have to put in NNNN ?
Mount the filesystem, and see what size `btrfs filesystem show` reports 
for the device, then go just over that (to account for rounding).
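
For example (sizes here are only illustrative, taken from your report):

# mount /dev/sda4 /mnt
# btrfs filesystem show /mnt       # note the per-device size, e.g. 52.00MiB for devid 1
# umount /mnt
# dd if=/dev/sda4 of=/tmp/image bs=1M count=56   # a little more than 52MiB, to be safe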
> 
> genext2fs in effect works generating a file. Instead mkfs.btrfs seems to work only with disks (or file already created)...
Most mkfs tools require a file to already exist because they were 
designed to create fixed size filesystem images.
> 
>>
>> Overall, this could probably stand to be documented better though
>> (I'll look at writing a patch to update the documentation to clarify
>> this when I have some spare time over the weekend).
> 
> This would be great. However I think that some code should be update in order to generate a file instead of rely on a block device.
> 
> BR
> G.Baroncelli
> 



* Re: [btrfs-progs] Bug in mkfs.btrfs -r
  2017-09-01 11:28   ` Austin S. Hemmelgarn
@ 2017-09-01 11:49     ` Qu Wenruo
  2017-09-01 12:05       ` Austin S. Hemmelgarn
  2017-09-01 11:54     ` Qu Wenruo
  1 sibling, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2017-09-01 11:49 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, kreijack, linux-btrfs



On 2017-09-01 19:28, Austin S. Hemmelgarn wrote:
> On 2017-08-31 20:13, Qu Wenruo wrote:
>>
>>
>> On 2017年09月01日 01:27, Goffredo Baroncelli wrote:
>>> Hi All,
>>>
>>>
>>> I found a bug in mkfs.btrfs, when it is used the option '-r'. It 
>>> seems that it is not visible the full disk.
>>
>> Despite the new bug you found, -r has several existing bugs.
> Is this actually a bug though?  Every other filesystem creation  tool 
> that I know of that offers functionality like this generates the 
> filesystem just large enough to contain the data you want in it, so I 
> would argue that making this use the whole device is actually breaking 
> consistency with other tools, not to mention removing functionality that 
> is useful (even aside from the system image generation use case I 
> mentioned, there are other practical applications (seed 'device' 
> generation comes to mind).

Well, then it's a documentation bug.

And I'm not sure the chunk size is correct or optimal.
Even for btrfs-convert, which makes data chunks very scattered, we still 
try to make a large chunk to cover the scattered data extents.

At least to me, that's not the case for chunks created by the -r option.

BTW, a seed device is RO anyway, so how much or how little spare space we 
have is not a problem at all.

So to me, even if -r follows other tools, we should follow the normal 
extent allocator behavior to create data/metadata, and then set the 
device size to the end of its dev extents.

>>
>> For example it will create dev extent starting from physical offset 0, 
>> while kernel and mkfs will avoid that range, as 0~1M on each device is 
>> reserved.
>>
>> According to the code, -r will modify chunk layout by itself, not the 
>> traditional way kernel is doing.
>>
>> I'll fix them (if I'm not a lazybone), before that fix, please don't 
>> use -r option as it's not well maintained or fully tested.
> FWIW, based on my own testing, filesystems generated with '-r' work just 
> fine as long as you don't try to embed boot code in the FS itself.

It works fine because the btrfs extent allocator will try to avoid the 
superblock. But that doesn't mean we should put dev extents into the 
0~1M range.

In fact, there is a deprecated mount option, alloc_start, to set how 
many bytes we should reserve for *each* device.
And since it's deprecated, we'd better follow the 1M reservation for 
each device.

Anyway, kernel balance and plain mkfs won't create a chunk stripe in the 
0~1M range of each device; mkfs -r should follow that too.

Thanks,
Qu

> 


* Re: [btrfs-progs] Bug in mkfs.btrfs -r
  2017-09-01 11:28   ` Austin S. Hemmelgarn
  2017-09-01 11:49     ` Qu Wenruo
@ 2017-09-01 11:54     ` Qu Wenruo
  1 sibling, 0 replies; 15+ messages in thread
From: Qu Wenruo @ 2017-09-01 11:54 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, kreijack, linux-btrfs



On 2017-09-01 19:28, Austin S. Hemmelgarn wrote:
> On 2017-08-31 20:13, Qu Wenruo wrote:
>>
>>
>> On 2017年09月01日 01:27, Goffredo Baroncelli wrote:
>>> Hi All,
>>>
>>>
>>> I found a bug in mkfs.btrfs, when it is used the option '-r'. It 
>>> seems that it is not visible the full disk.
>>
>> Despite the new bug you found, -r has several existing bugs.
> Is this actually a bug though?  Every other filesystem creation  tool 
> that I know of that offers functionality like this generates the 
> filesystem just large enough to contain the data you want in it,

At least, I tried mkfs.ext4 with an almost empty directory (only one 
512K file); after mounting the fs, there is still over 900M of available 
space.

Even though mkfs.ext4 doesn't explain much about its -d option, I think 
that's not the case, at least for the -d option alone.
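
(The test was roughly the following; paths are just examples, and the 1G 
target size is the one I used:)

$ truncate -s 1G /tmp/ext4.img
$ mkdir /tmp/rootdir && dd if=/dev/zero of=/tmp/rootdir/file bs=512K count=1
$ mkfs.ext4 -d /tmp/rootdir /tmp/ext4.img
# mount -o loop /tmp/ext4.img /mnt && df -h /mnt   # still shows 900M+ available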

Thanks,
Qu

> so I 
> would argue that making this use the whole device is actually breaking 
> consistency with other tools, not to mention removing functionality that 
> is useful (even aside from the system image generation use case I 
> mentioned, there are other practical applications (seed 'device' 
> generation comes to mind).
>>
>> For example it will create dev extent starting from physical offset 0, 
>> while kernel and mkfs will avoid that range, as 0~1M on each device is 
>> reserved.
>>
>> According to the code, -r will modify chunk layout by itself, not the 
>> traditional way kernel is doing.
>>
>> I'll fix them (if I'm not a lazybone), before that fix, please don't 
>> use -r option as it's not well maintained or fully tested.
> FWIW, based on my own testing, filesystems generated with '-r' work just 
> fine as long as you don't try to embed boot code in the FS itself.
> 


* Re: [btrfs-progs] Bug in mkfs.btrfs -r
  2017-09-01 11:49     ` Qu Wenruo
@ 2017-09-01 12:05       ` Austin S. Hemmelgarn
  2017-09-01 12:19         ` Qu Wenruo
  0 siblings, 1 reply; 15+ messages in thread
From: Austin S. Hemmelgarn @ 2017-09-01 12:05 UTC (permalink / raw)
  To: Qu Wenruo, kreijack, linux-btrfs

On 2017-09-01 07:49, Qu Wenruo wrote:
> 
> On 2017年09月01日 19:28, Austin S. Hemmelgarn wrote:
>> On 2017-08-31 20:13, Qu Wenruo wrote:
>>>
>>> On 2017年09月01日 01:27, Goffredo Baroncelli wrote:
>>>> Hi All,
>>>>
>>>> I found a bug in mkfs.btrfs, when it is used the option '-r'. It 
>>>> seems that it is not visible the full disk.
>>>
>>> Despite the new bug you found, -r has several existing bugs.
>> Is this actually a bug though?  Every other filesystem creation  tool 
>> that I know of that offers functionality like this generates the 
>> filesystem just large enough to contain the data you want in it, so I 
>> would argue that making this use the whole device is actually breaking 
>> consistency with other tools, not to mention removing functionality 
>> that is useful (even aside from the system image generation use case I 
>> mentioned, there are other practical applications (seed 'device' 
>> generation comes to mind).
> 
> Well, then documentation bug.
> 
> And I'm not sure the chunk size is correct or optimized.
> Even for btrfs-convert, which will make data chunks very scattered, we 
> still try to make a large chunk to cover scattered data extents.
For a one-shot or read-only filesystem though, a maximally sized chunk 
is probably suboptimal.  Suppose you use this to generate a base image 
for a system in the form of a seed device.  This actually ends up being 
a pretty easy way to get factory reset functionality.  It's also a case 
where you want the base image to take up as little space as possible, so 
that as much storage space as possible is left usable for the end user.  In that 
case, if your base image doesn't need an exact multiple of 1GB for data 
chunks, then using 1GB data chunks is not the best choice for at least 
the final data chunk (because the rest of that 1GB gets wasted).  A 
similar argument applies for metadata.
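
(For reference, the seed-device flow I have in mind is roughly this; 
device names are only examples:)

# btrfstune -S 1 /dev/sdX          # mark the finished base image as a seed device
# mount /dev/sdX /mnt              # seed devices mount read-only
# btrfs device add /dev/sdY /mnt   # add a writable device on top of the seed
# mount -o remount,rw /mnt         # new writes land on sdY; wipe sdY to 'factory reset'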
> 
> At least to me, it's not the case for chunk created by -r option.
> 
> BTW, seed device is RO anyway, how much or how less spare space we have 
> is not a problem at all.
That really depends on how you look at it.  Aside from the above 
example, there's the rather specific question of why you would not want 
to avoid wasting space.  The filesystem is read-only, which means that 
any 'free space' on that filesystem is completely unusable, can't be 
reclaimed for anything else, and in general is just a waste.
> 
> So to me, even follow other tools -r, we should follow the normal extent 
> allocator behavior to create data/metadata, and then set the device size 
> to end of its dev extents.
I don't entirely agree, but I think I've made my point well enough above.
>>>
>>> For example it will create dev extent starting from physical offset 
>>> 0, while kernel and mkfs will avoid that range, as 0~1M on each 
>>> device is reserved.
>>>
>>> According to the code, -r will modify chunk layout by itself, not the 
>>> traditional way kernel is doing.
>>>
>>> I'll fix them (if I'm not a lazybone), before that fix, please don't 
>>> use -r option as it's not well maintained or fully tested.
>> FWIW, based on my own testing, filesystems generated with '-r' work 
>> just fine as long as you don't try to embed boot code in the FS itself.
> 
> It works fine because btrfs extent allocator will try to avoid superblock.
> But it doesn't mean we should put dev extents into 0~1M range.
> 
> In fact, there is a deprecated mount option, alloc_start, to set how 
> many bytes we should reserve for *each* device.
> And since it's deprecated, we'd better follow the 1M reservation for 
> each device.
> 
> Anyway, kernel balance and plain mkfs won't create chunk stripe in 0~1M 
> range of each device, mkfs -r should also follow it.
Agreed, although the comments I made above about wasted space do still 
apply here (albeit to a lesser degree, 1MB is not going to make much of 
a difference for most people).



* Re: [btrfs-progs] Bug in mkfs.btrfs -r
  2017-09-01 12:05       ` Austin S. Hemmelgarn
@ 2017-09-01 12:19         ` Qu Wenruo
  2017-09-01 12:47           ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2017-09-01 12:19 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, kreijack, linux-btrfs



On 2017-09-01 20:05, Austin S. Hemmelgarn wrote:
> On 2017-09-01 07:49, Qu Wenruo wrote:
>>
>> On 2017年09月01日 19:28, Austin S. Hemmelgarn wrote:
>>> On 2017-08-31 20:13, Qu Wenruo wrote:
>>>>
>>>> On 2017年09月01日 01:27, Goffredo Baroncelli wrote:
>>>>> Hi All,
>>>>>
>>>>> I found a bug in mkfs.btrfs, when it is used the option '-r'. It 
>>>>> seems that it is not visible the full disk.
>>>>
>>>> Despite the new bug you found, -r has several existing bugs.
>>> Is this actually a bug though?  Every other filesystem creation  tool 
>>> that I know of that offers functionality like this generates the 
>>> filesystem just large enough to contain the data you want in it, so I 
>>> would argue that making this use the whole device is actually 
>>> breaking consistency with other tools, not to mention removing 
>>> functionality that is useful (even aside from the system image 
>>> generation use case I mentioned, there are other practical 
>>> applications (seed 'device' generation comes to mind).
>>
>> Well, then documentation bug.
>>
>> And I'm not sure the chunk size is correct or optimized.
>> Even for btrfs-convert, which will make data chunks very scattered, we 
>> still try to make a large chunk to cover scattered data extents.
> For a one-shot or read-only filesystem though, a maximally sized chunk 
> is probably suboptimal.

Not exactly.
The current kernel (and btrfs-progs also tries to follow the kernel chunk 
allocator's behavior) will not make a chunk larger than 10% of the RW 
space, so for a small filesystem the chunks won't be maximally sized.

>  Suppose you use this to generate a base image 
> for a system in the form of a seed device.  This actually ends up being 
> a pretty easy way to get factory reset functionality.  It's also a case 
> where you want the base image to take up as little space as possible, so 
> that the end-user usable storage space is as much as possible.  In that 
> case, if your base image doesn't need an exact multiple of 1GB for data 
> chunks, then using 1GB data chunks is not the best choice for at least 
> the final data chunk (because the rest of that 1GB gets wasted).  A 
> similar argument applies for metadata.

Yes, your example makes sense (despite the 10% limit I mentioned above).

The problem is, no one really knows how the image will be used.
Maybe it will be used as a normal btrfs (with fi resize), or for your 
purpose.

For the normal btrfs case, although it may not cause much of a problem, 
it will not be an optimized use case and may need an extra manual balance.

>>
>> At least to me, it's not the case for chunk created by -r option.
>>
>> BTW, seed device is RO anyway, how much or how less spare space we 
>> have is not a problem at all.
> That really depends on how you look at it.  Aside from the above 
> example, there's the rather specific question of why you would not want 
> to avoid wasting space.  The filesystem is read-only, which means that 
> any 'free space' on that filesystem is completely unusable, can't be 
> reclaimed for anything else, and in general is just a waste.

Still the same problem as above.
What if the seed device is detached and then used as a normal btrfs?

>>
>> So to me, even follow other tools -r, we should follow the normal 
>> extent allocator behavior to create data/metadata, and then set the 
>> device size to end of its dev extents.
> I don't entirely agree, but I think I've made my point well enough above.

Yes, you did make your point clear, and I agree that the use cases you 
mentioned exist and that the wasted space also exists.

But since we don't really know what the image will be used for, I prefer 
to keep everything using the kernel (or btrfs-progs) chunk allocator to 
make the behavior consistent.

So my point is more about consistent behavior between btrfs-progs and the 
kernel, and less maintenance.
(That is to say, my goal for mkfs.btrfs -r is just to be able to do mkfs, 
mount, and cp without privilege.)

Thanks,
Qu

>>>>
>>>> For example it will create dev extent starting from physical offset 
>>>> 0, while kernel and mkfs will avoid that range, as 0~1M on each 
>>>> device is reserved.
>>>>
>>>> According to the code, -r will modify chunk layout by itself, not 
>>>> the traditional way kernel is doing.
>>>>
>>>> I'll fix them (if I'm not a lazybone), before that fix, please don't 
>>>> use -r option as it's not well maintained or fully tested.
>>> FWIW, based on my own testing, filesystems generated with '-r' work 
>>> just fine as long as you don't try to embed boot code in the FS itself.
>>
>> It works fine because btrfs extent allocator will try to avoid 
>> superblock.
>> But it doesn't mean we should put dev extents into 0~1M range.
>>
>> In fact, there is a deprecated mount option, alloc_start, to set how 
>> many bytes we should reserve for *each* device.
>> And since it's deprecated, we'd better follow the 1M reservation for 
>> each device.
>>
>> Anyway, kernel balance and plain mkfs won't create chunk stripe in 
>> 0~1M range of each device, mkfs -r should also follow it.
> Agreed, although the comments I made above about wasted space do still 
> apply here (albeit to a lesser degree, 1MB is not going to make much of 
> a difference for most people).
> 


* Re: [btrfs-progs] Bug in mkfs.btrfs -r
  2017-09-01 12:19         ` Qu Wenruo
@ 2017-09-01 12:47           ` Austin S. Hemmelgarn
  2017-09-01 13:54             ` Qu Wenruo
  0 siblings, 1 reply; 15+ messages in thread
From: Austin S. Hemmelgarn @ 2017-09-01 12:47 UTC (permalink / raw)
  To: Qu Wenruo, kreijack, linux-btrfs

On 2017-09-01 08:19, Qu Wenruo wrote:
> 
> 
> On 2017年09月01日 20:05, Austin S. Hemmelgarn wrote:
>> On 2017-09-01 07:49, Qu Wenruo wrote:
>>>
>>> On 2017年09月01日 19:28, Austin S. Hemmelgarn wrote:
>>>> On 2017-08-31 20:13, Qu Wenruo wrote:
>>>>>
>>>>> On 2017年09月01日 01:27, Goffredo Baroncelli wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I found a bug in mkfs.btrfs, when it is used the option '-r'. It 
>>>>>> seems that it is not visible the full disk.
>>>>>
>>>>> Despite the new bug you found, -r has several existing bugs.
>>>> Is this actually a bug though?  Every other filesystem creation  
>>>> tool that I know of that offers functionality like this generates 
>>>> the filesystem just large enough to contain the data you want in it, 
>>>> so I would argue that making this use the whole device is actually 
>>>> breaking consistency with other tools, not to mention removing 
>>>> functionality that is useful (even aside from the system image 
>>>> generation use case I mentioned, there are other practical 
>>>> applications (seed 'device' generation comes to mind).
>>>
>>> Well, then documentation bug.
>>>
>>> And I'm not sure the chunk size is correct or optimized.
>>> Even for btrfs-convert, which will make data chunks very scattered, 
>>> we still try to make a large chunk to cover scattered data extents.
>> For a one-shot or read-only filesystem though, a maximally sized chunk 
>> is probably suboptimal.
> 
> Not exactly.
> Current kernel (and btrfs-progs also tries to follow kernel chunk 
> allocator's behavior) will not make a chunk larger than 10% of RW space.
> So for small filesystem chunk won't be too maximally sized.
Are you sure about this?  I've got a couple of sub 10GB BTRFS volumes 
that definitely have more than one 1GB data chunk.
> 
>>   Suppose you use this to generate a base image for a system in the 
>> form of a seed device.  This actually ends up being a pretty easy way 
>> to get factory reset functionality.  It's also a case where you want 
>> the base image to take up as little space as possible, so that the 
>> end-user usable storage space is as much as possible.  In that case, 
>> if your base image doesn't need an exact multiple of 1GB for data 
>> chunks, then using 1GB data chunks is not the best choice for at least 
>> the final data chunk (because the rest of that 1GB gets wasted).  A 
>> similar argument applies for metadata.
> 
> Yes, your example makes sense. (despite of above 10% limit I mentioned).
> 
> The problem is, no one really knows how the image will be used.
> Maybe it will be used as normal btrfs (with fi resize), or with your 
> purpose.
We can't save users from making poor choices.  If we could, we wouldn't 
have anywhere near as many e-mails on the list from people who are 
trying to recover data from their broken filesystems because they have 
no backups.

The only case I can find where '-r' is a win is when you need the 
filesystem to be as small as possible with no free space.  The moment 
you need free space, it's actually faster to just create the filesystem, 
resize it to the desired size, and then copy in your data (I've actually 
benchmarked this, and while it's not _much_ difference in time spent, 
there is a measurable difference, with my guess being that the 
allocation code is doing more work in userspace than in the kernel).  At 
a minimum, I think it's probably worth documenting this fact.
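
(The comparison I benchmarked was essentially '-r' versus something like 
the following; the size and paths are only examples:)

# mkfs.btrfs -f /dev/sdX1            # plain mkfs at the partition size
# mount /dev/sdX1 /mnt
# btrfs filesystem resize 8g /mnt    # optionally shrink to the size you actually want
# cp -a /path/to/rootfs-tree/. /mnt/
# umount /mnt
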
> For normal btrfs case, although it may not cause much problem, but it 
> will not be the optimized use case and may need extra manual balance.
Actually, until the first write to the filesystem, it will still be an 
optimal layout.  Once you start writing to any BTRFS filesystem that has 
an optimal layout though, it immediately becomes non-optimal, and 
there's not really anything we can do about that unless we allow chunks 
that are already allocated to be resized on the fly (which is a bad idea 
for multiple reasons).
> 
>>>
>>> At least to me, it's not the case for chunk created by -r option.
>>>
>>> BTW, seed device is RO anyway, how much or how less spare space we 
>>> have is not a problem at all.
>> That really depends on how you look at it.  Aside from the above 
>> example, there's the rather specific question of why you would not 
>> want to avoid wasting space.  The filesystem is read-only, which means 
>> that any 'free space' on that filesystem is completely unusable, can't 
>> be reclaimed for anything else, and in general is just a waste.
> 
> Still same problem above.
> What if the seed device is de-attached and then be used as normal btrfs?
> 
>>>
>>> So to me, even follow other tools -r, we should follow the normal 
>>> extent allocator behavior to create data/metadata, and then set the 
>>> device size to end of its dev extents.
>> I don't entirely agree, but I think I've made my point well enough above.
> 
> Yes, you did make your point clear, and I agree that use cases you 
> mentioned exist and wasted space also exists.
> 
> But since we don't really know what the image will be used, I prefer to 
> keep everything to use kernel (or btrfs-progs) chunk allocator to make 
> the behavior consistent.
> 
> So my point is more about consistent behavior of btrfs-progs and kernel, 
> and less maintenance.
> (That's to say, my goal for mkfs.btrfs -r is just to do mkfs, mount, cp 
> without privilege)
Perhaps we could add some tool then to take a BTRFS filesystem and 
restructure it to have an optimal layout?  On first examination, the 
resize command actually sounds like a reasonable place to do this, 
possibly adding a 'min' keyword (similar to 'max') that can also adjust 
chunk sizes to get the smallest possible filesystem.  The biggest thing 
I'm worried about here is that there are numerous use cases for optimal 
filesystems of minimal size, and changing the behavior of the -r option 
would remove the only currently available way to get such filesystems.
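
(Hypothetically, usage might then look something like this; the 'min' 
keyword does not exist yet, it is just the proposal above:)

# btrfs filesystem resize min /mnt   # hypothetical: repack chunks and shrink to the minimal size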


* Re: [btrfs-progs] Bug in mkfs.btrfs -r
  2017-09-01 12:47           ` Austin S. Hemmelgarn
@ 2017-09-01 13:54             ` Qu Wenruo
  2017-09-01 14:07               ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2017-09-01 13:54 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, kreijack, linux-btrfs



On 2017-09-01 20:47, Austin S. Hemmelgarn wrote:
> On 2017-09-01 08:19, Qu Wenruo wrote:
>>
>>
>> On 2017年09月01日 20:05, Austin S. Hemmelgarn wrote:
>>> On 2017-09-01 07:49, Qu Wenruo wrote:
>>>>
>>>> On 2017年09月01日 19:28, Austin S. Hemmelgarn wrote:
>>>>> On 2017-08-31 20:13, Qu Wenruo wrote:
>>>>>>
>>>>>> On 2017年09月01日 01:27, Goffredo Baroncelli wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I found a bug in mkfs.btrfs, when it is used the option '-r'. It 
>>>>>>> seems that it is not visible the full disk.
>>>>>>
>>>>>> Despite the new bug you found, -r has several existing bugs.
>>>>> Is this actually a bug though?  Every other filesystem creation 
>>>>> tool that I know of that offers functionality like this generates 
>>>>> the filesystem just large enough to contain the data you want in 
>>>>> it, so I would argue that making this use the whole device is 
>>>>> actually breaking consistency with other tools, not to mention 
>>>>> removing functionality that is useful (even aside from the system 
>>>>> image generation use case I mentioned, there are other practical 
>>>>> applications (seed 'device' generation comes to mind).
>>>>
>>>> Well, then documentation bug.
>>>>
>>>> And I'm not sure the chunk size is correct or optimized.
>>>> Even for btrfs-convert, which will make data chunks very scattered, 
>>>> we still try to make a large chunk to cover scattered data extents.
>>> For a one-shot or read-only filesystem though, a maximally sized 
>>> chunk is probably suboptimal.
>>
>> Not exactly.
>> Current kernel (and btrfs-progs also tries to follow kernel chunk 
>> allocator's behavior) will not make a chunk larger than 10% of RW space.
>> So for small filesystem chunk won't be too maximally sized.
> Are you sure about this?  I've got a couple of sub 10GB BTRFS volumes 
> that definitely have more than one 1GB data chunk.

Yes, check the following code:

         /* we don't want a chunk larger than 10% of writeable space */
         max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
                              max_chunk_size);
Which is in __btrfs_alloc_chunk() function in fs/btrfs/volumes.c

>>
>>>   Suppose you use this to generate a base image for a system in the 
>>> form of a seed device.  This actually ends up being a pretty easy way 
>>> to get factory reset functionality.  It's also a case where you want 
>>> the base image to take up as little space as possible, so that the 
>>> end-user usable storage space is as much as possible.  In that case, 
>>> if your base image doesn't need an exact multiple of 1GB for data 
>>> chunks, then using 1GB data chunks is not the best choice for at 
>>> least the final data chunk (because the rest of that 1GB gets 
>>> wasted).  A similar argument applies for metadata.
>>
>> Yes, your example makes sense. (despite of above 10% limit I mentioned).
>>
>> The problem is, no one really knows how the image will be used.
>> Maybe it will be used as normal btrfs (with fi resize), or with your 
>> purpose.
> We can't save users from making poor choices.  If we could, we wouldn't 
> have anywhere near as many e-mails on the list from people who are 
> trying to recover data from their broken filesystems because they have 
> no backups.
> 
> The only case I can find where '-r' is a win is when you need the 
> filesystem to be as small as possible with no free space.  The moment 
> you need free space, it's actually faster to just create the filesystem, 
> resize it to the desired size, and then copy in your data (I've actually 
> benchmarked this, and while it's not _much_ difference in time spent, 
> there is a measurable difference, with my guess being that the 
> allocation code is doing more work in userspace than in the kernel).  At 
> a minimum, I think it's probably worth documenting this fact.

I still remember that, some time ago, other guys told me the main 
advantage of -r is that we don't need root privilege to mount.

Anyway, documentation is important, but we need to first know the 
correct or designed behavior of -r.

At least the mkfs.ext4 -d option doesn't limit the size.
In my test, a 1G file with mkfs.ext4 -d still shows about 900M+ of 
available space.

>> For normal btrfs case, although it may not cause much problem, but it 
>> will not be the optimized use case and may need extra manual balance.
> Actually, until the first write to the filesystem, it will still be an 
> optimal layout.  Once you start writing to any BTRFS filesystem that has 
> an optimal layout though, it immediately becomes non-optimal, and 
> there's not really anything we can do about that unless we allow chunks 
> that are already allocated to be resized on the fly (which is a bad idea 
> for multiple reasons).
>>
>>>>
>>>> At least to me, it's not the case for chunk created by -r option.
>>>>
>>>> BTW, seed device is RO anyway, how much or how less spare space we 
>>>> have is not a problem at all.
>>> That really depends on how you look at it.  Aside from the above 
>>> example, there's the rather specific question of why you would not 
>>> want to avoid wasting space.  The filesystem is read-only, which 
>>> means that any 'free space' on that filesystem is completely 
>>> unusable, can't be reclaimed for anything else, and in general is 
>>> just a waste.
>>
>> Still same problem above.
>> What if the seed device is de-attached and then be used as normal btrfs?
>>
>>>>
>>>> So to me, even follow other tools -r, we should follow the normal 
>>>> extent allocator behavior to create data/metadata, and then set the 
>>>> device size to end of its dev extents.
>>> I don't entirely agree, but I think I've made my point well enough 
>>> above.
>>
>> Yes, you did make your point clear, and I agree that use cases you 
>> mentioned exist and wasted space also exists.
>>
>> But since we don't really know what the image will be used, I prefer 
>> to keep everything to use kernel (or btrfs-progs) chunk allocator to 
>> make the behavior consistent.
>>
>> So my point is more about consistent behavior of btrfs-progs and 
>> kernel, and less maintenance.
>> (That's to say, my goal for mkfs.btrfs -r is just to do mkfs, mount, 
>> cp without privilege)
> Perhaps we could add some tool then to take a BTRFS filesystem and 
> restructure it to have an optimal layout?  On first examination, the 
> resize command actually sounds like a reasonable place to do this, 
> possibly add a 'min' keyword (similar to 'max') that can also adjust 
> chunk sizes to get the smallest possible filesystem.  The biggest thing 
> I'm worried about here is that there are numerous use cases for optimal 
> filesystems of minimal size, and changing the behavior of the -r option 
> will remove the only currently available way to get such filesystems.

Yes, if we're going to cover all possible cases, we're doomed.

So I'll just make it as simple as possible for now.
If someone really wants to do that, the resize subcommand seems to be a 
good place to start.

Thanks,
Qu



* Re: [btrfs-progs] Bug in mkfs.btrfs -r
  2017-09-01 13:54             ` Qu Wenruo
@ 2017-09-01 14:07               ` Austin S. Hemmelgarn
  2017-09-02  4:03                 ` Duncan
  0 siblings, 1 reply; 15+ messages in thread
From: Austin S. Hemmelgarn @ 2017-09-01 14:07 UTC (permalink / raw)
  To: Qu Wenruo, kreijack, linux-btrfs

On 2017-09-01 09:54, Qu Wenruo wrote:
> 
> 
> On 2017年09月01日 20:47, Austin S. Hemmelgarn wrote:
>> On 2017-09-01 08:19, Qu Wenruo wrote:
>>>
>>>
>>> On 2017年09月01日 20:05, Austin S. Hemmelgarn wrote:
>>>> On 2017-09-01 07:49, Qu Wenruo wrote:
>>>>>
>>>>> On 2017年09月01日 19:28, Austin S. Hemmelgarn wrote:
>>>>>> On 2017-08-31 20:13, Qu Wenruo wrote:
>>>>>>>
>>>>>>> On 2017年09月01日 01:27, Goffredo Baroncelli wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I found a bug in mkfs.btrfs, when it is used the option '-r'. It 
>>>>>>>> seems that it is not visible the full disk.
>>>>>>>
>>>>>>> Despite the new bug you found, -r has several existing bugs.
>>>>>> Is this actually a bug though?  Every other filesystem creation 
>>>>>> tool that I know of that offers functionality like this generates 
>>>>>> the filesystem just large enough to contain the data you want in 
>>>>>> it, so I would argue that making this use the whole device is 
>>>>>> actually breaking consistency with other tools, not to mention 
>>>>>> removing functionality that is useful (even aside from the system 
>>>>>> image generation use case I mentioned, there are other practical 
>>>>>> applications (seed 'device' generation comes to mind).
>>>>>
>>>>> Well, then documentation bug.
>>>>>
>>>>> And I'm not sure the chunk size is correct or optimized.
>>>>> Even for btrfs-convert, which will make data chunks very scattered, 
>>>>> we still try to make a large chunk to cover scattered data extents.
>>>> For a one-shot or read-only filesystem though, a maximally sized 
>>>> chunk is probably suboptimal.
>>>
>>> Not exactly.
>>> Current kernel (and btrfs-progs also tries to follow kernel chunk 
>>> allocator's behavior) will not make a chunk larger than 10% of RW space.
>>> So for small filesystem chunk won't be too maximally sized.
>> Are you sure about this?  I've got a couple of sub 10GB BTRFS volumes 
>> that definitely have more than one 1GB data chunk.
> 
> Yes, check the following code:
> 
>          /* we don't want a chunk larger than 10% of writeable space */
>          max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
>                               max_chunk_size);
> Which is in __btrfs_alloc_chunk() function in fs/btrfs/volumes.c
Huh, I may have the remnants of an old bug present on those filesystems 
then; I'll have to look further into this.
> 
>>>
>>>>   Suppose you use this to generate a base image for a system in the 
>>>> form of a seed device.  This actually ends up being a pretty easy 
>>>> way to get factory reset functionality.  It's also a case where you 
>>>> want the base image to take up as little space as possible, so that 
>>>> the end-user usable storage space is as much as possible.  In that 
>>>> case, if your base image doesn't need an exact multiple of 1GB for 
>>>> data chunks, then using 1GB data chunks is not the best choice for 
>>>> at least the final data chunk (because the rest of that 1GB gets 
>>>> wasted).  A similar argument applies for metadata.
>>>
>>> Yes, your example makes sense. (despite of above 10% limit I mentioned).
>>>
>>> The problem is, no one really knows how the image will be used.
>>> Maybe it will be used as normal btrfs (with fi resize), or with your 
>>> purpose.
>> We can't save users from making poor choices.  If we could, we 
>> wouldn't have anywhere near as many e-mails on the list from people 
>> who are trying to recover data from their broken filesystems because 
>> they have no backups.
>>
>> The only case I can find where '-r' is a win is when you need the 
>> filesystem to be as small as possible with no free space.  The moment 
>> you need free space, it's actually faster to just create the 
>> filesystem, resize it to the desired size, and then copy in your data 
>> (I've actually benchmarked this, and while it's not _much_ difference 
>> in time spent, there is a measurable difference, with my guess being 
>> that the allocation code is doing more work in userspace than in the 
>> kernel).  At a minimum, I think it's probably worth documenting this 
>> fact.
> 
> I still remember some time ago, other guys told me that the main 
> advantage of -r is we don't need root privilege to mount.
That's a good point I hadn't thought of.  I'm used to working on single 
user systems (where I can just trap out to root to do stuff like that 
without any issue), or multi-user systems where I'm the admin (where I 
can also trap out to root to do that kind of thing with limited issues). 
  Getting a working FUSE module for BTRFS could help with this too 
though (and actually would probably not be hugely difficult, considering 
that we have most of the FS specific code in userspace already as part 
of btrfs-progs), but that's kind of beyond this discussion.
> 
> Anyway, documentation is important, but we need to first know the 
> correct or designed behavior of -r.
Agreed.
> 
> At least mkfs.ext4 -d option doesn't limit the size.
> In my test, 1G file with mkfs.ext -d still shows about 900M+ available 
> space.
That may just be them choosing to use whatever size the device has. 
They have limited incentive to do anything else, because genext2fs 
exists and covers the minimal filesystem generation side of things.
> 
>>> For normal btrfs case, although it may not cause much problem, but it 
>>> will not be the optimized use case and may need extra manual balance.
>> Actually, until the first write to the filesystem, it will still be an 
>> optimal layout.  Once you start writing to any BTRFS filesystem that 
>> has an optimal layout though, it immediately becomes non-optimal, and 
>> there's not really anything we can do about that unless we allow 
>> chunks that are already allocated to be resized on the fly (which is a 
>> bad idea for multiple reasons).
>>>
>>>>>
>>>>> At least to me, it's not the case for chunk created by -r option.
>>>>>
>>>>> BTW, a seed device is RO anyway, so how much or how little spare 
>>>>> space we have is not a problem at all.
>>>> That really depends on how you look at it.  Aside from the above 
>>>> example, there's the rather specific question of why you would not 
>>>> want to avoid wasting space.  The filesystem is read-only, which 
>>>> means that any 'free space' on that filesystem is completely 
>>>> unusable, can't be reclaimed for anything else, and in general is 
>>>> just a waste.
>>>
>>> Still the same problem as above.
>>> What if the seed device is detached and then used as a normal btrfs?
>>>
>>>>>
>>>>> So to me, even if we follow other tools' -r option, we should follow 
>>>>> the normal extent allocator behavior to create data/metadata chunks, 
>>>>> and then set the device size to the end of its dev extents.
>>>> I don't entirely agree, but I think I've made my point well enough 
>>>> above.
>>>
>>> Yes, you did make your point clear, and I agree that the use cases you 
>>> mentioned exist and that the wasted space is real.
>>>
>>> But since we don't really know what the image will be used for, I 
>>> prefer to keep everything using the kernel (or btrfs-progs) chunk 
>>> allocator so the behavior stays consistent.
>>>
>>> So my point is more about consistent behavior between btrfs-progs and 
>>> the kernel, and less maintenance.
>>> (That is to say, my goal for mkfs.btrfs -r is just to be able to do 
>>> mkfs, mount, and cp without privileges.)
>> Perhaps we could add some tool then to take a BTRFS filesystem and 
>> restructure it to have an optimal layout?  On first examination, the 
>> resize command actually sounds like a reasonable place to do this, 
>> possibly by adding a 'min' keyword (similar to 'max') that can also adjust 
>> chunk sizes to get the smallest possible filesystem.  The biggest 
>> thing I'm worried about here is that there are numerous use cases for 
>> optimal filesystems of minimal size, and changing the behavior of the 
>> -r option will remove the only currently available way to get such 
>> filesystems.
> 
> Yes, if we try to cover all possible cases, we're doomed.
> 
> So I'll just make it as simple as possible for now.
> If someone really wants to do that, the resize subcommand seems to be a 
> good place to start.
In the short term, perhaps we could add an option to mkfs.btrfs to use 
the current '-r' behavior.

Long term, I probably will look into adding something to do this to the 
resize command (because I can think of a few other cases where it's 
useful to have something like this), but it will probably be next year 
before I have a large enough block of free time to sit down and work on 
this.
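
Just to make the idea concrete, the usage I'm imagining would look 
something like this ('max' already exists; the 'min' keyword is purely 
hypothetical at this point):

$ btrfs filesystem resize max /mnt   # existing: grow to fill the device
$ btrfs filesystem resize min /mnt   # proposed, not implemented: shrink
                                     # chunks and dev extents to the
                                     # smallest possible layout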


* Re: [btrfs-progs] Bug in mkfs.btrfs -r
  2017-09-01 14:07               ` Austin S. Hemmelgarn
@ 2017-09-02  4:03                 ` Duncan
  2017-09-05  3:57                   ` Duncan
  0 siblings, 1 reply; 15+ messages in thread
From: Duncan @ 2017-09-02  4:03 UTC (permalink / raw)
  To: linux-btrfs

Austin S. Hemmelgarn posted on Fri, 01 Sep 2017 10:07:47 -0400 as
excerpted:

>  On 2017-09-01 09:54, Qu Wenruo wrote:
>> 
>> On 2017年09月01日 20:47, Austin S. Hemmelgarn wrote:

>>> On 2017-09-01 08:19, Qu Wenruo wrote:
>>>>
>>>> Current kernel (and btrfs-progs also tries to follow kernel chunk
>>>> allocator's behavior) will not make a chunk larger than 10% of RW
>>>> space.
>>>> So for small filesystem chunk won't be too maximally sized.

>>> Are you sure about this?  I've got a couple of sub 10GB BTRFS volumes
>>> that definitely have more than one 1GB data chunk.
>> 
>> Yes, check the following code:
>> 
>>          /* we don't want a chunk larger than 10% of writeable space */
>>          max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
>>                               max_chunk_size);
>>
>> Which is in __btrfs_alloc_chunk() function in fs/btrfs/volumes.c

> Huh, I may have the remnants of an old bug present on those filesystems
> then, I'll have to look further into this.

Please do.

I redid my /boot (and backups on other devices) when I upgraded to
all-ssd, and my /boot is now 512 MiB, with what appears to be a 64 MiB
first chunk (mixed-bg mode, so data/metadata), doubled due to dup mode to
128 MiB.  While that works out to 1/8 = 12.5% of the filesystem size,
it still rebalances just fine as long as there's still 128 MiB unallocated
to create the new chunks to write into.

But my earlier layout had a 256 MiB /boot (and backups), and the initial
chunk size was *STILL* 64 MiB, dup'd to 128 MiB, half the 256 MiB filesystem
size, so obviously no balance of that chunk was possible: there's not
enough space left, after the system chunk takes its byte as well, to write
the new copy of the chunk (and its dup) into.
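
The arithmetic, roughly (the system chunk size is from memory, so treat
it as approximate):

  existing mixed-bg chunk, dup:  64 MiB x 2 = 128 MiB
  relocation target, dup:        64 MiB x 2 = 128 MiB
  system chunk, dup:             several MiB more
  total needed:                  comfortably over the 256 MiB available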

Now, the last time I redid the old-layout /boot with a mkfs.btrfs was
several kernel and userspace cycles ago, so mkfs.btrfs might well have
changed and now creates smaller chunks on a 256 MiB filesystem, but it
sure was frustrating not to be able to rebalance that chunk, and I don't
/know/ that the bug was fixed, because I have the larger /boot now.  Tho
as I said even there it's apparently 1/8, 12.5%, larger than the 10%
quoted, yet I know 32 MiB and I believe even 16 MiB chunks are possible,
tho I'm not sure what the exact minimum is.

Anyway, be sure and check mixed-mode too, because that's where I had my
problems, tho I'm not sure if it's where you had yours.  But it could be
that the differing code path of mixed-mode misses that 10% check, which
would explain my problem, and possibly yours if that's what yours were.

Meanwhile, I might have some free time to do my own checks tomorrow.
It'd be worth it just to my own peace of mind to settle the issue,
since it frustrated me for so long.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: [btrfs-progs] Bug in mkfs.btrfs -r
  2017-09-02  4:03                 ` Duncan
@ 2017-09-05  3:57                   ` Duncan
  0 siblings, 0 replies; 15+ messages in thread
From: Duncan @ 2017-09-05  3:57 UTC (permalink / raw)
  To: linux-btrfs

Duncan posted on Sat, 02 Sep 2017 04:03:06 +0000 as excerpted:

> Austin S. Hemmelgarn posted on Fri, 01 Sep 2017 10:07:47 -0400 as
> excerpted:
> 
>>  On 2017-09-01 09:54, Qu Wenruo wrote:
>>> 
>>> On 2017年09月01日 20:47, Austin S. Hemmelgarn wrote:
> 
>>>> On 2017-09-01 08:19, Qu Wenruo wrote:
>>>>>
>>>>> Current kernel (and btrfs-progs also tries to follow kernel chunk
>>>>> allocator's behavior) will not make a chunk larger than 10% of RW
>>>>> space.
>>>>> So for small filesystem chunk won't be too maximally sized.
> 
>>>> Are you sure about this?  I've got a couple of sub 10GB BTRFS volumes
>>>> that definitely have more than one 1GB data chunk.
>>> 
>>> Yes, check the following code:
>>> 
>>>          /* we don't want a chunk larger than 10% of writeable space */
>>>          max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
>>>                               max_chunk_size);
>>>
>>> Which is in __btrfs_alloc_chunk() function in fs/btrfs/volumes.c
> 
>> Huh, I may have the remnants of an old bug present on those filesystems
>> then, I'll have to look further into this.
> 
> Please do.

> Meanwhile, I might have some free time to do my own checks tomorrow.
> It'd be worth it just to my own peace of mind to settle the issue,
> since it frustrated me for so long.

Just finished testing, and no, as of btrfs-progs 4.12, mkfs.btrfs does 
*NOT* always limit chunk sizes to 1/10 of the filesystem size, even though 
smaller chunk sizes are available.

Try this.

256 MiB partition.

mkfs.btrfs -d dup -m dup -O extref,skinny-metadata,no-holes --mixed <device>

mkfs.btrfs will create a 64 MiB mixed-mode chunk, a quarter of the 256 MiB 
filesystem size, duped to 128 MiB, half the filesystem size.

That chunk cannot be balanced because there's no room for the write-into 
chunk and its dup on the filesystem, due to the system chunk of several 
MiB taking up part of the other half.

This is despite further writes creating smaller chunks, so it's definitely 
possible to have 32 MiB and, I believe, 16 MiB chunks (at least, I don't 
know if smaller is possible).
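
If anyone wants to reproduce this without sacrificing a partition, 
something like the following should do it (the loop device and image path 
are just placeholders, not my exact test):

$ truncate -s 256M btrfs-test.img
$ sudo losetup -f --show btrfs-test.img        # prints e.g. /dev/loop0
$ sudo mkfs.btrfs -d dup -m dup --mixed \
       -O extref,skinny-metadata,no-holes /dev/loop0
$ sudo btrfs inspect-internal dump-tree -t 3 /dev/loop0   # 3 = chunk tree
$ sudo mount /dev/loop0 /mnt
$ sudo btrfs balance start --full-balance /mnt
# expect ENOSPC here if the first chunk really is 64 MiB dup'd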

So as of -progs 4.12 at least, whatever code is there to try to keep the 
mkfs.btrfs max chunk size under 10% of the filesystem size is broken, at 
least for some configurations (check mixed-mode, sub-GiB).

So I'm glad I upsized /boot to half a GiB when I upgraded SSDs and redid 
my layout.  At least with half a gig, 64 MiB (still 12.5%, over the 10% 
limit) doubled to 128 MiB is still only a quarter of the filesystem size, so 
rebalancing is actually possible.

(FWIW my /var/log is still 256 MiB per device, but it's raid1, so only a 
single copy of each chunk per device, and the 64 MiB chunk size is only a 
quarter of the device, so it can at least still be balanced, even with 
chunks more than double the broken 10% ceiling.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



end of thread, other threads:[~2017-09-05  3:58 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
2017-08-31 17:27 [btrfs-progs] Bug in mkfs.btrfs -r Goffredo Baroncelli
2017-08-31 18:49 ` Austin S. Hemmelgarn
2017-08-31 20:29   ` Goffredo Baroncelli
2017-09-01 11:49     ` Austin S. Hemmelgarn
2017-09-01  0:13 ` Qu Wenruo
2017-09-01 11:28   ` Austin S. Hemmelgarn
2017-09-01 11:49     ` Qu Wenruo
2017-09-01 12:05       ` Austin S. Hemmelgarn
2017-09-01 12:19         ` Qu Wenruo
2017-09-01 12:47           ` Austin S. Hemmelgarn
2017-09-01 13:54             ` Qu Wenruo
2017-09-01 14:07               ` Austin S. Hemmelgarn
2017-09-02  4:03                 ` Duncan
2017-09-05  3:57                   ` Duncan
2017-09-01 11:54     ` Qu Wenruo
