[RFC PATCH] btrfs-progs: mkfs: Enforce 4k sectorsize by default

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC PATCH] btrfs-progs: mkfs: Enforce 4k sectorsize by default
@ 2023-03-21  2:03 Neal Gompa
  2023-03-21  3:03 ` Anand Jain
  2023-03-21  3:21 ` Qu Wenruo
  0 siblings, 2 replies; 9+ messages in thread
From: Neal Gompa @ 2023-03-21  2:03 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba, Hector Martin, Davide Cavalca, Neal Gompa

We have had working subpage support in Btrfs for many cycles now.
Generally, we do not want people creating filesystems by default
with non-4k sectorsizes since it creates portability problems.

Signed-off-by: Neal Gompa <neal@gompa.dev>
---
 Documentation/Subpage.rst    |  9 +++++----
 Documentation/mkfs.btrfs.rst | 11 +++++++----
 mkfs/main.c                  |  2 +-
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/Documentation/Subpage.rst b/Documentation/Subpage.rst
index 21a495d5..d7e9b241 100644
--- a/Documentation/Subpage.rst
+++ b/Documentation/Subpage.rst
@@ -9,10 +9,11 @@ to the exactly same size of the block and page. On x86_64 this is typically
 pages, like 64KiB on 64bit ARM or PowerPC. This means filesystems created
 with 64KiB sector size cannot be mounted on a system with 4KiB page size.
 
-While with subpage support, systems with 64KiB page size can create (still needs
-"-s 4k" option for mkfs.btrfs) and mount filesystems with 4KiB sectorsize,
-allowing us to push 4KiB sectorsize as default sectorsize for all platforms in the
-near future.
+Since v6.3, filesystems are created with a 4KiB sectorsize by default,
+though it remains possible to create filesystems with other page sizes
+(such as 64KiB with the "-s 64k" option for mkfs.btrfs). This ensures that
+new filesystems are compatible across other architecture variants using
+larger page sizes.
 
 Requirements, limitations
 -------------------------
diff --git a/Documentation/mkfs.btrfs.rst b/Documentation/mkfs.btrfs.rst
index ba7227b3..af0b9c03 100644
--- a/Documentation/mkfs.btrfs.rst
+++ b/Documentation/mkfs.btrfs.rst
@@ -116,10 +116,13 @@ OPTIONS
 -s|--sectorsize <size>
         Specify the sectorsize, the minimum data block allocation unit.
 
-        The default value is the page size and is autodetected. If the sectorsize
-        differs from the page size, the created filesystem may not be mountable by the
-        running kernel. Therefore it is not recommended to use this option unless you
-        are going to mount it on a system with the appropriate page size.
+        The default value is 4KiB (4096). If larger page sizes (such as 64KiB [16384])
+        are used, the created filesystem will only mount on systems with a running kernel
+        running on a matching page size. Therefore it is not recommended to use this option
+        unless you are going to mount it on a system with the appropriate page size.
+
+        .. note::
+                Versions up to 6.3 set the sectorsize matching to the page size.
 
 -L|--label <string>
         Specify a label for the filesystem. The *string* should be less than 256
diff --git a/mkfs/main.c b/mkfs/main.c
index f5e34cbd..5e1834d7 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1207,7 +1207,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 	}
 
 	if (!sectorsize)
-		sectorsize = (u32)sysconf(_SC_PAGESIZE);
+		sectorsize = (u32)SZ_4K;
 	if (btrfs_check_sectorsize(sectorsize))
 		goto error;
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] btrfs-progs: mkfs: Enforce 4k sectorsize by default
  2023-03-21  2:03 [RFC PATCH] btrfs-progs: mkfs: Enforce 4k sectorsize by default Neal Gompa
@ 2023-03-21  3:03 ` Anand Jain
  2023-03-21 11:25   ` Neal Gompa
  2023-03-21  3:21 ` Qu Wenruo
  1 sibling, 1 reply; 9+ messages in thread
From: Anand Jain @ 2023-03-21  3:03 UTC (permalink / raw)
  To: Neal Gompa, linux-btrfs; +Cc: David Sterba, Hector Martin, Davide Cavalca

On 21/03/2023 10:03, Neal Gompa wrote:
> We have had working subpage support in Btrfs for many cycles now.
> Generally, we do not want people creating filesystems by default
> with non-4k sectorsizes since it creates portability problems.
> 

I agree.

> Signed-off-by: Neal Gompa <neal@gompa.dev>
> ---
>   Documentation/Subpage.rst    |  9 +++++----
>   Documentation/mkfs.btrfs.rst | 11 +++++++----
>   mkfs/main.c                  |  2 +-
>   3 files changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/Subpage.rst b/Documentation/Subpage.rst
> index 21a495d5..d7e9b241 100644
> --- a/Documentation/Subpage.rst
> +++ b/Documentation/Subpage.rst
> @@ -9,10 +9,11 @@ to the exactly same size of the block and page. On x86_64 this is typically
>   pages, like 64KiB on 64bit ARM or PowerPC. This means filesystems created
>   with 64KiB sector size cannot be mounted on a system with 4KiB page size.
>   
> -While with subpage support, systems with 64KiB page size can create (still needs
> -"-s 4k" option for mkfs.btrfs) and mount filesystems with 4KiB sectorsize,
> -allowing us to push 4KiB sectorsize as default sectorsize for all platforms in the
> -near future.
> +Since v6.3, filesystems are created with a 4KiB sectorsize by default,
> +though it remains possible to create filesystems with other page sizes
> +(such as 64KiB with the "-s 64k" option for mkfs.btrfs). This ensures that
> +new filesystems are compatible across other architecture variants using
> +larger page sizes.
>   

This part is a little confusing.

We can also include kernel versions that started supporting subpages

v5.12   read support of blocksize<pagesize (subpage)
v5.15   write support of blocksize<pagesize (subpage)


>   Requirements, limitations
>   -------------------------
> diff --git a/Documentation/mkfs.btrfs.rst b/Documentation/mkfs.btrfs.rst
> index ba7227b3..af0b9c03 100644
> --- a/Documentation/mkfs.btrfs.rst
> +++ b/Documentation/mkfs.btrfs.rst
> @@ -116,10 +116,13 @@ OPTIONS
>   -s|--sectorsize <size>
>           Specify the sectorsize, the minimum data block allocation unit.
>   
> -        The default value is the page size and is autodetected. If the sectorsize
> -        differs from the page size, the created filesystem may not be mountable by the
> -        running kernel. Therefore it is not recommended to use this option unless you
> -        are going to mount it on a system with the appropriate page size.
> +        The default value is 4KiB (4096). If larger page sizes (such as 64KiB [16384])
> +        are used, the created filesystem will only mount on systems with a running kernel
> +        running on a matching page size. Therefore it is not recommended to use this option
> +        unless you are going to mount it on a system with the appropriate page size.
> +
> +        .. note::
> +                Versions up to 6.3 set the sectorsize matching to the page size.
>   


IMO this can be rewritten as below:

By default, the value is 4KiB, but it can be manually set to match the 
system page size. However, if the sector size is different from the page 
size, the resulting filesystem may not be mountable by the current 
kernel, apart from the default 4KiB. Hence, it's not advisable to use 
this option unless you intend to mount it on a system with the suitable 
page size.


Thanks, Anand


>   -L|--label <string>
>           Specify a label for the filesystem. The *string* should be less than 256
> diff --git a/mkfs/main.c b/mkfs/main.c
> index f5e34cbd..5e1834d7 100644
> --- a/mkfs/main.c
> +++ b/mkfs/main.c
> @@ -1207,7 +1207,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
>   	}
>   
>   	if (!sectorsize)
> -		sectorsize = (u32)sysconf(_SC_PAGESIZE);
> +		sectorsize = (u32)SZ_4K;
>   	if (btrfs_check_sectorsize(sectorsize))
>   		goto error;
>   


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] btrfs-progs: mkfs: Enforce 4k sectorsize by default
  2023-03-21  2:03 [RFC PATCH] btrfs-progs: mkfs: Enforce 4k sectorsize by default Neal Gompa
  2023-03-21  3:03 ` Anand Jain
@ 2023-03-21  3:21 ` Qu Wenruo
  2023-03-21 10:07   ` Hector Martin
  1 sibling, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2023-03-21  3:21 UTC (permalink / raw)
  To: Neal Gompa, linux-btrfs; +Cc: David Sterba, Hector Martin, Davide Cavalca



On 2023/3/21 10:03, Neal Gompa wrote:
> We have had working subpage support in Btrfs for many cycles now.
> Generally, we do not want people creating filesystems by default
> with non-4k sectorsizes since it creates portability problems.

My biggest concern for now is, I haven't yet done any subpage testing 
for a while.

The bottle neck is the lack of computing power.

I only have one decent (RK3588 based SBC, Rock5B) board available for 
testing, but it's taken by my daily fstests runs, and it's still using 
4K page size for the aarc64 VM.

Although I have an backup SBC (the same Rock5B), it's reserved mostly 
for upstreaming and testing for the RK3588 SoC.

Personally speaking, this change would bring a lot of more testing 
feedback from Asahi guys, thus would be pretty awesome.

But on the other hand, I really don't want any big bug screwing up early 
adopters, and further damage the reputation of btrfs.


Would the Asahi guys gives us some early test results?
(AKA, full fstests runs with 16K page size and 4K sectorsize).

If nothing wrong happened, I am very happy to this patch.

Thanks,
Qu
> 
> Signed-off-by: Neal Gompa <neal@gompa.dev>
> ---
>   Documentation/Subpage.rst    |  9 +++++----
>   Documentation/mkfs.btrfs.rst | 11 +++++++----
>   mkfs/main.c                  |  2 +-
>   3 files changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/Subpage.rst b/Documentation/Subpage.rst
> index 21a495d5..d7e9b241 100644
> --- a/Documentation/Subpage.rst
> +++ b/Documentation/Subpage.rst
> @@ -9,10 +9,11 @@ to the exactly same size of the block and page. On x86_64 this is typically
>   pages, like 64KiB on 64bit ARM or PowerPC. This means filesystems created
>   with 64KiB sector size cannot be mounted on a system with 4KiB page size.
>   
> -While with subpage support, systems with 64KiB page size can create (still needs
> -"-s 4k" option for mkfs.btrfs) and mount filesystems with 4KiB sectorsize,
> -allowing us to push 4KiB sectorsize as default sectorsize for all platforms in the
> -near future.
> +Since v6.3, filesystems are created with a 4KiB sectorsize by default,
> +though it remains possible to create filesystems with other page sizes
> +(such as 64KiB with the "-s 64k" option for mkfs.btrfs). This ensures that
> +new filesystems are compatible across other architecture variants using
> +larger page sizes.
>   
>   Requirements, limitations
>   -------------------------
> diff --git a/Documentation/mkfs.btrfs.rst b/Documentation/mkfs.btrfs.rst
> index ba7227b3..af0b9c03 100644
> --- a/Documentation/mkfs.btrfs.rst
> +++ b/Documentation/mkfs.btrfs.rst
> @@ -116,10 +116,13 @@ OPTIONS
>   -s|--sectorsize <size>
>           Specify the sectorsize, the minimum data block allocation unit.
>   
> -        The default value is the page size and is autodetected. If the sectorsize
> -        differs from the page size, the created filesystem may not be mountable by the
> -        running kernel. Therefore it is not recommended to use this option unless you
> -        are going to mount it on a system with the appropriate page size.
> +        The default value is 4KiB (4096). If larger page sizes (such as 64KiB [16384])
> +        are used, the created filesystem will only mount on systems with a running kernel
> +        running on a matching page size. Therefore it is not recommended to use this option
> +        unless you are going to mount it on a system with the appropriate page size.
> +
> +        .. note::
> +                Versions up to 6.3 set the sectorsize matching to the page size.
>   
>   -L|--label <string>
>           Specify a label for the filesystem. The *string* should be less than 256
> diff --git a/mkfs/main.c b/mkfs/main.c
> index f5e34cbd..5e1834d7 100644
> --- a/mkfs/main.c
> +++ b/mkfs/main.c
> @@ -1207,7 +1207,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
>   	}
>   
>   	if (!sectorsize)
> -		sectorsize = (u32)sysconf(_SC_PAGESIZE);
> +		sectorsize = (u32)SZ_4K;
>   	if (btrfs_check_sectorsize(sectorsize))
>   		goto error;
>   

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] btrfs-progs: mkfs: Enforce 4k sectorsize by default
  2023-03-21  3:21 ` Qu Wenruo
@ 2023-03-21 10:07   ` Hector Martin
  2023-03-21 11:02     ` Qu Wenruo
                       ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Hector Martin @ 2023-03-21 10:07 UTC (permalink / raw)
  To: Qu Wenruo, Neal Gompa, linux-btrfs
  Cc: David Sterba, Davide Cavalca, Sven Peter, axboe, Asahi Linux

On 21/03/2023 12.21, Qu Wenruo wrote:
> 
> 
> On 2023/3/21 10:03, Neal Gompa wrote:
>> We have had working subpage support in Btrfs for many cycles now.
>> Generally, we do not want people creating filesystems by default
>> with non-4k sectorsizes since it creates portability problems.
> 
> My biggest concern for now is, I haven't yet done any subpage testing 
> for a while.
> 
> The bottle neck is the lack of computing power.
> 
> I only have one decent (RK3588 based SBC, Rock5B) board available for 
> testing, but it's taken by my daily fstests runs, and it's still using 
> 4K page size for the aarc64 VM.
> 
> Although I have an backup SBC (the same Rock5B), it's reserved mostly 
> for upstreaming and testing for the RK3588 SoC.
> 
> Personally speaking, this change would bring a lot of more testing 
> feedback from Asahi guys, thus would be pretty awesome.

Note that we already have a bunch of people running Fedora Asahi, and as
far as I know those images are created on 4K systems and then used on
16K systems, so this should be already getting real-world testing (and
will get a lot more once we get to official announcement/release)
regardless of the default.

IOW, this change is mostly about people creating secondary btrfs
filesystems on Asahi directly, which is relatively niche in comparison.
So if you have a subpage bug it's going to hit Asahi users whether this
change happens or not :)

> 
> But on the other hand, I really don't want any big bug screwing up early 
> adopters, and further damage the reputation of btrfs.
> 
> 
> Would the Asahi guys gives us some early test results?
> (AKA, full fstests runs with 16K page size and 4K sectorsize).

Gave it a shot. Tested with options:

export TEST_DEV=/dev/nvme0n1p6
export TEST_DIR=/mnt/test
#export SCRATCH_DEV=/dev/nvme0n1p7
export SCRATCH_MNT=/mnt/scratch
export SCRATCH_DEV_POOL="/dev/nvme0n1p7 /dev/nvme0n1p8 /dev/nvme0n1p9
/dev/nvme0n1p10 /dev/nvme0n1p11"
export MKFS_OPTIONS="-s 4096"
export FSTYP=btrfs

btrfs/012 is broken, the converted FS fails to mount with:

[  784.588172] BTRFS warning (device nvme0n1p7): v1 space cache is not
supported for page size 16384 with sectorsize 4096
[  784.588199] BTRFS error (device nvme0n1p7): open_ctree failed

btrfs/131 and 136 have the same issue.

btrfs/106 has a size mismatch in the output, but I get the feeling
that's just a bad test that assumes 4K somewhere?

btrfs/140 is the first one that looks bad. Looks like the corruption
correction didn't work for some reason.

... and then btrfs/142 which is similar actually managed to log a bunch
of errors on our NVMe controller. Nice.

[ 1240.000104] nvme-apple 393cc0000.nvme: RTKit: syslog message:
host_ans2.c:1564: cmd parsing error for tag 12 fast decode err 0x1
[ 1240.000767] nvme0n1: Read(0x2) @ LBA 322753843, 0 blocks, Invalid
Field in Command (sct 0x0 / sc 0x2)
[ 1240.000771] nvme-apple 393cc0000.nvme: RTKit: syslog message:
host_ans2.c:1469: tag 12, completed with err BAD_CMD-
[ 1240.000775] operation not supported error, dev nvme0n1, sector
2582030751 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2

Looks like it tried to read 0 blocks? I'm pretty sure that's not a valid
block device operation... and then that test hangs because the
_btrfs_direct_read_on_mirror() common function is completely broken, as
it infinite loops if the exec'd command fails (which it does here, with
ENOENT). Cc sven/axboe/asahi: I'm pretty sure we should catch
upper-layer badness like this before sending it to the controller.

Excluding that one and moving on, 143 is broken the same way, as are
btrfs/265, 266, 267, 269.

213 failed to balance with ENOSPC.

btrfs/246 has an output discrepancy, I don't know what's up with that.

generic/251 then failed too, with dmesg logs about failing to trim block
groups.

generic/520 failed with an EBUSY on umount, but I suspect this might be
some desktop/systemd stuff trying to use the mount?

generic/563 suggests there may be cgroup accounting issues

generic/619 seems known to be pathologically slow on arm64, so I killed
it (https://www.spinics.net/lists/linux-btrfs/msg131553.html).

generic/708 failed but that pointed to a known kernel bug that we don't
have the fix for yet (this is running on 6.2 + asahi patches).

Run output and some select dmesg sections:
https://gist.github.com/marcan/822a34266bcaf4f4cffaa6a198b4c616

Let me know if you need anything else.

> 
> If nothing wrong happened, I am very happy to this patch.
> 
> Thanks,
> Qu
>>
>> Signed-off-by: Neal Gompa <neal@gompa.dev>
>> ---
>>   Documentation/Subpage.rst    |  9 +++++----
>>   Documentation/mkfs.btrfs.rst | 11 +++++++----
>>   mkfs/main.c                  |  2 +-
>>   3 files changed, 13 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/Subpage.rst b/Documentation/Subpage.rst
>> index 21a495d5..d7e9b241 100644
>> --- a/Documentation/Subpage.rst
>> +++ b/Documentation/Subpage.rst
>> @@ -9,10 +9,11 @@ to the exactly same size of the block and page. On x86_64 this is typically
>>   pages, like 64KiB on 64bit ARM or PowerPC. This means filesystems created
>>   with 64KiB sector size cannot be mounted on a system with 4KiB page size.
>>   
>> -While with subpage support, systems with 64KiB page size can create (still needs
>> -"-s 4k" option for mkfs.btrfs) and mount filesystems with 4KiB sectorsize,
>> -allowing us to push 4KiB sectorsize as default sectorsize for all platforms in the
>> -near future.
>> +Since v6.3, filesystems are created with a 4KiB sectorsize by default,
>> +though it remains possible to create filesystems with other page sizes
>> +(such as 64KiB with the "-s 64k" option for mkfs.btrfs). This ensures that
>> +new filesystems are compatible across other architecture variants using
>> +larger page sizes.
>>   
>>   Requirements, limitations
>>   -------------------------
>> diff --git a/Documentation/mkfs.btrfs.rst b/Documentation/mkfs.btrfs.rst
>> index ba7227b3..af0b9c03 100644
>> --- a/Documentation/mkfs.btrfs.rst
>> +++ b/Documentation/mkfs.btrfs.rst
>> @@ -116,10 +116,13 @@ OPTIONS
>>   -s|--sectorsize <size>
>>           Specify the sectorsize, the minimum data block allocation unit.
>>   
>> -        The default value is the page size and is autodetected. If the sectorsize
>> -        differs from the page size, the created filesystem may not be mountable by the
>> -        running kernel. Therefore it is not recommended to use this option unless you
>> -        are going to mount it on a system with the appropriate page size.
>> +        The default value is 4KiB (4096). If larger page sizes (such as 64KiB [16384])
>> +        are used, the created filesystem will only mount on systems with a running kernel
>> +        running on a matching page size. Therefore it is not recommended to use this option
>> +        unless you are going to mount it on a system with the appropriate page size.
>> +
>> +        .. note::
>> +                Versions up to 6.3 set the sectorsize matching to the page size.
>>   
>>   -L|--label <string>
>>           Specify a label for the filesystem. The *string* should be less than 256
>> diff --git a/mkfs/main.c b/mkfs/main.c
>> index f5e34cbd..5e1834d7 100644
>> --- a/mkfs/main.c
>> +++ b/mkfs/main.c
>> @@ -1207,7 +1207,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
>>   	}
>>   
>>   	if (!sectorsize)
>> -		sectorsize = (u32)sysconf(_SC_PAGESIZE);
>> +		sectorsize = (u32)SZ_4K;
>>   	if (btrfs_check_sectorsize(sectorsize))
>>   		goto error;
>>   
> 


- Hector

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] btrfs-progs: mkfs: Enforce 4k sectorsize by default
  2023-03-21 10:07   ` Hector Martin
@ 2023-03-21 11:02     ` Qu Wenruo
  2023-03-22  5:19     ` Qu Wenruo
  2023-03-22 14:07     ` Su Yue
  2 siblings, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2023-03-21 11:02 UTC (permalink / raw)
  To: Hector Martin, Qu Wenruo, Neal Gompa, linux-btrfs
  Cc: David Sterba, Davide Cavalca, Sven Peter, axboe, Asahi Linux



On 2023/3/21 18:07, Hector Martin wrote:
> On 21/03/2023 12.21, Qu Wenruo wrote:
[...]
> 
> Note that we already have a bunch of people running Fedora Asahi, and as
> far as I know those images are created on 4K systems and then used on
> 16K systems, so this should be already getting real-world testing (and
> will get a lot more once we get to official announcement/release)
> regardless of the default.
> 
> IOW, this change is mostly about people creating secondary btrfs
> filesystems on Asahi directly, which is relatively niche in comparison.
> So if you have a subpage bug it's going to hit Asahi users whether this
> change happens or not :)

Awesome, nothing happened would already be the best thing.

The worst scenario, in which tons of users reporting subpage crash, is 
already avoided.

> 
>>
>> But on the other hand, I really don't want any big bug screwing up early
>> adopters, and further damage the reputation of btrfs.
>>
>>
>> Would the Asahi guys gives us some early test results?
>> (AKA, full fstests runs with 16K page size and 4K sectorsize).
> 
> Gave it a shot. Tested with options:
> 
> export TEST_DEV=/dev/nvme0n1p6
> export TEST_DIR=/mnt/test
> #export SCRATCH_DEV=/dev/nvme0n1p7
> export SCRATCH_MNT=/mnt/scratch
> export SCRATCH_DEV_POOL="/dev/nvme0n1p7 /dev/nvme0n1p8 /dev/nvme0n1p9
> /dev/nvme0n1p10 /dev/nvme0n1p11"
> export MKFS_OPTIONS="-s 4096"
> export FSTYP=btrfs
> 
> btrfs/012 is broken, the converted FS fails to mount with:
> 
> [  784.588172] BTRFS warning (device nvme0n1p7): v1 space cache is not
> supported for page size 16384 with sectorsize 4096
> [  784.588199] BTRFS error (device nvme0n1p7): open_ctree failed
> 
> btrfs/131 and 136 have the same issue.

This is v1 cache problems, IIRC newer progs from ArchlinuxARM upstream 
should already default to v2 cache, it's the test case itself requesting 
v1 cache, thus causing problems.
(v1 cache is explicitly rejected because I'm a little too lazy to fix 
all the hard coded PAGE_SIZE usage there).

I can update those test cases to avoid if v1 cache is not supported.

> 
> btrfs/106 has a size mismatch in the output, but I get the feeling
> that's just a bad test that assumes 4K somewhere?

Yep, that looks like the case.

Although I'll need dig deeper to find a way to fix the test case.

> 
> btrfs/140 is the first one that looks bad. Looks like the corruption
> correction didn't work for some reason.

This one seems interesting, would definitely bring some machine time for 
16K page sized testsing.

> 
> ... and then btrfs/142 which is similar actually managed to log a bunch
> of errors on our NVMe controller. Nice.
> 
> [ 1240.000104] nvme-apple 393cc0000.nvme: RTKit: syslog message:
> host_ans2.c:1564: cmd parsing error for tag 12 fast decode err 0x1
> [ 1240.000767] nvme0n1: Read(0x2) @ LBA 322753843, 0 blocks, Invalid
> Field in Command (sct 0x0 / sc 0x2)
> [ 1240.000771] nvme-apple 393cc0000.nvme: RTKit: syslog message:
> host_ans2.c:1469: tag 12, completed with err BAD_CMD-
> [ 1240.000775] operation not supported error, dev nvme0n1, sector
> 2582030751 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
> 
> Looks like it tried to read 0 blocks? I'm pretty sure that's not a valid
> block device operation... and then that test hangs because the
> _btrfs_direct_read_on_mirror() common function is completely broken, as
> it infinite loops if the exec'd command fails (which it does here, with
> ENOENT). Cc sven/axboe/asahi: I'm pretty sure we should catch
> upper-layer badness like this before sending it to the controller.

And also very weird, definitely worthy halting my daily runs.
(To be honest, recent misc-next is a little too boring, which means 
David is doing a pretty good job).

I also believe we should catch such zero lengthed bios inside btrfs 
before we had a block layer sanity checks.
(I know btrfs RAID56 code uses zero length bio as a way for completion 
barrier, but it should not reach lower layer, thus this must be a bug)

> 
> Excluding that one and moving on, 143 is broken the same way, as are
> btrfs/265, 266, 267, 269.
> 
> 213 failed to balance with ENOSPC.
> 
> btrfs/246 has an output discrepancy, I don't know what's up with that.
> 
> generic/251 then failed too, with dmesg logs about failing to trim block
> groups.
> 
> generic/520 failed with an EBUSY on umount, but I suspect this might be
> some desktop/systemd stuff trying to use the mount?
> 
> generic/563 suggests there may be cgroup accounting issues
> 
> generic/619 seems known to be pathologically slow on arm64, so I killed
> it (https://www.spinics.net/lists/linux-btrfs/msg131553.html).
> 
> generic/708 failed but that pointed to a known kernel bug that we don't
> have the fix for yet (this is running on 6.2 + asahi patches).
> 
> Run output and some select dmesg sections:
> https://gist.github.com/marcan/822a34266bcaf4f4cffaa6a198b4c616
> 
> Let me know if you need anything else.

Thank you very much for the test results and all your awesome Apple 
enablement work!
Qu

> 
>>
>> If nothing wrong happened, I am very happy to this patch.
>>
>> Thanks,
>> Qu
>>>
>>> Signed-off-by: Neal Gompa <neal@gompa.dev>
>>> ---
>>>    Documentation/Subpage.rst    |  9 +++++----
>>>    Documentation/mkfs.btrfs.rst | 11 +++++++----
>>>    mkfs/main.c                  |  2 +-
>>>    3 files changed, 13 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/Documentation/Subpage.rst b/Documentation/Subpage.rst
>>> index 21a495d5..d7e9b241 100644
>>> --- a/Documentation/Subpage.rst
>>> +++ b/Documentation/Subpage.rst
>>> @@ -9,10 +9,11 @@ to the exactly same size of the block and page. On x86_64 this is typically
>>>    pages, like 64KiB on 64bit ARM or PowerPC. This means filesystems created
>>>    with 64KiB sector size cannot be mounted on a system with 4KiB page size.
>>>    
>>> -While with subpage support, systems with 64KiB page size can create (still needs
>>> -"-s 4k" option for mkfs.btrfs) and mount filesystems with 4KiB sectorsize,
>>> -allowing us to push 4KiB sectorsize as default sectorsize for all platforms in the
>>> -near future.
>>> +Since v6.3, filesystems are created with a 4KiB sectorsize by default,
>>> +though it remains possible to create filesystems with other page sizes
>>> +(such as 64KiB with the "-s 64k" option for mkfs.btrfs). This ensures that
>>> +new filesystems are compatible across other architecture variants using
>>> +larger page sizes.
>>>    
>>>    Requirements, limitations
>>>    -------------------------
>>> diff --git a/Documentation/mkfs.btrfs.rst b/Documentation/mkfs.btrfs.rst
>>> index ba7227b3..af0b9c03 100644
>>> --- a/Documentation/mkfs.btrfs.rst
>>> +++ b/Documentation/mkfs.btrfs.rst
>>> @@ -116,10 +116,13 @@ OPTIONS
>>>    -s|--sectorsize <size>
>>>            Specify the sectorsize, the minimum data block allocation unit.
>>>    
>>> -        The default value is the page size and is autodetected. If the sectorsize
>>> -        differs from the page size, the created filesystem may not be mountable by the
>>> -        running kernel. Therefore it is not recommended to use this option unless you
>>> -        are going to mount it on a system with the appropriate page size.
>>> +        The default value is 4KiB (4096). If larger page sizes (such as 64KiB [16384])
>>> +        are used, the created filesystem will only mount on systems with a running kernel
>>> +        running on a matching page size. Therefore it is not recommended to use this option
>>> +        unless you are going to mount it on a system with the appropriate page size.
>>> +
>>> +        .. note::
>>> +                Versions up to 6.3 set the sectorsize matching to the page size.
>>>    
>>>    -L|--label <string>
>>>            Specify a label for the filesystem. The *string* should be less than 256
>>> diff --git a/mkfs/main.c b/mkfs/main.c
>>> index f5e34cbd..5e1834d7 100644
>>> --- a/mkfs/main.c
>>> +++ b/mkfs/main.c
>>> @@ -1207,7 +1207,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
>>>    	}
>>>    
>>>    	if (!sectorsize)
>>> -		sectorsize = (u32)sysconf(_SC_PAGESIZE);
>>> +		sectorsize = (u32)SZ_4K;
>>>    	if (btrfs_check_sectorsize(sectorsize))
>>>    		goto error;
>>>    
>>
> 
> 
> - Hector

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] btrfs-progs: mkfs: Enforce 4k sectorsize by default
  2023-03-21  3:03 ` Anand Jain
@ 2023-03-21 11:25   ` Neal Gompa
  0 siblings, 0 replies; 9+ messages in thread
From: Neal Gompa @ 2023-03-21 11:25 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, David Sterba, Hector Martin, Davide Cavalca

On Mon, Mar 20, 2023 at 11:03 PM Anand Jain <anand.jain@oracle.com> wrote:
>
> On 21/03/2023 10:03, Neal Gompa wrote:
> > We have had working subpage support in Btrfs for many cycles now.
> > Generally, we do not want people creating filesystems by default
> > with non-4k sectorsizes since it creates portability problems.
> >
>
> I agree.
>
> > Signed-off-by: Neal Gompa <neal@gompa.dev>
> > ---
> >   Documentation/Subpage.rst    |  9 +++++----
> >   Documentation/mkfs.btrfs.rst | 11 +++++++----
> >   mkfs/main.c                  |  2 +-
> >   3 files changed, 13 insertions(+), 9 deletions(-)
> >
> > diff --git a/Documentation/Subpage.rst b/Documentation/Subpage.rst
> > index 21a495d5..d7e9b241 100644
> > --- a/Documentation/Subpage.rst
> > +++ b/Documentation/Subpage.rst
> > @@ -9,10 +9,11 @@ to the exactly same size of the block and page. On x86_64 this is typically
> >   pages, like 64KiB on 64bit ARM or PowerPC. This means filesystems created
> >   with 64KiB sector size cannot be mounted on a system with 4KiB page size.
> >
> > -While with subpage support, systems with 64KiB page size can create (still needs
> > -"-s 4k" option for mkfs.btrfs) and mount filesystems with 4KiB sectorsize,
> > -allowing us to push 4KiB sectorsize as default sectorsize for all platforms in the
> > -near future.
> > +Since v6.3, filesystems are created with a 4KiB sectorsize by default,
> > +though it remains possible to create filesystems with other page sizes
> > +(such as 64KiB with the "-s 64k" option for mkfs.btrfs). This ensures that
> > +new filesystems are compatible across other architecture variants using
> > +larger page sizes.
> >
>
> This part is a little confusing.
>
> We can also include kernel versions that started supporting subpages
>
> v5.12   read support of blocksize<pagesize (subpage)
> v5.15   write support of blocksize<pagesize (subpage)
>

This is covered further down in the document. Though it does remind me
that I should rephrase some of it so that it doesn't say a default
mode is experimental. :)

>
> >   Requirements, limitations
> >   -------------------------
> > diff --git a/Documentation/mkfs.btrfs.rst b/Documentation/mkfs.btrfs.rst
> > index ba7227b3..af0b9c03 100644
> > --- a/Documentation/mkfs.btrfs.rst
> > +++ b/Documentation/mkfs.btrfs.rst
> > @@ -116,10 +116,13 @@ OPTIONS
> >   -s|--sectorsize <size>
> >           Specify the sectorsize, the minimum data block allocation unit.
> >
> > -        The default value is the page size and is autodetected. If the sectorsize
> > -        differs from the page size, the created filesystem may not be mountable by the
> > -        running kernel. Therefore it is not recommended to use this option unless you
> > -        are going to mount it on a system with the appropriate page size.
> > +        The default value is 4KiB (4096). If larger page sizes (such as 64KiB [16384])
> > +        are used, the created filesystem will only mount on systems with a running kernel
> > +        running on a matching page size. Therefore it is not recommended to use this option
> > +        unless you are going to mount it on a system with the appropriate page size.
> > +
> > +        .. note::
> > +                Versions up to 6.3 set the sectorsize matching to the page size.
> >
>
>
> IMO this can be rewritten as below:
>
> By default, the value is 4KiB, but it can be manually set to match the
> system page size. However, if the sector size is different from the page
> size, the resulting filesystem may not be mountable by the current
> kernel, apart from the default 4KiB. Hence, it's not advisable to use
> this option unless you intend to mount it on a system with the suitable
> page size.
>
>
> Thanks, Anand
>
>
> >   -L|--label <string>
> >           Specify a label for the filesystem. The *string* should be less than 256
> > diff --git a/mkfs/main.c b/mkfs/main.c
> > index f5e34cbd..5e1834d7 100644
> > --- a/mkfs/main.c
> > +++ b/mkfs/main.c
> > @@ -1207,7 +1207,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
> >       }
> >
> >       if (!sectorsize)
> > -             sectorsize = (u32)sysconf(_SC_PAGESIZE);
> > +             sectorsize = (u32)SZ_4K;
> >       if (btrfs_check_sectorsize(sectorsize))
> >               goto error;
> >
>


-- 
真実はいつも一つ！/ Always, there's only one truth!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] btrfs-progs: mkfs: Enforce 4k sectorsize by default
  2023-03-21 10:07   ` Hector Martin
  2023-03-21 11:02     ` Qu Wenruo
@ 2023-03-22  5:19     ` Qu Wenruo
  2023-03-22 14:07     ` Su Yue
  2 siblings, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2023-03-22  5:19 UTC (permalink / raw)
  To: Hector Martin, Qu Wenruo, Neal Gompa, linux-btrfs
  Cc: David Sterba, Davide Cavalca, Sven Peter, axboe, Asahi Linux



On 2023/3/21 18:07, Hector Martin wrote:
[...]
> 
> btrfs/140 is the first one that looks bad. Looks like the corruption
> correction didn't work for some reason.
> 
> ... and then btrfs/142 which is similar actually managed to log a bunch
> of errors on our NVMe controller. Nice.
> 
> [ 1240.000104] nvme-apple 393cc0000.nvme: RTKit: syslog message:
> host_ans2.c:1564: cmd parsing error for tag 12 fast decode err 0x1
> [ 1240.000767] nvme0n1: Read(0x2) @ LBA 322753843, 0 blocks, Invalid
> Field in Command (sct 0x0 / sc 0x2)
> [ 1240.000771] nvme-apple 393cc0000.nvme: RTKit: syslog message:
> host_ans2.c:1469: tag 12, completed with err BAD_CMD-
> [ 1240.000775] operation not supported error, dev nvme0n1, sector
> 2582030751 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2

Mind to share the reproducibility?

I tried it with my aarch64 VMs with the following setup:

- Page size 16K
   The host SoC supports all 4K/16K/64K page size

- Virtio-blk
   Backed by a NVME drive, with unsafe cache mode (aka, ignoring all file
   sync)

- Virtual memory 8G

- 4 vCPU cores
   All pinned to A76 cores, as qemu failed to boot if it's initialized
   across A76 and A55 cores.

   Paired with the virtio-blk with unsafe cache, the VM should have a
   very fast storage, while lame CPU perf.

- With extra ASSERT()s in btrfs_submit_bio()
   Making sure all bios submitted through btrfs_submit_bio() has a non-
   zero bi_size.

Unfortunately unable to reproduce the problem.


If you can reproduce, mind to provide a call trace?

> 
> Looks like it tried to read 0 blocks? I'm pretty sure that's not a valid
> block device operation... and then that test hangs because the
> _btrfs_direct_read_on_mirror() common function is completely broken, as
> it infinite loops if the exec'd command fails (which it does here, with
> ENOENT). Cc sven/axboe/asahi: I'm pretty sure we should catch
> upper-layer badness like this before sending it to the controller.
> 
> Excluding that one and moving on, 143 is broken the same way, as are
> btrfs/265, 266, 267, 269.

Also tried those ones, all passed here...
> 
> 213 failed to balance with ENOSPC.

Passed with 5x10G LVs as scratch dev pool.

I hope it's not Apple Silicon too fast to trigger it, or it would be 
pretty hard to reproduce...

> 
> btrfs/246 has an output discrepancy, I don't know what's up with that.

This is a limitation on the compressed write support, would update the 
test case to skip it.

> 
> generic/251 then failed too, with dmesg logs about failing to trim block
> groups.

It takes much longer time but still passed here.

I'll try my best to got some runs on real Apple machines meanwhile, but 
don't expect any improvement on my hardware situation any time soon...

Thanks,
Qu

> 
> generic/520 failed with an EBUSY on umount, but I suspect this might be
> some desktop/systemd stuff trying to use the mount?
> 
> generic/563 suggests there may be cgroup accounting issues
> 
> generic/619 seems known to be pathologically slow on arm64, so I killed
> it (https://www.spinics.net/lists/linux-btrfs/msg131553.html).
> 
> generic/708 failed but that pointed to a known kernel bug that we don't
> have the fix for yet (this is running on 6.2 + asahi patches).
> 
> Run output and some select dmesg sections:
> https://gist.github.com/marcan/822a34266bcaf4f4cffaa6a198b4c616
> 
> Let me know if you need anything else.
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] btrfs-progs: mkfs: Enforce 4k sectorsize by default
  2023-03-21 10:07   ` Hector Martin
  2023-03-21 11:02     ` Qu Wenruo
  2023-03-22  5:19     ` Qu Wenruo
@ 2023-03-22 14:07     ` Su Yue
  2 siblings, 0 replies; 9+ messages in thread
From: Su Yue @ 2023-03-22 14:07 UTC (permalink / raw)
  To: Hector Martin, Qu Wenruo, Neal Gompa, linux-btrfs
  Cc: David Sterba, Davide Cavalca, Sven Peter, axboe, Asahi Linux



On 2023/3/21 18:07, Hector Martin wrote:
> On 21/03/2023 12.21, Qu Wenruo wrote:
>>
>>
>> On 2023/3/21 10:03, Neal Gompa wrote:
>>> We have had working subpage support in Btrfs for many cycles now.
>>> Generally, we do not want people creating filesystems by default
>>> with non-4k sectorsizes since it creates portability problems.
>>
>> My biggest concern for now is, I haven't yet done any subpage testing
>> for a while.
>>
>> The bottle neck is the lack of computing power.
>>
>> I only have one decent (RK3588 based SBC, Rock5B) board available for
>> testing, but it's taken by my daily fstests runs, and it's still using
>> 4K page size for the aarc64 VM.
>>
>> Although I have an backup SBC (the same Rock5B), it's reserved mostly
>> for upstreaming and testing for the RK3588 SoC.
>>
>> Personally speaking, this change would bring a lot of more testing
>> feedback from Asahi guys, thus would be pretty awesome.
> 
> Note that we already have a bunch of people running Fedora Asahi, and as
> far as I know those images are created on 4K systems and then used on
> 16K systems, so this should be already getting real-world testing (and
> will get a lot more once we get to official announcement/release)
> regardless of the default.
> 
> IOW, this change is mostly about people creating secondary btrfs
> filesystems on Asahi directly, which is relatively niche in comparison.
> So if you have a subpage bug it's going to hit Asahi users whether this
> change happens or not :)
> 
>>
>> But on the other hand, I really don't want any big bug screwing up early
>> adopters, and further damage the reputation of btrfs.
>>
>>
>> Would the Asahi guys gives us some early test results?
>> (AKA, full fstests runs with 16K page size and 4K sectorsize).
> 
> Gave it a shot. Tested with options:
> 
> export TEST_DEV=/dev/nvme0n1p6
> export TEST_DIR=/mnt/test
> #export SCRATCH_DEV=/dev/nvme0n1p7
> export SCRATCH_MNT=/mnt/scratch
> export SCRATCH_DEV_POOL="/dev/nvme0n1p7 /dev/nvme0n1p8 /dev/nvme0n1p9
> /dev/nvme0n1p10 /dev/nvme0n1p11"
> export MKFS_OPTIONS="-s 4096"
> export FSTYP=btrfs
> 
> btrfs/012 is broken, the converted FS fails to mount with:
> 
> [  784.588172] BTRFS warning (device nvme0n1p7): v1 space cache is not
> supported for page size 16384 with sectorsize 4096
> [  784.588199] BTRFS error (device nvme0n1p7): open_ctree failed
> 
> btrfs/131 and 136 have the same issue.
> 
> btrfs/106 has a size mismatch in the output, but I get the feeling
> that's just a bad test that assumes 4K somewhere?
> 
> btrfs/140 is the first one that looks bad. Looks like the corruption
> correction didn't work for some reason.
> 
> ... and then btrfs/142 which is similar actually managed to log a bunch
> of errors on our NVMe controller. Nice.
> 
> [ 1240.000104] nvme-apple 393cc0000.nvme: RTKit: syslog message:
> host_ans2.c:1564: cmd parsing error for tag 12 fast decode err 0x1
> [ 1240.000767] nvme0n1: Read(0x2) @ LBA 322753843, 0 blocks, Invalid
> Field in Command (sct 0x0 / sc 0x2)
> [ 1240.000771] nvme-apple 393cc0000.nvme: RTKit: syslog message:
> host_ans2.c:1469: tag 12, completed with err BAD_CMD-
> [ 1240.000775] operation not supported error, dev nvme0n1, sector
> 2582030751 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
> 
> Looks like it tried to read 0 blocks? I'm pretty sure that's not a valid
> block device operation... and then that test hangs because the
> _btrfs_direct_read_on_mirror() common function is completely broken, as
> it infinite loops if the exec'd command fails (which it does here, with
> ENOENT). Cc sven/axboe/asahi: I'm pretty sure we should catch
> upper-layer badness like this before sending it to the controller.
> 
> Excluding that one and moving on, 143 is broken the same way, as are
> btrfs/265, 266, 267, 269.
> 
> 213 failed to balance with ENOSPC.
> 
> btrfs/246 has an output discrepancy, I don't know what's up with that.
> 
> generic/251 then failed too, with dmesg logs about failing to trim block
> groups.
> 
> generic/520 failed with an EBUSY on umount, but I suspect this might be
> some desktop/systemd stuff trying to use the mount?
> 
> generic/563 suggests there may be cgroup accounting issues
> 
> generic/619 seems known to be pathologically slow on arm64, so I killed
> it (https://www.spinics.net/lists/linux-btrfs/msg131553.html).
> 
> generic/708 failed but that pointed to a known kernel bug that we don't
> have the fix for yet (this is running on 6.2 + asahi patches).
> 

As Qu asked for test results on M1 mac mini, I ran above tests on 
misc-next in my qemu box but no error found. Those tests failed on bare 
metal
machines.


qemu configuration:

qemu-system-aarch64 \
     -M virt,highmem=no \
     -accel hvf -cpu host -name alarm \
     -smp cpus=8,sockets=1,cores=8,threads=1 -m 10096M \
     -bios $uefi \
     -device virtio-gpu -device qemu-xhci,id=xhci -device usb-kbd \
     -device usb-tablet -device virtio-rng-pci \
     -nic user,model=virtio,hostfwd=tcp::10022-:22 \
     -rtc base=localtime,clock=host \
     -chardev socket,id=mon0,host=localhost,port=63855,server=on,wait=off \
     -mon chardev=mon0,mode=control,pretty=on \
     -drive file=${alarm},if=virtio,id=boot,cache=writethrough \
     -vga none -nographic \
     -drive file=/dev/disk0s3,if=virtio,cache=unsafe,format=raw \
     -netdev vmnet-bridged,id=net0,ifname=en1

--
Su
> Run output and some select dmesg sections:
> https://gist.github.com/marcan/822a34266bcaf4f4cffaa6a198b4c616
> 
> Let me know if you need anything else.
> 
>>
>> If nothing wrong happened, I am very happy to this patch.
>>
>> Thanks,
>> Qu
>>>
>>> Signed-off-by: Neal Gompa <neal@gompa.dev>
>>> ---
>>>    Documentation/Subpage.rst    |  9 +++++----
>>>    Documentation/mkfs.btrfs.rst | 11 +++++++----
>>>    mkfs/main.c                  |  2 +-
>>>    3 files changed, 13 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/Documentation/Subpage.rst b/Documentation/Subpage.rst
>>> index 21a495d5..d7e9b241 100644
>>> --- a/Documentation/Subpage.rst
>>> +++ b/Documentation/Subpage.rst
>>> @@ -9,10 +9,11 @@ to the exactly same size of the block and page. On x86_64 this is typically
>>>    pages, like 64KiB on 64bit ARM or PowerPC. This means filesystems created
>>>    with 64KiB sector size cannot be mounted on a system with 4KiB page size.
>>>    
>>> -While with subpage support, systems with 64KiB page size can create (still needs
>>> -"-s 4k" option for mkfs.btrfs) and mount filesystems with 4KiB sectorsize,
>>> -allowing us to push 4KiB sectorsize as default sectorsize for all platforms in the
>>> -near future.
>>> +Since v6.3, filesystems are created with a 4KiB sectorsize by default,
>>> +though it remains possible to create filesystems with other page sizes
>>> +(such as 64KiB with the "-s 64k" option for mkfs.btrfs). This ensures that
>>> +new filesystems are compatible across other architecture variants using
>>> +larger page sizes.
>>>    
>>>    Requirements, limitations
>>>    -------------------------
>>> diff --git a/Documentation/mkfs.btrfs.rst b/Documentation/mkfs.btrfs.rst
>>> index ba7227b3..af0b9c03 100644
>>> --- a/Documentation/mkfs.btrfs.rst
>>> +++ b/Documentation/mkfs.btrfs.rst
>>> @@ -116,10 +116,13 @@ OPTIONS
>>>    -s|--sectorsize <size>
>>>            Specify the sectorsize, the minimum data block allocation unit.
>>>    
>>> -        The default value is the page size and is autodetected. If the sectorsize
>>> -        differs from the page size, the created filesystem may not be mountable by the
>>> -        running kernel. Therefore it is not recommended to use this option unless you
>>> -        are going to mount it on a system with the appropriate page size.
>>> +        The default value is 4KiB (4096). If larger page sizes (such as 64KiB [16384])
>>> +        are used, the created filesystem will only mount on systems with a running kernel
>>> +        running on a matching page size. Therefore it is not recommended to use this option
>>> +        unless you are going to mount it on a system with the appropriate page size.
>>> +
>>> +        .. note::
>>> +                Versions up to 6.3 set the sectorsize matching to the page size.
>>>    
>>>    -L|--label <string>
>>>            Specify a label for the filesystem. The *string* should be less than 256
>>> diff --git a/mkfs/main.c b/mkfs/main.c
>>> index f5e34cbd..5e1834d7 100644
>>> --- a/mkfs/main.c
>>> +++ b/mkfs/main.c
>>> @@ -1207,7 +1207,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
>>>    	}
>>>    
>>>    	if (!sectorsize)
>>> -		sectorsize = (u32)sysconf(_SC_PAGESIZE);
>>> +		sectorsize = (u32)SZ_4K;
>>>    	if (btrfs_check_sectorsize(sectorsize))
>>>    		goto error;
>>>    
>>
> 
> 
> - Hector

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] btrfs-progs: mkfs: Enforce 4k sectorsize by default
@ 2023-03-25 14:05 Hector Martin
  0 siblings, 0 replies; 9+ messages in thread
From: Hector Martin @ 2023-03-25 14:05 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, Neal Gompa, linux-btrfs
  Cc: David Sterba, Davide Cavalca, Sven Peter, axboe, Asahi Linux

On 22/03/2023 14.19, Qu Wenruo wrote:
> 
> 
> On 2023/3/21 18:07, Hector Martin wrote:
> [...]
>>
>> btrfs/140 is the first one that looks bad. Looks like the corruption
>> correction didn't work for some reason.
>>
>> ... and then btrfs/142 which is similar actually managed to log a bunch
>> of errors on our NVMe controller. Nice.
>>
>> [ 1240.000104] nvme-apple 393cc0000.nvme: RTKit: syslog message:
>> host_ans2.c:1564: cmd parsing error for tag 12 fast decode err 0x1
>> [ 1240.000767] nvme0n1: Read(0x2) @ LBA 322753843, 0 blocks, Invalid
>> Field in Command (sct 0x0 / sc 0x2)
>> [ 1240.000771] nvme-apple 393cc0000.nvme: RTKit: syslog message:
>> host_ans2.c:1469: tag 12, completed with err BAD_CMD-
>> [ 1240.000775] operation not supported error, dev nvme0n1, sector
>> 2582030751 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
> 
> Mind to share the reproducibility?

100% as far as I can tell. I'm using 20G devices for everything. It
instantly hits that bug.

Good news is it's not a btrfs bug though :)

common/dmdust:  DUST_TABLE="0 $DEV_SIZE dust $SCRATCH_DEV 0 512"

The dust device is created with 512 byte sectors, while the NVMe on
these machines has 4K sectors. Changing that to 4K and the respective
block size calculation in the tests (142/143) (`physical / 512` ->
`physical / 4096`) makes it pass. It's not reading 0 bytes, it's reading
512 bytes (which gets truncated to 0 sectors).

And then that explains why 140 doesn't work: it also assumes 512, but in
a way that just fails instead of causing kernel explosions. Same with
btrfs/265, 266, 267, 269.

So there's two problems here: a bunch of tests assume 512 byte sectors,
and the kernel DM dust device is missing a check that should reject
creation with an invalid sector size instead of just causing wreckage
further down the block device chain.

>> 213 failed to balance with ENOSPC.
> 
> Passed with 5x10G LVs as scratch dev pool.

5x20G GPT partitions here, and I can reliably repro this one. Maybe it's
also related to the 4K sectors?

> I hope it's not Apple Silicon too fast to trigger it, or it would be 
> pretty hard to reproduce...

FWIW, there's also the whole thing where Apple Silicon is *really* good
at triggering memory ordering and concurrency problems, since it is such
a wide architecture. I found a bug in the core kernel workqueue code
that had been there for years on ARM64...

>> generic/251 then failed too, with dmesg logs about failing to trim block
>> groups.
> 
> It takes much longer time but still passed here.

Reliably fails here, and this is another 4K sector one and actually a
btrfs bug I think. It tries to do discards that are not 4K-aligned.
Here's the call trace leading to the bad discard attempt:

[   70.571872]  btrfs_issue_discard+0x224/0x260
[   70.571874]  btrfs_discard_extent+0x128/0x1dc
[   70.571876]  do_trimming+0xc4/0x220
[   70.571878]  trim_no_bitmap+0x21c/0x3f0
[   70.571879]  btrfs_trim_block_group+0x88/0x170
[   70.571880]  btrfs_trim_fs+0xf0/0x3a0
[   70.571882]  btrfs_ioctl_fitrim+0x100/0x1dc
[   70.571885]  btrfs_ioctl+0x1550/0x25bc
[   70.571887]  __arm64_sys_ioctl+0x464/0xbb0
[   70.571891]  invoke_syscall.constprop.0+0x78/0xd0
[   70.571894]  do_el0_svc+0x58/0x170
[   70.571896]  el0_svc+0x38/0xf0
[   70.571900]  el0t_64_sync_handler+0xf4/0x120
[   70.571902]  el0t_64_sync+0x190/0x194

Note that discard size is actually a *separate* device property from
sector size, so this is doubly wrong. btrfs probably needs to explicitly
align to the discard size.

> I'll try my best to got some runs on real Apple machines meanwhile, but 
> don't expect any improvement on my hardware situation any time soon...

Hope you can repro with the 4K sector hint. Last time I had emulate that
I used a loop device with 4K sectors, but I don't know if that supports
discard... not sure if there is a better way.

> 
> Thanks,
> Qu
> 
>>
>> generic/520 failed with an EBUSY on umount, but I suspect this might be
>> some desktop/systemd stuff trying to use the mount?
>>
>> generic/563 suggests there may be cgroup accounting issues
>>
>> generic/619 seems known to be pathologically slow on arm64, so I killed
>> it (https://www.spinics.net/lists/linux-btrfs/msg131553.html).
>>
>> generic/708 failed but that pointed to a known kernel bug that we don't
>> have the fix for yet (this is running on 6.2 + asahi patches).
>>
>> Run output and some select dmesg sections:
>> https://gist.github.com/marcan/822a34266bcaf4f4cffaa6a198b4c616
>>
>> Let me know if you need anything else.
>>
> 

- Hector

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-03-25 14:06 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-21  2:03 [RFC PATCH] btrfs-progs: mkfs: Enforce 4k sectorsize by default Neal Gompa
2023-03-21  3:03 ` Anand Jain
2023-03-21 11:25   ` Neal Gompa
2023-03-21  3:21 ` Qu Wenruo
2023-03-21 10:07   ` Hector Martin
2023-03-21 11:02     ` Qu Wenruo
2023-03-22  5:19     ` Qu Wenruo
2023-03-22 14:07     ` Su Yue
2023-03-25 14:05 Hector Martin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).