* [btrfs] RAID1 volume on zoned device oops when sync.
@ 2024-02-02  8:13 韩于惟
  2024-02-02 12:19 ` David Sterba
  0 siblings, 1 reply; 22+ messages in thread
From: 韩于惟 @ 2024-02-02  8:13 UTC (permalink / raw)
  To: linux-btrfs

Hi All,

I have built a RAID1 volume on HC620 using kernel 6.7.2 with btrfs debug 
enabled.

Then I started a BT download and ran sync, and it oopsed. dmesg is at 
https://fars.ee/N4pJ

I am still preserving the drives' state.



* Re: [btrfs] RAID1 volume on zoned device oops when sync.
  2024-02-02  8:13 [btrfs] RAID1 volume on zoned device oops when sync 韩于惟
@ 2024-02-02 12:19 ` David Sterba
  2024-02-03 10:18   ` 韩于惟
  0 siblings, 1 reply; 22+ messages in thread
From: David Sterba @ 2024-02-02 12:19 UTC (permalink / raw)
  To: 韩于惟; +Cc: linux-btrfs

On Fri, Feb 02, 2024 at 04:13:13PM +0800, 韩于惟 wrote:
> Hi All,
> 
> I have built a RAID1 volume on HC620 using kernel 6.7.2 with btrfs debug 
> enabled.
> 
> Then I started a BT download and ran sync, and it oopsed. dmesg is at
> https://fars.ee/N4pJ

The system has 16K pages and uses a zoned device; the crash is in the
end io write callback, which does not seem to have subpage support:

[ 1863.303324]    ra: ffff8000025364b0 end_bio_extent_writepage+0x110/0x220 [btrfs]
[ 1863.303413]   ERA: ffff800002533464 btrfs_finish_ordered_extent+0x24/0xc0 [btrfs]

[ 1863.303638] Call Trace:
[ 1863.303639] [<ffff800002533464>] btrfs_finish_ordered_extent+0x24/0xc0 [btrfs]
[ 1863.303736] [<ffff8000025364b0>] end_bio_extent_writepage+0x110/0x220 [btrfs]
[ 1863.303831] [<ffff8000025d4510>] __btrfs_bio_end_io+0x50/0x80 [btrfs]
[ 1863.303924] [<ffff8000025d5118>] btrfs_submit_chunk+0x378/0x620 [btrfs]
[ 1863.304016] [<ffff8000025d5524>] btrfs_submit_bio+0x24/0x40 [btrfs]
[ 1863.304109] [<ffff800002535628>] submit_one_bio+0x48/0x80 [btrfs]
[ 1863.304204] [<ffff80000253a2bc>] extent_write_locked_range+0x31c/0x480 [btrfs]
[ 1863.304298] [<ffff800002510dc8>] run_delalloc_cow+0x88/0x160 [btrfs]
[ 1863.304393] [<ffff80000251186c>] btrfs_run_delalloc_range+0x10c/0x4c0 [btrfs]
[ 1863.304486] [<ffff800002536d1c>] writepage_delalloc+0xbc/0x1e0 [btrfs]
[ 1863.304579] [<ffff800002539874>] extent_write_cache_pages+0x274/0x7a0 [btrfs]
[ 1863.304672] [<ffff80000253a4c4>] extent_writepages+0xa4/0x1a0 [btrfs]
[ 1863.304765] [<900000000252ee14>] do_writepages+0x94/0x220


* Re: [btrfs] RAID1 volume on zoned device oops when sync.
  2024-02-02 12:19 ` David Sterba
@ 2024-02-03 10:18   ` 韩于惟
  2024-02-03 22:15     ` David Sterba
  0 siblings, 1 reply; 22+ messages in thread
From: 韩于惟 @ 2024-02-03 10:18 UTC (permalink / raw)
  To: dsterba; +Cc: linux-btrfs

When running mkfs, I intentionally used "-s 4k" for better compatibility.
And /sys/fs/btrfs/features/supported_sectorsizes lists 4096 16384, which 
should be ok.
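
For reference, checking both values looks roughly like this (a minimal 
sketch; the outputs shown are what this machine reports):

$ getconf PAGESIZE
16384
$ cat /sys/fs/btrfs/features/supported_sectorsizes
4096 16384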

btrfs-progs is 6.6.2-1; is this related?

On 2024/2/2 20:19, David Sterba wrote:
> On Fri, Feb 02, 2024 at 04:13:13PM +0800, 韩于惟 wrote:
>> Hi All,
>>
>> I have built a RAID1 volume on HC620 using kernel 6.7.2 with btrfs debug
>> enabled.
>>
>> Then I started a BT download and ran sync, and it oopsed. dmesg is at
>> https://fars.ee/N4pJ
> The system has 16K pages and uses a zoned device; the crash is in the
> end io write callback, which does not seem to have subpage support:
>
> [ 1863.303324]    ra: ffff8000025364b0 end_bio_extent_writepage+0x110/0x220 [btrfs]
> [ 1863.303413]   ERA: ffff800002533464 btrfs_finish_ordered_extent+0x24/0xc0 [btrfs]
>
> [ 1863.303638] Call Trace:
> [ 1863.303639] [<ffff800002533464>] btrfs_finish_ordered_extent+0x24/0xc0 [btrfs]
> [ 1863.303736] [<ffff8000025364b0>] end_bio_extent_writepage+0x110/0x220 [btrfs]
> [ 1863.303831] [<ffff8000025d4510>] __btrfs_bio_end_io+0x50/0x80 [btrfs]
> [ 1863.303924] [<ffff8000025d5118>] btrfs_submit_chunk+0x378/0x620 [btrfs]
> [ 1863.304016] [<ffff8000025d5524>] btrfs_submit_bio+0x24/0x40 [btrfs]
> [ 1863.304109] [<ffff800002535628>] submit_one_bio+0x48/0x80 [btrfs]
> [ 1863.304204] [<ffff80000253a2bc>] extent_write_locked_range+0x31c/0x480 [btrfs]
> [ 1863.304298] [<ffff800002510dc8>] run_delalloc_cow+0x88/0x160 [btrfs]
> [ 1863.304393] [<ffff80000251186c>] btrfs_run_delalloc_range+0x10c/0x4c0 [btrfs]
> [ 1863.304486] [<ffff800002536d1c>] writepage_delalloc+0xbc/0x1e0 [btrfs]
> [ 1863.304579] [<ffff800002539874>] extent_write_cache_pages+0x274/0x7a0 [btrfs]
> [ 1863.304672] [<ffff80000253a4c4>] extent_writepages+0xa4/0x1a0 [btrfs]
> [ 1863.304765] [<900000000252ee14>] do_writepages+0x94/0x220
>



* Re: [btrfs] RAID1 volume on zoned device oops when sync.
  2024-02-03 10:18   ` 韩于惟
@ 2024-02-03 22:15     ` David Sterba
  2024-02-04  9:34       ` 韩于惟
  0 siblings, 1 reply; 22+ messages in thread
From: David Sterba @ 2024-02-03 22:15 UTC (permalink / raw)
  To: 韩于惟; +Cc: dsterba, linux-btrfs

On Sat, Feb 03, 2024 at 06:18:09PM +0800, 韩于惟 wrote:
> When running mkfs, I intentionally used "-s 4k" for better compatibility.
> And /sys/fs/btrfs/features/supported_sectorsizes lists 4096 16384, which 
> should be ok.
> 
> btrfs-progs is 6.6.2-1; is this related?

No, this is something in the kernel. You could test whether the same page
and sector size works, i.e. mkfs.btrfs --sectorsize 16k. This avoids the
subpage layer that translates the 4k sectors <-> 16k pages. It has the
known interoperability issues with different page and sector sizes, but
if those do not affect you, you can use it.
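
For example, a minimal sketch (destructive, so only on a scratch device;
/dev/sdX is a placeholder for one of the zoned disks):

# recreate the filesystem with sectorsize equal to the 16K page size
$ mkfs.btrfs -f --sectorsize 16k /dev/sdX
# confirm what actually ended up in the super block
$ btrfs inspect-internal dump-super /dev/sdX | grep sectorsize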


* Re: [btrfs] RAID1 volume on zoned device oops when sync.
  2024-02-03 22:15     ` David Sterba
@ 2024-02-04  9:34       ` 韩于惟
  2024-02-05  5:22         ` Qu Wenruo
  0 siblings, 1 reply; 22+ messages in thread
From: 韩于惟 @ 2024-02-04  9:34 UTC (permalink / raw)
  To: dsterba; +Cc: linux-btrfs

 > i.e. mkfs.btrfs --sectorsize 16k.

It works! I can sync without any problem now. I will continue to monitor 
for any issues. It seems like I can only use these disks on my loongson 
machine for a while.

Is there any progress or a proposed patch for the subpage layer fix?

On 2024/2/4 6:15, David Sterba wrote:
> On Sat, Feb 03, 2024 at 06:18:09PM +0800, 韩于惟 wrote:
>> When running mkfs, I intentionally used "-s 4k" for better compatibility.
>> And /sys/fs/btrfs/features/supported_sectorsizes lists 4096 16384, which
>> should be ok.
>>
>> btrfs-progs is 6.6.2-1; is this related?
> No, this is something in the kernel. You could test whether the same page
> and sector size works, i.e. mkfs.btrfs --sectorsize 16k. This avoids the
> subpage layer that translates the 4k sectors <-> 16k pages. It has the
> known interoperability issues with different page and sector sizes, but
> if those do not affect you, you can use it.
>


* Re: [btrfs] RAID1 volume on zoned device oops when sync.
  2024-02-04  9:34       ` 韩于惟
@ 2024-02-05  5:22         ` Qu Wenruo
  2024-02-05  6:46           ` 韩于惟
  0 siblings, 1 reply; 22+ messages in thread
From: Qu Wenruo @ 2024-02-05  5:22 UTC (permalink / raw)
  To: 韩于惟, dsterba; +Cc: linux-btrfs



On 2024/2/4 20:04, 韩于惟 wrote:
>  > i.e. mkfs.btrfs --sectorsize 16k.
>
> It works! I can sync without any problem now. I will continue to monitor
> for any issues. It seems like I can only use these disks on my loongson
> machine for a while.

Any clue how I can purchase such disks?
And what's the interface? (NVMe? SATA? U.2?)

I can go try qemu zoned nvme on my aarch64 host, but so far the SoC is
offline (won't be online until this weekend).

And have you tried emulated zoned device (no matter if it's qemu zoned
emulation or nbd or whatever) with 4K sectorsize?


So far we don't have good enough coverage of zoned on subpage: I have
the physical aarch64 hardware (and VMs with different page sizes), but
I don't have any zoned devices.

If you can provide some help, it would be super great.

>
> Is there any progress or a proposed patch for the subpage layer fix?
>
> On 2024/2/4 6:15, David Sterba wrote:
>> On Sat, Feb 03, 2024 at 06:18:09PM +0800, 韩于惟 wrote:
>>> When running mkfs, I intentionally used "-s 4k" for better compatibility.
>>> And /sys/fs/btrfs/features/supported_sectorsizes lists 4096 16384, which
>>> should be ok.
>>>
>>> btrfs-progs is 6.6.2-1; is this related?
>> No, this is something in the kernel. You could test whether the same page
>> and sector size works, i.e. mkfs.btrfs --sectorsize 16k. This avoids the
>> subpage layer that translates the 4k sectors <-> 16k pages. It has the
>> known interoperability issues with different page and sector sizes, but
>> if those do not affect you, you can use it.
>>

Another thing is, I don't know how the loongson kernel dump works, but
can you provide the faddr2line output for
"btrfs_finish_ordered_extent+0x24"?

It looks like ordered->inode is not properly initialized but I'm not
100% sure.
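
If faddr2line alone is not conclusive, a disassembly of the faulting
offset would also help; a rough sketch (assuming the module was built
with debug info):

# interleave source lines with the disassembly of the crashing function
$ gdb -batch -ex 'disassemble /s btrfs_finish_ordered_extent' fs/btrfs/btrfs.ko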

Thanks,
Qu


* Re: [btrfs] RAID1 volume on zoned device oops when sync.
  2024-02-05  5:22         ` Qu Wenruo
@ 2024-02-05  6:46           ` 韩于惟
  2024-02-05  7:56             ` Qu Wenruo
  0 siblings, 1 reply; 22+ messages in thread
From: 韩于惟 @ 2024-02-05  6:46 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

 > Any clue how I can purchase such disks?
 > And what's the interface? (NVMe? SATA? U.2?)

I purchased these on a used-goods market app called Xianyu (闲鱼), which 
may be difficult to access for users outside mainland China. And its 
supply is extremely unstable.

The interface is SATA. My model is HSH721414ALN6M0. Spec link: 
https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-dc-hc600-series/data-sheet-ultrastar-dc-hc620.pdf

 > And have you tried emulated zoned device (no matter if it's qemu zoned
 > emulation or nbd or whatever) with 4K sectorsize?

I have tried it on my loongson with this script from 
https://github.com/Rongronggg9

 > ./nullb setup
 > ./nullb create -s 4096 -z 256
 > ./nullb create -s 4096 -z 256
 > ./nullb ls
 > mkfs.btrfs -s 16k /dev/nullb0
 > mount /dev/nullb0 /mnt/tmp
 > btrfs device add /dev/nullb1 /mnt/tmp
 > btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/tmp

Whether it is 4k or 16k, the kernel reports "zoned: data raid1 needs 
raid-stripe-tree".

 > If you can provide some help, it would be super great.

Sure. I can provide access to my loongson w/ dual HC620 if you wish. You 
can contact me on t.me/hanyuwei70.

 > can you provide the faddr2line output for
 > "btrfs_finish_ordered_extent+0x24"?

I have recompiled the kernel to add DEBUG_INFO. Here's the result.

[hyw@loong3a6 linux-6.7.2]$ ./scripts/faddr2line fs/btrfs/btrfs.ko 
btrfs_finish_ordered_extent+0x24
btrfs_finish_ordered_extent+0x24/0xc0:
spinlock_check at 
/home/hyw/kernel_build/linux-6.7.2/./include/linux/spinlock.h:326
(inlined by) btrfs_finish_ordered_extent at 
/home/hyw/kernel_build/linux-6.7.2/fs/btrfs/ordered-data.c:381
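
(Enabling that was roughly the following; a sketch from memory, with the
Kconfig option name as used by a 6.7 tree:)

$ cd linux-6.7.2
# turn on DWARF debug info so faddr2line can resolve file/line
$ ./scripts/config --enable DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
$ make olddefconfig
$ make -j$(nproc)    # full rebuild, then reinstall the kernel and reboot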

On 2024/2/5 13:22, Qu Wenruo wrote:
>
>
> On 2024/2/4 20:04, 韩于惟 wrote:
>>  > i.e. mkfs.btrfs --sectorsize 16k.
>>
>> It works! I can sync without any problem now. I will continue to monitor
>> for any issues. It seems like I can only use these disks on my loongson
>> machine for a while.
>
> Any clue how I can purchase such disks?
> And what's the interface? (NVMe? SATA? U.2?)
>
> I can go try qemu zoned nvme on my aarch64 host, but so far the SoC is
> offline (won't be online until this weekend).
>
> And have you tried emulated zoned device (no matter if it's qemu zoned
> emulation or nbd or whatever) with 4K sectorsize?
>
>
> So far we don't have good enough coverage of zoned on subpage: I have
> the physical aarch64 hardware (and VMs with different page sizes), but
> I don't have any zoned devices.
>
> If you can provide some help, it would be super great.
>
>>
>> Is there any progress or a proposed patch for the subpage layer fix?
>>
>> On 2024/2/4 6:15, David Sterba wrote:
>>> On Sat, Feb 03, 2024 at 06:18:09PM +0800, 韩于惟 wrote:
>>>> When running mkfs, I intentionally used "-s 4k" for better compatibility.
>>>> And /sys/fs/btrfs/features/supported_sectorsizes lists 4096 16384, which
>>>> should be ok.
>>>>
>>>> btrfs-progs is 6.6.2-1; is this related?
>>> No, this is something in the kernel. You could test whether the same page
>>> and sector size works, i.e. mkfs.btrfs --sectorsize 16k. This avoids the
>>> subpage layer that translates the 4k sectors <-> 16k pages. It has the
>>> known interoperability issues with different page and sector sizes, but
>>> if those do not affect you, you can use it.
>>>
>
> Another thing is, I don't know how the loongson kernel dump works, but
> can you provide the faddr2line output for
> "btrfs_finish_ordered_extent+0x24"?
>
> It looks like ordered->inode is not properly initialized but I'm not
> 100% sure.
>
> Thanks,
> Qu
>


* Re: [btrfs] RAID1 volume on zoned device oops when sync.
  2024-02-05  6:46           ` 韩于惟
@ 2024-02-05  7:56             ` Qu Wenruo
  2024-02-05 10:50               ` 韩于惟
                                 ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Qu Wenruo @ 2024-02-05  7:56 UTC (permalink / raw)
  To: 韩于惟; +Cc: linux-btrfs



On 2024/2/5 17:16, 韩于惟 wrote:
>  > Any clue how I can purchase such disks?
>  > And what's the interface? (NVMe? SATA? U.2?)
>
> I purchased these on a used-goods market app called Xianyu (闲鱼), which
> may be difficult to access for users outside mainland China. And its
> supply is extremely unstable.
>
> The interface is SATA. My model is HSH721414ALN6M0. Spec link:
> https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-dc-hc600-series/data-sheet-ultrastar-dc-hc620.pdf
>
>  > And have you tried emulated zoned device (no matter if it's qemu zoned
>  > emulation or nbd or whatever) with 4K sectorsize?
>
> I have tried it on my loongson with this script from
> https://github.com/Rongronggg9
>
>  > ./nullb setup
>  > ./nullb create -s 4096 -z 256
>  > ./nullb create -s 4096 -z 256
>  > ./nullb ls
>  > mkfs.btrfs -s 16k /dev/nullb0
>  > mount /dev/nullb0 /mnt/tmp
>  > btrfs device add /dev/nullb1 /mnt/tmp
>  > btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/tmp

Just want to be sure: for your case, you're doing the same mkfs (4K
sectorsize) on the physical disk, then adding a new disk, and finally
balancing the fs?

IIRC the balance itself should not succeed, whether with emulated
or real disks, as data RAID1 requires zoned RST support.

If that's the case, it looks like some checks got bypassed, and one copy
of the raid1 bbio doesn't get its content set up properly, thus leading to
the NULL pointer dereference.

Anyway, I'll try to reproduce it locally next week, or I'll ask for
access to your loongson system soon.

Thanks,
Qu
>
> Whether it is 4k or 16k, the kernel reports "zoned: data raid1 needs
> raid-stripe-tree".
>
>  > If you can provide some help, it would be super great.
>
> Sure. I can provide access to my loongson w/ dual HC620 if you wish. You
> can contact me on t.me/hanyuwei70.
>
>  > can you provide the faddr2line output for
>  > "btrfs_finish_ordered_extent+0x24"?
>
> I have recompiled the kernel to add DEBUG_INFO. Here's the result.
>
> [hyw@loong3a6 linux-6.7.2]$ ./scripts/faddr2line fs/btrfs/btrfs.ko
> btrfs_finish_ordered_extent+0x24
> btrfs_finish_ordered_extent+0x24/0xc0:
> spinlock_check at
> /home/hyw/kernel_build/linux-6.7.2/./include/linux/spinlock.h:326
> (inlined by) btrfs_finish_ordered_extent at
> /home/hyw/kernel_build/linux-6.7.2/fs/btrfs/ordered-data.c:381
>
> On 2024/2/5 13:22, Qu Wenruo wrote:
>>
>>
>> On 2024/2/4 20:04, 韩于惟 wrote:
>>>  > i.e. mkfs.btrfs --sectorsize 16k.
>>>
>>> It works! I can sync without any problem now. I will continue to monitor
>>> for any issues. It seems like I can only use these disks on my loongson
>>> machine for a while.
>>
>> Any clue how I can purchase such disks?
>> And what's the interface? (NVMe? SATA? U.2?)
>>
>> I can go try qemu zoned nvme on my aarch64 host, but so far the SoC is
>> offline (won't be online until this weekend).
>>
>> And have you tried emulated zoned device (no matter if it's qemu zoned
>> emulation or nbd or whatever) with 4K sectorsize?
>>
>>
>> So far we don't have good enough coverage of zoned on subpage: I have
>> the physical aarch64 hardware (and VMs with different page sizes), but
>> I don't have any zoned devices.
>>
>> If you can provide some help, it would be super great.
>>
>>>
>>> Is there any progress or a proposed patch for the subpage layer fix?
>>>
>>> On 2024/2/4 6:15, David Sterba wrote:
>>>> On Sat, Feb 03, 2024 at 06:18:09PM +0800, 韩于惟 wrote:
>>>>> When running mkfs, I intentionally used "-s 4k" for better compatibility.
>>>>> And /sys/fs/btrfs/features/supported_sectorsizes lists 4096 16384, which
>>>>> should be ok.
>>>>>
>>>>> btrfs-progs is 6.6.2-1; is this related?
>>>> No, this is something in the kernel. You could test whether the same page
>>>> and sector size works, i.e. mkfs.btrfs --sectorsize 16k. This avoids the
>>>> subpage layer that translates the 4k sectors <-> 16k pages. It has the
>>>> known interoperability issues with different page and sector sizes, but
>>>> if those do not affect you, you can use it.
>>>>
>>
>> Another thing is, I don't know how the loongson kernel dump works, but
>> can you provide the faddr2line output for
>> "btrfs_finish_ordered_extent+0x24"?
>>
>> It looks like ordered->inode is not properly initialized but I'm not
>> 100% sure.
>>
>> Thanks,
>> Qu
>>


* Re: [btrfs] RAID1 volume on zoned device oops when sync.
  2024-02-05  7:56             ` Qu Wenruo
  2024-02-05 10:50               ` 韩于惟
  2024-02-05 10:50               ` 韩于惟
@ 2024-02-05 10:50               ` 韩于惟
  2024-02-05 20:40                 ` Qu Wenruo
  2024-02-08 12:42               ` Johannes Thumshirn
  3 siblings, 1 reply; 22+ messages in thread
From: 韩于惟 @ 2024-02-05 10:50 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs


On 2024/2/5 15:56, Qu Wenruo wrote:
>
>
> On 2024/2/5 17:16, 韩于惟 wrote:
>>  > Any clue how I can purchase such disks?
>>  > And what's the interface? (NVMe? SATA? U.2?)
>>
>> I purchased these on a used-goods market app called Xianyu (闲鱼), which
>> may be difficult to access for users outside mainland China. And its
>> supply is extremely unstable.
>>
>> The interface is SATA. My model is HSH721414ALN6M0. Spec link:
>> https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-dc-hc600-series/data-sheet-ultrastar-dc-hc620.pdf 
>>
>>
>>  > And have you tried emulated zoned device (no matter if it's qemu 
>> zoned
>>  > emulation or nbd or whatever) with 4K sectorsize?
>>
>> I have tried it on my loongson with this script from
>> https://github.com/Rongronggg9
>>
>>  > ./nullb setup
>>  > ./nullb create -s 4096 -z 256
>>  > ./nullb create -s 4096 -z 256
>>  > ./nullb ls
>>  > mkfs.btrfs -s 16k /dev/nullb0
>>  > mount /dev/nullb0 /mnt/tmp
>>  > btrfs device add /dev/nullb1 /mnt/tmp
>>  > btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/tmp
>
> Just want to be sure: for your case, you're doing the same mkfs (4K
> sectorsize) on the physical disk, then adding a new disk, and finally
> balancing the fs?
>
No. I didn't specify the sector size in the first place, just ran 
"mkfs.btrfs $dev" on default loongarchlinux (kernel 6.7.0). And the 
device add & balance succeeded. Then I successfully wrote & read some 
small files. It oopsed when I started using transmission to download 
something and executed "sync".
> IIRC the balance itself should not succeed, whether with emulated
> or real disks, as data RAID1 requires zoned RST support.
>
> If that's the case, it looks like some checks got bypassed, and one copy
> of the raid1 bbio doesn't get its content set up properly, thus leading to
> the NULL pointer dereference.
>
> Anyway, I'll try to reproduce it locally next week, or I'll ask for
> access to your loongson system soon.
>
> Thanks,
> Qu
>>
>> Whether it is 4k or 16k, the kernel reports "zoned: data raid1 needs
>> raid-stripe-tree".
>>
>>  > If you can provide some help, it would be super great.
>>
>> Sure. I can provide access to my loongson w/ dual HC620 if you wish. You
>> can contact me on t.me/hanyuwei70.
>>
>>  > can you provide the faddr2line output for
>>  > "btrfs_finish_ordered_extent+0x24"?
>>
>> I have recompiled the kernel to add DEBUG_INFO. Here's the result.
>>
>> [hyw@loong3a6 linux-6.7.2]$ ./scripts/faddr2line fs/btrfs/btrfs.ko
>> btrfs_finish_ordered_extent+0x24
>> btrfs_finish_ordered_extent+0x24/0xc0:
>> spinlock_check at
>> /home/hyw/kernel_build/linux-6.7.2/./include/linux/spinlock.h:326
>> (inlined by) btrfs_finish_ordered_extent at
>> /home/hyw/kernel_build/linux-6.7.2/fs/btrfs/ordered-data.c:381
>>
>> On 2024/2/5 13:22, Qu Wenruo wrote:
>>>
>>>
>>> On 2024/2/4 20:04, 韩于惟 wrote:
>>>>  > i.e. mkfs.btrfs --sectorsize 16k.
>>>>
>>>> It works! I can sync without any problem now. I will continue to monitor
>>>> for any issues. It seems like I can only use these disks on my loongson
>>>> machine for a while.
>>>
>>> Any clue how I can purchase such disks?
>>> And what's the interface? (NVMe? SATA? U.2?)
>>>
>>> I can go try qemu zoned nvme on my aarch64 host, but so far the SoC is
>>> offline (won't be online until this weekend).
>>>
>>> And have you tried emulated zoned device (no matter if it's qemu zoned
>>> emulation or nbd or whatever) with 4K sectorsize?
>>>
>>>
>>> So far we don't have good enough coverage of zoned on subpage: I have
>>> the physical aarch64 hardware (and VMs with different page sizes), but
>>> I don't have any zoned devices.
>>>
>>> If you can provide some help, it would be super great.
>>>
>>>>
>>>> Is there any progress or a proposed patch for the subpage layer fix?
>>>>
>>>> On 2024/2/4 6:15, David Sterba wrote:
>>>>> On Sat, Feb 03, 2024 at 06:18:09PM +0800, 韩于惟 wrote:
>>>>>> When running mkfs, I intentionally used "-s 4k" for better compatibility.
>>>>>> And /sys/fs/btrfs/features/supported_sectorsizes lists 4096 16384, which
>>>>>> should be ok.
>>>>>>
>>>>>> btrfs-progs is 6.6.2-1; is this related?
>>>>> No, this is something in the kernel. You could test whether the same page
>>>>> and sector size works, i.e. mkfs.btrfs --sectorsize 16k. This avoids the
>>>>> subpage layer that translates the 4k sectors <-> 16k pages. It has the
>>>>> known interoperability issues with different page and sector sizes, but
>>>>> if those do not affect you, you can use it.
>>>>>
>>>
>>> Another thing is, I don't know how the loongson kernel dump works, but
>>> can you provide the faddr2line output for
>>> "btrfs_finish_ordered_extent+0x24"?
>>>
>>> It looks like ordered->inode is not properly initialized but I'm not
>>> 100% sure.
>>>
>>> Thanks,
>>> Qu
>>>
>



* Re: [btrfs] RAID1 volume on zoned device oops when sync.
  2024-02-05 10:50               ` 韩于惟
@ 2024-02-05 20:40                 ` Qu Wenruo
  2024-02-06  1:45                   ` 韩于惟
  0 siblings, 1 reply; 22+ messages in thread
From: Qu Wenruo @ 2024-02-05 20:40 UTC (permalink / raw)
  To: 韩于惟; +Cc: linux-btrfs



On 2024/2/5 21:20, 韩于惟 wrote:
>
> On 2024/2/5 15:56, Qu Wenruo wrote:
>>
>>
>> On 2024/2/5 17:16, 韩于惟 wrote:
>>>  > Any clue how I can purchase such disks?
>>>  > And what's the interface? (NVMe? SATA? U.2?)
>>>
>>> I purchased these on a used-goods market app called Xianyu (闲鱼), which
>>> may be difficult to access for users outside mainland China. And its
>>> supply is extremely unstable.
>>>
>>> The interface is SATA. My model is HSH721414ALN6M0. Spec link:
>>> https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-dc-hc600-series/data-sheet-ultrastar-dc-hc620.pdf
>>>
>>>  > And have you tried emulated zoned device (no matter if it's qemu
>>> zoned
>>>  > emulation or nbd or whatever) with 4K sectorsize?
>>>
>>> I have tried it on my loongson with this script from
>>> https://github.com/Rongronggg9
>>>
>>>  > ./nullb setup
>>>  > ./nullb create -s 4096 -z 256
>>>  > ./nullb create -s 4096 -z 256
>>>  > ./nullb ls
>>>  > mkfs.btrfs -s 16k /dev/nullb0
>>>  > mount /dev/nullb0 /mnt/tmp
>>>  > btrfs device add /dev/nullb1 /mnt/tmp
>>>  > btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/tmp
>>
>> Just want to be sure: for your case, you're doing the same mkfs (4K
>> sectorsize) on the physical disk, then adding a new disk, and finally
>> balancing the fs?
>>
> No. I didn't specify the sector size in the first place, just ran
> "mkfs.btrfs $dev" on default loongarchlinux (kernel 6.7.0).

Would you mind re-running mkfs.btrfs on the physical disk and providing the output?

I believe this is because you're using the latest btrfs-progs, which is
using 4K sectorsize by default.

Thanks,
Qu

> And the device add & balance succeeded. Then I successfully wrote &
> read some small files. It oopsed when I started using transmission to
> download something and executed "sync".
>> IIRC the balance itself should not succeed, whether with emulated
>> or real disks, as data RAID1 requires zoned RST support.
>>
>> If that's the case, it looks like some checks got bypassed, and one copy
>> of the raid1 bbio doesn't get its content set up properly, thus leading to
>> the NULL pointer dereference.
>>
>> Anyway, I'll try to reproduce it locally next week, or I'll ask for
>> access to your loongson system soon.
>>
>> Thanks,
>> Qu
>>>
>>> Whether it is 4k or 16k, the kernel reports "zoned: data raid1 needs
>>> raid-stripe-tree".
>>>
>>>  > If you can provide some help, it would be super great.
>>>
>>> Sure. I can provide access to my loongson w/ dual HC620 if you wish. You
>>> can contact me on t.me/hanyuwei70.
>>>
>>>  > can you provide the faddr2line output for
>>>  > "btrfs_finish_ordered_extent+0x24"?
>>>
>>> I have recompiled the kernel to add DEBUG_INFO. Here's the result.
>>>
>>> [hyw@loong3a6 linux-6.7.2]$ ./scripts/faddr2line fs/btrfs/btrfs.ko
>>> btrfs_finish_ordered_extent+0x24
>>> btrfs_finish_ordered_extent+0x24/0xc0:
>>> spinlock_check at
>>> /home/hyw/kernel_build/linux-6.7.2/./include/linux/spinlock.h:326
>>> (inlined by) btrfs_finish_ordered_extent at
>>> /home/hyw/kernel_build/linux-6.7.2/fs/btrfs/ordered-data.c:381
>>>
>>> On 2024/2/5 13:22, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2024/2/4 20:04, 韩于惟 wrote:
>>>>>  > i.e. mkfs.btrfs --sectorsize 16k.
>>>>>
>>>>> It works! I can sync without any problem now. I will continue to monitor
>>>>> for any issues. It seems like I can only use these disks on my loongson
>>>>> machine for a while.
>>>>
>>>> Any clue how I can purchase such disks?
>>>> And what's the interface? (NVMe? SATA? U.2?)
>>>>
>>>> I can go try qemu zoned nvme on my aarch64 host, but so far the SoC is
>>>> offline (won't be online until this weekend).
>>>>
>>>> And have you tried emulated zoned device (no matter if it's qemu zoned
>>>> emulation or nbd or whatever) with 4K sectorsize?
>>>>
>>>>
>>>> So far we don't have good enough coverage of zoned on subpage: I have
>>>> the physical aarch64 hardware (and VMs with different page sizes), but
>>>> I don't have any zoned devices.
>>>>
>>>> If you can provide some help, it would be super great.
>>>>
>>>>>
>>>>> Is there any progress or a proposed patch for the subpage layer fix?
>>>>>
>>>>> On 2024/2/4 6:15, David Sterba wrote:
>>>>>> On Sat, Feb 03, 2024 at 06:18:09PM +0800, 韩于惟 wrote:
>>>>>>> When running mkfs, I intentionally used "-s 4k" for better compatibility.
>>>>>>> And /sys/fs/btrfs/features/supported_sectorsizes lists 4096 16384, which
>>>>>>> should be ok.
>>>>>>>
>>>>>>> btrfs-progs is 6.6.2-1; is this related?
>>>>>> No, this is something in the kernel. You could test whether the same page
>>>>>> and sector size works, i.e. mkfs.btrfs --sectorsize 16k. This avoids the
>>>>>> subpage layer that translates the 4k sectors <-> 16k pages. It has the
>>>>>> known interoperability issues with different page and sector sizes, but
>>>>>> if those do not affect you, you can use it.
>>>>>>
>>>>
>>>> Another thing is, I don't know how the loongson kernel dump works, but
>>>> can you provide the faddr2line output for
>>>> "btrfs_finish_ordered_extent+0x24"?
>>>>
>>>> It looks like ordered->inode is not properly initialized but I'm not
>>>> 100% sure.
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>
>
>


* Re: [btrfs] RAID1 volume on zoned device oops when sync.
  2024-02-05 20:40                 ` Qu Wenruo
@ 2024-02-06  1:45                   ` 韩于惟
  0 siblings, 0 replies; 22+ messages in thread
From: 韩于惟 @ 2024-02-06  1:45 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs


On 2024/2/6 4:40, Qu Wenruo wrote:
>
>
> On 2024/2/5 21:20, 韩于惟 wrote:
>>
>> On 2024/2/5 15:56, Qu Wenruo wrote:
>>>
>>>
>>> On 2024/2/5 17:16, 韩于惟 wrote:
>>>>  > Any clue how I can purchase such disks?
>>>>  > And what's the interface? (NVMe? SATA? U.2?)
>>>>
>>>> I purchased these on a used-goods market app called Xianyu (闲鱼), which
>>>> may be difficult to access for users outside mainland China. And its
>>>> supply is extremely unstable.
>>>>
>>>> The interface is SATA. My model is HSH721414ALN6M0. Spec link:
>>>> https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-dc-hc600-series/data-sheet-ultrastar-dc-hc620.pdf 
>>>>
>>>>
>>>>  > And have you tried emulated zoned device (no matter if it's qemu
>>>> zoned
>>>>  > emulation or nbd or whatever) with 4K sectorsize?
>>>>
>>>> I have tried it on my loongson with this script from
>>>> https://github.com/Rongronggg9
>>>>
>>>>  > ./nullb setup
>>>>  > ./nullb create -s 4096 -z 256
>>>>  > ./nullb create -s 4096 -z 256
>>>>  > ./nullb ls
>>>>  > mkfs.btrfs -s 16k /dev/nullb0
>>>>  > mount /dev/nullb0 /mnt/tmp
>>>>  > btrfs device add /dev/nullb1 /mnt/tmp
>>>>  > btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/tmp
>>>
>>> Just want to be sure: for your case, you're doing the same mkfs (4K
>>> sectorsize) on the physical disk, then adding a new disk, and finally
>>> balancing the fs?
>>>
>> No. I didn't specify the sector size in the first place, just ran
>> "mkfs.btrfs $dev" on default loongarchlinux (kernel 6.7.0).
>
> Would you mind re-running mkfs.btrfs on the physical disk and providing
> the output?
>
> I believe this is because you're using the latest btrfs-progs, which is
> using 4K sectorsize by default.
>
I have checked using dump-super after the first mkfs; the sector size is 16k.
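
I.e. (a sketch; /dev/sdb here stands for the first HC620):

$ btrfs inspect-internal dump-super /dev/sdb | grep sectorsize
sectorsize              16384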
>
>> And the device add & balance succeeded. Then I successfully wrote &
>> read some small files. It oopsed when I started using transmission to
>> download something and executed "sync".
>>> IIRC the balance itself should not succeed, whether with emulated
>>> or real disks, as data RAID1 requires zoned RST support.
>>>
>>> If that's the case, it looks like some checks got bypassed, and one copy
>>> of the raid1 bbio doesn't get its content set up properly, thus leading to
>>> the NULL pointer dereference.
>>>
>>> Anyway, I'll try to reproduce it locally next week, or I'll ask for
>>> access to your loongson system soon.
>>>
>>> Thanks,
>>> Qu
>>>>
>>>> Whether it is 4k or 16k, the kernel reports "zoned: data raid1 needs
>>>> raid-stripe-tree".
>>>>
>>>>  > If you can provide some help, it would be super great.
>>>>
>>>> Sure. I can provide access to my loongson w/ dual HC620 if you 
>>>> wish. You
>>>> can contact me on t.me/hanyuwei70.
>>>>
>>>>  > can you provide the faddr2line output for
>>>>  > "btrfs_finish_ordered_extent+0x24"?
>>>>
>>>> I have recompiled the kernel to add DEBUG_INFO. Here's the result.
>>>>
>>>> [hyw@loong3a6 linux-6.7.2]$ ./scripts/faddr2line fs/btrfs/btrfs.ko
>>>> btrfs_finish_ordered_extent+0x24
>>>> btrfs_finish_ordered_extent+0x24/0xc0:
>>>> spinlock_check at
>>>> /home/hyw/kernel_build/linux-6.7.2/./include/linux/spinlock.h:326
>>>> (inlined by) btrfs_finish_ordered_extent at
>>>> /home/hyw/kernel_build/linux-6.7.2/fs/btrfs/ordered-data.c:381
>>>>
>>>> On 2024/2/5 13:22, Qu Wenruo wrote:
>>>>>
>>>>>
>>>>> On 2024/2/4 20:04, 韩于惟 wrote:
>>>>>>  > i.e. mkfs.btrfs --sectorsize 16k.
>>>>>>
>>>>>> It works! I can sync without any problem now. I will continue to monitor
>>>>>> for any issues. It seems like I can only use these disks on my loongson
>>>>>> machine for a while.
>>>>>
>>>>> Any clue how I can purchase such disks?
>>>>> And what's the interface? (NVMe? SATA? U.2?)
>>>>>
>>>>> I can go try qemu zoned nvme on my aarch64 host, but so far the 
>>>>> SoC is
>>>>> offline (won't be online until this weekend).
>>>>>
>>>>> And have you tried emulated zoned device (no matter if it's qemu 
>>>>> zoned
>>>>> emulation or nbd or whatever) with 4K sectorsize?
>>>>>
>>>>>
>>>>> So far we don't have good enough coverage of zoned on subpage: I have
>>>>> the physical aarch64 hardware (and VMs with different page sizes), but
>>>>> I don't have any zoned devices.
>>>>>
>>>>> If you can provide some help, it would be super great.
>>>>>
>>>>>>
>>>>>> Is there any progress or a proposed patch for the subpage layer fix?
>>>>>>
>>>>>> On 2024/2/4 6:15, David Sterba wrote:
>>>>>>> On Sat, Feb 03, 2024 at 06:18:09PM +0800, 韩于惟 wrote:
>>>>>>>> When running mkfs, I intentionally used "-s 4k" for better compatibility.
>>>>>>>> And /sys/fs/btrfs/features/supported_sectorsizes lists 4096 16384, which
>>>>>>>> should be ok.
>>>>>>>>
>>>>>>>> btrfs-progs is 6.6.2-1; is this related?
>>>>>>> No, this is something in the kernel. You could test whether the same page
>>>>>>> and sector size works, i.e. mkfs.btrfs --sectorsize 16k. This avoids the
>>>>>>> subpage layer that translates the 4k sectors <-> 16k pages. It has the
>>>>>>> known interoperability issues with different page and sector sizes, but
>>>>>>> if those do not affect you, you can use it.
>>>>>>>
>>>>>
>>>>> Another thing is, I don't know how the loongson kernel dump works, 
>>>>> but
>>>>> can you provide the faddr2line output for
>>>>> "btrfs_finish_ordered_extent+0x24"?
>>>>>
>>>>> It looks like ordered->inode is not properly initialized but I'm not
>>>>> 100% sure.
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>
>>>
>>
>>
>


* Re: [btrfs] RAID1 volume on zoned device oops when sync.
  2024-02-05  7:56             ` Qu Wenruo
                                 ` (2 preceding siblings ...)
  2024-02-05 10:50               ` 韩于惟
@ 2024-02-08 12:42               ` Johannes Thumshirn
  2024-02-08 20:15                 ` Qu Wenruo
  2024-02-09  8:16                 ` 韩于惟
  3 siblings, 2 replies; 22+ messages in thread
From: Johannes Thumshirn @ 2024-02-08 12:42 UTC (permalink / raw)
  To: Qu Wenruo, 韩于惟; +Cc: linux-btrfs

On 05.02.24 08:56, Qu Wenruo wrote:
>>
>>   > ./nullb setup
>>   > ./nullb create -s 4096 -z 256
>>   > ./nullb create -s 4096 -z 256
>>   > ./nullb ls
>>   > mkfs.btrfs -s 16k /dev/nullb0
>>   > mount /dev/nullb0 /mnt/tmp
>>   > btrfs device add /dev/nullb1 /mnt/tmp
>>   > btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/tmp
> 
> Just want to be sure: for your case, you're doing the same mkfs (4K
> sectorsize) on the physical disk, then adding a new disk, and finally
> balancing the fs?
> 
> IIRC the balance itself should not succeed, whether with emulated
> or real disks, as data RAID1 requires zoned RST support.

For me, balance doesn't accept RAID on zoned devices, as it's supposed 
to:

[  212.721872] BTRFS info (device nvme1n1): host-managed zoned block 
device /dev/nvme2n1, 160 zones of 134217728 bytes
[  212.725694] BTRFS info (device nvme1n1): disk added /dev/nvme2n1
[  212.744807] BTRFS warning (device nvme1n1): balance: metadata profile 
dup has lower redundancy than data profile raid1
[  212.748706] BTRFS info (device nvme1n1): balance: start -dconvert=raid1
[  212.750006] BTRFS error (device nvme1n1): zoned: data raid1 needs 
raid-stripe-tree
[  212.751267] BTRFS info (device nvme1n1): balance: ended with status: -22

So I'm not exactly sure what's happening here.


* Re: [btrfs] RAID1 volume on zoned device oops when sync.
  2024-02-08 12:42               ` Johannes Thumshirn
@ 2024-02-08 20:15                 ` Qu Wenruo
  2024-02-09  1:10                   ` David Sterba
  2024-02-09  8:16                 ` 韩于惟
  1 sibling, 1 reply; 22+ messages in thread
From: Qu Wenruo @ 2024-02-08 20:15 UTC (permalink / raw)
  To: Johannes Thumshirn, Qu Wenruo, 韩于惟; +Cc: linux-btrfs





On 2024/2/8 23:12, Johannes Thumshirn wrote:
> On 05.02.24 08:56, Qu Wenruo wrote:
>>>
>>>    > ./nullb setup
>>>    > ./nullb create -s 4096 -z 256
>>>    > ./nullb create -s 4096 -z 256
>>>    > ./nullb ls
>>>    > mkfs.btrfs -s 16k /dev/nullb0
>>>    > mount /dev/nullb0 /mnt/tmp
>>>    > btrfs device add /dev/nullb1 /mnt/tmp
>>>    > btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/tmp
>>
>> Just want to be sure: for your case, you're doing the same mkfs (4K
>> sectorsize) on the physical disk, then adding a new disk, and finally
>> balancing the fs?
>>
>> IIRC the balance itself should not succeed, whether with emulated
>> or real disks, as data RAID1 requires zoned RST support.
> 
> For me, balance doesn't accept RAID on zoned devices, as it's supposed
> to:
> 
> [  212.721872] BTRFS info (device nvme1n1): host-managed zoned block
> device /dev/nvme2n1, 160 zones of 134217728 bytes
> [  212.725694] BTRFS info (device nvme1n1): disk added /dev/nvme2n1
> [  212.744807] BTRFS warning (device nvme1n1): balance: metadata profile
> dup has lower redundancy than data profile raid1
> [  212.748706] BTRFS info (device nvme1n1): balance: start -dconvert=raid1
> [  212.750006] BTRFS error (device nvme1n1): zoned: data raid1 needs
> raid-stripe-tree
> [  212.751267] BTRFS info (device nvme1n1): balance: ended with status: -22
> 
> So I'm not exactly sure what's happening here.

I have access to that machine, and it does not reject the convert as it 
should:

$ sudo mkfs.btrfs -f /dev/sdb
btrfs-progs v6.6.2
See https://btrfs.readthedocs.io for more information.

Zoned: /dev/sdb: host-managed device detected, setting zoned feature
WARNING: libblkid < 2.38 does not support zoned mode's superblock 
location, update recommended
Resetting device zones /dev/sdb (52156 zones) ...
NOTE: several default settings have changed in version 5.15, please make 
sure
       this does not affect your deployments:
       - DUP for metadata (-m dup)
       - enabled no-holes (-O no-holes)
       - enabled free-space-tree (-R free-space-tree)

Label:              (null)
UUID:               e49c5f73-35dd-4faa-8660-dd0b3d02e978
Node size:          16384
Sector size:        16384	<<< Not yet subpage.
Filesystem size:    12.73TiB
Block group profiles:
   Data:             single          256.00MiB
   Metadata:         DUP             256.00MiB
   System:           DUP             256.00MiB
SSD detected:       no
Zoned device:       yes
   Zone size:        256.00MiB
Incompat features:  extref, skinny-metadata, no-holes, free-space-tree, 
zoned
Runtime features:   free-space-tree
Checksum:           crc32c
Number of devices:  1
Devices:
    ID        SIZE  ZONES  PATH
     1    12.73TiB  52156  /dev/sdb

$ sudo mount /dev/sdb /mnt/btrfs
$ sudo btrfs dev add /dev/sdc -f /mnt/btrfs/
Resetting device zones /dev/sdc (52156 zones) ...

$ dmesg
[146422.722707] BTRFS: device fsid e49c5f73-35dd-4faa-8660-dd0b3d02e978 
devid 1 transid 6 /dev/sdb scanned by mount (4172)
[146422.736415] BTRFS info (device sdb): first mount of filesystem 
e49c5f73-35dd-4faa-8660-dd0b3d02e978
[146422.745508] BTRFS info (device sdb): using crc32c (crc32c-generic) 
checksum algorithm
[146422.753388] BTRFS info (device sdb): using free-space-tree
[146423.313000] BTRFS info (device sdb): host-managed zoned block device 
/dev/sdb, 52156 zones of 268435456 bytes
[146423.322954] BTRFS info (device sdb): zoned mode enabled with zone 
size 268435456
[146423.330808] BTRFS info (device sdb): checking UUID tree
[146446.313055] BTRFS info (device sdb): host-managed zoned block device 
/dev/sdc, 52156 zones of 268435456 bytes
[146446.345735] BTRFS info (device sdb): disk added /dev/sdc

$ sudo dmesg -C
$ sudo btrfs balance start -mconvert=raid1 -dconvert=raid1 /mnt/btrfs/
Done, had to relocate 3 out of 3 chunks

$ sudo dmesg
[146533.890423] BTRFS info (device sdb): balance: start -dconvert=raid1 
-mconvert=raid1 -sconvert=raid1
[146533.899668] BTRFS info (device sdb): relocating block group 
1610612736 flags metadata|dup
[146533.992730] BTRFS info (device sdb): found 3 extents, stage: move 
data extents
[146534.126812] BTRFS info (device sdb): relocating block group 
1342177280 flags system|dup
[146534.252836] BTRFS info (device sdb): relocating block group 
1073741824 flags data
[146534.428593] BTRFS info (device sdb): balance: ended with status: 0

Furthermore, a tree dump of the chunk tree indeed shows RAID1:

$ sudo btrfs ins dump-tree -t chunk /dev/sdb
btrfs-progs v6.6.2
chunk tree
leaf 2147549184 items 5 free space 15626 generation 22 owner CHUNK_TREE
leaf 2147549184 flags 0x1(WRITTEN) backref revision 1
fs uuid e49c5f73-35dd-4faa-8660-dd0b3d02e978
chunk uuid 4d0e11ba-e791-4688-bc19-f4960c3138b8
	item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
		devid 1 total_bytes 14000519643136 bytes_used 805306368
		io_align 16384 io_width 16384 sector_size 16384 type 0
		generation 0 start_offset 0 dev_group 0
		seek_speed 0 bandwidth 0
		uuid b276d748-ed0f-4769-94e3-427ba0b9cc12
		fsid e49c5f73-35dd-4faa-8660-dd0b3d02e978
	item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 16087 itemsize 98
		devid 2 total_bytes 14000519643136 bytes_used 805306368
		io_align 16384 io_width 16384 sector_size 16384 type 0
		generation 0 start_offset 0 dev_group 0
		seek_speed 0 bandwidth 0
		uuid 2d4c70d1-808d-4c10-9696-521cf92748e2
		fsid e49c5f73-35dd-4faa-8660-dd0b3d02e978
	item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 1879048192) itemoff 15975 
itemsize 112
		length 268435456 owner 2 stripe_len 65536 type METADATA|RAID1
		io_align 65536 io_width 65536 sector_size 16384
		num_stripes 2 sub_stripes 1
			stripe 0 devid 2 offset 536870912
			dev_uuid 2d4c70d1-808d-4c10-9696-521cf92748e2
			stripe 1 devid 1 offset 536870912
			dev_uuid b276d748-ed0f-4769-94e3-427ba0b9cc12
	item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 2147483648) itemoff 15863 
itemsize 112
		length 268435456 owner 2 stripe_len 65536 type SYSTEM|RAID1
		io_align 65536 io_width 65536 sector_size 16384
		num_stripes 2 sub_stripes 1
			stripe 0 devid 2 offset 805306368
			dev_uuid 2d4c70d1-808d-4c10-9696-521cf92748e2
			stripe 1 devid 1 offset 805306368
			dev_uuid b276d748-ed0f-4769-94e3-427ba0b9cc12
	item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 2684354560) itemoff 15751 
itemsize 112
		length 268435456 owner 2 stripe_len 65536 type DATA|RAID1
		io_align 65536 io_width 65536 sector_size 16384
		num_stripes 2 sub_stripes 1
			stripe 0 devid 2 offset 1342177280
			dev_uuid 2d4c70d1-808d-4c10-9696-521cf92748e2
			stripe 1 devid 1 offset 2147483648
			dev_uuid b276d748-ed0f-4769-94e3-427ba0b9cc12
The kernel is using the for-next branch:
fa5d21fe6e6999373c2c2f48510af37964b6d9d1 (HEAD, btrfs/for-next) btrfs: 
preallocate temporary extent buffer for inode logging when needed

Meanwhile, btrfs-progs is v6.6.2.

Thanks,
Qu



* Re: [btrfs] RAID1 volume on zoned device oops when sync.
  2024-02-08 20:15                 ` Qu Wenruo
@ 2024-02-09  1:10                   ` David Sterba
  2024-02-09  8:14                     ` Johannes Thumshirn
  0 siblings, 1 reply; 22+ messages in thread
From: David Sterba @ 2024-02-09  1:10 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Johannes Thumshirn, Qu Wenruo, 韩于惟, linux-btrfs

On Fri, Feb 09, 2024 at 06:45:10AM +1030, Qu Wenruo wrote:
> 
> 
> On 2024/2/8 23:12, Johannes Thumshirn wrote:
> > On 05.02.24 08:56, Qu Wenruo wrote:
> >>>
> >>>    > ./nullb setup
> >>>    > ./nullb create -s 4096 -z 256
> >>>    > ./nullb create -s 4096 -z 256
> >>>    > ./nullb ls
> >>>    > mkfs.btrfs -s 16k /dev/nullb0
> >>>    > mount /dev/nullb0 /mnt/tmp
> >>>    > btrfs device add /dev/nullb1 /mnt/tmp
> >>>    > btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/tmp
> >>
> >> Just want to be sure: for your case, you're doing the same mkfs (4K
> >> sectorsize) on the physical disk, then adding a new disk, and finally
> >> balancing the fs?
> >>
> >> IIRC the balance itself should not succeed, whether with emulated
> >> or real disks, as data RAID1 requires zoned RST support.
> > 
> > For me, balance doesn't accept RAID on zoned devices, as it's supposed
> > to:
> > 
> > [  212.721872] BTRFS info (device nvme1n1): host-managed zoned block
> > device /dev/nvme2n1, 160 zones of 134217728 bytes
> > [  212.725694] BTRFS info (device nvme1n1): disk added /dev/nvme2n1
> > [  212.744807] BTRFS warning (device nvme1n1): balance: metadata profile
> > dup has lower redundancy than data profile raid1
> > [  212.748706] BTRFS info (device nvme1n1): balance: start -dconvert=raid1
> > [  212.750006] BTRFS error (device nvme1n1): zoned: data raid1 needs
> > raid-stripe-tree
> > [  212.751267] BTRFS info (device nvme1n1): balance: ended with status: -22
> > 
> > So I'm not exactly sure what's happening here.
> 
> I have access to that machine, and it does not reject the convert as it
> should:
> 
> $ sudo mkfs.btrfs -f /dev/sdb
> btrfs-progs v6.6.2
> See https://btrfs.readthedocs.io for more information.
> 
> Zoned: /dev/sdb: host-managed device detected, setting zoned feature
> WARNING: libblkid < 2.38 does not support zoned mode's superblock 
> location, update recommended
> Resetting device zones /dev/sdb (52156 zones) ...
> NOTE: several default settings have changed in version 5.15, please make 
> sure
>        this does not affect your deployments:
>        - DUP for metadata (-m dup)
>        - enabled no-holes (-O no-holes)
>        - enabled free-space-tree (-R free-space-tree)
> 
> Label:              (null)
> UUID:               e49c5f73-35dd-4faa-8660-dd0b3d02e978
> Node size:          16384
> Sector size:        16384	<<< Not yet subpage.
> Filesystem size:    12.73TiB
> Block group profiles:
>    Data:             single          256.00MiB
>    Metadata:         DUP             256.00MiB
>    System:           DUP             256.00MiB
> SSD detected:       no
> Zoned device:       yes
>    Zone size:        256.00MiB
> Incompat features:  extref, skinny-metadata, no-holes, free-space-tree, 
> zoned
> Runtime features:   free-space-tree
> Checksum:           crc32c
> Number of devices:  1
> Devices:
>     ID        SIZE  ZONES  PATH
>      1    12.73TiB  52156  /dev/sdb
> 
> $ sudo mount /dev/sdb /mnt/btrfs
> $ sudo btrfs dev add /dev/sdc -f /mnt/btrfs/
> Resetting device zones /dev/sdc (52156 zones) ...
> 
> $ dmesg
> [146422.722707] BTRFS: device fsid e49c5f73-35dd-4faa-8660-dd0b3d02e978 
> devid 1 transid 6 /dev/sdb scanned by mount (4172)
> [146422.736415] BTRFS info (device sdb): first mount of filesystem 
> e49c5f73-35dd-4faa-8660-dd0b3d02e978
> [146422.745508] BTRFS info (device sdb): using crc32c (crc32c-generic) 
> checksum algorithm
> [146422.753388] BTRFS info (device sdb): using free-space-tree
> [146423.313000] BTRFS info (device sdb): host-managed zoned block device 
> /dev/sdb, 52156 zones of 268435456 bytes
> [146423.322954] BTRFS info (device sdb): zoned mode enabled with zone 
> size 268435456
> [146423.330808] BTRFS info (device sdb): checking UUID tree
> [146446.313055] BTRFS info (device sdb): host-managed zoned block device 
> /dev/sdc, 52156 zones of 268435456 bytes
> [146446.345735] BTRFS info (device sdb): disk added /dev/sdc
> 
> $ sudo dmesg -C
> $ sudo btrfs balance start -mconvert=raid1 -dconvert=raid1 /mnt/btrfs/
> Done, had to relocate 3 out of 3 chunks
> 
> $ sudo dmesg
> [146533.890423] BTRFS info (device sdb): balance: start -dconvert=raid1 
> -mconvert=raid1 -sconvert=raid1

Here I'd expect a message like "cannot convert to raid1 because RST is
needed for zoned".

> [146533.899668] BTRFS info (device sdb): relocating block group 
> 1610612736 flags metadata|dup

but relocation starts anyway.

> [146533.992730] BTRFS info (device sdb): found 3 extents, stage: move 
> data extents
> [146534.126812] BTRFS info (device sdb): relocating block group 
> 1342177280 flags system|dup
> [146534.252836] BTRFS info (device sdb): relocating block group 
> 1073741824 flags data
> [146534.428593] BTRFS info (device sdb): balance: ended with status: 0

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [btrfs] RAID1 volume on zoned device oops when sync.
  2024-02-09  1:10                   ` David Sterba
@ 2024-02-09  8:14                     ` Johannes Thumshirn
  0 siblings, 0 replies; 22+ messages in thread
From: Johannes Thumshirn @ 2024-02-09  8:14 UTC (permalink / raw)
  To: dsterba, Qu Wenruo; +Cc: Qu Wenruo, 韩于惟, linux-btrfs

On 09.02.24 02:11, David Sterba wrote:
> On Fri, Feb 09, 2024 at 06:45:10AM +1030, Qu Wenruo wrote:
>>
>>
>> On 2024/2/8 23:12, Johannes Thumshirn wrote:
>>> On 05.02.24 08:56, Qu Wenruo wrote:
>>>>>
>>>>>     > ./nullb setup
>>>>>     > ./nullb create -s 4096 -z 256
>>>>>     > ./nullb create -s 4096 -z 256
>>>>>     > ./nullb ls
>>>>>     > mkfs.btrfs -s 16k /dev/nullb0
>>>>>     > mount /dev/nullb0 /mnt/tmp
>>>>>     > btrfs device add /dev/nullb1 /mnt/tmp
>>>>>     > btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/tmp
>>>>
>>>> Just want to be sure, for your case, you're doing the same mkfs (4K
>>>> sectorsize) on the physical disk, then add a new disk, and finally
>>>> balanced the fs?
>>>>
>>>> IIRC the balance itself should not succeed, no matter if it's emulated
>>>> or real disks, as data RAID1 requires zoned RST support.
>>>
>>> For me, balance doesn't accept RAID on zoned devices, as it's supposed
>>> to do:
>>>
>>> [  212.721872] BTRFS info (device nvme1n1): host-managed zoned block
>>> device /dev/nvme2n1, 160 zones of 134217728 bytes
>>> [  212.725694] BTRFS info (device nvme1n1): disk added /dev/nvme2n1
>>> [  212.744807] BTRFS warning (device nvme1n1): balance: metadata profile
>>> dup has lower redundancy than data profile raid1
>>> [  212.748706] BTRFS info (device nvme1n1): balance: start -dconvert=raid1
>>> [  212.750006] BTRFS error (device nvme1n1): zoned: data raid1 needs
>>> raid-stripe-tree
>>> [  212.751267] BTRFS info (device nvme1n1): balance: ended with status: -22
>>>
>>> So I'm not exactly sure what's happening here.
>>
>> I have access to that machine, and it doesn't reject the convert as
>> expected:
>>
>> $ sudo mkfs.btrfs -f /dev/sdb
>> btrfs-progs v6.6.2
>> See https://btrfs.readthedocs.io for more information.
>>
>> Zoned: /dev/sdb: host-managed device detected, setting zoned feature
>> WARNING: libblkid < 2.38 does not support zoned mode's superblock
>> location, update recommended
>> Resetting device zones /dev/sdb (52156 zones) ...
>> NOTE: several default settings have changed in version 5.15, please make
>> sure
>>         this does not affect your deployments:
>>         - DUP for metadata (-m dup)
>>         - enabled no-holes (-O no-holes)
>>         - enabled free-space-tree (-R free-space-tree)
>>
>> Label:              (null)
>> UUID:               e49c5f73-35dd-4faa-8660-dd0b3d02e978
>> Node size:          16384
>> Sector size:        16384	<<< Not yet subpage.
>> Filesystem size:    12.73TiB
>> Block group profiles:
>>     Data:             single          256.00MiB
>>     Metadata:         DUP             256.00MiB
>>     System:           DUP             256.00MiB
>> SSD detected:       no
>> Zoned device:       yes
>>     Zone size:        256.00MiB
>> Incompat features:  extref, skinny-metadata, no-holes, free-space-tree,
>> zoned
>> Runtime features:   free-space-tree
>> Checksum:           crc32c
>> Number of devices:  1
>> Devices:
>>      ID        SIZE  ZONES  PATH
>>       1    12.73TiB  52156  /dev/sdb
>>
>> $ sudo mount /dev/sdb /mnt/btrfs
>> $ sudo btrfs dev add /dev/sdc -f /mnt/btrfs/
>> Resetting device zones /dev/sdc (52156 zones) ...
>>
>> $ dmesg
>> [146422.722707] BTRFS: device fsid e49c5f73-35dd-4faa-8660-dd0b3d02e978
>> devid 1 transid 6 /dev/sdb scanned by mount (4172)
>> [146422.736415] BTRFS info (device sdb): first mount of filesystem
>> e49c5f73-35dd-4faa-8660-dd0b3d02e978
>> [146422.745508] BTRFS info (device sdb): using crc32c (crc32c-generic)
>> checksum algorithm
>> [146422.753388] BTRFS info (device sdb): using free-space-tree
>> [146423.313000] BTRFS info (device sdb): host-managed zoned block device
>> /dev/sdb, 52156 zones of 268435456 bytes
>> [146423.322954] BTRFS info (device sdb): zoned mode enabled with zone
>> size 268435456
>> [146423.330808] BTRFS info (device sdb): checking UUID tree
>> [146446.313055] BTRFS info (device sdb): host-managed zoned block device
>> /dev/sdc, 52156 zones of 268435456 bytes
>> [146446.345735] BTRFS info (device sdb): disk added /dev/sdc
>>
>> $ sudo dmesg -C
>> $ sudo btrfs balance start -mconvert=raid1 -dconvert=raid1 /mnt/btrfs/
>> Done, had to relocate 3 out of 3 chunks
>>
>> $ sudo dmesg
>> [146533.890423] BTRFS info (device sdb): balance: start -dconvert=raid1
>> -mconvert=raid1 -sconvert=raid1
> 
> Here I'd expect a message like "cannot convert to raid1 because RST is
> needed for zoned".

Yep, this is what I'm seeing on for-next.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [btrfs] RAID1 volume on zoned device oops when sync.
  2024-02-08 12:42               ` Johannes Thumshirn
  2024-02-08 20:15                 ` Qu Wenruo
@ 2024-02-09  8:16                 ` 韩于惟
  2024-02-09  9:00                   ` Johannes Thumshirn
  1 sibling, 1 reply; 22+ messages in thread
From: 韩于惟 @ 2024-02-09  8:16 UTC (permalink / raw)
  To: Johannes Thumshirn, Qu Wenruo; +Cc: linux-btrfs




On 2024/2/8 20:42, Johannes Thumshirn wrote:
> On 05.02.24 08:56, Qu Wenruo wrote:
>>>    > ./nullb setup
>>>    > ./nullb create -s 4096 -z 256
>>>    > ./nullb create -s 4096 -z 256
>>>    > ./nullb ls
>>>    > mkfs.btrfs -s 16k /dev/nullb0
>>>    > mount /dev/nullb0 /mnt/tmp
>>>    > btrfs device add /dev/nullb1 /mnt/tmp
>>>    > btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/tmp
>> Just want to be sure, for your case, you're doing the same mkfs (4K
>> sectorsize) on the physical disk, then add a new disk, and finally
>> balanced the fs?
>>
>> IIRC the balance itself should not succeed, no matter if it's emulated
>> or real disks, as data RAID1 requires zoned RST support.
> For me, balance doesn't accept RAID on zoned devices, as it's supposed
> to do:
>
> [  212.721872] BTRFS info (device nvme1n1): host-managed zoned block
> device /dev/nvme2n1, 160 zones of 134217728 bytes
> [  212.725694] BTRFS info (device nvme1n1): disk added /dev/nvme2n1
> [  212.744807] BTRFS warning (device nvme1n1): balance: metadata profile
> dup has lower redundancy than data profile raid1
> [  212.748706] BTRFS info (device nvme1n1): balance: start -dconvert=raid1
> [  212.750006] BTRFS error (device nvme1n1): zoned: data raid1 needs
> raid-stripe-tree
> [  212.751267] BTRFS info (device nvme1n1): balance: ended with status: -22
This is using the nvme driver, mine is SATA. Is this related?
> So I'm not exactly sure what's happening here.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [btrfs] RAID1 volume on zoned device oops when sync.
  2024-02-09  8:16                 ` 韩于惟
@ 2024-02-09  9:00                   ` Johannes Thumshirn
  2024-02-09  9:06                     ` Johannes Thumshirn
  0 siblings, 1 reply; 22+ messages in thread
From: Johannes Thumshirn @ 2024-02-09  9:00 UTC (permalink / raw)
  To: 韩于惟, Qu Wenruo; +Cc: linux-btrfs

On 09.02.24 09:17, 韩于惟 wrote:
> 
> On 2024/2/8 20:42, Johannes Thumshirn wrote:
>> On 05.02.24 08:56, Qu Wenruo wrote:
>>>>    > ./nullb setup
>>>>    > ./nullb create -s 4096 -z 256
>>>>    > ./nullb create -s 4096 -z 256
>>>>    > ./nullb ls
>>>>    > mkfs.btrfs -s 16k /dev/nullb0
>>>>    > mount /dev/nullb0 /mnt/tmp
>>>>    > btrfs device add /dev/nullb1 /mnt/tmp
>>>>    > btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/tmp
>>> Just want to be sure, for your case, you're doing the same mkfs (4K
>>> sectorsize) on the physical disk, then add a new disk, and finally
>>> balanced the fs?
>>>
>>> IIRC the balance itself should not succeed, no matter if it's emulated
>>> or real disks, as data RAID1 requires zoned RST support.
>> For me, balance doesn't accept RAID on zoned devices, as it's supposed
>> to do:
>>
>> [  212.721872] BTRFS info (device nvme1n1): host-managed zoned block
>> device /dev/nvme2n1, 160 zones of 134217728 bytes
>> [  212.725694] BTRFS info (device nvme1n1): disk added /dev/nvme2n1
>> [  212.744807] BTRFS warning (device nvme1n1): balance: metadata profile
>> dup has lower redundancy than data profile raid1
>> [  212.748706] BTRFS info (device nvme1n1): balance: start 
>> -dconvert=raid1
>> [  212.750006] BTRFS error (device nvme1n1): zoned: data raid1 needs
>> raid-stripe-tree
>> [  212.751267] BTRFS info (device nvme1n1): balance: ended with 
>> status: -22
> This is using the nvme driver, mine is SATA. Is this related?

The only difference here (for btrfs) is that an SMR HDD can have 
conventional zones.

But btrfs_load_block_group_zone_info() does check for the profile in 
both cases:

btrfs_load_block_group_zone_info()
`-> switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
     `-> case BTRFS_BLOCK_GROUP_RAID1:
         `-> btrfs_load_block_group_raid1()
             `-> if ((map->type & BTRFS_BLOCK_GROUP_DATA) &&
                      !fs_info->stripe_root) {
                       btrfs_err(...)
                        return -EINVAL;

I don't see the difference yet. I'll re-run a test on an SMR drive, just 
to be sure.
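
One hypothesis worth ruling out on the SMR drive: conventional zones may 
be taking a shortcut past the profile switch above, so the RST check 
never runs. Below is a minimal user-space model of that shape; the flag 
values, the early return, and every name in it are assumptions for 
illustration, not the kernel code:

	/* Hypothetical user-space model, not the kernel code: all names and
	 * flag values are invented for illustration.  It shows how a shortcut
	 * for all-conventional-zone chunks would skip the RAID1/RST check. */
	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	#define BLOCK_GROUP_DATA   (1ULL << 0)
	#define BLOCK_GROUP_RAID1  (1ULL << 4)
	#define PROFILE_MASK       BLOCK_GROUP_RAID1

	static int load_block_group_zone_info(uint64_t type, bool all_conventional,
					      bool have_stripe_root)
	{
		if (all_conventional)
			return 0;	/* hypothetical shortcut: check below never runs */

		switch (type & PROFILE_MASK) {
		case BLOCK_GROUP_RAID1:
			/* mirrors the condition quoted above */
			if ((type & BLOCK_GROUP_DATA) && !have_stripe_root) {
				fprintf(stderr, "zoned: data raid1 needs raid-stripe-tree\n");
				return -22;	/* -EINVAL */
			}
		}
		return 0;
	}

	int main(void)
	{
		uint64_t t = BLOCK_GROUP_DATA | BLOCK_GROUP_RAID1;

		/* nvme test, sequential zones only: the check fires */
		printf("nvme: %d\n", load_block_group_zone_info(t, false, false));
		/* SMR HDD with conventional zones: the check is silently skipped */
		printf("smr:  %d\n", load_block_group_zone_info(t, true, false));
		return 0;
	}

If that model matches reality, it would explain why the nvme reproducer 
(sequential zones only) errors out with -22 while the SATA SMR setup 
converts without complaint.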

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [btrfs] RAID1 volume on zoned device oops when sync.
  2024-02-09  9:00                   ` Johannes Thumshirn
@ 2024-02-09  9:06                     ` Johannes Thumshirn
  2024-02-09  9:16                       ` Johannes Thumshirn
  2024-02-09  9:45                       ` 韩于惟
  0 siblings, 2 replies; 22+ messages in thread
From: Johannes Thumshirn @ 2024-02-09  9:06 UTC (permalink / raw)
  To: 韩于惟, Qu Wenruo; +Cc: linux-btrfs

On 09.02.24 10:01, Johannes Thumshirn wrote:
> On 09.02.24 09:17, 韩于惟 wrote:
>>
>> On 2024/2/8 20:42, Johannes Thumshirn wrote:
>>> On 05.02.24 08:56, Qu Wenruo wrote:
>>>>>     > ./nullb setup
>>>>>     > ./nullb create -s 4096 -z 256
>>>>>     > ./nullb create -s 4096 -z 256
>>>>>     > ./nullb ls
>>>>>     > mkfs.btrfs -s 16k /dev/nullb0
>>>>>     > mount /dev/nullb0 /mnt/tmp
>>>>>     > btrfs device add /dev/nullb1 /mnt/tmp
>>>>>     > btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/tmp
>>>> Just want to be sure, for your case, you're doing the same mkfs (4K
>>>> sectorsize) on the physical disk, then add a new disk, and finally
>>>> balanced the fs?
>>>>
>>>> IIRC the balance itself should not succeed, no matter if it's emulated
>>>> or real disks, as data RAID1 requires zoned RST support.
>>> For me, balance doesn't accept RAID on zoned devices, as it's supposed
>>> to do:
>>>
>>> [  212.721872] BTRFS info (device nvme1n1): host-managed zoned block
>>> device /dev/nvme2n1, 160 zones of 134217728 bytes
>>> [  212.725694] BTRFS info (device nvme1n1): disk added /dev/nvme2n1
>>> [  212.744807] BTRFS warning (device nvme1n1): balance: metadata profile
>>> dup has lower redundancy than data profile raid1
>>> [  212.748706] BTRFS info (device nvme1n1): balance: start
>>> -dconvert=raid1
>>> [  212.750006] BTRFS error (device nvme1n1): zoned: data raid1 needs
>>> raid-stripe-tree
>>> [  212.751267] BTRFS info (device nvme1n1): balance: ended with
>>> status: -22
>> This is using the nvme driver, mine is SATA. Is this related?
> 
> The only difference here (for btrfs) is that an SMR HDD can have
> conventional zones.
> 
> But btrfs_load_block_group_zone_info() does check for the profile in
> both cases:
> 
> btrfs_load_block_group_zone_info()
> `-> switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
>       `-> case BTRFS_BLOCK_GROUP_RAID1:
>           `-> btrfs_load_block_group_raid1()
>               `-> if ((map->type & BTRFS_BLOCK_GROUP_DATA) &&
>                        !fs_info->stripe_root) {
>                         btrfs_err(...)
>                          return -EINVAL;
> 
> I don't see the difference yet. I'll re-run a test on an SMR drive, just
> to be sure.
> 


Oh I think I see the problem now, can you try the following patch:
https://termbin.com/fss0

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [btrfs] RAID1 volume on zoned device oops when sync.
  2024-02-09  9:06                     ` Johannes Thumshirn
@ 2024-02-09  9:16                       ` Johannes Thumshirn
  2024-02-09  9:45                       ` 韩于惟
  1 sibling, 0 replies; 22+ messages in thread
From: Johannes Thumshirn @ 2024-02-09  9:16 UTC (permalink / raw)
  To: 韩于惟, Qu Wenruo; +Cc: linux-btrfs

On 09.02.24 10:06, Johannes Thumshirn wrote:
> 
> 
> Oh I think I see the problem now, can you try the following patch:
> https://termbin.com/fss0
> 

OK, I should've tested before. I'll send an updated patch soon.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [btrfs] RAID1 volume on zoned device oops when sync.
  2024-02-09  9:06                     ` Johannes Thumshirn
  2024-02-09  9:16                       ` Johannes Thumshirn
@ 2024-02-09  9:45                       ` 韩于惟
  1 sibling, 0 replies; 22+ messages in thread
From: 韩于惟 @ 2024-02-09  9:45 UTC (permalink / raw)
  To: Johannes Thumshirn, Qu Wenruo; +Cc: linux-btrfs




On 2024/2/9 17:06, Johannes Thumshirn wrote:
> On 09.02.24 10:01, Johannes Thumshirn wrote:
>> On 09.02.24 09:17, 韩于惟 wrote:
>>> On 2024/2/8 20:42, Johannes Thumshirn wrote:
>>>> On 05.02.24 08:56, Qu Wenruo wrote:
>>>>>>      > ./nullb setup
>>>>>>      > ./nullb create -s 4096 -z 256
>>>>>>      > ./nullb create -s 4096 -z 256
>>>>>>      > ./nullb ls
>>>>>>      > mkfs.btrfs -s 16k /dev/nullb0
>>>>>>      > mount /dev/nullb0 /mnt/tmp
>>>>>>      > btrfs device add /dev/nullb1 /mnt/tmp
>>>>>>      > btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/tmp
>>>>> Just want to be sure, for your case, you're doing the same mkfs (4K
>>>>> sectorsize) on the physical disk, then add a new disk, and finally
>>>>> balanced the fs?
>>>>>
>>>>> IIRC the balance itself should not succeed, no matter if it's emulated
>>>>> or real disks, as data RAID1 requires zoned RST support.
>>>> For me, balance doesn't accept RAID on zoned devices, as it's supposed
>>>> to do:
>>>>
>>>> [  212.721872] BTRFS info (device nvme1n1): host-managed zoned block
>>>> device /dev/nvme2n1, 160 zones of 134217728 bytes
>>>> [  212.725694] BTRFS info (device nvme1n1): disk added /dev/nvme2n1
>>>> [  212.744807] BTRFS warning (device nvme1n1): balance: metadata profile
>>>> dup has lower redundancy than data profile raid1
>>>> [  212.748706] BTRFS info (device nvme1n1): balance: start
>>>> -dconvert=raid1
>>>> [  212.750006] BTRFS error (device nvme1n1): zoned: data raid1 needs
>>>> raid-stripe-tree
>>>> [  212.751267] BTRFS info (device nvme1n1): balance: ended with
>>>> status: -22
>>> This is using nvme driver, mine is SATA. It this related?
>> The only difference here (for btrfs) is that an SMR HDD can have
>> conventional zones.
>>
>> But btrfs_load_block_group_zone_info() does check for the profile in
>> both cases:
>>
>> btrfs_load_block_group_zone_info()
>> `-> switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
>>        `-> case BTRFS_BLOCK_GROUP_RAID1:
>>            `-> btrfs_load_block_group_raid1()
>>                `-> if ((map->type & BTRFS_BLOCK_GROUP_DATA) &&
>>                         !fs_info->stripe_root) {
>>                          btrfs_err(...)
>>                           return -EINVAL;
>>
>> I don't see the difference yet. I'll re-run a test on an SMR drive, just
>> to be sure.
>>
>
> Oh I think I see the problem now, can you try the following patch:
> https://termbin.com/fss0

I can't mount after a fresh mkfs.btrfs.

[195158.807960] BTRFS: device fsid 4fb3ae17-9f18-47d9-b7bf-00d425fe450e devid 1 transid 6 /dev/sdb scanned by mount (4761)
[195158.822827] BTRFS info (device sdb): first mount of filesystem 4fb3ae17-9f18-47d9-b7bf-00d425fe450e
[195158.831915] BTRFS info (device sdb): using crc32c (crc32c-generic) checksum algorithm
[195158.839795] BTRFS info (device sdb): using free-space-tree
[195159.399477] BTRFS info (device sdb): host-managed zoned block device /dev/sdb, 52156 zones of 268435456 bytes
[195159.409431] BTRFS info (device sdb): zoned mode enabled with zone size 268435456
[195159.416934] BTRFS error (device sdb): zoned: invalid write pointer 18446744073709551614 (larger than zone capacity 0) in block group 1073741824
[195159.429815] BTRFS error (device sdb): zoned: failed to load zone info of bg 1073741824
[195159.437771] BTRFS error (device sdb): failed to read block groups: -5
[195159.444473] BTRFS error (device sdb): open_ctree failed
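
For reference, 18446744073709551614 is (u64)-2, which looks like a 
conventional-zone sentinel rather than a real byte offset, i.e. a 
placeholder that should have been translated before the capacity check 
ran. A minimal user-space model of that failure mode; the sentinel macro 
and all names are assumptions modeled on the dmesg above, not the kernel 
definitions:

	/* User-space model only: the sentinel macro and all names are
	 * assumptions modeled on the dmesg, not the kernel definitions.
	 * (uint64_t)-2 == 18446744073709551614, the exact value in the log. */
	#include <stdint.h>
	#include <stdio.h>

	#define WP_CONVENTIONAL ((uint64_t)-2)	/* sentinel, not a byte offset */

	static int check_write_pointer(uint64_t wp, uint64_t zone_capacity)
	{
		if (wp > zone_capacity) {
			fprintf(stderr,
				"zoned: invalid write pointer %llu (larger than zone capacity %llu)\n",
				(unsigned long long)wp, (unsigned long long)zone_capacity);
			return -5;	/* surfaces as "failed to read block groups: -5" */
		}
		return 0;
	}

	int main(void)
	{
		/* A sentinel that was never replaced by a real offset trips the
		 * check against a reported zone capacity of 0: */
		return -check_write_pointer(WP_CONVENTIONAL, 0);
	}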



^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread

Thread overview: 22+ messages
2024-02-02  8:13 [btrfs] RAID1 volume on zoned device oops when sync 韩于惟
2024-02-02 12:19 ` David Sterba
2024-02-03 10:18   ` 韩于惟
2024-02-03 22:15     ` David Sterba
2024-02-04  9:34       ` 韩于惟
2024-02-05  5:22         ` Qu Wenruo
2024-02-05  6:46           ` 韩于惟
2024-02-05  7:56             ` Qu Wenruo
2024-02-05 10:50               ` 韩于惟
2024-02-05 10:50               ` 韩于惟
2024-02-05 10:50               ` 韩于惟
2024-02-05 20:40                 ` Qu Wenruo
2024-02-06  1:45                   ` 韩于惟
2024-02-08 12:42               ` Johannes Thumshirn
2024-02-08 20:15                 ` Qu Wenruo
2024-02-09  1:10                   ` David Sterba
2024-02-09  8:14                     ` Johannes Thumshirn
2024-02-09  8:16                 ` 韩于惟
2024-02-09  9:00                   ` Johannes Thumshirn
2024-02-09  9:06                     ` Johannes Thumshirn
2024-02-09  9:16                       ` Johannes Thumshirn
2024-02-09  9:45                       ` 韩于惟
