Hi All, after the thread "Question: how understand the raid profile of a btrfs filesystem" [*] I was working to cleanup the function btrfs_reduce_alloc_profile(), when I figured that it was incompatible with a mixed profile of DUP and RAID1C3/RAID1C4. This is a very uncommon situation; to be honest it very unlikely that it will happen at all. However if the filesystem has a mixed profiles of DUP and RAID1C3/RAID1C4 (of the same type of chunk of course, i.e. if you have metadata RAID1C3 and data DUP there is no problem because the type of chunks are different), the function btrfs_reduce_alloc_profile() returns both the profiles and subsequent calls to alloc_profile_is_valid() return invalid. The problem is how the function btrfs_reduce_alloc_profile "reduces" the profiles. [...] static u64 btrfs_reduce_alloc_profile(struct btrfs_fs_info *fs_info, u64 flags) [...] allowed &= flags; if (allowed & BTRFS_BLOCK_GROUP_RAID6) allowed = BTRFS_BLOCK_GROUP_RAID6; else if (allowed & BTRFS_BLOCK_GROUP_RAID5) allowed = BTRFS_BLOCK_GROUP_RAID5; else if (allowed & BTRFS_BLOCK_GROUP_RAID10) allowed = BTRFS_BLOCK_GROUP_RAID10; else if (allowed & BTRFS_BLOCK_GROUP_RAID1) allowed = BTRFS_BLOCK_GROUP_RAID1; else if (allowed & BTRFS_BLOCK_GROUP_RAID0) allowed = BTRFS_BLOCK_GROUP_RAID0; flags &= ~BTRFS_BLOCK_GROUP_PROFILE_MASK; [...] "allowed" are all the possibles profiles allowed by the disks. "flags" contains the existing profiles. If "flags" contains both DUP, RAID1C3 no reduction is performed and both the profiles are returned. If full conversion from DUP to RAID1C3 is performed, there is no problem. But a partial conversion from DUP to RAID1C3 is performed, then there is no possibility to allocate a new chunk. On my tests the filesystem was never corrupted, but only force to RO. However I was unable to exit from this state without my patch. [*] https://lore.kernel.org/linux-btrfs/517dac49-5f57-2754-2134-92d716e50064@alice.it/ -- gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it> Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
From: Goffredo Baroncelli <kreijack@inwind.it> The function btrfs_reduce_alloc_profile() is in charge to choice the profile for the 'next' block group allocation. This patch add support RAID1C3 and RAID1C4. Moreover are explicetely handled the case DUP and SINGLE. Before if both DUP and RAID1C3/RAID1C3 were present btrfs_reduce_alloc_profile() return both the bit set. This would cause the check alloc_profile_is_valid() to fail leading to a ro filesystem. Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it> --- fs/btrfs/block-group.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 5627f53ca115..f23a1e1dc359 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -81,8 +81,12 @@ static u64 btrfs_reduce_alloc_profile(struct btrfs_fs_info *fs_info, u64 flags) } allowed &= flags; - if (allowed & BTRFS_BLOCK_GROUP_RAID6) + if (allowed & BTRFS_BLOCK_GROUP_RAID1C4) + allowed = BTRFS_BLOCK_GROUP_RAID1C4; + else if (allowed & BTRFS_BLOCK_GROUP_RAID6) allowed = BTRFS_BLOCK_GROUP_RAID6; + else if (allowed & BTRFS_BLOCK_GROUP_RAID1C3) + allowed = BTRFS_BLOCK_GROUP_RAID1C3; else if (allowed & BTRFS_BLOCK_GROUP_RAID5) allowed = BTRFS_BLOCK_GROUP_RAID5; else if (allowed & BTRFS_BLOCK_GROUP_RAID10) @@ -91,6 +95,14 @@ static u64 btrfs_reduce_alloc_profile(struct btrfs_fs_info *fs_info, u64 flags) allowed = BTRFS_BLOCK_GROUP_RAID1; else if (allowed & BTRFS_BLOCK_GROUP_RAID0) allowed = BTRFS_BLOCK_GROUP_RAID0; + else if (allowed & BTRFS_BLOCK_GROUP_DUP) + allowed = BTRFS_BLOCK_GROUP_DUP; + else if (allowed & BTRFS_AVAIL_ALLOC_BIT_SINGLE) + allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE; + else { + btrfs_err(fs_info, "invalid profile type %llu", allowed); + BUG_ON(1); + } flags &= ~BTRFS_BLOCK_GROUP_PROFILE_MASK; -- 2.27.0.rc2
On 2020/5/31 上午2:53, Goffredo Baroncelli wrote: > > Hi All, > > after the thread "Question: how understand the raid profile of a btrfs > filesystem" [*] I was working to cleanup the function > btrfs_reduce_alloc_profile(), when I figured that it was incompatible with > a mixed profile of DUP and RAID1C3/RAID1C4. > > This is a very uncommon situation; to be honest it very unlikely that it will > happen at all. > > However if the filesystem has a mixed profiles of DUP and RAID1C3/RAID1C4 (of > the same type of chunk of course, i.e. if you have metadata RAID1C3 and data > DUP there is no problem because the type of chunks are different), the function > btrfs_reduce_alloc_profile() returns both the profiles and subsequent calls > to alloc_profile_is_valid() return invalid. > > The problem is how the function btrfs_reduce_alloc_profile "reduces" the > profiles. > > [...] > static u64 btrfs_reduce_alloc_profile(struct btrfs_fs_info *fs_info, u64 flags) > [...] > allowed &= flags; > > if (allowed & BTRFS_BLOCK_GROUP_RAID6) > allowed = BTRFS_BLOCK_GROUP_RAID6; > else if (allowed & BTRFS_BLOCK_GROUP_RAID5) > allowed = BTRFS_BLOCK_GROUP_RAID5; > else if (allowed & BTRFS_BLOCK_GROUP_RAID10) > allowed = BTRFS_BLOCK_GROUP_RAID10; > else if (allowed & BTRFS_BLOCK_GROUP_RAID1) > allowed = BTRFS_BLOCK_GROUP_RAID1; > else if (allowed & BTRFS_BLOCK_GROUP_RAID0) > allowed = BTRFS_BLOCK_GROUP_RAID0; > > flags &= ~BTRFS_BLOCK_GROUP_PROFILE_MASK; > > [...] > > "allowed" are all the possibles profiles allowed by the disks. > "flags" contains the existing profiles. > > If "flags" contains both DUP, RAID1C3 no reduction is performed and both > the profiles are returned. > > If full conversion from DUP to RAID1C3 is performed, there is no problem. > But a partial conversion from DUP to RAID1C3 is performed, then there is no > possibility to allocate a new chunk. > > On my tests the filesystem was never corrupted, but only force to RO. > However I was unable to exit from this state without my patch. This in facts exposed the long existing bug that btrfs has no on-disk indicator for the target chunk time, thus we need to be "creative" to handle chunk profiles. I'm wondering if we could add new persistent item in chunk tree or super block to solve the problem. Any idea on this, David? Thanks, Qu > > [*] https://lore.kernel.org/linux-btrfs/517dac49-5f57-2754-2134-92d716e50064@alice.it/ >
On 5/31/20 2:51 AM, Qu Wenruo wrote: > > > On 2020/5/31 上午2:53, Goffredo Baroncelli wrote: >> >> Hi All, >> >> after the thread "Question: how understand the raid profile of a btrfs >> filesystem" [*] I was working to cleanup the function >> btrfs_reduce_alloc_profile(), when I figured that it was incompatible with >> a mixed profile of DUP and RAID1C3/RAID1C4. >> >> This is a very uncommon situation; to be honest it very unlikely that it will >> happen at all. >> >> However if the filesystem has a mixed profiles of DUP and RAID1C3/RAID1C4 (of >> the same type of chunk of course, i.e. if you have metadata RAID1C3 and data >> DUP there is no problem because the type of chunks are different), the function >> btrfs_reduce_alloc_profile() returns both the profiles and subsequent calls >> to alloc_profile_is_valid() return invalid. >> >> The problem is how the function btrfs_reduce_alloc_profile "reduces" the >> profiles. >> >> [...] >> static u64 btrfs_reduce_alloc_profile(struct btrfs_fs_info *fs_info, u64 flags) >> [...] >> allowed &= flags; >> >> if (allowed & BTRFS_BLOCK_GROUP_RAID6) >> allowed = BTRFS_BLOCK_GROUP_RAID6; >> else if (allowed & BTRFS_BLOCK_GROUP_RAID5) >> allowed = BTRFS_BLOCK_GROUP_RAID5; >> else if (allowed & BTRFS_BLOCK_GROUP_RAID10) >> allowed = BTRFS_BLOCK_GROUP_RAID10; >> else if (allowed & BTRFS_BLOCK_GROUP_RAID1) >> allowed = BTRFS_BLOCK_GROUP_RAID1; >> else if (allowed & BTRFS_BLOCK_GROUP_RAID0) >> allowed = BTRFS_BLOCK_GROUP_RAID0; >> >> flags &= ~BTRFS_BLOCK_GROUP_PROFILE_MASK; >> >> [...] >> >> "allowed" are all the possibles profiles allowed by the disks. >> "flags" contains the existing profiles. >> >> If "flags" contains both DUP, RAID1C3 no reduction is performed and both >> the profiles are returned. >> >> If full conversion from DUP to RAID1C3 is performed, there is no problem. >> But a partial conversion from DUP to RAID1C3 is performed, then there is no >> possibility to allocate a new chunk. >> >> On my tests the filesystem was never corrupted, but only force to RO. >> However I was unable to exit from this state without my patch. > > This in facts exposed the long existing bug that btrfs has no on-disk > indicator for the target chunk time, thus we need to be "creative" to > handle chunk profiles. Fully agree; this patch is... a patch to correct a bug. Changing to having a persistent field is a lot more complicated. > > I'm wondering if we could add new persistent item in chunk tree or super > block to solve the problem I suggest to add a new object in the trees. I think that the superblock should be reserved for info which allow to detect / identify the filesystem (i.e. FDID - Disk UUID, Label basic data for the boot) and nothing more. Moreover having an object stored in the tree, it would be possible to think to and extendible structure (an object has a size) to allow future expansion. > > Any idea on this, David? > > Thanks, > Qu > >> >> [*] https://lore.kernel.org/linux-btrfs/517dac49-5f57-2754-2134-92d716e50064@alice.it/ >> -- gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it> Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
> > This in facts exposed the long existing bug that btrfs has no on-disk > indicator for the target chunk time, thus we need to be "creative" to > handle chunk profiles. > > I'm wondering if we could add new persistent item in chunk tree or super > block to solve the problem. > +1. That will also fix the chunk size discrepancy between mkfs and kernel. Thanks, Anand > Any idea on this, David? > > Thanks, > Qu > >> >> [*] https://lore.kernel.org/linux-btrfs/517dac49-5f57-2754-2134-92d716e50064@alice.it/ >>