* Array extremely unbalanced after convert to Raid5
@ 2021-05-05 13:41 Abdulla Bubshait
2021-05-05 13:58 ` remi
2021-05-05 14:49 ` Zygo Blaxell
0 siblings, 2 replies; 7+ messages in thread
From: Abdulla Bubshait @ 2021-05-05 13:41 UTC (permalink / raw)
To: linux-btrfs
I ran a balance convert of my single-profile data setup to raid5. Once
complete, the setup is extremely unbalanced and doesn't even make sense
as raid5. I tried to run a balance with a dlimit of 1000, but it just
seems to make things worse.
After convert the array looked like this:
btrfs fi show gives:
Label: 'horde' uuid: 26debbc1-fdd0-4c3a-8581-8445b99c067c
Total devices 4 FS bytes used 25.53TiB
devid 1 size 16.37TiB used 2.36TiB path /dev/sdd
devid 2 size 14.55TiB used 14.27TiB path /dev/sdc
devid 3 size 12.73TiB used 12.69TiB path /dev/sdf
devid 4 size 16.37TiB used 16.32TiB path /dev/sde
btrfs fi usage gives:
Overall:
Device size: 60.03TiB
Device allocated: 45.64TiB
Device unallocated: 14.39TiB
Device missing: 0.00B
Used: 45.59TiB
Free (estimated): 8.08TiB (min: 4.81TiB)
Free (statfs, df): 410.67GiB
Data ratio: 1.78
Metadata ratio: 3.00
Global reserve: 512.00MiB (used: 80.00KiB)
Multiple profiles: no
Data,RAID5: Size:25.51TiB, Used:25.50TiB (99.93%)
/dev/sdd 2.33TiB
/dev/sdc 14.23TiB
/dev/sdf 12.66TiB
/dev/sde 16.31TiB
Metadata,RAID1C3: Size:35.00GiB, Used:28.54GiB (81.55%)
/dev/sdd 34.00GiB
/dev/sdc 35.00GiB
/dev/sdf 30.00GiB
/dev/sde 6.00GiB
System,RAID1C3: Size:32.00MiB, Used:3.06MiB (9.57%)
/dev/sdd 32.00MiB
/dev/sdc 32.00MiB
/dev/sde 32.00MiB
Unallocated:
/dev/sdd 14.01TiB
/dev/sdc 292.99GiB
/dev/sdf 47.00GiB
/dev/sde 53.00GiB
After doing some balance I currently have:
btrfs fi usage
Overall:
Device size: 60.03TiB
Device allocated: 45.52TiB
Device unallocated: 14.51TiB
Device missing: 0.00B
Used: 45.50TiB
Free (estimated): 8.16TiB (min: 4.85TiB)
Free (statfs, df): 414.97GiB
Data ratio: 1.78
Metadata ratio: 3.00
Global reserve: 512.00MiB (used: 80.00KiB)
Multiple profiles: no
Data,RAID5: Size:25.52TiB, Used:25.51TiB (99.96%)
/dev/sdd 2.23TiB
/dev/sdc 14.13TiB
/dev/sdf 12.71TiB
/dev/sde 16.37TiB
Metadata,RAID1C3: Size:29.00GiB, Used:28.51GiB (98.31%)
/dev/sdd 29.00GiB
/dev/sdc 29.00GiB
/dev/sdf 27.00GiB
/dev/sde 2.00GiB
System,RAID1C3: Size:32.00MiB, Used:3.03MiB (9.47%)
/dev/sdd 32.00MiB
/dev/sdc 32.00MiB
/dev/sde 32.00MiB
Unallocated:
/dev/sdd 14.12TiB
/dev/sdc 404.99GiB
/dev/sdf 1.00MiB
/dev/sde 1.00MiB
So the estimated free space is 8 TB, when it should be closer to 15. I
am guessing the free space would be better if the array were properly
balanced. I am unsure how to properly balance this array, though.
* Re: Array extremely unbalanced after convert to Raid5
2021-05-05 13:41 Array extremely unbalanced after convert to Raid5 Abdulla Bubshait
@ 2021-05-05 13:58 ` remi
2021-05-05 14:23 ` Abdulla Bubshait
2021-05-05 14:51 ` Zygo Blaxell
2021-05-05 14:49 ` Zygo Blaxell
1 sibling, 2 replies; 7+ messages in thread
From: remi @ 2021-05-05 13:58 UTC (permalink / raw)
To: Abdulla Bubshait, linux-btrfs
On Wed, May 5, 2021, at 9:41 AM, Abdulla Bubshait wrote:
> I ran a balance convert of my single-profile data setup to raid5. Once
> complete, the setup is extremely unbalanced and doesn't even make sense
> as raid5. I tried to run a balance with a dlimit of 1000, but it just
> seems to make things worse.
>
> Unallocated:
> /dev/sdd 14.12TiB
> /dev/sdc 404.99GiB
> /dev/sdf 1.00MiB
> /dev/sde 1.00MiB
>
>
Sorry, I don't have a solution for you, but I want to point out that the situation is far more critical than you seem to have realized: this filesystem is now completely wedged. I would suggest adding another device, or replacing either /dev/sdf or /dev/sde with something larger (though, if those are real disks, I see that might be a challenge).
Your metadata is RAID1C3 (meaning 3 copies), but you only have 2 disks with free space. And thanks to the recent balancing, there is very little free space in the already allocated metadata, so effectively the filesystem can no longer write any new metadata and will very quickly hit out-of-space errors.
* Re: Array extremely unbalanced after convert to Raid5
2021-05-05 13:58 ` remi
@ 2021-05-05 14:23 ` Abdulla Bubshait
2021-05-05 14:51 ` Zygo Blaxell
1 sibling, 0 replies; 7+ messages in thread
From: Abdulla Bubshait @ 2021-05-05 14:23 UTC (permalink / raw)
To: remi; +Cc: linux-btrfs
On Wed, May 5, 2021 at 9:59 AM <remi@georgianit.com> wrote:
> Sorry, I don't have a solution for you, but I want to point out that the situation is far more critical than you seem to have realized: this filesystem is now completely wedged. I would suggest adding another device, or replacing either /dev/sdf or /dev/sde with something larger (though, if those are real disks, I see that might be a challenge).
I don't think they make disks larger than sde.
I can get some more space by offloading stuff from the array. But even
if I offload a TB and give myself some breathing room on those disks,
as soon as I run a balance the free space on the disks will quickly
fill up, leaving me again without any unallocated space on 3 drives.
Right now I am lucky enough to have some space on sdc, but typically
after balancing I would have 3 disks with 1 MB left and 1 disk with
14 TB unallocated. And I can't seem to get the balance to start using
up sdd and freeing up space from all the others.
As things stand, I don't know how sde ever got filled up. Since it is
the largest of the disks, part of it should be unable to find a parity
partner on any other disk, and so should remain unallocated. At least,
that is what I gathered from the btrfs space allocator site.
* Re: Array extremely unbalanced after convert to Raid5
2021-05-05 13:41 Array extremely unbalanced after convert to Raid5 Abdulla Bubshait
2021-05-05 13:58 ` remi
@ 2021-05-05 14:49 ` Zygo Blaxell
2021-05-05 15:35 ` Abdulla Bubshait
1 sibling, 1 reply; 7+ messages in thread
From: Zygo Blaxell @ 2021-05-05 14:49 UTC (permalink / raw)
To: Abdulla Bubshait; +Cc: linux-btrfs
On Wed, May 05, 2021 at 09:41:49AM -0400, Abdulla Bubshait wrote:
> I ran a balance convert of my single-profile data setup to raid5. Once
> complete, the setup is extremely unbalanced and doesn't even make sense
> as raid5. I tried to run a balance with a dlimit of 1000, but it just
> seems to make things worse.
Balancing a full single array to raid5 requires a better device selection
algorithm than the kernel provides, especially if the disks are of
different sizes and were added to the array over a long period of time.
The kernel will strictly relocate the newest block groups first, which
may leave space occupied on some disks for most of the balance, and
cause many chunks to be created with suboptimal stripe width.
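[Editorial note: the stripe-width effect can be sketched with a toy model. This is not kernel or btrfs code; the device names and GiB figures are illustrative, and the rule "a new raid5 chunk stripes across every disk that currently has unallocated space" is a simplification of the real allocator.]

```python
# Toy model: a new raid5 chunk is as wide as the number of disks that
# have unallocated space at the moment it is allocated (2-disk minimum).

def next_stripe_width(unalloc_gib, member_gib=1.0, min_disks=2):
    """How wide the next raid5 chunk can be, given per-disk unallocated space."""
    eligible = sum(1 for v in unalloc_gib.values() if v >= member_gib)
    return eligible if eligible >= min_disks else 0

# Mid-convert: three disks are still full of old single chunks, so only
# the new disk and one small sliver are usable -> narrow 2-wide chunks.
mid_convert = {"sdd": 14000.0, "sdc": 5.0, "sdf": 0.5, "sde": 0.5}
print(next_stripe_width(mid_convert))   # 2

# With unallocated space on every disk, chunks use the full 4-disk width.
balanced = {"sdd": 3600.0, "sdc": 3600.0, "sdf": 3600.0, "sde": 3600.0}
print(next_stripe_width(balanced))      # 4
```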
> After convert the array looked like this:
>
> btrfs fi show gives:
> Label: 'horde' uuid: 26debbc1-fdd0-4c3a-8581-8445b99c067c
> Total devices 4 FS bytes used 25.53TiB
> devid 1 size 16.37TiB used 2.36TiB path /dev/sdd
> devid 2 size 14.55TiB used 14.27TiB path /dev/sdc
> devid 3 size 12.73TiB used 12.69TiB path /dev/sdf
> devid 4 size 16.37TiB used 16.32TiB path /dev/sde
For raid5 conversion, you need equal amounts of unallocated space on
each disk. Convert sufficient raid5 block groups back to single profile
to redistribute the unallocated space:
btrfs balance start -dconvert=single,devid=2,limit=4000 /fs
btrfs balance start -dconvert=single,devid=3,limit=4000 /fs
btrfs balance start -dconvert=single,devid=4,limit=4000 /fs
After this, each disk should have 3-4 TB of unallocated space on it
(devid 2-4 will have data moved to devid 1). The important thing is to
have equal unallocated space--you can cancel balancing as soon as that
is achieved. The limits above are higher than necessary to be sure
that happens.
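[Editorial note: the "3-4 TB" figure follows from spreading the array's existing unallocated space (the per-device numbers from the "fi usage" output in this thread) evenly across the four disks. This back-of-envelope calculation ignores the extra space the raid5-to-single conversion itself frees by dropping parity.]

```python
# Unallocated space per device from the "fi usage" report, in TiB.
unalloc_tib = {
    "sdd": 14.01,
    "sdc": 292.99 / 1024,
    "sdf": 47.00 / 1024,
    "sde": 53.00 / 1024,
}

# Converting back to single lets new chunks land on the emptiest disk,
# so the total can be spread roughly evenly across all four devices.
per_disk = sum(unalloc_tib.values()) / len(unalloc_tib)
print(round(per_disk, 2))   # ~3.6 TiB per disk, i.e. the "3-4 TB" target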
Now use the stripes filter to get rid of all chunks that have fewer
than the optimum number of stripes on each disk. Cycle through these
commands until they report 0 chunks relocated (you can just leave these
running in a shell loop and check on it every few hours, when they get
to 0 they will just become no-ops):
btrfs balance start -dlimit=100,convert=raid5,stripes=1..3,devid=3 /fs
btrfs balance start -dlimit=100,convert=raid5,stripes=1..2,devid=2 /fs
btrfs balance start -dlimit=100,convert=raid5,stripes=1..1,devid=1 /fs
btrfs balance start -dlimit=100,convert=raid5,stripes=1..1,devid=4 /fs
The filters select chunks that have undesirable stripe counts and force
them into raid5 profile. Single chunks have stripe count 1 and will
be converted to raid5. RAID5 chunks have stripe count >1 and will be
relocated (converted from raid5 to raid5, but in a different location
with more disks in the chunk). RAID5 chunks that already occupy the
correct number of drives will not be touched.
It is important to select chunks from every drive in turn in order to
keep some free space available on all disks. Each command will spread
out 100 chunks from one disk over all the disks, making space on one
disk and filling all the others. The balance must change to another
devid at regular intervals to ensure all disks maintain free space as
long as possible.
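[Editorial note: one possible shape for that shell loop, as an untested sketch. /fs and the devids match the commands above; the output parsing assumes balance's usual "Done, had to relocate N out of M chunks" message, and the loop is guarded so it only runs where btrfs exists.]

```shell
#!/bin/sh
# Repeat the four stripe-filter balances until a full pass relocates
# nothing, cycling devids so every disk keeps some free space.
FS=/fs

count_relocated() {
    # Pull N out of balance's "Done, had to relocate N out of M chunks".
    printf '%s\n' "$1" \
        | sed -n 's/.*relocate \([0-9][0-9]*\) out of.*/\1/p' \
        | head -n 1
}

if command -v btrfs >/dev/null 2>&1; then
    while :; do
        total=0
        for spec in convert=raid5,stripes=1..3,devid=3 \
                    convert=raid5,stripes=1..2,devid=2 \
                    convert=raid5,stripes=1..1,devid=1 \
                    convert=raid5,stripes=1..1,devid=4; do
            out=$(btrfs balance start -dlimit=100,"$spec" "$FS" 2>&1)
            n=$(count_relocated "$out")
            total=$((total + ${n:-0}))
        done
        [ "$total" -eq 0 ] && break   # every filter is now a no-op
    done
fi
```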
If you want to avoid converting to single profile then you might be able
to use only the balance commands in the second section; however, if you
run out of space on one or more drives then the balances will push data
around but be unable to make any progress on changing the chunk sizes.
In that case you will need to convert some raid5 back to single chunks
to continue.
> btrfs fi usage gives:
> Overall:
> Device size: 60.03TiB
> Device allocated: 45.64TiB
> Device unallocated: 14.39TiB
> Device missing: 0.00B
> Used: 45.59TiB
> Free (estimated): 8.08TiB (min: 4.81TiB)
> Free (statfs, df): 410.67GiB
> Data ratio: 1.78
> Metadata ratio: 3.00
> Global reserve: 512.00MiB (used: 80.00KiB)
> Multiple profiles: no
>
> Data,RAID5: Size:25.51TiB, Used:25.50TiB (99.93%)
> /dev/sdd 2.33TiB
> /dev/sdc 14.23TiB
> /dev/sdf 12.66TiB
> /dev/sde 16.31TiB
>
> Metadata,RAID1C3: Size:35.00GiB, Used:28.54GiB (81.55%)
> /dev/sdd 34.00GiB
> /dev/sdc 35.00GiB
> /dev/sdf 30.00GiB
> /dev/sde 6.00GiB
>
> System,RAID1C3: Size:32.00MiB, Used:3.06MiB (9.57%)
> /dev/sdd 32.00MiB
> /dev/sdc 32.00MiB
> /dev/sde 32.00MiB
>
> Unallocated:
> /dev/sdd 14.01TiB
> /dev/sdc 292.99GiB
> /dev/sdf 47.00GiB
> /dev/sde 53.00GiB
>
> After doing some balance I currently have:
> btrfs fi usage
> Overall:
> Device size: 60.03TiB
> Device allocated: 45.52TiB
> Device unallocated: 14.51TiB
> Device missing: 0.00B
> Used: 45.50TiB
> Free (estimated): 8.16TiB (min: 4.85TiB)
> Free (statfs, df): 414.97GiB
> Data ratio: 1.78
> Metadata ratio: 3.00
> Global reserve: 512.00MiB (used: 80.00KiB)
> Multiple profiles: no
>
> Data,RAID5: Size:25.52TiB, Used:25.51TiB (99.96%)
> /dev/sdd 2.23TiB
> /dev/sdc 14.13TiB
> /dev/sdf 12.71TiB
> /dev/sde 16.37TiB
>
> Metadata,RAID1C3: Size:29.00GiB, Used:28.51GiB (98.31%)
> /dev/sdd 29.00GiB
> /dev/sdc 29.00GiB
> /dev/sdf 27.00GiB
> /dev/sde 2.00GiB
>
> System,RAID1C3: Size:32.00MiB, Used:3.03MiB (9.47%)
> /dev/sdd 32.00MiB
> /dev/sdc 32.00MiB
> /dev/sde 32.00MiB
>
> Unallocated:
> /dev/sdd 14.12TiB
> /dev/sdc 404.99GiB
> /dev/sdf 1.00MiB
> /dev/sde 1.00MiB
>
>
> So the estimated free space is 8 TB, when it should be closer to 15. I
> am guessing the free space would be better if the array were properly
> balanced. I am unsure how to properly balance this array, though.
* Re: Array extremely unbalanced after convert to Raid5
2021-05-05 13:58 ` remi
2021-05-05 14:23 ` Abdulla Bubshait
@ 2021-05-05 14:51 ` Zygo Blaxell
1 sibling, 0 replies; 7+ messages in thread
From: Zygo Blaxell @ 2021-05-05 14:51 UTC (permalink / raw)
To: remi; +Cc: Abdulla Bubshait, linux-btrfs
On Wed, May 05, 2021 at 09:58:03AM -0400, remi@georgianit.com wrote:
>
>
> On Wed, May 5, 2021, at 9:41 AM, Abdulla Bubshait wrote:
> > I ran a balance convert of my single-profile data setup to raid5. Once
> > complete, the setup is extremely unbalanced and doesn't even make sense
> > as raid5. I tried to run a balance with a dlimit of 1000, but it just
> > seems to make things worse.
>
> >
> > Unallocated:
> > /dev/sdd 14.12TiB
> > /dev/sdc 404.99GiB
> > /dev/sdf 1.00MiB
> > /dev/sde 1.00MiB
> >
> >
>
>
> Sorry, I don't have a solution for you, but I want to point out that
> the situation is far more critical than you seem to have realized:
> this filesystem is now completely wedged. I would suggest adding
> another device, or replacing either /dev/sdf or /dev/sde with
> something larger (though, if those are real disks, I see that might
> be a challenge).
>
> Your metadata is RAID1C3 (meaning 3 copies), but you only have 2 disks
> with free space. And thanks to the recent balancing, there is very
> little free space in the already allocated metadata, so effectively
> the filesystem can no longer write any new metadata and will very
> quickly hit out-of-space errors.
The situation isn't that dire. Balancing one data chunk off of either
/dev/sdf or /dev/sde will resolve the issue.
* Re: Array extremely unbalanced after convert to Raid5
2021-05-05 14:49 ` Zygo Blaxell
@ 2021-05-05 15:35 ` Abdulla Bubshait
2021-05-06 4:32 ` Zygo Blaxell
0 siblings, 1 reply; 7+ messages in thread
From: Abdulla Bubshait @ 2021-05-05 15:35 UTC (permalink / raw)
To: Zygo Blaxell; +Cc: linux-btrfs
On Wed, May 5, 2021 at 10:49 AM Zygo Blaxell
<ce3g8jdj@umail.furryterror.org> wrote:
>
> Balancing a full single array to raid5 requires a better device selection
> algorithm than the kernel provides, especially if the disks are of
> different sizes and were added to the array over a long period of time.
> The kernel will strictly relocate the newest block groups first, which
> may leave space occupied on some disks for most of the balance, and
> cause many chunks to be created with suboptimal stripe width.
>
Is this also true of running a full balance after conversion to raid5?
Would it be able to optimize the stripe width, or would a balance run
into the same issue due to the disks being full?
> Now use the stripes filter to get rid of all chunks that have fewer
> than the optimum number of stripes on each disk. Cycle through these
> commands until they report 0 chunks relocated (you can just leave these
> running in a shell loop and check on it every few hours, when they get
> to 0 they will just become no-ops):
>
> btrfs balance start -dlimit=100,convert=raid5,stripes=1..3,devid=3 /fs
>
> btrfs balance start -dlimit=100,convert=raid5,stripes=1..2,devid=2 /fs
>
> btrfs balance start -dlimit=100,convert=raid5,stripes=1..1,devid=1 /fs
>
> btrfs balance start -dlimit=100,convert=raid5,stripes=1..1,devid=4 /fs
>
> The filters select chunks that have undesirable stripe counts and force
> them into raid5 profile. Single chunks have stripe count 1 and will
> be converted to raid5. RAID5 chunks have stripe count >1 and will be
> relocated (converted from raid5 to raid5, but in a different location
> with more disks in the chunk). RAID5 chunks that already occupy the
> correct number of drives will not be touched.
That is what I was looking to do; I must have missed the stripes
filter. I think I can figure out a script that is able to spread the
data enough.
But here is a question: at what point does the fs stop striping onto a
disk? Does it stop at 1 MB unallocated, and if so, does that cause
issues in practice if the need arises to allocate metadata chunks due
to raid1c3?
* Re: Array extremely unbalanced after convert to Raid5
2021-05-05 15:35 ` Abdulla Bubshait
@ 2021-05-06 4:32 ` Zygo Blaxell
0 siblings, 0 replies; 7+ messages in thread
From: Zygo Blaxell @ 2021-05-06 4:32 UTC (permalink / raw)
To: Abdulla Bubshait; +Cc: linux-btrfs
On Wed, May 05, 2021 at 11:35:23AM -0400, Abdulla Bubshait wrote:
> On Wed, May 5, 2021 at 10:49 AM Zygo Blaxell
> <ce3g8jdj@umail.furryterror.org> wrote:
> >
> > Balancing a full single array to raid5 requires a better device selection
> > algorithm than the kernel provides, especially if the disks are of
> > different sizes and were added to the array over a long period of time.
> > The kernel will strictly relocate the newest block groups first, which
> > may leave space occupied on some disks for most of the balance, and
> > cause many chunks to be created with suboptimal stripe width.
> >
> Is this also true of running a full balance after conversion to raid5?
> Would it be able to optimize the stripe width, or would a balance run
> into the same issue due to the disks being full?
Balancing all the data block groups in a single command will simply
restripe every chunk in reverse creation order, whether needed or not,
and get whatever space is available at the time for each chunk--and
possibly run out of space on filesystems with non-equal disk sizes.
If you run it enough times, it might eventually settle into a good state,
but it is not guaranteed.
Generally you should never do a full balance because a full balance
balances metadata, and you should never balance metadata because it
will lead to low-metadata-space conditions like the one you are in now.
The exceptions to the "never balance metadata" rule are when converting
from one raid profile to a different profile, or when permanently removing
a disk from the filesystem, and even then you should ensure there is a
lot of unallocated space available before starting a metadata balance.
> > Now use the stripes filter to get rid of all chunks that have fewer
> > than the optimum number of stripes on each disk. Cycle through these
> > commands until they report 0 chunks relocated (you can just leave these
> > running in a shell loop and check on it every few hours, when they get
> > to 0 they will just become no-ops):
> >
> > btrfs balance start -dlimit=100,convert=raid5,stripes=1..3,devid=3 /fs
> >
> > btrfs balance start -dlimit=100,convert=raid5,stripes=1..2,devid=2 /fs
> >
> > btrfs balance start -dlimit=100,convert=raid5,stripes=1..1,devid=1 /fs
> >
> > btrfs balance start -dlimit=100,convert=raid5,stripes=1..1,devid=4 /fs
> >
> > The filters select chunks that have undesirable stripe counts and force
> > them into raid5 profile. Single chunks have stripe count 1 and will
> > be converted to raid5. RAID5 chunks have stripe count >1 and will be
> > relocated (converted from raid5 to raid5, but in a different location
> > with more disks in the chunk). RAID5 chunks that already occupy the
> > correct number of drives will not be touched.
>
> That is what I was looking to do; I must have missed the stripes
> filter. I think I can figure out a script that is able to spread the
> data enough.
>
> But here is a question: at what point does the fs stop striping onto a
> disk? Does it stop at 1 MB unallocated, and if so, does that cause
> issues in practice if the need arises to allocate metadata chunks due
> to raid1c3?
Allocators come in two groups: those that fill the emptiest disks first
(raid1*, single, dup) and those that fill all disks equally (raid0,
raid5, raid6, raid10). raid1c3 metadata has a 3-disk minimum, so it
will allocate all its space on the 3 largest disks (or most free space
if the array is unbalanced) and normally runs out of space when the 3rd
largest disk in the array fills up. raid5 data has a 2-disk minimum,
but will try to fill all drives equally, and normally runs out of space
when the 2nd largest disk in the array fills up.
raid5 fills up devid 3 first, then 2, then 1 and 4. raid1c3 fills up
devid 1, 4, and 2 first, then 3. When raid5 fills up devid 2, raid1c3 is
out of space, so you will have effectively 2TB unusable--there will not be
enough metadata space to fill the last 2 TB on devid 1 and 4. You will
also have additional complications due to being out of metadata space
without also being out of data space that can be hard to recover from.
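[Editorial note: the 2 TB stranding can be checked with a toy allocator that implements the two allocation rules just described. This is an editorial sketch, not btrfs's actual chunk allocator; disk sizes are rounded to whole TiB and chunks to 1 TiB units.]

```python
# Two allocator families: raid1c3 puts 3 copies on the 3 emptiest disks;
# raid5 stripes one member on every disk that still has room (>=2 disks).

def alloc_raid1c3(free, chunk=1):
    """One metadata chunk: 1 member on each of the 3 emptiest disks."""
    picks = sorted(free, key=free.get, reverse=True)[:3]
    if len(picks) < 3 or any(free[d] < chunk for d in picks):
        return False
    for d in picks:
        free[d] -= chunk
    return True

def alloc_raid5(free, chunk=1):
    """One data chunk: 1 member on every disk with room (2-disk minimum)."""
    disks = [d for d in free if free[d] >= chunk]
    if len(disks) < 2:
        return False
    for d in disks:
        free[d] -= chunk
    return True

# Rounded sizes from this thread: devid1=16, devid2=14, devid3=12, devid4=16.
free = {1: 16, 2: 14, 3: 12, 4: 16}
while alloc_raid5(free):
    if sum(v > 0 for v in free.values()) < 3:
        break   # raid1c3 can no longer allocate anywhere
print(free)                 # {1: 2, 2: 0, 3: 0, 4: 2}: 2 TiB stranded on 1 and 4
print(alloc_raid1c3(free))  # False: only 2 disks have space for 3 copies
print(alloc_raid5(free))    # True: raid5 can still stripe the last 2 disks
```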
You can fix that in a few different ways:
- convert metadata to raid1 (2 disk minimum, 1 failure tolerated,
same as raid5, works with the 2 largest disks you have).
- resize the 2 largest disks to be equal in size to the 3rd
largest (will reduce filesystem capacity by 2 TB). This ensures
the 3rd largest disk will fill up at the same time as the two
larger ones, so raid1c3 metadata can always be allocated until
the filesystem is completely full.
- replace a smaller disk with one matching the largest 2 disk
sizes. This is another way to make the top 3 disks the same
size to satisfy the raid1c3 requirement.
- mount -o metadata_ratio=20 (preallocates a lot of metadata
space, equal in size to 5% of the data when normally <3% are
needed). Remember to never balance metadata or you'll lose
this preallocation. This enables maximum space usage and
3-disk metadata redundancy, but it has a risk of failure
if the metadata ratio turns out to be too low.
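[Editorial note: a rough check of that last option against the numbers in this thread, assuming metadata_ratio=R makes the allocator keep metadata chunk space at or above 1/R of the data size.]

```python
# Numbers from this filesystem's "fi usage" output.
data_gib = 25.51 * 1024        # raid5 data, ~26122 GiB
meta_used_gib = 28.51          # metadata actually used today

# metadata_ratio=20 would keep ~1/20 of the data size allocated as
# metadata chunks -- far more than current usage requires.
reserved_gib = data_gib / 20
print(round(reserved_gib))                        # ~1306 GiB preallocated
print(round(reserved_gib / meta_used_gib))        # ~46x current usage
print(round(100 * meta_used_gib / data_gib, 2))   # metadata is ~0.11% of data now
```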