* Replacing disk strange (buggy?) behaviour - RAID1
@ 2021-04-19 15:22 Jonah Sabean
  2021-04-20 18:19 ` Andrei Borzenkov
  0 siblings, 1 reply; 4+ messages in thread
From: Jonah Sabean @ 2021-04-19 15:22 UTC (permalink / raw)
  To: linux-btrfs

I'm running Ubuntu 21.04 (technically not a stable "release" yet, but
it will be in a few days, so if this is an Ubuntu-specific issue I'd
like to report it before it is!).

The btrfs volume in question is two 8TB hard disks that were in RAID1
at the time the filesystem was created. Kernel version is Ubuntu's
5.11.0-14-generic with btrfs-progs version 5.10.1-1build1 in the
hirsute repos currently. This array is mostly non-changing archived
data, if that even matters.

I replaced a missing disk (sda is the replacement disk) last night
while in a degraded mount (I left it running all night to complete) with
`btrfs replace start 1 /dev/sda1 /mnt/btrfs` (1 was the missing disk in
`btrfs fi show`), and it appears to have worked fine. However, when I ran
`btrfs fi usage` it returned:

Overall:
    Device size:                  14.55TiB
    Device allocated:              2.41TiB
    Device unallocated:           12.14TiB
    Device missing:                  0.00B
    Used:                          1.60TiB
    Free (estimated):              8.63TiB      (min: 6.61TiB)
    Free (statfs, df):            12.14TiB
    Data ratio:                       1.50
    Metadata ratio:                   1.43
    Global reserve:              512.00MiB      (used: 0.00B)
    Multiple profiles:                 yes      (data, metadata, system)

Data,single: Size:820.00GiB, Used:3.25MiB (0.00%)
   /dev/sdb1     820.00GiB

Data,RAID1: Size:819.00GiB, Used:818.64GiB (99.96%)
   /dev/sda1     819.00GiB
   /dev/sdb1     819.00GiB

Metadata,single: Size:4.00GiB, Used:864.00KiB (0.02%)
   /dev/sdb1       4.00GiB

Metadata,RAID1: Size:3.00GiB, Used:1.69GiB (56.23%)
   /dev/sda1       3.00GiB
   /dev/sdb1       3.00GiB

System,single: Size:32.00MiB, Used:144.00KiB (0.44%)
   /dev/sdb1      32.00MiB

System,RAID1: Size:8.00MiB, Used:80.00KiB (0.98%)
   /dev/sda1       8.00MiB
   /dev/sdb1       8.00MiB

Unallocated:
   /dev/sda1       6.47TiB
   /dev/sdb1       5.67TiB

So a small amount of actual data and metadata was still single on the
disk I was rebuilding from (sdb), but it had allocated a massive amount
of "single" chunks in the process (roughly equal to what I have in
actual data), and to a lesser extent metadata too. Why didn't it free
those up as it replaced the missing disk and duplicated the data in
RAID1? Shouldn't it all be RAID1 once the replace is complete? And why do
such small amounts remain single? An easy fix, I thought: at first
glance I didn't realize 800GiB had been allocated as single; I was only
paying attention to the small amounts used, so I did a soft convert to
fix this.
sudo btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/btrfs
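
As an aside, and assuming I have the tooling right, a conversion in
progress can be watched from another shell with:

sudo btrfs balance status /mnt/btrfs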

The convert was pretty quick... it took just a few minutes, but of course
it's all allocated as raid1 now (with presumably zero actual data in
most of the new chunks):
sudo btrfs fi usage /mnt/btrfs/
Overall:
    Device size:                  14.55TiB
    Device allocated:              3.21TiB
    Device unallocated:           11.34TiB
    Device missing:                  0.00B
    Used:                          1.60TiB
    Free (estimated):              6.47TiB      (min: 6.47TiB)
    Free (statfs, df):             6.47TiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,RAID1: Size:1.60TiB, Used:818.64GiB (49.95%)
   /dev/sda1       1.60TiB
   /dev/sdb1       1.60TiB

Metadata,RAID1: Size:7.00GiB, Used:1.69GiB (24.11%)
   /dev/sda1       7.00GiB
   /dev/sdb1       7.00GiB

System,RAID1: Size:40.00MiB, Used:240.00KiB (0.59%)
   /dev/sda1      40.00MiB
   /dev/sdb1      40.00MiB

Unallocated:
   /dev/sda1       5.67TiB
   /dev/sdb1       5.67TiB

The ratios are 2 now, which is exactly what I wanted and how it was
beforehand. However, I obviously didn't want all those chunks allocated
with nothing in them, even if they are relatively harmless.

My questions are:
1. Why did it have so many 'single' chunks allocated to begin with?
Everything was RAID1 all up until the disk replacement, so it clearly
did this during the `btrfs replace` process.  Did I do this wrong, or
is there a bug?
2. Would the btrfs replace have failed if the filesystem was more full
and those chunks were not possible to allocate (it basically allocated
double the amount of data I have after all, so if the fs was 50%+
full...)?
3. How do I prevent this from happening in the future, should I need
to replace a disk? Is this possibly an Ubuntu-related issue (perhaps
because btrfs-progs is older relative to the kernel)?

The 7GiB of metadata isn't so bad; however, I did proceed to run
btrfs balance start -dusage=0 /mnt/btrfs

Is it possible to run balance with `-dusage=0` along with the convert
to do that all in one balance? Obviously, that doesn't solve the
actual issue to begin with; I'm just curious, as I did it in two steps.

FWIW: the `-dusage=0` filter freed up pretty much everything, as I
expected it to, and it now looks essentially identical to how it did
before the disk replacement:
Data,RAID1: Size:819.00GiB, Used:818.64GiB (99.96%)
  /dev/sda1     819.00GiB
  /dev/sdb1     819.00GiB

I'm willing to do the process all over again as all this data is on
another system; I just would like assurance I don't run into this same
issue twice.

Thanks,
-Jonah


* Re: Replacing disk strange (buggy?) behaviour - RAID1
  2021-04-19 15:22 Replacing disk strange (buggy?) behaviour - RAID1 Jonah Sabean
@ 2021-04-20 18:19 ` Andrei Borzenkov
  2021-04-21  0:23   ` Jonah Sabean
  0 siblings, 1 reply; 4+ messages in thread
From: Andrei Borzenkov @ 2021-04-20 18:19 UTC (permalink / raw)
  To: me, linux-btrfs

On 19.04.2021 18:22, Jonah Sabean wrote:
> I'm running Ubuntu 21.04 (technically not a stable "release" yet, but
> it will be in a few days, so if this is an Ubuntu-specific issue I'd
> like to report it before it is!).
> 
> The btrfs volume in question is two 8TB hard disks that were in RAID1
> at the time the filesystem was created. Kernel version is Ubuntu's
> 5.11.0-14-generic with btrfs-progs version 5.10.1-1build1 in the
> hirsute repos currently. This array is mostly non-changing archived
> data, if that even matters.
> 
> I replaced a missing disk (sda is the replacement disk) last night
> while in a degraded mount (I left it running all night to complete) with
> `btrfs replace start 1 /dev/sda1 /mnt/btrfs` (1 was the missing disk in
> `btrfs fi show`), and it appears to have worked fine. However, when I ran
> `btrfs fi usage` it returned:
> 
> Overall:
>     Device size:                  14.55TiB
>     Device allocated:              2.41TiB
>     Device unallocated:           12.14TiB
>     Device missing:                  0.00B
>     Used:                          1.60TiB
>     Free (estimated):              8.63TiB      (min: 6.61TiB)
>     Free (statfs, df):            12.14TiB
>     Data ratio:                       1.50
>     Metadata ratio:                   1.43
>     Global reserve:              512.00MiB      (used: 0.00B)
>     Multiple profiles:                 yes      (data, metadata, system)
> 
> Data,single: Size:820.00GiB, Used:3.25MiB (0.00%)
>    /dev/sdb1     820.00GiB
> 
> Data,RAID1: Size:819.00GiB, Used:818.64GiB (99.96%)
>    /dev/sda1     819.00GiB
>    /dev/sdb1     819.00GiB
> 
> Metadata,single: Size:4.00GiB, Used:864.00KiB (0.02%)
>    /dev/sdb1       4.00GiB
> 
> Metadata,RAID1: Size:3.00GiB, Used:1.69GiB (56.23%)
>    /dev/sda1       3.00GiB
>    /dev/sdb1       3.00GiB
> 
> System,single: Size:32.00MiB, Used:144.00KiB (0.44%)
>    /dev/sdb1      32.00MiB
> 
> System,RAID1: Size:8.00MiB, Used:80.00KiB (0.98%)
>    /dev/sda1       8.00MiB
>    /dev/sdb1       8.00MiB
> 
> Unallocated:
>    /dev/sda1       6.47TiB
>    /dev/sdb1       5.67TiB
> 
> So a small amount of actual data and metadata was still single on the
> disk I was rebuilding from (sdb), but it had allocated a massive amount
> of "single" chunks in the process (roughly equal to what I have in
> actual data), and to a lesser extent metadata too.

Mounting raid1 btrfs writable in degraded mode creates chunks with the
single profile. This is a long-standing issue. What is rather surprising
is that you apparently have a chunk size of 819GiB, which is suspiciously
close to 10% of 8TiB. btrfs does indeed limit chunk size to 10% of total
space, but it should not exceed 10GiB. Could it be an Ubuntu-specific issue?

So when you wrote data in degraded mode it had to allocate new chunks
with the "single" profile.

> Why didn't it free
> those up as it replaced the missing disk and duplicated the data in
> RAID1? 

Device replacement restored the mirrored data (chunks with the "raid1"
profile) on the new device. It had no reason to touch chunks with the
"single" profile because, from btrfs's point of view, those chunks never
had any data on the replaced device, so there is nothing to write there.

> Shouldn't it all be RAID1 once the replace is complete?

No. btrfs replace restores the content of the missing device. It is not
a replacement for profile conversion.

> And why do
> such small amounts remain single? An easy fix, I thought: at first
> glance I didn't realize 800GiB had been allocated as single; I was only
> paying attention to the small amounts used, so I did a soft convert to
> fix this.
> sudo btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/btrfs
> 
> The convert was pretty quick... it took just a few minutes, but of course
> it's all allocated as raid1 now (with presumably zero actual data in
> most of the new chunks):

Correct. To convert a profile, btrfs must allocate new chunks in the new
profile and copy the data over.
...
> 
> My questions are:
> 1. Why did it have so many 'single' chunks allocated to begin with?

It does not look like many "chunks"; rather, it looks like one "chunk".
The output of

btrfs inspect-internal dump-tree -d /dev/xxx

may be interesting.
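
The full dump is large; if I am not mistaken the chunk items can be
narrowed down with something like (exact formatting depends on the
progs version):

btrfs inspect-internal dump-tree -d /dev/sdb1 | grep -A1 CHUNK_ITEM

which should print one "length ... type ..." line per allocated chunk,
so it is easy to see how many single chunks exist and how large each
one is.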

> Everything was RAID1 all up until the disk replacement, so it clearly
> did this during the `btrfs replace` process. 

No, it did it during degraded writable mount.

> Did I do this wrong, or
> is there a bug?

There is a misfeature where btrfs creates "single" chunks during a
degraded mount. Ideally it should create degraded raid1 chunks.

> 2. Would the btrfs replace have failed if the filesystem was more full
> and those chunks were not possible to allocate (it basically allocated
> double the amount of data I have after all, so if the fs was 50%+
> full...)?

btrfs replace duplicates the data that was on the missing device. If you
were able to write this data while the device was present, btrfs replace
cannot fail due to missing space (provided, of course, the new device is
at least as large).

> 3. How do I prevent this from happening in the future, should I need
> to replace a disk?

Do not write anything in degraded mode.
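
In other words, keep the filesystem quiet between the degraded mount and
the end of the replace. Roughly (device names are taken from your mail,
the rest is only a sketch):

mount -o degraded /dev/sdb1 /mnt/btrfs
btrfs replace start 1 /dev/sda1 /mnt/btrfs
btrfs replace status /mnt/btrfs     # watch until it reports finished
btrfs filesystem usage /mnt/btrfs   # then check for leftover "single" chunks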

> Is this possibly an Ubuntu-related issue (perhaps
> because btrfs-progs is older relative to the kernel)?
> 
> The 7GiB of metadata isn't so bad; however, I did proceed to run
> btrfs balance start -dusage=0 /mnt/btrfs
> 
> Is it possible to run balance with `-dusage=0` along with the convert
> to do that all in one balance? Obviously, that doesn't solve the
> actual issue to begin with; I'm just curious, as I did it in two steps.
> 
> FWIW: the `-dusage=0` filter freed up pretty much everything, as I
> expected it to, and it now looks essentially identical to how it did
> before the disk replacement:
> Data,RAID1: Size:819.00GiB, Used:818.64GiB (99.96%)
>   /dev/sda1     819.00GiB
>   /dev/sdb1     819.00GiB
> 
> I'm willing to do the process all over again as all this data is on
> another system; I just would like assurance I don't run into this same
> issue twice.
> 
> Thanks,
> -Jonah
> 



* Re: Replacing disk strange (buggy?) behaviour - RAID1
  2021-04-20 18:19 ` Andrei Borzenkov
@ 2021-04-21  0:23   ` Jonah Sabean
  2021-04-21  8:39     ` Forza
  0 siblings, 1 reply; 4+ messages in thread
From: Jonah Sabean @ 2021-04-21  0:23 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: linux-btrfs

Thanks for the reply, it's appreciated!

> Mounting raid1 btrfs writable in degraded mode creates chunks with the
> single profile. This is a long-standing issue. What is rather surprising
> is that you apparently have a chunk size of 819GiB, which is suspiciously
> close to 10% of 8TiB. btrfs does indeed limit chunk size to 10% of total
> space, but it should not exceed 10GiB. Could it be an Ubuntu-specific issue?
>
> So when you wrote data in degraded mode it had to allocate new chunks
> with the "single" profile.

That's strange as I didn't write actual data to the disk during that
time. Perhaps Ubuntu wrote some hidden file or something to it, I have
no idea, but I didn't interact with the filesystem beyond doing the
replace after mounting it. Still... 10% of 8TiB may be what's
happening, and if so that's really strange... and massive. I don't
think it was a single chunk equal to 10% though, as when I converted
it to raid1, I specified the soft filter with `-dconvert=raid1,soft`,
and the resulting output once it was complete was:

Done, had to relocate 825 out of 1648 chunks

So the hypothesis that it was one large chunk doesn't hold up to
me, knowing that output. I thus assume the chunks were all 1GiB given
that many. Unfortunately I didn't save the output of the balance with
`-dusage=0`, but I do recall it being in the 800s as well.
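
In fact, assuming 1GiB data and metadata chunks plus the 32MiB system
chunk from the usage output above, the numbers line up almost exactly:

  820 single data + 4 single metadata + 1 single system = 825 relocated
  819 raid1 data  + 3 raid1 metadata  + 1 raid1 system  = 823 untouched
  825 + 823 = 1648 chunks total

which matches the "825 out of 1648" that the balance reported.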

> > Why didn't it free
> > those up as it replaced the missing disk and duplicated the data in
> > RAID1?
>
> Device replacement restored the mirrored data (chunks with the "raid1"
> profile) on the new device. It had no reason to touch chunks with the
> "single" profile because, from btrfs's point of view, those chunks never
> had any data on the replaced device, so there is nothing to write there.
>
> > Shouldn't it all be RAID1 once the replace is complete?
> No. btrfs replace restores the content of the missing device. It is not
> a replacement for profile conversion.

Makes sense; knowing that, I would expect that. I just have no idea why
it allocated 800GiB, especially since I didn't write anything to the
single disk during the convert process, much less ~800GiB.

> > Everything was RAID1 all up until the disk replacement, so it clearly
> > did this during the `btrfs replace` process.
>
> No, it did it during degraded writable mount.
>
> > Did I do this wrong, or
> > is there a bug?
>
> There is a misfeature where btrfs creates "single" chunks during a
> degraded mount. Ideally it should create degraded raid1 chunks.

Hmm... it would be nice to see this fixed, then. Is there a patch for it,
assuming one is planned?

Thanks,
-Jonah


* Re: Replacing disk strange (buggy?) behaviour - RAID1
  2021-04-21  0:23   ` Jonah Sabean
@ 2021-04-21  8:39     ` Forza
  0 siblings, 0 replies; 4+ messages in thread
From: Forza @ 2021-04-21  8:39 UTC (permalink / raw)
  To: me, Andrei Borzenkov; +Cc: linux-btrfs



---- From: Jonah Sabean <me@jse.io> -- Sent: 2021-04-21 - 02:23 ----

> Thanks for the reply, it's appreciated!
> 
>> Mounting raid1 btrfs writable in degraded mode creates chunks with the
>> single profile. This is a long-standing issue. What is rather surprising
>> is that you apparently have a chunk size of 819GiB, which is suspiciously
>> close to 10% of 8TiB. btrfs does indeed limit chunk size to 10% of total
>> space, but it should not exceed 10GiB. Could it be an Ubuntu-specific issue?
>>
>> So when you wrote data in degraded mode it had to allocate new chunks
>> with the "single" profile.
> 
> That's strange as I didn't write actual data to the disk during that
> time. Perhaps Ubuntu wrote some hidden file or something to it, I have
> no idea, but I didn't interact with the filesystem beyond doing the
> replace after mounting it. Still... 10% of 8TiB may be what's
> happening, and if so that's really strange... and massive. I don't
> think it was a single chunk equal to 10% though, as when I converted
> it to raid1, I specified the soft filter with `-dconvert=raid1,soft`,
> and the resulting output once it was complete was:
> 
> Done, had to relocate 825 out of 1648 chunks
> 
> So the hypothesis that it was one large chunk doesn't hold up to
> me, knowing that output. I thus assume the chunks were all 1GiB given
> that many. Unfortunately I didn't save the output of the balance with
> `-dusage=0`, but I do recall it being in the 800s as well.
> 
>> > Why didn't it free
>> > those up as it replaced the missing disk and duplicated the data in
>> > RAID1?
>>
>> Device replacement restored the mirrored data (chunks with the "raid1"
>> profile) on the new device. It had no reason to touch chunks with the
>> "single" profile because, from btrfs's point of view, those chunks never
>> had any data on the replaced device, so there is nothing to write there.
>>
>> > Shouldn't it all be RAID1 once the replace is complete?
>> No. btrfs replace restores the content of the missing device. It is not
>> a replacement for profile conversion.
> 
> Makes sense; knowing that, I would expect that. I just have no idea why
> it allocated 800GiB, especially since I didn't write anything to the
> single disk during the convert process, much less ~800GiB.
> 
>> > Everything was RAID1 all up until the disk replacement, so it clearly
>> > did this during the `btrfs replace` process.
>>
>> No, it did it during degraded writable mount.
>>
>> > Did I do this wrong, or
>> > is there a bug?
>>
>> There is a misfeature where btrfs creates "single" chunks during a
>> degraded mount. Ideally it should create degraded raid1 chunks.
> 
> Hmm... it would be nice to see this fixed, then. Is there a patch for it,
> assuming one is planned?
> 
> Thanks,
> -Jonah

I've been testing the replacement feature, and IMHO there is often (usually?) a single block group created when mounting a RAID1 or RAID10 filesystem degraded with a missing disk.

I think this is because there will always be some metadata updates if you mount a filesystem rw,degraded with missing disks. 

And in principle I think it is correct. It clearly shows the user that there is data not protected by redundancy.

So I think we should simply amend the official docs to say: after a replace, check for multiple profiles and issue a balance with the soft filter.
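
Something along these lines is probably all the docs would need to add
(the mount point here is just an example):

btrfs filesystem usage /mnt/data | grep 'Multiple profiles'
# if it says "yes", convert the stray chunks back:
btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/data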

It is what I suggest on my personal wiki space https://wiki.tnonline.net/w/Btrfs/Replacing_a_disk

Take care, stay safe. 

/Forza



