* Replacing drives with larger ones in a 4 drive raid1
@ 2016-06-08 18:55 boli
  2016-06-09 15:20 ` Duncan
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: boli @ 2016-06-08 18:55 UTC (permalink / raw)
  To: linux-btrfs

Dear list

I've had a 4 drive btrfs raid1 setup in my backup NAS for a few months now. It's running Fedora 23 Server with kernel 4.5.5 and btrfs-progs v4.4.1.

Recently I had the idea to replace the 6 TB HDDs with 8 TB ones ("WD Red"), because their price is now acceptable.
(More back story: that particular machine has only 4 HDD bays, which is why I originally dared run it as raid5, but I later converted to raid1 after experiencing very slow monthly btrfs scrubs and figuring that 12 TB total capacity would be enough for a while. My main NAS, on the other hand, has always had 6 x 6 TB raid1, which is how I knew that scrubs can be much faster.)

Anyway, so I physically replaced one of the 6 TB drives with an 8 TB one. Fedora didn't boot properly, but went into emergency mode, apparently because it couldn't mount the filesystem.

Because I have to use a finicky Java console when it's booted in emergency mode, I figured I should probably get it to boot normally again as quickly as possible, so I can connect properly with SSH instead.

I guessed the way to do that would be to remove the missing drive from /etc/crypttab (all drives use encryption) and from the btrfs raid1, then reboot and add the new drive to the btrfs volume (also I'd like to completely zero the new drive first, to weed out bad sectors).

In the wiki I read about replace as well as delete/add and figured since I will eventually have to replace all 4 drives one-by-one, I might as well try out different methods and gain insight while doing it. :)

So for this first replacement I mounted the volume degraded and ran "btrfs device delete missing /mnt", and that's where it's been stuck for the past ~23 hours. Only later did I figure out that this command will trigger a rebalance, and of course that will take a long time.
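
(For reference, what I ran boils down to something like the following — the mapper name is just a placeholder for one of the remaining LUKS-mapped members:)

	mount -o degraded /dev/mapper/XXXXXXXX_enc /mnt   # mount the raid1 with one member absent
	btrfs device delete missing /mnt                  # drop the absent member; its chunks get rebuilt onto the remaining drives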

I'm not entirely sure that this rebalance has a chance to work, as a 3x6 TB raid1 would only have 9 TB of space, which may just barely be enough. I can't currently check how much space is actually used, but it must be at least 8.1 TB (that's how much data is on my main NAS), though probably not much more than that (my main NAS may still have most if not all of the snapshots synced to the backup NAS too, for now).

Regarding a few gotchas: I use btrbk to copy and thin snapshots, so there are < 100 snapshots. I might still have quotas active though, because that allows determining the diff size between 2 snapshots. In practice I don't use this often, so I will turn it off once things are stable, because I read in other list mails that it makes things slow.
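
(Turning that off later should be a one-liner along these lines, assuming the volume stays mounted at /mnt:)

	btrfs quota disable /mnt   # stop qgroup accounting on this filesystem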

I assume I could probably just Ctrl+C that "btrfs device delete missing /mnt", and the balance would continue as usual in the background, but I have not done that yet, as I'd rather consult you guys first (a bit late, I know).

Anyway, if you have any tips, I'm glad to read them.

For now my plan is to keep waiting and see what happens. Since it's just my personal backup NAS, the downtime is not that bad, other than it won't get the usual nightly backups from my main NAS for some time.

Losing data and having to start from scratch would just be an inconvenience, but not a disaster, particularly because the backup NAS is at a friend's house and my upstream is only 50 Mbit/s.

Also thanks to Hugo and Duncan for their awesome/insightful replies to my first question a few months ago (didn't want to spam the list just to say thanks).

Best regards, boli


* Re: Replacing drives with larger ones in a 4 drive raid1
  2016-06-08 18:55 Replacing drives with larger ones in a 4 drive raid1 boli
@ 2016-06-09 15:20 ` Duncan
  2016-06-09 17:30   ` bOli
  2016-06-10 18:56   ` Jukka Larja
  2016-06-11 13:13 ` boli
  2016-06-19 17:38 ` boli
  2 siblings, 2 replies; 17+ messages in thread
From: Duncan @ 2016-06-09 15:20 UTC (permalink / raw)
  To: linux-btrfs

boli posted on Wed, 08 Jun 2016 20:55:13 +0200 as excerpted:

> Recently I had the idea to replace the 6 TB HDDs with 8 TB ones ("WD
> Red"), because their price is now acceptable.

Are those the 8 TB SMR "archive" drives?

I haven't been following the issue very closely, but be aware that there 
were serious issues with those drives a few kernels back, and that while 
those issues are now fixed, the drives themselves operate rather 
differently than normal drives, and simply don't work well in normal 
usage.

The short version is that they really are designed for archiving and work 
well when used for that purpose -- a mostly write once and leave it there 
for archiving and retrieval but rarely if ever rewrite it, type usage.  
However, they work rather poorly in normal usage where data is rewritten, 
because they have to rewrite entire zones of data, and that takes much 
longer than simply rewriting individual sectors on normal drives does.

With the kernel patches to fix the initial problems they do work well 
enough, tho performance may not be what you expect, but the key to 
keeping them working well is being aware that they continue to do 
rewrites in the background for long after they are done with the initial 
write, and shutting them down while they are doing them can be an issue.

Due to btrfs' data checksumming feature, small variances to data that 
wouldn't normally be detected on non-checksumming filesystems were 
detected far sooner on btrfs, making it far more sensitive to these small 
errors.  However, if you use the drives for their intended nearly write-
only purpose, and/or very seldom power down the drives at all or do so 
only long after (give it half an hour, say) any writes have completed, as 
long as you're running a current kernel with the initial issues patched, 
you should be fine.  Just don't treat them like normal drives.

If OTOH you need more normal drive usage including lots of data rewrites, 
especially if you frequently poweroff the devices, strongly consider 
avoiding those 8 TB SMR drives, at least until the technology has a few 
more years to mature.

There's more information on other threads on the list and on other lists, 
if you need it and nobody posts more direct information (such as the 
specific patches in question and what specific kernel versions they hit) 
here.  I could find it but I'd have to do a search in my own list 
archives, and now that you are aware of the problem, you can of course do 
the search as well, if you need to. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Replacing drives with larger ones in a 4 drive raid1
  2016-06-09 15:20 ` Duncan
@ 2016-06-09 17:30   ` bOli
  2016-06-10 18:56   ` Jukka Larja
  1 sibling, 0 replies; 17+ messages in thread
From: bOli @ 2016-06-09 17:30 UTC (permalink / raw)
  To: linux-btrfs

On 09.06.2016, at 17:20, Duncan <1i5t5.duncan@cox.net> wrote:

> Are those the 8 TB SMR "archive" drives?

No, they are Western Digital Red drives.

Thanks for the detailed follow-up anyway. :)

Half a year ago, when I evaluated hard drives, in the 8 TB category there were only the Hitachi 8 TB Helium drives for 800 bucks, and the Seagate SMR for 250 bucks.

I bought myself one of the Seagate SMR ones for testing, and figured out it wouldn't work for my use case (I now use it in a write-very-seldom context).

For my two NASes I went with 6 TB WD Red drives all around.

Nowadays there are more choices of 8 TB drives, such as the WD Reds I'm switching my backup NAS to.

> I haven't been following the issue very closely, but be aware that there 
> were serious issues with those drives a few kernels back, and that while 
> those issues are now fixed, the drives themselves operate rather 
> differently than normal drives, and simply don't work well in normal 
> usage.
> 
> The short version is that they really are designed for archiving and work 
> well when used for that purpose -- a mostly write once and leave it there 
> for archiving and retrieval but rarely if ever rewrite it, type usage.  
> However, they work rather poorly in normal usage where data is rewritten, 
> because they have to rewrite entire zones of data, and that takes much 
> longer than simply rewriting individual sectors on normal drives does.
> 
> With the kernel patches to fix the initial problems they do work well 
> enough, tho performance may not be what you expect, but the key to 
> keeping them working well is being aware that they continue to do 
> rewrites in the background for long after they are done with the initial 
> write, and shutting them down while they are doing them can be an issue.
> 
> Due to btrfs' data checksumming feature, small variances to data that 
> wouldn't normally be detected on non-checksumming filesystems were 
> detected far sooner on btrfs, making it far more sensitive to these small 
> errors.  However, if you use the drives for their intended nearly write-
> only purpose, and/or very seldom power down the drives at all or do so 
> only long after (give it half an hour, say) any writes have completed, as 
> long as you're running a current kernel with the initial issues patched, 
> you should be fine.  Just don't treat them like normal drives.
> 
> If OTOH you need more normal drive usage including lots of data rewrites, 
> especially if you frequently poweroff the devices, strongly consider 
> avoiding those 8 TB SMR drives, at least until the technology has a few 
> more years to mature.
> 
> There's more information on other threads on the list and on other lists, 
> if you need it and nobody posts more direct information (such as the 
> specific patches in question and what specific kernel versions they hit) 
> here.  I could find it but I'd have to do a search in my own list 
> archives, and now that you are aware of the problem, you can of course do 
> the search as well, if you need to. =:^)
> 
> -- 
> Duncan - List replies preferred.   No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman
> 



* Re: Replacing drives with larger ones in a 4 drive raid1
  2016-06-09 15:20 ` Duncan
  2016-06-09 17:30   ` bOli
@ 2016-06-10 18:56   ` Jukka Larja
  1 sibling, 0 replies; 17+ messages in thread
From: Jukka Larja @ 2016-06-10 18:56 UTC (permalink / raw)
  To: linux-btrfs

This is somewhat off topic but...

On 9.6.2016 at 18.20, Duncan wrote:

> Are those the 8 TB SMR "archive" drives?
>
> I haven't been following the issue very closely, but be aware that there
> were serious issues with those drives a few kernels back, and that while
> those issues are now fixed, the drives themselves operate rather
> differently than normal drives, and simply don't work well in normal
> usage.

Either the issues were not fixed or LSI Logic / Symbios Logic SAS3008 is 
incompatible with the drives (and an older model of theirs, which I don't 
have anymore) as well as Intel Corporation 8 Series/C220 Series Chipset 
Family 6-port SATA Controller 1 [AHCI mode] (rev 05).

I haven't been able to get the disks to fail with any other load but Btrfs. 
However, with that they fail spectacularly. They drop out and make enough 
mess to corrupt things beyond repair. (See 
https://www.spinics.net/lists/linux-btrfs/msg55218.html for more info.)

There's a slight chance that I missed some relevant kernel update. When I 
get new disks and can get the array fixed (it still only mounts read-only), 
I'll do some testing with the SMR drives. If they work, that's great, but at 
the moment I wouldn't buy them for Btrfs use even if the workload or 
environmental characteristics weren't a problem.

-- 
      ...Elämälle vierasta toimintaa...
     Jukka Larja, Roskakori@aarghimedes.fi

"Are we feeling better then?"
"I'm naming all the stars."
"You can't see the stars, love. That's the ceiling. Also, it's day."
"I can see them. But I've named them all the same name, and there's terrible 
confusion..."
- Spike & Drusilla, Buffy the Vampire Slayer -



* Re: Replacing drives with larger ones in a 4 drive raid1
  2016-06-08 18:55 Replacing drives with larger ones in a 4 drive raid1 boli
  2016-06-09 15:20 ` Duncan
@ 2016-06-11 13:13 ` boli
  2016-06-12 10:35   ` boli
  2016-06-19 17:38 ` boli
  2 siblings, 1 reply; 17+ messages in thread
From: boli @ 2016-06-11 13:13 UTC (permalink / raw)
  To: linux-btrfs

Updates:

> So for this first replacement I mounted the volume degraded and ran "btrfs device delete missing /mnt", and that's where it's been stuck for the past ~23 hours. Only later did I figure out that this command will trigger a rebalance, and of course that will take a long time.

It has now been doing "btrfs device delete missing /mnt" for about 90 hours.

These 90 hours seem like a rather long time, given that a rebalance/convert from 4-disk-raid5 to 4-disk-raid1 took about 20 hours months ago, and a scrub takes about 7 hours (4-disk-raid1).

OTOH the filesystem will be rather full with only 3 of 4 disks available, so I do expect it to take somewhat "longer than usual".

Would anyone venture a guess as to how long it might take?

> I assume I could probably just Ctrl+C that "btrfs device delete missing /mnt", and the balance would continue as usual in the background, but I have not done that yet, as I'd rather consult you guys first (a bit late, I know).

I've tried finding more info about "btrfs device delete missing", but the man page doesn't even mention the "missing" option, nor does it say that a rebalance is automatically triggered (or whether that rebalance runs in the background and whether Ctrl+C should work or not).

Given my assumption above I've just tried hitting Ctrl+C, but it didn't do anything. The (Java remote console) cursor is still happily blinking away, so I assume it's still doing its thing.

Since it's the weekend I'd have more time to tinker, but I'm afraid to do anything drastic, such as force-rebooting the box, because in some other mails I read that one has just *one* chance to save a degraded array (though I'm not sure if this applies to my case here).

If you know any DOs/DON'Ts please share. :)

Of course I'll keep reporting any new developments. And should this replacement of the first of 4 drives end well, I'll replace the second drive with the "btrfs replace" option instead of delete/add and report back.

Cheers, boli




* Re: Replacing drives with larger ones in a 4 drive raid1
  2016-06-11 13:13 ` boli
@ 2016-06-12 10:35   ` boli
  2016-06-12 15:24     ` Henk Slager
  2016-06-13 12:24     ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 17+ messages in thread
From: boli @ 2016-06-12 10:35 UTC (permalink / raw)
  To: linux-btrfs

> It has now been doing "btrfs device delete missing /mnt" for about 90 hours.
> 
> These 90 hours seem like a rather long time, given that a rebalance/convert from 4-disk-raid5 to 4-disk-raid1 took about 20 hours months ago, and a scrub takes about 7 hours (4-disk-raid1).
> 
> OTOH the filesystem will be rather full with only 3 of 4 disks available, so I do expect it to take somewhat "longer than usual".
> 
> Would anyone venture a guess as to how long it might take?

It's done now, and took close to 99 hours to rebalance 8.1 TB of data from a 4x6TB raid1 (12 TB capacity) with 1 drive missing onto the remaining 3x6TB raid1 (9 TB capacity).

Now I made sure quotas were off, then started a screen to fill the new 8 TB disk with zeros, detached it and checked iotop to get a rough estimate of how long it will take (I'm aware it will become slower over time).
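
(Roughly what that screen session is doing — /dev/sdX is a placeholder for the new, not-yet-added 8 TB disk, so this is only a sketch and the device name needs double-checking before running anything like it:)

	dd if=/dev/zero of=/dev/sdX bs=1M oflag=direct status=progress   # overwrite the whole disk with zeros
	iotop -o                                                         # in another shell: show only processes currently doing I/O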

After that I'll add this 8 TB disk to the btrfs raid1 (for yet another rebalance).

The next 3 disks will be replaced with "btrfs replace", so only one rebalance each is needed.

I assume each "btrfs replace" would do a full rebalance, and thus assign chunks according to the normal strategy of choosing the two drives with the most free space, which in this case would mean one chunk on the new drive and a mirrored chunk on whichever of the 3 existing drives has the most free space.
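
(Roughly, per drive, something like this — /dev/sdOLD and /dev/sdNEW are placeholders:)

	btrfs replace start /dev/sdOLD /dev/sdNEW /mnt   # copy the old member's chunks onto the new disk
	btrfs replace status /mnt                        # check progress while it runs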

What I'm wondering is this:
If the goal is to replace 4x 6TB drive (raid1) with 4x 8TB drive (still raid1), is there a way to remove one 6 TB drive at a time, recreate its exact contents from the other 3 drives onto a new 8 TB drive, without doing a full rebalance? That is: without writing any substantial amount of data onto the remaining 3 drives.

It seems to me that would be a lot more efficient, but it would go against the normal chunk assignment strategy.

Cheers, boli



* Re: Replacing drives with larger ones in a 4 drive raid1
  2016-06-12 10:35   ` boli
@ 2016-06-12 15:24     ` Henk Slager
  2016-06-12 17:03       ` boli
  2016-06-13 12:24     ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 17+ messages in thread
From: Henk Slager @ 2016-06-12 15:24 UTC (permalink / raw)
  To: boli; +Cc: linux-btrfs

On Sun, Jun 12, 2016 at 12:35 PM, boli <btrfs@bueechi.net> wrote:
>> It has now been doing "btrfs device delete missing /mnt" for about 90 hours.
>>
>> These 90 hours seem like a rather long time, given that a rebalance/convert from 4-disk-raid5 to 4-disk-raid1 took about 20 hours months ago, and a scrub takes about 7 hours (4-disk-raid1).
>>
>> OTOH the filesystem will be rather full with only 3 of 4 disks available, so I do expect it to take somewhat "longer than usual".
>>
>> Would anyone venture a guess as to how long it might take?
>
> It's done now, and took close to 99 hours to rebalance 8.1 TB of data from a 4x6TB raid1 (12 TB capacity) with 1 drive missing onto the remaining 3x6TB raid1 (9 TB capacity).

Indeed, it is not clear why it takes 4 days for such an action. You
indicated that you cannot add an online 5th drive, so an intermediate
compaction of the fs onto fewer drives is a way to handle this issue.
There are 2 ways to do that, however:

1) Keep the to-be-replaced drive online until a btrfs dev remove of
it from the fs is finished, and only then swap the 6TB for an 8TB in
the drivebay. In this case one needs enough free capacity on the fs
(which you had), and full btrfs raid1 redundancy is there all the
time.

2) Take a 6TB out of the drivebay first and then do the btrfs dev
remove, in this case on a really missing disk. This way, the fs is in
degraded mode (or mounted as such) and the action of remove missing is
also a sort of 'reconstruction'. I don't know the details of the code,
but I can imagine that it has performance implications.

> Now I made sure quotas were off, then started a screen to fill the new 8 TB disk with zeros, detached it and and checked iotop to get a rough estimate on how long it will take (I'm aware it will become slower in time).
>
> After that I'll add this 8 TB disk to the btrfs raid1 (for yet another rebalance).
>
> The next 3 disks will be replaced with "btrfs replace", so only one rebalance each is needed.
>
> I assume each "btrfs replace" would do a full rebalance, and thus assign chunks according to the normal strategy of choosing the two drives with the most free space, which in this case would be a chunk to the new drive, and a mirrored chunk to that existing 3 drive with most free space.
>
> What I'm wondering is this:
> If the goal is to replace 4x 6TB drive (raid1) with 4x 8TB drive (still raid1), is there a way to remove one 6 TB drive at a time, recreate its exact contents from the other 3 drives onto a new 8 TB drive, without doing a full rebalance? That is: without writing any substantial amount of data onto the remaining 3 drives.

There isn't such a way. The goal itself conflicts with the redundancy
guarantee of btrfs raid1.

> It seems to me that would be a lot more efficient, but it would go against the normal chunk assignment strategy.

man btrfs-replace and option -r, I would say. But still, having a 5th
drive online makes things much easier, faster and more solid, and is
the way to do a drive replace. You can then do a normal replace, and
there is just a high-speed data transfer between the old and the new
disk, and only for the parts/blocks of the disk that contain file
data. So it is not a sector-by-sector copy that also copies deleted
blocks, but from the end-user perspective it is an exact copy. There
are patches ('hot spare') that assume it to work this way, but they
aren't in the mainline kernel yet.

The btrfs-replace should work ok for btrfs raid1 fs (at least it
worked ok for btrfs raid10 half a year ago I can confirm), if the fs
is mostly idle during the replace (almost no new files added). Still,
you might want to have the replace related fixes added in kernel
4.7-rc2.

Another, less likely, reason for the performance issue is that the fs
was converted from raid5 and has a 4k nodesize. btrfs-show-super can
show you that. It should not matter, but my experience with a
delete/add sequence in such a case is that it is very slow.


* Re: Replacing drives with larger ones in a 4 drive raid1
  2016-06-12 15:24     ` Henk Slager
@ 2016-06-12 17:03       ` boli
  2016-06-12 19:03         ` Henk Slager
  0 siblings, 1 reply; 17+ messages in thread
From: boli @ 2016-06-12 17:03 UTC (permalink / raw)
  To: linux-btrfs

>> It's done now, and took close to 99 hours to rebalance 8.1 TB of data from a 4x6TB raid1 (12 TB capacity) with 1 drive missing onto the remaining 3x6TB raid1 (9 TB capacity).
> 
> Indeed, it not clear why it takes 4 days for such an action. You
> indicated that you cannot add an online 5th drive, so then you and
> intermediate compaction of the fs to less drives is a way to handle
> this issue. There are 2 ways however:
> 
> 1) Keeping the to-be-replaced drive online until a btrfs dev remove of
> it from the fs of it is finished and only then replace a 6TB with an
> 8TB in the drivebay. So in this case, one needs enough free capacity
> on the fs (which you had) and full btrfs raid1 redundancy is there all
> the time.
> 
> 2) Take a 6TB out of the drivebay first and then do the btrfs dev
> remove, in this case on a really missing disk. This way, the fs is in
> degraded mode (or mounted as such) and the action of remove missing is
> also a sort of 'reconstruction'. I don't know the details of the code,
> but I can imagine that it has performance implications.

Thanks for reminding me about option 1). So in summary, without temporarily adding an additional drive, there are 3 ways to replace a drive:

1) Logically removing old drive (triggers 1st rebalance), physically removing it, then adding new drive physically and logically (triggers 2nd rebalance)

2) Physically removing old drive, mounting degraded, logically removing it (triggers 1st rebalance, while degraded), then adding new drive physically and logically (2nd rebalance)

3) Physically replacing old with new drive, mounting degraded, then logically replacing old with new drive (triggers rebalance while degraded)


I did option 2, which seems to be the worst of the three, as there was no redundancy for a couple days, and 2 rebalances are needed, which potentially take a long time.

Option 1 also has 2 rebalances, but redundancy is always maintained.

Option 3 needs just 1 rebalance, but (like option 1) does not maintain redundancy at all times.

That's where an extra drive bay would come in handy, allowing one to maintain redundancy while still needing just one "rebalance"? Question mark because you mentioned "highspeed data transfer" rather than "rebalance" when doing a btrfs-replace, which sounds very efficient (with the -r option these transfers would be from multiple drives).

The man page mentioned that the replacement drive needs to be at least as large as the original, which makes me wonder if it's still a "highspeed data transfer" if the new drive is larger, or if it does a rebalance in that case. If not then that'd be pretty much what I'm looking for. More on that below.

>> If the goal is to replace 4x 6TB drive (raid1) with 4x 8TB drive (still raid1), is there a way to remove one 6 TB drive at a time, recreate its exact contents from the other 3 drives onto a new 8 TB drive, without doing a full rebalance? That is: without writing any substantial amount of data onto the remaining 3 drives.
> 
> There isn't such a way. This goal has a violation in itself with
> respect to redundancy (btrfs raid1).

True, it would be a "hack" to minimize the amount of data to rebalance (thus saving time), with the (significant) downside of not maintaining redundancy at all times.
Personally I'd probably be willing to take that risk, since I have a few other copies of this data.

> man btrfs-replace and option -r I would say. But still, having a 5th
> drive online available makes things much easier and faster and solid
> and is the way to do a drive replace. You can then do a normal replace
> and there is just highspeed data transfer for the old and the new disk
> and only for parts/blocks of the disk that contain filedata. So it is
> not a sector-by-sector copying also deleted blocks, but from end-user
> perspective is an exact copy. There are patches ('hot spare') that
> assume it to be this way, but they aren't in the mainline kernel yet.

Hmm, so maybe I should think about using a USB enclosure to temporarily add a 5th drive.
Being a bit wary of external USB enclosures, I'd probably try to minimize transfers from/to the USB enclosure.

Say by putting the old (to-be-replaced) drive into the USB enclosure, the new drive into the internal drive bay where the old drive used to be, and then do a btrfs-replace with -r option to minimize reads from USB.

Or put one of the *other* disks into the USB enclosure (neither the old nor its new replacement drive), and doing a btrfs-replace without -r option.

> The btrfs-replace should work ok for btrfs raid1 fs (at least it
> worked ok for btrfs raid10 half a year ago I can confirm), if the fs
> is mostly idle during the replace (almost no new files added).

That's good to read. The fs will be idle during the replace.

> Still, you might want to have the replace related fixes added in kernel
> 4.7-rc2.

Hmm, since I'm on Fedora with kernel 4.5.5 (or 4.5.6 after the most recent upgrades, which this box didn't get yet), I guess waiting for kernel 4.7 is not very practical, and replacing the kernel is outside my comfort zone/knowledge for now.

Anyway, thanks for your helpful reply!



* Re: Replacing drives with larger ones in a 4 drive raid1
  2016-06-12 17:03       ` boli
@ 2016-06-12 19:03         ` Henk Slager
  2016-06-13  3:54           ` Duncan
  0 siblings, 1 reply; 17+ messages in thread
From: Henk Slager @ 2016-06-12 19:03 UTC (permalink / raw)
  To: boli; +Cc: linux-btrfs

On Sun, Jun 12, 2016 at 7:03 PM, boli <btrfs@bueechi.net> wrote:
>>> It's done now, and took close to 99 hours to rebalance 8.1 TB of data from a 4x6TB raid1 (12 TB capacity) with 1 drive missing onto the remaining 3x6TB raid1 (9 TB capacity).
>>
>> Indeed, it not clear why it takes 4 days for such an action. You
>> indicated that you cannot add an online 5th drive, so then you and
>> intermediate compaction of the fs to less drives is a way to handle
>> this issue. There are 2 ways however:
>>
>> 1) Keeping the to-be-replaced drive online until a btrfs dev remove of
>> it from the fs of it is finished and only then replace a 6TB with an
>> 8TB in the drivebay. So in this case, one needs enough free capacity
>> on the fs (which you had) and full btrfs raid1 redundancy is there all
>> the time.
>>
>> 2) Take a 6TB out of the drivebay first and then do the btrfs dev
>> remove, in this case on a really missing disk. This way, the fs is in
>> degraded mode (or mounted as such) and the action of remove missing is
>> also a sort of 'reconstruction'. I don't know the details of the code,
>> but I can imagine that it has performance implications.
>
> Thanks for reminding me about option 1). So in summary, without temporarily adding an additional drive, there are 3 ways to replace a drive:
>
> 1) Logically removing old drive (triggers 1st rebalance), physically removing it, then adding new drive physically and logically (triggers 2nd rebalance)
>
> 2) Physically removing old drive, mounting degraded, logically removing it (triggers 1st rebalance, while degraded), then adding new drive physically and logically (2nd rebalance)
>
> 3) Physically replacing old with new drive, mounting degraded, then logically replacing old with new drive (triggers rebalance while degraded)
>
>
> I did option 2, which seems to be the worst of the three, as there was no redundancy for a couple days, and 2 rebalances are needed, which potentially take a long time.
>
> Option 1 also has 2 rebalances, but redundancy is always maintained.
>
> Option 3 needs just 1 rebalance, but (like option 1) does not maintain redundancy at all times.
>
> That's where an extra drive bay would come in handy, allowing to maintain redundancy while still just needing one "rebalance"? Question mark because you mentioned "highspeed data transfer" rather than "rebalance" when doing a btrfs-replace, which sounds very efficient (in case of -r option these transfers would be from multiple drives).

I haven't used -r with replace other than for testing purposes inside
virtual machines. I think the '..transfers would be from multiple
drives...' might not be a speed advantage with the current state of
the code. If the drives are still healthy and the purpose of the
replace is a capacity increase, my experience is that without the -r
option (and using an extra SATA port), the transfer runs mostly at the
drive's maximum magnetic-media transfer speed. This also holds for
cases where you want to add LUKS or bcache headers in front of the
blockdevice that hosts the fs/devid1 data.

But now that you have all data on 3x 6TB drives anyway, you could
save balancing time by just doing btrfs-replace 6TB to 8TB three
times, and then for the 4th 8TB just add it and let btrfs do the
spreading/balancing over time by itself.

> The man page mentioned that the replacement drive needs to be at least as large as the original, which makes me wonder if it's still a "highspeed data transfer" if the new drive is larger, or if it does a rebalance in that case. If not then that'd be pretty much what I'm looking for. More on that below.
>
>>> If the goal is to replace 4x 6TB drive (raid1) with 4x 8TB drive (still raid1), is there a way to remove one 6 TB drive at a time, recreate its exact contents from the other 3 drives onto a new 8 TB drive, without doing a full rebalance? That is: without writing any substantial amount of data onto the remaining 3 drives.
>>
>> There isn't such a way. This goal has a violation in itself with
>> respect to redundancy (btrfs raid1).
>
> True, it would be "hack" to minimize the amount of data to rebalance (thus saving time), with the (significant) downside of not maintaining redundancy at all times.
> Personally I'd probably be willing to take the risk, since I have a few other copies of this data.
>
>> man btrfs-replace and option -r I would say. But still, having a 5th
>> drive online available makes things much easier and faster and solid
>> and is the way to do a drive replace. You can then do a normal replace
>> and there is just highspeed data transfer for the old and the new disk
>> and only for parts/blocks of the disk that contain filedata. So it is
>> not a sector-by-sector copying also deleted blocks, but from end-user
>> perspective is an exact copy. There are patches ('hot spare') that
>> assume it to be this way, but they aren't in the mainline kernel yet.
>
> Hmm, so maybe I should think about using an USB enclosure to temporarily add a 5th drive.
> Being a bit wary about an external USB enclosure, I'd probably try to minimize transfers from/to the USB enclosure.
>
> Say by putting the old (to-be-replaced) drive into the USB enclosure, the new drive into the internal drive bay where the old drive used to be, and then do a btrfs-replace with -r option to minimize reads from USB.
>
> Or put one of the *other* disks into the USB enclosure (neither the old nor its new replacement drive), and doing a btrfs-replace without -r option.

Yes, USB would also not be my preferred choice. I have had chipset
issues and lost sectors. If I have a SATA port free (external or on
the motherboard), I'd rather use that, but if it is all remote, other
factors might be more important.


>> The btrfs-replace should work ok for btrfs raid1 fs (at least it
>> worked ok for btrfs raid10 half a year ago I can confirm), if the fs
>> is mostly idle during the replace (almost no new files added).
>
> That's good to read. The fs will be idle during the replace.
>
>> Still, you might want to have the replace related fixes added in kernel
>> 4.7-rc2.
>
> Hmm, since I'm on Fedora with kernel 4.5.5 (or 4.5.6 after most recent upgrades, which this box didn't get yet), I guess waiting for kernel 4.7 is not very practical, and replacing the kernel is outside my comfort zone/knowledge for know.
>
> Anyway, thanks for your helpful reply!
>


* Re: Replacing drives with larger ones in a 4 drive raid1
  2016-06-12 19:03         ` Henk Slager
@ 2016-06-13  3:54           ` Duncan
  0 siblings, 0 replies; 17+ messages in thread
From: Duncan @ 2016-06-13  3:54 UTC (permalink / raw)
  To: linux-btrfs

Henk Slager posted on Sun, 12 Jun 2016 21:03:22 +0200 as excerpted:

> But now that you anyhow have all data on 3x 6TB drives, you could save
> balancing time by just doing btrfs-replace 6TB to 8TB 3x and then for
> the 4th 8TB just add it and let btrfs do the spreading/balancing over
> time by itself.

That's what I'd suggest.  You have all the data on three of the 6 TB 
drives now.  Just replace one at a time to 8 TB drives.  Then add the 4th 
8 TB drive, and then at your option do a final balance at that point, or 
simply let the normal activity take care of it.

Altho if you're doing mostly add, little delete, without a balance you 
may run out of space prematurely, since raid1 requires two drives with 
unallocated space on them to allocate a new chunk (one copy on each of 
the two), and you'll only have ~2 TB free on each of the three, which 
would be used up with ~2 TB still left free on the last added drive...

So at least a partial balance after adding that 4th 8 TB in is probably a 
good idea.  You can leave that last drive with a couple extra free TB 
compared to the others and cancel the balance at that point, and new 
allocations should take it from there, but unless you're going to be 
deleting several TB of stuff as you add, at least doing a few TB worth of 
balance to the new drive to start the process should result in a pretty 
even spread as it fills up the rest of the way.
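
Something like this sketch should do, tho the limit value is just an 
example you'd tune to taste (filesystem mounted at /mnt):

	btrfs balance start -dlimit=2000 /mnt   # relocate roughly 2000 data chunks (~2 TiB) and then stop on its own
	btrfs balance status /mnt               # from another shell: see how far it has got
	btrfs balance cancel /mnt               # or stop it early once the spread looks even enough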

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Replacing drives with larger ones in a 4 drive raid1
  2016-06-12 10:35   ` boli
  2016-06-12 15:24     ` Henk Slager
@ 2016-06-13 12:24     ` Austin S. Hemmelgarn
  2016-06-14 19:28       ` boli
  1 sibling, 1 reply; 17+ messages in thread
From: Austin S. Hemmelgarn @ 2016-06-13 12:24 UTC (permalink / raw)
  To: boli, linux-btrfs

On 2016-06-12 06:35, boli wrote:
>> It has now been doing "btrfs device delete missing /mnt" for about 90 hours.
>>
>> These 90 hours seem like a rather long time, given that a rebalance/convert from 4-disk-raid5 to 4-disk-raid1 took about 20 hours months ago, and a scrub takes about 7 hours (4-disk-raid1).
>>
>> OTOH the filesystem will be rather full with only 3 of 4 disks available, so I do expect it to take somewhat "longer than usual".
>>
>> Would anyone venture a guess as to how long it might take?
>
> It's done now, and took close to 99 hours to rebalance 8.1 TB of data from a 4x6TB raid1 (12 TB capacity) with 1 drive missing onto the remaining 3x6TB raid1 (9 TB capacity).
>
> Now I made sure quotas were off, then started a screen to fill the new 8 TB disk with zeros, detached it and and checked iotop to get a rough estimate on how long it will take (I'm aware it will become slower in time).
>
> After that I'll add this 8 TB disk to the btrfs raid1 (for yet another rebalance).
>
> The next 3 disks will be replaced with "btrfs replace", so only one rebalance each is needed.
>
> I assume each "btrfs replace" would do a full rebalance, and thus assign chunks according to the normal strategy of choosing the two drives with the most free space, which in this case would be a chunk to the new drive, and a mirrored chunk to that existing 3 drive with most free space.
Replace doesn't need to do a balance, it's largely just a block level 
copy of the device being replaced, but with some special handling so 
that the filesystem is consistent throughout the whole operation.  This 
is most of why it's so much more efficient than add/delete.
>
> What I'm wondering is this:
> If the goal is to replace 4x 6TB drive (raid1) with 4x 8TB drive (still raid1), is there a way to remove one 6 TB drive at a time, recreate its exact contents from the other 3 drives onto a new 8 TB drive, without doing a full rebalance? That is: without writing any substantial amount of data onto the remaining 3 drives.
The most efficient way of converting the array online without adding any 
more disks than you have to begin with is:
1. Delete one device from the array with device delete.
2. Physically switch the now unused device with one of the new devices.
3. Use btrfs replace to replace one of the devices in the array with the 
newly connected device (and make sure to resize to the full size of the 
new device).
4. Repeat from step 2 until you aren't using any of the old devices in 
the array.
5. You should have one old device left unused, physically switch it for 
a new device.
6. Use btrfs device add to add the new device to the array, then run a 
full balance.

This will result in only two balances being needed (one implicit in the 
device delete, and the explicit final one to restripe across the full 
array), and will result in the absolute minimum possible data transfer.
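
As a rough command sketch of the above (device names, the mount point 
/mnt, and the devid are placeholders; the devid to resize is whatever 
btrfs filesystem show reports for the just-replaced device):

	btrfs device delete /dev/sdOLD1 /mnt               # step 1: shrink the array onto the remaining drives
	                                                   # step 2: physically swap the now-unused drive for a new one
	btrfs replace start /dev/sdOLD2 /dev/sdNEW1 /mnt   # step 3: copy one old member onto the newly connected disk
	btrfs filesystem resize <devid>:max /mnt           #         ...and grow the fs to the new disk's full size
	                                                   # step 4: repeat the swap/replace/resize for the remaining old drives
	btrfs device add /dev/sdNEW4 /mnt                  # steps 5-6: swap in the last new drive, add it...
	btrfs balance start /mnt                           #            ...and restripe across the full array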


* Re: Replacing drives with larger ones in a 4 drive raid1
  2016-06-13 12:24     ` Austin S. Hemmelgarn
@ 2016-06-14 19:28       ` boli
  2016-06-15  3:19         ` Duncan
  0 siblings, 1 reply; 17+ messages in thread
From: boli @ 2016-06-14 19:28 UTC (permalink / raw)
  To: linux-btrfs

> Replace doesn't need to do a balance, it's largely just a block level copy of the device being replaced, but with some special handling so that the filesystem is consistent throughout the whole operation.  This is most of why it's so much more efficient than add/delete.

Thanks for this correction. In the meantime I experienced for myself that replace is pretty fast…

Last time I wrote, I thought the initial 4-day "remove missing" was successful/complete, but as it turned out the device was still missing. Maybe that Ctrl+C I tried after a few days did work after all. I only checked/noticed this after the 8 TB drive was zeroed and encrypted.

Luckily, most of the "missing" data was already rebuilt onto the remaining 2 drives, and only 1.27 TiB were still "missing".

In hindsight I should probably have repeated "remove missing" here, but to completion. What I did instead was a "replace -r" onto the 8 TB drive. This did successfully rebuild the missing 1.27 TiB of data onto the 8 TB drive, at a speedy ~144 MiB/s no less!

So I was back to a 4-drive raid1, with 3x 6 TB drives and 1x 8 TB drive (though that 8 TB drive had very little data on it). Then I tried to "remove" (without "-r" this time) the 6 TB drive with the least amount of data on it (one had 4.0 TiB, where the other two had 5.45 TiB each). This failed after a few minutes because of "no space left on device". 

Austin's mail reminded me to resize due to the larger disk, which I then did, but that device still couldn't be removed, same error message.
I then consulted the wiki, which mentions that space for metadata might be rather full (11.91 used of 12.66 GiB total here), and to try a "balance" with a low "dusage" in such cases.

For now I avoided that by removing one of the other two (rather full) 6 TB drives at random, and this has been going on for the last 20 hours or so. Thanks to running it in a screen I can check the progress this time around, and it's doing its thing at ~41 MiB/s, or ~7 hours per TiB, on average.

Maybe the "no data left on device" will sort itself out during this "remove"'s balance, otherwise I'll do it manually later.

> The most efficient way of converting the array online without adding any more disks than you have to begin with is:
> 1. Delete one device from the array with device delete.
> 2. Physically switch the now unused device with one of the new devices.
> 3. Use btrfs replace to replace one of the devices in the array with the newly connected device (and make sure to resize to the full size of the new device).
> 4. Repeat from step 2 until you aren't using any of the old devices in the array.
> 5. You should have one old device left unused, physically switch it for a new device.
> 6. Use btrfs device add to add the new device to the array, then run a full balance.
> 
> This will result in only two balances being needed (one implicit in the device delete, and the explicit final one to restripe across the full array), and will result in the absolute minimum possible data transfer.

Thank you for these very explicit/succinct instructions! Also thanks to Henk and Duncan! I will definitely do a full balance when all disks are replaced.



* Re: Replacing drives with larger ones in a 4 drive raid1
  2016-06-14 19:28       ` boli
@ 2016-06-15  3:19         ` Duncan
  2016-06-16  0:09           ` boli
  0 siblings, 1 reply; 17+ messages in thread
From: Duncan @ 2016-06-15  3:19 UTC (permalink / raw)
  To: linux-btrfs

boli posted on Tue, 14 Jun 2016 21:28:57 +0200 as excerpted:

> So I was back to a 4-drive raid1, with 3x 6 TB drives and 1x 8 TB drive
> (though that 8 TB drive had very little data on it). Then I tried to
> "remove" (without "-r" this time) the 6 TB drive with the least amount
> of data on it (one had 4.0 TiB, where the other two had 5.45 TiB each).
> This failed after a few minutes because of "no space left on device".
> 
> Austin's mail reminded me to resize due to the larger disk, which I then
> did, but that device still couldn't be removed, same error message.
> I then consulted the wiki, which mentions that space for metadata might
> be rather full (11.91 used of 12.66 GiB total here), and to try a
> "balance" with a low "dusage" in such cases.
> 
> For now I avoided that by removing one of the other two (rather full) 6
> TB drives at random, and this has been going on for the last 20 hours or
> so. Thanks to running it in a screen I can check the progress this time
> around, and it's doing its thing at ~41 MiB/s, or ~7 hours per TiB, on
> average.

The ENOSPC errors are likely due to the fact that the raid1 allocator 
needs _two_ devices with free space.  If your 6T devices get too full, 
even if the 8T device is nearly empty, you'll run into ENOSPC, because 
you have just one device with unallocated space and the raid1 allocator 
needs two.

btrfs device usage should help diagnose this condition, with btrfs 
filesystem show also showing the individual device space allocation but 
not as much other information as usage will.

If you run into this, you may just have to do the hardware yank and 
replace-missing thing again, yanking a 6T and replacing with an 8T.  
Don't forget the resize.  That should leave you with two devices with 
free space and thus hopefully allow normal raid1 reallocation with a 
device remove again.
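
Something along these lines, tho the devid and device names are only 
examples to substitute from what your own btrfs filesystem show reports:

	mount -o degraded /dev/mapper/XXXXXXXX_enc /mnt   # mount with the yanked 6T absent
	btrfs filesystem show /mnt                        # note the devid reported as missing
	btrfs replace start 3 /dev/sdNEW /mnt             # rebuild that devid (3 here is just an example) onto the new 8T
	btrfs filesystem resize 3:max /mnt                # and don't forget to grow the fs onto the 8T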

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Replacing drives with larger ones in a 4 drive raid1
  2016-06-15  3:19         ` Duncan
@ 2016-06-16  0:09           ` boli
  2016-06-16 18:18             ` boli
  0 siblings, 1 reply; 17+ messages in thread
From: boli @ 2016-06-16  0:09 UTC (permalink / raw)
  To: linux-btrfs

>> So I was back to a 4-drive raid1, with 3x 6 TB drives and 1x 8 TB drive
>> (though that 8 TB drive had very little data on it). Then I tried to
>> "remove" (without "-r" this time) the 6 TB drive with the least amount
>> of data on it (one had 4.0 TiB, where the other two had 5.45 TiB each).
>> This failed after a few minutes because of "no space left on device".
>> 
>> […]
>> 
>> For now I avoided that by removing one of the other two (rather full) 6
>> TB drives at random, and this has been going on for the last 20 hours or
>> so. Thanks to running it in a screen I can check the progress this time
>> around, and it's doing its thing at ~41 MiB/s, or ~7 hours per TiB, on
>> average.
> 
> The ENOSPC errors are likely due to the fact that the raid1 allocator 
> needs _two_ devices with free space.  If your 6T devices get too full, 
> even if the 8T device is nearly empty, you'll run into ENOSPC, because 
> you have just one device with unallocated space and the raid1 allocator 
> needs two.

I see, now this makes total sense. Two of the 6 TB drives were almost completely full, at 5.45 TiB used of 5.46 TiB capacity. Note to self: maybe I should start using the --si option to make such a condition more obvious, since I was mentally comparing against the advertised capacity of 6 TB (when 5.46 TiB would have been the correct reference).
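
(With the filesystem mounted at /data, that would be something like:)

	btrfs device usage --si /data   # report sizes in decimal units, matching the drives' advertised capacities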

"remove"-ing one of these almost-full-drives did finish successfully, and a "replace" of the 3rd 6 TB drive onto a second 8 TB drive is currently in progress (at high speed).

> btrfs device usage should help diagnose this condition, with btrfs 
> filesystem show also showing the individual device space allocation but 
> not as much other information as usage will.

I had mostly been using btrfs filesystem usage, thanks for the reminder about device usage, which is easier to read in this case.

> If you run into this, you may just have to do the hardware yank and 
> replace-missing thing again, yanking a 6T and replacing with an 8T.  
> Don't forget the resize.  That should leave you with two devices with 
> free space and thus hopefully allow normal raid1 reallocation with a 
> device remove again.

Good to know. For now this doesn't seem necessary; even the other drive that was almost completely full before looks much better now at 4.8/5.46 TiB, or 5.27/6.0 TB with --si :), as some data was moved to the first 8 TB drive during the last "remove".

So far everything is looking good, thanks very much for the help everyone.



* Re: Replacing drives with larger ones in a 4 drive raid1
  2016-06-16  0:09           ` boli
@ 2016-06-16 18:18             ` boli
  2016-06-17  6:25               ` Duncan
  0 siblings, 1 reply; 17+ messages in thread
From: boli @ 2016-06-16 18:18 UTC (permalink / raw)
  To: linux-btrfs

> a "replace" of the 3rd 6 TB drive onto a second 8 TB drive is currently in progress (at high speed).

This second replace is now finished, and it looks OK now:

	# btrfs replace status /data
	Started on 16.Jun 01:15:17, finished on 16.Jun 11:40:30, 0 write errs, 0 uncorr. read errs

Transfer rate of ~134 MiB/s, or ~2.2 hours per TiB.

	# btrfs device usage  /data 
	/dev/dm-2, ID: 3
	   Device size:             5.46TiB
	   Data,RAID1:              4.85TiB
	   Metadata,RAID1:          3.00GiB
	   Unallocated:           620.03GiB

	/dev/mapper/AAAAAAAA_enc, ID: 1
	   Device size:             7.28TiB
	   Data,RAID1:              6.66TiB
	   Metadata,RAID1:         12.69GiB
	   System,RAID1:           64.00MiB
	   Unallocated:           620.31GiB

	/dev/mapper/BBBBBBBB_enc, ID: 2
	   Device size:             7.28TiB
	   Data,RAID1:              4.79TiB
	   Metadata,RAID1:          9.69GiB
	   System,RAID1:           64.00MiB
	   Unallocated:           676.31GiB

However, while the replace was in progress, it showed weird stuff, like this percentage > 100 today at 9am (~3 hours before completion):

	# btrfs replace status /data       
	272.1% done, 0 write errs, 0 uncorr. read errs

Also, contrary to the first replace, the filesystem info was not updated during the replace, and looked like this (for example):

	# btrfs device usage  /data 
	/dev/dm-2, ID: 3
	   Device size:             5.46TiB
	   Data,RAID1:              4.85TiB
	   Metadata,RAID1:          3.00GiB
	   Unallocated:           620.03GiB

	/dev/dm-3, ID: 2
	   Device size:             5.46TiB
	   Data,RAID1:              4.79TiB
	   Metadata,RAID1:          9.69GiB
	   System,RAID1:           64.00MiB
	   Unallocated:           676.31GiB

	/dev/mapper/AAAAAAAA_enc, ID: 1
	   Device size:             7.28TiB
	   Data,RAID1:              6.66TiB
	   Metadata,RAID1:         12.69GiB
	   System,RAID1:           64.00MiB
	   Unallocated:           620.31GiB

	/dev/mapper/BBBBBBBB_enc, ID: 0
	   Device size:             7.28TiB
	   Unallocated:             5.46TiB

I'm happy it worked, just wondering why it behaved weirdly this second time.

During the first replace, my Fedora 23 was booted in emergency mode, whereas for the second time it was booted normally.

I'm going to reboot now to update Kernel 4.5.5 to 4.5.6 and then continue replacing drives.



* Re: Replacing drives with larger ones in a 4 drive raid1
  2016-06-16 18:18             ` boli
@ 2016-06-17  6:25               ` Duncan
  0 siblings, 0 replies; 17+ messages in thread
From: Duncan @ 2016-06-17  6:25 UTC (permalink / raw)
  To: linux-btrfs

boli posted on Thu, 16 Jun 2016 20:18:50 +0200 as excerpted:

> This second replace is now finished, and it looks OK now:
> 
> 	# btrfs replace status /data
> 	Started on 16.Jun 01:15:17, finished on 16.Jun 11:40:30,
>       0 write errs, 0 uncorr. read errs


> However, while the replace was in progress, it showed weird stuff,
> like this percentage > 100 today at 9am (~3 hours before completion):
> 
> 	# btrfs replace status /data       
> 	272.1% done, 0 write errs, 0 uncorr. read errs
> 
> Also, contrary to he first replace, filesystem info was not updated
> during the replace


> I'm happy it worked, just wondering why it behaved
> weirdly this second time.
> 
> During the first replace, my Fedora 23 was booted in emergency mode,
> whereas for the second time it was booted normally.
> 
> I'm going to reboot now to update Kernel 4.5.5 to 4.5.6
> and then continue replacing drives.


I'm guessing you were either running differing kernels,
or it had something to do with the information available...
/sys and /proc mounted, udev and lvm/device-mapper possibly
in different states due to the differing systemd target states,
possibly different btrfs, udev and lvm/dm versions in initr*
vs the main system, etc.

In particular, I know there have been some patches to fix problems
where it would count only one device, generally the one it was mounted
with, as 100%, when the balance affected multiple devices, so it could
get to multiple-hundred percent done.  And there have been patches
having to do with resolving names to the canonical form, vs the various
symlinked udev/lvm/mdraid/etc names.

But I wouldn't be surprised if there are still inconsistencies,
particularly related to udev/lvm state differences that may well appear
between systemd emergency and multi-user target modes or between initr*
and main system boot, especially if the initr* is running differing
versions of one or more affected utilities.

So if it was the same kernel version, it's likely that in the one case
it simply wasn't correctly mapping the full filesystem for the purpose
of calculating percentages and current filesystem numbers.
That should be corrected in time, and Fedora is likely to see it
pretty early compared to everyone else given the number of upstream
devs that package for it and sometimes pre-release or first release
deployment versions with systemd/udev/lvm patches that others simply
don't have yet, but it could yet be a few years before it /fully/
settles down.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Replacing drives with larger ones in a 4 drive raid1
  2016-06-08 18:55 Replacing drives with larger ones in a 4 drive raid1 boli
  2016-06-09 15:20 ` Duncan
  2016-06-11 13:13 ` boli
@ 2016-06-19 17:38 ` boli
  2 siblings, 0 replies; 17+ messages in thread
From: boli @ 2016-06-19 17:38 UTC (permalink / raw)
  To: linux-btrfs

For completeness here's the summary of my replacement of all four 6 TB drives (henceforth "6T") with 8 TB drives ("8T") in a btrfs raid1 volume.
I included transfer rates so maybe others can get a rough idea what to expect when doing similar things. All capacity units are SI, not base 2.

Filesystem usage was ~17.84 of 24 TB used when I started.

The first steps all happened while the machine was booted into emergency mode.

 1. Physically replaced 1st 6T with 1st 8T,
    without having done a logical remove beforehand.
    Should have done that to maintain redundancy.
 2. Mounted volume degraded and btrfs device remove missing.
    Took over 4 days, and 1.4 TB were still missing after.
    Also it was a close call: 17.84 TB of 18 TB used!
    (Two of the drives were completely full after this)
    Transfer rate of ~46 MB/s (~6 h/TB)
 3. Restored missing 1.4 TB onto the 1st 8T with btrfs replace -r
    Would have been more efficient to try and complete step 2.
    Transfer rate of ~159 MB/s (~1.75 h/TB)
 4. Resized to full size of 1st 8T
 5. btrfs device remove'd a 2nd 6T
 6. Physically replaced this 2nd 6T with 2nd 8T

At this point the machine was rebooted into normal mode.   

 7. Logically replaced 3rd 6T onto 2nd 8T with btrfs replace
    Transfer rate of ~140 MB/s (~1.98 h/TB)
 8. Resized to full size of 2nd 8T
 9. Physically replaced 3rd 6T with 3rd 8T

Another reboot for kernel update to 4.5.6. Also the machine received a few of the backups that were previously held back so it could restore in peace.

10. Logically replaced 4th 6T onto 3rd 8T with btrfs replace
    Transfer rate of ~151 MB/s (~1.84 h/TB)
11. Resized to full size of 3rd 8T
12. Physically replaced 4th 6T with 4th 8T (reboot)
13. Logically added 4th 8T to volume with btrfs device add
14. Ran a full balance (~18 TB used). Took about 2 days.
    Transfer rate of ~104 MB/s (~2.67 h/TB)

