* raid1 with several old drives and a big new one
From: Eric Wong
Date: 2020-07-31 0:16 UTC
To: linux-btrfs

Say I have three ancient 2TB HDDs and one new 6TB HDD, is there
a way I can ensure one raid1 copy of the data stays on the new
6TB HDD?

I expect the 2TB HDDs to fail sooner than the 6TB HDD given
their age (>5 years).

The devid balance filter only affects data which already exists
on the device, so that isn't suitable for this, right?

Thanks in advance.
* Re: raid1 with several old drives and a big new one
From: Chris Murphy
Date: 2020-07-31 2:57 UTC
To: Eric Wong; +Cc: Btrfs BTRFS

(first attempt did not go to the list)

On Thu, Jul 30, 2020 at 6:16 PM Eric Wong <e@80x24.org> wrote:
>
> Say I have three ancient 2TB HDDs and one new 6TB HDD, is there
> a way I can ensure one raid1 copy of the data stays on the new
> 6TB HDD?

Yes. Use mdadm --level=linear --raid-devices=2 to concatenate two of
the 2TB drives. Or use LVM (linear by default). Leave the 6TB out of
this regime. Now you have two block devices (one of them the concat
virtual device) to do a raid1 with btrfs, and the 6TB will always get
one of the raid1 chunks.

There isn't a way to do this with btrfs alone.

When one of the 2TB drives fails, there's some likelihood that it'll
behave like a partially failing device: some reads and writes will
succeed, others won't. So you'll want a strategy prepared in advance.
The ideal scenario is a new 4+TB drive, using 'btrfs replace' to
replace the md concat device. Given the large number of errors
possible during the replace, you might want its -r option.

Following a successful replace, one option is to break the 2x 2TB
mdadm concat apart, send the dead drive off for grinding, and add the
good 2TB drive as a third device to the btrfs. If it dies, same
thing. Preferably use 'btrfs replace' - it's faster and more reliable
than 'btrfs device delete missing'.

And on second thought... you might do some rudimentary read/write
benchmarks on all three drives. I haven't found btrfs to be fussy
about speed differences between raid1 member drives. But if it turns
out either of the 2TB drives is slower than the 6TB, you could do
raid0 instead of linear.
If so, I suggest either 32KiB or 64KiB for the mdadm --chunk size.
The default is 512KiB, which is not great for metadata-centric
workloads. Of course, if one of the drives dies, the error behavior
will be quite a lot more consistent: EIO on every other 64KiB strip.
So you'll definitely want the -r option when doing the replace.

--
Chris Murphy
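The layering Chris describes could be set up roughly as below. This is a destructive sketch, not a tested recipe: the device names (/dev/sdb, /dev/sdc for the 2TB drives, /dev/sdd for the 6TB, /dev/sde for a later replacement) and the mount point are assumptions.

```sh
# Concatenate two of the 2TB drives into one ~4TB linear md device:
mdadm --create /dev/md0 --level=linear --raid-devices=2 /dev/sdb /dev/sdc

# Make a btrfs raid1 across the concat device and the 6TB drive;
# every block group then keeps one copy on the 6TB disk:
mkfs.btrfs -d raid1 -m raid1 /dev/md0 /dev/sdd

# Later, when a 2TB member is failing, replace the whole md device
# with a new 4+TB drive; -r avoids reading from the failing source
# unless no other copy is available:
btrfs replace start -r /dev/md0 /dev/sde /mnt
```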
* Re: raid1 with several old drives and a big new one
From: Eric Wong
Date: 2020-07-31 3:22 UTC
To: Chris Murphy; +Cc: linux-btrfs

Chris Murphy <lists@colorremedies.com> wrote:
> On Thu, Jul 30, 2020 at 6:16 PM Eric Wong <e@80x24.org> wrote:
> >
> > Say I have three ancient 2TB HDDs and one new 6TB HDD, is there
> > a way I can ensure one raid1 copy of the data stays on the new
> > 6TB HDD?
>
> Yes. Use mdadm --level=linear --raid-devices=2 to concatenate the two
> 2TB drives. Or use LVM (linear by default). Leave the 6TB out of this
> regime. And now you have two block devices (one is the concat virtual
> device) to do a raid1 with btrfs, and the 6TB will always get one of
> the raid1 chunks.
>
> There isn't a way to do this with btrfs alone.

Thanks for the response(s), I was hoping to simplify my stack
with btrfs alone.

> When one of the 2TB fails, there's some likelihood that it'll behave
> like a partially failing device. Some reads and writes will succeed,
> others won't. So you'll need to be prepared strategy wise what to do.
> Ideal scenario is a new 4+TB drive, and use 'btrfs replace' to replace
> the md concat device. Due to the large number of errors possible with
> the 'btrfs replace' you might want to use -r option.

If I went ahead with btrfs alone and am prepared to lose some
(not "all") files, could part of the FS remain usable (and the
rest restorable from slow backups) w/o involving LVM?

I could make metadata (and maybe system chunks?) raid1c3 or even
raid1c4, since they seem small and important enough with ancient
HW in play.

I mainly wanted raid1 because restoring from backups is slow,
and btrfs would let me grow a single FS without much planning or
having to find identical or even similar drives.

> And on second thought...
>
> You might do some rudimentary read/write benchmarks on all three

<snip>

Not performance critical at all, all that is on SSD :)
* Re: raid1 with several old drives and a big new one
From: Chris Murphy
Date: 2020-07-31 3:35 UTC
To: Eric Wong; +Cc: Btrfs BTRFS

On Thu, Jul 30, 2020 at 9:22 PM Eric Wong <e@80x24.org> wrote:
>
> Chris Murphy <lists@colorremedies.com> wrote:
> > When one of the 2TB fails, there's some likelihood that it'll behave
> > like a partially failing device. Some reads and writes will succeed,
> > others won't. So you'll need to be prepared strategy wise what to do.
> > Ideal scenario is a new 4+TB drive, and use 'btrfs replace' to replace
> > the md concat device. Due to the large number of errors possible with
> > the 'btrfs replace' you might want to use -r option.
>
> If I went ahead with btrfs alone and am prepared to lose some
> (not "all") files; could part of the FS remain usable (and the
> rest restorable from slow backups) w/o involving LVM?
>
> I could make metadata (and maybe system chunks?) raid1c3 or even
> raid1c4 since they seem small and important enough with ancient
> HW in play.

Yes. I'm not sure whether it will mount rw,degraded if 2 devices are
missing, though; it might insist on read-only.

--
Chris Murphy
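The raid1c3 metadata conversion Eric floats would be a balance with a convert filter. A sketch, with the mount point assumed to be /mnt; raid1c3/raid1c4 require kernel 5.5 or newer, and raid1c4 needs at least four devices:

```sh
# Convert existing and future metadata to three copies; data stays raid1:
btrfs balance start -mconvert=raid1c3 /mnt

# System chunks can be converted explicitly; -s requires --force:
btrfs balance start -sconvert=raid1c3 --force /mnt
```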
* Re: raid1 with several old drives and a big new one
From: Roman Mamedov
Date: 2020-08-01 9:05 UTC
To: Chris Murphy; +Cc: Eric Wong, Btrfs BTRFS

On Thu, 30 Jul 2020 20:57:38 -0600
Chris Murphy <lists@colorremedies.com> wrote:

> Yes. Use mdadm --level=linear --raid-devices=2 to concatenate the two
> 2TB drives.

Or go with a RAID0 for this, to get a nice performance benefit as
well. It is a bad idea in any case to hope for any data
recoverability from a half-failed linear "array".

--
With respect,
Roman
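Roman's raid0 variant, combined with the smaller --chunk size Chris suggested, might look like this (device names are assumptions, and the command is destructive):

```sh
# raid0 across two of the 2TB drives, with a 64KiB chunk instead of
# the 512KiB default, which suits metadata-heavy workloads better:
mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=64K \
    /dev/sdb /dev/sdc
```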
* Re: raid1 with several old drives and a big new one
From: Alberto Bursi
Date: 2020-07-31 8:29 UTC
To: Eric Wong, linux-btrfs

On 31/07/20 02:16, Eric Wong wrote:
> Say I have three ancient 2TB HDDs and one new 6TB HDD, is there
> a way I can ensure one raid1 copy of the data stays on the new
> 6TB HDD?
>
> I expect the 2TB HDDs to fail sooner than the 6TB HDD given
> their age (>5 years).
>
> The devid balance filter only affects data which already exists
> on the device, so that isn't suitable for this, right?
>
> Thanks in advance.

I'm not sure what the problem is. OK, maybe the drives are old and
more likely to fail, but why would more than one drive fail at once?

-Alberto
* Re: raid1 with several old drives and a big new one
From: Eric Wong
Date: 2020-07-31 10:06 UTC
To: Alberto Bursi; +Cc: linux-btrfs

Alberto Bursi <bobafetthotmail@gmail.com> wrote:
> On 31/07/20 02:16, Eric Wong wrote:
> > Say I have three ancient 2TB HDDs and one new 6TB HDD, is there
> > a way I can ensure one raid1 copy of the data stays on the new
> > 6TB HDD?
> >
> > I expect the 2TB HDDs to fail sooner than the 6TB HDD given
> > their age (>5 years).
>
> I'm not sure what is the problem, ok maybe the drives are old and are
> more likely to fail, but why would more than one drive fail at once?

Why wouldn't they? Otherwise there'd be no reason for RAID6 to
exist over RAID5. Recovery puts more stress on the remaining
drives and increases the likelihood of another drive in a pool
failing. I've seen HW RAID5 arrays lost like this in a previous
life (I didn't manage to convince the other sysadmins to use
RAID6 :<).
* Re: raid1 with several old drives and a big new one
From: Adam Borowski
Date: 2020-07-31 16:13 UTC
To: Eric Wong; +Cc: linux-btrfs

On Fri, Jul 31, 2020 at 12:16:52AM +0000, Eric Wong wrote:
> Say I have three ancient 2TB HDDs and one new 6TB HDD, is there
> a way I can ensure one raid1 copy of the data stays on the new
> 6TB HDD?
>
> I expect the 2TB HDDs to fail sooner than the 6TB HDD given
> their age (>5 years).

While there's no good way to do so in general, in your case there's
no way for any new block group to be allocated without the big disk.

Btrfs' allocation algorithm is: always pick the disk with the most
free space left. Besides being simple, this guarantees optimal
utilization of the available space. And, for 2+2+2+6, no scheme that
doesn't waste space could possibly place raid1 copies without having
one on the biggest disk.

Thus, all you need is to balance once.

> The devid balance filter only affects data which already exists
> on the device, so that isn't suitable for this, right?

Yeah, balance affects existing data, but doesn't have a lingering
effect on new allocations.

Meow!
--
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋⠀ It's time to migrate your Imaginary Protocol from version 4i to 6i.
⠈⠳⣄⠀⠀⠀⠀
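Adam's argument can be checked with a toy simulation of the allocator. This is a sketch of the reasoning only, not btrfs code: real chunk sizes, rounding, and device bookkeeping differ.

```python
# Toy model of btrfs raid1 chunk allocation: every block group places
# its two copies on the two devices with the most free space left.
# Sizes are in GiB-ish units.

def allocate_raid1(free, chunks, chunk_size=1):
    """Allocate raid1 block groups; return the device pair per block group."""
    placements = []
    for _ in range(chunks):
        # The two devices with the most free space get this block group.
        a, b = sorted(free, key=free.get, reverse=True)[:2]
        free[a] -= chunk_size
        free[b] -= chunk_size
        placements.append((a, b))
    return placements

# Three 2TB drives plus one 6TB drive:
free = {"2TB-a": 2000, "2TB-b": 2000, "2TB-c": 2000, "6TB": 6000}
bgs = allocate_raid1(free, chunks=3000)  # allocate ~3TB of raid1 data

# The 6TB disk appears in every single block group, because its free
# space never drops below that of any 2TB drive:
print(all("6TB" in pair for pair in bgs))  # -> True
```

Under this model the big disk only stops being picked once its free space falls to that of the smaller disks, which for 2+2+2+6 happens exactly when the array is full.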
* Re: raid1 with several old drives and a big new one
From: Zygo Blaxell
Date: 2020-08-01 3:40 UTC
To: Adam Borowski; +Cc: Eric Wong, linux-btrfs

On Fri, Jul 31, 2020 at 06:13:07PM +0200, Adam Borowski wrote:
> On Fri, Jul 31, 2020 at 12:16:52AM +0000, Eric Wong wrote:
> > Say I have three ancient 2TB HDDs and one new 6TB HDD, is there
> > a way I can ensure one raid1 copy of the data stays on the new
> > 6TB HDD?
> >
> > I expect the 2TB HDDs to fail sooner than the 6TB HDD given
> > their age (>5 years).

It might be a good idea to run 'btrfs replace' on one of the 2TB
disks instead of 'device add'. That will move one copy of the data
very quickly to the new disk. You then resize the new disk to 6TB
(or 'max'), then add the 2TB disk back into the array with 'btrfs
device add'.

This will leave you with one full 2TB disk, one empty 2TB disk, and a
6TB disk with 2TB of data on it. In that case you don't even need to
balance: the empty 2TB drive will fill up with block groups that
contain one chunk from the 2TB drive and one from the 6TB, since the
allocator picks the two emptiest drives first. Everything will be
mirrored on the 6TB drive (probably, see below).

The variation in write load might also shift the date when the drives
eventually do fail, so they'll be less likely to fail at the same
time.

> While there's no good way to do so in general, in your case, there's
> no way for any new block group to be allocated without the big disk.
>
> Btrfs' allocation algorithm is: always pick the disk with most free
> space left. Besides being simple, this guarantees optimally utilizing
> available space.

That is the theory; however, practice is a little different.
Sometimes btrfs just doesn't follow its own rules.
I've filled big raid1 arrays with lopsided disks like this, and ended
up with one block group out of every few thousand with a chunk from
each of the two smaller disks. I guess it's a race condition,
possibly triggered by scrub or balance marking block groups
read-only, but I've never fully investigated.

When the larger disk is _exactly_ the same size as the two smaller
disks combined, having one block group in the wrong place can be
annoying, as it reduces capacity. And if two disks fail, btrfs counts
the number of failing disks and says "nope, can't mount this degraded
raid1, sorry" if even one block group in the filesystem contains both
failing disks.

In any case, the behavior isn't strictly guaranteed here: btrfs *can*
allocate a block group across the two smaller disks, even though it
normally would not, so there's a risk that it might do so
unexpectedly. Contrast with combining the two 2TB disks (e.g. with
mdadm raid0 or linear, or LVM), where btrfs is presented with exactly
two devices and has exactly one way to place the two mirror copies.

> And, for 2+2+2+6, no scheme that doesn't waste space could possibly
> place raid1 copies without having one on the biggest disk.
>
> Thus, all you need is to balance once.
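Zygo's replace-then-add sequence, as a rough command sketch. The device names (/dev/sdb for the replaced 2TB drive, /dev/sde for the new 6TB drive), the devid, and the mount point are all assumptions:

```sh
# Replace one 2TB member with the new 6TB drive; one copy of the data
# moves directly onto the new disk:
btrfs replace start /dev/sdb /dev/sde /mnt

# replace only uses the old device's size on the new one, so grow the
# new device to its full capacity (devid 1 assumed for that slot):
btrfs filesystem resize 1:max /mnt

# Re-add the vacated 2TB drive as another raid1 member:
btrfs device add /dev/sdb /mnt
```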