* dstat shows unexpected result for two disk RAID1
From: Nicholas D Steeves @ 2016-03-09 20:21 UTC (permalink / raw)
To: linux-btrfs

Hello everyone,

I've run into an unexpected behaviour with my two-disk RAID1.  I mount
with UUIDs, because sometimes my USB disk gets /dev/sdc instead of
/dev/sdd.  The two elements of my RAID1 are currently sdb and sdd.

dstat -tdD total,sdb,sdc,sdd

It seems that per process, reads come from either sdb or sdd.  This
surprises me, because I understood that a btrfs RAID1

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: dstat shows unexpected result for two disk RAID1
From: Nicholas D Steeves @ 2016-03-09 20:25 UTC (permalink / raw)
To: linux-btrfs

grr.  Gmail is terrible :-/

I understood that a btrfs RAID1 would at best grab one block from sdb
and then one block from sdd in round-robin fashion, or at worst grab
one chunk from sdb and then one chunk from sdd.  Alternatively I
thought that it might read from both simultaneously, to make sure that
all data matches, while at the same time providing single-disk
performance.  None of these was the case.  Running a single
IO-intensive process reads from a single drive.

Did I misunderstand the documentation and is this normal, or is this a bug?

Nicholas

On 9 March 2016 at 15:21, Nicholas D Steeves <nsteeves@gmail.com> wrote:
> Hello everyone,
>
> I've run into an unexpected behaviour with my two-disk RAID1.  I mount
> with UUIDs, because sometimes my USB disk gets /dev/sdc instead of
> /dev/sdd.  The two elements of my RAID1 are currently sdb and sdd.
>
> dstat -tdD total,sdb,sdc,sdd
>
> It seems that per process, reads come from either sdb or sdd.  This
> surprises me, because I understood that a btrfs RAID1
* Re: dstat shows unexpected result for two disk RAID1
From: Goffredo Baroncelli @ 2016-03-09 20:50 UTC (permalink / raw)
To: Nicholas D Steeves, linux-btrfs

On 2016-03-09 21:25, Nicholas D Steeves wrote:
> grr.  Gmail is terrible :-/
>
> I understood that a btrfs RAID1 would at best grab one block from sdb
> and then one block from sdd in round-robin fashion, or at worst grab
> one chunk from sdb and then one chunk from sdd.  Alternatively I
> thought that it might read from both simultaneously, to make sure that
> all data matches, while at the same time providing single-disk
> performance.  None of these was the case.  Running a single
> IO-intensive process reads from a single drive.
>
> Did I misunderstand the documentation and is this normal, or is this a bug?
> Nicholas

In the case of a BTRFS RAID1, the drive a process reads from depends on
its PID.  I don't know if this has changed, but from what you write it
seems to still be true today.

--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
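A toy model of the PID-based read selection described above (an illustration only, not the actual kernel code; btrfs's real mirror choice is made in its chunk-mapping layer):

```python
def select_mirror(pid: int, num_mirrors: int = 2) -> int:
    """Toy model of the even/odd PID read policy: the mirror a process
    reads from is fixed by its PID, so a single reader never alternates
    between the two devices of a two-disk RAID1."""
    return pid % num_mirrors

# Two readers with even PIDs both land on mirror 0, leaving the other
# device idle -- the contention case discussed in this thread:
assert select_mirror(1234) == select_mirror(5678) == 0
assert select_mirror(1235) == 1   # an odd PID gets the other copy
```

This also explains the original dstat observation: one IO-intensive process keeps hitting the same member, whichever one its PID selects.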
* Re: dstat shows unexpected result for two disk RAID1
From: Chris Murphy @ 2016-03-09 21:26 UTC (permalink / raw)
To: Nicholas D Steeves; +Cc: Btrfs BTRFS

On Wed, Mar 9, 2016 at 1:25 PM, Nicholas D Steeves <nsteeves@gmail.com> wrote:
> grr.  Gmail is terrible :-/
>
> I understood that a btrfs RAID1 would at best grab one block from sdb
> and then one block from sdd in round-robin fashion, or at worst grab
> one chunk from sdb and then one chunk from sdd.  Alternatively I
> thought that it might read from both simultaneously, to make sure that
> all data matches, while at the same time providing single-disk
> performance.  None of these was the case.  Running a single
> IO-intensive process reads from a single drive.
>
> Did I misunderstand the documentation and is this normal, or is this a bug?

It's normal, and recognized to be sub-optimal.  So it's an optimization
opportunity. :-)

I also see parallelization of reads and writes across multiple devices
with the single data profile as useful, similar to XFS allocation group
parallelization.  Those AGs are spread across multiple devices in
md/lvm linear layouts, so if you have processes that read/write to
multiple AGs at a time, those I/Os happen at the same time when on
separate devices.

--
Chris Murphy
* Re: dstat shows unexpected result for two disk RAID1
From: Nicholas D Steeves @ 2016-03-09 22:51 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS

On 9 March 2016 at 16:36, Roman Mamedov <rm@romanrm.net> wrote:
> No RAID1 implementation reads from disks in a round-robin fashion, as
> that would give terrible performance, giving the disks a constant seek
> load instead of the normal linear read scenario.

On 9 March 2016 at 16:26, Chris Murphy <lists@colorremedies.com> wrote:
> It's normal and recognized to be sub-optimal.  So it's an optimization
> opportunity. :-)
>
> I see parallelization of reads and writes to data single profile
> multiple devices as useful also, similar to XFS allocation group
> parallelization.  Those AGs are spread across multiple devices in
> md/lvm linear layouts, so if you have processes that read/write to
> multiple AGs at a time, those I/Os happen at the same time when on
> separate devices.

Chris, yes, that's exactly how I thought that it would work.

Roman, when I said round-robin--please forgive my naïveté--I meant I
hoped there would be a chunk A1 read from disk0 at the same time as
chunk A2 from disk1.  Can you use the btree associated with chunk A1 to
put disk1 to work reading ahead into chunk A2?  Then, when disk0
finishes reading A1 into memory, A2 gets concatenated.

If disk0 finishes reading chunk A1 first, change the primary read disk
for the PID to disk1, let the reading of A2 continue, and put disk0 to
work on chunk A3 using the same method disk1 was using before.  Else,
if disk1 finishes reading A2 before disk0 finishes A1, then disk0
remains the primary read disk for the PID and disk1 begins reading A3.

That's how I thought that it would work, and that the scheduler could
interrupt the readahead operation for the non-primary disk.  Eg: disk1
would become the primary read disk for PID2, while disk0 would continue
as primary for PID1.  And if there's a long queue of reads or writes,
then this simplest case would be limited in the following way: disk0
and disk1 never actually get to read or write to the same chunk <- Is
this the explanation why, for practical reasons, dstat shows the
behaviour it shows?

If this is the case, would it be possible for the non-primary read disk
for PID1 to tag the A[x] chunk it wrote to memory with a request for
the PID to use what it wrote to memory from A[x]?  And also for the
"primary" disk to resume from location y in A[x] instead of beginning
from scratch with A[x]?  Roman, in this case, the seeks would be
time-saving, no?

Unfortunately, I don't know how to implement this, but I had imagined
that the btree for a directory contained pointers (I'm using this term
loosely rather than programmatically) to all extents associated with
all files contained underneath it.  Or does it point to the chunk,
which then points to the extent?  At any rate, is this similar to the
dir_index of ext4, and is this the method btrfs uses?

Best regards,
Nicholas
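The alternating-chunk scheme described above could be sketched as follows (a pure illustration of the poster's proposal; the chunk names are hypothetical, and nothing like this exists in btrfs):

```python
def plan_reads(chunks, num_disks=2):
    """Assign consecutive chunks to alternating disks so both devices
    stream in parallel: disk0 reads A1 while disk1 reads ahead into A2,
    then whichever disk frees up first takes the next chunk."""
    return {chunk: i % num_disks for i, chunk in enumerate(chunks)}

plan = plan_reads(["A1", "A2", "A3", "A4"])
assert plan == {"A1": 0, "A2": 1, "A3": 0, "A4": 1}
```

Roman's objection applies to this plan directly: on spinning rust, each disk would seek between every other chunk instead of streaming linearly.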
* Re: dstat shows unexpected result for two disk RAID1
From: Nicholas D Steeves @ 2016-03-11 23:42 UTC (permalink / raw)
To: Btrfs BTRFS

On 9 March 2016 at 16:26, Chris Murphy <lists@colorremedies.com> wrote:
> It's normal and recognized to be sub-optimal.  So it's an optimization
> opportunity. :-)
>
> I see parallelization of reads and writes to data single profile
> multiple devices as useful also, similar to XFS allocation group
> parallelization.  Those AGs are spread across multiple devices in
> md/lvm linear layouts, so if you have processes that read/write to
> multiple AGs at a time, those I/Os happen at the same time when on
> separate devices.

I'm not sure if I can pull it off... :-)  At best I might only be able
to define the problem and how things fit together, and then attempt to
logic my way through it with pseudo-code.  My hope is that someone
would look at this work, say "Aha!  You're doing it wrong!" and then
implement it the right way.

On 10 March 2016 at 03:10, Duncan <1i5t5.duncan@cox.net> wrote:
> Call me a conspiracy nut, but don't be too surprised if someone's
> introducing some product with btrfs and encrypted subvolumes a year or
> 18 months from now... I know I won't be! =:^)

In that case, couldn't an "at a glance" overview of what needs to be
done for distributed read optimisation entice a product manager
somewhere out there to throw some employee time at the problem? :-p

Best regards,
Nicholas
* Re: dstat shows unexpected result for two disk RAID1
From: Roman Mamedov @ 2016-03-09 21:36 UTC (permalink / raw)
To: Nicholas D Steeves; +Cc: linux-btrfs

On Wed, 9 Mar 2016 15:25:19 -0500
Nicholas D Steeves <nsteeves@gmail.com> wrote:

> I understood that a btrfs RAID1 would at best grab one block from sdb
> and then one block from sdd in round-robin fashion, or at worst grab
> one chunk from sdb and then one chunk from sdd.  Alternatively I
> thought that it might read from both simultaneously, to make sure that
> all data matches, while at the same time providing single-disk
> performance.  None of these was the case.  Running a single
> IO-intensive process reads from a single drive.

No RAID1 implementation reads from disks in a round-robin fashion, as
that would give terrible performance, subjecting the disks to a
constant seek load instead of the normal linear read pattern.

As for reading from both at the same time, there's no reason to do that
either, since data integrity is protected by checksums, and "the other"
disk for a particular piece of data is consulted only when the checksum
does not match (or when you execute a 'scrub').

It's a known limitation that the disks are in effect "pinned" to
running processes, based on their process ID.  One process reads from
the same disk, from the point it started until it terminates.  Other
processes may by luck read from a different disk, thus achieving load
balancing.  Or they may not, and you will have contention on one disk
while the other idles.  This is unlike MD RAID1, which knows to
distribute read load dynamically to the least-utilized array members.

Now if you want to do some more performance evaluation, check with your
dstat whether both disks *write* data in parallel when you write to the
array, as ideally they should.  Last I checked they mostly didn't, and
this almost halved write performance on a Btrfs RAID1 compared to a
single disk.

--
With respect,
Roman
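The checksum-gated fallback described above — the second copy is consulted only when the first fails verification — can be modelled roughly like this (a sketch; zlib.crc32 stands in for the crc32c that btrfs actually uses, and the real read path lives in the kernel):

```python
import zlib

def read_with_fallback(copies, expected_crc):
    """Return the first mirror copy whose checksum verifies; the next
    mirror is only touched if verification fails.  If every copy is
    bad, the failure surfaces as an I/O error."""
    for data in copies:
        if zlib.crc32(data) == expected_crc:
            return data
    raise IOError("all copies failed checksum verification")

good, corrupt = b"extent data", b"extent dat\x00"
assert read_with_fallback([corrupt, good], zlib.crc32(good)) == good
```

The point for performance: because verification happens per copy, reading both mirrors at once would buy no extra integrity, only extra I/O.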
* Re: dstat shows unexpected result for two disk RAID1
From: Chris Murphy @ 2016-03-09 21:43 UTC (permalink / raw)
To: Roman Mamedov; +Cc: Nicholas D Steeves, Btrfs BTRFS

On Wed, Mar 9, 2016 at 2:36 PM, Roman Mamedov <rm@romanrm.net> wrote:
> It's a known limitation that the disks are in effect "pinned" to
> running processes, based on their process ID.  One process reads from
> the same disk, from the point it started until it terminates.  Other
> processes may by luck read from a different disk, thus achieving load
> balancing.  Or they may not, and you will have contention on one disk
> while the other idles.  This is unlike MD RAID1, which knows to
> distribute read load dynamically to the least-utilized array members.

This is a better qualification than my answer.

> Now if you want to do some more performance evaluation, check with your
> dstat whether both disks *write* data in parallel when you write to the
> array, as ideally they should.  Last I checked they mostly didn't, and
> this almost halved write performance on a Btrfs RAID1 compared to a
> single disk.

I've found it to be about the same or slightly less than a single disk.
But most of my writes to raid1 are btrfs receive.

--
Chris Murphy
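For contrast, the dynamic balancing attributed to MD RAID1 above boils down to routing each read to the least-busy member (a simplification; md's real read_balance also weighs things like head position and non-rotational devices):

```python
def pick_member(inflight):
    """Return the index of the array member with the fewest in-flight
    requests, so no device sits idle while another is saturated."""
    return min(range(len(inflight)), key=lambda d: inflight[d])

assert pick_member([8, 2]) == 1   # member 1 is less busy
assert pick_member([0, 5]) == 0
```

Unlike the PID-pinned policy, this adapts per request, so a single heavy reader can still end up using both devices.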
* Re: dstat shows unexpected result for two disk RAID1
From: Nicholas D Steeves @ 2016-03-09 22:08 UTC (permalink / raw)
To: Btrfs BTRFS

On 9 March 2016 at 16:43, Chris Murphy <lists@colorremedies.com> wrote:
> On Wed, Mar 9, 2016 at 2:36 PM, Roman Mamedov <rm@romanrm.net> wrote:
>> Now if you want to do some more performance evaluation, check with
>> your dstat whether both disks *write* data in parallel when you write
>> to the array, as ideally they should.  Last I checked they mostly
>> didn't, and this almost halved write performance on a Btrfs RAID1
>> compared to a single disk.
>
> I've found it to be about the same or slightly less than single disk.
> But most of my writes to raid1 are btrfs receive.

Here are my results for running

  pv /tmpfs_mem_disk/deleteme.tar -pabet > /scratch/deleteme.tar

after clearing all caches.  Pv states the average rate was 77MiB/s,
which seems low for a 4GB file.  Here is the dstat section showing
peak rates while writing:

----system---- --dsk/total- ---dsk/sdb-- ---dsk/sdd--
     time     | read  writ : read  writ : read  writ
09-03 16:48:43|  48k  145M :   0    74M :  48k   72M
09-03 16:48:44|   0   120M :   0    74M :   0    46M
09-03 16:48:45| 840k  144M :   0    74M :   0    70M
09-03 16:48:46|   0   147M :   0    80M :   0    67M

and for reading many >200MB raw WAVs from one subvolume while writing
a ~20GB tar to another subvolume:

09-03 16:59:57|  56M  103M :   0    54M :  56M   50M
09-03 16:59:58|  48M  118M :  32k   56M :  48M   62M
09-03 16:59:59|  54M  113M :   0    57M :  54M   55M
09-03 17:00:00|  43M  116M :   0    54M :  43M   63M
09-03 17:00:01|  60M  118M :   0    64M :  60M   54M
09-03 17:00:02|  57M   97M :  32k   48M :  54M   49M
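Reading the first write sample above: both members do sustain writes simultaneously here, and the total column is essentially the per-disk sum (figures from the 16:48:43 line; dstat rounds its columns, so the sum can differ from the total by about 1 MiB/s):

```python
# Per-disk write rates in MiB/s from the 16:48:43 dstat sample
sdb_writ, sdd_writ = 74, 72
total_writ = sdb_writ + sdd_writ
assert total_writ == 146   # dstat's dsk/total column shows ~145M
```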
* Re: dstat shows unexpected result for two disk RAID1
From: Duncan @ 2016-03-10 4:06 UTC (permalink / raw)
To: linux-btrfs

Roman Mamedov posted on Thu, 10 Mar 2016 02:36:27 +0500 as excerpted:

> It's a known limitation that the disks are in effect "pinned" to
> running processes, based on their process ID.  One process reads from
> the same disk, from the point it started until it terminates.  Other
> processes may by luck read from a different disk, thus achieving load
> balancing.  Or they may not, and you will have contention on one disk
> while the other idles.  This is unlike MD RAID1, which knows to
> distribute read load dynamically to the least-utilized array members.
>
> Now if you want to do some more performance evaluation, check with your
> dstat whether both disks *write* data in parallel when you write to the
> array, as ideally they should.  Last I checked they mostly didn't, and
> this almost halved write performance on a Btrfs RAID1 compared to a
> single disk.

As stated, at present btrfs mostly handles devices one at a time per
task.  (I've made it a personal point to try not to say disks, because
SSD, etc, unless it's /specific/ to spinning rust; device remains
correct.)  And for raid1 reads in particular, the read scheduler is a
very simple even/odd-PID-based scheduler, implemented early on when
simplicity of implementation and easy testing of the single-task
single-device, multi-task multi-device, and
multi-task-bottlenecked-to-single-device scenarios, all three, was of
far more prime consideration than speed.

Indeed, at that point, optimization would have been a prime example of
"premature optimization", as it would almost certainly have either
restricted various feature-implementation choices later on, or would
have needed to be redone once those features and their constraints were
known, thus losing the work done in the first optimization.

And in fact, I've pointed out this very thing as an easily seen example
of why btrfs isn't yet fully stable or production-ready -- as can be
seen in the work of the developers themselves.  Any developer worth the
name will be very wary of the dangers of "premature optimization" and
the risk it brings of either severely limiting the implementation of
further features, or of having good work thrown out because it doesn't
match the new code.  When the devs consider the btrfs code stable
enough, they'll optimize this.  Until then, it's prime evidence that
they do _not_ consider btrfs stable and mature enough for this sort of
optimization just yet. =:^)

Meanwhile, for quite some time (since at least kernel 3.5, when raid56
was expected in kernel 3.6), N-way-mirroring has been on the roadmap
for implementation after raid56 -- basically, raid1 the way mdraid does
it, so 5 devices means 5 mirrors, not the precisely two mirrors of each
chunk, with new chunks distributed across the other devices until
they've all been used, that we have now (tho that would continue to be
an option).  And FWIW, N-way-mirroring is a primary feature interest of
mine, so I've been following it more closely than much of btrfs
development.

Of course the logical raid10 extension of that would be the ability to
specify N mirrors and M stripes on raid10 as well, so that for a
6-device raid10 you could choose between the existing two-way-mirroring,
three-way-striping, and a new three-way-mirroring, two-way-striping
mode, tho I don't know if they'll implement both N-way-mirroring raid1
and N-way-mirroring raid10 at the same time, or wait on the latter.
Either way, my point in bringing up N-way-mirroring is that it has been
roadmapped for quite some time, and with it roadmapped, attempting
either two-way-only optimization or N-way optimization now arguably
_would_ be premature optimization, because the first would have to be
redone for N-way once it became available, and there's no way to test
that the second actually works beyond two-way until N-way is actually
available.  So I'd guess N-way read optimization, with N=2 just one of
the possibilities, will come after N-way-mirroring, which in turn has
long been roadmapped for after raid56.

Meanwhile, while parity-raid (aka raid56) isn't as bad as it was when
first nominally completed in 3.19, as of 4.4 (and I think 4.5, as I've
not seen a full trace yet, let alone a fix) there's still at least one
known bug remaining to be traced down and exterminated.  It's causing
at least some raid56 reshapes to different numbers of devices, or
recovery from a lost device, to take at least 10 times as long as they
logically should -- we're talking times of weeks to months, during
which the array can be used, but if it's a bad-device replacement and
more devices go down in that time...  So even if it's not an immediate
data-loss bug, it's still a blocker in terms of actually using
parity-raid for the purposes parity-raid is normally used for.

So raid56, while nominally complete now (after nearly four /years/ of
work, remember -- originally it was intended for kernel 3.5 or 3.6),
still isn't anything close to as stable as the rest of btrfs, and is
still requiring developer focus, so it could be a while before we see
that N-way-mirroring that was roadmapped after it, which in turn means
it'll likely be even longer before we see good raid1 read optimization.

Tho hopefully all the really tough problems they would have hit with
N-way-mirroring were hit and resolved with raid56, and N-way-mirroring
will thus be relatively simple, so hopefully it takes less than the
four years it's taking raid56.  But I don't expect to see it for
another year or two, and don't expect it to be actually usable as
intended (as a more failure-resistant raid1) for some time after that
as the bugs get worked out, so realistically, 2-3 years.

If multi-device scheduling optimization is done in say 6 months after
that... that means we're looking at 2.5-3.5 years, perhaps longer, for
it.  So it's a known issue, yes, and on the roadmap, yes, but don't
expect to see anything in the near (under-2-year) future, more like the
intermediate (3-5 year) future.  In all honesty I don't seriously
expect it to be long-term future, beyond 5 years, but it's possible.

--
Duncan - List replies preferred.  No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
* Re: dstat shows unexpected result for two disk RAID1
From: Chris Murphy @ 2016-03-10 5:01 UTC (permalink / raw)
To: Btrfs BTRFS

On Wed, Mar 9, 2016 at 9:06 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> Tho hopefully all the really tough problems they would have hit with
> N-way-mirroring were hit and resolved with raid56, and N-way-mirroring
> will thus be relatively simple, so hopefully it takes less than the
> four years it's taking raid56.  But I don't expect to see it for
> another year or two, and don't expect it to be actually usable as
> intended (as a more failure-resistant raid1) for some time after that
> as the bugs get worked out, so realistically, 2-3 years.
>
> If multi-device scheduling optimization is done in say 6 months after
> that... that means we're looking at 2.5-3.5 years, perhaps longer, for
> it.  So it's a known issue, yes, and on the roadmap, yes, but don't
> expect to see anything in the near (under-2-year) future, more like the
> intermediate (3-5 year) future.  In all honesty I don't seriously
> expect it to be long-term future, beyond 5 years, but it's possible.

Meh, encryption RFC patches arrived 8 days ago and I wasn't expecting
that to happen for a couple of years.  So I think our expectations have
almost no bearing on feature or fix arrival.  For all we know, n-way
could appear in 4.6.

--
Chris Murphy
* Re: dstat shows unexpected result for two disk RAID1
From: Duncan @ 2016-03-10 8:10 UTC (permalink / raw)
To: linux-btrfs

Chris Murphy posted on Wed, 09 Mar 2016 22:01:21 -0700 as excerpted:

> Meh, encryption RFC patches arrived 8 days ago and I wasn't expecting
> that to happen for a couple of years.  So I think our expectations have
> almost no bearing on feature or fix arrival.  For all we know, n-way
> could appear in 4.6.

The crypto RFC patches were out of left field for me as well.  But if
n-way does show up effectively "tomorrow" (as it would need to, to hit
4.6), I'd suspect someone with money to spend prioritized it, much as I
suspect happened with the crypto patches.

Call me a conspiracy nut, but don't be too surprised if someone's
introducing some product with btrfs and encrypted subvolumes a year or
18 months from now... I know I won't be! =:^)

--
Duncan - List replies preferred.  No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
* Re: dstat shows unexpected result for two disk RAID1
From: Nicholas D Steeves @ 2016-03-12 0:04 UTC (permalink / raw)
To: Btrfs BTRFS

On 9 March 2016 at 23:06, Duncan <1i5t5.duncan@cox.net> wrote:
> Meanwhile, while parity-raid (aka raid56) isn't as bad as it was when
> first nominally completed in 3.19, as of 4.4 (and I think 4.5, as I've
> not seen a full trace yet, let alone a fix) there's still at least one
> known bug remaining to be traced down and exterminated.  It's causing
> at least some raid56 reshapes to different numbers of devices, or
> recovery from a lost device, to take at least 10 times as long as they
> logically should -- we're talking times of weeks to months, during
> which the array can be used, but if it's a bad-device replacement and
> more devices go down in that time...
>
> Tho hopefully all the really tough problems they would have hit with
> N-way-mirroring were hit and resolved with raid56, and N-way-mirroring
> will thus be relatively simple, so hopefully it takes less than the
> four years it's taking raid56.  But I don't expect to see it for
> another year or two, and don't expect it to be actually usable as
> intended (as a more failure-resistant raid1) for some time after that
> as the bugs get worked out, so realistically, 2-3 years.

Could the raid5 code be patched to copy/read instead of build/check
parity?  In effect, I'm wondering if this could be used as an
alternative to the current raid1 profile, the bonus being that it
seems like it might accelerate shaking out the bugs in raid5.
Likewise, would doing the same with the raid6 code in effect implement
a 3-way mirror distributed over n devices?

Kind regards,
Nicholas
* Re: dstat shows unexpected result for two disk RAID1
From: Nicholas D Steeves @ 2016-03-12 0:10 UTC (permalink / raw)
To: Btrfs BTRFS

P.S. Rather than parity, I mean instead of distributing into stripes,
do a copy!
* Re: dstat shows unexpected result for two disk RAID1
From: Chris Murphy @ 2016-03-12 1:20 UTC (permalink / raw)
To: Nicholas D Steeves; +Cc: Btrfs BTRFS

On Fri, Mar 11, 2016 at 5:10 PM, Nicholas D Steeves <nsteeves@gmail.com> wrote:
> P.S. Rather than parity, I mean instead of distributing into stripes,
> do a copy!

raid56 is by definition parity-based, so I'd say no; it would be
confusing to turn it into something it's not.

--
Chris Murphy
* Re: dstat shows unexpected result for two disk RAID1
From: Nicholas D Steeves @ 2016-04-06 3:58 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS

On 11 March 2016 at 20:20, Chris Murphy <lists@colorremedies.com> wrote:
> On Fri, Mar 11, 2016 at 5:10 PM, Nicholas D Steeves <nsteeves@gmail.com> wrote:
>> P.S. Rather than parity, I mean instead of distributing into stripes,
>> do a copy!
>
> raid56 is by definition parity-based, so I'd say no; it would be
> confusing to turn it into something it's not.

I just found the Multiple Device Support diagram.  I'm trying to figure
out how hard it's going to be for me to get up to speed, because I've
only ever casually and informally read about filesystems.  I worry
that, because I didn't study filesystem design in school, and because
everything I worked on was in C++, the level of sophistication of the
design might be beyond what I can learn.  What do you think?  Can you
recommend any books on filesystem design that will provide what is
necessary to understand btrfs?

Cheers,
Nicholas
* Re: dstat shows unexpected result for two disk RAID1 2016-04-06 3:58 ` Nicholas D Steeves @ 2016-04-06 12:02 ` Austin S. Hemmelgarn 2016-04-22 22:36 ` Nicholas D Steeves 0 siblings, 1 reply; 18+ messages in thread From: Austin S. Hemmelgarn @ 2016-04-06 12:02 UTC (permalink / raw) To: Nicholas D Steeves; +Cc: Chris Murphy, Btrfs BTRFS On 2016-04-05 23:58, Nicholas D Steeves wrote: > On 11 March 2016 at 20:20, Chris Murphy <lists@colorremedies.com> wrote: >> On Fri, Mar 11, 2016 at 5:10 PM, Nicholas D Steeves <nsteeves@gmail.com> wrote: >>> P.S. Rather than parity, I mean instead of distributing into stripes, do a copy! >> >> raid56 by definition are parity based, so I'd say no that's confusing >> to turn it into something it's not. > > I just found the Multiple Device Support diagram. I'm trying to > figure out how hard it's going for me to get up to speed, because I've > only ever casually and informally read about filesystems. I worry > that because I didn't study filesystem design in school, and because > everything I worked on was in C++...well, the level of sophistication > and design might be beyond what I can learn. What do you think? Can > you recommend any books on file system design that will provide what > is necessary to understand btrfs? While I can't personally recommend any books on filesystem design, I can give some more general advice:
1. Make sure you have at least a basic understanding of how things work at a high level from the user perspective. It's a lot easier to understand the low-level stuff if you know how it all ends up fitting together. Back when I started looking at the internals of BTRFS I was pretty lost myself. I still am to a certain extent when it comes to the kernel code (most of my background is in Python, Lua, or Bourne Shell, not C, and I don't normally deal with data structures at such a low level), but as I've used it more on my systems, a lot of stuff that seemed cryptic at first is making a lot more sense.
2. Keep in mind that there are a number of things in BTRFS that have no equivalent in other filesystems, or are not typical filesystem design topics. The multi-device support for example is pretty much non-existent as a filesystem design topic because it's traditionally handled by lower levels like LVM.
3. The Linux VFS layer is worth taking a look at, as it handles the translation between the low-level ABI provided by each filesystem and the user-level API. Most of the stuff that BTRFS provides through it is rather consistent with the user level API, but understanding what translation goes on there can be helpful to understanding some of the higher-level internals in BTRFS. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: dstat shows unexpected result for two disk RAID1 2016-04-06 12:02 ` Austin S. Hemmelgarn @ 2016-04-22 22:36 ` Nicholas D Steeves 0 siblings, 0 replies; 18+ messages in thread From: Nicholas D Steeves @ 2016-04-22 22:36 UTC (permalink / raw) To: Austin S. Hemmelgarn; +Cc: Chris Murphy, Btrfs BTRFS Everyone, thank you very much for helping me to learn more. Getting up to speed takes forever! I posted an idea relating to this thread, though it's more read-latency than throughput related. I'm not sure of the right way to link overlapping threads, so here is how to find it: Date: Fri, 22 Apr 2016 18:14:00 -0400 Message-ID: <CAD=QJKgJ9JAgZAOSivJTL-bcLbdkP6UqGb0i6g=fS9j6XKtcLA@mail.gmail.com> Subject: Re: [PATCH v8 00/27][For 4.7] Btrfs: Add inband (write time) de-duplication framework WRT the original AG balanced-IO optimisation problem: should I spend most of my time reading disk-io.c looking for opportunities to optimize multi-device reads, and should I consult xfs_file.c and xfs_super.c, or just xfs_super.c? Thanks, Nicholas ^ permalink raw reply [flat|nested] 18+ messages in thread
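For context on the multi-device read path being asked about: in btrfs of this era, the raid1 read mirror was, as I understand the code, chosen from the submitting process's PID (roughly `pid % num_mirrors`, in the mirror-selection logic in fs/btrfs/volumes.c rather than disk-io.c). That policy explains the original dstat observation: one IO-heavy process keeps hitting the same disk. A toy model of that selection, illustrative only:

```python
# Toy model of btrfs raid1 read balancing circa 2016: the mirror is
# derived from the reading process's PID, so a single process always
# reads from the same device, while processes with different PID
# parity spread across the two mirrors. Illustrative only -- the real
# kernel logic lives in fs/btrfs/volumes.c.

def pick_mirror(pid, num_mirrors=2):
    """Select which mirror services a read, PID-modulo style."""
    return pid % num_mirrors

# One busy process: every read goes to the same mirror.
reads = [pick_mirror(pid=1234) for _ in range(4)]
print(reads)  # [0, 0, 0, 0]

# Two processes whose PIDs differ in parity land on different mirrors.
print(pick_mirror(1234), pick_mirror(1235))  # 0 1
```

Any improvement to multi-device read throughput for a single reader would mean replacing this per-PID policy with something queue-depth- or offset-aware, which is a scheduling question more than a disk-io.c question.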
end of thread, other threads:[~2016-04-22 22:36 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page --
2016-03-09 20:21 dstat shows unexpected result for two disk RAID1 Nicholas D Steeves
2016-03-09 20:25 ` Nicholas D Steeves
2016-03-09 20:50 ` Goffredo Baroncelli
2016-03-09 21:26 ` Chris Murphy
2016-03-09 22:51 ` Nicholas D Steeves
2016-03-11 23:42 ` Nicholas D Steeves
2016-03-09 21:36 ` Roman Mamedov
2016-03-09 21:43 ` Chris Murphy
2016-03-09 22:08 ` Nicholas D Steeves
2016-03-10 4:06 ` Duncan
2016-03-10 5:01 ` Chris Murphy
2016-03-10 8:10 ` Duncan
2016-03-12 0:04 ` Nicholas D Steeves
2016-03-12 0:10 ` Nicholas D Steeves
2016-03-12 1:20 ` Chris Murphy
2016-04-06 3:58 ` Nicholas D Steeves
2016-04-06 12:02 ` Austin S. Hemmelgarn
2016-04-22 22:36 ` Nicholas D Steeves