* btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? @ 2016-01-22 13:38 Christian Rohmann 2016-01-22 14:51 ` Duncan 2016-01-24 2:30 ` Henk Slager 0 siblings, 2 replies; 28+ messages in thread From: Christian Rohmann @ 2016-01-22 13:38 UTC (permalink / raw) To: linux-btrfs Hello btrfs-folks, I am currently doing a big "btrfs balance" to extend an 8-drive RAID6 to 12 drives using "btrfs balance start -dstripes 1..11 -mstripes 1..11" With kernel 4.4 and btrfs-progs 4.4 it's been running fine for a few days now and the new disks are slowly getting more and more extents. But somehow the process is VERY slow (3% in 3 days) and there is almost no additional disk utilization. The process doing the balance is at 100% cpu (one core), so apparently the whole thing is very much single-threaded and therefore CPU-bound in this case. Is this a known issue, or is there anything I can do to speed this up? I mean, the disks have plenty of iops left to work with and the box has many more CPU cores idling away. Regards Christian ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-01-22 13:38 btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? Christian Rohmann @ 2016-01-22 14:51 ` Duncan 2016-01-24 2:30 ` Henk Slager 1 sibling, 0 replies; 28+ messages in thread From: Duncan @ 2016-01-22 14:51 UTC (permalink / raw) To: linux-btrfs Christian Rohmann posted on Fri, 22 Jan 2016 14:38:11 +0100 as excerpted: > I am currently doing a big "btrfs balance" to extend a 8 drive RAID6 to > 12 drives using > "btrfs balance start -dstripes 1..11 -mstripes 1..11" > > With kernel 4.4 and btrfs progs 4.4 it's running fine for a few days now > and the new disks are slowing getting more and more extents. > But somehow the process is VERY slow (3% in 3 days) and there is almost > no additional disk utilization. > > The process doing the balance is doing 100% cpu (one core) so apparently > the whole thing is very much single threaded and therefore CPU-bound in > this case. > > Is this a known issue or is there anything I can do to speed this up? I > mean the disks have plenty of iops left to work with and the box has > many more CPU cores idling away. [This is only intended to be a stop-gap reply, until someone with more detailed/direct knowledge/experience on the topic can reply.] My own use-case is btrfs raid1, but from what I've seen on the list, raid56 mode maintenance that involves recalculating parity, as converting from an 8-device stripe to a 12-device stripe will, is indeed /very/ slow. I didn't know it was single-core limited, however. If it's slow/complex calculations, AND limited to a single core, plus given the likely size of a filesystem of 8-12 devices in the day of multi-TB devices... ~1%/day, 100 days to complete... Ouch, that's going to be painful! The good thing is that it happens online, so you can be using the filesystem and the other cores while it's happening. Plus, balances are interruptible. 
You can reboot or whatever and it should pick up and continue where it left off. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 28+ messages in thread
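Duncan's ~100-day figure follows directly from the reported rate. As a quick sanity check (a sketch using the 3%-in-3-days numbers from the report above, and assuming the rate stays constant):

```python
# Extrapolate balance completion time from observed progress,
# assuming the observed percent-per-day rate stays constant.
def balance_eta_days(percent_done: float, days_elapsed: float) -> float:
    """Days of work remaining at the observed rate."""
    rate = percent_done / days_elapsed        # percent per day
    return (100.0 - percent_done) / rate      # days for the remaining work

print(balance_eta_days(3.0, 3.0))  # 97.0 -> ~100 days in total
```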
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-01-22 13:38 btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? Christian Rohmann 2016-01-22 14:51 ` Duncan @ 2016-01-24 2:30 ` Henk Slager 2016-01-25 11:34 ` Christian Rohmann 1 sibling, 1 reply; 28+ messages in thread From: Henk Slager @ 2016-01-24 2:30 UTC (permalink / raw) To: linux-btrfs On Fri, Jan 22, 2016 at 2:38 PM, Christian Rohmann <crohmann@netcologne.de> wrote: > Hello btrfs-folks, > > I am currently doing a big "btrfs balance" to extend a 8 drive RAID6 to > 12 drives using > "btrfs balance start -dstripes 1..11 -mstripes 1..11" I am not sure why you use/need the stripes filter here; in fact I think you want a full balance. If you cancel sometime during the ongoing balance and then later want to continue, it might be needed in order not to redo the already balanced chunks; maybe that is the case. > With kernel 4.4 and btrfs progs 4.4 it's running fine for a few days now > and the new disks are slowing getting more and more extents. > But somehow the process is VERY slow (3% in 3 days) and there is almost > no additional disk utilization. > > The process doing the balance is doing 100% cpu (one core) so apparently > the whole thing is very much single threaded and therefore CPU-bound in > this case. > > Is this a known issue or is there anything I can do to speed this up? I > mean the disks have plenty of iops left to work with and the box has > many more CPU cores idling away. I have been using raid5 with kernels 3.11..4.1.6 and several disk swaps (add command, delete command, dd, but not replace command). Before raid5 functionality was complete in the kernel, low-level operations were OK w.r.t. speed (like a raid0) as far as I remember. With later kernels I remember the operations being very slow, with very high cpu load. It has been single-core (3.x kernels I believe), but also multi-core and still slow. 
In fact so slow that samba gave up and the filesystem/server was simply unusable for hours/days/weeks. One reason was that I wanted 4x 4TB disks and was halfway through that upgrade (2x 2TB + 2x 4TB). As balances were crashing and very slow, btrfs was using 4x 2TB for 'normal' raid5 (data0 + data1 + parity), but for the second half of the 4TB disks just data + parity. The 'normal' raid5 involving the 2TB disks was very slow, with high fragmentation etc. So my experience is: yes, it is or can be slow, very slow. Also scrub is roughly 10x slower (with 4.3.x kernels at least) than it should be. A reason is likely that readahead for raid56 is currently not working for some operations, though not for all AFAIU (see patches on the list). If you use iostat you will get an idea of the speed. It might also be that there are 512 and 4096 sector size effects, but this is just speculation. It might be that a plain full balance, with no filters, runs faster; you could try that. Otherwise I wouldn't know how to speed it up; hopefully the fs is still usable while balancing. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-01-24 2:30 ` Henk Slager @ 2016-01-25 11:34 ` Christian Rohmann 2016-01-25 22:13 ` Chris Murphy 0 siblings, 1 reply; 28+ messages in thread From: Christian Rohmann @ 2016-01-25 11:34 UTC (permalink / raw) To: Henk Slager, linux-btrfs Hey there Henk, btrfs-enthusiasts, On 01/24/2016 03:30 AM, Henk Slager wrote: > It might be that just a full balance runs faster, so no filters, you > could try that. Otherwise I wouldn't know how to speedup, hopefully > the fs is still usable while balancing. Yes, the FS is still usable; munin shows just a little increase in iops and disk latency. The filter should not affect the performance of a balance at all. I am simply telling it to only consider chunks which are not spread across all disks yet. Finding out a chunk's data distribution should not add any burden to the balancing. The balancing is still VERY VERY slow; we still have 93% left to balance. But since I did not hit any hardware limit (CPU or disk IO), I am confident in saying btrfs balance is buggy in this regard. CPU single-thread performance will not explode anytime soon, but disks (or SSDs) will keep growing in size, and so will their potential iops. With growing an array from 8 to 12 disks I am not doing something crazy that has never been done before on a storage array either ;-) Regards Christian ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-01-25 11:34 ` Christian Rohmann @ 2016-01-25 22:13 ` Chris Murphy [not found] ` <CAKZK7uxdX9UBPOKButtPjqBOdVUfHdRTimP+W34fkz1h9P+wHg@mail.gmail.com> 0 siblings, 1 reply; 28+ messages in thread From: Chris Murphy @ 2016-01-25 22:13 UTC (permalink / raw) To: Christian Rohmann; +Cc: Henk Slager, linux-btrfs On Mon, Jan 25, 2016 at 4:34 AM, Christian Rohmann <crohmann@netcologne.de> wrote: > Hey there Henk, btrfs-enthusiasts, > > > On 01/24/2016 03:30 AM, Henk Slager wrote: >> It might be that just a full balance runs faster, so no filters, you >> could try that. Otherwise I wouldn't know how to speedup, hopefully >> the fs is still usable while balancing. > > Yes the FS is still usable, munin shows just a little increate in iops > and disk latency. The filter should not affect the performance of a > balance at all. I am simply saying to only consider chunks which are not > spread across all disks yet. Finding out a chunks data distribution > should not add any burden on the balancing. > > The balancing is still VERY VERY slow, we still have 93% left to > balance. But since I did not hit any hardware limit (CPU or disk IO) I > am confident to say btrfs-balance is buggy in this regard. CPU single > thread performance will not explode anytime soon. But disks (or SSD) > will still grow in size and so will their potential iops. > > With a 8 - 12 disk array growth I am not doing something crazy that has > never been done before on a storage array either ;-) Does anyone suspect a kernel regression here? I wonder if it's worth it to suggest testing the current version of all fairly recent kernels: 4.5.rc1, 4.4, 4.3.4, 4.2.8, 4.1.16? I think going farther back than 3.18.x isn't worth it, since that's before the major raid56 work was done. Quite a while ago I did a raid56 rebuild and balance that was pretty fast, but it was only a 4 or 5 device test. 
-- Chris Murphy ^ permalink raw reply [flat|nested] 28+ messages in thread
[parent not found: <CAKZK7uxdX9UBPOKButtPjqBOdVUfHdRTimP+W34fkz1h9P+wHg@mail.gmail.com>]
* Fwd: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? [not found] ` <CAKZK7uxdX9UBPOKButtPjqBOdVUfHdRTimP+W34fkz1h9P+wHg@mail.gmail.com> @ 2016-01-26 0:44 ` Justin Brown 2016-01-26 5:17 ` Chris Murphy 0 siblings, 1 reply; 28+ messages in thread From: Justin Brown @ 2016-01-26 0:44 UTC (permalink / raw) To: linux-btrfs > Does anyone suspect a kernel regression here? I wonder if its worth it > to suggest testing the current version of all fairly recent kernels: > 4.5.rc1, 4.4, 4.3.4, 4.2.8, 4.1.16? I don't have any useful information about parity RAID modes or large arrays, so this might be totally useless. Nonetheless, just last week I added a 2TB drive to an existing Btrfs raid10 array (5x 2TB before addition) and did a balance afterwards. I didn't record any numbers, but I was frequently looking at htop and iotop. I thought the numbers were extremely good: 100-120MB/s sustained for each drive with the "total" reported by iotop exceeding 600MB/s. That's with the integrated sata controller on an Intel Z97 mini-ITX motherboard (cpu i4770). Significantly faster than anticipated. I started it one evening, and it was finished when I awoke the next morning. That was on 4.2.8-300.fc23.x86_64 with btrfs-progs 4.3.1. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-01-26 0:44 ` Fwd: " Justin Brown @ 2016-01-26 5:17 ` Chris Murphy 2016-01-26 6:14 ` Chris Murphy 0 siblings, 1 reply; 28+ messages in thread From: Chris Murphy @ 2016-01-26 5:17 UTC (permalink / raw) To: linux-btrfs On Mon, Jan 25, 2016 at 5:44 PM, Justin Brown <justin.brown@fandingo.org> wrote: >> Does anyone suspect a kernel regression here? I wonder if its worth it >> to suggest testing the current version of all fairly recent kernels: >> 4.5.rc1, 4.4, 4.3.4, 4.2.8, 4.1.16? > > I don't have any useful information about parity RAID modes or large > arrays, so this might be totally useless. Nonetheless, just last week > I added a 2TB drive to an existing Btrfs raid10 array (5x 2TB before > addition) and did a balance afterwards. I don't take any numbers, but > I was frequently looking at htop and iotop. I thought the numbers were > extremely good: 100-120MB/s sustained for each drive with the "total" > reported by iotop exceeding 600MB/s. That's with integrated sata > controller on an Intel Z97 mini-ITX motherboard (cpu i4770). > Significantly faster than anticipated. I started it one evening, and > it was finished when I awoke the next morning. > > That was on 4.2.8-300.fc23.x86_64 with btrfs-progs 4.3.1. That's been my experience also with raid0 and 10. Because p+q computation is more expensive with raid6, it may need testing specifically with raid6. If Christian can successfully cancel balance, umount, then reboot into another kernel version and retry, it might be useful in tracking down the problem (or someone else willing to test). I'd do it but I don't have enough drive space at the moment to do it with anything other than VM and qcow2 files on a single SSD, although that should at least saturate the SSD or close to it. If so, it would still be faster than what Christian is reporting. -- Chris Murphy ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-01-26 5:17 ` Chris Murphy @ 2016-01-26 6:14 ` Chris Murphy 2016-01-26 8:54 ` Christian Rohmann 0 siblings, 1 reply; 28+ messages in thread From: Chris Murphy @ 2016-01-26 6:14 UTC (permalink / raw) To: Christian Rohmann; +Cc: linux-btrfs 1495MiB used raid6 in a VM using 4x qcow2 files on an SSD. Host is using kernel 4.4.0. Guest is using kernel 4.5.0rc0.git9.1.fc24 (this is a Fedora Rawhide debug kernel so it'll be a bit slower) with btrfs-progs 4.3.1. Not degraded, balance takes 11 seconds, that's ~136MiB/s. iotop isn't consistent; max is ~300MiB/s write. Reboot with 1 device missing and a new empty qcow2 in its place, mount degraded, and 'btrfs replace' takes 13 seconds according to 'btrfs replace status'. I don't know that this is very useful information though. Christian, what are you getting for 'iotop -d3 -o' or 'iostat -d3'? Is it consistent or is it fluctuating all over the place? What sort of eyeball avg/min/max are you getting? Chris Murphy ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-01-26 6:14 ` Chris Murphy @ 2016-01-26 8:54 ` Christian Rohmann 2016-01-26 19:26 ` Chris Murphy 0 siblings, 1 reply; 28+ messages in thread From: Christian Rohmann @ 2016-01-26 8:54 UTC (permalink / raw) To: Chris Murphy; +Cc: linux-btrfs Hey Chris and all, On 01/25/2016 11:13 PM, Chris Murphy wrote: > Does anyone suspect a kernel regression here? I wonder if its worth it > to suggest testing the current version of all fairly recent kernels: > 4.5.rc1, 4.4, 4.3.4, 4.2.8, 4.1.16? I think going farther back to > 3.18.x isn't worth it since that's before the major work since raid56 > was added. Quite a while ago I've done a raid56 rebuild and balance > that was pretty fast but it was only a 4 or 5 device test. Problem is that this balance did not work before going to the 4.4 kernel; it was simply crashing after about an hour or two of runtime. Currently I am using the 4.4 kernel + btrfs-progs, so apart from 4.5rc1 I cannot get any more bleeding edge. 4.5 I am happy to try, but not RC1, as there are already some bugs popping up regarding the BTRFS changes. On 01/26/2016 07:14 AM, Chris Murphy wrote: > Christian, what are you getting for 'iotop -d3 -o' or 'iostat -d3'. Is > it consistent or is it fluctuating all over the place? What sort of > eyeball avg/min/max are you getting? "1672.81 K/s 1672.81 K/s 0.00 % 6.99 % btrfs balance start -dstripes 1..11 -mstripes 1..11 " It's jumping up to 25MB/s for a few polls, but most of the time it's at 1.3 to 1.7 MB/s. You may check out the various munin graphs of the box if you like: * http://mirror.netcologne.de/munin has all the goods. This also brings me to mention that the disks (all 12 of them!) constantly read somewhere between 20 and 60MB/s. Regards Christian ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-01-26 8:54 ` Christian Rohmann @ 2016-01-26 19:26 ` Chris Murphy 2016-01-26 19:27 ` Chris Murphy 2016-01-26 19:57 ` Austin S. Hemmelgarn 0 siblings, 2 replies; 28+ messages in thread From: Chris Murphy @ 2016-01-26 19:26 UTC (permalink / raw) To: Christian Rohmann; +Cc: Chris Murphy, linux-btrfs On Tue, Jan 26, 2016 at 1:54 AM, Christian Rohmann <crohmann@netcologne.de> wrote: > Hey Chris and all, > > On 01/25/2016 11:13 PM, Chris Murphy wrote: >> Does anyone suspect a kernel regression here? I wonder if its worth it >> to suggest testing the current version of all fairly recent kernels: >> 4.5.rc1, 4.4, 4.3.4, 4.2.8, 4.1.16? I think going farther back to >> 3.18.x isn't worth it since that's before the major work since raid56 >> was added. Quite a while ago I've done a raid56 rebuild and balance >> that was pretty fast but it was only a 4 or 5 device test. > > Problem is that this balance did not work before going to 4.4 kernel, > it's was simply crashing after about an hour or two of runtime. > > Currently I am using 4.4 kernel + btrfs-progs, so apart from 4.5rc1 I > can not get any more bleeding edge. > > 4.5 I am happy to try, but not RC1 as there are already some bugs > popping up regarding the BTRFS changes. > > > On 01/26/2016 07:14 AM, Chris Murphy wrote: >> Christian, what are you getting for 'iotop -d3 -o' or 'iostat -d3'. Is >> it consistent or is it fluctuating all over the place? What sort of >> eyeball avg/min/max are you getting? > > "1672.81 K/s 1672.81 K/s 0.00 % 6.99 % btrfs balance start -dstripes > 1..11 -mstripes 1..11 " > > but it's jumping up to 25MB/s for a few polls, but most of the time it's > at 1.3 to 1.7 MB/s That is really slow. The fact you can't balance without crashing prior to a 4.4 kernel makes me suspicious about the file system state. What about reading and writing files? What's the performance in that case? 
Is it just the balance that's this slow? Do you have the call traces for older kernel crashes with balance? What btrfs-progs was used to create the raid6 volume? Maybe the slowness is due to the -dstripes -mstripes filter. That's relatively new. And I didn't try that. And I also don't really understand the values you picked either. Seems to me if you've added four drives relatively recently, there won't be many chunks using 12-strip stripes, most of them will be 8-strip stripes. So I don't really know what you're limiting. -- Chris Murphy ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-01-26 19:26 ` Chris Murphy @ 2016-01-26 19:27 ` Chris Murphy 2016-01-26 19:57 ` Austin S. Hemmelgarn 1 sibling, 0 replies; 28+ messages in thread From: Chris Murphy @ 2016-01-26 19:27 UTC (permalink / raw) To: Chris Murphy; +Cc: Christian Rohmann, linux-btrfs On Tue, Jan 26, 2016 at 12:26 PM, Chris Murphy <lists@colorremedies.com> wrote: > On Tue, Jan 26, 2016 at 1:54 AM, Christian Rohmann > <crohmann@netcologne.de> wrote: >> Hey Chris and all, >> >> On 01/25/2016 11:13 PM, Chris Murphy wrote: >>> Does anyone suspect a kernel regression here? I wonder if its worth it >>> to suggest testing the current version of all fairly recent kernels: >>> 4.5.rc1, 4.4, 4.3.4, 4.2.8, 4.1.16? I think going farther back to >>> 3.18.x isn't worth it since that's before the major work since raid56 >>> was added. Quite a while ago I've done a raid56 rebuild and balance >>> that was pretty fast but it was only a 4 or 5 device test. >> >> Problem is that this balance did not work before going to 4.4 kernel, >> it's was simply crashing after about an hour or two of runtime. >> >> Currently I am using 4.4 kernel + btrfs-progs, so apart from 4.5rc1 I >> can not get any more bleeding edge. >> >> 4.5 I am happy to try, but not RC1 as there are already some bugs >> popping up regarding the BTRFS changes. >> >> >> On 01/26/2016 07:14 AM, Chris Murphy wrote: >>> Christian, what are you getting for 'iotop -d3 -o' or 'iostat -d3'. Is >>> it consistent or is it fluctuating all over the place? What sort of >>> eyeball avg/min/max are you getting? >> >> "1672.81 K/s 1672.81 K/s 0.00 % 6.99 % btrfs balance start -dstripes >> 1..11 -mstripes 1..11 " >> >> but it's jumping up to 25MB/s for a few polls, but most of the time it's >> at 1.3 to 1.7 MB/s > > > That is really slow. The fact you can't balance without crashing prior > to a 4.4 kernel makes me suspicious about the file system state. 
What > about reading and writing files? What's the performance in that case? > Is it just the balance that's this slow? Do you have the call traces > for older kernel crashes with balance? What btrfs-progs was used to > create the raid6 volume? > > Maybe the slowness is due to the -dstripes -mstripes filter. That's > relatively new. And I didn't try that. And I also don't really > understand the values you picked either. Seems to me if you've added > four drives relatively recently, there won't be many chunks using > 12-strip stripes, most of them will be 8-strip stripes. So I don't > really know what you're limiting. I guess the bottom line of what I'm suggesting before trying anything else, is to stop the balance and start a normal one without filters and see how that performs. -- Chris Murphy ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-01-26 19:26 ` Chris Murphy 2016-01-26 19:27 ` Chris Murphy @ 2016-01-26 19:57 ` Austin S. Hemmelgarn 2016-01-26 20:20 ` Chris Murphy 1 sibling, 1 reply; 28+ messages in thread From: Austin S. Hemmelgarn @ 2016-01-26 19:57 UTC (permalink / raw) To: Chris Murphy, Christian Rohmann; +Cc: linux-btrfs On 2016-01-26 14:26, Chris Murphy wrote: > On Tue, Jan 26, 2016 at 1:54 AM, Christian Rohmann > <crohmann@netcologne.de> wrote: >> Hey Chris and all, >> >> On 01/25/2016 11:13 PM, Chris Murphy wrote: >>> Does anyone suspect a kernel regression here? I wonder if its worth it >>> to suggest testing the current version of all fairly recent kernels: >>> 4.5.rc1, 4.4, 4.3.4, 4.2.8, 4.1.16? I think going farther back to >>> 3.18.x isn't worth it since that's before the major work since raid56 >>> was added. Quite a while ago I've done a raid56 rebuild and balance >>> that was pretty fast but it was only a 4 or 5 device test. >> >> Problem is that this balance did not work before going to 4.4 kernel, >> it's was simply crashing after about an hour or two of runtime. >> >> Currently I am using 4.4 kernel + btrfs-progs, so apart from 4.5rc1 I >> can not get any more bleeding edge. >> >> 4.5 I am happy to try, but not RC1 as there are already some bugs >> popping up regarding the BTRFS changes. >> >> >> On 01/26/2016 07:14 AM, Chris Murphy wrote: >>> Christian, what are you getting for 'iotop -d3 -o' or 'iostat -d3'. Is >>> it consistent or is it fluctuating all over the place? What sort of >>> eyeball avg/min/max are you getting? >> >> "1672.81 K/s 1672.81 K/s 0.00 % 6.99 % btrfs balance start -dstripes >> 1..11 -mstripes 1..11 " >> >> but it's jumping up to 25MB/s for a few polls, but most of the time it's >> at 1.3 to 1.7 MB/s > > > That is really slow. The fact you can't balance without crashing prior > to a 4.4 kernel makes me suspicious about the file system state. 
What > about reading and writing files? What's the performance in that case? > Is it just the balance that's this slow? Do you have the call traces > for older kernel crashes with balance? What btrfs-progs was used to > create the raid6 volume? > > Maybe the slowness is due to the -dstripes -mstripes filter. That's > relatively new. And I didn't try that. And I also don't really > understand the values you picked either. Seems to me if you've added > four drives relatively recently, there won't be many chunks using > 12-strip stripes, most of them will be 8-strip stripes. So I don't > really know what you're limiting. > The filters he used are telling balance to re-stripe anything spanning less than 12 devices. So, in essence, it's only going to re-stripe the chunks from before the four new disks were added. ^ permalink raw reply [flat|nested] 28+ messages in thread
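The selection behavior Austin describes can be sketched in a few lines (a simplified model of the stripes filter's range semantics, not the kernel implementation; the chunk stripe counts below are made up for illustration):

```python
def stripes_filter_matches(num_stripes: int, lo: int, hi: int) -> bool:
    """Model of the balance 'stripes' filter: a chunk is selected for
    rebalancing only if its stripe count lies within lo..hi."""
    return lo <= num_stripes <= hi

# On the 8 -> 12 device array: old chunks are striped across 8 devices,
# already-rebalanced chunks across all 12.  With -dstripes 1..11 only
# the old 8-stripe chunks are selected; full-width chunks are skipped.
chunk_stripe_counts = [8, 8, 12, 8, 12]   # hypothetical chunk layout
selected = [n for n in chunk_stripe_counts
            if stripes_filter_matches(n, 1, 11)]
print(selected)  # [8, 8, 8]
```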
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-01-26 19:57 ` Austin S. Hemmelgarn @ 2016-01-26 20:20 ` Chris Murphy 2016-01-27 8:48 ` Christian Rohmann 0 siblings, 1 reply; 28+ messages in thread From: Chris Murphy @ 2016-01-26 20:20 UTC (permalink / raw) To: Austin S. Hemmelgarn; +Cc: Chris Murphy, Christian Rohmann, linux-btrfs On Tue, Jan 26, 2016 at 12:57 PM, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote: > On 2016-01-26 14:26, Chris Murphy wrote: >> >> On Tue, Jan 26, 2016 at 1:54 AM, Christian Rohmann >> <crohmann@netcologne.de> wrote: >>> >>> Hey Chris and all, >>> >>> On 01/25/2016 11:13 PM, Chris Murphy wrote: >>>> >>>> Does anyone suspect a kernel regression here? I wonder if its worth it >>>> to suggest testing the current version of all fairly recent kernels: >>>> 4.5.rc1, 4.4, 4.3.4, 4.2.8, 4.1.16? I think going farther back to >>>> 3.18.x isn't worth it since that's before the major work since raid56 >>>> was added. Quite a while ago I've done a raid56 rebuild and balance >>>> that was pretty fast but it was only a 4 or 5 device test. >>> >>> >>> Problem is that this balance did not work before going to 4.4 kernel, >>> it's was simply crashing after about an hour or two of runtime. >>> >>> Currently I am using 4.4 kernel + btrfs-progs, so apart from 4.5rc1 I >>> can not get any more bleeding edge. >>> >>> 4.5 I am happy to try, but not RC1 as there are already some bugs >>> popping up regarding the BTRFS changes. >>> >>> >>> On 01/26/2016 07:14 AM, Chris Murphy wrote: >>>> >>>> Christian, what are you getting for 'iotop -d3 -o' or 'iostat -d3'. Is >>>> it consistent or is it fluctuating all over the place? What sort of >>>> eyeball avg/min/max are you getting? >>> >>> >>> "1672.81 K/s 1672.81 K/s 0.00 % 6.99 % btrfs balance start -dstripes >>> 1..11 -mstripes 1..11 " >>> >>> but it's jumping up to 25MB/s for a few polls, but most of the time it's >>> at 1.3 to 1.7 MB/s >> >> >> >> That is really slow. 
The fact you can't balance without crashing prior >> to a 4.4 kernel makes me suspicious about the file system state. What >> about reading and writing files? What's the performance in that case? >> Is it just the balance that's this slow? Do you have the call traces >> for older kernel crashes with balance? What btrfs-progs was used to >> create the raid6 volume? >> >> Maybe the slowness is due to the -dstripes -mstripes filter. That's >> relatively new. And I didn't try that. And I also don't really >> understand the values you picked either. Seems to me if you've added >> four drives relatively recently, there won't be many chunks using >> 12-strip stripes, most of them will be 8-strip stripes. So I don't >> really know what you're limiting. >> > The filters he used are telling balance to re-stripe anything spanning less > than 12 devices. So, in essence, it's only going to re-stripe the chunks > from before the fourth disk was added. Which is most of what's on the volume unless the 4 disks were added and used for a while, but I can't tell what the time frame is. Anyway, it seems reasonable to try a balance without the filters to see if that's a factor, because those filters are brand new in btrfs-progs 4.4. Granted, I'd expect they've been tested by upstream developers, but I don't know if there's an fstest for balance with these specific filters yet. -- Chris Murphy ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-01-26 20:20 ` Chris Murphy @ 2016-01-27 8:48 ` Christian Rohmann 2016-01-27 16:34 ` Austin S. Hemmelgarn 0 siblings, 1 reply; 28+ messages in thread From: Christian Rohmann @ 2016-01-27 8:48 UTC (permalink / raw) To: Chris Murphy, Austin S. Hemmelgarn; +Cc: linux-btrfs On 01/26/2016 09:20 PM, Chris Murphy wrote: > nyway, > it seems reasonable to try a balance without the filters to see if > that's a factor, because those filters are brand new in btrfs-progs > 4.4. Granted, I'd expect they've been tested by upstream developers, > but I don't know if there's an fstest for balance with these specific > filters yet. I have another box with 8 disks RAID6 on which I simply did a balance with no newly added drives. Same issue ... VERY slow running balance with IO nowhere near 100% utilization and many many days of runtime to finish. Regards Christian ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-01-27 8:48 ` Christian Rohmann @ 2016-01-27 16:34 ` Austin S. Hemmelgarn 2016-01-27 20:58 ` bbrendon 2016-01-27 21:53 ` Chris Murphy 0 siblings, 2 replies; 28+ messages in thread From: Austin S. Hemmelgarn @ 2016-01-27 16:34 UTC (permalink / raw) To: Christian Rohmann, Chris Murphy; +Cc: linux-btrfs On 2016-01-27 03:48, Christian Rohmann wrote: > > > On 01/26/2016 09:20 PM, Chris Murphy wrote: >> nyway, >> it seems reasonable to try a balance without the filters to see if >> that's a factor, because those filters are brand new in btrfs-progs >> 4.4. Granted, I'd expect they've been tested by upstream developers, >> but I don't know if there's an fstest for balance with these specific >> filters yet. > > I have another box with 8 disks RAID6 on which I simply did a balance > with no newly added drives. Same issue ... VERY slow running balance > with IO nowhere near 100% utilization and many many days of runtime to > finish. Hmm, I did some automated testing in a couple of VM's last night, and I have to agree, this _really_ needs to get optimized. Using the same data-set on otherwise identical VM's, I saw an average 28x slowdown (best case was 16x, worst was almost 100x) for balancing a RAID6 set versus a RAID1 set. While the parity computations add to the time, there is absolutely no way that just that can explain why this is taking so long. The closest comparison using MD or DM RAID is probably a full verification of the array, and the greatest difference there that I've seen is around 10x. ^ permalink raw reply [flat|nested] 28+ messages in thread
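For scale, the extra arithmetic raid6 adds over raid1 is one XOR (for P) plus one Galois-field multiply-accumulate (for Q) per data byte. A minimal unoptimized sketch of that per-stripe work, following the standard RAID6 algebra rather than the kernel's vectorized SSE2 code, looks like:

```python
def gf_mul2(x: int) -> int:
    """Multiply by 2 in GF(2^8) with the RAID6 polynomial 0x11d."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11d
    return x & 0xff

def pq_parity(data_blocks):
    """RAID6 P (plain XOR) and Q (Reed-Solomon) parity for one stripe,
    via Horner's rule: Q = D0 ^ 2*D1 ^ 4*D2 ^ ..."""
    p = bytearray(len(data_blocks[0]))
    q = bytearray(len(data_blocks[0]))
    for block in reversed(data_blocks):
        for i, byte in enumerate(block):
            p[i] ^= byte                    # P: cheap XOR, same as raid5
            q[i] = gf_mul2(q[i]) ^ byte     # Q: GF(2^8) work on top
    return bytes(p), bytes(q)

# Two tiny one-byte "disks": P = 1 ^ 2 = 3, Q = 1 ^ (2*2) = 5
p, q = pq_parity([bytes([1]), bytes([2])])
print(p, q)  # b'\x03' b'\x05'
```

Even so, the dmesg benchmark below shows the kernel computing Q at several GB/s, so parity cost alone cannot account for low-single-digit MB/s balance throughput.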
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-01-27 16:34 ` Austin S. Hemmelgarn @ 2016-01-27 20:58 ` bbrendon 2016-01-27 21:53 ` Chris Murphy 1 sibling, 0 replies; 28+ messages in thread From: bbrendon @ 2016-01-27 20:58 UTC (permalink / raw) Cc: linux-btrfs I ran into some major problems with balancing a raid6 array recently on 4.4. It wouldn't resume without crashing and yes, it was VERY slow. In my case, it took 4 days. I found a link to the posting. http://www.spinics.net/lists/linux-btrfs/msg51159.html On Wed, Jan 27, 2016 at 8:34 AM, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote: > On 2016-01-27 03:48, Christian Rohmann wrote: >> >> >> >> On 01/26/2016 09:20 PM, Chris Murphy wrote: >>> >>> nyway, >>> it seems reasonable to try a balance without the filters to see if >>> that's a factor, because those filters are brand new in btrfs-progs >>> 4.4. Granted, I'd expect they've been tested by upstream developers, >>> but I don't know if there's an fstest for balance with these specific >>> filters yet. >> >> >> I have another box with 8 disks RAID6 on which I simply did a balance >> with no newly added drives. Same issue ... VERY slow running balance >> with IO nowhere near 100% utilization and many many days of runtime to >> finish. > > Hmm, I did some automated testing in a couple of VM's last night, and I have > to agree, this _really_ needs to get optimized. Using the same data-set on > otherwise identical VM's, I saw an average 28x slowdown (best case was 16x, > worst was almost 100x) for balancing a RAID6 set versus a RAID1 set. While > the parity computations add to the time, there is absolutely no way that > just that can explain why this is taking so long. The closest comparison > using MD or DM RAID is probably a full verification of the array, and the > greatest difference there that I've seen is around 10x. 
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-01-27 16:34 ` Austin S. Hemmelgarn 2016-01-27 20:58 ` bbrendon @ 2016-01-27 21:53 ` Chris Murphy 2016-01-28 12:27 ` Austin S. Hemmelgarn 2016-02-01 14:10 ` Christian Rohmann 1 sibling, 2 replies; 28+ messages in thread
From: Chris Murphy @ 2016-01-27 21:53 UTC (permalink / raw)
To: Austin S. Hemmelgarn; +Cc: Christian Rohmann, linux-btrfs

On Wed, Jan 27, 2016 at 9:34 AM, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote:
> Hmm, I did some automated testing in a couple of VM's last night, and I have
> to agree, this _really_ needs to get optimized. Using the same data-set on
> otherwise identical VM's, I saw an average 28x slowdown (best case was 16x,
> worst was almost 100x) for balancing a RAID6 set versus a RAID1 set. While
> the parity computations add to the time, there is absolutely no way that
> just that can explain why this is taking so long. The closest comparison
> using MD or DM RAID is probably a full verification of the array, and the
> greatest difference there that I've seen is around 10x.

I can't exactly reproduce this. I'm using +C qcow2 on Btrfs on one SSD
to back the drives in the VM.

2x btrfs raid1 with files totalling 5G consistently takes ~1 minute [1]
to balance (no filters)

4x btrfs raid6 with the same files *inconsistently* takes ~1m15s [2]
to balance (no filters)

iotop is all over the place, from 21MB/s writes to 527MB/s

Do both of you get something like this:

[root@f23m ~]# dmesg | grep -i raid
[ 1.518682] raid6: sse2x1 gen() 4531 MB/s
[ 1.535663] raid6: sse2x1 xor() 3783 MB/s
[ 1.552683] raid6: sse2x2 gen() 10140 MB/s
[ 1.569658] raid6: sse2x2 xor() 7306 MB/s
[ 1.586673] raid6: sse2x4 gen() 11261 MB/s
[ 1.603683] raid6: sse2x4 xor() 7009 MB/s
[ 1.603685] raid6: using algorithm sse2x4 gen() 11261 MB/s
[ 1.603686] raid6: .... xor() 7009 MB/s, rmw enabled
[ 1.603687] raid6: using ssse3x2 recovery algorithm

[1] Did it 3 times
1m8
0m58
0m40

[2] Did this multiple times
1m15s
0m55s
0m49s
And then from that point all attempts were 2+m, but never more than
2m29s. I'm not sure why, but there were a lot of drop outs in iotop
where it'd go to 0MB/s for a couple seconds. I captured some sysrq+t
for this.

https://drive.google.com/open?id=0B_2Asp8DGjJ9SE5ZNTBGQUV1ZUk

--
Chris Murphy

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-01-27 21:53 ` Chris Murphy @ 2016-01-28 12:27 ` Austin S. Hemmelgarn 2016-02-01 14:10 ` Christian Rohmann 1 sibling, 0 replies; 28+ messages in thread From: Austin S. Hemmelgarn @ 2016-01-28 12:27 UTC (permalink / raw) To: Chris Murphy; +Cc: Christian Rohmann, linux-btrfs On 2016-01-27 16:53, Chris Murphy wrote: > On Wed, Jan 27, 2016 at 9:34 AM, Austin S. Hemmelgarn > <ahferroin7@gmail.com> wrote: > >> Hmm, I did some automated testing in a couple of VM's last night, and I have >> to agree, this _really_ needs to get optimized. Using the same data-set on >> otherwise identical VM's, I saw an average 28x slowdown (best case was 16x, >> worst was almost 100x) for balancing a RAID6 set versus a RAID1 set. While >> the parity computations add to the time, there is absolutely no way that >> just that can explain why this is taking so long. The closest comparison >> using MD or DM RAID is probably a full verification of the array, and the >> greatest difference there that I've seen is around 10x. > > I can't exactly reproduce this. I'm using +C qcow2 on Btrfs on one SSD > to back the drives in the VM. In my case I was using a set of 8 thinly-provisioned 256G (virtual size) LVM volumes exposed directly to a Xen VM as virtual block devices, physically backed by traditional hard drives. For both tests, I used a filesystem spanning all the disks which had a lot of sparse files, and had had a lot of data chunks forced allocated and then made almost empty. I made a point to use snapshots to ensure that the filesystem itself was not a variable in this. It's probably worth noting that the system I ran this on does have other VM's running at the same time on the same physical CPU's, but we need to plan for that use case also. > > 2x btrfs raid1 with files totalling 5G consistently takes ~1 minute > [1] to balance (no filters) Similar times here. 
> > 4x btrfs raid6 with the same files *inconsistently* takes ~1m15s [2] > to balance (no filters) On this I was literally getting around 30 minutes on average, with one case where it only took 16, and one where it took 97. On both configurations, I did 12 runs total. > iotop is all over the place, from 21MB/s writes to 527MB/s Similar results with iotop, with values ranging from 2MB/s up to spikes of 100MB/s (which is about 150% of the measured streaming write speed from the VM going straight to the virtual disk). > > > Do both of you get something like this: > [root@f23m ~]# dmesg | grep -i raid > [ 1.518682] raid6: sse2x1 gen() 4531 MB/s > [ 1.535663] raid6: sse2x1 xor() 3783 MB/s > [ 1.552683] raid6: sse2x2 gen() 10140 MB/s > [ 1.569658] raid6: sse2x2 xor() 7306 MB/s > [ 1.586673] raid6: sse2x4 gen() 11261 MB/s > [ 1.603683] raid6: sse2x4 xor() 7009 MB/s > [ 1.603685] raid6: using algorithm sse2x4 gen() 11261 MB/s > [ 1.603686] raid6: .... xor() 7009 MB/s, rmw enabled > [ 1.603687] raid6: using ssse3x2 recovery algorithm My system picks avx2x4, which supposedly gets 6.6 GB/s on this hardware, although I've never seen any raid recovery, even on RAM disks, manage that kind of computational throughput. > > > > [1] Did it 3 times > 1m8 > 0m58 > 0m40 > > [2] Did this multiple times > 1m15s > 0m55s > 0m49s > And then from that point all attempts were 2+m, but never more than > 2m29s. I'm not sure why, but there were a lot of drop outs in iotop > where it'd go to 0MB/s for a couple seconds. I captured some sysrq+t > for this. I saw similar drops in IO performance as well, although I didn't get any traces for it. > > https://drive.google.com/open?id=0B_2Asp8DGjJ9SE5ZNTBGQUV1ZUk > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-01-27 21:53 ` Chris Murphy 2016-01-28 12:27 ` Austin S. Hemmelgarn @ 2016-02-01 14:10 ` Christian Rohmann 2016-02-01 20:52 ` Chris Murphy 1 sibling, 1 reply; 28+ messages in thread
From: Christian Rohmann @ 2016-02-01 14:10 UTC (permalink / raw)
To: Chris Murphy, Austin S. Hemmelgarn; +Cc: linux-btrfs

Hey Chris,

sorry for the late reply.

On 01/27/2016 10:53 PM, Chris Murphy wrote:
> I can't exactly reproduce this. I'm using +C qcow2 on Btrfs on one SSD
> to back the drives in the VM.
>
> 2x btrfs raid1 with files totalling 5G consistently takes ~1 minute
> [1] to balance (no filters)
>
> 4x btrfs raid6 with the same files *inconsistently* takes ~1m15s [2]
> to balance (no filters)
> iotop is all over the place, from 21MB/s writes to 527MB/s

To be honest, 5G is not really 21T spread across 12 spindles with LOTS
of data on them. On another box with 8x4TB spinning rust it's also very
slow.

> Do both of you get something like this:
> [root@f23m ~]# dmesg | grep -i raid
> [ 1.518682] raid6: sse2x1 gen() 4531 MB/s
> [ 1.535663] raid6: sse2x1 xor() 3783 MB/s
> [ 1.552683] raid6: sse2x2 gen() 10140 MB/s
> [ 1.569658] raid6: sse2x2 xor() 7306 MB/s
> [ 1.586673] raid6: sse2x4 gen() 11261 MB/s
> [ 1.603683] raid6: sse2x4 xor() 7009 MB/s
> [ 1.603685] raid6: using algorithm sse2x4 gen() 11261 MB/s
> [ 1.603686] raid6: .... xor() 7009 MB/s, rmw enabled
> [ 1.603687] raid6: using ssse3x2 recovery algorithm

Yes:
--- cut ---
[ 4.704396] raid6: sse2x1 gen() 4288 MB/s
[ 4.772401] raid6: sse2x1 xor() 4036 MB/s
[ 4.840403] raid6: sse2x2 gen() 7629 MB/s
[ 4.908405] raid6: sse2x2 xor() 6247 MB/s
[ 4.976404] raid6: sse2x4 gen() 10221 MB/s
[ 5.044397] raid6: sse2x4 xor() 7620 MB/s
[ 5.044525] raid6: using algorithm sse2x4 gen() 10221 MB/s
[ 5.044641] raid6: .... xor() 7620 MB/s, rmw enabled
[ 5.044767] raid6: using ssse3x2 recovery algorithm
--- cut ---

Would some sort of stracing or profiling of the process help to narrow
down where the time is currently spent and why the balancing is only
running single-threaded?

Regards
Christian

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-02-01 14:10 ` Christian Rohmann @ 2016-02-01 20:52 ` Chris Murphy 2016-02-09 13:48 ` Christian Rohmann 0 siblings, 1 reply; 28+ messages in thread From: Chris Murphy @ 2016-02-01 20:52 UTC (permalink / raw) To: Christian Rohmann; +Cc: Chris Murphy, Austin S. Hemmelgarn, linux-btrfs On Mon, Feb 1, 2016 at 7:10 AM, Christian Rohmann <crohmann@netcologne.de> wrote: > Hey Chris, > > > sorry for the late reply. > > > On 01/27/2016 10:53 PM, Chris Murphy wrote: >> I can't exactly reproduce this. I'm using +C qcow2 on Btrfs on one SSD >> to back the drives in the VM. >> >> 2x btrfs raid1 with files totalling 5G consistently takes ~1 minute >> [1] to balance (no filters) >> >> 4x btrfs raid6 with the same files *inconsistently* takes ~1m15s [2] >> to balance (no filters) >> iotop is all over the place, from 21MB/s writes to 527MB/s > > To be honest, 5G is not really 21T spread across 12 spindles with LOTS > of data on them. On another box with 8x4TB spinning rust it's also very > slow. 5G vs 21T is relevant if the mere fact there's more metadata (bigger file system) is the source of the problem. Otherwise, at a moment in time, neither one of us has 5G let alone 21T of data in-flight. But you have 12 drives, with a theoretical data bandwidth for reads and writes of about 1GiB/s depending on the performance of the drives, and where on the platter the read/write happens. So my test is actually the disadvantaged one. My scenario with 4 qcow2 files on a single SSD should not perform better, except possibly with respect to IOPS. But this is not a metadata-intensive test, it was merely two large sequential files. So if you have a very heavy metadata-intensive workload, that's actually pretty bad for any RAID6 and it's probably not great for Btrfs either. A consideration is how metadata chunks get balanced on raid6 where the strip size is 64K and the nodesize is 16K. 
If there's a lot of metadata being produced, I think we'd expect first that 16K nodes are fully packed, and then each 64K strip per device is fully packed, then parity is computed for that stripe, and then the whole stripe is written. But when modified, what does a single key change look like? The minimum initial change is a single 16KiB node has to be CoWd, but since it's raid6, that means what? 1. Read the 64K strip containing the 16K node. 2. Read the separate 64K strip containing its csum? Not sure if the node's csum is actually in the node itself. 3. Does btrfs raid6 always check parity on every read? That's not the case with md raid. On normal reads where the drive does not report a read error, parity strips are never read, so in effect it's raid0 using n-2 drives, with the strip being the minimum read size. Depending on all of this, a single 16K read means 1-3 IOs. And a modification would require 4-6 IOs. Each IO is 64K. So this is not going to be small file friendly at all the way I see it, hence why it could be really valuable to have raid1 metadata (with n way mirroring). Or possibly set the nodesize to 64K to match the strip size? So the test I did is relevant in that a.) it's sufficiently different from your setup, b.) I can't reproduce the problem where raid6 balance takes longer than raid1 balance. So there's something else going on other than it merely being raid6. It's raid6 *and* it's something else, like the workload. > Would some sort of stracing or profiling of the process help to narrow > down where the time is currently spent and why the balancing is only > running single-threaded? This can't be straced. Someone a lot more knowledgeable than I am might figure out where all the waits are with just a sysrq + t, if it is a hold up in say parity computations. Otherwise perf which is a rabbit hole but perf top is kinda cool to watch. 
That might give you an idea where most of the cpu cycles are going if you can isolate the workload to just the balance. Otherwise you may end up with noisy data. -- Chris Murphy ^ permalink raw reply [flat|nested] 28+ messages in thread
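Chris's read-modify-write arithmetic in the message above (1-3 strip reads and 4-6 total IOs, each touching a whole 64 KiB strip, to CoW one 16 KiB node) can be put into numbers. A minimal sketch; the per-node IO counts are the estimates from this mail, not measured btrfs behaviour:

```python
# Illustrative IO-amplification estimate for CoWing one 16 KiB metadata
# node on btrfs raid6, following the counts estimated in the mail above.
# All constants here are assumptions from the discussion, not measured.

STRIP = 64 * 1024   # raid6 strip size per device
NODE = 16 * 1024    # default btrfs nodesize

def rmw_cost(reads, writes):
    """Total IOs and bytes moved to CoW one metadata node."""
    ios = reads + writes
    return ios, ios * STRIP

# Best case: read the node's strip only; write data strip + P + Q.
# Worst case: also read the csum strip and read parity to verify.
for label, reads in (("best", 1), ("worst", 3)):
    ios, moved = rmw_cost(reads, writes=3)
    print(f"{label} case: {ios} IOs, {moved // 1024} KiB moved "
          f"({moved // NODE}x amplification for one 16 KiB node)")
```

Under these assumptions a metadata-heavy balance moves 16x-24x the bytes it relocates, which would be in the same ballpark as the 16x-100x slowdowns Austin measured; a 64 KiB nodesize matching the strip would cut the best case to 4x, which is the trade-off Chris raises.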
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-02-01 20:52 ` Chris Murphy @ 2016-02-09 13:48 ` Christian Rohmann 2016-02-09 16:46 ` Marc MERLIN 2016-02-09 21:46 ` Chris Murphy 0 siblings, 2 replies; 28+ messages in thread From: Christian Rohmann @ 2016-02-09 13:48 UTC (permalink / raw) To: Chris Murphy; +Cc: Austin S. Hemmelgarn, linux-btrfs On 02/01/2016 09:52 PM, Chris Murphy wrote: >> Would some sort of stracing or profiling of the process help to narrow >> > down where the time is currently spent and why the balancing is only >> > running single-threaded? > This can't be straced. Someone a lot more knowledgeable than I am > might figure out where all the waits are with just a sysrq + t, if it > is a hold up in say parity computations. Otherwise perf which is a > rabbit hole but perf top is kinda cool to watch. That might give you > an idea where most of the cpu cycles are going if you can isolate the > workload to just the balance. Otherwise you may end up with noisy > data. My balance run is now working away since 19th of January: "885 out of about 3492 chunks balanced (996 considered), 75% left" So this will take several more WEEKS to finish. Is there really nothing anyone here wants me to do or analyze to help finding the root cause of this? I mean with this kind of performance there is no way a RAID6 can be used in production. Not because the code is not stable or functioning, but because regular maintenance like replacing a drive or growing an array takes WEEKS in which another maintenance procedure could be necessary or, much worse, another drive might have failed. What I'm saying is: Such a slow RAID6 balance renders the redundancy unusable because drives might fail quicker than the potential rebuild (read "balance"). Regards Christian ^ permalink raw reply [flat|nested] 28+ messages in thread
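The progress line quoted above can be extrapolated into a rough completion estimate. A sketch, assuming chunks keep balancing at the observed constant rate; the regex targets the `btrfs balance status` line as quoted in this mail:

```python
import re
from datetime import timedelta

def balance_eta(status_line, elapsed):
    """Linearly extrapolate remaining balance time from a
    'btrfs balance status' progress line and the elapsed runtime."""
    m = re.search(r"(\d+) out of about (\d+) chunks balanced", status_line)
    if not m:
        raise ValueError("unrecognized status line")
    done, total = map(int, m.groups())
    if done == 0:
        raise ValueError("no measurable progress yet")
    return elapsed * (total - done) / done

status = "885 out of about 3492 chunks balanced (996 considered), 75% left"
elapsed = timedelta(days=21)  # Jan 19 to Feb 9
print(f"~{balance_eta(status, elapsed).days} more days at the current rate")
```

At roughly two more months on top of the three weeks already spent, this bears out the point that the rebuild window can easily exceed a realistic drive-failure interval.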
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-02-09 13:48 ` Christian Rohmann @ 2016-02-09 16:46 ` Marc MERLIN 2016-02-09 21:46 ` Chris Murphy 1 sibling, 0 replies; 28+ messages in thread From: Marc MERLIN @ 2016-02-09 16:46 UTC (permalink / raw) To: Christian Rohmann; +Cc: Chris Murphy, Austin S. Hemmelgarn, linux-btrfs On Tue, Feb 09, 2016 at 02:48:14PM +0100, Christian Rohmann wrote: > > > On 02/01/2016 09:52 PM, Chris Murphy wrote: > >> Would some sort of stracing or profiling of the process help to narrow > >> > down where the time is currently spent and why the balancing is only > >> > running single-threaded? > > This can't be straced. Someone a lot more knowledgeable than I am > > might figure out where all the waits are with just a sysrq + t, if it > > is a hold up in say parity computations. Otherwise perf which is a > > rabbit hole but perf top is kinda cool to watch. That might give you > > an idea where most of the cpu cycles are going if you can isolate the > > workload to just the balance. Otherwise you may end up with noisy > > data. > > My balance run is now working away since 19th of January: > "885 out of about 3492 chunks balanced (996 considered), 75% left" > > So this will take several more WEEKS to finish. Is there really nothing > anyone here wants me to do or analyze to help finding the root cause of > this? I mean with this kind of performance there is no way a RAID6 can > be used in production. Not because the code is not stable or > functioning, but because regular maintenance like replacing a drive or > growing an array takes WEEKS in which another maintenance procedure > could be necessary or, much worse, another drive might have failed. > > What I'm saying is: Such a slow RAID6 balance renders the redundancy > unusable because drives might fail quicker than the potential rebuild > (read "balance"). I agree, this is bad. 
For what it's worth, one of my own filesystems (target for backups, many many files) has apparently become slow enough that it half hangs my system when I'm using it. I've just unmounted it to make sure my overall system performance comes back, and I may have to delete and recreate it. Sadly, this also means that btrfs still seems to get itself in corner cases that are causing performance issues. I'm not saying that you did hit this problem, but it is possible. Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-02-09 13:48 ` Christian Rohmann 2016-02-09 16:46 ` Marc MERLIN @ 2016-02-09 21:46 ` Chris Murphy 2016-02-10 2:23 ` Chris Murphy 2016-02-10 13:19 ` Christian Rohmann 1 sibling, 2 replies; 28+ messages in thread From: Chris Murphy @ 2016-02-09 21:46 UTC (permalink / raw) To: Christian Rohmann; +Cc: Chris Murphy, Austin S. Hemmelgarn, linux-btrfs On Tue, Feb 9, 2016 at 6:48 AM, Christian Rohmann <crohmann@netcologne.de> wrote: > > > On 02/01/2016 09:52 PM, Chris Murphy wrote: >>> Would some sort of stracing or profiling of the process help to narrow >>> > down where the time is currently spent and why the balancing is only >>> > running single-threaded? >> This can't be straced. Someone a lot more knowledgeable than I am >> might figure out where all the waits are with just a sysrq + t, if it >> is a hold up in say parity computations. Otherwise perf which is a >> rabbit hole but perf top is kinda cool to watch. That might give you >> an idea where most of the cpu cycles are going if you can isolate the >> workload to just the balance. Otherwise you may end up with noisy >> data. > > My balance run is now working away since 19th of January: > "885 out of about 3492 chunks balanced (996 considered), 75% left" > > So this will take several more WEEKS to finish. Is there really nothing > anyone here wants me to do or analyze to help finding the root cause of > this? Can you run 'perf top' and let it run for a few minutes, then copy/paste or screenshot it somewhere? I'll definitely say in advance this is just a matter of curiosity where the kernel is spending all of its time, that this is going so slowly. In no way can I imagine being able to help fix it. I'm a bit surprised there's no dev response, maybe try the IRC channel? Weeks is just too long. My concern is if there's a drive failure, a.) what state is the fs going to be in and b.) will device replace be this slow too? 
I'd expect the code path for balance and replace to be the same, so I suspect yes. > I mean with this kind of performance there is no way a RAID6 can > be used in production. Not because the code is not stable or > functioning, but because regular maintenance like replacing a drive or > growing an array takes WEEKS in which another maintenance procedure > could be necessary or, much worse, another drive might have failed. That's right. In my dummy test, which should have run slower than your setup, the other differences on my end: elevator=noop ## because I'm running an SSD kernel 4.5rc0 I could redo my test, using 'perf top' also and see if there's any glaring difference in where the kernel is spending its time on a system pushing the block device to its max write ability, vs ones that aren't. I don't have any other ideas. I'd rather a developer say, "try this" to gather more useful information, rather than just poking things with a random stick. -- Chris Murphy ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-02-09 21:46 ` Chris Murphy @ 2016-02-10 2:23 ` Chris Murphy 2016-02-10 2:36 ` Chris Murphy 2016-02-10 13:19 ` Christian Rohmann 1 sibling, 1 reply; 28+ messages in thread From: Chris Murphy @ 2016-02-10 2:23 UTC (permalink / raw) Cc: Christian Rohmann, Austin S. Hemmelgarn, linux-btrfs # perf stat -e 'btrfs:*' -a sleep 10 ## This is single device HDD, balance of a root fs was started before these 10 seconds of sampling. There are some differences in the statistics depending on whether there are predominately reads or writes for the balance, so clearly balance does predominately reads, then predominately writes. Unsurprising but the three tries I did were largely in agreement (orders of magnitude wise). http://fpaste.org/320551/06921614/ # perf record -e block:block_rq_issue -ag ^C ## after ~30 seconds # perf report ## Single device HDD, balance of root fs start before perf record. There's a lot of data, collapsed by default. I expanded a few items at random just as an example. I suspect the write of the perf.data file is a non-factor because it was just under 2MiB. http://fpaste.org/320555/14550698/raw/ # perf top ## Single device HDD, balance of root fs start before issuing this command, and let it run for about 20 seconds. This is actually not as interesting as I thought it might be, but I don't really know what I'm looking for. I'd need something else to compare it to. http://fpaste.org/320559/55070873/ Anyway, all of these are single device, so it's not apples/apples comparison, but it is a working (full speed for the block device) balance. Chris Murphy ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-02-10 2:23 ` Chris Murphy @ 2016-02-10 2:36 ` Chris Murphy 0 siblings, 0 replies; 28+ messages in thread From: Chris Murphy @ 2016-02-10 2:36 UTC (permalink / raw) Cc: Christian Rohmann, Austin S. Hemmelgarn, linux-btrfs This could also be interesting. It means canceling the balance in progress; waiting some time; and then cancelling it again to get results to return. # perf stat -B btrfs balance start / ## Again, single device example, balancing at expected performance. http://fpaste.org/320562/55071438/ I didn't try this but, it looks like it'd be a variation on the above, attaching to a running balance: # perf stat -B -p <pidforbalance> sleep 60 Anyway... Chris Murphy ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-02-09 21:46 ` Chris Murphy 2016-02-10 2:23 ` Chris Murphy @ 2016-02-10 13:19 ` Christian Rohmann 2016-02-10 19:16 ` Chris Murphy 1 sibling, 1 reply; 28+ messages in thread
From: Christian Rohmann @ 2016-02-10 13:19 UTC (permalink / raw)
To: Chris Murphy; +Cc: Austin S. Hemmelgarn, linux-btrfs

Hey btrfs-folks,

I did a bit of digging using "perf":

1) * "perf stat -B -p 3933 sleep 60"
   * "perf stat -e 'btrfs:*' -a sleep 60"
-> http://fpaste.org/320718/10016145/

2) * "perf record -e block:block_rq_issue -ag" for about 30 seconds:
-> http://fpaste.org/320719/51101751/raw/

3) * perf top
-> http://fpaste.org/320720/45511028/

Regards
Christian

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-02-10 13:19 ` Christian Rohmann @ 2016-02-10 19:16 ` Chris Murphy 2016-02-10 19:38 ` Chris Murphy 0 siblings, 1 reply; 28+ messages in thread From: Chris Murphy @ 2016-02-10 19:16 UTC (permalink / raw) To: Christian Rohmann; +Cc: Chris Murphy, Austin S. Hemmelgarn, linux-btrfs http://fpaste.org/320720/45511028/ What is rb_next? See if you can explode that out and find out more about why there's so much time going on with that. I see that rb_next gets used for lots of things, including btrfs. In mine, rb_next is less than 1% overhead, but for you it's the top item. That's suspicious. http://fpaste.org/320718/10016145/ line 72-73. We both have counts for qgroup stuff. Mine is much much less than yours. I have never had quotas enabled on any of my filesystems, so I don't know why there are any such counts at all. But since your values are nearly three orders of magnitude greater than mine, I have to ask if you have quotas enabled or have ever had them enabled? That might be a factor here... Chris Murphy ^ permalink raw reply [flat|nested] 28+ messages in thread
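The comparison Chris does by eye here, spotting counters that are orders of magnitude higher on the slow box, can be mechanized. A sketch that diffs two `perf stat -e 'btrfs:*'` outputs; the sample lines below are made up for illustration (not the numbers from the fpaste links), and the assumed "COUNT event" line format should be checked against your perf version:

```python
import re

def parse_perf_stat(text):
    """Parse 'COUNT btrfs:event' lines from perf stat output into a dict."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*([\d,]+)\s+(btrfs:\S+)", line)
        if m:
            counts[m.group(2)] = int(m.group(1).replace(",", ""))
    return counts

def suspicious_events(fast_out, slow_out, factor=100):
    """Return events whose count on the slow box exceeds the fast box
    by at least `factor`, i.e. the order-of-magnitude outliers."""
    fast = parse_perf_stat(fast_out)
    out = {}
    for event, n in parse_perf_stat(slow_out).items():
        base = fast.get(event, 0)
        if n >= max(base, 1) * factor:
            out[event] = (base, n)
    return out

# Hypothetical excerpts standing in for the two boxes' perf stat output.
fast = "         1,024      btrfs:qgroup_update_counters\n"
slow = "     2,048,000      btrfs:qgroup_update_counters\n"
for event, (a, b) in suspicious_events(fast, slow).items():
    print(f"{event}: {a} -> {b}")
```

On real data this would immediately flag the qgroup counters Chris points at, which is a quicker check than reading two fpaste dumps side by side.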
* Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? 2016-02-10 19:16 ` Chris Murphy @ 2016-02-10 19:38 ` Chris Murphy 0 siblings, 0 replies; 28+ messages in thread From: Chris Murphy @ 2016-02-10 19:38 UTC (permalink / raw) To: Christian Rohmann; +Cc: Austin S. Hemmelgarn, linux-btrfs Sometimes when things are really slow or even hung up with Btrfs, yet there's no blocked task being reported, a dev has asked for sysrq+t, so that might also be something to issue while the slow balance is happening, and then dmesg to grab the result. The thing is, I have no idea how to read the output, but maybe if it gets posted up somewhere we can figure it out. I mean, obviously this is a bug, it shouldn't take two weeks or more to balance a raid6 volume. I'd like to think this would have been caught much sooner in regression testing before it'd be released, so it makes me wonder if this is an edge case related to hardware, kernel build, or more likely some state of the affected file systems that the test file systems aren't in. It might be more helpful to sort through xfstests that call balance and raid56, and see if there's something that's just not being tested, but applies to the actual filesystems involved; rather than trying to decipher kernel output. *shrug* Chris Murphy ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2016-02-10 19:38 UTC | newest] Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-01-22 13:38 btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? Christian Rohmann 2016-01-22 14:51 ` Duncan 2016-01-24 2:30 ` Henk Slager 2016-01-25 11:34 ` Christian Rohmann 2016-01-25 22:13 ` Chris Murphy [not found] ` <CAKZK7uxdX9UBPOKButtPjqBOdVUfHdRTimP+W34fkz1h9P+wHg@mail.gmail.com> 2016-01-26 0:44 ` Fwd: " Justin Brown 2016-01-26 5:17 ` Chris Murphy 2016-01-26 6:14 ` Chris Murphy 2016-01-26 8:54 ` Christian Rohmann 2016-01-26 19:26 ` Chris Murphy 2016-01-26 19:27 ` Chris Murphy 2016-01-26 19:57 ` Austin S. Hemmelgarn 2016-01-26 20:20 ` Chris Murphy 2016-01-27 8:48 ` Christian Rohmann 2016-01-27 16:34 ` Austin S. Hemmelgarn 2016-01-27 20:58 ` bbrendon 2016-01-27 21:53 ` Chris Murphy 2016-01-28 12:27 ` Austin S. Hemmelgarn 2016-02-01 14:10 ` Christian Rohmann 2016-02-01 20:52 ` Chris Murphy 2016-02-09 13:48 ` Christian Rohmann 2016-02-09 16:46 ` Marc MERLIN 2016-02-09 21:46 ` Chris Murphy 2016-02-10 2:23 ` Chris Murphy 2016-02-10 2:36 ` Chris Murphy 2016-02-10 13:19 ` Christian Rohmann 2016-02-10 19:16 ` Chris Murphy 2016-02-10 19:38 ` Chris Murphy