From: Chris Murphy <lists@colorremedies.com>
To: Christian Rohmann <crohmann@netcologne.de>
Cc: Chris Murphy <lists@colorremedies.com>,
	"Austin S. Hemmelgarn" <ahferroin7@gmail.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core?
Date: Tue, 9 Feb 2016 14:46:01 -0700
Message-ID: <CAJCQCtTPQE+hBzaktURR1v3GtObSjH=UV806qU-RmFVomwK3GA@mail.gmail.com>
In-Reply-To: <56B9EE1E.2040000@netcologne.de>

On Tue, Feb 9, 2016 at 6:48 AM, Christian Rohmann
<crohmann@netcologne.de> wrote:
>
>
> On 02/01/2016 09:52 PM, Chris Murphy wrote:
>>> Would some sort of stracing or profiling of the process help to narrow
>>> down where the time is currently spent and why the balancing is only
>>> running single-threaded?
>> This can't be straced. Someone a lot more knowledgeable than I am
>> might figure out where all the waits are with just a sysrq + t, if it
>> is a hold-up in, say, parity computations. Otherwise there's perf,
>> which is a rabbit hole, but perf top is kinda cool to watch. That
>> might give you an idea where most of the CPU cycles are going if you
>> can isolate the workload to just the balance. Otherwise you may end up
>> with noisy data.
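The sysrq + t suggestion does not need a keyboard: as root, the command character can be written to /proc/sysrq-trigger, and the resulting dump of all task states lands in the kernel log (read it with dmesg). A minimal Python sketch; the helper name `trigger_sysrq` is hypothetical, and the script only acts when invoked with `--run`.

```python
import os
import sys

SYSRQ_TRIGGER = "/proc/sysrq-trigger"

def trigger_sysrq(key: str, path: str = SYSRQ_TRIGGER) -> None:
    """Write one SysRq command character, e.g. 't' to dump all task states."""
    if len(key) != 1:
        raise ValueError("SysRq commands are a single character")
    # Requires root. The kernel.sysrq sysctl only gates the keyboard combo,
    # not writes to /proc/sysrq-trigger.
    with open(path, "w") as f:
        f.write(key)

if __name__ == "__main__" and "--run" in sys.argv:
    if os.geteuid() != 0:
        sys.exit("run as root")
    trigger_sysrq("t")
    # The dump goes to the kernel ring buffer, not to stdout.
    print("task states dumped; check dmesg")
```

Running it while the balance is busy, and again a minute later, gives two snapshots of where the btrfs worker threads are waiting.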
>
> My balance run has been working away since 19 January:
>  "885 out of about 3492 chunks balanced (996 considered),  75% left"
>
> So this will take several more WEEKS to finish. Is there really nothing
> anyone here wants me to do or analyze to help find the root cause of
> this?
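The "several more WEEKS" estimate can be checked with a linear extrapolation from the status line above. A minimal sketch, assuming the balance started on 19 January, the status was read on 9 February (the date of Christian's message), both truncated to whole days, and that the chunk rate stays constant; the total is only "about 3492".

```python
from datetime import date

# Assumed inputs: start date from "since 19th of January", snapshot date taken
# as the 9 February message date, both truncated to whole days.
START = date(2016, 1, 19)
SNAPSHOT = date(2016, 2, 9)
BALANCED = 885   # chunks balanced so far, from the status line
TOTAL = 3492     # "about 3492", so the result is approximate

def linear_eta(start, snapshot, balanced, total):
    """Extrapolate remaining days assuming the chunk rate so far stays constant."""
    elapsed_days = (snapshot - start).days
    rate = balanced / elapsed_days      # chunks per day
    remaining = total - balanced        # chunks still to relocate
    return elapsed_days, rate, remaining, remaining / rate

elapsed, rate, remaining, eta_days = linear_eta(START, SNAPSHOT, BALANCED, TOTAL)
print(f"{elapsed} days elapsed, {rate:.1f} chunks/day")
print(f"{remaining} chunks left ({remaining / TOTAL:.0%}), ~{eta_days:.0f} more days")
```

With these inputs it reports about 42 chunks per day and roughly 62 more days, close to nine more weeks, which is consistent with the "75% left" in the status line. Per-chunk speed likely varies, so treat this as an order-of-magnitude check.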

Can you run 'perf top', let it run for a few minutes, and then
copy/paste or screenshot the output somewhere? I'll say in advance
that this is purely curiosity about where the kernel is spending its
time, given how slowly this is going. In no way can I imagine being
able to help fix it. I'm a bit surprised there's been no developer
response; maybe try the IRC channel? Weeks is just too long. My concern
is that if there's a drive failure, a) what state will the fs be in,
and b) will device replace be this slow too? I'd expect balance and
replace to share the same code path, so I suspect yes.


> I mean, with this kind of performance there is no way RAID6 can
> be used in production. Not because the code is unstable or
> broken, but because regular maintenance like replacing a drive or
> growing an array takes WEEKS, during which another maintenance
> procedure could become necessary or, much worse, another drive might fail.

That's right.

In my dummy test, which should have run slower than your setup, the
other differences on my end were:

elevator=noop    ## because I'm running an SSD
kernel 4.5rc0
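To confirm which I/O scheduler is actually in effect on each disk (the elevator= setting above), the kernel lists the available schedulers in /sys/block/<dev>/queue/scheduler with the active one in brackets. A small Python sketch; the helper names and the sd* device glob are assumptions for illustration, and the scan only runs with `--run`.

```python
import sys
from pathlib import Path

def active_scheduler(text: str) -> str:
    """Return the bracketed (active) entry of a queue/scheduler line,
    e.g. 'noop deadline [cfq]' -> 'cfq'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active scheduler in {text!r}")

def scheduler_for(device: str) -> str:
    # `device` is a kernel block device name such as "sda" (hypothetical example).
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())

if __name__ == "__main__" and "--run" in sys.argv:
    for dev in sorted(p.name for p in Path("/sys/block").glob("sd*")):
        print(dev, scheduler_for(dev))
```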

I could redo my test, also running 'perf top', to see whether there's
any glaring difference in where the kernel spends its time on a system
that pushes the block device to its maximum write throughput vs. one
that doesn't. I don't have any other ideas. I'd rather a developer said
"try this" to gather more useful information than keep poking at
things with a random stick.
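Comparing two profiles by eye is tedious; a rough diff of per-symbol overhead can highlight where the slow system spends extra time. A minimal sketch, assuming each profile was saved as text from `perf report --stdio --sort symbol` without call graphs, so each sample line starts with an overhead percentage and ends with the symbol name; the function names are hypothetical.

```python
from __future__ import annotations

def parse_overheads(report: str) -> dict[str, float]:
    """Map symbol -> overhead % from lines like '  12.50%  [k] some_symbol'.
    Comment lines start with '#' and are skipped because their first
    field does not end in '%'."""
    overheads: dict[str, float] = {}
    for line in report.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].endswith("%"):
            try:
                pct = float(fields[0].rstrip("%"))
            except ValueError:
                continue
            symbol = fields[-1]
            overheads[symbol] = overheads.get(symbol, 0.0) + pct
    return overheads

def biggest_differences(slow: str, fast: str, top: int = 10):
    """Symbols whose overhead differs most between two reports (slow minus fast)."""
    a, b = parse_overheads(slow), parse_overheads(fast)
    deltas = {sym: a.get(sym, 0.0) - b.get(sym, 0.0) for sym in a.keys() | b.keys()}
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]
```

A large positive delta means the slow system spends noticeably more of its samples in that symbol; it says nothing about why, which is where a developer's input would still be needed.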



-- 
Chris Murphy


Thread overview: 28+ messages
2016-01-22 13:38 btrfs-progs 4.4 re-balance of RAID6 is very slow / limited to one cpu core? Christian Rohmann
2016-01-22 14:51 ` Duncan
2016-01-24  2:30 ` Henk Slager
2016-01-25 11:34   ` Christian Rohmann
2016-01-25 22:13     ` Chris Murphy
     [not found]       ` <CAKZK7uxdX9UBPOKButtPjqBOdVUfHdRTimP+W34fkz1h9P+wHg@mail.gmail.com>
2016-01-26  0:44         ` Fwd: " Justin Brown
2016-01-26  5:17           ` Chris Murphy
2016-01-26  6:14             ` Chris Murphy
2016-01-26  8:54               ` Christian Rohmann
2016-01-26 19:26                 ` Chris Murphy
2016-01-26 19:27                   ` Chris Murphy
2016-01-26 19:57                   ` Austin S. Hemmelgarn
2016-01-26 20:20                     ` Chris Murphy
2016-01-27  8:48                       ` Christian Rohmann
2016-01-27 16:34                         ` Austin S. Hemmelgarn
2016-01-27 20:58                           ` bbrendon
2016-01-27 21:53                           ` Chris Murphy
2016-01-28 12:27                             ` Austin S. Hemmelgarn
2016-02-01 14:10                             ` Christian Rohmann
2016-02-01 20:52                               ` Chris Murphy
2016-02-09 13:48                                 ` Christian Rohmann
2016-02-09 16:46                                   ` Marc MERLIN
2016-02-09 21:46                                   ` Chris Murphy [this message]
2016-02-10  2:23                                     ` Chris Murphy
2016-02-10  2:36                                       ` Chris Murphy
2016-02-10 13:19                                     ` Christian Rohmann
2016-02-10 19:16                                       ` Chris Murphy
2016-02-10 19:38                                         ` Chris Murphy
