From: Austin S Hemmelgarn <ahferroin7@gmail.com>
To: Tomasz Chmielewski <tch@virtall.com>,
	Calvin Walton <calvin.walton@kepstin.ca>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: RAID-1 - suboptimal write performance?
Date: Fri, 16 May 2014 17:36:57 -0400
Message-ID: <537684F9.6060909@gmail.com>
In-Reply-To: <20140516214135.23fefc39@s9>

On 05/16/2014 04:41 PM, Tomasz Chmielewski wrote:
> On Fri, 16 May 2014 14:06:24 -0400
> Calvin Walton <calvin.walton@kepstin.ca> wrote:
> 
>> No comment on the performance issue, other than to say that I've seen
>> similar on RAID-10 before, I think.
>>
>>> Also, what happens when the system crashes, and one drive has
>>> several hundred megabytes data more than the other one?
>>
>> This shouldn't be an issue as long as you occasionally run a scrub or
>> balance. The scrub should find it and fix the missing data, and a
>> balance would just rewrite it as proper RAID-1 as a matter of course.
> 
> It's similar (writes to just one drive, while the other is idle) when
> removing (many) snapshots. 
> 
> Not sure if that's optimal behaviour.
> 
I think, after having looked at some of the code, that I know what is
causing this (although my interpretation of the code may be completely
off target).  As far as I can make out, BTRFS only dispatches writes to
one device at a time, and the write() system call only returns when the
data is on both devices.  While dispatching to one device at a time is
optimal when both 'devices' are partitions on the same underlying disk
(and also if your optimization metric is the simplicity of the
underlying code), it degrades very quickly to the worst case when using
multiple physical devices.  The underlying cause, however, which the
one-device-at-a-time logic in BTRFS just makes much worse, is that the
buffer for the write() call is kept in memory until the write completes
and counts against the per-process write-caching limit.  When the
process fills up its write cache, the next call it makes that would
write to the disk hangs until the write cache is less full.

The two options that I've found that work around this are:
1. Run 'sync' whenever the program stalls, or
2. Disable write-caching by adding the following to /etc/sysctl.conf:
     vm.dirty_bytes = 0
     vm.dirty_background_bytes = 0

Option 1 is kind of tedious, but doesn't hurt performance all that much.
Option 2 will lower throughput, but will cause most of the stalls to
disappear.
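
Before and after changing those sysctls, it can help to check what the
kernel currently has in effect.  The following is only a minimal sketch:
the function name and the `base` parameter are made up for illustration,
and it assumes the standard /proc/sys/vm files are readable on the
system.

```python
import os

# The four writeback knobs exposed under /proc/sys/vm.
VM_KEYS = (
    "dirty_bytes",
    "dirty_background_bytes",
    "dirty_ratio",
    "dirty_background_ratio",
)


def read_vm_dirty_settings(base="/proc/sys/vm"):
    """Return the current vm.dirty_* values as a dict.

    `base` is a hypothetical parameter so the function can be pointed at
    another directory; a value of None means the file was missing or
    could not be parsed as an integer.
    """
    settings = {}
    for key in VM_KEYS:
        path = os.path.join(base, key)
        try:
            with open(path) as f:
                settings["vm." + key] = int(f.read().strip())
        except (OSError, ValueError):
            settings["vm." + key] = None
    return settings


if __name__ == "__main__":
    for name, value in read_vm_dirty_settings().items():
        print(f"{name} = {value}")
```

Note that the kernel reports 0 for whichever member of each
_bytes/_ratio pair is not in use, so a 0 read back here does not by
itself mean that write caching is off.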

Ideally, BTRFS should dispatch the first write for a block in a
round-robin fashion among available devices.  This won't fix the
underlying issue, but it will make it less of an issue for BTRFS.
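
To illustrate what I mean by round-robin, here is a toy model.  It is
not BTRFS code and does not reflect how the kernel actually picks a
device; the function names and the device names "sda"/"sdb" are invented
for the example.  It only counts which device would receive the first
copy of each block under the two policies.

```python
def fixed_order(devices, block_index):
    """Every block is written in the same device order, so the first
    copy always lands on devices[0]."""
    return list(devices)


def round_robin_order(devices, block_index):
    """Rotate the starting device per block so that first copies are
    spread evenly across the devices."""
    start = block_index % len(devices)
    return list(devices[start:]) + list(devices[:start])


def first_write_counts(order_fn, devices, n_blocks):
    """Count how many blocks get their first copy on each device."""
    counts = {dev: 0 for dev in devices}
    for i in range(n_blocks):
        counts[order_fn(devices, i)[0]] += 1
    return counts


if __name__ == "__main__":
    devs = ["sda", "sdb"]  # hypothetical two-device RAID-1
    print(first_write_counts(fixed_order, devs, 8))        # {'sda': 8, 'sdb': 0}
    print(first_write_counts(round_robin_order, devs, 8))  # {'sda': 4, 'sdb': 4}
```

With the fixed order one device takes every first write while the other
waits; with rotation the first writes are split evenly, which is the
behaviour I would expect to reduce the one-drive-idle pattern.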

