linux-btrfs.vger.kernel.org archive mirror
* RAID-1 - suboptimal write performance?
@ 2014-05-16 15:48 Tomasz Chmielewski
  2014-05-16 18:06 ` Calvin Walton
  0 siblings, 1 reply; 6+ messages in thread
From: Tomasz Chmielewski @ 2014-05-16 15:48 UTC (permalink / raw)
  To: linux-btrfs

While doing rsyncs of large archives from one RAID-1 btrfs filesystem
to another RAID-1 btrfs filesystem:

btrfs filesystem 1: sda + sdb (RAID-1), being copied to:
btrfs filesystem 2: sdc + sdd (RAID-1)
Server has 32 GB RAM


I can observe the following:


From time to time, rsync "freezes" while there is high I/O on only *one*
of the destination drives.


To reproduce:

dd if=/dev/urandom of=/mnt/btrfs1/bigfile.img bs=1M count=10000; sync
# cp should work, too, but won't show copy speed/progress
rsync -a -v --progress /mnt/btrfs1/bigfile.img /mnt/btrfs2/

In another terminal, run iostat -m 1:

1) a few seconds of writes to only one RAID-1 member:

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdc             355.00         0.00       177.50          0        177
sdb               0.00         0.00         0.00          0          0
sdd               0.00         0.00         0.00          0          0
sda               0.00         0.00         0.00          0          0


2) then, a few seconds of writes to the other RAID-1 member:

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdc               0.00         0.00         0.00          0          0
sdb               0.00         0.00         0.00          0          0
sdd             351.00         0.00       175.50          0        175
sda               0.00         0.00         0.00          0          0
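
The alternating pattern in the two samples above can be checked
mechanically. What follows is only a rough sketch, not part of the
original report: it assumes the six-column `iostat -m` layout shown
here (other sysstat versions print additional columns, which this
parser would simply skip), and the function name and the 1 MB/s
threshold are made up for illustration.

```python
def writing_devices(iostat_report, members, threshold_mb_s=1.0):
    """Return the listed member devices whose MB_wrtn/s exceeds the threshold."""
    active = []
    for line in iostat_report.splitlines():
        fields = line.split()
        # Expected row: device, tps, MB_read/s, MB_wrtn/s, MB_read, MB_wrtn
        if len(fields) != 6 or fields[0] not in members:
            continue
        if float(fields[3]) > threshold_mb_s:
            active.append(fields[0])
    return active


SAMPLE_1 = """\
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdc             355.00         0.00       177.50          0        177
sdb               0.00         0.00         0.00          0          0
sdd               0.00         0.00         0.00          0          0
sda               0.00         0.00         0.00          0          0
"""

SAMPLE_2 = """\
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdc               0.00         0.00         0.00          0          0
sdb               0.00         0.00         0.00          0          0
sdd             351.00         0.00       175.50          0        175
sda               0.00         0.00         0.00          0          0
"""

mirror = {"sdc", "sdd"}  # members of the destination filesystem
print(writing_devices(SAMPLE_1, mirror))  # ['sdc']
print(writing_devices(SAMPLE_2, mirror))  # ['sdd']
```

If both members were being written at the same time, each sample would
list both sdc and sdd.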


Is this optimal behaviour? With software RAID-1, I see writes to
both devices at the same time.

Also, what happens when the system crashes while one drive holds several
hundred megabytes more data than the other?

-- 
Tomasz Chmielewski
http://wpkg.org

* Re: RAID-1 - suboptimal write performance?
  2014-05-16 15:48 RAID-1 - suboptimal write performance? Tomasz Chmielewski
@ 2014-05-16 18:06 ` Calvin Walton
  2014-05-16 20:41   ` Tomasz Chmielewski
  0 siblings, 1 reply; 6+ messages in thread
From: Calvin Walton @ 2014-05-16 18:06 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: linux-btrfs

On Fri, 2014-05-16 at 16:48 +0100, Tomasz Chmielewski wrote:
> While doing rsyncs of large archives from one RAID-1 btrfs filesystem
> to another RAID-1 btrfs filesystem:
> 
> btrfs filesystem 1: sda + sdb (RAID-1), being copied to:
> btrfs filesystem 2: sdc + sdd (RAID-1)
> Server has 32 GB RAM
> 
> 
> I can observe the following:
> 
> 
> From time to time, rsync "freezes", while there is high IO on only *one*
> of write drives.

No comment on the performance issue, other than to say that I've seen
similar on RAID-10 before, I think.

> Also, what happens when the system crashes, and one drive has several
> hundred megabytes data more than the other one?

This shouldn't be an issue as long as you occasionally run a scrub or
balance. The scrub should find it and fix the missing data, and a
balance would just rewrite it as proper RAID-1 as a matter of course.

-- 
Calvin Walton <calvin.walton@kepstin.ca>


* Re: RAID-1 - suboptimal write performance?
  2014-05-16 18:06 ` Calvin Walton
@ 2014-05-16 20:41   ` Tomasz Chmielewski
  2014-05-16 21:36     ` Austin S Hemmelgarn
  0 siblings, 1 reply; 6+ messages in thread
From: Tomasz Chmielewski @ 2014-05-16 20:41 UTC (permalink / raw)
  To: Calvin Walton; +Cc: linux-btrfs

On Fri, 16 May 2014 14:06:24 -0400
Calvin Walton <calvin.walton@kepstin.ca> wrote:

> No comment on the performance issue, other than to say that I've seen
> similar on RAID-10 before, I think.
> 
> > Also, what happens when the system crashes, and one drive has
> > several hundred megabytes data more than the other one?
> 
> This shouldn't be an issue as long as you occasionally run a scrub or
> balance. The scrub should find it and fix the missing data, and a
> balance would just rewrite it as proper RAID-1 as a matter of course.

It's similar (writes to just one drive, while the other is idle) when
removing (many) snapshots. 

Not sure if that's optimal behaviour.

-- 
Tomasz Chmielewski
http://wpkg.org

* Re: RAID-1 - suboptimal write performance?
  2014-05-16 20:41   ` Tomasz Chmielewski
@ 2014-05-16 21:36     ` Austin S Hemmelgarn
  2014-05-18 18:49       ` Brendan Hide
  2014-05-23 12:57       ` Roman Mamedov
  0 siblings, 2 replies; 6+ messages in thread
From: Austin S Hemmelgarn @ 2014-05-16 21:36 UTC (permalink / raw)
  To: Tomasz Chmielewski, Calvin Walton; +Cc: linux-btrfs

On 05/16/2014 04:41 PM, Tomasz Chmielewski wrote:
> On Fri, 16 May 2014 14:06:24 -0400
> Calvin Walton <calvin.walton@kepstin.ca> wrote:
> 
>> No comment on the performance issue, other than to say that I've seen
>> similar on RAID-10 before, I think.
>>
>>> Also, what happens when the system crashes, and one drive has
>>> several hundred megabytes data more than the other one?
>>
>> This shouldn't be an issue as long as you occasionally run a scrub or
>> balance. The scrub should find it and fix the missing data, and a
>> balance would just rewrite it as proper RAID-1 as a matter of course.
> 
> It's similar (writes to just one drive, while the other is idle) when
> removing (many) snapshots. 
> 
> Not sure if that's optimal behaviour.
> 
I think, after having looked at some of the code, that I know what is
causing this (although my interpretation of the code may be completely
off target).  As far as I can make out, BTRFS only dispatches writes to
one device at a time, and the write() system call only returns when the
data is on both devices.  While dispatching to one device at a time is
optimal when both 'devices' are partitions on the same underlying disk
(and also if your optimization metric is the simplicity of the
underlying code), it degrades very fast to the worst case when using
multiple devices.

The underlying cause, however, which the one-device-at-a-time logic in
BTRFS just makes much worse, is that the buffer for the write() call is
kept in memory until the write completes and counts against the
per-process write-caching limit.  When the process fills up its write
cache, the next call it makes that would write to the disk hangs until
the write cache is less full.

The two options that I've found that work around this are:
1. Run 'sync' whenever the program stalls, or
2. Disable write-caching by adding the following to /etc/sysctl.conf:
vm.dirty_bytes = 0
vm.dirty_background_bytes = 0

Option 1 is kind of tedious, but doesn't hurt performance all that much.
Option 2 will lower throughput, but will cause most of the stalls to
disappear.

Ideally, BTRFS should dispatch the first write for a block in a
round-robin fashion among available devices.  This won't fix the
underlying issue, but it will make it less of an issue for BTRFS.
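
One way to watch the write cache fill up during a stall is to sample
the Dirty and Writeback counters in /proc/meminfo. The sketch below is
only an illustration: the function name and the sample values are made
up, and these counters are system-wide, so they cannot by themselves
confirm the per-process limit described above.

```python
def dirty_and_writeback_kb(meminfo_text):
    """Extract the Dirty and Writeback values (in kB) from /proc/meminfo text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        # Exact match, so related fields such as WritebackTmp are ignored.
        if key in ("Dirty", "Writeback"):
            values[key] = int(rest.split()[0])
    return values


# Made-up sample in the /proc/meminfo format, for illustration only.
SAMPLE = """\
MemTotal:       32871212 kB
Dirty:            812344 kB
Writeback:         20480 kB
WritebackTmp:          0 kB
"""

print(dirty_and_writeback_kb(SAMPLE))  # {'Dirty': 812344, 'Writeback': 20480}
```

On a Linux system the same function can be fed the contents of
/proc/meminfo once per second while the copy runs, next to iostat.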



* Re: RAID-1 - suboptimal write performance?
  2014-05-16 21:36     ` Austin S Hemmelgarn
@ 2014-05-18 18:49       ` Brendan Hide
  2014-05-23 12:57       ` Roman Mamedov
  1 sibling, 0 replies; 6+ messages in thread
From: Brendan Hide @ 2014-05-18 18:49 UTC (permalink / raw)
  To: Austin S Hemmelgarn, Tomasz Chmielewski, Calvin Walton; +Cc: linux-btrfs

On 2014/05/16 11:36 PM, Austin S Hemmelgarn wrote:
> On 05/16/2014 04:41 PM, Tomasz Chmielewski wrote:
>> On Fri, 16 May 2014 14:06:24 -0400
>> Calvin Walton <calvin.walton@kepstin.ca> wrote:
>>
>>> No comment on the performance issue, other than to say that I've seen
>>> similar on RAID-10 before, I think.
>>>
>>>> Also, what happens when the system crashes, and one drive has
>>>> several hundred megabytes data more than the other one?
>>> This shouldn't be an issue as long as you occasionally run a scrub or
>>> balance. The scrub should find it and fix the missing data, and a
>>> balance would just rewrite it as proper RAID-1 as a matter of course.
>> It's similar (writes to just one drive, while the other is idle) when
>> removing (many) snapshots.
>>
>> Not sure if that's optimal behaviour.
>>
> [snip]
>
> Ideally, BTRFS should dispatch the first write for a block in a
> round-robin fashion among available devices.  This won't fix the
> underlying issue, but it will make it less of an issue for BTRFS.
>

More ideally, btrfs should dispatch them in parallel. This will likely 
be looked into for N-way mirroring. Having 3 or more copies and working 
in the current way would be far from optimal.



-- 
__________
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97


* Re: RAID-1 - suboptimal write performance?
  2014-05-16 21:36     ` Austin S Hemmelgarn
  2014-05-18 18:49       ` Brendan Hide
@ 2014-05-23 12:57       ` Roman Mamedov
  1 sibling, 0 replies; 6+ messages in thread
From: Roman Mamedov @ 2014-05-23 12:57 UTC (permalink / raw)
  To: Austin S Hemmelgarn; +Cc: Tomasz Chmielewski, Calvin Walton, linux-btrfs

On Fri, 16 May 2014 17:36:57 -0400
Austin S Hemmelgarn <ahferroin7@gmail.com> wrote:

> > It's similar (writes to just one drive, while the other is idle) when
> > removing (many) snapshots. 
> > 
> > Not sure if that's optimal behaviour.
> > 
> I think, after having looked at some of the code, that I know what is
> causing this (although my interpretation of the code may be completely
> off target).  As far as I can make out, BTRFS only dispatches writes to
> one device at a time

Yes, I can confirm this... yesterday I was writing large files to my Btrfs
RAID1 of two devices, and remembering this thread, decided to take a look at
how the writes are performed. And indeed in 'iostat' it was clear that only
one device works at a time. In my case, first one drive was writing at 80-100
MB/sec for 5-10 seconds, then activity on that one ceased entirely, and the
second drive started writing for the same period at similar speeds.

In effect this causes the whole operation to take about 2x longer than ideal
(or than on a single-device Btrfs). Surprising that this performance
drawback of Btrfs RAID1 is not more widely known or discussed.
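
The "about 2x" figure follows from simple arithmetic if each copy is
written in turn rather than concurrently. The sketch below is a toy
model under that assumption, not a measurement: the function name is
made up, the 10000 MB size is borrowed from the dd test earlier in the
thread, and 100 MB/s is taken from the upper end of the speeds quoted
above. It ignores caching, seeks and metadata writes.

```python
def copy_time_seconds(size_mb, device_mb_s, copies, parallel):
    """Time to write `copies` mirrored copies of `size_mb` of data."""
    per_copy = size_mb / device_mb_s
    # Parallel dispatch: all copies finish together.
    # Serial dispatch: each copy waits for the previous one.
    return per_copy if parallel else per_copy * copies


serial = copy_time_seconds(10000, 100, copies=2, parallel=False)
parallel = copy_time_seconds(10000, 100, copies=2, parallel=True)
print(serial, parallel, serial / parallel)  # 200.0 100.0 2.0
```

With copies=3 the same model gives a 3x slowdown for serial dispatch,
which is the concern raised earlier in the thread about N-way
mirroring.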

-- 
With respect,
Roman
