From: "Wilson, Ellis" <ellisw@panasas.com>
To: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Understanding BTRFS RAID0 Performance
Date: Thu, 4 Oct 2018 21:33:29 +0000	[thread overview]
Message-ID: <54026c92-9cd1-2ac8-5747-c5405dd82087@panasas.com>

Hi all,

I'm attempting to understand a roughly 30% degradation in BTRFS RAID0 
for large read I/Os across six disks compared with ext4 atop mdadm RAID0.

Specifically, BTRFS matches ext4 atop mdadm in terms of single-threaded 
write and read, and multi-threaded write, but falls well short for 
multi-threaded read.  The relative discrepancy appears to grow as disks 
are added.  At 6 disks in a RAID0 (yes, I know, and I do
not care about data persistence as I have this solved at a different 
layer) I see approximately 1.3GB/s for ext4 atop mdadm, but only about 
950MB/s for BTRFS, both using four threads to read and write four 
different large files.  Across a large number of my nodes this 
aggregates to a sizable performance loss.
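
For context, the two layouts under comparison look roughly like the 
following (device names are placeholders, and I'm showing defaults here 
rather than our exact chunk/stripe settings):

   # six-disk md RAID0 with ext4 on top (mdadm chunk left at its default)
   mdadm --create /dev/md0 --level=0 --raid-devices=6 /dev/sd[b-g]
   mkfs.ext4 /dev/md0

   # six-disk btrfs with raid0 data (metadata left at the multi-device default)
   mkfs.btrfs -d raid0 /dev/sd[b-g]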

This has been a long and winding road for me, but to keep my question 
somewhat succinct: I'm down to the level of block tracing, and one thing 
that stands out between the two traces is that the number of rather small 
read I/Os reaching any one drive in the test is vastly higher for BTRFS 
than for mdadm RAID0, which I think explains (at least in part) the 
performance drop-off.  The read queue depth for BTRFS also hovers in the 
upper single digits, while the ext4/mdadm queue depth sits closer to 20.  
I'm unsure right now whether the two are related.
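
(For reference, per-device queue depth can be watched with something like 
the following; the column is avgqu-sz or aqu-sz depending on the sysstat 
version, and sdb is just a placeholder member disk:)

   iostat -dx 1 sdb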

Benchmark: FIO was used with the following command:
fio --name=read --rw=read --bs=1M --direct=0 --size=16G --numjobs=4 \
    --runtime=120 --group_reporting

The block sizes, and the count of I/Os at each size, that I'm seeing for 
the two cases come in like the following (my max_segment_kb_size is 4K, 
hence the typical upper end seen below):

BTRFS:
  Count  Read I/O Size (sectors)
   21849 128
      18 640
       9 768
       3 1280
       9 1408
       3 2048
       3 2560
    1011 2688
     507 2816

ext4 on mdadm RAID0:
  Count  Read I/O Size (sectors)
       9 8
       3 16
       5 256
       5 768
      19 1024
     716 1536
       5 1592
       5 2504
     695 2560
      24 4096
      21 6656
     477 8192
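
(Histograms like the ones above can be produced from a blktrace capture 
with something along these lines -- sdb is a placeholder member disk, and 
the awk field positions assume blkparse's default output format, with 
sizes reported in 512-byte sectors:)

   blktrace -d /dev/sdb -o sdb -w 30
   blkparse -i sdb | awk '$6 == "D" && $7 ~ /R/ {print $10}' | sort -n | uniq -c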

Before I dive into the BTRFS source or try tracing in a different way, I 
wanted to see if this is a well-known artifact of BTRFS RAID0 and, even 
better, whether there are any tunables available for RAID0 in BTRFS I 
could play with.  On the tuning front, the man pages for mkfs.btrfs and 
btrfstune seemed...sparse.
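
So far the only obvious tuning surface to compare between the two setups 
seems to be the generic block-layer sysfs knobs, e.g. (sdb again a 
placeholder member disk; md0 only exists in the mdadm case):

   grep . /sys/block/sdb/queue/max_sectors_kb \
          /sys/block/sdb/queue/max_segments \
          /sys/block/sdb/queue/read_ahead_kb \
          /sys/block/sdb/queue/nr_requests
   grep . /sys/block/md0/queue/read_ahead_kb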

Any help or pointers are greatly appreciated!

Thanks,

ellis

Thread overview: 6+ messages
2018-10-04 21:33 Wilson, Ellis [this message]
2018-10-05  8:45 ` Understanding BTRFS RAID0 Performance Nikolay Borisov
2018-10-05 10:40 ` Duncan
2018-10-05 15:29   ` Wilson, Ellis
2018-10-06  0:34     ` Duncan
2018-10-08 12:20       ` Austin S. Hemmelgarn
