From: Stan Hoeppner <stan@hardwarefreak.com>
To: Adam Goryachev <mailinglists@websitemanagers.com.au>
Cc: Dave Cundiff <syshackmin@gmail.com>, linux-raid@vger.kernel.org
Subject: Re: RAID performance
Date: Fri, 08 Feb 2013 15:42:25 -0600
Message-ID: <51157141.90205@hardwarefreak.com>
In-Reply-To: <51150475.2020803@websitemanagers.com.au>

On 2/8/2013 7:58 AM, Adam Goryachev wrote:

> Firstly, this is done against /tmp which is on the single standalone
> Intel SSD used for the rootfs (shows some performance level of the
> chipset I presume):

Chipset performance shouldn't be an issue here, but it can't be ruled out.

> root@san1:/tmp/testing# fio /root/test.fio
> seq-read: (g=0): rw=read, bs=64K-64K/64K-64K, ioengine=libaio, iodepth=32
> seq-write: (g=1): rw=write, bs=64K-64K/64K-64K, ioengine=libaio, iodepth=32
> Starting 2 processes
> seq-read: Laying out IO file(s) (1 file(s) / 4096MB)
> Jobs: 1 (f=1): [_W] [100.0% done] [0K/137M /s] [0/2133 iops] [eta 00m:00s]
> seq-read: (groupid=0, jobs=1): err= 0: pid=4932
>   read : io=4096MB, bw=518840KB/s, iops=8106, runt=  8084msec
> seq-write: (groupid=1, jobs=1): err= 0: pid=5138
>   write: io=4096MB, bw=136405KB/s, iops=2131, runt= 30749msec
> Run status group 0 (all jobs):
>    READ: io=4096MB, aggrb=518840KB/s, minb=531292KB/s, maxb=531292KB/s,
> mint=8084msec, maxt=8084msec
> 
> Run status group 1 (all jobs):
>   WRITE: io=4096MB, aggrb=136404KB/s, minb=139678KB/s, maxb=139678KB/s,
> mint=30749msec, maxt=30749msec
> 
> Disk stats (read/write):
>   sda: ios=66570/66363, merge=10297/10453, ticks=259152/993304,
> in_queue=1252592, util=99.34%
...
> This seems to indicate a read speed of 531M and write of 139M, which to
> me says something is wrong. I thought write speed is slower, but not
> that much slower?
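
For reference, the parameters in that output correspond to a job file
roughly like the one below.  This is a reconstruction, not your actual
/root/test.fio; in particular direct=1 and the directory line are
assumptions on my part:

  [global]
  ioengine=libaio
  direct=1
  bs=64k
  iodepth=32
  size=4g
  directory=/tmp/testing

  [seq-read]
  rw=read

  [seq-write]
  stonewall
  rw=write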

Study this:
http://www.anandtech.com/show/5508/intel-ssd-520-review-cherryville-brings-reliability-to-sandforce/3

That's the 240GB version of your 520S.  Note the write tests are all
well over 300MB/s, with one sequential write test reaching almost
400MB/s, and the 480GB version should be even better.  Those tests use
4KB *aligned* IOs.  If you've partitioned the SSDs and your partition
boundaries fall in the middle of erase blocks instead of on their
boundaries, your IOs will be misaligned and performance will suffer.
Given the numbers you're seeing with fio, this may be part of the
problem.
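
To check, look at where the partitions actually start, e.g. (sda here
standing in for one of the array members):

  # Partition start sectors; a start divisible by 2048 (1MiB) is
  # aligned regardless of the SSD's erase block size:
  fdisk -lu /dev/sda
  # or
  parted /dev/sda unit s print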

> Moving on, I've stopped the secondary DRBD, created a new LV (testlv) of
> 15G, and formatted with ext4, mounted it, and re-run the test:
> 
> seq-read: (groupid=0, jobs=1): err= 0: pid=19578
>   read : io=4096MB, bw=640743KB/s, iops=10011, runt=  6546msec
> seq-write: (groupid=1, jobs=1): err= 0: pid=19997
>   write: io=4096MB, bw=208765KB/s, iops=3261, runt= 20091msec
> Run status group 0 (all jobs):
>    READ: io=4096MB, aggrb=640743KB/s, minb=656120KB/s, maxb=656120KB/s,
> mint=6546msec, maxt=6546msec
> 
> Run status group 1 (all jobs):
>   WRITE: io=4096MB, aggrb=208765KB/s, minb=213775KB/s, maxb=213775KB/s,
> mint=20091msec, maxt=20091msec
> 
> Disk stats (read/write):
>   dm-14: ios=65536/64841, merge=0/0, ticks=206920/469464,
> in_queue=676580, util=98.89%, aggrios=0/0, aggrmerge=0/0, aggrticks=0/0,
> aggrin_queue=0, aggrutil=0.00%
>     drbd2: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=-nan%
> 
> dm-14 is the testlv
> 
> So, this indicates a max read speed of 656M and write of 213M, again,
> write is very slow (about 30%).
> 
> With these figures, just 2 x 1Gbps links would saturate the write
> performance of this RAID5 array.

You might get close with a synthetic test, but you won't bottleneck at
the array with CIFS traffic from that DC.  Once you get the network
problems straightened out you may bottleneck the SSDs with multiple
large sequential writes, assuming the block IO issues haven't been
fixed by then.

I recommend blowing away the partitions entirely and building your md
array on bare drives.  As part of this you will have to recreate the
LVs you're exporting via iSCSI.  Make sure all LVs are aligned to the
underlying md device geometry.  This will eliminate any possible
alignment issues.
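
A rough sketch of what that might look like (device names, the 64K
chunk and the LVM names are assumptions, the DRBD layer is left out
entirely, and this destroys all data on the drives):

  # Stop the old array and wipe the old metadata from whatever the
  # current member partitions are (DESTRUCTIVE):
  mdadm --stop /dev/md0
  mdadm --zero-superblock /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1

  # Rebuild directly on the bare drives, no partitions:
  mdadm --create /dev/md0 --level=5 --raid-devices=5 --chunk=64 \
      /dev/sd[b-f]

  # Align LVM data to the full stripe width (4 data disks x 64K = 256K):
  pvcreate --dataalignment 256k /dev/md0
  vgcreate vg0 /dev/md0
  lvcreate -L 15G -n testlv vg0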

Whether it does or not, given what I've learned of this environment, I'd
go ahead and install one of the LSI 9207-8i 6Gb/s SAS/SATA HBAs I
mentioned earlier, in SLOT6 for full bandwidth, and move all the SSDs in
the array over to it.  This will give you 600MB/s peak bandwidth per
SSD, eliminating any possible issues created by running them at SATA2
link speed, eliminate any possible issues with the C204 southbridge,
and give you substantially higher controller IOPS: 700,000.  If the
SSDs are not on a chassis backplane you'll need two SFF-8087 breakout
cables to connect the drives to the card.  The "kit" version of these
cards comes with the cables, and runs ~$350 USD.
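
In the meantime you can at least confirm what link speed each SSD
negotiated on the C204 ports, e.g. (sdb standing in for one of the
array members):

  # Negotiated SATA link speed per port (look for 3.0 vs 6.0 Gbps):
  dmesg | grep -i 'SATA link up'
  # or per drive:
  smartctl -i /dev/sdb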

> Finally, changing the fio config file to point filename=/dev/vg0/testlv
> (ie, raw LV, no filesystem):
> seq-read: (groupid=0, jobs=1): err= 0: pid=10986
>   read : io=4096MB, bw=652607KB/s, iops=10196, runt=  6427msec
> seq-write: (groupid=1, jobs=1): err= 0: pid=11177
>   write: io=4096MB, bw=202252KB/s, iops=3160, runt= 20738msec
> Run status group 0 (all jobs):
>    READ: io=4096MB, aggrb=652606KB/s, minb=668269KB/s, maxb=668269KB/s,
> mint=6427msec, maxt=6427msec
> 
> Run status group 1 (all jobs):
>   WRITE: io=4096MB, aggrb=202252KB/s, minb=207106KB/s, maxb=207106KB/s,
> mint=20738msec, maxt=20738msec
> 
> Not much difference, which I didn't really expect...
> 
> So, should I be concerned about these results? Do I need to try to
> re-run these tests at a lower layer (ie, remove DRBD and/or LVM from the
> picture)? Are these meaningless and I should be running a different
> test/set of tests/etc ?

The ~200MB/s sequential writes are a bit alarming, as is the ~500MB/s
read rate.  Five SSDs in RAID5 should be able to do much, much more,
especially for reads; theoretically you should be able to squeeze 2GB/s
of read throughput out of this array.  Writes will always be slower on
RAID5, even with SSDs, but they shouldn't be this much slower, because
SSD RMW latency is so much lower, and with large sequential writes you
shouldn't be hitting RMW cycles at all.  If DRBD is mirroring the
md/RAID5 device it will definitely skew your test results lower, but
not drastically so.  I can't recall whether you stated the size of your
md stripe cache; if it's too small, that may be hurting performance.
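
You can check and change it on the fly, e.g. assuming the array is md0:

  # Current value, in pages (4K) per member device:
  cat /sys/block/md0/md/stripe_cache_size
  # Try a larger cache for the next test run; memory used is
  # stripe_cache_size x 4K x number of drives (8192 x 4K x 5 = ~160MB):
  echo 8192 > /sys/block/md0/md/stripe_cache_size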

Something we've only briefly touched on so far is the single write
thread bottleneck of the md/RAID5 driver.  To verify whether this is
part of the problem, capture per-core CPU utilization during your write
tests and see whether md is eating all of one core.  If it is, your
RAID5 write speed will never get better on this mobo/CPU combo until
you move to a kernel with the appropriate patches.  At only 200MB/s I
doubt that's the case, but check it anyway; once you get the IO problem
fixed you may well run into the single-thread limit, so check again at
that point.
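
Something like this while the fio write job is running will show it,
again assuming the array is md0:

  # Per-core utilization, 1 second intervals (sysstat package):
  mpstat -P ALL 1
  # or run top and watch whether the md write thread, shown as
  # [md0_raid5], is pinned at ~100% of a single core.
  top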

IIRC, people on this list are hitting ~400-500MB/s sequential writes
with RAID5/6/10 rust arrays, so I don't think the write thread is your
problem.  Not yet anyway.

-- 
Stan



Thread overview: 131+ messages
2013-02-07  6:48 RAID performance Adam Goryachev
2013-02-07  6:51 ` Adam Goryachev
2013-02-07  8:24   ` Stan Hoeppner
2013-02-07  7:02 ` Carsten Aulbert
2013-02-07 10:12   ` Adam Goryachev
2013-02-07 10:29     ` Carsten Aulbert
2013-02-07 10:41       ` Adam Goryachev
2013-02-07  8:11 ` Stan Hoeppner
2013-02-07 10:05   ` Adam Goryachev
2013-02-16  4:33     ` RAID performance - *Slow SSDs likely solved* Stan Hoeppner
     [not found]       ` <cfefe7a6-a13f-413c-9e3d-e061c68dc01b@email.android.com>
2013-02-17  5:01         ` Stan Hoeppner
2013-02-08  7:21   ` RAID performance Adam Goryachev
2013-02-08  7:37     ` Chris Murphy
2013-02-08 13:04     ` Stan Hoeppner
2013-02-07  9:07 ` Dave Cundiff
2013-02-07 10:19   ` Adam Goryachev
2013-02-07 11:07     ` Dave Cundiff
2013-02-07 12:49       ` Adam Goryachev
2013-02-07 12:53         ` Phil Turmel
2013-02-07 12:58           ` Adam Goryachev
2013-02-07 13:03             ` Phil Turmel
2013-02-07 13:08               ` Adam Goryachev
2013-02-07 13:20                 ` Mikael Abrahamsson
2013-02-07 22:03               ` Chris Murphy
2013-02-07 23:48                 ` Chris Murphy
2013-02-08  0:02                   ` Chris Murphy
2013-02-08  6:25                     ` Adam Goryachev
2013-02-08  7:35                       ` Chris Murphy
2013-02-08  8:34                         ` Chris Murphy
2013-02-08 14:31                           ` Adam Goryachev
2013-02-08 14:19                         ` Adam Goryachev
2013-02-08  6:15                   ` Adam Goryachev
2013-02-07 15:32         ` Dave Cundiff
2013-02-08 13:58           ` Adam Goryachev
2013-02-08 21:42             ` Stan Hoeppner [this message]
2013-02-14 22:42               ` Chris Murphy
2013-02-15  1:10                 ` Adam Goryachev
2013-02-15  1:40                   ` Chris Murphy
2013-02-15  4:01                     ` Adam Goryachev
2013-02-15  5:14                       ` Chris Murphy
2013-02-15 11:10                         ` Adam Goryachev
2013-02-15 23:01                           ` Chris Murphy
2013-02-17  9:52             ` RAID performance - new kernel results Adam Goryachev
2013-02-18 13:20               ` RAID performance - new kernel results - 5x SSD RAID5 Stan Hoeppner
2013-02-20 17:10                 ` Adam Goryachev
2013-02-21  6:04                   ` Stan Hoeppner
2013-02-21  6:40                     ` Adam Goryachev
2013-02-21  8:47                       ` Joseph Glanville
2013-02-22  8:10                       ` Stan Hoeppner
2013-02-24 20:36                         ` Stan Hoeppner
2013-03-01 16:06                           ` Adam Goryachev
2013-03-02  9:15                             ` Stan Hoeppner
2013-03-02 17:07                               ` Phil Turmel
2013-03-02 23:48                                 ` Stan Hoeppner
2013-03-03  2:35                                   ` Phil Turmel
2013-03-03 15:19                                 ` Adam Goryachev
2013-03-04  1:31                                   ` Phil Turmel
2013-03-04  9:39                                     ` Adam Goryachev
2013-03-04 12:41                                       ` Phil Turmel
2013-03-04 12:42                                       ` Stan Hoeppner
2013-03-04  5:25                                   ` Stan Hoeppner
2013-03-03 17:32                               ` Adam Goryachev
2013-03-04 12:20                                 ` Stan Hoeppner
2013-03-04 16:26                                   ` Adam Goryachev
2013-03-05  9:30                                     ` RAID performance - 5x SSD RAID5 - effects of stripe cache sizing Stan Hoeppner
2013-03-05 15:53                                       ` Adam Goryachev
2013-03-07  7:36                                         ` Stan Hoeppner
2013-03-08  0:17                                           ` Adam Goryachev
2013-03-08  4:02                                             ` Stan Hoeppner
2013-03-08  5:57                                               ` Mikael Abrahamsson
2013-03-08 10:09                                                 ` Stan Hoeppner
2013-03-08 14:11                                                   ` Mikael Abrahamsson
2013-02-21 17:41                     ` RAID performance - new kernel results - 5x SSD RAID5 David Brown
2013-02-23  6:41                       ` Stan Hoeppner
2013-02-23 15:57               ` RAID performance - new kernel results John Stoffel
2013-03-01 16:10                 ` Adam Goryachev
2013-03-10 15:35                   ` Charles Polisher
2013-04-15 12:23                     ` Adam Goryachev
2013-04-15 15:31                       ` John Stoffel
2013-04-17 10:15                         ` Adam Goryachev
2013-04-15 16:49                       ` Roy Sigurd Karlsbakk
2013-04-15 20:16                       ` Phil Turmel
2013-04-16 19:28                         ` Roy Sigurd Karlsbakk
2013-04-16 21:03                           ` Phil Turmel
2013-04-16 21:43                           ` Stan Hoeppner
2013-04-15 20:42                       ` Stan Hoeppner
2013-02-08  3:32       ` RAID performance Stan Hoeppner
2013-02-08  7:11         ` Adam Goryachev
2013-02-08 17:10           ` Stan Hoeppner
2013-02-08 18:44             ` Adam Goryachev
2013-02-09  4:09               ` Stan Hoeppner
2013-02-10  4:40                 ` Adam Goryachev
2013-02-10 13:22                   ` Stan Hoeppner
2013-02-10 16:16                     ` Adam Goryachev
2013-02-10 17:19                       ` Mikael Abrahamsson
2013-02-10 21:57                         ` Adam Goryachev
2013-02-11  3:41                           ` Adam Goryachev
2013-02-11  4:33                           ` Mikael Abrahamsson
2013-02-12  2:46                       ` Stan Hoeppner
2013-02-12  5:33                         ` Adam Goryachev
2013-02-13  7:56                           ` Stan Hoeppner
2013-02-13 13:48                             ` Phil Turmel
2013-02-13 16:17                             ` Adam Goryachev
2013-02-13 20:20                               ` Adam Goryachev
2013-02-14 12:22                                 ` Stan Hoeppner
2013-02-15 13:31                                   ` Stan Hoeppner
2013-02-15 14:32                                     ` Adam Goryachev
2013-02-16  1:07                                       ` Stan Hoeppner
2013-02-16 17:19                                         ` Adam Goryachev
2013-02-17  1:42                                           ` Stan Hoeppner
2013-02-17  5:02                                             ` Adam Goryachev
2013-02-17  6:28                                               ` Stan Hoeppner
2013-02-17  8:41                                                 ` Adam Goryachev
2013-02-17 13:58                                                   ` Stan Hoeppner
2013-02-17 14:46                                                     ` Adam Goryachev
2013-02-19  8:17                                                       ` Stan Hoeppner
2013-02-20 16:45                                                         ` Adam Goryachev
2013-02-21  0:45                                                           ` Stan Hoeppner
2013-02-21  3:10                                                             ` Adam Goryachev
2013-02-22 11:19                                                               ` Stan Hoeppner
2013-02-22 15:25                                                                 ` Charles Polisher
2013-02-23  4:14                                                                   ` Stan Hoeppner
2013-02-12  7:34                         ` Mikael Abrahamsson
2013-02-08  7:17         ` Adam Goryachev
2013-02-07 12:01     ` Brad Campbell
2013-02-07 12:37       ` Adam Goryachev
2013-02-07 17:12         ` Fredrik Lindgren
2013-02-08  0:00           ` Adam Goryachev
2013-02-11 19:49   ` Roy Sigurd Karlsbakk
2013-02-11 20:30     ` Dave Cundiff
2013-02-07 11:32 ` Mikael Abrahamsson
