From mboxrd@z Thu Jan 1 00:00:00 1970
From: Adam Goryachev
Subject: Re: RAID performance - new kernel results
Date: Sun, 17 Feb 2013 20:52:14 +1100
Message-ID: <5120A84E.4020702@websitemanagers.com.au>
References: <51134E43.7090508@websitemanagers.com.au>
 <51137FB8.6060003@websitemanagers.com.au>
 <5113A2D6.20104@websitemanagers.com.au>
 <51150475.2020803@websitemanagers.com.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
In-Reply-To: <51150475.2020803@websitemanagers.com.au>
Sender: linux-raid-owner@vger.kernel.org
To: Dave Cundiff
Cc: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

On 09/02/13 00:58, Adam Goryachev wrote:
> On 08/02/13 02:32, Dave Cundiff wrote:
>> On Thu, Feb 7, 2013 at 7:49 AM, Adam Goryachev
>> wrote:
>>>> I definitely see that. See below for a FIO run I just did on one of
>>>> my RAID10s
>
> OK, some fio results.
>
> Firstly, this is done against /tmp, which is on the single standalone
> Intel SSD used for the rootfs (so it shows some performance level of
> the chipset, I presume):
>
> root@san1:/tmp/testing# fio /root/test.fio
> seq-read: (g=0): rw=read, bs=64K-64K/64K-64K, ioengine=libaio, iodepth=32
> seq-write: (g=1): rw=write, bs=64K-64K/64K-64K, ioengine=libaio, iodepth=32
> Starting 2 processes
> seq-read: Laying out IO file(s) (1 file(s) / 4096MB)
> Jobs: 1 (f=1): [_W] [100.0% done] [0K/137M /s] [0/2133 iops] [eta 00m:00s]
> seq-read: (groupid=0, jobs=1): err= 0: pid=4932
>   read : io=4096MB, bw=518840KB/s, iops=8106, runt=  8084msec
> seq-write: (groupid=1, jobs=1): err= 0: pid=5138
>   write: io=4096MB, bw=136405KB/s, iops=2131, runt= 30749msec
>
> Run status group 0 (all jobs):
>    READ: io=4096MB, aggrb=518840KB/s, minb=531292KB/s, maxb=531292KB/s,
> mint=8084msec, maxt=8084msec
>
> Run status group 1 (all jobs):
>   WRITE: io=4096MB, aggrb=136404KB/s, minb=139678KB/s, maxb=139678KB/s,
> mint=30749msec, maxt=30749msec
>
> Disk stats (read/write):
>   sda: ios=66570/66363, merge=10297/10453, ticks=259152/993304,
> in_queue=1252592, util=99.34%
>
> PS, I'm assuming I should omit the extra output, similar to what you
> did... If I should include all the info, I can re-run and provide it.
>
> This seems to indicate a read speed of 531MB/s and a write speed of
> 139MB/s, which to me says something is wrong. I thought writes would be
> slower, but not that much slower?
>
> Moving on, I've stopped the secondary DRBD, created a new LV (testlv)
> of 15G, formatted it with ext4, mounted it, and re-ran the test:
>
> seq-read: (groupid=0, jobs=1): err= 0: pid=19578
>   read : io=4096MB, bw=640743KB/s, iops=10011, runt=  6546msec
> seq-write: (groupid=1, jobs=1): err= 0: pid=19997
>   write: io=4096MB, bw=208765KB/s, iops=3261, runt= 20091msec
>
> Run status group 0 (all jobs):
>    READ: io=4096MB, aggrb=640743KB/s, minb=656120KB/s, maxb=656120KB/s,
> mint=6546msec, maxt=6546msec
>
> Run status group 1 (all jobs):
>   WRITE: io=4096MB, aggrb=208765KB/s, minb=213775KB/s, maxb=213775KB/s,
> mint=20091msec, maxt=20091msec
>
> Disk stats (read/write):
>   dm-14: ios=65536/64841, merge=0/0, ticks=206920/469464,
> in_queue=676580, util=98.89%, aggrios=0/0, aggrmerge=0/0, aggrticks=0/0,
> aggrin_queue=0, aggrutil=0.00%
>   drbd2: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=-nan%
>
> dm-14 is the testlv
>
> So, this indicates a max read speed of 656MB/s and a write speed of
> 213MB/s; again, the write speed is very slow (about 30% of the read
> speed).
>
> With these figures, just 2 x 1Gbps links would saturate the write
> performance of this RAID5 array.
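For reference, a fio job file along these lines matches the job
parameters shown in the output above; treat the filename and direct=1
lines as illustrative rather than copied from the actual test.fio:

[global]
ioengine=libaio
iodepth=32
bs=64k
size=4g
# direct I/O assumed for this sketch; the real job file may differ
direct=1
# example path only; later runs pointed filename= at the raw LV instead
filename=/tmp/testing/fio.test

[seq-read]
rw=read

[seq-write]
# stonewall makes the write job wait for the reads and report as group 1
stonewall
rw=write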
>
> Finally, changing the fio config file to point filename=/dev/vg0/testlv
> (ie, raw LV, no filesystem):
>
> seq-read: (groupid=0, jobs=1): err= 0: pid=10986
>   read : io=4096MB, bw=652607KB/s, iops=10196, runt=  6427msec
> seq-write: (groupid=1, jobs=1): err= 0: pid=11177
>   write: io=4096MB, bw=202252KB/s, iops=3160, runt= 20738msec
>
> Run status group 0 (all jobs):
>    READ: io=4096MB, aggrb=652606KB/s, minb=668269KB/s, maxb=668269KB/s,
> mint=6427msec, maxt=6427msec
>
> Run status group 1 (all jobs):
>   WRITE: io=4096MB, aggrb=202252KB/s, minb=207106KB/s, maxb=207106KB/s,
> mint=20738msec, maxt=20738msec
>
> Not much difference, which I didn't really expect...
>
> So, should I be concerned about these results? Do I need to try to
> re-run these tests at a lower layer (ie, remove DRBD and/or LVM from
> the picture)? Are these meaningless, and should I be running a
> different test/set of tests/etc?

OK, I've upgraded to:
Linux san1 3.2.0-0.bpo.4-amd64 #1 SMP Debian 3.2.35-2~bpo60+1 x86_64 GNU/Linux

I also upgraded to iscsitarget from testing, as there seemed to be a few
fixes there, though not the one I was really after:
ii  iscsitarget       1.4.20.2-10.1  iSCSI Enterprise Target userland tools
ii  iscsitarget-dkms  1.4.20.2-10.1  iSCSI Enterprise Target kernel module source - dkms version

Then I re-ran the fio tests from above. Here is what I get when testing
against an LV which has a snapshot against it:

seq-read: (groupid=0, jobs=1): err= 0: pid=10168
  read : io=4096MB, bw=1920MB/s, iops=30724, runt=  2133msec
seq-write: (groupid=1, jobs=1): err= 0: pid=10169
  write: io=2236MB, bw=38097KB/s, iops=595, runt= 60094msec

Run status group 0 (all jobs):
   READ: io=4096MB, aggrb=1920MB/s, minb=1966MB/s, maxb=1966MB/s,
mint=2133msec, maxt=2133msec

Run status group 1 (all jobs):
  WRITE: io=2236MB, aggrb=38097KB/s, minb=39011KB/s, maxb=39011KB/s,
mint=60094msec, maxt=60094msec

So, 1920MB/s read sounds good to me (almost 3 times faster); the write
performance, however, is pretty dismal :(

After removing the snapshot, here is another look:

seq-read: (groupid=0, jobs=1): err= 0: pid=10222
  read : io=4096MB, bw=2225MB/s, iops=35598, runt=  1841msec
seq-write: (groupid=1, jobs=1): err= 0: pid=10223
  write: io=4096MB, bw=111666KB/s, iops=1744, runt= 37561msec

Run status group 0 (all jobs):
   READ: io=4096MB, aggrb=2225MB/s, minb=2278MB/s, maxb=2278MB/s,
mint=1841msec, maxt=1841msec

Run status group 1 (all jobs):
  WRITE: io=4096MB, aggrb=111666KB/s, minb=114346KB/s, maxb=114346KB/s,
mint=37561msec, maxt=37561msec

A big improvement: 111MB/s write, and even better reads. However, this
write speed still seems pretty slow.

Another run, after stopping the secondary DRBD sync:

seq-read: (groupid=0, jobs=1): err= 0: pid=10708
  read : io=4096MB, bw=2242MB/s, iops=35870, runt=  1827msec
seq-write: (groupid=1, jobs=1): err= 0: pid=10709
  write: io=4096MB, bw=560661KB/s, iops=8760, runt=  7481msec

Run status group 0 (all jobs):
   READ: io=4096MB, aggrb=2242MB/s, minb=2296MB/s, maxb=2296MB/s,
mint=1827msec, maxt=1827msec

Run status group 1 (all jobs):
  WRITE: io=4096MB, aggrb=560660KB/s, minb=574116KB/s, maxb=574116KB/s,
mint=7481msec, maxt=7481msec

Now THAT is what I was hoping to see: 2,242MB/s read, enough to saturate
18 x 1Gbps ports, and 560MB/s write, enough for 4.5 x 1Gbps, which is
more than the maximum from 2 machines.

So as long as I have the secondary DRBD disconnected during the day (I
do), and don't have any LVM snapshots (I don't, due to performance),
then things should be a lot better.
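In case it helps anyone else reading this, the snapshot removal and DRBD
disconnect amount to roughly the following; the LV and resource names
here are examples only, not my actual ones:

# list the LVs and confirm which one is the snapshot (names are examples)
lvs vg0
# remove the snapshot that was dragging write performance down
lvremove /dev/vg0/testlv-snap
# disconnect the secondary so the primary no longer waits on replication
drbdadm disconnect r2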
Now, looking back at all this, I think I was probably suffering from a
whole bunch of problems:

1) Write cache enabled on Windows
2) iSCSI not configured to deal properly with intermittent/slow
   responses, queueing forever instead of returning an error
3) Not using multipath IO
4) Server storage performance too slow to keep up (due to a kernel bug
   in Debian stable squeeze/2.6.32)
5) Using LVM snapshots, which degraded performance
6) Using DRBD during the day with spinning disks on the secondary
   (it couldn't keep up and slowed down the primary)
7) Sharing a single ethernet for user traffic and SAN traffic, allowing
   one protocol to flood/block the other
8) Using RR bonding with more ports on the SAN than on the client,
   causing flooding, 802.3x pause frames, etc

I can't say that any one of the above fixed the problem; it has been
getting progressively better as each item has been addressed. I'd like
to think that it's very close to done now. The only thing I still need
to do is get rid of the bond0 on the SAN, change to 8 individual IPs,
and configure the clients to talk to two of the IPs on the SAN, but
only one over each ethernet interface.

I'd again like to say thanks to all the people who've helped out with
this drama. I did forget to take those photos, but I'll take some next
time I'm in. I think I did a pretty good job overall, and it looks
reasonably neat (by my standards, anyway :)

Regards,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au