From: Phil Turmel
Subject: Re: best base / worst case RAID 5,6 write speeds
Date: Tue, 15 Dec 2015 08:53:18 -0500
To: Dallas Clement, Mark Knecht
Cc: John Stoffel, Linux-RAID

Hi Dallas,

On December 14, 2015 9:36:05 PM EST, Dallas Clement wrote:
>Hi Everyone. I have some very interesting news to report. I did a
>little bit more playing around with fio, doing sequential writes to a
>RAID 5 device with all 12 disks. I kept the block size at the 128K
>chunk aligned value of 1408K. But this time I varied the queue depth.
>These are my results for writing 10 GB of data:
>
>iodepth=1   =>  642 MB/s, # of RMWs = 11
>iodepth=4   => 1108 MB/s, # of RMWs = 6
>iodepth=8   =>  895 MB/s, # of RMWs = 7
>iodepth=16  =>  855 MB/s, # of RMWs = 11
>iodepth=32  =>  936 MB/s, # of RMWs = 11
>iodepth=64  =>  551 MB/s, # of RMWs = 5606
>iodepth=128 =>  554 MB/s, # of RMWs = 6333
>
>As you can see, something goes terribly wrong with async I/O at
>iodepth >= 64. Btw, not to be contentious, Phil, but I have checked
>multiple fio man pages and they clearly indicate that iodepth applies
>to async I/O, which this is (libaio). I don't see any mention of
>sequential writes being prohibited with async I/O. See
>https://github.com/axboe/fio/blob/master/HOWTO.

Hmmm. I misread that part. But do note the comment that you might not
achieve as many in-flight I/Os as you expect.

>However, maybe I'm missing something, and it sure looks from these
>results that there may be a connection.
>
>This is my fio job config:
>
>[job]
>ioengine=libaio
>iodepth=128
>prio=0
>rw=write
>bs=1408k
>filename=/dev/md10
>numjobs=1
>size=10g
>direct=1
>invalidate=1
>
>Incidentally, the very best write speed here (1108 MB/s with
>iodepth=4) comes out to about 100 MB/s per disk, which is pretty close
>to the worst-case inner-disk speed of 95.5 MB/s I had recorded
>earlier.

Very interesting indeed.

I wonder if the extra I/O in flight at high iodepth is consuming all
available stripe cache space, possibly not consistently. I'd raise and
lower the stripe cache in various combinations with different iodepth
values. Running out of stripe cache will cause premature RMWs.

Regards,

Phil
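
A rough sketch of that stripe-cache/iodepth sweep, assuming the array is
still /dev/md10, the fio options mirror the quoted job file, and the md
knob is /sys/block/md10/md/stripe_cache_size (the value is in 4 KiB pages
per member device, default 256, maximum 32768). It only reports fio's
write bandwidth; counting RMWs would need whatever instrumentation
produced the numbers quoted above. Adjust the ranges to taste.

#!/bin/bash
# Sweep md stripe cache size against fio iodepth and log write bandwidth.
# CAUTION: like the quoted job, this writes raw data to the array and will
# destroy any filesystem on /dev/md10.

dev=md10
knob=/sys/block/$dev/md/stripe_cache_size

for cache in 256 1024 4096 16384 32768; do
    echo "$cache" > "$knob"
    for depth in 1 4 8 16 32 64 128; do
        echo "=== stripe_cache_size=$cache iodepth=$depth ==="
        # Same options as the quoted job file, passed on the command line.
        fio --name=sweep --ioengine=libaio --iodepth="$depth" --prio=0 \
            --rw=write --bs=1408k --filename=/dev/$dev --numjobs=1 \
            --size=10g --direct=1 --invalidate=1 | grep 'WRITE:'
    done
done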