From: Dallas Clement
Subject: Re: best base / worst case RAID 5,6 write speeds
Date: Fri, 11 Dec 2015 17:30:26 -0600
References: <5669DB3B.30101@turmel.org>
	<22122.64143.522908.45940@quad.stoffel.home>
	<22123.9525.433754.283927@quad.stoffel.home>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Sender: linux-raid-owner@vger.kernel.org
To: John Stoffel
Cc: Mark Knecht, Phil Turmel, Linux-RAID
List-Id: linux-raid.ids

On Fri, Dec 11, 2015 at 3:24 PM, Dallas Clement wrote:
> On Fri, Dec 11, 2015 at 1:34 PM, John Stoffel wrote:
>>>>>>> "Dallas" == Dallas Clement writes:
>>
>> Dallas> On Fri, Dec 11, 2015 at 10:32 AM, John Stoffel wrote:
>>>>>>>>> "Dallas" == Dallas Clement writes:
>>>>
>> Dallas> Hi Mark.  I have three different controllers on this
>> Dallas> motherboard.  A Marvell 9485 controls 8 of the disks, and an
>> Dallas> Intel Cougar Point controls the 4 remaining disks.
>>>>
>>>> What type of PCIe slots are the controllers in?  And how fast are the
>>>> controllers/drives?  Are they SATA1/2/3 drives?
>>>>
>>>>>> If you're spinning in IO loops then it could be a driver issue.
>>>>
>> Dallas> It sure is looking like that.  I will try to profile the
>> Dallas> kernel threads today and maybe use blktrace as Phil
>> Dallas> recommended to see what is going on there.
>>>>
>>>> What kernel are you running?
>>>>
>> Dallas> This is pretty sad that 12 single-threaded fio jobs can bring
>> Dallas> this system to its knees.
>>>>
>>>> I think it might be better to lower the queue depth; you might just be
>>>> blowing out the controller caches... hard to know.
>>
>> Dallas> Hi John.
>>
>>>> What type of PCIe slots are the controllers in?  And how fast are the
>>>> controllers/drives?  Are they SATA1/2/3 drives?
>>
>> Dallas> The MV 9485 controller is attached to an Intel Sandy Bridge
>> Dallas> via PCIe Gen2 x8.  This one controls 8 of the disks.  The
>> Dallas> Intel Cougar Point is connected to the Intel Sandy Bridge via
>> Dallas> the DMI bus.
>>
>> So that should all be nice and fast.
>>
>> Dallas> All of the drives are SATA III; however, I do have two of the
>> Dallas> drives connected to SATA II ports on the Cougar Point.  These
>> Dallas> two drives used to be connected to SATA III ports on an MV
>> Dallas> 9125/9120 controller, but it had truly horrible write
>> Dallas> performance.  Moving them to the SATA II ports on the Cougar Point
>> Dallas> boosted their performance close to the same as the other drives.
>> Dallas> The remaining 10 drives are all connected to SATA III ports.
>>
>>>> What kernel are you running?
>>
>> Dallas> Right now I'm using 3.10.69, but I have tried the 4.2 kernel
>> Dallas> in Fedora 23 with similar results.
>>
>> Hmm... maybe if you're feeling adventurous you could try v4.4-rc4 and
>> see how it works.  You don't want anything between 4.2.6 and that
>> because of problems with blk req management.  I'm hazy on the details.
>>
>>>> I think it might be better to lower the queue depth; you might just be
>>>> blowing out the controller caches... hard to know.
>>
>> Dallas> Good idea.  I'll try lowering it and see what effect it has.
>>
>> It might also make sense to try your tests starting with just 1 disk,
>> and then adding one more disk, re-running the tests, then another
>> disk, re-running the tests, etc.
>>
>> Try with one on the MV, then one on the Cougar, then one on MV and one
>> on Cougar, etc.
>>
>> Try to see if you can spot where the performance falls off the cliff.
>>
>> Also, which disk scheduler are you using?  Instead of CFQ, you might
>> try deadline instead.
>>
>> As you can see, there's a TON of knobs to twiddle with; it's not a
>> simple thing to do at times.
>>
>> John
>
>> It might also make sense to try your tests starting with just 1 disk,
>> and then adding one more disk, re-running the tests, then another
>> disk, re-running the tests, etc.
>
>> Try to see if you can spot where the performance falls off the cliff.
>
> Okay, did this.  Interestingly, things did not fall off the cliff until
> I added the 12th disk.  I started adding disks one at a time,
> beginning with the Cougar Point.  The %iowait jumped up right away
> with that controller as well.
>
>> Also, which disk scheduler are you using?  Instead of CFQ, you might
>> try deadline instead.
>
> I'm using deadline.  I have definitely observed better performance
> with it vs CFQ.
>
> At this point I think I probably need to use a tool like blktrace to
> get more visibility than what I have with ps and iostat.

I have one more observation.  I tried varying the queue depth across
1, 4, 16, 32, 64, 128, and 256.  Surprisingly, all 12 disks are able to
handle this load with queue depth <= 128: each disk sits at 100%
utilization and writes 170-180 MB/s.  Things start to fall apart at
queue depth = 256 once the 12th disk is added.  The inflection point on
load average seems to be around queue depth = 32; the load average for
this 8-core system goes up to about 13 when I increase the queue depth
to 64.

So is my workload of 12 fio jobs writing sequential 2 MB blocks with
direct I/O just too abusive?  It seems so at high queue depths.

I started this discussion because my RAID 5 and RAID 6 write
performance is really bad.  If my system is able to write to all 12
disks at 170 MB/s in JBOD mode, I would expect a single fio job writing
to the array to reach roughly (N - 1) * X = 11 * 170 MB/s = 1870 MB/s.
However, I am getting < 700 MB/s at queue depth = 32 and < 600 MB/s at
queue depth = 256.  I get similarly disappointing results for RAID 6
writes.
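
For reference, the JBOD sweep described above can be reproduced with a
fio invocation roughly like the following.  This is only a sketch, not
the exact job file used in the thread: the device names (/dev/sdb
through /dev/sdm), the runtime, and the iodepth value being swept are
placeholders to adjust for your own system.

    #!/bin/bash
    # One sequential-write fio job per disk: 2 MB blocks, direct I/O.
    # WARNING: this writes raw data to the named block devices and will
    # destroy their contents -- use scratch disks only.
    IODEPTH=32                          # sweep 1, 4, 16, 32, 64, 128, 256
    for dev in /dev/sd{b..m}; do        # 12 data disks (hypothetical names)
        fio --name="seqwrite-${dev##*/}" \
            --filename="$dev" \
            --rw=write --bs=2M --direct=1 \
            --ioengine=libaio --iodepth="$IODEPTH" \
            --runtime=60 --time_based --group_reporting &
    done
    wait
    # Run "iostat -x 5" in another terminal to watch per-disk
    # utilization and MB/s while the jobs are running.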
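
Since the thread also touches on the deadline-vs-CFQ scheduler choice
and on using blktrace, the relevant commands look roughly like this;
sdb and the trace file names are placeholders, and the blkparse/btt
post-processing shown is just one reasonable way to summarize the
trace, not necessarily what was used here.

    # Check and set the I/O scheduler for one disk (sdb is a placeholder);
    # the active scheduler is shown in brackets, e.g. "noop [deadline] cfq".
    cat /sys/block/sdb/queue/scheduler
    echo deadline > /sys/block/sdb/queue/scheduler

    # Capture a 30-second block-layer trace while the fio job runs, then
    # summarize it to see where requests spend their time.
    blktrace -d /dev/sdb -w 30 -o sdb_trace
    blkparse -i sdb_trace -d sdb_trace.bin > sdb_trace.txt
    btt -i sdb_trace.bin | less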