From mboxrd@z Thu Jan 1 00:00:00 1970
From: Phil Turmel
Subject: Re: best base / worst case RAID 5,6 write speeds
Date: Thu, 17 Dec 2015 17:40:48 -0500
Message-ID: <567339F0.9000209@turmel.org>
References: <22128.11867.847781.946791@quad.stoffel.home> <22128.35881.182823.556362@quad.stoffel.home> <5672BB7A.4050808@turmel.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Sender: linux-raid-owner@vger.kernel.org
To: Dallas Clement
Cc: Mark Knecht , John Stoffel , Linux-RAID
List-Id: linux-raid.ids

On 12/17/2015 04:08 PM, Dallas Clement wrote:

> I am still in the process of collecting a bunch of performance data.
> But so far, it is shocking to see the throughput difference when
> blocks written are stripe aligned.

Unaligned random writes have at least a 4x multiplier on raid5 and 6x
on raid6, per my earlier explanation. Why does this surprise you? It's
parity raid. This is why users with heavy random workloads are pointed
at raid1 and raid10. I like raid10,f3 for VM host images and databases.

> However, in the non-ideal world it
> is not always possible to ensure that clients are writing blocks of
> data which are stripe aligned.

Hardly possible at all, except for bulk writes of large media files,
and then only if you are writing one stream at a time to an otherwise
idle storage stack. Not very realistic in a general-purpose storage
appliance. "General purpose" just isn't very sequential.

> If the goal is to reduce the # of RMWs
> it seems like writing big blocks would also help for sequential
> workloads where large quantities of data are being written.

The goal is to be able to read later what you need to write now. Unless
you have unlimited $ to spend, you have to balance speed, redundancy,
and capacity. As they say, pick two. Lots of spindles is generally good.

Raid5 is great for capacity, good for redundancy, and marginal for speed.
Raid6 is great for capacity, great for redundancy, and pitiful for speed.
Raid10,f2 is great for speed, poor for capacity, and good for redundancy.
Raid10,f3 is great for speed, pitiful for capacity, and great for redundancy.

> Can any
> of you think of anything else that can be tuned in the kernel to
> reduce # of RMWs in the case where blocks are not stripe aligned? Is
> it a bad idea to mess with the timing of the stripe cache?

You can't really hold those writes for long, as any serious application
is going to call fdatasync at short intervals, for algorithmic integrity
reasons.

On random workloads, you simply have no choice but to do RMWs. Your
only out is to make complete chunk stripes smaller than your
application's typical write size. That raises the odds that any
particular write will be aligned or mostly aligned. Have you tried 4k
chunks?

Phil
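
P.S. A rough sketch of the arithmetic behind the 4x/6x multipliers and
the small-chunk suggestion, in case it helps. This is a simplified
model, not how md actually schedules I/O: it assumes a pure
read-modify-write for small writes (md can also do a reconstruct-write,
which is ignored here), and the function names, the 4-drive raid5
layout (3 data disks), and the 64 KiB write size are all made up for
illustration.

```python
def small_write_ios(parity_disks):
    """Device I/Os for one sub-chunk write done as read-modify-write.

    Simplified model (assumption): read the old data block and each old
    parity block, then write the new data block and each new parity block.
    """
    reads = 1 + parity_disks
    writes = 1 + parity_disks
    return reads + writes


def full_stripe_fraction(offset, length, chunk, data_disks):
    """Fraction of a write's bytes that cover complete stripes.

    Bytes in complete stripes can be written with freshly computed
    parity and need no reads; the remainder needs an RMW.
    """
    stripe = chunk * data_disks
    first_full = -(-offset // stripe) * stripe          # round start up
    last_full = ((offset + length) // stripe) * stripe  # round end down
    aligned = max(0, last_full - first_full)
    return aligned / length


KIB = 1024

print("raid5 small write:", small_write_ios(1), "I/Os")
print("raid6 small write:", small_write_ios(2), "I/Os")

# Hypothetical 4-drive raid5 (3 data disks), one 64 KiB write at offset 0.
for chunk in (512 * KIB, 64 * KIB, 4 * KIB):
    frac = full_stripe_fraction(0, 64 * KIB, chunk, data_disks=3)
    print(f"{chunk // KIB:>4}K chunk: {frac:.0%} of the write is full-stripe")
```

Under these assumptions, the 512K and 64K chunk sizes leave none of the
64 KiB write covering a complete stripe, while with 4k chunks (12 KiB
stripes) all but the last 4 KiB does. Real numbers depend on where the
writes actually start and on what the stripe cache manages to merge.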