* Latest on SSD Raid
@ 2017-09-29 15:53 Dag Nygren
  2017-09-29 16:22 ` Joe Landman
  2017-09-30 13:22 ` Matt Garman
  0 siblings, 2 replies; 4+ messages in thread
From: Dag Nygren @ 2017-09-29 15:53 UTC (permalink / raw)
  To: linux-raid

Hi all!

I'd like to tap the collective experience here
with a question:

Any good hints or advice for setting up an SSD RAID5
with 3 disks to start with?

Best
Dag


* Re: Latest on SSD Raid
  2017-09-29 15:53 Latest on SSD Raid Dag Nygren
@ 2017-09-29 16:22 ` Joe Landman
  2017-09-30 11:00   ` David Brown
  2017-09-30 13:22 ` Matt Garman
  1 sibling, 1 reply; 4+ messages in thread
From: Joe Landman @ 2017-09-29 16:22 UTC (permalink / raw)
  To: Dag Nygren, linux-raid


On 09/29/2017 11:53 AM, Dag Nygren wrote:
> Hi all!
>
> I'd like to tap the collective experience here
> with a question:
>
> Any good hints or advice for setting up an SSD RAID5
> with 3 disks to start with?

You need to worry about write amplification with RAID5.  So if you
do this, use SSDs with a higher DWPD (drive writes per day) rating.
Aim for 3 DWPD if you can, so you don't burn out the SSDs early.
Don't do this with consumer-grade SSDs (anything rated 0.5 DWPD or
less); they do burn out (sometimes much) faster.  The little extra
money spent on enterprise SATA (or SAS) drives with a higher DWPD
rating is worth it.
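
To put rough numbers on that: a 2 TB drive rated at 3 DWPD over a
5-year warranty works out to about 2 TB x 3 x 365 x 5 ~= 11 PB of
rated writes, versus only about 1.8 PB at 0.5 DWPD (my arithmetic,
not a vendor spec).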

Precondition the SSDs.  If you don't know how, I wrote a nice little
util here: https://github.com/joelandman/disk_test_setup that helps you
do it ... it uses fio to drive 128k sequential writes to fill the drives.
Drive life appears well correlated with preconditioning and write loads.
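
If you'd rather run fio by hand, something along these lines does
roughly the same job (untested sketch; /dev/sdX is a placeholder and
this overwrites the whole device):

  # fill the drive twice with 128k sequential writes, bypassing the page cache
  fio --name=precondition --filename=/dev/sdX --direct=1 \
      --rw=write --bs=128k --ioengine=libaio --iodepth=32 --loops=2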

Use a chunk size of 128k or so (larger is better).  You want the chunk
size to match the erase block size, which reduces write amplification.
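
For example, a 3-drive create with a 128k chunk would look something
like this (sketch only; device names are placeholders):

  mdadm --create /dev/md0 --level=5 --raid-devices=3 --chunk=128 \
        /dev/sdb /dev/sdc /dev/sdd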

You still have to worry about the whole RMW cycle for RAID5.  For
small IO (below the chunk/erase block size), every write means reading
the old data and old parity, then writing new data and new parity, so
you write at least 2 blocks for every block of new data.  If your
writes are small (4k -> 32k) you'll want to invest in even higher
quality drives (i.e. more DWPD).

If you can get enough drives, I'd actually recommend a RAID10.  Much 
lower write amplification -> longer lifetime.
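
With four drives, that would be something along the lines of (again
just a sketch with placeholder device names):

  mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde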

>
> Best
> Dag
>

-- 
Joe Landman
e: joe.landman@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman


* Re: Latest on SSD Raid
  2017-09-29 16:22 ` Joe Landman
@ 2017-09-30 11:00   ` David Brown
  0 siblings, 0 replies; 4+ messages in thread
From: David Brown @ 2017-09-30 11:00 UTC (permalink / raw)
  To: Joe Landman, Dag Nygren, linux-raid



On 29/09/17 18:22, Joe Landman wrote:
> [...]
>
> If you can get enough drives, I'd actually recommend a RAID10.  Much
> lower write amplification -> longer lifetime.
>

And also much faster.  The key point for speed with SSDs is low
latency, and RAID5 can add a lot of latency to writes.


* Re: Latest on SSD Raid
  2017-09-29 15:53 Latest on SSD Raid Dag Nygren
  2017-09-29 16:22 ` Joe Landman
@ 2017-09-30 13:22 ` Matt Garman
  1 sibling, 0 replies; 4+ messages in thread
From: Matt Garman @ 2017-09-30 13:22 UTC (permalink / raw)
  To: Dag Nygren; +Cc: Mdadm

On Fri, Sep 29, 2017 at 10:53 AM, Dag Nygren <dag@newtech.fi> wrote:
> I'd like to tap the collective experience here
> with a question:
>
> Any good hints or advice for setting up an SSD RAID5
> with 3 disks to start with?

As is often (always?) the case, a lot depends on your expected workload.

We have now built three systems with 24-disk 2TB SSD RAID6 arrays.  We
used consumer-grade drives for cost savings, as this is a virtually
read-only workload.  We do add a small amount of data every day,
roughly 50GB, spread across the entire array.  The rest of the time
it's just read, read, read.  (Effectively a WORM workload.)

In this role, our systems have been great.  We found that the network
interface was the first bottleneck.  Now we've got dual 40Gb/s ports on
these systems.  The interfaces are bonded and load balanced, and jumbo
frames are a must.
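
For anyone curious, the jumbo frames part is just a larger MTU on the
bonded interface, roughly like this (bond0 is a placeholder name, and
the right bonding mode depends on your switches):

  ip link set dev bond0 mtu 9000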

We did just a tiny bit of tuning, nothing special (don't remember the
details offhand).

The only real "gotcha" we ran into in all this is rebuild times.
Actually, rebuilds themselves are fast.  The problem is the tradeoff
between reduced client performance and rebuild time.  mdadm allows you
to tune how fast the rebuild can go, and with a bit of experimentation
I found that it also supports multi-threaded rebuilds, which made a
huge improvement in being able to do a rebuild while still serving
some data.
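
For anyone digging through the archives later, the knobs involved are
roughly these (sketch only; md0, the values, and my assumption that
the multi-threading lives in group_thread_cnt are mine, so check the
md documentation for your kernel):

  # per-device rebuild speed floor/ceiling, in KB/s
  echo 50000  > /proc/sys/dev/raid/speed_limit_min
  echo 500000 > /proc/sys/dev/raid/speed_limit_max
  # allow more threads for raid5/6 stripe handling
  echo 4 > /sys/block/md0/md/group_thread_cnt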

I suspect enterprise drives, with their generally bigger
overprovisioning space and smarter controllers, would likely fare
better on rebuilds.  On the flip side, we haven't actually had any
drive failures; the rebuilds we did were just practice for what to do
and expect when a drive does inevitably fail.
