* mdadm raid1 read performance
@ 2011-05-04  0:07 Liam Kurmos
  2011-05-04  0:57 ` John Robinson
  2011-05-04  0:58 ` NeilBrown
  0 siblings, 2 replies; 36+ messages in thread
From: Liam Kurmos @ 2011-05-04  0:07 UTC (permalink / raw)
  To: linux-raid; +Cc: neilb

Hi,

I've been testing mdadm (great piece of software, btw); however, all my
tests show that reading from raid1 is only the same speed as reading
from a single drive.

Is this a known issue, or is there something seriously wrong with my
system? I have tried v2.8.1 and v3.2.1 without any difference, and with
several benchmarking methods.

best regards,

Liam


* Re: mdadm raid1 read performance
  2011-05-04  0:07 mdadm raid1 read performance Liam Kurmos
@ 2011-05-04  0:57 ` John Robinson
  2011-05-06 20:44   ` Leslie Rhorer
  2011-05-04  0:58 ` NeilBrown
  1 sibling, 1 reply; 36+ messages in thread
From: John Robinson @ 2011-05-04  0:57 UTC (permalink / raw)
  To: Liam Kurmos; +Cc: Linux RAID

On 04/05/2011 01:07, Liam Kurmos wrote:
> Hi,
>
> I've been testing mdadm (great piece of software btw) however all my
> test show that reading from raid1 is only the same speed as reading
> from a single drive.
>
> Is this a known issue? or is there something seriously wrong with my
> system? i have tried v2.8.1 and v.3.2.1 without difference and several
> benchmarking methods.

This is a FAQ. Yes, this is known. No, it's not an issue, it's by design 
- pretty much any RAID 1 implementation will be the same because of the 
nature of spinning discs. md RAID 1 will serve multiple simultaneous 
reads from the different mirrors, giving a higher total throughput, but 
a single-threaded read will read from only one. If you want RAID 0 
sequential speed at the same time as RAID 1 mirroring, look at md RAID 
10, and in particular RAID 10,f2; please see the excellent documentation 
and wiki for more details.
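
As a rough illustration (a sketch - the device name, sizes and member names
are only placeholders), compare one sequential reader with two readers at
different offsets, while watching the member disks:

   # single stream: served by one mirror only
   dd if=/dev/md0 of=/dev/null bs=1M count=1000
   # two streams at different offsets: md can serve them from different mirrors
   dd if=/dev/md0 of=/dev/null bs=1M count=1000 &
   dd if=/dev/md0 of=/dev/null bs=1M count=1000 skip=1000 &
   wait
   # iostat -x 1 /dev/sda /dev/sdb    <- watch which member handles each stream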

Cheers,

John.



* Re: mdadm raid1 read performance
  2011-05-04  0:07 mdadm raid1 read performance Liam Kurmos
  2011-05-04  0:57 ` John Robinson
@ 2011-05-04  0:58 ` NeilBrown
  2011-05-04  5:30   ` Drew
  1 sibling, 1 reply; 36+ messages in thread
From: NeilBrown @ 2011-05-04  0:58 UTC (permalink / raw)
  To: Liam Kurmos; +Cc: linux-raid

On Wed, 4 May 2011 01:07:45 +0100 Liam Kurmos <quantum.leaf@gmail.com> wrote:

> Hi,
> 
> I've been testing mdadm (great piece of software btw) however all my
> test show that reading from raid1 is only the same speed as reading
> from a single drive.

Why do you expect RAID1 to be faster?  On a single threaded sequential read
there is not much it can do to go faster than a single device.  Maybe on some
multi-thread random IOs it might.
What sort of tests were you running?

NeilBrown


> 
> Is this a known issue? or is there something seriously wrong with my
> system? i have tried v2.8.1 and v.3.2.1 without difference and several
> benchmarking methods.
> 
> best regards,
> 
> Liam



* Re: mdadm raid1 read performance
  2011-05-04  0:58 ` NeilBrown
@ 2011-05-04  5:30   ` Drew
  2011-05-04  6:31     ` Brad Campbell
  0 siblings, 1 reply; 36+ messages in thread
From: Drew @ 2011-05-04  5:30 UTC (permalink / raw)
  To: NeilBrown; +Cc: Liam Kurmos, linux-raid

>> I've been testing mdadm (great piece of software btw) however all my
>> test show that reading from raid1 is only the same speed as reading
>> from a single drive.
>
> Why do you expect RAID1 to be faster?  On a single threaded sequential read
> there is not much it can do to go faster than a single device.  Maybe on some
> multi-thread random IOs it might.
> What sort of tests were you running?

It wouldn't surprise me if the OP had the same idea I had when I first
started reading about RAID many moons ago.

It seemed logical to me that if two disks had the same data and we
were reading an arbitrary amount of data, why couldn't we split the
read across both disks? That way we get the benefits of pulling from
multiple disks in the read case while accepting the penalty of a write
being as slow as the slowest disk.


-- 
Drew

"Nothing in life is to be feared. It is only to be understood."
--Marie Curie


* Re: mdadm raid1 read performance
  2011-05-04  5:30   ` Drew
@ 2011-05-04  6:31     ` Brad Campbell
  2011-05-04  7:42       ` Roberto Spadim
  2011-05-04  7:48       ` David Brown
  0 siblings, 2 replies; 36+ messages in thread
From: Brad Campbell @ 2011-05-04  6:31 UTC (permalink / raw)
  To: Drew; +Cc: NeilBrown, Liam Kurmos, linux-raid

On 04/05/11 13:30, Drew wrote:

> It seemed logical to me that if two disks had the same data and we
> were reading an arbitrary amount of data, why couldn't we split the
> read across both disks? That way we get the benefits of pulling from
> multiple disks in the read case while accepting the penalty of a write
> being as slow as the slowest disk..
>
>

I would have thought that, as you'd be skipping alternate "stripes" on each 
disk, you'd minimise the benefit of a readahead buffer and be subjected to 
seek and rotational latency on both disks. Overall your benefit would be 
slim to immeasurable. Now on SSDs I could see it providing some extra oomph, 
as you suffer none of the mechanical latency penalties.



* Re: mdadm raid1 read performance
  2011-05-04  6:31     ` Brad Campbell
@ 2011-05-04  7:42       ` Roberto Spadim
  2011-05-04 23:08         ` Liam Kurmos
  2011-05-04  7:48       ` David Brown
  1 sibling, 1 reply; 36+ messages in thread
From: Roberto Spadim @ 2011-05-04  7:42 UTC (permalink / raw)
  To: Brad Campbell; +Cc: Drew, NeilBrown, Liam Kurmos, linux-raid

Hum...
In a user program we do something like:
file=fopen(); var=fread(file,buffer_size); fclose(file);

buffer_size is the problem, since it can be very small (many reads) or
very big (a memory problem, but a very nice request to optimize at the
device block level).
If we have a big buffer_size, we can split it across disks (SSD).
If we have a small buffer_size, we can't split it (only if readahead
is very big).
Problem: we need memory (cache/buffer).

So the question is: is readahead better for SSD, or is a bigger buffer_size
in the user program better?
Or is a bigger filesystem block size better? With that it wouldn't matter
whether the user passes a small buffer_size to fread(); the filesystem
would always read a lot of data at the device block layer. What's better?
Other ideas?

I don't know how the Linux kernel handles a very big fread() memory-wise.
For example:
fread(file,1000000); // 1MB
Will Linux split that 'single' fread into many reads at the block layer,
each one block (512 bytes/4096 bytes) in size?
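
One crude way to see what a big user-space read actually turns into (a
sketch - the file name is a placeholder): trace the read() calls and watch
the request sizes that reach the disks:

   strace -e trace=read dd if=bigfile of=/dev/null bs=1M count=10
   iostat -x 1      # the avgrq-sz column shows the request sizes the disks see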

2011/5/4 Brad Campbell <lists2009@fnarfbargle.com>:
> On 04/05/11 13:30, Drew wrote:
>
>> It seemed logical to me that if two disks had the same data and we
>> were reading an arbitrary amount of data, why couldn't we split the
>> read across both disks? That way we get the benefits of pulling from
>> multiple disks in the read case while accepting the penalty of a write
>> being as slow as the slowest disk..
>>
>>
>
> I would have thought as you'd be skipping alternate "stripes" on each disk
> you minimise the benefit of a readahead buffer and get subjected to seek and
> rotational latency on both disks. Overall you're benefit would be slim to
> immeasurable. Now on SSD's I could see it providing some extra oomph as you
> suffer none of the mechanical latency penalties.
>
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial


* Re: mdadm raid1 read performance
  2011-05-04  6:31     ` Brad Campbell
  2011-05-04  7:42       ` Roberto Spadim
@ 2011-05-04  7:48       ` David Brown
  1 sibling, 0 replies; 36+ messages in thread
From: David Brown @ 2011-05-04  7:48 UTC (permalink / raw)
  To: linux-raid

On 04/05/2011 08:31, Brad Campbell wrote:
> On 04/05/11 13:30, Drew wrote:
>
>> It seemed logical to me that if two disks had the same data and we
>> were reading an arbitrary amount of data, why couldn't we split the
>> read across both disks? That way we get the benefits of pulling from
>> multiple disks in the read case while accepting the penalty of a write
>> being as slow as the slowest disk..
>>
>>
>
> I would have thought as you'd be skipping alternate "stripes" on each
> disk you minimise the benefit of a readahead buffer and get subjected to
> seek and rotational latency on both disks. Overall you're benefit would
> be slim to immeasurable. Now on SSD's I could see it providing some
> extra oomph as you suffer none of the mechanical latency penalties.
>

Even on SSDs you'd get some overhead for the skipping - each read 
command has to be tracked by both the host software and the disk firmware.

Such splitting would have to be done on a larger scale to make it 
efficient.  If you request a read of 2 MB, you could take the first MB 
from the first disk and simultaneously the second MB from the second disk.




* Re: mdadm raid1 read performance
  2011-05-04  7:42       ` Roberto Spadim
@ 2011-05-04 23:08         ` Liam Kurmos
  2011-05-04 23:35           ` Roberto Spadim
                             ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Liam Kurmos @ 2011-05-04 23:08 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: Brad Campbell, Drew, NeilBrown, linux-raid

Thanks to all who replied on this.

I somewhat naively assumed that having 2 disks with the same data
would mean a read speed similar to raid0 should be the norm (and I
think this is a very popular misconception).
I was neglecting the seek time to skip alternate blocks, which I guess
must be the flaw.

In theory though, if I was reading a larger file, couldn't one disk
start reading at the beginning into a buffer and the other start reading
from half way (assuming 2 disks), and hence get close to 2x single-disk
speed?

As a separate question, what should be the theoretical performance of raid5?

In my tests I read 1GB and throw away the data:
dd if=/dev/md0 of=/dev/null bs=1M count=1000

With 4 fairly fast HDDs I get:

raid0: ~540MB/s
raid10: 220MB/s
raid5: ~165MB/s
raid1: ~140MB/s  (single disk speed)

For 4 disks raid0 seems like suicide, but for my system drive the
speed advantage is so great I'm tempted to try it anyway and use
rsync to keep a constant backup.

Cheers for your responses,

Liam



On Wed, May 4, 2011 at 8:42 AM, Roberto Spadim <roberto@spadim.com.br> wrote:
> hum...
> at user program we use:
> file=fopen(); var=fread(file,buffer_size);fclose(file);
>
> buffer_size is the problem since it can be very small (many reads), or
> very big (small memory problem, but very nice query to optimize at
> device block level)
> if we have a big buffer_size, we can split it across disks (ssd)
> if we have a small buffer_size, we can't split it (only if readahead
> is very big)
> problem: we need memory (cache/buffer)
>
> the problem... is readahead better for ssd? or a bigger 'buffer_size'
> at user program is better?
> or... a filesystem change of 'block' size to a bigger block size, with
> this don't matter if user use a small buffer_size at fread functions,
> filesystem will always read many information at device block layer,
> what's better? others ideas?
>
> i don't know how linux kernel handle a very big fread with memory
> for example:
> fread(file,1000000); // 1MB
> will linux split the 'single' fread in many reads at block layer? each
> read with 1 block size (512byte/4096byte)?
>
> 2011/5/4 Brad Campbell <lists2009@fnarfbargle.com>:
>> On 04/05/11 13:30, Drew wrote:
>>
>>> It seemed logical to me that if two disks had the same data and we
>>> were reading an arbitrary amount of data, why couldn't we split the
>>> read across both disks? That way we get the benefits of pulling from
>>> multiple disks in the read case while accepting the penalty of a write
>>> being as slow as the slowest disk..
>>>
>>>
>>
>> I would have thought as you'd be skipping alternate "stripes" on each disk
>> you minimise the benefit of a readahead buffer and get subjected to seek and
>> rotational latency on both disks. Overall you're benefit would be slim to
>> immeasurable. Now on SSD's I could see it providing some extra oomph as you
>> suffer none of the mechanical latency penalties.
>>
>>
>
>
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
>


* Re: mdadm raid1 read performance
  2011-05-04 23:08         ` Liam Kurmos
@ 2011-05-04 23:35           ` Roberto Spadim
  2011-05-04 23:36           ` Brad Campbell
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 36+ messages in thread
From: Roberto Spadim @ 2011-05-04 23:35 UTC (permalink / raw)
  To: Liam Kurmos; +Cc: Brad Campbell, Drew, NeilBrown, linux-raid

2011/5/4 Liam Kurmos <quantum.leaf@gmail.com>:
> Thanks to all who replied on this.
>
> I somewhat naively assumed that having 2 disks with the same data
> would mean a similar read speed to raid0 should be the norm (and i
> think this is a very popular miss-conception).
> I was neglecting the seek time to skip alternate blocks which i guess
> must the flaw.
>
> In theory though if i was reading a larger file, couldn't one disk
> start reading at the beginning to a buffer and one start reading from
> half way ( assuming 2 disks) and hence get close to 2x single disk
> speed?

Hummm... maybe. That's what LINEAR does, and it depends on how Linux divides
one large read into small reads, and on how the program uses fread() - many
small freads, or one big fread.
Check some magic...

1 disk blocks:
disk1: ABCDEFGH

raid0 (stripe) 2 disks
disk1: ACEG
disk2: BDFH

raid1 (no stripe) 2 disks
disk1: ABCDEFGH
disk2: ABCDEFGH

raid0 (linear) 2 disks
disk1: ABCD
disk2: EFGH

If you want to read ABCDEFGH, the best speed will be raid0 (stripe):
you can read A+B, C+D, E+F, G+H with little disk/head movement.
Could raid1 help? Maybe... if you have 2 programs reading ABCDEFGH
and you don't have a cache/buffer, one program can use disk1 and the
other disk2 - that's the best speed. Or raid0 (linear), if one program
reads ABCD and the other EFGH, and afterwards they swap: program 1
reads EFGH and program 2 reads ABCD.

The problems here are:
1) read speed (more RPM = more MB/s),
2) access time (more access time = more latency; access time depends on RPM
and disk size (head move time): 2.5", 3.5" or 1.8"). Some 'normal' numbers:
    7200 rpm  = 8.33 ms access time
    10000 rpm = 6 ms access time
    15000 rpm = 4 ms access time
    SSD = 0.1 ms access time (firmware: SATA protocol + internal address
table + queue + other internal firmware tasks)
3) total time to read:
for a hard disk:
total time to read = access time (from the current disk position and
current head position to the new head position and new disk position) +
number of bytes / read speed
for an SSD:
total time to read = access time + internal information search (some
SSDs do internal reallocation) + memory read time

A stripe allows a small access time, since one disk reads A and is near
C, while the other disk reads B and is near D: with a sequential read of
ABCD you have 2 'reads' per drive, while with linear you have 4 'reads'
on one drive.



> as a separate question, what should be the theoretical performance of raid5?
>
> in my tests i read 1GB and throw away the data.
> dd if=/dev/md0 of=/dev/null bs=1M count=1000
>
> With 4 fairly fast hdd's i get
>
> raid0: ~540MB/s
> raid10: 220MB/s
> raid5: ~165MB/s
> raid1: ~140MB/s  (single disk speed)
>
> for 4 disks raid0 seems like suicide, but for my system drive the
> speed advantage is so great im tempted to try it anyway and try and
> use rsync to keep constant back up.
>

I don't know much about raid5, but I think it's close to the raid0
linear or raid0 stripe algorithm; that needs checking with the other guys.

> cheers for you responses,
>
> Liam
>
>
>
> On Wed, May 4, 2011 at 8:42 AM, Roberto Spadim <roberto@spadim.com.br> wrote:
>> hum...
>> at user program we use:
>> file=fopen(); var=fread(file,buffer_size);fclose(file);
>>
>> buffer_size is the problem since it can be very small (many reads), or
>> very big (small memory problem, but very nice query to optimize at
>> device block level)
>> if we have a big buffer_size, we can split it across disks (ssd)
>> if we have a small buffer_size, we can't split it (only if readahead
>> is very big)
>> problem: we need memory (cache/buffer)
>>
>> the problem... is readahead better for ssd? or a bigger 'buffer_size'
>> at user program is better?
>> or... a filesystem change of 'block' size to a bigger block size, with
>> this don't matter if user use a small buffer_size at fread functions,
>> filesystem will always read many information at device block layer,
>> what's better? others ideas?
>>
>> i don't know how linux kernel handle a very big fread with memory
>> for example:
>> fread(file,1000000); // 1MB
>> will linux split the 'single' fread in many reads at block layer? each
>> read with 1 block size (512byte/4096byte)?
>>
>> 2011/5/4 Brad Campbell <lists2009@fnarfbargle.com>:
>>> On 04/05/11 13:30, Drew wrote:
>>>
>>>> It seemed logical to me that if two disks had the same data and we
>>>> were reading an arbitrary amount of data, why couldn't we split the
>>>> read across both disks? That way we get the benefits of pulling from
>>>> multiple disks in the read case while accepting the penalty of a write
>>>> being as slow as the slowest disk..
>>>>
>>>>
>>>
>>> I would have thought as you'd be skipping alternate "stripes" on each disk
>>> you minimise the benefit of a readahead buffer and get subjected to seek and
>>> rotational latency on both disks. Overall you're benefit would be slim to
>>> immeasurable. Now on SSD's I could see it providing some extra oomph as you
>>> suffer none of the mechanical latency penalties.
>>>
>>>
>>
>>
>>
>> --
>> Roberto Spadim
>> Spadim Technology / SPAEmpresarial
>>
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial


* Re: mdadm raid1 read performance
  2011-05-04 23:08         ` Liam Kurmos
  2011-05-04 23:35           ` Roberto Spadim
@ 2011-05-04 23:36           ` Brad Campbell
  2011-05-04 23:45           ` NeilBrown
  2011-05-05  4:06           ` Roman Mamedov
  3 siblings, 0 replies; 36+ messages in thread
From: Brad Campbell @ 2011-05-04 23:36 UTC (permalink / raw)
  To: Liam Kurmos; +Cc: Roberto Spadim, Brad Campbell, Drew, NeilBrown, linux-raid

On 05/05/11 07:08, Liam Kurmos wrote:
> Thanks to all who replied on this.
>
> I somewhat naively assumed that having 2 disks with the same data
> would mean a similar read speed to raid0 should be the norm (and i
> think this is a very popular miss-conception).
> I was neglecting the seek time to skip alternate blocks which i guess
> must the flaw.
>
> In theory though if i was reading a larger file, couldn't one disk
> start reading at the beginning to a buffer and one start reading from
> half way ( assuming 2 disks) and hence get close to 2x single disk
> speed?
>
> as a separate question, what should be the theoretical performance of raid5?
>
> in my tests i read 1GB and throw away the data.
> dd if=/dev/md0 of=/dev/null bs=1M count=1000
>
> With 4 fairly fast hdd's i get
>
> raid0: ~540MB/s
> raid10: 220MB/s
> raid5: ~165MB/s
> raid1: ~140MB/s  (single disk speed)
>
> for 4 disks raid0 seems like suicide, but for my system drive the
> speed advantage is so great im tempted to try it anyway and try and
> use rsync to keep constant back up.
>

Try RAID10 with the far layout. It should give you streaming reads the same as RAID0.
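
Something like this, for example (a sketch - the device names are placeholders):

   mdadm -C /dev/md0 --level=10 --layout=f2 --raid-devices=4 /dev/sd[abcd]1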

Brad


* Re: mdadm raid1 read performance
  2011-05-04 23:08         ` Liam Kurmos
  2011-05-04 23:35           ` Roberto Spadim
  2011-05-04 23:36           ` Brad Campbell
@ 2011-05-04 23:45           ` NeilBrown
  2011-05-04 23:57             ` Roberto Spadim
                               ` (2 more replies)
  2011-05-05  4:06           ` Roman Mamedov
  3 siblings, 3 replies; 36+ messages in thread
From: NeilBrown @ 2011-05-04 23:45 UTC (permalink / raw)
  To: Liam Kurmos; +Cc: Roberto Spadim, Brad Campbell, Drew, linux-raid

On Thu, 5 May 2011 00:08:59 +0100 Liam Kurmos <quantum.leaf@gmail.com> wrote:

> Thanks to all who replied on this.
> 
> I somewhat naively assumed that having 2 disks with the same data
> would mean a similar read speed to raid0 should be the norm (and i
> think this is a very popular miss-conception).
> I was neglecting the seek time to skip alternate blocks which i guess
> must the flaw.
> 
> In theory though if i was reading a larger file, couldn't one disk
> start reading at the beginning to a buffer and one start reading from
> half way ( assuming 2 disks) and hence get close to 2x single disk
> speed?

If you write your program to read from both the beginning and the middle
then you might get double-speed.  The kernel doesn't know you are going to do
this, so the best it can do is read ahead in large amounts.
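
For example, something like this from the shell approximates "read from the
beginning and the middle at once" (a sketch - the file name and sizes are
placeholders):

   # read the first and the second half of a ~2GB file in parallel
   dd if=bigfile of=/dev/null bs=1M count=1024 &
   dd if=bigfile of=/dev/null bs=1M count=1024 skip=1024 &
   wait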

raid1 could notice large reads and send some to one disk and some to another,
but the chunk sent to each device must be large enough that the time to seek
over the skipped region is much less than the time to read it, which is
probably many megabytes on today's hardware - and raid1 has no way to know
what that size is.

Certainly it is possible that the read_balance code in md/raid1 could be
improved.  As yet no-one has improved it and provided convincing performance
numbers.

> 
> as a separate question, what should be the theoretical performance of raid5?

x(N-1)

So a 4-drive RAID5 should read at 3 times the speed of a single drive.

> 
> in my tests i read 1GB and throw away the data.
> dd if=/dev/md0 of=/dev/null bs=1M count=1000
> 
> With 4 fairly fast hdd's i get

Which apparently do 140MB/s:

> 
> raid0: ~540MB/s

I would expect 4*140 == 560, so this is a good result.

> raid10: 220MB/s

Assuming the default 'n2' layout, I would expect 2*140 or 280, so this is a
little slow.  Try "--layout=f2" and see what you get (should be more like
RAID0).

> raid5: ~165MB/s

I would expect 3*140 or 420, so this is very slow.  I wonder if read-ahead is
set badly.
Can you:
   blockdev --getra /dev/md0
multiply the number it gives you by 8 and give it back with
   blockdev --setra NUMBER /dev/md0


> raid1: ~140MB/s  (single disk speed)

as expected.

> 
> for 4 disks raid0 seems like suicide, but for my system drive the
> speed advantage is so great im tempted to try it anyway and try and
> use rsync to keep constant back up.

If you have somewhere to rsync to, then you have more disks so RAID10 might
be an answer... but I suspect you cannot move disks around that freely :-)

NeilBrown



> 
> cheers for you responses,
> 
> Liam


* Re: mdadm raid1 read performance
  2011-05-04 23:45           ` NeilBrown
@ 2011-05-04 23:57             ` Roberto Spadim
  2011-05-05  0:14             ` Liam Kurmos
  2011-05-05 11:10             ` Keld Jørn Simonsen
  2 siblings, 0 replies; 36+ messages in thread
From: Roberto Spadim @ 2011-05-04 23:57 UTC (permalink / raw)
  To: NeilBrown; +Cc: Liam Kurmos, Brad Campbell, Drew, linux-raid

2011/5/4 NeilBrown <neilb@suse.de>:
> On Thu, 5 May 2011 00:08:59 +0100 Liam Kurmos <quantum.leaf@gmail.com> wrote:
>
>> Thanks to all who replied on this.
>>
>> I somewhat naively assumed that having 2 disks with the same data
>> would mean a similar read speed to raid0 should be the norm (and i
>> think this is a very popular miss-conception).
>> I was neglecting the seek time to skip alternate blocks which i guess
>> must the flaw.
>>
>> In theory though if i was reading a larger file, couldn't one disk
>> start reading at the beginning to a buffer and one start reading from
>> half way ( assuming 2 disks) and hence get close to 2x single disk
>> speed?
>
> If you write your program to read from both the beginning and the middle
> then you might get double-speed.  The kernel doesn't know you are going to do
> this so the best it can do is read-ahead is large amounts.
>
> raid1 could notice large reads and send some to one disk and some to another,
> but the size for each device must be large enough that the time to seek over
> must be much less than the time to read, which is probably many megabytes on
> todays hardware - and raid1 has no way to know what that size is.
>
> Certainly it is possible that the read_balance code in md/raid1 could be
> improved.  As yet no-one has improved it and provided convincing performance
> numbers.

Yes, it's not a 10000% improvement - I got a max of 1% on a big test (1
hour of non-sequential reads). For SSDs, round robin allows more use of
the drives and some improvement. Since I don't know how to get the
hardware/software queue size, I couldn't improve the code to select the
'best' disk (the one that should return in the least time), but the
benchmark results were interesting, since 1% was 1% three times
(60 minutes dropped to 54 minutes).

It could be very interesting to get information about the disks and
automatically tune the read balancing. Useful information: access time
(RPM can help here), MB/s on a sequential read (depends on RPM + disk
size (1.8", 2.5", 3.5") + interface (SATA1, SATA2, SAS), since SATA1
can't do more than 1.5Gb/s), and rotational/non-rotational information.
The difference between rotational and non-rotational:
rotational: access time proportional to block distance (head arm / disk position)
non-rotational: fixed access time with little variation
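
Some of that information is already exported by the kernel, e.g. (a sketch -
sda is a placeholder):

   cat /sys/block/sda/queue/rotational     # 1 = rotational disk, 0 = SSD
   cat /sys/block/sda/queue/nr_requests    # software queue depth for the device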


>> as a separate question, what should be the theoretical performance of raid5?
>
> x(N-1)
>
> So a 4 drive RAID5 should read at 3 time the speed of a single drive.
>
>>
>> in my tests i read 1GB and throw away the data.
>> dd if=/dev/md0 of=/dev/null bs=1M count=1000
>>
>> With 4 fairly fast hdd's i get
>
> Which apparently do 140MB/s:
>
>>
>> raid0: ~540MB/s
>
> I would expect 4*140 == 560, so this is a good result.
>
>> raid10: 220MB/s
>
> Assuming the default 'n2' layout, I would expect 2*140 or 280, so this is a
> little slow.  Try "--layout=f2" and see what you get (should be more like
> RAID0).
>
>> raid5: ~165MB/s
>
> I would expect 3*140 or 420, so this is very slow.  I wonder if read-ahead is
> set badly.
> Can you:
>   blockdev --getra /dev/md0
> multiply the number it gives you by 8 and give it back with
>   blockdev --setra NUMBER /dev/md0

very nice :)

>
>
>> raid1: ~140MB/s  (single disk speed)
>
> as expected.
>
>>
>> for 4 disks raid0 seems like suicide, but for my system drive the
>> speed advantage is so great im tempted to try it anyway and try and
>> use rsync to keep constant back up.
>
> If you have somewhere to rsync to, then you have more disks so RAID10 might
> be an answer... but I suspect you cannot move disks around that freely :-)
>
> NeilBrown
>
>
>
>>
>> cheers for you responses,
>>
>> Liam
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial


* Re: mdadm raid1 read performance
  2011-05-04 23:45           ` NeilBrown
  2011-05-04 23:57             ` Roberto Spadim
@ 2011-05-05  0:14             ` Liam Kurmos
  2011-05-05  0:20               ` Liam Kurmos
  2011-05-05  0:24               ` Roberto Spadim
  2011-05-05 11:10             ` Keld Jørn Simonsen
  2 siblings, 2 replies; 36+ messages in thread
From: Liam Kurmos @ 2011-05-05  0:14 UTC (permalink / raw)
  To: NeilBrown; +Cc: Roberto Spadim, Brad Campbell, Drew, linux-raid

Thanks guys!



>> raid10: 220MB/s
>
> Assuming the default 'n2' layout, I would expect 2*140 or 280, so this is a
> little slow.  Try "--layout=f2" and see what you get (should be more like
> RAID0).


mdadm -C /dev/md0 --level=raid10 --layout=f2 --raid-devices=4
/dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1

dd if=/dev/md0 of=/dev/null bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 2.23352 s, 469 MB/s

:D

awesome!!

>
>> raid5: ~165MB/s
>
> I would expect 3*140 or 420, so this is very slow.  I wonder if read-ahead is
> set badly.

> Can you:
>   blockdev --getra /dev/md0
> multiply the number it gives you by 8 and give it back with
>   blockdev --setra NUMBER /dev/md0
>

Genius.

I'm not really sure what this did, but it totally fixed the problem.

Read-ahead was 768; I set it to 6144 and immediately got 400MB/s.
>
>> raid1: ~140MB/s  (single disk speed)
>
> as expected.
>
>>
>> for 4 disks raid0 seems like suicide, but for my system drive the
>> speed advantage is so great im tempted to try it anyway and try and
>> use rsync to keep constant back up.
>
> If you have somewhere to rsync to, then you have more disks so RAID10 might
> be an answer... but I suspect you cannot move disks around that freely :-)
>

no need now! f2 layout is awesome.

many thanks,

Liam



> NeilBrown
>
>
>
>>
>> cheers for you responses,
>>
>> Liam
>


* Re: mdadm raid1 read performance
  2011-05-05  0:14             ` Liam Kurmos
@ 2011-05-05  0:20               ` Liam Kurmos
  2011-05-05  0:25                 ` Roberto Spadim
  2011-05-05  0:24               ` Roberto Spadim
  1 sibling, 1 reply; 36+ messages in thread
From: Liam Kurmos @ 2011-05-05  0:20 UTC (permalink / raw)
  To: NeilBrown; +Cc: Roberto Spadim, Brad Campbell, Drew, linux-raid

Incidentally, what does the f2 layout do that makes it perform so much
better than the default?

Liam


On Thu, May 5, 2011 at 1:14 AM, Liam Kurmos <quantum.leaf@gmail.com> wrote:
> Thanks guys!
>
>
>
>>> raid10: 220MB/s
>>
>> Assuming the default 'n2' layout, I would expect 2*140 or 280, so this is a
>> little slow.  Try "--layout=f2" and see what you get (should be more like
>> RAID0).
>
>
> mdadm -C /dev/md0 --level=raid10 --layout=f2 --raid-devices=4
> /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1
>
> dd if=/dev/md0 of=/dev/null bs=1M count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 2.23352 s, 469 MB/s
>
> :D
>
> awesome!!
>
>>
>>> raid5: ~165MB/s
>>
>> I would expect 3*140 or 420, so this is very slow.  I wonder if read-ahead is
>> set badly.
>
>> Can you:
>>   blockdev --getra /dev/md0
>> multiply the number it gives you by 8 and give it back with
>>   blockdev --setra NUMBER /dev/md0
>>
>
> genius.
>
> im not really sure what this did but it totally fixed the problem.
>
> look ahead was 768, set it 6144 and immediately got 400MB/s
>>
>>> raid1: ~140MB/s  (single disk speed)
>>
>> as expected.
>>
>>>
>>> for 4 disks raid0 seems like suicide, but for my system drive the
>>> speed advantage is so great im tempted to try it anyway and try and
>>> use rsync to keep constant back up.
>>
>> If you have somewhere to rsync to, then you have more disks so RAID10 might
>> be an answer... but I suspect you cannot move disks around that freely :-)
>>
>
> no need now! f2 layout is awesome.
>
> many thanks,
>
> Liam
>
>
>
>> NeilBrown
>>
>>
>>
>>>
>>> cheers for you responses,
>>>
>>> Liam
>>
>


* Re: mdadm raid1 read performance
  2011-05-05  0:14             ` Liam Kurmos
  2011-05-05  0:20               ` Liam Kurmos
@ 2011-05-05  0:24               ` Roberto Spadim
  1 sibling, 0 replies; 36+ messages in thread
From: Roberto Spadim @ 2011-05-05  0:24 UTC (permalink / raw)
  To: Liam Kurmos; +Cc: NeilBrown, Brad Campbell, Drew, linux-raid

2011/5/4 Liam Kurmos <quantum.leaf@gmail.com>:
> Thanks guys!
>
>
>
>>> raid10: 220MB/s
>>
>> Assuming the default 'n2' layout, I would expect 2*140 or 280, so this is a
>> little slow.  Try "--layout=f2" and see what you get (should be more like
>> RAID0).
>
>
> mdadm -C /dev/md0 --level=raid10 --layout=f2 --raid-devices=4
> /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1
>
> dd if=/dev/md0 of=/dev/null bs=1M count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 2.23352 s, 469 MB/s
>
> :D
>
> awesome!!
>
>>
>>> raid5: ~165MB/s
>>
>> I would expect 3*140 or 420, so this is very slow.  I wonder if read-ahead is
>> set badly.
>
>> Can you:
>>   blockdev --getra /dev/md0
>> multiply the number it gives you by 8 and give it back with
>>   blockdev --setra NUMBER /dev/md0
>>
>
> genius.
>
> im not really sure what this did but it totally fixed the problem.
>
> look ahead was 768, set it 6144 and immediately got 400MB/s
>>
>>> raid1: ~140MB/s  (single disk speed)
>>
>> as expected.
>>
>>>
>>> for 4 disks raid0 seems like suicide, but for my system drive the
>>> speed advantage is so great im tempted to try it anyway and try and
>>> use rsync to keep constant back up.
>>
>> If you have somewhere to rsync to, then you have more disks so RAID10 might
>> be an answer... but I suspect you cannot move disks around that freely :-)
>>
>
> no need now! f2 layout is awesome.

Hum, you should consider your application...
For example, if you need a big ALTER TABLE (SQL database) that must be
very fast so it doesn't stop your production server, you should use f2
or raid0, ehehehe.
But if you have a stable application that needs multiuser access,
raid1 or the n2 layout could be better.

For backups I'm running raid10,f2 (or raid1+raid0), and on the production
machine I'm using raid1 and linear to get more space.

Don't forget to align your partitions if you use them.
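
For example, with parted (a sketch - the device is a placeholder, and
mklabel destroys the existing partition table):

   parted /dev/sdX mklabel gpt
   parted -a optimal /dev/sdX mkpart primary 1MiB 100%    # 1MiB start keeps it aligned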


>
> many thanks,
>
> Liam
>
>
>
>> NeilBrown
>>
>>
>>
>>>
>>> cheers for you responses,
>>>
>>> Liam
>>
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial


* Re: mdadm raid1 read performance
  2011-05-05  0:20               ` Liam Kurmos
@ 2011-05-05  0:25                 ` Roberto Spadim
  2011-05-05  0:40                   ` Liam Kurmos
  0 siblings, 1 reply; 36+ messages in thread
From: Roberto Spadim @ 2011-05-05  0:25 UTC (permalink / raw)
  To: Liam Kurmos; +Cc: NeilBrown, Brad Campbell, Drew, linux-raid

raid10,f2 is a stripe configuration and a mirror too; think about it roughly like this:

disks 1,2,3,4
/dev/md0 = raid1 (1,2)
/dev/md1 = raid1 (3,4)

/dev/md2 = raid0 (stripe) (md0,md1)  <--- that's close to raid10
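
Built by hand, that nesting would look roughly like (a sketch - device names
are only examples):

   mdadm -C /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
   mdadm -C /dev/md1 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
   mdadm -C /dev/md2 --level=0 --raid-devices=2 /dev/md0 /dev/md1

(md's own raid10 does this in a single array, and the far/f2 layout
additionally spreads the second copies across the disks.)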




2011/5/4 Liam Kurmos <quantum.leaf@gmail.com>:
> incidentally what does the f2 layout do that it performs so much
> better than the default?
>
> Liam
>
>
> On Thu, May 5, 2011 at 1:14 AM, Liam Kurmos <quantum.leaf@gmail.com> wrote:
>> Thanks guys!
>>
>>
>>
>>>> raid10: 220MB/s
>>>
>>> Assuming the default 'n2' layout, I would expect 2*140 or 280, so this is a
>>> little slow.  Try "--layout=f2" and see what you get (should be more like
>>> RAID0).
>>
>>
>> mdadm -C /dev/md0 --level=raid10 --layout=f2 --raid-devices=4
>> /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1
>>
>> dd if=/dev/md0 of=/dev/null bs=1M count=1000
>> 1000+0 records in
>> 1000+0 records out
>> 1048576000 bytes (1.0 GB) copied, 2.23352 s, 469 MB/s
>>
>> :D
>>
>> awesome!!
>>
>>>
>>>> raid5: ~165MB/s
>>>
>>> I would expect 3*140 or 420, so this is very slow.  I wonder if read-ahead is
>>> set badly.
>>
>>> Can you:
>>>   blockdev --getra /dev/md0
>>> multiply the number it gives you by 8 and give it back with
>>>   blockdev --setra NUMBER /dev/md0
>>>
>>
>> genius.
>>
>> im not really sure what this did but it totally fixed the problem.
>>
>> look ahead was 768, set it 6144 and immediately got 400MB/s
>>>
>>>> raid1: ~140MB/s  (single disk speed)
>>>
>>> as expected.
>>>
>>>>
>>>> for 4 disks raid0 seems like suicide, but for my system drive the
>>>> speed advantage is so great im tempted to try it anyway and try and
>>>> use rsync to keep constant back up.
>>>
>>> If you have somewhere to rsync to, then you have more disks so RAID10 might
>>> be an answer... but I suspect you cannot move disks around that freely :-)
>>>
>>
>> no need now! f2 layout is awesome.
>>
>> many thanks,
>>
>> Liam
>>
>>
>>
>>> NeilBrown
>>>
>>>
>>>
>>>>
>>>> cheers for you responses,
>>>>
>>>> Liam
>>>
>>
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial


* Re: mdadm raid1 read performance
  2011-05-05  0:25                 ` Roberto Spadim
@ 2011-05-05  0:40                   ` Liam Kurmos
  2011-05-05  7:26                     ` David Brown
  0 siblings, 1 reply; 36+ messages in thread
From: Liam Kurmos @ 2011-05-05  0:40 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: NeilBrown, Brad Campbell, Drew, linux-raid

Cheers Roberto,

I've got the gist of the far layout from looking at Wikipedia. There
is some clever stuff going on that I had never considered.
I'm going for f2 for my system drive.

Liam


On Thu, May 5, 2011 at 1:25 AM, Roberto Spadim <roberto@spadim.com.br> wrote:
> raid10,f2 is a stripe configuration and mirror too, think about it like this:
>
> disk 1,2,3,4
> /dev/md0 = raid1 (1,2)
> /dev/md1 = raid1 (3,4)
>
> /dev/md2 = raid0 (stripe) (md0,md1)  <--- it´s near raid10
>
>
>
>
> 2011/5/4 Liam Kurmos <quantum.leaf@gmail.com>:
>> incidentally what does the f2 layout do that it performs so much
>> better than the default?
>>
>> Liam
>>
>>
>> On Thu, May 5, 2011 at 1:14 AM, Liam Kurmos <quantum.leaf@gmail.com> wrote:
>>> Thanks guys!
>>>
>>>
>>>
>>>>> raid10: 220MB/s
>>>>
>>>> Assuming the default 'n2' layout, I would expect 2*140 or 280, so this is a
>>>> little slow.  Try "--layout=f2" and see what you get (should be more like
>>>> RAID0).
>>>
>>>
>>> mdadm -C /dev/md0 --level=raid10 --layout=f2 --raid-devices=4
>>> /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1
>>>
>>> dd if=/dev/md0 of=/dev/null bs=1M count=1000
>>> 1000+0 records in
>>> 1000+0 records out
>>> 1048576000 bytes (1.0 GB) copied, 2.23352 s, 469 MB/s
>>>
>>> :D
>>>
>>> awesome!!
>>>
>>>>
>>>>> raid5: ~165MB/s
>>>>
>>>> I would expect 3*140 or 420, so this is very slow.  I wonder if read-ahead is
>>>> set badly.
>>>
>>>> Can you:
>>>>   blockdev --getra /dev/md0
>>>> multiply the number it gives you by 8 and give it back with
>>>>   blockdev --setra NUMBER /dev/md0
>>>>
>>>
>>> genius.
>>>
>>> im not really sure what this did but it totally fixed the problem.
>>>
>>> look ahead was 768, set it 6144 and immediately got 400MB/s
>>>>
>>>>> raid1: ~140MB/s  (single disk speed)
>>>>
>>>> as expected.
>>>>
>>>>>
>>>>> for 4 disks raid0 seems like suicide, but for my system drive the
>>>>> speed advantage is so great im tempted to try it anyway and try and
>>>>> use rsync to keep constant back up.
>>>>
>>>> If you have somewhere to rsync to, then you have more disks so RAID10 might
>>>> be an answer... but I suspect you cannot move disks around that freely :-)
>>>>
>>>
>>> no need now! f2 layout is awesome.
>>>
>>> many thanks,
>>>
>>> Liam
>>>
>>>
>>>
>>>> NeilBrown
>>>>
>>>>
>>>>
>>>>>
>>>>> cheers for you responses,
>>>>>
>>>>> Liam
>>>>
>>>
>>
>
>
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: mdadm raid1 read performance
  2011-05-04 23:08         ` Liam Kurmos
                             ` (2 preceding siblings ...)
  2011-05-04 23:45           ` NeilBrown
@ 2011-05-05  4:06           ` Roman Mamedov
  2011-05-05  8:06             ` Nikolay Kichukov
  3 siblings, 1 reply; 36+ messages in thread
From: Roman Mamedov @ 2011-05-05  4:06 UTC (permalink / raw)
  To: Liam Kurmos; +Cc: Roberto Spadim, Brad Campbell, Drew, NeilBrown, linux-raid


On Thu, 5 May 2011 00:08:59 +0100
Liam Kurmos <quantum.leaf@gmail.com> wrote:

> in my tests i read 1GB and throw away the data.
> dd if=/dev/md0 of=/dev/null bs=1M count=1000

If you have enough RAM for disk cache, on the second and further consecutive
invocations of this you will be reading mostly from the cache, giving you an
incorrectly inflated result. So either don't forget to drop filesystem caches
between runs, or just test read performance with "hdparm -t /dev/mdX" instead.
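
For example (a sketch, run as root):

   sync; echo 3 > /proc/sys/vm/drop_caches    # drop the page cache between runs
   # or bypass the cache for the test itself:
   dd if=/dev/mdX of=/dev/null bs=1M count=1000 iflag=direct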

-- 
With respect,
Roman



* Re: mdadm raid1 read performance
  2011-05-05  0:40                   ` Liam Kurmos
@ 2011-05-05  7:26                     ` David Brown
  2011-05-05 10:41                       ` Keld Jørn Simonsen
  2011-05-06 21:05                       ` Leslie Rhorer
  0 siblings, 2 replies; 36+ messages in thread
From: David Brown @ 2011-05-05  7:26 UTC (permalink / raw)
  To: linux-raid

On 05/05/2011 02:40, Liam Kurmos wrote:
> Cheers Roberto,
>
> I've got the gist of the far layout from looking at wikipedia. There
> is some clever stuff going on that i had never considered.
> i'm going for f2 for my system drive.
>
> Liam
>

For general use, raid10,f2 is often the best choice.  The only 
disadvantage is if you have applications that make a lot of synchronised 
writes, as writes take longer (everything must be written twice, and 
because the data is spread out there is more head movement).  For most 
writes this doesn't matter - the OS caches the writes, and the app 
continues on its way, so the writes are done when the disks are not 
otherwise used.  But if you have synchronous writes, so that the app 
will wait for the write to complete, it will be slower (compared to 
raid10,n2 or raid10,o2).

The other problem with raid10 layout is booting - bootloaders don't much 
like it.  The very latest version of grub, IIRC, can boot from raid10 - 
but it can be awkward.  There are lots of how-tos around the web for 
booting when you have raid, but by far the easiest is to divide your 
disks into partitions:

sdX1 = 1GB
sdX2 = xGB
sdX3 = yGB

Put all your sdX1 partitions together as raid1 with metadata layout 
0.90, format as ext3 and use it as /boot.  Any bootloader will work fine 
with that (don't forget to install grub on each disk's MBR).

Put your sdX2 partitions together as raid10,f2 for swap.

Put the sdX3 partitions together as raid10,f2 for everything else.  The 
most flexible choice is to use LVM here and make logical partitions for 
/, /home, /usr, etc.  But you can also partition up the md device in 
distinct fixed partitions for /, /home, etc. if you want.

Don't try and make sdX3 and sdX4 groups and raids for separate / and 
/home (unless you want to use different raid levels for these two 
groups).  Your disks are faster near the start (at the outer edge of the 
disk), so you get the best speed by making the raid10,f2 from almost the 
whole disk.
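
A minimal sketch of that layout for two disks (device names, sizes and the 
volume group name are only examples, not a recommendation):

   mdadm -C /dev/md0 --metadata=0.90 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
   mkfs.ext3 /dev/md0                                            # /boot
   mdadm -C /dev/md1 --level=10 --layout=f2 --raid-devices=2 /dev/sda2 /dev/sdb2
   mkswap /dev/md1                                               # swap
   mdadm -C /dev/md2 --level=10 --layout=f2 --raid-devices=2 /dev/sda3 /dev/sdb3
   pvcreate /dev/md2; vgcreate vg0 /dev/md2                      # LVM for /, /home, ...
   grub-install /dev/sda; grub-install /dev/sdb                  # grub on each MBR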

mvh.,

David




* Re: mdadm raid1 read performance
  2011-05-05  4:06           ` Roman Mamedov
@ 2011-05-05  8:06             ` Nikolay Kichukov
  2011-05-05  8:39               ` Liam Kurmos
  2011-05-05  9:30               ` NeilBrown
  0 siblings, 2 replies; 36+ messages in thread
From: Nikolay Kichukov @ 2011-05-05  8:06 UTC (permalink / raw)
  To: Roman Mamedov
  Cc: Liam Kurmos, Roberto Spadim, Brad Campbell, Drew, NeilBrown, linux-raid


Hi,
It seems like he is reading directly from the raid device and not through the
filesystem, so there are no filesystem caches involved.

Cheers,
-Nik

On 05/05/2011 07:06 AM, Roman Mamedov wrote:
> On Thu, 5 May 2011 00:08:59 +0100
> Liam Kurmos <quantum.leaf@gmail.com> wrote:
> 
>> in my tests i read 1GB and throw away the data.
>> dd if=/dev/md0 of=/dev/null bs=1M count=1000
> 
> If you have enough RAM for disk cache, on the second and further consecutive
> invocations of this you will be reading mostly from the cache, giving you an
> incorrect inflated result. So either don't forget to drop filesystem caches
> between runs, or just test read performance with "hdparm -t /dev/mdX" instead.
> 


* Re: mdadm raid1 read performance
  2011-05-05  8:06             ` Nikolay Kichukov
@ 2011-05-05  8:39               ` Liam Kurmos
  2011-05-05  8:49                 ` Liam Kurmos
  2011-05-05  9:30               ` NeilBrown
  1 sibling, 1 reply; 36+ messages in thread
From: Liam Kurmos @ 2011-05-05  8:39 UTC (permalink / raw)
  To: Nikolay Kichukov; +Cc: linux-raid

On Thu, May 5, 2011 at 9:06 AM, Nikolay Kichukov <hijacker@oldum.net> wrote:
> It seems like he is reading directly from the raid device and not through the filesystem. So there are no filesystem
> caches in this way.

Phew! ... I think (see below).


I installed Ubuntu 11.04 on the new system last night.
This morning I went to reconnect the old system drive (I'd disconnected
it for safety) and 'pop' - a small piece of metal must have touched the
back of the loose drive and fried the board! ... oh joy.

Luckily it's an old drive and I have an identical spare, so I'm hoping
I can swap the board and save all my work since the last git commit.

Anyway, things look different in 11.04, mdadm 3.1.4:

/dev/md0:
 Timing buffered disk reads: 594 MB in  3.00 seconds = 197.79 MB/sec
zoizoi@shankara:~$ sudo dd if=/dev/md0 of=/dev/null bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 5.84755 s, 179 MB/s
zoizoi@shankara:~$ sudo dd if=/dev/md0 of=/dev/null bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 0.152353 s, 6.9 GB/s

Even though I'm accessing the block device directly, it does look like
I'm getting buffering in Natty! I don't think I was in 10.10, and I
certainly wasn't getting 7GB/s.

raid10,f2 performance is right down vs. what I got last night (I got
470MB/s on the first try after creating the array, so I don't think
there was buffering).

My md1 raid5 was also slow again. Readahead on both defaulted down to
256 on 11.04.

I applied Neil's x8 fix to both md0 and md1, and now the dd tests look
much better.

sudo dd if=/dev/md0 of=/dev/null bs=1M count=4000
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 7.62018 s, 550 MB/s

A little too good!
I could see in the system monitor that I didn't have the large 4G
buffer (I do after this test). Something I did must have reset the
buffer. I could see a small amount of buffer in the system monitor, so
maybe it was 1GB. I appreciate these are not the best tests, but note
that hdparm is no better: once I set the 2048-sector readahead it gets
totally unrealistic.

zoizoi@shankara:~$ sudo blockdev --setra 2048 /dev/md0
zoizoi@shankara:~$ sudo hdparm -t /dev/md0

/dev/md0:
 Timing buffered disk reads: 5294 MB in  3.00 seconds = 1762.34 MB/sec
zoizoi@shankara:~$ sudo blockdev --setra 256 /dev/md0
zoizoi@shankara:~$ sudo hdparm -t /dev/md0

/dev/md0:
 Timing buffered disk reads: 582 MB in  3.00 seconds = 193.78 MB/sec

Anyway, it looks like I'm getting good read speed now with 2048
readahead; I'll do another dd test after a reboot.


thanks to you all for the helpful responses,

Liam


* Re: mdadm raid1 read performance
  2011-05-05  8:39               ` Liam Kurmos
@ 2011-05-05  8:49                 ` Liam Kurmos
  0 siblings, 0 replies; 36+ messages in thread
From: Liam Kurmos @ 2011-05-05  8:49 UTC (permalink / raw)
  To: Nikolay Kichukov; +Cc: linux-raid

from reboot:

zoizoi@shankara:~$ sudo blockdev --getra  /dev/md0
[sudo] password for zoizoi:
256
zoizoi@shankara:~$ sudo blockdev --setra 2048 /dev/md0
zoizoi@shankara:~$ sudo dd if=/dev/md0 of=/dev/null bs=1M count=4000
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 7.53149 s, 557 MB/s
zoizoi@shankara:~$ sudo dd if=/dev/md0 of=/dev/null bs=1M count=4000
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 0.600264 s, 7.0 GB/s
zoizoi@shankara:~$


Liam


On Thu, May 5, 2011 at 9:39 AM, Liam Kurmos <quantum.leaf@gmail.com> wrote:
> On Thu, May 5, 2011 at 9:06 AM, Nikolay Kichukov <hijacker@oldum.net> wrote:
>> -----BE
>> It seems like he is reading directly from the raid device and not through the filesystem. So there are no filesystem
>> caches in this way.
>
> phew!  ... i think (see below)
>
>
> I installed ubuntu 11.04 on the new system last night.
> this morning i went to reconnect the old system drive (id disconnected
> it for safety)  and 'pop' a small piece of metal must have touched the
> back of the loose drive and fried the board! ... oh joy.
>
> --luckily its an old drive and i have an identical spare so im hoping
> i can swap the board and save all my work since last git commit.
>
> anyway.. this look different in 11.04. mdadm 3.1.4
>
> /dev/md0:
>  Timing buffered disk reads: 594 MB in  3.00 seconds = 197.79 MB/sec
> zoizoi@shankara:~$ sudo dd if=/dev/md0 of=/dev/null bs=1M count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 5.84755 s, 179 MB/s
> zoizoi@shankara:~$ sudo dd if=/dev/md0 of=/dev/null bs=1M count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 0.152353 s, 6.9 GB/s
>
> even though im accessing the block device directly it does look like
> im getting buffering in natty! I don't think i was in 10.10 and
> certainly wasnt getting 7GB/s
>
> raid10 f2 performance is right down vs what i got last night (i got
> 470MB/s first try after creating the array so dont think there was
> buffering.
>
> my md1 raid5 was also slow again. readahead on both defaulted down to
> 256 on 11.04
>
> I applied Neil's x8 fix to both md0 and md1 and now the dd test look
> much better.
>
> sudo dd if=/dev/md0 of=/dev/null bs=1M count=4000
> 4000+0 records in
> 4000+0 records out
> 4194304000 bytes (4.2 GB) copied, 7.62018 s, 550 MB/s
>
> a little too good!
> i could see in the system monitor that i didnt have the large 4G
> buffer (i do post this test). Something i did must have reset the
> buffer. I could see a small amount of buffer in system monitor so
> maybe it was 1GB. I appreciate these are not the best test but not
> that hdparm is much worse. Once i set the 2048 sector readahead i gets
> totally unrealistic.
>
> zoizoi@shankara:~$ sudo blockdev --setra 2048 /dev/md0
> zoizoi@shankara:~$ sudo hdparm -t /dev/md0
>
> /dev/md0:
>  Timing buffered disk reads: 5294 MB in  3.00 seconds = 1762.34 MB/sec
> zoizoi@shankara:~$ sudo blockdev --setra 256 /dev/md0
> zoizoi@shankara:~$ sudo hdparm -t /dev/md0
>
> /dev/md0:
>  Timing buffered disk reads: 582 MB in  3.00 seconds = 193.78 MB/sec
>
> anyway, it looks like i'm getting good read speed now with 2048
> lookahead, ill do another dd test on reboot.
>
>
> thanks to you all for the helpful responses,
>
> Liam
>


* Re: mdadm raid1 read performance
  2011-05-05  8:06             ` Nikolay Kichukov
  2011-05-05  8:39               ` Liam Kurmos
@ 2011-05-05  9:30               ` NeilBrown
  1 sibling, 0 replies; 36+ messages in thread
From: NeilBrown @ 2011-05-05  9:30 UTC (permalink / raw)
  To: Nikolay Kichukov
  Cc: Roman Mamedov, Liam Kurmos, Roberto Spadim, Brad Campbell, Drew,
	linux-raid

On Thu, 05 May 2011 11:06:09 +0300 Nikolay Kichukov <hijacker@oldum.net>
wrote:

> Hi,
> It seems like he is reading directly from the raid device and not through the filesystem. So there are no filesystem
> caches in this way.

A block device is actually implemented a lot like a filesystem which
contains a single file with a trivial mapping from file-block to device-block.

In any case, reading from a block device most definitely does go through the
page cache, and repeatedly reading from a device which is substantially
smaller than memory will cause subsequent reads to come from the cache.
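
That is easy to see (a sketch - the device name is a placeholder):

   grep ^Cached: /proc/meminfo
   dd if=/dev/mdX of=/dev/null bs=1M count=1000
   grep ^Cached: /proc/meminfo     # grows by roughly the amount just read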

NeilBrown


> 
> Cheers,
> - -Nik
> 
> On 05/05/2011 07:06 AM, Roman Mamedov wrote:
> > On Thu, 5 May 2011 00:08:59 +0100
> > Liam Kurmos <quantum.leaf@gmail.com> wrote:
> > 
> >> in my tests i read 1GB and throw away the data.
> >> dd if=/dev/md0 of=/dev/null bs=1M count=1000
> > 
> > If you have enough RAM for disk cache, on the second and further consecutive
> > invocations of this you will be reading mostly from the cache, giving you an
> > incorrect inflated result. So either don't forget to drop filesystem caches
> > between runs, or just test read performance with "hdparm -t /dev/mdX" instead.
> > 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: mdadm raid1 read performance
  2011-05-05  7:26                     ` David Brown
@ 2011-05-05 10:41                       ` Keld Jørn Simonsen
  2011-05-05 11:38                         ` David Brown
  2011-05-06 21:05                       ` Leslie Rhorer
  1 sibling, 1 reply; 36+ messages in thread
From: Keld Jørn Simonsen @ 2011-05-05 10:41 UTC (permalink / raw)
  To: David Brown; +Cc: linux-raid

On Thu, May 05, 2011 at 09:26:45AM +0200, David Brown wrote:
> On 05/05/2011 02:40, Liam Kurmos wrote:
> >Cheers Roberto,
> >
> >I've got the gist of the far layout from looking at wikipedia. There
> >is some clever stuff going on that i had never considered.
> >i'm going for f2 for my system drive.
> >
> >Liam
> >
> 
> For general use, raid10,f2 is often the best choice.  The only 
> disadvantage is if you have applications that make a lot of synchronised 
> writes, as writes take longer (everything must be written twice, and 
> because the data is spread out there is more head movement).  For most 
> writes this doesn't matter - the OS caches the writes, and the app 
> continues on its way, so the writes are done when the disks are not 
> otherwise used.  But if you have synchronous writes, so that the app 
> will wait for the write to complete, it will be slower (compared to 
> raid10,n2 or raid10,o2).

Yes, synchronous writes would be significantly slower.
I have not seen benchmarks on it, though.
Which applications typically use synchronous IO?
Maybe not that many.
Do databases do that, e.g. postgresql and mysql?
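
A rough way to measure it would be to time small O_DSYNC writes on each
layout, e.g. (path and sizes are just placeholders):

  dd if=/dev/zero of=/mnt/test/syncfile bs=4k count=1000 oflag=dsync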

> The other problem with raid10 layout is booting - bootloaders don't much 
> like it.  The very latest version of grub, IIRC, can boot from raid10 - 
> but it can be awkward.  There are lots of how-tos around the web for 
> booting when you have raid, but by far the easiest is to divide your 
> disks into partitions:
> 
> sdX1 = 1GB
> sdX2 = xGB
> sdX3 = yGB
> 
> Put all your sdX1 partitions together as raid1 with metadata layout 
> 0.90, format as ext3 and use it as /boot.  Any bootloader will work fine 
> with that (don't forget to install grub on each disk's MBR).
> 
> Put your sdX2 partitions together as raid10,f2 for swap.
> 
> Put the sdX3 partitions together as raid10,f2 for everything else.  The 
> most flexible choice is to use LVM here and make logical partitions for 
> /, /home, /usr, etc.  But you can also partition up the md device in 
> distinct fixed partitions for /, /home, etc. if you want.

there is a similar layout of your disks described in

https://raid.wiki.kernel.org/index.php/Preventing_against_a_failing_disk
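
In mdadm terms, such a layout would be created roughly like this (a
sketch only; device names assumed):

  mdadm --create /dev/md0 --metadata=0.90 --level=1 --raid-devices=2 \
        /dev/sda1 /dev/sdb1            # /boot, bootloader-friendly metadata
  mdadm --create /dev/md1 --level=10 --layout=f2 --raid-devices=2 \
        /dev/sda2 /dev/sdb2            # swap
  mdadm --create /dev/md2 --level=10 --layout=f2 --raid-devices=2 \
        /dev/sda3 /dev/sdb3            # everything else (LVM on top)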

> Don't try and make sdX3 and sdX4 groups and raids for separate / and 
> /home (unless you want to use different raid levels for these two 
> groups).  Your disks are faster near the start (at the outer edge of the 
> disk), so you get the best speed by making the raid10,f2 from almost the 
> whole disk.

Hmm, I think the root partition actually would have more accesses than
/home and other partitions, so it may be beneficial to give the fastest
disk sectors to a separate root partition. Comments?

best regards
Keld

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: mdadm raid1 read performance
  2011-05-04 23:45           ` NeilBrown
  2011-05-04 23:57             ` Roberto Spadim
  2011-05-05  0:14             ` Liam Kurmos
@ 2011-05-05 11:10             ` Keld Jørn Simonsen
  2011-05-06 21:20               ` Leslie Rhorer
  2 siblings, 1 reply; 36+ messages in thread
From: Keld Jørn Simonsen @ 2011-05-05 11:10 UTC (permalink / raw)
  To: NeilBrown; +Cc: Liam Kurmos, Roberto Spadim, Brad Campbell, Drew, linux-raid

On Thu, May 05, 2011 at 09:45:38AM +1000, NeilBrown wrote:
> On Thu, 5 May 2011 00:08:59 +0100 Liam Kurmos <quantum.leaf@gmail.com> wrote:
> 
> > as a separate question, what should be the theoretical performance of raid5?
> 
> x(N-1)
> 
> So a 4 drive RAID5 should read at 3 times the speed of a single drive.

Actually, theoretically, it should be more than that for reading, more like N minus
some overhead. In a raid5 stripe of 4 disks, when reading you do not read
the checksum block, and thus you should be able to have all 4 drives
occupied with reading real data. Some benchmarks back this up, 
http://home.comcast.net/~jpiszcz/20080329-raid/
http://blog.jamponi.net/2008/07/raid56-and-10-benchmarks-on-26255_10.html
The latter reports a 3.44 times performance for raid5 reads with 4
disks, significantly over the N-1 = 3.0 mark.
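
As a rough illustration (drive speed assumed): with four drives that
each sustain about 100 MB/s sequentially, the N-1 estimate gives about
300 MB/s, while a 3.44x result corresponds to about 344 MB/s - closer
to N than to N-1, since read-ahead can keep all four spindles busy with
data blocks.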

For writing, you are correct with the N-1 formula.

best regards
keld

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: mdadm raid1 read performance
  2011-05-05 10:41                       ` Keld Jørn Simonsen
@ 2011-05-05 11:38                         ` David Brown
  2011-05-06  4:14                           ` CoolCold
  0 siblings, 1 reply; 36+ messages in thread
From: David Brown @ 2011-05-05 11:38 UTC (permalink / raw)
  To: linux-raid

On 05/05/2011 12:41, Keld Jørn Simonsen wrote:
> On Thu, May 05, 2011 at 09:26:45AM +0200, David Brown wrote:
>> On 05/05/2011 02:40, Liam Kurmos wrote:
>>> Cheers Roberto,
>>>
>>> I've got the gist of the far layout from looking at wikipedia. There
>>> is some clever stuff going on that i had never considered.
>>> i'm going for f2 for my system drive.
>>>
>>> Liam
>>>
>>
>> For general use, raid10,f2 is often the best choice.  The only
>> disadvantage is if you have applications that make a lot of synchronised
>> writes, as writes take longer (everything must be written twice, and
>> because the data is spread out there is more head movement).  For most
>> writes this doesn't matter - the OS caches the writes, and the app
>> continues on its way, so the writes are done when the disks are not
>> otherwise used.  But if you have synchronous writes, so that the app
>> will wait for the write to complete, it will be slower (compared to
>> raid10,n2 or raid10,o2).
>
> Yes syncroneous writes would be significantly slower.
> I have not seen benchmarks on it, tho.
> Which applications typically use syncroneous IO?
> Maybe not that many.
> Do databases do that, eg postgresql and mysql?
>

Database servers do use synchronous writes (or fsync() calls), but I 
suspect that they won't suffer much if these are slow unless you have a 
great deal of writes - they typically write to the transaction log, 
fsync(), write to the database files, fsync(), then write to the log 
again and fsync().  But they will buffer up their writes as needed in a 
separate thread or process - it should not hinder their read processes.

Lots of other applications also use fsync() whenever they want to be 
sure that data is written to the disk.  A prime example is sqlite, which 
is used by many other programs.  If you have your disk systems and file 
systems set up as a typical home user, there is little problem - the 
disk write caches and file system caches will ensure that the app thinks 
the write is complete long before it hits the disk surfaces anyway (thus 
negating the whole point of using fsync() in the first place...).  But 
if you have a more paranoid setup, so that your databases or other files 
will not get corrupted by power fails or OS crashes, then you have write 
barriers enabled on the filesystems and write caches disabled on the 
disks.  fsync() will then take time - and it will slow down programs 
that wait for fsync().

I've not done (or seen) any benchmarks on this, and I don't think it 
will be noticeable to most users.  But it's a typical tradeoff - if you 
are looking for high reliability even with power failures or OS crashes, 
then you pay for it in some kinds of performance.


>> The other problem with raid10 layout is booting - bootloaders don't much
>> like it.  The very latest version of grub, IIRC, can boot from raid10 -
>> but it can be awkward.  There are lots of how-tos around the web for
>> booting when you have raid, but by far the easiest is to divide your
>> disks into partitions:
>>
>> sdX1 = 1GB
>> sdX2 = xGB
>> sdX3 = yGB
>>
>> Put all your sdX1 partitions together as raid1 with metadata layout
>> 0.90, format as ext3 and use it as /boot.  Any bootloader will work fine
>> with that (don't forget to install grub on each disk's MBR).
>>
>> Put your sdX2 partitions together as raid10,f2 for swap.
>>
>> Put the sdX3 partitions together as raid10,f2 for everything else.  The
>> most flexible choice is to use LVM here and make logical partitions for
>> /, /home, /usr, etc.  But you can also partition up the md device in
>> distinct fixed partitions for /, /home, etc. if you want.
>
> there is a similar layout of your disks described in
>
> https://raid.wiki.kernel.org/index.php/Preventing_against_a_failing_disk
>

They've stolen my ideas!  Actually, I think this setup is fairly obvious 
when you think through the workings of raid and grub, and it's not 
surprising that more than one person has independently picked the same 
arrangement.

>> Don't try and make sdX3 and sdX4 groups and raids for separate / and
>> /home (unless you want to use different raid levels for these two
>> groups).  Your disks are faster near the start (at the outer edge of the
>> disk), so you get the best speed by making the raid10,f2 from almost the
>> whole disk.
>
> Hmm, I think the root partition actually would have more accesses than
> /home and other partitions, so it may be beneficial to give the fastest
> disk sectors to a separate root partition. Comments?
>

If you make the root logical volume first, then the home logical volume 
(or fixed partitions within the raid), then you will automatically get 
faster access for it.  The arrangement on the disk (for a two disk 
raid10,far) will then be:

Boot1 SwapA1 SwapB2 RootA1 HomeA1 <spareA1> RootB2 HomeB2 <spareB2>
Boot2 SwapB1 SwapA2 RootB1 HomeB1 <spareB1> RootA2 HomeA2 <spareA2>

Here "A" and "B" are stripes, while "1" and "2" are copies.

<spare> is unallocated LVM space.

Since Boot is very small, it is negligible for performance - it doesn't 
matter that it takes the fastest few tracks.  Swap gets as high speed as 
the disk can support.  Then root will be faster than home, but both will 
still be better than the disk's average speed since one copy of the data 
is within the outer half of the disk.
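
A sketch of that ordering with LVM (assuming the raid10,f2 array is
/dev/md2; sizes are only placeholders):

  pvcreate /dev/md2
  vgcreate vg0 /dev/md2
  lvcreate -L 30G  -n root vg0   # created first, so it normally gets the
                                 # earliest (outer, faster) extents
  lvcreate -L 200G -n home vg0   # leave the remainder unallocated as spare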



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: mdadm raid1 read performance
  2011-05-05 11:38                         ` David Brown
@ 2011-05-06  4:14                           ` CoolCold
  2011-05-06  7:29                             ` David Brown
  0 siblings, 1 reply; 36+ messages in thread
From: CoolCold @ 2011-05-06  4:14 UTC (permalink / raw)
  To: David Brown; +Cc: linux-raid

On Thu, May 5, 2011 at 3:38 PM, David Brown <david@westcontrol.com> wrote:
> On 05/05/2011 12:41, Keld Jørn Simonsen wrote:
>>
>> On Thu, May 05, 2011 at 09:26:45AM +0200, David Brown wrote:
>>>
>>> On 05/05/2011 02:40, Liam Kurmos wrote:
>>>>
>>>> Cheers Roberto,
>>>>
>>>> I've got the gist of the far layout from looking at wikipedia. There
>>>> is some clever stuff going on that i had never considered.
>>>> i'm going for f2 for my system drive.
>>>>
>>>> Liam
>>>>
>>>
>>> For general use, raid10,f2 is often the best choice.  The only
>>> disadvantage is if you have applications that make a lot of synchronised
>>> writes, as writes take longer (everything must be written twice, and
>>> because the data is spread out there is more head movement).  For most
>>> writes this doesn't matter - the OS caches the writes, and the app
>>> continues on its way, so the writes are done when the disks are not
>>> otherwise used.  But if you have synchronous writes, so that the app
>>> will wait for the write to complete, it will be slower (compared to
>>> raid10,n2 or raid10,o2).
>>
>> Yes syncroneous writes would be significantly slower.
>> I have not seen benchmarks on it, tho.
>> Which applications typically use syncroneous IO?
>> Maybe not that many.
>> Do databases do that, eg postgresql and mysql?
>>
>
> Database servers do use synchronous writes (or fsync() calls), but I suspect
> that they won't suffer much if these are slow unless you have a great deal
> of writes - they typically write to the transaction log, fsync(), write to
> the database files, fsync(), then write to the log again and fsync().  But
> they will buffer up their writes as needed in a separate thread or process -
> it should not hinder their read processes.
>
> Lots of other applications also use fsync() whenever they want to be sure
> that data is written to the disk.  A prime example is sqlite, which is used
> by many other programs.  If you have your disk systems and file systems set
> up as a typical home user, there is little problem - the disk write caches
> and file system caches will ensure that the app thinks the write is complete
> long before it hits the disk surfaces anyway (thus negating the whole point
> of using fsync() in the first place...).  But if you have a more paranoid
> setup, so that your databases or other files will not get corrupted by power
> fails or OS crashes, then you have write barriers enabled on the filesystems
> and write caches disabled on the disks.
I think you've mixed things up a bit - one should either disable the
write cache or enable barriers, not both. Here is a quote from the XFS FAQ:
"Write barrier support is enabled by default in XFS since kernel
version 2.6.17. It is disabled by mounting the filesystem with
"nobarrier". Barrier support will flush the write back cache at the
appropriate times (such as on XFS log writes). "
http://xfs.org/index.php/XFS_FAQ#Write_barrier_support.
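
So, roughly, the two sane combinations look like this (a sketch; device
and mount point names assumed):

  # typical: leave the drive write cache on, keep barriers enabled (default)
  hdparm -W1 /dev/sda
  mount -t xfs /dev/md2 /data

  # alternative: disable the drive write cache, then barriers may be dropped
  hdparm -W0 /dev/sda
  mount -t xfs -o nobarrier /dev/md2 /data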

> fsync() will then take time - and it will slow down programs that wait for fsync().
>
> I've not done (or seen) any benchmarks on this, and I don't think it will be
> noticeable to most users.  But it's a typical tradeoff - if you are looking
> for high reliability even with power failures or OS crashes, then you pay
> for it in some kinds of performance.
>
>
>>> The other problem with raid10 layout is booting - bootloaders don't much
>>> like it.  The very latest version of grub, IIRC, can boot from raid10 -
>>> but it can be awkward.  There are lots of how-tos around the web for
>>> booting when you have raid, but by far the easiest is to divide your
>>> disks into partitions:
>>>
>>> sdX1 = 1GB
>>> sdX2 = xGB
>>> sdX3 = yGB
>>>
>>> Put all your sdX1 partitions together as raid1 with metadata layout
>>> 0.90, format as ext3 and use it as /boot.  Any bootloader will work fine
>>> with that (don't forget to install grub on each disk's MBR).
>>>
>>> Put your sdX2 partitions together as raid10,f2 for swap.
>>>
>>> Put the sdX3 partitions together as raid10,f2 for everything else.  The
>>> most flexible choice is to use LVM here and make logical partitions for
>>> /, /home, /usr, etc.  But you can also partition up the md device in
>>> distinct fixed partitions for /, /home, etc. if you want.
>>
>> there is a similar layout of your disks described in
>>
>> https://raid.wiki.kernel.org/index.php/Preventing_against_a_failing_disk
>>
>
> They've stolen my ideas!  Actually, I think this setup is fairly obvious
> when you think through the workings of raid and grub, and it's not
> surprising that more than one person has independently picked the same
> arrangement.
>
>>> Don't try and make sdX3 and sdX4 groups and raids for separate / and
>>> /home (unless you want to use different raid levels for these two
>>> groups).  Your disks are faster near the start (at the outer edge of the
>>> disk), so you get the best speed by making the raid10,f2 from almost the
>>> whole disk.
>>
>> Hmm, I think the root partition actually would have more accesses than
>> /home and other partitions, so it may be beneficial to give the fastest
>> disk sectors to a separate root partition. Comments?
>>
>
> If you make the root logical volume first, then the home logical volume (or
> fixed partitions within the raid), then you will automatically get faster
> access for it.  The arrangement on the disk (for a two disk raid10,far) will
> then be:
>
> Boot1 SwapA1 SwapB2 RootA1 HomeA1 <spareA1> RootB2 HomeB2 <spareB2>
> Boot2 SwapB1 SwapA2 RootB1 HomeB1 <spareB1> RootA2 HomeA2 <spareA2>
>
> Here "A" and "B" are stripes, while "1" and "2" are copies.
>
> <spare> is unallocated LVM space.
>
> Since Boot is very small, it negligible for performance - it doesn't matter
> that it takes the fastest few tracks.  Swap gets as high speed as the disk
> can support.  Then root will be faster than home, but both will still be
> better than the disk's average speed since one copy of the data is within
> the outer half of the disk.
>
>



-- 
Best regards,
[COOLCOLD-RIPN]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: mdadm raid1 read performance
  2011-05-06  4:14                           ` CoolCold
@ 2011-05-06  7:29                             ` David Brown
  0 siblings, 0 replies; 36+ messages in thread
From: David Brown @ 2011-05-06  7:29 UTC (permalink / raw)
  To: linux-raid

On 06/05/2011 06:14, CoolCold wrote:
> On Thu, May 5, 2011 at 3:38 PM, David Brown<david@westcontrol.com>
> wrote:
>> On 05/05/2011 12:41, Keld Jørn Simonsen wrote:
>>>
>>> On Thu, May 05, 2011 at 09:26:45AM +0200, David Brown wrote:
>>>>
>>>> On 05/05/2011 02:40, Liam Kurmos wrote:
>>>>>
>>>>> Cheers Roberto,
>>>>>
>>>>> I've got the gist of the far layout from looking at
>>>>> wikipedia. There is some clever stuff going on that i had
>>>>> never considered. i'm going for f2 for my system drive.
>>>>>
>>>>> Liam
>>>>>
>>>>
>>>> For general use, raid10,f2 is often the best choice.  The only
>>>> disadvantage is if you have applications that make a lot of
>>>> synchronised writes, as writes take longer (everything must be
>>>> written twice, and because the data is spread out there is more
>>>> head movement).  For most writes this doesn't matter - the OS
>>>> caches the writes, and the app continues on its way, so the
>>>> writes are done when the disks are not otherwise used.  But if
>>>> you have synchronous writes, so that the app will wait for the
>>>> write to complete, it will be slower (compared to raid10,n2 or
>>>> raid10,o2).
>>>
>>> Yes syncroneous writes would be significantly slower. I have not
>>> seen benchmarks on it, tho. Which applications typically use
>>> syncroneous IO? Maybe not that many. Do databases do that, eg
>>> postgresql and mysql?
>>>
>>
>> Database servers do use synchronous writes (or fsync() calls), but
>> I suspect that they won't suffer much if these are slow unless you
>> have a great deal of writes - they typically write to the
>> transaction log, fsync(), write to the database files, fsync(),
>> then write to the log again and fsync().  But they will buffer up
>> their writes as needed in a separate thread or process - it should
>> not hinder their read processes.
>>
>> Lots of other applications also use fsync() whenever they want to
>> be sure that data is written to the disk.  A prime example is
>> sqlite, which is used by many other programs.  If you have your
>> disk systems and file systems set up as a typical home user, there
>> is little problem - the disk write caches and file system caches
>> will ensure that the app thinks the write is complete long before
>> it hits the disk surfaces anyway (thus negating the whole point of
>> using fsync() in the first place...).  But if you have a more
>> paranoid setup, so that your databases or other files will not get
>> corrupted by power fails or OS crashes, then you have write
>> barriers enabled on the filesystems and write caches disabled on
>> the disks.
> I guess you mess things a bit - one should disable write cache or
> enable barriers at one time, not both. Here goes quote from XFS faq:
> "Write barrier support is enabled by default in XFS since kernel
> version 2.6.17. It is disabled by mounting the filesystem with
> "nobarrier". Barrier support will flush the write back cache at the
> appropriate times (such as on XFS log writes). "
> http://xfs.org/index.php/XFS_FAQ#Write_barrier_support.
>

Yes, thanks.  Usually I don't need to think about these things much, and
when I do, I always have to look up the details to make sure I get the
combinations right.




^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: mdadm raid1 read performance
  2011-05-04  0:57 ` John Robinson
@ 2011-05-06 20:44   ` Leslie Rhorer
  2011-05-06 21:56     ` Keld Jørn Simonsen
  0 siblings, 1 reply; 36+ messages in thread
From: Leslie Rhorer @ 2011-05-06 20:44 UTC (permalink / raw)
  To: 'John Robinson', 'Liam Kurmos'; +Cc: 'Linux RAID'



> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of John Robinson
> Sent: Tuesday, May 03, 2011 7:57 PM
> To: Liam Kurmos
> Cc: Linux RAID
> Subject: Re: mdadm raid1 read performance
> 
> On 04/05/2011 01:07, Liam Kurmos wrote:
> > Hi,
> >
> > I've been testing mdadm (great piece of software btw) however all my
> > test show that reading from raid1 is only the same speed as reading
> > from a single drive.
> >
> > Is this a known issue? or is there something seriously wrong with my
> > system? i have tried v2.8.1 and v.3.2.1 without difference and several
> > benchmarking methods.
> 
> This is a FAQ. Yes, this is known. No, it's not an issue, it's by design
> - pretty much any RAID 1 implementation will be the same because of the
> nature of spinning discs. md RAID 1 will serve multiple simultaneous
> reads from the different mirrors, giving a higher total throughput, but
> a single-threaded read will read from only one. If you want RAID 0
> sequential speed at the same time as RAID 1 mirroring, look at md RAID
> 10, and in particular RAID 10,f2; please see the excellent documentation
> and wiki for more details.

	I would go so far as to say it is more than just by design.  It is
by the very fundamental nature of RAID1.  RAID1 is intended to be a simple
mirror.  Every write is sent in identical form to precisely the same logical
sector of all devices.  Any read can come from any device in the array.  The
WriteMostly specifier can help ensure the best throughput in the case where
one of the members is inherently slower than the other members of the array,
and some RAID1 implementations support load balancing, but otherwise there
are no real operational gains in performance for a RAID1 array over a single
disk.
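
For what it's worth, marking a slower member write-mostly looks roughly
like this (device names assumed):

  mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        /dev/sda1 --write-mostly /dev/sdb1
  # or on a running array, via sysfs:
  echo writemostly > /sys/block/md0/md/dev-sdb1/state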


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: mdadm raid1 read performance
  2011-05-05  7:26                     ` David Brown
  2011-05-05 10:41                       ` Keld Jørn Simonsen
@ 2011-05-06 21:05                       ` Leslie Rhorer
  2011-05-07 10:37                         ` David Brown
  1 sibling, 1 reply; 36+ messages in thread
From: Leslie Rhorer @ 2011-05-06 21:05 UTC (permalink / raw)
  To: 'David Brown', linux-raid



> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of David Brown
> Sent: Thursday, May 05, 2011 2:27 AM
> To: linux-raid@vger.kernel.org
> Subject: Re: mdadm raid1 read performance
> 
> On 05/05/2011 02:40, Liam Kurmos wrote:
> > Cheers Roberto,
> >
> > I've got the gist of the far layout from looking at wikipedia. There
> > is some clever stuff going on that i had never considered.
> > i'm going for f2 for my system drive.
> >
> > Liam
> >
> 
> For general use, raid10,f2 is often the best choice.  The only
> disadvantage is if you have applications that make a lot of synchronised
> writes, as writes take longer (everything must be written twice, and
> because the data is spread out there is more head movement).  For most
> writes this doesn't matter - the OS caches the writes, and the app
> continues on its way, so the writes are done when the disks are not
> otherwise used.  But if you have synchronous writes, so that the app
> will wait for the write to complete, it will be slower (compared to
> raid10,n2 or raid10,o2).
> 
> The other problem with raid10 layout is booting - bootloaders don't much
> like it.  The very latest version of grub, IIRC, can boot from raid10 -
> but it can be awkward.  There are lots of how-tos around the web for
> booting when you have raid, but by far the easiest is to divide your
> disks into partitions:
> 
> sdX1 = 1GB
> sdX2 = xGB
> sdX3 = yGB
> 
> Put all your sdX1 partitions together as raid1 with metadata layout
> 0.90, format as ext3 and use it as /boot.  Any bootloader will work fine
> with that (don't forget to install grub on each disk's MBR).
> 
> Put your sdX2 partitions together as raid10,f2 for swap.
> 
> Put the sdX3 partitions together as raid10,f2 for everything else.  The
> most flexible choice is to use LVM here and make logical partitions for
> /, /home, /usr, etc.  But you can also partition up the md device in
> distinct fixed partitions for /, /home, etc. if you want.

	I agree, except that I like to have separate physical devices for
booting and raw disks for the data.  My servers each have a pair of 500G
hard drives partitioned into three sections.  First, /dev/sdX1 is a small
partition which contains only /boot, it is read-only, and can be mounted at
boot time, or not.  As you say, it has a 0.90 superblock, although I chose
an ext2 file system.  Next, /dev/sdX2 uses about half the disk and is
mounted at /.  Finally, I use the rest of the disk, /dev/sdX3, as swap
space.  I chose all three to be RAID1.

	The data drives are all >= 1TB, unpartitioned, and assembled into
RAID6 arrays of 10 or more members, each.

	These systems use so little swap space and so rarely, I'm not sure I
see any benefit to RAID10,f2 for them.  Is there?


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: mdadm raid1 read performance
  2011-05-05 11:10             ` Keld Jørn Simonsen
@ 2011-05-06 21:20               ` Leslie Rhorer
  2011-05-06 21:53                 ` Keld Jørn Simonsen
  0 siblings, 1 reply; 36+ messages in thread
From: Leslie Rhorer @ 2011-05-06 21:20 UTC (permalink / raw)
  To: 'Keld Jørn Simonsen', 'NeilBrown'; +Cc: linux-raid

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Keld Jørn Simonsen
> Sent: Thursday, May 05, 2011 6:10 AM
> To: NeilBrown
> Cc: Liam Kurmos; Roberto Spadim; Brad Campbell; Drew; linux-
> raid@vger.kernel.org
> Subject: Re: mdadm raid1 read performance
> 
> On Thu, May 05, 2011 at 09:45:38AM +1000, NeilBrown wrote:
> > On Thu, 5 May 2011 00:08:59 +0100 Liam Kurmos <quantum.leaf@gmail.com>
> wrote:
> >
> > > as a separate question, what should be the theoretical performance of
> raid5?
> >
> > x(N-1)
> >
> > So a 4 drive RAID5 should read at 3 time the speed of a single drive.
> 
> Actually, theoretically, it should be more than that for reading, more
> like N minus
> some overhead. In a raid5 stripe of 4 disks, when reading you do not read
> the checksum block, and thus you should be able to have all 4 drives
> occupied with reading real data. Some benchmarks back this up,
> http://home.comcast.net/~jpiszcz/20080329-raid/
> http://blog.jamponi.net/2008/07/raid56-and-10-benchmarks-on-26255_10.html
> The latter reports a 3.44 times performance for raid5 reads with 4
> disks, significantly over the N-1 = 3.0 mark.
> 
> For writing, you are correct with the N-1 formular.

	There have been a lot of threads here about array performance, but
one important factor rarely mentioned in these threads is network
performance.  Of course, network performance is really outside the scope of
this list, but I frequently see people talking about performance well in
excess of 120MBps.  That's great, but I have to wonder if their network
actually can make use of such speeds.  Of course, if the application
actually obtaining the raw data is on the machine, then network performance
is much less of an issue.  A database search implemented directly on the
server, for example, can use every bit of performance available to the local
machine.  Given that in my case the vast majority of data is squirted across
the LAN (e.g., these are mostly file servers), anything much in excess of
120MBps is irrelevant.  I mean, yeah, it’s a rather nice feeling that my
RAID arrays can deliver more than 450MBps if they are ever called upon to do
so, but with a 1G LAN, that's not going to happen very often.  I just wonder
how many people who complain of poor performance can really benefit all that
much from increased performance?


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: mdadm raid1 read performance
  2011-05-06 21:20               ` Leslie Rhorer
@ 2011-05-06 21:53                 ` Keld Jørn Simonsen
  2011-05-07  3:17                   ` Leslie Rhorer
  0 siblings, 1 reply; 36+ messages in thread
From: Keld Jørn Simonsen @ 2011-05-06 21:53 UTC (permalink / raw)
  To: Leslie Rhorer
  Cc: 'Keld Jørn Simonsen', 'NeilBrown', linux-raid

On Fri, May 06, 2011 at 04:20:39PM -0500, Leslie Rhorer wrote:
> > -----Original Message-----
> > From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> > owner@vger.kernel.org] On Behalf Of Keld Jørn Simonsen
> > Sent: Thursday, May 05, 2011 6:10 AM
> > To: NeilBrown
> > Cc: Liam Kurmos; Roberto Spadim; Brad Campbell; Drew; linux-
> > raid@vger.kernel.org
> > Subject: Re: mdadm raid1 read performance
> > 
> > On Thu, May 05, 2011 at 09:45:38AM +1000, NeilBrown wrote:
> > > On Thu, 5 May 2011 00:08:59 +0100 Liam Kurmos <quantum.leaf@gmail.com>
> > wrote:
> > >
> > > > as a separate question, what should be the theoretical performance of
> > raid5?
> > >
> > > x(N-1)
> > >
> > > So a 4 drive RAID5 should read at 3 time the speed of a single drive.
> > 
> > Actually, theoretically, it should be more than that for reading, more
> > like N minus
> > some overhead. In a raid5 stripe of 4 disks, when reading you do not read
> > the checksum block, and thus you should be able to have all 4 drives
> > occupied with reading real data. Some benchmarks back this up,
> > http://home.comcast.net/~jpiszcz/20080329-raid/
> > http://blog.jamponi.net/2008/07/raid56-and-10-benchmarks-on-26255_10.html
> > The latter reports a 3.44 times performance for raid5 reads with 4
> > disks, significantly over the N-1 = 3.0 mark.
> > 
> > For writing, you are correct with the N-1 formular.
> 
> 	There have been a lot of threads here about array performance, but
> one important factor rarely mentioned in these threads is network
> performance.  Of course, network performance is really outside the scope of
> this list, but I frequently see people talking about performance well in
> excess of 120MBps.  That's great, but I have to wonder if their network
> actually can make use of such speeds.  Of course, if the application
> actually obtaining the raw data is on the machine, then network performance
> is much less of an issue.  A database search implemented directly on the
> server, for example, can use every bit of performance available to the local
> machine.  Given that in my case the vast majority of data is squirted across
> the LAN (e.g., these are mostly file servers), anything much in excess of
> 120MBps is irrelevant.  I mean, yeah, it’s a rather nice feeling that my
> RAID arrays can deliver more than 450MBps if they are ever called upon to do
> so, but with a 1G LAN, that's not going to happen very often.  I just wonder
> how many people who complain of poor performance can really benefit all that
> much from increased performance?

10 Gbit/s connections are getting commonplace these days, at least in the
environments that I operate in.

Best regards
keld

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: mdadm raid1 read performance
  2011-05-06 20:44   ` Leslie Rhorer
@ 2011-05-06 21:56     ` Keld Jørn Simonsen
  0 siblings, 0 replies; 36+ messages in thread
From: Keld Jørn Simonsen @ 2011-05-06 21:56 UTC (permalink / raw)
  To: Leslie Rhorer
  Cc: 'John Robinson', 'Liam Kurmos', 'Linux RAID'

On Fri, May 06, 2011 at 03:44:31PM -0500, Leslie Rhorer wrote:
> 
> 
> > -----Original Message-----
> > From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> > owner@vger.kernel.org] On Behalf Of John Robinson
> > Sent: Tuesday, May 03, 2011 7:57 PM
> > To: Liam Kurmos
> > Cc: Linux RAID
> > Subject: Re: mdadm raid1 read performance
> > 
> > On 04/05/2011 01:07, Liam Kurmos wrote:
> > > Hi,
> > >
> > > I've been testing mdadm (great piece of software btw) however all my
> > > test show that reading from raid1 is only the same speed as reading
> > > from a single drive.
> > >
> > > Is this a known issue? or is there something seriously wrong with my
> > > system? i have tried v2.8.1 and v.3.2.1 without difference and several
> > > benchmarking methods.
> > 
> > This is a FAQ. Yes, this is known. No, it's not an issue, it's by design
> > - pretty much any RAID 1 implementation will be the same because of the
> > nature of spinning discs. md RAID 1 will serve multiple simultaneous
> > reads from the different mirrors, giving a higher total throughput, but
> > a single-threaded read will read from only one. If you want RAID 0
> > sequential speed at the same time as RAID 1 mirroring, look at md RAID
> > 10, and in particular RAID 10,f2; please see the excellent documentation
> > and wiki for more details.
> 
> 	I would go so far as to say it is more than just by design.  It is
> by the very fundamental nature of RAID1.  RAID1 is intended to be a simple
> mirror.  Every write is sent in identical form to precisely the same logical
> sector of all devices.  Any read can come from any device in the array.  The
> WriteMostly specifier can help insure the best throughput in the case where
> one of the members is inherently slower than the other members of the array,
> and some RAID1 implementations support load balancing, but otherwise there
> are no real operational gains in performance for a RAID1 array over a single
> disk.

SNIA defines RAID1 variants that are not so simple. And in many cases
you really do not know the internal layout of HW RAID1. So IMHO what you say does
not hold true.

best regards
keld

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: mdadm raid1 read performance
  2011-05-06 21:53                 ` Keld Jørn Simonsen
@ 2011-05-07  3:17                   ` Leslie Rhorer
  0 siblings, 0 replies; 36+ messages in thread
From: Leslie Rhorer @ 2011-05-07  3:17 UTC (permalink / raw)
  To: 'Keld Jørn Simonsen'; +Cc: 'NeilBrown', linux-raid

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Keld Jørn Simonsen
> Sent: Friday, May 06, 2011 4:54 PM
> To: Leslie Rhorer
> Cc: 'Keld Jørn Simonsen'; 'NeilBrown'; linux-raid@vger.kernel.org
> Subject: Re: mdadm raid1 read performance
> 
> On Fri, May 06, 2011 at 04:20:39PM -0500, Leslie Rhorer wrote:
> > > -----Original Message-----
> > > From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> > > owner@vger.kernel.org] On Behalf Of Keld Jørn Simonsen
> > > Sent: Thursday, May 05, 2011 6:10 AM
> > > To: NeilBrown
> > > Cc: Liam Kurmos; Roberto Spadim; Brad Campbell; Drew; linux-
> > > raid@vger.kernel.org
> > > Subject: Re: mdadm raid1 read performance
> > >
> > > On Thu, May 05, 2011 at 09:45:38AM +1000, NeilBrown wrote:
> > > > On Thu, 5 May 2011 00:08:59 +0100 Liam Kurmos
> <quantum.leaf@gmail.com>
> > > wrote:
> > > >
> > > > > as a separate question, what should be the theoretical performance
> of
> > > raid5?
> > > >
> > > > x(N-1)
> > > >
> > > > So a 4 drive RAID5 should read at 3 time the speed of a single
> drive.
> > >
> > > Actually, theoretically, it should be more than that for reading, more
> > > like N minus
> > > some overhead. In a raid5 stripe of 4 disks, when reading you do not
> read
> > > the checksum block, and thus you should be able to have all 4 drives
> > > occupied with reading real data. Some benchmarks back this up,
> > > http://home.comcast.net/~jpiszcz/20080329-raid/
> > > http://blog.jamponi.net/2008/07/raid56-and-10-benchmarks-on-
> 26255_10.html
> > > The latter reports a 3.44 times performance for raid5 reads with 4
> > > disks, significantly over the N-1 = 3.0 mark.
> > >
> > > For writing, you are correct with the N-1 formular.
> >
> > 	There have been a lot of threads here about array performance, but
> > one important factor rarely mentioned in these threads is network
> > performance.  Of course, network performance is really outside the scope
> of
> > this list, but I frequently see people talking about performance well in
> > excess of 120MBps.  That's great, but I have to wonder if their network
> > actually can make use of such speeds.  Of course, if the application
> > actually obtaining the raw data is on the machine, then network
> performance
> > is much less of an issue.  A database search implemented directly on the
> > server, for example, can use every bit of performance available to the
> local
> > machine.  Given that in my case the vast majority of data is squirted
> across
> > the LAN (e.g., these are mostly file servers), anything much in excess
> of
> > 120MBps is irrelevant.  I mean, yeah, it’s a rather nice feeling that my
> > RAID arrays can deliver more than 450MBps if they are ever called upon
> to do
> > so, but with a 1G LAN, that's not going to happen very often.  I just
> wonder
> > how many people who complain of poor performance can really benefit all
> that
> > much from increased performance?
> 
> 10 Gbit/s connections are getting commonplace these days, at least in the
> environments that I operate in.

	They are certainly not unheard-of, but I'm not sure I would call
them, "commonplace".  They are definitely not in the majority.  I work for a
very large national telecommunications company, and most of the links we
sell are still less than 10M.  I'm not sure we have sold any full 10G
network links, at all, although we have certainly sold a number of 2G - 4G
links.  Of course, WAN and SAN applications are always more expensive than
LAN applications, so many companies have large intra-site links but
comparatively small inter-site links.  Our customer backbone, of course, is
much, much higher than 10G, but none of our internal LAN links at any of our
locations is more than 1G.  Most are 100M.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: mdadm raid1 read performance
  2011-05-06 21:05                       ` Leslie Rhorer
@ 2011-05-07 10:37                         ` David Brown
  2011-05-07 10:58                           ` Keld Jørn Simonsen
  0 siblings, 1 reply; 36+ messages in thread
From: David Brown @ 2011-05-07 10:37 UTC (permalink / raw)
  To: linux-raid

On 06/05/11 23:05, Leslie Rhorer wrote:
>
>
>> -----Original Message-----
>> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
>> owner@vger.kernel.org] On Behalf Of David Brown
>> Sent: Thursday, May 05, 2011 2:27 AM
>> To: linux-raid@vger.kernel.org
>> Subject: Re: mdadm raid1 read performance
>>
>> On 05/05/2011 02:40, Liam Kurmos wrote:
>>> Cheers Roberto,
>>>
>>> I've got the gist of the far layout from looking at wikipedia. There
>>> is some clever stuff going on that i had never considered.
>>> i'm going for f2 for my system drive.
>>>
>>> Liam
>>>
>>
>> For general use, raid10,f2 is often the best choice.  The only
>> disadvantage is if you have applications that make a lot of synchronised
>> writes, as writes take longer (everything must be written twice, and
>> because the data is spread out there is more head movement).  For most
>> writes this doesn't matter - the OS caches the writes, and the app
>> continues on its way, so the writes are done when the disks are not
>> otherwise used.  But if you have synchronous writes, so that the app
>> will wait for the write to complete, it will be slower (compared to
>> raid10,n2 or raid10,o2).
>>
>> The other problem with raid10 layout is booting - bootloaders don't much
>> like it.  The very latest version of grub, IIRC, can boot from raid10 -
>> but it can be awkward.  There are lots of how-tos around the web for
>> booting when you have raid, but by far the easiest is to divide your
>> disks into partitions:
>>
>> sdX1 = 1GB
>> sdX2 = xGB
>> sdX3 = yGB
>>
>> Put all your sdX1 partitions together as raid1 with metadata layout
>> 0.90, format as ext3 and use it as /boot.  Any bootloader will work fine
>> with that (don't forget to install grub on each disk's MBR).
>>
>> Put your sdX2 partitions together as raid10,f2 for swap.
>>
>> Put the sdX3 partitions together as raid10,f2 for everything else.  The
>> most flexible choice is to use LVM here and make logical partitions for
>> /, /home, /usr, etc.  But you can also partition up the md device in
>> distinct fixed partitions for /, /home, etc. if you want.
>
> 	I agree, except that I like to have separate physical devices for
> booting and raw disks for the data.  My servers each have a pair of 500G
> hard drives partitioned into three sections.  First, /dev/sdX1 is a small
> partition which contains only /boot, it is read-only, and can be mounted at
> boot time, or not.  As you say, it has a 0.90 superblock, although I chose
> an ext2 file system.  Next, /dev/sdX2 uses about half the disk and is
> mounted at /.  Finally, I use the rest of the disk, /dev/sdX3, as swap
> space.  I chose all three to be RAID1.
>
> 	The data drives are all>= 1Tb, unpartitioned, and assembled into
> RAID6 arrays of 10 or more members, each.
>

When you need enough data space to have separate disks like this, then 
it is a good plan to separate the OS from the data disks.  These days 
I'd put the OS on a raid1 pair of SSD disks - even small and cheap 40GB 
drives are fine for the OS.  The speed of such drives is such that it 
won't make any difference if you use raid1 or raid10,far, especially 
since almost all files are small (raid0 striping only helps for big files).

> 	These systems use so little swap space and so rarely, I'm not sure I
> see any benefit to RAID10,f2 for them.  Is there?
>

Obviously when you use swap rarely, it makes little difference how it is 
laid out on the disk.  And since it is small, there is no difference 
between the speed of the outer and inner tracks (for HD's - for SSD's 
there is obviously no difference), so you don't gain there.  raid10,f2 
will still be better than raid1 for larger reads from swap - but I think 
you would have a hard time trying to spot that effect in the real world.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: mdadm raid1 read performance
  2011-05-07 10:37                         ` David Brown
@ 2011-05-07 10:58                           ` Keld Jørn Simonsen
  0 siblings, 0 replies; 36+ messages in thread
From: Keld Jørn Simonsen @ 2011-05-07 10:58 UTC (permalink / raw)
  To: David Brown; +Cc: linux-raid

On Sat, May 07, 2011 at 12:37:18PM +0200, David Brown wrote:
> On 06/05/11 23:05, Leslie Rhorer wrote:
> >	These systems use so little swap space and so rarely, I'm not sure I
> >see any benefit to RAID10,f2 for them.  Is there?
> >
> 
> Obviously when you use swap rarely, it makes little difference how it is 
> laid out on the disk.  And since it is small, there is no difference 
> between the speed of the outer and inner tracks (for HD's - for SSD's 
> there is obviously no difference), so you don't gain there.  raid10,f2 
> will still be better than raid1 for larger reads from swap - but I think 
> you would have a hard time trying to spot that effect in the real world.

I think swap on raid10,f2 mostly matters on workstations, where you have 
big apps like OpenOffice.org or firefox, and limited RAM.

best regards
keld

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2011-05-07 10:58 UTC | newest]

Thread overview: 36+ messages
2011-05-04  0:07 mdadm raid1 read performance Liam Kurmos
2011-05-04  0:57 ` John Robinson
2011-05-06 20:44   ` Leslie Rhorer
2011-05-06 21:56     ` Keld Jørn Simonsen
2011-05-04  0:58 ` NeilBrown
2011-05-04  5:30   ` Drew
2011-05-04  6:31     ` Brad Campbell
2011-05-04  7:42       ` Roberto Spadim
2011-05-04 23:08         ` Liam Kurmos
2011-05-04 23:35           ` Roberto Spadim
2011-05-04 23:36           ` Brad Campbell
2011-05-04 23:45           ` NeilBrown
2011-05-04 23:57             ` Roberto Spadim
2011-05-05  0:14             ` Liam Kurmos
2011-05-05  0:20               ` Liam Kurmos
2011-05-05  0:25                 ` Roberto Spadim
2011-05-05  0:40                   ` Liam Kurmos
2011-05-05  7:26                     ` David Brown
2011-05-05 10:41                       ` Keld Jørn Simonsen
2011-05-05 11:38                         ` David Brown
2011-05-06  4:14                           ` CoolCold
2011-05-06  7:29                             ` David Brown
2011-05-06 21:05                       ` Leslie Rhorer
2011-05-07 10:37                         ` David Brown
2011-05-07 10:58                           ` Keld Jørn Simonsen
2011-05-05  0:24               ` Roberto Spadim
2011-05-05 11:10             ` Keld Jørn Simonsen
2011-05-06 21:20               ` Leslie Rhorer
2011-05-06 21:53                 ` Keld Jørn Simonsen
2011-05-07  3:17                   ` Leslie Rhorer
2011-05-05  4:06           ` Roman Mamedov
2011-05-05  8:06             ` Nikolay Kichukov
2011-05-05  8:39               ` Liam Kurmos
2011-05-05  8:49                 ` Liam Kurmos
2011-05-05  9:30               ` NeilBrown
2011-05-04  7:48       ` David Brown
