* Raid 1 vs Raid 10 single thread performance
@ 2014-09-10 21:24 Bostjan Skufca
  2014-09-11  0:31 ` NeilBrown
  2014-09-12  8:49 ` David Brown
  0 siblings, 2 replies; 10+ messages in thread
From: Bostjan Skufca @ 2014-09-10 21:24 UTC (permalink / raw)
  To: linux-raid

Hi,

I have a simple question:
- Where is the code that handles the actual RAID 10 creation? In the
kernel or in mdadm?


Explanation:

I was dissatisfied with single-threaded RAID 1 sequential read
performance (it basically boils down to the speed of one disk). I
figured that instead of using level 1 I could create a RAID level 10
array using two equally-sized partitions on each drive (instead of one).

It turns out that if the array is created properly, it is capable of
sequential reads at almost 2x single-device speed (tested on SSDs!),
which is what anyone would expect from an ordinary RAID 1.

What does "properly" actually mean?
I was doing some benchmarks with various raid configurations and
figured out that the order of devices submitted to creation command is
significant. It also makes raid10 created in such mode reliable or
unreliable to a device failure (not partition failure, device failure,
which means that two raid underlying devices fail at once).

Sum:
- if such an array is created properly, it has redundancy in place and
performs as expected (see the example commands below)
- if not, it performs like raid1 and is lost after a single physical
disk failure
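
For illustration (device names are just an example - two disks, sda and
sdb, each split into two equally-sized partitions), with the default
"near" layout this is roughly the difference:

  # "proper" order: each mirror pair spans both physical disks
  mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sda2 /dev/sdb2

  # "wrong" order: each mirror pair lands on a single physical disk
  mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sda1 /dev/sda2 /dev/sdb1 /dev/sdb2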

I am trying to find the code responsible for the creation of RAID 10,
in order to try to make it more intelligent about where to place the
RAID 10 parts when it is given a list of devices to use and some of
those devices are on the same physical disk.

Thanks for hints,
b.



PS: More details about the testing are available here, but be warned,
it is still a bit hectic to read:
http://blog.a2o.si/2014/09/07/linux-software-raid-why-you-should-always-use-raid-10-instead-of-raid-1/


* Re: Raid 1 vs Raid 10 single thread performance
  2014-09-10 21:24 Raid 1 vs Raid 10 single thread performance Bostjan Skufca
@ 2014-09-11  0:31 ` NeilBrown
  2014-09-11  4:48   ` Bostjan Skufca
  2014-09-12  8:49 ` David Brown
  1 sibling, 1 reply; 10+ messages in thread
From: NeilBrown @ 2014-09-11  0:31 UTC (permalink / raw)
  To: Bostjan Skufca; +Cc: linux-raid


On Wed, 10 Sep 2014 23:24:11 +0200 Bostjan Skufca <bostjan@a2o.si> wrote:

> Hi,
> 
> I have a simple question:
> - Where is the code that is used for actual RAID 10 creation? In
> kernel or in mdadm?

Depends on exactly what you mean ... probably in mdadm.

> 
> 
> Explanation:
> 
> I was dissatisfied with single-threaded RAID 1 sequential read
> performance (basically boils down to the speed of one disk). I figured
> that instead of using level 1 I could create RAID level 10 and use two
> equally-sized partitions on each drive (instead of one).
> 
> It turns out that if array is created properly, it is capable of
> sequential reads at almost 2x single device speed, as expected (on
> SSD!) and what would anyone expect from ordinary RAID 1.
> 
> What does "properly" actually mean?
> I was doing some benchmarks with various raid configurations and
> figured out that the order of devices submitted to creation command is
> significant. It also makes raid10 created in such mode reliable or
> unreliable to a device failure (not partition failure, device failure,
> which means that two raid underlying devices fail at once).

I don't think you've really explained what "properly" means.  How exactly do
you get better throughput?

If you want double-speed single-thread throughput on 2 devices, then create a
2-device RAID10 with "--layout=f2".
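
For example, something along these lines (the md device name and
partition names are only placeholders for whatever you actually use):

  mdadm --create /dev/md0 --level=10 --raid-devices=2 --layout=f2 /dev/sda1 /dev/sdb1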



> 
> Sum:
> - if such array is created properly, it has redundancy in place and
> performs as expected
> - if not, it performs as raid1 and fails with one physical disk failure
> 
> I am trying to find the code responsible for creation of RAID 10 in
> order to try and make it more inteligent about where to place RAID 10
> parts if it gets a list of devices to use, and some of those devices
> are on the same physical disks.

mdadm uses the devices in the order that you list them.


> 
> Thanks for hints,
> b.
> 

NeilBrown


> 
> 
> PS: More details about testing is available here, but be warned, it is
> still a bit hectic to read:
> http://blog.a2o.si/2014/09/07/linux-software-raid-why-you-should-always-use-raid-10-instead-of-raid-1/




* Re: Raid 1 vs Raid 10 single thread performance
  2014-09-11  0:31 ` NeilBrown
@ 2014-09-11  4:48   ` Bostjan Skufca
  2014-09-11  4:59     ` NeilBrown
  0 siblings, 1 reply; 10+ messages in thread
From: Bostjan Skufca @ 2014-09-11  4:48 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 11 September 2014 02:31, NeilBrown <neilb@suse.de> wrote:
> On Wed, 10 Sep 2014 23:24:11 +0200 Bostjan Skufca <bostjan@a2o.si> wrote:
>> What does "properly" actually mean?
>> I was doing some benchmarks with various raid configurations and
>> figured out that the order of devices submitted to creation command is
>> significant. It also makes raid10 created in such mode reliable or
>> unreliable to a device failure (not partition failure, device failure,
>> which means that two raid underlying devices fail at once).
>
> I don't think you've really explained what "properly" means.  How exactly do
> you get better throughput?
>
> If you want double-speed single-thread throughput on 2 devices, then create a
> 2-device RAID10 with "--layout=f2".

I went and retested a few things and I see I must have done something
wrong before:
- regardless of whether I use the --layout flag or not, and
- regardless of the device argument order at array creation time,
= I always get double-speed single-thread throughput. Yaay!

Anyway, the thing is that regardless of using --layout=f2 or not,
redundancy STILL depends on the order of the command line arguments
passed to mdadm --create.
If I do:
- "sda1 sdb1 sda2 sdb2" - redundancy is ok
- "sda1 sda2 sdb1 sdb2" - redundancy fails

Is there a flag that ensures redundancy in this particular case?
If not, don't you think a naive user (me, for example) would assume
that the code is smart enough to ensure basic redundancy if there are
at least two devices available?

Because, if someone wants only performance and no redundancy, they
will look no further than raid 0. But raid10 strongly hints at
redundancy being incorporated in it. (I admit this is anecdotal, based
on my own experience and thought flow.)


b.


* Re: Raid 1 vs Raid 10 single thread performance
  2014-09-11  4:48   ` Bostjan Skufca
@ 2014-09-11  4:59     ` NeilBrown
  2014-09-11  5:20       ` Bostjan Skufca
  0 siblings, 1 reply; 10+ messages in thread
From: NeilBrown @ 2014-09-11  4:59 UTC (permalink / raw)
  To: Bostjan Skufca; +Cc: linux-raid


On Thu, 11 Sep 2014 06:48:31 +0200 Bostjan Skufca <bostjan@a2o.si> wrote:

> On 11 September 2014 02:31, NeilBrown <neilb@suse.de> wrote:
> > On Wed, 10 Sep 2014 23:24:11 +0200 Bostjan Skufca <bostjan@a2o.si> wrote:
> >> What does "properly" actually mean?
> >> I was doing some benchmarks with various raid configurations and
> >> figured out that the order of devices submitted to creation command is
> >> significant. It also makes raid10 created in such mode reliable or
> >> unreliable to a device failure (not partition failure, device failure,
> >> which means that two raid underlying devices fail at once).
> >
> > I don't think you've really explained what "properly" means.  How exactly do
> > you get better throughput?
> >
> > If you want double-speed single-thread throughput on 2 devices, then create a
> > 2-device RAID10 with "--layout=f2".
> 
> I went and retested a few things and I see I must have done something
> wrong before:
> - regardless whether I use --layout flag or not, and
> - regardless of device cli arg order at array creation time,
> = I always get double-speed single-thread throughput. Yaay!
> 
> Anyway, the thing is that regardless of -using -layout=f2 or not,
> redundancy STILL depends on the order of command line arguments passed
> to mdadm --create.
> If I do:
> - "sda1 sdb1 sda2 sdb2" - redundandcy is ok
> - "sda1 sda2 sdb1 sdb2" - redundancy fails
> 
> Is there a flag that ensures redundancy in this particular case?
> If not, don't you think the naive user (me, for example) would assume
> that code is smart enough to ensure basic redundancy, if there are at
> least two devices available?

I cannot guess what other people will assume.  I certainly cannot guard
against all possible incorrect assumptions.

If you create an array which doesn't have true redundancy you will get a
message from the kernel saying:

  %s: WARNING: %s appears to be on the same physical disk as %s.
  True protection against single-disk failure might be compromised.

Maybe mdadm could produce a similar message...


> 
> Because, if someone wants only performance and no redundancy, they
> will look no further than raid 0. But raid10 strongly hints at
> redundancy being incorporated in it. (I admit this is anecdotal, based
> on my own experience and thought flow.)

I really don't think there is any value in splitting a device into multiple
partitions and putting more than one partition per device into an array.
Have you tried using just one partition per device, making a RAID10 with
--layout=f2 ??

NeilBrown


> 
> 
> b.




* Re: Raid 1 vs Raid 10 single thread performance
  2014-09-11  4:59     ` NeilBrown
@ 2014-09-11  5:20       ` Bostjan Skufca
  2014-09-11  5:46         ` NeilBrown
  0 siblings, 1 reply; 10+ messages in thread
From: Bostjan Skufca @ 2014-09-11  5:20 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 11 September 2014 06:59, NeilBrown <neilb@suse.de> wrote:
> On Thu, 11 Sep 2014 06:48:31 +0200 Bostjan Skufca <bostjan@a2o.si> wrote:
>
>> On 11 September 2014 02:31, NeilBrown <neilb@suse.de> wrote:
>> > On Wed, 10 Sep 2014 23:24:11 +0200 Bostjan Skufca <bostjan@a2o.si> wrote:
>> >> What does "properly" actually mean?
>> >> I was doing some benchmarks with various raid configurations and
>> >> figured out that the order of devices submitted to creation command is
>> >> significant. It also makes raid10 created in such mode reliable or
>> >> unreliable to a device failure (not partition failure, device failure,
>> >> which means that two raid underlying devices fail at once).
>> >
>> > I don't think you've really explained what "properly" means.  How exactly do
>> > you get better throughput?
>> >
>> > If you want double-speed single-thread throughput on 2 devices, then create a
>> > 2-device RAID10 with "--layout=f2".
>>
>> I went and retested a few things and I see I must have done something
>> wrong before:
>> - regardless whether I use --layout flag or not, and
>> - regardless of device cli arg order at array creation time,
>> = I always get double-speed single-thread throughput. Yaay!
>>
>> Anyway, the thing is that regardless of -using -layout=f2 or not,
>> redundancy STILL depends on the order of command line arguments passed
>> to mdadm --create.
>> If I do:
>> - "sda1 sdb1 sda2 sdb2" - redundandcy is ok
>> - "sda1 sda2 sdb1 sdb2" - redundancy fails
>>
>> Is there a flag that ensures redundancy in this particular case?
>> If not, don't you think the naive user (me, for example) would assume
>> that code is smart enough to ensure basic redundancy, if there are at
>> least two devices available?
>
> I cannot guess what other people will assume.  I certainly cannot guard
> against all possible incorrect assumptions.
>
> If you create an array which doesn't have true redundancy you will get a
> message from the kernel saying:
>
>   %s: WARNING: %s appears to be on the same physical disk as %s.
>   True protection against single-disk failure might be compromised.
>
> Maybe mdadm could produce a similar message...

I've seen it. The kernel produces this message in both cases.


>> Because, if someone wants only performance and no redundancy, they
>> will look no further than raid 0. But raid10 strongly hints at
>> redundancy being incorporated in it. (I admit this is anecdotal, based
>> on my own experience and thought flow.)
>
> I really don't think there is any value is splitting a device into multiple
> partitions and putting more than one partition per device into an array.
> Have you tried using just one partition per device, making a RAID10 with
> --layout=f2 ??

Yep, I tried raid10 on 4 devices with layout=f2 and it works as expected.
No problem there.
And I know it is better if you have 4 devices for raid10; you are
right there. That is the expected use case.

But if you only have 2, you are limited to the options with those two.
Now, if I create raid1 on those two, I get bad single-threaded read
performance. This usually does not happen with hardware RAIDs.

This is the reason I started looking into the possibility of using
multiple partitions per disk, to get something which reads off both
disks even for a single "client". Raid10 seemed like an option, and it
works, albeit a bit hackishly ATM.

This is also the reason I asked for the code locations: to look at them
and maybe send in patches for review that make more intelligent
data-placement guesses in the case mentioned above. Would such a change
be of interest, i.e. would you actually pull it in?

b.


* Re: Raid 1 vs Raid 10 single thread performance
  2014-09-11  5:20       ` Bostjan Skufca
@ 2014-09-11  5:46         ` NeilBrown
  0 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2014-09-11  5:46 UTC (permalink / raw)
  To: Bostjan Skufca; +Cc: linux-raid


On Thu, 11 Sep 2014 07:20:48 +0200 Bostjan Skufca <bostjan@a2o.si> wrote:

> On 11 September 2014 06:59, NeilBrown <neilb@suse.de> wrote:
> > On Thu, 11 Sep 2014 06:48:31 +0200 Bostjan Skufca <bostjan@a2o.si> wrote:
> >
> >> On 11 September 2014 02:31, NeilBrown <neilb@suse.de> wrote:
> >> > On Wed, 10 Sep 2014 23:24:11 +0200 Bostjan Skufca <bostjan@a2o.si> wrote:
> >> >> What does "properly" actually mean?
> >> >> I was doing some benchmarks with various raid configurations and
> >> >> figured out that the order of devices submitted to creation command is
> >> >> significant. It also makes raid10 created in such mode reliable or
> >> >> unreliable to a device failure (not partition failure, device failure,
> >> >> which means that two raid underlying devices fail at once).
> >> >
> >> > I don't think you've really explained what "properly" means.  How exactly do
> >> > you get better throughput?
> >> >
> >> > If you want double-speed single-thread throughput on 2 devices, then create a
> >> > 2-device RAID10 with "--layout=f2".
> >>
> >> I went and retested a few things and I see I must have done something
> >> wrong before:
> >> - regardless whether I use --layout flag or not, and
> >> - regardless of device cli arg order at array creation time,
> >> = I always get double-speed single-thread throughput. Yaay!
> >>
> >> Anyway, the thing is that regardless of -using -layout=f2 or not,
> >> redundancy STILL depends on the order of command line arguments passed
> >> to mdadm --create.
> >> If I do:
> >> - "sda1 sdb1 sda2 sdb2" - redundandcy is ok
> >> - "sda1 sda2 sdb1 sdb2" - redundancy fails
> >>
> >> Is there a flag that ensures redundancy in this particular case?
> >> If not, don't you think the naive user (me, for example) would assume
> >> that code is smart enough to ensure basic redundancy, if there are at
> >> least two devices available?
> >
> > I cannot guess what other people will assume.  I certainly cannot guard
> > against all possible incorrect assumptions.
> >
> > If you create an array which doesn't have true redundancy you will get a
> > message from the kernel saying:
> >
> >   %s: WARNING: %s appears to be on the same physical disk as %s.
> >   True protection against single-disk failure might be compromised.
> >
> > Maybe mdadm could produce a similar message...
> 
> I've seen it. Kernel produces this message in both cases.
> 
> 
> >> Because, if someone wants only performance and no redundancy, they
> >> will look no further than raid 0. But raid10 strongly hints at
> >> redundancy being incorporated in it. (I admit this is anecdotal, based
> >> on my own experience and thought flow.)
> >
> > I really don't think there is any value is splitting a device into multiple
> > partitions and putting more than one partition per device into an array.
> > Have you tried using just one partition per device, making a RAID10 with
> > --layout=f2 ??
> 
> Yep, I tried raid10 on 4 devices with layout=f2, it works as expected.
> No problem there.

But did you try RAID10 with just 2 devices?


> And I know it is better if you have 4 devices for raid10, you are
> right there. That is the expected use case.
> 
> But if you only have 2, you are limited to the options with those two.

You can still use RAID10 on 2 devices - that is not a limit (just like you can
use RAID5 on 2 devices).

NeilBrown


> Now, if I create raid1 on those two, I get bad single-threaded read
> performance. This usually does not happen with hardware RAIDs.
> 
> This is the reason I started looking into posibility of using multiple
> partitions per disk, to get something which reads off both disks even
> for single "client". Raid10 seemed an option, and it works, albeit a
> bit hackish ATM.
> 
> This is also the reason I asked for code locations, to look at it and
> maybe send in patches for review which make a bit more inteligent
> data-placement guesses in the case mentioned above. Would this be an
> option of interest to actually pull it it?
> 
> b.




* Re: Raid 1 vs Raid 10 single thread performance
  2014-09-10 21:24 Raid 1 vs Raid 10 single thread performance Bostjan Skufca
  2014-09-11  0:31 ` NeilBrown
@ 2014-09-12  8:49 ` David Brown
  2014-09-16  7:48   ` Bostjan Skufca
  1 sibling, 1 reply; 10+ messages in thread
From: David Brown @ 2014-09-12  8:49 UTC (permalink / raw)
  To: Bostjan Skufca, linux-raid

On 10/09/14 23:24, Bostjan Skufca wrote:
> Hi,
> 
> I have a simple question:
> - Where is the code that is used for actual RAID 10 creation? In
> kernel or in mdadm?
> 
> 
> Explanation:
> 
> I was dissatisfied with single-threaded RAID 1 sequential read
> performance (basically boils down to the speed of one disk). I figured
> that instead of using level 1 I could create RAID level 10 and use two
> equally-sized partitions on each drive (instead of one).
> 
> It turns out that if array is created properly, it is capable of
> sequential reads at almost 2x single device speed, as expected (on
> SSD!) and what would anyone expect from ordinary RAID 1.
> 
> What does "properly" actually mean?
> I was doing some benchmarks with various raid configurations and
> figured out that the order of devices submitted to creation command is
> significant. It also makes raid10 created in such mode reliable or
> unreliable to a device failure (not partition failure, device failure,
> which means that two raid underlying devices fail at once).
> 
> Sum:
> - if such array is created properly, it has redundancy in place and
> performs as expected
> - if not, it performs as raid1 and fails with one physical disk failure
> 
> I am trying to find the code responsible for creation of RAID 10 in
> order to try and make it more inteligent about where to place RAID 10
> parts if it gets a list of devices to use, and some of those devices
> are on the same physical disks.
> 
> Thanks for hints,
> b.
> 
> 
> 
> PS: More details about testing is available here, but be warned, it is
> still a bit hectic to read:
> http://blog.a2o.si/2014/09/07/linux-software-raid-why-you-should-always-use-raid-10-instead-of-raid-1/


Hi,

First, let me applaud your enthusiasm for trying to inform people about
raid in your blog, and your interest in investigating different ideas in
the hope of making md raid faster and/or easier and/or safer.

Then let me tell you your entire blog post is wasted, because md already
has a solution that is faster, easier and safer than anything you have
come up with so far.

You are absolutely correct about the single-threaded read performance of
raid1 pairs - for a number of reasons, a single-threaded read will be
served from only one disk.  This is not a problem in many cases, because
you often have multiple simultaneous reads on "typical" systems with
raid1.  But in some cases, such as a high-performance desktop, it can
be a limitation.

You are also correct that the solution is basically to split the drives
into two parts, pair up halves from each disk as raid1 mirrors, and
stripe the two mirrors as raid0.

And you are correct that you have to get the sets right, or you may
lose redundancy and/or speed.

Fortunately, Neil and the other md raid developers are way ahead of you.

Neil gave you the pointers in one of his replies, but I suspect you did
not understand that Linux raid10 is not limited to the arrangement of
traditional raid10, and thus did not see his point.

md raid and mdadm already support a very flexible form of raid10.
Unlike traditional raid10, which requires an even number of disks (at
least 4), Linux raid10 can work with /any/ number of disks greater than
1.  There are various layouts that can be used for this - the Wikipedia
entry gives some useful diagrams:

<http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10>

You can also read about it in the mdadm manual page, and various
documents and resources around the web.


In your particular case, what you want is to use "--layout raid10,f2" on
your two disks.  This asks md to split each disk (or the partitions you
use) into two parts, without creating any new partitions.  The first
half of disk 1 is mirrored with the second half of disk 2, and vice
versa, then these mirrors are striped.  This is very similar to the
layout you are trying to achieve, except for four points:

The mirrors are crossed-over, so that a first half is mirrored with a
second half.  This makes no difference on an SSD, but makes a huge
difference on a hard disk.

mdadm and md raid get the ordering right every time - there is no need
to worry about the ordering of the two disks.

You don't have to have extra partitions, automatic detection works, and
the layout has one less layer, meaning less complexity and lower latency
and overheads.

md raid knows more about the layout, and can use it to optimise the speed.
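
To make that concrete, a minimal sketch (device names are placeholders,
and the exact output format may differ between mdadm versions):

  mdadm --create /dev/md0 --level=10 --raid-devices=2 --layout=f2 /dev/sda1 /dev/sdb1
  mdadm --detail /dev/md0 | grep -i layout    # should show the far layout, e.g. "far=2"
  cat /proc/mdstat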


In particular, md will (almost) always read from the outer halves of the
disks.  On a hard disk, this can be twice the speed of the inner tracks.

Obviously you pay a penalty in writing when you have such an arrangement
- writes need to go to both disks, and involve significant head
movement.  There are other raid10 layouts that have lower streamed read
speeds but also lower write latencies (choose the balance you want).


With this in mind, I hope you can try out raid10,f2 layout on your
system and then change your blog to show how easy this all is with md
raid, how practical it is for a fast workstation or desktop, and how
much faster such a setup is than anything that can be achieved with
hardware raid cards or anything other than md raid.

mvh.,

David



* Re: Raid 1 vs Raid 10 single thread performance
  2014-09-12  8:49 ` David Brown
@ 2014-09-16  7:48   ` Bostjan Skufca
  2014-09-16 10:19     ` keld
  0 siblings, 1 reply; 10+ messages in thread
From: Bostjan Skufca @ 2014-09-16  7:48 UTC (permalink / raw)
  To: David Brown; +Cc: linux-raid

David and Neil, thanks for hints!

(I was busy with other things lately, but believe it or not I got the
"why not try raid 10 with only 2 partitions" idea just last night,
tested it a couple of minutes ago with fascination, and now here I am
reading your emails - please do not remind me again of time wasted :)

The write performance is curious though:
- f2: 147 MB/s
- n2: 162 MB/s
I was expecting a greater difference (but I must admit this was not
tested on the whole 3TB disk, just on a 400GB partition of it).
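
For reference, the kind of quick sequential test I ran looks roughly
like this (sizes and device paths are illustrative, and the write test
is of course destructive to anything on the array):

  dd if=/dev/zero of=/dev/md0 bs=1M count=4096 oflag=direct    # sequential write
  dd if=/dev/md0 of=/dev/null bs=1M count=4096 iflag=direct    # sequential read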

b.


On 12 September 2014 10:49, David Brown <david.brown@hesbynett.no> wrote:
> On 10/09/14 23:24, Bostjan Skufca wrote:
>> Hi,
>>
>> I have a simple question:
>> - Where is the code that is used for actual RAID 10 creation? In
>> kernel or in mdadm?
>>
>>
>> Explanation:
>>
>> I was dissatisfied with single-threaded RAID 1 sequential read
>> performance (basically boils down to the speed of one disk). I figured
>> that instead of using level 1 I could create RAID level 10 and use two
>> equally-sized partitions on each drive (instead of one).
>>
>> It turns out that if array is created properly, it is capable of
>> sequential reads at almost 2x single device speed, as expected (on
>> SSD!) and what would anyone expect from ordinary RAID 1.
>>
>> What does "properly" actually mean?
>> I was doing some benchmarks with various raid configurations and
>> figured out that the order of devices submitted to creation command is
>> significant. It also makes raid10 created in such mode reliable or
>> unreliable to a device failure (not partition failure, device failure,
>> which means that two raid underlying devices fail at once).
>>
>> Sum:
>> - if such array is created properly, it has redundancy in place and
>> performs as expected
>> - if not, it performs as raid1 and fails with one physical disk failure
>>
>> I am trying to find the code responsible for creation of RAID 10 in
>> order to try and make it more inteligent about where to place RAID 10
>> parts if it gets a list of devices to use, and some of those devices
>> are on the same physical disks.
>>
>> Thanks for hints,
>> b.
>>
>>
>>
>> PS: More details about testing is available here, but be warned, it is
>> still a bit hectic to read:
>> http://blog.a2o.si/2014/09/07/linux-software-raid-why-you-should-always-use-raid-10-instead-of-raid-1/
>
>
> Hi,
>
> First let me applaud your enthusiasm for trying to inform people about
> raid in your blog, your interest in investigating different ideas in the
> hope of making md raid faster and/or easier and/or safer.
>
> Then let me tell you your entire blog post is wasted, because md already
> has a solution that is faster, easier and safer than anything you have
> come up with so far.
>
> You are absolutely correct about the single-threaded read performance of
> raid1 pairs - for a number of reasons, a single thread read will get
> reads from only one disk.  This is not a problem in many cases, because
> you often have multiple simultaneous reads on "typical" systems with
> raid1.  But for some cases, such as a high performance desktop, it can
> be a limitation.
>
> You are also correct that the solution is basically to split the drives
> into two parts, pair up halves from each disk as raid1 mirrors, and
> stripe the two mirrors as raid0.
>
> And you are correct that you have to get the sets right, or you will may
> lose redundancy and/or speed.
>
> Fortunately, Neil and the other md raid developers are way ahead of you.
>
> Neil gave you the pointers in one of his replies, but I suspect you did
> not understand that Linux raid10 is not limited to the arrangement of
> traditional raid10, and thus did not see his point.
>
> md raid and mdadmin already support a very flexible form of raid10.
> Unlike traditional raid10 that requires a multiple of 4 disks, Linux
> raid10 can work with /any/ number of disks greater than 1.  There are
> various layouts that can be used for this - the Wikipedia entry gives
> some useful diagrams:
>
> <http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10>
>
> You can also read about it in the mdadm manual page, and various
> documents and resources around the web.
>
>
> In your particular case, what you want is to use "--layout raid10,f2" on
> your two disks.  This asks md to split each disk (or the partitions you
> use) into two parts, without creating any new partitions.  The first
> half of disk 1 is mirrored with the second half of disk 2, and vice
> versa, then these mirrors are striped.  This is very similar to the
> layout you are trying to achieve, except for four points:
>
> The mirrors are crossed-over, so that a first half is mirrored with a
> second half.  This makes no difference on an SSD, but makes a huge
> difference on a hard disk.
>
> mdadm and md raid get the ordering right every time - there is no need
> to worry about the ordering of the two disks.
>
> You don't have to have extra partitions, automatic detection works, and
> the layout has one less layer, meaning less complexity and lower latency
> and overheads.
>
> md raid knows more about the layout, and can use it to optimise the speed.
>
>
> In particular, md will (almost) always read from the outer halves of the
> disks.  On a hard disk, this can be twice the speed of the inner layers.
>
> Obviously you pay a penalty in writing when you have such an arrangement
> - writes need to go to both disks, and involve significant head
> movement.  There are other raid10 layouts that have lower streamed read
> speeds but also lower write latencies (choose the balance you want).
>
>
> With this in mind, I hope you can try out raid10,f2 layout on your
> system and then change your blog to show how easy this all is with md
> raid, how practical it is for a fast workstation or desktop, and how
> much faster such a setup is than anything that can be achieved with
> hardware raid cards or anything other than md raid.
>
> mvh.,
>
> David
>


* Re: Raid 1 vs Raid 10 single thread performance
  2014-09-16  7:48   ` Bostjan Skufca
@ 2014-09-16 10:19     ` keld
       [not found]       ` <CAEp_DRDBQQmBHe7uYdOWWnUD084RtTrnbZe3jUrG3b6c6w=ivQ@mail.gmail.com>
  0 siblings, 1 reply; 10+ messages in thread
From: keld @ 2014-09-16 10:19 UTC (permalink / raw)
  To: Bostjan Skufca; +Cc: David Brown, linux-raid

On Tue, Sep 16, 2014 at 09:48:28AM +0200, Bostjan Skufca wrote:
> David and Neil, thanks for hints!
> 
> (I was busy with other things lately, but believe it or not I got the
> "why not try raid 10 with only 2 partitions" idea just last night,
> tested it a couple of minutes ago with fascination, and now here I am
> reading your emails - please do not remind me again of time wasted :)
> 
> The write performance is curious though:
> - f2: 147 MB/s
> - n2: 162 MB/s
> I was expecting greater difference (bu I must admit this was not
> tested on the whole 3TB disk, just 400GB partition on it).


This is as expected, and also as reported in other benchmarks.

Many expect writing to be considerably slower in f2 than in n2,
because the blocks are spread much further apart in f2 than in n2,
but the elevator algorithm for IO scheduling collects the blocks to be
written in the cache and almost equalizes the time used across nearly
all mirrored raid types.

See also https://raid.wiki.kernel.org/index.php/Performance
for more benchmarks.

Best regards
Keld

> b.
> 
> 
> On 12 September 2014 10:49, David Brown <david.brown@hesbynett.no> wrote:
> > On 10/09/14 23:24, Bostjan Skufca wrote:
> >> Hi,
> >>
> >> I have a simple question:
> >> - Where is the code that is used for actual RAID 10 creation? In
> >> kernel or in mdadm?
> >>
> >>
> >> Explanation:
> >>
> >> I was dissatisfied with single-threaded RAID 1 sequential read
> >> performance (basically boils down to the speed of one disk). I figured
> >> that instead of using level 1 I could create RAID level 10 and use two
> >> equally-sized partitions on each drive (instead of one).
> >>
> >> It turns out that if array is created properly, it is capable of
> >> sequential reads at almost 2x single device speed, as expected (on
> >> SSD!) and what would anyone expect from ordinary RAID 1.
> >>
> >> What does "properly" actually mean?
> >> I was doing some benchmarks with various raid configurations and
> >> figured out that the order of devices submitted to creation command is
> >> significant. It also makes raid10 created in such mode reliable or
> >> unreliable to a device failure (not partition failure, device failure,
> >> which means that two raid underlying devices fail at once).
> >>
> >> Sum:
> >> - if such array is created properly, it has redundancy in place and
> >> performs as expected
> >> - if not, it performs as raid1 and fails with one physical disk failure
> >>
> >> I am trying to find the code responsible for creation of RAID 10 in
> >> order to try and make it more inteligent about where to place RAID 10
> >> parts if it gets a list of devices to use, and some of those devices
> >> are on the same physical disks.
> >>
> >> Thanks for hints,
> >> b.
> >>
> >>
> >>
> >> PS: More details about testing is available here, but be warned, it is
> >> still a bit hectic to read:
> >> http://blog.a2o.si/2014/09/07/linux-software-raid-why-you-should-always-use-raid-10-instead-of-raid-1/
> >
> >
> > Hi,
> >
> > First let me applaud your enthusiasm for trying to inform people about
> > raid in your blog, your interest in investigating different ideas in the
> > hope of making md raid faster and/or easier and/or safer.
> >
> > Then let me tell you your entire blog post is wasted, because md already
> > has a solution that is faster, easier and safer than anything you have
> > come up with so far.
> >
> > You are absolutely correct about the single-threaded read performance of
> > raid1 pairs - for a number of reasons, a single thread read will get
> > reads from only one disk.  This is not a problem in many cases, because
> > you often have multiple simultaneous reads on "typical" systems with
> > raid1.  But for some cases, such as a high performance desktop, it can
> > be a limitation.
> >
> > You are also correct that the solution is basically to split the drives
> > into two parts, pair up halves from each disk as raid1 mirrors, and
> > stripe the two mirrors as raid0.
> >
> > And you are correct that you have to get the sets right, or you will may
> > lose redundancy and/or speed.
> >
> > Fortunately, Neil and the other md raid developers are way ahead of you.
> >
> > Neil gave you the pointers in one of his replies, but I suspect you did
> > not understand that Linux raid10 is not limited to the arrangement of
> > traditional raid10, and thus did not see his point.
> >
> > md raid and mdadmin already support a very flexible form of raid10.
> > Unlike traditional raid10 that requires a multiple of 4 disks, Linux
> > raid10 can work with /any/ number of disks greater than 1.  There are
> > various layouts that can be used for this - the Wikipedia entry gives
> > some useful diagrams:
> >
> > <http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10>
> >
> > You can also read about it in the mdadm manual page, and various
> > documents and resources around the web.
> >
> >
> > In your particular case, what you want is to use "--layout raid10,f2" on
> > your two disks.  This asks md to split each disk (or the partitions you
> > use) into two parts, without creating any new partitions.  The first
> > half of disk 1 is mirrored with the second half of disk 2, and vice
> > versa, then these mirrors are striped.  This is very similar to the
> > layout you are trying to achieve, except for four points:
> >
> > The mirrors are crossed-over, so that a first half is mirrored with a
> > second half.  This makes no difference on an SSD, but makes a huge
> > difference on a hard disk.
> >
> > mdadm and md raid get the ordering right every time - there is no need
> > to worry about the ordering of the two disks.
> >
> > You don't have to have extra partitions, automatic detection works, and
> > the layout has one less layer, meaning less complexity and lower latency
> > and overheads.
> >
> > md raid knows more about the layout, and can use it to optimise the speed.
> >
> >
> > In particular, md will (almost) always read from the outer halves of the
> > disks.  On a hard disk, this can be twice the speed of the inner layers.
> >
> > Obviously you pay a penalty in writing when you have such an arrangement
> > - writes need to go to both disks, and involve significant head
> > movement.  There are other raid10 layouts that have lower streamed read
> > speeds but also lower write latencies (choose the balance you want).
> >
> >
> > With this in mind, I hope you can try out raid10,f2 layout on your
> > system and then change your blog to show how easy this all is with md
> > raid, how practical it is for a fast workstation or desktop, and how
> > much faster such a setup is than anything that can be achieved with
> > hardware raid cards or anything other than md raid.
> >
> > mvh.,
> >
> > David
> >


* Re: Raid 1 vs Raid 10 single thread performance
       [not found]       ` <CAEp_DRDBQQmBHe7uYdOWWnUD084RtTrnbZe3jUrG3b6c6w=ivQ@mail.gmail.com>
@ 2014-09-18 13:19         ` keld
  0 siblings, 0 replies; 10+ messages in thread
From: keld @ 2014-09-18 13:19 UTC (permalink / raw)
  To: Bostjan Skufca; +Cc: David Brown, linux-raid

Hi Bostjan

The raid.wiki.kernel.org site is not mine, but it is the official wiki for this
mailing list and kernel group. I am one of the more active people on the wiki:
some of the benchmarks are provided by me, but most are provided by
others. The source of each benchmark is reported in every case.

I wrote many years ago, when the "far" layout was originally implemented,
that f2 should be the default raid10 layout, as I think it has the best
overall performance, but that has not happened (yet!).

There are some shortcomings, though: f2 is the only raid10 layout
that cannot be grown. This could be solved by implementing grow support for it.

Also, a better allocation of the disk partitions, which would give
better redundancy, is not fully implemented. The fully supported
implementation of the "far" layout gives the redundancy of raid 0+1,
while the partly implemented "far" layout (only partly present in the
kernel) gives raid 1+0 redundancy.

Best regards
keld


On Tue, Sep 16, 2014 at 05:19:59PM +0200, Bostjan Skufca wrote:
> I expected "optimized" result, but not by that much. Positively surprised.
> 
> Looking over at results shown on the wiki (yours, I presume), my results
> for n2 could be even higher. Yours are within 2% range (for sequential
> writes), mine 10%.
> 
> Do you think f2 should be made default for 2-device RAID 10 arrays?
> 
> b.
> 
> PS: Judging by the results it would benefit almost everyone (trading a 2-10%
> write penalty for a 100% read throughput increase). But this is just my
> personal opinion. Heck, the best thing would be to replace raid1 with raid10
> altogether, so users would not be surprised by this unexpected
> single-client RAID 1 non-performance.
> 
> PPS: BTW it seems you guys did a great job here, like David stated in his
> last response ("way ahead":).
> 
> PPPS: David: the enthusiasm came from finally being p...ed off enough about
> why linux raid 1 can't behave like a raid 1 should, even for a single client.
> And about a 1(0)Gbps connection not being saturated when it could/should be! :)
> 
> 
> On 16 September 2014 12:19, <keld@keldix.com> wrote:
> 
> > On Tue, Sep 16, 2014 at 09:48:28AM +0200, Bostjan Skufca wrote:
> > > David and Neil, thanks for hints!
> > >
> > > (I was busy with other things lately, but believe it or not I got the
> > > "why not try raid 10 with only 2 partitions" idea just last night,
> > > tested it a couple of minutes ago with fascination, and now here I am
> > > reading your emails - please do not remind me again of time wasted :)
> > >
> > > The write performance is curious though:
> > > - f2: 147 MB/s
> > > - n2: 162 MB/s
> > > I was expecting greater difference (bu I must admit this was not
> > > tested on the whole 3TB disk, just 400GB partition on it).
> >
> >
> > This is as expected, and also as reported in other benchmarks.
> >
> > Many expect that writing is considerably slower in F2 than n2,
> > because the blocks are distributed much more apart in f2 than in n2,
> > but the elevator algorithm for IO sceduling collects writing blocks
> > in the cache and does almost equalize the time used for about all mirrored
> > raid types.
> >
> > See also https://raid.wiki.kernel.org/index.php/Performance
> > for more benchmarks.
> >
> > Best regards
> > Keld
> >
> > > b.
> > >
> > >
> > > On 12 September 2014 10:49, David Brown <david.brown@hesbynett.no>
> > wrote:
> > > > On 10/09/14 23:24, Bostjan Skufca wrote:
> > > >> Hi,
> > > >>
> > > >> I have a simple question:
> > > >> - Where is the code that is used for actual RAID 10 creation? In
> > > >> kernel or in mdadm?
> > > >>
> > > >>
> > > >> Explanation:
> > > >>
> > > >> I was dissatisfied with single-threaded RAID 1 sequential read
> > > >> performance (basically boils down to the speed of one disk). I figured
> > > >> that instead of using level 1 I could create RAID level 10 and use two
> > > >> equally-sized partitions on each drive (instead of one).
> > > >>
> > > >> It turns out that if array is created properly, it is capable of
> > > >> sequential reads at almost 2x single device speed, as expected (on
> > > >> SSD!) and what would anyone expect from ordinary RAID 1.
> > > >>
> > > >> What does "properly" actually mean?
> > > >> I was doing some benchmarks with various raid configurations and
> > > >> figured out that the order of devices submitted to creation command is
> > > >> significant. It also makes raid10 created in such mode reliable or
> > > >> unreliable to a device failure (not partition failure, device failure,
> > > >> which means that two raid underlying devices fail at once).
> > > >>
> > > >> Sum:
> > > >> - if such array is created properly, it has redundancy in place and
> > > >> performs as expected
> > > >> - if not, it performs as raid1 and fails with one physical disk
> > failure
> > > >>
> > > >> I am trying to find the code responsible for creation of RAID 10 in
> > > >> order to try and make it more inteligent about where to place RAID 10
> > > >> parts if it gets a list of devices to use, and some of those devices
> > > >> are on the same physical disks.
> > > >>
> > > >> Thanks for hints,
> > > >> b.
> > > >>
> > > >>
> > > >>
> > > >> PS: More details about testing is available here, but be warned, it is
> > > >> still a bit hectic to read:
> > > >>
> > http://blog.a2o.si/2014/09/07/linux-software-raid-why-you-should-always-use-raid-10-instead-of-raid-1/
> > > >
> > > >
> > > > Hi,
> > > >
> > > > First let me applaud your enthusiasm for trying to inform people about
> > > > raid in your blog, your interest in investigating different ideas in
> > the
> > > > hope of making md raid faster and/or easier and/or safer.
> > > >
> > > > Then let me tell you your entire blog post is wasted, because md
> > already
> > > > has a solution that is faster, easier and safer than anything you have
> > > > come up with so far.
> > > >
> > > > You are absolutely correct about the single-threaded read performance
> > of
> > > > raid1 pairs - for a number of reasons, a single thread read will get
> > > > reads from only one disk.  This is not a problem in many cases, because
> > > > you often have multiple simultaneous reads on "typical" systems with
> > > > raid1.  But for some cases, such as a high performance desktop, it can
> > > > be a limitation.
> > > >
> > > > You are also correct that the solution is basically to split the drives
> > > > into two parts, pair up halves from each disk as raid1 mirrors, and
> > > > stripe the two mirrors as raid0.
> > > >
> > > > And you are correct that you have to get the sets right, or you will
> > may
> > > > lose redundancy and/or speed.
> > > >
> > > > Fortunately, Neil and the other md raid developers are way ahead of
> > you.
> > > >
> > > > Neil gave you the pointers in one of his replies, but I suspect you did
> > > > not understand that Linux raid10 is not limited to the arrangement of
> > > > traditional raid10, and thus did not see his point.
> > > >
> > > > md raid and mdadmin already support a very flexible form of raid10.
> > > > Unlike traditional raid10 that requires a multiple of 4 disks, Linux
> > > > raid10 can work with /any/ number of disks greater than 1.  There are
> > > > various layouts that can be used for this - the Wikipedia entry gives
> > > > some useful diagrams:
> > > >
> > > > <
> > http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10>
> > > >
> > > > You can also read about it in the mdadm manual page, and various
> > > > documents and resources around the web.
> > > >
> > > >
> > > > In your particular case, what you want is to use "--layout raid10,f2"
> > on
> > > > your two disks.  This asks md to split each disk (or the partitions you
> > > > use) into two parts, without creating any new partitions.  The first
> > > > half of disk 1 is mirrored with the second half of disk 2, and vice
> > > > versa, then these mirrors are striped.  This is very similar to the
> > > > layout you are trying to achieve, except for four points:
> > > >
> > > > The mirrors are crossed-over, so that a first half is mirrored with a
> > > > second half.  This makes no difference on an SSD, but makes a huge
> > > > difference on a hard disk.
> > > >
> > > > mdadm and md raid get the ordering right every time - there is no need
> > > > to worry about the ordering of the two disks.
> > > >
> > > > You don't have to have extra partitions, automatic detection works, and
> > > > the layout has one less layer, meaning less complexity and lower
> > latency
> > > > and overheads.
> > > >
> > > > md raid knows more about the layout, and can use it to optimise the
> > speed.
> > > >
> > > >
> > > > In particular, md will (almost) always read from the outer halves of
> > the
> > > > disks.  On a hard disk, this can be twice the speed of the inner
> > layers.
> > > >
> > > > Obviously you pay a penalty in writing when you have such an
> > arrangement
> > > > - writes need to go to both disks, and involve significant head
> > > > movement.  There are other raid10 layouts that have lower streamed read
> > > > speeds but also lower write latencies (choose the balance you want).
> > > >
> > > >
> > > > With this in mind, I hope you can try out raid10,f2 layout on your
> > > > system and then change your blog to show how easy this all is with md
> > > > raid, how practical it is for a fast workstation or desktop, and how
> > > > much faster such a setup is than anything that can be achieved with
> > > > hardware raid cards or anything other than md raid.
> > > >
> > > > mvh.,
> > > >
> > > > David
> > > >
> >

