* Awful RAID5 random read performance
From: Maurice Hilarius @ 2009-05-30 21:46 UTC
  To: linux-raid

A friend writes:

On a recent machine set up with RAID5: an AMD Phenom II X4 810 with 4GB
of RAM and four Seagate 7200.12 SATA 1TB drives.

I'm getting some rather impressive numbers for sequential read
(300MB/s+) and write (170MB/s+), but random read is proving to be
absolutely atrocious: iostat says it's going at about 0.5MB/s.

I've seen plenty of references to people getting numbers in the double
digits for random reads on an md RAID5 array, and one with slower disks
to boot.

I tried disabling automatic acoustic management on the drives, but it
didn't seem to help at all. I really don't care that much about the noise
coming from a file server stuck in a closet  ;)

Does anyone know why I'm seeing such bad random read times?
There's got to be some configuration error on my part, but I just can't
seem to find what it might be.

I've spent a few hours on Google looking for various tweaks, but almost
nobody even mentions random read/write times; everyone seems to care only
about block sequential access. I'd be willing to sacrifice a fair portion
of my rather excellent sequential numbers for much better random access.

-- 
Regards, Maurice

* Re: Awful RAID5 random read performance
From: Michael Tokarev @ 2009-05-31  6:25 UTC
  To: Maurice Hilarius; +Cc: linux-raid

Maurice Hilarius wrote:
> A friend writes:
> 
> On a recent machine set up with Raid5.
> On a AMD Phenom II X4 810, and 4GB ram.
> 4 Seagate 7200.12  SATA 1TB drives,
> 
> I'm getting some rather impressive numbers for sequential read
> (300MB/s+) and write (170MB/s+) but the random read is proving to be
> absolutely atrocious.
> iostat says its going at about 0.5MB/s,

The key thing about random I/O is the block size.  With, say, 512-byte
blocks and a single thread you will see less than 0.5MB/sec.  With a
64-kbyte block size it will be much better.

To diagnose, first try the same test on a bare disk without the RAID
layer.  Next, vary the block size and the number of concurrent threads
doing I/O.  No tweaks are really needed.
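
Something along these lines will do for a quick check (a rough Python 3
sketch, not a polished tool: the /dev/md0 name is an assumption, it needs
root for raw devices, and without O_DIRECT the page cache can inflate the
numbers, so drop caches between runs):

#!/usr/bin/env python3
# Rough random-read micro-benchmark: random, aligned reads from DEVICE with a
# configurable block size and thread count, reporting reads/s and MB/s.
# Assumptions: Python 3, Linux, run as root for raw devices; results include
# page-cache hits unless caches are dropped (echo 3 > /proc/sys/vm/drop_caches).
import os, random, threading, time

DEVICE = "/dev/md0"          # device or large file under test (adjust)
BLOCK_SIZES = [512, 4096, 65536]
THREAD_COUNTS = [1, 4]
DURATION = 10                # seconds per combination

def worker(path, bs, size, counter, stop):
    fd = os.open(path, os.O_RDONLY)
    blocks = size // bs
    try:
        while not stop.is_set():
            # os.pread releases the GIL during the syscall, so threads do
            # overlap their I/O even though this is plain Python threading.
            os.pread(fd, bs, random.randrange(blocks) * bs)
            counter[0] += 1
    finally:
        os.close(fd)

def run(path, bs, nthreads):
    fd = os.open(path, os.O_RDONLY)
    size = os.lseek(fd, 0, os.SEEK_END)     # works for block devices too
    os.close(fd)
    stop = threading.Event()
    counters = [[0] for _ in range(nthreads)]
    threads = [threading.Thread(target=worker, args=(path, bs, size, c, stop))
               for c in counters]
    for t in threads:
        t.start()
    time.sleep(DURATION)
    stop.set()
    for t in threads:
        t.join()
    reads = sum(c[0] for c in counters)
    print("bs=%6d threads=%d  %8.1f reads/s  %7.2f MB/s"
          % (bs, nthreads, reads / DURATION, reads * bs / DURATION / 1e6))

if __name__ == "__main__":
    for bs in BLOCK_SIZES:
        for n in THREAD_COUNTS:
            run(DEVICE, bs, n)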

/mjt

* Re: Awful RAID5 random read performance
From: Thomas Fjellstrom @ 2009-05-31  7:47 UTC
  To: Michael Tokarev; +Cc: Maurice Hilarius, linux-raid

On Sun May 31 2009, Michael Tokarev wrote:
> Maurice Hilarius wrote:
> > [...]
>
> The key thing about random i/o is the block size.  With, say, 512bytes
> blocks and single thread you will see less than 0.5Mb/sec.  With 64kbytes
> blocksize it will be much better.
>
> To diagnose: first try the same test on bare disk without raid layer.
> Next try to vary block size and number of concurrent threads doing I/O.
> There's no tweaks needed really.

I happen to be the friend Maurice was talking about. I let the raid layer
keep its default chunk size of 64K. The smaller-size tests in iozone
(below about 2MB) are very, very slow. I recently tried disabling
readahead and Acoustic Management, and played with the I/O scheduler; all
any of it has done is make sequential access slower, while barely touching
the smaller-sized random access results. Even with the 64K iozone test,
random read/write is only in the 7 and 11MB/s range.

It just seems too low to me.

> /mjt


-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca

* Re: Awful RAID5 random read performance
From: John Robinson @ 2009-05-31 12:29 UTC
  To: tfjellstrom; +Cc: linux-raid

On 31/05/2009 08:47, Thomas Fjellstrom wrote:
> On Sun May 31 2009, Michael Tokarev wrote:
>> [...]
> 
> I happen to be the friend Maurice was talking about. I let the raid layer keep 
> its default chunk size of 64K. The smaller size (below like 2MB) tests in 
> iozone are very very slow. I recently tried disabling readahead, Acoustic 
> Management, and played with the io scheduler and all any of it has done is 
> make the sequential access slower and has barely touched the smaller sized 
> random access test results. Even with the 64K iozone test random read/write is 
> only in the 7 and 11MB/s range.
> 
> It just seems too low to me.

I don't think so; can you try a similar test on single drives not using 
md RAID-5?

The killer is seeks, which is what random I/O uses lots of; with a 10ms 
seek time you're only going to get ~100 seeks/second and if you're only 
reading 512 bytes after each seek you're only going to get ~500 
kbytes/second. Bigger block sizes will show higher throughput, but 
you'll still only get ~100 seeks/second.

Clearly when you're doing this over 4 drives you can have ~400 
seeks/second but that's still limiting you to ~400 reads/second for 
smallish block sizes.
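
Putting the same arithmetic into a small sketch (the 10ms figure is the
assumption above):

# Expected per-drive random-read throughput from seek time and block size.
seek_ms = 10.0                    # assumed average seek + rotational latency
for bs in (4096, 65536):
    reads_per_sec = 1000.0 / seek_ms          # ~100 seeks/s at 10ms each
    print("%6d-byte reads: ~%.0f reads/s, ~%.2f MB/s"
          % (bs, reads_per_sec, reads_per_sec * bs / 1e6))
# ~0.4 MB/s for 4KiB reads and ~6.5 MB/s for 64KiB reads per drive, which is
# in the same ballpark as the 0.5 MB/s and 7 MB/s figures reported above.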

Cheers,

John.


* RE: Awful RAID5 random read performance
From: Leslie Rhorer @ 2009-05-31 15:41 UTC
  To: linux-raid

> > I happen to be the friend Maurice was talking about. I let the raid
> > layer keep its default chunk size of 64K. The smaller size (below like
> > 2MB) tests in iozone are very very slow. I recently tried disabling
> > readahead, Acoustic Management, and played with the io scheduler and
> > all any of it has done is make the sequential access slower and has
> > barely touched the smaller sized random access test results. Even with
> > the 64K iozone test random read/write is only in the 7 and 11MB/s range.
> >
> > It just seems too low to me.
> 
> I don't think so; can you try a similar test on single drives not using
> md RAID-5?
> 
> The killer is seeks, which is what random I/O uses lots of; with a 10ms
> seek time you're only going to get ~100 seeks/second and if you're only
> reading 512 bytes after each seek you're only going to get ~500
> kbytes/second. Bigger block sizes will show higher throughput, but
> you'll still only get ~100 seeks/second.
> 
> Clearly when you're doing this over 4 drives you can have ~400
> seeks/second but that's still limiting you to ~400 reads/second for
> smallish block sizes.

	John is perfectly correct, although of course a 10ms seek is a
fairly slow one.  The point is that it is drive dependent, and there may not
be much one can do about it at the software layer.  That said, you might try
a different scheduler, as the seek order can make a difference.  Drives with
larger caches may help some, although the increase in performance with
larger cache sizes diminishes rapidly beyond a certain point.  As one would
infer from John's post, increasing the number of drives in the array will
help a lot, since it raises the limit on the number of seeks per second.
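
	On the scheduler point, the current scheduler and readahead can be
inspected and switched through sysfs; a rough sketch (standard /sys/block
paths, root needed to write, device names are placeholders):

# List and change the I/O scheduler and readahead for each disk via sysfs.
import glob, os

def show(dev):
    base = "/sys/block/%s/queue" % dev
    with open(os.path.join(base, "scheduler")) as f:
        sched = f.read().strip()      # current one is shown in [brackets]
    with open(os.path.join(base, "read_ahead_kb")) as f:
        readahead = f.read().strip()
    print("%s: scheduler=%s read_ahead_kb=%s" % (dev, sched, readahead))

def set_scheduler(dev, name):
    # e.g. set_scheduler("sda", "deadline"); needs root
    with open("/sys/block/%s/queue/scheduler" % dev, "w") as f:
        f.write(name)

for path in sorted(glob.glob("/sys/block/sd?")):
    show(os.path.basename(path))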

	What file system are you using?  It can make a difference, and
surely has a bigger impact than most tweaks to the RAID subsystem.

	The biggest question in my mind, however, is why is random access a
big issue for you?  Are you running a very large relational database with
tens of thousands of tiny files?  For most systems, high volume accesses
consist mostly of large sequential I/O.  The majority of random I/O is of
rather short duration, meaning even with comparatively poor performance, it
doesn't take long to get the job done.  Fifty to eighty Megabits per second
is nothing at which to sneeze for random access of small files.  A few years
ago, many drives would have been barely able to manage that on a sustained
basis for sequential I/O.



* Re: Awful RAID5 random read performance
From: Thomas Fjellstrom @ 2009-05-31 16:56 UTC
  To: lrhorer; +Cc: linux-raid

On Sun May 31 2009, Leslie Rhorer wrote:
> > [...]
>
> 	John is perfectly correct, although of course a 10ms seek is a
> fairly slow one.  The point is, it is drive dependent, and there may not be
> much one can do about it at the software layer.  That said, you might try a
> different scheduler, as the seek order can make a difference.  Drives with
> larger caches may help some, although the increase in performance with
> larger cache sizes diminishes rapidly beyond a certain point.  As one would
> infer from John's post, increasing the number of drives in the array will
> help a lot, since increasing the number of drives raises the limit on the
> number of seeks / second.
>
> 	What file system are you using?  It can make a difference, and
> surely has a bigger impact than most tweaks to the RAID subsystem.
>
> 	The biggest question in my mind, however, is why is random access a
> big issue for you?  Are you running a very large relational database with
> tens of thousands of tiny files?  For most systems, high volume accesses
> consist mostly of large sequential I/O.  The majority of random I/O is of
> rather short duration, meaning even with comparatively poor performance, it
> doesn't take long to get the job done.  Fifty to eighty Megabits per second
> is nothing at which to sneeze for random access of small files.  A few
> years ago, many drives would have been barely able to manage that on a
> sustained basis for sequential I/O.

I thought the numbers were way too low, but I guess I was wrong. I really
only have three use cases for my arrays. One will be hosting VM
images/volumes and ISO disk images, while another will be hosting large
media to be streamed off, p2p downloads, and rsync/rsnapshot backups of
several machines. I imagine the VM array will appreciate faster random I/O
(boot times will improve, as will things like database and HTTP disk
access), and the p2p surely will as well.

I currently have them all on one disk array, but I'm thinking it's a good
idea to separate the media from the VMs. When ktorrent is downloading a
Linux ISO or something similar, atop shows very high disk utilization for
ktorrent; the same goes for booting VMs. And the backups, oh my lord do
those take a while, even though I tell it to skip a lot of stuff I don't
need to back up.

When I get around to it I may utilize the raid10 module for the VMs and
backups, though that may decrease performance a little bit in the small
random I/O case.



-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca

* Re: Awful RAID5 random read performance
From: Goswin von Brederlow @ 2009-05-31 17:19 UTC
  To: John Robinson; +Cc: tfjellstrom, linux-raid

John Robinson <john.robinson@anonymous.org.uk> writes:

> Clearly when you're doing this over 4 drives you can have ~400
> seeks/second but that's still limiting you to ~400 reads/second for
> smallish block sizes.
>
> Cheers,
>
> John.

Note that this only holds true for writes or multithreaded reads.
A single-threaded reader will randomly pick one drive (depending on where
it wants to read), wait for it to seek, read one block of data, and
repeat, so you get the speed of a single drive no matter how many drives
there are in the raid.

Single-thread seek times only improve with raid1 (or raid10), because
there Linux can choose the drive with the shorter seek time, and even that
only saves a tiny amount.
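
Putting rough numbers on that (simple model, assumed 10ms per random
access):

# Simple model: only as many seeks overlap as there are outstanding requests.
seek_ms = 10.0
drives = 4
for outstanding in (1, 2, 4, 8):
    busy = min(outstanding, drives)        # drives seeking in parallel
    print("%d outstanding read(s): ~%.0f reads/s"
          % (outstanding, busy * 1000.0 / seek_ms))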


MfG
        Goswin

* Re: Awful RAID5 random read performance
From: Keld Jørn Simonsen @ 2009-05-31 18:26 UTC
  To: Thomas Fjellstrom; +Cc: lrhorer, linux-raid

On Sun, May 31, 2009 at 10:56:29AM -0600, Thomas Fjellstrom wrote:
> [...]
>
> When I get around to it I may utilize the raid10 module for the VM's and 
> backups. Though that may decrease performance a little bit in the small random 
> io case. 

raid10,f2 may actually speed up random I/O, as seeks are in essence
confined to the faster outer sectors of the disk, thereby roughly halving
the access times.

best regards
keld

* RE: Awful RAID5 random read performance
From: Carlos Carvalho @ 2009-06-01  1:19 UTC
  To: linux-raid

Leslie Rhorer (lrhorer@satx.rr.com) wrote on 31 May 2009 10:41:
 >> [...]
 >
 >	John is perfectly correct, although of course a 10ms seek is a
 >fairly slow one.

Unfortunately it doesn't seem to be. Take a well-regarded drive such as
the WD RE3; its spec for average latency is 4.2ms. However, does that
figure include the rotational latency (the time the sector takes to reach
the head once it's on the track)? I bet it doesn't. Taking it to be only
the average seek time, this drive is still among the fastest. For a
7200rpm drive the rotational latency alone averages 4.2ms, so even this
fast drive would have an average total latency of 8.4ms.
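
The rotational part at least is simple arithmetic (not a vendor spec):

# Average rotational latency is half a revolution.
rpm = 7200
rotational_ms = 0.5 * 60.0 / rpm * 1000.0   # ~4.17 ms at 7200rpm
seek_ms = 4.2                               # the quoted figure, taken as seek only
print("rotational ~%.1f ms, seek ~%.1f ms, total ~%.1f ms"
      % (rotational_ms, seek_ms, rotational_ms + seek_ms))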

 >	The biggest question in my mind, however, is why is random access a
 >big issue for you?  Are you running a very large relational database with
 >tens of thousands of tiny files?  For most systems, high volume accesses
 >consist mostly of large sequential I/O.

No, random I/O is the most common case for busy servers, where there are
lots of processes doing uncorrelated reads and writes. Even if a single
application does sequential access, the head will likely have moved
between its requests. The only solution is to have lots of RAM for cache
and/or lots of disks, ideally connected to several controllers...

* RE: Awful RAID5 random read performance
From: Leslie Rhorer @ 2009-06-01  4:57 UTC
  To: linux-raid

>  >	John is perfectly correct, although of course a 10ms seek is a
>  >fairly slow one.
> 
> Unfortunately it doesn't seem to be. Take a well-considered drive such
> as the WD RE3; it's spec for average latency is 4.2ms. However does it
> include the rotational latency (the time the head takes to reach the
> sector once it's on the track)? I bet it doesn't. Taking it to be only
> the average seek time, this drive is still among the fastest. For a
> 7200rpm drive this latency is just 4.2ms, so we'd have for this fast
> drive an average total latency of 8.4ms.

That's an average.  For a random seek to exceed that, it's going to have to
span many cylinders.  Given the capacity of a modern cylinder, that's a
pretty big jump.  Single applications will tend to have their data lumped
somewhat together on the drive.

>  >	The biggest question in my mind, however, is why is random access a
>  >big issue for you?  Are you running a very large relational database
> with
>  >tens of thousands of tiny files?  For most systems, high volume accesses
>  >consist mostly of large sequential I/O.
> 
> No, random I/O is the most common case for busy servers, when there
> are lots of processes doing uncorrelated reads and writes. Even if a

Yes, exactly.  By definition, such a scenario represents a multithreaded set
of seeks, and as we already established, multithreaded seeks are vastly more
efficient than serial random seeks.  The 400 seeks per second number for 4
drives applies.  I don't know the details of the Linux schedulers, but most
schedulers employ some variation of an elevator algorithm to maximize seek
efficiency.  That brings the average latency way down and the seek frequency
way up.

> single application does sequential access the head will likely have
> moved between them. The only solution is to have lots of ram for
> cache, and/or lots of disks. It'd be better if they were connected to
> several controllers...

A large RAM cache will help, but as I already pointed out, the returns from
increasing cache size diminish rapidly past a certain point.  Most quality
drives these days have a 32MB cache, or 128MB for a 4 drive array.  Add the
Linux cache on top of that, and it should be sufficient for most purposes.
Remember, random seeks imply small data extents.  Lots of disks will bring
the biggest benefit, and disks are cheap.  Multiple controllers really are
not necessary, especially if the controller and drives support NCQ, but
having multiple controllers certainly doesn't hurt.
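
Whether NCQ is actually active can be checked per disk through sysfs; a
small sketch (the sd? pattern is an assumption):

# Print the queue depth for each SATA/SCSI disk; 1 usually means NCQ is off,
# something around 31 is typical when it is enabled.
import glob
for path in sorted(glob.glob("/sys/block/sd?/device/queue_depth")):
    with open(path) as f:
        print(path, f.read().strip())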


* Re: Awful RAID5 random read performance
From: Thomas Fjellstrom @ 2009-06-01  5:39 UTC
  To: linux-raid

On Sun May 31 2009, Leslie Rhorer wrote:

> >
> > Unfortunately it doesn't seem to be. Take a well-considered drive such
> > as the WD RE3; it's spec for average latency is 4.2ms. However does it
> > include the rotational latency (the time the head takes to reach the
> > sector once it's on the track)? I bet it doesn't. Taking it to be only
> > the average seek time, this drive is still among the fastest. For a
> > 7200rpm drive this latency is just 4.2ms, so we'd have for this fast
> > drive an average total latency of 8.4ms.
>
> That's an average.  For a random seek to exceed that, it's going to have to
> span many cylinders.  Give the container size of a modern cylinder, that's
> a pretty big jump.  Single applications will tend to have their data lumped
> somewhat together on the drive.
>

> >
> > No, random I/O is the most common case for busy servers, when there
> > are lots of processes doing uncorrelated reads and writes. Even if a
>
> Yes, exactly.  By definition, such a scenario represents a multithreaded
> set of seeks, and as we already established, multithreaded seeks are vastly
> more efficient than serial random seeks.  The 400 seeks per second number
> for 4 drives applies.  I don't know the details of the Linux schedulers,
> but most schedulers employ some variation of an elevator seek to maximize
> seek efficiency.  The brings the average latency way down and brings the
> seek frequency way up.

Ah, I never really understood how adding more random load could increase 
performance. Now I get it :)

> > single application does sequential access the head will likely have
> > moved between them. The only solution is to have lots of ram for
> > cache, and/or lots of disks. It'd be better if they were connected to
> > several controllers...
>
> A large RAM cache will help, but as I already pointed out, the increases in
> returns for increasing cache size diminish rapidly past a certain point.
> Most quality drives these days have a 32MB cache, or 128M for a 4 drive
> array.  Add the Linux cache on top of that, and it should be sufficient for
> most purposes.  Remember, random seeks implies small data extents.  Lots of
> disks will bring the biggest benefit, and disks are cheap.  Multiple
> controllers really are not necessary, especially if the controller and
> drives support NCQ , but having multiple controllers certainly doesn't
> hurt.

Yet I've heard NCQ makes some things worse. Some raid tweaking pages tell
you to try disabling NCQ.

I've actually been thinking about trying md-cache with an SSD on top of my
new raid to see how that works long term. But I can't really think of a
good benchmark that imitates my particular use cases well enough to show
me whether it'd help me at all :)

I doubt my puny little 30G OCZ Vertex would really help all that much
anyhow.



-- 
Thomas Fjellstrom
tfjellstrom@shaw.ca

* Re: Awful RAID5 random read performance
From: Goswin von Brederlow @ 2009-06-01 11:41 UTC
  To: lrhorer; +Cc: linux-raid

"Leslie Rhorer" <lrhorer@satx.rr.com> writes:

>>  >	John is perfectly correct, although of course a 10ms seek is a
>>  >fairly slow one.
>> 
>> Unfortunately it doesn't seem to be. Take a well-considered drive such
>> as the WD RE3; it's spec for average latency is 4.2ms. However does it
>> include the rotational latency (the time the head takes to reach the
>> sector once it's on the track)? I bet it doesn't. Taking it to be only
>> the average seek time, this drive is still among the fastest. For a
>> 7200rpm drive this latency is just 4.2ms, so we'd have for this fast
>> drive an average total latency of 8.4ms.
>
> That's an average.  For a random seek to exceed that, it's going to have to
> span many cylinders.  Give the container size of a modern cylinder, that's a
> pretty big jump.  Single applications will tend to have their data lumped
> somewhat together on the drive.

Only at the start, which is usually when people benchmark. But after a
while filesystems fragment. Files get distributed all over the disk, and
the files themselves get spread out as they grow. And suddenly an FS that
was fine a month ago is too slow.

The worst you can do to an FS is run mldonkey/rtorrent on it with lots
of downloads. I've managed to get an ext2 to the point where copying a
file from the FS to another disk only managed <100kiB/s.

In conclusion: seek times cannot be ignored, and seeks should be avoided.

MfG
        Goswin

* Re: Awful RAID5 random read performance
From: John Robinson @ 2009-06-01 12:01 UTC
  To: Linux RAID

On 31/05/2009 18:19, Goswin von Brederlow wrote:
> John Robinson <john.robinson@anonymous.org.uk> writes:
>> Clearly when you're doing this over 4 drives you can have ~400
>> seeks/second but that's still limiting you to ~400 reads/second for
>> smallish block sizes.
> 
> Note that that only holds true for writes or multithreaded reads.
> Reading from a single thread will randomly pick one drive (depending
> on where it wants to read), wait for it to seek, read one block of
> data and repeat. So you get the speed of a single drive no matter how
> many drives there are in the raid.

Sure, that's why I said "can", but I thought iozone was multi-threaded. 
Maybe it needs an option specified, in which case using more threads 
than there are discs would be a good idea.

Cheers,

John.

* Re: Awful RAID5 random read performance
From: Maurice Hilarius @ 2009-06-01 12:43 UTC
  To: tfjellstrom; +Cc: linux-raid

Thomas Fjellstrom wrote:
> ..
> Yet I've heard NCQ makes some things worse. Some raid tweaking pages tell you 
> to try disabling NCQ.
>   

Not so relevant here.
The recommendation to disable drive NCQ is generally due to the fact that
NCQ implementations from various manufacturers tend to differ in
behaviour, and in some cases are just plain "broken".


-- 
Regards, Maurice

* Re: Awful RAID5 random read performance
From: Wil Reichert @ 2009-06-02 14:57 UTC
  To: Maurice Hilarius; +Cc: tfjellstrom, linux-raid

On Mon, Jun 1, 2009 at 5:43 AM, Maurice Hilarius <maurice@harddata.com> wrote:
> Thomas Fjellstrom wrote:
>>
>> ..
>> Yet I've heard NCQ makes some things worse. Some raid tweaking pages tell
>> you to try disabling NCQ.
>>
>
> Not so relevant here.
> This recommendation to  disable drive NCQ is generally due tot he fact that
> NCQ implementations by various manufacturers tend
> to differ in behaviour, and, in some cases , is just plain "broken".

Not trying to start a flamewar on whose hardware is better, but I'd love
to get clarification on that last statement.  I've always been under the
impression Intel gets their NCQ right (works for me).  All the reviews
I've read on the current set of AMD southbridges indicate they don't, but
I've no personal experience there.  No idea about nvidia or the cheap
add-in SATA cards like Silicon Image.

Wil

* Re: Awful RAID5 random read performance
From: Maurice Hilarius @ 2009-06-02 15:14 UTC
  To: Wil Reichert; +Cc: tfjellstrom, linux-raid

Wil Reichert wrote:
> [...]
>
> Not trying to start a flamewar on whos hardware is better, but I'd
> love to get clarification on that last statement.  I've always been
> under the impression Intel gets their NCQ right (works for me).  All
> the reviews I've read on the current set of AMD southbridges indicate
> they don't but I've no personal experience there.  No idea about
> nvidia or the cheap add-in sata cards like Silicon Image.
>
> Wil
>   
From speaking with hardware RAID card manufacturers' support engineers
over the past few years, there have been several instances where NCQ
implementations were incomplete, or faulty, or both.

This seems to have improved a lot over the last year or two, so I think it
is likely now "OK".

However, given that earlier history, I think there is a tendency for
people now to assume it is "broken" and to turn it off.


-- 
Regards, Maurice

* Re: Awful RAID5 random read performance
From: Bill Davidsen @ 2009-06-02 18:54 UTC
  To: tfjellstrom; +Cc: lrhorer, linux-raid

Thomas Fjellstrom wrote:
> [...]
>
> When I get around to it I may utilize the raid10 module for the VM's and 
> backups. Though that may decrease performance a little bit in the small random 
> io case. 
>   
The accesses on the VM will be similar to those of a real disk, so you
want the VM on whatever you would use for bare iron. I run on raid10, and
many of my machines are VMs (including this one, my main desktop). Raid10
is a good general-use array and I use it for a lot, except in cases where
I need cheap space, don't need blinding speed, and use raid[56] to get
more bytes/$. Archival storage, for instance.

-- 
Bill Davidsen <davidsen@tmr.com>
  Even purely technical things can appear to be magic, if the documentation is
obscure enough. For example, PulseAudio is configured by dancing naked around a
fire at midnight, shaking a rattle with one hand and a LISP manual with the
other, while reciting the GNU manifesto in hexadecimal. The documentation fails
to note that you must circle the fire counter-clockwise in the southern
hemisphere.



* Re: Awful RAID5 random read performance
From: Keld Jørn Simonsen @ 2009-06-02 19:47 UTC
  To: Bill Davidsen; +Cc: tfjellstrom, lrhorer, linux-raid

On Tue, Jun 02, 2009 at 02:54:07PM -0400, Bill Davidsen wrote:
> Thomas Fjellstrom wrote:
>>
>> When I get around to it I may utilize the raid10 module for the VM's 
>> and backups. Though that may decrease performance a little bit in the 
>> small random io case.   

> The accesses on the VM will be similar to a real disk, so you want the  
> VM on whatever you would use for bare iron. I run on raid10, many of my  
> machines are on VM (including this one, my main desktop). Raid10 is a  
> good general use array, I use it for a lot, other than cases where I  
> need cheap space and use raid[56] to get more bytes/$ and don't need  
> blinding speed. Archival storage, for instance.

My perception is that raid10,f2 is probably the fastest also for small
random reads, because of the lower latency and faster transfer times due
to only using the outer disk sectors. For writes the elevator evens out
the random access. Benchmarks may not show this effect, as they are often
done on clean file systems where the files are allocated at the beginning
of the fs.

For cases where you need cheap disk space and have big files like .isos,
raid5 could be a good choice because it has the most space while
maintaining fair to good performance for big files.

In your case, using 3 disks, raid5 should give about 210% of the nominal
single-disk speed for big file reads, and maybe 180% for big file writes.
raid10,f2 should give about 290% for big file reads and 140% for big file
writes. Random reads should be about the same for raid5 and raid10,f2
(raid10,f2 maybe 15% faster), while random writes should be mediocre for
raid5 and good for raid10,f2.

best regards
keld

* Re: Awful RAID5 random read performance
From: Bill Davidsen @ 2009-06-02 19:47 UTC
  To: tfjellstrom; +Cc: linux-raid

Thomas Fjellstrom wrote:
> [...]
>
> I've actually been thinking about trying md-cache with an SSD on top of my new 
> raid and see how that works long term. But I can't really think of a good 
> benchmark that actually imitates my particular use cases well enough to show 
> me if it'd help me at all ::)
>
> I doubt my punny little 30G OCZ Vertex would really help all that much any 
> how.
>   

For ext[34] you might want to put the journal on the SSD; if you are doing
any significant writing, that will help.
Mounting with data=journal may also help writes: supposedly the write
completes when the data hits the journal, without waiting for the platter.

-- 
Bill Davidsen <davidsen@tmr.com>
  Even purely technical things can appear to be magic, if the documentation is
obscure enough. For example, PulseAudio is configured by dancing naked around a
fire at midnight, shaking a rattle with one hand and a LISP manual with the
other, while reciting the GNU manifesto in hexadecimal. The documentation fails
to note that you must circle the fire counter-clockwise in the southern
hemisphere.



* Re: Awful RAID5 random read performance
From: John Robinson @ 2009-06-02 23:13 UTC
  To: Keld Jørn Simonsen; +Cc: Linux RAID

On 02/06/2009 20:47, Keld Jørn Simonsen wrote:
[...]
> My perception is that raid10,f2 is probably the fastest also for small random
> reads because of the lower latency, and faster transfer times due to only
> using the outer disk sectors. For writes the elevator evens out the
> ramdom access. Benchmarks may not show this effect as they are often
> done on clean file systems, where the files are allocated in the
> beginning of the fs.
> 
> For cases where you need cheap disk space, and have big files like
> .iso's then raid5 could be a good choice because it has the most space
> while maintaining fair to good performance for big files. 
> 
> In your case, using 3 disks, raid5 should give about 210 % of the nominal
> single disk speed for big file reads, and maybe 180 % for big file
> writes. raid10,f2 should give about 290 % for big file reads and 140%
> for big file writes. Random reads should be about the same for raid5 and
> raid10,f2 - raid10,f2 maybe 15 % faster, while random writes should be
> mediocre for raid5, and good for raid10,f2.

I'd be interested in reading about where you got these figures from 
and/or the rationale behind them; I'd have guessed differently...

Cheers,

John.

* RE: Awful RAID5 random read performance
From: Leslie Rhorer @ 2009-06-03  1:57 UTC
  To: 'Goswin von Brederlow'; +Cc: linux-raid

> > That's an average.  For a random seek to exceed that, it's going to
> > have to span many cylinders.  Given the capacity of a modern cylinder,
> > that's a pretty big jump.  Single applications will tend to have their
> > data lumped somewhat together on the drive.
> 
> Only at the start, which is usualy when people benchmark. But after a
> while filesystem fragment. Files get distributed all over the disk,
> files themself get spread out as they grow. And suddenly an FS that
> was fine  month ago is too slow.

There can be a lot of application-dependent variation, of course, but even
with a fragmented disk, many applications still tend to wind up with their
files clustered together on the disk.  If the application writes each file
once and never updates it, creating many more files as time goes by, then
indeed the database will grow ever more scattered.  Random access files, of
course, may wind up scattered all over the drive, even if there is only one
file used by the app.  If the application tends to update the majority of
its files on a regular basis, however, then the file updates tend to fall
in little pools across the disk, rather than being scattered in a perfectly
random fashion.  One's mileage will definitely vary.


* Re: Awful RAID5 random read performance
From: Bill Davidsen @ 2009-06-03 18:38 UTC
  To: John Robinson; +Cc: Keld Jørn Simonsen, Linux RAID

John Robinson wrote:
> On 02/06/2009 20:47, Keld Jørn Simonsen wrote:
> [...]
>> My perception is that raid10,f2 is probably the fastest also for 
>> small random
>> reads because of the lower latency, and faster transfer times due to 
>> only
>> using the outer disk sectors. For writes the elevator evens out the
>> ramdom access. Benchmarks may not show this effect as they are often
>> done on clean file systems, where the files are allocated in the
>> beginning of the fs.
>>
>> For cases where you need cheap disk space, and have big files like
>> .iso's then raid5 could be a good choice because it has the most space
>> while maintaining fair to good performance for big files.
>> In your case, using 3 disks, raid5 should give about 210 % of the 
>> nominal
>> single disk speed for big file reads, and maybe 180 % for big file
>> writes. raid10,f2 should give about 290 % for big file reads and 140%
>> for big file writes. Random reads should be about the same for raid5 and
>> raid10,f2 - raid10,f2 maybe 15 % faster, while random writes should be
>> mediocre for raid5, and good for raid10,f2.
>
> I'd be interested in reading about where you got these figures from 
> and/or the rationale behind them; I'd have guessed differently...

For small values of N, 10,f2 generally comes quite close to N*Sr, where
N is the number of disks and Sr is the single-drive read speed. This
assumes fairly large reads and adequate stripe buffer space. Obviously for
larger values of N that saturates something else in the system, like the
bus, before N gets too large. I don't generally see more than (N/2-1)*Sw
for write, at least for large writes. I came up with those numbers based
on testing 3-4-5 drive arrays doing large file transfers. If you want to
read more than large-file speed into them, feel free.

All tests were done on raw devices and raw arrays, and on ext3 devices and
arrays. The ratios stay about the same; tuning the stripe size (stride)
can be helpful for improving write speed.
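
For what it's worth, the stride and stripe-width values for mke2fs -E can
be worked out from the chunk size roughly like this (a sketch; the 64K
chunk, 4K block and 4-disk figures are assumptions):

# ext3 stride = chunk size / fs block size; stripe-width = stride * data disks.
chunk_kb = 64                 # md chunk size
block_kb = 4                  # ext3 block size
disks = 4
stride = chunk_kb // block_kb
print("stride =", stride)
print("raid5  stripe-width =", stride * (disks - 1))   # one disk worth of parity
print("raid10 stripe-width =", stride * (disks // 2))  # two copies of the data
# e.g. mke2fs -E stride=16,stripe-width=48 /dev/md0 for the 4-disk raid5 case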

Short summary - the numbers look close enough to mine that I would say 
they are at least useful approximations.

-- 
Bill Davidsen <davidsen@tmr.com>
  Even purely technical things can appear to be magic, if the documentation is
obscure enough. For example, PulseAudio is configured by dancing naked around a
fire at midnight, shaking a rattle with one hand and a LISP manual with the
other, while reciting the GNU manifesto in hexadecimal. The documentation fails
to note that you must circle the fire counter-clockwise in the southern
hemisphere.




* Re: Awful RAID5 random read performance
From: John Robinson @ 2009-06-03 19:57 UTC
  To: Linux RAID

On 03/06/2009 19:38, Bill Davidsen wrote:
> John Robinson wrote:
>> On 02/06/2009 20:47, Keld Jørn Simonsen wrote:
[...]
>>> In your case, using 3 disks, raid5 should give about 210 % of the 
>>> nominal
>>> single disk speed for big file reads, and maybe 180 % for big file
>>> writes. raid10,f2 should give about 290 % for big file reads and 140%
>>> for big file writes. Random reads should be about the same for raid5 and
>>> raid10,f2 - raid10,f2 maybe 15 % faster, while random writes should be
>>> mediocre for raid5, and good for raid10,f2.
>>
>> I'd be interested in reading about where you got these figures from 
>> and/or the rationale behind them; I'd have guessed differently...
> 
> For small values of N, 10,f2 generally comes quite close to N*Sr, where 
> N is # of disks and Sr is single drive read speed. This is assuming 
> fiarly large reads and adequate stripe buffer space. Obviously for 
> larger values of N that saturates something else in the system, like the 
> bus, before N gets too large. I don't generally see more than (N/2-1)*Sw 
> for write, at least for large writes. I came up with those numbers based 
> on testing 3-4-5 drive arrays which do large file transfers. If you want 
> to read more than large file speed into them, feel free.

Actually it was the RAID-5 figures I'd have guessed differently. I'd 
expect ~290% (rather than 210%) for big 3-disc RAID-5 reads, and ~140% 
(rather than "mediocre") for random small writes. But of course I 
haven't tested.

Cheers,

John.

* Re: Awful RAID5 random read performance
From: Goswin von Brederlow @ 2009-06-03 22:21 UTC
  To: John Robinson; +Cc: Linux RAID

John Robinson <john.robinson@anonymous.org.uk> writes:

> On 03/06/2009 19:38, Bill Davidsen wrote:
>> John Robinson wrote:
>>> On 02/06/2009 20:47, Keld Jørn Simonsen wrote:
> [...]
>>>> In your case, using 3 disks, raid5 should give about 210 % of the
>>>> nominal
>>>> single disk speed for big file reads, and maybe 180 % for big file
>>>> writes. raid10,f2 should give about 290 % for big file reads and 140%
>>>> for big file writes. Random reads should be about the same for raid5 and
>>>> raid10,f2 - raid10,f2 maybe 15 % faster, while random writes should be
>>>> mediocre for raid5, and good for raid10,f2.
>>>
>>> I'd be interested in reading about where you got these figures from
>>> and/or the rationale behind them; I'd have guessed differently...
>>
>> For small values of N, 10,f2 generally comes quite close to N*Sr,
>> where N is # of disks and Sr is single drive read speed. This is
>> assuming fairly large reads and adequate stripe buffer
>> space. Obviously for larger values of N that saturates something
>> else in the system, like the bus, before N gets too large. I don't
>> generally see more than (N/2-1)*Sw for write, at least for large
>> writes. I came up with those numbers based on testing 3-4-5 drive
>> arrays which do large file transfers. If you want to read more than
>> large file speed into them, feel free.

With far copies reading is like reading raid0 and writing is like
raid0 but writing twice with a seek between each. So (N/2) and (N/2-a
bit) are the theoretical maximums and raid10 comes damn close to those.

> Actually it was the RAID-5 figures I'd have guessed differently. I'd
> expect ~290% (rather than 210%) for big 3-disc RAID-5 reads, and ~140%
> (rather than "mediocre") for random small writes. But of course I
> haven't tested.

That kind of depends on the chunk size I think.

Say you have a raid 5 with chunk size << size of 1 track. Then on each
disk you read 2 chunks, skip a chunk, read 2 chunks, skip a chunk. But
skipping a chunk means waiting for the disk to rotate over it. That
takes as long as reading it. You shouldn't even get 210% speed.

Only if chunk size >> size of 1 track could you seek over a
chunk. And you have to hope that, by the time you have seeked, the start
of the next chunk hasn't rotated past the head yet.
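
As a toy model of those two regimes (Python, with a made-up 100 MB/s
single-disk speed purely for illustration):

def raid5_seq_read(n_disks, sr=100.0, skip_parity_free=False):
    # Each disk holds (n-1)/n data chunks and 1/n parity chunks interleaved.
    if skip_parity_free:
        # chunk >> track: the head can seek over the parity chunk almost for free
        return n_disks * sr
    # chunk << track: rotating over a parity chunk costs as much as reading it
    return (n_disks - 1) * sr

print(raid5_seq_read(3))                         # ~200 MB/s, i.e. 200% of one disk
print(raid5_seq_read(3, skip_parity_free=True))  # ~300 MB/s, the upper bound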

Anyone know what the size of a track is on modern disks? How many
sectors/track do they have?

MfG
        Goswin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Awful RAID5 random read performance
  2009-06-03 22:21                     ` Goswin von Brederlow
@ 2009-06-04 11:23                       ` Keld Jørn Simonsen
  2009-06-04 22:40                       ` Nifty Fedora Mitch
  2009-06-06 23:06                       ` Bill Davidsen
  2 siblings, 0 replies; 27+ messages in thread
From: Keld Jørn Simonsen @ 2009-06-04 11:23 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: John Robinson, Linux RAID

On Thu, Jun 04, 2009 at 12:21:02AM +0200, Goswin von Brederlow wrote:
> John Robinson <john.robinson@anonymous.org.uk> writes:
> 
> > On 03/06/2009 19:38, Bill Davidsen wrote:
> >> John Robinson wrote:
> >>> On 02/06/2009 20:47, Keld Jørn Simonsen wrote:
> > [...]
> >>>> In your case, using 3 disks, raid5 should give about 210 % of the
> >>>> nominal
> >>>> single disk speed for big file reads, and maybe 180 % for big file
> >>>> writes. raid10,f2 should give about 290 % for big file reads and 140%
> >>>> for big file writes. Random reads should be about the same for raid5 and
> >>>> raid10,f2 - raid10,f2 maybe 15 % faster, while random writes should be
> >>>> mediocre for raid5, and good for raid10,f2.
> >>>
> >>> I'd be interested in reading about where you got these figures from
> >>> and/or the rationale behind them; I'd have guessed differently...

See more on our wiki for actual benchmarks,
http://linux-raid.osdl.org/index.php/Performance
http://blog.jamponi.net/2008/07/raid56-and-10-benchmarks-on-26255_10.html
The latter reports on arrays with 4 disks, so scale it down and you get
a good idea of the expected values for 3 disks.

> >> For small values of N, 10,f2 generally comes quite close to N*Sr,
> >> where N is # of disks and Sr is single drive read speed. This is
> >> assuming fairly large reads and adequate stripe buffer
> >> space. Obviously for larger values of N that saturates something
> >> else in the system, like the bus, before N gets too large. I don't
> >> generally see more than (N/2-1)*Sw for write, at least for large
> >> writes. I came up with those numbers based on testing 3-4-5 drive
> >> arrays which do large file transfers. If you want to read more than
> >> large file speed into them, feel free.
> 
> With far copies reading is like reading raid0 and writing is like
> raid0 but writing twice with a seek between each. So (N/2) and (N/2-a
> bit) are the theoretical maximums and raid10 comes damn close to those.

My take on theoretical maxima is:
raid10,f2 for sequential reads:   N * Sr
raid10,f2 for sequential writes:  N/2 * Sw

> 
> > Actually it was the RAID-5 figures I'd have guessed differently. I'd
> > expect ~290% (rather than 210%) for big 3-disc RAID-5 reads, and ~140%
> > (rather than "mediocre") for random small writes. But of course I
> > haven't tested.
> 
> That kind of depends on the chunk size I think.
> 
> Say you have a raid 5 with chunk size << size of 1 track. Then on each
> disk you read 2 chunks, skip a chunk, read 2 chunks, skip a chunk. But
> skipping a chunk means waiting for the disk to rotate over it. That
> takes as long as reading it. You shouldn't even get 210% speed.
> 
> Only if chunk size >> size of 1 track could you seek over a
> chunk. And you have to hope that by the time you have seeked the start
> of the next chunk hasn't rotated past the head yet.
> 
> Anyone know what the size of a track is on modern disks? How many
> sectors/track do they have?

I believe Goswin's analysis here is valid: skipping sectors is as
expensive as reading them. 

Anyway, with somewhat bigger chunk sizes you may reach the point where
you no longer pay full price for the skipped parity chunks, and thus go
beyond the N-1 mark. As I was trying to report the best obtainable
values, I chose to include this effect. Some figures actually show a
loss of only 0.50 for sequential reads on raid5 with a chunk size of 2 MB.

For sequential writes I was assuming that you write 2 data chunks and 1
parity chunk per stripe, and that the theoretical effective write speed
would get close to 2 (for a 3-disk raid5). Jon's benchmark does not
support this. His best figure for raid5 is a loss of 2.25 in write speed,
where I would expect something like a little more than 1. Maybe the fact
that the test is on raw partitions, and not on a file system with an
active elevator, is in play here. Or maybe it is because the parity
calculation takes quite some work and, without an elevator, the system
has to wait for the parity calculation to complete before the parity
writes can be done.


For random writes on raid5 I reported "mediocre". This is because, if
you write randomly on raid5, you first need to read the data chunk and
the parity chunk, update them, and then write both back. And you need to
read full chunks. So at most you get something like N/4 if your payload
size is close to the chunk size. If you have a big chunk size and a
smallish payload, a lot of the reads/writes are spent on uninteresting
data. This probably also goes for other raid types, and the fs elevator
may help a little here, especially for writing. 

In general I think raid5 random writes would be on the order of N/4,
where mirrored raid types would be N/2 (with 2 copies), making raid5
half the speed of mirrored raid types like raid1 and raid10. I am not
sure I have data to back that statement up.
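
A back-of-the-envelope sketch of that comparison (the 80 random IOs per
second per disk is just an assumed figure for a 7200 rpm drive, and the
N/4 and N/2 factors are the estimates above):

def random_write_estimate(n_disks, layout, disk_iops=80):
    if layout == "raid5":
        # read data chunk + read parity + write data + write parity = 4 disk ops
        return n_disks * disk_iops / 4.0
    if layout == "raid10":
        # each small write goes to both mirrors = 2 disk ops
        return n_disks * disk_iops / 2.0
    raise ValueError(layout)

for layout in ("raid5", "raid10"):
    print("%s: ~%.0f small random writes/s" % (layout, random_write_estimate(3, layout)))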

best regards
keld

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Awful RAID5 random read performance
  2009-06-03 22:21                     ` Goswin von Brederlow
  2009-06-04 11:23                       ` Keld Jørn Simonsen
@ 2009-06-04 22:40                       ` Nifty Fedora Mitch
  2009-06-06 23:06                       ` Bill Davidsen
  2 siblings, 0 replies; 27+ messages in thread
From: Nifty Fedora Mitch @ 2009-06-04 22:40 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: John Robinson, Linux RAID

On Thu, Jun 04, 2009 at 12:21:02AM +0200, Goswin von Brederlow wrote:
[...]
> 
> Anyone know what the size of a track is on modern disks? How many
> sectors/track do they have?

The number will differ from the inside to the outside of the
disk.  The number of zones will differ from drive to drive too....

Some diagnostic software will either know this or 
have vendor specific ways to get it live from the disk.

Data sheets now report an average... At one time vendors 
made a big deal of this...
http://www.impediment.com/seagate/s2000/spec_318436lcv.shtml
    Sectors/Track (avg)  	426
    Bytes/Track (avg) 	218,112

If you take numbers like:
    Track Density (TPI) 	18,145 tracks/inch
    Recording Density (BPI, max) 	328,272 bits/inch 
and dust off some geometry, you can work out how many more bits fit on
the outside tracks vs the inside tracks, and from those bits estimate
the number of additional blocks. An estimate may be all you get, because
spares and other uses for bits are hidden.
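
For example, dusting off that geometry in a few lines (the 20 mm and 46 mm
data-zone radii are assumptions, and this ignores zoning, servo, ECC and
spares, so the absolute numbers will not match a spec sheet):

import math

bpi = 328272                                   # bits/inch (max), from the data sheet above
inner_r, outer_r = 20.0 / 25.4, 46.0 / 25.4    # assumed data-zone radii, in inches

def sectors_per_track(radius_in):
    # circumference times linear density, in 512-byte sectors, overheads ignored
    return 2 * math.pi * radius_in * bpi / (512 * 8)

print("inner track: ~%.0f sectors" % sectors_per_track(inner_r))
print("outer track: ~%.0f sectors" % sectors_per_track(outer_r))
print("outer/inner ratio: %.2f" % (outer_r / inner_r))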




-- 
	T o m  M i t c h e l l 
	Found me a new hat, now what?


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Awful RAID5 random read performance
  2009-06-03 22:21                     ` Goswin von Brederlow
  2009-06-04 11:23                       ` Keld Jørn Simonsen
  2009-06-04 22:40                       ` Nifty Fedora Mitch
@ 2009-06-06 23:06                       ` Bill Davidsen
  2 siblings, 0 replies; 27+ messages in thread
From: Bill Davidsen @ 2009-06-06 23:06 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: John Robinson, Linux RAID

Goswin von Brederlow wrote:
> John Robinson <john.robinson@anonymous.org.uk> writes:
>
>   
>> On 03/06/2009 19:38, Bill Davidsen wrote:
>>     
>>> John Robinson wrote:
>>>       
>>>> On 02/06/2009 20:47, Keld Jørn Simonsen wrote:
>>>>         
>> [...]
>>     
>>>>> In your case, using 3 disks, raid5 should give about 210 % of the
>>>>> nominal
>>>>> single disk speed for big file reads, and maybe 180 % for big file
>>>>> writes. raid10,f2 should give about 290 % for big file reads and 140%
>>>>> for big file writes. Random reads should be about the same for raid5 and
>>>>> raid10,f2 - raid10,f2 maybe 15 % faster, while random writes should be
>>>>> mediocre for raid5, and good for raid10,f2.
>>>>>           
>>>> I'd be interested in reading about where you got these figures from
>>>> and/or the rationale behind them; I'd have guessed differently...
>>>>         
>>> For small values of N, 10,f2 generally comes quite close to N*Sr,
>>> where N is # of disks and Sr is single drive read speed. This is
>>> assuming fairly large reads and adequate stripe buffer
>>> space. Obviously for larger values of N that saturates something
>>> else in the system, like the bus, before N gets too large. I don't
>>> generally see more than (N/2-1)*Sw for write, at least for large
>>> writes. I came up with those numbers based on testing 3-4-5 drive
>>> arrays which do large file transfers. If you want to read more than
>>> large file speed into them, feel free.
>>>       
>
> With far copies reading is like reading raid0 and writing is like
> raid0 but writing twice with a seek between each. So (N/2) and (N/2-a
> bit) are the theoretical maximums and raid10 comes damn close to those.
>
>   
>> Actually it was the RAID-5 figures I'd have guessed differently. I'd
>> expect ~290% (rather than 210%) for big 3-disc RAID-5 reads, and ~140%
>> (rather than "mediocre") for random small writes. But of course I
>> haven't tested.
>>     
>
> That kind of depends on the chunk size I think.
>
> Say you have a raid 5 with chunk size << size of 1 track. Then on each
> disk you read 2 chunks, skip a chunk, read 2 chunks, skip a chunk. But
> skipping a chunk means waiting for the disk to rotate over it. That
> takes as long as reading it. You shouldn't even get 210% speed.
>
> Only if chunk size >> size of 1 track could you seek over a
> chunk. And you have to hope that by the time you have seeked the start
> of the next chunk hasn't rotated past the head yet.
>
> Anyone know what the size of a track is on modern disks? How many
> sectors/track do they have?
>   

It varies to keep the bpi roughly constant, so there are more sectors on 
the outer tracks and the transfer rate there is higher. raid10 can keep 
more of its reads on the outer tracks (with the "far" layout) and thus 
delivers a higher transfer rate. Or so the theory goes; in practice 
raid10 *does* give a higher transfer rate, so the above is the theory 
that explains the observed facts.
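
A simplified sketch of the idea behind the far layout (this models the
concept, not md's exact offset arithmetic): with far=2 the first half of
every disk holds one raid0-style copy, and the second half holds the
other copy shifted by one device, so reads can stay on the outer half of
each disk.

def far2_layout(n_devices, n_chunks):
    # chunk -> [(half, device, stripe), ...] for both copies
    layout = {}
    for c in range(n_chunks):
        stripe, dev = divmod(c, n_devices)
        layout[c] = [("outer", dev, stripe),
                     ("inner", (dev + 1) % n_devices, stripe)]
    return layout

for chunk, copies in far2_layout(3, 6).items():
    print(chunk, copies)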

-- 
Bill Davidsen <davidsen@tmr.com>
  Even purely technical things can appear to be magic, if the documentation is
obscure enough. For example, PulseAudio is configured by dancing naked around a
fire at midnight, shaking a rattle with one hand and a LISP manual with the
other, while reciting the GNU manifesto in hexadecimal. The documentation fails
to note that you must circle the fire counter-clockwise in the southern
hemisphere.




^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread

Thread overview: 27+ messages
2009-05-30 21:46 Awful RAID5 random read performance Maurice Hilarius
2009-05-31  6:25 ` Michael Tokarev
2009-05-31  7:47   ` Thomas Fjellstrom
2009-05-31 12:29     ` John Robinson
2009-05-31 15:41       ` Leslie Rhorer
2009-05-31 16:56         ` Thomas Fjellstrom
2009-05-31 18:26           ` Keld Jørn Simonsen
2009-06-02 18:54           ` Bill Davidsen
2009-06-02 19:47             ` Keld Jørn Simonsen
2009-06-02 23:13               ` John Robinson
2009-06-03 18:38                 ` Bill Davidsen
2009-06-03 19:57                   ` John Robinson
2009-06-03 22:21                     ` Goswin von Brederlow
2009-06-04 11:23                       ` Keld Jørn Simonsen
2009-06-04 22:40                       ` Nifty Fedora Mitch
2009-06-06 23:06                       ` Bill Davidsen
2009-06-01  1:19         ` Carlos Carvalho
2009-06-01  4:57           ` Leslie Rhorer
2009-06-01  5:39             ` Thomas Fjellstrom
2009-06-01 12:43               ` Maurice Hilarius
2009-06-02 14:57                 ` Wil Reichert
2009-06-02 15:14                   ` Maurice Hilarius
2009-06-02 19:47               ` Bill Davidsen
2009-06-01 11:41             ` Goswin von Brederlow
2009-06-03  1:57               ` Leslie Rhorer
2009-05-31 17:19       ` Goswin von Brederlow
2009-06-01 12:01         ` John Robinson
