* Optimize RAID0 for max IOPS?
@ 2011-01-18 21:01 Wolfgang Denk
  2011-01-18 22:18 ` Roberto Spadim
                   ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Wolfgang Denk @ 2011-01-18 21:01 UTC (permalink / raw)
  To: linux-raid

Hi,

I'm going to replace a h/w based RAID system (3ware 9650SE) with a
plain s/w RAID0, because the existing system appears to be seriously
limited in the number of I/O operations per second it can sustain.

Our workload is mixed read / write (something between 80% read / 20%
write and 50% / 50%), consisting of a very large number of usually
very small files.

There may be 20...50 million files, or more. 65% of the files are
smaller than 4 kB; 80% are smaller than 8 kB; 90% are smaller than 16
kB; 98.4% are smaller than 64 kB.

I will have 4 x 1 TB disks for this setup.

The plan is to build a RAID0 from the 4 devices, create a physical
volume and a volume group on the resulting /dev/md?, then create 2 or
3 logical volumes that will be used as XFS file systems.
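
A minimal sketch of that layout might look like this (device names,
the volume group name, LV sizes and the 64 KiB chunk size are just
placeholders, not a recommendation):

mdadm --create /dev/md0 --level=0 --raid-devices=4 --chunk=64 /dev/sd[bcde]
pvcreate /dev/md0
vgcreate vg_raid /dev/md0
lvcreate -L 500G -n lv_data1 vg_raid    # repeat for the 2nd/3rd volume
mkfs.xfs /dev/vg_raid/lv_data1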

My goal is to optimize for maximum number of I/O operations per
second. [I am aware that using SSDs would be a nice thing, but that
would be too expensive.]

Is this a reasonable approach for such a task?

Should I do anything differently to achieve maximum performance?

What are the tunables in this setup?  [It seems the usual recipes are
oriented more toward maximizing data throughput for large, mostly
sequential accesses - I figure that things like increasing read-ahead
etc. will not help me much here?]

Thanks in advance.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Quote from a recent meeting:   "We are going to continue having these
meetings everyday until I find out why no work is getting done."


* Re: Optimize RAID0 for max IOPS?
  2011-01-18 21:01 Optimize RAID0 for max IOPS? Wolfgang Denk
@ 2011-01-18 22:18 ` Roberto Spadim
  2011-01-19  7:04   ` Wolfgang Denk
  2011-01-18 23:15 ` Stefan /*St0fF*/ Hübner
  2011-01-25 17:10 ` Christoph Hellwig
  2 siblings, 1 reply; 46+ messages in thread
From: Roberto Spadim @ 2011-01-18 22:18 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: linux-raid

It's an interesting question; I don't know the best answer either,
but...
I haven't managed to create a partition on a /dev/mdXXX device yet
(Linux 2.6.29); maybe it's not possible.

You could try partitioning all the hard drives into many partitions
and building a RAID on each one.
Another way could be LVM over mdXXX and then partitioning that (can an
LVM volume be partitioned?).

Another optimization is the per-disk elevator (at the Linux level);
you can find it under /sys/ (try find -iname elevator or find -iname
scheduler; I don't remember the exact file name).
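
(For the record, it is the scheduler file under each block device's
queue directory; sda below is just a placeholder.)

cat /sys/block/sda/queue/scheduler      # prints e.g. "noop anticipatory deadline [cfq]"
echo deadline > /sys/block/sda/queue/scheduler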

Linux RAID0 has a nice read/write algorithm for hard disks (I think);
test it.
The best solution is no partitions: since md is built on the whole
disk and not on a partition, the disk head position is known more
accurately, which makes the read_balance algorithm work better.

2011/1/18 Wolfgang Denk <wd@denx.de>:
> Hi,
>
> I'm going to replace a h/w based RAID system (3ware 9650SE) by a plain
> s/w RAID0, because the existing system appears to be seriously limited
> in terms of numbers of I/O operations per second.
>
> Our workload is mixed read / write (something between 80% read / 20%
> write and 50% / 50%), consisting of a very large number of usually
> very small files.
>
> There may be 20...50 millions of files, or more. 65% of the files are
> smaller than 4 kB; 80% are smaller than 8 kB; 90% are smaller than 16
> kB; 98.4% are smaller than 64 kB.
>
> I will have 4 x 1 TB disks for this setup.
>
> The plan is to build a RAID0 from the 4 devices, create a physical
> volume and a volume group on the resulting /dev/md?, then create 2 or
> 3 logical volumes that will be used as XFS file systems.
>
> My goal is to optimize for maximum number of I/O operations per
> second. [I am aware that using SSDs would be a nice thing, but that
> would be too expensive.]
>
> Is this a reasonable approach for such a task?
>
> Should I do anything different to acchive maximum performance?
>
> What are the tunables in this setup?  [It seems the usual recipies are
> more oriented in maximizing the data troughput for large, mostly
> sequential accesses - I figure that things like increasing read-ahead
> etc. will not help me much here?]
>
> Thanks in advance.
>
> Best regards,
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> Quote from a recent meeting:   "We are going to continue having these
> meetings everyday until I find out why no work is getting done."
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial


* Re: Optimize RAID0 for max IOPS?
  2011-01-18 21:01 Optimize RAID0 for max IOPS? Wolfgang Denk
  2011-01-18 22:18 ` Roberto Spadim
@ 2011-01-18 23:15 ` Stefan /*St0fF*/ Hübner
  2011-01-19  0:05   ` Roberto Spadim
                     ` (2 more replies)
  2011-01-25 17:10 ` Christoph Hellwig
  2 siblings, 3 replies; 46+ messages in thread
From: Stefan /*St0fF*/ Hübner @ 2011-01-18 23:15 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: linux-raid

Hi,

[in German:] Schätzelein, Dein Problem sind die Platten, nicht der
Controller.

[in English:] Dude, the disks are your bottleneck.

On a 4-disk RAID0, software RAID can only outspeed this 3ware controller
with a really, really fast processor.  The limiting factor is the disks'
access time.  If SSDs are too expensive, then your current performance is
about the maximum you'll get (replacing the HW RAID controller might give
a little speed-up, but not very much).
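
One way to check how hard that access-time ceiling is would be to
measure random 4k read IOPS on a single member disk, e.g. with fio
(the options below are standard fio parameters; the device name is a
placeholder, and --readonly keeps the test from writing anything):

fio --name=randread --filename=/dev/sda --readonly --direct=1 \
    --rw=randread --bs=4k --iodepth=32 --ioengine=libaio --runtime=60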

All the best,
Stefan

Am 18.01.2011 22:01, schrieb Wolfgang Denk:
> Hi,
> 
> I'm going to replace a h/w based RAID system (3ware 9650SE) by a plain
> s/w RAID0, because the existing system appears to be seriously limited
> in terms of numbers of I/O operations per second.
> 
> Our workload is mixed read / write (something between 80% read / 20%
> write and 50% / 50%), consisting of a very large number of usually
> very small files.
> 
> There may be 20...50 millions of files, or more. 65% of the files are
> smaller than 4 kB; 80% are smaller than 8 kB; 90% are smaller than 16
> kB; 98.4% are smaller than 64 kB.
> 
> I will have 4 x 1 TB disks for this setup.
> 
> The plan is to build a RAID0 from the 4 devices, create a physical
> volume and a volume group on the resulting /dev/md?, then create 2 or
> 3 logical volumes that will be used as XFS file systems.
> 
> My goal is to optimize for maximum number of I/O operations per
> second. [I am aware that using SSDs would be a nice thing, but that
> would be too expensive.]
> 
> Is this a reasonable approach for such a task?
> 
> Should I do anything different to acchive maximum performance?
> 
> What are the tunables in this setup?  [It seems the usual recipies are
> more oriented in maximizing the data troughput for large, mostly
> sequential accesses - I figure that things like increasing read-ahead
> etc. will not help me much here?]
> 
> Thanks in advance.
> 
> Best regards,
> 
> Wolfgang Denk
> 



* Re: Optimize RAID0 for max IOPS?
  2011-01-18 23:15 ` Stefan /*St0fF*/ Hübner
@ 2011-01-19  0:05   ` Roberto Spadim
  2011-01-19  7:11     ` Wolfgang Denk
  2011-01-19  7:10   ` Wolfgang Denk
  2011-01-19 19:21   ` Wolfgang Denk
  2 siblings, 1 reply; 46+ messages in thread
From: Roberto Spadim @ 2011-01-19  0:05 UTC (permalink / raw)
  To: stefan.huebner; +Cc: Wolfgang Denk, linux-raid

Maybe removing HW RAID and using SW RAID will reduce speed (it depends
on how much CPU you use with HW and with SW).
What can we optimize? Fewer I/O operations per second, by making each
read/write to the array carry as much useful data as possible. How?
Good read/write algorithms for the RAID (for each device type: SSD,
HD). But, as Stefan said, the disks are your bottleneck.

2011/1/18 Stefan /*St0fF*/ Hübner <stefan.huebner@stud.tu-ilmenau.de>:
> Hi,
>
> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der
> Controller.
>
> [in English:] Dude, the disks are your bottleneck.
>
> On a 4-disk RAID0 software RAID can only outspeed this 3ware Controller
> with a really really fast processor.  The limiting factor is the disk's
> access time.  If SSDs are too expensive, then your actual performance is
> the max you'll get (maybe to replace the HWRAID controller might give a
> little speed-up, but not very much).
>
> All the best,
> Stefan
>
> Am 18.01.2011 22:01, schrieb Wolfgang Denk:
>> Hi,
>>
>> I'm going to replace a h/w based RAID system (3ware 9650SE) by a plain
>> s/w RAID0, because the existing system appears to be seriously limited
>> in terms of numbers of I/O operations per second.
>>
>> Our workload is mixed read / write (something between 80% read / 20%
>> write and 50% / 50%), consisting of a very large number of usually
>> very small files.
>>
>> There may be 20...50 millions of files, or more. 65% of the files are
>> smaller than 4 kB; 80% are smaller than 8 kB; 90% are smaller than 16
>> kB; 98.4% are smaller than 64 kB.
>>
>> I will have 4 x 1 TB disks for this setup.
>>
>> The plan is to build a RAID0 from the 4 devices, create a physical
>> volume and a volume group on the resulting /dev/md?, then create 2 or
>> 3 logical volumes that will be used as XFS file systems.
>>
>> My goal is to optimize for maximum number of I/O operations per
>> second. [I am aware that using SSDs would be a nice thing, but that
>> would be too expensive.]
>>
>> Is this a reasonable approach for such a task?
>>
>> Should I do anything different to acchive maximum performance?
>>
>> What are the tunables in this setup?  [It seems the usual recipies are
>> more oriented in maximizing the data troughput for large, mostly
>> sequential accesses - I figure that things like increasing read-ahead
>> etc. will not help me much here?]
>>
>> Thanks in advance.
>>
>> Best regards,
>>
>> Wolfgang Denk
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial


* Re: Optimize RAID0 for max IOPS?
  2011-01-18 22:18 ` Roberto Spadim
@ 2011-01-19  7:04   ` Wolfgang Denk
  0 siblings, 0 replies; 46+ messages in thread
From: Wolfgang Denk @ 2011-01-19  7:04 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: linux-raid

Dear Roberto Spadim,

In message <AANLkTimzwgg_Htj4rMxjdjhMQHExeWOqzd5Puu9KbXug@mail.gmail.com> you wrote:
>
> try partitioning all hard drives and make many paritions and make raid
> on each one
> another way could be a lvm over mdxxx and try to partition it (can lvm
> be partitioned?)

I do not intend to use any partitions.  I will use LVM on the full
device /dev/mdX, and then use logical volumes.


Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
What was sliced bread the greatest thing since?


* Re: Optimize RAID0 for max IOPS?
  2011-01-18 23:15 ` Stefan /*St0fF*/ Hübner
  2011-01-19  0:05   ` Roberto Spadim
@ 2011-01-19  7:10   ` Wolfgang Denk
  2011-01-19 19:21   ` Wolfgang Denk
  2 siblings, 0 replies; 46+ messages in thread
From: Wolfgang Denk @ 2011-01-19  7:10 UTC (permalink / raw)
  To: stefan.huebner; +Cc: linux-raid

Dear Stefan /*St0fF*/ Hübner,

In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote:
> 
> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der
> Controller.

Irrtum.

> [in English:] Dude, the disks are your bottleneck.

Wrong.  Testing the same workload with soft RAID versus the h/w RAID
solution gives a _significant_ performance difference.

I happen to know which benchmarks 3ware (and other RAID controller
manufacturers) are optimizing their firmware for - IOPS is not even
mentioned there.

> On a 4-disk RAID0 software RAID can only outspeed this 3ware Controller
> with a really really fast processor.  The limiting factor is the disk's
> access time.  If SSDs are too expensive, then your actual performance is
> the max you'll get (maybe to replace the HWRAID controller might give a
> little speed-up, but not very much).

From some tests done before, I expect to see a speed increase of more
than 10x.

Hey, even a single disk performs better under this workload.

And a fast processor? Yes, I have one, but what for? It spends most of
its time (>90%, usually more) in iowait.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
"To IBM, 'open' means there is a modicum  of  interoperability  among
some of their equipment."                            - Harv Masterson


* Re: Optimize RAID0 for max IOPS?
  2011-01-19  0:05   ` Roberto Spadim
@ 2011-01-19  7:11     ` Wolfgang Denk
  2011-01-19  8:18       ` Stefan /*St0fF*/ Hübner
  0 siblings, 1 reply; 46+ messages in thread
From: Wolfgang Denk @ 2011-01-19  7:11 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: stefan.huebner, linux-raid

Dear Roberto Spadim,

In message <AANLkTi=v6yA_0OOfi2ymA67X0x+KsvV9VC5OgG+0DvKq@mail.gmail.com> you wrote:
> maybe removing hwraid and using swraid may reduce speed (depend how
> much cpu you use with hw and with sw)
> what we can optimize? less I/O per seconds making as much useful
> read/write data on array, how? good read/write algorithms for raid.
> (for each device type, ssd, hd) but... like stefan, disks are your
> bottleneck

No, they are not.  Run some benchmarks yourself if you don't believe
me.

Even a single disk drive is performing better than the hw RAID under
this workload.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
WARNING:  This Product Attracts Every Other Piece  of  Matter in  the
Universe, Including the Products of Other Manufacturers, with a Force
Proportional  to the Product of the Masses and Inversely Proportional
to the Distance Between Them.


* Re: Optimize RAID0 for max IOPS?
  2011-01-19  7:11     ` Wolfgang Denk
@ 2011-01-19  8:18       ` Stefan /*St0fF*/ Hübner
  2011-01-19  8:29         ` Jaap Crezee
  0 siblings, 1 reply; 46+ messages in thread
From: Stefan /*St0fF*/ Hübner @ 2011-01-19  8:18 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: Roberto Spadim, linux-raid

Am 19.01.2011 08:11, schrieb Wolfgang Denk:
> Dear Roberto Spadim,
> 
> In message <AANLkTi=v6yA_0OOfi2ymA67X0x+KsvV9VC5OgG+0DvKq@mail.gmail.com> you wrote:
>> maybe removing hwraid and using swraid may reduce speed (depend how
>> much cpu you use with hw and with sw)
>> what we can optimize? less I/O per seconds making as much useful
>> read/write data on array, how? good read/write algorithms for raid.
>> (for each device type, ssd, hd) but... like stefan, disks are your
>> bottleneck
> 
> No, they are not.  Run some benchmarks yourself if you don't believe
> me.

Lol - I wouldn't have answered in the first place if I didn't have any
expertise.  So suit yourself - as you don't bring up any real numbers
(remember: you've got the weird setup, you asked, you don't have enough
money for the enterprise solution - so ...) nobody who worked with 3ware
controllers will believe you.

> 
> Even a single disk drive is performing better than the hw RAID under
> this workload.

Well - that is the problem - simulate YOUR workload.  Actually, I fear
at least one of your disks has a grown defect, which slows down /
blocks I/O.  I haven't seen any 9650SE RAID being slower than the same
config as a software RAID.
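
A quick way to check a drive for grown defects is its SMART data, e.g.
with smartctl from smartmontools (sda is a placeholder):

smartctl -A /dev/sda | egrep -i 'realloc|pending|uncorrect'   # sector health counters
smartctl -l error /dev/sda                                    # drive's internal error log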

> 
> Best regards,
> 
> Wolfgang Denk
> 



* Re: Optimize RAID0 for max IOPS?
  2011-01-19  8:18       ` Stefan /*St0fF*/ Hübner
@ 2011-01-19  8:29         ` Jaap Crezee
  2011-01-19  9:32           ` Jan Kasprzak
  0 siblings, 1 reply; 46+ messages in thread
From: Jaap Crezee @ 2011-01-19  8:29 UTC (permalink / raw)
  To: stefan.huebner; +Cc: Wolfgang Denk, Roberto Spadim, linux-raid

On 01/19/11 09:18, Stefan /*St0fF*/ Hübner wrote:
> Am 19.01.2011 08:11, schrieb Wolfgang Denk:
> Lol - I wouldn't have answered in the first place if I didn't have any
> expertise.  So suit yourself - as you don't bring up any real numbers
> (remember: you've got the weird setup, you asked, you don't have enough
> money for the enterprise solution - so ...) nobody who worked with 3ware
> controllers will believe you.

Here's one: I switched from 3ware hardware-based RAID to Linux software RAID and I am getting better
throughput. I had a 3ware PCI-X card (I don't know which type by heart).
Okay, to be honest I did not have an (enterprise solution?) battery backup unit. So probably no write caching...

Jaap


* Re: Optimize RAID0 for max IOPS?
  2011-01-19  8:29         ` Jaap Crezee
@ 2011-01-19  9:32           ` Jan Kasprzak
  0 siblings, 0 replies; 46+ messages in thread
From: Jan Kasprzak @ 2011-01-19  9:32 UTC (permalink / raw)
  To: Jaap Crezee; +Cc: stefan.huebner, Wolfgang Denk, Roberto Spadim, linux-raid

Jaap Crezee wrote:
: On 01/19/11 09:18, Stefan /*St0fF*/ Hübner wrote:
: >Am 19.01.2011 08:11, schrieb Wolfgang Denk:
: >Lol - I wouldn't have answered in the first place if I didn't have any
: >expertise.  So suit yourself - as you don't bring up any real numbers
: >(remember: you've got the weird setup, you asked, you don't have enough
: >money for the enterprise solution - so ...) nobody who worked with 3ware
: >controllers will believe you.
: 
: Here's one: I switched from 3ware hardware based raid to linux software 
: raid and I am getting better throughputs. I had a 3ware PCI-X car (don't 
: know which type by hearth).
: Okay, to be honest I did not have a (enterprise solution?) 
: battery-backup-unit. So probably no write caching...
: 
	A "me too": 3ware 9550SX with 8 drives, RAID-5. The performance
(especially latency) was very bad. After I switched to the md SW RAID
and lowered the TCQ depth in the 3ware controller to 16[*], the filesystem
and latency feels much faster.

	The only problem I had was a poor interaction of the CFQ
iosched with the RAID-5 rebuild process, but I have fixed this
by moving to deadline I/O scheduler.

	Another case was the LSI SAS 2008 (I admit it is a pretty low-end
HW RAID controller): 10 WD RE4 Black 2TB disks in HW and SW RAID-10
configurations:

time mkfs.ext4 /dev/md0  # SW RAID
real	8m4.783s
user	0m9.255s
sys	2m30.107s

time mkfs.ext4 -F /dev/sdb # HW RAID
real	22m13.503s
user	0m9.763s
sys	2m51.371s

	The problem with HW RAID is that today's computers can dedicate tens
of gigabytes to buffer cache, which allows the I/O scheduler to reorder
requests based on latency and other criteria. No RAID controller can
match this, because it cannot see which requests are latency-critical
and which are not.

	Also, the Linux I/O scheduler works really hard to keep all spindles
busy, while when you fill the command queue of a HW RAID volume with
requests which map to one or a small number of physical disks, there is
no way the controller can tell "send me more requests, but not from
this area of the HW RAID volume".

[*] 3ware driver is especially bad here, because its default queue
	depth is 1024, IIRC, which makes the whole I/O scheduler
	with queue size 512 a no-op. Think bufferbloat in the storage area.
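
For reference, the per-device queue depth and scheduler can be inspected
and changed via sysfs (sdb is a placeholder; the exact knobs depend on
the driver):

cat /sys/block/sdb/device/queue_depth          # current TCQ/NCQ depth
echo 16 > /sys/block/sdb/device/queue_depth
echo deadline > /sys/block/sdb/queue/scheduler
cat /sys/block/sdb/queue/nr_requests           # block-layer queue size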

-- 
| Jan "Yenya" Kasprzak  <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839      Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/    Journal: http://www.fi.muni.cz/~kas/blog/ |
Please don't top post and in particular don't attach entire digests to your
mail or we'll all soon be using bittorrent to read the list.     --Alan Cox


* Re: Optimize RAID0 for max IOPS?
  2011-01-18 23:15 ` Stefan /*St0fF*/ Hübner
  2011-01-19  0:05   ` Roberto Spadim
  2011-01-19  7:10   ` Wolfgang Denk
@ 2011-01-19 19:21   ` Wolfgang Denk
  2011-01-19 19:50     ` Roberto Spadim
  2011-01-24 14:40     ` CoolCold
  2 siblings, 2 replies; 46+ messages in thread
From: Wolfgang Denk @ 2011-01-19 19:21 UTC (permalink / raw)
  To: stefan.huebner; +Cc: linux-raid

Dear Stefan /*St0fF*/ Hübner,

In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote:
> 
> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der
> Controller.
> 
> [in English:] Dude, the disks are your bottleneck.
...

Maybe we can stop speculating about what might be the cause of the
problems in some setup I do NOT intend to use, and rather discuss the
questions I asked.

> > I will have 4 x 1 TB disks for this setup.
> > 
> > The plan is to build a RAID0 from the 4 devices, create a physical
> > volume and a volume group on the resulting /dev/md?, then create 2 or
> > 3 logical volumes that will be used as XFS file systems.

Clarification: I'll run /dev/md* on the raw disks, without any
partitions on them.

> > My goal is to optimize for maximum number of I/O operations per
> > second. ...
> > 
> > Is this a reasonable approach for such a task?
> > 
> > Should I do anything different to acchive maximum performance?
> > 
> > What are the tunables in this setup?  [It seems the usual recipies are
> > more oriented in maximizing the data troughput for large, mostly
> > sequential accesses - I figure that things like increasing read-ahead
> > etc. will not help me much here?]

So can anybody help answering these questions:

- are there any special options when creating the RAID0 to make it
  perform faster for such a use case?
- are there other tunables, any special MD / LVM / file system /
  read ahead / buffer cache / ... parameters to look for?

Thanks.

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Boykottiert Microsoft - Kauft Eure Fenster bei OBI!


* Re: Optimize RAID0 for max IOPS?
  2011-01-19 19:21   ` Wolfgang Denk
@ 2011-01-19 19:50     ` Roberto Spadim
  2011-01-19 22:36       ` Stefan /*St0fF*/ Hübner
  2011-01-24 14:40     ` CoolCold
  1 sibling, 1 reply; 46+ messages in thread
From: Roberto Spadim @ 2011-01-19 19:50 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: stefan.huebner, linux-raid

So can anybody help answering these questions:

- are there any special options when creating the RAID0 to make it
perform faster for such a use case?
- are there other tunables, any special MD / LVM / file system / read
ahead / buffer cache / ... parameters to look for?

Let's see:
What is your disks' (SSD or SAS or SATA) best block size for
reads/writes? Call it (A).
What is your workload? 50% write / 50% read?

The RAID0 chunk size should be a multiple of (A),
the filesystem block size should be a multiple of (A) for all disks,
and the read-ahead should be a multiple of (A).
For example, with
/dev/sda 1 kB
/dev/sdb 4 kB

you should not use 6 kB; use 4 kB, 8 kB or 16 kB (a multiple of both
1 kB and 4 kB).
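
For example, read-ahead can be checked and set per device (md0 is a
placeholder; blockdev counts in 512-byte sectors):

blockdev --getra /dev/md0
blockdev --setra 256 /dev/md0            # 256 sectors = 128 kB read-ahead
cat /sys/block/md0/queue/read_ahead_kb   # the same setting, in kB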

Check the I/O scheduler per disk too (SSDs should use noop; spinning
disks should use cfq, deadline or another).
Look at the async/sync mount options in /etc/fstab; noatime reduces a
lot of I/O too, and you should optimize your application as well.
Run hdparm on each disk to enable DMA and the fastest I/O options; see
the example below.
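
For example (the fstab line and hdparm flags below are only
illustrations; volume and mount point names are placeholders):

/dev/vg_raid/lv_data1  /data  xfs  noatime,nodiratime  0 0   # example /etc/fstab entry
hdparm -W1 /dev/sda    # enable the drive's write cache
hdparm -tT /dev/sda    # quick cached/buffered read timing test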

Are you using only the filesystem, or something more? Samba? MySQL?
Apache? LVM?
Each of these programs has its own tuning; check their benchmarks.


Getting back to the question: what is a RAID controller?
A CPU + memory + disk controller + disks,
but it only runs RAID software (it could even run Linux...).

If your computer is slower than the RAID controller's CPU + memory +
disk controller, software RAID will be slower than hardware RAID.
It's like load balancing the CPU/memory cost of disk I/O (use
dedicated hardware, or use your own hardware?).
Got it?
A super-fast Xeon with DDR3 and fibre links running software RAID is
faster than a hardware RAID using an ARM (or FPGA), slower DDRx memory
and a SAS (or fibre) connection to the disks.

Two solutions for the same problem.
Which is faster? Benchmark it.
I think that if your Xeon runs a database and a heavily loaded Apache,
a dedicated hardware RAID can be faster, but a lightly loaded Xeon can
be faster than a dedicated hardware RAID.



2011/1/19 Wolfgang Denk <wd@denx.de>:
> Dear =?ISO-8859-15?Q?Stefan_/*St0fF*/_H=FCbner?=,
>
> In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote:
>>
>> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der
>> Controller.
>>
>> [in English:] Dude, the disks are your bottleneck.
> ...
>
> Maybe we can stop speculations about what might be the cause of the
> problems in some setup I do NOT intend to use, and rather discuss the
> questions I asked.
>
>> > I will have 4 x 1 TB disks for this setup.
>> >
>> > The plan is to build a RAID0 from the 4 devices, create a physical
>> > volume and a volume group on the resulting /dev/md?, then create 2 or
>> > 3 logical volumes that will be used as XFS file systems.
>
> Clarrification: I'll run /dev/md* on the raw disks, without any
> partitions on them.
>
>> > My goal is to optimize for maximum number of I/O operations per
>> > second. ...
>> >
>> > Is this a reasonable approach for such a task?
>> >
>> > Should I do anything different to acchive maximum performance?
>> >
>> > What are the tunables in this setup?  [It seems the usual recipies are
>> > more oriented in maximizing the data troughput for large, mostly
>> > sequential accesses - I figure that things like increasing read-ahead
>> > etc. will not help me much here?]
>
> So can anybody help answering these questions:
>
> - are there any special options when creating the RAID0 to make it
>  perform faster for such a use case?
> - are there other tunables, any special MD / LVM / file system /
>  read ahead / buffer cache / ... parameters to look for?
>
> Thanks.
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> Boykottiert Microsoft - Kauft Eure Fenster bei OBI!
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial


* Re: Optimize RAID0 for max IOPS?
  2011-01-19 19:50     ` Roberto Spadim
@ 2011-01-19 22:36       ` Stefan /*St0fF*/ Hübner
  2011-01-19 23:09         ` Roberto Spadim
  0 siblings, 1 reply; 46+ messages in thread
From: Stefan /*St0fF*/ Hübner @ 2011-01-19 22:36 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: Wolfgang Denk, linux-raid

@Roberto: I guess you're right.  BUT: I have not seen 900 MB/s coming
from a software RAID (i.e. read access), but I've seen it from a 9750
on an LSI SASx28 backplane, running RAID6 over 16 disks
(HDS722020ALA330).  So one might not be wrong in assuming that on
current RAID controllers the hardware/software matching and timing is
far better optimized than what mdraid can achieve.

The 9650 and 9690 are considerably slower, but I've seen 550 MB/s
throughput from those as well (I don't recall the setup anymore, though).

The maximum read rate I have seen from a software RAID was around
350 MB/s - hence my answers.  And if people had problems with
controllers which are 5 years old or older by now, the numbers are not
really comparable...

Then again, there are also parameters on the controller that can be
tweaked, and we would need a simple way to recreate the testing
scenario.  We may discuss and throw in further numbers and experience,
but not being able to recreate your specific scenario makes us talk
past each other...

stefan

Am 19.01.2011 20:50, schrieb Roberto Spadim:
> So can anybody help answering these questions:
> 
> - are there any special options when creating the RAID0 to make it
> perform faster for such a use case?
> - are there other tunables, any special MD / LVM / file system / read
> ahead / buffer cache / ... parameters to look for?
> 
> lets see:
> what´s your disk (ssd or sas or sata) best block size to write/read?
> write this at ->(A)
> what´s your work load? 50% write 50% read ?
> 
> raid0 block size should be multiple of (A)
> *****filesystem size should be multiple of (A) of all disks
> *****read ahead should be a multiple of (A)
> for example
> /dev/sda 1kb
> /dev/sdb 4kb
> 
> you should use 6kb... you should use 4kb, 8kb, 16kb (multiple of 1kb and 4kb)
> 
> check i/o sheduller per disk too (ssd should use noop, disks should
> use cfq, deadline or another...)
> async and sync option at mount /etc/fstab, noatime reduce a lot of i/o
> too, you should optimize your application too
> hdparm each disk to use dma and fastest i/o options
> 
> are you using only filesystem? are you using somethink more? samba?
> mysql? apache? lvm?
> each of this programs have some tunning, check their benchmarks
> 
> 
> getting back....
> what´s a raid controller?
> cpu + memory + disk controller + disks
> but... it only run raid software (it can run linux....)
> 
> if you computer is slower than raid cpu+memory+disk controller, you
> will have a slower software raid, than hardware raid
> it´s like load balance on cpu/memory utilization of disk i/o (use
> dedicated hardware, or use your hardware?)
> got it?
> using a super fast xeon with ddr3 and optical fiber running software
> raid, is faster than a hardware raid using a arm (or fpga) ddrX memory
> and sas(fiber optical) connection to disks
> 
> two solutions for the same problem
> what´s fast? benchmark it
> i think that if your xeon run a database and a very workloaded apache,
> a dedicated hardware raid can run faster, but a light xeon can run
> faster than a dedicated hardware raid
> 
> 
> 
> 2011/1/19 Wolfgang Denk <wd@denx.de>:
>> Dear =?ISO-8859-15?Q?Stefan_/*St0fF*/_H=FCbner?=,
>>
>> In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote:
>>>
>>> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der
>>> Controller.
>>>
>>> [in English:] Dude, the disks are your bottleneck.
>> ...
>>
>> Maybe we can stop speculations about what might be the cause of the
>> problems in some setup I do NOT intend to use, and rather discuss the
>> questions I asked.
>>
>>>> I will have 4 x 1 TB disks for this setup.
>>>>
>>>> The plan is to build a RAID0 from the 4 devices, create a physical
>>>> volume and a volume group on the resulting /dev/md?, then create 2 or
>>>> 3 logical volumes that will be used as XFS file systems.
>>
>> Clarrification: I'll run /dev/md* on the raw disks, without any
>> partitions on them.
>>
>>>> My goal is to optimize for maximum number of I/O operations per
>>>> second. ...
>>>>
>>>> Is this a reasonable approach for such a task?
>>>>
>>>> Should I do anything different to acchive maximum performance?
>>>>
>>>> What are the tunables in this setup?  [It seems the usual recipies are
>>>> more oriented in maximizing the data troughput for large, mostly
>>>> sequential accesses - I figure that things like increasing read-ahead
>>>> etc. will not help me much here?]
>>
>> So can anybody help answering these questions:
>>
>> - are there any special options when creating the RAID0 to make it
>>  perform faster for such a use case?
>> - are there other tunables, any special MD / LVM / file system /
>>  read ahead / buffer cache / ... parameters to look for?
>>
>> Thanks.
>>
>> Wolfgang Denk
>>
>> --
>> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
>> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
>> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
>> Boykottiert Microsoft - Kauft Eure Fenster bei OBI!
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> 
> 



* Re: Optimize RAID0 for max IOPS?
  2011-01-19 22:36       ` Stefan /*St0fF*/ Hübner
@ 2011-01-19 23:09         ` Roberto Spadim
  2011-01-19 23:18           ` Roberto Spadim
  0 siblings, 1 reply; 46+ messages in thread
From: Roberto Spadim @ 2011-01-19 23:09 UTC (permalink / raw)
  To: stefan.huebner; +Cc: Wolfgang Denk, linux-raid

The problem: if you use iostat or iotop
with software RAID:
   you only see disk I/O,
   you don't see memory (cache) I/O;
when using hardware RAID:
   you only see the RAID device's I/O (which can be a cache read or a real disk read).


If you check memory + disk I/O together, you will get similar values;
if not, you will see high CPU usage.
For example, say you are running RAIDx with 10 disks on a hardware RAID.
Change the hardware RAID to expose only the disks (10 disks for Linux)
and build the same RAIDx with 10 disks: you will get slower I/O, since
there is still a controller between the disks and the CPU.
Try it without the hardware RAID CPU, with just an optimized SAS/SATA
controller, or ten single-port SATA/SAS controllers; you may still see
slower I/O than with the hardware controller (that's right!).

Now let's remove the SATA/SAS channel and use a PCI-Express RevoDrive
or a PCI-Express Texas Memory Systems SSD.
You will get better values than with a hardware RAID. But why? You
changed the hardware (OK, I know), but you also moved the CPU closer to
the disk. If you use disks with cache, you will get more speed (a disk
with a memory/SSD cache is faster than a disk alone).

Why would hardware be faster than Linux? I don't think it is...
It can achieve smaller latencies with a good memory cache, but if your
computer uses DDR3 and your hardware RAID controller uses slower
memory, your DDR3 cache is faster...

How to benchmark? Check disk I/O plus memory-cache I/O.
If Linux is faster, fine: you use more CPU and memory in your computer.
If Linux is slower, fine: you use less CPU and memory, but you pay for
them in the hardware RAID...
If you upgrade your memory and CPU, it can become faster than your
hardware RAID controller. Which is better for you?
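
A rough way to watch both sides at once (standard tools; how you read
the numbers is up to you):

iostat -x 1    # per-device utilization, wait times, request sizes
vmstat 1       # bi/bo = real block I/O hitting the disks
free -m        # how much RAM is currently used as buffers/cache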

Want a better read/write solution for software RAID? Write new
read/write code; you can do it, Linux is easier to modify than a
hardware RAID!
Want a better read/write solution for hardware RAID? Call your hardware
vendor and say: please, I need better firmware, could you send it to me?

Got it?


2011/1/19 Stefan /*St0fF*/ Hübner <stefan.huebner@stud.tu-ilmenau.de>:
> @Roberto: I guess you're right. BUT: i have not seen 900MB/s coming from
> (i.e. read access) a software raid, but I've seen it from a 9750 on a
> LSI SASx28 backplane, running RAID6 over 16disks (HDS722020ALA330).  So
> one might not be wrong assuming on current raid-controllers
> hardware/software matching and timing is way more optimized than what
> mdraid might get at all.
>
> The 9650 and 9690 are considerably slower, but I've seen 550MB/s thruput
> from those, also (I don't recall the setup anymore, tho).
>
> Max reading I saw from a software raid was around 350MB/s - so hence my
> answers.  And if people had problems with controllers which are 5 years
> or older by now, the numbers are not really comparable...
>
> Now again there's the point where there are also parameters on the
> controller that can be tweaked, and a simple way to recreate the testing
> scenario.  We may discuss and throw in further numbers and experience,
> but not being able to recreate your specific scenario makes us talk past
> each other...
>
> stefan
>
> Am 19.01.2011 20:50, schrieb Roberto Spadim:
>> So can anybody help answering these questions:
>>
>> - are there any special options when creating the RAID0 to make it
>> perform faster for such a use case?
>> - are there other tunables, any special MD / LVM / file system / read
>> ahead / buffer cache / ... parameters to look for?
>>
>> lets see:
>> what´s your disk (ssd or sas or sata) best block size to write/read?
>> write this at ->(A)
>> what´s your work load? 50% write 50% read ?
>>
>> raid0 block size should be multiple of (A)
>> *****filesystem size should be multiple of (A) of all disks
>> *****read ahead should be a multiple of (A)
>> for example
>> /dev/sda 1kb
>> /dev/sdb 4kb
>>
>> you should use 6kb... you should use 4kb, 8kb, 16kb (multiple of 1kb and 4kb)
>>
>> check i/o sheduller per disk too (ssd should use noop, disks should
>> use cfq, deadline or another...)
>> async and sync option at mount /etc/fstab, noatime reduce a lot of i/o
>> too, you should optimize your application too
>> hdparm each disk to use dma and fastest i/o options
>>
>> are you using only filesystem? are you using somethink more? samba?
>> mysql? apache? lvm?
>> each of this programs have some tunning, check their benchmarks
>>
>>
>> getting back....
>> what´s a raid controller?
>> cpu + memory + disk controller + disks
>> but... it only run raid software (it can run linux....)
>>
>> if you computer is slower than raid cpu+memory+disk controller, you
>> will have a slower software raid, than hardware raid
>> it´s like load balance on cpu/memory utilization of disk i/o (use
>> dedicated hardware, or use your hardware?)
>> got it?
>> using a super fast xeon with ddr3 and optical fiber running software
>> raid, is faster than a hardware raid using a arm (or fpga) ddrX memory
>> and sas(fiber optical) connection to disks
>>
>> two solutions for the same problem
>> what´s fast? benchmark it
>> i think that if your xeon run a database and a very workloaded apache,
>> a dedicated hardware raid can run faster, but a light xeon can run
>> faster than a dedicated hardware raid
>>
>>
>>
>> 2011/1/19 Wolfgang Denk <wd@denx.de>:
>>> Dear =?ISO-8859-15?Q?Stefan_/*St0fF*/_H=FCbner?=,
>>>
>>> In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote:
>>>>
>>>> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der
>>>> Controller.
>>>>
>>>> [in English:] Dude, the disks are your bottleneck.
>>> ...
>>>
>>> Maybe we can stop speculations about what might be the cause of the
>>> problems in some setup I do NOT intend to use, and rather discuss the
>>> questions I asked.
>>>
>>>>> I will have 4 x 1 TB disks for this setup.
>>>>>
>>>>> The plan is to build a RAID0 from the 4 devices, create a physical
>>>>> volume and a volume group on the resulting /dev/md?, then create 2 or
>>>>> 3 logical volumes that will be used as XFS file systems.
>>>
>>> Clarrification: I'll run /dev/md* on the raw disks, without any
>>> partitions on them.
>>>
>>>>> My goal is to optimize for maximum number of I/O operations per
>>>>> second. ...
>>>>>
>>>>> Is this a reasonable approach for such a task?
>>>>>
>>>>> Should I do anything different to acchive maximum performance?
>>>>>
>>>>> What are the tunables in this setup?  [It seems the usual recipies are
>>>>> more oriented in maximizing the data troughput for large, mostly
>>>>> sequential accesses - I figure that things like increasing read-ahead
>>>>> etc. will not help me much here?]
>>>
>>> So can anybody help answering these questions:
>>>
>>> - are there any special options when creating the RAID0 to make it
>>>  perform faster for such a use case?
>>> - are there other tunables, any special MD / LVM / file system /
>>>  read ahead / buffer cache / ... parameters to look for?
>>>
>>> Thanks.
>>>
>>> Wolfgang Denk
>>>
>>> --
>>> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
>>> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
>>> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
>>> Boykottiert Microsoft - Kauft Eure Fenster bei OBI!
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>>
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial


* Re: Optimize RAID0 for max IOPS?
  2011-01-19 23:09         ` Roberto Spadim
@ 2011-01-19 23:18           ` Roberto Spadim
  2011-01-20  2:48             ` Keld Jørn Simonsen
  2011-01-21 19:34             ` Wolfgang Denk
  0 siblings, 2 replies; 46+ messages in thread
From: Roberto Spadim @ 2011-01-19 23:18 UTC (permalink / raw)
  To: stefan.huebner; +Cc: Wolfgang Denk, linux-raid

A good idea... why not start an open-source RAID controller?
What do we need? A CPU, memory, a power supply with a battery or
capacitor, SAS/SATA (disk interfaces), and PCI-Express or another host
(computer) interface.
It doesn't need an operating system, since it will only run one program
with some threads (OK, a small operating system to implement threads
easily).

We could use ARM, FPGA, Intel Core 2 Duo, Athlon, Xeon, or another system...
Instead of using a computer with an Ethernet interface (NBD, NFS, Samba,
iSCSI, or another file/device sharing protocol over Ethernet/SATA), we
need a computer with a PCI-Express interface and a native driver module
on the host.


2011/1/19 Roberto Spadim <roberto@spadim.com.br>:
> the problem....
> if you use iostat, or iotop
> with software raid:
>   you just see disk i/o
>   you don´t see memory (cache) i/o
> when using hardware raid:
>   you just see raid i/o (it can be a cache read or a real disk read)
>
>
> if you check memory+disk i/o, you will get similar values, if not, you
> will see high cpu usage
> for example you are using raidx with 10disks on a hardware raid
> change hardware raid to use only disks (10 disks for linux)
> make the same raidx with 10disks
> you will get a slower i/o since it have a controler between disk and cpu
> try it without hardware raid cpu, just a (sas/sata) optimized
> controller, or 10 (sata/sas) one port
> you still with a slow i/o then hardware controller (that´s right!)
>
> now let´s remove the sata/sas channel, let´s use a pci-express
> revodrive or pci-express texas ssd drive
> you will get better values then a hardware raid, but... why? you
> changed the hardware (ok, i know) but you make cpu more close to disk
> if you use disks with cache, you will get more speed (a memory ssd
> harddisk is faster than a harddisk only disk)
>
> why hardware are more faster than linux? i don´t think they are...
> they can make smaller latencies with good memory cache
> but if you computer use ddr3 and your hardware raid controller use i2c
> memory, your ddr3 cache is faster...
>
> how to benchmark? check disk i/o+memory cache i/o
> if linux is faster ok, you use more cpu and memory of your computer
> if linux is slower ok, you use less cpu and memory, but will have it
> on hardware raid...
> if you upgrade you memory and cpu, it can be faster than you hardware
> raid controller, what´s better for you?
>
> want a better read/write solution for software raid? make a new
> read/write code, you can do it, linux is easier than hardware raid to
> code!
> want a better read/write solution for hardware raid? call your
> hardware seller and talk, please i need a better firmware, could you
> send me?
>
> got?
>
>
> 2011/1/19 Stefan /*St0fF*/ Hübner <stefan.huebner@stud.tu-ilmenau.de>:
>> @Roberto: I guess you're right. BUT: i have not seen 900MB/s coming from
>> (i.e. read access) a software raid, but I've seen it from a 9750 on a
>> LSI SASx28 backplane, running RAID6 over 16disks (HDS722020ALA330).  So
>> one might not be wrong assuming on current raid-controllers
>> hardware/software matching and timing is way more optimized than what
>> mdraid might get at all.
>>
>> The 9650 and 9690 are considerably slower, but I've seen 550MB/s thruput
>> from those, also (I don't recall the setup anymore, tho).
>>
>> Max reading I saw from a software raid was around 350MB/s - so hence my
>> answers.  And if people had problems with controllers which are 5 years
>> or older by now, the numbers are not really comparable...
>>
>> Now again there's the point where there are also parameters on the
>> controller that can be tweaked, and a simple way to recreate the testing
>> scenario.  We may discuss and throw in further numbers and experience,
>> but not being able to recreate your specific scenario makes us talk past
>> each other...
>>
>> stefan
>>
>> Am 19.01.2011 20:50, schrieb Roberto Spadim:
>>> So can anybody help answering these questions:
>>>
>>> - are there any special options when creating the RAID0 to make it
>>> perform faster for such a use case?
>>> - are there other tunables, any special MD / LVM / file system / read
>>> ahead / buffer cache / ... parameters to look for?
>>>
>>> lets see:
>>> what´s your disk (ssd or sas or sata) best block size to write/read?
>>> write this at ->(A)
>>> what´s your work load? 50% write 50% read ?
>>>
>>> raid0 block size should be multiple of (A)
>>> *****filesystem size should be multiple of (A) of all disks
>>> *****read ahead should be a multiple of (A)
>>> for example
>>> /dev/sda 1kb
>>> /dev/sdb 4kb
>>>
>>> you should use 6kb... you should use 4kb, 8kb, 16kb (multiple of 1kb and 4kb)
>>>
>>> check i/o sheduller per disk too (ssd should use noop, disks should
>>> use cfq, deadline or another...)
>>> async and sync option at mount /etc/fstab, noatime reduce a lot of i/o
>>> too, you should optimize your application too
>>> hdparm each disk to use dma and fastest i/o options
>>>
>>> are you using only filesystem? are you using somethink more? samba?
>>> mysql? apache? lvm?
>>> each of this programs have some tunning, check their benchmarks
>>>
>>>
>>> getting back....
>>> what´s a raid controller?
>>> cpu + memory + disk controller + disks
>>> but... it only run raid software (it can run linux....)
>>>
>>> if you computer is slower than raid cpu+memory+disk controller, you
>>> will have a slower software raid, than hardware raid
>>> it´s like load balance on cpu/memory utilization of disk i/o (use
>>> dedicated hardware, or use your hardware?)
>>> got it?
>>> using a super fast xeon with ddr3 and optical fiber running software
>>> raid, is faster than a hardware raid using a arm (or fpga) ddrX memory
>>> and sas(fiber optical) connection to disks
>>>
>>> two solutions for the same problem
>>> what´s fast? benchmark it
>>> i think that if your xeon run a database and a very workloaded apache,
>>> a dedicated hardware raid can run faster, but a light xeon can run
>>> faster than a dedicated hardware raid
>>>
>>>
>>>
>>> 2011/1/19 Wolfgang Denk <wd@denx.de>:
>>>> Dear =?ISO-8859-15?Q?Stefan_/*St0fF*/_H=FCbner?=,
>>>>
>>>> In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote:
>>>>>
>>>>> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der
>>>>> Controller.
>>>>>
>>>>> [in English:] Dude, the disks are your bottleneck.
>>>> ...
>>>>
>>>> Maybe we can stop speculations about what might be the cause of the
>>>> problems in some setup I do NOT intend to use, and rather discuss the
>>>> questions I asked.
>>>>
>>>>>> I will have 4 x 1 TB disks for this setup.
>>>>>>
>>>>>> The plan is to build a RAID0 from the 4 devices, create a physical
>>>>>> volume and a volume group on the resulting /dev/md?, then create 2 or
>>>>>> 3 logical volumes that will be used as XFS file systems.
>>>>
>>>> Clarrification: I'll run /dev/md* on the raw disks, without any
>>>> partitions on them.
>>>>
>>>>>> My goal is to optimize for maximum number of I/O operations per
>>>>>> second. ...
>>>>>>
>>>>>> Is this a reasonable approach for such a task?
>>>>>>
>>>>>> Should I do anything different to acchive maximum performance?
>>>>>>
>>>>>> What are the tunables in this setup?  [It seems the usual recipies are
>>>>>> more oriented in maximizing the data troughput for large, mostly
>>>>>> sequential accesses - I figure that things like increasing read-ahead
>>>>>> etc. will not help me much here?]
>>>>
>>>> So can anybody help answering these questions:
>>>>
>>>> - are there any special options when creating the RAID0 to make it
>>>>  perform faster for such a use case?
>>>> - are there other tunables, any special MD / LVM / file system /
>>>>  read ahead / buffer cache / ... parameters to look for?
>>>>
>>>> Thanks.
>>>>
>>>> Wolfgang Denk
>>>>
>>>> --
>>>> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
>>>> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
>>>> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
>>>> Boykottiert Microsoft - Kauft Eure Fenster bei OBI!
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>>
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial


* Re: Optimize RAID0 for max IOPS?
  2011-01-19 23:18           ` Roberto Spadim
@ 2011-01-20  2:48             ` Keld Jørn Simonsen
  2011-01-20  3:53               ` Roberto Spadim
  2011-01-21 19:34             ` Wolfgang Denk
  1 sibling, 1 reply; 46+ messages in thread
From: Keld Jørn Simonsen @ 2011-01-20  2:48 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: stefan.huebner, Wolfgang Denk, linux-raid

On Wed, Jan 19, 2011 at 09:18:22PM -0200, Roberto Spadim wrote:
> a good idea....
> why not start a opensource raid controller?
> what we need? a cpu, memory, power supply with battery or capacitor,
> sas/sata (disk interfaces), pci-express or another (computer
> interface)

Why? Because of some differences in memory speed?

Normally software RAID is faster than hardware RAID, as witnessed by
many here on the list. The claim of a 350 MB/s maximum for SW RAID
is not true; 350 MB/s is what I get out of a simple box with 4 slightly
oldish SATA drives. 16 new fast SATA drives in SW RAID6 should easily
go beyond 1000 MB/s, given that there are no other bottlenecks in the
system.

Linux SW RAID gets fairly close to the theoretical maxima, given
adequate HW.
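
For what it's worth, a crude way to measure raw sequential read
throughput of an array (md0 is a placeholder; iflag=direct bypasses the
page cache):

dd if=/dev/md0 of=/dev/null bs=1M count=4096 iflag=direct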


best regards
keld

> it don?t need a operational system, since it will only run one program
> with some threads (ok a small operational system to implement threads
> easly)
> 
> we could use arm, fpga, intel core2duo, atlhon, xeon, or another system...
> instead using a computer with ethernet interface (nbd nfs samba or
> another file/device sharing iscsi ethernet sata), we need a computer
> with pci-express interface and native operational system module
> 
> 
> 2011/1/19 Roberto Spadim <roberto@spadim.com.br>:
> > the problem....
> > if you use iostat, or iotop
> > with software raid:
> >   you just see disk i/o
> >   you don?t see memory (cache) i/o
> > when using hardware raid:
> >   you just see raid i/o (it can be a cache read or a real disk read)
> >
> >
> > if you check memory+disk i/o, you will get similar values, if not, you
> > will see high cpu usage
> > for example you are using raidx with 10disks on a hardware raid
> > change hardware raid to use only disks (10 disks for linux)
> > make the same raidx with 10disks
> > you will get a slower i/o since it have a controler between disk and cpu
> > try it without hardware raid cpu, just a (sas/sata) optimized
> > controller, or 10 (sata/sas) one port
> > you still with a slow i/o then hardware controller (that?s right!)
> >
> > now let?s remove the sata/sas channel, let?s use a pci-express
> > revodrive or pci-express texas ssd drive
> > you will get better values then a hardware raid, but... why? you
> > changed the hardware (ok, i know) but you make cpu more close to disk
> > if you use disks with cache, you will get more speed (a memory ssd
> > harddisk is faster than a harddisk only disk)
> >
> > why hardware are more faster than linux? i don?t think they are...
> > they can make smaller latencies with good memory cache
> > but if you computer use ddr3 and your hardware raid controller use i2c
> > memory, your ddr3 cache is faster...
> >
> > how to benchmark? check disk i/o+memory cache i/o
> > if linux is faster ok, you use more cpu and memory of your computer
> > if linux is slower ok, you use less cpu and memory, but will have it
> > on hardware raid...
> > if you upgrade you memory and cpu, it can be faster than you hardware
> > raid controller, what?s better for you?
> >
> > want a better read/write solution for software raid? make a new
> > read/write code, you can do it, linux is easier than hardware raid to
> > code!
> > want a better read/write solution for hardware raid? call your
> > hardware seller and talk, please i need a better firmware, could you
> > send me?
> >
> > got?
> >
> >
> > 2011/1/19 Stefan /*St0fF*/ Hübner <stefan.huebner@stud.tu-ilmenau.de>:
> >> @Roberto: I guess you're right. BUT: i have not seen 900MB/s coming from
> >> (i.e. read access) a software raid, but I've seen it from a 9750 on a
> >> LSI SASx28 backplane, running RAID6 over 16disks (HDS722020ALA330).  So
> >> one might not be wrong assuming on current raid-controllers
> >> hardware/software matching and timing is way more optimized than what
> >> mdraid might get at all.
> >>
> >> The 9650 and 9690 are considerably slower, but I've seen 550MB/s thruput
> >> from those, also (I don't recall the setup anymore, tho).
> >>
> >> Max reading I saw from a software raid was around 350MB/s - so hence my
> >> answers.  And if people had problems with controllers which are 5 years
> >> or older by now, the numbers are not really comparable...
> >>
> >> Now again there's the point where there are also parameters on the
> >> controller that can be tweaked, and a simple way to recreate the testing
> >> scenario.  We may discuss and throw in further numbers and experience,
> >> but not being able to recreate your specific scenario makes us talk past
> >> each other...
> >>
> >> stefan
> >>
> >> Am 19.01.2011 20:50, schrieb Roberto Spadim:
> >>> So can anybody help answering these questions:
> >>>
> >>> - are there any special options when creating the RAID0 to make it
> >>> perform faster for such a use case?
> >>> - are there other tunables, any special MD / LVM / file system / read
> >>> ahead / buffer cache / ... parameters to look for?
> >>>
> >>> let's see:
> >>> what's your disk's (ssd or sas or sata) best block size to write/read?
> >>> write this down as (A)
> >>> what's your workload? 50% write / 50% read?
> >>>
> >>> the raid0 chunk size should be a multiple of (A)
> >>> *****the filesystem block size should be a multiple of (A) for all disks
> >>> *****read ahead should be a multiple of (A)
> >>> for example
> >>> /dev/sda 1kb
> >>> /dev/sdb 4kb
> >>>
> >>> you shouldn't use 6kb... you should use 4kb, 8kb or 16kb (multiples of both 1kb and 4kb)
> >>>
> >>> check the i/o scheduler per disk too (ssd should use noop, disks should
> >>> use cfq, deadline or another...)
> >>> check the async and sync options at mount time in /etc/fstab; noatime
> >>> reduces a lot of i/o too, and you should optimize your application as well
> >>> hdparm each disk to use dma and the fastest i/o options
> >>>
> >>> are you using only a filesystem? are you using something more? samba?
> >>> mysql? apache? lvm?
> >>> each of these programs has some tuning, check their benchmarks
> >>>
> >>>
> >>> getting back....
> >>> what's a raid controller?
> >>> cpu + memory + disk controller + disks
> >>> but... it only runs raid software (it can run linux....)
> >>>
> >>> if your computer is slower than the raid controller's cpu + memory + disk
> >>> controller, your software raid will be slower than the hardware raid
> >>> it's like load balancing the cpu/memory cost of disk i/o (use
> >>> dedicated hardware, or use your own hardware?)
> >>> got it?
> >>> using a super fast xeon with ddr3 and optical fibre running software
> >>> raid is faster than a hardware raid using an arm (or fpga), ddrX memory
> >>> and a sas (optical fibre) connection to the disks
> >>>
> >>> two solutions for the same problem
> >>> which is faster? benchmark it
> >>> i think that if your xeon runs a database and a heavily loaded apache,
> >>> a dedicated hardware raid can be faster, but a lightly loaded xeon can
> >>> run faster than a dedicated hardware raid
> >>>
> >>>
> >>>
> >>> 2011/1/19 Wolfgang Denk <wd@denx.de>:
> >>>> Dear Stefan /*St0fF*/ Hübner,
> >>>>
> >>>> In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote:
> >>>>>
> >>>>> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der
> >>>>> Controller.
> >>>>>
> >>>>> [in English:] Dude, the disks are your bottleneck.
> >>>> ...
> >>>>
> >>>> Maybe we can stop speculations about what might be the cause of the
> >>>> problems in some setup I do NOT intend to use, and rather discuss the
> >>>> questions I asked.
> >>>>
> >>>>>> I will have 4 x 1 TB disks for this setup.
> >>>>>>
> >>>>>> The plan is to build a RAID0 from the 4 devices, create a physical
> >>>>>> volume and a volume group on the resulting /dev/md?, then create 2 or
> >>>>>> 3 logical volumes that will be used as XFS file systems.
> >>>>
> >>>> Clarrification: I'll run /dev/md* on the raw disks, without any
> >>>> partitions on them.
> >>>>
> >>>>>> My goal is to optimize for maximum number of I/O operations per
> >>>>>> second. ...
> >>>>>>
> >>>>>> Is this a reasonable approach for such a task?
> >>>>>>
> >>>>>> Should I do anything different to acchive maximum performance?
> >>>>>>
> >>>>>> What are the tunables in this setup?  [It seems the usual recipies are
> >>>>>> more oriented in maximizing the data troughput for large, mostly
> >>>>>> sequential accesses - I figure that things like increasing read-ahead
> >>>>>> etc. will not help me much here?]
> >>>>
> >>>> So can anybody help answering these questions:
> >>>>
> >>>> - are there any special options when creating the RAID0 to make it
> >>>>  perform faster for such a use case?
> >>>> - are there other tunables, any special MD / LVM / file system /
> >>>>  read ahead / buffer cache / ... parameters to look for?
> >>>>
> >>>> Thanks.
> >>>>
> >>>> Wolfgang Denk
> >>>>
> >>>> --
> >>>> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
> >>>> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> >>>> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> >>>> Boykottiert Microsoft - Kauft Eure Fenster bei OBI!
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >>>> the body of a message to majordomo@vger.kernel.org
> >>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>
> >>>
> >>>
> >>>
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >
> >
> >
> > --
> > Roberto Spadim
> > Spadim Technology / SPAEmpresarial
> >
> 
> 
> 
> -- 
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-20  2:48             ` Keld Jørn Simonsen
@ 2011-01-20  3:53               ` Roberto Spadim
  0 siblings, 0 replies; 46+ messages in thread
From: Roberto Spadim @ 2011-01-20  3:53 UTC (permalink / raw)
  To: Keld Jørn Simonsen; +Cc: stefan.huebner, Wolfgang Denk, linux-raid

=) i know, but since we have so much proprietary firmware, an opensource
firmware (like openbios) could be very nice :D hehehehe

i will use linux raid (i'm sure it's very good), it's really fast, and
works with hotswap too
(ok, there are some userspace programs to let it keep working even with
kernel hotswap problems, but when the kernel can release and replug a
device without problems we won't need those userspace programs... the
userspace side checks each newly hotplugged volume and, if its uuid
matches some raid device's uuid, puts the device back into the right
raid array (i made it with a php script =) hehehe) )
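
a minimal sketch of that idea in shell (the original was a php script;
the device names and the v1.x-metadata "Array UUID" field here are
assumptions):

  # $1 is the newly hotplugged disk: look up its array UUID and re-add
  # it to whichever running md array carries the same UUID
  DEV=$1
  UUID=$(mdadm --examine "$DEV" | awk '/Array UUID/ {print $4}')
  for md in /dev/md[0-9]*; do
      if mdadm --detail "$md" | grep -q "$UUID"; then
          mdadm --manage "$md" --re-add "$DEV"
          break
      fi
  done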


2011/1/20 Keld Jørn Simonsen <keld@keldix.com>:
> On Wed, Jan 19, 2011 at 09:18:22PM -0200, Roberto Spadim wrote:
>> a good idea....
>> why not start a opensource raid controller?
>> what we need? a cpu, memory, power supply with battery or capacitor,
>> sas/sata (disk interfaces), pci-express or another (computer
>> interface)
>
> Why? because of some differences in memory speed?
>
> Normally software raid is faster than hardware raid, as witnessed by
> many here on the list. The claim of a 350 MB/s maximum for SW raid
> is not true; 350 MB/s is what I get out of a simple box with 4 slightly
> oldish SATA drives. 16 new, fast SATA drives in SW raid6 should easily go
> beyond 1000 MB/s, given that there are no other bottlenecks in the system.
>
> Linux SW raid goes fairly close to theoretical maxima, given adequate
> HW.
>
>
> best regards
> keld
>
>> it don?t need a operational system, since it will only run one program
>> with some threads (ok a small operational system to implement threads
>> easly)
>>
>> we could use arm, fpga, intel core2duo, atlhon, xeon, or another system...
>> instead using a computer with ethernet interface (nbd nfs samba or
>> another file/device sharing iscsi ethernet sata), we need a computer
>> with pci-express interface and native operational system module
>>
>>
>> 2011/1/19 Roberto Spadim <roberto@spadim.com.br>:
>> > the problem....
>> > if you use iostat, or iotop
>> > with software raid:
>> >   you just see disk i/o
>> >   you don?t see memory (cache) i/o
>> > when using hardware raid:
>> >   you just see raid i/o (it can be a cache read or a real disk read)
>> >
>> >
>> > if you check memory+disk i/o, you will get similar values, if not, you
>> > will see high cpu usage
>> > for example you are using raidx with 10disks on a hardware raid
>> > change hardware raid to use only disks (10 disks for linux)
>> > make the same raidx with 10disks
>> > you will get a slower i/o since it have a controler between disk and cpu
>> > try it without hardware raid cpu, just a (sas/sata) optimized
>> > controller, or 10 (sata/sas) one port
>> > you still with a slow i/o then hardware controller (that?s right!)
>> >
>> > now let?s remove the sata/sas channel, let?s use a pci-express
>> > revodrive or pci-express texas ssd drive
>> > you will get better values then a hardware raid, but... why? you
>> > changed the hardware (ok, i know) but you make cpu more close to disk
>> > if you use disks with cache, you will get more speed (a memory ssd
>> > harddisk is faster than a harddisk only disk)
>> >
>> > why hardware are more faster than linux? i don?t think they are...
>> > they can make smaller latencies with good memory cache
>> > but if you computer use ddr3 and your hardware raid controller use i2c
>> > memory, your ddr3 cache is faster...
>> >
>> > how to benchmark? check disk i/o+memory cache i/o
>> > if linux is faster ok, you use more cpu and memory of your computer
>> > if linux is slower ok, you use less cpu and memory, but will have it
>> > on hardware raid...
>> > if you upgrade you memory and cpu, it can be faster than you hardware
>> > raid controller, what?s better for you?
>> >
>> > want a better read/write solution for software raid? make a new
>> > read/write code, you can do it, linux is easier than hardware raid to
>> > code!
>> > want a better read/write solution for hardware raid? call your
>> > hardware seller and talk, please i need a better firmware, could you
>> > send me?
>> >
>> > got?
>> >
>> >
>> > 2011/1/19 Stefan /*St0fF*/ Hübner <stefan.huebner@stud.tu-ilmenau.de>:
>> >> @Roberto: I guess you're right. BUT: i have not seen 900MB/s coming from
>> >> (i.e. read access) a software raid, but I've seen it from a 9750 on a
>> >> LSI SASx28 backplane, running RAID6 over 16disks (HDS722020ALA330).  So
>> >> one might not be wrong assuming on current raid-controllers
>> >> hardware/software matching and timing is way more optimized than what
>> >> mdraid might get at all.
>> >>
>> >> The 9650 and 9690 are considerably slower, but I've seen 550MB/s thruput
>> >> from those, also (I don't recall the setup anymore, tho).
>> >>
>> >> Max reading I saw from a software raid was around 350MB/s - so hence my
>> >> answers.  And if people had problems with controllers which are 5 years
>> >> or older by now, the numbers are not really comparable...
>> >>
>> >> Now again there's the point where there are also parameters on the
>> >> controller that can be tweaked, and a simple way to recreate the testing
>> >> scenario.  We may discuss and throw in further numbers and experience,
>> >> but not being able to recreate your specific scenario makes us talk past
>> >> each other...
>> >>
>> >> stefan
>> >>
>> >> Am 19.01.2011 20:50, schrieb Roberto Spadim:
>> >>> So can anybody help answering these questions:
>> >>>
>> >>> - are there any special options when creating the RAID0 to make it
>> >>> perform faster for such a use case?
>> >>> - are there other tunables, any special MD / LVM / file system / read
>> >>> ahead / buffer cache / ... parameters to look for?
>> >>>
>> >>> lets see:
>> >>> what?s your disk (ssd or sas or sata) best block size to write/read?
>> >>> write this at ->(A)
>> >>> what?s your work load? 50% write 50% read ?
>> >>>
>> >>> raid0 block size should be multiple of (A)
>> >>> *****filesystem size should be multiple of (A) of all disks
>> >>> *****read ahead should be a multiple of (A)
>> >>> for example
>> >>> /dev/sda 1kb
>> >>> /dev/sdb 4kb
>> >>>
>> >>> you should use 6kb... you should use 4kb, 8kb, 16kb (multiple of 1kb and 4kb)
>> >>>
>> >>> check i/o sheduller per disk too (ssd should use noop, disks should
>> >>> use cfq, deadline or another...)
>> >>> async and sync option at mount /etc/fstab, noatime reduce a lot of i/o
>> >>> too, you should optimize your application too
>> >>> hdparm each disk to use dma and fastest i/o options
>> >>>
>> >>> are you using only filesystem? are you using somethink more? samba?
>> >>> mysql? apache? lvm?
>> >>> each of this programs have some tunning, check their benchmarks
>> >>>
>> >>>
>> >>> getting back....
>> >>> what?s a raid controller?
>> >>> cpu + memory + disk controller + disks
>> >>> but... it only run raid software (it can run linux....)
>> >>>
>> >>> if you computer is slower than raid cpu+memory+disk controller, you
>> >>> will have a slower software raid, than hardware raid
>> >>> it?s like load balance on cpu/memory utilization of disk i/o (use
>> >>> dedicated hardware, or use your hardware?)
>> >>> got it?
>> >>> using a super fast xeon with ddr3 and optical fiber running software
>> >>> raid, is faster than a hardware raid using a arm (or fpga) ddrX memory
>> >>> and sas(fiber optical) connection to disks
>> >>>
>> >>> two solutions for the same problem
>> >>> what?s fast? benchmark it
>> >>> i think that if your xeon run a database and a very workloaded apache,
>> >>> a dedicated hardware raid can run faster, but a light xeon can run
>> >>> faster than a dedicated hardware raid
>> >>>
>> >>>
>> >>>
>> >>> 2011/1/19 Wolfgang Denk <wd@denx.de>:
>> >>>> Dear Stefan /*St0fF*/ Hübner,
>> >>>>
>> >>>> In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote:
>> >>>>>
>> >>>>> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der
>> >>>>> Controller.
>> >>>>>
>> >>>>> [in English:] Dude, the disks are your bottleneck.
>> >>>> ...
>> >>>>
>> >>>> Maybe we can stop speculations about what might be the cause of the
>> >>>> problems in some setup I do NOT intend to use, and rather discuss the
>> >>>> questions I asked.
>> >>>>
>> >>>>>> I will have 4 x 1 TB disks for this setup.
>> >>>>>>
>> >>>>>> The plan is to build a RAID0 from the 4 devices, create a physical
>> >>>>>> volume and a volume group on the resulting /dev/md?, then create 2 or
>> >>>>>> 3 logical volumes that will be used as XFS file systems.
>> >>>>
>> >>>> Clarrification: I'll run /dev/md* on the raw disks, without any
>> >>>> partitions on them.
>> >>>>
>> >>>>>> My goal is to optimize for maximum number of I/O operations per
>> >>>>>> second. ...
>> >>>>>>
>> >>>>>> Is this a reasonable approach for such a task?
>> >>>>>>
>> >>>>>> Should I do anything different to acchive maximum performance?
>> >>>>>>
>> >>>>>> What are the tunables in this setup?  [It seems the usual recipies are
>> >>>>>> more oriented in maximizing the data troughput for large, mostly
>> >>>>>> sequential accesses - I figure that things like increasing read-ahead
>> >>>>>> etc. will not help me much here?]
>> >>>>
>> >>>> So can anybody help answering these questions:
>> >>>>
>> >>>> - are there any special options when creating the RAID0 to make it
>> >>>>  perform faster for such a use case?
>> >>>> - are there other tunables, any special MD / LVM / file system /
>> >>>>  read ahead / buffer cache / ... parameters to look for?
>> >>>>
>> >>>> Thanks.
>> >>>>
>> >>>> Wolfgang Denk
>> >>>>
>> >>>> --
>> >>>> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
>> >>>> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
>> >>>> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
>> >>>> Boykottiert Microsoft - Kauft Eure Fenster bei OBI!
>> >>>> --
>> >>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> >>>> the body of a message to majordomo@vger.kernel.org
>> >>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> >> the body of a message to majordomo@vger.kernel.org
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >>
>> >
>> >
>> >
>> > --
>> > Roberto Spadim
>> > Spadim Technology / SPAEmpresarial
>> >
>>
>>
>>
>> --
>> Roberto Spadim
>> Spadim Technology / SPAEmpresarial
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-19 23:18           ` Roberto Spadim
  2011-01-20  2:48             ` Keld Jørn Simonsen
@ 2011-01-21 19:34             ` Wolfgang Denk
  2011-01-21 20:03               ` Roberto Spadim
  1 sibling, 1 reply; 46+ messages in thread
From: Wolfgang Denk @ 2011-01-21 19:34 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: stefan.huebner, linux-raid

Dear Roberto,

In message <AANLkTiki_FfRrLtL3dMsrDLXeT8jNO0ndnTNpXk1OXMW@mail.gmail.com> you wrote:
> a good idea....
> why not start a opensource raid controller?
> what we need? a cpu, memory, power supply with battery or capacitor,
> sas/sata (disk interfaces), pci-express or another (computer
> interface)
> it don´t need a operational system, since it will only run one program
> with some threads (ok a small operational system to implement threads
> easly)
>
> we could use arm, fpga, intel core2duo, atlhon, xeon, or another system...

You could even use a processor dedicated to such a job, like a
PPC440SPe or PPC460SX or similar, which provide hardware-offload
capabilities for the RAID calculations.  These are even supported by
drivers in mainline Linux.

But again, these would not help to maximize IOPS - the goal of that
optimization has always been maximum sequential throughput only
(and yes, I know exactly what I'm talking about; guess where the
aforementioned drivers come from).

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
I don't see any direct evidence ...  but, then, my crystal ball is in
dire need of an ectoplasmic upgrade. :-)              -- Howard Smith
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-21 19:34             ` Wolfgang Denk
@ 2011-01-21 20:03               ` Roberto Spadim
  2011-01-21 20:04                 ` Roberto Spadim
  0 siblings, 1 reply; 46+ messages in thread
From: Roberto Spadim @ 2011-01-21 20:03 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: stefan.huebner, linux-raid

=) i know
but everybody says software is slower, and the solution is: use hardware
ok
there's no opensource firmware for raid hardware

i prefer a good software/hardware solution, and linux raid is a good
software solution for me =)
but why not try an opensource project? hehe
what we could do.... a virtual machine :P with only raid and nfs, or
a dedicated cpu for raid (cpu affinity) and a portion of memory
reserved for raid cache (today i think the raid software doesn't have
its own cache, and it shouldn't; caching is done by linux at the
filesystem level, am i right?)

2011/1/21 Wolfgang Denk <wd@denx.de>:
> Dear Roberto,
>
> In message <AANLkTiki_FfRrLtL3dMsrDLXeT8jNO0ndnTNpXk1OXMW@mail.gmail.com> you wrote:
>> a good idea....
>> why not start a opensource raid controller?
>> what we need? a cpu, memory, power supply with battery or capacitor,
>> sas/sata (disk interfaces), pci-express or another (computer
>> interface)
>> it don´t need a operational system, since it will only run one program
>> with some threads (ok a small operational system to implement threads
>> easly)
>>
>> we could use arm, fpga, intel core2duo, atlhon, xeon, or another system...
>
> You could evenuse a processor dedicated for such a job, like a
> PPC440SPe or PPC460SX or similar, which provide hardware-offload
> capabilities for the RAID calculations.  These are even supported by
> drivers in mainline Linux.
>
> But again, thee would not helpo to maximize IOPS - goal for
> optimization has always been maximum sequential troughput only
> (and yes, I know exactly what I'm talking about; guess where the
> aforementioned drivers are coming from).
>
> Best regards,
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> I don't see any direct evidence ...  but, then, my crystal ball is in
> dire need of an ectoplasmic upgrade. :-)              -- Howard Smith
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-21 20:03               ` Roberto Spadim
@ 2011-01-21 20:04                 ` Roberto Spadim
  0 siblings, 0 replies; 46+ messages in thread
From: Roberto Spadim @ 2011-01-21 20:04 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: stefan.huebner, linux-raid

thanks, i've never used a PPC440SPe, i will buy one as a hobby =)

2011/1/21 Roberto Spadim <roberto@spadim.com.br>:
> =) i know
> but, every body tell software is slower, the solution - use hardware
> ok
> there´s no opensource firmware for raid hardware
>
> i preffer a good software/hardware solution, linux raid is a good
> software solution for me =)
> but, why not try a opensource project? hehe
> what we could do.... a virtual machine :P with only raid and nfs, or
> make a dedicated cpu for raid (cpu affinity) and a portion of memory
> only for raid cache (today i think raid software don´t have cache, it
> shoudn´t, cache is done by linux at filesystem level, i´m right?)
>
>
> 2011/1/21 Wolfgang Denk <wd@denx.de>:
>> Dear Roberto,
>>
>> In message <AANLkTiki_FfRrLtL3dMsrDLXeT8jNO0ndnTNpXk1OXMW@mail.gmail.com> you wrote:
>>> a good idea....
>>> why not start a opensource raid controller?
>>> what we need? a cpu, memory, power supply with battery or capacitor,
>>> sas/sata (disk interfaces), pci-express or another (computer
>>> interface)
>>> it don´t need a operational system, since it will only run one program
>>> with some threads (ok a small operational system to implement threads
>>> easly)
>>>
>>> we could use arm, fpga, intel core2duo, atlhon, xeon, or another system...
>>
>> You could evenuse a processor dedicated for such a job, like a
>> PPC440SPe or PPC460SX or similar, which provide hardware-offload
>> capabilities for the RAID calculations.  These are even supported by
>> drivers in mainline Linux.
>>
>> But again, thee would not helpo to maximize IOPS - goal for
>> optimization has always been maximum sequential troughput only
>> (and yes, I know exactly what I'm talking about; guess where the
>> aforementioned drivers are coming from).
>>
>> Best regards,
>>
>> Wolfgang Denk
>>
>> --
>> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
>> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
>> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
>> I don't see any direct evidence ...  but, then, my crystal ball is in
>> dire need of an ectoplasmic upgrade. :-)              -- Howard Smith
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-19 19:21   ` Wolfgang Denk
  2011-01-19 19:50     ` Roberto Spadim
@ 2011-01-24 14:40     ` CoolCold
  2011-01-24 15:25         ` Justin Piszcz
  2011-01-24 20:43       ` Wolfgang Denk
  1 sibling, 2 replies; 46+ messages in thread
From: CoolCold @ 2011-01-24 14:40 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: stefan.huebner, linux-raid

On Wed, Jan 19, 2011 at 10:21 PM, Wolfgang Denk <wd@denx.de> wrote:
> Dear Stefan /*St0fF*/ Hübner,
>
> In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote:
>>
>> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der
>> Controller.
>>
>> [in English:] Dude, the disks are your bottleneck.
> ...
>
> Maybe we can stop speculations about what might be the cause of the
> problems in some setup I do NOT intend to use, and rather discuss the
> questions I asked.
>
>> > I will have 4 x 1 TB disks for this setup.
>> >
>> > The plan is to build a RAID0 from the 4 devices, create a physical
>> > volume and a volume group on the resulting /dev/md?, then create 2 or
>> > 3 logical volumes that will be used as XFS file systems.
>
> Clarrification: I'll run /dev/md* on the raw disks, without any
> partitions on them.
>
>> > My goal is to optimize for maximum number of I/O operations per
>> > second. ...
>> >
>> > Is this a reasonable approach for such a task?
>> >
>> > Should I do anything different to acchive maximum performance?
>> >
>> > What are the tunables in this setup?  [It seems the usual recipies are
>> > more oriented in maximizing the data troughput for large, mostly
>> > sequential accesses - I figure that things like increasing read-ahead
>> > etc. will not help me much here?]
>
> So can anybody help answering these questions:
>
> - are there any special options when creating the RAID0 to make it
>  perform faster for such a use case?
> - are there other tunables, any special MD / LVM / file system /
>  read ahead / buffer cache / ... parameters to look for?
XFS is known for its slow speed on metadata operations like updating
file attributes/removing files... but things are going to change after
2.6.35, where the delaylog mount option is available. Citing Dave Chinner:
< dchinner> Indeed, the biggest concurrency limitation has
traditionally been the transaction commit/journalling code, but that's
a lot more scalable now with delayed logging....

So, you may need to benchmark the fs part.
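
For reference, delaylog is just a mount option; a hypothetical example
(device, mount point and logbsize value are placeholders):

  # enable delayed logging plus a larger in-core log buffer
  mount -o noatime,delaylog,logbsize=256k /dev/vg0/data /mnt/data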

>
> Thanks.
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> Boykottiert Microsoft - Kauft Eure Fenster bei OBI!
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Best regards,
[COOLCOLD-RIPN]
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-24 14:40     ` CoolCold
@ 2011-01-24 15:25         ` Justin Piszcz
  2011-01-24 20:43       ` Wolfgang Denk
  1 sibling, 0 replies; 46+ messages in thread
From: Justin Piszcz @ 2011-01-24 15:25 UTC (permalink / raw)
  To: CoolCold; +Cc: Wolfgang Denk, stefan.huebner, linux-raid, xfs

[-- Attachment #1: Type: TEXT/PLAIN, Size: 900 bytes --]



On Mon, 24 Jan 2011, CoolCold wrote:

>> So can anybody help answering these questions:
>>
>> - are there any special options when creating the RAID0 to make it
>>  perform faster for such a use case?
>> - are there other tunables, any special MD / LVM / file system /
>>  read ahead / buffer cache / ... parameters to look for?
> XFS is known for it's slow speed on metadata operations like updating
> file attributes/removing files..but things gonna change after 2.6.35
> where delaylog is used. Citating Dave Chinner :
> < dchinner> Indeed, the biggest concurrency limitation has
> traditionally been the transaction commit/journalling code, but that's
> a lot more scalable now with delayed logging....
>
> So, you may need to benchmark fs part.

Some info on XFS benchmark with delaylog here:
http://comments.gmane.org/gmane.comp.file-systems.xfs.general/34379

Justin.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-24 14:40     ` CoolCold
  2011-01-24 15:25         ` Justin Piszcz
@ 2011-01-24 20:43       ` Wolfgang Denk
  1 sibling, 0 replies; 46+ messages in thread
From: Wolfgang Denk @ 2011-01-24 20:43 UTC (permalink / raw)
  To: CoolCold; +Cc: linux-raid

Dear CoolCold,

In message <AANLkTikx4g99-Cf_09kEGfF2mmf4Dnuh2A5gTrtKweDy@mail.gmail.com> you wrote:
>
> > So can anybody help answering these questions:
> >
> > - are there any special options when creating the RAID0 to make it
> >  perform faster for such a use case?
> > - are there other tunables, any special MD / LVM / file system /
> >  read ahead / buffer cache / ... parameters to look for?
> XFS is known for it's slow speed on metadata operations like updating
> file attributes/removing files..but things gonna change after 2.6.35
> where delaylog is used. Citating Dave Chinner :
> < dchinner> Indeed, the biggest concurrency limitation has
> traditionally been the transaction commit/journalling code, but that's
> a lot more scalable now with delayed logging....
> 
> So, you may need to benchmark fs part.

Thanks a lot - much appreciated.  The first reply that actually was on
topic...

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
It is not best to swap horses while crossing the river.
- Abraham Lincoln

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-24 15:25         ` Justin Piszcz
@ 2011-01-24 20:48           ` Wolfgang Denk
  -1 siblings, 0 replies; 46+ messages in thread
From: Wolfgang Denk @ 2011-01-24 20:48 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-raid, xfs

Dear Justin Piszcz,

In message <alpine.DEB.2.00.1101241024230.14640@p34.internal.lan> you wrote:
>
> > So, you may need to benchmark fs part.
> 
> Some info on XFS benchmark with delaylog here:
> http://comments.gmane.org/gmane.comp.file-systems.xfs.general/34379

Thanks a lot for the pointer. I will try this out.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Madness takes its toll. Please have exact change.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-24 15:25         ` Justin Piszcz
@ 2011-01-24 21:57           ` Wolfgang Denk
  -1 siblings, 0 replies; 46+ messages in thread
From: Wolfgang Denk @ 2011-01-24 21:57 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-raid, xfs

Dear Justin,

In message <alpine.DEB.2.00.1101241024230.14640@p34.internal.lan> you wrote:
> 
> Some info on XFS benchmark with delaylog here:
> http://comments.gmane.org/gmane.comp.file-systems.xfs.general/34379

For the record: I tested both the "delaylog" and "logbsize=262144" mount
options on two systems running Fedora 14 x86_64 (kernel version
2.6.35.10-74.fc14.x86_64).


Test No.	Mount options
1		rw,noatime
2		rw,noatime,delaylog
3		rw,noatime,delaylog,logbsize=262144


System A: Gigabyte EP35C-DS3R Mainboard, Core 2 Quad CPU Q9550 @ 2.83GHz, 4 GB RAM
--------- software RAID 5 using 4 x old Maxtor 7Y250M0 S-ATA I disks
	  (chunk size 16 kB, using S-ATA ports on main board), XFS

Test 1:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
A1               8G   844  96 153107  19 56427  11  2006  98 127174  15 369.4   6
Latency             13686us    1480ms    1128ms   14986us     136ms   74911us
Version  1.96       ------Sequential Create------ --------Random Create--------
A1                  -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   104   0 +++++ +++   115   0    89   0 +++++ +++   111   0
Latency               326ms     171us     277ms     343ms       9us     360ms
1.96,1.96,A1,1,1295714835,8G,,844,96,153107,19,56427,11,2006,98,127174,15,369.4,6,16,,,,,104,0,+++++,+++,115,0,89,0,+++++,+++,111,0,13686us,1480ms,1128ms,14986us,136ms,74911us,326ms,171us,277ms,343ms,9us,360ms

Test 2:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
A2               8G   417  46 67526   8 28251   5  1338  63 53780   5 236.0   4
Latency             38626us    1859ms     508ms   26689us     258ms     188ms
Version  1.96       ------Sequential Create------ --------Random Create--------
A2                  -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16    51   0 +++++ +++   128   0   102   0 +++++ +++   125   0
Latency              1526ms     169us     277ms     363ms       8us     324ms
1.96,1.96,A2,1,1295901138,8G,,417,46,67526,8,28251,5,1338,63,53780,5,236.0,4,16,,,,,51,0,+++++,+++,128,0,102,0,+++++,+++,125,0,38626us,1859ms,508ms,26689us,258ms,188ms,1526ms,169us,277ms,363ms,8us,324ms

Test 3:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
A3               8G   417  46 67526   8 28251   5  1338  63 53780   5 236.0   4
Latency             38626us    1859ms     508ms   26689us     258ms     188ms
Version  1.96       ------Sequential Create------ --------Random Create--------
A3                  -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16    51   0 +++++ +++   128   0   102   0 +++++ +++   125   0
Latency              1526ms     169us     277ms     363ms       8us     324ms
1.96,1.96,A3,1,1295901138,8G,,417,46,67526,8,28251,5,1338,63,53780,5,236.0,4,16,,,,,51,0,+++++,+++,128,0,102,0,+++++,+++,125,0,38626us,1859ms,508ms,26689us,258ms,188ms,1526ms,169us,277ms,363ms,8us,324ms

System B: Supermicro H8DM8-2 Mainboard, Dual-Core AMD Opteron 2216 @ 2.4 GHz, 8 GB RAM
          software RAID 6 using 6 x Seagate ST31000524NS S-ATA II disks
          (chunk size 16 kB, using a Marvell MV88SX6081 8-port SATA II PCI-X Controller)
          XFS

Test 1:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
B1              16G   403  98 198720  66 53287  49  1013  99 228076  91 545.0  31
Latency             43022us     127ms     126ms   29328us     105ms   66395us
Version  1.96       ------Sequential Create------ --------Random Create--------
B1                  -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16    97   1 +++++ +++    96   1    96   1 +++++ +++    95   1
Latency               326ms     349us     351ms     355ms      49us     363ms
1.96,1.96,B1,1,1295784794,16G,,403,98,198720,66,53287,49,1013,99,228076,91,545.0,31,16,,,,,97,1,+++++,+++,96,1,96,1,+++++,+++,95,1,43022us,127ms,126ms,29328us,105ms,66395us,326ms,349us,351ms,355ms,49us,363ms

Test 2:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
B2              16G   380  98 197319  68 54835  48   983  99 216812  89 527.8  31
Latency             47456us     227ms     280ms   24696us   38233us   80147us
Version  1.96       ------Sequential Create------ --------Random Create--------
B2                  -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16    91   1 +++++ +++   115   1    73   1 +++++ +++    96   1
Latency               355ms    2274us     833ms     750ms    1079us     400ms
1.96,1.96,B2,1,1295884032,16G,,380,98,197319,68,54835,48,983,99,216812,89,527.8,31,16,,,,,91,1,+++++,+++,115,1,73,1,+++++,+++,96,1,47456us,227ms,280ms,24696us,38233us,80147us,355ms,2274us,833ms,750ms,1079us,400ms

Test 3:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
B3              16G   402  99 175802  64 55639  48  1006  99 232748  87 543.7  32
Latency             43160us     426ms     164ms   13306us   40857us   65114us
Version  1.96       ------Sequential Create------ --------Random Create--------
B3                  -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16    93   1 +++++ +++   101   1    95   1 +++++ +++    95   1
Latency               479ms    2281us     383ms     366ms      22us     402ms
1.96,1.96,B3,1,1295880202,16G,,402,99,175802,64,55639,48,1006,99,232748,87,543.7,32,16,,,,,93,1,+++++,+++,101,1,95,1,+++++,+++,95,1,43160us,426ms,164ms,13306us,40857us,65114us,479ms,2281us,383ms,366ms,22us,402ms


I do not see any significant improvement in any of the parameters -
especially when compared to the serious performance degradation (down
to 44% for block write, 42% for block read) on system A.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
A supercomputer is a machine that runs an endless loop in 2 seconds.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-24 21:57           ` Wolfgang Denk
@ 2011-01-24 23:03             ` Dave Chinner
  -1 siblings, 0 replies; 46+ messages in thread
From: Dave Chinner @ 2011-01-24 23:03 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: Justin Piszcz, linux-raid, xfs

On Mon, Jan 24, 2011 at 10:57:13PM +0100, Wolfgang Denk wrote:
> Dear Justin,
> 
> In message <alpine.DEB.2.00.1101241024230.14640@p34.internal.lan> you wrote:
> > 
> > Some info on XFS benchmark with delaylog here:
> > http://comments.gmane.org/gmane.comp.file-systems.xfs.general/34379
> 
> For the record: I tested both the "delaylog" and "logbsize=262144" on
> two systems running Fedora 14 x86_64 (kernel version
> 2.6.35.10-74.fc14.x86_64).
> 
> 
> Test No.	Mount options
> 1		rw,noatime
> 2		rw,noatime,delaylog
> 3		rw,noatime,delaylog,logbsize=262144
> 
> 
> System A: Gigabyte EP35C-DS3R Mainbord, Core 2 Quad CPU Q9550 @ 2.83GHz, 4 GB RAM
> --------- software RAID 5 using 4 x old Maxtor 7Y250M0 S-ATA I disks
> 	  (chunk size 16 kB, using S-ATA ports on main board), XFS
> 
> Test 1:
> 
> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> A1               8G   844  96 153107  19 56427  11  2006  98 127174  15 369.4   6
> Latency             13686us    1480ms    1128ms   14986us     136ms   74911us
> Version  1.96       ------Sequential Create------ --------Random Create--------
> A1                  -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16   104   0 +++++ +++   115   0    89   0 +++++ +++   111   0

Only 16 files? You need to test something that takes more than 5
milliseconds to run. Given that XFS can run at >20,000 creates/s for
a single threaded sequential create like this, perhaps you should
start at 100,000 files (maybe a million) so you get an idea of
sustained performance.

.....

> I do not see any significant improvement in any of the parameters -
> especially when compared to the serious performance degradation (down
> to 44% for block write, 42% for block read) on system A.

delaylog does not affect the block IO path in any way, so something
else is going on there. You need to sort that out before drawing any
conclusions.

Similarly, you need to test something relevant to your workload, not
use a canned benchmark in the expectation that the results are in any
way meaningful to your real workload. Also, if you do use a stupid
canned benchmark, make sure you configure it to test something
relevant to what you are trying to compare...
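
For a many-small-files, mixed read/write workload, a small-block random
I/O run gets much closer to reality than a streaming test - e.g. with a
tool like fio (all paths and sizes below are placeholders):

  # 70/30 random read/write of 4k blocks over 8 jobs, aggregate IOPS
  fio --name=small-io --directory=/mnt/test --rw=randrw --rwmixread=70 \
      --bs=4k --size=1g --numjobs=8 --ioengine=libaio --direct=1 \
      --runtime=60 --time_based --group_reporting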

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-24 23:03             ` Dave Chinner
@ 2011-01-25  7:39               ` Emmanuel Florac
  -1 siblings, 0 replies; 46+ messages in thread
From: Emmanuel Florac @ 2011-01-25  7:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Wolfgang Denk, linux-raid, Justin Piszcz, xfs

On Tue, 25 Jan 2011 10:03:14 +1100, you wrote:

> Only 16 files?

IIRC this is 16 thousand files. Though this is not enough; I
generally use 80 to 160 for tests.

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-25  7:39               ` Emmanuel Florac
@ 2011-01-25  8:36                 ` Dave Chinner
  -1 siblings, 0 replies; 46+ messages in thread
From: Dave Chinner @ 2011-01-25  8:36 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: Wolfgang Denk, linux-raid, Justin Piszcz, xfs

[ As a small note - if you are going to comment on the results table
from a previous message, please don't cut it from your response.
Context is important. I pasted the relevant part back in so I can
refer back to it in my response. ]

On Tue, Jan 25, 2011 at 08:39:00AM +0100, Emmanuel Florac wrote:
> On Tue, 25 Jan 2011 10:03:14 +1100, you wrote:
> > > Version  1.96       ------Sequential Create------ --------Random Create--------
> > > A1                  -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> > >               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
> > >                  16   104   0 +++++ +++   115   0    89   0 +++++ +++   111   0
> > 
> > Only 16 files?
> 
> IIRC this is 16 thousands of files. Though this is not enough, I
> generally use 80 to 160 for tests.

Yes, you're right, the bonnie++ man page states that it is in units
of 1024 files. It would be nice if there were a "k" to signify that,
so people who aren't intimately familiar with its output format can
see exactly what was tested....
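
For example - a sketch only, directory and user are placeholders -
the -n argument is what controls the file count:

# bonnie++ -d /mnt/test -n 128 -u someuser
	(128 x 1024 = 131072 files)
# bonnie++ -d /mnt/test -n 1024:65536:0:512 -u someuser
	(~1 million files of 0..64k bytes, spread over 512 directories)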

As it is, a create rate of 104 files/s (note the consistency of
units between 2 adjacent numbers!) indicates something else is
screwed, because my local test VM on RAID0 gets numbers like this:

Version  1.96       ------Sequential Create------ --------Random Create--------
test-4              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 25507  90 +++++ +++ 30472  97 25281  93 +++++ +++ 29077  97
Latency             23864us     204us   21092us   18855us      82us     121us

IOWs, create rates of 25k/s and unlink of 30k/s and it is clearly
CPU bound.

Therein lies the difference: the original numbers have 0% CPU usage,
which indicates that the test is blocking.  Something is causing the
reported test system to be blocked almost all the time.

/me looks closer.

Oh, despite $subject being "RAID0" the filesystems being tested are
on RAID5 and RAID6 with very small chunk sizes on slow SATA drives.
This is smelling like a case of barrier IOs on software raid on
cheap storage....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-25  8:36                 ` Dave Chinner
@ 2011-01-25 12:45                   ` Wolfgang Denk
  -1 siblings, 0 replies; 46+ messages in thread
From: Wolfgang Denk @ 2011-01-25 12:45 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Emmanuel Florac, linux-raid, Justin Piszcz, xfs

Dear Dave Chinner,

In message <20110125083643.GE28803@dastard> you wrote:
>
> Oh, despite $subject being "RAID0" the filesystems being tested are
> on RAID5 and RAID6 with very small chunk sizes on slow SATA drives.
> This is smelling like a case of barrier IOs on software raid on
> cheap storage....

Right. [Any way to avoid these, btw?]  I got side-tracked by the
comments about the new (to me) delaylog mount option to xfs; as the
results were not exactly as expected, I thought it might be
interesting to report them.

But as the subject says, my current topic is tuning RAID0 to avoid
exactly this type of bottleneck, or rather looking for tunable options
on RAID0.

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
PLEASE NOTE: Some Quantum Physics Theories Suggest That When the Con-
sumer Is Not Directly Observing This Product, It May Cease  to  Exist
or Will Exist Only in a Vague and Undetermined State.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-25 12:45                   ` Wolfgang Denk
@ 2011-01-25 12:51                     ` Emmanuel Florac
  -1 siblings, 0 replies; 46+ messages in thread
From: Emmanuel Florac @ 2011-01-25 12:51 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: Dave Chinner, linux-raid, Justin Piszcz, xfs

On Tue, 25 Jan 2011 13:45:09 +0100,
Wolfgang Denk <wd@denx.de> wrote:

> > This is smelling like a case of barrier IOs on software raid on
> > cheap storage....  
> 
> Right. [Any way to avoid these, btw?] 

Easy enough, use the "nobarrier" mount option. 
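
For example, assuming the filesystem is already mounted at /mnt/tmp
(the mount point is just a placeholder):

# mount -o remount,nobarrier /mnt/tmp

or add nobarrier to the options in /etc/fstab to make it permanent.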

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-18 21:01 Optimize RAID0 for max IOPS? Wolfgang Denk
  2011-01-18 22:18 ` Roberto Spadim
  2011-01-18 23:15 ` Stefan /*St0fF*/ Hübner
@ 2011-01-25 17:10 ` Christoph Hellwig
  2011-01-25 18:41   ` Wolfgang Denk
  2 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2011-01-25 17:10 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: linux-raid

On Tue, Jan 18, 2011 at 10:01:12PM +0100, Wolfgang Denk wrote:
> Hi,
> 
> I'm going to replace a h/w based RAID system (3ware 9650SE) by a plain
> s/w RAID0, because the existing system appears to be seriously limited
> in terms of numbers of I/O operations per second.
> 
> Our workload is mixed read / write (something between 80% read / 20%
> write and 50% / 50%), consisting of a very large number of usually
> very small files.
> 
> There may be 20...50 millions of files, or more. 65% of the files are
> smaller than 4 kB; 80% are smaller than 8 kB; 90% are smaller than 16
> kB; 98.4% are smaller than 64 kB.

I don't think you even want a RAID0 in that case.  For small IOPs
you're much better off with a simple concatenation of devices.

> The plan is to build a RAID0 from the 4 devices, create a physical
> volume and a volume group on the resulting /dev/md?, then create 2 or
> 3 logical volumes that will be used as XFS file systems.

Especially if you're running XFS the concatenation will work beautifully
for this setup.  Make sure that your AG boundaries align to the physical
devices, and they can be used completely independently for small IOPs.
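
As a sketch (device name and numbers are placeholders - check them
against your real member sizes): with 4 equal-sized disks concatenated
into /dev/md0, picking an AG count that is a multiple of the number of
disks keeps the AG boundaries on the disk boundaries, e.g.

# mkfs.xfs -d agcount=16 /dev/md0
	(4 AGs per 1 TB member)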

> Should I do anything different to acchive maximum performance?

Make sure to disable the disk write caches and, if not using the newest
kernel, also mount the filesystem with -o nobarrier.  With lots of small
I/Os and metadata-intensive workloads that's usually a lot faster.
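
Roughly (device names are placeholders; repeat the hdparm call for
every member disk, and only use nobarrier once the caches are off):

# hdparm -W 0 /dev/sda
# mount -o nobarrier /dev/md0 /data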

Also, if you have a lot of log traffic, an external log device will
help a lot.  It doesn't need to be large, but it will keep the
number of seeks on the other devices down.
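
A sketch of such a setup (/dev/sde1 is a placeholder; a few hundred MB
is plenty, and the filesystem must then always be mounted with the
matching logdev option):

# mkfs.xfs -l logdev=/dev/sde1,size=128m /dev/md0
# mount -o logdev=/dev/sde1 /dev/md0 /data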


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-25 17:10 ` Christoph Hellwig
@ 2011-01-25 18:41   ` Wolfgang Denk
  2011-01-25 21:35     ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Wolfgang Denk @ 2011-01-25 18:41 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-raid

Dear Christoph,

In message <20110125171017.GA24921@infradead.org> you wrote:
>
> > There may be 20...50 millions of files, or more. 65% of the files are
> > smaller than 4 kB; 80% are smaller than 8 kB; 90% are smaller than 16
> > kB; 98.4% are smaller than 64 kB.
> 
> I don't think you even want a RAID0 in that case.  For small IOPs
> you're much better off with a simple concatenation of devices.

What exactly do you mean by "concatenation"? LVM striping?
At least the discussion here does not show any significant advantages
for this concept:
http://groups.google.com/group/ubuntu-user-community/web/pick-your-pleasure-raid-0-mdadm-striping-or-lvm-striping

> > Should I do anything different to acchive maximum performance?
> 
> Make sure to disable the disk write caches and if not using the newest
> kernel also mount the filesystem with -o nobarrier.  With lots of small
> I/Os and metadata intensive workloads that's usually a lot faster.

Tests I've done recently indicate that on the other hand nobarrier causes
a serious degradation of read and write performance (down to some 40%
of the values before).

> Also if you have a lot of log traffic an external log devices will
> help a lot.  It's doesn't need to be larger, but it will keep the
> amount of seeks on the other devices down.

Understood, thanks.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Never underestimate the bandwidth of a station wagon full of tapes.
                                -- Dr. Warren Jackson, Director, UTCS

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-25 18:41   ` Wolfgang Denk
@ 2011-01-25 21:35     ` Christoph Hellwig
  2011-01-26  7:16       ` Wolfgang Denk
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2011-01-25 21:35 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: Christoph Hellwig, linux-raid

On Tue, Jan 25, 2011 at 07:41:15PM +0100, Wolfgang Denk wrote:
> > I don't think you even want a RAID0 in that case.  For small IOPs
> > you're much better off with a simple concatenation of devices.
> 
> What exactly do you mean by "concatenation"? LVM striping?
> At least the discussion here does not show any significant advantages
> for this concept:
> http://groups.google.com/group/ubuntu-user-community/web/pick-your-pleasure-raid-0-mdadm-striping-or-lvm-striping

No, concatenation means not using any striping, but just concatenating
the disks linearly, e.g.

 +-----------------------------------+
 |            Filesystem             |
 +--------+--------+--------+--------+
 | Disk 1 | Disk 2 | Disk 3 | Disk 4 |
 +--------+--------+--------+--------+

This can be done using the MD linear target, or simply
by having multiple PVs in a VG with LVM.
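
For illustration, either variant might look roughly like this (device,
VG and LV names are placeholders):

# mdadm --create /dev/md0 --level=linear --raid-devices=4 /dev/sd[abcd]
# mkfs.xfs /dev/md0

or, with LVM, one VG over four whole-disk PVs and plain (non-striped)
LVs, since linear allocation is the default:

# pvcreate /dev/sda /dev/sdb /dev/sdc /dev/sdd
# vgcreate vg0 /dev/sda /dev/sdb /dev/sdc /dev/sdd
# lvcreate -L 500G -n data vg0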

> 
> > Make sure to disable the disk write caches and if not using the newest
> > kernel also mount the filesystem with -o nobarrier.  With lots of small
> > I/Os and metadata intensive workloads that's usually a lot faster.
> 
> Tests I've done recently indicate that on the other hand nobarrier causes
> a serious degradation of read and write performance (down to some 40%
> of the values before).

Do you have a pointer to your results?


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-25 21:35     ` Christoph Hellwig
@ 2011-01-26  7:16       ` Wolfgang Denk
  2011-01-26  8:32         ` Stan Hoeppner
  2011-01-26  9:38         ` Christoph Hellwig
  0 siblings, 2 replies; 46+ messages in thread
From: Wolfgang Denk @ 2011-01-26  7:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-raid

Dear Christoph Hellwig,

In message <20110125213523.GA14375@infradead.org> you wrote:
>
> > What exactly do you mean by "concatenation"? LVM striping?
> > At least the discussion here does not show any significant advantages
> > for this concept:
> > http://groups.google.com/group/ubuntu-user-community/web/pick-your-pleasure-raid-0-mdadm-striping-or-lvm-striping
> 
> No, concatenation means not using any striping, but just concatenating
> the disk linearly, e.g.
> 
>  +-----------------------------------+
>  |            Filesystem             |
>  +--------+--------+--------+--------+
>  | Disk 1 | Disk 2 | Disk 3 | Disk 4 |
>  +--------+--------+--------+--------+
> 
> This can be done using the MD linear target, or simply
> by having multiple PVs in a VG with LVM.

I will not have a single file system, but several, so I'd probably go
with LVM. But - when I then create an LV, possibly smaller than any of
the disks, will the data (and thus the traffic) really be distributed
over all drives, or will I not basically see the same results as when
using a single drive?

> > Tests I've done recently indicate that on the other hand nobarrier causes
> > a serious degradation of read and write performance (down to some 40%
> > of the values before).
> 
> Do you have a pointer to your results?

This was the first set of tests:

http://thread.gmane.org/gmane.linux.raid/31269/focus=31419

I've run some more tests on the system called 'B' in this list:


# lvcreate -L 32G -n test castor0
  Logical volume "test" created
# mkfs.xfs /dev/mapper/castor0-test 
meta-data=/dev/mapper/castor0-test isize=256    agcount=16, agsize=524284 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=8388544, imaxpct=25
         =                       sunit=4      swidth=16 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=4096, version=2
         =                       sectsz=512   sunit=4 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
# mount /dev/mapper/castor0-test /mnt/tmp/
# mkdir /mnt/tmp/foo
# chown wd.wd /mnt/tmp/foo
# bonnie++ -d /mnt/tmp/foo -m xfs -u wd -g wd
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
xfs             16G   425  98 182929  64 46956  41   955  97 201274  83 517.6  30
Latency             42207us    2377ms     195ms   33339us   86675us   84167us
Version  1.96       ------Sequential Create------ --------Random Create--------
xfs                 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16    93   1 +++++ +++    90   1   123   1 +++++ +++   127   1
Latency               939ms    2279us    1415ms     307ms    1057us     724ms
1.96,1.96,xfs,1,1295938326,16G,,425,98,182929,64,46956,41,955,97,201274,83,517.6,30,16,,,,,93,1,+++++,+++,90,1,123,1,+++++,+++,127,1,42207us,2377ms,195ms,33339us,86675us,84167us,939ms,2279us,1415ms,307ms,1057us,724ms

[[Re-run with larger number of file creates / deletes]]

# bonnie++ -d /mnt/tmp/foo -n 128:65536:0:512 -m xfs1 -u wd -g wd
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
xfs1            16G   400  98 175931  63 46970  40   781  99 181044  73 524.2  30
Latency             48299us    2501ms     210ms   20693us   83729us   85349us
Version  1.96       ------Sequential Create------ --------Random Create--------
xfs1                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max            /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
    128:65536:0/512    42   1 25607  99    71   1    38   1  8267  67    34   0
Latency              1410ms    2337us    2116ms    1240ms   44920us    4139ms
1.96,1.96,xfs1,1,1295942356,16G,,400,98,175931,63,46970,40,781,99,181044,73,524.2,30,128,65536,,,512,42,1,25607,99,71,1,38,1,8267,67,34,0,48299us,2501ms,210ms,20693us,83729us,85349us,1410ms,2337us,2116ms,1240ms,44920us,4139ms

[[Add delaylog,logbsize=262144]]

# mount | grep /mnt/tmp
/dev/mapper/castor0-test on /mnt/tmp type xfs (rw)
# mount -o remount,noatime,delaylog,logbsize=262144 /mnt/tmp
# mount | grep /mnt/tmp
/dev/mapper/castor0-test on /mnt/tmp type xfs (rw,noatime,delaylog,logbsize=262144)
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
xfs1            16G   445  98 106201  43 35407  33   939  99 83545  42 490.4  30
Latency             43307us    4614ms     242ms   37420us     195ms     128ms
Version  1.96       ------Sequential Create------ --------Random Create--------
xfs1                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max            /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
    128:65536:0/512   308   4 24121  99  2393  30   321   5 22929  99   331   6
Latency             34842ms    1288us    6634ms   87944ms     195us   12239ms
1.96,1.96,xfs1,1,1295968991,16G,,445,98,106201,43,35407,33,939,99,83545,42,490.4,30,128,65536,,,512,308,4,24121,99,2393,30,321,5,22929,99,331,6,43307us,4614ms,242ms,37420us,195ms,128ms,34842ms,1288us,6634ms,87944ms,195us,12239ms


[[Note: Block write drops to 60%, block read drops to <50%]]

[[Add nobarriers]]

# mount -o remount,nobarriers /mnt/tmp
# mount | grep /mnt/tmp
/dev/mapper/castor0-test on /mnt/tmp type xfs (rw,noatime,delaylog,logbsize=262144,nobarriers)
# bonnie++ -d /mnt/tmp/foo -n 128:65536:0:512 -m xfs2 -u wd -g wd
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
xfs2            16G   427  98 193950  65 52848  45   987  99 198110  83 496.5  25
Latency             41543us     128ms     186ms   14678us   67639us   76024us
Version  1.96       ------Sequential Create------ --------Random Create--------
xfs2                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max            /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
    128:65536:0/512   352   6 24513  99  2604  32   334   5 24921  99   333   6
Latency             32152ms    2307us    4148ms   31036ms     493us   23065ms
1.96,1.96,xfs2,1,1295966513,16G,,427,98,193950,65,52848,45,987,99,198110,83,496.5,25,128,65536,,,512,352,6,24513,99,2604,32,334,5,24921,99,333,6,41543us,128ms,186ms,14678us,67639us,76024us,32152ms,2307us,4148ms,31036ms,493us,23065ms


[[Much better.  But now compare ext4]]

# mkfs.ext4 /dev/mapper/castor0-test
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=4 blocks, Stripe width=16 blocks
2097152 inodes, 8388608 blocks
419430 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
256 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624

Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 22 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
# mount /dev/mapper/castor0-test /mnt/tmp
# mount | grep /mnt/tmp
/dev/mapper/castor0-test on /mnt/tmp type ext4 (rw)
# mkdir /mnt/tmp/foo
# chown wd.wd /mnt/tmp/foo
# bonnie++ -d /mnt/tmp/foo -m ext4 -u wd -g wd
...
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ext4            16G   248  99 128657  49 61267  49  1026  97 236552  85 710.9  35
Latency             78833us     567ms    2586ms   37539us   61572us   88413us
Version  1.96       ------Sequential Create------ --------Random Create--------
ext4                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 14841  52 +++++ +++ 23164  70 20409  78 +++++ +++ 23441  73
Latency               206us    2384us    2372us    2322us      78us    2335us
1.96,1.96,ext4,1,1295954392,16G,,248,99,128657,49,61267,49,1026,97,236552,85,710.9,35,16,,,,,14841,52,+++++,+++,23164,70,20409,78,+++++,+++,23441,73,78833us,567ms,2586ms,37539us,61572us,88413us,206us,2384us,2372us,2322us,78us,2335us

[[Only 2/3 of the speed of XFS for block write, but nearly 20% faster
for block read.  But orders of magnitude faster for file creates / deletes!]]

[[add nobarrier]]

# mount -o remount,nobarrier /mnt/tmp
# mount | grep /mnt/tmp
/dev/mapper/castor0-test on /mnt/tmp type ext4.2 (rw,nobarrier)
# bonnie++ -d /mnt/tmp/foo -m ext4 -u wd -g wd
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ext4.2          16G   241  99 125446  50 57726  55   945  97 215698  87 509.2  54
Latency             81198us    1085ms    2479ms   46401us     111ms   83051us
Version  1.96       ------Sequential Create------ --------Random Create--------
ext4                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 12476  63 +++++ +++ 23990  66 21185  82 +++++ +++ 23039  82
Latency               440us    1019us    1094us     238us      25us     215us
1.96,1.96,ext4.2,1,1295996176,16G,,241,99,125446,50,57726,55,945,97,215698,87,509.2,54,16,,,,,12476,63,+++++,+++,23990,66,21185,82,+++++,+++,23039,82,81198us,1085ms,2479ms,46401us,111ms,83051us,440us,1019us,1094us,238us,25us,215us

[[Again, degradation of about 10% for block read, with only minor
advantages for seq. delete and random create]]



Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
For those who like this sort of thing, this is the sort of thing they
like.                                               - Abraham Lincoln

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-26  7:16       ` Wolfgang Denk
@ 2011-01-26  8:32         ` Stan Hoeppner
  2011-01-26  8:42           ` Wolfgang Denk
  2011-01-26  9:38         ` Christoph Hellwig
  1 sibling, 1 reply; 46+ messages in thread
From: Stan Hoeppner @ 2011-01-26  8:32 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: Christoph Hellwig, linux-raid

Wolfgang Denk put forth on 1/26/2011 1:16 AM:

> I will not have a single file system, but several, so I'd probably go
> with LVM. But - when I then create a LV, eventually smaller than any
> of the disks, will the data (and thus the traffic) be really distri-
> buted over all drives, or will I not basicly see the same results as
> when using a single drive?

If you are creating multiple filesystems, concatenation is probably not what
you want, for the reasons you suspect, at least if you want the IO spread
across all 4 disks for all operations on all filesystems.

> # lvcreate -L 32G -n test castor0
>   Logical volume "test" created
> # mkfs.xfs /dev/mapper/castor0-test

Is this on that set of 4 low end Maxtor disks?  Is the above LV sitting atop
RAID 0, RAID 5, or concatenation?

> [[Only 2/3 of the speed of XFS for block write, but nearly 20% faster
> for block read.  But magnitudes faster for file creates / deletes!]]

Try adding some concurrency, say 8, to bonnie++ and retest both XFS and ext4.
XFS was designed/optimized for parallel workloads, not single thread workloads
(although it can extract some concurrency from a single thread workload).  XFS
really shines with parallel workloads  (assuming the underlying hardware isn't
junk, and the mdraid/lvm configuration is sane).  ext4 will probably always beat
XFS performance with single thread workloads, and I don't believe anyone is
surprised by that.  For most moderate to heavy parallel workloads, XFS usually
trounces ext4 (and all other Linux filesystems).
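
For instance something along these lines, assuming your bonnie++ build
supports the -c concurrency option (path, label and user are
placeholders):

# bonnie++ -d /mnt/tmp/foo -c 8 -n 128:65536:0:512 -m xfs-c8 -u wd -g wd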

-- 
Stan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-26  8:32         ` Stan Hoeppner
@ 2011-01-26  8:42           ` Wolfgang Denk
  0 siblings, 0 replies; 46+ messages in thread
From: Wolfgang Denk @ 2011-01-26  8:42 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Christoph Hellwig, linux-raid

Dear Stan Hoeppner,

In message <4D3FDC31.3010502@hardwarefreak.com> you wrote:
> 
> > # lvcreate -L 32G -n test castor0
> >   Logical volume "test" created
> > # mkfs.xfs /dev/mapper/castor0-test
> 
> Is this on that set of 4 low end Maxtor disks?  Is the above LV sitting atop
> RAID 0, RAID 5, or concatenation?

No, this is the other system, using 6 x Seagate ST31000524NS on a
Marvell MV88SX6081 8-port SATA II PCI-X Controller.

LVM is sitting on top of a RAID6 here:

md2 : active raid6 sda[0] sdi[5] sdh[4] sde[3] sdd[2] sdb[1]
      3907049792 blocks super 1.2 level 6, 16k chunk, algorithm 2 [6/6] [UUUUUU]

> Try adding some concurrency, say 8, to bonnie++ and retest both XFS and ext4.

OK, will do.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
The price of curiosity is a terminal experience.
                         - Terry Pratchett, _The Dark Side of the Sun_

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-26  7:16       ` Wolfgang Denk
  2011-01-26  8:32         ` Stan Hoeppner
@ 2011-01-26  9:38         ` Christoph Hellwig
  2011-01-26  9:41           ` CoolCold
  1 sibling, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2011-01-26  9:38 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: Christoph Hellwig, linux-raid

On Wed, Jan 26, 2011 at 08:16:16AM +0100, Wolfgang Denk wrote:
> I will not have a single file system, but several, so I'd probably go
> with LVM. But - when I then create a LV, eventually smaller than any
> of the disks, will the data (and thus the traffic) be really distri-
> buted over all drives, or will I not basicly see the same results as
> when using a single drive?

Think about it:  if you're doing small IOPs, they usually are smaller
than the stripe size and you will hit only one disk anyway.  But with
a raid0, which disk you hit is relatively unpredictable.  With a
concatenation aligned to the AGs, XFS will distribute processes writing
data to the different AGs and thus the different disks, and you can
reliably get performance out of them.

If you have multiple filesystems the setup depends a lot on the
workloads you plan to put on the filesystems.  If all of the filesystems
on it are busy at the same time, just assigning disks to filesystems
probably gives you the best performance.  If they are busy at different
times, or some are not busy at all, you first want to partition the disks
into areas for each filesystem and then concatenate them into volumes
for each filesystem.
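
A rough sketch of the second variant (device names and partition layout
are placeholders): give each filesystem one partition per disk and
build a linear array per filesystem from the matching partitions:

# mdadm --create /dev/md10 --level=linear --raid-devices=4 /dev/sd[abcd]1
# mdadm --create /dev/md11 --level=linear --raid-devices=4 /dev/sd[abcd]2
# mkfs.xfs -d agcount=8 /dev/md10
# mkfs.xfs -d agcount=8 /dev/md11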


> [[Note: Block write: drop to 60%, Block read drops to <50%]]

How is the cpu load?  delaylog trades I/O operations for cpu
utilization.  Together with a raid6, which apparently is the system you
use here, it might overload your system.
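
One way to check is to watch the machine with standard tools in a
second terminal while the benchmark runs:

# vmstat 1
# iostat -x 1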

And btw, in future please state that you have numbers for a totally
different setup than the one you're asking questions about.  Comparing
a raid6 setup to striping/concatenation is completely irrelevant.

> 
> [[Add nobarriers]]
> 
> # mount -o remount,nobarriers /mnt/tmp
> # mount | grep /mnt/tmp
> /dev/mapper/castor0-test on /mnt/tmp type xfs (rw,noatime,delaylog,logbsize=262144,nobarriers)

 a) the option is called nobarrier
 b) it looks like your mount implementation is really buggy as it shows
    random options that weren't actually parsed and accepted by the
    filesystem.

> [[Again, degradation of about 10% for block read; with only minod
> advantages for seq. delete and random create]]

I really don't trust the numbers.  nobarrier sends down fewer I/O
requests, and avoids all kinds of queue stalls.  How repeatable are these
benchmarks?  Do you also see this using a less hacky benchmark than
bonnie++?


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Optimize RAID0 for max IOPS?
  2011-01-26  9:38         ` Christoph Hellwig
@ 2011-01-26  9:41           ` CoolCold
  0 siblings, 0 replies; 46+ messages in thread
From: CoolCold @ 2011-01-26  9:41 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Wolfgang Denk, linux-raid

On Wed, Jan 26, 2011 at 12:38 PM, Christoph Hellwig <hch@infradead.org> wrote:
> On Wed, Jan 26, 2011 at 08:16:16AM +0100, Wolfgang Denk wrote:
>> I will not have a single file system, but several, so I'd probably go
>> with LVM. But - when I then create a LV, eventually smaller than any
>> of the disks, will the data (and thus the traffic) be really distri-
>> buted over all drives, or will I not basicly see the same results as
>> when using a single drive?
>
> Think about it:  if you're doing small IOPs, they usually are smaller
> than the stripe size and you will hit only one disk anyway.  But with
> a raid0 which disk you hit is relatively unpredictable.  With a
> concatenation aligned to the AGs XFS will distribute processes writing
> data to the different AGs and thus the different disks, and you can
> reliably get performance out of them.
>
> If you have multiple filesystems the setup depends a lot on the
> workloads you plan to put on the filesystems.  If all of the filesystems
> on it are busy at the same time just assigning disks to filesystems
> probably gives you the best performance.  If they are busy at different
> times, or some are not busy at all you first want to partition the disk
> into areas for each filesystem and then concatenate them into volumes
> for each filesystem.
>
>
>> [[Note: Block write: drop to 60%, Block read drops to <50%]]
>
> How is the cpu load?  delaylog trades I/O operations for cpu
> utilization.  Together with a raid6, which apparently is the system you
> use here, it might overload your system.
>
> And btw, in future please state that you have numbers for a totally
> different setup than the one you're asking questions about.  Comparing a raid6 setup
> to striping/concatenation is completely irrelevant.
>
>>
>> [[Add nobarriers]]
>>
>> # mount -o remount,nobarriers /mnt/tmp
>> # mount | grep /mnt/tmp
>> /dev/mapper/castor0-test on /mnt/tmp type xfs (rw,noatime,delaylog,logbsize=262144,nobarriers)
>
>  a) the option is called nobarrier
>  b) it looks like your mount implementation is really buggy as it shows
>    random options that weren't actually parsed and accepted by the
>    filesystem.
cat /proc/mounts may help, I guess

>
>> [[Again, degradation of about 10% for block read; with only minod
>> advantages for seq. delete and random create]]
>
> I really don't trust the numbers.  nobarrier sends down fewer I/O
> requests, and avoids all kinds of queue stalls.  How repeatable are these
> benchmarks?  Do you also see this using a less hacky benchmark than
> bonnie++?
>



-- 
Best regards,
[COOLCOLD-RIPN]

^ permalink raw reply	[flat|nested] 46+ messages in thread
