* Optimize RAID0 for max IOPS? @ 2011-01-18 21:01 Wolfgang Denk 2011-01-18 22:18 ` Roberto Spadim ` (2 more replies) 0 siblings, 3 replies; 46+ messages in thread
From: Wolfgang Denk @ 2011-01-18 21:01 UTC (permalink / raw)
To: linux-raid

Hi,

I'm going to replace a h/w based RAID system (3ware 9650SE) with a plain
s/w RAID0, because the existing system appears to be seriously limited
in terms of the number of I/O operations per second.

Our workload is mixed read / write (somewhere between 80% read / 20%
write and 50% / 50%), consisting of a very large number of usually
very small files.

There may be 20...50 million files, or more. 65% of the files are
smaller than 4 kB; 80% are smaller than 8 kB; 90% are smaller than
16 kB; 98.4% are smaller than 64 kB.

I will have 4 x 1 TB disks for this setup.

The plan is to build a RAID0 from the 4 devices, create a physical
volume and a volume group on the resulting /dev/md?, then create 2 or
3 logical volumes that will be used as XFS file systems.

My goal is to optimize for the maximum number of I/O operations per
second. [I am aware that using SSDs would be a nice thing, but that
would be too expensive.]

Is this a reasonable approach for such a task?

Should I do anything different to achieve maximum performance?

What are the tunables in this setup? [It seems the usual recipes are
more oriented towards maximizing the data throughput for large, mostly
sequential accesses - I figure that things like increasing read-ahead
etc. will not help me much here?]

Thanks in advance.

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Quote from a recent meeting: "We are going to continue having these
meetings every day until I find out why no work is getting done."

^ permalink raw reply [flat|nested] 46+ messages in thread
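For readers who want to reproduce the setup described above, a minimal sketch follows. It is not from the original thread: the device names (/dev/sd[b-e]), the 64 KiB chunk size, and the volume and mount-point names are assumptions chosen purely for illustration.

  # assumed member disks: /dev/sdb /dev/sdc /dev/sdd /dev/sde
  mdadm --create /dev/md0 --level=0 --raid-devices=4 --chunk=64 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde
  pvcreate /dev/md0
  vgcreate vg_data /dev/md0
  lvcreate -L 500G -n lv_files vg_data
  # su/sw should match the chunk size and the number of data disks;
  # recent mkfs.xfs usually detects this from the md device by itself
  mkfs.xfs -d su=64k,sw=4 /dev/vg_data/lv_files
  mount -o noatime,inode64,logbufs=8 /dev/vg_data/lv_files /srv/files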
* Re: Optimize RAID0 for max IOPS? 2011-01-18 21:01 Optimize RAID0 for max IOPS? Wolfgang Denk @ 2011-01-18 22:18 ` Roberto Spadim 2011-01-19 7:04 ` Wolfgang Denk 2011-01-18 23:15 ` Stefan /*St0fF*/ Hübner 2011-01-25 17:10 ` Christoph Hellwig 2 siblings, 1 reply; 46+ messages in thread
From: Roberto Spadim @ 2011-01-18 22:18 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: linux-raid

It's an interesting question; I don't know what the best setup is either, but...

I haven't created a partition on a /dev/mdXXX device yet (Linux 2.6.29), so
maybe it's not possible. You could try partitioning all hard drives, making
many partitions and building a RAID on each one. Another way would be LVM on
top of mdXXX and then partitioning that (can an LVM volume be partitioned?).

Another optimization is the per-disk elevator (at the Linux level); you can
find it under /sys/ (try find -iname elevator, or find -iname scheduler, I
don't remember the file name).

Linux RAID0 has a nice read/write algorithm for hard disks (I think) - test it.

The best solution is no partitions: when md is built on the whole disk rather
than on a partition, the disk head position is known more accurately, which
makes the read_balance algorithm work better.

2011/1/18 Wolfgang Denk <wd@denx.de>:
> Hi,
>
> I'm going to replace a h/w based RAID system (3ware 9650SE) with a plain
> s/w RAID0, because the existing system appears to be seriously limited
> in terms of the number of I/O operations per second.
>
> Our workload is mixed read / write (somewhere between 80% read / 20%
> write and 50% / 50%), consisting of a very large number of usually
> very small files.
>
> There may be 20...50 million files, or more. 65% of the files are
> smaller than 4 kB; 80% are smaller than 8 kB; 90% are smaller than
> 16 kB; 98.4% are smaller than 64 kB.
>
> I will have 4 x 1 TB disks for this setup.
>
> The plan is to build a RAID0 from the 4 devices, create a physical
> volume and a volume group on the resulting /dev/md?, then create 2 or
> 3 logical volumes that will be used as XFS file systems.
>
> My goal is to optimize for the maximum number of I/O operations per
> second. [I am aware that using SSDs would be a nice thing, but that
> would be too expensive.]
>
> Is this a reasonable approach for such a task?
>
> Should I do anything different to achieve maximum performance?
>
> What are the tunables in this setup? [It seems the usual recipes are
> more oriented towards maximizing the data throughput for large, mostly
> sequential accesses - I figure that things like increasing read-ahead
> etc. will not help me much here?]
>
> Thanks in advance.
>
> Best regards,
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> Quote from a recent meeting: "We are going to continue having these
> meetings every day until I find out why no work is getting done."
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 46+ messages in thread
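The per-disk elevator Roberto refers to lives under /sys; a quick sketch of how to inspect and change it (the device name and the choice of deadline are examples, not a recommendation from the thread):

  cat /sys/block/sda/queue/scheduler        # e.g. "noop deadline [cfq]" - brackets mark the active one
  echo deadline > /sys/block/sda/queue/scheduler
  # to make one scheduler the boot-time default for all disks,
  # pass e.g. elevator=deadline on the kernel command line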
* Re: Optimize RAID0 for max IOPS? 2011-01-18 22:18 ` Roberto Spadim @ 2011-01-19 7:04 ` Wolfgang Denk 0 siblings, 0 replies; 46+ messages in thread From: Wolfgang Denk @ 2011-01-19 7:04 UTC (permalink / raw) To: Roberto Spadim; +Cc: linux-raid Dear Roberto Spadim, In message <AANLkTimzwgg_Htj4rMxjdjhMQHExeWOqzd5Puu9KbXug@mail.gmail.com> you wrote: > > try partitioning all hard drives and make many paritions and make raid > on each one > another way could be a lvm over mdxxx and try to partition it (can lvm > be partitioned?) I do not intend to use any partitions. I will use LVM on the full device /dev/mdX, and then use logical volumes. Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de What was sliced bread the greatest thing since? ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Optimize RAID0 for max IOPS? 2011-01-18 21:01 Optimize RAID0 for max IOPS? Wolfgang Denk 2011-01-18 22:18 ` Roberto Spadim @ 2011-01-18 23:15 ` Stefan /*St0fF*/ Hübner 2011-01-19 0:05 ` Roberto Spadim ` (2 more replies) 2011-01-25 17:10 ` Christoph Hellwig 2 siblings, 3 replies; 46+ messages in thread From: Stefan /*St0fF*/ Hübner @ 2011-01-18 23:15 UTC (permalink / raw) To: Wolfgang Denk; +Cc: linux-raid Hi, [in German:] Schätzelein, Dein Problem sind die Platten, nicht der Controller. [in English:] Dude, the disks are your bottleneck. On a 4-disk RAID0 software RAID can only outspeed this 3ware Controller with a really really fast processor. The limiting factor is the disk's access time. If SSDs are too expensive, then your actual performance is the max you'll get (maybe to replace the HWRAID controller might give a little speed-up, but not very much). All the best, Stefan Am 18.01.2011 22:01, schrieb Wolfgang Denk: > Hi, > > I'm going to replace a h/w based RAID system (3ware 9650SE) by a plain > s/w RAID0, because the existing system appears to be seriously limited > in terms of numbers of I/O operations per second. > > Our workload is mixed read / write (something between 80% read / 20% > write and 50% / 50%), consisting of a very large number of usually > very small files. > > There may be 20...50 millions of files, or more. 65% of the files are > smaller than 4 kB; 80% are smaller than 8 kB; 90% are smaller than 16 > kB; 98.4% are smaller than 64 kB. > > I will have 4 x 1 TB disks for this setup. > > The plan is to build a RAID0 from the 4 devices, create a physical > volume and a volume group on the resulting /dev/md?, then create 2 or > 3 logical volumes that will be used as XFS file systems. > > My goal is to optimize for maximum number of I/O operations per > second. [I am aware that using SSDs would be a nice thing, but that > would be too expensive.] > > Is this a reasonable approach for such a task? > > Should I do anything different to acchive maximum performance? > > What are the tunables in this setup? [It seems the usual recipies are > more oriented in maximizing the data troughput for large, mostly > sequential accesses - I figure that things like increasing read-ahead > etc. will not help me much here?] > > Thanks in advance. > > Best regards, > > Wolfgang Denk > -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Optimize RAID0 for max IOPS? 2011-01-18 23:15 ` Stefan /*St0fF*/ Hübner @ 2011-01-19 0:05 ` Roberto Spadim 2011-01-19 7:11 ` Wolfgang Denk 2011-01-19 7:10 ` Wolfgang Denk 2011-01-19 19:21 ` Wolfgang Denk 2 siblings, 1 reply; 46+ messages in thread From: Roberto Spadim @ 2011-01-19 0:05 UTC (permalink / raw) To: stefan.huebner; +Cc: Wolfgang Denk, linux-raid maybe removing hwraid and using swraid may reduce speed (depend how much cpu you use with hw and with sw) what we can optimize? less I/O per seconds making as much useful read/write data on array, how? good read/write algorithms for raid. (for each device type, ssd, hd) but... like stefan, disks are your bottleneck 2011/1/18 Stefan /*St0fF*/ Hübner <stefan.huebner@stud.tu-ilmenau.de>: > Hi, > > [in German:] Schätzelein, Dein Problem sind die Platten, nicht der > Controller. > > [in English:] Dude, the disks are your bottleneck. > > On a 4-disk RAID0 software RAID can only outspeed this 3ware Controller > with a really really fast processor. The limiting factor is the disk's > access time. If SSDs are too expensive, then your actual performance is > the max you'll get (maybe to replace the HWRAID controller might give a > little speed-up, but not very much). > > All the best, > Stefan > > Am 18.01.2011 22:01, schrieb Wolfgang Denk: >> Hi, >> >> I'm going to replace a h/w based RAID system (3ware 9650SE) by a plain >> s/w RAID0, because the existing system appears to be seriously limited >> in terms of numbers of I/O operations per second. >> >> Our workload is mixed read / write (something between 80% read / 20% >> write and 50% / 50%), consisting of a very large number of usually >> very small files. >> >> There may be 20...50 millions of files, or more. 65% of the files are >> smaller than 4 kB; 80% are smaller than 8 kB; 90% are smaller than 16 >> kB; 98.4% are smaller than 64 kB. >> >> I will have 4 x 1 TB disks for this setup. >> >> The plan is to build a RAID0 from the 4 devices, create a physical >> volume and a volume group on the resulting /dev/md?, then create 2 or >> 3 logical volumes that will be used as XFS file systems. >> >> My goal is to optimize for maximum number of I/O operations per >> second. [I am aware that using SSDs would be a nice thing, but that >> would be too expensive.] >> >> Is this a reasonable approach for such a task? >> >> Should I do anything different to acchive maximum performance? >> >> What are the tunables in this setup? [It seems the usual recipies are >> more oriented in maximizing the data troughput for large, mostly >> sequential accesses - I figure that things like increasing read-ahead >> etc. will not help me much here?] >> >> Thanks in advance. >> >> Best regards, >> >> Wolfgang Denk >> > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Optimize RAID0 for max IOPS? 2011-01-19 0:05 ` Roberto Spadim @ 2011-01-19 7:11 ` Wolfgang Denk 2011-01-19 8:18 ` Stefan /*St0fF*/ Hübner 0 siblings, 1 reply; 46+ messages in thread From: Wolfgang Denk @ 2011-01-19 7:11 UTC (permalink / raw) To: Roberto Spadim; +Cc: stefan.huebner, linux-raid Dear Roberto Spadim, In message <AANLkTi=v6yA_0OOfi2ymA67X0x+KsvV9VC5OgG+0DvKq@mail.gmail.com> you wrote: > maybe removing hwraid and using swraid may reduce speed (depend how > much cpu you use with hw and with sw) > what we can optimize? less I/O per seconds making as much useful > read/write data on array, how? good read/write algorithms for raid. > (for each device type, ssd, hd) but... like stefan, disks are your > bottleneck No, they are not. Run some benchmarks yourself if you don't believe me. Even a single disk drive is performing better than the hw RAID under this workload. Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de WARNING: This Product Attracts Every Other Piece of Matter in the Universe, Including the Products of Other Manufacturers, with a Force Proportional to the Product of the Masses and Inversely Proportional to the Distance Between Them. ^ permalink raw reply [flat|nested] 46+ messages in thread
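One way to put numbers behind "run some benchmarks yourself" is fio with a small-block, mixed random workload roughly matching the file-size profile from the first mail. The job below is only a sketch - the target directory, read/write mix and job counts are assumptions, not the benchmark Wolfgang actually ran:

  fio --name=smallfile --directory=/mnt/test --ioengine=libaio --direct=1 \
      --rw=randrw --rwmixread=70 --bs=4k --size=1G --numjobs=8 --iodepth=16 \
      --runtime=60 --time_based --group_reporting

Running the same job once against the 3ware volume and once against the md array gives directly comparable IOPS figures.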
* Re: Optimize RAID0 for max IOPS? 2011-01-19 7:11 ` Wolfgang Denk @ 2011-01-19 8:18 ` Stefan /*St0fF*/ Hübner 2011-01-19 8:29 ` Jaap Crezee 0 siblings, 1 reply; 46+ messages in thread From: Stefan /*St0fF*/ Hübner @ 2011-01-19 8:18 UTC (permalink / raw) To: Wolfgang Denk; +Cc: Roberto Spadim, linux-raid Am 19.01.2011 08:11, schrieb Wolfgang Denk: > Dear Roberto Spadim, > > In message <AANLkTi=v6yA_0OOfi2ymA67X0x+KsvV9VC5OgG+0DvKq@mail.gmail.com> you wrote: >> maybe removing hwraid and using swraid may reduce speed (depend how >> much cpu you use with hw and with sw) >> what we can optimize? less I/O per seconds making as much useful >> read/write data on array, how? good read/write algorithms for raid. >> (for each device type, ssd, hd) but... like stefan, disks are your >> bottleneck > > No, they are not. Run some benchmarks yourself if you don't believe > me. Lol - I wouldn't have answered in the first place if I didn't have any expertise. So suit yourself - as you don't bring up any real numbers (remember: you've got the weird setup, you asked, you don't have enough money for the enterprise solution - so ...) nobody who worked with 3ware controllers will believe you. > > Even a single disk drive is performing better than the hw RAID under > this workload. Well - that is the problem - simulate YOUR workload. Actually I fear at least one of your disks has a grown defect, which slows down / blocks i/o. Haven't seen any 9650SE RAID being slower than the same config in a software raid. > > Best regards, > > Wolfgang Denk > ^ permalink raw reply [flat|nested] 46+ messages in thread
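The grown defect Stefan suspects can be checked with smartmontools; a sketch only (the device names are examples, and a disk sitting behind a 3ware controller has to be addressed through the controller's pass-through device):

  smartctl -a /dev/sda | egrep -i 'reallocated|pending|uncorrectable'
  # for a disk behind a 3ware card, e.g. the first port:
  smartctl -a -d 3ware,0 /dev/twa0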
* Re: Optimize RAID0 for max IOPS? 2011-01-19 8:18 ` Stefan /*St0fF*/ Hübner @ 2011-01-19 8:29 ` Jaap Crezee 2011-01-19 9:32 ` Jan Kasprzak 0 siblings, 1 reply; 46+ messages in thread
From: Jaap Crezee @ 2011-01-19 8:29 UTC (permalink / raw)
To: stefan.huebner; +Cc: Wolfgang Denk, Roberto Spadim, linux-raid

On 01/19/11 09:18, Stefan /*St0fF*/ Hübner wrote:
> Am 19.01.2011 08:11, schrieb Wolfgang Denk:
> Lol - I wouldn't have answered in the first place if I didn't have any
> expertise. So suit yourself - as you don't bring up any real numbers
> (remember: you've got the weird setup, you asked, you don't have enough
> money for the enterprise solution - so ...) nobody who worked with 3ware
> controllers will believe you.

Here's one: I switched from 3ware hardware based RAID to Linux software
RAID and I am getting better throughput. I had a 3ware PCI-X card (I
don't know which type by heart).

Okay, to be honest I did not have a (enterprise solution?)
battery-backup-unit. So probably no write caching...

Jaap
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Optimize RAID0 for max IOPS? 2011-01-19 8:29 ` Jaap Crezee @ 2011-01-19 9:32 ` Jan Kasprzak 0 siblings, 0 replies; 46+ messages in thread From: Jan Kasprzak @ 2011-01-19 9:32 UTC (permalink / raw) To: Jaap Crezee; +Cc: stefan.huebner, Wolfgang Denk, Roberto Spadim, linux-raid Jaap Crezee wrote: : On 01/19/11 09:18, Stefan /*St0fF*/ Hübner wrote: : >Am 19.01.2011 08:11, schrieb Wolfgang Denk: : >Lol - I wouldn't have answered in the first place if I didn't have any : >expertise. So suit yourself - as you don't bring up any real numbers : >(remember: you've got the weird setup, you asked, you don't have enough : >money for the enterprise solution - so ...) nobody who worked with 3ware : >controllers will believe you. : : Here's one: I switched from 3ware hardware based raid to linux software : raid and I am getting better throughputs. I had a 3ware PCI-X car (don't : know which type by hearth). : Okay, to be honest I did not have a (enterprise solution?) : battery-backup-unit. So probably no write caching... : A "me too": 3ware 9550SX with 8 drives, RAID-5. The performance (especially latency) was very bad. After I switched to the md SW RAID and lowered the TCQ depth in the 3ware controller to 16[*], the filesystem and latency feels much faster. The only problem I had was a poor interaction of the CFQ iosched with the RAID-5 rebuild process, but I have fixed this by moving to deadline I/O scheduler. Another case was the LSI SAS 2008 (I admit it is a pretty low-end HW RAID controller): 10 disks WD RE4 black 2TB in HW and SW RAID-10 configurations: time mkfs.ext4 /dev/md0 # SW RAID real 8m4.783s user 0m9.255s sys 2m30.107s time mkfs.ext4 -F /dev/sdb # HW RAID real 22m13.503s user 0m9.763s sys 2m51.371s The problem with HW RAID is that today's computers can dedicate tens of gigabytes to buffer cache, which allows the I/O scheduler to reorder requests based on latency and other criteria, which no RAID controller can match, because it cannot see which requests are latency-critical and which are not. Also, Linux I/O scheduler works really hard to keep all spindles busy, while when you fill the TC queue of a HW RAID volume with requests which maps to one or small number of physical disks, there is no way the controller can tell "send me more requests, but not from this area of the HW RAID volume". [*] 3ware driver is especially bad here, because its default queue depth is 1024, IIRC, which makes the whole I/O scheduler with queue size 512 a no-op. Think bufferbloat in the storage area. -- | Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> | | GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E | | http://www.fi.muni.cz/~kas/ Journal: http://www.fi.muni.cz/~kas/blog/ | Please don't top post and in particular don't attach entire digests to your mail or we'll all soon be using bittorrent to read the list. --Alan Cox -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 46+ messages in thread
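The queue-depth knob Jan mentions is exposed per device in sysfs; a sketch, assuming one member disk is /dev/sda (the per-disk scheduler switch is shown earlier in the thread):

  cat /sys/block/sda/device/queue_depth     # current TCQ/NCQ depth for this device
  echo 16 > /sys/block/sda/device/queue_depth
  cat /sys/block/sda/queue/nr_requests      # block-layer queue size (default 128)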
* Re: Optimize RAID0 for max IOPS? 2011-01-18 23:15 ` Stefan /*St0fF*/ Hübner 2011-01-19 0:05 ` Roberto Spadim @ 2011-01-19 7:10 ` Wolfgang Denk 2011-01-19 19:21 ` Wolfgang Denk 2 siblings, 0 replies; 46+ messages in thread
From: Wolfgang Denk @ 2011-01-19 7:10 UTC (permalink / raw)
To: stefan.huebner; +Cc: linux-raid

Dear Stefan /*St0fF*/ Hübner,

In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote:
>
> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der
> Controller.

Irrtum.

> [in English:] Dude, the disks are your bottleneck.

Wrong.

Testing the same workload with soft RAID versus the h/w RAID solution
gives a _significant_ performance difference. I happen to know which
benchmarks 3ware (and other RAID controller manufacturers) are
optimizing their firmware for - IOPS is not even mentioned there.

> On a 4-disk RAID0 software RAID can only outspeed this 3ware Controller
> with a really really fast processor. The limiting factor is the disk's
> access time. If SSDs are too expensive, then your actual performance is
> the max you'll get (maybe to replace the HWRAID controller might give a
> little speed-up, but not very much).

From some tests done before I expect to see a speed increase of >10.
Hey, even a single disk is performing better under this work load.

And a fast processor? Yes, I have it, but what for? It spends most of
its time (>90%, usually more) in iowait.

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
"To IBM, 'open' means there is a modicum of interoperability among
some of their equipment." - Harv Masterson
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Optimize RAID0 for max IOPS? 2011-01-18 23:15 ` Stefan /*St0fF*/ Hübner 2011-01-19 0:05 ` Roberto Spadim 2011-01-19 7:10 ` Wolfgang Denk @ 2011-01-19 19:21 ` Wolfgang Denk 2011-01-19 19:50 ` Roberto Spadim 2011-01-24 14:40 ` CoolCold 2 siblings, 2 replies; 46+ messages in thread
From: Wolfgang Denk @ 2011-01-19 19:21 UTC (permalink / raw)
To: stefan.huebner; +Cc: linux-raid

Dear Stefan /*St0fF*/ Hübner,

In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote:
>
> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der
> Controller.
>
> [in English:] Dude, the disks are your bottleneck.
...

Maybe we can stop speculating about what might be the cause of the
problems in some setup I do NOT intend to use, and rather discuss the
questions I asked.

> > I will have 4 x 1 TB disks for this setup.
> >
> > The plan is to build a RAID0 from the 4 devices, create a physical
> > volume and a volume group on the resulting /dev/md?, then create 2 or
> > 3 logical volumes that will be used as XFS file systems.

Clarification: I'll run /dev/md* on the raw disks, without any
partitions on them.

> > My goal is to optimize for the maximum number of I/O operations per
> > second. ...
> >
> > Is this a reasonable approach for such a task?
> >
> > Should I do anything different to achieve maximum performance?
> >
> > What are the tunables in this setup? [It seems the usual recipes are
> > more oriented towards maximizing the data throughput for large, mostly
> > sequential accesses - I figure that things like increasing read-ahead
> > etc. will not help me much here?]

So can anybody help answering these questions:

- are there any special options when creating the RAID0 to make it
  perform faster for such a use case?
- are there other tunables, any special MD / LVM / file system /
  read ahead / buffer cache / ... parameters to look for?

Thanks.

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Boykottiert Microsoft - Kauft Eure Fenster bei OBI!
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 46+ messages in thread
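Not an answer from the thread, but for reference: the read-ahead and buffer-cache knobs Wolfgang asks about live in the places sketched below (the md device name and the values shown are only examples):

  blockdev --getra /dev/md0                 # current read-ahead, in 512-byte sectors
  blockdev --setra 256 /dev/md0             # keep read-ahead small for random small-file I/O
  cat /sys/block/md0/queue/read_ahead_kb    # the same setting expressed in KiB
  sysctl vm.dirty_background_ratio vm.dirty_ratio   # page-cache writeback thresholds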
* Re: Optimize RAID0 for max IOPS? 2011-01-19 19:21 ` Wolfgang Denk @ 2011-01-19 19:50 ` Roberto Spadim 2011-01-19 22:36 ` Stefan /*St0fF*/ Hübner 0 siblings, 1 reply; 46+ messages in thread
From: Roberto Spadim @ 2011-01-19 19:50 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: stefan.huebner, linux-raid

So can anybody help answering these questions:

- are there any special options when creating the RAID0 to make it
  perform faster for such a use case?
- are there other tunables, any special MD / LVM / file system / read
  ahead / buffer cache / ... parameters to look for?

Let's see:

What is each disk's (SSD, SAS or SATA) best block size for reads and
writes? Write that down as (A). And what is your workload - 50% write,
50% read?

The RAID0 chunk size should be a multiple of (A).
The filesystem block size should be a multiple of (A) for all disks.
The read-ahead should be a multiple of (A).

For example, if /dev/sda prefers 1 kB and /dev/sdb prefers 4 kB, 6 kB
would not work; use 4 kB, 8 kB or 16 kB (multiples of both 1 kB and 4 kB).

Check the I/O scheduler per disk, too (SSDs should use noop; disks should
use cfq, deadline or another one). The async/sync and noatime mount
options in /etc/fstab reduce a lot of I/O as well, and you should optimize
your application too. Run hdparm on each disk to make sure DMA and the
fastest I/O options are in use.

Are you using only a filesystem, or something more - Samba, MySQL,
Apache, LVM? Each of these programs has its own tuning; check their
benchmarks.

Getting back to the controller question: what is a RAID controller?
A CPU + memory + disk controller + disks - but it only runs RAID
software (it could even run Linux...).

If your computer is slower than the RAID controller's CPU + memory +
disk controller, software RAID will be slower than hardware RAID. It is
a question of where the CPU/memory load of disk I/O goes (dedicated
hardware, or your own hardware) - got it? A super fast Xeon with DDR3
and fibre-attached disks running software RAID is faster than a hardware
RAID built from an ARM (or FPGA), DDRx memory and a SAS/fibre connection
to the disks.

Two solutions to the same problem - which is faster? Benchmark it. I
think that if your Xeon runs a database and a heavily loaded Apache, a
dedicated hardware RAID can be faster, but a lightly loaded Xeon can be
faster than a dedicated hardware RAID.

2011/1/19 Wolfgang Denk <wd@denx.de>:
> Dear Stefan /*St0fF*/ Hübner,
>
> In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote:
>>
>> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der
>> Controller.
>>
>> [in English:] Dude, the disks are your bottleneck.
> ...
>
> Maybe we can stop speculating about what might be the cause of the
> problems in some setup I do NOT intend to use, and rather discuss the
> questions I asked.
>
>> > I will have 4 x 1 TB disks for this setup.
>> >
>> > The plan is to build a RAID0 from the 4 devices, create a physical
>> > volume and a volume group on the resulting /dev/md?, then create 2 or
>> > 3 logical volumes that will be used as XFS file systems.
>
> Clarification: I'll run /dev/md* on the raw disks, without any
> partitions on them.
>
>> > My goal is to optimize for the maximum number of I/O operations per
>> > second. ...
>> >
>> > Is this a reasonable approach for such a task?
>> >
>> > Should I do anything different to achieve maximum performance?
>> >
>> > What are the tunables in this setup? [It seems the usual recipes are
>> > more oriented towards maximizing the data throughput for large, mostly
>> > sequential accesses - I figure that things like increasing read-ahead
>> > etc. will not help me much here?]
> > So can anybody help answering these questions: > > - are there any special options when creating the RAID0 to make it > perform faster for such a use case? > - are there other tunables, any special MD / LVM / file system / > read ahead / buffer cache / ... parameters to look for? > > Thanks. > > Wolfgang Denk > > -- > DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel > HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany > Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de > Boykottiert Microsoft - Kauft Eure Fenster bei OBI! > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 46+ messages in thread
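A concrete form of the mount-option and hdparm suggestions above, as a sketch only (the logical volume path and mount point are the assumed names from the earlier examples, not values from the thread):

  # /etc/fstab entry - noatime avoids an inode write for every file read
  /dev/vg_data/lv_files  /srv/files  xfs  noatime,nodiratime,inode64,logbufs=8  0 0

  hdparm -I /dev/sda | grep -i 'write cache'   # is the drive's write cache on?
  hdparm -W1 /dev/sda                          # enable it (mind the data-integrity trade-off)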
* Re: Optimize RAID0 for max IOPS? 2011-01-19 19:50 ` Roberto Spadim @ 2011-01-19 22:36 ` Stefan /*St0fF*/ Hübner 2011-01-19 23:09 ` Roberto Spadim 0 siblings, 1 reply; 46+ messages in thread From: Stefan /*St0fF*/ Hübner @ 2011-01-19 22:36 UTC (permalink / raw) To: Roberto Spadim; +Cc: Wolfgang Denk, linux-raid @Roberto: I guess you're right. BUT: i have not seen 900MB/s coming from (i.e. read access) a software raid, but I've seen it from a 9750 on a LSI SASx28 backplane, running RAID6 over 16disks (HDS722020ALA330). So one might not be wrong assuming on current raid-controllers hardware/software matching and timing is way more optimized than what mdraid might get at all. The 9650 and 9690 are considerably slower, but I've seen 550MB/s thruput from those, also (I don't recall the setup anymore, tho). Max reading I saw from a software raid was around 350MB/s - so hence my answers. And if people had problems with controllers which are 5 years or older by now, the numbers are not really comparable... Now again there's the point where there are also parameters on the controller that can be tweaked, and a simple way to recreate the testing scenario. We may discuss and throw in further numbers and experience, but not being able to recreate your specific scenario makes us talk past each other... stefan Am 19.01.2011 20:50, schrieb Roberto Spadim: > So can anybody help answering these questions: > > - are there any special options when creating the RAID0 to make it > perform faster for such a use case? > - are there other tunables, any special MD / LVM / file system / read > ahead / buffer cache / ... parameters to look for? > > lets see: > what´s your disk (ssd or sas or sata) best block size to write/read? > write this at ->(A) > what´s your work load? 50% write 50% read ? > > raid0 block size should be multiple of (A) > *****filesystem size should be multiple of (A) of all disks > *****read ahead should be a multiple of (A) > for example > /dev/sda 1kb > /dev/sdb 4kb > > you should use 6kb... you should use 4kb, 8kb, 16kb (multiple of 1kb and 4kb) > > check i/o sheduller per disk too (ssd should use noop, disks should > use cfq, deadline or another...) > async and sync option at mount /etc/fstab, noatime reduce a lot of i/o > too, you should optimize your application too > hdparm each disk to use dma and fastest i/o options > > are you using only filesystem? are you using somethink more? samba? > mysql? apache? lvm? > each of this programs have some tunning, check their benchmarks > > > getting back.... > what´s a raid controller? > cpu + memory + disk controller + disks > but... it only run raid software (it can run linux....) > > if you computer is slower than raid cpu+memory+disk controller, you > will have a slower software raid, than hardware raid > it´s like load balance on cpu/memory utilization of disk i/o (use > dedicated hardware, or use your hardware?) > got it? > using a super fast xeon with ddr3 and optical fiber running software > raid, is faster than a hardware raid using a arm (or fpga) ddrX memory > and sas(fiber optical) connection to disks > > two solutions for the same problem > what´s fast? 
benchmark it > i think that if your xeon run a database and a very workloaded apache, > a dedicated hardware raid can run faster, but a light xeon can run > faster than a dedicated hardware raid > > > > 2011/1/19 Wolfgang Denk <wd@denx.de>: >> Dear =?ISO-8859-15?Q?Stefan_/*St0fF*/_H=FCbner?=, >> >> In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote: >>> >>> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der >>> Controller. >>> >>> [in English:] Dude, the disks are your bottleneck. >> ... >> >> Maybe we can stop speculations about what might be the cause of the >> problems in some setup I do NOT intend to use, and rather discuss the >> questions I asked. >> >>>> I will have 4 x 1 TB disks for this setup. >>>> >>>> The plan is to build a RAID0 from the 4 devices, create a physical >>>> volume and a volume group on the resulting /dev/md?, then create 2 or >>>> 3 logical volumes that will be used as XFS file systems. >> >> Clarrification: I'll run /dev/md* on the raw disks, without any >> partitions on them. >> >>>> My goal is to optimize for maximum number of I/O operations per >>>> second. ... >>>> >>>> Is this a reasonable approach for such a task? >>>> >>>> Should I do anything different to acchive maximum performance? >>>> >>>> What are the tunables in this setup? [It seems the usual recipies are >>>> more oriented in maximizing the data troughput for large, mostly >>>> sequential accesses - I figure that things like increasing read-ahead >>>> etc. will not help me much here?] >> >> So can anybody help answering these questions: >> >> - are there any special options when creating the RAID0 to make it >> perform faster for such a use case? >> - are there other tunables, any special MD / LVM / file system / >> read ahead / buffer cache / ... parameters to look for? >> >> Thanks. >> >> Wolfgang Denk >> >> -- >> DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel >> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany >> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de >> Boykottiert Microsoft - Kauft Eure Fenster bei OBI! >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > > -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Optimize RAID0 for max IOPS? 2011-01-19 22:36 ` Stefan /*St0fF*/ Hübner @ 2011-01-19 23:09 ` Roberto Spadim 2011-01-19 23:18 ` Roberto Spadim 0 siblings, 1 reply; 46+ messages in thread
From: Roberto Spadim @ 2011-01-19 23:09 UTC (permalink / raw)
To: stefan.huebner; +Cc: Wolfgang Denk, linux-raid

The problem: with iostat or iotop on software RAID you only see disk
I/O - you don't see the memory (cache) I/O. With hardware RAID you only
see the RAID's I/O, and that can be a cache hit or a real disk read.

If you count memory plus disk I/O together, the two should give similar
values; if not, you will see high CPU usage. For example, take a RAID-x
of 10 disks on a hardware controller, then reconfigure the controller to
export the 10 disks directly to Linux and build the same RAID-x in
software: you get slower I/O, because there is still a controller
sitting between disk and CPU. Try it without the hardware RAID CPU -
just a plain (SAS/SATA) HBA, or 10 single-port SATA/SAS ports - and you
are still slower than the hardware controller (that's right!).

Now remove the SATA/SAS channel altogether and use a PCI-Express
RevoDrive or a PCI-Express Texas Memory Systems SSD: you get better
numbers than the hardware RAID. Why? You changed the hardware (OK, I
know), but you also moved the CPU closer to the disk. If you use disks
with cache you get more speed (a disk with an SSD/memory cache is faster
than a plain hard disk).

Why would hardware be faster than Linux? I don't think it is. It can
achieve smaller latencies with a good memory cache, but if your computer
uses DDR3 and your hardware RAID controller uses slower memory, your
DDR3 cache is faster...

How to benchmark it? Count disk I/O plus memory-cache I/O. If Linux is
faster, fine - you spend more of your computer's CPU and memory. If
Linux is slower, fine - you spend less CPU and memory, but you pay for
them on the hardware RAID instead. If you upgrade your memory and CPU,
software RAID can end up faster than your hardware RAID controller;
which is better for you?

Want a better read/write algorithm for software RAID? Write new
read/write code - you can do it, Linux is easier to change than hardware
RAID. Want a better read/write algorithm for hardware RAID? Call your
hardware vendor and ask: "please, I need a better firmware, could you
send me one?"

Got it?

2011/1/19 Stefan /*St0fF*/ Hübner <stefan.huebner@stud.tu-ilmenau.de>:
> @Roberto: I guess you're right. BUT: i have not seen 900MB/s coming from
> (i.e. read access) a software raid, but I've seen it from a 9750 on a
> LSI SASx28 backplane, running RAID6 over 16disks (HDS722020ALA330). So
> one might not be wrong assuming on current raid-controllers
> hardware/software matching and timing is way more optimized than what
> mdraid might get at all.
>
> The 9650 and 9690 are considerably slower, but I've seen 550MB/s thruput
> from those, also (I don't recall the setup anymore, tho).
>
> Max reading I saw from a software raid was around 350MB/s - so hence my
> answers. And if people had problems with controllers which are 5 years
> or older by now, the numbers are not really comparable...
>
> Now again there's the point where there are also parameters on the
> controller that can be tweaked, and a simple way to recreate the testing
> scenario. We may discuss and throw in further numbers and experience,
> but not being able to recreate your specific scenario makes us talk past
> each other...
>
> stefan
>
> Am 19.01.2011 20:50, schrieb Roberto Spadim:
>> So can anybody help answering these questions:
>>
>> - are there any special options when creating the RAID0 to make it
>> perform faster for such a use case?
>> - are there other tunables, any special MD / LVM / file system / read >> ahead / buffer cache / ... parameters to look for? >> >> lets see: >> what´s your disk (ssd or sas or sata) best block size to write/read? >> write this at ->(A) >> what´s your work load? 50% write 50% read ? >> >> raid0 block size should be multiple of (A) >> *****filesystem size should be multiple of (A) of all disks >> *****read ahead should be a multiple of (A) >> for example >> /dev/sda 1kb >> /dev/sdb 4kb >> >> you should use 6kb... you should use 4kb, 8kb, 16kb (multiple of 1kb and 4kb) >> >> check i/o sheduller per disk too (ssd should use noop, disks should >> use cfq, deadline or another...) >> async and sync option at mount /etc/fstab, noatime reduce a lot of i/o >> too, you should optimize your application too >> hdparm each disk to use dma and fastest i/o options >> >> are you using only filesystem? are you using somethink more? samba? >> mysql? apache? lvm? >> each of this programs have some tunning, check their benchmarks >> >> >> getting back.... >> what´s a raid controller? >> cpu + memory + disk controller + disks >> but... it only run raid software (it can run linux....) >> >> if you computer is slower than raid cpu+memory+disk controller, you >> will have a slower software raid, than hardware raid >> it´s like load balance on cpu/memory utilization of disk i/o (use >> dedicated hardware, or use your hardware?) >> got it? >> using a super fast xeon with ddr3 and optical fiber running software >> raid, is faster than a hardware raid using a arm (or fpga) ddrX memory >> and sas(fiber optical) connection to disks >> >> two solutions for the same problem >> what´s fast? benchmark it >> i think that if your xeon run a database and a very workloaded apache, >> a dedicated hardware raid can run faster, but a light xeon can run >> faster than a dedicated hardware raid >> >> >> >> 2011/1/19 Wolfgang Denk <wd@denx.de>: >>> Dear =?ISO-8859-15?Q?Stefan_/*St0fF*/_H=FCbner?=, >>> >>> In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote: >>>> >>>> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der >>>> Controller. >>>> >>>> [in English:] Dude, the disks are your bottleneck. >>> ... >>> >>> Maybe we can stop speculations about what might be the cause of the >>> problems in some setup I do NOT intend to use, and rather discuss the >>> questions I asked. >>> >>>>> I will have 4 x 1 TB disks for this setup. >>>>> >>>>> The plan is to build a RAID0 from the 4 devices, create a physical >>>>> volume and a volume group on the resulting /dev/md?, then create 2 or >>>>> 3 logical volumes that will be used as XFS file systems. >>> >>> Clarrification: I'll run /dev/md* on the raw disks, without any >>> partitions on them. >>> >>>>> My goal is to optimize for maximum number of I/O operations per >>>>> second. ... >>>>> >>>>> Is this a reasonable approach for such a task? >>>>> >>>>> Should I do anything different to acchive maximum performance? >>>>> >>>>> What are the tunables in this setup? [It seems the usual recipies are >>>>> more oriented in maximizing the data troughput for large, mostly >>>>> sequential accesses - I figure that things like increasing read-ahead >>>>> etc. will not help me much here?] >>> >>> So can anybody help answering these questions: >>> >>> - are there any special options when creating the RAID0 to make it >>> perform faster for such a use case? >>> - are there other tunables, any special MD / LVM / file system / >>> read ahead / buffer cache / ... parameters to look for? 
>>> >>> Thanks. >>> >>> Wolfgang Denk >>> >>> -- >>> DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel >>> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany >>> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de >>> Boykottiert Microsoft - Kauft Eure Fenster bei OBI! >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >> >> >> > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 46+ messages in thread
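Roberto's distinction between cache hits and real disk reads can be watched with the standard tools; a small sketch (device names are examples for the 4-disk array discussed here):

  iostat -x 1 sdb sdc sdd sde md0   # r/s, w/s, await and %util per device
  free -m                           # how much RAM the page cache is currently using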
* Re: Optimize RAID0 for max IOPS? 2011-01-19 23:09 ` Roberto Spadim @ 2011-01-19 23:18 ` Roberto Spadim 2011-01-20 2:48 ` Keld Jørn Simonsen 2011-01-21 19:34 ` Wolfgang Denk 0 siblings, 2 replies; 46+ messages in thread From: Roberto Spadim @ 2011-01-19 23:18 UTC (permalink / raw) To: stefan.huebner; +Cc: Wolfgang Denk, linux-raid a good idea.... why not start a opensource raid controller? what we need? a cpu, memory, power supply with battery or capacitor, sas/sata (disk interfaces), pci-express or another (computer interface) it don´t need a operational system, since it will only run one program with some threads (ok a small operational system to implement threads easly) we could use arm, fpga, intel core2duo, atlhon, xeon, or another system... instead using a computer with ethernet interface (nbd nfs samba or another file/device sharing iscsi ethernet sata), we need a computer with pci-express interface and native operational system module 2011/1/19 Roberto Spadim <roberto@spadim.com.br>: > the problem.... > if you use iostat, or iotop > with software raid: > you just see disk i/o > you don´t see memory (cache) i/o > when using hardware raid: > you just see raid i/o (it can be a cache read or a real disk read) > > > if you check memory+disk i/o, you will get similar values, if not, you > will see high cpu usage > for example you are using raidx with 10disks on a hardware raid > change hardware raid to use only disks (10 disks for linux) > make the same raidx with 10disks > you will get a slower i/o since it have a controler between disk and cpu > try it without hardware raid cpu, just a (sas/sata) optimized > controller, or 10 (sata/sas) one port > you still with a slow i/o then hardware controller (that´s right!) > > now let´s remove the sata/sas channel, let´s use a pci-express > revodrive or pci-express texas ssd drive > you will get better values then a hardware raid, but... why? you > changed the hardware (ok, i know) but you make cpu more close to disk > if you use disks with cache, you will get more speed (a memory ssd > harddisk is faster than a harddisk only disk) > > why hardware are more faster than linux? i don´t think they are... > they can make smaller latencies with good memory cache > but if you computer use ddr3 and your hardware raid controller use i2c > memory, your ddr3 cache is faster... > > how to benchmark? check disk i/o+memory cache i/o > if linux is faster ok, you use more cpu and memory of your computer > if linux is slower ok, you use less cpu and memory, but will have it > on hardware raid... > if you upgrade you memory and cpu, it can be faster than you hardware > raid controller, what´s better for you? > > want a better read/write solution for software raid? make a new > read/write code, you can do it, linux is easier than hardware raid to > code! > want a better read/write solution for hardware raid? call your > hardware seller and talk, please i need a better firmware, could you > send me? > > got? > > > 2011/1/19 Stefan /*St0fF*/ Hübner <stefan.huebner@stud.tu-ilmenau.de>: >> @Roberto: I guess you're right. BUT: i have not seen 900MB/s coming from >> (i.e. read access) a software raid, but I've seen it from a 9750 on a >> LSI SASx28 backplane, running RAID6 over 16disks (HDS722020ALA330). So >> one might not be wrong assuming on current raid-controllers >> hardware/software matching and timing is way more optimized than what >> mdraid might get at all. 
>> >> The 9650 and 9690 are considerably slower, but I've seen 550MB/s thruput >> from those, also (I don't recall the setup anymore, tho). >> >> Max reading I saw from a software raid was around 350MB/s - so hence my >> answers. And if people had problems with controllers which are 5 years >> or older by now, the numbers are not really comparable... >> >> Now again there's the point where there are also parameters on the >> controller that can be tweaked, and a simple way to recreate the testing >> scenario. We may discuss and throw in further numbers and experience, >> but not being able to recreate your specific scenario makes us talk past >> each other... >> >> stefan >> >> Am 19.01.2011 20:50, schrieb Roberto Spadim: >>> So can anybody help answering these questions: >>> >>> - are there any special options when creating the RAID0 to make it >>> perform faster for such a use case? >>> - are there other tunables, any special MD / LVM / file system / read >>> ahead / buffer cache / ... parameters to look for? >>> >>> lets see: >>> what´s your disk (ssd or sas or sata) best block size to write/read? >>> write this at ->(A) >>> what´s your work load? 50% write 50% read ? >>> >>> raid0 block size should be multiple of (A) >>> *****filesystem size should be multiple of (A) of all disks >>> *****read ahead should be a multiple of (A) >>> for example >>> /dev/sda 1kb >>> /dev/sdb 4kb >>> >>> you should use 6kb... you should use 4kb, 8kb, 16kb (multiple of 1kb and 4kb) >>> >>> check i/o sheduller per disk too (ssd should use noop, disks should >>> use cfq, deadline or another...) >>> async and sync option at mount /etc/fstab, noatime reduce a lot of i/o >>> too, you should optimize your application too >>> hdparm each disk to use dma and fastest i/o options >>> >>> are you using only filesystem? are you using somethink more? samba? >>> mysql? apache? lvm? >>> each of this programs have some tunning, check their benchmarks >>> >>> >>> getting back.... >>> what´s a raid controller? >>> cpu + memory + disk controller + disks >>> but... it only run raid software (it can run linux....) >>> >>> if you computer is slower than raid cpu+memory+disk controller, you >>> will have a slower software raid, than hardware raid >>> it´s like load balance on cpu/memory utilization of disk i/o (use >>> dedicated hardware, or use your hardware?) >>> got it? >>> using a super fast xeon with ddr3 and optical fiber running software >>> raid, is faster than a hardware raid using a arm (or fpga) ddrX memory >>> and sas(fiber optical) connection to disks >>> >>> two solutions for the same problem >>> what´s fast? benchmark it >>> i think that if your xeon run a database and a very workloaded apache, >>> a dedicated hardware raid can run faster, but a light xeon can run >>> faster than a dedicated hardware raid >>> >>> >>> >>> 2011/1/19 Wolfgang Denk <wd@denx.de>: >>>> Dear =?ISO-8859-15?Q?Stefan_/*St0fF*/_H=FCbner?=, >>>> >>>> In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote: >>>>> >>>>> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der >>>>> Controller. >>>>> >>>>> [in English:] Dude, the disks are your bottleneck. >>>> ... >>>> >>>> Maybe we can stop speculations about what might be the cause of the >>>> problems in some setup I do NOT intend to use, and rather discuss the >>>> questions I asked. >>>> >>>>>> I will have 4 x 1 TB disks for this setup. 
>>>>>> >>>>>> The plan is to build a RAID0 from the 4 devices, create a physical >>>>>> volume and a volume group on the resulting /dev/md?, then create 2 or >>>>>> 3 logical volumes that will be used as XFS file systems. >>>> >>>> Clarrification: I'll run /dev/md* on the raw disks, without any >>>> partitions on them. >>>> >>>>>> My goal is to optimize for maximum number of I/O operations per >>>>>> second. ... >>>>>> >>>>>> Is this a reasonable approach for such a task? >>>>>> >>>>>> Should I do anything different to acchive maximum performance? >>>>>> >>>>>> What are the tunables in this setup? [It seems the usual recipies are >>>>>> more oriented in maximizing the data troughput for large, mostly >>>>>> sequential accesses - I figure that things like increasing read-ahead >>>>>> etc. will not help me much here?] >>>> >>>> So can anybody help answering these questions: >>>> >>>> - are there any special options when creating the RAID0 to make it >>>> perform faster for such a use case? >>>> - are there other tunables, any special MD / LVM / file system / >>>> read ahead / buffer cache / ... parameters to look for? >>>> >>>> Thanks. >>>> >>>> Wolfgang Denk >>>> >>>> -- >>>> DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel >>>> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany >>>> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de >>>> Boykottiert Microsoft - Kauft Eure Fenster bei OBI! >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>> >>> >>> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > > > -- > Roberto Spadim > Spadim Technology / SPAEmpresarial > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Optimize RAID0 for max IOPS? 2011-01-19 23:18 ` Roberto Spadim @ 2011-01-20 2:48 ` Keld Jørn Simonsen 2011-01-20 3:53 ` Roberto Spadim 2011-01-21 19:34 ` Wolfgang Denk 1 sibling, 1 reply; 46+ messages in thread From: Keld Jørn Simonsen @ 2011-01-20 2:48 UTC (permalink / raw) To: Roberto Spadim; +Cc: stefan.huebner, Wolfgang Denk, linux-raid On Wed, Jan 19, 2011 at 09:18:22PM -0200, Roberto Spadim wrote: > a good idea.... > why not start a opensource raid controller? > what we need? a cpu, memory, power supply with battery or capacitor, > sas/sata (disk interfaces), pci-express or another (computer > interface) Why? because of some differences in memory speed? Normally software raid is faster than hardware raid, as wittnessed by many here on the list. The mentioning of max 350 MB/s on a SW raid is not true, 350 MB/S is what I get out of a simple box with 4 slightly oldish SATA drives. 16 new fast SATA drives in SW raid6 should easily go beyond 1000 MB/s, given that there are not other bottlenecks in the system. Linux SW raid goes fairly close to theoretical maxima, given adequate HW. best regards keld > it don?t need a operational system, since it will only run one program > with some threads (ok a small operational system to implement threads > easly) > > we could use arm, fpga, intel core2duo, atlhon, xeon, or another system... > instead using a computer with ethernet interface (nbd nfs samba or > another file/device sharing iscsi ethernet sata), we need a computer > with pci-express interface and native operational system module > > > 2011/1/19 Roberto Spadim <roberto@spadim.com.br>: > > the problem.... > > if you use iostat, or iotop > > with software raid: > > you just see disk i/o > > you don?t see memory (cache) i/o > > when using hardware raid: > > you just see raid i/o (it can be a cache read or a real disk read) > > > > > > if you check memory+disk i/o, you will get similar values, if not, you > > will see high cpu usage > > for example you are using raidx with 10disks on a hardware raid > > change hardware raid to use only disks (10 disks for linux) > > make the same raidx with 10disks > > you will get a slower i/o since it have a controler between disk and cpu > > try it without hardware raid cpu, just a (sas/sata) optimized > > controller, or 10 (sata/sas) one port > > you still with a slow i/o then hardware controller (that?s right!) > > > > now let?s remove the sata/sas channel, let?s use a pci-express > > revodrive or pci-express texas ssd drive > > you will get better values then a hardware raid, but... why? you > > changed the hardware (ok, i know) but you make cpu more close to disk > > if you use disks with cache, you will get more speed (a memory ssd > > harddisk is faster than a harddisk only disk) > > > > why hardware are more faster than linux? i don?t think they are... > > they can make smaller latencies with good memory cache > > but if you computer use ddr3 and your hardware raid controller use i2c > > memory, your ddr3 cache is faster... > > > > how to benchmark? check disk i/o+memory cache i/o > > if linux is faster ok, you use more cpu and memory of your computer > > if linux is slower ok, you use less cpu and memory, but will have it > > on hardware raid... > > if you upgrade you memory and cpu, it can be faster than you hardware > > raid controller, what?s better for you? > > > > want a better read/write solution for software raid? make a new > > read/write code, you can do it, linux is easier than hardware raid to > > code! 
> > want a better read/write solution for hardware raid? call your > > hardware seller and talk, please i need a better firmware, could you > > send me? > > > > got? > > > > > > 2011/1/19 Stefan /*St0fF*/ Hübner <stefan.huebner@stud.tu-ilmenau.de>: > >> @Roberto: I guess you're right. BUT: i have not seen 900MB/s coming from > >> (i.e. read access) a software raid, but I've seen it from a 9750 on a > >> LSI SASx28 backplane, running RAID6 over 16disks (HDS722020ALA330). So > >> one might not be wrong assuming on current raid-controllers > >> hardware/software matching and timing is way more optimized than what > >> mdraid might get at all. > >> > >> The 9650 and 9690 are considerably slower, but I've seen 550MB/s thruput > >> from those, also (I don't recall the setup anymore, tho). > >> > >> Max reading I saw from a software raid was around 350MB/s - so hence my > >> answers. And if people had problems with controllers which are 5 years > >> or older by now, the numbers are not really comparable... > >> > >> Now again there's the point where there are also parameters on the > >> controller that can be tweaked, and a simple way to recreate the testing > >> scenario. We may discuss and throw in further numbers and experience, > >> but not being able to recreate your specific scenario makes us talk past > >> each other... > >> > >> stefan > >> > >> Am 19.01.2011 20:50, schrieb Roberto Spadim: > >>> So can anybody help answering these questions: > >>> > >>> - are there any special options when creating the RAID0 to make it > >>> perform faster for such a use case? > >>> - are there other tunables, any special MD / LVM / file system / read > >>> ahead / buffer cache / ... parameters to look for? > >>> > >>> lets see: > >>> what?s your disk (ssd or sas or sata) best block size to write/read? > >>> write this at ->(A) > >>> what?s your work load? 50% write 50% read ? > >>> > >>> raid0 block size should be multiple of (A) > >>> *****filesystem size should be multiple of (A) of all disks > >>> *****read ahead should be a multiple of (A) > >>> for example > >>> /dev/sda 1kb > >>> /dev/sdb 4kb > >>> > >>> you should use 6kb... you should use 4kb, 8kb, 16kb (multiple of 1kb and 4kb) > >>> > >>> check i/o sheduller per disk too (ssd should use noop, disks should > >>> use cfq, deadline or another...) > >>> async and sync option at mount /etc/fstab, noatime reduce a lot of i/o > >>> too, you should optimize your application too > >>> hdparm each disk to use dma and fastest i/o options > >>> > >>> are you using only filesystem? are you using somethink more? samba? > >>> mysql? apache? lvm? > >>> each of this programs have some tunning, check their benchmarks > >>> > >>> > >>> getting back.... > >>> what?s a raid controller? > >>> cpu + memory + disk controller + disks > >>> but... it only run raid software (it can run linux....) > >>> > >>> if you computer is slower than raid cpu+memory+disk controller, you > >>> will have a slower software raid, than hardware raid > >>> it?s like load balance on cpu/memory utilization of disk i/o (use > >>> dedicated hardware, or use your hardware?) > >>> got it? > >>> using a super fast xeon with ddr3 and optical fiber running software > >>> raid, is faster than a hardware raid using a arm (or fpga) ddrX memory > >>> and sas(fiber optical) connection to disks > >>> > >>> two solutions for the same problem > >>> what?s fast? 
benchmark it > >>> i think that if your xeon run a database and a very workloaded apache, > >>> a dedicated hardware raid can run faster, but a light xeon can run > >>> faster than a dedicated hardware raid > >>> > >>> > >>> > >>> 2011/1/19 Wolfgang Denk <wd@denx.de>: > >>>> Dear =?ISO-8859-15?Q?Stefan_/*St0fF*/_H=FCbner?=, > >>>> > >>>> In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote: > >>>>> > >>>>> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der > >>>>> Controller. > >>>>> > >>>>> [in English:] Dude, the disks are your bottleneck. > >>>> ... > >>>> > >>>> Maybe we can stop speculations about what might be the cause of the > >>>> problems in some setup I do NOT intend to use, and rather discuss the > >>>> questions I asked. > >>>> > >>>>>> I will have 4 x 1 TB disks for this setup. > >>>>>> > >>>>>> The plan is to build a RAID0 from the 4 devices, create a physical > >>>>>> volume and a volume group on the resulting /dev/md?, then create 2 or > >>>>>> 3 logical volumes that will be used as XFS file systems. > >>>> > >>>> Clarrification: I'll run /dev/md* on the raw disks, without any > >>>> partitions on them. > >>>> > >>>>>> My goal is to optimize for maximum number of I/O operations per > >>>>>> second. ... > >>>>>> > >>>>>> Is this a reasonable approach for such a task? > >>>>>> > >>>>>> Should I do anything different to acchive maximum performance? > >>>>>> > >>>>>> What are the tunables in this setup? [It seems the usual recipies are > >>>>>> more oriented in maximizing the data troughput for large, mostly > >>>>>> sequential accesses - I figure that things like increasing read-ahead > >>>>>> etc. will not help me much here?] > >>>> > >>>> So can anybody help answering these questions: > >>>> > >>>> - are there any special options when creating the RAID0 to make it > >>>> perform faster for such a use case? > >>>> - are there other tunables, any special MD / LVM / file system / > >>>> read ahead / buffer cache / ... parameters to look for? > >>>> > >>>> Thanks. > >>>> > >>>> Wolfgang Denk > >>>> > >>>> -- > >>>> DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel > >>>> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany > >>>> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de > >>>> Boykottiert Microsoft - Kauft Eure Fenster bei OBI! > >>>> -- > >>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in > >>>> the body of a message to majordomo@vger.kernel.org > >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html > >>>> > >>> > >>> > >>> > >> > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in > >> the body of a message to majordomo@vger.kernel.org > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > >> > > > > > > > > -- > > Roberto Spadim > > Spadim Technology / SPAEmpresarial > > > > > > -- > Roberto Spadim > Spadim Technology / SPAEmpresarial > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 46+ messages in thread
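The tuning Roberto sketches in the message above (chunk size, per-disk I/O scheduler, noatime, hdparm) maps onto a handful of commands. This is only a rough sketch - the device names /dev/sd[b-e], /dev/md0, the 64 KiB chunk and the mount point are placeholder assumptions, not values from this thread:

    # create the RAID0 across the raw disks with an explicit chunk size
    mdadm --create /dev/md0 --level=0 --raid-devices=4 --chunk=64 /dev/sd[b-e]

    # pick an I/O scheduler per member disk (noop for SSDs, deadline or cfq for rotating disks)
    for d in sdb sdc sdd sde; do
        echo deadline > /sys/block/$d/queue/scheduler
    done

    # mount with noatime so reads do not trigger extra metadata writes
    mount -o noatime /dev/md0 /srv/data

    # inspect per-disk DMA / transfer settings
    hdparm -I /dev/sdb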
* Re: Optimize RAID0 for max IOPS? 2011-01-20 2:48 ` Keld Jørn Simonsen @ 2011-01-20 3:53 ` Roberto Spadim 0 siblings, 0 replies; 46+ messages in thread From: Roberto Spadim @ 2011-01-20 3:53 UTC (permalink / raw) To: Keld Jørn Simonsen; +Cc: stefan.huebner, Wolfgang Denk, linux-raid =) i know, but since we have many proprietary firmware, a opensource firmware (like openbios) could be very nice :D hehehehe i will use linux raid (i´m sure it´s very good) it´s really fast, and work with hotswap too (ok there´s some userspace programs to allow it to work ok even with wrong kernel hotswap problems, but when kernel can release and replug it without problems we don´t need userspace programs... userspace check each new hotpluged volume, if uuid is = some raid uuid device, it put the device in the right raid device (i made it with a php script =) hehehe) ) 2011/1/20 Keld Jørn Simonsen <keld@keldix.com>: > On Wed, Jan 19, 2011 at 09:18:22PM -0200, Roberto Spadim wrote: >> a good idea.... >> why not start a opensource raid controller? >> what we need? a cpu, memory, power supply with battery or capacitor, >> sas/sata (disk interfaces), pci-express or another (computer >> interface) > > Why? because of some differences in memory speed? > > Normally software raid is faster than hardware raid, as wittnessed by > many here on the list. The mentioning of max 350 MB/s on a SW raid > is not true, 350 MB/S is what I get out of a simple box with 4 slightly > oldish SATA drives. 16 new fast SATA drives in SW raid6 should easily go beyond > 1000 MB/s, given that there are not other bottlenecks in the system. > > Linux SW raid goes fairly close to theoretical maxima, given adequate > HW. > > > best regards > keld > >> it don?t need a operational system, since it will only run one program >> with some threads (ok a small operational system to implement threads >> easly) >> >> we could use arm, fpga, intel core2duo, atlhon, xeon, or another system... >> instead using a computer with ethernet interface (nbd nfs samba or >> another file/device sharing iscsi ethernet sata), we need a computer >> with pci-express interface and native operational system module >> >> >> 2011/1/19 Roberto Spadim <roberto@spadim.com.br>: >> > the problem.... >> > if you use iostat, or iotop >> > with software raid: >> > you just see disk i/o >> > you don?t see memory (cache) i/o >> > when using hardware raid: >> > you just see raid i/o (it can be a cache read or a real disk read) >> > >> > >> > if you check memory+disk i/o, you will get similar values, if not, you >> > will see high cpu usage >> > for example you are using raidx with 10disks on a hardware raid >> > change hardware raid to use only disks (10 disks for linux) >> > make the same raidx with 10disks >> > you will get a slower i/o since it have a controler between disk and cpu >> > try it without hardware raid cpu, just a (sas/sata) optimized >> > controller, or 10 (sata/sas) one port >> > you still with a slow i/o then hardware controller (that?s right!) >> > >> > now let?s remove the sata/sas channel, let?s use a pci-express >> > revodrive or pci-express texas ssd drive >> > you will get better values then a hardware raid, but... why? you >> > changed the hardware (ok, i know) but you make cpu more close to disk >> > if you use disks with cache, you will get more speed (a memory ssd >> > harddisk is faster than a harddisk only disk) >> > >> > why hardware are more faster than linux? i don?t think they are... 
>> > they can make smaller latencies with good memory cache >> > but if you computer use ddr3 and your hardware raid controller use i2c >> > memory, your ddr3 cache is faster... >> > >> > how to benchmark? check disk i/o+memory cache i/o >> > if linux is faster ok, you use more cpu and memory of your computer >> > if linux is slower ok, you use less cpu and memory, but will have it >> > on hardware raid... >> > if you upgrade you memory and cpu, it can be faster than you hardware >> > raid controller, what?s better for you? >> > >> > want a better read/write solution for software raid? make a new >> > read/write code, you can do it, linux is easier than hardware raid to >> > code! >> > want a better read/write solution for hardware raid? call your >> > hardware seller and talk, please i need a better firmware, could you >> > send me? >> > >> > got? >> > >> > >> > 2011/1/19 Stefan /*St0fF*/ Hübner <stefan.huebner@stud.tu-ilmenau.de>: >> >> @Roberto: I guess you're right. BUT: i have not seen 900MB/s coming from >> >> (i.e. read access) a software raid, but I've seen it from a 9750 on a >> >> LSI SASx28 backplane, running RAID6 over 16disks (HDS722020ALA330). So >> >> one might not be wrong assuming on current raid-controllers >> >> hardware/software matching and timing is way more optimized than what >> >> mdraid might get at all. >> >> >> >> The 9650 and 9690 are considerably slower, but I've seen 550MB/s thruput >> >> from those, also (I don't recall the setup anymore, tho). >> >> >> >> Max reading I saw from a software raid was around 350MB/s - so hence my >> >> answers. And if people had problems with controllers which are 5 years >> >> or older by now, the numbers are not really comparable... >> >> >> >> Now again there's the point where there are also parameters on the >> >> controller that can be tweaked, and a simple way to recreate the testing >> >> scenario. We may discuss and throw in further numbers and experience, >> >> but not being able to recreate your specific scenario makes us talk past >> >> each other... >> >> >> >> stefan >> >> >> >> Am 19.01.2011 20:50, schrieb Roberto Spadim: >> >>> So can anybody help answering these questions: >> >>> >> >>> - are there any special options when creating the RAID0 to make it >> >>> perform faster for such a use case? >> >>> - are there other tunables, any special MD / LVM / file system / read >> >>> ahead / buffer cache / ... parameters to look for? >> >>> >> >>> lets see: >> >>> what?s your disk (ssd or sas or sata) best block size to write/read? >> >>> write this at ->(A) >> >>> what?s your work load? 50% write 50% read ? >> >>> >> >>> raid0 block size should be multiple of (A) >> >>> *****filesystem size should be multiple of (A) of all disks >> >>> *****read ahead should be a multiple of (A) >> >>> for example >> >>> /dev/sda 1kb >> >>> /dev/sdb 4kb >> >>> >> >>> you should use 6kb... you should use 4kb, 8kb, 16kb (multiple of 1kb and 4kb) >> >>> >> >>> check i/o sheduller per disk too (ssd should use noop, disks should >> >>> use cfq, deadline or another...) >> >>> async and sync option at mount /etc/fstab, noatime reduce a lot of i/o >> >>> too, you should optimize your application too >> >>> hdparm each disk to use dma and fastest i/o options >> >>> >> >>> are you using only filesystem? are you using somethink more? samba? >> >>> mysql? apache? lvm? >> >>> each of this programs have some tunning, check their benchmarks >> >>> >> >>> >> >>> getting back.... >> >>> what?s a raid controller? 
>> >>> cpu + memory + disk controller + disks >> >>> but... it only run raid software (it can run linux....) >> >>> >> >>> if you computer is slower than raid cpu+memory+disk controller, you >> >>> will have a slower software raid, than hardware raid >> >>> it?s like load balance on cpu/memory utilization of disk i/o (use >> >>> dedicated hardware, or use your hardware?) >> >>> got it? >> >>> using a super fast xeon with ddr3 and optical fiber running software >> >>> raid, is faster than a hardware raid using a arm (or fpga) ddrX memory >> >>> and sas(fiber optical) connection to disks >> >>> >> >>> two solutions for the same problem >> >>> what?s fast? benchmark it >> >>> i think that if your xeon run a database and a very workloaded apache, >> >>> a dedicated hardware raid can run faster, but a light xeon can run >> >>> faster than a dedicated hardware raid >> >>> >> >>> >> >>> >> >>> 2011/1/19 Wolfgang Denk <wd@denx.de>: >> >>>> Dear =?ISO-8859-15?Q?Stefan_/*St0fF*/_H=FCbner?=, >> >>>> >> >>>> In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote: >> >>>>> >> >>>>> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der >> >>>>> Controller. >> >>>>> >> >>>>> [in English:] Dude, the disks are your bottleneck. >> >>>> ... >> >>>> >> >>>> Maybe we can stop speculations about what might be the cause of the >> >>>> problems in some setup I do NOT intend to use, and rather discuss the >> >>>> questions I asked. >> >>>> >> >>>>>> I will have 4 x 1 TB disks for this setup. >> >>>>>> >> >>>>>> The plan is to build a RAID0 from the 4 devices, create a physical >> >>>>>> volume and a volume group on the resulting /dev/md?, then create 2 or >> >>>>>> 3 logical volumes that will be used as XFS file systems. >> >>>> >> >>>> Clarrification: I'll run /dev/md* on the raw disks, without any >> >>>> partitions on them. >> >>>> >> >>>>>> My goal is to optimize for maximum number of I/O operations per >> >>>>>> second. ... >> >>>>>> >> >>>>>> Is this a reasonable approach for such a task? >> >>>>>> >> >>>>>> Should I do anything different to acchive maximum performance? >> >>>>>> >> >>>>>> What are the tunables in this setup? [It seems the usual recipies are >> >>>>>> more oriented in maximizing the data troughput for large, mostly >> >>>>>> sequential accesses - I figure that things like increasing read-ahead >> >>>>>> etc. will not help me much here?] >> >>>> >> >>>> So can anybody help answering these questions: >> >>>> >> >>>> - are there any special options when creating the RAID0 to make it >> >>>> perform faster for such a use case? >> >>>> - are there other tunables, any special MD / LVM / file system / >> >>>> read ahead / buffer cache / ... parameters to look for? >> >>>> >> >>>> Thanks. >> >>>> >> >>>> Wolfgang Denk >> >>>> >> >>>> -- >> >>>> DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel >> >>>> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany >> >>>> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de >> >>>> Boykottiert Microsoft - Kauft Eure Fenster bei OBI! 
>> >>>> -- >> >>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> >>>> the body of a message to majordomo@vger.kernel.org >> >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >>>> >> >>> >> >>> >> >>> >> >> >> >> -- >> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> >> the body of a message to majordomo@vger.kernel.org >> >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> > >> > >> > >> > -- >> > Roberto Spadim >> > Spadim Technology / SPAEmpresarial >> > >> >> >> >> -- >> Roberto Spadim >> Spadim Technology / SPAEmpresarial >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 46+ messages in thread
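The UUID-matching hotplug helper Roberto describes (his PHP script) is roughly what mdadm's incremental mode already provides. A sketch, with /dev/md0 and /dev/sdx as placeholders:

    # compare the superblock UUID of the re-plugged disk with the array's UUID
    mdadm --examine /dev/sdx | grep -i uuid
    mdadm --detail /dev/md0 | grep -i uuid

    # let mdadm attach the device to whatever array its superblock belongs to
    mdadm --incremental /dev/sdx

    # the same thing can be triggered from udev on hotplug; this rule is an
    # approximation of the stock md rules shipped by most distributions:
    # SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"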
* Re: Optimize RAID0 for max IOPS? 2011-01-19 23:18 ` Roberto Spadim 2011-01-20 2:48 ` Keld Jørn Simonsen @ 2011-01-21 19:34 ` Wolfgang Denk 2011-01-21 20:03 ` Roberto Spadim 1 sibling, 1 reply; 46+ messages in thread From: Wolfgang Denk @ 2011-01-21 19:34 UTC (permalink / raw) To: Roberto Spadim; +Cc: stefan.huebner, linux-raid Dear Roberto, In message <AANLkTiki_FfRrLtL3dMsrDLXeT8jNO0ndnTNpXk1OXMW@mail.gmail.com> you wrote: > a good idea.... > why not start a opensource raid controller? > what we need? a cpu, memory, power supply with battery or capacitor, > sas/sata (disk interfaces), pci-express or another (computer > interface) > it don´t need a operational system, since it will only run one program > with some threads (ok a small operational system to implement threads > easly) > > we could use arm, fpga, intel core2duo, atlhon, xeon, or another system... You could even use a processor dedicated to such a job, like a PPC440SPe or PPC460SX or similar, which provide hardware-offload capabilities for the RAID calculations. These are even supported by drivers in mainline Linux. But again, these would not help to maximize IOPS - the goal of their optimization has always been maximum sequential throughput only (and yes, I know exactly what I'm talking about; guess where the aforementioned drivers are coming from). Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de I don't see any direct evidence ... but, then, my crystal ball is in dire need of an ectoplasmic upgrade. :-) -- Howard Smith -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Optimize RAID0 for max IOPS? 2011-01-21 19:34 ` Wolfgang Denk @ 2011-01-21 20:03 ` Roberto Spadim 2011-01-21 20:04 ` Roberto Spadim 0 siblings, 1 reply; 46+ messages in thread From: Roberto Spadim @ 2011-01-21 20:03 UTC (permalink / raw) To: Wolfgang Denk; +Cc: stefan.huebner, linux-raid =) i know but, every body tell software is slower, the solution - use hardware ok there´s no opensource firmware for raid hardware i preffer a good software/hardware solution, linux raid is a good software solution for me =) but, why not try a opensource project? hehe what we could do.... a virtual machine :P with only raid and nfs, or make a dedicated cpu for raid (cpu affinity) and a portion of memory only for raid cache (today i think raid software don´t have cache, it shoudn´t, cache is done by linux at filesystem level, i´m right?) 2011/1/21 Wolfgang Denk <wd@denx.de>: > Dear Roberto, > > In message <AANLkTiki_FfRrLtL3dMsrDLXeT8jNO0ndnTNpXk1OXMW@mail.gmail.com> you wrote: >> a good idea.... >> why not start a opensource raid controller? >> what we need? a cpu, memory, power supply with battery or capacitor, >> sas/sata (disk interfaces), pci-express or another (computer >> interface) >> it don´t need a operational system, since it will only run one program >> with some threads (ok a small operational system to implement threads >> easly) >> >> we could use arm, fpga, intel core2duo, atlhon, xeon, or another system... > > You could evenuse a processor dedicated for such a job, like a > PPC440SPe or PPC460SX or similar, which provide hardware-offload > capabilities for the RAID calculations. These are even supported by > drivers in mainline Linux. > > But again, thee would not helpo to maximize IOPS - goal for > optimization has always been maximum sequential troughput only > (and yes, I know exactly what I'm talking about; guess where the > aforementioned drivers are coming from). > > Best regards, > > Wolfgang Denk > > -- > DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel > HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany > Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de > I don't see any direct evidence ... but, then, my crystal ball is in > dire need of an ectoplasmic upgrade. :-) -- Howard Smith > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Optimize RAID0 for max IOPS? 2011-01-21 20:03 ` Roberto Spadim @ 2011-01-21 20:04 ` Roberto Spadim 0 siblings, 0 replies; 46+ messages in thread From: Roberto Spadim @ 2011-01-21 20:04 UTC (permalink / raw) To: Wolfgang Denk; +Cc: stefan.huebner, linux-raid thanks, i never used a PPC440SPe, i will buy one for hobby =) 2011/1/21 Roberto Spadim <roberto@spadim.com.br>: > =) i know > but, every body tell software is slower, the solution - use hardware > ok > there´s no opensource firmware for raid hardware > > i preffer a good software/hardware solution, linux raid is a good > software solution for me =) > but, why not try a opensource project? hehe > what we could do.... a virtual machine :P with only raid and nfs, or > make a dedicated cpu for raid (cpu affinity) and a portion of memory > only for raid cache (today i think raid software don´t have cache, it > shoudn´t, cache is done by linux at filesystem level, i´m right?) > > > 2011/1/21 Wolfgang Denk <wd@denx.de>: >> Dear Roberto, >> >> In message <AANLkTiki_FfRrLtL3dMsrDLXeT8jNO0ndnTNpXk1OXMW@mail.gmail.com> you wrote: >>> a good idea.... >>> why not start a opensource raid controller? >>> what we need? a cpu, memory, power supply with battery or capacitor, >>> sas/sata (disk interfaces), pci-express or another (computer >>> interface) >>> it don´t need a operational system, since it will only run one program >>> with some threads (ok a small operational system to implement threads >>> easly) >>> >>> we could use arm, fpga, intel core2duo, atlhon, xeon, or another system... >> >> You could evenuse a processor dedicated for such a job, like a >> PPC440SPe or PPC460SX or similar, which provide hardware-offload >> capabilities for the RAID calculations. These are even supported by >> drivers in mainline Linux. >> >> But again, thee would not helpo to maximize IOPS - goal for >> optimization has always been maximum sequential troughput only >> (and yes, I know exactly what I'm talking about; guess where the >> aforementioned drivers are coming from). >> >> Best regards, >> >> Wolfgang Denk >> >> -- >> DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel >> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany >> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de >> I don't see any direct evidence ... but, then, my crystal ball is in >> dire need of an ectoplasmic upgrade. :-) -- Howard Smith >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > > > -- > Roberto Spadim > Spadim Technology / SPAEmpresarial > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Optimize RAID0 for max IOPS? 2011-01-19 19:21 ` Wolfgang Denk 2011-01-19 19:50 ` Roberto Spadim @ 2011-01-24 14:40 ` CoolCold 2011-01-24 15:25 ` Justin Piszcz 2011-01-24 20:43 ` Wolfgang Denk 1 sibling, 2 replies; 46+ messages in thread From: CoolCold @ 2011-01-24 14:40 UTC (permalink / raw) To: Wolfgang Denk; +Cc: stefan.huebner, linux-raid On Wed, Jan 19, 2011 at 10:21 PM, Wolfgang Denk <wd@denx.de> wrote: > Dear =?ISO-8859-15?Q?Stefan_/*St0fF*/_H=FCbner?=, > > In message <4D361F26.3060507@stud.tu-ilmenau.de> you wrote: >> >> [in German:] Schätzelein, Dein Problem sind die Platten, nicht der >> Controller. >> >> [in English:] Dude, the disks are your bottleneck. > ... > > Maybe we can stop speculations about what might be the cause of the > problems in some setup I do NOT intend to use, and rather discuss the > questions I asked. > >> > I will have 4 x 1 TB disks for this setup. >> > >> > The plan is to build a RAID0 from the 4 devices, create a physical >> > volume and a volume group on the resulting /dev/md?, then create 2 or >> > 3 logical volumes that will be used as XFS file systems. > > Clarrification: I'll run /dev/md* on the raw disks, without any > partitions on them. > >> > My goal is to optimize for maximum number of I/O operations per >> > second. ... >> > >> > Is this a reasonable approach for such a task? >> > >> > Should I do anything different to acchive maximum performance? >> > >> > What are the tunables in this setup? [It seems the usual recipies are >> > more oriented in maximizing the data troughput for large, mostly >> > sequential accesses - I figure that things like increasing read-ahead >> > etc. will not help me much here?] > > So can anybody help answering these questions: > > - are there any special options when creating the RAID0 to make it > perform faster for such a use case? > - are there other tunables, any special MD / LVM / file system / > read ahead / buffer cache / ... parameters to look for? XFS is known for it's slow speed on metadata operations like updating file attributes/removing files..but things gonna change after 2.6.35 where delaylog is used. Citating Dave Chinner : < dchinner> Indeed, the biggest concurrency limitation has traditionally been the transaction commit/journalling code, but that's a lot more scalable now with delayed logging.... So, you may need to benchmark fs part. > > Thanks. > > Wolfgang Denk > > -- > DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel > HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany > Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de > Boykottiert Microsoft - Kauft Eure Fenster bei OBI! > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Best regards, [COOLCOLD-RIPN] -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 46+ messages in thread
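The delaylog change CoolCold refers to is a mount option (available from kernel 2.6.35 on), so it can be tried without re-creating the filesystem. A sketch with placeholder device and mount point:

    # remount the XFS filesystem with delayed logging and a larger log buffer
    umount /srv/data
    mount -o noatime,delaylog,logbsize=262144 /dev/vg0/data /srv/data

    # or persist it in /etc/fstab
    # /dev/vg0/data  /srv/data  xfs  noatime,delaylog,logbsize=262144  0 0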
* Re: Optimize RAID0 for max IOPS? 2011-01-24 14:40 ` CoolCold @ 2011-01-24 15:25 ` Justin Piszcz 2011-01-24 20:43 ` Wolfgang Denk 1 sibling, 0 replies; 46+ messages in thread From: Justin Piszcz @ 2011-01-24 15:25 UTC (permalink / raw) To: CoolCold; +Cc: Wolfgang Denk, stefan.huebner, linux-raid, xfs [-- Attachment #1: Type: TEXT/PLAIN, Size: 900 bytes --] On Mon, 24 Jan 2011, CoolCold wrote: >> So can anybody help answering these questions: >> >> - are there any special options when creating the RAID0 to make it >> perform faster for such a use case? >> - are there other tunables, any special MD / LVM / file system / >> read ahead / buffer cache / ... parameters to look for? > XFS is known for it's slow speed on metadata operations like updating > file attributes/removing files..but things gonna change after 2.6.35 > where delaylog is used. Citating Dave Chinner : > < dchinner> Indeed, the biggest concurrency limitation has > traditionally been the transaction commit/journalling code, but that's > a lot more scalable now with delayed logging.... > > So, you may need to benchmark fs part. Some info on XFS benchmark with delaylog here: http://comments.gmane.org/gmane.comp.file-systems.xfs.general/34379 Justin. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Optimize RAID0 for max IOPS? 2011-01-24 15:25 ` Justin Piszcz @ 2011-01-24 20:48 ` Wolfgang Denk -1 siblings, 0 replies; 46+ messages in thread From: Wolfgang Denk @ 2011-01-24 20:48 UTC (permalink / raw) To: Justin Piszcz; +Cc: linux-raid, xfs Dear Justin Piszcz, In message <alpine.DEB.2.00.1101241024230.14640@p34.internal.lan> you wrote: > > > So, you may need to benchmark fs part. > > Some info on XFS benchmark with delaylog here: > http://comments.gmane.org/gmane.comp.file-systems.xfs.general/34379 Thanks a lot for the pointer. I will try this out. Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de Madness takes its toll. Please have exact change. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Optimize RAID0 for max IOPS? 2011-01-24 15:25 ` Justin Piszcz @ 2011-01-24 21:57 ` Wolfgang Denk -1 siblings, 0 replies; 46+ messages in thread From: Wolfgang Denk @ 2011-01-24 21:57 UTC (permalink / raw) To: Justin Piszcz; +Cc: linux-raid, xfs Dear Justin, In message <alpine.DEB.2.00.1101241024230.14640@p34.internal.lan> you wrote: > > Some info on XFS benchmark with delaylog here: > http://comments.gmane.org/gmane.comp.file-systems.xfs.general/34379 For the record: I tested both the "delaylog" and "logbsize=262144" on two systems running Fedora 14 x86_64 (kernel version 2.6.35.10-74.fc14.x86_64). Test No. Mount options 1 rw,noatime 2 rw,noatime,delaylog 3 rw,noatime,delaylog,logbsize=262144 System A: Gigabyte EP35C-DS3R Mainbord, Core 2 Quad CPU Q9550 @ 2.83GHz, 4 GB RAM --------- software RAID 5 using 4 x old Maxtor 7Y250M0 S-ATA I disks (chunk size 16 kB, using S-ATA ports on main board), XFS Test 1: Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP A1 8G 844 96 153107 19 56427 11 2006 98 127174 15 369.4 6 Latency 13686us 1480ms 1128ms 14986us 136ms 74911us Version 1.96 ------Sequential Create------ --------Random Create-------- A1 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 104 0 +++++ +++ 115 0 89 0 +++++ +++ 111 0 Latency 326ms 171us 277ms 343ms 9us 360ms 1.96,1.96,A1,1,1295714835,8G,,844,96,153107,19,56427,11,2006,98,127174,15,369.4,6,16,,,,,104,0,+++++,+++,115,0,89,0,+++++,+++,111,0,13686us,1480ms,1128ms,14986us,136ms,74911us,326ms,171us,277ms,343ms,9us,360ms Test 2: Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP A2 8G 417 46 67526 8 28251 5 1338 63 53780 5 236.0 4 Latency 38626us 1859ms 508ms 26689us 258ms 188ms Version 1.96 ------Sequential Create------ --------Random Create-------- A2 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 51 0 +++++ +++ 128 0 102 0 +++++ +++ 125 0 Latency 1526ms 169us 277ms 363ms 8us 324ms 1.96,1.96,A2,1,1295901138,8G,,417,46,67526,8,28251,5,1338,63,53780,5,236.0,4,16,,,,,51,0,+++++,+++,128,0,102,0,+++++,+++,125,0,38626us,1859ms,508ms,26689us,258ms,188ms,1526ms,169us,277ms,363ms,8us,324ms Test 3: Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP A3 8G 417 46 67526 8 28251 5 1338 63 53780 5 236.0 4 Latency 38626us 1859ms 508ms 26689us 258ms 188ms Version 1.96 ------Sequential Create------ --------Random Create-------- A3 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 51 0 +++++ +++ 128 0 102 0 +++++ +++ 125 0 Latency 1526ms 169us 277ms 363ms 8us 324ms 1.96,1.96,A3,1,1295901138,8G,,417,46,67526,8,28251,5,1338,63,53780,5,236.0,4,16,,,,,51,0,+++++,+++,128,0,102,0,+++++,+++,125,0,38626us,1859ms,508ms,26689us,258ms,188ms,1526ms,169us,277ms,363ms,8us,324ms System B: Supermicro H8DM8-2 Mainbord, Dual-Core AMD Opteron 2216 @ 2.4 GHz, 8 GB RAM software RAID 6 using 6 x Seagate ST31000524NS S-ATA II disks (chunk size 16 kB, using 
a Marvell MV88SX6081 8-port SATA II PCI-X Controller) XFS Test 1: Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP B1 16G 403 98 198720 66 53287 49 1013 99 228076 91 545.0 31 Latency 43022us 127ms 126ms 29328us 105ms 66395us Version 1.96 ------Sequential Create------ --------Random Create-------- B1 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 97 1 +++++ +++ 96 1 96 1 +++++ +++ 95 1 Latency 326ms 349us 351ms 355ms 49us 363ms 1.96,1.96,B1,1,1295784794,16G,,403,98,198720,66,53287,49,1013,99,228076,91,545.0,31,16,,,,,97,1,+++++,+++,96,1,96,1,+++++,+++,95,1,43022us,127ms,126ms,29328us,105ms,66395us,326ms,349us,351ms,355ms,49us,363ms Test 2: Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP B2 16G 380 98 197319 68 54835 48 983 99 216812 89 527.8 31 Latency 47456us 227ms 280ms 24696us 38233us 80147us Version 1.96 ------Sequential Create------ --------Random Create-------- B2 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 91 1 +++++ +++ 115 1 73 1 +++++ +++ 96 1 Latency 355ms 2274us 833ms 750ms 1079us 400ms 1.96,1.96,B2,1,1295884032,16G,,380,98,197319,68,54835,48,983,99,216812,89,527.8,31,16,,,,,91,1,+++++,+++,115,1,73,1,+++++,+++,96,1,47456us,227ms,280ms,24696us,38233us,80147us,355ms,2274us,833ms,750ms,1079us,400ms Test 3: Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP B3 16G 402 99 175802 64 55639 48 1006 99 232748 87 543.7 32 Latency 43160us 426ms 164ms 13306us 40857us 65114us Version 1.96 ------Sequential Create------ --------Random Create-------- B3 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 93 1 +++++ +++ 101 1 95 1 +++++ +++ 95 1 Latency 479ms 2281us 383ms 366ms 22us 402ms 1.96,1.96,B3,1,1295880202,16G,,402,99,175802,64,55639,48,1006,99,232748,87,543.7,32,16,,,,,93,1,+++++,+++,101,1,95,1,+++++,+++,95,1,43160us,426ms,164ms,13306us,40857us,65114us,479ms,2281us,383ms,366ms,22us,402ms I do not see any significant improvement in any of the parameters - especially when compared to the serious performance degradation (down to 44% for block write, 42% for block read) on system A. Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de A supercomputer is a machine that runs an endless loop in 2 seconds. ^ permalink raw reply [flat|nested] 46+ messages in thread
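The tables above are bonnie++ 1.96 output, and the three tests differ only in mount options, so a comparison run can be scripted. A sketch - the device, mount point and sizes are assumptions, not taken from Wolfgang's setup:

    i=1
    for opts in noatime noatime,delaylog noatime,delaylog,logbsize=262144; do
        mount -o "$opts" /dev/md0 /mnt/bench
        # -s is the file size in MiB, -n the small-file count in units of 1024,
        # -u the unprivileged user to run as (bonnie++ refuses to run as plain root)
        bonnie++ -d /mnt/bench -s 8192 -n 16 -m "A$i" -u nobody
        umount /mnt/bench
        i=$((i+1))
    done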
* Re: Optimize RAID0 for max IOPS? 2011-01-24 21:57 ` Wolfgang Denk @ 2011-01-24 23:03 ` Dave Chinner -1 siblings, 0 replies; 46+ messages in thread From: Dave Chinner @ 2011-01-24 23:03 UTC (permalink / raw) To: Wolfgang Denk; +Cc: Justin Piszcz, linux-raid, xfs On Mon, Jan 24, 2011 at 10:57:13PM +0100, Wolfgang Denk wrote: > Dear Justin, > > In message <alpine.DEB.2.00.1101241024230.14640@p34.internal.lan> you wrote: > > > > Some info on XFS benchmark with delaylog here: > > http://comments.gmane.org/gmane.comp.file-systems.xfs.general/34379 > > For the record: I tested both the "delaylog" and "logbsize=262144" on > two systems running Fedora 14 x86_64 (kernel version > 2.6.35.10-74.fc14.x86_64). > > > Test No. Mount options > 1 rw,noatime > 2 rw,noatime,delaylog > 3 rw,noatime,delaylog,logbsize=262144 > > > System A: Gigabyte EP35C-DS3R Mainbord, Core 2 Quad CPU Q9550 @ 2.83GHz, 4 GB RAM > --------- software RAID 5 using 4 x old Maxtor 7Y250M0 S-ATA I disks > (chunk size 16 kB, using S-ATA ports on main board), XFS > > Test 1: > > Version 1.96 ------Sequential Output------ --Sequential Input- --Random- > Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP > A1 8G 844 96 153107 19 56427 11 2006 98 127174 15 369.4 6 > Latency 13686us 1480ms 1128ms 14986us 136ms 74911us > Version 1.96 ------Sequential Create------ --------Random Create-------- > A1 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- > files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP > 16 104 0 +++++ +++ 115 0 89 0 +++++ +++ 111 0 Only 16 files? You need to test something that takes more than 5 milliseconds to run. Given that XFS can run at >20,000 creates/s for a single threaded sequential create like this, perhaps you should start at 100,000 files (maybe a million) so you get an idea of sustained performance. ..... > I do not see any significant improvement in any of the parameters - > especially when compared to the serious performance degradation (down > to 44% for block write, 42% for block read) on system A. delaylog does not affect the block IO path in any way, so something else is going on there. You need to sort that out before drawing any conclusions. Similarly, you need to test something relevant to your workload, not use a canned benchmarks in the expectation the results are in any way meaningful to your real workload. Also, if you do use a stupid canned benchmark, make sure you configure it to test something relevant to what you are trying to compare... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Optimize RAID0 for max IOPS? 2011-01-24 23:03 ` Dave Chinner @ 2011-01-25 7:39 ` Emmanuel Florac -1 siblings, 0 replies; 46+ messages in thread From: Emmanuel Florac @ 2011-01-25 7:39 UTC (permalink / raw) To: Dave Chinner; +Cc: Wolfgang Denk, linux-raid, Justin Piszcz, xfs On Tue, 25 Jan 2011 10:03:14 +1100, you wrote: > Only 16 files? IIRC this is 16 thousand files - the number is in units of 1024. That is still not enough, though; I generally use -n values of 80 to 160 for tests. -- ------------------------------------------------------------------------ Emmanuel Florac | Direction technique | Intellique | <eflorac@intellique.com> | +33 1 78 94 84 02 ------------------------------------------------------------------------ -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 46+ messages in thread
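To get the sustained small-file numbers Dave asks for below, bonnie++'s -n argument also accepts a file-size range and a directory spread. A sketch with placeholder paths:

    # 128*1024 files of 0..8 KiB spread over 64 directories, skipping the
    # sequential I/O phases (-s 0) so only the metadata tests run
    bonnie++ -d /mnt/bench -s 0 -n 128:8192:0:64 -m A1-meta -u nobody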
* Re: Optimize RAID0 for max IOPS? 2011-01-25 7:39 ` Emmanuel Florac @ 2011-01-25 8:36 ` Dave Chinner -1 siblings, 0 replies; 46+ messages in thread From: Dave Chinner @ 2011-01-25 8:36 UTC (permalink / raw) To: Emmanuel Florac; +Cc: Wolfgang Denk, linux-raid, Justin Piszcz, xfs [ As a small note - if you are going to comment on the results table from a previous message, please don't cut it from your response. Context is important. I pasted the relevant part back in so i can refer back to it in my response. ] On Tue, Jan 25, 2011 at 08:39:00AM +0100, Emmanuel Florac wrote: > Le Tue, 25 Jan 2011 10:03:14 +1100 vous écriviez: > > > Version 1.96 ------Sequential Create------ --------Random Create-------- > > > A1 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- > > > files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP > > > 16 104 0 +++++ +++ 115 0 89 0 +++++ +++ 111 0 > > > > Only 16 files? > > IIRC this is 16 thousands of files. Though this is not enough, I > generally use 80 to 160 for tests. Yes, you're right, the bonnie++ man page states that it is in units of 1024 files. Be nice if there was a "k" to signify that so people who aren't intimately familiar with it's output format can see exactly what was tested.... As it is, a create rate of 104 files/s (note the consistency of units between 2 adjacent numbers!) indicates something else is screwed, because my local test VM on RAID0 gets numbers like this: Version 1.96 ------Sequential Create------ --------Random Create-------- test-4 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 25507 90 +++++ +++ 30472 97 25281 93 +++++ +++ 29077 97 Latency 23864us 204us 21092us 18855us 82us 121us IOWs, create rates of 25k/s and unlink of 30k/s and it is clearly CPU bound. Therein lies the difference: the original numbers have 0% CPU usage, which indicates that the test is blocking. Something is causing the reported test system to be blocked almost all the time. /me looks closer. Oh, despite $subject being "RAID0" the filesystems being tested are on RAID5 and RAID6 with very small chunk sizes on slow SATA drives. This is smelling like a case of barrier IOs on software raid on cheap storage.... Cheers, Dave. -- Dave Chinner david@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Optimize RAID0 for max IOPS? 2011-01-25 8:36 ` Dave Chinner @ 2011-01-25 12:45 ` Wolfgang Denk -1 siblings, 0 replies; 46+ messages in thread From: Wolfgang Denk @ 2011-01-25 12:45 UTC (permalink / raw) To: Dave Chinner; +Cc: Emmanuel Florac, linux-raid, Justin Piszcz, xfs Dear Dave Chinner, In message <20110125083643.GE28803@dastard> you wrote: > > Oh, despite $subject being "RAID0" the filesystems being tested are > on RAID5 and RAID6 with very small chunk sizes on slow SATA drives. > This is smelling like a case of barrier IOs on software raid on > cheap storage.... Right. [Any way to avoid these, btw?] I got side-tracked by the comments about the new (to me) delaylog mount option to XFS; as the results were not exactly as expected I thought it might be interesting to report these. But as the subject says, my current topic is tuning RAID0 to avoid exactly this type of bottleneck, or rather looking for tunable options on RAID0 itself. Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de PLEASE NOTE: Some Quantum Physics Theories Suggest That When the Con- sumer Is Not Directly Observing This Product, It May Cease to Exist or Will Exist Only in a Vague and Undetermined State. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Optimize RAID0 for max IOPS? 2011-01-25 12:45 ` Wolfgang Denk @ 2011-01-25 12:51 ` Emmanuel Florac -1 siblings, 0 replies; 46+ messages in thread From: Emmanuel Florac @ 2011-01-25 12:51 UTC (permalink / raw) To: Wolfgang Denk; +Cc: Dave Chinner, linux-raid, Justin Piszcz, xfs Le Tue, 25 Jan 2011 13:45:09 +0100 Wolfgang Denk <wd@denx.de> écrivait: > > This is smelling like a case of barrier IOs on software raid on > > cheap storage.... > > Right. [Any way to avoid these, btw?] Easy enough, use the "nobarrier" mount option. -- ------------------------------------------------------------------------ Emmanuel Florac | Direction technique | Intellique | <eflorac@intellique.com> | +33 1 78 94 84 02 ------------------------------------------------------------------------ -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 46+ messages in thread
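A minimal sketch of that remount, assuming an XFS filesystem already mounted at /mnt/data (device and mount point are placeholders, not taken from the thread); as noted elsewhere in the thread, running without barriers is only safe when the drive write caches are disabled or battery-backed:

# remount an already-mounted XFS filesystem without write barriers
mount -o remount,nobarrier /mnt/data

# or make it permanent via fstab (single line):
# /dev/md0  /mnt/data  xfs  noatime,nobarrier  0 0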
* Re: Optimize RAID0 for max IOPS? 2011-01-24 14:40 ` CoolCold 2011-01-24 15:25 ` Justin Piszcz @ 2011-01-24 20:43 ` Wolfgang Denk 1 sibling, 0 replies; 46+ messages in thread From: Wolfgang Denk @ 2011-01-24 20:43 UTC (permalink / raw) To: CoolCold; +Cc: linux-raid Dear CoolCold, In message <AANLkTikx4g99-Cf_09kEGfF2mmf4Dnuh2A5gTrtKweDy@mail.gmail.com> you wrote: > > > So can anybody help answering these questions: > > > > - are there any special options when creating the RAID0 to make it > > perform faster for such a use case? > > - are there other tunables, any special MD / LVM / file system / > > read ahead / buffer cache / ... parameters to look for? > XFS is known for its slow speed on metadata operations like updating > file attributes/removing files, but things change after 2.6.35, > where delaylog is used. Quoting Dave Chinner: > < dchinner> Indeed, the biggest concurrency limitation has > traditionally been the transaction commit/journalling code, but that's > a lot more scalable now with delayed logging.... > > So, you may need to benchmark the fs part. Thanks a lot - much appreciated. The first reply that actually was on topic... Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de It is not best to swap horses while crossing the river. - Abraham Lincoln ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Optimize RAID0 for max IOPS? 2011-01-18 21:01 Optimize RAID0 for max IOPS? Wolfgang Denk 2011-01-18 22:18 ` Roberto Spadim 2011-01-18 23:15 ` Stefan /*St0fF*/ Hübner @ 2011-01-25 17:10 ` Christoph Hellwig 2011-01-25 18:41 ` Wolfgang Denk 2 siblings, 1 reply; 46+ messages in thread From: Christoph Hellwig @ 2011-01-25 17:10 UTC (permalink / raw) To: Wolfgang Denk; +Cc: linux-raid On Tue, Jan 18, 2011 at 10:01:12PM +0100, Wolfgang Denk wrote: > Hi, > > I'm going to replace a h/w based RAID system (3ware 9650SE) by a plain > s/w RAID0, because the existing system appears to be seriously limited > in terms of numbers of I/O operations per second. > > Our workload is mixed read / write (something between 80% read / 20% > write and 50% / 50%), consisting of a very large number of usually > very small files. > > There may be 20...50 millions of files, or more. 65% of the files are > smaller than 4 kB; 80% are smaller than 8 kB; 90% are smaller than 16 > kB; 98.4% are smaller than 64 kB. I don't think you even want a RAID0 in that case. For small IOPs you're much better off with a simple concatenation of devices. > The plan is to build a RAID0 from the 4 devices, create a physical > volume and a volume group on the resulting /dev/md?, then create 2 or > 3 logical volumes that will be used as XFS file systems. Especially if you're running XFS, the concatenation will work beautifully for this setup. Make sure that your AG boundaries align to the physical devices, and they can be used completely independently for small IOPs. > Should I do anything different to acchive maximum performance? Make sure to disable the disk write caches and if not using the newest kernel also mount the filesystem with -o nobarrier. With lots of small I/Os and metadata intensive workloads that's usually a lot faster. Also if you have a lot of log traffic an external log device will help a lot. It doesn't need to be large, but it will keep the number of seeks on the other devices down. ^ permalink raw reply [flat|nested] 46+ messages in thread
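A rough sketch of the kind of setup described above, for four equal-size data disks (assumed here to already be concatenated into /dev/md0) plus a small partition on a separate disk used as an external XFS log; the device names, agcount, and log size are illustrative assumptions, not values from the thread, and lining the AG boundaries up exactly with the member disks may require setting agsize explicitly for the real disk sizes:

# one allocation group per underlying disk, external log on /dev/sdf1
mkfs.xfs -d agcount=4 -l logdev=/dev/sdf1,size=128m /dev/md0

# an external log must also be named at mount time
mount -o logdev=/dev/sdf1,noatime,nobarrier /dev/md0 /mnt/data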
* Re: Optimize RAID0 for max IOPS? 2011-01-25 17:10 ` Christoph Hellwig @ 2011-01-25 18:41 ` Wolfgang Denk 2011-01-25 21:35 ` Christoph Hellwig 0 siblings, 1 reply; 46+ messages in thread From: Wolfgang Denk @ 2011-01-25 18:41 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-raid Dear Christoph, In message <20110125171017.GA24921@infradead.org> you wrote: > > > There may be 20...50 millions of files, or more. 65% of the files are > > smaller than 4 kB; 80% are smaller than 8 kB; 90% are smaller than 16 > > kB; 98.4% are smaller than 64 kB. > > I don't think you even want a RAID0 in that case. For small IOPs > you're much better off with a simple concatenation of devices. What exactly do you mean by "concatenation"? LVM striping? At least the discussion here does not show any significant advantages for this concept: http://groups.google.com/group/ubuntu-user-community/web/pick-your-pleasure-raid-0-mdadm-striping-or-lvm-striping > > Should I do anything different to acchive maximum performance? > > Make sure to disable the disk write caches and if not using the newest > kernel also mount the filesystem with -o nobarrier. With lots of small > I/Os and metadata intensive workloads that's usually a lot faster. Tests I did recently indicate that, on the other hand, nobarrier causes a serious degradation of read and write performance (down to some 40% of the values before). > Also if you have a lot of log traffic an external log device will > help a lot. It doesn't need to be large, but it will keep the > number of seeks on the other devices down. Understood, thanks. Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de Never underestimate the bandwidth of a station wagon full of tapes. -- Dr. Warren Jackson, Director, UTCS ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Optimize RAID0 for max IOPS? 2011-01-25 18:41 ` Wolfgang Denk @ 2011-01-25 21:35 ` Christoph Hellwig 2011-01-26 7:16 ` Wolfgang Denk 0 siblings, 1 reply; 46+ messages in thread From: Christoph Hellwig @ 2011-01-25 21:35 UTC (permalink / raw) To: Wolfgang Denk; +Cc: Christoph Hellwig, linux-raid On Tue, Jan 25, 2011 at 07:41:15PM +0100, Wolfgang Denk wrote: > > I don't think you even want a RAID0 in that case. For small IOPs > > you're much better off with a simple concatenation of devices. > > What exactly do you mean by "concatenation"? LVM striping? > At least the discussion here does not show any significant advantages > for this concept: > http://groups.google.com/group/ubuntu-user-community/web/pick-your-pleasure-raid-0-mdadm-striping-or-lvm-striping No, concatenation means not using any striping, but just concatenating the disks linearly, e.g. +-----------------------------------+ | Filesystem | +--------+--------+--------+--------+ | Disk 1 | Disk 2 | Disk 3 | Disk 4 | +--------+--------+--------+--------+ This can be done using the MD linear target, or simply by having multiple PVs in a VG with LVM. > > > Make sure to disable the disk write caches and if not using the newest > > kernel also mount the filesystem with -o nobarrier. With lots of small > > I/Os and metadata intensive workloads that's usually a lot faster. > > Tests I did recently indicate that, on the other hand, nobarrier causes > a serious degradation of read and write performance (down to some 40% > of the values before). Do you have a pointer to your results? ^ permalink raw reply [flat|nested] 46+ messages in thread
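For illustration, a minimal sketch of both routes mentioned above, an MD linear array or an LVM volume group spanning several PVs; all device, VG, and LV names are placeholders, not taken from the thread:

# MD linear target: concatenate four whole disks into one device
mdadm --create /dev/md0 --level=linear --raid-devices=4 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde

# LVM equivalent: one PV per disk, a VG spanning them, and an LV using
# the default (linear, non-striped) allocation policy
pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde
vgcreate vg0 /dev/sdb /dev/sdc /dev/sdd /dev/sde
lvcreate -l 100%FREE -n data vg0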
* Re: Optimize RAID0 for max IOPS? 2011-01-25 21:35 ` Christoph Hellwig @ 2011-01-26 7:16 ` Wolfgang Denk 2011-01-26 8:32 ` Stan Hoeppner 2011-01-26 9:38 ` Christoph Hellwig 0 siblings, 2 replies; 46+ messages in thread From: Wolfgang Denk @ 2011-01-26 7:16 UTC (permalink / raw) To: Christoph Hellwig; +Cc: linux-raid Dear Christoph Hellwig, In message <20110125213523.GA14375@infradead.org> you wrote: > > > What exactly do you mean by "conatenation"? LVM striping? > > At least the discussion here does not show any significant advantages > > for this concept: > > http://groups.google.com/group/ubuntu-user-community/web/pick-your-pleasure-raid-0-mdadm-striping-or-lvm-striping > > No, concatenation means not using any striping, but just concatenating > the disk linearly, e.g. > > +-----------------------------------+ > | Filesystem | > +--------+--------+--------+--------+ > | Disk 1 | Disk 2 | Disk 3 | Disk 4 | > +--------+--------+--------+--------+ > > This can be done using the using the MD linear target, or simply > by having multiple PVs in a VG with LVM. I will not have a single file system, but several, so I'd probably go with LVM. But - when I then create a LV, eventually smaller than any of the disks, will the data (and thus the traffic) be really distri- buted over all drives, or will I not basicly see the same results as when using a single drive? > > Tests if done recently indicate that on the other hand nobarrier causes > > a serious degradation of read and write performance (down to some 40% > > of the values before). > > Do you have a pointer to your results? This was the first set of tests: http://thread.gmane.org/gmane.linux.raid/31269/focus=31419 I've run some more tests on the system called 'B' in this list: # lvcreate -L 32G -n test castor0 Logical volume "test" created # mkfs.xfs /dev/mapper/castor0-test meta-data=/dev/mapper/castor0-test isize=256 agcount=16, agsize=524284 blks = sectsz=512 attr=2 data = bsize=4096 blocks=8388544, imaxpct=25 = sunit=4 swidth=16 blks naming =version 2 bsize=4096 ascii-ci=0 log =internal log bsize=4096 blocks=4096, version=2 = sectsz=512 sunit=4 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 # mount /dev/mapper/castor0-test /mnt/tmp/ # mkdir /mnt/tmp/foo # chown wd.wd /mnt/tmp/foo # bonnie++ -d /mnt/tmp/foo -m xfs -u wd -g wd Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP xfs 16G 425 98 182929 64 46956 41 955 97 201274 83 517.6 30 Latency 42207us 2377ms 195ms 33339us 86675us 84167us Version 1.96 ------Sequential Create------ --------Random Create-------- xfs -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 93 1 +++++ +++ 90 1 123 1 +++++ +++ 127 1 Latency 939ms 2279us 1415ms 307ms 1057us 724ms 1.96,1.96,xfs,1,1295938326,16G,,425,98,182929,64,46956,41,955,97,201274,83,517.6,30,16,,,,,93,1,+++++,+++,90,1,123,1,+++++,+++,127,1,42207us,2377ms,195ms,33339us,86675us,84167us,939ms,2279us,1415ms,307ms,1057us,724ms [[Re-run with larger number of file creates / deletes]] # bonnie++ -d /mnt/tmp/foo -n 128:65536:0:512 -m xfs1 -u wd -g wd Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP xfs1 16G 400 98 175931 63 46970 40 781 
99 181044 73 524.2 30 Latency 48299us 2501ms 210ms 20693us 83729us 85349us Version 1.96 ------Sequential Create------ --------Random Create-------- xfs1 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files:max /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 128:65536:0/512 42 1 25607 99 71 1 38 1 8267 67 34 0 Latency 1410ms 2337us 2116ms 1240ms 44920us 4139ms 1.96,1.96,xfs1,1,1295942356,16G,,400,98,175931,63,46970,40,781,99,181044,73,524.2,30,128,65536,,,512,42,1,25607,99,71,1,38,1,8267,67,34,0,48299us,2501ms,210ms,20693us,83729us,85349us,1410ms,2337us,2116ms,1240ms,44920us,4139ms [[Add delaylog,logbsize=262144]] # mount | grep /mnt/tmp /dev/mapper/castor0-test on /mnt/tmp type xfs (rw) # mount -o remount,noatime,delaylog,logbsize=262144 /mnt/tmp # mount | grep /mnt/tmp /dev/mapper/castor0-test on /mnt/tmp type xfs (rw,noatime,delaylog,logbsize=262144) Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP xfs1 16G 445 98 106201 43 35407 33 939 99 83545 42 490.4 30 Latency 43307us 4614ms 242ms 37420us 195ms 128ms Version 1.96 ------Sequential Create------ --------Random Create-------- xfs1 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files:max /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 128:65536:0/512 308 4 24121 99 2393 30 321 5 22929 99 331 6 Latency 34842ms 1288us 6634ms 87944ms 195us 12239ms 1.96,1.96,xfs1,1,1295968991,16G,,445,98,106201,43,35407,33,939,99,83545,42,490.4,30,128,65536,,,512,308,4,24121,99,2393,30,321,5,22929,99,331,6,43307us,4614ms,242ms,37420us,195ms,128ms,34842ms,1288us,6634ms,87944ms,195us,12239ms [[Note: Block write: drop to 60%, Block read drops to <50%]] [[Add nobarriers]] # mount -o remount,nobarriers /mnt/tmp # mount | grep /mnt/tmp /dev/mapper/castor0-test on /mnt/tmp type xfs (rw,noatime,delaylog,logbsize=262144,nobarriers) # bonnie++ -d /mnt/tmp/foo -n 128:65536:0:512 -m xfs2 -u wd -g wd Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP xfs2 16G 427 98 193950 65 52848 45 987 99 198110 83 496.5 25 Latency 41543us 128ms 186ms 14678us 67639us 76024us Version 1.96 ------Sequential Create------ --------Random Create-------- xfs2 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files:max /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 128:65536:0/512 352 6 24513 99 2604 32 334 5 24921 99 333 6 Latency 32152ms 2307us 4148ms 31036ms 493us 23065ms 1.96,1.96,xfs2,1,1295966513,16G,,427,98,193950,65,52848,45,987,99,198110,83,496.5,25,128,65536,,,512,352,6,24513,99,2604,32,334,5,24921,99,333,6,41543us,128ms,186ms,14678us,67639us,76024us,32152ms,2307us,4148ms,31036ms,493us,23065ms [[Much better. 
But now compare ext4]] # mkfs.ext4 /dev/mapper/castor0-test mke2fs 1.41.12 (17-May-2010) Filesystem label= OS type: Linux Block size=4096 (log=2) Fragment size=4096 (log=2) Stride=4 blocks, Stripe width=16 blocks 2097152 inodes, 8388608 blocks 419430 blocks (5.00%) reserved for the super user First data block=0 Maximum filesystem blocks=4294967296 256 block groups 32768 blocks per group, 32768 fragments per group 8192 inodes per group Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000, 7962624 Writing inode tables: done Creating journal (32768 blocks): done Writing superblocks and filesystem accounting information: done This filesystem will be automatically checked every 22 mounts or 180 days, whichever comes first. Use tune2fs -c or -i to override. # mount /dev/mapper/castor0-test /mnt/tmp # mount | grep /mnt/tmp /dev/mapper/castor0-test on /mnt/tmp type ext4 (rw) # mkdir /mnt/tmp/foo # chown wd.wd /mnt/tmp/foo # bonnie++ -d /mnt/tmp/foo -m ext4 -u wd -g wd ... Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP ext4 16G 248 99 128657 49 61267 49 1026 97 236552 85 710.9 35 Latency 78833us 567ms 2586ms 37539us 61572us 88413us Version 1.96 ------Sequential Create------ --------Random Create-------- ext4 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 14841 52 +++++ +++ 23164 70 20409 78 +++++ +++ 23441 73 Latency 206us 2384us 2372us 2322us 78us 2335us 1.96,1.96,ext4,1,1295954392,16G,,248,99,128657,49,61267,49,1026,97,236552,85,710.9,35,16,,,,,14841,52,+++++,+++,23164,70,20409,78,+++++,+++,23441,73,78833us,567ms,2586ms,37539us,61572us,88413us,206us,2384us,2372us,2322us,78us,2335us [[Only 2/3 of the speed of XFS for block write, but nearly 20% faster for block read. But magnitudes faster for file creates / deletes!]] [[add nobarrier]] # mount -o remount,nobarrier /mnt/tmp # mount | grep /mnt/tmp /dev/mapper/castor0-test on /mnt/tmp type ext4.2 (rw,nobarrier) # bonnie++ -d /mnt/tmp/foo -m ext4 -u wd -g wd Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP ext4.2 16G 241 99 125446 50 57726 55 945 97 215698 87 509.2 54 Latency 81198us 1085ms 2479ms 46401us 111ms 83051us Version 1.96 ------Sequential Create------ --------Random Create-------- ext4 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 12476 63 +++++ +++ 23990 66 21185 82 +++++ +++ 23039 82 Latency 440us 1019us 1094us 238us 25us 215us 1.96,1.96,ext4.2,1,1295996176,16G,,241,99,125446,50,57726,55,945,97,215698,87,509.2,54,16,,,,,12476,63,+++++,+++,23990,66,21185,82,+++++,+++,23039,82,81198us,1085ms,2479ms,46401us,111ms,83051us,440us,1019us,1094us,238us,25us,215us [[Again, degradation of about 10% for block read; with only minod advantages for seq. delete and random create]] Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de For those who like this sort of thing, this is the sort of thing they like. 
- Abraham Lincoln ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Optimize RAID0 for max IOPS? 2011-01-26 7:16 ` Wolfgang Denk @ 2011-01-26 8:32 ` Stan Hoeppner 2011-01-26 8:42 ` Wolfgang Denk 2011-01-26 9:38 ` Christoph Hellwig 1 sibling, 1 reply; 46+ messages in thread From: Stan Hoeppner @ 2011-01-26 8:32 UTC (permalink / raw) To: Wolfgang Denk; +Cc: Christoph Hellwig, linux-raid Wolfgang Denk put forth on 1/26/2011 1:16 AM: > I will not have a single file system, but several, so I'd probably go > with LVM. But - when I then create a LV, eventually smaller than any > of the disks, will the data (and thus the traffic) be really distri- > buted over all drives, or will I not basicly see the same results as > when using a single drive? If creating multiple filesystems then concatenation is probably not what you want, for the reasons you suspect, if you want the IO spread across all 4 disks for all operations on all filesystems. > # lvcreate -L 32G -n test castor0 > Logical volume "test" created > # mkfs.xfs /dev/mapper/castor0-test Is this on that set of 4 low end Maxtor disks? Is the above LV sitting atop RAID 0, RAID 5, or concatenation? > [[Only 2/3 of the speed of XFS for block write, but nearly 20% faster > for block read. But magnitudes faster for file creates / deletes!]] Try adding some concurrency, say 8, to bonnie++ and retest both XFS and ext4. XFS was designed/optimized for parallel workloads, not single thread workloads (although it can extract some concurrency from a single thread workload). XFS really shines with parallel workloads (assuming the underlying hardware isn't junk, and the mdraid/lvm configuration is sane). ext4 will probably always beat XFS performance with single thread workloads, and I don't believe anyone is surprised by that. For most moderate to heavy parallel workloads, XFS usually trounces ext4 (and all other Linux filesystems). -- Stan ^ permalink raw reply [flat|nested] 46+ messages in thread
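A sketch of what such a run could look like, reusing the bonnie++ invocation from earlier in the thread and adding bonnie++ 1.96's concurrency option (-c); the machine label is arbitrary and the concurrency level of 8 is just the example value suggested above:

# same file-count parameters as before, but with 8 concurrent tests
# instead of the default single thread
bonnie++ -d /mnt/tmp/foo -c 8 -n 128:65536:0:512 -m xfs-c8 -u wd -g wd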
* Re: Optimize RAID0 for max IOPS? 2011-01-26 8:32 ` Stan Hoeppner @ 2011-01-26 8:42 ` Wolfgang Denk 0 siblings, 0 replies; 46+ messages in thread From: Wolfgang Denk @ 2011-01-26 8:42 UTC (permalink / raw) To: Stan Hoeppner; +Cc: Christoph Hellwig, linux-raid Dear Stan Hoeppner, In message <4D3FDC31.3010502@hardwarefreak.com> you wrote: > > > # lvcreate -L 32G -n test castor0 > > Logical volume "test" created > > # mkfs.xfs /dev/mapper/castor0-test > > Is this on that set of 4 low end Maxtor disks? Is the above LV sitting atop > RAID 0, RAID 5, or concatenation? No, this is the other system, using 6 x Seagate ST31000524NS on a Marvell MV88SX6081 8-port SATA II PCI-X Controller. LVM is sitting on top of a RAID6 here: md2 : active raid6 sda[0] sdi[5] sdh[4] sde[3] sdd[2] sdb[1] 3907049792 blocks super 1.2 level 6, 16k chunk, algorithm 2 [6/6] [UUUUUU] > Try adding some concurrency, say 8, to bonnie++ and retest both XFS and ext4. OK, will do. Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de The price of curiosity is a terminal experience. - Terry Pratchett, _The Dark Side of the Sun_ ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: Optimize RAID0 for max IOPS? 2011-01-26 7:16 ` Wolfgang Denk 2011-01-26 8:32 ` Stan Hoeppner @ 2011-01-26 9:38 ` Christoph Hellwig 2011-01-26 9:41 ` CoolCold 1 sibling, 1 reply; 46+ messages in thread From: Christoph Hellwig @ 2011-01-26 9:38 UTC (permalink / raw) To: Wolfgang Denk; +Cc: Christoph Hellwig, linux-raid On Wed, Jan 26, 2011 at 08:16:16AM +0100, Wolfgang Denk wrote: > I will not have a single file system, but several, so I'd probably go > with LVM. But - when I then create a LV, eventually smaller than any > of the disks, will the data (and thus the traffic) be really distri- > buted over all drives, or will I not basicly see the same results as > when using a single drive? Think about it: if you're doing small IOPs, they usually are smaller than the stripe size and you will hit only one disk anyway. But with a raid0, which disk you hit is relatively unpredictable. With a concatenation aligned to the AGs, XFS will distribute processes writing data to the different AGs and thus the different disks, and you can reliably get performance out of them. If you have multiple filesystems the setup depends a lot on the workloads you plan to put on the filesystems. If all of the filesystems on it are busy at the same time, just assigning disks to filesystems probably gives you the best performance. If they are busy at different times, or some are not busy at all, you first want to partition the disks into areas for each filesystem and then concatenate them into volumes for each filesystem. > [[Note: Block write: drop to 60%, Block read drops to <50%]] How is the cpu load? delaylog trades I/O operations for cpu utilization. Together with a raid6, which apparently is the system you use here, it might overload your system. And btw, in future please state that your numbers are for a totally different setup than the one you're asking questions for. Comparing a raid6 setup to striping/concatenation is completely irrelevant. > > [[Add nobarriers]] > > # mount -o remount,nobarriers /mnt/tmp > # mount | grep /mnt/tmp > /dev/mapper/castor0-test on /mnt/tmp type xfs (rw,noatime,delaylog,logbsize=262144,nobarriers) a) the option is called nobarrier b) it looks like your mount implementation is really buggy as it shows random options that weren't actually parsed and accepted by the filesystem. > [[Again, degradation of about 10% for block read; with only minod > advantages for seq. delete and random create]] I really don't trust the numbers. nobarrier sends down fewer I/O requests, and avoids all kinds of queue stalls. How repeatable are these benchmarks? Do you also see it using a less hacky benchmark than bonnie++? ^ permalink raw reply [flat|nested] 46+ messages in thread
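A sketch of that second layout for four disks and two filesystems, using LVM; the partition, VG, and LV names are placeholders. Each disk carries one partition per filesystem, and each filesystem gets a linear volume built from one partition on every disk, so whichever filesystem is busy can still spread its AGs over all spindles:

# assume each disk was partitioned as sdX1 (for fs1) and sdX2 (for fs2)
pvcreate /dev/sd[b-e]1 /dev/sd[b-e]2

# one VG per filesystem, concatenating the matching partitions
vgcreate vg_fs1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
vgcreate vg_fs2 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde2

# linear (default, non-striped) LVs spanning all four disks
lvcreate -l 100%FREE -n fs1 vg_fs1
lvcreate -l 100%FREE -n fs2 vg_fs2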
* Re: Optimize RAID0 for max IOPS? 2011-01-26 9:38 ` Christoph Hellwig @ 2011-01-26 9:41 ` CoolCold 0 siblings, 0 replies; 46+ messages in thread From: CoolCold @ 2011-01-26 9:41 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Wolfgang Denk, linux-raid On Wed, Jan 26, 2011 at 12:38 PM, Christoph Hellwig <hch@infradead.org> wrote: > On Wed, Jan 26, 2011 at 08:16:16AM +0100, Wolfgang Denk wrote: >> I will not have a single file system, but several, so I'd probably go >> with LVM. But - when I then create a LV, eventually smaller than any >> of the disks, will the data (and thus the traffic) be really distri- >> buted over all drives, or will I not basicly see the same results as >> when using a single drive? > > Think about it: if you're doing small IOPs, they usually are smaller > than the stripe size and you will hit only one disk anyway. But with > a raid0 which disk you hit is relatively unpredictable. With a > concatentation aligned to the AGs XFS will distribute processes writing > data to the different AGs and thus the different disks, and you can > reliably get performance out of them. > > If you have multiple filesystems the setup depends a lot on the > workloads you plan to put on the filesystems. If all of the filesystems > on it are busy at the same time just assigning disks to filesystems > probably gives you the best performace. If they are busy at different > times, or some are not busy at all you first want to partition the disk > into areas for each filesystem and then concatenate them into volumes > for each filesystem. > > >> [[Note: Block write: drop to 60%, Block read drops to <50%]] > > How is the cpu load? delaylog trades I/O operations for cpu > utilization. Together with a raid6, which apparently is the system you > use here i might overload your system. > > And btw, in future please state you have numbers for a totally different > setup then the one you're asking questions for. Comparing a raid6 setup > to striping/concatenation is completely irrelevant. > >> >> [[Add nobarriers]] >> >> # mount -o remount,nobarriers /mnt/tmp >> # mount | grep /mnt/tmp >> /dev/mapper/castor0-test on /mnt/tmp type xfs (rw,noatime,delaylog,logbsize=262144,nobarriers) > > a) the option is called nobarrier > b) it looks like your mount implementation is really buggy as it shows > random options that weren't actually parsed and accepted by the > filesystem. cat /proc/mounts may help i guess > >> [[Again, degradation of about 10% for block read; with only minod >> advantages for seq. delete and random create]] > > I really don't trust the numbers. nobarrier sends down less I/O > requests, and avoids all kinds of queue stalls. How repetable are these > benchmarks? Do you also see it using a less hacky benchmark than > bonnier++? > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Best regards, [COOLCOLD-RIPN] -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 46+ messages in thread
end of thread, other threads:[~2011-01-26 9:41 UTC | newest] Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-01-18 21:01 Optimize RAID0 for max IOPS? Wolfgang Denk 2011-01-18 22:18 ` Roberto Spadim 2011-01-19 7:04 ` Wolfgang Denk 2011-01-18 23:15 ` Stefan /*St0fF*/ Hübner 2011-01-19 0:05 ` Roberto Spadim 2011-01-19 7:11 ` Wolfgang Denk 2011-01-19 8:18 ` Stefan /*St0fF*/ Hübner 2011-01-19 8:29 ` Jaap Crezee 2011-01-19 9:32 ` Jan Kasprzak 2011-01-19 7:10 ` Wolfgang Denk 2011-01-19 19:21 ` Wolfgang Denk 2011-01-19 19:50 ` Roberto Spadim 2011-01-19 22:36 ` Stefan /*St0fF*/ Hübner 2011-01-19 23:09 ` Roberto Spadim 2011-01-19 23:18 ` Roberto Spadim 2011-01-20 2:48 ` Keld Jørn Simonsen 2011-01-20 3:53 ` Roberto Spadim 2011-01-21 19:34 ` Wolfgang Denk 2011-01-21 20:03 ` Roberto Spadim 2011-01-21 20:04 ` Roberto Spadim 2011-01-24 14:40 ` CoolCold 2011-01-24 15:25 ` Justin Piszcz 2011-01-24 15:25 ` Justin Piszcz 2011-01-24 20:48 ` Wolfgang Denk 2011-01-24 20:48 ` Wolfgang Denk 2011-01-24 21:57 ` Wolfgang Denk 2011-01-24 21:57 ` Wolfgang Denk 2011-01-24 23:03 ` Dave Chinner 2011-01-24 23:03 ` Dave Chinner 2011-01-25 7:39 ` Emmanuel Florac 2011-01-25 7:39 ` Emmanuel Florac 2011-01-25 8:36 ` Dave Chinner 2011-01-25 8:36 ` Dave Chinner 2011-01-25 12:45 ` Wolfgang Denk 2011-01-25 12:45 ` Wolfgang Denk 2011-01-25 12:51 ` Emmanuel Florac 2011-01-25 12:51 ` Emmanuel Florac 2011-01-24 20:43 ` Wolfgang Denk 2011-01-25 17:10 ` Christoph Hellwig 2011-01-25 18:41 ` Wolfgang Denk 2011-01-25 21:35 ` Christoph Hellwig 2011-01-26 7:16 ` Wolfgang Denk 2011-01-26 8:32 ` Stan Hoeppner 2011-01-26 8:42 ` Wolfgang Denk 2011-01-26 9:38 ` Christoph Hellwig 2011-01-26 9:41 ` CoolCold