From: Roberto Spadim <roberto@spadim.com.br>
To: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org, neilb@suse.de, axboe@kernel.dk
Subject: Re: [patch 2/3 v3] raid1: read balance chooses idlest disk for SSD
Date: Mon, 2 Jul 2012 01:33:11 -0300
Message-ID: <CABYL=Tofo65o7c5kg56v6yP-1Lf0NnkHg1EpX5cmpN-JEf-1-A@mail.gmail.com>
In-Reply-To: <CABYL=TpnJpUUNouAE3QKrPj-kewZ9jE4g8PVw3jFjmTDP7hRDw@mail.gmail.com>

Note that if you don't want this algorithm, you could use:
distance time = 1
read time = 0
penalty = 0
and it would behave like today's implementation... (OK, I still have to
check whether this also covers the single-disk full-array sequential
read case, but it's close.)
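A minimal userspace sketch of that degenerate case (the cost() helper
and its parameters are made up for illustration; this is not the md
code):

#include <stdio.h>

/* cost = (distance * dist_time + blocks * read_time) * (1 + penalty).
 * With dist_time = 1, read_time = 0 and penalty = 0 the estimate
 * degenerates to the seek distance alone, which is roughly what the
 * current read_balance() compares. */
static double cost(double distance, double blocks,
                   double dist_time, double read_time, double penalty)
{
        return (distance * dist_time + blocks * read_time) * (1.0 + penalty);
}

int main(void)
{
        printf("%.1f\n", cost(20000, 100, 1.0, 0.0, 0.0)); /* 20000.0 */
        return 0;
}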

2012/7/2 Roberto Spadim <roberto@spadim.com.br>:
> Hmm, well, that's true... there is a queue inside the disk hardware
> that we can't measure... but if you want, I can run tests for you :)
> I used a slightly different configuration some time ago: instead of an
> SSD and a hard disk, I used a 7200rpm disk and a 15000rpm disk. The
> "time based" algorithm ran nicely in that case; maybe it gives only a
> little extra performance (maybe none). As I said, the mean gain I
> measured was about 1% (I tested different disk speeds and SSD+disk
> mixes: an OCZ Vertex 2, a 500GB SATA 7200rpm and a 142GB SAS 15000rpm).
> Someone else on the kernel list tested it too, but never confirmed
> whether the gain was a real mean improvement or just measurement error.
>
> When I did this I derived some 'empirical' values to 'tune' the
> algorithm. I don't remember all the 'theory', but I did something
> like this:
>
>
> 1) seek term: distance * (time per distance unit)
>    The distance unit: I don't remember exactly, but I think it's
>    1 block = 512 bytes, right? Anyway, just check the idea...
>    For disks, the time per distance unit is roughly:
>        (one revolution time) / (total disk capacity in distance units)
>        one revolution time = 60/rpm, for example:
>            7200 rpm => 120 Hz => 8.333ms = 8333us (close to the ~10ms
>            random access time in the disk spec)
>            15000 rpm => 250 Hz => 4ms = 4000us (close to the ~5ms in
>            the disk spec)
>    For an SSD: 0 seconds.
>        7200rpm, 500GB: 500*(1024*1024*1024/512) = 1048576000 blocks;
>            8333us / 1048576000 blocks = 0.000'007'946'968'078 us/block
>        15000rpm, 142GB: 142*(1024*1024*1024/512) = 297795584 blocks;
>            4000us / 297795584 blocks = 0.000'013'432'032'625 us/block
>        ssd: 0 us/block
>    So:
>        0.000007946 us/block for 7200rpm,
>        0.000013432 us/block for 15000rpm,
>        0 for the ssd
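> A rough userspace C sketch of that derivation (a throwaway helper of
> my own, not md code; capacities and rpm as above):
>
> #include <stdio.h>
>
> /* microseconds of seek per block of distance, modelled as
>  * (one revolution time) / (capacity in 512-byte blocks) */
> static double seek_us_per_block(double capacity_gb, double rpm)
> {
>         double blocks = capacity_gb * 1024 * 1024 * 1024 / 512;
>         double rev_us = 60.0 * 1000000.0 / rpm;
>
>         return rev_us / blocks;
> }
>
> int main(void)
> {
>         /* ~0.0000079 us/block and ~0.0000134 us/block */
>         printf("7200rpm 500GB:  %.12f us/block\n", seek_us_per_block(500, 7200));
>         printf("15000rpm 142GB: %.12f us/block\n", seek_us_per_block(142, 15000));
>         printf("ssd:            0 us/block\n");
>         return 0;
> }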
>
>
>
> 2) transfer term: (blocks to read/write) * (time to read/write one block)
>    For this part I put dd to work:
>        dd if=/dev/sda of=/dev/null (there were some flags to bypass the
>        cache too, but I don't remember them now...)
>    and used iostat -d 1 -k to get the mean read throughput.
>    I don't remember the exact numbers, but they were close to this:
>        ssd - 230MB/s = 230*1024*1024/512 => 471040 blocks/second
>            => 0.000'002'122 s/block => 2.122us/block
>        hd 7200 - 120MB/s => 245760 blocks/second
>            => 0.000'004'069 s/block => 4.069us/block
>        hd 15000 - 170MB/s => 348160 blocks/second
>            => 0.000'002'872 s/block => 2.872us/block
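> The same arithmetic as a sketch (illustrative userspace C only):
>
> #include <stdio.h>
>
> /* microseconds to transfer one 512-byte block, from the sequential
>  * throughput measured with dd + iostat */
> static double read_us_per_block(double mb_per_s)
> {
>         double blocks_per_s = mb_per_s * 1024 * 1024 / 512;
>
>         return 1000000.0 / blocks_per_s;
> }
>
> int main(void)
> {
>         printf("ssd 230MB/s:      %.3f us/block\n", read_us_per_block(230)); /* ~2.12 */
>         printf("hd 7200 120MB/s:  %.3f us/block\n", read_us_per_block(120)); /* ~4.07 */
>         printf("hd 15000 170MB/s: %.3f us/block\n", read_us_per_block(170)); /* ~2.87 */
>         return 0;
> }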
>
> 3) non-sequential penalty
>    Here I ran two dd processes at once (the second started a few
>    seconds after the first) and noted the new MB/s values:
>        ssd: drops a bit, but not much: 230 -> 200
>        hd 7200: 120 -> 90
>        hd 15000: 170 -> 150
>
>    From these losses I derived a 'penalty' value:
>        (230-200)/230 = 13.043%
>        (120-90)/120 = 25%
>        (170-150)/170 = 11.76%
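> In code form (same numbers, purely illustrative):
>
> #include <stdio.h>
>
> /* non-sequential penalty: fractional throughput loss between one
>  * sequential dd stream and two concurrent ones */
> static double penalty(double seq_mb_s, double mixed_mb_s)
> {
>         return (seq_mb_s - mixed_mb_s) / seq_mb_s;
> }
>
> int main(void)
> {
>         printf("ssd:      %.5f\n", penalty(230, 200)); /* ~0.13043 */
>         printf("hd 7200:  %.5f\n", penalty(120, 90));  /* 0.25000  */
>         printf("hd 15000: %.5f\n", penalty(170, 150)); /* ~0.11765 */
>         return 0;
> }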
>
> I don't remember whether I still applied the penalty when distance = 0,
> or whether I handled it like today's implementation, which keeps
> selecting the previous disk when sequentially reading the whole md
> device.
>
> ======
> With these numbers, here are the selections the algorithm would be
> expected to make.
> sda = ssd, sdb = 15000rpm, sdc = 7200rpm
>
> Disk head positions (sda | sdb | sdc): 0 | 0 | 0
> Read 100 blocks at position 20000...
> sda => distance = 20000, estimated time = (20000*0 + 2.122*100) plus
>        the 13.043% penalty, in other words:
>                         (      0 + 212.2) * 1.13043 = 239.877246
> sdb => distance = 20000, estimated time = (20000*0.000013432 +
>        2.872*100) plus the 11.76% penalty:
>                         (0.26864 + 287.2) * 1.1176  = 321.274952064
> sdc => distance = 20000, estimated time = (20000*0.000007946 +
>        4.069*100) plus the 25% penalty:
>                         (0.15892 + 406.9) * 1.25    = 508.82365
>         HERE WE SELECT sda (239.877)
>
> Disk head positions: 200 | 0 | 0
> Read 100 blocks at position 0...
> sda => distance = 200, estimated time = (200*0 + 2.122*100) plus the
>        13.043% penalty:
>                         (      0 + 212.2) * 1.13043 = 239.877246
> sdb => distance = 0, estimated time = (0*0.000013432 + 2.872*100),
>        no penalty since the head is already at the right place:
>                         (      0 + 287.2) * 1       = 287.2
> sdc => distance = 0, estimated time = (0*0.000007946 + 4.069*100),
>        no penalty:
>                         (      0 + 406.9) * 1       = 406.9
>         sda again...
>         Note that sda will always be selected, since it is fastest on
>         distance (0 seconds) and has the highest transfer rate.
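> Putting the three parameters together, a throwaway userspace sketch
> that reproduces the two selections above (the struct and function
> names are mine, this is not the md code):
>
> #include <stdio.h>
>
> struct model { double seek_us_per_block, read_us_per_block, penalty, head; };
>
> /* estimated service time (us) for reading 'blocks' blocks at 'pos' */
> static double estimate(const struct model *d, double pos, double blocks)
> {
>         double dist = pos > d->head ? pos - d->head : d->head - pos;
>         double pen = dist ? d->penalty : 0.0; /* no penalty if already in place */
>
>         return (dist * d->seek_us_per_block + blocks * d->read_us_per_block)
>                 * (1.0 + pen);
> }
>
> int main(void)
> {
>         struct model sda = { 0.0,         2.122, 0.13043, 0 }; /* ssd      */
>         struct model sdb = { 0.000013432, 2.872, 0.1176,  0 }; /* 15000rpm */
>         struct model sdc = { 0.000007946, 4.069, 0.25,    0 }; /* 7200rpm  */
>
>         /* heads at 0|0|0, read 100 blocks at 20000:
>          * ~239.9 vs ~321.3 vs ~508.8 -> sda */
>         printf("%.3f %.3f %.3f\n", estimate(&sda, 20000, 100),
>                estimate(&sdb, 20000, 100), estimate(&sdc, 20000, 100));
>
>         /* heads at 200|0|0, read 100 blocks at 0:
>          * ~239.9 vs 287.2 vs 406.9 -> sda again */
>         sda.head = 200;
>         printf("%.3f %.3f %.3f\n", estimate(&sda, 0, 100),
>                estimate(&sdb, 0, 100), estimate(&sdc, 0, 100));
>         return 0;
> }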
>
> That's where my algorithm didn't work well... (it knows nothing about
> past requests or the queue, only the current read).
>
> But now, with someone who knows the kernel code, we have the
> information about pending requests =D
> I think we can walk the queue and calculate the total estimated
> time =), or not?
>         For each pending request we would calculate these times and
> sum them, then select the disk with the smallest total (roughly as in
> the sketch below).
>         I never coded this part, since I didn't know how to get the
> information out of the queue in the kernel =( and my hobby time ran
> out ='(
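> A sketch of that idea only; neither the structures nor the queue walk
> correspond to the real md/block-layer code:
>
> #include <stdio.h>
>
> /* hypothetical: a pending request described only by start and length */
> struct pending { double pos, blocks; };
>
> struct disk_model {
>         double seek_us_per_block, read_us_per_block, penalty, head;
> };
>
> /* estimated time of one request; the head ends up after it */
> static double one_request(struct disk_model *d, double pos, double blocks)
> {
>         double dist = pos > d->head ? pos - d->head : d->head - pos;
>         double pen = dist ? d->penalty : 0.0;
>
>         d->head = pos + blocks;
>         return (dist * d->seek_us_per_block + blocks * d->read_us_per_block)
>                 * (1.0 + pen);
> }
>
> /* total estimated time: everything already queued plus the new request;
>  * the disk with the smallest total would be chosen */
> static double total_time(struct disk_model d, const struct pending *q,
>                          int n, double pos, double blocks)
> {
>         double t = 0;
>         int i;
>
>         for (i = 0; i < n; i++)
>                 t += one_request(&d, q[i].pos, q[i].blocks);
>         return t + one_request(&d, pos, blocks);
> }
>
> int main(void)
> {
>         struct disk_model ssd = { 0.0, 2.122, 0.13043, 0 };
>         struct pending q[] = { { 20000, 100 }, { 0, 100 } };
>
>         /* two queued reads plus a new 100-block read at 50000 */
>         printf("%.3f\n", total_time(ssd, q, 2, 50000, 100));
>         return 0;
> }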
>
> Thanks for reading...



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial