From: Roberto Spadim <roberto@spadim.com.br>
To: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org, neilb@suse.de, axboe@kernel.dk
Subject: Re: [patch 2/3 v3] raid1: read balance chooses idlest disk for SSD
Date: Mon, 2 Jul 2012 01:33:11 -0300
Message-ID: <CABYL=Tofo65o7c5kg56v6yP-1Lf0NnkHg1EpX5cmpN-JEf-1-A@mail.gmail.com>
In-Reply-To: <CABYL=TpnJpUUNouAE3QKrPj-kewZ9jE4g8PVw3jFjmTDP7hRDw@mail.gmail.com>

Note that if you don't want this algorithm, you could use:
distance time = 1
read time = 0
penalty = 0
and it would behave like today's implementation... (OK, I still have to
check whether this also covers the single-disk full-array sequential
read case, but it's close.)
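A minimal userspace sketch of that degenerate case (the cost() helper
and its parameters are made up for illustration; this is not the md
code):

#include <stdio.h>

/* cost = (distance * dist_time + blocks * read_time) * (1 + penalty).
 * With dist_time = 1, read_time = 0 and penalty = 0 the estimate
 * degenerates to the seek distance alone, which is roughly what the
 * current read_balance() compares. */
static double cost(double distance, double blocks,
                   double dist_time, double read_time, double penalty)
{
        return (distance * dist_time + blocks * read_time) * (1.0 + penalty);
}

int main(void)
{
        printf("%.1f\n", cost(20000, 100, 1.0, 0.0, 0.0)); /* 20000.0 */
        return 0;
}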

2012/7/2 Roberto Spadim <roberto@spadim.com.br>:
> Hmm, well, that's true... there is a queue inside the disk hardware
> that we can't measure... but if you want, I can run tests for you :)
> I used a slightly different configuration some time ago: instead of an
> SSD and a hard disk, I used a 7200rpm disk and a 15000rpm disk. The
> "time based" algorithm ran nicely in that case; maybe it gives only a
> little extra performance (maybe none). As I said, the mean gain I
> measured was about 1% (I tested different disk speeds and SSD+disk
> mixes: an OCZ Vertex 2, a 500GB SATA 7200rpm and a 142GB SAS 15000rpm).
> Someone else on the kernel list tested it too, but never confirmed
> whether the gain was a real mean improvement or just measurement error.
>
> When I did this I derived some 'empirical' values to 'tune' the
> algorithm. I don't remember all the 'theory', but I did something
> like this:
>
>
> 1) seek term: distance * (time per distance unit)
>    The distance unit: I don't remember exactly, but I think it's
>    1 block = 512 bytes, right? Anyway, just check the idea...
>    For disks, the time per distance unit is roughly:
>        (one revolution time) / (total disk capacity in distance units)
>        one revolution time = 60/rpm, for example:
>            7200 rpm => 120 Hz => 8.333ms = 8333us (close to the ~10ms
>            random access time in the disk spec)
>            15000 rpm => 250 Hz => 4ms = 4000us (close to the ~5ms in
>            the disk spec)
>    For an SSD: 0 seconds.
>        7200rpm, 500GB: 500*(1024*1024*1024/512) = 1048576000 blocks;
>            8333us / 1048576000 blocks = 0.000'007'946'968'078 us/block
>        15000rpm, 142GB: 142*(1024*1024*1024/512) = 297795584 blocks;
>            4000us / 297795584 blocks = 0.000'013'432'032'625 us/block
>        ssd: 0 us/block
>    So:
>        0.000007946 us/block for 7200rpm,
>        0.000013432 us/block for 15000rpm,
>        0 for the ssd
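> A rough userspace C sketch of that derivation (a throwaway helper of
> my own, not md code; capacities and rpm as above):
>
> #include <stdio.h>
>
> /* microseconds of seek per block of distance, modelled as
>  * (one revolution time) / (capacity in 512-byte blocks) */
> static double seek_us_per_block(double capacity_gb, double rpm)
> {
>         double blocks = capacity_gb * 1024 * 1024 * 1024 / 512;
>         double rev_us = 60.0 * 1000000.0 / rpm;
>
>         return rev_us / blocks;
> }
>
> int main(void)
> {
>         /* ~0.0000079 us/block and ~0.0000134 us/block */
>         printf("7200rpm 500GB:  %.12f us/block\n", seek_us_per_block(500, 7200));
>         printf("15000rpm 142GB: %.12f us/block\n", seek_us_per_block(142, 15000));
>         printf("ssd:            0 us/block\n");
>         return 0;
> }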
>
>
>
> 2) transfer term: (blocks to read/write) * (time to read/write one block)
>    For this part I put dd to work:
>        dd if=/dev/sda of=/dev/null (there were some flags to bypass the
>        cache too, but I don't remember them now...)
>    and used iostat -d 1 -k to get the mean read throughput.
>    I don't remember the exact numbers, but they were close to this:
>        ssd - 230MB/s = 230*1024*1024/512 => 471040 blocks/second
>            => 0.000'002'122 s/block => 2.122us/block
>        hd 7200 - 120MB/s => 245760 blocks/second
>            => 0.000'004'069 s/block => 4.069us/block
>        hd 15000 - 170MB/s => 348160 blocks/second
>            => 0.000'002'872 s/block => 2.872us/block
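> The same arithmetic as a sketch (illustrative userspace C only):
>
> #include <stdio.h>
>
> /* microseconds to transfer one 512-byte block, from the sequential
>  * throughput measured with dd + iostat */
> static double read_us_per_block(double mb_per_s)
> {
>         double blocks_per_s = mb_per_s * 1024 * 1024 / 512;
>
>         return 1000000.0 / blocks_per_s;
> }
>
> int main(void)
> {
>         printf("ssd 230MB/s:      %.3f us/block\n", read_us_per_block(230)); /* ~2.12 */
>         printf("hd 7200 120MB/s:  %.3f us/block\n", read_us_per_block(120)); /* ~4.07 */
>         printf("hd 15000 170MB/s: %.3f us/block\n", read_us_per_block(170)); /* ~2.87 */
>         return 0;
> }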
>
> 3) non-sequential penalty
>    Here I ran two dd processes at once (the second started a few
>    seconds after the first) and noted the new MB/s values:
>        ssd: drops a bit, but not much: 230 -> 200
>        hd 7200: 120 -> 90
>        hd 15000: 170 -> 150
>
>    From these losses I derived a 'penalty' value:
>        (230-200)/230 = 13.043%
>        (120-90)/120 = 25%
>        (170-150)/170 = 11.76%
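> In code form (same numbers, purely illustrative):
>
> #include <stdio.h>
>
> /* non-sequential penalty: fractional throughput loss between one
>  * sequential dd stream and two concurrent ones */
> static double penalty(double seq_mb_s, double mixed_mb_s)
> {
>         return (seq_mb_s - mixed_mb_s) / seq_mb_s;
> }
>
> int main(void)
> {
>         printf("ssd:      %.5f\n", penalty(230, 200)); /* ~0.13043 */
>         printf("hd 7200:  %.5f\n", penalty(120, 90));  /* 0.25000  */
>         printf("hd 15000: %.5f\n", penalty(170, 150)); /* ~0.11765 */
>         return 0;
> }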
>
> I don't remember whether I still applied the penalty when distance = 0,
> or whether I handled it like today's implementation, which keeps
> selecting the previous disk when sequentially reading the whole md
> device.
>
> ======
> With these numbers, here are the selections the algorithm would be
> expected to make.
> sda = ssd, sdb = 15000rpm, sdc = 7200rpm
>
> Disk head positions (sda | sdb | sdc): 0 | 0 | 0
> Read 100 blocks at position 20000...
> sda => distance = 20000, estimated time = (20000*0 + 2.122*100) plus
>        the 13.043% penalty, in other words:
>                         (      0 + 212.2) * 1.13043 = 239.877246
> sdb => distance = 20000, estimated time = (20000*0.000013432 +
>        2.872*100) plus the 11.76% penalty:
>                         (0.26864 + 287.2) * 1.1176  = 321.274952064
> sdc => distance = 20000, estimated time = (20000*0.000007946 +
>        4.069*100) plus the 25% penalty:
>                         (0.15892 + 406.9) * 1.25    = 508.82365
>         HERE WE SELECT sda (239.877)
>
> Disk head positions: 200 | 0 | 0
> Read 100 blocks at position 0...
> sda => distance = 200, estimated time = (200*0 + 2.122*100) plus the
>        13.043% penalty:
>                         (      0 + 212.2) * 1.13043 = 239.877246
> sdb => distance = 0, estimated time = (0*0.000013432 + 2.872*100),
>        no penalty since the head is already at the right place:
>                         (      0 + 287.2) * 1       = 287.2
> sdc => distance = 0, estimated time = (0*0.000007946 + 4.069*100),
>        no penalty:
>                         (      0 + 406.9) * 1       = 406.9
>         sda again...
>         Note that sda will always be selected, since it is fastest on
>         distance (0 seconds) and has the highest transfer rate.
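> Putting the three parameters together, a throwaway userspace sketch
> that reproduces the two selections above (the struct and function
> names are mine, this is not the md code):
>
> #include <stdio.h>
>
> struct model { double seek_us_per_block, read_us_per_block, penalty, head; };
>
> /* estimated service time (us) for reading 'blocks' blocks at 'pos' */
> static double estimate(const struct model *d, double pos, double blocks)
> {
>         double dist = pos > d->head ? pos - d->head : d->head - pos;
>         double pen = dist ? d->penalty : 0.0; /* no penalty if already in place */
>
>         return (dist * d->seek_us_per_block + blocks * d->read_us_per_block)
>                 * (1.0 + pen);
> }
>
> int main(void)
> {
>         struct model sda = { 0.0,         2.122, 0.13043, 0 }; /* ssd      */
>         struct model sdb = { 0.000013432, 2.872, 0.1176,  0 }; /* 15000rpm */
>         struct model sdc = { 0.000007946, 4.069, 0.25,    0 }; /* 7200rpm  */
>
>         /* heads at 0|0|0, read 100 blocks at 20000:
>          * ~239.9 vs ~321.3 vs ~508.8 -> sda */
>         printf("%.3f %.3f %.3f\n", estimate(&sda, 20000, 100),
>                estimate(&sdb, 20000, 100), estimate(&sdc, 20000, 100));
>
>         /* heads at 200|0|0, read 100 blocks at 0:
>          * ~239.9 vs 287.2 vs 406.9 -> sda again */
>         sda.head = 200;
>         printf("%.3f %.3f %.3f\n", estimate(&sda, 0, 100),
>                estimate(&sdb, 0, 100), estimate(&sdc, 0, 100));
>         return 0;
> }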
>
> That's where my algorithm didn't work well... (it knows nothing about
> past requests or the queue, only the current read).
>
> But now, with someone who knows the kernel code, we have the
> information about pending requests =D
> I think we can walk the queue and calculate the total estimated
> time =), or not?
>         For each pending request we would calculate these times and
> sum them, then select the disk with the smallest total (roughly as in
> the sketch below).
>         I never coded this part, since I didn't know how to get the
> information out of the queue in the kernel =( and my hobby time ran
> out ='(
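> A sketch of that idea only; neither the structures nor the queue walk
> correspond to the real md/block-layer code:
>
> #include <stdio.h>
>
> /* hypothetical: a pending request described only by start and length */
> struct pending { double pos, blocks; };
>
> struct disk_model {
>         double seek_us_per_block, read_us_per_block, penalty, head;
> };
>
> /* estimated time of one request; the head ends up after it */
> static double one_request(struct disk_model *d, double pos, double blocks)
> {
>         double dist = pos > d->head ? pos - d->head : d->head - pos;
>         double pen = dist ? d->penalty : 0.0;
>
>         d->head = pos + blocks;
>         return (dist * d->seek_us_per_block + blocks * d->read_us_per_block)
>                 * (1.0 + pen);
> }
>
> /* total estimated time: everything already queued plus the new request;
>  * the disk with the smallest total would be chosen */
> static double total_time(struct disk_model d, const struct pending *q,
>                          int n, double pos, double blocks)
> {
>         double t = 0;
>         int i;
>
>         for (i = 0; i < n; i++)
>                 t += one_request(&d, q[i].pos, q[i].blocks);
>         return t + one_request(&d, pos, blocks);
> }
>
> int main(void)
> {
>         struct disk_model ssd = { 0.0, 2.122, 0.13043, 0 };
>         struct pending q[] = { { 20000, 100 }, { 0, 100 } };
>
>         /* two queued reads plus a new 100-block read at 50000 */
>         printf("%.3f\n", total_time(ssd, q, 2, 50000, 100));
>         return 0;
> }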
>
> Thanks for reading...



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial