* Re: unfair io behaviour for high load interactive use still present  in 2.6.31
       [not found] <dkAXF-4Gg-25@gated-at.bofh.it>
@ 2009-09-15 11:02 ` Daniel J Blueman
  2009-09-15 23:21   ` Tobias Oetiker
       [not found] ` <dkHFQ-1FQ-3@gated-at.bofh.it>
  1 sibling, 1 reply; 9+ messages in thread
From: Daniel J Blueman @ 2009-09-15 11:02 UTC (permalink / raw)
  To: Tobias Oetiker; +Cc: Linux Kernel

On Sep 15, 8:50 am, Tobias Oetiker <t...@oetiker.ch> wrote:
> Experts,
>
> We run several busy NFS file servers with Areca HW Raid + LVM2 + ext3
>
> We find that the read bandwidth falls dramatically and the
> response times go up to several seconds as soon as the system
> comes under heavy write strain.

It's worthwhile checking:
 - that the ext3 filesystem starts at a stripe-aligned offset
 - that the ext3 filesystem was created with the correct stripe-width
and stride (chunk) size
 - due to the larger amount of memory, ext4 may be a big win (due to
delayed allocate), if you'll stay with a newer kernel
 - if you have battery backup at the right levels:
  - performance may be better mounting the ext3 filesystem with
'barrier=0'
  - performance may improve mounting 'data=writeback'
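
A rough sketch of how the above might be checked/applied -- the chunk
size, data-disk count and device/LV names are only placeholders for
your actual Areca geometry:

  # hypothetical geometry: 64 KiB chunks, 10 data disks, 4 KiB fs blocks
  # stride = 64k / 4k = 16 blocks, stripe-width = 16 * 10 = 160 blocks
  parted /dev/sdc unit s print              # partition start should be stripe-aligned
  tune2fs -l /dev/vg0/lv5 | grep -i raid    # stride/stripe-width the fs was created with
  mkfs.ext3 -b 4096 -E stride=16,stripe-width=160 /dev/vg0/scratch   # scratch LV only!
  mount -o noatime,barrier=0,data=writeback /dev/vg0/scratch /mnt/scratch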

Thanks,
  Daniel
--
Daniel J Blueman


* Re: unfair io behaviour for high load interactive use still present in 2.6.31
  2009-09-15 11:02 ` unfair io behaviour for high load interactive use still present in 2.6.31 Daniel J Blueman
@ 2009-09-15 23:21   ` Tobias Oetiker
  0 siblings, 0 replies; 9+ messages in thread
From: Tobias Oetiker @ 2009-09-15 23:21 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Linux Kernel


Hi Daniel,

Yesterday Daniel J Blueman wrote:

> On Sep 15, 8:50 am, Tobias Oetiker <t...@oetiker.ch> wrote:
> > Experts,
> >
> > We run several busy NFS file servers with Areca HW Raid + LVM2 + ext3
> >
> > We find that the read bandwidth falls dramatically and the
> > response times go up to several seconds as soon as the system
> > comes under heavy write strain.
>
> It's worthwhile checking:

>  - that the ext3 filesystem starts at a stripe-aligned offset

yep

>  - that the ext3 filesystem was created with the correct
>  stripe-width and stride (chunk) size

yep

>  - due to the larger amount of memory, ext4 may be a big win (due
>  to delayed allocate), if you'll stay with a newer kernel

ext4 does not seem to change much

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util

dm-18             0.00     0.00    0.00  316.60     0.00     1.24     8.00  2590.22   99.29   3.16 100.00
dm-18             0.00     0.00    1.60 1211.60     0.01     4.73     8.00   692.64 2925.31   0.76  92.00
dm-18             0.00     0.00    2.00 4541.40     0.01    17.74     8.00  1858.21  413.23   0.22  99.52
dm-18             0.00     0.00    0.00 1341.80     0.00     5.24     8.00  1084.59  239.59   0.68  90.96
dm-18             0.00     0.00   10.00 3739.80     0.04    14.61     8.00   914.32  447.29   0.25  93.12
dm-18             0.00     0.00    2.60 2474.20     0.01     9.66     8.00   537.33   23.88   0.36  89.36
dm-18             0.00     0.00    2.00 2569.00     0.01    10.04     8.00  1215.49  658.95   0.33  85.92


>  - if you have battery backup at the right levels:
>   - performance may be better mounting the ext3 filesystem with
> 'barrier=0'

the results look pretty much the same

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util

dm-18             0.00     0.00    1.40 9272.00     0.01    36.22     8.00   413.57  111.78   0.10  88.64
dm-18             0.00     0.00    4.20   71.80     0.02     0.28     8.00    78.24  471.06  12.83  97.52
dm-18             0.00     0.00    0.60 4183.20     0.00    16.34     8.00   988.32  216.66   0.24 100.00
dm-18             0.00     0.00    3.60  792.80     0.01     3.10     8.00  1098.60 1535.50   0.93  74.16
dm-18             0.00     0.00    1.60 6161.20     0.01    24.07     8.00   407.17   42.46   0.16  99.60
dm-18             0.00     0.00    0.00 2128.60     0.00     8.31     8.00  2497.56  368.99   0.47  99.92
dm-18             0.00     0.00    2.60 2657.80     0.01    10.38     8.00  1507.78  937.57   0.34  91.28
dm-18             0.00     0.00    5.80 8872.20     0.02    34.66     8.00   455.43  130.94   0.09  79.20
dm-18             0.00     0.00    2.20 4981.20     0.01    19.46     8.00  1058.84  245.39   0.19  92.24


>   - performance may improve mounting 'data=writeback'

here the effect is that writes get delayed much longer ... it
almost seems as if they got delayed until the reads were done ...

What I am really interested in is not performance (the total
throughput of the system is pretty ok); the problem is just that
there are such big read delays under heavy load, which makes
interactive use pretty hard.

cheers
tobi


-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900


* Re: unfair io behaviour for high load interactive use still present  in 2.6.31
       [not found]       ` <dkXAT-6SV-9@gated-at.bofh.it>
@ 2009-09-26 19:35         ` Daniel J Blueman
  0 siblings, 0 replies; 9+ messages in thread
From: Daniel J Blueman @ 2009-09-26 19:35 UTC (permalink / raw)
  To: Tobias Oetiker; +Cc: Linux Kernel

On Sep 16, 9:00 am, Tobias Oetiker <t...@oetiker.ch> wrote:
> Today Corrado Zoccolo wrote:
> > On Tue, Sep 15, 2009 at 11:07 PM, Tobias Oetiker <t...@oetiker.ch> wrote:
> > > Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> > > ---------------------------------------------------------------------------------------------------------
> > > dm-18             0.00     0.00    0.00 2566.80     0.00    10.03     8.00  2224.74  737.62   0.39 100.00
> > > dm-18             0.00     0.00    9.60  679.00     0.04     2.65     8.00   400.41 1029.73   1.35  92.80
> > > dm-18             0.00     0.00    0.00 2080.80     0.00     8.13     8.00   906.58  456.45   0.48 100.00
> > > dm-18             0.00     0.00    0.00 2349.20     0.00     9.18     8.00  1351.17  491.44   0.43 100.00
> > > dm-18             0.00     0.00    3.80  665.60     0.01     2.60     8.00   906.72 1098.75   1.39  93.20
> > > dm-18             0.00     0.00    0.00 1811.20     0.00     7.07     8.00  1008.23  725.34   0.55 100.00
> > > dm-18             0.00     0.00    0.00 2632.60     0.00    10.28     8.00  1651.18  640.61   0.38 100.00
>
> > Good.
> > The high await is normal for writes, especially since you get so many
> > queued requests.
> > can you post the output of "grep -r .  /sys/block/_device_/queue/" and
> > iostat for your real devices?
> > This should not affect reads, that will preempt writes with cfq.
>
> /sys/block/sdc/queue/nr_requests:128
> /sys/block/sdc/queue/read_ahead_kb:128
> /sys/block/sdc/queue/max_hw_sectors_kb:2048
> /sys/block/sdc/queue/max_sectors_kb:512
> /sys/block/sdc/queue/scheduler:noop anticipatory deadline [cfq]
> /sys/block/sdc/queue/hw_sector_size:512
> /sys/block/sdc/queue/logical_block_size:512
> /sys/block/sdc/queue/physical_block_size:512
> /sys/block/sdc/queue/minimum_io_size:512
> /sys/block/sdc/queue/optimal_io_size:0
> /sys/block/sdc/queue/rotational:1
> /sys/block/sdc/queue/nomerges:0
> /sys/block/sdc/queue/rq_affinity:0
> /sys/block/sdc/queue/iostats:1

> /sys/block/sdc/queue/iosched/quantum:4

As Mike points out [1], it may be worth trying a small quantum, though
the benefit may depend on the behaviour of your SCSI RAID controller.
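
Something along these lines (assuming sdc is the physical device under
the LVs; the values are just starting points to experiment with):

  cat /sys/block/sdc/queue/iosched/quantum        # current value: 4
  echo 1 > /sys/block/sdc/queue/iosched/quantum   # dispatch fewer requests per queue
  # slice_async_rq caps async (write) requests per time slice and may be
  # worth lowering alongside it
  echo 1 > /sys/block/sdc/queue/iosched/slice_async_rq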

--- [1]

http://groups.google.com/group/linux.kernel/browse_thread/thread/49d2360a22af8898/60022a7e287af096#60022a7e287af096
--
Daniel J Blueman


* Re: unfair io behaviour for high load interactive use still present in 2.6.31
  2009-09-16  7:29     ` Corrado Zoccolo
  2009-09-16  7:54       ` Tobias Oetiker
@ 2009-09-16 12:40       ` Tobias Oetiker
  1 sibling, 0 replies; 9+ messages in thread
From: Tobias Oetiker @ 2009-09-16 12:40 UTC (permalink / raw)
  To: Corrado Zoccolo; +Cc: linux-kernel


Hi Corrado,

Today Corrado Zoccolo wrote:

> Hi Tobias,
> On Tue, Sep 15, 2009 at 11:07 PM, Tobias Oetiker <tobi@oetiker.ch> wrote:
> > Today Corrado Zoccolo wrote:
> >
> >> On Tue, Sep 15, 2009 at 9:30 AM, Tobias Oetiker <tobi@oetiker.ch> wrote:
> >> > Below is an excerpt from iostat while the test is in full swing:
> >> >
> >> > * 2.6.31 (8 cpu x86_64, 24 GB Ram)
> >> > * scheduler = cfq
> >> > * iostat -m -x dm-5 5
> >> > * running in parallel on 3 lvm logical volumes
> >> >  on a single physical volume
> >> >
> >> > Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> >> > ---------------------------------------------------------------------------------------------------------
> * with i/o controller patches, what happens if you put readers in one
> domain and writers in the other?

With the io-controller and readers and writers in two groups, the
rMB/s and wMB/s come out roughly the same. There are two problems,
unfortunately:

a) the interactive performance is still not good, since directory
access still seems to get starved.

b) processes normally do both, read AND write, so the distinction
would have to be made at the io-controller level.


Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
---------------------------------------------------------------------------------------------------------
dm-5              0.00     0.00  549.80    0.00     2.15     0.00     8.00   577.75    6.82   1.82 100.00
dm-5              0.00     0.00  191.20 1333.60     0.75     5.21     8.00  1077.45  953.15   0.66 100.00
dm-5              0.00     0.00  457.20 1144.00     1.79     4.47     8.00  1820.56  836.86   0.62 100.00
dm-5              0.00     0.00  453.00 1328.80     1.77     5.19     8.00  1326.74 1168.99   0.56 100.00
dm-5              0.00     0.00  353.09  353.49     1.38     1.38     8.00  1405.80 1993.23   1.41  99.80
dm-5              0.00     0.00  280.40 1296.80     1.10     5.07     8.00  1248.95  747.99   0.63 100.00
dm-5              0.00     0.00  307.00 4557.20     1.20    17.80     8.00   985.92  211.25   0.21 100.00
dm-5              0.00     0.00    6.00    0.00     0.02     0.00     8.00   450.42 19116.00 166.67 100.00
dm-5              0.00     0.00  154.00  624.20     0.60     2.44     8.00   926.52 1119.77   1.29 100.00


cheers
tobi



> Are you willing to test some patches? I'm working on patches to reduce
> read latency, that may be interesting to you.
>
> Corrado
>
> > cheers
> > tobi
> >
> > --
> > Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
> > http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900
>
>
>
>

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900


* Re: unfair io behaviour for high load interactive use still present in 2.6.31
  2009-09-16  7:29     ` Corrado Zoccolo
@ 2009-09-16  7:54       ` Tobias Oetiker
  2009-09-16 12:40       ` Tobias Oetiker
  1 sibling, 0 replies; 9+ messages in thread
From: Tobias Oetiker @ 2009-09-16  7:54 UTC (permalink / raw)
  To: Corrado Zoccolo; +Cc: linux-kernel


Hi Corrado,

Today Corrado Zoccolo wrote:

> Hi Tobias,
> On Tue, Sep 15, 2009 at 11:07 PM, Tobias Oetiker <tobi@oetiker.ch> wrote:
> >
> > Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> > ---------------------------------------------------------------------------------------------------------
> > dm-18             0.00     0.00    0.00 2566.80     0.00    10.03     8.00  2224.74  737.62   0.39 100.00
> > dm-18             0.00     0.00    9.60  679.00     0.04     2.65     8.00   400.41 1029.73   1.35  92.80
> > dm-18             0.00     0.00    0.00 2080.80     0.00     8.13     8.00   906.58  456.45   0.48 100.00
> > dm-18             0.00     0.00    0.00 2349.20     0.00     9.18     8.00  1351.17  491.44   0.43 100.00
> > dm-18             0.00     0.00    3.80  665.60     0.01     2.60     8.00   906.72 1098.75   1.39  93.20
> > dm-18             0.00     0.00    0.00 1811.20     0.00     7.07     8.00  1008.23  725.34   0.55 100.00
> > dm-18             0.00     0.00    0.00 2632.60     0.00    10.28     8.00  1651.18  640.61   0.38 100.00
> >
>
> Good.
> The high await is normal for writes, especially since you get so many
> queued requests.
> can you post the output of "grep -r .  /sys/block/_device_/queue/" and
> iostat for your real devices?
> This should not affect reads, that will preempt writes with cfq.

/sys/block/sdc/queue/nr_requests:128
/sys/block/sdc/queue/read_ahead_kb:128
/sys/block/sdc/queue/max_hw_sectors_kb:2048
/sys/block/sdc/queue/max_sectors_kb:512
/sys/block/sdc/queue/scheduler:noop anticipatory deadline [cfq]
/sys/block/sdc/queue/hw_sector_size:512
/sys/block/sdc/queue/logical_block_size:512
/sys/block/sdc/queue/physical_block_size:512
/sys/block/sdc/queue/minimum_io_size:512
/sys/block/sdc/queue/optimal_io_size:0
/sys/block/sdc/queue/rotational:1
/sys/block/sdc/queue/nomerges:0
/sys/block/sdc/queue/rq_affinity:0
/sys/block/sdc/queue/iostats:1
/sys/block/sdc/queue/iosched/quantum:4
/sys/block/sdc/queue/iosched/fifo_expire_sync:124
/sys/block/sdc/queue/iosched/fifo_expire_async:248
/sys/block/sdc/queue/iosched/back_seek_max:16384
/sys/block/sdc/queue/iosched/back_seek_penalty:2
/sys/block/sdc/queue/iosched/slice_sync:100
/sys/block/sdc/queue/iosched/slice_async:40
/sys/block/sdc/queue/iosched/slice_async_rq:2
/sys/block/sdc/queue/iosched/slice_idle:8

but as I said in my original post, I have done extensive tests,
twiddling all the knobs I know of, and the only thing that never
changed was that as soon as writers come into play, the readers
get starved pretty thoroughly.

> >
> > in the read/write test the write rate stays down, but the rMB/s is even worse and the await is also way up,
> > so I guess the bad performance is not to blame on the cache in the controller ...
> >
> > Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> > ---------------------------------------------------------------------------------------------------------
> > dm-18             0.00     0.00    0.00 1225.80     0.00     4.79     8.00  1050.49  807.38   0.82 100.00
> > dm-18             0.00     0.00    0.00 1721.80     0.00     6.73     8.00  1823.67  807.20   0.58 100.00
> > dm-18             0.00     0.00    0.00 1128.00     0.00     4.41     8.00   617.94  832.52   0.89 100.00
> > dm-18             0.00     0.00    0.00  838.80     0.00     3.28     8.00   873.04 1056.37   1.19 100.00
> > dm-18             0.00     0.00   39.60  347.80     0.15     1.36     8.00   590.27 1880.05   2.57  99.68
> > dm-18             0.00     0.00    0.00 1626.00     0.00     6.35     8.00   983.85  452.72   0.62 100.00
> > dm-18             0.00     0.00    0.00 1117.00     0.00     4.36     8.00  1047.16 1117.78   0.90 100.00
> > dm-18             0.00     0.00    0.00 1319.20     0.00     5.15     8.00   840.64  573.39   0.76 100.00
> >
>
> This is interesting: it seems no reads are happening at all here.
> I suspect that this, and your observation that the real read throughput
> is much higher, can be explained by your readers mostly reading from
> the page cache.


> Can you describe better how the test workload is structured,
> especially regarding cache?
> How do you ensure that some tars are just readers, and some are just
> writers? I suppose you pre-fault-in the input for writers, and you
> direct output to /dev/null for readers? Are all readers reading from
> different directories?

they are NOT ... what I do is this (http://tobi.oetiker.ch/fspunisher).

* get a copy of the linux kernel source and place it on tmpfs

* unpack 4 copies of the linux kernel for each reader process

* sync and echo 3 >/proc/sys/vm/drop_caches  (this should lose all
  cache)

* start the readers, each on its private copy of the kernel source,
  as prepared above, writing their output to /dev/null.

* start an equal amount of tars unpacking the kernel archive from
  tmpfs into separate directories, next to the readers source
  directories.

the goal of this exercise is to simulate independent writers and
readers while excluding the cache as much as possible since I want
to see how the system deals with accessing the actual disk device
and not the local cache.
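
in shell terms the setup is roughly this (a simplified sketch, not the
actual script from the URL above; the number of parallel tars and the
paths are illustrative):

  # prepare: a private source tree per reader on the test fs, tarball on tmpfs
  for i in $(seq 1 12); do
      mkdir -p /mnt/test/read-$i
      tar -C /mnt/test/read-$i -xf /dev/shm/linux-2.6.30.5.tar
  done
  sync; echo 3 > /proc/sys/vm/drop_caches

  # readers: tar up their private tree, output discarded
  # (piped through cat because GNU tar skips reading file data when the
  #  archive is literally /dev/null)
  for i in $(seq 1 12); do
      ( tar -C /mnt/test/read-$i -cf - . | cat > /dev/null ) &
  done

  # writers: unpack the tarball from tmpfs next to the reader trees
  for i in $(seq 1 12); do
      mkdir -p /mnt/test/write-$i
      tar -C /mnt/test/write-$i -xf /dev/shm/linux-2.6.30.5.tar &
  done
  wait
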
>
> Few things to check:
> * are the cpus saturated during the test?
nope

> * are the readers mostly in state 'D', or 'S', or 'R'?

they are almost always in D (same as the writers), sleeping for IO

> * did you try 'as' I/O scheduler?

yes, this helps the readers a bit
but it does not help the 'smoothness'
of the system operation

> * how big are your volumes?

100 G

> * what is the average load on the system?

24 (or however many tars I start, since they all want to run all the
time and are all waiting for IO)

> * with i/o controller patches, what happens if you put readers in one
> domain and writers in the other?

will try ... note though that this would have to happen
automatically, since I cannot know beforehand who writes and who
reads ...
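
the split would look roughly like this -- note that the "io" subsystem
name and the weight file below are my assumptions about the
io-controller patch interface and may well be named differently in v9:

  # mount the io-controller cgroup hierarchy
  # (subsystem name assumed to be "io" here -- check the patch docs)
  mkdir -p /cgroup/io
  mount -t cgroup -o io none /cgroup/io
  mkdir /cgroup/io/readers /cgroup/io/writers
  # give readers a larger share; the control file name and valid range
  # depend on the patch revision (placeholder values)
  echo 900 > /cgroup/io/readers/io.weight
  echo 100 > /cgroup/io/writers/io.weight
  # classification is per task, which is exactly the problem for
  # processes that both read and write
  echo $READER_PID > /cgroup/io/readers/tasks
  echo $WRITER_PID > /cgroup/io/writers/tasks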

> Are you willing to test some patches? I'm working on patches to reduce
> read latency, that may be interesting to you.

by all means ... to repeat my goal: I want to see the readers not
starved by the writers. I also have a hunch that the handling of
metadata updates/access in the filesystem may play a role here.
In the dd tests these do not play a role, but in real life I find
that often the problem is getting at the file, not reading the file
once I have it ... this is especially tricky to test, since there is
good caching involved and it can pretty quickly distort the
results if not handled carefully.

cheers
tobi

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900


* Re: unfair io behaviour for high load interactive use still present  in 2.6.31
  2009-09-15 21:07   ` Tobias Oetiker
@ 2009-09-16  7:29     ` Corrado Zoccolo
  2009-09-16  7:54       ` Tobias Oetiker
  2009-09-16 12:40       ` Tobias Oetiker
  0 siblings, 2 replies; 9+ messages in thread
From: Corrado Zoccolo @ 2009-09-16  7:29 UTC (permalink / raw)
  To: Tobias Oetiker; +Cc: linux-kernel

Hi Tobias,
On Tue, Sep 15, 2009 at 11:07 PM, Tobias Oetiker <tobi@oetiker.ch> wrote:
> Today Corrado Zoccolo wrote:
>
>> On Tue, Sep 15, 2009 at 9:30 AM, Tobias Oetiker <tobi@oetiker.ch> wrote:
>> > Below is an excerpt from iostat while the test is in full swing:
>> >
>> > * 2.6.31 (8 cpu x86_64, 24 GB Ram)
>> > * scheduler = cfq
>> > * iostat -m -x dm-5 5
>> > * running in parallel on 3 lvm logical volumes
>> >  on a single physical volume
>> >
>> > Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
>> > ---------------------------------------------------------------------------------------------------------
>> > dm-5             0.00     0.00   26.60 7036.60     0.10    27.49     8.00   669.04   91.58   0.09  66.96
>> > dm-5             0.00     0.00   63.80 2543.00     0.25     9.93     8.00  1383.63  539.28   0.36  94.64
>> > dm-5             0.00     0.00   78.00 5084.60     0.30    19.86     8.00  1007.41  195.12   0.15  77.36
>> > dm-5             0.00     0.00   44.00 5588.00     0.17    21.83     8.00   516.27   91.69   0.17  95.44
>> > dm-5             0.00     0.00    0.00 6014.20     0.00    23.49     8.00  1331.42   66.25   0.13  76.48
>> > dm-5             0.00     0.00   28.80 4491.40     0.11    17.54     8.00  1000.37  412.09   0.17  78.24
>> > dm-5             0.00     0.00   36.60 6486.40     0.14    25.34     8.00   765.12  128.07   0.11  72.16
>> > dm-5             0.00     0.00   33.40 5095.60     0.13    19.90     8.00   431.38   43.78   0.17  85.20
>> >
>> > for comparison these are the numbers seen when running the test
>> > with just the writing enabled
>> >
>> > Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
>> > ---------------------------------------------------------------------------------------------------------
>> > dm-5             0.00     0.00    4.00 12047.40    0.02    47.06     8.00   989.81   79.55   0.07  79.20
>> > dm-5             0.00     0.00    3.40 12399.00    0.01    48.43     8.00   977.13   72.15   0.05  58.08
>> > dm-5             0.00     0.00    3.80 13130.00    0.01    51.29     8.00  1130.48   95.11   0.04  58.48
>> > dm-5             0.00     0.00    2.40 5109.20     0.01    19.96     8.00   427.75   47.41   0.16  79.92
>> > dm-5             0.00     0.00    3.20    0.00     0.01     0.00     8.00   290.33 148653.75 282.50  90.40
>> > dm-5             0.00     0.00    3.40 5103.00     0.01    19.93     8.00   168.75   33.06   0.13  67.84
>> >
>>
>> From the numbers, it appears that your controller (or the disks) is
>> lying about writes. They get cached (service time around 0.1ms) and
>> reported as completed. When the cache is full, or a flush is issued,
>> you get >200ms delay.
>> Reads, instead, have a more reasonable service time of around 2ms.
>>
>> Can you disable write cache, and see if the response times for writes
>> become reasonable?
>
> sure can ... the write-only results look like this. As you predicted,
> the wMB/s has gone down to more realistic levels ... but the await has also gone way up.
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> ---------------------------------------------------------------------------------------------------------
> dm-18             0.00     0.00    0.00 2566.80     0.00    10.03     8.00  2224.74  737.62   0.39 100.00
> dm-18             0.00     0.00    9.60  679.00     0.04     2.65     8.00   400.41 1029.73   1.35  92.80
> dm-18             0.00     0.00    0.00 2080.80     0.00     8.13     8.00   906.58  456.45   0.48 100.00
> dm-18             0.00     0.00    0.00 2349.20     0.00     9.18     8.00  1351.17  491.44   0.43 100.00
> dm-18             0.00     0.00    3.80  665.60     0.01     2.60     8.00   906.72 1098.75   1.39  93.20
> dm-18             0.00     0.00    0.00 1811.20     0.00     7.07     8.00  1008.23  725.34   0.55 100.00
> dm-18             0.00     0.00    0.00 2632.60     0.00    10.28     8.00  1651.18  640.61   0.38 100.00
>

Good.
The high await is normal for writes, especially since you get so many
queued requests.
Can you post the output of "grep -r .  /sys/block/_device_/queue/" and
iostat for your real devices?
This should not affect reads, which will preempt writes with cfq.

>
> in the read/write test the write rate stays down, but the rMB/s is even worse and the await is also way up,
> so I guess the bad performance is not to blame on the cache in the controller ...
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> ---------------------------------------------------------------------------------------------------------
> dm-18             0.00     0.00    0.00 1225.80     0.00     4.79     8.00  1050.49  807.38   0.82 100.00
> dm-18             0.00     0.00    0.00 1721.80     0.00     6.73     8.00  1823.67  807.20   0.58 100.00
> dm-18             0.00     0.00    0.00 1128.00     0.00     4.41     8.00   617.94  832.52   0.89 100.00
> dm-18             0.00     0.00    0.00  838.80     0.00     3.28     8.00   873.04 1056.37   1.19 100.00
> dm-18             0.00     0.00   39.60  347.80     0.15     1.36     8.00   590.27 1880.05   2.57  99.68
> dm-18             0.00     0.00    0.00 1626.00     0.00     6.35     8.00   983.85  452.72   0.62 100.00
> dm-18             0.00     0.00    0.00 1117.00     0.00     4.36     8.00  1047.16 1117.78   0.90 100.00
> dm-18             0.00     0.00    0.00 1319.20     0.00     5.15     8.00   840.64  573.39   0.76 100.00
>

This is interesting: it seems no reads are happening at all here.
I suspect that this, and your observation that the real read throughput
is much higher, can be explained by your readers mostly reading from
the page cache.
Can you describe better how the test workload is structured,
especially regarding cache?
How do you ensure that some tars are just readers, and some are just
writers? I suppose you pre-fault-in the input for writers, and you
direct output to /dev/null for readers? Are all readers reading from
different directories?

Few things to check:
* are the cpus saturated during the test?
* are the readers mostly in state 'D', or 'S', or 'R'?
* did you try 'as' I/O scheduler?
* how big are your volumes?
* what is the average load on the system?
* with i/o controller patches, what happens if you put readers in one
domain and writers in the other?

Are you willing to test some patches? I'm working on patches to reduce
read latency, that may be interesting to you.

Corrado

> cheers
> tobi
>
> --
> Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
> http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900



-- 
__________________________________________________________________________

dott. Corrado Zoccolo                          mailto:czoccolo@gmail.com
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------


* Re: unfair io behaviour for high load interactive use still present in 2.6.31
  2009-09-15 14:53 ` Corrado Zoccolo
@ 2009-09-15 21:07   ` Tobias Oetiker
  2009-09-16  7:29     ` Corrado Zoccolo
  0 siblings, 1 reply; 9+ messages in thread
From: Tobias Oetiker @ 2009-09-15 21:07 UTC (permalink / raw)
  To: Corrado Zoccolo; +Cc: linux-kernel


Today Corrado Zoccolo wrote:

> On Tue, Sep 15, 2009 at 9:30 AM, Tobias Oetiker <tobi@oetiker.ch> wrote:
> > Below is an excerpt from iostat while the test is in full swing:
> >
> > * 2.6.31 (8 cpu x86_64, 24 GB Ram)
> > * scheduler = cfq
> > * iostat -m -x dm-5 5
> > * running in parallel on 3 lvm logical volumes
> >  on a single physical volume
> >
> > Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> > ---------------------------------------------------------------------------------------------------------
> > dm-5             0.00     0.00   26.60 7036.60     0.10    27.49     8.00   669.04   91.58   0.09  66.96
> > dm-5             0.00     0.00   63.80 2543.00     0.25     9.93     8.00  1383.63  539.28   0.36  94.64
> > dm-5             0.00     0.00   78.00 5084.60     0.30    19.86     8.00  1007.41  195.12   0.15  77.36
> > dm-5             0.00     0.00   44.00 5588.00     0.17    21.83     8.00   516.27   91.69   0.17  95.44
> > dm-5             0.00     0.00    0.00 6014.20     0.00    23.49     8.00  1331.42   66.25   0.13  76.48
> > dm-5             0.00     0.00   28.80 4491.40     0.11    17.54     8.00  1000.37  412.09   0.17  78.24
> > dm-5             0.00     0.00   36.60 6486.40     0.14    25.34     8.00   765.12  128.07   0.11  72.16
> > dm-5             0.00     0.00   33.40 5095.60     0.13    19.90     8.00   431.38   43.78   0.17  85.20
> >
> > for comparison these are the numbers seen when running the test
> > with just the writing enabled
> >
> > Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> > ---------------------------------------------------------------------------------------------------------
> > dm-5             0.00     0.00    4.00 12047.40    0.02    47.06     8.00   989.81   79.55   0.07  79.20
> > dm-5             0.00     0.00    3.40 12399.00    0.01    48.43     8.00   977.13   72.15   0.05  58.08
> > dm-5             0.00     0.00    3.80 13130.00    0.01    51.29     8.00  1130.48   95.11   0.04  58.48
> > dm-5             0.00     0.00    2.40 5109.20     0.01    19.96     8.00   427.75   47.41   0.16  79.92
> > dm-5             0.00     0.00    3.20    0.00     0.01     0.00     8.00   290.33 148653.75 282.50  90.40
> > dm-5             0.00     0.00    3.40 5103.00     0.01    19.93     8.00   168.75   33.06   0.13  67.84
> >
>
> From the numbers, it appears that your controller (or the disks) is
> lying about writes. They get cached (service time around 0.1ms) and
> reported as completed. When the cache is full, or a flush is issued,
> you get >200ms delay.
> Reads, instead, have a more reasonable service time of around 2ms.
>
> Can you disable write cache, and see if the response times for writes
> become reasonable?

sure can ... the write-only results look like this. As you predicted,
the wMB/s has gone down to more realistic levels ... but the await has also gone way up.

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
---------------------------------------------------------------------------------------------------------
dm-18             0.00     0.00    0.00 2566.80     0.00    10.03     8.00  2224.74  737.62   0.39 100.00
dm-18             0.00     0.00    9.60  679.00     0.04     2.65     8.00   400.41 1029.73   1.35  92.80
dm-18             0.00     0.00    0.00 2080.80     0.00     8.13     8.00   906.58  456.45   0.48 100.00
dm-18             0.00     0.00    0.00 2349.20     0.00     9.18     8.00  1351.17  491.44   0.43 100.00
dm-18             0.00     0.00    3.80  665.60     0.01     2.60     8.00   906.72 1098.75   1.39  93.20
dm-18             0.00     0.00    0.00 1811.20     0.00     7.07     8.00  1008.23  725.34   0.55 100.00
dm-18             0.00     0.00    0.00 2632.60     0.00    10.28     8.00  1651.18  640.61   0.38 100.00


in the read/write test the write rate stays down, but the rMB/s is even worse and the await is also way up,
so I guess the bad performance is not to blame on the cache in the controller ...

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
---------------------------------------------------------------------------------------------------------
dm-18             0.00     0.00    0.00 1225.80     0.00     4.79     8.00  1050.49  807.38   0.82 100.00
dm-18             0.00     0.00    0.00 1721.80     0.00     6.73     8.00  1823.67  807.20   0.58 100.00
dm-18             0.00     0.00    0.00 1128.00     0.00     4.41     8.00   617.94  832.52   0.89 100.00
dm-18             0.00     0.00    0.00  838.80     0.00     3.28     8.00   873.04 1056.37   1.19 100.00
dm-18             0.00     0.00   39.60  347.80     0.15     1.36     8.00   590.27 1880.05   2.57  99.68
dm-18             0.00     0.00    0.00 1626.00     0.00     6.35     8.00   983.85  452.72   0.62 100.00
dm-18             0.00     0.00    0.00 1117.00     0.00     4.36     8.00  1047.16 1117.78   0.90 100.00
dm-18             0.00     0.00    0.00 1319.20     0.00     5.15     8.00   840.64  573.39   0.76 100.00

cheers
tobi

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900


* Re: unfair io behaviour for high load interactive use still present  in 2.6.31
  2009-09-15  7:30 Tobias Oetiker
@ 2009-09-15 14:53 ` Corrado Zoccolo
  2009-09-15 21:07   ` Tobias Oetiker
  0 siblings, 1 reply; 9+ messages in thread
From: Corrado Zoccolo @ 2009-09-15 14:53 UTC (permalink / raw)
  To: Tobias Oetiker; +Cc: linux-kernel

On Tue, Sep 15, 2009 at 9:30 AM, Tobias Oetiker <tobi@oetiker.ch> wrote:
> Experts,
>
> We run several busy NFS file servers with Areca HW Raid + LVM2 + ext3
>
> We find that the read bandwidth falls dramatically and the
> response times go up to several seconds as soon as the system
> comes under heavy write strain.
>
> With the release of 2.6.31 and all the io fixes that went in there,
> I was hoping for a solution and set out to do some tests ...
>
> I have seen the io problem posted on LKML a few times, and normally
> it is simulated by running several concurrent dd processes one
> reading and one writing. It seems that cfq with a low slice_async
> can deal pretty well with competing dds.
>
> Unfortunately our use case is not users running dd but rather a lot
> of processes accessing many small to medium sized files for reading
> and writing.
>
> I have written a test program that unpacks Linux 2.6.30.5 a few
> times into a file system, flushes the cache and then tars it up
> again while unpacking some more tars in parallel. While this is
> running I use iostat to watch the activity on the block devices. As
> I am interested in interactive performance of the system, the await
> row as well as the rMB/s row are of special interest to me
>
>  iostat -m -x 5
>
> Even with a low 'resolution' of 5 seconds, the performance figures
> are jumping all over the place. This concurs with the user
> experience when working with the system interactively.
>
> I tried to optimize the configuration systematically turning all
> the knobs I know of (/proc/sys/vm, /sys/block/*/queue/scheduler,
> data=journal, data=ordered, external journal on a ssd device) one
> at a time. I found that cfq helps the read performance quite a lot
> as far as total run-time is concerned, but the jerky nature of the
> measurements does not change, and also the read performance keeps
> dropping dramatically as soon as it is in competition with writers.
>
> I would love to get some hints on how to make such a setup perform
> without these huge performance fluctuations.
>
> While testing, I saw that iostat reports huge wMB/s numbers and
> ridiculously low rMB/s numbers. Looking at the actual amount of
> data in the tar files as well as the run time of the tar processes
> the MB/s numbers reported by iostat do seem strange. The
> read numbers are too low and the write numbers are too high.
>
>  12 * 1.3 GB reading in 270s = 14 MB/s sustained
>  12 * 0.3 GB writing in 180s = 1.8 MB/s sustained
>
> Since I am looking at relative performance figures this does not
> matter so much, but it is still a bit disconcerting.
>
> My tests script is available on http://tobi.oetiker.ch/fspunisher/
>
> Below is an excerpt from iostat while the test is in full swing:
>
> * 2.6.31 (8 cpu x86_64, 24 GB Ram)
> * scheduler = cfq
> * iostat -m -x dm-5 5
> * running in parallel on 3 lvm logical volumes
>  on a single physical volume
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> ---------------------------------------------------------------------------------------------------------
> dm-5             0.00     0.00   26.60 7036.60     0.10    27.49     8.00   669.04   91.58   0.09  66.96
> dm-5             0.00     0.00   63.80 2543.00     0.25     9.93     8.00  1383.63  539.28   0.36  94.64
> dm-5             0.00     0.00   78.00 5084.60     0.30    19.86     8.00  1007.41  195.12   0.15  77.36
> dm-5             0.00     0.00   44.00 5588.00     0.17    21.83     8.00   516.27   91.69   0.17  95.44
> dm-5             0.00     0.00    0.00 6014.20     0.00    23.49     8.00  1331.42   66.25   0.13  76.48
> dm-5             0.00     0.00   28.80 4491.40     0.11    17.54     8.00  1000.37  412.09   0.17  78.24
> dm-5             0.00     0.00   36.60 6486.40     0.14    25.34     8.00   765.12  128.07   0.11  72.16
> dm-5             0.00     0.00   33.40 5095.60     0.13    19.90     8.00   431.38   43.78   0.17  85.20
>
> for comparison these are the numbers seen when running the test
> with just the writing enabled
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> ---------------------------------------------------------------------------------------------------------
> dm-5             0.00     0.00    4.00 12047.40    0.02    47.06     8.00   989.81   79.55   0.07  79.20
> dm-5             0.00     0.00    3.40 12399.00    0.01    48.43     8.00   977.13   72.15   0.05  58.08
> dm-5             0.00     0.00    3.80 13130.00    0.01    51.29     8.00  1130.48   95.11   0.04  58.48
> dm-5             0.00     0.00    2.40 5109.20     0.01    19.96     8.00   427.75   47.41   0.16  79.92
> dm-5             0.00     0.00    3.20    0.00     0.01     0.00     8.00   290.33 148653.75 282.50  90.40
> dm-5             0.00     0.00    3.40 5103.00     0.01    19.93     8.00   168.75   33.06   0.13  67.84
>
> And also with just the reading:
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> ---------------------------------------------------------------------------------------------------------
> dm-5             0.00     0.00  463.80    0.00     1.81     0.00     8.00     3.90    8.41   2.16 100.00
> dm-5             0.00     0.00  434.20    0.00     1.70     0.00     8.00     3.89    8.95   2.30 100.00
> dm-5             0.00     0.00  540.80    0.00     2.11     0.00     8.00     3.88    7.18   1.85 100.00
> dm-5             0.00     0.00  591.60    0.00     2.31     0.00     8.00     3.84    6.50   1.68  99.68
> dm-5             0.00     0.00  793.20    0.00     3.10     0.00     8.00     3.81    4.80   1.26 100.00
> dm-5             0.00     0.00  592.80    0.00     2.32     0.00     8.00     3.84    6.47   1.68  99.60
> dm-5             0.00     0.00  578.80    0.00     2.26     0.00     8.00     3.85    6.66   1.73 100.00
> dm-5             0.00     0.00  771.00    0.00     3.01     0.00     8.00     3.81    4.93   1.30  99.92
>

From the numbers, it appears that your controller (or the disks) is
lying about writes. They get cached (service time around 0.1ms) and
reported as completed. When the cache is full, or a flush is issued,
you get >200ms delay.
Reads, instead, have a more reasonable service time of around 2ms.

Can you disable write cache, and see if the response times for writes
become reasonable?
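
If the disks are reachable individually behind the controller,
something like this might do it (the device name is an example; the
Areca controller's own cache would have to be disabled through the
vendor CLI or BIOS instead):

  hdparm -W 0 /dev/sdc          # turn off the drive's volatile write cache
  sdparm --clear WCE /dev/sdc   # same thing via the SCSI caching mode page
  hdparm -W /dev/sdc            # verify
  sdparm --get WCE /dev/sdc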

> I also tested 2.6.31 for what happens when I run the same 'load' on a single lvm
> logical volume. Interestingly enough it is even worse than when doing it on three
> volumes:
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> ---------------------------------------------------------------------------------------------------------
> dm-5             0.00     0.00   19.20 4566.80     0.07    17.84     8.00  2232.12  486.87   0.22  99.92
> dm-5             0.00     0.00    6.80 9410.40     0.03    36.76     8.00  1827.47  187.99   0.10  92.00
> dm-5             0.00     0.00    4.00    0.00     0.02     0.00     8.00   685.26 185618.40 249.60  99.84
> dm-5             0.00     0.00    4.20 4968.20     0.02    19.41     8.00  1426.45  286.86   0.20  99.84
> dm-5             0.00     0.00   10.60 9886.00     0.04    38.62     8.00   167.57    5.74   0.09  88.72
> dm-5             0.00     0.00    5.00    0.00     0.02     0.00     8.00  1103.98 242774.88 199.68  99.84
> dm-5             0.00     0.00   38.20 14794.60    0.15    57.79     8.00  1171.75   74.25   0.06  87.20
>
> I also tested 2.6.31 with io-controller v9 patches. It seems to help a
> bit with the read rate, but the figures still jump all over the place
>
> * 2.6.31 with io-controller patches v9 (8 cpu x86_64, 24 GB Ram)
> * fairness set to 1 on all block devices
> * scheduler = cfq
> * iostat -m -x dm-5 5
> * running in parallel on 3 lvm logical volumes
>  on a single physical volume
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> ---------------------------------------------------------------------------------------------------------
> dm-5              0.00     0.00  412.00 1640.60     1.61     6.41     8.00  1992.54 1032.27   0.49  99.84
> dm-5              0.00     0.00  362.40  576.40     1.42     2.25     8.00   456.13  612.67   1.07 100.00
> dm-5              0.00     0.00  211.80 1004.40     0.83     3.92     8.00  1186.20  995.00   0.82 100.00
> dm-5              0.00     0.00   44.20  719.60     0.17     2.81     8.00   788.56  574.81   1.31  99.76
> dm-5              0.00     0.00    0.00 1274.80     0.00     4.98     8.00  1584.07 1317.32   0.78 100.00
> dm-5              0.00     0.00    0.00  946.91     0.00     3.70     8.00   989.09  911.30   1.05  99.72
> dm-5              0.00     0.00    7.20 2526.00     0.03     9.87     8.00  2085.57  201.72   0.37  92.88
>
>
> For completeness' sake I did the tests on 2.6.24 as well; not much different.
>
> * 2.6.24 (8 cpu x86_64, 24 GB Ram)
> * scheduler = cfq
> * iostat -m -x dm-5 5
> * running in parallel on 3 lvm logical volumes
>  situated on a single physical volume
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> ---------------------------------------------------------------------------------------------------------
> dm-5              0.00     0.00  144.60 5498.00     0.56    21.48     8.00  1215.82  215.48   0.14  76.80
> dm-5              0.00     0.00   30.00 9831.40     0.12    38.40     8.00  1729.16  142.37   0.09  88.60
> dm-5              0.00     0.00   27.60 4126.40     0.11    16.12     8.00  2245.24  618.77   0.21  86.00
> dm-5              0.00     0.00    2.00 3981.20     0.01    15.55     8.00  1069.07  268.40   0.23  91.60
> dm-5              0.00     0.00   40.60   13.20     0.16     0.05     8.00     3.98   74.83  15.02  80.80
> dm-5              0.00     0.00    5.60 5085.20     0.02    19.86     8.00  2586.65  508.10   0.18  94.00
> dm-5              0.00     0.00   20.80 5344.60     0.08    20.88     8.00   985.51  148.96   0.17  92.60
>
>
> cheers
> tobi
>
> --
> Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
> http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>



-- 
__________________________________________________________________________

dott. Corrado Zoccolo                          mailto:czoccolo@gmail.com
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------


* unfair io behaviour for high load interactive use still present in 2.6.31
@ 2009-09-15  7:30 Tobias Oetiker
  2009-09-15 14:53 ` Corrado Zoccolo
  0 siblings, 1 reply; 9+ messages in thread
From: Tobias Oetiker @ 2009-09-15  7:30 UTC (permalink / raw)
  To: linux-kernel

Experts,

We run several busy NFS file servers with Areca HW Raid + LVM2 + ext3

We find that the read bandwidth falls dramatically as well as the
response times going up to several seconds as soon as the system
comes under heavy write strain.

With the release of 2.6.31 and all the io fixes that went in there,
I was hoping for a solution and set out to do some tests ...

I have seen the io problem posted on LKML a few times, and normally
it is simulated by running several concurrent dd processes one
reading and one writing. It seems that cfq with a low slice_async
can deal pretty well with competing dds.

Unfortunately our use case is not users running dd but rather a lot
of processes accessing many small to medium sized files for reading
and writing.

I have written a test program that unpacks Linux 2.6.30.5 a few
times into a file system, flushes the cache and then tars it up
again while unpacking some more tars in parallel. While this is
running I use iostat to watch the activity on the block devices. As
I am interested in interactive performance of the system, the await
row as well as the rMB/s row are of special interest to me

  iostat -m -x 5

Even with a low 'resolution' of 5 seconds, the performance figures
are jumping all over the place. This concurs with the user
experience when working with the system interactively.

I tried to optimize the configuration systematically turning all
the knobs I know of (/proc/sys/vm, /sys/block/*/queue/scheduler,
data=journal, data=ordered, external journal on a ssd device) one
at a time. I found that cfq helps the read performance quite a lot
as far as total run-time is concerned, but the jerky nature of the
measurements does not change, and also the read performance keeps
dropping dramatically as soon as it is in competition with writers.
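
For reference, the kind of changes tried looked roughly like the
following (one knob at a time; the device and mount point names are
just examples):

  # io scheduler, per physical device
  echo deadline > /sys/block/sdc/queue/scheduler
  # vm writeback knobs
  echo 5  > /proc/sys/vm/dirty_background_ratio
  echo 10 > /proc/sys/vm/dirty_ratio
  # ext3 journalling mode (cannot be changed on remount, needs a clean mount)
  umount /srv/nfs5
  mount -o data=journal /dev/vg0/lv5 /srv/nfs5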

I would love to get some hints on how to make such a setup perform
without these huge performance fluctuations.

While testing, I saw that iostat reports huge wMB/s numbers and
ridiculously low rMB/s numbers. Looking at the actual amount of
data in the tar files as well as the run time of the tar processes
the MB/s numbers reported by iostat do seem strange. The
read numbers are too low and the write numbers are too high.

  12 * 1.3 GB reading in 270s = 14 MB/s sustained
  12 * 0.3 GB writing in 180s = 1.8 MB/s sustained

Since I am looking at relative performance figures this does not
matter so much, but it is still a bit disconcerting.

My tests script is available on http://tobi.oetiker.ch/fspunisher/

Below is an excerpt from iostat while the test is in full swing:

* 2.6.31 (8 cpu x86_64, 24 GB Ram)
* scheduler = cfq
* iostat -m -x dm-5 5
* running in parallel on 3 lvm logical volumes
  on a single physical volume

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
---------------------------------------------------------------------------------------------------------
dm-5             0.00     0.00   26.60 7036.60     0.10    27.49     8.00   669.04   91.58   0.09  66.96
dm-5             0.00     0.00   63.80 2543.00     0.25     9.93     8.00  1383.63  539.28   0.36  94.64
dm-5             0.00     0.00   78.00 5084.60     0.30    19.86     8.00  1007.41  195.12   0.15  77.36
dm-5             0.00     0.00   44.00 5588.00     0.17    21.83     8.00   516.27   91.69   0.17  95.44
dm-5             0.00     0.00    0.00 6014.20     0.00    23.49     8.00  1331.42   66.25   0.13  76.48
dm-5             0.00     0.00   28.80 4491.40     0.11    17.54     8.00  1000.37  412.09   0.17  78.24
dm-5             0.00     0.00   36.60 6486.40     0.14    25.34     8.00   765.12  128.07   0.11  72.16
dm-5             0.00     0.00   33.40 5095.60     0.13    19.90     8.00   431.38   43.78   0.17  85.20

for comparison these are the numbers seen when running the test
with just the writing enabled

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
---------------------------------------------------------------------------------------------------------
dm-5             0.00     0.00    4.00 12047.40    0.02    47.06     8.00   989.81   79.55   0.07  79.20
dm-5             0.00     0.00    3.40 12399.00    0.01    48.43     8.00   977.13   72.15   0.05  58.08
dm-5             0.00     0.00    3.80 13130.00    0.01    51.29     8.00  1130.48   95.11   0.04  58.48
dm-5             0.00     0.00    2.40 5109.20     0.01    19.96     8.00   427.75   47.41   0.16  79.92
dm-5             0.00     0.00    3.20    0.00     0.01     0.00     8.00   290.33 148653.75 282.50  90.40
dm-5             0.00     0.00    3.40 5103.00     0.01    19.93     8.00   168.75   33.06   0.13  67.84

And also with just the reading:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
---------------------------------------------------------------------------------------------------------
dm-5             0.00     0.00  463.80    0.00     1.81     0.00     8.00     3.90    8.41   2.16 100.00
dm-5             0.00     0.00  434.20    0.00     1.70     0.00     8.00     3.89    8.95   2.30 100.00
dm-5             0.00     0.00  540.80    0.00     2.11     0.00     8.00     3.88    7.18   1.85 100.00
dm-5             0.00     0.00  591.60    0.00     2.31     0.00     8.00     3.84    6.50   1.68  99.68
dm-5             0.00     0.00  793.20    0.00     3.10     0.00     8.00     3.81    4.80   1.26 100.00
dm-5             0.00     0.00  592.80    0.00     2.32     0.00     8.00     3.84    6.47   1.68  99.60
dm-5             0.00     0.00  578.80    0.00     2.26     0.00     8.00     3.85    6.66   1.73 100.00
dm-5             0.00     0.00  771.00    0.00     3.01     0.00     8.00     3.81    4.93   1.30  99.92

I also tested 2.6.31 for what happens when I run the same 'load' on a single lvm
logical volume. Interestingly enough it is even worse than when doing it on three
volumes:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
---------------------------------------------------------------------------------------------------------
dm-5             0.00     0.00   19.20 4566.80     0.07    17.84     8.00  2232.12  486.87   0.22  99.92
dm-5             0.00     0.00    6.80 9410.40     0.03    36.76     8.00  1827.47  187.99   0.10  92.00
dm-5             0.00     0.00    4.00    0.00     0.02     0.00     8.00   685.26 185618.40 249.60  99.84
dm-5             0.00     0.00    4.20 4968.20     0.02    19.41     8.00  1426.45  286.86   0.20  99.84
dm-5             0.00     0.00   10.60 9886.00     0.04    38.62     8.00   167.57    5.74   0.09  88.72
dm-5             0.00     0.00    5.00    0.00     0.02     0.00     8.00  1103.98 242774.88 199.68  99.84
dm-5             0.00     0.00   38.20 14794.60    0.15    57.79     8.00  1171.75   74.25   0.06  87.20

I also tested 2.6.31 with io-controller v9 patches. It seems to help a
bit with the read rate, but the figures still jump all over the place

* 2.6.31 with io-controller patches v9 (8 cpu x86_64, 24 GB Ram)
* fairness set to 1 on all block devices
* scheduler = cfq
* iostat -m -x dm-5 5
* running in parallel on 3 lvm logical volumes
  on a single physical volume

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
---------------------------------------------------------------------------------------------------------
dm-5              0.00     0.00  412.00 1640.60     1.61     6.41     8.00  1992.54 1032.27   0.49  99.84
dm-5              0.00     0.00  362.40  576.40     1.42     2.25     8.00   456.13  612.67   1.07 100.00
dm-5              0.00     0.00  211.80 1004.40     0.83     3.92     8.00  1186.20  995.00   0.82 100.00
dm-5              0.00     0.00   44.20  719.60     0.17     2.81     8.00   788.56  574.81   1.31  99.76
dm-5              0.00     0.00    0.00 1274.80     0.00     4.98     8.00  1584.07 1317.32   0.78 100.00
dm-5              0.00     0.00    0.00  946.91     0.00     3.70     8.00   989.09  911.30   1.05  99.72
dm-5              0.00     0.00    7.20 2526.00     0.03     9.87     8.00  2085.57  201.72   0.37  92.88


For completeness' sake I did the tests on 2.6.24 as well; not much different.

* 2.6.24 (8 cpu x86_64, 24 GB Ram)
* scheduler = cfq
* iostat -m -x dm-5 5
* running in parallel on 3 lvm logical volumes
  situated on a single physical volume

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
---------------------------------------------------------------------------------------------------------
dm-5              0.00     0.00  144.60 5498.00     0.56    21.48     8.00  1215.82  215.48   0.14  76.80
dm-5              0.00     0.00   30.00 9831.40     0.12    38.40     8.00  1729.16  142.37   0.09  88.60
dm-5              0.00     0.00   27.60 4126.40     0.11    16.12     8.00  2245.24  618.77   0.21  86.00
dm-5              0.00     0.00    2.00 3981.20     0.01    15.55     8.00  1069.07  268.40   0.23  91.60
dm-5              0.00     0.00   40.60   13.20     0.16     0.05     8.00     3.98   74.83  15.02  80.80
dm-5              0.00     0.00    5.60 5085.20     0.02    19.86     8.00  2586.65  508.10   0.18  94.00
dm-5              0.00     0.00   20.80 5344.60     0.08    20.88     8.00   985.51  148.96   0.17  92.60


cheers
tobi

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900

