* dm-ioband: Test results.
@ 2009-04-13  4:05 Ryo Tsuruta
  2009-04-13 14:46   ` Vivek Goyal
                   ` (2 more replies)
  0 siblings, 3 replies; 55+ messages in thread
From: Ryo Tsuruta @ 2009-04-13  4:05 UTC (permalink / raw)
  To: agk, dm-devel; +Cc: linux-kernel

Hi Alasdair and all,

I did more tests on dm-ioband and I've posted the test items and
results on my website. The results are very good.
http://people.valinux.co.jp/~ryov/dm-ioband/test/test-items.xls

I hope someone will test dm-ioband and report back to the dm-devel
mailing list.

Alasdair, could you please merge dm-ioband into upstream? Or could
you please tell me why dm-ioband can't be merged?

Thanks,
Ryo Tsuruta

For details about dm-ioband:
http://people.valinux.co.jp/~ryov/dm-ioband/

RPM packages for RHEL5 and CentOS5 are available:
http://people.valinux.co.jp/~ryov/dm-ioband/binary.html


* Re: dm-ioband: Test results.
  2009-04-13  4:05 dm-ioband: Test results Ryo Tsuruta
@ 2009-04-13 14:46   ` Vivek Goyal
  2009-04-15  4:37   ` Vivek Goyal
  2009-04-16 20:57   ` Vivek Goyal
  2 siblings, 0 replies; 55+ messages in thread
From: Vivek Goyal @ 2009-04-13 14:46 UTC (permalink / raw)
  To: Ryo Tsuruta; +Cc: agk, dm-devel, linux-kernel

On Mon, Apr 13, 2009 at 01:05:52PM +0900, Ryo Tsuruta wrote:
> Hi Alasdair and all,
> 
> I did more tests on dm-ioband and I've posted the test items and
> results on my website. The results are very good.
> http://people.valinux.co.jp/~ryov/dm-ioband/test/test-items.xls
> 

Hi Ryo,

I quickly looked at the xls sheet. Most of the test cases seem to be
direct IO. Have you done testing with buffered writes/async writes and
been able to provide service differentiation between cgroups?

For example, two "dd" threads running in two cgroups doing writes.
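
Something along these lines, purely for illustration (the cgroup mount point
and target paths are placeholders, not a test that was actually run):

  # one buffered writer per cgroup; cgroup hierarchy assumed mounted at /cgroup
  echo $$ > /cgroup/1/tasks
  dd if=/dev/zero of=/mnt1/out1 bs=1M count=1024 &
  echo $$ > /cgroup/2/tasks
  dd if=/dev/zero of=/mnt2/out2 bs=1M count=1024 &
  echo $$ > /cgroup/tasks
  wait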

Thanks
Vivek


* Re: dm-ioband: Test results.
  2009-04-13 14:46   ` Vivek Goyal
@ 2009-04-14  2:49     ` Vivek Goyal
  -1 siblings, 0 replies; 55+ messages in thread
From: Vivek Goyal @ 2009-04-14  2:49 UTC (permalink / raw)
  To: Ryo Tsuruta; +Cc: agk, dm-devel, linux-kernel

On Mon, Apr 13, 2009 at 10:46:26AM -0400, Vivek Goyal wrote:
> On Mon, Apr 13, 2009 at 01:05:52PM +0900, Ryo Tsuruta wrote:
> > Hi Alasdair and all,
> > 
> > I did more tests on dm-ioband and I've posted the test items and
> > results on my website. The results are very good.
> > http://people.valinux.co.jp/~ryov/dm-ioband/test/test-items.xls
> > 
> 
> Hi Ryo,
> 
> I quickly looked at the xls sheet. Most of the test cases seem to be
> direct IO. Have you done testing with buffered writes/async writes and
> been able to provide service differentiation between cgroups?
> 
> For example, two "dd" threads running in two cgroups doing writes.
> 

Just realized that last time I replied to the wrong mail ID. Ryo, this time
you should get the mail. This is a reply to my original reply.

Also, I wanted to test-run your patches. How do I do that? I see that your
patches are based on the dm quilt tree. I downloaded the dm patches and tried
to apply them on top of 2.6.30-rc1, but that failed, so I can't apply your
patches right now.

So what's the simplest way of testing your changes on the latest kernels?

Thanks
Vivek

Applying patch dm-add-request-based-facility.patch
patching file drivers/md/dm-table.c
Hunk #1 succeeded at 1021 (offset 29 lines).
patching file drivers/md/dm.c
Hunk #1 succeeded at 90 (offset 6 lines).
Hunk #2 succeeded at 175 (offset -1 lines).
Hunk #3 succeeded at 427 (offset 6 lines).
Hunk #4 succeeded at 648 (offset 10 lines).
Hunk #5 succeeded at 1246 with fuzz 2 (offset 20 lines).
Hunk #6 succeeded at 1273 (offset 3 lines).
Hunk #7 succeeded at 1615 with fuzz 1 (offset 20 lines).
Hunk #8 succeeded at 2016 with fuzz 1 (offset 14 lines).
Hunk #9 succeeded at 2141 (offset 33 lines).
Hunk #10 FAILED at 2321.
Hunk #11 FAILED at 2336.
Hunk #12 succeeded at 2388 with fuzz 2 (offset 29 lines).
2 out of 12 hunks FAILED -- rejects in file drivers/md/dm.c
patching file drivers/md/dm.h
patching file include/linux/device-mapper.h
Hunk #1 succeeded at 230 (offset -1 lines).
Patch dm-add-request-based-facility.patch does not apply (enforce with -f)

> Thanks
> Vivek


* Re: dm-ioband: Test results.
  2009-04-14  2:49     ` Vivek Goyal
@ 2009-04-14  5:27     ` Ryo Tsuruta
  -1 siblings, 0 replies; 55+ messages in thread
From: Ryo Tsuruta @ 2009-04-14  5:27 UTC (permalink / raw)
  To: vgoyal; +Cc: agk, dm-devel, linux-kernel

Hi Vivek,

> Also I wanted to test run your patches. How do I do that? I see that your
> patches are based on dm quilt tree. I downloaded the dm-patches and tried
> to apply on top of 2.6.30-rc1 but it failed. So can't apply your patches
> now.

> Applying patch dm-add-request-based-facility.patch

"dm-add-request-based-facility.patch" is not my patch. To apply my
patch, you need to edit the series file and comment out the patch
files after "mm" section and my patch like this:

  ############################################################
  # Marker corresponding to end of -mm tree.
  ############################################################
  
  #dm-table-fix-alignment-to-hw_sector.patch
  mm

  # An attempt to get UML to work with dm.
  uml-fixes.patch

  ############################################################
  # May need more work or testing, but close to being ready.
  ############################################################
  
  # Under review
  #dm-exception-store-generalize-table-args.patch
  #dm-snapshot-new-ctr-table-format.patch
  #dm-snapshot-cleanup.patch
  
  #dm-raid1-add-clustering.patch
  dm-add-ioband.patch

And then type "quilt push dm-add-ioband.patch" command.
However, the patch can be applied, but a compile error is caused
because the patch is against the previous dm-tree.
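
Roughly, the steps above look like this (the directory layout is an
assumption about how the dm quilt tree is unpacked on top of the kernel
source; adjust paths as needed):

  cd linux-2.6.30-rc1              # kernel source with the dm quilt tree in patches/
  vi patches/series                # comment out the offending patches before dm-add-ioband.patch
  quilt push dm-add-ioband.patch   # apply everything up to and including dm-add-ioband.patch
  quilt applied | tail -1          # should print dm-add-ioband.patch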

> So what's the simplest way of testing your changes on latest kernels?

So I've uploaded a patch against 2.6.30-rc1. Please use it if you
like. http://people.valinux.co.jp/~ryov/dm-ioband/
The bio-cgroup patch against 2.6.30-rc1 will be uploaded within a few
days. I'm looking forward to your feedback on dm-ioband.

Thanks,
Ryo Tsuruta


* Re: [dm-devel] Re: dm-ioband: Test results.
  2009-04-13 14:46   ` Vivek Goyal
@ 2009-04-14  9:30     ` Ryo Tsuruta
  -1 siblings, 0 replies; 55+ messages in thread
From: Ryo Tsuruta @ 2009-04-14  9:30 UTC (permalink / raw)
  To: dm-devel, vgoyal; +Cc: vivek.goyal2008, linux-kernel, agk

Hi Vivek,

> I quickly looked at the xls sheet. Most of the test cases seem to be
> direct IO. Have you done testing with buffered writes/async writes and
> been able to provide service differentiation between cgroups?
> 
> For example, two "dd" threads running in two cgroups doing writes.

Thanks for taking a look at the sheet. I did a buffered write test
with "fio"; just two "dd" threads can't generate enough I/O load to
make dm-ioband start bandwidth control. The following is the script
I actually used for the test.

  #!/bin/bash
  sync
  echo 1 > /proc/sys/vm/drop_caches
  arg="--size=64m --rw=write --numjobs=50 --group_reporting"
  echo $$ > /cgroup/1/tasks
  fio $arg --name=ioband1 --directory=/mnt1 --output=ioband1.log &
  echo $$ > /cgroup/2/tasks
  fio $arg --name=ioband2 --directory=/mnt2 --output=ioband2.log &
  echo $$ > /cgroup/tasks
  wait

I created two dm devices so that I could easily monitor the throughput of
each cgroup with iostat, and gave a weight of 200 to cgroup1 and 100 to
cgroup2, which means cgroup1 can use twice the bandwidth of cgroup2. The
following is part of the iostat output (the iostat invocation itself is
sketched right after the output); dm-0 and dm-1 correspond to ioband1 and
ioband2. You can see that the bandwidth is divided according to the weights.

  avg-cpu:  %user   %nice %system %iowait  %steal   %idle
             0.99    0.00    6.44   92.57    0.00    0.00
  
  Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
  dm-0           3549.00         0.00     28392.00          0      28392
  dm-1           1797.00         0.00     14376.00          0      14376
  
  avg-cpu:  %user   %nice %system %iowait  %steal   %idle
             1.01    0.00    4.02   94.97    0.00    0.00
  
  Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
  dm-0           3919.00         0.00     31352.00          0      31352
  dm-1           1925.00         0.00     15400.00          0      15400
  
  avg-cpu:  %user   %nice %system %iowait  %steal   %idle
             0.00    0.00    5.97   94.03    0.00    0.00
  
  Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
  dm-0           3534.00         0.00     28272.00          0      28272
  dm-1           1773.00         0.00     14184.00          0      14184
  
  avg-cpu:  %user   %nice %system %iowait  %steal   %idle
             0.50    0.00    6.00   93.50    0.00    0.00
  
  Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
  dm-0           4053.00         0.00     32424.00          0      32424
  dm-1           2039.00         8.00     16304.00          8      16304
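
For reference, per-device output like the above can be produced with a plain
interval run of iostat; the exact invocation isn't shown in the results, so
this is just one way to get such a report (device names taken from the
dm-0/dm-1 mapping mentioned above):

  iostat -d dm-0 dm-1 1   # block-device report for the two ioband devices, every second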

Thanks,
Ryo Tsuruta


* Re: dm-ioband: Test results.
  2009-04-13  4:05 dm-ioband: Test results Ryo Tsuruta
@ 2009-04-15  4:37   ` Vivek Goyal
  2009-04-15  4:37   ` Vivek Goyal
  2009-04-16 20:57   ` Vivek Goyal
  2 siblings, 0 replies; 55+ messages in thread
From: Vivek Goyal @ 2009-04-15  4:37 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: agk, dm-devel, linux-kernel, Jens Axboe,
	Fernando Luis Vázquez Cao, Nauman Rafique, Moyer Jeff Moyer,
	Balbir Singh

On Mon, Apr 13, 2009 at 01:05:52PM +0900, Ryo Tsuruta wrote:
> Hi Alasdair and all,
> 
> I did more tests on dm-ioband and I've posted the test items and
> results on my website. The results are very good.
> http://people.valinux.co.jp/~ryov/dm-ioband/test/test-items.xls
> 
> I hope someone will test dm-ioband and report back to the dm-devel
> mailing list.
> 

Hi Ryo,

I have been able to take your patch for the 2.6.30-rc1 kernel and have
started doing some testing for reads. Hopefully you will provide the
bio-cgroup patches soon so that I can do some write testing as well.

At the beginning of this mail I am listing some basic test results, and in
the later part I am raising some of my concerns with this patchset.

My test setup:
--------------
I have got one SATA drive with two partitions, /dev/sdd1 and /dev/sdd2, and
I have created ext3 file systems on these partitions. I then created one
ioband device "ioband1" with weight 40 on /dev/sdd1 and another ioband
device "ioband2" with weight 10 on /dev/sdd2.
  
1) I think an RT task within a group does not get its fair share (i.e., all
  the BW available as long as the RT task is backlogged).

  I launched one RT read task of a 2G file in the ioband1 group and in parallel
  launched more readers in the ioband1 group. The ioband2 group did not have
  any I/O going. Following are the results with and without dm-ioband.

  A) 1 RT prio 0 + 1 BE prio 4 reader

	dm-ioband
	2147483648 bytes (2.1 GB) copied, 39.4701 s, 54.4 MB/s
	2147483648 bytes (2.1 GB) copied, 71.8034 s, 29.9 MB/s

	without-dm-ioband
	2147483648 bytes (2.1 GB) copied, 35.3677 s, 60.7 MB/s
	2147483648 bytes (2.1 GB) copied, 70.8214 s, 30.3 MB/s

  B) 1 RT prio 0 + 2 BE prio 4 reader

	dm-ioband
	2147483648 bytes (2.1 GB) copied, 43.8305 s, 49.0 MB/s
	2147483648 bytes (2.1 GB) copied, 135.395 s, 15.9 MB/s
	2147483648 bytes (2.1 GB) copied, 136.545 s, 15.7 MB/s

	without-dm-ioband
	2147483648 bytes (2.1 GB) copied, 35.3177 s, 60.8 MB/s
	2147483648 bytes (2.1 GB) copied, 124.793 s, 17.2 MB/s
	2147483648 bytes (2.1 GB) copied, 126.267 s, 17.0 MB/s

  C) 1 RT prio 0 + 3 BE prio 4 reader

	dm-ioband
	2147483648 bytes (2.1 GB) copied, 48.8159 s, 44.0 MB/s
	2147483648 bytes (2.1 GB) copied, 185.848 s, 11.6 MB/s
	2147483648 bytes (2.1 GB) copied, 188.171 s, 11.4 MB/s
	2147483648 bytes (2.1 GB) copied, 189.537 s, 11.3 MB/s

	without-dm-ioband
	2147483648 bytes (2.1 GB) copied, 35.2928 s, 60.8 MB/s
	2147483648 bytes (2.1 GB) copied, 169.929 s, 12.6 MB/s
	2147483648 bytes (2.1 GB) copied, 172.486 s, 12.5 MB/s
	2147483648 bytes (2.1 GB) copied, 172.817 s, 12.4 MB/s

  D) 1 RT prio 0 + 4 BE prio 4 reader
	dm-ioband
	2147483648 bytes (2.1 GB) copied, 51.4279 s, 41.8 MB/s
	2147483648 bytes (2.1 GB) copied, 260.29 s, 8.3 MB/s
	2147483648 bytes (2.1 GB) copied, 261.824 s, 8.2 MB/s
	2147483648 bytes (2.1 GB) copied, 261.981 s, 8.2 MB/s
	2147483648 bytes (2.1 GB) copied, 262.372 s, 8.2 MB/s

	without-dm-ioband
	2147483648 bytes (2.1 GB) copied, 35.4213 s, 60.6 MB/s
	2147483648 bytes (2.1 GB) copied, 215.784 s, 10.0 MB/s
	2147483648 bytes (2.1 GB) copied, 218.706 s, 9.8 MB/s
	2147483648 bytes (2.1 GB) copied, 220.12 s, 9.8 MB/s
	2147483648 bytes (2.1 GB) copied, 220.57 s, 9.7 MB/s

Notice that with dm-ioband, as the number of readers increases, the finish
time of the RT task also increases. But without dm-ioband the finish time
of the RT task remains more or less constant even as the number of readers
increases.

For some reason overall throughput also seems to be lower with dm-ioband.
Because ioband2 is not doing any IO, I expected that tasks in ioband1
would get the full disk BW and that throughput would not drop.

I have not debugged it, but I guess it might be coming from the fact that
there are no separate queues for RT tasks. Bios from all the tasks can be
buffered on a single queue in a cgroup, and that might be causing RT
requests to hide behind the BE tasks' requests.

General thoughts about dm-ioband
================================
- Implementing control at the second level has the advantage that one does
  not have to muck with the IO scheduler code. But then it also has the
  disadvantage that there is no communication with the IO scheduler.

- dm-ioband buffers bios at a higher layer and then does a FIFO release
  of these bios. This FIFO release can lead to priority inversion problems
  in certain cases where RT requests end up way behind BE requests, or to
  reader starvation where reader bios get hidden behind writer
  bios, etc. These issues are hard to notice from user space. I guess the
  above RT results do highlight the RT task problem. I am still working on
  other test cases to see if I can show the problem.

- dm-ioband does this extra grouping logic using dm messages. Why is the
  cgroup infrastructure not sufficient to meet your needs, such as
  grouping tasks based on uid etc.? I think we should get rid of all
  the extra grouping logic and just use cgroups for grouping information
  (a small sketch of that usage model follows at the end of this list).

- Why do we need to specify bio-cgroup ids to dm-ioband externally with
  the help of dm messages? A user should be able to just create the
  cgroups, put the tasks in the right cgroup, and then everything should
  just work fine.

- Why do we have to put another dm-ioband device on top of every partition
  or existing device-mapper device to control it? Is it possible to do
  this control in the make_request function of the request queue so that
  we don't end up creating additional dm devices? I had posted a crude
  RFC patch as a proof of concept but did not continue the development
  because of the fundamental issue of FIFO release of buffered bios.

	http://lkml.org/lkml/2008/11/6/227 

  Can you please have a look and provide feedback about why we cannot
  go in the direction of the above patches and why we need to create
  additional dm devices?

  I think in its current form dm-ioband is hard to configure, and we should
  look for ways to simplify the configuration.

- I personally think that even group IO scheduling should be done at the
  IO scheduler level, and we should not break IO scheduling into two
  parts where group scheduling is done by a higher-level IO scheduler
  sitting in the dm layer and IO scheduling among tasks within groups is
  done by the actual IO scheduler.

  But this also means more work, as one has to muck around with the core IO
  schedulers to make them cgroup aware and also make sure existing
  functionality is not broken. I posted the patches here.

	http://lkml.org/lkml/2009/3/11/486

  Can you please let us know why the IO scheduler based approach
  does not work for you?

  Jens, it would be nice to hear your opinion about two-level vs one-level
  control. Do you think that the common-layer approach is the way
  to go, where one can control things more tightly, or that FIFO release of
  bios from a second-level controller is fine and we can live with this
  additional serialization in the layer just above the IO scheduler?

- There is no notion of RT cgroups. So even if one wants to run an RT
  task in the root cgroup to make sure it gets full access to the disk, it
  can't do that. It has to share the BW with other competing groups.

- dm-ioband controls the amount of IO done per second. Will a seeky process
  not run away with more disk time?

  Additionally, at the group level we will provide fairness in terms of amount
  of IO (number of blocks transferred etc.), while within a group CFQ will try
  to provide fairness in terms of disk access time slices. I don't even
  know whether it is a matter of concern or not. I was thinking that
  one uniform policy on the hierarchical scheduling tree would
  probably have been better. Just thinking out loud...
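
Coming back to the grouping point above, the usage model being argued for is
plain cgroup filesystem operations only; the controller and file names below
are hypothetical, not an existing dm-ioband interface:

  # hypothetical cgroup-only configuration (no dm messages involved)
  mount -t cgroup -o bio none /cgroup   # "bio" controller name is an assumption
  mkdir /cgroup/grp1
  echo 40 > /cgroup/grp1/bio.weight     # hypothetical per-group weight file
  echo $$ > /cgroup/grp1/tasks          # subsequent IO from this shell is weighted 40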

Thanks
Vivek
 
> Alasdair, could you please merge dm-ioband into upstream? Or could
> you please tell me why dm-ioband can't be merged?
> 
> Thanks,
> Ryo Tsuruta
> 
> To know the details of dm-ioband:
> http://people.valinux.co.jp/~ryov/dm-ioband/
> 
> RPM packages for RHEL5 and CentOS5 are available:
> http://people.valinux.co.jp/~ryov/dm-ioband/binary.html


* Re: dm-ioband: Test results.
  2009-04-15  4:37   ` Vivek Goyal
@ 2009-04-15 13:38   ` Ryo Tsuruta
  2009-04-15 14:10       ` Vivek Goyal
                       ` (2 more replies)
  -1 siblings, 3 replies; 55+ messages in thread
From: Ryo Tsuruta @ 2009-04-15 13:38 UTC (permalink / raw)
  To: vgoyal
  Cc: agk, dm-devel, linux-kernel, jens.axboe, fernando, nauman,
	jmoyer, balbir

Hi Vivek, 

> In the beginning of the mail, i am listing some basic test results and
> in later part of mail I am raising some of my concerns with this patchset.

I did a similar test and got different results from yours. I'll reply
later about the later part of your mail.

> My test setup:
> --------------
> I have got one SATA driver with two partitions /dev/sdd1 and /dev/sdd2 on
> that. I have created ext3 file systems on these partitions. Created one
> ioband device "ioband1" with weight 40 on /dev/sdd1 and another ioband
> device "ioband2" with weight 10 on /dev/sdd2.
>   
> 1) I think an RT task with-in a group does not get its fair share (all
>   the BW available as long as RT task is backlogged). 
> 
>   I launched one RT read task of 2G file in ioband1 group and in parallel
>   launched more readers in ioband1 group. ioband2 group did not have any
>   io going. Following are results with and without ioband.
> 
>   A) 1 RT prio 0 + 1 BE prio 4 reader
> 
> 	dm-ioband
> 	2147483648 bytes (2.1 GB) copied, 39.4701 s, 54.4 MB/s
> 	2147483648 bytes (2.1 GB) copied, 71.8034 s, 29.9 MB/s
> 
> 	without-dm-ioband
> 	2147483648 bytes (2.1 GB) copied, 35.3677 s, 60.7 MB/s
> 	2147483648 bytes (2.1 GB) copied, 70.8214 s, 30.3 MB/s
> 
>   B) 1 RT prio 0 + 2 BE prio 4 reader
> 
> 	dm-ioband
> 	2147483648 bytes (2.1 GB) copied, 43.8305 s, 49.0 MB/s
> 	2147483648 bytes (2.1 GB) copied, 135.395 s, 15.9 MB/s
> 	2147483648 bytes (2.1 GB) copied, 136.545 s, 15.7 MB/s
> 
> 	without-dm-ioband
> 	2147483648 bytes (2.1 GB) copied, 35.3177 s, 60.8 MB/s
> 	2147483648 bytes (2.1 GB) copied, 124.793 s, 17.2 MB/s
> 	2147483648 bytes (2.1 GB) copied, 126.267 s, 17.0 MB/s
> 
>   C) 1 RT prio 0 + 3 BE prio 4 reader
> 
> 	dm-ioband
> 	2147483648 bytes (2.1 GB) copied, 48.8159 s, 44.0 MB/s
> 	2147483648 bytes (2.1 GB) copied, 185.848 s, 11.6 MB/s
> 	2147483648 bytes (2.1 GB) copied, 188.171 s, 11.4 MB/s
> 	2147483648 bytes (2.1 GB) copied, 189.537 s, 11.3 MB/s
> 
> 	without-dm-ioband
> 	2147483648 bytes (2.1 GB) copied, 35.2928 s, 60.8 MB/s
> 	2147483648 bytes (2.1 GB) copied, 169.929 s, 12.6 MB/s
> 	2147483648 bytes (2.1 GB) copied, 172.486 s, 12.5 MB/s
> 	2147483648 bytes (2.1 GB) copied, 172.817 s, 12.4 MB/s
> 
>   C) 1 RT prio 0 + 3 BE prio 4 reader
> 	dm-ioband
> 	2147483648 bytes (2.1 GB) copied, 51.4279 s, 41.8 MB/s
> 	2147483648 bytes (2.1 GB) copied, 260.29 s, 8.3 MB/s
> 	2147483648 bytes (2.1 GB) copied, 261.824 s, 8.2 MB/s
> 	2147483648 bytes (2.1 GB) copied, 261.981 s, 8.2 MB/s
> 	2147483648 bytes (2.1 GB) copied, 262.372 s, 8.2 MB/s
> 
> 	without-dm-ioband
> 	2147483648 bytes (2.1 GB) copied, 35.4213 s, 60.6 MB/s
> 	2147483648 bytes (2.1 GB) copied, 215.784 s, 10.0 MB/s
> 	2147483648 bytes (2.1 GB) copied, 218.706 s, 9.8 MB/s
> 	2147483648 bytes (2.1 GB) copied, 220.12 s, 9.8 MB/s
> 	2147483648 bytes (2.1 GB) copied, 220.57 s, 9.7 MB/s
> 
> Notice that with dm-ioband as number of readers are increasing, finish
> time of RT tasks is also increasing. But without dm-ioband finish time
> of RT tasks remains more or less constat even with increase in number
> of readers.
> 
> For some reason overall throughput also seems to be less with dm-ioband.
> Because ioband2 is not doing any IO, i expected that tasks in ioband1
> will get full disk BW and throughput will not drop.
> 
> I have not debugged it but I guess it might be coming from the fact that
> there are no separate queues for RT tasks. bios from all the tasks can be
> buffered on a single queue in a cgroup and that might be causing RT
> request to hide behind BE tasks' request?

I followed your setup and ran the following script on my machine.

        #!/bin/sh
        echo 1 > /proc/sys/vm/drop_caches
        ionice -c1 -n0 dd if=/mnt1/2g.1 of=/dev/null &
        ionice -c2 -n4 dd if=/mnt1/2g.2 of=/dev/null &
        ionice -c2 -n4 dd if=/mnt1/2g.3 of=/dev/null &
        ionice -c2 -n4 dd if=/mnt1/2g.4 of=/dev/null &
        wait

I got different results: there is no significant difference in each
dd's throughput between w/ and w/o dm-ioband.

    A) 1 RT prio 0 + 1 BE prio 4 reader
        w/ dm-ioband
        2147483648 bytes (2.1 GB) copied, 64.0764 seconds, 33.5 MB/s
        2147483648 bytes (2.1 GB) copied, 99.0757 seconds, 21.7 MB/s
        w/o dm-ioband
        2147483648 bytes (2.1 GB) copied, 62.3575 seconds, 34.4 MB/s
        2147483648 bytes (2.1 GB) copied, 98.5804 seconds, 21.8 MB/s

    B) 1 RT prio 0 + 2 BE prio 4 reader
        w/ dm-ioband
        2147483648 bytes (2.1 GB) copied, 64.5634 seconds, 33.3 MB/s
        2147483648 bytes (2.1 GB) copied, 220.372 seconds, 9.7 MB/s
        2147483648 bytes (2.1 GB) copied, 222.174 seconds, 9.7 MB/s
        w/o dm-ioband
        2147483648 bytes (2.1 GB) copied, 62.3036 seconds, 34.5 MB/s
        2147483648 bytes (2.1 GB) copied, 226.315 seconds, 9.5 MB/s
        2147483648 bytes (2.1 GB) copied, 229.064 seconds, 9.4 MB/s

    C) 1 RT prio 0 + 3 BE prio 4 reader
        w/ dm-ioband
        2147483648 bytes (2.1 GB) copied, 66.7155 seconds, 32.2 MB/s
        2147483648 bytes (2.1 GB) copied, 306.524 seconds, 7.0 MB/s
        2147483648 bytes (2.1 GB) copied, 306.627 seconds, 7.0 MB/s
        2147483648 bytes (2.1 GB) copied, 306.971 seconds, 7.0 MB/s
        w/o dm-ioband
        2147483648 bytes (2.1 GB) copied, 66.1144 seconds, 32.5 MB/s
        2147483648 bytes (2.1 GB) copied, 305.5 seconds, 7.0 MB/s
        2147483648 bytes (2.1 GB) copied, 306.469 seconds, 7.0 MB/s
        2147483648 bytes (2.1 GB) copied, 307.63 seconds, 7.0 MB/s

The results show that the effect of the single queue is too small to matter
and that dm-ioband doesn't break CFQ's classification and priorities.
What do you think about my results?

Thanks,
Ryo Tsuruta


* Re: dm-ioband: Test results.
  2009-04-15 13:38   ` Ryo Tsuruta
@ 2009-04-15 14:10       ` Vivek Goyal
  2009-04-15 16:17       ` Vivek Goyal
  2009-04-16  2:47     ` [dm-devel] " Ryo Tsuruta
  2 siblings, 0 replies; 55+ messages in thread
From: Vivek Goyal @ 2009-04-15 14:10 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: agk, dm-devel, linux-kernel, jens.axboe, fernando, nauman,
	jmoyer, balbir

On Wed, Apr 15, 2009 at 10:38:32PM +0900, Ryo Tsuruta wrote:
> Hi Vivek, 
> 
> > In the beginning of the mail, i am listing some basic test results and
> > in later part of mail I am raising some of my concerns with this patchset.
> 
> I did a similar test and got different results to yours. I'll reply
> later about the later part of your mail.
> 
> > My test setup:
> > --------------
> > I have got one SATA driver with two partitions /dev/sdd1 and /dev/sdd2 on
> > that. I have created ext3 file systems on these partitions. Created one
> > ioband device "ioband1" with weight 40 on /dev/sdd1 and another ioband
> > device "ioband2" with weight 10 on /dev/sdd2.
> >   
> > 1) I think an RT task with-in a group does not get its fair share (all
> >   the BW available as long as RT task is backlogged). 
> > 
> >   I launched one RT read task of 2G file in ioband1 group and in parallel
> >   launched more readers in ioband1 group. ioband2 group did not have any
> >   io going. Following are results with and without ioband.
> > 
> >   A) 1 RT prio 0 + 1 BE prio 4 reader
> > 
> > 	dm-ioband
> > 	2147483648 bytes (2.1 GB) copied, 39.4701 s, 54.4 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 71.8034 s, 29.9 MB/s
> > 
> > 	without-dm-ioband
> > 	2147483648 bytes (2.1 GB) copied, 35.3677 s, 60.7 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 70.8214 s, 30.3 MB/s
> > 
> >   B) 1 RT prio 0 + 2 BE prio 4 reader
> > 
> > 	dm-ioband
> > 	2147483648 bytes (2.1 GB) copied, 43.8305 s, 49.0 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 135.395 s, 15.9 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 136.545 s, 15.7 MB/s
> > 
> > 	without-dm-ioband
> > 	2147483648 bytes (2.1 GB) copied, 35.3177 s, 60.8 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 124.793 s, 17.2 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 126.267 s, 17.0 MB/s
> > 
> >   C) 1 RT prio 0 + 3 BE prio 4 reader
> > 
> > 	dm-ioband
> > 	2147483648 bytes (2.1 GB) copied, 48.8159 s, 44.0 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 185.848 s, 11.6 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 188.171 s, 11.4 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 189.537 s, 11.3 MB/s
> > 
> > 	without-dm-ioband
> > 	2147483648 bytes (2.1 GB) copied, 35.2928 s, 60.8 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 169.929 s, 12.6 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 172.486 s, 12.5 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 172.817 s, 12.4 MB/s
> > 
> >   C) 1 RT prio 0 + 3 BE prio 4 reader
> > 	dm-ioband
> > 	2147483648 bytes (2.1 GB) copied, 51.4279 s, 41.8 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 260.29 s, 8.3 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 261.824 s, 8.2 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 261.981 s, 8.2 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 262.372 s, 8.2 MB/s
> > 
> > 	without-dm-ioband
> > 	2147483648 bytes (2.1 GB) copied, 35.4213 s, 60.6 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 215.784 s, 10.0 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 218.706 s, 9.8 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 220.12 s, 9.8 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 220.57 s, 9.7 MB/s
> > 
> > Notice that with dm-ioband as number of readers are increasing, finish
> > time of RT tasks is also increasing. But without dm-ioband finish time
> > of RT tasks remains more or less constat even with increase in number
> > of readers.
> > 
> > For some reason overall throughput also seems to be less with dm-ioband.
> > Because ioband2 is not doing any IO, i expected that tasks in ioband1
> > will get full disk BW and throughput will not drop.
> > 
> > I have not debugged it but I guess it might be coming from the fact that
> > there are no separate queues for RT tasks. bios from all the tasks can be
> > buffered on a single queue in a cgroup and that might be causing RT
> > request to hide behind BE tasks' request?
> 
> I followed your setup and ran the following script on my machine.
> 
>         #!/bin/sh
>         echo 1 > /proc/sys/vm/drop_caches
>         ionice -c1 -n0 dd if=/mnt1/2g.1 of=/dev/null &
>         ionice -c2 -n4 dd if=/mnt1/2g.2 of=/dev/null &
>         ionice -c2 -n4 dd if=/mnt1/2g.3 of=/dev/null &
>         ionice -c2 -n4 dd if=/mnt1/2g.4 of=/dev/null &
>         wait
> 
> I got different results and there is no siginificant difference each
> dd's throughput between w/ and w/o dm-ioband. 
> 
>     A) 1 RT prio 0 + 1 BE prio 4 reader
>         w/ dm-ioband
>         2147483648 bytes (2.1 GB) copied, 64.0764 seconds, 33.5 MB/s
>         2147483648 bytes (2.1 GB) copied, 99.0757 seconds, 21.7 MB/s
>         w/o dm-ioband
>         2147483648 bytes (2.1 GB) copied, 62.3575 seconds, 34.4 MB/s
>         2147483648 bytes (2.1 GB) copied, 98.5804 seconds, 21.8 MB/s
> 
>     B) 1 RT prio 0 + 2 BE prio 4 reader
>         w/ dm-ioband
>         2147483648 bytes (2.1 GB) copied, 64.5634 seconds, 33.3 MB/s
>         2147483648 bytes (2.1 GB) copied, 220.372 seconds, 9.7 MB/s
>         2147483648 bytes (2.1 GB) copied, 222.174 seconds, 9.7 MB/s
>         w/o dm-ioband
>         2147483648 bytes (2.1 GB) copied, 62.3036 seconds, 34.5 MB/s
>         2147483648 bytes (2.1 GB) copied, 226.315 seconds, 9.5 MB/s
>         2147483648 bytes (2.1 GB) copied, 229.064 seconds, 9.4 MB/s
> 
>     C) 1 RT prio 0 + 3 BE prio 4 reader
>         w/ dm-ioband
>         2147483648 bytes (2.1 GB) copied, 66.7155 seconds, 32.2 MB/s
>         2147483648 bytes (2.1 GB) copied, 306.524 seconds, 7.0 MB/s
>         2147483648 bytes (2.1 GB) copied, 306.627 seconds, 7.0 MB/s
>         2147483648 bytes (2.1 GB) copied, 306.971 seconds, 7.0 MB/s
>         w/o dm-ioband
>         2147483648 bytes (2.1 GB) copied, 66.1144 seconds, 32.5 MB/s
>         2147483648 bytes (2.1 GB) copied, 305.5 seconds, 7.0 MB/s
>         2147483648 bytes (2.1 GB) copied, 306.469 seconds, 7.0 MB/s
>         2147483648 bytes (2.1 GB) copied, 307.63 seconds, 7.0 MB/s
> 
> The results show that the effect of the single queue is too small and
> dm-ioband doesn't break CFQ's classification and priority.
> What do you think about my results?

Hmm, strange. We are getting different results. Maybe it is some
configuration/setup issue.

What does your ioband setup look like? Have you created at least one more
competing ioband device? I think only in that case do you get this ad-hoc
logic of waiting for the group which has not finished its tokens yet, and
you end up buffering the bios in a FIFO.

If you have not already done so, can you just create two partitions on your
disk, say sda1 and sda2, create two ioband devices with weights of, say, 95
and 5 (95% of the disk for the first partition and 5% for the other), and
then run the above test on the first ioband device?

So how does this proportional weight thing work? If I have two ioband
devices with weights 80 and 20 and there is no IO happening on the
second device, should the first device get all the BW?

I will re-run my tests.

Secondly, from a technical point of view, how do you explain the fact that
FIFO release of bios does not break the notion of CFQ priority? The moment
you buffer the bios in a single queue and start doing FIFO dispatch,
you lose the notion of one bio being more important than another.

It's a different matter that in practice it might not be easily visible.

Thanks
Vivek


* Re: dm-ioband: Test results.
  2009-04-15 13:38   ` Ryo Tsuruta
@ 2009-04-15 16:17       ` Vivek Goyal
  2009-04-15 16:17       ` Vivek Goyal
  2009-04-16  2:47     ` [dm-devel] " Ryo Tsuruta
  2 siblings, 0 replies; 55+ messages in thread
From: Vivek Goyal @ 2009-04-15 16:17 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: agk, dm-devel, linux-kernel, jens.axboe, fernando, nauman,
	jmoyer, balbir

On Wed, Apr 15, 2009 at 10:38:32PM +0900, Ryo Tsuruta wrote:
> Hi Vivek, 
> 
> > In the beginning of the mail, i am listing some basic test results and
> > in later part of mail I am raising some of my concerns with this patchset.
> 
> I did a similar test and got different results to yours. I'll reply
> later about the later part of your mail.
> 
> > My test setup:
> > --------------
> > I have got one SATA driver with two partitions /dev/sdd1 and /dev/sdd2 on
> > that. I have created ext3 file systems on these partitions. Created one
> > ioband device "ioband1" with weight 40 on /dev/sdd1 and another ioband
> > device "ioband2" with weight 10 on /dev/sdd2.
> >   
> > 1) I think an RT task with-in a group does not get its fair share (all
> >   the BW available as long as RT task is backlogged). 
> > 
> >   I launched one RT read task of 2G file in ioband1 group and in parallel
> >   launched more readers in ioband1 group. ioband2 group did not have any
> >   io going. Following are results with and without ioband.
> > 
> >   A) 1 RT prio 0 + 1 BE prio 4 reader
> > 
> > 	dm-ioband
> > 	2147483648 bytes (2.1 GB) copied, 39.4701 s, 54.4 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 71.8034 s, 29.9 MB/s
> > 
> > 	without-dm-ioband
> > 	2147483648 bytes (2.1 GB) copied, 35.3677 s, 60.7 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 70.8214 s, 30.3 MB/s
> > 
> >   B) 1 RT prio 0 + 2 BE prio 4 reader
> > 
> > 	dm-ioband
> > 	2147483648 bytes (2.1 GB) copied, 43.8305 s, 49.0 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 135.395 s, 15.9 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 136.545 s, 15.7 MB/s
> > 
> > 	without-dm-ioband
> > 	2147483648 bytes (2.1 GB) copied, 35.3177 s, 60.8 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 124.793 s, 17.2 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 126.267 s, 17.0 MB/s
> > 
> >   C) 1 RT prio 0 + 3 BE prio 4 reader
> > 
> > 	dm-ioband
> > 	2147483648 bytes (2.1 GB) copied, 48.8159 s, 44.0 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 185.848 s, 11.6 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 188.171 s, 11.4 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 189.537 s, 11.3 MB/s
> > 
> > 	without-dm-ioband
> > 	2147483648 bytes (2.1 GB) copied, 35.2928 s, 60.8 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 169.929 s, 12.6 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 172.486 s, 12.5 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 172.817 s, 12.4 MB/s
> > 
> >   D) 1 RT prio 0 + 4 BE prio 4 reader
> > 	dm-ioband
> > 	2147483648 bytes (2.1 GB) copied, 51.4279 s, 41.8 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 260.29 s, 8.3 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 261.824 s, 8.2 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 261.981 s, 8.2 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 262.372 s, 8.2 MB/s
> > 
> > 	without-dm-ioband
> > 	2147483648 bytes (2.1 GB) copied, 35.4213 s, 60.6 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 215.784 s, 10.0 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 218.706 s, 9.8 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 220.12 s, 9.8 MB/s
> > 	2147483648 bytes (2.1 GB) copied, 220.57 s, 9.7 MB/s
> > 
> > Notice that with dm-ioband, as the number of readers increases, the finish
> > time of the RT task also increases. But without dm-ioband the finish time
> > of the RT task remains more or less constant even as the number of
> > readers increases.
> > 
> > For some reason overall throughput also seems to be less with dm-ioband.
> > Because ioband2 is not doing any IO, i expected that tasks in ioband1
> > will get full disk BW and throughput will not drop.
> > 
> > I have not debugged it, but I guess it might be coming from the fact that
> > there are no separate queues for RT tasks. Bios from all the tasks can be
> > buffered on a single queue in a cgroup, and that might be causing the RT
> > requests to hide behind the BE tasks' requests?
> 
> I followed your setup and ran the following script on my machine.
> 
>         #!/bin/sh
>         echo 1 > /proc/sys/vm/drop_caches
>         ionice -c1 -n0 dd if=/mnt1/2g.1 of=/dev/null &
>         ionice -c2 -n4 dd if=/mnt1/2g.2 of=/dev/null &
>         ionice -c2 -n4 dd if=/mnt1/2g.3 of=/dev/null &
>         ionice -c2 -n4 dd if=/mnt1/2g.4 of=/dev/null &
>         wait
> 
> I got different results and there is no significant difference in each
> dd's throughput between w/ and w/o dm-ioband.
> 
>     A) 1 RT prio 0 + 1 BE prio 4 reader
>         w/ dm-ioband
>         2147483648 bytes (2.1 GB) copied, 64.0764 seconds, 33.5 MB/s
>         2147483648 bytes (2.1 GB) copied, 99.0757 seconds, 21.7 MB/s
>         w/o dm-ioband
>         2147483648 bytes (2.1 GB) copied, 62.3575 seconds, 34.4 MB/s
>         2147483648 bytes (2.1 GB) copied, 98.5804 seconds, 21.8 MB/s
> 
>     B) 1 RT prio 0 + 2 BE prio 4 reader
>         w/ dm-ioband
>         2147483648 bytes (2.1 GB) copied, 64.5634 seconds, 33.3 MB/s
>         2147483648 bytes (2.1 GB) copied, 220.372 seconds, 9.7 MB/s
>         2147483648 bytes (2.1 GB) copied, 222.174 seconds, 9.7 MB/s
>         w/o dm-ioband
>         2147483648 bytes (2.1 GB) copied, 62.3036 seconds, 34.5 MB/s
>         2147483648 bytes (2.1 GB) copied, 226.315 seconds, 9.5 MB/s
>         2147483648 bytes (2.1 GB) copied, 229.064 seconds, 9.4 MB/s
> 
>     C) 1 RT prio 0 + 3 BE prio 4 reader
>         w/ dm-ioband
>         2147483648 bytes (2.1 GB) copied, 66.7155 seconds, 32.2 MB/s
>         2147483648 bytes (2.1 GB) copied, 306.524 seconds, 7.0 MB/s
>         2147483648 bytes (2.1 GB) copied, 306.627 seconds, 7.0 MB/s
>         2147483648 bytes (2.1 GB) copied, 306.971 seconds, 7.0 MB/s
>         w/o dm-ioband
>         2147483648 bytes (2.1 GB) copied, 66.1144 seconds, 32.5 MB/s
>         2147483648 bytes (2.1 GB) copied, 305.5 seconds, 7.0 MB/s
>         2147483648 bytes (2.1 GB) copied, 306.469 seconds, 7.0 MB/s
>         2147483648 bytes (2.1 GB) copied, 307.63 seconds, 7.0 MB/s
> 
> The results show that the effect of the single queue is too small and
> dm-ioband doesn't break CFQ's classification and priority.

OK, one more round of testing, a little different this time. Instead of
progressively increasing the number of competing readers, I have run with
a constant number of readers multiple times.

Again, I created two partitions, /dev/sdd1 and /dev/sdd2, created two
ioband devices on them and assigned weights 40 and 10 respectively. All my
IO is being done only on the first ioband device and there is no IO
happening on the second partition.

I use the following to create the ioband devices.

echo "0 $(blockdev --getsize /dev/sdd1) ioband /dev/sdd1 1 0 0 none"
"weight 0 :40" | dmsetup create ioband1
echo "0 $(blockdev --getsize /dev/sdd2) ioband /dev/sdd2 1 0 0 none"
"weight 0 :10" | dmsetup create ioband2

mount /dev/mapper/ioband1 /mnt/sdd1
mount /dev/mapper/ioband2 /mnt/sdd2

The following is the dmsetup output.

# dmsetup status
ioband2: 0 38025855 ioband 1 -1 150 13 186 1 0 8
ioband1: 0 40098177 ioband 1 -1 335056 819 80342386 1 0 8

The following is my actual script to run multiple reads.

sync
echo 3 > /proc/sys/vm/drop_caches
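# ionice -c 1 = RT class (prio 0 here); -c 2 -n 4 = BE class at priority 4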
ionice -c 1 -n 0 dd if=/mnt/sdd1/testzerofile1 of=/dev/null &
ionice -c 2 -n 4 dd if=/mnt/sdd1/testzerofile2 of=/dev/null &
ionice -c 2 -n 4 dd if=/mnt/sdd1/testzerofile3 of=/dev/null &
ionice -c 2 -n 4 dd if=/mnt/sdd1/testzerofile4 of=/dev/null &
ionice -c 2 -n 4 dd if=/mnt/sdd1/testzerofile5 of=/dev/null &

The following is the output of 4 runs of reads with and without dm-ioband:

1 RT process at prio 0 and 4 BE processes at prio 4.

First run
----------
without dm-ioband

2147483648 bytes (2.1 GB) copied, 35.3428 s, 60.8 MB/s
2147483648 bytes (2.1 GB) copied, 215.446 s, 10.0 MB/s
2147483648 bytes (2.1 GB) copied, 218.269 s, 9.8 MB/s
2147483648 bytes (2.1 GB) copied, 219.433 s, 9.8 MB/s
2147483648 bytes (2.1 GB) copied, 220.033 s, 9.8 MB/s

with dm-ioband

2147483648 bytes (2.1 GB) copied, 48.4239 s, 44.3 MB/s
2147483648 bytes (2.1 GB) copied, 257.943 s, 8.3 MB/s
2147483648 bytes (2.1 GB) copied, 258.385 s, 8.3 MB/s
2147483648 bytes (2.1 GB) copied, 258.778 s, 8.3 MB/s
2147483648 bytes (2.1 GB) copied, 259.81 s, 8.3 MB/s

Second run
----------
without dm-ioband
2147483648 bytes (2.1 GB) copied, 35.4003 s, 60.7 MB/s
2147483648 bytes (2.1 GB) copied, 217.204 s, 9.9 MB/s
2147483648 bytes (2.1 GB) copied, 218.336 s, 9.8 MB/s
2147483648 bytes (2.1 GB) copied, 219.75 s, 9.8 MB/s
2147483648 bytes (2.1 GB) copied, 219.816 s, 9.8 MB/s

with dm-ioband
2147483648 bytes (2.1 GB) copied, 49.7719 s, 43.1 MB/s
2147483648 bytes (2.1 GB) copied, 254.118 s, 8.5 MB/s
2147483648 bytes (2.1 GB) copied, 255.7 s, 8.4 MB/s
2147483648 bytes (2.1 GB) copied, 256.512 s, 8.4 MB/s
2147483648 bytes (2.1 GB) copied, 256.581 s, 8.4 MB/s

third run
---------
without dm-ioband
2147483648 bytes (2.1 GB) copied, 35.426 s, 60.6 MB/s
2147483648 bytes (2.1 GB) copied, 218.4 s, 9.8 MB/s
2147483648 bytes (2.1 GB) copied, 221.074 s, 9.7 MB/s
2147483648 bytes (2.1 GB) copied, 222.421 s, 9.7 MB/s
2147483648 bytes (2.1 GB) copied, 222.489 s, 9.7 MB/s

with dm-ioband
2147483648 bytes (2.1 GB) copied, 51.5454 s, 41.7 MB/s
2147483648 bytes (2.1 GB) copied, 261.481 s, 8.2 MB/s
2147483648 bytes (2.1 GB) copied, 261.567 s, 8.2 MB/s
2147483648 bytes (2.1 GB) copied, 263.048 s, 8.2 MB/s
2147483648 bytes (2.1 GB) copied, 264.204 s, 8.1 MB/s

fourth run
----------
without dm-ioband
2147483648 bytes (2.1 GB) copied, 35.4676 s, 60.5 MB/s
2147483648 bytes (2.1 GB) copied, 217.752 s, 9.9 MB/s
2147483648 bytes (2.1 GB) copied, 219.693 s, 9.8 MB/s
2147483648 bytes (2.1 GB) copied, 221.921 s, 9.7 MB/s
2147483648 bytes (2.1 GB) copied, 222.18 s, 9.7 MB/s

with dm-ioband
2147483648 bytes (2.1 GB) copied, 46.1355 s, 46.5 MB/s
2147483648 bytes (2.1 GB) copied, 253.84 s, 8.5 MB/s
2147483648 bytes (2.1 GB) copied, 256.282 s, 8.4 MB/s
2147483648 bytes (2.1 GB) copied, 256.356 s, 8.4 MB/s
2147483648 bytes (2.1 GB) copied, 256.679 s, 8.4 MB/s


Do let me know if you think there is something wrong with my
configuration.

First of all, I still notice that there is a significant performance drop
here.

Secondly, notice that the finish time of the RT task varies quite a bit
with dm-ioband while it is very stable with plain CFQ.

with dm-ioband		48.4239  49.7719   51.5454   46.1355
without dm-ioband	35.3428  35.4003   35.426    35.4676  		

Thanks
Vivek

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [dm-devel] Re: dm-ioband: Test results.
  2009-04-14  9:30     ` Ryo Tsuruta
@ 2009-04-15 17:04       ` Vivek Goyal
  -1 siblings, 0 replies; 55+ messages in thread
From: Vivek Goyal @ 2009-04-15 17:04 UTC (permalink / raw)
  To: Ryo Tsuruta; +Cc: dm-devel, vivek.goyal2008, linux-kernel, agk

On Tue, Apr 14, 2009 at 06:30:22PM +0900, Ryo Tsuruta wrote:
> Hi Vivek,
> 
> > I quickly looked at the xls sheet. Most of the test cases seem to be
> > direct IO. Have you done testing with buffered writes/async writes and
> > been able to provide service differentiation between cgroups?
> > 
> > For example, two "dd" threads running in two cgroups doing writes.
> 
> Thanks for taking a look at the sheet. I did a buffered write test
> with "fio." Only two "dd" threads can't generate enough I/O load to
> make dm-ioband start bandwidth control. The following is a script that
> I actually used for the test.
> 
>   #!/bin/bash
>   sync
>   echo 1 > /proc/sys/vm/drop_caches
>   arg="--size=64m --rw=write --numjobs=50 --group_reporting"
>   echo $$ > /cgroup/1/tasks
>   fio $arg --name=ioband1 --directory=/mnt1 --output=ioband1.log &
>   echo $$ > /cgroup/2/tasks
>   fio $arg --name=ioband2 --directory=/mnt2 --output=ioband2.log &
>   echo $$ > /cgroup/tasks
>   wait
> 

Ryo,

Can you also send bio-cgroup patches which apply to 2.6.30-rc1 so that
I can do testing for async writes.

Why have you split the regular patch and bio-cgroup patch? Do you want
to address only reads and sync writes?

In the above test case, do these "fio" jobs finish at different times?
In my testing I see that the two dd processes generate a lot of traffic at
the IO scheduler level, but the traffic seems to be bursty. When the
higher-weight process has done some IO, it seems to disappear for 0.2 to 1
seconds, and in that time the other writer gets to do a lot of IO and wipes
out any service difference provided so far.

I am not sure where this high-priority writer is blocked; that needs to be
looked into. But I am sure that you will also face the same issue.
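
One way to see the burstiness (just an illustrative command; dm-0 and dm-1
being the two ioband devices, as in your iostat output below) is to watch
the per-device write throughput every second:

  iostat -d dm-0 dm-1 1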

Thanks
Vivek

> I created two dm-devices to easily monitor the throughput of each
> cgroup by iostat, and gave weights of 200 for cgroup1 and 100 for
> cgroup2, which means cgroup1 can use twice the bandwidth of cgroup2. The
> following is a part of the output of iostat. dm-0 and dm-1 correspond
> to ioband1 and ioband2. You can see the bandwidth is split according to
> the weights.
> 
>   avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>              0.99    0.00    6.44   92.57    0.00    0.00
>   
>   Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>   dm-0           3549.00         0.00     28392.00          0      28392
>   dm-1           1797.00         0.00     14376.00          0      14376
>   
>   avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>              1.01    0.00    4.02   94.97    0.00    0.00
>   
>   Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>   dm-0           3919.00         0.00     31352.00          0      31352
>   dm-1           1925.00         0.00     15400.00          0      15400
>   
>   avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>              0.00    0.00    5.97   94.03    0.00    0.00
>   
>   Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>   dm-0           3534.00         0.00     28272.00          0      28272
>   dm-1           1773.00         0.00     14184.00          0      14184
>   
>   avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>              0.50    0.00    6.00   93.50    0.00    0.00
>   
>   Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>   dm-0           4053.00         0.00     32424.00          0      32424
>   dm-1           2039.00         8.00     16304.00          8      16304
> 

> Thanks,
> Ryo Tsuruta

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [dm-devel] Re: dm-ioband: Test results.
  2009-04-15 13:38   ` Ryo Tsuruta
  2009-04-15 14:10       ` Vivek Goyal
  2009-04-15 16:17       ` Vivek Goyal
@ 2009-04-16  2:47     ` Ryo Tsuruta
  2009-04-16 14:11         ` Vivek Goyal
  2 siblings, 1 reply; 55+ messages in thread
From: Ryo Tsuruta @ 2009-04-16  2:47 UTC (permalink / raw)
  To: vgoyal
  Cc: fernando, linux-kernel, jmoyer, dm-devel, jens.axboe, nauman,
	agk, balbir

Hi Vivek, 

> General thoughts about dm-ioband
> ================================
> > - Implementing control at second level has the advantage that one does not
>   have to muck with IO scheduler code. But then it also has the
>   disadvantage that there is no communication with IO scheduler.
> 
> - dm-ioband is buffering bio at higher layer and then doing FIFO release
>   of these bios. This FIFO release can lead to priority inversion problems
>   in certain cases where RT requests are way behind BE requests or 
>   reader starvation where reader bios are getting hidden behind writer
>   bios etc. These are hard to notice issues in user space. I guess above
>   RT results do highlight the RT task problems. I am still working on
>   other test cases and see if i can show the probelm.
>
> - dm-ioband does this extra grouping logic using dm messages. Why
>   cgroup infrastructure is not sufficient to meet your needs like
>   grouping tasks based on uid etc? I think we should get rid of all
>   the extra grouping logic and just use cgroup for grouping information.

I want to use dm-ioband even without cgroup and to give dm-ioband the
flexibility to support various types of objects.
 
> - Why do we need to specify bio cgroup ids to the dm-ioband externally with
>   the help of dm messages? A user should be able to just create the
>   cgroups, put the tasks in right cgroup and then everything should
>   just work fine.

This is to handle cgroup easily on dm-ioband, and it keeps the code
simple.

> - Why do we have to put another dm-ioband device on top of every partition
>   or existing device mapper device to control it? Is it possible to do
> >   this control on the make_request function of the request queue so that
>   we don't end up creating additional dm devices? I had posted the crude
>   RFC patch as proof of concept but did not continue the development 
>   because of fundamental issue of FIFO release of buffered bios.
> 
> 	http://lkml.org/lkml/2008/11/6/227 
> 
>   Can you please have a look and provide feedback about why we can not
>   go in the direction of the above patches and why do we need to create
>   additional dm device.
> 
>   I think in current form, dm-ioband is hard to configure and we should
>   look for ways simplify configuration.

This can be solved by using a tool or a small script.
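
For example, a small wrapper along these lines (just a sketch; the helper
name is made up, and the table line reuses the dmsetup syntax from your
mail) can hide the table syntax from the user:

  #!/bin/sh
  # usage: ioband-create <device> <name> <weight>
  dev=$1; name=$2; weight=$3
  echo "0 $(blockdev --getsize $dev) ioband $dev 1 0 0 none weight 0 :$weight" \
      | dmsetup create "$name"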

> - I personally think that even group IO scheduling should be done at
>   IO scheduler level and we should not break down IO scheduling in two
>   parts where group scheduling is done by higher level IO scheduler 
>   sitting in dm layer and io scheduling among tasks with-in groups is
>   done by actual IO scheduler.
> 
>   But this also means more work as one has to muck around with core IO
>   scheduler's to make them cgroup aware and also make sure existing
>   functionality is not broken. I posted the patches here.
> 
> 	http://lkml.org/lkml/2009/3/11/486
> 
>   Can you please let us know that why does IO scheduler based approach
>   does not work for you? 

I think your approach is not bad, but I've made it my purpose to
control the disk bandwidth of virtual machines with device-mapper and
dm-ioband.
I think device-mapper is a well designed system for the following
reasons:
 - It can easily add new functions to a block device.
 - No need to muck around with the existing kernel code.
 - dm-devices are detachable. It has no effect on the system if a user
   doesn't use it.
So I think dm-ioband and your IO controller can coexist. What do you
think about it?
 
>   Jens, it would be nice to hear your opinion about two level vs one
>   level control. Do you think that the common layer approach is the way
>   to go where one can control things more tightly, or is FIFO release of
>   bios from a second level controller fine and can we live with this
>   additional serialization in the layer just above the IO scheduler?
>
> - There is no notion of RT cgroups. So even if one wants to run an RT
>   task in root cgroup to make sure to get full access of disk, it can't
>   do that. It has to share the BW with other competing groups. 
>
> - dm-ioband controls the amount of IO done per second. Will a seeky process
>   not run away with more disk time?

Could you elaborate on this? dm-ioband doesn't control it per second.

>   Additionally, at group level we will provide fairness in terms of amount
>   of IO (number of blocks transferred etc) and with-in group cfq will try
>   to provide fairness in terms of disk access time slices. I don't even
>   know whether it is a matter of concern or not. I was thinking that
>   probably one uniform policy on the hierarchical scheduling tree would
>   have probably been better. Just thinking loud.....
> 
> Thanks
> Vivek

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [dm-devel] Re: dm-ioband: Test results.
  2009-04-15 17:04       ` Vivek Goyal
@ 2009-04-16 12:56         ` Ryo Tsuruta
  -1 siblings, 0 replies; 55+ messages in thread
From: Ryo Tsuruta @ 2009-04-16 12:56 UTC (permalink / raw)
  To: vgoyal; +Cc: dm-devel, vivek.goyal2008, linux-kernel, agk

Hi Vivek,

> What does your ioband setup look like? Have you created at least one more
> competing ioband device? Because I think only in that case you get
> this ad-hoc logic of waiting for the group which has not finished its
> tokens yet and you end up buffering the bios in a FIFO.

I created two ioband devices and ran the dd commands only on the first
device.

> Do let me know if you think there is something wrong with my
> configuration.

From a quick look at your configuration, there seems to be no problem.

> Can you also send bio-cgroup patches which apply to 2.6.30-rc1 so that
> I can do testing for async writes.

I've just posted the patches to related mailing lists. Please try it.

> Why have you split the regular patch and bio-cgroup patch? Do you want
> to address only reads and sync writes?

For the first step, my goal is to merge dm-ioband into device-mapper,
and bio-cgroup is not necessary for all situations such as bandwidth
control on a per partition basis.

I'll also try to do more tests and report back to you.

Thank you for your help,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [dm-devel] Re: dm-ioband: Test results.
  2009-04-16 12:56         ` Ryo Tsuruta
@ 2009-04-16 13:32           ` Vivek Goyal
  -1 siblings, 0 replies; 55+ messages in thread
From: Vivek Goyal @ 2009-04-16 13:32 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: dm-devel, vivek.goyal2008, linux-kernel, agk, Jens Axboe,
	Nauman Rafique, Fernando Luis Vázquez Cao, Balbir Singh

On Thu, Apr 16, 2009 at 09:56:30PM +0900, Ryo Tsuruta wrote:
> Hi Vivek,
> 
> > What does your ioband setup look like? Have you created at least one more
> > competing ioband device? Because I think only in that case you get
> > this ad-hoc logic of waiting for the group which has not finished its
> > tokens yet and you end up buffering the bios in a FIFO.
> 
> I created two ioband devices and ran the dd commands only on the first
> device.

OK. So please do let me know how to debug it further. At the moment I
think it is a problem with dm-ioband, most likely coming either from the
buffering of bios in a single queue or from the delays introduced by the
waiting mechanism that lets the slowest process catch up.

I have looked at that code 2-3 times but never understood it fully. I will
give it another try. I think the code quality there needs to be improved.

> 
> > Do let me know if you think there is something wrong with my
> > configuration.
> 
> From a quick look at your configuration, there seems to be no problem.
> 
> > Can you also send bio-cgroup patches which apply to 2.6.30-rc1 so that
> > I can do testing for async writes.
> 
> I've just posted the patches to related mailing lists. Please try it.

Which mailing list have you posted to? I am assuming you sent it to the
dm-devel list. Please keep all the postings going to both lkml and the
dm-devel list for some time while we are discussing the fundamental issues,
which are also of concern from a generic IO controller point of view and
not limited to dm only.

> 
> > Why have you split the regular patch and bio-cgroup patch? Do you want
> > to address only reads and sync writes?
> 
> For the first step, my goal is to merge dm-ioband into device-mapper,
> and bio-cgroup is not necessary for all situations such as bandwidth
> control on a per partition basis.

IIUC, bio-cgroup is necessary to account for async writes; otherwise writes
will be accounted to the submitting task. Andrew Morton clearly mentioned
in one of the mails that writes have been our biggest problem and he
wants to see a clear solution for handling async writes. So please don't
split up the two patches; keep them together.

So if you are not accounting for async writes, what kind of usage have you
got in mind? Any practical workload will have both reads and writes
going. So if a customer creates even two groups, say A and B, and both
also have async writes going, what kind of guarantees will you offer
these guys?

IOW, with just sync bio handling as your first step, what kind of usage
scenario are you covering?

Secondly, per-partition control sounds a bit excessive. Why is per-disk
control not sufficient? That's where the real contention for resources
is. And even if you really want the equivalent of per-partition control,
one should be able to achieve it with two levels of cgroup hierarchy.

			root
		      /    \
		   sda1G   sda2g

So if there are two partitions on a disk, just create two groups and put
the processes doing IO to partition sda1 in group sda1G and the processes
doing IO to partition sda2 in sda2g, and assign weights to the groups
according to how you want the IO to be distributed between the partitions.
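
For example (a sketch only; the weight file name depends on the IO
controller implementation and is just a placeholder here):

  mkdir -p /cgroup/sda1G /cgroup/sda2g
  echo 40 > /cgroup/sda1G/weight          # placeholder weight interface
  echo 10 > /cgroup/sda2g/weight          # placeholder weight interface
  echo $PID_ON_SDA1 > /cgroup/sda1G/tasks # a task doing IO to sda1
                                          # ($PID_ON_SDA1 assumed to hold its pid)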

But in the end, I think per-partition control is excessive. If you
really want that kind of isolation, then carve out another device/logical
unit from the storage array, create a separate device, and do IO on
that.

Thanks
Vivek

> 
> I'll also try to do more tests and report back to you.
> 

> Thank you for your help,
> Ryo Tsuruta

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [dm-devel] Re: dm-ioband: Test results.
  2009-04-16  2:47     ` [dm-devel] " Ryo Tsuruta
@ 2009-04-16 14:11         ` Vivek Goyal
  0 siblings, 0 replies; 55+ messages in thread
From: Vivek Goyal @ 2009-04-16 14:11 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: fernando, linux-kernel, jmoyer, dm-devel, jens.axboe, nauman,
	agk, balbir

On Thu, Apr 16, 2009 at 11:47:50AM +0900, Ryo Tsuruta wrote:
> Hi Vivek, 
> 
> > General thoughts about dm-ioband
> > ================================
> > - Implementing control at second level has the advantage that one does not
> >   have to muck with IO scheduler code. But then it also has the
> >   disadvantage that there is no communication with IO scheduler.
> > 
> > - dm-ioband is buffering bio at higher layer and then doing FIFO release
> >   of these bios. This FIFO release can lead to priority inversion problems
> >   in certain cases where RT requests are way behind BE requests or 
> >   reader starvation where reader bios are getting hidden behind writer
> >   bios etc. These are hard to notice issues in user space. I guess above
> >   RT results do highlight the RT task problems. I am still working on
> >   other test cases and see if i can show the probelm.
> >
> > - dm-ioband does this extra grouping logic using dm messages. Why
> >   cgroup infrastructure is not sufficient to meet your needs like
> >   grouping tasks based on uid etc? I think we should get rid of all
> >   the extra grouping logic and just use cgroup for grouping information.
> 
> I want to use dm-ioband even without cgroup and to give dm-ioband the
> flexibility to support various types of objects.

That's the core question. We all know that you want to use it that way.
But the point is that this does not sound like the right way. The cgroup
infrastructure was created precisely to allow arbitrary grouping of tasks
in a hierarchical manner. The kind of grouping you are doing, like
uid-based grouping, you can easily do with cgroups as well. In fact I have
written a pam plugin and contributed it to the libcg project (a user space
library) to put a uid's tasks automatically into a specified cgroup upon
login, to help the admin.
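
For instance, with the cgroup filesystem mounted at /cgroup (as in your fio
script), uid-based grouping is just the following (the group name is only
an example):

  # create a group for uid 1000 and drop the user's login shell into it
  mkdir -p /cgroup/uid-1000
  echo $$ > /cgroup/uid-1000/tasks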

By not using cgroups and instead creating additional grouping mechanisms in
the dm layer, I don't think we are helping anybody. We are just increasing
the complexity without any proper justification. The only reason I have
heard so far is "I want it that way" or "This is my goal". This kind of
reasoning does not help.

>  
> > - Why do we need to specify bio cgroup ids to the dm-ioband externally with
> >   the help of dm messages? A user should be able to just create the
> >   cgroups, put the tasks in right cgroup and then everything should
> >   just work fine.
> 
> This is to handle cgroup easily on dm-ioband, and it keeps the code
> simple.

But it becomes a configuration nightmare. cgroup is the way to group
tasks from a resource management perspective. Please use that and don't
create additional ways of grouping which increase configuration
complexity. If you think there are deficiencies in the cgroup
infrastructure and it can't handle your case, then please enhance the
cgroup infrastructure to meet that case.

> 
> > - Why do we have to put another dm-ioband device on top of every partition
> >   or existing device mapper device to control it? Is it possible to do
> >   this control on the make_request function of the request queue so that
> >   we don't end up creating additional dm devices? I had posted the crude
> >   RFC patch as proof of concept but did not continue the development 
> >   because of fundamental issue of FIFO release of buffered bios.
> > 
> > 	http://lkml.org/lkml/2008/11/6/227 
> > 
> >   Can you please have a look and provide feedback about why we can not
> >   go in the direction of the above patches and why do we need to create
> >   additional dm device.
> > 
> >   I think in current form, dm-ioband is hard to configure and we should
> >   look for ways simplify configuration.
> 
> This can be solved by using a tool or a small script.
> 

libcg is trying to provide a generic helper library so that all the
user space management programs can use it to control resource controllers
that use cgroups. Now, by not using cgroup, an admin will have to
come up with an entirely different set of scripts for the IO controller?
That does not make much sense.

Please also answer the rest of the questions above. Why do we need to put
an additional device mapper device on every device we want to control, and
why can't we do it by providing a hook into the make_request function of
the queue instead of putting an additional device mapper device on top?

Why do you think that would not turn out to be a simpler approach?

> > - I personally think that even group IO scheduling should be done at
> >   IO scheduler level and we should not break down IO scheduling in two
> >   parts where group scheduling is done by higher level IO scheduler 
> >   sitting in dm layer and io scheduling among tasks with-in groups is
> >   done by actual IO scheduler.
> > 
> >   But this also means more work as one has to muck around with core IO
> >   scheduler's to make them cgroup aware and also make sure existing
> >   functionality is not broken. I posted the patches here.
> > 
> > 	http://lkml.org/lkml/2009/3/11/486
> > 
> >   Can you please let us know that why does IO scheduler based approach
> >   does not work for you? 
> 
> I think your approach is not bad, but I've made it my purpose to
> control the disk bandwidth of virtual machines with device-mapper and
> dm-ioband.

What do you mean by "I have made it my purpose"? Its not about that
I have decided to do something in a specific way and I will do it
only that way. 

I think open source development is more about that this is the problem
statement and we discuss openly and experiment with various approaches
and then a approach which works for most of the people is accepted.

If you say that providing "IO control infrastructure in linux kernel"
is my goal, I can very well relate to it. But if you say providng "IO
control infrastructure only through dm-ioband, only through device-mapper
infrastructure" is my goal, then it is hard to digest.

I also have the same concern, which is controlling the IO resources for
virtual machines. And the IO scheduler modification based approach, as well
as the approach of hooking into the make_request function, will achieve
the same goal.

Here we are having a technical discussion about interfaces and what's the
best way to do that. Not looking at other approaches, not having an open
discussion about the merits and demerits of all the approaches, and not
being willing to change direction does not help.

> I think device-mapper is a well designed system for the following
> reasons:
>  - It can easily add new functions to a block device.
>  - No need to muck around with the existing kernel code.

Not touching the core code makes life simple and is an advantage. But
remember that it comes at the cost of FIFO dispatch and possible unwanted
scenarios with the underlying IO scheduler like CFQ. I already demonstrated
that with one RT example.

But then hooking into the make_request function will give us the same
advantage with simpler configuration, and there is no need to put an extra
dm device on every device.

>  - dm-devices are detachable. It has no effect on the system if a user
>    doesn't use it.

Even with the make_request approach, one could enable/disable the IO
controller by writing 0/1 to a file.
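
As a concrete comparison (a sketch; the sysfs path for the toggle is purely
hypothetical since no such interface exists yet, while the dmsetup and
mount commands are the real ones used in this thread):

  # dm-ioband: detaching means tearing down the extra dm device
  umount /mnt/sdd1
  dmsetup remove ioband1
  mount /dev/sdd1 /mnt/sdd1

  # make_request hook: could be a single toggle (hypothetical path)
  echo 0 > /sys/block/sdd/queue/io_controller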

So why are you not open to experimenting with the approach of hooking into
the make_request function and trying to make it work? It would meet your
requirements and at the same time achieve the goals of not touching the
core IO scheduler, elevator and block layer code, etc. It will also be
simple to enable/disable IO control. We would not have to put an additional
dm device on every device, and we would not have to come up with additional
grouping mechanisms and could use the cgroup interfaces etc.

> So I think dm-ioband and your IO controller can coexist. What do you
> think about it?

Yes, they can. I am not against that. But I don't think that dm-ioband
is currently in the right shape, for the various reasons I have been citing
in the mails.

>  
> >   Jens, it would be nice to hear your opinion about two level vs one
> >   level control. Do you think that the common layer approach is the way
> >   to go where one can control things more tightly, or is FIFO release of
> >   bios from a second level controller fine and can we live with this
> >   additional serialization in the layer just above the IO scheduler?
> >
> > - There is no notion of RT cgroups. So even if one wants to run an RT
> >   task in root cgroup to make sure to get full access of disk, it can't
> >   do that. It has to share the BW with other competing groups. 
> >
> > - dm-ioband controls the amount of IO done per second. Will a seeky process
> >   not run away with more disk time?
> 
> Could you elaborate on this? dm-ioband doesn't control it per second.
> 

There are two ways to view fairness.

- Fairness in terms of amount of sectors/data transferred.
- Fairness in terms of disk access time one gets.

In the first case, if there is a seeky process doing IO, it will run away
with a lot more disk time than a process doing sequential IO. Some people
consider that unfair, and I think that's the reason CFQ provides fairness
in terms of disk time slices and not in terms of the number of sectors
transferred.

Now with any two-level scheme, the only easy way to provide fairness at the
higher layer is in terms of sectors transferred, while the underlying
CFQ will be working on providing fairness in terms of disk time slices.

Thanks
Vivek

> >   Additionally, at group level we will provide fairness in terms of amount
> >   of IO (number of blocks transferred etc) and with-in group cfq will try
> >   to provide fairness in terms of disk access time slices. I don't even
> >   know whether it is a matter of concern or not. I was thinking that
> >   probably one uniform policy on the hierarchical scheduling tree would
> >   have probably been better. Just thinking loud.....
> > 
> > Thanks
> > Vivek
> 
> Thanks,
> Ryo Tsuruta

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Re: dm-ioband: Test results.
@ 2009-04-16 14:11         ` Vivek Goyal
  0 siblings, 0 replies; 55+ messages in thread
From: Vivek Goyal @ 2009-04-16 14:11 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: fernando, linux-kernel, jmoyer, dm-devel, jens.axboe, nauman,
	agk, balbir

On Thu, Apr 16, 2009 at 11:47:50AM +0900, Ryo Tsuruta wrote:
> Hi Vivek, 
> 
> > General thoughts about dm-ioband
> > ================================
> > - Implementing control at second level has the advantage that one does not
> >   have to muck with IO scheduler code. But then it also has the
> >   disadvantage that there is no communication with IO scheduler.
> > 
> > - dm-ioband is buffering bio at higher layer and then doing FIFO release
> >   of these bios. This FIFO release can lead to priority inversion problems
> >   in certain cases where RT requests are way behind BE requests or 
> >   reader starvation where reader bios are getting hidden behind writer
> >   bios etc. These are hard to notice issues in user space. I guess above
> >   RT results do highlight the RT task problems. I am still working on
> >   other test cases and see if i can show the probelm.
> >
> > - dm-ioband does this extra grouping logic using dm messages. Why
> >   cgroup infrastructure is not sufficient to meet your needs like
> >   grouping tasks based on uid etc? I think we should get rid of all
> >   the extra grouping logic and just use cgroup for grouping information.
> 
> I want to use dm-ioband even without cgroup and to give dm-ioband the
> flexibility to support various types of objects.

That's the core question. We all know that you want to use it that way.
But the point is that this does not sound like the right way. The cgroup
infrastructure was created precisely to allow arbitrary grouping of tasks
in a hierarchical manner. The kind of grouping you are doing, like
uid-based grouping, you can easily do with cgroups as well. In fact I have
written a pam plugin and contributed it to the libcg project (a user space
library) to put a uid's tasks automatically into a specified cgroup upon
login, to help the admin.

By not using cgroups and instead creating additional grouping mechanisms in
the dm layer, I don't think we are helping anybody. We are just increasing
the complexity without any proper justification. The only reason I have
heard so far is "I want it that way" or "This is my goal". This kind of
reasoning does not help.

>  
> > - Why do we need to specify bio cgroup ids to the dm-ioband externally with
> >   the help of dm messages? A user should be able to just create the
> >   cgroups, put the tasks in right cgroup and then everything should
> >   just work fine.
> 
> This is to handle cgroup easily on dm-ioband, and it keeps the code
> simple.

But it becomes a configuration nightmare. cgroup is the way to group
tasks from a resource management perspective. Please use that and don't
create additional ways of grouping which increase configuration
complexity. If you think there are deficiencies in the cgroup
infrastructure and it can't handle your case, then please enhance the
cgroup infrastructure to meet that case.

> 
> > - Why do we have to put another dm-ioband device on top of every partition
> >   or existing device mapper device to control it? Is it possible to do
> >   this control on the make_request function of the request queue so that
> >   we don't end up creating additional dm devices? I had posted the crude
> >   RFC patch as proof of concept but did not continue the development 
> >   because of fundamental issue of FIFO release of buffered bios.
> > 
> > 	http://lkml.org/lkml/2008/11/6/227 
> > 
> >   Can you please have a look and provide feedback about why we can not
> >   go in the direction of the above patches and why do we need to create
> >   additional dm device.
> > 
> >   I think in current form, dm-ioband is hard to configure and we should
> >   look for ways simplify configuration.
> 
> This can be solved by using a tool or a small script.
> 

libcg is trying to provide a generic helper library so that all the
user space management programs can use it to control resource controllers
that use cgroups. Now, by not using cgroup, an admin will have to
come up with an entirely different set of scripts for the IO controller?
That does not make much sense.

Please also answer the rest of the questions above. Why do we need to put
an additional device mapper device on every device we want to control, and
why can't we do it by providing a hook into the make_request function of
the queue instead of putting an additional device mapper device on top?

Why do you think that would not turn out to be a simpler approach?

> > - I personally think that even group IO scheduling should be done at
> >   IO scheduler level and we should not break down IO scheduling in two
> >   parts where group scheduling is done by higher level IO scheduler 
> >   sitting in dm layer and io scheduling among tasks with-in groups is
> >   done by actual IO scheduler.
> > 
> >   But this also means more work as one has to muck around with core IO
> >   scheduler's to make them cgroup aware and also make sure existing
> >   functionality is not broken. I posted the patches here.
> > 
> > 	http://lkml.org/lkml/2009/3/11/486
> > 
> >   Can you please let us know that why does IO scheduler based approach
> >   does not work for you? 
> 
> I think your approach is not bad, but I've made it my purpose to
> control disk bandwidth of virtual machines by device-mapper and
> dm-ioband. 

What do you mean by "I have made it my purpose"? It should not be a
matter of "I have decided to do something in a specific way and I will do
it only that way."

I think open source development is more about stating the problem,
discussing it openly, experimenting with various approaches, and then
accepting the approach that works for most of the people.

If you say that your goal is providing "IO control infrastructure in the
Linux kernel", I can very well relate to it. But if you say that your
goal is providing "IO control infrastructure only through dm-ioband, only
through the device-mapper infrastructure", then it is hard to digest.

I share the same concern, namely controlling the IO resources of virtual
machines. Both the IO-scheduler-modification approach and the
make_request-hook approach will achieve that goal.

Here we are having a technical discussion about interfaces and the best
way to do that. Not looking at the other approaches, not having an open
discussion about the merits and demerits of all of them, and not being
willing to change direction does not help.

> I think device-mapper is a well designed system for the following
> reasons:
>  - It can easily add new functions to a block device.
>  - No need to muck around with the existing kernel code.

Not touching the core code makes life simple and is an advantage. But
remember that it comes at the cost of FIFO dispatch and possibly unwanted
scenarios with the underlying IO scheduler, such as CFQ. I already
demonstrated that with one RT example.

But hooking into the make_request function would give us the same
advantage with simpler configuration, and there would be no need to put
an extra dm device on every device.

>  - dm-devices are detachable. It doesn't make any effects on the
>    system if a user doesn't use it.

Even with the make_request approach, one could enable/disable the IO
controller by writing 0/1 to a file.

So why are you not open to experimenting with the make_request-hook
approach and trying to make it work? It would meet your requirements
while at the same time achieving the goal of not touching the core IO
scheduler, elevator and block layer code. It would also be simple to
enable/disable IO control. We would not have to put an additional dm
device on every device, and we would not have to come up with additional
grouping mechanisms, since we could use the cgroup interfaces.

> So I think dm-ioband and your IO controller can coexist. What do you
> think about it?

Yes, they can. I am not against that. But I don't think that dm-ioband is
currently in the right shape, for the various reasons I have been citing
in these mails.

>  
> >   Jens, it would be nice to hear your opinion about two level vs one
> >   level conrol. Do you think that common layer approach is the way
> >   to go where one can control things more tightly or FIFO release of bios
> >   from second level controller is fine and we can live with this additional       serialization in the layer above just above IO scheduler?
> >
> > - There is no notion of RT cgroups. So even if one wants to run an RT
> >   task in root cgroup to make sure to get full access of disk, it can't
> >   do that. It has to share the BW with other competing groups. 
> >
> > - dm-ioband controls amount of IO done per second. Will a seeky process
> >   not run away more disk time? 
> 
> Could you elaborate on this? dm-ioband doesn't control it per second.
> 

There are two ways to view fairness.

- Fairness in terms of the amount of sectors/data transferred.
- Fairness in terms of the disk access time one gets.

In the first case, if there is a seeky process doing IO, it will run away
with a lot more disk time than a process doing sequential IO. Some people
consider that unfair, and I think that's the reason CFQ provides fairness
in terms of disk time slices and not in terms of the number of sectors
transferred.

Now, with any two-level scheme, the only easy way for the higher layer to
provide fairness is in terms of sectors transferred, while the underlying
CFQ will be working on providing fairness in terms of disk time slices.
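
To put rough numbers on it (made-up but representative figures): a seeky
reader doing small random IO might manage ~0.5 MB/s while a sequential
reader streams at ~50 MB/s. If the higher layer equalizes the amount of
data transferred, the seeky reader ends up owning almost all of the disk
time:

# time per MB: 1/0.5 = 2.0s (seeky) vs 1/50 = 0.02s (sequential);
# with equal data transferred, the seeky reader's share of disk time is:
echo "scale=3; (1/0.5) / (1/0.5 + 1/50) * 100" | bc    # ~99 (percent)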

Thanks
Vivek

> >   Additionally, at group level we will provide fairness in terms of amount
> >   of IO (number of blocks transferred etc) and with-in group cfq will try
> >   to provide fairness in terms of disk access time slices. I don't even
> >   know whether it is a matter of concern or not. I was thinking that
> >   probably one uniform policy on the hierarchical scheduling tree would
> >   have probably been better. Just thinking loud.....
> > 
> > Thanks
> > Vivek
> 
> Thanks,
> Ryo Tsuruta

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [dm-devel] Re: dm-ioband: Test results.
  2009-04-16 14:11         ` Vivek Goyal
  (?)
@ 2009-04-16 20:24         ` Nauman Rafique
  2009-04-20  8:29             ` Ryo Tsuruta
  -1 siblings, 1 reply; 55+ messages in thread
From: Nauman Rafique @ 2009-04-16 20:24 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Ryo Tsuruta, fernando, linux-kernel, jmoyer, dm-devel,
	jens.axboe, agk, balbir

On Thu, Apr 16, 2009 at 7:11 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Thu, Apr 16, 2009 at 11:47:50AM +0900, Ryo Tsuruta wrote:
>> Hi Vivek,
>>
>> > General thoughts about dm-ioband
>> > ================================
>> > - Implementing control at second level has the advantage tha one does not
>> >   have to muck with IO scheduler code. But then it also has the
>> >   disadvantage that there is no communication with IO scheduler.
>> >
>> > - dm-ioband is buffering bio at higher layer and then doing FIFO release
>> >   of these bios. This FIFO release can lead to priority inversion problems
>> >   in certain cases where RT requests are way behind BE requests or
>> >   reader starvation where reader bios are getting hidden behind writer
>> >   bios etc. These are hard to notice issues in user space. I guess above
>> >   RT results do highlight the RT task problems. I am still working on
>> >   other test cases and see if i can show the probelm.

Ryo, I could not agree more with Vivek here. At Google, we have very
stringent latency requirements for our RT requests. If RT requests get
queued in any higher layer (behind BE requests), all bets are off. For
this particular reason, I am not in favor of doing IO control at two
layers. The upper layer (dm-ioband in this case) would have to make sure
that RT requests are released immediately, irrespective of its state
(FIFO queuing and tokens held), and the lower layer (the IO scheduling
layer) has to do the same. This requirement is not specific to us; I have
seen similar comments from filesystem folks here previously, in the
context of metadata updates being submitted as RT. Basically, the
semantics of the RT class have to be preserved by any solution that is
built on top of the CFQ scheduler.
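
For concreteness, the RT requests in question are simply IOs issued under
the RT ionice class, e.g. something like the following (the file path is
only a placeholder):

ionice -c1 -n0 dd if=/path/to/file of=/dev/null

The expectation is that such IO reaches the device ahead of any BE
traffic, no matter which layer it passes through.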

>> >
>> > - dm-ioband does this extra grouping logic using dm messages. Why
>> >   cgroup infrastructure is not sufficient to meet your needs like
>> >   grouping tasks based on uid etc? I think we should get rid of all
>> >   the extra grouping logic and just use cgroup for grouping information.
>>
>> I want to use dm-ioband even without cgroup and to make dm-ioband has
>> flexibility to support various type of objects.
>
> That's the core question. We all know that you want to use it that way.
> But the point is that does not sound the right way. cgroup infrastructure
> has been created for the precise reason to allow arbitrary grouping of
> tasks in hierarchical manner. The kind of grouping you are doing like
> uid based, you can easily do with cgroups also. In fact I have written
> a pam plugin and contributed to libcg project (user space library) to
> put a uid's task automatically in a specified cgroup upon login to help
> the admin.
>
> By not using cgroups and creating additional grouping mechanisms in the
> dm layer I don't think we are helping anybody. We are just increasing
> the complexity for no reason without any proper justification. The only
> reason I have heard so far is "I want it that way" or "This is my goal".
> This kind of reasoning does not help.
>
>>
>> > - Why do we need to specify bio cgroup ids to the dm-ioband externally with
>> >   the help of dm messages? A user should be able to just create the
>> >   cgroups, put the tasks in right cgroup and then everything should
>> >   just work fine.
>>
>> This is because to handle cgroup on dm-ioband easily and it keeps the
>> code simple.
>
> But it becomes the configuration nightmare. cgroup is the way for grouping
> tasks from resource management perspective. Please use that and don't
> create additional ways of grouping which increase configuration
> complexity. If you think there are deficiencies in cgroup infrastructure
> and it can't handle your case, then please enhance cgroup infrstructure to
> meet that case.
>
>>
>> > - Why do we have to put another dm-ioband device on top of every partition
>> >   or existing device mapper device to control it? Is it possible to do
>> >   this control on make_request function of the reuqest queue so that
>> >   we don't end up creating additional dm devices? I had posted the crude
>> >   RFC patch as proof of concept but did not continue the development
>> >   because of fundamental issue of FIFO release of buffered bios.
>> >
>> >     http://lkml.org/lkml/2008/11/6/227
>> >
>> >   Can you please have a look and provide feedback about why we can not
>> >   go in the direction of the above patches and why do we need to create
>> >   additional dm device.
>> >
>> >   I think in current form, dm-ioband is hard to configure and we should
>> >   look for ways simplify configuration.
>>
>> This can be solved by using a tool or a small script.
>>
>
> libcg is trying to provide generic helper library so that all the
> user space management programs can use it to control resource controllers
> which are using cgroup. Now by not using cgroup, an admin shall have to
> come up with entirely different set of scripts for IO controller? That
> does not make too much of sense.
>
> Please also answer rest of the question above. Why do we need to put
> additional device mapper device on every device we want to control and
> why can't we do it by providing a hook into make_request function of
> the queue and not putting additional device mapper device.
>
> Why do you think that it will not turn out to be a simpler approach?
>
>> > - I personally think that even group IO scheduling should be done at
>> >   IO scheduler level and we should not break down IO scheduling in two
>> >   parts where group scheduling is done by higher level IO scheduler
>> >   sitting in dm layer and io scheduling among tasks with-in groups is
>> >   done by actual IO scheduler.
>> >
>> >   But this also means more work as one has to muck around with core IO
>> >   scheduler's to make them cgroup aware and also make sure existing
>> >   functionality is not broken. I posted the patches here.
>> >
>> >     http://lkml.org/lkml/2009/3/11/486
>> >
>> >   Can you please let us know that why does IO scheduler based approach
>> >   does not work for you?
>>
>> I think your approach is not bad, but I've made it my purpose to
>> control disk bandwidth of virtual machines by device-mapper and
>> dm-ioband.
>
> What do you mean by "I have made it my purpose"? Its not about that
> I have decided to do something in a specific way and I will do it
> only that way.
>
> I think open source development is more about that this is the problem
> statement and we discuss openly and experiment with various approaches
> and then a approach which works for most of the people is accepted.
>
> If you say that providing "IO control infrastructure in linux kernel"
> is my goal, I can very well relate to it. But if you say providng "IO
> control infrastructure only through dm-ioband, only through device-mapper
> infrastructure" is my goal, then it is hard to digest.
>
> I also have same concern and that is control the IO resources for
> virtual machines. And IO schduler modification based approach as as well as
> hooking into make_request function approach will achive the same
> goal.
>
> Here we are having a technical discussion about interfaces and what's the
> best way do that. And not looking at other approches and not having an
> open discussion about merits and demerits of all the approaches and not
> willing to change the direction does not help.
>
>> I think device-mapper is a well designed system for the following
>> reasons:
>>  - It can easily add new functions to a block device.
>>  - No need to muck around with the existing kernel code.
>
> Not touching the core code makes life simple and is an advantage.  But
> remember that it comes at a cost of FIFO dispatch and possible unwanted
> scnerios with underlying ioscheduoer like CFQ. I already demonstrated that
> with one RT example.
>
> But then hooking into make_request_function will give us same advantage
> with simpler configuration and there is no need of putting extra dm
> device on every device.
>
>>  - dm-devices are detachable. It doesn't make any effects on the
>>    system if a user doesn't use it.
>
> Even wth make_request approach, one could enable/disable io controller
> by writing 0/1 to a file.
>
> So why are you not open to experimenting with hooking into make_request
> function approach and try to make it work? It would meet your requirements
> at the same time achive the goals of not touching the core IO scheduler,
> elevator and block layer code etc.? It will also be simple to
> enable/disable IO control. We shall not have to put additional dm device
> on every device. We shall not have to come up with additional grouping
> mechanisms and can use cgroup interfaces etc.
>
>> So I think dm-ioband and your IO controller can coexist. What do you
>> think about it?
>
> Yes they can. I am not against that. But I don't think that dm-ioband
> currently is in the right shape for various reasons have been citing
> in the mails.
>
>>
>> >   Jens, it would be nice to hear your opinion about two level vs one
>> >   level conrol. Do you think that common layer approach is the way
>> >   to go where one can control things more tightly or FIFO release of bios
>> >   from second level controller is fine and we can live with this additional       serialization in the layer above just above IO scheduler?
>> >
>> > - There is no notion of RT cgroups. So even if one wants to run an RT
>> >   task in root cgroup to make sure to get full access of disk, it can't
>> >   do that. It has to share the BW with other competing groups.
>> >
>> > - dm-ioband controls amount of IO done per second. Will a seeky process
>> >   not run away more disk time?
>>
>> Could you elaborate on this? dm-ioband doesn't control it per second.
>>
>
> There are two ways to view fairness.
>
> - Fairness in terms of amount of sectors/data transferred.
> - Fairness in terms of disk access time one gets.
>
> In first case, if there is a seeky process doing IO, it will run away
> with lot more disk time than a process doing sequential IO. Some people
> consider it unfair and I think that's the reason CFQ provides fairness
> in terms of disk time slices and not in terms of number of sectors
> transferred.
>
> Now with any two level of scheme, at higher layer only easy way to
> provide fairness is in terms of secotrs transferred and underlying
> CFQ will be working on providing fairness in terms of disk slices.
>
> Thanks
> Vivek
>
>> >   Additionally, at group level we will provide fairness in terms of amount
>> >   of IO (number of blocks transferred etc) and with-in group cfq will try
>> >   to provide fairness in terms of disk access time slices. I don't even
>> >   know whether it is a matter of concern or not. I was thinking that
>> >   probably one uniform policy on the hierarchical scheduling tree would
>> >   have probably been better. Just thinking loud.....
>> >
>> > Thanks
>> > Vivek
>>
>> Thanks,
>> Ryo Tsuruta
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: dm-ioband: Test results.
  2009-04-13  4:05 dm-ioband: Test results Ryo Tsuruta
@ 2009-04-16 20:57   ` Vivek Goyal
  2009-04-15  4:37   ` Vivek Goyal
  2009-04-16 20:57   ` Vivek Goyal
  2 siblings, 0 replies; 55+ messages in thread
From: Vivek Goyal @ 2009-04-16 20:57 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: agk, dm-devel, linux-kernel, Nauman Rafique,
	Fernando Luis Vázquez Cao, Andrea Righi, Jens Axboe,
	Balbir Singh, Moyer Jeff Moyer, Morton Andrew Morton

On Mon, Apr 13, 2009 at 01:05:52PM +0900, Ryo Tsuruta wrote:
> Hi Alasdair and all,
> 
> I did more tests on dm-ioband and I've posted the test items and
> results on my website. The results are very good.
> http://people.valinux.co.jp/~ryov/dm-ioband/test/test-items.xls
> 
> I hope someone will test dm-ioband and report back to the dm-devel
> mailing list.
> 

Ok, here are more test results. This time I am trying to see how fairness
is provided for async writes and how it impacts throughput.

I have created two partitions, /dev/sda1 and /dev/sda2, and two ioband
devices, ioband1 and ioband2, on /dev/sda1 and /dev/sda2 respectively,
both with weight 40.

#dmsetup status
ioband2: 0 38025855 ioband 1 -1 150 8 186 1 0 8
ioband1: 0 40098177 ioband 1 -1 150 8 186 1 0 8

I ran the following two fio jobs, one in each partition.

************************************************************
echo cfq > /sys/block/sdd/queue/scheduler
sync
echo 3 > /proc/sys/vm/drop_caches

fio_args="--size=64m --rw=write --numjobs=50 --group_reporting"
time fio $fio_args --name=test1 --directory=/mnt/sdd1/fio/ --output=test1.log &
time fio $fio_args --name=test2 --directory=/mnt/sdd2/fio/ --output=test2.log &
wait
*****************************************************************

Following are fio job finish times with and without dm-ioband
		
			first job		second job
without dm-ioband	3m29.947s		4m1.436s 	
with dm-ioband		8m42.532s		8m43.328s

This is roughly a 100% performance regression in this particular setup.

I think this regression is introduced because we wait too long for the
slower group to catch up, to make the proportionate numbers look right,
and so we choke the writes even when the device is free.

It is a hard problem to solve, because async write traffic is bursty when
seen at the block layer, and we do not necessarily see a higher amount of
write traffic dispatched from the higher-priority process/group. So what
does one do? Wait for the other groups to catch up so that the
proportionate numbers look right, and hence let the disk go idle and kill
performance? Or just continue and not idle too much (a small amount of
idling, like 8ms for a sync queue, might still be ok)?
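
For comparison, CFQ's own idling window is already tunable per device,
and 8ms is the usual default (device name from my setup):

cat /sys/block/sdd/queue/iosched/slice_idle       # 8 (ms) by default
echo 0 > /sys/block/sdd/queue/iosched/slice_idle  # disable idling to compare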

I think there might not be much benefit in maintaining an artificial
notion of a proportionate ratio at the cost of killing performance. We
should instead try to audit the async write path and see where the
higher-weight application/group gets stuck.

In my simple two-dd test, I could see bursty traffic from the high-prio
app, which would then sometimes disappear for 0.2 to 0.8 seconds. If I
wait for the higher-priority group to catch up during that window, I will
end up keeping the disk idle for up to 0.8 seconds and kill performance.
I guess the better way is to not wait that long (even if that gives the
application the impression that the io scheduler is not assigning it a
proportionate share of the disk) and, over a period of time, see if we
can fix some things in the async write path so that the io scheduler sees
smoother traffic.
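
The burstiness is easy to see with something like the following while the
two dd's are running (device names are from my setup):

iostat -d 1 sdd1
# or, for a per-bio view:
# blktrace -d /dev/sdd1 -o - | blkparse -i -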

Thoughts?

Thanks
Vivek


> Alasdair, could you please merge dm-ioband into upstream? Or could
> you please tell me why dm-ioband can't be merged?
> 
> Thanks,
> Ryo Tsuruta
> 
> To know the details of dm-ioband:
> http://people.valinux.co.jp/~ryov/dm-ioband/
> 
> RPM packages for RHEL5 and CentOS5 are available:
> http://people.valinux.co.jp/~ryov/dm-ioband/binary.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: dm-ioband: Test results.
  2009-04-16 20:57   ` Vivek Goyal
@ 2009-04-17  2:11     ` Vivek Goyal
  -1 siblings, 0 replies; 55+ messages in thread
From: Vivek Goyal @ 2009-04-17  2:11 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: agk, dm-devel, linux-kernel, Nauman Rafique,
	Fernando Luis Vázquez Cao, Andrea Righi, Jens Axboe,
	Balbir Singh, Moyer Jeff Moyer, Morton Andrew Morton

On Thu, Apr 16, 2009 at 04:57:20PM -0400, Vivek Goyal wrote:
> On Mon, Apr 13, 2009 at 01:05:52PM +0900, Ryo Tsuruta wrote:
> > Hi Alasdair and all,
> > 
> > I did more tests on dm-ioband and I've posted the test items and
> > results on my website. The results are very good.
> > http://people.valinux.co.jp/~ryov/dm-ioband/test/test-items.xls
> > 
> > I hope someone will test dm-ioband and report back to the dm-devel
> > mailing list.
> > 
> 

Ok, one more test, this time to show that with a single queue and FIFO
dispatch a writer can easily starve a reader.

I have created two partitions, /dev/sda1 and /dev/sda2, and two ioband
devices, ioband1 and ioband2, on /dev/sda1 and /dev/sda2 respectively,
with weights 40 and 20.

I am launching an aggressive writer dd with prio 7 (Best effort) and a
reader with prio 0 (Best effort).

Following is my script.

****************************************************************
rm /mnt/sdd1/aggressivewriter

sync
echo 3 > /proc/sys/vm/drop_caches

# launch a hostile writer
ionice -c2 -n7 dd if=/dev/zero of=/mnt/sdd1/aggressivewriter bs=4K count=524288 conv=fdatasync &

# Reader
ionice -c 2 -n 0 dd if=/mnt/sdd1/testzerofile1 of=/dev/null &
wait $!
echo "reader finished"
**********************************************************************

Following are the results without and with dm-ioband

Without dm-ioband
-----------------
First run
2147483648 bytes (2.1 GB) copied, 46.4747 s, 46.2 MB/s (Reader)
reader finished
2147483648 bytes (2.1 GB) copied, 87.9293 s, 24.4 MB/s (Writer)

Second run
2147483648 bytes (2.1 GB) copied, 47.6461 s, 45.1 MB/s (Reader)
reader finished
2147483648 bytes (2.1 GB) copied, 89.0781 s, 24.1 MB/s (Writer)

Third run
2147483648 bytes (2.1 GB) copied, 51.0624 s, 42.1 MB/s (Reader)
reader finished
2147483648 bytes (2.1 GB) copied, 91.9507 s, 23.4 MB/s (Writer)

With dm-ioband
--------------
2147483648 bytes (2.1 GB) copied, 54.895 s, 39.1 MB/s (Writer)
2147483648 bytes (2.1 GB) copied, 88.6323 s, 24.2 MB/s (Reader)
reader finished

2147483648 bytes (2.1 GB) copied, 62.6102 s, 34.3 MB/s (Writer)
2147483648 bytes (2.1 GB) copied, 91.6662 s, 23.4 MB/s (Reader)
reader finished

2147483648 bytes (2.1 GB) copied, 58.9928 s, 36.4 MB/s (Writer)
2147483648 bytes (2.1 GB) copied, 90.6707 s, 23.7 MB/s (Reader)
reader finished

I have marked which dd finished first. I determine it with the help of
the wait command, and I also monitor "iostat -d 5 sdd1" to see how the IO
rates vary.

Notice that with dm-ioband it is a complete reversal of fortunes: the
reader is completely starved by the aggressive writer. I think you should
be able to reproduce this one easily with the script.

I don't see how a single queue with FIFO dispatch can avoid breaking the
notion of CFQ classes and priorities.

Thanks
Vivek

> Ok, here are more test results. This time I am trying to see how fairness
> is provided for async writes and how does it impact throughput.
> 
> I have created two partitions /dev/sda1 and /dev/sda2. Two ioband devices
> ioband1 and ioband2 on /dev/sda1 and /dev/sda2 respectively with weights
> 40 and 40.
> 
> #dmsetup status
> ioband2: 0 38025855 ioband 1 -1 150 8 186 1 0 8
> ioband1: 0 40098177 ioband 1 -1 150 8 186 1 0 8
> 
> I ran following two fio jobs. One job in each partition.
> 
> ************************************************************
> echo cfq > /sys/block/sdd/queue/scheduler
> sync
> echo 3 > /proc/sys/vm/drop_caches
> 
> fio_args="--size=64m --rw=write --numjobs=50 --group_reporting"
> time fio $fio_args --name=test1 --directory=/mnt/sdd1/fio/
> --output=test1.log &
> time fio $fio_args --name=test2 --directory=/mnt/sdd2/fio/
> --output=test2.log &
> wait
> *****************************************************************
> 
> Following are fio job finish times with and without dm-ioband
> 		
> 			first job		second job
> without dm-ioband	3m29.947s		4m1.436s 	
> with dm-ioband		8m42.532s		8m43.328s
> 
> This sounds like 100% performance regression in this particular setup.
> 
> I think this regression is introduced because we are waiting for too
> long for slower group to catch up to make sure proportionate numbers
> look right and choke the writes even if deviec is free.
> 
> It is an hard to solve problem because the async writes traffic is 
> bursty when seen at block layer and we not necessarily see higher amount of
> writer traffic dispatched from higher prio process/group. So what does one
> do? Wait for other groups to catch up to show right proportionate numbers
> and hence let the disk be idle and kill the performance. Or just continue
> and not idle too much (a small amount of idling like 8ms for sync queue 
> might still be ok).
> 
> I think there might not be much benefit in providing artificial notion
> of maintaining proportionate ratio and kill the performance. We should
> instead try to audit async write path and see where the higher weight
> application/group is stuck.
> 
> In my simple two dd test, I could see bursty traffic from high prio app and
> then it would sometimes disappear for .2 to .8 seconds. In that duration if I
> wait for higher priority group to catch up that I will end up keeping disk
> idle for .8 seconds and kill performance. I guess better way is to not wait
> that long (even if it means that to application it might give the impression
> that io scheduler is not doing the job right in assiginig proportionate disk)
> and over a period of time see if we can fix some things in async write path
> for more smooth traffic to io scheduler.
> 
> Thoughts?
> 
> Thanks
> Vivek
> 
> 
> > Alasdair, could you please merge dm-ioband into upstream? Or could
> > you please tell me why dm-ioband can't be merged?
> > 
> > Thanks,
> > Ryo Tsuruta
> > 
> > To know the details of dm-ioband:
> > http://people.valinux.co.jp/~ryov/dm-ioband/
> > 
> > RPM packages for RHEL5 and CentOS5 are available:
> > http://people.valinux.co.jp/~ryov/dm-ioband/binary.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: dm-ioband: Test results.
  2009-04-17  2:11     ` Vivek Goyal
@ 2009-04-17  2:28       ` Vivek Goyal
  -1 siblings, 0 replies; 55+ messages in thread
From: Vivek Goyal @ 2009-04-17  2:28 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: agk, dm-devel, linux-kernel, Nauman Rafique,
	Fernando Luis Vázquez Cao, Andrea Righi, Jens Axboe,
	Balbir Singh, Moyer Jeff Moyer, Morton Andrew Morton

On Thu, Apr 16, 2009 at 10:11:39PM -0400, Vivek Goyal wrote:
> On Thu, Apr 16, 2009 at 04:57:20PM -0400, Vivek Goyal wrote:
> > On Mon, Apr 13, 2009 at 01:05:52PM +0900, Ryo Tsuruta wrote:
> > > Hi Alasdair and all,
> > > 
> > > I did more tests on dm-ioband and I've posted the test items and
> > > results on my website. The results are very good.
> > > http://people.valinux.co.jp/~ryov/dm-ioband/test/test-items.xls
> > > 
> > > I hope someone will test dm-ioband and report back to the dm-devel
> > > mailing list.
> > > 
> > 
> 
> Ok, one more test. This time to show that with single queue and FIFO
> dispatch a writer can easily starve the reader.
> 
> I have created two partitions /dev/sda1 and /dev/sda2. Two ioband devices
> ioband1 and ioband2 on /dev/sda1 and /dev/sda2 respectively with weights
> 40 and 20.
> 
> I am launching an aggressive writer dd with prio 7 (Best effort) and a
> reader with prio 0 (Best effort).
> 
> Following is my script.
> 
> ****************************************************************
> rm /mnt/sdd1/aggressivewriter
> 
> sync
> echo 3 > /proc/sys/vm/drop_caches
> 
> #launch an hostile writer
> ionice -c2 -n7 dd if=/dev/zero of=/mnt/sdd1/aggressivewriter bs=4K count=524288 conv=fdatasync &
> 
> # Reader
> ionice -c 2 -n 0 dd if=/mnt/sdd1/testzerofile1 of=/dev/null &
> wait $!
> echo "reader finished"
> **********************************************************************

More results. This is the same reader/writer test as above; the only
variation is that I changed the class of the reader from BE to RT.

ionice -c 1 -n 0 dd if=/mnt/sdd1/testzerofile1 of=/dev/null &

Even changing the class of the reader does not help. The writer still
starves the reader.

Without dm-ioband
=================
First run
2147483648 bytes (2.1 GB) copied, 43.9096 s, 48.9 MB/s (Reader)
reader finished
2147483648 bytes (2.1 GB) copied, 85.9094 s, 25.0 MB/s (Writer)

Second run
2147483648 bytes (2.1 GB) copied, 40.2446 s, 53.4 MB/s (Reader)
reader finished
2147483648 bytes (2.1 GB) copied, 82.723 s, 26.0 MB/s (Writer)

With dm-ioband
==============
First run
2147483648 bytes (2.1 GB) copied, 69.0272 s, 31.1 MB/s (Writer)
2147483648 bytes (2.1 GB) copied, 89.3037 s, 24.0 MB/s (Reader)
reader finished

Second run
2147483648 bytes (2.1 GB) copied, 64.8751 s, 33.1 MB/s (Writer)
2147483648 bytes (2.1 GB) copied, 89.0273 s, 24.1 MB/s (Reader)
reader finished

Thanks
Vivek

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [dm-devel] Re: dm-ioband: Test results.
  2009-04-16 20:24         ` [dm-devel] " Nauman Rafique
@ 2009-04-20  8:29             ` Ryo Tsuruta
  0 siblings, 0 replies; 55+ messages in thread
From: Ryo Tsuruta @ 2009-04-20  8:29 UTC (permalink / raw)
  To: nauman
  Cc: vgoyal, fernando, linux-kernel, jmoyer, dm-devel, jens.axboe,
	agk, balbir

Hi Vivek and Nauman,

> On Thu, Apr 16, 2009 at 7:11 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Thu, Apr 16, 2009 at 11:47:50AM +0900, Ryo Tsuruta wrote:
> >> Hi Vivek,
> >>
> >> > General thoughts about dm-ioband
> >> > ================================
> >> > - Implementing control at second level has the advantage tha one does not
> >> >   have to muck with IO scheduler code. But then it also has the
> >> >   disadvantage that there is no communication with IO scheduler.
> >> >
> >> > - dm-ioband is buffering bio at higher layer and then doing FIFO release
> >> >   of these bios. This FIFO release can lead to priority inversion problems
> >> >   in certain cases where RT requests are way behind BE requests or
> >> >   reader starvation where reader bios are getting hidden behind writer
> >> >   bios etc. These are hard to notice issues in user space. I guess above
> >> >   RT results do highlight the RT task problems. I am still working on
> >> >   other test cases and see if i can show the probelm.
> 
> Ryo, I could not agree more with Vivek here. At Google, we have very
> stringent requirement for latency of our RT requests. If RT requests
> get queued in any higher layer (behind BE requests), all bets are off.
> I don't find doing IO control at two layer for this particular reason.
> The upper layer (dm-ioband in this case) would have to make sure that
> RT requests are released immediately, irrespective of the state (FIFO
> queuing and tokens held). And the lower layer (IO scheduling layer)
> has to do the same. This requirement is not specific to us. I have
> seen similar comments from filesystem folks here previously, in the
> context of metadata updates being submitted as RT. Basically, the
> semantics of RT class has to be preserved by any solution that is
> build on top of CFQ scheduler.

I could see the priority inversion by running Vivek's script, and I
understand how RT requests have to be handled. I'll create a patch which
makes dm-ioband cooperate with the CFQ scheduler. However, do you think
we need some kind of limit on processes which belong to the RT class, to
prevent them from depleting the bandwidth?

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [dm-devel] Re: dm-ioband: Test results.
  2009-04-20  8:29             ` Ryo Tsuruta
  (?)
@ 2009-04-20  9:07             ` Nauman Rafique
  2009-04-21 12:06               ` Ryo Tsuruta
  -1 siblings, 1 reply; 55+ messages in thread
From: Nauman Rafique @ 2009-04-20  9:07 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: vgoyal, fernando, linux-kernel, jmoyer, dm-devel, jens.axboe,
	agk, balbir

On Mon, Apr 20, 2009 at 1:29 AM, Ryo Tsuruta <ryov@valinux.co.jp> wrote:
> Hi Vivek and Nauman,
>
>> On Thu, Apr 16, 2009 at 7:11 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Thu, Apr 16, 2009 at 11:47:50AM +0900, Ryo Tsuruta wrote:
>> >> Hi Vivek,
>> >>
>> >> > General thoughts about dm-ioband
>> >> > ================================
>> >> > - Implementing control at second level has the advantage tha one does not
>> >> >   have to muck with IO scheduler code. But then it also has the
>> >> >   disadvantage that there is no communication with IO scheduler.
>> >> >
>> >> > - dm-ioband is buffering bio at higher layer and then doing FIFO release
>> >> >   of these bios. This FIFO release can lead to priority inversion problems
>> >> >   in certain cases where RT requests are way behind BE requests or
>> >> >   reader starvation where reader bios are getting hidden behind writer
>> >> >   bios etc. These are hard to notice issues in user space. I guess above
>> >> >   RT results do highlight the RT task problems. I am still working on
>> >> >   other test cases and see if i can show the probelm.
>>
>> Ryo, I could not agree more with Vivek here. At Google, we have very
>> stringent requirement for latency of our RT requests. If RT requests
>> get queued in any higher layer (behind BE requests), all bets are off.
>> I don't find doing IO control at two layer for this particular reason.
>> The upper layer (dm-ioband in this case) would have to make sure that
>> RT requests are released immediately, irrespective of the state (FIFO
>> queuing and tokens held). And the lower layer (IO scheduling layer)
>> has to do the same. This requirement is not specific to us. I have
>> seen similar comments from filesystem folks here previously, in the
>> context of metadata updates being submitted as RT. Basically, the
>> semantics of RT class has to be preserved by any solution that is
>> build on top of CFQ scheduler.
>
> I could see the priority inversion by running Vivek's script and I
> understand how RT requests has to be handled. I'll create a patch
> which makes dm-ioband cooperates with CFQ scheduler. However, do you
> think we need some kind of limitation on processes which belong to the
> RT class to prevent the processes from depleting bandwidth?

If you are talking about starvation that could be caused by RT tasks, you
are right. We need some mechanism for starvation prevention, but I think
that is an issue that can be tackled once we decide where to do bandwidth
control.

The real question is: once you create a version of dm-ioband that
cooperates with the CFQ scheduler, how would that solution compare with
the patch set Vivek has posted? In my opinion, we need to converge on one
solution as soon as possible, so that we can work on it together to
refine and test it.

>
> Thanks,
> Ryo Tsuruta
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [dm-devel] Re: dm-ioband: Test results.
  2009-04-20  8:29             ` Ryo Tsuruta
@ 2009-04-20 21:37               ` Vivek Goyal
  -1 siblings, 0 replies; 55+ messages in thread
From: Vivek Goyal @ 2009-04-20 21:37 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: nauman, fernando, linux-kernel, jmoyer, dm-devel, jens.axboe,
	agk, balbir, Andrea Righi

On Mon, Apr 20, 2009 at 05:29:59PM +0900, Ryo Tsuruta wrote:
> Hi Vivek and Nauman,
> 
> > On Thu, Apr 16, 2009 at 7:11 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > > On Thu, Apr 16, 2009 at 11:47:50AM +0900, Ryo Tsuruta wrote:
> > >> Hi Vivek,
> > >>
> > >> > General thoughts about dm-ioband
> > >> > ================================
> > >> > - Implementing control at second level has the advantage tha one does not
> > >> >   have to muck with IO scheduler code. But then it also has the
> > >> >   disadvantage that there is no communication with IO scheduler.
> > >> >
> > >> > - dm-ioband is buffering bio at higher layer and then doing FIFO release
> > >> >   of these bios. This FIFO release can lead to priority inversion problems
> > >> >   in certain cases where RT requests are way behind BE requests or
> > >> >   reader starvation where reader bios are getting hidden behind writer
> > >> >   bios etc. These are hard to notice issues in user space. I guess above
> > >> >   RT results do highlight the RT task problems. I am still working on
> > >> >   other test cases and see if i can show the probelm.
> > 
> > Ryo, I could not agree more with Vivek here. At Google, we have very
> > stringent requirement for latency of our RT requests. If RT requests
> > get queued in any higher layer (behind BE requests), all bets are off.
> > I don't find doing IO control at two layer for this particular reason.
> > The upper layer (dm-ioband in this case) would have to make sure that
> > RT requests are released immediately, irrespective of the state (FIFO
> > queuing and tokens held). And the lower layer (IO scheduling layer)
> > has to do the same. This requirement is not specific to us. I have
> > seen similar comments from filesystem folks here previously, in the
> > context of metadata updates being submitted as RT. Basically, the
> > semantics of RT class has to be preserved by any solution that is
> > build on top of CFQ scheduler.
> 
> I could see the priority inversion by running Vivek's script and I
> understand how RT requests has to be handled. I'll create a patch
> which makes dm-ioband cooperates with CFQ scheduler. However, do you
> think we need some kind of limitation on processes which belong to the
> RT class to prevent the processes from depleting bandwidth?

I think, to begin with, we can keep the same behavior as CFQ: an RT task
can starve other tasks.

But we should provide two configurations and let the user choose either
one. If the RT task is in the root group, it will starve the other
sibling tasks/groups. If it is within a cgroup, then it will starve only
its siblings within that cgroup and will not impact other cgroups.

What I mean is the following.

			root
			/  \
		      RT   group1

In the above configuration, the RT task will starve everybody else.

			root
		        /   \
		   group1   group2
		   /  \
	         RT   BE

In the above configuration, the RT task will starve only its sibling in
group1, but will not starve the tasks in group2 or in the root.
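
Purely as an illustration of the second layout, the admin-side setup
would be plain cgroup filesystem operations. The "io" controller name
below is a hypothetical placeholder, since no cgroup-based IO controller
is merged yet; only the mkdir/echo mechanics are the point:

mkdir -p /cgroup && mount -t cgroup -o io none /cgroup  # "io" subsystem is assumed
mkdir /cgroup/group1 /cgroup/group2
echo $RT_PID > /cgroup/group1/tasks       # RT task confined to group1
echo $BE_PID > /cgroup/group1/tasks       # its BE sibling
echo $OTHER_PID > /cgroup/group2/tasks    # isolated from the RT task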

Thanks
Vivek

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [dm-devel] Re: dm-ioband: Test results.
  2009-04-20  9:07             ` [dm-devel] " Nauman Rafique
@ 2009-04-21 12:06               ` Ryo Tsuruta
  2009-04-21 12:10                   ` Ryo Tsuruta
  0 siblings, 1 reply; 55+ messages in thread
From: Ryo Tsuruta @ 2009-04-21 12:06 UTC (permalink / raw)
  To: nauman
  Cc: vgoyal, fernando, linux-kernel, jmoyer, dm-devel, jens.axboe,
	agk, balbir

Hi Nauman,

> >> >> > General thoughts about dm-ioband
> >> >> > ================================
> >> >> > - Implementing control at second level has the advantage tha one does not
> >> >> >   have to muck with IO scheduler code. But then it also has the
> >> >> >   disadvantage that there is no communication with IO scheduler.
> >> >> >
> >> >> > - dm-ioband is buffering bio at higher layer and then doing FIFO release
> >> >> >   of these bios. This FIFO release can lead to priority inversion problems
> >> >> >   in certain cases where RT requests are way behind BE requests or
> >> >> >   reader starvation where reader bios are getting hidden behind writer
> >> >> >   bios etc. These are hard to notice issues in user space. I guess above
> >> >> >   RT results do highlight the RT task problems. I am still working on
> >> >> >   other test cases and see if i can show the probelm.
> >>
> >> Ryo, I could not agree more with Vivek here. At Google, we have very
> >> stringent requirement for latency of our RT requests. If RT requests
> >> get queued in any higher layer (behind BE requests), all bets are off.
> >> I don't find doing IO control at two layer for this particular reason.
> >> The upper layer (dm-ioband in this case) would have to make sure that
> >> RT requests are released immediately, irrespective of the state (FIFO
> >> queuing and tokens held). And the lower layer (IO scheduling layer)
> >> has to do the same. This requirement is not specific to us. I have
> >> seen similar comments from filesystem folks here previously, in the
> >> context of metadata updates being submitted as RT. Basically, the
> >> semantics of RT class has to be preserved by any solution that is
> >> build on top of CFQ scheduler.
> >
> > I could see the priority inversion by running Vivek's script and I
> > understand how RT requests has to be handled. I'll create a patch
> > which makes dm-ioband cooperates with CFQ scheduler. However, do you
> > think we need some kind of limitation on processes which belong to the
> > RT class to prevent the processes from depleting bandwidth?
> 
> If you are talking about starvation that could be caused by RT tasks,
> you are right. We need some mechanism to introduce starvation
> prevention, but I think that is an issue that can be tackled once we
> decide where to do bandwidth control.
> 
> The real question is, once you create a version of dm-ioband that
> co-operates with CFQ scheduler, how that solution would compare with
> the patch set Vivek has posted? In my opinion, we need to converge to
> one solution as soon as possible, so that we can work on it together
> to refine and test it.

I think I can do some help for your work. but I want to continue the
development of dm-ioband, because dm-ioband actually works well and
I think it has some advantages against other IO controllers.
  - It can use without cgroup.
  - It can control bandwidth on a per partition basis.
  - The driver module can be replaced without stopping the system.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [dm-devel] Re: dm-ioband: Test results.
  2009-04-21 12:06               ` Ryo Tsuruta
@ 2009-04-21 12:10                   ` Ryo Tsuruta
  0 siblings, 0 replies; 55+ messages in thread
From: Ryo Tsuruta @ 2009-04-21 12:10 UTC (permalink / raw)
  To: nauman
  Cc: fernando, linux-kernel, jmoyer, dm-devel, agk, jens.axboe,
	vgoyal, balbir

Hi Nauman,

> > The real question is, once you create a version of dm-ioband that
> > co-operates with CFQ scheduler, how that solution would compare with
> > the patch set Vivek has posted? In my opinion, we need to converge to
> > one solution as soon as possible, so that we can work on it together
> > to refine and test it.
> 
> I think I can do some help for your work. but I want to continue the
> development of dm-ioband, because dm-ioband actually works well and
> I think it has some advantages against other IO controllers.
>   - It can use without cgroup.
>   - It can control bandwidth on a per partition basis.
>   - The driver module can be replaced without stopping the system.

In addition, dm-ioband can run on RHEL5.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [dm-devel] Re: dm-ioband: Test results.
  2009-04-20 21:37               ` Vivek Goyal
  (?)
@ 2009-04-21 12:18               ` Ryo Tsuruta
  -1 siblings, 0 replies; 55+ messages in thread
From: Ryo Tsuruta @ 2009-04-21 12:18 UTC (permalink / raw)
  To: vgoyal
  Cc: nauman, fernando, linux-kernel, jmoyer, dm-devel, jens.axboe,
	agk, balbir, righi.andrea

Hi Vivek,

> > I could see the priority inversion by running Vivek's script and I
> > understand how RT requests has to be handled. I'll create a patch
> > which makes dm-ioband cooperates with CFQ scheduler. However, do you
> > think we need some kind of limitation on processes which belong to the
> > RT class to prevent the processes from depleting bandwidth?
>
> I think to begin with, we can keep the same behavior as CFQ. An RT task
> can starve other tasks.
> 
> But we should provide two configurations and user can choose any one.
> If RT task is in root group, it will starve other sibling tasks/groups. If
> it is with-in a cgroup, then it will starve its sibling only with-in that
> cgroup and will not impact other cgroups.
> 
> What I mean is following.
> 
> 			root
> 			/  \
> 		      RT   group1
> 
> In above configuration RT task will starve everybody else.
> 
> 			root
> 		        /   \
> 		   group1   group2
> 		   /  \
> 	         RT   BE
> 
> In above configuration  RT task will starve only sibling in group1 but
> will not starve the tasks in group2 or in root.

Thanks for the suggestion. I'll try it this way once dm-ioband supports
hierarchical grouping.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: dm-ioband: Test results.
  2009-04-21 12:10                   ` Ryo Tsuruta
@ 2009-04-21 13:57                     ` Mike Snitzer
  -1 siblings, 0 replies; 55+ messages in thread
From: Mike Snitzer @ 2009-04-21 13:57 UTC (permalink / raw)
  To: device-mapper development
  Cc: nauman, fernando, linux-kernel, jmoyer, vgoyal, jens.axboe, agk, balbir

On Tue, Apr 21 2009 at  8:10am -0400,
Ryo Tsuruta <ryov@valinux.co.jp> wrote:

> Hi Nauman,
> 
> > > The real question is, once you create a version of dm-ioband that
> > > co-operates with CFQ scheduler, how that solution would compare with
> > > the patch set Vivek has posted? In my opinion, we need to converge to
> > > one solution as soon as possible, so that we can work on it together
> > > to refine and test it.
> > 
> > I think I can do some help for your work. but I want to continue the
> > development of dm-ioband, because dm-ioband actually works well and
> > I think it has some advantages against other IO controllers.
> >   - It can use without cgroup.
> >   - It can control bandwidth on a per partition basis.
> >   - The driver module can be replaced without stopping the system.
> 
> In addition, dm-ioband can run on the RHEL5.

RHEL5 compatibility does not matter relative to merging an I/O bandwidth
controller upstream.  So both the "can [be] use without cgroup" and "can
run on RHEL5" features do not help your cause of getting dm-ioband
merged upstream.  In fact these features serve as distractions.

Mike

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: dm-ioband: Test results.
  2009-04-21 13:57                     ` Mike Snitzer
@ 2009-04-21 14:16                       ` Vivek Goyal
  -1 siblings, 0 replies; 55+ messages in thread
From: Vivek Goyal @ 2009-04-21 14:16 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: device-mapper development, nauman, fernando, linux-kernel,
	jmoyer, jens.axboe, agk, balbir

On Tue, Apr 21, 2009 at 09:57:23AM -0400, Mike Snitzer wrote:
> On Tue, Apr 21 2009 at  8:10am -0400,
> Ryo Tsuruta <ryov@valinux.co.jp> wrote:
> 
> > Hi Nauman,
> > 
> > > > The real question is, once you create a version of dm-ioband that
> > > > co-operates with CFQ scheduler, how that solution would compare with
> > > > the patch set Vivek has posted? In my opinion, we need to converge to
> > > > one solution as soon as possible, so that we can work on it together
> > > > to refine and test it.
> > > 
> > > I think I can do some help for your work. but I want to continue the
> > > development of dm-ioband, because dm-ioband actually works well and
> > > I think it has some advantages against other IO controllers.
> > >   - It can use without cgroup.
> > >   - It can control bandwidth on a per partition basis.
> > >   - The driver module can be replaced without stopping the system.
> > 
> > In addition, dm-ioband can run on the RHEL5.
> 
> RHEL5 compatibility does not matter relative to merging an I/O bandwidth
> controller upstream.  So both the "can [be] use without cgroup" and "can
> run on RHEL5" features do not help your cause of getting dm-ioband
> merged upstream.  In fact these features serve as distractions.

Exactly. I don't think that "it can be used without cgroup" is a feature
or an advantage. To me it is a disadvantage and should be fixed. cgroup is the
standard mechanism for grouping tasks arbitrarily, and we should use it to make
things work instead of coming up with our own ways of grouping things and
calling that an advantage.

What do you mean by "The driver module can be replaced without stopping
the system"? I guess you mean that one does not have to reboot the system
to remove an ioband device? So if one decides not to use the cgroup, then
one will have to remove the ioband devices, remount the filesystems and
restart the application?

With the cgroup approach, if one does not want things to be classified, a user
can simply move all the tasks to the root group and things will be fine. No
remounting, no application stopping, etc. So this also does not look like
an advantage; instead it sounds like a disadvantage.
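
Something as simple as the following would do (a sketch only, assuming the
controller hierarchy is mounted at /cgroup; the path is illustrative):

  # move every task back to the root group of the controller hierarchy
  for t in /cgroup/*/tasks; do
          while read pid; do
                  echo $pid > /cgroup/tasks
          done < $t
  done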

Thanks
Vivek

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: dm-ioband: Test results.
  2009-04-21 14:16                       ` Vivek Goyal
  (?)
@ 2009-04-22  0:50                       ` Li Zefan
  -1 siblings, 0 replies; 55+ messages in thread
From: Li Zefan @ 2009-04-22  0:50 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Mike Snitzer, device-mapper development, nauman, fernando,
	linux-kernel, jmoyer, jens.axboe, agk, balbir

Vivek Goyal wrote:
> On Tue, Apr 21, 2009 at 09:57:23AM -0400, Mike Snitzer wrote:
>> On Tue, Apr 21 2009 at  8:10am -0400,
>> Ryo Tsuruta <ryov@valinux.co.jp> wrote:
>>
>>> Hi Nauman,
>>>
>>>>> The real question is, once you create a version of dm-ioband that
>>>>> co-operates with CFQ scheduler, how that solution would compare with
>>>>> the patch set Vivek has posted? In my opinion, we need to converge to
>>>>> one solution as soon as possible, so that we can work on it together
>>>>> to refine and test it.
>>>> I think I can do some help for your work. but I want to continue the
>>>> development of dm-ioband, because dm-ioband actually works well and
>>>> I think it has some advantages against other IO controllers.
>>>>   - It can use without cgroup.
>>>>   - It can control bandwidth on a per partition basis.
>>>>   - The driver module can be replaced without stopping the system.
>>> In addition, dm-ioband can run on the RHEL5.
>> RHEL5 compatibility does not matter relative to merging an I/O bandwidth
>> controller upstream.  So both the "can [be] use without cgroup" and "can
>> run on RHEL5" features do not help your cause of getting dm-ioband
>> merged upstream.  In fact these features serve as distractions.
> 
> Exactly. I don't think that "it can be used without cgroup" is a feature
> or advantage. To me it is a disadvantage and should be fixed. cgroup is
> standard mechanism to group tasks arbitrarily and we should use that to make
> things working instead of coming up with own ways of grouping things and
> terming it as advantage.
> 

I agree. In the case of the CPU scheduler, there are both a user group scheduler
and a cgroup group scheduler, but Peter said he would like to see the user group
scheduler removed.

> What do you mean by "The driver module can be replaced without stopping
> the system"? I guess you mean that one does not have to reboot the system
> to remove ioband device? So if one decides to not use the cgroup, then
> one shall have to remove the ioband devices, remount the filesystems and
> restart the application?
> 
> With cgroup approach, if one does not want things to be classified, a user
> can simply move all the tasks to root group and things will be fine. No
> remounting, no application stopping etc. So this also does not look like
> an advantage instead sounds like an disadvantage.
> 
> Thanks
> Vivek

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [dm-devel] Re: dm-ioband: Test results.
  2009-04-21 14:16                       ` Vivek Goyal
@ 2009-04-22  3:14                         ` Ryo Tsuruta
  -1 siblings, 0 replies; 55+ messages in thread
From: Ryo Tsuruta @ 2009-04-22  3:14 UTC (permalink / raw)
  To: dm-devel, vgoyal
  Cc: snitzer, fernando, linux-kernel, jmoyer, jens.axboe, nauman, agk, balbir

Hi,

From: Vivek Goyal <vgoyal@redhat.com>
Subject: [dm-devel] Re: dm-ioband: Test results.
Date: Tue, 21 Apr 2009 10:16:07 -0400

> On Tue, Apr 21, 2009 at 09:57:23AM -0400, Mike Snitzer wrote:
> > On Tue, Apr 21 2009 at  8:10am -0400,
> > Ryo Tsuruta <ryov@valinux.co.jp> wrote:
> > 
> > > Hi Nauman,
> > > 
> > > > > The real question is, once you create a version of dm-ioband that
> > > > > co-operates with CFQ scheduler, how that solution would compare with
> > > > > the patch set Vivek has posted? In my opinion, we need to converge to
> > > > > one solution as soon as possible, so that we can work on it together
> > > > > to refine and test it.
> > > > 
> > > > I think I can do some help for your work. but I want to continue the
> > > > development of dm-ioband, because dm-ioband actually works well and
> > > > I think it has some advantages against other IO controllers.
> > > >   - It can use without cgroup.
> > > >   - It can control bandwidth on a per partition basis.
> > > >   - The driver module can be replaced without stopping the system.
> > > 
> > > In addition, dm-ioband can run on the RHEL5.
> > 
> > RHEL5 compatibility does not matter relative to merging an I/O bandwidth
> > controller upstream.  So both the "can [be] use without cgroup" and "can
> > run on RHEL5" features do not help your cause of getting dm-ioband
> > merged upstream.  In fact these features serve as distractions.
>
> Exactly. I don't think that "it can be used without cgroup" is a feature
> or advantage. To me it is a disadvantage and should be fixed. cgroup is
> standard mechanism to group tasks arbitrarily and we should use that to make
> things working instead of coming up with own ways of grouping things and
> terming it as advantage.
> 
> What do you mean by "The driver module can be replaced without stopping
> the system"? I guess you mean that one does not have to reboot the system
> to remove ioband device? So if one decides to not use the cgroup, then
> one shall have to remove the ioband devices, remount the filesystems and
> restart the application?

Device-mapper has a feature that can replace an intermediate target without
unmounting the device, like the following.

      ---------------------        ---------------------
     |         /mnt        |      |        /mnt         |
     |---------------------|      |---------------------|
     | /dev/mapper/ioband1 |      | /dev/mapper/ioband1 |
     |---------------------|      |---------------------|
     |      dm-ioband      | <==> |      dm-linear      |
     |---------------------|      |---------------------|
     |      /dev/sda1      |      |      /dev/sda1      |
      ---------------------        ---------------------

So we can safely unload the dm-ioband module and update it.
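
For example, the switch can be done with an ordinary table reload while the
filesystem stays mounted. This is only a sketch: the device name and sector
count are examples, the module name is the one used by the dm-ioband package,
and the real ioband table arguments are the ones described in the dm-ioband
documentation.

  SIZE=$(blockdev --getsz /dev/sda1)

  # replace the ioband target with a plain linear mapping
  dmsetup suspend ioband1
  echo "0 $SIZE linear /dev/sda1 0" | dmsetup reload ioband1
  dmsetup resume ioband1

  # nothing maps to dm-ioband now, so the module can be updated
  modprobe -r dm-ioband
  modprobe dm-ioband

  # then load the original ioband table back the same way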

> With cgroup approach, if one does not want things to be classified, a user
> can simply move all the tasks to root group and things will be fine. No
> remounting, no application stopping etc. So this also does not look like
> an advantage instead sounds like an disadvantage.
> 
> Thanks
> Vivek
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: dm-ioband: Test results.
  2009-04-22  3:14                         ` Ryo Tsuruta
@ 2009-04-22 15:18                           ` Mike Snitzer
  -1 siblings, 0 replies; 55+ messages in thread
From: Mike Snitzer @ 2009-04-22 15:18 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: dm-devel, vgoyal, fernando, linux-kernel, jmoyer, jens.axboe,
	nauman, agk, balbir

On Tue, Apr 21 2009 at 11:14pm -0400,
Ryo Tsuruta <ryov@valinux.co.jp> wrote:

> Hi,
> 
> From: Vivek Goyal <vgoyal@redhat.com>
> Subject: [dm-devel] Re: dm-ioband: Test results.
> Date: Tue, 21 Apr 2009 10:16:07 -0400
> 
> > On Tue, Apr 21, 2009 at 09:57:23AM -0400, Mike Snitzer wrote:
> > > On Tue, Apr 21 2009 at  8:10am -0400,
> > > Ryo Tsuruta <ryov@valinux.co.jp> wrote:
> > > 
> > > > Hi Nauman,
> > > > 
> > > > > > The real question is, once you create a version of dm-ioband that
> > > > > > co-operates with CFQ scheduler, how that solution would compare with
> > > > > > the patch set Vivek has posted? In my opinion, we need to converge to
> > > > > > one solution as soon as possible, so that we can work on it together
> > > > > > to refine and test it.
> > > > > 
> > > > > I think I can do some help for your work. but I want to continue the
> > > > > development of dm-ioband, because dm-ioband actually works well and
> > > > > I think it has some advantages against other IO controllers.
> > > > >   - It can use without cgroup.
> > > > >   - It can control bandwidth on a per partition basis.
> > > > >   - The driver module can be replaced without stopping the system.
> > > > 
> > > > In addition, dm-ioband can run on the RHEL5.
> > > 
> > > RHEL5 compatibility does not matter relative to merging an I/O bandwidth
> > > controller upstream.  So both the "can [be] use without cgroup" and "can
> > > run on RHEL5" features do not help your cause of getting dm-ioband
> > > merged upstream.  In fact these features serve as distractions.
> >
> > Exactly. I don't think that "it can be used without cgroup" is a feature
> > or advantage. To me it is a disadvantage and should be fixed. cgroup is
> > standard mechanism to group tasks arbitrarily and we should use that to make
> > things working instead of coming up with own ways of grouping things and
> > terming it as advantage.
> > 
> > What do you mean by "The driver module can be replaced without stopping
> > the system"? I guess you mean that one does not have to reboot the system
> > to remove ioband device? So if one decides to not use the cgroup, then
> > one shall have to remove the ioband devices, remount the filesystems and
> > restart the application?
> 
> Device-mapper has a feature that can replace an intermediate module
> without unmount the device like the following.
> 
>       ---------------------        ---------------------
>      |         /mnt        |      |        /mnt         |
>      |---------------------|      |---------------------|
>      | /dev/mapper/ioband1 |      | /dev/mapper/ioband1 |
>      |---------------------|      |---------------------|
>      |      dm-ioband      | <==> |      dm-linear      |
>      |---------------------|      |---------------------|
>      |      /dev/sda1      |      |      /dev/sda1      |
>       ---------------------        ---------------------
> 
> So we can safely unload the dm-ioband module and update it.
> 
> > With cgroup approach, if one does not want things to be classified, a user
> > can simply move all the tasks to root group and things will be fine. No
> > remounting, no application stopping etc. So this also does not look like
> > an advantage instead sounds like an disadvantage.
> > 
> > Thanks
> > Vivek

Ryo,

Why is it that you repeatedly ignore concern/discussion about your
determination to continue using a custom grouping mechanism?  It is this
type of excess layering that serves no purpose other than to facilitate
out-of-tree use-cases.  dm-ioband would take a big step closer to being
merged upstream if you took others' feedback and showed more willingness
to work through the outstanding issues.

Mike

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: dm-ioband: Test results.
  2009-04-22 15:18                           ` Mike Snitzer
@ 2009-04-27 10:30                             ` Ryo Tsuruta
  -1 siblings, 0 replies; 55+ messages in thread
From: Ryo Tsuruta @ 2009-04-27 10:30 UTC (permalink / raw)
  To: snitzer
  Cc: dm-devel, vgoyal, fernando, linux-kernel, jmoyer, jens.axboe,
	nauman, agk, balbir

Hi Mike,

> Why is it that you repeatedly ignore concern/discussion about your
> determination to continue using a custom grouping mechanism?  It is this
> type of excess layering that serves no purpose other than to facilitate
> out-of-tree use-cases.  dm-ioband would take a big step closer to being
> merged upstream if you took others' feedback and showed more willingness
> to work through the outstanding issues.

I think dm-ioband's approach is one simple way to handle cgroups, because
the current cgroup infrastructure has no way to manage a kernel module's
resources. Please tell me if you have any good ideas for how dm-ioband
could handle cgroups.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: dm-ioband: Test results.
  2009-04-27 10:30                             ` Ryo Tsuruta
@ 2009-04-27 12:44                               ` Ryo Tsuruta
  -1 siblings, 0 replies; 55+ messages in thread
From: Ryo Tsuruta @ 2009-04-27 12:44 UTC (permalink / raw)
  To: agk
  Cc: dm-devel, vgoyal, fernando, linux-kernel, jmoyer, jens.axboe,
	nauman, balbir, snitzer

Hi Alasdair,

From: Mike Snitzer <snitzer@redhat.com>
Subject: Re: dm-ioband: Test results.
Date: Wed, 22 Apr 2009 11:18:06 -0400

> Why is it that you repeatedly ignore concern/discussion about your
> determination to continue using a custom grouping mechanism?  It is this
> type of excess layering that serves no purpose other than to facilitate
> out-of-tree use-cases.  dm-ioband would take a big step closer to being
> merged upstream if you took others' feedback and showed more willingness
> to work through the outstanding issues.

Could you please let me know your opinion about merging dm-ioband
upstream? What do I have to do to get dm-ioband merged?

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: dm-ioband: Test results.
  2009-04-27 10:30                             ` Ryo Tsuruta
@ 2009-04-27 13:03                               ` Mike Snitzer
  -1 siblings, 0 replies; 55+ messages in thread
From: Mike Snitzer @ 2009-04-27 13:03 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: dm-devel, vgoyal, fernando, linux-kernel, jmoyer, jens.axboe,
	nauman, agk, balbir

On Mon, Apr 27 2009 at  6:30am -0400,
Ryo Tsuruta <ryov@valinux.co.jp> wrote:

> Hi Mike,
> 
> > Why is it that you repeatedly ignore concern/discussion about your
> > determination to continue using a custom grouping mechanism?  It is this
> > type of excess layering that serves no purpose other than to facilitate
> > out-of-tree use-cases.  dm-ioband would take a big step closer to being
> > merged upstream if you took others' feedback and showed more willingness
> > to work through the outstanding issues.
> 
> I think dm-ioband's approach is one simple way to handle cgroup
> because the current cgroup has no way to manage kernel module's
> resources. Please tell me if you have any good ideas to handle 
> cgroup by dm-ioband.

If you'd like to keep dm-ioband modular, then I'd say the appropriate
cgroup interfaces need to be exposed for module use (symbols exported,
etc.). No other controller has had a need to be modular, but if you think
it is a requirement for dm-ioband (to facilitate updates, etc.), then I
have to believe it is doable.

Mike

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2009-04-27 13:03 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-13  4:05 dm-ioband: Test results Ryo Tsuruta
2009-04-13 14:46 ` Vivek Goyal
2009-04-13 14:46   ` Vivek Goyal
2009-04-14  2:49   ` Vivek Goyal
2009-04-14  2:49     ` Vivek Goyal
2009-04-14  5:27     ` Ryo Tsuruta
2009-04-14  9:30   ` [dm-devel] " Ryo Tsuruta
2009-04-14  9:30     ` Ryo Tsuruta
2009-04-15 17:04     ` [dm-devel] " Vivek Goyal
2009-04-15 17:04       ` Vivek Goyal
2009-04-16 12:56       ` [dm-devel] " Ryo Tsuruta
2009-04-16 12:56         ` Ryo Tsuruta
2009-04-16 13:32         ` Vivek Goyal
2009-04-16 13:32           ` Vivek Goyal
2009-04-15  4:37 ` Vivek Goyal
2009-04-15  4:37   ` Vivek Goyal
2009-04-15 13:38   ` Ryo Tsuruta
2009-04-15 14:10     ` Vivek Goyal
2009-04-15 14:10       ` Vivek Goyal
2009-04-15 16:17     ` Vivek Goyal
2009-04-15 16:17       ` Vivek Goyal
2009-04-16  2:47     ` [dm-devel] " Ryo Tsuruta
2009-04-16 14:11       ` Vivek Goyal
2009-04-16 14:11         ` Vivek Goyal
2009-04-16 20:24         ` [dm-devel] " Nauman Rafique
2009-04-20  8:29           ` Ryo Tsuruta
2009-04-20  8:29             ` Ryo Tsuruta
2009-04-20  9:07             ` [dm-devel] " Nauman Rafique
2009-04-21 12:06               ` Ryo Tsuruta
2009-04-21 12:10                 ` Ryo Tsuruta
2009-04-21 12:10                   ` Ryo Tsuruta
2009-04-21 13:57                   ` Mike Snitzer
2009-04-21 13:57                     ` Mike Snitzer
2009-04-21 14:16                     ` Vivek Goyal
2009-04-21 14:16                       ` Vivek Goyal
2009-04-22  0:50                       ` Li Zefan
2009-04-22  3:14                       ` [dm-devel] " Ryo Tsuruta
2009-04-22  3:14                         ` Ryo Tsuruta
2009-04-22 15:18                         ` Mike Snitzer
2009-04-22 15:18                           ` Mike Snitzer
2009-04-27 10:30                           ` Ryo Tsuruta
2009-04-27 10:30                             ` Ryo Tsuruta
2009-04-27 12:44                             ` Ryo Tsuruta
2009-04-27 12:44                               ` Ryo Tsuruta
2009-04-27 13:03                             ` Mike Snitzer
2009-04-27 13:03                               ` Mike Snitzer
2009-04-20 21:37             ` [dm-devel] " Vivek Goyal
2009-04-20 21:37               ` Vivek Goyal
2009-04-21 12:18               ` [dm-devel] " Ryo Tsuruta
2009-04-16 20:57 ` Vivek Goyal
2009-04-16 20:57   ` Vivek Goyal
2009-04-17  2:11   ` Vivek Goyal
2009-04-17  2:11     ` Vivek Goyal
2009-04-17  2:28     ` Vivek Goyal
2009-04-17  2:28       ` Vivek Goyal
