* Regarding dm-ioband tests
From: Vivek Goyal @ 2009-09-01 16:50 UTC (permalink / raw)
  To: Ryo Tsuruta; +Cc: linux kernel mailing list, dm-devel

Hi Ryo,

I decided to play a bit more with dm-ioband and started doing some
testing. I am running a simple test with two dd threads doing reads and
I don't seem to be getting the expected fairness, so I thought I would ask
you what the issue is here. Is there a problem with my testing procedure?

I have one 40G SATA drive (no hardware queuing). I created two
partitions on that disk, /dev/sdd1 and /dev/sdd2, and created two ioband
devices, ioband1 and ioband2, on partitions sdd1 and sdd2 respectively. The
weights of the ioband1 and ioband2 devices are 200 and 100 respectively.

I am assuming that this setup will create two default groups and that IO
going to partition sdd1 should get double the BW of partition sdd2.

But it looks like I am not getting that behavior. Following is the output
of the "dmsetup table" command. These snapshots were taken every 2 seconds
while IO was going on. Column 9 seems to contain the number of sectors
of IO done on a particular ioband device and group. Looking at the
snapshots, it does not look like the ioband1 default group got double
the BW of the ioband2 default group.

Am I doing something wrong here?

ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0

ioband2: 0 40355280 ioband 1 -1 96 0 11528 0 0 0
ioband1: 0 37768752 ioband 1 -1 82 0 9736 0 0 0

ioband2: 0 40355280 ioband 1 -1 748 2 93032 0 0 0
ioband1: 0 37768752 ioband 1 -1 896 0 112232 0 0 0

ioband2: 0 40355280 ioband 1 -1 1326 5 165816 0 0 0
ioband1: 0 37768752 ioband 1 -1 1816 0 228312 0 0 0

ioband2: 0 40355280 ioband 1 -1 1943 6 243712 0 0 0
ioband1: 0 37768752 ioband 1 -1 2692 0 338760 0 0 0

ioband2: 0 40355280 ioband 1 -1 2461 10 308576 0 0 0
ioband1: 0 37768752 ioband 1 -1 3618 0 455608 0 0 0

ioband2: 0 40355280 ioband 1 -1 3118 11 391352 0 0 0
ioband1: 0 37768752 ioband 1 -1 4406 0 555032 0 0 0

ioband2: 0 40355280 ioband 1 -1 3734 15 468760 0 0 0
ioband1: 0 37768752 ioband 1 -1 5273 0 664328 0 0 0

ioband2: 0 40355280 ioband 1 -1 4307 17 540784 0 0 0
ioband1: 0 37768752 ioband 1 -1 6181 0 778992 0 0 0

ioband2: 0 40355280 ioband 1 -1 4930 19 619208 0 0 0
ioband1: 0 37768752 ioband 1 -1 7028 0 885728 0 0 0

ioband2: 0 40355280 ioband 1 -1 5599 22 703280 0 0 0
ioband1: 0 37768752 ioband 1 -1 7815 0 985024 0 0 0

ioband2: 0 40355280 ioband 1 -1 6586 27 827456 0 0 0
ioband1: 0 37768752 ioband 1 -1 8327 0 1049624 0 0 0
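
(As a rough sanity check, dividing the column-9 values in the last snapshot
above gives about 1.27:1 rather than the expected 2:1:

    echo "scale=3; 1049624 / 827456" | bc    # prints 1.268

i.e. ioband1 got only about 27% more IO than ioband2.)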

Following are details of my test setup.
---------------------------------------
I took dm-ioband patch version 1.12.3 and applied it on 2.6.31-rc6.

Created the ioband devices using the following commands.
----------------------------------------------
echo "0 $(blockdev --getsize /dev/sdd1) ioband /dev/sdd1 1 0 0 none"
"weight 0 :200" | dmsetup create ioband1
echo "0 $(blockdev --getsize /dev/sdd2) ioband /dev/sdd2 1 0 0 none"
"weight 0 :100" | dmsetup create ioband2

mount /dev/mapper/ioband1 /mnt/sdd1
mount /dev/mapper/ioband2 /mnt/sdd2

Started two dd threads
======================
dd if=/mnt/sdd1/testzerofile1 of=/dev/null &
dd if=/mnt/sdd2/testzerofile1 of=/dev/null &

Output of dmsetup table command
================================
ioband2: 0 40355280 ioband 8:50 1 4 192 none weight 768 :100
ioband1: 0 37768752 ioband 8:49 1 4 192 none weight 768 :200

Thanks
Vivek

* Re: Regarding dm-ioband tests
From: Vivek Goyal @ 2009-09-01 17:47 UTC (permalink / raw)
  To: Ryo Tsuruta; +Cc: linux kernel mailing list, dm-devel

On Tue, Sep 01, 2009 at 12:50:11PM -0400, Vivek Goyal wrote:
> Hi Ryo,
> 
> I decided to play a bit more with dm-ioband and started doing some
> testing. I am doing a simple two dd threads doing reads and don't seem
> to be gettting the fairness. So thought will ask you what's the issue
> here. Is there an issue with my testing procedure.
> 
> I got one 40G SATA drive (no hardware queuing). I have created two
> partitions on that disk /dev/sdd1 and /dev/sdd2 and created two ioband
> devices ioband1 and ioband2 on partitions sdd1 and sdd2 respectively. The
> weights of ioband1 and ioband2 devices are 200 and 100 respectively. 
> 
> I am assuming that this setup will create two default groups and IO
> going to partition sdd1 should get double the BW of partition sdd2.
> 
> But it looks like I am not gettting that behavior. Following is the output
> of "dmsetup table" command. This snapshot has been taken every 2 seconds
> while IO was going on. Column 9 seems to be containing how many sectors
> of IO has been done on a particular io band device and group. Looking at
> the snapshot, it does not look like that ioband1 default group got double
> the BW of ioband2 default group.  
> 
> Am I doing something wrong here?
> 

I tried another variant of the test. This time I also created two additional
groups on the ioband1 device, linked these to cgroups test1 and test2, and
launched two dd threads in the two cgroups on device ioband1. There too I
don't seem to be getting the right fairness numbers for cgroups test1 and
test2.

Script to create ioband devices and additional groups
-----------------------------------------------------
echo "0 $(blockdev --getsize /dev/sdd1) ioband /dev/sdd1 1 0 0 none"
"weight 0 :200" | dmsetup create ioband1
echo "0 $(blockdev --getsize /dev/sdd2) ioband /dev/sdd2 1 0 0 none"
"weight 0 :100" | dmsetup create ioband2

# Some code to mount and create cgroups.
# Read group id
test1_id=`cat /cgroup/ioband/test1/blkio.id`
test2_id=`cat /cgroup/ioband/test2/blkio.id`

test1_weight=200
test2_weight=100

dmsetup message ioband1 0 type cgroup 
dmsetup message ioband1 0 attach $test1_id
dmsetup message ioband1 0 attach $test2_id
dmsetup message ioband1 0 weight $test1_id:$test1_weight
dmsetup message ioband1 0 weight $test2_id:$test2_weight

mount /dev/mapper/ioband1 /mnt/sdd1
mount /dev/mapper/ioband2 /mnt/sdd2
-----------------------------------------------------------------
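
For completeness, a rough sketch of the elided "mount and create cgroups"
step could look like the following (the "blkio" controller name is an
assumption here and depends on the blkio-cgroup patches in use):

# hypothetical sketch of the elided cgroup setup
mkdir -p /cgroup/ioband
mount -t cgroup -o blkio none /cgroup/ioband
mkdir /cgroup/ioband/test1 /cgroup/ioband/test2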

Following are two dd jobs
-------------------------
dd if=/mnt/sdd1/testzerofile1 of=/dev/null &
echo $! > /cgroup/ioband/test1/tasks

dd if=/mnt/sdd1/testzerofile2 of=/dev/null &
echo $! > /cgroup/ioband/test2/tasks


Following are "dmsetup status" results every 2 seconds
======================================================

ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0 2 0 0 0 0 0 0 3 0 0 0 0 0 0

ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0 2 0 0 0 0 0 0 3 0 0 0 0 0 0

ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0 2 689 0 86336 0 0 0 3 650 3
81472 0 0 0

ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0 2 1725 0 217024 0 0 0 3 1270
11 158912 0 0 0

ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0 2 2690 0 338744 0 0 0 3 1978
15 247856 0 0 0

ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0 2 3762 0 474040 0 0 0 3 2583
21 323736 0 0 0

ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0 2 4745 0 598064 0 0 0 3 3275
27 410392 0 0 0

ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0 2 5737 0 723120 0 0 0 3 3985
31 499592 0 0 0

ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0 2 6815 0 859184 0 0 0 3 4594
37 575864 0 0 0

ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0 2 7823 0 986288 0 0 0 3 5276
43 661360 0 0 0
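
As a quick check, the last status line above gives about 1.49:1 between
group 2 and group 3 (cumulative sectors), instead of 2:1:

    echo "scale=3; 986288 / 661360" | bc    # prints 1.491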

"dmsetup table" output
======================
ioband2: 0 40355280 ioband 8:50 1 4 192 none weight 768 :100
ioband1: 0 37768752 ioband 8:49 1 4 192 cgroup weight 768 :200 2:200 3:100

Because I am using the "weight" policy, I thought that the test1 cgroup with
id "2" would issue double the number of requests of the test2 cgroup with id
"3". But that does not seem to be happening here. Is there an issue with my
testing method?

Thanks
Vivek
 
> ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
> ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0
> 
> ioband2: 0 40355280 ioband 1 -1 96 0 11528 0 0 0
> ioband1: 0 37768752 ioband 1 -1 82 0 9736 0 0 0
> 
> ioband2: 0 40355280 ioband 1 -1 748 2 93032 0 0 0
> ioband1: 0 37768752 ioband 1 -1 896 0 112232 0 0 0
> 
> ioband2: 0 40355280 ioband 1 -1 1326 5 165816 0 0 0
> ioband1: 0 37768752 ioband 1 -1 1816 0 228312 0 0 0
> 
> ioband2: 0 40355280 ioband 1 -1 1943 6 243712 0 0 0
> ioband1: 0 37768752 ioband 1 -1 2692 0 338760 0 0 0
> 
> ioband2: 0 40355280 ioband 1 -1 2461 10 308576 0 0 0
> ioband1: 0 37768752 ioband 1 -1 3618 0 455608 0 0 0
> 
> ioband2: 0 40355280 ioband 1 -1 3118 11 391352 0 0 0
> ioband1: 0 37768752 ioband 1 -1 4406 0 555032 0 0 0
> 
> ioband2: 0 40355280 ioband 1 -1 3734 15 468760 0 0 0
> ioband1: 0 37768752 ioband 1 -1 5273 0 664328 0 0 0
> 
> ioband2: 0 40355280 ioband 1 -1 4307 17 540784 0 0 0
> ioband1: 0 37768752 ioband 1 -1 6181 0 778992 0 0 0
> 
> ioband2: 0 40355280 ioband 1 -1 4930 19 619208 0 0 0
> ioband1: 0 37768752 ioband 1 -1 7028 0 885728 0 0 0
> 
> ioband2: 0 40355280 ioband 1 -1 5599 22 703280 0 0 0
> ioband1: 0 37768752 ioband 1 -1 7815 0 985024 0 0 0
> 
> ioband2: 0 40355280 ioband 1 -1 6586 27 827456 0 0 0
> ioband1: 0 37768752 ioband 1 -1 8327 0 1049624 0 0 0
> 
> Following are details of my test setup.
> ---------------------------------------
> I took dm-ioband patch version 1.12.3 and applied on 2.6.31-rc6.
> 
> Created ioband devices using following command.
> ----------------------------------------------
> echo "0 $(blockdev --getsize /dev/sdd1) ioband /dev/sdd1 1 0 0 none"
> "weight 0 :200" | dmsetup create ioband1
> echo "0 $(blockdev --getsize /dev/sdd2) ioband /dev/sdd2 1 0 0 none"
> "weight 0 :100" | dmsetup create ioband2
> 
> mount /dev/mapper/ioband1 /mnt/sdd1
> mount /dev/mapper/ioband2 /mnt/sdd2
> 
> Started two dd threads
> ======================
> dd if=/mnt/sdd1/testzerofile1 of=/dev/null &
> dd if=/mnt/sdd2/testzerofile1 of=/dev/null &
> 
> Output of dmsetup table command
> ================================
> ioband2: 0 40355280 ioband 8:50 1 4 192 none weight 768 :100
> ioband1: 0 37768752 ioband 8:49 1 4 192 none weight 768 :200
> 
> Thanks
> Vivek

* Re: Regarding dm-ioband tests
From: Vivek Goyal @ 2009-09-03 13:11 UTC (permalink / raw)
  To: Ryo Tsuruta; +Cc: linux kernel mailing list, dm-devel

On Tue, Sep 01, 2009 at 01:47:24PM -0400, Vivek Goyal wrote:
> On Tue, Sep 01, 2009 at 12:50:11PM -0400, Vivek Goyal wrote:
> > Hi Ryo,
> > 
> > I decided to play a bit more with dm-ioband and started doing some
> > testing. I am doing a simple two dd threads doing reads and don't seem
> > to be gettting the fairness. So thought will ask you what's the issue
> > here. Is there an issue with my testing procedure.
> > 
> > I got one 40G SATA drive (no hardware queuing). I have created two
> > partitions on that disk /dev/sdd1 and /dev/sdd2 and created two ioband
> > devices ioband1 and ioband2 on partitions sdd1 and sdd2 respectively. The
> > weights of ioband1 and ioband2 devices are 200 and 100 respectively. 
> > 
> > I am assuming that this setup will create two default groups and IO
> > going to partition sdd1 should get double the BW of partition sdd2.
> > 
> > But it looks like I am not gettting that behavior. Following is the output
> > of "dmsetup table" command. This snapshot has been taken every 2 seconds
> > while IO was going on. Column 9 seems to be containing how many sectors
> > of IO has been done on a particular io band device and group. Looking at
> > the snapshot, it does not look like that ioband1 default group got double
> > the BW of ioband2 default group.  
> > 
> > Am I doing something wrong here?
> > 
> 

Hi Ryo,

Did you get a chance to look into it? Am I doing something wrong, or is it
an issue with dm-ioband?

Thanks
Vivek

> I tried another variant of test. This time I also created two additional
> groups on ioband1 devices and linked these to cgroups test1 and test2 and
> launched two dd threads in two cgroups on device ioband1. There also I
> don't seem to be getting the right fairness numbers for cgroup test1 and
> test2.
> 
> Script to create ioband devices and additional groups
> -----------------------------------------------------
> echo "0 $(blockdev --getsize /dev/sdd1) ioband /dev/sdd1 1 0 0 none"
> "weight 0 :200" | dmsetup create ioband1
> echo "0 $(blockdev --getsize /dev/sdd2) ioband /dev/sdd2 1 0 0 none"
> "weight 0 :100" | dmsetup create ioband2
> 
> # Some code to mount and create cgroups.
> # Read group id
> test1_id=`cat /cgroup/ioband/test1/blkio.id`
> test2_id=`cat /cgroup/ioband/test2/blkio.id`
> 
> test1_weight=200
> test2_weight=100
> 
> dmsetup message ioband1 0 type cgroup 
> dmsetup message ioband1 0 attach $test1_id
> dmsetup message ioband1 0 attach $test2_id
> dmsetup message ioband1 0 weight $test1_id:$test1_weight
> dmsetup message ioband1 0 weight $test2_id:$test2_weight
> 
> mount /dev/mapper/ioband1 /mnt/sdd1
> mount /dev/mapper/ioband2 /mnt/sdd2
> -----------------------------------------------------------------
> 
> Following are two dd jobs
> -------------------------
> dd if=/mnt/sdd1/testzerofile1 of=/dev/null &
> echo $! > /cgroup/ioband/test1/tasks
> 
> dd if=/mnt/sdd1/testzerofile2 of=/dev/null &
> echo $! > /cgroup/ioband/test2/tasks
> 
> 
> Following are "dmsetup status" results every 2 seconds
> ======================================================
> 
> ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
> ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0 2 0 0 0 0 0 0 3 0 0 0 0 0 0
> 
> ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
> ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0 2 0 0 0 0 0 0 3 0 0 0 0 0 0
> 
> ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
> ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0 2 689 0 86336 0 0 0 3 650 3
> 81472 0 0 0
> 
> ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
> ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0 2 1725 0 217024 0 0 0 3 1270
> 11 158912 0 0 0
> 
> ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
> ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0 2 2690 0 338744 0 0 0 3 1978
> 15 247856 0 0 0
> 
> ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
> ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0 2 3762 0 474040 0 0 0 3 2583
> 21 323736 0 0 0
> 
> ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
> ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0 2 4745 0 598064 0 0 0 3 3275
> 27 410392 0 0 0
> 
> ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
> ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0 2 5737 0 723120 0 0 0 3 3985
> 31 499592 0 0 0
> 
> ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
> ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0 2 6815 0 859184 0 0 0 3 4594
> 37 575864 0 0 0
> 
> ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
> ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0 2 7823 0 986288 0 0 0 3 5276
> 43 661360 0 0 0
> 
> "dmsetup table" output
> ======================
> ioband2: 0 40355280 ioband 8:50 1 4 192 none weight 768 :100
> ioband1: 0 37768752 ioband 8:49 1 4 192 cgroup weight 768 :200 2:200 3:100
> 
> Because I am using "weight" policy, I thought that test1 cgroup with id
> "2" will issue double the number of requests of cgroup test2 with id "3".
> But that does not seem to be happening here. Is there an issue with my
> testing method.
> 
> Thanks
> Vivek
>  
> > ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
> > ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0
> > 
> > ioband2: 0 40355280 ioband 1 -1 96 0 11528 0 0 0
> > ioband1: 0 37768752 ioband 1 -1 82 0 9736 0 0 0
> > 
> > ioband2: 0 40355280 ioband 1 -1 748 2 93032 0 0 0
> > ioband1: 0 37768752 ioband 1 -1 896 0 112232 0 0 0
> > 
> > ioband2: 0 40355280 ioband 1 -1 1326 5 165816 0 0 0
> > ioband1: 0 37768752 ioband 1 -1 1816 0 228312 0 0 0
> > 
> > ioband2: 0 40355280 ioband 1 -1 1943 6 243712 0 0 0
> > ioband1: 0 37768752 ioband 1 -1 2692 0 338760 0 0 0
> > 
> > ioband2: 0 40355280 ioband 1 -1 2461 10 308576 0 0 0
> > ioband1: 0 37768752 ioband 1 -1 3618 0 455608 0 0 0
> > 
> > ioband2: 0 40355280 ioband 1 -1 3118 11 391352 0 0 0
> > ioband1: 0 37768752 ioband 1 -1 4406 0 555032 0 0 0
> > 
> > ioband2: 0 40355280 ioband 1 -1 3734 15 468760 0 0 0
> > ioband1: 0 37768752 ioband 1 -1 5273 0 664328 0 0 0
> > 
> > ioband2: 0 40355280 ioband 1 -1 4307 17 540784 0 0 0
> > ioband1: 0 37768752 ioband 1 -1 6181 0 778992 0 0 0
> > 
> > ioband2: 0 40355280 ioband 1 -1 4930 19 619208 0 0 0
> > ioband1: 0 37768752 ioband 1 -1 7028 0 885728 0 0 0
> > 
> > ioband2: 0 40355280 ioband 1 -1 5599 22 703280 0 0 0
> > ioband1: 0 37768752 ioband 1 -1 7815 0 985024 0 0 0
> > 
> > ioband2: 0 40355280 ioband 1 -1 6586 27 827456 0 0 0
> > ioband1: 0 37768752 ioband 1 -1 8327 0 1049624 0 0 0
> > 
> > Following are details of my test setup.
> > ---------------------------------------
> > I took dm-ioband patch version 1.12.3 and applied on 2.6.31-rc6.
> > 
> > Created ioband devices using following command.
> > ----------------------------------------------
> > echo "0 $(blockdev --getsize /dev/sdd1) ioband /dev/sdd1 1 0 0 none"
> > "weight 0 :200" | dmsetup create ioband1
> > echo "0 $(blockdev --getsize /dev/sdd2) ioband /dev/sdd2 1 0 0 none"
> > "weight 0 :100" | dmsetup create ioband2
> > 
> > mount /dev/mapper/ioband1 /mnt/sdd1
> > mount /dev/mapper/ioband2 /mnt/sdd2
> > 
> > Started two dd threads
> > ======================
> > dd if=/mnt/sdd1/testzerofile1 of=/dev/null &
> > dd if=/mnt/sdd2/testzerofile1 of=/dev/null &
> > 
> > Output of dmsetup table command
> > ================================
> > ioband2: 0 40355280 ioband 8:50 1 4 192 none weight 768 :100
> > ioband1: 0 37768752 ioband 8:49 1 4 192 none weight 768 :200
> > 
> > Thanks
> > Vivek

* Re: Regarding dm-ioband tests
From: Ryo Tsuruta @ 2009-09-04  1:12 UTC (permalink / raw)
  To: vgoyal; +Cc: linux-kernel, dm-devel

Hi Vivek,

Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Sep 01, 2009 at 01:47:24PM -0400, Vivek Goyal wrote:
> > On Tue, Sep 01, 2009 at 12:50:11PM -0400, Vivek Goyal wrote:
> > > Hi Ryo,
> > > 
> > > I decided to play a bit more with dm-ioband and started doing some
> > > testing. I am doing a simple two dd threads doing reads and don't seem
> > > to be gettting the fairness. So thought will ask you what's the issue
> > > here. Is there an issue with my testing procedure.
> > > 
> > > I got one 40G SATA drive (no hardware queuing). I have created two
> > > partitions on that disk /dev/sdd1 and /dev/sdd2 and created two ioband
> > > devices ioband1 and ioband2 on partitions sdd1 and sdd2 respectively. The
> > > weights of ioband1 and ioband2 devices are 200 and 100 respectively. 
> > > 
> > > I am assuming that this setup will create two default groups and IO
> > > going to partition sdd1 should get double the BW of partition sdd2.
> > > 
> > > But it looks like I am not gettting that behavior. Following is the output
> > > of "dmsetup table" command. This snapshot has been taken every 2 seconds
> > > while IO was going on. Column 9 seems to be containing how many sectors
> > > of IO has been done on a particular io band device and group. Looking at
> > > the snapshot, it does not look like that ioband1 default group got double
> > > the BW of ioband2 default group.  
> > > 
> > > Am I doing something wrong here?
> > > 
> > 
> 
> Hi Ryo,
> 
> Did you get a chance to look into it? Am I doing something wrong or it is
> an issue with dm-ioband.

Sorry, I missed it. I'll look into it and report back to you.

Thanks,
Ryo Tsuruta

* Re: Regarding dm-ioband tests
From: Ryo Tsuruta @ 2009-09-04  4:02 UTC (permalink / raw)
  To: vgoyal; +Cc: linux-kernel, dm-devel

Hi Vivek,

Vivek Goyal <vgoyal@redhat.com> wrote:
> Hi Ryo,
> 
> I decided to play a bit more with dm-ioband and started doing some
> testing. I am doing a simple two dd threads doing reads and don't seem
> to be gettting the fairness. So thought will ask you what's the issue
> here. Is there an issue with my testing procedure.

Thank you for testing dm-ioband. dm-ioband is designed to start
throttling bandwidth when multiple IO requests are issued to devices
simultaneously, IOW, to start throttling when IO load exceeds a
certain level.

Here is my test script that runs multiple dd threads on each
directory. Each directory stores 20 files of 2GB.

    #!/bin/sh
    tmout=60

    for nr_threads in 1 4 8 12 16 20; do
            sync; echo 3 > /proc/sys/vm/drop_caches

            for i in $(seq $nr_threads); do
                    dd if=/mnt1/ioband1.${i}.0 of=/dev/null &
                    dd if=/mnt2/ioband2.${i}.0 of=/dev/null &
            done
            iostat -k 1 $tmout > ${nr_threads}.log
            killall -ws TERM dd
    done
    exit 0
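
The per-device averages below can presumably be extracted from the iostat
logs with something along these lines (just a sketch; it assumes the ioband
devices show up as dm-0 and dm-1 and that kB_read/s is the third column of
this iostat output):

    for dev in dm-0 dm-1; do
            awk -v d=$dev '$1 == d { sum += $3; n++ }
                    END { if (n) printf "%s %.0f KB/s\n", d, sum / n }' 8.log
    done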

Here is the result. The average throughputs of the two devices follow the
proportion of the weight settings once the number of threads is four or
more.

              Average throughput in 60 seconds [KB/s]

              ioband1           ioband2
  threads    weight 200        weight 100       total
        1   26642 (54.9%)     21925 (45.1%)     48568
        4   33974 (67.7%)     16181 (32.3%)     50156
        8   31952 (66.2%)     16297 (33.8%)     48249
       12   32062 (67.8%)     15236 (32.2%)     47299
       16   31780 (67.7%)     15165 (32.3%)     46946
       20   29955 (66.3%)     15239 (33.7%)     45195

Please try to run the above script in your environment; I would be
glad if you let me know the result.

Thanks,
Ryo Tsuruta

* Re: Regarding dm-ioband tests
From: Vivek Goyal @ 2009-09-04 23:11 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: linux-kernel, dm-devel, Jens Axboe, Alasdair G Kergon,
	Morton Andrew Morton, Nauman Rafique, Gui Jianfeng, Rik Van Riel,
	Moyer Jeff Moyer, Balbir Singh

On Fri, Sep 04, 2009 at 01:02:28PM +0900, Ryo Tsuruta wrote:
> Hi Vivek,
> 

> Vivek Goyal <vgoyal@redhat.com> wrote:
> > Hi Ryo,
> > 
> > I decided to play a bit more with dm-ioband and started doing some
> > testing. I am doing a simple two dd threads doing reads and don't seem
> > to be gettting the fairness. So thought will ask you what's the issue
> > here. Is there an issue with my testing procedure.

[ CCing relevant folks on the thread, as the question of what fairness
  means is becoming interesting ]

> 
> Thank you for testing dm-ioband. dm-ioband is designed to start
> throttling bandwidth when multiple IO requests are issued to devices
> simultaneously, IOW, to start throttling when IO load exceeds a
> certain level.
> 

What is that certain level? Secondly, what's the advantage of this?

I can see disadvantages though. So unless a group is really busy "up to
that certain level", it will not get fairness? That breaks the isolation
between groups.

> Here is my test script that runs multiple dd threads on each
> directory. Each directory stores 20 files of 2GB.
> 
>     #!/bin/sh
>     tmout=60
> 
>     for nr_threads in 1 4 8 12 16 20; do
>             sync; echo 3 > /proc/sys/vm/drop_caches
> 
>             for i in $(seq $nr_threads); do
>                     dd if=/mnt1/ioband1.${i}.0 of=/dev/null &
>                     dd if=/mnt2/ioband2.${i}.0 of=/dev/null &
>             done
>             iostat -k 1 $tmout > ${nr_threads}.log
>             killall -ws TERM dd
>     done
>     exit 0
> 
> Here is the result. The average throughputs of each device are
> according to the proportion of the weight settings when the number of
> thread is over four.
> 
>               Average thoughput in 60 seconds [KB/s]
> 
>               ioband1           ioband2
>   threads    weight 200        weight 100       total
>         1   26642 (54.9%)     21925 (45.1%)     48568
>         4   33974 (67.7%)     16181 (32.3%)     50156
>         8   31952 (66.2%)     16297 (33.8%)     48249
>        12   32062 (67.8%)     15236 (32.2%)     47299
>        16   31780 (67.7%)     15165 (32.3%)     46946
>        20   29955 (66.3%)     15239 (33.7%)     45195
> 
> Please try to run the above script on your envirionment and I would be
> glad if you let me know the result.

I ran my simple dd test again with two ioband devices of weight 200
(ioband1) and 100 (ioband2) respectively. I launched four sequential dd
readers on ioband2 and one sequential dd reader on ioband1.
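
Roughly, the reader layout for this run was (the file names on sdd2 are
just illustrative here):

dd if=/mnt/sdd1/testzerofile1 of=/dev/null &          # 1 reader on ioband1
for i in 1 2 3 4; do
        dd if=/mnt/sdd2/testzerofile$i of=/dev/null & # 4 readers on ioband2
done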

Now if we are providing isolation between groups, then ioband1 should get
double the bandwidth of ioband2. But that does not happen. Following is
the output of the "dmsetup table" command.

Fri Sep  4 18:02:01 EDT 2009
ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0

Fri Sep  4 18:02:08 EDT 2009
ioband2: 0 40355280 ioband 1 -1 2031 32 250280 0 0 0
ioband1: 0 37768752 ioband 1 -1 1484 0 186544 0 0 0

Fri Sep  4 18:02:12 EDT 2009
ioband2: 0 40355280 ioband 1 -1 3541 64 437192 0 0 0
ioband1: 0 37768752 ioband 1 -1 2802 0 352728 0 0 0

Fri Sep  4 18:02:16 EDT 2009
ioband2: 0 40355280 ioband 1 -1 5200 87 644144 0 0 0
ioband1: 0 37768752 ioband 1 -1 4003 0 504296 0 0 0

Fri Sep  4 18:02:20 EDT 2009
ioband2: 0 40355280 ioband 1 -1 7632 111 948232 0 0 0
ioband1: 0 37768752 ioband 1 -1 4494 0 566080 0 0 0

This seems to break the isolation between the two groups. Now if there
is one bad group with lots of readers and writers at a lower weight, it will
overwhelm a higher-weight group that has only 1-2 sequential readers or some
random readers running, etc.

If there are lots of readers running in one group and then a small-file
reader comes in a different group of higher priority, it will not get any
fairness and the latency of the file read will be very high. But one would
expect that groups provide isolation, and that the latency of the small-file
reader does not increase with the number of readers in a low-priority group.

I also ran your test of doing heavy IO in two groups. This time I am
running 4 dd threads on each of the two ioband devices. Following are
snapshots of the "dmsetup table" output.

Fri Sep  4 17:45:27 EDT 2009
ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0

Fri Sep  4 17:45:29 EDT 2009
ioband2: 0 40355280 ioband 1 -1 41 0 4184 0 0 0
ioband1: 0 37768752 ioband 1 -1 173 0 20096 0 0 0

Fri Sep  4 17:45:37 EDT 2009
ioband2: 0 40355280 ioband 1 -1 1605 23 197976 0 0 0
ioband1: 0 37768752 ioband 1 -1 4640 1 583168 0 0 0

Fri Sep  4 17:45:45 EDT 2009
ioband2: 0 40355280 ioband 1 -1 3650 47 453488 0 0 0
ioband1: 0 37768752 ioband 1 -1 8572 1 1079144 0 0 0

Fri Sep  4 17:45:51 EDT 2009
ioband2: 0 40355280 ioband 1 -1 5111 68 635696 0 0 0
ioband1: 0 37768752 ioband 1 -1 11587 1 1459544 0 0 0

Fri Sep  4 17:45:53 EDT 2009
ioband2: 0 40355280 ioband 1 -1 5698 73 709272 0 0 0
ioband1: 0 37768752 ioband 1 -1 12503 1 1575112 0 0 0

Fri Sep  4 17:45:57 EDT 2009
ioband2: 0 40355280 ioband 1 -1 6790 87 845808 0 0 0
ioband1: 0 37768752 ioband 1 -1 14395 2 1813680 0 0 0

Note, it took more than 20 seconds (since I started the threads) to
get close to the desired fairness level. That's too long a duration. Again,
random readers or small-file readers are completely out of the picture for
any kind of fairness, and are not protected at all by the dm-ioband
controller.

I think there are serious issues with the notion of fairness and with the
kind of isolation dm-ioband provides between groups, and this should be
looked into.

Thanks
Vivek

* Re: Regarding dm-ioband tests
From: Ryo Tsuruta @ 2009-09-07 11:02 UTC (permalink / raw)
  To: vgoyal
  Cc: linux-kernel, dm-devel, jens.axboe, agk, akpm, nauman,
	guijianfeng, riel, jmoyer, balbir

Hi Vivek,

Vivek Goyal <vgoyal@redhat.com> wrote:
> > Thank you for testing dm-ioband. dm-ioband is designed to start
> > throttling bandwidth when multiple IO requests are issued to devices
> > simultaneously, IOW, to start throttling when IO load exceeds a
> > certain level.
> > 
> 
> What is that certain level? Secondly what's the advantage of this?
> 
> I can see disadvantages though. So unless a group is really busy "up to
> that certain level" it will not get fairness? I breaks the isolation
> between groups.

In your test case, more than one dd thread has to run simultaneously
in the higher-weight group. The reason is that if there is an IO group
which does not issue a certain number of IO requests, dm-ioband assumes
the IO group is inactive and assigns its spare bandwidth to active IO
groups. Then the whole bandwidth of the device can be used efficiently.
Please run two dd threads in the higher-weight group and it will work as
you expect.
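
For example, something like this on the ioband1 mount, alongside the
existing readers on ioband2, should keep the higher-weight group active
(just a sketch, reusing the file names from the earlier mails):

dd if=/mnt/sdd1/testzerofile1 of=/dev/null &
dd if=/mnt/sdd1/testzerofile2 of=/dev/null &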

However, if you want to get fairness in a case like this, a new
bandwidth control policy which controls bandwidth accurately according to
the assigned weights could be added to dm-ioband.

> I also ran your test of doing heavy IO in two groups. This time I am
> running 4 dd threads in both the ioband devices. Following is the snapshot
> of "dmsetup table" output.
>
> Fri Sep  4 17:45:27 EDT 2009
> ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
> ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0
> 
> Fri Sep  4 17:45:29 EDT 2009
> ioband2: 0 40355280 ioband 1 -1 41 0 4184 0 0 0
> ioband1: 0 37768752 ioband 1 -1 173 0 20096 0 0 0
> 
> Fri Sep  4 17:45:37 EDT 2009
> ioband2: 0 40355280 ioband 1 -1 1605 23 197976 0 0 0
> ioband1: 0 37768752 ioband 1 -1 4640 1 583168 0 0 0
> 
> Fri Sep  4 17:45:45 EDT 2009
> ioband2: 0 40355280 ioband 1 -1 3650 47 453488 0 0 0
> ioband1: 0 37768752 ioband 1 -1 8572 1 1079144 0 0 0
> 
> Fri Sep  4 17:45:51 EDT 2009
> ioband2: 0 40355280 ioband 1 -1 5111 68 635696 0 0 0
> ioband1: 0 37768752 ioband 1 -1 11587 1 1459544 0 0 0
> 
> Fri Sep  4 17:45:53 EDT 2009
> ioband2: 0 40355280 ioband 1 -1 5698 73 709272 0 0 0
> ioband1: 0 37768752 ioband 1 -1 12503 1 1575112 0 0 0
> 
> Fri Sep  4 17:45:57 EDT 2009
> ioband2: 0 40355280 ioband 1 -1 6790 87 845808 0 0 0
> ioband1: 0 37768752 ioband 1 -1 14395 2 1813680 0 0 0
> 
> Note, it took more than 20 seconds (since I started the threads) to
> get close to the desired fairness level. That's too long a duration.

We prioritized reducing throughput loss over reducing this duration
in the design of dm-ioband. Of course, it is possible to add a new
policy which reduces the duration.

Thanks,
Ryo Tsuruta

> Again, random readers or small-file readers are completely out of the picture for
> any kind of fairness and are not protected at all by the dm-ioband
> controller.
> 
> I think there are serious issues with the notion of fairness and the kind of
> isolation dm-ioband provides between groups, and this should be looked into.
> 
> Thanks
> Vivek

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-07 11:02       ` Ryo Tsuruta
@ 2009-09-07 13:53         ` Rik van Riel
  -1 siblings, 0 replies; 80+ messages in thread
From: Rik van Riel @ 2009-09-07 13:53 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: vgoyal, linux-kernel, dm-devel, jens.axboe, agk, akpm, nauman,
	guijianfeng, jmoyer, balbir

Ryo Tsuruta wrote:

> However, if you want to get fairness in a case like this, a new
> bandwidth control policy which controls accurately according to
> assigned weights can be added to dm-ioband.

Are you saying that dm-ioband is purposely unfair,
until a certain load level is reached?

> We regarded reducing throughput loss rather than reducing duration
> as the design of dm-ioband. Of course, it is possible to make a new
> policy which reduces duration.

... while also reducing overall system throughput
by design?

Why are you even bothering to submit this to the
linux-kernel mailing list, when there is a codebase
available that has no throughput or fairness regressions?
(Vivek's io scheduler based io controller)

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-07 13:53         ` Rik van Riel
@ 2009-09-08  3:01           ` Ryo Tsuruta
  -1 siblings, 0 replies; 80+ messages in thread
From: Ryo Tsuruta @ 2009-09-08  3:01 UTC (permalink / raw)
  To: riel
  Cc: vgoyal, linux-kernel, dm-devel, jens.axboe, agk, akpm, nauman,
	guijianfeng, jmoyer, balbir

Hi Rik,

Rik van Riel <riel@redhat.com> wrote:
> Ryo Tsuruta wrote:
> 
> > However, if you want to get fairness in a case like this, a new
> > bandwidth control policy which controls accurately according to
> > assigned weights can be added to dm-ioband.
> 
> Are you saying that dm-ioband is purposely unfair,
> until a certain load level is reached?

It is not unfair; dm-ioband's weight policy is intentionally designed to
use bandwidth efficiently: it tries to give the spare bandwidth
of inactive groups to active groups.

> > We regarded reducing throughput loss rather than reducing duration
> > as the design of dm-ioband. Of course, it is possible to make a new
> > policy which reduces duration.
> 
> ... while also reducing overall system throughput
> by design?

I think it would reduce system throughput compared to the current
implementation, because fine-grained control causes more
overhead.

> Why are you even bothering to submit this to the
> linux-kernel mailing list, when there is a codebase
> available that has no throughput or fairness regressions?
> (Vivek's io scheduler based io controler)

I think there are some advantages to dm-ioband. That's why I posted
dm-ioband to the mailing list.

- dm-ioband supports not only a proportional weight policy but also a rate
  limiting policy. Besides, new policies can be added to dm-ioband if
  a user wants to control bandwidth with his or her own policy.
- The dm-ioband driver can be replaced without stopping the system by
  using device-mapper's facility (see the sketch below). It's easy to maintain.
- dm-ioband can be used without cgroup. (I remember Vivek said it's not an
  advantage.)
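
As a rough, untested sketch of the live replacement mentioned above, an
ioband device can be swapped for a plain pass-through table while it
stays in use (and swapped back the same way):

# temporarily drop bandwidth control on ioband1 without unmounting it
dmsetup suspend ioband1
echo "0 $(blockdev --getsize /dev/sdd1) linear /dev/sdd1 0" | dmsetup load ioband1
dmsetup resume ioband1

Loading the previous ioband table with "dmsetup load" and resuming again
re-enables the bandwidth control.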

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-08  3:01           ` Ryo Tsuruta
@ 2009-09-08  3:22             ` Balbir Singh
  -1 siblings, 0 replies; 80+ messages in thread
From: Balbir Singh @ 2009-09-08  3:22 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: riel, vgoyal, linux-kernel, dm-devel, jens.axboe, agk, akpm,
	nauman, guijianfeng, jmoyer

* Ryo Tsuruta <ryov@valinux.co.jp> [2009-09-08 12:01:19]:

> I think there are some advantages to dm-ioband. That's why I post
> dm-ioband to the mailing list.
> 
> - dm-ioband supports not only proportional weight policy but also rate
>   limiting policy. Besides, new policies can be added to dm-ioband if
>   a user wants to control bandwidth by his or her own policy.
> - The dm-ioband driver can be replaced without stopping the system by
>   using device-mapper's facility. It's easy to maintain.
> - dm-ioband can use without cgroup. (I remember Vivek said it's not an
>   advantage.)

But don't you need page_cgroup for IO tracking?

-- 
	Balbir

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-08  3:22             ` Balbir Singh
@ 2009-09-08  5:05               ` Ryo Tsuruta
  -1 siblings, 0 replies; 80+ messages in thread
From: Ryo Tsuruta @ 2009-09-08  5:05 UTC (permalink / raw)
  To: balbir
  Cc: riel, vgoyal, linux-kernel, dm-devel, jens.axboe, agk, akpm,
	nauman, guijianfeng, jmoyer

Hi Balbir,

Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * Ryo Tsuruta <ryov@valinux.co.jp> [2009-09-08 12:01:19]:
> 
> > I think there are some advantages to dm-ioband. That's why I post
> > dm-ioband to the mailing list.
> > 
> > - dm-ioband supports not only proportional weight policy but also rate
> >   limiting policy. Besides, new policies can be added to dm-ioband if
> >   a user wants to control bandwidth by his or her own policy.
> > - The dm-ioband driver can be replaced without stopping the system by
> >   using device-mapper's facility. It's easy to maintain.
> > - dm-ioband can use without cgroup. (I remember Vivek said it's not an
> >   advantage.)
> 
> But don't you need page_cgroup for IO tracking?

It is not necessary when controlling bandwidth on a per-partition
basis or on a per-IO-thread basis, like the Xen blkback kernel thread.

Here are configuration examples.
http://sourceforge.net/apps/trac/ioband/wiki/dm-ioband/man/examples

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-08  3:01           ` Ryo Tsuruta
@ 2009-09-08 13:42             ` Vivek Goyal
  -1 siblings, 0 replies; 80+ messages in thread
From: Vivek Goyal @ 2009-09-08 13:42 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: riel, linux-kernel, dm-devel, jens.axboe, agk, akpm, nauman,
	guijianfeng, jmoyer, balbir

On Tue, Sep 08, 2009 at 12:01:19PM +0900, Ryo Tsuruta wrote:
> Hi Rik,
> 
> Rik van Riel <riel@redhat.com> wrote:
> > Ryo Tsuruta wrote:
> > 
> > > However, if you want to get fairness in a case like this, a new
> > > bandwidth control policy which controls accurately according to
> > > assigned weights can be added to dm-ioband.
> > 
> > Are you saying that dm-ioband is purposely unfair,
> > until a certain load level is reached?
> 
> Not unfair, dm-ioband(weight policy) is intentionally designed to
> use bandwidth efficiently, weight policy tries to give spare bandwidth
> of inactive groups to active groups.
> 

This group is running a sequential reader. How can you call it an inactive
group?

I think the whole problem is that you have not taken idling into
account, the way CFQ does.

Your solution seems to be designed only for processes doing bulk IO over
a very long period of time. I think it limits the usefulness of the
solution severely.

> > > We regarded reducing throughput loss rather than reducing duration
> > > as the design of dm-ioband. Of course, it is possible to make a new
> > > policy which reduces duration.
> > 
> > ... while also reducing overall system throughput
> > by design?
> 
> I think it reduces system throughput compared to the current
> implementation, because it causes more overhead to do fine grained
> control. 
> 
> > Why are you even bothering to submit this to the
> > linux-kernel mailing list, when there is a codebase
> > available that has no throughput or fairness regressions?
> > (Vivek's io scheduler based io controler)
> 
> I think there are some advantages to dm-ioband. That's why I post
> dm-ioband to the mailing list.
> 
> - dm-ioband supports not only proportional weight policy but also rate
>   limiting policy. Besides, new policies can be added to dm-ioband if
>   a user wants to control bandwidth by his or her own policy.

I think we can easily extend the io scheduler based controller to also
support a max rate per group policy. That should not be too hard. It is
only a matter of keeping track of the io rate per group and, if a group
is exceeding its rate, scheduling it out and moving on to the next group.

I can do that once the proportional weight solution is stabilized and gets
merged.

So it's not an advantage of dm-ioband.

> - The dm-ioband driver can be replaced without stopping the system by
>   using device-mapper's facility. It's easy to maintain.

We have talked about this point in the past as well. With the io scheduler
based controller, just move all the tasks to the root group and you get a
system not doing any io control.

By the way, why would one want to do that?

So this is also not an advantage.

> - dm-ioband can use without cgroup. (I remember Vivek said it's not an
>   advantage.)

I think this is more of a disadvantage than an advantage. We have very well
defined cgroup functionality in the kernel to group tasks. Now you are
coming up with your own method of grouping tasks, which will make life
even more confusing for users and application writers.

I don't understand which core requirement of yours is not met by the io
scheduler based io controller. Range policy control you have implemented
only recently. I don't think that removing the dm-ioband module
dynamically is a core requirement. Also, whatever you can do with an
additional grouping mechanism, you can do with cgroups as well.

So if there is any core functionality of yours which is not fulfilled by
the io scheduler based controller, please let me know. I will be happy to
look into it and try to provide that feature. But looking at the above
list, I am not convinced that any of it is a compelling argument for
dm-ioband's inclusion.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-08  5:05               ` Ryo Tsuruta
@ 2009-09-08 13:49                 ` Vivek Goyal
  -1 siblings, 0 replies; 80+ messages in thread
From: Vivek Goyal @ 2009-09-08 13:49 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: balbir, riel, linux-kernel, dm-devel, jens.axboe, agk, akpm,
	nauman, guijianfeng, jmoyer

On Tue, Sep 08, 2009 at 02:05:16PM +0900, Ryo Tsuruta wrote:
> Hi Balbir,
> 
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > * Ryo Tsuruta <ryov@valinux.co.jp> [2009-09-08 12:01:19]:
> > 
> > > I think there are some advantages to dm-ioband. That's why I post
> > > dm-ioband to the mailing list.
> > > 
> > > - dm-ioband supports not only proportional weight policy but also rate
> > >   limiting policy. Besides, new policies can be added to dm-ioband if
> > >   a user wants to control bandwidth by his or her own policy.
> > > - The dm-ioband driver can be replaced without stopping the system by
> > >   using device-mapper's facility. It's easy to maintain.
> > > - dm-ioband can use without cgroup. (I remember Vivek said it's not an
> > >   advantage.)
> > 
> > But don't you need page_cgroup for IO tracking?
> 
> It is not necessary when controlling bandwidth on a per partition
> basis or on a IO thread basis like Xen blkback kernel thread.
> 
> Here are configration examples.
> http://sourceforge.net/apps/trac/ioband/wiki/dm-ioband/man/examples
> 

For partition-based control, where a thread or group of threads is doing
IO to a specific partition, why can't you simply create a different cgroup
for each partition and move the threads into those cgroups?


			root
		 	/ | \
		    sda1 sda2 sda3

Above are three groups; move the threads doing IO into those groups and
the problem is solved. In fact that's what one will do for KVM virtual
machines: move all the qemu helper threads doing IO for a virtual machine
instance into a specific group and control the IO.
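
With the standard cgroup interface that is just a couple of commands per
partition. A rough sketch (the "io" controller name and the io.weight file
are only illustrative here; the exact names depend on the io controller):

mount -t cgroup -o io none /cgroup            # mount the io controller hierarchy
mkdir /cgroup/sda1 /cgroup/sda2 /cgroup/sda3  # one group per partition
echo $PID_OF_IO_THREAD > /cgroup/sda1/tasks   # move a thread into the sda1 group
echo 200 > /cgroup/sda1/io.weight             # hypothetical per-group weight file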

Why do you have to come up with an additional, complicated grouping mechanism?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-08 13:42             ` Vivek Goyal
@ 2009-09-08 16:30               ` Nauman Rafique
  -1 siblings, 0 replies; 80+ messages in thread
From: Nauman Rafique @ 2009-09-08 16:30 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Ryo Tsuruta, riel, linux-kernel, dm-devel, jens.axboe, agk, akpm,
	guijianfeng, jmoyer, balbir

On Tue, Sep 8, 2009 at 6:42 AM, Vivek Goyal<vgoyal@redhat.com> wrote:
> On Tue, Sep 08, 2009 at 12:01:19PM +0900, Ryo Tsuruta wrote:
>> Hi Rik,
>>
>> Rik van Riel <riel@redhat.com> wrote:
>> > Ryo Tsuruta wrote:
>> >
>> > > However, if you want to get fairness in a case like this, a new
>> > > bandwidth control policy which controls accurately according to
>> > > assigned weights can be added to dm-ioband.
>> >
>> > Are you saying that dm-ioband is purposely unfair,
>> > until a certain load level is reached?
>>
>> Not unfair, dm-ioband(weight policy) is intentionally designed to
>> use bandwidth efficiently, weight policy tries to give spare bandwidth
>> of inactive groups to active groups.
>>
>
> This group is running a sequential reader. How can you call it an inactive
> group?
>
> I think that whole problem is that like CFQ you have not taken care of
> idling into account.

I think this is probably the key deal breaker. dm-ioband has no
mechanism to anticipate or idle for a reader task. Without such a
mechanism, a proportional division scheme cannot work for tasks doing
reads. Most readers do not send down more than one IO at a time, and
they do not send another until the previous one is complete.
Anticipation helps in this case, as we would wait for the task to send
down a new IO before we expire its timeslice. Without anticipation,
we would serve the one IO from the reader and then go on to serve IOs from
other tasks. When the reader finally got around to sending its next
IO, it would have to wait behind the other IOs that had been sent down in
the meanwhile.

IO schedulers in the block layer have anticipation built into them, so a
proportional scheduling scheme at that layer does not have to
repeat the logic or data structures for anticipation.

In fact, a rate limiting mechanism like dm-ioband can potentially
break the anticipation logic of the IO schedulers, by queuing up IOs
at an upper layer while the scheduler in the block layer could have been
anticipating for them.

>
> Your solution seems to be designed only for processes doing bulk IO over
> a very long period of time. I think it limits the usefulness of solution
> severely.
>
>> > > We regarded reducing throughput loss rather than reducing duration
>> > > as the design of dm-ioband. Of course, it is possible to make a new
>> > > policy which reduces duration.
>> >
>> > ... while also reducing overall system throughput
>> > by design?
>>
>> I think it reduces system throughput compared to the current
>> implementation, because it causes more overhead to do fine grained
>> control.
>>
>> > Why are you even bothering to submit this to the
>> > linux-kernel mailing list, when there is a codebase
>> > available that has no throughput or fairness regressions?
>> > (Vivek's io scheduler based io controler)
>>
>> I think there are some advantages to dm-ioband. That's why I post
>> dm-ioband to the mailing list.
>>
>> - dm-ioband supports not only proportional weight policy but also rate
>>   limiting policy. Besides, new policies can be added to dm-ioband if
>>   a user wants to control bandwidth by his or her own policy.
>
> I think we can easily extent io scheduler based controller to also support
> max rate per group policy also. That should not be too hard. It is a
> matter of only keeping track of io rate per group and if a group is
> exceeding the rate, then schedule it out and move on to next group.

At Google, we have implemented a rate limiting mechanism on top of
Vivek's patches, and have been testing it. But I feel like the patch
set maintained by Vivek is pretty big already. Once we have those
patches merged, we can introduce more functionality.

>
> I can do that once proportional weight solution is stablized and gets
> merged.
>
> So its not an advantage of dm-ioband.
>
>> - The dm-ioband driver can be replaced without stopping the system by
>>   using device-mapper's facility. It's easy to maintain.
>
> We talked about this point in the past also. In io scheduler based
> controller, just move all the tasks to root group and you got a system
> not doing any io control.
>
> By the way why would one like to do that?
>
> So this is also not an advantage.
>
>> - dm-ioband can use without cgroup. (I remember Vivek said it's not an
>>   advantage.)
>
> I think this is more of a disadvantage than advantage. We have a very well
> defined functionality of cgroup in kernel to group the tasks. Now you are
> coming up with your own method of grouping the tasks which will make life
> even more confusing for users and application writers.
>
> I don't understand what is that core requirement of yours which is not met
> by io scheduler based io controller. range policy control you have
> implemented recently. I don't think that removing dm-ioband module
> dynamically is core requirement. Also whatever you can do with additional
> grouping mechanism, you can do with cgroup also.
>
> So if there is any of your core functionality which is not fulfilled by
> io scheduler based controller, please let me know. I will be happy to look
> into it and try to provide that feature. But looking at above list, I am
> not convinced that any of the above is a compelling argument for dm-ioband
> inclusion.
>
> Thanks
> Vivek
>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-08 16:30               ` Nauman Rafique
@ 2009-09-08 16:47                 ` Rik van Riel
  -1 siblings, 0 replies; 80+ messages in thread
From: Rik van Riel @ 2009-09-08 16:47 UTC (permalink / raw)
  To: Nauman Rafique
  Cc: Vivek Goyal, Ryo Tsuruta, linux-kernel, dm-devel, jens.axboe,
	agk, akpm, guijianfeng, jmoyer, balbir

Nauman Rafique wrote:

> I think this is probably the key deal breaker. dm-ioband has no
> mechanism to anticipate or idle for a reader task. Without such a
> mechanism, a proportional division scheme cannot work for tasks doing
> reads. 

That is a really big issue, since most reads tend to be synchronous
(the application is waiting for the read), while many writes are not
(the application is doing something else while the data is written).

Having writes take precedence over reads will really screw over the
readers, while not benefitting the writers all that much.

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-08 13:42             ` Vivek Goyal
  (?)
  (?)
@ 2009-09-08 17:06             ` Dhaval Giani
  2009-09-09  6:05                 ` Ryo Tsuruta
  -1 siblings, 1 reply; 80+ messages in thread
From: Dhaval Giani @ 2009-09-08 17:06 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Ryo Tsuruta, riel, linux-kernel, dm-devel, jens.axboe, agk, akpm,
	nauman, guijianfeng, jmoyer, balbir

> > - dm-ioband can use without cgroup. (I remember Vivek said it's not an
> >   advantage.)
> 
> I think this is more of a disadvantage than advantage. We have a very well
> defined functionality of cgroup in kernel to group the tasks. Now you are
> coming up with your own method of grouping the tasks which will make life
> even more confusing for users and application writers.
> 

I would tend to agree with this. With the other resource management
controllers using cgroups, having dm-ioband use something different will
require a different set of userspace tools/libraries,
something that will severely limit its usefulness from a programmer's
perspective.

thanks,
-- 
regards,
Dhaval

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-08 16:47                 ` Rik van Riel
@ 2009-09-08 17:54                   ` Vivek Goyal
  -1 siblings, 0 replies; 80+ messages in thread
From: Vivek Goyal @ 2009-09-08 17:54 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Nauman Rafique, Ryo Tsuruta, linux-kernel, dm-devel, jens.axboe,
	agk, akpm, guijianfeng, jmoyer, balbir

On Tue, Sep 08, 2009 at 12:47:33PM -0400, Rik van Riel wrote:
> Nauman Rafique wrote:
>
>> I think this is probably the key deal breaker. dm-ioband has no
>> mechanism to anticipate or idle for a reader task. Without such a
>> mechanism, a proportional division scheme cannot work for tasks doing
>> reads. 
>
> That is a really big issue, since most reads tend to be synchronous
> (the application is waiting for the read), while many writes are not
> (the application is doing something else while the data is written).
>
> Having writes take precedence over reads will really screw over the
> readers, while not benefitting the writers all that much.
>

I ran a test to show how readers can be starved in certain cases. I launched
one reader and three writers. I ran this test twice: first without dm-ioband
and then with dm-ioband.

Following are a few lines from the script that launches the readers and
writers.

**************************************************************
sync
echo 3 > /proc/sys/vm/drop_caches

# Launch one writer on sdd2
dd if=/dev/zero of=/mnt/sdd2/writezerofile1 bs=4K count=262144 &

# Launch two writers on sdd1
dd if=/dev/zero of=/mnt/sdd1/writezerofile1 bs=4K count=262144 &
dd if=/dev/zero of=/mnt/sdd1/writezerofile2 bs=4K count=262144 &

echo "sleeping for 5 seconds"
sleep 5

# launch reader on sdd1
time dd if=/mnt/sdd1/testzerofile1 of=/dev/zero &
echo "launched reader $!"
*********************************************************************

Without dm-ioband, the reader finished in roughly 5 seconds.

289533952 bytes (290 MB) copied, 5.16765 s, 56.0 MB/s
real	0m5.300s
user	0m0.098s
sys	0m0.492s

With dm-ioband, the reader took more than 2 minutes to finish.

289533952 bytes (290 MB) copied, 122.386 s, 2.4 MB/s

real	2m2.569s
user	0m0.107s
sys	0m0.548s

I had created ioband1 on /dev/sdd1 and ioband2 on /dev/sdd2 with weights
200 and 100 respectively.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-08  3:01           ` Ryo Tsuruta
@ 2009-09-08 19:24             ` Rik van Riel
  -1 siblings, 0 replies; 80+ messages in thread
From: Rik van Riel @ 2009-09-08 19:24 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: vgoyal, linux-kernel, dm-devel, jens.axboe, agk, akpm, nauman,
	guijianfeng, jmoyer, balbir

Ryo Tsuruta wrote:
> Rik van Riel <riel@redhat.com> wrote:

>> Are you saying that dm-ioband is purposely unfair,
>> until a certain load level is reached?
> 
> Not unfair, dm-ioband(weight policy) is intentionally designed to
> use bandwidth efficiently, weight policy tries to give spare bandwidth
> of inactive groups to active groups.

This sounds good, except that the lack of anticipation
means that a group with just one task doing reads will
be considered "inactive" in-between reads.

This means writes can always get in-between two reads,
sometimes multiple writes at a time, really disadvantaging
a group that is doing just disk reads.

This is a problem, because reads are generally more time
sensitive than writes.

>>> We regarded reducing throughput loss rather than reducing duration
>>> as the design of dm-ioband. Of course, it is possible to make a new
>>> policy which reduces duration.
>> ... while also reducing overall system throughput
>> by design?
> 
> I think it reduces system throughput compared to the current
> implementation, because it causes more overhead to do fine grained
> control. 

Except that the io scheduler based io controller seems
to be able to enforce fairness while not reducing
throughput.

Dm-ioband would have to address these issues to be a
serious contender, IMHO.

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-08 19:24             ` Rik van Riel
  (?)
@ 2009-09-09  0:09             ` Fabio Checconi
  2009-09-09  2:06                 ` Vivek Goyal
  2009-09-09  9:24                 ` Ryo Tsuruta
  -1 siblings, 2 replies; 80+ messages in thread
From: Fabio Checconi @ 2009-09-09  0:09 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Ryo Tsuruta, vgoyal, linux-kernel, dm-devel, jens.axboe, agk,
	akpm, nauman, guijianfeng, jmoyer, balbir

Hi,

> From: Rik van Riel <riel@redhat.com>
> Date: Tue, Sep 08, 2009 03:24:08PM -0400
>
> Ryo Tsuruta wrote:
> >Rik van Riel <riel@redhat.com> wrote:
> 
> >>Are you saying that dm-ioband is purposely unfair,
> >>until a certain load level is reached?
> >
> >Not unfair, dm-ioband(weight policy) is intentionally designed to
> >use bandwidth efficiently, weight policy tries to give spare bandwidth
> >of inactive groups to active groups.
> 
> This sounds good, except that the lack of anticipation
> means that a group with just one task doing reads will
> be considered "inactive" in-between reads.
> 

  anticipation helps in achieving fairness, but CFQ currently disables
idling for nonrot+NCQ media, to avoid the resulting throughput loss on
some SSDs.  Are we really sure that we want to introduce anticipation
everywhere, not only to improve throughput on rotational media, but to
achieve fairness too?

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-09  0:09             ` Fabio Checconi
@ 2009-09-09  2:06                 ` Vivek Goyal
  2009-09-09  9:24                 ` Ryo Tsuruta
  1 sibling, 0 replies; 80+ messages in thread
From: Vivek Goyal @ 2009-09-09  2:06 UTC (permalink / raw)
  To: Fabio Checconi
  Cc: Rik van Riel, Ryo Tsuruta, linux-kernel, dm-devel, jens.axboe,
	agk, akpm, nauman, guijianfeng, jmoyer, balbir

On Wed, Sep 09, 2009 at 02:09:00AM +0200, Fabio Checconi wrote:
> Hi,
> 
> > From: Rik van Riel <riel@redhat.com>
> > Date: Tue, Sep 08, 2009 03:24:08PM -0400
> >
> > Ryo Tsuruta wrote:
> > >Rik van Riel <riel@redhat.com> wrote:
> > 
> > >>Are you saying that dm-ioband is purposely unfair,
> > >>until a certain load level is reached?
> > >
> > >Not unfair, dm-ioband(weight policy) is intentionally designed to
> > >use bandwidth efficiently, weight policy tries to give spare bandwidth
> > >of inactive groups to active groups.
> > 
> > This sounds good, except that the lack of anticipation
> > means that a group with just one task doing reads will
> > be considered "inactive" in-between reads.
> > 
> 
>   anticipation helps in achieving fairness, but CFQ currently disables
> idling for nonrot+NCQ media, to avoid the resulting throughput loss on
> some SSDs.  Are we really sure that we want to introduce anticipation
> everywhere, not only to improve throughput on rotational media, but to
> achieve fairness too?

That's a good point. Personally I think that the fairness requirements for
individual queues and for groups are a little different. CFQ in general
seems to be focusing more on latency and throughput at the cost of fairness.

With groups, we probably need to put a greater amount of emphasis on group
fairness. So a group will be a relatively slower entity (with anticipation
on and more idling), but it will also give you a greater amount of
isolation. So in practice, one will create groups carefully and they will
not proliferate like queues. This can mean overall reduced throughput on
SSDs.

Having said that, group idling is tunable and one can always reduce it to
strike a balance between fairness and throughput depending on one's needs.

Thanks
Vivek 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-08 13:49                 ` Vivek Goyal
@ 2009-09-09  5:17                   ` Ryo Tsuruta
  -1 siblings, 0 replies; 80+ messages in thread
From: Ryo Tsuruta @ 2009-09-09  5:17 UTC (permalink / raw)
  To: vgoyal
  Cc: balbir, riel, linux-kernel, dm-devel, jens.axboe, agk, akpm,
	nauman, guijianfeng, jmoyer

Hi Vivek,

Vivek Goyal <vgoyal@redhat.com> wrote:
> > It is not necessary when controlling bandwidth on a per partition
> > basis or on a IO thread basis like Xen blkback kernel thread.
> > 
> > Here are configration examples.
> > http://sourceforge.net/apps/trac/ioband/wiki/dm-ioband/man/examples
> > 
> 
> For partition based control, where a thread or group of threads is doing
> IO to a specific parition, why can't you simply create different cgroups
> for each partition and move threads in those partitions.
> 
> 
> 			root
> 		 	/ | \
> 		    sda1 sda2 sda3
>
> Above are three groups and move threads doing IO into those groups and
> problem is solved. In fact that's what one will do for KVM virtual
> machines. Move all the qemu helper threds doing IO for a virtual machine
> instance into a specific group and control the IO.
> 
> Why do you have to come up with additional complicated grouping mechanism?
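
(A minimal sketch of the cgroup-per-partition setup described in the
quoted text above, purely for illustration. The controller name "io"
and the "io.weight" file are assumptions based on the io.policy naming
mentioned elsewhere in this thread; only the generic cgroup "tasks"
file is standard. The dd PIDs are placeholders.)

mount -t cgroup -o io none /cgroup/io
mkdir /cgroup/io/sda1 /cgroup/io/sda2 /cgroup/io/sda3

# Give each per-partition group a weight (file name assumed).
echo 200 > /cgroup/io/sda1/io.weight
echo 100 > /cgroup/io/sda2/io.weight

# Move the threads doing IO to a given partition into its group.
echo $DD1_PID > /cgroup/io/sda1/tasks
echo $DD2_PID > /cgroup/io/sda2/tasks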

I don't get why you think it's complicated; your io-controller also
provides the same grouping mechanism, which assigns bandwidth per
device through the io.policy file. What's the difference? The thread
grouping mechanism is also not special; it is the same concept as
cgroup. These mechanisms are necessary to make use of dm-ioband on
systems which don't support cgroups, such as RHEL 5.x. As you know,
dm-ioband also supports cgroups, so the configuration you mentioned
above can be applied with dm-ioband as well. I think it's not bad to
have several ways to set things up.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
@ 2009-09-09  5:17                   ` Ryo Tsuruta
  0 siblings, 0 replies; 80+ messages in thread
From: Ryo Tsuruta @ 2009-09-09  5:17 UTC (permalink / raw)
  To: vgoyal
  Cc: riel, guijianfeng, linux-kernel, jmoyer, dm-devel, jens.axboe,
	nauman, akpm, agk, balbir

Hi Vivek,

Vivek Goyal <vgoyal@redhat.com> wrote:
> > It is not necessary when controlling bandwidth on a per partition
> > basis or on a IO thread basis like Xen blkback kernel thread.
> > 
> > Here are configration examples.
> > http://sourceforge.net/apps/trac/ioband/wiki/dm-ioband/man/examples
> > 
> 
> For partition based control, where a thread or group of threads is doing
> IO to a specific parition, why can't you simply create different cgroups
> for each partition and move threads in those partitions.
> 
> 
> 			root
> 		 	/ | \
> 		    sda1 sda2 sda3
>
> Above are three groups and move threads doing IO into those groups and
> problem is solved. In fact that's what one will do for KVM virtual
> machines. Move all the qemu helper threds doing IO for a virtual machine
> instance into a specific group and control the IO.
> 
> Why do you have to come up with additional complicated grouping mechanism?

I don't get why you think it's complicated; your io-controller also
provides the same grouping mechanism, which assigns bandwidth per
device through the io.policy file. What's the difference? The thread
grouping mechanism is also not special; it is the same concept as
cgroup. These mechanisms are necessary to make use of dm-ioband on
systems which don't support cgroups, such as RHEL 5.x. As you know,
dm-ioband also supports cgroups, so the configuration you mentioned
above can be applied with dm-ioband as well. I think it's not bad to
have several ways to set things up.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-08 17:06             ` Regarding dm-ioband tests Dhaval Giani
@ 2009-09-09  6:05                 ` Ryo Tsuruta
  0 siblings, 0 replies; 80+ messages in thread
From: Ryo Tsuruta @ 2009-09-09  6:05 UTC (permalink / raw)
  To: dhaval
  Cc: vgoyal, riel, linux-kernel, dm-devel, jens.axboe, agk, akpm,
	nauman, guijianfeng, jmoyer, balbir

Hi,

Dhaval Giani <dhaval@linux.vnet.ibm.com> wrote:
> > > - dm-ioband can use without cgroup. (I remember Vivek said it's not an
> > >   advantage.)
> > 
> > I think this is more of a disadvantage than advantage. We have a very well
> > defined functionality of cgroup in kernel to group the tasks. Now you are
> > coming up with your own method of grouping the tasks which will make life
> > even more confusing for users and application writers.

I know that cgroup is a very well defined functionality; that is why
dm-ioband also supports throttling per cgroup. But how are we supposed
to do throttling on systems which don't support cgroups?
As I wrote in another mail to Vivek, I would like to make use of
dm-ioband on RHEL 5.x.
And I don't think that the grouping methods are complicated: just
stack a new device on the existing device and assign bandwidth to it,
the same method as for other device-mapper targets. If you would like
to assign bandwidth per thread, register the thread's ID with the
device and assign bandwidth to it as well. I don't think it confuses
users.

> I would tend to agree with this. With other resource management
> controllers using cgroups, having dm-ioband use something different will
> require a different set of userspace tools/libraries to be used.
> Something that will severly limit its usefulness froma programmer's
> perspective.

Once we create a dm-ioband device, the device can be configured
through the cgroup interface. I think that will not severely limit its
usefulness.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
@ 2009-09-09  6:05                 ` Ryo Tsuruta
  0 siblings, 0 replies; 80+ messages in thread
From: Ryo Tsuruta @ 2009-09-09  6:05 UTC (permalink / raw)
  To: dhaval
  Cc: riel, guijianfeng, linux-kernel, jmoyer, dm-devel, vgoyal,
	jens.axboe, nauman, akpm, agk, balbir

Hi,

Dhaval Giani <dhaval@linux.vnet.ibm.com> wrote:
> > > - dm-ioband can use without cgroup. (I remember Vivek said it's not an
> > >   advantage.)
> > 
> > I think this is more of a disadvantage than advantage. We have a very well
> > defined functionality of cgroup in kernel to group the tasks. Now you are
> > coming up with your own method of grouping the tasks which will make life
> > even more confusing for users and application writers.

I know that cgroup is a very well defined functionality; that is why
dm-ioband also supports throttling per cgroup. But how are we supposed
to do throttling on systems which don't support cgroups?
As I wrote in another mail to Vivek, I would like to make use of
dm-ioband on RHEL 5.x.
And I don't think that the grouping methods are complicated: just
stack a new device on the existing device and assign bandwidth to it,
the same method as for other device-mapper targets. If you would like
to assign bandwidth per thread, register the thread's ID with the
device and assign bandwidth to it as well. I don't think it confuses
users.

> I would tend to agree with this. With other resource management
> controllers using cgroups, having dm-ioband use something different will
> require a different set of userspace tools/libraries to be used.
> Something that will severly limit its usefulness froma programmer's
> perspective.

Once we create a dm-ioband device, the device can be configured
through the cgroup interface. I think that will not severely limit its
usefulness.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-09  0:09             ` Fabio Checconi
@ 2009-09-09  9:24                 ` Ryo Tsuruta
  2009-09-09  9:24                 ` Ryo Tsuruta
  1 sibling, 0 replies; 80+ messages in thread
From: Ryo Tsuruta @ 2009-09-09  9:24 UTC (permalink / raw)
  To: fchecconi
  Cc: riel, vgoyal, linux-kernel, dm-devel, jens.axboe, agk, akpm,
	nauman, guijianfeng, jmoyer, balbir

Hi,

Fabio Checconi <fchecconi@gmail.com> wrote:
> Hi,
> 
> > From: Rik van Riel <riel@redhat.com>
> > Date: Tue, Sep 08, 2009 03:24:08PM -0400
> >
> > Ryo Tsuruta wrote:
> > >Rik van Riel <riel@redhat.com> wrote:
> > 
> > >>Are you saying that dm-ioband is purposely unfair,
> > >>until a certain load level is reached?
> > >
> > >Not unfair, dm-ioband(weight policy) is intentionally designed to
> > >use bandwidth efficiently, weight policy tries to give spare bandwidth
> > >of inactive groups to active groups.
> > 
> > This sounds good, except that the lack of anticipation
> > means that a group with just one task doing reads will
> > be considered "inactive" in-between reads.
> > 
> 
>   anticipation helps in achieving fairness, but CFQ currently disables
> idling for nonrot+NCQ media, to avoid the resulting throughput loss on
> some SSDs.  Are we really sure that we want to introduce anticipation
> everywhere, not only to improve throughput on rotational media, but to
> achieve fairness too?

I'm also not sure if it's worth introducing anticipation everywhere.
Storage devices are becoming faster and smarter every year. In
practice, I ran a benchmark against a SAN storage array and the noop
scheduler got the best result.

However, I'll consider how IO from a single task should be taken care
of.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
@ 2009-09-09  9:24                 ` Ryo Tsuruta
  0 siblings, 0 replies; 80+ messages in thread
From: Ryo Tsuruta @ 2009-09-09  9:24 UTC (permalink / raw)
  To: fchecconi
  Cc: riel, guijianfeng, linux-kernel, jmoyer, dm-devel, vgoyal,
	jens.axboe, nauman, akpm, agk, balbir

Hi,

Fabio Checconi <fchecconi@gmail.com> wrote:
> Hi,
> 
> > From: Rik van Riel <riel@redhat.com>
> > Date: Tue, Sep 08, 2009 03:24:08PM -0400
> >
> > Ryo Tsuruta wrote:
> > >Rik van Riel <riel@redhat.com> wrote:
> > 
> > >>Are you saying that dm-ioband is purposely unfair,
> > >>until a certain load level is reached?
> > >
> > >Not unfair, dm-ioband(weight policy) is intentionally designed to
> > >use bandwidth efficiently, weight policy tries to give spare bandwidth
> > >of inactive groups to active groups.
> > 
> > This sounds good, except that the lack of anticipation
> > means that a group with just one task doing reads will
> > be considered "inactive" in-between reads.
> > 
> 
>   anticipation helps in achieving fairness, but CFQ currently disables
> idling for nonrot+NCQ media, to avoid the resulting throughput loss on
> some SSDs.  Are we really sure that we want to introduce anticipation
> everywhere, not only to improve throughput on rotational media, but to
> achieve fairness too?

I'm also not sure if it's worth introducing anticipation everywhere.
Storage devices are becoming faster and smarter every year. In
practice, I ran a benchmark against a SAN storage array and the noop
scheduler got the best result.

However, I'll consider how IO from a single task should be taken care
of.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-08 13:42             ` Vivek Goyal
                               ` (2 preceding siblings ...)
  (?)
@ 2009-09-09 10:01             ` Ryo Tsuruta
  2009-09-09 14:31                 ` Vivek Goyal
  -1 siblings, 1 reply; 80+ messages in thread
From: Ryo Tsuruta @ 2009-09-09 10:01 UTC (permalink / raw)
  To: vgoyal
  Cc: riel, linux-kernel, dm-devel, jens.axboe, agk, akpm, nauman,
	guijianfeng, jmoyer, balbir

Hi Vivek,

Vivek Goyal <vgoyal@redhat.com> wrote:
> > I think there are some advantages to dm-ioband. That's why I post
> > dm-ioband to the mailing list.
> > 
> > - dm-ioband supports not only proportional weight policy but also rate
> >   limiting policy. Besides, new policies can be added to dm-ioband if
> >   a user wants to control bandwidth by his or her own policy.
> 
> I think we can easily extent io scheduler based controller to also support
> max rate per group policy also. That should not be too hard. It is a
> matter of only keeping track of io rate per group and if a group is
> exceeding the rate, then schedule it out and move on to next group.
> 
> I can do that once proportional weight solution is stablized and gets
> merged. 
> 
> So its not an advantage of dm-ioband.

O.K.

> > - The dm-ioband driver can be replaced without stopping the system by
> >   using device-mapper's facility. It's easy to maintain.
> 
> We talked about this point in the past also. In io scheduler based
> controller, just move all the tasks to root group and you got a system
> not doing any io control.
> 
> By the way why would one like to do that? 
> 
> So this is also not an advantage.

My point is that dm-ioband can be updated for improvements and
bug-fixing without stopping the system.

> > - dm-ioband can use without cgroup. (I remember Vivek said it's not an
> >   advantage.)
> 
> I think this is more of a disadvantage than advantage. We have a very well
> defined functionality of cgroup in kernel to group the tasks. Now you are
> coming up with your own method of grouping the tasks which will make life
> even more confusing for users and application writers.
> 
> I don't understand what is that core requirement of yours which is not met
> by io scheduler based io controller. range policy control you have
> implemented recently. I don't think that removing dm-ioband module
> dynamically is core requirement. Also whatever you can do with additional 
> grouping mechanism, you can do with cgroup also.
> 
> So if there is any of your core functionality which is not fulfilled by
> io scheduler based controller, please let me know. I will be happy to look
> into it and try to provide that feature. But looking at above list, I am
> not convinced that any of the above is a compelling argument for dm-ioband
> inclusion.

As I wrote in another email, I would like to make use of dm-ioband on
systems which don't support cgroups, such as RHEL. In addition, there
are devices which don't use the standard IO schedulers, and dm-ioband
can work even on such devices.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-09  6:05                 ` Ryo Tsuruta
  (?)
@ 2009-09-09 10:51                 ` Dhaval Giani
  2009-09-10  7:58                     ` Ryo Tsuruta
  -1 siblings, 1 reply; 80+ messages in thread
From: Dhaval Giani @ 2009-09-09 10:51 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: vgoyal, riel, linux-kernel, dm-devel, jens.axboe, agk, akpm,
	nauman, guijianfeng, jmoyer, balbir

On Wed, Sep 09, 2009 at 03:05:11PM +0900, Ryo Tsuruta wrote:
> Hi,
> 
> Dhaval Giani <dhaval@linux.vnet.ibm.com> wrote:
> > > > - dm-ioband can use without cgroup. (I remember Vivek said it's not an
> > > >   advantage.)
> > > 
> > > I think this is more of a disadvantage than advantage. We have a very well
> > > defined functionality of cgroup in kernel to group the tasks. Now you are
> > > coming up with your own method of grouping the tasks which will make life
> > > even more confusing for users and application writers.
> 
> I know that cgroup is a very well defined functionality, that is why
> dm-ioband also supports throttling per cgroup. But how are we supposed
> to do throttling on the system which doesn't support cgroup?
> As I wrote in another mail to Vivek, I would like to make use of
> dm-ioband on RHEL 5.x. 

Hi Ryo,

I am not sure that upstream should really be worrying about RHEL 5.x.
cgroups is a relatively mature solution and is available in most (if
not all) community distros today. We really should not be looking at
another grouping solution if the sole reason is that dm-ioband can then
be used on RHEL 5.x. The correct solution would then be to maintain a
separate patch for RHEL 5.x and not to burden the upstream kernel.

> And I don't think that the grouping methods are not complicated, just
> stack a new device on the existing device and assign bandwidth to it,
> that is the same method as other device-mapper targets, if you would
> like to assign bandwidth per thread, then register the thread's ID to
> the device and assign bandwidth to it as well. I don't think it makes
> users confused.
> 
> > I would tend to agree with this. With other resource management
> > controllers using cgroups, having dm-ioband use something different will
> > require a different set of userspace tools/libraries to be used.
> > Something that will severly limit its usefulness froma programmer's
> > perspective.
> 
> Once we create a dm-ioband device, the device can be configured
> through the cgroup interface. I think it will not severly limit its
> usefulness.
> 

My objection is slightly different: there are too many interfaces to do
the same thing. Which one of these is the recommended one? Which one is
going to be supported? If we say that cgroups is not the preferred
interface, do application developers need to use yet another library
for io control along with cpu/memory control?

thanks,
-- 
regards,
Dhaval

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-09  5:17                   ` Ryo Tsuruta
@ 2009-09-09 13:34                     ` Vivek Goyal
  -1 siblings, 0 replies; 80+ messages in thread
From: Vivek Goyal @ 2009-09-09 13:34 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: balbir, riel, linux-kernel, dm-devel, jens.axboe, agk, akpm,
	nauman, guijianfeng, jmoyer

On Wed, Sep 09, 2009 at 02:17:48PM +0900, Ryo Tsuruta wrote:
> Hi Vivek,
> 
> Vivek Goyal <vgoyal@redhat.com> wrote:
> > > It is not necessary when controlling bandwidth on a per partition
> > > basis or on a IO thread basis like Xen blkback kernel thread.
> > > 
> > > Here are configration examples.
> > > http://sourceforge.net/apps/trac/ioband/wiki/dm-ioband/man/examples
> > > 
> > 
> > For partition based control, where a thread or group of threads is doing
> > IO to a specific parition, why can't you simply create different cgroups
> > for each partition and move threads in those partitions.
> > 
> > 
> > 			root
> > 		 	/ | \
> > 		    sda1 sda2 sda3
> >
> > Above are three groups and move threads doing IO into those groups and
> > problem is solved. In fact that's what one will do for KVM virtual
> > machines. Move all the qemu helper threds doing IO for a virtual machine
> > instance into a specific group and control the IO.
> > 
> > Why do you have to come up with additional complicated grouping mechanism?
> 
> I don't get why you think it's complicated, your io-controller also
> provides the same grouping machanism which assigns bandwidth per
> device by io.policy file. What's the difference?

I am using a purely cgroup based interface. This makes life easier for
user space tools and libraries like libcgroup and should also help
libvirt. Now they can treat all the resource controllers in the kernel
in a uniform way (through the cgroup interface).

Within cgroups, there are controller-specific files and libcgroup is
aware of that. So libcgroup can be modified to take care of the special
syntax of the io.policy file. But this is not too big a deviation in
the overall picture.

dm-ioband is coming up with a whole new way of configuring and managing
groups, and now these user space tools will have to be modified to
handle this new way just for the io controller.

The point is that there is no need. You seem to be introducing this new
interface because you want to use this module with RHEL 5, which does
not have cgroup support.

I think it is a hard-to-sell argument that upstream should introduce a
new interface just because one wants to use the module with older
kernel releases which did not have cgroup support.

Taking in this new interface solves your case but will make life harder
for the user space tools, libraries and applications making use of
cgroups and the various resource controllers.

> The thread grouping
> machianism is also not special, it is the same concept as cgroup.
> These mechanisms are necessary to make use of dm-ioband on the systems
> which doesn't support cgroup such as RHEL 5.x. As you know, dm-ioband
> also supports cgroup, the configurations you mentioned above can apply
> to the system by dm-ioband.

dm-ioband also supports cgroups, but there is an additional step
required: passing all the cgroup ids to the various ioband devices.
This requires knowledge of all the ioband devices, how they have been
created, and the use of the dm tools.

The only place where it helps a bit is that, once the configuration is
done, one can move tasks between cgroups to group them arbitrarily
instead of grouping them by pid, gid, etc.

So it still does not solve the issue of dm-ioband being so different
from the rest of the controllers and introducing a new interface for
the creation and management of groups.

> I think it's not bad to have several ways
> to setup.

It is not bad if there is a proper justification for a new interface
and an explanation of why the existing standard mechanism does not meet
the requirement.

In this case you are saying that in general the cgroup mechanism is
sufficient to take care of grouping tasks, but that it is not available
in older kernels, hence let us introduce a new interface in upstream
kernels. I think this does not work. It brings in the unnecessary
overhead of maintaining another interface upstream, and upstream does
not benefit from this interface.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
@ 2009-09-09 13:34                     ` Vivek Goyal
  0 siblings, 0 replies; 80+ messages in thread
From: Vivek Goyal @ 2009-09-09 13:34 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: riel, guijianfeng, linux-kernel, jmoyer, dm-devel, jens.axboe,
	nauman, akpm, agk, balbir

On Wed, Sep 09, 2009 at 02:17:48PM +0900, Ryo Tsuruta wrote:
> Hi Vivek,
> 
> Vivek Goyal <vgoyal@redhat.com> wrote:
> > > It is not necessary when controlling bandwidth on a per partition
> > > basis or on a IO thread basis like Xen blkback kernel thread.
> > > 
> > > Here are configration examples.
> > > http://sourceforge.net/apps/trac/ioband/wiki/dm-ioband/man/examples
> > > 
> > 
> > For partition based control, where a thread or group of threads is doing
> > IO to a specific parition, why can't you simply create different cgroups
> > for each partition and move threads in those partitions.
> > 
> > 
> > 			root
> > 		 	/ | \
> > 		    sda1 sda2 sda3
> >
> > Above are three groups and move threads doing IO into those groups and
> > problem is solved. In fact that's what one will do for KVM virtual
> > machines. Move all the qemu helper threds doing IO for a virtual machine
> > instance into a specific group and control the IO.
> > 
> > Why do you have to come up with additional complicated grouping mechanism?
> 
> I don't get why you think it's complicated, your io-controller also
> provides the same grouping machanism which assigns bandwidth per
> device by io.policy file. What's the difference?

I am using a purely cgroup based interface. This makes life easier for
user space tools and libraries like libcgroup and should also help
libvirt. Now they can treat all the resource controllers in the kernel
in a uniform way (through the cgroup interface).

Within cgroups, there are controller-specific files and libcgroup is
aware of that. So libcgroup can be modified to take care of the special
syntax of the io.policy file. But this is not too big a deviation in
the overall picture.

dm-ioband is coming up with a whole new way of configuring and managing
groups, and now these user space tools will have to be modified to
handle this new way just for the io controller.

The point is that there is no need. You seem to be introducing this new
interface because you want to use this module with RHEL 5, which does
not have cgroup support.

I think it is a hard-to-sell argument that upstream should introduce a
new interface just because one wants to use the module with older
kernel releases which did not have cgroup support.

Taking in this new interface solves your case but will make life harder
for the user space tools, libraries and applications making use of
cgroups and the various resource controllers.

> The thread grouping
> machianism is also not special, it is the same concept as cgroup.
> These mechanisms are necessary to make use of dm-ioband on the systems
> which doesn't support cgroup such as RHEL 5.x. As you know, dm-ioband
> also supports cgroup, the configurations you mentioned above can apply
> to the system by dm-ioband.

dm-ioband also supports cgroups, but there is an additional step
required: passing all the cgroup ids to the various ioband devices.
This requires knowledge of all the ioband devices, how they have been
created, and the use of the dm tools.

The only place where it helps a bit is that, once the configuration is
done, one can move tasks between cgroups to group them arbitrarily
instead of grouping them by pid, gid, etc.

So it still does not solve the issue of dm-ioband being so different
from the rest of the controllers and introducing a new interface for
the creation and management of groups.

> I think it's not bad to have several ways
> to setup.

It is not bad if there is a proper justification for a new interface
and an explanation of why the existing standard mechanism does not meet
the requirement.

In this case you are saying that in general the cgroup mechanism is
sufficient to take care of grouping tasks, but that it is not available
in older kernels, hence let us introduce a new interface in upstream
kernels. I think this does not work. It brings in the unnecessary
overhead of maintaining another interface upstream, and upstream does
not benefit from this interface.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-09  6:05                 ` Ryo Tsuruta
@ 2009-09-09 13:57                   ` Vivek Goyal
  -1 siblings, 0 replies; 80+ messages in thread
From: Vivek Goyal @ 2009-09-09 13:57 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: dhaval, riel, linux-kernel, dm-devel, jens.axboe, agk, akpm,
	nauman, guijianfeng, jmoyer, balbir

On Wed, Sep 09, 2009 at 03:05:11PM +0900, Ryo Tsuruta wrote:
> Hi,
> 
> Dhaval Giani <dhaval@linux.vnet.ibm.com> wrote:
> > > > - dm-ioband can use without cgroup. (I remember Vivek said it's not an
> > > >   advantage.)
> > > 
> > > I think this is more of a disadvantage than advantage. We have a very well
> > > defined functionality of cgroup in kernel to group the tasks. Now you are
> > > coming up with your own method of grouping the tasks which will make life
> > > even more confusing for users and application writers.
> 
> I know that cgroup is a very well defined functionality, that is why
> dm-ioband also supports throttling per cgroup. But how are we supposed
> to do throttling on the system which doesn't support cgroup?
> As I wrote in another mail to Vivek, I would like to make use of
> dm-ioband on RHEL 5.x. 

I think you need to maintain and support this module out of the kernel
tree for older kernels. It does not make much sense to introduce new
interfaces just to support functionality in older kernels.

> And I don't think that the grouping methods are not complicated, just
> stack a new device on the existing device and assign bandwidth to it,
> that is the same method as other device-mapper targets, if you would
> like to assign bandwidth per thread, then register the thread's ID to
> the device and assign bandwidth to it as well. I don't think it makes
> users confused.

- First of all, it is more about doing things in a new way rather than
  the standard way. Moreover, upstream does not benefit from this new
  interface. It just stands to lose because of the maintenance overhead
  and the need to change user space tools to make use of this new
  interface.

- Secondly, I personally find it more convoluted as well. Following is
  a small script to set up two ioband devices, ioband1 and ioband2, and
  two additional groups on the ioband1 device using the cgroup
  interface.

***********************************************************************
echo "0 $(blockdev --getsize /dev/sdd1) ioband /dev/sdd1 1 0 0 none"
"weight 0 :200" | dmsetup create ioband1
echo "0 $(blockdev --getsize /dev/sdd2) ioband /dev/sdd2 1 0 0 none"
"weight 0 :100" | dmsetup create ioband2

mount -t cgroup -o blkio hier1 /cgroup/ioband
mkdir /cgroup/ioband/test1 /cgroup/ioband/test2

test1_id=`cat /cgroup/ioband/test1/blkio.id`
test2_id=`cat /cgroup/ioband/test2/blkio.id`

test1_weight=200
test2_weight=100

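# Switch ioband1 to cgroup-based grouping, register both cgroup ids,
# and give them the weights chosen above.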
dmsetup message ioband1 0 type cgroup
dmsetup message ioband1 0 attach $test1_id
dmsetup message ioband1 0 attach $test2_id
dmsetup message ioband1 0 weight $test1_id:$test1_weight
dmsetup message ioband1 0 weight $test2_id:$test2_weight

mount /dev/mapper/ioband1 /mnt/sdd1
mount /dev/mapper/ioband2 /mnt/sdd2
************************************************************************* 

For the status of the various settings one needs to use the "dmsetup
status" and "dmsetup table" commands. Look at the output of these
commands with just two groups: the output for all the groups is on a
single line. Think of the situation when there are 7-8 groups and how
bad it will look.

#dmsetup status
ioband2: 0 40355280 ioband 1 -1 105 0 834 1 0 8
ioband1: 0 37768752 ioband 1 -1 105 0 834 1 0 8 2 0 0 0 0 0 0 3 0 0 0 0 0 0

#dmsetup table
ioband2: 0 40355280 ioband 8:50 1 4 192 none weight 768 :100
ioband1: 0 37768752 ioband 8:49 1 4 192 cgroup weight 768 :200 2:200 3:100

I find it hard to interpret those numbers. Everything about a device is
exported on a single line.

In the cgroup based interface, things are divided nicely among
different files. Also, a group shows statistics about that group only,
and not about all the groups present in the system. It is easier to
parse and comprehend.

> 
> > I would tend to agree with this. With other resource management
> > controllers using cgroups, having dm-ioband use something different will
> > require a different set of userspace tools/libraries to be used.
> > Something that will severly limit its usefulness froma programmer's
> > perspective.
> 
> Once we create a dm-ioband device, the device can be configured
> through the cgroup interface. I think it will not severly limit its
> usefulness.

To create the device in the first place you need the dm tools, and
libcgroup needs to learn how to use the various dm commands. It also
needs to learn how to parse the output of the "dmsetup table" and
"dmsetup status" commands and consolidate that information.
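
(Just to illustrate the kind of wrapping such a tool would need; the
device names are the ones from the script above, and the commands are
the same dmsetup invocations shown in this mail.)

for dev in ioband1 ioband2; do
  dmsetup table "$dev"   # one line per device: policy and per-group weights
  dmsetup status "$dev"  # one line per device: per-group counters
done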

This is despite the fact that it finally uses the cgroup interface to
group the tasks. But libcgroup still needs to propagate the cgroup ids
to the individual ioband devices.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
@ 2009-09-09 13:57                   ` Vivek Goyal
  0 siblings, 0 replies; 80+ messages in thread
From: Vivek Goyal @ 2009-09-09 13:57 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: riel, dhaval, guijianfeng, linux-kernel, jmoyer, dm-devel,
	jens.axboe, nauman, akpm, agk, balbir

On Wed, Sep 09, 2009 at 03:05:11PM +0900, Ryo Tsuruta wrote:
> Hi,
> 
> Dhaval Giani <dhaval@linux.vnet.ibm.com> wrote:
> > > > - dm-ioband can use without cgroup. (I remember Vivek said it's not an
> > > >   advantage.)
> > > 
> > > I think this is more of a disadvantage than advantage. We have a very well
> > > defined functionality of cgroup in kernel to group the tasks. Now you are
> > > coming up with your own method of grouping the tasks which will make life
> > > even more confusing for users and application writers.
> 
> I know that cgroup is a very well defined functionality, that is why
> dm-ioband also supports throttling per cgroup. But how are we supposed
> to do throttling on the system which doesn't support cgroup?
> As I wrote in another mail to Vivek, I would like to make use of
> dm-ioband on RHEL 5.x. 

I think you need to maintain and support this module out of the kernel
tree for older kernels. It does not make much sense to introduce new
interfaces just to support functionality in older kernels.

> And I don't think that the grouping methods are not complicated, just
> stack a new device on the existing device and assign bandwidth to it,
> that is the same method as other device-mapper targets, if you would
> like to assign bandwidth per thread, then register the thread's ID to
> the device and assign bandwidth to it as well. I don't think it makes
> users confused.

- First of all, it is more about doing things in a new way rather than
  the standard way. Moreover, upstream does not benefit from this new
  interface. It just stands to lose because of the maintenance overhead
  and the need to change user space tools to make use of this new
  interface.

- Secondly, I personally find it more convoluted as well. Following is
  a small script to set up two ioband devices, ioband1 and ioband2, and
  two additional groups on the ioband1 device using the cgroup
  interface.

***********************************************************************
echo "0 $(blockdev --getsize /dev/sdd1) ioband /dev/sdd1 1 0 0 none"
"weight 0 :200" | dmsetup create ioband1
echo "0 $(blockdev --getsize /dev/sdd2) ioband /dev/sdd2 1 0 0 none"
"weight 0 :100" | dmsetup create ioband2

mount -t cgroup -o blkio hier1 /cgroup/ioband
mkdir /cgroup/ioband/test1 /cgroup/ioband/test2

test1_id=`cat /cgroup/ioband/test1/blkio.id`
test2_id=`cat /cgroup/ioband/test2/blkio.id`

test1_weight=200
test2_weight=100

dmsetup message ioband1 0 type cgroup
dmsetup message ioband1 0 attach $test1_id
dmsetup message ioband1 0 attach $test2_id
dmsetup message ioband1 0 weight $test1_id:$test1_weight
dmsetup message ioband1 0 weight $test2_id:$test2_weight

mount /dev/mapper/ioband1 /mnt/sdd1
mount /dev/mapper/ioband2 /mnt/sdd2
************************************************************************* 

For the status of the various settings one needs to use the "dmsetup
status" and "dmsetup table" commands. Look at the output of these
commands with just two groups: the output for all the groups is on a
single line. Think of the situation when there are 7-8 groups and how
bad it will look.

#dmsetup status
ioband2: 0 40355280 ioband 1 -1 105 0 834 1 0 8
ioband1: 0 37768752 ioband 1 -1 105 0 834 1 0 8 2 0 0 0 0 0 0 3 0 0 0 0 0 0

#dmsetup table
ioband2: 0 40355280 ioband 8:50 1 4 192 none weight 768 :100
ioband1: 0 37768752 ioband 8:49 1 4 192 cgroup weight 768 :200 2:200 3:100

I find it hard to interpret those numbers. Everything about a device is
exported on a single line.

In the cgroup based interface, things are divided nicely among
different files. Also, a group shows statistics about that group only,
and not about all the groups present in the system. It is easier to
parse and comprehend.

> 
> > I would tend to agree with this. With other resource management
> > controllers using cgroups, having dm-ioband use something different will
> > require a different set of userspace tools/libraries to be used.
> > Something that will severly limit its usefulness froma programmer's
> > perspective.
> 
> Once we create a dm-ioband device, the device can be configured
> through the cgroup interface. I think it will not severly limit its
> usefulness.

To create the device in the first place you need the dm tools, and
libcgroup needs to learn how to use the various dm commands. It also
needs to learn how to parse the output of the "dmsetup table" and
"dmsetup status" commands and consolidate that information.

This is despite the fact that it finally uses the cgroup interface to
group the tasks. But libcgroup still needs to propagate the cgroup ids
to the individual ioband devices.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-09 10:01             ` Ryo Tsuruta
@ 2009-09-09 14:31                 ` Vivek Goyal
  0 siblings, 0 replies; 80+ messages in thread
From: Vivek Goyal @ 2009-09-09 14:31 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: riel, linux-kernel, dm-devel, jens.axboe, agk, akpm, nauman,
	guijianfeng, jmoyer, balbir

On Wed, Sep 09, 2009 at 07:01:46PM +0900, Ryo Tsuruta wrote:
> Hi Vivek,
> 
> Vivek Goyal <vgoyal@redhat.com> wrote:
> > > I think there are some advantages to dm-ioband. That's why I post
> > > dm-ioband to the mailing list.
> > > 
> > > - dm-ioband supports not only proportional weight policy but also rate
> > >   limiting policy. Besides, new policies can be added to dm-ioband if
> > >   a user wants to control bandwidth by his or her own policy.
> > 
> > I think we can easily extent io scheduler based controller to also support
> > max rate per group policy also. That should not be too hard. It is a
> > matter of only keeping track of io rate per group and if a group is
> > exceeding the rate, then schedule it out and move on to next group.
> > 
> > I can do that once proportional weight solution is stablized and gets
> > merged. 
> > 
> > So its not an advantage of dm-ioband.
> 
> O.K.
> 
> > > - The dm-ioband driver can be replaced without stopping the system by
> > >   using device-mapper's facility. It's easy to maintain.
> > 
> > We talked about this point in the past also. In io scheduler based
> > controller, just move all the tasks to root group and you got a system
> > not doing any io control.
> > 
> > By the way why would one like to do that? 
> > 
> > So this is also not an advantage.
> 
> My point is that dm-ioband can be updated for improvements and
> bug-fixing without stopping the system.
> 
> > > - dm-ioband can use without cgroup. (I remember Vivek said it's not an
> > >   advantage.)
> > 
> > I think this is more of a disadvantage than advantage. We have a very well
> > defined functionality of cgroup in kernel to group the tasks. Now you are
> > coming up with your own method of grouping the tasks which will make life
> > even more confusing for users and application writers.
> > 
> > I don't understand what is that core requirement of yours which is not met
> > by io scheduler based io controller. range policy control you have
> > implemented recently. I don't think that removing dm-ioband module
> > dynamically is core requirement. Also whatever you can do with additional 
> > grouping mechanism, you can do with cgroup also.
> > 
> > So if there is any of your core functionality which is not fulfilled by
> > io scheduler based controller, please let me know. I will be happy to look
> > into it and try to provide that feature. But looking at above list, I am
> > not convinced that any of the above is a compelling argument for dm-ioband
> > inclusion.
> 
> As I wrote in another email, I would like to make use of dm-ioband on
> the system which doesn't support cgroup such as RHEL.

For supporting an io controller mechanism in older kernels which don't
have cgroup interface support, I think one needs to maintain an
out-of-tree module. Upstream does not benefit from it.

> In addition,
> there are devices which doesn't use standard IO schedulers, and
> dm-ioband can work on even such devices.

This is an interesting use case. A few thoughts.

- Can't the io scheduling mechanism of these devices make use of the
  elevator and elevator fair queuing interfaces to take advantage of
  the io controlling mechanism? It should not be too difficult. Look at
  noop: it has just 131 lines of code and it now supports hierarchical
  io scheduling.

  This will come with the request queue and its merging and plug/unplug
  mechanism. Is that an issue?

- If not, then yes, for these corner cases the io scheduler based
  controller does not work as it is.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
@ 2009-09-09 14:31                 ` Vivek Goyal
  0 siblings, 0 replies; 80+ messages in thread
From: Vivek Goyal @ 2009-09-09 14:31 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: riel, guijianfeng, linux-kernel, jmoyer, dm-devel, jens.axboe,
	nauman, akpm, agk, balbir

On Wed, Sep 09, 2009 at 07:01:46PM +0900, Ryo Tsuruta wrote:
> Hi Vivek,
> 
> Vivek Goyal <vgoyal@redhat.com> wrote:
> > > I think there are some advantages to dm-ioband. That's why I post
> > > dm-ioband to the mailing list.
> > > 
> > > - dm-ioband supports not only proportional weight policy but also rate
> > >   limiting policy. Besides, new policies can be added to dm-ioband if
> > >   a user wants to control bandwidth by his or her own policy.
> > 
> > I think we can easily extent io scheduler based controller to also support
> > max rate per group policy also. That should not be too hard. It is a
> > matter of only keeping track of io rate per group and if a group is
> > exceeding the rate, then schedule it out and move on to next group.
> > 
> > I can do that once proportional weight solution is stablized and gets
> > merged. 
> > 
> > So its not an advantage of dm-ioband.
> 
> O.K.
> 
> > > - The dm-ioband driver can be replaced without stopping the system by
> > >   using device-mapper's facility. It's easy to maintain.
> > 
> > We talked about this point in the past also. In io scheduler based
> > controller, just move all the tasks to root group and you got a system
> > not doing any io control.
> > 
> > By the way why would one like to do that? 
> > 
> > So this is also not an advantage.
> 
> My point is that dm-ioband can be updated for improvements and
> bug-fixing without stopping the system.
> 
> > > - dm-ioband can use without cgroup. (I remember Vivek said it's not an
> > >   advantage.)
> > 
> > I think this is more of a disadvantage than advantage. We have a very well
> > defined functionality of cgroup in kernel to group the tasks. Now you are
> > coming up with your own method of grouping the tasks which will make life
> > even more confusing for users and application writers.
> > 
> > I don't understand what is that core requirement of yours which is not met
> > by io scheduler based io controller. range policy control you have
> > implemented recently. I don't think that removing dm-ioband module
> > dynamically is core requirement. Also whatever you can do with additional 
> > grouping mechanism, you can do with cgroup also.
> > 
> > So if there is any of your core functionality which is not fulfilled by
> > io scheduler based controller, please let me know. I will be happy to look
> > into it and try to provide that feature. But looking at above list, I am
> > not convinced that any of the above is a compelling argument for dm-ioband
> > inclusion.
> 
> As I wrote in another email, I would like to make use of dm-ioband on
> the system which doesn't support cgroup such as RHEL.

For supporting an io controller mechanism in older kernels which don't
have cgroup interface support, I think one needs to maintain an
out-of-tree module. Upstream does not benefit from it.

> In addition,
> there are devices which doesn't use standard IO schedulers, and
> dm-ioband can work on even such devices.

This is an interesting use case. A few thoughts.

- Can't the io scheduling mechanism of these devices make use of the
  elevator and elevator fair queuing interfaces to take advantage of
  the io controlling mechanism? It should not be too difficult. Look at
  noop: it has just 131 lines of code and it now supports hierarchical
  io scheduling.

  This will come with the request queue and its merging and plug/unplug
  mechanism. Is that an issue?

- If not, then yes, for these corner cases the io scheduler based
  controller does not work as it is.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-09  2:06                 ` Vivek Goyal
  (?)
@ 2009-09-09 15:41                 ` Fabio Checconi
  2009-09-09 17:30                     ` Vivek Goyal
  -1 siblings, 1 reply; 80+ messages in thread
From: Fabio Checconi @ 2009-09-09 15:41 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Rik van Riel, Ryo Tsuruta, linux-kernel, dm-devel, jens.axboe,
	agk, akpm, nauman, guijianfeng, jmoyer, balbir

> From: Vivek Goyal <vgoyal@redhat.com>
> Date: Tue, Sep 08, 2009 10:06:20PM -0400
>
> On Wed, Sep 09, 2009 at 02:09:00AM +0200, Fabio Checconi wrote:
> > Hi,
> > 
> > > From: Rik van Riel <riel@redhat.com>
> > > Date: Tue, Sep 08, 2009 03:24:08PM -0400
> > >
> > > Ryo Tsuruta wrote:
> > > >Rik van Riel <riel@redhat.com> wrote:
> > > 
> > > >>Are you saying that dm-ioband is purposely unfair,
> > > >>until a certain load level is reached?
> > > >
> > > >Not unfair, dm-ioband(weight policy) is intentionally designed to
> > > >use bandwidth efficiently, weight policy tries to give spare bandwidth
> > > >of inactive groups to active groups.
> > > 
> > > This sounds good, except that the lack of anticipation
> > > means that a group with just one task doing reads will
> > > be considered "inactive" in-between reads.
> > > 
> > 
> >   anticipation helps in achieving fairness, but CFQ currently disables
> > idling for nonrot+NCQ media, to avoid the resulting throughput loss on
> > some SSDs.  Are we really sure that we want to introduce anticipation
> > everywhere, not only to improve throughput on rotational media, but to
> > achieve fairness too?
> 
> That's a good point. Personally I think that fairness requirements for
> individual queues and groups are little different. CFQ in general seems
> to be focussing more on latency and throughput at the cost of fairness.
> 
> With groups, we probably need to put a greater amount of emphasis on group
> fairness. So group will be a relatively a slower entity (with anticiaption
> on and more idling), but it will also give you a greater amount of
> isolation. So in practice, one will create groups carefully and they will
> not proliferate like queues. This can mean overall reduced throughput on
> SSD.
> 

Ok, I personally agree on that, but I think it's something to be documented.


> Having said that, group idling is tunable and one can always reduce it to
> achieve a balance between fairness vs throughput depending on his need.
> 

This is good; however, tuning will not be an easy task (at least, in my
experience with BFQ it has been a problem): while for throughput there
are usually tradeoffs, as soon as a queue/group idles and then times
out, from the fairness perspective the results soon become almost
random (i.e., they depend on the rate of successful anticipations, but
in the common case they are unpredictable)...

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-09 15:41                 ` Fabio Checconi
@ 2009-09-09 17:30                     ` Vivek Goyal
  0 siblings, 0 replies; 80+ messages in thread
From: Vivek Goyal @ 2009-09-09 17:30 UTC (permalink / raw)
  To: Fabio Checconi
  Cc: Rik van Riel, Ryo Tsuruta, linux-kernel, dm-devel, jens.axboe,
	agk, akpm, nauman, guijianfeng, jmoyer, balbir

On Wed, Sep 09, 2009 at 05:41:26PM +0200, Fabio Checconi wrote:
> > From: Vivek Goyal <vgoyal@redhat.com>
> > Date: Tue, Sep 08, 2009 10:06:20PM -0400
> >
> > On Wed, Sep 09, 2009 at 02:09:00AM +0200, Fabio Checconi wrote:
> > > Hi,
> > > 
> > > > From: Rik van Riel <riel@redhat.com>
> > > > Date: Tue, Sep 08, 2009 03:24:08PM -0400
> > > >
> > > > Ryo Tsuruta wrote:
> > > > >Rik van Riel <riel@redhat.com> wrote:
> > > > 
> > > > >>Are you saying that dm-ioband is purposely unfair,
> > > > >>until a certain load level is reached?
> > > > >
> > > > >Not unfair, dm-ioband(weight policy) is intentionally designed to
> > > > >use bandwidth efficiently, weight policy tries to give spare bandwidth
> > > > >of inactive groups to active groups.
> > > > 
> > > > This sounds good, except that the lack of anticipation
> > > > means that a group with just one task doing reads will
> > > > be considered "inactive" in-between reads.
> > > > 
> > > 
> > >   anticipation helps in achieving fairness, but CFQ currently disables
> > > idling for nonrot+NCQ media, to avoid the resulting throughput loss on
> > > some SSDs.  Are we really sure that we want to introduce anticipation
> > > everywhere, not only to improve throughput on rotational media, but to
> > > achieve fairness too?
> > 
> > That's a good point. Personally I think that fairness requirements for
> > individual queues and groups are little different. CFQ in general seems
> > to be focussing more on latency and throughput at the cost of fairness.
> > 
> > With groups, we probably need to put a greater amount of emphasis on group
> > fairness. So group will be a relatively a slower entity (with anticiaption
> > on and more idling), but it will also give you a greater amount of
> > isolation. So in practice, one will create groups carefully and they will
> > not proliferate like queues. This can mean overall reduced throughput on
> > SSD.
> > 
> 
> Ok, I personally agree on that, but I think it's something to be documented.
> 

Sure. I will document it in the documentation file.

> 
> > Having said that, group idling is tunable and one can always reduce it to
> > achieve a balance between fairness vs throughput depending on his need.
> > 
> 
> This is good, however tuning will not be an easy task (at least, in my
> experience with BFQ it has been a problem): while for throughput usually
> there are tradeoffs, as soon as a queue/group idles and then timeouts,
> from the fairness perspective the results soon become almost random
> (i.e., depending on the rate of successful anticipations, but in the
> common case they are unpredictable)...

I am lost in the last few lines. I guess you are suggesting that static
tuning is hard and that dynamically adjusting idling has the limitation
that it might not be accurate all the time?

I will explain how things work in the current set of io scheduler
patches.

Currently, on top of queue idling, I have also implemented group
idling. Queue idling is dynamic: an io scheduler like CFQ keeps track
of the traffic pattern on the queue and enables/disables idling
dynamically. So in this case fairness depends on the rate of successful
anticipations by the io scheduler.

Group idling is currently static in nature and implemented purely in
the elevator fair queuing layer. It kicks in only when a group is empty
at the time of queue expiration and the underlying io scheduler has not
chosen to enable idling on the queue. This gives us the guarantee that
a group will keep getting its fair share of the disk as long as a new
request arrives in the group within that idling period.

Implementing group idling ensures that it does not bog down the io
scheduler and that within-group queue switching can still be very fast
(no idling on many of the queues by cfq).

Now, in the SSD case, if group idling is really hurting somebody, I
would expect them to set it to either 1 or 0. You might get better
throughput, but then expect fairness for a group only if the group is
continuously backlogged. (Something the dm-ioband guys seem to be
doing.)

So do you think that adjusting this "group_idling" tunable is too
complicated and that there are better ways to handle it in the SSD+NCQ
case?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
@ 2009-09-09 17:30                     ` Vivek Goyal
  0 siblings, 0 replies; 80+ messages in thread
From: Vivek Goyal @ 2009-09-09 17:30 UTC (permalink / raw)
  To: Fabio Checconi
  Cc: Rik van Riel, guijianfeng, linux-kernel, jmoyer, dm-devel,
	nauman, jens.axboe, akpm, agk, balbir

On Wed, Sep 09, 2009 at 05:41:26PM +0200, Fabio Checconi wrote:
> > From: Vivek Goyal <vgoyal@redhat.com>
> > Date: Tue, Sep 08, 2009 10:06:20PM -0400
> >
> > On Wed, Sep 09, 2009 at 02:09:00AM +0200, Fabio Checconi wrote:
> > > Hi,
> > > 
> > > > From: Rik van Riel <riel@redhat.com>
> > > > Date: Tue, Sep 08, 2009 03:24:08PM -0400
> > > >
> > > > Ryo Tsuruta wrote:
> > > > >Rik van Riel <riel@redhat.com> wrote:
> > > > 
> > > > >>Are you saying that dm-ioband is purposely unfair,
> > > > >>until a certain load level is reached?
> > > > >
> > > > >Not unfair, dm-ioband(weight policy) is intentionally designed to
> > > > >use bandwidth efficiently, weight policy tries to give spare bandwidth
> > > > >of inactive groups to active groups.
> > > > 
> > > > This sounds good, except that the lack of anticipation
> > > > means that a group with just one task doing reads will
> > > > be considered "inactive" in-between reads.
> > > > 
> > > 
> > >   anticipation helps in achieving fairness, but CFQ currently disables
> > > idling for nonrot+NCQ media, to avoid the resulting throughput loss on
> > > some SSDs.  Are we really sure that we want to introduce anticipation
> > > everywhere, not only to improve throughput on rotational media, but to
> > > achieve fairness too?
> > 
> > That's a good point. Personally I think that fairness requirements for
> > individual queues and groups are little different. CFQ in general seems
> > to be focussing more on latency and throughput at the cost of fairness.
> > 
> > With groups, we probably need to put a greater amount of emphasis on group
> > fairness. So group will be a relatively a slower entity (with anticiaption
> > on and more idling), but it will also give you a greater amount of
> > isolation. So in practice, one will create groups carefully and they will
> > not proliferate like queues. This can mean overall reduced throughput on
> > SSD.
> > 
> 
> Ok, I personally agree on that, but I think it's something to be documented.
> 

Sure. I will document it in the documentation file.

> 
> > Having said that, group idling is tunable and one can always reduce it to
> > achieve a balance between fairness vs throughput depending on his need.
> > 
> 
> This is good, however tuning will not be an easy task (at least, in my
> experience with BFQ it has been a problem): while for throughput usually
> there are tradeoffs, as soon as a queue/group idles and then timeouts,
> from the fairness perspective the results soon become almost random
> (i.e., depending on the rate of successful anticipations, but in the
> common case they are unpredictable)...

I am lost in the last few lines. I guess you are suggesting that static
tuning is hard and that dynamically adjusting idling has the limitation
that it might not be accurate all the time?

I will explain how things work in the current set of io scheduler
patches.

Currently, on top of queue idling, I have also implemented group
idling. Queue idling is dynamic: an io scheduler like CFQ keeps track
of the traffic pattern on the queue and enables/disables idling
dynamically. So in this case fairness depends on the rate of successful
anticipations by the io scheduler.

Group idling is currently static in nature and implemented purely in
the elevator fair queuing layer. It kicks in only when a group is empty
at the time of queue expiration and the underlying io scheduler has not
chosen to enable idling on the queue. This gives us the guarantee that
a group will keep getting its fair share of the disk as long as a new
request arrives in the group within that idling period.

Implementing group idling ensures that it does not bog down the io
scheduler and that within-group queue switching can still be very fast
(no idling on many of the queues by cfq).

Now, in the SSD case, if group idling is really hurting somebody, I
would expect them to set it to either 1 or 0. You might get better
throughput, but then expect fairness for a group only if the group is
continuously backlogged. (Something the dm-ioband guys seem to be
doing.)

So do you think that adjusting this "group_idling" tunable is too
complicated and that there are better ways to handle it in the SSD+NCQ
case?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-09 17:30                     ` Vivek Goyal
  (?)
@ 2009-09-09 19:01                     ` Fabio Checconi
  -1 siblings, 0 replies; 80+ messages in thread
From: Fabio Checconi @ 2009-09-09 19:01 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Rik van Riel, Ryo Tsuruta, linux-kernel, dm-devel, jens.axboe,
	agk, akpm, nauman, guijianfeng, jmoyer, balbir

> From: Vivek Goyal <vgoyal@redhat.com>
> Date: Wed, Sep 09, 2009 01:30:03PM -0400
>
> On Wed, Sep 09, 2009 at 05:41:26PM +0200, Fabio Checconi wrote:
> > > From: Vivek Goyal <vgoyal@redhat.com>
> > > Date: Tue, Sep 08, 2009 10:06:20PM -0400
> > >
...
> > This is good, however tuning will not be an easy task (at least, in my
> > experience with BFQ it has been a problem): while for throughput usually
> > there are tradeoffs, as soon as a queue/group idles and then timeouts,
> > from the fairness perspective the results soon become almost random
> > (i.e., depending on the rate of successful anticipations, but in the
> > common case they are unpredictable)...
> 
> I am lost in last few lines. I guess you are suggesting that static tuning
> is hard and dynamically adjusting idling has limitations that it might not
> be accurate all the time?
> 

Yes, this was the problem, at least for me.  As soon as there were
unsuccessful anticipations there was no graceful degradation of
fairness, and bandwidth distribution became almost random.  In
this situation all the complexity of CFQ/BFQ/io-controller seems
overkill; NCQ+SSD is or will be quite a common usage scenario
triggering it.


> I will explain how things are working in current set of io scheduler
> patches.
> 
> Currently on top of queue idling, I have implemented group idling also.
> Queue idling is dynamic and io scheduler like CFQ keeps track of
> traffic pattern on the queue and disables/enables idling dynamically. So
> in this case fairness depends on rate of successful anticipations by the
> io scheduler.
> 
> Group idling currently is static in nature and purely implemented in
> elevator fair queuing layer. Group idling kicks in only when a group is
> empty at the time of queue expiration and underlying ioscheduler has not
> chosen to enable idling on the queue. This provides us the gurantee that
> group will keep on getting its fair share of disk as long as a new request
> comes in the group with-in that idling period.
> 
> Implementing group idling ensures that it does not bog down the io scheduler
> and with-in group queue switching can still be very fast (no idling on many of
> the queues by cfq).
> 
> Now in case of SSD if group idling is really hurting somebody, I would
> expect him to set it to either 1 or 0. You might get better throughput
> but then expect fairness for the group only if the group is continuously
> backlogged. (Something what dm-ioband guys seem to be doing).
> 
> So do you think that adjusting this "group_idling" tunable is too
> complicated and there are better ways to handle it in case of SSD+NCQ?
> 

Unfortunately I am not aware of any reasonable, working method to handle
this issue properly; in any case, adjusting the tunable is something that
needs a lot of care.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-09 13:57                   ` Vivek Goyal
  (?)
@ 2009-09-10  3:06                   ` Ryo Tsuruta
  -1 siblings, 0 replies; 80+ messages in thread
From: Ryo Tsuruta @ 2009-09-10  3:06 UTC (permalink / raw)
  To: vgoyal
  Cc: dhaval, riel, linux-kernel, dm-devel, jens.axboe, agk, akpm,
	nauman, guijianfeng, jmoyer, balbir

Hi Vivek,

Vivek Goyal <vgoyal@redhat.com> wrote:
> - Secondly, personally I think it more twisted also. Following is the
>   small code to setup two ioband devices ioband1 and ioband2 and two
>   additional groups on ioband1 device using cgroup interface.
> 

With the latest dm-ioband and blkio-cgroup, configuration can be done
through the cgroup interface: once a dm-ioband device is created, a
blkio.settings file appears under the cgroup directory. There is no need
to run the "dmsetup message" command or care about the blkio.id anymore.

The following is an example script based on yours.

***********************************************************************
echo "0 $(blockdev --getsize /dev/sdd1) ioband /dev/sdd1 1 0 0 cgroup"
 "weight 0 :100" | dmsetup create ioband1

mount -t cgroup -o blkio hier1 /cgroup/ioband
mkdir /cgroup/ioband/test1 /cgroup/ioband/test2

echo ioband1 200 > /cgroup/ioband/test1/blkio.settings
echo ioband1 100 > /cgroup/ioband/test2/blkio.settings

mount /dev/mapper/ioband1 /mnt/sdd1
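
To have a task's IO accounted to one of the groups, attach the task (or
its parent shell) through the standard cgroup tasks file; the last command
assumes blkio.settings can be read back as well as written:

# run a reader with its IO classified into the test1 group
echo $$ > /cgroup/ioband/test1/tasks
dd if=/mnt/sdd1/testzerofile1 of=/dev/null &

# check the configured weight (assumed readable)
cat /cgroup/ioband/test1/blkio.settings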

> For status of various settings one needs to use "dmsetup status" and
> "dmsetup table" commands. Look at the output of these commands with just
> two groups. Output for all the groups is on a single line. Think of the
> situation when there are 7-8 groups and how bad it will look.
> 
> #dmsetup status
> ioband2: 0 40355280 ioband 1 -1 105 0 834 1 0 8
> ioband1: 0 37768752 ioband 1 -1 105 0 834 1 0 8 2 0 0 0 0 0 0 3 0 0 0 0 0 0

I'll provide a blkio.stat file for per-cgroup statistics in the next
release.

> > Once we create a dm-ioband device, the device can be configured
> > through the cgroup interface. I think it will not severly limit its
> > usefulness.
> 
> To create the device once you need dm-tools and libcgroup needs to learn 
> how to make various use of various dm commands. It also needs to learn how
> to parse outputs of "dmsetup table" and "dmsetup status" commands and
> consolidate that information.
> 
> This is despite the fact that it is using cgroup interface finally to
> group the task. But libcgroup still need to propagate cgroup id to
> individual ioband devices.

We still need to use dmsetup for the device creation, but it is not too
much pain. I think it would be better if dm-ioband were integrated into
LVM; then we could handle dm-ioband devices in almost the same manner as
other LV devices.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-09 14:31                 ` Vivek Goyal
  (?)
@ 2009-09-10  3:45                 ` Ryo Tsuruta
  2009-09-10 13:25                     ` Vivek Goyal
  -1 siblings, 1 reply; 80+ messages in thread
From: Ryo Tsuruta @ 2009-09-10  3:45 UTC (permalink / raw)
  To: vgoyal
  Cc: riel, linux-kernel, dm-devel, jens.axboe, agk, akpm, nauman,
	guijianfeng, jmoyer, balbir

Hi Vivek,

Vivek Goyal <vgoyal@redhat.com> wrote:
> > In addition,
> > there are devices which doesn't use standard IO schedulers, and
> > dm-ioband can work on even such devices.
> 
> This is a interesting use case. Few thoughts.
> 
> - Can't io scheduling mechanism of these devices make use of elevator and
>   elevator fair queuing interfaces to take advantage of io controlling
>   mechanism. It should not be too difficult. Look at noop. It has
>   just 131 lines of code and it now supports hierarchical io scheduling.
>  
>   This will come with request queue and its merging and plug/unplug
>   mechanism. Is that an issue?
>
> - If not, then yes, for these corner cases, io scheduler based controller
>   does not work as it is.

I have an extremely fast SSD whose device driver provides its own
make_request_fn(), so the driver intercepts IO requests and all subsequent
processing is done within the driver itself.
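
Since the driver is bio-based, the kernel never attaches an elevator to
its queue; depending on the kernel version, the queue/scheduler attribute
is either absent or reads "none". For example (with <device> being the
SSD's block device name):

cat /sys/block/<device>/queue/scheduler 2>/dev/null \
    || echo "no queue/scheduler attribute for this device"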

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-09 10:51                 ` Dhaval Giani
@ 2009-09-10  7:58                     ` Ryo Tsuruta
  0 siblings, 0 replies; 80+ messages in thread
From: Ryo Tsuruta @ 2009-09-10  7:58 UTC (permalink / raw)
  To: dhaval
  Cc: vgoyal, riel, linux-kernel, dm-devel, jens.axboe, agk, akpm,
	nauman, guijianfeng, jmoyer, balbir

Hi,

Dhaval Giani <dhaval@linux.vnet.ibm.com> wrote:
> > I know that cgroup is a very well defined functionality, that is why
> > dm-ioband also supports throttling per cgroup. But how are we supposed
> > to do throttling on the system which doesn't support cgroup?
> > As I wrote in another mail to Vivek, I would like to make use of
> > dm-ioband on RHEL 5.x. 
> 
> Hi Ryo,
> 
> I am not sure that upstream should really be worrying about RHEL 5.x.
> cgroups is a relatively mature solution and is available in most (if not
> all) community distros today. We really should not be looking at another
> grouping solution if the sole reason is that then dm-ioband can be used
> on RHEL 5.x. The correct solution would be to maintain a separate patch
> for RHEL 5.x then and not to burden the upstream kernel.

RHEL 5.x is not the sole reason for that.

> > And I don't think that the grouping methods are not complicated, just
> > stack a new device on the existing device and assign bandwidth to it,
> > that is the same method as other device-mapper targets, if you would
> > like to assign bandwidth per thread, then register the thread's ID to
> > the device and assign bandwidth to it as well. I don't think it makes
> > users confused.
> > 
> > > I would tend to agree with this. With other resource management
> > > controllers using cgroups, having dm-ioband use something different will
> > > require a different set of userspace tools/libraries to be used.
> > > Something that will severly limit its usefulness froma programmer's
> > > perspective.
> > 
> > Once we create a dm-ioband device, the device can be configured
> > through the cgroup interface. I think it will not severly limit its
> > usefulness.
> > 
> 
> My objection is slightly different. My objection is that there are too
> many interfaces to do the same thing.

Not too many; there are only two interfaces: device-mapper and cgroup.

> Which one of these is the recommended one? 

I think it is up to users; whichever one they like.

> WHich one is going to be supported?

Both the device-mapper and cgroup interfaces will continue to be supported.

> If we say that cgroups is not the preferred interface, do the
> application developers need to use yet another library for io
> control along with cpu/memory control?

I'm not saying that cgroup is not the preferred interface, especially
when dm-ioband is used together with the cpu/memory controllers. Once a
dm-ioband device is created, all configuration for the device can be done
through the cgroup interface like any other cgroup subsystem. It does not
require a different set of userspace tools/libraries.

I think it is natural for dm-ioband to also be configurable through the
device-mapper interface, since dm-ioband is a device-mapper driver, and
that extends its use cases rather than limiting its usefulness.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-10  3:45                 ` Ryo Tsuruta
@ 2009-09-10 13:25                     ` Vivek Goyal
  0 siblings, 0 replies; 80+ messages in thread
From: Vivek Goyal @ 2009-09-10 13:25 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: riel, linux-kernel, dm-devel, jens.axboe, agk, akpm, nauman,
	guijianfeng, jmoyer, balbir

On Thu, Sep 10, 2009 at 12:45:47PM +0900, Ryo Tsuruta wrote:
> Hi Vivek,
> 
> Vivek Goyal <vgoyal@redhat.com> wrote:
> > > In addition,
> > > there are devices which doesn't use standard IO schedulers, and
> > > dm-ioband can work on even such devices.
> > 
> > This is a interesting use case. Few thoughts.
> > 
> > - Can't io scheduling mechanism of these devices make use of elevator and
> >   elevator fair queuing interfaces to take advantage of io controlling
> >   mechanism. It should not be too difficult. Look at noop. It has
> >   just 131 lines of code and it now supports hierarchical io scheduling.
> >  
> >   This will come with request queue and its merging and plug/unplug
> >   mechanism. Is that an issue?
> >
> > - If not, then yes, for these corner cases, io scheduler based controller
> >   does not work as it is.
> 
> I have a extreme fast SSD and its device driver provides its own
> make_request_fn(). So the device driver intercepts IO requests and the
> subsequent processes are done within it.

IMHO, in those cases such SSD drivers need to hook into the block layer's
request queue mechanism if they want an IO controlling mechanism, instead
of us coming up with a device-mapper module.

Think about it: if somebody needs CFQ-like task classes and prio
supported on these devices, should we also come up with yet another
device mapper module, "dm-cfq"?

Jens, I am wondering whether similar concerns have popped up in the past
for CFQ as well? Somebody asking to support task prio and classes on
devices which don't use a standard IO scheduler?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-10  7:58                     ` Ryo Tsuruta
  (?)
@ 2009-09-11  9:53                     ` Dhaval Giani
  2009-09-15 15:12                         ` Ryo Tsuruta
  -1 siblings, 1 reply; 80+ messages in thread
From: Dhaval Giani @ 2009-09-11  9:53 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: vgoyal, riel, linux-kernel, dm-devel, jens.axboe, agk, akpm,
	nauman, guijianfeng, jmoyer, balbir

On Thu, Sep 10, 2009 at 04:58:49PM +0900, Ryo Tsuruta wrote:
> Hi,
> 
> Dhaval Giani <dhaval@linux.vnet.ibm.com> wrote:
> > > I know that cgroup is a very well defined functionality, that is why
> > > dm-ioband also supports throttling per cgroup. But how are we supposed
> > > to do throttling on the system which doesn't support cgroup?
> > > As I wrote in another mail to Vivek, I would like to make use of
> > > dm-ioband on RHEL 5.x. 
> > 
> > Hi Ryo,
> > 
> > I am not sure that upstream should really be worrying about RHEL 5.x.
> > cgroups is a relatively mature solution and is available in most (if not
> > all) community distros today. We really should not be looking at another
> > grouping solution if the sole reason is that then dm-ioband can be used
> > on RHEL 5.x. The correct solution would be to maintain a separate patch
> > for RHEL 5.x then and not to burden the upstream kernel.
> 
> RHEL 5.x is not the sole reason for that.
> 

Could you please enumerate the other reasons for pushing in another
grouping mechanism then? (Why can we not resolve them via cgroups?)

Thanks,
-- 
regards,
Dhaval

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-11  9:53                     ` Dhaval Giani
@ 2009-09-15 15:12                         ` Ryo Tsuruta
  0 siblings, 0 replies; 80+ messages in thread
From: Ryo Tsuruta @ 2009-09-15 15:12 UTC (permalink / raw)
  To: dhaval
  Cc: vgoyal, riel, linux-kernel, dm-devel, jens.axboe, agk, akpm,
	nauman, guijianfeng, jmoyer, balbir

Hi Dhaval,

Dhaval Giani <dhaval@linux.vnet.ibm.com> wrote:
> > Dhaval Giani <dhaval@linux.vnet.ibm.com> wrote:
> > > > I know that cgroup is a very well defined functionality, that is why
> > > > dm-ioband also supports throttling per cgroup. But how are we supposed
> > > > to do throttling on the system which doesn't support cgroup?
> > > > As I wrote in another mail to Vivek, I would like to make use of
> > > > dm-ioband on RHEL 5.x. 
> > > 
> > > Hi Ryo,
> > > 
> > > I am not sure that upstream should really be worrying about RHEL 5.x.
> > > cgroups is a relatively mature solution and is available in most (if not
> > > all) community distros today. We really should not be looking at another
> > > grouping solution if the sole reason is that then dm-ioband can be used
> > > on RHEL 5.x. The correct solution would be to maintain a separate patch
> > > for RHEL 5.x then and not to burden the upstream kernel.
> > 
> > RHEL 5.x is not the sole reason for that.
> > 
> 
> Could you please enumerate the other reasons for pushing in another
> grouping mechanism then? (Why can we not resolve them via cgroups?)

I'm sorry for the late reply.

I'm not pushing only the dmsetup-based grouping mechanism. Please
understand that dm-ioband also provides a cgroup interface and can be
configured in the same manner as other cgroup subsystems.
Why is it so bad to have multiple ways to configure it? I think it
rather adds flexibility of configuration.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-15 15:12                         ` Ryo Tsuruta
@ 2009-09-15 15:19                           ` Balbir Singh
  -1 siblings, 0 replies; 80+ messages in thread
From: Balbir Singh @ 2009-09-15 15:19 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: dhaval, vgoyal, riel, linux-kernel, dm-devel, jens.axboe, agk,
	akpm, nauman, guijianfeng, jmoyer

* Ryo Tsuruta <ryov@valinux.co.jp> [2009-09-16 00:12:37]:

> Hi Dhaval,
> 
> Dhaval Giani <dhaval@linux.vnet.ibm.com> wrote:
> > > Dhaval Giani <dhaval@linux.vnet.ibm.com> wrote:
> > > > > I know that cgroup is a very well defined functionality, that is why
> > > > > dm-ioband also supports throttling per cgroup. But how are we supposed
> > > > > to do throttling on the system which doesn't support cgroup?
> > > > > As I wrote in another mail to Vivek, I would like to make use of
> > > > > dm-ioband on RHEL 5.x. 
> > > > 
> > > > Hi Ryo,
> > > > 
> > > > I am not sure that upstream should really be worrying about RHEL 5.x.
> > > > cgroups is a relatively mature solution and is available in most (if not
> > > > all) community distros today. We really should not be looking at another
> > > > grouping solution if the sole reason is that then dm-ioband can be used
> > > > on RHEL 5.x. The correct solution would be to maintain a separate patch
> > > > for RHEL 5.x then and not to burden the upstream kernel.
> > > 
> > > RHEL 5.x is not the sole reason for that.
> > > 
> > 
> > Could you please enumerate the other reasons for pushing in another
> > grouping mechanism then? (Why can we not resolve them via cgroups?)
> 
> I'm sorry for late reply.
> 
> I'm not only pushing in the grouping mechanism by using the dmsetup
> command. Please understand that dm-ioband also provides cgroup
> interface and can be configured in the same manner like other cgroup
> subsystems.
> Why it is so bad to have multiple ways to configure? I think that it
> rather gains in flexibility of configurations.
>

The main issue I see is user confusion and distro issues. If a distro
compiles in cgroups and ships dmsetup, so that both methods are available,
which method do we recommend to end users? Also, should system management
tools support two configuration mechanisms for the same functionality?
 

-- 
	Balbir

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-15 15:19                           ` Balbir Singh
@ 2009-09-15 15:58                             ` Rik van Riel
  -1 siblings, 0 replies; 80+ messages in thread
From: Rik van Riel @ 2009-09-15 15:58 UTC (permalink / raw)
  To: balbir
  Cc: Ryo Tsuruta, dhaval, vgoyal, linux-kernel, dm-devel, jens.axboe,
	agk, akpm, nauman, guijianfeng, jmoyer

Balbir Singh wrote:
> * Ryo Tsuruta <ryov@valinux.co.jp> [2009-09-16 00:12:37]:

>> Why it is so bad to have multiple ways to configure? I think that it
>> rather gains in flexibility of configurations.
>>
> 
> The main issue I see is user confusion and distro issues. If a distro
> compiles cgroups and dmsetup provides both methods, what method
> do we recommend to end users? Also should system management tool
> support two configuration mechanisms for the same functionality?

It gets worse.

If the distro sets up things via cgroups and the admin tries
to use dmsetup - how does the configuration propagate between
the two mechanisms?

The sysadmin would expect that any changes made via dmsetup
will become visible via the config tools (that use cgroups),
too.

This will quickly increase the code requirements to ridiculous
proportions - or leave sysadmins confused and annoyed.

Neither is a good option, IMHO.

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: Regarding dm-ioband tests
  2009-09-15 15:19                           ` Balbir Singh
@ 2009-09-15 16:21                             ` Ryo Tsuruta
  -1 siblings, 0 replies; 80+ messages in thread
From: Ryo Tsuruta @ 2009-09-15 16:21 UTC (permalink / raw)
  To: balbir
  Cc: dhaval, vgoyal, riel, linux-kernel, dm-devel, jens.axboe, agk,
	akpm, nauman, guijianfeng, jmoyer

Hi Balbir,

Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * Ryo Tsuruta <ryov@valinux.co.jp> [2009-09-16 00:12:37]:
> 
> > Hi Dhaval,
> > 
> > Dhaval Giani <dhaval@linux.vnet.ibm.com> wrote:
> > > > Dhaval Giani <dhaval@linux.vnet.ibm.com> wrote:
> > > > > > I know that cgroup is a very well defined functionality, that is why
> > > > > > dm-ioband also supports throttling per cgroup. But how are we supposed
> > > > > > to do throttling on the system which doesn't support cgroup?
> > > > > > As I wrote in another mail to Vivek, I would like to make use of
> > > > > > dm-ioband on RHEL 5.x. 
> > > > > 
> > > > > Hi Ryo,
> > > > > 
> > > > > I am not sure that upstream should really be worrying about RHEL 5.x.
> > > > > cgroups is a relatively mature solution and is available in most (if not
> > > > > all) community distros today. We really should not be looking at another
> > > > > grouping solution if the sole reason is that then dm-ioband can be used
> > > > > on RHEL 5.x. The correct solution would be to maintain a separate patch
> > > > > for RHEL 5.x then and not to burden the upstream kernel.
> > > > 
> > > > RHEL 5.x is not the sole reason for that.
> > > > 
> > > 
> > > Could you please enumerate the other reasons for pushing in another
> > > grouping mechanism then? (Why can we not resolve them via cgroups?)
> > 
> > I'm sorry for late reply.
> > 
> > I'm not only pushing in the grouping mechanism by using the dmsetup
> > command. Please understand that dm-ioband also provides cgroup
> > interface and can be configured in the same manner like other cgroup
> > subsystems.
> > Why it is so bad to have multiple ways to configure? I think that it
> > rather gains in flexibility of configurations.
> >
> 
> The main issue I see is user confusion and distro issues. If a distro
> compiles cgroups and dmsetup provides both methods, what method
> do we recommend to end users? Also should system management tool
> support two configuration mechanisms for the same functionality?

I think that it is up to users which mechanism they choose, and the kind
of users who can use dmsetup or the cgroup interface directly will not be
confused in such a situation.

I also think that management tools are needed for end users, and if a
distro supports cgroups, I recommend that the management tools configure
dm-ioband through cgroups, because dm-ioband is more usable when used
together with blkio-cgroup and the memory cgroup.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 80+ messages in thread

* dm-ioband fairness in terms of sectors seems to be killing disk (Was: Re: Regarding dm-ioband tests)
  2009-09-04  1:12     ` Ryo Tsuruta
@ 2009-09-15 21:40         ` Vivek Goyal
  0 siblings, 0 replies; 80+ messages in thread
From: Vivek Goyal @ 2009-09-15 21:40 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: linux-kernel, dm-devel, dhaval, jens.axboe, agk, akpm, nauman,
	guijianfeng, jmoyer

On Fri, Sep 04, 2009 at 10:12:22AM +0900, Ryo Tsuruta wrote:
> Hi Vivek,
> 
> Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Tue, Sep 01, 2009 at 01:47:24PM -0400, Vivek Goyal wrote:
> > > On Tue, Sep 01, 2009 at 12:50:11PM -0400, Vivek Goyal wrote:
> > > > Hi Ryo,
> > > > 
> > > > I decided to play a bit more with dm-ioband and started doing some
> > > > testing. I am doing a simple two dd threads doing reads and don't seem
> > > > to be gettting the fairness. So thought will ask you what's the issue
> > > > here. Is there an issue with my testing procedure.
> > > > 
> > > > I got one 40G SATA drive (no hardware queuing). I have created two
> > > > partitions on that disk /dev/sdd1 and /dev/sdd2 and created two ioband
> > > > devices ioband1 and ioband2 on partitions sdd1 and sdd2 respectively. The
> > > > weights of ioband1 and ioband2 devices are 200 and 100 respectively. 
> > > > 
> > > > I am assuming that this setup will create two default groups and IO
> > > > going to partition sdd1 should get double the BW of partition sdd2.
> > > > 
> > > > But it looks like I am not gettting that behavior. Following is the output
> > > > of "dmsetup table" command. This snapshot has been taken every 2 seconds
> > > > while IO was going on. Column 9 seems to be containing how many sectors
> > > > of IO has been done on a particular io band device and group. Looking at
> > > > the snapshot, it does not look like that ioband1 default group got double
> > > > the BW of ioband2 default group.  
> > > > 
> > > > Am I doing something wrong here?
> > > > 
> > > 
> > 
> > Hi Ryo,
> > 
> > Did you get a chance to look into it? Am I doing something wrong or it is
> > an issue with dm-ioband.
> 
> Sorry, I missed it. I'll look into it and report back to you.

Hi Ryo,

I am running a sequential reader in one group and a few random readers
and writers in a second group. Both groups have the same weight. I ran the
fio scripts for 60 seconds and then looked at the output. In this case it
looks like we just kill the throughput of the sequential reader and of the
disk (because the random readers/writers take over).

I ran the test "with-dm-ioband", "without-dm-ioband" and "with io
scheduler based io controller".

First I am pasting the results and at the end I will paste my test
scripts. I have cut the fio output heavily so that we do not get lost in
lots of output.

with-dm-ioband
==============

ioband1
-------
randread: (groupid=0, jobs=4): err= 0: pid=3610
  read : io=18,432KiB, bw=314KiB/s, iops=76, runt= 60076msec
    clat (usec): min=140, max=744K, avg=50866.75, stdev=61266.88

randwrite: (groupid=1, jobs=2): err= 0: pid=3614
  write: io=920KiB, bw=15KiB/s, iops=3, runt= 60098msec
    clat (usec): min=203, max=14,171K, avg=522937.86, stdev=960929.44

ioband2
-------
seqread0: (groupid=0, jobs=1): err= 0: pid=3609
  read : io=37,904KiB, bw=636KiB/s, iops=155, runt= 61026msec
    clat (usec): min=92, max=9,969K, avg=6437.89, stdev=168573.23

without dm-ioband (vanilla cfq, no grouping)
============================================
seqread0: (groupid=0, jobs=1): err= 0: pid=3969
  read : io=321MiB, bw=5,598KiB/s, iops=1,366, runt= 60104msec
    clat (usec): min=91, max=763K, avg=729.61, stdev=17402.63

randread: (groupid=0, jobs=4): err= 0: pid=3970
  read : io=15,112KiB, bw=257KiB/s, iops=62, runt= 60039msec
    clat (usec): min=124, max=1,066K, avg=63721.26, stdev=78215.17

randwrite: (groupid=1, jobs=2): err= 0: pid=3974
  write: io=680KiB, bw=11KiB/s, iops=2, runt= 60073msec
    clat (usec): min=199, max=24,646K, avg=706719.51, stdev=1774887.55

With ioscheduer based io controller patches
===========================================
cgroup 1 (weight 100)
---------------------
randread: (groupid=0, jobs=4): err= 0: pid=2995
  read : io=9,484KiB, bw=161KiB/s, iops=39, runt= 60107msec
    clat (msec): min=1, max=2,167, avg=95.47, stdev=131.60

randwrite: (groupid=1, jobs=2): err= 0: pid=2999
  write: io=2,692KiB, bw=45KiB/s, iops=11, runt= 60131msec
    clat (usec): min=199, max=30,043K, avg=178710.05, stdev=1281485.75

cgroup 2 (weight 100)
--------------------
seqread0: (groupid=0, jobs=1): err= 0: pid=2993
  read : io=547MiB, bw=9,556KiB/s, iops=2,333, runt= 60043msec
    clat (usec): min=92, max=224K, avg=426.74, stdev=5734.12

Note the BW of the sequential reader in the three cases
(636KiB/s, 5,598KiB/s, 9,556KiB/s). dm-ioband tries to provide fairness in
terms of number of sectors and it completely kills the disk throughput.

With the io scheduler based io controller, we see increased throughput for
the sequential reader as compared to plain CFQ, because now the random
readers are running in a separate group and hence the sequential reader is
isolated from them.

Here are my fio jobs
--------------------
First fio job file
-----------------
[global]
runtime=60

[randread]
rw=randread
size=2G
iodepth=20
directory=/mnt/sdd1/fio/
direct=1
numjobs=4
group_reporting

[randwrite]
rw=randwrite
size=1G
iodepth=20
directory=/mnt/sdd1/fio/
group_reporting
direct=1
numjobs=2

Second fio job file
-------------------
[global]
runtime=60
rw=read
size=4G
directory=/mnt/sdd2/fio/
direct=1

[seqread0]
numjobs=1
group_reporting
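
For reference, the two job files are simply launched concurrently, one
against each ioband device (the job file names below are made up for
illustration):

fio randrw-sdd1.fio &
fio seqread-sdd2.fio &
wait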

Thanks
Vivek

^ permalink raw reply	[flat|nested] 80+ messages in thread

* ioband: Writer starves reader even without competitors (Re: Regarding dm-ioband tests)
  2009-09-08 17:54                   ` Vivek Goyal
@ 2009-09-15 23:37                     ` Vivek Goyal
  -1 siblings, 0 replies; 80+ messages in thread
From: Vivek Goyal @ 2009-09-15 23:37 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: Nauman Rafique, linux-kernel, dm-devel, jens.axboe, agk, akpm,
	guijianfeng, jmoyer, balbir, Rik Van Riel

On Tue, Sep 08, 2009 at 01:54:00PM -0400, Vivek Goyal wrote:

[..]
> I ran a test to show how readers can be starved in certain cases. I launched
> one reader and three writers. I ran this test twice. First without dm-ioband
> and then with dm-ioband.
> 
> Following are few lines from the script to launch readers and writers.
> 
> **************************************************************
> sync
> echo 3 > /proc/sys/vm/drop_caches
> 
> # Launch writers on sdd2
> dd if=/dev/zero of=/mnt/sdd2/writezerofile1 bs=4K count=262144 &
> 
> # Launch  writers on sdd1
> dd if=/dev/zero of=/mnt/sdd1/writezerofile1 bs=4K count=262144 &
> dd if=/dev/zero of=/mnt/sdd1/writezerofile2 bs=4K count=262144 &
> 
> echo "sleeping for 5 seconds"
> sleep 5
> 
> # launch reader on sdd1
> time dd if=/mnt/sdd1/testzerofile1 of=/dev/zero &
> echo "launched reader $!"
> *********************************************************************
> 
> Without dm-ioband, reader finished in roughly 5 seconds.
> 
> 289533952 bytes (290 MB) copied, 5.16765 s, 56.0 MB/s
> real	0m5.300s
> user	0m0.098s
> sys	0m0.492s
> 
> With dm-ioband, reader took, more than 2 minutes to finish.
> 
> 289533952 bytes (290 MB) copied, 122.386 s, 2.4 MB/s
> 
> real	2m2.569s
> user	0m0.107s
> sys	0m0.548s
> 
> I had created ioband1 on /dev/sdd1 and ioband2 on /dev/sdd2 with weights
> 200 and 100 respectively.

Hi Ryo,

I notice that within a single ioband device, a single writer starves the
reader even without any competitor groups being present.

I ran the following two tests with and without dm-ioband devices.

Test1
====
Try to use fio for a sequential reader job. First, fio will lay out the
file and do the write operation. While those writes are going on, try to
do an ls on that partition and observe the latency of the ls operation.

with dm-ioband (ls test)
------------------------
# cd /mnt/sdd2
# time ls

real    0m9.483s
user    0m0.000s
sys     0m0.002s

without dm-ioband (ls test)
---------------------------
# cd /mnt/sdd2
# time ls

256M-file1  256M-file5  2G-file1  2G-file5    writefile1  writezerofile
256M-file2  256M-file6  2G-file2  files       writefile2
256M-file3  256M-file7  2G-file3  fio         writefile3
256M-file4  256M-file8  2G-file4  lost+found  writefile4

real    0m0.067s
user    0m0.000s
sys     0m0.002s

Notice the time the simple "ls" operation took in the two cases.


Test2
=====
Same case where fio is laying out a file; this time, try to read some
small files on that partition at an interval of 1 second.

small file read with dm-ioband
------------------------------
[root@chilli fairness-tests]# ./small-file-read.sh
file #   0, plain reading it took: 0.24 seconds
file #   1, plain reading it took: 13.40 seconds
file #   2, plain reading it took: 6.27 seconds
file #   3, plain reading it took: 13.84 seconds
file #   4, plain reading it took: 5.63 seconds


small file read with-out dm-ioband
==================================
[root@chilli fairness-tests]# ./small-file-read.sh
file #   0, plain reading it took: 0.04 seconds
file #   1, plain reading it took: 0.03 seconds
file #   2, plain reading it took: 0.04 seconds
file #   3, plain reading it took: 0.03 seconds
file #   4, plain reading it took: 0.03 seconds

Notice how small file read latencies have shot up.

Looks like a single writer is completely starving a reader even without
any IO going on in any of the other groups.

setup
=====
I created two ioband devices of weight 100 each on partitions /dev/sdd1
and /dev/sdd2 respectively. I am doing IO only on partition /dev/sdd2
(ioband2).

Following is fio job script.

[seqread]
runtime=60
rw=read
size=2G
directory=/mnt/sdd2/fio/
numjobs=1
group_reporting


Following is small file read script.

echo 3 > /proc/sys/vm/drop_caches

for ((i=0;i<5;i++)); do
        printf "file #%4d, plain reading it took: " $i
        /usr/bin/time -f "%e seconds"  cat /mnt/sdd2/files/$i >/dev/null
        sleep 1
done


Thanks
Vivek


^ permalink raw reply	[flat|nested] 80+ messages in thread

* ioband: Limited fairness and weak isolation between groups (Was: Re: Regarding dm-ioband tests)
  2009-09-07 11:02       ` Ryo Tsuruta
@ 2009-09-16  4:45         ` Vivek Goyal
  -1 siblings, 0 replies; 80+ messages in thread
From: Vivek Goyal @ 2009-09-16  4:45 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: linux-kernel, dm-devel, jens.axboe, agk, akpm, nauman,
	guijianfeng, riel, jmoyer, balbir

On Mon, Sep 07, 2009 at 08:02:22PM +0900, Ryo Tsuruta wrote:
> Hi Vivek,
> 
> Vivek Goyal <vgoyal@redhat.com> wrote:
> > > Thank you for testing dm-ioband. dm-ioband is designed to start
> > > throttling bandwidth when multiple IO requests are issued to devices
> > > simultaneously, IOW, to start throttling when IO load exceeds a
> > > certain level.
> > > 
> > 
> > What is that certain level? Secondly what's the advantage of this?
> > 
> > I can see disadvantages though. So unless a group is really busy "up to
> > that certain level" it will not get fairness? I breaks the isolation
> > between groups.
> 
> In your test case, at least more than one dd thread have to run
> simultaneously in the higher weight group. The reason is that
> if there is an IO group which does not issue a certain number of IO
> requests, dm-ioband assumes the IO group is inactive and assign its
> spare bandwidth to active IO groups. Then whole bandwidth of the
> device can be efficiently used. Please run two dd threads in the
> higher group, it will work as you expect.
> 
> However, if you want to get fairness in a case like this, a new
> bandwidth control policy which controls accurately according to
> assigned weights can be added to dm-ioband.
> 
> > I also ran your test of doing heavy IO in two groups. This time I am
> > running 4 dd threads in both the ioband devices. Following is the snapshot
> > of "dmsetup table" output.
> >
> > Fri Sep  4 17:45:27 EDT 2009
> > ioband2: 0 40355280 ioband 1 -1 0 0 0 0 0 0
> > ioband1: 0 37768752 ioband 1 -1 0 0 0 0 0 0
> > 
> > Fri Sep  4 17:45:29 EDT 2009
> > ioband2: 0 40355280 ioband 1 -1 41 0 4184 0 0 0
> > ioband1: 0 37768752 ioband 1 -1 173 0 20096 0 0 0
> > 
> > Fri Sep  4 17:45:37 EDT 2009
> > ioband2: 0 40355280 ioband 1 -1 1605 23 197976 0 0 0
> > ioband1: 0 37768752 ioband 1 -1 4640 1 583168 0 0 0
> > 
> > Fri Sep  4 17:45:45 EDT 2009
> > ioband2: 0 40355280 ioband 1 -1 3650 47 453488 0 0 0
> > ioband1: 0 37768752 ioband 1 -1 8572 1 1079144 0 0 0
> > 
> > Fri Sep  4 17:45:51 EDT 2009
> > ioband2: 0 40355280 ioband 1 -1 5111 68 635696 0 0 0
> > ioband1: 0 37768752 ioband 1 -1 11587 1 1459544 0 0 0
> > 
> > Fri Sep  4 17:45:53 EDT 2009
> > ioband2: 0 40355280 ioband 1 -1 5698 73 709272 0 0 0
> > ioband1: 0 37768752 ioband 1 -1 12503 1 1575112 0 0 0
> > 
> > Fri Sep  4 17:45:57 EDT 2009
> > ioband2: 0 40355280 ioband 1 -1 6790 87 845808 0 0 0
> > ioband1: 0 37768752 ioband 1 -1 14395 2 1813680 0 0 0
> > 
> > Note, it took me more than 20 seconds (since I started the threads) to
> > reach close to desired fairness level. That's too long a duration.
> 
> In the design of dm-ioband we prioritized reducing throughput loss
> over reducing this duration. Of course, it is possible to make a new
> policy which reduces the duration.

Not anticipating on rotational media and letting another group do the
dispatch is bad not only for the fairness of random readers; it also seems
to be bad for overall throughput. So letting other groups dispatch on the
assumption that it will boost throughput is not necessarily right on
rotational media.
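
The cost of giving up anticipation can also be seen with plain CFQ,
independent of dm-ioband, by disabling its idling and re-running a
reader-vs-writer mix. A minimal sketch, assuming the disk is sdd, cfq is
the active elevator, and the same mount points as in the script below (the
workload itself is illustrative):

# turn off CFQ idling (anticipation) on sdd, run a reader against a writer
echo 0 > /sys/block/sdd/queue/iosched/slice_idle
dd if=/mnt/sdd1/4G-file of=/dev/null &
dd if=/dev/zero of=/mnt/sdd2/writefile bs=4K &
sleep 20; killall dd > /dev/null 2>&1
# restore the default idle slice (8 ms)
echo 8 > /sys/block/sdd/queue/iosched/slice_idle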

I ran the following test. I created two groups of weight 100 each, put a
sequential dd reader in the first group and buffered writers in the second
group, let them run for 20 seconds, and observed how much work each group
got done in that time. I ran this test multiple times, increasing the
number of writers by one each time. I did the test both with dm-ioband and
with the io scheduler based io controller patches.

With dm-ioband
==============
launched reader 3176
launched 1 writers
waiting for 20 seconds
ioband2: 0 40355280 ioband 1 -1 159 0 1272 0 0 0
ioband1: 0 37768752 ioband 1 -1 13282 23 1673656 0 0 0
Total sectors transferred: 1674928

launched reader 3194
launched 2 writers
waiting for 20 seconds
ioband2: 0 40355280 ioband 1 -1 138 0 1104 54538 54081 436304
ioband1: 0 37768752 ioband 1 -1 4247 1 535056 0 0 0
Total sectors transferred: 972464

launched reader 3203
launched 3 writers
waiting for 20 seconds
ioband2: 0 40355280 ioband 1 -1 189 0 1512 44956 44572 359648
ioband1: 0 37768752 ioband 1 -1 3546 0 447128 0 0 0
Total sectors transferred: 808288

launched reader 3213
launched 4 writers
waiting for 20 seconds
ioband2: 0 40355280 ioband 1 -1 83 0 664 55937 55810 447496
ioband1: 0 37768752 ioband 1 -1 2243 0 282624 0 0 0
Total sectors transferred: 730784

launched reader 3224
launched 5 writers
waiting for 20 seconds
ioband2: 0 40355280 ioband 1 -1 179 0 1432 46544 46146 372352
ioband1: 0 37768752 ioband 1 -1 3348 0 422744 0 0 0
Total sectors transferred: 796528

launched reader 3236
launched 6 writers
waiting for 20 seconds
ioband2: 0 40355280 ioband 1 -1 176 0 1408 44499 44115 355992
ioband1: 0 37768752 ioband 1 -1 3998 0 504504 0 0 0
Total sectors transferred: 861904

launched reader 3250
launched 7 writers
waiting for 20 seconds
ioband2: 0 40355280 ioband 1 -1 451 0 3608 42267 42115 338136
ioband1: 0 37768752 ioband 1 -1 2682 0 337976 0 0 0
Total sectors transferred: 679720

With io scheduler based io controller
=====================================
launched reader 3026
launched 1 writers
waiting for 20 seconds
test1 statistics: time=8:48 8657   sectors=8:48 886112 dq=8:48 0
test2 statistics: time=8:48 7685   sectors=8:48 473384 dq=8:48 4
Total sectors transferred: 1359496

launched reader 3064
launched 2 writers
waiting for 20 seconds
test1 statistics: time=8:48 7429   sectors=8:48 856664 dq=8:48 0
test2 statistics: time=8:48 7431   sectors=8:48 376528 dq=8:48 0
Total sectors transferred: 1233192

launched reader 3094
launched 3 writers
waiting for 20 seconds
test1 statistics: time=8:48 7279   sectors=8:48 832840 dq=8:48 0
test2 statistics: time=8:48 7302   sectors=8:48 372120 dq=8:48 0
Total sectors transferred: 1204960

launched reader 3122
launched 4 writers
waiting for 20 seconds
test1 statistics: time=8:48 7291   sectors=8:48 846024 dq=8:48 0
test2 statistics: time=8:48 7314   sectors=8:48 361280 dq=8:48 0
Total sectors transferred: 1207304

launched reader 3151
launched 5 writers
waiting for 20 seconds
test1 statistics: time=8:48 7077   sectors=8:48 815184 dq=8:48 0
test2 statistics: time=8:48 7090   sectors=8:48 398472 dq=8:48 0
Total sectors transferred: 1213656

launched reader 3179
launched 6 writers
waiting for 20 seconds
test1 statistics: time=8:48 7494   sectors=8:48 873304 dq=8:48 1
test2 statistics: time=8:48 7034   sectors=8:48 316312 dq=8:48 2
Total sectors transferred: 1189616

launched reader 3209
launched 7 writers
waiting for 20 seconds
test1 statistics: time=8:48 6809   sectors=8:48 795528 dq=8:48 0
test2 statistics: time=8:48 6850   sectors=8:48 380008 dq=8:48 1
Total sectors transferred: 1175536

A few things stand out
======================
- With dm-ioband, as the number of writers in group 2 increased, bandwidth
  was given to those writers over the reads running in group 1. This had
  two bad effects: read throughput went down, and overall disk throughput
  went down as well.

  So the reader did not get fairness, and at the same time overall
  throughput dropped. Hence it is probably not a good idea to skip
  anticipation and always let other groups dispatch on rotational media.

  In contrast, the io scheduler based controller seems steady: the reader
  does not suffer as the number of writers in the second group increases,
  and overall disk throughput also remains stable.

Following is the sample script I used for the above test.

*******************************************************************
launch_writers() {
        nr_writers=$1
        # start nr_writers buffered dd writers on the ioband2 device
        for ((j=1;j<=$nr_writers;j++)); do
                dd if=/dev/zero of=/mnt/sdd2/writefile$j bs=4K &
        #       echo "launched writer $!"
        done
}

do_test () {
        nr_writers=$1
        # flush dirty data and drop the page cache so every run starts cold
        sync
        echo 3 > /proc/sys/vm/drop_caches

        # cycle the elevator (noop -> cfq) to start from a fresh cfq instance
        echo noop > /sys/block/sdd/queue/scheduler
        echo cfq > /sys/block/sdd/queue/scheduler

        # clear the dm-ioband per-group counters
        dmsetup message ioband1 0 reset
        dmsetup message ioband2 0 reset

        # launch a sequential reader in sdd1 (group 1)
        dd if=/mnt/sdd1/4G-file of=/dev/null &
        echo "launched reader $!"

        launch_writers $nr_writers
        echo "launched $nr_writers writers"
        echo "waiting for 20 seconds"
        sleep 20
        dmsetup status
        killall dd > /dev/null 2>&1
}

# run the test with 1 to 7 writers
for ((i=1;i<8;i++)); do
        do_test $i
        echo
done

*********************************************************************
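
The "Total sectors transferred" lines in the results above are not printed
by this script itself; they were obtained by summing the sectors-read and
sectors-written columns of both ioband devices. A helper along these lines
(equivalent, not necessarily the exact one used) produces them from the
status output, where fields 9 and 12 of each line are the per-group read
and write sector counts:

# sum read sectors ($9) and write sectors ($12) over both ioband devices
dmsetup status | grep '^ioband' | \
        awk '{ total += $9 + $12 } END { print "Total sectors transferred: " total }'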

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: dm-ioband fairness in terms of sectors seems to be killing disk
  2009-09-15 21:40         ` Vivek Goyal
@ 2009-09-16 11:10           ` Ryo Tsuruta
  -1 siblings, 0 replies; 80+ messages in thread
From: Ryo Tsuruta @ 2009-09-16 11:10 UTC (permalink / raw)
  To: vgoyal
  Cc: linux-kernel, dm-devel, dhaval, jens.axboe, agk, akpm, nauman,
	guijianfeng, jmoyer

Hi Vivek,

Vivek Goyal <vgoyal@redhat.com> wrote:
> Hi Ryo,
> 
> I am running a sequential reader in one group and a few random readers and
> writers in the second group. Both groups are of the same weight. I ran fio
> scripts for 60 seconds and then looked at the output. In this case it looks
> like we just kill the throughput of the sequential reader and the disk
> (because random readers/writers take over).

Thank you for testing dm-ioband. 

I ran your script in my environment, and here are the results.

                        Throughput  [KiB/s]
              vanilla     dm-ioband            dm-ioband   
                       (io-throttle = 4)  (io-throttle = 50)
  randread      312           392                 368
  randwrite      11            12                  10
  seqread      4341           651                1599

I ran the script on dm-ioband under two conditions: one with the
io-throttle option set to 4, and the other with it set to 50. When the
number of in-flight IO requests in a group exceeds io-throttle,
dm-ioband gives priority to that group, and the group can issue
subsequent IO requests in preference to the other groups. Setting
io-throttle to 50 effectively cancels this mechanism, which is why
seqread got more bandwidth than with io-throttle set to 4.

I tried to test with 2.6.31-rc7 and io-controller v9, but unfortunately,
a kernel panic happened. I'll try to test with your io-controller
again later.
 
> with the io scheduler based io controller, we see increased throughput for
> the sequential reader as compared to CFQ, because now the random readers
> are running in a separate group and hence the reader gets isolation from
> the random readers.

I summarized your results in a tabular format.

                   Throughput [KiB/s]
             vanilla io-controller  dm-ioband
randread        257        161          314
randwrite        11         45           15
seqread        5598       9556          631

In the io-controller results, the throughput of seqread increased but
randread decreased compared to vanilla. Did it perform as you expected?
Was disk time consumed equally by each group according to the weight
settings? Could you tell me your opinion on what an io-controller
should do when this kind of workload is applied?

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: ioband: Writer starves reader even without competitors
  2009-09-15 23:37                     ` Vivek Goyal
  (?)
@ 2009-09-16 12:08                     ` Ryo Tsuruta
  -1 siblings, 0 replies; 80+ messages in thread
From: Ryo Tsuruta @ 2009-09-16 12:08 UTC (permalink / raw)
  To: vgoyal
  Cc: nauman, linux-kernel, dm-devel, jens.axboe, agk, akpm,
	guijianfeng, jmoyer, balbir, riel

Hi Vivek,

Vivek Goyal <vgoyal@redhat.com> wrote:
> Hi Ryo,
> 
> I notice that within a single ioband device, a single writer starves the
> reader even without any competitor groups being present.
> 
> I ran the following two tests with and without dm-ioband devices.

Thank you again for testing dm-ioband.
I got similar results and am investigating the issue now.
I'll let you know if I find something.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: ioband: Limited fairness and weak isolation between groups
  2009-09-16  4:45         ` Vivek Goyal
  (?)
@ 2009-09-18  7:33         ` Ryo Tsuruta
  -1 siblings, 0 replies; 80+ messages in thread
From: Ryo Tsuruta @ 2009-09-18  7:33 UTC (permalink / raw)
  To: vgoyal
  Cc: linux-kernel, dm-devel, jens.axboe, agk, akpm, nauman,
	guijianfeng, riel, jmoyer, balbir

Hi Vivek,

Vivek Goyal <vgoyal@redhat.com> wrote:
> I ran the following test. I created two groups of weight 100 each, put a
> sequential dd reader in the first group and buffered writers in the second
> group, let them run for 20 seconds, and observed how much work each group
> got done in that time. I ran this test multiple times, increasing the
> number of writers by one each time. I did the test both with dm-ioband and
> with the io scheduler based io controller patches.

I did the same test in my environment (2.6.31 + dm-ioband v1.13.0) and
here are the results.

      The number of sectors transferred
  writers   read    write   total
     1     800696  588600  1389296
     2     747704  430736  1178440
     3     757136  455808  1212944
     4     704888  562912  1267800
     5     788760  387672  1176432
     6     730664  495832  1226496
     7     765864  427384  1193248

I got different results from yours: the total throughput did not
decrease as the number of writers increased. I've attached the outputs
of the test script. Please note that the format of "dmsetup status"
has been changed to be like the /sys/block/<dev>/stat file.

launched reader 3567
launched 1 writers
waiting for 20 seconds
ioband2: 0 112455000 ioband share1 -1 85 0 680 0 100087 0 800696 0 384 0 0
ioband1: 0 112455000 ioband share1 -1 4673 0 588600 0 0 0 0 0 0 0 0

launched reader 3575
launched 2 writers
waiting for 20 seconds
ioband2: 0 112455000 ioband share1 -1 197 0 1576 0 93463 0 747704 0 384 0 0
ioband1: 0 112455000 ioband share1 -1 3420 0 430736 0 0 0 0 0 0 0 0

launched reader 3584
launched 3 writers
waiting for 20 seconds
ioband2: 0 112455000 ioband share1 -1 237 0 1896 0 94642 0 757136 0 384 0 0
ioband1: 0 112455000 ioband share1 -1 3614 0 455808 0 0 0 0 0 0 0 0

launched reader 3594
launched 4 writers
waiting for 20 seconds
ioband2: 0 112455000 ioband share1 -1 207 0 1656 0 88111 0 704888 0 159 0 0
ioband1: 0 112455000 ioband share1 -1 4462 0 562912 0 0 0 0 0 0 0 0

launched reader 3605
launched 5 writers
waiting for 20 seconds
ioband2: 0 112455000 ioband share1 -1 234 0 1872 0 98595 0 788760 0 384 0 0
ioband1: 0 112455000 ioband share1 -1 3077 0 387672 0 0 0 0 0 0 0 0

launched reader 3618
launched 6 writers
waiting for 20 seconds
ioband2: 0 112455000 ioband share1 -1 215 0 1720 0 91333 0 730664 0 384 0 0
ioband1: 0 112455000 ioband share1 -1 3937 0 495832 0 0 0 0 0 0 0 0

launched reader 3631
launched 7 writers
waiting for 20 seconds
ioband2: 0 112455000 ioband share1 -1 245 0 1960 0 95733 0 765864 0 384 0 0
ioband1: 0 112455000 ioband share1 -1 3391 0 427384 0 0 0 0 0 0 0 0

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 80+ messages in thread

end of thread, other threads:[~2009-09-18  7:33 UTC | newest]

Thread overview: 80+ messages
2009-09-01 16:50 Regarding dm-ioband tests Vivek Goyal
2009-09-01 16:50 ` Vivek Goyal
2009-09-01 17:47 ` Vivek Goyal
2009-09-01 17:47   ` Vivek Goyal
2009-09-03 13:11   ` Vivek Goyal
2009-09-03 13:11     ` Vivek Goyal
2009-09-04  1:12     ` Ryo Tsuruta
2009-09-15 21:40       ` dm-ioband fairness in terms of sectors seems to be killing disk (Was: Re: Regarding dm-ioband tests) Vivek Goyal
2009-09-15 21:40         ` Vivek Goyal
2009-09-16 11:10         ` dm-ioband fairness in terms of sectors seems to be killing disk Ryo Tsuruta
2009-09-16 11:10           ` Ryo Tsuruta
2009-09-04  4:02 ` Regarding dm-ioband tests Ryo Tsuruta
2009-09-04  4:02   ` Ryo Tsuruta
2009-09-04 23:11   ` Vivek Goyal
2009-09-04 23:11     ` Vivek Goyal
2009-09-07 11:02     ` Ryo Tsuruta
2009-09-07 11:02       ` Ryo Tsuruta
2009-09-07 13:53       ` Rik van Riel
2009-09-07 13:53         ` Rik van Riel
2009-09-08  3:01         ` Ryo Tsuruta
2009-09-08  3:01           ` Ryo Tsuruta
2009-09-08  3:22           ` Balbir Singh
2009-09-08  3:22             ` Balbir Singh
2009-09-08  5:05             ` Ryo Tsuruta
2009-09-08  5:05               ` Ryo Tsuruta
2009-09-08 13:49               ` Vivek Goyal
2009-09-08 13:49                 ` Vivek Goyal
2009-09-09  5:17                 ` Ryo Tsuruta
2009-09-09  5:17                   ` Ryo Tsuruta
2009-09-09 13:34                   ` Vivek Goyal
2009-09-09 13:34                     ` Vivek Goyal
2009-09-08 13:42           ` Vivek Goyal
2009-09-08 13:42             ` Vivek Goyal
2009-09-08 16:30             ` Nauman Rafique
2009-09-08 16:30               ` Nauman Rafique
2009-09-08 16:47               ` Rik van Riel
2009-09-08 16:47                 ` Rik van Riel
2009-09-08 17:54                 ` Vivek Goyal
2009-09-08 17:54                   ` Vivek Goyal
2009-09-15 23:37                   ` ioband: Writer starves reader even without competitors (Re: Regarding dm-ioband tests) Vivek Goyal
2009-09-15 23:37                     ` Vivek Goyal
2009-09-16 12:08                     ` ioband: Writer starves reader even without competitors Ryo Tsuruta
2009-09-08 17:06             ` Regarding dm-ioband tests Dhaval Giani
2009-09-09  6:05               ` Ryo Tsuruta
2009-09-09  6:05                 ` Ryo Tsuruta
2009-09-09 10:51                 ` Dhaval Giani
2009-09-10  7:58                   ` Ryo Tsuruta
2009-09-10  7:58                     ` Ryo Tsuruta
2009-09-11  9:53                     ` Dhaval Giani
2009-09-15 15:12                       ` Ryo Tsuruta
2009-09-15 15:12                         ` Ryo Tsuruta
2009-09-15 15:19                         ` Balbir Singh
2009-09-15 15:19                           ` Balbir Singh
2009-09-15 15:58                           ` Rik van Riel
2009-09-15 15:58                             ` Rik van Riel
2009-09-15 16:21                           ` Ryo Tsuruta
2009-09-15 16:21                             ` Ryo Tsuruta
2009-09-09 13:57                 ` Vivek Goyal
2009-09-09 13:57                   ` Vivek Goyal
2009-09-10  3:06                   ` Ryo Tsuruta
2009-09-09 10:01             ` Ryo Tsuruta
2009-09-09 14:31               ` Vivek Goyal
2009-09-09 14:31                 ` Vivek Goyal
2009-09-10  3:45                 ` Ryo Tsuruta
2009-09-10 13:25                   ` Vivek Goyal
2009-09-10 13:25                     ` Vivek Goyal
2009-09-08 19:24           ` Rik van Riel
2009-09-08 19:24             ` Rik van Riel
2009-09-09  0:09             ` Fabio Checconi
2009-09-09  2:06               ` Vivek Goyal
2009-09-09  2:06                 ` Vivek Goyal
2009-09-09 15:41                 ` Fabio Checconi
2009-09-09 17:30                   ` Vivek Goyal
2009-09-09 17:30                     ` Vivek Goyal
2009-09-09 19:01                     ` Fabio Checconi
2009-09-09  9:24               ` Ryo Tsuruta
2009-09-09  9:24                 ` Ryo Tsuruta
2009-09-16  4:45       ` ioband: Limited fairness and weak isolation between groups (Was: Re: Regarding dm-ioband tests) Vivek Goyal
2009-09-16  4:45         ` Vivek Goyal
2009-09-18  7:33         ` ioband: Limited fairness and weak isolation between groups Ryo Tsuruta
