All of lore.kernel.org
 help / color / mirror / Atom feed
* IO scheduler based IO Controller V2
@ 2009-05-05 19:58 Vivek Goyal
  0 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-05 19:58 UTC (permalink / raw)
  To: nauman, dpshah, lizf, mikew, fchecconi, paolo.valente,
	jens.axboe, ryov, fernando
  Cc: akpm, vgoyal


Hi All,

Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
First version of the patches was posted here.

http://lkml.org/lkml/2009/3/11/486

This patchset is still work in progress but I want to keep on getting the
snapshot of my tree out at regular intervals to get the feedback hence V2.

Before I go into details of what are the major changes from V1, wanted
to highlight other IO controller proposals on lkml.

Other active IO controller proposals
------------------------------------
Currently primarily two other IO controller proposals are out there.

dm-ioband
---------
This patch set is from Ryo Tsuruta from valinux. It is a proportional bandwidth controller implemented as a dm driver.

http://people.valinux.co.jp/~ryov/dm-ioband/

The biggest issue (apart from others), with a 2nd level IO controller is that
buffering of BIOs takes place in a single queue and dispatch of this BIOs
to unerlying IO scheduler is in FIFO manner. That means whenever the buffering
takes place, it breaks the notion of different class and priority of CFQ.

That means RT requests might be stuck behind some write requests or some read
requests might be stuck behind somet write requests for long time etc. To
demonstrate the single FIFO dispatch issues, I had run some basic tests and
posted the results in following mail thread.

http://lkml.org/lkml/2009/4/13/2

These are hard to solve issues and one will end up maintaining the separate
queues for separate classes and priority as CFQ does to fully resolve it.
But that will make 2nd level implementation complex at the same time if
somebody is trying to use IO controller on a single disk or on a hardware RAID
using cfq as scheduler, it will be two layers of queueing maintating separate
queues per priorty level. One at dm-driver level and other at CFQ which again
does not make lot of sense.

On the other hand, if a user is running noop at the device level, at higher
level we will be maintaining multiple cfq like queues, which also does not
make sense as underlying IO scheduler never wanted that.

Hence, IMHO, I think that controlling bio at second level probably is not a
very good idea. We should instead do it at IO scheduler level where we already
maintain all the needed queues. Just that make the scheduling hierarhical and
group aware so isolate IO of one group from other.

IO-throttling
-------------
This patch set is from Andrea Righi provides max bandwidth controller. That
means, it does not gurantee the minimum bandwidth. It provides the maximum
bandwidth limits and throttles the application if it crosses its bandwidth.

So its not apple vs apple comparison. This patch set and dm-ioband provide
proportional bandwidth control where a cgroup can use much more bandwidth
if there are not other users and resource control comes into the picture
only if there is contention.

It seems that there are both the kind of users there. One set of people needing
proportional BW control and other people needing max bandwidth control.

Now the question is, where max bandwidth control should be implemented? At
higher layers or at IO scheduler level? Should proportional bw control and
max bw control be implemented separately at different layer or these should
be implemented at one place?

IMHO, if we are doing proportional bw control at IO scheduler layer, it should
be possible to extend it to do max bw control also here without lot of effort.
Then it probably does not make too much of sense to do two types of control
at two different layers. Doing it at one place should lead to lesser code
and reduced complexity.

Secondly, io-throttling solution also buffers writes at higher layer.
Which again will lead to issue of losing the notion of priority of writes.

Hence, personally I think that users will need both proportional bw as well
as max bw control and we probably should implement these at a single place
instead of splitting it. Once elevator based io controller patchset matures,
it can be enhanced to do max bw control also.

Having said that, one issue with doing upper limit control at elevator/IO
scheduler level is that it does not have the view of higher level logical
devices. So if there is a software RAID with two disks, then one can not do
max bw control on logical device, instead it shall have to be on leaf node
where io scheduler is attached.

Now back to the desciption of this patchset and changes from V1.

- Rebased patches to 2.6.30-rc4.

- Last time Andrew mentioned that async writes are big issue for us hence,
  introduced the control for async writes also.

- Implemented per group request descriptor support. This was needed to
  make sure one group doing lot of IO does not starve other group of request
  descriptors and other group does not get fair share. This is a basic patch
  right now which probably will require more changes after some discussion.

- Exported the disk time used and number of sectors dispatched by a cgroup
  through cgroup interface. This should help us in seeing how much disk
  time each group got and whether it is fair or not.

- Implemented group refcounting support. Lack of this was causing some
  cgroup related issues. There are still some races left out which needs
  to be fixed. 

- For IO tracking/async write tracking, started making use of patches of
  blkio-cgroup from ryo Tsuruta posted here.

  http://lkml.org/lkml/2009/4/28/235

  Currently people seem to be liking the idea of separate subsystem for
  tracking writes and then rest of the users can use that info instead of
  everybody implementing their own. That's a different thing that how many
  users are out there which will end up in kernel is not clear.

  So instead of carrying own versin of bio-cgroup patches, and overloading
  io controller cgroup subsystem, I am making use of blkio-cgroup patches.
  One shall have to mount io controller and blkio subsystem together on the
  same hiearchy for the time being. Later we can take care of the case where
  blkio is mounted on a different hierarchy.

- Replaced group priorities with group weights.

Testing
=======

Again, I have been able to do only very basic testing of reads and writes.
Did not want to hold the patches back because of testing. Providing support
for async writes took much more time than expected and still work is left
in that area. Will continue to do more testing.

Test1 (Fairness for synchronous reads)
======================================
- Two dd in two cgroups with cgrop weights 1000 and 500. Ran two "dd" in those
  cgroups (With CFQ scheduler and /sys/block/<device>/queue/fairness = 1)

dd if=/mnt/$BLOCKDEV/zerofile1 of=/dev/null &
dd if=/mnt/$BLOCKDEV/zerofile2 of=/dev/null &

234179072 bytes (234 MB) copied, 4.13954 s, 56.6 MB/s
234179072 bytes (234 MB) copied, 5.2127 s, 44.9 MB/s

group1 time=3108 group1 sectors=460968
group2 time=1405 group2 sectors=264944

This patchset tries to provide fairness in terms of disk time received. group1
got almost double of group2 disk time (At the time of first dd finish). These
time and sectors statistics can be read using io.disk_time and io.disk_sector
files in cgroup. More about it in documentation file.

Test2 (Fairness for async writes)
=================================
Fairness for async writes is tricy and biggest reason is that async writes
are cached in higher layers (page cahe) and are dispatched to lower layers
not necessarily in proportional manner. For example, consider two dd threads
reading /dev/zero as input file and doing writes of huge files. Very soon
we will cross vm_dirty_ratio and dd thread will be forced to write out some
pages to disk before more pages can be dirtied. But not necessarily dirty
pages of same thread are picked. It can very well pick the inode of lesser
priority dd thread and do some writeout. So effectively higher weight dd is
doing writeouts of lower weight dd pages and we don't see service differentation

IOW, the core problem with async write fairness is that higher weight thread
does not throw enought IO traffic at IO controller to keep the queue
continuously backlogged. This are many .2 to .8 second intervals where higher
weight queue is empty and in that duration lower weight queue get lots of job
done giving the impression that there was no service differentiation.

In summary, from IO controller point of view async writes support is there. Now
we need to do some more work in higher layers to make sure higher weight process
is not blocked behind IO of some lower weight process. This is a TODO item.

So to test async writes I generated lots of write traffic in two cgroups (50
fio threads) and watched the disk time statistics in respective cgroups at
the interval of 2 seconds. Thanks to ryo tsuruta for the test case.

*****************************************************************
sync
echo 3 > /proc/sys/vm/drop_caches

fio_args="--size=64m --rw=write --numjobs=50 --group_reporting"

echo $$ > /cgroup/bfqio/test1/tasks
fio $fio_args --name=test1 --directory=/mnt/sdd1/fio/ --output=/mnt/sdd1/fio/test1.log &

echo $$ > /cgroup/bfqio/test2/tasks
fio $fio_args --name=test2 --directory=/mnt/sdd2/fio/ --output=/mnt/sdd2/fio/test2.log &
*********************************************************************** 

And watched the disk time and sector statistics for the both the cgroups
every 2 seconds using a script. How is snippet from output.

test1 statistics: time=9848   sectors=643152
test2 statistics: time=5224   sectors=258600

test1 statistics: time=11736   sectors=785792
test2 statistics: time=6509   sectors=333160

test1 statistics: time=13607   sectors=943968
test2 statistics: time=7443   sectors=394352

test1 statistics: time=15662   sectors=1089496
test2 statistics: time=8568   sectors=451152

So disk time consumed by group1 is almost double of group2.  

Your feedback and comments are welcome.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-08 21:56                                 ` Vivek Goyal
@ 2009-05-14 16:43                                     ` Dhaval Giani
  -1 siblings, 0 replies; 97+ messages in thread
From: Dhaval Giani @ 2009-05-14 16:43 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	Bharata B Rao, fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton, Andrea Righi

On Fri, May 08, 2009 at 05:56:18PM -0400, Vivek Goyal wrote:

>   So, we shall have to come up with something better, I think Dhaval was
>   implementing upper limit for cpu controller. May be PeterZ and Dhaval can
>   give us some pointers how did they manage to implement both proportional
>   and max bw control with the help of a single tree while maintaining the
>   notion of prio with-in cgroup.
> 
> PeterZ/Dhaval  ^^^^^^^^
> 

We still haven't :). I think the idea is to keep fairness (or
propotion) between the groups that are currently running. The throttled
groups should not be considered.

thanks,
-- 
regards,
Dhaval

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
@ 2009-05-14 16:43                                     ` Dhaval Giani
  0 siblings, 0 replies; 97+ messages in thread
From: Dhaval Giani @ 2009-05-14 16:43 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Andrea Righi, Andrew Morton, nauman, dpshah, lizf, mikew,
	fchecconi, paolo.valente, jens.axboe, ryov, fernando, s-uchida,
	taka, guijianfeng, jmoyer, balbir, linux-kernel, containers, agk,
	dm-devel, snitzer, m-ikeda, peterz, Bharata B Rao

On Fri, May 08, 2009 at 05:56:18PM -0400, Vivek Goyal wrote:

>   So, we shall have to come up with something better, I think Dhaval was
>   implementing upper limit for cpu controller. May be PeterZ and Dhaval can
>   give us some pointers how did they manage to implement both proportional
>   and max bw control with the help of a single tree while maintaining the
>   notion of prio with-in cgroup.
> 
> PeterZ/Dhaval  ^^^^^^^^
> 

We still haven't :). I think the idea is to keep fairness (or
propotion) between the groups that are currently running. The throttled
groups should not be considered.

thanks,
-- 
regards,
Dhaval

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]                                 ` <20090508215618.GJ7293-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2009-05-09  9:22                                   ` Peter Zijlstra
@ 2009-05-14 10:31                                   ` Andrea Righi
  2009-05-14 16:43                                     ` Dhaval Giani
  2 siblings, 0 replies; 97+ messages in thread
From: Andrea Righi @ 2009-05-14 10:31 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Fri, May 08, 2009 at 05:56:18PM -0400, Vivek Goyal wrote:
> On Fri, May 08, 2009 at 10:05:01PM +0200, Andrea Righi wrote:
> 
> [..]
> > > Conclusion
> > > ==========
> > > It just reaffirms that with max BW control, we are not doing a fair job
> > > of throttling hence no more hold the IO scheduler properties with-in
> > > cgroup.
> > > 
> > > With proportional BW controller implemented at IO scheduler level, one
> > > can do very tight integration with IO controller and hence retain 
> > > IO scheduler behavior with-in cgroup.
> > 
> > It is worth to bug you I would say :). Results are interesting,
> > definitely. I'll check if it's possible to merge part of the io-throttle
> > max BW control in this controller and who knows if finally we'll be able
> > to converge to a common proposal...
> 
> Great, Few thoughts though.
> 
> - What are your requirements? Do you strictly need max bw control or
>   proportional BW control will satisfy your needs? Or you need both?

The theoretical advantages of max BW control are that they offer an
immediate action on policy enforcement mitigating the problem before it
happens (a kind of static partitioning I would say) and that you have
probably something that provides a more explicit control to contain
different classes of users in hosted environment (e.g., give BW in
function on how much they pay). And I can say the io-throttle approach
at the moment seems to work fine for a production environment
(http://www.bluehost.com).

Apart the motivations above, I don't have specific requirements to
provide the max BW control.

But it is also true that the io-controller approach is still in a
development stage and needs more testing. The design concepts make
sense, definitely, so maybe only the proportional approach will be
sufficient to satisfy the requirements of the 90% of users out there.

-Andrea

> 
> - With the current algorithm BFQ (modified WF2Q+), we should be able
>   to do proportional BW division while maintaining the properties of
>   IO scheduler with-in cgroup in hiearchical manner.
>  
>   I think it can be simply enhanced to do max bw control also. That is
>   whenever a queue is selected for dispatch (from fairness point of view)
>   also check the IO rate of that group and if IO rate exceeded, expire
>   the queue immediately and fake as if queue consumed its time slice
>   which will be equivalent to throttling.
> 
>   But in this simple scheme, I think throttling is still unfair with-in
>   the class. What I mean is following.
> 
>   if an RT task and an BE task are in same cgroup and cgroup exceeds its
>   max BW, RT task is next to be dispatched from fairness point of view and it
>   will end being throttled. This is still fine because until RT task is
>   finished, BE task will never get to run in that cgroup, so at some point
>   of time, cgroup rate will come down and RT task will get the IO done
>   meeting fairnesss and max bw constraints.
> 
>   But this simple scheme does not work with-in same class. Say prio 0
>   and prio 7 BE class readers. Now we will end up throttling the guy who
>   is scheduled to go next and there is no mechanism that prio0 and prio7
>   tasks are throttled in proportionate manner.
> 
>   So, we shall have to come up with something better, I think Dhaval was
>   implementing upper limit for cpu controller. May be PeterZ and Dhaval can
>   give us some pointers how did they manage to implement both proportional
>   and max bw control with the help of a single tree while maintaining the
>   notion of prio with-in cgroup.
> 
> PeterZ/Dhaval  ^^^^^^^^
> 
> - We should be able to get rid of reader-writer issue even with above
>   simple throttling mechanism for schedulers like deadline and AS, because at
>   elevator we see it as a single queue (for both reads and writes) and we
>   will throttle this queue. With-in queue dispatch are taken care by io
>   scheduler. So as long as IO has been queued in the queue, scheduler
>   will take care of giving advantage to readers even if throttling is
>   taking place on the queue.
> 
> Why am I thinking loud? So that we know what are we trying to achieve at the
> end of the day. So at this point of time what are the advantages/disadvantages
> of doing max bw control along with proportional bw control?
> 
> Advantages
> ==========
> - With a combined code base, total code should be less as compared to if
>   both of them are implemented separately. 
> 
> - There can be few advantages in terms of maintaining the notion of IO
>   scheduler with-in cgroup. (like RT tasks always goes first in presence
>   of BE and IDLE task etc. But simple throttling scheme will not take
>   care of fair throttling with-in class. We need a better algorithm to
>   achive that goal).
> 
> - We probably will get rid of reader writer issue for single queue
>   schedulers like deadline and AS. (Need to run tests and see).
> 
> Disadvantages
> =============
> - Implementation at IO scheduler/elevator layer does not cover higher
>   level logical devices. So one can do max bw control only at leaf nodes
>   where IO scheduler is running and not at intermediate logical nodes.
>    
> I personally think that proportional BW control will meet more people's
> need as compared to max bw contorl. 
> 
> So far nobody has come up with a solution where a single proposal covers
> all the cases without breaking things. So personally, I want to make
> things work at least at IO scheduler level and cover as much ground as
> possible without breaking things (hardware RAID, all the direct attached
> devices etc) and then worry about higher level software devices.
> 
> Thoughts?
> 
> Thanks
> Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-08 21:56                                 ` Vivek Goyal
  (?)
  (?)
@ 2009-05-14 10:31                                 ` Andrea Righi
  -1 siblings, 0 replies; 97+ messages in thread
From: Andrea Righi @ 2009-05-14 10:31 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Andrew Morton, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, balbir, linux-kernel, containers,
	agk, dm-devel, snitzer, m-ikeda, peterz

On Fri, May 08, 2009 at 05:56:18PM -0400, Vivek Goyal wrote:
> On Fri, May 08, 2009 at 10:05:01PM +0200, Andrea Righi wrote:
> 
> [..]
> > > Conclusion
> > > ==========
> > > It just reaffirms that with max BW control, we are not doing a fair job
> > > of throttling hence no more hold the IO scheduler properties with-in
> > > cgroup.
> > > 
> > > With proportional BW controller implemented at IO scheduler level, one
> > > can do very tight integration with IO controller and hence retain 
> > > IO scheduler behavior with-in cgroup.
> > 
> > It is worth to bug you I would say :). Results are interesting,
> > definitely. I'll check if it's possible to merge part of the io-throttle
> > max BW control in this controller and who knows if finally we'll be able
> > to converge to a common proposal...
> 
> Great, Few thoughts though.
> 
> - What are your requirements? Do you strictly need max bw control or
>   proportional BW control will satisfy your needs? Or you need both?

The theoretical advantages of max BW control are that they offer an
immediate action on policy enforcement mitigating the problem before it
happens (a kind of static partitioning I would say) and that you have
probably something that provides a more explicit control to contain
different classes of users in hosted environment (e.g., give BW in
function on how much they pay). And I can say the io-throttle approach
at the moment seems to work fine for a production environment
(http://www.bluehost.com).

Apart the motivations above, I don't have specific requirements to
provide the max BW control.

But it is also true that the io-controller approach is still in a
development stage and needs more testing. The design concepts make
sense, definitely, so maybe only the proportional approach will be
sufficient to satisfy the requirements of the 90% of users out there.

-Andrea

> 
> - With the current algorithm BFQ (modified WF2Q+), we should be able
>   to do proportional BW division while maintaining the properties of
>   IO scheduler with-in cgroup in hiearchical manner.
>  
>   I think it can be simply enhanced to do max bw control also. That is
>   whenever a queue is selected for dispatch (from fairness point of view)
>   also check the IO rate of that group and if IO rate exceeded, expire
>   the queue immediately and fake as if queue consumed its time slice
>   which will be equivalent to throttling.
> 
>   But in this simple scheme, I think throttling is still unfair with-in
>   the class. What I mean is following.
> 
>   if an RT task and an BE task are in same cgroup and cgroup exceeds its
>   max BW, RT task is next to be dispatched from fairness point of view and it
>   will end being throttled. This is still fine because until RT task is
>   finished, BE task will never get to run in that cgroup, so at some point
>   of time, cgroup rate will come down and RT task will get the IO done
>   meeting fairnesss and max bw constraints.
> 
>   But this simple scheme does not work with-in same class. Say prio 0
>   and prio 7 BE class readers. Now we will end up throttling the guy who
>   is scheduled to go next and there is no mechanism that prio0 and prio7
>   tasks are throttled in proportionate manner.
> 
>   So, we shall have to come up with something better, I think Dhaval was
>   implementing upper limit for cpu controller. May be PeterZ and Dhaval can
>   give us some pointers how did they manage to implement both proportional
>   and max bw control with the help of a single tree while maintaining the
>   notion of prio with-in cgroup.
> 
> PeterZ/Dhaval  ^^^^^^^^
> 
> - We should be able to get rid of reader-writer issue even with above
>   simple throttling mechanism for schedulers like deadline and AS, because at
>   elevator we see it as a single queue (for both reads and writes) and we
>   will throttle this queue. With-in queue dispatch are taken care by io
>   scheduler. So as long as IO has been queued in the queue, scheduler
>   will take care of giving advantage to readers even if throttling is
>   taking place on the queue.
> 
> Why am I thinking loud? So that we know what are we trying to achieve at the
> end of the day. So at this point of time what are the advantages/disadvantages
> of doing max bw control along with proportional bw control?
> 
> Advantages
> ==========
> - With a combined code base, total code should be less as compared to if
>   both of them are implemented separately. 
> 
> - There can be few advantages in terms of maintaining the notion of IO
>   scheduler with-in cgroup. (like RT tasks always goes first in presence
>   of BE and IDLE task etc. But simple throttling scheme will not take
>   care of fair throttling with-in class. We need a better algorithm to
>   achive that goal).
> 
> - We probably will get rid of reader writer issue for single queue
>   schedulers like deadline and AS. (Need to run tests and see).
> 
> Disadvantages
> =============
> - Implementation at IO scheduler/elevator layer does not cover higher
>   level logical devices. So one can do max bw control only at leaf nodes
>   where IO scheduler is running and not at intermediate logical nodes.
>    
> I personally think that proportional BW control will meet more people's
> need as compared to max bw contorl. 
> 
> So far nobody has come up with a solution where a single proposal covers
> all the cases without breaking things. So personally, I want to make
> things work at least at IO scheduler level and cover as much ground as
> possible without breaking things (hardware RAID, all the direct attached
> devices etc) and then worry about higher level software devices.
> 
> Thoughts?
> 
> Thanks
> Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-11 11:23             ` Ryo Tsuruta
@ 2009-05-11 12:49                   ` Vivek Goyal
  0 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-11 12:49 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	righi.andrea-Re5JQEeQqe8AvxtiuMwx3w

On Mon, May 11, 2009 at 08:23:09PM +0900, Ryo Tsuruta wrote:
> Hi Vivek,
> 
> From: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Subject: Re: IO scheduler based IO Controller V2
> Date: Wed, 6 May 2009 21:25:59 -0400
> 
> > On Thu, May 07, 2009 at 09:18:58AM +0900, Ryo Tsuruta wrote:
> > > Hi Vivek,
> > > 
> > > > Ryo, dm-ioband breaks the notion of classes and priority of CFQ because
> > > > of FIFO dispatch of buffered bios. Apart from that it tries to provide
> > > > fairness in terms of actual IO done and that would mean a seeky workload
> > > > will can use disk for much longer to get equivalent IO done and slow down
> > > > other applications. Implementing IO controller at IO scheduler level gives
> > > > us tigher control. Will it not meet your requirements? If you got specific
> > > > concerns with IO scheduler based contol patches, please highlight these and
> > > > we will see how these can be addressed.
> > > 
> > > I'd like to avoid making complicated existing IO schedulers and other
> > > kernel codes and to give a choice to users whether or not to use it.
> > > I know that you chose an approach that using compile time options to
> > > get the same behavior as old system, but device-mapper drivers can be
> > > added, removed and replaced while system is running.
> > > 
> > 
> > Same is possible with IO scheduler based controller. If you don't want
> > cgroup stuff, don't create those. By default everything will be in root
> > group and you will get the old behavior. 
> > 
> > If you want io controller stuff, just create the cgroup, assign weight
> > and move task there. So what more choices do you want which are missing
> > here?
> 
> What I mean to say is that device-mapper drivers can be completely
> removed from the kernel if not used.
> 
> I know that dm-ioband has some issues which can be addressed by your
> IO controller, but I'm not sure your controller works well. So I would
> like to see some benchmark results of your IO controller.
> 

Fair enough. IO scheduler based IO controller is still work in progress
and we have started to get some basic things right. I think after 3-4
iterations of patches, patches will be stable enough and working enough that
I should be able to give some benchmark numbers also.

Currently I am posting the intermediate snapshot of my tree to lkml to get the
design feedback so that if there are fundamental design issues, we can sort
these out.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
@ 2009-05-11 12:49                   ` Vivek Goyal
  0 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-11 12:49 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: akpm, nauman, dpshah, lizf, mikew, fchecconi, paolo.valente,
	jens.axboe, fernando, s-uchida, taka, guijianfeng, jmoyer,
	dhaval, balbir, linux-kernel, containers, righi.andrea, agk,
	dm-devel, snitzer, m-ikeda, peterz

On Mon, May 11, 2009 at 08:23:09PM +0900, Ryo Tsuruta wrote:
> Hi Vivek,
> 
> From: Vivek Goyal <vgoyal@redhat.com>
> Subject: Re: IO scheduler based IO Controller V2
> Date: Wed, 6 May 2009 21:25:59 -0400
> 
> > On Thu, May 07, 2009 at 09:18:58AM +0900, Ryo Tsuruta wrote:
> > > Hi Vivek,
> > > 
> > > > Ryo, dm-ioband breaks the notion of classes and priority of CFQ because
> > > > of FIFO dispatch of buffered bios. Apart from that it tries to provide
> > > > fairness in terms of actual IO done and that would mean a seeky workload
> > > > will can use disk for much longer to get equivalent IO done and slow down
> > > > other applications. Implementing IO controller at IO scheduler level gives
> > > > us tigher control. Will it not meet your requirements? If you got specific
> > > > concerns with IO scheduler based contol patches, please highlight these and
> > > > we will see how these can be addressed.
> > > 
> > > I'd like to avoid making complicated existing IO schedulers and other
> > > kernel codes and to give a choice to users whether or not to use it.
> > > I know that you chose an approach that using compile time options to
> > > get the same behavior as old system, but device-mapper drivers can be
> > > added, removed and replaced while system is running.
> > > 
> > 
> > Same is possible with IO scheduler based controller. If you don't want
> > cgroup stuff, don't create those. By default everything will be in root
> > group and you will get the old behavior. 
> > 
> > If you want io controller stuff, just create the cgroup, assign weight
> > and move task there. So what more choices do you want which are missing
> > here?
> 
> What I mean to say is that device-mapper drivers can be completely
> removed from the kernel if not used.
> 
> I know that dm-ioband has some issues which can be addressed by your
> IO controller, but I'm not sure your controller works well. So I would
> like to see some benchmark results of your IO controller.
> 

Fair enough. IO scheduler based IO controller is still work in progress
and we have started to get some basic things right. I think after 3-4
iterations of patches, patches will be stable enough and working enough that
I should be able to give some benchmark numbers also.

Currently I am posting the intermediate snapshot of my tree to lkml to get the
design feedback so that if there are fundamental design issues, we can sort
these out.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]             ` <20090507012559.GC4187-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2009-05-11 11:23               ` Ryo Tsuruta
  0 siblings, 0 replies; 97+ messages in thread
From: Ryo Tsuruta @ 2009-05-11 11:23 UTC (permalink / raw)
  To: vgoyal-H+wXaHxf7aLQT0dZR+AlfA
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	righi.andrea-Re5JQEeQqe8AvxtiuMwx3w

Hi Vivek,

From: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: IO scheduler based IO Controller V2
Date: Wed, 6 May 2009 21:25:59 -0400

> On Thu, May 07, 2009 at 09:18:58AM +0900, Ryo Tsuruta wrote:
> > Hi Vivek,
> > 
> > > Ryo, dm-ioband breaks the notion of classes and priority of CFQ because
> > > of FIFO dispatch of buffered bios. Apart from that it tries to provide
> > > fairness in terms of actual IO done and that would mean a seeky workload
> > > will can use disk for much longer to get equivalent IO done and slow down
> > > other applications. Implementing IO controller at IO scheduler level gives
> > > us tigher control. Will it not meet your requirements? If you got specific
> > > concerns with IO scheduler based contol patches, please highlight these and
> > > we will see how these can be addressed.
> > 
> > I'd like to avoid making complicated existing IO schedulers and other
> > kernel codes and to give a choice to users whether or not to use it.
> > I know that you chose an approach that using compile time options to
> > get the same behavior as old system, but device-mapper drivers can be
> > added, removed and replaced while system is running.
> > 
> 
> Same is possible with IO scheduler based controller. If you don't want
> cgroup stuff, don't create those. By default everything will be in root
> group and you will get the old behavior. 
> 
> If you want io controller stuff, just create the cgroup, assign weight
> and move task there. So what more choices do you want which are missing
> here?

What I mean to say is that device-mapper drivers can be completely
removed from the kernel if not used.

I know that dm-ioband has some issues which can be addressed by your
IO controller, but I'm not sure your controller works well. So I would
like to see some benchmark results of your IO controller.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-07  1:25             ` Vivek Goyal
  (?)
  (?)
@ 2009-05-11 11:23             ` Ryo Tsuruta
       [not found]               ` <20090511.202309.112614168.ryov-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org>
  -1 siblings, 1 reply; 97+ messages in thread
From: Ryo Tsuruta @ 2009-05-11 11:23 UTC (permalink / raw)
  To: vgoyal
  Cc: akpm, nauman, dpshah, lizf, mikew, fchecconi, paolo.valente,
	jens.axboe, fernando, s-uchida, taka, guijianfeng, jmoyer,
	dhaval, balbir, linux-kernel, containers, righi.andrea, agk,
	dm-devel, snitzer, m-ikeda, peterz

Hi Vivek,

From: Vivek Goyal <vgoyal@redhat.com>
Subject: Re: IO scheduler based IO Controller V2
Date: Wed, 6 May 2009 21:25:59 -0400

> On Thu, May 07, 2009 at 09:18:58AM +0900, Ryo Tsuruta wrote:
> > Hi Vivek,
> > 
> > > Ryo, dm-ioband breaks the notion of classes and priority of CFQ because
> > > of FIFO dispatch of buffered bios. Apart from that it tries to provide
> > > fairness in terms of actual IO done and that would mean a seeky workload
> > > will can use disk for much longer to get equivalent IO done and slow down
> > > other applications. Implementing IO controller at IO scheduler level gives
> > > us tigher control. Will it not meet your requirements? If you got specific
> > > concerns with IO scheduler based contol patches, please highlight these and
> > > we will see how these can be addressed.
> > 
> > I'd like to avoid making complicated existing IO schedulers and other
> > kernel codes and to give a choice to users whether or not to use it.
> > I know that you chose an approach that using compile time options to
> > get the same behavior as old system, but device-mapper drivers can be
> > added, removed and replaced while system is running.
> > 
> 
> Same is possible with IO scheduler based controller. If you don't want
> cgroup stuff, don't create those. By default everything will be in root
> group and you will get the old behavior. 
> 
> If you want io controller stuff, just create the cgroup, assign weight
> and move task there. So what more choices do you want which are missing
> here?

What I mean to say is that device-mapper drivers can be completely
removed from the kernel if not used.

I know that dm-ioband has some issues which can be addressed by your
IO controller, but I'm not sure your controller works well. So I would
like to see some benchmark results of your IO controller.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]           ` <4A0440B2.7040300-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2009-05-11 10:11             ` Ryo Tsuruta
  0 siblings, 0 replies; 97+ messages in thread
From: Ryo Tsuruta @ 2009-05-11 10:11 UTC (permalink / raw)
  To: riel-H+wXaHxf7aLQT0dZR+AlfA
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	righi.andrea-Re5JQEeQqe8AvxtiuMwx3w,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Hi Rik,

From: Rik van Riel <riel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: IO scheduler based IO Controller V2
Date: Fri, 08 May 2009 10:24:50 -0400

> Ryo Tsuruta wrote:
> > Hi Vivek,
> > 
> >> Ryo, dm-ioband breaks the notion of classes and priority of CFQ because
> >> of FIFO dispatch of buffered bios. Apart from that it tries to provide
> >> fairness in terms of actual IO done and that would mean a seeky workload
> >> will can use disk for much longer to get equivalent IO done and slow down
> >> other applications. Implementing IO controller at IO scheduler level gives
> >> us tigher control. Will it not meet your requirements? If you got specific
> >> concerns with IO scheduler based contol patches, please highlight these and
> >> we will see how these can be addressed.
> > I'd like to avoid making complicated existing IO schedulers and other
> > kernel codes and to give a choice to users whether or not to use it.
> > I know that you chose an approach that using compile time options to
> > get the same behavior as old system, but device-mapper drivers can be
> > added, removed and replaced while system is running.
> 
> I do not believe that every use of cgroups will end up with
> a separate logical volume for each group.
> 
> In fact, if you look at group-per-UID usage, which could be
> quite common on shared web servers and shell servers, I would
> expect all the groups to share the same filesystem.
> 
> I do not believe dm-ioband would be useful in that configuration,
> while the IO scheduler based IO controller will just work.

dm-ioband can control bandwidth on a per cgroup basis as same as
Vivek's IO controller. Could you explain what do you want to do and
how to configure the IO scheduler based IO controller in that case?

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-08 14:24         ` Rik van Riel
       [not found]           ` <4A0440B2.7040300-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2009-05-11 10:11           ` Ryo Tsuruta
  1 sibling, 0 replies; 97+ messages in thread
From: Ryo Tsuruta @ 2009-05-11 10:11 UTC (permalink / raw)
  To: riel
  Cc: vgoyal, akpm, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, fernando, s-uchida, taka, guijianfeng,
	jmoyer, dhaval, balbir, linux-kernel, containers, righi.andrea,
	agk, dm-devel, snitzer, m-ikeda, peterz

Hi Rik,

From: Rik van Riel <riel@redhat.com>
Subject: Re: IO scheduler based IO Controller V2
Date: Fri, 08 May 2009 10:24:50 -0400

> Ryo Tsuruta wrote:
> > Hi Vivek,
> > 
> >> Ryo, dm-ioband breaks the notion of classes and priority of CFQ because
> >> of FIFO dispatch of buffered bios. Apart from that it tries to provide
> >> fairness in terms of actual IO done and that would mean a seeky workload
> >> will can use disk for much longer to get equivalent IO done and slow down
> >> other applications. Implementing IO controller at IO scheduler level gives
> >> us tigher control. Will it not meet your requirements? If you got specific
> >> concerns with IO scheduler based contol patches, please highlight these and
> >> we will see how these can be addressed.
> > I'd like to avoid making complicated existing IO schedulers and other
> > kernel codes and to give a choice to users whether or not to use it.
> > I know that you chose an approach that using compile time options to
> > get the same behavior as old system, but device-mapper drivers can be
> > added, removed and replaced while system is running.
> 
> I do not believe that every use of cgroups will end up with
> a separate logical volume for each group.
> 
> In fact, if you look at group-per-UID usage, which could be
> quite common on shared web servers and shell servers, I would
> expect all the groups to share the same filesystem.
> 
> I do not believe dm-ioband would be useful in that configuration,
> while the IO scheduler based IO controller will just work.

dm-ioband can control bandwidth on a per cgroup basis as same as
Vivek's IO controller. Could you explain what do you want to do and
how to configure the IO scheduler based IO controller in that case?

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]             ` <20090508133740.GD7293-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2009-05-11  2:59               ` Gui Jianfeng
  0 siblings, 0 replies; 97+ messages in thread
From: Gui Jianfeng @ 2009-05-11  2:59 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	righi.andrea-Re5JQEeQqe8AvxtiuMwx3w

Vivek Goyal wrote:
...
>>
> 
> Thanks Li and Gui for pointing out the problem. With you script, I could
> also produce lock validator warning as well as system freeze. I could
> identify at least two trouble spots. With following patch things seems
> to be fine on my system. Can you please give it a try.

  Hi Vivek,

  I'v tried this patch, and seems the problem is addressed. Thanks.

-- 
Regards
Gui Jianfeng

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-08 13:37             ` Vivek Goyal
  (?)
@ 2009-05-11  2:59             ` Gui Jianfeng
  -1 siblings, 0 replies; 97+ messages in thread
From: Gui Jianfeng @ 2009-05-11  2:59 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Li Zefan, nauman, dpshah, mikew, fchecconi, paolo.valente,
	jens.axboe, ryov, fernando, s-uchida, taka, jmoyer, dhaval,
	balbir, linux-kernel, containers, righi.andrea, agk, dm-devel,
	snitzer, m-ikeda, akpm

Vivek Goyal wrote:
...
>>
> 
> Thanks Li and Gui for pointing out the problem. With you script, I could
> also produce lock validator warning as well as system freeze. I could
> identify at least two trouble spots. With following patch things seems
> to be fine on my system. Can you please give it a try.

  Hi Vivek,

  I'v tried this patch, and seems the problem is addressed. Thanks.

-- 
Regards
Gui Jianfeng


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]                                 ` <20090508215618.GJ7293-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2009-05-09  9:22                                   ` Peter Zijlstra
  2009-05-14 10:31                                   ` Andrea Righi
  2009-05-14 16:43                                     ` Dhaval Giani
  2 siblings, 0 replies; 97+ messages in thread
From: Peter Zijlstra @ 2009-05-09  9:22 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	Andrea Righi, fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Fri, 2009-05-08 at 17:56 -0400, Vivek Goyal wrote:
>   So, we shall have to come up with something better, I think Dhaval was
>   implementing upper limit for cpu controller. May be PeterZ and Dhaval can
>   give us some pointers how did they manage to implement both proportional
>   and max bw control with the help of a single tree while maintaining the
>   notion of prio with-in cgroup.

We don't do max bandwidth control in the SCHED_OTHER bits as I oppose to
making it non work conserving.

SCHED_FIFO/RR do constant bandwidth things and are always scheduled in
favour of SCHED_OTHER. 

That is, we provide a minimum bandwidth for real-time tasks, but since
having a maximum higher than the minimum is useless since one cannot
rely on it (non deterministic) we put max = min.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-08 21:56                                 ` Vivek Goyal
  (?)
@ 2009-05-09  9:22                                 ` Peter Zijlstra
  -1 siblings, 0 replies; 97+ messages in thread
From: Peter Zijlstra @ 2009-05-09  9:22 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Andrea Righi, Andrew Morton, nauman, dpshah, lizf, mikew,
	fchecconi, paolo.valente, jens.axboe, ryov, fernando, s-uchida,
	taka, guijianfeng, jmoyer, dhaval, balbir, linux-kernel,
	containers, agk, dm-devel, snitzer, m-ikeda

On Fri, 2009-05-08 at 17:56 -0400, Vivek Goyal wrote:
>   So, we shall have to come up with something better, I think Dhaval was
>   implementing upper limit for cpu controller. May be PeterZ and Dhaval can
>   give us some pointers how did they manage to implement both proportional
>   and max bw control with the help of a single tree while maintaining the
>   notion of prio with-in cgroup.

We don't do max bandwidth control in the SCHED_OTHER bits as I oppose to
making it non work conserving.

SCHED_FIFO/RR do constant bandwidth things and are always scheduled in
favour of SCHED_OTHER. 

That is, we provide a minimum bandwidth for real-time tasks, but since
having a maximum higher than the minimum is useless since one cannot
rely on it (non deterministic) we put max = min.


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-08 20:05                             ` Andrea Righi
@ 2009-05-08 21:56                                 ` Vivek Goyal
  0 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-08 21:56 UTC (permalink / raw)
  To: Andrea Righi
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Fri, May 08, 2009 at 10:05:01PM +0200, Andrea Righi wrote:

[..]
> > Conclusion
> > ==========
> > It just reaffirms that with max BW control, we are not doing a fair job
> > of throttling hence no more hold the IO scheduler properties with-in
> > cgroup.
> > 
> > With proportional BW controller implemented at IO scheduler level, one
> > can do very tight integration with IO controller and hence retain 
> > IO scheduler behavior with-in cgroup.
> 
> It is worth to bug you I would say :). Results are interesting,
> definitely. I'll check if it's possible to merge part of the io-throttle
> max BW control in this controller and who knows if finally we'll be able
> to converge to a common proposal...

Great, Few thoughts though.

- What are your requirements? Do you strictly need max bw control or
  proportional BW control will satisfy your needs? Or you need both?

- With the current algorithm BFQ (modified WF2Q+), we should be able
  to do proportional BW division while maintaining the properties of
  IO scheduler with-in cgroup in hiearchical manner.
 
  I think it can be simply enhanced to do max bw control also. That is
  whenever a queue is selected for dispatch (from fairness point of view)
  also check the IO rate of that group and if IO rate exceeded, expire
  the queue immediately and fake as if queue consumed its time slice
  which will be equivalent to throttling.

  But in this simple scheme, I think throttling is still unfair with-in
  the class. What I mean is following.

  if an RT task and an BE task are in same cgroup and cgroup exceeds its
  max BW, RT task is next to be dispatched from fairness point of view and it
  will end being throttled. This is still fine because until RT task is
  finished, BE task will never get to run in that cgroup, so at some point
  of time, cgroup rate will come down and RT task will get the IO done
  meeting fairnesss and max bw constraints.

  But this simple scheme does not work with-in same class. Say prio 0
  and prio 7 BE class readers. Now we will end up throttling the guy who
  is scheduled to go next and there is no mechanism that prio0 and prio7
  tasks are throttled in proportionate manner.

  So, we shall have to come up with something better, I think Dhaval was
  implementing upper limit for cpu controller. May be PeterZ and Dhaval can
  give us some pointers how did they manage to implement both proportional
  and max bw control with the help of a single tree while maintaining the
  notion of prio with-in cgroup.

PeterZ/Dhaval  ^^^^^^^^

- We should be able to get rid of reader-writer issue even with above
  simple throttling mechanism for schedulers like deadline and AS, because at
  elevator we see it as a single queue (for both reads and writes) and we
  will throttle this queue. With-in queue dispatch are taken care by io
  scheduler. So as long as IO has been queued in the queue, scheduler
  will take care of giving advantage to readers even if throttling is
  taking place on the queue.

Why am I thinking loud? So that we know what are we trying to achieve at the
end of the day. So at this point of time what are the advantages/disadvantages
of doing max bw control along with proportional bw control?

Advantages
==========
- With a combined code base, total code should be less as compared to if
  both of them are implemented separately. 

- There can be few advantages in terms of maintaining the notion of IO
  scheduler with-in cgroup. (like RT tasks always goes first in presence
  of BE and IDLE task etc. But simple throttling scheme will not take
  care of fair throttling with-in class. We need a better algorithm to
  achive that goal).

- We probably will get rid of reader writer issue for single queue
  schedulers like deadline and AS. (Need to run tests and see).

Disadvantages
=============
- Implementation at IO scheduler/elevator layer does not cover higher
  level logical devices. So one can do max bw control only at leaf nodes
  where IO scheduler is running and not at intermediate logical nodes.
   
I personally think that proportional BW control will meet more people's
need as compared to max bw contorl. 

So far nobody has come up with a solution where a single proposal covers
all the cases without breaking things. So personally, I want to make
things work at least at IO scheduler level and cover as much ground as
possible without breaking things (hardware RAID, all the direct attached
devices etc) and then worry about higher level software devices.

Thoughts?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
@ 2009-05-08 21:56                                 ` Vivek Goyal
  0 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-08 21:56 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Andrew Morton, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, balbir, linux-kernel, containers,
	agk, dm-devel, snitzer, m-ikeda, peterz

On Fri, May 08, 2009 at 10:05:01PM +0200, Andrea Righi wrote:

[..]
> > Conclusion
> > ==========
> > It just reaffirms that with max BW control, we are not doing a fair job
> > of throttling hence no more hold the IO scheduler properties with-in
> > cgroup.
> > 
> > With proportional BW controller implemented at IO scheduler level, one
> > can do very tight integration with IO controller and hence retain 
> > IO scheduler behavior with-in cgroup.
> 
> It is worth to bug you I would say :). Results are interesting,
> definitely. I'll check if it's possible to merge part of the io-throttle
> max BW control in this controller and who knows if finally we'll be able
> to converge to a common proposal...

Great, Few thoughts though.

- What are your requirements? Do you strictly need max bw control or
  proportional BW control will satisfy your needs? Or you need both?

- With the current algorithm BFQ (modified WF2Q+), we should be able
  to do proportional BW division while maintaining the properties of
  IO scheduler with-in cgroup in hiearchical manner.
 
  I think it can be simply enhanced to do max bw control also. That is
  whenever a queue is selected for dispatch (from fairness point of view)
  also check the IO rate of that group and if IO rate exceeded, expire
  the queue immediately and fake as if queue consumed its time slice
  which will be equivalent to throttling.

  But in this simple scheme, I think throttling is still unfair with-in
  the class. What I mean is following.

  if an RT task and an BE task are in same cgroup and cgroup exceeds its
  max BW, RT task is next to be dispatched from fairness point of view and it
  will end being throttled. This is still fine because until RT task is
  finished, BE task will never get to run in that cgroup, so at some point
  of time, cgroup rate will come down and RT task will get the IO done
  meeting fairnesss and max bw constraints.

  But this simple scheme does not work with-in same class. Say prio 0
  and prio 7 BE class readers. Now we will end up throttling the guy who
  is scheduled to go next and there is no mechanism that prio0 and prio7
  tasks are throttled in proportionate manner.

  So, we shall have to come up with something better, I think Dhaval was
  implementing upper limit for cpu controller. May be PeterZ and Dhaval can
  give us some pointers how did they manage to implement both proportional
  and max bw control with the help of a single tree while maintaining the
  notion of prio with-in cgroup.

PeterZ/Dhaval  ^^^^^^^^

- We should be able to get rid of reader-writer issue even with above
  simple throttling mechanism for schedulers like deadline and AS, because at
  elevator we see it as a single queue (for both reads and writes) and we
  will throttle this queue. With-in queue dispatch are taken care by io
  scheduler. So as long as IO has been queued in the queue, scheduler
  will take care of giving advantage to readers even if throttling is
  taking place on the queue.

Why am I thinking loud? So that we know what are we trying to achieve at the
end of the day. So at this point of time what are the advantages/disadvantages
of doing max bw control along with proportional bw control?

Advantages
==========
- With a combined code base, total code should be less as compared to if
  both of them are implemented separately. 

- There can be few advantages in terms of maintaining the notion of IO
  scheduler with-in cgroup. (like RT tasks always goes first in presence
  of BE and IDLE task etc. But simple throttling scheme will not take
  care of fair throttling with-in class. We need a better algorithm to
  achive that goal).

- We probably will get rid of reader writer issue for single queue
  schedulers like deadline and AS. (Need to run tests and see).

Disadvantages
=============
- Implementation at IO scheduler/elevator layer does not cover higher
  level logical devices. So one can do max bw control only at leaf nodes
  where IO scheduler is running and not at intermediate logical nodes.
   
I personally think that proportional BW control will meet more people's
need as compared to max bw contorl. 

So far nobody has come up with a solution where a single proposal covers
all the cases without breaking things. So personally, I want to make
things work at least at IO scheduler level and cover as much ground as
possible without breaking things (hardware RAID, all the direct attached
devices etc) and then worry about higher level software devices.

Thoughts?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]                             ` <20090508180951.GG7293-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2009-05-08 20:05                               ` Andrea Righi
  0 siblings, 0 replies; 97+ messages in thread
From: Andrea Righi @ 2009-05-08 20:05 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Fri, May 08, 2009 at 02:09:51PM -0400, Vivek Goyal wrote:
> On Fri, May 08, 2009 at 12:19:01AM +0200, Andrea Righi wrote:
> > On Thu, May 07, 2009 at 11:36:42AM -0400, Vivek Goyal wrote:
> > > Hmm.., my old config had "AS" as default scheduler that's why I was seeing
> > > the strange issue of RT task finishing after BE. My apologies for that. I
> > > somehow assumed that CFQ is default scheduler in my config.
> > 
> > ok.
> > 
> > > 
> > > So I have re-run the test to see if we are still seeing the issue of
> > > loosing priority and class with-in cgroup. And we still do..
> > > 
> > > 2.6.30-rc4 with io-throttle patches
> > > ===================================
> > > Test1
> > > =====
> > > - Two readers, one BE prio 0 and other BE prio 7 in a cgroup limited with
> > >   8MB/s BW.
> > > 
> > > 234179072 bytes (234 MB) copied, 55.8448 s, 4.2 MB/s
> > > prio 0 task finished
> > > 234179072 bytes (234 MB) copied, 55.8878 s, 4.2 MB/s
> > > 
> > > Test2
> > > =====
> > > - Two readers, one RT prio 0 and other BE prio 7 in a cgroup limited with
> > >   8MB/s BW.
> > > 
> > > 234179072 bytes (234 MB) copied, 55.8876 s, 4.2 MB/s
> > > 234179072 bytes (234 MB) copied, 55.8984 s, 4.2 MB/s
> > > RT task finished
> > 
> > ok, coherent with the current io-throttle implementation.
> > 
> > > 
> > > Test3
> > > =====
> > > - Reader Starvation
> > > - I created a cgroup with BW limit of 64MB/s. First I just run the reader
> > >   alone and then I run reader along with 4 writers 4 times. 
> > > 
> > > Reader alone
> > > 234179072 bytes (234 MB) copied, 3.71796 s, 63.0 MB/s
> > > 
> > > Reader with 4 writers
> > > ---------------------
> > > First run
> > > 234179072 bytes (234 MB) copied, 30.394 s, 7.7 MB/s 
> > > 
> > > Second run
> > > 234179072 bytes (234 MB) copied, 26.9607 s, 8.7 MB/s
> > > 
> > > Third run
> > > 234179072 bytes (234 MB) copied, 37.3515 s, 6.3 MB/s
> > > 
> > > Fourth run
> > > 234179072 bytes (234 MB) copied, 36.817 s, 6.4 MB/s
> > > 
> > > Note that out of 64MB/s limit of this cgroup, reader does not get even
> > > 1/5 of the BW. In normal systems, readers are advantaged and reader gets
> > > its job done much faster even in presence of multiple writers.   
> > 
> > And this is also coherent. The throttling is equally probable for read
> > and write. But this shouldn't happen if we saturate the physical disk BW
> > (doing proportional BW control or using a watermark close to 100 in
> > io-throttle). In this case IO scheduler logic shouldn't be totally
> > broken.
> >
> 
> Can you please explain the watermark a bit more? So blockio.watermark=90
> mean 90% of what? total disk BW? But disk BW varies based on work load?

The controller starts to apply throttling rules only when the total disk
BW utilization is greater than 90%.

The consumed BW is evaluated as (cpu_ticks / io_ticks * 100), where
cpu_ticks are the ticks (in jiffies) since the last i/o request and
io_ticks is the difference of ticks accounted to a particular block
device, retrieved by:

part_stat_read(bdev->bd_part, io_ticks)

BTW it's the same metric (%util) used by iostat.

> 
> > Doing a very quick test with io-throttle, using a 10MB/s BW limit and
> > blockio.watermark=90:
> > 
> > Launching reader
> > 256+0 records in
> > 256+0 records out
> > 268435456 bytes (268 MB) copied, 32.2798 s, 8.3 MB/s
> > 
> > In the same time the writers wrote ~190MB, so the single reader got
> > about 1/3 of the total BW.
> > 
> > 182M testzerofile4
> > 198M testzerofile1
> > 188M testzerofile3
> > 189M testzerofile2
> > 
> 
> But its now more a max bw controller at all now? I seem to be getting the
> total BW of (268+182+198+188+189)/32 = 32MB/s and you set the limit to
> 10MB/s?
>  

The limit of 10MB/s is applied only when the consumed disk BW hits 90%.

If the disk is not fully saturated no limit is applied. It's nothing
more than soft limiting, to avoid to waste the unused disk BW that we
have with hard limits. This is similar to the proportional approach from
a certain point of view.

But ok, this only reduces the number of times that we block the IO
requests. The fact is that when we apply throttling the probability to
block a read or a write it's the same also in this case.

> 
> [..]
> > What are the results with your IO scheduler controller (if you already
> > have them, otherwise I'll repeat this test in my system)? It seems a
> > very interesting test to compare the advantages of the IO scheduler
> > solution respect to the io-throttle approach.
> > 
> 
> I had not done any reader writer testing so far. But you forced me to run
> some now. :-) Here are the results. 

Good! :)

> 
> Because one is max BW controller and other is proportional BW controller
> doing exact comparison is hard. Still....
> 
> Test1
> =====
> Try to run lots of writers (50 random writers using fio and 4 sequential
> writers with dd if=/dev/zero) and one single reader either in root group
> or with in one cgroup to show that readers are not starved by writers
> as opposed to io-throttle controller.
> 
> Run test1 with vanilla kernel with CFQ
> =====================================
> Launched 50 fio random writers, 4 sequential writers and 1 reader in root
> and noted how long it takes reader to finish. Also noted the per second output
> from iostat -d 1 -m /dev/sdb1 to monitor how disk throughput varies.
> 
> ***********************************************************************
> # launch 50 writers fio job
> 
> fio_args="--size=64m --rw=write --numjobs=50 --group_reporting"
> fio $fio_args --name=test2 --directory=/mnt/sdb/fio2/ --output=/mnt/sdb/fio2/test2.log > /dev/null  &
> 
> #launch 4 sequential writers
> ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile1 bs=4K count=524288 &
> ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile2 bs=4K count=524288 &
> ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile3 bs=4K count=524288 &
> ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile4 bs=4K count=524288 &
> 
> echo "Sleeping for 5 seconds"
> sleep 5
> echo "Launching reader"
> 
> ionice -c 2 -n 0 dd if=/mnt/sdb/zerofile2 of=/dev/zero &
> wait $!
> echo "Reader Finished"
> ***************************************************************************
> 
> Results
> -------
> 234179072 bytes (234 MB) copied, 4.55047 s, 51.5 MB/s
> 
> Reader finished in 4.5 seconds. Following are few lines from iostat output
> 
> ***********************************************************************
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            151.00         0.04        48.33          0         48
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            120.00         1.78        31.23          1         31
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            504.95        56.75         7.51         57          7
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            547.47        62.71         4.47         62          4
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            441.00        49.80         7.82         49          7
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            441.41        48.28        13.84         47         13
> 
> *************************************************************************
> 
> Note how, first write picks up and then suddenly reader comes in and CFQ
> allocates a huge chunk of BW to reader to give it the advantage.
> 
> Run Test1 with IO scheduler based io controller patch
> =====================================================
> 
> 234179072 bytes (234 MB) copied, 5.23141 s, 44.8 MB/s 
> 
> Reader finishes in 5.23 seconds. Why does it take more time than CFQ,
> because looks like current algorithm is not punishing writers that hard.
> This can be fixed and not an issue.
> 
> Following is some output from iostat.
> 
> **********************************************************************
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            139.60         0.04        43.83          0         44
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            227.72        16.88        29.05         17         29
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            349.00        35.04        16.06         35         16
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            339.00        34.16        21.07         34         21
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            343.56        36.68        12.54         37         12
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            378.00        38.68        19.47         38         19
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            532.00        59.06        10.00         59         10
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            125.00         2.62        38.82          2         38
> ************************************************************************
> 
> Note how read throughput goes up when reader comes in. Also note that
> writer is still getting some decent IO done and that's why reader took
> little bit more time as compared to CFQ.
> 
> 
> Run Test1 with IO throttle patches
> ==================================
> 
> Now same test is run with io-throttle patches. The only difference is that
> it run the test in a cgroup with max limit of 32MB/s. That should mean 
> that effectvily we got a disk which can support at max 32MB/s of IO rate.
> If we look at above CFQ and io controller results, it looks like with
> above load we touched a peak of 70MB/s.  So one can think of same test
> being run on a disk roughly half the speed of original disk.
> 
> 234179072 bytes (234 MB) copied, 144.207 s, 1.6 MB/s
> 
> Reader got a disk rate of 1.6MB/s (5 %) out of 32MB/s capacity, as opposed to
> the case CFQ and io scheduler controller where reader got around 70-80% of
> disk BW under similar work load.
> 
> Test2
> =====
> Run test2 with io scheduler based io controller
> ===============================================
> Now run almost same test with a little difference. This time I create two
> cgroups of same weight 1000. I run the 50 fio random writer in one cgroup
> and 4 sequential writers and 1 reader in second group. This test is more
> to show that proportional BW IO controller is working and because of
> reader in group1, group2 writes are not killed (providing isolation) and
> secondly, reader still gets preference over the writers which are in same
> group.
> 
> 				root
> 			     /       \		
> 			  group1     group2
> 		  (50 fio writers)   ( 4 writers and one reader)
> 
> 234179072 bytes (234 MB) copied, 12.8546 s, 18.2 MB/s
> 
> Reader finished in almost 13 seconds and got around 18MB/s. Remember when
> everything was in root group reader got around 45MB/s. This is to account
> for the fact that half of the disk is now being shared by other cgroup
> which are running 50 fio writes and reader can't steal the disk from them.
> 
> Following is some portion of iostat output when reader became active
> *********************************************************************
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            103.92         0.03        40.21          0         41
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            240.00        15.78        37.40         15         37
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            206.93        13.17        28.50         13         28
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            224.75        15.39        27.89         15         28
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            270.71        16.85        25.95         16         25
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            215.84         8.81        32.40          8         32
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            216.16        19.11        20.75         18         20
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            211.11        14.67        35.77         14         35
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            208.91        15.04        26.95         15         27
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            277.23        24.30        28.53         24         28
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            202.97        12.29        34.79         12         35
> **********************************************************************
> 
> Total disk throughput is varying a lot, on an average it looks like it
> is getting 45MB/s. Lets say 50% of that is going to cgroup1 (fio writers),
> then out of rest of 22 MB/s reader seems to have to 18MB/s. These are
> highly approximate numbers. I think I need to come up with some kind of 
> tool to measure per cgroup throughput (like we have for per partition
> stat) for more accurate comparision.
> 
> But the point is that second cgroup got the isolation and read got
> preference with-in same cgroup. The expected behavior.
> 
> Run test2 with io-throttle
> ==========================
> Same setup of two groups. The only difference is that I setup two groups
> with (16MB) limit. So previous 32MB limit got divided between two cgroups
> 50% each.
> 
> - 234179072 bytes (234 MB) copied, 90.8055 s, 2.6 MB/s
> 
> Reader took 90 seconds to finish.  It seems to have got around 16% of
> available disk BW (16MB) to it.
> 
> iostat output is long. Will just paste one section.
> 
> ************************************************************************
> [..]
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            141.58        10.16        16.12         10         16
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            174.75         8.06        12.31          7         12
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1             47.52         0.12         6.16          0          6
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1             82.00         0.00        31.85          0         31
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            141.00         0.00        48.07          0         48
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1             72.73         0.00        26.52          0         26
>  
> 
> ***************************************************************************
> 
> Conclusion
> ==========
> It just reaffirms that with max BW control, we are not doing a fair job
> of throttling hence no more hold the IO scheduler properties with-in
> cgroup.
> 
> With proportional BW controller implemented at IO scheduler level, one
> can do very tight integration with IO controller and hence retain 
> IO scheduler behavior with-in cgroup.

It is worth to bug you I would say :). Results are interesting,
definitely. I'll check if it's possible to merge part of the io-throttle
max BW control in this controller and who knows if finally we'll be able
to converge to a common proposal...

Thanks,
-Andrea

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-08 18:09                           ` Vivek Goyal
@ 2009-05-08 20:05                             ` Andrea Righi
  2009-05-08 21:56                                 ` Vivek Goyal
       [not found]                             ` <20090508180951.GG7293-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 97+ messages in thread
From: Andrea Righi @ 2009-05-08 20:05 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Andrew Morton, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, balbir, linux-kernel, containers,
	agk, dm-devel, snitzer, m-ikeda, peterz

On Fri, May 08, 2009 at 02:09:51PM -0400, Vivek Goyal wrote:
> On Fri, May 08, 2009 at 12:19:01AM +0200, Andrea Righi wrote:
> > On Thu, May 07, 2009 at 11:36:42AM -0400, Vivek Goyal wrote:
> > > Hmm.., my old config had "AS" as default scheduler that's why I was seeing
> > > the strange issue of RT task finishing after BE. My apologies for that. I
> > > somehow assumed that CFQ is default scheduler in my config.
> > 
> > ok.
> > 
> > > 
> > > So I have re-run the test to see if we are still seeing the issue of
> > > loosing priority and class with-in cgroup. And we still do..
> > > 
> > > 2.6.30-rc4 with io-throttle patches
> > > ===================================
> > > Test1
> > > =====
> > > - Two readers, one BE prio 0 and other BE prio 7 in a cgroup limited with
> > >   8MB/s BW.
> > > 
> > > 234179072 bytes (234 MB) copied, 55.8448 s, 4.2 MB/s
> > > prio 0 task finished
> > > 234179072 bytes (234 MB) copied, 55.8878 s, 4.2 MB/s
> > > 
> > > Test2
> > > =====
> > > - Two readers, one RT prio 0 and other BE prio 7 in a cgroup limited with
> > >   8MB/s BW.
> > > 
> > > 234179072 bytes (234 MB) copied, 55.8876 s, 4.2 MB/s
> > > 234179072 bytes (234 MB) copied, 55.8984 s, 4.2 MB/s
> > > RT task finished
> > 
> > ok, coherent with the current io-throttle implementation.
> > 
> > > 
> > > Test3
> > > =====
> > > - Reader Starvation
> > > - I created a cgroup with BW limit of 64MB/s. First I just run the reader
> > >   alone and then I run reader along with 4 writers 4 times. 
> > > 
> > > Reader alone
> > > 234179072 bytes (234 MB) copied, 3.71796 s, 63.0 MB/s
> > > 
> > > Reader with 4 writers
> > > ---------------------
> > > First run
> > > 234179072 bytes (234 MB) copied, 30.394 s, 7.7 MB/s 
> > > 
> > > Second run
> > > 234179072 bytes (234 MB) copied, 26.9607 s, 8.7 MB/s
> > > 
> > > Third run
> > > 234179072 bytes (234 MB) copied, 37.3515 s, 6.3 MB/s
> > > 
> > > Fourth run
> > > 234179072 bytes (234 MB) copied, 36.817 s, 6.4 MB/s
> > > 
> > > Note that out of 64MB/s limit of this cgroup, reader does not get even
> > > 1/5 of the BW. In normal systems, readers are advantaged and reader gets
> > > its job done much faster even in presence of multiple writers.   
> > 
> > And this is also coherent. The throttling is equally probable for read
> > and write. But this shouldn't happen if we saturate the physical disk BW
> > (doing proportional BW control or using a watermark close to 100 in
> > io-throttle). In this case IO scheduler logic shouldn't be totally
> > broken.
> >
> 
> Can you please explain the watermark a bit more? So blockio.watermark=90
> mean 90% of what? total disk BW? But disk BW varies based on work load?

The controller starts to apply throttling rules only when the total disk
BW utilization is greater than 90%.

The consumed BW is evaluated as (cpu_ticks / io_ticks * 100), where
cpu_ticks are the ticks (in jiffies) since the last i/o request and
io_ticks is the difference of ticks accounted to a particular block
device, retrieved by:

part_stat_read(bdev->bd_part, io_ticks)

BTW it's the same metric (%util) used by iostat.

> 
> > Doing a very quick test with io-throttle, using a 10MB/s BW limit and
> > blockio.watermark=90:
> > 
> > Launching reader
> > 256+0 records in
> > 256+0 records out
> > 268435456 bytes (268 MB) copied, 32.2798 s, 8.3 MB/s
> > 
> > In the same time the writers wrote ~190MB, so the single reader got
> > about 1/3 of the total BW.
> > 
> > 182M testzerofile4
> > 198M testzerofile1
> > 188M testzerofile3
> > 189M testzerofile2
> > 
> 
> But its now more a max bw controller at all now? I seem to be getting the
> total BW of (268+182+198+188+189)/32 = 32MB/s and you set the limit to
> 10MB/s?
>  

The limit of 10MB/s is applied only when the consumed disk BW hits 90%.

If the disk is not fully saturated no limit is applied. It's nothing
more than soft limiting, to avoid to waste the unused disk BW that we
have with hard limits. This is similar to the proportional approach from
a certain point of view.

But ok, this only reduces the number of times that we block the IO
requests. The fact is that when we apply throttling the probability to
block a read or a write it's the same also in this case.

> 
> [..]
> > What are the results with your IO scheduler controller (if you already
> > have them, otherwise I'll repeat this test in my system)? It seems a
> > very interesting test to compare the advantages of the IO scheduler
> > solution respect to the io-throttle approach.
> > 
> 
> I had not done any reader writer testing so far. But you forced me to run
> some now. :-) Here are the results. 

Good! :)

> 
> Because one is max BW controller and other is proportional BW controller
> doing exact comparison is hard. Still....
> 
> Test1
> =====
> Try to run lots of writers (50 random writers using fio and 4 sequential
> writers with dd if=/dev/zero) and one single reader either in root group
> or with in one cgroup to show that readers are not starved by writers
> as opposed to io-throttle controller.
> 
> Run test1 with vanilla kernel with CFQ
> =====================================
> Launched 50 fio random writers, 4 sequential writers and 1 reader in root
> and noted how long it takes reader to finish. Also noted the per second output
> from iostat -d 1 -m /dev/sdb1 to monitor how disk throughput varies.
> 
> ***********************************************************************
> # launch 50 writers fio job
> 
> fio_args="--size=64m --rw=write --numjobs=50 --group_reporting"
> fio $fio_args --name=test2 --directory=/mnt/sdb/fio2/ --output=/mnt/sdb/fio2/test2.log > /dev/null  &
> 
> #launch 4 sequential writers
> ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile1 bs=4K count=524288 &
> ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile2 bs=4K count=524288 &
> ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile3 bs=4K count=524288 &
> ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile4 bs=4K count=524288 &
> 
> echo "Sleeping for 5 seconds"
> sleep 5
> echo "Launching reader"
> 
> ionice -c 2 -n 0 dd if=/mnt/sdb/zerofile2 of=/dev/zero &
> wait $!
> echo "Reader Finished"
> ***************************************************************************
> 
> Results
> -------
> 234179072 bytes (234 MB) copied, 4.55047 s, 51.5 MB/s
> 
> Reader finished in 4.5 seconds. Following are few lines from iostat output
> 
> ***********************************************************************
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            151.00         0.04        48.33          0         48
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            120.00         1.78        31.23          1         31
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            504.95        56.75         7.51         57          7
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            547.47        62.71         4.47         62          4
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            441.00        49.80         7.82         49          7
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            441.41        48.28        13.84         47         13
> 
> *************************************************************************
> 
> Note how, first write picks up and then suddenly reader comes in and CFQ
> allocates a huge chunk of BW to reader to give it the advantage.
> 
> Run Test1 with IO scheduler based io controller patch
> =====================================================
> 
> 234179072 bytes (234 MB) copied, 5.23141 s, 44.8 MB/s 
> 
> Reader finishes in 5.23 seconds. Why does it take more time than CFQ,
> because looks like current algorithm is not punishing writers that hard.
> This can be fixed and not an issue.
> 
> Following is some output from iostat.
> 
> **********************************************************************
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            139.60         0.04        43.83          0         44
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            227.72        16.88        29.05         17         29
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            349.00        35.04        16.06         35         16
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            339.00        34.16        21.07         34         21
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            343.56        36.68        12.54         37         12
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            378.00        38.68        19.47         38         19
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            532.00        59.06        10.00         59         10
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            125.00         2.62        38.82          2         38
> ************************************************************************
> 
> Note how read throughput goes up when reader comes in. Also note that
> writer is still getting some decent IO done and that's why reader took
> little bit more time as compared to CFQ.
> 
> 
> Run Test1 with IO throttle patches
> ==================================
> 
> Now same test is run with io-throttle patches. The only difference is that
> it run the test in a cgroup with max limit of 32MB/s. That should mean 
> that effectvily we got a disk which can support at max 32MB/s of IO rate.
> If we look at above CFQ and io controller results, it looks like with
> above load we touched a peak of 70MB/s.  So one can think of same test
> being run on a disk roughly half the speed of original disk.
> 
> 234179072 bytes (234 MB) copied, 144.207 s, 1.6 MB/s
> 
> Reader got a disk rate of 1.6MB/s (5 %) out of 32MB/s capacity, as opposed to
> the case CFQ and io scheduler controller where reader got around 70-80% of
> disk BW under similar work load.
> 
> Test2
> =====
> Run test2 with io scheduler based io controller
> ===============================================
> Now run almost same test with a little difference. This time I create two
> cgroups of same weight 1000. I run the 50 fio random writer in one cgroup
> and 4 sequential writers and 1 reader in second group. This test is more
> to show that proportional BW IO controller is working and because of
> reader in group1, group2 writes are not killed (providing isolation) and
> secondly, reader still gets preference over the writers which are in same
> group.
> 
> 				root
> 			     /       \		
> 			  group1     group2
> 		  (50 fio writers)   ( 4 writers and one reader)
> 
> 234179072 bytes (234 MB) copied, 12.8546 s, 18.2 MB/s
> 
> Reader finished in almost 13 seconds and got around 18MB/s. Remember when
> everything was in root group reader got around 45MB/s. This is to account
> for the fact that half of the disk is now being shared by other cgroup
> which are running 50 fio writes and reader can't steal the disk from them.
> 
> Following is some portion of iostat output when reader became active
> *********************************************************************
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            103.92         0.03        40.21          0         41
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            240.00        15.78        37.40         15         37
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            206.93        13.17        28.50         13         28
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            224.75        15.39        27.89         15         28
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            270.71        16.85        25.95         16         25
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            215.84         8.81        32.40          8         32
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            216.16        19.11        20.75         18         20
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            211.11        14.67        35.77         14         35
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            208.91        15.04        26.95         15         27
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            277.23        24.30        28.53         24         28
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            202.97        12.29        34.79         12         35
> **********************************************************************
> 
> Total disk throughput is varying a lot, on an average it looks like it
> is getting 45MB/s. Lets say 50% of that is going to cgroup1 (fio writers),
> then out of rest of 22 MB/s reader seems to have to 18MB/s. These are
> highly approximate numbers. I think I need to come up with some kind of 
> tool to measure per cgroup throughput (like we have for per partition
> stat) for more accurate comparision.
> 
> But the point is that second cgroup got the isolation and read got
> preference with-in same cgroup. The expected behavior.
> 
> Run test2 with io-throttle
> ==========================
> Same setup of two groups. The only difference is that I setup two groups
> with (16MB) limit. So previous 32MB limit got divided between two cgroups
> 50% each.
> 
> - 234179072 bytes (234 MB) copied, 90.8055 s, 2.6 MB/s
> 
> Reader took 90 seconds to finish.  It seems to have got around 16% of
> available disk BW (16MB) to it.
> 
> iostat output is long. Will just paste one section.
> 
> ************************************************************************
> [..]
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            141.58        10.16        16.12         10         16
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            174.75         8.06        12.31          7         12
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1             47.52         0.12         6.16          0          6
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1             82.00         0.00        31.85          0         31
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1            141.00         0.00        48.07          0         48
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb1             72.73         0.00        26.52          0         26
>  
> 
> ***************************************************************************
> 
> Conclusion
> ==========
> It just reaffirms that with max BW control, we are not doing a fair job
> of throttling hence no more hold the IO scheduler properties with-in
> cgroup.
> 
> With proportional BW controller implemented at IO scheduler level, one
> can do very tight integration with IO controller and hence retain 
> IO scheduler behavior with-in cgroup.

It is worth to bug you I would say :). Results are interesting,
definitely. I'll check if it's possible to merge part of the io-throttle
max BW control in this controller and who knows if finally we'll be able
to converge to a common proposal...

Thanks,
-Andrea

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-07 22:19                         ` Andrea Righi
  2009-05-08 18:09                           ` Vivek Goyal
@ 2009-05-08 18:09                           ` Vivek Goyal
  1 sibling, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-08 18:09 UTC (permalink / raw)
  To: Andrea Righi
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Fri, May 08, 2009 at 12:19:01AM +0200, Andrea Righi wrote:
> On Thu, May 07, 2009 at 11:36:42AM -0400, Vivek Goyal wrote:
> > Hmm.., my old config had "AS" as default scheduler that's why I was seeing
> > the strange issue of RT task finishing after BE. My apologies for that. I
> > somehow assumed that CFQ is default scheduler in my config.
> 
> ok.
> 
> > 
> > So I have re-run the test to see if we are still seeing the issue of
> > loosing priority and class with-in cgroup. And we still do..
> > 
> > 2.6.30-rc4 with io-throttle patches
> > ===================================
> > Test1
> > =====
> > - Two readers, one BE prio 0 and other BE prio 7 in a cgroup limited with
> >   8MB/s BW.
> > 
> > 234179072 bytes (234 MB) copied, 55.8448 s, 4.2 MB/s
> > prio 0 task finished
> > 234179072 bytes (234 MB) copied, 55.8878 s, 4.2 MB/s
> > 
> > Test2
> > =====
> > - Two readers, one RT prio 0 and other BE prio 7 in a cgroup limited with
> >   8MB/s BW.
> > 
> > 234179072 bytes (234 MB) copied, 55.8876 s, 4.2 MB/s
> > 234179072 bytes (234 MB) copied, 55.8984 s, 4.2 MB/s
> > RT task finished
> 
> ok, coherent with the current io-throttle implementation.
> 
> > 
> > Test3
> > =====
> > - Reader Starvation
> > - I created a cgroup with BW limit of 64MB/s. First I just run the reader
> >   alone and then I run reader along with 4 writers 4 times. 
> > 
> > Reader alone
> > 234179072 bytes (234 MB) copied, 3.71796 s, 63.0 MB/s
> > 
> > Reader with 4 writers
> > ---------------------
> > First run
> > 234179072 bytes (234 MB) copied, 30.394 s, 7.7 MB/s 
> > 
> > Second run
> > 234179072 bytes (234 MB) copied, 26.9607 s, 8.7 MB/s
> > 
> > Third run
> > 234179072 bytes (234 MB) copied, 37.3515 s, 6.3 MB/s
> > 
> > Fourth run
> > 234179072 bytes (234 MB) copied, 36.817 s, 6.4 MB/s
> > 
> > Note that out of 64MB/s limit of this cgroup, reader does not get even
> > 1/5 of the BW. In normal systems, readers are advantaged and reader gets
> > its job done much faster even in presence of multiple writers.   
> 
> And this is also coherent. The throttling is equally probable for read
> and write. But this shouldn't happen if we saturate the physical disk BW
> (doing proportional BW control or using a watermark close to 100 in
> io-throttle). In this case IO scheduler logic shouldn't be totally
> broken.
>

Can you please explain the watermark a bit more? So blockio.watermark=90
mean 90% of what? total disk BW? But disk BW varies based on work load?

> Doing a very quick test with io-throttle, using a 10MB/s BW limit and
> blockio.watermark=90:
> 
> Launching reader
> 256+0 records in
> 256+0 records out
> 268435456 bytes (268 MB) copied, 32.2798 s, 8.3 MB/s
> 
> In the same time the writers wrote ~190MB, so the single reader got
> about 1/3 of the total BW.
> 
> 182M testzerofile4
> 198M testzerofile1
> 188M testzerofile3
> 189M testzerofile2
> 

But its now more a max bw controller at all now? I seem to be getting the
total BW of (268+182+198+188+189)/32 = 32MB/s and you set the limit to
10MB/s?
 

[..]
> What are the results with your IO scheduler controller (if you already
> have them, otherwise I'll repeat this test in my system)? It seems a
> very interesting test to compare the advantages of the IO scheduler
> solution respect to the io-throttle approach.
> 

I had not done any reader writer testing so far. But you forced me to run
some now. :-) Here are the results. 

Because one is max BW controller and other is proportional BW controller
doing exact comparison is hard. Still....

Test1
=====
Try to run lots of writers (50 random writers using fio and 4 sequential
writers with dd if=/dev/zero) and one single reader either in root group
or with in one cgroup to show that readers are not starved by writers
as opposed to io-throttle controller.

Run test1 with vanilla kernel with CFQ
=====================================
Launched 50 fio random writers, 4 sequential writers and 1 reader in root
and noted how long it takes reader to finish. Also noted the per second output
from iostat -d 1 -m /dev/sdb1 to monitor how disk throughput varies.

***********************************************************************
# launch 50 writers fio job

fio_args="--size=64m --rw=write --numjobs=50 --group_reporting"
fio $fio_args --name=test2 --directory=/mnt/sdb/fio2/ --output=/mnt/sdb/fio2/test2.log > /dev/null  &

#launch 4 sequential writers
ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile1 bs=4K count=524288 &
ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile2 bs=4K count=524288 &
ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile3 bs=4K count=524288 &
ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile4 bs=4K count=524288 &

echo "Sleeping for 5 seconds"
sleep 5
echo "Launching reader"

ionice -c 2 -n 0 dd if=/mnt/sdb/zerofile2 of=/dev/zero &
wait $!
echo "Reader Finished"
***************************************************************************

Results
-------
234179072 bytes (234 MB) copied, 4.55047 s, 51.5 MB/s

Reader finished in 4.5 seconds. Following are few lines from iostat output

***********************************************************************
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            151.00         0.04        48.33          0         48

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            120.00         1.78        31.23          1         31

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            504.95        56.75         7.51         57          7

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            547.47        62.71         4.47         62          4

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            441.00        49.80         7.82         49          7

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            441.41        48.28        13.84         47         13

*************************************************************************

Note how, first write picks up and then suddenly reader comes in and CFQ
allocates a huge chunk of BW to reader to give it the advantage.

Run Test1 with IO scheduler based io controller patch
=====================================================

234179072 bytes (234 MB) copied, 5.23141 s, 44.8 MB/s 

Reader finishes in 5.23 seconds. Why does it take more time than CFQ,
because looks like current algorithm is not punishing writers that hard.
This can be fixed and not an issue.

Following is some output from iostat.

**********************************************************************
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            139.60         0.04        43.83          0         44

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            227.72        16.88        29.05         17         29

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            349.00        35.04        16.06         35         16

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            339.00        34.16        21.07         34         21

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            343.56        36.68        12.54         37         12

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            378.00        38.68        19.47         38         19

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            532.00        59.06        10.00         59         10

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            125.00         2.62        38.82          2         38
************************************************************************

Note how read throughput goes up when reader comes in. Also note that
writer is still getting some decent IO done and that's why reader took
little bit more time as compared to CFQ.


Run Test1 with IO throttle patches
==================================

Now same test is run with io-throttle patches. The only difference is that
it run the test in a cgroup with max limit of 32MB/s. That should mean 
that effectvily we got a disk which can support at max 32MB/s of IO rate.
If we look at above CFQ and io controller results, it looks like with
above load we touched a peak of 70MB/s.  So one can think of same test
being run on a disk roughly half the speed of original disk.

234179072 bytes (234 MB) copied, 144.207 s, 1.6 MB/s

Reader got a disk rate of 1.6MB/s (5 %) out of 32MB/s capacity, as opposed to
the case CFQ and io scheduler controller where reader got around 70-80% of
disk BW under similar work load.

Test2
=====
Run test2 with io scheduler based io controller
===============================================
Now run almost same test with a little difference. This time I create two
cgroups of same weight 1000. I run the 50 fio random writer in one cgroup
and 4 sequential writers and 1 reader in second group. This test is more
to show that proportional BW IO controller is working and because of
reader in group1, group2 writes are not killed (providing isolation) and
secondly, reader still gets preference over the writers which are in same
group.

				root
			     /       \		
			  group1     group2
		  (50 fio writers)   ( 4 writers and one reader)

234179072 bytes (234 MB) copied, 12.8546 s, 18.2 MB/s

Reader finished in almost 13 seconds and got around 18MB/s. Remember when
everything was in root group reader got around 45MB/s. This is to account
for the fact that half of the disk is now being shared by other cgroup
which are running 50 fio writes and reader can't steal the disk from them.

Following is some portion of iostat output when reader became active
*********************************************************************
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            103.92         0.03        40.21          0         41

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            240.00        15.78        37.40         15         37

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            206.93        13.17        28.50         13         28

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            224.75        15.39        27.89         15         28

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            270.71        16.85        25.95         16         25

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            215.84         8.81        32.40          8         32

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            216.16        19.11        20.75         18         20

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            211.11        14.67        35.77         14         35

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            208.91        15.04        26.95         15         27

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            277.23        24.30        28.53         24         28

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            202.97        12.29        34.79         12         35
**********************************************************************

Total disk throughput is varying a lot, on an average it looks like it
is getting 45MB/s. Lets say 50% of that is going to cgroup1 (fio writers),
then out of rest of 22 MB/s reader seems to have to 18MB/s. These are
highly approximate numbers. I think I need to come up with some kind of 
tool to measure per cgroup throughput (like we have for per partition
stat) for more accurate comparision.

But the point is that second cgroup got the isolation and read got
preference with-in same cgroup. The expected behavior.

Run test2 with io-throttle
==========================
Same setup of two groups. The only difference is that I setup two groups
with (16MB) limit. So previous 32MB limit got divided between two cgroups
50% each.

- 234179072 bytes (234 MB) copied, 90.8055 s, 2.6 MB/s

Reader took 90 seconds to finish.  It seems to have got around 16% of
available disk BW (16MB) to it.

iostat output is long. Will just paste one section.

************************************************************************
[..]

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            141.58        10.16        16.12         10         16

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            174.75         8.06        12.31          7         12

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1             47.52         0.12         6.16          0          6

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1             82.00         0.00        31.85          0         31

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            141.00         0.00        48.07          0         48

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1             72.73         0.00        26.52          0         26
 

***************************************************************************

Conclusion
==========
It just reaffirms that with max BW control, we are not doing a fair job
of throttling hence no more hold the IO scheduler properties with-in
cgroup.

With proportional BW controller implemented at IO scheduler level, one
can do very tight integration with IO controller and hence retain 
IO scheduler behavior with-in cgroup.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-07 22:19                         ` Andrea Righi
@ 2009-05-08 18:09                           ` Vivek Goyal
  2009-05-08 20:05                             ` Andrea Righi
       [not found]                             ` <20090508180951.GG7293-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2009-05-08 18:09                           ` Vivek Goyal
  1 sibling, 2 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-08 18:09 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Andrew Morton, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, balbir, linux-kernel, containers,
	agk, dm-devel, snitzer, m-ikeda, peterz

On Fri, May 08, 2009 at 12:19:01AM +0200, Andrea Righi wrote:
> On Thu, May 07, 2009 at 11:36:42AM -0400, Vivek Goyal wrote:
> > Hmm.., my old config had "AS" as default scheduler that's why I was seeing
> > the strange issue of RT task finishing after BE. My apologies for that. I
> > somehow assumed that CFQ is default scheduler in my config.
> 
> ok.
> 
> > 
> > So I have re-run the test to see if we are still seeing the issue of
> > loosing priority and class with-in cgroup. And we still do..
> > 
> > 2.6.30-rc4 with io-throttle patches
> > ===================================
> > Test1
> > =====
> > - Two readers, one BE prio 0 and other BE prio 7 in a cgroup limited with
> >   8MB/s BW.
> > 
> > 234179072 bytes (234 MB) copied, 55.8448 s, 4.2 MB/s
> > prio 0 task finished
> > 234179072 bytes (234 MB) copied, 55.8878 s, 4.2 MB/s
> > 
> > Test2
> > =====
> > - Two readers, one RT prio 0 and other BE prio 7 in a cgroup limited with
> >   8MB/s BW.
> > 
> > 234179072 bytes (234 MB) copied, 55.8876 s, 4.2 MB/s
> > 234179072 bytes (234 MB) copied, 55.8984 s, 4.2 MB/s
> > RT task finished
> 
> ok, coherent with the current io-throttle implementation.
> 
> > 
> > Test3
> > =====
> > - Reader Starvation
> > - I created a cgroup with BW limit of 64MB/s. First I just run the reader
> >   alone and then I run reader along with 4 writers 4 times. 
> > 
> > Reader alone
> > 234179072 bytes (234 MB) copied, 3.71796 s, 63.0 MB/s
> > 
> > Reader with 4 writers
> > ---------------------
> > First run
> > 234179072 bytes (234 MB) copied, 30.394 s, 7.7 MB/s 
> > 
> > Second run
> > 234179072 bytes (234 MB) copied, 26.9607 s, 8.7 MB/s
> > 
> > Third run
> > 234179072 bytes (234 MB) copied, 37.3515 s, 6.3 MB/s
> > 
> > Fourth run
> > 234179072 bytes (234 MB) copied, 36.817 s, 6.4 MB/s
> > 
> > Note that out of 64MB/s limit of this cgroup, reader does not get even
> > 1/5 of the BW. In normal systems, readers are advantaged and reader gets
> > its job done much faster even in presence of multiple writers.   
> 
> And this is also coherent. The throttling is equally probable for read
> and write. But this shouldn't happen if we saturate the physical disk BW
> (doing proportional BW control or using a watermark close to 100 in
> io-throttle). In this case IO scheduler logic shouldn't be totally
> broken.
>

Can you please explain the watermark a bit more? So blockio.watermark=90
mean 90% of what? total disk BW? But disk BW varies based on work load?

> Doing a very quick test with io-throttle, using a 10MB/s BW limit and
> blockio.watermark=90:
> 
> Launching reader
> 256+0 records in
> 256+0 records out
> 268435456 bytes (268 MB) copied, 32.2798 s, 8.3 MB/s
> 
> In the same time the writers wrote ~190MB, so the single reader got
> about 1/3 of the total BW.
> 
> 182M testzerofile4
> 198M testzerofile1
> 188M testzerofile3
> 189M testzerofile2
> 

But its now more a max bw controller at all now? I seem to be getting the
total BW of (268+182+198+188+189)/32 = 32MB/s and you set the limit to
10MB/s?
 

[..]
> What are the results with your IO scheduler controller (if you already
> have them, otherwise I'll repeat this test in my system)? It seems a
> very interesting test to compare the advantages of the IO scheduler
> solution respect to the io-throttle approach.
> 

I had not done any reader writer testing so far. But you forced me to run
some now. :-) Here are the results. 

Because one is max BW controller and other is proportional BW controller
doing exact comparison is hard. Still....

Test1
=====
Try to run lots of writers (50 random writers using fio and 4 sequential
writers with dd if=/dev/zero) and one single reader either in root group
or with in one cgroup to show that readers are not starved by writers
as opposed to io-throttle controller.

Run test1 with vanilla kernel with CFQ
=====================================
Launched 50 fio random writers, 4 sequential writers and 1 reader in root
and noted how long it takes reader to finish. Also noted the per second output
from iostat -d 1 -m /dev/sdb1 to monitor how disk throughput varies.

***********************************************************************
# launch 50 writers fio job

fio_args="--size=64m --rw=write --numjobs=50 --group_reporting"
fio $fio_args --name=test2 --directory=/mnt/sdb/fio2/ --output=/mnt/sdb/fio2/test2.log > /dev/null  &

#launch 4 sequential writers
ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile1 bs=4K count=524288 &
ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile2 bs=4K count=524288 &
ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile3 bs=4K count=524288 &
ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile4 bs=4K count=524288 &

echo "Sleeping for 5 seconds"
sleep 5
echo "Launching reader"

ionice -c 2 -n 0 dd if=/mnt/sdb/zerofile2 of=/dev/zero &
wait $!
echo "Reader Finished"
***************************************************************************

Results
-------
234179072 bytes (234 MB) copied, 4.55047 s, 51.5 MB/s

Reader finished in 4.5 seconds. Following are few lines from iostat output

***********************************************************************
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            151.00         0.04        48.33          0         48

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            120.00         1.78        31.23          1         31

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            504.95        56.75         7.51         57          7

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            547.47        62.71         4.47         62          4

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            441.00        49.80         7.82         49          7

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            441.41        48.28        13.84         47         13

*************************************************************************

Note how, first write picks up and then suddenly reader comes in and CFQ
allocates a huge chunk of BW to reader to give it the advantage.

Run Test1 with IO scheduler based io controller patch
=====================================================

234179072 bytes (234 MB) copied, 5.23141 s, 44.8 MB/s 

Reader finishes in 5.23 seconds. Why does it take more time than CFQ,
because looks like current algorithm is not punishing writers that hard.
This can be fixed and not an issue.

Following is some output from iostat.

**********************************************************************
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            139.60         0.04        43.83          0         44

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            227.72        16.88        29.05         17         29

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            349.00        35.04        16.06         35         16

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            339.00        34.16        21.07         34         21

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            343.56        36.68        12.54         37         12

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            378.00        38.68        19.47         38         19

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            532.00        59.06        10.00         59         10

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            125.00         2.62        38.82          2         38
************************************************************************

Note how read throughput goes up when reader comes in. Also note that
writer is still getting some decent IO done and that's why reader took
little bit more time as compared to CFQ.


Run Test1 with IO throttle patches
==================================

Now same test is run with io-throttle patches. The only difference is that
it run the test in a cgroup with max limit of 32MB/s. That should mean 
that effectvily we got a disk which can support at max 32MB/s of IO rate.
If we look at above CFQ and io controller results, it looks like with
above load we touched a peak of 70MB/s.  So one can think of same test
being run on a disk roughly half the speed of original disk.

234179072 bytes (234 MB) copied, 144.207 s, 1.6 MB/s

Reader got a disk rate of 1.6MB/s (5 %) out of 32MB/s capacity, as opposed to
the case CFQ and io scheduler controller where reader got around 70-80% of
disk BW under similar work load.

Test2
=====
Run test2 with io scheduler based io controller
===============================================
Now run almost same test with a little difference. This time I create two
cgroups of same weight 1000. I run the 50 fio random writer in one cgroup
and 4 sequential writers and 1 reader in second group. This test is more
to show that proportional BW IO controller is working and because of
reader in group1, group2 writes are not killed (providing isolation) and
secondly, reader still gets preference over the writers which are in same
group.

				root
			     /       \		
			  group1     group2
		  (50 fio writers)   ( 4 writers and one reader)

234179072 bytes (234 MB) copied, 12.8546 s, 18.2 MB/s

Reader finished in almost 13 seconds and got around 18MB/s. Remember when
everything was in root group reader got around 45MB/s. This is to account
for the fact that half of the disk is now being shared by other cgroup
which are running 50 fio writes and reader can't steal the disk from them.

Following is some portion of iostat output when reader became active
*********************************************************************
Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            103.92         0.03        40.21          0         41

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            240.00        15.78        37.40         15         37

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            206.93        13.17        28.50         13         28

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            224.75        15.39        27.89         15         28

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            270.71        16.85        25.95         16         25

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            215.84         8.81        32.40          8         32

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            216.16        19.11        20.75         18         20

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            211.11        14.67        35.77         14         35

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            208.91        15.04        26.95         15         27

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            277.23        24.30        28.53         24         28

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            202.97        12.29        34.79         12         35
**********************************************************************

Total disk throughput is varying a lot, on an average it looks like it
is getting 45MB/s. Lets say 50% of that is going to cgroup1 (fio writers),
then out of rest of 22 MB/s reader seems to have to 18MB/s. These are
highly approximate numbers. I think I need to come up with some kind of 
tool to measure per cgroup throughput (like we have for per partition
stat) for more accurate comparision.

But the point is that second cgroup got the isolation and read got
preference with-in same cgroup. The expected behavior.

Run test2 with io-throttle
==========================
Same setup of two groups. The only difference is that I setup two groups
with (16MB) limit. So previous 32MB limit got divided between two cgroups
50% each.

- 234179072 bytes (234 MB) copied, 90.8055 s, 2.6 MB/s

Reader took 90 seconds to finish.  It seems to have got around 16% of
available disk BW (16MB) to it.

iostat output is long. Will just paste one section.

************************************************************************
[..]

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            141.58        10.16        16.12         10         16

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            174.75         8.06        12.31          7         12

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1             47.52         0.12         6.16          0          6

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1             82.00         0.00        31.85          0         31

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1            141.00         0.00        48.07          0         48

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb1             72.73         0.00        26.52          0         26
 

***************************************************************************

Conclusion
==========
It just reaffirms that with max BW control, we are not doing a fair job
of throttling hence no more hold the IO scheduler properties with-in
cgroup.

With proportional BW controller implemented at IO scheduler level, one
can do very tight integration with IO controller and hence retain 
IO scheduler behavior with-in cgroup.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]         ` <20090507.091858.226775723.ryov-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org>
  2009-05-07  1:25             ` Vivek Goyal
@ 2009-05-08 14:24           ` Rik van Riel
  1 sibling, 0 replies; 97+ messages in thread
From: Rik van Riel @ 2009-05-08 14:24 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	righi.andrea-Re5JQEeQqe8AvxtiuMwx3w,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

Ryo Tsuruta wrote:
> Hi Vivek,
> 
>> Ryo, dm-ioband breaks the notion of classes and priority of CFQ because
>> of FIFO dispatch of buffered bios. Apart from that it tries to provide
>> fairness in terms of actual IO done and that would mean a seeky workload
>> will can use disk for much longer to get equivalent IO done and slow down
>> other applications. Implementing IO controller at IO scheduler level gives
>> us tigher control. Will it not meet your requirements? If you got specific
>> concerns with IO scheduler based contol patches, please highlight these and
>> we will see how these can be addressed.
> 
> I'd like to avoid making complicated existing IO schedulers and other
> kernel codes and to give a choice to users whether or not to use it.
> I know that you chose an approach that using compile time options to
> get the same behavior as old system, but device-mapper drivers can be
> added, removed and replaced while system is running.

I do not believe that every use of cgroups will end up with
a separate logical volume for each group.

In fact, if you look at group-per-UID usage, which could be
quite common on shared web servers and shell servers, I would
expect all the groups to share the same filesystem.

I do not believe dm-ioband would be useful in that configuration,
while the IO scheduler based IO controller will just work.

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-07  0:18       ` Ryo Tsuruta
       [not found]         ` <20090507.091858.226775723.ryov-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org>
@ 2009-05-08 14:24         ` Rik van Riel
       [not found]           ` <4A0440B2.7040300-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2009-05-11 10:11           ` Ryo Tsuruta
  1 sibling, 2 replies; 97+ messages in thread
From: Rik van Riel @ 2009-05-08 14:24 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: vgoyal, akpm, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, fernando, s-uchida, taka, guijianfeng,
	jmoyer, dhaval, balbir, linux-kernel, containers, righi.andrea,
	agk, dm-devel, snitzer, m-ikeda, peterz

Ryo Tsuruta wrote:
> Hi Vivek,
> 
>> Ryo, dm-ioband breaks the notion of classes and priority of CFQ because
>> of FIFO dispatch of buffered bios. Apart from that it tries to provide
>> fairness in terms of actual IO done and that would mean a seeky workload
>> will can use disk for much longer to get equivalent IO done and slow down
>> other applications. Implementing IO controller at IO scheduler level gives
>> us tigher control. Will it not meet your requirements? If you got specific
>> concerns with IO scheduler based contol patches, please highlight these and
>> we will see how these can be addressed.
> 
> I'd like to avoid making complicated existing IO schedulers and other
> kernel codes and to give a choice to users whether or not to use it.
> I know that you chose an approach that using compile time options to
> get the same behavior as old system, but device-mapper drivers can be
> added, removed and replaced while system is running.

I do not believe that every use of cgroups will end up with
a separate logical volume for each group.

In fact, if you look at group-per-UID usage, which could be
quite common on shared web servers and shell servers, I would
expect all the groups to share the same filesystem.

I do not believe dm-ioband would be useful in that configuration,
while the IO scheduler based IO controller will just work.

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-07  5:36       ` Li Zefan
@ 2009-05-08 13:37             ` Vivek Goyal
  0 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-08 13:37 UTC (permalink / raw)
  To: Li Zefan
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	righi.andrea-Re5JQEeQqe8AvxtiuMwx3w

On Thu, May 07, 2009 at 01:36:08PM +0800, Li Zefan wrote:
> Vivek Goyal wrote:
> > On Wed, May 06, 2009 at 04:11:05PM +0800, Gui Jianfeng wrote:
> >> Vivek Goyal wrote:
> >>> Hi All,
> >>>
> >>> Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> >>> First version of the patches was posted here.
> >> Hi Vivek,
> >>
> >> I did some simple test for V2, and triggered an kernel panic.
> >> The following script can reproduce this bug. It seems that the cgroup
> >> is already removed, but IO Controller still try to access into it.
> >>
> > 
> > Hi Gui,
> > 
> > Thanks for the report. I use cgroup_path() for debugging. I guess that
> > cgroup_path() was passed null cgrp pointer that's why it crashed.
> > 
> > If yes, then it is strange though. I call cgroup_path() only after
> > grabbing a refenrece to css object. (I am assuming that if I have a valid
> > reference to css object then css->cgrp can't be null).
> > 
> 
> Yes, css->cgrp shouldn't be NULL.. I doubt we hit a bug in cgroup here.
> The code dealing with css refcnt and cgroup rmdir has changed quite a lot,
> and is much more complex than it was.
> 
> > Anyway, can you please try out following patch and see if it fixes your
> > crash.
> ...
> > BTW, I tried following equivalent script and I can't see the crash on 
> > my system. Are you able to hit it regularly?
> > 
> 
> I modified the script like this:
> 
> ======================
> #!/bin/sh
> echo 1 > /proc/sys/vm/drop_caches
> mkdir /cgroup 2> /dev/null
> mount -t cgroup -o io,blkio io /cgroup
> mkdir /cgroup/test1
> mkdir /cgroup/test2
> echo 100 > /cgroup/test1/io.weight
> echo 500 > /cgroup/test2/io.weight
> 
> dd if=/dev/zero bs=4096 count=128000 of=500M.1 &
> pid1=$!
> echo $pid1 > /cgroup/test1/tasks
> 
> dd if=/dev/zero bs=4096 count=128000 of=500M.2 &
> pid2=$!
> echo $pid2 > /cgroup/test2/tasks
> 
> sleep 5
> kill -9 $pid1
> kill -9 $pid2
> 
> for ((;count != 2;))
> {
>         rmdir /cgroup/test1 > /dev/null 2>&1
>         if [ $? -eq 0 ]; then
>                 count=$(( $count + 1 ))
>         fi
> 
>         rmdir /cgroup/test2 > /dev/null 2>&1
>         if [ $? -eq 0 ]; then
>                 count=$(( $count + 1 ))
>         fi
> }
> 
> umount /cgroup
> rmdir /cgroup
> ======================
> 
> I ran this script and got lockdep BUG. Full log and my config are attached.
> 
> Actually this can be triggered with the following steps on my box:
> # mount -t cgroup -o blkio,io xxx /mnt
> # mkdir /mnt/0
> # echo $$ > /mnt/0/tasks
> # echo 3 > /proc/sys/vm/drop_cache
> # echo $$ > /mnt/tasks
> # rmdir /mnt/0
> 
> And when I ran the script for the second time, my box was freezed
> and I had to reset it.
> 

Thanks Li and Gui for pointing out the problem. With you script, I could
also produce lock validator warning as well as system freeze. I could
identify at least two trouble spots. With following patch things seems
to be fine on my system. Can you please give it a try.


---
 block/elevator-fq.c |   20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

Index: linux11/block/elevator-fq.c
===================================================================
--- linux11.orig/block/elevator-fq.c	2009-05-08 08:47:45.000000000 -0400
+++ linux11/block/elevator-fq.c	2009-05-08 09:27:37.000000000 -0400
@@ -942,6 +942,7 @@ void entity_served(struct io_entity *ent
 	struct io_service_tree *st;
 
 	for_each_entity(entity) {
+		BUG_ON(!entity->on_st);
 		st = io_entity_service_tree(entity);
 		entity->service += served;
 		entity->total_service += served;
@@ -1652,6 +1653,14 @@ static inline int io_group_has_active_en
 			return 1;
 	}
 
+	/*
+	 * Also check there are no active entities being served which are
+	 * not on active tree
+	 */
+
+	if (iog->sched_data.active_entity)
+		return 1;
+
 	return 0;
 }
 
@@ -1738,7 +1747,7 @@ void iocg_destroy(struct cgroup_subsys *
 	struct io_cgroup *iocg = cgroup_to_io_cgroup(cgroup);
 	struct hlist_node *n, *tmp;
 	struct io_group *iog;
-	unsigned long flags;
+	unsigned long flags, flags1;
 	int queue_lock_held = 0;
 	struct elv_fq_data *efqd;
 
@@ -1766,7 +1775,8 @@ retry:
 		rcu_read_lock();
 		efqd = rcu_dereference(iog->key);
 		if (efqd != NULL) {
-			if (spin_trylock_irq(efqd->queue->queue_lock)) {
+			if (spin_trylock_irqsave(efqd->queue->queue_lock,
+						flags1)) {
 				if (iog->key == efqd) {
 					queue_lock_held = 1;
 					rcu_read_unlock();
@@ -1780,7 +1790,8 @@ retry:
 				 * elevator hence we can proceed safely without
 				 * queue lock.
 				 */
-				spin_unlock_irq(efqd->queue->queue_lock);
+				spin_unlock_irqrestore(efqd->queue->queue_lock,
+							flags1);
 			} else {
 				/*
 				 * Did not get the queue lock while trying.
@@ -1803,7 +1814,7 @@ retry:
 locked:
 		__iocg_destroy(iocg, iog, queue_lock_held);
 		if (queue_lock_held) {
-			spin_unlock_irq(efqd->queue->queue_lock);
+			spin_unlock_irqrestore(efqd->queue->queue_lock, flags1);
 			queue_lock_held = 0;
 		}
 	}
@@ -1811,6 +1822,7 @@ locked:
 
 	BUG_ON(!hlist_empty(&iocg->group_data));
 
+	free_css_id(&io_subsys, &iocg->css);
 	kfree(iocg);
 }

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
@ 2009-05-08 13:37             ` Vivek Goyal
  0 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-08 13:37 UTC (permalink / raw)
  To: Li Zefan
  Cc: Gui Jianfeng, nauman, dpshah, mikew, fchecconi, paolo.valente,
	jens.axboe, ryov, fernando, s-uchida, taka, jmoyer, dhaval,
	balbir, linux-kernel, containers, righi.andrea, agk, dm-devel,
	snitzer, m-ikeda, akpm

On Thu, May 07, 2009 at 01:36:08PM +0800, Li Zefan wrote:
> Vivek Goyal wrote:
> > On Wed, May 06, 2009 at 04:11:05PM +0800, Gui Jianfeng wrote:
> >> Vivek Goyal wrote:
> >>> Hi All,
> >>>
> >>> Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> >>> First version of the patches was posted here.
> >> Hi Vivek,
> >>
> >> I did some simple test for V2, and triggered an kernel panic.
> >> The following script can reproduce this bug. It seems that the cgroup
> >> is already removed, but IO Controller still try to access into it.
> >>
> > 
> > Hi Gui,
> > 
> > Thanks for the report. I use cgroup_path() for debugging. I guess that
> > cgroup_path() was passed null cgrp pointer that's why it crashed.
> > 
> > If yes, then it is strange though. I call cgroup_path() only after
> > grabbing a refenrece to css object. (I am assuming that if I have a valid
> > reference to css object then css->cgrp can't be null).
> > 
> 
> Yes, css->cgrp shouldn't be NULL.. I doubt we hit a bug in cgroup here.
> The code dealing with css refcnt and cgroup rmdir has changed quite a lot,
> and is much more complex than it was.
> 
> > Anyway, can you please try out following patch and see if it fixes your
> > crash.
> ...
> > BTW, I tried following equivalent script and I can't see the crash on 
> > my system. Are you able to hit it regularly?
> > 
> 
> I modified the script like this:
> 
> ======================
> #!/bin/sh
> echo 1 > /proc/sys/vm/drop_caches
> mkdir /cgroup 2> /dev/null
> mount -t cgroup -o io,blkio io /cgroup
> mkdir /cgroup/test1
> mkdir /cgroup/test2
> echo 100 > /cgroup/test1/io.weight
> echo 500 > /cgroup/test2/io.weight
> 
> dd if=/dev/zero bs=4096 count=128000 of=500M.1 &
> pid1=$!
> echo $pid1 > /cgroup/test1/tasks
> 
> dd if=/dev/zero bs=4096 count=128000 of=500M.2 &
> pid2=$!
> echo $pid2 > /cgroup/test2/tasks
> 
> sleep 5
> kill -9 $pid1
> kill -9 $pid2
> 
> for ((;count != 2;))
> {
>         rmdir /cgroup/test1 > /dev/null 2>&1
>         if [ $? -eq 0 ]; then
>                 count=$(( $count + 1 ))
>         fi
> 
>         rmdir /cgroup/test2 > /dev/null 2>&1
>         if [ $? -eq 0 ]; then
>                 count=$(( $count + 1 ))
>         fi
> }
> 
> umount /cgroup
> rmdir /cgroup
> ======================
> 
> I ran this script and got lockdep BUG. Full log and my config are attached.
> 
> Actually this can be triggered with the following steps on my box:
> # mount -t cgroup -o blkio,io xxx /mnt
> # mkdir /mnt/0
> # echo $$ > /mnt/0/tasks
> # echo 3 > /proc/sys/vm/drop_cache
> # echo $$ > /mnt/tasks
> # rmdir /mnt/0
> 
> And when I ran the script for the second time, my box was freezed
> and I had to reset it.
> 

Thanks Li and Gui for pointing out the problem. With you script, I could
also produce lock validator warning as well as system freeze. I could
identify at least two trouble spots. With following patch things seems
to be fine on my system. Can you please give it a try.


---
 block/elevator-fq.c |   20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

Index: linux11/block/elevator-fq.c
===================================================================
--- linux11.orig/block/elevator-fq.c	2009-05-08 08:47:45.000000000 -0400
+++ linux11/block/elevator-fq.c	2009-05-08 09:27:37.000000000 -0400
@@ -942,6 +942,7 @@ void entity_served(struct io_entity *ent
 	struct io_service_tree *st;
 
 	for_each_entity(entity) {
+		BUG_ON(!entity->on_st);
 		st = io_entity_service_tree(entity);
 		entity->service += served;
 		entity->total_service += served;
@@ -1652,6 +1653,14 @@ static inline int io_group_has_active_en
 			return 1;
 	}
 
+	/*
+	 * Also check there are no active entities being served which are
+	 * not on active tree
+	 */
+
+	if (iog->sched_data.active_entity)
+		return 1;
+
 	return 0;
 }
 
@@ -1738,7 +1747,7 @@ void iocg_destroy(struct cgroup_subsys *
 	struct io_cgroup *iocg = cgroup_to_io_cgroup(cgroup);
 	struct hlist_node *n, *tmp;
 	struct io_group *iog;
-	unsigned long flags;
+	unsigned long flags, flags1;
 	int queue_lock_held = 0;
 	struct elv_fq_data *efqd;
 
@@ -1766,7 +1775,8 @@ retry:
 		rcu_read_lock();
 		efqd = rcu_dereference(iog->key);
 		if (efqd != NULL) {
-			if (spin_trylock_irq(efqd->queue->queue_lock)) {
+			if (spin_trylock_irqsave(efqd->queue->queue_lock,
+						flags1)) {
 				if (iog->key == efqd) {
 					queue_lock_held = 1;
 					rcu_read_unlock();
@@ -1780,7 +1790,8 @@ retry:
 				 * elevator hence we can proceed safely without
 				 * queue lock.
 				 */
-				spin_unlock_irq(efqd->queue->queue_lock);
+				spin_unlock_irqrestore(efqd->queue->queue_lock,
+							flags1);
 			} else {
 				/*
 				 * Did not get the queue lock while trying.
@@ -1803,7 +1814,7 @@ retry:
 locked:
 		__iocg_destroy(iocg, iog, queue_lock_held);
 		if (queue_lock_held) {
-			spin_unlock_irq(efqd->queue->queue_lock);
+			spin_unlock_irqrestore(efqd->queue->queue_lock, flags1);
 			queue_lock_held = 0;
 		}
 	}
@@ -1811,6 +1822,7 @@ locked:
 
 	BUG_ON(!hlist_empty(&iocg->group_data));
 
+	free_css_id(&io_subsys, &iocg->css);
 	kfree(iocg);
 }
 

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]                     ` <20090507144501.GB9463-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2009-05-07 15:36                         ` Vivek Goyal
@ 2009-05-07 22:40                       ` Andrea Righi
  1 sibling, 0 replies; 97+ messages in thread
From: Andrea Righi @ 2009-05-07 22:40 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Thu, May 07, 2009 at 10:45:01AM -0400, Vivek Goyal wrote:
> So now we are left with the issue of loosing the notion of priority and
> class with-in cgroup. In fact on bigger systems we will probably run into
> issues of kiothrottled scalability as single thread is trying to cater to
> all the disks.
> 
> If we do max bw control at IO scheduler level, then I think we should be able
> to control max bw while maintaining the notion of priority and class with-in
> cgroup. Also there are multiple pdflush threads and jens seems to be pushing
> flusher threads per bdi which will help us achieve greater scalability and
> don't have to replicate that infrastructure for kiothrottled also.

There's a lot of room for improvements and optimizations in the
kiothrottled part, obviously the single-threaded approach is not a
definitive solutions.

Flusher threads are probably a good solution. But I don't think we need
to replicate the pdflush replacement infrastructure for throttled
writeback IO. Instead it could be just integrated with the flusher
threads, i.e. activate flusher threads only when the request needs to be
written to disk according to the dirty memory limit and IO BW limits.

I mean, I don't see any critical problem for this part.

Instead, preserving the IO priority and IO scheduler logic inside
cgroups seems a more critical issue to me. And I'm quite convinced that
the right approach for this is to operate at the IO scheduler, but I'm
still a little bit skeptical that only operating at the IO scheduler
level would resolve all our problems.

-Andrea

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-07 14:45                     ` Vivek Goyal
  (?)
  (?)
@ 2009-05-07 22:40                     ` Andrea Righi
  -1 siblings, 0 replies; 97+ messages in thread
From: Andrea Righi @ 2009-05-07 22:40 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Andrew Morton, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, balbir, linux-kernel, containers,
	agk, dm-devel, snitzer, m-ikeda, peterz

On Thu, May 07, 2009 at 10:45:01AM -0400, Vivek Goyal wrote:
> So now we are left with the issue of loosing the notion of priority and
> class with-in cgroup. In fact on bigger systems we will probably run into
> issues of kiothrottled scalability as single thread is trying to cater to
> all the disks.
> 
> If we do max bw control at IO scheduler level, then I think we should be able
> to control max bw while maintaining the notion of priority and class with-in
> cgroup. Also there are multiple pdflush threads and jens seems to be pushing
> flusher threads per bdi which will help us achieve greater scalability and
> don't have to replicate that infrastructure for kiothrottled also.

There's a lot of room for improvements and optimizations in the
kiothrottled part, obviously the single-threaded approach is not a
definitive solutions.

Flusher threads are probably a good solution. But I don't think we need
to replicate the pdflush replacement infrastructure for throttled
writeback IO. Instead it could be just integrated with the flusher
threads, i.e. activate flusher threads only when the request needs to be
written to disk according to the dirty memory limit and IO BW limits.

I mean, I don't see any critical problem for this part.

Instead, preserving the IO priority and IO scheduler logic inside
cgroups seems a more critical issue to me. And I'm quite convinced that
the right approach for this is to operate at the IO scheduler, but I'm
still a little bit skeptical that only operating at the IO scheduler
level would resolve all our problems.

-Andrea

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]                         ` <20090507153642.GC9463-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2009-05-07 15:42                             ` Vivek Goyal
@ 2009-05-07 22:19                           ` Andrea Righi
  1 sibling, 0 replies; 97+ messages in thread
From: Andrea Righi @ 2009-05-07 22:19 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Thu, May 07, 2009 at 11:36:42AM -0400, Vivek Goyal wrote:
> Hmm.., my old config had "AS" as default scheduler that's why I was seeing
> the strange issue of RT task finishing after BE. My apologies for that. I
> somehow assumed that CFQ is default scheduler in my config.

ok.

> 
> So I have re-run the test to see if we are still seeing the issue of
> loosing priority and class with-in cgroup. And we still do..
> 
> 2.6.30-rc4 with io-throttle patches
> ===================================
> Test1
> =====
> - Two readers, one BE prio 0 and other BE prio 7 in a cgroup limited with
>   8MB/s BW.
> 
> 234179072 bytes (234 MB) copied, 55.8448 s, 4.2 MB/s
> prio 0 task finished
> 234179072 bytes (234 MB) copied, 55.8878 s, 4.2 MB/s
> 
> Test2
> =====
> - Two readers, one RT prio 0 and other BE prio 7 in a cgroup limited with
>   8MB/s BW.
> 
> 234179072 bytes (234 MB) copied, 55.8876 s, 4.2 MB/s
> 234179072 bytes (234 MB) copied, 55.8984 s, 4.2 MB/s
> RT task finished

ok, coherent with the current io-throttle implementation.

> 
> Test3
> =====
> - Reader Starvation
> - I created a cgroup with BW limit of 64MB/s. First I just run the reader
>   alone and then I run reader along with 4 writers 4 times. 
> 
> Reader alone
> 234179072 bytes (234 MB) copied, 3.71796 s, 63.0 MB/s
> 
> Reader with 4 writers
> ---------------------
> First run
> 234179072 bytes (234 MB) copied, 30.394 s, 7.7 MB/s 
> 
> Second run
> 234179072 bytes (234 MB) copied, 26.9607 s, 8.7 MB/s
> 
> Third run
> 234179072 bytes (234 MB) copied, 37.3515 s, 6.3 MB/s
> 
> Fourth run
> 234179072 bytes (234 MB) copied, 36.817 s, 6.4 MB/s
> 
> Note that out of 64MB/s limit of this cgroup, reader does not get even
> 1/5 of the BW. In normal systems, readers are advantaged and reader gets
> its job done much faster even in presence of multiple writers.   

And this is also coherent. The throttling is equally probable for read
and write. But this shouldn't happen if we saturate the physical disk BW
(doing proportional BW control or using a watermark close to 100 in
io-throttle). In this case IO scheduler logic shouldn't be totally
broken.

Doing a very quick test with io-throttle, using a 10MB/s BW limit and
blockio.watermark=90:

Launching reader
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 32.2798 s, 8.3 MB/s

In the same time the writers wrote ~190MB, so the single reader got
about 1/3 of the total BW.

182M testzerofile4
198M testzerofile1
188M testzerofile3
189M testzerofile2

Things are probably better with many cgroups, many readers and writers
and in general the disk BW more saturated.

Proportional BW approach wins in this case, because if you always use
the whole disk BW the logic of the IO scheduler is still valid.

> 
> Vanilla 2.6.30-rc4
> ==================
> 
> Test3
> =====
> Reader alone
> 234179072 bytes (234 MB) copied, 2.52195 s, 92.9 MB/s
> 
> Reader with 4 writers
> ---------------------
> First run
> 234179072 bytes (234 MB) copied, 4.39929 s, 53.2 MB/s
> 
> Second run
> 234179072 bytes (234 MB) copied, 4.55929 s, 51.4 MB/s
> 
> Third run
> 234179072 bytes (234 MB) copied, 4.79855 s, 48.8 MB/s
> 
> Fourth run
> 234179072 bytes (234 MB) copied, 4.5069 s, 52.0 MB/s
> 
> Notice, that without any writers we seem to be having BW of 92MB/s and
> more than 50% of that BW is still assigned to reader in presence of
> writers. Compare this with io-throttle cgroup of 64MB/s where reader
> struggles to get 10-15% of BW. 
> 
> So any 2nd level control will break the notion and assumptions of
> underlying IO scheduler. We should probably do control at IO scheduler
> level to make sure we don't run into such issues while getting
> hierarchical fair share for groups.
> 
> Thanks
> Vivek
> 

What are the results with your IO scheduler controller (if you already
have them, otherwise I'll repeat this test in my system)? It seems a
very interesting test to compare the advantages of the IO scheduler
solution respect to the io-throttle approach.

Thanks,
-Andrea

> > So now we are left with the issue of loosing the notion of priority and
> > class with-in cgroup. In fact on bigger systems we will probably run into > issues of kiothrottled scalability as single thread is trying to cater to
> > all the disks.
> > 
> > If we do max bw control at IO scheduler level, then I think we should be able
> > to control max bw while maintaining the notion of priority and class with-in
> > cgroup. Also there are multiple pdflush threads and jens seems to be pushing
> > flusher threads per bdi which will help us achieve greater scalability and
> > don't have to replicate that infrastructure for kiothrottled also.
> > 
> > Thanks
> > Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-07 15:36                         ` Vivek Goyal
  (?)
  (?)
@ 2009-05-07 22:19                         ` Andrea Righi
  2009-05-08 18:09                           ` Vivek Goyal
  2009-05-08 18:09                           ` Vivek Goyal
  -1 siblings, 2 replies; 97+ messages in thread
From: Andrea Righi @ 2009-05-07 22:19 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Andrew Morton, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, balbir, linux-kernel, containers,
	agk, dm-devel, snitzer, m-ikeda, peterz

On Thu, May 07, 2009 at 11:36:42AM -0400, Vivek Goyal wrote:
> Hmm.., my old config had "AS" as default scheduler that's why I was seeing
> the strange issue of RT task finishing after BE. My apologies for that. I
> somehow assumed that CFQ is default scheduler in my config.

ok.

> 
> So I have re-run the test to see if we are still seeing the issue of
> loosing priority and class with-in cgroup. And we still do..
> 
> 2.6.30-rc4 with io-throttle patches
> ===================================
> Test1
> =====
> - Two readers, one BE prio 0 and other BE prio 7 in a cgroup limited with
>   8MB/s BW.
> 
> 234179072 bytes (234 MB) copied, 55.8448 s, 4.2 MB/s
> prio 0 task finished
> 234179072 bytes (234 MB) copied, 55.8878 s, 4.2 MB/s
> 
> Test2
> =====
> - Two readers, one RT prio 0 and other BE prio 7 in a cgroup limited with
>   8MB/s BW.
> 
> 234179072 bytes (234 MB) copied, 55.8876 s, 4.2 MB/s
> 234179072 bytes (234 MB) copied, 55.8984 s, 4.2 MB/s
> RT task finished

ok, coherent with the current io-throttle implementation.

> 
> Test3
> =====
> - Reader Starvation
> - I created a cgroup with BW limit of 64MB/s. First I just run the reader
>   alone and then I run reader along with 4 writers 4 times. 
> 
> Reader alone
> 234179072 bytes (234 MB) copied, 3.71796 s, 63.0 MB/s
> 
> Reader with 4 writers
> ---------------------
> First run
> 234179072 bytes (234 MB) copied, 30.394 s, 7.7 MB/s 
> 
> Second run
> 234179072 bytes (234 MB) copied, 26.9607 s, 8.7 MB/s
> 
> Third run
> 234179072 bytes (234 MB) copied, 37.3515 s, 6.3 MB/s
> 
> Fourth run
> 234179072 bytes (234 MB) copied, 36.817 s, 6.4 MB/s
> 
> Note that out of 64MB/s limit of this cgroup, reader does not get even
> 1/5 of the BW. In normal systems, readers are advantaged and reader gets
> its job done much faster even in presence of multiple writers.   

And this is also coherent. The throttling is equally probable for read
and write. But this shouldn't happen if we saturate the physical disk BW
(doing proportional BW control or using a watermark close to 100 in
io-throttle). In this case IO scheduler logic shouldn't be totally
broken.

Doing a very quick test with io-throttle, using a 10MB/s BW limit and
blockio.watermark=90:

Launching reader
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 32.2798 s, 8.3 MB/s

In the same time the writers wrote ~190MB, so the single reader got
about 1/3 of the total BW.

182M testzerofile4
198M testzerofile1
188M testzerofile3
189M testzerofile2

Things are probably better with many cgroups, many readers and writers
and in general the disk BW more saturated.

Proportional BW approach wins in this case, because if you always use
the whole disk BW the logic of the IO scheduler is still valid.

> 
> Vanilla 2.6.30-rc4
> ==================
> 
> Test3
> =====
> Reader alone
> 234179072 bytes (234 MB) copied, 2.52195 s, 92.9 MB/s
> 
> Reader with 4 writers
> ---------------------
> First run
> 234179072 bytes (234 MB) copied, 4.39929 s, 53.2 MB/s
> 
> Second run
> 234179072 bytes (234 MB) copied, 4.55929 s, 51.4 MB/s
> 
> Third run
> 234179072 bytes (234 MB) copied, 4.79855 s, 48.8 MB/s
> 
> Fourth run
> 234179072 bytes (234 MB) copied, 4.5069 s, 52.0 MB/s
> 
> Notice, that without any writers we seem to be having BW of 92MB/s and
> more than 50% of that BW is still assigned to reader in presence of
> writers. Compare this with io-throttle cgroup of 64MB/s where reader
> struggles to get 10-15% of BW. 
> 
> So any 2nd level control will break the notion and assumptions of
> underlying IO scheduler. We should probably do control at IO scheduler
> level to make sure we don't run into such issues while getting
> hierarchical fair share for groups.
> 
> Thanks
> Vivek
> 

What are the results with your IO scheduler controller (if you already
have them, otherwise I'll repeat this test in my system)? It seems a
very interesting test to compare the advantages of the IO scheduler
solution respect to the io-throttle approach.

Thanks,
-Andrea

> > So now we are left with the issue of loosing the notion of priority and
> > class with-in cgroup. In fact on bigger systems we will probably run into > issues of kiothrottled scalability as single thread is trying to cater to
> > all the disks.
> > 
> > If we do max bw control at IO scheduler level, then I think we should be able
> > to control max bw while maintaining the notion of priority and class with-in
> > cgroup. Also there are multiple pdflush threads and jens seems to be pushing
> > flusher threads per bdi which will help us achieve greater scalability and
> > don't have to replicate that infrastructure for kiothrottled also.
> > 
> > Thanks
> > Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-07 15:36                         ` Vivek Goyal
@ 2009-05-07 15:42                             ` Vivek Goyal
  -1 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-07 15:42 UTC (permalink / raw)
  To: Andrea Righi
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Thu, May 07, 2009 at 11:36:42AM -0400, Vivek Goyal wrote:
> On Thu, May 07, 2009 at 10:45:01AM -0400, Vivek Goyal wrote:
> > On Thu, May 07, 2009 at 10:11:26AM -0400, Vivek Goyal wrote:
> > 
> > [..]
> > > [root@chilli io-throttle-tests]# ./andrea-test-script.sh 
> > > RT: 223+1 records in
> > > RT: 223+1 records out
> > > RT: 234179072 bytes (234 MB) copied, 0.988448 s, 237 MB/s
> > > BE: 223+1 records in
> > > BE: 223+1 records out
> > > BE: 234179072 bytes (234 MB) copied, 1.93885 s, 121 MB/s
> > > 
> > > So I am still seeing the issue with differnt kind of disks also. At this point
> > > of time I am really not sure why I am seeing such results.
> > 
> > Hold on. I think I found the culprit here. I was thinking that what is
> > the difference between two setups and realized that with vanilla kernels
> > I had done "make defconfig" and with io-throttle kernels I had used an
> > old config of my and did "make oldconfig". So basically config files
> > were differnt.
> > 
> > I now used the same config file and issues seems to have gone away. I
> > will look into why an old config file can force such kind of issues.
> > 
> 
> Hmm.., my old config had "AS" as default scheduler that's why I was seeing
> the strange issue of RT task finishing after BE. My apologies for that. I
> somehow assumed that CFQ is default scheduler in my config.
> 
> So I have re-run the test to see if we are still seeing the issue of
> loosing priority and class with-in cgroup. And we still do..
> 
> 2.6.30-rc4 with io-throttle patches
> ===================================
> Test1
> =====
> - Two readers, one BE prio 0 and other BE prio 7 in a cgroup limited with
>   8MB/s BW.
> 
> 234179072 bytes (234 MB) copied, 55.8448 s, 4.2 MB/s
> prio 0 task finished
> 234179072 bytes (234 MB) copied, 55.8878 s, 4.2 MB/s
> 
> Test2
> =====
> - Two readers, one RT prio 0 and other BE prio 7 in a cgroup limited with
>   8MB/s BW.
> 
> 234179072 bytes (234 MB) copied, 55.8876 s, 4.2 MB/s
> 234179072 bytes (234 MB) copied, 55.8984 s, 4.2 MB/s
> RT task finished
> 
> Test3
> =====
> - Reader Starvation
> - I created a cgroup with BW limit of 64MB/s. First I just run the reader
>   alone and then I run reader along with 4 writers 4 times. 
> 
> Reader alone
> 234179072 bytes (234 MB) copied, 3.71796 s, 63.0 MB/s
> 
> Reader with 4 writers
> ---------------------
> First run
> 234179072 bytes (234 MB) copied, 30.394 s, 7.7 MB/s 
> 
> Second run
> 234179072 bytes (234 MB) copied, 26.9607 s, 8.7 MB/s
> 
> Third run
> 234179072 bytes (234 MB) copied, 37.3515 s, 6.3 MB/s
> 
> Fourth run
> 234179072 bytes (234 MB) copied, 36.817 s, 6.4 MB/s
> 
> Note that out of 64MB/s limit of this cgroup, reader does not get even
> 1/5 of the BW. In normal systems, readers are advantaged and reader gets
> its job done much faster even in presence of multiple writers.   
> 
> Vanilla 2.6.30-rc4
> ==================
> 
> Test3
> =====
> Reader alone
> 234179072 bytes (234 MB) copied, 2.52195 s, 92.9 MB/s
> 
> Reader with 4 writers
> ---------------------
> First run
> 234179072 bytes (234 MB) copied, 4.39929 s, 53.2 MB/s
> 
> Second run
> 234179072 bytes (234 MB) copied, 4.55929 s, 51.4 MB/s
> 
> Third run
> 234179072 bytes (234 MB) copied, 4.79855 s, 48.8 MB/s
> 
> Fourth run
> 234179072 bytes (234 MB) copied, 4.5069 s, 52.0 MB/s
> 
> Notice, that without any writers we seem to be having BW of 92MB/s and
> more than 50% of that BW is still assigned to reader in presence of
> writers. Compare this with io-throttle cgroup of 64MB/s where reader
> struggles to get 10-15% of BW. 
> 
> So any 2nd level control will break the notion and assumptions of
> underlying IO scheduler. We should probably do control at IO scheduler
> level to make sure we don't run into such issues while getting
> hierarchical fair share for groups.
> 

Forgot to attached my reader-writer script last time. Here it is.


***************************************************************
#!/bin/bash

mount /dev/sdb1 /mnt/sdb

mount -t cgroup -o blockio blockio /cgroup/iot/
mkdir -p /cgroup/iot/test1 /cgroup/iot/test2

# Set bw limit of 64 MB/ps on sdb
echo "/dev/sdb:$((64 * 1024 * 1024)):0:0" > /cgroup/iot/test1/blockio.bandwidth-max

sync
echo 3 > /proc/sys/vm/drop_caches

echo $$ > /cgroup/iot/test1/tasks

ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile1 bs=4K count=524288 & 
echo $!

ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile2 bs=4K count=524288 & 
echo $!

ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile3 bs=4K count=524288 & 
echo $!

ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile4 bs=4K count=524288 & 
echo $!

sleep 5
echo "Launching reader"

ionice -c 2 -n 0 dd if=/mnt/sdb/zerofile2 of=/dev/zero &
pid2=$!
echo $pid2

wait $pid2
echo "Reader Finished"
killall dd
**********************************************************************

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
@ 2009-05-07 15:42                             ` Vivek Goyal
  0 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-07 15:42 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Andrew Morton, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, balbir, linux-kernel, containers,
	agk, dm-devel, snitzer, m-ikeda, peterz

On Thu, May 07, 2009 at 11:36:42AM -0400, Vivek Goyal wrote:
> On Thu, May 07, 2009 at 10:45:01AM -0400, Vivek Goyal wrote:
> > On Thu, May 07, 2009 at 10:11:26AM -0400, Vivek Goyal wrote:
> > 
> > [..]
> > > [root@chilli io-throttle-tests]# ./andrea-test-script.sh 
> > > RT: 223+1 records in
> > > RT: 223+1 records out
> > > RT: 234179072 bytes (234 MB) copied, 0.988448 s, 237 MB/s
> > > BE: 223+1 records in
> > > BE: 223+1 records out
> > > BE: 234179072 bytes (234 MB) copied, 1.93885 s, 121 MB/s
> > > 
> > > So I am still seeing the issue with differnt kind of disks also. At this point
> > > of time I am really not sure why I am seeing such results.
> > 
> > Hold on. I think I found the culprit here. I was thinking that what is
> > the difference between two setups and realized that with vanilla kernels
> > I had done "make defconfig" and with io-throttle kernels I had used an
> > old config of my and did "make oldconfig". So basically config files
> > were differnt.
> > 
> > I now used the same config file and issues seems to have gone away. I
> > will look into why an old config file can force such kind of issues.
> > 
> 
> Hmm.., my old config had "AS" as default scheduler that's why I was seeing
> the strange issue of RT task finishing after BE. My apologies for that. I
> somehow assumed that CFQ is default scheduler in my config.
> 
> So I have re-run the test to see if we are still seeing the issue of
> loosing priority and class with-in cgroup. And we still do..
> 
> 2.6.30-rc4 with io-throttle patches
> ===================================
> Test1
> =====
> - Two readers, one BE prio 0 and other BE prio 7 in a cgroup limited with
>   8MB/s BW.
> 
> 234179072 bytes (234 MB) copied, 55.8448 s, 4.2 MB/s
> prio 0 task finished
> 234179072 bytes (234 MB) copied, 55.8878 s, 4.2 MB/s
> 
> Test2
> =====
> - Two readers, one RT prio 0 and other BE prio 7 in a cgroup limited with
>   8MB/s BW.
> 
> 234179072 bytes (234 MB) copied, 55.8876 s, 4.2 MB/s
> 234179072 bytes (234 MB) copied, 55.8984 s, 4.2 MB/s
> RT task finished
> 
> Test3
> =====
> - Reader Starvation
> - I created a cgroup with BW limit of 64MB/s. First I just run the reader
>   alone and then I run reader along with 4 writers 4 times. 
> 
> Reader alone
> 234179072 bytes (234 MB) copied, 3.71796 s, 63.0 MB/s
> 
> Reader with 4 writers
> ---------------------
> First run
> 234179072 bytes (234 MB) copied, 30.394 s, 7.7 MB/s 
> 
> Second run
> 234179072 bytes (234 MB) copied, 26.9607 s, 8.7 MB/s
> 
> Third run
> 234179072 bytes (234 MB) copied, 37.3515 s, 6.3 MB/s
> 
> Fourth run
> 234179072 bytes (234 MB) copied, 36.817 s, 6.4 MB/s
> 
> Note that out of 64MB/s limit of this cgroup, reader does not get even
> 1/5 of the BW. In normal systems, readers are advantaged and reader gets
> its job done much faster even in presence of multiple writers.   
> 
> Vanilla 2.6.30-rc4
> ==================
> 
> Test3
> =====
> Reader alone
> 234179072 bytes (234 MB) copied, 2.52195 s, 92.9 MB/s
> 
> Reader with 4 writers
> ---------------------
> First run
> 234179072 bytes (234 MB) copied, 4.39929 s, 53.2 MB/s
> 
> Second run
> 234179072 bytes (234 MB) copied, 4.55929 s, 51.4 MB/s
> 
> Third run
> 234179072 bytes (234 MB) copied, 4.79855 s, 48.8 MB/s
> 
> Fourth run
> 234179072 bytes (234 MB) copied, 4.5069 s, 52.0 MB/s
> 
> Notice, that without any writers we seem to be having BW of 92MB/s and
> more than 50% of that BW is still assigned to reader in presence of
> writers. Compare this with io-throttle cgroup of 64MB/s where reader
> struggles to get 10-15% of BW. 
> 
> So any 2nd level control will break the notion and assumptions of
> underlying IO scheduler. We should probably do control at IO scheduler
> level to make sure we don't run into such issues while getting
> hierarchical fair share for groups.
> 

Forgot to attached my reader-writer script last time. Here it is.


***************************************************************
#!/bin/bash

mount /dev/sdb1 /mnt/sdb

mount -t cgroup -o blockio blockio /cgroup/iot/
mkdir -p /cgroup/iot/test1 /cgroup/iot/test2

# Set bw limit of 64 MB/ps on sdb
echo "/dev/sdb:$((64 * 1024 * 1024)):0:0" > /cgroup/iot/test1/blockio.bandwidth-max

sync
echo 3 > /proc/sys/vm/drop_caches

echo $$ > /cgroup/iot/test1/tasks

ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile1 bs=4K count=524288 & 
echo $!

ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile2 bs=4K count=524288 & 
echo $!

ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile3 bs=4K count=524288 & 
echo $!

ionice -c 2 -n 7 dd if=/dev/zero of=/mnt/sdb/testzerofile4 bs=4K count=524288 & 
echo $!

sleep 5
echo "Launching reader"

ionice -c 2 -n 0 dd if=/mnt/sdb/zerofile2 of=/dev/zero &
pid2=$!
echo $pid2

wait $pid2
echo "Reader Finished"
killall dd
**********************************************************************

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-07 14:45                     ` Vivek Goyal
@ 2009-05-07 15:36                         ` Vivek Goyal
  -1 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-07 15:36 UTC (permalink / raw)
  To: Andrea Righi
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Thu, May 07, 2009 at 10:45:01AM -0400, Vivek Goyal wrote:
> On Thu, May 07, 2009 at 10:11:26AM -0400, Vivek Goyal wrote:
> 
> [..]
> > [root@chilli io-throttle-tests]# ./andrea-test-script.sh 
> > RT: 223+1 records in
> > RT: 223+1 records out
> > RT: 234179072 bytes (234 MB) copied, 0.988448 s, 237 MB/s
> > BE: 223+1 records in
> > BE: 223+1 records out
> > BE: 234179072 bytes (234 MB) copied, 1.93885 s, 121 MB/s
> > 
> > So I am still seeing the issue with differnt kind of disks also. At this point
> > of time I am really not sure why I am seeing such results.
> 
> Hold on. I think I found the culprit here. I was thinking that what is
> the difference between two setups and realized that with vanilla kernels
> I had done "make defconfig" and with io-throttle kernels I had used an
> old config of my and did "make oldconfig". So basically config files
> were differnt.
> 
> I now used the same config file and issues seems to have gone away. I
> will look into why an old config file can force such kind of issues.
> 

Hmm.., my old config had "AS" as default scheduler that's why I was seeing
the strange issue of RT task finishing after BE. My apologies for that. I
somehow assumed that CFQ is default scheduler in my config.

So I have re-run the test to see if we are still seeing the issue of
loosing priority and class with-in cgroup. And we still do..

2.6.30-rc4 with io-throttle patches
===================================
Test1
=====
- Two readers, one BE prio 0 and other BE prio 7 in a cgroup limited with
  8MB/s BW.

234179072 bytes (234 MB) copied, 55.8448 s, 4.2 MB/s
prio 0 task finished
234179072 bytes (234 MB) copied, 55.8878 s, 4.2 MB/s

Test2
=====
- Two readers, one RT prio 0 and other BE prio 7 in a cgroup limited with
  8MB/s BW.

234179072 bytes (234 MB) copied, 55.8876 s, 4.2 MB/s
234179072 bytes (234 MB) copied, 55.8984 s, 4.2 MB/s
RT task finished

Test3
=====
- Reader Starvation
- I created a cgroup with BW limit of 64MB/s. First I just run the reader
  alone and then I run reader along with 4 writers 4 times. 

Reader alone
234179072 bytes (234 MB) copied, 3.71796 s, 63.0 MB/s

Reader with 4 writers
---------------------
First run
234179072 bytes (234 MB) copied, 30.394 s, 7.7 MB/s 

Second run
234179072 bytes (234 MB) copied, 26.9607 s, 8.7 MB/s

Third run
234179072 bytes (234 MB) copied, 37.3515 s, 6.3 MB/s

Fourth run
234179072 bytes (234 MB) copied, 36.817 s, 6.4 MB/s

Note that out of 64MB/s limit of this cgroup, reader does not get even
1/5 of the BW. In normal systems, readers are advantaged and reader gets
its job done much faster even in presence of multiple writers.   

Vanilla 2.6.30-rc4
==================

Test3
=====
Reader alone
234179072 bytes (234 MB) copied, 2.52195 s, 92.9 MB/s

Reader with 4 writers
---------------------
First run
234179072 bytes (234 MB) copied, 4.39929 s, 53.2 MB/s

Second run
234179072 bytes (234 MB) copied, 4.55929 s, 51.4 MB/s

Third run
234179072 bytes (234 MB) copied, 4.79855 s, 48.8 MB/s

Fourth run
234179072 bytes (234 MB) copied, 4.5069 s, 52.0 MB/s

Notice, that without any writers we seem to be having BW of 92MB/s and
more than 50% of that BW is still assigned to reader in presence of
writers. Compare this with io-throttle cgroup of 64MB/s where reader
struggles to get 10-15% of BW. 

So any 2nd level control will break the notion and assumptions of
underlying IO scheduler. We should probably do control at IO scheduler
level to make sure we don't run into such issues while getting
hierarchical fair share for groups.

Thanks
Vivek

> So now we are left with the issue of loosing the notion of priority and
> class with-in cgroup. In fact on bigger systems we will probably run into > issues of kiothrottled scalability as single thread is trying to cater to
> all the disks.
> 
> If we do max bw control at IO scheduler level, then I think we should be able
> to control max bw while maintaining the notion of priority and class with-in
> cgroup. Also there are multiple pdflush threads and jens seems to be pushing
> flusher threads per bdi which will help us achieve greater scalability and
> don't have to replicate that infrastructure for kiothrottled also.
> 
> Thanks
> Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
@ 2009-05-07 15:36                         ` Vivek Goyal
  0 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-07 15:36 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Andrew Morton, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, balbir, linux-kernel, containers,
	agk, dm-devel, snitzer, m-ikeda, peterz

On Thu, May 07, 2009 at 10:45:01AM -0400, Vivek Goyal wrote:
> On Thu, May 07, 2009 at 10:11:26AM -0400, Vivek Goyal wrote:
> 
> [..]
> > [root@chilli io-throttle-tests]# ./andrea-test-script.sh 
> > RT: 223+1 records in
> > RT: 223+1 records out
> > RT: 234179072 bytes (234 MB) copied, 0.988448 s, 237 MB/s
> > BE: 223+1 records in
> > BE: 223+1 records out
> > BE: 234179072 bytes (234 MB) copied, 1.93885 s, 121 MB/s
> > 
> > So I am still seeing the issue with differnt kind of disks also. At this point
> > of time I am really not sure why I am seeing such results.
> 
> Hold on. I think I found the culprit here. I was thinking that what is
> the difference between two setups and realized that with vanilla kernels
> I had done "make defconfig" and with io-throttle kernels I had used an
> old config of my and did "make oldconfig". So basically config files
> were differnt.
> 
> I now used the same config file and issues seems to have gone away. I
> will look into why an old config file can force such kind of issues.
> 

Hmm.., my old config had "AS" as default scheduler that's why I was seeing
the strange issue of RT task finishing after BE. My apologies for that. I
somehow assumed that CFQ is default scheduler in my config.

So I have re-run the test to see if we are still seeing the issue of
loosing priority and class with-in cgroup. And we still do..

2.6.30-rc4 with io-throttle patches
===================================
Test1
=====
- Two readers, one BE prio 0 and other BE prio 7 in a cgroup limited with
  8MB/s BW.

234179072 bytes (234 MB) copied, 55.8448 s, 4.2 MB/s
prio 0 task finished
234179072 bytes (234 MB) copied, 55.8878 s, 4.2 MB/s

Test2
=====
- Two readers, one RT prio 0 and other BE prio 7 in a cgroup limited with
  8MB/s BW.

234179072 bytes (234 MB) copied, 55.8876 s, 4.2 MB/s
234179072 bytes (234 MB) copied, 55.8984 s, 4.2 MB/s
RT task finished

Test3
=====
- Reader Starvation
- I created a cgroup with BW limit of 64MB/s. First I just run the reader
  alone and then I run reader along with 4 writers 4 times. 

Reader alone
234179072 bytes (234 MB) copied, 3.71796 s, 63.0 MB/s

Reader with 4 writers
---------------------
First run
234179072 bytes (234 MB) copied, 30.394 s, 7.7 MB/s 

Second run
234179072 bytes (234 MB) copied, 26.9607 s, 8.7 MB/s

Third run
234179072 bytes (234 MB) copied, 37.3515 s, 6.3 MB/s

Fourth run
234179072 bytes (234 MB) copied, 36.817 s, 6.4 MB/s

Note that out of 64MB/s limit of this cgroup, reader does not get even
1/5 of the BW. In normal systems, readers are advantaged and reader gets
its job done much faster even in presence of multiple writers.   

Vanilla 2.6.30-rc4
==================

Test3
=====
Reader alone
234179072 bytes (234 MB) copied, 2.52195 s, 92.9 MB/s

Reader with 4 writers
---------------------
First run
234179072 bytes (234 MB) copied, 4.39929 s, 53.2 MB/s

Second run
234179072 bytes (234 MB) copied, 4.55929 s, 51.4 MB/s

Third run
234179072 bytes (234 MB) copied, 4.79855 s, 48.8 MB/s

Fourth run
234179072 bytes (234 MB) copied, 4.5069 s, 52.0 MB/s

Notice, that without any writers we seem to be having BW of 92MB/s and
more than 50% of that BW is still assigned to reader in presence of
writers. Compare this with io-throttle cgroup of 64MB/s where reader
struggles to get 10-15% of BW. 

So any 2nd level control will break the notion and assumptions of
underlying IO scheduler. We should probably do control at IO scheduler
level to make sure we don't run into such issues while getting
hierarchical fair share for groups.

Thanks
Vivek

> So now we are left with the issue of loosing the notion of priority and
> class with-in cgroup. In fact on bigger systems we will probably run into > issues of kiothrottled scalability as single thread is trying to cater to
> all the disks.
> 
> If we do max bw control at IO scheduler level, then I think we should be able
> to control max bw while maintaining the notion of priority and class with-in
> cgroup. Also there are multiple pdflush threads and jens seems to be pushing
> flusher threads per bdi which will help us achieve greater scalability and
> don't have to replicate that infrastructure for kiothrottled also.
> 
> Thanks
> Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-07 14:11               ` Vivek Goyal
@ 2009-05-07 14:45                     ` Vivek Goyal
  0 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-07 14:45 UTC (permalink / raw)
  To: Andrea Righi
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Thu, May 07, 2009 at 10:11:26AM -0400, Vivek Goyal wrote:

[..]
> [root@chilli io-throttle-tests]# ./andrea-test-script.sh 
> RT: 223+1 records in
> RT: 223+1 records out
> RT: 234179072 bytes (234 MB) copied, 0.988448 s, 237 MB/s
> BE: 223+1 records in
> BE: 223+1 records out
> BE: 234179072 bytes (234 MB) copied, 1.93885 s, 121 MB/s
> 
> So I am still seeing the issue with differnt kind of disks also. At this point
> of time I am really not sure why I am seeing such results.

Hold on. I think I found the culprit here. I was thinking that what is
the difference between two setups and realized that with vanilla kernels
I had done "make defconfig" and with io-throttle kernels I had used an
old config of my and did "make oldconfig". So basically config files
were differnt.

I now used the same config file and issues seems to have gone away. I
will look into why an old config file can force such kind of issues.

So now we are left with the issue of loosing the notion of priority and
class with-in cgroup. In fact on bigger systems we will probably run into
issues of kiothrottled scalability as single thread is trying to cater to
all the disks.

If we do max bw control at IO scheduler level, then I think we should be able
to control max bw while maintaining the notion of priority and class with-in
cgroup. Also there are multiple pdflush threads and jens seems to be pushing
flusher threads per bdi which will help us achieve greater scalability and
don't have to replicate that infrastructure for kiothrottled also.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
@ 2009-05-07 14:45                     ` Vivek Goyal
  0 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-07 14:45 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Andrew Morton, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, balbir, linux-kernel, containers,
	agk, dm-devel, snitzer, m-ikeda, peterz

On Thu, May 07, 2009 at 10:11:26AM -0400, Vivek Goyal wrote:

[..]
> [root@chilli io-throttle-tests]# ./andrea-test-script.sh 
> RT: 223+1 records in
> RT: 223+1 records out
> RT: 234179072 bytes (234 MB) copied, 0.988448 s, 237 MB/s
> BE: 223+1 records in
> BE: 223+1 records out
> BE: 234179072 bytes (234 MB) copied, 1.93885 s, 121 MB/s
> 
> So I am still seeing the issue with differnt kind of disks also. At this point
> of time I am really not sure why I am seeing such results.

Hold on. I think I found the culprit here. I was thinking that what is
the difference between two setups and realized that with vanilla kernels
I had done "make defconfig" and with io-throttle kernels I had used an
old config of my and did "make oldconfig". So basically config files
were differnt.

I now used the same config file and issues seems to have gone away. I
will look into why an old config file can force such kind of issues.

So now we are left with the issue of loosing the notion of priority and
class with-in cgroup. In fact on bigger systems we will probably run into
issues of kiothrottled scalability as single thread is trying to cater to
all the disks.

If we do max bw control at IO scheduler level, then I think we should be able
to control max bw while maintaining the notion of priority and class with-in
cgroup. Also there are multiple pdflush threads and jens seems to be pushing
flusher threads per bdi which will help us achieve greater scalability and
don't have to replicate that infrastructure for kiothrottled also.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-07  9:04             ` Andrea Righi
  2009-05-07 12:22               ` Andrea Righi
  2009-05-07 12:22               ` Andrea Righi
@ 2009-05-07 14:11               ` Vivek Goyal
  2009-05-07 14:11               ` Vivek Goyal
  3 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-07 14:11 UTC (permalink / raw)
  To: Andrea Righi
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Thu, May 07, 2009 at 11:04:50AM +0200, Andrea Righi wrote:
> On Wed, May 06, 2009 at 05:52:35PM -0400, Vivek Goyal wrote:
> > > > Without io-throttle patches
> > > > ---------------------------
> > > > - Two readers, first BE prio 7, second BE prio 0
> > > > 
> > > > 234179072 bytes (234 MB) copied, 4.12074 s, 56.8 MB/s
> > > > High prio reader finished
> > > > 234179072 bytes (234 MB) copied, 5.36023 s, 43.7 MB/s
> > > > 
> > > > Note: There is no service differentiation between prio 0 and prio 7 task
> > > >       with io-throttle patches.
> > > > 
> > > > Test 3
> > > > ======
> > > > - Run the one RT reader and one BE reader in root cgroup without any
> > > >   limitations. I guess this should mean unlimited BW and behavior should
> > > >   be same as with CFQ without io-throttling patches.
> > > > 
> > > > With io-throttle patches
> > > > =========================
> > > > Ran the test 4 times because I was getting different results in different
> > > > runs.
> > > > 
> > > > - Two readers, one RT prio 0  other BE prio 7
> > > > 
> > > > 234179072 bytes (234 MB) copied, 2.74604 s, 85.3 MB/s
> > > > 234179072 bytes (234 MB) copied, 5.20995 s, 44.9 MB/s
> > > > RT task finished
> > > > 
> > > > 234179072 bytes (234 MB) copied, 4.54417 s, 51.5 MB/s
> > > > RT task finished
> > > > 234179072 bytes (234 MB) copied, 5.23396 s, 44.7 MB/s
> > > > 
> > > > 234179072 bytes (234 MB) copied, 5.17727 s, 45.2 MB/s
> > > > RT task finished
> > > > 234179072 bytes (234 MB) copied, 5.25894 s, 44.5 MB/s
> > > > 
> > > > 234179072 bytes (234 MB) copied, 2.74141 s, 85.4 MB/s
> > > > 234179072 bytes (234 MB) copied, 5.20536 s, 45.0 MB/s
> > > > RT task finished
> > > > 
> > > > Note: Out of 4 runs, looks like twice it is complete priority inversion
> > > >       and RT task finished after BE task. Rest of the two times, the
> > > >       difference between BW of RT and BE task is much less as compared to
> > > >       without patches. In fact once it was almost same.
> > > 
> > > This is strange. If you don't set any limit there shouldn't be any
> > > difference respect to the other case (without io-throttle patches).
> > > 
> > > At worst a small overhead given by the task_to_iothrottle(), under
> > > rcu_read_lock(). I'll repeat this test ASAP and see if I'll be able to
> > > reproduce this strange behaviour.
> > 
> > Ya, I also found this strange. At least in root group there should not be
> > any behavior change (at max one might expect little drop in throughput
> > because of extra code).
> 
> Hi Vivek,
> 
> I'm not able to reproduce the strange behaviour above.
> 
> Which commands are you running exactly? is the system isolated (stupid
> question) no cron or background tasks doing IO during the tests?
> 
> Following the script I've used:
> 
> $ cat test.sh
> #!/bin/sh
> echo 3 > /proc/sys/vm/drop_caches
> ionice -c 1 -n 0 dd if=bigfile1 of=/dev/null bs=1M 2>&1 | sed "s/\(.*\)/RT: \1/" &
> cat /proc/$!/cgroup | sed "s/\(.*\)/RT: \1/"
> ionice -c 2 -n 7 dd if=bigfile2 of=/dev/null bs=1M 2>&1 | sed "s/\(.*\)/BE: \1/" &
> cat /proc/$!/cgroup | sed "s/\(.*\)/BE: \1/"
> for i in 1 2; do
> 	wait
> done
> 
> And the results on my PC:
> 

[..]

> The difference seems to be just the expected overhead.

Hm..., something is really amiss here. I took your scripts and ran on
my system and I still see the issue. There is nothing else running on the
system and it is isolated.

2.6.30-rc4 + io-throttle patches V16
===================================
It is freshly booted system with nothing extra running on it. This is a
4 core system.

Disk1
=====
This is a fast disk which supports queue depth of 31.

Following is the output picked from dmesg for my device properties.
[    3.016099] sd 2:0:0:0: [sdb] 488397168 512-byte hardware sectors: (250
GB/232 GiB)
[    3.016188] sd 2:0:0:0: Attached scsi generic sg2 type 0

Following are the results of 4 runs of your script. (Just changed the 
script to read right file on my system if=/mnt/sdb/zerofile1).

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 4.38435 s, 53.4 MB/s
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 5.20706 s, 45.0 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 5.12953 s, 45.7 MB/s
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 5.23573 s, 44.7 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 3.54644 s, 66.0 MB/s
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 5.19406 s, 45.1 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 5.21908 s, 44.9 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 5.23802 s, 44.7 MB/s

Disk2
=====
This is a relatively slower disk with no command queuing.

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 7.06471 s, 33.1 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 8.01571 s, 29.2 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 7.89043 s, 29.7 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 8.03428 s, 29.1 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 7.38942 s, 31.7 MB/s
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 8.01146 s, 29.2 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 7.78351 s, 30.1 MB/s
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 8.06292 s, 29.0 MB/s

Disk3
=====
This is an Intel SSD.

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 0.993735 s, 236 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 1.98772 s, 118 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 1.8616 s, 126 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 1.98499 s, 118 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 1.01174 s, 231 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 1.99143 s, 118 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 1.96132 s, 119 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 1.97746 s, 118 MB/s

Results without io-throttle patches (vanilla 2.6.30-rc4)
========================================================

Disk 1
======
This is relatively faster SATA drive with command queuing enabled.

RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 2.84065 s, 82.4 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 5.30087 s, 44.2 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 2.69688 s, 86.8 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 5.18175 s, 45.2 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 2.73279 s, 85.7 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 5.21803 s, 44.9 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 2.69304 s, 87.0 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 5.17821 s, 45.2 MB/s

Disk 2
======
Slower disk with no command queuing.

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 4.29453 s, 54.5 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 8.04978 s, 29.1 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 3.96924 s, 59.0 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 7.74984 s, 30.2 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 4.11254 s, 56.9 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 7.8678 s, 29.8 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 3.95979 s, 59.1 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 7.73976 s, 30.3 MB/s

Disk3
=====
Intel SSD

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 0.996762 s, 235 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 1.93268 s, 121 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 0.98511 s, 238 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 1.92481 s, 122 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 0.986981 s, 237 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 1.9312 s, 121 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 0.988448 s, 237 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 1.93885 s, 121 MB/s

So I am still seeing the issue with differnt kind of disks also. At this point
of time I am really not sure why I am seeing such results.

I have following patches applied on 30-rc4 (V16).

3954-vivek.goyal2008-res_counter-introduce-ratelimiting-attributes.patch
3955-vivek.goyal2008-page_cgroup-provide-a-generic-page-tracking-infrastructure.patch
3956-vivek.goyal2008-io-throttle-controller-infrastructure.patch
3957-vivek.goyal2008-kiothrottled-throttle-buffered-io.patch
3958-vivek.goyal2008-io-throttle-instrumentation.patch
3959-vivek.goyal2008-io-throttle-export-per-task-statistics-to-userspace.patch

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-07  9:04             ` Andrea Righi
                                 ` (2 preceding siblings ...)
  2009-05-07 14:11               ` Vivek Goyal
@ 2009-05-07 14:11               ` Vivek Goyal
       [not found]                 ` <20090507141126.GA9463-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  3 siblings, 1 reply; 97+ messages in thread
From: Vivek Goyal @ 2009-05-07 14:11 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Andrew Morton, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, balbir, linux-kernel, containers,
	agk, dm-devel, snitzer, m-ikeda, peterz

On Thu, May 07, 2009 at 11:04:50AM +0200, Andrea Righi wrote:
> On Wed, May 06, 2009 at 05:52:35PM -0400, Vivek Goyal wrote:
> > > > Without io-throttle patches
> > > > ---------------------------
> > > > - Two readers, first BE prio 7, second BE prio 0
> > > > 
> > > > 234179072 bytes (234 MB) copied, 4.12074 s, 56.8 MB/s
> > > > High prio reader finished
> > > > 234179072 bytes (234 MB) copied, 5.36023 s, 43.7 MB/s
> > > > 
> > > > Note: There is no service differentiation between prio 0 and prio 7 task
> > > >       with io-throttle patches.
> > > > 
> > > > Test 3
> > > > ======
> > > > - Run the one RT reader and one BE reader in root cgroup without any
> > > >   limitations. I guess this should mean unlimited BW and behavior should
> > > >   be same as with CFQ without io-throttling patches.
> > > > 
> > > > With io-throttle patches
> > > > =========================
> > > > Ran the test 4 times because I was getting different results in different
> > > > runs.
> > > > 
> > > > - Two readers, one RT prio 0  other BE prio 7
> > > > 
> > > > 234179072 bytes (234 MB) copied, 2.74604 s, 85.3 MB/s
> > > > 234179072 bytes (234 MB) copied, 5.20995 s, 44.9 MB/s
> > > > RT task finished
> > > > 
> > > > 234179072 bytes (234 MB) copied, 4.54417 s, 51.5 MB/s
> > > > RT task finished
> > > > 234179072 bytes (234 MB) copied, 5.23396 s, 44.7 MB/s
> > > > 
> > > > 234179072 bytes (234 MB) copied, 5.17727 s, 45.2 MB/s
> > > > RT task finished
> > > > 234179072 bytes (234 MB) copied, 5.25894 s, 44.5 MB/s
> > > > 
> > > > 234179072 bytes (234 MB) copied, 2.74141 s, 85.4 MB/s
> > > > 234179072 bytes (234 MB) copied, 5.20536 s, 45.0 MB/s
> > > > RT task finished
> > > > 
> > > > Note: Out of 4 runs, looks like twice it is complete priority inversion
> > > >       and RT task finished after BE task. Rest of the two times, the
> > > >       difference between BW of RT and BE task is much less as compared to
> > > >       without patches. In fact once it was almost same.
> > > 
> > > This is strange. If you don't set any limit there shouldn't be any
> > > difference respect to the other case (without io-throttle patches).
> > > 
> > > At worst a small overhead given by the task_to_iothrottle(), under
> > > rcu_read_lock(). I'll repeat this test ASAP and see if I'll be able to
> > > reproduce this strange behaviour.
> > 
> > Ya, I also found this strange. At least in root group there should not be
> > any behavior change (at max one might expect little drop in throughput
> > because of extra code).
> 
> Hi Vivek,
> 
> I'm not able to reproduce the strange behaviour above.
> 
> Which commands are you running exactly? is the system isolated (stupid
> question) no cron or background tasks doing IO during the tests?
> 
> Following the script I've used:
> 
> $ cat test.sh
> #!/bin/sh
> echo 3 > /proc/sys/vm/drop_caches
> ionice -c 1 -n 0 dd if=bigfile1 of=/dev/null bs=1M 2>&1 | sed "s/\(.*\)/RT: \1/" &
> cat /proc/$!/cgroup | sed "s/\(.*\)/RT: \1/"
> ionice -c 2 -n 7 dd if=bigfile2 of=/dev/null bs=1M 2>&1 | sed "s/\(.*\)/BE: \1/" &
> cat /proc/$!/cgroup | sed "s/\(.*\)/BE: \1/"
> for i in 1 2; do
> 	wait
> done
> 
> And the results on my PC:
> 

[..]

> The difference seems to be just the expected overhead.

Hm..., something is really amiss here. I took your scripts and ran on
my system and I still see the issue. There is nothing else running on the
system and it is isolated.

2.6.30-rc4 + io-throttle patches V16
===================================
It is freshly booted system with nothing extra running on it. This is a
4 core system.

Disk1
=====
This is a fast disk which supports queue depth of 31.

Following is the output picked from dmesg for my device properties.
[    3.016099] sd 2:0:0:0: [sdb] 488397168 512-byte hardware sectors: (250
GB/232 GiB)
[    3.016188] sd 2:0:0:0: Attached scsi generic sg2 type 0

Following are the results of 4 runs of your script. (Just changed the 
script to read right file on my system if=/mnt/sdb/zerofile1).

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 4.38435 s, 53.4 MB/s
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 5.20706 s, 45.0 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 5.12953 s, 45.7 MB/s
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 5.23573 s, 44.7 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 3.54644 s, 66.0 MB/s
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 5.19406 s, 45.1 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 5.21908 s, 44.9 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 5.23802 s, 44.7 MB/s

Disk2
=====
This is a relatively slower disk with no command queuing.

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 7.06471 s, 33.1 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 8.01571 s, 29.2 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 7.89043 s, 29.7 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 8.03428 s, 29.1 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 7.38942 s, 31.7 MB/s
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 8.01146 s, 29.2 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 7.78351 s, 30.1 MB/s
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 8.06292 s, 29.0 MB/s

Disk3
=====
This is an Intel SSD.

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 0.993735 s, 236 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 1.98772 s, 118 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 1.8616 s, 126 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 1.98499 s, 118 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 1.01174 s, 231 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 1.99143 s, 118 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 1.96132 s, 119 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 1.97746 s, 118 MB/s

Results without io-throttle patches (vanilla 2.6.30-rc4)
========================================================

Disk 1
======
This is relatively faster SATA drive with command queuing enabled.

RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 2.84065 s, 82.4 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 5.30087 s, 44.2 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 2.69688 s, 86.8 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 5.18175 s, 45.2 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 2.73279 s, 85.7 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 5.21803 s, 44.9 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 2.69304 s, 87.0 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 5.17821 s, 45.2 MB/s

Disk 2
======
Slower disk with no command queuing.

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 4.29453 s, 54.5 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 8.04978 s, 29.1 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 3.96924 s, 59.0 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 7.74984 s, 30.2 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 4.11254 s, 56.9 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 7.8678 s, 29.8 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 3.95979 s, 59.1 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 7.73976 s, 30.3 MB/s

Disk3
=====
Intel SSD

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 0.996762 s, 235 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 1.93268 s, 121 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 0.98511 s, 238 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 1.92481 s, 122 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 0.986981 s, 237 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 1.9312 s, 121 MB/s

[root@chilli io-throttle-tests]# ./andrea-test-script.sh 
RT: 223+1 records in
RT: 223+1 records out
RT: 234179072 bytes (234 MB) copied, 0.988448 s, 237 MB/s
BE: 223+1 records in
BE: 223+1 records out
BE: 234179072 bytes (234 MB) copied, 1.93885 s, 121 MB/s

So I am still seeing the issue with differnt kind of disks also. At this point
of time I am really not sure why I am seeing such results.

I have following patches applied on 30-rc4 (V16).

3954-vivek.goyal2008-res_counter-introduce-ratelimiting-attributes.patch
3955-vivek.goyal2008-page_cgroup-provide-a-generic-page-tracking-infrastructure.patch
3956-vivek.goyal2008-io-throttle-controller-infrastructure.patch
3957-vivek.goyal2008-kiothrottled-throttle-buffered-io.patch
3958-vivek.goyal2008-io-throttle-instrumentation.patch
3959-vivek.goyal2008-io-throttle-export-per-task-statistics-to-userspace.patch

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-07  9:04             ` Andrea Righi
  2009-05-07 12:22               ` Andrea Righi
@ 2009-05-07 12:22               ` Andrea Righi
  2009-05-07 14:11               ` Vivek Goyal
  2009-05-07 14:11               ` Vivek Goyal
  3 siblings, 0 replies; 97+ messages in thread
From: Andrea Righi @ 2009-05-07 12:22 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Thu, May 07, 2009 at 11:04:50AM +0200, Andrea Righi wrote:
> On Wed, May 06, 2009 at 05:52:35PM -0400, Vivek Goyal wrote:
> > > > Without io-throttle patches
> > > > ---------------------------
> > > > - Two readers, first BE prio 7, second BE prio 0
> > > > 
> > > > 234179072 bytes (234 MB) copied, 4.12074 s, 56.8 MB/s
> > > > High prio reader finished
> > > > 234179072 bytes (234 MB) copied, 5.36023 s, 43.7 MB/s
> > > > 
> > > > Note: There is no service differentiation between prio 0 and prio 7 task
> > > >       with io-throttle patches.
> > > > 
> > > > Test 3
> > > > ======
> > > > - Run the one RT reader and one BE reader in root cgroup without any
> > > >   limitations. I guess this should mean unlimited BW and behavior should
> > > >   be same as with CFQ without io-throttling patches.
> > > > 
> > > > With io-throttle patches
> > > > =========================
> > > > Ran the test 4 times because I was getting different results in different
> > > > runs.
> > > > 
> > > > - Two readers, one RT prio 0  other BE prio 7
> > > > 
> > > > 234179072 bytes (234 MB) copied, 2.74604 s, 85.3 MB/s
> > > > 234179072 bytes (234 MB) copied, 5.20995 s, 44.9 MB/s
> > > > RT task finished
> > > > 
> > > > 234179072 bytes (234 MB) copied, 4.54417 s, 51.5 MB/s
> > > > RT task finished
> > > > 234179072 bytes (234 MB) copied, 5.23396 s, 44.7 MB/s
> > > > 
> > > > 234179072 bytes (234 MB) copied, 5.17727 s, 45.2 MB/s
> > > > RT task finished
> > > > 234179072 bytes (234 MB) copied, 5.25894 s, 44.5 MB/s
> > > > 
> > > > 234179072 bytes (234 MB) copied, 2.74141 s, 85.4 MB/s
> > > > 234179072 bytes (234 MB) copied, 5.20536 s, 45.0 MB/s
> > > > RT task finished
> > > > 
> > > > Note: Out of 4 runs, looks like twice it is complete priority inversion
> > > >       and RT task finished after BE task. Rest of the two times, the
> > > >       difference between BW of RT and BE task is much less as compared to
> > > >       without patches. In fact once it was almost same.
> > > 
> > > This is strange. If you don't set any limit there shouldn't be any
> > > difference respect to the other case (without io-throttle patches).
> > > 
> > > At worst a small overhead given by the task_to_iothrottle(), under
> > > rcu_read_lock(). I'll repeat this test ASAP and see if I'll be able to
> > > reproduce this strange behaviour.
> > 
> > Ya, I also found this strange. At least in root group there should not be
> > any behavior change (at max one might expect little drop in throughput
> > because of extra code).
> 
> Hi Vivek,
> 
> I'm not able to reproduce the strange behaviour above.
> 
> Which commands are you running exactly? is the system isolated (stupid
> question) no cron or background tasks doing IO during the tests?
> 
> Following the script I've used:
> 
> $ cat test.sh
> #!/bin/sh
> echo 3 > /proc/sys/vm/drop_caches
> ionice -c 1 -n 0 dd if=bigfile1 of=/dev/null bs=1M 2>&1 | sed "s/\(.*\)/RT: \1/" &
> cat /proc/$!/cgroup | sed "s/\(.*\)/RT: \1/"
> ionice -c 2 -n 7 dd if=bigfile2 of=/dev/null bs=1M 2>&1 | sed "s/\(.*\)/BE: \1/" &
> cat /proc/$!/cgroup | sed "s/\(.*\)/BE: \1/"
> for i in 1 2; do
> 	wait
> done
> 
> And the results on my PC:
> 
> 2.6.30-rc4
> ~~~~~~~~~~
> $ sudo sh test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 21.3406 s, 11.5 MB/s
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 11.989 s, 20.5 MB/s
> $ sudo sh test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 23.4436 s, 10.5 MB/s
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 11.9555 s, 20.5 MB/s
> $ sudo sh test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 21.622 s, 11.3 MB/s
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 11.9856 s, 20.5 MB/s
> $ sudo sh test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 21.5664 s, 11.4 MB/s
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 11.8522 s, 20.7 MB/s
> 
> 2.6.30-rc4 + io-throttle, no BW limit, both tasks in the root cgroup
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> $ sudo sh ./test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 23.6739 s, 10.4 MB/s
> BE: cgroup 4:blockio:/
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 12.2853 s, 20.0 MB/s
> RT: 4:blockio:/
> $ sudo sh ./test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 23.7483 s, 10.3 MB/s
> BE: cgroup 4:blockio:/
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 12.3597 s, 19.9 MB/s
> RT: 4:blockio:/
> $ sudo sh ./test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 23.6843 s, 10.4 MB/s
> BE: cgroup 4:blockio:/
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 12.4886 s, 19.6 MB/s
> RT: 4:blockio:/
> $ sudo sh ./test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 23.8621 s, 10.3 MB/s
> BE: cgroup 4:blockio:/
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 12.6737 s, 19.4 MB/s
> RT: 4:blockio:/
> 
> The difference seems to be just the expected overhead.

BTW, it is possible to reduce the io-throttle overhead even more for non
io-throttle users (also when CONFIG_CGROUP_IO_THROTTLE is enabled) using
the trick below.

2.6.30-rc4 + io-throttle + following patch, no BW limit, tasks in root cgroup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ sudo sh test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 17.462 s, 14.1 MB/s
BE: 4:blockio:/
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 11.7865 s, 20.8 MB/s
RT: 4:blockio:/
$ sudo sh test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 18.8375 s, 13.0 MB/s
BE: 4:blockio:/
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 11.9148 s, 20.6 MB/s
RT: 4:blockio:/
$ sudo sh test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 19.6826 s, 12.5 MB/s
BE: 4:blockio:/
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 11.8715 s, 20.7 MB/s
RT: 4:blockio:/
$ sudo sh test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 18.9152 s, 13.0 MB/s
BE: 4:blockio:/
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 11.8925 s, 20.6 MB/s
RT: 4:blockio:/

[ To be applied on top of io-throttle v16 ]

Signed-off-by: Andrea Righi <righi.andrea-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 block/blk-io-throttle.c |   16 ++++++++++++++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/block/blk-io-throttle.c b/block/blk-io-throttle.c
index e2dfd24..8b45c71 100644
--- a/block/blk-io-throttle.c
+++ b/block/blk-io-throttle.c
@@ -131,6 +131,14 @@ struct iothrottle_node {
 	struct iothrottle_stat stat;
 };
 
+/*
+ * This is a trick to reduce the unneded overhead when io-throttle is not used
+ * at all. We use a counter of the io-throttle rules; if the counter is zero,
+ * we immediately return from the io-throttle hooks, without accounting IO and
+ * without checking if we need to apply some limiting rules.
+ */
+static atomic_t iothrottle_node_count __read_mostly;
+
 /**
  * struct iothrottle - throttling rules for a cgroup
  * @css: pointer to the cgroup state
@@ -193,6 +201,7 @@ static void iothrottle_insert_node(struct iothrottle *iot,
 {
 	WARN_ON_ONCE(!cgroup_is_locked());
 	list_add_rcu(&n->node, &iot->list);
+	atomic_inc(&iothrottle_node_count);
 }
 
 /*
@@ -214,6 +223,7 @@ iothrottle_delete_node(struct iothrottle *iot, struct iothrottle_node *n)
 {
 	WARN_ON_ONCE(!cgroup_is_locked());
 	list_del_rcu(&n->node);
+	atomic_dec(&iothrottle_node_count);
 }
 
 /*
@@ -250,8 +260,10 @@ static void iothrottle_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp)
 	 * reference to the list.
 	 */
 	if (!list_empty(&iot->list))
-		list_for_each_entry_safe(n, p, &iot->list, node)
+		list_for_each_entry_safe(n, p, &iot->list, node) {
 			kfree(n);
+			atomic_dec(&iothrottle_node_count);
+		}
 	kfree(iot);
 }
 
@@ -836,7 +848,7 @@ cgroup_io_throttle(struct bio *bio, struct block_device *bdev, ssize_t bytes)
 	unsigned long long sleep;
 	int type, can_sleep = 1;
 
-	if (iothrottle_disabled())
+	if (iothrottle_disabled() || !atomic_read(&iothrottle_node_count))
 		return 0;
 	if (unlikely(!bdev))
 		return 0;

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-07  9:04             ` Andrea Righi
@ 2009-05-07 12:22               ` Andrea Righi
  2009-05-07 12:22               ` Andrea Righi
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 97+ messages in thread
From: Andrea Righi @ 2009-05-07 12:22 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Andrew Morton, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, balbir, linux-kernel, containers,
	agk, dm-devel, snitzer, m-ikeda, peterz

On Thu, May 07, 2009 at 11:04:50AM +0200, Andrea Righi wrote:
> On Wed, May 06, 2009 at 05:52:35PM -0400, Vivek Goyal wrote:
> > > > Without io-throttle patches
> > > > ---------------------------
> > > > - Two readers, first BE prio 7, second BE prio 0
> > > > 
> > > > 234179072 bytes (234 MB) copied, 4.12074 s, 56.8 MB/s
> > > > High prio reader finished
> > > > 234179072 bytes (234 MB) copied, 5.36023 s, 43.7 MB/s
> > > > 
> > > > Note: There is no service differentiation between prio 0 and prio 7 task
> > > >       with io-throttle patches.
> > > > 
> > > > Test 3
> > > > ======
> > > > - Run the one RT reader and one BE reader in root cgroup without any
> > > >   limitations. I guess this should mean unlimited BW and behavior should
> > > >   be same as with CFQ without io-throttling patches.
> > > > 
> > > > With io-throttle patches
> > > > =========================
> > > > Ran the test 4 times because I was getting different results in different
> > > > runs.
> > > > 
> > > > - Two readers, one RT prio 0  other BE prio 7
> > > > 
> > > > 234179072 bytes (234 MB) copied, 2.74604 s, 85.3 MB/s
> > > > 234179072 bytes (234 MB) copied, 5.20995 s, 44.9 MB/s
> > > > RT task finished
> > > > 
> > > > 234179072 bytes (234 MB) copied, 4.54417 s, 51.5 MB/s
> > > > RT task finished
> > > > 234179072 bytes (234 MB) copied, 5.23396 s, 44.7 MB/s
> > > > 
> > > > 234179072 bytes (234 MB) copied, 5.17727 s, 45.2 MB/s
> > > > RT task finished
> > > > 234179072 bytes (234 MB) copied, 5.25894 s, 44.5 MB/s
> > > > 
> > > > 234179072 bytes (234 MB) copied, 2.74141 s, 85.4 MB/s
> > > > 234179072 bytes (234 MB) copied, 5.20536 s, 45.0 MB/s
> > > > RT task finished
> > > > 
> > > > Note: Out of 4 runs, looks like twice it is complete priority inversion
> > > >       and RT task finished after BE task. Rest of the two times, the
> > > >       difference between BW of RT and BE task is much less as compared to
> > > >       without patches. In fact once it was almost same.
> > > 
> > > This is strange. If you don't set any limit there shouldn't be any
> > > difference respect to the other case (without io-throttle patches).
> > > 
> > > At worst a small overhead given by the task_to_iothrottle(), under
> > > rcu_read_lock(). I'll repeat this test ASAP and see if I'll be able to
> > > reproduce this strange behaviour.
> > 
> > Ya, I also found this strange. At least in root group there should not be
> > any behavior change (at max one might expect little drop in throughput
> > because of extra code).
> 
> Hi Vivek,
> 
> I'm not able to reproduce the strange behaviour above.
> 
> Which commands are you running exactly? is the system isolated (stupid
> question) no cron or background tasks doing IO during the tests?
> 
> Following the script I've used:
> 
> $ cat test.sh
> #!/bin/sh
> echo 3 > /proc/sys/vm/drop_caches
> ionice -c 1 -n 0 dd if=bigfile1 of=/dev/null bs=1M 2>&1 | sed "s/\(.*\)/RT: \1/" &
> cat /proc/$!/cgroup | sed "s/\(.*\)/RT: \1/"
> ionice -c 2 -n 7 dd if=bigfile2 of=/dev/null bs=1M 2>&1 | sed "s/\(.*\)/BE: \1/" &
> cat /proc/$!/cgroup | sed "s/\(.*\)/BE: \1/"
> for i in 1 2; do
> 	wait
> done
> 
> And the results on my PC:
> 
> 2.6.30-rc4
> ~~~~~~~~~~
> $ sudo sh test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 21.3406 s, 11.5 MB/s
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 11.989 s, 20.5 MB/s
> $ sudo sh test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 23.4436 s, 10.5 MB/s
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 11.9555 s, 20.5 MB/s
> $ sudo sh test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 21.622 s, 11.3 MB/s
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 11.9856 s, 20.5 MB/s
> $ sudo sh test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 21.5664 s, 11.4 MB/s
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 11.8522 s, 20.7 MB/s
> 
> 2.6.30-rc4 + io-throttle, no BW limit, both tasks in the root cgroup
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> $ sudo sh ./test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 23.6739 s, 10.4 MB/s
> BE: cgroup 4:blockio:/
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 12.2853 s, 20.0 MB/s
> RT: 4:blockio:/
> $ sudo sh ./test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 23.7483 s, 10.3 MB/s
> BE: cgroup 4:blockio:/
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 12.3597 s, 19.9 MB/s
> RT: 4:blockio:/
> $ sudo sh ./test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 23.6843 s, 10.4 MB/s
> BE: cgroup 4:blockio:/
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 12.4886 s, 19.6 MB/s
> RT: 4:blockio:/
> $ sudo sh ./test.sh | sort
> BE: 234+0 records in
> BE: 234+0 records out
> BE: 245366784 bytes (245 MB) copied, 23.8621 s, 10.3 MB/s
> BE: cgroup 4:blockio:/
> RT: 234+0 records in
> RT: 234+0 records out
> RT: 245366784 bytes (245 MB) copied, 12.6737 s, 19.4 MB/s
> RT: 4:blockio:/
> 
> The difference seems to be just the expected overhead.

BTW, it is possible to reduce the io-throttle overhead even more for non
io-throttle users (also when CONFIG_CGROUP_IO_THROTTLE is enabled) using
the trick below.

2.6.30-rc4 + io-throttle + following patch, no BW limit, tasks in root cgroup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ sudo sh test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 17.462 s, 14.1 MB/s
BE: 4:blockio:/
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 11.7865 s, 20.8 MB/s
RT: 4:blockio:/
$ sudo sh test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 18.8375 s, 13.0 MB/s
BE: 4:blockio:/
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 11.9148 s, 20.6 MB/s
RT: 4:blockio:/
$ sudo sh test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 19.6826 s, 12.5 MB/s
BE: 4:blockio:/
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 11.8715 s, 20.7 MB/s
RT: 4:blockio:/
$ sudo sh test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 18.9152 s, 13.0 MB/s
BE: 4:blockio:/
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 11.8925 s, 20.6 MB/s
RT: 4:blockio:/

[ To be applied on top of io-throttle v16 ]

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
---
 block/blk-io-throttle.c |   16 ++++++++++++++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/block/blk-io-throttle.c b/block/blk-io-throttle.c
index e2dfd24..8b45c71 100644
--- a/block/blk-io-throttle.c
+++ b/block/blk-io-throttle.c
@@ -131,6 +131,14 @@ struct iothrottle_node {
 	struct iothrottle_stat stat;
 };
 
+/*
+ * This is a trick to reduce the unneded overhead when io-throttle is not used
+ * at all. We use a counter of the io-throttle rules; if the counter is zero,
+ * we immediately return from the io-throttle hooks, without accounting IO and
+ * without checking if we need to apply some limiting rules.
+ */
+static atomic_t iothrottle_node_count __read_mostly;
+
 /**
  * struct iothrottle - throttling rules for a cgroup
  * @css: pointer to the cgroup state
@@ -193,6 +201,7 @@ static void iothrottle_insert_node(struct iothrottle *iot,
 {
 	WARN_ON_ONCE(!cgroup_is_locked());
 	list_add_rcu(&n->node, &iot->list);
+	atomic_inc(&iothrottle_node_count);
 }
 
 /*
@@ -214,6 +223,7 @@ iothrottle_delete_node(struct iothrottle *iot, struct iothrottle_node *n)
 {
 	WARN_ON_ONCE(!cgroup_is_locked());
 	list_del_rcu(&n->node);
+	atomic_dec(&iothrottle_node_count);
 }
 
 /*
@@ -250,8 +260,10 @@ static void iothrottle_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp)
 	 * reference to the list.
 	 */
 	if (!list_empty(&iot->list))
-		list_for_each_entry_safe(n, p, &iot->list, node)
+		list_for_each_entry_safe(n, p, &iot->list, node) {
 			kfree(n);
+			atomic_dec(&iothrottle_node_count);
+		}
 	kfree(iot);
 }
 
@@ -836,7 +848,7 @@ cgroup_io_throttle(struct bio *bio, struct block_device *bdev, ssize_t bytes)
 	unsigned long long sleep;
 	int type, can_sleep = 1;
 
-	if (iothrottle_disabled())
+	if (iothrottle_disabled() || !atomic_read(&iothrottle_node_count))
 		return 0;
 	if (unlikely(!bdev))
 		return 0;

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]             ` <20090506215235.GJ8180-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2009-05-06 22:35               ` Andrea Righi
@ 2009-05-07  9:04               ` Andrea Righi
  1 sibling, 0 replies; 97+ messages in thread
From: Andrea Righi @ 2009-05-07  9:04 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Wed, May 06, 2009 at 05:52:35PM -0400, Vivek Goyal wrote:
> > > Without io-throttle patches
> > > ---------------------------
> > > - Two readers, first BE prio 7, second BE prio 0
> > > 
> > > 234179072 bytes (234 MB) copied, 4.12074 s, 56.8 MB/s
> > > High prio reader finished
> > > 234179072 bytes (234 MB) copied, 5.36023 s, 43.7 MB/s
> > > 
> > > Note: There is no service differentiation between prio 0 and prio 7 task
> > >       with io-throttle patches.
> > > 
> > > Test 3
> > > ======
> > > - Run the one RT reader and one BE reader in root cgroup without any
> > >   limitations. I guess this should mean unlimited BW and behavior should
> > >   be same as with CFQ without io-throttling patches.
> > > 
> > > With io-throttle patches
> > > =========================
> > > Ran the test 4 times because I was getting different results in different
> > > runs.
> > > 
> > > - Two readers, one RT prio 0  other BE prio 7
> > > 
> > > 234179072 bytes (234 MB) copied, 2.74604 s, 85.3 MB/s
> > > 234179072 bytes (234 MB) copied, 5.20995 s, 44.9 MB/s
> > > RT task finished
> > > 
> > > 234179072 bytes (234 MB) copied, 4.54417 s, 51.5 MB/s
> > > RT task finished
> > > 234179072 bytes (234 MB) copied, 5.23396 s, 44.7 MB/s
> > > 
> > > 234179072 bytes (234 MB) copied, 5.17727 s, 45.2 MB/s
> > > RT task finished
> > > 234179072 bytes (234 MB) copied, 5.25894 s, 44.5 MB/s
> > > 
> > > 234179072 bytes (234 MB) copied, 2.74141 s, 85.4 MB/s
> > > 234179072 bytes (234 MB) copied, 5.20536 s, 45.0 MB/s
> > > RT task finished
> > > 
> > > Note: Out of 4 runs, looks like twice it is complete priority inversion
> > >       and RT task finished after BE task. Rest of the two times, the
> > >       difference between BW of RT and BE task is much less as compared to
> > >       without patches. In fact once it was almost same.
> > 
> > This is strange. If you don't set any limit there shouldn't be any
> > difference respect to the other case (without io-throttle patches).
> > 
> > At worst a small overhead given by the task_to_iothrottle(), under
> > rcu_read_lock(). I'll repeat this test ASAP and see if I'll be able to
> > reproduce this strange behaviour.
> 
> Ya, I also found this strange. At least in root group there should not be
> any behavior change (at max one might expect little drop in throughput
> because of extra code).

Hi Vivek,

I'm not able to reproduce the strange behaviour above.

Which commands are you running exactly? is the system isolated (stupid
question) no cron or background tasks doing IO during the tests?

Following the script I've used:

$ cat test.sh
#!/bin/sh
echo 3 > /proc/sys/vm/drop_caches
ionice -c 1 -n 0 dd if=bigfile1 of=/dev/null bs=1M 2>&1 | sed "s/\(.*\)/RT: \1/" &
cat /proc/$!/cgroup | sed "s/\(.*\)/RT: \1/"
ionice -c 2 -n 7 dd if=bigfile2 of=/dev/null bs=1M 2>&1 | sed "s/\(.*\)/BE: \1/" &
cat /proc/$!/cgroup | sed "s/\(.*\)/BE: \1/"
for i in 1 2; do
	wait
done

And the results on my PC:

2.6.30-rc4
~~~~~~~~~~
$ sudo sh test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 21.3406 s, 11.5 MB/s
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 11.989 s, 20.5 MB/s
$ sudo sh test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 23.4436 s, 10.5 MB/s
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 11.9555 s, 20.5 MB/s
$ sudo sh test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 21.622 s, 11.3 MB/s
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 11.9856 s, 20.5 MB/s
$ sudo sh test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 21.5664 s, 11.4 MB/s
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 11.8522 s, 20.7 MB/s

2.6.30-rc4 + io-throttle, no BW limit, both tasks in the root cgroup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ sudo sh ./test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 23.6739 s, 10.4 MB/s
BE: cgroup 4:blockio:/
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 12.2853 s, 20.0 MB/s
RT: 4:blockio:/
$ sudo sh ./test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 23.7483 s, 10.3 MB/s
BE: cgroup 4:blockio:/
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 12.3597 s, 19.9 MB/s
RT: 4:blockio:/
$ sudo sh ./test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 23.6843 s, 10.4 MB/s
BE: cgroup 4:blockio:/
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 12.4886 s, 19.6 MB/s
RT: 4:blockio:/
$ sudo sh ./test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 23.8621 s, 10.3 MB/s
BE: cgroup 4:blockio:/
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 12.6737 s, 19.4 MB/s
RT: 4:blockio:/

The difference seems to be just the expected overhead.

-Andrea

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06 21:52             ` Vivek Goyal
                               ` (2 preceding siblings ...)
  (?)
@ 2009-05-07  9:04             ` Andrea Righi
  2009-05-07 12:22               ` Andrea Righi
                                 ` (3 more replies)
  -1 siblings, 4 replies; 97+ messages in thread
From: Andrea Righi @ 2009-05-07  9:04 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Andrew Morton, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, balbir, linux-kernel, containers,
	agk, dm-devel, snitzer, m-ikeda, peterz

On Wed, May 06, 2009 at 05:52:35PM -0400, Vivek Goyal wrote:
> > > Without io-throttle patches
> > > ---------------------------
> > > - Two readers, first BE prio 7, second BE prio 0
> > > 
> > > 234179072 bytes (234 MB) copied, 4.12074 s, 56.8 MB/s
> > > High prio reader finished
> > > 234179072 bytes (234 MB) copied, 5.36023 s, 43.7 MB/s
> > > 
> > > Note: There is no service differentiation between prio 0 and prio 7 task
> > >       with io-throttle patches.
> > > 
> > > Test 3
> > > ======
> > > - Run the one RT reader and one BE reader in root cgroup without any
> > >   limitations. I guess this should mean unlimited BW and behavior should
> > >   be same as with CFQ without io-throttling patches.
> > > 
> > > With io-throttle patches
> > > =========================
> > > Ran the test 4 times because I was getting different results in different
> > > runs.
> > > 
> > > - Two readers, one RT prio 0  other BE prio 7
> > > 
> > > 234179072 bytes (234 MB) copied, 2.74604 s, 85.3 MB/s
> > > 234179072 bytes (234 MB) copied, 5.20995 s, 44.9 MB/s
> > > RT task finished
> > > 
> > > 234179072 bytes (234 MB) copied, 4.54417 s, 51.5 MB/s
> > > RT task finished
> > > 234179072 bytes (234 MB) copied, 5.23396 s, 44.7 MB/s
> > > 
> > > 234179072 bytes (234 MB) copied, 5.17727 s, 45.2 MB/s
> > > RT task finished
> > > 234179072 bytes (234 MB) copied, 5.25894 s, 44.5 MB/s
> > > 
> > > 234179072 bytes (234 MB) copied, 2.74141 s, 85.4 MB/s
> > > 234179072 bytes (234 MB) copied, 5.20536 s, 45.0 MB/s
> > > RT task finished
> > > 
> > > Note: Out of 4 runs, looks like twice it is complete priority inversion
> > >       and RT task finished after BE task. Rest of the two times, the
> > >       difference between BW of RT and BE task is much less as compared to
> > >       without patches. In fact once it was almost same.
> > 
> > This is strange. If you don't set any limit there shouldn't be any
> > difference respect to the other case (without io-throttle patches).
> > 
> > At worst a small overhead given by the task_to_iothrottle(), under
> > rcu_read_lock(). I'll repeat this test ASAP and see if I'll be able to
> > reproduce this strange behaviour.
> 
> Ya, I also found this strange. At least in root group there should not be
> any behavior change (at max one might expect little drop in throughput
> because of extra code).

Hi Vivek,

I'm not able to reproduce the strange behaviour above.

Which commands are you running exactly? is the system isolated (stupid
question) no cron or background tasks doing IO during the tests?

Following the script I've used:

$ cat test.sh
#!/bin/sh
echo 3 > /proc/sys/vm/drop_caches
ionice -c 1 -n 0 dd if=bigfile1 of=/dev/null bs=1M 2>&1 | sed "s/\(.*\)/RT: \1/" &
cat /proc/$!/cgroup | sed "s/\(.*\)/RT: \1/"
ionice -c 2 -n 7 dd if=bigfile2 of=/dev/null bs=1M 2>&1 | sed "s/\(.*\)/BE: \1/" &
cat /proc/$!/cgroup | sed "s/\(.*\)/BE: \1/"
for i in 1 2; do
	wait
done

And the results on my PC:

2.6.30-rc4
~~~~~~~~~~
$ sudo sh test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 21.3406 s, 11.5 MB/s
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 11.989 s, 20.5 MB/s
$ sudo sh test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 23.4436 s, 10.5 MB/s
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 11.9555 s, 20.5 MB/s
$ sudo sh test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 21.622 s, 11.3 MB/s
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 11.9856 s, 20.5 MB/s
$ sudo sh test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 21.5664 s, 11.4 MB/s
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 11.8522 s, 20.7 MB/s

2.6.30-rc4 + io-throttle, no BW limit, both tasks in the root cgroup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ sudo sh ./test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 23.6739 s, 10.4 MB/s
BE: cgroup 4:blockio:/
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 12.2853 s, 20.0 MB/s
RT: 4:blockio:/
$ sudo sh ./test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 23.7483 s, 10.3 MB/s
BE: cgroup 4:blockio:/
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 12.3597 s, 19.9 MB/s
RT: 4:blockio:/
$ sudo sh ./test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 23.6843 s, 10.4 MB/s
BE: cgroup 4:blockio:/
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 12.4886 s, 19.6 MB/s
RT: 4:blockio:/
$ sudo sh ./test.sh | sort
BE: 234+0 records in
BE: 234+0 records out
BE: 245366784 bytes (245 MB) copied, 23.8621 s, 10.3 MB/s
BE: cgroup 4:blockio:/
RT: 234+0 records in
RT: 234+0 records out
RT: 245366784 bytes (245 MB) copied, 12.6737 s, 19.4 MB/s
RT: 4:blockio:/

The difference seems to be just the expected overhead.

-Andrea

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]       ` <20090506161012.GC8180-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2009-05-07  5:36         ` Li Zefan
@ 2009-05-07  5:47         ` Gui Jianfeng
  1 sibling, 0 replies; 97+ messages in thread
From: Gui Jianfeng @ 2009-05-07  5:47 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	righi.andrea-Re5JQEeQqe8AvxtiuMwx3w

[-- Attachment #1: Type: text/plain, Size: 2218 bytes --]

Vivek Goyal wrote:
> Hi Gui,
> 
> Thanks for the report. I use cgroup_path() for debugging. I guess that
> cgroup_path() was passed null cgrp pointer that's why it crashed.
> 
> If yes, then it is strange though. I call cgroup_path() only after
> grabbing a refenrece to css object. (I am assuming that if I have a valid
> reference to css object then css->cgrp can't be null).

  I think so too...

> 
> Anyway, can you please try out following patch and see if it fixes your
> crash.
> 
> ---
>  block/elevator-fq.c |   10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> Index: linux11/block/elevator-fq.c
> ===================================================================
> --- linux11.orig/block/elevator-fq.c	2009-05-05 15:38:06.000000000 -0400
> +++ linux11/block/elevator-fq.c	2009-05-06 11:55:47.000000000 -0400
> @@ -125,6 +125,9 @@ static void io_group_path(struct io_grou
>  	unsigned short id = iog->iocg_id;
>  	struct cgroup_subsys_state *css;
>  
> +	/* For error case */
> +	buf[0] = '\0';
> +
>  	rcu_read_lock();
>  
>  	if (!id)
> @@ -137,15 +140,12 @@ static void io_group_path(struct io_grou
>  	if (!css_tryget(css))
>  		goto out;
>  
> -	cgroup_path(css->cgroup, buf, buflen);
> +	if (css->cgroup)

  According to CR2, when kernel crashing, css->cgroup equals 0x00000100.
  So i guess this patch won't fix this issue.

> +		cgroup_path(css->cgroup, buf, buflen);
>  
>  	css_put(css);
> -
> -	rcu_read_unlock();
> -	return;
>  out:
>  	rcu_read_unlock();
> -	buf[0] = '\0';
>  	return;
>  }
>  #endif
> 
> BTW, I tried following equivalent script and I can't see the crash on 
> my system. Are you able to hit it regularly?

  yes, it's 50% chance that i can reproduce it.
  i'v attached the rwio source code.

> 
> Instead of killing the tasks I also tried moving the tasks into root cgroup
> and then deleting test1 and test2 groups, that also did not produce any crash.
> (Hit a different bug though after 5-6 attempts :-)
> 
> As I mentioned in the patchset, currently we do have issues with group
> refcounting and cgroup/group going away. Hopefully in next version they
> all should be fixed up. But still, it is nice to hear back...
> 
> 

-- 
Regards
Gui Jianfeng

[-- Attachment #2: rwio.c --]
[-- Type: image/x-xbitmap, Size: 1613 bytes --]

[-- Attachment #3: Type: text/plain, Size: 206 bytes --]

_______________________________________________
Containers mailing list
Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06 16:10       ` Vivek Goyal
  (?)
  (?)
@ 2009-05-07  5:47       ` Gui Jianfeng
  -1 siblings, 0 replies; 97+ messages in thread
From: Gui Jianfeng @ 2009-05-07  5:47 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: nauman, dpshah, lizf, mikew, fchecconi, paolo.valente,
	jens.axboe, ryov, fernando, s-uchida, taka, jmoyer, dhaval,
	balbir, linux-kernel, containers, righi.andrea, agk, dm-devel,
	snitzer, m-ikeda, akpm

[-- Attachment #1: Type: text/plain, Size: 2218 bytes --]

Vivek Goyal wrote:
> Hi Gui,
> 
> Thanks for the report. I use cgroup_path() for debugging. I guess that
> cgroup_path() was passed null cgrp pointer that's why it crashed.
> 
> If yes, then it is strange though. I call cgroup_path() only after
> grabbing a refenrece to css object. (I am assuming that if I have a valid
> reference to css object then css->cgrp can't be null).

  I think so too...

> 
> Anyway, can you please try out following patch and see if it fixes your
> crash.
> 
> ---
>  block/elevator-fq.c |   10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> Index: linux11/block/elevator-fq.c
> ===================================================================
> --- linux11.orig/block/elevator-fq.c	2009-05-05 15:38:06.000000000 -0400
> +++ linux11/block/elevator-fq.c	2009-05-06 11:55:47.000000000 -0400
> @@ -125,6 +125,9 @@ static void io_group_path(struct io_grou
>  	unsigned short id = iog->iocg_id;
>  	struct cgroup_subsys_state *css;
>  
> +	/* For error case */
> +	buf[0] = '\0';
> +
>  	rcu_read_lock();
>  
>  	if (!id)
> @@ -137,15 +140,12 @@ static void io_group_path(struct io_grou
>  	if (!css_tryget(css))
>  		goto out;
>  
> -	cgroup_path(css->cgroup, buf, buflen);
> +	if (css->cgroup)

  According to CR2, when kernel crashing, css->cgroup equals 0x00000100.
  So i guess this patch won't fix this issue.

> +		cgroup_path(css->cgroup, buf, buflen);
>  
>  	css_put(css);
> -
> -	rcu_read_unlock();
> -	return;
>  out:
>  	rcu_read_unlock();
> -	buf[0] = '\0';
>  	return;
>  }
>  #endif
> 
> BTW, I tried following equivalent script and I can't see the crash on 
> my system. Are you able to hit it regularly?

  yes, it's 50% chance that i can reproduce it.
  i'v attached the rwio source code.

> 
> Instead of killing the tasks I also tried moving the tasks into root cgroup
> and then deleting test1 and test2 groups, that also did not produce any crash.
> (Hit a different bug though after 5-6 attempts :-)
> 
> As I mentioned in the patchset, currently we do have issues with group
> refcounting and cgroup/group going away. Hopefully in next version they
> all should be fixed up. But still, it is nice to hear back...
> 
> 

-- 
Regards
Gui Jianfeng

[-- Attachment #2: rwio.c --]
[-- Type: image/x-xbitmap, Size: 1613 bytes --]

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]       ` <20090506161012.GC8180-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2009-05-07  5:36         ` Li Zefan
  2009-05-07  5:47         ` Gui Jianfeng
  1 sibling, 0 replies; 97+ messages in thread
From: Li Zefan @ 2009-05-07  5:36 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	righi.andrea-Re5JQEeQqe8AvxtiuMwx3w

[-- Attachment #1: Type: text/plain, Size: 2886 bytes --]

Vivek Goyal wrote:
> On Wed, May 06, 2009 at 04:11:05PM +0800, Gui Jianfeng wrote:
>> Vivek Goyal wrote:
>>> Hi All,
>>>
>>> Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
>>> First version of the patches was posted here.
>> Hi Vivek,
>>
>> I did some simple test for V2, and triggered an kernel panic.
>> The following script can reproduce this bug. It seems that the cgroup
>> is already removed, but IO Controller still try to access into it.
>>
> 
> Hi Gui,
> 
> Thanks for the report. I use cgroup_path() for debugging. I guess that
> cgroup_path() was passed null cgrp pointer that's why it crashed.
> 
> If yes, then it is strange though. I call cgroup_path() only after
> grabbing a refenrece to css object. (I am assuming that if I have a valid
> reference to css object then css->cgrp can't be null).
> 

Yes, css->cgrp shouldn't be NULL.. I doubt we hit a bug in cgroup here.
The code dealing with css refcnt and cgroup rmdir has changed quite a lot,
and is much more complex than it was.

> Anyway, can you please try out following patch and see if it fixes your
> crash.
...
> BTW, I tried following equivalent script and I can't see the crash on 
> my system. Are you able to hit it regularly?
> 

I modified the script like this:

======================
#!/bin/sh
echo 1 > /proc/sys/vm/drop_caches
mkdir /cgroup 2> /dev/null
mount -t cgroup -o io,blkio io /cgroup
mkdir /cgroup/test1
mkdir /cgroup/test2
echo 100 > /cgroup/test1/io.weight
echo 500 > /cgroup/test2/io.weight

dd if=/dev/zero bs=4096 count=128000 of=500M.1 &
pid1=$!
echo $pid1 > /cgroup/test1/tasks

dd if=/dev/zero bs=4096 count=128000 of=500M.2 &
pid2=$!
echo $pid2 > /cgroup/test2/tasks

sleep 5
kill -9 $pid1
kill -9 $pid2

for ((;count != 2;))
{
        rmdir /cgroup/test1 > /dev/null 2>&1
        if [ $? -eq 0 ]; then
                count=$(( $count + 1 ))
        fi

        rmdir /cgroup/test2 > /dev/null 2>&1
        if [ $? -eq 0 ]; then
                count=$(( $count + 1 ))
        fi
}

umount /cgroup
rmdir /cgroup
======================

I ran this script and got lockdep BUG. Full log and my config are attached.

Actually this can be triggered with the following steps on my box:
# mount -t cgroup -o blkio,io xxx /mnt
# mkdir /mnt/0
# echo $$ > /mnt/0/tasks
# echo 3 > /proc/sys/vm/drop_cache
# echo $$ > /mnt/tasks
# rmdir /mnt/0

And when I ran the script for the second time, my box was freezed
and I had to reset it.

> Instead of killing the tasks I also tried moving the tasks into root cgroup
> and then deleting test1 and test2 groups, that also did not produce any crash.
> (Hit a different bug though after 5-6 attempts :-)
> 
> As I mentioned in the patchset, currently we do have issues with group
> refcounting and cgroup/group going away. Hopefully in next version they
> all should be fixed up. But still, it is nice to hear back...
> 

[-- Attachment #2: myconfig --]
[-- Type: text/plain, Size: 64514 bytes --]

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.30-rc4
# Thu May  7 09:11:29 2009
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
# CONFIG_GENERIC_TIME_VSYSCALL is not set
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_HAVE_DYNAMIC_PER_CPU_AREA=y
# CONFIG_HAVE_CPUMASK_OF_CPU_MAP is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_ZONE_DMA32 is not set
CONFIG_ARCH_POPULATES_NODE_MAP=y
# CONFIG_AUDIT_ARCH is not set
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
CONFIG_X86_32_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_TRAMPOLINE=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_KTIME_SCALAR=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_AUDIT is not set

#
# RCU Subsystem
#
# CONFIG_CLASSIC_RCU is not set
# CONFIG_TREE_RCU is not set
CONFIG_PREEMPT_RCU=y
CONFIG_RCU_TRACE=y
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_PREEMPT_RCU_TRACE=y
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_GROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
# CONFIG_USER_SCHED is not set
CONFIG_CGROUP_SCHED=y
CONFIG_CGROUPS=y
CONFIG_CGROUP_DEBUG=y
CONFIG_CGROUP_NS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_CGROUP_MEM_RES_CTLR=y
CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y
CONFIG_GROUP_IOSCHED=y
CONFIG_CGROUP_BLKIO=y
CONFIG_CGROUP_PAGE=y
CONFIG_MM_OWNER=y
CONFIG_SYSFS_DEPRECATED=y
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
CONFIG_USER_NS=y
CONFIG_PID_NS=y
# CONFIG_NET_NS is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_EXTRA_PASS=y
# CONFIG_STRIP_ASM_SYMS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PCI_QUIRKS=y
CONFIG_SLUB_DEBUG=y
CONFIG_COMPAT_BRK=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
CONFIG_MARKERS=y
CONFIG_OPROFILE=m
# CONFIG_OPROFILE_IBS is not set
CONFIG_HAVE_OPROFILE=y
CONFIG_KPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_API_DEBUG=y
# CONFIG_SLOW_WORK is not set
CONFIG_HAVE_GENERIC_DMA_COHERENT=y
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_LBD=y
CONFIG_BLK_DEV_BSG=y
# CONFIG_BLK_DEV_INTEGRITY is not set

#
# IO Schedulers
#
CONFIG_ELV_FAIR_QUEUING=y
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_NOOP_HIER=y
CONFIG_IOSCHED_AS=m
CONFIG_IOSCHED_AS_HIER=y
CONFIG_IOSCHED_DEADLINE=m
CONFIG_IOSCHED_DEADLINE_HIER=y
CONFIG_IOSCHED_CFQ=y
CONFIG_IOSCHED_CFQ_HIER=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_TRACK_ASYNC_CONTEXT=y
CONFIG_DEBUG_GROUP_IOSCHED=y
CONFIG_FREEZER=y

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
# CONFIG_SPARSE_IRQ is not set
CONFIG_X86_MPPARSE=y
# CONFIG_X86_BIGSMP is not set
CONFIG_X86_EXTENDED_PLATFORM=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_RDC321X is not set
# CONFIG_X86_32_NON_STANDARD is not set
CONFIG_SCHED_OMIT_FRAME_POINTER=y
# CONFIG_PARAVIRT_GUEST is not set
# CONFIG_MEMTEST is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
CONFIG_M686=y
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_GENERIC=y
CONFIG_X86_CPU=y
CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_INTERNODE_CACHE_BYTES=64
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_XADD=y
CONFIG_X86_PPRO_FENCE=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=4
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_CYRIX_32=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_CPU_SUP_TRANSMETA_32=y
CONFIG_CPU_SUP_UMC_32=y
# CONFIG_X86_DS is not set
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
# CONFIG_IOMMU_HELPER is not set
# CONFIG_IOMMU_API is not set
CONFIG_NR_CPUS=8
# CONFIG_SCHED_SMT is not set
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
# CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS is not set
CONFIG_X86_MCE=y
# CONFIG_X86_MCE_NONFATAL is not set
# CONFIG_X86_MCE_P4THERMAL is not set
CONFIG_VM86=y
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
# CONFIG_X86_REBOOTFIXUPS is not set
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m
# CONFIG_X86_CPU_DEBUG is not set
# CONFIG_NOHIGHMEM is not set
CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G is not set
CONFIG_PAGE_OFFSET=0xC0000000
CONFIG_HIGHMEM=y
# CONFIG_ARCH_PHYS_ADDR_T_64BIT is not set
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_SPARSEMEM_STATIC=y
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
# CONFIG_PHYS_ADDR_T_64BIT is not set
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_UNEVICTABLE_LRU=y
CONFIG_HAVE_MLOCK=y
CONFIG_HAVE_MLOCKED_PAGE_BIT=y
CONFIG_HIGHPTE=y
# CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
CONFIG_X86_RESERVE_LOW_64K=y
# CONFIG_MATH_EMULATION is not set
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=0
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
# CONFIG_X86_PAT is not set
CONFIG_EFI=y
CONFIG_SECCOMP=y
# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_SCHED_HRTICK=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_RELOCATABLE=y
CONFIG_PHYSICAL_ALIGN=0x400000
CONFIG_HOTPLUG_CPU=y
# CONFIG_COMPAT_VDSO is not set
# CONFIG_CMDLINE_BOOL is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y

#
# Power management and ACPI options
#
CONFIG_PM=y
CONFIG_PM_DEBUG=y
# CONFIG_PM_VERBOSE is not set
CONFIG_CAN_PM_TRACE=y
# CONFIG_PM_TRACE_RTC is not set
CONFIG_PM_SLEEP_SMP=y
CONFIG_PM_SLEEP=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
# CONFIG_HIBERNATION is not set
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
# CONFIG_ACPI_PROCFS is not set
# CONFIG_ACPI_PROCFS_POWER is not set
CONFIG_ACPI_SYSFS_POWER=y
# CONFIG_ACPI_PROC_EVENT is not set
CONFIG_ACPI_AC=m
# CONFIG_ACPI_BATTERY is not set
CONFIG_ACPI_BUTTON=m
CONFIG_ACPI_VIDEO=m
CONFIG_ACPI_FAN=y
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_THERMAL=y
# CONFIG_ACPI_CUSTOM_DSDT is not set
CONFIG_ACPI_BLACKLIST_YEAR=1999
# CONFIG_ACPI_DEBUG is not set
# CONFIG_ACPI_PCI_SLOT is not set
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y
# CONFIG_ACPI_SBS is not set
CONFIG_X86_APM_BOOT=y
CONFIG_APM=y
# CONFIG_APM_IGNORE_USER_SUSPEND is not set
# CONFIG_APM_DO_ENABLE is not set
CONFIG_APM_CPU_IDLE=y
# CONFIG_APM_DISPLAY_BLANK is not set
# CONFIG_APM_ALLOW_INTS is not set

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
CONFIG_CPU_FREQ_DEBUG=y
CONFIG_CPU_FREQ_STAT=m
CONFIG_CPU_FREQ_STAT_DETAILS=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=m
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=m
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=m

#
# CPUFreq processor drivers
#
# CONFIG_X86_ACPI_CPUFREQ is not set
# CONFIG_X86_POWERNOW_K6 is not set
# CONFIG_X86_POWERNOW_K7 is not set
# CONFIG_X86_POWERNOW_K8 is not set
# CONFIG_X86_GX_SUSPMOD is not set
# CONFIG_X86_SPEEDSTEP_CENTRINO is not set
CONFIG_X86_SPEEDSTEP_ICH=y
CONFIG_X86_SPEEDSTEP_SMI=y
# CONFIG_X86_P4_CLOCKMOD is not set
# CONFIG_X86_CPUFREQ_NFORCE2 is not set
# CONFIG_X86_LONGRUN is not set
# CONFIG_X86_LONGHAUL is not set
# CONFIG_X86_E_POWERSAVER is not set

#
# shared options
#
CONFIG_X86_SPEEDSTEP_LIB=y
# CONFIG_X86_SPEEDSTEP_RELAXED_CAP_CHECK is not set
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GOMMCONFIG is not set
# CONFIG_PCI_GODIRECT is not set
# CONFIG_PCI_GOOLPC is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_HOTPLUG_PCI_PCIE=m
CONFIG_PCIEAER=y
# CONFIG_PCIEASPM is not set
CONFIG_ARCH_SUPPORTS_MSI=y
# CONFIG_PCI_MSI is not set
CONFIG_PCI_LEGACY=y
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_STUB is not set
CONFIG_HT_IRQ=y
# CONFIG_PCI_IOV is not set
CONFIG_ISA_DMA_API=y
CONFIG_ISA=y
# CONFIG_EISA is not set
# CONFIG_MCA is not set
# CONFIG_SCx200 is not set
# CONFIG_OLPC is not set
CONFIG_PCCARD=y
# CONFIG_PCMCIA_DEBUG is not set
CONFIG_PCMCIA=y
CONFIG_PCMCIA_LOAD_CIS=y
# CONFIG_PCMCIA_IOCTL is not set
CONFIG_CARDBUS=y

#
# PC-card bridges
#
CONFIG_YENTA=y
CONFIG_YENTA_O2=y
CONFIG_YENTA_RICOH=y
CONFIG_YENTA_TI=y
CONFIG_YENTA_ENE_TUNE=y
CONFIG_YENTA_TOSHIBA=y
# CONFIG_PD6729 is not set
# CONFIG_I82092 is not set
# CONFIG_I82365 is not set
# CONFIG_TCIC is not set
CONFIG_PCMCIA_PROBE=y
CONFIG_PCCARD_NONSTATIC=y
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_FAKE=m
# CONFIG_HOTPLUG_PCI_COMPAQ is not set
# CONFIG_HOTPLUG_PCI_IBM is not set
CONFIG_HOTPLUG_PCI_ACPI=m
CONFIG_HOTPLUG_PCI_ACPI_IBM=m
# CONFIG_HOTPLUG_PCI_CPCI is not set
# CONFIG_HOTPLUG_PCI_SHPC is not set

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
CONFIG_HAVE_AOUT=y
# CONFIG_BINFMT_AOUT is not set
CONFIG_BINFMT_MISC=y
CONFIG_HAVE_ATOMIC_IOMAP=y
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
# CONFIG_NET_KEY is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_ASK_IP_FIB_HASH=y
# CONFIG_IP_FIB_TRIE is not set
CONFIG_IP_FIB_HASH=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
# CONFIG_IP_PNP is not set
CONFIG_NET_IPIP=m
# CONFIG_NET_IPGRE is not set
CONFIG_IP_MROUTE=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
# CONFIG_ARPD is not set
CONFIG_SYN_COOKIES=y
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_XFRM_TUNNEL is not set
CONFIG_INET_TUNNEL=m
# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET_XFRM_MODE_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_BEET is not set
CONFIG_INET_LRO=m
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=m
CONFIG_TCP_CONG_CUBIC=y
# CONFIG_TCP_CONG_WESTWOOD is not set
# CONFIG_TCP_CONG_HTCP is not set
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=m
# CONFIG_TCP_CONG_VEGAS is not set
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
# CONFIG_TCP_CONG_VENO is not set
# CONFIG_TCP_CONG_YEAH is not set
CONFIG_TCP_CONG_ILLINOIS=m
# CONFIG_DEFAULT_BIC is not set
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_HTCP is not set
# CONFIG_DEFAULT_VEGAS is not set
# CONFIG_DEFAULT_WESTWOOD is not set
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
# CONFIG_IPV6 is not set
# CONFIG_NETWORK_SECMARK is not set
# CONFIG_NETFILTER is not set
# CONFIG_IP_DCCP is not set
# CONFIG_IP_SCTP is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
CONFIG_STP=m
CONFIG_BRIDGE=m
# CONFIG_NET_DSA is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
CONFIG_LLC=m
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_PHONET is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
# CONFIG_NET_SCH_CBQ is not set
# CONFIG_NET_SCH_HTB is not set
# CONFIG_NET_SCH_HFSC is not set
# CONFIG_NET_SCH_PRIO is not set
# CONFIG_NET_SCH_MULTIQ is not set
# CONFIG_NET_SCH_RED is not set
# CONFIG_NET_SCH_SFQ is not set
# CONFIG_NET_SCH_TEQL is not set
# CONFIG_NET_SCH_TBF is not set
# CONFIG_NET_SCH_GRED is not set
# CONFIG_NET_SCH_DSMARK is not set
# CONFIG_NET_SCH_NETEM is not set
# CONFIG_NET_SCH_DRR is not set

#
# Classification
#
CONFIG_NET_CLS=y
# CONFIG_NET_CLS_BASIC is not set
# CONFIG_NET_CLS_TCINDEX is not set
# CONFIG_NET_CLS_ROUTE4 is not set
# CONFIG_NET_CLS_FW is not set
# CONFIG_NET_CLS_U32 is not set
# CONFIG_NET_CLS_RSVP is not set
# CONFIG_NET_CLS_RSVP6 is not set
# CONFIG_NET_CLS_FLOW is not set
CONFIG_NET_CLS_CGROUP=y
# CONFIG_NET_EMATCH is not set
# CONFIG_NET_CLS_ACT is not set
CONFIG_NET_SCH_FIFO=y
# CONFIG_DCB is not set

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_NET_TCPPROBE is not set
# CONFIG_NET_DROP_MONITOR is not set
# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_AF_RXRPC is not set
CONFIG_FIB_RULES=y
# CONFIG_WIRELESS is not set
# CONFIG_WIMAX is not set
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_DEBUG_DRIVER is not set
CONFIG_DEBUG_DEVRES=y
# CONFIG_SYS_HYPERVISOR is not set
# CONFIG_CONNECTOR is not set
# CONFIG_MTD is not set
CONFIG_PARPORT=m
CONFIG_PARPORT_PC=m
CONFIG_PARPORT_SERIAL=m
# CONFIG_PARPORT_PC_FIFO is not set
# CONFIG_PARPORT_PC_SUPERIO is not set
CONFIG_PARPORT_PC_PCMCIA=m
# CONFIG_PARPORT_GSC is not set
# CONFIG_PARPORT_AX88796 is not set
CONFIG_PARPORT_1284=y
CONFIG_PNP=y
CONFIG_PNP_DEBUG_MESSAGES=y

#
# Protocols
#
CONFIG_ISAPNP=y
# CONFIG_PNPBIOS is not set
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_FD is not set
# CONFIG_BLK_DEV_XD is not set
CONFIG_PARIDE=m

#
# Parallel IDE high-level drivers
#
CONFIG_PARIDE_PD=m
CONFIG_PARIDE_PCD=m
CONFIG_PARIDE_PF=m
# CONFIG_PARIDE_PT is not set
CONFIG_PARIDE_PG=m

#
# Parallel IDE protocol modules
#
# CONFIG_PARIDE_ATEN is not set
# CONFIG_PARIDE_BPCK is not set
# CONFIG_PARIDE_BPCK6 is not set
# CONFIG_PARIDE_COMM is not set
# CONFIG_PARIDE_DSTR is not set
# CONFIG_PARIDE_FIT2 is not set
# CONFIG_PARIDE_FIT3 is not set
# CONFIG_PARIDE_EPAT is not set
# CONFIG_PARIDE_EPIA is not set
# CONFIG_PARIDE_FRIQ is not set
# CONFIG_PARIDE_FRPW is not set
# CONFIG_PARIDE_KBIC is not set
# CONFIG_PARIDE_KTTI is not set
# CONFIG_PARIDE_ON20 is not set
# CONFIG_PARIDE_ON26 is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_CRYPTOLOOP=m
CONFIG_BLK_DEV_NBD=m
# CONFIG_BLK_DEV_SX8 is not set
# CONFIG_BLK_DEV_UB is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=16384
# CONFIG_BLK_DEV_XIP is not set
# CONFIG_CDROM_PKTCDVD is not set
# CONFIG_ATA_OVER_ETH is not set
# CONFIG_BLK_DEV_HD is not set
CONFIG_MISC_DEVICES=y
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
# CONFIG_SGI_IOC4 is not set
# CONFIG_TIFM_CORE is not set
# CONFIG_ICS932S401 is not set
# CONFIG_ENCLOSURE_SERVICES is not set
# CONFIG_HP_ILO is not set
# CONFIG_ISL29003 is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_AT24 is not set
# CONFIG_EEPROM_LEGACY is not set
CONFIG_EEPROM_93CX6=m
CONFIG_HAVE_IDE=y
# CONFIG_IDE is not set

#
# SCSI device support
#
# CONFIG_RAID_ATTRS is not set
CONFIG_SCSI=m
CONFIG_SCSI_DMA=y
CONFIG_SCSI_TGT=m
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=m
# CONFIG_CHR_DEV_ST is not set
# CONFIG_CHR_DEV_OSST is not set
CONFIG_BLK_DEV_SR=m
CONFIG_BLK_DEV_SR_VENDOR=y
CONFIG_CHR_DEV_SG=m
CONFIG_CHR_DEV_SCH=m

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
CONFIG_SCSI_MULTI_LUN=y
# CONFIG_SCSI_CONSTANTS is not set
CONFIG_SCSI_LOGGING=y
CONFIG_SCSI_SCAN_ASYNC=y
CONFIG_SCSI_WAIT_SCAN=m

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=m
CONFIG_SCSI_FC_ATTRS=m
# CONFIG_SCSI_FC_TGT_ATTRS is not set
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
CONFIG_SCSI_SAS_ATA=y
CONFIG_SCSI_SAS_HOST_SMP=y
# CONFIG_SCSI_SAS_LIBSAS_DEBUG is not set
CONFIG_SCSI_SRP_ATTRS=m
# CONFIG_SCSI_SRP_TGT_ATTRS is not set
CONFIG_SCSI_LOWLEVEL=y
CONFIG_ISCSI_TCP=m
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_7000FASST is not set
CONFIG_SCSI_ACARD=m
# CONFIG_SCSI_AHA152X is not set
# CONFIG_SCSI_AHA1542 is not set
# CONFIG_SCSI_AACRAID is not set
CONFIG_SCSI_AIC7XXX=m
CONFIG_AIC7XXX_CMDS_PER_DEVICE=4
CONFIG_AIC7XXX_RESET_DELAY_MS=15000
# CONFIG_AIC7XXX_DEBUG_ENABLE is not set
CONFIG_AIC7XXX_DEBUG_MASK=0
# CONFIG_AIC7XXX_REG_PRETTY_PRINT is not set
CONFIG_SCSI_AIC7XXX_OLD=m
CONFIG_SCSI_AIC79XX=m
CONFIG_AIC79XX_CMDS_PER_DEVICE=4
CONFIG_AIC79XX_RESET_DELAY_MS=15000
# CONFIG_AIC79XX_DEBUG_ENABLE is not set
CONFIG_AIC79XX_DEBUG_MASK=0
# CONFIG_AIC79XX_REG_PRETTY_PRINT is not set
CONFIG_SCSI_AIC94XX=m
# CONFIG_AIC94XX_DEBUG is not set
# CONFIG_SCSI_DPT_I2O is not set
CONFIG_SCSI_ADVANSYS=m
# CONFIG_SCSI_IN2000 is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
# CONFIG_SCSI_MPT2SAS is not set
# CONFIG_SCSI_HPTIOP is not set
CONFIG_SCSI_BUSLOGIC=m
# CONFIG_SCSI_FLASHPOINT is not set
# CONFIG_LIBFC is not set
# CONFIG_LIBFCOE is not set
# CONFIG_FCOE is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_DTC3280 is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
CONFIG_SCSI_GDTH=m
# CONFIG_SCSI_GENERIC_NCR5380 is not set
# CONFIG_SCSI_GENERIC_NCR5380_MMIO is not set
CONFIG_SCSI_IPS=m
CONFIG_SCSI_INITIO=m
CONFIG_SCSI_INIA100=m
CONFIG_SCSI_PPA=m
CONFIG_SCSI_IMM=m
# CONFIG_SCSI_IZIP_EPP16 is not set
# CONFIG_SCSI_IZIP_SLOW_CTR is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_NCR53C406A is not set
# CONFIG_SCSI_STEX is not set
CONFIG_SCSI_SYM53C8XX_2=m
CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=1
CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16
CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64
CONFIG_SCSI_SYM53C8XX_MMIO=y
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_PAS16 is not set
# CONFIG_SCSI_QLOGIC_FAS is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_SYM53C416 is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_T128 is not set
# CONFIG_SCSI_U14_34F is not set
# CONFIG_SCSI_ULTRASTOR is not set
# CONFIG_SCSI_NSP32 is not set
# CONFIG_SCSI_DEBUG is not set
# CONFIG_SCSI_SRP is not set
CONFIG_SCSI_LOWLEVEL_PCMCIA=y
# CONFIG_PCMCIA_AHA152X is not set
# CONFIG_PCMCIA_FDOMAIN is not set
# CONFIG_PCMCIA_NINJA_SCSI is not set
CONFIG_PCMCIA_QLOGIC=m
# CONFIG_PCMCIA_SYM53C500 is not set
# CONFIG_SCSI_DH is not set
# CONFIG_SCSI_OSD_INITIATOR is not set
CONFIG_ATA=m
# CONFIG_ATA_NONSTANDARD is not set
CONFIG_ATA_ACPI=y
CONFIG_SATA_PMP=y
CONFIG_SATA_AHCI=m
# CONFIG_SATA_SIL24 is not set
CONFIG_ATA_SFF=y
# CONFIG_SATA_SVW is not set
CONFIG_ATA_PIIX=m
# CONFIG_SATA_MV is not set
CONFIG_SATA_NV=m
# CONFIG_PDC_ADMA is not set
# CONFIG_SATA_QSTOR is not set
# CONFIG_SATA_PROMISE is not set
# CONFIG_SATA_SX4 is not set
# CONFIG_SATA_SIL is not set
CONFIG_SATA_SIS=m
# CONFIG_SATA_ULI is not set
# CONFIG_SATA_VIA is not set
# CONFIG_SATA_VITESSE is not set
# CONFIG_SATA_INIC162X is not set
# CONFIG_PATA_ACPI is not set
# CONFIG_PATA_ALI is not set
# CONFIG_PATA_AMD is not set
# CONFIG_PATA_ARTOP is not set
CONFIG_PATA_ATIIXP=m
# CONFIG_PATA_CMD640_PCI is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CS5520 is not set
# CONFIG_PATA_CS5530 is not set
# CONFIG_PATA_CS5535 is not set
# CONFIG_PATA_CS5536 is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
CONFIG_ATA_GENERIC=m
# CONFIG_PATA_HPT366 is not set
# CONFIG_PATA_HPT37X is not set
# CONFIG_PATA_HPT3X2N is not set
# CONFIG_PATA_HPT3X3 is not set
# CONFIG_PATA_ISAPNP is not set
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_IT8213 is not set
# CONFIG_PATA_JMICRON is not set
# CONFIG_PATA_LEGACY is not set
# CONFIG_PATA_TRIFLEX is not set
# CONFIG_PATA_MARVELL is not set
CONFIG_PATA_MPIIX=m
# CONFIG_PATA_OLDPIIX is not set
# CONFIG_PATA_NETCELL is not set
# CONFIG_PATA_NINJA32 is not set
# CONFIG_PATA_NS87410 is not set
# CONFIG_PATA_NS87415 is not set
# CONFIG_PATA_OPTI is not set
# CONFIG_PATA_OPTIDMA is not set
CONFIG_PATA_PCMCIA=m
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_QDI is not set
# CONFIG_PATA_RADISYS is not set
# CONFIG_PATA_RZ1000 is not set
# CONFIG_PATA_SC1200 is not set
# CONFIG_PATA_SERVERWORKS is not set
# CONFIG_PATA_PDC2027X is not set
# CONFIG_PATA_SIL680 is not set
CONFIG_PATA_SIS=m
CONFIG_PATA_VIA=m
# CONFIG_PATA_WINBOND is not set
# CONFIG_PATA_WINBOND_VLB is not set
# CONFIG_PATA_SCH is not set
# CONFIG_MD is not set
CONFIG_FUSION=y
CONFIG_FUSION_SPI=m
CONFIG_FUSION_FC=m
# CONFIG_FUSION_SAS is not set
CONFIG_FUSION_MAX_SGE=40
CONFIG_FUSION_CTL=m
CONFIG_FUSION_LAN=m
CONFIG_FUSION_LOGGING=y

#
# IEEE 1394 (FireWire) support
#

#
# Enable only one of the two stacks, unless you know what you are doing
#
CONFIG_FIREWIRE=m
CONFIG_FIREWIRE_OHCI=m
CONFIG_FIREWIRE_OHCI_DEBUG=y
CONFIG_FIREWIRE_SBP2=m
# CONFIG_IEEE1394 is not set
CONFIG_I2O=m
# CONFIG_I2O_LCT_NOTIFY_ON_CHANGES is not set
CONFIG_I2O_EXT_ADAPTEC=y
CONFIG_I2O_CONFIG=m
CONFIG_I2O_CONFIG_OLD_IOCTL=y
CONFIG_I2O_BUS=m
CONFIG_I2O_BLOCK=m
CONFIG_I2O_SCSI=m
CONFIG_I2O_PROC=m
# CONFIG_MACINTOSH_DRIVERS is not set
CONFIG_NETDEVICES=y
CONFIG_COMPAT_NET_DEV_OPS=y
CONFIG_DUMMY=m
CONFIG_BONDING=m
# CONFIG_MACVLAN is not set
# CONFIG_EQUALIZER is not set
CONFIG_TUN=m
# CONFIG_VETH is not set
# CONFIG_NET_SB1000 is not set
# CONFIG_ARCNET is not set
CONFIG_PHYLIB=m

#
# MII PHY device drivers
#
# CONFIG_MARVELL_PHY is not set
# CONFIG_DAVICOM_PHY is not set
# CONFIG_QSEMI_PHY is not set
CONFIG_LXT_PHY=m
# CONFIG_CICADA_PHY is not set
# CONFIG_VITESSE_PHY is not set
# CONFIG_SMSC_PHY is not set
# CONFIG_BROADCOM_PHY is not set
# CONFIG_ICPLUS_PHY is not set
# CONFIG_REALTEK_PHY is not set
# CONFIG_NATIONAL_PHY is not set
# CONFIG_STE10XP is not set
# CONFIG_LSI_ET1011C_PHY is not set
# CONFIG_MDIO_BITBANG is not set
CONFIG_NET_ETHERNET=y
CONFIG_MII=m
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
# CONFIG_CASSINI is not set
CONFIG_NET_VENDOR_3COM=y
# CONFIG_EL1 is not set
# CONFIG_EL2 is not set
# CONFIG_ELPLUS is not set
# CONFIG_EL16 is not set
CONFIG_EL3=m
# CONFIG_3C515 is not set
CONFIG_VORTEX=m
CONFIG_TYPHOON=m
# CONFIG_LANCE is not set
CONFIG_NET_VENDOR_SMC=y
# CONFIG_WD80x3 is not set
# CONFIG_ULTRA is not set
# CONFIG_SMC9194 is not set
# CONFIG_ETHOC is not set
# CONFIG_NET_VENDOR_RACAL is not set
# CONFIG_DNET is not set
CONFIG_NET_TULIP=y
CONFIG_DE2104X=m
CONFIG_TULIP=m
# CONFIG_TULIP_MWI is not set
CONFIG_TULIP_MMIO=y
# CONFIG_TULIP_NAPI is not set
CONFIG_DE4X5=m
CONFIG_WINBOND_840=m
CONFIG_DM9102=m
CONFIG_ULI526X=m
CONFIG_PCMCIA_XIRCOM=m
# CONFIG_AT1700 is not set
# CONFIG_DEPCA is not set
# CONFIG_HP100 is not set
CONFIG_NET_ISA=y
# CONFIG_E2100 is not set
# CONFIG_EWRK3 is not set
# CONFIG_EEXPRESS is not set
# CONFIG_EEXPRESS_PRO is not set
# CONFIG_HPLAN_PLUS is not set
# CONFIG_HPLAN is not set
# CONFIG_LP486E is not set
# CONFIG_ETH16I is not set
CONFIG_NE2000=m
# CONFIG_ZNET is not set
# CONFIG_SEEQ8005 is not set
# CONFIG_IBM_NEW_EMAC_ZMII is not set
# CONFIG_IBM_NEW_EMAC_RGMII is not set
# CONFIG_IBM_NEW_EMAC_TAH is not set
# CONFIG_IBM_NEW_EMAC_EMAC4 is not set
# CONFIG_IBM_NEW_EMAC_NO_FLOW_CTRL is not set
# CONFIG_IBM_NEW_EMAC_MAL_CLR_ICINTSTAT is not set
# CONFIG_IBM_NEW_EMAC_MAL_COMMON_ERR is not set
CONFIG_NET_PCI=y
CONFIG_PCNET32=m
CONFIG_AMD8111_ETH=m
CONFIG_ADAPTEC_STARFIRE=m
# CONFIG_AC3200 is not set
# CONFIG_APRICOT is not set
CONFIG_B44=m
CONFIG_B44_PCI_AUTOSELECT=y
CONFIG_B44_PCICORE_AUTOSELECT=y
CONFIG_B44_PCI=y
CONFIG_FORCEDETH=m
CONFIG_FORCEDETH_NAPI=y
# CONFIG_CS89x0 is not set
CONFIG_E100=m
# CONFIG_FEALNX is not set
# CONFIG_NATSEMI is not set
CONFIG_NE2K_PCI=m
# CONFIG_8139CP is not set
CONFIG_8139TOO=m
# CONFIG_8139TOO_PIO is not set
# CONFIG_8139TOO_TUNE_TWISTER is not set
CONFIG_8139TOO_8129=y
# CONFIG_8139_OLD_RX_RESET is not set
# CONFIG_R6040 is not set
CONFIG_SIS900=m
# CONFIG_EPIC100 is not set
# CONFIG_SMSC9420 is not set
# CONFIG_SUNDANCE is not set
# CONFIG_TLAN is not set
CONFIG_VIA_RHINE=m
CONFIG_VIA_RHINE_MMIO=y
# CONFIG_SC92031 is not set
CONFIG_NET_POCKET=y
CONFIG_ATP=m
CONFIG_DE600=m
CONFIG_DE620=m
# CONFIG_ATL2 is not set
CONFIG_NETDEV_1000=y
CONFIG_ACENIC=m
# CONFIG_ACENIC_OMIT_TIGON_I is not set
# CONFIG_DL2K is not set
CONFIG_E1000=m
CONFIG_E1000E=m
# CONFIG_IP1000 is not set
# CONFIG_IGB is not set
# CONFIG_IGBVF is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
CONFIG_R8169=m
# CONFIG_SIS190 is not set
CONFIG_SKGE=m
# CONFIG_SKGE_DEBUG is not set
CONFIG_SKY2=m
# CONFIG_SKY2_DEBUG is not set
CONFIG_VIA_VELOCITY=m
# CONFIG_TIGON3 is not set
# CONFIG_BNX2 is not set
# CONFIG_QLA3XXX is not set
# CONFIG_ATL1 is not set
# CONFIG_ATL1E is not set
# CONFIG_ATL1C is not set
# CONFIG_JME is not set
# CONFIG_NETDEV_10000 is not set
# CONFIG_TR is not set

#
# Wireless LAN
#
# CONFIG_WLAN_PRE80211 is not set
# CONFIG_WLAN_80211 is not set

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#

#
# USB Network Adapters
#
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
CONFIG_USB_USBNET=m
CONFIG_USB_NET_AX8817X=m
CONFIG_USB_NET_CDCETHER=m
CONFIG_USB_NET_DM9601=m
# CONFIG_USB_NET_SMSC95XX is not set
CONFIG_USB_NET_GL620A=m
CONFIG_USB_NET_NET1080=m
# CONFIG_USB_NET_PLUSB is not set
# CONFIG_USB_NET_MCS7830 is not set
# CONFIG_USB_NET_RNDIS_HOST is not set
CONFIG_USB_NET_CDC_SUBSET=m
CONFIG_USB_ALI_M5632=y
CONFIG_USB_AN2720=y
CONFIG_USB_BELKIN=y
CONFIG_USB_ARMLINUX=y
CONFIG_USB_EPSON2888=y
CONFIG_USB_KC2190=y
# CONFIG_USB_NET_ZAURUS is not set
CONFIG_NET_PCMCIA=y
# CONFIG_PCMCIA_3C589 is not set
# CONFIG_PCMCIA_3C574 is not set
# CONFIG_PCMCIA_FMVJ18X is not set
CONFIG_PCMCIA_PCNET=m
CONFIG_PCMCIA_NMCLAN=m
CONFIG_PCMCIA_SMC91C92=m
# CONFIG_PCMCIA_XIRC2PS is not set
# CONFIG_PCMCIA_AXNET is not set
# CONFIG_WAN is not set
CONFIG_FDDI=y
# CONFIG_DEFXX is not set
# CONFIG_SKFP is not set
# CONFIG_HIPPI is not set
CONFIG_PLIP=m
CONFIG_PPP=m
CONFIG_PPP_MULTILINK=y
CONFIG_PPP_FILTER=y
CONFIG_PPP_ASYNC=m
CONFIG_PPP_SYNC_TTY=m
CONFIG_PPP_DEFLATE=m
# CONFIG_PPP_BSDCOMP is not set
# CONFIG_PPP_MPPE is not set
CONFIG_PPPOE=m
# CONFIG_PPPOL2TP is not set
CONFIG_SLIP=m
CONFIG_SLIP_COMPRESSED=y
CONFIG_SLHC=m
CONFIG_SLIP_SMART=y
# CONFIG_SLIP_MODE_SLIP6 is not set
CONFIG_NET_FC=y
CONFIG_NETCONSOLE=m
# CONFIG_NETCONSOLE_DYNAMIC is not set
CONFIG_NETPOLL=y
CONFIG_NETPOLL_TRAP=y
CONFIG_NET_POLL_CONTROLLER=y
# CONFIG_ISDN is not set
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_FF_MEMLESS=y
CONFIG_INPUT_POLLDEV=m

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
# CONFIG_MOUSE_PS2_ELANTECH is not set
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
CONFIG_MOUSE_SERIAL=m
CONFIG_MOUSE_APPLETOUCH=m
# CONFIG_MOUSE_BCM5974 is not set
# CONFIG_MOUSE_INPORT is not set
# CONFIG_MOUSE_LOGIBM is not set
# CONFIG_MOUSE_PC110PAD is not set
CONFIG_MOUSE_VSXXXAA=m
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
CONFIG_INPUT_MISC=y
# CONFIG_INPUT_PCSPKR is not set
# CONFIG_INPUT_APANEL is not set
# CONFIG_INPUT_WISTRON_BTNS is not set
# CONFIG_INPUT_ATLAS_BTNS is not set
# CONFIG_INPUT_ATI_REMOTE is not set
# CONFIG_INPUT_ATI_REMOTE2 is not set
# CONFIG_INPUT_KEYSPAN_REMOTE is not set
# CONFIG_INPUT_POWERMATE is not set
# CONFIG_INPUT_YEALINK is not set
# CONFIG_INPUT_CM109 is not set
CONFIG_INPUT_UINPUT=m

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PARKBD is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
CONFIG_SERIO_RAW=m
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_DEVKMEM=y
CONFIG_SERIAL_NONSTANDARD=y
# CONFIG_COMPUTONE is not set
CONFIG_ROCKETPORT=m
CONFIG_CYCLADES=m
# CONFIG_CYZ_INTR is not set
# CONFIG_DIGIEPCA is not set
# CONFIG_MOXA_INTELLIO is not set
# CONFIG_MOXA_SMARTIO is not set
# CONFIG_ISI is not set
# CONFIG_SYNCLINK is not set
CONFIG_SYNCLINKMP=m
CONFIG_SYNCLINK_GT=m
# CONFIG_N_HDLC is not set
# CONFIG_RISCOM8 is not set
# CONFIG_SPECIALIX is not set
# CONFIG_SX is not set
# CONFIG_RIO is not set
# CONFIG_STALDRV is not set
# CONFIG_NOZOMI is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_CS=m
CONFIG_SERIAL_8250_NR_UARTS=32
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
# CONFIG_SERIAL_8250_FOURPORT is not set
# CONFIG_SERIAL_8250_ACCENT is not set
# CONFIG_SERIAL_8250_BOCA is not set
# CONFIG_SERIAL_8250_EXAR_ST16C554 is not set
# CONFIG_SERIAL_8250_HUB6 is not set
CONFIG_SERIAL_8250_SHARE_IRQ=y
CONFIG_SERIAL_8250_DETECT_IRQ=y
CONFIG_SERIAL_8250_RSA=y

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_SERIAL_JSM=m
CONFIG_UNIX98_PTYS=y
# CONFIG_DEVPTS_MULTIPLE_INSTANCES is not set
# CONFIG_LEGACY_PTYS is not set
CONFIG_PRINTER=m
CONFIG_LP_CONSOLE=y
CONFIG_PPDEV=m
CONFIG_IPMI_HANDLER=m
# CONFIG_IPMI_PANIC_EVENT is not set
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m
CONFIG_HW_RANDOM=y
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
CONFIG_HW_RANDOM_INTEL=m
CONFIG_HW_RANDOM_AMD=m
CONFIG_HW_RANDOM_GEODE=m
CONFIG_HW_RANDOM_VIA=m
CONFIG_NVRAM=y
CONFIG_RTC=y
# CONFIG_DTLK is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
# CONFIG_SONYPI is not set

#
# PCMCIA character devices
#
# CONFIG_SYNCLINK_CS is not set
CONFIG_CARDMAN_4000=m
CONFIG_CARDMAN_4040=m
# CONFIG_IPWIRELESS is not set
CONFIG_MWAVE=m
# CONFIG_PC8736x_GPIO is not set
# CONFIG_NSC_GPIO is not set
# CONFIG_CS5535_GPIO is not set
# CONFIG_RAW_DRIVER is not set
CONFIG_HPET=y
# CONFIG_HPET_MMAP is not set
CONFIG_HANGCHECK_TIMER=m
# CONFIG_TCG_TPM is not set
# CONFIG_TELCLOCK is not set
CONFIG_DEVPORT=y
CONFIG_I2C=m
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_ALGOBIT=m
CONFIG_I2C_ALGOPCA=m

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
CONFIG_I2C_ALI1535=m
CONFIG_I2C_ALI1563=m
CONFIG_I2C_ALI15X3=m
CONFIG_I2C_AMD756=m
CONFIG_I2C_AMD756_S4882=m
# CONFIG_I2C_AMD8111 is not set
CONFIG_I2C_I801=m
# CONFIG_I2C_ISCH is not set
CONFIG_I2C_PIIX4=m
CONFIG_I2C_NFORCE2=m
# CONFIG_I2C_NFORCE2_S4985 is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
CONFIG_I2C_VIA=m
CONFIG_I2C_VIAPRO=m

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_OCORES is not set
CONFIG_I2C_SIMTEC=m

#
# External I2C/SMBus adapter drivers
#
CONFIG_I2C_PARPORT=m
CONFIG_I2C_PARPORT_LIGHT=m
# CONFIG_I2C_TAOS_EVM is not set
# CONFIG_I2C_TINY_USB is not set

#
# Graphics adapter I2C/DDC channel drivers
#
CONFIG_I2C_VOODOO3=m

#
# Other I2C/SMBus bus drivers
#
CONFIG_I2C_PCA_ISA=m
# CONFIG_I2C_PCA_PLATFORM is not set
CONFIG_I2C_STUB=m
# CONFIG_SCx200_ACB is not set

#
# Miscellaneous I2C Chip support
#
# CONFIG_DS1682 is not set
# CONFIG_SENSORS_PCF8574 is not set
# CONFIG_PCF8575 is not set
# CONFIG_SENSORS_PCA9539 is not set
CONFIG_SENSORS_MAX6875=m
# CONFIG_SENSORS_TSL2550 is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_I2C_DEBUG_CHIP is not set
# CONFIG_SPI is not set
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
# CONFIG_GPIOLIB is not set
# CONFIG_W1 is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
# CONFIG_PDA_POWER is not set
# CONFIG_BATTERY_DS2760 is not set
# CONFIG_BATTERY_BQ27x00 is not set
CONFIG_HWMON=m
CONFIG_HWMON_VID=m
# CONFIG_SENSORS_ABITUGURU is not set
# CONFIG_SENSORS_ABITUGURU3 is not set
# CONFIG_SENSORS_AD7414 is not set
CONFIG_SENSORS_AD7418=m
# CONFIG_SENSORS_ADM1021 is not set
# CONFIG_SENSORS_ADM1025 is not set
# CONFIG_SENSORS_ADM1026 is not set
# CONFIG_SENSORS_ADM1029 is not set
# CONFIG_SENSORS_ADM1031 is not set
# CONFIG_SENSORS_ADM9240 is not set
# CONFIG_SENSORS_ADT7462 is not set
# CONFIG_SENSORS_ADT7470 is not set
# CONFIG_SENSORS_ADT7473 is not set
# CONFIG_SENSORS_ADT7475 is not set
# CONFIG_SENSORS_K8TEMP is not set
# CONFIG_SENSORS_ASB100 is not set
# CONFIG_SENSORS_ATK0110 is not set
# CONFIG_SENSORS_ATXP1 is not set
# CONFIG_SENSORS_DS1621 is not set
# CONFIG_SENSORS_I5K_AMB is not set
# CONFIG_SENSORS_F71805F is not set
# CONFIG_SENSORS_F71882FG is not set
# CONFIG_SENSORS_F75375S is not set
# CONFIG_SENSORS_FSCHER is not set
# CONFIG_SENSORS_FSCPOS is not set
# CONFIG_SENSORS_FSCHMD is not set
# CONFIG_SENSORS_G760A is not set
# CONFIG_SENSORS_GL518SM is not set
# CONFIG_SENSORS_GL520SM is not set
CONFIG_SENSORS_CORETEMP=m
# CONFIG_SENSORS_IBMAEM is not set
# CONFIG_SENSORS_IBMPEX is not set
# CONFIG_SENSORS_IT87 is not set
# CONFIG_SENSORS_LM63 is not set
# CONFIG_SENSORS_LM75 is not set
# CONFIG_SENSORS_LM77 is not set
# CONFIG_SENSORS_LM78 is not set
# CONFIG_SENSORS_LM80 is not set
# CONFIG_SENSORS_LM83 is not set
# CONFIG_SENSORS_LM85 is not set
# CONFIG_SENSORS_LM87 is not set
# CONFIG_SENSORS_LM90 is not set
# CONFIG_SENSORS_LM92 is not set
# CONFIG_SENSORS_LM93 is not set
# CONFIG_SENSORS_LTC4215 is not set
# CONFIG_SENSORS_LTC4245 is not set
# CONFIG_SENSORS_LM95241 is not set
# CONFIG_SENSORS_MAX1619 is not set
# CONFIG_SENSORS_MAX6650 is not set
# CONFIG_SENSORS_PC87360 is not set
# CONFIG_SENSORS_PC87427 is not set
# CONFIG_SENSORS_PCF8591 is not set
CONFIG_SENSORS_SIS5595=m
# CONFIG_SENSORS_DME1737 is not set
# CONFIG_SENSORS_SMSC47M1 is not set
# CONFIG_SENSORS_SMSC47M192 is not set
# CONFIG_SENSORS_SMSC47B397 is not set
# CONFIG_SENSORS_ADS7828 is not set
# CONFIG_SENSORS_THMC50 is not set
CONFIG_SENSORS_VIA686A=m
CONFIG_SENSORS_VT1211=m
CONFIG_SENSORS_VT8231=m
# CONFIG_SENSORS_W83781D is not set
# CONFIG_SENSORS_W83791D is not set
# CONFIG_SENSORS_W83792D is not set
# CONFIG_SENSORS_W83793 is not set
# CONFIG_SENSORS_W83L785TS is not set
# CONFIG_SENSORS_W83L786NG is not set
# CONFIG_SENSORS_W83627HF is not set
# CONFIG_SENSORS_W83627EHF is not set
CONFIG_SENSORS_HDAPS=m
# CONFIG_SENSORS_LIS3LV02D is not set
# CONFIG_SENSORS_APPLESMC is not set
# CONFIG_HWMON_DEBUG_CHIP is not set
CONFIG_THERMAL=y
# CONFIG_WATCHDOG is not set
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
CONFIG_SSB=m
CONFIG_SSB_SPROM=y
CONFIG_SSB_PCIHOST_POSSIBLE=y
CONFIG_SSB_PCIHOST=y
# CONFIG_SSB_B43_PCI_BRIDGE is not set
CONFIG_SSB_PCMCIAHOST_POSSIBLE=y
CONFIG_SSB_PCMCIAHOST=y
# CONFIG_SSB_DEBUG is not set
CONFIG_SSB_DRIVER_PCICORE_POSSIBLE=y
CONFIG_SSB_DRIVER_PCICORE=y

#
# Multifunction device drivers
#
# CONFIG_MFD_CORE is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_MFD_TMIO is not set
# CONFIG_MFD_WM8400 is not set
# CONFIG_MFD_WM8350_I2C is not set
# CONFIG_MFD_PCF50633 is not set
# CONFIG_REGULATOR is not set

#
# Multimedia devices
#

#
# Multimedia core support
#
CONFIG_VIDEO_DEV=m
CONFIG_VIDEO_V4L2_COMMON=m
CONFIG_VIDEO_ALLOW_V4L1=y
CONFIG_VIDEO_V4L1_COMPAT=y
# CONFIG_DVB_CORE is not set
CONFIG_VIDEO_MEDIA=m

#
# Multimedia drivers
#
# CONFIG_MEDIA_ATTACH is not set
CONFIG_MEDIA_TUNER=m
# CONFIG_MEDIA_TUNER_CUSTOMISE is not set
CONFIG_MEDIA_TUNER_SIMPLE=m
CONFIG_MEDIA_TUNER_TDA8290=m
CONFIG_MEDIA_TUNER_TDA9887=m
CONFIG_MEDIA_TUNER_TEA5761=m
CONFIG_MEDIA_TUNER_TEA5767=m
CONFIG_MEDIA_TUNER_MT20XX=m
CONFIG_MEDIA_TUNER_XC2028=m
CONFIG_MEDIA_TUNER_XC5000=m
CONFIG_MEDIA_TUNER_MC44S803=m
CONFIG_VIDEO_V4L2=m
CONFIG_VIDEO_V4L1=m
CONFIG_VIDEOBUF_GEN=m
CONFIG_VIDEOBUF_DMA_SG=m
CONFIG_VIDEO_BTCX=m
CONFIG_VIDEO_IR=m
CONFIG_VIDEO_TVEEPROM=m
CONFIG_VIDEO_TUNER=m
CONFIG_VIDEO_CAPTURE_DRIVERS=y
# CONFIG_VIDEO_ADV_DEBUG is not set
# CONFIG_VIDEO_FIXED_MINOR_RANGES is not set
# CONFIG_VIDEO_HELPER_CHIPS_AUTO is not set
CONFIG_VIDEO_IR_I2C=m

#
# Encoders/decoders and other helper chips
#

#
# Audio decoders
#
CONFIG_VIDEO_TVAUDIO=m
CONFIG_VIDEO_TDA7432=m
CONFIG_VIDEO_TDA9840=m
CONFIG_VIDEO_TDA9875=m
CONFIG_VIDEO_TEA6415C=m
CONFIG_VIDEO_TEA6420=m
CONFIG_VIDEO_MSP3400=m
# CONFIG_VIDEO_CS5345 is not set
CONFIG_VIDEO_CS53L32A=m
CONFIG_VIDEO_M52790=m
CONFIG_VIDEO_TLV320AIC23B=m
CONFIG_VIDEO_WM8775=m
CONFIG_VIDEO_WM8739=m
CONFIG_VIDEO_VP27SMPX=m

#
# RDS decoders
#
# CONFIG_VIDEO_SAA6588 is not set

#
# Video decoders
#
CONFIG_VIDEO_BT819=m
CONFIG_VIDEO_BT856=m
CONFIG_VIDEO_BT866=m
CONFIG_VIDEO_KS0127=m
CONFIG_VIDEO_OV7670=m
# CONFIG_VIDEO_TCM825X is not set
CONFIG_VIDEO_SAA7110=m
CONFIG_VIDEO_SAA711X=m
CONFIG_VIDEO_SAA717X=m
CONFIG_VIDEO_SAA7191=m
# CONFIG_VIDEO_TVP514X is not set
CONFIG_VIDEO_TVP5150=m
CONFIG_VIDEO_VPX3220=m

#
# Video and audio decoders
#
CONFIG_VIDEO_CX25840=m

#
# MPEG video encoders
#
CONFIG_VIDEO_CX2341X=m

#
# Video encoders
#
CONFIG_VIDEO_SAA7127=m
CONFIG_VIDEO_SAA7185=m
CONFIG_VIDEO_ADV7170=m
CONFIG_VIDEO_ADV7175=m

#
# Video improvement chips
#
CONFIG_VIDEO_UPD64031A=m
CONFIG_VIDEO_UPD64083=m
# CONFIG_VIDEO_VIVI is not set
CONFIG_VIDEO_BT848=m
# CONFIG_VIDEO_PMS is not set
# CONFIG_VIDEO_BWQCAM is not set
# CONFIG_VIDEO_CQCAM is not set
# CONFIG_VIDEO_W9966 is not set
CONFIG_VIDEO_CPIA=m
CONFIG_VIDEO_CPIA_PP=m
CONFIG_VIDEO_CPIA_USB=m
CONFIG_VIDEO_CPIA2=m
# CONFIG_VIDEO_SAA5246A is not set
# CONFIG_VIDEO_SAA5249 is not set
# CONFIG_VIDEO_STRADIS is not set
CONFIG_VIDEO_ZORAN=m
# CONFIG_VIDEO_ZORAN_DC30 is not set
CONFIG_VIDEO_ZORAN_ZR36060=m
CONFIG_VIDEO_ZORAN_BUZ=m
# CONFIG_VIDEO_ZORAN_DC10 is not set
CONFIG_VIDEO_ZORAN_LML33=m
# CONFIG_VIDEO_ZORAN_LML33R10 is not set
# CONFIG_VIDEO_ZORAN_AVS6EYES is not set
# CONFIG_VIDEO_SAA7134 is not set
# CONFIG_VIDEO_MXB is not set
# CONFIG_VIDEO_HEXIUM_ORION is not set
# CONFIG_VIDEO_HEXIUM_GEMINI is not set
# CONFIG_VIDEO_CX88 is not set
CONFIG_VIDEO_IVTV=m
# CONFIG_VIDEO_FB_IVTV is not set
# CONFIG_VIDEO_CAFE_CCIC is not set
# CONFIG_SOC_CAMERA is not set
# CONFIG_V4L_USB_DRIVERS is not set
CONFIG_RADIO_ADAPTERS=y
# CONFIG_RADIO_CADET is not set
# CONFIG_RADIO_RTRACK is not set
# CONFIG_RADIO_RTRACK2 is not set
# CONFIG_RADIO_AZTECH is not set
# CONFIG_RADIO_GEMTEK is not set
# CONFIG_RADIO_GEMTEK_PCI is not set
CONFIG_RADIO_MAXIRADIO=m
CONFIG_RADIO_MAESTRO=m
# CONFIG_RADIO_SF16FMI is not set
# CONFIG_RADIO_SF16FMR2 is not set
# CONFIG_RADIO_TERRATEC is not set
# CONFIG_RADIO_TRUST is not set
# CONFIG_RADIO_TYPHOON is not set
# CONFIG_RADIO_ZOLTRIX is not set
CONFIG_USB_DSBR=m
# CONFIG_USB_SI470X is not set
# CONFIG_USB_MR800 is not set
# CONFIG_RADIO_TEA5764 is not set
CONFIG_DAB=y
CONFIG_USB_DABUSB=m

#
# Graphics support
#
CONFIG_AGP=y
CONFIG_AGP_ALI=y
CONFIG_AGP_ATI=y
# CONFIG_AGP_AMD is not set
# CONFIG_AGP_AMD64 is not set
CONFIG_AGP_INTEL=y
CONFIG_AGP_NVIDIA=y
CONFIG_AGP_SIS=y
# CONFIG_AGP_SWORKS is not set
CONFIG_AGP_VIA=y
CONFIG_AGP_EFFICEON=y
CONFIG_DRM=m
CONFIG_DRM_TDFX=m
CONFIG_DRM_R128=m
CONFIG_DRM_RADEON=m
CONFIG_DRM_I810=m
CONFIG_DRM_I830=m
CONFIG_DRM_I915=m
# CONFIG_DRM_I915_KMS is not set
# CONFIG_DRM_MGA is not set
CONFIG_DRM_SIS=m
# CONFIG_DRM_VIA is not set
# CONFIG_DRM_SAVAGE is not set
CONFIG_VGASTATE=m
CONFIG_VIDEO_OUTPUT_CONTROL=m
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_DDC=m
CONFIG_FB_BOOT_VESA_SUPPORT=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
# CONFIG_FB_SYS_FILLRECT is not set
# CONFIG_FB_SYS_COPYAREA is not set
# CONFIG_FB_SYS_IMAGEBLIT is not set
# CONFIG_FB_FOREIGN_ENDIAN is not set
# CONFIG_FB_SYS_FOPS is not set
CONFIG_FB_SVGALIB=m
# CONFIG_FB_MACMODES is not set
CONFIG_FB_BACKLIGHT=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ARC is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_VGA16 is not set
CONFIG_FB_VESA=y
# CONFIG_FB_EFI is not set
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_S1D13XXX is not set
CONFIG_FB_NVIDIA=m
CONFIG_FB_NVIDIA_I2C=y
# CONFIG_FB_NVIDIA_DEBUG is not set
CONFIG_FB_NVIDIA_BACKLIGHT=y
# CONFIG_FB_RIVA is not set
# CONFIG_FB_I810 is not set
# CONFIG_FB_LE80578 is not set
# CONFIG_FB_INTEL is not set
# CONFIG_FB_MATROX is not set
CONFIG_FB_RADEON=m
CONFIG_FB_RADEON_I2C=y
CONFIG_FB_RADEON_BACKLIGHT=y
# CONFIG_FB_RADEON_DEBUG is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
CONFIG_FB_S3=m
CONFIG_FB_SAVAGE=m
CONFIG_FB_SAVAGE_I2C=y
CONFIG_FB_SAVAGE_ACCEL=y
# CONFIG_FB_SIS is not set
# CONFIG_FB_VIA is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
CONFIG_FB_TRIDENT=m
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_GEODE is not set
# CONFIG_FB_VIRTUAL is not set
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
# CONFIG_FB_BROADSHEET is not set
CONFIG_BACKLIGHT_LCD_SUPPORT=y
CONFIG_LCD_CLASS_DEVICE=m
# CONFIG_LCD_ILI9320 is not set
# CONFIG_LCD_PLATFORM is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
CONFIG_BACKLIGHT_GENERIC=y
CONFIG_BACKLIGHT_PROGEAR=m
# CONFIG_BACKLIGHT_MBP_NVIDIA is not set
# CONFIG_BACKLIGHT_SAHARA is not set

#
# Display device support
#
CONFIG_DISPLAY_SUPPORT=m

#
# Display hardware drivers
#

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=64
# CONFIG_MDA_CONSOLE is not set
CONFIG_DUMMY_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
# CONFIG_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
CONFIG_LOGO_LINUX_CLUT224=y
# CONFIG_SOUND is not set
# CONFIG_HID_SUPPORT is not set
CONFIG_USB_SUPPORT=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set
# CONFIG_USB_ANNOUNCE_NEW_DEVICES is not set

#
# Miscellaneous USB options
#
CONFIG_USB_DEVICEFS=y
# CONFIG_USB_DEVICE_CLASS is not set
# CONFIG_USB_DYNAMIC_MINORS is not set
CONFIG_USB_SUSPEND=y
# CONFIG_USB_OTG is not set
# CONFIG_USB_MON is not set
# CONFIG_USB_WUSB is not set
# CONFIG_USB_WUSB_CBAF is not set

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
CONFIG_USB_EHCI_HCD=m
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_EHCI_TT_NEWSCHED=y
# CONFIG_USB_OXU210HP_HCD is not set
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_ISP1760_HCD is not set
CONFIG_USB_OHCI_HCD=m
# CONFIG_USB_OHCI_HCD_SSB is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_DESC is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_MMIO is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=m
# CONFIG_USB_U132_HCD is not set
# CONFIG_USB_SL811_HCD is not set
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_WHCI_HCD is not set
# CONFIG_USB_HWA_HCD is not set

#
# USB Device Class drivers
#
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set
# CONFIG_USB_WDM is not set
# CONFIG_USB_TMC is not set

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
#

#
# also be needed; see USB_STORAGE Help for more info
#
CONFIG_USB_STORAGE=m
# CONFIG_USB_STORAGE_DEBUG is not set
CONFIG_USB_STORAGE_DATAFAB=m
CONFIG_USB_STORAGE_FREECOM=m
# CONFIG_USB_STORAGE_ISD200 is not set
CONFIG_USB_STORAGE_USBAT=m
# CONFIG_USB_STORAGE_SDDR09 is not set
# CONFIG_USB_STORAGE_SDDR55 is not set
# CONFIG_USB_STORAGE_JUMPSHOT is not set
# CONFIG_USB_STORAGE_ALAUDA is not set
# CONFIG_USB_STORAGE_ONETOUCH is not set
# CONFIG_USB_STORAGE_KARMA is not set
# CONFIG_USB_STORAGE_CYPRESS_ATACB is not set
# CONFIG_USB_LIBUSUAL is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set

#
# USB port drivers
#
# CONFIG_USB_USS720 is not set
CONFIG_USB_SERIAL=m
CONFIG_USB_EZUSB=y
CONFIG_USB_SERIAL_GENERIC=y
# CONFIG_USB_SERIAL_AIRCABLE is not set
# CONFIG_USB_SERIAL_ARK3116 is not set
# CONFIG_USB_SERIAL_BELKIN is not set
# CONFIG_USB_SERIAL_CH341 is not set
# CONFIG_USB_SERIAL_WHITEHEAT is not set
# CONFIG_USB_SERIAL_DIGI_ACCELEPORT is not set
# CONFIG_USB_SERIAL_CP210X is not set
# CONFIG_USB_SERIAL_CYPRESS_M8 is not set
CONFIG_USB_SERIAL_EMPEG=m
# CONFIG_USB_SERIAL_FTDI_SIO is not set
# CONFIG_USB_SERIAL_FUNSOFT is not set
# CONFIG_USB_SERIAL_VISOR is not set
# CONFIG_USB_SERIAL_IPAQ is not set
# CONFIG_USB_SERIAL_IR is not set
# CONFIG_USB_SERIAL_EDGEPORT is not set
# CONFIG_USB_SERIAL_EDGEPORT_TI is not set
# CONFIG_USB_SERIAL_GARMIN is not set
# CONFIG_USB_SERIAL_IPW is not set
# CONFIG_USB_SERIAL_IUU is not set
# CONFIG_USB_SERIAL_KEYSPAN_PDA is not set
CONFIG_USB_SERIAL_KEYSPAN=m
# CONFIG_USB_SERIAL_KEYSPAN_MPR is not set
# CONFIG_USB_SERIAL_KEYSPAN_USA28 is not set
# CONFIG_USB_SERIAL_KEYSPAN_USA28X is not set
# CONFIG_USB_SERIAL_KEYSPAN_USA28XA is not set
# CONFIG_USB_SERIAL_KEYSPAN_USA28XB is not set
# CONFIG_USB_SERIAL_KEYSPAN_USA19 is not set
# CONFIG_USB_SERIAL_KEYSPAN_USA18X is not set
# CONFIG_USB_SERIAL_KEYSPAN_USA19W is not set
CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y
CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y
CONFIG_USB_SERIAL_KEYSPAN_USA49W=y
CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y
# CONFIG_USB_SERIAL_KLSI is not set
# CONFIG_USB_SERIAL_KOBIL_SCT is not set
# CONFIG_USB_SERIAL_MCT_U232 is not set
# CONFIG_USB_SERIAL_MOS7720 is not set
# CONFIG_USB_SERIAL_MOS7840 is not set
# CONFIG_USB_SERIAL_MOTOROLA is not set
# CONFIG_USB_SERIAL_NAVMAN is not set
# CONFIG_USB_SERIAL_PL2303 is not set
# CONFIG_USB_SERIAL_OTI6858 is not set
# CONFIG_USB_SERIAL_QUALCOMM is not set
# CONFIG_USB_SERIAL_SPCP8X5 is not set
# CONFIG_USB_SERIAL_HP4X is not set
# CONFIG_USB_SERIAL_SAFE is not set
# CONFIG_USB_SERIAL_SIEMENS_MPI is not set
# CONFIG_USB_SERIAL_SIERRAWIRELESS is not set
# CONFIG_USB_SERIAL_SYMBOL is not set
# CONFIG_USB_SERIAL_TI is not set
# CONFIG_USB_SERIAL_CYBERJACK is not set
# CONFIG_USB_SERIAL_XIRCOM is not set
# CONFIG_USB_SERIAL_OPTION is not set
# CONFIG_USB_SERIAL_OMNINET is not set
# CONFIG_USB_SERIAL_OPTICON is not set
# CONFIG_USB_SERIAL_DEBUG is not set

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_ADUTUX is not set
# CONFIG_USB_SEVSEG is not set
# CONFIG_USB_RIO500 is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_BERRY_CHARGE is not set
# CONFIG_USB_LED is not set
# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_IDMOUSE is not set
CONFIG_USB_FTDI_ELAN=m
# CONFIG_USB_APPLEDISPLAY is not set
# CONFIG_USB_SISUSBVGA is not set
# CONFIG_USB_LD is not set
# CONFIG_USB_TRANCEVIBRATOR is not set
# CONFIG_USB_IOWARRIOR is not set
# CONFIG_USB_TEST is not set
# CONFIG_USB_ISIGHTFW is not set
# CONFIG_USB_VST is not set
# CONFIG_USB_GADGET is not set

#
# OTG and related infrastructure
#
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_UWB is not set
# CONFIG_MMC is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y

#
# LED drivers
#
# CONFIG_LEDS_ALIX2 is not set
# CONFIG_LEDS_PCA9532 is not set
# CONFIG_LEDS_LP5521 is not set
# CONFIG_LEDS_CLEVO_MAIL is not set
# CONFIG_LEDS_PCA955X is not set
# CONFIG_LEDS_BD2802 is not set

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=m
# CONFIG_LEDS_TRIGGER_HEARTBEAT is not set
# CONFIG_LEDS_TRIGGER_BACKLIGHT is not set
# CONFIG_LEDS_TRIGGER_DEFAULT_ON is not set

#
# iptables trigger is under Netfilter config (LED target)
#
# CONFIG_ACCESSIBILITY is not set
# CONFIG_INFINIBAND is not set
# CONFIG_EDAC is not set
# CONFIG_RTC_CLASS is not set
# CONFIG_DMADEVICES is not set
# CONFIG_AUXDISPLAY is not set
CONFIG_UIO=m
# CONFIG_UIO_CIF is not set
# CONFIG_UIO_PDRV is not set
# CONFIG_UIO_PDRV_GENIRQ is not set
# CONFIG_UIO_SMX is not set
# CONFIG_UIO_AEC is not set
# CONFIG_UIO_SERCOS3 is not set
# CONFIG_STAGING is not set
CONFIG_X86_PLATFORM_DEVICES=y
# CONFIG_ASUS_LAPTOP is not set
# CONFIG_FUJITSU_LAPTOP is not set
# CONFIG_TC1100_WMI is not set
# CONFIG_MSI_LAPTOP is not set
# CONFIG_PANASONIC_LAPTOP is not set
# CONFIG_COMPAL_LAPTOP is not set
# CONFIG_THINKPAD_ACPI is not set
# CONFIG_INTEL_MENLOW is not set
# CONFIG_EEEPC_LAPTOP is not set
# CONFIG_ACPI_WMI is not set
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_TOSHIBA is not set

#
# Firmware Drivers
#
CONFIG_EDD=m
# CONFIG_EDD_OFF is not set
CONFIG_FIRMWARE_MEMMAP=y
CONFIG_EFI_VARS=y
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set
CONFIG_DMIID=y
# CONFIG_ISCSI_IBFT_FIND is not set

#
# File systems
#
CONFIG_EXT2_FS=m
# CONFIG_EXT2_FS_XATTR is not set
CONFIG_EXT2_FS_XIP=y
CONFIG_EXT3_FS=m
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
CONFIG_EXT4_FS=m
CONFIG_EXT4DEV_COMPAT=y
CONFIG_EXT4_FS_XATTR=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
CONFIG_FS_XIP=y
CONFIG_JBD=m
# CONFIG_JBD_DEBUG is not set
CONFIG_JBD2=m
# CONFIG_JBD2_DEBUG is not set
CONFIG_FS_MBCACHE=m
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_FILE_LOCKING=y
# CONFIG_XFS_FS is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
# CONFIG_BTRFS_FS is not set
CONFIG_DNOTIFY=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
# CONFIG_QUOTA is not set
# CONFIG_AUTOFS_FS is not set
CONFIG_AUTOFS4_FS=m
CONFIG_FUSE_FS=m
CONFIG_GENERIC_ACL=y

#
# Caches
#
# CONFIG_FSCACHE is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=y
CONFIG_UDF_NLS=y

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="ascii"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_CONFIGFS_FS=m
CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
CONFIG_CRAMFS=m
# CONFIG_SQUASHFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
CONFIG_ROMFS_FS=m
CONFIG_ROMFS_BACKED_BY_BLOCK=y
# CONFIG_ROMFS_BACKED_BY_MTD is not set
# CONFIG_ROMFS_BACKED_BY_BOTH is not set
CONFIG_ROMFS_ON_BLOCK=y
# CONFIG_SYSV_FS is not set
CONFIG_UFS_FS=m
# CONFIG_UFS_FS_WRITE is not set
# CONFIG_UFS_DEBUG is not set
# CONFIG_NILFS2_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
# CONFIG_NFSD is not set
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_NFS_ACL_SUPPORT=m
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=m
CONFIG_SUNRPC_GSS=m
CONFIG_RPCSEC_GSS_KRB5=m
# CONFIG_RPCSEC_GSS_SPKM3 is not set
# CONFIG_SMB_FS is not set
# CONFIG_CIFS is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_OSF_PARTITION is not set
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
# CONFIG_MAC_PARTITION is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
# CONFIG_MINIX_SUBPARTITION is not set
# CONFIG_SOLARIS_X86_PARTITION is not set
# CONFIG_UNIXWARE_DISKLABEL is not set
# CONFIG_LDM_PARTITION is not set
# CONFIG_SGI_PARTITION is not set
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
# CONFIG_KARMA_PARTITION is not set
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=y
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_CODEPAGE_852=m
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
CONFIG_NLS_CODEPAGE_863=m
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
CONFIG_NLS_CODEPAGE_936=m
CONFIG_NLS_CODEPAGE_950=m
CONFIG_NLS_CODEPAGE_932=m
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
CONFIG_NLS_ISO8859_8=m
CONFIG_NLS_CODEPAGE_1250=m
CONFIG_NLS_CODEPAGE_1251=m
CONFIG_NLS_ASCII=y
# CONFIG_NLS_ISO8859_1 is not set
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
CONFIG_NLS_UTF8=m
# CONFIG_DLM is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
# CONFIG_ENABLE_WARN_DEPRECATED is not set
# CONFIG_ENABLE_MUST_CHECK is not set
CONFIG_FRAME_WARN=1024
CONFIG_MAGIC_SYSRQ=y
# CONFIG_UNUSED_SYMBOLS is not set
CONFIG_DEBUG_FS=y
CONFIG_HEADERS_CHECK=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SHIRQ=y
CONFIG_DETECT_SOFTLOCKUP=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
CONFIG_DETECT_HUNG_TASK=y
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
CONFIG_SCHED_DEBUG=y
CONFIG_SCHEDSTATS=y
CONFIG_TIMER_STATS=y
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set
CONFIG_DEBUG_PREEMPT=y
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
# CONFIG_LOCK_STAT is not set
CONFIG_DEBUG_LOCKDEP=y
CONFIG_TRACE_IRQFLAGS=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_HIGHMEM=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_VIRTUAL is not set
# CONFIG_DEBUG_WRITECOUNT is not set
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
CONFIG_ARCH_WANT_FRAME_POINTERS=y
CONFIG_FRAME_POINTER=y
# CONFIG_BOOT_PRINTK_DELAY is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_KPROBES_SANITY_TEST is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
# CONFIG_LKDTM is not set
# CONFIG_FAULT_INJECTION is not set
# CONFIG_LATENCYTOP is not set
CONFIG_SYSCTL_SYSCALL_CHECK=y
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FTRACE_NMI_ENTER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_FTRACE_SYSCALLS=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_RING_BUFFER=y
CONFIG_FTRACE_NMI_ENTER=y
CONFIG_TRACING=y
CONFIG_TRACING_SUPPORT=y

#
# Tracers
#
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_IRQSOFF_TRACER=y
CONFIG_PREEMPT_TRACER=y
CONFIG_SYSPROF_TRACER=y
CONFIG_SCHED_TRACER=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_EVENT_TRACER=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_BOOT_TRACER=y
# CONFIG_TRACE_BRANCH_PROFILING is not set
CONFIG_POWER_TRACER=y
CONFIG_STACK_TRACER=y
# CONFIG_KMEMTRACE is not set
CONFIG_WORKQUEUE_TRACER=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_FTRACE_MCOUNT_RECORD=y
# CONFIG_FTRACE_STARTUP_TEST is not set
CONFIG_MMIOTRACE=y
CONFIG_MMIOTRACE_TEST=m
# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
# CONFIG_FIREWIRE_OHCI_REMOTE_DMA is not set
# CONFIG_BUILD_DOCSRC is not set
# CONFIG_DYNAMIC_DEBUG is not set
# CONFIG_DMA_API_DEBUG is not set
CONFIG_SAMPLES=y
# CONFIG_SAMPLE_MARKERS is not set
# CONFIG_SAMPLE_TRACEPOINTS is not set
CONFIG_SAMPLE_KOBJECT=m
CONFIG_SAMPLE_KPROBES=m
CONFIG_SAMPLE_KRETPROBES=m
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
# CONFIG_STRICT_DEVMEM is not set
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
# CONFIG_EARLY_PRINTK_DBGP is not set
# CONFIG_DEBUG_STACKOVERFLOW is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_DEBUG_PER_CPU_MAPS is not set
# CONFIG_X86_PTDUMP is not set
CONFIG_DEBUG_RODATA=y
# CONFIG_DEBUG_RODATA_TEST is not set
# CONFIG_DEBUG_NX_TEST is not set
CONFIG_4KSTACKS=y
CONFIG_DOUBLEFAULT=y
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=0
# CONFIG_DEBUG_BOOT_PARAMS is not set
# CONFIG_CPA_DEBUG is not set
# CONFIG_OPTIMIZE_INLINING is not set

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY is not set
# CONFIG_SECURITYFS is not set
# CONFIG_SECURITY_FILE_CAPABILITIES is not set
# CONFIG_IMA is not set
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
# CONFIG_CRYPTO_FIPS is not set
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER=m
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_PCOMP=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
# CONFIG_CRYPTO_GF128MUL is not set
CONFIG_CRYPTO_NULL=m
CONFIG_CRYPTO_WORKQUEUE=y
# CONFIG_CRYPTO_CRYPTD is not set
# CONFIG_CRYPTO_AUTHENC is not set
# CONFIG_CRYPTO_TEST is not set

#
# Authenticated Encryption with Associated Data
#
# CONFIG_CRYPTO_CCM is not set
# CONFIG_CRYPTO_GCM is not set
# CONFIG_CRYPTO_SEQIV is not set

#
# Block modes
#
CONFIG_CRYPTO_CBC=m
# CONFIG_CRYPTO_CTR is not set
# CONFIG_CRYPTO_CTS is not set
# CONFIG_CRYPTO_ECB is not set
# CONFIG_CRYPTO_LRW is not set
# CONFIG_CRYPTO_PCBC is not set
# CONFIG_CRYPTO_XTS is not set

#
# Hash modes
#
# CONFIG_CRYPTO_HMAC is not set
# CONFIG_CRYPTO_XCBC is not set

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
# CONFIG_CRYPTO_CRC32C_INTEL is not set
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
# CONFIG_CRYPTO_MICHAEL_MIC is not set
# CONFIG_CRYPTO_RMD128 is not set
# CONFIG_CRYPTO_RMD160 is not set
# CONFIG_CRYPTO_RMD256 is not set
# CONFIG_CRYPTO_RMD320 is not set
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=m
# CONFIG_CRYPTO_SHA512 is not set
# CONFIG_CRYPTO_TGR192 is not set
# CONFIG_CRYPTO_WP512 is not set

#
# Ciphers
#
CONFIG_CRYPTO_AES=m
# CONFIG_CRYPTO_AES_586 is not set
# CONFIG_CRYPTO_ANUBIS is not set
# CONFIG_CRYPTO_ARC4 is not set
# CONFIG_CRYPTO_BLOWFISH is not set
# CONFIG_CRYPTO_CAMELLIA is not set
# CONFIG_CRYPTO_CAST5 is not set
# CONFIG_CRYPTO_CAST6 is not set
CONFIG_CRYPTO_DES=m
# CONFIG_CRYPTO_FCRYPT is not set
# CONFIG_CRYPTO_KHAZAD is not set
# CONFIG_CRYPTO_SALSA20 is not set
# CONFIG_CRYPTO_SALSA20_586 is not set
# CONFIG_CRYPTO_SEED is not set
# CONFIG_CRYPTO_SERPENT is not set
# CONFIG_CRYPTO_TEA is not set
# CONFIG_CRYPTO_TWOFISH is not set
# CONFIG_CRYPTO_TWOFISH_586 is not set

#
# Compression
#
# CONFIG_CRYPTO_DEFLATE is not set
# CONFIG_CRYPTO_ZLIB is not set
# CONFIG_CRYPTO_LZO is not set

#
# Random Number Generation
#
# CONFIG_CRYPTO_ANSI_CPRNG is not set
# CONFIG_CRYPTO_HW is not set
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
# CONFIG_VIRTUALIZATION is not set
CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_BITREVERSE=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_FIND_LAST_BIT=y
CONFIG_CRC_CCITT=m
CONFIG_CRC16=m
# CONFIG_CRC_T10DIF is not set
CONFIG_CRC_ITU_T=y
CONFIG_CRC32=y
# CONFIG_CRC7 is not set
# CONFIG_LIBCRC32C is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=m
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
CONFIG_NLATTR=y

[-- Attachment #3: dmesg.txt --]
[-- Type: text/plain, Size: 90566 bytes --]

Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.30-rc4-io (root-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)) #6 SMP PREEMPT Thu May 7 11:07:49 CST 2009
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  NSC Geode by NSC
  Cyrix CyrixInstead
  Centaur CentaurHauls
  Transmeta GenuineTMx86
  Transmeta TransmetaCPU
  UMC UMC UMC UMC
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
 BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003bff0000 (usable)
 BIOS-e820: 000000003bff0000 - 000000003bff3000 (ACPI NVS)
 BIOS-e820: 000000003bff3000 - 000000003c000000 (ACPI data)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
DMI 2.3 present.
Phoenix BIOS detected: BIOS may corrupt low RAM, working around it.
e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
last_pfn = 0x3bff0 max_arch_pfn = 0x100000
MTRR default type: uncachable
MTRR fixed ranges enabled:
  00000-9FFFF write-back
  A0000-BFFFF uncachable
  C0000-C7FFF write-protect
  C8000-FFFFF uncachable
MTRR variable ranges enabled:
  0 base 000000000 mask FC0000000 write-back
  1 base 03C000000 mask FFC000000 uncachable
  2 base 0D0000000 mask FF8000000 write-combining
  3 disabled
  4 disabled
  5 disabled
  6 disabled
  7 disabled
init_memory_mapping: 0000000000000000-00000000377fe000
 0000000000 - 0000400000 page 4k
 0000400000 - 0037400000 page 2M
 0037400000 - 00377fe000 page 4k
kernel direct mapping tables up to 377fe000 @ 10000-15000
RAMDISK: 37d0d000 - 37fefd69
Allocated new RAMDISK: 00100000 - 003e2d69
Move RAMDISK from 0000000037d0d000 - 0000000037fefd68 to 00100000 - 003e2d68
ACPI: RSDP 000f7560 00014 (v00 AWARD )
ACPI: RSDT 3bff3040 0002C (v01 AWARD  AWRDACPI 42302E31 AWRD 00000000)
ACPI: FACP 3bff30c0 00074 (v01 AWARD  AWRDACPI 42302E31 AWRD 00000000)
ACPI: DSDT 3bff3180 03ABC (v01 AWARD  AWRDACPI 00001000 MSFT 0100000E)
ACPI: FACS 3bff0000 00040
ACPI: APIC 3bff6c80 00084 (v01 AWARD  AWRDACPI 42302E31 AWRD 00000000)
ACPI: Local APIC address 0xfee00000
71MB HIGHMEM available.
887MB LOWMEM available.
  mapped low ram: 0 - 377fe000
  low ram: 0 - 377fe000
  node 0 low ram: 00000000 - 377fe000
  node 0 bootmap 00011000 - 00017f00
(9 early reservations) ==> bootmem [0000000000 - 00377fe000]
  #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
  #1 [0000001000 - 0000002000]    EX TRAMPOLINE ==> [0000001000 - 0000002000]
  #2 [0000006000 - 0000007000]       TRAMPOLINE ==> [0000006000 - 0000007000]
  #3 [0000400000 - 0000c6bd1c]    TEXT DATA BSS ==> [0000400000 - 0000c6bd1c]
  #4 [000009f400 - 0000100000]    BIOS reserved ==> [000009f400 - 0000100000]
  #5 [0000c6c000 - 0000c700ed]              BRK ==> [0000c6c000 - 0000c700ed]
  #6 [0000010000 - 0000011000]          PGTABLE ==> [0000010000 - 0000011000]
  #7 [0000100000 - 00003e2d69]      NEW RAMDISK ==> [0000100000 - 00003e2d69]
  #8 [0000011000 - 0000018000]          BOOTMAP ==> [0000011000 - 0000018000]
found SMP MP-table at [c00f5ad0] f5ad0
Zone PFN ranges:
  DMA      0x00000010 -> 0x00001000
  Normal   0x00001000 -> 0x000377fe
  HighMem  0x000377fe -> 0x0003bff0
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0: 0x00000010 -> 0x0000009f
    0: 0x00000100 -> 0x0003bff0
On node 0 totalpages: 245631
free_area_init_node: node 0, pgdat c0778f80, node_mem_map c1000340
  DMA zone: 52 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 3931 pages, LIFO batch:0
  Normal zone: 2834 pages used for memmap
  Normal zone: 220396 pages, LIFO batch:31
  HighMem zone: 234 pages used for memmap
  HighMem zone: 18184 pages, LIFO batch:3
Using APIC driver default
ACPI: PM-Timer IO Port: 0x1008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] disabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] disabled)
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 4, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 dfl dfl)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
SMP: Allowing 4 CPUs, 2 hotplug CPUs
nr_irqs_gsi: 24
Allocating PCI resources starting at 40000000 (gap: 3c000000:c2c00000)
NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:4 nr_node_ids:1
PERCPU: Embedded 13 pages at c1c3b000, static data 32756 bytes
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 242511
Kernel command line: ro root=LABEL=/ rhgb quiet
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
Preemptible RCU implementation.
NR_IRQS:512
CPU 0 irqstacks, hard=c1c3b000 soft=c1c3c000
PID hash table entries: 4096 (order: 12, 16384 bytes)
Fast TSC calibration using PIT
Detected 2800.222 MHz processor.
Console: colour VGA+ 80x25
console [tty0] enabled
Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES:  8
... MAX_LOCK_DEPTH:          48
... MAX_LOCKDEP_KEYS:        8191
... CLASSHASH_SIZE:          4096
... MAX_LOCKDEP_ENTRIES:     8192
... MAX_LOCKDEP_CHAINS:      16384
... CHAINHASH_SIZE:          8192
 memory used by lock dependency info: 2847 kB
 per task-struct memory footprint: 1152 bytes
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
allocated 4914560 bytes of page_cgroup
please try cgroup_disable=memory,blkio option if you don't want
Initializing HighMem for node 0 (000377fe:0003bff0)
Memory: 952284k/982976k available (2258k kernel code, 30016k reserved, 1424k data, 320k init, 73672k highmem)
virtual kernel memory layout:
    fixmap  : 0xffedf000 - 0xfffff000   (1152 kB)
    pkmap   : 0xff800000 - 0xffc00000   (4096 kB)
    vmalloc : 0xf7ffe000 - 0xff7fe000   ( 120 MB)
    lowmem  : 0xc0000000 - 0xf77fe000   ( 887 MB)
      .init : 0xc079d000 - 0xc07ed000   ( 320 kB)
      .data : 0xc06349ab - 0xc0798cb8   (1424 kB)
      .text : 0xc0400000 - 0xc06349ab   (2258 kB)
Checking if this processor honours the WP bit even in supervisor mode...Ok.
SLUB: Genslabs=13, HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
Calibrating delay loop (skipped), value calculated using timer frequency.. 5600.44 BogoMIPS (lpj=2800222)
Mount-cache hash table entries: 512
Initializing cgroup subsys debug
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys blkio
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
Initializing cgroup subsys net_cls
Initializing cgroup subsys io
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (24) available
using mwait in idle threads.
Checking 'hlt' instruction... OK.
ACPI: Core revision 20090320
ftrace: converting mcount calls to 0f 1f 44 00 00
ftrace: allocating 12136 entries in 24 pages
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Pentium(R) D CPU 2.80GHz stepping 04
lockdep: fixing up alternatives.
CPU 1 irqstacks, hard=c1c4b000 soft=c1c4c000
Booting processor 1 APIC 0x1 ip 0x6000
Initializing CPU#1
Calibrating delay using timer specific routine.. 5599.23 BogoMIPS (lpj=2799617)
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel P4/Xeon Extended MCE MSRs (24) available
CPU1: Intel(R) Pentium(R) D CPU 2.80GHz stepping 04
checking TSC synchronization [CPU#0 -> CPU#1]: passed.
Brought up 2 CPUs
Total of 2 processors activated (11199.67 BogoMIPS).
CPU0 attaching sched-domain:
 domain 0: span 0-1 level CPU
  groups: 0 1
CPU1 attaching sched-domain:
 domain 0: span 0-1 level CPU
  groups: 1 0
net_namespace: 436 bytes
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xfbda0, last bus=1
PCI: Using configuration type 1 for base access
mtrr: your CPUs had inconsistent fixed MTRR settings
mtrr: probably your BIOS does not setup all CPUs.
mtrr: corrected configuration.
bio: create slab <bio-0> at 0
ACPI: EC: Look up EC in DSDT
ACPI: Interpreter enabled
ACPI: (supports S0 S3 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI: No dock devices found.
ACPI: PCI Root Bridge [PCI0] (0000:00)
pci 0000:00:00.0: reg 10 32bit mmio: [0xd0000000-0xd7ffffff]
pci 0000:00:02.5: reg 10 io port: [0x1f0-0x1f7]
pci 0000:00:02.5: reg 14 io port: [0x3f4-0x3f7]
pci 0000:00:02.5: reg 18 io port: [0x170-0x177]
pci 0000:00:02.5: reg 1c io port: [0x374-0x377]
pci 0000:00:02.5: reg 20 io port: [0x4000-0x400f]
pci 0000:00:02.5: PME# supported from D3cold
pci 0000:00:02.5: PME# disabled
pci 0000:00:02.7: reg 10 io port: [0xd000-0xd0ff]
pci 0000:00:02.7: reg 14 io port: [0xd400-0xd47f]
pci 0000:00:02.7: supports D1 D2
pci 0000:00:02.7: PME# supported from D3hot D3cold
pci 0000:00:02.7: PME# disabled
pci 0000:00:03.0: reg 10 32bit mmio: [0xe1104000-0xe1104fff]
pci 0000:00:03.1: reg 10 32bit mmio: [0xe1100000-0xe1100fff]
pci 0000:00:03.2: reg 10 32bit mmio: [0xe1101000-0xe1101fff]
pci 0000:00:03.3: reg 10 32bit mmio: [0xe1102000-0xe1102fff]
pci 0000:00:03.3: PME# supported from D0 D3hot D3cold
pci 0000:00:03.3: PME# disabled
pci 0000:00:05.0: reg 10 io port: [0xd800-0xd807]
pci 0000:00:05.0: reg 14 io port: [0xdc00-0xdc03]
pci 0000:00:05.0: reg 18 io port: [0xe000-0xe007]
pci 0000:00:05.0: reg 1c io port: [0xe400-0xe403]
pci 0000:00:05.0: reg 20 io port: [0xe800-0xe80f]
pci 0000:00:05.0: PME# supported from D3cold
pci 0000:00:05.0: PME# disabled
pci 0000:00:0e.0: reg 10 io port: [0xec00-0xecff]
pci 0000:00:0e.0: reg 14 32bit mmio: [0xe1103000-0xe11030ff]
pci 0000:00:0e.0: reg 30 32bit mmio: [0x000000-0x01ffff]
pci 0000:00:0e.0: supports D1 D2
pci 0000:00:0e.0: PME# supported from D1 D2 D3hot D3cold
pci 0000:00:0e.0: PME# disabled
pci 0000:01:00.0: reg 10 32bit mmio: [0xd8000000-0xdfffffff]
pci 0000:01:00.0: reg 14 32bit mmio: [0xe1000000-0xe101ffff]
pci 0000:01:00.0: reg 18 io port: [0xc000-0xc07f]
pci 0000:01:00.0: supports D1 D2
pci 0000:00:01.0: bridge io port: [0xc000-0xcfff]
pci 0000:00:01.0: bridge 32bit mmio: [0xe1000000-0xe10fffff]
pci 0000:00:01.0: bridge 32bit mmio pref: [0xd8000000-0xdfffffff]
pci_bus 0000:00: on NUMA node 0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 10 *11 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 9 *10 11 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 9 10 *11 14 15)
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 *6 7 9 10 11 14 15)
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 *9 10 11 14 15)
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 *5 6 7 9 10 11 14 15)
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: PnP ACPI: found 12 devices
ACPI: ACPI bus type pnp unregistered
system 00:00: iomem range 0xc8000-0xcbfff has been reserved
system 00:00: iomem range 0xf0000-0xf7fff could not be reserved
system 00:00: iomem range 0xf8000-0xfbfff could not be reserved
system 00:00: iomem range 0xfc000-0xfffff could not be reserved
system 00:00: iomem range 0x3bff0000-0x3bffffff could not be reserved
system 00:00: iomem range 0xffff0000-0xffffffff has been reserved
system 00:00: iomem range 0x0-0x9ffff could not be reserved
system 00:00: iomem range 0x100000-0x3bfeffff could not be reserved
system 00:00: iomem range 0xffee0000-0xffefffff has been reserved
system 00:00: iomem range 0xfffe0000-0xfffeffff has been reserved
system 00:00: iomem range 0xfec00000-0xfecfffff has been reserved
system 00:00: iomem range 0xfee00000-0xfeefffff has been reserved
system 00:02: ioport range 0x4d0-0x4d1 has been reserved
system 00:02: ioport range 0x800-0x805 has been reserved
system 00:02: ioport range 0x290-0x297 has been reserved
system 00:02: ioport range 0x880-0x88f has been reserved
pci 0000:00:01.0: PCI bridge, secondary bus 0000:01
pci 0000:00:01.0:   IO window: 0xc000-0xcfff
pci 0000:00:01.0:   MEM window: 0xe1000000-0xe10fffff
pci 0000:00:01.0:   PREFETCH window: 0x000000d8000000-0x000000dfffffff
pci_bus 0000:00: resource 0 io:  [0x00-0xffff]
pci_bus 0000:00: resource 1 mem: [0x000000-0xffffffff]
pci_bus 0000:01: resource 0 io:  [0xc000-0xcfff]
pci_bus 0000:01: resource 1 mem: [0xe1000000-0xe10fffff]
pci_bus 0000:01: resource 2 pref mem [0xd8000000-0xdfffffff]
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
TCP bind hash table entries: 65536 (order: 9, 2097152 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
NET: Registered protocol family 1
checking if image is initramfs...
rootfs image is initramfs; unpacking...
Freeing initrd memory: 2955k freed
apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16ac)
apm: disabled - APM is not SMP safe.
highmem bounce pool size: 64 pages
HugeTLB registered 4 MB page size, pre-allocated 0 pages
msgmni has been set to 1722
alg: No test for stdrng (krng)
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
io scheduler noop registered
io scheduler cfq registered (default)
pci 0000:01:00.0: Boot video device
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
fan PNP0C0B:00: registered as cooling_device0
ACPI: Fan [FAN] (on)
processor ACPI_CPU:00: registered as cooling_device1
processor ACPI_CPU:01: registered as cooling_device2
thermal LNXTHERM:01: registered as thermal_zone0
ACPI: Thermal Zone [THRM] (62 C)
isapnp: Scanning for PnP cards...
Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 0
isapnp: No Plug & Play device found
Real Time Clock Driver v1.12b
Non-volatile memory driver v1.3
Linux agpgart interface v0.103
agpgart-sis 0000:00:00.0: SiS chipset [1039/0661]
agpgart-sis 0000:00:00.0: AGP aperture is 128M @ 0xd0000000
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:07: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:08: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
brd: module loaded
PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
cpuidle: using governor ladder
cpuidle: using governor menu
TCP cubic registered
NET: Registered protocol family 17
Using IPI No-Shortcut mode
registered taskstats version 1
Freeing unused kernel memory: 320k freed
Write protecting the kernel text: 2260k
Write protecting the kernel read-only data: 1120k
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ehci_hcd 0000:00:03.3: PCI INT D -> GSI 23 (level, low) -> IRQ 23
ehci_hcd 0000:00:03.3: EHCI Host Controller
ehci_hcd 0000:00:03.3: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:03.3: cache line size of 128 is not supported
ehci_hcd 0000:00:03.3: irq 23, io mem 0xe1102000
ehci_hcd 0000:00:03.3: USB 2.0 started, EHCI 1.00
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 8 ports detected
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
ohci_hcd 0000:00:03.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
ohci_hcd 0000:00:03.0: OHCI Host Controller
ohci_hcd 0000:00:03.0: new USB bus registered, assigned bus number 2
ohci_hcd 0000:00:03.0: irq 20, io mem 0xe1104000
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
ohci_hcd 0000:00:03.1: PCI INT B -> GSI 21 (level, low) -> IRQ 21
ohci_hcd 0000:00:03.1: OHCI Host Controller
ohci_hcd 0000:00:03.1: new USB bus registered, assigned bus number 3
ohci_hcd 0000:00:03.1: irq 21, io mem 0xe1100000
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 3 ports detected
ohci_hcd 0000:00:03.2: PCI INT C -> GSI 22 (level, low) -> IRQ 22
ohci_hcd 0000:00:03.2: OHCI Host Controller
ohci_hcd 0000:00:03.2: new USB bus registered, assigned bus number 4
ohci_hcd 0000:00:03.2: irq 22, io mem 0xe1101000
usb usb4: configuration #1 chosen from 1 choice
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
uhci_hcd: USB Universal Host Controller Interface driver
SCSI subsystem initialized
Driver 'sd' needs updating - please use bus_type methods
libata version 3.00 loaded.
pata_sis 0000:00:02.5: version 0.5.2
pata_sis 0000:00:02.5: PCI INT A -> GSI 16 (level, low) -> IRQ 16
scsi0 : pata_sis
scsi1 : pata_sis
ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0x4000 irq 14
ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0x4008 irq 15
input: ImPS/2 Logitech Wheel Mouse as /class/input/input0
input: AT Translated Set 2 keyboard as /class/input/input1
sata_sis 0000:00:05.0: version 1.0
sata_sis 0000:00:05.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
sata_sis 0000:00:05.0: Detected SiS 180/181/964 chipset in SATA mode
scsi2 : sata_sis
scsi3 : sata_sis
ata3: SATA max UDMA/133 cmd 0xd800 ctl 0xdc00 bmdma 0xe800 irq 17
ata4: SATA max UDMA/133 cmd 0xe000 ctl 0xe400 bmdma 0xe808 irq 17
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: ATA-7: ST3808110AS, 3.AAE, max UDMA/133
ata3.00: 156301488 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata3.00: configured for UDMA/133
scsi 2:0:0:0: Direct-Access     ATA      ST3808110AS      3.AA PQ: 0 ANSI: 5
sd 2:0:0:0: [sda] 156301488 512-byte hardware sectors: (80.0 GB/74.5 GiB)
sd 2:0:0:0: [sda] Write Protect is off
sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2 < sda5 sda6 sda7 sda8 sda9 >
sd 2:0:0:0: [sda] Attached SCSI disk
ata4: SATA link down (SStatus 0 SControl 300)
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: sda8: orphan cleanup on readonly fs
ext3_orphan_cleanup: deleting unreferenced inode 3725366
ext3_orphan_cleanup: deleting unreferenced inode 3725365
ext3_orphan_cleanup: deleting unreferenced inode 3725364
EXT3-fs: sda8: 3 orphan inodes deleted
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with writeback data mode.
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:00:0e.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
r8169 0000:00:0e.0: no PCI Express capability
eth0: RTL8110s at 0xf8236000, 00:16:ec:2e:b7:e0, XID 04000000 IRQ 18
sd 2:0:0:0: Attached scsi generic sg0 type 0
parport_pc 00:09: reported by Plug and Play ACPI
parport0: PC-style at 0x378 (0x778), irq 7 [PCSPP,TRISTATE]
input: Power Button as /class/input/input2
ACPI: Power Button [PWRF]
input: Power Button as /class/input/input3
ACPI: Power Button [PWRB]
input: Sleep Button as /class/input/input4
ACPI: Sleep Button [FUTS]
ramfs: bad mount option: maxsize=512
EXT3 FS on sda8, internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda7, internal journal
EXT3-fs: mounted filesystem with writeback data mode.
Adding 1052216k swap on /dev/sda6.  Priority:-1 extents:1 across:1052216k 
warning: process `kudzu' used the deprecated sysctl system call with 1.23.
kudzu[1133] general protection ip:8056968 sp:bffe9e90 error:0
r8169: eth0: link up
r8169: eth0: link up
warning: `dbus-daemon' uses 32-bit capabilities (legacy support in use)
CPU0 attaching NULL sched-domain.
CPU1 attaching NULL sched-domain.
CPU0 attaching sched-domain:
 domain 0: span 0-1 level CPU
  groups: 0 1
CPU1 attaching sched-domain:
 domain 0: span 0-1 level CPU
  groups: 1 0

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.30-rc4-io #6
---------------------------------------------------------
rmdir/2186 just changed the state of lock:
 (&iocg->lock){+.+...}, at: [<c0513b18>] iocg_destroy+0x2a/0x118
but this lock was taken by another, SOFTIRQ-safe lock in the past:
 (&q->__queue_lock){..-...}

and interrupts could create inverse lock ordering between them.


other info that might help us debug this:
3 locks held by rmdir/2186:
 #0:  (&sb->s_type->i_mutex_key#10/1){+.+.+.}, at: [<c04ae1e8>] do_rmdir+0x5c/0xc8
 #1:  (cgroup_mutex){+.+.+.}, at: [<c045a15b>] cgroup_diput+0x3c/0xa7
 #2:  (&iocg->lock){+.+...}, at: [<c0513b18>] iocg_destroy+0x2a/0x118

the first lock's dependencies:
-> (&iocg->lock){+.+...} ops: 3 {
   HARDIRQ-ON-W at:
                        [<c044b840>] mark_held_locks+0x3d/0x58
                        [<c044b963>] trace_hardirqs_on_caller+0x108/0x14c
                        [<c044b9b2>] trace_hardirqs_on+0xb/0xd
                        [<c0630883>] _spin_unlock_irq+0x27/0x47
                        [<c0513baa>] iocg_destroy+0xbc/0x118
                        [<c045a16a>] cgroup_diput+0x4b/0xa7
                        [<c04b1dbb>] dentry_iput+0x78/0x9c
                        [<c04b1e82>] d_kill+0x21/0x3b
                        [<c04b2f2a>] dput+0xf3/0xfc
                        [<c04ae226>] do_rmdir+0x9a/0xc8
                        [<c04ae29d>] sys_rmdir+0x15/0x17
                        [<c0402a68>] sysenter_do_call+0x12/0x36
                        [<ffffffff>] 0xffffffff
   SOFTIRQ-ON-W at:
                        [<c044b840>] mark_held_locks+0x3d/0x58
                        [<c044b97c>] trace_hardirqs_on_caller+0x121/0x14c
                        [<c044b9b2>] trace_hardirqs_on+0xb/0xd
                        [<c0630883>] _spin_unlock_irq+0x27/0x47
                        [<c0513baa>] iocg_destroy+0xbc/0x118
                        [<c045a16a>] cgroup_diput+0x4b/0xa7
                        [<c04b1dbb>] dentry_iput+0x78/0x9c
                        [<c04b1e82>] d_kill+0x21/0x3b
                        [<c04b2f2a>] dput+0xf3/0xfc
                        [<c04ae226>] do_rmdir+0x9a/0xc8
                        [<c04ae29d>] sys_rmdir+0x15/0x17
                        [<c0402a68>] sysenter_do_call+0x12/0x36
                        [<ffffffff>] 0xffffffff
   INITIAL USE at:
                       [<c044dad5>] __lock_acquire+0x58c/0x73e
                       [<c044dd36>] lock_acquire+0xaf/0xcc
                       [<c06304ea>] _spin_lock_irq+0x30/0x3f
                       [<c05119bd>] io_alloc_root_group+0x104/0x155
                       [<c05133cb>] elv_init_fq_data+0x32/0xe0
                       [<c0504317>] elevator_alloc+0x150/0x170
                       [<c0505393>] elevator_init+0x9d/0x100
                       [<c0507088>] blk_init_queue_node+0xc4/0xf7
                       [<c05070cb>] blk_init_queue+0x10/0x12
                       [<f81060fd>] __scsi_alloc_queue+0x1c/0xba [scsi_mod]
                       [<f81061b0>] scsi_alloc_queue+0x15/0x4e [scsi_mod]
                       [<f810803d>] scsi_alloc_sdev+0x154/0x1f5 [scsi_mod]
                       [<f8108387>] scsi_probe_and_add_lun+0x123/0xb5b [scsi_mod]
                       [<f8109847>] __scsi_add_device+0x8a/0xb0 [scsi_mod]
                       [<f816ad14>] ata_scsi_scan_host+0x77/0x141 [libata]
                       [<f816903f>] async_port_probe+0xa0/0xa9 [libata]
                       [<c044341f>] async_thread+0xe9/0x1c9
                       [<c043e204>] kthread+0x4a/0x72
                       [<c04034e7>] kernel_thread_helper+0x7/0x10
                       [<ffffffff>] 0xffffffff
 }
 ... key      at: [<c0c5ebd8>] __key.29462+0x0/0x8

the second lock's dependencies:
-> (&q->__queue_lock){..-...} ops: 162810 {
   IN-SOFTIRQ-W at:
                        [<c044da08>] __lock_acquire+0x4bf/0x73e
                        [<c044dd36>] lock_acquire+0xaf/0xcc
                        [<c0630340>] _spin_lock+0x2a/0x39
                        [<f810672c>] scsi_device_unbusy+0x78/0x92 [scsi_mod]
                        [<f8101483>] scsi_finish_command+0x22/0xd4 [scsi_mod]
                        [<f8106fdb>] scsi_softirq_done+0xf9/0x101 [scsi_mod]
                        [<c050a936>] blk_done_softirq+0x5e/0x70
                        [<c0431379>] __do_softirq+0xb8/0x180
                        [<ffffffff>] 0xffffffff
   INITIAL USE at:
                       [<c044dad5>] __lock_acquire+0x58c/0x73e
                       [<c044dd36>] lock_acquire+0xaf/0xcc
                       [<c063056b>] _spin_lock_irqsave+0x33/0x43
                       [<f8101337>] scsi_adjust_queue_depth+0x2a/0xc9 [scsi_mod]
                       [<f8108079>] scsi_alloc_sdev+0x190/0x1f5 [scsi_mod]
                       [<f8108387>] scsi_probe_and_add_lun+0x123/0xb5b [scsi_mod]
                       [<f8109847>] __scsi_add_device+0x8a/0xb0 [scsi_mod]
                       [<f816ad14>] ata_scsi_scan_host+0x77/0x141 [libata]
                       [<f816903f>] async_port_probe+0xa0/0xa9 [libata]
                       [<c044341f>] async_thread+0xe9/0x1c9
                       [<c043e204>] kthread+0x4a/0x72
                       [<c04034e7>] kernel_thread_helper+0x7/0x10
                       [<ffffffff>] 0xffffffff
 }
 ... key      at: [<c0c5e698>] __key.29749+0x0/0x8
 -> (&ioc->lock){..-...} ops: 1032 {
    IN-SOFTIRQ-W at:
                          [<c044da08>] __lock_acquire+0x4bf/0x73e
                          [<c044dd36>] lock_acquire+0xaf/0xcc
                          [<c063056b>] _spin_lock_irqsave+0x33/0x43
                          [<c050f0f0>] cic_free_func+0x26/0x64
                          [<c050ea90>] __call_for_each_cic+0x23/0x2e
                          [<c050eaad>] cfq_free_io_context+0x12/0x14
                          [<c050978c>] put_io_context+0x4b/0x66
                          [<c050f2a2>] cfq_put_request+0x42/0x5b
                          [<c0504629>] elv_put_request+0x30/0x33
                          [<c050678d>] __blk_put_request+0x8b/0xb8
                          [<c0506953>] end_that_request_last+0x199/0x1a1
                          [<c0506a0d>] blk_end_io+0x51/0x6f
                          [<c0506a64>] blk_end_request+0x11/0x13
                          [<f8106c9c>] scsi_io_completion+0x1d9/0x41f [scsi_mod]
                          [<f810152d>] scsi_finish_command+0xcc/0xd4 [scsi_mod]
                          [<f8106fdb>] scsi_softirq_done+0xf9/0x101 [scsi_mod]
                          [<c050a936>] blk_done_softirq+0x5e/0x70
                          [<c0431379>] __do_softirq+0xb8/0x180
                          [<ffffffff>] 0xffffffff
    INITIAL USE at:
                         [<c044dad5>] __lock_acquire+0x58c/0x73e
                         [<c044dd36>] lock_acquire+0xaf/0xcc
                         [<c063056b>] _spin_lock_irqsave+0x33/0x43
                         [<c050f9bf>] cfq_set_request+0x123/0x33d
                         [<c05052e6>] elv_set_request+0x43/0x53
                         [<c0506d44>] get_request+0x22e/0x33f
                         [<c0507498>] get_request_wait+0x137/0x15d
                         [<c0507501>] blk_get_request+0x43/0x73
                         [<f8106854>] scsi_execute+0x24/0x11c [scsi_mod]
                         [<f81069ff>] scsi_execute_req+0xb3/0x104 [scsi_mod]
                         [<f81084f8>] scsi_probe_and_add_lun+0x294/0xb5b [scsi_mod]
                         [<f8109847>] __scsi_add_device+0x8a/0xb0 [scsi_mod]
                         [<f816ad14>] ata_scsi_scan_host+0x77/0x141 [libata]
                         [<f816903f>] async_port_probe+0xa0/0xa9 [libata]
                         [<c044341f>] async_thread+0xe9/0x1c9
                         [<c043e204>] kthread+0x4a/0x72
                         [<c04034e7>] kernel_thread_helper+0x7/0x10
                         [<ffffffff>] 0xffffffff
  }
  ... key      at: [<c0c5e6ec>] __key.27747+0x0/0x8
  -> (&rdp->lock){-.-...} ops: 168014 {
     IN-HARDIRQ-W at:
                            [<c044d9e4>] __lock_acquire+0x49b/0x73e
                            [<c044dd36>] lock_acquire+0xaf/0xcc
                            [<c063056b>] _spin_lock_irqsave+0x33/0x43
                            [<c0461b2a>] rcu_check_callbacks+0x6a/0xa3
                            [<c043549a>] update_process_times+0x3d/0x53
                            [<c0447fe0>] tick_periodic+0x6b/0x77
                            [<c0448009>] tick_handle_periodic+0x1d/0x60
                            [<c063406e>] smp_apic_timer_interrupt+0x6e/0x81
                            [<c04033c7>] apic_timer_interrupt+0x2f/0x34
                            [<c042fbd7>] do_exit+0x53e/0x5b3
                            [<c043a9d8>] __request_module+0x0/0x100
                            [<c04034e7>] kernel_thread_helper+0x7/0x10
                            [<ffffffff>] 0xffffffff
     IN-SOFTIRQ-W at:
                            [<c044da08>] __lock_acquire+0x4bf/0x73e
                            [<c044dd36>] lock_acquire+0xaf/0xcc
                            [<c0630340>] _spin_lock+0x2a/0x39
                            [<c04619db>] rcu_process_callbacks+0x2b/0x86
                            [<c0431379>] __do_softirq+0xb8/0x180
                            [<ffffffff>] 0xffffffff
     INITIAL USE at:
                           [<c044dad5>] __lock_acquire+0x58c/0x73e
                           [<c044dd36>] lock_acquire+0xaf/0xcc
                           [<c063056b>] _spin_lock_irqsave+0x33/0x43
                           [<c062c8ca>] rcu_online_cpu+0x3d/0x51
                           [<c062c910>] rcu_cpu_notify+0x32/0x43
                           [<c07b097f>] __rcu_init+0xf0/0x120
                           [<c07af027>] rcu_init+0x8/0x14
                           [<c079d6e1>] start_kernel+0x187/0x2fc
                           [<c079d06a>] __init_begin+0x6a/0x6f
                           [<ffffffff>] 0xffffffff
   }
   ... key      at: [<c0c2e52c>] __key.17543+0x0/0x8
  ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630340>] _spin_lock+0x2a/0x39
   [<c046143d>] call_rcu+0x36/0x5b
   [<c0517b45>] radix_tree_delete+0xe7/0x176
   [<c050f0fe>] cic_free_func+0x34/0x64
   [<c050ea90>] __call_for_each_cic+0x23/0x2e
   [<c050eaad>] cfq_free_io_context+0x12/0x14
   [<c050978c>] put_io_context+0x4b/0x66
   [<c050984c>] exit_io_context+0x77/0x7b
   [<c042fc24>] do_exit+0x58b/0x5b3
   [<c04034ed>] kernel_thread_helper+0xd/0x10
   [<ffffffff>] 0xffffffff

 ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c050f4a3>] cfq_cic_lookup+0xd9/0xef
   [<c050f674>] cfq_get_queue+0x92/0x2ba
   [<c050fb01>] cfq_set_request+0x265/0x33d
   [<c05052e6>] elv_set_request+0x43/0x53
   [<c0506d44>] get_request+0x22e/0x33f
   [<c0507498>] get_request_wait+0x137/0x15d
   [<c0507501>] blk_get_request+0x43/0x73
   [<f8106854>] scsi_execute+0x24/0x11c [scsi_mod]
   [<f81069ff>] scsi_execute_req+0xb3/0x104 [scsi_mod]
   [<f81084f8>] scsi_probe_and_add_lun+0x294/0xb5b [scsi_mod]
   [<f8109847>] __scsi_add_device+0x8a/0xb0 [scsi_mod]
   [<f816ad14>] ata_scsi_scan_host+0x77/0x141 [libata]
   [<f816903f>] async_port_probe+0xa0/0xa9 [libata]
   [<c044341f>] async_thread+0xe9/0x1c9
   [<c043e204>] kthread+0x4a/0x72
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

 -> (&base->lock){..-...} ops: 348073 {
    IN-SOFTIRQ-W at:
                          [<c044da08>] __lock_acquire+0x4bf/0x73e
                          [<c044dd36>] lock_acquire+0xaf/0xcc
                          [<c06304ea>] _spin_lock_irq+0x30/0x3f
                          [<c0434b8b>] run_timer_softirq+0x3c/0x1d1
                          [<c0431379>] __do_softirq+0xb8/0x180
                          [<ffffffff>] 0xffffffff
    INITIAL USE at:
                         [<c044dad5>] __lock_acquire+0x58c/0x73e
                         [<c044dd36>] lock_acquire+0xaf/0xcc
                         [<c063056b>] _spin_lock_irqsave+0x33/0x43
                         [<c0434e84>] lock_timer_base+0x24/0x43
                         [<c0434f3d>] mod_timer+0x46/0xcc
                         [<c07bd97a>] con_init+0xa4/0x20e
                         [<c07bd3b2>] console_init+0x12/0x20
                         [<c079d735>] start_kernel+0x1db/0x2fc
                         [<c079d06a>] __init_begin+0x6a/0x6f
                         [<ffffffff>] 0xffffffff
  }
  ... key      at: [<c082304c>] __key.23401+0x0/0x8
 ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c0434e84>] lock_timer_base+0x24/0x43
   [<c0434f3d>] mod_timer+0x46/0xcc
   [<c05075cb>] blk_plug_device+0x9a/0xdf
   [<c05049e1>] __elv_add_request+0x86/0x96
   [<c0509d52>] blk_execute_rq_nowait+0x5d/0x86
   [<c0509e2e>] blk_execute_rq+0xb3/0xd5
   [<f81068f5>] scsi_execute+0xc5/0x11c [scsi_mod]
   [<f81069ff>] scsi_execute_req+0xb3/0x104 [scsi_mod]
   [<f81084f8>] scsi_probe_and_add_lun+0x294/0xb5b [scsi_mod]
   [<f8109847>] __scsi_add_device+0x8a/0xb0 [scsi_mod]
   [<f816ad14>] ata_scsi_scan_host+0x77/0x141 [libata]
   [<f816903f>] async_port_probe+0xa0/0xa9 [libata]
   [<c044341f>] async_thread+0xe9/0x1c9
   [<c043e204>] kthread+0x4a/0x72
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

 -> (&sdev->list_lock){..-...} ops: 27612 {
    IN-SOFTIRQ-W at:
                          [<c044da08>] __lock_acquire+0x4bf/0x73e
                          [<c044dd36>] lock_acquire+0xaf/0xcc
                          [<c063056b>] _spin_lock_irqsave+0x33/0x43
                          [<f8101cb4>] scsi_put_command+0x17/0x57 [scsi_mod]
                          [<f810620f>] scsi_next_command+0x26/0x39 [scsi_mod]
                          [<f8106d02>] scsi_io_completion+0x23f/0x41f [scsi_mod]
                          [<f810152d>] scsi_finish_command+0xcc/0xd4 [scsi_mod]
                          [<f8106fdb>] scsi_softirq_done+0xf9/0x101 [scsi_mod]
                          [<c050a936>] blk_done_softirq+0x5e/0x70
                          [<c0431379>] __do_softirq+0xb8/0x180
                          [<ffffffff>] 0xffffffff
    INITIAL USE at:
                         [<c044dad5>] __lock_acquire+0x58c/0x73e
                         [<c044dd36>] lock_acquire+0xaf/0xcc
                         [<c063056b>] _spin_lock_irqsave+0x33/0x43
                         [<f8101c64>] scsi_get_command+0x5c/0x95 [scsi_mod]
                         [<f81062b6>] scsi_get_cmd_from_req+0x26/0x50 [scsi_mod]
                         [<f8106594>] scsi_setup_blk_pc_cmnd+0x2b/0xd7 [scsi_mod]
                         [<f8106664>] scsi_prep_fn+0x24/0x33 [scsi_mod]
                         [<c0504712>] elv_next_request+0xe6/0x18d
                         [<f810704c>] scsi_request_fn+0x69/0x431 [scsi_mod]
                         [<c05072af>] __generic_unplug_device+0x2e/0x31
                         [<c0509d59>] blk_execute_rq_nowait+0x64/0x86
                         [<c0509e2e>] blk_execute_rq+0xb3/0xd5
                         [<f81068f5>] scsi_execute+0xc5/0x11c [scsi_mod]
                         [<f81069ff>] scsi_execute_req+0xb3/0x104 [scsi_mod]
                         [<f81084f8>] scsi_probe_and_add_lun+0x294/0xb5b [scsi_mod]
                         [<f8109847>] __scsi_add_device+0x8a/0xb0 [scsi_mod]
                         [<f816ad14>] ata_scsi_scan_host+0x77/0x141 [libata]
                         [<f816903f>] async_port_probe+0xa0/0xa9 [libata]
                         [<c044341f>] async_thread+0xe9/0x1c9
                         [<c043e204>] kthread+0x4a/0x72
                         [<c04034e7>] kernel_thread_helper+0x7/0x10
                         [<ffffffff>] 0xffffffff
  }
  ... key      at: [<f811916c>] __key.29786+0x0/0xffff2ebf [scsi_mod]
 ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<f8101c64>] scsi_get_command+0x5c/0x95 [scsi_mod]
   [<f81062b6>] scsi_get_cmd_from_req+0x26/0x50 [scsi_mod]
   [<f8106594>] scsi_setup_blk_pc_cmnd+0x2b/0xd7 [scsi_mod]
   [<f8106664>] scsi_prep_fn+0x24/0x33 [scsi_mod]
   [<c0504712>] elv_next_request+0xe6/0x18d
   [<f810704c>] scsi_request_fn+0x69/0x431 [scsi_mod]
   [<c05072af>] __generic_unplug_device+0x2e/0x31
   [<c0509d59>] blk_execute_rq_nowait+0x64/0x86
   [<c0509e2e>] blk_execute_rq+0xb3/0xd5
   [<f81068f5>] scsi_execute+0xc5/0x11c [scsi_mod]
   [<f81069ff>] scsi_execute_req+0xb3/0x104 [scsi_mod]
   [<f81084f8>] scsi_probe_and_add_lun+0x294/0xb5b [scsi_mod]
   [<f8109847>] __scsi_add_device+0x8a/0xb0 [scsi_mod]
   [<f816ad14>] ata_scsi_scan_host+0x77/0x141 [libata]
   [<f816903f>] async_port_probe+0xa0/0xa9 [libata]
   [<c044341f>] async_thread+0xe9/0x1c9
   [<c043e204>] kthread+0x4a/0x72
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

 -> (&q->lock){-.-.-.} ops: 2105038 {
    IN-HARDIRQ-W at:
                          [<c044d9e4>] __lock_acquire+0x49b/0x73e
                          [<c044dd36>] lock_acquire+0xaf/0xcc
                          [<c063056b>] _spin_lock_irqsave+0x33/0x43
                          [<c041ec0d>] complete+0x17/0x43
                          [<c062609b>] i8042_aux_test_irq+0x4c/0x65
                          [<c045e922>] handle_IRQ_event+0xa4/0x169
                          [<c04602ea>] handle_edge_irq+0xc9/0x10a
                          [<ffffffff>] 0xffffffff
    IN-SOFTIRQ-W at:
                          [<c044da08>] __lock_acquire+0x4bf/0x73e
                          [<c044dd36>] lock_acquire+0xaf/0xcc
                          [<c063056b>] _spin_lock_irqsave+0x33/0x43
                          [<c041ec0d>] complete+0x17/0x43
                          [<c043c336>] wakeme_after_rcu+0x10/0x12
                          [<c0461a12>] rcu_process_callbacks+0x62/0x86
                          [<c0431379>] __do_softirq+0xb8/0x180
                          [<ffffffff>] 0xffffffff
    IN-RECLAIM_FS-W at:
                             [<c044dabd>] __lock_acquire+0x574/0x73e
                             [<c044dd36>] lock_acquire+0xaf/0xcc
                             [<c063056b>] _spin_lock_irqsave+0x33/0x43
                             [<c043e47b>] prepare_to_wait+0x1c/0x4a
                             [<c0485d3e>] kswapd+0xa7/0x51b
                             [<c043e204>] kthread+0x4a/0x72
                             [<c04034e7>] kernel_thread_helper+0x7/0x10
                             [<ffffffff>] 0xffffffff
    INITIAL USE at:
                         [<c044dad5>] __lock_acquire+0x58c/0x73e
                         [<c044dd36>] lock_acquire+0xaf/0xcc
                         [<c06304ea>] _spin_lock_irq+0x30/0x3f
                         [<c062d811>] wait_for_common+0x2f/0xeb
                         [<c062d968>] wait_for_completion+0x17/0x19
                         [<c043e161>] kthread_create+0x6e/0xc7
                         [<c062b7eb>] migration_call+0x39/0x444
                         [<c07ae112>] migration_init+0x1d/0x4b
                         [<c040115c>] do_one_initcall+0x6a/0x16e
                         [<c079d44d>] kernel_init+0x4d/0x15a
                         [<c04034e7>] kernel_thread_helper+0x7/0x10
                         [<ffffffff>] 0xffffffff
  }
  ... key      at: [<c0823490>] __key.17681+0x0/0x8
  -> (&rq->lock){-.-.-.} ops: 854341 {
     IN-HARDIRQ-W at:
                            [<c044d9e4>] __lock_acquire+0x49b/0x73e
                            [<c044dd36>] lock_acquire+0xaf/0xcc
                            [<c0630340>] _spin_lock+0x2a/0x39
                            [<c0429f89>] scheduler_tick+0x39/0x19b
                            [<c04354a4>] update_process_times+0x47/0x53
                            [<c0447fe0>] tick_periodic+0x6b/0x77
                            [<c0448009>] tick_handle_periodic+0x1d/0x60
                            [<c0404ace>] timer_interrupt+0x3e/0x45
                            [<c045e922>] handle_IRQ_event+0xa4/0x169
                            [<c04603a3>] handle_level_irq+0x78/0xc1
                            [<ffffffff>] 0xffffffff
     IN-SOFTIRQ-W at:
                            [<c044da08>] __lock_acquire+0x4bf/0x73e
                            [<c044dd36>] lock_acquire+0xaf/0xcc
                            [<c0630340>] _spin_lock+0x2a/0x39
                            [<c041ede7>] task_rq_lock+0x3b/0x62
                            [<c0426e41>] try_to_wake_up+0x75/0x2d4
                            [<c04270d7>] wake_up_process+0x14/0x16
                            [<c043507c>] process_timeout+0xd/0xf
                            [<c0434caa>] run_timer_softirq+0x15b/0x1d1
                            [<c0431379>] __do_softirq+0xb8/0x180
                            [<ffffffff>] 0xffffffff
     IN-RECLAIM_FS-W at:
                               [<c044dabd>] __lock_acquire+0x574/0x73e
                               [<c044dd36>] lock_acquire+0xaf/0xcc
                               [<c0630340>] _spin_lock+0x2a/0x39
                               [<c041ede7>] task_rq_lock+0x3b/0x62
                               [<c0427515>] set_cpus_allowed_ptr+0x1a/0xdd
                               [<c0485cf8>] kswapd+0x61/0x51b
                               [<c043e204>] kthread+0x4a/0x72
                               [<c04034e7>] kernel_thread_helper+0x7/0x10
                               [<ffffffff>] 0xffffffff
     INITIAL USE at:
                           [<c044dad5>] __lock_acquire+0x58c/0x73e
                           [<c044dd36>] lock_acquire+0xaf/0xcc
                           [<c063056b>] _spin_lock_irqsave+0x33/0x43
                           [<c042398e>] rq_attach_root+0x17/0xa7
                           [<c07ae52c>] sched_init+0x240/0x33e
                           [<c079d661>] start_kernel+0x107/0x2fc
                           [<c079d06a>] __init_begin+0x6a/0x6f
                           [<ffffffff>] 0xffffffff
   }
   ... key      at: [<c0800518>] __key.46938+0x0/0x8
   -> (&vec->lock){-.-...} ops: 34058 {
      IN-HARDIRQ-W at:
                              [<c044d9e4>] __lock_acquire+0x49b/0x73e
                              [<c044dd36>] lock_acquire+0xaf/0xcc
                              [<c063056b>] _spin_lock_irqsave+0x33/0x43
                              [<c047ad3b>] cpupri_set+0x51/0xba
                              [<c04219ee>] __enqueue_rt_entity+0xe2/0x1c8
                              [<c0421e18>] enqueue_rt_entity+0x19/0x23
                              [<c0428a52>] enqueue_task_rt+0x24/0x51
                              [<c041e03b>] enqueue_task+0x64/0x70
                              [<c041e06b>] activate_task+0x24/0x2a
                              [<c0426f9e>] try_to_wake_up+0x1d2/0x2d4
                              [<c04270d7>] wake_up_process+0x14/0x16
                              [<c04408b6>] hrtimer_wakeup+0x1d/0x21
                              [<c0440922>] __run_hrtimer+0x68/0x98
                              [<c04411ca>] hrtimer_interrupt+0x101/0x153
                              [<c063406e>] smp_apic_timer_interrupt+0x6e/0x81
                              [<c04033c7>] apic_timer_interrupt+0x2f/0x34
                              [<c0401c4f>] cpu_idle+0x53/0x85
                              [<c061fc80>] rest_init+0x6c/0x6e
                              [<c079d851>] start_kernel+0x2f7/0x2fc
                              [<c079d06a>] __init_begin+0x6a/0x6f
                              [<ffffffff>] 0xffffffff
      IN-SOFTIRQ-W at:
                              [<c044da08>] __lock_acquire+0x4bf/0x73e
                              [<c044dd36>] lock_acquire+0xaf/0xcc
                              [<c063056b>] _spin_lock_irqsave+0x33/0x43
                              [<c047ad3b>] cpupri_set+0x51/0xba
                              [<c04219ee>] __enqueue_rt_entity+0xe2/0x1c8
                              [<c0421e18>] enqueue_rt_entity+0x19/0x23
                              [<c0428a52>] enqueue_task_rt+0x24/0x51
                              [<c041e03b>] enqueue_task+0x64/0x70
                              [<c041e06b>] activate_task+0x24/0x2a
                              [<c0426f9e>] try_to_wake_up+0x1d2/0x2d4
                              [<c04270d7>] wake_up_process+0x14/0x16
                              [<c042737c>] rebalance_domains+0x2a3/0x3ac
                              [<c0429a06>] run_rebalance_domains+0x32/0xaa
                              [<c0431379>] __do_softirq+0xb8/0x180
                              [<ffffffff>] 0xffffffff
      INITIAL USE at:
                             [<c044dad5>] __lock_acquire+0x58c/0x73e
                             [<c044dd36>] lock_acquire+0xaf/0xcc
                             [<c063056b>] _spin_lock_irqsave+0x33/0x43
                             [<c047ad74>] cpupri_set+0x8a/0xba
                             [<c04216f2>] rq_online_rt+0x5e/0x61
                             [<c041dd3a>] set_rq_online+0x40/0x4a
                             [<c04239fb>] rq_attach_root+0x84/0xa7
                             [<c07ae52c>] sched_init+0x240/0x33e
                             [<c079d661>] start_kernel+0x107/0x2fc
                             [<c079d06a>] __init_begin+0x6a/0x6f
                             [<ffffffff>] 0xffffffff
    }
    ... key      at: [<c0c525d0>] __key.14261+0x0/0x10
   ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c047ad74>] cpupri_set+0x8a/0xba
   [<c04216f2>] rq_online_rt+0x5e/0x61
   [<c041dd3a>] set_rq_online+0x40/0x4a
   [<c04239fb>] rq_attach_root+0x84/0xa7
   [<c07ae52c>] sched_init+0x240/0x33e
   [<c079d661>] start_kernel+0x107/0x2fc
   [<c079d06a>] __init_begin+0x6a/0x6f
   [<ffffffff>] 0xffffffff

   -> (&rt_b->rt_runtime_lock){-.-...} ops: 336 {
      IN-HARDIRQ-W at:
                              [<c044d9e4>] __lock_acquire+0x49b/0x73e
                              [<c044dd36>] lock_acquire+0xaf/0xcc
                              [<c0630340>] _spin_lock+0x2a/0x39
                              [<c0421a75>] __enqueue_rt_entity+0x169/0x1c8
                              [<c0421e18>] enqueue_rt_entity+0x19/0x23
                              [<c0428a52>] enqueue_task_rt+0x24/0x51
                              [<c041e03b>] enqueue_task+0x64/0x70
                              [<c041e06b>] activate_task+0x24/0x2a
                              [<c0426f9e>] try_to_wake_up+0x1d2/0x2d4
                              [<c04270d7>] wake_up_process+0x14/0x16
                              [<c04408b6>] hrtimer_wakeup+0x1d/0x21
                              [<c0440922>] __run_hrtimer+0x68/0x98
                              [<c04411ca>] hrtimer_interrupt+0x101/0x153
                              [<c063406e>] smp_apic_timer_interrupt+0x6e/0x81
                              [<c04033c7>] apic_timer_interrupt+0x2f/0x34
                              [<c0401c4f>] cpu_idle+0x53/0x85
                              [<c061fc80>] rest_init+0x6c/0x6e
                              [<c079d851>] start_kernel+0x2f7/0x2fc
                              [<c079d06a>] __init_begin+0x6a/0x6f
                              [<ffffffff>] 0xffffffff
      IN-SOFTIRQ-W at:
                              [<c044da08>] __lock_acquire+0x4bf/0x73e
                              [<c044dd36>] lock_acquire+0xaf/0xcc
                              [<c0630340>] _spin_lock+0x2a/0x39
                              [<c0421a75>] __enqueue_rt_entity+0x169/0x1c8
                              [<c0421e18>] enqueue_rt_entity+0x19/0x23
                              [<c0428a52>] enqueue_task_rt+0x24/0x51
                              [<c041e03b>] enqueue_task+0x64/0x70
                              [<c041e06b>] activate_task+0x24/0x2a
                              [<c0426f9e>] try_to_wake_up+0x1d2/0x2d4
                              [<c04270d7>] wake_up_process+0x14/0x16
                              [<c042737c>] rebalance_domains+0x2a3/0x3ac
                              [<c0429a06>] run_rebalance_domains+0x32/0xaa
                              [<c0431379>] __do_softirq+0xb8/0x180
                              [<ffffffff>] 0xffffffff
      INITIAL USE at:
                             [<c044dad5>] __lock_acquire+0x58c/0x73e
                             [<c044dd36>] lock_acquire+0xaf/0xcc
                             [<c0630340>] _spin_lock+0x2a/0x39
                             [<c0421a75>] __enqueue_rt_entity+0x169/0x1c8
                             [<c0421e18>] enqueue_rt_entity+0x19/0x23
                             [<c0428a52>] enqueue_task_rt+0x24/0x51
                             [<c041e03b>] enqueue_task+0x64/0x70
                             [<c041e06b>] activate_task+0x24/0x2a
                             [<c0426f9e>] try_to_wake_up+0x1d2/0x2d4
                             [<c04270d7>] wake_up_process+0x14/0x16
                             [<c062b86b>] migration_call+0xb9/0x444
                             [<c07ae130>] migration_init+0x3b/0x4b
                             [<c040115c>] do_one_initcall+0x6a/0x16e
                             [<c079d44d>] kernel_init+0x4d/0x15a
                             [<c04034e7>] kernel_thread_helper+0x7/0x10
                             [<ffffffff>] 0xffffffff
    }
    ... key      at: [<c0800504>] __key.37924+0x0/0x8
    -> (&cpu_base->lock){-.-...} ops: 950512 {
       IN-HARDIRQ-W at:
                                [<c044d9e4>] __lock_acquire+0x49b/0x73e
                                [<c044dd36>] lock_acquire+0xaf/0xcc
                                [<c0630340>] _spin_lock+0x2a/0x39
                                [<c0440a3a>] hrtimer_run_queues+0xe8/0x131
                                [<c0435151>] run_local_timers+0xd/0x1e
                                [<c0435486>] update_process_times+0x29/0x53
                                [<c0447fe0>] tick_periodic+0x6b/0x77
                                [<c0448009>] tick_handle_periodic+0x1d/0x60
                                [<c063406e>] smp_apic_timer_interrupt+0x6e/0x81
                                [<c04033c7>] apic_timer_interrupt+0x2f/0x34
                                [<c04082c7>] arch_dup_task_struct+0x19/0x81
                                [<c042ac1c>] copy_process+0xab/0x115f
                                [<c042be78>] do_fork+0x129/0x2c5
                                [<c0401698>] kernel_thread+0x7f/0x87
                                [<c043e0b3>] kthreadd+0xa3/0xe3
                                [<c04034e7>] kernel_thread_helper+0x7/0x10
                                [<ffffffff>] 0xffffffff
       IN-SOFTIRQ-W at:
                                [<c044da08>] __lock_acquire+0x4bf/0x73e
                                [<c044dd36>] lock_acquire+0xaf/0xcc
                                [<c063056b>] _spin_lock_irqsave+0x33/0x43
                                [<c0440b98>] lock_hrtimer_base+0x1d/0x38
                                [<c0440ca9>] __hrtimer_start_range_ns+0x1f/0x232
                                [<c0440ee7>] hrtimer_start_range_ns+0x15/0x17
                                [<c0448ef1>] tick_setup_sched_timer+0xf6/0x124
                                [<c0441558>] hrtimer_run_pending+0xb0/0xe8
                                [<c0434b76>] run_timer_softirq+0x27/0x1d1
                                [<c0431379>] __do_softirq+0xb8/0x180
                                [<ffffffff>] 0xffffffff
       INITIAL USE at:
                               [<c044dad5>] __lock_acquire+0x58c/0x73e
                               [<c044dd36>] lock_acquire+0xaf/0xcc
                               [<c063056b>] _spin_lock_irqsave+0x33/0x43
                               [<c0440b98>] lock_hrtimer_base+0x1d/0x38
                               [<c0440ca9>] __hrtimer_start_range_ns+0x1f/0x232
                               [<c0421ab1>] __enqueue_rt_entity+0x1a5/0x1c8
                               [<c0421e18>] enqueue_rt_entity+0x19/0x23
                               [<c0428a52>] enqueue_task_rt+0x24/0x51
                               [<c041e03b>] enqueue_task+0x64/0x70
                               [<c041e06b>] activate_task+0x24/0x2a
                               [<c0426f9e>] try_to_wake_up+0x1d2/0x2d4
                               [<c04270d7>] wake_up_process+0x14/0x16
                               [<c062b86b>] migration_call+0xb9/0x444
                               [<c07ae130>] migration_init+0x3b/0x4b
                               [<c040115c>] do_one_initcall+0x6a/0x16e
                               [<c079d44d>] kernel_init+0x4d/0x15a
                               [<c04034e7>] kernel_thread_helper+0x7/0x10
                               [<ffffffff>] 0xffffffff
     }
     ... key      at: [<c08234b8>] __key.20063+0x0/0x8
    ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c0440b98>] lock_hrtimer_base+0x1d/0x38
   [<c0440ca9>] __hrtimer_start_range_ns+0x1f/0x232
   [<c0421ab1>] __enqueue_rt_entity+0x1a5/0x1c8
   [<c0421e18>] enqueue_rt_entity+0x19/0x23
   [<c0428a52>] enqueue_task_rt+0x24/0x51
   [<c041e03b>] enqueue_task+0x64/0x70
   [<c041e06b>] activate_task+0x24/0x2a
   [<c0426f9e>] try_to_wake_up+0x1d2/0x2d4
   [<c04270d7>] wake_up_process+0x14/0x16
   [<c062b86b>] migration_call+0xb9/0x444
   [<c07ae130>] migration_init+0x3b/0x4b
   [<c040115c>] do_one_initcall+0x6a/0x16e
   [<c079d44d>] kernel_init+0x4d/0x15a
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

    -> (&rt_rq->rt_runtime_lock){-.....} ops: 17587 {
       IN-HARDIRQ-W at:
                                [<c044d9e4>] __lock_acquire+0x49b/0x73e
                                [<c044dd36>] lock_acquire+0xaf/0xcc
                                [<c0630340>] _spin_lock+0x2a/0x39
                                [<c0421efc>] sched_rt_period_timer+0xda/0x24e
                                [<c0440922>] __run_hrtimer+0x68/0x98
                                [<c04411ca>] hrtimer_interrupt+0x101/0x153
                                [<c063406e>] smp_apic_timer_interrupt+0x6e/0x81
                                [<c04033c7>] apic_timer_interrupt+0x2f/0x34
                                [<c0452203>] each_symbol_in_section+0x27/0x57
                                [<c045225a>] each_symbol+0x27/0x113
                                [<c0452373>] find_symbol+0x2d/0x51
                                [<c0454a7a>] load_module+0xaec/0x10eb
                                [<c04550bf>] sys_init_module+0x46/0x19b
                                [<c0402a68>] sysenter_do_call+0x12/0x36
                                [<ffffffff>] 0xffffffff
       INITIAL USE at:
                               [<c044dad5>] __lock_acquire+0x58c/0x73e
                               [<c044dd36>] lock_acquire+0xaf/0xcc
                               [<c0630340>] _spin_lock+0x2a/0x39
                               [<c0421c41>] update_curr_rt+0x13a/0x20d
                               [<c0421dd8>] dequeue_task_rt+0x13/0x3a
                               [<c041df9e>] dequeue_task+0xff/0x10e
                               [<c041dfd1>] deactivate_task+0x24/0x2a
                               [<c062db54>] __schedule+0x162/0x991
                               [<c062e39a>] schedule+0x17/0x30
                               [<c0426c54>] migration_thread+0x175/0x203
                               [<c043e204>] kthread+0x4a/0x72
                               [<c04034e7>] kernel_thread_helper+0x7/0x10
                               [<ffffffff>] 0xffffffff
     }
     ... key      at: [<c080050c>] __key.46863+0x0/0x8
    ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630340>] _spin_lock+0x2a/0x39
   [<c041ee73>] __enable_runtime+0x43/0xb3
   [<c04216d8>] rq_online_rt+0x44/0x61
   [<c041dd3a>] set_rq_online+0x40/0x4a
   [<c062b8a5>] migration_call+0xf3/0x444
   [<c063291c>] notifier_call_chain+0x2b/0x4a
   [<c0441e22>] __raw_notifier_call_chain+0x13/0x15
   [<c0441e35>] raw_notifier_call_chain+0x11/0x13
   [<c062bd2f>] _cpu_up+0xc3/0xf6
   [<c062bdac>] cpu_up+0x4a/0x5a
   [<c079d49a>] kernel_init+0x9a/0x15a
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

   ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630340>] _spin_lock+0x2a/0x39
   [<c0421a75>] __enqueue_rt_entity+0x169/0x1c8
   [<c0421e18>] enqueue_rt_entity+0x19/0x23
   [<c0428a52>] enqueue_task_rt+0x24/0x51
   [<c041e03b>] enqueue_task+0x64/0x70
   [<c041e06b>] activate_task+0x24/0x2a
   [<c0426f9e>] try_to_wake_up+0x1d2/0x2d4
   [<c04270d7>] wake_up_process+0x14/0x16
   [<c062b86b>] migration_call+0xb9/0x444
   [<c07ae130>] migration_init+0x3b/0x4b
   [<c040115c>] do_one_initcall+0x6a/0x16e
   [<c079d44d>] kernel_init+0x4d/0x15a
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

   ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630340>] _spin_lock+0x2a/0x39
   [<c0421c41>] update_curr_rt+0x13a/0x20d
   [<c0421dd8>] dequeue_task_rt+0x13/0x3a
   [<c041df9e>] dequeue_task+0xff/0x10e
   [<c041dfd1>] deactivate_task+0x24/0x2a
   [<c062db54>] __schedule+0x162/0x991
   [<c062e39a>] schedule+0x17/0x30
   [<c0426c54>] migration_thread+0x175/0x203
   [<c043e204>] kthread+0x4a/0x72
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

   -> (&sig->cputimer.lock){......} ops: 1949 {
      INITIAL USE at:
                             [<c044dad5>] __lock_acquire+0x58c/0x73e
                             [<c044dd36>] lock_acquire+0xaf/0xcc
                             [<c063056b>] _spin_lock_irqsave+0x33/0x43
                             [<c043f03e>] thread_group_cputimer+0x29/0x90
                             [<c044004c>] posix_cpu_timers_exit_group+0x16/0x39
                             [<c042e5f0>] release_task+0xa2/0x376
                             [<c042fbe1>] do_exit+0x548/0x5b3
                             [<c043a9d8>] __request_module+0x0/0x100
                             [<c04034e7>] kernel_thread_helper+0x7/0x10
                             [<ffffffff>] 0xffffffff
    }
    ... key      at: [<c08014ac>] __key.15480+0x0/0x8
   ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630340>] _spin_lock+0x2a/0x39
   [<c041f43a>] update_curr+0xef/0x107
   [<c042131b>] enqueue_entity+0x1a/0x1c6
   [<c0421535>] enqueue_task_fair+0x24/0x3e
   [<c041e03b>] enqueue_task+0x64/0x70
   [<c041e06b>] activate_task+0x24/0x2a
   [<c0426f9e>] try_to_wake_up+0x1d2/0x2d4
   [<c04270b0>] default_wake_function+0x10/0x12
   [<c041d785>] __wake_up_common+0x34/0x5f
   [<c041ec26>] complete+0x30/0x43
   [<c043e1e8>] kthread+0x2e/0x72
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

   -> (&rq->lock/1){..-...} ops: 3217 {
      IN-SOFTIRQ-W at:
                              [<c044da08>] __lock_acquire+0x4bf/0x73e
                              [<c044dd36>] lock_acquire+0xaf/0xcc
                              [<c0630305>] _spin_lock_nested+0x2d/0x3e
                              [<c0422cb4>] double_rq_lock+0x4b/0x7d
                              [<c0427274>] rebalance_domains+0x19b/0x3ac
                              [<c0429a06>] run_rebalance_domains+0x32/0xaa
                              [<c0431379>] __do_softirq+0xb8/0x180
                              [<ffffffff>] 0xffffffff
      INITIAL USE at:
                             [<c044dad5>] __lock_acquire+0x58c/0x73e
                             [<c044dd36>] lock_acquire+0xaf/0xcc
                             [<c0630305>] _spin_lock_nested+0x2d/0x3e
                             [<c0422cb4>] double_rq_lock+0x4b/0x7d
                             [<c0427274>] rebalance_domains+0x19b/0x3ac
                             [<c0429a06>] run_rebalance_domains+0x32/0xaa
                             [<c0431379>] __do_softirq+0xb8/0x180
                             [<ffffffff>] 0xffffffff
    }
    ... key      at: [<c0800519>] __key.46938+0x1/0x8
    ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630340>] _spin_lock+0x2a/0x39
   [<c0421c41>] update_curr_rt+0x13a/0x20d
   [<c0421dd8>] dequeue_task_rt+0x13/0x3a
   [<c041df9e>] dequeue_task+0xff/0x10e
   [<c041dfd1>] deactivate_task+0x24/0x2a
   [<c0427b1b>] push_rt_task+0x189/0x1f7
   [<c0427b9b>] push_rt_tasks+0x12/0x19
   [<c0427bb9>] post_schedule_rt+0x17/0x21
   [<c0425a68>] finish_task_switch+0x83/0xc0
   [<c062e339>] __schedule+0x947/0x991
   [<c062e39a>] schedule+0x17/0x30
   [<c0426c54>] migration_thread+0x175/0x203
   [<c043e204>] kthread+0x4a/0x72
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

    ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c047ad3b>] cpupri_set+0x51/0xba
   [<c04219ee>] __enqueue_rt_entity+0xe2/0x1c8
   [<c0421e18>] enqueue_rt_entity+0x19/0x23
   [<c0428a52>] enqueue_task_rt+0x24/0x51
   [<c041e03b>] enqueue_task+0x64/0x70
   [<c041e06b>] activate_task+0x24/0x2a
   [<c0427b33>] push_rt_task+0x1a1/0x1f7
   [<c0427b9b>] push_rt_tasks+0x12/0x19
   [<c0427bb9>] post_schedule_rt+0x17/0x21
   [<c0425a68>] finish_task_switch+0x83/0xc0
   [<c062e339>] __schedule+0x947/0x991
   [<c062e39a>] schedule+0x17/0x30
   [<c0426c54>] migration_thread+0x175/0x203
   [<c043e204>] kthread+0x4a/0x72
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

   ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630305>] _spin_lock_nested+0x2d/0x3e
   [<c0422cb4>] double_rq_lock+0x4b/0x7d
   [<c0427274>] rebalance_domains+0x19b/0x3ac
   [<c0429a06>] run_rebalance_domains+0x32/0xaa
   [<c0431379>] __do_softirq+0xb8/0x180
   [<ffffffff>] 0xffffffff

  ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630340>] _spin_lock+0x2a/0x39
   [<c041ede7>] task_rq_lock+0x3b/0x62
   [<c0426e41>] try_to_wake_up+0x75/0x2d4
   [<c04270b0>] default_wake_function+0x10/0x12
   [<c041d785>] __wake_up_common+0x34/0x5f
   [<c041ec26>] complete+0x30/0x43
   [<c043e0cc>] kthreadd+0xbc/0xe3
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

  -> (&ep->lock){......} ops: 110 {
     INITIAL USE at:
                           [<c044dad5>] __lock_acquire+0x58c/0x73e
                           [<c044dd36>] lock_acquire+0xaf/0xcc
                           [<c063056b>] _spin_lock_irqsave+0x33/0x43
                           [<c04ca381>] sys_epoll_ctl+0x232/0x3f6
                           [<c0402a68>] sysenter_do_call+0x12/0x36
                           [<ffffffff>] 0xffffffff
   }
   ... key      at: [<c0c5be90>] __key.22301+0x0/0x10
   ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630340>] _spin_lock+0x2a/0x39
   [<c041ede7>] task_rq_lock+0x3b/0x62
   [<c0426e41>] try_to_wake_up+0x75/0x2d4
   [<c04270b0>] default_wake_function+0x10/0x12
   [<c041d785>] __wake_up_common+0x34/0x5f
   [<c041d7c6>] __wake_up_locked+0x16/0x1a
   [<c04ca7f5>] ep_poll_callback+0x7c/0xb6
   [<c041d785>] __wake_up_common+0x34/0x5f
   [<c041ec70>] __wake_up_sync_key+0x37/0x4a
   [<c05cbefa>] sock_def_readable+0x42/0x71
   [<c061c8b1>] unix_stream_connect+0x2f3/0x368
   [<c05c830a>] sys_connect+0x59/0x76
   [<c05c963f>] sys_socketcall+0x76/0x172
   [<c0402a68>] sysenter_do_call+0x12/0x36
   [<ffffffff>] 0xffffffff

  ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c04ca797>] ep_poll_callback+0x1e/0xb6
   [<c041d785>] __wake_up_common+0x34/0x5f
   [<c041ec70>] __wake_up_sync_key+0x37/0x4a
   [<c05cbefa>] sock_def_readable+0x42/0x71
   [<c061c8b1>] unix_stream_connect+0x2f3/0x368
   [<c05c830a>] sys_connect+0x59/0x76
   [<c05c963f>] sys_socketcall+0x76/0x172
   [<c0402a68>] sysenter_do_call+0x12/0x36
   [<ffffffff>] 0xffffffff

 ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c041ec0d>] complete+0x17/0x43
   [<c0509cf2>] blk_end_sync_rq+0x2a/0x2d
   [<c0506935>] end_that_request_last+0x17b/0x1a1
   [<c0506a0d>] blk_end_io+0x51/0x6f
   [<c0506a64>] blk_end_request+0x11/0x13
   [<f8106c9c>] scsi_io_completion+0x1d9/0x41f [scsi_mod]
   [<f810152d>] scsi_finish_command+0xcc/0xd4 [scsi_mod]
   [<f8106fdb>] scsi_softirq_done+0xf9/0x101 [scsi_mod]
   [<c050a936>] blk_done_softirq+0x5e/0x70
   [<c0431379>] __do_softirq+0xb8/0x180
   [<ffffffff>] 0xffffffff

 -> (&n->list_lock){..-...} ops: 49241 {
    IN-SOFTIRQ-W at:
                          [<c044da08>] __lock_acquire+0x4bf/0x73e
                          [<c044dd36>] lock_acquire+0xaf/0xcc
                          [<c0630340>] _spin_lock+0x2a/0x39
                          [<c049bd18>] add_partial+0x16/0x40
                          [<c049d0d4>] __slab_free+0x96/0x28f
                          [<c049df5c>] kmem_cache_free+0x8c/0xf2
                          [<c04a5ce9>] file_free_rcu+0x35/0x38
                          [<c0461a12>] rcu_process_callbacks+0x62/0x86
                          [<c0431379>] __do_softirq+0xb8/0x180
                          [<ffffffff>] 0xffffffff
    INITIAL USE at:
                         [<c044dad5>] __lock_acquire+0x58c/0x73e
                         [<c044dd36>] lock_acquire+0xaf/0xcc
                         [<c0630340>] _spin_lock+0x2a/0x39
                         [<c049bd18>] add_partial+0x16/0x40
                         [<c049d0d4>] __slab_free+0x96/0x28f
                         [<c049df5c>] kmem_cache_free+0x8c/0xf2
                         [<c0514eda>] ida_get_new_above+0x13b/0x155
                         [<c0514f00>] ida_get_new+0xc/0xe
                         [<c04a628b>] set_anon_super+0x39/0xa3
                         [<c04a68c6>] sget+0x2f3/0x386
                         [<c04a7365>] get_sb_single+0x24/0x8f
                         [<c04e034c>] sysfs_get_sb+0x18/0x1a
                         [<c04a6dd1>] vfs_kern_mount+0x40/0x7b
                         [<c04a6e21>] kern_mount_data+0x15/0x17
                         [<c07b5ff6>] sysfs_init+0x50/0x9c
                         [<c07b4ac9>] mnt_init+0x8c/0x1e4
                         [<c07b4737>] vfs_caches_init+0xd8/0xea
                         [<c079d815>] start_kernel+0x2bb/0x2fc
                         [<c079d06a>] __init_begin+0x6a/0x6f
                         [<ffffffff>] 0xffffffff
  }
  ... key      at: [<c0c5a424>] __key.25358+0x0/0x8
 ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630340>] _spin_lock+0x2a/0x39
   [<c049cc45>] __slab_alloc+0xf6/0x4ef
   [<c049d333>] kmem_cache_alloc+0x66/0x11f
   [<f810189b>] scsi_pool_alloc_command+0x20/0x4c [scsi_mod]
   [<f81018de>] scsi_host_alloc_command+0x17/0x4f [scsi_mod]
   [<f810192b>] __scsi_get_command+0x15/0x71 [scsi_mod]
   [<f8101c41>] scsi_get_command+0x39/0x95 [scsi_mod]
   [<f81062b6>] scsi_get_cmd_from_req+0x26/0x50 [scsi_mod]
   [<f8106594>] scsi_setup_blk_pc_cmnd+0x2b/0xd7 [scsi_mod]
   [<f8106664>] scsi_prep_fn+0x24/0x33 [scsi_mod]
   [<c0504712>] elv_next_request+0xe6/0x18d
   [<f810704c>] scsi_request_fn+0x69/0x431 [scsi_mod]
   [<c05072af>] __generic_unplug_device+0x2e/0x31
   [<c0509d59>] blk_execute_rq_nowait+0x64/0x86
   [<c0509e2e>] blk_execute_rq+0xb3/0xd5
   [<f81068f5>] scsi_execute+0xc5/0x11c [scsi_mod]
   [<f81069ff>] scsi_execute_req+0xb3/0x104 [scsi_mod]
   [<f812b40d>] sd_revalidate_disk+0x1a3/0xf64 [sd_mod]
   [<f812d52f>] sd_probe_async+0x146/0x22d [sd_mod]
   [<c044341f>] async_thread+0xe9/0x1c9
   [<c043e204>] kthread+0x4a/0x72
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

 -> (&cwq->lock){-.-...} ops: 30335 {
    IN-HARDIRQ-W at:
                          [<c044d9e4>] __lock_acquire+0x49b/0x73e
                          [<c044dd36>] lock_acquire+0xaf/0xcc
                          [<c063056b>] _spin_lock_irqsave+0x33/0x43
                          [<c043b54b>] __queue_work+0x14/0x30
                          [<c043b5ce>] queue_work_on+0x3a/0x46
                          [<c043b617>] queue_work+0x26/0x4a
                          [<c043b64f>] schedule_work+0x14/0x16
                          [<c057a367>] schedule_console_callback+0x12/0x14
                          [<c05788ed>] kbd_event+0x595/0x600
                          [<c05b3d15>] input_pass_event+0x56/0x7e
                          [<c05b4702>] input_handle_event+0x314/0x334
                          [<c05b4f1e>] input_event+0x50/0x63
                          [<c05b9bd4>] atkbd_interrupt+0x209/0x4e9
                          [<c05b1793>] serio_interrupt+0x38/0x6e
                          [<c05b24e8>] i8042_interrupt+0x1db/0x1ec
                          [<c045e922>] handle_IRQ_event+0xa4/0x169
                          [<c04602ea>] handle_edge_irq+0xc9/0x10a
                          [<ffffffff>] 0xffffffff
    IN-SOFTIRQ-W at:
                          [<c044da08>] __lock_acquire+0x4bf/0x73e
                          [<c044dd36>] lock_acquire+0xaf/0xcc
                          [<c063056b>] _spin_lock_irqsave+0x33/0x43
                          [<c043b54b>] __queue_work+0x14/0x30
                          [<c043b590>] delayed_work_timer_fn+0x29/0x2d
                          [<c0434caa>] run_timer_softirq+0x15b/0x1d1
                          [<c0431379>] __do_softirq+0xb8/0x180
                          [<ffffffff>] 0xffffffff
    INITIAL USE at:
                         [<c044dad5>] __lock_acquire+0x58c/0x73e
                         [<c044dd36>] lock_acquire+0xaf/0xcc
                         [<c063056b>] _spin_lock_irqsave+0x33/0x43
                         [<c043b54b>] __queue_work+0x14/0x30
                         [<c043b5ce>] queue_work_on+0x3a/0x46
                         [<c043b617>] queue_work+0x26/0x4a
                         [<c043a7b3>] call_usermodehelper_exec+0x83/0xd0
                         [<c051631a>] kobject_uevent_env+0x351/0x385
                         [<c0516358>] kobject_uevent+0xa/0xc
                         [<c0515a0e>] kset_register+0x2e/0x34
                         [<c0590f18>] bus_register+0xed/0x23d
                         [<c07bea09>] platform_bus_init+0x23/0x38
                         [<c07beb77>] driver_init+0x1c/0x28
                         [<c079d4f6>] kernel_init+0xf6/0x15a
                         [<c04034e7>] kernel_thread_helper+0x7/0x10
                         [<ffffffff>] 0xffffffff
  }
  ... key      at: [<c08230a8>] __key.23814+0x0/0x8
  -> (&workqueue_cpu_stat(cpu)->lock){-.-...} ops: 20397 {
     IN-HARDIRQ-W at:
                            [<c044d9e4>] __lock_acquire+0x49b/0x73e
                            [<c044dd36>] lock_acquire+0xaf/0xcc
                            [<c063056b>] _spin_lock_irqsave+0x33/0x43
                            [<c0474909>] probe_workqueue_insertion+0x33/0x81
                            [<c043acf3>] insert_work+0x3f/0x9b
                            [<c043b559>] __queue_work+0x22/0x30
                            [<c043b5ce>] queue_work_on+0x3a/0x46
                            [<c043b617>] queue_work+0x26/0x4a
                            [<c043b64f>] schedule_work+0x14/0x16
                            [<c057a367>] schedule_console_callback+0x12/0x14
                            [<c05788ed>] kbd_event+0x595/0x600
                            [<c05b3d15>] input_pass_event+0x56/0x7e
                            [<c05b4702>] input_handle_event+0x314/0x334
                            [<c05b4f1e>] input_event+0x50/0x63
                            [<c05b9bd4>] atkbd_interrupt+0x209/0x4e9
                            [<c05b1793>] serio_interrupt+0x38/0x6e
                            [<c05b24e8>] i8042_interrupt+0x1db/0x1ec
                            [<c045e922>] handle_IRQ_event+0xa4/0x169
                            [<c04602ea>] handle_edge_irq+0xc9/0x10a
                            [<ffffffff>] 0xffffffff
     IN-SOFTIRQ-W at:
                            [<c044da08>] __lock_acquire+0x4bf/0x73e
                            [<c044dd36>] lock_acquire+0xaf/0xcc
                            [<c063056b>] _spin_lock_irqsave+0x33/0x43
                            [<c0474909>] probe_workqueue_insertion+0x33/0x81
                            [<c043acf3>] insert_work+0x3f/0x9b
                            [<c043b559>] __queue_work+0x22/0x30
                            [<c043b590>] delayed_work_timer_fn+0x29/0x2d
                            [<c0434caa>] run_timer_softirq+0x15b/0x1d1
                            [<c0431379>] __do_softirq+0xb8/0x180
                            [<ffffffff>] 0xffffffff
     INITIAL USE at:
                           [<c044dad5>] __lock_acquire+0x58c/0x73e
                           [<c044dd36>] lock_acquire+0xaf/0xcc
                           [<c063056b>] _spin_lock_irqsave+0x33/0x43
                           [<c04747eb>] probe_workqueue_creation+0xc9/0x10a
                           [<c043abcb>] create_workqueue_thread+0x87/0xb0
                           [<c043b12f>] __create_workqueue_key+0x16d/0x1b2
                           [<c07aeedb>] init_workqueues+0x61/0x73
                           [<c079d4e7>] kernel_init+0xe7/0x15a
                           [<c04034e7>] kernel_thread_helper+0x7/0x10
                           [<ffffffff>] 0xffffffff
   }
   ... key      at: [<c0c52574>] __key.23424+0x0/0x8
  ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c0474909>] probe_workqueue_insertion+0x33/0x81
   [<c043acf3>] insert_work+0x3f/0x9b
   [<c043b559>] __queue_work+0x22/0x30
   [<c043b5ce>] queue_work_on+0x3a/0x46
   [<c043b617>] queue_work+0x26/0x4a
   [<c043a7b3>] call_usermodehelper_exec+0x83/0xd0
   [<c051631a>] kobject_uevent_env+0x351/0x385
   [<c0516358>] kobject_uevent+0xa/0xc
   [<c0515a0e>] kset_register+0x2e/0x34
   [<c0590f18>] bus_register+0xed/0x23d
   [<c07bea09>] platform_bus_init+0x23/0x38
   [<c07beb77>] driver_init+0x1c/0x28
   [<c079d4f6>] kernel_init+0xf6/0x15a
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

  ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c041ecaf>] __wake_up+0x1a/0x40
   [<c043ad46>] insert_work+0x92/0x9b
   [<c043b559>] __queue_work+0x22/0x30
   [<c043b5ce>] queue_work_on+0x3a/0x46
   [<c043b617>] queue_work+0x26/0x4a
   [<c043a7b3>] call_usermodehelper_exec+0x83/0xd0
   [<c051631a>] kobject_uevent_env+0x351/0x385
   [<c0516358>] kobject_uevent+0xa/0xc
   [<c0515a0e>] kset_register+0x2e/0x34
   [<c0590f18>] bus_register+0xed/0x23d
   [<c07bea09>] platform_bus_init+0x23/0x38
   [<c07beb77>] driver_init+0x1c/0x28
   [<c079d4f6>] kernel_init+0xf6/0x15a
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

 ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c043b54b>] __queue_work+0x14/0x30
   [<c043b5ce>] queue_work_on+0x3a/0x46
   [<c043b617>] queue_work+0x26/0x4a
   [<c0505679>] kblockd_schedule_work+0x12/0x14
   [<c05113bb>] elv_schedule_dispatch+0x41/0x48
   [<c0513377>] elv_ioq_completed_request+0x2dc/0x2fe
   [<c05045aa>] elv_completed_request+0x48/0x97
   [<c0506738>] __blk_put_request+0x36/0xb8
   [<c0506953>] end_that_request_last+0x199/0x1a1
   [<c0506a0d>] blk_end_io+0x51/0x6f
   [<c0506a64>] blk_end_request+0x11/0x13
   [<f8106c9c>] scsi_io_completion+0x1d9/0x41f [scsi_mod]
   [<f810152d>] scsi_finish_command+0xcc/0xd4 [scsi_mod]
   [<f8106fdb>] scsi_softirq_done+0xf9/0x101 [scsi_mod]
   [<c050a936>] blk_done_softirq+0x5e/0x70
   [<c0431379>] __do_softirq+0xb8/0x180
   [<ffffffff>] 0xffffffff

 -> (&zone->lock){..-...} ops: 80266 {
    IN-SOFTIRQ-W at:
                          [<c044da08>] __lock_acquire+0x4bf/0x73e
                          [<c044dd36>] lock_acquire+0xaf/0xcc
                          [<c0630340>] _spin_lock+0x2a/0x39
                          [<c047fc71>] __free_pages_ok+0x167/0x321
                          [<c04800ce>] __free_pages+0x29/0x2b
                          [<c049c7c1>] __free_slab+0xb2/0xba
                          [<c049c800>] discard_slab+0x37/0x39
                          [<c049d15c>] __slab_free+0x11e/0x28f
                          [<c049df5c>] kmem_cache_free+0x8c/0xf2
                          [<c042ab6e>] free_task+0x31/0x34
                          [<c042c37b>] __put_task_struct+0xd3/0xd8
                          [<c042e072>] delayed_put_task_struct+0x60/0x64
                          [<c0461a12>] rcu_process_callbacks+0x62/0x86
                          [<c0431379>] __do_softirq+0xb8/0x180
                          [<ffffffff>] 0xffffffff
    INITIAL USE at:
                         [<c044dad5>] __lock_acquire+0x58c/0x73e
                         [<c044dd36>] lock_acquire+0xaf/0xcc
                         [<c0630340>] _spin_lock+0x2a/0x39
                         [<c047f7b6>] free_pages_bulk+0x21/0x1a1
                         [<c047ffcf>] free_hot_cold_page+0x181/0x20f
                         [<c04800a3>] free_hot_page+0xf/0x11
                         [<c04800c5>] __free_pages+0x20/0x2b
                         [<c07c4d96>] __free_pages_bootmem+0x6d/0x71
                         [<c07b2244>] free_all_bootmem_core+0xd2/0x177
                         [<c07b22f6>] free_all_bootmem+0xd/0xf
                         [<c07ad21a>] mem_init+0x28/0x28c
                         [<c079d7b1>] start_kernel+0x257/0x2fc
                         [<c079d06a>] __init_begin+0x6a/0x6f
                         [<ffffffff>] 0xffffffff
  }
  ... key      at: [<c0c52628>] __key.30749+0x0/0x8
 ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c048035e>] get_page_from_freelist+0x236/0x3e3
   [<c04805f4>] __alloc_pages_internal+0xce/0x371
   [<c049cce6>] __slab_alloc+0x197/0x4ef
   [<c049d333>] kmem_cache_alloc+0x66/0x11f
   [<c047d96b>] mempool_alloc_slab+0x13/0x15
   [<c047da5c>] mempool_alloc+0x3a/0xd5
   [<f81063cc>] scsi_sg_alloc+0x47/0x4a [scsi_mod]
   [<c051cd02>] __sg_alloc_table+0x48/0xc7
   [<f8106325>] scsi_init_sgtable+0x2c/0x8c [scsi_mod]
   [<f81064e7>] scsi_init_io+0x19/0x9b [scsi_mod]
   [<f8106abf>] scsi_setup_fs_cmnd+0x6f/0x73 [scsi_mod]
   [<f812ca73>] sd_prep_fn+0x6a/0x7d4 [sd_mod]
   [<c0504712>] elv_next_request+0xe6/0x18d
   [<f810704c>] scsi_request_fn+0x69/0x431 [scsi_mod]
   [<c05072af>] __generic_unplug_device+0x2e/0x31
   [<c05072db>] blk_start_queueing+0x29/0x2b
   [<c05137b8>] elv_ioq_request_add+0x2be/0x393
   [<c05048cd>] elv_insert+0x114/0x1a2
   [<c05049ec>] __elv_add_request+0x91/0x96
   [<c0507a00>] __make_request+0x365/0x397
   [<c050635a>] generic_make_request+0x342/0x3ce
   [<c0507b21>] submit_bio+0xef/0xfa
   [<c04c6c4e>] mpage_bio_submit+0x21/0x26
   [<c04c7b7f>] mpage_readpages+0xa3/0xad
   [<f80c1ea8>] ext3_readpages+0x19/0x1b [ext3]
   [<c048275e>] __do_page_cache_readahead+0xfd/0x166
   [<c0482b42>] do_page_cache_readahead+0x44/0x52
   [<c047d665>] filemap_fault+0x197/0x3ae
   [<c048b9ea>] __do_fault+0x40/0x37b
   [<c048d43f>] handle_mm_fault+0x2bb/0x646
   [<c063273c>] do_page_fault+0x29c/0x2fd
   [<c0630b4a>] error_code+0x72/0x78
   [<ffffffff>] 0xffffffff

 -> (&page_address_htable[i].lock){......} ops: 6802 {
    INITIAL USE at:
                         [<c044dad5>] __lock_acquire+0x58c/0x73e
                         [<c044dd36>] lock_acquire+0xaf/0xcc
                         [<c063056b>] _spin_lock_irqsave+0x33/0x43
                         [<c048af69>] page_address+0x50/0xa6
                         [<c048b0e7>] kmap_high+0x21/0x175
                         [<c041b7ef>] kmap+0x4e/0x5b
                         [<c04abb36>] page_getlink+0x37/0x59
                         [<c04abb75>] page_follow_link_light+0x1d/0x2b
                         [<c04ad4d0>] __link_path_walk+0x3d1/0xa71
                         [<c04adbae>] path_walk+0x3e/0x77
                         [<c04add0e>] do_path_lookup+0xeb/0x105
                         [<c04ae6f2>] path_lookup_open+0x48/0x7a
                         [<c04a8e96>] open_exec+0x25/0xf4
                         [<c04a9c2d>] do_execve+0xfa/0x2cc
                         [<c04015c0>] sys_execve+0x2b/0x54
                         [<c0402ae9>] syscall_call+0x7/0xb
                         [<ffffffff>] 0xffffffff
  }
  ... key      at: [<c0c5288c>] __key.28547+0x0/0x14
 ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c048af69>] page_address+0x50/0xa6
   [<c05078a1>] __make_request+0x206/0x397
   [<c050635a>] generic_make_request+0x342/0x3ce
   [<c0507b21>] submit_bio+0xef/0xfa
   [<c04c6c4e>] mpage_bio_submit+0x21/0x26
   [<c04c78b8>] do_mpage_readpage+0x471/0x5e5
   [<c04c7b55>] mpage_readpages+0x79/0xad
   [<f80c1ea8>] ext3_readpages+0x19/0x1b [ext3]
   [<c048275e>] __do_page_cache_readahead+0xfd/0x166
   [<c0482b42>] do_page_cache_readahead+0x44/0x52
   [<c047d665>] filemap_fault+0x197/0x3ae
   [<c048b9ea>] __do_fault+0x40/0x37b
   [<c048d43f>] handle_mm_fault+0x2bb/0x646
   [<c063273c>] do_page_fault+0x29c/0x2fd
   [<c0630b4a>] error_code+0x72/0x78
   [<ffffffff>] 0xffffffff

 ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630340>] _spin_lock+0x2a/0x39
   [<c046143d>] call_rcu+0x36/0x5b
   [<c050f0c8>] cfq_cic_free+0x15/0x17
   [<c050f128>] cic_free_func+0x5e/0x64
   [<c050ea90>] __call_for_each_cic+0x23/0x2e
   [<c050eaad>] cfq_free_io_context+0x12/0x14
   [<c050978c>] put_io_context+0x4b/0x66
   [<c050f00a>] cfq_active_ioq_reset+0x21/0x39
   [<c0511044>] elv_reset_active_ioq+0x2b/0x3e
   [<c0512ecf>] __elv_ioq_slice_expired+0x238/0x26a
   [<c0512f1f>] elv_ioq_slice_expired+0x1e/0x20
   [<c0513860>] elv_ioq_request_add+0x366/0x393
   [<c05048cd>] elv_insert+0x114/0x1a2
   [<c05049ec>] __elv_add_request+0x91/0x96
   [<c0507a00>] __make_request+0x365/0x397
   [<c050635a>] generic_make_request+0x342/0x3ce
   [<c0507b21>] submit_bio+0xef/0xfa
   [<c04bf495>] submit_bh+0xe3/0x102
   [<c04c04b0>] ll_rw_block+0xbe/0xf7
   [<f80c35ba>] ext3_bread+0x39/0x79 [ext3]
   [<f80c5643>] dx_probe+0x2f/0x298 [ext3]
   [<f80c5956>] ext3_find_entry+0xaa/0x573 [ext3]
   [<f80c739e>] ext3_lookup+0x31/0xbe [ext3]
   [<c04abf7c>] do_lookup+0xbc/0x159
   [<c04ad7e8>] __link_path_walk+0x6e9/0xa71
   [<c04adbae>] path_walk+0x3e/0x77
   [<c04add0e>] do_path_lookup+0xeb/0x105
   [<c04ae584>] user_path_at+0x41/0x6c
   [<c04a8301>] vfs_fstatat+0x32/0x59
   [<c04a8417>] vfs_stat+0x18/0x1a
   [<c04a8432>] sys_stat64+0x19/0x2d
   [<c0402a68>] sysenter_do_call+0x12/0x36
   [<ffffffff>] 0xffffffff

 -> (&iocg->lock){+.+...} ops: 3 {
    HARDIRQ-ON-W at:
                          [<c044b840>] mark_held_locks+0x3d/0x58
                          [<c044b963>] trace_hardirqs_on_caller+0x108/0x14c
                          [<c044b9b2>] trace_hardirqs_on+0xb/0xd
                          [<c0630883>] _spin_unlock_irq+0x27/0x47
                          [<c0513baa>] iocg_destroy+0xbc/0x118
                          [<c045a16a>] cgroup_diput+0x4b/0xa7
                          [<c04b1dbb>] dentry_iput+0x78/0x9c
                          [<c04b1e82>] d_kill+0x21/0x3b
                          [<c04b2f2a>] dput+0xf3/0xfc
                          [<c04ae226>] do_rmdir+0x9a/0xc8
                          [<c04ae29d>] sys_rmdir+0x15/0x17
                          [<c0402a68>] sysenter_do_call+0x12/0x36
                          [<ffffffff>] 0xffffffff
    SOFTIRQ-ON-W at:
                          [<c044b840>] mark_held_locks+0x3d/0x58
                          [<c044b97c>] trace_hardirqs_on_caller+0x121/0x14c
                          [<c044b9b2>] trace_hardirqs_on+0xb/0xd
                          [<c0630883>] _spin_unlock_irq+0x27/0x47
                          [<c0513baa>] iocg_destroy+0xbc/0x118
                          [<c045a16a>] cgroup_diput+0x4b/0xa7
                          [<c04b1dbb>] dentry_iput+0x78/0x9c
                          [<c04b1e82>] d_kill+0x21/0x3b
                          [<c04b2f2a>] dput+0xf3/0xfc
                          [<c04ae226>] do_rmdir+0x9a/0xc8
                          [<c04ae29d>] sys_rmdir+0x15/0x17
                          [<c0402a68>] sysenter_do_call+0x12/0x36
                          [<ffffffff>] 0xffffffff
    INITIAL USE at:
                         [<c044dad5>] __lock_acquire+0x58c/0x73e
                         [<c044dd36>] lock_acquire+0xaf/0xcc
                         [<c06304ea>] _spin_lock_irq+0x30/0x3f
                         [<c05119bd>] io_alloc_root_group+0x104/0x155
                         [<c05133cb>] elv_init_fq_data+0x32/0xe0
                         [<c0504317>] elevator_alloc+0x150/0x170
                         [<c0505393>] elevator_init+0x9d/0x100
                         [<c0507088>] blk_init_queue_node+0xc4/0xf7
                         [<c05070cb>] blk_init_queue+0x10/0x12
                         [<f81060fd>] __scsi_alloc_queue+0x1c/0xba [scsi_mod]
                         [<f81061b0>] scsi_alloc_queue+0x15/0x4e [scsi_mod]
                         [<f810803d>] scsi_alloc_sdev+0x154/0x1f5 [scsi_mod]
                         [<f8108387>] scsi_probe_and_add_lun+0x123/0xb5b [scsi_mod]
                         [<f8109847>] __scsi_add_device+0x8a/0xb0 [scsi_mod]
                         [<f816ad14>] ata_scsi_scan_host+0x77/0x141 [libata]
                         [<f816903f>] async_port_probe+0xa0/0xa9 [libata]
                         [<c044341f>] async_thread+0xe9/0x1c9
                         [<c043e204>] kthread+0x4a/0x72
                         [<c04034e7>] kernel_thread_helper+0x7/0x10
                         [<ffffffff>] 0xffffffff
  }
  ... key      at: [<c0c5ebd8>] __key.29462+0x0/0x8
 ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c0510f6f>] io_group_chain_link+0x5c/0x106
   [<c0511ba7>] io_find_alloc_group+0x54/0x60
   [<c0511c11>] io_get_io_group_bio+0x5e/0x89
   [<c0511cc3>] io_group_get_request_list+0x12/0x21
   [<c0507485>] get_request_wait+0x124/0x15d
   [<c050797e>] __make_request+0x2e3/0x397
   [<c050635a>] generic_make_request+0x342/0x3ce
   [<c0507b21>] submit_bio+0xef/0xfa
   [<c04c6c4e>] mpage_bio_submit+0x21/0x26
   [<c04c7b7f>] mpage_readpages+0xa3/0xad
   [<f80c1ea8>] ext3_readpages+0x19/0x1b [ext3]
   [<c048275e>] __do_page_cache_readahead+0xfd/0x166
   [<c048294a>] ondemand_readahead+0x10a/0x118
   [<c04829db>] page_cache_sync_readahead+0x1b/0x20
   [<c047cf37>] generic_file_aio_read+0x226/0x545
   [<c04a4cf6>] do_sync_read+0xb0/0xee
   [<c04a54b0>] vfs_read+0x8f/0x136
   [<c04a8d7c>] kernel_read+0x39/0x4b
   [<c04a8e69>] prepare_binprm+0xdb/0xe3
   [<c04a9ca8>] do_execve+0x175/0x2cc
   [<c04015c0>] sys_execve+0x2b/0x54
   [<c0402a68>] sysenter_do_call+0x12/0x36
   [<ffffffff>] 0xffffffff


stack backtrace:
Pid: 2186, comm: rmdir Not tainted 2.6.30-rc4-io #6
Call Trace:
 [<c044b1ac>] print_irq_inversion_bug+0x13b/0x147
 [<c044c3e5>] check_usage_backwards+0x7d/0x86
 [<c044b5ec>] mark_lock+0x2d3/0x4ea
 [<c044c368>] ? check_usage_backwards+0x0/0x86
 [<c044b840>] mark_held_locks+0x3d/0x58
 [<c0630883>] ? _spin_unlock_irq+0x27/0x47
 [<c044b97c>] trace_hardirqs_on_caller+0x121/0x14c
 [<c044b9b2>] trace_hardirqs_on+0xb/0xd
 [<c0630883>] _spin_unlock_irq+0x27/0x47
 [<c0513baa>] iocg_destroy+0xbc/0x118
 [<c045a16a>] cgroup_diput+0x4b/0xa7
 [<c04b1dbb>] dentry_iput+0x78/0x9c
 [<c04b1e82>] d_kill+0x21/0x3b
 [<c04b2f2a>] dput+0xf3/0xfc
 [<c04ae226>] do_rmdir+0x9a/0xc8
 [<c04029b1>] ? resume_userspace+0x11/0x28
 [<c051aa14>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c0402b34>] ? restore_nocheck_notrace+0x0/0xe
 [<c06324a0>] ? do_page_fault+0x0/0x2fd
 [<c044b97c>] ? trace_hardirqs_on_caller+0x121/0x14c
 [<c04ae29d>] sys_rmdir+0x15/0x17
 [<c0402a68>] sysenter_do_call+0x12/0x36

[-- Attachment #4: Type: text/plain, Size: 206 bytes --]

_______________________________________________
Containers mailing list
Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06 16:10       ` Vivek Goyal
  (?)
@ 2009-05-07  5:36       ` Li Zefan
       [not found]         ` <4A027348.6000808-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
  -1 siblings, 1 reply; 97+ messages in thread
From: Li Zefan @ 2009-05-07  5:36 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Gui Jianfeng, nauman, dpshah, mikew, fchecconi, paolo.valente,
	jens.axboe, ryov, fernando, s-uchida, taka, jmoyer, dhaval,
	balbir, linux-kernel, containers, righi.andrea, agk, dm-devel,
	snitzer, m-ikeda, akpm

[-- Attachment #1: Type: text/plain, Size: 2886 bytes --]

Vivek Goyal wrote:
> On Wed, May 06, 2009 at 04:11:05PM +0800, Gui Jianfeng wrote:
>> Vivek Goyal wrote:
>>> Hi All,
>>>
>>> Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
>>> First version of the patches was posted here.
>> Hi Vivek,
>>
>> I did some simple test for V2, and triggered an kernel panic.
>> The following script can reproduce this bug. It seems that the cgroup
>> is already removed, but IO Controller still try to access into it.
>>
> 
> Hi Gui,
> 
> Thanks for the report. I use cgroup_path() for debugging. I guess that
> cgroup_path() was passed null cgrp pointer that's why it crashed.
> 
> If yes, then it is strange though. I call cgroup_path() only after
> grabbing a refenrece to css object. (I am assuming that if I have a valid
> reference to css object then css->cgrp can't be null).
> 

Yes, css->cgrp shouldn't be NULL.. I doubt we hit a bug in cgroup here.
The code dealing with css refcnt and cgroup rmdir has changed quite a lot,
and is much more complex than it was.

> Anyway, can you please try out following patch and see if it fixes your
> crash.
...
> BTW, I tried following equivalent script and I can't see the crash on 
> my system. Are you able to hit it regularly?
> 

I modified the script like this:

======================
#!/bin/sh
echo 1 > /proc/sys/vm/drop_caches
mkdir /cgroup 2> /dev/null
mount -t cgroup -o io,blkio io /cgroup
mkdir /cgroup/test1
mkdir /cgroup/test2
echo 100 > /cgroup/test1/io.weight
echo 500 > /cgroup/test2/io.weight

dd if=/dev/zero bs=4096 count=128000 of=500M.1 &
pid1=$!
echo $pid1 > /cgroup/test1/tasks

dd if=/dev/zero bs=4096 count=128000 of=500M.2 &
pid2=$!
echo $pid2 > /cgroup/test2/tasks

sleep 5
kill -9 $pid1
kill -9 $pid2

for ((;count != 2;))
{
        rmdir /cgroup/test1 > /dev/null 2>&1
        if [ $? -eq 0 ]; then
                count=$(( $count + 1 ))
        fi

        rmdir /cgroup/test2 > /dev/null 2>&1
        if [ $? -eq 0 ]; then
                count=$(( $count + 1 ))
        fi
}

umount /cgroup
rmdir /cgroup
======================

I ran this script and got lockdep BUG. Full log and my config are attached.

Actually this can be triggered with the following steps on my box:
# mount -t cgroup -o blkio,io xxx /mnt
# mkdir /mnt/0
# echo $$ > /mnt/0/tasks
# echo 3 > /proc/sys/vm/drop_cache
# echo $$ > /mnt/tasks
# rmdir /mnt/0

And when I ran the script for the second time, my box was freezed
and I had to reset it.

> Instead of killing the tasks I also tried moving the tasks into root cgroup
> and then deleting test1 and test2 groups, that also did not produce any crash.
> (Hit a different bug though after 5-6 attempts :-)
> 
> As I mentioned in the patchset, currently we do have issues with group
> refcounting and cgroup/group going away. Hopefully in next version they
> all should be fixed up. But still, it is nice to hear back...
> 

[-- Attachment #2: myconfig --]
[-- Type: text/plain, Size: 64514 bytes --]

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.30-rc4
# Thu May  7 09:11:29 2009
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
# CONFIG_GENERIC_TIME_VSYSCALL is not set
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_HAVE_DYNAMIC_PER_CPU_AREA=y
# CONFIG_HAVE_CPUMASK_OF_CPU_MAP is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_ZONE_DMA32 is not set
CONFIG_ARCH_POPULATES_NODE_MAP=y
# CONFIG_AUDIT_ARCH is not set
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
CONFIG_X86_32_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_TRAMPOLINE=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_KTIME_SCALAR=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_AUDIT is not set

#
# RCU Subsystem
#
# CONFIG_CLASSIC_RCU is not set
# CONFIG_TREE_RCU is not set
CONFIG_PREEMPT_RCU=y
CONFIG_RCU_TRACE=y
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_PREEMPT_RCU_TRACE=y
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_GROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
# CONFIG_USER_SCHED is not set
CONFIG_CGROUP_SCHED=y
CONFIG_CGROUPS=y
CONFIG_CGROUP_DEBUG=y
CONFIG_CGROUP_NS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_CGROUP_MEM_RES_CTLR=y
CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y
CONFIG_GROUP_IOSCHED=y
CONFIG_CGROUP_BLKIO=y
CONFIG_CGROUP_PAGE=y
CONFIG_MM_OWNER=y
CONFIG_SYSFS_DEPRECATED=y
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
CONFIG_USER_NS=y
CONFIG_PID_NS=y
# CONFIG_NET_NS is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_EXTRA_PASS=y
# CONFIG_STRIP_ASM_SYMS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PCI_QUIRKS=y
CONFIG_SLUB_DEBUG=y
CONFIG_COMPAT_BRK=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
CONFIG_MARKERS=y
CONFIG_OPROFILE=m
# CONFIG_OPROFILE_IBS is not set
CONFIG_HAVE_OPROFILE=y
CONFIG_KPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_API_DEBUG=y
# CONFIG_SLOW_WORK is not set
CONFIG_HAVE_GENERIC_DMA_COHERENT=y
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_LBD=y
CONFIG_BLK_DEV_BSG=y
# CONFIG_BLK_DEV_INTEGRITY is not set

#
# IO Schedulers
#
CONFIG_ELV_FAIR_QUEUING=y
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_NOOP_HIER=y
CONFIG_IOSCHED_AS=m
CONFIG_IOSCHED_AS_HIER=y
CONFIG_IOSCHED_DEADLINE=m
CONFIG_IOSCHED_DEADLINE_HIER=y
CONFIG_IOSCHED_CFQ=y
CONFIG_IOSCHED_CFQ_HIER=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_TRACK_ASYNC_CONTEXT=y
CONFIG_DEBUG_GROUP_IOSCHED=y
CONFIG_FREEZER=y

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
# CONFIG_SPARSE_IRQ is not set
CONFIG_X86_MPPARSE=y
# CONFIG_X86_BIGSMP is not set
CONFIG_X86_EXTENDED_PLATFORM=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_RDC321X is not set
# CONFIG_X86_32_NON_STANDARD is not set
CONFIG_SCHED_OMIT_FRAME_POINTER=y
# CONFIG_PARAVIRT_GUEST is not set
# CONFIG_MEMTEST is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
CONFIG_M686=y
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_GENERIC=y
CONFIG_X86_CPU=y
CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_INTERNODE_CACHE_BYTES=64
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_XADD=y
CONFIG_X86_PPRO_FENCE=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=4
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_CYRIX_32=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_CPU_SUP_TRANSMETA_32=y
CONFIG_CPU_SUP_UMC_32=y
# CONFIG_X86_DS is not set
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
# CONFIG_IOMMU_HELPER is not set
# CONFIG_IOMMU_API is not set
CONFIG_NR_CPUS=8
# CONFIG_SCHED_SMT is not set
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
# CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS is not set
CONFIG_X86_MCE=y
# CONFIG_X86_MCE_NONFATAL is not set
# CONFIG_X86_MCE_P4THERMAL is not set
CONFIG_VM86=y
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
# CONFIG_X86_REBOOTFIXUPS is not set
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m
# CONFIG_X86_CPU_DEBUG is not set
# CONFIG_NOHIGHMEM is not set
CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G is not set
CONFIG_PAGE_OFFSET=0xC0000000
CONFIG_HIGHMEM=y
# CONFIG_ARCH_PHYS_ADDR_T_64BIT is not set
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_SPARSEMEM_STATIC=y
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
# CONFIG_PHYS_ADDR_T_64BIT is not set
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_UNEVICTABLE_LRU=y
CONFIG_HAVE_MLOCK=y
CONFIG_HAVE_MLOCKED_PAGE_BIT=y
CONFIG_HIGHPTE=y
# CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
CONFIG_X86_RESERVE_LOW_64K=y
# CONFIG_MATH_EMULATION is not set
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=0
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
# CONFIG_X86_PAT is not set
CONFIG_EFI=y
CONFIG_SECCOMP=y
# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_SCHED_HRTICK=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_RELOCATABLE=y
CONFIG_PHYSICAL_ALIGN=0x400000
CONFIG_HOTPLUG_CPU=y
# CONFIG_COMPAT_VDSO is not set
# CONFIG_CMDLINE_BOOL is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y

#
# Power management and ACPI options
#
CONFIG_PM=y
CONFIG_PM_DEBUG=y
# CONFIG_PM_VERBOSE is not set
CONFIG_CAN_PM_TRACE=y
# CONFIG_PM_TRACE_RTC is not set
CONFIG_PM_SLEEP_SMP=y
CONFIG_PM_SLEEP=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
# CONFIG_HIBERNATION is not set
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
# CONFIG_ACPI_PROCFS is not set
# CONFIG_ACPI_PROCFS_POWER is not set
CONFIG_ACPI_SYSFS_POWER=y
# CONFIG_ACPI_PROC_EVENT is not set
CONFIG_ACPI_AC=m
# CONFIG_ACPI_BATTERY is not set
CONFIG_ACPI_BUTTON=m
CONFIG_ACPI_VIDEO=m
CONFIG_ACPI_FAN=y
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_THERMAL=y
# CONFIG_ACPI_CUSTOM_DSDT is not set
CONFIG_ACPI_BLACKLIST_YEAR=1999
# CONFIG_ACPI_DEBUG is not set
# CONFIG_ACPI_PCI_SLOT is not set
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y
# CONFIG_ACPI_SBS is not set
CONFIG_X86_APM_BOOT=y
CONFIG_APM=y
# CONFIG_APM_IGNORE_USER_SUSPEND is not set
# CONFIG_APM_DO_ENABLE is not set
CONFIG_APM_CPU_IDLE=y
# CONFIG_APM_DISPLAY_BLANK is not set
# CONFIG_APM_ALLOW_INTS is not set

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
CONFIG_CPU_FREQ_DEBUG=y
CONFIG_CPU_FREQ_STAT=m
CONFIG_CPU_FREQ_STAT_DETAILS=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=m
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=m
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=m

#
# CPUFreq processor drivers
#
# CONFIG_X86_ACPI_CPUFREQ is not set
# CONFIG_X86_POWERNOW_K6 is not set
# CONFIG_X86_POWERNOW_K7 is not set
# CONFIG_X86_POWERNOW_K8 is not set
# CONFIG_X86_GX_SUSPMOD is not set
# CONFIG_X86_SPEEDSTEP_CENTRINO is not set
CONFIG_X86_SPEEDSTEP_ICH=y
CONFIG_X86_SPEEDSTEP_SMI=y
# CONFIG_X86_P4_CLOCKMOD is not set
# CONFIG_X86_CPUFREQ_NFORCE2 is not set
# CONFIG_X86_LONGRUN is not set
# CONFIG_X86_LONGHAUL is not set
# CONFIG_X86_E_POWERSAVER is not set

#
# shared options
#
CONFIG_X86_SPEEDSTEP_LIB=y
# CONFIG_X86_SPEEDSTEP_RELAXED_CAP_CHECK is not set
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GOMMCONFIG is not set
# CONFIG_PCI_GODIRECT is not set
# CONFIG_PCI_GOOLPC is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_HOTPLUG_PCI_PCIE=m
CONFIG_PCIEAER=y
# CONFIG_PCIEASPM is not set
CONFIG_ARCH_SUPPORTS_MSI=y
# CONFIG_PCI_MSI is not set
CONFIG_PCI_LEGACY=y
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_STUB is not set
CONFIG_HT_IRQ=y
# CONFIG_PCI_IOV is not set
CONFIG_ISA_DMA_API=y
CONFIG_ISA=y
# CONFIG_EISA is not set
# CONFIG_MCA is not set
# CONFIG_SCx200 is not set
# CONFIG_OLPC is not set
CONFIG_PCCARD=y
# CONFIG_PCMCIA_DEBUG is not set
CONFIG_PCMCIA=y
CONFIG_PCMCIA_LOAD_CIS=y
# CONFIG_PCMCIA_IOCTL is not set
CONFIG_CARDBUS=y

#
# PC-card bridges
#
CONFIG_YENTA=y
CONFIG_YENTA_O2=y
CONFIG_YENTA_RICOH=y
CONFIG_YENTA_TI=y
CONFIG_YENTA_ENE_TUNE=y
CONFIG_YENTA_TOSHIBA=y
# CONFIG_PD6729 is not set
# CONFIG_I82092 is not set
# CONFIG_I82365 is not set
# CONFIG_TCIC is not set
CONFIG_PCMCIA_PROBE=y
CONFIG_PCCARD_NONSTATIC=y
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_FAKE=m
# CONFIG_HOTPLUG_PCI_COMPAQ is not set
# CONFIG_HOTPLUG_PCI_IBM is not set
CONFIG_HOTPLUG_PCI_ACPI=m
CONFIG_HOTPLUG_PCI_ACPI_IBM=m
# CONFIG_HOTPLUG_PCI_CPCI is not set
# CONFIG_HOTPLUG_PCI_SHPC is not set

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
CONFIG_HAVE_AOUT=y
# CONFIG_BINFMT_AOUT is not set
CONFIG_BINFMT_MISC=y
CONFIG_HAVE_ATOMIC_IOMAP=y
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
# CONFIG_NET_KEY is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_ASK_IP_FIB_HASH=y
# CONFIG_IP_FIB_TRIE is not set
CONFIG_IP_FIB_HASH=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
# CONFIG_IP_PNP is not set
CONFIG_NET_IPIP=m
# CONFIG_NET_IPGRE is not set
CONFIG_IP_MROUTE=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
# CONFIG_ARPD is not set
CONFIG_SYN_COOKIES=y
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_XFRM_TUNNEL is not set
CONFIG_INET_TUNNEL=m
# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET_XFRM_MODE_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_BEET is not set
CONFIG_INET_LRO=m
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=m
CONFIG_TCP_CONG_CUBIC=y
# CONFIG_TCP_CONG_WESTWOOD is not set
# CONFIG_TCP_CONG_HTCP is not set
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=m
# CONFIG_TCP_CONG_VEGAS is not set
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
# CONFIG_TCP_CONG_VENO is not set
# CONFIG_TCP_CONG_YEAH is not set
CONFIG_TCP_CONG_ILLINOIS=m
# CONFIG_DEFAULT_BIC is not set
CONFIG_DEFAULT_CUBIC=y
# CONFIG_DEFAULT_HTCP is not set
# CONFIG_DEFAULT_VEGAS is not set
# CONFIG_DEFAULT_WESTWOOD is not set
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
# CONFIG_IPV6 is not set
# CONFIG_NETWORK_SECMARK is not set
# CONFIG_NETFILTER is not set
# CONFIG_IP_DCCP is not set
# CONFIG_IP_SCTP is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
CONFIG_STP=m
CONFIG_BRIDGE=m
# CONFIG_NET_DSA is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
CONFIG_LLC=m
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_PHONET is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
# CONFIG_NET_SCH_CBQ is not set
# CONFIG_NET_SCH_HTB is not set
# CONFIG_NET_SCH_HFSC is not set
# CONFIG_NET_SCH_PRIO is not set
# CONFIG_NET_SCH_MULTIQ is not set
# CONFIG_NET_SCH_RED is not set
# CONFIG_NET_SCH_SFQ is not set
# CONFIG_NET_SCH_TEQL is not set
# CONFIG_NET_SCH_TBF is not set
# CONFIG_NET_SCH_GRED is not set
# CONFIG_NET_SCH_DSMARK is not set
# CONFIG_NET_SCH_NETEM is not set
# CONFIG_NET_SCH_DRR is not set

#
# Classification
#
CONFIG_NET_CLS=y
# CONFIG_NET_CLS_BASIC is not set
# CONFIG_NET_CLS_TCINDEX is not set
# CONFIG_NET_CLS_ROUTE4 is not set
# CONFIG_NET_CLS_FW is not set
# CONFIG_NET_CLS_U32 is not set
# CONFIG_NET_CLS_RSVP is not set
# CONFIG_NET_CLS_RSVP6 is not set
# CONFIG_NET_CLS_FLOW is not set
CONFIG_NET_CLS_CGROUP=y
# CONFIG_NET_EMATCH is not set
# CONFIG_NET_CLS_ACT is not set
CONFIG_NET_SCH_FIFO=y
# CONFIG_DCB is not set

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_NET_TCPPROBE is not set
# CONFIG_NET_DROP_MONITOR is not set
# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_AF_RXRPC is not set
CONFIG_FIB_RULES=y
# CONFIG_WIRELESS is not set
# CONFIG_WIMAX is not set
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_DEBUG_DRIVER is not set
CONFIG_DEBUG_DEVRES=y
# CONFIG_SYS_HYPERVISOR is not set
# CONFIG_CONNECTOR is not set
# CONFIG_MTD is not set
CONFIG_PARPORT=m
CONFIG_PARPORT_PC=m
CONFIG_PARPORT_SERIAL=m
# CONFIG_PARPORT_PC_FIFO is not set
# CONFIG_PARPORT_PC_SUPERIO is not set
CONFIG_PARPORT_PC_PCMCIA=m
# CONFIG_PARPORT_GSC is not set
# CONFIG_PARPORT_AX88796 is not set
CONFIG_PARPORT_1284=y
CONFIG_PNP=y
CONFIG_PNP_DEBUG_MESSAGES=y

#
# Protocols
#
CONFIG_ISAPNP=y
# CONFIG_PNPBIOS is not set
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_FD is not set
# CONFIG_BLK_DEV_XD is not set
CONFIG_PARIDE=m

#
# Parallel IDE high-level drivers
#
CONFIG_PARIDE_PD=m
CONFIG_PARIDE_PCD=m
CONFIG_PARIDE_PF=m
# CONFIG_PARIDE_PT is not set
CONFIG_PARIDE_PG=m

#
# Parallel IDE protocol modules
#
# CONFIG_PARIDE_ATEN is not set
# CONFIG_PARIDE_BPCK is not set
# CONFIG_PARIDE_BPCK6 is not set
# CONFIG_PARIDE_COMM is not set
# CONFIG_PARIDE_DSTR is not set
# CONFIG_PARIDE_FIT2 is not set
# CONFIG_PARIDE_FIT3 is not set
# CONFIG_PARIDE_EPAT is not set
# CONFIG_PARIDE_EPIA is not set
# CONFIG_PARIDE_FRIQ is not set
# CONFIG_PARIDE_FRPW is not set
# CONFIG_PARIDE_KBIC is not set
# CONFIG_PARIDE_KTTI is not set
# CONFIG_PARIDE_ON20 is not set
# CONFIG_PARIDE_ON26 is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_CRYPTOLOOP=m
CONFIG_BLK_DEV_NBD=m
# CONFIG_BLK_DEV_SX8 is not set
# CONFIG_BLK_DEV_UB is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=16384
# CONFIG_BLK_DEV_XIP is not set
# CONFIG_CDROM_PKTCDVD is not set
# CONFIG_ATA_OVER_ETH is not set
# CONFIG_BLK_DEV_HD is not set
CONFIG_MISC_DEVICES=y
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
# CONFIG_SGI_IOC4 is not set
# CONFIG_TIFM_CORE is not set
# CONFIG_ICS932S401 is not set
# CONFIG_ENCLOSURE_SERVICES is not set
# CONFIG_HP_ILO is not set
# CONFIG_ISL29003 is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_AT24 is not set
# CONFIG_EEPROM_LEGACY is not set
CONFIG_EEPROM_93CX6=m
CONFIG_HAVE_IDE=y
# CONFIG_IDE is not set

#
# SCSI device support
#
# CONFIG_RAID_ATTRS is not set
CONFIG_SCSI=m
CONFIG_SCSI_DMA=y
CONFIG_SCSI_TGT=m
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=m
# CONFIG_CHR_DEV_ST is not set
# CONFIG_CHR_DEV_OSST is not set
CONFIG_BLK_DEV_SR=m
CONFIG_BLK_DEV_SR_VENDOR=y
CONFIG_CHR_DEV_SG=m
CONFIG_CHR_DEV_SCH=m

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
CONFIG_SCSI_MULTI_LUN=y
# CONFIG_SCSI_CONSTANTS is not set
CONFIG_SCSI_LOGGING=y
CONFIG_SCSI_SCAN_ASYNC=y
CONFIG_SCSI_WAIT_SCAN=m

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=m
CONFIG_SCSI_FC_ATTRS=m
# CONFIG_SCSI_FC_TGT_ATTRS is not set
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
CONFIG_SCSI_SAS_ATA=y
CONFIG_SCSI_SAS_HOST_SMP=y
# CONFIG_SCSI_SAS_LIBSAS_DEBUG is not set
CONFIG_SCSI_SRP_ATTRS=m
# CONFIG_SCSI_SRP_TGT_ATTRS is not set
CONFIG_SCSI_LOWLEVEL=y
CONFIG_ISCSI_TCP=m
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_7000FASST is not set
CONFIG_SCSI_ACARD=m
# CONFIG_SCSI_AHA152X is not set
# CONFIG_SCSI_AHA1542 is not set
# CONFIG_SCSI_AACRAID is not set
CONFIG_SCSI_AIC7XXX=m
CONFIG_AIC7XXX_CMDS_PER_DEVICE=4
CONFIG_AIC7XXX_RESET_DELAY_MS=15000
# CONFIG_AIC7XXX_DEBUG_ENABLE is not set
CONFIG_AIC7XXX_DEBUG_MASK=0
# CONFIG_AIC7XXX_REG_PRETTY_PRINT is not set
CONFIG_SCSI_AIC7XXX_OLD=m
CONFIG_SCSI_AIC79XX=m
CONFIG_AIC79XX_CMDS_PER_DEVICE=4
CONFIG_AIC79XX_RESET_DELAY_MS=15000
# CONFIG_AIC79XX_DEBUG_ENABLE is not set
CONFIG_AIC79XX_DEBUG_MASK=0
# CONFIG_AIC79XX_REG_PRETTY_PRINT is not set
CONFIG_SCSI_AIC94XX=m
# CONFIG_AIC94XX_DEBUG is not set
# CONFIG_SCSI_DPT_I2O is not set
CONFIG_SCSI_ADVANSYS=m
# CONFIG_SCSI_IN2000 is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
# CONFIG_SCSI_MPT2SAS is not set
# CONFIG_SCSI_HPTIOP is not set
CONFIG_SCSI_BUSLOGIC=m
# CONFIG_SCSI_FLASHPOINT is not set
# CONFIG_LIBFC is not set
# CONFIG_LIBFCOE is not set
# CONFIG_FCOE is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_DTC3280 is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
CONFIG_SCSI_GDTH=m
# CONFIG_SCSI_GENERIC_NCR5380 is not set
# CONFIG_SCSI_GENERIC_NCR5380_MMIO is not set
CONFIG_SCSI_IPS=m
CONFIG_SCSI_INITIO=m
CONFIG_SCSI_INIA100=m
CONFIG_SCSI_PPA=m
CONFIG_SCSI_IMM=m
# CONFIG_SCSI_IZIP_EPP16 is not set
# CONFIG_SCSI_IZIP_SLOW_CTR is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_NCR53C406A is not set
# CONFIG_SCSI_STEX is not set
CONFIG_SCSI_SYM53C8XX_2=m
CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=1
CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16
CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64
CONFIG_SCSI_SYM53C8XX_MMIO=y
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_PAS16 is not set
# CONFIG_SCSI_QLOGIC_FAS is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_SYM53C416 is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_T128 is not set
# CONFIG_SCSI_U14_34F is not set
# CONFIG_SCSI_ULTRASTOR is not set
# CONFIG_SCSI_NSP32 is not set
# CONFIG_SCSI_DEBUG is not set
# CONFIG_SCSI_SRP is not set
CONFIG_SCSI_LOWLEVEL_PCMCIA=y
# CONFIG_PCMCIA_AHA152X is not set
# CONFIG_PCMCIA_FDOMAIN is not set
# CONFIG_PCMCIA_NINJA_SCSI is not set
CONFIG_PCMCIA_QLOGIC=m
# CONFIG_PCMCIA_SYM53C500 is not set
# CONFIG_SCSI_DH is not set
# CONFIG_SCSI_OSD_INITIATOR is not set
CONFIG_ATA=m
# CONFIG_ATA_NONSTANDARD is not set
CONFIG_ATA_ACPI=y
CONFIG_SATA_PMP=y
CONFIG_SATA_AHCI=m
# CONFIG_SATA_SIL24 is not set
CONFIG_ATA_SFF=y
# CONFIG_SATA_SVW is not set
CONFIG_ATA_PIIX=m
# CONFIG_SATA_MV is not set
CONFIG_SATA_NV=m
# CONFIG_PDC_ADMA is not set
# CONFIG_SATA_QSTOR is not set
# CONFIG_SATA_PROMISE is not set
# CONFIG_SATA_SX4 is not set
# CONFIG_SATA_SIL is not set
CONFIG_SATA_SIS=m
# CONFIG_SATA_ULI is not set
# CONFIG_SATA_VIA is not set
# CONFIG_SATA_VITESSE is not set
# CONFIG_SATA_INIC162X is not set
# CONFIG_PATA_ACPI is not set
# CONFIG_PATA_ALI is not set
# CONFIG_PATA_AMD is not set
# CONFIG_PATA_ARTOP is not set
CONFIG_PATA_ATIIXP=m
# CONFIG_PATA_CMD640_PCI is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CS5520 is not set
# CONFIG_PATA_CS5530 is not set
# CONFIG_PATA_CS5535 is not set
# CONFIG_PATA_CS5536 is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
CONFIG_ATA_GENERIC=m
# CONFIG_PATA_HPT366 is not set
# CONFIG_PATA_HPT37X is not set
# CONFIG_PATA_HPT3X2N is not set
# CONFIG_PATA_HPT3X3 is not set
# CONFIG_PATA_ISAPNP is not set
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_IT8213 is not set
# CONFIG_PATA_JMICRON is not set
# CONFIG_PATA_LEGACY is not set
# CONFIG_PATA_TRIFLEX is not set
# CONFIG_PATA_MARVELL is not set
CONFIG_PATA_MPIIX=m
# CONFIG_PATA_OLDPIIX is not set
# CONFIG_PATA_NETCELL is not set
# CONFIG_PATA_NINJA32 is not set
# CONFIG_PATA_NS87410 is not set
# CONFIG_PATA_NS87415 is not set
# CONFIG_PATA_OPTI is not set
# CONFIG_PATA_OPTIDMA is not set
CONFIG_PATA_PCMCIA=m
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_QDI is not set
# CONFIG_PATA_RADISYS is not set
# CONFIG_PATA_RZ1000 is not set
# CONFIG_PATA_SC1200 is not set
# CONFIG_PATA_SERVERWORKS is not set
# CONFIG_PATA_PDC2027X is not set
# CONFIG_PATA_SIL680 is not set
CONFIG_PATA_SIS=m
CONFIG_PATA_VIA=m
# CONFIG_PATA_WINBOND is not set
# CONFIG_PATA_WINBOND_VLB is not set
# CONFIG_PATA_SCH is not set
# CONFIG_MD is not set
CONFIG_FUSION=y
CONFIG_FUSION_SPI=m
CONFIG_FUSION_FC=m
# CONFIG_FUSION_SAS is not set
CONFIG_FUSION_MAX_SGE=40
CONFIG_FUSION_CTL=m
CONFIG_FUSION_LAN=m
CONFIG_FUSION_LOGGING=y

#
# IEEE 1394 (FireWire) support
#

#
# Enable only one of the two stacks, unless you know what you are doing
#
CONFIG_FIREWIRE=m
CONFIG_FIREWIRE_OHCI=m
CONFIG_FIREWIRE_OHCI_DEBUG=y
CONFIG_FIREWIRE_SBP2=m
# CONFIG_IEEE1394 is not set
CONFIG_I2O=m
# CONFIG_I2O_LCT_NOTIFY_ON_CHANGES is not set
CONFIG_I2O_EXT_ADAPTEC=y
CONFIG_I2O_CONFIG=m
CONFIG_I2O_CONFIG_OLD_IOCTL=y
CONFIG_I2O_BUS=m
CONFIG_I2O_BLOCK=m
CONFIG_I2O_SCSI=m
CONFIG_I2O_PROC=m
# CONFIG_MACINTOSH_DRIVERS is not set
CONFIG_NETDEVICES=y
CONFIG_COMPAT_NET_DEV_OPS=y
CONFIG_DUMMY=m
CONFIG_BONDING=m
# CONFIG_MACVLAN is not set
# CONFIG_EQUALIZER is not set
CONFIG_TUN=m
# CONFIG_VETH is not set
# CONFIG_NET_SB1000 is not set
# CONFIG_ARCNET is not set
CONFIG_PHYLIB=m

#
# MII PHY device drivers
#
# CONFIG_MARVELL_PHY is not set
# CONFIG_DAVICOM_PHY is not set
# CONFIG_QSEMI_PHY is not set
CONFIG_LXT_PHY=m
# CONFIG_CICADA_PHY is not set
# CONFIG_VITESSE_PHY is not set
# CONFIG_SMSC_PHY is not set
# CONFIG_BROADCOM_PHY is not set
# CONFIG_ICPLUS_PHY is not set
# CONFIG_REALTEK_PHY is not set
# CONFIG_NATIONAL_PHY is not set
# CONFIG_STE10XP is not set
# CONFIG_LSI_ET1011C_PHY is not set
# CONFIG_MDIO_BITBANG is not set
CONFIG_NET_ETHERNET=y
CONFIG_MII=m
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
# CONFIG_CASSINI is not set
CONFIG_NET_VENDOR_3COM=y
# CONFIG_EL1 is not set
# CONFIG_EL2 is not set
# CONFIG_ELPLUS is not set
# CONFIG_EL16 is not set
CONFIG_EL3=m
# CONFIG_3C515 is not set
CONFIG_VORTEX=m
CONFIG_TYPHOON=m
# CONFIG_LANCE is not set
CONFIG_NET_VENDOR_SMC=y
# CONFIG_WD80x3 is not set
# CONFIG_ULTRA is not set
# CONFIG_SMC9194 is not set
# CONFIG_ETHOC is not set
# CONFIG_NET_VENDOR_RACAL is not set
# CONFIG_DNET is not set
CONFIG_NET_TULIP=y
CONFIG_DE2104X=m
CONFIG_TULIP=m
# CONFIG_TULIP_MWI is not set
CONFIG_TULIP_MMIO=y
# CONFIG_TULIP_NAPI is not set
CONFIG_DE4X5=m
CONFIG_WINBOND_840=m
CONFIG_DM9102=m
CONFIG_ULI526X=m
CONFIG_PCMCIA_XIRCOM=m
# CONFIG_AT1700 is not set
# CONFIG_DEPCA is not set
# CONFIG_HP100 is not set
CONFIG_NET_ISA=y
# CONFIG_E2100 is not set
# CONFIG_EWRK3 is not set
# CONFIG_EEXPRESS is not set
# CONFIG_EEXPRESS_PRO is not set
# CONFIG_HPLAN_PLUS is not set
# CONFIG_HPLAN is not set
# CONFIG_LP486E is not set
# CONFIG_ETH16I is not set
CONFIG_NE2000=m
# CONFIG_ZNET is not set
# CONFIG_SEEQ8005 is not set
# CONFIG_IBM_NEW_EMAC_ZMII is not set
# CONFIG_IBM_NEW_EMAC_RGMII is not set
# CONFIG_IBM_NEW_EMAC_TAH is not set
# CONFIG_IBM_NEW_EMAC_EMAC4 is not set
# CONFIG_IBM_NEW_EMAC_NO_FLOW_CTRL is not set
# CONFIG_IBM_NEW_EMAC_MAL_CLR_ICINTSTAT is not set
# CONFIG_IBM_NEW_EMAC_MAL_COMMON_ERR is not set
CONFIG_NET_PCI=y
CONFIG_PCNET32=m
CONFIG_AMD8111_ETH=m
CONFIG_ADAPTEC_STARFIRE=m
# CONFIG_AC3200 is not set
# CONFIG_APRICOT is not set
CONFIG_B44=m
CONFIG_B44_PCI_AUTOSELECT=y
CONFIG_B44_PCICORE_AUTOSELECT=y
CONFIG_B44_PCI=y
CONFIG_FORCEDETH=m
CONFIG_FORCEDETH_NAPI=y
# CONFIG_CS89x0 is not set
CONFIG_E100=m
# CONFIG_FEALNX is not set
# CONFIG_NATSEMI is not set
CONFIG_NE2K_PCI=m
# CONFIG_8139CP is not set
CONFIG_8139TOO=m
# CONFIG_8139TOO_PIO is not set
# CONFIG_8139TOO_TUNE_TWISTER is not set
CONFIG_8139TOO_8129=y
# CONFIG_8139_OLD_RX_RESET is not set
# CONFIG_R6040 is not set
CONFIG_SIS900=m
# CONFIG_EPIC100 is not set
# CONFIG_SMSC9420 is not set
# CONFIG_SUNDANCE is not set
# CONFIG_TLAN is not set
CONFIG_VIA_RHINE=m
CONFIG_VIA_RHINE_MMIO=y
# CONFIG_SC92031 is not set
CONFIG_NET_POCKET=y
CONFIG_ATP=m
CONFIG_DE600=m
CONFIG_DE620=m
# CONFIG_ATL2 is not set
CONFIG_NETDEV_1000=y
CONFIG_ACENIC=m
# CONFIG_ACENIC_OMIT_TIGON_I is not set
# CONFIG_DL2K is not set
CONFIG_E1000=m
CONFIG_E1000E=m
# CONFIG_IP1000 is not set
# CONFIG_IGB is not set
# CONFIG_IGBVF is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
CONFIG_R8169=m
# CONFIG_SIS190 is not set
CONFIG_SKGE=m
# CONFIG_SKGE_DEBUG is not set
CONFIG_SKY2=m
# CONFIG_SKY2_DEBUG is not set
CONFIG_VIA_VELOCITY=m
# CONFIG_TIGON3 is not set
# CONFIG_BNX2 is not set
# CONFIG_QLA3XXX is not set
# CONFIG_ATL1 is not set
# CONFIG_ATL1E is not set
# CONFIG_ATL1C is not set
# CONFIG_JME is not set
# CONFIG_NETDEV_10000 is not set
# CONFIG_TR is not set

#
# Wireless LAN
#
# CONFIG_WLAN_PRE80211 is not set
# CONFIG_WLAN_80211 is not set

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#

#
# USB Network Adapters
#
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
CONFIG_USB_USBNET=m
CONFIG_USB_NET_AX8817X=m
CONFIG_USB_NET_CDCETHER=m
CONFIG_USB_NET_DM9601=m
# CONFIG_USB_NET_SMSC95XX is not set
CONFIG_USB_NET_GL620A=m
CONFIG_USB_NET_NET1080=m
# CONFIG_USB_NET_PLUSB is not set
# CONFIG_USB_NET_MCS7830 is not set
# CONFIG_USB_NET_RNDIS_HOST is not set
CONFIG_USB_NET_CDC_SUBSET=m
CONFIG_USB_ALI_M5632=y
CONFIG_USB_AN2720=y
CONFIG_USB_BELKIN=y
CONFIG_USB_ARMLINUX=y
CONFIG_USB_EPSON2888=y
CONFIG_USB_KC2190=y
# CONFIG_USB_NET_ZAURUS is not set
CONFIG_NET_PCMCIA=y
# CONFIG_PCMCIA_3C589 is not set
# CONFIG_PCMCIA_3C574 is not set
# CONFIG_PCMCIA_FMVJ18X is not set
CONFIG_PCMCIA_PCNET=m
CONFIG_PCMCIA_NMCLAN=m
CONFIG_PCMCIA_SMC91C92=m
# CONFIG_PCMCIA_XIRC2PS is not set
# CONFIG_PCMCIA_AXNET is not set
# CONFIG_WAN is not set
CONFIG_FDDI=y
# CONFIG_DEFXX is not set
# CONFIG_SKFP is not set
# CONFIG_HIPPI is not set
CONFIG_PLIP=m
CONFIG_PPP=m
CONFIG_PPP_MULTILINK=y
CONFIG_PPP_FILTER=y
CONFIG_PPP_ASYNC=m
CONFIG_PPP_SYNC_TTY=m
CONFIG_PPP_DEFLATE=m
# CONFIG_PPP_BSDCOMP is not set
# CONFIG_PPP_MPPE is not set
CONFIG_PPPOE=m
# CONFIG_PPPOL2TP is not set
CONFIG_SLIP=m
CONFIG_SLIP_COMPRESSED=y
CONFIG_SLHC=m
CONFIG_SLIP_SMART=y
# CONFIG_SLIP_MODE_SLIP6 is not set
CONFIG_NET_FC=y
CONFIG_NETCONSOLE=m
# CONFIG_NETCONSOLE_DYNAMIC is not set
CONFIG_NETPOLL=y
CONFIG_NETPOLL_TRAP=y
CONFIG_NET_POLL_CONTROLLER=y
# CONFIG_ISDN is not set
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_FF_MEMLESS=y
CONFIG_INPUT_POLLDEV=m

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
# CONFIG_MOUSE_PS2_ELANTECH is not set
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
CONFIG_MOUSE_SERIAL=m
CONFIG_MOUSE_APPLETOUCH=m
# CONFIG_MOUSE_BCM5974 is not set
# CONFIG_MOUSE_INPORT is not set
# CONFIG_MOUSE_LOGIBM is not set
# CONFIG_MOUSE_PC110PAD is not set
CONFIG_MOUSE_VSXXXAA=m
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
CONFIG_INPUT_MISC=y
# CONFIG_INPUT_PCSPKR is not set
# CONFIG_INPUT_APANEL is not set
# CONFIG_INPUT_WISTRON_BTNS is not set
# CONFIG_INPUT_ATLAS_BTNS is not set
# CONFIG_INPUT_ATI_REMOTE is not set
# CONFIG_INPUT_ATI_REMOTE2 is not set
# CONFIG_INPUT_KEYSPAN_REMOTE is not set
# CONFIG_INPUT_POWERMATE is not set
# CONFIG_INPUT_YEALINK is not set
# CONFIG_INPUT_CM109 is not set
CONFIG_INPUT_UINPUT=m

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PARKBD is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
CONFIG_SERIO_RAW=m
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_DEVKMEM=y
CONFIG_SERIAL_NONSTANDARD=y
# CONFIG_COMPUTONE is not set
CONFIG_ROCKETPORT=m
CONFIG_CYCLADES=m
# CONFIG_CYZ_INTR is not set
# CONFIG_DIGIEPCA is not set
# CONFIG_MOXA_INTELLIO is not set
# CONFIG_MOXA_SMARTIO is not set
# CONFIG_ISI is not set
# CONFIG_SYNCLINK is not set
CONFIG_SYNCLINKMP=m
CONFIG_SYNCLINK_GT=m
# CONFIG_N_HDLC is not set
# CONFIG_RISCOM8 is not set
# CONFIG_SPECIALIX is not set
# CONFIG_SX is not set
# CONFIG_RIO is not set
# CONFIG_STALDRV is not set
# CONFIG_NOZOMI is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_CS=m
CONFIG_SERIAL_8250_NR_UARTS=32
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
# CONFIG_SERIAL_8250_FOURPORT is not set
# CONFIG_SERIAL_8250_ACCENT is not set
# CONFIG_SERIAL_8250_BOCA is not set
# CONFIG_SERIAL_8250_EXAR_ST16C554 is not set
# CONFIG_SERIAL_8250_HUB6 is not set
CONFIG_SERIAL_8250_SHARE_IRQ=y
CONFIG_SERIAL_8250_DETECT_IRQ=y
CONFIG_SERIAL_8250_RSA=y

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_SERIAL_JSM=m
CONFIG_UNIX98_PTYS=y
# CONFIG_DEVPTS_MULTIPLE_INSTANCES is not set
# CONFIG_LEGACY_PTYS is not set
CONFIG_PRINTER=m
CONFIG_LP_CONSOLE=y
CONFIG_PPDEV=m
CONFIG_IPMI_HANDLER=m
# CONFIG_IPMI_PANIC_EVENT is not set
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m
CONFIG_HW_RANDOM=y
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
CONFIG_HW_RANDOM_INTEL=m
CONFIG_HW_RANDOM_AMD=m
CONFIG_HW_RANDOM_GEODE=m
CONFIG_HW_RANDOM_VIA=m
CONFIG_NVRAM=y
CONFIG_RTC=y
# CONFIG_DTLK is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
# CONFIG_SONYPI is not set

#
# PCMCIA character devices
#
# CONFIG_SYNCLINK_CS is not set
CONFIG_CARDMAN_4000=m
CONFIG_CARDMAN_4040=m
# CONFIG_IPWIRELESS is not set
CONFIG_MWAVE=m
# CONFIG_PC8736x_GPIO is not set
# CONFIG_NSC_GPIO is not set
# CONFIG_CS5535_GPIO is not set
# CONFIG_RAW_DRIVER is not set
CONFIG_HPET=y
# CONFIG_HPET_MMAP is not set
CONFIG_HANGCHECK_TIMER=m
# CONFIG_TCG_TPM is not set
# CONFIG_TELCLOCK is not set
CONFIG_DEVPORT=y
CONFIG_I2C=m
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_ALGOBIT=m
CONFIG_I2C_ALGOPCA=m

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
CONFIG_I2C_ALI1535=m
CONFIG_I2C_ALI1563=m
CONFIG_I2C_ALI15X3=m
CONFIG_I2C_AMD756=m
CONFIG_I2C_AMD756_S4882=m
# CONFIG_I2C_AMD8111 is not set
CONFIG_I2C_I801=m
# CONFIG_I2C_ISCH is not set
CONFIG_I2C_PIIX4=m
CONFIG_I2C_NFORCE2=m
# CONFIG_I2C_NFORCE2_S4985 is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
CONFIG_I2C_VIA=m
CONFIG_I2C_VIAPRO=m

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_OCORES is not set
CONFIG_I2C_SIMTEC=m

#
# External I2C/SMBus adapter drivers
#
CONFIG_I2C_PARPORT=m
CONFIG_I2C_PARPORT_LIGHT=m
# CONFIG_I2C_TAOS_EVM is not set
# CONFIG_I2C_TINY_USB is not set

#
# Graphics adapter I2C/DDC channel drivers
#
CONFIG_I2C_VOODOO3=m

#
# Other I2C/SMBus bus drivers
#
CONFIG_I2C_PCA_ISA=m
# CONFIG_I2C_PCA_PLATFORM is not set
CONFIG_I2C_STUB=m
# CONFIG_SCx200_ACB is not set

#
# Miscellaneous I2C Chip support
#
# CONFIG_DS1682 is not set
# CONFIG_SENSORS_PCF8574 is not set
# CONFIG_PCF8575 is not set
# CONFIG_SENSORS_PCA9539 is not set
CONFIG_SENSORS_MAX6875=m
# CONFIG_SENSORS_TSL2550 is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_I2C_DEBUG_CHIP is not set
# CONFIG_SPI is not set
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
# CONFIG_GPIOLIB is not set
# CONFIG_W1 is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
# CONFIG_PDA_POWER is not set
# CONFIG_BATTERY_DS2760 is not set
# CONFIG_BATTERY_BQ27x00 is not set
CONFIG_HWMON=m
CONFIG_HWMON_VID=m
# CONFIG_SENSORS_ABITUGURU is not set
# CONFIG_SENSORS_ABITUGURU3 is not set
# CONFIG_SENSORS_AD7414 is not set
CONFIG_SENSORS_AD7418=m
# CONFIG_SENSORS_ADM1021 is not set
# CONFIG_SENSORS_ADM1025 is not set
# CONFIG_SENSORS_ADM1026 is not set
# CONFIG_SENSORS_ADM1029 is not set
# CONFIG_SENSORS_ADM1031 is not set
# CONFIG_SENSORS_ADM9240 is not set
# CONFIG_SENSORS_ADT7462 is not set
# CONFIG_SENSORS_ADT7470 is not set
# CONFIG_SENSORS_ADT7473 is not set
# CONFIG_SENSORS_ADT7475 is not set
# CONFIG_SENSORS_K8TEMP is not set
# CONFIG_SENSORS_ASB100 is not set
# CONFIG_SENSORS_ATK0110 is not set
# CONFIG_SENSORS_ATXP1 is not set
# CONFIG_SENSORS_DS1621 is not set
# CONFIG_SENSORS_I5K_AMB is not set
# CONFIG_SENSORS_F71805F is not set
# CONFIG_SENSORS_F71882FG is not set
# CONFIG_SENSORS_F75375S is not set
# CONFIG_SENSORS_FSCHER is not set
# CONFIG_SENSORS_FSCPOS is not set
# CONFIG_SENSORS_FSCHMD is not set
# CONFIG_SENSORS_G760A is not set
# CONFIG_SENSORS_GL518SM is not set
# CONFIG_SENSORS_GL520SM is not set
CONFIG_SENSORS_CORETEMP=m
# CONFIG_SENSORS_IBMAEM is not set
# CONFIG_SENSORS_IBMPEX is not set
# CONFIG_SENSORS_IT87 is not set
# CONFIG_SENSORS_LM63 is not set
# CONFIG_SENSORS_LM75 is not set
# CONFIG_SENSORS_LM77 is not set
# CONFIG_SENSORS_LM78 is not set
# CONFIG_SENSORS_LM80 is not set
# CONFIG_SENSORS_LM83 is not set
# CONFIG_SENSORS_LM85 is not set
# CONFIG_SENSORS_LM87 is not set
# CONFIG_SENSORS_LM90 is not set
# CONFIG_SENSORS_LM92 is not set
# CONFIG_SENSORS_LM93 is not set
# CONFIG_SENSORS_LTC4215 is not set
# CONFIG_SENSORS_LTC4245 is not set
# CONFIG_SENSORS_LM95241 is not set
# CONFIG_SENSORS_MAX1619 is not set
# CONFIG_SENSORS_MAX6650 is not set
# CONFIG_SENSORS_PC87360 is not set
# CONFIG_SENSORS_PC87427 is not set
# CONFIG_SENSORS_PCF8591 is not set
CONFIG_SENSORS_SIS5595=m
# CONFIG_SENSORS_DME1737 is not set
# CONFIG_SENSORS_SMSC47M1 is not set
# CONFIG_SENSORS_SMSC47M192 is not set
# CONFIG_SENSORS_SMSC47B397 is not set
# CONFIG_SENSORS_ADS7828 is not set
# CONFIG_SENSORS_THMC50 is not set
CONFIG_SENSORS_VIA686A=m
CONFIG_SENSORS_VT1211=m
CONFIG_SENSORS_VT8231=m
# CONFIG_SENSORS_W83781D is not set
# CONFIG_SENSORS_W83791D is not set
# CONFIG_SENSORS_W83792D is not set
# CONFIG_SENSORS_W83793 is not set
# CONFIG_SENSORS_W83L785TS is not set
# CONFIG_SENSORS_W83L786NG is not set
# CONFIG_SENSORS_W83627HF is not set
# CONFIG_SENSORS_W83627EHF is not set
CONFIG_SENSORS_HDAPS=m
# CONFIG_SENSORS_LIS3LV02D is not set
# CONFIG_SENSORS_APPLESMC is not set
# CONFIG_HWMON_DEBUG_CHIP is not set
CONFIG_THERMAL=y
# CONFIG_WATCHDOG is not set
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
CONFIG_SSB=m
CONFIG_SSB_SPROM=y
CONFIG_SSB_PCIHOST_POSSIBLE=y
CONFIG_SSB_PCIHOST=y
# CONFIG_SSB_B43_PCI_BRIDGE is not set
CONFIG_SSB_PCMCIAHOST_POSSIBLE=y
CONFIG_SSB_PCMCIAHOST=y
# CONFIG_SSB_DEBUG is not set
CONFIG_SSB_DRIVER_PCICORE_POSSIBLE=y
CONFIG_SSB_DRIVER_PCICORE=y

#
# Multifunction device drivers
#
# CONFIG_MFD_CORE is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_MFD_TMIO is not set
# CONFIG_MFD_WM8400 is not set
# CONFIG_MFD_WM8350_I2C is not set
# CONFIG_MFD_PCF50633 is not set
# CONFIG_REGULATOR is not set

#
# Multimedia devices
#

#
# Multimedia core support
#
CONFIG_VIDEO_DEV=m
CONFIG_VIDEO_V4L2_COMMON=m
CONFIG_VIDEO_ALLOW_V4L1=y
CONFIG_VIDEO_V4L1_COMPAT=y
# CONFIG_DVB_CORE is not set
CONFIG_VIDEO_MEDIA=m

#
# Multimedia drivers
#
# CONFIG_MEDIA_ATTACH is not set
CONFIG_MEDIA_TUNER=m
# CONFIG_MEDIA_TUNER_CUSTOMISE is not set
CONFIG_MEDIA_TUNER_SIMPLE=m
CONFIG_MEDIA_TUNER_TDA8290=m
CONFIG_MEDIA_TUNER_TDA9887=m
CONFIG_MEDIA_TUNER_TEA5761=m
CONFIG_MEDIA_TUNER_TEA5767=m
CONFIG_MEDIA_TUNER_MT20XX=m
CONFIG_MEDIA_TUNER_XC2028=m
CONFIG_MEDIA_TUNER_XC5000=m
CONFIG_MEDIA_TUNER_MC44S803=m
CONFIG_VIDEO_V4L2=m
CONFIG_VIDEO_V4L1=m
CONFIG_VIDEOBUF_GEN=m
CONFIG_VIDEOBUF_DMA_SG=m
CONFIG_VIDEO_BTCX=m
CONFIG_VIDEO_IR=m
CONFIG_VIDEO_TVEEPROM=m
CONFIG_VIDEO_TUNER=m
CONFIG_VIDEO_CAPTURE_DRIVERS=y
# CONFIG_VIDEO_ADV_DEBUG is not set
# CONFIG_VIDEO_FIXED_MINOR_RANGES is not set
# CONFIG_VIDEO_HELPER_CHIPS_AUTO is not set
CONFIG_VIDEO_IR_I2C=m

#
# Encoders/decoders and other helper chips
#

#
# Audio decoders
#
CONFIG_VIDEO_TVAUDIO=m
CONFIG_VIDEO_TDA7432=m
CONFIG_VIDEO_TDA9840=m
CONFIG_VIDEO_TDA9875=m
CONFIG_VIDEO_TEA6415C=m
CONFIG_VIDEO_TEA6420=m
CONFIG_VIDEO_MSP3400=m
# CONFIG_VIDEO_CS5345 is not set
CONFIG_VIDEO_CS53L32A=m
CONFIG_VIDEO_M52790=m
CONFIG_VIDEO_TLV320AIC23B=m
CONFIG_VIDEO_WM8775=m
CONFIG_VIDEO_WM8739=m
CONFIG_VIDEO_VP27SMPX=m

#
# RDS decoders
#
# CONFIG_VIDEO_SAA6588 is not set

#
# Video decoders
#
CONFIG_VIDEO_BT819=m
CONFIG_VIDEO_BT856=m
CONFIG_VIDEO_BT866=m
CONFIG_VIDEO_KS0127=m
CONFIG_VIDEO_OV7670=m
# CONFIG_VIDEO_TCM825X is not set
CONFIG_VIDEO_SAA7110=m
CONFIG_VIDEO_SAA711X=m
CONFIG_VIDEO_SAA717X=m
CONFIG_VIDEO_SAA7191=m
# CONFIG_VIDEO_TVP514X is not set
CONFIG_VIDEO_TVP5150=m
CONFIG_VIDEO_VPX3220=m

#
# Video and audio decoders
#
CONFIG_VIDEO_CX25840=m

#
# MPEG video encoders
#
CONFIG_VIDEO_CX2341X=m

#
# Video encoders
#
CONFIG_VIDEO_SAA7127=m
CONFIG_VIDEO_SAA7185=m
CONFIG_VIDEO_ADV7170=m
CONFIG_VIDEO_ADV7175=m

#
# Video improvement chips
#
CONFIG_VIDEO_UPD64031A=m
CONFIG_VIDEO_UPD64083=m
# CONFIG_VIDEO_VIVI is not set
CONFIG_VIDEO_BT848=m
# CONFIG_VIDEO_PMS is not set
# CONFIG_VIDEO_BWQCAM is not set
# CONFIG_VIDEO_CQCAM is not set
# CONFIG_VIDEO_W9966 is not set
CONFIG_VIDEO_CPIA=m
CONFIG_VIDEO_CPIA_PP=m
CONFIG_VIDEO_CPIA_USB=m
CONFIG_VIDEO_CPIA2=m
# CONFIG_VIDEO_SAA5246A is not set
# CONFIG_VIDEO_SAA5249 is not set
# CONFIG_VIDEO_STRADIS is not set
CONFIG_VIDEO_ZORAN=m
# CONFIG_VIDEO_ZORAN_DC30 is not set
CONFIG_VIDEO_ZORAN_ZR36060=m
CONFIG_VIDEO_ZORAN_BUZ=m
# CONFIG_VIDEO_ZORAN_DC10 is not set
CONFIG_VIDEO_ZORAN_LML33=m
# CONFIG_VIDEO_ZORAN_LML33R10 is not set
# CONFIG_VIDEO_ZORAN_AVS6EYES is not set
# CONFIG_VIDEO_SAA7134 is not set
# CONFIG_VIDEO_MXB is not set
# CONFIG_VIDEO_HEXIUM_ORION is not set
# CONFIG_VIDEO_HEXIUM_GEMINI is not set
# CONFIG_VIDEO_CX88 is not set
CONFIG_VIDEO_IVTV=m
# CONFIG_VIDEO_FB_IVTV is not set
# CONFIG_VIDEO_CAFE_CCIC is not set
# CONFIG_SOC_CAMERA is not set
# CONFIG_V4L_USB_DRIVERS is not set
CONFIG_RADIO_ADAPTERS=y
# CONFIG_RADIO_CADET is not set
# CONFIG_RADIO_RTRACK is not set
# CONFIG_RADIO_RTRACK2 is not set
# CONFIG_RADIO_AZTECH is not set
# CONFIG_RADIO_GEMTEK is not set
# CONFIG_RADIO_GEMTEK_PCI is not set
CONFIG_RADIO_MAXIRADIO=m
CONFIG_RADIO_MAESTRO=m
# CONFIG_RADIO_SF16FMI is not set
# CONFIG_RADIO_SF16FMR2 is not set
# CONFIG_RADIO_TERRATEC is not set
# CONFIG_RADIO_TRUST is not set
# CONFIG_RADIO_TYPHOON is not set
# CONFIG_RADIO_ZOLTRIX is not set
CONFIG_USB_DSBR=m
# CONFIG_USB_SI470X is not set
# CONFIG_USB_MR800 is not set
# CONFIG_RADIO_TEA5764 is not set
CONFIG_DAB=y
CONFIG_USB_DABUSB=m

#
# Graphics support
#
CONFIG_AGP=y
CONFIG_AGP_ALI=y
CONFIG_AGP_ATI=y
# CONFIG_AGP_AMD is not set
# CONFIG_AGP_AMD64 is not set
CONFIG_AGP_INTEL=y
CONFIG_AGP_NVIDIA=y
CONFIG_AGP_SIS=y
# CONFIG_AGP_SWORKS is not set
CONFIG_AGP_VIA=y
CONFIG_AGP_EFFICEON=y
CONFIG_DRM=m
CONFIG_DRM_TDFX=m
CONFIG_DRM_R128=m
CONFIG_DRM_RADEON=m
CONFIG_DRM_I810=m
CONFIG_DRM_I830=m
CONFIG_DRM_I915=m
# CONFIG_DRM_I915_KMS is not set
# CONFIG_DRM_MGA is not set
CONFIG_DRM_SIS=m
# CONFIG_DRM_VIA is not set
# CONFIG_DRM_SAVAGE is not set
CONFIG_VGASTATE=m
CONFIG_VIDEO_OUTPUT_CONTROL=m
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_DDC=m
CONFIG_FB_BOOT_VESA_SUPPORT=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
# CONFIG_FB_SYS_FILLRECT is not set
# CONFIG_FB_SYS_COPYAREA is not set
# CONFIG_FB_SYS_IMAGEBLIT is not set
# CONFIG_FB_FOREIGN_ENDIAN is not set
# CONFIG_FB_SYS_FOPS is not set
CONFIG_FB_SVGALIB=m
# CONFIG_FB_MACMODES is not set
CONFIG_FB_BACKLIGHT=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ARC is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_VGA16 is not set
CONFIG_FB_VESA=y
# CONFIG_FB_EFI is not set
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_S1D13XXX is not set
CONFIG_FB_NVIDIA=m
CONFIG_FB_NVIDIA_I2C=y
# CONFIG_FB_NVIDIA_DEBUG is not set
CONFIG_FB_NVIDIA_BACKLIGHT=y
# CONFIG_FB_RIVA is not set
# CONFIG_FB_I810 is not set
# CONFIG_FB_LE80578 is not set
# CONFIG_FB_INTEL is not set
# CONFIG_FB_MATROX is not set
CONFIG_FB_RADEON=m
CONFIG_FB_RADEON_I2C=y
CONFIG_FB_RADEON_BACKLIGHT=y
# CONFIG_FB_RADEON_DEBUG is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
CONFIG_FB_S3=m
CONFIG_FB_SAVAGE=m
CONFIG_FB_SAVAGE_I2C=y
CONFIG_FB_SAVAGE_ACCEL=y
# CONFIG_FB_SIS is not set
# CONFIG_FB_VIA is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
CONFIG_FB_TRIDENT=m
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_GEODE is not set
# CONFIG_FB_VIRTUAL is not set
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
# CONFIG_FB_BROADSHEET is not set
CONFIG_BACKLIGHT_LCD_SUPPORT=y
CONFIG_LCD_CLASS_DEVICE=m
# CONFIG_LCD_ILI9320 is not set
# CONFIG_LCD_PLATFORM is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
CONFIG_BACKLIGHT_GENERIC=y
CONFIG_BACKLIGHT_PROGEAR=m
# CONFIG_BACKLIGHT_MBP_NVIDIA is not set
# CONFIG_BACKLIGHT_SAHARA is not set

#
# Display device support
#
CONFIG_DISPLAY_SUPPORT=m

#
# Display hardware drivers
#

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=64
# CONFIG_MDA_CONSOLE is not set
CONFIG_DUMMY_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
# CONFIG_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
CONFIG_LOGO_LINUX_CLUT224=y
# CONFIG_SOUND is not set
# CONFIG_HID_SUPPORT is not set
CONFIG_USB_SUPPORT=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set
# CONFIG_USB_ANNOUNCE_NEW_DEVICES is not set

#
# Miscellaneous USB options
#
CONFIG_USB_DEVICEFS=y
# CONFIG_USB_DEVICE_CLASS is not set
# CONFIG_USB_DYNAMIC_MINORS is not set
CONFIG_USB_SUSPEND=y
# CONFIG_USB_OTG is not set
# CONFIG_USB_MON is not set
# CONFIG_USB_WUSB is not set
# CONFIG_USB_WUSB_CBAF is not set

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
CONFIG_USB_EHCI_HCD=m
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_EHCI_TT_NEWSCHED=y
# CONFIG_USB_OXU210HP_HCD is not set
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_ISP1760_HCD is not set
CONFIG_USB_OHCI_HCD=m
# CONFIG_USB_OHCI_HCD_SSB is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_DESC is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_MMIO is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=m
# CONFIG_USB_U132_HCD is not set
# CONFIG_USB_SL811_HCD is not set
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_WHCI_HCD is not set
# CONFIG_USB_HWA_HCD is not set

#
# USB Device Class drivers
#
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set
# CONFIG_USB_WDM is not set
# CONFIG_USB_TMC is not set

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
#

#
# also be needed; see USB_STORAGE Help for more info
#
CONFIG_USB_STORAGE=m
# CONFIG_USB_STORAGE_DEBUG is not set
CONFIG_USB_STORAGE_DATAFAB=m
CONFIG_USB_STORAGE_FREECOM=m
# CONFIG_USB_STORAGE_ISD200 is not set
CONFIG_USB_STORAGE_USBAT=m
# CONFIG_USB_STORAGE_SDDR09 is not set
# CONFIG_USB_STORAGE_SDDR55 is not set
# CONFIG_USB_STORAGE_JUMPSHOT is not set
# CONFIG_USB_STORAGE_ALAUDA is not set
# CONFIG_USB_STORAGE_ONETOUCH is not set
# CONFIG_USB_STORAGE_KARMA is not set
# CONFIG_USB_STORAGE_CYPRESS_ATACB is not set
# CONFIG_USB_LIBUSUAL is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set

#
# USB port drivers
#
# CONFIG_USB_USS720 is not set
CONFIG_USB_SERIAL=m
CONFIG_USB_EZUSB=y
CONFIG_USB_SERIAL_GENERIC=y
# CONFIG_USB_SERIAL_AIRCABLE is not set
# CONFIG_USB_SERIAL_ARK3116 is not set
# CONFIG_USB_SERIAL_BELKIN is not set
# CONFIG_USB_SERIAL_CH341 is not set
# CONFIG_USB_SERIAL_WHITEHEAT is not set
# CONFIG_USB_SERIAL_DIGI_ACCELEPORT is not set
# CONFIG_USB_SERIAL_CP210X is not set
# CONFIG_USB_SERIAL_CYPRESS_M8 is not set
CONFIG_USB_SERIAL_EMPEG=m
# CONFIG_USB_SERIAL_FTDI_SIO is not set
# CONFIG_USB_SERIAL_FUNSOFT is not set
# CONFIG_USB_SERIAL_VISOR is not set
# CONFIG_USB_SERIAL_IPAQ is not set
# CONFIG_USB_SERIAL_IR is not set
# CONFIG_USB_SERIAL_EDGEPORT is not set
# CONFIG_USB_SERIAL_EDGEPORT_TI is not set
# CONFIG_USB_SERIAL_GARMIN is not set
# CONFIG_USB_SERIAL_IPW is not set
# CONFIG_USB_SERIAL_IUU is not set
# CONFIG_USB_SERIAL_KEYSPAN_PDA is not set
CONFIG_USB_SERIAL_KEYSPAN=m
# CONFIG_USB_SERIAL_KEYSPAN_MPR is not set
# CONFIG_USB_SERIAL_KEYSPAN_USA28 is not set
# CONFIG_USB_SERIAL_KEYSPAN_USA28X is not set
# CONFIG_USB_SERIAL_KEYSPAN_USA28XA is not set
# CONFIG_USB_SERIAL_KEYSPAN_USA28XB is not set
# CONFIG_USB_SERIAL_KEYSPAN_USA19 is not set
# CONFIG_USB_SERIAL_KEYSPAN_USA18X is not set
# CONFIG_USB_SERIAL_KEYSPAN_USA19W is not set
CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y
CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y
CONFIG_USB_SERIAL_KEYSPAN_USA49W=y
CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y
# CONFIG_USB_SERIAL_KLSI is not set
# CONFIG_USB_SERIAL_KOBIL_SCT is not set
# CONFIG_USB_SERIAL_MCT_U232 is not set
# CONFIG_USB_SERIAL_MOS7720 is not set
# CONFIG_USB_SERIAL_MOS7840 is not set
# CONFIG_USB_SERIAL_MOTOROLA is not set
# CONFIG_USB_SERIAL_NAVMAN is not set
# CONFIG_USB_SERIAL_PL2303 is not set
# CONFIG_USB_SERIAL_OTI6858 is not set
# CONFIG_USB_SERIAL_QUALCOMM is not set
# CONFIG_USB_SERIAL_SPCP8X5 is not set
# CONFIG_USB_SERIAL_HP4X is not set
# CONFIG_USB_SERIAL_SAFE is not set
# CONFIG_USB_SERIAL_SIEMENS_MPI is not set
# CONFIG_USB_SERIAL_SIERRAWIRELESS is not set
# CONFIG_USB_SERIAL_SYMBOL is not set
# CONFIG_USB_SERIAL_TI is not set
# CONFIG_USB_SERIAL_CYBERJACK is not set
# CONFIG_USB_SERIAL_XIRCOM is not set
# CONFIG_USB_SERIAL_OPTION is not set
# CONFIG_USB_SERIAL_OMNINET is not set
# CONFIG_USB_SERIAL_OPTICON is not set
# CONFIG_USB_SERIAL_DEBUG is not set

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_ADUTUX is not set
# CONFIG_USB_SEVSEG is not set
# CONFIG_USB_RIO500 is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_BERRY_CHARGE is not set
# CONFIG_USB_LED is not set
# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_IDMOUSE is not set
CONFIG_USB_FTDI_ELAN=m
# CONFIG_USB_APPLEDISPLAY is not set
# CONFIG_USB_SISUSBVGA is not set
# CONFIG_USB_LD is not set
# CONFIG_USB_TRANCEVIBRATOR is not set
# CONFIG_USB_IOWARRIOR is not set
# CONFIG_USB_TEST is not set
# CONFIG_USB_ISIGHTFW is not set
# CONFIG_USB_VST is not set
# CONFIG_USB_GADGET is not set

#
# OTG and related infrastructure
#
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_UWB is not set
# CONFIG_MMC is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y

#
# LED drivers
#
# CONFIG_LEDS_ALIX2 is not set
# CONFIG_LEDS_PCA9532 is not set
# CONFIG_LEDS_LP5521 is not set
# CONFIG_LEDS_CLEVO_MAIL is not set
# CONFIG_LEDS_PCA955X is not set
# CONFIG_LEDS_BD2802 is not set

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=m
# CONFIG_LEDS_TRIGGER_HEARTBEAT is not set
# CONFIG_LEDS_TRIGGER_BACKLIGHT is not set
# CONFIG_LEDS_TRIGGER_DEFAULT_ON is not set

#
# iptables trigger is under Netfilter config (LED target)
#
# CONFIG_ACCESSIBILITY is not set
# CONFIG_INFINIBAND is not set
# CONFIG_EDAC is not set
# CONFIG_RTC_CLASS is not set
# CONFIG_DMADEVICES is not set
# CONFIG_AUXDISPLAY is not set
CONFIG_UIO=m
# CONFIG_UIO_CIF is not set
# CONFIG_UIO_PDRV is not set
# CONFIG_UIO_PDRV_GENIRQ is not set
# CONFIG_UIO_SMX is not set
# CONFIG_UIO_AEC is not set
# CONFIG_UIO_SERCOS3 is not set
# CONFIG_STAGING is not set
CONFIG_X86_PLATFORM_DEVICES=y
# CONFIG_ASUS_LAPTOP is not set
# CONFIG_FUJITSU_LAPTOP is not set
# CONFIG_TC1100_WMI is not set
# CONFIG_MSI_LAPTOP is not set
# CONFIG_PANASONIC_LAPTOP is not set
# CONFIG_COMPAL_LAPTOP is not set
# CONFIG_THINKPAD_ACPI is not set
# CONFIG_INTEL_MENLOW is not set
# CONFIG_EEEPC_LAPTOP is not set
# CONFIG_ACPI_WMI is not set
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_TOSHIBA is not set

#
# Firmware Drivers
#
CONFIG_EDD=m
# CONFIG_EDD_OFF is not set
CONFIG_FIRMWARE_MEMMAP=y
CONFIG_EFI_VARS=y
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set
CONFIG_DMIID=y
# CONFIG_ISCSI_IBFT_FIND is not set

#
# File systems
#
CONFIG_EXT2_FS=m
# CONFIG_EXT2_FS_XATTR is not set
CONFIG_EXT2_FS_XIP=y
CONFIG_EXT3_FS=m
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
CONFIG_EXT4_FS=m
CONFIG_EXT4DEV_COMPAT=y
CONFIG_EXT4_FS_XATTR=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
CONFIG_FS_XIP=y
CONFIG_JBD=m
# CONFIG_JBD_DEBUG is not set
CONFIG_JBD2=m
# CONFIG_JBD2_DEBUG is not set
CONFIG_FS_MBCACHE=m
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_FILE_LOCKING=y
# CONFIG_XFS_FS is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
# CONFIG_BTRFS_FS is not set
CONFIG_DNOTIFY=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
# CONFIG_QUOTA is not set
# CONFIG_AUTOFS_FS is not set
CONFIG_AUTOFS4_FS=m
CONFIG_FUSE_FS=m
CONFIG_GENERIC_ACL=y

#
# Caches
#
# CONFIG_FSCACHE is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=y
CONFIG_UDF_NLS=y

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="ascii"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_CONFIGFS_FS=m
CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
CONFIG_CRAMFS=m
# CONFIG_SQUASHFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
CONFIG_ROMFS_FS=m
CONFIG_ROMFS_BACKED_BY_BLOCK=y
# CONFIG_ROMFS_BACKED_BY_MTD is not set
# CONFIG_ROMFS_BACKED_BY_BOTH is not set
CONFIG_ROMFS_ON_BLOCK=y
# CONFIG_SYSV_FS is not set
CONFIG_UFS_FS=m
# CONFIG_UFS_FS_WRITE is not set
# CONFIG_UFS_DEBUG is not set
# CONFIG_NILFS2_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
# CONFIG_NFSD is not set
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_NFS_ACL_SUPPORT=m
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=m
CONFIG_SUNRPC_GSS=m
CONFIG_RPCSEC_GSS_KRB5=m
# CONFIG_RPCSEC_GSS_SPKM3 is not set
# CONFIG_SMB_FS is not set
# CONFIG_CIFS is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_OSF_PARTITION is not set
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
# CONFIG_MAC_PARTITION is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
# CONFIG_MINIX_SUBPARTITION is not set
# CONFIG_SOLARIS_X86_PARTITION is not set
# CONFIG_UNIXWARE_DISKLABEL is not set
# CONFIG_LDM_PARTITION is not set
# CONFIG_SGI_PARTITION is not set
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
# CONFIG_KARMA_PARTITION is not set
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=y
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_CODEPAGE_852=m
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
CONFIG_NLS_CODEPAGE_863=m
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
CONFIG_NLS_CODEPAGE_936=m
CONFIG_NLS_CODEPAGE_950=m
CONFIG_NLS_CODEPAGE_932=m
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
CONFIG_NLS_ISO8859_8=m
CONFIG_NLS_CODEPAGE_1250=m
CONFIG_NLS_CODEPAGE_1251=m
CONFIG_NLS_ASCII=y
# CONFIG_NLS_ISO8859_1 is not set
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
CONFIG_NLS_UTF8=m
# CONFIG_DLM is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
# CONFIG_ENABLE_WARN_DEPRECATED is not set
# CONFIG_ENABLE_MUST_CHECK is not set
CONFIG_FRAME_WARN=1024
CONFIG_MAGIC_SYSRQ=y
# CONFIG_UNUSED_SYMBOLS is not set
CONFIG_DEBUG_FS=y
CONFIG_HEADERS_CHECK=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SHIRQ=y
CONFIG_DETECT_SOFTLOCKUP=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
CONFIG_DETECT_HUNG_TASK=y
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
CONFIG_SCHED_DEBUG=y
CONFIG_SCHEDSTATS=y
CONFIG_TIMER_STATS=y
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set
CONFIG_DEBUG_PREEMPT=y
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
# CONFIG_LOCK_STAT is not set
CONFIG_DEBUG_LOCKDEP=y
CONFIG_TRACE_IRQFLAGS=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_HIGHMEM=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_VIRTUAL is not set
# CONFIG_DEBUG_WRITECOUNT is not set
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
CONFIG_ARCH_WANT_FRAME_POINTERS=y
CONFIG_FRAME_POINTER=y
# CONFIG_BOOT_PRINTK_DELAY is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_KPROBES_SANITY_TEST is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
# CONFIG_LKDTM is not set
# CONFIG_FAULT_INJECTION is not set
# CONFIG_LATENCYTOP is not set
CONFIG_SYSCTL_SYSCALL_CHECK=y
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FTRACE_NMI_ENTER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_FTRACE_SYSCALLS=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_RING_BUFFER=y
CONFIG_FTRACE_NMI_ENTER=y
CONFIG_TRACING=y
CONFIG_TRACING_SUPPORT=y

#
# Tracers
#
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_IRQSOFF_TRACER=y
CONFIG_PREEMPT_TRACER=y
CONFIG_SYSPROF_TRACER=y
CONFIG_SCHED_TRACER=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_EVENT_TRACER=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_BOOT_TRACER=y
# CONFIG_TRACE_BRANCH_PROFILING is not set
CONFIG_POWER_TRACER=y
CONFIG_STACK_TRACER=y
# CONFIG_KMEMTRACE is not set
CONFIG_WORKQUEUE_TRACER=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_FTRACE_MCOUNT_RECORD=y
# CONFIG_FTRACE_STARTUP_TEST is not set
CONFIG_MMIOTRACE=y
CONFIG_MMIOTRACE_TEST=m
# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
# CONFIG_FIREWIRE_OHCI_REMOTE_DMA is not set
# CONFIG_BUILD_DOCSRC is not set
# CONFIG_DYNAMIC_DEBUG is not set
# CONFIG_DMA_API_DEBUG is not set
CONFIG_SAMPLES=y
# CONFIG_SAMPLE_MARKERS is not set
# CONFIG_SAMPLE_TRACEPOINTS is not set
CONFIG_SAMPLE_KOBJECT=m
CONFIG_SAMPLE_KPROBES=m
CONFIG_SAMPLE_KRETPROBES=m
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
# CONFIG_STRICT_DEVMEM is not set
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
# CONFIG_EARLY_PRINTK_DBGP is not set
# CONFIG_DEBUG_STACKOVERFLOW is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_DEBUG_PER_CPU_MAPS is not set
# CONFIG_X86_PTDUMP is not set
CONFIG_DEBUG_RODATA=y
# CONFIG_DEBUG_RODATA_TEST is not set
# CONFIG_DEBUG_NX_TEST is not set
CONFIG_4KSTACKS=y
CONFIG_DOUBLEFAULT=y
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=0
# CONFIG_DEBUG_BOOT_PARAMS is not set
# CONFIG_CPA_DEBUG is not set
# CONFIG_OPTIMIZE_INLINING is not set

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY is not set
# CONFIG_SECURITYFS is not set
# CONFIG_SECURITY_FILE_CAPABILITIES is not set
# CONFIG_IMA is not set
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
# CONFIG_CRYPTO_FIPS is not set
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER=m
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_PCOMP=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
# CONFIG_CRYPTO_GF128MUL is not set
CONFIG_CRYPTO_NULL=m
CONFIG_CRYPTO_WORKQUEUE=y
# CONFIG_CRYPTO_CRYPTD is not set
# CONFIG_CRYPTO_AUTHENC is not set
# CONFIG_CRYPTO_TEST is not set

#
# Authenticated Encryption with Associated Data
#
# CONFIG_CRYPTO_CCM is not set
# CONFIG_CRYPTO_GCM is not set
# CONFIG_CRYPTO_SEQIV is not set

#
# Block modes
#
CONFIG_CRYPTO_CBC=m
# CONFIG_CRYPTO_CTR is not set
# CONFIG_CRYPTO_CTS is not set
# CONFIG_CRYPTO_ECB is not set
# CONFIG_CRYPTO_LRW is not set
# CONFIG_CRYPTO_PCBC is not set
# CONFIG_CRYPTO_XTS is not set

#
# Hash modes
#
# CONFIG_CRYPTO_HMAC is not set
# CONFIG_CRYPTO_XCBC is not set

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
# CONFIG_CRYPTO_CRC32C_INTEL is not set
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
# CONFIG_CRYPTO_MICHAEL_MIC is not set
# CONFIG_CRYPTO_RMD128 is not set
# CONFIG_CRYPTO_RMD160 is not set
# CONFIG_CRYPTO_RMD256 is not set
# CONFIG_CRYPTO_RMD320 is not set
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=m
# CONFIG_CRYPTO_SHA512 is not set
# CONFIG_CRYPTO_TGR192 is not set
# CONFIG_CRYPTO_WP512 is not set

#
# Ciphers
#
CONFIG_CRYPTO_AES=m
# CONFIG_CRYPTO_AES_586 is not set
# CONFIG_CRYPTO_ANUBIS is not set
# CONFIG_CRYPTO_ARC4 is not set
# CONFIG_CRYPTO_BLOWFISH is not set
# CONFIG_CRYPTO_CAMELLIA is not set
# CONFIG_CRYPTO_CAST5 is not set
# CONFIG_CRYPTO_CAST6 is not set
CONFIG_CRYPTO_DES=m
# CONFIG_CRYPTO_FCRYPT is not set
# CONFIG_CRYPTO_KHAZAD is not set
# CONFIG_CRYPTO_SALSA20 is not set
# CONFIG_CRYPTO_SALSA20_586 is not set
# CONFIG_CRYPTO_SEED is not set
# CONFIG_CRYPTO_SERPENT is not set
# CONFIG_CRYPTO_TEA is not set
# CONFIG_CRYPTO_TWOFISH is not set
# CONFIG_CRYPTO_TWOFISH_586 is not set

#
# Compression
#
# CONFIG_CRYPTO_DEFLATE is not set
# CONFIG_CRYPTO_ZLIB is not set
# CONFIG_CRYPTO_LZO is not set

#
# Random Number Generation
#
# CONFIG_CRYPTO_ANSI_CPRNG is not set
# CONFIG_CRYPTO_HW is not set
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
# CONFIG_VIRTUALIZATION is not set
CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_BITREVERSE=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_FIND_LAST_BIT=y
CONFIG_CRC_CCITT=m
CONFIG_CRC16=m
# CONFIG_CRC_T10DIF is not set
CONFIG_CRC_ITU_T=y
CONFIG_CRC32=y
# CONFIG_CRC7 is not set
# CONFIG_LIBCRC32C is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=m
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
CONFIG_NLATTR=y

[-- Attachment #3: dmesg.txt --]
[-- Type: text/plain, Size: 90538 bytes --]

Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.30-rc4-io (root@localhost.localdomain) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)) #6 SMP PREEMPT Thu May 7 11:07:49 CST 2009
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  NSC Geode by NSC
  Cyrix CyrixInstead
  Centaur CentaurHauls
  Transmeta GenuineTMx86
  Transmeta TransmetaCPU
  UMC UMC UMC UMC
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
 BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003bff0000 (usable)
 BIOS-e820: 000000003bff0000 - 000000003bff3000 (ACPI NVS)
 BIOS-e820: 000000003bff3000 - 000000003c000000 (ACPI data)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
DMI 2.3 present.
Phoenix BIOS detected: BIOS may corrupt low RAM, working around it.
e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
last_pfn = 0x3bff0 max_arch_pfn = 0x100000
MTRR default type: uncachable
MTRR fixed ranges enabled:
  00000-9FFFF write-back
  A0000-BFFFF uncachable
  C0000-C7FFF write-protect
  C8000-FFFFF uncachable
MTRR variable ranges enabled:
  0 base 000000000 mask FC0000000 write-back
  1 base 03C000000 mask FFC000000 uncachable
  2 base 0D0000000 mask FF8000000 write-combining
  3 disabled
  4 disabled
  5 disabled
  6 disabled
  7 disabled
init_memory_mapping: 0000000000000000-00000000377fe000
 0000000000 - 0000400000 page 4k
 0000400000 - 0037400000 page 2M
 0037400000 - 00377fe000 page 4k
kernel direct mapping tables up to 377fe000 @ 10000-15000
RAMDISK: 37d0d000 - 37fefd69
Allocated new RAMDISK: 00100000 - 003e2d69
Move RAMDISK from 0000000037d0d000 - 0000000037fefd68 to 00100000 - 003e2d68
ACPI: RSDP 000f7560 00014 (v00 AWARD )
ACPI: RSDT 3bff3040 0002C (v01 AWARD  AWRDACPI 42302E31 AWRD 00000000)
ACPI: FACP 3bff30c0 00074 (v01 AWARD  AWRDACPI 42302E31 AWRD 00000000)
ACPI: DSDT 3bff3180 03ABC (v01 AWARD  AWRDACPI 00001000 MSFT 0100000E)
ACPI: FACS 3bff0000 00040
ACPI: APIC 3bff6c80 00084 (v01 AWARD  AWRDACPI 42302E31 AWRD 00000000)
ACPI: Local APIC address 0xfee00000
71MB HIGHMEM available.
887MB LOWMEM available.
  mapped low ram: 0 - 377fe000
  low ram: 0 - 377fe000
  node 0 low ram: 00000000 - 377fe000
  node 0 bootmap 00011000 - 00017f00
(9 early reservations) ==> bootmem [0000000000 - 00377fe000]
  #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
  #1 [0000001000 - 0000002000]    EX TRAMPOLINE ==> [0000001000 - 0000002000]
  #2 [0000006000 - 0000007000]       TRAMPOLINE ==> [0000006000 - 0000007000]
  #3 [0000400000 - 0000c6bd1c]    TEXT DATA BSS ==> [0000400000 - 0000c6bd1c]
  #4 [000009f400 - 0000100000]    BIOS reserved ==> [000009f400 - 0000100000]
  #5 [0000c6c000 - 0000c700ed]              BRK ==> [0000c6c000 - 0000c700ed]
  #6 [0000010000 - 0000011000]          PGTABLE ==> [0000010000 - 0000011000]
  #7 [0000100000 - 00003e2d69]      NEW RAMDISK ==> [0000100000 - 00003e2d69]
  #8 [0000011000 - 0000018000]          BOOTMAP ==> [0000011000 - 0000018000]
found SMP MP-table at [c00f5ad0] f5ad0
Zone PFN ranges:
  DMA      0x00000010 -> 0x00001000
  Normal   0x00001000 -> 0x000377fe
  HighMem  0x000377fe -> 0x0003bff0
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0: 0x00000010 -> 0x0000009f
    0: 0x00000100 -> 0x0003bff0
On node 0 totalpages: 245631
free_area_init_node: node 0, pgdat c0778f80, node_mem_map c1000340
  DMA zone: 52 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 3931 pages, LIFO batch:0
  Normal zone: 2834 pages used for memmap
  Normal zone: 220396 pages, LIFO batch:31
  HighMem zone: 234 pages used for memmap
  HighMem zone: 18184 pages, LIFO batch:3
Using APIC driver default
ACPI: PM-Timer IO Port: 0x1008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] disabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] disabled)
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 4, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 dfl dfl)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
SMP: Allowing 4 CPUs, 2 hotplug CPUs
nr_irqs_gsi: 24
Allocating PCI resources starting at 40000000 (gap: 3c000000:c2c00000)
NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:4 nr_node_ids:1
PERCPU: Embedded 13 pages at c1c3b000, static data 32756 bytes
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 242511
Kernel command line: ro root=LABEL=/ rhgb quiet
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
Preemptible RCU implementation.
NR_IRQS:512
CPU 0 irqstacks, hard=c1c3b000 soft=c1c3c000
PID hash table entries: 4096 (order: 12, 16384 bytes)
Fast TSC calibration using PIT
Detected 2800.222 MHz processor.
Console: colour VGA+ 80x25
console [tty0] enabled
Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES:  8
... MAX_LOCK_DEPTH:          48
... MAX_LOCKDEP_KEYS:        8191
... CLASSHASH_SIZE:          4096
... MAX_LOCKDEP_ENTRIES:     8192
... MAX_LOCKDEP_CHAINS:      16384
... CHAINHASH_SIZE:          8192
 memory used by lock dependency info: 2847 kB
 per task-struct memory footprint: 1152 bytes
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
allocated 4914560 bytes of page_cgroup
please try cgroup_disable=memory,blkio option if you don't want
Initializing HighMem for node 0 (000377fe:0003bff0)
Memory: 952284k/982976k available (2258k kernel code, 30016k reserved, 1424k data, 320k init, 73672k highmem)
virtual kernel memory layout:
    fixmap  : 0xffedf000 - 0xfffff000   (1152 kB)
    pkmap   : 0xff800000 - 0xffc00000   (4096 kB)
    vmalloc : 0xf7ffe000 - 0xff7fe000   ( 120 MB)
    lowmem  : 0xc0000000 - 0xf77fe000   ( 887 MB)
      .init : 0xc079d000 - 0xc07ed000   ( 320 kB)
      .data : 0xc06349ab - 0xc0798cb8   (1424 kB)
      .text : 0xc0400000 - 0xc06349ab   (2258 kB)
Checking if this processor honours the WP bit even in supervisor mode...Ok.
SLUB: Genslabs=13, HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
Calibrating delay loop (skipped), value calculated using timer frequency.. 5600.44 BogoMIPS (lpj=2800222)
Mount-cache hash table entries: 512
Initializing cgroup subsys debug
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys blkio
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
Initializing cgroup subsys net_cls
Initializing cgroup subsys io
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (24) available
using mwait in idle threads.
Checking 'hlt' instruction... OK.
ACPI: Core revision 20090320
ftrace: converting mcount calls to 0f 1f 44 00 00
ftrace: allocating 12136 entries in 24 pages
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Pentium(R) D CPU 2.80GHz stepping 04
lockdep: fixing up alternatives.
CPU 1 irqstacks, hard=c1c4b000 soft=c1c4c000
Booting processor 1 APIC 0x1 ip 0x6000
Initializing CPU#1
Calibrating delay using timer specific routine.. 5599.23 BogoMIPS (lpj=2799617)
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel P4/Xeon Extended MCE MSRs (24) available
CPU1: Intel(R) Pentium(R) D CPU 2.80GHz stepping 04
checking TSC synchronization [CPU#0 -> CPU#1]: passed.
Brought up 2 CPUs
Total of 2 processors activated (11199.67 BogoMIPS).
CPU0 attaching sched-domain:
 domain 0: span 0-1 level CPU
  groups: 0 1
CPU1 attaching sched-domain:
 domain 0: span 0-1 level CPU
  groups: 1 0
net_namespace: 436 bytes
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xfbda0, last bus=1
PCI: Using configuration type 1 for base access
mtrr: your CPUs had inconsistent fixed MTRR settings
mtrr: probably your BIOS does not setup all CPUs.
mtrr: corrected configuration.
bio: create slab <bio-0> at 0
ACPI: EC: Look up EC in DSDT
ACPI: Interpreter enabled
ACPI: (supports S0 S3 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI: No dock devices found.
ACPI: PCI Root Bridge [PCI0] (0000:00)
pci 0000:00:00.0: reg 10 32bit mmio: [0xd0000000-0xd7ffffff]
pci 0000:00:02.5: reg 10 io port: [0x1f0-0x1f7]
pci 0000:00:02.5: reg 14 io port: [0x3f4-0x3f7]
pci 0000:00:02.5: reg 18 io port: [0x170-0x177]
pci 0000:00:02.5: reg 1c io port: [0x374-0x377]
pci 0000:00:02.5: reg 20 io port: [0x4000-0x400f]
pci 0000:00:02.5: PME# supported from D3cold
pci 0000:00:02.5: PME# disabled
pci 0000:00:02.7: reg 10 io port: [0xd000-0xd0ff]
pci 0000:00:02.7: reg 14 io port: [0xd400-0xd47f]
pci 0000:00:02.7: supports D1 D2
pci 0000:00:02.7: PME# supported from D3hot D3cold
pci 0000:00:02.7: PME# disabled
pci 0000:00:03.0: reg 10 32bit mmio: [0xe1104000-0xe1104fff]
pci 0000:00:03.1: reg 10 32bit mmio: [0xe1100000-0xe1100fff]
pci 0000:00:03.2: reg 10 32bit mmio: [0xe1101000-0xe1101fff]
pci 0000:00:03.3: reg 10 32bit mmio: [0xe1102000-0xe1102fff]
pci 0000:00:03.3: PME# supported from D0 D3hot D3cold
pci 0000:00:03.3: PME# disabled
pci 0000:00:05.0: reg 10 io port: [0xd800-0xd807]
pci 0000:00:05.0: reg 14 io port: [0xdc00-0xdc03]
pci 0000:00:05.0: reg 18 io port: [0xe000-0xe007]
pci 0000:00:05.0: reg 1c io port: [0xe400-0xe403]
pci 0000:00:05.0: reg 20 io port: [0xe800-0xe80f]
pci 0000:00:05.0: PME# supported from D3cold
pci 0000:00:05.0: PME# disabled
pci 0000:00:0e.0: reg 10 io port: [0xec00-0xecff]
pci 0000:00:0e.0: reg 14 32bit mmio: [0xe1103000-0xe11030ff]
pci 0000:00:0e.0: reg 30 32bit mmio: [0x000000-0x01ffff]
pci 0000:00:0e.0: supports D1 D2
pci 0000:00:0e.0: PME# supported from D1 D2 D3hot D3cold
pci 0000:00:0e.0: PME# disabled
pci 0000:01:00.0: reg 10 32bit mmio: [0xd8000000-0xdfffffff]
pci 0000:01:00.0: reg 14 32bit mmio: [0xe1000000-0xe101ffff]
pci 0000:01:00.0: reg 18 io port: [0xc000-0xc07f]
pci 0000:01:00.0: supports D1 D2
pci 0000:00:01.0: bridge io port: [0xc000-0xcfff]
pci 0000:00:01.0: bridge 32bit mmio: [0xe1000000-0xe10fffff]
pci 0000:00:01.0: bridge 32bit mmio pref: [0xd8000000-0xdfffffff]
pci_bus 0000:00: on NUMA node 0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 10 *11 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 9 *10 11 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 9 10 *11 14 15)
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 *6 7 9 10 11 14 15)
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 *9 10 11 14 15)
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 *5 6 7 9 10 11 14 15)
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: PnP ACPI: found 12 devices
ACPI: ACPI bus type pnp unregistered
system 00:00: iomem range 0xc8000-0xcbfff has been reserved
system 00:00: iomem range 0xf0000-0xf7fff could not be reserved
system 00:00: iomem range 0xf8000-0xfbfff could not be reserved
system 00:00: iomem range 0xfc000-0xfffff could not be reserved
system 00:00: iomem range 0x3bff0000-0x3bffffff could not be reserved
system 00:00: iomem range 0xffff0000-0xffffffff has been reserved
system 00:00: iomem range 0x0-0x9ffff could not be reserved
system 00:00: iomem range 0x100000-0x3bfeffff could not be reserved
system 00:00: iomem range 0xffee0000-0xffefffff has been reserved
system 00:00: iomem range 0xfffe0000-0xfffeffff has been reserved
system 00:00: iomem range 0xfec00000-0xfecfffff has been reserved
system 00:00: iomem range 0xfee00000-0xfeefffff has been reserved
system 00:02: ioport range 0x4d0-0x4d1 has been reserved
system 00:02: ioport range 0x800-0x805 has been reserved
system 00:02: ioport range 0x290-0x297 has been reserved
system 00:02: ioport range 0x880-0x88f has been reserved
pci 0000:00:01.0: PCI bridge, secondary bus 0000:01
pci 0000:00:01.0:   IO window: 0xc000-0xcfff
pci 0000:00:01.0:   MEM window: 0xe1000000-0xe10fffff
pci 0000:00:01.0:   PREFETCH window: 0x000000d8000000-0x000000dfffffff
pci_bus 0000:00: resource 0 io:  [0x00-0xffff]
pci_bus 0000:00: resource 1 mem: [0x000000-0xffffffff]
pci_bus 0000:01: resource 0 io:  [0xc000-0xcfff]
pci_bus 0000:01: resource 1 mem: [0xe1000000-0xe10fffff]
pci_bus 0000:01: resource 2 pref mem [0xd8000000-0xdfffffff]
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
TCP bind hash table entries: 65536 (order: 9, 2097152 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
NET: Registered protocol family 1
checking if image is initramfs...
rootfs image is initramfs; unpacking...
Freeing initrd memory: 2955k freed
apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16ac)
apm: disabled - APM is not SMP safe.
highmem bounce pool size: 64 pages
HugeTLB registered 4 MB page size, pre-allocated 0 pages
msgmni has been set to 1722
alg: No test for stdrng (krng)
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
io scheduler noop registered
io scheduler cfq registered (default)
pci 0000:01:00.0: Boot video device
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
fan PNP0C0B:00: registered as cooling_device0
ACPI: Fan [FAN] (on)
processor ACPI_CPU:00: registered as cooling_device1
processor ACPI_CPU:01: registered as cooling_device2
thermal LNXTHERM:01: registered as thermal_zone0
ACPI: Thermal Zone [THRM] (62 C)
isapnp: Scanning for PnP cards...
Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 0
isapnp: No Plug & Play device found
Real Time Clock Driver v1.12b
Non-volatile memory driver v1.3
Linux agpgart interface v0.103
agpgart-sis 0000:00:00.0: SiS chipset [1039/0661]
agpgart-sis 0000:00:00.0: AGP aperture is 128M @ 0xd0000000
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:07: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:08: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
brd: module loaded
PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
cpuidle: using governor ladder
cpuidle: using governor menu
TCP cubic registered
NET: Registered protocol family 17
Using IPI No-Shortcut mode
registered taskstats version 1
Freeing unused kernel memory: 320k freed
Write protecting the kernel text: 2260k
Write protecting the kernel read-only data: 1120k
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ehci_hcd 0000:00:03.3: PCI INT D -> GSI 23 (level, low) -> IRQ 23
ehci_hcd 0000:00:03.3: EHCI Host Controller
ehci_hcd 0000:00:03.3: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:03.3: cache line size of 128 is not supported
ehci_hcd 0000:00:03.3: irq 23, io mem 0xe1102000
ehci_hcd 0000:00:03.3: USB 2.0 started, EHCI 1.00
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 8 ports detected
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
ohci_hcd 0000:00:03.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
ohci_hcd 0000:00:03.0: OHCI Host Controller
ohci_hcd 0000:00:03.0: new USB bus registered, assigned bus number 2
ohci_hcd 0000:00:03.0: irq 20, io mem 0xe1104000
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
ohci_hcd 0000:00:03.1: PCI INT B -> GSI 21 (level, low) -> IRQ 21
ohci_hcd 0000:00:03.1: OHCI Host Controller
ohci_hcd 0000:00:03.1: new USB bus registered, assigned bus number 3
ohci_hcd 0000:00:03.1: irq 21, io mem 0xe1100000
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 3 ports detected
ohci_hcd 0000:00:03.2: PCI INT C -> GSI 22 (level, low) -> IRQ 22
ohci_hcd 0000:00:03.2: OHCI Host Controller
ohci_hcd 0000:00:03.2: new USB bus registered, assigned bus number 4
ohci_hcd 0000:00:03.2: irq 22, io mem 0xe1101000
usb usb4: configuration #1 chosen from 1 choice
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
uhci_hcd: USB Universal Host Controller Interface driver
SCSI subsystem initialized
Driver 'sd' needs updating - please use bus_type methods
libata version 3.00 loaded.
pata_sis 0000:00:02.5: version 0.5.2
pata_sis 0000:00:02.5: PCI INT A -> GSI 16 (level, low) -> IRQ 16
scsi0 : pata_sis
scsi1 : pata_sis
ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0x4000 irq 14
ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0x4008 irq 15
input: ImPS/2 Logitech Wheel Mouse as /class/input/input0
input: AT Translated Set 2 keyboard as /class/input/input1
sata_sis 0000:00:05.0: version 1.0
sata_sis 0000:00:05.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
sata_sis 0000:00:05.0: Detected SiS 180/181/964 chipset in SATA mode
scsi2 : sata_sis
scsi3 : sata_sis
ata3: SATA max UDMA/133 cmd 0xd800 ctl 0xdc00 bmdma 0xe800 irq 17
ata4: SATA max UDMA/133 cmd 0xe000 ctl 0xe400 bmdma 0xe808 irq 17
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: ATA-7: ST3808110AS, 3.AAE, max UDMA/133
ata3.00: 156301488 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata3.00: configured for UDMA/133
scsi 2:0:0:0: Direct-Access     ATA      ST3808110AS      3.AA PQ: 0 ANSI: 5
sd 2:0:0:0: [sda] 156301488 512-byte hardware sectors: (80.0 GB/74.5 GiB)
sd 2:0:0:0: [sda] Write Protect is off
sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2 < sda5 sda6 sda7 sda8 sda9 >
sd 2:0:0:0: [sda] Attached SCSI disk
ata4: SATA link down (SStatus 0 SControl 300)
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: sda8: orphan cleanup on readonly fs
ext3_orphan_cleanup: deleting unreferenced inode 3725366
ext3_orphan_cleanup: deleting unreferenced inode 3725365
ext3_orphan_cleanup: deleting unreferenced inode 3725364
EXT3-fs: sda8: 3 orphan inodes deleted
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with writeback data mode.
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:00:0e.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
r8169 0000:00:0e.0: no PCI Express capability
eth0: RTL8110s at 0xf8236000, 00:16:ec:2e:b7:e0, XID 04000000 IRQ 18
sd 2:0:0:0: Attached scsi generic sg0 type 0
parport_pc 00:09: reported by Plug and Play ACPI
parport0: PC-style at 0x378 (0x778), irq 7 [PCSPP,TRISTATE]
input: Power Button as /class/input/input2
ACPI: Power Button [PWRF]
input: Power Button as /class/input/input3
ACPI: Power Button [PWRB]
input: Sleep Button as /class/input/input4
ACPI: Sleep Button [FUTS]
ramfs: bad mount option: maxsize=512
EXT3 FS on sda8, internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda7, internal journal
EXT3-fs: mounted filesystem with writeback data mode.
Adding 1052216k swap on /dev/sda6.  Priority:-1 extents:1 across:1052216k 
warning: process `kudzu' used the deprecated sysctl system call with 1.23.
kudzu[1133] general protection ip:8056968 sp:bffe9e90 error:0
r8169: eth0: link up
r8169: eth0: link up
warning: `dbus-daemon' uses 32-bit capabilities (legacy support in use)
CPU0 attaching NULL sched-domain.
CPU1 attaching NULL sched-domain.
CPU0 attaching sched-domain:
 domain 0: span 0-1 level CPU
  groups: 0 1
CPU1 attaching sched-domain:
 domain 0: span 0-1 level CPU
  groups: 1 0

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.30-rc4-io #6
---------------------------------------------------------
rmdir/2186 just changed the state of lock:
 (&iocg->lock){+.+...}, at: [<c0513b18>] iocg_destroy+0x2a/0x118
but this lock was taken by another, SOFTIRQ-safe lock in the past:
 (&q->__queue_lock){..-...}

and interrupts could create inverse lock ordering between them.


other info that might help us debug this:
3 locks held by rmdir/2186:
 #0:  (&sb->s_type->i_mutex_key#10/1){+.+.+.}, at: [<c04ae1e8>] do_rmdir+0x5c/0xc8
 #1:  (cgroup_mutex){+.+.+.}, at: [<c045a15b>] cgroup_diput+0x3c/0xa7
 #2:  (&iocg->lock){+.+...}, at: [<c0513b18>] iocg_destroy+0x2a/0x118

the first lock's dependencies:
-> (&iocg->lock){+.+...} ops: 3 {
   HARDIRQ-ON-W at:
                        [<c044b840>] mark_held_locks+0x3d/0x58
                        [<c044b963>] trace_hardirqs_on_caller+0x108/0x14c
                        [<c044b9b2>] trace_hardirqs_on+0xb/0xd
                        [<c0630883>] _spin_unlock_irq+0x27/0x47
                        [<c0513baa>] iocg_destroy+0xbc/0x118
                        [<c045a16a>] cgroup_diput+0x4b/0xa7
                        [<c04b1dbb>] dentry_iput+0x78/0x9c
                        [<c04b1e82>] d_kill+0x21/0x3b
                        [<c04b2f2a>] dput+0xf3/0xfc
                        [<c04ae226>] do_rmdir+0x9a/0xc8
                        [<c04ae29d>] sys_rmdir+0x15/0x17
                        [<c0402a68>] sysenter_do_call+0x12/0x36
                        [<ffffffff>] 0xffffffff
   SOFTIRQ-ON-W at:
                        [<c044b840>] mark_held_locks+0x3d/0x58
                        [<c044b97c>] trace_hardirqs_on_caller+0x121/0x14c
                        [<c044b9b2>] trace_hardirqs_on+0xb/0xd
                        [<c0630883>] _spin_unlock_irq+0x27/0x47
                        [<c0513baa>] iocg_destroy+0xbc/0x118
                        [<c045a16a>] cgroup_diput+0x4b/0xa7
                        [<c04b1dbb>] dentry_iput+0x78/0x9c
                        [<c04b1e82>] d_kill+0x21/0x3b
                        [<c04b2f2a>] dput+0xf3/0xfc
                        [<c04ae226>] do_rmdir+0x9a/0xc8
                        [<c04ae29d>] sys_rmdir+0x15/0x17
                        [<c0402a68>] sysenter_do_call+0x12/0x36
                        [<ffffffff>] 0xffffffff
   INITIAL USE at:
                       [<c044dad5>] __lock_acquire+0x58c/0x73e
                       [<c044dd36>] lock_acquire+0xaf/0xcc
                       [<c06304ea>] _spin_lock_irq+0x30/0x3f
                       [<c05119bd>] io_alloc_root_group+0x104/0x155
                       [<c05133cb>] elv_init_fq_data+0x32/0xe0
                       [<c0504317>] elevator_alloc+0x150/0x170
                       [<c0505393>] elevator_init+0x9d/0x100
                       [<c0507088>] blk_init_queue_node+0xc4/0xf7
                       [<c05070cb>] blk_init_queue+0x10/0x12
                       [<f81060fd>] __scsi_alloc_queue+0x1c/0xba [scsi_mod]
                       [<f81061b0>] scsi_alloc_queue+0x15/0x4e [scsi_mod]
                       [<f810803d>] scsi_alloc_sdev+0x154/0x1f5 [scsi_mod]
                       [<f8108387>] scsi_probe_and_add_lun+0x123/0xb5b [scsi_mod]
                       [<f8109847>] __scsi_add_device+0x8a/0xb0 [scsi_mod]
                       [<f816ad14>] ata_scsi_scan_host+0x77/0x141 [libata]
                       [<f816903f>] async_port_probe+0xa0/0xa9 [libata]
                       [<c044341f>] async_thread+0xe9/0x1c9
                       [<c043e204>] kthread+0x4a/0x72
                       [<c04034e7>] kernel_thread_helper+0x7/0x10
                       [<ffffffff>] 0xffffffff
 }
 ... key      at: [<c0c5ebd8>] __key.29462+0x0/0x8

the second lock's dependencies:
-> (&q->__queue_lock){..-...} ops: 162810 {
   IN-SOFTIRQ-W at:
                        [<c044da08>] __lock_acquire+0x4bf/0x73e
                        [<c044dd36>] lock_acquire+0xaf/0xcc
                        [<c0630340>] _spin_lock+0x2a/0x39
                        [<f810672c>] scsi_device_unbusy+0x78/0x92 [scsi_mod]
                        [<f8101483>] scsi_finish_command+0x22/0xd4 [scsi_mod]
                        [<f8106fdb>] scsi_softirq_done+0xf9/0x101 [scsi_mod]
                        [<c050a936>] blk_done_softirq+0x5e/0x70
                        [<c0431379>] __do_softirq+0xb8/0x180
                        [<ffffffff>] 0xffffffff
   INITIAL USE at:
                       [<c044dad5>] __lock_acquire+0x58c/0x73e
                       [<c044dd36>] lock_acquire+0xaf/0xcc
                       [<c063056b>] _spin_lock_irqsave+0x33/0x43
                       [<f8101337>] scsi_adjust_queue_depth+0x2a/0xc9 [scsi_mod]
                       [<f8108079>] scsi_alloc_sdev+0x190/0x1f5 [scsi_mod]
                       [<f8108387>] scsi_probe_and_add_lun+0x123/0xb5b [scsi_mod]
                       [<f8109847>] __scsi_add_device+0x8a/0xb0 [scsi_mod]
                       [<f816ad14>] ata_scsi_scan_host+0x77/0x141 [libata]
                       [<f816903f>] async_port_probe+0xa0/0xa9 [libata]
                       [<c044341f>] async_thread+0xe9/0x1c9
                       [<c043e204>] kthread+0x4a/0x72
                       [<c04034e7>] kernel_thread_helper+0x7/0x10
                       [<ffffffff>] 0xffffffff
 }
 ... key      at: [<c0c5e698>] __key.29749+0x0/0x8
 -> (&ioc->lock){..-...} ops: 1032 {
    IN-SOFTIRQ-W at:
                          [<c044da08>] __lock_acquire+0x4bf/0x73e
                          [<c044dd36>] lock_acquire+0xaf/0xcc
                          [<c063056b>] _spin_lock_irqsave+0x33/0x43
                          [<c050f0f0>] cic_free_func+0x26/0x64
                          [<c050ea90>] __call_for_each_cic+0x23/0x2e
                          [<c050eaad>] cfq_free_io_context+0x12/0x14
                          [<c050978c>] put_io_context+0x4b/0x66
                          [<c050f2a2>] cfq_put_request+0x42/0x5b
                          [<c0504629>] elv_put_request+0x30/0x33
                          [<c050678d>] __blk_put_request+0x8b/0xb8
                          [<c0506953>] end_that_request_last+0x199/0x1a1
                          [<c0506a0d>] blk_end_io+0x51/0x6f
                          [<c0506a64>] blk_end_request+0x11/0x13
                          [<f8106c9c>] scsi_io_completion+0x1d9/0x41f [scsi_mod]
                          [<f810152d>] scsi_finish_command+0xcc/0xd4 [scsi_mod]
                          [<f8106fdb>] scsi_softirq_done+0xf9/0x101 [scsi_mod]
                          [<c050a936>] blk_done_softirq+0x5e/0x70
                          [<c0431379>] __do_softirq+0xb8/0x180
                          [<ffffffff>] 0xffffffff
    INITIAL USE at:
                         [<c044dad5>] __lock_acquire+0x58c/0x73e
                         [<c044dd36>] lock_acquire+0xaf/0xcc
                         [<c063056b>] _spin_lock_irqsave+0x33/0x43
                         [<c050f9bf>] cfq_set_request+0x123/0x33d
                         [<c05052e6>] elv_set_request+0x43/0x53
                         [<c0506d44>] get_request+0x22e/0x33f
                         [<c0507498>] get_request_wait+0x137/0x15d
                         [<c0507501>] blk_get_request+0x43/0x73
                         [<f8106854>] scsi_execute+0x24/0x11c [scsi_mod]
                         [<f81069ff>] scsi_execute_req+0xb3/0x104 [scsi_mod]
                         [<f81084f8>] scsi_probe_and_add_lun+0x294/0xb5b [scsi_mod]
                         [<f8109847>] __scsi_add_device+0x8a/0xb0 [scsi_mod]
                         [<f816ad14>] ata_scsi_scan_host+0x77/0x141 [libata]
                         [<f816903f>] async_port_probe+0xa0/0xa9 [libata]
                         [<c044341f>] async_thread+0xe9/0x1c9
                         [<c043e204>] kthread+0x4a/0x72
                         [<c04034e7>] kernel_thread_helper+0x7/0x10
                         [<ffffffff>] 0xffffffff
  }
  ... key      at: [<c0c5e6ec>] __key.27747+0x0/0x8
  -> (&rdp->lock){-.-...} ops: 168014 {
     IN-HARDIRQ-W at:
                            [<c044d9e4>] __lock_acquire+0x49b/0x73e
                            [<c044dd36>] lock_acquire+0xaf/0xcc
                            [<c063056b>] _spin_lock_irqsave+0x33/0x43
                            [<c0461b2a>] rcu_check_callbacks+0x6a/0xa3
                            [<c043549a>] update_process_times+0x3d/0x53
                            [<c0447fe0>] tick_periodic+0x6b/0x77
                            [<c0448009>] tick_handle_periodic+0x1d/0x60
                            [<c063406e>] smp_apic_timer_interrupt+0x6e/0x81
                            [<c04033c7>] apic_timer_interrupt+0x2f/0x34
                            [<c042fbd7>] do_exit+0x53e/0x5b3
                            [<c043a9d8>] __request_module+0x0/0x100
                            [<c04034e7>] kernel_thread_helper+0x7/0x10
                            [<ffffffff>] 0xffffffff
     IN-SOFTIRQ-W at:
                            [<c044da08>] __lock_acquire+0x4bf/0x73e
                            [<c044dd36>] lock_acquire+0xaf/0xcc
                            [<c0630340>] _spin_lock+0x2a/0x39
                            [<c04619db>] rcu_process_callbacks+0x2b/0x86
                            [<c0431379>] __do_softirq+0xb8/0x180
                            [<ffffffff>] 0xffffffff
     INITIAL USE at:
                           [<c044dad5>] __lock_acquire+0x58c/0x73e
                           [<c044dd36>] lock_acquire+0xaf/0xcc
                           [<c063056b>] _spin_lock_irqsave+0x33/0x43
                           [<c062c8ca>] rcu_online_cpu+0x3d/0x51
                           [<c062c910>] rcu_cpu_notify+0x32/0x43
                           [<c07b097f>] __rcu_init+0xf0/0x120
                           [<c07af027>] rcu_init+0x8/0x14
                           [<c079d6e1>] start_kernel+0x187/0x2fc
                           [<c079d06a>] __init_begin+0x6a/0x6f
                           [<ffffffff>] 0xffffffff
   }
   ... key      at: [<c0c2e52c>] __key.17543+0x0/0x8
  ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630340>] _spin_lock+0x2a/0x39
   [<c046143d>] call_rcu+0x36/0x5b
   [<c0517b45>] radix_tree_delete+0xe7/0x176
   [<c050f0fe>] cic_free_func+0x34/0x64
   [<c050ea90>] __call_for_each_cic+0x23/0x2e
   [<c050eaad>] cfq_free_io_context+0x12/0x14
   [<c050978c>] put_io_context+0x4b/0x66
   [<c050984c>] exit_io_context+0x77/0x7b
   [<c042fc24>] do_exit+0x58b/0x5b3
   [<c04034ed>] kernel_thread_helper+0xd/0x10
   [<ffffffff>] 0xffffffff

 ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c050f4a3>] cfq_cic_lookup+0xd9/0xef
   [<c050f674>] cfq_get_queue+0x92/0x2ba
   [<c050fb01>] cfq_set_request+0x265/0x33d
   [<c05052e6>] elv_set_request+0x43/0x53
   [<c0506d44>] get_request+0x22e/0x33f
   [<c0507498>] get_request_wait+0x137/0x15d
   [<c0507501>] blk_get_request+0x43/0x73
   [<f8106854>] scsi_execute+0x24/0x11c [scsi_mod]
   [<f81069ff>] scsi_execute_req+0xb3/0x104 [scsi_mod]
   [<f81084f8>] scsi_probe_and_add_lun+0x294/0xb5b [scsi_mod]
   [<f8109847>] __scsi_add_device+0x8a/0xb0 [scsi_mod]
   [<f816ad14>] ata_scsi_scan_host+0x77/0x141 [libata]
   [<f816903f>] async_port_probe+0xa0/0xa9 [libata]
   [<c044341f>] async_thread+0xe9/0x1c9
   [<c043e204>] kthread+0x4a/0x72
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

 -> (&base->lock){..-...} ops: 348073 {
    IN-SOFTIRQ-W at:
                          [<c044da08>] __lock_acquire+0x4bf/0x73e
                          [<c044dd36>] lock_acquire+0xaf/0xcc
                          [<c06304ea>] _spin_lock_irq+0x30/0x3f
                          [<c0434b8b>] run_timer_softirq+0x3c/0x1d1
                          [<c0431379>] __do_softirq+0xb8/0x180
                          [<ffffffff>] 0xffffffff
    INITIAL USE at:
                         [<c044dad5>] __lock_acquire+0x58c/0x73e
                         [<c044dd36>] lock_acquire+0xaf/0xcc
                         [<c063056b>] _spin_lock_irqsave+0x33/0x43
                         [<c0434e84>] lock_timer_base+0x24/0x43
                         [<c0434f3d>] mod_timer+0x46/0xcc
                         [<c07bd97a>] con_init+0xa4/0x20e
                         [<c07bd3b2>] console_init+0x12/0x20
                         [<c079d735>] start_kernel+0x1db/0x2fc
                         [<c079d06a>] __init_begin+0x6a/0x6f
                         [<ffffffff>] 0xffffffff
  }
  ... key      at: [<c082304c>] __key.23401+0x0/0x8
 ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c0434e84>] lock_timer_base+0x24/0x43
   [<c0434f3d>] mod_timer+0x46/0xcc
   [<c05075cb>] blk_plug_device+0x9a/0xdf
   [<c05049e1>] __elv_add_request+0x86/0x96
   [<c0509d52>] blk_execute_rq_nowait+0x5d/0x86
   [<c0509e2e>] blk_execute_rq+0xb3/0xd5
   [<f81068f5>] scsi_execute+0xc5/0x11c [scsi_mod]
   [<f81069ff>] scsi_execute_req+0xb3/0x104 [scsi_mod]
   [<f81084f8>] scsi_probe_and_add_lun+0x294/0xb5b [scsi_mod]
   [<f8109847>] __scsi_add_device+0x8a/0xb0 [scsi_mod]
   [<f816ad14>] ata_scsi_scan_host+0x77/0x141 [libata]
   [<f816903f>] async_port_probe+0xa0/0xa9 [libata]
   [<c044341f>] async_thread+0xe9/0x1c9
   [<c043e204>] kthread+0x4a/0x72
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

 -> (&sdev->list_lock){..-...} ops: 27612 {
    IN-SOFTIRQ-W at:
                          [<c044da08>] __lock_acquire+0x4bf/0x73e
                          [<c044dd36>] lock_acquire+0xaf/0xcc
                          [<c063056b>] _spin_lock_irqsave+0x33/0x43
                          [<f8101cb4>] scsi_put_command+0x17/0x57 [scsi_mod]
                          [<f810620f>] scsi_next_command+0x26/0x39 [scsi_mod]
                          [<f8106d02>] scsi_io_completion+0x23f/0x41f [scsi_mod]
                          [<f810152d>] scsi_finish_command+0xcc/0xd4 [scsi_mod]
                          [<f8106fdb>] scsi_softirq_done+0xf9/0x101 [scsi_mod]
                          [<c050a936>] blk_done_softirq+0x5e/0x70
                          [<c0431379>] __do_softirq+0xb8/0x180
                          [<ffffffff>] 0xffffffff
    INITIAL USE at:
                         [<c044dad5>] __lock_acquire+0x58c/0x73e
                         [<c044dd36>] lock_acquire+0xaf/0xcc
                         [<c063056b>] _spin_lock_irqsave+0x33/0x43
                         [<f8101c64>] scsi_get_command+0x5c/0x95 [scsi_mod]
                         [<f81062b6>] scsi_get_cmd_from_req+0x26/0x50 [scsi_mod]
                         [<f8106594>] scsi_setup_blk_pc_cmnd+0x2b/0xd7 [scsi_mod]
                         [<f8106664>] scsi_prep_fn+0x24/0x33 [scsi_mod]
                         [<c0504712>] elv_next_request+0xe6/0x18d
                         [<f810704c>] scsi_request_fn+0x69/0x431 [scsi_mod]
                         [<c05072af>] __generic_unplug_device+0x2e/0x31
                         [<c0509d59>] blk_execute_rq_nowait+0x64/0x86
                         [<c0509e2e>] blk_execute_rq+0xb3/0xd5
                         [<f81068f5>] scsi_execute+0xc5/0x11c [scsi_mod]
                         [<f81069ff>] scsi_execute_req+0xb3/0x104 [scsi_mod]
                         [<f81084f8>] scsi_probe_and_add_lun+0x294/0xb5b [scsi_mod]
                         [<f8109847>] __scsi_add_device+0x8a/0xb0 [scsi_mod]
                         [<f816ad14>] ata_scsi_scan_host+0x77/0x141 [libata]
                         [<f816903f>] async_port_probe+0xa0/0xa9 [libata]
                         [<c044341f>] async_thread+0xe9/0x1c9
                         [<c043e204>] kthread+0x4a/0x72
                         [<c04034e7>] kernel_thread_helper+0x7/0x10
                         [<ffffffff>] 0xffffffff
  }
  ... key      at: [<f811916c>] __key.29786+0x0/0xffff2ebf [scsi_mod]
 ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<f8101c64>] scsi_get_command+0x5c/0x95 [scsi_mod]
   [<f81062b6>] scsi_get_cmd_from_req+0x26/0x50 [scsi_mod]
   [<f8106594>] scsi_setup_blk_pc_cmnd+0x2b/0xd7 [scsi_mod]
   [<f8106664>] scsi_prep_fn+0x24/0x33 [scsi_mod]
   [<c0504712>] elv_next_request+0xe6/0x18d
   [<f810704c>] scsi_request_fn+0x69/0x431 [scsi_mod]
   [<c05072af>] __generic_unplug_device+0x2e/0x31
   [<c0509d59>] blk_execute_rq_nowait+0x64/0x86
   [<c0509e2e>] blk_execute_rq+0xb3/0xd5
   [<f81068f5>] scsi_execute+0xc5/0x11c [scsi_mod]
   [<f81069ff>] scsi_execute_req+0xb3/0x104 [scsi_mod]
   [<f81084f8>] scsi_probe_and_add_lun+0x294/0xb5b [scsi_mod]
   [<f8109847>] __scsi_add_device+0x8a/0xb0 [scsi_mod]
   [<f816ad14>] ata_scsi_scan_host+0x77/0x141 [libata]
   [<f816903f>] async_port_probe+0xa0/0xa9 [libata]
   [<c044341f>] async_thread+0xe9/0x1c9
   [<c043e204>] kthread+0x4a/0x72
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

 -> (&q->lock){-.-.-.} ops: 2105038 {
    IN-HARDIRQ-W at:
                          [<c044d9e4>] __lock_acquire+0x49b/0x73e
                          [<c044dd36>] lock_acquire+0xaf/0xcc
                          [<c063056b>] _spin_lock_irqsave+0x33/0x43
                          [<c041ec0d>] complete+0x17/0x43
                          [<c062609b>] i8042_aux_test_irq+0x4c/0x65
                          [<c045e922>] handle_IRQ_event+0xa4/0x169
                          [<c04602ea>] handle_edge_irq+0xc9/0x10a
                          [<ffffffff>] 0xffffffff
    IN-SOFTIRQ-W at:
                          [<c044da08>] __lock_acquire+0x4bf/0x73e
                          [<c044dd36>] lock_acquire+0xaf/0xcc
                          [<c063056b>] _spin_lock_irqsave+0x33/0x43
                          [<c041ec0d>] complete+0x17/0x43
                          [<c043c336>] wakeme_after_rcu+0x10/0x12
                          [<c0461a12>] rcu_process_callbacks+0x62/0x86
                          [<c0431379>] __do_softirq+0xb8/0x180
                          [<ffffffff>] 0xffffffff
    IN-RECLAIM_FS-W at:
                             [<c044dabd>] __lock_acquire+0x574/0x73e
                             [<c044dd36>] lock_acquire+0xaf/0xcc
                             [<c063056b>] _spin_lock_irqsave+0x33/0x43
                             [<c043e47b>] prepare_to_wait+0x1c/0x4a
                             [<c0485d3e>] kswapd+0xa7/0x51b
                             [<c043e204>] kthread+0x4a/0x72
                             [<c04034e7>] kernel_thread_helper+0x7/0x10
                             [<ffffffff>] 0xffffffff
    INITIAL USE at:
                         [<c044dad5>] __lock_acquire+0x58c/0x73e
                         [<c044dd36>] lock_acquire+0xaf/0xcc
                         [<c06304ea>] _spin_lock_irq+0x30/0x3f
                         [<c062d811>] wait_for_common+0x2f/0xeb
                         [<c062d968>] wait_for_completion+0x17/0x19
                         [<c043e161>] kthread_create+0x6e/0xc7
                         [<c062b7eb>] migration_call+0x39/0x444
                         [<c07ae112>] migration_init+0x1d/0x4b
                         [<c040115c>] do_one_initcall+0x6a/0x16e
                         [<c079d44d>] kernel_init+0x4d/0x15a
                         [<c04034e7>] kernel_thread_helper+0x7/0x10
                         [<ffffffff>] 0xffffffff
  }
  ... key      at: [<c0823490>] __key.17681+0x0/0x8
  -> (&rq->lock){-.-.-.} ops: 854341 {
     IN-HARDIRQ-W at:
                            [<c044d9e4>] __lock_acquire+0x49b/0x73e
                            [<c044dd36>] lock_acquire+0xaf/0xcc
                            [<c0630340>] _spin_lock+0x2a/0x39
                            [<c0429f89>] scheduler_tick+0x39/0x19b
                            [<c04354a4>] update_process_times+0x47/0x53
                            [<c0447fe0>] tick_periodic+0x6b/0x77
                            [<c0448009>] tick_handle_periodic+0x1d/0x60
                            [<c0404ace>] timer_interrupt+0x3e/0x45
                            [<c045e922>] handle_IRQ_event+0xa4/0x169
                            [<c04603a3>] handle_level_irq+0x78/0xc1
                            [<ffffffff>] 0xffffffff
     IN-SOFTIRQ-W at:
                            [<c044da08>] __lock_acquire+0x4bf/0x73e
                            [<c044dd36>] lock_acquire+0xaf/0xcc
                            [<c0630340>] _spin_lock+0x2a/0x39
                            [<c041ede7>] task_rq_lock+0x3b/0x62
                            [<c0426e41>] try_to_wake_up+0x75/0x2d4
                            [<c04270d7>] wake_up_process+0x14/0x16
                            [<c043507c>] process_timeout+0xd/0xf
                            [<c0434caa>] run_timer_softirq+0x15b/0x1d1
                            [<c0431379>] __do_softirq+0xb8/0x180
                            [<ffffffff>] 0xffffffff
     IN-RECLAIM_FS-W at:
                               [<c044dabd>] __lock_acquire+0x574/0x73e
                               [<c044dd36>] lock_acquire+0xaf/0xcc
                               [<c0630340>] _spin_lock+0x2a/0x39
                               [<c041ede7>] task_rq_lock+0x3b/0x62
                               [<c0427515>] set_cpus_allowed_ptr+0x1a/0xdd
                               [<c0485cf8>] kswapd+0x61/0x51b
                               [<c043e204>] kthread+0x4a/0x72
                               [<c04034e7>] kernel_thread_helper+0x7/0x10
                               [<ffffffff>] 0xffffffff
     INITIAL USE at:
                           [<c044dad5>] __lock_acquire+0x58c/0x73e
                           [<c044dd36>] lock_acquire+0xaf/0xcc
                           [<c063056b>] _spin_lock_irqsave+0x33/0x43
                           [<c042398e>] rq_attach_root+0x17/0xa7
                           [<c07ae52c>] sched_init+0x240/0x33e
                           [<c079d661>] start_kernel+0x107/0x2fc
                           [<c079d06a>] __init_begin+0x6a/0x6f
                           [<ffffffff>] 0xffffffff
   }
   ... key      at: [<c0800518>] __key.46938+0x0/0x8
   -> (&vec->lock){-.-...} ops: 34058 {
      IN-HARDIRQ-W at:
                              [<c044d9e4>] __lock_acquire+0x49b/0x73e
                              [<c044dd36>] lock_acquire+0xaf/0xcc
                              [<c063056b>] _spin_lock_irqsave+0x33/0x43
                              [<c047ad3b>] cpupri_set+0x51/0xba
                              [<c04219ee>] __enqueue_rt_entity+0xe2/0x1c8
                              [<c0421e18>] enqueue_rt_entity+0x19/0x23
                              [<c0428a52>] enqueue_task_rt+0x24/0x51
                              [<c041e03b>] enqueue_task+0x64/0x70
                              [<c041e06b>] activate_task+0x24/0x2a
                              [<c0426f9e>] try_to_wake_up+0x1d2/0x2d4
                              [<c04270d7>] wake_up_process+0x14/0x16
                              [<c04408b6>] hrtimer_wakeup+0x1d/0x21
                              [<c0440922>] __run_hrtimer+0x68/0x98
                              [<c04411ca>] hrtimer_interrupt+0x101/0x153
                              [<c063406e>] smp_apic_timer_interrupt+0x6e/0x81
                              [<c04033c7>] apic_timer_interrupt+0x2f/0x34
                              [<c0401c4f>] cpu_idle+0x53/0x85
                              [<c061fc80>] rest_init+0x6c/0x6e
                              [<c079d851>] start_kernel+0x2f7/0x2fc
                              [<c079d06a>] __init_begin+0x6a/0x6f
                              [<ffffffff>] 0xffffffff
      IN-SOFTIRQ-W at:
                              [<c044da08>] __lock_acquire+0x4bf/0x73e
                              [<c044dd36>] lock_acquire+0xaf/0xcc
                              [<c063056b>] _spin_lock_irqsave+0x33/0x43
                              [<c047ad3b>] cpupri_set+0x51/0xba
                              [<c04219ee>] __enqueue_rt_entity+0xe2/0x1c8
                              [<c0421e18>] enqueue_rt_entity+0x19/0x23
                              [<c0428a52>] enqueue_task_rt+0x24/0x51
                              [<c041e03b>] enqueue_task+0x64/0x70
                              [<c041e06b>] activate_task+0x24/0x2a
                              [<c0426f9e>] try_to_wake_up+0x1d2/0x2d4
                              [<c04270d7>] wake_up_process+0x14/0x16
                              [<c042737c>] rebalance_domains+0x2a3/0x3ac
                              [<c0429a06>] run_rebalance_domains+0x32/0xaa
                              [<c0431379>] __do_softirq+0xb8/0x180
                              [<ffffffff>] 0xffffffff
      INITIAL USE at:
                             [<c044dad5>] __lock_acquire+0x58c/0x73e
                             [<c044dd36>] lock_acquire+0xaf/0xcc
                             [<c063056b>] _spin_lock_irqsave+0x33/0x43
                             [<c047ad74>] cpupri_set+0x8a/0xba
                             [<c04216f2>] rq_online_rt+0x5e/0x61
                             [<c041dd3a>] set_rq_online+0x40/0x4a
                             [<c04239fb>] rq_attach_root+0x84/0xa7
                             [<c07ae52c>] sched_init+0x240/0x33e
                             [<c079d661>] start_kernel+0x107/0x2fc
                             [<c079d06a>] __init_begin+0x6a/0x6f
                             [<ffffffff>] 0xffffffff
    }
    ... key      at: [<c0c525d0>] __key.14261+0x0/0x10
   ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c047ad74>] cpupri_set+0x8a/0xba
   [<c04216f2>] rq_online_rt+0x5e/0x61
   [<c041dd3a>] set_rq_online+0x40/0x4a
   [<c04239fb>] rq_attach_root+0x84/0xa7
   [<c07ae52c>] sched_init+0x240/0x33e
   [<c079d661>] start_kernel+0x107/0x2fc
   [<c079d06a>] __init_begin+0x6a/0x6f
   [<ffffffff>] 0xffffffff

   -> (&rt_b->rt_runtime_lock){-.-...} ops: 336 {
      IN-HARDIRQ-W at:
                              [<c044d9e4>] __lock_acquire+0x49b/0x73e
                              [<c044dd36>] lock_acquire+0xaf/0xcc
                              [<c0630340>] _spin_lock+0x2a/0x39
                              [<c0421a75>] __enqueue_rt_entity+0x169/0x1c8
                              [<c0421e18>] enqueue_rt_entity+0x19/0x23
                              [<c0428a52>] enqueue_task_rt+0x24/0x51
                              [<c041e03b>] enqueue_task+0x64/0x70
                              [<c041e06b>] activate_task+0x24/0x2a
                              [<c0426f9e>] try_to_wake_up+0x1d2/0x2d4
                              [<c04270d7>] wake_up_process+0x14/0x16
                              [<c04408b6>] hrtimer_wakeup+0x1d/0x21
                              [<c0440922>] __run_hrtimer+0x68/0x98
                              [<c04411ca>] hrtimer_interrupt+0x101/0x153
                              [<c063406e>] smp_apic_timer_interrupt+0x6e/0x81
                              [<c04033c7>] apic_timer_interrupt+0x2f/0x34
                              [<c0401c4f>] cpu_idle+0x53/0x85
                              [<c061fc80>] rest_init+0x6c/0x6e
                              [<c079d851>] start_kernel+0x2f7/0x2fc
                              [<c079d06a>] __init_begin+0x6a/0x6f
                              [<ffffffff>] 0xffffffff
      IN-SOFTIRQ-W at:
                              [<c044da08>] __lock_acquire+0x4bf/0x73e
                              [<c044dd36>] lock_acquire+0xaf/0xcc
                              [<c0630340>] _spin_lock+0x2a/0x39
                              [<c0421a75>] __enqueue_rt_entity+0x169/0x1c8
                              [<c0421e18>] enqueue_rt_entity+0x19/0x23
                              [<c0428a52>] enqueue_task_rt+0x24/0x51
                              [<c041e03b>] enqueue_task+0x64/0x70
                              [<c041e06b>] activate_task+0x24/0x2a
                              [<c0426f9e>] try_to_wake_up+0x1d2/0x2d4
                              [<c04270d7>] wake_up_process+0x14/0x16
                              [<c042737c>] rebalance_domains+0x2a3/0x3ac
                              [<c0429a06>] run_rebalance_domains+0x32/0xaa
                              [<c0431379>] __do_softirq+0xb8/0x180
                              [<ffffffff>] 0xffffffff
      INITIAL USE at:
                             [<c044dad5>] __lock_acquire+0x58c/0x73e
                             [<c044dd36>] lock_acquire+0xaf/0xcc
                             [<c0630340>] _spin_lock+0x2a/0x39
                             [<c0421a75>] __enqueue_rt_entity+0x169/0x1c8
                             [<c0421e18>] enqueue_rt_entity+0x19/0x23
                             [<c0428a52>] enqueue_task_rt+0x24/0x51
                             [<c041e03b>] enqueue_task+0x64/0x70
                             [<c041e06b>] activate_task+0x24/0x2a
                             [<c0426f9e>] try_to_wake_up+0x1d2/0x2d4
                             [<c04270d7>] wake_up_process+0x14/0x16
                             [<c062b86b>] migration_call+0xb9/0x444
                             [<c07ae130>] migration_init+0x3b/0x4b
                             [<c040115c>] do_one_initcall+0x6a/0x16e
                             [<c079d44d>] kernel_init+0x4d/0x15a
                             [<c04034e7>] kernel_thread_helper+0x7/0x10
                             [<ffffffff>] 0xffffffff
    }
    ... key      at: [<c0800504>] __key.37924+0x0/0x8
    -> (&cpu_base->lock){-.-...} ops: 950512 {
       IN-HARDIRQ-W at:
                                [<c044d9e4>] __lock_acquire+0x49b/0x73e
                                [<c044dd36>] lock_acquire+0xaf/0xcc
                                [<c0630340>] _spin_lock+0x2a/0x39
                                [<c0440a3a>] hrtimer_run_queues+0xe8/0x131
                                [<c0435151>] run_local_timers+0xd/0x1e
                                [<c0435486>] update_process_times+0x29/0x53
                                [<c0447fe0>] tick_periodic+0x6b/0x77
                                [<c0448009>] tick_handle_periodic+0x1d/0x60
                                [<c063406e>] smp_apic_timer_interrupt+0x6e/0x81
                                [<c04033c7>] apic_timer_interrupt+0x2f/0x34
                                [<c04082c7>] arch_dup_task_struct+0x19/0x81
                                [<c042ac1c>] copy_process+0xab/0x115f
                                [<c042be78>] do_fork+0x129/0x2c5
                                [<c0401698>] kernel_thread+0x7f/0x87
                                [<c043e0b3>] kthreadd+0xa3/0xe3
                                [<c04034e7>] kernel_thread_helper+0x7/0x10
                                [<ffffffff>] 0xffffffff
       IN-SOFTIRQ-W at:
                                [<c044da08>] __lock_acquire+0x4bf/0x73e
                                [<c044dd36>] lock_acquire+0xaf/0xcc
                                [<c063056b>] _spin_lock_irqsave+0x33/0x43
                                [<c0440b98>] lock_hrtimer_base+0x1d/0x38
                                [<c0440ca9>] __hrtimer_start_range_ns+0x1f/0x232
                                [<c0440ee7>] hrtimer_start_range_ns+0x15/0x17
                                [<c0448ef1>] tick_setup_sched_timer+0xf6/0x124
                                [<c0441558>] hrtimer_run_pending+0xb0/0xe8
                                [<c0434b76>] run_timer_softirq+0x27/0x1d1
                                [<c0431379>] __do_softirq+0xb8/0x180
                                [<ffffffff>] 0xffffffff
       INITIAL USE at:
                               [<c044dad5>] __lock_acquire+0x58c/0x73e
                               [<c044dd36>] lock_acquire+0xaf/0xcc
                               [<c063056b>] _spin_lock_irqsave+0x33/0x43
                               [<c0440b98>] lock_hrtimer_base+0x1d/0x38
                               [<c0440ca9>] __hrtimer_start_range_ns+0x1f/0x232
                               [<c0421ab1>] __enqueue_rt_entity+0x1a5/0x1c8
                               [<c0421e18>] enqueue_rt_entity+0x19/0x23
                               [<c0428a52>] enqueue_task_rt+0x24/0x51
                               [<c041e03b>] enqueue_task+0x64/0x70
                               [<c041e06b>] activate_task+0x24/0x2a
                               [<c0426f9e>] try_to_wake_up+0x1d2/0x2d4
                               [<c04270d7>] wake_up_process+0x14/0x16
                               [<c062b86b>] migration_call+0xb9/0x444
                               [<c07ae130>] migration_init+0x3b/0x4b
                               [<c040115c>] do_one_initcall+0x6a/0x16e
                               [<c079d44d>] kernel_init+0x4d/0x15a
                               [<c04034e7>] kernel_thread_helper+0x7/0x10
                               [<ffffffff>] 0xffffffff
     }
     ... key      at: [<c08234b8>] __key.20063+0x0/0x8
    ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c0440b98>] lock_hrtimer_base+0x1d/0x38
   [<c0440ca9>] __hrtimer_start_range_ns+0x1f/0x232
   [<c0421ab1>] __enqueue_rt_entity+0x1a5/0x1c8
   [<c0421e18>] enqueue_rt_entity+0x19/0x23
   [<c0428a52>] enqueue_task_rt+0x24/0x51
   [<c041e03b>] enqueue_task+0x64/0x70
   [<c041e06b>] activate_task+0x24/0x2a
   [<c0426f9e>] try_to_wake_up+0x1d2/0x2d4
   [<c04270d7>] wake_up_process+0x14/0x16
   [<c062b86b>] migration_call+0xb9/0x444
   [<c07ae130>] migration_init+0x3b/0x4b
   [<c040115c>] do_one_initcall+0x6a/0x16e
   [<c079d44d>] kernel_init+0x4d/0x15a
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

    -> (&rt_rq->rt_runtime_lock){-.....} ops: 17587 {
       IN-HARDIRQ-W at:
                                [<c044d9e4>] __lock_acquire+0x49b/0x73e
                                [<c044dd36>] lock_acquire+0xaf/0xcc
                                [<c0630340>] _spin_lock+0x2a/0x39
                                [<c0421efc>] sched_rt_period_timer+0xda/0x24e
                                [<c0440922>] __run_hrtimer+0x68/0x98
                                [<c04411ca>] hrtimer_interrupt+0x101/0x153
                                [<c063406e>] smp_apic_timer_interrupt+0x6e/0x81
                                [<c04033c7>] apic_timer_interrupt+0x2f/0x34
                                [<c0452203>] each_symbol_in_section+0x27/0x57
                                [<c045225a>] each_symbol+0x27/0x113
                                [<c0452373>] find_symbol+0x2d/0x51
                                [<c0454a7a>] load_module+0xaec/0x10eb
                                [<c04550bf>] sys_init_module+0x46/0x19b
                                [<c0402a68>] sysenter_do_call+0x12/0x36
                                [<ffffffff>] 0xffffffff
       INITIAL USE at:
                               [<c044dad5>] __lock_acquire+0x58c/0x73e
                               [<c044dd36>] lock_acquire+0xaf/0xcc
                               [<c0630340>] _spin_lock+0x2a/0x39
                               [<c0421c41>] update_curr_rt+0x13a/0x20d
                               [<c0421dd8>] dequeue_task_rt+0x13/0x3a
                               [<c041df9e>] dequeue_task+0xff/0x10e
                               [<c041dfd1>] deactivate_task+0x24/0x2a
                               [<c062db54>] __schedule+0x162/0x991
                               [<c062e39a>] schedule+0x17/0x30
                               [<c0426c54>] migration_thread+0x175/0x203
                               [<c043e204>] kthread+0x4a/0x72
                               [<c04034e7>] kernel_thread_helper+0x7/0x10
                               [<ffffffff>] 0xffffffff
     }
     ... key      at: [<c080050c>] __key.46863+0x0/0x8
    ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630340>] _spin_lock+0x2a/0x39
   [<c041ee73>] __enable_runtime+0x43/0xb3
   [<c04216d8>] rq_online_rt+0x44/0x61
   [<c041dd3a>] set_rq_online+0x40/0x4a
   [<c062b8a5>] migration_call+0xf3/0x444
   [<c063291c>] notifier_call_chain+0x2b/0x4a
   [<c0441e22>] __raw_notifier_call_chain+0x13/0x15
   [<c0441e35>] raw_notifier_call_chain+0x11/0x13
   [<c062bd2f>] _cpu_up+0xc3/0xf6
   [<c062bdac>] cpu_up+0x4a/0x5a
   [<c079d49a>] kernel_init+0x9a/0x15a
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

   ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630340>] _spin_lock+0x2a/0x39
   [<c0421a75>] __enqueue_rt_entity+0x169/0x1c8
   [<c0421e18>] enqueue_rt_entity+0x19/0x23
   [<c0428a52>] enqueue_task_rt+0x24/0x51
   [<c041e03b>] enqueue_task+0x64/0x70
   [<c041e06b>] activate_task+0x24/0x2a
   [<c0426f9e>] try_to_wake_up+0x1d2/0x2d4
   [<c04270d7>] wake_up_process+0x14/0x16
   [<c062b86b>] migration_call+0xb9/0x444
   [<c07ae130>] migration_init+0x3b/0x4b
   [<c040115c>] do_one_initcall+0x6a/0x16e
   [<c079d44d>] kernel_init+0x4d/0x15a
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

   ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630340>] _spin_lock+0x2a/0x39
   [<c0421c41>] update_curr_rt+0x13a/0x20d
   [<c0421dd8>] dequeue_task_rt+0x13/0x3a
   [<c041df9e>] dequeue_task+0xff/0x10e
   [<c041dfd1>] deactivate_task+0x24/0x2a
   [<c062db54>] __schedule+0x162/0x991
   [<c062e39a>] schedule+0x17/0x30
   [<c0426c54>] migration_thread+0x175/0x203
   [<c043e204>] kthread+0x4a/0x72
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

   -> (&sig->cputimer.lock){......} ops: 1949 {
      INITIAL USE at:
                             [<c044dad5>] __lock_acquire+0x58c/0x73e
                             [<c044dd36>] lock_acquire+0xaf/0xcc
                             [<c063056b>] _spin_lock_irqsave+0x33/0x43
                             [<c043f03e>] thread_group_cputimer+0x29/0x90
                             [<c044004c>] posix_cpu_timers_exit_group+0x16/0x39
                             [<c042e5f0>] release_task+0xa2/0x376
                             [<c042fbe1>] do_exit+0x548/0x5b3
                             [<c043a9d8>] __request_module+0x0/0x100
                             [<c04034e7>] kernel_thread_helper+0x7/0x10
                             [<ffffffff>] 0xffffffff
    }
    ... key      at: [<c08014ac>] __key.15480+0x0/0x8
   ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630340>] _spin_lock+0x2a/0x39
   [<c041f43a>] update_curr+0xef/0x107
   [<c042131b>] enqueue_entity+0x1a/0x1c6
   [<c0421535>] enqueue_task_fair+0x24/0x3e
   [<c041e03b>] enqueue_task+0x64/0x70
   [<c041e06b>] activate_task+0x24/0x2a
   [<c0426f9e>] try_to_wake_up+0x1d2/0x2d4
   [<c04270b0>] default_wake_function+0x10/0x12
   [<c041d785>] __wake_up_common+0x34/0x5f
   [<c041ec26>] complete+0x30/0x43
   [<c043e1e8>] kthread+0x2e/0x72
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

   -> (&rq->lock/1){..-...} ops: 3217 {
      IN-SOFTIRQ-W at:
                              [<c044da08>] __lock_acquire+0x4bf/0x73e
                              [<c044dd36>] lock_acquire+0xaf/0xcc
                              [<c0630305>] _spin_lock_nested+0x2d/0x3e
                              [<c0422cb4>] double_rq_lock+0x4b/0x7d
                              [<c0427274>] rebalance_domains+0x19b/0x3ac
                              [<c0429a06>] run_rebalance_domains+0x32/0xaa
                              [<c0431379>] __do_softirq+0xb8/0x180
                              [<ffffffff>] 0xffffffff
      INITIAL USE at:
                             [<c044dad5>] __lock_acquire+0x58c/0x73e
                             [<c044dd36>] lock_acquire+0xaf/0xcc
                             [<c0630305>] _spin_lock_nested+0x2d/0x3e
                             [<c0422cb4>] double_rq_lock+0x4b/0x7d
                             [<c0427274>] rebalance_domains+0x19b/0x3ac
                             [<c0429a06>] run_rebalance_domains+0x32/0xaa
                             [<c0431379>] __do_softirq+0xb8/0x180
                             [<ffffffff>] 0xffffffff
    }
    ... key      at: [<c0800519>] __key.46938+0x1/0x8
    ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630340>] _spin_lock+0x2a/0x39
   [<c0421c41>] update_curr_rt+0x13a/0x20d
   [<c0421dd8>] dequeue_task_rt+0x13/0x3a
   [<c041df9e>] dequeue_task+0xff/0x10e
   [<c041dfd1>] deactivate_task+0x24/0x2a
   [<c0427b1b>] push_rt_task+0x189/0x1f7
   [<c0427b9b>] push_rt_tasks+0x12/0x19
   [<c0427bb9>] post_schedule_rt+0x17/0x21
   [<c0425a68>] finish_task_switch+0x83/0xc0
   [<c062e339>] __schedule+0x947/0x991
   [<c062e39a>] schedule+0x17/0x30
   [<c0426c54>] migration_thread+0x175/0x203
   [<c043e204>] kthread+0x4a/0x72
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

    ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c047ad3b>] cpupri_set+0x51/0xba
   [<c04219ee>] __enqueue_rt_entity+0xe2/0x1c8
   [<c0421e18>] enqueue_rt_entity+0x19/0x23
   [<c0428a52>] enqueue_task_rt+0x24/0x51
   [<c041e03b>] enqueue_task+0x64/0x70
   [<c041e06b>] activate_task+0x24/0x2a
   [<c0427b33>] push_rt_task+0x1a1/0x1f7
   [<c0427b9b>] push_rt_tasks+0x12/0x19
   [<c0427bb9>] post_schedule_rt+0x17/0x21
   [<c0425a68>] finish_task_switch+0x83/0xc0
   [<c062e339>] __schedule+0x947/0x991
   [<c062e39a>] schedule+0x17/0x30
   [<c0426c54>] migration_thread+0x175/0x203
   [<c043e204>] kthread+0x4a/0x72
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

   ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630305>] _spin_lock_nested+0x2d/0x3e
   [<c0422cb4>] double_rq_lock+0x4b/0x7d
   [<c0427274>] rebalance_domains+0x19b/0x3ac
   [<c0429a06>] run_rebalance_domains+0x32/0xaa
   [<c0431379>] __do_softirq+0xb8/0x180
   [<ffffffff>] 0xffffffff

  ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630340>] _spin_lock+0x2a/0x39
   [<c041ede7>] task_rq_lock+0x3b/0x62
   [<c0426e41>] try_to_wake_up+0x75/0x2d4
   [<c04270b0>] default_wake_function+0x10/0x12
   [<c041d785>] __wake_up_common+0x34/0x5f
   [<c041ec26>] complete+0x30/0x43
   [<c043e0cc>] kthreadd+0xbc/0xe3
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

  -> (&ep->lock){......} ops: 110 {
     INITIAL USE at:
                           [<c044dad5>] __lock_acquire+0x58c/0x73e
                           [<c044dd36>] lock_acquire+0xaf/0xcc
                           [<c063056b>] _spin_lock_irqsave+0x33/0x43
                           [<c04ca381>] sys_epoll_ctl+0x232/0x3f6
                           [<c0402a68>] sysenter_do_call+0x12/0x36
                           [<ffffffff>] 0xffffffff
   }
   ... key      at: [<c0c5be90>] __key.22301+0x0/0x10
   ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630340>] _spin_lock+0x2a/0x39
   [<c041ede7>] task_rq_lock+0x3b/0x62
   [<c0426e41>] try_to_wake_up+0x75/0x2d4
   [<c04270b0>] default_wake_function+0x10/0x12
   [<c041d785>] __wake_up_common+0x34/0x5f
   [<c041d7c6>] __wake_up_locked+0x16/0x1a
   [<c04ca7f5>] ep_poll_callback+0x7c/0xb6
   [<c041d785>] __wake_up_common+0x34/0x5f
   [<c041ec70>] __wake_up_sync_key+0x37/0x4a
   [<c05cbefa>] sock_def_readable+0x42/0x71
   [<c061c8b1>] unix_stream_connect+0x2f3/0x368
   [<c05c830a>] sys_connect+0x59/0x76
   [<c05c963f>] sys_socketcall+0x76/0x172
   [<c0402a68>] sysenter_do_call+0x12/0x36
   [<ffffffff>] 0xffffffff

  ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c04ca797>] ep_poll_callback+0x1e/0xb6
   [<c041d785>] __wake_up_common+0x34/0x5f
   [<c041ec70>] __wake_up_sync_key+0x37/0x4a
   [<c05cbefa>] sock_def_readable+0x42/0x71
   [<c061c8b1>] unix_stream_connect+0x2f3/0x368
   [<c05c830a>] sys_connect+0x59/0x76
   [<c05c963f>] sys_socketcall+0x76/0x172
   [<c0402a68>] sysenter_do_call+0x12/0x36
   [<ffffffff>] 0xffffffff

 ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c041ec0d>] complete+0x17/0x43
   [<c0509cf2>] blk_end_sync_rq+0x2a/0x2d
   [<c0506935>] end_that_request_last+0x17b/0x1a1
   [<c0506a0d>] blk_end_io+0x51/0x6f
   [<c0506a64>] blk_end_request+0x11/0x13
   [<f8106c9c>] scsi_io_completion+0x1d9/0x41f [scsi_mod]
   [<f810152d>] scsi_finish_command+0xcc/0xd4 [scsi_mod]
   [<f8106fdb>] scsi_softirq_done+0xf9/0x101 [scsi_mod]
   [<c050a936>] blk_done_softirq+0x5e/0x70
   [<c0431379>] __do_softirq+0xb8/0x180
   [<ffffffff>] 0xffffffff

 -> (&n->list_lock){..-...} ops: 49241 {
    IN-SOFTIRQ-W at:
                          [<c044da08>] __lock_acquire+0x4bf/0x73e
                          [<c044dd36>] lock_acquire+0xaf/0xcc
                          [<c0630340>] _spin_lock+0x2a/0x39
                          [<c049bd18>] add_partial+0x16/0x40
                          [<c049d0d4>] __slab_free+0x96/0x28f
                          [<c049df5c>] kmem_cache_free+0x8c/0xf2
                          [<c04a5ce9>] file_free_rcu+0x35/0x38
                          [<c0461a12>] rcu_process_callbacks+0x62/0x86
                          [<c0431379>] __do_softirq+0xb8/0x180
                          [<ffffffff>] 0xffffffff
    INITIAL USE at:
                         [<c044dad5>] __lock_acquire+0x58c/0x73e
                         [<c044dd36>] lock_acquire+0xaf/0xcc
                         [<c0630340>] _spin_lock+0x2a/0x39
                         [<c049bd18>] add_partial+0x16/0x40
                         [<c049d0d4>] __slab_free+0x96/0x28f
                         [<c049df5c>] kmem_cache_free+0x8c/0xf2
                         [<c0514eda>] ida_get_new_above+0x13b/0x155
                         [<c0514f00>] ida_get_new+0xc/0xe
                         [<c04a628b>] set_anon_super+0x39/0xa3
                         [<c04a68c6>] sget+0x2f3/0x386
                         [<c04a7365>] get_sb_single+0x24/0x8f
                         [<c04e034c>] sysfs_get_sb+0x18/0x1a
                         [<c04a6dd1>] vfs_kern_mount+0x40/0x7b
                         [<c04a6e21>] kern_mount_data+0x15/0x17
                         [<c07b5ff6>] sysfs_init+0x50/0x9c
                         [<c07b4ac9>] mnt_init+0x8c/0x1e4
                         [<c07b4737>] vfs_caches_init+0xd8/0xea
                         [<c079d815>] start_kernel+0x2bb/0x2fc
                         [<c079d06a>] __init_begin+0x6a/0x6f
                         [<ffffffff>] 0xffffffff
  }
  ... key      at: [<c0c5a424>] __key.25358+0x0/0x8
 ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630340>] _spin_lock+0x2a/0x39
   [<c049cc45>] __slab_alloc+0xf6/0x4ef
   [<c049d333>] kmem_cache_alloc+0x66/0x11f
   [<f810189b>] scsi_pool_alloc_command+0x20/0x4c [scsi_mod]
   [<f81018de>] scsi_host_alloc_command+0x17/0x4f [scsi_mod]
   [<f810192b>] __scsi_get_command+0x15/0x71 [scsi_mod]
   [<f8101c41>] scsi_get_command+0x39/0x95 [scsi_mod]
   [<f81062b6>] scsi_get_cmd_from_req+0x26/0x50 [scsi_mod]
   [<f8106594>] scsi_setup_blk_pc_cmnd+0x2b/0xd7 [scsi_mod]
   [<f8106664>] scsi_prep_fn+0x24/0x33 [scsi_mod]
   [<c0504712>] elv_next_request+0xe6/0x18d
   [<f810704c>] scsi_request_fn+0x69/0x431 [scsi_mod]
   [<c05072af>] __generic_unplug_device+0x2e/0x31
   [<c0509d59>] blk_execute_rq_nowait+0x64/0x86
   [<c0509e2e>] blk_execute_rq+0xb3/0xd5
   [<f81068f5>] scsi_execute+0xc5/0x11c [scsi_mod]
   [<f81069ff>] scsi_execute_req+0xb3/0x104 [scsi_mod]
   [<f812b40d>] sd_revalidate_disk+0x1a3/0xf64 [sd_mod]
   [<f812d52f>] sd_probe_async+0x146/0x22d [sd_mod]
   [<c044341f>] async_thread+0xe9/0x1c9
   [<c043e204>] kthread+0x4a/0x72
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

 -> (&cwq->lock){-.-...} ops: 30335 {
    IN-HARDIRQ-W at:
                          [<c044d9e4>] __lock_acquire+0x49b/0x73e
                          [<c044dd36>] lock_acquire+0xaf/0xcc
                          [<c063056b>] _spin_lock_irqsave+0x33/0x43
                          [<c043b54b>] __queue_work+0x14/0x30
                          [<c043b5ce>] queue_work_on+0x3a/0x46
                          [<c043b617>] queue_work+0x26/0x4a
                          [<c043b64f>] schedule_work+0x14/0x16
                          [<c057a367>] schedule_console_callback+0x12/0x14
                          [<c05788ed>] kbd_event+0x595/0x600
                          [<c05b3d15>] input_pass_event+0x56/0x7e
                          [<c05b4702>] input_handle_event+0x314/0x334
                          [<c05b4f1e>] input_event+0x50/0x63
                          [<c05b9bd4>] atkbd_interrupt+0x209/0x4e9
                          [<c05b1793>] serio_interrupt+0x38/0x6e
                          [<c05b24e8>] i8042_interrupt+0x1db/0x1ec
                          [<c045e922>] handle_IRQ_event+0xa4/0x169
                          [<c04602ea>] handle_edge_irq+0xc9/0x10a
                          [<ffffffff>] 0xffffffff
    IN-SOFTIRQ-W at:
                          [<c044da08>] __lock_acquire+0x4bf/0x73e
                          [<c044dd36>] lock_acquire+0xaf/0xcc
                          [<c063056b>] _spin_lock_irqsave+0x33/0x43
                          [<c043b54b>] __queue_work+0x14/0x30
                          [<c043b590>] delayed_work_timer_fn+0x29/0x2d
                          [<c0434caa>] run_timer_softirq+0x15b/0x1d1
                          [<c0431379>] __do_softirq+0xb8/0x180
                          [<ffffffff>] 0xffffffff
    INITIAL USE at:
                         [<c044dad5>] __lock_acquire+0x58c/0x73e
                         [<c044dd36>] lock_acquire+0xaf/0xcc
                         [<c063056b>] _spin_lock_irqsave+0x33/0x43
                         [<c043b54b>] __queue_work+0x14/0x30
                         [<c043b5ce>] queue_work_on+0x3a/0x46
                         [<c043b617>] queue_work+0x26/0x4a
                         [<c043a7b3>] call_usermodehelper_exec+0x83/0xd0
                         [<c051631a>] kobject_uevent_env+0x351/0x385
                         [<c0516358>] kobject_uevent+0xa/0xc
                         [<c0515a0e>] kset_register+0x2e/0x34
                         [<c0590f18>] bus_register+0xed/0x23d
                         [<c07bea09>] platform_bus_init+0x23/0x38
                         [<c07beb77>] driver_init+0x1c/0x28
                         [<c079d4f6>] kernel_init+0xf6/0x15a
                         [<c04034e7>] kernel_thread_helper+0x7/0x10
                         [<ffffffff>] 0xffffffff
  }
  ... key      at: [<c08230a8>] __key.23814+0x0/0x8
  -> (&workqueue_cpu_stat(cpu)->lock){-.-...} ops: 20397 {
     IN-HARDIRQ-W at:
                            [<c044d9e4>] __lock_acquire+0x49b/0x73e
                            [<c044dd36>] lock_acquire+0xaf/0xcc
                            [<c063056b>] _spin_lock_irqsave+0x33/0x43
                            [<c0474909>] probe_workqueue_insertion+0x33/0x81
                            [<c043acf3>] insert_work+0x3f/0x9b
                            [<c043b559>] __queue_work+0x22/0x30
                            [<c043b5ce>] queue_work_on+0x3a/0x46
                            [<c043b617>] queue_work+0x26/0x4a
                            [<c043b64f>] schedule_work+0x14/0x16
                            [<c057a367>] schedule_console_callback+0x12/0x14
                            [<c05788ed>] kbd_event+0x595/0x600
                            [<c05b3d15>] input_pass_event+0x56/0x7e
                            [<c05b4702>] input_handle_event+0x314/0x334
                            [<c05b4f1e>] input_event+0x50/0x63
                            [<c05b9bd4>] atkbd_interrupt+0x209/0x4e9
                            [<c05b1793>] serio_interrupt+0x38/0x6e
                            [<c05b24e8>] i8042_interrupt+0x1db/0x1ec
                            [<c045e922>] handle_IRQ_event+0xa4/0x169
                            [<c04602ea>] handle_edge_irq+0xc9/0x10a
                            [<ffffffff>] 0xffffffff
     IN-SOFTIRQ-W at:
                            [<c044da08>] __lock_acquire+0x4bf/0x73e
                            [<c044dd36>] lock_acquire+0xaf/0xcc
                            [<c063056b>] _spin_lock_irqsave+0x33/0x43
                            [<c0474909>] probe_workqueue_insertion+0x33/0x81
                            [<c043acf3>] insert_work+0x3f/0x9b
                            [<c043b559>] __queue_work+0x22/0x30
                            [<c043b590>] delayed_work_timer_fn+0x29/0x2d
                            [<c0434caa>] run_timer_softirq+0x15b/0x1d1
                            [<c0431379>] __do_softirq+0xb8/0x180
                            [<ffffffff>] 0xffffffff
     INITIAL USE at:
                           [<c044dad5>] __lock_acquire+0x58c/0x73e
                           [<c044dd36>] lock_acquire+0xaf/0xcc
                           [<c063056b>] _spin_lock_irqsave+0x33/0x43
                           [<c04747eb>] probe_workqueue_creation+0xc9/0x10a
                           [<c043abcb>] create_workqueue_thread+0x87/0xb0
                           [<c043b12f>] __create_workqueue_key+0x16d/0x1b2
                           [<c07aeedb>] init_workqueues+0x61/0x73
                           [<c079d4e7>] kernel_init+0xe7/0x15a
                           [<c04034e7>] kernel_thread_helper+0x7/0x10
                           [<ffffffff>] 0xffffffff
   }
   ... key      at: [<c0c52574>] __key.23424+0x0/0x8
  ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c0474909>] probe_workqueue_insertion+0x33/0x81
   [<c043acf3>] insert_work+0x3f/0x9b
   [<c043b559>] __queue_work+0x22/0x30
   [<c043b5ce>] queue_work_on+0x3a/0x46
   [<c043b617>] queue_work+0x26/0x4a
   [<c043a7b3>] call_usermodehelper_exec+0x83/0xd0
   [<c051631a>] kobject_uevent_env+0x351/0x385
   [<c0516358>] kobject_uevent+0xa/0xc
   [<c0515a0e>] kset_register+0x2e/0x34
   [<c0590f18>] bus_register+0xed/0x23d
   [<c07bea09>] platform_bus_init+0x23/0x38
   [<c07beb77>] driver_init+0x1c/0x28
   [<c079d4f6>] kernel_init+0xf6/0x15a
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

  ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c041ecaf>] __wake_up+0x1a/0x40
   [<c043ad46>] insert_work+0x92/0x9b
   [<c043b559>] __queue_work+0x22/0x30
   [<c043b5ce>] queue_work_on+0x3a/0x46
   [<c043b617>] queue_work+0x26/0x4a
   [<c043a7b3>] call_usermodehelper_exec+0x83/0xd0
   [<c051631a>] kobject_uevent_env+0x351/0x385
   [<c0516358>] kobject_uevent+0xa/0xc
   [<c0515a0e>] kset_register+0x2e/0x34
   [<c0590f18>] bus_register+0xed/0x23d
   [<c07bea09>] platform_bus_init+0x23/0x38
   [<c07beb77>] driver_init+0x1c/0x28
   [<c079d4f6>] kernel_init+0xf6/0x15a
   [<c04034e7>] kernel_thread_helper+0x7/0x10
   [<ffffffff>] 0xffffffff

 ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c043b54b>] __queue_work+0x14/0x30
   [<c043b5ce>] queue_work_on+0x3a/0x46
   [<c043b617>] queue_work+0x26/0x4a
   [<c0505679>] kblockd_schedule_work+0x12/0x14
   [<c05113bb>] elv_schedule_dispatch+0x41/0x48
   [<c0513377>] elv_ioq_completed_request+0x2dc/0x2fe
   [<c05045aa>] elv_completed_request+0x48/0x97
   [<c0506738>] __blk_put_request+0x36/0xb8
   [<c0506953>] end_that_request_last+0x199/0x1a1
   [<c0506a0d>] blk_end_io+0x51/0x6f
   [<c0506a64>] blk_end_request+0x11/0x13
   [<f8106c9c>] scsi_io_completion+0x1d9/0x41f [scsi_mod]
   [<f810152d>] scsi_finish_command+0xcc/0xd4 [scsi_mod]
   [<f8106fdb>] scsi_softirq_done+0xf9/0x101 [scsi_mod]
   [<c050a936>] blk_done_softirq+0x5e/0x70
   [<c0431379>] __do_softirq+0xb8/0x180
   [<ffffffff>] 0xffffffff

 -> (&zone->lock){..-...} ops: 80266 {
    IN-SOFTIRQ-W at:
                          [<c044da08>] __lock_acquire+0x4bf/0x73e
                          [<c044dd36>] lock_acquire+0xaf/0xcc
                          [<c0630340>] _spin_lock+0x2a/0x39
                          [<c047fc71>] __free_pages_ok+0x167/0x321
                          [<c04800ce>] __free_pages+0x29/0x2b
                          [<c049c7c1>] __free_slab+0xb2/0xba
                          [<c049c800>] discard_slab+0x37/0x39
                          [<c049d15c>] __slab_free+0x11e/0x28f
                          [<c049df5c>] kmem_cache_free+0x8c/0xf2
                          [<c042ab6e>] free_task+0x31/0x34
                          [<c042c37b>] __put_task_struct+0xd3/0xd8
                          [<c042e072>] delayed_put_task_struct+0x60/0x64
                          [<c0461a12>] rcu_process_callbacks+0x62/0x86
                          [<c0431379>] __do_softirq+0xb8/0x180
                          [<ffffffff>] 0xffffffff
    INITIAL USE at:
                         [<c044dad5>] __lock_acquire+0x58c/0x73e
                         [<c044dd36>] lock_acquire+0xaf/0xcc
                         [<c0630340>] _spin_lock+0x2a/0x39
                         [<c047f7b6>] free_pages_bulk+0x21/0x1a1
                         [<c047ffcf>] free_hot_cold_page+0x181/0x20f
                         [<c04800a3>] free_hot_page+0xf/0x11
                         [<c04800c5>] __free_pages+0x20/0x2b
                         [<c07c4d96>] __free_pages_bootmem+0x6d/0x71
                         [<c07b2244>] free_all_bootmem_core+0xd2/0x177
                         [<c07b22f6>] free_all_bootmem+0xd/0xf
                         [<c07ad21a>] mem_init+0x28/0x28c
                         [<c079d7b1>] start_kernel+0x257/0x2fc
                         [<c079d06a>] __init_begin+0x6a/0x6f
                         [<ffffffff>] 0xffffffff
  }
  ... key      at: [<c0c52628>] __key.30749+0x0/0x8
 ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c048035e>] get_page_from_freelist+0x236/0x3e3
   [<c04805f4>] __alloc_pages_internal+0xce/0x371
   [<c049cce6>] __slab_alloc+0x197/0x4ef
   [<c049d333>] kmem_cache_alloc+0x66/0x11f
   [<c047d96b>] mempool_alloc_slab+0x13/0x15
   [<c047da5c>] mempool_alloc+0x3a/0xd5
   [<f81063cc>] scsi_sg_alloc+0x47/0x4a [scsi_mod]
   [<c051cd02>] __sg_alloc_table+0x48/0xc7
   [<f8106325>] scsi_init_sgtable+0x2c/0x8c [scsi_mod]
   [<f81064e7>] scsi_init_io+0x19/0x9b [scsi_mod]
   [<f8106abf>] scsi_setup_fs_cmnd+0x6f/0x73 [scsi_mod]
   [<f812ca73>] sd_prep_fn+0x6a/0x7d4 [sd_mod]
   [<c0504712>] elv_next_request+0xe6/0x18d
   [<f810704c>] scsi_request_fn+0x69/0x431 [scsi_mod]
   [<c05072af>] __generic_unplug_device+0x2e/0x31
   [<c05072db>] blk_start_queueing+0x29/0x2b
   [<c05137b8>] elv_ioq_request_add+0x2be/0x393
   [<c05048cd>] elv_insert+0x114/0x1a2
   [<c05049ec>] __elv_add_request+0x91/0x96
   [<c0507a00>] __make_request+0x365/0x397
   [<c050635a>] generic_make_request+0x342/0x3ce
   [<c0507b21>] submit_bio+0xef/0xfa
   [<c04c6c4e>] mpage_bio_submit+0x21/0x26
   [<c04c7b7f>] mpage_readpages+0xa3/0xad
   [<f80c1ea8>] ext3_readpages+0x19/0x1b [ext3]
   [<c048275e>] __do_page_cache_readahead+0xfd/0x166
   [<c0482b42>] do_page_cache_readahead+0x44/0x52
   [<c047d665>] filemap_fault+0x197/0x3ae
   [<c048b9ea>] __do_fault+0x40/0x37b
   [<c048d43f>] handle_mm_fault+0x2bb/0x646
   [<c063273c>] do_page_fault+0x29c/0x2fd
   [<c0630b4a>] error_code+0x72/0x78
   [<ffffffff>] 0xffffffff

 -> (&page_address_htable[i].lock){......} ops: 6802 {
    INITIAL USE at:
                         [<c044dad5>] __lock_acquire+0x58c/0x73e
                         [<c044dd36>] lock_acquire+0xaf/0xcc
                         [<c063056b>] _spin_lock_irqsave+0x33/0x43
                         [<c048af69>] page_address+0x50/0xa6
                         [<c048b0e7>] kmap_high+0x21/0x175
                         [<c041b7ef>] kmap+0x4e/0x5b
                         [<c04abb36>] page_getlink+0x37/0x59
                         [<c04abb75>] page_follow_link_light+0x1d/0x2b
                         [<c04ad4d0>] __link_path_walk+0x3d1/0xa71
                         [<c04adbae>] path_walk+0x3e/0x77
                         [<c04add0e>] do_path_lookup+0xeb/0x105
                         [<c04ae6f2>] path_lookup_open+0x48/0x7a
                         [<c04a8e96>] open_exec+0x25/0xf4
                         [<c04a9c2d>] do_execve+0xfa/0x2cc
                         [<c04015c0>] sys_execve+0x2b/0x54
                         [<c0402ae9>] syscall_call+0x7/0xb
                         [<ffffffff>] 0xffffffff
  }
  ... key      at: [<c0c5288c>] __key.28547+0x0/0x14
 ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c048af69>] page_address+0x50/0xa6
   [<c05078a1>] __make_request+0x206/0x397
   [<c050635a>] generic_make_request+0x342/0x3ce
   [<c0507b21>] submit_bio+0xef/0xfa
   [<c04c6c4e>] mpage_bio_submit+0x21/0x26
   [<c04c78b8>] do_mpage_readpage+0x471/0x5e5
   [<c04c7b55>] mpage_readpages+0x79/0xad
   [<f80c1ea8>] ext3_readpages+0x19/0x1b [ext3]
   [<c048275e>] __do_page_cache_readahead+0xfd/0x166
   [<c0482b42>] do_page_cache_readahead+0x44/0x52
   [<c047d665>] filemap_fault+0x197/0x3ae
   [<c048b9ea>] __do_fault+0x40/0x37b
   [<c048d43f>] handle_mm_fault+0x2bb/0x646
   [<c063273c>] do_page_fault+0x29c/0x2fd
   [<c0630b4a>] error_code+0x72/0x78
   [<ffffffff>] 0xffffffff

 ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c0630340>] _spin_lock+0x2a/0x39
   [<c046143d>] call_rcu+0x36/0x5b
   [<c050f0c8>] cfq_cic_free+0x15/0x17
   [<c050f128>] cic_free_func+0x5e/0x64
   [<c050ea90>] __call_for_each_cic+0x23/0x2e
   [<c050eaad>] cfq_free_io_context+0x12/0x14
   [<c050978c>] put_io_context+0x4b/0x66
   [<c050f00a>] cfq_active_ioq_reset+0x21/0x39
   [<c0511044>] elv_reset_active_ioq+0x2b/0x3e
   [<c0512ecf>] __elv_ioq_slice_expired+0x238/0x26a
   [<c0512f1f>] elv_ioq_slice_expired+0x1e/0x20
   [<c0513860>] elv_ioq_request_add+0x366/0x393
   [<c05048cd>] elv_insert+0x114/0x1a2
   [<c05049ec>] __elv_add_request+0x91/0x96
   [<c0507a00>] __make_request+0x365/0x397
   [<c050635a>] generic_make_request+0x342/0x3ce
   [<c0507b21>] submit_bio+0xef/0xfa
   [<c04bf495>] submit_bh+0xe3/0x102
   [<c04c04b0>] ll_rw_block+0xbe/0xf7
   [<f80c35ba>] ext3_bread+0x39/0x79 [ext3]
   [<f80c5643>] dx_probe+0x2f/0x298 [ext3]
   [<f80c5956>] ext3_find_entry+0xaa/0x573 [ext3]
   [<f80c739e>] ext3_lookup+0x31/0xbe [ext3]
   [<c04abf7c>] do_lookup+0xbc/0x159
   [<c04ad7e8>] __link_path_walk+0x6e9/0xa71
   [<c04adbae>] path_walk+0x3e/0x77
   [<c04add0e>] do_path_lookup+0xeb/0x105
   [<c04ae584>] user_path_at+0x41/0x6c
   [<c04a8301>] vfs_fstatat+0x32/0x59
   [<c04a8417>] vfs_stat+0x18/0x1a
   [<c04a8432>] sys_stat64+0x19/0x2d
   [<c0402a68>] sysenter_do_call+0x12/0x36
   [<ffffffff>] 0xffffffff

 -> (&iocg->lock){+.+...} ops: 3 {
    HARDIRQ-ON-W at:
                          [<c044b840>] mark_held_locks+0x3d/0x58
                          [<c044b963>] trace_hardirqs_on_caller+0x108/0x14c
                          [<c044b9b2>] trace_hardirqs_on+0xb/0xd
                          [<c0630883>] _spin_unlock_irq+0x27/0x47
                          [<c0513baa>] iocg_destroy+0xbc/0x118
                          [<c045a16a>] cgroup_diput+0x4b/0xa7
                          [<c04b1dbb>] dentry_iput+0x78/0x9c
                          [<c04b1e82>] d_kill+0x21/0x3b
                          [<c04b2f2a>] dput+0xf3/0xfc
                          [<c04ae226>] do_rmdir+0x9a/0xc8
                          [<c04ae29d>] sys_rmdir+0x15/0x17
                          [<c0402a68>] sysenter_do_call+0x12/0x36
                          [<ffffffff>] 0xffffffff
    SOFTIRQ-ON-W at:
                          [<c044b840>] mark_held_locks+0x3d/0x58
                          [<c044b97c>] trace_hardirqs_on_caller+0x121/0x14c
                          [<c044b9b2>] trace_hardirqs_on+0xb/0xd
                          [<c0630883>] _spin_unlock_irq+0x27/0x47
                          [<c0513baa>] iocg_destroy+0xbc/0x118
                          [<c045a16a>] cgroup_diput+0x4b/0xa7
                          [<c04b1dbb>] dentry_iput+0x78/0x9c
                          [<c04b1e82>] d_kill+0x21/0x3b
                          [<c04b2f2a>] dput+0xf3/0xfc
                          [<c04ae226>] do_rmdir+0x9a/0xc8
                          [<c04ae29d>] sys_rmdir+0x15/0x17
                          [<c0402a68>] sysenter_do_call+0x12/0x36
                          [<ffffffff>] 0xffffffff
    INITIAL USE at:
                         [<c044dad5>] __lock_acquire+0x58c/0x73e
                         [<c044dd36>] lock_acquire+0xaf/0xcc
                         [<c06304ea>] _spin_lock_irq+0x30/0x3f
                         [<c05119bd>] io_alloc_root_group+0x104/0x155
                         [<c05133cb>] elv_init_fq_data+0x32/0xe0
                         [<c0504317>] elevator_alloc+0x150/0x170
                         [<c0505393>] elevator_init+0x9d/0x100
                         [<c0507088>] blk_init_queue_node+0xc4/0xf7
                         [<c05070cb>] blk_init_queue+0x10/0x12
                         [<f81060fd>] __scsi_alloc_queue+0x1c/0xba [scsi_mod]
                         [<f81061b0>] scsi_alloc_queue+0x15/0x4e [scsi_mod]
                         [<f810803d>] scsi_alloc_sdev+0x154/0x1f5 [scsi_mod]
                         [<f8108387>] scsi_probe_and_add_lun+0x123/0xb5b [scsi_mod]
                         [<f8109847>] __scsi_add_device+0x8a/0xb0 [scsi_mod]
                         [<f816ad14>] ata_scsi_scan_host+0x77/0x141 [libata]
                         [<f816903f>] async_port_probe+0xa0/0xa9 [libata]
                         [<c044341f>] async_thread+0xe9/0x1c9
                         [<c043e204>] kthread+0x4a/0x72
                         [<c04034e7>] kernel_thread_helper+0x7/0x10
                         [<ffffffff>] 0xffffffff
  }
  ... key      at: [<c0c5ebd8>] __key.29462+0x0/0x8
 ... acquired at:
   [<c044d243>] validate_chain+0x8a8/0xbae
   [<c044dbfd>] __lock_acquire+0x6b4/0x73e
   [<c044dd36>] lock_acquire+0xaf/0xcc
   [<c063056b>] _spin_lock_irqsave+0x33/0x43
   [<c0510f6f>] io_group_chain_link+0x5c/0x106
   [<c0511ba7>] io_find_alloc_group+0x54/0x60
   [<c0511c11>] io_get_io_group_bio+0x5e/0x89
   [<c0511cc3>] io_group_get_request_list+0x12/0x21
   [<c0507485>] get_request_wait+0x124/0x15d
   [<c050797e>] __make_request+0x2e3/0x397
   [<c050635a>] generic_make_request+0x342/0x3ce
   [<c0507b21>] submit_bio+0xef/0xfa
   [<c04c6c4e>] mpage_bio_submit+0x21/0x26
   [<c04c7b7f>] mpage_readpages+0xa3/0xad
   [<f80c1ea8>] ext3_readpages+0x19/0x1b [ext3]
   [<c048275e>] __do_page_cache_readahead+0xfd/0x166
   [<c048294a>] ondemand_readahead+0x10a/0x118
   [<c04829db>] page_cache_sync_readahead+0x1b/0x20
   [<c047cf37>] generic_file_aio_read+0x226/0x545
   [<c04a4cf6>] do_sync_read+0xb0/0xee
   [<c04a54b0>] vfs_read+0x8f/0x136
   [<c04a8d7c>] kernel_read+0x39/0x4b
   [<c04a8e69>] prepare_binprm+0xdb/0xe3
   [<c04a9ca8>] do_execve+0x175/0x2cc
   [<c04015c0>] sys_execve+0x2b/0x54
   [<c0402a68>] sysenter_do_call+0x12/0x36
   [<ffffffff>] 0xffffffff


stack backtrace:
Pid: 2186, comm: rmdir Not tainted 2.6.30-rc4-io #6
Call Trace:
 [<c044b1ac>] print_irq_inversion_bug+0x13b/0x147
 [<c044c3e5>] check_usage_backwards+0x7d/0x86
 [<c044b5ec>] mark_lock+0x2d3/0x4ea
 [<c044c368>] ? check_usage_backwards+0x0/0x86
 [<c044b840>] mark_held_locks+0x3d/0x58
 [<c0630883>] ? _spin_unlock_irq+0x27/0x47
 [<c044b97c>] trace_hardirqs_on_caller+0x121/0x14c
 [<c044b9b2>] trace_hardirqs_on+0xb/0xd
 [<c0630883>] _spin_unlock_irq+0x27/0x47
 [<c0513baa>] iocg_destroy+0xbc/0x118
 [<c045a16a>] cgroup_diput+0x4b/0xa7
 [<c04b1dbb>] dentry_iput+0x78/0x9c
 [<c04b1e82>] d_kill+0x21/0x3b
 [<c04b2f2a>] dput+0xf3/0xfc
 [<c04ae226>] do_rmdir+0x9a/0xc8
 [<c04029b1>] ? resume_userspace+0x11/0x28
 [<c051aa14>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c0402b34>] ? restore_nocheck_notrace+0x0/0xe
 [<c06324a0>] ? do_page_fault+0x0/0x2fd
 [<c044b97c>] ? trace_hardirqs_on_caller+0x121/0x14c
 [<c04ae29d>] sys_rmdir+0x15/0x17
 [<c0402a68>] sysenter_do_call+0x12/0x36

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06 22:35             ` Andrea Righi
@ 2009-05-07  1:48               ` Ryo Tsuruta
  2009-05-07  1:48               ` Ryo Tsuruta
  1 sibling, 0 replies; 97+ messages in thread
From: Ryo Tsuruta @ 2009-05-07  1:48 UTC (permalink / raw)
  To: righi.andrea-Re5JQEeQqe8AvxtiuMwx3w
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b

From: Andrea Righi <righi.andrea-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Subject: Re: IO scheduler based IO Controller V2
Date: Thu, 7 May 2009 00:35:13 +0200

> On Wed, May 06, 2009 at 05:52:35PM -0400, Vivek Goyal wrote:
> > On Wed, May 06, 2009 at 11:34:54PM +0200, Andrea Righi wrote:
> > > On Wed, May 06, 2009 at 04:32:28PM -0400, Vivek Goyal wrote:
> > > > Hi Andrea and others,
> > > > 
> > > > I always had this doubt in mind that any kind of 2nd level controller will
> > > > have no idea about underlying IO scheduler queues/semantics. So while it
> > > > can implement a particular cgroup policy (max bw like io-throttle or
> > > > proportional bw like dm-ioband) but there are high chances that it will
> > > > break IO scheduler's semantics in one way or other.
> > > > 
> > > > I had already sent out the results for dm-ioband in a separate thread.
> > > > 
> > > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg07258.html
> > > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg07573.html
> > > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08177.html
> > > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08345.html
> > > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08355.html
> > > > 
> > > > Here are some basic results with io-throttle. Andrea, please let me know
> > > > if you think this is procedural problem. Playing with io-throttle patches
> > > > for the first time.
> > > > 
> > > > I took V16 of your patches and trying it out with 2.6.30-rc4 with CFQ
> > > > scheduler.
> > > > 
> > > > I have got one SATA drive with one partition on it.
> > > > 
> > > > I am trying to create one cgroup and assignn 8MB/s limit to it and launch
> > > > on RT prio 0 task and one BE prio 7 task and see how this 8MB/s is divided
> > > > between these tasks. Following are the results.
> > > > 
> > > > Following is my test script.
> > > > 
> > > > *******************************************************************
> > > > #!/bin/bash
> > > > 
> > > > mount /dev/sdb1 /mnt/sdb
> > > > 
> > > > mount -t cgroup -o blockio blockio /cgroup/iot/
> > > > mkdir -p /cgroup/iot/test1 /cgroup/iot/test2
> > > > 
> > > > # Set bw limit of 8 MB/ps on sdb
> > > > echo "/dev/sdb:$((8 * 1024 * 1024)):0:0" >
> > > > /cgroup/iot/test1/blockio.bandwidth-max
> > > > 
> > > > sync
> > > > echo 3 > /proc/sys/vm/drop_caches
> > > > 
> > > > echo $$ > /cgroup/iot/test1/tasks
> > > > 
> > > > # Launch a normal prio reader.
> > > > ionice -c 2 -n 7 dd if=/mnt/sdb/zerofile1 of=/dev/zero &
> > > > pid1=$!
> > > > echo $pid1
> > > > 
> > > > # Launch an RT reader  
> > > > ionice -c 1 -n 0 dd if=/mnt/sdb/zerofile2 of=/dev/zero &
> > > > pid2=$!
> > > > echo $pid2
> > > > 
> > > > wait $pid2
> > > > echo "RT task finished"
> > > > **********************************************************************
> > > > 
> > > > Test1
> > > > =====
> > > > Test two readers (one RT class and one BE class) and see how BW is
> > > > allocated with-in cgroup
> > > > 
> > > > With io-throttle patches
> > > > ------------------------
> > > > - Two readers, first BE prio 7, second RT prio 0
> > > > 
> > > > 234179072 bytes (234 MB) copied, 55.8482 s, 4.2 MB/s
> > > > 234179072 bytes (234 MB) copied, 55.8975 s, 4.2 MB/s
> > > > RT task finished
> > > > 
> > > > Note: See, there is no difference in the performance of RT or BE task.
> > > > Looks like these got throttled equally.
> > > 
> > > OK, this is coherent with the current io-throttle implementation. IO
> > > requests are throttled without the concept of the ioprio model.
> > > 
> > > We could try to distribute the throttle using a function of each task's
> > > ioprio, but ok, the obvious drawback is that it totally breaks the logic
> > > used by the underlying layers.
> > > 
> > > BTW, I'm wondering, is it a very critical issue? I would say why not to
> > > move the RT task to a different cgroup with unlimited BW? or limited BW
> > > but with other tasks running at the same IO priority...
> > 
> > So one of hypothetical use case probably  could be following. Somebody
> > is having a hosted server and customers are going to get there
> > applications running in a particular cgroup with a limit on max bw.
> > 
> > 			root
> > 		  /      |      \
> > 	     cust1      cust2   cust3
> > 	   (20 MB/s)  (40MB/s)  (30MB/s)
> > 
> > Now all three customers will run their own applications/virtual machines
> > in their respective groups with upper limits. Will we say to these that
> > all your tasks will be considered as same class and same prio level.
> > 
> > Assume cust1 is running a hypothetical application which creates multiple
> > threads and assigns these threads different priorities based on its needs
> > at run time. How would we handle this thing?
> > 
> > You can't collect all the RT tasks from all customers and move these to a
> > single cgroup. Or ask customers to separate out their tasks based on
> > priority level and give them multiple groups of different priority.
> 
> Clear.
> 
> Unfortunately, I think, with absolute BW limits at a certain point, if
> we hit the limit, we need to block the IO request. That's the same
> either, when we dispatch or submit the request. And the risk is to break
> the logic of the IO priorities and fall in the classic priority
> inversion problem.
> 
> The difference is that probably working at the CFQ level gives a better
> control so we can handle these cases appropriately and avoid the
> priority inversion problems.
> 
> Thanks,
> -Andrea

If RT tasks in cust1 issue IOs intensively, are IOs issued from BE
tasks running on cust2 and cust3 suppressed and cust1 can use whole
bandwidth?
I think that CFQ's class and priority should be preserved within a
given bandwidth to each cgroup.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06 22:35             ` Andrea Righi
  2009-05-07  1:48               ` Ryo Tsuruta
@ 2009-05-07  1:48               ` Ryo Tsuruta
  1 sibling, 0 replies; 97+ messages in thread
From: Ryo Tsuruta @ 2009-05-07  1:48 UTC (permalink / raw)
  To: righi.andrea
  Cc: vgoyal, akpm, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, fernando, s-uchida, taka, guijianfeng,
	jmoyer, dhaval, balbir, linux-kernel, containers, agk, dm-devel,
	snitzer, m-ikeda, peterz

From: Andrea Righi <righi.andrea@gmail.com>
Subject: Re: IO scheduler based IO Controller V2
Date: Thu, 7 May 2009 00:35:13 +0200

> On Wed, May 06, 2009 at 05:52:35PM -0400, Vivek Goyal wrote:
> > On Wed, May 06, 2009 at 11:34:54PM +0200, Andrea Righi wrote:
> > > On Wed, May 06, 2009 at 04:32:28PM -0400, Vivek Goyal wrote:
> > > > Hi Andrea and others,
> > > > 
> > > > I always had this doubt in mind that any kind of 2nd level controller will
> > > > have no idea about underlying IO scheduler queues/semantics. So while it
> > > > can implement a particular cgroup policy (max bw like io-throttle or
> > > > proportional bw like dm-ioband) but there are high chances that it will
> > > > break IO scheduler's semantics in one way or other.
> > > > 
> > > > I had already sent out the results for dm-ioband in a separate thread.
> > > > 
> > > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg07258.html
> > > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg07573.html
> > > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08177.html
> > > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08345.html
> > > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08355.html
> > > > 
> > > > Here are some basic results with io-throttle. Andrea, please let me know
> > > > if you think this is procedural problem. Playing with io-throttle patches
> > > > for the first time.
> > > > 
> > > > I took V16 of your patches and trying it out with 2.6.30-rc4 with CFQ
> > > > scheduler.
> > > > 
> > > > I have got one SATA drive with one partition on it.
> > > > 
> > > > I am trying to create one cgroup and assignn 8MB/s limit to it and launch
> > > > on RT prio 0 task and one BE prio 7 task and see how this 8MB/s is divided
> > > > between these tasks. Following are the results.
> > > > 
> > > > Following is my test script.
> > > > 
> > > > *******************************************************************
> > > > #!/bin/bash
> > > > 
> > > > mount /dev/sdb1 /mnt/sdb
> > > > 
> > > > mount -t cgroup -o blockio blockio /cgroup/iot/
> > > > mkdir -p /cgroup/iot/test1 /cgroup/iot/test2
> > > > 
> > > > # Set bw limit of 8 MB/ps on sdb
> > > > echo "/dev/sdb:$((8 * 1024 * 1024)):0:0" >
> > > > /cgroup/iot/test1/blockio.bandwidth-max
> > > > 
> > > > sync
> > > > echo 3 > /proc/sys/vm/drop_caches
> > > > 
> > > > echo $$ > /cgroup/iot/test1/tasks
> > > > 
> > > > # Launch a normal prio reader.
> > > > ionice -c 2 -n 7 dd if=/mnt/sdb/zerofile1 of=/dev/zero &
> > > > pid1=$!
> > > > echo $pid1
> > > > 
> > > > # Launch an RT reader  
> > > > ionice -c 1 -n 0 dd if=/mnt/sdb/zerofile2 of=/dev/zero &
> > > > pid2=$!
> > > > echo $pid2
> > > > 
> > > > wait $pid2
> > > > echo "RT task finished"
> > > > **********************************************************************
> > > > 
> > > > Test1
> > > > =====
> > > > Test two readers (one RT class and one BE class) and see how BW is
> > > > allocated with-in cgroup
> > > > 
> > > > With io-throttle patches
> > > > ------------------------
> > > > - Two readers, first BE prio 7, second RT prio 0
> > > > 
> > > > 234179072 bytes (234 MB) copied, 55.8482 s, 4.2 MB/s
> > > > 234179072 bytes (234 MB) copied, 55.8975 s, 4.2 MB/s
> > > > RT task finished
> > > > 
> > > > Note: See, there is no difference in the performance of RT or BE task.
> > > > Looks like these got throttled equally.
> > > 
> > > OK, this is coherent with the current io-throttle implementation. IO
> > > requests are throttled without the concept of the ioprio model.
> > > 
> > > We could try to distribute the throttle using a function of each task's
> > > ioprio, but ok, the obvious drawback is that it totally breaks the logic
> > > used by the underlying layers.
> > > 
> > > BTW, I'm wondering, is it a very critical issue? I would say why not to
> > > move the RT task to a different cgroup with unlimited BW? or limited BW
> > > but with other tasks running at the same IO priority...
> > 
> > So one of hypothetical use case probably  could be following. Somebody
> > is having a hosted server and customers are going to get there
> > applications running in a particular cgroup with a limit on max bw.
> > 
> > 			root
> > 		  /      |      \
> > 	     cust1      cust2   cust3
> > 	   (20 MB/s)  (40MB/s)  (30MB/s)
> > 
> > Now all three customers will run their own applications/virtual machines
> > in their respective groups with upper limits. Will we say to these that
> > all your tasks will be considered as same class and same prio level.
> > 
> > Assume cust1 is running a hypothetical application which creates multiple
> > threads and assigns these threads different priorities based on its needs
> > at run time. How would we handle this thing?
> > 
> > You can't collect all the RT tasks from all customers and move these to a
> > single cgroup. Or ask customers to separate out their tasks based on
> > priority level and give them multiple groups of different priority.
> 
> Clear.
> 
> Unfortunately, I think, with absolute BW limits at a certain point, if
> we hit the limit, we need to block the IO request. That's the same
> either, when we dispatch or submit the request. And the risk is to break
> the logic of the IO priorities and fall in the classic priority
> inversion problem.
> 
> The difference is that probably working at the CFQ level gives a better
> control so we can handle these cases appropriately and avoid the
> priority inversion problems.
> 
> Thanks,
> -Andrea

If RT tasks in cust1 issue IOs intensively, are IOs issued from BE
tasks running on cust2 and cust3 suppressed and cust1 can use whole
bandwidth?
I think that CFQ's class and priority should be preserved within a
given bandwidth to each cgroup.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-07  0:18       ` Ryo Tsuruta
@ 2009-05-07  1:25             ` Vivek Goyal
  2009-05-08 14:24         ` Rik van Riel
  1 sibling, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-07  1:25 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	righi.andrea-Re5JQEeQqe8AvxtiuMwx3w

On Thu, May 07, 2009 at 09:18:58AM +0900, Ryo Tsuruta wrote:
> Hi Vivek,
> 
> > Ryo, dm-ioband breaks the notion of classes and priority of CFQ because
> > of FIFO dispatch of buffered bios. Apart from that it tries to provide
> > fairness in terms of actual IO done and that would mean a seeky workload
> > will can use disk for much longer to get equivalent IO done and slow down
> > other applications. Implementing IO controller at IO scheduler level gives
> > us tigher control. Will it not meet your requirements? If you got specific
> > concerns with IO scheduler based contol patches, please highlight these and
> > we will see how these can be addressed.
> 
> I'd like to avoid making complicated existing IO schedulers and other
> kernel codes and to give a choice to users whether or not to use it.
> I know that you chose an approach that using compile time options to
> get the same behavior as old system, but device-mapper drivers can be
> added, removed and replaced while system is running.
> 

Same is possible with IO scheduler based controller. If you don't want
cgroup stuff, don't create those. By default everything will be in root
group and you will get the old behavior. 

If you want io controller stuff, just create the cgroup, assign weight
and move task there. So what more choices do you want which are missing
here?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
@ 2009-05-07  1:25             ` Vivek Goyal
  0 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-07  1:25 UTC (permalink / raw)
  To: Ryo Tsuruta
  Cc: akpm, nauman, dpshah, lizf, mikew, fchecconi, paolo.valente,
	jens.axboe, fernando, s-uchida, taka, guijianfeng, jmoyer,
	dhaval, balbir, linux-kernel, containers, righi.andrea, agk,
	dm-devel, snitzer, m-ikeda, peterz

On Thu, May 07, 2009 at 09:18:58AM +0900, Ryo Tsuruta wrote:
> Hi Vivek,
> 
> > Ryo, dm-ioband breaks the notion of classes and priority of CFQ because
> > of FIFO dispatch of buffered bios. Apart from that it tries to provide
> > fairness in terms of actual IO done and that would mean a seeky workload
> > will can use disk for much longer to get equivalent IO done and slow down
> > other applications. Implementing IO controller at IO scheduler level gives
> > us tigher control. Will it not meet your requirements? If you got specific
> > concerns with IO scheduler based contol patches, please highlight these and
> > we will see how these can be addressed.
> 
> I'd like to avoid making complicated existing IO schedulers and other
> kernel codes and to give a choice to users whether or not to use it.
> I know that you chose an approach that using compile time options to
> get the same behavior as old system, but device-mapper drivers can be
> added, removed and replaced while system is running.
> 

Same is possible with IO scheduler based controller. If you don't want
cgroup stuff, don't create those. By default everything will be in root
group and you will get the old behavior. 

If you want io controller stuff, just create the cgroup, assign weight
and move task there. So what more choices do you want which are missing
here?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]       ` <20090506023332.GA1212-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
                           ` (2 preceding siblings ...)
  2009-05-06 20:32         ` Vivek Goyal
@ 2009-05-07  0:18         ` Ryo Tsuruta
  3 siblings, 0 replies; 97+ messages in thread
From: Ryo Tsuruta @ 2009-05-07  0:18 UTC (permalink / raw)
  To: vgoyal-H+wXaHxf7aLQT0dZR+AlfA
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	righi.andrea-Re5JQEeQqe8AvxtiuMwx3w

Hi Vivek,

> Ryo, dm-ioband breaks the notion of classes and priority of CFQ because
> of FIFO dispatch of buffered bios. Apart from that it tries to provide
> fairness in terms of actual IO done and that would mean a seeky workload
> will can use disk for much longer to get equivalent IO done and slow down
> other applications. Implementing IO controller at IO scheduler level gives
> us tigher control. Will it not meet your requirements? If you got specific
> concerns with IO scheduler based contol patches, please highlight these and
> we will see how these can be addressed.

I'd like to avoid making complicated existing IO schedulers and other
kernel codes and to give a choice to users whether or not to use it.
I know that you chose an approach that using compile time options to
get the same behavior as old system, but device-mapper drivers can be
added, removed and replaced while system is running.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06  2:33     ` Vivek Goyal
                         ` (3 preceding siblings ...)
  2009-05-06 20:32       ` Vivek Goyal
@ 2009-05-07  0:18       ` Ryo Tsuruta
       [not found]         ` <20090507.091858.226775723.ryov-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org>
  2009-05-08 14:24         ` Rik van Riel
  4 siblings, 2 replies; 97+ messages in thread
From: Ryo Tsuruta @ 2009-05-07  0:18 UTC (permalink / raw)
  To: vgoyal
  Cc: akpm, nauman, dpshah, lizf, mikew, fchecconi, paolo.valente,
	jens.axboe, fernando, s-uchida, taka, guijianfeng, jmoyer,
	dhaval, balbir, linux-kernel, containers, righi.andrea, agk,
	dm-devel, snitzer, m-ikeda, peterz

Hi Vivek,

> Ryo, dm-ioband breaks the notion of classes and priority of CFQ because
> of FIFO dispatch of buffered bios. Apart from that it tries to provide
> fairness in terms of actual IO done and that would mean a seeky workload
> will can use disk for much longer to get equivalent IO done and slow down
> other applications. Implementing IO controller at IO scheduler level gives
> us tigher control. Will it not meet your requirements? If you got specific
> concerns with IO scheduler based contol patches, please highlight these and
> we will see how these can be addressed.

I'd like to avoid making complicated existing IO schedulers and other
kernel codes and to give a choice to users whether or not to use it.
I know that you chose an approach that using compile time options to
get the same behavior as old system, but device-mapper drivers can be
added, removed and replaced while system is running.

Thanks,
Ryo Tsuruta

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]             ` <20090506215235.GJ8180-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2009-05-06 22:35               ` Andrea Righi
  2009-05-07  9:04               ` Andrea Righi
  1 sibling, 0 replies; 97+ messages in thread
From: Andrea Righi @ 2009-05-06 22:35 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Wed, May 06, 2009 at 05:52:35PM -0400, Vivek Goyal wrote:
> On Wed, May 06, 2009 at 11:34:54PM +0200, Andrea Righi wrote:
> > On Wed, May 06, 2009 at 04:32:28PM -0400, Vivek Goyal wrote:
> > > Hi Andrea and others,
> > > 
> > > I always had this doubt in mind that any kind of 2nd level controller will
> > > have no idea about underlying IO scheduler queues/semantics. So while it
> > > can implement a particular cgroup policy (max bw like io-throttle or
> > > proportional bw like dm-ioband) but there are high chances that it will
> > > break IO scheduler's semantics in one way or other.
> > > 
> > > I had already sent out the results for dm-ioband in a separate thread.
> > > 
> > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg07258.html
> > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg07573.html
> > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08177.html
> > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08345.html
> > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08355.html
> > > 
> > > Here are some basic results with io-throttle. Andrea, please let me know
> > > if you think this is procedural problem. Playing with io-throttle patches
> > > for the first time.
> > > 
> > > I took V16 of your patches and trying it out with 2.6.30-rc4 with CFQ
> > > scheduler.
> > > 
> > > I have got one SATA drive with one partition on it.
> > > 
> > > I am trying to create one cgroup and assignn 8MB/s limit to it and launch
> > > on RT prio 0 task and one BE prio 7 task and see how this 8MB/s is divided
> > > between these tasks. Following are the results.
> > > 
> > > Following is my test script.
> > > 
> > > *******************************************************************
> > > #!/bin/bash
> > > 
> > > mount /dev/sdb1 /mnt/sdb
> > > 
> > > mount -t cgroup -o blockio blockio /cgroup/iot/
> > > mkdir -p /cgroup/iot/test1 /cgroup/iot/test2
> > > 
> > > # Set bw limit of 8 MB/ps on sdb
> > > echo "/dev/sdb:$((8 * 1024 * 1024)):0:0" >
> > > /cgroup/iot/test1/blockio.bandwidth-max
> > > 
> > > sync
> > > echo 3 > /proc/sys/vm/drop_caches
> > > 
> > > echo $$ > /cgroup/iot/test1/tasks
> > > 
> > > # Launch a normal prio reader.
> > > ionice -c 2 -n 7 dd if=/mnt/sdb/zerofile1 of=/dev/zero &
> > > pid1=$!
> > > echo $pid1
> > > 
> > > # Launch an RT reader  
> > > ionice -c 1 -n 0 dd if=/mnt/sdb/zerofile2 of=/dev/zero &
> > > pid2=$!
> > > echo $pid2
> > > 
> > > wait $pid2
> > > echo "RT task finished"
> > > **********************************************************************
> > > 
> > > Test1
> > > =====
> > > Test two readers (one RT class and one BE class) and see how BW is
> > > allocated with-in cgroup
> > > 
> > > With io-throttle patches
> > > ------------------------
> > > - Two readers, first BE prio 7, second RT prio 0
> > > 
> > > 234179072 bytes (234 MB) copied, 55.8482 s, 4.2 MB/s
> > > 234179072 bytes (234 MB) copied, 55.8975 s, 4.2 MB/s
> > > RT task finished
> > > 
> > > Note: See, there is no difference in the performance of RT or BE task.
> > > Looks like these got throttled equally.
> > 
> > OK, this is coherent with the current io-throttle implementation. IO
> > requests are throttled without the concept of the ioprio model.
> > 
> > We could try to distribute the throttle using a function of each task's
> > ioprio, but ok, the obvious drawback is that it totally breaks the logic
> > used by the underlying layers.
> > 
> > BTW, I'm wondering, is it a very critical issue? I would say why not to
> > move the RT task to a different cgroup with unlimited BW? or limited BW
> > but with other tasks running at the same IO priority...
> 
> So one of hypothetical use case probably  could be following. Somebody
> is having a hosted server and customers are going to get there
> applications running in a particular cgroup with a limit on max bw.
> 
> 			root
> 		  /      |      \
> 	     cust1      cust2   cust3
> 	   (20 MB/s)  (40MB/s)  (30MB/s)
> 
> Now all three customers will run their own applications/virtual machines
> in their respective groups with upper limits. Will we say to these that
> all your tasks will be considered as same class and same prio level.
> 
> Assume cust1 is running a hypothetical application which creates multiple
> threads and assigns these threads different priorities based on its needs
> at run time. How would we handle this thing?
> 
> You can't collect all the RT tasks from all customers and move these to a
> single cgroup. Or ask customers to separate out their tasks based on
> priority level and give them multiple groups of different priority.

Clear.

Unfortunately, I think, with absolute BW limits at a certain point, if
we hit the limit, we need to block the IO request. That's the same
either, when we dispatch or submit the request. And the risk is to break
the logic of the IO priorities and fall in the classic priority
inversion problem.

The difference is that probably working at the CFQ level gives a better
control so we can handle these cases appropriately and avoid the
priority inversion problems.

Thanks,
-Andrea

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06 21:52             ` Vivek Goyal
  (?)
@ 2009-05-06 22:35             ` Andrea Righi
  2009-05-07  1:48               ` Ryo Tsuruta
  2009-05-07  1:48               ` Ryo Tsuruta
  -1 siblings, 2 replies; 97+ messages in thread
From: Andrea Righi @ 2009-05-06 22:35 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Andrew Morton, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, balbir, linux-kernel, containers,
	agk, dm-devel, snitzer, m-ikeda, peterz

On Wed, May 06, 2009 at 05:52:35PM -0400, Vivek Goyal wrote:
> On Wed, May 06, 2009 at 11:34:54PM +0200, Andrea Righi wrote:
> > On Wed, May 06, 2009 at 04:32:28PM -0400, Vivek Goyal wrote:
> > > Hi Andrea and others,
> > > 
> > > I always had this doubt in mind that any kind of 2nd level controller will
> > > have no idea about underlying IO scheduler queues/semantics. So while it
> > > can implement a particular cgroup policy (max bw like io-throttle or
> > > proportional bw like dm-ioband) but there are high chances that it will
> > > break IO scheduler's semantics in one way or other.
> > > 
> > > I had already sent out the results for dm-ioband in a separate thread.
> > > 
> > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg07258.html
> > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg07573.html
> > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08177.html
> > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08345.html
> > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08355.html
> > > 
> > > Here are some basic results with io-throttle. Andrea, please let me know
> > > if you think this is procedural problem. Playing with io-throttle patches
> > > for the first time.
> > > 
> > > I took V16 of your patches and trying it out with 2.6.30-rc4 with CFQ
> > > scheduler.
> > > 
> > > I have got one SATA drive with one partition on it.
> > > 
> > > I am trying to create one cgroup and assignn 8MB/s limit to it and launch
> > > on RT prio 0 task and one BE prio 7 task and see how this 8MB/s is divided
> > > between these tasks. Following are the results.
> > > 
> > > Following is my test script.
> > > 
> > > *******************************************************************
> > > #!/bin/bash
> > > 
> > > mount /dev/sdb1 /mnt/sdb
> > > 
> > > mount -t cgroup -o blockio blockio /cgroup/iot/
> > > mkdir -p /cgroup/iot/test1 /cgroup/iot/test2
> > > 
> > > # Set bw limit of 8 MB/ps on sdb
> > > echo "/dev/sdb:$((8 * 1024 * 1024)):0:0" >
> > > /cgroup/iot/test1/blockio.bandwidth-max
> > > 
> > > sync
> > > echo 3 > /proc/sys/vm/drop_caches
> > > 
> > > echo $$ > /cgroup/iot/test1/tasks
> > > 
> > > # Launch a normal prio reader.
> > > ionice -c 2 -n 7 dd if=/mnt/sdb/zerofile1 of=/dev/zero &
> > > pid1=$!
> > > echo $pid1
> > > 
> > > # Launch an RT reader  
> > > ionice -c 1 -n 0 dd if=/mnt/sdb/zerofile2 of=/dev/zero &
> > > pid2=$!
> > > echo $pid2
> > > 
> > > wait $pid2
> > > echo "RT task finished"
> > > **********************************************************************
> > > 
> > > Test1
> > > =====
> > > Test two readers (one RT class and one BE class) and see how BW is
> > > allocated with-in cgroup
> > > 
> > > With io-throttle patches
> > > ------------------------
> > > - Two readers, first BE prio 7, second RT prio 0
> > > 
> > > 234179072 bytes (234 MB) copied, 55.8482 s, 4.2 MB/s
> > > 234179072 bytes (234 MB) copied, 55.8975 s, 4.2 MB/s
> > > RT task finished
> > > 
> > > Note: See, there is no difference in the performance of RT or BE task.
> > > Looks like these got throttled equally.
> > 
> > OK, this is coherent with the current io-throttle implementation. IO
> > requests are throttled without the concept of the ioprio model.
> > 
> > We could try to distribute the throttle using a function of each task's
> > ioprio, but ok, the obvious drawback is that it totally breaks the logic
> > used by the underlying layers.
> > 
> > BTW, I'm wondering, is it a very critical issue? I would say why not to
> > move the RT task to a different cgroup with unlimited BW? or limited BW
> > but with other tasks running at the same IO priority...
> 
> So one of hypothetical use case probably  could be following. Somebody
> is having a hosted server and customers are going to get there
> applications running in a particular cgroup with a limit on max bw.
> 
> 			root
> 		  /      |      \
> 	     cust1      cust2   cust3
> 	   (20 MB/s)  (40MB/s)  (30MB/s)
> 
> Now all three customers will run their own applications/virtual machines
> in their respective groups with upper limits. Will we say to these that
> all your tasks will be considered as same class and same prio level.
> 
> Assume cust1 is running a hypothetical application which creates multiple
> threads and assigns these threads different priorities based on its needs
> at run time. How would we handle this thing?
> 
> You can't collect all the RT tasks from all customers and move these to a
> single cgroup. Or ask customers to separate out their tasks based on
> priority level and give them multiple groups of different priority.

Clear.

Unfortunately, I think, with absolute BW limits at a certain point, if
we hit the limit, we need to block the IO request. That's the same
either, when we dispatch or submit the request. And the risk is to break
the logic of the IO priorities and fall in the classic priority
inversion problem.

The difference is that probably working at the CFQ level gives a better
control so we can handle these cases appropriately and avoid the
priority inversion problems.

Thanks,
-Andrea

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06 22:02               ` Andrea Righi
@ 2009-05-06 22:17                 ` Vivek Goyal
  -1 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-06 22:17 UTC (permalink / raw)
  To: Andrea Righi
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Thu, May 07, 2009 at 12:02:51AM +0200, Andrea Righi wrote:
> On Wed, May 06, 2009 at 05:21:21PM -0400, Vivek Goyal wrote:
> > > Well, IMHO the big concern is at which level we want to implement the
> > > logic of control: IO scheduler, when the IO requests are already
> > > submitted and need to be dispatched, or at high level when the
> > > applications generates IO requests (or maybe both).
> > > 
> > > And, as pointed by Andrew, do everything by a cgroup-based controller.
> > 
> > I am not sure what's the rationale behind that. Why to do it at higher
> > layer? Doing it at IO scheduler layer will make sure that one does not
> > breaks the IO scheduler's properties with-in cgroup. (See my other mail
> > with some io-throttling test results).
> > 
> > The advantage of higher layer mechanism is that it can also cover software
> > RAID devices well. 
> > 
> > > 
> > > The other features, proportional BW, throttling, take the current ioprio
> > > model in account, etc. are implementation details and any of the
> > > proposed solutions can be extended to support all these features. I
> > > mean, io-throttle can be extended to support proportional BW (for a
> > > certain perspective it is already provided by the throttling water mark
> > > in v16), as well as the IO scheduler based controller can be extended to
> > > support absolute BW limits. The same for dm-ioband. I don't think
> > > there're huge obstacle to merge the functionalities in this sense.
> > 
> > Yes, from technical point of view, one can implement a proportional BW
> > controller at higher layer also. But that would practically mean almost
> > re-implementing the CFQ logic at higher layer. Now why to get into all
> > that complexity. Why not simply make CFQ hiearchical to also handle the
> > groups?
> 
> Make CFQ aware of cgroups is very important also. I could be wrong, but
> I don't think we shouldn't re-implement the same exact CFQ logic at
> higher layers. CFQ dispatches IO requests, at higher layers applications
> submit IO requests. We're talking about different things and applying
> different logic doesn't sound too strange IMHO. I mean, at least we
> should consider/test also this different approach before deciding drop
> it.
> 

Lot of CFQ code is all about maintaining per io context queues, for
different classes and different prio level, about anticipation for
reads etc. Anybody who wants to get classes and ioprio within cgroup
right will end up duplicating all that logic (to cover all the cases).
So I did not mean that you will end up copying the whole code but
logically a lot of it.

Secondly, there will be mismatch in anticipation logic. CFQ gives
preference to reads and for dependent readers it idles and waits for
next request to come. A higher level throttling can interefere with IO
pattern of application and can lead CFQ to think that average thinktime
of this application is high and disable the anticipation on that
application. Which should result in high latencies for simple commands
like "ls", in presence of competing applications. 

> This solution also guarantee no changes in the IO schedulers for those
> who are not interested in using the cgroup IO controller. What is the
> impact of the IO scheduler based controller for those users?
> 

IO scheduler based solution is highly customizable. First of all there
are compile time switches to either completely remove fair queuing code
(for noop, deadline and AS only) or to disable group scheduling only. If
that's the case one would expect same behavior as old scheduler.

Secondly, even if everything is compiled in and customer is not using
cgroups, I would expect almost same behavior (because we will have only
root group). There will be extra code in the way and we will need some
optimizations to detect that there is only one group and bypass as much
code as possible bringing the overhead of the new code to the minimum. 

So if customer is not using IO controller, he should get the same behavior
as old system. Can't prove it right now because my patches are not in that
matured but there are no fundamental design limitations.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
@ 2009-05-06 22:17                 ` Vivek Goyal
  0 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-06 22:17 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Andrew Morton, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, balbir, linux-kernel, containers,
	agk, dm-devel, snitzer, m-ikeda, peterz

On Thu, May 07, 2009 at 12:02:51AM +0200, Andrea Righi wrote:
> On Wed, May 06, 2009 at 05:21:21PM -0400, Vivek Goyal wrote:
> > > Well, IMHO the big concern is at which level we want to implement the
> > > logic of control: IO scheduler, when the IO requests are already
> > > submitted and need to be dispatched, or at high level when the
> > > applications generates IO requests (or maybe both).
> > > 
> > > And, as pointed by Andrew, do everything by a cgroup-based controller.
> > 
> > I am not sure what's the rationale behind that. Why to do it at higher
> > layer? Doing it at IO scheduler layer will make sure that one does not
> > breaks the IO scheduler's properties with-in cgroup. (See my other mail
> > with some io-throttling test results).
> > 
> > The advantage of higher layer mechanism is that it can also cover software
> > RAID devices well. 
> > 
> > > 
> > > The other features, proportional BW, throttling, take the current ioprio
> > > model in account, etc. are implementation details and any of the
> > > proposed solutions can be extended to support all these features. I
> > > mean, io-throttle can be extended to support proportional BW (for a
> > > certain perspective it is already provided by the throttling water mark
> > > in v16), as well as the IO scheduler based controller can be extended to
> > > support absolute BW limits. The same for dm-ioband. I don't think
> > > there're huge obstacle to merge the functionalities in this sense.
> > 
> > Yes, from technical point of view, one can implement a proportional BW
> > controller at higher layer also. But that would practically mean almost
> > re-implementing the CFQ logic at higher layer. Now why to get into all
> > that complexity. Why not simply make CFQ hiearchical to also handle the
> > groups?
> 
> Make CFQ aware of cgroups is very important also. I could be wrong, but
> I don't think we shouldn't re-implement the same exact CFQ logic at
> higher layers. CFQ dispatches IO requests, at higher layers applications
> submit IO requests. We're talking about different things and applying
> different logic doesn't sound too strange IMHO. I mean, at least we
> should consider/test also this different approach before deciding drop
> it.
> 

Lot of CFQ code is all about maintaining per io context queues, for
different classes and different prio level, about anticipation for
reads etc. Anybody who wants to get classes and ioprio within cgroup
right will end up duplicating all that logic (to cover all the cases).
So I did not mean that you will end up copying the whole code but
logically a lot of it.

Secondly, there will be mismatch in anticipation logic. CFQ gives
preference to reads and for dependent readers it idles and waits for
next request to come. A higher level throttling can interefere with IO
pattern of application and can lead CFQ to think that average thinktime
of this application is high and disable the anticipation on that
application. Which should result in high latencies for simple commands
like "ls", in presence of competing applications. 

> This solution also guarantee no changes in the IO schedulers for those
> who are not interested in using the cgroup IO controller. What is the
> impact of the IO scheduler based controller for those users?
> 

IO scheduler based solution is highly customizable. First of all there
are compile time switches to either completely remove fair queuing code
(for noop, deadline and AS only) or to disable group scheduling only. If
that's the case one would expect same behavior as old scheduler.

Secondly, even if everything is compiled in and customer is not using
cgroups, I would expect almost same behavior (because we will have only
root group). There will be extra code in the way and we will need some
optimizations to detect that there is only one group and bypass as much
code as possible bringing the overhead of the new code to the minimum. 

So if customer is not using IO controller, he should get the same behavior
as old system. Can't prove it right now because my patches are not in that
matured but there are no fundamental design limitations.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06 21:21         ` Vivek Goyal
@ 2009-05-06 22:02               ` Andrea Righi
  0 siblings, 0 replies; 97+ messages in thread
From: Andrea Righi @ 2009-05-06 22:02 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Wed, May 06, 2009 at 05:21:21PM -0400, Vivek Goyal wrote:
> > Well, IMHO the big concern is at which level we want to implement the
> > logic of control: IO scheduler, when the IO requests are already
> > submitted and need to be dispatched, or at high level when the
> > applications generates IO requests (or maybe both).
> > 
> > And, as pointed by Andrew, do everything by a cgroup-based controller.
> 
> I am not sure what's the rationale behind that. Why to do it at higher
> layer? Doing it at IO scheduler layer will make sure that one does not
> breaks the IO scheduler's properties with-in cgroup. (See my other mail
> with some io-throttling test results).
> 
> The advantage of higher layer mechanism is that it can also cover software
> RAID devices well. 
> 
> > 
> > The other features, proportional BW, throttling, take the current ioprio
> > model in account, etc. are implementation details and any of the
> > proposed solutions can be extended to support all these features. I
> > mean, io-throttle can be extended to support proportional BW (for a
> > certain perspective it is already provided by the throttling water mark
> > in v16), as well as the IO scheduler based controller can be extended to
> > support absolute BW limits. The same for dm-ioband. I don't think
> > there're huge obstacle to merge the functionalities in this sense.
> 
> Yes, from technical point of view, one can implement a proportional BW
> controller at higher layer also. But that would practically mean almost
> re-implementing the CFQ logic at higher layer. Now why to get into all
> that complexity. Why not simply make CFQ hiearchical to also handle the
> groups?

Make CFQ aware of cgroups is very important also. I could be wrong, but
I don't think we shouldn't re-implement the same exact CFQ logic at
higher layers. CFQ dispatches IO requests, at higher layers applications
submit IO requests. We're talking about different things and applying
different logic doesn't sound too strange IMHO. I mean, at least we
should consider/test also this different approach before deciding drop
it.

This solution also guarantee no changes in the IO schedulers for those
who are not interested in using the cgroup IO controller. What is the
impact of the IO scheduler based controller for those users?

> 
> Secondly, think of following odd scenarios if we implement a higher level
> proportional BW controller which can offer the same feature as CFQ and
> also can handle group scheduling.
> 
> Case1:
> ======	 
>            (Higher level proportional BW controller)
> 			/dev/sda (CFQ)
> 
> So if somebody wants a group scheduling, we will be doing same IO control
> at two places (with-in group). Once at higher level and second time at CFQ
> level. Does not sound too logical to me.
> 
> Case2:
> ======
> 
>            (Higher level proportional BW controller)
> 			/dev/sda (NOOP)
> 	
> This is other extrememt. Lower level IO scheduler does not offer any kind
> of notion of class or prio with-in class and higher level scheduler will
> still be maintaining all the infrastructure unnecessarily.
> 
> That's why I get back to this simple question again, why not extend the
> IO schedulers to handle group scheduling and do both proportional BW and
> max bw control there.
> 
> > 
> > > 
> > > Andrea, last time you were planning to have a look at my patches and see
> > > if max bw controller can be implemented there. I got a feeling that it
> > > should not be too difficult to implement it there. We already have the
> > > hierarchical tree of io queues and groups in elevator layer and we run
> > > BFQ (WF2Q+) algorithm to select next queue to dispatch the IO from. It is
> > > just a matter of also keeping track of IO rate per queue/group and we should
> > > be easily be able to delay the dispatch of IO from a queue if its group has
> > > crossed the specified max bw.
> > 
> > Yes, sorry for my late, I quickly tested your patchset, but I still need
> > to understand many details of your solution. In the next days I'll
> > re-read everything carefully and I'll try to do a detailed review of
> > your patchset (just re-building the kernel with your patchset applied).
> > 
> 
> Sure. My patchset is still in the infancy stage. So don't expect great
> results. But it does highlight the idea and design very well.
> 
> > > 
> > > This should lead to less code and reduced complextiy (compared with the
> > > case where we do max bw control with io-throttling patches and proportional
> > > BW control using IO scheduler based control patches).
> > 
> > mmmh... changing the logic at the elevator and all IO schedulers doesn't
> > sound like reduced complexity and less code changed. With io-throttle we
> > just need to place the cgroup_io_throttle() hook in the right functions
> > where we want to apply throttling. This is a quite easy approach to
> > extend the IO control also to logical devices (more in general devices
> > that use their own make_request_fn) or even network-attached devices, as
> > well as networking filesystems, etc.
> > 
> > But I may be wrong. As I said I still need to review in the details your
> > solution.
> 
> Well I meant reduced code in the sense if we implement both max bw and
> proportional bw at IO scheduler level instead of proportional BW at
> IO scheduler and max bw at higher level.

OK.

> 
> I agree that doing max bw control at higher level has this advantage that
> it covers all the kind of deivces (higher level logical devices) and IO
> scheduler level solution does not do that. But this comes at the price
> of broken IO scheduler properties with-in cgroup.
> 
> Maybe we can then implement both. A higher level max bw controller and a
> max bw feature implemented along side proportional BW controller at IO
> scheduler level. Folks who use hardware RAID, or single disk devices can
> use max bw control of IO scheduler and those using software RAID devices
> can use higher level max bw controller.

OK, maybe.

> 
> > 
> > >  
> > > So do you think that it would make sense to do max BW control along with
> > > proportional weight IO controller at IO scheduler? If yes, then we can
> > > work together and continue to develop this patchset to also support max
> > > bw control and meet your requirements and drop the io-throttling patches.
> > 
> > It is surely worth to be explored. Honestly, I don't know if it would be
> > a better solution or not. Probably comparing some results with different
> > IO workloads is the best way to proceed and decide which is the right
> > way to go. This is necessary IMHO, before totally dropping one solution
> > or another.
> 
> Sure. My patches have started giving some basic results but because there
> is lot of work remaining before a fair comparison can be done on the
> basis of performance under various work loads. So some more time to
> go before we can do a fair comparison based on numbers.
>  
> > 
> > > 
> > > The only thing which concerns me is the fact that IO scheduler does not
> > > have the view of higher level logical device. So if somebody has setup a
> > > software RAID and wants to put max BW limit on software raid device, this
> > > solution will not work. One shall have to live with max bw limits on 
> > > individual disks (where io scheduler is actually running). Do your patches
> > > allow to put limit on software RAID devices also? 
> > 
> > No, but as said above my patchset provides the interfaces to apply the
> > IO control and accounting wherever we want. At the moment there's just
> > one interface, cgroup_io_throttle().
> 
> Sorry, I did not get it clearly. I guess I did not ask the question right.
> So lets say I got a setup where there are two phyical devices /dev/sda and
> /dev/sdb and I create a logical device (say using device mapper facilities)
> on top of these two physical disks. And some application is generating
> the IO for logical device lv0.
> 
> 				Appl
> 				 |
> 				lv0
> 			       /  \
> 			    sda	   sdb
> 
> 
> Where should I put the bandwidth limiting rules now for io-throtle. I 
> specify these for lv0 device or for sda and sdb devices?

The BW limiting rules would be applied into the make_request_fn provided
by the lv0 device. If it's not provided, before calling
generic_make_request(). A problem could be that the driver must be aware
of the particular lv0 device at that point.

> 
> Thanks
> Vivek

OK. I definitely need to look at your patchset before saying any other
opinion... :)

Thanks,
-Andrea

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
@ 2009-05-06 22:02               ` Andrea Righi
  0 siblings, 0 replies; 97+ messages in thread
From: Andrea Righi @ 2009-05-06 22:02 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Andrew Morton, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, balbir, linux-kernel, containers,
	agk, dm-devel, snitzer, m-ikeda, peterz

On Wed, May 06, 2009 at 05:21:21PM -0400, Vivek Goyal wrote:
> > Well, IMHO the big concern is at which level we want to implement the
> > logic of control: IO scheduler, when the IO requests are already
> > submitted and need to be dispatched, or at high level when the
> > applications generates IO requests (or maybe both).
> > 
> > And, as pointed by Andrew, do everything by a cgroup-based controller.
> 
> I am not sure what's the rationale behind that. Why to do it at higher
> layer? Doing it at IO scheduler layer will make sure that one does not
> breaks the IO scheduler's properties with-in cgroup. (See my other mail
> with some io-throttling test results).
> 
> The advantage of higher layer mechanism is that it can also cover software
> RAID devices well. 
> 
> > 
> > The other features, proportional BW, throttling, take the current ioprio
> > model in account, etc. are implementation details and any of the
> > proposed solutions can be extended to support all these features. I
> > mean, io-throttle can be extended to support proportional BW (for a
> > certain perspective it is already provided by the throttling water mark
> > in v16), as well as the IO scheduler based controller can be extended to
> > support absolute BW limits. The same for dm-ioband. I don't think
> > there're huge obstacle to merge the functionalities in this sense.
> 
> Yes, from technical point of view, one can implement a proportional BW
> controller at higher layer also. But that would practically mean almost
> re-implementing the CFQ logic at higher layer. Now why to get into all
> that complexity. Why not simply make CFQ hiearchical to also handle the
> groups?

Make CFQ aware of cgroups is very important also. I could be wrong, but
I don't think we shouldn't re-implement the same exact CFQ logic at
higher layers. CFQ dispatches IO requests, at higher layers applications
submit IO requests. We're talking about different things and applying
different logic doesn't sound too strange IMHO. I mean, at least we
should consider/test also this different approach before deciding drop
it.

This solution also guarantee no changes in the IO schedulers for those
who are not interested in using the cgroup IO controller. What is the
impact of the IO scheduler based controller for those users?

> 
> Secondly, think of following odd scenarios if we implement a higher level
> proportional BW controller which can offer the same feature as CFQ and
> also can handle group scheduling.
> 
> Case1:
> ======	 
>            (Higher level proportional BW controller)
> 			/dev/sda (CFQ)
> 
> So if somebody wants a group scheduling, we will be doing same IO control
> at two places (with-in group). Once at higher level and second time at CFQ
> level. Does not sound too logical to me.
> 
> Case2:
> ======
> 
>            (Higher level proportional BW controller)
> 			/dev/sda (NOOP)
> 	
> This is other extrememt. Lower level IO scheduler does not offer any kind
> of notion of class or prio with-in class and higher level scheduler will
> still be maintaining all the infrastructure unnecessarily.
> 
> That's why I get back to this simple question again, why not extend the
> IO schedulers to handle group scheduling and do both proportional BW and
> max bw control there.
> 
> > 
> > > 
> > > Andrea, last time you were planning to have a look at my patches and see
> > > if max bw controller can be implemented there. I got a feeling that it
> > > should not be too difficult to implement it there. We already have the
> > > hierarchical tree of io queues and groups in elevator layer and we run
> > > BFQ (WF2Q+) algorithm to select next queue to dispatch the IO from. It is
> > > just a matter of also keeping track of IO rate per queue/group and we should
> > > be easily be able to delay the dispatch of IO from a queue if its group has
> > > crossed the specified max bw.
> > 
> > Yes, sorry for my late, I quickly tested your patchset, but I still need
> > to understand many details of your solution. In the next days I'll
> > re-read everything carefully and I'll try to do a detailed review of
> > your patchset (just re-building the kernel with your patchset applied).
> > 
> 
> Sure. My patchset is still in the infancy stage. So don't expect great
> results. But it does highlight the idea and design very well.
> 
> > > 
> > > This should lead to less code and reduced complextiy (compared with the
> > > case where we do max bw control with io-throttling patches and proportional
> > > BW control using IO scheduler based control patches).
> > 
> > mmmh... changing the logic at the elevator and all IO schedulers doesn't
> > sound like reduced complexity and less code changed. With io-throttle we
> > just need to place the cgroup_io_throttle() hook in the right functions
> > where we want to apply throttling. This is a quite easy approach to
> > extend the IO control also to logical devices (more in general devices
> > that use their own make_request_fn) or even network-attached devices, as
> > well as networking filesystems, etc.
> > 
> > But I may be wrong. As I said I still need to review in the details your
> > solution.
> 
> Well I meant reduced code in the sense if we implement both max bw and
> proportional bw at IO scheduler level instead of proportional BW at
> IO scheduler and max bw at higher level.

OK.

> 
> I agree that doing max bw control at higher level has this advantage that
> it covers all the kind of deivces (higher level logical devices) and IO
> scheduler level solution does not do that. But this comes at the price
> of broken IO scheduler properties with-in cgroup.
> 
> Maybe we can then implement both. A higher level max bw controller and a
> max bw feature implemented along side proportional BW controller at IO
> scheduler level. Folks who use hardware RAID, or single disk devices can
> use max bw control of IO scheduler and those using software RAID devices
> can use higher level max bw controller.

OK, maybe.

> 
> > 
> > >  
> > > So do you think that it would make sense to do max BW control along with
> > > proportional weight IO controller at IO scheduler? If yes, then we can
> > > work together and continue to develop this patchset to also support max
> > > bw control and meet your requirements and drop the io-throttling patches.
> > 
> > It is surely worth to be explored. Honestly, I don't know if it would be
> > a better solution or not. Probably comparing some results with different
> > IO workloads is the best way to proceed and decide which is the right
> > way to go. This is necessary IMHO, before totally dropping one solution
> > or another.
> 
> Sure. My patches have started giving some basic results but because there
> is lot of work remaining before a fair comparison can be done on the
> basis of performance under various work loads. So some more time to
> go before we can do a fair comparison based on numbers.
>  
> > 
> > > 
> > > The only thing which concerns me is the fact that IO scheduler does not
> > > have the view of higher level logical device. So if somebody has setup a
> > > software RAID and wants to put max BW limit on software raid device, this
> > > solution will not work. One shall have to live with max bw limits on 
> > > individual disks (where io scheduler is actually running). Do your patches
> > > allow to put limit on software RAID devices also? 
> > 
> > No, but as said above my patchset provides the interfaces to apply the
> > IO control and accounting wherever we want. At the moment there's just
> > one interface, cgroup_io_throttle().
> 
> Sorry, I did not get it clearly. I guess I did not ask the question right.
> So lets say I got a setup where there are two phyical devices /dev/sda and
> /dev/sdb and I create a logical device (say using device mapper facilities)
> on top of these two physical disks. And some application is generating
> the IO for logical device lv0.
> 
> 				Appl
> 				 |
> 				lv0
> 			       /  \
> 			    sda	   sdb
> 
> 
> Where should I put the bandwidth limiting rules now for io-throtle. I 
> specify these for lv0 device or for sda and sdb devices?

The BW limiting rules would be applied into the make_request_fn provided
by the lv0 device. If it's not provided, before calling
generic_make_request(). A problem could be that the driver must be aware
of the particular lv0 device at that point.

> 
> Thanks
> Vivek

OK. I definitely need to look at your patchset before saying any other
opinion... :)

Thanks,
-Andrea

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06 21:34         ` Andrea Righi
@ 2009-05-06 21:52             ` Vivek Goyal
  0 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-06 21:52 UTC (permalink / raw)
  To: Andrea Righi
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Wed, May 06, 2009 at 11:34:54PM +0200, Andrea Righi wrote:
> On Wed, May 06, 2009 at 04:32:28PM -0400, Vivek Goyal wrote:
> > Hi Andrea and others,
> > 
> > I always had this doubt in mind that any kind of 2nd level controller will
> > have no idea about underlying IO scheduler queues/semantics. So while it
> > can implement a particular cgroup policy (max bw like io-throttle or
> > proportional bw like dm-ioband) but there are high chances that it will
> > break IO scheduler's semantics in one way or other.
> > 
> > I had already sent out the results for dm-ioband in a separate thread.
> > 
> > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg07258.html
> > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg07573.html
> > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08177.html
> > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08345.html
> > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08355.html
> > 
> > Here are some basic results with io-throttle. Andrea, please let me know
> > if you think this is procedural problem. Playing with io-throttle patches
> > for the first time.
> > 
> > I took V16 of your patches and trying it out with 2.6.30-rc4 with CFQ
> > scheduler.
> > 
> > I have got one SATA drive with one partition on it.
> > 
> > I am trying to create one cgroup and assignn 8MB/s limit to it and launch
> > on RT prio 0 task and one BE prio 7 task and see how this 8MB/s is divided
> > between these tasks. Following are the results.
> > 
> > Following is my test script.
> > 
> > *******************************************************************
> > #!/bin/bash
> > 
> > mount /dev/sdb1 /mnt/sdb
> > 
> > mount -t cgroup -o blockio blockio /cgroup/iot/
> > mkdir -p /cgroup/iot/test1 /cgroup/iot/test2
> > 
> > # Set bw limit of 8 MB/ps on sdb
> > echo "/dev/sdb:$((8 * 1024 * 1024)):0:0" >
> > /cgroup/iot/test1/blockio.bandwidth-max
> > 
> > sync
> > echo 3 > /proc/sys/vm/drop_caches
> > 
> > echo $$ > /cgroup/iot/test1/tasks
> > 
> > # Launch a normal prio reader.
> > ionice -c 2 -n 7 dd if=/mnt/sdb/zerofile1 of=/dev/zero &
> > pid1=$!
> > echo $pid1
> > 
> > # Launch an RT reader  
> > ionice -c 1 -n 0 dd if=/mnt/sdb/zerofile2 of=/dev/zero &
> > pid2=$!
> > echo $pid2
> > 
> > wait $pid2
> > echo "RT task finished"
> > **********************************************************************
> > 
> > Test1
> > =====
> > Test two readers (one RT class and one BE class) and see how BW is
> > allocated with-in cgroup
> > 
> > With io-throttle patches
> > ------------------------
> > - Two readers, first BE prio 7, second RT prio 0
> > 
> > 234179072 bytes (234 MB) copied, 55.8482 s, 4.2 MB/s
> > 234179072 bytes (234 MB) copied, 55.8975 s, 4.2 MB/s
> > RT task finished
> > 
> > Note: See, there is no difference in the performance of RT or BE task.
> > Looks like these got throttled equally.
> 
> OK, this is coherent with the current io-throttle implementation. IO
> requests are throttled without the concept of the ioprio model.
> 
> We could try to distribute the throttle using a function of each task's
> ioprio, but ok, the obvious drawback is that it totally breaks the logic
> used by the underlying layers.
> 
> BTW, I'm wondering, is it a very critical issue? I would say why not to
> move the RT task to a different cgroup with unlimited BW? or limited BW
> but with other tasks running at the same IO priority...

So one of hypothetical use case probably  could be following. Somebody
is having a hosted server and customers are going to get there
applications running in a particular cgroup with a limit on max bw.

			root
		  /      |      \
	     cust1      cust2   cust3
	   (20 MB/s)  (40MB/s)  (30MB/s)

Now all three customers will run their own applications/virtual machines
in their respective groups with upper limits. Will we say to these that
all your tasks will be considered as same class and same prio level.

Assume cust1 is running a hypothetical application which creates multiple
threads and assigns these threads different priorities based on its needs
at run time. How would we handle this thing?

You can't collect all the RT tasks from all customers and move these to a
single cgroup. Or ask customers to separate out their tasks based on
priority level and give them multiple groups of different priority.

> could the cgroup
> subsystem be a more flexible and customizable framework respect to the
> current ioprio model?
> 
> I'm not saying we have to ignore the problem, just trying to evaluate
> the impact and alternatives. And I'm still convinced that also providing
> per-cgroup ioprio would be an important feature.
> 
> > 
> > 
> > Without io-throttle patches
> > ----------------------------
> > - Two readers, first BE prio 7, second RT prio 0
> > 
> > 234179072 bytes (234 MB) copied, 2.81801 s, 83.1 MB/s
> > RT task finished
> > 234179072 bytes (234 MB) copied, 5.28238 s, 44.3 MB/s
> > 
> > Note: Because I can't limit the BW without io-throttle patches, so don't
> >       worry about increased BW. But the important point is that RT task
> >       gets much more BW than a BE prio 7 task.
> > 
> > Test2
> > ====
> > - Test 2 readers (One BE prio 0 and one BE prio 7) and see how BW is
> > distributed among these.
> > 
> > With io-throttle patches
> > ------------------------
> > - Two readers, first BE prio 7, second BE prio 0
> > 
> > 234179072 bytes (234 MB) copied, 55.8604 s, 4.2 MB/s
> > 234179072 bytes (234 MB) copied, 55.8918 s, 4.2 MB/s
> > High prio reader finished
> 
> Ditto.
> 
> > 
> > Without io-throttle patches
> > ---------------------------
> > - Two readers, first BE prio 7, second BE prio 0
> > 
> > 234179072 bytes (234 MB) copied, 4.12074 s, 56.8 MB/s
> > High prio reader finished
> > 234179072 bytes (234 MB) copied, 5.36023 s, 43.7 MB/s
> > 
> > Note: There is no service differentiation between prio 0 and prio 7 task
> >       with io-throttle patches.
> > 
> > Test 3
> > ======
> > - Run the one RT reader and one BE reader in root cgroup without any
> >   limitations. I guess this should mean unlimited BW and behavior should
> >   be same as with CFQ without io-throttling patches.
> > 
> > With io-throttle patches
> > =========================
> > Ran the test 4 times because I was getting different results in different
> > runs.
> > 
> > - Two readers, one RT prio 0  other BE prio 7
> > 
> > 234179072 bytes (234 MB) copied, 2.74604 s, 85.3 MB/s
> > 234179072 bytes (234 MB) copied, 5.20995 s, 44.9 MB/s
> > RT task finished
> > 
> > 234179072 bytes (234 MB) copied, 4.54417 s, 51.5 MB/s
> > RT task finished
> > 234179072 bytes (234 MB) copied, 5.23396 s, 44.7 MB/s
> > 
> > 234179072 bytes (234 MB) copied, 5.17727 s, 45.2 MB/s
> > RT task finished
> > 234179072 bytes (234 MB) copied, 5.25894 s, 44.5 MB/s
> > 
> > 234179072 bytes (234 MB) copied, 2.74141 s, 85.4 MB/s
> > 234179072 bytes (234 MB) copied, 5.20536 s, 45.0 MB/s
> > RT task finished
> > 
> > Note: Out of 4 runs, looks like twice it is complete priority inversion
> >       and RT task finished after BE task. Rest of the two times, the
> >       difference between BW of RT and BE task is much less as compared to
> >       without patches. In fact once it was almost same.
> 
> This is strange. If you don't set any limit there shouldn't be any
> difference respect to the other case (without io-throttle patches).
> 
> At worst a small overhead given by the task_to_iothrottle(), under
> rcu_read_lock(). I'll repeat this test ASAP and see if I'll be able to
> reproduce this strange behaviour.

Ya, I also found this strange. At least in root group there should not be
any behavior change (at max one might expect little drop in throughput
because of extra code).

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
@ 2009-05-06 21:52             ` Vivek Goyal
  0 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-06 21:52 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Andrew Morton, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, balbir, linux-kernel, containers,
	agk, dm-devel, snitzer, m-ikeda, peterz

On Wed, May 06, 2009 at 11:34:54PM +0200, Andrea Righi wrote:
> On Wed, May 06, 2009 at 04:32:28PM -0400, Vivek Goyal wrote:
> > Hi Andrea and others,
> > 
> > I always had this doubt in mind that any kind of 2nd level controller will
> > have no idea about underlying IO scheduler queues/semantics. So while it
> > can implement a particular cgroup policy (max bw like io-throttle or
> > proportional bw like dm-ioband) but there are high chances that it will
> > break IO scheduler's semantics in one way or other.
> > 
> > I had already sent out the results for dm-ioband in a separate thread.
> > 
> > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg07258.html
> > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg07573.html
> > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08177.html
> > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08345.html
> > http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08355.html
> > 
> > Here are some basic results with io-throttle. Andrea, please let me know
> > if you think this is procedural problem. Playing with io-throttle patches
> > for the first time.
> > 
> > I took V16 of your patches and trying it out with 2.6.30-rc4 with CFQ
> > scheduler.
> > 
> > I have got one SATA drive with one partition on it.
> > 
> > I am trying to create one cgroup and assignn 8MB/s limit to it and launch
> > on RT prio 0 task and one BE prio 7 task and see how this 8MB/s is divided
> > between these tasks. Following are the results.
> > 
> > Following is my test script.
> > 
> > *******************************************************************
> > #!/bin/bash
> > 
> > mount /dev/sdb1 /mnt/sdb
> > 
> > mount -t cgroup -o blockio blockio /cgroup/iot/
> > mkdir -p /cgroup/iot/test1 /cgroup/iot/test2
> > 
> > # Set bw limit of 8 MB/ps on sdb
> > echo "/dev/sdb:$((8 * 1024 * 1024)):0:0" >
> > /cgroup/iot/test1/blockio.bandwidth-max
> > 
> > sync
> > echo 3 > /proc/sys/vm/drop_caches
> > 
> > echo $$ > /cgroup/iot/test1/tasks
> > 
> > # Launch a normal prio reader.
> > ionice -c 2 -n 7 dd if=/mnt/sdb/zerofile1 of=/dev/zero &
> > pid1=$!
> > echo $pid1
> > 
> > # Launch an RT reader  
> > ionice -c 1 -n 0 dd if=/mnt/sdb/zerofile2 of=/dev/zero &
> > pid2=$!
> > echo $pid2
> > 
> > wait $pid2
> > echo "RT task finished"
> > **********************************************************************
> > 
> > Test1
> > =====
> > Test two readers (one RT class and one BE class) and see how BW is
> > allocated with-in cgroup
> > 
> > With io-throttle patches
> > ------------------------
> > - Two readers, first BE prio 7, second RT prio 0
> > 
> > 234179072 bytes (234 MB) copied, 55.8482 s, 4.2 MB/s
> > 234179072 bytes (234 MB) copied, 55.8975 s, 4.2 MB/s
> > RT task finished
> > 
> > Note: See, there is no difference in the performance of RT or BE task.
> > Looks like these got throttled equally.
> 
> OK, this is coherent with the current io-throttle implementation. IO
> requests are throttled without the concept of the ioprio model.
> 
> We could try to distribute the throttle using a function of each task's
> ioprio, but ok, the obvious drawback is that it totally breaks the logic
> used by the underlying layers.
> 
> BTW, I'm wondering, is it a very critical issue? I would say why not to
> move the RT task to a different cgroup with unlimited BW? or limited BW
> but with other tasks running at the same IO priority...

So one of hypothetical use case probably  could be following. Somebody
is having a hosted server and customers are going to get there
applications running in a particular cgroup with a limit on max bw.

			root
		  /      |      \
	     cust1      cust2   cust3
	   (20 MB/s)  (40MB/s)  (30MB/s)

Now all three customers will run their own applications/virtual machines
in their respective groups with upper limits. Will we say to these that
all your tasks will be considered as same class and same prio level.

Assume cust1 is running a hypothetical application which creates multiple
threads and assigns these threads different priorities based on its needs
at run time. How would we handle this thing?

You can't collect all the RT tasks from all customers and move these to a
single cgroup. Or ask customers to separate out their tasks based on
priority level and give them multiple groups of different priority.

> could the cgroup
> subsystem be a more flexible and customizable framework respect to the
> current ioprio model?
> 
> I'm not saying we have to ignore the problem, just trying to evaluate
> the impact and alternatives. And I'm still convinced that also providing
> per-cgroup ioprio would be an important feature.
> 
> > 
> > 
> > Without io-throttle patches
> > ----------------------------
> > - Two readers, first BE prio 7, second RT prio 0
> > 
> > 234179072 bytes (234 MB) copied, 2.81801 s, 83.1 MB/s
> > RT task finished
> > 234179072 bytes (234 MB) copied, 5.28238 s, 44.3 MB/s
> > 
> > Note: Because I can't limit the BW without io-throttle patches, so don't
> >       worry about increased BW. But the important point is that RT task
> >       gets much more BW than a BE prio 7 task.
> > 
> > Test2
> > ====
> > - Test 2 readers (One BE prio 0 and one BE prio 7) and see how BW is
> > distributed among these.
> > 
> > With io-throttle patches
> > ------------------------
> > - Two readers, first BE prio 7, second BE prio 0
> > 
> > 234179072 bytes (234 MB) copied, 55.8604 s, 4.2 MB/s
> > 234179072 bytes (234 MB) copied, 55.8918 s, 4.2 MB/s
> > High prio reader finished
> 
> Ditto.
> 
> > 
> > Without io-throttle patches
> > ---------------------------
> > - Two readers, first BE prio 7, second BE prio 0
> > 
> > 234179072 bytes (234 MB) copied, 4.12074 s, 56.8 MB/s
> > High prio reader finished
> > 234179072 bytes (234 MB) copied, 5.36023 s, 43.7 MB/s
> > 
> > Note: There is no service differentiation between prio 0 and prio 7 task
> >       with io-throttle patches.
> > 
> > Test 3
> > ======
> > - Run the one RT reader and one BE reader in root cgroup without any
> >   limitations. I guess this should mean unlimited BW and behavior should
> >   be same as with CFQ without io-throttling patches.
> > 
> > With io-throttle patches
> > =========================
> > Ran the test 4 times because I was getting different results in different
> > runs.
> > 
> > - Two readers, one RT prio 0  other BE prio 7
> > 
> > 234179072 bytes (234 MB) copied, 2.74604 s, 85.3 MB/s
> > 234179072 bytes (234 MB) copied, 5.20995 s, 44.9 MB/s
> > RT task finished
> > 
> > 234179072 bytes (234 MB) copied, 4.54417 s, 51.5 MB/s
> > RT task finished
> > 234179072 bytes (234 MB) copied, 5.23396 s, 44.7 MB/s
> > 
> > 234179072 bytes (234 MB) copied, 5.17727 s, 45.2 MB/s
> > RT task finished
> > 234179072 bytes (234 MB) copied, 5.25894 s, 44.5 MB/s
> > 
> > 234179072 bytes (234 MB) copied, 2.74141 s, 85.4 MB/s
> > 234179072 bytes (234 MB) copied, 5.20536 s, 45.0 MB/s
> > RT task finished
> > 
> > Note: Out of 4 runs, looks like twice it is complete priority inversion
> >       and RT task finished after BE task. Rest of the two times, the
> >       difference between BW of RT and BE task is much less as compared to
> >       without patches. In fact once it was almost same.
> 
> This is strange. If you don't set any limit there shouldn't be any
> difference respect to the other case (without io-throttle patches).
> 
> At worst a small overhead given by the task_to_iothrottle(), under
> rcu_read_lock(). I'll repeat this test ASAP and see if I'll be able to
> reproduce this strange behaviour.

Ya, I also found this strange. At least in root group there should not be
any behavior change (at max one might expect little drop in throughput
because of extra code).

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]         ` <20090506203228.GH8180-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2009-05-06 21:34           ` Andrea Righi
  0 siblings, 0 replies; 97+ messages in thread
From: Andrea Righi @ 2009-05-06 21:34 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Wed, May 06, 2009 at 04:32:28PM -0400, Vivek Goyal wrote:
> Hi Andrea and others,
> 
> I always had this doubt in mind that any kind of 2nd level controller will
> have no idea about underlying IO scheduler queues/semantics. So while it
> can implement a particular cgroup policy (max bw like io-throttle or
> proportional bw like dm-ioband) but there are high chances that it will
> break IO scheduler's semantics in one way or other.
> 
> I had already sent out the results for dm-ioband in a separate thread.
> 
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg07258.html
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg07573.html
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08177.html
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08345.html
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08355.html
> 
> Here are some basic results with io-throttle. Andrea, please let me know
> if you think this is procedural problem. Playing with io-throttle patches
> for the first time.
> 
> I took V16 of your patches and trying it out with 2.6.30-rc4 with CFQ
> scheduler.
> 
> I have got one SATA drive with one partition on it.
> 
> I am trying to create one cgroup and assignn 8MB/s limit to it and launch
> on RT prio 0 task and one BE prio 7 task and see how this 8MB/s is divided
> between these tasks. Following are the results.
> 
> Following is my test script.
> 
> *******************************************************************
> #!/bin/bash
> 
> mount /dev/sdb1 /mnt/sdb
> 
> mount -t cgroup -o blockio blockio /cgroup/iot/
> mkdir -p /cgroup/iot/test1 /cgroup/iot/test2
> 
> # Set bw limit of 8 MB/ps on sdb
> echo "/dev/sdb:$((8 * 1024 * 1024)):0:0" >
> /cgroup/iot/test1/blockio.bandwidth-max
> 
> sync
> echo 3 > /proc/sys/vm/drop_caches
> 
> echo $$ > /cgroup/iot/test1/tasks
> 
> # Launch a normal prio reader.
> ionice -c 2 -n 7 dd if=/mnt/sdb/zerofile1 of=/dev/zero &
> pid1=$!
> echo $pid1
> 
> # Launch an RT reader  
> ionice -c 1 -n 0 dd if=/mnt/sdb/zerofile2 of=/dev/zero &
> pid2=$!
> echo $pid2
> 
> wait $pid2
> echo "RT task finished"
> **********************************************************************
> 
> Test1
> =====
> Test two readers (one RT class and one BE class) and see how BW is
> allocated with-in cgroup
> 
> With io-throttle patches
> ------------------------
> - Two readers, first BE prio 7, second RT prio 0
> 
> 234179072 bytes (234 MB) copied, 55.8482 s, 4.2 MB/s
> 234179072 bytes (234 MB) copied, 55.8975 s, 4.2 MB/s
> RT task finished
> 
> Note: See, there is no difference in the performance of RT or BE task.
> Looks like these got throttled equally.

OK, this is coherent with the current io-throttle implementation. IO
requests are throttled without the concept of the ioprio model.

We could try to distribute the throttle using a function of each task's
ioprio, but ok, the obvious drawback is that it totally breaks the logic
used by the underlying layers.

BTW, I'm wondering, is it a very critical issue? I would say why not to
move the RT task to a different cgroup with unlimited BW? or limited BW
but with other tasks running at the same IO priority... could the cgroup
subsystem be a more flexible and customizable framework respect to the
current ioprio model?

I'm not saying we have to ignore the problem, just trying to evaluate
the impact and alternatives. And I'm still convinced that also providing
per-cgroup ioprio would be an important feature.

> 
> 
> Without io-throttle patches
> ----------------------------
> - Two readers, first BE prio 7, second RT prio 0
> 
> 234179072 bytes (234 MB) copied, 2.81801 s, 83.1 MB/s
> RT task finished
> 234179072 bytes (234 MB) copied, 5.28238 s, 44.3 MB/s
> 
> Note: Because I can't limit the BW without io-throttle patches, so don't
>       worry about increased BW. But the important point is that RT task
>       gets much more BW than a BE prio 7 task.
> 
> Test2
> ====
> - Test 2 readers (One BE prio 0 and one BE prio 7) and see how BW is
> distributed among these.
> 
> With io-throttle patches
> ------------------------
> - Two readers, first BE prio 7, second BE prio 0
> 
> 234179072 bytes (234 MB) copied, 55.8604 s, 4.2 MB/s
> 234179072 bytes (234 MB) copied, 55.8918 s, 4.2 MB/s
> High prio reader finished

Ditto.

> 
> Without io-throttle patches
> ---------------------------
> - Two readers, first BE prio 7, second BE prio 0
> 
> 234179072 bytes (234 MB) copied, 4.12074 s, 56.8 MB/s
> High prio reader finished
> 234179072 bytes (234 MB) copied, 5.36023 s, 43.7 MB/s
> 
> Note: There is no service differentiation between prio 0 and prio 7 task
>       with io-throttle patches.
> 
> Test 3
> ======
> - Run the one RT reader and one BE reader in root cgroup without any
>   limitations. I guess this should mean unlimited BW and behavior should
>   be same as with CFQ without io-throttling patches.
> 
> With io-throttle patches
> =========================
> Ran the test 4 times because I was getting different results in different
> runs.
> 
> - Two readers, one RT prio 0  other BE prio 7
> 
> 234179072 bytes (234 MB) copied, 2.74604 s, 85.3 MB/s
> 234179072 bytes (234 MB) copied, 5.20995 s, 44.9 MB/s
> RT task finished
> 
> 234179072 bytes (234 MB) copied, 4.54417 s, 51.5 MB/s
> RT task finished
> 234179072 bytes (234 MB) copied, 5.23396 s, 44.7 MB/s
> 
> 234179072 bytes (234 MB) copied, 5.17727 s, 45.2 MB/s
> RT task finished
> 234179072 bytes (234 MB) copied, 5.25894 s, 44.5 MB/s
> 
> 234179072 bytes (234 MB) copied, 2.74141 s, 85.4 MB/s
> 234179072 bytes (234 MB) copied, 5.20536 s, 45.0 MB/s
> RT task finished
> 
> Note: Out of 4 runs, looks like twice it is complete priority inversion
>       and RT task finished after BE task. Rest of the two times, the
>       difference between BW of RT and BE task is much less as compared to
>       without patches. In fact once it was almost same.

This is strange. If you don't set any limit there shouldn't be any
difference respect to the other case (without io-throttle patches).

At worst a small overhead given by the task_to_iothrottle(), under
rcu_read_lock(). I'll repeat this test ASAP and see if I'll be able to
reproduce this strange behaviour.

> 
> Without io-throttle patches.
> ===========================
> - Two readers, one RT prio 0  other BE prio 7 (4 runs)
> 
> 234179072 bytes (234 MB) copied, 2.80988 s, 83.3 MB/s
> RT task finished
> 234179072 bytes (234 MB) copied, 5.28228 s, 44.3 MB/s
> 
> 234179072 bytes (234 MB) copied, 2.80659 s, 83.4 MB/s
> RT task finished
> 234179072 bytes (234 MB) copied, 5.27874 s, 44.4 MB/s
> 
> 234179072 bytes (234 MB) copied, 2.79601 s, 83.8 MB/s
> RT task finished
> 234179072 bytes (234 MB) copied, 5.2542 s, 44.6 MB/s
> 
> 234179072 bytes (234 MB) copied, 2.78764 s, 84.0 MB/s
> RT task finished
> 234179072 bytes (234 MB) copied, 5.26009 s, 44.5 MB/s
> 
> Note, How consistent the behavior is without io-throttle patches.
> 
> In summary, I think a 2nd level solution can ensure one policy on cgroups but
> it will break other semantics/properties of IO scheduler with-in cgroup as
> 2nd level solution has no idea at run time what is the IO scheduler running
> underneath and what kind of properties it has.
> 
> Andrea, please try it on your setup and see if you get similar results
> on or. Hopefully it is not a configuration or test procedure issue on my
> side.
> 
> Thanks
> Vivek
> 
> > The only thing which concerns me is the fact that IO scheduler does not
> > have the view of higher level logical device. So if somebody has setup a
> > software RAID and wants to put max BW limit on software raid device, this
> > solution will not work. One shall have to live with max bw limits on 
> > individual disks (where io scheduler is actually running). Do your patches
> > allow to put limit on software RAID devices also? 
> > 
> > Ryo, dm-ioband breaks the notion of classes and priority of CFQ because
> > of FIFO dispatch of buffered bios. Apart from that it tries to provide
> > fairness in terms of actual IO done and that would mean a seeky workload
> > will can use disk for much longer to get equivalent IO done and slow down
> > other applications. Implementing IO controller at IO scheduler level gives
> > us tigher control. Will it not meet your requirements? If you got specific
> > concerns with IO scheduler based contol patches, please highlight these and
> > we will see how these can be addressed.
> > 
> > Thanks
> > Vivek

-Andrea

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06 20:32       ` Vivek Goyal
       [not found]         ` <20090506203228.GH8180-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2009-05-06 21:34         ` Andrea Righi
  2009-05-06 21:52             ` Vivek Goyal
  1 sibling, 1 reply; 97+ messages in thread
From: Andrea Righi @ 2009-05-06 21:34 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Andrew Morton, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, balbir, linux-kernel, containers,
	agk, dm-devel, snitzer, m-ikeda, peterz

On Wed, May 06, 2009 at 04:32:28PM -0400, Vivek Goyal wrote:
> Hi Andrea and others,
> 
> I always had this doubt in mind that any kind of 2nd level controller will
> have no idea about underlying IO scheduler queues/semantics. So while it
> can implement a particular cgroup policy (max bw like io-throttle or
> proportional bw like dm-ioband) but there are high chances that it will
> break IO scheduler's semantics in one way or other.
> 
> I had already sent out the results for dm-ioband in a separate thread.
> 
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg07258.html
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg07573.html
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08177.html
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08345.html
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08355.html
> 
> Here are some basic results with io-throttle. Andrea, please let me know
> if you think this is procedural problem. Playing with io-throttle patches
> for the first time.
> 
> I took V16 of your patches and trying it out with 2.6.30-rc4 with CFQ
> scheduler.
> 
> I have got one SATA drive with one partition on it.
> 
> I am trying to create one cgroup and assignn 8MB/s limit to it and launch
> on RT prio 0 task and one BE prio 7 task and see how this 8MB/s is divided
> between these tasks. Following are the results.
> 
> Following is my test script.
> 
> *******************************************************************
> #!/bin/bash
> 
> mount /dev/sdb1 /mnt/sdb
> 
> mount -t cgroup -o blockio blockio /cgroup/iot/
> mkdir -p /cgroup/iot/test1 /cgroup/iot/test2
> 
> # Set bw limit of 8 MB/ps on sdb
> echo "/dev/sdb:$((8 * 1024 * 1024)):0:0" >
> /cgroup/iot/test1/blockio.bandwidth-max
> 
> sync
> echo 3 > /proc/sys/vm/drop_caches
> 
> echo $$ > /cgroup/iot/test1/tasks
> 
> # Launch a normal prio reader.
> ionice -c 2 -n 7 dd if=/mnt/sdb/zerofile1 of=/dev/zero &
> pid1=$!
> echo $pid1
> 
> # Launch an RT reader  
> ionice -c 1 -n 0 dd if=/mnt/sdb/zerofile2 of=/dev/zero &
> pid2=$!
> echo $pid2
> 
> wait $pid2
> echo "RT task finished"
> **********************************************************************
> 
> Test1
> =====
> Test two readers (one RT class and one BE class) and see how BW is
> allocated with-in cgroup
> 
> With io-throttle patches
> ------------------------
> - Two readers, first BE prio 7, second RT prio 0
> 
> 234179072 bytes (234 MB) copied, 55.8482 s, 4.2 MB/s
> 234179072 bytes (234 MB) copied, 55.8975 s, 4.2 MB/s
> RT task finished
> 
> Note: See, there is no difference in the performance of RT or BE task.
> Looks like these got throttled equally.

OK, this is coherent with the current io-throttle implementation. IO
requests are throttled without the concept of the ioprio model.

We could try to distribute the throttle using a function of each task's
ioprio, but ok, the obvious drawback is that it totally breaks the logic
used by the underlying layers.

BTW, I'm wondering, is it a very critical issue? I would say why not to
move the RT task to a different cgroup with unlimited BW? or limited BW
but with other tasks running at the same IO priority... could the cgroup
subsystem be a more flexible and customizable framework respect to the
current ioprio model?

I'm not saying we have to ignore the problem, just trying to evaluate
the impact and alternatives. And I'm still convinced that also providing
per-cgroup ioprio would be an important feature.

> 
> 
> Without io-throttle patches
> ----------------------------
> - Two readers, first BE prio 7, second RT prio 0
> 
> 234179072 bytes (234 MB) copied, 2.81801 s, 83.1 MB/s
> RT task finished
> 234179072 bytes (234 MB) copied, 5.28238 s, 44.3 MB/s
> 
> Note: Because I can't limit the BW without io-throttle patches, so don't
>       worry about increased BW. But the important point is that RT task
>       gets much more BW than a BE prio 7 task.
> 
> Test2
> ====
> - Test 2 readers (One BE prio 0 and one BE prio 7) and see how BW is
> distributed among these.
> 
> With io-throttle patches
> ------------------------
> - Two readers, first BE prio 7, second BE prio 0
> 
> 234179072 bytes (234 MB) copied, 55.8604 s, 4.2 MB/s
> 234179072 bytes (234 MB) copied, 55.8918 s, 4.2 MB/s
> High prio reader finished

Ditto.

> 
> Without io-throttle patches
> ---------------------------
> - Two readers, first BE prio 7, second BE prio 0
> 
> 234179072 bytes (234 MB) copied, 4.12074 s, 56.8 MB/s
> High prio reader finished
> 234179072 bytes (234 MB) copied, 5.36023 s, 43.7 MB/s
> 
> Note: There is no service differentiation between prio 0 and prio 7 task
>       with io-throttle patches.
> 
> Test 3
> ======
> - Run the one RT reader and one BE reader in root cgroup without any
>   limitations. I guess this should mean unlimited BW and behavior should
>   be same as with CFQ without io-throttling patches.
> 
> With io-throttle patches
> =========================
> Ran the test 4 times because I was getting different results in different
> runs.
> 
> - Two readers, one RT prio 0  other BE prio 7
> 
> 234179072 bytes (234 MB) copied, 2.74604 s, 85.3 MB/s
> 234179072 bytes (234 MB) copied, 5.20995 s, 44.9 MB/s
> RT task finished
> 
> 234179072 bytes (234 MB) copied, 4.54417 s, 51.5 MB/s
> RT task finished
> 234179072 bytes (234 MB) copied, 5.23396 s, 44.7 MB/s
> 
> 234179072 bytes (234 MB) copied, 5.17727 s, 45.2 MB/s
> RT task finished
> 234179072 bytes (234 MB) copied, 5.25894 s, 44.5 MB/s
> 
> 234179072 bytes (234 MB) copied, 2.74141 s, 85.4 MB/s
> 234179072 bytes (234 MB) copied, 5.20536 s, 45.0 MB/s
> RT task finished
> 
> Note: Out of 4 runs, looks like twice it is complete priority inversion
>       and RT task finished after BE task. Rest of the two times, the
>       difference between BW of RT and BE task is much less as compared to
>       without patches. In fact once it was almost same.

This is strange. If you don't set any limit there shouldn't be any
difference respect to the other case (without io-throttle patches).

At worst a small overhead given by the task_to_iothrottle(), under
rcu_read_lock(). I'll repeat this test ASAP and see if I'll be able to
reproduce this strange behaviour.

> 
> Without io-throttle patches.
> ===========================
> - Two readers, one RT prio 0  other BE prio 7 (4 runs)
> 
> 234179072 bytes (234 MB) copied, 2.80988 s, 83.3 MB/s
> RT task finished
> 234179072 bytes (234 MB) copied, 5.28228 s, 44.3 MB/s
> 
> 234179072 bytes (234 MB) copied, 2.80659 s, 83.4 MB/s
> RT task finished
> 234179072 bytes (234 MB) copied, 5.27874 s, 44.4 MB/s
> 
> 234179072 bytes (234 MB) copied, 2.79601 s, 83.8 MB/s
> RT task finished
> 234179072 bytes (234 MB) copied, 5.2542 s, 44.6 MB/s
> 
> 234179072 bytes (234 MB) copied, 2.78764 s, 84.0 MB/s
> RT task finished
> 234179072 bytes (234 MB) copied, 5.26009 s, 44.5 MB/s
> 
> Note, How consistent the behavior is without io-throttle patches.
> 
> In summary, I think a 2nd level solution can ensure one policy on cgroups but
> it will break other semantics/properties of IO scheduler with-in cgroup as
> 2nd level solution has no idea at run time what is the IO scheduler running
> underneath and what kind of properties it has.
> 
> Andrea, please try it on your setup and see if you get similar results
> on or. Hopefully it is not a configuration or test procedure issue on my
> side.
> 
> Thanks
> Vivek
> 
> > The only thing which concerns me is the fact that IO scheduler does not
> > have the view of higher level logical device. So if somebody has setup a
> > software RAID and wants to put max BW limit on software raid device, this
> > solution will not work. One shall have to live with max bw limits on 
> > individual disks (where io scheduler is actually running). Do your patches
> > allow to put limit on software RAID devices also? 
> > 
> > Ryo, dm-ioband breaks the notion of classes and priority of CFQ because
> > of FIFO dispatch of buffered bios. Apart from that it tries to provide
> > fairness in terms of actual IO done and that would mean a seeky workload
> > will can use disk for much longer to get equivalent IO done and slow down
> > other applications. Implementing IO controller at IO scheduler level gives
> > us tigher control. Will it not meet your requirements? If you got specific
> > concerns with IO scheduler based contol patches, please highlight these and
> > we will see how these can be addressed.
> > 
> > Thanks
> > Vivek

-Andrea


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06 20:07       ` Andrea Righi
@ 2009-05-06 21:21         ` Vivek Goyal
  2009-05-06 21:21         ` Vivek Goyal
  1 sibling, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-06 21:21 UTC (permalink / raw)
  To: Andrea Righi
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Wed, May 06, 2009 at 10:07:53PM +0200, Andrea Righi wrote:
> On Tue, May 05, 2009 at 10:33:32PM -0400, Vivek Goyal wrote:
> > On Tue, May 05, 2009 at 01:24:41PM -0700, Andrew Morton wrote:
> > > On Tue,  5 May 2009 15:58:27 -0400
> > > Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > 
> > > > 
> > > > Hi All,
> > > > 
> > > > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > > > ...
> > > > Currently primarily two other IO controller proposals are out there.
> > > > 
> > > > dm-ioband
> > > > ---------
> > > > This patch set is from Ryo Tsuruta from valinux.
> > > > ...
> > > > IO-throttling
> > > > -------------
> > > > This patch set is from Andrea Righi provides max bandwidth controller.
> > > 
> > > I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> > > 
> > > Seriously, how are we to resolve this?  We could lock me in a room and
> > > cmoe back in 15 days, but there's no reason to believe that I'd emerge
> > > with the best answer.
> > > 
> > > I tend to think that a cgroup-based controller is the way to go. 
> > > Anything else will need to be wired up to cgroups _anyway_, and that
> > > might end up messy.
> > 
> > Hi Andrew,
> > 
> > Sorry, did not get what do you mean by cgroup based controller? If you
> > mean that we use cgroups for grouping tasks for controlling IO, then both
> > IO scheduler based controller as well as io throttling proposal do that.
> > dm-ioband also supports that up to some extent but it requires extra step of
> > transferring cgroup grouping information to dm-ioband device using dm-tools.
> > 
> > But if you meant that io-throttle patches, then I think it solves only
> > part of the problem and that is max bw control. It does not offer minimum
> > BW/minimum disk share gurantees as offered by proportional BW control.
> > 
> > IOW, it supports upper limit control and does not support a work conserving
> > IO controller which lets a group use the whole BW if competing groups are
> > not present. IMHO, proportional BW control is an important feature which
> > we will need and IIUC, io-throttle patches can't be easily extended to support
> > proportional BW control, OTOH, one should be able to extend IO scheduler
> > based proportional weight controller to also support max bw control. 
> 
> Well, IMHO the big concern is at which level we want to implement the
> logic of control: IO scheduler, when the IO requests are already
> submitted and need to be dispatched, or at high level when the
> applications generates IO requests (or maybe both).
> 
> And, as pointed by Andrew, do everything by a cgroup-based controller.

I am not sure what's the rationale behind that. Why to do it at higher
layer? Doing it at IO scheduler layer will make sure that one does not
breaks the IO scheduler's properties with-in cgroup. (See my other mail
with some io-throttling test results).

The advantage of higher layer mechanism is that it can also cover software
RAID devices well. 

> 
> The other features, proportional BW, throttling, take the current ioprio
> model in account, etc. are implementation details and any of the
> proposed solutions can be extended to support all these features. I
> mean, io-throttle can be extended to support proportional BW (for a
> certain perspective it is already provided by the throttling water mark
> in v16), as well as the IO scheduler based controller can be extended to
> support absolute BW limits. The same for dm-ioband. I don't think
> there're huge obstacle to merge the functionalities in this sense.

Yes, from technical point of view, one can implement a proportional BW
controller at higher layer also. But that would practically mean almost
re-implementing the CFQ logic at higher layer. Now why to get into all
that complexity. Why not simply make CFQ hiearchical to also handle the
groups?

Secondly, think of following odd scenarios if we implement a higher level
proportional BW controller which can offer the same feature as CFQ and
also can handle group scheduling.

Case1:
======	 
           (Higher level proportional BW controller)
			/dev/sda (CFQ)

So if somebody wants a group scheduling, we will be doing same IO control
at two places (with-in group). Once at higher level and second time at CFQ
level. Does not sound too logical to me.

Case2:
======

           (Higher level proportional BW controller)
			/dev/sda (NOOP)
	
This is other extrememt. Lower level IO scheduler does not offer any kind
of notion of class or prio with-in class and higher level scheduler will
still be maintaining all the infrastructure unnecessarily.

That's why I get back to this simple question again, why not extend the
IO schedulers to handle group scheduling and do both proportional BW and
max bw control there.

> 
> > 
> > Andrea, last time you were planning to have a look at my patches and see
> > if max bw controller can be implemented there. I got a feeling that it
> > should not be too difficult to implement it there. We already have the
> > hierarchical tree of io queues and groups in elevator layer and we run
> > BFQ (WF2Q+) algorithm to select next queue to dispatch the IO from. It is
> > just a matter of also keeping track of IO rate per queue/group and we should
> > be easily be able to delay the dispatch of IO from a queue if its group has
> > crossed the specified max bw.
> 
> Yes, sorry for my late, I quickly tested your patchset, but I still need
> to understand many details of your solution. In the next days I'll
> re-read everything carefully and I'll try to do a detailed review of
> your patchset (just re-building the kernel with your patchset applied).
> 

Sure. My patchset is still in the infancy stage. So don't expect great
results. But it does highlight the idea and design very well.

> > 
> > This should lead to less code and reduced complextiy (compared with the
> > case where we do max bw control with io-throttling patches and proportional
> > BW control using IO scheduler based control patches).
> 
> mmmh... changing the logic at the elevator and all IO schedulers doesn't
> sound like reduced complexity and less code changed. With io-throttle we
> just need to place the cgroup_io_throttle() hook in the right functions
> where we want to apply throttling. This is a quite easy approach to
> extend the IO control also to logical devices (more in general devices
> that use their own make_request_fn) or even network-attached devices, as
> well as networking filesystems, etc.
> 
> But I may be wrong. As I said I still need to review in the details your
> solution.

Well I meant reduced code in the sense if we implement both max bw and
proportional bw at IO scheduler level instead of proportional BW at
IO scheduler and max bw at higher level.

I agree that doing max bw control at higher level has this advantage that
it covers all the kind of deivces (higher level logical devices) and IO
scheduler level solution does not do that. But this comes at the price
of broken IO scheduler properties with-in cgroup.

Maybe we can then implement both. A higher level max bw controller and a
max bw feature implemented along side proportional BW controller at IO
scheduler level. Folks who use hardware RAID, or single disk devices can
use max bw control of IO scheduler and those using software RAID devices
can use higher level max bw controller.

> 
> >  
> > So do you think that it would make sense to do max BW control along with
> > proportional weight IO controller at IO scheduler? If yes, then we can
> > work together and continue to develop this patchset to also support max
> > bw control and meet your requirements and drop the io-throttling patches.
> 
> It is surely worth to be explored. Honestly, I don't know if it would be
> a better solution or not. Probably comparing some results with different
> IO workloads is the best way to proceed and decide which is the right
> way to go. This is necessary IMHO, before totally dropping one solution
> or another.

Sure. My patches have started giving some basic results but because there
is lot of work remaining before a fair comparison can be done on the
basis of performance under various work loads. So some more time to
go before we can do a fair comparison based on numbers.
 
> 
> > 
> > The only thing which concerns me is the fact that IO scheduler does not
> > have the view of higher level logical device. So if somebody has setup a
> > software RAID and wants to put max BW limit on software raid device, this
> > solution will not work. One shall have to live with max bw limits on 
> > individual disks (where io scheduler is actually running). Do your patches
> > allow to put limit on software RAID devices also? 
> 
> No, but as said above my patchset provides the interfaces to apply the
> IO control and accounting wherever we want. At the moment there's just
> one interface, cgroup_io_throttle().

Sorry, I did not get it clearly. I guess I did not ask the question right.
So lets say I got a setup where there are two phyical devices /dev/sda and
/dev/sdb and I create a logical device (say using device mapper facilities)
on top of these two physical disks. And some application is generating
the IO for logical device lv0.

				Appl
				 |
				lv0
			       /  \
			    sda	   sdb


Where should I put the bandwidth limiting rules now for io-throtle. I 
specify these for lv0 device or for sda and sdb devices?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06 20:07       ` Andrea Righi
  2009-05-06 21:21         ` Vivek Goyal
@ 2009-05-06 21:21         ` Vivek Goyal
       [not found]           ` <20090506212121.GI8180-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 97+ messages in thread
From: Vivek Goyal @ 2009-05-06 21:21 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Andrew Morton, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, balbir, linux-kernel, containers,
	agk, dm-devel, snitzer, m-ikeda, peterz

On Wed, May 06, 2009 at 10:07:53PM +0200, Andrea Righi wrote:
> On Tue, May 05, 2009 at 10:33:32PM -0400, Vivek Goyal wrote:
> > On Tue, May 05, 2009 at 01:24:41PM -0700, Andrew Morton wrote:
> > > On Tue,  5 May 2009 15:58:27 -0400
> > > Vivek Goyal <vgoyal@redhat.com> wrote:
> > > 
> > > > 
> > > > Hi All,
> > > > 
> > > > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > > > ...
> > > > Currently primarily two other IO controller proposals are out there.
> > > > 
> > > > dm-ioband
> > > > ---------
> > > > This patch set is from Ryo Tsuruta from valinux.
> > > > ...
> > > > IO-throttling
> > > > -------------
> > > > This patch set is from Andrea Righi provides max bandwidth controller.
> > > 
> > > I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> > > 
> > > Seriously, how are we to resolve this?  We could lock me in a room and
> > > cmoe back in 15 days, but there's no reason to believe that I'd emerge
> > > with the best answer.
> > > 
> > > I tend to think that a cgroup-based controller is the way to go. 
> > > Anything else will need to be wired up to cgroups _anyway_, and that
> > > might end up messy.
> > 
> > Hi Andrew,
> > 
> > Sorry, did not get what do you mean by cgroup based controller? If you
> > mean that we use cgroups for grouping tasks for controlling IO, then both
> > IO scheduler based controller as well as io throttling proposal do that.
> > dm-ioband also supports that up to some extent but it requires extra step of
> > transferring cgroup grouping information to dm-ioband device using dm-tools.
> > 
> > But if you meant that io-throttle patches, then I think it solves only
> > part of the problem and that is max bw control. It does not offer minimum
> > BW/minimum disk share gurantees as offered by proportional BW control.
> > 
> > IOW, it supports upper limit control and does not support a work conserving
> > IO controller which lets a group use the whole BW if competing groups are
> > not present. IMHO, proportional BW control is an important feature which
> > we will need and IIUC, io-throttle patches can't be easily extended to support
> > proportional BW control, OTOH, one should be able to extend IO scheduler
> > based proportional weight controller to also support max bw control. 
> 
> Well, IMHO the big concern is at which level we want to implement the
> logic of control: IO scheduler, when the IO requests are already
> submitted and need to be dispatched, or at high level when the
> applications generates IO requests (or maybe both).
> 
> And, as pointed by Andrew, do everything by a cgroup-based controller.

I am not sure what's the rationale behind that. Why to do it at higher
layer? Doing it at IO scheduler layer will make sure that one does not
breaks the IO scheduler's properties with-in cgroup. (See my other mail
with some io-throttling test results).

The advantage of higher layer mechanism is that it can also cover software
RAID devices well. 

> 
> The other features, proportional BW, throttling, take the current ioprio
> model in account, etc. are implementation details and any of the
> proposed solutions can be extended to support all these features. I
> mean, io-throttle can be extended to support proportional BW (for a
> certain perspective it is already provided by the throttling water mark
> in v16), as well as the IO scheduler based controller can be extended to
> support absolute BW limits. The same for dm-ioband. I don't think
> there're huge obstacle to merge the functionalities in this sense.

Yes, from technical point of view, one can implement a proportional BW
controller at higher layer also. But that would practically mean almost
re-implementing the CFQ logic at higher layer. Now why to get into all
that complexity. Why not simply make CFQ hiearchical to also handle the
groups?

Secondly, think of following odd scenarios if we implement a higher level
proportional BW controller which can offer the same feature as CFQ and
also can handle group scheduling.

Case1:
======	 
           (Higher level proportional BW controller)
			/dev/sda (CFQ)

So if somebody wants a group scheduling, we will be doing same IO control
at two places (with-in group). Once at higher level and second time at CFQ
level. Does not sound too logical to me.

Case2:
======

           (Higher level proportional BW controller)
			/dev/sda (NOOP)
	
This is other extrememt. Lower level IO scheduler does not offer any kind
of notion of class or prio with-in class and higher level scheduler will
still be maintaining all the infrastructure unnecessarily.

That's why I get back to this simple question again, why not extend the
IO schedulers to handle group scheduling and do both proportional BW and
max bw control there.

> 
> > 
> > Andrea, last time you were planning to have a look at my patches and see
> > if max bw controller can be implemented there. I got a feeling that it
> > should not be too difficult to implement it there. We already have the
> > hierarchical tree of io queues and groups in elevator layer and we run
> > BFQ (WF2Q+) algorithm to select next queue to dispatch the IO from. It is
> > just a matter of also keeping track of IO rate per queue/group and we should
> > be easily be able to delay the dispatch of IO from a queue if its group has
> > crossed the specified max bw.
> 
> Yes, sorry for my late, I quickly tested your patchset, but I still need
> to understand many details of your solution. In the next days I'll
> re-read everything carefully and I'll try to do a detailed review of
> your patchset (just re-building the kernel with your patchset applied).
> 

Sure. My patchset is still in the infancy stage. So don't expect great
results. But it does highlight the idea and design very well.

> > 
> > This should lead to less code and reduced complextiy (compared with the
> > case where we do max bw control with io-throttling patches and proportional
> > BW control using IO scheduler based control patches).
> 
> mmmh... changing the logic at the elevator and all IO schedulers doesn't
> sound like reduced complexity and less code changed. With io-throttle we
> just need to place the cgroup_io_throttle() hook in the right functions
> where we want to apply throttling. This is a quite easy approach to
> extend the IO control also to logical devices (more in general devices
> that use their own make_request_fn) or even network-attached devices, as
> well as networking filesystems, etc.
> 
> But I may be wrong. As I said I still need to review in the details your
> solution.

Well I meant reduced code in the sense if we implement both max bw and
proportional bw at IO scheduler level instead of proportional BW at
IO scheduler and max bw at higher level.

I agree that doing max bw control at higher level has this advantage that
it covers all the kind of deivces (higher level logical devices) and IO
scheduler level solution does not do that. But this comes at the price
of broken IO scheduler properties with-in cgroup.

Maybe we can then implement both. A higher level max bw controller and a
max bw feature implemented along side proportional BW controller at IO
scheduler level. Folks who use hardware RAID, or single disk devices can
use max bw control of IO scheduler and those using software RAID devices
can use higher level max bw controller.

> 
> >  
> > So do you think that it would make sense to do max BW control along with
> > proportional weight IO controller at IO scheduler? If yes, then we can
> > work together and continue to develop this patchset to also support max
> > bw control and meet your requirements and drop the io-throttling patches.
> 
> It is surely worth to be explored. Honestly, I don't know if it would be
> a better solution or not. Probably comparing some results with different
> IO workloads is the best way to proceed and decide which is the right
> way to go. This is necessary IMHO, before totally dropping one solution
> or another.

Sure. My patches have started giving some basic results but because there
is lot of work remaining before a fair comparison can be done on the
basis of performance under various work loads. So some more time to
go before we can do a fair comparison based on numbers.
 
> 
> > 
> > The only thing which concerns me is the fact that IO scheduler does not
> > have the view of higher level logical device. So if somebody has setup a
> > software RAID and wants to put max BW limit on software raid device, this
> > solution will not work. One shall have to live with max bw limits on 
> > individual disks (where io scheduler is actually running). Do your patches
> > allow to put limit on software RAID devices also? 
> 
> No, but as said above my patchset provides the interfaces to apply the
> IO control and accounting wherever we want. At the moment there's just
> one interface, cgroup_io_throttle().

Sorry, I did not get it clearly. I guess I did not ask the question right.
So lets say I got a setup where there are two phyical devices /dev/sda and
/dev/sdb and I create a logical device (say using device mapper facilities)
on top of these two physical disks. And some application is generating
the IO for logical device lv0.

				Appl
				 |
				lv0
			       /  \
			    sda	   sdb


Where should I put the bandwidth limiting rules now for io-throtle. I 
specify these for lv0 device or for sda and sdb devices?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]         ` <20090506034254.GD4416-SINUvgVNF2CyUtPGxGje5AC/G2K4zDHf@public.gmane.org>
  2009-05-06 10:20           ` Fabio Checconi
  2009-05-06 18:47           ` Divyesh Shah
@ 2009-05-06 20:42           ` Andrea Righi
  2 siblings, 0 replies; 97+ messages in thread
From: Andrea Righi @ 2009-05-06 20:42 UTC (permalink / raw)
  To: Balbir Singh
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Wed, May 06, 2009 at 09:12:54AM +0530, Balbir Singh wrote:
> * Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> [2009-05-06 00:20:49]:
> 
> > On Tue, 2009-05-05 at 13:24 -0700, Andrew Morton wrote:
> > > On Tue,  5 May 2009 15:58:27 -0400
> > > Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > 
> > > > 
> > > > Hi All,
> > > > 
> > > > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > > > ...
> > > > Currently primarily two other IO controller proposals are out there.
> > > > 
> > > > dm-ioband
> > > > ---------
> > > > This patch set is from Ryo Tsuruta from valinux.
> > > > ...
> > > > IO-throttling
> > > > -------------
> > > > This patch set is from Andrea Righi provides max bandwidth controller.
> > > 
> > > I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> > > 
> > > Seriously, how are we to resolve this?  We could lock me in a room and
> > > cmoe back in 15 days, but there's no reason to believe that I'd emerge
> > > with the best answer.
> > > 
> > > I tend to think that a cgroup-based controller is the way to go. 
> > > Anything else will need to be wired up to cgroups _anyway_, and that
> > > might end up messy.
> > 
> > FWIW I subscribe to the io-scheduler faith as opposed to the
> > device-mapper cult ;-)
> > 
> > Also, I don't think a simple throttle will be very useful, a more mature
> > solution should cater to more use cases.
> >
> 
> I tend to agree, unless Andrea can prove us wrong. I don't think
> throttling a task (not letting it consume CPU, memory when its IO
> quota is exceeded) is a good idea. I've asked that question to Andrea
> a few times, but got no response.

Sorry Balbir, I probably missed your question. Or replied in a different
thread maybe...

Actually we could allow an offending cgroup to continue to submit IO
requests without throttling it directly. But if we don't want to waste
the memory with pending IO requests or pending writeback pages, we need
to block it sooner or later.

Instead of directly throttle the offending applications, we could block
them when we hit a max limit of requests or dirty pages, i.e. something
like congestion_wait(), but that's the same, no? the difference is that
in this case throttling is asynchronous. Or am I oversimplifying it?

As an example, with writeback IO io-throttle doesn't throttle the IO
requests directly, each request instead receives a deadline (depending
on the BW limit) and it's added into a rbtree. Then all the requests are
dispatched asynchronously using a kernel thread (kiothrottled) only when
the deadline is expired.

OK, there's a lot of space for improvements: provide many kernel threads
per block device, multiple queues/rbtrees, etc., but this is actually a
way to apply throttling asynchronously. The fact is that if I don't
apply the throttling also in balance_dirty_pages() (and I did so in the
last io-throttle version) or add a max limit of requests the rbtree
increases indefinitely...

That should be very similar to the proportional BW solution allocating a
quota of nr_requests per block device and per cgroup.

-Andrea

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06  3:42       ` Balbir Singh
                           ` (2 preceding siblings ...)
       [not found]         ` <20090506034254.GD4416-SINUvgVNF2CyUtPGxGje5AC/G2K4zDHf@public.gmane.org>
@ 2009-05-06 20:42         ` Andrea Righi
  3 siblings, 0 replies; 97+ messages in thread
From: Andrea Righi @ 2009-05-06 20:42 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Peter Zijlstra, Andrew Morton, Vivek Goyal, nauman, dpshah, lizf,
	mikew, fchecconi, paolo.valente, jens.axboe, ryov, fernando,
	s-uchida, taka, guijianfeng, jmoyer, dhaval, linux-kernel,
	containers, agk, dm-devel, snitzer, m-ikeda

On Wed, May 06, 2009 at 09:12:54AM +0530, Balbir Singh wrote:
> * Peter Zijlstra <peterz@infradead.org> [2009-05-06 00:20:49]:
> 
> > On Tue, 2009-05-05 at 13:24 -0700, Andrew Morton wrote:
> > > On Tue,  5 May 2009 15:58:27 -0400
> > > Vivek Goyal <vgoyal@redhat.com> wrote:
> > > 
> > > > 
> > > > Hi All,
> > > > 
> > > > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > > > ...
> > > > Currently primarily two other IO controller proposals are out there.
> > > > 
> > > > dm-ioband
> > > > ---------
> > > > This patch set is from Ryo Tsuruta from valinux.
> > > > ...
> > > > IO-throttling
> > > > -------------
> > > > This patch set is from Andrea Righi provides max bandwidth controller.
> > > 
> > > I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> > > 
> > > Seriously, how are we to resolve this?  We could lock me in a room and
> > > cmoe back in 15 days, but there's no reason to believe that I'd emerge
> > > with the best answer.
> > > 
> > > I tend to think that a cgroup-based controller is the way to go. 
> > > Anything else will need to be wired up to cgroups _anyway_, and that
> > > might end up messy.
> > 
> > FWIW I subscribe to the io-scheduler faith as opposed to the
> > device-mapper cult ;-)
> > 
> > Also, I don't think a simple throttle will be very useful, a more mature
> > solution should cater to more use cases.
> >
> 
> I tend to agree, unless Andrea can prove us wrong. I don't think
> throttling a task (not letting it consume CPU, memory when its IO
> quota is exceeded) is a good idea. I've asked that question to Andrea
> a few times, but got no response.

Sorry Balbir, I probably missed your question. Or replied in a different
thread maybe...

Actually we could allow an offending cgroup to continue to submit IO
requests without throttling it directly. But if we don't want to waste
the memory with pending IO requests or pending writeback pages, we need
to block it sooner or later.

Instead of directly throttle the offending applications, we could block
them when we hit a max limit of requests or dirty pages, i.e. something
like congestion_wait(), but that's the same, no? the difference is that
in this case throttling is asynchronous. Or am I oversimplifying it?

As an example, with writeback IO io-throttle doesn't throttle the IO
requests directly, each request instead receives a deadline (depending
on the BW limit) and it's added into a rbtree. Then all the requests are
dispatched asynchronously using a kernel thread (kiothrottled) only when
the deadline is expired.

OK, there's a lot of space for improvements: provide many kernel threads
per block device, multiple queues/rbtrees, etc., but this is actually a
way to apply throttling asynchronously. The fact is that if I don't
apply the throttling also in balance_dirty_pages() (and I did so in the
last io-throttle version) or add a max limit of requests the rbtree
increases indefinitely...

That should be very similar to the proportional BW solution allocating a
quota of nr_requests per block device and per cgroup.

-Andrea

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]       ` <20090506023332.GA1212-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2009-05-06 17:59         ` Nauman Rafique
  2009-05-06 20:07         ` Andrea Righi
@ 2009-05-06 20:32         ` Vivek Goyal
  2009-05-07  0:18         ` Ryo Tsuruta
  3 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-06 20:32 UTC (permalink / raw)
  To: Andrew Morton, Andrea Righi
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Tue, May 05, 2009 at 10:33:32PM -0400, Vivek Goyal wrote:
> On Tue, May 05, 2009 at 01:24:41PM -0700, Andrew Morton wrote:
> > On Tue,  5 May 2009 15:58:27 -0400
> > Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > 
> > > 
> > > Hi All,
> > > 
> > > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > > ...
> > > Currently primarily two other IO controller proposals are out there.
> > > 
> > > dm-ioband
> > > ---------
> > > This patch set is from Ryo Tsuruta from valinux.
> > > ...
> > > IO-throttling
> > > -------------
> > > This patch set is from Andrea Righi provides max bandwidth controller.
> > 
> > I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> > 
> > Seriously, how are we to resolve this?  We could lock me in a room and
> > cmoe back in 15 days, but there's no reason to believe that I'd emerge
> > with the best answer.
> > 
> > I tend to think that a cgroup-based controller is the way to go. 
> > Anything else will need to be wired up to cgroups _anyway_, and that
> > might end up messy.
> 
> Hi Andrew,
> 
> Sorry, did not get what do you mean by cgroup based controller? If you
> mean that we use cgroups for grouping tasks for controlling IO, then both
> IO scheduler based controller as well as io throttling proposal do that.
> dm-ioband also supports that up to some extent but it requires extra step of
> transferring cgroup grouping information to dm-ioband device using dm-tools.
> 
> But if you meant that io-throttle patches, then I think it solves only
> part of the problem and that is max bw control. It does not offer minimum
> BW/minimum disk share gurantees as offered by proportional BW control.
> 
> IOW, it supports upper limit control and does not support a work conserving
> IO controller which lets a group use the whole BW if competing groups are
> not present. IMHO, proportional BW control is an important feature which
> we will need and IIUC, io-throttle patches can't be easily extended to support
> proportional BW control, OTOH, one should be able to extend IO scheduler
> based proportional weight controller to also support max bw control. 
> 
> Andrea, last time you were planning to have a look at my patches and see
> if max bw controller can be implemented there. I got a feeling that it
> should not be too difficult to implement it there. We already have the
> hierarchical tree of io queues and groups in elevator layer and we run
> BFQ (WF2Q+) algorithm to select next queue to dispatch the IO from. It is
> just a matter of also keeping track of IO rate per queue/group and we should
> be easily be able to delay the dispatch of IO from a queue if its group has
> crossed the specified max bw.
> 
> This should lead to less code and reduced complextiy (compared with the
> case where we do max bw control with io-throttling patches and proportional
> BW control using IO scheduler based control patches).
>  
> So do you think that it would make sense to do max BW control along with
> proportional weight IO controller at IO scheduler? If yes, then we can
> work together and continue to develop this patchset to also support max
> bw control and meet your requirements and drop the io-throttling patches.
> 

Hi Andrea and others,

I always had this doubt in mind that any kind of 2nd level controller will
have no idea about underlying IO scheduler queues/semantics. So while it
can implement a particular cgroup policy (max bw like io-throttle or
proportional bw like dm-ioband) but there are high chances that it will
break IO scheduler's semantics in one way or other.

I had already sent out the results for dm-ioband in a separate thread.

http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg07258.html
http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg07573.html
http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08177.html
http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08345.html
http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08355.html

Here are some basic results with io-throttle. Andrea, please let me know
if you think this is procedural problem. Playing with io-throttle patches
for the first time.

I took V16 of your patches and trying it out with 2.6.30-rc4 with CFQ
scheduler.

I have got one SATA drive with one partition on it.

I am trying to create one cgroup and assignn 8MB/s limit to it and launch
on RT prio 0 task and one BE prio 7 task and see how this 8MB/s is divided
between these tasks. Following are the results.

Following is my test script.

*******************************************************************
#!/bin/bash

mount /dev/sdb1 /mnt/sdb

mount -t cgroup -o blockio blockio /cgroup/iot/
mkdir -p /cgroup/iot/test1 /cgroup/iot/test2

# Set bw limit of 8 MB/ps on sdb
echo "/dev/sdb:$((8 * 1024 * 1024)):0:0" >
/cgroup/iot/test1/blockio.bandwidth-max

sync
echo 3 > /proc/sys/vm/drop_caches

echo $$ > /cgroup/iot/test1/tasks

# Launch a normal prio reader.
ionice -c 2 -n 7 dd if=/mnt/sdb/zerofile1 of=/dev/zero &
pid1=$!
echo $pid1

# Launch an RT reader  
ionice -c 1 -n 0 dd if=/mnt/sdb/zerofile2 of=/dev/zero &
pid2=$!
echo $pid2

wait $pid2
echo "RT task finished"
**********************************************************************

Test1
=====
Test two readers (one RT class and one BE class) and see how BW is
allocated with-in cgroup

With io-throttle patches
------------------------
- Two readers, first BE prio 7, second RT prio 0

234179072 bytes (234 MB) copied, 55.8482 s, 4.2 MB/s
234179072 bytes (234 MB) copied, 55.8975 s, 4.2 MB/s
RT task finished

Note: See, there is no difference in the performance of RT or BE task.
Looks like these got throttled equally.


Without io-throttle patches
----------------------------
- Two readers, first BE prio 7, second RT prio 0

234179072 bytes (234 MB) copied, 2.81801 s, 83.1 MB/s
RT task finished
234179072 bytes (234 MB) copied, 5.28238 s, 44.3 MB/s

Note: Because I can't limit the BW without io-throttle patches, so don't
      worry about increased BW. But the important point is that RT task
      gets much more BW than a BE prio 7 task.

Test2
====
- Test 2 readers (One BE prio 0 and one BE prio 7) and see how BW is
distributed among these.

With io-throttle patches
------------------------
- Two readers, first BE prio 7, second BE prio 0

234179072 bytes (234 MB) copied, 55.8604 s, 4.2 MB/s
234179072 bytes (234 MB) copied, 55.8918 s, 4.2 MB/s
High prio reader finished

Without io-throttle patches
---------------------------
- Two readers, first BE prio 7, second BE prio 0

234179072 bytes (234 MB) copied, 4.12074 s, 56.8 MB/s
High prio reader finished
234179072 bytes (234 MB) copied, 5.36023 s, 43.7 MB/s

Note: There is no service differentiation between prio 0 and prio 7 task
      with io-throttle patches.

Test 3
======
- Run the one RT reader and one BE reader in root cgroup without any
  limitations. I guess this should mean unlimited BW and behavior should
  be same as with CFQ without io-throttling patches.

With io-throttle patches
=========================
Ran the test 4 times because I was getting different results in different
runs.

- Two readers, one RT prio 0  other BE prio 7

234179072 bytes (234 MB) copied, 2.74604 s, 85.3 MB/s
234179072 bytes (234 MB) copied, 5.20995 s, 44.9 MB/s
RT task finished

234179072 bytes (234 MB) copied, 4.54417 s, 51.5 MB/s
RT task finished
234179072 bytes (234 MB) copied, 5.23396 s, 44.7 MB/s

234179072 bytes (234 MB) copied, 5.17727 s, 45.2 MB/s
RT task finished
234179072 bytes (234 MB) copied, 5.25894 s, 44.5 MB/s

234179072 bytes (234 MB) copied, 2.74141 s, 85.4 MB/s
234179072 bytes (234 MB) copied, 5.20536 s, 45.0 MB/s
RT task finished

Note: Out of 4 runs, looks like twice it is complete priority inversion
      and RT task finished after BE task. Rest of the two times, the
      difference between BW of RT and BE task is much less as compared to
      without patches. In fact once it was almost same.

Without io-throttle patches.
===========================
- Two readers, one RT prio 0  other BE prio 7 (4 runs)

234179072 bytes (234 MB) copied, 2.80988 s, 83.3 MB/s
RT task finished
234179072 bytes (234 MB) copied, 5.28228 s, 44.3 MB/s

234179072 bytes (234 MB) copied, 2.80659 s, 83.4 MB/s
RT task finished
234179072 bytes (234 MB) copied, 5.27874 s, 44.4 MB/s

234179072 bytes (234 MB) copied, 2.79601 s, 83.8 MB/s
RT task finished
234179072 bytes (234 MB) copied, 5.2542 s, 44.6 MB/s

234179072 bytes (234 MB) copied, 2.78764 s, 84.0 MB/s
RT task finished
234179072 bytes (234 MB) copied, 5.26009 s, 44.5 MB/s

Note, How consistent the behavior is without io-throttle patches.

In summary, I think a 2nd level solution can ensure one policy on cgroups but
it will break other semantics/properties of IO scheduler with-in cgroup as
2nd level solution has no idea at run time what is the IO scheduler running
underneath and what kind of properties it has.

Andrea, please try it on your setup and see if you get similar results
on or. Hopefully it is not a configuration or test procedure issue on my
side.

Thanks
Vivek

> The only thing which concerns me is the fact that IO scheduler does not
> have the view of higher level logical device. So if somebody has setup a
> software RAID and wants to put max BW limit on software raid device, this
> solution will not work. One shall have to live with max bw limits on 
> individual disks (where io scheduler is actually running). Do your patches
> allow to put limit on software RAID devices also? 
> 
> Ryo, dm-ioband breaks the notion of classes and priority of CFQ because
> of FIFO dispatch of buffered bios. Apart from that it tries to provide
> fairness in terms of actual IO done and that would mean a seeky workload
> will can use disk for much longer to get equivalent IO done and slow down
> other applications. Implementing IO controller at IO scheduler level gives
> us tigher control. Will it not meet your requirements? If you got specific
> concerns with IO scheduler based contol patches, please highlight these and
> we will see how these can be addressed.
> 
> Thanks
> Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06  2:33     ` Vivek Goyal
                         ` (2 preceding siblings ...)
       [not found]       ` <20090506023332.GA1212-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2009-05-06 20:32       ` Vivek Goyal
       [not found]         ` <20090506203228.GH8180-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2009-05-06 21:34         ` Andrea Righi
  2009-05-07  0:18       ` Ryo Tsuruta
  4 siblings, 2 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-06 20:32 UTC (permalink / raw)
  To: Andrew Morton, Andrea Righi
  Cc: nauman, dpshah, lizf, mikew, fchecconi, paolo.valente,
	jens.axboe, ryov, fernando, s-uchida, taka, guijianfeng, jmoyer,
	dhaval, balbir, linux-kernel, containers, agk, dm-devel, snitzer,
	m-ikeda, peterz

On Tue, May 05, 2009 at 10:33:32PM -0400, Vivek Goyal wrote:
> On Tue, May 05, 2009 at 01:24:41PM -0700, Andrew Morton wrote:
> > On Tue,  5 May 2009 15:58:27 -0400
> > Vivek Goyal <vgoyal@redhat.com> wrote:
> > 
> > > 
> > > Hi All,
> > > 
> > > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > > ...
> > > Currently primarily two other IO controller proposals are out there.
> > > 
> > > dm-ioband
> > > ---------
> > > This patch set is from Ryo Tsuruta from valinux.
> > > ...
> > > IO-throttling
> > > -------------
> > > This patch set is from Andrea Righi provides max bandwidth controller.
> > 
> > I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> > 
> > Seriously, how are we to resolve this?  We could lock me in a room and
> > cmoe back in 15 days, but there's no reason to believe that I'd emerge
> > with the best answer.
> > 
> > I tend to think that a cgroup-based controller is the way to go. 
> > Anything else will need to be wired up to cgroups _anyway_, and that
> > might end up messy.
> 
> Hi Andrew,
> 
> Sorry, did not get what do you mean by cgroup based controller? If you
> mean that we use cgroups for grouping tasks for controlling IO, then both
> IO scheduler based controller as well as io throttling proposal do that.
> dm-ioband also supports that up to some extent but it requires extra step of
> transferring cgroup grouping information to dm-ioband device using dm-tools.
> 
> But if you meant that io-throttle patches, then I think it solves only
> part of the problem and that is max bw control. It does not offer minimum
> BW/minimum disk share gurantees as offered by proportional BW control.
> 
> IOW, it supports upper limit control and does not support a work conserving
> IO controller which lets a group use the whole BW if competing groups are
> not present. IMHO, proportional BW control is an important feature which
> we will need and IIUC, io-throttle patches can't be easily extended to support
> proportional BW control, OTOH, one should be able to extend IO scheduler
> based proportional weight controller to also support max bw control. 
> 
> Andrea, last time you were planning to have a look at my patches and see
> if max bw controller can be implemented there. I got a feeling that it
> should not be too difficult to implement it there. We already have the
> hierarchical tree of io queues and groups in elevator layer and we run
> BFQ (WF2Q+) algorithm to select next queue to dispatch the IO from. It is
> just a matter of also keeping track of IO rate per queue/group and we should
> be easily be able to delay the dispatch of IO from a queue if its group has
> crossed the specified max bw.
> 
> This should lead to less code and reduced complextiy (compared with the
> case where we do max bw control with io-throttling patches and proportional
> BW control using IO scheduler based control patches).
>  
> So do you think that it would make sense to do max BW control along with
> proportional weight IO controller at IO scheduler? If yes, then we can
> work together and continue to develop this patchset to also support max
> bw control and meet your requirements and drop the io-throttling patches.
> 

Hi Andrea and others,

I always had this doubt in mind that any kind of 2nd level controller will
have no idea about underlying IO scheduler queues/semantics. So while it
can implement a particular cgroup policy (max bw like io-throttle or
proportional bw like dm-ioband) but there are high chances that it will
break IO scheduler's semantics in one way or other.

I had already sent out the results for dm-ioband in a separate thread.

http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg07258.html
http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg07573.html
http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08177.html
http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08345.html
http://linux.derkeiler.com/Mailing-Lists/Kernel/2009-04/msg08355.html

Here are some basic results with io-throttle. Andrea, please let me know
if you think this is procedural problem. Playing with io-throttle patches
for the first time.

I took V16 of your patches and trying it out with 2.6.30-rc4 with CFQ
scheduler.

I have got one SATA drive with one partition on it.

I am trying to create one cgroup and assignn 8MB/s limit to it and launch
on RT prio 0 task and one BE prio 7 task and see how this 8MB/s is divided
between these tasks. Following are the results.

Following is my test script.

*******************************************************************
#!/bin/bash

mount /dev/sdb1 /mnt/sdb

mount -t cgroup -o blockio blockio /cgroup/iot/
mkdir -p /cgroup/iot/test1 /cgroup/iot/test2

# Set bw limit of 8 MB/ps on sdb
echo "/dev/sdb:$((8 * 1024 * 1024)):0:0" >
/cgroup/iot/test1/blockio.bandwidth-max

sync
echo 3 > /proc/sys/vm/drop_caches

echo $$ > /cgroup/iot/test1/tasks

# Launch a normal prio reader.
ionice -c 2 -n 7 dd if=/mnt/sdb/zerofile1 of=/dev/zero &
pid1=$!
echo $pid1

# Launch an RT reader  
ionice -c 1 -n 0 dd if=/mnt/sdb/zerofile2 of=/dev/zero &
pid2=$!
echo $pid2

wait $pid2
echo "RT task finished"
**********************************************************************

Test1
=====
Test two readers (one RT class and one BE class) and see how BW is
allocated with-in cgroup

With io-throttle patches
------------------------
- Two readers, first BE prio 7, second RT prio 0

234179072 bytes (234 MB) copied, 55.8482 s, 4.2 MB/s
234179072 bytes (234 MB) copied, 55.8975 s, 4.2 MB/s
RT task finished

Note: See, there is no difference in the performance of RT or BE task.
Looks like these got throttled equally.


Without io-throttle patches
----------------------------
- Two readers, first BE prio 7, second RT prio 0

234179072 bytes (234 MB) copied, 2.81801 s, 83.1 MB/s
RT task finished
234179072 bytes (234 MB) copied, 5.28238 s, 44.3 MB/s

Note: Because I can't limit the BW without io-throttle patches, so don't
      worry about increased BW. But the important point is that RT task
      gets much more BW than a BE prio 7 task.

Test2
====
- Test 2 readers (One BE prio 0 and one BE prio 7) and see how BW is
distributed among these.

With io-throttle patches
------------------------
- Two readers, first BE prio 7, second BE prio 0

234179072 bytes (234 MB) copied, 55.8604 s, 4.2 MB/s
234179072 bytes (234 MB) copied, 55.8918 s, 4.2 MB/s
High prio reader finished

Without io-throttle patches
---------------------------
- Two readers, first BE prio 7, second BE prio 0

234179072 bytes (234 MB) copied, 4.12074 s, 56.8 MB/s
High prio reader finished
234179072 bytes (234 MB) copied, 5.36023 s, 43.7 MB/s

Note: There is no service differentiation between prio 0 and prio 7 task
      with io-throttle patches.

Test 3
======
- Run the one RT reader and one BE reader in root cgroup without any
  limitations. I guess this should mean unlimited BW and behavior should
  be same as with CFQ without io-throttling patches.

With io-throttle patches
=========================
Ran the test 4 times because I was getting different results in different
runs.

- Two readers, one RT prio 0  other BE prio 7

234179072 bytes (234 MB) copied, 2.74604 s, 85.3 MB/s
234179072 bytes (234 MB) copied, 5.20995 s, 44.9 MB/s
RT task finished

234179072 bytes (234 MB) copied, 4.54417 s, 51.5 MB/s
RT task finished
234179072 bytes (234 MB) copied, 5.23396 s, 44.7 MB/s

234179072 bytes (234 MB) copied, 5.17727 s, 45.2 MB/s
RT task finished
234179072 bytes (234 MB) copied, 5.25894 s, 44.5 MB/s

234179072 bytes (234 MB) copied, 2.74141 s, 85.4 MB/s
234179072 bytes (234 MB) copied, 5.20536 s, 45.0 MB/s
RT task finished

Note: Out of 4 runs, looks like twice it is complete priority inversion
      and RT task finished after BE task. Rest of the two times, the
      difference between BW of RT and BE task is much less as compared to
      without patches. In fact once it was almost same.

Without io-throttle patches.
===========================
- Two readers, one RT prio 0  other BE prio 7 (4 runs)

234179072 bytes (234 MB) copied, 2.80988 s, 83.3 MB/s
RT task finished
234179072 bytes (234 MB) copied, 5.28228 s, 44.3 MB/s

234179072 bytes (234 MB) copied, 2.80659 s, 83.4 MB/s
RT task finished
234179072 bytes (234 MB) copied, 5.27874 s, 44.4 MB/s

234179072 bytes (234 MB) copied, 2.79601 s, 83.8 MB/s
RT task finished
234179072 bytes (234 MB) copied, 5.2542 s, 44.6 MB/s

234179072 bytes (234 MB) copied, 2.78764 s, 84.0 MB/s
RT task finished
234179072 bytes (234 MB) copied, 5.26009 s, 44.5 MB/s

Note, How consistent the behavior is without io-throttle patches.

In summary, I think a 2nd level solution can ensure one policy on cgroups but
it will break other semantics/properties of IO scheduler with-in cgroup as
2nd level solution has no idea at run time what is the IO scheduler running
underneath and what kind of properties it has.

Andrea, please try it on your setup and see if you get similar results
on or. Hopefully it is not a configuration or test procedure issue on my
side.

Thanks
Vivek

> The only thing which concerns me is the fact that IO scheduler does not
> have the view of higher level logical device. So if somebody has setup a
> software RAID and wants to put max BW limit on software raid device, this
> solution will not work. One shall have to live with max bw limits on 
> individual disks (where io scheduler is actually running). Do your patches
> allow to put limit on software RAID devices also? 
> 
> Ryo, dm-ioband breaks the notion of classes and priority of CFQ because
> of FIFO dispatch of buffered bios. Apart from that it tries to provide
> fairness in terms of actual IO done and that would mean a seeky workload
> will can use disk for much longer to get equivalent IO done and slow down
> other applications. Implementing IO controller at IO scheduler level gives
> us tigher control. Will it not meet your requirements? If you got specific
> concerns with IO scheduler based contol patches, please highlight these and
> we will see how these can be addressed.
> 
> Thanks
> Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]       ` <20090506023332.GA1212-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2009-05-06 17:59         ` Nauman Rafique
@ 2009-05-06 20:07         ` Andrea Righi
  2009-05-06 20:32         ` Vivek Goyal
  2009-05-07  0:18         ` Ryo Tsuruta
  3 siblings, 0 replies; 97+ messages in thread
From: Andrea Righi @ 2009-05-06 20:07 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

On Tue, May 05, 2009 at 10:33:32PM -0400, Vivek Goyal wrote:
> On Tue, May 05, 2009 at 01:24:41PM -0700, Andrew Morton wrote:
> > On Tue,  5 May 2009 15:58:27 -0400
> > Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > 
> > > 
> > > Hi All,
> > > 
> > > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > > ...
> > > Currently primarily two other IO controller proposals are out there.
> > > 
> > > dm-ioband
> > > ---------
> > > This patch set is from Ryo Tsuruta from valinux.
> > > ...
> > > IO-throttling
> > > -------------
> > > This patch set is from Andrea Righi provides max bandwidth controller.
> > 
> > I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> > 
> > Seriously, how are we to resolve this?  We could lock me in a room and
> > cmoe back in 15 days, but there's no reason to believe that I'd emerge
> > with the best answer.
> > 
> > I tend to think that a cgroup-based controller is the way to go. 
> > Anything else will need to be wired up to cgroups _anyway_, and that
> > might end up messy.
> 
> Hi Andrew,
> 
> Sorry, did not get what do you mean by cgroup based controller? If you
> mean that we use cgroups for grouping tasks for controlling IO, then both
> IO scheduler based controller as well as io throttling proposal do that.
> dm-ioband also supports that up to some extent but it requires extra step of
> transferring cgroup grouping information to dm-ioband device using dm-tools.
> 
> But if you meant that io-throttle patches, then I think it solves only
> part of the problem and that is max bw control. It does not offer minimum
> BW/minimum disk share gurantees as offered by proportional BW control.
> 
> IOW, it supports upper limit control and does not support a work conserving
> IO controller which lets a group use the whole BW if competing groups are
> not present. IMHO, proportional BW control is an important feature which
> we will need and IIUC, io-throttle patches can't be easily extended to support
> proportional BW control, OTOH, one should be able to extend IO scheduler
> based proportional weight controller to also support max bw control. 

Well, IMHO the big concern is at which level we want to implement the
logic of control: IO scheduler, when the IO requests are already
submitted and need to be dispatched, or at high level when the
applications generates IO requests (or maybe both).

And, as pointed by Andrew, do everything by a cgroup-based controller.

The other features, proportional BW, throttling, take the current ioprio
model in account, etc. are implementation details and any of the
proposed solutions can be extended to support all these features. I
mean, io-throttle can be extended to support proportional BW (for a
certain perspective it is already provided by the throttling water mark
in v16), as well as the IO scheduler based controller can be extended to
support absolute BW limits. The same for dm-ioband. I don't think
there're huge obstacle to merge the functionalities in this sense.

> 
> Andrea, last time you were planning to have a look at my patches and see
> if max bw controller can be implemented there. I got a feeling that it
> should not be too difficult to implement it there. We already have the
> hierarchical tree of io queues and groups in elevator layer and we run
> BFQ (WF2Q+) algorithm to select next queue to dispatch the IO from. It is
> just a matter of also keeping track of IO rate per queue/group and we should
> be easily be able to delay the dispatch of IO from a queue if its group has
> crossed the specified max bw.

Yes, sorry for my late, I quickly tested your patchset, but I still need
to understand many details of your solution. In the next days I'll
re-read everything carefully and I'll try to do a detailed review of
your patchset (just re-building the kernel with your patchset applied).

> 
> This should lead to less code and reduced complextiy (compared with the
> case where we do max bw control with io-throttling patches and proportional
> BW control using IO scheduler based control patches).

mmmh... changing the logic at the elevator and all IO schedulers doesn't
sound like reduced complexity and less code changed. With io-throttle we
just need to place the cgroup_io_throttle() hook in the right functions
where we want to apply throttling. This is a quite easy approach to
extend the IO control also to logical devices (more in general devices
that use their own make_request_fn) or even network-attached devices, as
well as networking filesystems, etc.

But I may be wrong. As I said I still need to review in the details your
solution.

>  
> So do you think that it would make sense to do max BW control along with
> proportional weight IO controller at IO scheduler? If yes, then we can
> work together and continue to develop this patchset to also support max
> bw control and meet your requirements and drop the io-throttling patches.

It is surely worth to be explored. Honestly, I don't know if it would be
a better solution or not. Probably comparing some results with different
IO workloads is the best way to proceed and decide which is the right
way to go. This is necessary IMHO, before totally dropping one solution
or another.

> 
> The only thing which concerns me is the fact that IO scheduler does not
> have the view of higher level logical device. So if somebody has setup a
> software RAID and wants to put max BW limit on software raid device, this
> solution will not work. One shall have to live with max bw limits on 
> individual disks (where io scheduler is actually running). Do your patches
> allow to put limit on software RAID devices also? 

No, but as said above my patchset provides the interfaces to apply the
IO control and accounting wherever we want. At the moment there's just
one interface, cgroup_io_throttle().

-Andrea

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06  2:33     ` Vivek Goyal
  2009-05-06 17:59       ` Nauman Rafique
@ 2009-05-06 20:07       ` Andrea Righi
  2009-05-06 21:21         ` Vivek Goyal
  2009-05-06 21:21         ` Vivek Goyal
       [not found]       ` <20090506023332.GA1212-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
                         ` (2 subsequent siblings)
  4 siblings, 2 replies; 97+ messages in thread
From: Andrea Righi @ 2009-05-06 20:07 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Andrew Morton, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, balbir, linux-kernel, containers,
	agk, dm-devel, snitzer, m-ikeda, peterz

On Tue, May 05, 2009 at 10:33:32PM -0400, Vivek Goyal wrote:
> On Tue, May 05, 2009 at 01:24:41PM -0700, Andrew Morton wrote:
> > On Tue,  5 May 2009 15:58:27 -0400
> > Vivek Goyal <vgoyal@redhat.com> wrote:
> > 
> > > 
> > > Hi All,
> > > 
> > > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > > ...
> > > Currently primarily two other IO controller proposals are out there.
> > > 
> > > dm-ioband
> > > ---------
> > > This patch set is from Ryo Tsuruta from valinux.
> > > ...
> > > IO-throttling
> > > -------------
> > > This patch set is from Andrea Righi provides max bandwidth controller.
> > 
> > I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> > 
> > Seriously, how are we to resolve this?  We could lock me in a room and
> > cmoe back in 15 days, but there's no reason to believe that I'd emerge
> > with the best answer.
> > 
> > I tend to think that a cgroup-based controller is the way to go. 
> > Anything else will need to be wired up to cgroups _anyway_, and that
> > might end up messy.
> 
> Hi Andrew,
> 
> Sorry, did not get what do you mean by cgroup based controller? If you
> mean that we use cgroups for grouping tasks for controlling IO, then both
> IO scheduler based controller as well as io throttling proposal do that.
> dm-ioband also supports that up to some extent but it requires extra step of
> transferring cgroup grouping information to dm-ioband device using dm-tools.
> 
> But if you meant that io-throttle patches, then I think it solves only
> part of the problem and that is max bw control. It does not offer minimum
> BW/minimum disk share gurantees as offered by proportional BW control.
> 
> IOW, it supports upper limit control and does not support a work conserving
> IO controller which lets a group use the whole BW if competing groups are
> not present. IMHO, proportional BW control is an important feature which
> we will need and IIUC, io-throttle patches can't be easily extended to support
> proportional BW control, OTOH, one should be able to extend IO scheduler
> based proportional weight controller to also support max bw control. 

Well, IMHO the big concern is at which level we want to implement the
logic of control: IO scheduler, when the IO requests are already
submitted and need to be dispatched, or at high level when the
applications generates IO requests (or maybe both).

And, as pointed by Andrew, do everything by a cgroup-based controller.

The other features, proportional BW, throttling, take the current ioprio
model in account, etc. are implementation details and any of the
proposed solutions can be extended to support all these features. I
mean, io-throttle can be extended to support proportional BW (for a
certain perspective it is already provided by the throttling water mark
in v16), as well as the IO scheduler based controller can be extended to
support absolute BW limits. The same for dm-ioband. I don't think
there're huge obstacle to merge the functionalities in this sense.

> 
> Andrea, last time you were planning to have a look at my patches and see
> if max bw controller can be implemented there. I got a feeling that it
> should not be too difficult to implement it there. We already have the
> hierarchical tree of io queues and groups in elevator layer and we run
> BFQ (WF2Q+) algorithm to select next queue to dispatch the IO from. It is
> just a matter of also keeping track of IO rate per queue/group and we should
> be easily be able to delay the dispatch of IO from a queue if its group has
> crossed the specified max bw.

Yes, sorry for my late, I quickly tested your patchset, but I still need
to understand many details of your solution. In the next days I'll
re-read everything carefully and I'll try to do a detailed review of
your patchset (just re-building the kernel with your patchset applied).

> 
> This should lead to less code and reduced complextiy (compared with the
> case where we do max bw control with io-throttling patches and proportional
> BW control using IO scheduler based control patches).

mmmh... changing the logic at the elevator and all IO schedulers doesn't
sound like reduced complexity and less code changed. With io-throttle we
just need to place the cgroup_io_throttle() hook in the right functions
where we want to apply throttling. This is a quite easy approach to
extend the IO control also to logical devices (more in general devices
that use their own make_request_fn) or even network-attached devices, as
well as networking filesystems, etc.

But I may be wrong. As I said I still need to review in the details your
solution.

>  
> So do you think that it would make sense to do max BW control along with
> proportional weight IO controller at IO scheduler? If yes, then we can
> work together and continue to develop this patchset to also support max
> bw control and meet your requirements and drop the io-throttling patches.

It is surely worth to be explored. Honestly, I don't know if it would be
a better solution or not. Probably comparing some results with different
IO workloads is the best way to proceed and decide which is the right
way to go. This is necessary IMHO, before totally dropping one solution
or another.

> 
> The only thing which concerns me is the fact that IO scheduler does not
> have the view of higher level logical device. So if somebody has setup a
> software RAID and wants to put max BW limit on software raid device, this
> solution will not work. One shall have to live with max bw limits on 
> individual disks (where io scheduler is actually running). Do your patches
> allow to put limit on software RAID devices also? 

No, but as said above my patchset provides the interfaces to apply the
IO control and accounting wherever we want. At the moment there's just
one interface, cgroup_io_throttle().

-Andrea

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]         ` <20090506034254.GD4416-SINUvgVNF2CyUtPGxGje5AC/G2K4zDHf@public.gmane.org>
  2009-05-06 10:20           ` Fabio Checconi
@ 2009-05-06 18:47           ` Divyesh Shah
  2009-05-06 20:42           ` Andrea Righi
  2 siblings, 0 replies; 97+ messages in thread
From: Divyesh Shah @ 2009-05-06 18:47 UTC (permalink / raw)
  To: balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	righi.andrea-Re5JQEeQqe8AvxtiuMwx3w,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

Balbir Singh wrote:
> * Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> [2009-05-06 00:20:49]:
> 
>> On Tue, 2009-05-05 at 13:24 -0700, Andrew Morton wrote:
>>> On Tue,  5 May 2009 15:58:27 -0400
>>> Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
>>>> ...
>>>> Currently primarily two other IO controller proposals are out there.
>>>>
>>>> dm-ioband
>>>> ---------
>>>> This patch set is from Ryo Tsuruta from valinux.
>>>> ...
>>>> IO-throttling
>>>> -------------
>>>> This patch set is from Andrea Righi provides max bandwidth controller.
>>> I'm thinking we need to lock you guys in a room and come back in 15 minutes.
>>>
>>> Seriously, how are we to resolve this?  We could lock me in a room and
>>> cmoe back in 15 days, but there's no reason to believe that I'd emerge
>>> with the best answer.
>>>
>>> I tend to think that a cgroup-based controller is the way to go. 
>>> Anything else will need to be wired up to cgroups _anyway_, and that
>>> might end up messy.
>> FWIW I subscribe to the io-scheduler faith as opposed to the
>> device-mapper cult ;-)
>>
>> Also, I don't think a simple throttle will be very useful, a more mature
>> solution should cater to more use cases.
>>
> 
> I tend to agree, unless Andrea can prove us wrong. I don't think
> throttling a task (not letting it consume CPU, memory when its IO
> quota is exceeded) is a good idea. I've asked that question to Andrea
> a few times, but got no response.

I agree with what Balbir said about the effects of throttling on memory and cpu usage of that task.
Nauman and I have been working on Vivek's set of patches (which also includes some patches by Nauman) and have been testing and developing on top of that. I've found this solution to be the one that takes us closest to a complete solution. This approach works well under the assumption that the queues are backlogged and in the limited testing that we've done so far doesn't fare that badly when they are not backlogged (though there is definitely room to improve there).
With buffered writes, when the queues are not backlogged I think it might be useful to explore into vm space and see if we can do something there w/o any impact to the tasks mem or cpu usage. I don't have any brilliant ideas on this now but want to get people thinking about this.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06  3:42       ` Balbir Singh
  2009-05-06 10:20         ` Fabio Checconi
@ 2009-05-06 18:47         ` Divyesh Shah
       [not found]         ` <20090506034254.GD4416-SINUvgVNF2CyUtPGxGje5AC/G2K4zDHf@public.gmane.org>
  2009-05-06 20:42         ` Andrea Righi
  3 siblings, 0 replies; 97+ messages in thread
From: Divyesh Shah @ 2009-05-06 18:47 UTC (permalink / raw)
  To: balbir
  Cc: Peter Zijlstra, Andrew Morton, Vivek Goyal, nauman, lizf, mikew,
	fchecconi, paolo.valente, jens.axboe, ryov, fernando, s-uchida,
	taka, guijianfeng, jmoyer, dhaval, linux-kernel, containers,
	righi.andrea, agk, dm-devel, snitzer, m-ikeda

Balbir Singh wrote:
> * Peter Zijlstra <peterz@infradead.org> [2009-05-06 00:20:49]:
> 
>> On Tue, 2009-05-05 at 13:24 -0700, Andrew Morton wrote:
>>> On Tue,  5 May 2009 15:58:27 -0400
>>> Vivek Goyal <vgoyal@redhat.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
>>>> ...
>>>> Currently primarily two other IO controller proposals are out there.
>>>>
>>>> dm-ioband
>>>> ---------
>>>> This patch set is from Ryo Tsuruta from valinux.
>>>> ...
>>>> IO-throttling
>>>> -------------
>>>> This patch set is from Andrea Righi provides max bandwidth controller.
>>> I'm thinking we need to lock you guys in a room and come back in 15 minutes.
>>>
>>> Seriously, how are we to resolve this?  We could lock me in a room and
>>> cmoe back in 15 days, but there's no reason to believe that I'd emerge
>>> with the best answer.
>>>
>>> I tend to think that a cgroup-based controller is the way to go. 
>>> Anything else will need to be wired up to cgroups _anyway_, and that
>>> might end up messy.
>> FWIW I subscribe to the io-scheduler faith as opposed to the
>> device-mapper cult ;-)
>>
>> Also, I don't think a simple throttle will be very useful, a more mature
>> solution should cater to more use cases.
>>
> 
> I tend to agree, unless Andrea can prove us wrong. I don't think
> throttling a task (not letting it consume CPU, memory when its IO
> quota is exceeded) is a good idea. I've asked that question to Andrea
> a few times, but got no response.

I agree with what Balbir said about the effects of throttling on memory and cpu usage of that task.
Nauman and I have been working on Vivek's set of patches (which also includes some patches by Nauman) and have been testing and developing on top of that. I've found this solution to be the one that takes us closest to a complete solution. This approach works well under the assumption that the queues are backlogged and in the limited testing that we've done so far doesn't fare that badly when they are not backlogged (though there is definitely room to improve there).
With buffered writes, when the queues are not backlogged I think it might be useful to explore into vm space and see if we can do something there w/o any impact to the tasks mem or cpu usage. I don't have any brilliant ideas on this now but want to get people thinking about this.


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]       ` <20090506023332.GA1212-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2009-05-06 17:59         ` Nauman Rafique
  2009-05-06 20:07         ` Andrea Righi
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 97+ messages in thread
From: Nauman Rafique @ 2009-05-06 17:59 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	righi.andrea-Re5JQEeQqe8AvxtiuMwx3w

On Tue, May 5, 2009 at 7:33 PM, Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Tue, May 05, 2009 at 01:24:41PM -0700, Andrew Morton wrote:
>> On Tue,  5 May 2009 15:58:27 -0400
>> Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>>
>> >
>> > Hi All,
>> >
>> > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
>> > ...
>> > Currently primarily two other IO controller proposals are out there.
>> >
>> > dm-ioband
>> > ---------
>> > This patch set is from Ryo Tsuruta from valinux.
>> > ...
>> > IO-throttling
>> > -------------
>> > This patch set is from Andrea Righi provides max bandwidth controller.
>>
>> I'm thinking we need to lock you guys in a room and come back in 15 minutes.
>>
>> Seriously, how are we to resolve this?  We could lock me in a room and
>> cmoe back in 15 days, but there's no reason to believe that I'd emerge
>> with the best answer.
>>
>> I tend to think that a cgroup-based controller is the way to go.
>> Anything else will need to be wired up to cgroups _anyway_, and that
>> might end up messy.
>
> Hi Andrew,
>
> Sorry, did not get what do you mean by cgroup based controller? If you
> mean that we use cgroups for grouping tasks for controlling IO, then both
> IO scheduler based controller as well as io throttling proposal do that.
> dm-ioband also supports that up to some extent but it requires extra step of
> transferring cgroup grouping information to dm-ioband device using dm-tools.
>
> But if you meant that io-throttle patches, then I think it solves only
> part of the problem and that is max bw control. It does not offer minimum
> BW/minimum disk share gurantees as offered by proportional BW control.
>
> IOW, it supports upper limit control and does not support a work conserving
> IO controller which lets a group use the whole BW if competing groups are
> not present. IMHO, proportional BW control is an important feature which
> we will need and IIUC, io-throttle patches can't be easily extended to support
> proportional BW control, OTOH, one should be able to extend IO scheduler
> based proportional weight controller to also support max bw control.
>
> Andrea, last time you were planning to have a look at my patches and see
> if max bw controller can be implemented there. I got a feeling that it
> should not be too difficult to implement it there. We already have the
> hierarchical tree of io queues and groups in elevator layer and we run
> BFQ (WF2Q+) algorithm to select next queue to dispatch the IO from. It is
> just a matter of also keeping track of IO rate per queue/group and we should
> be easily be able to delay the dispatch of IO from a queue if its group has
> crossed the specified max bw.
>
> This should lead to less code and reduced complextiy (compared with the
> case where we do max bw control with io-throttling patches and proportional
> BW control using IO scheduler based control patches).
>
> So do you think that it would make sense to do max BW control along with
> proportional weight IO controller at IO scheduler? If yes, then we can
> work together and continue to develop this patchset to also support max
> bw control and meet your requirements and drop the io-throttling patches.
>
> The only thing which concerns me is the fact that IO scheduler does not
> have the view of higher level logical device. So if somebody has setup a
> software RAID and wants to put max BW limit on software raid device, this
> solution will not work. One shall have to live with max bw limits on
> individual disks (where io scheduler is actually running). Do your patches
> allow to put limit on software RAID devices also?
>
> Ryo, dm-ioband breaks the notion of classes and priority of CFQ because
> of FIFO dispatch of buffered bios. Apart from that it tries to provide
> fairness in terms of actual IO done and that would mean a seeky workload
> will can use disk for much longer to get equivalent IO done and slow down
> other applications. Implementing IO controller at IO scheduler level gives
> us tigher control. Will it not meet your requirements? If you got specific
> concerns with IO scheduler based contol patches, please highlight these and
> we will see how these can be addressed.

In my opinion, IO throttling and dm-ioband are probably simpler, but
incomplete solutions to the problem. And for a solution to be
complete, it would have to be at a IO scheduler layer so it can do
things like taking an IO as soon as it comes and stick it to the front
of all the queues so that it can go to the disk right away. This patch
set is big, but it takes us in the right direction. Our ultimate goal
should be able to reach the level of control that we can have over CPU
and network resources. And I don't think IO throttling and dm-ioband
approaches take us in that direction.

>
> Thanks
> Vivek
>

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06  2:33     ` Vivek Goyal
@ 2009-05-06 17:59       ` Nauman Rafique
  2009-05-06 20:07       ` Andrea Righi
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 97+ messages in thread
From: Nauman Rafique @ 2009-05-06 17:59 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Andrew Morton, dpshah, lizf, mikew, fchecconi, paolo.valente,
	jens.axboe, ryov, fernando, s-uchida, taka, guijianfeng, jmoyer,
	dhaval, balbir, linux-kernel, containers, righi.andrea, agk,
	dm-devel, snitzer, m-ikeda, peterz

On Tue, May 5, 2009 at 7:33 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, May 05, 2009 at 01:24:41PM -0700, Andrew Morton wrote:
>> On Tue,  5 May 2009 15:58:27 -0400
>> Vivek Goyal <vgoyal@redhat.com> wrote:
>>
>> >
>> > Hi All,
>> >
>> > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
>> > ...
>> > Currently primarily two other IO controller proposals are out there.
>> >
>> > dm-ioband
>> > ---------
>> > This patch set is from Ryo Tsuruta from valinux.
>> > ...
>> > IO-throttling
>> > -------------
>> > This patch set is from Andrea Righi provides max bandwidth controller.
>>
>> I'm thinking we need to lock you guys in a room and come back in 15 minutes.
>>
>> Seriously, how are we to resolve this?  We could lock me in a room and
>> cmoe back in 15 days, but there's no reason to believe that I'd emerge
>> with the best answer.
>>
>> I tend to think that a cgroup-based controller is the way to go.
>> Anything else will need to be wired up to cgroups _anyway_, and that
>> might end up messy.
>
> Hi Andrew,
>
> Sorry, did not get what do you mean by cgroup based controller? If you
> mean that we use cgroups for grouping tasks for controlling IO, then both
> IO scheduler based controller as well as io throttling proposal do that.
> dm-ioband also supports that up to some extent but it requires extra step of
> transferring cgroup grouping information to dm-ioband device using dm-tools.
>
> But if you meant that io-throttle patches, then I think it solves only
> part of the problem and that is max bw control. It does not offer minimum
> BW/minimum disk share gurantees as offered by proportional BW control.
>
> IOW, it supports upper limit control and does not support a work conserving
> IO controller which lets a group use the whole BW if competing groups are
> not present. IMHO, proportional BW control is an important feature which
> we will need and IIUC, io-throttle patches can't be easily extended to support
> proportional BW control, OTOH, one should be able to extend IO scheduler
> based proportional weight controller to also support max bw control.
>
> Andrea, last time you were planning to have a look at my patches and see
> if max bw controller can be implemented there. I got a feeling that it
> should not be too difficult to implement it there. We already have the
> hierarchical tree of io queues and groups in elevator layer and we run
> BFQ (WF2Q+) algorithm to select next queue to dispatch the IO from. It is
> just a matter of also keeping track of IO rate per queue/group and we should
> be easily be able to delay the dispatch of IO from a queue if its group has
> crossed the specified max bw.
>
> This should lead to less code and reduced complextiy (compared with the
> case where we do max bw control with io-throttling patches and proportional
> BW control using IO scheduler based control patches).
>
> So do you think that it would make sense to do max BW control along with
> proportional weight IO controller at IO scheduler? If yes, then we can
> work together and continue to develop this patchset to also support max
> bw control and meet your requirements and drop the io-throttling patches.
>
> The only thing which concerns me is the fact that IO scheduler does not
> have the view of higher level logical device. So if somebody has setup a
> software RAID and wants to put max BW limit on software raid device, this
> solution will not work. One shall have to live with max bw limits on
> individual disks (where io scheduler is actually running). Do your patches
> allow to put limit on software RAID devices also?
>
> Ryo, dm-ioband breaks the notion of classes and priority of CFQ because
> of FIFO dispatch of buffered bios. Apart from that it tries to provide
> fairness in terms of actual IO done and that would mean a seeky workload
> will can use disk for much longer to get equivalent IO done and slow down
> other applications. Implementing IO controller at IO scheduler level gives
> us tigher control. Will it not meet your requirements? If you got specific
> concerns with IO scheduler based contol patches, please highlight these and
> we will see how these can be addressed.

In my opinion, IO throttling and dm-ioband are probably simpler, but
incomplete solutions to the problem. And for a solution to be
complete, it would have to be at a IO scheduler layer so it can do
things like taking an IO as soon as it comes and stick it to the front
of all the queues so that it can go to the disk right away. This patch
set is big, but it takes us in the right direction. Our ultimate goal
should be able to reach the level of control that we can have over CPU
and network resources. And I don't think IO throttling and dm-ioband
approaches take us in that direction.

>
> Thanks
> Vivek
>

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]           ` <20090506102030.GB20544-f9ZlEuEWxVeACYmtYXMKmw@public.gmane.org>
@ 2009-05-06 17:10             ` Balbir Singh
  0 siblings, 0 replies; 97+ messages in thread
From: Balbir Singh @ 2009-05-06 17:10 UTC (permalink / raw)
  To: Fabio Checconi
  Cc: paolo.valente-rcYM44yAMweonA0d6jMUrA,
	dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, fernando-gVGce1chcLdL9jVzuh4AOg,
	jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, Andrew Morton,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	agk-H+wXaHxf7aLQT0dZR+AlfA, righi.andrea-Re5JQEeQqe8AvxtiuMwx3w

* Fabio Checconi <fchecconi-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> [2009-05-06 12:20:30]:

> Hi,
> 
> > From: Balbir Singh <balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> > Date: Wed, May 06, 2009 09:12:54AM +0530
> >
> > * Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> [2009-05-06 00:20:49]:
> > 
> > > On Tue, 2009-05-05 at 13:24 -0700, Andrew Morton wrote:
> > > > On Tue,  5 May 2009 15:58:27 -0400
> > > > Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > > 
> > > > > 
> > > > > Hi All,
> > > > > 
> > > > > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > > > > ...
> > > > > Currently primarily two other IO controller proposals are out there.
> > > > > 
> > > > > dm-ioband
> > > > > ---------
> > > > > This patch set is from Ryo Tsuruta from valinux.
> > > > > ...
> > > > > IO-throttling
> > > > > -------------
> > > > > This patch set is from Andrea Righi provides max bandwidth controller.
> > > > 
> > > > I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> > > > 
> > > > Seriously, how are we to resolve this?  We could lock me in a room and
> > > > cmoe back in 15 days, but there's no reason to believe that I'd emerge
> > > > with the best answer.
> > > > 
> > > > I tend to think that a cgroup-based controller is the way to go. 
> > > > Anything else will need to be wired up to cgroups _anyway_, and that
> > > > might end up messy.
> > > 
> > > FWIW I subscribe to the io-scheduler faith as opposed to the
> > > device-mapper cult ;-)
> > > 
> > > Also, I don't think a simple throttle will be very useful, a more mature
> > > solution should cater to more use cases.
> > >
> > 
> > I tend to agree, unless Andrea can prove us wrong. I don't think
> > throttling a task (not letting it consume CPU, memory when its IO
> > quota is exceeded) is a good idea. I've asked that question to Andrea
> > a few times, but got no response.
> >  
> 
>   from what I can see, the principle used by io-throttling is not too
> different to what happens when bandwidth differentiation with synchronous
> access patterns is achieved using idling at the io scheduler level.
> 
> When an io scheduler anticipates requests from a task/cgroup, all the
> other tasks with pending (synchronous) requests are in fact blocked, and
> the fact that the task being anticipated is allowed to submit additional
> io while they remain blocked is what creates the bandwidth differentiation
> among them.
> 
> Of course there are many differences, in particular related to the
> latencies introduced by the two mechanisms, the granularity they use to
> allocate disk service, and to what throttling and proportional share io
> scheduling can or cannot guarantee, but FWIK both of them rely on
> blocking tasks to create bandwidth differentiation.

My concern stems from the fact that in the case in this case we might
throttle all the tasks in the group.. no? I'll take a closer look.


-- 
	Balbir

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06 10:20         ` Fabio Checconi
@ 2009-05-06 17:10             ` Balbir Singh
       [not found]           ` <20090506102030.GB20544-f9ZlEuEWxVeACYmtYXMKmw@public.gmane.org>
  1 sibling, 0 replies; 97+ messages in thread
From: Balbir Singh @ 2009-05-06 17:10 UTC (permalink / raw)
  To: Fabio Checconi
  Cc: dhaval, snitzer, dm-devel, jens.axboe, agk, paolo.valente,
	fernando, jmoyer, righi.andrea, containers, linux-kernel,
	Andrew Morton

* Fabio Checconi <fchecconi@gmail.com> [2009-05-06 12:20:30]:

> Hi,
> 
> > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > Date: Wed, May 06, 2009 09:12:54AM +0530
> >
> > * Peter Zijlstra <peterz@infradead.org> [2009-05-06 00:20:49]:
> > 
> > > On Tue, 2009-05-05 at 13:24 -0700, Andrew Morton wrote:
> > > > On Tue,  5 May 2009 15:58:27 -0400
> > > > Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > 
> > > > > 
> > > > > Hi All,
> > > > > 
> > > > > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > > > > ...
> > > > > Currently primarily two other IO controller proposals are out there.
> > > > > 
> > > > > dm-ioband
> > > > > ---------
> > > > > This patch set is from Ryo Tsuruta from valinux.
> > > > > ...
> > > > > IO-throttling
> > > > > -------------
> > > > > This patch set is from Andrea Righi provides max bandwidth controller.
> > > > 
> > > > I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> > > > 
> > > > Seriously, how are we to resolve this?  We could lock me in a room and
> > > > cmoe back in 15 days, but there's no reason to believe that I'd emerge
> > > > with the best answer.
> > > > 
> > > > I tend to think that a cgroup-based controller is the way to go. 
> > > > Anything else will need to be wired up to cgroups _anyway_, and that
> > > > might end up messy.
> > > 
> > > FWIW I subscribe to the io-scheduler faith as opposed to the
> > > device-mapper cult ;-)
> > > 
> > > Also, I don't think a simple throttle will be very useful, a more mature
> > > solution should cater to more use cases.
> > >
> > 
> > I tend to agree, unless Andrea can prove us wrong. I don't think
> > throttling a task (not letting it consume CPU, memory when its IO
> > quota is exceeded) is a good idea. I've asked that question to Andrea
> > a few times, but got no response.
> >  
> 
>   from what I can see, the principle used by io-throttling is not too
> different to what happens when bandwidth differentiation with synchronous
> access patterns is achieved using idling at the io scheduler level.
> 
> When an io scheduler anticipates requests from a task/cgroup, all the
> other tasks with pending (synchronous) requests are in fact blocked, and
> the fact that the task being anticipated is allowed to submit additional
> io while they remain blocked is what creates the bandwidth differentiation
> among them.
> 
> Of course there are many differences, in particular related to the
> latencies introduced by the two mechanisms, the granularity they use to
> allocate disk service, and to what throttling and proportional share io
> scheduling can or cannot guarantee, but FWIK both of them rely on
> blocking tasks to create bandwidth differentiation.

My concern stems from the fact that in the case in this case we might
throttle all the tasks in the group.. no? I'll take a closer look.


-- 
	Balbir

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
@ 2009-05-06 17:10             ` Balbir Singh
  0 siblings, 0 replies; 97+ messages in thread
From: Balbir Singh @ 2009-05-06 17:10 UTC (permalink / raw)
  To: Fabio Checconi
  Cc: paolo.valente, dhaval, snitzer, fernando, jmoyer, linux-kernel,
	dm-devel, jens.axboe, Andrew Morton, containers, agk,
	righi.andrea

* Fabio Checconi <fchecconi@gmail.com> [2009-05-06 12:20:30]:

> Hi,
> 
> > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > Date: Wed, May 06, 2009 09:12:54AM +0530
> >
> > * Peter Zijlstra <peterz@infradead.org> [2009-05-06 00:20:49]:
> > 
> > > On Tue, 2009-05-05 at 13:24 -0700, Andrew Morton wrote:
> > > > On Tue,  5 May 2009 15:58:27 -0400
> > > > Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > 
> > > > > 
> > > > > Hi All,
> > > > > 
> > > > > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > > > > ...
> > > > > Currently primarily two other IO controller proposals are out there.
> > > > > 
> > > > > dm-ioband
> > > > > ---------
> > > > > This patch set is from Ryo Tsuruta from valinux.
> > > > > ...
> > > > > IO-throttling
> > > > > -------------
> > > > > This patch set is from Andrea Righi provides max bandwidth controller.
> > > > 
> > > > I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> > > > 
> > > > Seriously, how are we to resolve this?  We could lock me in a room and
> > > > cmoe back in 15 days, but there's no reason to believe that I'd emerge
> > > > with the best answer.
> > > > 
> > > > I tend to think that a cgroup-based controller is the way to go. 
> > > > Anything else will need to be wired up to cgroups _anyway_, and that
> > > > might end up messy.
> > > 
> > > FWIW I subscribe to the io-scheduler faith as opposed to the
> > > device-mapper cult ;-)
> > > 
> > > Also, I don't think a simple throttle will be very useful, a more mature
> > > solution should cater to more use cases.
> > >
> > 
> > I tend to agree, unless Andrea can prove us wrong. I don't think
> > throttling a task (not letting it consume CPU, memory when its IO
> > quota is exceeded) is a good idea. I've asked that question to Andrea
> > a few times, but got no response.
> >  
> 
>   from what I can see, the principle used by io-throttling is not too
> different to what happens when bandwidth differentiation with synchronous
> access patterns is achieved using idling at the io scheduler level.
> 
> When an io scheduler anticipates requests from a task/cgroup, all the
> other tasks with pending (synchronous) requests are in fact blocked, and
> the fact that the task being anticipated is allowed to submit additional
> io while they remain blocked is what creates the bandwidth differentiation
> among them.
> 
> Of course there are many differences, in particular related to the
> latencies introduced by the two mechanisms, the granularity they use to
> allocate disk service, and to what throttling and proportional share io
> scheduling can or cannot guarantee, but FWIK both of them rely on
> blocking tasks to create bandwidth differentiation.

My concern stems from the fact that in the case in this case we might
throttle all the tasks in the group.. no? I'll take a closer look.


-- 
	Balbir

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06  8:11 ` Gui Jianfeng
@ 2009-05-06 16:10       ` Vivek Goyal
  0 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-06 16:10 UTC (permalink / raw)
  To: Gui Jianfeng
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	righi.andrea-Re5JQEeQqe8AvxtiuMwx3w

On Wed, May 06, 2009 at 04:11:05PM +0800, Gui Jianfeng wrote:
> Vivek Goyal wrote:
> > Hi All,
> > 
> > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > First version of the patches was posted here.
> 
> Hi Vivek,
> 
> I did some simple test for V2, and triggered an kernel panic.
> The following script can reproduce this bug. It seems that the cgroup
> is already removed, but IO Controller still try to access into it.
> 

Hi Gui,

Thanks for the report. I use cgroup_path() for debugging. I guess that
cgroup_path() was passed null cgrp pointer that's why it crashed.

If yes, then it is strange though. I call cgroup_path() only after
grabbing a refenrece to css object. (I am assuming that if I have a valid
reference to css object then css->cgrp can't be null).

Anyway, can you please try out following patch and see if it fixes your
crash.

---
 block/elevator-fq.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Index: linux11/block/elevator-fq.c
===================================================================
--- linux11.orig/block/elevator-fq.c	2009-05-05 15:38:06.000000000 -0400
+++ linux11/block/elevator-fq.c	2009-05-06 11:55:47.000000000 -0400
@@ -125,6 +125,9 @@ static void io_group_path(struct io_grou
 	unsigned short id = iog->iocg_id;
 	struct cgroup_subsys_state *css;
 
+	/* For error case */
+	buf[0] = '\0';
+
 	rcu_read_lock();
 
 	if (!id)
@@ -137,15 +140,12 @@ static void io_group_path(struct io_grou
 	if (!css_tryget(css))
 		goto out;
 
-	cgroup_path(css->cgroup, buf, buflen);
+	if (css->cgroup)
+		cgroup_path(css->cgroup, buf, buflen);
 
 	css_put(css);
-
-	rcu_read_unlock();
-	return;
 out:
 	rcu_read_unlock();
-	buf[0] = '\0';
 	return;
 }
 #endif

BTW, I tried following equivalent script and I can't see the crash on 
my system. Are you able to hit it regularly?

Instead of killing the tasks I also tried moving the tasks into root cgroup
and then deleting test1 and test2 groups, that also did not produce any crash.
(Hit a different bug though after 5-6 attempts :-)

As I mentioned in the patchset, currently we do have issues with group
refcounting and cgroup/group going away. Hopefully in next version they
all should be fixed up. But still, it is nice to hear back...


#!/bin/sh

../mount-cgroups.sh

# Mount disk
mount /dev/sdd1 /mnt/sdd1
mount /dev/sdd2 /mnt/sdd2

echo 1 > /proc/sys/vm/drop_caches

dd if=/dev/zero of=/mnt/sdd1/testzerofile1 bs=4K count=524288 &
pid1=$!
echo $pid1 > /cgroup/bfqio/test1/tasks
echo "Launched $pid1"

dd if=/dev/zero of=/mnt/sdd2/testzerofile1 bs=4K count=524288 &
pid2=$!
echo $pid2 > /cgroup/bfqio/test2/tasks
echo "Launched $pid2"

#echo "sleeping for 10 seconds"
#sleep 10
#echo "Killing pid $pid1"
#kill -9 $pid1
#echo "Killing pid $pid2"
#kill -9 $pid2
#sleep 5

echo "sleeping for 10 seconds"
sleep 10

echo "moving pid $pid1 to root"
echo $pid1 > /cgroup/bfqio/tasks
echo "moving pid $pid2 to root"
echo $pid2 > /cgroup/bfqio/tasks

echo ======
cat /cgroup/bfqio/test1/io.disk_time
cat /cgroup/bfqio/test2/io.disk_time

echo ======
cat /cgroup/bfqio/test1/io.disk_sectors
cat /cgroup/bfqio/test2/io.disk_sectors

echo "Removing test1"
rmdir /cgroup/bfqio/test1
echo "Removing test2"
rmdir /cgroup/bfqio/test2

echo "Unmounting /cgroup"
umount /cgroup/bfqio
echo "Done"
#rmdir /cgroup



> #!/bin/sh
> echo 1 > /proc/sys/vm/drop_caches
> mkdir /cgroup 2> /dev/null
> mount -t cgroup -o io,blkio io /cgroup
> mkdir /cgroup/test1
> mkdir /cgroup/test2
> echo 100 > /cgroup/test1/io.weight
> echo 500 > /cgroup/test2/io.weight
> 
> ./rwio -w -f 2000M.1 &  //do async write
> pid1=$!
> echo $pid1 > /cgroup/test1/tasks
> 
> ./rwio -w -f 2000M.2 &
> pid2=$!
> echo $pid2 > /cgroup/test2/tasks
> 
> sleep 10
> kill -9 $pid1
> kill -9 $pid2
> sleep 1
> 
> echo ======
> cat /cgroup/test1/io.disk_time
> cat /cgroup/test2/io.disk_time
> 
> echo ======
> cat /cgroup/test1/io.disk_sectors
> cat /cgroup/test2/io.disk_sectors
> 
> rmdir /cgroup/test1
> rmdir /cgroup/test2
> umount /cgroup
> rmdir /cgroup
> 
> 
> BUG: unable to handle kernel NULL pointer dereferec
> IP: [<c0448c24>] cgroup_path+0xc/0x97
> *pde = 64d2d067
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/block/md0/range
> Modules linked in: ipv6 cpufreq_ondemand acpi_cpufreq dm_mirror dm_multipath sbd
> Pid: 132, comm: kblockd/0 Not tainted (2.6.30-rc4-Vivek-V2 #1) Veriton M460
> EIP: 0060:[<c0448c24>] EFLAGS: 00010086 CPU: 0
> EIP is at cgroup_path+0xc/0x97
> EAX: 00000100 EBX: f60adca0 ECX: 00000080 EDX: f709fe28
> ESI: f60adca8 EDI: f709fe28 EBP: 00000100 ESP: f709fdf0
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process kblockd/0 (pid: 132, ti=f709f000 task=f70a8f60 task.ti=f709f000)
> Stack:
>  f709fe28 f68c5698 f60adca0 f60adca8 f709fe28 f68de801 c04f5389 00000080
>  f68de800 f7094d0c f6a29118 f68bde00 00000016 c04f5e8d c04f5340 00000080
>  c0579fec f68c5e94 00000082 c042edb4 f68c5fd4 f68c5fd4 c080b520 00000082
> Call Trace:
>  [<c04f5389>] ? io_group_path+0x6d/0x89
>  [<c04f5e8d>] ? elv_ioq_served+0x2a/0x7a
>  [<c04f5340>] ? io_group_path+0x24/0x89
>  [<c0579fec>] ? ide_build_dmatable+0xda/0x130
>  [<c042edb4>] ? lock_timer_base+0x19/0x35
>  [<c042ef0c>] ? mod_timer+0x9f/0xa8
>  [<c04fdee6>] ? __delay+0x6/0x7
>  [<c057364f>] ? ide_execute_command+0x5d/0x71
>  [<c0579d4f>] ? ide_dma_intr+0x0/0x99
>  [<c0576496>] ? do_rw_taskfile+0x201/0x213
>  [<c04f6daa>] ? __elv_ioq_slice_expired+0x212/0x25e
>  [<c04f7e15>] ? elv_fq_select_ioq+0x121/0x184
>  [<c04e8a2f>] ? elv_select_sched_queue+0x1e/0x2e
>  [<c04f439c>] ? cfq_dispatch_requests+0xaa/0x238
>  [<c04e7e67>] ? elv_next_request+0x152/0x15f
>  [<c04240c2>] ? dequeue_task_fair+0x16/0x2d
>  [<c0572f49>] ? do_ide_request+0x10f/0x4c8
>  [<c0642d44>] ? __schedule+0x845/0x893
>  [<c042edb4>] ? lock_timer_base+0x19/0x35
>  [<c042f1be>] ? del_timer+0x41/0x47
>  [<c04ea5c6>] ? __generic_unplug_device+0x23/0x25
>  [<c04f530d>] ? elv_kick_queue+0x19/0x28
>  [<c0434b77>] ? worker_thread+0x11f/0x19e
>  [<c04f52f4>] ? elv_kick_queue+0x0/0x28
>  [<c0436ffc>] ? autoremove_wake_function+0x0/0x2d
>  [<c0434a58>] ? worker_thread+0x0/0x19e
>  [<c0436f3b>] ? kthread+0x42/0x67
>  [<c0436ef9>] ? kthread+0x0/0x67
>  [<c040326f>] ? kernel_thread_helper+0x7/0x10
> Code: c0 84 c0 74 0e 89 d8 e8 7c e9 fd ff eb 05 bf fd ff ff ff e8 c0 ea ff ff 8
> EIP: [<c0448c24>] cgroup_path+0xc/0x97 SS:ESP 0068:f709fdf0
> CR2: 000000000000011c
> ---[ end trace 2d4bc25a2c33e394 ]---
> 
> -- 
> Regards
> Gui Jianfeng
> 

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
@ 2009-05-06 16:10       ` Vivek Goyal
  0 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-06 16:10 UTC (permalink / raw)
  To: Gui Jianfeng
  Cc: nauman, dpshah, lizf, mikew, fchecconi, paolo.valente,
	jens.axboe, ryov, fernando, s-uchida, taka, jmoyer, dhaval,
	balbir, linux-kernel, containers, righi.andrea, agk, dm-devel,
	snitzer, m-ikeda, akpm

On Wed, May 06, 2009 at 04:11:05PM +0800, Gui Jianfeng wrote:
> Vivek Goyal wrote:
> > Hi All,
> > 
> > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > First version of the patches was posted here.
> 
> Hi Vivek,
> 
> I did some simple test for V2, and triggered an kernel panic.
> The following script can reproduce this bug. It seems that the cgroup
> is already removed, but IO Controller still try to access into it.
> 

Hi Gui,

Thanks for the report. I use cgroup_path() for debugging. I guess that
cgroup_path() was passed null cgrp pointer that's why it crashed.

If yes, then it is strange though. I call cgroup_path() only after
grabbing a refenrece to css object. (I am assuming that if I have a valid
reference to css object then css->cgrp can't be null).

Anyway, can you please try out following patch and see if it fixes your
crash.

---
 block/elevator-fq.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Index: linux11/block/elevator-fq.c
===================================================================
--- linux11.orig/block/elevator-fq.c	2009-05-05 15:38:06.000000000 -0400
+++ linux11/block/elevator-fq.c	2009-05-06 11:55:47.000000000 -0400
@@ -125,6 +125,9 @@ static void io_group_path(struct io_grou
 	unsigned short id = iog->iocg_id;
 	struct cgroup_subsys_state *css;
 
+	/* For error case */
+	buf[0] = '\0';
+
 	rcu_read_lock();
 
 	if (!id)
@@ -137,15 +140,12 @@ static void io_group_path(struct io_grou
 	if (!css_tryget(css))
 		goto out;
 
-	cgroup_path(css->cgroup, buf, buflen);
+	if (css->cgroup)
+		cgroup_path(css->cgroup, buf, buflen);
 
 	css_put(css);
-
-	rcu_read_unlock();
-	return;
 out:
 	rcu_read_unlock();
-	buf[0] = '\0';
 	return;
 }
 #endif

BTW, I tried following equivalent script and I can't see the crash on 
my system. Are you able to hit it regularly?

Instead of killing the tasks I also tried moving the tasks into root cgroup
and then deleting test1 and test2 groups, that also did not produce any crash.
(Hit a different bug though after 5-6 attempts :-)

As I mentioned in the patchset, currently we do have issues with group
refcounting and cgroup/group going away. Hopefully in next version they
all should be fixed up. But still, it is nice to hear back...


#!/bin/sh

../mount-cgroups.sh

# Mount disk
mount /dev/sdd1 /mnt/sdd1
mount /dev/sdd2 /mnt/sdd2

echo 1 > /proc/sys/vm/drop_caches

dd if=/dev/zero of=/mnt/sdd1/testzerofile1 bs=4K count=524288 &
pid1=$!
echo $pid1 > /cgroup/bfqio/test1/tasks
echo "Launched $pid1"

dd if=/dev/zero of=/mnt/sdd2/testzerofile1 bs=4K count=524288 &
pid2=$!
echo $pid2 > /cgroup/bfqio/test2/tasks
echo "Launched $pid2"

#echo "sleeping for 10 seconds"
#sleep 10
#echo "Killing pid $pid1"
#kill -9 $pid1
#echo "Killing pid $pid2"
#kill -9 $pid2
#sleep 5

echo "sleeping for 10 seconds"
sleep 10

echo "moving pid $pid1 to root"
echo $pid1 > /cgroup/bfqio/tasks
echo "moving pid $pid2 to root"
echo $pid2 > /cgroup/bfqio/tasks

echo ======
cat /cgroup/bfqio/test1/io.disk_time
cat /cgroup/bfqio/test2/io.disk_time

echo ======
cat /cgroup/bfqio/test1/io.disk_sectors
cat /cgroup/bfqio/test2/io.disk_sectors

echo "Removing test1"
rmdir /cgroup/bfqio/test1
echo "Removing test2"
rmdir /cgroup/bfqio/test2

echo "Unmounting /cgroup"
umount /cgroup/bfqio
echo "Done"
#rmdir /cgroup



> #!/bin/sh
> echo 1 > /proc/sys/vm/drop_caches
> mkdir /cgroup 2> /dev/null
> mount -t cgroup -o io,blkio io /cgroup
> mkdir /cgroup/test1
> mkdir /cgroup/test2
> echo 100 > /cgroup/test1/io.weight
> echo 500 > /cgroup/test2/io.weight
> 
> ./rwio -w -f 2000M.1 &  //do async write
> pid1=$!
> echo $pid1 > /cgroup/test1/tasks
> 
> ./rwio -w -f 2000M.2 &
> pid2=$!
> echo $pid2 > /cgroup/test2/tasks
> 
> sleep 10
> kill -9 $pid1
> kill -9 $pid2
> sleep 1
> 
> echo ======
> cat /cgroup/test1/io.disk_time
> cat /cgroup/test2/io.disk_time
> 
> echo ======
> cat /cgroup/test1/io.disk_sectors
> cat /cgroup/test2/io.disk_sectors
> 
> rmdir /cgroup/test1
> rmdir /cgroup/test2
> umount /cgroup
> rmdir /cgroup
> 
> 
> BUG: unable to handle kernel NULL pointer dereferec
> IP: [<c0448c24>] cgroup_path+0xc/0x97
> *pde = 64d2d067
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/block/md0/range
> Modules linked in: ipv6 cpufreq_ondemand acpi_cpufreq dm_mirror dm_multipath sbd
> Pid: 132, comm: kblockd/0 Not tainted (2.6.30-rc4-Vivek-V2 #1) Veriton M460
> EIP: 0060:[<c0448c24>] EFLAGS: 00010086 CPU: 0
> EIP is at cgroup_path+0xc/0x97
> EAX: 00000100 EBX: f60adca0 ECX: 00000080 EDX: f709fe28
> ESI: f60adca8 EDI: f709fe28 EBP: 00000100 ESP: f709fdf0
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process kblockd/0 (pid: 132, ti=f709f000 task=f70a8f60 task.ti=f709f000)
> Stack:
>  f709fe28 f68c5698 f60adca0 f60adca8 f709fe28 f68de801 c04f5389 00000080
>  f68de800 f7094d0c f6a29118 f68bde00 00000016 c04f5e8d c04f5340 00000080
>  c0579fec f68c5e94 00000082 c042edb4 f68c5fd4 f68c5fd4 c080b520 00000082
> Call Trace:
>  [<c04f5389>] ? io_group_path+0x6d/0x89
>  [<c04f5e8d>] ? elv_ioq_served+0x2a/0x7a
>  [<c04f5340>] ? io_group_path+0x24/0x89
>  [<c0579fec>] ? ide_build_dmatable+0xda/0x130
>  [<c042edb4>] ? lock_timer_base+0x19/0x35
>  [<c042ef0c>] ? mod_timer+0x9f/0xa8
>  [<c04fdee6>] ? __delay+0x6/0x7
>  [<c057364f>] ? ide_execute_command+0x5d/0x71
>  [<c0579d4f>] ? ide_dma_intr+0x0/0x99
>  [<c0576496>] ? do_rw_taskfile+0x201/0x213
>  [<c04f6daa>] ? __elv_ioq_slice_expired+0x212/0x25e
>  [<c04f7e15>] ? elv_fq_select_ioq+0x121/0x184
>  [<c04e8a2f>] ? elv_select_sched_queue+0x1e/0x2e
>  [<c04f439c>] ? cfq_dispatch_requests+0xaa/0x238
>  [<c04e7e67>] ? elv_next_request+0x152/0x15f
>  [<c04240c2>] ? dequeue_task_fair+0x16/0x2d
>  [<c0572f49>] ? do_ide_request+0x10f/0x4c8
>  [<c0642d44>] ? __schedule+0x845/0x893
>  [<c042edb4>] ? lock_timer_base+0x19/0x35
>  [<c042f1be>] ? del_timer+0x41/0x47
>  [<c04ea5c6>] ? __generic_unplug_device+0x23/0x25
>  [<c04f530d>] ? elv_kick_queue+0x19/0x28
>  [<c0434b77>] ? worker_thread+0x11f/0x19e
>  [<c04f52f4>] ? elv_kick_queue+0x0/0x28
>  [<c0436ffc>] ? autoremove_wake_function+0x0/0x2d
>  [<c0434a58>] ? worker_thread+0x0/0x19e
>  [<c0436f3b>] ? kthread+0x42/0x67
>  [<c0436ef9>] ? kthread+0x0/0x67
>  [<c040326f>] ? kernel_thread_helper+0x7/0x10
> Code: c0 84 c0 74 0e 89 d8 e8 7c e9 fd ff eb 05 bf fd ff ff ff e8 c0 ea ff ff 8
> EIP: [<c0448c24>] cgroup_path+0xc/0x97 SS:ESP 0068:f709fdf0
> CR2: 000000000000011c
> ---[ end trace 2d4bc25a2c33e394 ]---
> 
> -- 
> Regards
> Gui Jianfeng
> 

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]       ` <20090506034118.GC4416-SINUvgVNF2CyUtPGxGje5AC/G2K4zDHf@public.gmane.org>
@ 2009-05-06 13:28         ` Vivek Goyal
  0 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-06 13:28 UTC (permalink / raw)
  To: Balbir Singh
  Cc: paolo.valente-rcYM44yAMweonA0d6jMUrA,
	dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, fernando-gVGce1chcLdL9jVzuh4AOg,
	jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, Andrew Morton,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	agk-H+wXaHxf7aLQT0dZR+AlfA, righi.andrea-Re5JQEeQqe8AvxtiuMwx3w

On Wed, May 06, 2009 at 09:11:18AM +0530, Balbir Singh wrote:
> * Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> [2009-05-05 13:24:41]:
> 
> > On Tue,  5 May 2009 15:58:27 -0400
> > Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > 
> > > 
> > > Hi All,
> > > 
> > > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > > ...
> > > Currently primarily two other IO controller proposals are out there.
> > > 
> > > dm-ioband
> > > ---------
> > > This patch set is from Ryo Tsuruta from valinux.
> > > ...
> > > IO-throttling
> > > -------------
> > > This patch set is from Andrea Righi provides max bandwidth controller.
> > 
> > I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> > 
> > Seriously, how are we to resolve this?  We could lock me in a room and
> > cmoe back in 15 days, but there's no reason to believe that I'd emerge
> > with the best answer.
> >
> 
> We are planning an IO mini-summit prior to the kernel summit
> (hopefully we'll all be able to attend and decide).

Hi Balbir,

Mini summit is still few months away. I think a better idea would be to
try to thrash out the details here on lkml and try to reach to some
conclusion.

Its a complicated problem and there are no simple and easy answers. If we
can't reach a conclusion here, I am skeptical that mini summit will serve
that purpose.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06  3:41     ` Balbir Singh
@ 2009-05-06 13:28         ` Vivek Goyal
       [not found]       ` <20090506034118.GC4416-SINUvgVNF2CyUtPGxGje5AC/G2K4zDHf@public.gmane.org>
  1 sibling, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-06 13:28 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Andrew Morton, dhaval, snitzer, dm-devel, jens.axboe, agk,
	paolo.valente, fernando, jmoyer, fchecconi, containers,
	linux-kernel, righi.andrea

On Wed, May 06, 2009 at 09:11:18AM +0530, Balbir Singh wrote:
> * Andrew Morton <akpm@linux-foundation.org> [2009-05-05 13:24:41]:
> 
> > On Tue,  5 May 2009 15:58:27 -0400
> > Vivek Goyal <vgoyal@redhat.com> wrote:
> > 
> > > 
> > > Hi All,
> > > 
> > > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > > ...
> > > Currently primarily two other IO controller proposals are out there.
> > > 
> > > dm-ioband
> > > ---------
> > > This patch set is from Ryo Tsuruta from valinux.
> > > ...
> > > IO-throttling
> > > -------------
> > > This patch set is from Andrea Righi provides max bandwidth controller.
> > 
> > I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> > 
> > Seriously, how are we to resolve this?  We could lock me in a room and
> > cmoe back in 15 days, but there's no reason to believe that I'd emerge
> > with the best answer.
> >
> 
> We are planning an IO mini-summit prior to the kernel summit
> (hopefully we'll all be able to attend and decide).

Hi Balbir,

Mini summit is still few months away. I think a better idea would be to
try to thrash out the details here on lkml and try to reach to some
conclusion.

Its a complicated problem and there are no simple and easy answers. If we
can't reach a conclusion here, I am skeptical that mini summit will serve
that purpose.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
@ 2009-05-06 13:28         ` Vivek Goyal
  0 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-06 13:28 UTC (permalink / raw)
  To: Balbir Singh
  Cc: paolo.valente, dhaval, snitzer, fernando, jmoyer, linux-kernel,
	fchecconi, dm-devel, jens.axboe, Andrew Morton, containers, agk,
	righi.andrea

On Wed, May 06, 2009 at 09:11:18AM +0530, Balbir Singh wrote:
> * Andrew Morton <akpm@linux-foundation.org> [2009-05-05 13:24:41]:
> 
> > On Tue,  5 May 2009 15:58:27 -0400
> > Vivek Goyal <vgoyal@redhat.com> wrote:
> > 
> > > 
> > > Hi All,
> > > 
> > > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > > ...
> > > Currently primarily two other IO controller proposals are out there.
> > > 
> > > dm-ioband
> > > ---------
> > > This patch set is from Ryo Tsuruta from valinux.
> > > ...
> > > IO-throttling
> > > -------------
> > > This patch set is from Andrea Righi provides max bandwidth controller.
> > 
> > I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> > 
> > Seriously, how are we to resolve this?  We could lock me in a room and
> > cmoe back in 15 days, but there's no reason to believe that I'd emerge
> > with the best answer.
> >
> 
> We are planning an IO mini-summit prior to the kernel summit
> (hopefully we'll all be able to attend and decide).

Hi Balbir,

Mini summit is still few months away. I think a better idea would be to
try to thrash out the details here on lkml and try to reach to some
conclusion.

Its a complicated problem and there are no simple and easy answers. If we
can't reach a conclusion here, I am skeptical that mini summit will serve
that purpose.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]         ` <20090506034254.GD4416-SINUvgVNF2CyUtPGxGje5AC/G2K4zDHf@public.gmane.org>
@ 2009-05-06 10:20           ` Fabio Checconi
  2009-05-06 18:47           ` Divyesh Shah
  2009-05-06 20:42           ` Andrea Righi
  2 siblings, 0 replies; 97+ messages in thread
From: Fabio Checconi @ 2009-05-06 10:20 UTC (permalink / raw)
  To: Balbir Singh
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	righi.andrea-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

Hi,

> From: Balbir Singh <balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> Date: Wed, May 06, 2009 09:12:54AM +0530
>
> * Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> [2009-05-06 00:20:49]:
> 
> > On Tue, 2009-05-05 at 13:24 -0700, Andrew Morton wrote:
> > > On Tue,  5 May 2009 15:58:27 -0400
> > > Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > 
> > > > 
> > > > Hi All,
> > > > 
> > > > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > > > ...
> > > > Currently primarily two other IO controller proposals are out there.
> > > > 
> > > > dm-ioband
> > > > ---------
> > > > This patch set is from Ryo Tsuruta from valinux.
> > > > ...
> > > > IO-throttling
> > > > -------------
> > > > This patch set is from Andrea Righi provides max bandwidth controller.
> > > 
> > > I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> > > 
> > > Seriously, how are we to resolve this?  We could lock me in a room and
> > > cmoe back in 15 days, but there's no reason to believe that I'd emerge
> > > with the best answer.
> > > 
> > > I tend to think that a cgroup-based controller is the way to go. 
> > > Anything else will need to be wired up to cgroups _anyway_, and that
> > > might end up messy.
> > 
> > FWIW I subscribe to the io-scheduler faith as opposed to the
> > device-mapper cult ;-)
> > 
> > Also, I don't think a simple throttle will be very useful, a more mature
> > solution should cater to more use cases.
> >
> 
> I tend to agree, unless Andrea can prove us wrong. I don't think
> throttling a task (not letting it consume CPU, memory when its IO
> quota is exceeded) is a good idea. I've asked that question to Andrea
> a few times, but got no response.
>  

  from what I can see, the principle used by io-throttling is not too
different to what happens when bandwidth differentiation with synchronous
access patterns is achieved using idling at the io scheduler level.

When an io scheduler anticipates requests from a task/cgroup, all the
other tasks with pending (synchronous) requests are in fact blocked, and
the fact that the task being anticipated is allowed to submit additional
io while they remain blocked is what creates the bandwidth differentiation
among them.

Of course there are many differences, in particular related to the
latencies introduced by the two mechanisms, the granularity they use to
allocate disk service, and to what throttling and proportional share io
scheduling can or cannot guarantee, but FWIK both of them rely on
blocking tasks to create bandwidth differentiation.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-06  3:42       ` Balbir Singh
@ 2009-05-06 10:20         ` Fabio Checconi
  2009-05-06 17:10             ` Balbir Singh
       [not found]           ` <20090506102030.GB20544-f9ZlEuEWxVeACYmtYXMKmw@public.gmane.org>
  2009-05-06 18:47         ` Divyesh Shah
                           ` (2 subsequent siblings)
  3 siblings, 2 replies; 97+ messages in thread
From: Fabio Checconi @ 2009-05-06 10:20 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Peter Zijlstra, Andrew Morton, Vivek Goyal, nauman, dpshah, lizf,
	mikew, paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, linux-kernel, containers,
	righi.andrea, agk, dm-devel, snitzer, m-ikeda

Hi,

> From: Balbir Singh <balbir@linux.vnet.ibm.com>
> Date: Wed, May 06, 2009 09:12:54AM +0530
>
> * Peter Zijlstra <peterz@infradead.org> [2009-05-06 00:20:49]:
> 
> > On Tue, 2009-05-05 at 13:24 -0700, Andrew Morton wrote:
> > > On Tue,  5 May 2009 15:58:27 -0400
> > > Vivek Goyal <vgoyal@redhat.com> wrote:
> > > 
> > > > 
> > > > Hi All,
> > > > 
> > > > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > > > ...
> > > > Currently primarily two other IO controller proposals are out there.
> > > > 
> > > > dm-ioband
> > > > ---------
> > > > This patch set is from Ryo Tsuruta from valinux.
> > > > ...
> > > > IO-throttling
> > > > -------------
> > > > This patch set is from Andrea Righi provides max bandwidth controller.
> > > 
> > > I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> > > 
> > > Seriously, how are we to resolve this?  We could lock me in a room and
> > > cmoe back in 15 days, but there's no reason to believe that I'd emerge
> > > with the best answer.
> > > 
> > > I tend to think that a cgroup-based controller is the way to go. 
> > > Anything else will need to be wired up to cgroups _anyway_, and that
> > > might end up messy.
> > 
> > FWIW I subscribe to the io-scheduler faith as opposed to the
> > device-mapper cult ;-)
> > 
> > Also, I don't think a simple throttle will be very useful, a more mature
> > solution should cater to more use cases.
> >
> 
> I tend to agree, unless Andrea can prove us wrong. I don't think
> throttling a task (not letting it consume CPU, memory when its IO
> quota is exceeded) is a good idea. I've asked that question to Andrea
> a few times, but got no response.
>  

  from what I can see, the principle used by io-throttling is not too
different to what happens when bandwidth differentiation with synchronous
access patterns is achieved using idling at the io scheduler level.

When an io scheduler anticipates requests from a task/cgroup, all the
other tasks with pending (synchronous) requests are in fact blocked, and
the fact that the task being anticipated is allowed to submit additional
io while they remain blocked is what creates the bandwidth differentiation
among them.

Of course there are many differences, in particular related to the
latencies introduced by the two mechanisms, the granularity they use to
allocate disk service, and to what throttling and proportional share io
scheduling can or cannot guarantee, but FWIK both of them rely on
blocking tasks to create bandwidth differentiation.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found] ` <1241553525-28095-1-git-send-email-vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2009-05-05 20:24     ` Andrew Morton
@ 2009-05-06  8:11   ` Gui Jianfeng
  1 sibling, 0 replies; 97+ messages in thread
From: Gui Jianfeng @ 2009-05-06  8:11 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	righi.andrea-Re5JQEeQqe8AvxtiuMwx3w

Vivek Goyal wrote:
> Hi All,
> 
> Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> First version of the patches was posted here.

Hi Vivek,

I did some simple test for V2, and triggered an kernel panic.
The following script can reproduce this bug. It seems that the cgroup
is already removed, but IO Controller still try to access into it.

#!/bin/sh
echo 1 > /proc/sys/vm/drop_caches
mkdir /cgroup 2> /dev/null
mount -t cgroup -o io,blkio io /cgroup
mkdir /cgroup/test1
mkdir /cgroup/test2
echo 100 > /cgroup/test1/io.weight
echo 500 > /cgroup/test2/io.weight

./rwio -w -f 2000M.1 &  //do async write
pid1=$!
echo $pid1 > /cgroup/test1/tasks

./rwio -w -f 2000M.2 &
pid2=$!
echo $pid2 > /cgroup/test2/tasks

sleep 10
kill -9 $pid1
kill -9 $pid2
sleep 1

echo ======
cat /cgroup/test1/io.disk_time
cat /cgroup/test2/io.disk_time

echo ======
cat /cgroup/test1/io.disk_sectors
cat /cgroup/test2/io.disk_sectors

rmdir /cgroup/test1
rmdir /cgroup/test2
umount /cgroup
rmdir /cgroup


BUG: unable to handle kernel NULL pointer dereferec
IP: [<c0448c24>] cgroup_path+0xc/0x97
*pde = 64d2d067
Oops: 0000 [#1] SMP
last sysfs file: /sys/block/md0/range
Modules linked in: ipv6 cpufreq_ondemand acpi_cpufreq dm_mirror dm_multipath sbd
Pid: 132, comm: kblockd/0 Not tainted (2.6.30-rc4-Vivek-V2 #1) Veriton M460
EIP: 0060:[<c0448c24>] EFLAGS: 00010086 CPU: 0
EIP is at cgroup_path+0xc/0x97
EAX: 00000100 EBX: f60adca0 ECX: 00000080 EDX: f709fe28
ESI: f60adca8 EDI: f709fe28 EBP: 00000100 ESP: f709fdf0
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process kblockd/0 (pid: 132, ti=f709f000 task=f70a8f60 task.ti=f709f000)
Stack:
 f709fe28 f68c5698 f60adca0 f60adca8 f709fe28 f68de801 c04f5389 00000080
 f68de800 f7094d0c f6a29118 f68bde00 00000016 c04f5e8d c04f5340 00000080
 c0579fec f68c5e94 00000082 c042edb4 f68c5fd4 f68c5fd4 c080b520 00000082
Call Trace:
 [<c04f5389>] ? io_group_path+0x6d/0x89
 [<c04f5e8d>] ? elv_ioq_served+0x2a/0x7a
 [<c04f5340>] ? io_group_path+0x24/0x89
 [<c0579fec>] ? ide_build_dmatable+0xda/0x130
 [<c042edb4>] ? lock_timer_base+0x19/0x35
 [<c042ef0c>] ? mod_timer+0x9f/0xa8
 [<c04fdee6>] ? __delay+0x6/0x7
 [<c057364f>] ? ide_execute_command+0x5d/0x71
 [<c0579d4f>] ? ide_dma_intr+0x0/0x99
 [<c0576496>] ? do_rw_taskfile+0x201/0x213
 [<c04f6daa>] ? __elv_ioq_slice_expired+0x212/0x25e
 [<c04f7e15>] ? elv_fq_select_ioq+0x121/0x184
 [<c04e8a2f>] ? elv_select_sched_queue+0x1e/0x2e
 [<c04f439c>] ? cfq_dispatch_requests+0xaa/0x238
 [<c04e7e67>] ? elv_next_request+0x152/0x15f
 [<c04240c2>] ? dequeue_task_fair+0x16/0x2d
 [<c0572f49>] ? do_ide_request+0x10f/0x4c8
 [<c0642d44>] ? __schedule+0x845/0x893
 [<c042edb4>] ? lock_timer_base+0x19/0x35
 [<c042f1be>] ? del_timer+0x41/0x47
 [<c04ea5c6>] ? __generic_unplug_device+0x23/0x25
 [<c04f530d>] ? elv_kick_queue+0x19/0x28
 [<c0434b77>] ? worker_thread+0x11f/0x19e
 [<c04f52f4>] ? elv_kick_queue+0x0/0x28
 [<c0436ffc>] ? autoremove_wake_function+0x0/0x2d
 [<c0434a58>] ? worker_thread+0x0/0x19e
 [<c0436f3b>] ? kthread+0x42/0x67
 [<c0436ef9>] ? kthread+0x0/0x67
 [<c040326f>] ? kernel_thread_helper+0x7/0x10
Code: c0 84 c0 74 0e 89 d8 e8 7c e9 fd ff eb 05 bf fd ff ff ff e8 c0 ea ff ff 8
EIP: [<c0448c24>] cgroup_path+0xc/0x97 SS:ESP 0068:f709fdf0
CR2: 000000000000011c
---[ end trace 2d4bc25a2c33e394 ]---

-- 
Regards
Gui Jianfeng

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-05 19:58 Vivek Goyal
       [not found] ` <1241553525-28095-1-git-send-email-vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2009-05-06  8:11 ` Gui Jianfeng
       [not found]   ` <4A014619.1040000-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
  1 sibling, 1 reply; 97+ messages in thread
From: Gui Jianfeng @ 2009-05-06  8:11 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: nauman, dpshah, lizf, mikew, fchecconi, paolo.valente,
	jens.axboe, ryov, fernando, s-uchida, taka, jmoyer, dhaval,
	balbir, linux-kernel, containers, righi.andrea, agk, dm-devel,
	snitzer, m-ikeda, akpm

Vivek Goyal wrote:
> Hi All,
> 
> Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> First version of the patches was posted here.

Hi Vivek,

I did some simple test for V2, and triggered an kernel panic.
The following script can reproduce this bug. It seems that the cgroup
is already removed, but IO Controller still try to access into it.

#!/bin/sh
echo 1 > /proc/sys/vm/drop_caches
mkdir /cgroup 2> /dev/null
mount -t cgroup -o io,blkio io /cgroup
mkdir /cgroup/test1
mkdir /cgroup/test2
echo 100 > /cgroup/test1/io.weight
echo 500 > /cgroup/test2/io.weight

./rwio -w -f 2000M.1 &  //do async write
pid1=$!
echo $pid1 > /cgroup/test1/tasks

./rwio -w -f 2000M.2 &
pid2=$!
echo $pid2 > /cgroup/test2/tasks

sleep 10
kill -9 $pid1
kill -9 $pid2
sleep 1

echo ======
cat /cgroup/test1/io.disk_time
cat /cgroup/test2/io.disk_time

echo ======
cat /cgroup/test1/io.disk_sectors
cat /cgroup/test2/io.disk_sectors

rmdir /cgroup/test1
rmdir /cgroup/test2
umount /cgroup
rmdir /cgroup


BUG: unable to handle kernel NULL pointer dereferec
IP: [<c0448c24>] cgroup_path+0xc/0x97
*pde = 64d2d067
Oops: 0000 [#1] SMP
last sysfs file: /sys/block/md0/range
Modules linked in: ipv6 cpufreq_ondemand acpi_cpufreq dm_mirror dm_multipath sbd
Pid: 132, comm: kblockd/0 Not tainted (2.6.30-rc4-Vivek-V2 #1) Veriton M460
EIP: 0060:[<c0448c24>] EFLAGS: 00010086 CPU: 0
EIP is at cgroup_path+0xc/0x97
EAX: 00000100 EBX: f60adca0 ECX: 00000080 EDX: f709fe28
ESI: f60adca8 EDI: f709fe28 EBP: 00000100 ESP: f709fdf0
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process kblockd/0 (pid: 132, ti=f709f000 task=f70a8f60 task.ti=f709f000)
Stack:
 f709fe28 f68c5698 f60adca0 f60adca8 f709fe28 f68de801 c04f5389 00000080
 f68de800 f7094d0c f6a29118 f68bde00 00000016 c04f5e8d c04f5340 00000080
 c0579fec f68c5e94 00000082 c042edb4 f68c5fd4 f68c5fd4 c080b520 00000082
Call Trace:
 [<c04f5389>] ? io_group_path+0x6d/0x89
 [<c04f5e8d>] ? elv_ioq_served+0x2a/0x7a
 [<c04f5340>] ? io_group_path+0x24/0x89
 [<c0579fec>] ? ide_build_dmatable+0xda/0x130
 [<c042edb4>] ? lock_timer_base+0x19/0x35
 [<c042ef0c>] ? mod_timer+0x9f/0xa8
 [<c04fdee6>] ? __delay+0x6/0x7
 [<c057364f>] ? ide_execute_command+0x5d/0x71
 [<c0579d4f>] ? ide_dma_intr+0x0/0x99
 [<c0576496>] ? do_rw_taskfile+0x201/0x213
 [<c04f6daa>] ? __elv_ioq_slice_expired+0x212/0x25e
 [<c04f7e15>] ? elv_fq_select_ioq+0x121/0x184
 [<c04e8a2f>] ? elv_select_sched_queue+0x1e/0x2e
 [<c04f439c>] ? cfq_dispatch_requests+0xaa/0x238
 [<c04e7e67>] ? elv_next_request+0x152/0x15f
 [<c04240c2>] ? dequeue_task_fair+0x16/0x2d
 [<c0572f49>] ? do_ide_request+0x10f/0x4c8
 [<c0642d44>] ? __schedule+0x845/0x893
 [<c042edb4>] ? lock_timer_base+0x19/0x35
 [<c042f1be>] ? del_timer+0x41/0x47
 [<c04ea5c6>] ? __generic_unplug_device+0x23/0x25
 [<c04f530d>] ? elv_kick_queue+0x19/0x28
 [<c0434b77>] ? worker_thread+0x11f/0x19e
 [<c04f52f4>] ? elv_kick_queue+0x0/0x28
 [<c0436ffc>] ? autoremove_wake_function+0x0/0x2d
 [<c0434a58>] ? worker_thread+0x0/0x19e
 [<c0436f3b>] ? kthread+0x42/0x67
 [<c0436ef9>] ? kthread+0x0/0x67
 [<c040326f>] ? kernel_thread_helper+0x7/0x10
Code: c0 84 c0 74 0e 89 d8 e8 7c e9 fd ff eb 05 bf fd ff ff ff e8 c0 ea ff ff 8
EIP: [<c0448c24>] cgroup_path+0xc/0x97 SS:ESP 0068:f709fdf0
CR2: 000000000000011c
---[ end trace 2d4bc25a2c33e394 ]---

-- 
Regards
Gui Jianfeng



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-05 22:20     ` Peter Zijlstra
@ 2009-05-06  3:42       ` Balbir Singh
  2009-05-06  3:42       ` Balbir Singh
  1 sibling, 0 replies; 97+ messages in thread
From: Balbir Singh @ 2009-05-06  3:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	righi.andrea-Re5JQEeQqe8AvxtiuMwx3w,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton

* Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> [2009-05-06 00:20:49]:

> On Tue, 2009-05-05 at 13:24 -0700, Andrew Morton wrote:
> > On Tue,  5 May 2009 15:58:27 -0400
> > Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > 
> > > 
> > > Hi All,
> > > 
> > > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > > ...
> > > Currently primarily two other IO controller proposals are out there.
> > > 
> > > dm-ioband
> > > ---------
> > > This patch set is from Ryo Tsuruta from valinux.
> > > ...
> > > IO-throttling
> > > -------------
> > > This patch set is from Andrea Righi provides max bandwidth controller.
> > 
> > I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> > 
> > Seriously, how are we to resolve this?  We could lock me in a room and
> > cmoe back in 15 days, but there's no reason to believe that I'd emerge
> > with the best answer.
> > 
> > I tend to think that a cgroup-based controller is the way to go. 
> > Anything else will need to be wired up to cgroups _anyway_, and that
> > might end up messy.
> 
> FWIW I subscribe to the io-scheduler faith as opposed to the
> device-mapper cult ;-)
> 
> Also, I don't think a simple throttle will be very useful, a more mature
> solution should cater to more use cases.
>

I tend to agree, unless Andrea can prove us wrong. I don't think
throttling a task (not letting it consume CPU, memory when its IO
quota is exceeded) is a good idea. I've asked that question to Andrea
a few times, but got no response.
 

-- 
	Balbir

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-05 22:20     ` Peter Zijlstra
  2009-05-06  3:42       ` Balbir Singh
@ 2009-05-06  3:42       ` Balbir Singh
  2009-05-06 10:20         ` Fabio Checconi
                           ` (3 more replies)
  1 sibling, 4 replies; 97+ messages in thread
From: Balbir Singh @ 2009-05-06  3:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Vivek Goyal, nauman, dpshah, lizf, mikew,
	fchecconi, paolo.valente, jens.axboe, ryov, fernando, s-uchida,
	taka, guijianfeng, jmoyer, dhaval, linux-kernel, containers,
	righi.andrea, agk, dm-devel, snitzer, m-ikeda

* Peter Zijlstra <peterz@infradead.org> [2009-05-06 00:20:49]:

> On Tue, 2009-05-05 at 13:24 -0700, Andrew Morton wrote:
> > On Tue,  5 May 2009 15:58:27 -0400
> > Vivek Goyal <vgoyal@redhat.com> wrote:
> > 
> > > 
> > > Hi All,
> > > 
> > > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > > ...
> > > Currently primarily two other IO controller proposals are out there.
> > > 
> > > dm-ioband
> > > ---------
> > > This patch set is from Ryo Tsuruta from valinux.
> > > ...
> > > IO-throttling
> > > -------------
> > > This patch set is from Andrea Righi provides max bandwidth controller.
> > 
> > I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> > 
> > Seriously, how are we to resolve this?  We could lock me in a room and
> > cmoe back in 15 days, but there's no reason to believe that I'd emerge
> > with the best answer.
> > 
> > I tend to think that a cgroup-based controller is the way to go. 
> > Anything else will need to be wired up to cgroups _anyway_, and that
> > might end up messy.
> 
> FWIW I subscribe to the io-scheduler faith as opposed to the
> device-mapper cult ;-)
> 
> Also, I don't think a simple throttle will be very useful, a more mature
> solution should cater to more use cases.
>

I tend to agree, unless Andrea can prove us wrong. I don't think
throttling a task (not letting it consume CPU, memory when its IO
quota is exceeded) is a good idea. I've asked that question to Andrea
a few times, but got no response.
 

-- 
	Balbir

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]     ` <20090505132441.1705bfad.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
  2009-05-05 22:20       ` Peter Zijlstra
  2009-05-06  2:33       ` Vivek Goyal
@ 2009-05-06  3:41       ` Balbir Singh
  2 siblings, 0 replies; 97+ messages in thread
From: Balbir Singh @ 2009-05-06  3:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: paolo.valente-rcYM44yAMweonA0d6jMUrA,
	dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, fernando-gVGce1chcLdL9jVzuh4AOg,
	jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	agk-H+wXaHxf7aLQT0dZR+AlfA, righi.andrea-Re5JQEeQqe8AvxtiuMwx3w

* Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> [2009-05-05 13:24:41]:

> On Tue,  5 May 2009 15:58:27 -0400
> Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > 
> > Hi All,
> > 
> > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > ...
> > Currently primarily two other IO controller proposals are out there.
> > 
> > dm-ioband
> > ---------
> > This patch set is from Ryo Tsuruta from valinux.
> > ...
> > IO-throttling
> > -------------
> > This patch set is from Andrea Righi provides max bandwidth controller.
> 
> I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> 
> Seriously, how are we to resolve this?  We could lock me in a room and
> cmoe back in 15 days, but there's no reason to believe that I'd emerge
> with the best answer.
>

We are planning an IO mini-summit prior to the kernel summit
(hopefully we'll all be able to attend and decide).
 
-- 
	Balbir

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-05 20:24     ` Andrew Morton
                       ` (3 preceding siblings ...)
  (?)
@ 2009-05-06  3:41     ` Balbir Singh
  2009-05-06 13:28         ` Vivek Goyal
       [not found]       ` <20090506034118.GC4416-SINUvgVNF2CyUtPGxGje5AC/G2K4zDHf@public.gmane.org>
  -1 siblings, 2 replies; 97+ messages in thread
From: Balbir Singh @ 2009-05-06  3:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vivek Goyal, dhaval, snitzer, dm-devel, jens.axboe, agk,
	paolo.valente, fernando, jmoyer, fchecconi, containers,
	linux-kernel, righi.andrea

* Andrew Morton <akpm@linux-foundation.org> [2009-05-05 13:24:41]:

> On Tue,  5 May 2009 15:58:27 -0400
> Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > 
> > Hi All,
> > 
> > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > ...
> > Currently primarily two other IO controller proposals are out there.
> > 
> > dm-ioband
> > ---------
> > This patch set is from Ryo Tsuruta from valinux.
> > ...
> > IO-throttling
> > -------------
> > This patch set is from Andrea Righi provides max bandwidth controller.
> 
> I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> 
> Seriously, how are we to resolve this?  We could lock me in a room and
> cmoe back in 15 days, but there's no reason to believe that I'd emerge
> with the best answer.
>

We are planning an IO mini-summit prior to the kernel summit
(hopefully we'll all be able to attend and decide).
 
-- 
	Balbir

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]     ` <20090505132441.1705bfad.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
  2009-05-05 22:20       ` Peter Zijlstra
@ 2009-05-06  2:33       ` Vivek Goyal
  2009-05-06  3:41       ` Balbir Singh
  2 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-06  2:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	righi.andrea-Re5JQEeQqe8AvxtiuMwx3w

On Tue, May 05, 2009 at 01:24:41PM -0700, Andrew Morton wrote:
> On Tue,  5 May 2009 15:58:27 -0400
> Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > 
> > Hi All,
> > 
> > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > ...
> > Currently primarily two other IO controller proposals are out there.
> > 
> > dm-ioband
> > ---------
> > This patch set is from Ryo Tsuruta from valinux.
> > ...
> > IO-throttling
> > -------------
> > This patch set is from Andrea Righi provides max bandwidth controller.
> 
> I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> 
> Seriously, how are we to resolve this?  We could lock me in a room and
> cmoe back in 15 days, but there's no reason to believe that I'd emerge
> with the best answer.
> 
> I tend to think that a cgroup-based controller is the way to go. 
> Anything else will need to be wired up to cgroups _anyway_, and that
> might end up messy.

Hi Andrew,

Sorry, did not get what do you mean by cgroup based controller? If you
mean that we use cgroups for grouping tasks for controlling IO, then both
IO scheduler based controller as well as io throttling proposal do that.
dm-ioband also supports that up to some extent but it requires extra step of
transferring cgroup grouping information to dm-ioband device using dm-tools.

But if you meant that io-throttle patches, then I think it solves only
part of the problem and that is max bw control. It does not offer minimum
BW/minimum disk share gurantees as offered by proportional BW control.

IOW, it supports upper limit control and does not support a work conserving
IO controller which lets a group use the whole BW if competing groups are
not present. IMHO, proportional BW control is an important feature which
we will need and IIUC, io-throttle patches can't be easily extended to support
proportional BW control, OTOH, one should be able to extend IO scheduler
based proportional weight controller to also support max bw control. 

Andrea, last time you were planning to have a look at my patches and see
if max bw controller can be implemented there. I got a feeling that it
should not be too difficult to implement it there. We already have the
hierarchical tree of io queues and groups in elevator layer and we run
BFQ (WF2Q+) algorithm to select next queue to dispatch the IO from. It is
just a matter of also keeping track of IO rate per queue/group and we should
be easily be able to delay the dispatch of IO from a queue if its group has
crossed the specified max bw.

This should lead to less code and reduced complextiy (compared with the
case where we do max bw control with io-throttling patches and proportional
BW control using IO scheduler based control patches).
 
So do you think that it would make sense to do max BW control along with
proportional weight IO controller at IO scheduler? If yes, then we can
work together and continue to develop this patchset to also support max
bw control and meet your requirements and drop the io-throttling patches.

The only thing which concerns me is the fact that IO scheduler does not
have the view of higher level logical device. So if somebody has setup a
software RAID and wants to put max BW limit on software raid device, this
solution will not work. One shall have to live with max bw limits on 
individual disks (where io scheduler is actually running). Do your patches
allow to put limit on software RAID devices also? 

Ryo, dm-ioband breaks the notion of classes and priority of CFQ because
of FIFO dispatch of buffered bios. Apart from that it tries to provide
fairness in terms of actual IO done and that would mean a seeky workload
will can use disk for much longer to get equivalent IO done and slow down
other applications. Implementing IO controller at IO scheduler level gives
us tigher control. Will it not meet your requirements? If you got specific
concerns with IO scheduler based contol patches, please highlight these and
we will see how these can be addressed.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-05 20:24     ` Andrew Morton
  (?)
  (?)
@ 2009-05-06  2:33     ` Vivek Goyal
  2009-05-06 17:59       ` Nauman Rafique
                         ` (4 more replies)
  -1 siblings, 5 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-06  2:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: nauman, dpshah, lizf, mikew, fchecconi, paolo.valente,
	jens.axboe, ryov, fernando, s-uchida, taka, guijianfeng, jmoyer,
	dhaval, balbir, linux-kernel, containers, righi.andrea, agk,
	dm-devel, snitzer, m-ikeda, peterz

On Tue, May 05, 2009 at 01:24:41PM -0700, Andrew Morton wrote:
> On Tue,  5 May 2009 15:58:27 -0400
> Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > 
> > Hi All,
> > 
> > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > ...
> > Currently primarily two other IO controller proposals are out there.
> > 
> > dm-ioband
> > ---------
> > This patch set is from Ryo Tsuruta from valinux.
> > ...
> > IO-throttling
> > -------------
> > This patch set is from Andrea Righi provides max bandwidth controller.
> 
> I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> 
> Seriously, how are we to resolve this?  We could lock me in a room and
> cmoe back in 15 days, but there's no reason to believe that I'd emerge
> with the best answer.
> 
> I tend to think that a cgroup-based controller is the way to go. 
> Anything else will need to be wired up to cgroups _anyway_, and that
> might end up messy.

Hi Andrew,

Sorry, did not get what do you mean by cgroup based controller? If you
mean that we use cgroups for grouping tasks for controlling IO, then both
IO scheduler based controller as well as io throttling proposal do that.
dm-ioband also supports that up to some extent but it requires extra step of
transferring cgroup grouping information to dm-ioband device using dm-tools.

But if you meant that io-throttle patches, then I think it solves only
part of the problem and that is max bw control. It does not offer minimum
BW/minimum disk share gurantees as offered by proportional BW control.

IOW, it supports upper limit control and does not support a work conserving
IO controller which lets a group use the whole BW if competing groups are
not present. IMHO, proportional BW control is an important feature which
we will need and IIUC, io-throttle patches can't be easily extended to support
proportional BW control, OTOH, one should be able to extend IO scheduler
based proportional weight controller to also support max bw control. 

Andrea, last time you were planning to have a look at my patches and see
if max bw controller can be implemented there. I got a feeling that it
should not be too difficult to implement it there. We already have the
hierarchical tree of io queues and groups in elevator layer and we run
BFQ (WF2Q+) algorithm to select next queue to dispatch the IO from. It is
just a matter of also keeping track of IO rate per queue/group and we should
be easily be able to delay the dispatch of IO from a queue if its group has
crossed the specified max bw.

This should lead to less code and reduced complextiy (compared with the
case where we do max bw control with io-throttling patches and proportional
BW control using IO scheduler based control patches).
 
So do you think that it would make sense to do max BW control along with
proportional weight IO controller at IO scheduler? If yes, then we can
work together and continue to develop this patchset to also support max
bw control and meet your requirements and drop the io-throttling patches.

The only thing which concerns me is the fact that IO scheduler does not
have the view of higher level logical device. So if somebody has setup a
software RAID and wants to put max BW limit on software raid device, this
solution will not work. One shall have to live with max bw limits on 
individual disks (where io scheduler is actually running). Do your patches
allow to put limit on software RAID devices also? 

Ryo, dm-ioband breaks the notion of classes and priority of CFQ because
of FIFO dispatch of buffered bios. Apart from that it tries to provide
fairness in terms of actual IO done and that would mean a seeky workload
will can use disk for much longer to get equivalent IO done and slow down
other applications. Implementing IO controller at IO scheduler level gives
us tigher control. Will it not meet your requirements? If you got specific
concerns with IO scheduler based contol patches, please highlight these and
we will see how these can be addressed.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
       [not found]     ` <20090505132441.1705bfad.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2009-05-05 22:20       ` Peter Zijlstra
  2009-05-06  2:33       ` Vivek Goyal
  2009-05-06  3:41       ` Balbir Singh
  2 siblings, 0 replies; 97+ messages in thread
From: Peter Zijlstra @ 2009-05-05 22:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	righi.andrea-Re5JQEeQqe8AvxtiuMwx3w,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Tue, 2009-05-05 at 13:24 -0700, Andrew Morton wrote:
> On Tue,  5 May 2009 15:58:27 -0400
> Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > 
> > Hi All,
> > 
> > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > ...
> > Currently primarily two other IO controller proposals are out there.
> > 
> > dm-ioband
> > ---------
> > This patch set is from Ryo Tsuruta from valinux.
> > ...
> > IO-throttling
> > -------------
> > This patch set is from Andrea Righi provides max bandwidth controller.
> 
> I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> 
> Seriously, how are we to resolve this?  We could lock me in a room and
> cmoe back in 15 days, but there's no reason to believe that I'd emerge
> with the best answer.
> 
> I tend to think that a cgroup-based controller is the way to go. 
> Anything else will need to be wired up to cgroups _anyway_, and that
> might end up messy.

FWIW I subscribe to the io-scheduler faith as opposed to the
device-mapper cult ;-)

Also, I don't think a simple throttle will be very useful, a more mature
solution should cater to more use cases.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-05 20:24     ` Andrew Morton
  (?)
@ 2009-05-05 22:20     ` Peter Zijlstra
  2009-05-06  3:42       ` Balbir Singh
  2009-05-06  3:42       ` Balbir Singh
  -1 siblings, 2 replies; 97+ messages in thread
From: Peter Zijlstra @ 2009-05-05 22:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vivek Goyal, nauman, dpshah, lizf, mikew, fchecconi,
	paolo.valente, jens.axboe, ryov, fernando, s-uchida, taka,
	guijianfeng, jmoyer, dhaval, balbir, linux-kernel, containers,
	righi.andrea, agk, dm-devel, snitzer, m-ikeda

On Tue, 2009-05-05 at 13:24 -0700, Andrew Morton wrote:
> On Tue,  5 May 2009 15:58:27 -0400
> Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > 
> > Hi All,
> > 
> > Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> > ...
> > Currently primarily two other IO controller proposals are out there.
> > 
> > dm-ioband
> > ---------
> > This patch set is from Ryo Tsuruta from valinux.
> > ...
> > IO-throttling
> > -------------
> > This patch set is from Andrea Righi provides max bandwidth controller.
> 
> I'm thinking we need to lock you guys in a room and come back in 15 minutes.
> 
> Seriously, how are we to resolve this?  We could lock me in a room and
> cmoe back in 15 days, but there's no reason to believe that I'd emerge
> with the best answer.
> 
> I tend to think that a cgroup-based controller is the way to go. 
> Anything else will need to be wired up to cgroups _anyway_, and that
> might end up messy.

FWIW I subscribe to the io-scheduler faith as opposed to the
device-mapper cult ;-)

Also, I don't think a simple throttle will be very useful, a more mature
solution should cater to more use cases.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
  2009-05-05 19:58 Vivek Goyal
@ 2009-05-05 20:24     ` Andrew Morton
  2009-05-06  8:11 ` Gui Jianfeng
  1 sibling, 0 replies; 97+ messages in thread
From: Andrew Morton @ 2009-05-05 20:24 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: dhaval-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA, dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, agk-H+wXaHxf7aLQT0dZR+AlfA,
	balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	fernando-gVGce1chcLdL9jVzuh4AOg, jmoyer-H+wXaHxf7aLQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	righi.andrea-Re5JQEeQqe8AvxtiuMwx3w

On Tue,  5 May 2009 15:58:27 -0400
Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> 
> Hi All,
> 
> Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> ...
> Currently primarily two other IO controller proposals are out there.
> 
> dm-ioband
> ---------
> This patch set is from Ryo Tsuruta from valinux.
> ...
> IO-throttling
> -------------
> This patch set is from Andrea Righi provides max bandwidth controller.

I'm thinking we need to lock you guys in a room and come back in 15 minutes.

Seriously, how are we to resolve this?  We could lock me in a room and
cmoe back in 15 days, but there's no reason to believe that I'd emerge
with the best answer.

I tend to think that a cgroup-based controller is the way to go. 
Anything else will need to be wired up to cgroups _anyway_, and that
might end up messy.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: IO scheduler based IO Controller V2
@ 2009-05-05 20:24     ` Andrew Morton
  0 siblings, 0 replies; 97+ messages in thread
From: Andrew Morton @ 2009-05-05 20:24 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: nauman, dpshah, lizf, mikew, fchecconi, paolo.valente,
	jens.axboe, ryov, fernando, s-uchida, taka, guijianfeng, jmoyer,
	dhaval, balbir, linux-kernel, containers, righi.andrea, agk,
	dm-devel, snitzer, m-ikeda, vgoyal

On Tue,  5 May 2009 15:58:27 -0400
Vivek Goyal <vgoyal@redhat.com> wrote:

> 
> Hi All,
> 
> Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
> ...
> Currently primarily two other IO controller proposals are out there.
> 
> dm-ioband
> ---------
> This patch set is from Ryo Tsuruta from valinux.
> ...
> IO-throttling
> -------------
> This patch set is from Andrea Righi provides max bandwidth controller.

I'm thinking we need to lock you guys in a room and come back in 15 minutes.

Seriously, how are we to resolve this?  We could lock me in a room and
cmoe back in 15 days, but there's no reason to believe that I'd emerge
with the best answer.

I tend to think that a cgroup-based controller is the way to go. 
Anything else will need to be wired up to cgroups _anyway_, and that
might end up messy.


^ permalink raw reply	[flat|nested] 97+ messages in thread

* IO scheduler based IO Controller V2
@ 2009-05-05 19:58 Vivek Goyal
  0 siblings, 0 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-05 19:58 UTC (permalink / raw)
  To: nauman-hpIqsD4AKlfQT0dZR+AlfA, dpshah-hpIqsD4AKlfQT0dZR+AlfA,
	lizf-BthXqXjhjHXQFUHtdCDX3A, mikew-hpIqsD4AKlfQT0dZR+AlfA,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	paolo.valente-rcYM44yAMweonA0d6jMUrA,
	jens.axboe-QHcLZuEGTsvQT0dZR+AlfA, ryov-jCdQPDEk3idL9jVzuh4AOg,
	fer
  Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b


Hi All,

Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
First version of the patches was posted here.

http://lkml.org/lkml/2009/3/11/486

This patchset is still work in progress but I want to keep on getting the
snapshot of my tree out at regular intervals to get the feedback hence V2.

Before I go into details of what are the major changes from V1, wanted
to highlight other IO controller proposals on lkml.

Other active IO controller proposals
------------------------------------
Currently primarily two other IO controller proposals are out there.

dm-ioband
---------
This patch set is from Ryo Tsuruta from valinux. It is a proportional bandwidth controller implemented as a dm driver.

http://people.valinux.co.jp/~ryov/dm-ioband/

The biggest issue (apart from others), with a 2nd level IO controller is that
buffering of BIOs takes place in a single queue and dispatch of this BIOs
to unerlying IO scheduler is in FIFO manner. That means whenever the buffering
takes place, it breaks the notion of different class and priority of CFQ.

That means RT requests might be stuck behind some write requests or some read
requests might be stuck behind somet write requests for long time etc. To
demonstrate the single FIFO dispatch issues, I had run some basic tests and
posted the results in following mail thread.

http://lkml.org/lkml/2009/4/13/2

These are hard to solve issues and one will end up maintaining the separate
queues for separate classes and priority as CFQ does to fully resolve it.
But that will make 2nd level implementation complex at the same time if
somebody is trying to use IO controller on a single disk or on a hardware RAID
using cfq as scheduler, it will be two layers of queueing maintating separate
queues per priorty level. One at dm-driver level and other at CFQ which again
does not make lot of sense.

On the other hand, if a user is running noop at the device level, at higher
level we will be maintaining multiple cfq like queues, which also does not
make sense as underlying IO scheduler never wanted that.

Hence, IMHO, I think that controlling bio at second level probably is not a
very good idea. We should instead do it at IO scheduler level where we already
maintain all the needed queues. Just that make the scheduling hierarhical and
group aware so isolate IO of one group from other.

IO-throttling
-------------
This patch set is from Andrea Righi provides max bandwidth controller. That
means, it does not gurantee the minimum bandwidth. It provides the maximum
bandwidth limits and throttles the application if it crosses its bandwidth.

So its not apple vs apple comparison. This patch set and dm-ioband provide
proportional bandwidth control where a cgroup can use much more bandwidth
if there are not other users and resource control comes into the picture
only if there is contention.

It seems that there are both the kind of users there. One set of people needing
proportional BW control and other people needing max bandwidth control.

Now the question is, where max bandwidth control should be implemented? At
higher layers or at IO scheduler level? Should proportional bw control and
max bw control be implemented separately at different layer or these should
be implemented at one place?

IMHO, if we are doing proportional bw control at IO scheduler layer, it should
be possible to extend it to do max bw control also here without lot of effort.
Then it probably does not make too much of sense to do two types of control
at two different layers. Doing it at one place should lead to lesser code
and reduced complexity.

Secondly, io-throttling solution also buffers writes at higher layer.
Which again will lead to issue of losing the notion of priority of writes.

Hence, personally I think that users will need both proportional bw as well
as max bw control and we probably should implement these at a single place
instead of splitting it. Once elevator based io controller patchset matures,
it can be enhanced to do max bw control also.

Having said that, one issue with doing upper limit control at elevator/IO
scheduler level is that it does not have the view of higher level logical
devices. So if there is a software RAID with two disks, then one can not do
max bw control on logical device, instead it shall have to be on leaf node
where io scheduler is attached.

Now back to the desciption of this patchset and changes from V1.

- Rebased patches to 2.6.30-rc4.

- Last time Andrew mentioned that async writes are big issue for us hence,
  introduced the control for async writes also.

- Implemented per group request descriptor support. This was needed to
  make sure one group doing lot of IO does not starve other group of request
  descriptors and other group does not get fair share. This is a basic patch
  right now which probably will require more changes after some discussion.

- Exported the disk time used and number of sectors dispatched by a cgroup
  through cgroup interface. This should help us in seeing how much disk
  time each group got and whether it is fair or not.

- Implemented group refcounting support. Lack of this was causing some
  cgroup related issues. There are still some races left out which needs
  to be fixed. 

- For IO tracking/async write tracking, started making use of patches of
  blkio-cgroup from ryo Tsuruta posted here.

  http://lkml.org/lkml/2009/4/28/235

  Currently people seem to be liking the idea of separate subsystem for
  tracking writes and then rest of the users can use that info instead of
  everybody implementing their own. That's a different thing that how many
  users are out there which will end up in kernel is not clear.

  So instead of carrying own versin of bio-cgroup patches, and overloading
  io controller cgroup subsystem, I am making use of blkio-cgroup patches.
  One shall have to mount io controller and blkio subsystem together on the
  same hiearchy for the time being. Later we can take care of the case where
  blkio is mounted on a different hierarchy.

- Replaced group priorities with group weights.

Testing
=======

Again, I have been able to do only very basic testing of reads and writes.
Did not want to hold the patches back because of testing. Providing support
for async writes took much more time than expected and still work is left
in that area. Will continue to do more testing.

Test1 (Fairness for synchronous reads)
======================================
- Two dd in two cgroups with cgrop weights 1000 and 500. Ran two "dd" in those
  cgroups (With CFQ scheduler and /sys/block/<device>/queue/fairness = 1)

dd if=/mnt/$BLOCKDEV/zerofile1 of=/dev/null &
dd if=/mnt/$BLOCKDEV/zerofile2 of=/dev/null &

234179072 bytes (234 MB) copied, 4.13954 s, 56.6 MB/s
234179072 bytes (234 MB) copied, 5.2127 s, 44.9 MB/s

group1 time=3108 group1 sectors=460968
group2 time=1405 group2 sectors=264944

This patchset tries to provide fairness in terms of disk time received. group1
got almost double of group2 disk time (At the time of first dd finish). These
time and sectors statistics can be read using io.disk_time and io.disk_sector
files in cgroup. More about it in documentation file.

Test2 (Fairness for async writes)
=================================
Fairness for async writes is tricy and biggest reason is that async writes
are cached in higher layers (page cahe) and are dispatched to lower layers
not necessarily in proportional manner. For example, consider two dd threads
reading /dev/zero as input file and doing writes of huge files. Very soon
we will cross vm_dirty_ratio and dd thread will be forced to write out some
pages to disk before more pages can be dirtied. But not necessarily dirty
pages of same thread are picked. It can very well pick the inode of lesser
priority dd thread and do some writeout. So effectively higher weight dd is
doing writeouts of lower weight dd pages and we don't see service differentation

IOW, the core problem with async write fairness is that higher weight thread
does not throw enought IO traffic at IO controller to keep the queue
continuously backlogged. This are many .2 to .8 second intervals where higher
weight queue is empty and in that duration lower weight queue get lots of job
done giving the impression that there was no service differentiation.

In summary, from IO controller point of view async writes support is there. Now
we need to do some more work in higher layers to make sure higher weight process
is not blocked behind IO of some lower weight process. This is a TODO item.

So to test async writes I generated lots of write traffic in two cgroups (50
fio threads) and watched the disk time statistics in respective cgroups at
the interval of 2 seconds. Thanks to ryo tsuruta for the test case.

*****************************************************************
sync
echo 3 > /proc/sys/vm/drop_caches

fio_args="--size=64m --rw=write --numjobs=50 --group_reporting"

echo $$ > /cgroup/bfqio/test1/tasks
fio $fio_args --name=test1 --directory=/mnt/sdd1/fio/ --output=/mnt/sdd1/fio/test1.log &

echo $$ > /cgroup/bfqio/test2/tasks
fio $fio_args --name=test2 --directory=/mnt/sdd2/fio/ --output=/mnt/sdd2/fio/test2.log &
*********************************************************************** 

And watched the disk time and sector statistics for the both the cgroups
every 2 seconds using a script. How is snippet from output.

test1 statistics: time=9848   sectors=643152
test2 statistics: time=5224   sectors=258600

test1 statistics: time=11736   sectors=785792
test2 statistics: time=6509   sectors=333160

test1 statistics: time=13607   sectors=943968
test2 statistics: time=7443   sectors=394352

test1 statistics: time=15662   sectors=1089496
test2 statistics: time=8568   sectors=451152

So disk time consumed by group1 is almost double of group2.  

Your feedback and comments are welcome.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

* IO scheduler based IO Controller V2
@ 2009-05-05 19:58 Vivek Goyal
       [not found] ` <1241553525-28095-1-git-send-email-vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2009-05-06  8:11 ` Gui Jianfeng
  0 siblings, 2 replies; 97+ messages in thread
From: Vivek Goyal @ 2009-05-05 19:58 UTC (permalink / raw)
  To: nauman, dpshah, lizf, mikew, fchecconi, paolo.valente,
	jens.axboe, ryov, fernando, s-uchida, taka, guijianfeng, jmoyer,
	dhaval, balbir, linux-kernel, containers, righi.andrea, agk,
	dm-devel, snitzer, m-ikeda
  Cc: vgoyal, akpm


Hi All,

Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
First version of the patches was posted here.

http://lkml.org/lkml/2009/3/11/486

This patchset is still work in progress but I want to keep on getting the
snapshot of my tree out at regular intervals to get the feedback hence V2.

Before I go into details of what are the major changes from V1, wanted
to highlight other IO controller proposals on lkml.

Other active IO controller proposals
------------------------------------
Currently primarily two other IO controller proposals are out there.

dm-ioband
---------
This patch set is from Ryo Tsuruta from valinux. It is a proportional bandwidth controller implemented as a dm driver.

http://people.valinux.co.jp/~ryov/dm-ioband/

The biggest issue (apart from others), with a 2nd level IO controller is that
buffering of BIOs takes place in a single queue and dispatch of this BIOs
to unerlying IO scheduler is in FIFO manner. That means whenever the buffering
takes place, it breaks the notion of different class and priority of CFQ.

That means RT requests might be stuck behind some write requests or some read
requests might be stuck behind somet write requests for long time etc. To
demonstrate the single FIFO dispatch issues, I had run some basic tests and
posted the results in following mail thread.

http://lkml.org/lkml/2009/4/13/2

These are hard to solve issues and one will end up maintaining the separate
queues for separate classes and priority as CFQ does to fully resolve it.
But that will make 2nd level implementation complex at the same time if
somebody is trying to use IO controller on a single disk or on a hardware RAID
using cfq as scheduler, it will be two layers of queueing maintating separate
queues per priorty level. One at dm-driver level and other at CFQ which again
does not make lot of sense.

On the other hand, if a user is running noop at the device level, at higher
level we will be maintaining multiple cfq like queues, which also does not
make sense as underlying IO scheduler never wanted that.

Hence, IMHO, I think that controlling bio at second level probably is not a
very good idea. We should instead do it at IO scheduler level where we already
maintain all the needed queues. Just that make the scheduling hierarhical and
group aware so isolate IO of one group from other.

IO-throttling
-------------
This patch set is from Andrea Righi provides max bandwidth controller. That
means, it does not gurantee the minimum bandwidth. It provides the maximum
bandwidth limits and throttles the application if it crosses its bandwidth.

So its not apple vs apple comparison. This patch set and dm-ioband provide
proportional bandwidth control where a cgroup can use much more bandwidth
if there are not other users and resource control comes into the picture
only if there is contention.

It seems that there are both the kind of users there. One set of people needing
proportional BW control and other people needing max bandwidth control.

Now the question is, where max bandwidth control should be implemented? At
higher layers or at IO scheduler level? Should proportional bw control and
max bw control be implemented separately at different layer or these should
be implemented at one place?

IMHO, if we are doing proportional bw control at IO scheduler layer, it should
be possible to extend it to do max bw control also here without lot of effort.
Then it probably does not make too much of sense to do two types of control
at two different layers. Doing it at one place should lead to lesser code
and reduced complexity.

Secondly, io-throttling solution also buffers writes at higher layer.
Which again will lead to issue of losing the notion of priority of writes.

Hence, personally I think that users will need both proportional bw as well
as max bw control and we probably should implement these at a single place
instead of splitting it. Once elevator based io controller patchset matures,
it can be enhanced to do max bw control also.

Having said that, one issue with doing upper limit control at elevator/IO
scheduler level is that it does not have the view of higher level logical
devices. So if there is a software RAID with two disks, then one can not do
max bw control on logical device, instead it shall have to be on leaf node
where io scheduler is attached.

Now back to the desciption of this patchset and changes from V1.

- Rebased patches to 2.6.30-rc4.

- Last time Andrew mentioned that async writes are big issue for us hence,
  introduced the control for async writes also.

- Implemented per group request descriptor support. This was needed to
  make sure one group doing lot of IO does not starve other group of request
  descriptors and other group does not get fair share. This is a basic patch
  right now which probably will require more changes after some discussion.

- Exported the disk time used and number of sectors dispatched by a cgroup
  through cgroup interface. This should help us in seeing how much disk
  time each group got and whether it is fair or not.

- Implemented group refcounting support. Lack of this was causing some
  cgroup related issues. There are still some races left out which needs
  to be fixed. 

- For IO tracking/async write tracking, started making use of patches of
  blkio-cgroup from ryo Tsuruta posted here.

  http://lkml.org/lkml/2009/4/28/235

  Currently people seem to be liking the idea of separate subsystem for
  tracking writes and then rest of the users can use that info instead of
  everybody implementing their own. That's a different thing that how many
  users are out there which will end up in kernel is not clear.

  So instead of carrying own versin of bio-cgroup patches, and overloading
  io controller cgroup subsystem, I am making use of blkio-cgroup patches.
  One shall have to mount io controller and blkio subsystem together on the
  same hiearchy for the time being. Later we can take care of the case where
  blkio is mounted on a different hierarchy.

- Replaced group priorities with group weights.

Testing
=======

Again, I have been able to do only very basic testing of reads and writes.
Did not want to hold the patches back because of testing. Providing support
for async writes took much more time than expected and still work is left
in that area. Will continue to do more testing.

Test1 (Fairness for synchronous reads)
======================================
- Two dd in two cgroups with cgrop weights 1000 and 500. Ran two "dd" in those
  cgroups (With CFQ scheduler and /sys/block/<device>/queue/fairness = 1)

dd if=/mnt/$BLOCKDEV/zerofile1 of=/dev/null &
dd if=/mnt/$BLOCKDEV/zerofile2 of=/dev/null &

234179072 bytes (234 MB) copied, 4.13954 s, 56.6 MB/s
234179072 bytes (234 MB) copied, 5.2127 s, 44.9 MB/s

group1 time=3108 group1 sectors=460968
group2 time=1405 group2 sectors=264944

This patchset tries to provide fairness in terms of disk time received. group1
got almost double of group2 disk time (At the time of first dd finish). These
time and sectors statistics can be read using io.disk_time and io.disk_sector
files in cgroup. More about it in documentation file.

Test2 (Fairness for async writes)
=================================
Fairness for async writes is tricy and biggest reason is that async writes
are cached in higher layers (page cahe) and are dispatched to lower layers
not necessarily in proportional manner. For example, consider two dd threads
reading /dev/zero as input file and doing writes of huge files. Very soon
we will cross vm_dirty_ratio and dd thread will be forced to write out some
pages to disk before more pages can be dirtied. But not necessarily dirty
pages of same thread are picked. It can very well pick the inode of lesser
priority dd thread and do some writeout. So effectively higher weight dd is
doing writeouts of lower weight dd pages and we don't see service differentation

IOW, the core problem with async write fairness is that higher weight thread
does not throw enought IO traffic at IO controller to keep the queue
continuously backlogged. This are many .2 to .8 second intervals where higher
weight queue is empty and in that duration lower weight queue get lots of job
done giving the impression that there was no service differentiation.

In summary, from IO controller point of view async writes support is there. Now
we need to do some more work in higher layers to make sure higher weight process
is not blocked behind IO of some lower weight process. This is a TODO item.

So to test async writes I generated lots of write traffic in two cgroups (50
fio threads) and watched the disk time statistics in respective cgroups at
the interval of 2 seconds. Thanks to ryo tsuruta for the test case.

*****************************************************************
sync
echo 3 > /proc/sys/vm/drop_caches

fio_args="--size=64m --rw=write --numjobs=50 --group_reporting"

echo $$ > /cgroup/bfqio/test1/tasks
fio $fio_args --name=test1 --directory=/mnt/sdd1/fio/ --output=/mnt/sdd1/fio/test1.log &

echo $$ > /cgroup/bfqio/test2/tasks
fio $fio_args --name=test2 --directory=/mnt/sdd2/fio/ --output=/mnt/sdd2/fio/test2.log &
*********************************************************************** 

And watched the disk time and sector statistics for the both the cgroups
every 2 seconds using a script. How is snippet from output.

test1 statistics: time=9848   sectors=643152
test2 statistics: time=5224   sectors=258600

test1 statistics: time=11736   sectors=785792
test2 statistics: time=6509   sectors=333160

test1 statistics: time=13607   sectors=943968
test2 statistics: time=7443   sectors=394352

test1 statistics: time=15662   sectors=1089496
test2 statistics: time=8568   sectors=451152

So disk time consumed by group1 is almost double of group2.  

Your feedback and comments are welcome.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 97+ messages in thread

end of thread, other threads:[~2009-05-14 16:44 UTC | newest]

Thread overview: 97+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-05 19:58 IO scheduler based IO Controller V2 Vivek Goyal
2009-05-05 19:58 Vivek Goyal
     [not found] ` <1241553525-28095-1-git-send-email-vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2009-05-05 20:24   ` Andrew Morton
2009-05-05 20:24     ` Andrew Morton
2009-05-05 22:20     ` Peter Zijlstra
2009-05-06  3:42       ` Balbir Singh
2009-05-06  3:42       ` Balbir Singh
2009-05-06 10:20         ` Fabio Checconi
2009-05-06 17:10           ` Balbir Singh
2009-05-06 17:10             ` Balbir Singh
     [not found]           ` <20090506102030.GB20544-f9ZlEuEWxVeACYmtYXMKmw@public.gmane.org>
2009-05-06 17:10             ` Balbir Singh
2009-05-06 18:47         ` Divyesh Shah
     [not found]         ` <20090506034254.GD4416-SINUvgVNF2CyUtPGxGje5AC/G2K4zDHf@public.gmane.org>
2009-05-06 10:20           ` Fabio Checconi
2009-05-06 18:47           ` Divyesh Shah
2009-05-06 20:42           ` Andrea Righi
2009-05-06 20:42         ` Andrea Righi
2009-05-06  2:33     ` Vivek Goyal
2009-05-06 17:59       ` Nauman Rafique
2009-05-06 20:07       ` Andrea Righi
2009-05-06 21:21         ` Vivek Goyal
2009-05-06 21:21         ` Vivek Goyal
     [not found]           ` <20090506212121.GI8180-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2009-05-06 22:02             ` Andrea Righi
2009-05-06 22:02               ` Andrea Righi
2009-05-06 22:17               ` Vivek Goyal
2009-05-06 22:17                 ` Vivek Goyal
     [not found]       ` <20090506023332.GA1212-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2009-05-06 17:59         ` Nauman Rafique
2009-05-06 20:07         ` Andrea Righi
2009-05-06 20:32         ` Vivek Goyal
2009-05-07  0:18         ` Ryo Tsuruta
2009-05-06 20:32       ` Vivek Goyal
     [not found]         ` <20090506203228.GH8180-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2009-05-06 21:34           ` Andrea Righi
2009-05-06 21:34         ` Andrea Righi
2009-05-06 21:52           ` Vivek Goyal
2009-05-06 21:52             ` Vivek Goyal
2009-05-06 22:35             ` Andrea Righi
2009-05-07  1:48               ` Ryo Tsuruta
2009-05-07  1:48               ` Ryo Tsuruta
     [not found]             ` <20090506215235.GJ8180-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2009-05-06 22:35               ` Andrea Righi
2009-05-07  9:04               ` Andrea Righi
2009-05-07  9:04             ` Andrea Righi
2009-05-07 12:22               ` Andrea Righi
2009-05-07 12:22               ` Andrea Righi
2009-05-07 14:11               ` Vivek Goyal
2009-05-07 14:11               ` Vivek Goyal
     [not found]                 ` <20090507141126.GA9463-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2009-05-07 14:45                   ` Vivek Goyal
2009-05-07 14:45                     ` Vivek Goyal
     [not found]                     ` <20090507144501.GB9463-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2009-05-07 15:36                       ` Vivek Goyal
2009-05-07 15:36                         ` Vivek Goyal
     [not found]                         ` <20090507153642.GC9463-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2009-05-07 15:42                           ` Vivek Goyal
2009-05-07 15:42                             ` Vivek Goyal
2009-05-07 22:19                           ` Andrea Righi
2009-05-07 22:19                         ` Andrea Righi
2009-05-08 18:09                           ` Vivek Goyal
2009-05-08 20:05                             ` Andrea Righi
2009-05-08 21:56                               ` Vivek Goyal
2009-05-08 21:56                                 ` Vivek Goyal
2009-05-09  9:22                                 ` Peter Zijlstra
2009-05-14 10:31                                 ` Andrea Righi
     [not found]                                 ` <20090508215618.GJ7293-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2009-05-09  9:22                                   ` Peter Zijlstra
2009-05-14 10:31                                   ` Andrea Righi
2009-05-14 16:43                                   ` Dhaval Giani
2009-05-14 16:43                                     ` Dhaval Giani
     [not found]                             ` <20090508180951.GG7293-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2009-05-08 20:05                               ` Andrea Righi
2009-05-08 18:09                           ` Vivek Goyal
2009-05-07 22:40                       ` Andrea Righi
2009-05-07 22:40                     ` Andrea Righi
2009-05-07  0:18       ` Ryo Tsuruta
     [not found]         ` <20090507.091858.226775723.ryov-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org>
2009-05-07  1:25           ` Vivek Goyal
2009-05-07  1:25             ` Vivek Goyal
     [not found]             ` <20090507012559.GC4187-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2009-05-11 11:23               ` Ryo Tsuruta
2009-05-11 11:23             ` Ryo Tsuruta
     [not found]               ` <20090511.202309.112614168.ryov-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org>
2009-05-11 12:49                 ` Vivek Goyal
2009-05-11 12:49                   ` Vivek Goyal
2009-05-08 14:24           ` Rik van Riel
2009-05-08 14:24         ` Rik van Riel
     [not found]           ` <4A0440B2.7040300-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2009-05-11 10:11             ` Ryo Tsuruta
2009-05-11 10:11           ` Ryo Tsuruta
     [not found]     ` <20090505132441.1705bfad.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2009-05-05 22:20       ` Peter Zijlstra
2009-05-06  2:33       ` Vivek Goyal
2009-05-06  3:41       ` Balbir Singh
2009-05-06  3:41     ` Balbir Singh
2009-05-06 13:28       ` Vivek Goyal
2009-05-06 13:28         ` Vivek Goyal
     [not found]       ` <20090506034118.GC4416-SINUvgVNF2CyUtPGxGje5AC/G2K4zDHf@public.gmane.org>
2009-05-06 13:28         ` Vivek Goyal
2009-05-06  8:11   ` Gui Jianfeng
2009-05-06  8:11 ` Gui Jianfeng
     [not found]   ` <4A014619.1040000-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
2009-05-06 16:10     ` Vivek Goyal
2009-05-06 16:10       ` Vivek Goyal
2009-05-07  5:36       ` Li Zefan
     [not found]         ` <4A027348.6000808-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
2009-05-08 13:37           ` Vivek Goyal
2009-05-08 13:37             ` Vivek Goyal
2009-05-11  2:59             ` Gui Jianfeng
     [not found]             ` <20090508133740.GD7293-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2009-05-11  2:59               ` Gui Jianfeng
2009-05-07  5:47       ` Gui Jianfeng
     [not found]       ` <20090506161012.GC8180-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2009-05-07  5:36         ` Li Zefan
2009-05-07  5:47         ` Gui Jianfeng
2009-05-05 19:58 Vivek Goyal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.