From: Andrea Righi <righi.andrea@gmail.com>
To: Paul Menage <menage@google.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>,
	Gui Jianfeng <guijianfeng@cn.fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	agk@sourceware.org, akpm@linux-foundation.org, axboe@kernel.dk,
	baramsori72@gmail.com, Carl Henrik Lunde <chlunde@ping.uio.no>,
	dave@linux.vnet.ibm.com, Divyesh Shah <dpshah@google.com>,
	eric.rannaud@gmail.com, fernando@oss.ntt.co.jp,
	Hirokazu Takahashi <taka@valinux.co.jp>,
	Li Zefan <lizf@cn.fujitsu.com>,
	matt@bluehost.com, dradford@bluehost.com, ngupta@google.com,
	randy.dunlap@oracle.com, roberto@unbit.it,
	Ryo Tsuruta <ryov@valinux.co.jp>,
	Satoshi UCHIDA <s-uchida@ap.jp.nec.com>,
	subrata@linux.vnet.ibm.com, yoshikawa.takuya@oss.ntt.co.jp,
	Nauman Rafique <nauman@google.com>,
	fchecconi@gmail.com, paolo.valente@unimore.it,
	containers@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH 0/7] cgroup: io-throttle controller (v14)
Date: Sat, 18 Apr 2009 23:38:25 +0200
Message-ID: <1240090712-1058-1-git-send-email-righi.andrea@gmail.com>

Objective
~~~~~~~~~
The objective of the io-throttle controller is to improve IO performance
predictability of different cgroups that share the same block devices.

State of the art (quick overview)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Recent work by Vivek proposes a weighted-bandwidth solution that
introduces fair queuing support in the elevator layer and modifies the
existing IO schedulers to use that functionality
(https://lists.linux-foundation.org/pipermail/containers/2009-March/016129.html).

For the fair queuing part Vivek's IO controller makes use of the BFQ
code as posted by Paolo and Fabio (http://lkml.org/lkml/2008/11/11/148).

The dm-ioband controller by the valinux guys also proposes a
proportional ticket-based solution, fully implemented at the device
mapper level (http://people.valinux.co.jp/~ryov/dm-ioband/).

The bio-cgroup patch (http://people.valinux.co.jp/~ryov/bio-cgroup/) is
a BIO tracking mechanism for cgroups, implemented in the cgroup memory
subsystem. It is maintained by Ryo and it allows dm-ioband to track
writeback requests issued by kernel threads (pdflush).

Another work by Satoshi implements cgroup awareness in CFQ by mapping
per-cgroup priorities to CFQ IO priorities; this also provides only
proportional BW support (http://lwn.net/Articles/306772/).

Please correct me, or add to this list, if I missed someone or
something. :)

Proposed solution
~~~~~~~~~~~~~~~~~
Compared with the other priority/weight-based solutions, the approach
used by this controller is to explicitly choke applications' requests
that directly or indirectly generate IO activity in the system (this
controller addresses both synchronous IO and writeback/buffered IO).

The bandwidth and iops limiting method has the advantage of improving
the performance predictability at the cost of reducing, in general, the
overall performance of the system in terms of throughput.

IO throttling and accounting are performed during the submission of IO
requests and are independent of the particular IO scheduler.
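
As a rough illustration of throttling at submission time, the sketch
below charges each request against a token bucket and returns how long
the submitter would have to sleep to stay within the limit. It is not
the code in this patchset: the class name, its fields and the token
bucket policy itself are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BandwidthLimit:
    """Hypothetical per-cgroup, per-device bandwidth limit (token bucket)."""

    rate: float          # sustained limit, in bytes per second
    burst: float         # bucket capacity, in bytes
    tokens: float = 0.0  # bytes that may currently be submitted without delay
    last: float = 0.0    # time of the last update, in seconds

    def charge(self, nbytes: int, now: float) -> float:
        """Account nbytes at time `now`; return the seconds to sleep."""
        # Refill the bucket for the elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        # Charge the request; a negative balance is a debt paid by sleeping.
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate


# 1 MB/s limit, bucket initially full.
limit = BandwidthLimit(rate=1_000_000, burst=1_000_000, tokens=1_000_000)
first = limit.charge(500_000, now=0.0)     # fits in the budget: no delay
second = limit.charge(1_000_000, now=0.0)  # 500 KB over budget: 0.5 s delay
```

Because the delay is computed from the submitted bytes alone, nothing
here depends on how the IO scheduler later orders the requests.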

Detailed information about the design, goals and usage is provided in
the documentation (see [PATCH 1/7]).

Implementation
~~~~~~~~~~~~~~
Patchset against latest Linus' git:

  [PATCH 0/7] cgroup: block device IO controller (v14)
  [PATCH 1/7] io-throttle documentation
  [PATCH 2/7] res_counter: introduce ratelimiting attributes
  [PATCH 3/7] page_cgroup: provide a generic page tracking infrastructure
  [PATCH 4/7] io-throttle controller infrastructure
  [PATCH 5/7] kiothrottled: throttle buffered (writeback) IO
  [PATCH 6/7] io-throttle instrumentation
  [PATCH 7/7] export per-task io-throttle statistics to userspace

The v14 all-in-one patch, along with the previous versions, can be found at:
http://download.systemimager.org/~arighi/linux/patches/io-throttle/

What's new
~~~~~~~~~~
In this new version I've embedded the bio-cgroup code inside
io-throttle, providing the page_cgroup page tracking infrastructure.

This completely removes the complexity and the overhead of associating
multiple IO controllers (bio-cgroup groups and io-throttle groups) from
userspace, while preserving the same tracking and throttling
functionality for writeback IO. It is also possible to bind other cgroup
subsystems with io-throttle.

I've removed the tracking of IO generated by anonymous pages (swap) to
reduce the overhead of the page tracking functionality (and it is
probably not a good idea to delay IO requests that come from
swap-in/swap-out operations).

I've also removed the ext3-specific patch that tagged journal IO with
BIO_RW_META so that such IO requests were never throttled.

As suggested by Ted and Jens, we need a more specific solution, where
filesystems inform the IO subsystem which IO requests come from tasks
that are holding filesystem exclusive resources (journal IO, metadata,
etc.). Then the IO subsystem (both the IO scheduler and the IO
controller) will be able to dispatch those "special" requests at the
highest priority to avoid the classic priority inversion problems.
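
To make the idea concrete, here is a minimal sketch of such a dispatch
policy. It only illustrates the ordering described above; no such
interface exists in this patchset, and the class, the fs_exclusive
hint and the two priority levels are all hypothetical.

```python
import heapq
from itertools import count


class DispatchQueue:
    """Hypothetical queue that dispatches flagged requests first."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # keeps FIFO order within a priority level

    def submit(self, name, fs_exclusive=False):
        # Priority 0 for requests whose submitter holds a filesystem
        # exclusive resource (e.g. journal IO), 1 for everything else.
        prio = 0 if fs_exclusive else 1
        heapq.heappush(self._heap, (prio, next(self._seq), name))

    def dispatch(self):
        # Pop the request with the lowest (prio, seq) pair.
        return heapq.heappop(self._heap)[2]


q = DispatchQueue()
q.submit("data-a")
q.submit("journal", fs_exclusive=True)
q.submit("data-b")
order = [q.dispatch() for _ in range(3)]
```

The flagged request leaves the queue before the ordinary ones even
though it was submitted after "data-a", so a throttled cgroup cannot
keep other tasks waiting on the resource it holds.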

Changelog (v13 -> v14)
~~~~~~~~~~~~~~~~~~~~~~
* implemented the bio-cgroup functionality as pure infrastructure for page
  tracking capability
* removed the tracking and throttling of IO generated by anonymous pages (swap)
* updated documentation

Overall diffstat
~~~~~~~~~~~~~~~~
 Documentation/cgroups/io-throttle.txt |  417 +++++++++++++++++
 block/Makefile                        |    1 +
 block/blk-core.c                      |    8 +
 block/blk-io-throttle.c               |  822 +++++++++++++++++++++++++++++++++
 block/kiothrottled.c                  |  341 ++++++++++++++
 fs/aio.c                              |   12 +
 fs/buffer.c                           |    2 +
 fs/proc/base.c                        |   18 +
 include/linux/blk-io-throttle.h       |  144 ++++++
 include/linux/cgroup_subsys.h         |    6 +
 include/linux/memcontrol.h            |    6 +
 include/linux/mmzone.h                |    4 +-
 include/linux/page_cgroup.h           |   33 ++-
 include/linux/res_counter.h           |   69 ++-
 include/linux/sched.h                 |    7 +
 init/Kconfig                          |   16 +
 kernel/fork.c                         |    7 +
 kernel/res_counter.c                  |   72 +++
 mm/Makefile                           |    3 +-
 mm/bounce.c                           |    2 +
 mm/filemap.c                          |    2 +
 mm/memcontrol.c                       |    6 +
 mm/page-writeback.c                   |    2 +
 mm/page_cgroup.c                      |   95 ++++-
 mm/readahead.c                        |    3 +
 25 files changed, 2065 insertions(+), 33 deletions(-)

-Andrea
