linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 00/13][V2] Introduce io.latency io controller for cgroups
@ 2018-06-05 13:29 Josef Bacik
From: Josef Bacik @ 2018-06-05 13:29 UTC
  To: axboe, kernel-team, linux-block, akpm, hannes, linux-kernel, tj,
	linux-fsdevel

v1->v2:
- fix how we get the swap device for the page when doing the swap throttling.
- add a bunch of comments explaining how the throttling works.
- move the documentation to cgroup-v2.txt
- address the various other comments.

==== Original message =====

This series adds a latency-based IO controller for cgroups.  It is based on the
same concept as the writeback throttling code: it watches the overall total
latency of IOs in a given window and then adjusts the queue depth of the group
accordingly.  This is meant to be a workload protection controller, so whoever
has the lowest latency target gets preferential treatment with no thought to
fairness or proportionality.  It is meant to be work conserving, so as long as
nobody is missing their latency targets the disk is fair game.
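
As a rough illustration of the idea above (not the code from the patches), the
window-based adjustment can be modeled as a toy in Python.  The names Group,
MAX_DEPTH and end_of_window, and the halve/double policy, are assumptions made
for this sketch only.

```python
from dataclasses import dataclass, field
from typing import List

MAX_DEPTH = 64  # hypothetical upper bound on a group's queue depth


@dataclass
class Group:
    name: str
    target_us: int          # latency target; a lower target means more protection
    depth: int = MAX_DEPTH  # current queue depth allowance
    samples: List[int] = field(default_factory=list)  # latencies seen this window

    def mean_latency(self) -> float:
        return sum(self.samples) / len(self.samples) if self.samples else 0.0


def end_of_window(groups: List[Group]) -> None:
    """Adjust queue depths once per window, then reset the samples."""
    missing = [g for g in groups if g.samples and g.mean_latency() > g.target_us]
    if missing:
        # Protect the strictest group that is missing its target by shrinking
        # every group with a looser target.  No attempt at fairness is made.
        strictest = min(g.target_us for g in missing)
        for g in groups:
            if g.target_us > strictest:
                g.depth = max(1, g.depth // 2)
    else:
        # Work conserving: nobody is missing a target, so hand depth back.
        for g in groups:
            g.depth = min(MAX_DEPTH, g.depth * 2)
    for g in groups:
        g.samples.clear()
```

In this model a "web" group with a 10ms target that sees 25ms average latency
causes a "system" group with a 50ms target to lose half its queue depth, and
the depth is restored once a window passes with no misses.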

We have been testing this in production for several months now to get the
behavior right and we are finally at the point that it is working well in all of
our test cases.  With this patchset we protect our main workload (the web server)
and isolate the system services (chef/yum/etc).  This works well in the
normal case, smoothing out the odd requests per second (RPS) dips that we would
see when one of the system services would run and compete for IO resources.  It
also works incredibly well in the runaway task case.

The runaway task use case is where we have some task that slowly eats up all of
the memory on the system (think a memory leak).  Previously this sort of
workload would push the box into a swapping/oom death spiral that could only be
recovered from by rebooting the box.  With this patchset and proper configuration
of the memory.low and io.latency controllers we're able to survive this test with
at most a 20% dip in RPS.
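
For illustration, configuring a protected group might look like the sketch
below.  It assumes the "MAJOR:MINOR target=<microseconds>" format for
io.latency described in the cgroup v2 documentation and a byte count for
memory.low; the helper names, device numbers and values are hypothetical.

```python
from pathlib import Path


def io_latency_line(major: int, minor: int, target_us: int) -> str:
    """Build one io.latency entry; the target is assumed to be in microseconds."""
    return f"{major}:{minor} target={target_us}"


def configure_group(cgroup_dir: Path, memory_low_bytes: int,
                    major: int, minor: int, target_us: int) -> None:
    """Write memory.low and io.latency for a single cgroup directory."""
    (cgroup_dir / "memory.low").write_text(f"{memory_low_bytes}\n")
    (cgroup_dir / "io.latency").write_text(
        io_latency_line(major, minor, target_us) + "\n")
```

On a real system cgroup_dir would be the workload's directory in the cgroup v2
hierarchy, for example a hypothetical /sys/fs/cgroup/workload, and the device
numbers would be those of the disk being protected.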

There are a lot of extra patches in here to set everything up.  The following
are just infrastructure that should be relatively uncontroversial:

[PATCH 01/13] block: add bi_blkg to the bio for cgroups
[PATCH 02/13] block: introduce bio_issue_as_root_blkg
[PATCH 03/13] blk-cgroup: allow controllers to output their own stats

The following simply allow us to tag swap IO and assign the appropriate cgroup
to the bios so we can do the appropriate accounting inside the io controller:

[PATCH 04/13] blk: introduce REQ_SWAP
[PATCH 05/13] swap,blkcg: issue swap io with the appropriate context

This is so that we can induce delays.  The io controller mostly throttles based
on queue depth; however, for cases like REQ_SWAP/REQ_META where we cannot
throttle without inducing a priority inversion, we have a mechanism to "back
charge" groups for this IO by inducing an artificial delay at user space return
time:

[PATCH 06/13] blkcg: add generic throttling mechanism
[PATCH 07/13] memcontrol: schedule throttling if we are congested
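
The back charging described above can be pictured with a small Python model.
This is only a sketch under assumptions: GroupState, submit_io,
return_to_user_space and the per-IO cost accounting are hypothetical and do not
mirror the actual kernel interfaces.

```python
from dataclasses import dataclass

# Hypothetical flag names for this sketch; they only stand in for
# REQ_SWAP and REQ_META.
SWAP, META = "swap", "meta"


@dataclass
class GroupState:
    depth_limit: int        # how many IOs the group may have in flight
    inflight: int = 0
    owed_delay_us: int = 0  # delay back charged for IO that could not be throttled


def submit_io(group: GroupState, flags: set, cost_us: int) -> bool:
    """Return True if the IO is issued now, False if the caller must wait."""
    if flags & {SWAP, META}:
        # Blocking here could stall a higher priority group (priority
        # inversion), so issue the IO and charge the group for it instead.
        group.inflight += 1
        if group.inflight > group.depth_limit:
            group.owed_delay_us += cost_us
        return True
    if group.inflight >= group.depth_limit:
        return False  # ordinary IO is throttled by queue depth
    group.inflight += 1
    return True


def return_to_user_space(group: GroupState) -> int:
    """Return the delay in microseconds the task should sleep, and clear it."""
    delay, group.owed_delay_us = group.owed_delay_us, 0
    return delay
```

The point of the design is that the penalty is paid where nothing else is
waiting on the task, rather than in the IO submission path.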

This is mostly moving things around and refactoring.  Jens, you may want to pay
close attention to this to make sure I didn't break anything:

[PATCH 08/13] blk-stat: export helpers for modifying blk_rq_stat
[PATCH 09/13] blk-rq-qos: refactor out common elements of blk-wbt
[PATCH 10/13] block: remove external dependency on wbt_flags
[PATCH 11/13] rq-qos: introduce dio_bio callback

And this is the meat of the controller and its documentation:

[PATCH 12/13] block: introduce blk-iolatency io controller
[PATCH 13/13] Documentation: add a doc for blk-iolatency

Jens, I'm sending this through your tree since it's mostly block related;
however, there are two mm related patches, so if somebody from mm could weigh
in on how we want to handle those that would be great.  Thanks,

Josef

* [PATCH 00/13] Introdue io.latency io controller for cgroups
@ 2018-05-29 21:17 Josef Bacik
From: Josef Bacik @ 2018-05-29 21:17 UTC
  To: axboe, kernel-team, linux-block, akpm, linux-mm, hannes,
	linux-kernel, tj, linux-fsdevel


Thread overview: 21+ messages
2018-06-05 13:29 [PATCH 00/13][V2] Introduce io.latency io controller for cgroups Josef Bacik
2018-06-05 13:29 ` [PATCH 01/13] block: add bi_blkg to the bio " Josef Bacik
2018-06-05 13:29 ` [PATCH 02/13] block: introduce bio_issue_as_root_blkg Josef Bacik
2018-06-05 13:29 ` [PATCH 03/13] blk-cgroup: allow controllers to output their own stats Josef Bacik
2018-06-05 13:29 ` [PATCH 04/13] blk: introduce REQ_SWAP Josef Bacik
2018-06-05 13:29 ` [PATCH 05/13] swap,blkcg: issue swap io with the appropriate context Josef Bacik
2018-06-05 19:41   ` Tejun Heo
2018-06-11 14:06   ` Johannes Weiner
2018-06-05 13:29 ` [PATCH 06/13] blkcg: add generic throttling mechanism Josef Bacik
2018-06-05 20:45   ` Tejun Heo
2018-06-05 13:29 ` [PATCH 07/13] memcontrol: schedule throttling if we are congested Josef Bacik
2018-06-05 20:46   ` Tejun Heo
2018-06-11 14:08   ` Johannes Weiner
2018-06-05 13:29 ` [PATCH 08/13] blk-stat: export helpers for modifying blk_rq_stat Josef Bacik
2018-06-05 13:29 ` [PATCH 09/13] blk-rq-qos: refactor out common elements of blk-wbt Josef Bacik
2018-06-05 13:29 ` [PATCH 10/13] block: remove external dependency on wbt_flags Josef Bacik
2018-06-05 13:29 ` [PATCH 11/13] rq-qos: introduce dio_bio callback Josef Bacik
2018-06-05 13:29 ` [PATCH 12/13] block: introduce blk-iolatency io controller Josef Bacik
2018-06-05 13:29 ` [PATCH 13/13] Documentation: add a doc for blk-iolatency Josef Bacik
  -- strict thread matches above, loose matches on Subject: below --
2018-05-29 21:17 [PATCH 00/13] Introdue io.latency io controller for cgroups Josef Bacik
2018-05-29 21:17 ` [PATCH 07/13] memcontrol: schedule throttling if we are congested Josef Bacik
2018-05-30 14:15   ` Johannes Weiner
