* [PATCHSET for-5.10/block] blk-iocost: iocost: improve donation, debt and excess handling
@ 2020-09-01 18:52 Tejun Heo
From: Tejun Heo @ 2020-09-01 18:52 UTC
  To: axboe; +Cc: linux-block, cgroups, linux-kernel, kernel-team, newella

Hello,

This patchset improves iocost in three areas: donation, debt and excess
handling. The changes make iocost's internal operations more accurate and
immediate, with the goal of improving work conservation and distribution
fairness and of removing the dependence on vrate adjustments to mask work
conservation issues. This improves overall control quality and allows vrate
to be regulated more tightly for more consistent behavior, as vrate now only
needs to respond to device behavior changes.

1. Donation

iocost implements work conservation by making under-utilized cgroups donate
unused budgets to saturated cgroups. This approach has the significant
advantage that calculation or synchronization inaccuracies never lead to
over-utilization of the device while allowing all hot path operations to be
local to each cgroup - it's inherently safe w/o needing system-wide
synchronization in hot paths.

However, this approach requires dynamically adjusting weights according to
the current usage of each cgroup. When a cgroup with weight X is using only
a portion of its hierarchical absolute share, X needs to be scaled down so
that the share matches the observed usage. With nesting and multiple nodes
needing adjustments at once, the math is non-trivial. The current
implementation works around the issue by trying to converge by repeatedly
under-adjusting the weight of each cgroup.
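
To make the weight/share relationship concrete, here is a minimal userspace
sketch in Python. It is not kernel code and it is not Andy's method; the
Node class and both helper names are hypothetical. It computes a node's
hierarchical absolute share as the product of weight / sum-of-sibling-weights
along the path to the root, and solves for the exact new weight in the easy
case where only a single node is adjusted. The hard case described above,
several nodes at different levels changing at once, is not covered.

```python
class Node:
    """Hypothetical cgroup node for illustration; not a kernel structure."""

    def __init__(self, name, weight):
        self.name = name
        self.weight = float(weight)
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def hierarchical_share(node):
    """Fraction of the device the node is entitled to; the root gets 1.0."""
    share = 1.0
    while node.parent is not None:
        siblings_total = sum(c.weight for c in node.parent.children)
        share *= node.weight / siblings_total
        node = node.parent
    return share


def scaled_weight(node, target_share):
    """Weight giving `node` exactly `target_share`, if nothing else changes.

    With `local` being the fraction of the parent's share the node should
    get, this solves w / (w + others) = local for w.
    """
    local = target_share / hierarchical_share(node.parent)
    others = sum(c.weight for c in node.parent.children if c is not node)
    if not 0.0 <= local < 1.0 or others <= 0.0:
        raise ValueError("target share not reachable by scaling one weight")
    return local * others / (1.0 - local)


if __name__ == "__main__":
    root = Node("root", 1)
    a = root.add(Node("A", 100))
    root.add(Node("B", 100))
    root.add(Node("C", 200))
    print("share before:", hierarchical_share(a))
    a.weight = scaled_weight(a, 0.10)  # A is observed using 10% of the device
    print("weight after:", a.weight, "share after:", hierarchical_share(a))
```

Scaling one weight also raises every sibling's share, which is how the
unused budget ends up donated. Once a second node elsewhere in the tree
needs the same treatment, each adjustment shifts the other's share, which
is why a closed-form method for the general case is useful.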

The innate inaccuracies can lead to significant errors impacting work
conservation and fairness, and the workarounds for them weigh down the rest
of the control logic.

Andy Newell devised a method to calculate the exact weight updates given the
target hierarchical shares, which is described in the following PDFs.

  https://drive.google.com/file/d/1PsJwxPFtjUnwOY1QJ5AeICCcsL7BM3bo
  https://drive.google.com/file/d/1vONz1-fzVO7oY5DXXsLjSxEtYYQbOvsE
  https://drive.google.com/file/d/1WcrltBOSPN0qXVdBgnKm4mdp9FhuEFQN

This patchset implements Andy's method for precise donation weight
adjustments on each period timer.

Donation amount is also adjusted during a period if the donor is running out
of budget. This mechanism used to be very coarse as donation calculations
weren't accurate to begin with. Now that donation calculations are exact,
this patchset improves in-period adjustments too.

2. Debt

Some IOs which are attributed to a low priority cgroup can cause severe
priority inversions when blocked - e.g. swap outs and filesystem metadata
IOs. These IOs are issued right away even when the cgroup doesn't have
enough budget. When this happens, the cgroup incurs debt, which the cgroup
has to pay off before issuing more IOs.
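
The debt rule described above can be sketched as follows. This is a toy
userspace model in Python under stated assumptions, not the kernel
implementation: the CgroupBudget class, its fields and the must_issue flag
are all hypothetical names, and costs are plain integers rather than vtime.

```python
class CgroupBudget:
    """Toy model of a per-cgroup budget with debt; names are hypothetical."""

    def __init__(self, budget):
        self.budget = budget
        self.debt = 0

    def refill(self, amount):
        """New budget pays off outstanding debt first."""
        pay = min(self.debt, amount)
        self.debt -= pay
        self.budget += amount - pay

    def try_issue(self, cost, must_issue=False):
        """Return True if the IO is issued now.

        must_issue marks IOs that would cause priority inversions if
        blocked (e.g. swap outs, filesystem metadata). They go out right
        away and any shortfall is recorded as debt.
        """
        if self.debt == 0 and self.budget >= cost:
            self.budget -= cost
            return True
        if must_issue:
            covered = min(self.budget, cost)
            self.budget -= covered
            self.debt += cost - covered
            return True
        # Regular IO has to wait until debt is paid and budget suffices.
        return False
```

In this model a regular IO is refused for as long as any debt remains, even
if a later refill would otherwise cover it, which mirrors the "pay off
before issuing more IOs" rule.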

There were several issues in debt handling: how weight is adjusted while
under debt, how payment is calculated, and how anonymous memory delay
duration is determined. This patchset fixes and improves debt handling and
adds a debt forgiveness mechanism which avoids extended pathological
stalling on very slow devices.

3. Excess handling

During a period, each cgroup mostly runs on its own without constantly
synchronizing with other cgroups. This often leads to excess budget which
needs to be thrown away at the end of the period, which can have a negative
impact on work conservation. This is somewhat offset by vrate adjustments,
but vrate compensation is delayed and can sometimes be erratic. It also
prevents us from confining vrate for more consistent behavior.

This patchset implements excess vrate compensation where the effective vrate
is transparently boosted to compensate for excesses without affecting the
regular latency based vrate adjustment mechanism. This compensates for
excesses immediately and accurately and allows the regular vrate adjustment
mechanism to worry only about device behavior changes.
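
As a rough illustration of the idea, and only that, the Python sketch below
keeps a base vrate that is never touched by compensation and derives a
boosted effective vrate from the budget discarded in the previous period.
The class name, the units and the "make up last period's loss over one
period" policy are assumptions made for this example and do not describe
the actual patch.

```python
class VrateCompensator:
    """Toy model of excess compensation; not the kernel logic."""

    def __init__(self, base_vrate):
        # base_vrate: vtime units granted per second. It would be adjusted
        # only by the regular latency based mechanism (not modeled here).
        self.base_vrate = base_vrate
        self.carry = 0.0  # vtime units thrown away at the last period end

    def end_period(self, discarded_excess):
        """Record how much budget was discarded when the period ended."""
        self.carry = discarded_excess

    def effective_vrate(self, period_len):
        """Rate to run the next period at so the discarded budget is
        handed out again within period_len seconds."""
        return self.base_vrate + self.carry / period_len


if __name__ == "__main__":
    comp = VrateCompensator(base_vrate=100.0)
    comp.end_period(discarded_excess=5.0)
    print(comp.effective_vrate(period_len=0.5), comp.base_vrate)
```

Keeping the two values separate is the point of the sketch: the boost
disappears as soon as no excess is discarded, and the base vrate only ever
reflects device behavior.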


This patchset is on top of for-5.10/block (2b64038972e4) and available in
the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git iocost-andys

It contains the following 27 patches.

 0001-blk-iocost-ioc_pd_free-shouldn-t-assume-irq-disabled.patch
 0002-blk-stat-make-q-stats-lock-irqsafe.patch
 0003-blk-iocost-use-local-64-_t-for-percpu-stat.patch
 0004-blk-iocost-rename-propagate_active_weights-to-propag.patch
 0005-blk-iocost-clamp-inuse-and-skip-noops-in-__propagate.patch
 0006-blk-iocost-move-iocg_kick_delay-above-iocg_kick_wait.patch
 0007-blk-iocost-make-iocg_kick_waitq-call-iocg_kick_delay.patch
 0008-blk-iocost-s-HWEIGHT_WHOLE-WEIGHT_ONE-g.patch
 0009-blk-iocost-use-WEIGHT_ONE-based-fixed-point-number-f.patch
 0010-blk-iocost-make-ioc_now-now-and-ioc-period_at-64bit.patch
 0011-blk-iocost-streamline-vtime-margin-and-timer-slack-h.patch
 0012-blk-iocost-grab-ioc-lock-for-debt-handling.patch
 0013-blk-iocost-add-absolute-usage-stat.patch
 0014-blk-iocost-calculate-iocg-usages-from-iocg-local_sta.patch
 0015-blk-iocost-replace-iocg-has_surplus-with-surplus_lis.patch
 0016-blk-iocost-decouple-vrate-adjustment-from-surplus-tr.patch
 0017-blk-iocost-restructure-surplus-donation-logic.patch
 0018-blk-iocost-implement-Andy-s-method-for-donation-weig.patch
 0019-blk-iocost-revamp-donation-amount-determination.patch
 0020-blk-iocost-revamp-in-period-donation-snapbacks.patch
 0021-blk-iocost-revamp-debt-handling.patch
 0022-blk-iocost-implement-delay-adjustment-hysteresis.patch
 0023-blk-iocost-halve-debts-if-device-stays-idle.patch
 0024-blk-iocost-implement-vtime-loss-compensation.patch
 0025-blk-iocost-restore-inuse-update-tracepoints.patch
 0026-blk-iocost-add-three-debug-stat-cost.wait-indebt-and.patch
 0027-blk-iocost-update-iocost_monitor.py.patch

0001-0002 are fixes w/ stable cc'd.

0003-0012 are prep patches - increasing calculation precision for weights,
switching some fields to 64bit, code reorganization, locking changes and so
on.

0013-0014 implement per-cgroup absolute usage tracking so that control
decisions aren't affected by weight distribution changes.

0015-0017 restructure donation logic to prepare for Andy's weight adjustment
method.

0018-0020 implement Andy's weight adjustment method and improve donation
logic both on the period timer and within a period.

0021-0023 improve debt and delay handling.

0024 implements budget excess compensation.

0025-0027 update tracepoints, the monitoring script and debug stats.

diffstat follows. Thanks.

 block/blk-cgroup.c             |   23 
 block/blk-iocost.c             | 1540 +++++++++++++++++++++++++++++++----------
 block/blk-stat.c               |   17 
 include/trace/events/iocost.h  |   26 
 tools/cgroup/iocost_monitor.py |   54 -
 5 files changed, 1227 insertions(+), 433 deletions(-)

--
tejun


Thread overview: 31+ messages
2020-09-01 18:52 [PATCHSET for-5.10/block] blk-iocost: iocost: improve donation, debt and excess handling Tejun Heo
2020-09-01 18:52 ` [PATCH 01/27] blk-iocost: ioc_pd_free() shouldn't assume irq disabled Tejun Heo
2020-09-01 18:52 ` [PATCH 02/27] blk-stat: make q->stats->lock irqsafe Tejun Heo
2020-09-01 18:52 ` [PATCH 03/27] blk-iocost: use local[64]_t for percpu stat Tejun Heo
2020-11-20 21:51   ` Stafford Horne
2020-11-20 22:13     ` Tejun Heo
2020-09-01 18:52 ` [PATCH 04/27] blk-iocost: rename propagate_active_weights() to propagate_weights() Tejun Heo
2020-09-01 18:52 ` [PATCH 05/27] blk-iocost: clamp inuse and skip noops in __propagate_weights() Tejun Heo
2020-09-01 18:52 ` [PATCH 06/27] blk-iocost: move iocg_kick_delay() above iocg_kick_waitq() Tejun Heo
2020-09-01 18:52 ` [PATCH 07/27] blk-iocost: make iocg_kick_waitq() call iocg_kick_delay() after paying debt Tejun Heo
2020-09-01 18:52 ` [PATCH 08/27] blk-iocost: s/HWEIGHT_WHOLE/WEIGHT_ONE/g Tejun Heo
2020-09-01 18:52 ` [PATCH 09/27] blk-iocost: use WEIGHT_ONE based fixed point number for weights Tejun Heo
2020-09-01 18:52 ` [PATCH 10/27] blk-iocost: make ioc_now->now and ioc->period_at 64bit Tejun Heo
2020-09-01 18:52 ` [PATCH 11/27] blk-iocost: streamline vtime margin and timer slack handling Tejun Heo
2020-09-01 18:52 ` [PATCH 12/27] blk-iocost: grab ioc->lock for debt handling Tejun Heo
2020-09-01 18:52 ` [PATCH 13/27] blk-iocost: add absolute usage stat Tejun Heo
2020-09-01 18:52 ` [PATCH 14/27] blk-iocost: calculate iocg->usages[] from iocg->local_stat.usage_us Tejun Heo
2020-09-01 18:52 ` [PATCH 15/27] blk-iocost: replace iocg->has_surplus with ->surplus_list Tejun Heo
2020-09-01 18:52 ` [PATCH 16/27] blk-iocost: decouple vrate adjustment from surplus transfers Tejun Heo
2020-09-01 18:52 ` [PATCH 17/27] blk-iocost: restructure surplus donation logic Tejun Heo
2020-09-01 18:52 ` [PATCH 18/27] blk-iocost: implement Andy's method for donation weight updates Tejun Heo
2020-09-01 18:52 ` [PATCH 19/27] blk-iocost: revamp donation amount determination Tejun Heo
2020-09-01 18:52 ` [PATCH 20/27] blk-iocost: revamp in-period donation snapbacks Tejun Heo
2020-09-01 18:52 ` [PATCH 21/27] blk-iocost: revamp debt handling Tejun Heo
2020-09-01 18:52 ` [PATCH 22/27] blk-iocost: implement delay adjustment hysteresis Tejun Heo
2020-09-01 18:52 ` [PATCH 23/27] blk-iocost: halve debts if device stays idle Tejun Heo
2020-09-01 18:52 ` [PATCH 24/27] blk-iocost: implement vtime loss compensation Tejun Heo
2020-09-01 18:52 ` [PATCH 25/27] blk-iocost: restore inuse update tracepoints Tejun Heo
2020-09-01 18:52 ` [PATCH 26/27] blk-iocost: add three debug stat - cost.wait, indebt and indelay Tejun Heo
2020-09-01 18:52 ` [PATCH 27/27] blk-iocost: update iocost_monitor.py Tejun Heo
2020-09-01 22:57 ` [PATCHSET for-5.10/block] blk-iocost: iocost: improve donation, debt and excess handling Jens Axboe
