Linux-Block Archive on lore.kernel.org
 help / color / Atom feed
* [PATCHSET block/for-next] blk-iocost: Implement absolute debt handling
@ 2019-09-04 19:45 Tejun Heo
  2019-09-04 19:45 ` [PATCH 1/5] blk-iocost: Account force-charged overage in absolute vtime Tejun Heo
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Tejun Heo @ 2019-09-04 19:45 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, kernel-team, cgroups, newella

Currently, when a given cgroup doesn't have enough budget, a forced or
merged bio will advance the cgroup's vtime by the cost calculated
according to the hierarchical weight at the time of issue.  Once vtime
is advanced, how the cgroup's weight changes doesn't matter.  It has
to wait until global vtime catches up with the cgroup's.

This means that the cost is calculated based on the hweight at the
time of issuing but may later be paid at the wrong hweight.  This, for
example, can lead to a scenario like the following.

1. A cgroup with a very low hweight runs out of budget.

2. A storm of swap-out happens on it.  All of them are scaled
   according to the current low hweight and charged to vtime pushing
   it to a far future.

3. All other cgroups go idle and now the above cgroup has access to
   the whole device.  However, because vtime is already wound using
   the past low hweight, what its current hweight is doesn't matter
   until global vtime catches up to the local vtime.

4. As a result, either vrate gets ramped up extremely or the IOs stall
   while the underlying device is idle.

This patchset fixes the behavior by accounting the cost of forced or
merged bios in absolute vtime rather than cgroup-relative.  This
allows the cgroup to pay back the debt with whatever actual budget it
has each period removing the hweight discrepancy.

Note that !forced bios' costs are already accounted in absolute vtime.
This patchset puts forced charges on the same ground.

This patchset contains the following five patches and is on top of the
current linux-block.git for-next 35e7ae82f62b ("Merge branch
'for-5.4/block' into for-next").

 0001-blk-iocost-Account-force-charged-overage-in-absolute.patch
 0002-blk-iocost-Don-t-let-merges-push-vtime-into-the-futu.patch
 0003-iocost_monitor-Always-use-strings-for-json-values.patch
 0004-iocost_monitor-Report-more-info-with-higher-accuracy.patch
 0005-iocost_monitor-Report-debt.patch

0001-0002 implement absolute debt handling.  0003-0005 improve the
monitoring script and add debt reporting.

This patchset is also available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git iocost-vdebt

diffstat follows.  Thanks.

 block/blk-iocost.c             |   93 +++++++++++++++++++++++++++++++++--------
 tools/cgroup/iocost_monitor.py |   61 ++++++++++++++------------
 2 files changed, 110 insertions(+), 44 deletions(-)

--
tejun


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/5] blk-iocost: Account force-charged overage in absolute vtime
  2019-09-04 19:45 [PATCHSET block/for-next] blk-iocost: Implement absolute debt handling Tejun Heo
@ 2019-09-04 19:45 ` Tejun Heo
  2019-09-04 19:45 ` [PATCH 2/5] blk-iocost: Don't let merges push vtime into the future Tejun Heo
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2019-09-04 19:45 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, kernel-team, cgroups, newella, Tejun Heo

Currently, when a bio needs to be force-charged and there isn't enough
budget, vtime is simply pushed into the future.  This means that the
cost of the whole bio is scaled using the current hweight and then
charged immediately.  Until the global vtime advances beyond this
future vtime, the cgroup won't be allowed to issue normal IOs.

This is incorrect and can lead to, for example, exploding vrate or
extended stalls if vrate range is constrained.  Consider the following
scenario.

1. A cgroup with a very low hweight runs out of budget.

2. A storm of swap-out happens on it.  All of them are scaled
   according to the current low hweight and charged to vtime pushing
   it to a far future.

3. All other cgroups go idle and now the above cgroup has access to
   the whole device.  However, because vtime is already wound using
   the past low hweight, what its current hweight is doesn't matter
   until global vtime catches up to the local vtime.

4. As a result, either vrate gets ramped up extremely or the IOs stall
   while the underlying device is idle.

This is because the hweight the overage is calculated at is different
from the hweight that it's being paid at.

Fix it by remembering the overage in absoulte vtime and continuously
paying with the actual budget according to the current hweight at each
period.

Note that non-forced bios which wait already remembers the cost in
absolute vtime.  This brings forced-bio accounting in line.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-iocost.c | 62 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 55 insertions(+), 7 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 2aae8ec391ef..5b0c024ee5dd 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -469,6 +469,7 @@ struct ioc_gq {
 	 */
 	atomic64_t			vtime;
 	atomic64_t			done_vtime;
+	atomic64_t			abs_vdebt;
 	u64				last_vtime;
 
 	/*
@@ -653,13 +654,21 @@ static struct ioc_cgrp *blkcg_to_iocc(struct blkcg *blkcg)
 
 /*
  * Scale @abs_cost to the inverse of @hw_inuse.  The lower the hierarchical
- * weight, the more expensive each IO.
+ * weight, the more expensive each IO.  Must round up.
  */
 static u64 abs_cost_to_cost(u64 abs_cost, u32 hw_inuse)
 {
 	return DIV64_U64_ROUND_UP(abs_cost * HWEIGHT_WHOLE, hw_inuse);
 }
 
+/*
+ * The inverse of abs_cost_to_cost().  Must round up.
+ */
+static u64 cost_to_abs_cost(u64 cost, u32 hw_inuse)
+{
+	return DIV64_U64_ROUND_UP(cost * hw_inuse, HWEIGHT_WHOLE);
+}
+
 static void iocg_commit_bio(struct ioc_gq *iocg, struct bio *bio, u64 cost)
 {
 	bio->bi_iocost_cost = cost;
@@ -1132,16 +1141,36 @@ static void iocg_kick_waitq(struct ioc_gq *iocg, struct ioc_now *now)
 	struct iocg_wake_ctx ctx = { .iocg = iocg };
 	u64 margin_ns = (u64)(ioc->period_us *
 			      WAITQ_TIMER_MARGIN_PCT / 100) * NSEC_PER_USEC;
-	u64 vshortage, expires, oexpires;
+	u64 abs_vdebt, vdebt, vshortage, expires, oexpires;
+	s64 vbudget;
+	u32 hw_inuse;
 
 	lockdep_assert_held(&iocg->waitq.lock);
 
+	current_hweight(iocg, NULL, &hw_inuse);
+	vbudget = now->vnow - atomic64_read(&iocg->vtime);
+
+	/* pay off debt */
+	abs_vdebt = atomic64_read(&iocg->abs_vdebt);
+	vdebt = abs_cost_to_cost(abs_vdebt, hw_inuse);
+	if (vdebt && vbudget > 0) {
+		u64 delta = min_t(u64, vbudget, vdebt);
+		u64 abs_delta = min(cost_to_abs_cost(delta, hw_inuse),
+				    abs_vdebt);
+
+		atomic64_add(delta, &iocg->vtime);
+		atomic64_add(delta, &iocg->done_vtime);
+		atomic64_sub(abs_delta, &iocg->abs_vdebt);
+		if (WARN_ON_ONCE(atomic64_read(&iocg->abs_vdebt) < 0))
+			atomic64_set(&iocg->abs_vdebt, 0);
+	}
+
 	/*
 	 * Wake up the ones which are due and see how much vtime we'll need
 	 * for the next one.
 	 */
-	current_hweight(iocg, NULL, &ctx.hw_inuse);
-	ctx.vbudget = now->vnow - atomic64_read(&iocg->vtime);
+	ctx.hw_inuse = hw_inuse;
+	ctx.vbudget = vbudget - vdebt;
 	__wake_up_locked_key(&iocg->waitq, TASK_NORMAL, &ctx);
 	if (!waitqueue_active(&iocg->waitq))
 		return;
@@ -1187,6 +1216,11 @@ static void iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now, u64 cost)
 	u64 vmargin = ioc->margin_us * now->vrate;
 	u64 margin_ns = ioc->margin_us * NSEC_PER_USEC;
 	u64 expires, oexpires;
+	u32 hw_inuse;
+
+	/* debt-adjust vtime */
+	current_hweight(iocg, NULL, &hw_inuse);
+	vtime += abs_cost_to_cost(atomic64_read(&iocg->abs_vdebt), hw_inuse);
 
 	/* clear or maintain depending on the overage */
 	if (time_before_eq64(vtime, now->vnow)) {
@@ -1332,12 +1366,14 @@ static void ioc_timer_fn(struct timer_list *timer)
 	 * should have woken up in the last period and expire idle iocgs.
 	 */
 	list_for_each_entry_safe(iocg, tiocg, &ioc->active_iocgs, active_list) {
-		if (!waitqueue_active(&iocg->waitq) && !iocg_is_idle(iocg))
+		if (!waitqueue_active(&iocg->waitq) &&
+		    !atomic64_read(&iocg->abs_vdebt) && !iocg_is_idle(iocg))
 			continue;
 
 		spin_lock(&iocg->waitq.lock);
 
-		if (waitqueue_active(&iocg->waitq)) {
+		if (waitqueue_active(&iocg->waitq) ||
+		    atomic64_read(&iocg->abs_vdebt)) {
 			/* might be oversleeping vtime / hweight changes, kick */
 			iocg_kick_waitq(iocg, &now);
 			iocg_kick_delay(iocg, &now, 0);
@@ -1673,13 +1709,24 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio)
 	 * in a while which is fine.
 	 */
 	if (!waitqueue_active(&iocg->waitq) &&
+	    !atomic64_read(&iocg->abs_vdebt) &&
 	    time_before_eq64(vtime + cost, now.vnow)) {
 		iocg_commit_bio(iocg, bio, cost);
 		return;
 	}
 
+	/*
+	 * We're over budget.  If @bio has to be issued regardless,
+	 * remember the abs_cost instead of advancing vtime.
+	 * iocg_kick_waitq() will pay off the debt before waking more IOs.
+	 * This way, the debt is continuously paid off each period with the
+	 * actual budget available to the cgroup.  If we just wound vtime,
+	 * we would incorrectly use the current hw_inuse for the entire
+	 * amount which, for example, can lead to the cgroup staying
+	 * blocked for a long time even with substantially raised hw_inuse.
+	 */
 	if (bio_issue_as_root_blkg(bio) || fatal_signal_pending(current)) {
-		iocg_commit_bio(iocg, bio, cost);
+		atomic64_add(abs_cost, &iocg->abs_vdebt);
 		iocg_kick_delay(iocg, &now, cost);
 		return;
 	}
@@ -1928,6 +1975,7 @@ static void ioc_pd_init(struct blkg_policy_data *pd)
 	iocg->ioc = ioc;
 	atomic64_set(&iocg->vtime, now.vnow);
 	atomic64_set(&iocg->done_vtime, now.vnow);
+	atomic64_set(&iocg->abs_vdebt, 0);
 	atomic64_set(&iocg->active_period, atomic64_read(&ioc->cur_period));
 	INIT_LIST_HEAD(&iocg->active_list);
 	iocg->hweight_active = HWEIGHT_WHOLE;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 2/5] blk-iocost: Don't let merges push vtime into the future
  2019-09-04 19:45 [PATCHSET block/for-next] blk-iocost: Implement absolute debt handling Tejun Heo
  2019-09-04 19:45 ` [PATCH 1/5] blk-iocost: Account force-charged overage in absolute vtime Tejun Heo
@ 2019-09-04 19:45 ` Tejun Heo
  2019-09-04 19:45 ` [PATCH 3/5] iocost_monitor: Always use strings for json values Tejun Heo
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2019-09-04 19:45 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, kernel-team, cgroups, newella, Tejun Heo

Merges have the same problem that forced-bios had which is fixed by
the previous patch.  The cost of a merge is calculated at the time of
issue and force-advances vtime into the future.  Until global vtime
catches up, how the cgroup's hweight changes in the meantime doesn't
matter and it often leads to situations where the cost is calculated
at one hweight and paid at a very different one.  See the previous
patch for more details.

Fix it by never advancing vtime into the future for merges.  If budget
is available, vtime is advanced.  Otherwise, the cost is charged as
debt.

This brings merge cost handling in line with issue cost handling in
ioc_rqos_throttle().

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-iocost.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 5b0c024ee5dd..6000ce9b10bb 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -1784,28 +1784,39 @@ static void ioc_rqos_merge(struct rq_qos *rqos, struct request *rq,
 			   struct bio *bio)
 {
 	struct ioc_gq *iocg = blkg_to_iocg(bio->bi_blkg);
+	struct ioc *ioc = iocg->ioc;
 	sector_t bio_end = bio_end_sector(bio);
+	struct ioc_now now;
 	u32 hw_inuse;
 	u64 abs_cost, cost;
 
-	/* add iff the existing request has cost assigned */
-	if (!rq->bio || !rq->bio->bi_iocost_cost)
+	/* bypass if disabled or for root cgroup */
+	if (!ioc->enabled || !iocg->level)
 		return;
 
 	abs_cost = calc_vtime_cost(bio, iocg, true);
 	if (!abs_cost)
 		return;
 
+	ioc_now(ioc, &now);
+	current_hweight(iocg, NULL, &hw_inuse);
+	cost = abs_cost_to_cost(abs_cost, hw_inuse);
+
 	/* update cursor if backmerging into the request at the cursor */
 	if (blk_rq_pos(rq) < bio_end &&
 	    blk_rq_pos(rq) + blk_rq_sectors(rq) == iocg->cursor)
 		iocg->cursor = bio_end;
 
-	current_hweight(iocg, NULL, &hw_inuse);
-	cost = div64_u64(abs_cost * HWEIGHT_WHOLE, hw_inuse);
-	bio->bi_iocost_cost = cost;
-
-	atomic64_add(cost, &iocg->vtime);
+	/*
+	 * Charge if there's enough vtime budget and the existing request
+	 * has cost assigned.  Otherwise, account it as debt.  See debt
+	 * handling in ioc_rqos_throttle() for details.
+	 */
+	if (rq->bio && rq->bio->bi_iocost_cost &&
+	    time_before_eq64(atomic64_read(&iocg->vtime) + cost, now.vnow))
+		iocg_commit_bio(iocg, bio, cost);
+	else
+		atomic64_add(abs_cost, &iocg->abs_vdebt);
 }
 
 static void ioc_rqos_done_bio(struct rq_qos *rqos, struct bio *bio)
-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 3/5] iocost_monitor: Always use strings for json values
  2019-09-04 19:45 [PATCHSET block/for-next] blk-iocost: Implement absolute debt handling Tejun Heo
  2019-09-04 19:45 ` [PATCH 1/5] blk-iocost: Account force-charged overage in absolute vtime Tejun Heo
  2019-09-04 19:45 ` [PATCH 2/5] blk-iocost: Don't let merges push vtime into the future Tejun Heo
@ 2019-09-04 19:45 ` Tejun Heo
  2019-09-04 19:45 ` [PATCH 4/5] iocost_monitor: Report more info with higher accuracy Tejun Heo
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2019-09-04 19:45 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, kernel-team, cgroups, newella, Tejun Heo

Json has limited accuracy for numbers and can silently truncate 64bit
values, which can be extremely confusing.  Let's consistently use
string encapsulated values for json output.

While at it, convert an unnecesary f-string to str().

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 tools/cgroup/iocost_monitor.py | 40 +++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/tools/cgroup/iocost_monitor.py b/tools/cgroup/iocost_monitor.py
index 2c9445e966d8..8f6b4ac377bd 100644
--- a/tools/cgroup/iocost_monitor.py
+++ b/tools/cgroup/iocost_monitor.py
@@ -111,14 +111,14 @@ autop_names = {
 
     def dict(self, now):
         return { 'device'               : devname,
-                 'timestamp'            : now,
-                 'enabled'              : self.enabled,
-                 'running'              : self.running,
-                 'period_ms'            : self.period_ms,
-                 'period_at'            : self.period_at,
-                 'period_vtime_at'      : self.vperiod_at,
-                 'busy_level'           : self.busy_level,
-                 'vrate_pct'            : self.vrate_pct, }
+                 'timestamp'            : str(now),
+                 'enabled'              : str(int(self.enabled)),
+                 'running'              : str(int(self.running)),
+                 'period_ms'            : str(self.period_ms),
+                 'period_at'            : str(self.period_at),
+                 'period_vtime_at'      : str(self.vperiod_at),
+                 'busy_level'           : str(self.busy_level),
+                 'vrate_pct'            : str(self.vrate_pct), }
 
     def table_preamble_str(self):
         state = ('RUN' if self.running else 'IDLE') if self.enabled else 'OFF'
@@ -171,19 +171,19 @@ autop_names = {
 
     def dict(self, now, path):
         out = { 'cgroup'                : path,
-                'timestamp'             : now,
-                'is_active'             : self.is_active,
-                'weight'                : self.weight,
-                'weight_active'         : self.active,
-                'weight_inuse'          : self.inuse,
-                'hweight_active_pct'    : self.hwa_pct,
-                'hweight_inuse_pct'     : self.hwi_pct,
-                'inflight_pct'          : self.inflight_pct,
-                'use_delay'             : self.use_delay,
-                'delay_ms'              : self.delay_ms,
-                'usage_pct'             : self.usage }
+                'timestamp'             : str(now),
+                'is_active'             : str(int(self.is_active)),
+                'weight'                : str(self.weight),
+                'weight_active'         : str(self.active),
+                'weight_inuse'          : str(self.inuse),
+                'hweight_active_pct'    : str(self.hwa_pct),
+                'hweight_inuse_pct'     : str(self.hwi_pct),
+                'inflight_pct'          : str(self.inflight_pct),
+                'use_delay'             : str(self.use_delay),
+                'delay_ms'              : str(self.delay_ms),
+                'usage_pct'             : str(self.usage) }
         for i in range(len(self.usages)):
-            out[f'usage_pct_{i}'] = f'{self.usages[i]}'
+            out[f'usage_pct_{i}'] = str(self.usages[i])
         return out
 
     def table_row_str(self, path):
-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 4/5] iocost_monitor: Report more info with higher accuracy
  2019-09-04 19:45 [PATCHSET block/for-next] blk-iocost: Implement absolute debt handling Tejun Heo
                   ` (2 preceding siblings ...)
  2019-09-04 19:45 ` [PATCH 3/5] iocost_monitor: Always use strings for json values Tejun Heo
@ 2019-09-04 19:45 ` Tejun Heo
  2019-09-04 19:45 ` [PATCH 5/5] iocost_monitor: Report debt Tejun Heo
  2019-09-10 18:32 ` [PATCHSET block/for-next] blk-iocost: Implement absolute debt handling Jens Axboe
  5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2019-09-04 19:45 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, kernel-team, cgroups, newella, Tejun Heo

When outputting json:

* Don't truncate numbers.

* Report address of iocg to ease drilling down further.

When outputting table:

* Use math.ceil() for delay_ms so that small delays don't read as 0.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 tools/cgroup/iocost_monitor.py | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/tools/cgroup/iocost_monitor.py b/tools/cgroup/iocost_monitor.py
index 8f6b4ac377bd..5d8bac603ffa 100644
--- a/tools/cgroup/iocost_monitor.py
+++ b/tools/cgroup/iocost_monitor.py
@@ -13,6 +13,7 @@ import sys
 import re
 import time
 import json
+import math
 
 import drgn
 from drgn import container_of
@@ -95,7 +96,7 @@ autop_names = {
 
         self.enabled = ioc.enabled.value_()
         self.running = ioc.running.value_() == IOC_RUNNING
-        self.period_ms = round(ioc.period_us.value_() / 1_000)
+        self.period_ms = ioc.period_us.value_() / 1_000
         self.period_at = ioc.period_at.value_() / 1_000_000
         self.vperiod_at = ioc.period_at_vtime.value_() / VTIME_PER_SEC
         self.vrate_pct = ioc.vtime_rate.counter.value_() * 100 / VTIME_PER_USEC
@@ -147,6 +148,7 @@ autop_names = {
         self.inuse = iocg.inuse.value_()
         self.hwa_pct = iocg.hweight_active.value_() * 100 / HWEIGHT_WHOLE
         self.hwi_pct = iocg.hweight_inuse.value_() * 100 / HWEIGHT_WHOLE
+        self.address = iocg.value_()
 
         vdone = iocg.done_vtime.counter.value_()
         vtime = iocg.vtime.counter.value_()
@@ -157,15 +159,15 @@ autop_names = {
         else:
             self.inflight_pct = 0
 
-        self.use_delay = min(blkg.use_delay.counter.value_(), 99)
-        self.delay_ms = min(round(blkg.delay_nsec.counter.value_() / 1_000_000), 999)
+        self.use_delay = blkg.use_delay.counter.value_()
+        self.delay_ms = blkg.delay_nsec.counter.value_() / 1_000_000
 
         usage_idx = iocg.usage_idx.value_()
         self.usages = []
         self.usage = 0
         for i in range(NR_USAGE_SLOTS):
             usage = iocg.usages[(usage_idx + i) % NR_USAGE_SLOTS].value_()
-            upct = min(usage * 100 / HWEIGHT_WHOLE, 999)
+            upct = usage * 100 / HWEIGHT_WHOLE
             self.usages.append(upct)
             self.usage = max(self.usage, upct)
 
@@ -181,7 +183,8 @@ autop_names = {
                 'inflight_pct'          : str(self.inflight_pct),
                 'use_delay'             : str(self.use_delay),
                 'delay_ms'              : str(self.delay_ms),
-                'usage_pct'             : str(self.usage) }
+                'usage_pct'             : str(self.usage),
+                'address'               : str(hex(self.address)) }
         for i in range(len(self.usages)):
             out[f'usage_pct_{i}'] = str(self.usages[i])
         return out
@@ -192,9 +195,10 @@ autop_names = {
               f'{self.inuse:5}/{self.active:5} ' \
               f'{self.hwi_pct:6.2f}/{self.hwa_pct:6.2f} ' \
               f'{self.inflight_pct:6.2f} ' \
-              f'{self.use_delay:2}*{self.delay_ms:03} '
+              f'{min(self.use_delay, 99):2}*'\
+              f'{min(math.ceil(self.delay_ms), 999):03} '
         for u in self.usages:
-            out += f'{round(u):03d}:'
+            out += f'{min(round(u), 999):03d}:'
         out = out.rstrip(':')
         return out
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 5/5] iocost_monitor: Report debt
  2019-09-04 19:45 [PATCHSET block/for-next] blk-iocost: Implement absolute debt handling Tejun Heo
                   ` (3 preceding siblings ...)
  2019-09-04 19:45 ` [PATCH 4/5] iocost_monitor: Report more info with higher accuracy Tejun Heo
@ 2019-09-04 19:45 ` Tejun Heo
  2019-09-10 18:32 ` [PATCHSET block/for-next] blk-iocost: Implement absolute debt handling Jens Axboe
  5 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2019-09-04 19:45 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, kernel-team, cgroups, newella, Tejun Heo

Report debt and rename del_ms row to delay for consistency.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-iocost.c             | 6 +++---
 tools/cgroup/iocost_monitor.py | 5 ++++-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 6000ce9b10bb..f0c5bfd4b4a8 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -161,9 +161,9 @@
  * https://github.com/osandov/drgn.  The ouput looks like the following.
  *
  *  sdb RUN   per=300ms cur_per=234.218:v203.695 busy= +1 vrate= 62.12%
- *                 active      weight      hweight% inflt% del_ms usages%
- *  test/a              *    50/   50  33.33/ 33.33  27.65  0*041 033:033:033
- *  test/b              *   100/  100  66.67/ 66.67  17.56  0*000 066:079:077
+ *                 active      weight      hweight% inflt% dbt  delay usages%
+ *  test/a              *    50/   50  33.33/ 33.33  27.65   2  0*041 033:033:033
+ *  test/b              *   100/  100  66.67/ 66.67  17.56   0  0*000 066:079:077
  *
  * - per	: Timer period
  * - cur_per	: Internal wall and device vtime clock
diff --git a/tools/cgroup/iocost_monitor.py b/tools/cgroup/iocost_monitor.py
index 5d8bac603ffa..f79b23582a1d 100644
--- a/tools/cgroup/iocost_monitor.py
+++ b/tools/cgroup/iocost_monitor.py
@@ -135,7 +135,7 @@ autop_names = {
 
     def table_header_str(self):
         return f'{"":25} active {"weight":>9} {"hweight%":>13} {"inflt%":>6} ' \
-               f'{"del_ms":>6} {"usages%"}'
+               f'{"dbt":>3} {"delay":>6} {"usages%"}'
 
 class IocgStat:
     def __init__(self, iocg):
@@ -159,6 +159,7 @@ autop_names = {
         else:
             self.inflight_pct = 0
 
+        self.debt_ms = iocg.abs_vdebt.counter.value_() / VTIME_PER_USEC / 1000
         self.use_delay = blkg.use_delay.counter.value_()
         self.delay_ms = blkg.delay_nsec.counter.value_() / 1_000_000
 
@@ -181,6 +182,7 @@ autop_names = {
                 'hweight_active_pct'    : str(self.hwa_pct),
                 'hweight_inuse_pct'     : str(self.hwi_pct),
                 'inflight_pct'          : str(self.inflight_pct),
+                'debt_ms'               : str(self.debt_ms),
                 'use_delay'             : str(self.use_delay),
                 'delay_ms'              : str(self.delay_ms),
                 'usage_pct'             : str(self.usage),
@@ -195,6 +197,7 @@ autop_names = {
               f'{self.inuse:5}/{self.active:5} ' \
               f'{self.hwi_pct:6.2f}/{self.hwa_pct:6.2f} ' \
               f'{self.inflight_pct:6.2f} ' \
+              f'{min(math.ceil(self.debt_ms), 999):3} ' \
               f'{min(self.use_delay, 99):2}*'\
               f'{min(math.ceil(self.delay_ms), 999):03} '
         for u in self.usages:
-- 
2.17.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCHSET block/for-next] blk-iocost: Implement absolute debt handling
  2019-09-04 19:45 [PATCHSET block/for-next] blk-iocost: Implement absolute debt handling Tejun Heo
                   ` (4 preceding siblings ...)
  2019-09-04 19:45 ` [PATCH 5/5] iocost_monitor: Report debt Tejun Heo
@ 2019-09-10 18:32 ` Jens Axboe
  5 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2019-09-10 18:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-block, linux-kernel, kernel-team, cgroups, newella

On 9/4/19 1:45 PM, Tejun Heo wrote:
> Currently, when a given cgroup doesn't have enough budget, a forced or
> merged bio will advance the cgroup's vtime by the cost calculated
> according to the hierarchical weight at the time of issue.  Once vtime
> is advanced, how the cgroup's weight changes doesn't matter.  It has
> to wait until global vtime catches up with the cgroup's.
> 
> This means that the cost is calculated based on the hweight at the
> time of issuing but may later be paid at the wrong hweight.  This, for
> example, can lead to a scenario like the following.
> 
> 1. A cgroup with a very low hweight runs out of budget.
> 
> 2. A storm of swap-out happens on it.  All of them are scaled
>     according to the current low hweight and charged to vtime pushing
>     it to a far future.
> 
> 3. All other cgroups go idle and now the above cgroup has access to
>     the whole device.  However, because vtime is already wound using
>     the past low hweight, what its current hweight is doesn't matter
>     until global vtime catches up to the local vtime.
> 
> 4. As a result, either vrate gets ramped up extremely or the IOs stall
>     while the underlying device is idle.
> 
> This patchset fixes the behavior by accounting the cost of forced or
> merged bios in absolute vtime rather than cgroup-relative.  This
> allows the cgroup to pay back the debt with whatever actual budget it
> has each period removing the hweight discrepancy.
> 
> Note that !forced bios' costs are already accounted in absolute vtime.
> This patchset puts forced charges on the same ground.
> 
> This patchset contains the following five patches and is on top of the
> current linux-block.git for-next 35e7ae82f62b ("Merge branch
> 'for-5.4/block' into for-next").
> 
>   0001-blk-iocost-Account-force-charged-overage-in-absolute.patch
>   0002-blk-iocost-Don-t-let-merges-push-vtime-into-the-futu.patch
>   0003-iocost_monitor-Always-use-strings-for-json-values.patch
>   0004-iocost_monitor-Report-more-info-with-higher-accuracy.patch
>   0005-iocost_monitor-Report-debt.patch
> 
> 0001-0002 implement absolute debt handling.  0003-0005 improve the
> monitoring script and add debt reporting.

Applied, thanks.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, back to index

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-04 19:45 [PATCHSET block/for-next] blk-iocost: Implement absolute debt handling Tejun Heo
2019-09-04 19:45 ` [PATCH 1/5] blk-iocost: Account force-charged overage in absolute vtime Tejun Heo
2019-09-04 19:45 ` [PATCH 2/5] blk-iocost: Don't let merges push vtime into the future Tejun Heo
2019-09-04 19:45 ` [PATCH 3/5] iocost_monitor: Always use strings for json values Tejun Heo
2019-09-04 19:45 ` [PATCH 4/5] iocost_monitor: Report more info with higher accuracy Tejun Heo
2019-09-04 19:45 ` [PATCH 5/5] iocost_monitor: Report debt Tejun Heo
2019-09-10 18:32 ` [PATCHSET block/for-next] blk-iocost: Implement absolute debt handling Jens Axboe

Linux-Block Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-block/0 linux-block/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-block linux-block/ https://lore.kernel.org/linux-block \
		linux-block@vger.kernel.org linux-block@archiver.kernel.org
	public-inbox-index linux-block


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-block


AGPL code for this site: git clone https://public-inbox.org/ public-inbox