linux-kernel.vger.kernel.org archive mirror
* [PATCHSET block/for-5.8] iocost: improve use_delay and latency target handling
@ 2020-04-08 20:14 Tejun Heo
  2020-04-08 20:14 ` [PATCH 1/5] blk-iocost: switch to fixed non-auto-decaying use_delay Tejun Heo
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Tejun Heo @ 2020-04-08 20:14 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, kernel-team, cgroups, newella, josef

Hello, Jens.

This patchset improves the following two iocost control behaviors.

* iocost was failing to punish heavy shared IO generators (file metadata, memory
  reclaim) through the use_delay mechanism - use_delay automatically decays,
  which works well for iolatency but doesn't match how iocost behaves. This
  allowed e.g. memory bombs, which generate a lot of swap IOs, to use more than
  their allotted amount. This is fixed by adding a non-decaying use_delay
  mechanism.

* The same latency targets were being applied regardless of IO size. While this
  works fine for loose targets, it gets in the way when trying to tighten them -
  a latency target adequate for a 4k IO is too short for a 1 meg IO.
  iocost now discounts the size portion of cost when testing whether a given IO
  met or missed its latency target.

While at it, it also makes minor changes to iocost_monitor.py.

This patchset contains the following five patches.

 0001-blk-iocost-switch-to-fixed-non-auto-decaying-use_del.patch
 0002-block-add-request-io_data_len.patch
 0003-blk-iocost-account-for-IO-size-when-testing-latencie.patch
 0004-iocost_monitor-exit-successfully-if-interval-is-zero.patch
 0005-iocost_monitor-drop-string-wrap-around-numbers-when-.patch

and is also available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git iocost-delay-latency

diffstat follows. Thanks.

 block/Kconfig                  |    4 +++
 block/blk-cgroup.c             |    6 ++++
 block/blk-iocost.c             |   54 ++++++++++++++++++++++++++++-------------
 block/blk-mq.c                 |    6 ++++
 include/linux/blk-cgroup.h     |   43 +++++++++++++++++++++++++-------
 include/linux/blkdev.h         |    8 ++++++
 tools/cgroup/iocost_monitor.py |   48 +++++++++++++++++++-----------------
 7 files changed, 121 insertions(+), 48 deletions(-)

--
tejun


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/5] blk-iocost: switch to fixed non-auto-decaying use_delay
  2020-04-08 20:14 [PATCHSET block/for-5.8] iocost: improve use_delay and latency target handling Tejun Heo
@ 2020-04-08 20:14 ` Tejun Heo
  2020-04-08 20:14 ` [PATCH 2/5] block: add request->io_data_len Tejun Heo
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2020-04-08 20:14 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, kernel-team, cgroups, newella, josef,
	Tejun Heo

The use_delay mechanism was introduced by blk-iolatency to hold memory
allocators accountable for the reclaim and other shared IOs they cause. The
duration of the delay is dynamically balanced between iolatency increasing the
value on each target miss and the value auto-decaying as time passes and
threads get delayed on it.

While this works well for iolatency, iocost's control model isn't compatible
with it. There are no repeated "violation" events which can be balanced against
auto-decaying. iocost instead knows how much a given cgroup is over budget and
wants to prevent that cgroup from issuing IOs while over budget. Until now,
iocost has been adding the cost of force-issued IOs. However, this doesn't
reflect the amount which is already over budget and is simply not enough to
counter the auto-decaying, allowing an anon-memory leaking low priority cgroup
to go over its allotted share of IOs.

As auto-decaying doesn't make much sense for iocost, this patch introduces a
different mode of operation for use_delay - when blkcg_set_delay() is used
instead of blkcg_add/use_delay(), the delay duration is not auto-decayed until
it is explicitly cleared with blkcg_clear_delay(). iocost is updated to keep
the delay duration synchronized with the budget overage amount.

With this change, iocost can effectively police cgroups which generate a
significant amount of force-issued IOs.
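The intended semantics can be modeled outside the kernel - a toy Python sketch
(helper names mirror the kernel functions, but the atomics and the
congestion-count plumbing are simplified assumptions, not kernel code):

```python
class BlkgDelay:
    """Toy model of the fixed (non-decaying) use_delay mode.

    use_delay == -1 marks the fixed mode entered via set_delay(); it
    stays set, and delay_nsec is never auto-decayed, until clear_delay()
    is called explicitly.
    """
    def __init__(self):
        self.use_delay = 0        # 0: off, -1: fixed mode
        self.delay_nsec = 0
        self.congestion_count = 0

    def set_delay(self, delay_nsec):
        # Only the first setter bumps the congestion count.
        if self.use_delay == 0:
            self.use_delay = -1
            self.congestion_count += 1
        self.delay_nsec = delay_nsec    # overwritten, never decayed

    def clear_delay(self):
        # Only one clearer drops the congestion count.
        if self.use_delay != 0:
            self.use_delay = 0
            self.congestion_count -= 1
```

In the real code both transitions are done with atomic_cmpxchg() so that
exactly one CPU adjusts the congestion count.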

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
---
 block/blk-cgroup.c         |  6 ++++++
 block/blk-iocost.c         | 23 ++++++++------------
 include/linux/blk-cgroup.h | 43 +++++++++++++++++++++++++++++---------
 3 files changed, 48 insertions(+), 24 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index c15a26096038..3ee9df979ff2 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1514,6 +1514,10 @@ static void blkcg_scale_delay(struct blkcg_gq *blkg, u64 now)
 {
 	u64 old = atomic64_read(&blkg->delay_start);
 
+	/* negative use_delay means no scaling, see blkcg_set_delay() */
+	if (atomic_read(&blkg->use_delay) < 0)
+		return;
+
 	/*
 	 * We only want to scale down every second.  The idea here is that we
 	 * want to delay people for min(delay_nsec, NSEC_PER_SEC) in a certain
@@ -1701,6 +1705,8 @@ void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay)
  */
 void blkcg_add_delay(struct blkcg_gq *blkg, u64 now, u64 delta)
 {
+	if (WARN_ON_ONCE(atomic_read(&blkg->use_delay) < 0))
+		return;
 	blkcg_scale_delay(blkg, now);
 	atomic64_add(delta, &blkg->delay_nsec);
 }
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index db35ee682294..a8e99ef76a08 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -1209,14 +1209,14 @@ static enum hrtimer_restart iocg_waitq_timer_fn(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
-static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now, u64 cost)
+static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now)
 {
 	struct ioc *ioc = iocg->ioc;
 	struct blkcg_gq *blkg = iocg_to_blkg(iocg);
 	u64 vtime = atomic64_read(&iocg->vtime);
 	u64 vmargin = ioc->margin_us * now->vrate;
 	u64 margin_ns = ioc->margin_us * NSEC_PER_USEC;
-	u64 expires, oexpires;
+	u64 delta_ns, expires, oexpires;
 	u32 hw_inuse;
 
 	/* debt-adjust vtime */
@@ -1233,15 +1233,10 @@ static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now, u64 cost)
 		return false;
 
 	/* use delay */
-	if (cost) {
-		u64 cost_ns = DIV64_U64_ROUND_UP(cost * NSEC_PER_USEC,
-						 now->vrate);
-		blkcg_add_delay(blkg, now->now_ns, cost_ns);
-	}
-	blkcg_use_delay(blkg);
-
-	expires = now->now_ns + DIV64_U64_ROUND_UP(vtime - now->vnow,
-						   now->vrate) * NSEC_PER_USEC;
+	delta_ns = DIV64_U64_ROUND_UP(vtime - now->vnow,
+				      now->vrate) * NSEC_PER_USEC;
+	blkcg_set_delay(blkg, delta_ns);
+	expires = now->now_ns + delta_ns;
 
 	/* if already active and close enough, don't bother */
 	oexpires = ktime_to_ns(hrtimer_get_softexpires(&iocg->delay_timer));
@@ -1260,7 +1255,7 @@ static enum hrtimer_restart iocg_delay_timer_fn(struct hrtimer *timer)
 	struct ioc_now now;
 
 	ioc_now(iocg->ioc, &now);
-	iocg_kick_delay(iocg, &now, 0);
+	iocg_kick_delay(iocg, &now);
 
 	return HRTIMER_NORESTART;
 }
@@ -1378,7 +1373,7 @@ static void ioc_timer_fn(struct timer_list *timer)
 		    atomic64_read(&iocg->abs_vdebt)) {
 			/* might be oversleeping vtime / hweight changes, kick */
 			iocg_kick_waitq(iocg, &now);
-			iocg_kick_delay(iocg, &now, 0);
+			iocg_kick_delay(iocg, &now);
 		} else if (iocg_is_idle(iocg)) {
 			/* no waiter and idle, deactivate */
 			iocg->last_inuse = iocg->inuse;
@@ -1737,7 +1732,7 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio)
 	 */
 	if (bio_issue_as_root_blkg(bio) || fatal_signal_pending(current)) {
 		atomic64_add(abs_cost, &iocg->abs_vdebt);
-		if (iocg_kick_delay(iocg, &now, cost))
+		if (iocg_kick_delay(iocg, &now))
 			blkcg_schedule_throttle(rqos->q,
 					(bio->bi_opf & REQ_SWAP) == REQ_SWAP);
 		return;
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index e4a6949fd171..a95cd4c8a3fb 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -638,6 +638,8 @@ static inline bool blkcg_bio_issue_check(struct request_queue *q,
 
 static inline void blkcg_use_delay(struct blkcg_gq *blkg)
 {
+	if (WARN_ON_ONCE(atomic_read(&blkg->use_delay) < 0))
+		return;
 	if (atomic_add_return(1, &blkg->use_delay) == 1)
 		atomic_inc(&blkg->blkcg->css.cgroup->congestion_count);
 }
@@ -646,6 +648,8 @@ static inline int blkcg_unuse_delay(struct blkcg_gq *blkg)
 {
 	int old = atomic_read(&blkg->use_delay);
 
+	if (WARN_ON_ONCE(old < 0))
+		return 0;
 	if (old == 0)
 		return 0;
 
@@ -670,20 +674,39 @@ static inline int blkcg_unuse_delay(struct blkcg_gq *blkg)
 	return 1;
 }
 
+/**
+ * blkcg_set_delay - Enable allocator delay mechanism with the specified delay amount
+ * @blkg: target blkg
+ * @delay: delay duration in nsecs
+ *
+ * When enabled with this function, the delay is not decayed and must be
+ * explicitly cleared with blkcg_clear_delay(). Must not be mixed with
+ * blkcg_[un]use_delay() and blkcg_add_delay() usages.
+ */
+static inline void blkcg_set_delay(struct blkcg_gq *blkg, u64 delay)
+{
+	int old = atomic_read(&blkg->use_delay);
+
+	/* We only want 1 person setting the congestion count for this blkg. */
+	if (!old && atomic_cmpxchg(&blkg->use_delay, old, -1) == old)
+		atomic_inc(&blkg->blkcg->css.cgroup->congestion_count);
+
+	atomic64_set(&blkg->delay_nsec, delay);
+}
+
+/**
+ * blkcg_clear_delay - Disable allocator delay mechanism
+ * @blkg: target blkg
+ *
+ * Disable use_delay mechanism. See blkcg_set_delay().
+ */
 static inline void blkcg_clear_delay(struct blkcg_gq *blkg)
 {
 	int old = atomic_read(&blkg->use_delay);
-	if (!old)
-		return;
+
 	/* We only want 1 person clearing the congestion count for this blkg. */
-	while (old) {
-		int cur = atomic_cmpxchg(&blkg->use_delay, old, 0);
-		if (cur == old) {
-			atomic_dec(&blkg->blkcg->css.cgroup->congestion_count);
-			break;
-		}
-		old = cur;
-	}
+	if (old && atomic_cmpxchg(&blkg->use_delay, old, 0) == old)
+		atomic_dec(&blkg->blkcg->css.cgroup->congestion_count);
 }
 
 void blkcg_add_delay(struct blkcg_gq *blkg, u64 now, u64 delta);
-- 
2.25.1



* [PATCH 2/5] block: add request->io_data_len
  2020-04-08 20:14 [PATCHSET block/for-5.8] iocost: improve use_delay and latency target handling Tejun Heo
  2020-04-08 20:14 ` [PATCH 1/5] blk-iocost: switch to fixed non-auto-decaying use_delay Tejun Heo
@ 2020-04-08 20:14 ` Tejun Heo
  2020-04-09  1:44   ` Ming Lei
  2020-04-09  3:44   ` Bart Van Assche
  2020-04-08 20:14 ` [PATCH 3/5] blk-iocost: account for IO size when testing latencies Tejun Heo
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: Tejun Heo @ 2020-04-08 20:14 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, kernel-team, cgroups, newella, josef,
	Tejun Heo

Currently, at the time of completion, there's no way of knowing how big a
request was. blk-iocost will need this information to account for IO size when
calculating expected latencies.

This patch adds rq->io_data_len which remembers blk_rq_bytes() at the time the
request gets issued. The field is enabled iff CONFIG_BLK_RQ_IO_DATA_LEN is set
and doesn't increase the size of the struct even when enabled.
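The problem being solved can be sketched in a few lines of plain Python
(hypothetical chunk sizes; the __data_len deduction is modeled after
blk_update_request()):

```python
def final_completion_view(chunks):
    """Model a request completed in parts.

    __data_len starts at the full request size and is deducted on each
    partial completion, so by the time the request finishes it no longer
    reflects the original size. io_data_len, snapshotted at issue time,
    does.
    """
    data_len = sum(chunks)       # __data_len at issue
    io_data_len = data_len       # snapshot taken in blk_mq_start_request()
    for nbytes in chunks:
        data_len -= nbytes       # blk_update_request() deduction
    return data_len, io_data_len

# A 1 MiB request completed in four 256 KiB chunks:
remaining, issued = final_completion_view([256 * 1024] * 4)
```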

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/Kconfig          | 3 +++
 block/blk-mq.c         | 6 ++++++
 include/linux/blkdev.h | 8 ++++++++
 3 files changed, 17 insertions(+)

diff --git a/block/Kconfig b/block/Kconfig
index 3bc76bb113a0..48308e600dc8 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -26,6 +26,9 @@ menuconfig BLOCK
 
 if BLOCK
 
+config BLK_RQ_IO_DATA_LEN
+	bool
+
 config BLK_RQ_ALLOC_TIME
 	bool
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f6291ceedee4..64ed22712fe4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -415,6 +415,9 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
 		return ERR_PTR(-EWOULDBLOCK);
 
 	rq->__data_len = 0;
+#ifdef CONFIG_BLK_RQ_IO_DATA_LEN
+	rq->io_data_len = 0;
+#endif
 	rq->__sector = (sector_t) -1;
 	rq->bio = rq->biotail = NULL;
 	return rq;
@@ -655,6 +658,9 @@ void blk_mq_start_request(struct request *rq)
 
 	trace_block_rq_issue(q, rq);
 
+#ifdef CONFIG_BLK_RQ_IO_DATA_LEN
+	rq->io_data_len = blk_rq_bytes(rq);
+#endif
 	if (test_bit(QUEUE_FLAG_STATS, &q->queue_flags)) {
 		rq->io_start_time_ns = ktime_get_ns();
 		rq->stats_sectors = blk_rq_sectors(rq);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 32868fbedc9e..bfd34c6a27ef 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -142,6 +142,14 @@ struct request {
 
 	/* the following two fields are internal, NEVER access directly */
 	unsigned int __data_len;	/* total data len */
+#ifdef CONFIG_BLK_RQ_IO_DATA_LEN
+	/*
+	 * Total data len at the time of issue. This doesn't get deducted by
+	 * blk_update_request() and can be used by completion path to determine
+	 * the request size.
+	 */
+	unsigned int io_data_len;
+#endif
 	sector_t __sector;		/* sector cursor */
 
 	struct bio *bio;
-- 
2.25.1



* [PATCH 3/5] blk-iocost: account for IO size when testing latencies
  2020-04-08 20:14 [PATCHSET block/for-5.8] iocost: improve use_delay and latency target handling Tejun Heo
  2020-04-08 20:14 ` [PATCH 1/5] blk-iocost: switch to fixed non-auto-decaying use_delay Tejun Heo
  2020-04-08 20:14 ` [PATCH 2/5] block: add request->io_data_len Tejun Heo
@ 2020-04-08 20:14 ` Tejun Heo
  2020-04-08 20:14 ` [PATCH 4/5] iocost_monitor: exit successfully if interval is zero Tejun Heo
  2020-04-08 20:14 ` [PATCH 5/5] iocost_monitor: drop string wrap around numbers when outputting json Tejun Heo
  4 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2020-04-08 20:14 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, kernel-team, cgroups, newella, josef,
	Tejun Heo

On each IO completion, iocost decides whether the IO met or missed its latency
target. Currently, the targets are fixed numbers per IO type. While this can be
good enough for loose latency targets way higher than typical completion
latencies, the effect of IO size makes it difficult to tighten the latency
target - a target adequate for 4k IOs might be too tight for 512k IOs and
vice-versa.

iocost already has all the necessary information to account for different IO
sizes when testing whether the latency target is met as iocost can calculate the
size vtime cost of a given IO. This patch updates the completion path to
calculate the size vtime cost of the IO, deduct the nsec equivalent from the
observed latency and use the adjusted value to decide whether the target is met.

This makes latency targets independent of IO size and enables determining
adequate latency targets with fixed-size fio runs.
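As a rough illustration of the adjustment (plain Python; the per-page cost and
the 50us target are made-up example numbers, only VTIME_PER_NSEC and
IOC_PAGE_SHIFT follow the kernel constants):

```python
VTIME_PER_SEC = 1 << 37
VTIME_PER_NSEC = VTIME_PER_SEC // 1_000_000_000   # == 137
IOC_PAGE_SHIFT = 12                               # cost is per 4k page

def size_nsec(io_bytes, lcoef_page):
    """nsec equivalent of the size portion of the IO's vtime cost."""
    pages = max(io_bytes >> IOC_PAGE_SHIFT, 1)
    return (pages * lcoef_page) // VTIME_PER_NSEC

def target_met(on_q_ns, io_bytes, lcoef_page, target_ns):
    # mirrors the patched check in ioc_rqos_done()
    adj = size_nsec(io_bytes, lcoef_page)
    return on_q_ns <= adj or on_q_ns - adj <= target_ns

# Example: each 4k page costs 10us worth of vtime, target is 50us.
LCOEF_RPAGE = 10_000 * VTIME_PER_NSEC
```

With these numbers, a 512k read completing in 1.3ms misses a raw 50us target
but meets it once its ~1.28ms size cost is discounted, while a 4k read taking
the same 1.3ms still misses.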

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andy Newell <newella@fb.com>
---
 block/Kconfig      |  1 +
 block/blk-iocost.c | 31 +++++++++++++++++++++++++++++--
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/block/Kconfig b/block/Kconfig
index 48308e600dc8..3b0b698ca254 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -149,6 +149,7 @@ config BLK_CGROUP_IOLATENCY
 config BLK_CGROUP_IOCOST
 	bool "Enable support for cost model based cgroup IO controller"
 	depends on BLK_CGROUP=y
+	select BLK_RQ_IO_DATA_LEN
 	select BLK_RQ_ALLOC_TIME
 	---help---
 	Enabling this option enables the .weight interface for cost
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index a8e99ef76a08..76e8738f7bb5 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -260,6 +260,7 @@ enum {
 	VTIME_PER_SEC_SHIFT	= 37,
 	VTIME_PER_SEC		= 1LLU << VTIME_PER_SEC_SHIFT,
 	VTIME_PER_USEC		= VTIME_PER_SEC / USEC_PER_SEC,
+	VTIME_PER_NSEC		= VTIME_PER_SEC / NSEC_PER_SEC,
 
 	/* bound vrate adjustments within two orders of magnitude */
 	VRATE_MIN_PPM		= 10000,	/* 1% */
@@ -1668,6 +1669,30 @@ static u64 calc_vtime_cost(struct bio *bio, struct ioc_gq *iocg, bool is_merge)
 	return cost;
 }
 
+static void calc_size_vtime_cost_builtin(struct request *rq, struct ioc *ioc, u64 *costp)
+{
+	u64 pages = max_t(u64, rq->io_data_len >> IOC_PAGE_SHIFT, 1);
+
+	switch (req_op(rq)) {
+	case REQ_OP_READ:
+		*costp = pages * ioc->params.lcoefs[LCOEF_RPAGE];
+		break;
+	case REQ_OP_WRITE:
+		*costp = pages * ioc->params.lcoefs[LCOEF_WPAGE];
+		break;
+	default:
+		*costp = 0;
+	}
+}
+
+static u64 calc_size_vtime_cost(struct request *rq, struct ioc *ioc)
+{
+	u64 cost;
+
+	calc_size_vtime_cost_builtin(rq, ioc, &cost);
+	return cost;
+}
+
 static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio)
 {
 	struct blkcg_gq *blkg = bio->bi_blkg;
@@ -1837,7 +1862,7 @@ static void ioc_rqos_done_bio(struct rq_qos *rqos, struct bio *bio)
 static void ioc_rqos_done(struct rq_qos *rqos, struct request *rq)
 {
 	struct ioc *ioc = rqos_to_ioc(rqos);
-	u64 on_q_ns, rq_wait_ns;
+	u64 on_q_ns, rq_wait_ns, size_nsec;
 	int pidx, rw;
 
 	if (!ioc->enabled || !rq->alloc_time_ns || !rq->start_time_ns)
@@ -1858,8 +1883,10 @@ static void ioc_rqos_done(struct rq_qos *rqos, struct request *rq)
 
 	on_q_ns = ktime_get_ns() - rq->alloc_time_ns;
 	rq_wait_ns = rq->start_time_ns - rq->alloc_time_ns;
+	size_nsec = div64_u64(calc_size_vtime_cost(rq, ioc), VTIME_PER_NSEC);
 
-	if (on_q_ns <= ioc->params.qos[pidx] * NSEC_PER_USEC)
+	if (on_q_ns <= size_nsec ||
+	    on_q_ns - size_nsec <= ioc->params.qos[pidx] * NSEC_PER_USEC)
 		this_cpu_inc(ioc->pcpu_stat->missed[rw].nr_met);
 	else
 		this_cpu_inc(ioc->pcpu_stat->missed[rw].nr_missed);
-- 
2.25.1



* [PATCH 4/5] iocost_monitor: exit successfully if interval is zero
  2020-04-08 20:14 [PATCHSET block/for-5.8] iocost: improve use_delay and latency target handling Tejun Heo
                   ` (2 preceding siblings ...)
  2020-04-08 20:14 ` [PATCH 3/5] blk-iocost: account for IO size when testing latencies Tejun Heo
@ 2020-04-08 20:14 ` Tejun Heo
  2020-04-08 20:14 ` [PATCH 5/5] iocost_monitor: drop string wrap around numbers when outputting json Tejun Heo
  4 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2020-04-08 20:14 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, kernel-team, cgroups, newella, josef,
	Tejun Heo

This helps external tools decide whether iocost_monitor has all its
requirements met based on the exit status of an -i0 run.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 tools/cgroup/iocost_monitor.py | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/cgroup/iocost_monitor.py b/tools/cgroup/iocost_monitor.py
index 7427a5ee761b..eb2363b868c5 100644
--- a/tools/cgroup/iocost_monitor.py
+++ b/tools/cgroup/iocost_monitor.py
@@ -28,7 +28,8 @@ parser.add_argument('devname', metavar='DEV',
 parser.add_argument('--cgroup', action='append', metavar='REGEX',
                     help='Regex for target cgroups, ')
 parser.add_argument('--interval', '-i', metavar='SECONDS', type=float, default=1,
-                    help='Monitoring interval in seconds')
+                    help='Monitoring interval in seconds (0 exits immediately '
+                    'after checking requirements)')
 parser.add_argument('--json', action='store_true',
                     help='Output in json')
 args = parser.parse_args()
@@ -243,6 +244,9 @@ ioc = None
 if ioc is None:
     err(f'Could not find ioc for {devname}');
 
+if interval == 0:
+    sys.exit(0)
+
 # Keep printing
 while True:
     now = time.time()
-- 
2.25.1



* [PATCH 5/5] iocost_monitor: drop string wrap around numbers when outputting json
  2020-04-08 20:14 [PATCHSET block/for-5.8] iocost: improve use_delay and latency target handling Tejun Heo
                   ` (3 preceding siblings ...)
  2020-04-08 20:14 ` [PATCH 4/5] iocost_monitor: exit successfully if interval is zero Tejun Heo
@ 2020-04-08 20:14 ` Tejun Heo
  4 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2020-04-08 20:14 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, kernel-team, cgroups, newella, josef,
	Tejun Heo

Wrapping numbers in strings is used by some to work around bit-width issues in
some environments. The problem isn't innate to json and the workaround seems to
cause more integration problems than it helps. Let's drop the string wrapping.
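The difference for consumers, in plain Python (field names taken from the
monitor's output, values are made-up examples):

```python
import json

# Old style: every value wrapped in a string.
old = json.loads(json.dumps({'busy_level': str(-2), 'vrate_pct': str(135.00)}))
# New style: native json numbers.
new = json.loads(json.dumps({'busy_level': -2, 'vrate_pct': 135.00}))

# String-wrapped values need a second conversion step on the consumer
# side and break numeric comparisons; native numbers parse directly.
wrapped = old['busy_level']        # '-2', a str
native = new['busy_level']         # -2, an int
```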

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 tools/cgroup/iocost_monitor.py | 42 +++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/tools/cgroup/iocost_monitor.py b/tools/cgroup/iocost_monitor.py
index eb2363b868c5..188b3379b9a1 100644
--- a/tools/cgroup/iocost_monitor.py
+++ b/tools/cgroup/iocost_monitor.py
@@ -113,14 +113,14 @@ autop_names = {
 
     def dict(self, now):
         return { 'device'               : devname,
-                 'timestamp'            : str(now),
-                 'enabled'              : str(int(self.enabled)),
-                 'running'              : str(int(self.running)),
-                 'period_ms'            : str(self.period_ms),
-                 'period_at'            : str(self.period_at),
-                 'period_vtime_at'      : str(self.vperiod_at),
-                 'busy_level'           : str(self.busy_level),
-                 'vrate_pct'            : str(self.vrate_pct), }
+                 'timestamp'            : now,
+                 'enabled'              : self.enabled,
+                 'running'              : self.running,
+                 'period_ms'            : self.period_ms,
+                 'period_at'            : self.period_at,
+                 'period_vtime_at'      : self.vperiod_at,
+                 'busy_level'           : self.busy_level,
+                 'vrate_pct'            : self.vrate_pct, }
 
     def table_preamble_str(self):
         state = ('RUN' if self.running else 'IDLE') if self.enabled else 'OFF'
@@ -175,19 +175,19 @@ autop_names = {
 
     def dict(self, now, path):
         out = { 'cgroup'                : path,
-                'timestamp'             : str(now),
-                'is_active'             : str(int(self.is_active)),
-                'weight'                : str(self.weight),
-                'weight_active'         : str(self.active),
-                'weight_inuse'          : str(self.inuse),
-                'hweight_active_pct'    : str(self.hwa_pct),
-                'hweight_inuse_pct'     : str(self.hwi_pct),
-                'inflight_pct'          : str(self.inflight_pct),
-                'debt_ms'               : str(self.debt_ms),
-                'use_delay'             : str(self.use_delay),
-                'delay_ms'              : str(self.delay_ms),
-                'usage_pct'             : str(self.usage),
-                'address'               : str(hex(self.address)) }
+                'timestamp'             : now,
+                'is_active'             : self.is_active,
+                'weight'                : self.weight,
+                'weight_active'         : self.active,
+                'weight_inuse'          : self.inuse,
+                'hweight_active_pct'    : self.hwa_pct,
+                'hweight_inuse_pct'     : self.hwi_pct,
+                'inflight_pct'          : self.inflight_pct,
+                'debt_ms'               : self.debt_ms,
+                'use_delay'             : self.use_delay,
+                'delay_ms'              : self.delay_ms,
+                'usage_pct'             : self.usage,
+                'address'               : self.address }
         for i in range(len(self.usages)):
             out[f'usage_pct_{i}'] = str(self.usages[i])
         return out
-- 
2.25.1



* Re: [PATCH 2/5] block: add request->io_data_len
  2020-04-08 20:14 ` [PATCH 2/5] block: add request->io_data_len Tejun Heo
@ 2020-04-09  1:44   ` Ming Lei
  2020-04-09  2:11     ` Tejun Heo
  2020-04-09  3:44   ` Bart Van Assche
  1 sibling, 1 reply; 14+ messages in thread
From: Ming Lei @ 2020-04-09  1:44 UTC (permalink / raw)
  To: Tejun Heo
  Cc: axboe, linux-block, linux-kernel, kernel-team, cgroups, newella, josef

Hi Tejun,

On Wed, Apr 08, 2020 at 04:14:47PM -0400, Tejun Heo wrote:
> Currently, at the time of completion, there's no way of knowing how big a
> request was. blk-iocost will need this information to account for IO size when
> calculating expected latencies.
> 
> This patch adds rq->io_data_len which remembers blk_rq_bytes() at the time the
> request gets issued. The field is enabled iff CONFIG_BLK_RQ_IO_DATA_LEN is set
> and doesn't increase the size of the struct even when enabled.

Almost all __blk_mq_end_request() calls follow blk_update_request(), so the
completed bytes can be passed to __blk_mq_end_request(); then we can
avoid introducing this field.

Also, there are just 20 callers of __blk_mq_end_request(), so this kind
of change shouldn't be too big.


Thanks, 
Ming



* Re: [PATCH 2/5] block: add request->io_data_len
  2020-04-09  1:44   ` Ming Lei
@ 2020-04-09  2:11     ` Tejun Heo
  2020-04-09  2:38       ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2020-04-09  2:11 UTC (permalink / raw)
  To: Ming Lei
  Cc: axboe, linux-block, linux-kernel, kernel-team, cgroups, newella, josef

On Thu, Apr 09, 2020 at 09:44:06AM +0800, Ming Lei wrote:
> Almost all __blk_mq_end_request() calls follow blk_update_request(), so the
> completed bytes can be passed to __blk_mq_end_request(); then we can
> avoid introducing this field.

But on some drivers blk_update_request() may be called multiple times before
__blk_mq_end_request() is called and what's needed here is the total number of
bytes in the whole request, not just in the final completion.

> Also, there are just 20 callers of __blk_mq_end_request(), so this kind
> of change shouldn't be too big.

This would work iff we get rid of partial completions and if we get rid of
partial completions, we might as well stop exposing blk_update_request() and
__blk_mq_end_request().

Thanks.

-- 
tejun


* Re: [PATCH 2/5] block: add request->io_data_len
  2020-04-09  2:11     ` Tejun Heo
@ 2020-04-09  2:38       ` Ming Lei
  2020-04-09  5:08         ` Pavel Begunkov
  2020-04-13 13:56         ` Tejun Heo
  0 siblings, 2 replies; 14+ messages in thread
From: Ming Lei @ 2020-04-09  2:38 UTC (permalink / raw)
  To: Tejun Heo
  Cc: axboe, linux-block, linux-kernel, kernel-team, cgroups, newella, josef

On Wed, Apr 08, 2020 at 10:11:19PM -0400, Tejun Heo wrote:
> On Thu, Apr 09, 2020 at 09:44:06AM +0800, Ming Lei wrote:
> > Almost all __blk_mq_end_request() calls follow blk_update_request(), so the
> > completed bytes can be passed to __blk_mq_end_request(); then we can
> > avoid introducing this field.
> 
> But on some drivers blk_update_request() may be called multiple times before
> __blk_mq_end_request() is called and what's needed here is the total number of
> bytes in the whole request, not just in the final completion.

OK.

Another choice might be to record the request bytes in the rq's payload
when calling .queue_rq(), only for those drivers.

> 
> > Also, there are just 20 callers of __blk_mq_end_request(), so this kind
> > of change shouldn't be too big.
> 
> This would work iff we get rid of partial completions and if we get rid of
> partial completions, we might as well stop exposing blk_update_request() and
> __blk_mq_end_request().

Indeed, we can store the completed bytes in the request payload, so it looks
like killing partial completion shouldn't be too hard.

Thanks,
Ming



* Re: [PATCH 2/5] block: add request->io_data_len
  2020-04-08 20:14 ` [PATCH 2/5] block: add request->io_data_len Tejun Heo
  2020-04-09  1:44   ` Ming Lei
@ 2020-04-09  3:44   ` Bart Van Assche
  2020-04-13 13:52     ` Tejun Heo
  1 sibling, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2020-04-09  3:44 UTC (permalink / raw)
  To: Tejun Heo, axboe
  Cc: linux-block, linux-kernel, kernel-team, cgroups, newella, josef

On 2020-04-08 13:14, Tejun Heo wrote:
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 32868fbedc9e..bfd34c6a27ef 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -142,6 +142,14 @@ struct request {
>  
>  	/* the following two fields are internal, NEVER access directly */
>  	unsigned int __data_len;	/* total data len */
> +#ifdef CONFIG_BLK_RQ_IO_DATA_LEN
> +	/*
> +	 * Total data len at the time of issue. This doesn't get deducted by
> +	 * blk_update_request() and can be used by completion path to determine
> +	 * the request size.
> +	 */
> +	unsigned int io_data_len;
> +#endif
>  	sector_t __sector;		/* sector cursor */
>  
>  	struct bio *bio;

So we have one struct member with the description "total data len" and
another struct member with the description "total data len at the time
of issue"? How could one not get confused by these descriptions?

This change makes the comment above __data_len incorrect. Please update
that comment or move io_data_len in front of that comment.

How does this change interact with the code in drivers/scsi/sd.c that
manipulates __data_len directly?

Thanks,

Bart.




* Re: [PATCH 2/5] block: add request->io_data_len
  2020-04-09  2:38       ` Ming Lei
@ 2020-04-09  5:08         ` Pavel Begunkov
  2020-04-13 14:02           ` Tejun Heo
  2020-04-13 13:56         ` Tejun Heo
  1 sibling, 1 reply; 14+ messages in thread
From: Pavel Begunkov @ 2020-04-09  5:08 UTC (permalink / raw)
  To: Ming Lei, Tejun Heo
  Cc: axboe, linux-block, linux-kernel, kernel-team, cgroups, newella, josef

On 09/04/2020 05:38, Ming Lei wrote:
> On Wed, Apr 08, 2020 at 10:11:19PM -0400, Tejun Heo wrote:
>> On Thu, Apr 09, 2020 at 09:44:06AM +0800, Ming Lei wrote:
>>> Almost all __blk_mq_end_request() calls follow blk_update_request(), so the
>>> completed bytes can be passed to __blk_mq_end_request(); then we can
>>> avoid introducing this field.
>>
>> But on some drivers blk_update_request() may be called multiple times before
>> __blk_mq_end_request() is called and what's needed here is the total number of
>> bytes in the whole request, not just in the final completion.
> 
> OK.
> 
> Another choice might be to record request bytes in rq's payload
> when calling .queue_rq() only for these drivers.
> 
>>
>>> Also, there are just 20 callers of __blk_mq_end_request(), so this kind
>>> of change shouldn't be too big.
>>
>> This would work iff we get rid of partial completions and if we get rid of
>> partial completions, we might as well stop exposing blk_update_request() and
>> __blk_mq_end_request().
> 
> Indeed, we can store the completed bytes in request payload, so looks killing
> partial completion shouldn't be too hard.

struct request already has such a field (see @stats_sectors) because of the
same root cause. I'd prefer killing it as well by following Ming's way, but
otherwise it could easily be adopted.

-- 
Pavel Begunkov


* Re: [PATCH 2/5] block: add request->io_data_len
  2020-04-09  3:44   ` Bart Van Assche
@ 2020-04-13 13:52     ` Tejun Heo
  0 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2020-04-13 13:52 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe, linux-block, linux-kernel, kernel-team, cgroups, newella, josef

Hello,

On Wed, Apr 08, 2020 at 08:44:17PM -0700, Bart Van Assche wrote:
> On 2020-04-08 13:14, Tejun Heo wrote:
> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index 32868fbedc9e..bfd34c6a27ef 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -142,6 +142,14 @@ struct request {
> >  
> >  	/* the following two fields are internal, NEVER access directly */
> >  	unsigned int __data_len;	/* total data len */
> > +#ifdef CONFIG_BLK_RQ_IO_DATA_LEN
> > +	/*
> > +	 * Total data len at the time of issue. This doesn't get deducted by
> > +	 * blk_update_request() and can be used by completion path to determine
> > +	 * the request size.
> > +	 */
> > +	unsigned int io_data_len;
> > +#endif
> >  	sector_t __sector;		/* sector cursor */
> >  
> >  	struct bio *bio;
> 
> So we have one struct member with the description "total data len" and
> another struct member with the description "total data len at the time
> of issue"? How could one not get confused by these descriptions?

The new one explicitly says it doesn't get deducted by blk_update_request().

> This change makes the comment above __data_len incorrect. Please update
> that comment or move io_data_len in front of that comment.

Sure.

> How does this change interact with the code in drivers/scsi/sd.c that
> manipulates __data_len directly?

It doesn't.

Thanks.

-- 
tejun


* Re: [PATCH 2/5] block: add request->io_data_len
  2020-04-09  2:38       ` Ming Lei
  2020-04-09  5:08         ` Pavel Begunkov
@ 2020-04-13 13:56         ` Tejun Heo
  1 sibling, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2020-04-13 13:56 UTC (permalink / raw)
  To: Ming Lei
  Cc: axboe, linux-block, linux-kernel, kernel-team, cgroups, newella, josef

Hello,

On Thu, Apr 09, 2020 at 10:38:57AM +0800, Ming Lei wrote:
> On Wed, Apr 08, 2020 at 10:11:19PM -0400, Tejun Heo wrote:
> > On Thu, Apr 09, 2020 at 09:44:06AM +0800, Ming Lei wrote:
> > > Almost all __blk_mq_end_request() calls follow blk_update_request(), so the
> > > completed bytes can be passed to __blk_mq_end_request() and we can
> > > avoid introducing this field.
> > 
> > But on some drivers blk_update_request() may be called multiple times before
> > __blk_mq_end_request() is called and what's needed here is the total number of
> > bytes in the whole request, not just in the final completion.
> 
> OK.
> 
> Another choice might be to record the request bytes in rq's payload
> when calling .queue_rq(), only for these drivers.

There sure are multiple ways to skin a cat.

> > > Also, there are just 20 callers of __blk_mq_end_request(), so this kind
> > > of change shouldn't be too big.
> > 
> > This would work iff we get rid of partial completions and if we get rid of
> > partial completions, we might as well stop exposing blk_update_request() and
> > __blk_mq_end_request().
> 
> Indeed, we can store the completed bytes in the request payload, so it looks
> like killing partial completions shouldn't be too hard.

There's a reason why we've had partial completions. On slower IO devices, like
floppies, partial completions are actually advantageous. I'm not arguing this
still holds up as a valid justification, but getting rid of partial completions
isn't just a decision about plumbing details either.
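
Ming's alternative quoted above could be sketched roughly as follows (a toy
model with hypothetical names, not actual kernel code): the driver accumulates
completed bytes in its per-request payload and signals a single end-of-request
completion only once everything is done, removing the intermediate partial
completions:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-driver payload that accumulates completed bytes
 * instead of reporting each chunk to the block layer separately. */
struct toy_payload {
	unsigned int total_bytes;     /* recorded at .queue_rq() time */
	unsigned int completed_bytes; /* accumulated across chunk completions */
};

void toy_queue_rq(struct toy_payload *pl, unsigned int bytes)
{
	pl->total_bytes = bytes;
	pl->completed_bytes = 0;
}

/* Called once per completed chunk; returns true when the whole request
 * is done and a single final completion should be issued. */
bool toy_complete_chunk(struct toy_payload *pl, unsigned int done)
{
	pl->completed_bytes += done;
	return pl->completed_bytes == pl->total_bytes;
}
```

The tradeoff being debated is exactly this: the accounting becomes trivial, but
upper layers no longer see progress until the last chunk lands, which is what
partial completions buy on slow devices.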

Thanks.

-- 
tejun


* Re: [PATCH 2/5] block: add request->io_data_len
  2020-04-09  5:08         ` Pavel Begunkov
@ 2020-04-13 14:02           ` Tejun Heo
  0 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2020-04-13 14:02 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Ming Lei, axboe, linux-block, linux-kernel, kernel-team, cgroups,
	newella, josef

Hello,

On Thu, Apr 09, 2020 at 08:08:59AM +0300, Pavel Begunkov wrote:
> struct request already has such a field (see @stats_sectors) because of the same
> root cause. I'd prefer killing it as well by following Ming's approach, but
> otherwise it could easily be adopted here.

Oh, I completely missed that field. Let me just use that one instead. Please
disregard this patch. As for killing partial completions, while I'm not
necessarily against it, it isn't a no-brainer either.

Thanks.

-- 
tejun


Thread overview:
2020-04-08 20:14 [PATCHSET block/for-5.8] iocost: improve use_delay and latency target handling Tejun Heo
2020-04-08 20:14 ` [PATCH 1/5] blk-iocost: switch to fixed non-auto-decaying use_delay Tejun Heo
2020-04-08 20:14 ` [PATCH 2/5] block: add request->io_data_len Tejun Heo
2020-04-09  1:44   ` Ming Lei
2020-04-09  2:11     ` Tejun Heo
2020-04-09  2:38       ` Ming Lei
2020-04-09  5:08         ` Pavel Begunkov
2020-04-13 14:02           ` Tejun Heo
2020-04-13 13:56         ` Tejun Heo
2020-04-09  3:44   ` Bart Van Assche
2020-04-13 13:52     ` Tejun Heo
2020-04-08 20:14 ` [PATCH 3/5] blk-iocost: account for IO size when testing latencies Tejun Heo
2020-04-08 20:14 ` [PATCH 4/5] iocost_monitor: exit successfully if interval is zero Tejun Heo
2020-04-08 20:14 ` [PATCH 5/5] iocost_monitor: drop string wrap around numbers when outputting json Tejun Heo
