* [PATCH 0/5][v2] blk-iolatency: Fixes and tweak the miss algo for ssds
@ 2018-09-28 17:45 Josef Bacik
  2018-09-28 17:45 ` [PATCH 1/5] blk-iolatency: use q->nr_requests directly Josef Bacik
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Josef Bacik @ 2018-09-28 17:45 UTC (permalink / raw)
  To: axboe, linux-block, kernel-team

v1->v2:
- rebased onto a recent for-4.20/block branch
- dropped the changed variable cleanup.

-- Original message --

Testing on ssds with the current iolatency code wasn't working as well
as expected.  This is mostly because ssds don't behave like rotational
drives: they are spikier, which means that using the average latency
for IO wasn't very responsive until the drive was extremely overtaxed.
To deal with this I've reworked iolatency to use a p(90) based
approach for ssd latencies.  I originally intended to use this
approach for both ssds and rotational drives, but p(90) was too high a
bar to use; by the time we were exceeding p(90), things were already
pretty bad.  So to keep things simple, use p(90) just for ssds, since
their latency targets tend to be orders of magnitude lower than those
of rotational drives, and keep the average latency calculations for
rotational drives.

This testing also showed a few issues with blk-iolatency, so the
preceding patches all fix issues we saw in testing.  Using
q->nr_requests instead of blk_queue_depth() is probably the most
subtle and important change.  We want to limit IOs based on the number
of outstanding requests we can have in the block layer, not
necessarily how many we can have going to the device, so make this
explicit by using nr_requests directly.  These patches have been in
production for a week on both our rotational and ssd tiers and
everything is going smoothly.  Thanks,

Josef


* [PATCH 1/5] blk-iolatency: use q->nr_requests directly
  2018-09-28 17:45 [PATCH 0/5][v2] blk-iolatency: Fixes and tweak the miss algo for ssds Josef Bacik
@ 2018-09-28 17:45 ` Josef Bacik
  2018-09-28 17:45 ` [PATCH 2/5] blk-iolatency: deal with nr_requests == 1 Josef Bacik
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Josef Bacik @ 2018-09-28 17:45 UTC (permalink / raw)
  To: axboe, linux-block, kernel-team

We were using blk_queue_depth() assuming that it would return
nr_requests, but we hit a case in production on drives that had to
have NCQ turned off in order for them to not shit the bed, which
resulted in a queue depth of 1 even though nr_requests was much
larger.  iolatency really only cares about the requests we are allowed
to queue up, as any IO that gets onto the request list is going to be
serviced soonish, so we want to be throttling before the bio gets onto
the request list.  To make iolatency work as expected, simply use
q->nr_requests instead of blk_queue_depth(), as that is what we
actually care about.
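
For reference, blk_queue_depth() prefers the device's queue depth when
one has been set and only falls back to nr_requests otherwise; roughly
(paraphrased from include/linux/blkdev.h of this era, so double-check
the exact definition in your tree):

	static inline unsigned int blk_queue_depth(struct request_queue *q)
	{
		if (q->queue_depth)
			return q->queue_depth;	/* device limit, e.g. NCQ depth */
		return q->nr_requests;		/* block layer request limit */
	}

With NCQ disabled the device depth is 1, so the fallback to
nr_requests never happened on these drives.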

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 block/blk-iolatency.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 27c14f8d2576..c2e38bc12f27 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -255,7 +255,7 @@ static void scale_cookie_change(struct blk_iolatency *blkiolat,
 				struct child_latency_info *lat_info,
 				bool up)
 {
-	unsigned long qd = blk_queue_depth(blkiolat->rqos.q);
+	unsigned long qd = blkiolat->rqos.q->nr_requests;
 	unsigned long scale = scale_amount(qd, up);
 	unsigned long old = atomic_read(&lat_info->scale_cookie);
 	unsigned long max_scale = qd << 1;
@@ -295,7 +295,7 @@ static void scale_cookie_change(struct blk_iolatency *blkiolat,
  */
 static void scale_change(struct iolatency_grp *iolat, bool up)
 {
-	unsigned long qd = blk_queue_depth(iolat->blkiolat->rqos.q);
+	unsigned long qd = iolat->blkiolat->rqos.q->nr_requests;
 	unsigned long scale = scale_amount(qd, up);
 	unsigned long old = iolat->rq_depth.max_depth;
 
@@ -857,7 +857,7 @@ static void iolatency_pd_init(struct blkg_policy_data *pd)
 
 	rq_wait_init(&iolat->rq_wait);
 	spin_lock_init(&iolat->child_lat.lock);
-	iolat->rq_depth.queue_depth = blk_queue_depth(blkg->q);
+	iolat->rq_depth.queue_depth = blkg->q->nr_requests;
 	iolat->rq_depth.max_depth = UINT_MAX;
 	iolat->rq_depth.default_depth = iolat->rq_depth.queue_depth;
 	iolat->blkiolat = blkiolat;
-- 
2.14.3


* [PATCH 2/5] blk-iolatency: deal with nr_requests == 1
  2018-09-28 17:45 [PATCH 0/5][v2] blk-iolatency: Fixes and tweak the miss algo for ssds Josef Bacik
  2018-09-28 17:45 ` [PATCH 1/5] blk-iolatency: use q->nr_requests directly Josef Bacik
@ 2018-09-28 17:45 ` Josef Bacik
  2018-09-28 17:45 ` [PATCH 3/5] blk-iolatency: deal with small samples Josef Bacik
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Josef Bacik @ 2018-09-28 17:45 UTC (permalink / raw)
  To: axboe, linux-block, kernel-team

Hitting the case where blk_queue_depth() returned 1 uncovered the fact
that iolatency doesn't actually handle this case properly: it simply
doesn't scale anybody down.  For this case we should go straight to
applying the time delay, which we weren't doing.  Since we already
limit the floor to 1 request, this if statement is not needed, and
removing it allows us to set our depth to 1, which in turn allows us
to apply the delay if needed.
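
To see why the guard is redundant, trace the scale-down path for
nr_requests == 1 (note that earlier in scale_change() the local old
has already been clamped to the queue depth; check the full function
in the tree):

	old = 1;			/* clamped to qd == 1 */
	old >>= 1;			/* now 0 */
	max_depth = max(old, 1UL);	/* floored back to 1 */

With the "old > 1" guard in place this assignment was skipped
entirely, so max_depth could stay at its initial unlimited value and
the delay logic never saw a depth of 1.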

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 block/blk-iolatency.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index c2e38bc12f27..8daea7a4fe49 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -312,7 +312,7 @@ static void scale_change(struct iolatency_grp *iolat, bool up)
 			iolat->rq_depth.max_depth = old;
 			wake_up_all(&iolat->rq_wait.wait);
 		}
-	} else if (old > 1) {
+	} else {
 		old >>= 1;
 		iolat->rq_depth.max_depth = max(old, 1UL);
 	}
-- 
2.14.3


* [PATCH 3/5] blk-iolatency: deal with small samples
  2018-09-28 17:45 [PATCH 0/5][v2] blk-iolatency: Fixes and tweak the miss algo for ssds Josef Bacik
  2018-09-28 17:45 ` [PATCH 1/5] blk-iolatency: use q->nr_requests directly Josef Bacik
  2018-09-28 17:45 ` [PATCH 2/5] blk-iolatency: deal with nr_requests == 1 Josef Bacik
@ 2018-09-28 17:45 ` Josef Bacik
  2018-09-28 17:45 ` [PATCH 4/5] blk-iolatency: use a percentile approach for ssds Josef Bacik
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Josef Bacik @ 2018-09-28 17:45 UTC (permalink / raw)
  To: axboe, linux-block, kernel-team

There is logic to keep cgroups that haven't done a lot of IO in the
most recent scale window from being punished for over-active higher
priority groups.  However, for things like ssds, where the windows are
pretty short, we'll end up with small numbers of samples, so 5% of the
samples comes out to 0 if there aren't enough.  For example, 15
samples gives 15 * 5 / 100 == 0 with integer division, so every group
clears the threshold.  Make the floor 1 sample to keep us from
improperly bailing out of scaling down.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 block/blk-iolatency.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 8daea7a4fe49..e7be77b0ce8b 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -366,7 +366,7 @@ static void check_scale_change(struct iolatency_grp *iolat)
 		 * scale down event.
 		 */
 		samples_thresh = lat_info->nr_samples * 5;
-		samples_thresh = div64_u64(samples_thresh, 100);
+		samples_thresh = max(1ULL, div64_u64(samples_thresh, 100));
 		if (iolat->nr_samples <= samples_thresh)
 			return;
 	}
-- 
2.14.3


* [PATCH 4/5] blk-iolatency: use a percentile approach for ssds
  2018-09-28 17:45 [PATCH 0/5][v2] blk-iolatency: Fixes and tweak the miss algo for ssds Josef Bacik
                   ` (2 preceding siblings ...)
  2018-09-28 17:45 ` [PATCH 3/5] blk-iolatency: deal with small samples Josef Bacik
@ 2018-09-28 17:45 ` Josef Bacik
  2018-09-28 17:45 ` [PATCH 5/5] blk-iolatency: keep track of previous windows' stats Josef Bacik
  2018-09-28 17:48 ` [PATCH 0/5][v2] blk-iolatency: Fixes and tweak the miss algo for ssds Jens Axboe
  5 siblings, 0 replies; 7+ messages in thread
From: Josef Bacik @ 2018-09-28 17:45 UTC (permalink / raw)
  To: axboe, linux-block, kernel-team

We use an average latency approach for determining whether we're
missing our latency target.  This works well for rotational storage,
where latencies are generally consistent, but ssds and other
low-latency devices behave in a spikier fashion, which means we often
won't throttle misbehaving groups because a lot of IO completes at
drastically faster times than our latency target.  Instead, keep track
of how many IOs miss our target and how many IOs are done in our time
window.  If the p(90) latency is above our target then we know we need
to throttle.  With this change in place we see the same throttling
behavior with our testcase on ssds as we see with rotational drives.
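
As a plain C restatement of the check (this just illustrates the
latency_sum_ok() logic added below, it is not extra patch code):

	#include <stdbool.h>
	#include <stdint.h>

	/*
	 * Fewer than 10% of the IOs missing the target means roughly
	 * 90% of them met it, i.e. p(90) <= target.  E.g. total = 100
	 * allows up to 9 misses; the floor of 1 keeps a tiny window
	 * (total < 10) from rounding the threshold down to 0.
	 */
	bool window_meets_target(uint64_t total, uint64_t missed)
	{
		uint64_t thresh = total / 10;

		if (thresh == 0)
			thresh = 1;
		return missed < thresh;
	}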

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 block/blk-iolatency.c | 179 ++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 145 insertions(+), 34 deletions(-)

diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index e7be77b0ce8b..fd246805b0be 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -115,9 +115,21 @@ struct child_latency_info {
 	atomic_t scale_cookie;
 };
 
+struct percentile_stats {
+	u64 total;
+	u64 missed;
+};
+
+struct latency_stat {
+	union {
+		struct percentile_stats ps;
+		struct blk_rq_stat rqs;
+	};
+};
+
 struct iolatency_grp {
 	struct blkg_policy_data pd;
-	struct blk_rq_stat __percpu *stats;
+	struct latency_stat __percpu *stats;
 	struct blk_iolatency *blkiolat;
 	struct rq_depth rq_depth;
 	struct rq_wait rq_wait;
@@ -132,6 +144,7 @@ struct iolatency_grp {
 	/* Our current number of IO's for the last summation. */
 	u64 nr_samples;
 
+	bool ssd;
 	struct child_latency_info child_lat;
 };
 
@@ -172,6 +185,80 @@ static inline struct blkcg_gq *lat_to_blkg(struct iolatency_grp *iolat)
 	return pd_to_blkg(&iolat->pd);
 }
 
+static inline void latency_stat_init(struct iolatency_grp *iolat,
+				     struct latency_stat *stat)
+{
+	if (iolat->ssd) {
+		stat->ps.total = 0;
+		stat->ps.missed = 0;
+	} else
+		blk_rq_stat_init(&stat->rqs);
+}
+
+static inline void latency_stat_sum(struct iolatency_grp *iolat,
+				    struct latency_stat *sum,
+				    struct latency_stat *stat)
+{
+	if (iolat->ssd) {
+		sum->ps.total += stat->ps.total;
+		sum->ps.missed += stat->ps.missed;
+	} else
+		blk_rq_stat_sum(&sum->rqs, &stat->rqs);
+}
+
+static inline void latency_stat_record_time(struct iolatency_grp *iolat,
+					    u64 req_time)
+{
+	struct latency_stat *stat = get_cpu_ptr(iolat->stats);
+	if (iolat->ssd) {
+		if (req_time >= iolat->min_lat_nsec)
+			stat->ps.missed++;
+		stat->ps.total++;
+	} else
+		blk_rq_stat_add(&stat->rqs, req_time);
+	put_cpu_ptr(stat);
+}
+
+static inline bool latency_sum_ok(struct iolatency_grp *iolat,
+				  struct latency_stat *stat)
+{
+	if (iolat->ssd) {
+		u64 thresh = div64_u64(stat->ps.total, 10);
+		thresh = max(thresh, 1ULL);
+		return stat->ps.missed < thresh;
+	}
+	return stat->rqs.mean <= iolat->min_lat_nsec;
+}
+
+static inline u64 latency_stat_samples(struct iolatency_grp *iolat,
+				       struct latency_stat *stat)
+{
+	if (iolat->ssd)
+		return stat->ps.total;
+	return stat->rqs.nr_samples;
+}
+
+static inline void iolat_update_total_lat_avg(struct iolatency_grp *iolat,
+					      struct latency_stat *stat)
+{
+	int exp_idx;
+
+	if (iolat->ssd)
+		return;
+
+	/*
+	 * CALC_LOAD takes in a number stored in fixed point representation.
+	 * Because we are using this for IO time in ns, the values stored
+	 * are significantly larger than the FIXED_1 denominator (2048).
+	 * Therefore, rounding errors in the calculation are negligible and
+	 * can be ignored.
+	 */
+	exp_idx = min_t(int, BLKIOLATENCY_NR_EXP_FACTORS - 1,
+			div64_u64(iolat->cur_win_nsec,
+				  BLKIOLATENCY_EXP_BUCKET_SIZE));
+	CALC_LOAD(iolat->lat_avg, iolatency_exp_factors[exp_idx], stat->rqs.mean);
+}
+
 static inline bool iolatency_may_queue(struct iolatency_grp *iolat,
 				       wait_queue_entry_t *wait,
 				       bool first_block)
@@ -418,7 +505,6 @@ static void iolatency_record_time(struct iolatency_grp *iolat,
 				  struct bio_issue *issue, u64 now,
 				  bool issue_as_root)
 {
-	struct blk_rq_stat *rq_stat;
 	u64 start = bio_issue_time(issue);
 	u64 req_time;
 
@@ -444,9 +530,7 @@ static void iolatency_record_time(struct iolatency_grp *iolat,
 		return;
 	}
 
-	rq_stat = get_cpu_ptr(iolat->stats);
-	blk_rq_stat_add(rq_stat, req_time);
-	put_cpu_ptr(rq_stat);
+	latency_stat_record_time(iolat, req_time);
 }
 
 #define BLKIOLATENCY_MIN_ADJUST_TIME (500 * NSEC_PER_MSEC)
@@ -457,17 +541,17 @@ static void iolatency_check_latencies(struct iolatency_grp *iolat, u64 now)
 	struct blkcg_gq *blkg = lat_to_blkg(iolat);
 	struct iolatency_grp *parent;
 	struct child_latency_info *lat_info;
-	struct blk_rq_stat stat;
+	struct latency_stat stat;
 	unsigned long flags;
-	int cpu, exp_idx;
+	int cpu;
 
-	blk_rq_stat_init(&stat);
+	latency_stat_init(iolat, &stat);
 	preempt_disable();
 	for_each_online_cpu(cpu) {
-		struct blk_rq_stat *s;
+		struct latency_stat *s;
 		s = per_cpu_ptr(iolat->stats, cpu);
-		blk_rq_stat_sum(&stat, s);
-		blk_rq_stat_init(s);
+		latency_stat_sum(iolat, &stat, s);
+		latency_stat_init(iolat, s);
 	}
 	preempt_enable();
 
@@ -477,41 +561,33 @@ static void iolatency_check_latencies(struct iolatency_grp *iolat, u64 now)
 
 	lat_info = &parent->child_lat;
 
-	/*
-	 * CALC_LOAD takes in a number stored in fixed point representation.
-	 * Because we are using this for IO time in ns, the values stored
-	 * are significantly larger than the FIXED_1 denominator (2048).
-	 * Therefore, rounding errors in the calculation are negligible and
-	 * can be ignored.
-	 */
-	exp_idx = min_t(int, BLKIOLATENCY_NR_EXP_FACTORS - 1,
-			div64_u64(iolat->cur_win_nsec,
-				  BLKIOLATENCY_EXP_BUCKET_SIZE));
-	CALC_LOAD(iolat->lat_avg, iolatency_exp_factors[exp_idx], stat.mean);
+	iolat_update_total_lat_avg(iolat, &stat);
 
 	/* Everything is ok and we don't need to adjust the scale. */
-	if (stat.mean <= iolat->min_lat_nsec &&
+	if (latency_sum_ok(iolat, &stat) &&
 	    atomic_read(&lat_info->scale_cookie) == DEFAULT_SCALE_COOKIE)
 		return;
 
 	/* Somebody beat us to the punch, just bail. */
 	spin_lock_irqsave(&lat_info->lock, flags);
 	lat_info->nr_samples -= iolat->nr_samples;
-	lat_info->nr_samples += stat.nr_samples;
-	iolat->nr_samples = stat.nr_samples;
+	lat_info->nr_samples += latency_stat_samples(iolat, &stat);
+	iolat->nr_samples = latency_stat_samples(iolat, &stat);
 
 	if ((lat_info->last_scale_event >= now ||
 	    now - lat_info->last_scale_event < BLKIOLATENCY_MIN_ADJUST_TIME) &&
 	    lat_info->scale_lat <= iolat->min_lat_nsec)
 		goto out;
 
-	if (stat.mean <= iolat->min_lat_nsec &&
-	    stat.nr_samples >= BLKIOLATENCY_MIN_GOOD_SAMPLES) {
+	if (latency_sum_ok(iolat, &stat)) {
+		if (latency_stat_samples(iolat, &stat) <
+		    BLKIOLATENCY_MIN_GOOD_SAMPLES)
+			goto out;
 		if (lat_info->scale_grp == iolat) {
 			lat_info->last_scale_event = now;
 			scale_cookie_change(iolat->blkiolat, lat_info, true);
 		}
-	} else if (stat.mean > iolat->min_lat_nsec) {
+	} else {
 		lat_info->last_scale_event = now;
 		if (!lat_info->scale_grp ||
 		    lat_info->scale_lat > iolat->min_lat_nsec) {
@@ -808,13 +884,43 @@ static int iolatency_print_limit(struct seq_file *sf, void *v)
 	return 0;
 }
 
+static size_t iolatency_ssd_stat(struct iolatency_grp *iolat, char *buf,
+				 size_t size)
+{
+	struct latency_stat stat;
+	int cpu;
+
+	latency_stat_init(iolat, &stat);
+	preempt_disable();
+	for_each_online_cpu(cpu) {
+		struct latency_stat *s;
+		s = per_cpu_ptr(iolat->stats, cpu);
+		latency_stat_sum(iolat, &stat, s);
+	}
+	preempt_enable();
+
+	if (iolat->rq_depth.max_depth == UINT_MAX)
+		return scnprintf(buf, size, " missed=%llu total=%llu depth=max",
+				 (unsigned long long)stat.ps.missed,
+				 (unsigned long long)stat.ps.total);
+	return scnprintf(buf, size, " missed=%llu total=%llu depth=%u",
+			 (unsigned long long)stat.ps.missed,
+			 (unsigned long long)stat.ps.total,
+			 iolat->rq_depth.max_depth);
+}
+
 static size_t iolatency_pd_stat(struct blkg_policy_data *pd, char *buf,
 				size_t size)
 {
 	struct iolatency_grp *iolat = pd_to_lat(pd);
-	unsigned long long avg_lat = div64_u64(iolat->lat_avg, NSEC_PER_USEC);
-	unsigned long long cur_win = div64_u64(iolat->cur_win_nsec, NSEC_PER_MSEC);
+	unsigned long long avg_lat;
+	unsigned long long cur_win;
+
+	if (iolat->ssd)
+		return iolatency_ssd_stat(iolat, buf, size);
 
+	avg_lat = div64_u64(iolat->lat_avg, NSEC_PER_USEC);
+	cur_win = div64_u64(iolat->cur_win_nsec, NSEC_PER_MSEC);
 	if (iolat->rq_depth.max_depth == UINT_MAX)
 		return scnprintf(buf, size, " depth=max avg_lat=%llu win=%llu",
 				 avg_lat, cur_win);
@@ -831,8 +937,8 @@ static struct blkg_policy_data *iolatency_pd_alloc(gfp_t gfp, int node)
 	iolat = kzalloc_node(sizeof(*iolat), gfp, node);
 	if (!iolat)
 		return NULL;
-	iolat->stats = __alloc_percpu_gfp(sizeof(struct blk_rq_stat),
-				       __alignof__(struct blk_rq_stat), gfp);
+	iolat->stats = __alloc_percpu_gfp(sizeof(struct latency_stat),
+				       __alignof__(struct latency_stat), gfp);
 	if (!iolat->stats) {
 		kfree(iolat);
 		return NULL;
@@ -849,10 +955,15 @@ static void iolatency_pd_init(struct blkg_policy_data *pd)
 	u64 now = ktime_to_ns(ktime_get());
 	int cpu;
 
+	if (blk_queue_nonrot(blkg->q))
+		iolat->ssd = true;
+	else
+		iolat->ssd = false;
+
 	for_each_possible_cpu(cpu) {
-		struct blk_rq_stat *stat;
+		struct latency_stat *stat;
 		stat = per_cpu_ptr(iolat->stats, cpu);
-		blk_rq_stat_init(stat);
+		latency_stat_init(iolat, stat);
 	}
 
 	rq_wait_init(&iolat->rq_wait);
-- 
2.14.3


* [PATCH 5/5] blk-iolatency: keep track of previous windows' stats
  2018-09-28 17:45 [PATCH 0/5][v2] blk-iolatency: Fixes and tweak the miss algo for ssds Josef Bacik
                   ` (3 preceding siblings ...)
  2018-09-28 17:45 ` [PATCH 4/5] blk-iolatency: use a percentile approach for ssds Josef Bacik
@ 2018-09-28 17:45 ` Josef Bacik
  2018-09-28 17:48 ` [PATCH 0/5][v2] blk-iolatency: Fixes and tweak the miss algo for ssds Jens Axboe
  5 siblings, 0 replies; 7+ messages in thread
From: Josef Bacik @ 2018-09-28 17:45 UTC (permalink / raw)
  To: axboe, linux-block, kernel-team

We apply smoothing to the scale changes in order to keep sawtooth
behavior from occurring.  However, our window for checking whether
we've missed our target can sometimes be shorter than the smoothing
interval (500ms), especially on faster drives like ssds.  In order to
deal with this, keep a running tally of the previous windows that we
threw away because we had already done a scale event recently.

This is needed for the ssd case, as these low-latency drives will have
bursts of latency, and if the window that directly follows the opening
of the scale window happens to be ok, we could unthrottle even though
we were missing our target in the previous windows.
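
The shape of the bookkeeping, reduced to a sketch (illustrative
userspace code with made-up names, not the patch itself; the real
changes are in the hunks below):

	#include <stdbool.h>
	#include <stdint.h>

	struct win_stat { uint64_t total, missed; };

	static struct win_stat cur;	/* tally across skipped windows */

	void maybe_scale(struct win_stat *s);	/* decision logic elsewhere */

	void window_end(struct win_stat *win, bool cooldown)
	{
		/* Always fold this window into the running tally. */
		cur.total  += win->total;
		cur.missed += win->missed;

		if (cooldown)
			return;	/* too soon after the last scale event */

		/* Decide on the accumulated stats, not just this window. */
		maybe_scale(&cur);

		/* Start a fresh tally once a decision has been made. */
		cur.total = cur.missed = 0;
	}

This way a single good window right after the cooldown expires can't
erase a string of bad ones.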

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 block/blk-iolatency.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index fd246805b0be..35c48d7b8f78 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -130,6 +130,7 @@ struct latency_stat {
 struct iolatency_grp {
 	struct blkg_policy_data pd;
 	struct latency_stat __percpu *stats;
+	struct latency_stat cur_stat;
 	struct blk_iolatency *blkiolat;
 	struct rq_depth rq_depth;
 	struct rq_wait rq_wait;
@@ -570,24 +571,27 @@ static void iolatency_check_latencies(struct iolatency_grp *iolat, u64 now)
 
 	/* Somebody beat us to the punch, just bail. */
 	spin_lock_irqsave(&lat_info->lock, flags);
+
+	latency_stat_sum(iolat, &iolat->cur_stat, &stat);
 	lat_info->nr_samples -= iolat->nr_samples;
-	lat_info->nr_samples += latency_stat_samples(iolat, &stat);
-	iolat->nr_samples = latency_stat_samples(iolat, &stat);
+	lat_info->nr_samples += latency_stat_samples(iolat, &iolat->cur_stat);
+	iolat->nr_samples = latency_stat_samples(iolat, &iolat->cur_stat);
 
 	if ((lat_info->last_scale_event >= now ||
-	    now - lat_info->last_scale_event < BLKIOLATENCY_MIN_ADJUST_TIME) &&
-	    lat_info->scale_lat <= iolat->min_lat_nsec)
+	    now - lat_info->last_scale_event < BLKIOLATENCY_MIN_ADJUST_TIME))
 		goto out;
 
-	if (latency_sum_ok(iolat, &stat)) {
-		if (latency_stat_samples(iolat, &stat) <
+	if (latency_sum_ok(iolat, &iolat->cur_stat) &&
+	    latency_sum_ok(iolat, &stat)) {
+		if (latency_stat_samples(iolat, &iolat->cur_stat) <
 		    BLKIOLATENCY_MIN_GOOD_SAMPLES)
 			goto out;
 		if (lat_info->scale_grp == iolat) {
 			lat_info->last_scale_event = now;
 			scale_cookie_change(iolat->blkiolat, lat_info, true);
 		}
-	} else {
+	} else if (lat_info->scale_lat == 0 ||
+		   lat_info->scale_lat >= iolat->min_lat_nsec) {
 		lat_info->last_scale_event = now;
 		if (!lat_info->scale_grp ||
 		    lat_info->scale_lat > iolat->min_lat_nsec) {
@@ -596,6 +600,7 @@ static void iolatency_check_latencies(struct iolatency_grp *iolat, u64 now)
 		}
 		scale_cookie_change(iolat->blkiolat, lat_info, false);
 	}
+	latency_stat_init(iolat, &iolat->cur_stat);
 out:
 	spin_unlock_irqrestore(&lat_info->lock, flags);
 }
@@ -966,6 +971,7 @@ static void iolatency_pd_init(struct blkg_policy_data *pd)
 		latency_stat_init(iolat, stat);
 	}
 
+	latency_stat_init(iolat, &iolat->cur_stat);
 	rq_wait_init(&iolat->rq_wait);
 	spin_lock_init(&iolat->child_lat.lock);
 	iolat->rq_depth.queue_depth = blkg->q->nr_requests;
-- 
2.14.3


* Re: [PATCH 0/5][v2] blk-iolatency: Fixes and tweak the miss algo for ssds
  2018-09-28 17:45 [PATCH 0/5][v2] blk-iolatency: Fixes and tweak the miss algo for ssds Josef Bacik
                   ` (4 preceding siblings ...)
  2018-09-28 17:45 ` [PATCH 5/5] blk-iolatency: keep track of previous windows' stats Josef Bacik
@ 2018-09-28 17:48 ` Jens Axboe
  5 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2018-09-28 17:48 UTC (permalink / raw)
  To: Josef Bacik, linux-block, kernel-team

On 9/28/18 11:45 AM, Josef Bacik wrote:
> v1->v2:
> - rebased onto a recent for-4.20/block branch
> - dropped the changed variable cleanup.
> 
> -- Original message --
> 
> Testing on ssds with the current iolatency code wasn't working as well
> as expected.  This is mostly because ssds don't behave like rotational
> drives: they are spikier, which means that using the average latency
> for IO wasn't very responsive until the drive was extremely overtaxed.
> To deal with this I've reworked iolatency to use a p(90) based
> approach for ssd latencies.  I originally intended to use this
> approach for both ssds and rotational drives, but p(90) was too high a
> bar to use; by the time we were exceeding p(90), things were already
> pretty bad.  So to keep things simple, use p(90) just for ssds, since
> their latency targets tend to be orders of magnitude lower than those
> of rotational drives, and keep the average latency calculations for
> rotational drives.
> 
> This testing also showed a few issues with blk-iolatency, so the
> preceding patches all fix issues we saw in testing.  Using
> q->nr_requests instead of blk_queue_depth() is probably the most
> subtle and important change.  We want to limit IOs based on the number
> of outstanding requests we can have in the block layer, not
> necessarily how many we can have going to the device, so make this
> explicit by using nr_requests directly.  These patches have been in
> production for a week on both our rotational and ssd tiers and
> everything is going smoothly.  Thanks,

Applied for 4.20, thanks for respinning.

-- 
Jens Axboe

