All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] block: make iolatency avg_lat exponentially decay
@ 2018-08-01  0:25 Dennis Zhou
  2018-08-01  4:46 ` kbuild test robot
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Dennis Zhou @ 2018-08-01  0:25 UTC (permalink / raw)
  To: Tejun Heo, Jens Axboe, Josef Bacik
  Cc: kernel-team, linux-block, linux-kernel, Johannes Weiner,
	Dennis Zhou (Facebook)

From: "Dennis Zhou (Facebook)" <dennisszhou@gmail.com>

Currently, avg_lat is calculated by accumulating the mean of every
window in a long running cumulative average. As time goes on, the metric
becomes less and less useful due to the accumulated history.

This patch reuses the same calculation done in load averages to make the
avg_lat metric more lively. Unlike load averages, the avg only advances
when a window elapses (due to an io). Idle periods extend the most
recent window. Bucketing is used to limit the history of avg_lat by
binding it to the window size. So, the window range for 1/exp (decay
rate) is [1 min, 2.5 min) when windows elapse immediately.

The current sample window size is exposed in the debug info to enable
calculation of the window range.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
---
 block/blk-iolatency.c | 55 +++++++++++++++++++++++++++++++++----------
 1 file changed, 42 insertions(+), 13 deletions(-)

diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index bb59b2929e0d..2a6bb7e31dda 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -69,6 +69,7 @@
 #include <linux/module.h>
 #include <linux/timer.h>
 #include <linux/memcontrol.h>
+#include <linux/sched/loadavg.h>
 #include <linux/sched/signal.h>
 #include <trace/events/block.h>
 #include "blk-rq-qos.h"
@@ -127,7 +128,6 @@ struct iolatency_grp {
 
 	/* total running average of our io latency. */
 	u64 total_lat_avg;
-	u64 total_lat_nr;
 
 	/* Our current number of IO's for the last summation. */
 	u64 nr_samples;
@@ -135,6 +135,27 @@ struct iolatency_grp {
 	struct child_latency_info child_lat;
 };
 
+#define BLKIOLATENCY_MIN_WIN_SIZE (100 * NSEC_PER_MSEC)
+#define BLKIOLATENCY_MAX_WIN_SIZE NSEC_PER_SEC
+/*
+ * These are the constants used to fake the fixed-point moving average
+ * calculation just like load average. The call to CALC_LOAD folds
+ * (FIXED_1 (2048) - exp_factor) * new_sample into total_lat_avg.
+ * The sampling window size is bucketed to try to approximately calculate
+ * average latency such that 1/exp (decay rate) is [1 min, 2.5 min) when
+ * windows elapse immediately.
+ */
+#define BLKIOLATENCY_NR_EXP_FACTORS 5
+#define BLKIOLATENCY_EXP_BUCKET_SIZE (BLKIOLATENCY_MAX_WIN_SIZE / \
+				      (BLKIOLATENCY_NR_EXP_FACTORS - 1))
+static const u64 iolatency_exp_factors[BLKIOLATENCY_NR_EXP_FACTORS] = {
+	2045, // exp(1/600) - 600 samples
+	2039, // exp(1/240) - 240 samples
+	2031, // exp(1/120) - 120 samples
+	2023, // exp(1/80)  - 80 samples
+	2014, // exp(1/60)  - 60 samples
+};
+
 static inline struct iolatency_grp *pd_to_lat(struct blkg_policy_data *pd)
 {
 	return pd ? container_of(pd, struct iolatency_grp, pd) : NULL;
@@ -462,7 +483,7 @@ static void iolatency_check_latencies(struct iolatency_grp *iolat, u64 now)
 	struct child_latency_info *lat_info;
 	struct blk_rq_stat stat;
 	unsigned long flags;
-	int cpu;
+	int cpu, exp_idx;
 
 	blk_rq_stat_init(&stat);
 	preempt_disable();
@@ -480,11 +501,17 @@ static void iolatency_check_latencies(struct iolatency_grp *iolat, u64 now)
 
 	lat_info = &parent->child_lat;
 
-	iolat->total_lat_avg =
-		div64_u64((iolat->total_lat_avg * iolat->total_lat_nr) +
-			  stat.mean, iolat->total_lat_nr + 1);
-
-	iolat->total_lat_nr++;
+	/*
+	 * CALC_LOAD takes in a number stored in fixed point representation.
+	 * Because we are using this for IO time in ns, the values stored
+	 * are significantly larger than the FIXED_1 denominator (2048).
+	 * Therefore, rounding errors in the calculation are negligible and
+	 * can be ignored.
+	 */
+	exp_idx = min_t(int, BLKIOLATENCY_NR_EXP_FACTORS - 1,
+			iolat->cur_win_nsec / BLKIOLATENCY_EXP_BUCKET_SIZE);
+	CALC_LOAD(iolat->total_lat_avg, iolatency_exp_factors[exp_idx],
+		  stat.mean);
 
 	/* Everything is ok and we don't need to adjust the scale. */
 	if (stat.mean <= iolat->min_lat_nsec &&
@@ -700,8 +727,9 @@ static void iolatency_set_min_lat_nsec(struct blkcg_gq *blkg, u64 val)
 	u64 oldval = iolat->min_lat_nsec;
 
 	iolat->min_lat_nsec = val;
-	iolat->cur_win_nsec = max_t(u64, val << 4, 100 * NSEC_PER_MSEC);
-	iolat->cur_win_nsec = min_t(u64, iolat->cur_win_nsec, NSEC_PER_SEC);
+	iolat->cur_win_nsec = max_t(u64, val << 4, BLKIOLATENCY_MIN_WIN_SIZE);
+	iolat->cur_win_nsec = min_t(u64, iolat->cur_win_nsec,
+				    BLKIOLATENCY_MAX_WIN_SIZE);
 
 	if (!oldval && val)
 		atomic_inc(&blkiolat->enabled);
@@ -811,13 +839,14 @@ static size_t iolatency_pd_stat(struct blkg_policy_data *pd, char *buf,
 {
 	struct iolatency_grp *iolat = pd_to_lat(pd);
 	unsigned long long avg_lat = div64_u64(iolat->total_lat_avg, NSEC_PER_USEC);
+	unsigned long long cur_win = div64_u64(iolat->cur_win_nsec, NSEC_PER_MSEC);
 
 	if (iolat->rq_depth.max_depth == UINT_MAX)
-		return scnprintf(buf, size, " depth=max avg_lat=%llu",
-				 avg_lat);
+		return scnprintf(buf, size, " depth=max avg_lat=%llu win=%llu",
+				 avg_lat, cur_win);
 
-	return scnprintf(buf, size, " depth=%u avg_lat=%llu",
-			 iolat->rq_depth.max_depth, avg_lat);
+	return scnprintf(buf, size, " depth=%u avg_lat=%llu win=%llu",
+			 iolat->rq_depth.max_depth, avg_lat, cur_win);
 }
 
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] block: make iolatency avg_lat exponentially decay
  2018-08-01  0:25 [PATCH v2] block: make iolatency avg_lat exponentially decay Dennis Zhou
@ 2018-08-01  4:46 ` kbuild test robot
  2018-08-01 15:49 ` Tejun Heo
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: kbuild test robot @ 2018-08-01  4:46 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: kbuild-all, Tejun Heo, Jens Axboe, Josef Bacik, kernel-team,
	linux-block, linux-kernel, Johannes Weiner,
	Dennis Zhou (Facebook)

[-- Attachment #1: Type: text/plain, Size: 3800 bytes --]

Hi Dennis,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on block/for-next]
[also build test ERROR on next-20180731]
[cannot apply to v4.18-rc7]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Dennis-Zhou/block-make-iolatency-avg_lat-exponentially-decay/20180801-100533
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: i386-randconfig-i1-201830 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   block/blk-iolatency.o: In function `iolatency_check_latencies':
>> block/blk-iolatency.c:511: undefined reference to `__udivdi3'

vim +511 block/blk-iolatency.c

   478	
   479	static void iolatency_check_latencies(struct iolatency_grp *iolat, u64 now)
   480	{
   481		struct blkcg_gq *blkg = lat_to_blkg(iolat);
   482		struct iolatency_grp *parent;
   483		struct child_latency_info *lat_info;
   484		struct blk_rq_stat stat;
   485		unsigned long flags;
   486		int cpu, exp_idx;
   487	
   488		blk_rq_stat_init(&stat);
   489		preempt_disable();
   490		for_each_online_cpu(cpu) {
   491			struct blk_rq_stat *s;
   492			s = per_cpu_ptr(iolat->stats, cpu);
   493			blk_rq_stat_sum(&stat, s);
   494			blk_rq_stat_init(s);
   495		}
   496		preempt_enable();
   497	
   498		parent = blkg_to_lat(blkg->parent);
   499		if (!parent)
   500			return;
   501	
   502		lat_info = &parent->child_lat;
   503	
   504		/*
   505		 * CALC_LOAD takes in a number stored in fixed point representation.
   506		 * Because we are using this for IO time in ns, the values stored
   507		 * are significantly larger than the FIXED_1 denominator (2048).
   508		 * Therefore, rounding errors in the calculation are negligible and
   509		 * can be ignored.
   510		 */
 > 511		exp_idx = min_t(int, BLKIOLATENCY_NR_EXP_FACTORS - 1,
   512				iolat->cur_win_nsec / BLKIOLATENCY_EXP_BUCKET_SIZE);
   513		CALC_LOAD(iolat->total_lat_avg, iolatency_exp_factors[exp_idx],
   514			  stat.mean);
   515	
   516		/* Everything is ok and we don't need to adjust the scale. */
   517		if (stat.mean <= iolat->min_lat_nsec &&
   518		    atomic_read(&lat_info->scale_cookie) == DEFAULT_SCALE_COOKIE)
   519			return;
   520	
   521		/* Somebody beat us to the punch, just bail. */
   522		spin_lock_irqsave(&lat_info->lock, flags);
   523		lat_info->nr_samples -= iolat->nr_samples;
   524		lat_info->nr_samples += stat.nr_samples;
   525		iolat->nr_samples = stat.nr_samples;
   526	
   527		if ((lat_info->last_scale_event >= now ||
   528		    now - lat_info->last_scale_event < BLKIOLATENCY_MIN_ADJUST_TIME) &&
   529		    lat_info->scale_lat <= iolat->min_lat_nsec)
   530			goto out;
   531	
   532		if (stat.mean <= iolat->min_lat_nsec &&
   533		    stat.nr_samples >= BLKIOLATENCY_MIN_GOOD_SAMPLES) {
   534			if (lat_info->scale_grp == iolat) {
   535				lat_info->last_scale_event = now;
   536				scale_cookie_change(iolat->blkiolat, lat_info, true);
   537			}
   538		} else if (stat.mean > iolat->min_lat_nsec) {
   539			lat_info->last_scale_event = now;
   540			if (!lat_info->scale_grp ||
   541			    lat_info->scale_lat > iolat->min_lat_nsec) {
   542				WRITE_ONCE(lat_info->scale_lat, iolat->min_lat_nsec);
   543				lat_info->scale_grp = iolat;
   544			}
   545			scale_cookie_change(iolat->blkiolat, lat_info, false);
   546		}
   547	out:
   548		spin_unlock_irqrestore(&lat_info->lock, flags);
   549	}
   550	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28309 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] block: make iolatency avg_lat exponentially decay
  2018-08-01  0:25 [PATCH v2] block: make iolatency avg_lat exponentially decay Dennis Zhou
  2018-08-01  4:46 ` kbuild test robot
@ 2018-08-01 15:49 ` Tejun Heo
  2018-08-01 16:10   ` Tejun Heo
  2018-08-01 16:02 ` Johannes Weiner
  2018-08-01 16:26 ` Josef Bacik
  3 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2018-08-01 15:49 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Jens Axboe, Josef Bacik, kernel-team, linux-block, linux-kernel,
	Johannes Weiner

Hello,

On Tue, Jul 31, 2018 at 05:25:59PM -0700, Dennis Zhou wrote:
...
> +	/*
> +	 * CALC_LOAD takes in a number stored in fixed point representation.
> +	 * Because we are using this for IO time in ns, the values stored
> +	 * are significantly larger than the FIXED_1 denominator (2048).
> +	 * Therefore, rounding errors in the calculation are negligible and
> +	 * can be ignored.
> +	 */
> +	exp_idx = min_t(int, BLKIOLATENCY_NR_EXP_FACTORS - 1,
> +			iolat->cur_win_nsec / BLKIOLATENCY_EXP_BUCKET_SIZE);

Build bot is complaining about naked 64bit div.  Should use one of the
div64*() helpers.

Looks good to me.  Once Johannes's concerns are addressed, please feel
free to add

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] block: make iolatency avg_lat exponentially decay
  2018-08-01  0:25 [PATCH v2] block: make iolatency avg_lat exponentially decay Dennis Zhou
  2018-08-01  4:46 ` kbuild test robot
  2018-08-01 15:49 ` Tejun Heo
@ 2018-08-01 16:02 ` Johannes Weiner
  2018-08-01 16:26 ` Josef Bacik
  3 siblings, 0 replies; 7+ messages in thread
From: Johannes Weiner @ 2018-08-01 16:02 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Tejun Heo, Jens Axboe, Josef Bacik, kernel-team, linux-block,
	linux-kernel

On Tue, Jul 31, 2018 at 05:25:59PM -0700, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <dennisszhou@gmail.com>
> 
> Currently, avg_lat is calculated by accumulating the mean of every
> window in a long running cumulative average. As time goes on, the metric
> becomes less and less useful due to the accumulated history.
> 
> This patch reuses the same calculation done in load averages to make the
> avg_lat metric more lively. Unlike load averages, the avg only advances
> when a window elapses (due to an io). Idle periods extend the most
> recent window. Bucketing is used to limit the history of avg_lat by
> binding it to the window size. So, the window range for 1/exp (decay
> rate) is [1 min, 2.5 min) when windows elapse immediately.
> 
> The current sample window size is exposed in the debug info to enable
> calculation of the window range.
> 
> Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>

Looks great, thanks Dennis. Once the div build error is fixed, please
feel free to add:

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] block: make iolatency avg_lat exponentially decay
  2018-08-01 15:49 ` Tejun Heo
@ 2018-08-01 16:10   ` Tejun Heo
  2018-08-02  6:11     ` Dennis Zhou
  0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2018-08-01 16:10 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Jens Axboe, Josef Bacik, kernel-team, linux-block, linux-kernel,
	Johannes Weiner

On Wed, Aug 01, 2018 at 08:49:58AM -0700, Tejun Heo wrote:
> Hello,
> 
> On Tue, Jul 31, 2018 at 05:25:59PM -0700, Dennis Zhou wrote:
> ...
> > +	/*
> > +	 * CALC_LOAD takes in a number stored in fixed point representation.
> > +	 * Because we are using this for IO time in ns, the values stored
> > +	 * are significantly larger than the FIXED_1 denominator (2048).
> > +	 * Therefore, rounding errors in the calculation are negligible and
> > +	 * can be ignored.
> > +	 */
> > +	exp_idx = min_t(int, BLKIOLATENCY_NR_EXP_FACTORS - 1,
> > +			iolat->cur_win_nsec / BLKIOLATENCY_EXP_BUCKET_SIZE);
> 
> Build bot is complaining about naked 64bit div.  Should use one of the
> div64*() helpers.
> 
> Looks good to me.  Once Johannes's concerns are addressed, please feel
> free to add

Ooh, one nitpick.  total_lat_avg is a bit of a misnomer now.  Maybe
rename to lat_avg?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] block: make iolatency avg_lat exponentially decay
  2018-08-01  0:25 [PATCH v2] block: make iolatency avg_lat exponentially decay Dennis Zhou
                   ` (2 preceding siblings ...)
  2018-08-01 16:02 ` Johannes Weiner
@ 2018-08-01 16:26 ` Josef Bacik
  3 siblings, 0 replies; 7+ messages in thread
From: Josef Bacik @ 2018-08-01 16:26 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Tejun Heo, Jens Axboe, Josef Bacik, kernel-team, linux-block,
	linux-kernel, Johannes Weiner

On Tue, Jul 31, 2018 at 05:25:59PM -0700, Dennis Zhou wrote:
> From: "Dennis Zhou (Facebook)" <dennisszhou@gmail.com>
> 
> Currently, avg_lat is calculated by accumulating the mean of every
> window in a long running cumulative average. As time goes on, the metric
> becomes less and less useful due to the accumulated history.
> 
> This patch reuses the same calculation done in load averages to make the
> avg_lat metric more lively. Unlike load averages, the avg only advances
> when a window elapses (due to an io). Idle periods extend the most
> recent window. Bucketing is used to limit the history of avg_lat by
> binding it to the window size. So, the window range for 1/exp (decay
> rate) is [1 min, 2.5 min) when windows elapse immediately.
> 
> The current sample window size is exposed in the debug info to enable
> calculation of the window range.
> 
> Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>

Fix the nits and you can add

Acked-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] block: make iolatency avg_lat exponentially decay
  2018-08-01 16:10   ` Tejun Heo
@ 2018-08-02  6:11     ` Dennis Zhou
  0 siblings, 0 replies; 7+ messages in thread
From: Dennis Zhou @ 2018-08-02  6:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, Josef Bacik, kernel-team, linux-block, linux-kernel,
	Johannes Weiner

Hi Tejun,

On Wed, Aug 01, 2018 at 09:10:11AM -0700, Tejun Heo wrote:
> On Wed, Aug 01, 2018 at 08:49:58AM -0700, Tejun Heo wrote:
> > Hello,
> > 
> > On Tue, Jul 31, 2018 at 05:25:59PM -0700, Dennis Zhou wrote:
> > ...
> > > +	/*
> > > +	 * CALC_LOAD takes in a number stored in fixed point representation.
> > > +	 * Because we are using this for IO time in ns, the values stored
> > > +	 * are significantly larger than the FIXED_1 denominator (2048).
> > > +	 * Therefore, rounding errors in the calculation are negligible and
> > > +	 * can be ignored.
> > > +	 */
> > > +	exp_idx = min_t(int, BLKIOLATENCY_NR_EXP_FACTORS - 1,
> > > +			iolat->cur_win_nsec / BLKIOLATENCY_EXP_BUCKET_SIZE);
> > 
> > Build bot is complaining about naked 64bit div.  Should use one of the
> > div64*() helpers.
> > 
> > Looks good to me.  Once Johannes's concerns are addressed, please feel
> > free to add
> 
> Ooh, one nitpick.  total_lat_avg is a bit of a misnomer now.  Maybe
> rename to lat_avg?
> 

Yeah, that makes sense. I've renamed it to lat_avg and updated the
Documentation file as well.

Thanks,
Dennis

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-08-02  6:12 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-01  0:25 [PATCH v2] block: make iolatency avg_lat exponentially decay Dennis Zhou
2018-08-01  4:46 ` kbuild test robot
2018-08-01 15:49 ` Tejun Heo
2018-08-01 16:10   ` Tejun Heo
2018-08-02  6:11     ` Dennis Zhou
2018-08-01 16:02 ` Johannes Weiner
2018-08-01 16:26 ` Josef Bacik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.