All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] path latency prio fixes
@ 2017-11-18  0:11 Martin Wilck
  2017-11-18  0:11 ` [PATCH 1/4] libmultipath: path latency: fix default base num Martin Wilck
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Martin Wilck @ 2017-11-18  0:11 UTC (permalink / raw)
  To: Christophe Varoqui, Guan Junxiong; +Cc: dm-devel

Hi Christophe and Guan,

I was hoping to get down to the review before Christophe committed
your your patch "calculate standard deviation on a logarithmic scale for
prioritizer path_latency". I failed to do this in timely manner, sorry.

But I still have some issues with the patch. Please check these fixes.
I also have remaining issues with the "intermittent IO accounting"
patch, will send these later.

Martin Wilck (4):
  libmultipath: path latency: fix default base num
  libmultipath: path latency: log threshold with p2
  libmultipath: path latency: simplify getprio()
  libmultipath: path latency: remove warnings

 libmultipath/prioritizers/path_latency.c | 112 ++++++++++---------------------
 1 file changed, 34 insertions(+), 78 deletions(-)

-- 
2.15.0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/4] libmultipath: path latency: fix default base num
  2017-11-18  0:11 [PATCH 0/4] path latency prio fixes Martin Wilck
@ 2017-11-18  0:11 ` Martin Wilck
  2017-11-19  2:19   ` Guan Junxiong
  2017-11-18  0:11 ` [PATCH 2/4] libmultipath: path latency: log threshold with p2 Martin Wilck
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Martin Wilck @ 2017-11-18  0:11 UTC (permalink / raw)
  To: Christophe Varoqui, Guan Junxiong; +Cc: dm-devel

I don't think anyone can measure latency to 1% accuracy. It's
better to not even pretend to be able to. 10% should be fine
even for the most latency-critical environments.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 libmultipath/prioritizers/path_latency.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/libmultipath/prioritizers/path_latency.c b/libmultipath/prioritizers/path_latency.c
index 9d5397ec1b3a..b8c5bc7c50a4 100644
--- a/libmultipath/prioritizers/path_latency.c
+++ b/libmultipath/prioritizers/path_latency.c
@@ -42,8 +42,9 @@
 #define DEF_IO_NUM		100
 
 #define MAX_BASE_NUM		10
-#define MIN_BASE_NUM		1.01
-#define DEF_BASE_NUM		1.5
+#define MIN_BASE_NUM		1.1
+// This is 10**(1/4). 4 prio steps correspond to a factor of 10.
+#define DEF_BASE_NUM		1.77827941004
 
 #define MAX_AVG_LATENCY		100000000.	/* Unit: us */
 #define MIN_AVG_LATENCY		1.		/* Unit: us */
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/4] libmultipath: path latency: log threshold with p2
  2017-11-18  0:11 [PATCH 0/4] path latency prio fixes Martin Wilck
  2017-11-18  0:11 ` [PATCH 1/4] libmultipath: path latency: fix default base num Martin Wilck
@ 2017-11-18  0:11 ` Martin Wilck
  2017-11-19  2:23   ` Guan Junxiong
  2017-11-18  0:11 ` [PATCH 3/4] libmultipath: path latency: simplify getprio() Martin Wilck
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Martin Wilck @ 2017-11-18  0:11 UTC (permalink / raw)
  To: Christophe Varoqui, Guan Junxiong; +Cc: dm-devel

This is not a critical error. It just means that the path in
question will have low priority (rightly so, if it has >100s latency).

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 libmultipath/prioritizers/path_latency.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libmultipath/prioritizers/path_latency.c b/libmultipath/prioritizers/path_latency.c
index b8c5bc7c50a4..ce5c95dd7075 100644
--- a/libmultipath/prioritizers/path_latency.c
+++ b/libmultipath/prioritizers/path_latency.c
@@ -299,7 +299,7 @@ int getprio(struct path *pp, char *args, unsigned int timeout)
 			(long long)pow(base_num, lg_avglatency));
 
 	if (lg_avglatency > lg_maxavglatency) {
-		pp_pl_log(0,
+		pp_pl_log(2,
 			  "%s: average latency (%lld us) is outside the thresold (%lld us)",
 			  pp->dev, (long long)pow(base_num, lg_avglatency),
 			  (long long)MAX_AVG_LATENCY);
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/4] libmultipath: path latency: simplify getprio()
  2017-11-18  0:11 [PATCH 0/4] path latency prio fixes Martin Wilck
  2017-11-18  0:11 ` [PATCH 1/4] libmultipath: path latency: fix default base num Martin Wilck
  2017-11-18  0:11 ` [PATCH 2/4] libmultipath: path latency: log threshold with p2 Martin Wilck
@ 2017-11-18  0:11 ` Martin Wilck
  2017-11-19  2:52   ` Guan Junxiong
  2017-12-07  4:26   ` Guan Junxiong
  2017-11-18  0:11 ` [PATCH 4/4] libmultipath: path latency: remove warnings Martin Wilck
  2017-11-20  9:11 ` [PATCH 0/4] path latency prio fixes Martin Wilck
  4 siblings, 2 replies; 14+ messages in thread
From: Martin Wilck @ 2017-11-18  0:11 UTC (permalink / raw)
  To: Christophe Varoqui, Guan Junxiong; +Cc: dm-devel

The log standard deviation can be calculated much more simply
by realizing

   sum_n (x_i - avg(x))^2 == sum_n x_i^2 - n * avg(x)^2

Also, use timespecsub rather than the custom timeval_to_usec,
and avoid taking log(0).

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 libmultipath/prioritizers/path_latency.c | 71 ++++++++++++++------------------
 1 file changed, 30 insertions(+), 41 deletions(-)

diff --git a/libmultipath/prioritizers/path_latency.c b/libmultipath/prioritizers/path_latency.c
index ce5c95dd7075..44472b77dd86 100644
--- a/libmultipath/prioritizers/path_latency.c
+++ b/libmultipath/prioritizers/path_latency.c
@@ -34,6 +34,7 @@
 #include "prio.h"
 #include "structs.h"
 #include "util.h"
+#include "time-util.h"
 
 #define pp_pl_log(prio, fmt, args...) condlog(prio, "path_latency prio: " fmt, ##args)
 
@@ -56,14 +57,6 @@
 
 #define DEF_BLK_SIZE		4096
 
-static double lg_path_latency[MAX_IO_NUM];
-
-static inline long long timeval_to_us(const struct timespec *tv)
-{
-	return ((long long)tv->tv_sec * USEC_PER_SEC) +
-	    (tv->tv_nsec / NSEC_PER_USEC);
-}
-
 static int prepare_directio_read(int fd, int *blksz, char **pbuf,
 		int *restore_flags)
 {
@@ -199,22 +192,6 @@ out:
 	return 0;
 }
 
-double calc_standard_deviation(double *lg_path_latency, int size,
-				  double lg_avglatency)
-{
-	int index;
-	double sum = 0;
-
-	for (index = 0; index < size; index++) {
-		sum += (lg_path_latency[index] - lg_avglatency) *
-			(lg_path_latency[index] - lg_avglatency);
-	}
-
-	sum /= (size - 1);
-
-	return sqrt(sum);
-}
-
 /*
  * Do not scale the prioriy in a certain range such as [0, 1024]
  * because scaling will eliminate the effect of base_num.
@@ -234,20 +211,16 @@ int calcPrio(double lg_avglatency, double lg_maxavglatency,
 int getprio(struct path *pp, char *args, unsigned int timeout)
 {
 	int rc, temp;
-	int index = 0;
 	int io_num = 0;
 	double base_num = 0;
 	double lg_avglatency, lg_maxavglatency, lg_minavglatency;
 	double standard_deviation;
 	double lg_toldelay = 0;
-	long long before, after;
-	struct timespec tv;
 	int blksize;
 	char *buf;
 	int restore_flags = 0;
 	double lg_base;
-	long long sum_latency = 0;
-	long long arith_mean_lat;
+	double sum_squares = 0;
 
 	if (pp->fd < 0)
 		return -1;
@@ -260,7 +233,6 @@ int getprio(struct path *pp, char *args, unsigned int timeout)
 				pp->dev, io_num, base_num);
 	}
 
-	memset(lg_path_latency, 0, sizeof(lg_path_latency));
 	lg_base = log(base_num);
 	lg_maxavglatency = log(MAX_AVG_LATENCY) / lg_base;
 	lg_minavglatency = log(MIN_AVG_LATENCY) / lg_base;
@@ -269,8 +241,10 @@ int getprio(struct path *pp, char *args, unsigned int timeout)
 
 	temp = io_num;
 	while (temp-- > 0) {
-		(void)clock_gettime(CLOCK_MONOTONIC, &tv);
-		before = timeval_to_us(&tv);
+		struct timespec tv_before, tv_after, tv_diff;
+		double diff, reldiff;
+
+		(void)clock_gettime(CLOCK_MONOTONIC, &tv_before);
 
 		if (do_directio_read(pp->fd, timeout, buf, blksize)) {
 			pp_pl_log(0, "%s: path down", pp->dev);
@@ -278,25 +252,34 @@ int getprio(struct path *pp, char *args, unsigned int timeout)
 			return -1;
 		}
 
-		(void)clock_gettime(CLOCK_MONOTONIC, &tv);
-		after = timeval_to_us(&tv);
+		(void)clock_gettime(CLOCK_MONOTONIC, &tv_after);
+
+		timespecsub(&tv_after, &tv_before, &tv_diff);
+		diff = tv_diff.tv_sec * 1000 * 1000 + tv_diff.tv_nsec / 1000;
+
+		if (diff == 0)
+			/*
+			 * Avoid taking log(0).
+			 * This unlikely case is treated as minimum -
+			 * the sums don't increase
+			 */
+			continue;
+
+		/* we scale by lg_base here */
+		reldiff = log(diff) / lg_base;
+
 		/*
 		 * We assume that the latency complies with Log-normal
 		 * distribution. The logarithm of latency is in normal
 		 * distribution.
 		 */
-		lg_path_latency[index] = log(after - before) / lg_base;
-		lg_toldelay += lg_path_latency[index++];
-		sum_latency += after - before;
+		lg_toldelay += reldiff;
+		sum_squares += reldiff * reldiff;
 	}
 
 	cleanup_directio_read(pp->fd, buf, restore_flags);
 
 	lg_avglatency = lg_toldelay / (long long)io_num;
-	arith_mean_lat = sum_latency / (long long)io_num;
-	pp_pl_log(4, "%s: arithmetic mean latency is (%lld us), geometric mean latency is (%lld us)",
-			pp->dev, arith_mean_lat,
-			(long long)pow(base_num, lg_avglatency));
 
 	if (lg_avglatency > lg_maxavglatency) {
 		pp_pl_log(2,
@@ -340,8 +323,14 @@ int getprio(struct path *pp, char *args, unsigned int timeout)
 			  ".It is recommend to be set larger",
 			  pp->dev, base_num);
 
+	standard_deviation = sqrt((sum_squares - lg_toldelay * lg_toldelay)
+				  / (io_num -1));
 
 	rc = calcPrio(lg_avglatency, lg_maxavglatency, lg_minavglatency);
 
+	pp_pl_log(3, "%s: latency avg=%.2e uncertainty=%.1f prio=%d\n",
+		  pp->dev, exp(lg_avglatency * lg_base),
+		  exp(standard_deviation * lg_base), rc);
+
 	return rc;
 }
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/4] libmultipath: path latency: remove warnings
  2017-11-18  0:11 [PATCH 0/4] path latency prio fixes Martin Wilck
                   ` (2 preceding siblings ...)
  2017-11-18  0:11 ` [PATCH 3/4] libmultipath: path latency: simplify getprio() Martin Wilck
@ 2017-11-18  0:11 ` Martin Wilck
  2017-11-19  3:00   ` Guan Junxiong
  2017-11-20  9:11 ` [PATCH 0/4] path latency prio fixes Martin Wilck
  4 siblings, 1 reply; 14+ messages in thread
From: Martin Wilck @ 2017-11-18  0:11 UTC (permalink / raw)
  To: Christophe Varoqui, Guan Junxiong; +Cc: dm-devel

The warnings at here are pointless. We are looking at a single
path only. Firstly, the standdard deviation for this measurement
can't be "too low" - the lower, the more precise the measurement,
the better. Secondly, a high standard deviation indicates an
unstable path with highly variable latency. Not good, but nothing
to warn about here.

What matters for the selection of "base_num" is not how a single
path behaves, but how different paths of the same path group relate
to each other, which we don't know at this point at the code.

What we want to avoid is too fine a differentiation, in particular
in combination with group_by_prio, because we'd loose the ability for
load balancing. But this is rather a topic for the man page or a
"best practices" document.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 libmultipath/prioritizers/path_latency.c | 34 --------------------------------
 1 file changed, 34 deletions(-)

diff --git a/libmultipath/prioritizers/path_latency.c b/libmultipath/prioritizers/path_latency.c
index 44472b77dd86..b8e2765d9c16 100644
--- a/libmultipath/prioritizers/path_latency.c
+++ b/libmultipath/prioritizers/path_latency.c
@@ -289,40 +289,6 @@ int getprio(struct path *pp, char *args, unsigned int timeout)
 		return DEFAULT_PRIORITY;
 	}
 
-	standard_deviation = calc_standard_deviation(lg_path_latency,
-			index, lg_avglatency);
-	/*
-	 * In calPrio(), we let prio y = f(x) = log(max, base) - log (x, base);
-	 * So if we want to let the priority of the latency outside 2 standard
-	 * deviations can be distinguished from the latency inside 2 standard
-	 * deviation, in others words at most 95% are the same and at least 5%
-	 * are different according interval estimation of normal distribution,
-	 * we should warn the user to set the base_num to be smaller if the
-	 * log(x_threshold, base) is small than 2 standard deviation.
-	 * x_threshold is derived from:
-	 * y + 1 = f(x) + 1 = f(x) + log(base, base), so x_threadshold =
-	 * base_num; Note that we only can compare the logarithm of x_threshold
-	 * with the standard deviation because the standard deviation is derived
-	 * from logarithm of latency.
-	 *
-	 * therefore , we recommend the base_num to meet the condition :
-	 * 1 <= 2 * standard_deviation
-	 */
-	pp_pl_log(5, "%s: standard deviation for logarithm of latency = %.6f",
-			pp->dev, standard_deviation);
-	if (standard_deviation <= 0.5)
-		pp_pl_log(3, "%s: the base_num(%.3lf) is too big to distinguish different priority "
-			  "of two far-away latency. It is recommend to be set smaller",
-			  pp->dev, base_num);
-	/*
-	 * If the standard deviation is too large , we should also warn the user
-	 */
-
-	if (standard_deviation > 4)
-		pp_pl_log(3, "%s: the base_num(%.3lf) is too small to avoid noise disturbance "
-			  ".It is recommend to be set larger",
-			  pp->dev, base_num);
-
 	standard_deviation = sqrt((sum_squares - lg_toldelay * lg_toldelay)
 				  / (io_num -1));
 
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] libmultipath: path latency: fix default base num
  2017-11-18  0:11 ` [PATCH 1/4] libmultipath: path latency: fix default base num Martin Wilck
@ 2017-11-19  2:19   ` Guan Junxiong
  0 siblings, 0 replies; 14+ messages in thread
From: Guan Junxiong @ 2017-11-19  2:19 UTC (permalink / raw)
  To: Martin Wilck, Christophe Varoqui; +Cc: dm-devel, niuhaoxin, Shenhong (C)

It looks good.

Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com>

On 2017/11/18 8:11, Martin Wilck wrote:
> I don't think anyone can measure latency to 1% accuracy. It's
> better to not even pretend to be able to. 10% should be fine
> even for the most latency-critical environments.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  libmultipath/prioritizers/path_latency.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/libmultipath/prioritizers/path_latency.c b/libmultipath/prioritizers/path_latency.c
> index 9d5397ec1b3a..b8c5bc7c50a4 100644
> --- a/libmultipath/prioritizers/path_latency.c
> +++ b/libmultipath/prioritizers/path_latency.c
> @@ -42,8 +42,9 @@
>  #define DEF_IO_NUM		100
>  
>  #define MAX_BASE_NUM		10
> -#define MIN_BASE_NUM		1.01
> -#define DEF_BASE_NUM		1.5
> +#define MIN_BASE_NUM		1.1
> +// This is 10**(1/4). 4 prio steps correspond to a factor of 10.
> +#define DEF_BASE_NUM		1.77827941004
>  
>  #define MAX_AVG_LATENCY		100000000.	/* Unit: us */
>  #define MIN_AVG_LATENCY		1.		/* Unit: us */
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] libmultipath: path latency: log threshold with p2
  2017-11-18  0:11 ` [PATCH 2/4] libmultipath: path latency: log threshold with p2 Martin Wilck
@ 2017-11-19  2:23   ` Guan Junxiong
  0 siblings, 0 replies; 14+ messages in thread
From: Guan Junxiong @ 2017-11-19  2:23 UTC (permalink / raw)
  To: Martin Wilck, Christophe Varoqui; +Cc: dm-devel


Looks good.  Thanks for reviewing.

Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com>

On 2017/11/18 8:11, Martin Wilck wrote:
> This is not a critical error. It just means that the path in
> question will have low priority (rightly so, if it has >100s latency).
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>> ---
>  libmultipath/prioritizers/path_latency.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/libmultipath/prioritizers/path_latency.c b/libmultipath/prioritizers/path_latency.c
> index b8c5bc7c50a4..ce5c95dd7075 100644
> --- a/libmultipath/prioritizers/path_latency.c
> +++ b/libmultipath/prioritizers/path_latency.c
> @@ -299,7 +299,7 @@ int getprio(struct path *pp, char *args, unsigned int timeout)
>  			(long long)pow(base_num, lg_avglatency));
>  
>  	if (lg_avglatency > lg_maxavglatency) {
> -		pp_pl_log(0,
> +		pp_pl_log(2,
>  			  "%s: average latency (%lld us) is outside the thresold (%lld us)",
>  			  pp->dev, (long long)pow(base_num, lg_avglatency),
>  			  (long long)MAX_AVG_LATENCY);
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] libmultipath: path latency: simplify getprio()
  2017-11-18  0:11 ` [PATCH 3/4] libmultipath: path latency: simplify getprio() Martin Wilck
@ 2017-11-19  2:52   ` Guan Junxiong
  2017-11-20  8:46     ` Martin Wilck
  2017-12-07  4:26   ` Guan Junxiong
  1 sibling, 1 reply; 14+ messages in thread
From: Guan Junxiong @ 2017-11-19  2:52 UTC (permalink / raw)
  To: Martin Wilck, Christophe Varoqui; +Cc: dm-devel, niuhaoxin, Shenhong (C)



On 2017/11/18 8:11, Martin Wilck wrote:
> The log standard deviation can be calculated much more simply
> by realizing
> 
>    sum_n (x_i - avg(x))^2 == sum_n x_i^2 - n * avg(x)^2
> 

I derive the equation:
 sum_n {(x_i - avg(x))^2} = sum_n{x_i^2 -2*x_i*avg(x) + avg(x)^2}
                          = sum_n{x_i^2} - 2*avg(x)*sum_n{x_i} + sum_n{avg(x)^2}
                          = sum_n{x_i^2} - 2*avg(x)*avg(x) + n*avg(x)^2
                          =  sum_n{x_i^2} + (n-2)*avg(x)^2

> Also, use timespecsub rather than the custom timeval_to_usec,
> and avoid taking log(0).
> 

Great.


> +	pp_pl_log(3, "%s: latency avg=%.2e uncertainty=%.1f prio=%d\n",

latency avg -> latency geometric avg ? Because in most cases,
avg means arithmetic avg , but in this case, it means geometric avg.


> +		  pp->dev, exp(lg_avglatency * lg_base),
> +		  exp(standard_deviation * lg_base), rc);

How can you get the uncertainty of Log-normal distribution
is the exp(standard_deviation * lg_base) ?

Thanks.
Regards
Guan

.

> +
>  	return rc;
>  }
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] libmultipath: path latency: remove warnings
  2017-11-18  0:11 ` [PATCH 4/4] libmultipath: path latency: remove warnings Martin Wilck
@ 2017-11-19  3:00   ` Guan Junxiong
  0 siblings, 0 replies; 14+ messages in thread
From: Guan Junxiong @ 2017-11-19  3:00 UTC (permalink / raw)
  To: Martin Wilck, Christophe Varoqui; +Cc: dm-devel, niuhaoxin, Shenhong (C)


Looks fine.

Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com>

On 2017/11/18 8:11, Martin Wilck wrote:
> The warnings at here are pointless. We are looking at a single
> path only. Firstly, the standdard deviation for this measurement
> can't be "too low" - the lower, the more precise the measurement,
> the better. Secondly, a high standard deviation indicates an
> unstable path with highly variable latency. Not good, but nothing
> to warn about here.
> 
> What matters for the selection of "base_num" is not how a single
> path behaves, but how different paths of the same path group relate
> to each other, which we don't know at this point at the code.
> 

Oh, you are right. Thanks.

> What we want to avoid is too fine a differentiation, in particular
> in combination with group_by_prio, because we'd loose the ability for
> load balancing. But this is rather a topic for the man page or a
> "best practices" document.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  libmultipath/prioritizers/path_latency.c | 34 --------------------------------
>  1 file changed, 34 deletions(-)
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] libmultipath: path latency: simplify getprio()
  2017-11-19  2:52   ` Guan Junxiong
@ 2017-11-20  8:46     ` Martin Wilck
  0 siblings, 0 replies; 14+ messages in thread
From: Martin Wilck @ 2017-11-20  8:46 UTC (permalink / raw)
  To: Guan Junxiong, Christophe Varoqui; +Cc: dm-devel, niuhaoxin, Shenhong (C)

Hello Guan,

> On 2017/11/18 8:11, Martin Wilck wrote:
> > The log standard deviation can be calculated much more simply
> > by realizing
> > 
> >    sum_n (x_i - avg(x))^2 == sum_n x_i^2 - n * avg(x)^2
> > 
> 
> I derive the equation:
>  sum_n {(x_i - avg(x))^2} = sum_n{x_i^2 -2*x_i*avg(x) + avg(x)^2}
>                           = sum_n{x_i^2} - 2*avg(x)*sum_n{x_i} +
> sum_n{avg(x)^2}
>                           = sum_n{x_i^2} - 2*avg(x)*avg(x) +
> n*avg(x)^2
>                           =  sum_n{x_i^2} + (n-2)*avg(x)^2

No, that's wrong:

    avg(x) = (1/n) * sum_n(x_i)
=>  sum_n(x_i) = n * avg(x)

Thus the 2nd term in the line before the last in your derivation
is not "- 2*avg(x)*avg(x)", but "- 2*n*avg(x)*avg(x)", and the end
result becomes sum_n(x_i^2) - n*avg(x)^2.

> 
> > Also, use timespecsub rather than the custom timeval_to_usec,
> > and avoid taking log(0).
> > 
> 
> Great.
> 
> 
> > +	pp_pl_log(3, "%s: latency avg=%.2e uncertainty=%.1f
> > prio=%d\n",
> 
> latency avg -> latency geometric avg ? Because in most cases,
> avg means arithmetic avg , but in this case, it means geometric avg.

Yes, I meant the geometric average. I don't think we should bother the
user with these subtleties. Well, maybe it would feel better if we'd
use "geometric mean" rather than "avg" in the log message, alhough that
might again irritate some people who don't know the term ... I really
don't care much.

> > +		  pp->dev, exp(lg_avglatency * lg_base),
> > +		  exp(standard_deviation * lg_base), rc);
> 
> How can you get the uncertainty of Log-normal distribution
> is the exp(standard_deviation * lg_base) ?

The "width" of the normal distribution is measured in terms of the
standard deviation, sigma. In your patch, you correctly accounted for
the confidence levels of the 2*sigma environment 
(https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule).

Here, we're assuming a log-normal distribution for the latency (it's a
practical assumption, not a statistical assertion - in reality the
latency probably rather follows an exponential or Poisson distribution
but we don't need to go into that detail). That means we're assuming
that log(latency) can be described by a normal distribution with a
certain standard deviation sigma around the log of the geometric mean.
Again, sigma is the "width" of that normal distribution. Thus with ~68%
probability, the log of the the latency is in the 1-sigma interval
around the average. Translating that back into "real" latency, with 68%
likelyhood it will be in the interval [(1/F) * gm, F*gm], where gm is
the geometric mean and F=exp(sigma). Therefore, F (which is
exp(standard_deviation * lg_base)) can be used as an estimate of the
"uncertainty factor" for the latency.

Agreed?

Regards
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/4] path latency prio fixes
  2017-11-18  0:11 [PATCH 0/4] path latency prio fixes Martin Wilck
                   ` (3 preceding siblings ...)
  2017-11-18  0:11 ` [PATCH 4/4] libmultipath: path latency: remove warnings Martin Wilck
@ 2017-11-20  9:11 ` Martin Wilck
  2017-12-08 20:49   ` Benjamin Marzinski
  4 siblings, 1 reply; 14+ messages in thread
From: Martin Wilck @ 2017-11-20  9:11 UTC (permalink / raw)
  To: Christophe Varoqui, Guan Junxiong; +Cc: dm-devel

Hi Guan,

while staring at this code the other day, I realized another possible
issue with your latency prioritizer. 

It will cause significant IO to every path of a map during multipath /
multipathd startup. If any paths really have latencies as long as your
patch considers (up to 100s), or worse if they don't respond at all,
startup may be *massively* delayed or may even never complete. So if we
a storage with two mirrors with a fast and a slow leg (I reckon that's
the scenario this patch was made for), and if we're out of luck and the
slow leg is probed first, we may end up in a situation where the fast
leg, which may be fully up and healthy, is never set up (or with big
delay) because multipathd keeps waiting for the slow leg to respond.

Similar delays can occur whenever pathinfo(..., DI_PRIO) is called.
Unless I'm overlooking something essential here, that's a really
dangerous thing to do. I believe that before activating this prio
checker for  everyone, we need find a way to avoid this scenario. 

By using aio with a reasonable timeout for the latency check rather
then sync IO, we could at least set an upper limit for the time
get_prio takes. That would be a first step. But I don't think that
would be sufficient. 

What we'd really need is an asynchronous priority checker, similar to
the asynchronous path checker. The get_prio() call would return
immediately with some special return code indicating to the caller that
a priority check is running the background. A preliminary prio would be
set for the path in pathinfo(), and multipathd would re-check later (or
get some sort of event) when the priority check has actually been done.
An open question is what multipathd should do wrt path grouping if it
only has preliminary prio values, in particular with group_by_prio.

Putting Hannes and Ben on CC because I'd like to get their opinion,
too.

Regards
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] libmultipath: path latency: simplify getprio()
  2017-11-18  0:11 ` [PATCH 3/4] libmultipath: path latency: simplify getprio() Martin Wilck
  2017-11-19  2:52   ` Guan Junxiong
@ 2017-12-07  4:26   ` Guan Junxiong
  2017-12-07 15:56     ` Martin Wilck
  1 sibling, 1 reply; 14+ messages in thread
From: Guan Junxiong @ 2017-12-07  4:26 UTC (permalink / raw)
  To: Martin Wilck, Christophe Varoqui
  Cc: guanjunxiong, dm-devel, niuhaoxin, Shenhong (C)

Hi Martin,

I have tested this patch and found something needed to be correct. My comments inline.


On 2017/11/18 8:11, Martin Wilck wrote:
> The log standard deviation can be calculated much more simply
> by realizing
> 
>    sum_n (x_i - avg(x))^2 == sum_n x_i^2 - n * avg(x)^2
> 


> @@ -340,8 +323,14 @@ int getprio(struct path *pp, char *args, unsigned int timeout)
>  			  ".It is recommend to be set larger",
>  			  pp->dev, base_num);
>  
> +	standard_deviation = sqrt((sum_squares - lg_toldelay * lg_toldelay)
> +				  / (io_num -1));
>  
This assignment is wrong. It gets a "NAN" for standard_deviation.
It should be the following equation according to sum_n (x_i - avg(x))^2 == sum_n x_i^2 - n * avg(x)^2

standard_deviation = sqrt((sum_squares - io_num* lg_avglatency* lg_avglatency)/ (io_num -1));
                   = sqrt((sum_squares -  lg_toldelay* lg_avglatency)/ (io_num -1));


>  	rc = calcPrio(lg_avglatency, lg_maxavglatency, lg_minavglatency);
>  
> +	pp_pl_log(3, "%s: latency avg=%.2e uncertainty=%.1f prio=%d\n",
> +		  pp->dev, exp(lg_avglatency * lg_base),
> +		  exp(standard_deviation * lg_base), rc);
> +

I still have the doubt about the computation of the "uncertainty" item.
Because I have observed that the uncertainty is in the range of 1.3 ~ 1.6 whenever
the base_num varies from 1.1 to 10.

Do you mean the uncertainty as the "Error function" of (log) normal distribution?
Here is the definition of https://en.wikipedia.org/wiki/Error_function

I prefer to using  "Error function" that describes accumulated probability of  how  prio
locates in the (-inf, prio-1) and (prio+1, +inf).

Thanks
Guan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] libmultipath: path latency: simplify getprio()
  2017-12-07  4:26   ` Guan Junxiong
@ 2017-12-07 15:56     ` Martin Wilck
  0 siblings, 0 replies; 14+ messages in thread
From: Martin Wilck @ 2017-12-07 15:56 UTC (permalink / raw)
  To: Guan Junxiong, Christophe Varoqui; +Cc: dm-devel, niuhaoxin, Shenhong (C)

On Thu, 2017-12-07 at 12:26 +0800, Guan Junxiong wrote:
> Hi Martin,
> 
> I have tested this patch and found something needed to be correct. My
> comments inline.
> 
> 
> > @@ -340,8 +323,14 @@ int getprio(struct path *pp, char *args,
> > unsigned int timeout)
> >  			  ".It is recommend to be set larger",
> >  			  pp->dev, base_num);
> >  
> > +	standard_deviation = sqrt((sum_squares - lg_toldelay *
> > lg_toldelay)
> > +				  / (io_num -1));
> >  
> 
> This assignment is wrong. It gets a "NAN" for standard_deviation.
> It should be the following equation according to sum_n (x_i -
> avg(x))^2 == sum_n x_i^2 - n * avg(x)^2
> standard_deviation = sqrt((sum_squares - io_num* lg_avglatency*
> lg_avglatency)/ (io_num -1));
>                    = sqrt((sum_squares -  lg_toldelay*
> lg_avglatency)/ (io_num -1));

That's of course correct. I'll submit an update asap.

> I still have the doubt about the computation of the "uncertainty"
> item.
> Because I have observed that the uncertainty is in the range of 1.3 ~
> 1.6 whenever
> the base_num varies from 1.1 to 10.

> Do you mean the uncertainty as the "Error function" of (log) normal
> distribution?

> Here is the definition of
> https://en.wikipedia.org/wiki/Error_function
> 
> I prefer to using  "Error function" that describes accumulated
> probability of  how  prio
> locates in the (-inf, prio-1) and (prio+1, +inf).

There's a misunderstanding here.

My "uncertainty factor" describes the width of the distribution of the
*measured* latencies. It roughly represents the accuracy of the
measurement (indicating that 68%, or error_function (sqrt(1/2)), of the
measured values lie in the interval [avg/F, avg*F]). IOW, it tells you
how stable your latencies for a certain path are.

It has nothing to do with the artificial prio "bins" that the latency
prioritizer sets up. Therefore it *has to be independent* of base_num.
It just gives an indication how large base_num should be. 

I hope this makes it clear.

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/4] path latency prio fixes
  2017-11-20  9:11 ` [PATCH 0/4] path latency prio fixes Martin Wilck
@ 2017-12-08 20:49   ` Benjamin Marzinski
  0 siblings, 0 replies; 14+ messages in thread
From: Benjamin Marzinski @ 2017-12-08 20:49 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Guan Junxiong, dm-devel

On Mon, Nov 20, 2017 at 10:11:02AM +0100, Martin Wilck wrote:
> Hi Guan,
> 
> while staring at this code the other day, I realized another possible
> issue with your latency prioritizer. 
> 
> It will cause significant IO to every path of a map during multipath /
> multipathd startup. If any paths really have latencies as long as your
> patch considers (up to 100s), or worse if they don't respond at all,
> startup may be *massively* delayed or may even never complete. So if we
> a storage with two mirrors with a fast and a slow leg (I reckon that's
> the scenario this patch was made for), and if we're out of luck and the
> slow leg is probed first, we may end up in a situation where the fast
> leg, which may be fully up and healthy, is never set up (or with big
> delay) because multipathd keeps waiting for the slow leg to respond.
> 
> Similar delays can occur whenever pathinfo(..., DI_PRIO) is called.
> Unless I'm overlooking something essential here, that's a really
> dangerous thing to do. I believe that before activating this prio
> checker for  everyone, we need find a way to avoid this scenario. 
> 
> By using aio with a reasonable timeout for the latency check rather
> then sync IO, we could at least set an upper limit for the time
> get_prio takes. That would be a first step. But I don't think that
> would be sufficient. 
> 
> What we'd really need is an asynchronous priority checker, similar to
> the asynchronous path checker. The get_prio() call would return
> immediately with some special return code indicating to the caller that
> a priority check is running the background. A preliminary prio would be
> set for the path in pathinfo(), and multipathd would re-check later (or
> get some sort of event) when the priority check has actually been done.
> An open question is what multipathd should do wrt path grouping if it
> only has preliminary prio values, in particular with group_by_prio.

Yeah, you're right. Something like this is necessary. We could have the
prioritizers work like the checkers and have them include a context,
that the checkers can use to save data.  Then could make this work like
the directio checker, with its async calls.

> Putting Hannes and Ben on CC because I'd like to get their opinion,
> too.
> 
> Regards
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-12-08 20:49 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-18  0:11 [PATCH 0/4] path latency prio fixes Martin Wilck
2017-11-18  0:11 ` [PATCH 1/4] libmultipath: path latency: fix default base num Martin Wilck
2017-11-19  2:19   ` Guan Junxiong
2017-11-18  0:11 ` [PATCH 2/4] libmultipath: path latency: log threshold with p2 Martin Wilck
2017-11-19  2:23   ` Guan Junxiong
2017-11-18  0:11 ` [PATCH 3/4] libmultipath: path latency: simplify getprio() Martin Wilck
2017-11-19  2:52   ` Guan Junxiong
2017-11-20  8:46     ` Martin Wilck
2017-12-07  4:26   ` Guan Junxiong
2017-12-07 15:56     ` Martin Wilck
2017-11-18  0:11 ` [PATCH 4/4] libmultipath: path latency: remove warnings Martin Wilck
2017-11-19  3:00   ` Guan Junxiong
2017-11-20  9:11 ` [PATCH 0/4] path latency prio fixes Martin Wilck
2017-12-08 20:49   ` Benjamin Marzinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.