* [lustre-devel] [PATCH 0/6] Rate-limiting Quality of Service
@ 2017-03-21 19:43 Yan Li
  2017-03-21 19:43 ` [lustre-devel] [PATCH 1/6] Autoconf option for rate-limiting Quality of Service (RLQOS) Yan Li
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Yan Li @ 2017-03-21 19:43 UTC (permalink / raw)
  To: lustre-devel

This patch series enables rate-limiting quality of service (RLQOS)
support as described in the ASCAR paper [1]. The purpose of RLQOS is to
provide a client-side rate limiting mechanism that controls
max_rpcs_in_flight and the minimum gap between brw RPC requests (called
tau in the code and the paper). It is very different from the existing
LOV QOS in Lustre. I presented this work at LUG'16 and am sorry for the
belated code release. This is my first code patch to the Lustre mailing
list, so I'm sure a lot of things can be improved. Please kindly let me
know.

The main idea is to provide a rule-based rate limiting mechanism on
Lustre clients that can be used to ease congestion and improve
performance during peak hours. Each rule designates how
max_rpcs_in_flight and tau should be changed based on three metrics:
the EWMA of the gaps between ACKs, the EWMA of the gaps between send
times, and the ratio of the current RTT to the smallest observed RTT.
The rules are set through a procfs handle. In the research paper [1], a
machine learning-based heuristic method is used to generate traffic
control rules that improve performance. The rules can also be
hand-crafted based on benchmark results; a sketch of how a rule is
applied follows below.
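
To make the rule semantics concrete, here is a minimal, self-contained
userspace sketch of how one rule maps the three metrics to a new
max_rpcs_in_flight and tau. The struct fields and the m100/b100 integer
arithmetic mirror struct qos_rule_t and qos_adjust() in patches 3/6 and
6/6 below; the standalone program, the helper names, and the sample
values are illustrative only and not part of the patch set.

    #include <stdio.h>

    /* Simplified mirror of struct qos_rule_t (patch 3/6, rlqos.h); the
     * values used below are made up for illustration. */
    struct rule {
            unsigned long long ack_ewma_lower, ack_ewma_upper;    /* usec */
            unsigned long long send_ewma_lower, send_ewma_upper;  /* usec */
            unsigned int rtt_ratio100_lower, rtt_ratio100_upper;  /* rtt/min_rtt*100 */
            int m100;             /* multiplicative term * 100 */
            int b100;             /* additive term * 100 */
            unsigned int tau;     /* min usec between brw RPCs */
    };

    /* Apply the first matching rule to mrif100 (max_rpcs_in_flight * 100)
     * and return the new tau.  This follows the integer math of
     * qos_adjust() in patch 6/6, minus the locking, clamping, and rule
     * usage statistics. */
    static unsigned int apply_rules(const struct rule *rules, int nr,
                                    unsigned long long ack_ewma,
                                    unsigned long long send_ewma,
                                    unsigned int rtt_ratio100, int *mrif100)
    {
            int i;

            for (i = 0; i < nr; i++) {
                    const struct rule *r = &rules[i];

                    if (ack_ewma >= r->ack_ewma_lower &&
                        ack_ewma < r->ack_ewma_upper &&
                        send_ewma >= r->send_ewma_lower &&
                        send_ewma < r->send_ewma_upper &&
                        rtt_ratio100 >= r->rtt_ratio100_lower &&
                        rtt_ratio100 < r->rtt_ratio100_upper) {
                            if (r->m100 >= 0)  /* negative m100 disables scaling */
                                    *mrif100 = *mrif100 * r->m100 / 100;
                            *mrif100 += r->b100;
                            return r->tau;
                    }
            }
            return 0;   /* no match: leave the current settings alone */
    }

    int main(void)
    {
            /* One made-up rule: under mild congestion, shrink the RPC window
             * by 10% and enforce a 500 usec gap between brw RPCs. */
            struct rule rules[] = {
                    { 0, 2000, 0, 2000, 100, 300, 90, 0, 500 },
            };
            int mrif100 = 800;  /* max_rpcs_in_flight = 8 */
            unsigned int tau = apply_rules(rules, 1, 1500, 1200, 150, &mrif100);

            printf("max_rpcs_in_flight = %d, tau = %u usec\n", mrif100 / 100, tau);
            return 0;
    }

In the real patches the result is kept in qos->max_rpc_in_flight100
under qos->lock, clamped to [1, OSC_MAX_RIF_MAX], and only applied once
per min_gap_between_updating_mrif microseconds.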

I probably should write more details here, but the email would become
rather long. The paper [1] has a detailed introduction to the idea and
the implementation. I also believe there should be better documentation
for this feature; I'm not sure whether I should create a wiki page for
it or provide documentation within the code base.

This feature is still under development, and the latest code can be
found at https://github.com/mlogic/ascar-lustre-2.9-client .

This research was supported in part by the National Science Foundation
under awards IIP-1266400, CCF-1219163, CNS-1018928, CNS-1528179, by
the Department of Energy under award DE-FC02-10ER26017/DESC0005417, by
a Symantec Graduate Fellowship, by a grant from Intel Corporation, and
by industrial members of the Center for Research in Storage Systems at
UC Santa Cruz.

[1] http://storageconference.us/2015/Papers/14.Li.pdf

Yan Li (6):
  Autoconf option for rate-limiting Quality of Service (RLQOS)
  Added fields to message for RLQOS support
  RLQOS main data structure
  lprocfs interfaces for showing, parsing, and controlling rules
  Throttle the outgoing requests according to tau
  Adjust max_rpcs_in_flight according to metrics

 lustre/autoconf/lustre-core.m4     |  17 ++++
 lustre/include/Makefile.am         |   3 +-
 lustre/include/lustre/lustre_idl.h |   4 +
 lustre/include/obd.h               |   8 ++
 lustre/include/rlqos.h             | 136 ++++++++++++++++++++++++++++++
 lustre/obdclass/genops.c           |  25 ++++++
 lustre/obdclass/lprocfs_status.c   |  32 +++++++
 lustre/osc/Makefile.in             |   2 +-
 lustre/osc/lproc_osc.c             | 157 ++++++++++++++++++++++++++++++-----
 lustre/osc/osc_cache.c             |   3 +
 lustre/osc/osc_internal.h          |  66 +++++++++++++++
 lustre/osc/osc_request.c           | 165 +++++++++++++++++++++++++++++++++++++
 lustre/osc/qos_rules.c             | 125 ++++++++++++++++++++++++++++
 lustre/ptlrpc/pack_generic.c       |   5 ++
 lustre/ptlrpc/wiretest.c           |   2 +
 lustre/utils/wiretest.c            |   2 +
 16 files changed, 730 insertions(+), 22 deletions(-)
 create mode 100644 lustre/include/rlqos.h
 create mode 100644 lustre/osc/qos_rules.c

-- 
1.8.3.1


* [lustre-devel] [PATCH 1/6] Autoconf option for rate-limiting Quality of Service (RLQOS)
  2017-03-21 19:43 [lustre-devel] [PATCH 0/6] Rate-limiting Quality of Service Yan Li
@ 2017-03-21 19:43 ` Yan Li
  2017-03-21 20:09   ` Ben Evans
  2017-03-24 22:22   ` Dilger, Andreas
  2017-03-21 19:43 ` [lustre-devel] [PATCH 2/6] Added fields to message for RLQOS support Yan Li
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Yan Li @ 2017-03-21 19:43 UTC (permalink / raw)
  To: lustre-devel

This patch enables rate-limiting quality of service (RLQOS) support as
described in the ASCAR paper [1]. The purpose of RLQOS is to provide a
client-side rate limiting mechanism that controls max_rpcs_in_flight
and the minimum gap between brw RPC requests (called tau in the code
and the paper).

RLQOS can be enabled by passing --enable-rlqos to configure. It can
then be controlled through procfs tunables on each OSC, as sketched
below.

[1] http://storageconference.us/2015/Papers/14.Li.pdf
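
As an illustration, the snippet below writes a two-rule rule set to the
qos_rules procfs file of one OSC in the text format accepted by
parse_qos_rules() (patch 4/6): a "rule_count,mrif_updates_per_sec"
header line, then one rule per line with the nine fields
ack_ewma_lower, ack_ewma_upper, send_ewma_lower, send_ewma_upper,
rtt_ratio100_lower, rtt_ratio100_upper, m100, b100, tau. The procfs
path and the rule values are made-up examples; the real OSC device name
depends on the file system and target.

    #include <stdio.h>

    int main(void)
    {
            /* Example path only; substitute the real OSC device name. */
            const char *path =
                    "/proc/fs/lustre/osc/lustre-OST0000-osc-ffff88000a1b2c00/qos_rules";
            FILE *f = fopen(path, "w");

            if (f == NULL) {
                    perror("fopen");
                    return 1;
            }
            /* 2 rules; allow max_rpcs_in_flight updates up to 10 times/sec */
            fprintf(f, "2,10\n");
            /* light load: grow the RPC window by 5%, no extra gap */
            fprintf(f, "0,1000,0,1000,100,200,105,0,0\n");
            /* congested: shrink the window by 20%, space brw RPCs 1000 usec apart */
            fprintf(f, "1000,999999999,1000,999999999,200,1000000,80,0,1000\n");
            fclose(f);
            return 0;
    }

Writing "0" clears the rule set, and reading the file back shows the
rules plus the per-rule usage statistics (used_times and the metric
averages) printed by osc_qos_rules_seq_show().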

Signed-off-by: Yan Li <yanli@ascar.io>
---
 lustre/autoconf/lustre-core.m4 | 17 +++++++++++++++++
 lustre/include/Makefile.am     |  3 ++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/lustre/autoconf/lustre-core.m4 b/lustre/autoconf/lustre-core.m4
index 0578325..7f1828e 100644
--- a/lustre/autoconf/lustre-core.m4
+++ b/lustre/autoconf/lustre-core.m4
@@ -369,6 +369,22 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([
 AC_MSG_RESULT([$enable_ssk])
 ]) # LC_OPENSSL_SSK
 
+#
+# LC_CONFIG_RLQOS
+#
+# Rate-limiting Quality of Service support
+#
+AC_DEFUN([LC_CONFIG_RLQOS], [
+AC_MSG_CHECKING([whether to enable rate-limiting quality of service support])
+AC_ARG_ENABLE([rlqos],
+	AC_HELP_STRING([--enable-rlqos],
+		[enable rate-limiting quality of service support]),
+	[], [enable_rlqos="no"])
+AC_MSG_RESULT([$enable_rlqos])
+AS_IF([test "x$enable_rlqos" != xno],
+	[AC_DEFINE(ENABLE_RLQOS, 1, [enable rate-limiting quality of service support])])
+]) # LC_CONFIG_RLQOS
+
 # LC_INODE_PERMISION_2ARGS
 #
 # up to v2.6.27 had a 3 arg version (inode, mask, nameidata)
@@ -2241,6 +2257,7 @@ AC_DEFUN([LC_PROG_LINUX], [
 	LC_GLIBC_SUPPORT_FHANDLES
 	LC_CONFIG_GSS
 	LC_OPENSSL_SSK
+	LC_CONFIG_RLQOS
 
 	# 2.6.32
 	LC_BLK_QUEUE_MAX_SEGMENTS
diff --git a/lustre/include/Makefile.am b/lustre/include/Makefile.am
index 9074ca4..6d72b6e 100644
--- a/lustre/include/Makefile.am
+++ b/lustre/include/Makefile.am
@@ -98,4 +98,5 @@ EXTRA_DIST = \
 	upcall_cache.h \
 	lustre_kernelcomm.h \
 	seq_range.h \
-	uapi_kernelcomm.h
+	uapi_kernelcomm.h \
+	rlqos.h
-- 
1.8.3.1


* [lustre-devel] [PATCH 2/6] Added fields to message for RLQOS support
  2017-03-21 19:43 [lustre-devel] [PATCH 0/6] Rate-limiting Quality of Service Yan Li
  2017-03-21 19:43 ` [lustre-devel] [PATCH 1/6] Autoconf option for rate-limiting Quality of Service (RLQOS) Yan Li
@ 2017-03-21 19:43 ` Yan Li
  2017-03-23 14:54   ` Alexey Lyashkov
  2017-03-21 19:43 ` [lustre-devel] [PATCH 3/6] RLQOS main data structure Yan Li
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Yan Li @ 2017-03-21 19:43 UTC (permalink / raw)
  To: lustre-devel

Modified the request message to embed sent_time, which is echoed back
by the server and used to calculate the exponentially weighted moving
average of the gaps between sent_time values seen in reply messages.
This average is one of the metrics used for rate-limiting quality of
service; a sketch of the integer EWMA update is given below.
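
Because the kernel cannot do floating point division, the client tracks
the average as ea = ewma * alpha_inv, with alpha_inv = 8
(EWMA_ALPHA_INV in rlqos.h, patch 3/6). The userspace snippet below is
a minimal sketch of that integer-only update, matching the arithmetic
of time_ewma_add_extlock() in patch 6/6; the struct, function names,
and sample values here are illustrative only.

    #include <stdio.h>

    /*
     * Integer EWMA in the style of struct time_ewma: instead of the average
     * itself we keep ea = ewma * alpha_inv, so that
     *   ewma_new = ewma_old * (1 - 1/alpha_inv) + sample * (1/alpha_inv)
     * becomes the division-light update
     *   ea_new = (ea_old / alpha_inv) * (alpha_inv - 1) + sample.
     * Reading the average back is ea / alpha_inv (see qos_get_ewma_usec()).
     */
    struct int_ewma {
            unsigned long long alpha_inv;  /* 8, like EWMA_ALPHA_INV */
            unsigned long long ea;         /* ewma * alpha_inv */
    };

    static void ewma_add(struct int_ewma *e, unsigned long long sample_usec)
    {
            e->ea = e->ea / e->alpha_inv * (e->alpha_inv - 1) + sample_usec;
    }

    int main(void)
    {
            /* Made-up gaps (usec) between consecutive reply arrivals */
            unsigned long long gaps[] = { 800, 1200, 900, 5000, 1000 };
            struct int_ewma ack = { 8, 0 };
            unsigned int i;

            for (i = 0; i < sizeof(gaps) / sizeof(gaps[0]); i++) {
                    ewma_add(&ack, gaps[i]);
                    printf("after %llu usec gap: ack_ewma = %llu usec\n",
                           gaps[i], ack.ea / ack.alpha_inv);
            }
            return 0;
    }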

Signed-off-by: Yan Li <yanli@ascar.io>
---
 lustre/include/lustre/lustre_idl.h | 4 ++++
 lustre/ptlrpc/pack_generic.c       | 5 +++++
 lustre/ptlrpc/wiretest.c           | 2 ++
 lustre/utils/wiretest.c            | 2 ++
 4 files changed, 13 insertions(+)

diff --git a/lustre/include/lustre/lustre_idl.h b/lustre/include/lustre/lustre_idl.h
index bf23a47..7a200d1 100644
--- a/lustre/include/lustre/lustre_idl.h
+++ b/lustre/include/lustre/lustre_idl.h
@@ -3336,8 +3336,12 @@ struct obdo {
 						 * each stripe.
 						 * brw: grant space consumed on
 						 * the client for the write */
+#ifdef ENABLE_RLQOS
+	struct timeval		o_sent_time;	/* timeval is 64x2 bits on Linux */
+#else
 	__u64			o_padding_4;
 	__u64			o_padding_5;
+#endif
 	__u64			o_padding_6;
 };
 
diff --git a/lustre/ptlrpc/pack_generic.c b/lustre/ptlrpc/pack_generic.c
index 8df8ea8..d0bc87a 100644
--- a/lustre/ptlrpc/pack_generic.c
+++ b/lustre/ptlrpc/pack_generic.c
@@ -1722,8 +1722,13 @@ void lustre_swab_obdo (struct obdo  *o)
         __swab32s (&o->o_uid_h);
         __swab32s (&o->o_gid_h);
         __swab64s (&o->o_data_version);
+#ifdef ENABLE_RLQOS
+        __swab64s ((__u64*)&o->o_sent_time.tv_sec);
+        __swab64s ((__u64*)&o->o_sent_time.tv_usec);
+#else
         CLASSERT(offsetof(typeof(*o), o_padding_4) != 0);
         CLASSERT(offsetof(typeof(*o), o_padding_5) != 0);
+#endif
         CLASSERT(offsetof(typeof(*o), o_padding_6) != 0);
 
 }
diff --git a/lustre/ptlrpc/wiretest.c b/lustre/ptlrpc/wiretest.c
index 070ef91..0c909a6 100644
--- a/lustre/ptlrpc/wiretest.c
+++ b/lustre/ptlrpc/wiretest.c
@@ -1314,6 +1314,7 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)offsetof(struct obdo, o_data_version));
 	LASSERTF((int)sizeof(((struct obdo *)0)->o_data_version) == 8, "found %lld\n",
 		 (long long)(int)sizeof(((struct obdo *)0)->o_data_version));
+#ifndef ENABLE_RLQOS
 	LASSERTF((int)offsetof(struct obdo, o_padding_4) == 184, "found %lld\n",
 		 (long long)(int)offsetof(struct obdo, o_padding_4));
 	LASSERTF((int)sizeof(((struct obdo *)0)->o_padding_4) == 8, "found %lld\n",
@@ -1322,6 +1323,7 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)offsetof(struct obdo, o_padding_5));
 	LASSERTF((int)sizeof(((struct obdo *)0)->o_padding_5) == 8, "found %lld\n",
 		 (long long)(int)sizeof(((struct obdo *)0)->o_padding_5));
+#endif
 	LASSERTF((int)offsetof(struct obdo, o_padding_6) == 200, "found %lld\n",
 		 (long long)(int)offsetof(struct obdo, o_padding_6));
 	LASSERTF((int)sizeof(((struct obdo *)0)->o_padding_6) == 8, "found %lld\n",
diff --git a/lustre/utils/wiretest.c b/lustre/utils/wiretest.c
index 233d7d8..47fbbf0 100644
--- a/lustre/utils/wiretest.c
+++ b/lustre/utils/wiretest.c
@@ -1329,6 +1329,7 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)offsetof(struct obdo, o_data_version));
 	LASSERTF((int)sizeof(((struct obdo *)0)->o_data_version) == 8, "found %lld\n",
 		 (long long)(int)sizeof(((struct obdo *)0)->o_data_version));
+#ifndef ENABLE_RLQOS
 	LASSERTF((int)offsetof(struct obdo, o_padding_4) == 184, "found %lld\n",
 		 (long long)(int)offsetof(struct obdo, o_padding_4));
 	LASSERTF((int)sizeof(((struct obdo *)0)->o_padding_4) == 8, "found %lld\n",
@@ -1337,6 +1338,7 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)offsetof(struct obdo, o_padding_5));
 	LASSERTF((int)sizeof(((struct obdo *)0)->o_padding_5) == 8, "found %lld\n",
 		 (long long)(int)sizeof(((struct obdo *)0)->o_padding_5));
+#endif
 	LASSERTF((int)offsetof(struct obdo, o_padding_6) == 200, "found %lld\n",
 		 (long long)(int)offsetof(struct obdo, o_padding_6));
 	LASSERTF((int)sizeof(((struct obdo *)0)->o_padding_6) == 8, "found %lld\n",
-- 
1.8.3.1


* [lustre-devel] [PATCH 3/6] RLQOS main data structure
  2017-03-21 19:43 [lustre-devel] [PATCH 0/6] Rate-limiting Quality of Service Yan Li
  2017-03-21 19:43 ` [lustre-devel] [PATCH 1/6] Autoconf option for rate-limiting Quality of Service (RLQOS) Yan Li
  2017-03-21 19:43 ` [lustre-devel] [PATCH 2/6] Added fields to message for RLQOS support Yan Li
@ 2017-03-21 19:43 ` Yan Li
  2017-03-21 19:43 ` [lustre-devel] [PATCH 4/6] lprocfs interfaces for showing, parsing, and controlling rules Yan Li
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Yan Li @ 2017-03-21 19:43 UTC (permalink / raw)
  To: lustre-devel

Each client_obd maintains a qos data structure.

Signed-off-by: Yan Li <yanli@ascar.io>
---
 lustre/include/obd.h     |   8 +++
 lustre/include/rlqos.h   | 136 +++++++++++++++++++++++++++++++++++++++++++++++
 lustre/obdclass/genops.c |  25 +++++++++
 3 files changed, 169 insertions(+)
 create mode 100644 lustre/include/rlqos.h

diff --git a/lustre/include/obd.h b/lustre/include/obd.h
index b4ee379..726493c 100644
--- a/lustre/include/obd.h
+++ b/lustre/include/obd.h
@@ -50,6 +50,9 @@
 #include <lustre_intent.h>
 #include <lvfs.h>
 #include <lustre_quota.h>
+#ifdef ENABLE_RLQOS
+# include "rlqos.h"
+#endif
 
 #define MAX_OBD_DEVICES 8192
 
@@ -331,6 +334,11 @@ struct client_obd {
 	void			*cl_lru_work;
 	/* hash tables for osc_quota_info */
 	struct cfs_hash		*cl_quota_hash[LL_MAXQUOTAS];
+
+#ifdef ENABLE_RLQOS
+	/* rate-limiting quality of service data */
+	struct qos_data_t	qos;
+#endif
 };
 #define obd2cli_tgt(obd) ((char *)(obd)->u.cli.cl_target_uuid.uuid)
 
diff --git a/lustre/include/rlqos.h b/lustre/include/rlqos.h
new file mode 100644
index 0000000..d8e012b
--- /dev/null
+++ b/lustre/include/rlqos.h
@@ -0,0 +1,136 @@
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.sun.com/software/products/lustre/docs/GPLv2.pdf
+ *
+ * Please contact Storage Systems Research Center, Computer Science Department,
+ * University of California, Santa Cruz (www.ssrc.ucsc.edu) if you need
+ * additional information or have any questions.
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2013-2017, University of California, Santa Cruz, CA, USA.
+ * All rights reserved.
+ */
+/*
+ * This file is part of Lustre, http://www.lustre.org/
+ * Lustre is a trademark of Sun Microsystems, Inc.
+ *
+ * lustre/include/rlqos.h
+ */
+
+#ifndef _RLQOS_H
+#define _RLQOS_H
+
+/* We work with kernel only */
+#ifdef __KERNEL__
+# include <linux/types.h>
+# include <linux/time.h>
+# include <asm/param.h>
+# include <libcfs/libcfs.h>
+# include <linux/delay.h>
+#else /* __KERNEL__ */
+# define HZ 100
+# define ONE_MILLION 1000000
+# include <liblustre.h>
+#endif
+
+#define EWMA_ALPHA_INV (8)
+
+/**
+ * For tracking the exponentially-weighted moving average of a timeval. Note
+ * that we can't do floating point division in the kernel, so we actually track
+ * ea = ewma * alpha_inv. Divide ea by alpha_inv to get the real ewma.
+ */
+struct time_ewma {
+	__u64          alpha_inv;
+	__u64          ea;
+	struct timeval last_time;
+};
+/* We can't do floating point division, so we track
+ * ea = ewma * alpha_inv rather than ewma itself
+ */
+
+struct qos_rule_t {
+	__u64 ack_ewma_lower;
+	__u64 ack_ewma_upper;
+	__u64 send_ewma_lower;
+	__u64 send_ewma_upper;
+	unsigned int rtt_ratio100_lower;
+	unsigned int rtt_ratio100_upper;
+	int m100;
+	int b100;
+	unsigned int tau;
+	int used_times;
+
+	__u64 ack_ewma_avg;
+	__u64 send_ewma_avg;
+	unsigned int rtt_ratio100_avg;
+};
+
+struct qos_data_t {
+	spinlock_t       lock;
+        struct time_ewma ack_ewma;
+        struct time_ewma sent_ewma;
+        int              rtt_ratio100;
+        long             smallest_rtt;
+        int              max_rpc_in_flight100;
+        struct timeval   last_mrif_update_time;
+        int              min_gap_between_updating_mrif;
+        int              rule_no;
+        /* Following fields are for calculating I/O bandwidth,
+         * 0 for read, 1 for write */
+        long             last_req_sec[2];       /* second of last request we received */
+        __u64            tp_last_sec[2];        /* throughput of last sec */
+        __u64            sum_bytes_this_sec[2]; /* cumulative bytes read within this sec */
+        /* For throttling support */
+        unsigned int     min_usec_between_rpcs;
+        struct timeval   last_rpc_time;
+        struct qos_rule_t *rules;
+};
+
+static inline __u64 qos_get_ewma_usec(const struct time_ewma *ewma) {
+	return ewma->ea / ewma->alpha_inv;
+}
+
+int parse_qos_rules(const char *buf, struct qos_data_t *qos);
+
+/* Lock of qos must be held. op == 0 for read, 1 for write */
+static inline void calc_throughput(struct qos_data_t *qos, int op, int bytes_transferred)
+{
+	struct timeval now;
+
+	if (op != 0 && op != 1)
+		return;
+
+	do_gettimeofday(&now);
+	if (likely(now.tv_sec == qos->last_req_sec[op])) {
+		qos->sum_bytes_this_sec[op] += bytes_transferred;
+	} else if (likely(now.tv_sec == qos->last_req_sec[op] + 1)) {
+		qos->tp_last_sec[op] = qos->sum_bytes_this_sec[op];
+		qos->last_req_sec[op] = now.tv_sec;
+		qos->sum_bytes_this_sec[op] = bytes_transferred;
+	} else if (likely(now.tv_sec > qos->last_req_sec[op] + 1)) {
+		qos->tp_last_sec[op] = 0;
+		qos->last_req_sec[op] = now.tv_sec;
+		qos->sum_bytes_this_sec[op] = bytes_transferred;
+	}
+	/* Ignore cases when now.tv_sec < qos->last_req_sec */
+}
+
+#endif /* _RLQOS_H */
diff --git a/lustre/obdclass/genops.c b/lustre/obdclass/genops.c
index a48f887..417c612 100644
--- a/lustre/obdclass/genops.c
+++ b/lustre/obdclass/genops.c
@@ -284,6 +284,28 @@ int class_unregister_type(const char *name)
 } /* class_unregister_type */
 EXPORT_SYMBOL(class_unregister_type);
 
+#ifdef ENABLE_RLQOS
+static void init_time_ewma(struct time_ewma *ewma)
+{
+	ewma->alpha_inv = 8;
+	ewma->ea = 0;
+	ewma->last_time.tv_sec = 0;
+	ewma->last_time.tv_usec = 0;
+}
+
+static void init_qos(struct client_obd *cli)
+{
+	struct qos_data_t *qos = &cli->qos;
+
+	init_time_ewma(&qos->ack_ewma);
+	init_time_ewma(&qos->sent_ewma);
+
+	spin_lock(&cli->cl_loi_list_lock);
+	qos->max_rpc_in_flight100 = cli->cl_max_rpcs_in_flight * 100;
+	spin_unlock(&cli->cl_loi_list_lock);
+}
+#endif
+
 /**
  * Create a new obd device.
  *
@@ -349,6 +371,9 @@ struct obd_device *class_newdev(const char *type_name, const char *name)
                         result->obd_type = type;
                         strncpy(result->obd_name, name,
                                 sizeof(result->obd_name) - 1);
+#ifdef ENABLE_RLQOS
+                        init_qos(&result->u.cli);
+#endif
                         obd_devs[i] = result;
                 }
         }
-- 
1.8.3.1


* [lustre-devel] [PATCH 4/6] lprocfs interfaces for showing, parsing, and controlling rules
  2017-03-21 19:43 [lustre-devel] [PATCH 0/6] Rate-limiting Quality of Service Yan Li
                   ` (2 preceding siblings ...)
  2017-03-21 19:43 ` [lustre-devel] [PATCH 3/6] RLQOS main data structure Yan Li
@ 2017-03-21 19:43 ` Yan Li
  2017-03-21 19:43 ` [lustre-devel] [PATCH 5/6] Throttle the outgoing requests according to tau Yan Li
  2017-03-21 19:43 ` [lustre-devel] [PATCH 6/6] Adjust max_rpcs_in_flight according to metrics Yan Li
  5 siblings, 0 replies; 17+ messages in thread
From: Yan Li @ 2017-03-21 19:43 UTC (permalink / raw)
  To: lustre-devel

Signed-off-by: Yan Li <yanli@ascar.io>
---
 lustre/obdclass/lprocfs_status.c |  32 ++++++++
 lustre/osc/Makefile.in           |   2 +-
 lustre/osc/lproc_osc.c           | 157 ++++++++++++++++++++++++++++++++++-----
 lustre/osc/qos_rules.c           | 125 +++++++++++++++++++++++++++++++
 4 files changed, 295 insertions(+), 21 deletions(-)
 create mode 100644 lustre/osc/qos_rules.c

diff --git a/lustre/obdclass/lprocfs_status.c b/lustre/obdclass/lprocfs_status.c
index 08db676..841a3da 100644
--- a/lustre/obdclass/lprocfs_status.c
+++ b/lustre/obdclass/lprocfs_status.c
@@ -814,6 +814,14 @@ int lprocfs_import_seq_show(struct seq_file *m, void *data)
 	int                             j;
 	int                             k;
 	int                             rw      = 0;
+#ifdef ENABLE_RLQOS
+	struct qos_data_t		*qos;
+	__u64				ack_ewma;
+	__u64				sent_ewma;
+	int				rtt_ratio100;
+	__u64				read_tp;
+	__u64				write_tp;
+#endif
 
 	LASSERT(obd != NULL);
 	LPROCFS_CLIMP_CHECK(obd);
@@ -884,6 +892,26 @@ int lprocfs_import_seq_show(struct seq_file *m, void *data)
 		   atomic_read(&imp->imp_unregistering),
 		   atomic_read(&imp->imp_timeouts),
 		   ret.lc_sum, header->lc_units);
+#ifdef ENABLE_RLQOS
+	qos = &obd->u.cli.qos;
+	spin_lock(&qos->lock);
+	ack_ewma  = qos_get_ewma_usec(&qos->ack_ewma);
+	sent_ewma = qos_get_ewma_usec(&qos->sent_ewma);
+	rtt_ratio100 = qos->rtt_ratio100;
+
+	/* Refresh throughput. If a long time has passed since we
+           received last req, throughput data is stale. */
+	calc_throughput(qos, OST_READ-OST_READ, 0);
+	calc_throughput(qos, OST_WRITE-OST_READ, 0);
+
+	read_tp   = qos->tp_last_sec[0];
+	write_tp  = qos->tp_last_sec[1];
+	spin_unlock(&qos->lock);
+	seq_printf(m, "       ack_ewma: %llu usec\n"
+		   "       sent_ewma: %llu usec\n"
+		   "       rtt_ratio100: %d\n",
+		   ack_ewma, sent_ewma, rtt_ratio100);
+#endif
 
 	k = 0;
 	for(j = 0; j < IMP_AT_MAX_PORTALS; j++) {
@@ -938,6 +966,10 @@ int lprocfs_import_seq_show(struct seq_file *m, void *data)
 					   k / j, (100 * k / j) % 100);
 		}
 	}
+#ifdef ENABLE_RLQOS
+	seq_printf(m, "    read_throughput: %llu\n", read_tp);
+	seq_printf(m, "    write_throughput: %llu\n", write_tp);
+#endif
 
 out_climp:
 	LPROCFS_CLIMP_EXIT(obd);
diff --git a/lustre/osc/Makefile.in b/lustre/osc/Makefile.in
index b1128bc..d6edab2 100644
--- a/lustre/osc/Makefile.in
+++ b/lustre/osc/Makefile.in
@@ -1,5 +1,5 @@
 MODULES := osc
-osc-objs := osc_request.o lproc_osc.o osc_dev.o osc_object.o osc_page.o osc_lock.o osc_io.o osc_quota.o osc_cache.o
+osc-objs := osc_request.o lproc_osc.o osc_dev.o osc_object.o osc_page.o osc_lock.o osc_io.o osc_quota.o osc_cache.o qos_rules.o
 
 EXTRA_DIST = $(osc-objs:%.o=%.c) osc_internal.h osc_cl_internal.h
 
diff --git a/lustre/osc/lproc_osc.c b/lustre/osc/lproc_osc.c
index de5a29c..653afc4 100644
--- a/lustre/osc/lproc_osc.c
+++ b/lustre/osc/lproc_osc.c
@@ -1,3 +1,4 @@
+
 /*
  * GPL HEADER START
  *
@@ -38,6 +39,9 @@
 #include <lprocfs_status.h>
 #include <linux/seq_file.h>
 #include "osc_internal.h"
+#ifdef ENABLE_RLQOS
+# include "../include/rlqos.h"
+#endif
 
 #ifdef CONFIG_PROC_FS
 static int osc_active_seq_show(struct seq_file *m, void *v)
@@ -92,8 +96,10 @@ static ssize_t osc_max_rpcs_in_flight_seq_write(struct file *file,
 {
 	struct obd_device *dev = ((struct seq_file *)file->private_data)->private;
 	struct client_obd *cli = &dev->u.cli;
+#ifdef ENABLE_RLQOS
+	struct qos_data_t *qos = &cli->qos;
+#endif
 	int rc;
-	int adding, added, req_count;
 	__s64 val;
 
 	rc = lprocfs_str_to_s64(buffer, count, &val);
@@ -103,31 +109,57 @@ static ssize_t osc_max_rpcs_in_flight_seq_write(struct file *file,
 		return -ERANGE;
 
 	LPROCFS_CLIMP_CHECK(dev);
+	set_max_rpcs_in_flight((int)val, cli);
+	LPROCFS_CLIMP_EXIT(dev);
 
-	adding = (int)val - cli->cl_max_rpcs_in_flight;
-	req_count = atomic_read(&osc_pool_req_count);
-	if (adding > 0 && req_count < osc_reqpool_maxreqcount) {
-		/*
-		 * There might be some race which will cause over-limit
-		 * allocation, but it is fine.
-		 */
-		if (req_count + adding > osc_reqpool_maxreqcount)
-			adding = osc_reqpool_maxreqcount - req_count;
-
-		added = osc_rq_pool->prp_populate(osc_rq_pool, adding);
-		atomic_add(added, &osc_pool_req_count);
-	}
-
-	spin_lock(&cli->cl_loi_list_lock);
-	cli->cl_max_rpcs_in_flight = val;
-	client_adjust_max_dirty(cli);
-	spin_unlock(&cli->cl_loi_list_lock);
+#ifdef ENABLE_RLQOS
+	/* Update the value tracked by QoS routines too */
+	spin_lock(&qos->lock);
+	qos->max_rpc_in_flight100 = val * 100;
+	spin_unlock(&qos->lock);
+#endif
 
-	LPROCFS_CLIMP_EXIT(dev);
 	return count;
 }
 LPROC_SEQ_FOPS(osc_max_rpcs_in_flight);
 
+#ifdef ENABLE_RLQOS
+static int osc_min_brw_rpc_gap_seq_show(struct seq_file *m, void *v)
+{
+	struct obd_device *dev = m->private;
+	struct client_obd *cli = &dev->u.cli;
+	struct qos_data_t *qos = &cli->qos;
+
+	spin_lock(&qos->lock);
+	seq_printf(m, "%u\n", qos->min_usec_between_rpcs);
+	spin_unlock(&qos->lock);
+	return 0;
+}
+
+static ssize_t osc_min_brw_rpc_gap_seq_write(struct file *file,
+					     const char __user *buffer,
+					     size_t count, loff_t *off)
+{
+	struct obd_device *dev = ((struct seq_file *)file->private_data)->private;
+	struct client_obd *cli = &dev->u.cli;
+	int rc;
+	__s64 val;
+	struct qos_data_t *qos = &cli->qos;
+
+	rc = lprocfs_str_to_s64(buffer, count, &val);
+	if (rc)
+		return rc;
+	if (val < 0)
+		return -ERANGE;
+
+	spin_lock(&qos->lock);
+	qos->min_usec_between_rpcs = val;
+	spin_unlock(&qos->lock);
+	return count;
+}
+LPROC_SEQ_FOPS(osc_min_brw_rpc_gap);
+#endif
+
 static int osc_max_dirty_mb_seq_show(struct seq_file *m, void *v)
 {
 	struct obd_device *dev = m->private;
@@ -599,6 +631,83 @@ static int osc_unstable_stats_seq_show(struct seq_file *m, void *v)
 }
 LPROC_SEQ_FOPS_RO(osc_unstable_stats);
 
+#ifdef ENABLE_RLQOS
+static int osc_qos_rules_seq_show(struct seq_file *m, void *data)
+{
+	struct obd_device *dev = m->private;
+	struct client_obd *cli = &dev->u.cli;
+	struct qos_data_t *qos = &cli->qos;
+	int i;
+	struct qos_rule_t *r;
+
+	spin_lock(&qos->lock);
+	if (0 == qos->rule_no || NULL == qos->rules || 0 == qos->min_gap_between_updating_mrif) {
+		seq_printf(m, "0\n");
+		/* Make sure the upcoming for loop doesn't run */
+		qos->rule_no = 0;
+	} else {
+		seq_printf(m, "%d,%d\n", qos->rule_no, 1000000 / qos->min_gap_between_updating_mrif);
+	}
+	for (i = 0; i < qos->rule_no; ++i) {
+		r = &qos->rules[i];
+		seq_printf(m, "%llu,%llu,%llu,%llu,%u,%u,%d,%d,%u,%d,%llu,%llu,%u\n",
+			      r->ack_ewma_lower,  r->ack_ewma_upper,
+			      r->send_ewma_lower, r->send_ewma_upper,
+			      r->rtt_ratio100_lower, r->rtt_ratio100_upper,
+			      r->m100, r->b100, r->tau,
+			      r->used_times,
+			      r->ack_ewma_avg, r->send_ewma_avg, r->rtt_ratio100_avg);
+	}
+	spin_unlock(&qos->lock);
+	return 0;
+}
+
+static ssize_t osc_qos_rules_seq_write(struct file *file,
+				       const char __user *buffer,
+				       size_t count, loff_t *off)
+{
+	struct obd_device *dev = ((struct seq_file *)file->private_data)->private;
+	struct client_obd *cli = &dev->u.cli;
+	struct qos_data_t *qos = &cli->qos;
+	int rc;
+	char *kernbuf = NULL;
+
+	OBD_ALLOC(kernbuf, count + 1);
+	if (NULL == kernbuf) {
+		return -ENOMEM;
+	}
+	if (copy_from_user(kernbuf, buffer, count)) {
+		rc = -EFAULT;
+		goto out_free_kernbuf;
+	}
+	/* Make sure the buf ends with a null so that sscanf won't overread */
+	kernbuf[count] = '\0';
+
+	spin_lock(&qos->lock);
+	/* parse_qos_rules() will free existing rules in qos before starting parsing */
+	rc = parse_qos_rules(kernbuf, qos);
+	if (0 == rc) {
+		/* return the number of chars processed on a success parsing */
+		rc = count;
+	}
+	qos->ack_ewma.ea = 0;
+	qos->ack_ewma.last_time.tv_sec = 0;
+	qos->ack_ewma.last_time.tv_usec = 0;
+	qos->sent_ewma.ea = 0;
+	qos->sent_ewma.last_time.tv_sec = 0;
+	qos->sent_ewma.last_time.tv_usec = 0;
+	qos->rtt_ratio100 = 0;
+	qos->smallest_rtt = 0;
+	qos->min_usec_between_rpcs = 0;
+	spin_unlock(&qos->lock);
+out_free_kernbuf:
+	OBD_FREE(kernbuf, count + 1);
+	return rc;
+
+}
+LPROC_SEQ_FOPS(osc_qos_rules);
+#endif
+
 LPROC_SEQ_FOPS_RO_TYPE(osc, uuid);
 LPROC_SEQ_FOPS_RO_TYPE(osc, connect_flags);
 LPROC_SEQ_FOPS_RO_TYPE(osc, blksize);
@@ -647,6 +756,10 @@ struct lprocfs_vars lprocfs_osc_obd_vars[] = {
 	  .fops	=	&osc_obd_max_pages_per_rpc_fops	},
 	{ .name	=	"max_rpcs_in_flight",
 	  .fops	=	&osc_max_rpcs_in_flight_fops	},
+#ifdef ENABLE_RLQOS
+	{ .name	=	"min_brw_rpc_gap",
+	  .fops	=	&osc_min_brw_rpc_gap_fops	},
+#endif
 	{ .name	=	"destroys_in_flight",
 	  .fops	=	&osc_destroys_in_flight_fops	},
 	{ .name	=	"max_dirty_mb",
@@ -683,6 +796,10 @@ struct lprocfs_vars lprocfs_osc_obd_vars[] = {
 	  .fops	=	&osc_pinger_recov_fops		},
 	{ .name	=	"unstable_stats",
 	  .fops	=	&osc_unstable_stats_fops	},
+#ifdef ENABLE_RLQOS
+	{ .name	=	"qos_rules",
+	  .fops	=	&osc_qos_rules_fops		},
+#endif
 	{ NULL }
 };
 
diff --git a/lustre/osc/qos_rules.c b/lustre/osc/qos_rules.c
new file mode 100644
index 0000000..8db24bd
--- /dev/null
+++ b/lustre/osc/qos_rules.c
@@ -0,0 +1,125 @@
+/*
+ * GPL HEADER START
+ *
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 only,
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.sun.com/software/products/lustre/docs/GPLv2.pdf
+ *
+ * Please contact Storage Systems Research Center, Computer Science Department,
+ * University of California, Santa Cruz (www.ssrc.ucsc.edu) if you need
+ * additional information or have any questions.
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2013-2017, University of California, Santa Cruz, CA, USA.
+ * All rights reserved.
+ */
+/*
+ * This file is part of Lustre, http://www.lustre.org/
+ * Lustre is a trademark of Sun Microsystems, Inc.
+ *
+ * qos_rules.c
+ */
+#ifndef __KERNEL__
+  #include <stdio.h>
+  #include "kernel-test-primitives.h"
+  #include <string.h>
+#endif
+#include "../include/rlqos.h"
+
+/* Parse qos_rules in buf and store the result to qos.
+ *
+ * Pre-condition:
+ *   1. qos must be initialized and qos->lock MUST be held before calling this function!
+ *   2. existing rules in qos->rules will be freed
+ *   3. buf must be NULL-terminated or sscanf may overread it.
+ *
+ * Return value:
+ *  0: success
+ *  other value: error code. On error, qos->rules is NULL and qos->rule_no is 0.
+ */
+int parse_qos_rules(const char *buf, struct qos_data_t *qos)
+{
+	int new_rule_no = 0;
+	int rules_per_sec = 0;
+	int rc;
+	int i;
+	const char *p = buf;
+	int n;
+	const size_t rule_size = sizeof(*(qos->rules));
+	struct qos_rule_t *r;
+
+	/* handle "0\n" and "0" */
+	if (strlen(p) <= 2 && '0' == *p) {
+		if (qos->rules) {
+			LIBCFS_FREE(qos->rules, qos->rule_no * rule_size);
+		}
+		qos->rule_no = 0;
+		qos->rules = NULL;
+		return 0;
+	}
+
+	rc = sscanf(p, "%d,%d\n%n", &new_rule_no, &rules_per_sec, &n);
+	if (2 != rc) {
+		CWARN("Input data error, can't read new_rule_no\n");
+		return -EINVAL;
+	}
+	if (0 == new_rule_no || 0 == rules_per_sec) {
+		if (qos->rules) {
+			LIBCFS_FREE(qos->rules, qos->rule_no * rule_size);
+		}
+		qos->rule_no = 0;
+		qos->rules = NULL;
+		return 0;
+	}
+	p += n;
+	if (qos->rules) {
+		LIBCFS_FREE(qos->rules, qos->rule_no * rule_size);
+	}
+	qos->rule_no = new_rule_no;
+	qos->min_gap_between_updating_mrif = 1000000 / rules_per_sec;
+	LIBCFS_ALLOC_ATOMIC(qos->rules, new_rule_no * rule_size);
+	if (!qos->rules) {
+		CWARN("Can't allocate enough mem for %d rules\n", new_rule_no);
+		return -ENOMEM;
+	}
+	memset(qos->rules, 0, new_rule_no * rule_size);
+
+	for (i = 0; i < new_rule_no; i++) {
+		r = &qos->rules[i];
+		/* Don't put \n at the end of sscanf format str
+		   because there may be other unknown fields there,
+		   which will be discarded later */
+		rc = sscanf(p, "%llu,%llu,%llu,%llu,%u,%u,%d,%d,%u%n",
+		                &r->ack_ewma_lower,  &r->ack_ewma_upper,
+		                &r->send_ewma_lower, &r->send_ewma_upper,
+		                &r->rtt_ratio100_lower, &r->rtt_ratio100_upper,
+		                &r->m100, &r->b100, &r->tau, &n);
+		p += n;
+		if (rc != 9) {
+			CWARN("QoS rule parsing error, rc = %d\n", rc);
+			LIBCFS_FREE(qos->rules, qos->rule_no * rule_size);
+			qos->rules = NULL;
+			qos->rule_no = 0;
+			return -EINVAL;
+		}
+		/* consume all other chars till \n or end-of-buffer */
+		while (*p != '\0' && *(p++) != '\n')
+			;
+	}
+
+	return 0;
+}
-- 
1.8.3.1


* [lustre-devel] [PATCH 5/6] Throttle the outgoing requests according to tau
  2017-03-21 19:43 [lustre-devel] [PATCH 0/6] Rate-limiting Quality of Service Yan Li
                   ` (3 preceding siblings ...)
  2017-03-21 19:43 ` [lustre-devel] [PATCH 4/6] lprocfs interfaces for showing, parsing, and controlling rules Yan Li
@ 2017-03-21 19:43 ` Yan Li
  2017-03-23 14:03   ` Alexey Lyashkov
  2017-03-21 19:43 ` [lustre-devel] [PATCH 6/6] Adjust max_rpcs_in_flight according to metrics Yan Li
  5 siblings, 1 reply; 17+ messages in thread
From: Yan Li @ 2017-03-21 19:43 UTC (permalink / raw)
  To: lustre-devel

Signed-off-by: Yan Li <yanli@ascar.io>
---
 lustre/osc/osc_cache.c    |  3 +++
 lustre/osc/osc_internal.h | 66 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/lustre/osc/osc_cache.c b/lustre/osc/osc_cache.c
index 236263c..2f9d4e1 100644
--- a/lustre/osc/osc_cache.c
+++ b/lustre/osc/osc_cache.c
@@ -2316,6 +2316,9 @@ static int osc_io_unplug0(const struct lu_env *env, struct client_obd *cli,
 	} else {
 		CDEBUG(D_CACHE, "Queue writeback work for client %p.\n", cli);
 		LASSERT(cli->cl_writeback_work != NULL);
+#ifdef ENABLE_RLQOS
+		qos_throttle(&cli->qos);
+#endif
 		rc = ptlrpcd_queue_work(cli->cl_writeback_work);
 	}
 	return rc;
diff --git a/lustre/osc/osc_internal.h b/lustre/osc/osc_internal.h
index 06c21b3..d31d5ba 100644
--- a/lustre/osc/osc_internal.h
+++ b/lustre/osc/osc_internal.h
@@ -245,4 +245,70 @@ extern unsigned long osc_cache_shrink_count(struct shrinker *sk,
 extern unsigned long osc_cache_shrink_scan(struct shrinker *sk,
 					   struct shrink_control *sc);
 
+#ifdef ENABLE_RLQOS
+static inline void qos_throttle(struct qos_data_t *qos)
+{
+	struct timeval now;
+	long           usec_since_last_rpc;
+	long           need_sleep_usec = 0;
+
+	spin_lock(&qos->lock);
+	if (0 == qos->min_usec_between_rpcs)
+		goto out;
+
+	do_gettimeofday(&now);
+	usec_since_last_rpc = cfs_timeval_sub(&now, &qos->last_rpc_time, NULL);
+	if (usec_since_last_rpc < 0) {
+		usec_since_last_rpc = 0;
+	}
+	if (usec_since_last_rpc < qos->min_usec_between_rpcs) {
+		need_sleep_usec = qos->min_usec_between_rpcs - usec_since_last_rpc;
+	}
+	qos->last_rpc_time = now;
+out:
+	spin_unlock(&qos->lock);
+	if (0 == need_sleep_usec) {
+		return;
+	}
+
+	/* About timer ranges:
+	   Ref: https://www.kernel.org/doc/Documentation/timers/timers-howto.txt */
+	if (need_sleep_usec < 1000) {
+		udelay(need_sleep_usec);
+	} else if (need_sleep_usec < 20000) {
+		usleep_range(need_sleep_usec - 1, need_sleep_usec);
+	} else {
+		msleep(need_sleep_usec / 1000);
+	}
+}
+#endif /* ENABLE_RLQOS */
+
+/* You must call LPROCFS_CLIMP_CHECK() on the obd device before and
+ * LPROCFS_CLIMP_EXIT() after calling this function. They are not called inside
+ * this function, because they may return an error code.
+ */
+static inline void set_max_rpcs_in_flight(int val, struct client_obd *cli)
+{
+	int adding, added, req_count;
+
+	adding = val - cli->cl_max_rpcs_in_flight;
+	req_count = atomic_read(&osc_pool_req_count);
+	if (adding > 0 && req_count < osc_reqpool_maxreqcount) {
+		/*
+		 * There might be some race which will cause over-limit
+		 * allocation, but it is fine.
+		 */
+		if (req_count + adding > osc_reqpool_maxreqcount)
+			adding = osc_reqpool_maxreqcount - req_count;
+
+		added = osc_rq_pool->prp_populate(osc_rq_pool, adding);
+		atomic_add(added, &osc_pool_req_count);
+	}
+
+	spin_lock(&cli->cl_loi_list_lock);
+	cli->cl_max_rpcs_in_flight = val;
+	client_adjust_max_dirty(cli);
+	spin_unlock(&cli->cl_loi_list_lock);
+}
+
 #endif /* OSC_INTERNAL_H */
-- 
1.8.3.1


* [lustre-devel] [PATCH 6/6] Adjust max_rpcs_in_flight according to metrics
  2017-03-21 19:43 [lustre-devel] [PATCH 0/6] Rate-limiting Quality of Service Yan Li
                   ` (4 preceding siblings ...)
  2017-03-21 19:43 ` [lustre-devel] [PATCH 5/6] Throttle the outgoing requests according to tau Yan Li
@ 2017-03-21 19:43 ` Yan Li
  5 siblings, 0 replies; 17+ messages in thread
From: Yan Li @ 2017-03-21 19:43 UTC (permalink / raw)
  To: lustre-devel

Signed-off-by: Yan Li <yanli@ascar.io>
---
 lustre/osc/osc_request.c | 165 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 165 insertions(+)

diff --git a/lustre/osc/osc_request.c b/lustre/osc/osc_request.c
index c59c281..8efaf5a 100644
--- a/lustre/osc/osc_request.c
+++ b/lustre/osc/osc_request.c
@@ -1613,6 +1613,156 @@ static void osc_release_ppga(struct brw_page **ppga, size_t count)
         OBD_FREE(ppga, sizeof(*ppga) * count);
 }
 
+
+#ifdef ENABLE_RLQOS
+/**
+ * te's lock should be acquired beforehand
+ */
+static void time_ewma_add_extlock(struct time_ewma *te, struct timeval *new_time) {
+	__u64 old_ea = te->ea;
+	long timediff;
+
+	if (te->last_time.tv_sec != 0) {
+		timediff = cfs_timeval_sub(new_time, &te->last_time, NULL);
+		if (timediff < 0) {
+			CDEBUG(D_INFO,
+					"(te: %p) negative timediff %ld detected, using abs value\n",
+					te, timediff);
+			timediff = -timediff;
+		}
+
+		/* Reset ea to 0 if a long gap (>10min) is detected */
+		if (timediff > 10 * 60 * ONE_MILLION) {
+			CWARN("(te: %p) Long gap detected\n", te);
+			te->ea = 0;
+		} else {
+			/* ewma = ewma * (1-alpha) + amount * alpha
+			 * ea = ewma * alpha, alpha_inv = 1/alpha
+			 *
+			 * ea = ea / alpha_inv * (alpha_inv - 1) + timediff
+			 */
+			do_div(te->ea, te->alpha_inv);
+			te->ea = te->ea * (te->alpha_inv - 1) + timediff;
+			if (te->ea > 1000000) {
+				CDEBUG(D_INFO,
+				       "(te: %p) old_ea = %llu, "
+				       "old_time = %ld.%ld, "
+				       "new_time = %ld.%ld, new ea = %llu\n",
+				       te, old_ea,
+				       te->last_time.tv_sec,
+				       te->last_time.tv_usec,
+				       new_time->tv_sec,
+				       new_time->tv_usec, te->ea);
+			}
+		}
+	} else {
+		CDEBUG(D_INFO, "(te: %p) first call\n", te);
+	}
+	te->last_time = *new_time;
+}
+
+/**
+ * Calculate ewma of time values. Long gaps will be ignored.
+ */
+static int qos_adjust(struct obd_device *obd, struct timeval *new_ack_time,
+		struct timeval *new_sent_time, int op, int bytes_transferred)
+{
+	struct client_obd *cli = &obd->u.cli;
+	struct qos_data_t *qos = &cli->qos;
+	struct time_ewma *ack_ewma_p = &qos->ack_ewma;
+	struct time_ewma *sent_ewma_p = &qos->sent_ewma;
+	__u64 ack_ewma;
+	__u64 sent_ewma;
+	struct qos_rule_t *r;
+	int new_mrif = -1;  /* -1 means no change needed */
+	int i;
+	struct timeval now;
+	long rtt;
+	int rtt_ratio100;
+	long usec_since_last_mrif_update;
+
+	spin_lock(&qos->lock);
+	time_ewma_add_extlock(ack_ewma_p, new_ack_time);
+	ack_ewma = qos_get_ewma_usec(ack_ewma_p);
+
+	time_ewma_add_extlock(sent_ewma_p, new_sent_time);
+	sent_ewma = qos_get_ewma_usec(sent_ewma_p);
+
+	/* calculate rtt */
+	do_gettimeofday(&now);
+	rtt = cfs_timeval_sub(&now, new_sent_time, NULL);
+	if (0 == qos->smallest_rtt || rtt < qos->smallest_rtt) {
+		qos->smallest_rtt = rtt;
+	}
+	rtt = rtt * 100;
+	rtt_ratio100 =  rtt / qos->smallest_rtt;
+	qos->rtt_ratio100 = rtt_ratio100;
+
+	/* Calculate throughput */
+	calc_throughput(qos, op, bytes_transferred);
+
+	/* Adjust max_rpc_in_flight according to ack_ewma and send_ewma */
+	if (NULL == qos->rules) goto out;
+	if (NULL == cli->cl_import) goto out; /* or else LPROCFS_CLIMP_CHECK may return this function, leaving qos->lock locked */
+	for(i = 0; i < qos->rule_no; ++i) {
+		r = &qos->rules[i];
+		if (ack_ewma     >= r->ack_ewma_lower &&
+		    ack_ewma     <  r->ack_ewma_upper &&
+		    sent_ewma    >= r->send_ewma_lower &&
+		    sent_ewma    <  r->send_ewma_upper &&
+		    rtt_ratio100 >= r->rtt_ratio100_lower &&
+		    rtt_ratio100 <  r->rtt_ratio100_upper)
+		{
+			r->used_times++;
+			r->ack_ewma_avg += ((__s64)ack_ewma - (__s64)r->ack_ewma_avg) / r->used_times;
+			r->send_ewma_avg += ((__s64)sent_ewma - (__s64)r->send_ewma_avg) / r->used_times;
+			r->rtt_ratio100_avg += (rtt_ratio100 - (int)r->rtt_ratio100_avg) / r->used_times;
+
+			usec_since_last_mrif_update = cfs_timeval_sub(&now, &qos->last_mrif_update_time, NULL);
+			if (usec_since_last_mrif_update > 0 &&
+					usec_since_last_mrif_update >= qos->min_gap_between_updating_mrif) {
+				qos->last_mrif_update_time = now;
+				/* m100 is disabled when assigned negative values */
+				if (r->m100 >= 0) {
+					/* Must multiply m100 first, then div by 100 to avoid
+					 * losing precision */
+					qos->max_rpc_in_flight100 *= r->m100;
+					qos->max_rpc_in_flight100 /= 100;
+				}
+				qos->max_rpc_in_flight100 += r->b100;
+				CDEBUG(D_INFO, "New max_rpc_in_flight100 = %d\n", qos->max_rpc_in_flight100);
+				if (qos->max_rpc_in_flight100 < 0) {
+					CDEBUG(D_INFO, "New max_rpc_in_flight100 is negative, reset it to 0\n");
+					qos->max_rpc_in_flight100 = 0;
+				}
+				if (qos->max_rpc_in_flight100 > OSC_MAX_RIF_MAX * 100) {
+					CDEBUG(D_INFO, "New max_rpc_in_flight100 is larger than %d, reset it to max allowed value\n", OSC_MAX_RIF_MAX * 100);
+					qos->max_rpc_in_flight100 = OSC_MAX_RIF_MAX * 100;
+				}
+				new_mrif = qos->max_rpc_in_flight100 / 100;
+				if (new_mrif < 1) {
+					CDEBUG(D_INFO, "New max_rpc_in_flight is smaller than 1, reset it to 1\n");
+					new_mrif = 1;
+				}
+			}
+			/* Update min_usec_between_rpcs to tau */
+			qos->min_usec_between_rpcs = r->tau;
+			/* set MRIF after unlocking qos->lock to prevent deadlocking */
+			break;
+		}
+	}
+out:
+	spin_unlock(&qos->lock);
+
+	if (-1 != new_mrif) {   /* -1 means no change needed */
+		LPROCFS_CLIMP_CHECK(obd);
+		set_max_rpcs_in_flight(new_mrif, cli);
+		LPROCFS_CLIMP_EXIT(obd);
+	}
+	return 0;
+}
+#endif /* ENABLE_RLQOS */
+
 static int brw_interpret(const struct lu_env *env,
                          struct ptlrpc_request *req, void *data, int rc)
 {
@@ -1622,6 +1772,14 @@ static int brw_interpret(const struct lu_env *env,
 	struct client_obd *cli = aa->aa_cli;
         ENTRY;
 
+#ifdef ENABLE_RLQOS
+	qos_adjust(req->rq_import->imp_obd,
+		   &req->rq_arrival_time,
+	           &aa->aa_oa->o_sent_time,
+	           lustre_msg_get_opc(req->rq_reqmsg) - OST_READ,
+	           req->rq_bulk->bd_nob_transferred);
+#endif
+
         rc = osc_brw_fini_request(req, rc);
         CDEBUG(D_INODE, "request %p aa %p rc %d\n", req, aa, rc);
         /* When server return -EINPROGRESS, client should always retry
@@ -1874,6 +2032,10 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 	list_splice_init(&rpc_list, &aa->aa_oaps);
 	INIT_LIST_HEAD(&aa->aa_exts);
 	list_splice_init(ext_list, &aa->aa_exts);
+#ifdef ENABLE_RLQOS
+	/* sent_time is used by RLQoS */
+	do_gettimeofday(&aa->aa_oa->o_sent_time);
+#endif
 
 	spin_lock(&cli->cl_loi_list_lock);
 	starting_offset >>= PAGE_SHIFT;
@@ -1897,6 +2059,9 @@ int osc_build_rpc(const struct lu_env *env, struct client_obd *cli,
 		  cli->cl_w_in_flight);
 	OBD_FAIL_TIMEOUT(OBD_FAIL_OSC_DELAY_IO, cfs_fail_val);
 
+#ifdef ENABLE_RLQOS
+	qos_throttle(&cli->qos);
+#endif
 	ptlrpcd_add_req(req);
 	rc = 0;
 	EXIT;
-- 
1.8.3.1


* [lustre-devel] [PATCH 1/6] Autoconf option for rate-limiting Quality of Service (RLQOS)
  2017-03-21 19:43 ` [lustre-devel] [PATCH 1/6] Autoconf option for rate-limiting Quality of Service (RLQOS) Yan Li
@ 2017-03-21 20:09   ` Ben Evans
  2017-03-22 14:19     ` Yan Li
  2017-03-24 22:22   ` Dilger, Andreas
  1 sibling, 1 reply; 17+ messages in thread
From: Ben Evans @ 2017-03-21 20:09 UTC (permalink / raw)
  To: lustre-devel

I would remove the #ifdef ENABLE_RLQOS blocks, especially in lustre_idl.h
since you're proposing to add new fields and consume some of the padding
bits.  It will cause a lot of headache for the next feature that comes
along and consumes some of those bits.

-Ben Evans

On 3/21/17, 3:43 PM, "lustre-devel on behalf of Yan Li"
<lustre-devel-bounces at lists.lustre.org on behalf of yanli@ascar.io> wrote:

>This patch enables rate-limiting quality of service (RLQOS) support as
>talked in the ASCAR paper [1]. The purpose of RLQOS is to provide a
>client-side rate limiting mechanism that controls max_rpcs_in_flight
>and minimal gap between brw RPC requests (called tau in the code and
>paper).
>
>RLQOS can be enabled by passing --enable-rlqos to configure. It then
>can be controlled by tunables in procfs of each osc.
>
>[1] http://storageconference.us/2015/Papers/14.Li.pdf
>
>Signed-off-by: Yan Li <yanli@ascar.io>
>---
> lustre/autoconf/lustre-core.m4 | 17 +++++++++++++++++
> lustre/include/Makefile.am     |  3 ++-
> 2 files changed, 19 insertions(+), 1 deletion(-)
>
>diff --git a/lustre/autoconf/lustre-core.m4
>b/lustre/autoconf/lustre-core.m4
>index 0578325..7f1828e 100644
>--- a/lustre/autoconf/lustre-core.m4
>+++ b/lustre/autoconf/lustre-core.m4
>@@ -369,6 +369,22 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([
> AC_MSG_RESULT([$enable_ssk])
> ]) # LC_OPENSSL_SSK
> 
>+#
>+# LC_CONFIG_RLQOS
>+#
>+# Rate-limiting Quality of Service support
>+#
>+AC_DEFUN([LC_CONFIG_RLQOS], [
>+AC_MSG_CHECKING([whether to enable rate-limiting quality of service
>support])
>+AC_ARG_ENABLE([rlqos],
>+	AC_HELP_STRING([--enable-rlqos],
>+		[enable rate-limiting quality of service support]),
>+	[], [enable_rlqos="no"])
>+AC_MSG_RESULT([$enable_rlqos])
>+AS_IF([test "x$enable_rlqos" != xno],
>+	[AC_DEFINE(ENABLE_RLQOS, 1, [enable rate-limiting quality of service
>support])])
>+]) # LC_CONFIG_RLQOS
>+
> # LC_INODE_PERMISION_2ARGS
> #
> # up to v2.6.27 had a 3 arg version (inode, mask, nameidata)
>@@ -2241,6 +2257,7 @@ AC_DEFUN([LC_PROG_LINUX], [
> 	LC_GLIBC_SUPPORT_FHANDLES
> 	LC_CONFIG_GSS
> 	LC_OPENSSL_SSK
>+	LC_CONFIG_RLQOS
> 
> 	# 2.6.32
> 	LC_BLK_QUEUE_MAX_SEGMENTS
>diff --git a/lustre/include/Makefile.am b/lustre/include/Makefile.am
>index 9074ca4..6d72b6e 100644
>--- a/lustre/include/Makefile.am
>+++ b/lustre/include/Makefile.am
>@@ -98,4 +98,5 @@ EXTRA_DIST = \
> 	upcall_cache.h \
> 	lustre_kernelcomm.h \
> 	seq_range.h \
>-	uapi_kernelcomm.h
>+	uapi_kernelcomm.h \
>+	rlqos.h
>-- 
>1.8.3.1
>
>_______________________________________________
>lustre-devel mailing list
>lustre-devel at lists.lustre.org
>http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 1/6] Autoconf option for rate-limiting Quality of Service (RLQOS)
  2017-03-21 20:09   ` Ben Evans
@ 2017-03-22 14:19     ` Yan Li
  2017-03-22 14:27       ` Ben Evans
  0 siblings, 1 reply; 17+ messages in thread
From: Yan Li @ 2017-03-22 14:19 UTC (permalink / raw)
  To: lustre-devel


On 03/21/2017 01:09 PM, Ben Evans wrote:
> I would remove the #ifdef ENABLE_RLQOS blocks, especially in lustre_idl.h
> since you're proposing to add new fields and consume some of the padding
> bits.  It will cause a lot of headache for the next feature that comes
> along and consumes some of those bits.

Yeah, that's a good point. I'll remove it if all are ok with this.

Yan


* [lustre-devel] [PATCH 1/6] Autoconf option for rate-limiting Quality of Service (RLQOS)
  2017-03-22 14:19     ` Yan Li
@ 2017-03-22 14:27       ` Ben Evans
  0 siblings, 0 replies; 17+ messages in thread
From: Ben Evans @ 2017-03-22 14:27 UTC (permalink / raw)
  To: lustre-devel

I'd get rid of all the ENABLE_RLQOS blocks myself, but minimally the
lustre_idl.h ones.

-Ben Evans

On 3/22/17, 10:19 AM, "Yan Li" <yanli@ascar.io> wrote:

>
>On 03/21/2017 01:09 PM, Ben Evans wrote:
>> I would remove the #ifdef ENABLE_RLQOS blocks, especially in
>>lustre_idl.h
>> since you're proposing to add new fields and consume some of the padding
>> bits.  It will cause a lot of headache for the next feature that comes
>> along and consumes some of those bits.
>
>Yeah, that's a good point. I'll remove it if all are ok with this.
>
>Yan


* [lustre-devel] [PATCH 5/6] Throttle the outgoing requests according to tau
  2017-03-21 19:43 ` [lustre-devel] [PATCH 5/6] Throttle the outgoing requests according to tau Yan Li
@ 2017-03-23 14:03   ` Alexey Lyashkov
  0 siblings, 0 replies; 17+ messages in thread
From: Alexey Lyashkov @ 2017-03-23 14:03 UTC (permalink / raw)
  To: lustre-devel

I dislike the sleep in this code.
I think you should use the req->rq_sent time to introduce the delay, the
same way the OSC redo code does:
ptlrpc_check_set()
...
                /* delayed send - skip */
                if (req->rq_phase == RQ_PHASE_NEW && req->rq_sent)
                        continue;

On Tue, Mar 21, 2017 at 10:43 PM, Yan Li <yanli@ascar.io> wrote:

> Signed-off-by: Yan Li <yanli@ascar.io>
> ---
>  lustre/osc/osc_cache.c    |  3 +++
>  lustre/osc/osc_internal.h | 66 ++++++++++++++++++++++++++++++
> +++++++++++++++++
>  2 files changed, 69 insertions(+)
>
> diff --git a/lustre/osc/osc_cache.c b/lustre/osc/osc_cache.c
> index 236263c..2f9d4e1 100644
> --- a/lustre/osc/osc_cache.c
> +++ b/lustre/osc/osc_cache.c
> @@ -2316,6 +2316,9 @@ static int osc_io_unplug0(const struct lu_env *env,
> struct client_obd *cli,
>         } else {
>                 CDEBUG(D_CACHE, "Queue writeback work for client %p.\n",
> cli);
>                 LASSERT(cli->cl_writeback_work != NULL);
> +#ifdef ENABLE_RLQOS
> +               qos_throttle(&cli->qos);
> +#endif
>                 rc = ptlrpcd_queue_work(cli->cl_writeback_work);
>         }
>         return rc;
> diff --git a/lustre/osc/osc_internal.h b/lustre/osc/osc_internal.h
> index 06c21b3..d31d5ba 100644
> --- a/lustre/osc/osc_internal.h
> +++ b/lustre/osc/osc_internal.h
> @@ -245,4 +245,70 @@ extern unsigned long osc_cache_shrink_count(struct
> shrinker *sk,
>  extern unsigned long osc_cache_shrink_scan(struct shrinker *sk,
>                                            struct shrink_control *sc);
>
> +#ifdef ENABLE_RLQOS
> +static inline void qos_throttle(struct qos_data_t *qos)
> +{
> +       struct timeval now;
> +       long           usec_since_last_rpc;
> +       long           need_sleep_usec = 0;
> +
> +       spin_lock(&qos->lock);
> +       if (0 == qos->min_usec_between_rpcs)
> +               goto out;
> +
> +       do_gettimeofday(&now);
> +       usec_since_last_rpc = cfs_timeval_sub(&now, &qos->last_rpc_time,
> NULL);
> +       if (usec_since_last_rpc < 0) {
> +               usec_since_last_rpc = 0;
> +       }
> +       if (usec_since_last_rpc < qos->min_usec_between_rpcs) {
> +               need_sleep_usec = qos->min_usec_between_rpcs -
> usec_since_last_rpc;
> +       }
> +       qos->last_rpc_time = now;
> +out:
> +       spin_unlock(&qos->lock);
> +       if (0 == need_sleep_usec) {
> +               return;
> +       }
> +
> +       /* About timer ranges:
> +          Ref: https://www.kernel.org/doc/Documentation/timers/timers-howto.txt */
> +       if (need_sleep_usec < 1000) {
> +               udelay(need_sleep_usec);
> +       } else if (need_sleep_usec < 20000) {
> +               usleep_range(need_sleep_usec - 1, need_sleep_usec);
> +       } else {
> +               msleep(need_sleep_usec / 1000);
> +       }
> +}
> +#endif /* ENABLE_RLQOS */
> +
> +/* You must call LPROCFS_CLIMP_CHECK() on the obd device before and
> + * LPROCFS_CLIMP_EXIT() after calling this function. They are not called
> inside
> + * this function, because they may return an error code.
> + */
> +static inline void set_max_rpcs_in_flight(int val, struct client_obd *cli)
> +{
> +       int adding, added, req_count;
> +
> +       adding = val - cli->cl_max_rpcs_in_flight;
> +       req_count = atomic_read(&osc_pool_req_count);
> +       if (adding > 0 && req_count < osc_reqpool_maxreqcount) {
> +               /*
> +                * There might be some race which will cause over-limit
> +                * allocation, but it is fine.
> +                */
> +               if (req_count + adding > osc_reqpool_maxreqcount)
> +                       adding = osc_reqpool_maxreqcount - req_count;
> +
> +               added = osc_rq_pool->prp_populate(osc_rq_pool, adding);
> +               atomic_add(added, &osc_pool_req_count);
> +       }
> +
> +       spin_lock(&cli->cl_loi_list_lock);
> +       cli->cl_max_rpcs_in_flight = val;
> +       client_adjust_max_dirty(cli);
> +       spin_unlock(&cli->cl_loi_list_lock);
> +}
> +
>  #endif /* OSC_INTERNAL_H */
> --
> 1.8.3.1
>
> _______________________________________________
> lustre-devel mailing list
> lustre-devel at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org
>



-- 
Alexey Lyashkov - Technical lead for a Morpheus team
Seagate Technology, LLC
www.seagate.com
www.lustre.org


* [lustre-devel] [PATCH 2/6] Added fields to message for RLQOS support
  2017-03-21 19:43 ` [lustre-devel] [PATCH 2/6] Added fields to message for RLQOS support Yan Li
@ 2017-03-23 14:54   ` Alexey Lyashkov
  0 siblings, 0 replies; 17+ messages in thread
From: Alexey Lyashkov @ 2017-03-23 14:54 UTC (permalink / raw)
  To: lustre-devel

You shouldn't comment out the asserts; instead, introduce an additional
connect flag and only use the new fields when that flag is set.

As I see it, you have written the code to work between patched nodes, but
we have no guarantee that all nodes in a cluster run the same version all
the time.
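
To illustrate the suggested pattern, here is a schematic userspace
sketch, not Lustre code: OBD_CONNECT_RLQOS is a hypothetical flag name,
and a real patch would have to add it to the connect flag negotiation
before the reused fields may be interpreted.

    #include <stdio.h>

    /* Hypothetical feature bit negotiated at connect time. */
    #define OBD_CONNECT_RLQOS   (1ULL << 60)

    struct fake_obdo {
            unsigned long long o_sent_time_sec;   /* reuses o_padding_4 */
            unsigned long long o_sent_time_usec;  /* reuses o_padding_5 */
    };

    /* Only interpret the reused padding fields when the peer advertised
     * RLQOS support during connect; otherwise keep treating them as padding. */
    static void handle_sent_time(const struct fake_obdo *oa,
                                 unsigned long long peer_connect_flags)
    {
            if (peer_connect_flags & OBD_CONNECT_RLQOS)
                    printf("sent_time = %llu.%06llu\n",
                           oa->o_sent_time_sec, oa->o_sent_time_usec);
            else
                    printf("peer without RLQOS support, ignoring the fields\n");
    }

    int main(void)
    {
            struct fake_obdo oa = { 1490000000ULL, 123456ULL };

            handle_sent_time(&oa, OBD_CONNECT_RLQOS);  /* negotiated */
            handle_sent_time(&oa, 0);                  /* old peer */
            return 0;
    }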

On Tue, Mar 21, 2017 at 10:43 PM, Yan Li <yanli@ascar.io> wrote:

> Modified the request message to embed sent_time, which will be
> returned from the server and used to calculate the exponentially
> weighted moving average of sent_time gap in return messages. It is
> used as a metric for rate-limiting quality of service.
>
> Signed-off-by: Yan Li <yanli@ascar.io>
> ---
>  lustre/include/lustre/lustre_idl.h | 4 ++++
>  lustre/ptlrpc/pack_generic.c       | 5 +++++
>  lustre/ptlrpc/wiretest.c           | 2 ++
>  lustre/utils/wiretest.c            | 2 ++
>  4 files changed, 13 insertions(+)
>
> diff --git a/lustre/include/lustre/lustre_idl.h b/lustre/include/lustre/
> lustre_idl.h
> index bf23a47..7a200d1 100644
> --- a/lustre/include/lustre/lustre_idl.h
> +++ b/lustre/include/lustre/lustre_idl.h
> @@ -3336,8 +3336,12 @@ struct obdo {
>                                                  * each stripe.
>                                                  * brw: grant space
> consumed on
>                                                  * the client for the
> write */
> +#ifdef ENABLE_RLQOS
> +       struct timeval          o_sent_time;    /* timeval is 64x2 bits on
> Linux */
> +#else
>         __u64                   o_padding_4;
>         __u64                   o_padding_5;
> +#endif
>         __u64                   o_padding_6;
>  };
>
> diff --git a/lustre/ptlrpc/pack_generic.c b/lustre/ptlrpc/pack_generic.c
> index 8df8ea8..d0bc87a 100644
> --- a/lustre/ptlrpc/pack_generic.c
> +++ b/lustre/ptlrpc/pack_generic.c
> @@ -1722,8 +1722,13 @@ void lustre_swab_obdo (struct obdo  *o)
>          __swab32s (&o->o_uid_h);
>          __swab32s (&o->o_gid_h);
>          __swab64s (&o->o_data_version);
> +#ifdef ENABLE_RLQOS
> +        __swab64s ((__u64*)&o->o_sent_time.tv_sec);
> +        __swab64s ((__u64*)&o->o_sent_time.tv_usec);
> +#else
>          CLASSERT(offsetof(typeof(*o), o_padding_4) != 0);
>          CLASSERT(offsetof(typeof(*o), o_padding_5) != 0);
> +#endif
>          CLASSERT(offsetof(typeof(*o), o_padding_6) != 0);
>
>  }
> diff --git a/lustre/ptlrpc/wiretest.c b/lustre/ptlrpc/wiretest.c
> index 070ef91..0c909a6 100644
> --- a/lustre/ptlrpc/wiretest.c
> +++ b/lustre/ptlrpc/wiretest.c
> @@ -1314,6 +1314,7 @@ void lustre_assert_wire_constants(void)
>                  (long long)(int)offsetof(struct obdo, o_data_version));
>         LASSERTF((int)sizeof(((struct obdo *)0)->o_data_version) == 8, "found %lld\n",
>                  (long long)(int)sizeof(((struct obdo *)0)->o_data_version));
> +#ifndef ENABLE_RLQOS
>         LASSERTF((int)offsetof(struct obdo, o_padding_4) == 184, "found %lld\n",
>                  (long long)(int)offsetof(struct obdo, o_padding_4));
>         LASSERTF((int)sizeof(((struct obdo *)0)->o_padding_4) == 8, "found %lld\n",
> @@ -1322,6 +1323,7 @@ void lustre_assert_wire_constants(void)
>                  (long long)(int)offsetof(struct obdo, o_padding_5));
>         LASSERTF((int)sizeof(((struct obdo *)0)->o_padding_5) == 8, "found %lld\n",
>                  (long long)(int)sizeof(((struct obdo *)0)->o_padding_5));
> +#endif
>         LASSERTF((int)offsetof(struct obdo, o_padding_6) == 200, "found %lld\n",
>                  (long long)(int)offsetof(struct obdo, o_padding_6));
>         LASSERTF((int)sizeof(((struct obdo *)0)->o_padding_6) == 8, "found %lld\n",
> diff --git a/lustre/utils/wiretest.c b/lustre/utils/wiretest.c
> index 233d7d8..47fbbf0 100644
> --- a/lustre/utils/wiretest.c
> +++ b/lustre/utils/wiretest.c
> @@ -1329,6 +1329,7 @@ void lustre_assert_wire_constants(void)
>                  (long long)(int)offsetof(struct obdo, o_data_version));
>         LASSERTF((int)sizeof(((struct obdo *)0)->o_data_version) == 8, "found %lld\n",
>                  (long long)(int)sizeof(((struct obdo *)0)->o_data_version));
> +#ifndef ENABLE_RLQOS
>         LASSERTF((int)offsetof(struct obdo, o_padding_4) == 184, "found %lld\n",
>                  (long long)(int)offsetof(struct obdo, o_padding_4));
>         LASSERTF((int)sizeof(((struct obdo *)0)->o_padding_4) == 8, "found %lld\n",
> @@ -1337,6 +1338,7 @@ void lustre_assert_wire_constants(void)
>                  (long long)(int)offsetof(struct obdo, o_padding_5));
>         LASSERTF((int)sizeof(((struct obdo *)0)->o_padding_5) == 8, "found %lld\n",
>                  (long long)(int)sizeof(((struct obdo *)0)->o_padding_5));
> +#endif
>         LASSERTF((int)offsetof(struct obdo, o_padding_6) == 200, "found %lld\n",
>                  (long long)(int)offsetof(struct obdo, o_padding_6));
>         LASSERTF((int)sizeof(((struct obdo *)0)->o_padding_6) == 8, "found %lld\n",
> --
> 1.8.3.1
>
> _______________________________________________
> lustre-devel mailing list
> lustre-devel at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org
>
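
The quoted commit message computes an exponentially weighted moving average of the gap
between consecutive sent_time values. A minimal sketch of such a computation is given
here; it is not taken from the later patches in the series, and the structure and
function names are hypothetical.

#include <linux/types.h>

struct rlqos_gap_ewma {
	__s64 last_sent_usec;	/* sent_time carried by the previous reply, in usec */
	__s64 gap_ewma_usec;	/* smoothed gap between consecutive replies, in usec */
};

/* Update the EWMA with the sent_time of a newly received reply,
 * using alpha = 1/8:  ewma <- ewma + (gap - ewma) / 8 */
static void rlqos_gap_ewma_update(struct rlqos_gap_ewma *e, __s64 sent_usec)
{
	if (e->last_sent_usec != 0) {
		__s64 gap = sent_usec - e->last_sent_usec;

		e->gap_ewma_usec += (gap - e->gap_ewma_usec) / 8;
	}
	e->last_sent_usec = sent_usec;
}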



-- 
Alexey Lyashkov - Technical lead for a Morpheus team
Seagate Technology, LLC
www.seagate.com
www.lustre.org


* [lustre-devel] [PATCH 1/6] Autoconf option for rate-limiting Quality of Service (RLQOS)
  2017-03-21 19:43 ` [lustre-devel] [PATCH 1/6] Autoconf option for rate-limiting Quality of Service (RLQOS) Yan Li
  2017-03-21 20:09   ` Ben Evans
@ 2017-03-24 22:22   ` Dilger, Andreas
       [not found]     ` <3BE4A898-D944-41F9-84C8-FE8DA80D0D65@datadirectnet.com>
  1 sibling, 1 reply; 17+ messages in thread
From: Dilger, Andreas @ 2017-03-24 22:22 UTC (permalink / raw)
  To: lustre-devel

On Mar 21, 2017, at 13:43, Yan Li <yanli@ascar.io> wrote:
> 
> This patch enables rate-limiting quality of service (RLQOS) support as
> talked in the ASCAR paper [1]. The purpose of RLQOS is to provide a
> client-side rate limiting mechanism that controls max_rpcs_in_flight
> and minimal gap between brw RPC requests (called tau in the code and
> paper).
> 
> RLQOS can be enabled by passing --enable-rlqos to configure. It can then
> be controlled through the procfs tunables of each osc.

Hi Yan,
thanks for submitting the patch series.  Two high level comments on the
patches, since I haven't had a chance to review them in detail (though
I see Alexey has commented on some of them):
- What external tools (if any) are needed in order to use this functionality?
 Are these available for download, and is there documentation for using them?
- It is fine that you've submitted the patches here for discussion and to
 raise awareness of your work.  In order to get them landed you should submit
  the patches to Gerrit (see https://wiki.hpdd.intel.com/display/PUB/Using+Gerrit)

I'll try to take a look at them when I get a chance.  This may also be of
interest to Li Xi and Qian at DDN, who have been working on server-side NRS.

Cheers, Andreas

> [1] http://storageconference.us/2015/Papers/14.Li.pdf
> 
> Signed-off-by: Yan Li <yanli@ascar.io>
> ---
> lustre/autoconf/lustre-core.m4 | 17 +++++++++++++++++
> lustre/include/Makefile.am     |  3 ++-
> 2 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/lustre/autoconf/lustre-core.m4 b/lustre/autoconf/lustre-core.m4
> index 0578325..7f1828e 100644
> --- a/lustre/autoconf/lustre-core.m4
> +++ b/lustre/autoconf/lustre-core.m4
> @@ -369,6 +369,22 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([
> AC_MSG_RESULT([$enable_ssk])
> ]) # LC_OPENSSL_SSK
> 
> +#
> +# LC_CONFIG_RLQOS
> +#
> +# Rate-limiting Quality of Service support
> +#
> +AC_DEFUN([LC_CONFIG_RLQOS], [
> +AC_MSG_CHECKING([whether to enable rate-limiting quality of service support])
> +AC_ARG_ENABLE([rlqos],
> +	AC_HELP_STRING([--enable-rlqos],
> +		[enable rate-limiting quality of service support]),
> +	[], [enable_rlqos="no"])
> +AC_MSG_RESULT([$enable_rlqos])
> +AS_IF([test "x$enable_rlqos" != xno],
> +	[AC_DEFINE(ENABLE_RLQOS, 1, [enable rate-limiting quality of service support])])
> +]) # LC_CONFIG_RLQOS
> +
> # LC_INODE_PERMISION_2ARGS
> #
> # up to v2.6.27 had a 3 arg version (inode, mask, nameidata)
> @@ -2241,6 +2257,7 @@ AC_DEFUN([LC_PROG_LINUX], [
> 	LC_GLIBC_SUPPORT_FHANDLES
> 	LC_CONFIG_GSS
> 	LC_OPENSSL_SSK
> +	LC_CONFIG_RLQOS
> 
> 	# 2.6.32
> 	LC_BLK_QUEUE_MAX_SEGMENTS
> diff --git a/lustre/include/Makefile.am b/lustre/include/Makefile.am
> index 9074ca4..6d72b6e 100644
> --- a/lustre/include/Makefile.am
> +++ b/lustre/include/Makefile.am
> @@ -98,4 +98,5 @@ EXTRA_DIST = \
> 	upcall_cache.h \
> 	lustre_kernelcomm.h \
> 	seq_range.h \
> -	uapi_kernelcomm.h
> +	uapi_kernelcomm.h \
> +	rlqos.h
> -- 
> 1.8.3.1
> 
> _______________________________________________
> lustre-devel mailing list
> lustre-devel at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation


* [lustre-devel] [PATCH 1/6] Autoconf option for rate-limiting Quality of Service (RLQOS)
       [not found]     ` <3BE4A898-D944-41F9-84C8-FE8DA80D0D65@datadirectnet.com>
@ 2017-04-14  2:55       ` Yan Li
  2017-04-17 12:32         ` Brinkmann, Prof. Dr. André
  0 siblings, 1 reply; 17+ messages in thread
From: Yan Li @ 2017-04-14  2:55 UTC (permalink / raw)
  To: lustre-devel

On 03/24/2017 08:36 PM, Li Xi wrote:
> As you already know, we (DDN and also Prof. André and Lingfang from Mainz University)
> are working together on QoS, not only the server-side TBF policy of NRS, but also
> client-side QoS (https://jira.hpdd.intel.com/browse/LU-7982). Global QoS for Lustre is
> also under development. After a glance at the paper, I think your work looks different
> from our approach. That is good, because these mechanisms could work together to improve
> the service quality of Lustre in different ways for different requirements.
> 
> I have a few questions about RLQOS. I haven't read all the details in the paper, so please
> correct me if I am wrong.
> 
> 1) In my understanding, ASCAR/RLQOS is aimed at preventing congestion on the Lustre
> client. Am I correct? Is ASCAR/RLQOS able to provide any bandwidth/IOPS guarantees
> to each application, user, or job? There is a common use case of client-side
> QoS. For example, multiple users are sharing the same Lustre client, but one of the users
> starts a very aggressive application which uses all the available bandwidth/IOPS and thus
> causes very bad performance/latency for other users. So, what we (DDN) are currently working
> on is trying to isolate/balance performance between users/jobs. I am wondering whether
> ASCAR/RLQOS can be combined with our patches (https://review.whamcloud.com/#/c/19896/,
> https://review.whamcloud.com/#/c/19700/) to provide an even better solution.

ASCAR/RLQOS can't do bandwidth allocation for jobs accessing the same
OSC yet. It is theoretically possible to do that for jobs accessing
different OSCs, by using different rulesets for different OSCs, but we
don't yet know the best way to design these rulesets for bandwidth
allocation.

I agree it would be beneficial if our development efforts could be
combined. The core idea of ASCAR/RLQOS is to use a predefined ruleset to
manage existing parameters, and this idea can be applied to any
parameter, not just those used for QoS.
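
To make the ruleset idea concrete, here is a purely illustrative sketch of what a single
rule could look like. The field names and the choice of metric are hypothetical; the real
definitions live in lustre/include/rlqos.h and lustre/osc/qos_rules.c of the patch series
and will differ in detail.

#include <linux/types.h>

/* Hypothetical rule: while the observed metric falls inside the given range,
 * apply the listed settings to the OSC. */
struct rlqos_rule_sketch {
	__u64 gap_ewma_lo_usec;		/* lower bound of the matched metric */
	__u64 gap_ewma_hi_usec;		/* upper bound of the matched metric */
	__u32 max_rpcs_in_flight;	/* mrif to use while the rule matches */
	__u64 tau_usec;			/* minimal gap between consecutive brw RPCs */
};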

> 2) It is mentioned in the paper that ASCAR/RLQOS controls the max RPCs in flight of an OSC to
> prevent congestion. However, for cached I/O, we found that the page cache limit on the client
> also affects the throughput of applications, especially when multiple different applications
> are sharing limited page caches. One big concern is that when the max RPCs in flight is limited,
> the page cache will be exhausted (for example, when multiple applications keep on writing
> data), and this is a new type of congestion. I am not sure: do you think this kind of congestion
> will cause any performance decline/problems for the application?

Yes. If that's a concern, we should also tune the page cache limit in
addition to mrif (max_rpcs_in_flight). But the effectiveness of this
needs to be carefully evaluated.

> 3) Have you considered implementing ASCAR/RLQOS on the Lustre server side? As already
> mentioned in the paper, Lustre clients can sometimes connect to the servers and send
> requests without any self-restraint, which is unfair to other clients that follow the control
> of ASCAR/RLQOS. And unfortunately, it is hard to get all the clients under control since
> Lustre clients can change a lot from time to time. However, if a similar mechanism is
> implemented on the server side, things become much easier. That is part of the reason
> why TBF was implemented on the server side rather than the client side. Maybe something
> similar to ASCAR/RLQOS could be implemented on MDTs/OSTs too. What do you think?

This is definitely an interesting idea. As I said earlier, the core idea
of ASCAR/RLQOS is actually tuning parameters dynamically, and we can
apply this rule-based control to any parameter on both the server
and the client side.

> 4) In your paper, I/O pattern detection and workload classifiers are mentioned. So do you know
> of any good way to detect/describe the I/O pattern of an application? Understanding
> the I/O patterns of applications is really important and helpful for QoS. But I guess it is
> really difficult compared to pattern detection in other systems, like networks. However,
> do you have any idea or direction that looks like the right way? Maybe something like
> machine learning?

Yes. We are experimenting with deep reinforcement learning-based methods and
have seen some good results. The best part of using deep learning is
that we don't have to worry about feature selection. As to whether deep
learning works in the real world, that has to be evaluated thoroughly.


Yan


* [lustre-devel] [PATCH 1/6] Autoconf option for rate-limiting Quality of Service (RLQOS)
  2017-04-14  2:55       ` Yan Li
@ 2017-04-17 12:32         ` Brinkmann, Prof. Dr. André
  2017-04-17 16:46           ` Yan Li
  0 siblings, 1 reply; 17+ messages in thread
From: Brinkmann, Prof. Dr. André @ 2017-04-17 12:32 UTC (permalink / raw)
  To: lustre-devel

Dear Yan Li,

I fully agree that your approach of learning a small rule set is very interesting for optimizing overall
Lustre bandwidth. What I have not been able to fully understand from your paper is the cost of
adaptation. What happens in a cluster running many jobs at the same time that apply very different
access patterns (in very different combinations to different OSSes)?

We have just started to collect these patterns. It might be interesting to apply different (machine learning)
algorithms on top of these patterns, going in different directions:

- Optimize overall bandwidth (like ASCAR is doing)
- Optimize bandwidth while supporting QoS rules for certain applications

Will you be at LUG? At least Tim from our team will participate and it might be a good opportunity to discuss
a joint approach.

Best Regards, 

André

On 14.04.17, 04:55, "Yan Li" <yanli@ascar.io> wrote:

    On 03/24/2017 08:36 PM, Li Xi wrote:
    > As you already know, we (DDN and also Prof. André and Lingfang from Mainz University)
    > are working together on QoS, not only the server-side TBF policy of NRS, but also
    > client-side QoS (https://jira.hpdd.intel.com/browse/LU-7982). Global QoS for Lustre is
    > also under development. After a glance at the paper, I think your work looks different
    > from our approach. That is good, because these mechanisms could work together to improve
    > the service quality of Lustre in different ways for different requirements.
    > 
    > I have a few questions about RLQOS. I haven't read all the details in the paper, so please
    > correct me if I am wrong.
    > 
    > 1) In my understanding, ASCAR/RLQOS is aimed at preventing congestion on the Lustre
    > client. Am I correct? Is ASCAR/RLQOS able to provide any bandwidth/IOPS guarantees
    > to each application, user, or job? There is a common use case of client-side
    > QoS. For example, multiple users are sharing the same Lustre client, but one of the users
    > starts a very aggressive application which uses all the available bandwidth/IOPS and thus
    > causes very bad performance/latency for other users. So, what we (DDN) are currently working
    > on is trying to isolate/balance performance between users/jobs. I am wondering whether
    > ASCAR/RLQOS can be combined with our patches (https://review.whamcloud.com/#/c/19896/,
    > https://review.whamcloud.com/#/c/19700/) to provide an even better solution.
    
    ASCAR/RLQOS can't do bandwidth allocation for jobs accessing the same
    OSC yet. It is theoretically possible to do that for jobs accessing
    different OSCs, by using different rulesets for different OSCs, but we
    don't yet know the best way to design these rulesets for bandwidth
    allocation.
    
    I agree it would be beneficial if our development efforts could be
    combined. The core idea of ASCAR/RLQOS is to use a predefined ruleset to
    manage existing parameters, and this idea can be applied to any
    parameter, not just those used for QoS.
    
    > 2) It is mentioned in the paper that ASCAR/RLQOS controls the max RPCs in flight of an OSC to
    > prevent congestion. However, for cached I/O, we found that the page cache limit on the client
    > also affects the throughput of applications, especially when multiple different applications
    > are sharing limited page caches. One big concern is that when the max RPCs in flight is limited,
    > the page cache will be exhausted (for example, when multiple applications keep on writing
    > data), and this is a new type of congestion. I am not sure: do you think this kind of congestion
    > will cause any performance decline/problems for the application?
    
    Yes. If that's a concern, we should also tune the page cache limit in
    addition to mrif (max_rpcs_in_flight). But the effectiveness of this
    needs to be carefully evaluated.
    
    > 3) Have you considered implementing ASCAR/RLQOS on the Lustre server side? As already
    > mentioned in the paper, Lustre clients can sometimes connect to the servers and send
    > requests without any self-restraint, which is unfair to other clients that follow the control
    > of ASCAR/RLQOS. And unfortunately, it is hard to get all the clients under control since
    > Lustre clients can change a lot from time to time. However, if a similar mechanism is
    > implemented on the server side, things become much easier. That is part of the reason
    > why TBF was implemented on the server side rather than the client side. Maybe something
    > similar to ASCAR/RLQOS could be implemented on MDTs/OSTs too. What do you think?
    
    This is definitely an interesting idea. As I said earlier, the core idea
    of ASCAR/RLQOS is actually tuning parameters dynamically, and we can
    apply this rule-based control to any parameter on both the server
    and the client side.
    
    > 4) In your paper, I/O pattern detection and workload classifiers are mentioned. So do you know
    > of any good way to detect/describe the I/O pattern of an application? Understanding
    > the I/O patterns of applications is really important and helpful for QoS. But I guess it is
    > really difficult compared to pattern detection in other systems, like networks. However,
    > do you have any idea or direction that looks like the right way? Maybe something like
    > machine learning?
    
    Yes. We are experimenting with deep reinforcement learning-based methods and
    have seen some good results. The best part of using deep learning is
    that we don't have to worry about feature selection. As to whether deep
    learning works in the real world, that has to be evaluated thoroughly.
    
    
    Yan
    


* [lustre-devel] [PATCH 1/6] Autoconf option for rate-limiting Quality of Service (RLQOS)
  2017-04-17 12:32         ` Brinkmann, Prof. Dr. André
@ 2017-04-17 16:46           ` Yan Li
  2017-04-21 12:50             ` Brinkmann, Prof. Dr. André
  0 siblings, 1 reply; 17+ messages in thread
From: Yan Li @ 2017-04-17 16:46 UTC (permalink / raw)
  To: lustre-devel


On 04/17/2017 05:32 AM, Brinkmann, Prof. Dr. André wrote:
> I fully agree that your approach of learning a small rule set is very interesting for optimizing overall
> Lustre bandwidth. What I have not been able to fully understand from your paper is the cost of
> adaptation. What happens in a cluster running many jobs at the same time that apply very different
> access patterns (in very different combinations to different OSSes)?

When there are many jobs, their aggregated I/O pattern can usually be
treated as a mixed random read/write workload. The more jobs you have,
the more uniformly random the I/O pattern is. My experience is that such
workloads are not that hard to optimize. The hardest cases are when only
one or two I/O jobs are running and they have a very special I/O pattern.

> We have just started to collect these patterns. It might be interesting to apply different (machine learning)
> algorithms on top of these patterns, going in different directions:
> 
> - Optimize overall bandwidth (like ASCAR is doing)

This is similar to what I'm working on. I've been systematically testing
many machine learning algorithms on bandwidth optimization, and some of
them have pretty good results. My problem is that all my workloads so
far are synthetic.

> - Optimize bandwidth while supporting QoS rules for certain
> applications

This is on my radar. I'll look into your design and implementation to
see how we can do something interesting together.

> Will you be at LUG? At least Tim from our team will participate and it might be a good opportunity to discuss
> a joint approach.

I'm not sure yet. Now that I've graduated, I need to find my own funding
source for travel.

--
Yan


* [lustre-devel] [PATCH 1/6] Autoconf option for rate-limiting Quality of Service (RLQOS)
  2017-04-17 16:46           ` Yan Li
@ 2017-04-21 12:50             ` Brinkmann, Prof. Dr. André
  0 siblings, 0 replies; 17+ messages in thread
From: Brinkmann, Prof. Dr. André @ 2017-04-21 12:50 UTC (permalink / raw)
  To: lustre-devel


    On 04/17/2017 05:32 AM, Brinkmann, Prof. Dr. André wrote:
    > I fully agree that your approach of learning a small rule set is very interesting for optimizing overall
    > Lustre bandwidth. What I have not been able to fully understand from your paper is the cost of
    > adaptation. What happens in a cluster running many jobs at the same time that apply very different
    > access patterns (in very different combinations to different OSSes)?
    
    When there are many jobs, their aggregated I/O pattern can usually be
    treated as a mixed random read/write workload. The more jobs you have,
    the more uniformly random the I/O pattern is. My experience is that such
    workloads are not that hard to optimize. The hardest cases are when only
    one or two I/O jobs are running and they have a very special I/O pattern.
    
    > We have just started to collect these patterns. It might be interesting to apply different (machine learning)
    > algorithms on top of these patterns, going in different directions:
    > 
    > - Optimize overall bandwidth (like ASCAR is doing)
    
    This is similar to what I'm working on. I've been systematically testing
    many machine learning algorithms on bandwidth optimization, and some of
    them have pretty good results. My problem is that all my workloads so
    far are synthetic.
    
    > - Optimize bandwidth while supporting QoS rules for certain
    > applications
    
    This is on my radar. I'll look into your design and implementation to
    see how we can do something interesting together.
    
    > Will you be at LUG? At least Tim from our team will participate and it might be a good opportunity to discuss
    > a joint approach.
    
    I'm not sure yet. Now that I've graduated, I need to find my own funding
    source for travel.
 
We should try to set up a conference call after LUG, if you are unable to attend, to streamline our development.

Cheers, 

André
   
    --
    Yan
    



Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
2017-03-21 19:43 [lustre-devel] [PATCH 0/6] Rate-limiting Quality of Service Yan Li
2017-03-21 19:43 ` [lustre-devel] [PATCH 1/6] Autoconf option for rate-limiting Quality of Service (RLQOS) Yan Li
2017-03-21 20:09   ` Ben Evans
2017-03-22 14:19     ` Yan Li
2017-03-22 14:27       ` Ben Evans
2017-03-24 22:22   ` Dilger, Andreas
     [not found]     ` <3BE4A898-D944-41F9-84C8-FE8DA80D0D65@datadirectnet.com>
2017-04-14  2:55       ` Yan Li
2017-04-17 12:32         ` Brinkmann, Prof. Dr. André
2017-04-17 16:46           ` Yan Li
2017-04-21 12:50             ` Brinkmann, Prof. Dr. André
2017-03-21 19:43 ` [lustre-devel] [PATCH 2/6] Added fields to message for RLQOS support Yan Li
2017-03-23 14:54   ` Alexey Lyashkov
2017-03-21 19:43 ` [lustre-devel] [PATCH 3/6] RLQOS main data structure Yan Li
2017-03-21 19:43 ` [lustre-devel] [PATCH 4/6] lprocfs interfaces for showing, parsing, and controlling rules Yan Li
2017-03-21 19:43 ` [lustre-devel] [PATCH 5/6] Throttle the outgoing requests according to tau Yan Li
2017-03-23 14:03   ` Alexey Lyashkov
2017-03-21 19:43 ` [lustre-devel] [PATCH 6/6] Adjust max_rpcs_in_flight according to metrics Yan Li
