ceph-devel.vger.kernel.org archive mirror
* [PATCH v5 0/5] ceph: periodically send perf metrics to ceph
@ 2020-06-30  7:52 xiubli
  2020-06-30  7:52 ` [PATCH v5 1/5] ceph: add check_session_state helper and make it global xiubli
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: xiubli @ 2020-06-30  7:52 UTC (permalink / raw)
  To: jlayton; +Cc: idryomov, zyan, pdonnell, vshankar, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

This series is based on the previous patches that added the metrics support
in kceph [1] and on the MDS daemon changes that record and forward client-side
metrics to the manager [2][3].

This will send the caps/read/write/metadata metrics to one available
MDS only once per second, which is the same as the userland client.
It can be disabled via the disable_send_metrics module parameter.

In mdsc->metric we have two new members:
'metric.mds': the rank of the available and valid MDS to send the
              metrics to.
'metric.mds_cnt': how many MDSs support the metric collection feature.

The workqueue job stays alive only when
'!disable_send_metrics && metric.mds_cnt > 0'.
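
Condensed from the metric_schedule_delayed() helper added in patch 3, that
check looks roughly like this:

    static inline void metric_schedule_delayed(struct ceph_client_metric *m)
    {
            /* stop re-arming once sending is disabled or no MDS supports it */
            if (disable_send_metrics || !atomic_read(&m->mds_cnt))
                    return;

            /* otherwise fire the delayed work again roughly once per second */
            schedule_delayed_work(&m->delayed_work, round_jiffies_relative(HZ));
    }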


The client will also send the metric flags to the MDS; currently the cap,
read latency, write latency and metadata latency metrics are supported.

Also have pushed this series to github [4].

[1] https://patchwork.kernel.org/project/ceph-devel/list/?series=238907 [Merged]
[2] https://github.com/ceph/ceph/pull/26004 [Merged]
[3] https://github.com/ceph/ceph/pull/35608 [Merged]
[4] https://github.com/lxbsz/ceph-client/commits/perf_metric5

Changes in V5:
- rename enable_send_metrics --> disable_send_metrics
- switch back to a single workqueue job.
- 'list' --> 'metric_wakeup'

Changes in V4:
- WARN_ON --> WARN_ON_ONCE
- do not send metrics when no MDS supports the metric collection.
- add global total_caps in mdsc->metric
- add the delayed work for each session and choose one to send the metrics to get rid of the mdsc->mutex lock

Changed in V3:
- fold "check the METRIC_COLLECT feature before sending metrics" into previous one
- use `enable_send_metrics` on/off switch instead

Changed in V2:
- split the patches into ones as small as possible.
- check the METRIC_COLLECT feature before sending metrics
- switch to WARN_ON and bubble up errnos to the callers




Xiubo Li (5):
  ceph: add check_session_state helper and make it global
  ceph: add global total_caps to count the mdsc's total caps number
  ceph: periodically send perf metrics to ceph
  ceph: switch to WARN_ON_ONCE and bubble up errnos to the callers
  ceph: send client provided metric flags in client metadata

 fs/ceph/caps.c               |   2 +
 fs/ceph/debugfs.c            |  14 +---
 fs/ceph/mds_client.c         | 166 ++++++++++++++++++++++++++++++++++---------
 fs/ceph/mds_client.h         |   7 +-
 fs/ceph/metric.c             | 158 ++++++++++++++++++++++++++++++++++++++++
 fs/ceph/metric.h             |  96 +++++++++++++++++++++++++
 fs/ceph/super.c              |  42 +++++++++++
 fs/ceph/super.h              |   2 +
 include/linux/ceph/ceph_fs.h |   1 +
 9 files changed, 442 insertions(+), 46 deletions(-)

-- 
1.8.3.1


* [PATCH v5 1/5] ceph: add check_session_state helper and make it global
  2020-06-30  7:52 [PATCH v5 0/5] ceph: periodically send perf metrics to ceph xiubli
@ 2020-06-30  7:52 ` xiubli
  2020-06-30  7:52 ` [PATCH v5 2/5] ceph: add global total_caps to count the mdsc's total caps number xiubli
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: xiubli @ 2020-06-30  7:52 UTC (permalink / raw)
  To: jlayton; +Cc: idryomov, zyan, pdonnell, vshankar, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Also remove the unused mdsc parameter to simplify the code.

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/mds_client.c | 47 +++++++++++++++++++++++++++--------------------
 fs/ceph/mds_client.h |  3 +++
 2 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index a504971..58c54d4 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1785,8 +1785,7 @@ static void renewed_caps(struct ceph_mds_client *mdsc,
 /*
  * send a session close request
  */
-static int request_close_session(struct ceph_mds_client *mdsc,
-				 struct ceph_mds_session *session)
+static int request_close_session(struct ceph_mds_session *session)
 {
 	struct ceph_msg *msg;
 
@@ -1809,7 +1808,7 @@ static int __close_session(struct ceph_mds_client *mdsc,
 	if (session->s_state >= CEPH_MDS_SESSION_CLOSING)
 		return 0;
 	session->s_state = CEPH_MDS_SESSION_CLOSING;
-	return request_close_session(mdsc, session);
+	return request_close_session(session);
 }
 
 static bool drop_negative_children(struct dentry *dentry)
@@ -4263,6 +4262,29 @@ static void maybe_recover_session(struct ceph_mds_client *mdsc)
 	ceph_force_reconnect(fsc->sb);
 }
 
+bool check_session_state(struct ceph_mds_session *s)
+{
+	if (s->s_state == CEPH_MDS_SESSION_CLOSING) {
+		dout("resending session close request for mds%d\n",
+				s->s_mds);
+		request_close_session(s);
+		return false;
+	}
+	if (s->s_ttl && time_after(jiffies, s->s_ttl)) {
+		if (s->s_state == CEPH_MDS_SESSION_OPEN) {
+			s->s_state = CEPH_MDS_SESSION_HUNG;
+			pr_info("mds%d hung\n", s->s_mds);
+		}
+	}
+	if (s->s_state == CEPH_MDS_SESSION_NEW ||
+	    s->s_state == CEPH_MDS_SESSION_RESTARTING ||
+	    s->s_state == CEPH_MDS_SESSION_REJECTED)
+		/* this mds is failed or recovering, just wait */
+		return false;
+
+	return true;
+}
+
 /*
  * delayed work -- periodically trim expired leases, renew caps with mds
  */
@@ -4294,23 +4316,8 @@ static void delayed_work(struct work_struct *work)
 		struct ceph_mds_session *s = __ceph_lookup_mds_session(mdsc, i);
 		if (!s)
 			continue;
-		if (s->s_state == CEPH_MDS_SESSION_CLOSING) {
-			dout("resending session close request for mds%d\n",
-			     s->s_mds);
-			request_close_session(mdsc, s);
-			ceph_put_mds_session(s);
-			continue;
-		}
-		if (s->s_ttl && time_after(jiffies, s->s_ttl)) {
-			if (s->s_state == CEPH_MDS_SESSION_OPEN) {
-				s->s_state = CEPH_MDS_SESSION_HUNG;
-				pr_info("mds%d hung\n", s->s_mds);
-			}
-		}
-		if (s->s_state == CEPH_MDS_SESSION_NEW ||
-		    s->s_state == CEPH_MDS_SESSION_RESTARTING ||
-		    s->s_state == CEPH_MDS_SESSION_REJECTED) {
-			/* this mds is failed or recovering, just wait */
+
+		if (!check_session_state(s)) {
 			ceph_put_mds_session(s);
 			continue;
 		}
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 5e0c407..6147ff0 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -18,6 +18,7 @@
 #include <linux/ceph/auth.h>
 
 #include "metric.h"
+#include "super.h"
 
 /* The first 8 bits are reserved for old ceph releases */
 enum ceph_feature_type {
@@ -476,6 +477,8 @@ struct ceph_mds_client {
 
 extern const char *ceph_mds_op_name(int op);
 
+extern bool check_session_state(struct ceph_mds_session *s);
+
 extern struct ceph_mds_session *
 __ceph_lookup_mds_session(struct ceph_mds_client *, int mds);
 
-- 
1.8.3.1


* [PATCH v5 2/5] ceph: add global total_caps to count the mdsc's total caps number
  2020-06-30  7:52 [PATCH v5 0/5] ceph: periodically send perf metrics to ceph xiubli
  2020-06-30  7:52 ` [PATCH v5 1/5] ceph: add check_session_state helper and make it global xiubli
@ 2020-06-30  7:52 ` xiubli
  2020-06-30  7:52 ` [PATCH v5 3/5] ceph: periodically send perf metrics to ceph xiubli
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: xiubli @ 2020-06-30  7:52 UTC (permalink / raw)
  To: jlayton; +Cc: idryomov, zyan, pdonnell, vshankar, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

This will help reduce use of the global mdsc->mutex lock in many
places.
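
For illustration, the debugfs code changed below goes from walking every
session under mdsc->mutex just to count caps to a single lockless read of
the new counter (condensed from the hunks in this patch):

    /* before: sum s_nr_caps over all sessions while holding mdsc->mutex */
    mutex_lock(&mdsc->mutex);
    for (i = 0; i < mdsc->max_sessions; i++) {
            struct ceph_mds_session *s = __ceph_lookup_mds_session(mdsc, i);

            if (!s)
                    continue;
            nr_caps += s->s_nr_caps;
            ceph_put_mds_session(s);
    }
    mutex_unlock(&mdsc->mutex);

    /* after: one atomic read of the global counter */
    nr_caps = atomic64_read(&m->total_caps);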

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/caps.c       |  2 ++
 fs/ceph/debugfs.c    | 14 ++------------
 fs/ceph/mds_client.c |  1 +
 fs/ceph/metric.c     |  1 +
 fs/ceph/metric.h     |  1 +
 5 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 972c13a..5f48940 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -668,6 +668,7 @@ void ceph_add_cap(struct inode *inode,
 		spin_lock(&session->s_cap_lock);
 		list_add_tail(&cap->session_caps, &session->s_caps);
 		session->s_nr_caps++;
+		atomic64_inc(&mdsc->metric.total_caps);
 		spin_unlock(&session->s_cap_lock);
 	} else {
 		spin_lock(&session->s_cap_lock);
@@ -1161,6 +1162,7 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
 	} else {
 		list_del_init(&cap->session_caps);
 		session->s_nr_caps--;
+		atomic64_dec(&mdsc->metric.total_caps);
 		cap->session = NULL;
 		removed = 1;
 	}
diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index 070ed84..3030f55 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -145,7 +145,7 @@ static int metric_show(struct seq_file *s, void *p)
 	struct ceph_fs_client *fsc = s->private;
 	struct ceph_mds_client *mdsc = fsc->mdsc;
 	struct ceph_client_metric *m = &mdsc->metric;
-	int i, nr_caps = 0;
+	int nr_caps = 0;
 	s64 total, sum, avg, min, max, sq;
 
 	seq_printf(s, "item          total       avg_lat(us)     min_lat(us)     max_lat(us)     stdev(us)\n");
@@ -190,17 +190,7 @@ static int metric_show(struct seq_file *s, void *p)
 		   percpu_counter_sum(&m->d_lease_mis),
 		   percpu_counter_sum(&m->d_lease_hit));
 
-	mutex_lock(&mdsc->mutex);
-	for (i = 0; i < mdsc->max_sessions; i++) {
-		struct ceph_mds_session *s;
-
-		s = __ceph_lookup_mds_session(mdsc, i);
-		if (!s)
-			continue;
-		nr_caps += s->s_nr_caps;
-		ceph_put_mds_session(s);
-	}
-	mutex_unlock(&mdsc->mutex);
+	nr_caps = atomic64_read(&m->total_caps);
 	seq_printf(s, "%-14s%-16d%-16lld%lld\n", "caps", nr_caps,
 		   percpu_counter_sum(&m->i_caps_mis),
 		   percpu_counter_sum(&m->i_caps_hit));
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 58c54d4..f3c7123 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1485,6 +1485,7 @@ int ceph_iterate_session_caps(struct ceph_mds_session *session,
 			cap->session = NULL;
 			list_del_init(&cap->session_caps);
 			session->s_nr_caps--;
+			atomic64_dec(&session->s_mdsc->metric.total_caps);
 			if (cap->queue_release)
 				__ceph_queue_cap_release(session, cap);
 			else
diff --git a/fs/ceph/metric.c b/fs/ceph/metric.c
index 9217f35..269eacb 100644
--- a/fs/ceph/metric.c
+++ b/fs/ceph/metric.c
@@ -22,6 +22,7 @@ int ceph_metric_init(struct ceph_client_metric *m)
 	if (ret)
 		goto err_d_lease_mis;
 
+	atomic64_set(&m->total_caps, 0);
 	ret = percpu_counter_init(&m->i_caps_hit, 0, GFP_KERNEL);
 	if (ret)
 		goto err_i_caps_hit;
diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
index ccd8128..23a3373 100644
--- a/fs/ceph/metric.h
+++ b/fs/ceph/metric.h
@@ -12,6 +12,7 @@ struct ceph_client_metric {
 	struct percpu_counter d_lease_hit;
 	struct percpu_counter d_lease_mis;
 
+	atomic64_t            total_caps;
 	struct percpu_counter i_caps_hit;
 	struct percpu_counter i_caps_mis;
 
-- 
1.8.3.1


* [PATCH v5 3/5] ceph: periodically send perf metrics to ceph
  2020-06-30  7:52 [PATCH v5 0/5] ceph: periodically send perf metrics to ceph xiubli
  2020-06-30  7:52 ` [PATCH v5 1/5] ceph: add check_session_state helper and make it global xiubli
  2020-06-30  7:52 ` [PATCH v5 2/5] ceph: add global total_caps to count the mdsc's total caps number xiubli
@ 2020-06-30  7:52 ` xiubli
  2020-06-30 11:29   ` Jeff Layton
  2020-06-30  7:52 ` [PATCH v5 4/5] ceph: switch to WARN_ON_ONCE and bubble up errnos to the callers xiubli
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: xiubli @ 2020-06-30  7:52 UTC (permalink / raw)
  To: jlayton; +Cc: idryomov, zyan, pdonnell, vshankar, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

This will send the caps/read/write/metadata metrics to one available
MDS only once per second by default, which is the same as the
userland client. It will skip any MDS session which doesn't support
the metric collection, since such an MDS would close the socket
connection directly when it gets an unknown message type.

The metric sending can be disabled via the disable_send_metrics
module parameter.
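
For reference, the front of the CEPH_MSG_CLIENT_METRICS message built here
is a small header followed by one record per metric type (summarized from
the structures this patch adds to fs/ceph/metric.h):

    /*
     * ceph_metric_head              __le32 num - number of records (4 here)
     * ceph_metric_cap               caps hit/mis counters and total caps
     * ceph_metric_read_latency      sec/nsec cumulative read latency
     * ceph_metric_write_latency     sec/nsec cumulative write latency
     * ceph_metric_metadata_latency  sec/nsec cumulative metadata latency
     *
     * Every record starts with type (__le32), ver and compat (__u8 each),
     * and data_len (__le32) covering only the payload that follows it.
     */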

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/mds_client.c         |  12 ++++
 fs/ceph/mds_client.h         |   4 +-
 fs/ceph/metric.c             | 157 +++++++++++++++++++++++++++++++++++++++++++
 fs/ceph/metric.h             |  82 ++++++++++++++++++++++
 fs/ceph/super.c              |  42 ++++++++++++
 fs/ceph/super.h              |   2 +
 include/linux/ceph/ceph_fs.h |   1 +
 7 files changed, 299 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index f3c7123..bcdda5a 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1809,6 +1809,11 @@ static int __close_session(struct ceph_mds_client *mdsc,
 	if (session->s_state >= CEPH_MDS_SESSION_CLOSING)
 		return 0;
 	session->s_state = CEPH_MDS_SESSION_CLOSING;
+
+	if (test_bit(CEPHFS_FEATURE_METRIC_COLLECT, &session->s_features)
+	    && atomic_dec_return(&mdsc->metric.mds_cnt) == 0)
+		cancel_delayed_work_sync(&mdsc->metric.delayed_work);
+
 	return request_close_session(session);
 }
 
@@ -3310,6 +3315,9 @@ static void handle_session(struct ceph_mds_session *session,
 		session->s_state = CEPH_MDS_SESSION_OPEN;
 		session->s_features = features;
 		renewed_caps(mdsc, session, 0);
+		if (test_bit(CEPHFS_FEATURE_METRIC_COLLECT, &session->s_features)
+		    && atomic_inc_return(&mdsc->metric.mds_cnt) == 1)
+			metric_schedule_delayed(&mdsc->metric);
 		wake = 1;
 		if (mdsc->stopping)
 			__close_session(mdsc, session);
@@ -3809,6 +3817,10 @@ static void send_mds_reconnect(struct ceph_mds_client *mdsc,
 	session->s_state = CEPH_MDS_SESSION_RECONNECTING;
 	session->s_seq = 0;
 
+	if (test_bit(CEPHFS_FEATURE_METRIC_COLLECT, &session->s_features)
+	    && atomic_dec_return(&mdsc->metric.mds_cnt) == 0)
+		cancel_delayed_work_sync(&mdsc->metric.delayed_work);
+
 	dout("session %p state %s\n", session,
 	     ceph_session_state_name(session->s_state));
 
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 6147ff0..bc9e959 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -28,8 +28,9 @@ enum ceph_feature_type {
 	CEPHFS_FEATURE_LAZY_CAP_WANTED,
 	CEPHFS_FEATURE_MULTI_RECONNECT,
 	CEPHFS_FEATURE_DELEG_INO,
+	CEPHFS_FEATURE_METRIC_COLLECT,
 
-	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_DELEG_INO,
+	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_METRIC_COLLECT,
 };
 
 /*
@@ -43,6 +44,7 @@ enum ceph_feature_type {
 	CEPHFS_FEATURE_LAZY_CAP_WANTED,		\
 	CEPHFS_FEATURE_MULTI_RECONNECT,		\
 	CEPHFS_FEATURE_DELEG_INO,		\
+	CEPHFS_FEATURE_METRIC_COLLECT,		\
 						\
 	CEPHFS_FEATURE_MAX,			\
 }
diff --git a/fs/ceph/metric.c b/fs/ceph/metric.c
index 269eacb..8d93cf6 100644
--- a/fs/ceph/metric.c
+++ b/fs/ceph/metric.c
@@ -1,10 +1,162 @@
 /* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/ceph/ceph_debug.h>
 
 #include <linux/types.h>
 #include <linux/percpu_counter.h>
 #include <linux/math64.h>
 
 #include "metric.h"
+#include "mds_client.h"
+
+static bool ceph_mdsc_send_metrics(struct ceph_mds_client *mdsc,
+				   struct ceph_mds_session *s,
+				   u64 nr_caps)
+{
+	struct ceph_metric_head *head;
+	struct ceph_metric_cap *cap;
+	struct ceph_metric_read_latency *read;
+	struct ceph_metric_write_latency *write;
+	struct ceph_metric_metadata_latency *meta;
+	struct ceph_client_metric *m = &mdsc->metric;
+	struct ceph_msg *msg;
+	struct timespec64 ts;
+	s64 sum, total;
+	s32 items = 0;
+	s32 len;
+
+	len = sizeof(*head) + sizeof(*cap) + sizeof(*read) + sizeof(*write)
+	      + sizeof(*meta);
+
+	msg = ceph_msg_new(CEPH_MSG_CLIENT_METRICS, len, GFP_NOFS, true);
+	if (!msg) {
+		pr_err("send metrics to mds%d, failed to allocate message\n",
+		       s->s_mds);
+		return false;
+	}
+
+	head = msg->front.iov_base;
+
+	/* encode the cap metric */
+	cap = (struct ceph_metric_cap *)(head + 1);
+	cap->type = cpu_to_le32(CLIENT_METRIC_TYPE_CAP_INFO);
+	cap->ver = 1;
+	cap->compat = 1;
+	cap->data_len = cpu_to_le32(sizeof(*cap) - 10);
+	cap->hit = cpu_to_le64(percpu_counter_sum(&mdsc->metric.i_caps_hit));
+	cap->mis = cpu_to_le64(percpu_counter_sum(&mdsc->metric.i_caps_mis));
+	cap->total = cpu_to_le64(nr_caps);
+	items++;
+
+	/* encode the read latency metric */
+	read = (struct ceph_metric_read_latency *)(cap + 1);
+	read->type = cpu_to_le32(CLIENT_METRIC_TYPE_READ_LATENCY);
+	read->ver = 1;
+	read->compat = 1;
+	read->data_len = cpu_to_le32(sizeof(*read) - 10);
+	total = m->total_reads;
+	sum = m->read_latency_sum;
+	jiffies_to_timespec64(sum, &ts);
+	read->sec = cpu_to_le32(ts.tv_sec);
+	read->nsec = cpu_to_le32(ts.tv_nsec);
+	items++;
+
+	/* encode the write latency metric */
+	write = (struct ceph_metric_write_latency *)(read + 1);
+	write->type = cpu_to_le32(CLIENT_METRIC_TYPE_WRITE_LATENCY);
+	write->ver = 1;
+	write->compat = 1;
+	write->data_len = cpu_to_le32(sizeof(*write) - 10);
+	total = m->total_writes;
+	sum = m->write_latency_sum;
+	jiffies_to_timespec64(sum, &ts);
+	write->sec = cpu_to_le32(ts.tv_sec);
+	write->nsec = cpu_to_le32(ts.tv_nsec);
+	items++;
+
+	/* encode the metadata latency metric */
+	meta = (struct ceph_metric_metadata_latency *)(write + 1);
+	meta->type = cpu_to_le32(CLIENT_METRIC_TYPE_METADATA_LATENCY);
+	meta->ver = 1;
+	meta->compat = 1;
+	meta->data_len = cpu_to_le32(sizeof(*meta) - 10);
+	total = m->total_metadatas;
+	sum = m->metadata_latency_sum;
+	jiffies_to_timespec64(sum, &ts);
+	meta->sec = cpu_to_le32(ts.tv_sec);
+	meta->nsec = cpu_to_le32(ts.tv_nsec);
+	items++;
+
+	put_unaligned_le32(items, &head->num);
+	msg->front.iov_len = cpu_to_le32(len);
+	msg->hdr.version = cpu_to_le16(1);
+	msg->hdr.compat_version = cpu_to_le16(1);
+	msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
+	dout("client%llu send metrics to mds%d\n",
+	     ceph_client_gid(mdsc->fsc->client), s->s_mds);
+	ceph_con_send(&s->s_con, msg);
+
+	return true;
+}
+
+static struct ceph_mds_session *metric_get_session(struct ceph_mds_client *mdsc)
+{
+	struct ceph_mds_session *s;
+	int i;
+
+	mutex_lock(&mdsc->mutex);
+	for (i = 0; i < mdsc->max_sessions; i++) {
+		s = __ceph_lookup_mds_session(mdsc, i);
+		if (!s)
+			continue;
+		mutex_unlock(&mdsc->mutex);
+
+		/*
+		 * Skip it if MDS doesn't support the metric collection,
+		 * or the MDS will close the session's socket connection
+		 * directly when it get this message.
+		 */
+		if (check_session_state(s) &&
+		    test_bit(CEPHFS_FEATURE_METRIC_COLLECT, &s->s_features)) {
+			mdsc->metric.mds = i;
+			return s;
+		}
+		ceph_put_mds_session(s);
+
+		mutex_lock(&mdsc->mutex);
+	}
+	mutex_unlock(&mdsc->mutex);
+
+	return NULL;
+}
+
+static void metric_delayed_work(struct work_struct *work)
+{
+	struct ceph_client_metric *m =
+		container_of(work, struct ceph_client_metric, delayed_work.work);
+	struct ceph_mds_client *mdsc =
+		container_of(m, struct ceph_mds_client, metric);
+	struct ceph_mds_session *s = NULL;
+	u64 nr_caps = atomic64_read(&m->total_caps);
+
+	/* No mds supports the metric collection, will stop the work */
+	if (!atomic_read(&m->mds_cnt))
+		return;
+
+	mutex_lock(&mdsc->mutex);
+	s = __ceph_lookup_mds_session(mdsc, m->mds);
+	mutex_unlock(&mdsc->mutex);
+	if (unlikely(!s || !check_session_state(s) ||
+	    !test_bit(CEPHFS_FEATURE_METRIC_COLLECT, &s->s_features)))
+		s = metric_get_session(mdsc);
+
+	if (s) {
+		/* Only send the metric once in any available session */
+		ceph_mdsc_send_metrics(mdsc, s, nr_caps);
+		ceph_put_mds_session(s);
+	}
+
+	metric_schedule_delayed(m);
+}
 
 int ceph_metric_init(struct ceph_client_metric *m)
 {
@@ -52,6 +204,11 @@ int ceph_metric_init(struct ceph_client_metric *m)
 	m->total_metadatas = 0;
 	m->metadata_latency_sum = 0;
 
+	/* We assume the rank 0 support it as default */
+	m->mds = 0;
+	atomic_set(&m->mds_cnt, 0);
+	INIT_DELAYED_WORK(&m->delayed_work, metric_delayed_work);
+
 	return 0;
 
 err_i_caps_mis:
diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
index 23a3373..68e2d17 100644
--- a/fs/ceph/metric.h
+++ b/fs/ceph/metric.h
@@ -6,6 +6,71 @@
 #include <linux/percpu_counter.h>
 #include <linux/ktime.h>
 
+extern bool disable_send_metrics;
+
+enum ceph_metric_type {
+	CLIENT_METRIC_TYPE_CAP_INFO,
+	CLIENT_METRIC_TYPE_READ_LATENCY,
+	CLIENT_METRIC_TYPE_WRITE_LATENCY,
+	CLIENT_METRIC_TYPE_METADATA_LATENCY,
+	CLIENT_METRIC_TYPE_DENTRY_LEASE,
+
+	CLIENT_METRIC_TYPE_MAX = CLIENT_METRIC_TYPE_DENTRY_LEASE,
+};
+
+/* metric caps header */
+struct ceph_metric_cap {
+	__le32 type;     /* ceph metric type */
+
+	__u8  ver;
+	__u8  compat;
+
+	__le32 data_len; /* length of sizeof(hit + mis + total) */
+	__le64 hit;
+	__le64 mis;
+	__le64 total;
+} __packed;
+
+/* metric read latency header */
+struct ceph_metric_read_latency {
+	__le32 type;     /* ceph metric type */
+
+	__u8  ver;
+	__u8  compat;
+
+	__le32 data_len; /* length of sizeof(sec + nsec) */
+	__le32 sec;
+	__le32 nsec;
+} __packed;
+
+/* metric write latency header */
+struct ceph_metric_write_latency {
+	__le32 type;     /* ceph metric type */
+
+	__u8  ver;
+	__u8  compat;
+
+	__le32 data_len; /* length of sizeof(sec + nsec) */
+	__le32 sec;
+	__le32 nsec;
+} __packed;
+
+/* metric metadata latency header */
+struct ceph_metric_metadata_latency {
+	__le32 type;     /* ceph metric type */
+
+	__u8  ver;
+	__u8  compat;
+
+	__le32 data_len; /* length of sizeof(sec + nsec) */
+	__le32 sec;
+	__le32 nsec;
+} __packed;
+
+struct ceph_metric_head {
+	__le32 num;	/* the number of metrics that will be sent */
+} __packed;
+
 /* This is the global metrics */
 struct ceph_client_metric {
 	atomic64_t            total_dentries;
@@ -36,8 +101,25 @@ struct ceph_client_metric {
 	ktime_t metadata_latency_sq_sum;
 	ktime_t metadata_latency_min;
 	ktime_t metadata_latency_max;
+
+	int mds; /* the MDS being used to send the metrics to */
+	atomic_t mds_cnt;  /* how many MDSs support metrics collection */
+	struct delayed_work delayed_work;  /* delayed work */
 };
 
+static inline void metric_schedule_delayed(struct ceph_client_metric *m)
+{
+	/*
+	 * If send metrics is disabled or no mds support metric
+	 * collection, will stop the work
+	 */
+	if (disable_send_metrics || !atomic_read(&m->mds_cnt))
+		return;
+
+	/* per second */
+	schedule_delayed_work(&m->delayed_work, round_jiffies_relative(HZ));
+}
+
 extern int ceph_metric_init(struct ceph_client_metric *m);
 extern void ceph_metric_destroy(struct ceph_client_metric *m);
 
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index c9784eb1..cd33836 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -27,6 +27,9 @@
 #include <linux/ceph/auth.h>
 #include <linux/ceph/debugfs.h>
 
+static DEFINE_MUTEX(ceph_fsc_lock);
+static LIST_HEAD(ceph_fsc_list);
+
 /*
  * Ceph superblock operations
  *
@@ -691,6 +694,10 @@ static struct ceph_fs_client *create_fs_client(struct ceph_mount_options *fsopt,
 	if (!fsc->wb_pagevec_pool)
 		goto fail_cap_wq;
 
+	mutex_lock(&ceph_fsc_lock);
+	list_add_tail(&fsc->metric_wakeup, &ceph_fsc_list);
+	mutex_unlock(&ceph_fsc_lock);
+
 	return fsc;
 
 fail_cap_wq:
@@ -717,6 +724,10 @@ static void destroy_fs_client(struct ceph_fs_client *fsc)
 {
 	dout("destroy_fs_client %p\n", fsc);
 
+	mutex_lock(&ceph_fsc_lock);
+	list_del(&fsc->metric_wakeup);
+	mutex_unlock(&ceph_fsc_lock);
+
 	ceph_mdsc_destroy(fsc);
 	destroy_workqueue(fsc->inode_wq);
 	destroy_workqueue(fsc->cap_wq);
@@ -1282,6 +1293,37 @@ static void __exit exit_ceph(void)
 	destroy_caches();
 }
 
+static int param_set_metrics(const char *val, const struct kernel_param *kp)
+{
+	struct ceph_fs_client *fsc;
+	int ret;
+
+	ret = param_set_bool(val, kp);
+	if (ret) {
+		pr_err("Failed to parse sending metrics switch value '%s'\n",
+		       val);
+		return ret;
+	} else if (!disable_send_metrics) {
+		// wake up all the mds clients
+		mutex_lock(&ceph_fsc_lock);
+		list_for_each_entry(fsc, &ceph_fsc_list, metric_wakeup) {
+			metric_schedule_delayed(&fsc->mdsc->metric);
+		}
+		mutex_unlock(&ceph_fsc_lock);
+	}
+
+	return 0;
+}
+
+static const struct kernel_param_ops param_ops_metrics = {
+	.set = param_set_metrics,
+	.get = param_get_bool,
+};
+
+bool disable_send_metrics = false;
+module_param_cb(disable_send_metrics, &param_ops_metrics, &disable_send_metrics, 0644);
+MODULE_PARM_DESC(disable_send_metrics, "Enable sending perf metrics to ceph cluster (default: on)");
+
 module_init(init_ceph);
 module_exit(exit_ceph);
 
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 5a6cdd3..2dcb6a9 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -101,6 +101,8 @@ struct ceph_mount_options {
 struct ceph_fs_client {
 	struct super_block *sb;
 
+	struct list_head metric_wakeup;
+
 	struct ceph_mount_options *mount_options;
 	struct ceph_client *client;
 
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
index ebf5ba6..455e9b9 100644
--- a/include/linux/ceph/ceph_fs.h
+++ b/include/linux/ceph/ceph_fs.h
@@ -130,6 +130,7 @@ struct ceph_dir_layout {
 #define CEPH_MSG_CLIENT_REQUEST         24
 #define CEPH_MSG_CLIENT_REQUEST_FORWARD 25
 #define CEPH_MSG_CLIENT_REPLY           26
+#define CEPH_MSG_CLIENT_METRICS         29
 #define CEPH_MSG_CLIENT_CAPS            0x310
 #define CEPH_MSG_CLIENT_LEASE           0x311
 #define CEPH_MSG_CLIENT_SNAP            0x312
-- 
1.8.3.1


* [PATCH v5 4/5] ceph: switch to WARN_ON_ONCE and bubble up errnos to the callers
  2020-06-30  7:52 [PATCH v5 0/5] ceph: periodically send perf metrics to ceph xiubli
                   ` (2 preceding siblings ...)
  2020-06-30  7:52 ` [PATCH v5 3/5] ceph: periodically send perf metrics to ceph xiubli
@ 2020-06-30  7:52 ` xiubli
  2020-06-30  7:52 ` [PATCH v5 5/5] ceph: send client provided metric flags in client metadata xiubli
  2020-06-30 13:02 ` [PATCH v5 0/5] ceph: periodically send perf metrics to ceph Jeff Layton
  5 siblings, 0 replies; 12+ messages in thread
From: xiubli @ 2020-06-30  7:52 UTC (permalink / raw)
  To: jlayton; +Cc: idryomov, zyan, pdonnell, vshankar, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/mds_client.c | 46 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 35 insertions(+), 11 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index bcdda5a..93b539e 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1168,7 +1168,7 @@ static struct ceph_msg *create_session_msg(u32 op, u64 seq)
 
 static const unsigned char feature_bits[] = CEPHFS_FEATURES_CLIENT_SUPPORTED;
 #define FEATURE_BYTES(c) (DIV_ROUND_UP((size_t)feature_bits[c - 1] + 1, 64) * 8)
-static void encode_supported_features(void **p, void *end)
+static int encode_supported_features(void **p, void *end)
 {
 	static const size_t count = ARRAY_SIZE(feature_bits);
 
@@ -1176,16 +1176,22 @@ static void encode_supported_features(void **p, void *end)
 		size_t i;
 		size_t size = FEATURE_BYTES(count);
 
-		BUG_ON(*p + 4 + size > end);
+		if (WARN_ON_ONCE(*p + 4 + size > end))
+			return -ERANGE;
+
 		ceph_encode_32(p, size);
 		memset(*p, 0, size);
 		for (i = 0; i < count; i++)
 			((unsigned char*)(*p))[i / 8] |= BIT(feature_bits[i] % 8);
 		*p += size;
 	} else {
-		BUG_ON(*p + 4 > end);
+		if (WARN_ON_ONCE(*p + 4 > end))
+			return -ERANGE;
+
 		ceph_encode_32(p, 0);
 	}
+
+	return 0;
 }
 
 /*
@@ -1203,6 +1209,7 @@ static struct ceph_msg *create_session_open_msg(struct ceph_mds_client *mdsc, u6
 	struct ceph_mount_options *fsopt = mdsc->fsc->mount_options;
 	size_t size, count;
 	void *p, *end;
+	int ret;
 
 	const char* metadata[][2] = {
 		{"hostname", mdsc->nodename},
@@ -1232,7 +1239,7 @@ static struct ceph_msg *create_session_open_msg(struct ceph_mds_client *mdsc, u6
 			   GFP_NOFS, false);
 	if (!msg) {
 		pr_err("create_session_msg ENOMEM creating msg\n");
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 	}
 	p = msg->front.iov_base;
 	end = p + msg->front.iov_len;
@@ -1269,7 +1276,13 @@ static struct ceph_msg *create_session_open_msg(struct ceph_mds_client *mdsc, u6
 		p += val_len;
 	}
 
-	encode_supported_features(&p, end);
+	ret = encode_supported_features(&p, end);
+	if (ret) {
+		pr_err("encode_supported_features failed!\n");
+		ceph_msg_put(msg);
+		return ERR_PTR(ret);
+	}
+
 	msg->front.iov_len = p - msg->front.iov_base;
 	msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
 
@@ -1297,8 +1310,8 @@ static int __open_session(struct ceph_mds_client *mdsc,
 
 	/* send connect message */
 	msg = create_session_open_msg(mdsc, session->s_seq);
-	if (!msg)
-		return -ENOMEM;
+	if (IS_ERR(msg))
+		return PTR_ERR(msg);
 	ceph_con_send(&session->s_con, msg);
 	return 0;
 }
@@ -1312,6 +1325,7 @@ static int __open_session(struct ceph_mds_client *mdsc,
 __open_export_target_session(struct ceph_mds_client *mdsc, int target)
 {
 	struct ceph_mds_session *session;
+	int ret;
 
 	session = __ceph_lookup_mds_session(mdsc, target);
 	if (!session) {
@@ -1320,8 +1334,11 @@ static int __open_session(struct ceph_mds_client *mdsc,
 			return session;
 	}
 	if (session->s_state == CEPH_MDS_SESSION_NEW ||
-	    session->s_state == CEPH_MDS_SESSION_CLOSING)
-		__open_session(mdsc, session);
+	    session->s_state == CEPH_MDS_SESSION_CLOSING) {
+		ret = __open_session(mdsc, session);
+		if (ret)
+			return ERR_PTR(ret);
+	}
 
 	return session;
 }
@@ -2525,7 +2542,12 @@ static struct ceph_msg *create_request_message(struct ceph_mds_client *mdsc,
 		ceph_encode_copy(&p, &ts, sizeof(ts));
 	}
 
-	BUG_ON(p > end);
+	if (WARN_ON_ONCE(p > end)) {
+		ceph_msg_put(msg);
+		msg = ERR_PTR(-ERANGE);
+		goto out_free2;
+	}
+
 	msg->front.iov_len = p - msg->front.iov_base;
 	msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
 
@@ -2761,7 +2783,9 @@ static void __do_request(struct ceph_mds_client *mdsc,
 		}
 		if (session->s_state == CEPH_MDS_SESSION_NEW ||
 		    session->s_state == CEPH_MDS_SESSION_CLOSING) {
-			__open_session(mdsc, session);
+			err = __open_session(mdsc, session);
+			if (err)
+				goto out_session;
 			/* retry the same mds later */
 			if (random)
 				req->r_resend_mds = mds;
-- 
1.8.3.1


* [PATCH v5 5/5] ceph: send client provided metric flags in client metadata
  2020-06-30  7:52 [PATCH v5 0/5] ceph: periodically send perf metrics to ceph xiubli
                   ` (3 preceding siblings ...)
  2020-06-30  7:52 ` [PATCH v5 4/5] ceph: switch to WARN_ON_ONCE and bubble up errnos to the callers xiubli
@ 2020-06-30  7:52 ` xiubli
  2020-06-30 13:02 ` [PATCH v5 0/5] ceph: periodically send perf metrics to ceph Jeff Layton
  5 siblings, 0 replies; 12+ messages in thread
From: xiubli @ 2020-06-30  7:52 UTC (permalink / raw)
  To: jlayton; +Cc: idryomov, zyan, pdonnell, vshankar, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Send the client's supported metric flags to the MDS in the session
metadata; currently the cap, read latency, write latency and metadata
latency metrics are advertised.
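
For reference, the metric spec is appended to the CEPH_SESSION_REQUEST_OPEN
client metadata (bumping ClientSession to v4) and is encoded by
encode_metric_spec() below as:

    /*
     * __u8   version = 1
     * __u8   compat  = 1
     * __le32 length        - length of the spec that follows (4 + size)
     * __le32 size          - length of the bitmap in bytes
     * __u8   bitmap[size]  - one bit set per supported metric type
     */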

URL: https://tracker.ceph.com/issues/43435
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/mds_client.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/ceph/metric.h     | 13 ++++++++++++
 2 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 93b539e..7bcf4d64 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1194,6 +1194,48 @@ static int encode_supported_features(void **p, void *end)
 	return 0;
 }
 
+static const unsigned char metric_bits[] = CEPHFS_METRIC_SPEC_CLIENT_SUPPORTED;
+#define METRIC_BYTES(cnt) (DIV_ROUND_UP((size_t)metric_bits[cnt - 1] + 1, 64) * 8)
+static int encode_metric_spec(void **p, void *end)
+{
+	static const size_t count = ARRAY_SIZE(metric_bits);
+
+	/* header */
+	if (WARN_ON_ONCE(*p + 2 > end))
+		return -ERANGE;
+
+	ceph_encode_8(p, 1); /* version */
+	ceph_encode_8(p, 1); /* compat */
+
+	if (count > 0) {
+		size_t i;
+		size_t size = METRIC_BYTES(count);
+
+		if (WARN_ON_ONCE(*p + 4 + 4 + size > end))
+			return -ERANGE;
+
+		/* metric spec info length */
+		ceph_encode_32(p, 4 + size);
+
+		/* metric spec */
+		ceph_encode_32(p, size);
+		memset(*p, 0, size);
+		for (i = 0; i < count; i++)
+			((unsigned char *)(*p))[i / 8] |= BIT(metric_bits[i] % 8);
+		*p += size;
+	} else {
+		if (WARN_ON_ONCE(*p + 4 + 4 > end))
+			return -ERANGE;
+
+		/* metric spec info length */
+		ceph_encode_32(p, 4);
+		/* metric spec */
+		ceph_encode_32(p, 0);
+	}
+
+	return 0;
+}
+
 /*
  * session message, specialization for CEPH_SESSION_REQUEST_OPEN
  * to include additional client metadata fields.
@@ -1234,6 +1276,13 @@ static struct ceph_msg *create_session_open_msg(struct ceph_mds_client *mdsc, u6
 		size = FEATURE_BYTES(count);
 	extra_bytes += 4 + size;
 
+	/* metric spec */
+	size = 0;
+	count = ARRAY_SIZE(metric_bits);
+	if (count > 0)
+		size = METRIC_BYTES(count);
+	extra_bytes += 2 + 4 + 4 + size;
+
 	/* Allocate the message */
 	msg = ceph_msg_new(CEPH_MSG_CLIENT_SESSION, sizeof(*h) + extra_bytes,
 			   GFP_NOFS, false);
@@ -1252,9 +1301,9 @@ static struct ceph_msg *create_session_open_msg(struct ceph_mds_client *mdsc, u6
 	 * Serialize client metadata into waiting buffer space, using
 	 * the format that userspace expects for map<string, string>
 	 *
-	 * ClientSession messages with metadata are v3
+	 * ClientSession messages with metadata are v4
 	 */
-	msg->hdr.version = cpu_to_le16(3);
+	msg->hdr.version = cpu_to_le16(4);
 	msg->hdr.compat_version = cpu_to_le16(1);
 
 	/* The write pointer, following the session_head structure */
@@ -1283,6 +1332,13 @@ static struct ceph_msg *create_session_open_msg(struct ceph_mds_client *mdsc, u6
 		return ERR_PTR(ret);
 	}
 
+	ret = encode_metric_spec(&p, end);
+	if (ret) {
+		pr_err("encode_metric_spec failed!\n");
+		ceph_msg_put(msg);
+		return ERR_PTR(ret);
+	}
+
 	msg->front.iov_len = p - msg->front.iov_base;
 	msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
 
diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
index 68e2d17..3ebb0ef 100644
--- a/fs/ceph/metric.h
+++ b/fs/ceph/metric.h
@@ -18,6 +18,19 @@ enum ceph_metric_type {
 	CLIENT_METRIC_TYPE_MAX = CLIENT_METRIC_TYPE_DENTRY_LEASE,
 };
 
+/*
+ * This will always have the highest metric bit value
+ * as the last element of the array.
+ */
+#define CEPHFS_METRIC_SPEC_CLIENT_SUPPORTED {	\
+	CLIENT_METRIC_TYPE_CAP_INFO,		\
+	CLIENT_METRIC_TYPE_READ_LATENCY,	\
+	CLIENT_METRIC_TYPE_WRITE_LATENCY,	\
+	CLIENT_METRIC_TYPE_METADATA_LATENCY,	\
+						\
+	CLIENT_METRIC_TYPE_MAX,			\
+}
+
 /* metric caps header */
 struct ceph_metric_cap {
 	__le32 type;     /* ceph metric type */
-- 
1.8.3.1


* Re: [PATCH v5 3/5] ceph: periodically send perf metrics to ceph
  2020-06-30  7:52 ` [PATCH v5 3/5] ceph: periodically send perf metrics to ceph xiubli
@ 2020-06-30 11:29   ` Jeff Layton
  2020-06-30 12:14     ` Xiubo Li
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2020-06-30 11:29 UTC (permalink / raw)
  To: xiubli; +Cc: idryomov, zyan, pdonnell, vshankar, ceph-devel

On Tue, 2020-06-30 at 03:52 -0400, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> This will send the caps/read/write/metadata metrics to any available
> MDS only once per second as default, which will be the same as the
> userland client. It will skip the MDS sessions which don't support
> the metric collection, or the MDSs will close the socket connections
> directly when it get an unknown type message.
> 
> We can disable the metric sending via the enable_send_metric module
> parameter.
> 
> URL: https://tracker.ceph.com/issues/43215
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/mds_client.c         |  12 ++++
>  fs/ceph/mds_client.h         |   4 +-
>  fs/ceph/metric.c             | 157 +++++++++++++++++++++++++++++++++++++++++++
>  fs/ceph/metric.h             |  82 ++++++++++++++++++++++
>  fs/ceph/super.c              |  42 ++++++++++++
>  fs/ceph/super.h              |   2 +
>  include/linux/ceph/ceph_fs.h |   1 +
>  7 files changed, 299 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index f3c7123..bcdda5a 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -1809,6 +1809,11 @@ static int __close_session(struct ceph_mds_client *mdsc,
>  	if (session->s_state >= CEPH_MDS_SESSION_CLOSING)
>  		return 0;
>  	session->s_state = CEPH_MDS_SESSION_CLOSING;
> +
> +	if (test_bit(CEPHFS_FEATURE_METRIC_COLLECT, &session->s_features)
> +	    && atomic_dec_return(&mdsc->metric.mds_cnt) == 0)
> +		cancel_delayed_work_sync(&mdsc->metric.delayed_work);
> +
>  	return request_close_session(session);
>  }
>  
> @@ -3310,6 +3315,9 @@ static void handle_session(struct ceph_mds_session *session,
>  		session->s_state = CEPH_MDS_SESSION_OPEN;
>  		session->s_features = features;
>  		renewed_caps(mdsc, session, 0);
> +		if (test_bit(CEPHFS_FEATURE_METRIC_COLLECT, &session->s_features)
> +		    && atomic_inc_return(&mdsc->metric.mds_cnt) == 1)
> +			metric_schedule_delayed(&mdsc->metric);
>  		wake = 1;
>  		if (mdsc->stopping)
>  			__close_session(mdsc, session);
> @@ -3809,6 +3817,10 @@ static void send_mds_reconnect(struct ceph_mds_client *mdsc,
>  	session->s_state = CEPH_MDS_SESSION_RECONNECTING;
>  	session->s_seq = 0;
>  
> +	if (test_bit(CEPHFS_FEATURE_METRIC_COLLECT, &session->s_features)
> +	    && atomic_dec_return(&mdsc->metric.mds_cnt) == 0)
> +		cancel_delayed_work_sync(&mdsc->metric.delayed_work);
> +
>  	dout("session %p state %s\n", session,
>  	     ceph_session_state_name(session->s_state));
>  
> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> index 6147ff0..bc9e959 100644
> --- a/fs/ceph/mds_client.h
> +++ b/fs/ceph/mds_client.h
> @@ -28,8 +28,9 @@ enum ceph_feature_type {
>  	CEPHFS_FEATURE_LAZY_CAP_WANTED,
>  	CEPHFS_FEATURE_MULTI_RECONNECT,
>  	CEPHFS_FEATURE_DELEG_INO,
> +	CEPHFS_FEATURE_METRIC_COLLECT,
>  
> -	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_DELEG_INO,
> +	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_METRIC_COLLECT,
>  };
>  
>  /*
> @@ -43,6 +44,7 @@ enum ceph_feature_type {
>  	CEPHFS_FEATURE_LAZY_CAP_WANTED,		\
>  	CEPHFS_FEATURE_MULTI_RECONNECT,		\
>  	CEPHFS_FEATURE_DELEG_INO,		\
> +	CEPHFS_FEATURE_METRIC_COLLECT,		\
>  						\
>  	CEPHFS_FEATURE_MAX,			\
>  }
> diff --git a/fs/ceph/metric.c b/fs/ceph/metric.c
> index 269eacb..8d93cf6 100644
> --- a/fs/ceph/metric.c
> +++ b/fs/ceph/metric.c
> @@ -1,10 +1,162 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
> +#include <linux/ceph/ceph_debug.h>
>  
>  #include <linux/types.h>
>  #include <linux/percpu_counter.h>
>  #include <linux/math64.h>
>  
>  #include "metric.h"
> +#include "mds_client.h"
> +
> +static bool ceph_mdsc_send_metrics(struct ceph_mds_client *mdsc,
> +				   struct ceph_mds_session *s,
> +				   u64 nr_caps)
> +{
> +	struct ceph_metric_head *head;
> +	struct ceph_metric_cap *cap;
> +	struct ceph_metric_read_latency *read;
> +	struct ceph_metric_write_latency *write;
> +	struct ceph_metric_metadata_latency *meta;
> +	struct ceph_client_metric *m = &mdsc->metric;
> +	struct ceph_msg *msg;
> +	struct timespec64 ts;
> +	s64 sum, total;
> +	s32 items = 0;
> +	s32 len;
> +
> +	len = sizeof(*head) + sizeof(*cap) + sizeof(*read) + sizeof(*write)
> +	      + sizeof(*meta);
> +
> +	msg = ceph_msg_new(CEPH_MSG_CLIENT_METRICS, len, GFP_NOFS, true);
> +	if (!msg) {
> +		pr_err("send metrics to mds%d, failed to allocate message\n",
> +		       s->s_mds);
> +		return false;
> +	}
> +
> +	head = msg->front.iov_base;
> +
> +	/* encode the cap metric */
> +	cap = (struct ceph_metric_cap *)(head + 1);
> +	cap->type = cpu_to_le32(CLIENT_METRIC_TYPE_CAP_INFO);
> +	cap->ver = 1;
> +	cap->compat = 1;
> +	cap->data_len = cpu_to_le32(sizeof(*cap) - 10);
> +	cap->hit = cpu_to_le64(percpu_counter_sum(&mdsc->metric.i_caps_hit));
> +	cap->mis = cpu_to_le64(percpu_counter_sum(&mdsc->metric.i_caps_mis));
> +	cap->total = cpu_to_le64(nr_caps);
> +	items++;
> +
> +	/* encode the read latency metric */
> +	read = (struct ceph_metric_read_latency *)(cap + 1);
> +	read->type = cpu_to_le32(CLIENT_METRIC_TYPE_READ_LATENCY);
> +	read->ver = 1;
> +	read->compat = 1;
> +	read->data_len = cpu_to_le32(sizeof(*read) - 10);
> +	total = m->total_reads;
> +	sum = m->read_latency_sum;
> +	jiffies_to_timespec64(sum, &ts);
> +	read->sec = cpu_to_le32(ts.tv_sec);
> +	read->nsec = cpu_to_le32(ts.tv_nsec);
> +	items++;
> +
> +	/* encode the write latency metric */
> +	write = (struct ceph_metric_write_latency *)(read + 1);
> +	write->type = cpu_to_le32(CLIENT_METRIC_TYPE_WRITE_LATENCY);
> +	write->ver = 1;
> +	write->compat = 1;
> +	write->data_len = cpu_to_le32(sizeof(*write) - 10);
> +	total = m->total_writes;
> +	sum = m->write_latency_sum;
> +	jiffies_to_timespec64(sum, &ts);
> +	write->sec = cpu_to_le32(ts.tv_sec);
> +	write->nsec = cpu_to_le32(ts.tv_nsec);
> +	items++;
> +
> +	/* encode the metadata latency metric */
> +	meta = (struct ceph_metric_metadata_latency *)(write + 1);
> +	meta->type = cpu_to_le32(CLIENT_METRIC_TYPE_METADATA_LATENCY);
> +	meta->ver = 1;
> +	meta->compat = 1;
> +	meta->data_len = cpu_to_le32(sizeof(*meta) - 10);
> +	total = m->total_metadatas;
> +	sum = m->metadata_latency_sum;
> +	jiffies_to_timespec64(sum, &ts);
> +	meta->sec = cpu_to_le32(ts.tv_sec);
> +	meta->nsec = cpu_to_le32(ts.tv_nsec);
> +	items++;
> +
> +	put_unaligned_le32(items, &head->num);
> +	msg->front.iov_len = cpu_to_le32(len);
> +	msg->hdr.version = cpu_to_le16(1);
> +	msg->hdr.compat_version = cpu_to_le16(1);
> +	msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
> +	dout("client%llu send metrics to mds%d\n",
> +	     ceph_client_gid(mdsc->fsc->client), s->s_mds);
> +	ceph_con_send(&s->s_con, msg);
> +
> +	return true;
> +}
> +
> +static struct ceph_mds_session *metric_get_session(struct ceph_mds_client *mdsc)
> +{
> +	struct ceph_mds_session *s;
> +	int i;
> +
> +	mutex_lock(&mdsc->mutex);
> +	for (i = 0; i < mdsc->max_sessions; i++) {
> +		s = __ceph_lookup_mds_session(mdsc, i);
> +		if (!s)
> +			continue;
> +		mutex_unlock(&mdsc->mutex);
> +

Why unlock here? AFAICT, it's safe to call ceph_put_mds_session with the
mdsc->mutex held.

> +		/*
> +		 * Skip it if MDS doesn't support the metric collection,
> +		 * or the MDS will close the session's socket connection
> +		 * directly when it get this message.
> +		 */
> +		if (check_session_state(s) &&
> +		    test_bit(CEPHFS_FEATURE_METRIC_COLLECT, &s->s_features)) {
> +			mdsc->metric.mds = i;
> +			return s;
> +		}
> +		ceph_put_mds_session(s);
> +
> +		mutex_lock(&mdsc->mutex);
> +	}
> +	mutex_unlock(&mdsc->mutex);
> +
> +	return NULL;
> +}
> +
> +static void metric_delayed_work(struct work_struct *work)
> +{
> +	struct ceph_client_metric *m =
> +		container_of(work, struct ceph_client_metric, delayed_work.work);
> +	struct ceph_mds_client *mdsc =
> +		container_of(m, struct ceph_mds_client, metric);
> +	struct ceph_mds_session *s = NULL;
> +	u64 nr_caps = atomic64_read(&m->total_caps);
> +
> +	/* No mds supports the metric collection, will stop the work */
> +	if (!atomic_read(&m->mds_cnt))
> +		return;
> +
> +	mutex_lock(&mdsc->mutex);
> +	s = __ceph_lookup_mds_session(mdsc, m->mds);
> +	mutex_unlock(&mdsc->mutex);


Instead of doing a lookup of the mds every time we need to do this,
would it be better to instead just do a lookup before you first schedule
the work and keep a reference to it until the session state is no longer
good?

With that, you'd only need to take the mutex here if check_session_state
indicated that the session you had saved was no longer good.
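
A rough sketch of that alternative (illustrative only; the cached
m->session field is hypothetical and not part of this series):

    struct ceph_mds_session *s = m->session; /* ref taken when first scheduled */

    if (unlikely(!s || !check_session_state(s) ||
                 !test_bit(CEPHFS_FEATURE_METRIC_COLLECT, &s->s_features))) {
            /* cached session went bad: drop it and pick a new one */
            if (s)
                    ceph_put_mds_session(s);
            s = m->session = metric_get_session(mdsc);
    }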

> +	if (unlikely(!s || !check_session_state(s) ||
> +	    !test_bit(CEPHFS_FEATURE_METRIC_COLLECT, &s->s_features)))
> +		s = metric_get_session(mdsc);
> +

If we do need to keep doing a lookup every time, then it'd probably be
better to do the above check while holding the mdsc->mutex and just have
metric_get_session expect to be called with the mutex already held.

FWIW, mutexes are expensive locks since you can end up having to
schedule() if you can't get it. Minimizing the number of acquisitions
and simply holding them for a little longer is often the more efficient
approach.

> +	if (s) {
> +		/* Only send the metric once in any available session */
> +		ceph_mdsc_send_metrics(mdsc, s, nr_caps);
> +		ceph_put_mds_session(s);
> +	}
> +
> +	metric_schedule_delayed(m);
> +}
>  
>  int ceph_metric_init(struct ceph_client_metric *m)
>  {
> @@ -52,6 +204,11 @@ int ceph_metric_init(struct ceph_client_metric *m)
>  	m->total_metadatas = 0;
>  	m->metadata_latency_sum = 0;
>  
> +	/* We assume the rank 0 support it as default */
> +	m->mds = 0;
> +	atomic_set(&m->mds_cnt, 0);
> +	INIT_DELAYED_WORK(&m->delayed_work, metric_delayed_work);
> +
>  	return 0;
>  
>  err_i_caps_mis:
> diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
> index 23a3373..68e2d17 100644
> --- a/fs/ceph/metric.h
> +++ b/fs/ceph/metric.h
> @@ -6,6 +6,71 @@
>  #include <linux/percpu_counter.h>
>  #include <linux/ktime.h>
>  
> +extern bool disable_send_metrics;
> +
> +enum ceph_metric_type {
> +	CLIENT_METRIC_TYPE_CAP_INFO,
> +	CLIENT_METRIC_TYPE_READ_LATENCY,
> +	CLIENT_METRIC_TYPE_WRITE_LATENCY,
> +	CLIENT_METRIC_TYPE_METADATA_LATENCY,
> +	CLIENT_METRIC_TYPE_DENTRY_LEASE,
> +
> +	CLIENT_METRIC_TYPE_MAX = CLIENT_METRIC_TYPE_DENTRY_LEASE,
> +};
> +
> +/* metric caps header */
> +struct ceph_metric_cap {
> +	__le32 type;     /* ceph metric type */
> +
> +	__u8  ver;
> +	__u8  compat;
> +
> +	__le32 data_len; /* length of sizeof(hit + mis + total) */
> +	__le64 hit;
> +	__le64 mis;
> +	__le64 total;
> +} __packed;
> +
> +/* metric read latency header */
> +struct ceph_metric_read_latency {
> +	__le32 type;     /* ceph metric type */
> +
> +	__u8  ver;
> +	__u8  compat;
> +
> +	__le32 data_len; /* length of sizeof(sec + nsec) */
> +	__le32 sec;
> +	__le32 nsec;
> +} __packed;
> +
> +/* metric write latency header */
> +struct ceph_metric_write_latency {
> +	__le32 type;     /* ceph metric type */
> +
> +	__u8  ver;
> +	__u8  compat;
> +
> +	__le32 data_len; /* length of sizeof(sec + nsec) */
> +	__le32 sec;
> +	__le32 nsec;
> +} __packed;
> +
> +/* metric metadata latency header */
> +struct ceph_metric_metadata_latency {
> +	__le32 type;     /* ceph metric type */
> +
> +	__u8  ver;
> +	__u8  compat;
> +
> +	__le32 data_len; /* length of sizeof(sec + nsec) */
> +	__le32 sec;
> +	__le32 nsec;
> +} __packed;
> +
> +struct ceph_metric_head {
> +	__le32 num;	/* the number of metrics that will be sent */
> +} __packed;
> +
>  /* This is the global metrics */
>  struct ceph_client_metric {
>  	atomic64_t            total_dentries;
> @@ -36,8 +101,25 @@ struct ceph_client_metric {
>  	ktime_t metadata_latency_sq_sum;
>  	ktime_t metadata_latency_min;
>  	ktime_t metadata_latency_max;
> +
> +	int mds; /* the MDS being used to send the metrics to */
> +	atomic_t mds_cnt;  /* how many MDSs support metrics collection */
> +	struct delayed_work delayed_work;  /* delayed work */
>  };
>  
> +static inline void metric_schedule_delayed(struct ceph_client_metric *m)
> +{
> +	/*
> +	 * If send metrics is disabled or no mds support metric
> +	 * collection, will stop the work
> +	 */
> +	if (disable_send_metrics || !atomic_read(&m->mds_cnt))
> +		return;
> +
> +	/* per second */
> +	schedule_delayed_work(&m->delayed_work, round_jiffies_relative(HZ));
> +}
> +
>  extern int ceph_metric_init(struct ceph_client_metric *m);
>  extern void ceph_metric_destroy(struct ceph_client_metric *m);
>  
> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> index c9784eb1..cd33836 100644
> --- a/fs/ceph/super.c
> +++ b/fs/ceph/super.c
> @@ -27,6 +27,9 @@
>  #include <linux/ceph/auth.h>
>  #include <linux/ceph/debugfs.h>
>  
> +static DEFINE_MUTEX(ceph_fsc_lock);

I think this could be a spinlock. None of the operations it protects
look like they can sleep.
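
As a sketch, the definition and the list manipulation would become:

    static DEFINE_SPINLOCK(ceph_fsc_lock);
    static LIST_HEAD(ceph_fsc_list);

    /* e.g. in create_fs_client() */
    spin_lock(&ceph_fsc_lock);
    list_add_tail(&fsc->metric_wakeup, &ceph_fsc_list);
    spin_unlock(&ceph_fsc_lock);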

> +static LIST_HEAD(ceph_fsc_list);
> +
>  /*
>   * Ceph superblock operations
>   *
> @@ -691,6 +694,10 @@ static struct ceph_fs_client *create_fs_client(struct ceph_mount_options *fsopt,
>  	if (!fsc->wb_pagevec_pool)
>  		goto fail_cap_wq;
>  
> +	mutex_lock(&ceph_fsc_lock);
> +	list_add_tail(&fsc->metric_wakeup, &ceph_fsc_list);
> +	mutex_unlock(&ceph_fsc_lock);
> +
>  	return fsc;
>  
>  fail_cap_wq:
> @@ -717,6 +724,10 @@ static void destroy_fs_client(struct ceph_fs_client *fsc)
>  {
>  	dout("destroy_fs_client %p\n", fsc);
>  
> +	mutex_lock(&ceph_fsc_lock);
> +	list_del(&fsc->metric_wakeup);
> +	mutex_unlock(&ceph_fsc_lock);
> +
>  	ceph_mdsc_destroy(fsc);
>  	destroy_workqueue(fsc->inode_wq);
>  	destroy_workqueue(fsc->cap_wq);
> @@ -1282,6 +1293,37 @@ static void __exit exit_ceph(void)
>  	destroy_caches();
>  }
>  
> +static int param_set_metrics(const char *val, const struct kernel_param *kp)
> +{
> +	struct ceph_fs_client *fsc;
> +	int ret;
> +
> +	ret = param_set_bool(val, kp);
> +	if (ret) {
> +		pr_err("Failed to parse sending metrics switch value '%s'\n",
> +		       val);
> +		return ret;
> +	} else if (!disable_send_metrics) {
> +		// wake up all the mds clients
> +		mutex_lock(&ceph_fsc_lock);
> +		list_for_each_entry(fsc, &ceph_fsc_list, metric_wakeup) {
> +			metric_schedule_delayed(&fsc->mdsc->metric);
> +		}
> +		mutex_unlock(&ceph_fsc_lock);
> +	}
> +
> +	return 0;
> +}
> +
> +static const struct kernel_param_ops param_ops_metrics = {
> +	.set = param_set_metrics,
> +	.get = param_get_bool,
> +};
> +
> +bool disable_send_metrics = false;
> +module_param_cb(disable_send_metrics, &param_ops_metrics, &disable_send_metrics, 0644);
> +MODULE_PARM_DESC(disable_send_metrics, "Enable sending perf metrics to ceph cluster (default: on)");
> +
>  module_init(init_ceph);
>  module_exit(exit_ceph);
>  
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index 5a6cdd3..2dcb6a9 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -101,6 +101,8 @@ struct ceph_mount_options {
>  struct ceph_fs_client {
>  	struct super_block *sb;
>  
> +	struct list_head metric_wakeup;
> +
>  	struct ceph_mount_options *mount_options;
>  	struct ceph_client *client;
>  
> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> index ebf5ba6..455e9b9 100644
> --- a/include/linux/ceph/ceph_fs.h
> +++ b/include/linux/ceph/ceph_fs.h
> @@ -130,6 +130,7 @@ struct ceph_dir_layout {
>  #define CEPH_MSG_CLIENT_REQUEST         24
>  #define CEPH_MSG_CLIENT_REQUEST_FORWARD 25
>  #define CEPH_MSG_CLIENT_REPLY           26
> +#define CEPH_MSG_CLIENT_METRICS         29
>  #define CEPH_MSG_CLIENT_CAPS            0x310
>  #define CEPH_MSG_CLIENT_LEASE           0x311
>  #define CEPH_MSG_CLIENT_SNAP            0x312

-- 
Jeff Layton <jlayton@kernel.org>


* Re: [PATCH v5 3/5] ceph: periodically send perf metrics to ceph
  2020-06-30 11:29   ` Jeff Layton
@ 2020-06-30 12:14     ` Xiubo Li
  2020-06-30 14:28       ` Jeff Layton
  0 siblings, 1 reply; 12+ messages in thread
From: Xiubo Li @ 2020-06-30 12:14 UTC (permalink / raw)
  To: Jeff Layton; +Cc: idryomov, zyan, pdonnell, vshankar, ceph-devel

On 2020/6/30 19:29, Jeff Layton wrote:
> On Tue, 2020-06-30 at 03:52 -0400, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> This will send the caps/read/write/metadata metrics to any available
>> MDS only once per second as default, which will be the same as the
>> userland client. It will skip the MDS sessions which don't support
>> the metric collection, or the MDSs will close the socket connections
>> directly when it get an unknown type message.
>>

[...]

>> +static struct ceph_mds_session *metric_get_session(struct ceph_mds_client *mdsc)
>> +{
>> +	struct ceph_mds_session *s;
>> +	int i;
>> +
>> +	mutex_lock(&mdsc->mutex);
>> +	for (i = 0; i < mdsc->max_sessions; i++) {
>> +		s = __ceph_lookup_mds_session(mdsc, i);
>> +		if (!s)
>> +			continue;
>> +		mutex_unlock(&mdsc->mutex);
>> +
> Why unlock here? AFAICT, it's safe to call ceph_put_mds_session with the
> mdsc->mutex held.

Yeah, it is. I just wanted to make the critical section as small as
possible, and the following code doesn't need the lock.

Since the mutex lock acquisition is very expensive, we might not
benefit much from the smaller critical section.

I will fix it.

>
>> +		/*
>> +		 * Skip it if MDS doesn't support the metric collection,
>> +		 * or the MDS will close the session's socket connection
>> +		 * directly when it get this message.
>> +		 */
>> +		if (check_session_state(s) &&
>> +		    test_bit(CEPHFS_FEATURE_METRIC_COLLECT, &s->s_features)) {
>> +			mdsc->metric.mds = i;
>> +			return s;
>> +		}
>> +		ceph_put_mds_session(s);
>> +
>> +		mutex_lock(&mdsc->mutex);
>> +	}
>> +	mutex_unlock(&mdsc->mutex);
>> +
>> +	return NULL;
>> +}
>> +
>> +static void metric_delayed_work(struct work_struct *work)
>> +{
>> +	struct ceph_client_metric *m =
>> +		container_of(work, struct ceph_client_metric, delayed_work.work);
>> +	struct ceph_mds_client *mdsc =
>> +		container_of(m, struct ceph_mds_client, metric);
>> +	struct ceph_mds_session *s = NULL;
>> +	u64 nr_caps = atomic64_read(&m->total_caps);
>> +
>> +	/* No mds supports the metric collection, will stop the work */
>> +	if (!atomic_read(&m->mds_cnt))
>> +		return;
>> +
>> +	mutex_lock(&mdsc->mutex);
>> +	s = __ceph_lookup_mds_session(mdsc, m->mds);
>> +	mutex_unlock(&mdsc->mutex);
>
> Instead of doing a lookup of the mds every time we need to do this,
> would it be better to instead just do a lookup before you first schedule
> the work and keep a reference to it until the session state is no longer
> good?
>
> With that, you'd only need to take the mutex here if check_session_state
> indicated that the session you had saved was no longer good.

This sounds very cool, and with this we can get rid of the mutex lock
in the normal case.


>
>> +	if (unlikely(!s || !check_session_state(s) ||
>> +	    !test_bit(CEPHFS_FEATURE_METRIC_COLLECT, &s->s_features)))
>> +		s = metric_get_session(mdsc);
>> +
> If we do need to keep doing a lookup every time, then it'd probably be
> better to do the above check while holding the mdsc->mutex and just have
> metric_get_session expect to be called with the mutex already held.
>
> FWIW, mutexes are expensive locks since you can end up having to
> schedule() if you can't get it. Minimizing the number of acquisitions
> and simply holding them for a little longer is often the more efficient
> approach.

Okay, will do that.

[...]

>
>>   
>> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
>> index c9784eb1..cd33836 100644
>> --- a/fs/ceph/super.c
>> +++ b/fs/ceph/super.c
>> @@ -27,6 +27,9 @@
>>   #include <linux/ceph/auth.h>
>>   #include <linux/ceph/debugfs.h>
>>   
>> +static DEFINE_MUTEX(ceph_fsc_lock);
> I think this could be a spinlock. None of the operations it protects
> look like they can sleep.

Will fix it.

Thanks.


* Re: [PATCH v5 0/5] ceph: periodically send perf metrics to ceph
  2020-06-30  7:52 [PATCH v5 0/5] ceph: periodically send perf metrics to ceph xiubli
                   ` (4 preceding siblings ...)
  2020-06-30  7:52 ` [PATCH v5 5/5] ceph: send client provided metric flags in client metadata xiubli
@ 2020-06-30 13:02 ` Jeff Layton
  2020-06-30 13:09   ` Xiubo Li
  5 siblings, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2020-06-30 13:02 UTC (permalink / raw)
  To: xiubli; +Cc: idryomov, zyan, pdonnell, vshankar, ceph-devel

On Tue, 2020-06-30 at 03:52 -0400, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> [...]
> 

Hi Xiubo,

I'm going to go ahead and merge patches 1,2 and 4 out of this series.
They look like they should stand just fine on their own, and we can
focus on the last two stats patches in the series that way.

Let me know if you'd rather I not.

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 0/5] ceph: periodically send perf metrics to ceph
  2020-06-30 13:02 ` [PATCH v5 0/5] ceph: periodically send perf metrics to ceph Jeff Layton
@ 2020-06-30 13:09   ` Xiubo Li
  0 siblings, 0 replies; 12+ messages in thread
From: Xiubo Li @ 2020-06-30 13:09 UTC (permalink / raw)
  To: Jeff Layton; +Cc: idryomov, zyan, pdonnell, vshankar, ceph-devel

On 2020/6/30 21:02, Jeff Layton wrote:
> On Tue, 2020-06-30 at 03:52 -0400, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> [...]
>>
> Hi Xiubo,
>
> I'm going to go ahead and merge patches 1,2 and 4 out of this series.
> They look like they should stand just fine on their own, and we can
> focus on the last two stats patches in the series that way.
>
> Let me know if you'd rather I not.

Sure, go ahead.

Thanks Jeff.


>
> Thanks,

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 3/5] ceph: periodically send perf metrics to ceph
  2020-06-30 12:14     ` Xiubo Li
@ 2020-06-30 14:28       ` Jeff Layton
  2020-07-01  5:59         ` Xiubo Li
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2020-06-30 14:28 UTC (permalink / raw)
  To: Xiubo Li; +Cc: idryomov, zyan, pdonnell, vshankar, ceph-devel

On Tue, 2020-06-30 at 20:14 +0800, Xiubo Li wrote:
> On 2020/6/30 19:29, Jeff Layton wrote:
> > On Tue, 2020-06-30 at 03:52 -0400, xiubli@redhat.com wrote:
> > > From: Xiubo Li <xiubli@redhat.com>
> > > 
> > > This will send the caps/read/write/metadata metrics to any available
> > > MDS only once per second as default, which will be the same as the
> > > userland client. It will skip the MDS sessions which don't support
> > > the metric collection, or the MDSs will close the socket connections
> > > directly when it get an unknown type message.
> > > 
> 
> [...]
> 
> > > +static struct ceph_mds_session *metric_get_session(struct ceph_mds_client *mdsc)
> > > +{
> > > +	struct ceph_mds_session *s;
> > > +	int i;
> > > +
> > > +	mutex_lock(&mdsc->mutex);
> > > +	for (i = 0; i < mdsc->max_sessions; i++) {
> > > +		s = __ceph_lookup_mds_session(mdsc, i);
> > > +		if (!s)
> > > +			continue;
> > > +		mutex_unlock(&mdsc->mutex);
> > > +
> > Why unlock here? AFAICT, it's safe to call ceph_put_mds_session with the
> > mdsc->mutex held.
> 
> Yeah, it is. I just wanted to make the critical section as small as
> possible, and the following code doesn't need the lock.
> 
> But given that acquiring the mutex lock is very expensive, we might not
> benefit much from the smaller critical section.
>
> I will fix it.
> 

Generally small critical sections are preferred, but almost all of these
are in-memory operations. None of that should sleep, so we're almost
certainly better off with less lock thrashing. If the lock isn't
contended, then no harm is done. If it is, then we're better off not
letting the cacheline bounce.
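
IOW, the loop can just hold the mutex for its whole duration, e.g.
(sketch based on the quoted code; if metric_get_session() ends up being
called with the mutex already held, the lock/unlock simply moves to the
caller):

static struct ceph_mds_session *metric_get_session(struct ceph_mds_client *mdsc)
{
	struct ceph_mds_session *s;
	int i;

	mutex_lock(&mdsc->mutex);
	for (i = 0; i < mdsc->max_sessions; i++) {
		s = __ceph_lookup_mds_session(mdsc, i);
		if (!s)
			continue;

		/*
		 * Skip it if the MDS doesn't support the metric collection,
		 * or it will close the session's socket connection when it
		 * gets this message.
		 */
		if (check_session_state(s) &&
		    test_bit(CEPHFS_FEATURE_METRIC_COLLECT, &s->s_features)) {
			mdsc->metric.mds = i;
			mutex_unlock(&mdsc->mutex);
			return s;
		}

		/* putting the session with the mutex still held is fine */
		ceph_put_mds_session(s);
	}
	mutex_unlock(&mdsc->mutex);

	return NULL;
}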


> > > +		/*
> > > +		 * Skip it if MDS doesn't support the metric collection,
> > > +		 * or the MDS will close the session's socket connection
> > > +		 * directly when it get this message.
> > > +		 */
> > > +		if (check_session_state(s) &&
> > > +		    test_bit(CEPHFS_FEATURE_METRIC_COLLECT, &s->s_features)) {
> > > +			mdsc->metric.mds = i;
> > > +			return s;
> > > +		}
> > > +		ceph_put_mds_session(s);
> > > +
> > > +		mutex_lock(&mdsc->mutex);
> > > +	}
> > > +	mutex_unlock(&mdsc->mutex);
> > > +
> > > +	return NULL;
> > > +}
> > > +
> > > +static void metric_delayed_work(struct work_struct *work)
> > > +{
> > > +	struct ceph_client_metric *m =
> > > +		container_of(work, struct ceph_client_metric, delayed_work.work);
> > > +	struct ceph_mds_client *mdsc =
> > > +		container_of(m, struct ceph_mds_client, metric);
> > > +	struct ceph_mds_session *s = NULL;
> > > +	u64 nr_caps = atomic64_read(&m->total_caps);
> > > +
> > > +	/* No mds supports the metric collection, will stop the work */
> > > +	if (!atomic_read(&m->mds_cnt))
> > > +		return;
> > > +
> > > +	mutex_lock(&mdsc->mutex);
> > > +	s = __ceph_lookup_mds_session(mdsc, m->mds);
> > > +	mutex_unlock(&mdsc->mutex);
> > 
> > Instead of doing a lookup of the mds every time we need to do this,
> > would it be better to instead just do a lookup before you first schedule
> > the work and keep a reference to it until the session state is no longer
> > good?
> > 
> > With that, you'd only need to take the mutex here if check_session_state
> > indicated that the session you had saved was no longer good.
> 
> This sounds very cool, and with this we can get rid of the mutex lock in
> the normal case.
> 

Yeah. I think the code could be simplified in other ways too.

You have per-mdsc work now, so there's no problem scheduling the work
more than once. You can just schedule it any time you get a new session
that has the feature flag set.

Keep a metrics session pointer in the mdsc->metric, and start with it
set to NULL. When the work runs, do a lookup if the pointer is NULL or
if the current one isn't valid any more. At the end, only reschedule the
work if we found a suitable session.

That should eliminate the need for the mdsc.metric->mds_cnt counter.
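
Roughly (just a sketch to illustrate -- the helper name, the "session"
pointer in ceph_client_metric and the interval are placeholders):

/* call this whenever a new session with CEPHFS_FEATURE_METRIC_COLLECT appears */
void metric_schedule_delayed(struct ceph_mds_client *mdsc)
{
	if (disable_send_metrics)
		return;

	/* the work is per-mdsc, so scheduling it again is harmless */
	schedule_delayed_work(&mdsc->metric.delayed_work,
			      round_jiffies_relative(HZ));
}

...and at the end of metric_delayed_work(), after the (re)lookup:

	if (!s)
		return;		/* nothing suitable found -- let the work lapse */

	/* ... send the metrics to s ... */
	schedule_delayed_work(&m->delayed_work, round_jiffies_relative(HZ));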

> > > +	if (unlikely(!s || !check_session_state(s) ||
> > > +	    !test_bit(CEPHFS_FEATURE_METRIC_COLLECT, &s->s_features)))
> > > +		s = metric_get_session(mdsc);
> > > +
> > If we do need to keep doing a lookup every time, then it'd probably be
> > better to do the above check while holding the mdsc->mutex and just have
> > metric_get_session expect to be called with the mutex already held.
> > 
> > FWIW, mutexes are expensive locks since you can end up having to
> > schedule() if you can't get it. Minimizing the number of acquisitions
> > and simply holding them for a little longer is often the more efficient
> > approach.
> 
> Okay, will do that.
> 
> [...]
> 
> > >   
> > > diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> > > index c9784eb1..cd33836 100644
> > > --- a/fs/ceph/super.c
> > > +++ b/fs/ceph/super.c
> > > @@ -27,6 +27,9 @@
> > >   #include <linux/ceph/auth.h>
> > >   #include <linux/ceph/debugfs.h>
> > >   
> > > +static DEFINE_MUTEX(ceph_fsc_lock);
> > I think this could be a spinlock. None of the operations it protects
> > look like they can sleep.
> 
> Will fix it.
> 
> Thanks.
> 
> 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 3/5] ceph: periodically send perf metrics to ceph
  2020-06-30 14:28       ` Jeff Layton
@ 2020-07-01  5:59         ` Xiubo Li
  0 siblings, 0 replies; 12+ messages in thread
From: Xiubo Li @ 2020-07-01  5:59 UTC (permalink / raw)
  To: Jeff Layton; +Cc: idryomov, zyan, pdonnell, vshankar, ceph-devel

On 2020/6/30 22:28, Jeff Layton wrote:
> On Tue, 2020-06-30 at 20:14 +0800, Xiubo Li wrote:
>> On 2020/6/30 19:29, Jeff Layton wrote:
>>> On Tue, 2020-06-30 at 03:52 -0400, xiubli@redhat.com wrote:
>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>
>>>> This will send the caps/read/write/metadata metrics to any available
>>>> MDS only once per second as default, which will be the same as the
>>>> userland client. It will skip the MDS sessions which don't support
>>>> the metric collection, or the MDSs will close the socket connections
>>>> directly when it get an unknown type message.
>>>>
>> [...]
>>
>>>> +static struct ceph_mds_session *metric_get_session(struct ceph_mds_client *mdsc)
>>>> +{
>>>> +	struct ceph_mds_session *s;
>>>> +	int i;
>>>> +
>>>> +	mutex_lock(&mdsc->mutex);
>>>> +	for (i = 0; i < mdsc->max_sessions; i++) {
>>>> +		s = __ceph_lookup_mds_session(mdsc, i);
>>>> +		if (!s)
>>>> +			continue;
>>>> +		mutex_unlock(&mdsc->mutex);
>>>> +
>>> Why unlock here? AFAICT, it's safe to call ceph_put_mds_session with the
>>> mdsc->mutex held.
>> Yeah, it is. I just wanted to make the critical section as small as
>> possible, and the following code doesn't need the lock.
>>
>> But given that acquiring the mutex lock is very expensive, we might not
>> benefit much from the smaller critical section.
>>
>> I will fix it.
>>
> Generally small critical sections are preferred, but almost all of these
> are in-memory operations. None of that should sleep, so we're almost
> certainly better off with less lock thrashing. If the lock isn't
> contended, then no harm is done. If it is, then we're better off not
> letting the cacheline bounce.

Agree.


>
>
>>>> +		/*
>>>> +		 * Skip it if MDS doesn't support the metric collection,
>>>> +		 * or the MDS will close the session's socket connection
>>>> +		 * directly when it get this message.
>>>> +		 */
>>>> +		if (check_session_state(s) &&
>>>> +		    test_bit(CEPHFS_FEATURE_METRIC_COLLECT, &s->s_features)) {
>>>> +			mdsc->metric.mds = i;
>>>> +			return s;
>>>> +		}
>>>> +		ceph_put_mds_session(s);
>>>> +
>>>> +		mutex_lock(&mdsc->mutex);
>>>> +	}
>>>> +	mutex_unlock(&mdsc->mutex);
>>>> +
>>>> +	return NULL;
>>>> +}
>>>> +
>>>> +static void metric_delayed_work(struct work_struct *work)
>>>> +{
>>>> +	struct ceph_client_metric *m =
>>>> +		container_of(work, struct ceph_client_metric, delayed_work.work);
>>>> +	struct ceph_mds_client *mdsc =
>>>> +		container_of(m, struct ceph_mds_client, metric);
>>>> +	struct ceph_mds_session *s = NULL;
>>>> +	u64 nr_caps = atomic64_read(&m->total_caps);
>>>> +
>>>> +	/* No mds supports the metric collection, will stop the work */
>>>> +	if (!atomic_read(&m->mds_cnt))
>>>> +		return;
>>>> +
>>>> +	mutex_lock(&mdsc->mutex);
>>>> +	s = __ceph_lookup_mds_session(mdsc, m->mds);
>>>> +	mutex_unlock(&mdsc->mutex);
>>> Instead of doing a lookup of the mds every time we need to do this,
>>> would it be better to instead just do a lookup before you first schedule
>>> the work and keep a reference to it until the session state is no longer
>>> good?
>>>
>>> With that, you'd only need to take the mutex here if check_session_state
>>> indicated that the session you had saved was no longer good.
>> This sounds very cool, and with this we can get rid of the mutex lock in
>> the normal case.
>>
> Yeah. I think the code could be simplified in other ways too.
>
> You have per-mdsc work now, so there's no problem scheduling the work
> more than once. You can just schedule it any time you get a new session
> that has the feature flag set.
>
> Keep a metrics session pointer in the mdsc->metric, and start with it
> set to NULL. When the work runs, do a lookup if the pointer is NULL or
> if the current one isn't valid any more. At the end, only reschedule the
> work if we found a suitable session.
>
> That should eliminate the need for the mdsc.metric->mds_cnt counter.
Yeah, this is almost the same as what I did, except mine still keeps the
mds_cnt, and I will try to get rid of it.

Thanks


[...]

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-07-01  5:59 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-30  7:52 [PATCH v5 0/5] ceph: periodically send perf metrics to ceph xiubli
2020-06-30  7:52 ` [PATCH v5 1/5] ceph: add check_session_state helper and make it global xiubli
2020-06-30  7:52 ` [PATCH v5 2/5] ceph: add global total_caps to count the mdsc's total caps number xiubli
2020-06-30  7:52 ` [PATCH v5 3/5] ceph: periodically send perf metrics to ceph xiubli
2020-06-30 11:29   ` Jeff Layton
2020-06-30 12:14     ` Xiubo Li
2020-06-30 14:28       ` Jeff Layton
2020-07-01  5:59         ` Xiubo Li
2020-06-30  7:52 ` [PATCH v5 4/5] ceph: switch to WARN_ON_ONCE and bubble up errnos to the callers xiubli
2020-06-30  7:52 ` [PATCH v5 5/5] ceph: send client provided metric flags in client metadata xiubli
2020-06-30 13:02 ` [PATCH v5 0/5] ceph: periodically send perf metrics to ceph Jeff Layton
2020-06-30 13:09   ` Xiubo Li
