* [PATCH v12 0/4] ceph: add perf metrics support
@ 2020-03-19 14:06 xiubli
  2020-03-19 14:06 ` [PATCH v12 1/4] ceph: add dentry lease metric support xiubli
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: xiubli @ 2020-03-19 14:06 UTC (permalink / raw)
  To: jlayton; +Cc: sage, idryomov, gfarnum, zyan, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Changed in V12:
- [3/4] and [4/4] switch the lat sum and total counters from atomic64_t to u64

Changed in V11:
- [3/4] and [4/4] fold the min/max/stdev factors

Changed in V10:
- rebase to the latest testing branch
- merge all the metric related patches into one
- [1/6] move metric helpers into a new file metric.c
- [2/6] move metric helpers into metric.c
- [3/6] merge the read/write patches into a single patch and move metric helpers to metric.c
- [4/6] move metric helpers to metric.c
- [5/6] min/max latency support
- [6/6] standard deviation support

Changed in V9:
- add an r_ended field to the mds request struct and use that to calculate the metric
- fix some commit comments

# cat /sys/kernel/debug/ceph/9a972bfc-68cb-4d52-a610-7cd9a9adbbdd.client52904/metrics
item          total       avg_lat(us)     min_lat(us)     max_lat(us)     stdev(us)
-----------------------------------------------------------------------------------
read          798         32000           4000            196000          560.3
write         2394        588000          28000           4812000         36673.9
metadata      7           116000          2000            707000          8282.8

item          total           miss            hit
-------------------------------------------------
d_lease       2               0               0
caps          2               14              546500




Xiubo Li (4):
  ceph: add dentry lease metric support
  ceph: add caps perf metric for each superblock
  ceph: add read/write latency metric support
  ceph: add metadata perf metric support

 fs/ceph/Makefile                |   2 +-
 fs/ceph/acl.c                   |   2 +-
 fs/ceph/addr.c                  |  18 ++++
 fs/ceph/caps.c                  |  19 ++++
 fs/ceph/debugfs.c               | 116 +++++++++++++++++++++++-
 fs/ceph/dir.c                   |  17 +++-
 fs/ceph/file.c                  |  26 ++++++
 fs/ceph/inode.c                 |   4 +-
 fs/ceph/mds_client.c            |  21 ++++-
 fs/ceph/mds_client.h            |   7 +-
 fs/ceph/metric.c                | 194 ++++++++++++++++++++++++++++++++++++++++
 fs/ceph/metric.h                |  64 +++++++++++++
 fs/ceph/super.h                 |   9 +-
 fs/ceph/xattr.c                 |   4 +-
 include/linux/ceph/osd_client.h |   1 +
 net/ceph/osd_client.c           |   2 +
 16 files changed, 488 insertions(+), 18 deletions(-)
 create mode 100644 fs/ceph/metric.c
 create mode 100644 fs/ceph/metric.h

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v12 1/4] ceph: add dentry lease metric support
  2020-03-19 14:06 [PATCH v12 0/4] ceph: add perf metrics support xiubli
@ 2020-03-19 14:06 ` xiubli
  2020-03-19 14:06 ` [PATCH v12 2/4] ceph: add caps perf metric for each superblock xiubli
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: xiubli @ 2020-03-19 14:06 UTC (permalink / raw)
  To: jlayton; +Cc: sage, idryomov, gfarnum, zyan, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

For dentry leases, only count the hit/miss info triggered from VFS
calls; ignore cases like request reply handling and
ceph_trim_dentries().

For now, these are only viewable using debugfs. Future patches will
allow the client to send the stats to the MDS.

The output looks like:

item          total           miss            hit
-------------------------------------------------
d_lease       11              7               141
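
For illustration only (not part of this patch), a hit ratio could be
derived from the same percpu counters that metric_show() sums up; the
helper below is a hypothetical sketch and assumes <linux/math64.h> for
div64_u64():

	/* Hypothetical: dentry lease hit ratio, in percent. */
	static u64 d_lease_hit_ratio_pct(struct ceph_client_metric *m)
	{
		s64 hit = percpu_counter_sum(&m->d_lease_hit);
		s64 mis = percpu_counter_sum(&m->d_lease_mis);

		return (hit + mis) ? div64_u64(hit * 100, hit + mis) : 0;
	}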

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/Makefile     |  2 +-
 fs/ceph/debugfs.c    | 33 +++++++++++++++++++++++++++++----
 fs/ceph/dir.c        | 12 ++++++++++++
 fs/ceph/mds_client.c | 16 ++++++++++++++--
 fs/ceph/mds_client.h |  4 ++++
 fs/ceph/metric.c     | 35 +++++++++++++++++++++++++++++++++++
 fs/ceph/metric.h     | 17 +++++++++++++++++
 fs/ceph/super.h      |  1 +
 8 files changed, 113 insertions(+), 7 deletions(-)
 create mode 100644 fs/ceph/metric.c
 create mode 100644 fs/ceph/metric.h

diff --git a/fs/ceph/Makefile b/fs/ceph/Makefile
index 0a0823d..50c635d 100644
--- a/fs/ceph/Makefile
+++ b/fs/ceph/Makefile
@@ -8,7 +8,7 @@ obj-$(CONFIG_CEPH_FS) += ceph.o
 ceph-y := super.o inode.o dir.o file.o locks.o addr.o ioctl.o \
 	export.o caps.o snap.o xattr.o quota.o io.o \
 	mds_client.o mdsmap.o strings.o ceph_frag.o \
-	debugfs.o util.o
+	debugfs.o util.o metric.o
 
 ceph-$(CONFIG_CEPH_FSCACHE) += cache.o
 ceph-$(CONFIG_CEPH_FS_POSIX_ACL) += acl.o
diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index 481ac97..60cdb8f 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -124,6 +124,23 @@ static int mdsc_show(struct seq_file *s, void *p)
 	return 0;
 }
 
+static int metric_show(struct seq_file *s, void *p)
+{
+	struct ceph_fs_client *fsc = s->private;
+	struct ceph_mds_client *mdsc = fsc->mdsc;
+	struct ceph_client_metric *m = &mdsc->metric;
+
+	seq_printf(s, "item          total           miss            hit\n");
+	seq_printf(s, "-------------------------------------------------\n");
+
+	seq_printf(s, "%-14s%-16lld%-16lld%lld\n", "d_lease",
+		   atomic64_read(&m->total_dentries),
+		   percpu_counter_sum(&m->d_lease_mis),
+		   percpu_counter_sum(&m->d_lease_hit));
+
+	return 0;
+}
+
 static int caps_show_cb(struct inode *inode, struct ceph_cap *cap, void *p)
 {
 	struct seq_file *s = p;
@@ -222,6 +239,7 @@ static int mds_sessions_show(struct seq_file *s, void *ptr)
 DEFINE_SHOW_ATTRIBUTE(mdsc);
 DEFINE_SHOW_ATTRIBUTE(caps);
 DEFINE_SHOW_ATTRIBUTE(mds_sessions);
+DEFINE_SHOW_ATTRIBUTE(metric);
 
 
 /*
@@ -255,6 +273,7 @@ void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc)
 	debugfs_remove(fsc->debugfs_mdsmap);
 	debugfs_remove(fsc->debugfs_mds_sessions);
 	debugfs_remove(fsc->debugfs_caps);
+	debugfs_remove(fsc->debugfs_metric);
 	debugfs_remove(fsc->debugfs_mdsc);
 }
 
@@ -295,11 +314,17 @@ void ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
 						fsc,
 						&mdsc_fops);
 
+	fsc->debugfs_metric = debugfs_create_file("metrics",
+						  0400,
+						  fsc->client->debugfs_dir,
+						  fsc,
+						  &metric_fops);
+
 	fsc->debugfs_caps = debugfs_create_file("caps",
-						   0400,
-						   fsc->client->debugfs_dir,
-						   fsc,
-						   &caps_fops);
+						0400,
+						fsc->client->debugfs_dir,
+						fsc,
+						&caps_fops);
 }
 
 
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index d594c26..8097a86 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -38,6 +38,8 @@
 static int ceph_d_init(struct dentry *dentry)
 {
 	struct ceph_dentry_info *di;
+	struct ceph_fs_client *fsc = ceph_sb_to_client(dentry->d_sb);
+	struct ceph_mds_client *mdsc = fsc->mdsc;
 
 	di = kmem_cache_zalloc(ceph_dentry_cachep, GFP_KERNEL);
 	if (!di)
@@ -48,6 +50,9 @@ static int ceph_d_init(struct dentry *dentry)
 	di->time = jiffies;
 	dentry->d_fsdata = di;
 	INIT_LIST_HEAD(&di->lease_list);
+
+	atomic64_inc(&mdsc->metric.total_dentries);
+
 	return 0;
 }
 
@@ -1709,6 +1714,8 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 		if (flags & LOOKUP_RCU)
 			return -ECHILD;
 
+		percpu_counter_inc(&mdsc->metric.d_lease_mis);
+
 		op = ceph_snap(dir) == CEPH_SNAPDIR ?
 			CEPH_MDS_OP_LOOKUPSNAP : CEPH_MDS_OP_LOOKUP;
 		req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
@@ -1740,6 +1747,8 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 			dout("d_revalidate %p lookup result=%d\n",
 			     dentry, err);
 		}
+	} else {
+		percpu_counter_inc(&mdsc->metric.d_lease_hit);
 	}
 
 	dout("d_revalidate %p %s\n", dentry, valid ? "valid" : "invalid");
@@ -1782,9 +1791,12 @@ static int ceph_d_delete(const struct dentry *dentry)
 static void ceph_d_release(struct dentry *dentry)
 {
 	struct ceph_dentry_info *di = ceph_dentry(dentry);
+	struct ceph_fs_client *fsc = ceph_sb_to_client(dentry->d_sb);
 
 	dout("d_release %p\n", dentry);
 
+	atomic64_dec(&fsc->mdsc->metric.total_dentries);
+
 	spin_lock(&dentry->d_lock);
 	__dentry_lease_unlist(di);
 	dentry->d_fsdata = NULL;
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 486f91f..1e242d8 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -4325,6 +4325,7 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
 
 {
 	struct ceph_mds_client *mdsc;
+	int err;
 
 	mdsc = kzalloc(sizeof(struct ceph_mds_client), GFP_NOFS);
 	if (!mdsc)
@@ -4333,8 +4334,8 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
 	mutex_init(&mdsc->mutex);
 	mdsc->mdsmap = kzalloc(sizeof(*mdsc->mdsmap), GFP_NOFS);
 	if (!mdsc->mdsmap) {
-		kfree(mdsc);
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto err_mdsc;
 	}
 
 	fsc->mdsc = mdsc;
@@ -4373,6 +4374,9 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
 	init_waitqueue_head(&mdsc->cap_flushing_wq);
 	INIT_WORK(&mdsc->cap_reclaim_work, ceph_cap_reclaim_work);
 	atomic_set(&mdsc->cap_reclaim_pending, 0);
+	err = ceph_metric_init(&mdsc->metric);
+	if (err)
+		goto err_mdsmap;
 
 	spin_lock_init(&mdsc->dentry_list_lock);
 	INIT_LIST_HEAD(&mdsc->dentry_leases);
@@ -4391,6 +4395,12 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
 	strscpy(mdsc->nodename, utsname()->nodename,
 		sizeof(mdsc->nodename));
 	return 0;
+
+err_mdsmap:
+	kfree(mdsc->mdsmap);
+err_mdsc:
+	kfree(mdsc);
+	return err;
 }
 
 /*
@@ -4648,6 +4658,8 @@ void ceph_mdsc_destroy(struct ceph_fs_client *fsc)
 
 	ceph_mdsc_stop(mdsc);
 
+	ceph_metric_destroy(&mdsc->metric);
+
 	fsc->mdsc = NULL;
 	kfree(mdsc);
 	dout("mdsc_destroy %p done\n", mdsc);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 4e5be79b..ae1d01c 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -16,6 +16,8 @@
 #include <linux/ceph/mdsmap.h>
 #include <linux/ceph/auth.h>
 
+#include "metric.h"
+
 /* The first 8 bits are reserved for old ceph releases */
 enum ceph_feature_type {
 	CEPHFS_FEATURE_MIMIC = 8,
@@ -454,6 +456,8 @@ struct ceph_mds_client {
 	struct list_head  dentry_leases;     /* fifo list */
 	struct list_head  dentry_dir_leases; /* lru list */
 
+	struct ceph_client_metric metric;
+
 	spinlock_t		snapid_map_lock;
 	struct rb_root		snapid_map_tree;
 	struct list_head	snapid_map_lru;
diff --git a/fs/ceph/metric.c b/fs/ceph/metric.c
new file mode 100644
index 0000000..873a376
--- /dev/null
+++ b/fs/ceph/metric.c
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/types.h>
+#include <linux/percpu_counter.h>
+
+#include "metric.h"
+
+int ceph_metric_init(struct ceph_client_metric *m)
+{
+	int ret;
+
+	if (!m)
+		return -EINVAL;
+
+	atomic64_set(&m->total_dentries, 0);
+	ret = percpu_counter_init(&m->d_lease_hit, 0, GFP_KERNEL);
+	if (ret)
+		return ret;
+	ret = percpu_counter_init(&m->d_lease_mis, 0, GFP_KERNEL);
+	if (ret) {
+		percpu_counter_destroy(&m->d_lease_hit);
+		return ret;
+	}
+
+	return 0;
+}
+
+void ceph_metric_destroy(struct ceph_client_metric *m)
+{
+	if (!m)
+		return;
+
+	percpu_counter_destroy(&m->d_lease_mis);
+	percpu_counter_destroy(&m->d_lease_hit);
+}
diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
new file mode 100644
index 0000000..da725da
--- /dev/null
+++ b/fs/ceph/metric.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _FS_CEPH_MDS_METRIC_H
+#define _FS_CEPH_MDS_METRIC_H
+
+#include <linux/types.h>
+#include <linux/percpu_counter.h>
+
+/* This is the global metrics structure */
+struct ceph_client_metric {
+	atomic64_t            total_dentries;
+	struct percpu_counter d_lease_hit;
+	struct percpu_counter d_lease_mis;
+};
+
+extern int ceph_metric_init(struct ceph_client_metric *m);
+extern void ceph_metric_destroy(struct ceph_client_metric *m);
+#endif /* _FS_CEPH_MDS_METRIC_H */
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 60aac3a..5c73cf1 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -128,6 +128,7 @@ struct ceph_fs_client {
 	struct dentry *debugfs_congestion_kb;
 	struct dentry *debugfs_bdi;
 	struct dentry *debugfs_mdsc, *debugfs_mdsmap;
+	struct dentry *debugfs_metric;
 	struct dentry *debugfs_mds_sessions;
 #endif
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v12 2/4] ceph: add caps perf metric for each superblock
  2020-03-19 14:06 [PATCH v12 0/4] ceph: add perf metrics support xiubli
  2020-03-19 14:06 ` [PATCH v12 1/4] ceph: add dentry lease metric support xiubli
@ 2020-03-19 14:06 ` xiubli
  2020-03-19 14:06 ` [PATCH v12 3/4] ceph: add read/write latency metric support xiubli
  2020-03-19 14:06 ` [PATCH v12 4/4] ceph: add metadata perf " xiubli
  3 siblings, 0 replies; 10+ messages in thread
From: xiubli @ 2020-03-19 14:06 UTC (permalink / raw)
  To: jlayton; +Cc: sage, idryomov, gfarnum, zyan, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Count hits and misses in the caps cache. If the client has all of
the necessary caps when a task needs references, then it's counted
as a hit. Any other situation is a miss.
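
The rule reduces to a single issued-mask check. As a hypothetical
sketch (the real accounting sits in __ceph_caps_issued_mask_metric()
and try_get_cap_refs() in the hunks below):

	/* Sketch only; caller must hold ci->i_ceph_lock. */
	static void account_caps_check(struct ceph_inode_info *ci, int mask,
				       struct ceph_client_metric *metric)
	{
		if (__ceph_caps_issued_mask(ci, mask, 0))	/* all needed caps */
			ceph_update_cap_hit(metric);
		else						/* anything else */
			ceph_update_cap_mis(metric);
	}

Note that try_get_cap_refs() updates neither counter when it fails with
an error; only a clean hit (1) or miss (0) is counted.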

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/acl.c     |  2 +-
 fs/ceph/caps.c    | 19 +++++++++++++++++++
 fs/ceph/debugfs.c | 16 ++++++++++++++++
 fs/ceph/dir.c     |  5 +++--
 fs/ceph/inode.c   |  4 ++--
 fs/ceph/metric.c  | 26 ++++++++++++++++++++++----
 fs/ceph/metric.h  | 13 +++++++++++++
 fs/ceph/super.h   |  8 +++++---
 fs/ceph/xattr.c   |  4 ++--
 9 files changed, 83 insertions(+), 14 deletions(-)

diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
index 26be652..e046574 100644
--- a/fs/ceph/acl.c
+++ b/fs/ceph/acl.c
@@ -22,7 +22,7 @@ static inline void ceph_set_cached_acl(struct inode *inode,
 	struct ceph_inode_info *ci = ceph_inode(inode);
 
 	spin_lock(&ci->i_ceph_lock);
-	if (__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 0))
+	if (__ceph_caps_issued_mask_metric(ci, CEPH_CAP_XATTR_SHARED, 0))
 		set_cached_acl(inode, type, acl);
 	else
 		forget_cached_acl(inode, type);
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 185db76..7794338 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -912,6 +912,20 @@ int __ceph_caps_issued_mask(struct ceph_inode_info *ci, int mask, int touch)
 	return 0;
 }
 
+int __ceph_caps_issued_mask_metric(struct ceph_inode_info *ci, int mask,
+				   int touch)
+{
+	struct ceph_fs_client *fsc = ceph_sb_to_client(ci->vfs_inode.i_sb);
+	int r;
+
+	r = __ceph_caps_issued_mask(ci, mask, touch);
+	if (r)
+		ceph_update_cap_hit(&fsc->mdsc->metric);
+	else
+		ceph_update_cap_mis(&fsc->mdsc->metric);
+	return r;
+}
+
 /*
  * Return true if mask caps are currently being revoked by an MDS.
  */
@@ -2685,6 +2699,11 @@ static int try_get_cap_refs(struct inode *inode, int need, int want,
 	if (snap_rwsem_locked)
 		up_read(&mdsc->snap_rwsem);
 
+	if (!ret)
+		ceph_update_cap_mis(&mdsc->metric);
+	else if (ret == 1)
+		ceph_update_cap_hit(&mdsc->metric);
+
 	dout("get_cap_refs %p ret %d got %s\n", inode,
 	     ret, ceph_cap_string(*got));
 	return ret;
diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index 60cdb8f..66b9622 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -129,6 +129,7 @@ static int metric_show(struct seq_file *s, void *p)
 	struct ceph_fs_client *fsc = s->private;
 	struct ceph_mds_client *mdsc = fsc->mdsc;
 	struct ceph_client_metric *m = &mdsc->metric;
+	int i, nr_caps = 0;
 
 	seq_printf(s, "item          total           miss            hit\n");
 	seq_printf(s, "-------------------------------------------------\n");
@@ -138,6 +139,21 @@ static int metric_show(struct seq_file *s, void *p)
 		   percpu_counter_sum(&m->d_lease_mis),
 		   percpu_counter_sum(&m->d_lease_hit));
 
+	mutex_lock(&mdsc->mutex);
+	for (i = 0; i < mdsc->max_sessions; i++) {
+		struct ceph_mds_session *s;
+
+		s = __ceph_lookup_mds_session(mdsc, i);
+		if (!s)
+			continue;
+		nr_caps += s->s_nr_caps;
+		ceph_put_mds_session(s);
+	}
+	mutex_unlock(&mdsc->mutex);
+	seq_printf(s, "%-14s%-16d%-16lld%lld\n", "caps", nr_caps,
+		   percpu_counter_sum(&m->i_caps_mis),
+		   percpu_counter_sum(&m->i_caps_hit));
+
 	return 0;
 }
 
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 8097a86..10d528a 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -349,8 +349,9 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 	    !ceph_test_mount_opt(fsc, NOASYNCREADDIR) &&
 	    ceph_snap(inode) != CEPH_SNAPDIR &&
 	    __ceph_dir_is_complete_ordered(ci) &&
-	    __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1)) {
+	    __ceph_caps_issued_mask_metric(ci, CEPH_CAP_FILE_SHARED, 1)) {
 		int shared_gen = atomic_read(&ci->i_shared_gen);
+
 		spin_unlock(&ci->i_ceph_lock);
 		err = __dcache_readdir(file, ctx, shared_gen);
 		if (err != -EAGAIN)
@@ -767,7 +768,7 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
 		    !is_root_ceph_dentry(dir, dentry) &&
 		    ceph_test_mount_opt(fsc, DCACHE) &&
 		    __ceph_dir_is_complete(ci) &&
-		    (__ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1))) {
+		    __ceph_caps_issued_mask_metric(ci, CEPH_CAP_FILE_SHARED, 1)) {
 			__ceph_touch_fmode(ci, mdsc, CEPH_FILE_MODE_RD);
 			spin_unlock(&ci->i_ceph_lock);
 			dout(" dir %p complete, -ENOENT\n", dir);
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 7fef94f..357c937 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2288,8 +2288,8 @@ int __ceph_do_getattr(struct inode *inode, struct page *locked_page,
 
 	dout("do_getattr inode %p mask %s mode 0%o\n",
 	     inode, ceph_cap_string(mask), inode->i_mode);
-	if (!force && ceph_caps_issued_mask(ceph_inode(inode), mask, 1))
-		return 0;
+	if (!force && ceph_caps_issued_mask_metric(ceph_inode(inode), mask, 1))
+		return 0;
 
 	mode = (mask & CEPH_STAT_RSTAT) ? USE_AUTH_MDS : USE_ANY_MDS;
 	req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_GETATTR, mode);
diff --git a/fs/ceph/metric.c b/fs/ceph/metric.c
index 873a376..2a4b739 100644
--- a/fs/ceph/metric.c
+++ b/fs/ceph/metric.c
@@ -16,13 +16,29 @@ int ceph_metric_init(struct ceph_client_metric *m)
 	ret = percpu_counter_init(&m->d_lease_hit, 0, GFP_KERNEL);
 	if (ret)
 		return ret;
+
 	ret = percpu_counter_init(&m->d_lease_mis, 0, GFP_KERNEL);
-	if (ret) {
-		percpu_counter_destroy(&m->d_lease_hit);
-		return ret;
-	}
+	if (ret)
+		goto err_d_lease_mis;
+
+	ret = percpu_counter_init(&m->i_caps_hit, 0, GFP_KERNEL);
+	if (ret)
+		goto err_i_caps_hit;
+
+	ret = percpu_counter_init(&m->i_caps_mis, 0, GFP_KERNEL);
+	if (ret)
+		goto err_i_caps_mis;
 
 	return 0;
+
+err_i_caps_mis:
+	percpu_counter_destroy(&m->i_caps_hit);
+err_i_caps_hit:
+	percpu_counter_destroy(&m->d_lease_mis);
+err_d_lease_mis:
+	percpu_counter_destroy(&m->d_lease_hit);
+
+	return ret;
 }
 
 void ceph_metric_destroy(struct ceph_client_metric *m)
@@ -30,6 +46,8 @@ void ceph_metric_destroy(struct ceph_client_metric *m)
 	if (!m)
 		return;
 
+	percpu_counter_destroy(&m->i_caps_mis);
+	percpu_counter_destroy(&m->i_caps_hit);
 	percpu_counter_destroy(&m->d_lease_mis);
 	percpu_counter_destroy(&m->d_lease_hit);
 }
diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
index da725da..098ee8a 100644
--- a/fs/ceph/metric.h
+++ b/fs/ceph/metric.h
@@ -10,8 +10,21 @@ struct ceph_client_metric {
 	atomic64_t            total_dentries;
 	struct percpu_counter d_lease_hit;
 	struct percpu_counter d_lease_mis;
+
+	struct percpu_counter i_caps_hit;
+	struct percpu_counter i_caps_mis;
 };
 
 extern int ceph_metric_init(struct ceph_client_metric *m);
 extern void ceph_metric_destroy(struct ceph_client_metric *m);
+
+static inline void ceph_update_cap_hit(struct ceph_client_metric *m)
+{
+	percpu_counter_inc(&m->i_caps_hit);
+}
+
+static inline void ceph_update_cap_mis(struct ceph_client_metric *m)
+{
+	percpu_counter_inc(&m->i_caps_mis);
+}
 #endif /* _FS_CEPH_MDS_METRIC_H */
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 5c73cf1..47cfd89 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -645,6 +645,8 @@ static inline bool __ceph_is_any_real_caps(struct ceph_inode_info *ci)
 
 extern int __ceph_caps_issued(struct ceph_inode_info *ci, int *implemented);
 extern int __ceph_caps_issued_mask(struct ceph_inode_info *ci, int mask, int t);
+extern int __ceph_caps_issued_mask_metric(struct ceph_inode_info *ci, int mask,
+					  int t);
 extern int __ceph_caps_issued_other(struct ceph_inode_info *ci,
 				    struct ceph_cap *cap);
 
@@ -657,12 +659,12 @@ static inline int ceph_caps_issued(struct ceph_inode_info *ci)
 	return issued;
 }
 
-static inline int ceph_caps_issued_mask(struct ceph_inode_info *ci, int mask,
-					int touch)
+static inline int ceph_caps_issued_mask_metric(struct ceph_inode_info *ci,
+					       int mask, int touch)
 {
 	int r;
 	spin_lock(&ci->i_ceph_lock);
-	r = __ceph_caps_issued_mask(ci, mask, touch);
+	r = __ceph_caps_issued_mask_metric(ci, mask, touch);
 	spin_unlock(&ci->i_ceph_lock);
 	return r;
 }
diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index 7b8a070..71ee34d 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -856,7 +856,7 @@ ssize_t __ceph_getxattr(struct inode *inode, const char *name, void *value,
 
 	if (ci->i_xattrs.version == 0 ||
 	    !((req_mask & CEPH_CAP_XATTR_SHARED) ||
-	      __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1))) {
+	      __ceph_caps_issued_mask_metric(ci, CEPH_CAP_XATTR_SHARED, 1))) {
 		spin_unlock(&ci->i_ceph_lock);
 
 		/* security module gets xattr while filling trace */
@@ -914,7 +914,7 @@ ssize_t ceph_listxattr(struct dentry *dentry, char *names, size_t size)
 	     ci->i_xattrs.version, ci->i_xattrs.index_version);
 
 	if (ci->i_xattrs.version == 0 ||
-	    !__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1)) {
+	    !__ceph_caps_issued_mask_metric(ci, CEPH_CAP_XATTR_SHARED, 1)) {
 		spin_unlock(&ci->i_ceph_lock);
 		err = ceph_do_getattr(inode, CEPH_STAT_CAP_XATTR, true);
 		if (err)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v12 3/4] ceph: add read/write latency metric support
  2020-03-19 14:06 [PATCH v12 0/4] ceph: add perf metrics support xiubli
  2020-03-19 14:06 ` [PATCH v12 1/4] ceph: add dentry lease metric support xiubli
  2020-03-19 14:06 ` [PATCH v12 2/4] ceph: add caps perf metric for each superblock xiubli
@ 2020-03-19 14:06 ` xiubli
  2020-03-19 14:36   ` Jeff Layton
  2020-03-19 14:06 ` [PATCH v12 4/4] ceph: add metadata perf " xiubli
  3 siblings, 1 reply; 10+ messages in thread
From: xiubli @ 2020-03-19 14:06 UTC (permalink / raw)
  To: jlayton; +Cc: sage, idryomov, gfarnum, zyan, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Calculate the latency for OSD read requests. Add a new r_end_stamp
field to struct ceph_osd_request that will hold the time at which the
reply was received. Use that to calculate the RTT for each call, and
divide the sum of those by the number of calls to get the average RTT.

Keep a tally of the RTT for OSD writes and the number of calls to track
the average latency of OSD writes.
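
For reference, the per-sample update in __update_avg_and_sq() below is
the standard single-pass recurrence for the sum of squared deviations
(Welford's method), with the standard deviation only derived at read
time:

	S(n) = S(n-1) + (x(n) - avg(n-1)) * (x(n) - avg(n))
	stdev ~= sqrt(S(n) / (n - 1))

where x(n) is the new latency sample and avg(n) is the running mean
over n samples. The single decimal place printed by CEPH_METRIC_SHOW
comes from the first-order approximation sqrt(q) ~= s + (q - s^2) /
(2 * s), with s = int_sqrt64(q), so the first fractional digit is
roughly 5 * (q - s^2) / s.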

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/addr.c                  |  18 +++++++
 fs/ceph/debugfs.c               |  60 +++++++++++++++++++++-
 fs/ceph/file.c                  |  26 ++++++++++
 fs/ceph/metric.c                | 110 ++++++++++++++++++++++++++++++++++++++++
 fs/ceph/metric.h                |  23 +++++++++
 include/linux/ceph/osd_client.h |   1 +
 net/ceph/osd_client.c           |   2 +
 7 files changed, 239 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 6f4678d..f359619 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -216,6 +216,9 @@ static int ceph_sync_readpages(struct ceph_fs_client *fsc,
 	if (!rc)
 		rc = ceph_osdc_wait_request(osdc, req);
 
+	ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_stamp,
+				 req->r_end_stamp, rc);
+
 	ceph_osdc_put_request(req);
 	dout("readpages result %d\n", rc);
 	return rc;
@@ -299,6 +302,7 @@ static int ceph_readpage(struct file *filp, struct page *page)
 static void finish_read(struct ceph_osd_request *req)
 {
 	struct inode *inode = req->r_inode;
+	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
 	struct ceph_osd_data *osd_data;
 	int rc = req->r_result <= 0 ? req->r_result : 0;
 	int bytes = req->r_result >= 0 ? req->r_result : 0;
@@ -336,6 +340,10 @@ static void finish_read(struct ceph_osd_request *req)
 		put_page(page);
 		bytes -= PAGE_SIZE;
 	}
+
+	ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_stamp,
+				 req->r_end_stamp, rc);
+
 	kfree(osd_data->pages);
 }
 
@@ -643,6 +651,9 @@ static int ceph_sync_writepages(struct ceph_fs_client *fsc,
 	if (!rc)
 		rc = ceph_osdc_wait_request(osdc, req);
 
+	ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
+				  req->r_end_stamp, rc);
+
 	ceph_osdc_put_request(req);
 	if (rc == 0)
 		rc = len;
@@ -794,6 +805,9 @@ static void writepages_finish(struct ceph_osd_request *req)
 		ceph_clear_error_write(ci);
 	}
 
+	ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
+				  req->r_end_stamp, rc);
+
 	/*
 	 * We lost the cache cap, need to truncate the page before
 	 * it is unlocked, otherwise we'd truncate it later in the
@@ -1852,6 +1866,10 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
 	err = ceph_osdc_start_request(&fsc->client->osdc, req, false);
 	if (!err)
 		err = ceph_osdc_wait_request(&fsc->client->osdc, req);
+
+	ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
+				  req->r_end_stamp, err);
+
 out_put:
 	ceph_osdc_put_request(req);
 	if (err == -ECANCELED)
diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index 66b9622..de07fdb 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -7,6 +7,7 @@
 #include <linux/ctype.h>
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
+#include <linux/math64.h>
 
 #include <linux/ceph/libceph.h>
 #include <linux/ceph/mon_client.h>
@@ -124,13 +125,70 @@ static int mdsc_show(struct seq_file *s, void *p)
 	return 0;
 }
 
+static u64 get_avg(u64 *totalp, u64 *sump, spinlock_t *lockp, u64 *total)
+{
+	u64 t, sum, avg = 0;
+
+	spin_lock(lockp);
+	t = *totalp;
+	sum = *sump;
+	spin_unlock(lockp);
+
+	if (likely(t))
+		avg = DIV64_U64_ROUND_CLOSEST(sum, t);
+
+	*total = t;
+	return avg;
+}
+
+#define CEPH_METRIC_SHOW(name, total, avg, min, max, sq) {		\
+	u64 _total, _avg, _min, _max, _sq, _st, _re = 0;		\
+	_avg = jiffies_to_usecs(avg);					\
+	_min = jiffies_to_usecs(min == S64_MAX ? 0 : min);		\
+	_max = jiffies_to_usecs(max);					\
+	_total = total - 1;						\
+	_sq = _total > 0 ? DIV64_U64_ROUND_CLOSEST(sq, _total) : 0;	\
+	_sq = jiffies_to_usecs(_sq);					\
+	_st = int_sqrt64(_sq);						\
+	if (_st > 0) {							\
+		_re = 5 * (_sq - (_st * _st));				\
+		_re = _re > 0 ? _re - 1 : 0;				\
+		_re = _st > 0 ? div64_s64(_re, _st) : 0;		\
+	}								\
+	seq_printf(s, "%-14s%-12llu%-16llu%-16llu%-16llu%llu.%llu\n",	\
+		   name, total, _avg, _min, _max, _st, _re);		\
+}
+
 static int metric_show(struct seq_file *s, void *p)
 {
 	struct ceph_fs_client *fsc = s->private;
 	struct ceph_mds_client *mdsc = fsc->mdsc;
 	struct ceph_client_metric *m = &mdsc->metric;
 	int i, nr_caps = 0;
-
+	u64 total, avg, min, max, sq;
+
+	seq_printf(s, "item          total       avg_lat(us)     min_lat(us)     max_lat(us)     stdev(us)\n");
+	seq_printf(s, "-----------------------------------------------------------------------------------\n");
+
+	avg = get_avg(&m->total_reads,
+		      &m->read_latency_sum,
+		      &m->read_latency_lock,
+		      &total);
+	min = atomic64_read(&m->read_latency_min);
+	max = atomic64_read(&m->read_latency_max);
+	sq = percpu_counter_sum(&m->read_latency_sq_sum);
+	CEPH_METRIC_SHOW("read", total, avg, min, max, sq);
+
+	avg = get_avg(&m->total_writes,
+		      &m->write_latency_sum,
+		      &m->write_latency_lock,
+		      &total);
+	min = atomic64_read(&m->write_latency_min);
+	max = atomic64_read(&m->write_latency_max);
+	sq = percpu_counter_sum(&m->write_latency_sq_sum);
+	CEPH_METRIC_SHOW("write", total, avg, min, max, sq);
+
+	seq_printf(s, "\n");
 	seq_printf(s, "item          total           miss            hit\n");
 	seq_printf(s, "-------------------------------------------------\n");
 
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 4a5ccbb..8e40022 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -906,6 +906,10 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
 		ret = ceph_osdc_start_request(osdc, req, false);
 		if (!ret)
 			ret = ceph_osdc_wait_request(osdc, req);
+
+		ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_stamp,
+					 req->r_end_stamp, ret);
+
 		ceph_osdc_put_request(req);
 
 		i_size = i_size_read(inode);
@@ -1044,6 +1048,8 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req)
 	struct inode *inode = req->r_inode;
 	struct ceph_aio_request *aio_req = req->r_priv;
 	struct ceph_osd_data *osd_data = osd_req_op_extent_osd_data(req, 0);
+	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct ceph_client_metric *metric = &fsc->mdsc->metric;
 
 	BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_BVECS);
 	BUG_ON(!osd_data->num_bvecs);
@@ -1051,6 +1057,16 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req)
 	dout("ceph_aio_complete_req %p rc %d bytes %u\n",
 	     inode, rc, osd_data->bvec_pos.iter.bi_size);
 
+	/* r_start_stamp == 0 means the request was not submitted */
+	if (req->r_start_stamp) {
+		if (aio_req->write)
+			ceph_update_write_latency(metric, req->r_start_stamp,
+						  req->r_end_stamp, rc);
+		else
+			ceph_update_read_latency(metric, req->r_start_stamp,
+						 req->r_end_stamp, rc);
+	}
+
 	if (rc == -EOLDSNAPC) {
 		struct ceph_aio_work *aio_work;
 		BUG_ON(!aio_req->write);
@@ -1179,6 +1195,7 @@ static void ceph_aio_retry_work(struct work_struct *work)
 	struct inode *inode = file_inode(file);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct ceph_client_metric *metric = &fsc->mdsc->metric;
 	struct ceph_vino vino;
 	struct ceph_osd_request *req;
 	struct bio_vec *bvecs;
@@ -1295,6 +1312,13 @@ static void ceph_aio_retry_work(struct work_struct *work)
 		if (!ret)
 			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
 
+		if (write)
+			ceph_update_write_latency(metric, req->r_start_stamp,
+						  req->r_end_stamp, ret);
+		else
+			ceph_update_read_latency(metric, req->r_start_stamp,
+						 req->r_end_stamp, ret);
+
 		size = i_size_read(inode);
 		if (!write) {
 			if (ret == -ENOENT)
@@ -1466,6 +1490,8 @@ static void ceph_aio_retry_work(struct work_struct *work)
 		if (!ret)
 			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
 
+		ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
+					  req->r_end_stamp, ret);
 out:
 		ceph_osdc_put_request(req);
 		if (ret != 0) {
diff --git a/fs/ceph/metric.c b/fs/ceph/metric.c
index 2a4b739..6cb64fb 100644
--- a/fs/ceph/metric.c
+++ b/fs/ceph/metric.c
@@ -2,6 +2,7 @@
 
 #include <linux/types.h>
 #include <linux/percpu_counter.h>
+#include <linux/math64.h>
 
 #include "metric.h"
 
@@ -29,8 +30,32 @@ int ceph_metric_init(struct ceph_client_metric *m)
 	if (ret)
 		goto err_i_caps_mis;
 
+	ret = percpu_counter_init(&m->read_latency_sq_sum, 0, GFP_KERNEL);
+	if (ret)
+		goto err_read_latency_sq_sum;
+
+	atomic64_set(&m->read_latency_min, S64_MAX);
+	atomic64_set(&m->read_latency_max, 0);
+	spin_lock_init(&m->read_latency_lock);
+	m->total_reads = 0;
+	m->read_latency_sum = 0;
+
+	ret = percpu_counter_init(&m->write_latency_sq_sum, 0, GFP_KERNEL);
+	if (ret)
+		goto err_write_latency_sq_sum;
+
+	atomic64_set(&m->write_latency_min, S64_MAX);
+	atomic64_set(&m->write_latency_max, 0);
+	spin_lock_init(&m->write_latency_lock);
+	m->total_writes = 0;
+	m->write_latency_sum = 0;
+
 	return 0;
 
+err_write_latency_sq_sum:
+	percpu_counter_destroy(&m->read_latency_sq_sum);
+err_read_latency_sq_sum:
+	percpu_counter_destroy(&m->i_caps_mis);
 err_i_caps_mis:
 	percpu_counter_destroy(&m->i_caps_hit);
 err_i_caps_hit:
@@ -46,8 +71,93 @@ void ceph_metric_destroy(struct ceph_client_metric *m)
 	if (!m)
 		return;
 
+	percpu_counter_destroy(&m->write_latency_sq_sum);
+	percpu_counter_destroy(&m->read_latency_sq_sum);
 	percpu_counter_destroy(&m->i_caps_mis);
 	percpu_counter_destroy(&m->i_caps_hit);
 	percpu_counter_destroy(&m->d_lease_mis);
 	percpu_counter_destroy(&m->d_lease_hit);
 }
+
+static inline void __update_min_latency(atomic64_t *min, unsigned long lat)
+{
+	u64 cur, old;
+
+	cur = atomic64_read(min);
+	do {
+		old = cur;
+		if (likely(lat >= old))
+			break;
+	} while (unlikely((cur = atomic64_cmpxchg(min, old, lat)) != old));
+}
+
+static inline void __update_max_latency(atomic64_t *max, unsigned long lat)
+{
+	u64 cur, old;
+
+	cur = atomic64_read(max);
+	do {
+		old = cur;
+		if (likely(lat <= old))
+			break;
+	} while (unlikely((cur = atomic64_cmpxchg(max, old, lat)) != old));
+}
+
+static inline void __update_avg_and_sq(u64 *totalp, u64 *lsump,
+				       struct percpu_counter *sq_sump,
+				       spinlock_t *lockp, unsigned long lat)
+{
+	u64 total, avg, sq, lsum;
+
+	spin_lock(lockp);
+	total = ++(*totalp);
+	*lsump += lat;
+	lsum = *lsump;
+	spin_unlock(lockp);
+
+	if (unlikely(total == 1))
+		return;
+
+	/* the sq is (lat - old_avg) * (lat - new_avg) */
+	avg = DIV64_U64_ROUND_CLOSEST((lsum - lat), (total - 1));
+	sq = lat - avg;
+	avg = DIV64_U64_ROUND_CLOSEST(lsum, total);
+	sq = sq * (lat - avg);
+	percpu_counter_add(sq_sump, sq);
+}
+
+void ceph_update_read_latency(struct ceph_client_metric *m,
+			      unsigned long r_start,
+			      unsigned long r_end,
+			      int rc)
+{
+	unsigned long lat = r_end - r_start;
+
+	if (unlikely(rc < 0 && rc != -ENOENT && rc != -ETIMEDOUT))
+		return;
+
+	__update_min_latency(&m->read_latency_min, lat);
+	__update_max_latency(&m->read_latency_max, lat);
+	__update_avg_and_sq(&m->total_reads, &m->read_latency_sum,
+			    &m->read_latency_sq_sum,
+			    &m->read_latency_lock,
+			    lat);
+}
+
+void ceph_update_write_latency(struct ceph_client_metric *m,
+			       unsigned long r_start,
+			       unsigned long r_end,
+			       int rc)
+{
+	unsigned long lat = r_end - r_start;
+
+	if (unlikely(rc && rc != -ETIMEDOUT))
+		return;
+
+	__update_min_latency(&m->write_latency_min, lat);
+	__update_max_latency(&m->write_latency_max, lat);
+	__update_avg_and_sq(&m->total_writes, &m->write_latency_sum,
+			    &m->write_latency_sq_sum,
+			    &m->write_latency_lock,
+			    lat);
+}
diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
index 098ee8a..c7eae56 100644
--- a/fs/ceph/metric.h
+++ b/fs/ceph/metric.h
@@ -13,6 +13,20 @@ struct ceph_client_metric {
 
 	struct percpu_counter i_caps_hit;
 	struct percpu_counter i_caps_mis;
+
+	struct percpu_counter read_latency_sq_sum;
+	atomic64_t read_latency_min;
+	atomic64_t read_latency_max;
+	spinlock_t read_latency_lock;
+	u64 total_reads;
+	u64 read_latency_sum;
+
+	struct percpu_counter write_latency_sq_sum;
+	atomic64_t write_latency_min;
+	atomic64_t write_latency_max;
+	spinlock_t write_latency_lock;
+	u64 total_writes;
+	u64 write_latency_sum;
 };
 
 extern int ceph_metric_init(struct ceph_client_metric *m);
@@ -27,4 +41,13 @@ static inline void ceph_update_cap_mis(struct ceph_client_metric *m)
 {
 	percpu_counter_inc(&m->i_caps_mis);
 }
+
+extern void ceph_update_read_latency(struct ceph_client_metric *m,
+				     unsigned long r_start,
+				     unsigned long r_end,
+				     int rc);
+extern void ceph_update_write_latency(struct ceph_client_metric *m,
+				      unsigned long r_start,
+				      unsigned long r_end,
+				      int rc);
 #endif /* _FS_CEPH_MDS_METRIC_H */
diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 9d9f745..02ff3a3 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -213,6 +213,7 @@ struct ceph_osd_request {
 	/* internal */
 	unsigned long r_stamp;                /* jiffies, send or check time */
 	unsigned long r_start_stamp;          /* jiffies */
+	unsigned long r_end_stamp;            /* jiffies */
 	int r_attempts;
 	u32 r_map_dne_bound;
 
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 998e26b..28e33e0 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -2389,6 +2389,8 @@ static void finish_request(struct ceph_osd_request *req)
 	WARN_ON(lookup_request_mc(&osdc->map_checks, req->r_tid));
 	dout("%s req %p tid %llu\n", __func__, req, req->r_tid);
 
+	req->r_end_stamp = jiffies;
+
 	if (req->r_osd)
 		unlink_request(req->r_osd, req);
 	atomic_dec(&osdc->num_requests);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v12 4/4] ceph: add metadata perf metric support
  2020-03-19 14:06 [PATCH v12 0/4] ceph: add perf metrics support xiubli
                   ` (2 preceding siblings ...)
  2020-03-19 14:06 ` [PATCH v12 3/4] ceph: add read/write latency metric support xiubli
@ 2020-03-19 14:06 ` xiubli
  3 siblings, 0 replies; 10+ messages in thread
From: xiubli @ 2020-03-19 14:06 UTC (permalink / raw)
  To: jlayton; +Cc: sage, idryomov, gfarnum, zyan, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Add a new "r_ended" field to struct ceph_mds_request and use that to
maintain the average latency of MDS requests.
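
For illustration (this snippet is not in the patch; it just restates
what the hunks below do), the sample fed to the metrics code is the
jiffies delta between when the request was started and when the reply
was handled, converted to microseconds only when debugfs formats it:

	/* r_started is set at submission, r_ended in complete_request() */
	unsigned long lat = req->r_ended - req->r_started;	/* jiffies */
	u64 lat_us = jiffies_to_usecs(lat);	/* what the metrics file shows */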

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/debugfs.c    |  9 +++++++++
 fs/ceph/mds_client.c |  5 +++++
 fs/ceph/mds_client.h |  3 ++-
 fs/ceph/metric.c     | 31 +++++++++++++++++++++++++++++++
 fs/ceph/metric.h     | 11 +++++++++++
 5 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index de07fdb..52bc14a 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -188,6 +188,15 @@ static int metric_show(struct seq_file *s, void *p)
 	sq = percpu_counter_sum(&m->write_latency_sq_sum);
 	CEPH_METRIC_SHOW("write", total, avg, min, max, sq);
 
+	avg = get_avg(&m->total_metadatas,
+		      &m->metadata_latency_sum,
+		      &m->metadata_latency_lock,
+		      &total);
+	min = atomic64_read(&m->metadata_latency_min);
+	max = atomic64_read(&m->metadata_latency_max);
+	sq = percpu_counter_sum(&m->metadata_latency_sq_sum);
+	CEPH_METRIC_SHOW("metadata", total, avg, min, max, sq);
+
 	seq_printf(s, "\n");
 	seq_printf(s, "item          total           miss            hit\n");
 	seq_printf(s, "-------------------------------------------------\n");
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 1e242d8..b3f985a 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2547,6 +2547,8 @@ static struct ceph_msg *create_request_message(struct ceph_mds_client *mdsc,
 static void complete_request(struct ceph_mds_client *mdsc,
 			     struct ceph_mds_request *req)
 {
+	req->r_ended = jiffies;
+
 	if (req->r_callback)
 		req->r_callback(mdsc, req);
 	complete_all(&req->r_completion);
@@ -3155,6 +3157,9 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)
 
 	/* kick calling process */
 	complete_request(mdsc, req);
+
+	ceph_update_metadata_latency(&mdsc->metric, req->r_started,
+				     req->r_ended, err);
 out:
 	ceph_mdsc_put_request(req);
 	return;
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index ae1d01c..9018fa7 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -298,7 +298,8 @@ struct ceph_mds_request {
 	u32               r_readdir_offset;
 
 	unsigned long r_timeout;  /* optional.  jiffies, 0 is "wait forever" */
-	unsigned long r_started;  /* start time to measure timeout against */
+	unsigned long r_started;  /* start time to measure timeout against and latency */
+	unsigned long r_ended;    /* finish time to measure latency */
 	unsigned long r_request_started; /* start time for mds request only,
 					    used to measure lease durations */
 
diff --git a/fs/ceph/metric.c b/fs/ceph/metric.c
index 6cb64fb..9cd9dc3 100644
--- a/fs/ceph/metric.c
+++ b/fs/ceph/metric.c
@@ -50,8 +50,20 @@ int ceph_metric_init(struct ceph_client_metric *m)
 	m->total_writes = 0;
 	m->write_latency_sum = 0;
 
+	ret = percpu_counter_init(&m->metadata_latency_sq_sum, 0, GFP_KERNEL);
+	if (ret)
+		goto err_metadata_latency_sq_sum;
+
+	atomic64_set(&m->metadata_latency_min, S64_MAX);
+	atomic64_set(&m->metadata_latency_max, 0);
+	spin_lock_init(&m->metadata_latency_lock);
+	m->total_metadatas = 0;
+	m->metadata_latency_sum = 0;
+
 	return 0;
 
+err_metadata_latency_sq_sum:
+	percpu_counter_destroy(&m->write_latency_sq_sum);
 err_write_latency_sq_sum:
 	percpu_counter_destroy(&m->read_latency_sq_sum);
 err_read_latency_sq_sum:
@@ -71,6 +83,7 @@ void ceph_metric_destroy(struct ceph_client_metric *m)
 	if (!m)
 		return;
 
+	percpu_counter_destroy(&m->metadata_latency_sq_sum);
 	percpu_counter_destroy(&m->write_latency_sq_sum);
 	percpu_counter_destroy(&m->read_latency_sq_sum);
 	percpu_counter_destroy(&m->i_caps_mis);
@@ -161,3 +174,21 @@ void ceph_update_write_latency(struct ceph_client_metric *m,
 			    &m->write_latency_lock,
 			    lat);
 }
+
+void ceph_update_metadata_latency(struct ceph_client_metric *m,
+				  unsigned long r_start,
+				  unsigned long r_end,
+				  int rc)
+{
+	unsigned long lat = r_end - r_start;
+
+	if (unlikely(rc && rc != -ENOENT))
+		return;
+
+	__update_min_latency(&m->metadata_latency_min, lat);
+	__update_max_latency(&m->metadata_latency_max, lat);
+	__update_avg_and_sq(&m->total_metadatas, &m->metadata_latency_sum,
+			    &m->metadata_latency_sq_sum,
+			    &m->metadata_latency_lock,
+			    lat);
+}
diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
index c7eae56..14aa910 100644
--- a/fs/ceph/metric.h
+++ b/fs/ceph/metric.h
@@ -27,6 +27,13 @@ struct ceph_client_metric {
 	spinlock_t write_latency_lock;
 	u64 total_writes;
 	u64 write_latency_sum;
+
+	struct percpu_counter metadata_latency_sq_sum;
+	atomic64_t metadata_latency_min;
+	atomic64_t metadata_latency_max;
+	spinlock_t metadata_latency_lock;
+	u64 total_metadatas;
+	u64 metadata_latency_sum;
 };
 
 extern int ceph_metric_init(struct ceph_client_metric *m);
@@ -50,4 +57,8 @@ extern void ceph_update_write_latency(struct ceph_client_metric *m,
 				      unsigned long r_start,
 				      unsigned long r_end,
 				      int rc);
+extern void ceph_update_metadata_latency(struct ceph_client_metric *m,
+					 unsigned long r_start,
+					 unsigned long r_end,
+					 int rc);
 #endif /* _FS_CEPH_MDS_METRIC_H */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v12 3/4] ceph: add read/write latency metric support
  2020-03-19 14:06 ` [PATCH v12 3/4] ceph: add read/write latency metric support xiubli
@ 2020-03-19 14:36   ` Jeff Layton
  2020-03-19 17:44     ` Xiubo Li
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Layton @ 2020-03-19 14:36 UTC (permalink / raw)
  To: xiubli; +Cc: sage, idryomov, gfarnum, zyan, pdonnell, ceph-devel

On Thu, 2020-03-19 at 10:06 -0400, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> Calculate the latency for OSD read requests. Add a new r_end_stamp
> field to struct ceph_osd_request that will hold the time at which the
> reply was received. Use that to calculate the RTT for each call, and
> divide the sum of those by the number of calls to get the average RTT.
> 
> Keep a tally of the RTT for OSD writes and the number of calls to track
> the average latency of OSD writes.
> 
> URL: https://tracker.ceph.com/issues/43215
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/addr.c                  |  18 +++++++
>  fs/ceph/debugfs.c               |  60 +++++++++++++++++++++-
>  fs/ceph/file.c                  |  26 ++++++++++
>  fs/ceph/metric.c                | 110 ++++++++++++++++++++++++++++++++++++++++
>  fs/ceph/metric.h                |  23 +++++++++
>  include/linux/ceph/osd_client.h |   1 +
>  net/ceph/osd_client.c           |   2 +
>  7 files changed, 239 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 6f4678d..f359619 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -216,6 +216,9 @@ static int ceph_sync_readpages(struct ceph_fs_client *fsc,
>  	if (!rc)
>  		rc = ceph_osdc_wait_request(osdc, req);
>  
> +	ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_stamp,
> +				 req->r_end_stamp, rc);
> +
>  	ceph_osdc_put_request(req);
>  	dout("readpages result %d\n", rc);
>  	return rc;
> @@ -299,6 +302,7 @@ static int ceph_readpage(struct file *filp, struct page *page)
>  static void finish_read(struct ceph_osd_request *req)
>  {
>  	struct inode *inode = req->r_inode;
> +	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>  	struct ceph_osd_data *osd_data;
>  	int rc = req->r_result <= 0 ? req->r_result : 0;
>  	int bytes = req->r_result >= 0 ? req->r_result : 0;
> @@ -336,6 +340,10 @@ static void finish_read(struct ceph_osd_request *req)
>  		put_page(page);
>  		bytes -= PAGE_SIZE;
>  	}
> +
> +	ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_stamp,
> +				 req->r_end_stamp, rc);
> +
>  	kfree(osd_data->pages);
>  }
>  
> @@ -643,6 +651,9 @@ static int ceph_sync_writepages(struct ceph_fs_client *fsc,
>  	if (!rc)
>  		rc = ceph_osdc_wait_request(osdc, req);
>  
> +	ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
> +				  req->r_end_stamp, rc);
> +
>  	ceph_osdc_put_request(req);
>  	if (rc == 0)
>  		rc = len;
> @@ -794,6 +805,9 @@ static void writepages_finish(struct ceph_osd_request *req)
>  		ceph_clear_error_write(ci);
>  	}
>  
> +	ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
> +				  req->r_end_stamp, rc);
> +
>  	/*
>  	 * We lost the cache cap, need to truncate the page before
>  	 * it is unlocked, otherwise we'd truncate it later in the
> @@ -1852,6 +1866,10 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
>  	err = ceph_osdc_start_request(&fsc->client->osdc, req, false);
>  	if (!err)
>  		err = ceph_osdc_wait_request(&fsc->client->osdc, req);
> +
> +	ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
> +				  req->r_end_stamp, err);
> +
>  out_put:
>  	ceph_osdc_put_request(req);
>  	if (err == -ECANCELED)
> diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
> index 66b9622..de07fdb 100644
> --- a/fs/ceph/debugfs.c
> +++ b/fs/ceph/debugfs.c
> @@ -7,6 +7,7 @@
>  #include <linux/ctype.h>
>  #include <linux/debugfs.h>
>  #include <linux/seq_file.h>
> +#include <linux/math64.h>
>  
>  #include <linux/ceph/libceph.h>
>  #include <linux/ceph/mon_client.h>
> @@ -124,13 +125,70 @@ static int mdsc_show(struct seq_file *s, void *p)
>  	return 0;
>  }
>  
> +static u64 get_avg(u64 *totalp, u64 *sump, spinlock_t *lockp, u64 *total)
> +{
> +	u64 t, sum, avg = 0;
> +
> +	spin_lock(lockp);
> +	t = *totalp;
> +	sum = *sump;
> +	spin_unlock(lockp);
> +
> +	if (likely(t))
> +		avg = DIV64_U64_ROUND_CLOSEST(sum, t);
> +
> +	*total = t;
> +	return avg;
> +}
> +
> +#define CEPH_METRIC_SHOW(name, total, avg, min, max, sq) {		\
> +	u64 _total, _avg, _min, _max, _sq, _st, _re = 0;		\
> +	_avg = jiffies_to_usecs(avg);					\
> +	_min = jiffies_to_usecs(min == S64_MAX ? 0 : min);		\
> +	_max = jiffies_to_usecs(max);					\
> +	_total = total - 1;						\
> +	_sq = _total > 0 ? DIV64_U64_ROUND_CLOSEST(sq, _total) : 0;	\
> +	_sq = jiffies_to_usecs(_sq);					\
> +	_st = int_sqrt64(_sq);						\
> +	if (_st > 0) {							\
> +		_re = 5 * (_sq - (_st * _st));				\
> +		_re = _re > 0 ? _re - 1 : 0;				\
> +		_re = _st > 0 ? div64_s64(_re, _st) : 0;		\
> +	}								\
> +	seq_printf(s, "%-14s%-12llu%-16llu%-16llu%-16llu%llu.%llu\n",	\
> +		   name, total, _avg, _min, _max, _st, _re);		\
> +}
> +
>  static int metric_show(struct seq_file *s, void *p)
>  {
>  	struct ceph_fs_client *fsc = s->private;
>  	struct ceph_mds_client *mdsc = fsc->mdsc;
>  	struct ceph_client_metric *m = &mdsc->metric;
>  	int i, nr_caps = 0;
> -
> +	u64 total, avg, min, max, sq;
> +
> +	seq_printf(s, "item          total       avg_lat(us)     min_lat(us)     max_lat(us)     stdev(us)\n");
> +	seq_printf(s, "-----------------------------------------------------------------------------------\n");
> +
> +	avg = get_avg(&m->total_reads,
> +		      &m->read_latency_sum,
> +		      &m->read_latency_lock,
> +		      &total);
> +	min = atomic64_read(&m->read_latency_min);
> +	max = atomic64_read(&m->read_latency_max);
> +	sq = percpu_counter_sum(&m->read_latency_sq_sum);
> +	CEPH_METRIC_SHOW("read", total, avg, min, max, sq);
> +
> +	avg = get_avg(&m->total_writes,
> +		      &m->write_latency_sum,
> +		      &m->write_latency_lock,
> +		      &total);
> +	min = atomic64_read(&m->write_latency_min);
> +	max = atomic64_read(&m->write_latency_max);
> +	sq = percpu_counter_sum(&m->write_latency_sq_sum);
> +	CEPH_METRIC_SHOW("write", total, avg, min, max, sq);
> +
> +	seq_printf(s, "\n");
>  	seq_printf(s, "item          total           miss            hit\n");
>  	seq_printf(s, "-------------------------------------------------\n");
>  
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 4a5ccbb..8e40022 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -906,6 +906,10 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
>  		ret = ceph_osdc_start_request(osdc, req, false);
>  		if (!ret)
>  			ret = ceph_osdc_wait_request(osdc, req);
> +
> +		ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_stamp,
> +					 req->r_end_stamp, ret);
> +
>  		ceph_osdc_put_request(req);
>  
>  		i_size = i_size_read(inode);
> @@ -1044,6 +1048,8 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req)
>  	struct inode *inode = req->r_inode;
>  	struct ceph_aio_request *aio_req = req->r_priv;
>  	struct ceph_osd_data *osd_data = osd_req_op_extent_osd_data(req, 0);
> +	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> +	struct ceph_client_metric *metric = &fsc->mdsc->metric;
>  
>  	BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_BVECS);
>  	BUG_ON(!osd_data->num_bvecs);
> @@ -1051,6 +1057,16 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req)
>  	dout("ceph_aio_complete_req %p rc %d bytes %u\n",
>  	     inode, rc, osd_data->bvec_pos.iter.bi_size);
>  
> +	/* r_start_stamp == 0 means the request was not submitted */
> +	if (req->r_start_stamp) {
> +		if (aio_req->write)
> +			ceph_update_write_latency(metric, req->r_start_stamp,
> +						  req->r_end_stamp, rc);
> +		else
> +			ceph_update_read_latency(metric, req->r_start_stamp,
> +						 req->r_end_stamp, rc);
> +	}
> +
>  	if (rc == -EOLDSNAPC) {
>  		struct ceph_aio_work *aio_work;
>  		BUG_ON(!aio_req->write);
> @@ -1179,6 +1195,7 @@ static void ceph_aio_retry_work(struct work_struct *work)
>  	struct inode *inode = file_inode(file);
>  	struct ceph_inode_info *ci = ceph_inode(inode);
>  	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> +	struct ceph_client_metric *metric = &fsc->mdsc->metric;
>  	struct ceph_vino vino;
>  	struct ceph_osd_request *req;
>  	struct bio_vec *bvecs;
> @@ -1295,6 +1312,13 @@ static void ceph_aio_retry_work(struct work_struct *work)
>  		if (!ret)
>  			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
>  
> +		if (write)
> +			ceph_update_write_latency(metric, req->r_start_stamp,
> +						  req->r_end_stamp, ret);
> +		else
> +			ceph_update_read_latency(metric, req->r_start_stamp,
> +						 req->r_end_stamp, ret);
> +
>  		size = i_size_read(inode);
>  		if (!write) {
>  			if (ret == -ENOENT)
> @@ -1466,6 +1490,8 @@ static void ceph_aio_retry_work(struct work_struct *work)
>  		if (!ret)
>  			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
>  
> +		ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
> +					  req->r_end_stamp, ret);
>  out:
>  		ceph_osdc_put_request(req);
>  		if (ret != 0) {
> diff --git a/fs/ceph/metric.c b/fs/ceph/metric.c
> index 2a4b739..6cb64fb 100644
> --- a/fs/ceph/metric.c
> +++ b/fs/ceph/metric.c
> @@ -2,6 +2,7 @@
>  
>  #include <linux/types.h>
>  #include <linux/percpu_counter.h>
> +#include <linux/math64.h>
>  
>  #include "metric.h"
>  
> @@ -29,8 +30,32 @@ int ceph_metric_init(struct ceph_client_metric *m)
>  	if (ret)
>  		goto err_i_caps_mis;
>  
> +	ret = percpu_counter_init(&m->read_latency_sq_sum, 0, GFP_KERNEL);
> +	if (ret)
> +		goto err_read_latency_sq_sum;
> +
> +	atomic64_set(&m->read_latency_min, S64_MAX);
> +	atomic64_set(&m->read_latency_max, 0);
> +	spin_lock_init(&m->read_latency_lock);
> +	m->total_reads = 0;
> +	m->read_latency_sum = 0;
> +
> +	ret = percpu_counter_init(&m->write_latency_sq_sum, 0, GFP_KERNEL);
> +	if (ret)
> +		goto err_write_latency_sq_sum;
> +
> +	atomic64_set(&m->write_latency_min, S64_MAX);
> +	atomic64_set(&m->write_latency_max, 0);
> +	spin_lock_init(&m->write_latency_lock);
> +	m->total_writes = 0;
> +	m->write_latency_sum = 0;
> +
>  	return 0;
>  
> +err_write_latency_sq_sum:
> +	percpu_counter_destroy(&m->read_latency_sq_sum);
> +err_read_latency_sq_sum:
> +	percpu_counter_destroy(&m->i_caps_mis);
>  err_i_caps_mis:
>  	percpu_counter_destroy(&m->i_caps_hit);
>  err_i_caps_hit:
> @@ -46,8 +71,93 @@ void ceph_metric_destroy(struct ceph_client_metric *m)
>  	if (!m)
>  		return;
>  
> +	percpu_counter_destroy(&m->write_latency_sq_sum);
> +	percpu_counter_destroy(&m->read_latency_sq_sum);
>  	percpu_counter_destroy(&m->i_caps_mis);
>  	percpu_counter_destroy(&m->i_caps_hit);
>  	percpu_counter_destroy(&m->d_lease_mis);
>  	percpu_counter_destroy(&m->d_lease_hit);
>  }
> +
> +static inline void __update_min_latency(atomic64_t *min, unsigned long lat)
> +{
> +	u64 cur, old;
> +
> +	cur = atomic64_read(min);
> +	do {
> +		old = cur;
> +		if (likely(lat >= old))
> +			break;
> +	} while (unlikely((cur = atomic64_cmpxchg(min, old, lat)) != old));
> +}
> +
> +static inline void __update_max_latency(atomic64_t *max, unsigned long lat)
> +{
> +	u64 cur, old;
> +
> +	cur = atomic64_read(max);
> +	do {
> +		old = cur;
> +		if (likely(lat <= old))
> +			break;
> +	} while (unlikely((cur = atomic64_cmpxchg(max, old, lat)) != old));
> +}
> +
> +static inline void __update_avg_and_sq(u64 *totalp, u64 *lsump,
> +				       struct percpu_counter *sq_sump,
> +				       spinlock_t *lockp, unsigned long lat)
> +{
> +	u64 total, avg, sq, lsum;
> +
> +	spin_lock(lockp);
> +	total = ++(*totalp);
> +	*lsump += lat;
> +	lsum = *lsump;
> +	spin_unlock(lockp);
> +
> +	if (unlikely(total == 1))
> +		return;
> +
> +	/* the sq is (lat - old_avg) * (lat - new_avg) */
> +	avg = DIV64_U64_ROUND_CLOSEST((lsum - lat), (total - 1));
> +	sq = lat - avg;
> +	avg = DIV64_U64_ROUND_CLOSEST(lsum, total);
> +	sq = sq * (lat - avg);
> +	percpu_counter_add(sq_sump, sq);
> +}
> +
> +void ceph_update_read_latency(struct ceph_client_metric *m,
> +			      unsigned long r_start,
> +			      unsigned long r_end,
> +			      int rc)
> +{
> +	unsigned long lat = r_end - r_start;
> +
> +	if (unlikely(rc < 0 && rc != -ENOENT && rc != -ETIMEDOUT))
> +		return;
> +
> +	__update_min_latency(&m->read_latency_min, lat);
> +	__update_max_latency(&m->read_latency_max, lat);
> +	__update_avg_and_sq(&m->total_reads, &m->read_latency_sum,
> +			    &m->read_latency_sq_sum,
> +			    &m->read_latency_lock,
> +			    lat);
> +}
> +
> +void ceph_update_write_latency(struct ceph_client_metric *m,
> +			       unsigned long r_start,
> +			       unsigned long r_end,
> +			       int rc)
> +{
> +	unsigned long lat = r_end - r_start;
> +
> +	if (unlikely(rc && rc != -ETIMEDOUT))
> +		return;
> +
> +	__update_min_latency(&m->write_latency_min, lat);
> +	__update_max_latency(&m->write_latency_max, lat);
> +	__update_avg_and_sq(&m->total_writes, &m->write_latency_sum,
> +			    &m->write_latency_sq_sum,
> +			    &m->write_latency_lock,
> +			    lat);
> +}
> diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
> index 098ee8a..c7eae56 100644
> --- a/fs/ceph/metric.h
> +++ b/fs/ceph/metric.h
> @@ -13,6 +13,20 @@ struct ceph_client_metric {
>  
>  	struct percpu_counter i_caps_hit;
>  	struct percpu_counter i_caps_mis;
> +
> +	struct percpu_counter read_latency_sq_sum;
> +	atomic64_t read_latency_min;
> +	atomic64_t read_latency_max;

I'd make the above 3 values be regular values and make them all use the
read_latency_lock. Given that you're taking a lock anyway, it's more
efficient to just do all of the manipulation under a single spinlock
rather than fooling with atomic or percpu values. These are all almost
certainly going to be in the same cacheline anyway.
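
(As a sketch of that suggestion, assuming one spinlock guards all six
fields; names follow the patch, but this is not the code that was
merged:

	spinlock_t read_latency_lock;
	u64 total_reads;
	u64 read_latency_sum;
	u64 read_latency_sq_sum;	/* plain u64, no percpu_counter */
	u64 read_latency_min;
	u64 read_latency_max;

with every __update_* helper then doing its work inside a single
spin_lock(&m->read_latency_lock) / spin_unlock() pair.)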

> +	spinlock_t read_latency_lock;
> +	u64 total_reads;
> +	u64 read_latency_sum;
> +
> +	struct percpu_counter write_latency_sq_sum;
> +	atomic64_t write_latency_min;
> +	atomic64_t write_latency_max;
> +	spinlock_t write_latency_lock;
> +	u64 total_writes;
> +	u64 write_latency_sum;
>  };
>  
>  extern int ceph_metric_init(struct ceph_client_metric *m);
> @@ -27,4 +41,13 @@ static inline void ceph_update_cap_mis(struct ceph_client_metric *m)
>  {
>  	percpu_counter_inc(&m->i_caps_mis);
>  }
> +
> +extern void ceph_update_read_latency(struct ceph_client_metric *m,
> +				     unsigned long r_start,
> +				     unsigned long r_end,
> +				     int rc);
> +extern void ceph_update_write_latency(struct ceph_client_metric *m,
> +				      unsigned long r_start,
> +				      unsigned long r_end,
> +				      int rc);
>  #endif /* _FS_CEPH_MDS_METRIC_H */
> diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
> index 9d9f745..02ff3a3 100644
> --- a/include/linux/ceph/osd_client.h
> +++ b/include/linux/ceph/osd_client.h
> @@ -213,6 +213,7 @@ struct ceph_osd_request {
>  	/* internal */
>  	unsigned long r_stamp;                /* jiffies, send or check time */
>  	unsigned long r_start_stamp;          /* jiffies */
> +	unsigned long r_end_stamp;            /* jiffies */
>  	int r_attempts;
>  	u32 r_map_dne_bound;
>  
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index 998e26b..28e33e0 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -2389,6 +2389,8 @@ static void finish_request(struct ceph_osd_request *req)
>  	WARN_ON(lookup_request_mc(&osdc->map_checks, req->r_tid));
>  	dout("%s req %p tid %llu\n", __func__, req, req->r_tid);
>  
> +	req->r_end_stamp = jiffies;
> +
>  	if (req->r_osd)
>  		unlink_request(req->r_osd, req);
>  	atomic_dec(&osdc->num_requests);

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v12 3/4] ceph: add read/write latency metric support
  2020-03-19 14:36   ` Jeff Layton
@ 2020-03-19 17:44     ` Xiubo Li
  2020-03-19 18:01       ` Xiubo Li
  2020-03-19 18:36       ` Jeff Layton
  0 siblings, 2 replies; 10+ messages in thread
From: Xiubo Li @ 2020-03-19 17:44 UTC (permalink / raw)
  To: Jeff Layton; +Cc: sage, idryomov, gfarnum, zyan, pdonnell, ceph-devel

On 2020/3/19 22:36, Jeff Layton wrote:
> On Thu, 2020-03-19 at 10:06 -0400, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> Calculate the latency for OSD read requests. Add a new r_end_stamp
>> field to struct ceph_osd_request that will hold the time at which
>> the reply was received. Use that to calculate the RTT for each call,
>> and divide the sum of those by the number of calls to get the
>> average RTT.
>>
>> Keep a tally of RTT for OSD writes and number of calls to track average
>> latency of OSD writes.
>>
>> URL: https://tracker.ceph.com/issues/43215
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   fs/ceph/addr.c                  |  18 +++++++
>>   fs/ceph/debugfs.c               |  60 +++++++++++++++++++++-
>>   fs/ceph/file.c                  |  26 ++++++++++
>>   fs/ceph/metric.c                | 110 ++++++++++++++++++++++++++++++++++++++++
>>   fs/ceph/metric.h                |  23 +++++++++
>>   include/linux/ceph/osd_client.h |   1 +
>>   net/ceph/osd_client.c           |   2 +
>>   7 files changed, 239 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
>> index 6f4678d..f359619 100644
>> --- a/fs/ceph/addr.c
>> +++ b/fs/ceph/addr.c
>> @@ -216,6 +216,9 @@ static int ceph_sync_readpages(struct ceph_fs_client *fsc,
>>   	if (!rc)
>>   		rc = ceph_osdc_wait_request(osdc, req);
>>   
>> +	ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_stamp,
>> +				 req->r_end_stamp, rc);
>> +
>>   	ceph_osdc_put_request(req);
>>   	dout("readpages result %d\n", rc);
>>   	return rc;
>> @@ -299,6 +302,7 @@ static int ceph_readpage(struct file *filp, struct page *page)
>>   static void finish_read(struct ceph_osd_request *req)
>>   {
>>   	struct inode *inode = req->r_inode;
>> +	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>>   	struct ceph_osd_data *osd_data;
>>   	int rc = req->r_result <= 0 ? req->r_result : 0;
>>   	int bytes = req->r_result >= 0 ? req->r_result : 0;
>> @@ -336,6 +340,10 @@ static void finish_read(struct ceph_osd_request *req)
>>   		put_page(page);
>>   		bytes -= PAGE_SIZE;
>>   	}
>> +
>> +	ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_stamp,
>> +				 req->r_end_stamp, rc);
>> +
>>   	kfree(osd_data->pages);
>>   }
>>   
>> @@ -643,6 +651,9 @@ static int ceph_sync_writepages(struct ceph_fs_client *fsc,
>>   	if (!rc)
>>   		rc = ceph_osdc_wait_request(osdc, req);
>>   
>> +	ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
>> +				  req->r_end_stamp, rc);
>> +
>>   	ceph_osdc_put_request(req);
>>   	if (rc == 0)
>>   		rc = len;
>> @@ -794,6 +805,9 @@ static void writepages_finish(struct ceph_osd_request *req)
>>   		ceph_clear_error_write(ci);
>>   	}
>>   
>> +	ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
>> +				  req->r_end_stamp, rc);
>> +
>>   	/*
>>   	 * We lost the cache cap, need to truncate the page before
>>   	 * it is unlocked, otherwise we'd truncate it later in the
>> @@ -1852,6 +1866,10 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
>>   	err = ceph_osdc_start_request(&fsc->client->osdc, req, false);
>>   	if (!err)
>>   		err = ceph_osdc_wait_request(&fsc->client->osdc, req);
>> +
>> +	ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
>> +				  req->r_end_stamp, err);
>> +
>>   out_put:
>>   	ceph_osdc_put_request(req);
>>   	if (err == -ECANCELED)
>> diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
>> index 66b9622..de07fdb 100644
>> --- a/fs/ceph/debugfs.c
>> +++ b/fs/ceph/debugfs.c
>> @@ -7,6 +7,7 @@
>>   #include <linux/ctype.h>
>>   #include <linux/debugfs.h>
>>   #include <linux/seq_file.h>
>> +#include <linux/math64.h>
>>   
>>   #include <linux/ceph/libceph.h>
>>   #include <linux/ceph/mon_client.h>
>> @@ -124,13 +125,70 @@ static int mdsc_show(struct seq_file *s, void *p)
>>   	return 0;
>>   }
>>   
>> +static u64 get_avg(u64 *totalp, u64 *sump, spinlock_t *lockp, u64 *total)
>> +{
>> +	u64 t, sum, avg = 0;
>> +
>> +	spin_lock(lockp);
>> +	t = *totalp;
>> +	sum = *sump;
>> +	spin_unlock(lockp);
>> +
>> +	if (likely(t))
>> +		avg = DIV64_U64_ROUND_CLOSEST(sum, t);
>> +
>> +	*total = t;
>> +	return avg;
>> +}
>> +
>> +#define CEPH_METRIC_SHOW(name, total, avg, min, max, sq) {		\
>> +	u64 _total, _avg, _min, _max, _sq, _st, _re = 0;		\
>> +	_avg = jiffies_to_usecs(avg);					\
>> +	_min = jiffies_to_usecs(min == S64_MAX ? 0 : min);		\
>> +	_max = jiffies_to_usecs(max);					\
>> +	_total = total - 1;						\
>> +	_sq = _total > 0 ? DIV64_U64_ROUND_CLOSEST(sq, _total) : 0;	\
>> +	_sq = jiffies_to_usecs(_sq);					\
>> +	_st = int_sqrt64(_sq);						\
>> +	if (_st > 0) {							\
>> +		_re = 5 * (_sq - (_st * _st));				\
>> +		_re = _re > 0 ? _re - 1 : 0;				\
>> +		_re = _st > 0 ? div64_s64(_re, _st) : 0;		\
>> +	}								\
>> +	seq_printf(s, "%-14s%-12llu%-16llu%-16llu%-16llu%llu.%llu\n",	\
>> +		   name, total, _avg, _min, _max, _st, _re);		\
>> +}
>> +
>>   static int metric_show(struct seq_file *s, void *p)
>>   {
>>   	struct ceph_fs_client *fsc = s->private;
>>   	struct ceph_mds_client *mdsc = fsc->mdsc;
>>   	struct ceph_client_metric *m = &mdsc->metric;
>>   	int i, nr_caps = 0;
>> -
>> +	u64 total, avg, min, max, sq;
>> +
>> +	seq_printf(s, "item          total       avg_lat(us)     min_lat(us)     max_lat(us)     stdev(us)\n");
>> +	seq_printf(s, "-----------------------------------------------------------------------------------\n");
>> +
>> +	avg = get_avg(&m->total_reads,
>> +		      &m->read_latency_sum,
>> +		      &m->read_latency_lock,
>> +		      &total);
>> +	min = atomic64_read(&m->read_latency_min);
>> +	max = atomic64_read(&m->read_latency_max);
>> +	sq = percpu_counter_sum(&m->read_latency_sq_sum);
>> +	CEPH_METRIC_SHOW("read", total, avg, min, max, sq);
>> +
>> +	avg = get_avg(&m->total_writes,
>> +		      &m->write_latency_sum,
>> +		      &m->write_latency_lock,
>> +		      &total);
>> +	min = atomic64_read(&m->write_latency_min);
>> +	max = atomic64_read(&m->write_latency_max);
>> +	sq = percpu_counter_sum(&m->write_latency_sq_sum);
>> +	CEPH_METRIC_SHOW("write", total, avg, min, max, sq);
>> +
>> +	seq_printf(s, "\n");
>>   	seq_printf(s, "item          total           miss            hit\n");
>>   	seq_printf(s, "-------------------------------------------------\n");
>>   
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 4a5ccbb..8e40022 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -906,6 +906,10 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
>>   		ret = ceph_osdc_start_request(osdc, req, false);
>>   		if (!ret)
>>   			ret = ceph_osdc_wait_request(osdc, req);
>> +
>> +		ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_stamp,
>> +					 req->r_end_stamp, ret);
>> +
>>   		ceph_osdc_put_request(req);
>>   
>>   		i_size = i_size_read(inode);
>> @@ -1044,6 +1048,8 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req)
>>   	struct inode *inode = req->r_inode;
>>   	struct ceph_aio_request *aio_req = req->r_priv;
>>   	struct ceph_osd_data *osd_data = osd_req_op_extent_osd_data(req, 0);
>> +	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>> +	struct ceph_client_metric *metric = &fsc->mdsc->metric;
>>   
>>   	BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_BVECS);
>>   	BUG_ON(!osd_data->num_bvecs);
>> @@ -1051,6 +1057,16 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req)
>>   	dout("ceph_aio_complete_req %p rc %d bytes %u\n",
>>   	     inode, rc, osd_data->bvec_pos.iter.bi_size);
>>   
>> +	/* r_start_stamp == 0 means the request was not submitted */
>> +	if (req->r_start_stamp) {
>> +		if (aio_req->write)
>> +			ceph_update_write_latency(metric, req->r_start_stamp,
>> +						  req->r_end_stamp, rc);
>> +		else
>> +			ceph_update_read_latency(metric, req->r_start_stamp,
>> +						 req->r_end_stamp, rc);
>> +	}
>> +
>>   	if (rc == -EOLDSNAPC) {
>>   		struct ceph_aio_work *aio_work;
>>   		BUG_ON(!aio_req->write);
>> @@ -1179,6 +1195,7 @@ static void ceph_aio_retry_work(struct work_struct *work)
>>   	struct inode *inode = file_inode(file);
>>   	struct ceph_inode_info *ci = ceph_inode(inode);
>>   	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>> +	struct ceph_client_metric *metric = &fsc->mdsc->metric;
>>   	struct ceph_vino vino;
>>   	struct ceph_osd_request *req;
>>   	struct bio_vec *bvecs;
>> @@ -1295,6 +1312,13 @@ static void ceph_aio_retry_work(struct work_struct *work)
>>   		if (!ret)
>>   			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
>>   
>> +		if (write)
>> +			ceph_update_write_latency(metric, req->r_start_stamp,
>> +						  req->r_end_stamp, ret);
>> +		else
>> +			ceph_update_read_latency(metric, req->r_start_stamp,
>> +						 req->r_end_stamp, ret);
>> +
>>   		size = i_size_read(inode);
>>   		if (!write) {
>>   			if (ret == -ENOENT)
>> @@ -1466,6 +1490,8 @@ static void ceph_aio_retry_work(struct work_struct *work)
>>   		if (!ret)
>>   			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
>>   
>> +		ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
>> +					  req->r_end_stamp, ret);
>>   out:
>>   		ceph_osdc_put_request(req);
>>   		if (ret != 0) {
>> diff --git a/fs/ceph/metric.c b/fs/ceph/metric.c
>> index 2a4b739..6cb64fb 100644
>> --- a/fs/ceph/metric.c
>> +++ b/fs/ceph/metric.c
>> @@ -2,6 +2,7 @@
>>   
>>   #include <linux/types.h>
>>   #include <linux/percpu_counter.h>
>> +#include <linux/math64.h>
>>   
>>   #include "metric.h"
>>   
>> @@ -29,8 +30,32 @@ int ceph_metric_init(struct ceph_client_metric *m)
>>   	if (ret)
>>   		goto err_i_caps_mis;
>>   
>> +	ret = percpu_counter_init(&m->read_latency_sq_sum, 0, GFP_KERNEL);
>> +	if (ret)
>> +		goto err_read_latency_sq_sum;
>> +
>> +	atomic64_set(&m->read_latency_min, S64_MAX);
>> +	atomic64_set(&m->read_latency_max, 0);
>> +	spin_lock_init(&m->read_latency_lock);
>> +	m->total_reads = 0;
>> +	m->read_latency_sum = 0;
>> +
>> +	ret = percpu_counter_init(&m->write_latency_sq_sum, 0, GFP_KERNEL);
>> +	if (ret)
>> +		goto err_write_latency_sq_sum;
>> +
>> +	atomic64_set(&m->write_latency_min, S64_MAX);
>> +	atomic64_set(&m->write_latency_max, 0);
>> +	spin_lock_init(&m->write_latency_lock);
>> +	m->total_writes = 0;
>> +	m->write_latency_sum = 0;
>> +
>>   	return 0;
>>   
>> +err_write_latency_sq_sum:
>> +	percpu_counter_destroy(&m->read_latency_sq_sum);
>> +err_read_latency_sq_sum:
>> +	percpu_counter_destroy(&m->i_caps_mis);
>>   err_i_caps_mis:
>>   	percpu_counter_destroy(&m->i_caps_hit);
>>   err_i_caps_hit:
>> @@ -46,8 +71,93 @@ void ceph_metric_destroy(struct ceph_client_metric *m)
>>   	if (!m)
>>   		return;
>>   
>> +	percpu_counter_destroy(&m->write_latency_sq_sum);
>> +	percpu_counter_destroy(&m->read_latency_sq_sum);
>>   	percpu_counter_destroy(&m->i_caps_mis);
>>   	percpu_counter_destroy(&m->i_caps_hit);
>>   	percpu_counter_destroy(&m->d_lease_mis);
>>   	percpu_counter_destroy(&m->d_lease_hit);
>>   }
>> +
>> +static inline void __update_min_latency(atomic64_t *min, unsigned long lat)
>> +{
>> +	u64 cur, old;
>> +
>> +	cur = atomic64_read(min);
>> +	do {
>> +		old = cur;
>> +		if (likely(lat >= old))
>> +			break;
>> +	} while (unlikely((cur = atomic64_cmpxchg(min, old, lat)) != old));
>> +}
>> +
>> +static inline void __update_max_latency(atomic64_t *max, unsigned long lat)
>> +{
>> +	u64 cur, old;
>> +
>> +	cur = atomic64_read(max);
>> +	do {
>> +		old = cur;
>> +		if (likely(lat <= old))
>> +			break;
>> +	} while (unlikely((cur = atomic64_cmpxchg(max, old, lat)) != old));
>> +}
>> +
>> +static inline void __update_avg_and_sq(u64 *totalp, u64 *lsump,
>> +				       struct percpu_counter *sq_sump,
>> +				       spinlock_t *lockp, unsigned long lat)
>> +{
>> +	u64 total, avg, sq, lsum;
>> +
>> +	spin_lock(lockp);
>> +	total = ++(*totalp);
>> +	*lsump += lat;
>> +	lsum = *lsump;
>> +	spin_unlock(lockp);

For each read/write/metadata latency update, I am trying to keep the
critical section as small as possible here.


>> +
>> +	if (unlikely(total == 1))
>> +		return;
>> +
>> +	/* the sq is (lat - old_avg) * (lat - new_avg) */
>> +	avg = DIV64_U64_ROUND_CLOSEST((lsum - lat), (total - 1));
>> +	sq = lat - avg;
>> +	avg = DIV64_U64_ROUND_CLOSEST(lsum, total);
>> +	sq = sq * (lat - avg);
>> +	percpu_counter_add(sq_sump, sq);

IMO, the percpu_counter can benefit us by avoiding the lock here, since
computing the sq does several div/mult operations and takes somewhat
longer.
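
For reference, the "(lat - old_avg) * (lat - new_avg)" increment above
is the standard Welford-style online update for the sum of squared
deviations. A minimal userspace sketch (made-up sample data, and plain
doubles instead of the kernel's u64 jiffies math) to check the identity:

#include <stdio.h>

/* accumulating (x - old_mean) * (x - new_mean) per sample should
 * reproduce sum((x_i - final_mean)^2) over the whole series */
int main(void)
{
	double samples[] = { 3, 7, 7, 19 };
	double mean = 0, sq_sum = 0;
	int n = 0, i;

	for (i = 0; i < 4; i++) {
		double x = samples[i], old_mean = mean;

		n++;
		mean += (x - old_mean) / n;
		sq_sum += (x - old_mean) * (x - mean);
	}
	/* mean is 9, deviations -6/-2/-2/10, so sq_sum is 144 */
	printf("sq_sum = %g, variance = %g\n", sq_sum, sq_sum / (n - 1));
	return 0;
}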


>> +}
>> +
>> +void ceph_update_read_latency(struct ceph_client_metric *m,
>> +			      unsigned long r_start,
>> +			      unsigned long r_end,
>> +			      int rc)
>> +{
>> +	unsigned long lat = r_end - r_start;
>> +
>> +	if (unlikely(rc < 0 && rc != -ENOENT && rc != -ETIMEDOUT))
>> +		return;
>> +
>> +	__update_min_latency(&m->read_latency_min, lat);
>> +	__update_max_latency(&m->read_latency_max, lat);

Here too, the min/max are updated without locks, but it should be okay
to switch them to u64 and update them under the locks.

Thoughts?

If this makes sense, I will change the min/max to u64 type and keep the
sq_sum as a percpu counter. Or I will make them all u64.

Thanks.



>> +	__update_avg_and_sq(&m->total_reads, &m->read_latency_sum,
>> +			    &m->read_latency_sq_sum,
>> +			    &m->read_latency_lock,
>> +			    lat);
>> +}
>> +
>> +void ceph_update_write_latency(struct ceph_client_metric *m,
>> +			       unsigned long r_start,
>> +			       unsigned long r_end,
>> +			       int rc)
>> +{
>> +	unsigned long lat = r_end - r_start;
>> +
>> +	if (unlikely(rc && rc != -ETIMEDOUT))
>> +		return;
>> +
>> +	__update_min_latency(&m->write_latency_min, lat);
>> +	__update_max_latency(&m->write_latency_max, lat);
>> +	__update_avg_and_sq(&m->total_writes, &m->write_latency_sum,
>> +			    &m->write_latency_sq_sum,
>> +			    &m->write_latency_lock,
>> +			    lat);
>> +}
>> diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
>> index 098ee8a..c7eae56 100644
>> --- a/fs/ceph/metric.h
>> +++ b/fs/ceph/metric.h
>> @@ -13,6 +13,20 @@ struct ceph_client_metric {
>>   
>>   	struct percpu_counter i_caps_hit;
>>   	struct percpu_counter i_caps_mis;
>> +
>> +	struct percpu_counter read_latency_sq_sum;
>> +	atomic64_t read_latency_min;
>> +	atomic64_t read_latency_max;
> I'd make the above 3 values be regular values and make them all use the
> read_latency_lock. Given that you're taking a lock anyway, it's more
> efficient to just do all of the manipulation under a single spinlock
> rather than fooling with atomic or percpu values. These are all almost
> certainly going to be in the same cacheline anyway.
>
>> +	spinlock_t read_latency_lock;
>> +	u64 total_reads;
>> +	u64 read_latency_sum;
>> +
>> +	struct percpu_counter write_latency_sq_sum;
>> +	atomic64_t write_latency_min;
>> +	atomic64_t write_latency_max;
>> +	spinlock_t write_latency_lock;
>> +	u64 total_writes;
>> +	u64 write_latency_sum;
>>   };
>>   
>>   extern int ceph_metric_init(struct ceph_client_metric *m);
>> @@ -27,4 +41,13 @@ static inline void ceph_update_cap_mis(struct ceph_client_metric *m)
>>   {
>>   	percpu_counter_inc(&m->i_caps_mis);
>>   }
>> +
>> +extern void ceph_update_read_latency(struct ceph_client_metric *m,
>> +				     unsigned long r_start,
>> +				     unsigned long r_end,
>> +				     int rc);
>> +extern void ceph_update_write_latency(struct ceph_client_metric *m,
>> +				      unsigned long r_start,
>> +				      unsigned long r_end,
>> +				      int rc);
>>   #endif /* _FS_CEPH_MDS_METRIC_H */
>> diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
>> index 9d9f745..02ff3a3 100644
>> --- a/include/linux/ceph/osd_client.h
>> +++ b/include/linux/ceph/osd_client.h
>> @@ -213,6 +213,7 @@ struct ceph_osd_request {
>>   	/* internal */
>>   	unsigned long r_stamp;                /* jiffies, send or check time */
>>   	unsigned long r_start_stamp;          /* jiffies */
>> +	unsigned long r_end_stamp;            /* jiffies */
>>   	int r_attempts;
>>   	u32 r_map_dne_bound;
>>   
>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>> index 998e26b..28e33e0 100644
>> --- a/net/ceph/osd_client.c
>> +++ b/net/ceph/osd_client.c
>> @@ -2389,6 +2389,8 @@ static void finish_request(struct ceph_osd_request *req)
>>   	WARN_ON(lookup_request_mc(&osdc->map_checks, req->r_tid));
>>   	dout("%s req %p tid %llu\n", __func__, req, req->r_tid);
>>   
>> +	req->r_end_stamp = jiffies;
>> +
>>   	if (req->r_osd)
>>   		unlink_request(req->r_osd, req);
>>   	atomic_dec(&osdc->num_requests);

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v12 3/4] ceph: add read/write latency metric support
  2020-03-19 17:44     ` Xiubo Li
@ 2020-03-19 18:01       ` Xiubo Li
  2020-03-19 18:36       ` Jeff Layton
  1 sibling, 0 replies; 10+ messages in thread
From: Xiubo Li @ 2020-03-19 18:01 UTC (permalink / raw)
  To: Jeff Layton; +Cc: sage, idryomov, gfarnum, zyan, pdonnell, ceph-devel

On 2020/3/20 1:44, Xiubo Li wrote:
> On 2020/3/19 22:36, Jeff Layton wrote:
>> On Thu, 2020-03-19 at 10:06 -0400, xiubli@redhat.com wrote:
>>> From: Xiubo Li <xiubli@redhat.com>
>>>
>>> Calculate the latency for OSD read requests. Add a new r_end_stamp
>>> field to struct ceph_osd_request that will hold the time at which
>>> the reply was received. Use that to calculate the RTT for each call,
>>> and divide the sum of those by the number of calls to get the
>>> average RTT.
>>>
>>> Keep a tally of RTT for OSD writes and number of calls to track average
>>> latency of OSD writes.
>>>
>>> URL: https://tracker.ceph.com/issues/43215
>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>> ---
>>>   fs/ceph/addr.c                  |  18 +++++++
>>>   fs/ceph/debugfs.c               |  60 +++++++++++++++++++++-
>>>   fs/ceph/file.c                  |  26 ++++++++++
>>>   fs/ceph/metric.c                | 110 
>>> ++++++++++++++++++++++++++++++++++++++++
>>>   fs/ceph/metric.h                |  23 +++++++++
>>>   include/linux/ceph/osd_client.h |   1 +
>>>   net/ceph/osd_client.c           |   2 +
>>>   7 files changed, 239 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
>>> index 6f4678d..f359619 100644
>>> --- a/fs/ceph/addr.c
>>> +++ b/fs/ceph/addr.c
>>> @@ -216,6 +216,9 @@ static int ceph_sync_readpages(struct 
>>> ceph_fs_client *fsc,
>>>       if (!rc)
>>>           rc = ceph_osdc_wait_request(osdc, req);
>>>   +    ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_stamp,
>>> +                 req->r_end_stamp, rc);
>>> +
>>>       ceph_osdc_put_request(req);
>>>       dout("readpages result %d\n", rc);
>>>       return rc;
>>> @@ -299,6 +302,7 @@ static int ceph_readpage(struct file *filp, 
>>> struct page *page)
>>>   static void finish_read(struct ceph_osd_request *req)
>>>   {
>>>       struct inode *inode = req->r_inode;
>>> +    struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>>>       struct ceph_osd_data *osd_data;
>>>       int rc = req->r_result <= 0 ? req->r_result : 0;
>>>       int bytes = req->r_result >= 0 ? req->r_result : 0;
>>> @@ -336,6 +340,10 @@ static void finish_read(struct ceph_osd_request 
>>> *req)
>>>           put_page(page);
>>>           bytes -= PAGE_SIZE;
>>>       }
>>> +
>>> +    ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_stamp,
>>> +                 req->r_end_stamp, rc);
>>> +
>>>       kfree(osd_data->pages);
>>>   }
>>>   @@ -643,6 +651,9 @@ static int ceph_sync_writepages(struct 
>>> ceph_fs_client *fsc,
>>>       if (!rc)
>>>           rc = ceph_osdc_wait_request(osdc, req);
>>>   +    ceph_update_write_latency(&fsc->mdsc->metric, 
>>> req->r_start_stamp,
>>> +                  req->r_end_stamp, rc);
>>> +
>>>       ceph_osdc_put_request(req);
>>>       if (rc == 0)
>>>           rc = len;
>>> @@ -794,6 +805,9 @@ static void writepages_finish(struct 
>>> ceph_osd_request *req)
>>>           ceph_clear_error_write(ci);
>>>       }
>>>   +    ceph_update_write_latency(&fsc->mdsc->metric, 
>>> req->r_start_stamp,
>>> +                  req->r_end_stamp, rc);
>>> +
>>>       /*
>>>        * We lost the cache cap, need to truncate the page before
>>>        * it is unlocked, otherwise we'd truncate it later in the
>>> @@ -1852,6 +1866,10 @@ int ceph_uninline_data(struct file *filp, 
>>> struct page *locked_page)
>>>       err = ceph_osdc_start_request(&fsc->client->osdc, req, false);
>>>       if (!err)
>>>           err = ceph_osdc_wait_request(&fsc->client->osdc, req);
>>> +
>>> +    ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
>>> +                  req->r_end_stamp, err);
>>> +
>>>   out_put:
>>>       ceph_osdc_put_request(req);
>>>       if (err == -ECANCELED)
>>> diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
>>> index 66b9622..de07fdb 100644
>>> --- a/fs/ceph/debugfs.c
>>> +++ b/fs/ceph/debugfs.c
>>> @@ -7,6 +7,7 @@
>>>   #include <linux/ctype.h>
>>>   #include <linux/debugfs.h>
>>>   #include <linux/seq_file.h>
>>> +#include <linux/math64.h>
>>>     #include <linux/ceph/libceph.h>
>>>   #include <linux/ceph/mon_client.h>
>>> @@ -124,13 +125,70 @@ static int mdsc_show(struct seq_file *s, void *p)
>>>       return 0;
>>>   }
>>>   +static u64 get_avg(u64 *totalp, u64 *sump, spinlock_t *lockp, u64 
>>> *total)
>>> +{
>>> +    u64 t, sum, avg = 0;
>>> +
>>> +    spin_lock(lockp);
>>> +    t = *totalp;
>>> +    sum = *sump;
>>> +    spin_unlock(lockp);
>>> +
>>> +    if (likely(t))
>>> +        avg = DIV64_U64_ROUND_CLOSEST(sum, t);
>>> +
>>> +    *total = t;
>>> +    return avg;
>>> +}
>>> +
>>> +#define CEPH_METRIC_SHOW(name, total, avg, min, max, sq) {        \
>>> +    u64 _total, _avg, _min, _max, _sq, _st, _re = 0;        \
>>> +    _avg = jiffies_to_usecs(avg);                    \
>>> +    _min = jiffies_to_usecs(min == S64_MAX ? 0 : min); \
>>> +    _max = jiffies_to_usecs(max);                    \
>>> +    _total = total - 1;                        \
>>> +    _sq = _total > 0 ? DIV64_U64_ROUND_CLOSEST(sq, _total) : 0;    \
>>> +    _sq = jiffies_to_usecs(_sq);                    \
>>> +    _st = int_sqrt64(_sq);                        \
>>> +    if (_st > 0) {                            \
>>> +        _re = 5 * (_sq - (_st * _st));                \
>>> +        _re = _re > 0 ? _re - 1 : 0;                \
>>> +        _re = _st > 0 ? div64_s64(_re, _st) : 0;        \
>>> +    }                                \
>>> +    seq_printf(s, "%-14s%-12llu%-16llu%-16llu%-16llu%llu.%llu\n",    \
>>> +           name, total, _avg, _min, _max, _st, _re);        \
>>> +}
>>> +
>>>   static int metric_show(struct seq_file *s, void *p)
>>>   {
>>>       struct ceph_fs_client *fsc = s->private;
>>>       struct ceph_mds_client *mdsc = fsc->mdsc;
>>>       struct ceph_client_metric *m = &mdsc->metric;
>>>       int i, nr_caps = 0;
>>> -
>>> +    u64 total, avg, min, max, sq;
>>> +
>>> +    seq_printf(s, "item          total       avg_lat(us) 
>>> min_lat(us)     max_lat(us)     stdev(us)\n");
>>> +    seq_printf(s, 
>>> "-----------------------------------------------------------------------------------\n");
>>> +
>>> +    avg = get_avg(&m->total_reads,
>>> +              &m->read_latency_sum,
>>> +              &m->read_latency_lock,
>>> +              &total);
>>> +    min = atomic64_read(&m->read_latency_min);
>>> +    max = atomic64_read(&m->read_latency_max);
>>> +    sq = percpu_counter_sum(&m->read_latency_sq_sum);
>>> +    CEPH_METRIC_SHOW("read", total, avg, min, max, sq);
>>> +
>>> +    avg = get_avg(&m->total_writes,
>>> +              &m->write_latency_sum,
>>> +              &m->write_latency_lock,
>>> +              &total);
>>> +    min = atomic64_read(&m->write_latency_min);
>>> +    max = atomic64_read(&m->write_latency_max);
>>> +    sq = percpu_counter_sum(&m->write_latency_sq_sum);
>>> +    CEPH_METRIC_SHOW("write", total, avg, min, max, sq);
>>> +
>>> +    seq_printf(s, "\n");
>>>       seq_printf(s, "item          total miss            hit\n");
>>>       seq_printf(s, 
>>> "-------------------------------------------------\n");
>>>   diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>> index 4a5ccbb..8e40022 100644
>>> --- a/fs/ceph/file.c
>>> +++ b/fs/ceph/file.c
>>> @@ -906,6 +906,10 @@ static ssize_t ceph_sync_read(struct kiocb 
>>> *iocb, struct iov_iter *to,
>>>           ret = ceph_osdc_start_request(osdc, req, false);
>>>           if (!ret)
>>>               ret = ceph_osdc_wait_request(osdc, req);
>>> +
>>> + ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_stamp,
>>> +                     req->r_end_stamp, ret);
>>> +
>>>           ceph_osdc_put_request(req);
>>>             i_size = i_size_read(inode);
>>> @@ -1044,6 +1048,8 @@ static void ceph_aio_complete_req(struct 
>>> ceph_osd_request *req)
>>>       struct inode *inode = req->r_inode;
>>>       struct ceph_aio_request *aio_req = req->r_priv;
>>>       struct ceph_osd_data *osd_data = 
>>> osd_req_op_extent_osd_data(req, 0);
>>> +    struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>>> +    struct ceph_client_metric *metric = &fsc->mdsc->metric;
>>>         BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_BVECS);
>>>       BUG_ON(!osd_data->num_bvecs);
>>> @@ -1051,6 +1057,16 @@ static void ceph_aio_complete_req(struct 
>>> ceph_osd_request *req)
>>>       dout("ceph_aio_complete_req %p rc %d bytes %u\n",
>>>            inode, rc, osd_data->bvec_pos.iter.bi_size);
>>>   +    /* r_start_stamp == 0 means the request was not submitted */
>>> +    if (req->r_start_stamp) {
>>> +        if (aio_req->write)
>>> +            ceph_update_write_latency(metric, req->r_start_stamp,
>>> +                          req->r_end_stamp, rc);
>>> +        else
>>> +            ceph_update_read_latency(metric, req->r_start_stamp,
>>> +                         req->r_end_stamp, rc);
>>> +    }
>>> +
>>>       if (rc == -EOLDSNAPC) {
>>>           struct ceph_aio_work *aio_work;
>>>           BUG_ON(!aio_req->write);
>>> @@ -1179,6 +1195,7 @@ static void ceph_aio_retry_work(struct 
>>> work_struct *work)
>>>       struct inode *inode = file_inode(file);
>>>       struct ceph_inode_info *ci = ceph_inode(inode);
>>>       struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>>> +    struct ceph_client_metric *metric = &fsc->mdsc->metric;
>>>       struct ceph_vino vino;
>>>       struct ceph_osd_request *req;
>>>       struct bio_vec *bvecs;
>>> @@ -1295,6 +1312,13 @@ static void ceph_aio_retry_work(struct 
>>> work_struct *work)
>>>           if (!ret)
>>>               ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
>>>   +        if (write)
>>> +            ceph_update_write_latency(metric, req->r_start_stamp,
>>> +                          req->r_end_stamp, ret);
>>> +        else
>>> +            ceph_update_read_latency(metric, req->r_start_stamp,
>>> +                         req->r_end_stamp, ret);
>>> +
>>>           size = i_size_read(inode);
>>>           if (!write) {
>>>               if (ret == -ENOENT)
>>> @@ -1466,6 +1490,8 @@ static void ceph_aio_retry_work(struct 
>>> work_struct *work)
>>>           if (!ret)
>>>               ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
>>>   + ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
>>> +                      req->r_end_stamp, ret);
>>>   out:
>>>           ceph_osdc_put_request(req);
>>>           if (ret != 0) {
>>> diff --git a/fs/ceph/metric.c b/fs/ceph/metric.c
>>> index 2a4b739..6cb64fb 100644
>>> --- a/fs/ceph/metric.c
>>> +++ b/fs/ceph/metric.c
>>> @@ -2,6 +2,7 @@
>>>     #include <linux/types.h>
>>>   #include <linux/percpu_counter.h>
>>> +#include <linux/math64.h>
>>>     #include "metric.h"
>>>   @@ -29,8 +30,32 @@ int ceph_metric_init(struct ceph_client_metric *m)
>>>       if (ret)
>>>           goto err_i_caps_mis;
>>>   +    ret = percpu_counter_init(&m->read_latency_sq_sum, 0, 
>>> GFP_KERNEL);
>>> +    if (ret)
>>> +        goto err_read_latency_sq_sum;
>>> +
>>> +    atomic64_set(&m->read_latency_min, S64_MAX);
>>> +    atomic64_set(&m->read_latency_max, 0);
>>> +    spin_lock_init(&m->read_latency_lock);
>>> +    m->total_reads = 0;
>>> +    m->read_latency_sum = 0;
>>> +
>>> +    ret = percpu_counter_init(&m->write_latency_sq_sum, 0, 
>>> GFP_KERNEL);
>>> +    if (ret)
>>> +        goto err_write_latency_sq_sum;
>>> +
>>> +    atomic64_set(&m->write_latency_min, S64_MAX);
>>> +    atomic64_set(&m->write_latency_max, 0);
>>> +    spin_lock_init(&m->write_latency_lock);
>>> +    m->total_writes = 0;
>>> +    m->write_latency_sum = 0;
>>> +
>>>       return 0;
>>>   +err_write_latency_sq_sum:
>>> +    percpu_counter_destroy(&m->read_latency_sq_sum);
>>> +err_read_latency_sq_sum:
>>> +    percpu_counter_destroy(&m->i_caps_mis);
>>>   err_i_caps_mis:
>>>       percpu_counter_destroy(&m->i_caps_hit);
>>>   err_i_caps_hit:
>>> @@ -46,8 +71,93 @@ void ceph_metric_destroy(struct 
>>> ceph_client_metric *m)
>>>       if (!m)
>>>           return;
>>>   + percpu_counter_destroy(&m->write_latency_sq_sum);
>>> +    percpu_counter_destroy(&m->read_latency_sq_sum);
>>>       percpu_counter_destroy(&m->i_caps_mis);
>>>       percpu_counter_destroy(&m->i_caps_hit);
>>>       percpu_counter_destroy(&m->d_lease_mis);
>>>       percpu_counter_destroy(&m->d_lease_hit);
>>>   }
>>> +
>>> +static inline void __update_min_latency(atomic64_t *min, unsigned 
>>> long lat)
>>> +{
>>> +    u64 cur, old;
>>> +
>>> +    cur = atomic64_read(min);
>>> +    do {
>>> +        old = cur;
>>> +        if (likely(lat >= old))
>>> +            break;
>>> +    } while (unlikely((cur = atomic64_cmpxchg(min, old, lat)) != 
>>> old));
>>> +}
>>> +
>>> +static inline void __update_max_latency(atomic64_t *max, unsigned 
>>> long lat)
>>> +{
>>> +    u64 cur, old;
>>> +
>>> +    cur = atomic64_read(max);
>>> +    do {
>>> +        old = cur;
>>> +        if (likely(lat <= old))
>>> +            break;
>>> +    } while (unlikely((cur = atomic64_cmpxchg(max, old, lat)) != 
>>> old));
>>> +}
>>> +
>>> +static inline void __update_avg_and_sq(u64 *totalp, u64 *lsump,
>>> +                       struct percpu_counter *sq_sump,
>>> +                       spinlock_t *lockp, unsigned long lat)
>>> +{
>>> +    u64 total, avg, sq, lsum;
>>> +
>>> +    spin_lock(lockp);
>>> +    total = ++(*totalp);
>>> +    *lsump += lat;
>>> +    lsum = *lsump;
>>> +    spin_unlock(lockp);
>
> For each read/write/metadata latency update, I am trying to keep the
> critical section as small as possible here.
>
>
>>> +
>>> +    if (unlikely(total == 1))
>>> +        return;
>>> +
>>> +    /* the sq is (lat - old_avg) * (lat - new_avg) */
>>> +    avg = DIV64_U64_ROUND_CLOSEST((lsum - lat), (total - 1));
>>> +    sq = lat - avg;
>>> +    avg = DIV64_U64_ROUND_CLOSEST(lsum, total);
>>> +    sq = sq * (lat - avg);
>>> +    percpu_counter_add(sq_sump, sq);
>
> IMO, the percpu_counter can benefit us by avoiding the lock here,
> since computing the sq does several div/mult operations and takes
> somewhat longer.
>
>
>>> +}
>>> +
>>> +void ceph_update_read_latency(struct ceph_client_metric *m,
>>> +                  unsigned long r_start,
>>> +                  unsigned long r_end,
>>> +                  int rc)
>>> +{
>>> +    unsigned long lat = r_end - r_start;
>>> +
>>> +    if (unlikely(rc < 0 && rc != -ENOENT && rc != -ETIMEDOUT))
>>> +        return;
>>> +
>>> +    __update_min_latency(&m->read_latency_min, lat);
>>> +    __update_max_latency(&m->read_latency_max, lat);
>
> Here too, the min/max are updated without locks, but it should be okay
> to switch them to u64 and update them under the locks.
>
> Thoughts?
>
> If this makes sense, I will change the min/max to u64 type and keep the
> sq_sum as a percpu counter. Or I will make them all u64.
>
Also, in the future we may need to support calculating all of the above
perf metrics per IO size, like 4k/8k/16k/32k/.../64M.

If so, wouldn't it be better to keep the min/max as atomic types and
the sq_sum as a percpu counter, and update them outside the spinlock?
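
Just to sketch that idea (nothing like this exists in the series yet,
and the names and bucket bounds here are made up), the request length
could be mapped to a power-of-two size class:

#include <linux/kernel.h>
#include <linux/log2.h>

/* hypothetical: index into a per-size-class array of latency
 * metrics, one bucket per power of two from 4k to 64M */
#define CEPH_SZ_BUCKET_MIN_SHIFT	12	/* 4k */
#define CEPH_SZ_BUCKET_MAX_SHIFT	26	/* 64M */
#define CEPH_SZ_BUCKETS (CEPH_SZ_BUCKET_MAX_SHIFT - \
			 CEPH_SZ_BUCKET_MIN_SHIFT + 1)

static inline int ceph_size_to_bucket(u64 len)
{
	int shift = ilog2(len | 1);	/* ilog2(0) is undefined */

	return clamp(shift, CEPH_SZ_BUCKET_MIN_SHIFT,
		     CEPH_SZ_BUCKET_MAX_SHIFT) - CEPH_SZ_BUCKET_MIN_SHIFT;
}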

Thanks

> Thanks.
>
>
>
>>> + __update_avg_and_sq(&m->total_reads, &m->read_latency_sum,
>>> +                &m->read_latency_sq_sum,
>>> +                &m->read_latency_lock,
>>> +                lat);
>>> +}
>>> +
>>> +void ceph_update_write_latency(struct ceph_client_metric *m,
>>> +                   unsigned long r_start,
>>> +                   unsigned long r_end,
>>> +                   int rc)
>>> +{
>>> +    unsigned long lat = r_end - r_start;
>>> +
>>> +    if (unlikely(rc && rc != -ETIMEDOUT))
>>> +        return;
>>> +
>>> +    __update_min_latency(&m->write_latency_min, lat);
>>> +    __update_max_latency(&m->write_latency_max, lat);
>>> +    __update_avg_and_sq(&m->total_writes, &m->write_latency_sum,
>>> +                &m->write_latency_sq_sum,
>>> +                &m->write_latency_lock,
>>> +                lat);
>>> +}
>>> diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
>>> index 098ee8a..c7eae56 100644
>>> --- a/fs/ceph/metric.h
>>> +++ b/fs/ceph/metric.h
>>> @@ -13,6 +13,20 @@ struct ceph_client_metric {
>>>         struct percpu_counter i_caps_hit;
>>>       struct percpu_counter i_caps_mis;
>>> +
>>> +    struct percpu_counter read_latency_sq_sum;
>>> +    atomic64_t read_latency_min;
>>> +    atomic64_t read_latency_max;
>> I'd make the above 3 values be regular values and make them all use the
>> read_latency_lock. Given that you're taking a lock anyway, it's more
>> efficient to just do all of the manipulation under a single spinlock
>> rather than fooling with atomic or percpu values. These are all almost
>> certainly going to be in the same cacheline anyway.
>>
>>> +    spinlock_t read_latency_lock;
>>> +    u64 total_reads;
>>> +    u64 read_latency_sum;
>>> +
>>> +    struct percpu_counter write_latency_sq_sum;
>>> +    atomic64_t write_latency_min;
>>> +    atomic64_t write_latency_max;
>>> +    spinlock_t write_latency_lock;
>>> +    u64 total_writes;
>>> +    u64 write_latency_sum;
>>>   };
>>>     extern int ceph_metric_init(struct ceph_client_metric *m);
>>> @@ -27,4 +41,13 @@ static inline void ceph_update_cap_mis(struct 
>>> ceph_client_metric *m)
>>>   {
>>>       percpu_counter_inc(&m->i_caps_mis);
>>>   }
>>> +
>>> +extern void ceph_update_read_latency(struct ceph_client_metric *m,
>>> +                     unsigned long r_start,
>>> +                     unsigned long r_end,
>>> +                     int rc);
>>> +extern void ceph_update_write_latency(struct ceph_client_metric *m,
>>> +                      unsigned long r_start,
>>> +                      unsigned long r_end,
>>> +                      int rc);
>>>   #endif /* _FS_CEPH_MDS_METRIC_H */
>>> diff --git a/include/linux/ceph/osd_client.h 
>>> b/include/linux/ceph/osd_client.h
>>> index 9d9f745..02ff3a3 100644
>>> --- a/include/linux/ceph/osd_client.h
>>> +++ b/include/linux/ceph/osd_client.h
>>> @@ -213,6 +213,7 @@ struct ceph_osd_request {
>>>       /* internal */
>>>       unsigned long r_stamp;                /* jiffies, send or 
>>> check time */
>>>       unsigned long r_start_stamp;          /* jiffies */
>>> +    unsigned long r_end_stamp;            /* jiffies */
>>>       int r_attempts;
>>>       u32 r_map_dne_bound;
>>>   diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>>> index 998e26b..28e33e0 100644
>>> --- a/net/ceph/osd_client.c
>>> +++ b/net/ceph/osd_client.c
>>> @@ -2389,6 +2389,8 @@ static void finish_request(struct 
>>> ceph_osd_request *req)
>>>       WARN_ON(lookup_request_mc(&osdc->map_checks, req->r_tid));
>>>       dout("%s req %p tid %llu\n", __func__, req, req->r_tid);
>>>   +    req->r_end_stamp = jiffies;
>>> +
>>>       if (req->r_osd)
>>>           unlink_request(req->r_osd, req);
>>>       atomic_dec(&osdc->num_requests);
>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v12 3/4] ceph: add read/write latency metric support
  2020-03-19 17:44     ` Xiubo Li
  2020-03-19 18:01       ` Xiubo Li
@ 2020-03-19 18:36       ` Jeff Layton
  2020-03-19 18:49         ` Xiubo Li
  1 sibling, 1 reply; 10+ messages in thread
From: Jeff Layton @ 2020-03-19 18:36 UTC (permalink / raw)
  To: Xiubo Li; +Cc: sage, idryomov, gfarnum, zyan, pdonnell, ceph-devel

On Fri, 2020-03-20 at 01:44 +0800, Xiubo Li wrote:
> On 2020/3/19 22:36, Jeff Layton wrote:
> > On Thu, 2020-03-19 at 10:06 -0400, xiubli@redhat.com wrote:
> > > From: Xiubo Li <xiubli@redhat.com>
> > > 
> > > Calculate the latency for OSD read requests. Add a new r_end_stamp
> > > field to struct ceph_osd_request that will hold the time at which
> > > the reply was received. Use that to calculate the RTT for each call,
> > > and divide the sum of those by the number of calls to get the
> > > average RTT.
> > > 
> > > Keep a tally of RTT for OSD writes and number of calls to track average
> > > latency of OSD writes.
> > > 
> > > URL: https://tracker.ceph.com/issues/43215
> > > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > > ---
> > >   fs/ceph/addr.c                  |  18 +++++++
> > >   fs/ceph/debugfs.c               |  60 +++++++++++++++++++++-
> > >   fs/ceph/file.c                  |  26 ++++++++++
> > >   fs/ceph/metric.c                | 110 ++++++++++++++++++++++++++++++++++++++++
> > >   fs/ceph/metric.h                |  23 +++++++++
> > >   include/linux/ceph/osd_client.h |   1 +
> > >   net/ceph/osd_client.c           |   2 +
> > >   7 files changed, 239 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > > index 6f4678d..f359619 100644
> > > --- a/fs/ceph/addr.c
> > > +++ b/fs/ceph/addr.c
> > > @@ -216,6 +216,9 @@ static int ceph_sync_readpages(struct ceph_fs_client *fsc,
> > >   	if (!rc)
> > >   		rc = ceph_osdc_wait_request(osdc, req);
> > >   
> > > +	ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_stamp,
> > > +				 req->r_end_stamp, rc);
> > > +
> > >   	ceph_osdc_put_request(req);
> > >   	dout("readpages result %d\n", rc);
> > >   	return rc;
> > > @@ -299,6 +302,7 @@ static int ceph_readpage(struct file *filp, struct page *page)
> > >   static void finish_read(struct ceph_osd_request *req)
> > >   {
> > >   	struct inode *inode = req->r_inode;
> > > +	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> > >   	struct ceph_osd_data *osd_data;
> > >   	int rc = req->r_result <= 0 ? req->r_result : 0;
> > >   	int bytes = req->r_result >= 0 ? req->r_result : 0;
> > > @@ -336,6 +340,10 @@ static void finish_read(struct ceph_osd_request *req)
> > >   		put_page(page);
> > >   		bytes -= PAGE_SIZE;
> > >   	}
> > > +
> > > +	ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_stamp,
> > > +				 req->r_end_stamp, rc);
> > > +
> > >   	kfree(osd_data->pages);
> > >   }
> > >   
> > > @@ -643,6 +651,9 @@ static int ceph_sync_writepages(struct ceph_fs_client *fsc,
> > >   	if (!rc)
> > >   		rc = ceph_osdc_wait_request(osdc, req);
> > >   
> > > +	ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
> > > +				  req->r_end_stamp, rc);
> > > +
> > >   	ceph_osdc_put_request(req);
> > >   	if (rc == 0)
> > >   		rc = len;
> > > @@ -794,6 +805,9 @@ static void writepages_finish(struct ceph_osd_request *req)
> > >   		ceph_clear_error_write(ci);
> > >   	}
> > >   
> > > +	ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
> > > +				  req->r_end_stamp, rc);
> > > +
> > >   	/*
> > >   	 * We lost the cache cap, need to truncate the page before
> > >   	 * it is unlocked, otherwise we'd truncate it later in the
> > > @@ -1852,6 +1866,10 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
> > >   	err = ceph_osdc_start_request(&fsc->client->osdc, req, false);
> > >   	if (!err)
> > >   		err = ceph_osdc_wait_request(&fsc->client->osdc, req);
> > > +
> > > +	ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
> > > +				  req->r_end_stamp, err);
> > > +
> > >   out_put:
> > >   	ceph_osdc_put_request(req);
> > >   	if (err == -ECANCELED)
> > > diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
> > > index 66b9622..de07fdb 100644
> > > --- a/fs/ceph/debugfs.c
> > > +++ b/fs/ceph/debugfs.c
> > > @@ -7,6 +7,7 @@
> > >   #include <linux/ctype.h>
> > >   #include <linux/debugfs.h>
> > >   #include <linux/seq_file.h>
> > > +#include <linux/math64.h>
> > >   
> > >   #include <linux/ceph/libceph.h>
> > >   #include <linux/ceph/mon_client.h>
> > > @@ -124,13 +125,70 @@ static int mdsc_show(struct seq_file *s, void *p)
> > >   	return 0;
> > >   }
> > >   
> > > +static u64 get_avg(u64 *totalp, u64 *sump, spinlock_t *lockp, u64 *total)
> > > +{
> > > +	u64 t, sum, avg = 0;
> > > +
> > > +	spin_lock(lockp);
> > > +	t = *totalp;
> > > +	sum = *sump;
> > > +	spin_unlock(lockp);
> > > +
> > > +	if (likely(t))
> > > +		avg = DIV64_U64_ROUND_CLOSEST(sum, t);
> > > +
> > > +	*total = t;
> > > +	return avg;
> > > +}
> > > +
> > > +#define CEPH_METRIC_SHOW(name, total, avg, min, max, sq) {		\
> > > +	u64 _total, _avg, _min, _max, _sq, _st, _re = 0;		\
> > > +	_avg = jiffies_to_usecs(avg);					\
> > > +	_min = jiffies_to_usecs(min == S64_MAX ? 0 : min);		\
> > > +	_max = jiffies_to_usecs(max);					\
> > > +	_total = total - 1;						\
> > > +	_sq = _total > 0 ? DIV64_U64_ROUND_CLOSEST(sq, _total) : 0;	\
> > > +	_sq = jiffies_to_usecs(_sq);					\
> > > +	_st = int_sqrt64(_sq);						\
> > > +	if (_st > 0) {							\
> > > +		_re = 5 * (_sq - (_st * _st));				\
> > > +		_re = _re > 0 ? _re - 1 : 0;				\
> > > +		_re = _st > 0 ? div64_s64(_re, _st) : 0;		\
> > > +	}								\
> > > +	seq_printf(s, "%-14s%-12llu%-16llu%-16llu%-16llu%llu.%llu\n",	\
> > > +		   name, total, _avg, _min, _max, _st, _re);		\
> > > +}
> > > +
> > >   static int metric_show(struct seq_file *s, void *p)
> > >   {
> > >   	struct ceph_fs_client *fsc = s->private;
> > >   	struct ceph_mds_client *mdsc = fsc->mdsc;
> > >   	struct ceph_client_metric *m = &mdsc->metric;
> > >   	int i, nr_caps = 0;
> > > -
> > > +	u64 total, avg, min, max, sq;
> > > +
> > > +	seq_printf(s, "item          total       avg_lat(us)     min_lat(us)     max_lat(us)     stdev(us)\n");
> > > +	seq_printf(s, "-----------------------------------------------------------------------------------\n");
> > > +
> > > +	avg = get_avg(&m->total_reads,
> > > +		      &m->read_latency_sum,
> > > +		      &m->read_latency_lock,
> > > +		      &total);
> > > +	min = atomic64_read(&m->read_latency_min);
> > > +	max = atomic64_read(&m->read_latency_max);
> > > +	sq = percpu_counter_sum(&m->read_latency_sq_sum);
> > > +	CEPH_METRIC_SHOW("read", total, avg, min, max, sq);
> > > +
> > > +	avg = get_avg(&m->total_writes,
> > > +		      &m->write_latency_sum,
> > > +		      &m->write_latency_lock,
> > > +		      &total);
> > > +	min = atomic64_read(&m->write_latency_min);
> > > +	max = atomic64_read(&m->write_latency_max);
> > > +	sq = percpu_counter_sum(&m->write_latency_sq_sum);
> > > +	CEPH_METRIC_SHOW("write", total, avg, min, max, sq);
> > > +
> > > +	seq_printf(s, "\n");
> > >   	seq_printf(s, "item          total           miss            hit\n");
> > >   	seq_printf(s, "-------------------------------------------------\n");
> > >   
> > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > > index 4a5ccbb..8e40022 100644
> > > --- a/fs/ceph/file.c
> > > +++ b/fs/ceph/file.c
> > > @@ -906,6 +906,10 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
> > >   		ret = ceph_osdc_start_request(osdc, req, false);
> > >   		if (!ret)
> > >   			ret = ceph_osdc_wait_request(osdc, req);
> > > +
> > > +		ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_stamp,
> > > +					 req->r_end_stamp, ret);
> > > +
> > >   		ceph_osdc_put_request(req);
> > >   
> > >   		i_size = i_size_read(inode);
> > > @@ -1044,6 +1048,8 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req)
> > >   	struct inode *inode = req->r_inode;
> > >   	struct ceph_aio_request *aio_req = req->r_priv;
> > >   	struct ceph_osd_data *osd_data = osd_req_op_extent_osd_data(req, 0);
> > > +	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> > > +	struct ceph_client_metric *metric = &fsc->mdsc->metric;
> > >   
> > >   	BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_BVECS);
> > >   	BUG_ON(!osd_data->num_bvecs);
> > > @@ -1051,6 +1057,16 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req)
> > >   	dout("ceph_aio_complete_req %p rc %d bytes %u\n",
> > >   	     inode, rc, osd_data->bvec_pos.iter.bi_size);
> > >   
> > > +	/* r_start_stamp == 0 means the request was not submitted */
> > > +	if (req->r_start_stamp) {
> > > +		if (aio_req->write)
> > > +			ceph_update_write_latency(metric, req->r_start_stamp,
> > > +						  req->r_end_stamp, rc);
> > > +		else
> > > +			ceph_update_read_latency(metric, req->r_start_stamp,
> > > +						 req->r_end_stamp, rc);
> > > +	}
> > > +
> > >   	if (rc == -EOLDSNAPC) {
> > >   		struct ceph_aio_work *aio_work;
> > >   		BUG_ON(!aio_req->write);
> > > @@ -1179,6 +1195,7 @@ static void ceph_aio_retry_work(struct work_struct *work)
> > >   	struct inode *inode = file_inode(file);
> > >   	struct ceph_inode_info *ci = ceph_inode(inode);
> > >   	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> > > +	struct ceph_client_metric *metric = &fsc->mdsc->metric;
> > >   	struct ceph_vino vino;
> > >   	struct ceph_osd_request *req;
> > >   	struct bio_vec *bvecs;
> > > @@ -1295,6 +1312,13 @@ static void ceph_aio_retry_work(struct work_struct *work)
> > >   		if (!ret)
> > >   			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
> > >   
> > > +		if (write)
> > > +			ceph_update_write_latency(metric, req->r_start_stamp,
> > > +						  req->r_end_stamp, ret);
> > > +		else
> > > +			ceph_update_read_latency(metric, req->r_start_stamp,
> > > +						 req->r_end_stamp, ret);
> > > +
> > >   		size = i_size_read(inode);
> > >   		if (!write) {
> > >   			if (ret == -ENOENT)
> > > @@ -1466,6 +1490,8 @@ static void ceph_aio_retry_work(struct work_struct *work)
> > >   		if (!ret)
> > >   			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
> > >   
> > > +		ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
> > > +					  req->r_end_stamp, ret);
> > >   out:
> > >   		ceph_osdc_put_request(req);
> > >   		if (ret != 0) {
> > > diff --git a/fs/ceph/metric.c b/fs/ceph/metric.c
> > > index 2a4b739..6cb64fb 100644
> > > --- a/fs/ceph/metric.c
> > > +++ b/fs/ceph/metric.c
> > > @@ -2,6 +2,7 @@
> > >   
> > >   #include <linux/types.h>
> > >   #include <linux/percpu_counter.h>
> > > +#include <linux/math64.h>
> > >   
> > >   #include "metric.h"
> > >   
> > > @@ -29,8 +30,32 @@ int ceph_metric_init(struct ceph_client_metric *m)
> > >   	if (ret)
> > >   		goto err_i_caps_mis;
> > >   
> > > +	ret = percpu_counter_init(&m->read_latency_sq_sum, 0, GFP_KERNEL);
> > > +	if (ret)
> > > +		goto err_read_latency_sq_sum;
> > > +
> > > +	atomic64_set(&m->read_latency_min, S64_MAX);
> > > +	atomic64_set(&m->read_latency_max, 0);
> > > +	spin_lock_init(&m->read_latency_lock);
> > > +	m->total_reads = 0;
> > > +	m->read_latency_sum = 0;
> > > +
> > > +	ret = percpu_counter_init(&m->write_latency_sq_sum, 0, GFP_KERNEL);
> > > +	if (ret)
> > > +		goto err_write_latency_sq_sum;
> > > +
> > > +	atomic64_set(&m->write_latency_min, S64_MAX);
> > > +	atomic64_set(&m->write_latency_max, 0);
> > > +	spin_lock_init(&m->write_latency_lock);
> > > +	m->total_writes = 0;
> > > +	m->write_latency_sum = 0;
> > > +
> > >   	return 0;
> > >   
> > > +err_write_latency_sq_sum:
> > > +	percpu_counter_destroy(&m->read_latency_sq_sum);
> > > +err_read_latency_sq_sum:
> > > +	percpu_counter_destroy(&m->i_caps_mis);
> > >   err_i_caps_mis:
> > >   	percpu_counter_destroy(&m->i_caps_hit);
> > >   err_i_caps_hit:
> > > @@ -46,8 +71,93 @@ void ceph_metric_destroy(struct ceph_client_metric *m)
> > >   	if (!m)
> > >   		return;
> > >   
> > > +	percpu_counter_destroy(&m->write_latency_sq_sum);
> > > +	percpu_counter_destroy(&m->read_latency_sq_sum);
> > >   	percpu_counter_destroy(&m->i_caps_mis);
> > >   	percpu_counter_destroy(&m->i_caps_hit);
> > >   	percpu_counter_destroy(&m->d_lease_mis);
> > >   	percpu_counter_destroy(&m->d_lease_hit);
> > >   }
> > > +
> > > +static inline void __update_min_latency(atomic64_t *min, unsigned long lat)
> > > +{
> > > +	u64 cur, old;
> > > +
> > > +	cur = atomic64_read(min);
> > > +	do {
> > > +		old = cur;
> > > +		if (likely(lat >= old))
> > > +			break;
> > > +	} while (unlikely((cur = atomic64_cmpxchg(min, old, lat)) != old));
> > > +}
> > > +
> > > +static inline void __update_max_latency(atomic64_t *max, unsigned long lat)
> > > +{
> > > +	u64 cur, old;
> > > +
> > > +	cur = atomic64_read(max);
> > > +	do {
> > > +		old = cur;
> > > +		if (likely(lat <= old))
> > > +			break;
> > > +	} while (unlikely((cur = atomic64_cmpxchg(max, old, lat)) != old));
> > > +}
> > > +
> > > +static inline void __update_avg_and_sq(u64 *totalp, u64 *lsump,
> > > +				       struct percpu_counter *sq_sump,
> > > +				       spinlock_t *lockp, unsigned long lat)
> > > +{
> > > +	u64 total, avg, sq, lsum;
> > > +
> > > +	spin_lock(lockp);
> > > +	total = ++(*totalp);
> > > +	*lsump += lat;
> > > +	lsum = *lsump;
> > > +	spin_unlock(lockp);
> 
> For each read/write/metadata latency update, I am trying to keep the
> critical section as small as possible here.
> 
> 

A few extra arithmetic operations won't make a big difference here. All
of the data being accessed will (probably) be on the same cacheline too,
so it's almost certainly going to cost next to nothing anyway. There
might be some benefit to using a percpu value, but it's hard to imagine
it making a difference at the frequency we'll be updating this.


> > > +
> > > +	if (unlikely(total == 1))
> > > +		return;
> > > +
> > > +	/* the sq is (lat - old_avg) * (lat - new_avg) */
> > > +	avg = DIV64_U64_ROUND_CLOSEST((lsum - lat), (total - 1));
> > > +	sq = lat - avg;
> > > +	avg = DIV64_U64_ROUND_CLOSEST(lsum, total);
> > > +	sq = sq * (lat - avg);
> > > +	percpu_counter_add(sq_sump, sq);
> 
> IMO, the percpu_counter can benefit us by avoiding the lock here,
> since computing the sq does several div/mult operations and takes
> somewhat longer.
> 

It's really unlikely to make much difference. These operations are still
pretty fast on modern CPUs, and we're only doing one update per I/O.


> 
> > > +}
> > > +
> > > +void ceph_update_read_latency(struct ceph_client_metric *m,
> > > +			      unsigned long r_start,
> > > +			      unsigned long r_end,
> > > +			      int rc)
> > > +{
> > > +	unsigned long lat = r_end - r_start;
> > > +
> > > +	if (unlikely(rc < 0 && rc != -ENOENT && rc != -ETIMEDOUT))
> > > +		return;
> > > +
> > > +	__update_min_latency(&m->read_latency_min, lat);
> > > +	__update_max_latency(&m->read_latency_max, lat);
> 
> Here too, the min/max are updated without locks, but it should be okay
> to switch them to u64 and update them under the locks.
> 
> Thoughts?
> 

The thing is that volatile variable accesses (atomic64_t's) are not
without cost. This may perform _worse_ in a contended situation as two
CPUs might be ping-ponging the same cacheline back and forth as they
each update the different fields in the same struct.

> If this makes sense, I will change the min/max to u64 type and keep the
> sq_sum as a percpu counter. Or I will make them all u64.
> 

I'd just make them all u64s that are protected by the spinlock. You are
going to have to take the spinlock anyway, and you're updating memory
that should be very close together (and hence in the same cachelines).

Lock mitigation strategies in general don't really help if you have to
take a lock every time anyway. Any performance hit from having to do an
extra bit of math and a store under spinlock will probably not be
measurable.
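
To be concrete, here is a rough sketch of what I mean (illustrative
only, not the eventual patch; DIV64_U64_ROUND_CLOSEST is from
<linux/math64.h>):

static void __update_latency(spinlock_t *lock, u64 *total, u64 *lsum,
			     u64 *sq_sum, u64 *min, u64 *max,
			     unsigned long lat)
{
	u64 avg;

	spin_lock(lock);

	(*total)++;
	*lsum += lat;

	if (lat < *min)
		*min = lat;
	if (lat > *max)
		*max = lat;

	if (*total > 1) {
		/* the sq is (lat - old_avg) * (lat - new_avg) */
		avg = DIV64_U64_ROUND_CLOSEST(*lsum - lat, *total - 1);
		*sq_sum += (lat - avg) *
			   (lat - DIV64_U64_ROUND_CLOSEST(*lsum, *total));
	}

	spin_unlock(lock);
}

As in the posted version, the two (lat - avg) subtractions can each
wrap as u64s when lat is below the average, but the product still comes
out right in two's complement arithmetic.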

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v12 3/4] ceph: add read/write latency metric support
  2020-03-19 18:36       ` Jeff Layton
@ 2020-03-19 18:49         ` Xiubo Li
  0 siblings, 0 replies; 10+ messages in thread
From: Xiubo Li @ 2020-03-19 18:49 UTC (permalink / raw)
  To: Jeff Layton; +Cc: sage, idryomov, gfarnum, zyan, pdonnell, ceph-devel

On 2020/3/20 2:36, Jeff Layton wrote:
> On Fri, 2020-03-20 at 01:44 +0800, Xiubo Li wrote:
>> On 2020/3/19 22:36, Jeff Layton wrote:
>>> On Thu, 2020-03-19 at 10:06 -0400, xiubli@redhat.com wrote:
>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>
>>>> Calculate the latency for OSD read requests. Add a new r_end_stamp
>>>> field to struct ceph_osd_request that will hold the time at which
>>>> the reply was received. Use that to calculate the RTT for each call,
>>>> and divide the sum of those by the number of calls to get the
>>>> average RTT.
>>>>
>>>> Keep a tally of RTT for OSD writes and number of calls to track average
>>>> latency of OSD writes.
>>>>
>>>> URL: https://tracker.ceph.com/issues/43215
>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>> ---
>>>>    fs/ceph/addr.c                  |  18 +++++++
>>>>    fs/ceph/debugfs.c               |  60 +++++++++++++++++++++-
>>>>    fs/ceph/file.c                  |  26 ++++++++++
>>>>    fs/ceph/metric.c                | 110 ++++++++++++++++++++++++++++++++++++++++
>>>>    fs/ceph/metric.h                |  23 +++++++++
>>>>    include/linux/ceph/osd_client.h |   1 +
>>>>    net/ceph/osd_client.c           |   2 +
>>>>    7 files changed, 239 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
>>>> index 6f4678d..f359619 100644
>>>> --- a/fs/ceph/addr.c
>>>> +++ b/fs/ceph/addr.c
>>>> @@ -216,6 +216,9 @@ static int ceph_sync_readpages(struct ceph_fs_client *fsc,
>>>>    	if (!rc)
>>>>    		rc = ceph_osdc_wait_request(osdc, req);
>>>>    
>>>> +	ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_stamp,
>>>> +				 req->r_end_stamp, rc);
>>>> +
>>>>    	ceph_osdc_put_request(req);
>>>>    	dout("readpages result %d\n", rc);
>>>>    	return rc;
>>>> @@ -299,6 +302,7 @@ static int ceph_readpage(struct file *filp, struct page *page)
>>>>    static void finish_read(struct ceph_osd_request *req)
>>>>    {
>>>>    	struct inode *inode = req->r_inode;
>>>> +	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>>>>    	struct ceph_osd_data *osd_data;
>>>>    	int rc = req->r_result <= 0 ? req->r_result : 0;
>>>>    	int bytes = req->r_result >= 0 ? req->r_result : 0;
>>>> @@ -336,6 +340,10 @@ static void finish_read(struct ceph_osd_request *req)
>>>>    		put_page(page);
>>>>    		bytes -= PAGE_SIZE;
>>>>    	}
>>>> +
>>>> +	ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_stamp,
>>>> +				 req->r_end_stamp, rc);
>>>> +
>>>>    	kfree(osd_data->pages);
>>>>    }
>>>>    
>>>> @@ -643,6 +651,9 @@ static int ceph_sync_writepages(struct ceph_fs_client *fsc,
>>>>    	if (!rc)
>>>>    		rc = ceph_osdc_wait_request(osdc, req);
>>>>    
>>>> +	ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
>>>> +				  req->r_end_stamp, rc);
>>>> +
>>>>    	ceph_osdc_put_request(req);
>>>>    	if (rc == 0)
>>>>    		rc = len;
>>>> @@ -794,6 +805,9 @@ static void writepages_finish(struct ceph_osd_request *req)
>>>>    		ceph_clear_error_write(ci);
>>>>    	}
>>>>    
>>>> +	ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
>>>> +				  req->r_end_stamp, rc);
>>>> +
>>>>    	/*
>>>>    	 * We lost the cache cap, need to truncate the page before
>>>>    	 * it is unlocked, otherwise we'd truncate it later in the
>>>> @@ -1852,6 +1866,10 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
>>>>    	err = ceph_osdc_start_request(&fsc->client->osdc, req, false);
>>>>    	if (!err)
>>>>    		err = ceph_osdc_wait_request(&fsc->client->osdc, req);
>>>> +
>>>> +	ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
>>>> +				  req->r_end_stamp, err);
>>>> +
>>>>    out_put:
>>>>    	ceph_osdc_put_request(req);
>>>>    	if (err == -ECANCELED)
>>>> diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
>>>> index 66b9622..de07fdb 100644
>>>> --- a/fs/ceph/debugfs.c
>>>> +++ b/fs/ceph/debugfs.c
>>>> @@ -7,6 +7,7 @@
>>>>    #include <linux/ctype.h>
>>>>    #include <linux/debugfs.h>
>>>>    #include <linux/seq_file.h>
>>>> +#include <linux/math64.h>
>>>>    
>>>>    #include <linux/ceph/libceph.h>
>>>>    #include <linux/ceph/mon_client.h>
>>>> @@ -124,13 +125,70 @@ static int mdsc_show(struct seq_file *s, void *p)
>>>>    	return 0;
>>>>    }
>>>>    
>>>> +static u64 get_avg(u64 *totalp, u64 *sump, spinlock_t *lockp, u64 *total)
>>>> +{
>>>> +	u64 t, sum, avg = 0;
>>>> +
>>>> +	spin_lock(lockp);
>>>> +	t = *totalp;
>>>> +	sum = *sump;
>>>> +	spin_unlock(lockp);
>>>> +
>>>> +	if (likely(t))
>>>> +		avg = DIV64_U64_ROUND_CLOSEST(sum, t);
>>>> +
>>>> +	*total = t;
>>>> +	return avg;
>>>> +}
>>>> +
>>>> +#define CEPH_METRIC_SHOW(name, total, avg, min, max, sq) {		\
>>>> +	u64 _total, _avg, _min, _max, _sq, _st, _re = 0;		\
>>>> +	_avg = jiffies_to_usecs(avg);					\
>>>> +	_min = jiffies_to_usecs(min == S64_MAX ? 0 : min);		\
>>>> +	_max = jiffies_to_usecs(max);					\
>>>> +	_total = total - 1;						\
>>>> +	_sq = _total > 0 ? DIV64_U64_ROUND_CLOSEST(sq, _total) : 0;	\
>>>> +	_sq = jiffies_to_usecs(_sq);					\
>>>> +	_st = int_sqrt64(_sq);						\
>>>> +	if (_st > 0) {							\
>>>> +		_re = 5 * (_sq - (_st * _st));				\
>>>> +		_re = _re > 0 ? _re - 1 : 0;				\
>>>> +		_re = _st > 0 ? div64_s64(_re, _st) : 0;		\
>>>> +	}								\
>>>> +	seq_printf(s, "%-14s%-12llu%-16llu%-16llu%-16llu%llu.%llu\n",	\
>>>> +		   name, total, _avg, _min, _max, _st, _re);		\
>>>> +}
>>>> +
>>>>    static int metric_show(struct seq_file *s, void *p)
>>>>    {
>>>>    	struct ceph_fs_client *fsc = s->private;
>>>>    	struct ceph_mds_client *mdsc = fsc->mdsc;
>>>>    	struct ceph_client_metric *m = &mdsc->metric;
>>>>    	int i, nr_caps = 0;
>>>> -
>>>> +	u64 total, avg, min, max, sq;
>>>> +
>>>> +	seq_printf(s, "item          total       avg_lat(us)     min_lat(us)     max_lat(us)     stdev(us)\n");
>>>> +	seq_printf(s, "-----------------------------------------------------------------------------------\n");
>>>> +
>>>> +	avg = get_avg(&m->total_reads,
>>>> +		      &m->read_latency_sum,
>>>> +		      &m->read_latency_lock,
>>>> +		      &total);
>>>> +	min = atomic64_read(&m->read_latency_min);
>>>> +	max = atomic64_read(&m->read_latency_max);
>>>> +	sq = percpu_counter_sum(&m->read_latency_sq_sum);
>>>> +	CEPH_METRIC_SHOW("read", total, avg, min, max, sq);
>>>> +
>>>> +	avg = get_avg(&m->total_writes,
>>>> +		      &m->write_latency_sum,
>>>> +		      &m->write_latency_lock,
>>>> +		      &total);
>>>> +	min = atomic64_read(&m->write_latency_min);
>>>> +	max = atomic64_read(&m->write_latency_max);
>>>> +	sq = percpu_counter_sum(&m->write_latency_sq_sum);
>>>> +	CEPH_METRIC_SHOW("write", total, avg, min, max, sq);
>>>> +
>>>> +	seq_printf(s, "\n");
>>>>    	seq_printf(s, "item          total           miss            hit\n");
>>>>    	seq_printf(s, "-------------------------------------------------\n");
>>>>    
>>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>>> index 4a5ccbb..8e40022 100644
>>>> --- a/fs/ceph/file.c
>>>> +++ b/fs/ceph/file.c
>>>> @@ -906,6 +906,10 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
>>>>    		ret = ceph_osdc_start_request(osdc, req, false);
>>>>    		if (!ret)
>>>>    			ret = ceph_osdc_wait_request(osdc, req);
>>>> +
>>>> +		ceph_update_read_latency(&fsc->mdsc->metric, req->r_start_stamp,
>>>> +					 req->r_end_stamp, ret);
>>>> +
>>>>    		ceph_osdc_put_request(req);
>>>>    
>>>>    		i_size = i_size_read(inode);
>>>> @@ -1044,6 +1048,8 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req)
>>>>    	struct inode *inode = req->r_inode;
>>>>    	struct ceph_aio_request *aio_req = req->r_priv;
>>>>    	struct ceph_osd_data *osd_data = osd_req_op_extent_osd_data(req, 0);
>>>> +	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>>>> +	struct ceph_client_metric *metric = &fsc->mdsc->metric;
>>>>    
>>>>    	BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_BVECS);
>>>>    	BUG_ON(!osd_data->num_bvecs);
>>>> @@ -1051,6 +1057,16 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req)
>>>>    	dout("ceph_aio_complete_req %p rc %d bytes %u\n",
>>>>    	     inode, rc, osd_data->bvec_pos.iter.bi_size);
>>>>    
>>>> +	/* r_start_stamp == 0 means the request was not submitted */
>>>> +	if (req->r_start_stamp) {
>>>> +		if (aio_req->write)
>>>> +			ceph_update_write_latency(metric, req->r_start_stamp,
>>>> +						  req->r_end_stamp, rc);
>>>> +		else
>>>> +			ceph_update_read_latency(metric, req->r_start_stamp,
>>>> +						 req->r_end_stamp, rc);
>>>> +	}
>>>> +
>>>>    	if (rc == -EOLDSNAPC) {
>>>>    		struct ceph_aio_work *aio_work;
>>>>    		BUG_ON(!aio_req->write);
>>>> @@ -1179,6 +1195,7 @@ static void ceph_aio_retry_work(struct work_struct *work)
>>>>    	struct inode *inode = file_inode(file);
>>>>    	struct ceph_inode_info *ci = ceph_inode(inode);
>>>>    	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>>>> +	struct ceph_client_metric *metric = &fsc->mdsc->metric;
>>>>    	struct ceph_vino vino;
>>>>    	struct ceph_osd_request *req;
>>>>    	struct bio_vec *bvecs;
>>>> @@ -1295,6 +1312,13 @@ static void ceph_aio_retry_work(struct work_struct *work)
>>>>    		if (!ret)
>>>>    			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
>>>>    
>>>> +		if (write)
>>>> +			ceph_update_write_latency(metric, req->r_start_stamp,
>>>> +						  req->r_end_stamp, ret);
>>>> +		else
>>>> +			ceph_update_read_latency(metric, req->r_start_stamp,
>>>> +						 req->r_end_stamp, ret);
>>>> +
>>>>    		size = i_size_read(inode);
>>>>    		if (!write) {
>>>>    			if (ret == -ENOENT)
>>>> @@ -1466,6 +1490,8 @@ static void ceph_aio_retry_work(struct work_struct *work)
>>>>    		if (!ret)
>>>>    			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
>>>>    
>>>> +		ceph_update_write_latency(&fsc->mdsc->metric, req->r_start_stamp,
>>>> +					  req->r_end_stamp, ret);
>>>>    out:
>>>>    		ceph_osdc_put_request(req);
>>>>    		if (ret != 0) {
>>>> diff --git a/fs/ceph/metric.c b/fs/ceph/metric.c
>>>> index 2a4b739..6cb64fb 100644
>>>> --- a/fs/ceph/metric.c
>>>> +++ b/fs/ceph/metric.c
>>>> @@ -2,6 +2,7 @@
>>>>    
>>>>    #include <linux/types.h>
>>>>    #include <linux/percpu_counter.h>
>>>> +#include <linux/math64.h>
>>>>    
>>>>    #include "metric.h"
>>>>    
>>>> @@ -29,8 +30,32 @@ int ceph_metric_init(struct ceph_client_metric *m)
>>>>    	if (ret)
>>>>    		goto err_i_caps_mis;
>>>>    
>>>> +	ret = percpu_counter_init(&m->read_latency_sq_sum, 0, GFP_KERNEL);
>>>> +	if (ret)
>>>> +		goto err_read_latency_sq_sum;
>>>> +
>>>> +	atomic64_set(&m->read_latency_min, S64_MAX);
>>>> +	atomic64_set(&m->read_latency_max, 0);
>>>> +	spin_lock_init(&m->read_latency_lock);
>>>> +	m->total_reads = 0;
>>>> +	m->read_latency_sum = 0;
>>>> +
>>>> +	ret = percpu_counter_init(&m->write_latency_sq_sum, 0, GFP_KERNEL);
>>>> +	if (ret)
>>>> +		goto err_write_latency_sq_sum;
>>>> +
>>>> +	atomic64_set(&m->write_latency_min, S64_MAX);
>>>> +	atomic64_set(&m->write_latency_max, 0);
>>>> +	spin_lock_init(&m->write_latency_lock);
>>>> +	m->total_writes = 0;
>>>> +	m->write_latency_sum = 0;
>>>> +
>>>>    	return 0;
>>>>    
>>>> +err_write_latency_sq_sum:
>>>> +	percpu_counter_destroy(&m->read_latency_sq_sum);
>>>> +err_read_latency_sq_sum:
>>>> +	percpu_counter_destroy(&m->i_caps_mis);
>>>>    err_i_caps_mis:
>>>>    	percpu_counter_destroy(&m->i_caps_hit);
>>>>    err_i_caps_hit:
>>>> @@ -46,8 +71,93 @@ void ceph_metric_destroy(struct ceph_client_metric *m)
>>>>    	if (!m)
>>>>    		return;
>>>>    
>>>> +	percpu_counter_destroy(&m->write_latency_sq_sum);
>>>> +	percpu_counter_destroy(&m->read_latency_sq_sum);
>>>>    	percpu_counter_destroy(&m->i_caps_mis);
>>>>    	percpu_counter_destroy(&m->i_caps_hit);
>>>>    	percpu_counter_destroy(&m->d_lease_mis);
>>>>    	percpu_counter_destroy(&m->d_lease_hit);
>>>>    }
>>>> +
>>>> +static inline void __update_min_latency(atomic64_t *min, unsigned long lat)
>>>> +{
>>>> +	u64 cur, old;
>>>> +
>>>> +	cur = atomic64_read(min);
>>>> +	do {
>>>> +		old = cur;
>>>> +		if (likely(lat >= old))
>>>> +			break;
>>>> +	} while (unlikely((cur = atomic64_cmpxchg(min, old, lat)) != old));
>>>> +}
>>>> +
>>>> +static inline void __update_max_latency(atomic64_t *max, unsigned long lat)
>>>> +{
>>>> +	u64 cur, old;
>>>> +
>>>> +	cur = atomic64_read(max);
>>>> +	do {
>>>> +		old = cur;
>>>> +		if (likely(lat <= old))
>>>> +			break;
>>>> +	} while (unlikely((cur = atomic64_cmpxchg(max, old, lat)) != old));
>>>> +}
>>>> +
>>>> +static inline void __update_avg_and_sq(u64 *totalp, u64 *lsump,
>>>> +				       struct percpu_counter *sq_sump,
>>>> +				       spinlock_t *lockp, unsigned long lat)
>>>> +{
>>>> +	u64 total, avg, sq, lsum;
>>>> +
>>>> +	spin_lock(lockp);
>>>> +	total = ++(*totalp);
>>>> +	*lsump += lat;
>>>> +	lsum = *lsump;
>>>> +	spin_unlock(lockp);
>> For each read/write/metadata latency update, I am trying to keep the
>> critical section as small as possible here.
>>
>>
> A few extra arithmetic operations won't make a big difference here. All
> of the data being accessed will (probably) be on the same cacheline too,
> so it's almost certainly going to cost next to nothing anyway. There
> might be some benefit to using a percpu value, but it's hard to
> imagine it making a difference at the frequency we'll be updating
> this.
>
>
>>>> +
>>>> +	if (unlikely(total == 1))
>>>> +		return;
>>>> +
>>>> +	/* the sq is (lat - old_avg) * (lat - new_avg) */
>>>> +	avg = DIV64_U64_ROUND_CLOSEST((lsum - lat), (total - 1));
>>>> +	sq = lat - avg;
>>>> +	avg = DIV64_U64_ROUND_CLOSEST(lsum, total);
>>>> +	sq = sq * (lat - avg);
>>>> +	percpu_counter_add(sq_sump, sq);
>> IMO, the percpu_counter could benefit us by avoiding locks, since
>> computing the sq does many div/multiply operations and takes somewhat
>> longer.
>>
> It's really unlikely to make much difference. These operations are still
> pretty fast on modern CPUs, and we're only doing one update per I/O.
>
>
>>>> +}
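
FWIW, I worked through the running-variance math above with three
example latencies (10, 20 and 30 jiffies) to convince myself:

	sample 1: lat=10, total=1, lsum=10  -> no sq contribution
	sample 2: lat=20, total=2, lsum=30
	          old_avg = (30-20)/1 = 10, new_avg = 30/2 = 15
	          sq = (20-10) * (20-15) = 50
	sample 3: lat=30, total=3, lsum=60
	          old_avg = (60-30)/2 = 15, new_avg = 60/3 = 20
	          sq = (30-15) * (30-20) = 150

sq_sum ends up as 200, and stdev = sqrt(200 / (3 - 1)) = 10, which
matches the directly computed sample stdev of {10, 20, 30}.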
>>>> +
>>>> +void ceph_update_read_latency(struct ceph_client_metric *m,
>>>> +			      unsigned long r_start,
>>>> +			      unsigned long r_end,
>>>> +			      int rc)
>>>> +{
>>>> +	unsigned long lat = r_end - r_start;
>>>> +
>>>> +	if (unlikely(rc < 0 && rc != -ENOENT && rc != -ETIMEDOUT))
>>>> +		return;
>>>> +
>>>> +	__update_min_latency(&m->read_latency_min, lat);
>>>> +	__update_max_latency(&m->read_latency_max, lat);
>> We also update the min/max here without locks, but it should be okay
>> to switch these to u64 under the locks.
>>
>> Thoughts?
>>
> The thing is that volatile variable accesses (atomic64_t's) are not
> without cost. This may perform _worse_ in a contended situation as two
> CPUs might be ping-ponging the same cacheline back and forth as they
> each update the different fields in the same struct.
>
>> If this makes sense, I will change the min/max to the u64 type and
>> keep the sq_sum as a percpu counter. Or I will make them all u64.
>>
> I'd just make them all u64s that are protected by the spinlock. You are
> going to have to take the spinlock anyway, and you're updating memory
> that should be very close together (and hence in the same cachelines).
>
> Lock mitigation strategies in general don't really help if you have to
> take a lock every time anyway. Any performance hit from having to do an
> extra bit of math and a store under spinlock will probably not be
> measurable.

Okay, will fix it.
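
Probably something like this on the show side then (untested, and
assuming all of the fields move to plain u64 under the one spinlock;
the names are illustrative):

	spin_lock(&m->read_latency_lock);
	total = m->total_reads;
	sum = m->read_latency_sum;
	sq = m->read_latency_sq_sum;
	min = m->read_latency_min;
	max = m->read_latency_max;
	spin_unlock(&m->read_latency_lock);

so metric_show() gets one consistent snapshot instead of mixing
get_avg(), atomic64_read() and percpu_counter_sum().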

Thanks.

Thread overview: 10 messages
2020-03-19 14:06 [PATCH v12 0/4] ceph: add perf metrics support xiubli
2020-03-19 14:06 ` [PATCH v12 1/4] ceph: add dentry lease metric support xiubli
2020-03-19 14:06 ` [PATCH v12 2/4] ceph: add caps perf metric for each superblock xiubli
2020-03-19 14:06 ` [PATCH v12 3/4] ceph: add read/write latency metric support xiubli
2020-03-19 14:36   ` Jeff Layton
2020-03-19 17:44     ` Xiubo Li
2020-03-19 18:01       ` Xiubo Li
2020-03-19 18:36       ` Jeff Layton
2020-03-19 18:49         ` Xiubo Li
2020-03-19 14:06 ` [PATCH v12 4/4] ceph: add metadata perf " xiubli
